├── .gitignore ├── README.md ├── convert_to_onnx.py ├── data ├── __init__.py ├── config.py ├── data_augment.py └── wider_face.py ├── detect.py ├── images ├── _-20_1008_0.jpg ├── _-20_144_2.jpg ├── c2f3bc4499934b8ed942743ee8e3082a.jpg └── mobilenetv22222.jpg ├── layers ├── __init__.py ├── functions │ └── prior_box.py └── modules │ ├── __init__.py │ └── multibox_loss.py ├── models ├── __init__.py ├── net.py ├── retinaface.py └── utils.py ├── test_widerface.py ├── train.py ├── utils ├── __init__.py ├── box_utils.py ├── nms │ ├── __init__.py │ └── py_cpu_nms.py └── timer.py └── widerface_evaluate ├── README.md ├── box_overlaps.c ├── box_overlaps.pyx ├── evaluation.py ├── ground_truth ├── wider_easy_val.mat ├── wider_face_val.mat ├── wider_hard_val.mat └── wider_medium_val.mat └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow
95 | __pypackages__/
96 | 
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LightWeightFaceDetector
2 | 
3 | ## Update 2021-05-10
4 | Uploaded a new face detection dataset annotated with face boxes and 5 landmarks. The dataset consists mostly of large, close-range faces and helps improve detection accuracy on close faces. You can merge it with the WIDER FACE training data.
5 | 
6 | 
7 | [MobileFaceDet](https://pan.baidu.com/s/1x8zATo7TDx300JLPxyvI8g) password: eu8w
8 | 
9 | 
10 | We open-source a close-range face detection and landmark dataset containing 27k+ annotated faces, all at close distance, similar to a phone's front-facing camera. It helps improve face detection and landmark regression accuracy on mobile devices; merged with the WIDER FACE train set, it can be used to train a face detection model of only about 120 KB.
11 | Sample annotations from the dataset:
12 | 
13 | 
14 | ![](images/_-20_1008_0.jpg)
15 | ![](images/_-20_144_2.jpg)
16 | 
17 | Ultra-light-weight face detection with landmarks; the model size is around 1 MB, suitable for mobile or edge devices. I simplified the RetinaFace structure for fast inference.
18 | 
19 | I tested four light-weight networks as backbones: MobileNet v1, v2, v3 and EfficientNet-B0.
20 | 
21 | A light-weight face detection and landmark model for mobile and edge computing; the model is only a little over 1 MB. It is mainly a simplified RetinaFace: the detection heads on the first few large feature maps are removed, so detection of small faces may suffer, which matters little in typical application scenarios.
22 | 
23 | The fastest model here is mobilenet_v2_0.1; an example result:
24 | 
25 | ![](./images/mobilenetv22222.jpg)
26 | 
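Because only the stride-16 and stride-32 detection heads are kept (see `steps` and `min_sizes` in `data/config.py`), the number of priors is small. A quick back-of-the-envelope check of the prior count for the default 640×640 input — an illustrative sketch, not code from this repo:

```python
from math import ceil

# Values taken from data/config.py (cfg_mnetv1): two heads, two anchor sizes per head.
image_size = 640
steps = [16, 32]
min_sizes = [[64, 128], [256, 512]]

num_priors = sum(ceil(image_size / s) ** 2 * len(sizes)
                 for s, sizes in zip(steps, min_sizes))
print(num_priors)  # 40*40*2 + 20*20*2 = 4000 priors
```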
27 | ## WiderFace Val Performance
28 | 
29 | | Models            | Easy    | Medium  | Hard    |
30 | | ----------------- | ------- | ------- | ------- |
31 | | mobilenetv1_0.25  | 0.91718 | 0.79766 | 0.3592  |
32 | | mobilenetv2_0.1   | 0.85330 | 0.68946 | 0.2993  |
33 | | mobilenetv3_small | 0.93419 | 0.83259 | 0.3850  |
34 | | efficientnet-b0   | 0.93167 | 0.81466 | 0.37020 |
35 | 
36 | ## Data
37 | 
38 | 1. Download the [WIDERFACE](http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html) dataset.
39 | 2. Alternatively, use the organized dataset we provide (same directory structure as above):
40 | 
41 | Link: [google cloud](https://drive.google.com/open?id=11UGV3nbVv1x9IC--_tK3Uxf7hA6rlbsS) or [baidu cloud](https://pan.baidu.com/s/1jIp9t30oYivrAvrgUgIoLQ) Password: ruck
42 | 
43 | ## Training
44 | 
45 | We provide four light-weight backbones (mobilenetv1, mobilenetv2, mobilenetv3, efficientnetb0) for training the model.
46 | 
47 | 1. Make the directory ./weights/, download the ImageNet-pretrained weights from [link](https://pan.baidu.com/s/1zhyL9ULuIi1KdtXzhSQ4yQ) (password: urei) and put them in ./weights/:
48 | 
49 | ```Shell
50 |   ./weights/
51 |       mobilenet0.25_Final.pth
52 |       mobilenetV1X0.25_pretrain.tar
53 |       efficientnetb0_face.pth
54 |       mobilenetv3.pth
55 |       mobilenetv2_0.1_face.pth
56 |       ...
57 | ```
58 | 
59 | 2. Before training, check the network configuration (e.g. batch_size, min_sizes, steps, etc.) in ``data/config.py`` and ``train.py``.
60 | 
61 | 3. Train the model on WIDER FACE:
62 | 
63 | ```Shell
64 | CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --network mobilenetv1
65 | CUDA_VISIBLE_DEVICES=0 python train.py --network mobilenetv1
66 | ```
67 | 
68 | 
69 | ## Evaluation
70 | 
71 | ### Evaluate on WIDER FACE val
72 | 
73 | 1. Generate the txt result files:
74 | 
75 | ```Shell
76 | python test_widerface.py --trained_model weight_file --network mobilenetv1 (or mobilenetv2, mobilenetv3, efficientnetb0)
77 | ```
78 | 
79 | 2. Evaluate the txt results:
80 | 
81 | ```Shell
82 | cd ./widerface_evaluate
83 | python setup.py build_ext --inplace
84 | python evaluation.py
85 | ```
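## ONNX Export

`convert_to_onnx.py` exports the detector to `FaceDetector.onnx` (graph input name `input0`). A minimal sketch of exporting a model and sanity-checking the result; the onnxruntime check is illustrative only and not part of this repo (it assumes `pip install onnxruntime`):

```Shell
python convert_to_onnx.py --trained_model ./weights/mobilenet0.25_Final.pth --network mobile0.25
```

```python
# Illustrative sanity check of the exported graph (assumes onnxruntime and numpy are installed).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("FaceDetector.onnx")
dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {"input0": dummy})
# Expect the three RetinaFace head outputs (box regression, class scores, landmarks) over all priors.
for out in outputs:
    print(out.shape)
```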
86 | ## Android and iOS
87 | Android deployment uses libtorch: https://github.com/midasklr/facedetection_android.pytorch
88 | iOS deployment uses ncnn.
89 | ## References
90 | 
91 | [Pytorch_Retinaface](https://github.com/biubug6/Pytorch_Retinaface)
92 | 
--------------------------------------------------------------------------------
/convert_to_onnx.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import os
3 | import argparse
4 | import torch
5 | import torch.backends.cudnn as cudnn
6 | import numpy as np
7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3, cfg_efnetb0  # configs actually defined in data/config.py
8 | from layers.functions.prior_box import PriorBox
9 | from utils.nms.py_cpu_nms import py_cpu_nms
10 | import cv2
11 | from models.retinaface import RetinaFace
12 | from utils.box_utils import decode, decode_landm
13 | from utils.timer import Timer
14 | 
15 | 
16 | parser = argparse.ArgumentParser(description='Test')
17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenet0.25_Final.pth',
18 |                     type=str, help='Trained state_dict file path to open')
19 | parser.add_argument('--network', default='mobile0.25', help='Backbone network mobile0.25, mobilenetv2, mobilenetv3 or efficientnetb0')
20 | parser.add_argument('--long_side', default=640, type=int, help='when origin_size is false, long_side is scaled size(320 or 640 for long side)')
21 | parser.add_argument('--cpu', action="store_true", default=True, help='Use cpu inference')
22 | 
23 | args = parser.parse_args()
24 | 
25 | 
26 | def check_keys(model, pretrained_state_dict):
27 |     ckpt_keys = set(pretrained_state_dict.keys())
28 |     model_keys = set(model.state_dict().keys())
29 |     used_pretrained_keys = model_keys & ckpt_keys
30 |     unused_pretrained_keys = ckpt_keys - model_keys
31 |     missing_keys = model_keys - ckpt_keys
32 |     print('Missing keys:{}'.format(len(missing_keys)))
33 |     print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
34 |     print('Used keys:{}'.format(len(used_pretrained_keys)))
35 |     assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
36 |     return True
37 | 
38 | 
39 | def remove_prefix(state_dict, prefix):
40 |     ''' Old style model is stored with all names of parameters sharing common prefix 'module.' '''
41 |     print('remove prefix \'{}\''.format(prefix))
42 |     f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x
43 |     return {f(key): value for key, value in state_dict.items()}
44 | 
45 | 
46 | def load_model(model, pretrained_path, load_to_cpu):
47 |     print('Loading pretrained model from {}'.format(pretrained_path))
48 |     if load_to_cpu:
49 |         pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage)
50 |     else:
51 |         device = torch.cuda.current_device()
52 |         pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device))
53 |     if "state_dict" in pretrained_dict.keys():
54 |         pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
55 |     else:
56 |         pretrained_dict = remove_prefix(pretrained_dict, 'module.')
57 |     check_keys(model, pretrained_dict)
58 |     model.load_state_dict(pretrained_dict, strict=False)
59 |     return model
60 | 
61 | 
62 | if __name__ == '__main__':
63 |     torch.set_grad_enabled(False)
64 |     cfg = None
65 |     if args.network == "mobile0.25":
66 |         cfg = cfg_mnetv1
67 |     elif args.network == "mobilenetv2":
68 |         cfg = cfg_mnetv2
69 |     elif args.network == "mobilenetv3":
70 |         cfg = cfg_mnetv3
71 |     elif args.network == "efficientnetb0":
72 |         cfg = cfg_efnetb0
73 |     # net and model
74 |     net = RetinaFace(cfg=cfg, phase = 'test')
75 |     net = load_model(net, args.trained_model, args.cpu)
76 |     net.eval()
77 |     print('Finished loading model!')
78 |     print(net)
79 |     device = torch.device("cpu" if args.cpu else "cuda")
80 |     net = net.to(device)
81 | 
82 |     # ------------------------ export -----------------------------
83 |     output_onnx = 'FaceDetector.onnx'
84 |     print("==> Exporting model to ONNX format at '{}'".format(output_onnx))
85 |     input_names = ["input0"]
86 |     output_names = ["output0"]
87 |     inputs = torch.randn(1, 3, args.long_side, args.long_side).to(device)
88 | 
89 |     torch.onnx.export(net, inputs, output_onnx, export_params=True, verbose=False,
90 |                       input_names=input_names, output_names=output_names)
91 | 
--------------------------------------------------------------------------------
/data/__init__.py:
--------------------------------------------------------------------------------
1 | from .wider_face import WiderFaceDetection, detection_collate
2 | from .data_augment import *
3 | from .config import *
--------------------------------------------------------------------------------
/data/config.py:
--------------------------------------------------------------------------------
1 | # config.py
2 | 
3 | cfg_mnetv1 = {
4 |     'name': 'mobilenet0.25',
5 |     'min_sizes': [[64, 128], [256, 512]],
6 |     'steps': [16, 32],
7 |     'variance': [0.1, 0.2],
8 |     'clip': False,
9 |     'loc_weight': 2.0,
10 |     'gpu_train': True,
11 |     'batch_size': 64,
12 |     'ngpu': 1,
13 |     'epoch': 120,
14 |     'decay1': 80,
15 |     'decay2': 100,
16 |     'image_size': 640,
17 |     'pretrain': True,
18 |     'return_layers': {'stage2': 2, 'stage3': 3},
19 |     'in_channel': 32,
20 |     'out_channel': 64
21 | }
22 | 
23 | cfg_mnetv2 = {
24 |     'name': 'mobilenetv2_0.1',
25 |     'min_sizes': [[64, 128], [256, 512]],
26 |     'steps': [16, 32],
27 |     'variance': [0.1, 0.2],
28 |     'clip': False,
29 |     'loc_weight': 2.0,
30 |     'gpu_train': True,
31 |     'batch_size': 64,
32 |     'ngpu': 1,
33 |     'epoch': 120,
34 |     'decay1': 80,
35 |     'decay2': 100,
36 |     'image_size': 640,
37 |     'pretrain': True,
38 |     'return_layers': {'stage2': 2, 'stage3': 3},
39 |     'in_channel1': 12,
40 |     'in_channel2': 1280,
41 |     'out_channel': 64
42 | }
43 | 
44 | cfg_mnetv3 = {
45 |     'name': 'mobilenetv3',
46 |     'min_sizes': [[64, 128], [256, 512]],
47 |     'steps': [16, 32],
48 |     'variance': [0.1, 0.2],
49 |     'clip': False,
50 |
'loc_weight': 2.0, 51 | 'gpu_train': True, 52 | 'batch_size': 32, 53 | 'ngpu': 1, 54 | 'epoch': 120, 55 | 'decay1': 80, 56 | 'decay2': 100, 57 | 'image_size': 640, 58 | 'pretrain': True, 59 | 'return_layers': {'stage2': 2, 'stage3': 3}, 60 | 'in_channel1': 48, 61 | 'in_channel2': 576, 62 | 'out_channel': 64 63 | } 64 | 65 | cfg_efnetb0 = { 66 | 'name': 'efficientnetb0', 67 | 'min_sizes': [[64, 128], [256, 512]], 68 | 'steps': [16, 32], 69 | 'variance': [0.1, 0.2], 70 | 'clip': False, 71 | 'loc_weight': 2.0, 72 | 'gpu_train': True, 73 | 'batch_size': 8, 74 | 'ngpu': 1, 75 | 'epoch': 120, 76 | 'decay1': 80, 77 | 'decay2': 100, 78 | 'image_size': 640, 79 | 'pretrain': True, 80 | 'return_layers': {'stage2': 2, 'stage3': 3}, 81 | 'in_channel1': 112, 82 | 'in_channel2': 1280, 83 | 'out_channel': 64 84 | } 85 | 86 | 87 | 88 | -------------------------------------------------------------------------------- /data/data_augment.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import random 4 | from utils.box_utils import matrix_iof 5 | 6 | 7 | def _crop(image, boxes, labels, landm, img_dim): 8 | height, width, _ = image.shape 9 | pad_image_flag = True 10 | 11 | for _ in range(250): 12 | """ 13 | if random.uniform(0, 1) <= 0.2: 14 | scale = 1.0 15 | else: 16 | scale = random.uniform(0.3, 1.0) 17 | """ 18 | PRE_SCALES = [0.3, 0.45, 0.6, 0.8, 1.0] 19 | scale = random.choice(PRE_SCALES) 20 | short_side = min(width, height) 21 | w = int(scale * short_side) 22 | h = w 23 | 24 | if width == w: 25 | l = 0 26 | else: 27 | l = random.randrange(width - w) 28 | if height == h: 29 | t = 0 30 | else: 31 | t = random.randrange(height - h) 32 | roi = np.array((l, t, l + w, t + h)) 33 | 34 | value = matrix_iof(boxes, roi[np.newaxis]) 35 | flag = (value >= 1) 36 | if not flag.any(): 37 | continue 38 | 39 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2 40 | mask_a = np.logical_and(roi[:2] < centers, centers < roi[2:]).all(axis=1) 41 | boxes_t = boxes[mask_a].copy() 42 | labels_t = labels[mask_a].copy() 43 | landms_t = landm[mask_a].copy() 44 | landms_t = landms_t.reshape([-1, 5, 2]) 45 | 46 | if boxes_t.shape[0] == 0: 47 | continue 48 | 49 | image_t = image[roi[1]:roi[3], roi[0]:roi[2]] 50 | 51 | boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2]) 52 | boxes_t[:, :2] -= roi[:2] 53 | boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:]) 54 | boxes_t[:, 2:] -= roi[:2] 55 | 56 | # landm 57 | landms_t[:, :, :2] = landms_t[:, :, :2] - roi[:2] 58 | landms_t[:, :, :2] = np.maximum(landms_t[:, :, :2], np.array([0, 0])) 59 | landms_t[:, :, :2] = np.minimum(landms_t[:, :, :2], roi[2:] - roi[:2]) 60 | landms_t = landms_t.reshape([-1, 10]) 61 | 62 | 63 | # make sure that the cropped image contains at least one face > 16 pixel at training image scale 64 | b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim 65 | b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim 66 | mask_b = np.minimum(b_w_t, b_h_t) > 0.0 67 | boxes_t = boxes_t[mask_b] 68 | labels_t = labels_t[mask_b] 69 | landms_t = landms_t[mask_b] 70 | 71 | if boxes_t.shape[0] == 0: 72 | continue 73 | 74 | pad_image_flag = False 75 | 76 | return image_t, boxes_t, labels_t, landms_t, pad_image_flag 77 | return image, boxes, labels, landm, pad_image_flag 78 | 79 | 80 | def _distort(image): 81 | 82 | def _convert(image, alpha=1, beta=0): 83 | tmp = image.astype(float) * alpha + beta 84 | tmp[tmp < 0] = 0 85 | tmp[tmp > 255] = 255 86 | image[:] = tmp 87 | 88 | image = image.copy() 89 | 90 | if 
random.randrange(2): 91 | 92 | #brightness distortion 93 | if random.randrange(2): 94 | _convert(image, beta=random.uniform(-32, 32)) 95 | 96 | #contrast distortion 97 | if random.randrange(2): 98 | _convert(image, alpha=random.uniform(0.5, 1.5)) 99 | 100 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 101 | 102 | #saturation distortion 103 | if random.randrange(2): 104 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 105 | 106 | #hue distortion 107 | if random.randrange(2): 108 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 109 | tmp %= 180 110 | image[:, :, 0] = tmp 111 | 112 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 113 | 114 | else: 115 | 116 | #brightness distortion 117 | if random.randrange(2): 118 | _convert(image, beta=random.uniform(-32, 32)) 119 | 120 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 121 | 122 | #saturation distortion 123 | if random.randrange(2): 124 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 125 | 126 | #hue distortion 127 | if random.randrange(2): 128 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 129 | tmp %= 180 130 | image[:, :, 0] = tmp 131 | 132 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 133 | 134 | #contrast distortion 135 | if random.randrange(2): 136 | _convert(image, alpha=random.uniform(0.5, 1.5)) 137 | 138 | return image 139 | 140 | 141 | def _expand(image, boxes, fill, p): 142 | if random.randrange(2): 143 | return image, boxes 144 | 145 | height, width, depth = image.shape 146 | 147 | scale = random.uniform(1, p) 148 | w = int(scale * width) 149 | h = int(scale * height) 150 | 151 | left = random.randint(0, w - width) 152 | top = random.randint(0, h - height) 153 | 154 | boxes_t = boxes.copy() 155 | boxes_t[:, :2] += (left, top) 156 | boxes_t[:, 2:] += (left, top) 157 | expand_image = np.empty( 158 | (h, w, depth), 159 | dtype=image.dtype) 160 | expand_image[:, :] = fill 161 | expand_image[top:top + height, left:left + width] = image 162 | image = expand_image 163 | 164 | return image, boxes_t 165 | 166 | 167 | def _mirror(image, boxes, landms): 168 | _, width, _ = image.shape 169 | if random.randrange(2): 170 | image = image[:, ::-1] 171 | boxes = boxes.copy() 172 | boxes[:, 0::2] = width - boxes[:, 2::-2] 173 | 174 | # landm 175 | landms = landms.copy() 176 | landms = landms.reshape([-1, 5, 2]) 177 | landms[:, :, 0] = width - landms[:, :, 0] 178 | tmp = landms[:, 1, :].copy() 179 | landms[:, 1, :] = landms[:, 0, :] 180 | landms[:, 0, :] = tmp 181 | tmp1 = landms[:, 4, :].copy() 182 | landms[:, 4, :] = landms[:, 3, :] 183 | landms[:, 3, :] = tmp1 184 | landms = landms.reshape([-1, 10]) 185 | 186 | return image, boxes, landms 187 | 188 | 189 | def _pad_to_square(image, rgb_mean, pad_image_flag): 190 | if not pad_image_flag: 191 | return image 192 | height, width, _ = image.shape 193 | long_side = max(width, height) 194 | image_t = np.empty((long_side, long_side, 3), dtype=image.dtype) 195 | image_t[:, :] = rgb_mean 196 | image_t[0:0 + height, 0:0 + width] = image 197 | return image_t 198 | 199 | 200 | def _resize_subtract_mean(image, insize, rgb_mean): 201 | interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4] 202 | interp_method = interp_methods[random.randrange(5)] 203 | image = cv2.resize(image, (insize, insize), interpolation=interp_method) 204 | image = image.astype(np.float32) 205 | image -= rgb_mean 206 | return image.transpose(2, 0, 1) 207 | 208 | 209 | class preproc(object): 210 | 211 | def __init__(self, img_dim, 
rgb_means): 212 | self.img_dim = img_dim 213 | self.rgb_means = rgb_means 214 | 215 | def __call__(self, image, targets): 216 | assert targets.shape[0] > 0, "this image does not have gt" 217 | 218 | boxes = targets[:, :4].copy() 219 | labels = targets[:, -1].copy() 220 | landm = targets[:, 4:-1].copy() 221 | 222 | image_t, boxes_t, labels_t, landm_t, pad_image_flag = _crop(image, boxes, labels, landm, self.img_dim) 223 | image_t = _distort(image_t) 224 | image_t = _pad_to_square(image_t,self.rgb_means, pad_image_flag) 225 | image_t, boxes_t, landm_t = _mirror(image_t, boxes_t, landm_t) 226 | height, width, _ = image_t.shape 227 | image_t = _resize_subtract_mean(image_t, self.img_dim, self.rgb_means) 228 | boxes_t[:, 0::2] /= width 229 | boxes_t[:, 1::2] /= height 230 | 231 | landm_t[:, 0::2] /= width 232 | landm_t[:, 1::2] /= height 233 | 234 | labels_t = np.expand_dims(labels_t, 1) 235 | targets_t = np.hstack((boxes_t, landm_t, labels_t)) 236 | 237 | return image_t, targets_t 238 | -------------------------------------------------------------------------------- /data/wider_face.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path 3 | import sys 4 | import torch 5 | import torch.utils.data as data 6 | import cv2 7 | import numpy as np 8 | 9 | class WiderFaceDetection(data.Dataset): 10 | def __init__(self, txt_path, preproc=None): 11 | self.preproc = preproc 12 | self.imgs_path = [] 13 | self.words = [] 14 | f = open(txt_path,'r') 15 | lines = f.readlines() 16 | isFirst = True 17 | labels = [] 18 | for line in lines: 19 | line = line.rstrip() 20 | if line.startswith('#'): 21 | if isFirst is True: 22 | isFirst = False 23 | else: 24 | labels_copy = labels.copy() 25 | self.words.append(labels_copy) 26 | labels.clear() 27 | path = line[2:] 28 | path = txt_path.replace('label.txt','images/') + path 29 | self.imgs_path.append(path) 30 | else: 31 | line = line.split(' ') 32 | label = [float(x) for x in line] 33 | labels.append(label) 34 | 35 | self.words.append(labels) 36 | 37 | def __len__(self): 38 | return len(self.imgs_path) 39 | 40 | def __getitem__(self, index): 41 | img = cv2.imread(self.imgs_path[index]) 42 | height, width, _ = img.shape 43 | 44 | labels = self.words[index] 45 | annotations = np.zeros((0, 15)) 46 | if len(labels) == 0: 47 | return annotations 48 | for idx, label in enumerate(labels): 49 | annotation = np.zeros((1, 15)) 50 | # bbox 51 | annotation[0, 0] = label[0] # x1 52 | annotation[0, 1] = label[1] # y1 53 | annotation[0, 2] = label[0] + label[2] # x2 54 | annotation[0, 3] = label[1] + label[3] # y2 55 | 56 | # landmarks 57 | annotation[0, 4] = label[4] # l0_x 58 | annotation[0, 5] = label[5] # l0_y 59 | annotation[0, 6] = label[7] # l1_x 60 | annotation[0, 7] = label[8] # l1_y 61 | annotation[0, 8] = label[10] # l2_x 62 | annotation[0, 9] = label[11] # l2_y 63 | annotation[0, 10] = label[13] # l3_x 64 | annotation[0, 11] = label[14] # l3_y 65 | annotation[0, 12] = label[16] # l4_x 66 | annotation[0, 13] = label[17] # l4_y 67 | if (annotation[0, 4]<0): 68 | annotation[0, 14] = -1 69 | else: 70 | annotation[0, 14] = 1 71 | 72 | annotations = np.append(annotations, annotation, axis=0) 73 | target = np.array(annotations) 74 | if self.preproc is not None: 75 | img, target = self.preproc(img, target) 76 | 77 | return torch.from_numpy(img), target 78 | 79 | def detection_collate(batch): 80 | """Custom collate fn for dealing with batches of images that have a different 81 | number of associated object annotations 
(bounding boxes). 82 | 83 | Arguments: 84 | batch: (tuple) A tuple of tensor images and lists of annotations 85 | 86 | Return: 87 | A tuple containing: 88 | 1) (tensor) batch of images stacked on their 0 dim 89 | 2) (list of tensors) annotations for a given image are stacked on 0 dim 90 | """ 91 | targets = [] 92 | imgs = [] 93 | for _, sample in enumerate(batch): 94 | for _, tup in enumerate(sample): 95 | if torch.is_tensor(tup): 96 | imgs.append(tup) 97 | elif isinstance(tup, type(np.empty(0))): 98 | annos = torch.from_numpy(tup).float() 99 | targets.append(annos) 100 | 101 | return (torch.stack(imgs, 0), targets) 102 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import argparse 4 | import torch 5 | import torch.backends.cudnn as cudnn 6 | import numpy as np 7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3, cfg_efnetb0 8 | from layers.functions.prior_box import PriorBox 9 | from utils.nms.py_cpu_nms import py_cpu_nms 10 | import cv2 11 | from models.retinaface import RetinaFace 12 | from utils.box_utils import decode, decode_landm 13 | import time 14 | 15 | parser = argparse.ArgumentParser(description='Retinaface') 16 | 17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenetv2_0.1_Final.pth', 18 | type=str, help='Trained state_dict file path to open') 19 | parser.add_argument('--network', default='mobilenetv2', help='Backbone network mobile0.25 ,mobilenetv2 ,mobilenetv3 or efficientnetb0') 20 | parser.add_argument('--cpu', action="store_true", default=False, help='Use cpu inference') 21 | parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold') 22 | parser.add_argument('--top_k', default=5000, type=int, help='top_k') 23 | parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold') 24 | parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k') 25 | parser.add_argument('-s', '--save_image', action="store_true", default=True, help='show detection results') 26 | parser.add_argument('--vis_thres', default=0.6, type=float, help='visualization_threshold') 27 | args = parser.parse_args() 28 | 29 | 30 | def check_keys(model, pretrained_state_dict): 31 | ckpt_keys = set(pretrained_state_dict.keys()) 32 | model_keys = set(model.state_dict().keys()) 33 | used_pretrained_keys = model_keys & ckpt_keys 34 | unused_pretrained_keys = ckpt_keys - model_keys 35 | missing_keys = model_keys - ckpt_keys 36 | print('Missing keys:{}'.format(len(missing_keys))) 37 | print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys))) 38 | print('Used keys:{}'.format(len(used_pretrained_keys))) 39 | assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint' 40 | return True 41 | 42 | 43 | def remove_prefix(state_dict, prefix): 44 | ''' Old style model is stored with all names of parameters sharing common prefix 'module.' 
''' 45 | print('remove prefix \'{}\''.format(prefix)) 46 | f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x 47 | return {f(key): value for key, value in state_dict.items()} 48 | 49 | 50 | def load_model(model, pretrained_path, load_to_cpu): 51 | print('Loading pretrained model from {}'.format(pretrained_path)) 52 | if load_to_cpu: 53 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage) 54 | else: 55 | device = torch.cuda.current_device() 56 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device)) 57 | if "state_dict" in pretrained_dict.keys(): 58 | pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.') 59 | else: 60 | pretrained_dict = remove_prefix(pretrained_dict, 'module.') 61 | check_keys(model, pretrained_dict) 62 | model.load_state_dict(pretrained_dict, strict=False) 63 | return model 64 | 65 | 66 | if __name__ == '__main__': 67 | torch.set_grad_enabled(False) 68 | cfg = None 69 | if args.network == "mobile0.25": 70 | cfg = cfg_mnetv1 71 | elif args.network == "mobilenetv2": 72 | cfg = cfg_mnetv2 73 | elif args.network == "mobilenetv3": 74 | cfg = cfg_mnetv3 75 | elif args.network == "efficientnetb0": 76 | cfg = cfg_efnetb0 77 | # net and model 78 | net = RetinaFace(cfg=cfg, phase = 'test') 79 | net = load_model(net, args.trained_model, args.cpu) 80 | net.eval() 81 | print('Finished loading model!') 82 | print(net) 83 | cudnn.benchmark = True 84 | device = torch.device("cpu" if args.cpu else "cuda") 85 | net = net.to(device) 86 | 87 | resize = 1 88 | 89 | # testing begin 90 | for i in range(1): 91 | image_path = "./images/c2f3bc4499934b8ed942743ee8e3082a.jpg" 92 | img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR) 93 | img = np.float32(img_raw) 94 | 95 | im_height, im_width, _ = img.shape 96 | scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]]) 97 | img -= (104, 117, 123) 98 | img = img.transpose(2, 0, 1) 99 | img = torch.from_numpy(img).unsqueeze(0) 100 | img = img.to(device) 101 | scale = scale.to(device) 102 | 103 | tic = time.time() 104 | loc, conf, landms = net(img) # forward pass 105 | print('net forward time: {:.4f}'.format(time.time() - tic)) 106 | 107 | priorbox = PriorBox(cfg, image_size=(im_height, im_width)) 108 | priors = priorbox.forward() 109 | priors = priors.to(device) 110 | prior_data = priors.data 111 | boxes = decode(loc.data.squeeze(0), prior_data, cfg['variance']) 112 | boxes = boxes * scale / resize 113 | boxes = boxes.cpu().numpy() 114 | scores = conf.squeeze(0).data.cpu().numpy()[:, 1] 115 | landms = decode_landm(landms.data.squeeze(0), prior_data, cfg['variance']) 116 | scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2], 117 | img.shape[3], img.shape[2], img.shape[3], img.shape[2], 118 | img.shape[3], img.shape[2]]) 119 | scale1 = scale1.to(device) 120 | landms = landms * scale1 / resize 121 | landms = landms.cpu().numpy() 122 | 123 | # ignore low scores 124 | inds = np.where(scores > args.confidence_threshold)[0] 125 | boxes = boxes[inds] 126 | landms = landms[inds] 127 | scores = scores[inds] 128 | 129 | # keep top-K before NMS 130 | order = scores.argsort()[::-1][:args.top_k] 131 | boxes = boxes[order] 132 | landms = landms[order] 133 | scores = scores[order] 134 | 135 | # do NMS 136 | dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False) 137 | keep = py_cpu_nms(dets, args.nms_threshold) 138 | # keep = nms(dets, args.nms_threshold,force_cpu=args.cpu) 139 | 
dets = dets[keep, :] 140 | landms = landms[keep] 141 | 142 | # keep top-K faster NMS 143 | dets = dets[:args.keep_top_k, :] 144 | landms = landms[:args.keep_top_k, :] 145 | 146 | dets = np.concatenate((dets, landms), axis=1) 147 | 148 | # show image 149 | if args.save_image: 150 | for b in dets: 151 | if b[4] < args.vis_thres: 152 | continue 153 | text = "{:.4f}".format(b[4]) 154 | b = list(map(int, b)) 155 | cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 4) 156 | cx = b[0] 157 | cy = b[1] + 12 158 | cv2.putText(img_raw, text, (cx, cy), 159 | cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255)) 160 | 161 | # landms 162 | cv2.circle(img_raw, (b[5], b[6]), 2, (0, 0, 255), 4) 163 | cv2.circle(img_raw, (b[7], b[8]), 2, (0, 255, 255), 4) 164 | cv2.circle(img_raw, (b[9], b[10]), 2, (255, 0, 255), 4) 165 | cv2.circle(img_raw, (b[11], b[12]), 2, (0, 255, 0), 4) 166 | cv2.circle(img_raw, (b[13], b[14]), 2, (255, 0, 0), 4) 167 | # save image 168 | 169 | name = "mobilenetv22222.jpg" 170 | cv2.imwrite(name, img_raw) 171 | 172 | -------------------------------------------------------------------------------- /images/_-20_1008_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/_-20_1008_0.jpg -------------------------------------------------------------------------------- /images/_-20_144_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/_-20_144_2.jpg -------------------------------------------------------------------------------- /images/c2f3bc4499934b8ed942743ee8e3082a.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/c2f3bc4499934b8ed942743ee8e3082a.jpg -------------------------------------------------------------------------------- /images/mobilenetv22222.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/mobilenetv22222.jpg -------------------------------------------------------------------------------- /layers/__init__.py: -------------------------------------------------------------------------------- 1 | from .functions import * 2 | from .modules import * 3 | -------------------------------------------------------------------------------- /layers/functions/prior_box.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from itertools import product as product 3 | import numpy as np 4 | from math import ceil 5 | 6 | 7 | class PriorBox(object): 8 | def __init__(self, cfg, image_size=None, phase='train'): 9 | super(PriorBox, self).__init__() 10 | self.min_sizes = cfg['min_sizes'] 11 | self.steps = cfg['steps'] 12 | self.clip = cfg['clip'] 13 | self.image_size = image_size 14 | self.feature_maps = [[ceil(self.image_size[0]/step), ceil(self.image_size[1]/step)] for step in self.steps] 15 | self.name = "s" 16 | 17 | def forward(self): 18 | anchors = [] 19 | for k, f in enumerate(self.feature_maps): 20 | min_sizes = self.min_sizes[k] 21 | for i, j in product(range(f[0]), range(f[1])): 22 | for min_size in min_sizes: 23 | s_kx = min_size / 
self.image_size[1] 24 | s_ky = min_size / self.image_size[0] 25 | dense_cx = [x * self.steps[k] / self.image_size[1] for x in [j + 0.5]] 26 | dense_cy = [y * self.steps[k] / self.image_size[0] for y in [i + 0.5]] 27 | for cy, cx in product(dense_cy, dense_cx): 28 | anchors += [cx, cy, s_kx, s_ky] 29 | 30 | # back to torch land 31 | output = torch.Tensor(anchors).view(-1, 4) 32 | if self.clip: 33 | output.clamp_(max=1, min=0) 34 | return output 35 | -------------------------------------------------------------------------------- /layers/modules/__init__.py: -------------------------------------------------------------------------------- 1 | from .multibox_loss import MultiBoxLoss 2 | 3 | __all__ = ['MultiBoxLoss'] 4 | -------------------------------------------------------------------------------- /layers/modules/multibox_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch.autograd import Variable 5 | from utils.box_utils import match, log_sum_exp 6 | from data import cfg_mnetv1, cfg_mnetv3 7 | GPU = cfg_mnetv1['gpu_train'] 8 | 9 | class MultiBoxLoss(nn.Module): 10 | """SSD Weighted Loss Function 11 | Compute Targets: 12 | 1) Produce Confidence Target Indices by matching ground truth boxes 13 | with (default) 'priorboxes' that have jaccard index > threshold parameter 14 | (default threshold: 0.5). 15 | 2) Produce localization target by 'encoding' variance into offsets of ground 16 | truth boxes and their matched 'priorboxes'. 17 | 3) Hard negative mining to filter the excessive number of negative examples 18 | that comes with using a large number of default bounding boxes. 19 | (default negative:positive ratio 3:1) 20 | Objective Loss: 21 | L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N 22 | Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss 23 | weighted by α which is set to 1 by cross val. 24 | Args: 25 | c: class confidences, 26 | l: predicted boxes, 27 | g: ground truth boxes 28 | N: number of matched default boxes 29 | See: https://arxiv.org/pdf/1512.02325.pdf for more details. 30 | """ 31 | 32 | def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label, neg_mining, neg_pos, neg_overlap, encode_target): 33 | super(MultiBoxLoss, self).__init__() 34 | self.num_classes = num_classes 35 | self.threshold = overlap_thresh 36 | self.background_label = bkg_label 37 | self.encode_target = encode_target 38 | self.use_prior_for_matching = prior_for_matching 39 | self.do_neg_mining = neg_mining 40 | self.negpos_ratio = neg_pos 41 | self.neg_overlap = neg_overlap 42 | self.variance = [0.1, 0.2] 43 | 44 | def forward(self, predictions, priors, targets): 45 | """Multibox Loss 46 | Args: 47 | predictions (tuple): A tuple containing loc preds, conf preds, 48 | and prior boxes from SSD net. 49 | conf shape: torch.size(batch_size,num_priors,num_classes) 50 | loc shape: torch.size(batch_size,num_priors,4) 51 | priors shape: torch.size(num_priors,4) 52 | 53 | ground_truth (tensor): Ground truth boxes and labels for a batch, 54 | shape: [batch_size,num_objs,5] (last idx is the label). 
55 | """ 56 | 57 | loc_data, conf_data, landm_data = predictions 58 | priors = priors 59 | num = loc_data.size(0) 60 | num_priors = (priors.size(0)) 61 | 62 | # match priors (default boxes) and ground truth boxes 63 | loc_t = torch.Tensor(num, num_priors, 4) 64 | landm_t = torch.Tensor(num, num_priors, 10) 65 | conf_t = torch.LongTensor(num, num_priors) 66 | for idx in range(num): 67 | truths = targets[idx][:, :4].data 68 | labels = targets[idx][:, -1].data 69 | landms = targets[idx][:, 4:14].data 70 | defaults = priors.data 71 | match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx) 72 | if GPU: 73 | loc_t = loc_t.cuda() 74 | conf_t = conf_t.cuda() 75 | landm_t = landm_t.cuda() 76 | 77 | zeros = torch.tensor(0).cuda() 78 | # landm Loss (Smooth L1) 79 | # Shape: [batch,num_priors,10] 80 | pos1 = conf_t > zeros 81 | num_pos_landm = pos1.long().sum(1, keepdim=True) 82 | N1 = max(num_pos_landm.data.sum().float(), 1) 83 | pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data) 84 | landm_p = landm_data[pos_idx1].view(-1, 10) 85 | landm_t = landm_t[pos_idx1].view(-1, 10) 86 | loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum') 87 | 88 | 89 | pos = conf_t != zeros 90 | conf_t[pos] = 1 91 | 92 | # Localization Loss (Smooth L1) 93 | # Shape: [batch,num_priors,4] 94 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) 95 | loc_p = loc_data[pos_idx].view(-1, 4) 96 | loc_t = loc_t[pos_idx].view(-1, 4) 97 | loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum') 98 | 99 | # Compute max conf across batch for hard negative mining 100 | batch_conf = conf_data.view(-1, self.num_classes) 101 | loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1)) 102 | 103 | # Hard Negative Mining 104 | loss_c[pos.view(-1, 1)] = 0 # filter out pos boxes for now 105 | loss_c = loss_c.view(num, -1) 106 | _, loss_idx = loss_c.sort(1, descending=True) 107 | _, idx_rank = loss_idx.sort(1) 108 | num_pos = pos.long().sum(1, keepdim=True) 109 | num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1) 110 | neg = idx_rank < num_neg.expand_as(idx_rank) 111 | 112 | # Confidence Loss Including Positive and Negative Examples 113 | pos_idx = pos.unsqueeze(2).expand_as(conf_data) 114 | neg_idx = neg.unsqueeze(2).expand_as(conf_data) 115 | conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes) 116 | targets_weighted = conf_t[(pos+neg).gt(0)] 117 | loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum') 118 | 119 | # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N 120 | N = max(num_pos.data.sum().float(), 1) 121 | loss_l /= N 122 | loss_c /= N 123 | loss_landm /= N1 124 | 125 | return loss_l, loss_c, loss_landm 126 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/models/__init__.py -------------------------------------------------------------------------------- /models/net.py: -------------------------------------------------------------------------------- 1 | import time 2 | import torch 3 | import torch.nn as nn 4 | import torchvision.models._utils as _utils 5 | import torchvision.models as models 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | from torch.nn import init 9 | from .utils import ( 10 | round_filters, 11 | 
round_repeats, 12 | drop_connect, 13 | get_same_padding_conv2d, 14 | get_model_params, 15 | efficientnet_params, 16 | load_pretrained_weights, 17 | Swish, 18 | MemoryEfficientSwish, 19 | calculate_output_image_size 20 | ) 21 | 22 | def _make_divisible(v, divisor, min_value=None): 23 | """ 24 | This function is taken from the original tf repo. 25 | It ensures that all layers have a channel number that is divisible by 8 26 | It can be seen here: 27 | https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 28 | :param v: 29 | :param divisor: 30 | :param min_value: 31 | :return: 32 | """ 33 | if min_value is None: 34 | min_value = divisor 35 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 36 | # Make sure that round down does not go down by more than 10%. 37 | if new_v < 0.9 * v: 38 | new_v += divisor 39 | return new_v 40 | 41 | 42 | def conv_3x3_bn(inp, oup, stride): 43 | return nn.Sequential( 44 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 45 | nn.BatchNorm2d(oup), 46 | nn.ReLU6(inplace=True) 47 | ) 48 | 49 | 50 | def conv_1x1_bn(inp, oup): 51 | return nn.Sequential( 52 | nn.Conv2d(inp, oup, 1, 1, 0, bias=False), 53 | nn.BatchNorm2d(oup), 54 | nn.ReLU6(inplace=True) 55 | ) 56 | 57 | 58 | class InvertedResidual(nn.Module): 59 | def __init__(self, inp, oup, stride, expand_ratio): 60 | super(InvertedResidual, self).__init__() 61 | assert stride in [1, 2] 62 | 63 | hidden_dim = round(inp * expand_ratio) 64 | self.identity = stride == 1 and inp == oup 65 | 66 | if expand_ratio == 1: 67 | self.conv = nn.Sequential( 68 | # dw 69 | nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False), 70 | nn.BatchNorm2d(hidden_dim), 71 | nn.ReLU6(inplace=True), 72 | # pw-linear 73 | nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), 74 | nn.BatchNorm2d(oup), 75 | ) 76 | else: 77 | self.conv = nn.Sequential( 78 | # pw 79 | nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False), 80 | nn.BatchNorm2d(hidden_dim), 81 | nn.ReLU6(inplace=True), 82 | # dw 83 | nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False), 84 | nn.BatchNorm2d(hidden_dim), 85 | nn.ReLU6(inplace=True), 86 | # pw-linear 87 | nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), 88 | nn.BatchNorm2d(oup), 89 | ) 90 | 91 | def forward(self, x): 92 | if self.identity: 93 | return x + self.conv(x) 94 | else: 95 | return self.conv(x) 96 | 97 | class hswish(nn.Module): 98 | def forward(self, x): 99 | out = x * F.relu6(x + 3, inplace=True) / 6 100 | return out 101 | 102 | 103 | class hsigmoid(nn.Module): 104 | def forward(self, x): 105 | out = F.relu6(x + 3, inplace=True) / 6 106 | return out 107 | 108 | 109 | class SeModule(nn.Module): 110 | def __init__(self, in_size, reduction=4): 111 | super(SeModule, self).__init__() 112 | self.se = nn.Sequential( 113 | nn.AdaptiveAvgPool2d(1), 114 | nn.Conv2d(in_size, in_size // reduction, kernel_size=1, stride=1, padding=0, bias=False), 115 | nn.BatchNorm2d(in_size // reduction), 116 | nn.ReLU(inplace=True), 117 | nn.Conv2d(in_size // reduction, in_size, kernel_size=1, stride=1, padding=0, bias=False), 118 | nn.BatchNorm2d(in_size), 119 | hsigmoid() 120 | ) 121 | 122 | def forward(self, x): 123 | return x * self.se(x) 124 | 125 | 126 | class Block(nn.Module): 127 | '''expand + depthwise + pointwise''' 128 | def __init__(self, kernel_size, in_size, expand_size, out_size, nolinear, semodule, stride): 129 | super(Block, self).__init__() 130 | self.stride = stride 131 | self.se = semodule 132 | self.conv1 = nn.Conv2d(in_size, 
expand_size, kernel_size=1, stride=1, padding=0, bias=False) 133 | self.bn1 = nn.BatchNorm2d(expand_size) 134 | self.nolinear1 = nolinear 135 | self.conv2 = nn.Conv2d(expand_size, expand_size, kernel_size=kernel_size, stride=stride, padding=kernel_size//2, groups=expand_size, bias=False) 136 | self.bn2 = nn.BatchNorm2d(expand_size) 137 | self.nolinear2 = nolinear 138 | self.conv3 = nn.Conv2d(expand_size, out_size, kernel_size=1, stride=1, padding=0, bias=False) 139 | self.bn3 = nn.BatchNorm2d(out_size) 140 | 141 | self.shortcut = nn.Sequential() 142 | if stride == 1 and in_size != out_size: 143 | self.shortcut = nn.Sequential( 144 | nn.Conv2d(in_size, out_size, kernel_size=1, stride=1, padding=0, bias=False), 145 | nn.BatchNorm2d(out_size), 146 | ) 147 | 148 | def forward(self, x): 149 | out = self.nolinear1(self.bn1(self.conv1(x))) 150 | out = self.nolinear2(self.bn2(self.conv2(out))) 151 | out = self.bn3(self.conv3(out)) 152 | if self.se != None: 153 | out = self.se(out) 154 | out = out + self.shortcut(x) if self.stride==1 else out 155 | return out 156 | 157 | def conv_bn(inp, oup, stride = 1, leaky = 0): 158 | return nn.Sequential( 159 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 160 | nn.BatchNorm2d(oup), 161 | nn.LeakyReLU(negative_slope=leaky, inplace=True) 162 | ) 163 | 164 | def conv_bn_no_relu(inp, oup, stride): 165 | return nn.Sequential( 166 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 167 | nn.BatchNorm2d(oup), 168 | ) 169 | 170 | def conv_bn1X1(inp, oup, stride, leaky=0): 171 | return nn.Sequential( 172 | nn.Conv2d(inp, oup, 1, stride, padding=0, bias=False), 173 | nn.BatchNorm2d(oup), 174 | nn.LeakyReLU(negative_slope=leaky, inplace=True) 175 | ) 176 | 177 | def conv_dw(inp, oup, stride, leaky=0.1): 178 | return nn.Sequential( 179 | nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False), 180 | nn.BatchNorm2d(inp), 181 | nn.LeakyReLU(negative_slope= leaky,inplace=True), 182 | 183 | nn.Conv2d(inp, oup, 1, 1, 0, bias=False), 184 | nn.BatchNorm2d(oup), 185 | nn.LeakyReLU(negative_slope= leaky,inplace=True), 186 | ) 187 | 188 | class SSH(nn.Module): 189 | def __init__(self, in_channel, out_channel): 190 | super(SSH, self).__init__() 191 | assert out_channel % 4 == 0 192 | leaky = 0 193 | if (out_channel <= 64): 194 | leaky = 0.1 195 | self.conv3X3 = conv_bn_no_relu(in_channel, out_channel//2, stride=1) 196 | 197 | self.conv5X5_1 = conv_bn(in_channel, out_channel//4, stride=1, leaky = leaky) 198 | self.conv5X5_2 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1) 199 | 200 | self.conv7X7_2 = conv_bn(out_channel//4, out_channel//4, stride=1, leaky = leaky) 201 | self.conv7x7_3 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1) 202 | 203 | def forward(self, input): 204 | conv3X3 = self.conv3X3(input) 205 | 206 | conv5X5_1 = self.conv5X5_1(input) 207 | conv5X5 = self.conv5X5_2(conv5X5_1) 208 | 209 | conv7X7_2 = self.conv7X7_2(conv5X5_1) 210 | conv7X7 = self.conv7x7_3(conv7X7_2) 211 | 212 | out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1) 213 | out = F.relu(out) 214 | return out 215 | 216 | class FPN(nn.Module): 217 | def __init__(self,in_channels_list,out_channels): 218 | super(FPN,self).__init__() 219 | leaky = 0 220 | if (out_channels <= 64): 221 | leaky = 0.1 222 | # self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride = 1, leaky = leaky) 223 | self.output2 = conv_bn1X1(in_channels_list[0], out_channels, stride = 1, leaky = leaky) 224 | self.output3 = conv_bn1X1(in_channels_list[1], out_channels, stride = 1, leaky = leaky) 225 | 226 | 
self.merge1 = conv_bn(out_channels, out_channels, leaky = leaky) 227 | self.merge2 = conv_bn(out_channels, out_channels, leaky = leaky) 228 | 229 | def forward(self, input): 230 | # names = list(input.keys()) 231 | input = list(input.values()) 232 | 233 | # output1 = self.output1(input[0]) 234 | output2 = self.output2(input[0]) 235 | output3 = self.output3(input[1]) 236 | # print("output size 1 : {} 2 : {} 3 : {}".format(output1.size(),output2)) 237 | 238 | up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest") 239 | output2 = output2 + up3 240 | output2 = self.merge2(output2) 241 | 242 | # up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest") 243 | # output1 = output1 + up2 244 | # output1 = self.merge1(output1) 245 | 246 | out = [output2, output3] 247 | return out 248 | 249 | 250 | 251 | class MobileNetV1(nn.Module): 252 | def __init__(self): 253 | super(MobileNetV1, self).__init__() 254 | self.stage1 = nn.Sequential( 255 | conv_bn(3, 8, 2, leaky = 0.1), # 3 ##2 256 | conv_dw(8, 16, 1), # 7 257 | conv_dw(16, 32, 2), # 11 ## 4 258 | conv_dw(32, 32, 1), # 19 259 | conv_dw(32, 64, 2), # 27 ##8 260 | conv_dw(64, 64, 1), # 43 261 | ) 262 | self.stage2 = nn.Sequential( 263 | conv_dw(64, 128, 2), # 43 + 16 = 59 #16 264 | conv_dw(128, 128, 1), # 59 + 32 = 91 265 | conv_dw(128, 128, 1), # 91 + 32 = 123 266 | conv_dw(128, 128, 1), # 123 + 32 = 155 267 | conv_dw(128, 128, 1), # 155 + 32 = 187 268 | conv_dw(128, 128, 1), # 187 + 32 = 219 269 | ) 270 | self.stage3 = nn.Sequential( #32 271 | conv_dw(128, 256, 2), # 219 +3 2 = 241 272 | conv_dw(256, 256, 1), # 241 + 64 = 301 273 | ) 274 | self.avg = nn.AdaptiveAvgPool2d((1,1)) 275 | self.fc = nn.Linear(256, 1000) 276 | 277 | def forward(self, x): 278 | x = self.stage1(x) 279 | x = self.stage2(x) 280 | x = self.stage3(x) 281 | x = self.avg(x) 282 | # x = self.model(x) 283 | x = x.view(-1, 256) 284 | x = self.fc(x) 285 | return x 286 | 287 | class MobileNetV2(nn.Module): 288 | def __init__(self, num_classes=1000, width_mult=0.1): 289 | super(MobileNetV2, self).__init__() 290 | # setting of inverted residual blocks 291 | self.cfgs = [ 292 | # t, c, n, s 293 | [1, 16, 1, 1], 294 | [6, 24, 2, 2], 295 | [6, 32, 3, 2], 296 | [6, 64, 4, 2], 297 | [6, 96, 3, 1], 298 | [6, 160, 3, 2], 299 | [6, 320, 1, 1], 300 | ] 301 | 302 | # building first layer 303 | input_channel = _make_divisible(32 * width_mult, 4) 304 | layers = [conv_3x3_bn(3, input_channel, 2)] 305 | # building inverted residual blocks 306 | block = InvertedResidual 307 | for t, c, n, s in self.cfgs[:3]: 308 | output_channel = _make_divisible(c * width_mult, 4) 309 | for i in range(n): 310 | layers.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 311 | input_channel = output_channel 312 | self.stage1 = nn.Sequential(*layers) 313 | layers2 = [] 314 | for t, c, n, s in self.cfgs[3:5]: 315 | output_channel = _make_divisible(c * width_mult, 4) 316 | for i in range(n): 317 | layers2.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 318 | input_channel = output_channel 319 | self.stage2 = nn.Sequential(*layers2) 320 | layers3 = [] 321 | for t, c, n, s in self.cfgs[5:]: 322 | output_channel = _make_divisible(c * width_mult, 4) 323 | for i in range(n): 324 | layers3.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 325 | input_channel = output_channel 326 | # building last several layers 327 | output_channel = _make_divisible(1280 * width_mult, 4 ) if width_mult > 1.0 else 1280 328 | 
layers3.append(conv_1x1_bn(input_channel, output_channel)) 329 | self.stage3 = nn.Sequential(*layers3) 330 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 331 | self.classifier = nn.Linear(output_channel, num_classes) 332 | 333 | def forward(self, x): 334 | x = self.stage1(x) 335 | x = self.stage2(x) 336 | x = self.stage3(x) 337 | x = self.avgpool(x) 338 | x = x.view(x.size(0), -1) 339 | x = self.classifier(x) 340 | return x 341 | 342 | 343 | 344 | class MobileNetV3_Small(nn.Module): 345 | def __init__(self, num_classes=1000): 346 | super(MobileNetV3_Small, self).__init__() 347 | self.stage1 = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1, bias=False), 348 | nn.BatchNorm2d(16), 349 | hswish(), 350 | Block(3, 16, 16, 16, nn.ReLU(inplace=True), SeModule(16), 2), 351 | Block(3, 16, 72, 24, nn.ReLU(inplace=True), None, 2), 352 | Block(3, 24, 88, 24, nn.ReLU(inplace=True), None, 1)) 353 | self.stage2 = nn.Sequential(Block(5, 24, 96, 40, hswish(), SeModule(40), 2), 354 | Block(5, 40, 240, 40, hswish(), SeModule(40), 1), 355 | Block(5, 40, 240, 40, hswish(), SeModule(40), 1), 356 | Block(5, 40, 120, 48, hswish(), SeModule(48), 1), 357 | Block(5, 48, 144, 48, hswish(), SeModule(48), 1)) 358 | self.stage3 = nn.Sequential(Block(5, 48, 288, 96, hswish(), SeModule(96), 2), 359 | Block(5, 96, 576, 96, hswish(), SeModule(96), 1), 360 | Block(5, 96, 576, 96, hswish(), SeModule(96), 1), 361 | nn.Conv2d(96, 576, kernel_size=1, stride=1, padding=0, bias=False), 362 | nn.BatchNorm2d(576), 363 | hswish()) 364 | # self.conv2 = nn.Conv2d(96, 576, kernel_size=1, stride=1, padding=0, bias=False) 365 | # self.bn2 = nn.BatchNorm2d(576) 366 | # self.hs2 = hswish() 367 | self.linear3 = nn.Linear(576, 1280) 368 | self.bn3 = nn.BatchNorm1d(1280) 369 | self.hs3 = hswish() 370 | self.linear4 = nn.Linear(1280, num_classes) 371 | 372 | def forward(self, out): 373 | out = self.stage1(out) 374 | out = self.stage2(out) 375 | out = self.stage3(out) 376 | out = F.avg_pool2d(out, 7) 377 | out = out.view(out.size(0), -1) 378 | out = self.hs3(self.bn3(self.linear3(out))) 379 | out = self.linear4(out) 380 | return out 381 | 382 | class MBConvBlock(nn.Module): 383 | """Mobile Inverted Residual Bottleneck Block. 384 | 385 | Args: 386 | block_args (namedtuple): BlockArgs, defined in utils.py. 387 | global_params (namedtuple): GlobalParam, defined in utils.py. 388 | image_size (tuple or list): [image_height, image_width]. 
389 | 390 | References: 391 | [1] https://arxiv.org/abs/1704.04861 (MobileNet v1) 392 | [2] https://arxiv.org/abs/1801.04381 (MobileNet v2) 393 | [3] https://arxiv.org/abs/1905.02244 (MobileNet v3) 394 | """ 395 | 396 | def __init__(self, block_args, global_params, drop_connect_rate=None, image_size=None): 397 | super().__init__() 398 | self._block_args = block_args 399 | self._bn_mom = 1 - global_params.batch_norm_momentum # pytorch's difference from tensorflow 400 | self._bn_eps = global_params.batch_norm_epsilon 401 | self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1) 402 | self.id_skip = block_args.id_skip # whether to use skip connection and drop connect 403 | 404 | # Expansion phase (Inverted Bottleneck) 405 | inp = self._block_args.input_filters # number of input channels 406 | oup = self._block_args.input_filters * self._block_args.expand_ratio # number of output channels 407 | if self._block_args.expand_ratio != 1: 408 | Conv2d = get_same_padding_conv2d(image_size=image_size) 409 | self._expand_conv = Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False) 410 | self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 411 | # image_size = calculate_output_image_size(image_size, 1) <-- this wouldn't modify image_size 412 | 413 | # Depthwise convolution phase 414 | k = self._block_args.kernel_size 415 | s = self._block_args.stride 416 | Conv2d = get_same_padding_conv2d(image_size=image_size) 417 | self._depthwise_conv = Conv2d( 418 | in_channels=oup, out_channels=oup, groups=oup, # groups makes it depthwise 419 | kernel_size=k, stride=s, bias=False) 420 | self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 421 | image_size = calculate_output_image_size(image_size, s) 422 | 423 | # Squeeze and Excitation layer, if desired 424 | if self.has_se: 425 | Conv2d = get_same_padding_conv2d(image_size=(1,1)) 426 | num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio)) 427 | self._se_reduce = Conv2d(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1) 428 | self._se_expand = Conv2d(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1) 429 | 430 | # Pointwise convolution phase 431 | final_oup = self._block_args.output_filters 432 | Conv2d = get_same_padding_conv2d(image_size=image_size) 433 | self._project_conv = Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False) 434 | self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps) 435 | self._swish = MemoryEfficientSwish() 436 | self.drop_connect_rate = drop_connect_rate 437 | 438 | def forward(self, inputs): 439 | """MBConvBlock's forward function. 440 | 441 | Args: 442 | inputs (tensor): Input tensor. 443 | drop_connect_rate (bool): Drop connect rate (float, between 0 and 1). 444 | 445 | Returns: 446 | Output of this block after processing. 
447 | """ 448 | 449 | # Expansion and Depthwise Convolution 450 | x = inputs 451 | if self._block_args.expand_ratio != 1: 452 | x = self._expand_conv(inputs) 453 | x = self._bn0(x) 454 | x = self._swish(x) 455 | 456 | x = self._depthwise_conv(x) 457 | x = self._bn1(x) 458 | x = self._swish(x) 459 | 460 | # Squeeze and Excitation 461 | if self.has_se: 462 | x_squeezed = F.adaptive_avg_pool2d(x, 1) 463 | x_squeezed = self._se_reduce(x_squeezed) 464 | x_squeezed = self._swish(x_squeezed) 465 | x_squeezed = self._se_expand(x_squeezed) 466 | x = torch.sigmoid(x_squeezed) * x 467 | 468 | # Pointwise Convolution 469 | x = self._project_conv(x) 470 | x = self._bn2(x) 471 | 472 | # Skip connection and drop connect 473 | input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters 474 | if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters: 475 | # The combination of skip connection and drop connect brings about stochastic depth. 476 | if self.drop_connect_rate: 477 | x = drop_connect(x, p=self.drop_connect_rate, training=self.training) 478 | x = x + inputs # skip connection 479 | return x 480 | 481 | def set_swish(self, memory_efficient=True): 482 | """Sets swish function as memory efficient (for training) or standard (for export). 483 | 484 | Args: 485 | memory_efficient (bool): Whether to use memory-efficient version of swish. 486 | """ 487 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 488 | 489 | 490 | 491 | class EfficientNet(nn.Module): 492 | """EfficientNet model. 493 | Most easily loaded with the .from_name or .from_pretrained methods. 494 | 495 | Args: 496 | blocks_args (list[namedtuple]): A list of BlockArgs to construct blocks. 497 | global_params (namedtuple): A set of GlobalParams shared between blocks. 498 | 499 | References: 500 | [1] https://arxiv.org/abs/1905.11946 (EfficientNet) 501 | """ 502 | 503 | def __init__(self, blocks_args=None, global_params=None): 504 | super().__init__() 505 | assert isinstance(blocks_args, list), 'blocks_args should be a list' 506 | assert len(blocks_args) > 0, 'block args must be greater than 0' 507 | self._global_params = global_params 508 | self._blocks_args = blocks_args 509 | 510 | # Batch norm parameters 511 | bn_mom = 1 - self._global_params.batch_norm_momentum 512 | bn_eps = self._global_params.batch_norm_epsilon 513 | 514 | # Get stem static or dynamic convolution depending on image size 515 | image_size = global_params.image_size 516 | Conv2d = get_same_padding_conv2d(image_size=image_size) 517 | 518 | # Stem 519 | in_channels = 3 # rgb 520 | out_channels = round_filters(32, self._global_params) # number of output channels 521 | image_size = calculate_output_image_size(image_size, 2) 522 | # Build blocks 523 | stage1 = [Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False), 524 | nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps), 525 | MemoryEfficientSwish()] 526 | stage2 = [] 527 | stage3 = [] 528 | print("self._global_params.drop_connect_rate:",self._global_params.drop_connect_rate) 529 | for idx, block_args in enumerate(self._blocks_args[:3]): 530 | # Update block input and output filters based on depth multiplier. 
531 | block_args = block_args._replace( 532 | input_filters=round_filters(block_args.input_filters, self._global_params), 533 | output_filters=round_filters(block_args.output_filters, self._global_params), 534 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 535 | ) 536 | drop_connect_rate = self._global_params.drop_connect_rate 537 | if drop_connect_rate: 538 | drop_connect_rate *= float(idx) / len(self._blocks_args) # scale drop connect_rate 539 | 540 | # The first block needs to take care of stride and filter size increase. 541 | stage1.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 542 | image_size = calculate_output_image_size(image_size, block_args.stride) 543 | if block_args.num_repeat > 1: # modify block_args to keep same output size 544 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 545 | for _ in range(block_args.num_repeat - 1): 546 | stage1.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate,image_size=image_size)) 547 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 548 | ########################### stage 2 ############################ 549 | for idx, block_args in enumerate(self._blocks_args[3:5]): 550 | 551 | # Update block input and output filters based on depth multiplier. 552 | block_args = block_args._replace( 553 | input_filters=round_filters(block_args.input_filters, self._global_params), 554 | output_filters=round_filters(block_args.output_filters, self._global_params), 555 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 556 | ) 557 | drop_connect_rate = self._global_params.drop_connect_rate 558 | if drop_connect_rate: 559 | drop_connect_rate *= float(idx+3) / len(self._blocks_args) # scale drop connect_rate 560 | # The first block needs to take care of stride and filter size increase. 561 | stage2.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 562 | image_size = calculate_output_image_size(image_size, block_args.stride) 563 | if block_args.num_repeat > 1: # modify block_args to keep same output size 564 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 565 | for _ in range(block_args.num_repeat - 1): 566 | stage2.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 567 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 568 | #############################stage 3 ########################### 569 | for idx, block_args in enumerate(self._blocks_args[5:]): 570 | 571 | # Update block input and output filters based on depth multiplier. 572 | block_args = block_args._replace( 573 | input_filters=round_filters(block_args.input_filters, self._global_params), 574 | output_filters=round_filters(block_args.output_filters, self._global_params), 575 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 576 | ) 577 | drop_connect_rate = self._global_params.drop_connect_rate 578 | if drop_connect_rate: 579 | drop_connect_rate *= float(idx + 5) / len(self._blocks_args) # scale drop connect_rate 580 | # The first block needs to take care of stride and filter size increase. 
581 | stage3.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate,image_size=image_size)) 582 | image_size = calculate_output_image_size(image_size, block_args.stride) 583 | if block_args.num_repeat > 1: # modify block_args to keep same output size 584 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 585 | for _ in range(block_args.num_repeat - 1): 586 | stage3.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 587 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 588 | 589 | # Head 590 | in_channels = block_args.output_filters # output of final block 591 | out_channels = round_filters(1280, self._global_params) 592 | Conv2d = get_same_padding_conv2d(image_size=image_size) 593 | stage3.extend([Conv2d(in_channels, out_channels, kernel_size=1, bias=False), 594 | nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps), 595 | MemoryEfficientSwish()]) 596 | self.stage1 = nn.Sequential(*stage1) 597 | self.stage2 = nn.Sequential(*stage2) 598 | self.stage3 = nn.Sequential(*stage3) 599 | 600 | # Final linear layer 601 | self._avg_pooling = nn.AdaptiveAvgPool2d(1) 602 | self._dropout = nn.Dropout(self._global_params.dropout_rate) 603 | self._fc = nn.Linear(out_channels, self._global_params.num_classes) 604 | 605 | def set_swish(self, memory_efficient=True): 606 | """Sets swish function as memory efficient (for training) or standard (for export). 607 | 608 | Args: 609 | memory_efficient (bool): Whether to use memory-efficient version of swish. 610 | 611 | """ 612 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 613 | for block in self._blocks: 614 | block.set_swish(memory_efficient) 615 | 616 | def forward(self, inputs): 617 | # Convolution layers 618 | print("input size:",inputs.size()) 619 | x = self.stage1(inputs) 620 | print("out size : ",x.size()) 621 | x = self.stage2(x) 622 | x = self.stage3(x) 623 | # Pooling and final linear layer 624 | x = self._avg_pooling(x) 625 | x = x.flatten(start_dim=1) 626 | x = self._dropout(x) 627 | x = self._fc(x) 628 | 629 | return x 630 | 631 | @classmethod 632 | def from_name(cls, model_name='efficientnet-b0', in_channels=3, **override_params): 633 | blocks_args, global_params = get_model_params(model_name, override_params) 634 | model = cls(blocks_args, global_params) 635 | return model 636 | 637 | @classmethod 638 | def get_image_size(cls, model_name): 639 | """Get the input image size for a given efficientnet model. 640 | 641 | Args: 642 | model_name (str): Name for efficientnet. 643 | 644 | Returns: 645 | Input image size (resolution). 
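As a quick sanity check of the `stage1/stage2/stage3` split built above (the same split `models/retinaface.py` taps through `IntermediateLayerGetter`), the backbone can be built by name and run stage by stage. A minimal sketch, assuming the repository root is on `PYTHONPATH`; the input resolution is arbitrary.

```python
import torch
from models.net import EfficientNet

backbone = EfficientNet.from_name('efficientnet-b0')
backbone.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    s1 = backbone.stage1(x)    # stem + first three block groups
    s2 = backbone.stage2(s1)   # next two block groups
    s3 = backbone.stage3(s2)   # remaining blocks + 1x1 head conv
print(s1.shape, s2.shape, s3.shape)   # progressively coarser feature maps
```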
646 | """ 647 | cls._check_model_name_is_valid(model_name) 648 | _, _, res, _ = efficientnet_params(model_name) 649 | return res 650 | 651 | 652 | 653 | 654 | -------------------------------------------------------------------------------- /models/retinaface.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision.models.detection.backbone_utils as backbone_utils 4 | import torchvision.models._utils as _utils 5 | import torch.nn.functional as F 6 | from collections import OrderedDict 7 | 8 | from models.net import MobileNetV1 as MobileNetV1 9 | from models.net import MobileNetV3_Small as MobileNetV3 10 | from models.net import MobileNetV2 as MobileNetV2 11 | from models.net import EfficientNet as EfficientNet 12 | from models.net import FPN as FPN 13 | from models.net import SSH as SSH 14 | 15 | 16 | 17 | class ClassHead(nn.Module): 18 | def __init__(self,inchannels=512,num_anchors=3): 19 | super(ClassHead,self).__init__() 20 | self.num_anchors = num_anchors 21 | self.conv1x1 = nn.Conv2d(inchannels,self.num_anchors*2,kernel_size=(1,1),stride=1,padding=0) 22 | 23 | def forward(self,x): 24 | out = self.conv1x1(x) 25 | out = out.permute(0,2,3,1).contiguous() 26 | 27 | return out.view(out.shape[0], -1, 2) 28 | 29 | class BboxHead(nn.Module): 30 | def __init__(self,inchannels=512,num_anchors=3): 31 | super(BboxHead,self).__init__() 32 | self.conv1x1 = nn.Conv2d(inchannels,num_anchors*4,kernel_size=(1,1),stride=1,padding=0) 33 | 34 | def forward(self,x): 35 | out = self.conv1x1(x) 36 | out = out.permute(0,2,3,1).contiguous() 37 | 38 | return out.view(out.shape[0], -1, 4) 39 | 40 | class LandmarkHead(nn.Module): 41 | def __init__(self,inchannels=512,num_anchors=3): 42 | super(LandmarkHead,self).__init__() 43 | self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0) 44 | 45 | def forward(self,x): 46 | out = self.conv1x1(x) 47 | out = out.permute(0,2,3,1).contiguous() 48 | 49 | return out.view(out.shape[0], -1, 10) 50 | 51 | class RetinaFace(nn.Module): 52 | def __init__(self, cfg = None, phase = 'train'): 53 | """ 54 | :param cfg: Network related settings. 55 | :param phase: train or test. 56 | """ 57 | super(RetinaFace,self).__init__() 58 | self.phase = phase 59 | backbone = None 60 | if cfg['name'] == 'mobilenet0.25': 61 | backbone = MobileNetV1() 62 | if cfg['pretrain']: 63 | checkpoint = torch.load("./weights/mobilenetV1X0.25_pretrain.tar", map_location=torch.device('cpu')) 64 | from collections import OrderedDict 65 | new_state_dict = OrderedDict() 66 | for k, v in checkpoint['state_dict'].items(): 67 | name = k[7:] # remove module. 
68 | new_state_dict[name] = v 69 | # load params 70 | backbone.load_state_dict(new_state_dict) 71 | elif cfg['name'] == 'Resnet50': 72 | import torchvision.models as models 73 | backbone = models.resnet50(pretrained=cfg['pretrain']) 74 | elif cfg['name'] == 'mobilenetv3': 75 | backbone = MobileNetV3() 76 | # print(backbone) 77 | if cfg['pretrain']: 78 | checkpoint = torch.load("./weights/mobilenetv3.pth", map_location=torch.device('cpu')) 79 | print("Pretrained Weights : ",type(checkpoint)) 80 | backbone.load_state_dict(checkpoint) 81 | elif cfg['name'] == 'mobilenetv2_0.1': 82 | backbone = MobileNetV2() 83 | if cfg['pretrain']: 84 | checkpoint = torch.load("./weights/mobilenetv2_0.1_face.pth", map_location=torch.device('cpu')) 85 | backbone.load_state_dict(checkpoint) 86 | elif cfg['name'] == 'efficientnetb0': 87 | backbone = EfficientNet.from_name("efficientnet-b0") 88 | if cfg['pretrain']: 89 | checkpoint = torch.load("./weights/efficientnetb0_face.pth", map_location=torch.device('cpu')) 90 | backbone.load_state_dict(checkpoint) 91 | print("succeed loaded weights...") 92 | self.body = _utils.IntermediateLayerGetter(backbone, cfg['return_layers']) 93 | if cfg['name'] == 'mobilenet0.25': 94 | in_channels_stage2 = cfg['in_channel'] 95 | in_channels_list = [ 96 | # in_channels_stage2 * 2, 97 | in_channels_stage2*4, 98 | in_channels_stage2*8, 99 | ] 100 | elif cfg['name'] == 'mobilenetv2_0.1': 101 | in_channels_stage2 = cfg['in_channel1'] 102 | in_channels_stage3 = cfg['in_channel2'] 103 | in_channels_list = [ 104 | # in_channels_stage2 * 2, 105 | in_channels_stage2, 106 | in_channels_stage3, 107 | ] 108 | elif cfg['name'] == 'mobilenetv3': 109 | in_channels_stage2 = cfg['in_channel1'] 110 | in_channels_stage3 = cfg['in_channel2'] 111 | in_channels_list = [ 112 | # in_channels_stage2 * 2, 113 | in_channels_stage2, 114 | in_channels_stage3, 115 | ] 116 | elif cfg['name'] == 'efficientnetb0': 117 | in_channels_stage2 = cfg['in_channel1'] 118 | in_channels_stage3 = cfg['in_channel2'] 119 | in_channels_list = [ 120 | # in_channels_stage2 * 2, 121 | in_channels_stage2, 122 | in_channels_stage3, 123 | ] 124 | out_channels = cfg['out_channel'] 125 | self.fpn = FPN(in_channels_list,out_channels) 126 | # self.ssh1 = SSH(out_channels, out_channels) 127 | self.ssh2 = SSH(out_channels, out_channels) 128 | self.ssh3 = SSH(out_channels, out_channels) 129 | 130 | self.ClassHead = self._make_class_head(fpn_num=2, inchannels=cfg['out_channel']) 131 | self.BboxHead = self._make_bbox_head(fpn_num=2, inchannels=cfg['out_channel']) 132 | self.LandmarkHead = self._make_landmark_head(fpn_num=2, inchannels=cfg['out_channel']) 133 | 134 | def _make_class_head(self,fpn_num=2,inchannels=64,anchor_num=2): 135 | classhead = nn.ModuleList() 136 | for i in range(fpn_num): 137 | classhead.append(ClassHead(inchannels,anchor_num)) 138 | return classhead 139 | 140 | def _make_bbox_head(self,fpn_num=2,inchannels=64,anchor_num=2): 141 | bboxhead = nn.ModuleList() 142 | for i in range(fpn_num): 143 | bboxhead.append(BboxHead(inchannels,anchor_num)) 144 | return bboxhead 145 | 146 | def _make_landmark_head(self,fpn_num=2,inchannels=64,anchor_num=2): 147 | landmarkhead = nn.ModuleList() 148 | for i in range(fpn_num): 149 | landmarkhead.append(LandmarkHead(inchannels,anchor_num)) 150 | return landmarkhead 151 | 152 | def forward(self,inputs): 153 | out = self.body(inputs) 154 | # print("out size : ",out.size()) 155 | # FPN 156 | fpn = self.fpn(out) 157 | 158 | # SSH 159 | # feature1 = self.ssh1(fpn[0]) 160 | feature2 = 
self.ssh2(fpn[0]) 161 | feature3 = self.ssh3(fpn[1]) 162 | # print("Feature1 size {} 2 size : {}".format(feature2.size(),feature3.size())) 163 | # features = [feature1, feature2, feature3] 164 | features = [feature2, feature3] 165 | 166 | bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=1) 167 | classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)],dim=1) 168 | ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=1) 169 | 170 | if self.phase == 'train': 171 | output = (bbox_regressions, classifications, ldm_regressions) 172 | else: 173 | output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions) 174 | return output 175 | 176 | -------------------------------------------------------------------------------- /models/utils.py: -------------------------------------------------------------------------------- 1 | """utils.py - Helper functions for building the model and for loading model parameters. 2 | These helper functions are built to mirror those in the official TensorFlow implementation. 3 | """ 4 | 5 | # Author: lukemelas (github username) 6 | # Github repo: https://github.com/lukemelas/EfficientNet-PyTorch 7 | # With adjustments and added comments by workingcoder (github username). 8 | 9 | import re 10 | import math 11 | import collections 12 | from functools import partial 13 | import torch 14 | from torch import nn 15 | from torch.nn import functional as F 16 | from torch.utils import model_zoo 17 | 18 | 19 | ################################################################################ 20 | ### Help functions for model architecture 21 | ################################################################################ 22 | 23 | # GlobalParams and BlockArgs: Two namedtuples 24 | # Swish and MemoryEfficientSwish: Two implementations of the method 25 | # round_filters and round_repeats: 26 | # Functions to calculate params for scaling model width and depth ! ! ! 27 | # get_width_and_height_from_size and calculate_output_image_size 28 | # drop_connect: A structural design 29 | # get_same_padding_conv2d: 30 | # Conv2dDynamicSamePadding 31 | # Conv2dStaticSamePadding 32 | # get_same_padding_maxPool2d: 33 | # MaxPool2dDynamicSamePadding 34 | # MaxPool2dStaticSamePadding 35 | # It's an additional function, not used in EfficientNet, 36 | # but can be used in other model (such as EfficientDet). 
37 | # Identity: An implementation of identical mapping 38 | 39 | # Parameters for the entire model (stem, all blocks, and head) 40 | GlobalParams = collections.namedtuple('GlobalParams', [ 41 | 'width_coefficient', 'depth_coefficient', 'image_size', 'dropout_rate', 42 | 'num_classes', 'batch_norm_momentum', 'batch_norm_epsilon', 43 | 'drop_connect_rate', 'depth_divisor', 'min_depth']) 44 | 45 | # Parameters for an individual model block 46 | BlockArgs = collections.namedtuple('BlockArgs', [ 47 | 'num_repeat', 'kernel_size', 'stride', 'expand_ratio', 48 | 'input_filters', 'output_filters', 'se_ratio', 'id_skip']) 49 | 50 | # Set GlobalParams and BlockArgs's defaults 51 | GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields) 52 | BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields) 53 | 54 | 55 | # An ordinary implementation of Swish function 56 | class Swish(nn.Module): 57 | def forward(self, x): 58 | return x * torch.sigmoid(x) 59 | 60 | 61 | # A memory-efficient implementation of Swish function 62 | class SwishImplementation(torch.autograd.Function): 63 | @staticmethod 64 | def forward(ctx, i): 65 | result = i * torch.sigmoid(i) 66 | ctx.save_for_backward(i) 67 | return result 68 | 69 | @staticmethod 70 | def backward(ctx, grad_output): 71 | i = ctx.saved_tensors[0] 72 | sigmoid_i = torch.sigmoid(i) 73 | return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i))) 74 | 75 | class MemoryEfficientSwish(nn.Module): 76 | def forward(self, x): 77 | return SwishImplementation.apply(x) 78 | 79 | 80 | def round_filters(filters, global_params): 81 | """Calculate and round number of filters based on width multiplier. 82 | Use width_coefficient, depth_divisor and min_depth of global_params. 83 | 84 | Args: 85 | filters (int): Filters number to be calculated. 86 | global_params (namedtuple): Global params of the model. 87 | 88 | Returns: 89 | new_filters: New filters number after calculating. 90 | """ 91 | multiplier = global_params.width_coefficient 92 | if not multiplier: 93 | return filters 94 | # TODO: modify the params names. 95 | # maybe the names (width_divisor,min_width) 96 | # are more suitable than (depth_divisor,min_depth). 97 | divisor = global_params.depth_divisor 98 | min_depth = global_params.min_depth 99 | filters *= multiplier 100 | min_depth = min_depth or divisor # pay attention to this line when using min_depth 101 | # follow the formula transferred from official TensorFlow implementation 102 | new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) 103 | if new_filters < 0.9 * filters: # prevent rounding by more than 10% 104 | new_filters += divisor 105 | return int(new_filters) 106 | 107 | 108 | def round_repeats(repeats, global_params): 109 | """Calculate module's repeat number of a block based on depth multiplier. 110 | Use depth_coefficient of global_params. 111 | 112 | Args: 113 | repeats (int): num_repeat to be calculated. 114 | global_params (namedtuple): Global params of the model. 115 | 116 | Returns: 117 | new repeat: New repeat number after calculating. 118 | """ 119 | multiplier = global_params.depth_coefficient 120 | if not multiplier: 121 | return repeats 122 | # follow the formula transferred from official TensorFlow implementation 123 | return int(math.ceil(multiplier * repeats)) 124 | 125 | 126 | def drop_connect(inputs, p, training): 127 | """Drop connect. 128 | 129 | Args: 130 | input (tensor: BCWH): Input of this structure. 131 | p (float: 0.0~1.0): Probability of drop connection. 
132 | training (bool): The running mode. 133 | 134 | Returns: 135 | output: Output after drop connection. 136 | """ 137 | assert p >= 0 and p <= 1, 'p must be in range of [0,1]' 138 | 139 | if not training: 140 | return inputs 141 | 142 | batch_size = inputs.shape[0] 143 | keep_prob = 1 - p 144 | 145 | # generate binary_tensor mask according to probability (p for 0, 1-p for 1) 146 | random_tensor = keep_prob 147 | random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device) 148 | binary_tensor = torch.floor(random_tensor) 149 | 150 | output = inputs / keep_prob * binary_tensor 151 | return output 152 | 153 | 154 | def get_width_and_height_from_size(x): 155 | """Obtain height and width from x. 156 | 157 | Args: 158 | x (int, tuple or list): Data size. 159 | 160 | Returns: 161 | size: A tuple or list (H,W). 162 | """ 163 | if isinstance(x, int): 164 | return x, x 165 | if isinstance(x, list) or isinstance(x, tuple): 166 | return x 167 | else: 168 | raise TypeError() 169 | 170 | 171 | def calculate_output_image_size(input_image_size, stride): 172 | """Calculates the output image size when using Conv2dSamePadding with a stride. 173 | Necessary for static padding. Thanks to mannatsingh for pointing this out. 174 | 175 | Args: 176 | input_image_size (int, tuple or list): Size of input image. 177 | stride (int, tuple or list): Conv2d operation's stride. 178 | 179 | Returns: 180 | output_image_size: A list [H,W]. 181 | """ 182 | if input_image_size is None: 183 | return None 184 | image_height, image_width = get_width_and_height_from_size(input_image_size) 185 | stride = stride if isinstance(stride, int) else stride[0] 186 | image_height = int(math.ceil(image_height / stride)) 187 | image_width = int(math.ceil(image_width / stride)) 188 | return [image_height, image_width] 189 | 190 | 191 | # Note: 192 | # The following 'SamePadding' functions make output size equal ceil(input size/stride). 193 | # Only when stride equals 1, can the output size be the same as input size. 194 | # Don't be confused by their function names ! ! ! 195 | 196 | def get_same_padding_conv2d(image_size=None): 197 | """Chooses static padding if you have specified an image size, and dynamic padding otherwise. 198 | Static padding is necessary for ONNX exporting of models. 199 | 200 | Args: 201 | image_size (int or tuple): Size of the image. 202 | 203 | Returns: 204 | Conv2dDynamicSamePadding or Conv2dStaticSamePadding. 205 | """ 206 | if image_size is None: 207 | return Conv2dDynamicSamePadding 208 | else: 209 | return partial(Conv2dStaticSamePadding, image_size=image_size) 210 | 211 | 212 | class Conv2dDynamicSamePadding(nn.Conv2d): 213 | """2D Convolutions like TensorFlow, for a dynamic image size. 214 | The padding is operated in forward function by calculating dynamically. 215 | """ 216 | 217 | # Tips for 'SAME' mode padding. 
218 | # Given the following: 219 | # i: width or height 220 | # s: stride 221 | # k: kernel size 222 | # d: dilation 223 | # p: padding 224 | # Output after Conv2d: 225 | # o = floor((i+p-((k-1)*d+1))/s+1) 226 | # If o equals i, i = floor((i+p-((k-1)*d+1))/s+1), 227 | # => p = (i-1)*s+((k-1)*d+1)-i 228 | 229 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True): 230 | super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias) 231 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 232 | 233 | def forward(self, x): 234 | ih, iw = x.size()[-2:] 235 | kh, kw = self.weight.size()[-2:] 236 | sh, sw = self.stride 237 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) # change the output size according to stride ! ! ! 238 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 239 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 240 | if pad_h > 0 or pad_w > 0: 241 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 242 | return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 243 | 244 | 245 | class Conv2dStaticSamePadding(nn.Conv2d): 246 | """2D Convolutions like TensorFlow's 'SAME' mode, with the given input image size. 247 | The padding mudule is calculated in construction function, then used in forward. 248 | """ 249 | 250 | # With the same calculation as Conv2dDynamicSamePadding 251 | 252 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, image_size=None, **kwargs): 253 | super().__init__(in_channels, out_channels, kernel_size, stride, **kwargs) 254 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 255 | 256 | # Calculate padding based on image size and save it 257 | assert image_size is not None 258 | ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size 259 | kh, kw = self.weight.size()[-2:] 260 | sh, sw = self.stride 261 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 262 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 263 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 264 | if pad_h > 0 or pad_w > 0: 265 | self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)) 266 | else: 267 | self.static_padding = Identity() 268 | 269 | def forward(self, x): 270 | x = self.static_padding(x) 271 | x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 272 | return x 273 | 274 | 275 | def get_same_padding_maxPool2d(image_size=None): 276 | """Chooses static padding if you have specified an image size, and dynamic padding otherwise. 277 | Static padding is necessary for ONNX exporting of models. 278 | 279 | Args: 280 | image_size (int or tuple): Size of the image. 281 | 282 | Returns: 283 | MaxPool2dDynamicSamePadding or MaxPool2dStaticSamePadding. 284 | """ 285 | if image_size is None: 286 | return MaxPool2dDynamicSamePadding 287 | else: 288 | return partial(MaxPool2dStaticSamePadding, image_size=image_size) 289 | 290 | 291 | class MaxPool2dDynamicSamePadding(nn.MaxPool2d): 292 | """2D MaxPooling like TensorFlow's 'SAME' mode, with a dynamic image size. 293 | The padding is operated in forward function by calculating dynamically. 
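The 'SAME' padding arithmetic above can be verified numerically. A minimal sketch, assuming `models/utils.py` is importable; the channel count and input size are arbitrary.

```python
import torch
from models.utils import Conv2dDynamicSamePadding

# A 3x3, stride-2 conv on a 112x112 map should give ceil(112/2) = 56 per side:
# pad = (o-1)*s + (k-1)*d + 1 - i = 55*2 + 2 + 1 - 112 = 1 (split as 0 left / 1 right).
conv = Conv2dDynamicSamePadding(8, 8, kernel_size=3, stride=2, bias=False)
y = conv(torch.randn(1, 8, 112, 112))
print(y.shape)   # torch.Size([1, 8, 56, 56])
```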
294 | """ 295 | 296 | def __init__(self, kernel_size, stride, padding=0, dilation=1, return_indices=False, ceil_mode=False): 297 | super().__init__(kernel_size, stride, padding, dilation, return_indices, ceil_mode) 298 | self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride 299 | self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size 300 | self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation 301 | 302 | def forward(self, x): 303 | ih, iw = x.size()[-2:] 304 | kh, kw = self.kernel_size 305 | sh, sw = self.stride 306 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 307 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 308 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 309 | if pad_h > 0 or pad_w > 0: 310 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 311 | return F.max_pool2d(x, self.kernel_size, self.stride, self.padding, 312 | self.dilation, self.ceil_mode, self.return_indices) 313 | 314 | class MaxPool2dStaticSamePadding(nn.MaxPool2d): 315 | """2D MaxPooling like TensorFlow's 'SAME' mode, with the given input image size. 316 | The padding mudule is calculated in construction function, then used in forward. 317 | """ 318 | 319 | def __init__(self, kernel_size, stride, image_size=None, **kwargs): 320 | super().__init__(kernel_size, stride, **kwargs) 321 | self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride 322 | self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size 323 | self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation 324 | 325 | # Calculate padding based on image size and save it 326 | assert image_size is not None 327 | ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size 328 | kh, kw = self.kernel_size 329 | sh, sw = self.stride 330 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 331 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 332 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 333 | if pad_h > 0 or pad_w > 0: 334 | self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)) 335 | else: 336 | self.static_padding = Identity() 337 | 338 | def forward(self, x): 339 | x = self.static_padding(x) 340 | x = F.max_pool2d(x, self.kernel_size, self.stride, self.padding, 341 | self.dilation, self.ceil_mode, self.return_indices) 342 | return x 343 | 344 | class Identity(nn.Module): 345 | """Identity mapping. 346 | Send input to output directly. 
347 | """ 348 | 349 | def __init__(self): 350 | super(Identity, self).__init__() 351 | 352 | def forward(self, input): 353 | return input 354 | 355 | 356 | ################################################################################ 357 | ### Helper functions for loading model params 358 | ################################################################################ 359 | 360 | # BlockDecoder: A Class for encoding and decoding BlockArgs 361 | # efficientnet_params: A function to query compound coefficient 362 | # get_model_params and efficientnet: 363 | # Functions to get BlockArgs and GlobalParams for efficientnet 364 | # url_map and url_map_advprop: Dicts of url_map for pretrained weights 365 | # load_pretrained_weights: A function to load pretrained weights 366 | 367 | class BlockDecoder(object): 368 | """Block Decoder for readability, 369 | straight from the official TensorFlow repository. 370 | """ 371 | 372 | @staticmethod 373 | def _decode_block_string(block_string): 374 | """Get a block through a string notation of arguments. 375 | 376 | Args: 377 | block_string (str): A string notation of arguments. 378 | Examples: 'r1_k3_s11_e1_i32_o16_se0.25_noskip'. 379 | 380 | Returns: 381 | BlockArgs: The namedtuple defined at the top of this file. 382 | """ 383 | assert isinstance(block_string, str) 384 | 385 | ops = block_string.split('_') 386 | options = {} 387 | for op in ops: 388 | splits = re.split(r'(\d.*)', op) 389 | if len(splits) >= 2: 390 | key, value = splits[:2] 391 | options[key] = value 392 | 393 | # Check stride 394 | assert (('s' in options and len(options['s']) == 1) or 395 | (len(options['s']) == 2 and options['s'][0] == options['s'][1])) 396 | 397 | return BlockArgs( 398 | num_repeat=int(options['r']), 399 | kernel_size=int(options['k']), 400 | stride=[int(options['s'][0])], 401 | expand_ratio=int(options['e']), 402 | input_filters=int(options['i']), 403 | output_filters=int(options['o']), 404 | se_ratio=float(options['se']) if 'se' in options else None, 405 | id_skip=('noskip' not in block_string)) 406 | 407 | @staticmethod 408 | def _encode_block_string(block): 409 | """Encode a block to a string. 410 | 411 | Args: 412 | block (namedtuple): A BlockArgs type argument. 413 | 414 | Returns: 415 | block_string: A String form of BlockArgs. 416 | """ 417 | args = [ 418 | 'r%d' % block.num_repeat, 419 | 'k%d' % block.kernel_size, 420 | 's%d%d' % (block.strides[0], block.strides[1]), 421 | 'e%s' % block.expand_ratio, 422 | 'i%d' % block.input_filters, 423 | 'o%d' % block.output_filters 424 | ] 425 | if 0 < block.se_ratio <= 1: 426 | args.append('se%s' % block.se_ratio) 427 | if block.id_skip is False: 428 | args.append('noskip') 429 | return '_'.join(args) 430 | 431 | @staticmethod 432 | def decode(string_list): 433 | """Decode a list of string notations to specify blocks inside the network. 434 | 435 | Args: 436 | string_list (list[str]): A list of strings, each string is a notation of block. 437 | 438 | Returns: 439 | blocks_args: A list of BlockArgs namedtuples of block args. 440 | """ 441 | assert isinstance(string_list, list) 442 | blocks_args = [] 443 | for block_string in string_list: 444 | blocks_args.append(BlockDecoder._decode_block_string(block_string)) 445 | return blocks_args 446 | 447 | @staticmethod 448 | def encode(blocks_args): 449 | """Encode a list of BlockArgs to a list of strings. 450 | 451 | Args: 452 | blocks_args (list[namedtuples]): A list of BlockArgs namedtuples of block args. 
453 | 454 | Returns: 455 | block_strings: A list of strings, each string is a notation of block. 456 | """ 457 | block_strings = [] 458 | for block in blocks_args: 459 | block_strings.append(BlockDecoder._encode_block_string(block)) 460 | return block_strings 461 | 462 | 463 | def efficientnet_params(model_name): 464 | """Map EfficientNet model name to parameter coefficients. 465 | 466 | Args: 467 | model_name (str): Model name to be queried. 468 | 469 | Returns: 470 | params_dict[model_name]: A (width,depth,res,dropout) tuple. 471 | """ 472 | params_dict = { 473 | # Coefficients: width,depth,res,dropout 474 | 'efficientnet-b0': (1.0, 1.0, 224, 0.2), 475 | 'efficientnet-b1': (1.0, 1.1, 240, 0.2), 476 | 'efficientnet-b2': (1.1, 1.2, 260, 0.3), 477 | 'efficientnet-b3': (1.2, 1.4, 300, 0.3), 478 | 'efficientnet-b4': (1.4, 1.8, 380, 0.4), 479 | 'efficientnet-b5': (1.6, 2.2, 456, 0.4), 480 | 'efficientnet-b6': (1.8, 2.6, 528, 0.5), 481 | 'efficientnet-b7': (2.0, 3.1, 600, 0.5), 482 | 'efficientnet-b8': (2.2, 3.6, 672, 0.5), 483 | 'efficientnet-l2': (4.3, 5.3, 800, 0.5), 484 | } 485 | return params_dict[model_name] 486 | 487 | 488 | def efficientnet(width_coefficient=None, depth_coefficient=None, image_size=None, 489 | dropout_rate=0.2, drop_connect_rate=0.2, num_classes=1000): 490 | """Create BlockArgs and GlobalParams for efficientnet model. 491 | 492 | Args: 493 | width_coefficient (float) 494 | depth_coefficient (float) 495 | image_size (int) 496 | dropout_rate (float) 497 | drop_connect_rate (float) 498 | num_classes (int) 499 | 500 | Meaning as the name suggests. 501 | 502 | Returns: 503 | blocks_args, global_params. 504 | """ 505 | 506 | # Blocks args for the whole model(efficientnet-b0 by default) 507 | # It will be modified in the construction of EfficientNet Class according to model 508 | blocks_args = [ 509 | 'r1_k3_s11_e1_i32_o16_se0.25', 510 | 'r2_k3_s22_e6_i16_o24_se0.25', 511 | 'r2_k5_s22_e6_i24_o40_se0.25', 512 | 'r3_k3_s22_e6_i40_o80_se0.25', 513 | 'r3_k5_s11_e6_i80_o112_se0.25', 514 | 'r4_k5_s22_e6_i112_o192_se0.25', 515 | 'r1_k3_s11_e6_i192_o320_se0.25', 516 | ] 517 | blocks_args = BlockDecoder.decode(blocks_args) 518 | 519 | global_params = GlobalParams( 520 | width_coefficient=width_coefficient, 521 | depth_coefficient=depth_coefficient, 522 | image_size=image_size, 523 | dropout_rate=dropout_rate, 524 | 525 | num_classes=num_classes, 526 | batch_norm_momentum=0.99, 527 | batch_norm_epsilon=1e-3, 528 | drop_connect_rate=drop_connect_rate, 529 | depth_divisor=8, 530 | min_depth=None, 531 | ) 532 | 533 | return blocks_args, global_params 534 | 535 | 536 | def get_model_params(model_name, override_params): 537 | """Get the block args and global params for a given model name. 538 | 539 | Args: 540 | model_name (str): Model's name. 541 | override_params (dict): A dict to modify global_params. 542 | 543 | Returns: 544 | blocks_args, global_params 545 | """ 546 | if model_name.startswith('efficientnet'): 547 | w, d, s, p = efficientnet_params(model_name) 548 | # note: all models have drop connect rate = 0.2 549 | blocks_args, global_params = efficientnet( 550 | width_coefficient=w, depth_coefficient=d, dropout_rate=p, image_size=s) 551 | else: 552 | raise NotImplementedError('model name is not pre-defined: %s' % model_name) 553 | if override_params: 554 | # ValueError will be raised here if override_params has fields not included in global_params. 
555 | global_params = global_params._replace(**override_params) 556 | return blocks_args, global_params 557 | 558 | 559 | # train with Standard methods 560 | # check more details in paper(EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks) 561 | url_map = { 562 | 'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b0-355c32eb.pth', 563 | 'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b1-f1951068.pth', 564 | 'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b2-8bb594d6.pth', 565 | 'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b3-5fb5a3c3.pth', 566 | 'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b4-6ed6700e.pth', 567 | 'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b5-b6417697.pth', 568 | 'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b6-c76e70fd.pth', 569 | 'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b7-dcc49843.pth', 570 | } 571 | 572 | # train with Adversarial Examples(AdvProp) 573 | # check more details in paper(Adversarial Examples Improve Image Recognition) 574 | url_map_advprop = { 575 | 'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b0-b64d5a18.pth', 576 | 'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b1-0f3ce85a.pth', 577 | 'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b2-6e9d97e5.pth', 578 | 'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b3-cdd7c0f4.pth', 579 | 'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b4-44fb3a87.pth', 580 | 'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b5-86493f6b.pth', 581 | 'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b6-ac80338e.pth', 582 | 'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b7-4652b6dd.pth', 583 | 'efficientnet-b8': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b8-22a8fe65.pth', 584 | } 585 | 586 | # TODO: add the petrained weights url map of 'efficientnet-l2' 587 | 588 | 589 | def load_pretrained_weights(model, model_name, weights_path=None, load_fc=True, advprop=False): 590 | """Loads pretrained weights from weights path or download using url. 591 | 592 | Args: 593 | model (Module): The whole model of efficientnet. 594 | model_name (str): Model name of efficientnet. 595 | weights_path (None or str): 596 | str: path to pretrained weights file on the local disk. 597 | None: use pretrained weights downloaded from the Internet. 598 | load_fc (bool): Whether to load pretrained weights for fc layer at the end of the model. 599 | advprop (bool): Whether to load pretrained weights 600 | trained with advprop (valid when weights_path is None). 
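A short usage sketch for this loader. Note the `url_map` checkpoints follow the upstream lukemelas layout (`_blocks.*`), while this repository regroups layers into `stage1/2/3`, so the local face checkpoint already referenced in `models/retinaface.py` is the more likely fit; the path below assumes that file exists under `./weights/`.

```python
from models.net import EfficientNet
from models.utils import load_pretrained_weights

model = EfficientNet.from_name('efficientnet-b0')
# Load a local checkpoint whose key names match this repository's backbone:
load_pretrained_weights(model, 'efficientnet-b0',
                        weights_path='./weights/efficientnetb0_face.pth', load_fc=True)
```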
601 | """ 602 | if isinstance(weights_path,str): 603 | state_dict = torch.load(weights_path) 604 | else: 605 | # AutoAugment or Advprop (different preprocessing) 606 | url_map_ = url_map_advprop if advprop else url_map 607 | state_dict = model_zoo.load_url(url_map_[model_name]) 608 | 609 | if load_fc: 610 | ret = model.load_state_dict(state_dict, strict=False) 611 | assert not ret.missing_keys, f'Missing keys when loading pretrained weights: {ret.missing_keys}' 612 | else: 613 | state_dict.pop('_fc.weight') 614 | state_dict.pop('_fc.bias') 615 | ret = model.load_state_dict(state_dict, strict=False) 616 | assert set(ret.missing_keys) == set( 617 | ['_fc.weight', '_fc.bias']), f'Missing keys when loading pretrained weights: {ret.missing_keys}' 618 | assert not ret.unexpected_keys, f'Missing keys when loading pretrained weights: {ret.unexpected_keys}' 619 | 620 | print('Loaded pretrained weights for {}'.format(model_name)) 621 | -------------------------------------------------------------------------------- /test_widerface.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import argparse 4 | import torch 5 | import torch.backends.cudnn as cudnn 6 | import numpy as np 7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3 8 | from layers.functions.prior_box import PriorBox 9 | from utils.nms.py_cpu_nms import py_cpu_nms 10 | import cv2 11 | from models.retinaface import RetinaFace 12 | from utils.box_utils import decode, decode_landm 13 | from utils.timer import Timer 14 | 15 | 16 | parser = argparse.ArgumentParser(description='Retinaface') 17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenetv2_0.1_Final.pth', 18 | type=str, help='Trained state_dict file path to open') 19 | parser.add_argument('--network', default='mobilenetv2', help='Backbone network mobilenetv1, mobilenetv2 or mobilenetv3') 20 | parser.add_argument('--origin_size', default=True, type=str, help='Whether use origin image size to evaluate') 21 | parser.add_argument('--save_folder', default='./widerface_evaluate/widerface_txt/', type=str, help='Dir to save txt results') 22 | parser.add_argument('--cpu', action="store_true", default=False, help='Use cpu inference') 23 | parser.add_argument('--dataset_folder', default='./data/widerface/val/images/', type=str, help='dataset path') 24 | parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold') 25 | parser.add_argument('--top_k', default=5000, type=int, help='top_k') 26 | parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold') 27 | parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k') 28 | parser.add_argument('-s', '--save_image', action="store_true", default=False, help='show detection results') 29 | parser.add_argument('--vis_thres', default=0.5, type=float, help='visualization_threshold') 30 | args = parser.parse_args() 31 | 32 | 33 | def check_keys(model, pretrained_state_dict): 34 | ckpt_keys = set(pretrained_state_dict.keys()) 35 | model_keys = set(model.state_dict().keys()) 36 | used_pretrained_keys = model_keys & ckpt_keys 37 | unused_pretrained_keys = ckpt_keys - model_keys 38 | missing_keys = model_keys - ckpt_keys 39 | print('Missing keys:{}'.format(len(missing_keys))) 40 | print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys))) 41 | print('Used keys:{}'.format(len(used_pretrained_keys))) 42 | assert len(used_pretrained_keys) > 0, 'load NONE 
from pretrained checkpoint' 43 | return True 44 | 45 | 46 | def remove_prefix(state_dict, prefix): 47 | ''' Old style model is stored with all names of parameters sharing common prefix 'module.' ''' 48 | print('remove prefix \'{}\''.format(prefix)) 49 | f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x 50 | return {f(key): value for key, value in state_dict.items()} 51 | 52 | 53 | def load_model(model, pretrained_path, load_to_cpu): 54 | print('Loading pretrained model from {}'.format(pretrained_path)) 55 | if load_to_cpu: 56 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage) 57 | else: 58 | device = torch.cuda.current_device() 59 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device)) 60 | if "state_dict" in pretrained_dict.keys(): 61 | pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.') 62 | else: 63 | pretrained_dict = remove_prefix(pretrained_dict, 'module.') 64 | check_keys(model, pretrained_dict) 65 | model.load_state_dict(pretrained_dict, strict=False) 66 | return model 67 | 68 | 69 | if __name__ == '__main__': 70 | torch.set_grad_enabled(False) 71 | 72 | cfg = None 73 | if args.network == "mobilenetv1": 74 | cfg = cfg_mnetv1 75 | elif args.network == "mobilenetv2": 76 | cfg = cfg_mnetv2 77 | elif args.network == "mobilenetv3": 78 | cfg = cfg_mnetv3 79 | # net and model 80 | net = RetinaFace(cfg=cfg, phase = 'test') 81 | net = load_model(net, args.trained_model, args.cpu) 82 | net.eval() 83 | print('Finished loading model!') 84 | print(net) 85 | cudnn.benchmark = True 86 | device = torch.device("cpu" if args.cpu else "cuda") 87 | net = net.to(device) 88 | 89 | # testing dataset 90 | testset_folder = args.dataset_folder 91 | testset_list = args.dataset_folder[:-7] + "wider_val.txt" 92 | 93 | with open(testset_list, 'r') as fr: 94 | test_dataset = fr.read().split() 95 | num_images = len(test_dataset) 96 | 97 | _t = {'forward_pass': Timer(), 'misc': Timer()} 98 | 99 | # testing begin 100 | for i, img_name in enumerate(test_dataset): 101 | image_path = testset_folder + img_name 102 | img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR) 103 | img = np.float32(img_raw) 104 | 105 | # testing scale 106 | target_size = 1600 107 | max_size = 2150 108 | im_shape = img.shape 109 | im_size_min = np.min(im_shape[0:2]) 110 | im_size_max = np.max(im_shape[0:2]) 111 | resize = float(target_size) / float(im_size_min) 112 | # prevent bigger axis from being more than max_size: 113 | if np.round(resize * im_size_max) > max_size: 114 | resize = float(max_size) / float(im_size_max) 115 | if args.origin_size: 116 | resize = 1 117 | 118 | if resize != 1: 119 | img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR) 120 | im_height, im_width, _ = img.shape 121 | scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]]) 122 | img -= (104, 117, 123) 123 | img = img.transpose(2, 0, 1) 124 | img = torch.from_numpy(img).unsqueeze(0) 125 | img = img.to(device) 126 | scale = scale.to(device) 127 | 128 | _t['forward_pass'].tic() 129 | loc, conf, landms = net(img) # forward pass 130 | _t['forward_pass'].toc() 131 | _t['misc'].tic() 132 | priorbox = PriorBox(cfg, image_size=(im_height, im_width)) 133 | priors = priorbox.forward() 134 | priors = priors.to(device) 135 | prior_data = priors.data 136 | boxes = decode(loc.data.squeeze(0), prior_data, cfg['variance']) 137 | boxes = boxes * scale / resize 138 | boxes = boxes.cpu().numpy() 139 | 
scores = conf.squeeze(0).data.cpu().numpy()[:, 1] 140 | landms = decode_landm(landms.data.squeeze(0), prior_data, cfg['variance']) 141 | scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2], 142 | img.shape[3], img.shape[2], img.shape[3], img.shape[2], 143 | img.shape[3], img.shape[2]]) 144 | scale1 = scale1.to(device) 145 | landms = landms * scale1 / resize 146 | landms = landms.cpu().numpy() 147 | 148 | # ignore low scores 149 | inds = np.where(scores > args.confidence_threshold)[0] 150 | boxes = boxes[inds] 151 | landms = landms[inds] 152 | scores = scores[inds] 153 | 154 | # keep top-K before NMS 155 | order = scores.argsort()[::-1] 156 | # order = scores.argsort()[::-1][:args.top_k] 157 | boxes = boxes[order] 158 | landms = landms[order] 159 | scores = scores[order] 160 | 161 | # do NMS 162 | dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False) 163 | keep = py_cpu_nms(dets, args.nms_threshold) 164 | # keep = nms(dets, args.nms_threshold,force_cpu=args.cpu) 165 | dets = dets[keep, :] 166 | landms = landms[keep] 167 | 168 | # keep top-K faster NMS 169 | # dets = dets[:args.keep_top_k, :] 170 | # landms = landms[:args.keep_top_k, :] 171 | 172 | dets = np.concatenate((dets, landms), axis=1) 173 | _t['misc'].toc() 174 | 175 | # -------------------------------------------------------------------- 176 | save_name = args.save_folder + img_name[:-4] + ".txt" 177 | dirname = os.path.dirname(save_name) 178 | if not os.path.isdir(dirname): 179 | os.makedirs(dirname) 180 | with open(save_name, "w") as fd: 181 | bboxs = dets 182 | file_name = os.path.basename(save_name)[:-4] + "\n" 183 | bboxs_num = str(len(bboxs)) + "\n" 184 | fd.write(file_name) 185 | fd.write(bboxs_num) 186 | for box in bboxs: 187 | x = int(box[0]) 188 | y = int(box[1]) 189 | w = int(box[2]) - int(box[0]) 190 | h = int(box[3]) - int(box[1]) 191 | confidence = str(box[4]) 192 | line = str(x) + " " + str(y) + " " + str(w) + " " + str(h) + " " + confidence + " \n" 193 | fd.write(line) 194 | 195 | print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images, _t['forward_pass'].average_time, _t['misc'].average_time)) 196 | 197 | # save image 198 | if args.save_image: 199 | for b in dets: 200 | if b[4] < args.vis_thres: 201 | continue 202 | text = "{:.4f}".format(b[4]) 203 | b = list(map(int, b)) 204 | cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2) 205 | cx = b[0] 206 | cy = b[1] + 12 207 | cv2.putText(img_raw, text, (cx, cy), 208 | cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255)) 209 | 210 | # landms 211 | cv2.circle(img_raw, (b[5], b[6]), 1, (0, 0, 255), 4) 212 | cv2.circle(img_raw, (b[7], b[8]), 1, (0, 255, 255), 4) 213 | cv2.circle(img_raw, (b[9], b[10]), 1, (255, 0, 255), 4) 214 | cv2.circle(img_raw, (b[11], b[12]), 1, (0, 255, 0), 4) 215 | cv2.circle(img_raw, (b[13], b[14]), 1, (255, 0, 0), 4) 216 | # save image 217 | if not os.path.exists("./results/"): 218 | os.makedirs("./results/") 219 | name = "./results/" + str(i) + ".jpg" 220 | cv2.imwrite(name, img_raw) 221 | 222 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import torch 4 | import torch.optim as optim 5 | import torch.backends.cudnn as cudnn 6 | import argparse 7 | import torch.utils.data as data 8 | from data import WiderFaceDetection, detection_collate, preproc, cfg_mnetv1, 
cfg_mnetv2, cfg_mnetv3, cfg_efnetb0 9 | from layers.modules import MultiBoxLoss 10 | from layers.functions.prior_box import PriorBox 11 | import time 12 | import datetime 13 | import math 14 | from models.retinaface import RetinaFace 15 | 16 | parser = argparse.ArgumentParser(description='Retinaface Training') 17 | parser.add_argument('--training_dataset', default='./data/widerface/train/label.txt', help='Training dataset directory') 18 | parser.add_argument('--network', default='efficientnetb0', help='Backbone network mobilenetv2, mobilenetv2, mobilenetv3, efficientnetb0') 19 | parser.add_argument('--num_workers', default=4, type=int, help='Number of workers used in dataloading') 20 | parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate') 21 | parser.add_argument('--momentum', default=0.9, type=float, help='momentum') 22 | parser.add_argument('--resume_net', default=None, help='resume net for retraining') 23 | parser.add_argument('--resume_epoch', default=0, type=int, help='resume iter for retraining') 24 | parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD') 25 | parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD') 26 | parser.add_argument('--save_folder', default='./weights/', help='Location to save checkpoint models') 27 | 28 | args = parser.parse_args() 29 | 30 | if not os.path.exists(args.save_folder): 31 | os.mkdir(args.save_folder) 32 | cfg = None 33 | if args.network == "mobilenetv1": 34 | cfg = cfg_mnetv1 35 | elif args.network == "mobilenetv2": 36 | cfg = cfg_mnetv2 37 | elif args.network == "mobilenetv3": 38 | cfg = cfg_mnetv3 39 | elif args.network == "efficientnetb0": 40 | print("Loading Backbone EfficientNet-b0") 41 | cfg = cfg_efnetb0 42 | else: 43 | print("Light weight FaceDet only support mobilenet...") 44 | 45 | rgb_mean = (104, 117, 123) # bgr order 46 | num_classes = 2 47 | img_dim = cfg['image_size'] 48 | num_gpu = cfg['ngpu'] 49 | batch_size = cfg['batch_size'] 50 | max_epoch = cfg['epoch'] 51 | gpu_train = cfg['gpu_train'] 52 | 53 | num_workers = args.num_workers 54 | momentum = args.momentum 55 | weight_decay = args.weight_decay 56 | initial_lr = args.lr 57 | gamma = args.gamma 58 | training_dataset = args.training_dataset 59 | save_folder = args.save_folder 60 | 61 | net = RetinaFace(cfg=cfg) 62 | print("Printing net...") 63 | print(net) 64 | 65 | if args.resume_net is not None: 66 | print('Loading resume network...') 67 | state_dict = torch.load(args.resume_net) 68 | # create new OrderedDict that does not contain `module.` 69 | from collections import OrderedDict 70 | new_state_dict = OrderedDict() 71 | for k, v in state_dict.items(): 72 | head = k[:7] 73 | if head == 'module.': 74 | name = k[7:] # remove `module.` 75 | else: 76 | name = k 77 | new_state_dict[name] = v 78 | net.load_state_dict(new_state_dict) 79 | 80 | if num_gpu > 1 and gpu_train: 81 | net = torch.nn.DataParallel(net).cuda() 82 | else: 83 | net = net.cuda() 84 | 85 | cudnn.benchmark = True 86 | 87 | 88 | #optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=momentum, weight_decay=weight_decay) 89 | optimizer = optim.AdamW(net.parameters(), lr=initial_lr, weight_decay=weight_decay) 90 | 91 | criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False) 92 | 93 | priorbox = PriorBox(cfg, image_size=(img_dim, img_dim)) 94 | with torch.no_grad(): 95 | priors = priorbox.forward() 96 | priors = priors.cuda() 97 | 98 | def train(): 99 | net.train() 100 | epoch 
= 0 + args.resume_epoch 101 | print('Loading Dataset...') 102 | 103 | dataset = WiderFaceDetection( training_dataset,preproc(img_dim, rgb_mean)) 104 | 105 | epoch_size = math.ceil(len(dataset) / batch_size) 106 | max_iter = max_epoch * epoch_size 107 | 108 | stepvalues = (cfg['decay1'] * epoch_size, cfg['decay2'] * epoch_size) 109 | step_index = 0 110 | 111 | if args.resume_epoch > 0: 112 | start_iter = args.resume_epoch * epoch_size 113 | else: 114 | start_iter = 0 115 | 116 | for iteration in range(start_iter, max_iter): 117 | if iteration % epoch_size == 0: 118 | # create batch iterator 119 | batch_iterator = iter(data.DataLoader(dataset, batch_size, shuffle=True, num_workers=num_workers, collate_fn=detection_collate)) 120 | if (epoch % 10 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > cfg['decay1']): 121 | torch.save(net.state_dict(), save_folder + cfg['name']+ '_epoch_' + str(epoch) + '.pth') 122 | epoch += 1 123 | 124 | load_t0 = time.time() 125 | if iteration in stepvalues: 126 | step_index += 1 127 | lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size) 128 | 129 | # load train data 130 | images, targets = next(batch_iterator) 131 | images = images.cuda() 132 | targets = [anno.cuda() for anno in targets] 133 | 134 | # forward 135 | out = net(images) 136 | 137 | # backprop 138 | optimizer.zero_grad() 139 | loss_l, loss_c, loss_landm = criterion(out, priors, targets) 140 | loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm 141 | loss.backward() 142 | optimizer.step() 143 | load_t1 = time.time() 144 | batch_time = load_t1 - load_t0 145 | eta = int(batch_time * (max_iter - iteration)) 146 | print('Epoch:{}/{} || Epochiter: {}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.8f} || Batchtime: {:.4f} s || ETA: {}' 147 | .format(epoch, max_epoch, (iteration % epoch_size) + 1, 148 | epoch_size, iteration + 1, max_iter, loss_l.item(), loss_c.item(), loss_landm.item(), lr, batch_time, str(datetime.timedelta(seconds=eta)))) 149 | 150 | torch.save(net.state_dict(), save_folder + cfg['name'] + '_Final.pth') 151 | # torch.save(net.state_dict(), save_folder + 'Final_Retinaface.pth') 152 | 153 | 154 | def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size): 155 | """Sets the learning rate 156 | # Adapted from PyTorch Imagenet example: 157 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 158 | """ 159 | warmup_epoch = 5 160 | if epoch <= warmup_epoch: 161 | lr = 1e-6 + (initial_lr-1e-6) * iteration / (epoch_size * warmup_epoch) 162 | else: 163 | lr = initial_lr * (gamma ** (step_index)) 164 | for param_group in optimizer.param_groups: 165 | param_group['lr'] = lr 166 | return lr 167 | 168 | if __name__ == '__main__': 169 | train() 170 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/utils/__init__.py -------------------------------------------------------------------------------- /utils/box_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | 5 | def point_form(boxes): 6 | """ Convert prior_boxes to (xmin, ymin, xmax, ymax) 7 | representation for comparison to point form ground truth data. 8 | Args: 9 | boxes: (tensor) center-size default boxes from priorbox layers. 
10 | Return: 11 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 12 | """ 13 | return torch.cat((boxes[:, :2] - boxes[:, 2:]/2, # xmin, ymin 14 | boxes[:, :2] + boxes[:, 2:]/2), 1) # xmax, ymax 15 | 16 | 17 | def center_size(boxes): 18 | """ Convert prior_boxes to (cx, cy, w, h) 19 | representation for comparison to center-size form ground truth data. 20 | Args: 21 | boxes: (tensor) point_form boxes 22 | Return: 23 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 24 | """ 25 | return torch.cat((boxes[:, 2:] + boxes[:, :2])/2, # cx, cy 26 | boxes[:, 2:] - boxes[:, :2], 1) # w, h 27 | 28 | 29 | def intersect(box_a, box_b): 30 | """ We resize both tensors to [A,B,2] without new malloc: 31 | [A,2] -> [A,1,2] -> [A,B,2] 32 | [B,2] -> [1,B,2] -> [A,B,2] 33 | Then we compute the area of intersect between box_a and box_b. 34 | Args: 35 | box_a: (tensor) bounding boxes, Shape: [A,4]. 36 | box_b: (tensor) bounding boxes, Shape: [B,4]. 37 | Return: 38 | (tensor) intersection area, Shape: [A,B]. 39 | """ 40 | A = box_a.size(0) 41 | B = box_b.size(0) 42 | max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), 43 | box_b[:, 2:].unsqueeze(0).expand(A, B, 2)) 44 | min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), 45 | box_b[:, :2].unsqueeze(0).expand(A, B, 2)) 46 | inter = torch.clamp((max_xy - min_xy), min=0) 47 | return inter[:, :, 0] * inter[:, :, 1] 48 | 49 | 50 | def jaccard(box_a, box_b): 51 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 52 | is simply the intersection over union of two boxes. Here we operate on 53 | ground truth boxes and default boxes. 54 | E.g.: 55 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 56 | Args: 57 | box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4] 58 | box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4] 59 | Return: 60 | jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)] 61 | """ 62 | inter = intersect(box_a, box_b) 63 | area_a = ((box_a[:, 2]-box_a[:, 0]) * 64 | (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B] 65 | area_b = ((box_b[:, 2]-box_b[:, 0]) * 66 | (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter) # [A,B] 67 | union = area_a + area_b - inter 68 | return inter / union # [A,B] 69 | 70 | 71 | def matrix_iou(a, b): 72 | """ 73 | return iou of a and b, numpy version for data augenmentation 74 | """ 75 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 76 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 77 | 78 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 79 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 80 | area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) 81 | return area_i / (area_a[:, np.newaxis] + area_b - area_i) 82 | 83 | 84 | def matrix_iof(a, b): 85 | """ 86 | return iof of a and b, numpy version for data augenmentation 87 | """ 88 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 89 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 90 | 91 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 92 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 93 | return area_i / np.maximum(area_a[:, np.newaxis], 1) 94 | 95 | 96 | def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx): 97 | """Match each prior box with the ground truth box of the highest jaccard 98 | overlap, encode the bounding boxes, then return the matched indices 99 | corresponding to both confidence and location preds. 
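A tiny worked example of the overlap computation used by `match` below (illustrative numbers, both boxes in point form):

```python
import torch
from utils.box_utils import jaccard

box_a = torch.tensor([[0., 0., 10., 10.]])   # ground-truth box
box_b = torch.tensor([[5., 5., 15., 15.]])   # prior box
print(jaccard(box_a, box_b))                 # 25 / (100 + 100 - 25) ≈ 0.1429
```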
100 |     Args:
101 |         threshold: (float) The overlap threshold used when matching boxes.
102 |         truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
103 |         priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
104 |         variances: (tensor) Variances corresponding to each prior coord,
105 |             Shape: [num_priors, 4].
106 |         labels: (tensor) All the class labels for the image, Shape: [num_obj].
107 |         landms: (tensor) Ground truth landms, Shape [num_obj, 10].
108 |         loc_t: (tensor) Tensor to be filled w/ encoded location targets.
109 |         conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
110 |         landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
111 |         idx: (int) current batch index
112 |     Return:
113 |         The matched indices corresponding to 1)location 2)confidence 3)landm preds.
114 |     """
115 |     # jaccard index
116 |     overlaps = jaccard(
117 |         truths,
118 |         point_form(priors)
119 |     )
120 |     # (Bipartite Matching)
121 |     # [1,num_objects] best prior for each ground truth
122 |     best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
123 | 
124 |     # ignore hard gt
125 |     valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
126 |     best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
127 |     if best_prior_idx_filter.shape[0] <= 0:
128 |         loc_t[idx] = 0
129 |         conf_t[idx] = 0
130 |         return
131 | 
132 |     # [1,num_priors] best ground truth for each prior
133 |     best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
134 |     best_truth_idx.squeeze_(0)
135 |     best_truth_overlap.squeeze_(0)
136 |     best_prior_idx.squeeze_(1)
137 |     best_prior_idx_filter.squeeze_(1)
138 |     best_prior_overlap.squeeze_(1)
139 |     best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
140 |     # TODO refactor: index best_prior_idx with long tensor
141 |     # ensure every gt matches with its prior of max overlap
142 |     for j in range(best_prior_idx.size(0)):  # decide which ground-truth box this anchor is assigned to predict
143 |         best_truth_idx[best_prior_idx[j]] = j
144 |     matches = truths[best_truth_idx]          # Shape: [num_priors,4]; gather the matched gt box for every anchor
145 |     conf = labels[best_truth_idx]             # Shape: [num_priors]; gather the matched gt label for every anchor
146 |     conf[best_truth_overlap < threshold] = 0  # label as background: anchors with overlap < 0.35 all become negative samples
147 |     loc = encode(matches, priors, variances)
148 | 
149 |     matches_landm = landms[best_truth_idx]
150 |     landm = encode_landm(matches_landm, priors, variances)
151 |     loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
152 |     conf_t[idx] = conf  # [num_priors] top class label for each prior
153 |     landm_t[idx] = landm
154 | 
155 | 
156 | def encode(matched, priors, variances):
157 |     """Encode the variances from the priorbox layers into the ground truth boxes
158 |     we have matched (based on jaccard overlap) with the prior boxes.
159 |     Args:
160 |         matched: (tensor) Coords of ground truth for each prior in point-form
161 |             Shape: [num_priors, 4].
162 |         priors: (tensor) Prior boxes in center-offset form
163 |             Shape: [num_priors,4].
164 | variances: (list[float]) Variances of priorboxes 165 | Return: 166 | encoded boxes (tensor), Shape: [num_priors, 4] 167 | """ 168 | 169 | # dist b/t match center and prior's center 170 | g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2] 171 | # encode variance 172 | g_cxcy /= (variances[0] * priors[:, 2:]) 173 | # match wh / prior wh 174 | g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:] 175 | g_wh = torch.log(g_wh) / variances[1] 176 | # return target for smooth_l1_loss 177 | return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4] 178 | 179 | def encode_landm(matched, priors, variances): 180 | """Encode the variances from the priorbox layers into the ground truth boxes 181 | we have matched (based on jaccard overlap) with the prior boxes. 182 | Args: 183 | matched: (tensor) Coords of ground truth for each prior in point-form 184 | Shape: [num_priors, 10]. 185 | priors: (tensor) Prior boxes in center-offset form 186 | Shape: [num_priors,4]. 187 | variances: (list[float]) Variances of priorboxes 188 | Return: 189 | encoded landm (tensor), Shape: [num_priors, 10] 190 | """ 191 | 192 | # dist b/t match center and prior's center 193 | matched = torch.reshape(matched, (matched.size(0), 5, 2)) 194 | priors_cx = priors[:, 0].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 195 | priors_cy = priors[:, 1].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 196 | priors_w = priors[:, 2].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 197 | priors_h = priors[:, 3].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 198 | priors = torch.cat([priors_cx, priors_cy, priors_w, priors_h], dim=2) 199 | g_cxcy = matched[:, :, :2] - priors[:, :, :2] 200 | # encode variance 201 | g_cxcy /= (variances[0] * priors[:, :, 2:]) 202 | # g_cxcy /= priors[:, :, 2:] 203 | g_cxcy = g_cxcy.reshape(g_cxcy.size(0), -1) 204 | # return target for smooth_l1_loss 205 | return g_cxcy 206 | 207 | 208 | # Adapted from https://github.com/Hakuyume/chainer-ssd 209 | def decode(loc, priors, variances): 210 | """Decode locations from predictions using priors to undo 211 | the encoding we did for offset regression at train time. 212 | Args: 213 | loc (tensor): location predictions for loc layers, 214 | Shape: [num_priors,4] 215 | priors (tensor): Prior boxes in center-offset form. 216 | Shape: [num_priors,4]. 217 | variances: (list[float]) Variances of priorboxes 218 | Return: 219 | decoded bounding box predictions 220 | """ 221 | 222 | boxes = torch.cat(( 223 | priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:], 224 | priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1) 225 | boxes[:, :2] -= boxes[:, 2:] / 2 226 | boxes[:, 2:] += boxes[:, :2] 227 | return boxes 228 | 229 | def decode_landm(pre, priors, variances): 230 | """Decode landm from predictions using priors to undo 231 | the encoding we did for offset regression at train time. 232 | Args: 233 | pre (tensor): landm predictions for loc layers, 234 | Shape: [num_priors,10] 235 | priors (tensor): Prior boxes in center-offset form. 236 | Shape: [num_priors,4]. 
237 | variances: (list[float]) Variances of priorboxes 238 | Return: 239 | decoded landm predictions 240 | """ 241 | landms = torch.cat((priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:], 242 | priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:], 243 | priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:], 244 | priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:], 245 | priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:], 246 | ), dim=1) 247 | return landms 248 | 249 | 250 | def log_sum_exp(x): 251 | """Utility function for computing log_sum_exp while determining 252 | This will be used to determine unaveraged confidence loss across 253 | all examples in a batch. 254 | Args: 255 | x (Variable(tensor)): conf_preds from conf layers 256 | """ 257 | x_max = x.data.max() 258 | return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max 259 | 260 | 261 | # Original author: Francisco Massa: 262 | # https://github.com/fmassa/object-detection.torch 263 | # Ported to PyTorch by Max deGroot (02/01/2017) 264 | def nms(boxes, scores, overlap=0.5, top_k=200): 265 | """Apply non-maximum suppression at test time to avoid detecting too many 266 | overlapping bounding boxes for a given object. 267 | Args: 268 | boxes: (tensor) The location preds for the img, Shape: [num_priors,4]. 269 | scores: (tensor) The class predscores for the img, Shape:[num_priors]. 270 | overlap: (float) The overlap thresh for suppressing unnecessary boxes. 271 | top_k: (int) The Maximum number of box preds to consider. 272 | Return: 273 | The indices of the kept boxes with respect to num_priors. 274 | """ 275 | 276 | keep = torch.Tensor(scores.size(0)).fill_(0).long() 277 | if boxes.numel() == 0: 278 | return keep 279 | x1 = boxes[:, 0] 280 | y1 = boxes[:, 1] 281 | x2 = boxes[:, 2] 282 | y2 = boxes[:, 3] 283 | area = torch.mul(x2 - x1, y2 - y1) 284 | v, idx = scores.sort(0) # sort in ascending order 285 | # I = I[v >= 0.01] 286 | idx = idx[-top_k:] # indices of the top-k largest vals 287 | xx1 = boxes.new() 288 | yy1 = boxes.new() 289 | xx2 = boxes.new() 290 | yy2 = boxes.new() 291 | w = boxes.new() 292 | h = boxes.new() 293 | 294 | # keep = torch.Tensor() 295 | count = 0 296 | while idx.numel() > 0: 297 | i = idx[-1] # index of current largest val 298 | # keep.append(i) 299 | keep[count] = i 300 | count += 1 301 | if idx.size(0) == 1: 302 | break 303 | idx = idx[:-1] # remove kept element from view 304 | # load bboxes of next highest vals 305 | torch.index_select(x1, 0, idx, out=xx1) 306 | torch.index_select(y1, 0, idx, out=yy1) 307 | torch.index_select(x2, 0, idx, out=xx2) 308 | torch.index_select(y2, 0, idx, out=yy2) 309 | # store element-wise max with next highest score 310 | xx1 = torch.clamp(xx1, min=x1[i]) 311 | yy1 = torch.clamp(yy1, min=y1[i]) 312 | xx2 = torch.clamp(xx2, max=x2[i]) 313 | yy2 = torch.clamp(yy2, max=y2[i]) 314 | w.resize_as_(xx2) 315 | h.resize_as_(yy2) 316 | w = xx2 - xx1 317 | h = yy2 - yy1 318 | # check sizes of xx1 and xx2.. 
after each iteration 319 | w = torch.clamp(w, min=0.0) 320 | h = torch.clamp(h, min=0.0) 321 | inter = w*h 322 | # IoU = i / (area(a) + area(b) - i) 323 | rem_areas = torch.index_select(area, 0, idx) # load remaining areas) 324 | union = (rem_areas - inter) + area[i] 325 | IoU = inter/union # store result in iou 326 | # keep only elements with an IoU <= overlap 327 | idx = idx[IoU.le(overlap)] 328 | return keep, count 329 | 330 | 331 | -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /utils/timer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import time 9 | 10 | 11 | class Timer(object): 12 | """A simple timer.""" 13 | def __init__(self): 14 | self.total_time = 0. 15 | self.calls = 0 16 | self.start_time = 0. 17 | self.diff = 0. 18 | self.average_time = 0. 19 | 20 | def tic(self): 21 | # using time.time instead of time.clock because time time.clock 22 | # does not normalize for multithreading 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | self.average_time = self.total_time / self.calls 30 | if average: 31 | return self.average_time 32 | else: 33 | return self.diff 34 | 35 | def clear(self): 36 | self.total_time = 0. 37 | self.calls = 0 38 | self.start_time = 0. 39 | self.diff = 0. 40 | self.average_time = 0. 
41 | 
--------------------------------------------------------------------------------
/widerface_evaluate/README.md:
--------------------------------------------------------------------------------
1 | # WiderFace-Evaluation
2 | Python Evaluation Code for [Wider Face Dataset](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)
3 | 
4 | 
5 | ## Usage
6 | 
7 | 
8 | ##### before evaluating
9 | 
10 | ````
11 | python3 setup.py build_ext --inplace
12 | ````
13 | 
14 | ##### evaluating
15 | 
16 | **GroundTruth:** `wider_face_val.mat`, `wider_easy_val.mat`, `wider_medium_val.mat`, `wider_hard_val.mat`
17 | 
18 | ````
19 | python3 evaluation.py -p <your prediction dir> -g <ground truth dir>
20 | ````
21 | 
22 | ## Bugs & Problems
23 | Please open an issue.
24 | 
25 | ## Acknowledgements
26 | 
27 | Some code is borrowed from Sergey Karayev.
28 | 
--------------------------------------------------------------------------------
/widerface_evaluate/box_overlaps.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Sergey Karayev
6 | # --------------------------------------------------------
7 | 
8 | cimport cython
9 | import numpy as np
10 | cimport numpy as np
11 | 
12 | DTYPE = np.float64  # the np.float alias was removed in NumPy 1.24+
13 | ctypedef np.float64_t DTYPE_t
14 | 
15 | def bbox_overlaps(
16 |         np.ndarray[DTYPE_t, ndim=2] boxes,
17 |         np.ndarray[DTYPE_t, ndim=2] query_boxes):
18 |     """
19 |     Parameters
20 |     ----------
21 |     boxes: (N, 4) ndarray of float
22 |     query_boxes: (K, 4) ndarray of float
23 |     Returns
24 |     -------
25 |     overlaps: (N, K) ndarray of overlap between boxes and query_boxes
26 |     """
27 |     cdef unsigned int N = boxes.shape[0]
28 |     cdef unsigned int K = query_boxes.shape[0]
29 |     cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
30 |     cdef DTYPE_t iw, ih, box_area
31 |     cdef DTYPE_t ua
32 |     cdef unsigned int k, n
33 |     for k in range(K):
34 |         box_area = (
35 |             (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
36 |             (query_boxes[k, 3] - query_boxes[k, 1] + 1)
37 |         )
38 |         for n in range(N):
39 |             iw = (
40 |                 min(boxes[n, 2], query_boxes[k, 2]) -
41 |                 max(boxes[n, 0], query_boxes[k, 0]) + 1
42 |             )
43 |             if iw > 0:
44 |                 ih = (
45 |                     min(boxes[n, 3], query_boxes[k, 3]) -
46 |                     max(boxes[n, 1], query_boxes[k, 1]) + 1
47 |                 )
48 |                 if ih > 0:
49 |                     ua = float(
50 |                         (boxes[n, 2] - boxes[n, 0] + 1) *
51 |                         (boxes[n, 3] - boxes[n, 1] + 1) +
52 |                         box_area - iw * ih
53 |                     )
54 |                     overlaps[n, k] = iw * ih / ua
55 |     return overlaps
--------------------------------------------------------------------------------
/widerface_evaluate/evaluation.py:
--------------------------------------------------------------------------------
1 | """
2 | WiderFace evaluation code
3 | author: wondervictor
4 | mail: tianhengcheng@gmail.com
5 | copyright@wondervictor
6 | """
7 | 
8 | import os
9 | import tqdm
10 | import pickle
11 | import argparse
12 | import numpy as np
13 | from scipy.io import loadmat
14 | from bbox import bbox_overlaps
15 | from IPython import embed
16 | 
17 | 
18 | def get_gt_boxes(gt_dir):
19 |     """ gt dir: (wider_face_val.mat, wider_easy_val.mat, wider_medium_val.mat, wider_hard_val.mat)"""
20 | 
21 |     gt_mat = loadmat(os.path.join(gt_dir, 'wider_face_val.mat'))
22 |     hard_mat = loadmat(os.path.join(gt_dir, 'wider_hard_val.mat'))
23 |     medium_mat = loadmat(os.path.join(gt_dir, 'wider_medium_val.mat'))
24 |     easy_mat = loadmat(os.path.join(gt_dir, 'wider_easy_val.mat'))
25 | 
26 | 
facebox_list = gt_mat['face_bbx_list'] 27 | event_list = gt_mat['event_list'] 28 | file_list = gt_mat['file_list'] 29 | 30 | hard_gt_list = hard_mat['gt_list'] 31 | medium_gt_list = medium_mat['gt_list'] 32 | easy_gt_list = easy_mat['gt_list'] 33 | 34 | return facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list 35 | 36 | 37 | def get_gt_boxes_from_txt(gt_path, cache_dir): 38 | 39 | cache_file = os.path.join(cache_dir, 'gt_cache.pkl') 40 | if os.path.exists(cache_file): 41 | f = open(cache_file, 'rb') 42 | boxes = pickle.load(f) 43 | f.close() 44 | return boxes 45 | 46 | f = open(gt_path, 'r') 47 | state = 0 48 | lines = f.readlines() 49 | lines = list(map(lambda x: x.rstrip('\r\n'), lines)) 50 | boxes = {} 51 | print(len(lines)) 52 | f.close() 53 | current_boxes = [] 54 | current_name = None 55 | for line in lines: 56 | if state == 0 and '--' in line: 57 | state = 1 58 | current_name = line 59 | continue 60 | if state == 1: 61 | state = 2 62 | continue 63 | 64 | if state == 2 and '--' in line: 65 | state = 1 66 | boxes[current_name] = np.array(current_boxes).astype('float32') 67 | current_name = line 68 | current_boxes = [] 69 | continue 70 | 71 | if state == 2: 72 | box = [float(x) for x in line.split(' ')[:4]] 73 | current_boxes.append(box) 74 | continue 75 | 76 | f = open(cache_file, 'wb') 77 | pickle.dump(boxes, f) 78 | f.close() 79 | return boxes 80 | 81 | 82 | def read_pred_file(filepath): 83 | 84 | with open(filepath, 'r') as f: 85 | lines = f.readlines() 86 | img_file = lines[0].rstrip('\n\r') 87 | lines = lines[2:] 88 | 89 | # b = lines[0].rstrip('\r\n').split(' ')[:-1] 90 | # c = float(b) 91 | # a = map(lambda x: [[float(a[0]), float(a[1]), float(a[2]), float(a[3]), float(a[4])] for a in x.rstrip('\r\n').split(' ')], lines) 92 | boxes = [] 93 | for line in lines: 94 | line = line.rstrip('\r\n').split(' ') 95 | if line[0] is '': 96 | continue 97 | # a = float(line[4]) 98 | boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])]) 99 | boxes = np.array(boxes) 100 | # boxes = np.array(list(map(lambda x: [float(a) for a in x.rstrip('\r\n').split(' ')], lines))).astype('float') 101 | return img_file.split('/')[-1], boxes 102 | 103 | 104 | def get_preds(pred_dir): 105 | events = os.listdir(pred_dir) 106 | boxes = dict() 107 | pbar = tqdm.tqdm(events) 108 | 109 | for event in pbar: 110 | pbar.set_description('Reading Predictions ') 111 | event_dir = os.path.join(pred_dir, event) 112 | event_images = os.listdir(event_dir) 113 | current_event = dict() 114 | for imgtxt in event_images: 115 | imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt)) 116 | current_event[imgname.rstrip('.jpg')] = _boxes 117 | boxes[event] = current_event 118 | return boxes 119 | 120 | 121 | def norm_score(pred): 122 | """ norm score 123 | pred {key: [[x1,y1,x2,y2,s]]} 124 | """ 125 | 126 | max_score = 0 127 | min_score = 1 128 | 129 | for _, k in pred.items(): 130 | for _, v in k.items(): 131 | if len(v) == 0: 132 | continue 133 | _min = np.min(v[:, -1]) 134 | _max = np.max(v[:, -1]) 135 | max_score = max(_max, max_score) 136 | min_score = min(_min, min_score) 137 | 138 | diff = max_score - min_score 139 | for _, k in pred.items(): 140 | for _, v in k.items(): 141 | if len(v) == 0: 142 | continue 143 | v[:, -1] = (v[:, -1] - min_score)/diff 144 | 145 | 146 | def image_eval(pred, gt, ignore, iou_thresh): 147 | """ single image evaluation 148 | pred: Nx5 149 | gt: Nx4 150 | ignore: 151 | """ 152 | 153 | _pred = pred.copy() 154 | _gt = 
gt.copy() 155 | pred_recall = np.zeros(_pred.shape[0]) 156 | recall_list = np.zeros(_gt.shape[0]) 157 | proposal_list = np.ones(_pred.shape[0]) 158 | 159 | _pred[:, 2] = _pred[:, 2] + _pred[:, 0] 160 | _pred[:, 3] = _pred[:, 3] + _pred[:, 1] 161 | _gt[:, 2] = _gt[:, 2] + _gt[:, 0] 162 | _gt[:, 3] = _gt[:, 3] + _gt[:, 1] 163 | 164 | overlaps = bbox_overlaps(_pred[:, :4], _gt) 165 | 166 | for h in range(_pred.shape[0]): 167 | 168 | gt_overlap = overlaps[h] 169 | max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax() 170 | if max_overlap >= iou_thresh: 171 | if ignore[max_idx] == 0: 172 | recall_list[max_idx] = -1 173 | proposal_list[h] = -1 174 | elif recall_list[max_idx] == 0: 175 | recall_list[max_idx] = 1 176 | 177 | r_keep_index = np.where(recall_list == 1)[0] 178 | pred_recall[h] = len(r_keep_index) 179 | return pred_recall, proposal_list 180 | 181 | 182 | def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall): 183 | pr_info = np.zeros((thresh_num, 2)).astype('float') 184 | for t in range(thresh_num): 185 | 186 | thresh = 1 - (t+1)/thresh_num 187 | r_index = np.where(pred_info[:, 4] >= thresh)[0] 188 | if len(r_index) == 0: 189 | pr_info[t, 0] = 0 190 | pr_info[t, 1] = 0 191 | else: 192 | r_index = r_index[-1] 193 | p_index = np.where(proposal_list[:r_index+1] == 1)[0] 194 | pr_info[t, 0] = len(p_index) 195 | pr_info[t, 1] = pred_recall[r_index] 196 | return pr_info 197 | 198 | 199 | def dataset_pr_info(thresh_num, pr_curve, count_face): 200 | _pr_curve = np.zeros((thresh_num, 2)) 201 | for i in range(thresh_num): 202 | _pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0] 203 | _pr_curve[i, 1] = pr_curve[i, 1] / count_face 204 | return _pr_curve 205 | 206 | 207 | def voc_ap(rec, prec): 208 | 209 | # correct AP calculation 210 | # first append sentinel values at the end 211 | mrec = np.concatenate(([0.], rec, [1.])) 212 | mpre = np.concatenate(([0.], prec, [0.])) 213 | 214 | # compute the precision envelope 215 | for i in range(mpre.size - 1, 0, -1): 216 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 217 | 218 | # to calculate area under PR curve, look for points 219 | # where X axis (recall) changes value 220 | i = np.where(mrec[1:] != mrec[:-1])[0] 221 | 222 | # and sum (\Delta recall) * prec 223 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 224 | return ap 225 | 226 | 227 | def evaluation(pred, gt_path, iou_thresh=0.5): 228 | pred = get_preds(pred) 229 | norm_score(pred) 230 | facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = get_gt_boxes(gt_path) 231 | event_num = len(event_list) 232 | thresh_num = 1000 233 | settings = ['easy', 'medium', 'hard'] 234 | setting_gts = [easy_gt_list, medium_gt_list, hard_gt_list] 235 | aps = [] 236 | for setting_id in range(3): 237 | # different setting 238 | gt_list = setting_gts[setting_id] 239 | count_face = 0 240 | pr_curve = np.zeros((thresh_num, 2)).astype('float') 241 | # [hard, medium, easy] 242 | pbar = tqdm.tqdm(range(event_num)) 243 | for i in pbar: 244 | pbar.set_description('Processing {}'.format(settings[setting_id])) 245 | event_name = str(event_list[i][0][0]) 246 | img_list = file_list[i][0] 247 | pred_list = pred[event_name] 248 | sub_gt_list = gt_list[i][0] 249 | # img_pr_info_list = np.zeros((len(img_list), thresh_num, 2)) 250 | gt_bbx_list = facebox_list[i][0] 251 | 252 | for j in range(len(img_list)): 253 | pred_info = pred_list[str(img_list[j][0][0])] 254 | 255 | gt_boxes = gt_bbx_list[j][0].astype('float') 256 | keep_index = sub_gt_list[j][0] 257 | count_face += 
len(keep_index) 258 | 259 | if len(gt_boxes) == 0 or len(pred_info) == 0: 260 | continue 261 | ignore = np.zeros(gt_boxes.shape[0]) 262 | if len(keep_index) != 0: 263 | ignore[keep_index-1] = 1 264 | pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh) 265 | 266 | _img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall) 267 | 268 | pr_curve += _img_pr_info 269 | pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face) 270 | 271 | propose = pr_curve[:, 0] 272 | recall = pr_curve[:, 1] 273 | 274 | ap = voc_ap(recall, propose) 275 | aps.append(ap) 276 | 277 | print("==================== Results ====================") 278 | print("Easy Val AP: {}".format(aps[0])) 279 | print("Medium Val AP: {}".format(aps[1])) 280 | print("Hard Val AP: {}".format(aps[2])) 281 | print("=================================================") 282 | 283 | 284 | if __name__ == '__main__': 285 | 286 | parser = argparse.ArgumentParser() 287 | parser.add_argument('-p', '--pred', default="./widerface_txt/") 288 | parser.add_argument('-g', '--gt', default='./ground_truth/') 289 | 290 | args = parser.parse_args() 291 | evaluation(args.pred, args.gt) 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_easy_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_easy_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_face_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_face_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_hard_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_hard_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_medium_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_medium_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | WiderFace evaluation code 3 | author: wondervictor 4 | mail: tianhengcheng@gmail.com 5 | copyright@wondervictor 6 | """ 7 | 8 | from distutils.core import setup, Extension 9 | from Cython.Build import cythonize 10 | import numpy 11 | 12 | package = Extension('bbox', ['box_overlaps.pyx'], include_dirs=[numpy.get_include()]) 13 | setup(ext_modules=cythonize([package])) 14 | --------------------------------------------------------------------------------
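For reference, the sketch below is not a file in the repository; it only illustrates how the utilities dumped above are typically chained at inference time. `decode` and `decode_landm` from `utils/box_utils.py` undo the prior-box encoding, and `py_cpu_nms` from `utils/nms/py_cpu_nms.py` removes duplicate detections. The function name `postprocess`, the `im_width`/`im_height` arguments, the thresholds, and the `[0.1, 0.2]` variances are illustrative assumptions (the variances are taken to match `cfg['variance']` in `data/config.py`); the actual pipeline lives in `detect.py` and `test_widerface.py`.

```python
import numpy as np

from utils.box_utils import decode, decode_landm
from utils.nms.py_cpu_nms import py_cpu_nms

VARIANCES = [0.1, 0.2]  # assumed to match cfg['variance'] in data/config.py


def postprocess(loc, conf, landms, priors, im_width, im_height,
                conf_thresh=0.5, nms_thresh=0.4):
    """Turn raw single-image network outputs into final detections.

    loc:    [num_priors, 4]  box regression output
    conf:   [num_priors, 2]  softmax scores (column 1 = face)
    landms: [num_priors, 10] landmark regression output
    priors: [num_priors, 4]  anchors in center-size form from PriorBox
    """
    boxes = decode(loc, priors, VARIANCES)               # normalized (x1, y1, x2, y2)
    landmarks = decode_landm(landms, priors, VARIANCES)  # normalized 5-point landmarks
    scores = conf[:, 1]

    # Back to pixel coordinates; py_cpu_nms assumes pixel boxes (note its "+1" areas).
    boxes = boxes * loc.new_tensor([im_width, im_height, im_width, im_height])
    landmarks = landmarks * loc.new_tensor([im_width, im_height] * 5)

    keep = scores > conf_thresh                          # drop low-confidence priors
    boxes, landmarks, scores = boxes[keep], landmarks[keep], scores[keep]

    # py_cpu_nms expects an (N, 5) array laid out as [x1, y1, x2, y2, score].
    dets = np.hstack((boxes.detach().cpu().numpy(),
                      scores.detach().cpu().numpy()[:, np.newaxis])).astype(np.float32)
    keep_idx = py_cpu_nms(dets, nms_thresh)              # indices surviving NMS
    return dets[keep_idx], landmarks.detach().cpu().numpy()[keep_idx]
```

The returned `dets` array keeps the `[x1, y1, x2, y2, score]` layout that `py_cpu_nms` consumes; convert it to whatever format your downstream step needs (for example, the WiderFace text format used by `test_widerface.py`) before evaluation.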