├── Finding tiny faces.pdf
├── README.md
├── data
│   ├── .DS_Store
│   ├── 92_TinyFaces.png
│   ├── Downscale_graph.png
│   ├── Downscaling.png
│   ├── TotalIncrementalCount.gif
│   └── benchmark.png
├── detect.py
├── evaluate.py
├── metrics.py
├── notebooks
│   ├── Blurring.ipynb
│   ├── Counting in video.ipynb
│   ├── Downsampling.ipynb
│   ├── Face Detection algorithms comparison.ipynb
│   └── Video detection.ipynb
├── tiny_faces_model.py
└── util.py
/Finding tiny faces.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/Finding tiny faces.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ExtendedTinyFaces
2 | Analysis, review and application of Finding Tiny Faces (P. Hu) [1] with a focus on counting the many faces in a demonstration/crowd.
3 | RecVis (MVA) course - Alexandre Attia, [Sharone Dayan](https://github.com/SharoneDayan)
4 | **You can find our [pre-print report on ArXiv](https://arxiv.org/abs/1801.06504)**.
5 |
6 | ### Introduction
7 | The paper - released at CVPR 2017 - deals with finding small objects (faces, in our case) in an image,
8 | based on scale-specific detectors using features defined over a single (deep) feature hierarchy:
9 | scale invariance, image resolution and contextual reasoning. The algorithm relies on foveal descriptors, i.e. blurring the peripheral image to encode just enough information about the context, mimicking human vision.
10 | The subject is still an open challenge, and we would like to extend it to new horizons and experiment with this approach in different applications. The goal is to deeply understand the choices of the paper, together with their applications to subjects related to security and identification. We mainly focus on the inference part, using a TensorFlow implementation adapted from [this repo](https://github.com/cydonia999/Tiny_Faces_in_Tensorflow).
11 |
12 |
13 |
14 | ### Face detection benchmark
15 | First, we aim at comparing the Tiny Faces algorithm with other face detection models.
16 | We use two particular sub-folders of the WIDERFACE dataset (*Parade* and *Dresses*) to compare our model with Faster R-CNN (using [MXNet](https://github.com/tornadomeet/mxnet-face)), MTCNN [6] (using [MXNet](https://github.com/pangyupo/mxnet_mtcnn_face_detection)), Haar Cascade [2] and HOG [3] for face detection.
17 | This benchmark can be found in this [notebook](https://github.com/alexattia/ExtendedTinyFaces/blob/master/notebooks/Face%20Detection%20algorithms%20comparison.ipynb); a sketch of the comparison loop is shown below.
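
A minimal sketch of such a comparison (the WIDERFACE folder, weight file path and cascade file are assumptions; `evaluate.evaluate` is this repo's inference helper):

```python
import glob
import cv2
import evaluate

folder = './data/widerface/WIDER_val/images/2--Demonstration/'  # assumed local path
haar = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

for path in glob.glob(folder + '*.jpg'):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    # Tiny Faces detections: list of [x1, y1, x2, y2] boxes
    tiny_boxes = evaluate.evaluate(weight_file_path='./hr_res101.pkl', img=img)
    # Haar cascade detections on the grayscale image, for comparison
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    haar_boxes = haar.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print('%s: Tiny Faces %d vs Haar %d' % (path, len(tiny_boxes), len(haar_boxes)))
```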
18 | 
19 |
20 | ### Image resolution influence
21 | The performance of the Tiny Faces algorithm is tied to the image resolution: as explained in the original paper, this parameter strongly affects face detection. We ran the inference part and plotted how the number of detected faces varies as the image is downscaled, as in the sketch below.
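
A minimal sketch of the downscaling experiment (the image and weight file paths are assumptions):

```python
import cv2
import evaluate

img = cv2.cvtColor(cv2.imread('./data/demo.jpg'), cv2.COLOR_BGR2RGB)  # assumed picture
counts = []
for scale in [1.0, 0.8, 0.6, 0.4, 0.2]:
    small = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    boxes = evaluate.evaluate(weight_file_path='./hr_res101.pkl', img=small)
    counts.append((scale, len(boxes)))
print(counts)  # (downscale factor, number of detected faces)
```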
22 |
23 |
24 | ### Face Recognition
25 | Face recognition can be another application of the paper. Thus, we aim at building a Python pipeline for face recognition.
26 | We would like to use face alignment [4] and face embedding [5] to achieve face classification.
27 | The first application we would like to explore is counting the distinct faces
28 | (numerous people displayed at different sizes in the picture) in a video of a crowded public demonstration.
29 | This application can be found in this
30 | [notebook](https://github.com/alexattia/ExtendedTinyFaces/blob/master/notebooks/Counting%20in%20video.ipynb).
31 | To achieve it, we have to match people from one frame to the next so that no person is counted twice. The matching is achieved with face recognition, while the counting relies on face detection; we used a linear SVM for the face classification. The counting logic is sketched below.
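
A minimal sketch of the matching and incremental counting logic (assuming `images` is a list of frame groups and `all_detections` the corresponding per-frame Tiny Faces detections, as built in the notebook):

```python
import detect

threshold = 0.55  # probability above which two detections are deemed the same person
matcheds = []
for frames, detections in zip(images, all_detections):
    matched = 0
    for p in range(len(detections[0])):
        # binary SVM trained on face p of the group's first frame,
        # then applied to the detections of the group's last frame
        _, probas = detect.train_binclas(frames, detections, p)
        if probas[:, 1].max() > threshold:
            matched += 1
    matcheds.append(matched)

# a face is counted once it is no longer re-found in the last frame of its group;
# the faces still visible at the very end are added once
total = sum(len(d[0]) - m for d, m in zip(all_detections, matcheds))
total += len(all_detections[-1][3])
print('Distinct faces counted:', total)
```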
32 | 
33 |
34 | ### Repository organisation
35 | - [notebooks](https://github.com/alexattia/ExtendedTinyFaces/tree/master/notebooks): notebooks folder with the different applications and experiments
36 | - [detect.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/detect.py): people matching across frames, used to count people (cf. the Counting in video notebook)
37 | - [evaluate.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/evaluate.py): inference function, detecting faces in one (or multiple) pictures
38 | - [metrics.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/metrics.py): IoU (Jaccard) metric and benchmark statistics
39 | - [tiny_faces_model.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/tiny_faces_model.py): Tiny Faces model
40 | - [util.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/util.py): miscellaneous helpers to overlay bounding boxes
40 |
41 | ### References
42 | [[1]](https://arxiv.org/abs/1612.04402) Peiyun Hu and Deva Ramanan. Finding Tiny Faces. 2017.
43 | [[2]](https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf) P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. 2001.
44 | [[3]](http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf) Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. 2005.
45 | [[4]](https://pdfs.semanticscholar.org/d78b/6a5b0dcaa81b1faea5fb0000045a62513567.pdf) Vahid Kazemi and Josephine Sullivan. One Millisecond Face Alignment with an Ensemble of Regression Trees. 2014.
46 | [[5]](https://arxiv.org/abs/1503.03832) Florian Schroff, Dmitry Kalenichenko and James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. 2015.
47 | [[6]](https://arxiv.org/abs/1604.02878) Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li and Yu Qiao. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. 2016.
48 |
49 |
--------------------------------------------------------------------------------
/data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/.DS_Store
--------------------------------------------------------------------------------
/data/92_TinyFaces.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/92_TinyFaces.png
--------------------------------------------------------------------------------
/data/Downscale_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/Downscale_graph.png
--------------------------------------------------------------------------------
/data/Downscaling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/Downscaling.png
--------------------------------------------------------------------------------
/data/TotalIncrementalCount.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/TotalIncrementalCount.gif
--------------------------------------------------------------------------------
/data/benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/benchmark.png
--------------------------------------------------------------------------------
/detect.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import dlib
3 | from imgaug import augmenters as iaa
4 | import pandas as pd
5 | from sklearn.svm import SVC
6 | import random
7 |
8 | face_encoder = dlib.face_recognition_model_v1('./model/dlib_face_recognition_resnet_model_v1.dat')
9 | face_pose_predictor = dlib.shape_predictor('./model/shape_predictor_68_face_landmarks.dat')
10 |
11 | def encoding_faces(images, label, coord_detect):
12 |     """
13 |     Encode a list of faces with dlib's ResNet face descriptor (FaceNet-style), one 128-D vector per face
14 |     :param images: list of face images to encode
15 |     :param label: label appended to each encoding (e.g. 1 for positive, 0 for negative)
16 |     :param coord_detect: coordinates of the detections
17 |     :return: numpy array with one row per face: the 128 encoding values followed by the label
18 |     """
18 | l = []
19 | for img, d in zip(images, coord_detect):
20 | (x1, y1, x2, y2) = d
21 | detected_face = dlib.rectangle(left=0, top=0, right=int(x2-x1), bottom=int(y2-y1))
22 | pose_landmarks = face_pose_predictor(img, detected_face)
23 | face_encoding = face_encoder.compute_face_descriptor(img, pose_landmarks, 1)
24 | l.append(np.append(face_encoding, [label]))
25 |
26 | return np.array(l)
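    # Note: dlib's ResNet face descriptor is 128-D, so each returned row has
    # 129 values: the 128-D embedding followed by the label column.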
27 |
28 | def create_positive_set(pictures, coords, label=1):
29 |     """
30 |     Create a positive train set for one face from a list of three pictures of this face.
31 |     Data augmentation on the three crops (each repeated five times) generates 15 pictures;
32 |     the original crop is appended, so 16 pictures are encoded in total.
33 |     :param pictures: list of three full pictures
34 |     :param coords: coordinates of the face in the first frame
35 |     :return: pandas dataframe of the encoded faces for one person (label defaults to 1)
36 |     """
37 | # original coordinate
38 | x1, y1, x2, y2 = coords
39 | # Load the three same faces
40 | images = [pictures[j][y1:y2,x1:x2,:] for j in range(3)]
41 |     # repeat each of the three crops five times
42 | images = [item for item in images for i in range(5)]
43 | # Sometimes(0.5, ...) applies the given augmenter in 50% of all cases,
44 | st = lambda aug: iaa.Sometimes(0.5, aug)
45 | # add a random value from the range (-30, 30) to the first three channels and gaussian noise
46 | aug = iaa.Sequential([
47 | iaa.WithChannels( channels=[0, 1, 2], children=iaa.Add((-30, 30))),
48 | st(iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5))])
49 |
50 | images_aug = aug.augment_images(images)
51 | # add the original frame
52 | images_aug.append(pictures[0][y1:y2,x1:x2,:])
53 |
54 |     # encode each of the 16 faces (all of the same person)
55 | coords_detect = [coords for k in range(len(images_aug))]
56 | return pd.DataFrame(encoding_faces(images_aug, label, coords_detect))
57 |
58 | def train_binclas(pics, detections, idx_detection):
59 | """
60 |     Create a train set and train a binary SVM to classify faces for one original
61 | face (from frame 0)
62 | """
63 | pos = create_positive_set(pics, detections[0][idx_detection])
64 |
65 | # Choose 10 other detections from the first frame
66 | neg_detect = np.array([k for i, k in enumerate(detections[0]) if i != idx_detection])
67 | idx_neg = random.sample(range(len(neg_detect)), 10)
68 |
69 | # Get face images for the 10 detections
70 | img_neg = [pics[0][y1_:y2_,x1_:x2_,:] for (x1_, y1_, x2_, y2_) in neg_detect[idx_neg]]
71 | # Encode each face
72 | neg = pd.DataFrame(encoding_faces(img_neg, 0, neg_detect[idx_neg]))
73 |
74 | # join positive and negative samples
75 | df = pd.concat([pos, neg])
76 | df = df.sample(len(df)).reset_index(drop=True)
77 | y = df[128]
78 | X = df.drop(128, axis=1)
79 |
80 | # training
81 | clf = SVC(C=1, kernel='linear', probability=True)
82 | clf.fit(X, y)
83 |     # keep the 4th picture's detections located in the neighborhood of the original face
84 | x1, y1, x2, y2 = detections[0][idx_detection]
85 |     neigh_detect = [k for k in detections[-1] if
86 | np.abs(k[0]-x1) < 600 and
87 | np.abs(k[1]-y1) < 600 and
88 | np.abs(k[2]-x2) < 600 and
89 | np.abs(k[3]-y2) < 600]
90 |
91 | # Get face images to classify
92 |     img_neighb = [pics[-1][y1_:y2_,x1_:x2_,:] for (x1_, y1_, x2_, y2_) in neigh_detect]
93 | # Encode each face
94 | neigh_detect_encodings = encoding_faces(img_neighb, -1, neigh_detect)[:,:128]
95 |     # compute the matching probability for each neighbouring face
96 | distances = clf.predict_proba(neigh_detect_encodings)
97 |
98 | return neigh_detect, distances
99 |
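# Example usage (a minimal sketch; `frames` is a list of four RGB frames and
# `detections` the matching per-frame lists of (x1, y1, x2, y2) boxes from
# evaluate.evaluate -- both assumptions):
#
#   neigh_detect, probas = train_binclas(frames, detections, idx_detection=0)
#   if probas[:, 1].max() > 0.55:
#       print("Face 0 of frame 0 was re-identified in the last frame")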
--------------------------------------------------------------------------------
/evaluate.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import tensorflow as tf
3 | import tiny_faces_model as tiny_model
4 | import util
5 | from argparse import ArgumentParser
6 | import cv2
7 | import scipy.io
8 | import numpy as np
9 | import matplotlib.pyplot as plt
11 | import pickle
12 |
13 | import pylab as pl
14 | import time
15 | import os
16 | import sys
17 | from scipy.special import expit
18 | import glob
19 |
20 | MAX_INPUT_DIM = 5000.0
21 |
22 | def evaluate(weight_file_path, output_dir=None, data_dir=None, img=None, list_imgs=None,
23 | prob_thresh=0.5, nms_thresh=0.1, lw=3, display=False,
24 | draw=True, save=True, print_=0):
25 | """
26 | Detect faces in images.
27 | :param weight_file_path: A pretrained weight file in the pickle format
28 | generated by matconvnet_hr101_to_tf.py.
29 | :param output_dir: A directory into which images with detected faces are output.
30 | default=None to not output detected faces
31 | :param data_dir: A directory which contains images for face detection.
32 |     :param img: One image (numpy array) for face detection
33 |     :param list_imgs: A list of images for face detection
34 |     :param prob_thresh: The threshold of detection confidence.
35 |     :param nms_thresh: The overlap threshold of non maximum suppression
36 |     :param lw: Line width of bounding boxes. If zero specified,
37 |             this is determined based on confidence of each detection.
38 |     :param display: Display tiny face images on window.
39 |     :param draw: Draw bounding boxes on images.
40 |     :param save: Save images in output_dir.
41 |     :param print_: 0 for no print, 1 for light print, 2 for full print
42 |     :return: final bboxes
43 |     """
44 |     # a single image was passed in directly, rather than a directory or list
45 |     one_pic = isinstance(img, np.ndarray)
47 |
48 | if not output_dir:
49 | save = False
50 | draw = False
51 |
52 | # list of bounding boxes for the pictures
53 | final_bboxes = []
54 |
55 | # placeholder of input images. Currently batch size of one is supported.
56 | x = tf.placeholder(tf.float32, [1, None, None, 3]) # n, h, w, c
57 |
58 |     # Create the tiny face model, whose weights are loaded from a pretrained file.
59 | model = tiny_model.Model(weight_file_path)
60 | score_final = model.tiny_face(x)
61 |
62 | # Load an average image and clusters(reference boxes of templates).
63 | with open(weight_file_path, "rb") as f:
64 | _, mat_params_dict = pickle.load(f)
65 |
66 | # Average RGB values from model
67 | average_image = model.get_data_by_key("average_image")
68 |
69 |     # Reference boxes of templates for 0.5x, 1x, and 2x scales
70 | clusters = model.get_data_by_key("clusters")
71 | clusters_h = clusters[:, 3] - clusters[:, 1] + 1
72 | clusters_w = clusters[:, 2] - clusters[:, 0] + 1
73 | normal_idx = np.where(clusters[:, 4] == 1)
74 |
75 | # Find image files in data_dir.
76 | filenames = []
77 | # if we provide only one picture, no need to list files in dir
78 | if one_pic:
79 | filenames = [img]
80 | elif type(list_imgs) == list:
81 | filenames = list_imgs
82 | else:
83 | for ext in ('*.png', '*.gif', '*.jpg', '*.jpeg'):
84 | filenames.extend(glob.glob(os.path.join(data_dir, ext)))
85 |
86 | # main
87 | with tf.Session() as sess:
88 | sess.run(tf.global_variables_initializer())
89 | for filename in filenames:
90 | # if we provide only one picture, no need to list files in dir
91 | if not one_pic and type(list_imgs) != list:
92 | fname = filename.split(os.sep)[-1]
93 | raw_img = cv2.imread(filename)
94 | raw_img = cv2.cvtColor(raw_img, cv2.COLOR_BGR2RGB)
95 | else:
96 | fname = 'current_picture'
97 | raw_img = filename
98 | raw_img_f = raw_img.astype(np.float32)
99 |
100 | def _calc_scales():
101 | """
102 | Compute the different scales for detection
103 | :return: [2^X] with X depending on the input image
104 | """
105 | raw_h, raw_w = raw_img.shape[0], raw_img.shape[1]
106 | min_scale = min(np.floor(np.log2(np.max(clusters_w[normal_idx] / raw_w))),
107 | np.floor(np.log2(np.max(clusters_h[normal_idx] / raw_h))))
108 | max_scale = min(1.0, -np.log2(max(raw_h, raw_w) / MAX_INPUT_DIM))
109 | scales_down = pl.frange(min_scale, 0, 1.)
110 | scales_up = pl.frange(0.5, max_scale, 0.5)
111 | scales_pow = np.hstack((scales_down, scales_up))
112 | scales = np.power(2.0, scales_pow)
113 | return scales
114 |
115 | scales = _calc_scales()
116 | start = time.time()
117 |
118 | # initialize output
119 | bboxes = np.empty(shape=(0, 5))
120 |
121 | # process input at different scales
122 | for s in scales:
123 | if print_ == 2:
124 | print("Processing {} at scale {:.4f}".format(fname, s))
125 | img = cv2.resize(raw_img_f, (0, 0), fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
126 | img = img - average_image
127 | img = img[np.newaxis, :]
128 |
129 |                 # we don't run every template at every scale; tids lists the templates to keep, the rest are ignored
130 | tids = list(range(4, 12)) + ([] if s <= 1.0 else list(range(18, 25)))
131 | ignoredTids = list(set(range(0, clusters.shape[0])) - set(tids))
132 |
133 | # run through the net
134 | score_final_tf = sess.run(score_final, feed_dict={x: img})
135 |
136 | # collect scores
137 | score_cls_tf, score_reg_tf = score_final_tf[:, :, :, :25], score_final_tf[:, :, :, 25:125]
138 | prob_cls_tf = expit(score_cls_tf)
139 | prob_cls_tf[0, :, :, ignoredTids] = 0.0
140 |
141 | def _calc_bounding_boxes():
142 | # threshold for detection
143 | _, fy, fx, fc = np.where(prob_cls_tf > prob_thresh)
144 |
145 | # interpret heatmap into bounding boxes
146 | cy = fy * 8 - 1
147 | cx = fx * 8 - 1
148 | ch = clusters[fc, 3] - clusters[fc, 1] + 1
149 | cw = clusters[fc, 2] - clusters[fc, 0] + 1
150 |
151 | # extract bounding box refinement
152 | Nt = clusters.shape[0]
153 | tx = score_reg_tf[0, :, :, 0:Nt]
154 | ty = score_reg_tf[0, :, :, Nt:2*Nt]
155 | tw = score_reg_tf[0, :, :, 2*Nt:3*Nt]
156 | th = score_reg_tf[0, :, :, 3*Nt:4*Nt]
157 |
158 | # refine bounding boxes
159 | dcx = cw * tx[fy, fx, fc]
160 | dcy = ch * ty[fy, fx, fc]
161 | rcx = cx + dcx
162 | rcy = cy + dcy
163 | rcw = cw * np.exp(tw[fy, fx, fc])
164 | rch = ch * np.exp(th[fy, fx, fc])
165 |
166 | scores = score_cls_tf[0, fy, fx, fc]
167 | tmp_bboxes = np.vstack((rcx - rcw / 2, rcy - rch / 2, rcx + rcw / 2, rcy + rch / 2))
168 | tmp_bboxes = np.vstack((tmp_bboxes / s, scores))
169 | tmp_bboxes = tmp_bboxes.transpose()
170 | return tmp_bboxes
171 |
172 | tmp_bboxes = _calc_bounding_boxes()
173 |                 bboxes = np.vstack((bboxes, tmp_bboxes))  # accumulated over scales, shape (N, 5)
174 |
175 | if print_ >= 1:
176 | print("time {:.2f} secs for {}".format(time.time() - start, fname))
177 |
178 | # non maximum suppression
179 | refind_idx = tf.image.non_max_suppression(tf.convert_to_tensor(bboxes[:, :4], dtype=tf.float32),
180 | tf.convert_to_tensor(bboxes[:, 4], dtype=tf.float32),
181 | max_output_size=bboxes.shape[0], iou_threshold=nms_thresh)
182 | refind_idx = sess.run(refind_idx)
183 | refined_bboxes = bboxes[refind_idx]
184 |
185 | # convert bbox coordinates to int
186 | # f_box = overlay_bounding_boxes(raw_img, refined_bboxes, lw, draw)
187 |             f_box = [[int(x) for x in r[:4]] for r in refined_bboxes]
188 |
189 | if display:
190 | # plt.axis('off')
191 | plt.imshow(raw_img)
192 | plt.show()
193 |
194 | if save:
195 | # save image with bounding boxes
196 | raw_img = cv2.cvtColor(raw_img, cv2.COLOR_RGB2BGR)
197 | cv2.imwrite(os.path.join(output_dir, fname), raw_img)
198 |
199 | final_bboxes.append(f_box)
200 |
201 | if len(final_bboxes) == 1:
202 | final_bboxes = final_bboxes[0]
203 | return final_bboxes
204 |
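if __name__ == "__main__":
    # Minimal usage sketch (both paths are assumptions): detect faces in a single
    # RGB image and print how many were found.
    sample = cv2.imread("./data/92_TinyFaces.png")
    sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
    boxes = evaluate(weight_file_path="./hr_res101.pkl", img=sample, print_=1)
    print("%d faces detected" % len(boxes))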
--------------------------------------------------------------------------------
/metrics.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import glob
4 | data_folder = './tiny/data/widerface/WIDER_val/images/'
5 |
6 | def get_folder_name(pic):
7 | """
8 | Get folder name from the picture name
9 |     1_Handshaking_Handshaking_1_411.jpg --> 1--Handshaking/
10 | :param pic: picture name
11 | :return: folder name
12 | """
13 | x = pic.split('_')[1:3]
14 | s = pic.split('_')[0]+ '--'+ '_'.join(sorted(set(x), key=x.index)) + '/'
15 |
16 | if 'Demonstration' in s:
17 | try:
18 | s = s[:s.index('_')] + '/'
19 | except ValueError:
20 | pass
21 | return s
22 |
23 | def jaccard_distance(boxA, boxB):
24 | """
25 | Calculate the Intersection over Union (IoU) of two bounding boxes.
26 |     :param boxA: list [x1, y1, x2, y2]
27 |         The (x1, y1) position is at the top left corner,
28 |         the (x2, y2) position is at the bottom right corner
29 |     :param boxB: list [x1, y1, x2, y2]
30 |         The (x1, y1) position is at the top left corner,
31 |         the (x2, y2) position is at the bottom right corner
32 |     :return: IoU as a float in [0, 1]; values below the 0.5 threshold are mapped to 0
33 | """
34 | # determine the (x, y)-coordinates of the intersection rectangle
35 | xA = max(boxA[0], boxB[0])
36 | yA = max(boxA[1], boxB[1])
37 | xB = min(boxA[2], boxB[2])
38 | yB = min(boxA[3], boxB[3])
39 |
40 | # compute the area of intersection rectangle
41 | interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
42 |
43 | # compute the area of both the prediction and ground-truth
44 | # rectangles
45 | boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
46 | boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
47 |
48 | # compute the intersection over union by taking the intersection
49 | # area and dividing it by the sum of prediction + ground-truth
50 | # areas - the interesection area
51 | iou = interArea / float(boxAArea + boxBArea - interArea)
52 |
53 | # return the intersection over union value
54 | return iou * (iou > 0.5)
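    # Worked example: boxA = [0, 0, 9, 9] and boxB = [5, 5, 14, 14] are two 10x10
    # boxes; intersection = 5*5 = 25, union = 100 + 100 - 25 = 175, so
    # IoU = 25/175 ~ 0.14, which is below the 0.5 threshold and returned as 0.0.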
55 |
56 | def find_best_bbox(box, predicted_boxes):
57 | """
58 | Find the corresponding predicted bounding box
59 | compared to the ground truth
60 | :param box: ground truth bounding box
61 | :param predicted_boxes: list of predicted bounding boxes
62 | :return: index of the corresponding bbox, jaccard distance
63 | """
64 | if type(box) == list:
65 | box = ' '.join(map(str,box))
66 | (x1, y1, w, h) = map(int,box.split()[:4])
67 | boxA = [x1, y1, x1+w, y1+h]
68 | l = []
69 | # boxB : [x1, x2, y1, y2] (top-left and bottom-right)
70 | for boxB in predicted_boxes:
71 | l.append(jaccard_distance(boxA, boxB))
72 | if len(l) > 0:
73 | return np.argmax(l), np.max(l)
74 | else:
75 | return -1, 0
76 |
77 | def mean_jaccard(truth_boxes, predicted_boxes, only_tp=True, blurred=0):
78 | """
79 | Compute the average Jaccard distance for the bounding boxes of
80 | one picture.
81 | :param truth_boxes: ground truth bounding boxes
82 | :param predicted_boxes: predicted bounding boxes
83 |     :param only_tp: boolean to only keep true positive bounding boxes
84 |     :param blurred: minimum blur level of the ground truth faces to keep (0, 1 or 2)
85 |     :return: mean jaccard and number of TP (None, if no TP found)
85 | """
86 | l = []
87 | for truth_box in truth_boxes:
88 | if int(truth_box.split()[4]) >= blurred:
89 | _, jd = find_best_bbox(truth_box, predicted_boxes)
90 | l.append(jd)
91 | if only_tp:
92 | l = [k for k in l if k > 0]
93 | if len(l) > 0:
94 | return np.mean(l), len(l)
95 |
96 | def compute_stats(data_dir, truth, predictions, blurred=0):
97 | """
98 | Compute the mean Jaccard distance and the ratio of predicted bounding
99 | boxes compared to the number of actual bounding boxes
100 | :param data_dir: directory path with the pictures
101 | :param truth: dict of actual annotations of the bounding boxes
102 | d[name] = [(x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose)]
103 | :param predictions: list of predicted bounding boxes
104 | keeping the same order of glob.glob(pictures folder)
105 | :param blurred: 0 for all faces, 1 for normal blurred faces, 2 for heavy blurred faces
106 | :return: (len(pictures), 4) numpy array and the corresponding panda DataFrame
107 | ['mean Jaccard', 'Nb_Truth_Bboxes', 'Nb_Pred_Bboxes', 'Ratio_Bboxes']
108 | """
109 | pictures = glob.glob(data_dir + '*')
110 | n_pictures = len(pictures)
111 | jaccard, n_truth_boxes, n_pred_boxes = [], [], []
112 | a = np.zeros((n_pictures,4))
113 |
114 | for idx in range(n_pictures):
115 | truth_boxes = truth[pictures[idx].replace(data_folder, '')]
116 | temp = mean_jaccard(truth_boxes, predictions[idx], blurred=blurred)
117 | if temp:
118 | mean_jac, nb_pred = temp
119 | else:
120 | mean_jac, nb_pred = None, 0
121 | jaccard.append(mean_jac)
122 | n_truth_boxes.append(len([k for k in truth_boxes if int(k.split()[4]) >= blurred]))
123 | n_pred_boxes.append(nb_pred)
124 |
125 | a[:,0] = jaccard
126 | a[:,1] = n_truth_boxes
127 | a[:,2] = n_pred_boxes
128 | a[:,3] = a[:,2]/a[:,1]
129 | df = pd.DataFrame(a, columns=['mJaccard', 'Nb_Truth_Bboxes', 'Nb_Pred_Bboxes', 'Ratio_Bboxes'])
130 | df['Folder'] = data_dir.replace(data_folder, '')
131 | return a, df
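
# Example usage (a minimal sketch; `truth` holds the WIDERFACE annotations and
# `predictions` the per-picture outputs of evaluate.evaluate -- both assumptions):
#
#   arr, stats = compute_stats(data_folder + '2--Demonstration/', truth, predictions)
#   print(stats[['mJaccard', 'Ratio_Bboxes']].mean())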
132 |
--------------------------------------------------------------------------------
/notebooks/Counting in video.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 2,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import sys\n",
10 | "#sys.path.append('./Tiny_Faces_in_Tensorflow/')\n",
11 | "#import tiny_face_eval as tiny\n",
12 | "import evaluate\n",
13 | "from metrics import *\n",
14 | "import tensorflow as tf\n",
15 | "import matplotlib.pyplot as plt\n",
16 | "import matplotlib.patches as patches\n",
17 | "from sklearn.metrics import mean_squared_error as mse\n",
18 | "import glob\n",
19 | "import os\n",
20 | "import cv2\n",
21 | "import pandas as pd\n",
22 | "from sklearn.svm import SVC\n",
23 | "import numpy as np\n",
24 | "import imp\n",
25 | "import time\n",
26 | "import random\n",
27 | "import detect\n",
28 | "import dlib\n",
29 | "from imgaug import augmenters as iaa\n",
30 | "#imp.reload(tiny)\n",
31 | "imp.reload(detect)\n",
32 | "%matplotlib inline"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 2,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "weights_path = './Tiny_Faces_in_Tensorflow/hr_res101.pkl'"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "## Saving frames"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 3,
54 | "metadata": {},
55 | "outputs": [],
56 | "source": [
57 | "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n",
58 | "fps = cap.get(cv2.CAP_PROP_FPS)\n",
59 | "initial_target = int(45 * fps) + 10\n",
60 | "final_target = int(49 * fps) + 10\n",
61 | "i = 0\n",
62 | "frames = []\n",
63 | "while(True):\n",
64 | " ret, frame = cap.read()\n",
65 | " i +=1 \n",
66 | " if i in range(initial_target, final_target+10):\n",
67 | " frames.append(frame[:,:,::-1])\n",
68 | " if i == final_target:\n",
69 | " break"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 4,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "images = []\n",
79 | "for k in range(0, len(frames), 10):\n",
80 | " try:\n",
81 | " imgs = [frames[k], frames[k+1], frames[k+2], frames[k+10]]\n",
82 | " except IndexError:\n",
83 | " imgs = [frames[k], frames[k+1], frames[k+2], frames[len(frames)-1]]\n",
84 | " images.append(imgs)"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "## Detection"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 5,
97 | "metadata": {},
98 | "outputs": [],
99 | "source": [
100 | "all_detections = []\n",
101 | "for frames in images:\n",
102 | " detections = []\n",
103 | " for frame in frames:\n",
104 | " with tf.Graph().as_default():\n",
105 | " b = evaluate.evaluate(weight_file_path=weights_path, img=frame)\n",
106 | " detections.append(b)\n",
107 | " all_detections.append(detections)"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "## Matching "
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 37,
120 | "metadata": {
121 | "scrolled": false
122 | },
123 | "outputs": [
124 | {
125 | "name": "stdout",
126 | "output_type": "stream",
127 | "text": [
128 | "It took 15.8 sec i.e 0.25/detection\n",
129 | "It took 16.8 sec i.e 0.27/detection\n",
130 | "It took 19.1 sec i.e 0.27/detection\n",
131 | "It took 18.6 sec i.e 0.26/detection\n",
132 | "It took 17.7 sec i.e 0.26/detection\n",
133 | "It took 17.6 sec i.e 0.24/detection\n",
134 | "It took 18.8 sec i.e 0.26/detection\n",
135 | "It took 22.0 sec i.e 0.27/detection\n",
136 | "It took 24.5 sec i.e 0.28/detection\n",
137 | "It took 25.3 sec i.e 0.28/detection\n"
138 | ]
139 |     }
140 |    ],
152 | "source": [
153 | "threshold = 0.55\n",
154 | "matcheds = []\n",
155 | "t0 = time.time()\n",
156 | "for j in range(len(images)):\n",
157 | " frames = images[j]\n",
158 | " detections = all_detections[j]\n",
159 | " matched = 0\n",
160 | " t0bis = time.time()\n",
161 | " for p in range(len(detections[0])):\n",
162 | " neigh_detect, distances = detect.train_binclas(frames, detections, p)\n",
163 | " idx_max, val_max = np.argmax(distances[:,1]), np.max(distances[:,1])\n",
164 | " if val_max > threshold:\n",
165 | " matched += 1\n",
166 | " matcheds.append(matched)\n",
167 | " t1 = time.time()\n",
168 | " print('It took %.1f sec i.e %.2f/detection' % (t1-t0bis, (t1-t0bis)/len(detections[0])))\n",
169 | "print('Total : %.1f' % (time.time() - t0))"
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "## Counting"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 62,
182 | "metadata": {},
183 | "outputs": [],
184 | "source": [
185 | "s = 0\n",
186 | "for j in range(10):\n",
187 | " detections = all_detections[j]\n",
188 | " s += len(detections[0]) - matcheds[j]\n",
189 | "s += len(detections[3])"
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": 63,
195 | "metadata": {},
196 | "outputs": [
197 | {
198 | "data": {
199 | "text/plain": [
200 | "141"
201 | ]
202 | },
203 | "execution_count": 63,
204 | "metadata": {},
205 | "output_type": "execute_result"
206 | }
207 | ],
208 | "source": [
209 | "s"
210 | ]
211 | },
212 | {
213 | "cell_type": "markdown",
214 | "metadata": {},
215 | "source": [
216 |     "## Gif Production with counting"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": 78,
222 | "metadata": {},
223 | "outputs": [],
224 | "source": [
225 | "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n",
226 | "fps = cap.get(cv2.CAP_PROP_FPS)\n",
227 | "initial_target = int(45 * fps) + 10\n",
228 | "final_target = int(49 * fps) + 10\n",
229 | "i = 0\n",
230 | "frames = []\n",
231 | "while(True):\n",
232 | " ret, frame = cap.read()\n",
233 | " i +=1 \n",
234 | " if i in range(initial_target, final_target, 1):\n",
235 | " frames.append(frame[:,:,::-1])\n",
236 | " if i == final_target:\n",
237 | " break"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 79,
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "detections = []\n",
247 | "for i, frame in enumerate(frames):\n",
248 | " with tf.Graph().as_default():\n",
249 |     "        b = evaluate.evaluate(weight_file_path=weights_path, img=frame,\n",
250 |     "                              prob_thresh=0.5, nms_thresh=0.1, lw=3,\n",
251 |     "                              display=False, save=False, draw=False, print_=0)\n",
252 |     "    detections.append(b)\n",
253 | " time.sleep(0.5)"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": 72,
259 | "metadata": {},
260 | "outputs": [],
261 | "source": [
262 | "## Computing incremental count\n",
263 | "nbs = []\n",
264 | "init = len(all_detections[0][0])\n",
265 | "for j in range(1, 10):\n",
266 | " nbs.append(init)\n",
267 | " detections_ = all_detections[j]\n",
268 | " init += len(detections_[0]) - matcheds[j-1]\n",
269 | "init += len(detections_[3]) - matcheds[j]\n",
270 | "nbs.append(init)"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 117,
276 | "metadata": {},
277 | "outputs": [],
278 | "source": [
279 | "k = 0\n",
280 | "l = 0\n",
281 | "images = []\n",
282 | "ff = []\n",
283 | "font = cv2.FONT_HERSHEY_SIMPLEX\n",
284 | "for j, frame in enumerate(frames):\n",
285 | " img = frame.copy()\n",
286 | " for detect_ in detections[j]:\n",
287 | " pt1, pt2 = tuple(detect_[:2]), tuple(detect_[2:])\n",
288 | " cv2.rectangle(img, pt1, pt2, (255, 0, 0), 2)\n",
289 | " cv2.putText(img, 'Incremental count : %d' % nbs[l], (1750,1300), font, 1.5, (0, 255, 0), 3)\n",
290 | " if j in range(10, 89, 9):\n",
291 | " l += 1\n",
292 | " images.append(img) \n",
293 | " cv2.imwrite('./output_video/frames_%05d.png' % j, img[:,:,::-1])"
294 | ]
295 | },
296 | {
297 | "cell_type": "markdown",
298 | "metadata": {},
299 | "source": [
300 | "## Gif Production without counting"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 33,
306 | "metadata": {},
307 | "outputs": [],
308 | "source": [
309 | "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n",
310 | "fps = cap.get(cv2.CAP_PROP_FPS)\n",
311 | "initial_target = int(45 * fps) + 10\n",
312 | "final_target = int(49 * fps) + 10\n",
313 | "i = 0\n",
314 | "frames = []\n",
315 | "while(True):\n",
316 | " ret, frame = cap.read()\n",
317 | " i +=1 \n",
318 | " if i in range(initial_target, final_target, 1):\n",
319 | " frames.append(frame[:,:,::-1])\n",
320 | " if i == final_target:\n",
321 | " break"
322 | ]
323 | },
324 | {
325 | "cell_type": "code",
326 | "execution_count": 32,
327 | "metadata": {},
328 | "outputs": [],
329 | "source": [
330 | "detections = []\n",
331 | "for i, frame in enumerate(frames):\n",
332 | " with tf.Graph().as_default():\n",
333 |     "        b = evaluate.evaluate(weight_file_path=weights_path, img=frame,\n",
334 |     "                              prob_thresh=0.5, nms_thresh=0.1, lw=3,\n",
335 |     "                              display=False, save=False, draw=False, print_=0)\n",
336 |     "    detections.append(b)\n",
337 | " time.sleep(0.5)"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": 41,
343 | "metadata": {},
344 | "outputs": [],
345 | "source": [
346 | "k = 0\n",
347 | "images = []\n",
348 | "for j, frame in enumerate(frames):\n",
349 | " img = frame.copy()\n",
350 | " for detect_ in detections[k]:\n",
351 | " pt1, pt2 = tuple(detect_[:2]), tuple(detect_[2:])\n",
352 | " cv2.rectangle(img, pt1, pt2, (255, 0, 0), 2)\n",
353 | " images.append(img)\n",
354 | " if j in range(0, 94, 2):\n",
355 | " k += 1\n",
356 | " cv2.imwrite('./output_video/frame_%05d.png' % j, img[:,:,::-1])"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": null,
362 | "metadata": {},
363 | "outputs": [],
364 | "source": []
365 | }
366 | ],
367 | "metadata": {
368 | "kernelspec": {
369 | "display_name": "TensorFlow",
370 | "language": "python",
371 | "name": "tf_env"
372 | },
373 | "language_info": {
374 | "codemirror_mode": {
375 | "name": "ipython",
376 | "version": 3
377 | },
378 | "file_extension": ".py",
379 | "mimetype": "text/x-python",
380 | "name": "python",
381 | "nbconvert_exporter": "python",
382 | "pygments_lexer": "ipython3",
383 | "version": "3.6.3"
384 | }
385 | },
386 | "nbformat": 4,
387 | "nbformat_minor": 2
388 | }
389 |
--------------------------------------------------------------------------------
/tiny_faces_model.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | from __future__ import absolute_import
3 | from __future__ import division
4 | from __future__ import print_function
5 |
6 | import tensorflow as tf
7 | import numpy as np
8 | import pickle
9 |
10 | class Model():
11 | def __init__(self, weight_file_path):
12 |         """Load the pretrained Tiny Faces weights from a pickle file.
13 | Args:
14 | weight_file_path:
15 | A pretrained weight file in the pickle format
16 | generated by matconvnet_hr101_to_tf.py.
17 | Returns:
18 | None.
19 | """
20 | self.dtype = tf.float32
21 | self.weight_file_path = weight_file_path
22 | with open(self.weight_file_path, "rb") as f:
23 | self.mat_blocks_dict, self.mat_params_dict = pickle.load(f)
24 |
25 | def get_data_by_key(self, key):
26 | """Helper to access a pretrained model data through a key."""
27 | assert key in self.mat_params_dict, "key: " + key + " not found."
28 | return self.mat_params_dict[key]
29 |
30 | def _weight_variable_on_cpu(self, name, shape):
31 | """Helper to create a weight Variable stored on CPU memory.
32 |
33 | Args:
34 | name: name of the variable.
35 | shape: list of ints: (height, width, channel, filter).
36 |
37 | Returns:
38 |       the created Variable.
39 | """
40 | assert len(shape) == 4
41 |
42 | weights = self.get_data_by_key(name + "_filter") # (h, w, channel, filter)
43 | assert list(weights.shape) == shape
44 | initializer = tf.constant_initializer(weights, dtype=self.dtype)
45 |
46 | with tf.device('/cpu:0'):
47 | var = tf.get_variable(name + "_w", shape, initializer=initializer, dtype=self.dtype)
48 | return var
49 |
50 | def _bias_variable_on_cpu(self, name, shape):
51 | """Helper to create a bias Variable stored on CPU memory.
52 |
53 | Args:
54 | name: name of the variable.
55 | shape: int, filter size.
56 |
57 | Returns:
58 |       the created Variable.
59 | """
60 | assert isinstance(shape, int)
61 | bias = self.get_data_by_key(name + "_bias")
62 | assert len(bias) == shape
63 | initializer = tf.constant_initializer(bias, dtype=self.dtype)
64 |
65 | with tf.device('/cpu:0'):
66 | var = tf.get_variable(name + "_b", shape, initializer=initializer, dtype=self.dtype)
67 | return var
68 |
69 |
70 | def _bn_variable_on_cpu(self, name, shape):
71 | """Helper to create a batch normalization Variable stored on CPU memory.
72 |
73 | Args:
74 | name: name of the variable.
75 | shape: int, filter size.
76 |
77 | Returns:
78 |       the created Variables (scale, offset, mean, variance).
79 | """
80 | assert isinstance(shape, int)
81 |
82 | name2 = "bn" + name[3:]
83 | if name.startswith("conv"):
84 | name2 = "bn_" + name
85 |
86 | scale = self.get_data_by_key(name2 + '_scale')
87 | offset = self.get_data_by_key(name2 + '_offset')
88 | mean = self.get_data_by_key(name2 + '_mean')
89 | variance = self.get_data_by_key(name2 + '_variance')
90 |
91 | with tf.device('/cpu:0'):
92 | initializer = tf.constant_initializer(scale, dtype=self.dtype)
93 | scale = tf.get_variable(name2 + "_scale", shape, initializer=initializer, dtype=self.dtype)
94 | initializer = tf.constant_initializer(offset, dtype=self.dtype)
95 | offset = tf.get_variable(name2 + "_offset", shape, initializer=initializer, dtype=self.dtype)
96 | initializer = tf.constant_initializer(mean, dtype=self.dtype)
97 | mean = tf.get_variable(name2 + "_mean", shape, initializer=initializer, dtype=self.dtype)
98 | initializer = tf.constant_initializer(variance, dtype=self.dtype)
99 | variance = tf.get_variable(name2 + "_variance", shape, initializer=initializer, dtype=self.dtype)
100 |
101 | return scale, offset, mean, variance
102 |
103 |
104 | def conv_block(self, bottom, name, shape, strides=[1,1,1,1], padding="SAME",
105 | has_bias=False, add_relu=True, add_bn=True, eps=1.0e-5):
106 | """Create a block composed of multiple layers:
107 | a conv layer
108 | a batch normalization layer
109 | an activation layer
110 |
111 | Args:
112 | bottom: A layer before this block.
113 | name: Name of the block.
114 | shape: List of ints: (height, width, channel, filter).
115 | strides: Strides of conv layer.
116 | padding: Padding of conv layer.
117 | has_bias: Whether a bias term is added.
118 | add_relu: Whether a ReLU layer is added.
119 | add_bn: Whether a batch normalization layer is added.
120 | eps: A small float number to avoid dividing by 0, used in a batch normalization layer.
121 | Returns:
122 | a block of layers
123 | """
124 | assert len(shape) == 4
125 |
126 | weight = self._weight_variable_on_cpu(name, shape)
127 | conv = tf.nn.conv2d(bottom, weight, strides, padding=padding)
128 | if has_bias:
129 | bias = self._bias_variable_on_cpu(name, shape[3])
130 |
131 | pre_activation = tf.nn.bias_add(conv, bias) if has_bias else conv
132 |
133 | if add_bn:
134 | # scale, offset, mean, variance = self._bn_variable_on_cpu("bn_" + name, shape[-1])
135 | scale, offset, mean, variance = self._bn_variable_on_cpu(name, shape[-1])
136 | pre_activation = tf.nn.batch_normalization(pre_activation, mean, variance, offset, scale, variance_epsilon=eps)
137 |
138 | relu = tf.nn.relu(pre_activation) if add_relu else pre_activation
139 |
140 | return relu
141 |
142 |
143 | def conv_trans_layer(self, bottom, name, shape, strides=[1,1,1,1], padding="SAME", has_bias=False):
144 | """Create a block composed of multiple layers:
145 | a transpose of conv layer
146 | an activation layer
147 |
148 | Args:
149 | bottom: A layer before this block.
150 | name: Name of the block.
151 | shape: List of ints: (height, width, channel, filter).
152 | strides: Strides of conv layer.
153 | padding: Padding of conv layer.
154 | has_bias: Whether a bias term is added.
156 | Returns:
157 | a block of layers
158 | """
159 | assert len(shape) == 4
160 |
161 | weight = self._weight_variable_on_cpu(name, shape)
162 | nb, h, w, nc = tf.split(tf.shape(bottom), num_or_size_splits=4)
163 | output_shape = tf.stack([nb, (h - 1) * strides[1] - 3 + shape[0], (w - 1) * strides[2] - 3 + shape[1], nc])[:, 0]
164 | conv = tf.nn.conv2d_transpose(bottom, weight, output_shape, strides, padding=padding)
165 | if has_bias:
166 | bias = self._bias_variable_on_cpu(name, shape[3])
167 |
168 | conv = tf.nn.bias_add(conv, bias) if has_bias else conv
169 |
170 | return conv
171 |
172 | def residual_block(self, bottom, name, in_channel, neck_channel, out_channel, trunk):
173 | """Create a residual block.
174 |
175 | Args:
176 | bottom: A layer before this block.
177 | name: Name of the block.
178 |           in_channel: number of channels in an input tensor.
179 |           neck_channel: number of channels in a bottleneck block.
180 |           out_channel: number of channels in an output tensor.
181 |           trunk: a tensor in an identity path.
182 | Returns:
183 | a block of layers
184 | """
185 | _strides = [1, 2, 2, 1] if name.startswith("res3a") or name.startswith("res4a") else [1, 1, 1, 1]
186 | res = self.conv_block(bottom, name + '_branch2a', shape=[1, 1, in_channel, neck_channel],
187 | strides=_strides, padding="VALID", add_relu=True)
188 | res = self.conv_block(res, name + '_branch2b', shape=[3, 3, neck_channel, neck_channel],
189 | padding="SAME", add_relu=True)
190 | res = self.conv_block(res, name + '_branch2c', shape=[1, 1, neck_channel, out_channel],
191 | padding="VALID", add_relu=False)
192 |
193 | res = trunk + res
194 | res = tf.nn.relu(res)
195 |
196 | return res
197 |
198 | def tiny_face(self, image):
199 | """Create a tiny face model.
200 |
201 | Args:
202 | image: an input image.
203 | Returns:
204 | a score tensor
205 | """
206 | img = tf.pad(image, [[0, 0], [3, 3], [3, 3], [0, 0]], "CONSTANT")
207 | conv = self.conv_block(img, 'conv1', shape=[7, 7, 3, 64], strides=[1, 2, 2, 1], padding="VALID", add_relu=True)
208 | pool1 = tf.nn.max_pool(conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
209 |
210 | res2a_branch1 = self.conv_block(pool1, 'res2a_branch1', shape=[1, 1, 64, 256], padding="VALID", add_relu=False)
211 | res2a = self.residual_block(pool1, 'res2a', 64, 64, 256, res2a_branch1)
212 | res2b = self.residual_block(res2a, 'res2b', 256, 64, 256, res2a)
213 | res2c = self.residual_block(res2b, 'res2c', 256, 64, 256, res2b)
214 |
215 | res3a_branch1 = self.conv_block(res2c, 'res3a_branch1', shape=[1, 1, 256, 512], strides=[1, 2, 2, 1], padding="VALID", add_relu=False)
216 | res3a = self.residual_block(res2c, 'res3a', 256, 128, 512, res3a_branch1)
217 |
218 | res3b1 = self.residual_block(res3a, 'res3b1', 512, 128, 512, res3a)
219 | res3b2 = self.residual_block(res3b1, 'res3b2', 512, 128, 512, res3b1)
220 | res3b3 = self.residual_block(res3b2, 'res3b3', 512, 128, 512, res3b2)
221 |
222 | res4a_branch1 = self.conv_block(res3b3, 'res4a_branch1', shape=[1, 1, 512, 1024], strides=[1, 2, 2, 1], padding="VALID", add_relu=False)
223 | res4a = self.residual_block(res3b3, 'res4a', 512, 256, 1024, res4a_branch1)
224 |
225 | res4b = res4a
226 | for i in range(1, 23):
227 | res4b = self.residual_block(res4b, 'res4b' + str(i), 1024, 256, 1024, res4b)
228 |
229 | score_res4 = self.conv_block(res4b, 'score_res4', shape=[1, 1, 1024, 125], padding="VALID",
230 | has_bias=True, add_relu=False, add_bn=False)
231 | score4 = self.conv_trans_layer(score_res4, 'score4', shape=[4, 4, 125, 125], strides=[1, 2, 2, 1], padding="SAME")
232 | score_res3 = self.conv_block(res3b3, 'score_res3', shape=[1, 1, 512, 125], padding="VALID",
233 | has_bias=True, add_bn=False, add_relu=False)
234 |
235 | bs, height, width = tf.split(tf.shape(score4), num_or_size_splits=4)[0:3]
236 | _size = tf.convert_to_tensor([height[0], width[0]])
237 | _offsets = tf.zeros([bs[0], 2])
238 | score_res3c = tf.image.extract_glimpse(score_res3, _size, _offsets, centered=True, normalized=False)
239 |
240 | score_final = score4 + score_res3c
241 | return score_final
242 |
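# Minimal usage sketch (the weight file path is an assumption):
#
#   x = tf.placeholder(tf.float32, [1, None, None, 3])  # one RGB image per batch
#   model = Model("./hr_res101.pkl")
#   score = model.tiny_face(x)  # stride-8 map: 25 class + 100 regression channels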
--------------------------------------------------------------------------------
/util.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | from scipy.special import expit
3 | import numpy as np
4 | import cv2
5 |
6 | def overlay_bounding_boxes(raw_img, refined_bboxes, lw, draw):
7 | """
8 | Overlay bounding boxes of face on images.
9 | :param raw_img: target image.
10 | :param refined_bboxes: Bounding boxes of detected faces.
11 | :param lw: Line width of bounding boxes. If zero specified,
12 | this is determined based on confidence of each detection.
13 |     :param draw: whether to actually draw the rectangles on raw_img
14 |     :returns: bounding boxes as lists of ints [x1, y1, x2, y2]
14 | """
15 |
16 | # Overlay bounding boxes on an image with the color based on the confidence.
17 | bboxes = []
18 | for r in refined_bboxes:
19 | _score = expit(r[4])
20 | cm_idx = int(np.ceil(_score * 255))
21 | rect_color = [int(np.ceil(x * 255)) for x in cm_data[cm_idx]] # parula
22 | _lw = lw
23 | if lw == 0: # line width of each bounding box is adaptively determined.
24 |             bw, bh = r[2] - r[0] + 1, r[3] - r[1] + 1  # box width and height
25 | _lw = 1 if min(bw, bh) <= 20 else max(2, min(3, min(bh / 20, bw / 20)))
26 | _lw = int(np.ceil(_lw * _score))
27 |
28 | _r = [int(x) for x in r[:4]]
29 | if draw:
30 | cv2.rectangle(raw_img, (_r[0], _r[1]), (_r[2], _r[3]), rect_color, _lw)
31 | bboxes.append([_r[0], _r[1], _r[2], _r[3]])
32 | return bboxes
33 |
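# Example usage (a minimal sketch; the image path is an assumption and
# `refined_bboxes` comes from the NMS step in evaluate.py):
#
#   img = cv2.cvtColor(cv2.imread("./data/92_TinyFaces.png"), cv2.COLOR_BGR2RGB)
#   boxes = overlay_bounding_boxes(img, refined_bboxes, lw=3, draw=True)
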
34 | # colormap parula borrowed from
35 | # https://github.com/BIDS/colormap/blob/master/fake_parula.py
36 | cm_data = [[ 0.26710521, 0.03311059, 0.6188155 ],
37 | [ 0.26493929, 0.04780926, 0.62261795],
38 | [ 0.26260545, 0.06084214, 0.62619176],
39 | [ 0.26009691, 0.07264411, 0.62951561],
40 | [ 0.25740785, 0.08360391, 0.63256745],
41 | [ 0.25453369, 0.09395358, 0.63532497],
42 | [ 0.25147146, 0.10384228, 0.6377661 ],
43 | [ 0.24822014, 0.11337029, 0.6398697 ],
44 | [ 0.24478105, 0.12260661, 0.64161629],
45 | [ 0.24115816, 0.131599 , 0.6429888 ],
46 | [ 0.23735836, 0.14038009, 0.64397346],
47 | [ 0.23339166, 0.14897137, 0.64456048],
48 | [ 0.22927127, 0.15738602, 0.64474476],
49 | [ 0.22501278, 0.16563165, 0.64452595],
50 | [ 0.22063349, 0.17371215, 0.64390834],
51 | [ 0.21616055, 0.18162302, 0.64290515],
52 | [ 0.21161851, 0.18936156, 0.64153295],
53 | [ 0.20703353, 0.19692415, 0.63981287],
54 | [ 0.20243273, 0.20430706, 0.63776986],
55 | [ 0.19784363, 0.211507 , 0.63543183],
56 | [ 0.19329361, 0.21852157, 0.63282872],
57 | [ 0.18880937, 0.2253495 , 0.62999156],
58 | [ 0.18442119, 0.23198815, 0.62695569],
59 | [ 0.18014936, 0.23844124, 0.62374886],
60 | [ 0.17601569, 0.24471172, 0.62040016],
61 | [ 0.17204028, 0.25080356, 0.61693715],
62 | [ 0.16824123, 0.25672163, 0.6133854 ],
63 | [ 0.16463462, 0.26247158, 0.60976836],
64 | [ 0.16123449, 0.26805963, 0.60610723],
65 | [ 0.15805279, 0.27349243, 0.60242099],
66 | [ 0.15509948, 0.27877688, 0.59872645],
67 | [ 0.15238249, 0.28392004, 0.59503836],
68 | [ 0.14990781, 0.28892902, 0.59136956],
69 | [ 0.14767951, 0.29381086, 0.58773113],
70 | [ 0.14569979, 0.29857245, 0.58413255],
71 | [ 0.1439691 , 0.30322055, 0.58058191],
72 | [ 0.14248613, 0.30776167, 0.57708599],
73 | [ 0.14124797, 0.31220208, 0.57365049],
74 | [ 0.14025018, 0.31654779, 0.57028011],
75 | [ 0.13948691, 0.32080454, 0.5669787 ],
76 | [ 0.13895174, 0.32497744, 0.56375063],
77 | [ 0.13863958, 0.32907012, 0.56060453],
78 | [ 0.138537 , 0.3330895 , 0.55753513],
79 | [ 0.13863384, 0.33704026, 0.55454374],
80 | [ 0.13891931, 0.34092684, 0.55163126],
81 | [ 0.13938212, 0.34475344, 0.54879827],
82 | [ 0.14001061, 0.34852402, 0.54604503],
83 | [ 0.14079292, 0.35224233, 0.54337156],
84 | [ 0.14172091, 0.35590982, 0.54078769],
85 | [ 0.14277848, 0.35953205, 0.53828312],
86 | [ 0.14395358, 0.36311234, 0.53585661],
87 | [ 0.1452346 , 0.36665374, 0.5335074 ],
88 | [ 0.14661019, 0.3701591 , 0.5312346 ],
89 | [ 0.14807104, 0.37363011, 0.52904278],
90 | [ 0.1496059 , 0.3770697 , 0.52692951],
91 | [ 0.15120289, 0.3804813 , 0.52488853],
92 | [ 0.15285214, 0.38386729, 0.52291854],
93 | [ 0.15454421, 0.38722991, 0.52101815],
94 | [ 0.15627225, 0.39056998, 0.5191937 ],
95 | [ 0.15802555, 0.39389087, 0.5174364 ],
96 | [ 0.15979549, 0.39719482, 0.51574311],
97 | [ 0.16157425, 0.40048375, 0.51411214],
98 | [ 0.16335571, 0.40375871, 0.51254622],
99 | [ 0.16513234, 0.40702178, 0.51104174],
100 | [ 0.1668964 , 0.41027528, 0.50959299],
101 | [ 0.16864151, 0.41352084, 0.50819797],
102 | [ 0.17036277, 0.41675941, 0.50685814],
103 | [ 0.1720542 , 0.41999269, 0.50557008],
104 | [ 0.17370932, 0.42322271, 0.50432818],
105 | [ 0.17532301, 0.42645082, 0.50313007],
106 | [ 0.17689176, 0.42967776, 0.50197686],
107 | [ 0.17841013, 0.43290523, 0.5008633 ],
108 | [ 0.17987314, 0.43613477, 0.49978492],
109 | [ 0.18127676, 0.43936752, 0.49873901],
110 | [ 0.18261885, 0.44260392, 0.49772638],
111 | [ 0.18389409, 0.44584578, 0.49673978],
112 | [ 0.18509911, 0.44909409, 0.49577605],
113 | [ 0.18623135, 0.4523496 , 0.494833 ],
114 | [ 0.18728844, 0.45561305, 0.49390803],
115 | [ 0.18826671, 0.45888565, 0.49299567],
116 | [ 0.18916393, 0.46216809, 0.49209268],
117 | [ 0.18997879, 0.46546084, 0.49119678],
118 | [ 0.19070881, 0.46876472, 0.49030328],
119 | [ 0.19135221, 0.47208035, 0.48940827],
120 | [ 0.19190791, 0.47540815, 0.48850845],
121 | [ 0.19237491, 0.47874852, 0.4876002 ],
122 | [ 0.19275204, 0.48210192, 0.48667935],
123 | [ 0.19303899, 0.48546858, 0.48574251],
124 | [ 0.19323526, 0.48884877, 0.48478573],
125 | [ 0.19334062, 0.49224271, 0.48380506],
126 | [ 0.19335574, 0.49565037, 0.4827974 ],
127 | [ 0.19328143, 0.49907173, 0.48175948],
128 | [ 0.19311664, 0.50250719, 0.48068559],
129 | [ 0.192864 , 0.50595628, 0.47957408],
130 | [ 0.19252521, 0.50941877, 0.47842186],
131 | [ 0.19210087, 0.51289469, 0.47722441],
132 | [ 0.19159194, 0.516384 , 0.47597744],
133 | [ 0.19100267, 0.51988593, 0.47467988],
134 | [ 0.19033595, 0.52340005, 0.47332894],
135 | [ 0.18959113, 0.5269267 , 0.47191795],
136 | [ 0.18877336, 0.530465 , 0.47044603],
137 | [ 0.18788765, 0.53401416, 0.46891178],
138 | [ 0.18693822, 0.53757359, 0.46731272],
139 | [ 0.18592276, 0.54114404, 0.46563962],
140 | [ 0.18485204, 0.54472367, 0.46389595],
141 | [ 0.18373148, 0.5483118 , 0.46207951],
142 | [ 0.18256585, 0.55190791, 0.4601871 ],
143 | [ 0.18135481, 0.55551253, 0.45821002],
144 | [ 0.18011172, 0.55912361, 0.45615277],
145 | [ 0.17884392, 0.56274038, 0.45401341],
146 | [ 0.17755858, 0.56636217, 0.45178933],
147 | [ 0.17625543, 0.56998972, 0.44946971],
148 | [ 0.174952 , 0.57362064, 0.44706119],
149 | [ 0.17365805, 0.57725408, 0.44456198],
150 | [ 0.17238403, 0.58088916, 0.4419703 ],
151 | [ 0.17113321, 0.58452637, 0.43927576],
152 | [ 0.1699221 , 0.58816399, 0.43648119],
153 | [ 0.1687662 , 0.5918006 , 0.43358772],
154 | [ 0.16767908, 0.59543526, 0.43059358],
155 | [ 0.16667511, 0.59906699, 0.42749697],
156 | [ 0.16575939, 0.60269653, 0.42428344],
157 | [ 0.16495764, 0.6063212 , 0.42096245],
158 | [ 0.16428695, 0.60993988, 0.41753246],
159 | [ 0.16376481, 0.61355147, 0.41399151],
160 | [ 0.16340924, 0.61715487, 0.41033757],
161 | [ 0.16323549, 0.62074951, 0.40656329],
162 | [ 0.16326148, 0.62433443, 0.40266378],
163 | [ 0.16351136, 0.62790748, 0.39864431],
164 | [ 0.16400433, 0.63146734, 0.39450263],
165 | [ 0.16475937, 0.63501264, 0.39023638],
166 | [ 0.16579502, 0.63854196, 0.38584309],
167 | [ 0.16712921, 0.64205381, 0.38132023],
168 | [ 0.168779 , 0.64554661, 0.37666513],
169 | [ 0.17075915, 0.64901912, 0.37186962],
170 | [ 0.17308572, 0.65246934, 0.36693299],
171 | [ 0.1757732 , 0.65589512, 0.36185643],
172 | [ 0.17883344, 0.65929449, 0.3566372 ],
173 | [ 0.18227669, 0.66266536, 0.35127251],
174 | [ 0.18611159, 0.66600553, 0.34575959],
175 | [ 0.19034516, 0.66931265, 0.34009571],
176 | [ 0.19498285, 0.67258423, 0.3342782 ],
177 | [ 0.20002863, 0.67581761, 0.32830456],
178 | [ 0.20548509, 0.67900997, 0.3221725 ],
179 | [ 0.21135348, 0.68215834, 0.31587999],
180 | [ 0.2176339 , 0.68525954, 0.30942543],
181 | [ 0.22432532, 0.68831023, 0.30280771],
182 | [ 0.23142568, 0.69130688, 0.29602636],
183 | [ 0.23893914, 0.69424565, 0.28906643],
184 | [ 0.2468574 , 0.69712255, 0.28194103],
185 | [ 0.25517514, 0.69993351, 0.27465372],
186 | [ 0.26388625, 0.70267437, 0.26720869],
187 | [ 0.27298333, 0.70534087, 0.25961196],
188 | [ 0.28246016, 0.70792854, 0.25186761],
189 | [ 0.29232159, 0.71043184, 0.2439642 ],
190 | [ 0.30253943, 0.71284765, 0.23594089],
191 | [ 0.31309875, 0.71517209, 0.22781515],
192 | [ 0.32399522, 0.71740028, 0.21959115],
193 | [ 0.33520729, 0.71952906, 0.21129816],
194 | [ 0.3467003 , 0.72155723, 0.20298257],
195 | [ 0.35846225, 0.72348143, 0.19466318],
196 | [ 0.3704552 , 0.72530195, 0.18639333],
197 | [ 0.38264126, 0.72702007, 0.17822762],
198 | [ 0.39499483, 0.72863609, 0.17020921],
199 | [ 0.40746591, 0.73015499, 0.1624122 ],
200 | [ 0.42001969, 0.73158058, 0.15489659],
201 | [ 0.43261504, 0.73291878, 0.14773267],
202 | [ 0.44521378, 0.73417623, 0.14099043],
203 | [ 0.45777768, 0.73536072, 0.13474173],
204 | [ 0.47028295, 0.73647823, 0.1290455 ],
205 | [ 0.48268544, 0.73753985, 0.12397794],
206 | [ 0.49497773, 0.73854983, 0.11957878],
207 | [ 0.5071369 , 0.73951621, 0.11589589],
208 | [ 0.51913764, 0.74044827, 0.11296861],
209 | [ 0.53098624, 0.74134823, 0.11080237],
210 | [ 0.5426701 , 0.74222288, 0.10940411],
211 | [ 0.55417235, 0.74308049, 0.10876749],
212 | [ 0.56550904, 0.74392086, 0.10885609],
213 | [ 0.57667994, 0.74474781, 0.10963233],
214 | [ 0.58767906, 0.74556676, 0.11105089],
215 | [ 0.59850723, 0.74638125, 0.1130567 ],
216 | [ 0.609179 , 0.74719067, 0.11558918],
217 | [ 0.61969877, 0.74799703, 0.11859042],
218 | [ 0.63007148, 0.74880206, 0.12200388],
219 | [ 0.64030249, 0.74960714, 0.12577596],
220 | [ 0.65038997, 0.75041586, 0.12985641],
221 | [ 0.66034774, 0.75122659, 0.1342004 ],
222 | [ 0.67018264, 0.75203968, 0.13876817],
223 | [ 0.67990043, 0.75285567, 0.14352456],
224 | [ 0.68950682, 0.75367492, 0.14843886],
225 | [ 0.69900745, 0.75449768, 0.15348445],
226 | [ 0.70840781, 0.75532408, 0.15863839],
227 | [ 0.71771325, 0.75615416, 0.16388098],
228 | [ 0.72692898, 0.75698787, 0.1691954 ],
229 | [ 0.73606001, 0.75782508, 0.17456729],
230 | [ 0.74511119, 0.75866562, 0.17998443],
231 | [ 0.75408719, 0.75950924, 0.18543644],
232 | [ 0.76299247, 0.76035568, 0.19091446],
233 | [ 0.77183123, 0.76120466, 0.19641095],
234 | [ 0.78060815, 0.76205561, 0.20191973],
235 | [ 0.78932717, 0.76290815, 0.20743538],
236 | [ 0.79799213, 0.76376186, 0.21295324],
237 | [ 0.8066067 , 0.76461631, 0.21846931],
238 | [ 0.81517444, 0.76547101, 0.22398014],
239 | [ 0.82369877, 0.76632547, 0.2294827 ],
240 | [ 0.832183 , 0.7671792 , 0.2349743 ],
241 | [ 0.8406303 , 0.76803167, 0.24045248],
242 | [ 0.84904371, 0.76888236, 0.24591492],
243 | [ 0.85742615, 0.76973076, 0.25135935],
244 | [ 0.86578037, 0.77057636, 0.25678342],
245 | [ 0.87410891, 0.77141875, 0.2621846 ],
246 | [ 0.88241406, 0.77225757, 0.26755999],
247 | [ 0.89070781, 0.77308772, 0.27291122],
248 | [ 0.89898836, 0.77391069, 0.27823228],
249 | [ 0.90725475, 0.77472764, 0.28351668],
250 | [ 0.91550775, 0.77553893, 0.28875751],
251 | [ 0.92375722, 0.7763404 , 0.29395046],
252 | [ 0.9320227 , 0.77712286, 0.29909267],
253 | [ 0.94027715, 0.7779011 , 0.30415428],
254 | [ 0.94856742, 0.77865213, 0.3091325 ],
255 | [ 0.95686038, 0.7793949 , 0.31397459],
256 | [ 0.965222 , 0.7800975 , 0.31864342],
257 | [ 0.97365189, 0.78076521, 0.32301107],
258 | [ 0.98227405, 0.78134549, 0.32678728],
259 | [ 0.99136564, 0.78176999, 0.3281624 ],
260 | [ 0.99505988, 0.78542889, 0.32106514],
261 | [ 0.99594185, 0.79046888, 0.31648808],
262 | [ 0.99646635, 0.79566972, 0.31244662],
263 | [ 0.99681528, 0.80094905, 0.30858532],
264 | [ 0.9970578 , 0.80627441, 0.30479247],
265 | [ 0.99724883, 0.81161757, 0.30105328],
266 | [ 0.99736711, 0.81699344, 0.29725528],
267 | [ 0.99742254, 0.82239736, 0.29337235],
268 | [ 0.99744736, 0.82781159, 0.28943391],
269 | [ 0.99744951, 0.83323244, 0.28543062],
270 | [ 0.9973953 , 0.83867931, 0.2812767 ],
271 | [ 0.99727248, 0.84415897, 0.27692897],
272 | [ 0.99713953, 0.84963903, 0.27248698],
273 | [ 0.99698641, 0.85512544, 0.26791703],
274 | [ 0.99673736, 0.86065927, 0.26304767],
275 | [ 0.99652358, 0.86616957, 0.25813608],
276 | [ 0.99622774, 0.87171946, 0.25292044],
277 | [ 0.99590494, 0.87727931, 0.24750009],
278 | [ 0.99555225, 0.88285068, 0.2418514 ],
279 | [ 0.99513763, 0.8884501 , 0.23588062],
280 | [ 0.99471252, 0.89405076, 0.2296837 ],
281 | [ 0.99421873, 0.89968246, 0.2230963 ],
282 | [ 0.99370185, 0.90532165, 0.21619768],
283 | [ 0.99313786, 0.91098038, 0.2088926 ],
284 | [ 0.99250707, 0.91666811, 0.20108214],
285 | [ 0.99187888, 0.92235023, 0.19290417],
286 | [ 0.99110991, 0.92809686, 0.18387963],
287 | [ 0.99042108, 0.93379995, 0.17458127],
288 | [ 0.98958484, 0.93956962, 0.16420166],
289 | [ 0.98873988, 0.94533859, 0.15303117],
290 | [ 0.98784836, 0.95112482, 0.14074826],
291 | [ 0.98680727, 0.95697596, 0.12661626]]
292 |
--------------------------------------------------------------------------------