├── Finding tiny faces.pdf
├── README.md
├── data
│   ├── .DS_Store
│   ├── 92_TinyFaces.png
│   ├── Downscale_graph.png
│   ├── Downscaling.png
│   ├── TotalIncrementalCount.gif
│   └── benchmark.png
├── detect.py
├── evaluate.py
├── metrics.py
├── notebooks
│   ├── Blurring.ipynb
│   ├── Counting in video.ipynb
│   ├── Downsampling.ipynb
│   ├── Face Detection algorithms comparison.ipynb
│   └── Video detection.ipynb
├── tiny_faces_model.py
└── util.py

/Finding tiny faces.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/Finding tiny faces.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # ExtendedTinyFaces
 2 | Analysis, review and application of Finding Tiny Faces (P. Hu) [1] with a focus on counting the many faces in a demonstration/crowd.
 3 | RecVis (MVA) course - Alexandre Attia, [Sharone Dayan](https://github.com/SharoneDayan)
 4 | **You can find our [pre-print report on ArXiv](https://arxiv.org/abs/1801.06504)**.
 5 | 
 6 | ### Introduction
 7 | The paper - released at CVPR 2017 - deals with finding small objects (particularly faces in our case) in an image,
 8 | based on scale-specific detectors using features defined over a single (deep) feature hierarchy:
 9 | scale invariance, image resolution and contextual reasoning. The algorithm is based on foveal descriptors, i.e. blurring the peripheral image to encode just enough information about the context, mimicking human vision.
10 | The subject is still an open challenge, and we would like to push it further and experiment with this approach in different applications. The goal is to deeply understand the choices made in the paper, together with their applications to subjects related to security and identification. We mainly focus on the inference part, using a TensorFlow implementation adapted from [this repo](https://github.com/cydonia999/Tiny_Faces_in_Tensorflow).
11 | 
12 | 
13 | 
14 | ### Face detection benchmark
15 | First, we aim at comparing the Tiny Faces algorithm with other face detection models.
16 | We use two particular sub-folders of the WIDERFACE dataset (*Parade* and *Dresses*) to compare our model with Faster R-CNN (using [MXNet](https://github.com/tornadomeet/mxnet-face)), MTCNN[6] (using [MXNet](https://github.com/pangyupo/mxnet_mtcnn_face_detection)), Haar Cascade[2] and HOG[3] for face detection.
17 | This benchmark can be found in this [notebook](https://github.com/alexattia/ExtendedTinyFaces/blob/master/notebooks/Face%20Detection%20algorithms%20comparison.ipynb).
18 | ![Benchmark](https://github.com/alexattia/ExtendedTinyFaces/blob/master/data/benchmark.png)
19 | 
20 | ### Image resolution influence
21 | The performance of the Tiny Faces algorithm is tied to the image resolution: as the original paper explains, resolution strongly affects face detection. We ran the inference code on progressively downscaled versions of an image and plotted how the number of detected faces varies (a minimal sketch of this experiment follows).
22 | 
23 | 
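A minimal sketch of the downscaling experiment, assuming the converted `hr_res101.pkl` weights and an example image path (both hypothetical here):

```python
import cv2
import tensorflow as tf
import evaluate  # this repo's inference module

WEIGHTS = "./hr_res101.pkl"  # hypothetical path to the converted weights
img = cv2.cvtColor(cv2.imread("data/crowd.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image

counts = {}
for scale in [1.0, 0.75, 0.5, 0.25]:
    small = cv2.resize(img, (0, 0), fx=scale, fy=scale)
    with tf.Graph().as_default():  # evaluate() builds a fresh graph per call
        boxes = evaluate.evaluate(weight_file_path=WEIGHTS, img=small)
    counts[scale] = len(boxes)
print(counts)  # number of detected faces per downscaling factor
```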
24 | ### Face Recognition
25 | Face recognition can be another application of the paper. Thus, we aim at building a Python pipeline for face recognition.
26 | We would like to use face alignment[4] and face embedding[5] to achieve face classification.
27 | The first application we would like to explore is counting the many different faces
28 | (numerous people displayed at different sizes in the picture) in a video of a crowded public demonstration.
29 | This application can be found in this
30 | [notebook](https://github.com/alexattia/ExtendedTinyFaces/blob/master/notebooks/Counting%20in%20video.ipynb).
31 | In order to achieve it, we have to match people from one frame to the next so that no person is counted twice. Matching is achieved with face recognition, while the counting itself relies on face detection. We used a linear SVM for the face classification.
32 | ![alt-text-1](https://github.com/alexattia/ExtendedTinyFaces/blob/master/data/TotalIncrementalCount.gif)
33 | 
34 | ### Repository organisation
35 | [notebooks](https://github.com/alexattia/ExtendedTinyFaces/tree/master/notebooks) Notebooks folder with the different applications and experiments
36 | [detect.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/detect.py) People matching used for the counting (cf. the Counting in video notebook; a minimal end-to-end sketch follows this list)
37 | [evaluate.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/evaluate.py) Inference function: detecting faces in one (or multiple) pictures
38 | [tiny_faces_model.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/tiny_faces_model.py) Tiny Faces model
39 | [util.py](https://github.com/alexattia/ExtendedTinyFaces/blob/master/util.py) Helpers to overlay bounding boxes
40 | 
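A minimal end-to-end sketch of the counting pipeline built from these files; the video path, weights path and frame indices are hypothetical, and the 0.55 matching threshold is the one used in the Counting in video notebook:

```python
import cv2
import numpy as np
import tensorflow as tf
import evaluate, detect

WEIGHTS = "./hr_res101.pkl"         # hypothetical path to the converted weights
cap = cv2.VideoCapture("demo.avi")  # hypothetical video

# Group frames as in the notebook: three consecutive frames plus one ~10 frames later.
frames = []
while len(frames) < 11:
    ok, f = cap.read()
    if not ok:
        break
    frames.append(f[:, :, ::-1])    # BGR -> RGB
group = [frames[0], frames[1], frames[2], frames[10]]

# Detect faces in every frame of the group.
detections = []
for f in group:
    with tf.Graph().as_default():
        detections.append(evaluate.evaluate(weight_file_path=WEIGHTS, img=f))

# Try to re-identify face 0 of the first frame in the last frame.
neigh, probs = detect.train_binclas(group, detections, 0)
matched = len(probs) > 0 and np.max(probs[:, 1]) > 0.55
print("face 0 re-identified in the last frame:", matched)
```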
41 | ### References
42 | [[1]](https://arxiv.org/abs/1612.04402) Peiyun Hu and Deva Ramanan. Finding Tiny Faces. 2017.
43 | [[2]](https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf) P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. 2001.
44 | [[3]](http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf) Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. 2005.
45 | [[4]](https://pdfs.semanticscholar.org/d78b/6a5b0dcaa81b1faea5fb0000045a62513567.pdf) Vahid Kazemi and Josephine Sullivan. One Millisecond Face Alignment with an Ensemble of Regression Trees. 2014.
46 | [[5]](https://arxiv.org/abs/1503.03832) Florian Schroff, Dmitry Kalenichenko and James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. 2015.
47 | [[6]](https://arxiv.org/abs/1604.02878) Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li and Yu Qiao. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. 2016.
48 | 
49 | --------------------------------------------------------------------------------
/data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/.DS_Store
--------------------------------------------------------------------------------
/data/92_TinyFaces.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/92_TinyFaces.png
--------------------------------------------------------------------------------
/data/Downscale_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/Downscale_graph.png
--------------------------------------------------------------------------------
/data/Downscaling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/Downscaling.png
--------------------------------------------------------------------------------
/data/TotalIncrementalCount.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/TotalIncrementalCount.gif
--------------------------------------------------------------------------------
/data/benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexattia/ExtendedTinyFaces/ca02746297641d66c55d1bc5f0a0587c2dfaf983/data/benchmark.png
--------------------------------------------------------------------------------
/detect.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import dlib
 3 | from imgaug import augmenters as iaa
 4 | import pandas as pd
 5 | from sklearn.svm import SVC
 6 | import random
 7 | 
 8 | face_encoder = dlib.face_recognition_model_v1('./model/dlib_face_recognition_resnet_model_v1.dat')
 9 | face_pose_predictor = dlib.shape_predictor('./model/shape_predictor_68_face_landmarks.dat')
10 | 
11 | def encoding_faces(images, label, coord_detect):
12 |     """
13 |     Encode a list of faces with dlib's ResNet face descriptor (a FaceNet-style embedding), generating a 128-D vector per face
14 |     :param images: list of images of faces to encode
15 |     :param label: label stored with each encoding; :param coord_detect: coordinates of the detections
16 |     :return: numpy array with one row per face (the 128-D encoding with the label appended)
17 |     """
18 |     l = []
19 |     for img, d in zip(images, coord_detect):
20 |         (x1, y1, x2, y2) = d
21 |         detected_face = dlib.rectangle(left=0, top=0, right=int(x2-x1), bottom=int(y2-y1))
22 |         pose_landmarks = face_pose_predictor(img, detected_face)
23 |         face_encoding = face_encoder.compute_face_descriptor(img, pose_landmarks, 1)
24 |         l.append(np.append(face_encoding, [label]))
25 | 
26 |     return np.array(l)
27 | 
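The 128-D encodings produced above live in a metric space: two encodings of the same person are typically close in Euclidean distance (dlib's documentation suggests roughly 0.6 as a same-person threshold). A self-contained sketch with made-up vectors:

```python
import numpy as np

a = np.array([0.1] * 128)           # stand-in encoding of face A
b = np.array([0.1] * 127 + [0.5])   # stand-in encoding of face B
dist = np.linalg.norm(a - b)        # 0.4 for these made-up values
print(dist, dist < 0.6)             # below ~0.6 -> likely the same person
```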
28 | def create_positive_set(pictures, coords, label=1):
29 |     """
30 |     Create a positive train set for one face, from a list of three pictures of this face.
31 |     Data augmentation on these three crops (each repeated five times), plus the original crop,
32 |     yields 16 pictures, which are all encoded.
33 |     :param pictures: list of three full pictures
34 |     :param coords: list of the coordinates of the face in the first frame
35 |     :return: pandas dataframe of the faces for one person
36 |     """
37 |     # original coordinates
38 |     x1, y1, x2, y2 = coords
39 |     # Load the three same faces
40 |     images = [pictures[j][y1:y2,x1:x2,:] for j in range(3)]
41 |     # repeat each picture five times (15 images)
42 |     images = [item for item in images for i in range(5)]
43 |     # Sometimes(0.5, ...) applies the given augmenter in 50% of all cases
44 |     st = lambda aug: iaa.Sometimes(0.5, aug)
45 |     # add a random value from the range (-30, 30) to the first three channels, and gaussian noise
46 |     aug = iaa.Sequential([
47 |         iaa.WithChannels( channels=[0, 1, 2], children=iaa.Add((-30, 30))),
48 |         st(iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5))])
49 | 
50 |     images_aug = aug.augment_images(images)
51 |     # add the original crop from the first frame (16 images in total)
52 |     images_aug.append(pictures[0][y1:y2,x1:x2,:])
53 | 
54 |     # encode each of the 16 faces (of the same person)
55 |     coords_detect = [coords for k in range(len(images_aug))]
56 |     return pd.DataFrame(encoding_faces(images_aug, label, coords_detect))
57 | 
58 | def train_binclas(pics, detections, idx_detection):
59 |     """
60 |     Create a train set and train a binary SVM to classify faces for one original
61 |     face (from frame 0)
62 |     """
63 |     pos = create_positive_set(pics, detections[0][idx_detection])
64 | 
65 |     # Choose 10 other detections from the first frame
66 |     neg_detect = np.array([k for i, k in enumerate(detections[0]) if i != idx_detection])
67 |     idx_neg = random.sample(range(len(neg_detect)), 10)
68 | 
69 |     # Get face images for the 10 detections
70 |     img_neg = [pics[0][y1_:y2_,x1_:x2_,:] for (x1_, y1_, x2_, y2_) in neg_detect[idx_neg]]
71 |     # Encode each face
72 |     neg = pd.DataFrame(encoding_faces(img_neg, 0, neg_detect[idx_neg]))
73 | 
74 |     # join positive and negative samples
75 |     df = pd.concat([pos, neg])
76 |     df = df.sample(len(df)).reset_index(drop=True)
77 |     y = df[128]
78 |     X = df.drop(128, axis=1)
79 | 
80 |     # training
81 |     clf = SVC(C=1, kernel='linear', probability=True)
82 |     clf.fit(X, y)
83 |     # keep the detections of the 4th picture (the last frame) that lie in the neighborhood
84 |     x1, y1, x2, y2 = detections[0][idx_detection]
85 |     neigh_detect = [k for k in detections[::-1][0] if
86 |                     np.abs(k[0]-x1) < 600 and
87 |                     np.abs(k[1]-y1) < 600 and
88 |                     np.abs(k[2]-x2) < 600 and
89 |                     np.abs(k[3]-y2) < 600]
90 | 
91 |     # Get face images to classify
92 |     img_neighb = [pics[::-1][0][y1_:y2_,x1_:x2_,:] for (x1_, y1_, x2_, y2_) in neigh_detect]
93 |     # Encode each face
94 |     neigh_detect_encodings = encoding_faces(img_neighb, -1, neigh_detect)[:,:128]
95 |     # compute matching probabilities with the trained SVM
96 |     distances = clf.predict_proba(neigh_detect_encodings)
97 | 
98 |     return neigh_detect, distances
99 | --------------------------------------------------------------------------------
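To see what the augmentation recipe in `create_positive_set` produces, here is a self-contained sketch on dummy crops, using the same (older) imgaug `augment_images` API as above:

```python
import numpy as np
from imgaug import augmenters as iaa

# same augmentation recipe as create_positive_set
st = lambda aug: iaa.Sometimes(0.5, aug)
aug = iaa.Sequential([
    iaa.WithChannels(channels=[0, 1, 2], children=iaa.Add((-30, 30))),
    st(iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05 * 255), per_channel=0.5)),
])

faces = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)] * 5  # dummy face crops
augmented = aug.augment_images(faces)
print(len(augmented), augmented[0].shape)  # 5 (64, 64, 3)
```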
/evaluate.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | import tensorflow as tf
  3 | import tiny_faces_model as tiny_model
  4 | import util
  5 | from argparse import ArgumentParser
  6 | import cv2
  7 | import scipy.io
  8 | import numpy as np
  9 | import matplotlib.pyplot as plt
 10 | 
 11 | import pickle
 12 | 
 13 | import pylab as pl
 14 | import time
 15 | import os
 16 | import sys
 17 | from scipy.special import expit
 18 | import glob
 19 | 
 20 | MAX_INPUT_DIM = 5000.0
 21 | 
 22 | def evaluate(weight_file_path, output_dir=None, data_dir=None, img=None, list_imgs=None,
 23 |              prob_thresh=0.5, nms_thresh=0.1, lw=3, display=False,
 24 |              draw=True, save=True, print_=0):
 25 |     """
 26 |     Detect faces in images.
 27 |     :param weight_file_path: A pretrained weight file in the pickle format
 28 |         generated by matconvnet_hr101_to_tf.py.
 29 |     :param output_dir: A directory into which images with detected faces are output.
 30 |         default=None to not output detected faces
 31 |     :param data_dir: A directory which contains images for face detection.
 32 |     :param img: One image for face detection (alternatively, list_imgs: a list of images).
 33 |     :param prob_thresh: The threshold of detection confidence.
 34 |     :param nms_thresh: The overlap threshold of non-maximum suppression
 35 |     :param lw: Line width of bounding boxes. If zero is specified,
 36 |         this is determined based on the confidence of each detection.
 37 |     :param display: Display tiny face images on window.
 38 |     :param draw: Draw bounding boxes on images.
 39 |     :param save: Save images in output_dir.
 40 |     :param print_: 0 for no print, 1 for light print, 2 for full print
 41 |     :return: final bboxes
 42 |     """
 43 |     if type(img) != np.ndarray:
 44 |         one_pic = False
 45 |     else:
 46 |         one_pic = True
 47 | 
 48 |     if not output_dir:
 49 |         save = False
 50 |         draw = False
 51 | 
 52 |     # list of bounding boxes for the pictures
 53 |     final_bboxes = []
 54 | 
 55 |     # placeholder for input images. Currently a batch size of one is supported.
 56 |     x = tf.placeholder(tf.float32, [1, None, None, 3]) # n, h, w, c
 57 | 
 58 |     # Create the Tiny Faces model, whose weights are loaded from a pretrained model.
 59 |     model = tiny_model.Model(weight_file_path)
 60 |     score_final = model.tiny_face(x)
 61 | 
 62 |     # Load an average image and clusters (reference boxes of templates).
 63 |     with open(weight_file_path, "rb") as f:
 64 |         _, mat_params_dict = pickle.load(f)
 65 | 
 66 |     # Average RGB values from the model
 67 |     average_image = model.get_data_by_key("average_image")
 68 | 
 69 |     # Reference boxes of templates for the 0.5x, 1x, and 2x scales
 70 |     clusters = model.get_data_by_key("clusters")
 71 |     clusters_h = clusters[:, 3] - clusters[:, 1] + 1
 72 |     clusters_w = clusters[:, 2] - clusters[:, 0] + 1
 73 |     normal_idx = np.where(clusters[:, 4] == 1)
 74 | 
 75 |     # Find image files in data_dir.
 76 |     filenames = []
 77 |     # if we provide only one picture, no need to list files in a dir
 78 |     if one_pic:
 79 |         filenames = [img]
 80 |     elif type(list_imgs) == list:
 81 |         filenames = list_imgs
 82 |     else:
 83 |         for ext in ('*.png', '*.gif', '*.jpg', '*.jpeg'):
 84 |             filenames.extend(glob.glob(os.path.join(data_dir, ext)))
 85 | 
 86 |     # main
 87 |     with tf.Session() as sess:
 88 |         sess.run(tf.global_variables_initializer())
 89 |         for filename in filenames:
 90 |             # if we provide only one picture, no need to list files in a dir
 91 |             if not one_pic and type(list_imgs) != list:
 92 |                 fname = filename.split(os.sep)[-1]
 93 |                 raw_img = cv2.imread(filename)
 94 |                 raw_img = cv2.cvtColor(raw_img, cv2.COLOR_BGR2RGB)
 95 |             else:
 96 |                 fname = 'current_picture'
 97 |                 raw_img = filename
 98 |             raw_img_f = raw_img.astype(np.float32)
 99 | 
100 |             def _calc_scales():
101 |                 """
102 |                 Compute the different scales for detection
103 |                 :return: [2^X] with X depending on the input image
104 |                 """
105 |                 raw_h, raw_w = raw_img.shape[0], raw_img.shape[1]
106 |                 min_scale = min(np.floor(np.log2(np.max(clusters_w[normal_idx] / raw_w))),
107 |                                 np.floor(np.log2(np.max(clusters_h[normal_idx] / raw_h))))
108 |                 max_scale = min(1.0, -np.log2(max(raw_h, raw_w) / MAX_INPUT_DIM))
109 |                 scales_down = pl.frange(min_scale, 0, 1.)
110 |                 scales_up = pl.frange(0.5, max_scale, 0.5)
111 |                 scales_pow = np.hstack((scales_down, scales_up))
112 |                 scales = np.power(2.0, scales_pow)
113 |                 return scales
114 | 
115 |             scales = _calc_scales()
116 |             start = time.time()
117 | 
118 |             # initialize output
119 |             bboxes = np.empty(shape=(0, 5))
120 | 
121 |             # process the input at different scales
122 |             for s in scales:
123 |                 if print_ == 2:
124 |                     print("Processing {} at scale {:.4f}".format(fname, s))
125 |                 img = cv2.resize(raw_img_f, (0, 0), fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
126 |                 img = img - average_image
127 |                 img = img[np.newaxis, :]
128 | 
129 |                 # we don't run every template at every scale; tids are the ids of the templates to keep
130 |                 tids = list(range(4, 12)) + ([] if s <= 1.0 else list(range(18, 25)))
131 |                 ignoredTids = list(set(range(0, clusters.shape[0])) - set(tids))
132 | 
133 |                 # run through the net
134 |                 score_final_tf = sess.run(score_final, feed_dict={x: img})
135 | 
136 |                 # collect scores
137 |                 score_cls_tf, score_reg_tf = score_final_tf[:, :, :, :25], score_final_tf[:, :, :, 25:125]
138 |                 prob_cls_tf = expit(score_cls_tf)
139 |                 prob_cls_tf[0, :, :, ignoredTids] = 0.0
140 | 
141 |                 def _calc_bounding_boxes():
142 |                     # threshold for detection
143 |                     _, fy, fx, fc = np.where(prob_cls_tf > prob_thresh)
144 | 
145 |                     # interpret heatmap into bounding boxes
146 |                     cy = fy * 8 - 1
147 |                     cx = fx * 8 - 1
148 |                     ch = clusters[fc, 3] - clusters[fc, 1] + 1
149 |                     cw = clusters[fc, 2] - clusters[fc, 0] + 1
150 | 
151 |                     # extract bounding box refinement
152 |                     Nt = clusters.shape[0]
153 |                     tx = score_reg_tf[0, :, :, 0:Nt]
154 |                     ty = score_reg_tf[0, :, :, Nt:2*Nt]
155 |                     tw = score_reg_tf[0, :, :, 2*Nt:3*Nt]
156 |                     th = score_reg_tf[0, :, :, 3*Nt:4*Nt]
157 | 
158 |                     # refine bounding boxes
159 |                     dcx = cw * tx[fy, fx, fc]
160 |                     dcy = ch * ty[fy, fx, fc]
161 |                     rcx = cx + dcx
162 |                     rcy = cy + dcy
163 |                     rcw = cw * np.exp(tw[fy, fx, fc])
164 |                     rch = ch * np.exp(th[fy, fx, fc])
165 | 
166 |                     scores = score_cls_tf[0, fy, fx, fc]
167 |                     tmp_bboxes = np.vstack((rcx - rcw / 2, rcy - rch / 2, rcx + rcw / 2, rcy + rch / 2))
168 |                     tmp_bboxes = np.vstack((tmp_bboxes / s, scores))
169 |                     tmp_bboxes = tmp_bboxes.transpose()
170 |                     return tmp_bboxes
171 | 
172 |                 tmp_bboxes = _calc_bounding_boxes()
173 |                 bboxes = np.vstack((bboxes, tmp_bboxes))  # shape: (num_boxes, 5)
174 | 
175 |             if print_ >= 1:
176 |                 print("time {:.2f} secs for {}".format(time.time() - start, fname))
177 | 
178 |             # non-maximum suppression
179 |             refind_idx = tf.image.non_max_suppression(tf.convert_to_tensor(bboxes[:, :4], dtype=tf.float32),
180 |                                                       tf.convert_to_tensor(bboxes[:, 4], dtype=tf.float32),
181 |                                                       max_output_size=bboxes.shape[0], iou_threshold=nms_thresh)
182 |             refind_idx = sess.run(refind_idx)
183 |             refined_bboxes = bboxes[refind_idx]
184 | 
185 |             # convert bbox coordinates to int
186 |             # f_box = overlay_bounding_boxes(raw_img, refined_bboxes, lw, draw)
187 |             f_box = [[int(x) for x in r[:4]] for r in refined_bboxes]
188 | 
189 |             if display:
190 |                 # plt.axis('off')
191 |                 plt.imshow(raw_img)
192 |                 plt.show()
193 | 
194 |             if save:
195 |                 # save image with bounding boxes
196 |                 raw_img = cv2.cvtColor(raw_img, cv2.COLOR_RGB2BGR)
197 |                 cv2.imwrite(os.path.join(output_dir, fname), raw_img)
198 | 
199 |             final_bboxes.append(f_box)
200 | 
201 |     if len(final_bboxes) == 1:
202 |         final_bboxes = final_bboxes[0]
203 |     return final_bboxes
204 | --------------------------------------------------------------------------------
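For concreteness, here is a worked numeric version of the box decoding performed in `_calc_bounding_boxes` above. All input values are made up; only the formulas come from the code:

```python
import numpy as np

# Suppose a heatmap cell fired at (fy, fx) = (10, 20) for a template of size
# (cw, ch) = (25, 30), at image scale s = 0.5, with regression outputs
# tx = 0.1, ty = -0.2, tw = 0.05, th = 0.0 (all made-up values).
fy, fx = 10, 20
cw, ch = 25.0, 30.0
s = 0.5
tx, ty, tw, th = 0.1, -0.2, 0.05, 0.0

cx, cy = fx * 8 - 1, fy * 8 - 1   # the heatmap has a total stride of 8
rcx = cx + cw * tx                # 159 + 2.5 = 161.5
rcy = cy + ch * ty                # 79 - 6.0  = 73.0
rcw = cw * np.exp(tw)             # ~26.28
rch = ch * np.exp(th)             # 30.0
box = np.array([rcx - rcw / 2, rcy - rch / 2, rcx + rcw / 2, rcy + rch / 2]) / s
print(box)  # corner coordinates mapped back to the original resolution
```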
/metrics.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import pandas as pd
 3 | import glob
 4 | data_folder = './tiny/data/widerface/WIDER_val/images/'
 5 | 
 6 | def get_folder_name(pic):
 7 |     """
 8 |     Get the folder name from the picture name:
 9 |     1_Handshaking_Handshaking_1_411.jpg --> 1--Handshaking/
10 |     :param pic: picture name
11 |     :return: folder name
12 |     """
13 |     x = pic.split('_')[1:3]
14 |     s = pic.split('_')[0]+ '--'+ '_'.join(sorted(set(x), key=x.index)) + '/'
15 | 
16 |     if 'Demonstration' in s:
17 |         try:
18 |             s = s[:s.index('_')] + '/'
19 |         except ValueError:
20 |             pass
21 |     return s
22 | 
23 | def jaccard_distance(boxA, boxB):
24 |     """
25 |     Calculate the Intersection over Union (IoU) of two bounding boxes.
26 |     :param boxA: list [x1, y1, x2, y2]
27 |         The (x1, y1) position is at the top left corner,
28 |         the (x2, y2) position is at the bottom right corner
29 |     :param boxB: list [x1, y1, x2, y2]
30 |         The (x1, y1) position is at the top left corner,
31 |         the (x2, y2) position is at the bottom right corner
32 |     :return: the IoU if it is above 0.5, else 0
33 |     """
34 |     # determine the (x, y)-coordinates of the intersection rectangle
35 |     xA = max(boxA[0], boxB[0])
36 |     yA = max(boxA[1], boxB[1])
37 |     xB = min(boxA[2], boxB[2])
38 |     yB = min(boxA[3], boxB[3])
39 | 
40 |     # compute the area of the intersection rectangle
41 |     interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
42 | 
43 |     # compute the area of both the prediction and ground-truth
44 |     # rectangles
45 |     boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
46 |     boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
47 | 
48 |     # compute the intersection over union by taking the intersection
49 |     # area and dividing it by the sum of prediction + ground-truth
50 |     # areas - the intersection area
51 |     iou = interArea / float(boxAArea + boxBArea - interArea)
52 | 
53 |     # zero out weak matches: return the IoU only when it exceeds 0.5
54 |     return iou * (iou > 0.5)
55 | 
56 | def find_best_bbox(box, predicted_boxes):
57 |     """
58 |     Find the predicted bounding box that best matches
59 |     the given ground-truth box
60 |     :param box: ground truth bounding box
61 |     :param predicted_boxes: list of predicted bounding boxes
62 |     :return: index of the corresponding bbox, jaccard distance
63 |     """
64 |     if type(box) == list:
65 |         box = ' '.join(map(str,box))
66 |     (x1, y1, w, h) = map(int,box.split()[:4])
67 |     boxA = [x1, y1, x1+w, y1+h]
68 |     l = []
69 |     # boxB: [x1, y1, x2, y2] (top-left and bottom-right corners)
70 |     for boxB in predicted_boxes:
71 |         l.append(jaccard_distance(boxA, boxB))
72 |     if len(l) > 0:
73 |         return np.argmax(l), np.max(l)
74 |     else:
75 |         return -1, 0
76 | 
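A quick numeric check of `jaccard_distance` as a standalone snippet (boxes are [x1, y1, x2, y2] with inclusive corners, hence the +1 in the area terms; the values are made up):

```python
from metrics import jaccard_distance

boxA = [0, 0, 9, 9]    # 10x10 box, area 100
boxB = [0, 0, 9, 11]   # 10x12 box, area 120
# intersection = 100, union = 100 + 120 - 100 = 120, IoU ~ 0.83 > 0.5
print(jaccard_distance(boxA, boxB))   # ~0.833 -> kept

boxC = [5, 5, 14, 14]  # overlaps boxA only on a 5x5 patch
# IoU = 25 / (100 + 100 - 25) ~ 0.14 <= 0.5 -> zeroed out
print(jaccard_distance(boxA, boxC))   # 0.0
```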
77 | def mean_jaccard(truth_boxes, predicted_boxes, only_tp=True, blurred=0):
78 |     """
79 |     Compute the average Jaccard distance for the bounding boxes of
80 |     one picture.
81 |     :param truth_boxes: ground truth bounding boxes
82 |     :param predicted_boxes: predicted bounding boxes
83 |     :param only_tp: boolean to only keep true positive bounding boxes
84 |     :return: mean jaccard and number of TP (None, if no TP found)
85 |     """
86 |     l = []
87 |     for truth_box in truth_boxes:
88 |         if int(truth_box.split()[4]) >= blurred:
89 |             _, jd = find_best_bbox(truth_box, predicted_boxes)
90 |             l.append(jd)
91 |     if only_tp:
92 |         l = [k for k in l if k > 0]
93 |     if len(l) > 0:
94 |         return np.mean(l), len(l)
95 | 
96 | def compute_stats(data_dir, truth, predictions, blurred=0):
97 |     """
98 |     Compute the mean Jaccard distance and the ratio of predicted bounding
99 |     boxes compared to the number of actual bounding boxes
100 |     :param data_dir: directory path with the pictures
101 |     :param truth: dict of actual annotations of the bounding boxes
102 |         d[name] = [(x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose)]
103 |     :param predictions: list of predicted bounding boxes,
104 |         keeping the same order as glob.glob(pictures folder)
105 |     :param blurred: 0 for all faces, 1 for normally blurred faces, 2 for heavily blurred faces
106 |     :return: (len(pictures), 4) numpy array and the corresponding pandas DataFrame
107 |         ['mJaccard', 'Nb_Truth_Bboxes', 'Nb_Pred_Bboxes', 'Ratio_Bboxes']
108 |     """
109 |     pictures = glob.glob(data_dir + '*')
110 |     n_pictures = len(pictures)
111 |     jaccard, n_truth_boxes, n_pred_boxes = [], [], []
112 |     a = np.zeros((n_pictures,4))
113 | 
114 |     for idx in range(n_pictures):
115 |         truth_boxes = truth[pictures[idx].replace(data_folder, '')]
116 |         temp = mean_jaccard(truth_boxes, predictions[idx], blurred=blurred)
117 |         if temp:
118 |             mean_jac, nb_pred = temp
119 |         else:
120 |             mean_jac, nb_pred = np.nan, 0  # NaN (not None) so the float-array assignment below works
121 |         jaccard.append(mean_jac)
122 |         n_truth_boxes.append(len([k for k in truth_boxes if int(k.split()[4]) >= blurred]))
123 |         n_pred_boxes.append(nb_pred)
124 | 
125 |     a[:,0] = jaccard
126 |     a[:,1] = n_truth_boxes
127 |     a[:,2] = n_pred_boxes
128 |     a[:,3] = a[:,2]/a[:,1]
129 |     df = pd.DataFrame(a, columns=['mJaccard', 'Nb_Truth_Bboxes', 'Nb_Pred_Bboxes', 'Ratio_Bboxes'])
130 |     df['Folder'] = data_dir.replace(data_folder, '')
131 |     return a, df
132 | --------------------------------------------------------------------------------
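A self-contained check of `mean_jaccard` on made-up boxes (ground truth uses the WIDER "x1 y1 w h blur ..." string format, predictions are corner boxes):

```python
from metrics import mean_jaccard

# one well-matched face and one missed face
truth_boxes = ["10 10 20 20 0 0 0 0 0 0", "100 100 30 30 0 0 0 0 0 0"]
predicted = [[11, 11, 30, 30], [300, 300, 320, 320]]

# The first truth box [10,10,30,30] overlaps [11,11,30,30] with IoU ~ 0.91;
# the second truth box matches nothing, so with only_tp=True it is dropped.
print(mean_jaccard(truth_boxes, predicted))  # ~(0.907, 1)
```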
"cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## Saving frames" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n", 58 | "fps = cap.get(cv2.CAP_PROP_FPS)\n", 59 | "initial_target = int(45 * fps) + 10\n", 60 | "final_target = int(49 * fps) + 10\n", 61 | "i = 0\n", 62 | "frames = []\n", 63 | "while(True):\n", 64 | " ret, frame = cap.read()\n", 65 | " i +=1 \n", 66 | " if i in range(initial_target, final_target+10):\n", 67 | " frames.append(frame[:,:,::-1])\n", 68 | " if i == final_target:\n", 69 | " break" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "images = []\n", 79 | "for k in range(0, len(frames), 10):\n", 80 | " try:\n", 81 | " imgs = [frames[k], frames[k+1], frames[k+2], frames[k+10]]\n", 82 | " except IndexError:\n", 83 | " imgs = [frames[k], frames[k+1], frames[k+2], frames[len(frames)-1]]\n", 84 | " images.append(imgs)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "## Detection" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 5, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "all_detections = []\n", 101 | "for frames in images:\n", 102 | " detections = []\n", 103 | " for frame in frames:\n", 104 | " with tf.Graph().as_default():\n", 105 | " b = evaluate.evaluate(weight_file_path=weights_path, img=frame)\n", 106 | " detections.append(b)\n", 107 | " all_detections.append(detections)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "## Matching " 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 37, 120 | "metadata": { 121 | "scrolled": false 122 | }, 123 | "outputs": [ 124 | { 125 | "name": "stdout", 126 | "output_type": "stream", 127 | "text": [ 128 | "It took 15.8 sec i.e 0.25/detection\n", 129 | "It took 16.8 sec i.e 0.27/detection\n", 130 | "It took 19.1 sec i.e 0.27/detection\n", 131 | "It took 18.6 sec i.e 0.26/detection\n", 132 | "It took 17.7 sec i.e 0.26/detection\n", 133 | "It took 17.6 sec i.e 0.24/detection\n", 134 | "It took 18.8 sec i.e 0.26/detection\n", 135 | "It took 22.0 sec i.e 0.27/detection\n", 136 | "It took 24.5 sec i.e 0.28/detection\n", 137 | "It took 25.3 sec i.e 0.28/detection\n" 138 | ] 139 | }, 140 | { 141 | "ename": "TypeError", 142 | "evalue": "unsupported operand type(s) for -: 'str' and 'float'", 143 | "output_type": "error", 144 | "traceback": [ 145 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 146 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 147 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0mt1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'It took %.1f sec i.e %.2f/detection'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mt1\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mt0bis\u001b[0m\u001b[0;34m,\u001b[0m 
151 |    ],
152 |    "source": [
153 |     "threshold = 0.55\n",
154 |     "matcheds = []\n",
155 |     "t0 = time.time()\n",
156 |     "for j in range(len(images)):\n",
157 |     "    frames = images[j]\n",
158 |     "    detections = all_detections[j]\n",
159 |     "    matched = 0\n",
160 |     "    t0bis = time.time()\n",
161 |     "    for p in range(len(detections[0])):\n",
162 |     "        neigh_detect, distances = detect.train_binclas(frames, detections, p)\n",
163 |     "        idx_max, val_max = np.argmax(distances[:,1]), np.max(distances[:,1])\n",
164 |     "        if val_max > threshold:\n",
165 |     "            matched += 1\n",
166 |     "    matcheds.append(matched)\n",
167 |     "    t1 = time.time()\n",
168 |     "    print('It took %.1f sec i.e %.2f/detection' % (t1-t0bis, (t1-t0bis)/len(detections[0])))\n",
169 |     "print('Total : %.1f' % (time.time() - t0))"
170 |    ]
171 |   },
172 |   {
173 |    "cell_type": "markdown",
174 |    "metadata": {},
175 |    "source": [
176 |     "## Counting"
177 |    ]
178 |   },
179 |   {
180 |    "cell_type": "code",
181 |    "execution_count": 62,
182 |    "metadata": {},
183 |    "outputs": [],
184 |    "source": [
185 |     "s = 0\n",
186 |     "for j in range(10):\n",
187 |     "    detections = all_detections[j]\n",
188 |     "    s += len(detections[0]) - matcheds[j]\n",
189 |     "s += len(detections[3])"
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "code",
194 |    "execution_count": 63,
195 |    "metadata": {},
196 |    "outputs": [
197 |     {
198 |      "data": {
199 |       "text/plain": [
200 |        "141"
201 |       ]
202 |      },
203 |      "execution_count": 63,
204 |      "metadata": {},
205 |      "output_type": "execute_result"
206 |     }
207 |    ],
208 |    "source": [
209 |     "s"
210 |    ]
211 |   },
212 |   {
213 |    "cell_type": "markdown",
214 |    "metadata": {},
215 |    "source": [
216 |     "## Gif Production with counting"
217 |    ]
218 |   },
219 |   {
220 |    "cell_type": "code",
221 |    "execution_count": 78,
222 |    "metadata": {},
223 |    "outputs": [],
224 |    "source": [
225 |     "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n",
226 |     "fps = cap.get(cv2.CAP_PROP_FPS)\n",
227 |     "initial_target = int(45 * fps) + 10\n",
228 |     "final_target = int(49 * fps) + 10\n",
229 |     "i = 0\n",
230 |     "frames = []\n",
231 |     "while(True):\n",
232 |     "    ret, frame = cap.read()\n",
233 |     "    i += 1\n",
234 |     "    if i in range(initial_target, final_target, 1):\n",
235 |     "        frames.append(frame[:,:,::-1])\n",
236 |     "    if i == final_target:\n",
237 |     "        break"
238 |    ]
239 |   },
240 |   {
241 |    "cell_type": "code",
242 |    "execution_count": 79,
243 |    "metadata": {},
244 |    "outputs": [],
245 |    "source": [
246 |     "detections = []\n",
247 |     "for i, frame in enumerate(frames):\n",
248 |     "    with tf.Graph().as_default():\n",
249 |     "        b = evaluate.evaluate(weight_file_path=weights_path, img=frame,\n",
250 |     "                              prob_thresh=0.5, nms_thresh=0.1, lw=3, \n",
display=False, save=False, draw=False, print_=0)\n", 252 | " detections.append(b[0])\n", 253 | " time.sleep(0.5)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 72, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "## Computing incremental count\n", 263 | "nbs = []\n", 264 | "init = len(all_detections[0][0])\n", 265 | "for j in range(1, 10):\n", 266 | " nbs.append(init)\n", 267 | " detections_ = all_detections[j]\n", 268 | " init += len(detections_[0]) - matcheds[j-1]\n", 269 | "init += len(detections_[3]) - matcheds[j]\n", 270 | "nbs.append(init)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 117, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "k = 0\n", 280 | "l = 0\n", 281 | "images = []\n", 282 | "ff = []\n", 283 | "font = cv2.FONT_HERSHEY_SIMPLEX\n", 284 | "for j, frame in enumerate(frames):\n", 285 | " img = frame.copy()\n", 286 | " for detect_ in detections[j]:\n", 287 | " pt1, pt2 = tuple(detect_[:2]), tuple(detect_[2:])\n", 288 | " cv2.rectangle(img, pt1, pt2, (255, 0, 0), 2)\n", 289 | " cv2.putText(img, 'Incremental count : %d' % nbs[l], (1750,1300), font, 1.5, (0, 255, 0), 3)\n", 290 | " if j in range(10, 89, 9):\n", 291 | " l += 1\n", 292 | " images.append(img) \n", 293 | " cv2.imwrite('./output_video/frames_%05d.png' % j, img[:,:,::-1])" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "## Gif Production without counting" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 33, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "cap = cv2.VideoCapture('/home/alexattia/Work/RecVis/famvk.avi')\n", 310 | "fps = cap.get(cv2.CAP_PROP_FPS)\n", 311 | "initial_target = int(45 * fps) + 10\n", 312 | "final_target = int(49 * fps) + 10\n", 313 | "i = 0\n", 314 | "frames = []\n", 315 | "while(True):\n", 316 | " ret, frame = cap.read()\n", 317 | " i +=1 \n", 318 | " if i in range(initial_target, final_target, 1):\n", 319 | " frames.append(frame[:,:,::-1])\n", 320 | " if i == final_target:\n", 321 | " break" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 32, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "detections = []\n", 331 | "for i, frame in enumerate(frames):\n", 332 | " with tf.Graph().as_default():\n", 333 | " b = tiny.evaluate(weight_file_path=weights_path, data_dir='.jpg', output_dir='', framee=frame,\n", 334 | " prob_thresh=0.5, nms_thresh=0.1, lw=3, \n", 335 | " display=False, save=False, draw=False, print_=0)\n", 336 | " detections.append(b[0])\n", 337 | " time.sleep(0.5)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": 41, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "k = 0\n", 347 | "images = []\n", 348 | "for j, frame in enumerate(frames):\n", 349 | " img = frame.copy()\n", 350 | " for detect_ in detections[k]:\n", 351 | " pt1, pt2 = tuple(detect_[:2]), tuple(detect_[2:])\n", 352 | " cv2.rectangle(img, pt1, pt2, (255, 0, 0), 2)\n", 353 | " images.append(img)\n", 354 | " if j in range(0, 94, 2):\n", 355 | " k += 1\n", 356 | " cv2.imwrite('./output_video/frame_%05d.png' % j, img[:,:,::-1])" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [] 365 | } 366 | ], 367 | "metadata": { 368 | "kernelspec": { 369 | "display_name": "TensorFlow", 370 | "language": "python", 371 | "name": "tf_env" 372 | }, 373 
| "language_info": { 374 | "codemirror_mode": { 375 | "name": "ipython", 376 | "version": 3 377 | }, 378 | "file_extension": ".py", 379 | "mimetype": "text/x-python", 380 | "name": "python", 381 | "nbconvert_exporter": "python", 382 | "pygments_lexer": "ipython3", 383 | "version": "3.6.3" 384 | } 385 | }, 386 | "nbformat": 4, 387 | "nbformat_minor": 2 388 | } 389 | -------------------------------------------------------------------------------- /tiny_faces_model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from __future__ import absolute_import 3 | from __future__ import division 4 | from __future__ import print_function 5 | 6 | import tensorflow as tf 7 | import numpy as np 8 | import pickle 9 | 10 | class Model(): 11 | def __init__(self, weight_file_path): 12 | """Overlay bounding boxes of face on images. 13 | Args: 14 | weight_file_path: 15 | A pretrained weight file in the pickle format 16 | generated by matconvnet_hr101_to_tf.py. 17 | Returns: 18 | None. 19 | """ 20 | self.dtype = tf.float32 21 | self.weight_file_path = weight_file_path 22 | with open(self.weight_file_path, "rb") as f: 23 | self.mat_blocks_dict, self.mat_params_dict = pickle.load(f) 24 | 25 | def get_data_by_key(self, key): 26 | """Helper to access a pretrained model data through a key.""" 27 | assert key in self.mat_params_dict, "key: " + key + " not found." 28 | return self.mat_params_dict[key] 29 | 30 | def _weight_variable_on_cpu(self, name, shape): 31 | """Helper to create a weight Variable stored on CPU memory. 32 | 33 | Args: 34 | name: name of the variable. 35 | shape: list of ints: (height, width, channel, filter). 36 | 37 | Returns: 38 | initializer for Variable. 39 | """ 40 | assert len(shape) == 4 41 | 42 | weights = self.get_data_by_key(name + "_filter") # (h, w, channel, filter) 43 | assert list(weights.shape) == shape 44 | initializer = tf.constant_initializer(weights, dtype=self.dtype) 45 | 46 | with tf.device('/cpu:0'): 47 | var = tf.get_variable(name + "_w", shape, initializer=initializer, dtype=self.dtype) 48 | return var 49 | 50 | def _bias_variable_on_cpu(self, name, shape): 51 | """Helper to create a bias Variable stored on CPU memory. 52 | 53 | Args: 54 | name: name of the variable. 55 | shape: int, filter size. 56 | 57 | Returns: 58 | initializer for Variable. 59 | """ 60 | assert isinstance(shape, int) 61 | bias = self.get_data_by_key(name + "_bias") 62 | assert len(bias) == shape 63 | initializer = tf.constant_initializer(bias, dtype=self.dtype) 64 | 65 | with tf.device('/cpu:0'): 66 | var = tf.get_variable(name + "_b", shape, initializer=initializer, dtype=self.dtype) 67 | return var 68 | 69 | 70 | def _bn_variable_on_cpu(self, name, shape): 71 | """Helper to create a batch normalization Variable stored on CPU memory. 72 | 73 | Args: 74 | name: name of the variable. 75 | shape: int, filter size. 76 | 77 | Returns: 78 | initializer for Variable. 
79 | """ 80 | assert isinstance(shape, int) 81 | 82 | name2 = "bn" + name[3:] 83 | if name.startswith("conv"): 84 | name2 = "bn_" + name 85 | 86 | scale = self.get_data_by_key(name2 + '_scale') 87 | offset = self.get_data_by_key(name2 + '_offset') 88 | mean = self.get_data_by_key(name2 + '_mean') 89 | variance = self.get_data_by_key(name2 + '_variance') 90 | 91 | with tf.device('/cpu:0'): 92 | initializer = tf.constant_initializer(scale, dtype=self.dtype) 93 | scale = tf.get_variable(name2 + "_scale", shape, initializer=initializer, dtype=self.dtype) 94 | initializer = tf.constant_initializer(offset, dtype=self.dtype) 95 | offset = tf.get_variable(name2 + "_offset", shape, initializer=initializer, dtype=self.dtype) 96 | initializer = tf.constant_initializer(mean, dtype=self.dtype) 97 | mean = tf.get_variable(name2 + "_mean", shape, initializer=initializer, dtype=self.dtype) 98 | initializer = tf.constant_initializer(variance, dtype=self.dtype) 99 | variance = tf.get_variable(name2 + "_variance", shape, initializer=initializer, dtype=self.dtype) 100 | 101 | return scale, offset, mean, variance 102 | 103 | 104 | def conv_block(self, bottom, name, shape, strides=[1,1,1,1], padding="SAME", 105 | has_bias=False, add_relu=True, add_bn=True, eps=1.0e-5): 106 | """Create a block composed of multiple layers: 107 | a conv layer 108 | a batch normalization layer 109 | an activation layer 110 | 111 | Args: 112 | bottom: A layer before this block. 113 | name: Name of the block. 114 | shape: List of ints: (height, width, channel, filter). 115 | strides: Strides of conv layer. 116 | padding: Padding of conv layer. 117 | has_bias: Whether a bias term is added. 118 | add_relu: Whether a ReLU layer is added. 119 | add_bn: Whether a batch normalization layer is added. 120 | eps: A small float number to avoid dividing by 0, used in a batch normalization layer. 121 | Returns: 122 | a block of layers 123 | """ 124 | assert len(shape) == 4 125 | 126 | weight = self._weight_variable_on_cpu(name, shape) 127 | conv = tf.nn.conv2d(bottom, weight, strides, padding=padding) 128 | if has_bias: 129 | bias = self._bias_variable_on_cpu(name, shape[3]) 130 | 131 | pre_activation = tf.nn.bias_add(conv, bias) if has_bias else conv 132 | 133 | if add_bn: 134 | # scale, offset, mean, variance = self._bn_variable_on_cpu("bn_" + name, shape[-1]) 135 | scale, offset, mean, variance = self._bn_variable_on_cpu(name, shape[-1]) 136 | pre_activation = tf.nn.batch_normalization(pre_activation, mean, variance, offset, scale, variance_epsilon=eps) 137 | 138 | relu = tf.nn.relu(pre_activation) if add_relu else pre_activation 139 | 140 | return relu 141 | 142 | 143 | def conv_trans_layer(self, bottom, name, shape, strides=[1,1,1,1], padding="SAME", has_bias=False): 144 | """Create a block composed of multiple layers: 145 | a transpose of conv layer 146 | an activation layer 147 | 148 | Args: 149 | bottom: A layer before this block. 150 | name: Name of the block. 151 | shape: List of ints: (height, width, channel, filter). 152 | strides: Strides of conv layer. 153 | padding: Padding of conv layer. 154 | has_bias: Whether a bias term is added. 155 | add_relu: Whether a ReLU layer is added. 
156 |         Returns:
157 |           a block of layers
158 |         """
159 |         assert len(shape) == 4
160 | 
161 |         weight = self._weight_variable_on_cpu(name, shape)
162 |         nb, h, w, nc = tf.split(tf.shape(bottom), num_or_size_splits=4)
163 |         output_shape = tf.stack([nb, (h - 1) * strides[1] - 3 + shape[0], (w - 1) * strides[2] - 3 + shape[1], nc])[:, 0]
164 |         conv = tf.nn.conv2d_transpose(bottom, weight, output_shape, strides, padding=padding)
165 |         if has_bias:
166 |             bias = self._bias_variable_on_cpu(name, shape[3])
167 | 
168 |         conv = tf.nn.bias_add(conv, bias) if has_bias else conv
169 | 
170 |         return conv
171 | 
172 |     def residual_block(self, bottom, name, in_channel, neck_channel, out_channel, trunk):
173 |         """Create a residual block.
174 | 
175 |         Args:
176 |           bottom: A layer before this block.
177 |           name: Name of the block.
178 |           in_channel: number of channels in the input tensor.
179 |           neck_channel: number of channels in the bottleneck block.
180 |           out_channel: number of channels in the output tensor.
181 |           trunk: the tensor on the identity path.
182 |         Returns:
183 |           a block of layers
184 |         """
185 |         _strides = [1, 2, 2, 1] if name.startswith("res3a") or name.startswith("res4a") else [1, 1, 1, 1]
186 |         res = self.conv_block(bottom, name + '_branch2a', shape=[1, 1, in_channel, neck_channel],
187 |                               strides=_strides, padding="VALID", add_relu=True)
188 |         res = self.conv_block(res, name + '_branch2b', shape=[3, 3, neck_channel, neck_channel],
189 |                               padding="SAME", add_relu=True)
190 |         res = self.conv_block(res, name + '_branch2c', shape=[1, 1, neck_channel, out_channel],
191 |                               padding="VALID", add_relu=False)
192 | 
193 |         res = trunk + res
194 |         res = tf.nn.relu(res)
195 | 
196 |         return res
197 | 
198 |     def tiny_face(self, image):
199 |         """Create a tiny face model.
200 | 
201 |         Args:
202 |           image: an input image.
203 |         Returns:
204 |           a score tensor
205 |         """
206 |         img = tf.pad(image, [[0, 0], [3, 3], [3, 3], [0, 0]], "CONSTANT")
207 |         conv = self.conv_block(img, 'conv1', shape=[7, 7, 3, 64], strides=[1, 2, 2, 1], padding="VALID", add_relu=True)
208 |         pool1 = tf.nn.max_pool(conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
209 | 
210 |         res2a_branch1 = self.conv_block(pool1, 'res2a_branch1', shape=[1, 1, 64, 256], padding="VALID", add_relu=False)
211 |         res2a = self.residual_block(pool1, 'res2a', 64, 64, 256, res2a_branch1)
212 |         res2b = self.residual_block(res2a, 'res2b', 256, 64, 256, res2a)
213 |         res2c = self.residual_block(res2b, 'res2c', 256, 64, 256, res2b)
214 | 
215 |         res3a_branch1 = self.conv_block(res2c, 'res3a_branch1', shape=[1, 1, 256, 512], strides=[1, 2, 2, 1], padding="VALID", add_relu=False)
216 |         res3a = self.residual_block(res2c, 'res3a', 256, 128, 512, res3a_branch1)
217 | 
218 |         res3b1 = self.residual_block(res3a, 'res3b1', 512, 128, 512, res3a)
219 |         res3b2 = self.residual_block(res3b1, 'res3b2', 512, 128, 512, res3b1)
220 |         res3b3 = self.residual_block(res3b2, 'res3b3', 512, 128, 512, res3b2)
221 | 
222 |         res4a_branch1 = self.conv_block(res3b3, 'res4a_branch1', shape=[1, 1, 512, 1024], strides=[1, 2, 2, 1], padding="VALID", add_relu=False)
223 |         res4a = self.residual_block(res3b3, 'res4a', 512, 256, 1024, res4a_branch1)
224 | 
225 |         res4b = res4a
226 |         for i in range(1, 23):
227 |             res4b = self.residual_block(res4b, 'res4b' + str(i), 1024, 256, 1024, res4b)
228 | 
229 |         score_res4 = self.conv_block(res4b, 'score_res4', shape=[1, 1, 1024, 125], padding="VALID",
230 |                                      has_bias=True, add_relu=False, add_bn=False)
231 |         score4 = self.conv_trans_layer(score_res4, 'score4', shape=[4, 4, 125, 125], strides=[1, 2, 2, 1], padding="SAME")
232 |         score_res3 = self.conv_block(res3b3, 'score_res3', shape=[1, 1, 512, 125], padding="VALID",
233 |                                      has_bias=True, add_bn=False, add_relu=False)
234 | 
235 |         bs, height, width = tf.split(tf.shape(score4), num_or_size_splits=4)[0:3]
236 |         _size = tf.convert_to_tensor([height[0], width[0]])
237 |         _offsets = tf.zeros([bs[0], 2])
238 |         score_res3c = tf.image.extract_glimpse(score_res3, _size, _offsets, centered=True, normalized=False)
239 | 
240 |         score_final = score4 + score_res3c
241 |         return score_final
242 | --------------------------------------------------------------------------------
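The 125 output channels above pack 25 template scores plus 4x25 box-regression channels, which evaluate.py splits apart. The `extract_glimpse` call fuses the upsampled stride-16 score map with the stride-8 one by taking a centered crop of the larger map; below is a hypothetical numpy stand-in for that centered, non-normalized glimpse (up to one pixel of rounding), with made-up spatial sizes:

```python
import numpy as np

def center_crop(x, h, w):
    """Centered spatial crop of an NHWC array (glimpse-like behaviour)."""
    H, W = x.shape[1], x.shape[2]
    top, left = (H - h) // 2, (W - w) // 2
    return x[:, top:top + h, left:left + w, :]

score_res3 = np.zeros((1, 63, 84, 125))  # made-up stride-8 map
score4     = np.zeros((1, 62, 83, 125))  # made-up upsampled stride-16 map
fused = score4 + center_crop(score_res3, 62, 83)
print(fused.shape)  # (1, 62, 83, 125)
```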
/util.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | from scipy.special import expit
  3 | import numpy as np
  4 | import cv2
  5 | 
  6 | def overlay_bounding_boxes(raw_img, refined_bboxes, lw, draw):
  7 |     """
  8 |     Overlay bounding boxes of faces on images.
  9 |     :param raw_img: target image.
 10 |     :param refined_bboxes: Bounding boxes of detected faces.
 11 |     :param lw: Line width of bounding boxes. If zero is specified,
 12 |         this is determined based on the confidence of each detection.
 13 |     :param draw: whether to draw the boxes on raw_img. :returns: bounding boxes
 14 |     """
 15 | 
 16 |     # Overlay bounding boxes on an image with a color based on the confidence.
 17 |     bboxes = []
 18 |     for r in refined_bboxes:
 19 |         _score = expit(r[4])
 20 |         cm_idx = int(np.ceil(_score * 255))
 21 |         rect_color = [int(np.ceil(x * 255)) for x in cm_data[cm_idx]]  # parula
 22 |         _lw = lw
 23 |         if lw == 0:  # line width of each bounding box is adaptively determined.
 24 |             bw, bh = r[2] - r[0] + 1, r[3] - r[1] + 1
 25 |             _lw = 1 if min(bw, bh) <= 20 else max(2, min(3, min(bh / 20, bw / 20)))
 26 |             _lw = int(np.ceil(_lw * _score))
 27 | 
 28 |         _r = [int(x) for x in r[:4]]
 29 |         if draw:
 30 |             cv2.rectangle(raw_img, (_r[0], _r[1]), (_r[2], _r[3]), rect_color, _lw)
 31 |         bboxes.append([_r[0], _r[1], _r[2], _r[3]])
 32 |     return bboxes
 33 | 
 34 | # colormap parula borrowed from
 35 | # https://github.com/BIDS/colormap/blob/master/fake_parula.py
 36 | cm_data = [[ 0.26710521, 0.03311059, 0.6188155 ],
 37 |            [ 0.26493929, 0.04780926, 0.62261795],
 38 |            [ 0.26260545, 0.06084214, 0.62619176],
 39 |            [ 0.26009691, 0.07264411, 0.62951561],
 40 |            [ 0.25740785, 0.08360391, 0.63256745],
 41 |            [ 0.25453369, 0.09395358, 0.63532497],
 42 |            [ 0.25147146, 0.10384228, 0.6377661 ],
 43 |            [ 0.24822014, 0.11337029, 0.6398697 ],
 44 |            [ 0.24478105, 0.12260661, 0.64161629],
 45 |            [ 0.24115816, 0.131599  , 0.6429888 ],
 46 |            [ 0.23735836, 0.14038009, 0.64397346],
 47 |            [ 0.23339166, 0.14897137, 0.64456048],
 48 |            [ 0.22927127, 0.15738602, 0.64474476],
 49 |            [ 0.22501278, 0.16563165, 0.64452595],
 50 |            [ 0.22063349, 0.17371215, 0.64390834],
 51 |            [ 0.21616055, 0.18162302, 0.64290515],
 52 |            [ 0.21161851, 0.18936156, 0.64153295],
 53 |            [ 0.20703353, 0.19692415, 0.63981287],
 54 |            [ 0.20243273, 0.20430706, 0.63776986],
 55 |            [ 0.19784363, 0.211507  , 0.63543183],
 56 |            [ 0.19329361, 0.21852157, 0.63282872],
 57 |            [ 0.18880937, 0.2253495 , 0.62999156],
 58 |            [ 0.18442119, 0.23198815, 0.62695569],
 59 |            [ 0.18014936, 0.23844124, 0.62374886],
 60 |            [ 0.17601569, 0.24471172, 0.62040016],
 61 |            [ 0.17204028, 0.25080356, 0.61693715],
 62 |            [ 0.16824123, 0.25672163, 0.6133854 ],
 63 |            [ 0.16463462, 0.26247158, 0.60976836],
 64 |            [ 0.16123449, 0.26805963, 0.60610723],
 65 |            [ 0.15805279, 0.27349243, 0.60242099],
 66 |            [ 0.15509948, 0.27877688, 0.59872645],
 67 |            [ 0.15238249, 0.28392004, 0.59503836],
 68 |            [ 0.14990781, 0.28892902, 0.59136956],
 69 |            [ 0.14767951, 0.29381086, 0.58773113],
 70 |            [ 0.14569979, 0.29857245, 0.58413255],
 71 |            [ 0.1439691 , 0.30322055, 0.58058191],
 72 |            [ 0.14248613, 0.30776167, 0.57708599],
 73 |            [ 0.14124797, 0.31220208, 0.57365049],
 74 |            [ 0.14025018, 0.31654779, 0.57028011],
 75 |            [ 0.13948691, 0.32080454, 0.5669787 ],
 76 |            [ 0.13895174, 0.32497744, 0.56375063],
 77 |            [ 0.13863958, 0.32907012, 0.56060453],
 78 |            [ 0.138537  , 0.3330895 , 0.55753513],
 79 |            [ 0.13863384, 0.33704026, 0.55454374],
 80 |            [ 0.13891931, 0.34092684, 0.55163126],
 81 |            [ 0.13938212, 0.34475344, 0.54879827],
 82 |            [ 0.14001061, 0.34852402, 0.54604503],
 83 |            [ 0.14079292, 0.35224233, 0.54337156],
 84 |            [ 0.14172091, 0.35590982, 0.54078769],
 85 |            [ 0.14277848, 0.35953205, 0.53828312],
 86 |            [ 0.14395358, 0.36311234, 0.53585661],
 87 |            [ 0.1452346 , 0.36665374, 0.5335074 ],
 88 |            [ 0.14661019, 0.3701591 , 0.5312346 ],
 89 |            [ 0.14807104, 0.37363011, 0.52904278],
 90 |            [ 0.1496059 , 0.3770697 , 0.52692951],
 91 |            [ 0.15120289, 0.3804813 , 0.52488853],
 92 |            [ 0.15285214, 0.38386729, 0.52291854],
 93 |            [ 0.15454421, 0.38722991, 0.52101815],
 94 |            [ 0.15627225, 0.39056998, 0.5191937 ],
 95 |            [ 0.15802555, 0.39389087, 0.5174364 ],
 96 |            [ 0.15979549, 0.39719482, 0.51574311],
 97 |            [ 0.16157425, 0.40048375, 0.51411214],
 98 |            [ 0.16335571, 0.40375871, 0.51254622],
 99 |            [ 0.16513234, 0.40702178, 0.51104174],
100 |            [ 0.1668964 , 0.41027528, 0.50959299],
101 |            [ 0.16864151, 0.41352084, 0.50819797],
102 |            [ 0.17036277, 0.41675941, 0.50685814],
103 |            [ 0.1720542 , 0.41999269, 0.50557008],
104 |            [ 0.17370932, 0.42322271, 0.50432818],
105 |            [ 0.17532301, 0.42645082,
0.50313007], 106 | [ 0.17689176, 0.42967776, 0.50197686], 107 | [ 0.17841013, 0.43290523, 0.5008633 ], 108 | [ 0.17987314, 0.43613477, 0.49978492], 109 | [ 0.18127676, 0.43936752, 0.49873901], 110 | [ 0.18261885, 0.44260392, 0.49772638], 111 | [ 0.18389409, 0.44584578, 0.49673978], 112 | [ 0.18509911, 0.44909409, 0.49577605], 113 | [ 0.18623135, 0.4523496 , 0.494833 ], 114 | [ 0.18728844, 0.45561305, 0.49390803], 115 | [ 0.18826671, 0.45888565, 0.49299567], 116 | [ 0.18916393, 0.46216809, 0.49209268], 117 | [ 0.18997879, 0.46546084, 0.49119678], 118 | [ 0.19070881, 0.46876472, 0.49030328], 119 | [ 0.19135221, 0.47208035, 0.48940827], 120 | [ 0.19190791, 0.47540815, 0.48850845], 121 | [ 0.19237491, 0.47874852, 0.4876002 ], 122 | [ 0.19275204, 0.48210192, 0.48667935], 123 | [ 0.19303899, 0.48546858, 0.48574251], 124 | [ 0.19323526, 0.48884877, 0.48478573], 125 | [ 0.19334062, 0.49224271, 0.48380506], 126 | [ 0.19335574, 0.49565037, 0.4827974 ], 127 | [ 0.19328143, 0.49907173, 0.48175948], 128 | [ 0.19311664, 0.50250719, 0.48068559], 129 | [ 0.192864 , 0.50595628, 0.47957408], 130 | [ 0.19252521, 0.50941877, 0.47842186], 131 | [ 0.19210087, 0.51289469, 0.47722441], 132 | [ 0.19159194, 0.516384 , 0.47597744], 133 | [ 0.19100267, 0.51988593, 0.47467988], 134 | [ 0.19033595, 0.52340005, 0.47332894], 135 | [ 0.18959113, 0.5269267 , 0.47191795], 136 | [ 0.18877336, 0.530465 , 0.47044603], 137 | [ 0.18788765, 0.53401416, 0.46891178], 138 | [ 0.18693822, 0.53757359, 0.46731272], 139 | [ 0.18592276, 0.54114404, 0.46563962], 140 | [ 0.18485204, 0.54472367, 0.46389595], 141 | [ 0.18373148, 0.5483118 , 0.46207951], 142 | [ 0.18256585, 0.55190791, 0.4601871 ], 143 | [ 0.18135481, 0.55551253, 0.45821002], 144 | [ 0.18011172, 0.55912361, 0.45615277], 145 | [ 0.17884392, 0.56274038, 0.45401341], 146 | [ 0.17755858, 0.56636217, 0.45178933], 147 | [ 0.17625543, 0.56998972, 0.44946971], 148 | [ 0.174952 , 0.57362064, 0.44706119], 149 | [ 0.17365805, 0.57725408, 0.44456198], 150 | [ 0.17238403, 0.58088916, 0.4419703 ], 151 | [ 0.17113321, 0.58452637, 0.43927576], 152 | [ 0.1699221 , 0.58816399, 0.43648119], 153 | [ 0.1687662 , 0.5918006 , 0.43358772], 154 | [ 0.16767908, 0.59543526, 0.43059358], 155 | [ 0.16667511, 0.59906699, 0.42749697], 156 | [ 0.16575939, 0.60269653, 0.42428344], 157 | [ 0.16495764, 0.6063212 , 0.42096245], 158 | [ 0.16428695, 0.60993988, 0.41753246], 159 | [ 0.16376481, 0.61355147, 0.41399151], 160 | [ 0.16340924, 0.61715487, 0.41033757], 161 | [ 0.16323549, 0.62074951, 0.40656329], 162 | [ 0.16326148, 0.62433443, 0.40266378], 163 | [ 0.16351136, 0.62790748, 0.39864431], 164 | [ 0.16400433, 0.63146734, 0.39450263], 165 | [ 0.16475937, 0.63501264, 0.39023638], 166 | [ 0.16579502, 0.63854196, 0.38584309], 167 | [ 0.16712921, 0.64205381, 0.38132023], 168 | [ 0.168779 , 0.64554661, 0.37666513], 169 | [ 0.17075915, 0.64901912, 0.37186962], 170 | [ 0.17308572, 0.65246934, 0.36693299], 171 | [ 0.1757732 , 0.65589512, 0.36185643], 172 | [ 0.17883344, 0.65929449, 0.3566372 ], 173 | [ 0.18227669, 0.66266536, 0.35127251], 174 | [ 0.18611159, 0.66600553, 0.34575959], 175 | [ 0.19034516, 0.66931265, 0.34009571], 176 | [ 0.19498285, 0.67258423, 0.3342782 ], 177 | [ 0.20002863, 0.67581761, 0.32830456], 178 | [ 0.20548509, 0.67900997, 0.3221725 ], 179 | [ 0.21135348, 0.68215834, 0.31587999], 180 | [ 0.2176339 , 0.68525954, 0.30942543], 181 | [ 0.22432532, 0.68831023, 0.30280771], 182 | [ 0.23142568, 0.69130688, 0.29602636], 183 | [ 0.23893914, 0.69424565, 0.28906643], 184 | [ 0.2468574 , 0.69712255, 
0.28194103], 185 | [ 0.25517514, 0.69993351, 0.27465372], 186 | [ 0.26388625, 0.70267437, 0.26720869], 187 | [ 0.27298333, 0.70534087, 0.25961196], 188 | [ 0.28246016, 0.70792854, 0.25186761], 189 | [ 0.29232159, 0.71043184, 0.2439642 ], 190 | [ 0.30253943, 0.71284765, 0.23594089], 191 | [ 0.31309875, 0.71517209, 0.22781515], 192 | [ 0.32399522, 0.71740028, 0.21959115], 193 | [ 0.33520729, 0.71952906, 0.21129816], 194 | [ 0.3467003 , 0.72155723, 0.20298257], 195 | [ 0.35846225, 0.72348143, 0.19466318], 196 | [ 0.3704552 , 0.72530195, 0.18639333], 197 | [ 0.38264126, 0.72702007, 0.17822762], 198 | [ 0.39499483, 0.72863609, 0.17020921], 199 | [ 0.40746591, 0.73015499, 0.1624122 ], 200 | [ 0.42001969, 0.73158058, 0.15489659], 201 | [ 0.43261504, 0.73291878, 0.14773267], 202 | [ 0.44521378, 0.73417623, 0.14099043], 203 | [ 0.45777768, 0.73536072, 0.13474173], 204 | [ 0.47028295, 0.73647823, 0.1290455 ], 205 | [ 0.48268544, 0.73753985, 0.12397794], 206 | [ 0.49497773, 0.73854983, 0.11957878], 207 | [ 0.5071369 , 0.73951621, 0.11589589], 208 | [ 0.51913764, 0.74044827, 0.11296861], 209 | [ 0.53098624, 0.74134823, 0.11080237], 210 | [ 0.5426701 , 0.74222288, 0.10940411], 211 | [ 0.55417235, 0.74308049, 0.10876749], 212 | [ 0.56550904, 0.74392086, 0.10885609], 213 | [ 0.57667994, 0.74474781, 0.10963233], 214 | [ 0.58767906, 0.74556676, 0.11105089], 215 | [ 0.59850723, 0.74638125, 0.1130567 ], 216 | [ 0.609179 , 0.74719067, 0.11558918], 217 | [ 0.61969877, 0.74799703, 0.11859042], 218 | [ 0.63007148, 0.74880206, 0.12200388], 219 | [ 0.64030249, 0.74960714, 0.12577596], 220 | [ 0.65038997, 0.75041586, 0.12985641], 221 | [ 0.66034774, 0.75122659, 0.1342004 ], 222 | [ 0.67018264, 0.75203968, 0.13876817], 223 | [ 0.67990043, 0.75285567, 0.14352456], 224 | [ 0.68950682, 0.75367492, 0.14843886], 225 | [ 0.69900745, 0.75449768, 0.15348445], 226 | [ 0.70840781, 0.75532408, 0.15863839], 227 | [ 0.71771325, 0.75615416, 0.16388098], 228 | [ 0.72692898, 0.75698787, 0.1691954 ], 229 | [ 0.73606001, 0.75782508, 0.17456729], 230 | [ 0.74511119, 0.75866562, 0.17998443], 231 | [ 0.75408719, 0.75950924, 0.18543644], 232 | [ 0.76299247, 0.76035568, 0.19091446], 233 | [ 0.77183123, 0.76120466, 0.19641095], 234 | [ 0.78060815, 0.76205561, 0.20191973], 235 | [ 0.78932717, 0.76290815, 0.20743538], 236 | [ 0.79799213, 0.76376186, 0.21295324], 237 | [ 0.8066067 , 0.76461631, 0.21846931], 238 | [ 0.81517444, 0.76547101, 0.22398014], 239 | [ 0.82369877, 0.76632547, 0.2294827 ], 240 | [ 0.832183 , 0.7671792 , 0.2349743 ], 241 | [ 0.8406303 , 0.76803167, 0.24045248], 242 | [ 0.84904371, 0.76888236, 0.24591492], 243 | [ 0.85742615, 0.76973076, 0.25135935], 244 | [ 0.86578037, 0.77057636, 0.25678342], 245 | [ 0.87410891, 0.77141875, 0.2621846 ], 246 | [ 0.88241406, 0.77225757, 0.26755999], 247 | [ 0.89070781, 0.77308772, 0.27291122], 248 | [ 0.89898836, 0.77391069, 0.27823228], 249 | [ 0.90725475, 0.77472764, 0.28351668], 250 | [ 0.91550775, 0.77553893, 0.28875751], 251 | [ 0.92375722, 0.7763404 , 0.29395046], 252 | [ 0.9320227 , 0.77712286, 0.29909267], 253 | [ 0.94027715, 0.7779011 , 0.30415428], 254 | [ 0.94856742, 0.77865213, 0.3091325 ], 255 | [ 0.95686038, 0.7793949 , 0.31397459], 256 | [ 0.965222 , 0.7800975 , 0.31864342], 257 | [ 0.97365189, 0.78076521, 0.32301107], 258 | [ 0.98227405, 0.78134549, 0.32678728], 259 | [ 0.99136564, 0.78176999, 0.3281624 ], 260 | [ 0.99505988, 0.78542889, 0.32106514], 261 | [ 0.99594185, 0.79046888, 0.31648808], 262 | [ 0.99646635, 0.79566972, 0.31244662], 263 | [ 0.99681528, 0.80094905, 
0.30858532], 264 | [ 0.9970578 , 0.80627441, 0.30479247], 265 | [ 0.99724883, 0.81161757, 0.30105328], 266 | [ 0.99736711, 0.81699344, 0.29725528], 267 | [ 0.99742254, 0.82239736, 0.29337235], 268 | [ 0.99744736, 0.82781159, 0.28943391], 269 | [ 0.99744951, 0.83323244, 0.28543062], 270 | [ 0.9973953 , 0.83867931, 0.2812767 ], 271 | [ 0.99727248, 0.84415897, 0.27692897], 272 | [ 0.99713953, 0.84963903, 0.27248698], 273 | [ 0.99698641, 0.85512544, 0.26791703], 274 | [ 0.99673736, 0.86065927, 0.26304767], 275 | [ 0.99652358, 0.86616957, 0.25813608], 276 | [ 0.99622774, 0.87171946, 0.25292044], 277 | [ 0.99590494, 0.87727931, 0.24750009], 278 | [ 0.99555225, 0.88285068, 0.2418514 ], 279 | [ 0.99513763, 0.8884501 , 0.23588062], 280 | [ 0.99471252, 0.89405076, 0.2296837 ], 281 | [ 0.99421873, 0.89968246, 0.2230963 ], 282 | [ 0.99370185, 0.90532165, 0.21619768], 283 | [ 0.99313786, 0.91098038, 0.2088926 ], 284 | [ 0.99250707, 0.91666811, 0.20108214], 285 | [ 0.99187888, 0.92235023, 0.19290417], 286 | [ 0.99110991, 0.92809686, 0.18387963], 287 | [ 0.99042108, 0.93379995, 0.17458127], 288 | [ 0.98958484, 0.93956962, 0.16420166], 289 | [ 0.98873988, 0.94533859, 0.15303117], 290 | [ 0.98784836, 0.95112482, 0.14074826], 291 | [ 0.98680727, 0.95697596, 0.12661626]] 292 | --------------------------------------------------------------------------------