├── Config.py
├── README.md
├── Test.py
├── Train.py
├── __pycache__
│   ├── Config.cpython-37.pyc
│   ├── augmentations.cpython-37.pyc
│   ├── detection.cpython-37.pyc
│   ├── l2norm.cpython-37.pyc
│   ├── loss_function.cpython-37.pyc
│   ├── my_window.cpython-37.pyc
│   ├── ssd_net_vgg.cpython-37.pyc
│   ├── utils.cpython-37.pyc
│   └── voc0712.cpython-37.pyc
├── augmentations.py
├── bus_dataset.log
├── camera.py
├── camera_detection.py
├── camera_detection_1.py
├── detection.py
├── dnf_test.jpg
├── dnf_test_done.jpg
├── environment.yml
├── eval.py
├── l2norm.py
├── loss_function.py
├── model_file_test.py
├── result.jpg
├── ssd_net_vgg.py
├── test.jpg
├── test_done.jpg
├── utils.py
├── video_detection.py
├── voc0712.py
└── weights
    └── readme.txt

/Config.py:
--------------------------------------------------------------------------------
1 | '''
2 | This project is my free, open-source project on GitHub (Gitee for users in China). If you paid to download it on some platform (CSDN, Taobao), please let me know by email (PengfeiM@outlook.com).
3 | '''
4 | import os.path as osp
5 | sk = [ 15, 30, 60, 111, 162, 213, 264 ]
6 | feature_map = [ 38, 19, 10, 5, 3, 1 ]
7 | steps = [ 8, 16, 32, 64, 100, 300 ]
8 | image_size = 300
9 | aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
10 | MEANS = (104, 117, 123)
11 | batch_size = 8
12 | data_load_number_worker = 0
13 | lr = 5e-4
14 | momentum = 0.9
15 | weight_decacy = 5e-4
16 | gamma = 0.1
17 | VOC_ROOT = osp.join('./', "dataset/")
18 | dataset_root = VOC_ROOT
19 | use_cuda = True
20 | lr_steps = (80000, 100000, 120000)
21 | max_iter = 120000
22 | class_num = 5
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fatigue Driving Detection Based on CNN(基于 CNN 的疲劳驾驶检测)
2 | A simple undergraduate-thesis side project.
3 | ## Update(更新)
4 | > [!NOTE] I haven't updated this project for a long time, for various reasons. I finally found the time to set my development environment up again.
5 | > I'm glad to see that people are still following this project; I just couldn't reply to your questions in time for personal reasons. Please bear with me.
6 | > Over the next while I will fix some of the problems in this project in my spare time (mainly version incompatibilities). I have also committed my own environment to the
7 | > repo: if you use miniconda or anaconda, you can import the project's virtual environment directly from `environment.yml`.
8 | > Even if you don't use conda, I think the list can still help you set up an environment.
9 | > For now I will debug the project on CPU; if possible, I will add the GPU environment setup and instructions later.
10 | 
11 | **My hardware configuration**
12 | > For reference
13 | ```bash
14 | _,met$$$$$gg. revan_m@Ebon-Hawk
15 | ,g$$$$$$$$$$$$$$$P. -----------------
16 | ,g$$P"" """Y$$.". OS: Debian GNU/Linux 12 (bookworm) x86_64
17 | ,$$P' `$$$. Host: Windows Subsystem for Linux - Debian (2.4.13)
18 | ',$$P ,ggs. `$$b: Kernel: Linux 5.15.167.4-microsoft-standard-WSL2
19 | `d$$' ,$P"' . $$$
20 | $$P d$' , $$P
21 | $$: $$. - ,d$$' Shell: zsh 5.9
22 | $$; Y$b._ _,d$P' WM: WSLg 1.0.65 (Wayland)
23 | Y$$. `.`"Y$$$$P"' Terminal: tmux 3.5a
24 | `$$b "-.__ CPU: 11th Gen Intel(R) Core(TM) i7-11800H (4) @ 2.30 GHz
25 | `Y$$b GPU 1: Microsoft Basic Render Driver
26 | `Y$$. GPU 2: Microsoft Basic Render Driver
27 | `$$b. Memory: 740.60 MiB / 7.63 GiB (9%)
28 | `Y$$b. Swap: 0 B / 2.00 GiB (0%)
29 | `"Y$b._
30 | `""""
31 | 
32 | 
33 | 
34 | 
35 | Battery (Microsoft Hyper-V Virtual Batte): 100% [AC Connected]
36 | Locale: zh_CN.UTF-8
37 | ```
38 | Ah, it looks like the GPU info is missing here. My machine actually has a discrete RTX 3060 Laptop GPU; if you know this card, you roughly know what it can do, so I won't elaborate.
39 | 
40 | 
41 | ## Disclaimer(郑重声明):
42 | This project is my free, open-source project on GitHub (Gitee for users in China). I have not authorized any platform (CSDN, Taobao) to sell it.
43 | This project is open-source on github and gitee.
44 | No authorization to any platform to sell my project.
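
As mentioned in the Update section above, the committed `environment.yml` can be imported directly. A minimal sketch of what that looks like with conda (the environment name `torch` comes from the file itself):

```bash
# create the environment from the exported spec, then activate it
conda env create -f environment.yml
conda activate torch
```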
45 | 
46 | > [!IMPORTANT] If you need the thesis, please send an [email](mailto:PengfeiM@outlook.com) directly instead of opening an issue; issues are for problems with the code or for sharing your ideas.
47 | 
48 | ## Runtime Environment(运行环境):
49 | 
50 | > 1. Python 3.7.1
51 | > 2. PyTorch 1.0.1
52 | > 3. opencv-python
53 | > 4. CUDA was probably 8 or 9; it has been too long to remember. What really matters is that the GPU, CUDA, cuDNN and PyTorch versions match each other.
54 | 
55 | ## Notes(说明)
56 | 
57 | Pre-trained weight file: [vgg_16]
58 | 
59 | 1. For the detailed configuration, see Config.py -- the file that saves the configuration
60 | 2. Training: `python Train.py` -- the file that starts the training and controls the loops
61 | 3. Single-image test: `python Test.py` -- the file that tests the SSD with one image
62 | 4. Network evaluation: `python eval.py` -- the file that evaluates the performance
63 | 5. Video test: `python camera_detection.py` -- the file that tests the CNN with a video sequence
64 | 
65 | ## Progress(目前进度): All Done
66 | 
67 | | Item | Status |
68 | | ---------------- | ---- |
69 | | PERCLOS computation | DONE |
70 | | Blink-frequency computation | DONE |
71 | | Yawn detection and counting | DONE |
72 | | Fatigue decision | DONE |
73 | 
74 | ## Main Files(主要文件说明):
75 | 
76 | ssd_net_vgg.py -- defines the SSD class (the SSD CNN)
77 | Train.py -- training code
78 | voc0712.py -- dataset-processing code (the filename was kept; renaming it would mean touching other code as well)
79 | loss_function.py -- the loss function
80 | detection.py -- post-processes the detection results, turning the raw SSD output into a form OpenCV can draw
81 | eval.py -- evaluates network performance
82 | Test.py -- single-image test. PS: there is no argument interface, so to test a different image you edit the filename inside the script
83 | l2norm.py -- L2 normalization
84 | Config.py -- configuration parameters
85 | utils.py -- utility functions
86 | camera.py -- OpenCV camera test
87 | camera_detection.py -- camera detection, V1/V2
88 | video_detection.py -- video detection, V3
89 | 
90 | ## Dataset Layout(数据集结构):
91 | 
92 | > /dataset:
93 | >
94 | > > /Annotations -- the xml files containing the object annotations
95 | > > /ImageSets/Main -- the files listing the image names
96 | > > /JPEGImages -- the images
97 | > > /gray2rgb.m -- converts grayscale images to three channels
98 | > > /txt.py -- the script that generates the ImageSets files
99 | 
100 | ## Weight File Directory(权重文件存放路径):
101 | 
102 | weights
103 | Tested images are written to:
104 | tested
105 | 
106 | ## Reference Code(参考代码):
107 | 
108 | https://github.com/amdegroot/ssd.pytorch
109 | 
110 | ## Dataset and Weight Files(数据集和权重文件):
111 | (As for the file some of the code refers to (ssd_voc_5000_plus.pth): I dug through an old USB drive and managed to find it.)
112 | Baidu Netdisk:
113 | [Dataset and weights](https://pan.baidu.com/s/1cgl94gxSNEW0ZI-wYcZtpQ)
114 | Extraction code: hwsi
115 | Onedrive:
116 | [Dataset](https://mailustceducn-my.sharepoint.com/:u:/g/personal/mpf916_mail_ustc_edu_cn/ER0UB-cAe1VDp9hJZ7e5Ef4B7kGvVX4PePSj7WRtb9VrLQ?e=lbDnjV)
117 | [Weights](https://mailustceducn-my.sharepoint.com/:f:/g/personal/mpf916_mail_ustc_edu_cn/EqGCPA3SGz5Mp-RMHJSoSSwBg-KG09qwgSAPiOjMOcVVtQ?e=v5yhQz)
118 | 
119 | ## Testing(测试)
120 | 
121 | 1. Run Train.py to train.
122 | 2. eval.py tests the whole test set; Test.py tests a single image.
123 | 
124 | ## Issues and Discussions(关于问题讨论)
125 | Feel free to open an issue about problems in the code. Discussions are also enabled on this repository, so please raise common questions in Discussions (I will move some of the older issues there as well).
126 | 
127 | ## Contact(关于咨询)
128 | If issues and Discussions don't cover what you need, you can always email me (PengfeiM@outlook.com) with your question.
129 | Whether it is an issue/discussion or an email, I will reply as soon as I can (GitHub notifies me of updates by email, and I regularly check the GitHub mobile app).
130 | 
131 | **Finally, if you would like to support my work, please scan the QR code below**
132 | ![My Alipay](https://user-images.githubusercontent.com/45191163/116050673-55db0400-a6aa-11eb-9588-cc0546e89f70.jpg)
133 | 
134 | **Thank you for your support and help**
135 | ## Star History
136 | [![Star History Chart](https://api.star-history.com/svg?repos=PengfeiM/Fatigue-Driven-Detection-Based-on-CNN&type=Date)](https://www.star-history.com/#PengfeiM/Fatigue-Driven-Detection-Based-on-CNN&Date)
137 | 
138 | 
--------------------------------------------------------------------------------
/Test.py:
--------------------------------------------------------------------------------
1 | import torch
2 | # import pdb
3 | 
4 | from torch.autograd import Variable
5 | from detection import Detect
6 | # from ssd_net_vgg import *
7 | from ssd_net_vgg import SSD
8 | # from voc0712 import *
9 | from voc0712 import VOC_CLASSES
10 | import Config as config
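# What follows below: build the SSD, load the trained weights, preprocess a single image
# (resize to 300x300, subtract the BGR channel means, flip to RGB, reorder to CHW),
# decode the network output with Detect, then draw the boxes with OpenCV.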
11 | import torch.nn as nn 12 | import numpy as np 13 | import cv2 14 | import utils 15 | if torch.cuda.is_available(): 16 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 17 | colors_tableau = [(255, 255, 255), (31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120), 18 | (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150), 19 | (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148), 20 | (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199), 21 | (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229), (158, 218, 229), (158, 218, 229)] 22 | 23 | net = SSD() # initialize SSD 24 | net = torch.nn.DataParallel(net) 25 | net.train(mode=False) 26 | # net.load_state_dict(torch.load('./weights/ssd300_VOC_100000.pth',map_location=lambda storage, loc: storage)) 27 | net.load_state_dict(torch.load('./weights/ssd_voc_120000.pth', map_location=lambda storage, loc: storage)) 28 | img_id = 60 29 | name = 'test' 30 | image = cv2.imread('./' + name + '.jpg', cv2.IMREAD_COLOR) 31 | x = cv2.resize(image, (300, 300)).astype(np.float32) 32 | x -= (104.0, 117.0, 123.0) 33 | x = x.astype(np.float32) 34 | x = x[:, :, ::-1].copy() 35 | # plt.imshow(x) 36 | x = torch.from_numpy(x).permute(2, 0, 1) 37 | xx = Variable(x.unsqueeze(0)) # wrap tensor in Variable 38 | if torch.cuda.is_available(): 39 | xx = xx.cuda() 40 | y = net(xx) 41 | softmax = nn.Softmax(dim=-1) 42 | # detect = Detect(config.class_num, 0, 200, 0.01, 0.45) 43 | detect = Detect.apply # pytorch新版本需要这样使用 44 | priors = utils.default_prior_box() 45 | 46 | loc, conf = y 47 | loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1) 48 | conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1) 49 | 50 | detections = detect( 51 | loc.view(loc.size(0), -1, 4), 52 | softmax(conf.view(conf.size(0), -1, config.class_num)), 53 | torch.cat([o.view(-1, 4) for o in priors], 0), 54 | config.class_num, 55 | 200, 56 | 0.7, 57 | 0.45 58 | ).data 59 | # detections = detect.apply 60 | 61 | labels = VOC_CLASSES 62 | top_k = 10 63 | 64 | # plt.imshow(rgb_image) # plot the image for matplotlib 65 | 66 | # scale each detection back up to the image 67 | scale = torch.Tensor(image.shape[1::-1]).repeat(2) 68 | for i in range(detections.size(1)): 69 | j = 0 70 | while detections[0, i, j, 0] >= 0.4: 71 | score = detections[0, i, j, 0] 72 | label_name = labels[i - 1] 73 | display_txt = '%s: %.2f' % (label_name, score) 74 | pt = (detections[0, i, j, 1:] * scale).cpu().numpy() 75 | rec = [int(e) for e in pt] 76 | # pdb.set_trace() 77 | coords = (pt[0], pt[1]), pt[2] - pt[0] + 1, pt[3] - pt[1] + 1 78 | color = colors_tableau[i] 79 | # cv2.rectangle(image, (pt[0], pt[1]), (pt[2], pt[3]), color, 2) 80 | print(rec) 81 | cv2.rectangle(img=image, rec=rec, color=color, thickness=2) 82 | cv2.putText(image, display_txt, (int(pt[0]), int(pt[1]) + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1, 8) 83 | j += 1 84 | # cv2.imshow('test', image) 85 | # cv2.waitKey(100000) # not implemented in headless version 86 | print("------end-------") 87 | cv2.imwrite(name + '_done.jpg', image) 88 | -------------------------------------------------------------------------------- /Train.py: -------------------------------------------------------------------------------- 1 | ''' 2 | 本项目是我在github(国内的话是gitee)的免费开源项目。如果你在某些平台(CSDN、淘宝)付费下载了该项目,烦请告知(邮箱(PengfeiM@outlook.com))。 3 | ''' 4 | 5 | import torch 6 | import Config 7 | if Config.use_cuda: 8 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 9 | if not Config.use_cuda: 10 | 
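    # note: this warning text appears to be inherited from the upstream ssd.pytorch train script;
    # it is printed whenever Config.use_cuda is False, even if no CUDA device exists (this script has no --cuda flag)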
print("WARNING: It looks like you have a CUDA device, but aren't " + 11 | "using CUDA.\nRun with --cuda for optimal training speed.") 12 | torch.set_default_tensor_type('torch.FloatTensor') 13 | 14 | import torch.nn as nn 15 | import cv2 16 | import utils 17 | import loss_function 18 | import voc0712 19 | import augmentations 20 | import ssd_net_vgg 21 | import torch.utils.data as data 22 | import torch.optim as optim 23 | from torch.autograd import Variable 24 | def adjust_learning_rate(optimizer, gamma, step): 25 | """Sets the learning rate to the initial LR decayed by 10 at every 26 | specified step 27 | # Adapted from PyTorch Imagenet example: 28 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 29 | """ 30 | lr = Config.lr * (gamma ** (step)) 31 | for param_group in optimizer.param_groups: 32 | param_group['lr'] = lr 33 | def detection_collate(batch): 34 | """Custom collate fn for dealing with batches of images that have a different 35 | number of associated object annotations (bounding boxes). 36 | 37 | Arguments: 38 | batch: (tuple) A tuple of tensor images and lists of annotations 39 | 40 | Return: 41 | A tuple containing: 42 | 1) (tensor) batch of images stacked on their 0 dim 43 | 2) (list of tensors) annotations for a given image are stacked on 44 | 0 dim 45 | """ 46 | targets = [] 47 | imgs = [] 48 | for sample in batch: 49 | imgs.append(sample[0]) 50 | targets.append(torch.FloatTensor(sample[1])) 51 | return torch.stack(imgs, 0), targets 52 | def xavier(param): 53 | nn.init.xavier_uniform_(param) 54 | def weights_init(m): 55 | if isinstance(m, nn.Conv2d): 56 | xavier(m.weight.data) 57 | m.bias.data.zero_() 58 | def train(): 59 | dataset = voc0712.VOCDetection(root=Config.dataset_root, 60 | transform=augmentations.SSDAugmentation(Config.image_size, 61 | Config.MEANS)) 62 | data_loader = data.DataLoader(dataset, Config.batch_size, 63 | num_workers=Config.data_load_number_worker, 64 | shuffle=True, collate_fn=detection_collate, 65 | pin_memory=True) 66 | 67 | net = ssd_net_vgg.SSD() 68 | vgg_weights = torch.load('./weights/vgg16_reducedfc.pth') 69 | 70 | net.apply(weights_init) 71 | net.vgg.load_state_dict(vgg_weights) 72 | # net.apply(weights_init) 73 | if Config.use_cuda: 74 | net = torch.nn.DataParallel(net) 75 | net = net.cuda() 76 | net.train() 77 | loss_fun = loss_function.LossFun() 78 | optimizer = optim.SGD(net.parameters(), lr=Config.lr, momentum=Config.momentum, 79 | weight_decay=Config.weight_decacy) 80 | iter = 0 81 | step_index = 0 82 | before_epoch = -1 83 | for epoch in range(1000): 84 | for step,(img,target) in enumerate(data_loader): 85 | if Config.use_cuda: 86 | img = img.cuda() 87 | target = [ann.cuda() for ann in target] 88 | img = torch.Tensor(img) 89 | loc_pre,conf_pre = net(img) 90 | priors = utils.default_prior_box() 91 | optimizer.zero_grad() 92 | loss_l,loss_c = loss_fun((loc_pre,conf_pre),target,priors) 93 | loss = loss_l + loss_c 94 | loss.backward() 95 | optimizer.step() 96 | if iter % 1 == 0 or before_epoch!=epoch: 97 | print('epoch : ',epoch,' iter : ',iter,' step : ',step,' loss : ',loss.item()) 98 | before_epoch = epoch 99 | iter+=1 100 | if iter in Config.lr_steps: 101 | step_index+=1 102 | adjust_learning_rate(optimizer,Config.gamma,step_index) 103 | if iter % 10000 == 0 and iter!=0: 104 | torch.save(net.state_dict(), 'weights/ssd300_VOC_' + 105 | repr(iter) + '.pth') 106 | if iter >= Config.max_iter: 107 | break 108 | torch.save(net.state_dict(), 'weights/ssd_voc_120000.pth') 109 | 110 | if __name__ == '__main__': 111 | 
train() 112 | 113 | 114 | 115 | -------------------------------------------------------------------------------- /__pycache__/Config.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/Config.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/augmentations.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/augmentations.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/detection.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/detection.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/l2norm.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/l2norm.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/loss_function.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/loss_function.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/my_window.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/my_window.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/ssd_net_vgg.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/ssd_net_vgg.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /__pycache__/voc0712.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/__pycache__/voc0712.cpython-37.pyc -------------------------------------------------------------------------------- /augmentations.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torchvision import transforms 3 | import cv2 4 | import numpy as np 5 | import 
types 6 | from numpy import random 7 | 8 | def intersect(box_a, box_b): 9 | max_xy = np.minimum(box_a[:, 2:], box_b[2:]) 10 | min_xy = np.maximum(box_a[:, :2], box_b[:2]) 11 | inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf) 12 | return inter[:, 0] * inter[:, 1] 13 | 14 | 15 | def jaccard_numpy(box_a, box_b): 16 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 17 | is simply the intersection over union of two boxes. 18 | E.g.: 19 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 20 | Args: 21 | box_a: Multiple bounding boxes, Shape: [num_boxes,4] 22 | box_b: Single bounding box, Shape: [4] 23 | Return: 24 | jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]] 25 | """ 26 | inter = intersect(box_a, box_b) 27 | area_a = ((box_a[:, 2]-box_a[:, 0]) * 28 | (box_a[:, 3]-box_a[:, 1])) # [A,B] 29 | area_b = ((box_b[2]-box_b[0]) * 30 | (box_b[3]-box_b[1])) # [A,B] 31 | union = area_a + area_b - inter 32 | return inter / union # [A,B] 33 | 34 | 35 | class Compose(object): 36 | """Composes several augmentations together. 37 | Args: 38 | transforms (List[Transform]): list of transforms to compose. 39 | Example: 40 | >>> augmentations.Compose([ 41 | >>> transforms.CenterCrop(10), 42 | >>> transforms.ToTensor(), 43 | >>> ]) 44 | """ 45 | 46 | def __init__(self, transforms): 47 | self.transforms = transforms 48 | 49 | def __call__(self, img, boxes=None, labels=None): 50 | for t in self.transforms: 51 | img, boxes, labels = t(img, boxes, labels) 52 | return img, boxes, labels 53 | 54 | 55 | class Lambda(object): 56 | """Applies a lambda as a transform.""" 57 | 58 | def __init__(self, lambd): 59 | assert isinstance(lambd, types.LambdaType) 60 | self.lambd = lambd 61 | 62 | def __call__(self, img, boxes=None, labels=None): 63 | return self.lambd(img, boxes, labels) 64 | 65 | 66 | class ConvertFromInts(object): 67 | def __call__(self, image, boxes=None, labels=None): 68 | return image.astype(np.float32), boxes, labels 69 | 70 | 71 | class SubtractMeans(object): 72 | def __init__(self, mean): 73 | self.mean = np.array(mean, dtype=np.float32) 74 | 75 | def __call__(self, image, boxes=None, labels=None): 76 | image = image.astype(np.float32) 77 | image -= self.mean 78 | return image.astype(np.float32), boxes, labels 79 | 80 | 81 | class ToAbsoluteCoords(object): 82 | def __call__(self, image, boxes=None, labels=None): 83 | height, width, channels = image.shape 84 | boxes[:, 0] *= width 85 | boxes[:, 2] *= width 86 | boxes[:, 1] *= height 87 | boxes[:, 3] *= height 88 | 89 | return image, boxes, labels 90 | 91 | 92 | class ToPercentCoords(object): 93 | def __call__(self, image, boxes=None, labels=None): 94 | height, width, channels = image.shape 95 | boxes[:, 0] /= width 96 | boxes[:, 2] /= width 97 | boxes[:, 1] /= height 98 | boxes[:, 3] /= height 99 | 100 | return image, boxes, labels 101 | 102 | 103 | class Resize(object): 104 | def __init__(self, size=300): 105 | self.size = size 106 | 107 | def __call__(self, image, boxes=None, labels=None): 108 | image = cv2.resize(image, (self.size, 109 | self.size)) 110 | return image, boxes, labels 111 | 112 | 113 | class RandomSaturation(object): 114 | def __init__(self, lower=0.5, upper=1.5): 115 | self.lower = lower 116 | self.upper = upper 117 | assert self.upper >= self.lower, "contrast upper must be >= lower." 118 | assert self.lower >= 0, "contrast lower must be non-negative." 
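    # note: __call__ below assumes a float HSV image (PhotometricDistort converts to HSV first);
    # channel 1 is the saturation channel, scaled by a random factor in [lower, upper] on half of the calls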
119 | 120 | def __call__(self, image, boxes=None, labels=None): 121 | if random.randint(2): 122 | image[:, :, 1] *= random.uniform(self.lower, self.upper) 123 | 124 | return image, boxes, labels 125 | 126 | 127 | class RandomHue(object): 128 | def __init__(self, delta=18.0): 129 | assert delta >= 0.0 and delta <= 360.0 130 | self.delta = delta 131 | 132 | def __call__(self, image, boxes=None, labels=None): 133 | if random.randint(2): 134 | image[:, :, 0] += random.uniform(-self.delta, self.delta) 135 | image[:, :, 0][image[:, :, 0] > 360.0] -= 360.0 136 | image[:, :, 0][image[:, :, 0] < 0.0] += 360.0 137 | return image, boxes, labels 138 | 139 | 140 | class RandomLightingNoise(object): 141 | def __init__(self): 142 | self.perms = ((0, 1, 2), (0, 2, 1), 143 | (1, 0, 2), (1, 2, 0), 144 | (2, 0, 1), (2, 1, 0)) 145 | 146 | def __call__(self, image, boxes=None, labels=None): 147 | if random.randint(2): 148 | swap = self.perms[random.randint(len(self.perms))] 149 | shuffle = SwapChannels(swap) # shuffle channels 150 | image = shuffle(image) 151 | return image, boxes, labels 152 | 153 | 154 | class ConvertColor(object): 155 | def __init__(self, current='BGR', transform='HSV'): 156 | self.transform = transform 157 | self.current = current 158 | 159 | def __call__(self, image, boxes=None, labels=None): 160 | if self.current == 'BGR' and self.transform == 'HSV': 161 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 162 | elif self.current == 'HSV' and self.transform == 'BGR': 163 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 164 | else: 165 | raise NotImplementedError 166 | return image, boxes, labels 167 | 168 | 169 | class RandomContrast(object): 170 | def __init__(self, lower=0.5, upper=1.5): 171 | self.lower = lower 172 | self.upper = upper 173 | assert self.upper >= self.lower, "contrast upper must be >= lower." 174 | assert self.lower >= 0, "contrast lower must be non-negative." 175 | 176 | # expects float image 177 | def __call__(self, image, boxes=None, labels=None): 178 | if random.randint(2): 179 | alpha = random.uniform(self.lower, self.upper) 180 | image *= alpha 181 | return image, boxes, labels 182 | 183 | 184 | class RandomBrightness(object): 185 | def __init__(self, delta=32): 186 | assert delta >= 0.0 187 | assert delta <= 255.0 188 | self.delta = delta 189 | 190 | def __call__(self, image, boxes=None, labels=None): 191 | if random.randint(2): 192 | delta = random.uniform(-self.delta, self.delta) 193 | image += delta 194 | return image, boxes, labels 195 | 196 | 197 | class ToCV2Image(object): 198 | def __call__(self, tensor, boxes=None, labels=None): 199 | return tensor.cpu().numpy().astype(np.float32).transpose((1, 2, 0)), boxes, labels 200 | 201 | 202 | class ToTensor(object): 203 | def __call__(self, cvimage, boxes=None, labels=None): 204 | return torch.from_numpy(cvimage.astype(np.float32)).permute(2, 0, 1), boxes, labels 205 | 206 | 207 | class RandomSampleCrop(object): 208 | """Crop 209 | Arguments: 210 | img (Image): the image being input during training 211 | boxes (Tensor): the original bounding boxes in pt form 212 | labels (Tensor): the class labels for each bbox 213 | mode (float tuple): the min and max jaccard overlaps 214 | Return: 215 | (img, boxes, classes) 216 | img (Image): the cropped image 217 | boxes (Tensor): the adjusted bounding boxes in pt form 218 | labels (Tensor): the class labels for each bbox 219 | """ 220 | def __init__(self): 221 | self.sample_options = ( 222 | # using entire original input image 223 | None, 224 | # sample a patch s.t. 
MIN jaccard w/ obj in .1,.3,.4,.7,.9 225 | (0.1, None), 226 | (0.3, None), 227 | (0.7, None), 228 | (0.9, None), 229 | # randomly sample a patch 230 | (None, None), 231 | ) 232 | 233 | def __call__(self, image, boxes=None, labels=None): 234 | height, width, _ = image.shape 235 | while True: 236 | # randomly choose a mode 237 | mode = random.choice(self.sample_options) 238 | if mode is None: 239 | return image, boxes, labels 240 | 241 | min_iou, max_iou = mode 242 | if min_iou is None: 243 | min_iou = float('-inf') 244 | if max_iou is None: 245 | max_iou = float('inf') 246 | 247 | # max trails (50) 248 | for _ in range(50): 249 | current_image = image 250 | 251 | w = random.uniform(0.3 * width, width) 252 | h = random.uniform(0.3 * height, height) 253 | 254 | # aspect ratio constraint b/t .5 & 2 255 | if h / w < 0.5 or h / w > 2: 256 | continue 257 | 258 | left = random.uniform(width - w) 259 | top = random.uniform(height - h) 260 | 261 | # convert to integer rect x1,y1,x2,y2 262 | rect = np.array([int(left), int(top), int(left+w), int(top+h)]) 263 | 264 | # calculate IoU (jaccard overlap) b/t the cropped and gt boxes 265 | overlap = jaccard_numpy(boxes, rect) 266 | 267 | # is min and max overlap constraint satisfied? if not try again 268 | if overlap.min() < min_iou and max_iou < overlap.max(): 269 | continue 270 | 271 | # cut the crop from the image 272 | current_image = current_image[rect[1]:rect[3], rect[0]:rect[2], 273 | :] 274 | 275 | # keep overlap with gt box IF center in sampled patch 276 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0 277 | 278 | # mask in all gt boxes that above and to the left of centers 279 | m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1]) 280 | 281 | # mask in all gt boxes that under and to the right of centers 282 | m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1]) 283 | 284 | # mask in that both m1 and m2 are true 285 | mask = m1 * m2 286 | 287 | # have any valid boxes? 
try again if not 288 | if not mask.any(): 289 | continue 290 | 291 | # take only matching gt boxes 292 | current_boxes = boxes[mask, :].copy() 293 | 294 | # take only matching gt labels 295 | current_labels = labels[mask] 296 | 297 | # should we use the box left and top corner or the crop's 298 | current_boxes[:, :2] = np.maximum(current_boxes[:, :2], 299 | rect[:2]) 300 | # adjust to crop (by substracting crop's left,top) 301 | current_boxes[:, :2] -= rect[:2] 302 | 303 | current_boxes[:, 2:] = np.minimum(current_boxes[:, 2:], 304 | rect[2:]) 305 | # adjust to crop (by substracting crop's left,top) 306 | current_boxes[:, 2:] -= rect[:2] 307 | 308 | return current_image, current_boxes, current_labels 309 | 310 | 311 | class Expand(object): 312 | def __init__(self, mean): 313 | self.mean = mean 314 | 315 | def __call__(self, image, boxes, labels): 316 | if random.randint(2): 317 | return image, boxes, labels 318 | 319 | height, width, depth = image.shape 320 | ratio = random.uniform(1, 4) 321 | left = random.uniform(0, width*ratio - width) 322 | top = random.uniform(0, height*ratio - height) 323 | 324 | expand_image = np.zeros( 325 | (int(height*ratio), int(width*ratio), depth), 326 | dtype=image.dtype) 327 | expand_image[:, :, :] = self.mean 328 | expand_image[int(top):int(top + height), 329 | int(left):int(left + width)] = image 330 | image = expand_image 331 | 332 | boxes = boxes.copy() 333 | boxes[:, :2] += (int(left), int(top)) 334 | boxes[:, 2:] += (int(left), int(top)) 335 | 336 | return image, boxes, labels 337 | 338 | 339 | class RandomMirror(object): 340 | def __call__(self, image, boxes, classes): 341 | _, width, _ = image.shape 342 | if random.randint(2): 343 | image = image[:, ::-1] 344 | boxes = boxes.copy() 345 | boxes[:, 0::2] = width - boxes[:, 2::-2] 346 | return image, boxes, classes 347 | 348 | 349 | class SwapChannels(object): 350 | """Transforms a tensorized image by swapping the channels in the order 351 | specified in the swap tuple. 
352 | Args: 353 | swaps (int triple): final order of channels 354 | eg: (2, 1, 0) 355 | """ 356 | 357 | def __init__(self, swaps): 358 | self.swaps = swaps 359 | 360 | def __call__(self, image): 361 | """ 362 | Args: 363 | image (Tensor): image tensor to be transformed 364 | Return: 365 | a tensor with channels swapped according to swap 366 | """ 367 | # if torch.is_tensor(image): 368 | # image = image.data.cpu().numpy() 369 | # else: 370 | # image = np.array(image) 371 | image = image[:, :, self.swaps] 372 | return image 373 | 374 | 375 | class PhotometricDistort(object): 376 | def __init__(self): 377 | self.pd = [ 378 | RandomContrast(), 379 | ConvertColor(transform='HSV'), 380 | RandomSaturation(), 381 | RandomHue(), 382 | ConvertColor(current='HSV', transform='BGR'), 383 | RandomContrast() 384 | ] 385 | self.rand_brightness = RandomBrightness() 386 | self.rand_light_noise = RandomLightingNoise() 387 | 388 | def __call__(self, image, boxes, labels): 389 | im = image.copy() 390 | im, boxes, labels = self.rand_brightness(im, boxes, labels) 391 | if random.randint(2): 392 | distort = Compose(self.pd[:-1]) 393 | else: 394 | distort = Compose(self.pd[1:]) 395 | im, boxes, labels = distort(im, boxes, labels) 396 | return self.rand_light_noise(im, boxes, labels) 397 | 398 | 399 | class SSDAugmentation(object): 400 | def __init__(self, size=300, mean=(104, 117, 123)): 401 | self.mean = mean 402 | self.size = size 403 | self.augment = Compose([ 404 | ConvertFromInts(), 405 | ToAbsoluteCoords(), 406 | PhotometricDistort(), 407 | Expand(self.mean), 408 | RandomSampleCrop(), 409 | RandomMirror(), 410 | ToPercentCoords(), 411 | Resize(self.size), 412 | SubtractMeans(self.mean) 413 | ]) 414 | 415 | def __call__(self, img, boxes, labels): 416 | return self.augment(img, boxes, labels) 417 | -------------------------------------------------------------------------------- /bus_dataset.log: -------------------------------------------------------------------------------- 1 | iter: accuracy 2 | 10000 0.906613 3 | 20000 0.912413 4 | 30000 0.940835 5 | 40000 0.939095 6 | 50000 0.956497 7 | 60000 0.946636 8 | 70000 0.966937 9 | 80000 0.941995 10 | 90000 0.968677 11 | 100000 0.972158 12 | 110000 0.970708 13 | 120000 0.969548 14 | final 0.969548 15 | 16 | final=120009 -------------------------------------------------------------------------------- /camera.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import time 3 | 4 | cap=cv2.VideoCapture('G:\\ustc\\bishe\\Captures\\002.WMV') 5 | while cap.isOpened(): 6 | ret,frame=cap.read() 7 | cv2.imshow('capture', frame) 8 | time.sleep(0.050) 9 | if cv2.waitKey(1) & 0xFF == ord('q'): 10 | break 11 | cap.release() 12 | cv2.destroyAllWindows() -------------------------------------------------------------------------------- /camera_detection.py: -------------------------------------------------------------------------------- 1 | from torch.autograd import Variable 2 | from detection import * 3 | from ssd_net_vgg import * 4 | from voc0712 import * 5 | import torch 6 | import torch.nn as nn 7 | import numpy as np 8 | import cv2 9 | import utils 10 | import torch.backends.cudnn as cudnn 11 | import time 12 | #检测cuda是否可用 13 | if torch.cuda.is_available(): 14 | print('-----gpu mode-----') 15 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 16 | else: 17 | print('-----cpu mode-----') 18 | colors_tableau=[ (214, 39, 40),(23, 190, 207),(188, 189, 34),(188,34,188),(205,108,8)] 19 | 20 | def Yawn(list_Y,list_y1): 21 | 
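    # compares the oldest len(list_Y1) mouth states in list_Y against the all-ones template;
    # currently dead code -- the main loop below checks the newest frames inline instead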
list_cmp=list_Y[:len(list_Y1)]==list_Y1 22 | for flag in list_cmp: 23 | if flag==False: 24 | return False 25 | return True 26 | #初始化网络 27 | net=SSD() 28 | net=torch.nn.DataParallel(net) 29 | net.train(mode=False) 30 | net.load_state_dict(torch.load('./weights/ssd300_VOC_100000.pth',map_location=lambda storage,loc: storage)) 31 | if torch.cuda.is_available(): 32 | net = net.cuda() 33 | cudnn.benchmark = True 34 | 35 | img_mean=(104.0,117.0,123.0) 36 | 37 | #调用摄像头 38 | cap=cv2.VideoCapture(0) 39 | max_fps=0 40 | 41 | #保存检测结果的List 42 | #眼睛和嘴巴都是,张开为‘1’,闭合为‘0’ 43 | list_B=np.ones(15)#眼睛状态List,建议根据fps修改 44 | list_Y=np.zeros(50)#嘴巴状态list,建议根据fps修改 45 | list_Y1=np.ones(5)#如果在list_Y中存在list_Y1,则判定一次打哈欠,同上,长度建议修改 46 | blink_count=0#眨眼计数 47 | yawn_count=0 48 | blink_start=time.time()#炸眼时间 49 | yawn_start=time.time()#打哈欠时间 50 | blink_freq=0.5 51 | yawn_freq=0 52 | #开始检测,按‘q’退出 53 | while(True): 54 | flag_B=True#是否闭眼的flag 55 | flag_Y=False 56 | num_rec=0#检测到的眼睛的数量 57 | start=time.time()#计时 58 | ret,img=cap.read()#读取图片 59 | 60 | #检测 61 | x=cv2.resize(img,(300,300)).astype(np.float32) 62 | x-=img_mean 63 | x=x.astype(np.float32) 64 | x=x[:,:,::-1].copy() 65 | x=torch.from_numpy(x).permute(2,0,1) 66 | xx=Variable(x.unsqueeze(0)) 67 | if torch.cuda.is_available(): 68 | xx=xx.cuda() 69 | y=net(xx) 70 | softmax=nn.Softmax(dim=-1) 71 | #detect=Detect(config.class_num,0,200,0.01,0.45) 72 | detect = Detect.apply 73 | priors=utils.default_prior_box() 74 | 75 | loc,conf=y 76 | loc=torch.cat([o.view(o.size(0),-1)for o in loc],1) 77 | conf=torch.cat([o.view(o.size(0),-1)for o in conf],1) 78 | 79 | detections=detect( 80 | loc.view(loc.size(0),-1,4), 81 | softmax(conf.view(conf.size(0),-1,config.class_num)), 82 | torch.cat([o.view(-1,4) for o in priors],0), 83 | config.class_num, 84 | 200, 85 | 0.7, 86 | 0.45 87 | ).data 88 | labels=VOC_CLASSES 89 | top_k=10 90 | 91 | #将检测结果放置于图片上 92 | scale=torch.Tensor(img.shape[1::-1]).repeat(2) 93 | for i in range(detections.size(1)): 94 | 95 | j=0 96 | while detections[0,i,j,0]>=0.4: 97 | score=detections[0,i,j,0] 98 | label_name=labels[i-1] 99 | if label_name=='closed_eye': 100 | flag_B=False 101 | if label_name=='open_mouth': 102 | flag_Y=True 103 | display_txt='%s:%.2f'%(label_name,score) 104 | pt=(detections[0,i,j,1:]*scale).cpu().numpy() 105 | coords=(pt[0],pt[1]),pt[2]-pt[0]+1,pt[3]-pt[1]+1 106 | color=colors_tableau[i] 107 | cv2.rectangle(img,(pt[0],pt[1]),(pt[2],pt[3]),color,2) 108 | cv2.putText(img,display_txt,(int(pt[0]),int(pt[1])+10),cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8) 109 | j+=1 110 | num_rec+=1 111 | if num_rec>0: 112 | if flag_B: 113 | #print(' 1:eye-open') 114 | list_B=np.append(list_B,1)#睁眼为‘1’ 115 | else: 116 | #print(' 0:eye-closed') 117 | list_B=np.append(list_B,0)#闭眼为‘0’ 118 | list_B=np.delete(list_B,0) 119 | if flag_Y: 120 | list_Y=np.append(list_Y,1) 121 | else: 122 | list_Y=np.append(list_Y,0) 123 | list_Y=np.delete(list_Y,0) 124 | else: 125 | print('nothing detected') 126 | #print(list) 127 | #实时计算PERCLOS 128 | perclos=1-np.average(list_B) 129 | print('perclos={:f}'.format(perclos)) 130 | if list_B[13]==1 and list_B[14]==0: 131 | #如果上一帧为’1‘,此帧为’0‘则判定为眨眼 132 | print('----------------眨眼----------------------') 133 | blink_count+=1 134 | blink_T=time.time()-blink_start 135 | if blink_T>10: 136 | #每10秒计算一次眨眼频率 137 | blink_freq=blink_count/blink_T 138 | blink_start=time.time() 139 | blink_count=0 140 | print('blink_freq={:f}'.format(blink_freq)) 141 | #检测打哈欠 142 | #if Yawn(list_Y,list_Y1): 143 | if 
(list_Y[len(list_Y)-len(list_Y1):]==list_Y1).all():
144 |         print('----------------------打哈欠----------------------')
145 |         yawn_count+=1
146 |         list_Y=np.zeros(50)
147 |         #计算打哈欠频率
148 |         yawn_T=time.time()-yawn_start
149 |         if yawn_T>60:
150 |             yawn_freq=yawn_count/yawn_T
151 |             yawn_start=time.time()
152 |             yawn_count=0
153 |             print('yawn_freq={:f}'.format(yawn_freq))
154 | 
155 |     #此处为判断疲劳部分
156 |     '''
157 |     想法1:最简单,但是太影响实时性
158 |     if(perclos>0.4 or blink_freq<0.25 or yawn_freq>5/60):
159 |         print('疲劳')
160 |         if(blink_freq<0.25)
161 |     else:
162 |         print('清醒')
163 |     '''
164 |     #想法2:
165 |     if(perclos>0.4):
166 |         print('疲劳')
167 |     elif(blink_freq<0.25):
168 |         print('疲劳')
169 |         blink_freq=0.5#如果因为眨眼频率判断疲劳,则初始化眨眼频率
170 |     elif(yawn_freq>5.0/60):
171 |         print("疲劳")
172 |         yawn_freq=0#初始化,同上
173 |     else:
174 |         print('清醒')
175 |     T=time.time()-start
176 |     fps=1/T#实时在视频上显示fps
177 |     if fps>max_fps:
178 |         max_fps=fps
179 |     fps_txt='fps:%.2f'%(fps)
180 |     cv2.putText(img,fps_txt,(0,10),cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8)
181 |     cv2.imshow("ssd",img)
182 |     if cv2.waitKey(100) & 0xff == ord('q'):
183 |         break
184 | #print("-------end-------")
185 | cap.release()
186 | cv2.destroyAllWindows()
187 | #print(max_fps)
--------------------------------------------------------------------------------
/camera_detection_1.py:
--------------------------------------------------------------------------------
1 | from torch.autograd import Variable
2 | from detection import *
3 | from ssd_net_vgg import *
4 | from voc0712 import *
5 | import torch
6 | import torch.nn as nn
7 | import numpy as np
8 | import cv2
9 | import utils
10 | import torch.backends.cudnn as cudnn
11 | import time
12 | #检测cuda是否可用
13 | if torch.cuda.is_available():
14 |     print('-----gpu mode-----')
15 |     torch.set_default_tensor_type('torch.cuda.FloatTensor')
16 | else:
17 |     print('-----cpu mode-----')
18 | colors_tableau=[ (214, 39, 40),(23, 190, 207),(188, 189, 34),(188,34,188),(205,108,8)]
19 | 
20 | def Yawn(list_Y,list_Y1):
21 |     list_cmp=list_Y[:len(list_Y1)]==list_Y1
22 |     for flag in list_cmp:
23 |         if flag==False:
24 |             return False
25 |     return True
26 | #初始化网络
27 | net=SSD()
28 | net=torch.nn.DataParallel(net)
29 | net.train(mode=False)
30 | net.load_state_dict(torch.load('./weights/ssd_voc_5000_plus.pth',map_location=lambda storage,loc: storage))
31 | if torch.cuda.is_available():
32 |     net = net.cuda()
33 |     cudnn.benchmark = True
34 | 
35 | img_mean=(104.0,117.0,123.0)
36 | 
37 | #调用摄像头
38 | cap=cv2.VideoCapture(0)
39 | max_fps=0
40 | 
41 | #保存检测结果的List
42 | #眼睛和嘴巴都是,张开为‘1’,闭合为‘0’
43 | list_B=np.ones(15)#眼睛状态List,建议根据fps修改,个人电脑fps≈6
44 | list_Y=np.zeros(50)#嘴巴状态list,建议根据fps修改
45 | list_Y1=np.ones(5)#如果在list_Y中存在list_Y1,则判定一次打哈欠,同上,长度建议修改
46 | list_blink=np.zeros(60)#大约是记录10S内信息,眨眼为‘1’,不眨眼为‘0’
47 | list_yawn=np.zeros(360)#大约是一分钟内打哈欠记录,打哈欠为‘1’,不打哈欠为‘0’
48 | 
49 | #blink_count=0#眨眼计数
50 | yawn_count=0
51 | #blink_start=time.time()#眨眼时间
52 | #yawn_start=time.time()#打哈欠时间
53 | blink_freq=0.5
54 | yawn_freq=0
55 | #开始检测,按‘q’退出
56 | while(True):
57 |     flag_B=True#是否闭眼的flag
58 |     flag_Y=False#张嘴flag
59 | 
60 |     num_rec=0#检测到的眼睛的数量
61 |     start=time.time()#计时
62 |     ret,img=cap.read()#读取图片
63 | 
64 |     #检测
65 |     x=cv2.resize(img,(300,300)).astype(np.float32)
66 |     x-=img_mean
67 |     x=x.astype(np.float32)
68 |     x=x[:,:,::-1].copy()
69 |     x=torch.from_numpy(x).permute(2,0,1)
70 |     xx=Variable(x.unsqueeze(0))
71 |     if torch.cuda.is_available():
72 |         xx=xx.cuda()
73 |     y=net(xx)
74 |     softmax=nn.Softmax(dim=-1)
75 |     detect=Detect(config.class_num,0,200,0.01,0.45)
76 | 
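    # note: instantiating Detect and calling it directly is the legacy torch.autograd.Function
    # style and no longer works on recent PyTorch; Test.py and camera_detection.py use Detect.apply instead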
priors=utils.default_prior_box() 77 | 78 | loc,conf=y 79 | loc=torch.cat([o.view(o.size(0),-1)for o in loc],1) 80 | conf=torch.cat([o.view(o.size(0),-1)for o in conf],1) 81 | 82 | detections=detect( 83 | loc.view(loc.size(0),-1,4), 84 | softmax(conf.view(conf.size(0),-1,config.class_num)), 85 | torch.cat([o.view(-1,4) for o in priors],0) 86 | ).data 87 | labels=VOC_CLASSES 88 | top_k=10 89 | 90 | #将检测结果放置于图片上 91 | scale=torch.Tensor(img.shape[1::-1]).repeat(2) 92 | for i in range(detections.size(1)): 93 | 94 | j=0 95 | while detections[0,i,j,0]>=0.4: 96 | score=detections[0,i,j,0] 97 | label_name=labels[i-1] 98 | if label_name=='closed_eye': 99 | flag_B=False 100 | if label_name=='open_mouth': 101 | flag_Y=True 102 | display_txt='%s:%.2f'%(label_name,score) 103 | pt=(detections[0,i,j,1:]*scale).cpu().numpy() 104 | coords=(pt[0],pt[1]),pt[2]-pt[0]+1,pt[3]-pt[1]+1 105 | color=colors_tableau[i] 106 | cv2.rectangle(img,(pt[0],pt[1]),(pt[2],pt[3]),color,2) 107 | cv2.putText(img,display_txt,(int(pt[0]),int(pt[1])+10),cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8) 108 | j+=1 109 | num_rec+=1 110 | if num_rec>0: 111 | if flag_B: 112 | #print(' 1:eye-open') 113 | list_B=np.append(list_B,1)#睁眼为‘1’ 114 | else: 115 | #print(' 0:eye-closed') 116 | list_B=np.append(list_B,0)#闭眼为‘0’ 117 | list_B=np.delete(list_B,0) 118 | if flag_Y: 119 | list_Y=np.append(list_Y,1) 120 | else: 121 | list_Y=np.append(list_Y,0) 122 | list_Y=np.delete(list_Y,0) 123 | else: 124 | print('nothing detected') 125 | #print(list) 126 | 127 | if list_B[13]==1 and list_B[14]==0: 128 | #如果上一帧为’1‘,此帧为’0‘则判定为眨眼 129 | print('----------------眨眼----------------------') 130 | list_blink=np.append(list_blink,1) 131 | else: 132 | list_blink=np.append(list_blink,0) 133 | list_blink=np.delete(list_blink,0) 134 | 135 | 136 | #检测打哈欠 137 | #if Yawn(list_Y,list_Y1): 138 | if (list_Y[len(list_Y)-len(list_Y1):]==list_Y1).all(): 139 | print('----------------------打哈欠----------------------') 140 | yawn_count+=1 141 | list_Y=np.zeros(50)#此处是检测到一次打哈欠之后将嘴部状态list全部置‘0’,考虑到打哈欠所用时间较长,所以基本不会出现漏检 142 | list_yawn=np.append(list_yawn,1) 143 | else: 144 | list_yawn=np.append(list_yawn,0) 145 | list_yawn=np.delete(list_yawn,0) 146 | 147 | 148 | 149 | #实时计算PERCLOS perblink,peryawn 150 | #即计算平均闭眼时长百分比,平均眨眼百分比,平均打哈欠百分比 151 | perclos=1-np.average(list_B) 152 | perblink=np.average(list_blink) 153 | peryawn=np.average(list_yawn) 154 | #print('perclos={:f}'.format(perclos)) 155 | 156 | #此处为判断疲劳部分 157 | #想法1:两个频率计算改为实时的,所以此处不再修改 158 | if(perclos>0.4 or perblink<0.25 or peryawn>5/60): 159 | print('疲劳') 160 | #if(blink_freq<0.25) 161 | else: 162 | print('清醒') 163 | 164 | '''#想法2: 165 | if(perclos>0.4): 166 | { 167 | print('疲劳') 168 | } 169 | elif(blink_freq<0.25): 170 | { 171 | print('疲劳') 172 | blink_freq=0.5#如果因为眨眼频率判断疲劳,则初始化眨眼频率 173 | } 174 | elif(yawn_freq>5.0/60): 175 | { 176 | print("疲劳") 177 | yawn_freq=0#初始化,同上 178 | } 179 | else: 180 | { 181 | print('清醒') 182 | } 183 | ''' 184 | T=time.time()-start 185 | fps=1/T#实时在视频上显示fps 186 | if fps>max_fps: 187 | max_fps=fps 188 | fps_txt='fps:%.2f'%(fps) 189 | cv2.putText(img,fps_txt,(0,10),cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8) 190 | cv2.imshow("ssd",img) 191 | if cv2.waitKey(100) & 0xff == ord('q'): 192 | break 193 | #print("-------end-------") 194 | cap.release() 195 | cv2.destroyAllWindows() 196 | #print(max_fps) -------------------------------------------------------------------------------- /detection.py: -------------------------------------------------------------------------------- 1 | import torch 
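# decode() maps the predicted offsets plus priors back to corner-form boxes and nms()
# prunes overlapping detections; both are defined in utils.py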
2 | from torch.autograd import Function 3 | from utils import decode, nms 4 | 5 | 6 | class Detect(Function): 7 | """At test time, Detect is the final layer of SSD. Decode location preds, 8 | apply non-maximum suppression to location predictions based on conf 9 | scores and threshold to a top_k number of output predictions for both 10 | confidence score and locations. 11 | """ 12 | def __init__(self, num_classes, bkg_label, top_k, conf_thresh, nms_thresh): 13 | self.num_classes = num_classes 14 | self.background_label = bkg_label 15 | self.top_k = top_k 16 | # Parameters used in nms. 17 | self.nms_thresh = nms_thresh 18 | if nms_thresh <= 0: 19 | raise ValueError('nms_threshold must be non negative.') 20 | self.conf_thresh = conf_thresh 21 | self.variance = (0.1,0.2) 22 | 23 | @staticmethod 24 | def forward(self, loc_data, conf_data, prior_data, num_classes, top_k, conf_thresh, nms_thresh): 25 | """ 26 | Args: 27 | loc_data: (tensor) Loc preds from loc layers 28 | Shape: [batch,num_priors*4] 29 | conf_data: (tensor) Shape: Conf preds from conf layers 30 | Shape: [batch*num_priors,num_classes] 31 | prior_data: (tensor) Prior boxes and variances from priorbox layers 32 | Shape: [1,num_priors,4] 33 | """ 34 | num = loc_data.size(0) # batch size 35 | num_priors = prior_data.size(0) 36 | output = torch.zeros(num, num_classes, top_k, 5) 37 | conf_preds = conf_data.view(num, num_priors, 38 | num_classes).transpose(2, 1) 39 | 40 | # Decode predictions into bboxes. 41 | variance = (0.1, 0.2) 42 | for i in range(num): 43 | decoded_boxes = decode(loc_data[i], prior_data, variance) 44 | # For each class, perform nms 45 | conf_scores = conf_preds[i].clone() 46 | for cl in range(1, num_classes): 47 | c_mask = conf_scores[cl].gt(conf_thresh) 48 | scores = conf_scores[cl][c_mask] 49 | if scores.dim() == 0: 50 | continue 51 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_boxes) 52 | boxes = decoded_boxes[l_mask].view(-1, 4) 53 | # idx of highest scoring and non-overlapping boxes per class 54 | ids, count = nms(boxes, scores, nms_thresh, top_k) 55 | 56 | if count==0: 57 | continue 58 | output[i, cl, :count] = \ 59 | torch.cat((scores[ids[:count]].unsqueeze(1), 60 | boxes[ids[:count]]), 1) 61 | flt = output.contiguous().view(num, -1, 5) 62 | _, idx = flt[:, :, 0].sort(1, descending=True) 63 | _, rank = idx.sort(1) 64 | flt[(rank < top_k).unsqueeze(-1).expand_as(flt)].fill_(0) 65 | return output 66 | -------------------------------------------------------------------------------- /dnf_test.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/dnf_test.jpg -------------------------------------------------------------------------------- /dnf_test_done.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/dnf_test_done.jpg -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: torch 2 | channels: 3 | - defaults 4 | - https://repo.anaconda.com/pkgs/main 5 | - https://repo.anaconda.com/pkgs/r 6 | dependencies: 7 | - _libgcc_mutex=0.1=main 8 | - _openmp_mutex=5.1=1_gnu 9 | - bzip2=1.0.8=h5eee18b_6 10 | - ca-certificates=2025.2.25=h06a4308_0 11 | - expat=2.7.1=h6a678d5_0 12 | - 
ld_impl_linux-64=2.40=h12ee557_0
13 | - libffi=3.4.4=h6a678d5_1
14 | - libgcc-ng=11.2.0=h1234567_1
15 | - libgomp=11.2.0=h1234567_1
16 | - libstdcxx-ng=11.2.0=h1234567_1
17 | - libuuid=1.41.5=h5eee18b_0
18 | - ncurses=6.4=h6a678d5_0
19 | - openssl=3.0.16=h5eee18b_0
20 | - pip=25.0=py312h06a4308_0
21 | - python=3.12.9=h5148396_0
22 | - readline=8.2=h5eee18b_0
23 | - setuptools=75.8.0=py312h06a4308_0
24 | - sqlite=3.45.3=h5eee18b_0
25 | - tk=8.6.14=h39e8969_0
26 | - tzdata=2025a=h04d1e81_0
27 | - wheel=0.45.1=py312h06a4308_0
28 | - xz=5.6.4=h5eee18b_1
29 | - zlib=1.2.13=h5eee18b_1
30 | - pip:
31 |   - contourpy==1.3.2
32 |   - cycler==0.12.1
33 |   - filelock==3.18.0
34 |   - fonttools==4.57.0
35 |   - fsspec==2025.3.2
36 |   - jinja2==3.1.6
37 |   - kiwisolver==1.4.8
38 |   - markupsafe==3.0.2
39 |   - matplotlib==3.10.1
40 |   - mpmath==1.3.0
41 |   - networkx==3.4.2
42 |   - numpy==2.2.5
43 |   - nvidia-cublas-cu12==12.4.5.8
44 |   - nvidia-cuda-cupti-cu12==12.4.127
45 |   - nvidia-cuda-nvrtc-cu12==12.4.127
46 |   - nvidia-cuda-runtime-cu12==12.4.127
47 |   - nvidia-cudnn-cu12==9.1.0.70
48 |   - nvidia-cufft-cu12==11.2.1.3
49 |   - nvidia-curand-cu12==10.3.5.147
50 |   - nvidia-cusolver-cu12==11.6.1.9
51 |   - nvidia-cusparse-cu12==12.3.1.170
52 |   - nvidia-cusparselt-cu12==0.6.2
53 |   - nvidia-nccl-cu12==2.21.5
54 |   - nvidia-nvjitlink-cu12==12.4.127
55 |   - nvidia-nvtx-cu12==12.4.127
56 |   - opencv-python-headless==4.11.0.86
57 |   - packaging==25.0
58 |   - pillow==11.2.1
59 |   - pyparsing==3.2.3
60 |   - python-dateutil==2.9.0.post0
61 |   - six==1.17.0
62 |   - sympy==1.13.1
63 |   - torch==2.6.0
64 |   - torchvision==0.21.0
65 |   - triton==3.2.0
66 |   - typing-extensions==4.13.2
67 | prefix: $HOME/miniconda3/envs/torch
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
1 | from torch.autograd import Variable
2 | from detection import *
3 | from ssd_net_vgg import *
4 | from voc0712 import *
5 | import torch
6 | import torch.nn as nn
7 | import numpy as np
8 | import cv2
9 | import utils
10 | import torch.backends.cudnn as cudnn
11 | import time
12 | import torch.utils.data as data
13 | import xml.etree.ElementTree as ET
14 | import os
15 | import pickle
16 | 
17 | #检测cuda是否可用
18 | if torch.cuda.is_available():
19 |     print('-----gpu mode-----')
20 |     torch.set_default_tensor_type('torch.cuda.FloatTensor')
21 | else:
22 |     print('-----cpu mode-----')
23 | colors_tableau=[ (214, 39, 40),(23, 190, 207),(188, 189, 34),(188,34,188),(205,108,8)]
24 | 
25 | net=SSD()
26 | net=torch.nn.DataParallel(net)
27 | net.train(mode=False)
28 | net.load_state_dict(torch.load('./weights/ssd300_VOC_100000.pth',map_location=lambda storage,loc: storage))
29 | if torch.cuda.is_available():
30 |     net = net.cuda()
31 |     cudnn.benchmark = True
32 | 
33 | devkit_path='./dataset/'
34 | annopath=os.path.join(devkit_path,'Annotations', '%s.xml')
35 | ftest=open(devkit_path+'ImageSets/Main/test.txt','r')
36 | img_mean=(104.0,117.0,123.0)
37 | 
38 | def parse_rec(filename):
39 |     '''获取图片中所有的label和坐标'''
40 |     tree=ET.parse(filename)
41 |     objects=[]
42 |     for obj in tree.findall('object'):
43 |         obj_struct={}
44 |         obj_struct['name']=obj.find('name').text
45 |         bbox=obj.find('bndbox')
46 |         obj_struct['bbox']=[int(bbox.find('xmin').text)-1,
47 |             int(bbox.find('ymin').text)-1,
48 |             int(bbox.find('xmax').text)-1,
49 |             int(bbox.find('ymax').text)-1]
50 |         objects.append(obj_struct)
51 | 
52 |     return objects
53 | 
54 | def IoU(obj_R,obj_P):
55 |     #计算交并比
56 |     cood_r=obj_R['bbox']
57 | 
cood_p=obj_P['bbox'] 58 | ixmin=max(cood_r[0],cood_p[0]) 59 | iymin=max(cood_r[1],cood_p[1]) 60 | ixmax=min(cood_r[2],cood_p[2]) 61 | iymax=min(cood_r[3],cood_p[3]) 62 | iw=max(ixmax-ixmin,0.) 63 | ih=max(iymax-iymin,0.) 64 | inters=iw*ih*1.0 65 | uni=((cood_r[2]-cood_r[0])*(cood_r[3]-cood_r[1])+ 66 | (cood_p[2]-cood_p[0])*(cood_p[3]-cood_p[1])- 67 | inters) 68 | overlaps=inters/uni 69 | return overlaps 70 | 71 | count=0 72 | time_start=time.time() 73 | accu_num=0 74 | real_num=0 75 | 76 | for line in ftest: 77 | name=line.strip() 78 | print(name) 79 | obj_real=parse_rec(devkit_path+'Annotations/'+name+'.xml') 80 | real_num+=len(obj_real) 81 | img=cv2.imread(devkit_path+'JPEGImages/'+name+'.jpg',cv2.IMREAD_COLOR) 82 | x=cv2.resize(img,(300,300)).astype(np.float32) 83 | x-=img_mean 84 | x=x.astype(np.float32) 85 | x=x[:,:,::-1].copy() 86 | x=torch.from_numpy(x).permute(2,0,1) 87 | xx=Variable(x.unsqueeze(0)) 88 | if torch.cuda.is_available(): 89 | xx=xx.cuda() 90 | y=net(xx) 91 | softmax=nn.Softmax(dim=-1) 92 | detect=Detect(config.class_num,0,200,0.01,0.45) 93 | priors=utils.default_prior_box() 94 | 95 | loc,conf=y 96 | loc=torch.cat([o.view(o.size(0),-1)for o in loc],1) 97 | conf=torch.cat([o.view(o.size(0),-1)for o in conf],1) 98 | 99 | detections=detect( 100 | loc.view(loc.size(0),-1,4), 101 | softmax(conf.view(conf.size(0),-1,config.class_num)), 102 | torch.cat([o.view(-1,4) for o in priors],0) 103 | ).data 104 | labels=VOC_CLASSES 105 | top_k=10 106 | 107 | scale=torch.Tensor(img.shape[1::-1]).repeat(2) 108 | obj_pre=[] 109 | for i in range(detections.size(1)): 110 | j=0 111 | 112 | while detections[0,i,j,0]>=0.4: 113 | score=detections[0,i,j,0] 114 | obj={} 115 | obj['name']=labels[i-1] 116 | pt=(detections[0,i,j,1:]*scale).cpu().numpy() 117 | obj['bbox']=[int(pt[0]), 118 | int(pt[1]), 119 | int(pt[2]), 120 | int(pt[3])] 121 | obj_pre.append(obj) 122 | 123 | label_name=labels[i-1] 124 | display_txt='%s:%.2f'%(label_name,score) 125 | coords=(pt[0],pt[1]),pt[2]-pt[0]+1,pt[3]-pt[1]+1 126 | color=colors_tableau[i] 127 | cv2.rectangle(img,(pt[0],pt[1]),(pt[2],pt[3]),color,2) 128 | cv2.putText(img,display_txt,(int(pt[0]),int(pt[1])+10),cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8) 129 | 130 | j+=1 131 | 132 | #把测试过的图片写入磁盘 133 | #cv2.imwrite('./tested/'+name+'.jpg',img) 134 | #print('Pic:'+name+" writed!") 135 | 136 | for obj_R in obj_real: 137 | for obj_P in obj_pre: 138 | if IoU(obj_R,obj_P)>0.5:#阈值暂设为0.5 139 | if obj_R['name']==obj_P['name']: 140 | accu_num+=1 141 | count+=1 142 | print("-------end-------") 143 | elapsed=(time.time()-time_start) 144 | print('共{:d}张图片\n用时:{:f} s\nfps={:f}\n准确率:{:f}' 145 | .format(count,elapsed,count/elapsed,accu_num/real_num)) -------------------------------------------------------------------------------- /l2norm.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.autograd import Function 4 | from torch.autograd import Variable 5 | import torch.nn.init as init 6 | import Config 7 | class L2Norm(nn.Module): 8 | def __init__(self,n_channels, scale): 9 | super(L2Norm,self).__init__() 10 | self.n_channels = n_channels 11 | self.gamma = scale or None 12 | self.eps = 1e-10 13 | if Config.use_cuda: 14 | self.weight = nn.Parameter(torch.Tensor(self.n_channels).cuda()) 15 | else: 16 | self.weight = nn.Parameter(torch.Tensor(self.n_channels)) 17 | self.reset_parameters() 18 | 19 | def reset_parameters(self): 20 | nn.init.constant_(self.weight,self.gamma) 21 | 22 | def 
forward(self, x): 23 | norm = x.pow(2).sum(dim=1, keepdim=True).sqrt()+self.eps 24 | #x /= norm 25 | x = torch.div(x,norm) 26 | out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x 27 | return out 28 | -------------------------------------------------------------------------------- /loss_function.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import utils 5 | import Config 6 | 7 | class LossFun(nn.Module): 8 | def __init__(self): 9 | super(LossFun,self).__init__() 10 | def forward(self, prediction,targets,priors_boxes): 11 | loc_data , conf_data = prediction 12 | loc_data = torch.cat([o.view(o.size(0),-1,4) for o in loc_data] ,1) 13 | conf_data = torch.cat([o.view(o.size(0),-1,Config.class_num) for o in conf_data],1) 14 | priors_boxes = torch.cat([o.view(-1,4) for o in priors_boxes],0) 15 | if Config.use_cuda: 16 | loc_data = loc_data.cuda() 17 | conf_data = conf_data.cuda() 18 | priors_boxes = priors_boxes.cuda() 19 | # batch_size 20 | batch_num = loc_data.size(0) 21 | # default_box数量 22 | box_num = loc_data.size(1) 23 | # 存储targets根据每一个prior_box变换后的数据 24 | target_loc = torch.Tensor(batch_num,box_num,4) 25 | target_loc.requires_grad_(requires_grad=False) 26 | # 存储每一个default_box预测的种类 27 | target_conf = torch.LongTensor(batch_num,box_num) 28 | target_conf.requires_grad_(requires_grad=False) 29 | if Config.use_cuda: 30 | target_loc = target_loc.cuda() 31 | target_conf = target_conf.cuda() 32 | # 因为一次batch可能有多个图,每次循环计算出一个图中的box,即8732个box的loc和conf,存放在target_loc和target_conf中 33 | for batch_id in range(batch_num): 34 | target_truths = targets[batch_id][:,:-1].data 35 | target_labels = targets[batch_id][:,-1].data 36 | if Config.use_cuda: 37 | target_truths = target_truths.cuda() 38 | target_labels = target_labels.cuda() 39 | # 计算box函数,即公式中loc损失函数的计算公式 40 | utils.match(0.5,target_truths,priors_boxes,target_labels,target_loc,target_conf,batch_id) 41 | pos = target_conf > 0 42 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) 43 | # 相当于论文中L1损失函数乘xij的操作 44 | pre_loc_xij = loc_data[pos_idx].view(-1,4) 45 | tar_loc_xij = target_loc[pos_idx].view(-1,4) 46 | # 将计算好的loc和预测进行smooth_li损失函数 47 | loss_loc = F.smooth_l1_loss(pre_loc_xij,tar_loc_xij,size_average=False) 48 | 49 | batch_conf = conf_data.view(-1,Config.class_num) 50 | 51 | # 参照论文中conf计算方式,求出ci 52 | loss_c = utils.log_sum_exp(batch_conf) - batch_conf.gather(1, target_conf.view(-1, 1)) 53 | 54 | loss_c = loss_c.view(batch_num, -1) 55 | # 将正样本设定为0 56 | loss_c[pos] = 0 57 | 58 | # 将剩下的负样本排序,选出目标数量的负样本 59 | _, loss_idx = loss_c.sort(1, descending=True) 60 | _, idx_rank = loss_idx.sort(1) 61 | 62 | num_pos = pos.long().sum(1, keepdim=True) 63 | num_neg = torch.clamp(3*num_pos, max=pos.size(1)-1) 64 | 65 | # 提取出正负样本 66 | neg = idx_rank < num_neg.expand_as(idx_rank) 67 | pos_idx = pos.unsqueeze(2).expand_as(conf_data) 68 | neg_idx = neg.unsqueeze(2).expand_as(conf_data) 69 | 70 | conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, Config.class_num) 71 | targets_weighted = target_conf[(pos+neg).gt(0)] 72 | loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False) 73 | 74 | N = num_pos.data.sum().double() 75 | loss_l = loss_loc.double() 76 | loss_c = loss_c.double() 77 | loss_l /= N 78 | loss_c /= N 79 | return loss_l, loss_c 80 | -------------------------------------------------------------------------------- /model_file_test.py: 
-------------------------------------------------------------------------------- 1 | import torch 2 | vgg_weights = torch.load('./vgg16_reducedfc.pth') 3 | print(vgg_weights.keys()) -------------------------------------------------------------------------------- /result.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/result.jpg -------------------------------------------------------------------------------- /ssd_net_vgg.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import l2norm 4 | import Config as config 5 | class SSD(nn.Module): 6 | def __init__(self): 7 | super(SSD,self).__init__() 8 | self.vgg = [] 9 | #vgg-16模型 10 | self.vgg.append(nn.Conv2d(in_channels=3,out_channels=64,kernel_size=3,stride=1,padding=1))#conv1_1 11 | self.vgg.append(nn.ReLU(inplace=True)) 12 | self.vgg.append(nn.Conv2d(in_channels=64,out_channels=64,kernel_size=3,stride=1,padding=1))#conv1_2 13 | self.vgg.append(nn.ReLU(inplace=True)) 14 | self.vgg.append(nn.MaxPool2d(kernel_size=2,stride=2))#maxpool1 15 | self.vgg.append(nn.Conv2d(in_channels=64,out_channels=128,kernel_size=3,stride=1,padding=1))#conv2_1 16 | self.vgg.append(nn.ReLU(inplace=True)) 17 | self.vgg.append(nn.Conv2d(in_channels=128,out_channels=128,kernel_size=3,stride=1,padding=1))#conv2_2 18 | self.vgg.append(nn.ReLU(inplace=True)) 19 | self.vgg.append(nn.MaxPool2d(kernel_size=2,stride=2))#maxpool2 20 | self.vgg.append(nn.Conv2d(in_channels=128,out_channels=256,kernel_size=3,stride=1,padding=1))#conv3_1 21 | self.vgg.append(nn.ReLU(inplace=True)) 22 | self.vgg.append(nn.Conv2d(in_channels=256,out_channels=256,kernel_size=3,stride=1,padding=1))#conv3_2 23 | self.vgg.append(nn.ReLU(inplace=True)) 24 | self.vgg.append(nn.Conv2d(in_channels=256,out_channels=256,kernel_size=3,stride=1,padding=1))#conv3_3 25 | self.vgg.append(nn.ReLU(inplace=True)) 26 | self.vgg.append(nn.MaxPool2d(kernel_size=2,stride=2,ceil_mode=True))#maxpool3 27 | self.vgg.append(nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1))#conv4_1 28 | self.vgg.append(nn.ReLU(inplace=True)) 29 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,stride=1,padding=1))#conv4_2 30 | self.vgg.append(nn.ReLU(inplace=True)) 31 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,stride=1,padding=1))#conv4_3 32 | self.vgg.append(nn.ReLU(inplace=True)) 33 | self.vgg.append(nn.MaxPool2d(kernel_size=2,stride=2))#maxpool4 34 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,stride=1,padding=1))#conv5_1 35 | self.vgg.append(nn.ReLU(inplace=True)) 36 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,stride=1,padding=1))#conv5_2 37 | self.vgg.append(nn.ReLU(inplace=True)) 38 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,stride=1,padding=1))#conv5_3 39 | self.vgg.append(nn.ReLU(inplace=True)) 40 | self.vgg.append(nn.MaxPool2d(kernel_size=3,stride=1,padding=1))#maxpool5 41 | self.vgg.append(nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=3,padding=6,dilation=6))#conv6 42 | self.vgg.append(nn.ReLU(inplace=True)) 43 | self.vgg.append(nn.Conv2d(in_channels=1024,out_channels=1024,kernel_size=1))#conv7 44 | self.vgg.append(nn.ReLU(inplace=True)) 45 | self.vgg = nn.ModuleList(self.vgg) 46 | self.conv8_1 = nn.Sequential( 47 
        # extra SSD feature layers on top of the backbone
        self.conv8_1 = nn.Sequential(
            nn.Conv2d(in_channels=1024, out_channels=256, kernel_size=1),
            nn.ReLU(inplace=True)
        )
        self.conv8_2 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True)
        )
        self.conv9_1 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=128, kernel_size=1),
            nn.ReLU(inplace=True)
        )
        self.conv9_2 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True)
        )
        self.conv10_1 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=128, kernel_size=1),
            nn.ReLU(inplace=True)
        )
        self.conv10_2 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1),
            nn.ReLU(inplace=True)
        )
        self.conv11_1 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=128, kernel_size=1),
            nn.ReLU(inplace=True)
        )
        self.conv11_2 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1),
            nn.ReLU(inplace=True)
        )
        # localization heads: one per feature map, (boxes per cell) * 4 offsets
        self.feature_map_loc_1 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=4 * 4, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_loc_2 = nn.Sequential(
            nn.Conv2d(in_channels=1024, out_channels=6 * 4, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_loc_3 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=6 * 4, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_loc_4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_loc_5 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=4 * 4, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_loc_6 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=4 * 4, kernel_size=3, stride=1, padding=1)
        )
        # classification heads: one per feature map, (boxes per cell) * class_num scores
        self.feature_map_conf_1 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=4 * config.class_num, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_conf_2 = nn.Sequential(
            nn.Conv2d(in_channels=1024, out_channels=6 * config.class_num, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_conf_3 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=6 * config.class_num, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_conf_4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=6 * config.class_num, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_conf_5 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=4 * config.class_num, kernel_size=3, stride=1, padding=1)
        )
        self.feature_map_conf_6 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=4 * config.class_num, kernel_size=3, stride=1, padding=1)
        )


    # forward pass
    def forward(self, image):
        out = image
        # backbone up to and including the ReLU after conv4_3 (vgg[0:23]);
        # this activation is the first (38x38) feature map
        for layer in self.vgg[:23]:
            out = layer(out)
        # L2Norm is rebuilt on every call with a fixed scale of 20, so the
        # scale is never learned; kept this way for compatibility with the
        # released checkpoints, which were trained with the same code
        my_L2Norm = l2norm.L2Norm(512, 20)
        feature_map_1 = my_L2Norm(out)
        loc_1 = self.feature_map_loc_1(feature_map_1).permute((0, 2, 3, 1)).contiguous()
        conf_1 = self.feature_map_conf_1(feature_map_1).permute((0, 2, 3, 1)).contiguous()
        # rest of the backbone, maxpool4 through conv7 (vgg[23:]): the 19x19 map
        for layer in self.vgg[23:]:
            out = layer(out)
        feature_map_2 = out
        loc_2 = self.feature_map_loc_2(feature_map_2).permute((0, 2, 3, 1)).contiguous()
        conf_2 = self.feature_map_conf_2(feature_map_2).permute((0, 2, 3, 1)).contiguous()
        out = self.conv8_1(out)
        out = self.conv8_2(out)
        feature_map_3 = out
        loc_3 = self.feature_map_loc_3(feature_map_3).permute((0, 2, 3, 1)).contiguous()
        conf_3 = self.feature_map_conf_3(feature_map_3).permute((0, 2, 3, 1)).contiguous()
        out = self.conv9_1(out)
        out = self.conv9_2(out)
        feature_map_4 = out
        loc_4 = self.feature_map_loc_4(feature_map_4).permute((0, 2, 3, 1)).contiguous()
        conf_4 = self.feature_map_conf_4(feature_map_4).permute((0, 2, 3, 1)).contiguous()
        out = self.conv10_1(out)
        out = self.conv10_2(out)
        feature_map_5 = out
        loc_5 = self.feature_map_loc_5(feature_map_5).permute((0, 2, 3, 1)).contiguous()
        conf_5 = self.feature_map_conf_5(feature_map_5).permute((0, 2, 3, 1)).contiguous()
        out = self.conv11_1(out)
        out = self.conv11_2(out)
        feature_map_6 = out
        loc_6 = self.feature_map_loc_6(feature_map_6).permute((0, 2, 3, 1)).contiguous()
        conf_6 = self.feature_map_conf_6(feature_map_6).permute((0, 2, 3, 1)).contiguous()
        loc_list = [loc_1, loc_2, loc_3, loc_4, loc_5, loc_6]
        conf_list = [conf_1, conf_2, conf_3, conf_4, conf_5, conf_6]
        return loc_list, conf_list
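
# Added shape-check sketch (not part of the original file; assumes CPU or the
# default device): feed a dummy 300x300 batch through an untrained SSD and
# print the six head output shapes.
if __name__ == '__main__':
    net = SSD()
    net.train(mode=False)
    loc_list, conf_list = net(torch.zeros(1, 3, 300, 300))
    for loc, conf in zip(loc_list, conf_list):
        # (batch, H, W, boxes_per_cell * 4) and (batch, H, W, boxes_per_cell * class_num)
        print(loc.shape, conf.shape)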
--------------------------------------------------------------------------------
/test.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/test.jpg
--------------------------------------------------------------------------------
/test_done.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PengfeiM/Fatigue-Driven-Detection-Based-on-CNN/e2aeabe12bff3544dbbab04fa312ebfd866f2cd5/test_done.jpg
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
import Config
from itertools import product
from math import sqrt
import torch

def default_prior_box():
    """Generate the SSD default (prior) boxes for every cell of every feature map."""
    mean_layer = []
    for k, f in enumerate(Config.feature_map):
        mean = []
        for i, j in product(range(f), repeat=2):
            f_k = Config.image_size / Config.steps[k]
            # box center, relative to the image
            cx = (j + 0.5) / f_k
            cy = (i + 0.5) / f_k

            # square box with scale s_k
            s_k = Config.sk[k] / Config.image_size
            mean += [cx, cy, s_k, s_k]

            # square box with scale sqrt(s_k * s_(k+1))
            s_k_prime = sqrt(s_k * Config.sk[k + 1] / Config.image_size)
            mean += [cx, cy, s_k_prime, s_k_prime]
            # one pair of boxes per aspect ratio: ar and 1/ar
            for ar in Config.aspect_ratios[k]:
                mean += [cx, cy, s_k * sqrt(ar), s_k / sqrt(ar)]
                mean += [cx, cy, s_k / sqrt(ar), s_k * sqrt(ar)]
        if Config.use_cuda:
            mean = torch.Tensor(mean).cuda().view(Config.feature_map[k], Config.feature_map[k], -1).contiguous()
        else:
            mean = torch.Tensor(mean).view(Config.feature_map[k], Config.feature_map[k], -1).contiguous()
        mean.clamp_(max=1, min=0)
        mean_layer.append(mean)

    return mean_layer
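
# Added note: with Config.feature_map = [38, 19, 10, 5, 3, 1] and
# 2 + 2 * len(aspect_ratios[k]) boxes per cell ([4, 6, 6, 6, 4, 4]), the loop
# above generates 38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 8732
# priors, the standard SSD300 total.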
def encode(match_boxes, prior_box, variances):
    # offset between matched box center and prior center, scaled by variance
    g_cxcy = (match_boxes[:, :2] + match_boxes[:, 2:]) / 2 - prior_box[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * prior_box[:, 2:])
    # match wh / prior wh
    g_wh = (match_boxes[:, 2:] - match_boxes[:, :2]) / prior_box[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]

def change_prior_box(box):
    if Config.use_cuda:
        return torch.cat((box[:, :2] - box[:, 2:] / 2,             # xmin, ymin
                          box[:, :2] + box[:, 2:] / 2), 1).cuda()  # xmax, ymax
    else:
        return torch.cat((box[:, :2] - box[:, 2:] / 2,     # xmin, ymin
                          box[:, :2] + box[:, 2:] / 2), 1)

# intersection area of two sets of boxes
def insersect(box1, box2):
    label_num = box1.size(0)
    box_num = box2.size(0)
    max_xy = torch.min(
        box1[:, 2:].unsqueeze(1).expand(label_num, box_num, 2),
        box2[:, 2:].unsqueeze(0).expand(label_num, box_num, 2)
    )
    min_xy = torch.max(
        box1[:, :2].unsqueeze(1).expand(label_num, box_num, 2),
        box2[:, :2].unsqueeze(0).expand(label_num, box_num, 2)
    )
    inter = torch.clamp((max_xy - min_xy), min=0)
    return inter[:, :, 0] * inter[:, :, 1]

def jaccard(box_a, box_b):
    """Compute the Jaccard overlap (IoU):
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    """
    inter = insersect(box_a, box_b)
    area_a = ((box_a[:, 2] - box_a[:, 0]) *
              (box_a[:, 3] - box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
    area_b = ((box_b[:, 2] - box_b[:, 0]) *
              (box_b[:, 3] - box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
    union = area_a + area_b - inter
    return inter / union  # [A,B]

def point_form(boxes):
    # convert (cx, cy, w, h) boxes to corner form
    return torch.cat((boxes[:, :2] - boxes[:, 2:] / 2,      # xmin, ymin
                      boxes[:, :2] + boxes[:, 2:] / 2), 1)  # xmax, ymax
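
# Added sanity example: for two unit squares overlapping by half,
#   jaccard(torch.Tensor([[0., 0., 1., 1.]]), torch.Tensor([[0.5, 0., 1.5, 1.]]))
# gives 0.5 / (1 + 1 - 0.5) = 1/3.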
def match(threshold, truths, priors, labels, loc_t, conf_t, idx):
    """Match each prior (default) box to the ground-truth box with the highest
    Jaccard overlap, encode the matched boxes, and write the results into the
    loc_t and conf_t buffers.
    Args:
        threshold: (float) Jaccard overlap below which a prior is labelled background.
        truths: (tensor) ground-truth boxes.
        priors: (tensor) default boxes.
        labels: (tensor) class labels for the objects in this image.
        loc_t: (tensor) buffer for the encoded location targets.
        conf_t: (tensor) buffer for the matched class label of each prior.
        idx: (int) index of this image within the current batch.
    """
    # Jaccard overlaps between ground truths and priors
    overlaps = jaccard(
        truths,
        # convert priors to corner form (x_min, y_min, x_max, y_max)
        point_form(priors)
    )
    # [1,num_objects] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_overlap.squeeze_(1)
    # pin each ground truth's best prior to overlap 2, so it survives the
    # threshold filter below
    best_truth_overlap.index_fill_(0, best_prior_idx, 2)

    # make sure every ground truth keeps its best prior
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]          # Shape: [num_priors,4]
    conf = labels[best_truth_idx] + 1         # Shape: [num_priors]
    conf[best_truth_overlap < threshold] = 0  # label as background
    # encode the offsets as in the localization loss of the SSD paper
    loc = encode(matches, priors, (0.1, 0.2))
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior


def log_sum_exp(x):
    """Numerically stable log-sum-exp over dim 1, used to compute the
    unaveraged confidence loss across all examples in a batch.
    Args:
        x (Variable(tensor)): conf_preds from conf layers
    """
    x_max = x.data.max()
    return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max

def decode(loc, priors, variances):
    """Decode locations from predictions using priors to undo
    the encoding we did for offset regression at train time.
    Args:
        loc (tensor): location predictions for loc layers,
            Shape: [num_priors,4]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded bounding box predictions
    """

    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
    return boxes
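
# Added note: decode() is the inverse of encode() above; with the variances
# used in match() (0.1, 0.2), decode(encode(b, priors, (0.1, 0.2)), priors,
# [0.1, 0.2]) recovers the corner-form boxes b up to floating-point error.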
def nms(boxes, scores, overlap=0.5, top_k=200):
    """Apply non-maximum suppression at test time to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
        scores: (tensor) The class prediction scores for the img, Shape:[num_priors].
        overlap: (float) The overlap thresh for suppressing unnecessary boxes.
        top_k: (int) The maximum number of box preds to consider.
    Return:
        The indices of the kept boxes with respect to num_priors.
    """

    keep = scores.new(scores.size(0)).zero_().long()
    if boxes.numel() == 0:
        return keep, 0
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)
    v, idx = scores.sort(0)  # sort in ascending order
    # I = I[v >= 0.01]
    idx = idx[-top_k:]  # indices of the top-k largest vals
    xx1 = boxes.new()
    yy1 = boxes.new()
    xx2 = boxes.new()
    yy2 = boxes.new()
    w = boxes.new()
    h = boxes.new()

    # keep = torch.Tensor()
    count = 0
    while idx.numel() > 0:
        i = idx[-1]  # index of current largest val
        # keep.append(i)
        keep[count] = i
        count += 1
        if idx.size(0) == 1:
            break
        idx = idx[:-1]  # remove kept element from view
        # load bboxes of next highest vals
        torch.index_select(x1, 0, idx, out=xx1)
        torch.index_select(y1, 0, idx, out=yy1)
        torch.index_select(x2, 0, idx, out=xx2)
        torch.index_select(y2, 0, idx, out=yy2)
        # store element-wise max with next highest score
        xx1 = torch.clamp(xx1, min=x1[i])
        yy1 = torch.clamp(yy1, min=y1[i])
        xx2 = torch.clamp(xx2, max=x2[i])
        yy2 = torch.clamp(yy2, max=y2[i])
        w.resize_as_(xx2)
        h.resize_as_(yy2)
        w = xx2 - xx1
        h = yy2 - yy1
        # check sizes of xx1 and xx2.. after each iteration
        w = torch.clamp(w, min=0.0)
        h = torch.clamp(h, min=0.0)
        inter = w * h
        # IoU = i / (area(a) + area(b) - i)
        rem_areas = torch.index_select(area, 0, idx)  # load remaining areas
        union = (rem_areas - inter) + area[i]
        IoU = inter / union  # store result in iou
        # keep only elements with an IoU <= overlap
        idx = idx[IoU.le(overlap)]
    return keep, count

if __name__ == '__main__':
    mean = default_prior_box()
    print(mean)
--------------------------------------------------------------------------------
/video_detection.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 26 15:58:49 2019

@author: 朋飞
"""

from torch.autograd import Variable
from detection import *
from ssd_net_vgg import *
from voc0712 import *
import torch
import torch.nn as nn
import numpy as np
import cv2
import utils
import torch.backends.cudnn as cudnn
import time

# use CUDA if it is available
if torch.cuda.is_available():
    print('-----gpu mode-----')
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
else:
    print('-----cpu mode-----')
colors_tableau = [(214, 39, 40), (23, 190, 207), (188, 189, 34), (188, 34, 188), (205, 108, 8)]

def Yawn(list_Y, list_Y1):
    """Return True when the most recent len(list_Y1) mouth states in list_Y
    match the yawn template list_Y1."""
    return (list_Y[len(list_Y) - len(list_Y1):] == list_Y1).all()

# initialize the network
net = SSD()
net = torch.nn.DataParallel(net)
net.train(mode=False)
net.load_state_dict(torch.load('./weights/ssd300_VOC_100000.pth', map_location=lambda storage, loc: storage))
if torch.cuda.is_available():
    net = net.cuda()
    cudnn.benchmark = True

img_mean = (104.0, 117.0, 123.0)

# open the video file; set file_name to 0 to open the camera instead
file_name = 'C:/Users/HP/Desktop/9-FemaleNoGlasses.avi'
cap = cv2.VideoCapture(file_name)
max_fps = 0

# sliding-window lists of detection results
# for both eyes and mouth, open is '1' and closed is '0'
video_fps = 20  # the test video runs at 20 fps
list_B = np.ones(video_fps * 3)          # eye states over the last 3 s; adjust to the actual fps
list_Y = np.zeros(video_fps * 10)        # mouth states over the last 10 s
list_Y1 = np.ones(int(video_fps * 1.5))  # yawn template: ~1.5 s of open mouth,
list_Y1[int(video_fps * 1.5) - 1] = 0    # ending with one closed frame; a match counts as one yawn
list_blink = np.ones(video_fps * 10)     # blink events over ~10 s: blink '1', no blink '0'
list_yawn = np.zeros(video_fps * 30)     # yawn events over ~30 s: yawn '1', no yawn '0'
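
# Added illustration: at video_fps = 20 the yawn template list_Y1 is
# [1]*29 + [0], i.e. ~1.5 s of continuously open mouth followed by one closed
# frame; a yawn is counted when the newest 30 entries of list_Y equal it.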

#blink_count=0            # blink counter
#yawn_count=0
#blink_start=time.time()  # blink timing
#yawn_start=time.time()   # yawn timing
blink_freq = 0.5
yawn_freq = 0
# start detection; press 'q' to quit
while cap.isOpened():
    flag_B = True   # eyes-open flag (set False once a closed eye is detected)
    flag_Y = False  # mouth-open flag

    num_rec = 0          # number of detected eyes/mouths in this frame
    start = time.time()  # timing
    ret, img = cap.read()  # read one frame

    # detection
    x = cv2.resize(img, (300, 300)).astype(np.float32)
    x -= img_mean
    x = x.astype(np.float32)
    x = x[:, :, ::-1].copy()
    x = torch.from_numpy(x).permute(2, 0, 1)
    xx = Variable(x.unsqueeze(0))
    if torch.cuda.is_available():
        xx = xx.cuda()
    y = net(xx)
    softmax = nn.Softmax(dim=-1)
    # detect=Detect(config.class_num,0,200,0.01,0.45)
    detect = Detect.apply
    priors = utils.default_prior_box()

    loc, conf = y
    loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
    conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)

    detections = detect(
        loc.view(loc.size(0), -1, 4),
        softmax(conf.view(conf.size(0), -1, config.class_num)),
        torch.cat([o.view(-1, 4) for o in priors], 0),
        config.class_num,
        200,
        0.7,
        0.45
    ).data
    labels = VOC_CLASSES
    top_k = 10

    # draw the detections on the frame
    scale = torch.Tensor(img.shape[1::-1]).repeat(2)
    for i in range(detections.size(1)):

        j = 0
        while detections[0, i, j, 0] >= 0.4:
            score = detections[0, i, j, 0]
            label_name = labels[i - 1]
            if label_name == 'closed_eye':
                flag_B = False
            if label_name == 'open_mouth':
                flag_Y = True
            display_txt = '%s:%.2f' % (label_name, score)
            pt = (detections[0, i, j, 1:] * scale).cpu().numpy()
            coords = (pt[0], pt[1]), pt[2] - pt[0] + 1, pt[3] - pt[1] + 1
            color = colors_tableau[i]
            cv2.rectangle(img, (pt[0], pt[1]), (pt[2], pt[3]), color, 2)
            cv2.putText(img, display_txt, (int(pt[0]), int(pt[1]) + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1, 8)
            j += 1
            num_rec += 1
    if num_rec > 0:
        if flag_B:
            #print(' 1:eye-open')
            list_B = np.append(list_B, 1)  # eyes open -> '1'
        else:
            #print(' 0:eye-closed')
            list_B = np.append(list_B, 0)  # eyes closed -> '0'
        list_B = np.delete(list_B, 0)
        if flag_Y:
            list_Y = np.append(list_Y, 1)
        else:
            list_Y = np.append(list_Y, 0)
        list_Y = np.delete(list_Y, 0)
    else:
        print('nothing detected')
    #print(list)

    # a '1' in the previous frame followed by a '0' in this one counts as a blink
    if list_B[13] == 1 and list_B[14] == 0:
        print('---------------- blink ----------------------')
        list_blink = np.append(list_blink, 1)
    else:
        list_blink = np.append(list_blink, 0)
    list_blink = np.delete(list_blink, 0)


    # yawn detection
    #if Yawn(list_Y,list_Y1):
    if (list_Y[len(list_Y) - len(list_Y1):] == list_Y1).all():
        print('---------------------- yawn ----------------------')
        # after a detected yawn, clear the whole mouth-state window; a yawn
        # takes long enough that this should not cause missed detections
        list_Y = np.zeros(video_fps * 10)
        list_yawn = np.append(list_yawn, 1)
    else:
        list_yawn = np.append(list_yawn, 0)
    list_yawn = np.delete(list_yawn, 0)


    # compute PERCLOS, perblink and peryawn in real time:
    # the fractions of closed-eye frames, blink events and yawn events
    # within their respective windows
    perclos = 1 - np.average(list_B)
    perblink = np.average(list_blink)
    peryawn = np.average(list_yawn)
    #print('perclos={:f}'.format(perclos))
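    # Added worked example: with video_fps = 20, list_B covers the last 60
    # frames (3 s); if 30 of them are closed-eye zeros, perclos = 1 - 30/60
    # = 0.5, which exceeds the 0.4 threshold below and flags fatigue by itself.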
    # fatigue decision
    # idea 1: both frequencies are now computed in real time, so nothing
    # further needs to change here
    if perclos > 0.4 or perblink < 2.5 / (10 * video_fps) or peryawn > 3 / (30 * video_fps):
        print('fatigued')
    else:
        print('awake')


    T = time.time() - start
    fps = 1 / T  # show the live fps on the video
    if fps > max_fps:
        max_fps = fps
    fps_txt = 'fps:%.2f' % (fps)
    cv2.putText(img, fps_txt, (0, 10), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1, 8)
    cv2.imshow("ssd", img)
    if cv2.waitKey(100) & 0xff == ord('q'):
        break
#print("-------end-------")
cap.release()
cv2.destroyAllWindows()
#print(max_fps)
--------------------------------------------------------------------------------
/voc0712.py:
--------------------------------------------------------------------------------
"""VOC Dataset Classes

Original author: Francisco Massa
https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py

Updated by: Ellis Brown, Max deGroot
"""
import os.path as osp
import sys
import torch
import torch.utils.data as data
import cv2
import numpy as np
if sys.version_info[0] == 2:
    import xml.etree.cElementTree as ET
else:
    import xml.etree.ElementTree as ET

VOC_CLASSES = [  # class indices start at 0
    'open_eye', 'closed_eye', 'closed_mouth', 'open_mouth']

# note: if you used our download scripts, this should be right
VOC_ROOT = osp.join('./', "data/VOCdevkit/")


class VOCAnnotationTransform(object):
    """Transforms a VOC annotation into a Tensor of bbox coords and label index
    Initialized with a dictionary lookup of classnames to indexes

    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of this project's 4 classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        self.class_to_ind = class_to_ind or dict(
            zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation) : the target annotation to be made usable
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes  [bbox coords, class name]
        """
        res = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')

            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                # scale x by width, y by height
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]

        return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]
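
# Added example: for a 640x480 image with one 'open_eye' box at
# (xmin, ymin, xmax, ymax) = (96, 13, 438, 332), VOCAnnotationTransform()
# returns [[95/640, 12/480, 437/640, 331/480, 0]] -- coordinates scaled to
# [0, 1] with the class index appended.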

class VOCDetection(data.Dataset):
    """VOC Detection Dataset Object

    input is image, target is annotation

    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the
            input image
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        dataset_name (string, optional): which dataset to load
            (default: 'My_Data')
    """

    def __init__(self, root,
                 image_sets=['trainval'],
                 transform=None, target_transform=VOCAnnotationTransform(),
                 dataset_name='My_Data'):
        self.root = root
        self.image_set = image_sets
        self.transform = transform
        self.target_transform = target_transform
        self.name = dataset_name
        self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self.ids = list()
        for name in image_sets:
            # rootpath = osp.join(self.root, 'VOC' + year)
            rootpath = self.root
            for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
                self.ids.append((rootpath, line.strip()))

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)

        return im, gt

    def __len__(self):
        return len(self.ids)

    def pull_item(self, index):
        img_id = self.ids[index]

        target = ET.parse(self._annopath % img_id).getroot()
        img = cv2.imread(self._imgpath % img_id)
        height, width, channels = img.shape

        if self.target_transform is not None:
            target = self.target_transform(target, width, height)

        if self.transform is not None:
            target = np.array(target)
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image at index as a cv2 (numpy) image

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            cv2 img
        '''
        img_id = self.ids[index]
        return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)

    def pull_anno(self, index):
        '''Returns the original annotation of image at index

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to get annotation of
        Return:
            list:  [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        '''
        img_id = self.ids[index]
        anno = ET.parse(self._annopath % img_id).getroot()
        gt = self.target_transform(anno, 1, 1)
        return img_id[1], gt

    def pull_tensor(self, index):
        '''Returns the original image at an index in tensor form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            tensorized version of img, squeezed
        '''
        return torch.Tensor(self.pull_image(index)).unsqueeze_(0)
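
# Added usage sketch (assumes the dataset layout described in the README):
#   dataset = VOCDetection(root='./dataset/', image_sets=['trainval'])
#   img, gt = dataset[0]  # img: 3xHxW tensor (BGR when transform is None),
#                         # gt: [[xmin, ymin, xmax, ymax, label], ...]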
--------------------------------------------------------------------------------
/weights/readme.txt:
--------------------------------------------------------------------------------
Place the downloaded weight files here.

The 1000-5000 checkpoints are from the original training run;
the 10000-120000 checkpoints were trained on the later dataset.
--------------------------------------------------------------------------------