├── Individual Project Report_La Rosa.html ├── Individual Project Report_La Rosa.ipynb ├── README.md ├── md ├── Individual Project Report_La Rosa.md ├── output_0_0.png ├── output_13_0.png ├── output_17_0.png ├── output_25_0.png ├── output_47_0.png ├── output_51_0.png ├── output_55_0.png ├── output_58_0.png └── output_62_0.png └── outputs ├── cleaned_timer.png ├── custom_yolo.png ├── header.png ├── iou_threshold.png ├── methodology.png ├── pre_trained_yolo.png ├── resized_cctv_full_fast.mp4 ├── resized_cctv_full_fast_cut.mov ├── resized_cctv_full_fast_cut.mp4 ├── sample_frame.png └── stations.png /README.md: -------------------------------------------------------------------------------- 1 | # Employee Monitoring Using Object Detection 2 | 3 | Deep Learning Individual Project by Patrick Guillano P. La Rosa - March 03, 2022. The code, analysis, and the full report are included in the [Technical Report](https://github.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/blob/main/md/Individual%20Project%20Report_La%20Rosa.md). If you have any questions regarding this study, please send me a message via [LinkedIn](https://www.linkedin.com/in/patricklarosa/). 4 | 5 | 6 | 7 | 8 | ## Executive Summary 9 | 10 |

As the pandemic eases and nears its end, many companies are returning to the office. With that return come some long-forgotten problems. As employees, our overtime hours are often not reflected in our salary because we cannot track and justify how long we actually worked in a day. As employers, we cannot be sure how long an employee works in a day, especially when the company supports a flexible schedule. Some employees can easily misreport their time, which has happened to our small business before. This project aims to answer the question: how might we ensure fairness in tracking the hours employees work?

11 | 12 |

The solution implemented in this project uses deep learning techniques to detect each employee and monitor how long they are seated at their station. The system also counts the number of people inside the office, which makes it possible to track violations of social distancing.

13 | 14 |

I extracted 200 sample frames from an office's closed-circuit television (CCTV) footage, manually labeled them, and trained a custom YOLO model to detect employees using the labeled images as training and validation sets. A threshold on the Intersection over Union (IoU) between each detected employee and each station determines whether a seat is taken or vacant. To measure how long an employee stays seated at a station, I performed Optical Character Recognition (OCR) on the timestamp that the CCTV overlay displays on screen, which gives a more accurate result than relying on the camera's frame rate, which may vary.

15 | 16 |

Applying transfer learning to YOLO increased the average precision from 69% with the COCO-trained weights to 98% on our custom dataset, which allows robust detection when deployed as a real-time employee tracker. The threshold values used in the rules depend on the camera angle and must be tuned to distinguish employees from edge cases.

17 | 18 |

Recommendations for this project include additional features outside the current scope, such as detecting what the employee is actually doing, for example talking or working. Further recommendations involve alternative use cases for the system: with modifications to the implementation, it may be redeployed for parking management, security systems, and traffic management.

19 | 20 | https://user-images.githubusercontent.com/67182415/177245750-7488612b-f820-4b33-95b0-74e9ad26776e.mov 21 | -------------------------------------------------------------------------------- /md/Individual Project Report_La Rosa.md: -------------------------------------------------------------------------------- 1 | ```python 2 | display(Image(filename="cover.png")) 3 | ``` 4 | 5 | 6 | ![png](output_0_0.png) 7 | 8 | 9 | ## Highlights 10 | 11 | - Employee monitoring to ensure fairness in tracking hours worked for employees 12 | - Use of YOLO for counting and detecting persons in a room 13 | - Fine tuning of YOLO in a custom dataset to improve the result by 30% 14 | - Use of Intersection over Union (IoU) as a threshold to determine if the station is taken or vacant 15 | - Use of image processing techniques and use of pytesseract to perform OCR to get the time of the cctv camera 16 | 17 | ## Introduction 18 | 19 |

As the pandemic eases and nears its end, many companies are returning to the office. With that return come some long-forgotten problems. As employees, our overtime hours are often not reflected in our salary because we cannot track and justify how long we actually worked in a day. As employers, we cannot be sure how long an employee works in a day, especially when the company supports a flexible schedule. Some employees can easily misreport their time, which has happened to our small business before. This project aims to answer the question: how might we ensure fairness in tracking the hours employees work?

20 |

The solution implemented in this project uses deep learning techniques to detect each employee and monitor how long they are seated at their station. The system also counts the number of people inside the office, which makes it possible to track violations of social distancing.

21 | 22 | 23 | ```python 24 | # python standard libraries 25 | import os 26 | import time 27 | import glob 28 | import re 29 | import shutil 30 | import pickle 31 | from datetime import datetime, timedelta 32 | from base64 import b64decode, b64encode 33 | 34 | # google colab/notebook libraries 35 | from IPython.display import display, Javascript, Image 36 | from google.colab.output import eval_js 37 | from google.colab.patches import cv2_imshow 38 | 39 | # external libraries 40 | import cv2 41 | import numpy as np 42 | import PIL 43 | import io 44 | import html 45 | import matplotlib.pyplot as plt 46 | from tqdm import tqdm 47 | from skimage.morphology import erosion 48 | 49 | # define color constants 50 | person_color = (220, 155, 58) 51 | vacant_color = (0, 0, 200) 52 | taken_color = (0, 200, 0) 53 | station_color = (0, 100, 0) 54 | 55 | # constant coordinates 56 | coordinates = { 57 | 'station_1' : {'x1':1600, 'x2':1800, 58 | 'y1':575, 'y2':780}, 59 | 'station_2' : {'x1':1287, 'x2':1472, 60 | 'y1':310, 'y2':425}, 61 | 'station_3' : {'x1':1145, 'x2':1287, 62 | 'y1':197, 'y2':268}, 63 | 64 | 'station_4' : {'x1':561, 'x2':764, 65 | 'y1':424, 'y2':578} 66 | } 67 | 68 | coordinates_ocr= [(1256, 39), (1885, 101)] 69 | 70 | %matplotlib inline 71 | ``` 72 | 73 | 74 | ```python 75 | from IPython.display import HTML 76 | from IPython.display import Image 77 | 78 | HTML(''' 90 | 104 |
''') 106 | ``` 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | ```python 115 | # install if working on colab 116 | !sudo apt install tesseract-ocr 117 | !pip -qq install pytesseract 118 | 119 | # import and install separately to avoid dependency issues later 120 | import pytesseract 121 | ``` 122 | 123 | 124 | ```python 125 | # clone darknet repo if you are using original repository 126 | !git clone https://github.com/AlexeyAB/darknet 127 | ``` 128 | 129 | 130 | ```python 131 | # connect to google drive 132 | from google.colab import drive 133 | drive.mount('/content/drive') 134 | ``` 135 | 136 | 137 | ```python 138 | # copy whole dataset 139 | !cp /content/drive/MyDrive/MSDS/ML3/final_project/XVR_ch5_main*.mp4 /content/ 140 | 141 | # clone of darknet github, extracted dataset, configurations, and custom weights 142 | !cp /content/drive/MyDrive/MSDS/ML3/final_project/darknet_best.zip /content/ 143 | 144 | # trained weights from custom dataset 145 | !cp /content/drive/MyDrive/MSDS/ML3/final_project/yolov4-obj_best.weights /content/ 146 | 147 | # unzip darknet directory 148 | !unzip -qq darknet_best.zip 149 | 150 | # clone original darknet repo (use if you want default settings) 151 | !git clone https://github.com/AlexeyAB/darknet 152 | 153 | # change makefile to have GPU, OPENCV and LIBSO enabled 154 | %cd /content/darknet 155 | !sed -i 's/OPENCV=0/OPENCV=1/' Makefile 156 | !sed -i 's/GPU=0/GPU=1/' Makefile 157 | !sed -i 's/CUDNN=0/CUDNN=1/' Makefile 158 | !sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile 159 | !sed -i 's/LIBSO=0/LIBSO=1/' Makefile 160 | 161 | # make darknet (builds darknet so that you can then use the darknet.py file 162 | # and have its dependencies) 163 | !make 164 | ``` 165 | 166 |

## Exploratory Data Analysis

167 | 168 |

Since the footage comes from a single camera, every frame has the same size and number of channels. For exploratory data analysis, let us check the dimensions of one frame of the CCTV feed.

169 | 170 | 171 | ```python 172 | # print sample image 173 | vidcap = cv2.VideoCapture('../XVR_ch5_main_20220214100004_20220214110005.mp4') 174 | success,frame = vidcap.read() 175 | cv2_imshow(frame) 176 | ``` 177 | 178 | 179 | ![png](output_13_0.png) 180 | 181 | 182 | The frame have dimensions of: 183 | 184 | 185 | ```python 186 | # Get dimensions of image 187 | width, height, channels = frame.shape 188 | print(f'width: {width}') 189 | print(f'height: {height}') 190 | print(f'channels: {channels}') 191 | ``` 192 | 193 | width: 1080 194 | height: 1920 195 | channels: 3 196 | 197 | 198 | ## Methodology 199 | 200 | 201 | ```python 202 | display(Image(filename="methodology.png")) 203 | ``` 204 | 205 | 206 | ![png](output_17_0.png) 207 | 208 | 209 | a.) Dataset 210 |

The dataset used for this project is video surveillance footage of our small business, one hour in total length, recorded on February 14.

211 | 212 | b.) Extract images 213 |

Extracted 200 sample images to be used as the training and validation sets.

214 | 215 | c.) Fine tune YOLOv4 model 216 |

Performed transfer learning using pretrained weights from the COCO dataset and fine-tuned the model on the custom dataset.

217 | 218 | d.) Perform non-max suppression 219 |

Performed non-max suppression to remove multiple bounding boxes around a single object, keeping the box with the maximum confidence among those that overlap.

220 | 221 | e.) Set-up work station 222 |

Defined the workstation regions and created a rule based on the IoU between each detected person and each workstation to identify which stations are taken and which are vacant.

223 | 224 | f.) Compute for the time of the employee in work station 225 |

Used the timestamp provided by the DVR, found in the upper right of the image, and applied image processing techniques and optical character recognition to parse it into a Python datetime object.

226 | 227 | ## Results and Discussion 228 | 229 |

## i. Pre-trained YOLO on COCO dataset

230 | 231 |

YOLO has a pre-trained model on the COCO dataset that can classify 80 object classes. COCO is a large-scale object detection, segmentation, and captioning dataset of over 330k images. One of the classes this model can detect is person, which is exactly what we need for this study. Let us check whether it works well on our dataset.

232 | 233 | 234 | ```python 235 | # use if you want to use default settings 236 | # get bthe scaled yolov4 weights file that is pre-trained to detect 80 classes (objects) from shared google drive 237 | !wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1V3vsIaxAlGWvK4Aar9bAiK5U0QFttKwq' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1V3vsIaxAlGWvK4Aar9bAiK5U0QFttKwq" -O yolov4-csp.weights && rm -rf /tmp/cookies.txt 238 | ``` 239 | 240 | 241 | ```python 242 | # import darknet functions to perform object detections 243 | from darknet import * 244 | # load in our YOLOv4 architecture network 245 | network, class_names, class_colors = load_network("cfg/yolov4-csp.cfg", "cfg/coco.data", "yolov4-csp.weights") 246 | width = network_width(network) 247 | height = network_height(network) 248 | 249 | # darknet helper function to run detection on image 250 | def darknet_helper(img, width, height): 251 | darknet_image = make_image(width, height, 3) 252 | img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 253 | img_resized = cv2.resize(img_rgb, (width, height), 254 | interpolation=cv2.INTER_LINEAR) 255 | 256 | # get image ratios to convert bounding boxes to proper size 257 | img_height, img_width, _ = img.shape 258 | width_ratio = img_width/width 259 | height_ratio = img_height/height 260 | 261 | # run model on darknet style image to get detections 262 | copy_image_from_bytes(darknet_image, img_resized.tobytes()) 263 | detections = detect_image(network, class_names, darknet_image) 264 | free_image(darknet_image) 265 | return detections, width_ratio, height_ratio 266 | ``` 267 | 268 | 269 | ```python 270 | # evaluate in validation set 271 | %cd /content/darknet 272 | !./darknet detector map data/obj.data cfg/yolov4-csp.cfg ../yolov4-csp.weights -points 0 273 | ``` 274 | 275 | 276 | ```python 277 | display(Image(filename="../pretrained.png")) 278 | ``` 279 | 280 | 281 | ![png](output_25_0.png) 282 | 283 | 284 |

It turns out that in some frames the model perfectly detects the person in the image. However, in other frames it misclassifies the person, as can be seen in the lower right of the figure. Additionally, it sometimes produces multiple bounding boxes on a single object and detects objects that are not needed in this project, such as the TV monitor. Although the model was trained on thousands of images, it was not trained on this type of environment and probably not on all angles of a person.

285 | 286 | ## ii. Train YOLO on custom dataset 287 | 288 |

To remedy the issues found when using the model pretrained on the COCO dataset, we can train YOLO on a custom dataset. A detailed explanation of how to train YOLO on a custom dataset can be found in the darknet documentation.

289 | 290 |

### a. Extract custom dataset

291 | 292 |

Training YOLO on a custom dataset requires images, so we need to sample frames from our video. Fifty to one hundred images are usually enough to train a single object class, but for this project I sampled 200 images to make the training data more reliable.

293 | 294 | 295 | 296 | ```python 297 | # extract data from video to 700 sampled images 298 | cap = cv2.VideoCapture('XVR_ch5_main_20220214100004_20220214110005.mp4') 299 | length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 300 | img_array =[] 301 | counter = 0 302 | image_count = 0 303 | while ret: 304 | # Capture frame-by-frame 305 | ret, frame = cap.read() 306 | 307 | if counter % int(length/700) == 0: 308 | fname = f'{image_count}.jpg' 309 | image_count += 1 310 | cv2.imwrite(fname, frame) 311 | 312 | counter += 1 313 | if cv2.waitKey(1) & 0xFF == ord('q'): 314 | break 315 | 316 | # When everything done, release the capture 317 | cap.release() 318 | cv2.destroyAllWindows() 319 | ``` 320 | 321 |

### b. Manually label person in custom dataset

322 | 323 |

Now that we have our 200 images, we need to manually label them in the format expected by the YOLO algorithm, which is {object-class} {x_center} {y_center} {width} {height}. I used LabelImg to draw the bounding boxes. It produces one text file of bounding boxes per image, with the same filename as the image, plus a text file that contains the class names. We then split the data into a training set and a 10 percent validation set. A small sketch of the label format is shown below.

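As a concrete illustration of that label format, the sketch below converts a pixel bounding box into a YOLO-style label line. It assumes the usual YOLO convention that the centre coordinates and box sizes are normalised by the image width and height; the pixel values are made up for illustration and are not taken from the actual annotations.

```python
# A minimal sketch of the YOLO label format, assuming the usual convention of
# centre/size values normalised to the image dimensions (pixel values are
# illustrative only).
img_w, img_h = 1920, 1080              # frame size found in the EDA section
x1, y1, x2, y2 = 561, 424, 764, 578    # hypothetical person box in pixels

x_center = ((x1 + x2) / 2) / img_w
y_center = ((y1 + y2) / 2) / img_h
box_w = (x2 - x1) / img_w
box_h = (y2 - y1) / img_h

# class 0 = person, the only class in this custom dataset
print(f"0 {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}")
```

When LabelImg is set to YOLO output format, it writes exactly one such line per labeled object into the text file that accompanies each image.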
324 | 325 | 326 | ```python 327 | # copy txt and jpg data to a directory 328 | fnames = glob.glob('data/*.txt') 329 | 330 | for fname in fnames: 331 | fname_target = fname.replace('data', 'train') 332 | shutil.copyfile(fname, fname_target) 333 | if 'classes' in fname: 334 | continue 335 | else: 336 | image_fname = fname.replace('txt', 'jpg') 337 | image_target = image_fname.replace('data', 'train') 338 | shutil.copyfile(image_fname, image_target) 339 | 340 | # create train and test data 341 | # Percentage of images to be used for the test set 342 | percentage_test = 10; 343 | 344 | # Create and/or truncate train.txt and test.txt 345 | file_train = open('./data/train.txt', 'w') 346 | file_test = open('./data/test.txt', 'w') 347 | 348 | # Populate train.txt and test.txt 349 | counter = 1 350 | index_test = round(100 / percentage_test) 351 | for pathAndFilename in glob.iglob(os.path.join(os.getcwd(), "*.jpg")): 352 | title, ext = os.path.splitext(os.path.basename(pathAndFilename)) 353 | 354 | if counter == index_test: 355 | counter = 1 356 | file_test.write("data/obj" + "/" + title + '.jpg' + "\n") 357 | else: 358 | file_train.write("data/obj" + "/" + title + '.jpg' + "\n") 359 | counter = counter + 1 360 | 361 | file_train.close() 362 | file_test.close() 363 | ``` 364 | 365 |

### c. Train the model

366 | 367 |

Model training is performed with the detector train command, which expects at least three parameters: the data file, the configuration file, and the initial weights. Initially I used yolov4.conv.137, pre-trained on the COCO dataset, as my starting weights. I then continued training from my own weights to further improve the results. Overall it took about six hours of training to get a good result. The sketch below shows what the data and configuration files typically contain.

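For reference, the data file passed to detector train is a small plain-text file, and the configuration file is a copy of the YOLOv4 cfg edited for a single class. The sketch below shows what these typically look like for a one-class person detector; the paths follow the usual darknet layout and are assumptions rather than the exact files used here.

```python
# A minimal sketch of data/obj.data for a single-class detector, assuming the
# standard darknet layout (paths are hypothetical).
obj_data = """\
classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
"""
with open("data/obj.data", "w") as f:
    f.write(obj_data)

# the names file lists one class per line; here there is only "person"
with open("data/obj.names", "w") as f:
    f.write("person\n")

# cfg/yolov4-obj.cfg is then edited by hand: set classes=1 in every [yolo]
# layer and filters=(classes + 5) * 3 = 18 in the [convolutional] layer
# immediately before each [yolo] layer, as described in the darknet README.
```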
368 | 369 | 370 | ```python 371 | # train by transfer learning from weights trained on coco dataset by darknet 372 | !./darknet detector train data/obj.data cfg/yolov4-obj.cfg yolov4.conv.137 -dont_show -map 373 | ``` 374 | 375 | 376 | ```python 377 | # continue training model by transfer learning from weights trained on custom dataset 378 | !./darknet detector train data/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_last.weights -dont_show -map 379 | ``` 380 | 381 | 382 | ```python 383 | # backup saved weights 384 | !cp backup/yolov4-obj_1000.weights /content/drive/MyDrive/MSDS/ML3/final_project/yolov4-obj_1000.weights 385 | !cp backup/yolov4-obj_1000.weights /content/drive/MyDrive/MSDS/ML3/final_project/yolov4-obj_best.weights 386 | !cp backup/yolov4-obj_1000.weights /content/drive/MyDrive/MSDS/ML3/final_project/yolov4-obj_last.weights 387 | 388 | # copy mean average precision per epoch 389 | !cp chart.png /content/drive/MyDrive/MSDS/ML3/final_project/chart_trained.png 390 | ``` 391 | 392 |

### d. Perform Non-Max Suppression

393 | 394 |

The non-max suppression implemented by darknet prioritizes the bottom-right bounding box when removing overlaps. I modified this logic so that, among overlapping boxes, the one with the maximum confidence is kept. This solution can be slower than the darknet implementation, but it yielded better accuracy, which matters more for this project. I chose a 65% overlap threshold for the non-max suppression as it provided the best result.

395 | 396 | 397 | ```python 398 | def non_max_suppression_fast(detections, overlap_thresh): 399 | """ modified non max suppression from darknet to get the overlap 400 | with max confidence 401 | 402 | Parameters 403 | ========== 404 | detections : tuple 405 | class_name, confidence, and coordinates 406 | overlap_thresh : float 407 | IOU threshold 408 | 409 | Returns 410 | ========== 411 | non_max_suppression_fast : tuple 412 | detections without high overlap 413 | """ 414 | boxes = [] 415 | confs = [] 416 | 417 | for detection in detections: 418 | class_name, conf, (x, y, w, h) = detection 419 | x1 = x - w / 2 420 | y1 = y - h / 2 421 | x2 = x + w / 2 422 | y2 = y + h / 2 423 | boxes.append(np.array([x1, y1, x2, y2])) 424 | confs.append(conf) 425 | 426 | boxes_array = np.array(boxes) 427 | 428 | # initialize the list of picked indexes 429 | pick = [] 430 | # grab the coordinates of the bounding boxes 431 | x1 = boxes_array[:, 0] 432 | y1 = boxes_array[:, 1] 433 | x2 = boxes_array[:, 2] 434 | y2 = boxes_array[:, 3] 435 | # compute the area of the bounding boxes and sort the bounding 436 | # boxes by the bottom-right y-coordinate of the bounding box 437 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 438 | idxs = np.argsort(y2) 439 | confs = np.array(confs) 440 | # keep looping while some indexes still remain in the indexes 441 | # list 442 | 443 | while len(idxs) > 0: 444 | # grab the last index in the indexes list and add the 445 | # index value to the list of picked indexes 446 | last = len(idxs) - 1 447 | i = idxs[last] 448 | # find the largest (x, y) coordinates for the start of 449 | # the bounding box and the smallest (x, y) coordinates 450 | # for the end of the bounding box 451 | 452 | xx1 = np.maximum(x1[i], x1[idxs[:last]]) 453 | yy1 = np.maximum(y1[i], y1[idxs[:last]]) 454 | xx2 = np.minimum(x2[i], x2[idxs[:last]]) 455 | yy2 = np.minimum(y2[i], y2[idxs[:last]]) 456 | # compute the width and height of the bounding box 457 | w = np.maximum(0, xx2 - xx1 + 1) 458 | h = np.maximum(0, yy2 - yy1 + 1) 459 | # compute the ratio of overlap 460 | overlap = (w * h) / area[idxs[:last]] 461 | 462 | # choose the highest confidence among overlaps 463 | overlap_args = np.where(overlap > overlap_thresh)[0] 464 | overlap_indices = idxs[overlap_args].tolist() + [i] 465 | confidence_list = confs[idxs[overlap_args]].tolist() + [confs[i]] 466 | confidence_list = list(map(float, confidence_list)) 467 | highest_confidence = np.argmax(confidence_list) 468 | pick.append(overlap_indices[highest_confidence]) 469 | 470 | # delete indices that overlaps 471 | idxs = np.delete(idxs, np.concatenate(([last], overlap_args))) 472 | 473 | return [detections[i] for i in pick] 474 | ``` 475 | 476 |

### e. Inference with the custom YOLO

477 | 478 |

Now that we have trained our custom YOLO model and created a custom non-max suppression, we can check how the model performs on the test set. The average precision increased from 69% with the COCO-trained weights to 98% on our custom dataset. The model is evaluated with mean average precision, which is computed by sweeping the confidence threshold and taking the average precision at an IoU threshold of 50%; it can be calculated with the detector map command of the YOLOv4 repository.

479 | 480 | 481 | ```python 482 | # import darknet functions to perform object detections 483 | from darknet import * 484 | # load in our YOLOv4 architecture network 485 | (network, 486 | class_names, 487 | class_colors) = load_network("cfg/yolov4-obj.cfg", 488 | "data/obj.data", 489 | "backup/yolov4-obj_best.weights") 490 | width = network_width(network) 491 | height = network_height(network) 492 | 493 | # darknet helper function to run detection on image 494 | def darknet_helper(img, width, height): 495 | """ darknet helper function to get detections, width and height ratio 496 | 497 | Parameters 498 | ========== 499 | img : np.array 500 | image file 501 | width : int 502 | width 503 | height : int 504 | height 505 | 506 | Returns 507 | ========= 508 | darknet_helper : tuple 509 | tuple of detections, width and height ratio 510 | """ 511 | darknet_image = make_image(width, height, 3) 512 | img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 513 | img_resized = cv2.resize(img_rgb, (width, height), 514 | interpolation=cv2.INTER_LINEAR) 515 | 516 | # get image ratios to convert bounding boxes to proper size 517 | img_height, img_width, _ = img.shape 518 | width_ratio = img_width/width 519 | height_ratio = img_height/height 520 | 521 | # run model on darknet style image to get detections 522 | copy_image_from_bytes(darknet_image, img_resized.tobytes()) 523 | detections = detect_image(network, class_names, darknet_image) 524 | free_image(darknet_image) 525 | return detections, width_ratio, height_ratio 526 | ``` 527 | 528 | 529 | ```python 530 | # run custom yolo on a sample image 531 | vidcap = cv2.VideoCapture('../XVR_ch5_main_20220214100004_20220214110005.mp4') 532 | 533 | for i in range(15): 534 | success,frame = vidcap.read() 535 | 536 | # get the predicted detections of the trained custom yolo 537 | detections, width_ratio, height_ratio = darknet_helper(frame, width, height) 538 | 539 | # apply non max suppression to eliminate multiple predictions 540 | # on same person 541 | detections = non_max_suppression_fast(detections, 0.65) 542 | 543 | for label, confidence, bbox in detections: 544 | left, top, right, bottom = bbox2points(bbox) 545 | left, top, right, bottom = (int(left * width_ratio), int(top * height_ratio), 546 | int(right * width_ratio), int(bottom * height_ratio)) 547 | cv2.rectangle(frame, (left, top), (right, bottom), person_color, 2) 548 | cv2.putText(frame, "{} [{:.2f}]".format(label, float(confidence)), 549 | (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 550 | person_color, 2) 551 | 552 | cv2_imshow(frame) 553 | ``` 554 | 555 | 556 | ```python 557 | display(Image(filename="../trained.png")) 558 | ``` 559 | 560 | 561 | ![png](output_47_0.png) 562 | 563 | 564 | 565 | ```python 566 | # evaluate in validation set 567 | !./darknet detector map data/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_best.weights -points 0 568 | ``` 569 | 570 |

## iii. Set up work station area

571 | 572 |

We now get the coordinates of the four work stations. I used Paint to manually read off the coordinates of each station and plotted them in the figure below.

573 | 574 | 575 | ```python 576 | # run custom yolo on a sample image 577 | vidcap = cv2.VideoCapture('/content/XVR_ch5_main_20220214100004_20220214110005.mp4') 578 | success,frame = vidcap.read() 579 | 580 | 581 | for stations, coordinate in coordinates.items(): 582 | cv2.rectangle(frame, (coordinate['x1'], coordinate['y1']), 583 | (coordinate['x2'], coordinate['y2']), station_color, 2) 584 | cv2.putText(frame, f"{stations}", 585 | (coordinate['x1'], coordinate['y1'] - 5), 586 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, 587 | station_color, 2) 588 | 589 | cv2_imshow(frame) 590 | ``` 591 | 592 | 593 | ![png](output_51_0.png) 594 | 595 | 596 |

## iv. Integrate custom YOLO on work stations

597 | 598 |

To integrate the work stations with our custom network, we need a rule to determine whether each work station is taken or vacant. I used an Intersection over Union (IoU) of 0.3 between the station and a detected person as the threshold for marking a station as taken.

599 | 600 | 601 | ```python 602 | def get_iou(bb1, bb2): 603 | """ 604 | Calculate the Intersection over Union (IoU) of two bounding boxes. 605 | 606 | Parameters 607 | ---------- 608 | bb1 : dict 609 | Keys: {'x1', 'x2', 'y1', 'y2'} 610 | The (x1, y1) position is at the top left corner, 611 | the (x2, y2) position is at the bottom right corner 612 | bb2 : dict 613 | Keys: {'x1', 'x2', 'y1', 'y2'} 614 | The (x, y) position is at the top left corner, 615 | the (x2, y2) position is at the bottom right corner 616 | 617 | Returns 618 | ------- 619 | float 620 | in [0, 1] 621 | """ 622 | # determine the coordinates of the intersection rectangle 623 | x_left = max(bb1['x1'], bb2['x1']) 624 | y_top = max(bb1['y1'], bb2['y1']) 625 | x_right = min(bb1['x2'], bb2['x2']) 626 | y_bottom = min(bb1['y2'], bb2['y2']) 627 | 628 | if x_right < x_left or y_bottom < y_top: 629 | return 0.0 630 | 631 | # The intersection of two axis-aligned bounding boxes is always an 632 | # axis-aligned bounding box 633 | intersection_area = (x_right - x_left) * (y_bottom - y_top) 634 | 635 | # compute the area of both AABBs 636 | bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1']) 637 | bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1']) 638 | # compute the intersection over union by taking the intersection 639 | # area and dividing it by the sum of prediction + ground-truth 640 | # areas - the interesection area 641 | iou = intersection_area / float(bb1_area + bb2_area - intersection_area) 642 | 643 | return iou 644 | ``` 645 | 646 | 647 | ```python 648 | # run custom yolo on a sample image 649 | vidcap = cv2.VideoCapture('/content/XVR_ch5_main_20220214100004_20220214110005.mp4') 650 | success,frame = vidcap.read() 651 | 652 | # get the predicted detections of the trained custom yolo 653 | detections, width_ratio, height_ratio = darknet_helper(frame, width, height) 654 | 655 | # apply non max suppression to eliminate multiple predictions 656 | # on same person 657 | detections = non_max_suppression_fast(detections, 0.65) 658 | detections_bb = [] 659 | for label, confidence, bbox in detections: 660 | left, top, right, bottom = bbox2points(bbox) 661 | left, top, right, bottom = (int(left * width_ratio), 662 | int(top * height_ratio), 663 | int(right * width_ratio), 664 | int(bottom * height_ratio)) 665 | 666 | cv2.rectangle(frame, (left, top), (right, bottom), person_color, 2) 667 | cv2.putText(frame, "{} [{:.2f}]".format(label, float(confidence)), 668 | (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 669 | person_color, 2) 670 | 671 | detections_bb.append({ 672 | 'x1' : left, 673 | 'y1' : top, 674 | 'x2' : right, 675 | 'y2' : bottom 676 | }) 677 | 678 | thresh = 0.3 679 | for stations, coordinate in coordinates.items(): 680 | taken = False 681 | for detection in detections_bb: 682 | iou = get_iou(coordinate, detection) 683 | if iou >= thresh: 684 | taken = True 685 | break 686 | color = taken_color if taken else vacant_color 687 | 688 | cv2.rectangle(frame, (coordinate['x1'], coordinate['y1']), 689 | (coordinate['x2'], coordinate['y2']), color, 2) 690 | 691 | cv2.putText(frame, f"{stations}", 692 | (coordinate['x1'], coordinate['y1'] - 5), 693 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, 694 | color, 2) 695 | frame = cv2.resize(frame, (1080, 720), 696 | interpolation=cv2.INTER_AREA) 697 | 698 | cv2_imshow(frame) 699 | ``` 700 | 701 | 702 | ![png](output_55_0.png) 703 | 704 | 705 |

## v. Extract datetime information

706 | 707 |

To measure how long an employee stays at their work station, we can proceed in two ways: use the timestamp the DVR overlays in the upper right of the image, or use the camera's frame rate. In this project I implemented both methods, but the former is more appropriate. In my experience, CCTV cameras are replaced more often than the DVR, different cameras can run at different frame rates, and relying on an assumed frame rate can misestimate the elapsed time. Extracting the timestamp produced by the DVR therefore keeps the system usable for longer, as the quick calculation below illustrates.

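To see why relying on a fixed frame rate is fragile, consider a rough back-of-the-envelope calculation with assumed numbers: if the counting logic assumes 15 fps but the camera actually delivers a slightly different rate, the error accumulates over the recording.

```python
# Back-of-the-envelope check of the frame-rate issue (all numbers are assumed).
assumed_fps = 15              # frame rate the counting logic would rely on
actual_fps = 14               # hypothetical: the camera delivers slightly fewer frames
recording_seconds = 60 * 60   # a one-hour recording, as in this dataset

frames = actual_fps * recording_seconds
estimated_seconds = frames / assumed_fps
error_minutes = (recording_seconds - estimated_seconds) / 60
print(f"timing error over one hour: {error_minutes:.1f} minutes")  # 4.0 minutes
```

Reading the DVR timestamp directly avoids this drift entirely, at the cost of an OCR step per frame. Here is a sample of the original image: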
708 | 709 | 710 | ```python 711 | if not imgs: 712 | vidcap = cv2.VideoCapture('../XVR_ch5_main_20220214100004_20220214110005.mp4') 713 | success,imgs = vidcap.read() 714 | 715 | cv2_imshow(imgs[1]) 716 | ``` 717 | 718 | 719 | ![png](output_58_0.png) 720 | 721 | 722 | ### a. Datetime information using OCR 723 | 724 |

First, I preprocessed the image so that the OCR model could read the text more accurately. The preprocessing steps are: crop the image to the area containing only the date and time, convert the image from BGR to grayscale, adjust the contrast and brightness to make the background brighter before thresholding, apply adaptive thresholding to remove the background, and lastly use morphological operations such as repeated erosion to make the text bolder.

725 | 726 | 727 | ```python 728 | def multi_ero(im, num): 729 | """ Perform multiple erosion on the image 730 | 731 | Parameters 732 | ========== 733 | im : np.array 734 | image file 735 | num : int 736 | number of times to apply erosion 737 | """ 738 | for i in range(num): 739 | im = erosion(im) 740 | return im 741 | 742 | imgs = [] 743 | # get images for testing 744 | vidcap = cv2.VideoCapture('../XVR_ch5_main_20220214100004_20220214110005.mp4') 745 | success,frame = vidcap.read() 746 | for i in tqdm(range(8000)): 747 | # Capture frame-by-frame 748 | success, frame = vidcap.read() 749 | if not i % 50: 750 | if frame is not None: 751 | imgs.append(frame) 752 | else: 753 | pass 754 | invalid = [] 755 | valid = [] 756 | datetime_clean = [] 757 | 758 | 759 | for img in tqdm(imgs): 760 | img = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY) 761 | 762 | contrast = 3 763 | contrast = max(contrast, 1.0); contrast = min(contrast, 3.0) 764 | 765 | brightness = 60 766 | brightness = max(brightness, 0.0); brightness = min(brightness, 100.0) 767 | 768 | img = np.clip(contrast * img.astype('float32') 769 | + brightness, 0.0, 255.0) 770 | 771 | img = img.astype('uint8') 772 | 773 | img = cv2.adaptiveThreshold(img, 774 | 255, 775 | cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 776 | cv2.THRESH_BINARY, 777 | 21, 2) 778 | 779 | img = img[coordinates_ocr[0][1]:coordinates_ocr[1][1], 780 | coordinates_ocr[0][0]:coordinates_ocr[1][0]] 781 | 782 | img = multi_ero(img, 2) 783 | datetime_clean.append(img) 784 | text = pytesseract.image_to_string(img, lang='eng', 785 | config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789:-') 786 | 787 | time_format = r'[0-5]\d:[0-5]\d:[0-5]\d' 788 | date_format = r'\d{4}-(?:0\d|1[12])-(?:[0-2]\d|3[01])' 789 | datetime_format = date_format + time_format 790 | text = text.replace(' ', '') 791 | try: 792 | timestamp_string = re.sub('(\d{4}-(?:0\d|1[12])-(?:[0-2]\d|3[01]))', 793 | r'\1' + r' ', 794 | re.findall(datetime_format, text)[0]) 795 | except: 796 | invalid.append(text) 797 | continue 798 | 799 | 800 | if len(text) != 20: 801 | invalid.append(text) 802 | 803 | else: 804 | valid.append(text) 805 | ``` 806 | 807 | 808 | ```python 809 | cv2_imshow(datetime_clean[1]) 810 | ``` 811 | 812 | 813 | ![png](output_62_0.png) 814 | 815 | 816 |

I used Pytesseract to read the text in the image and convert it to a Python datetime object. Pytesseract is a Python wrapper for Google's Tesseract optical character recognition (OCR) engine, which uses deep learning, in particular an LSTM, to predict the text in an image. I used the following configuration: psm=10 so that it classifies character by character, and tessedit_char_whitelist=0123456789:- so that the model is forced to choose among the whitelisted characters expected in our date and time element.

817 | 818 | 819 | ```python 820 | def get_ocr_datetime(img, contrast=3, brightness=60): 821 | """ get the datetime equivalent based on the image 822 | 823 | Parameters 824 | ========== 825 | img : np.array 826 | image file 827 | contrast : int 828 | contrast between 1-3 829 | brightness : int 830 | brightness between 0-100 831 | 832 | Returns 833 | ========= 834 | get_ocr_datetime : datetime.datetime 835 | datetime equivalent of the cctv image 836 | """ 837 | # convert to grayscale 838 | img = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY) 839 | 840 | contrast = max(contrast, 1.0) 841 | contrast = min(contrast, 3.0) 842 | 843 | brightness = max(brightness, 0.0) 844 | brightness = min(brightness, 100.0) 845 | 846 | # clip image based on contrast and brightness provided 847 | img = np.clip(contrast * img.astype('float32') 848 | + brightness, 0.0, 255.0) 849 | 850 | img = img.astype('uint8') 851 | 852 | # perform adaptive thresholding 853 | img = cv2.adaptiveThreshold(img, 854 | 255, 855 | cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 856 | cv2.THRESH_BINARY, 857 | 21, 2) 858 | 859 | # perform segmentation on the region of interest 860 | img = img[coordinates_ocr[0][1]:coordinates_ocr[1][1], 861 | coordinates_ocr[0][0]:coordinates_ocr[1][0]] 862 | 863 | # perform multiple erosion 864 | img = multi_ero(img, 2) 865 | 866 | # get text using pytesseract 867 | text = pytesseract.image_to_string(img, lang='eng', 868 | config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789:-') 869 | 870 | # check validity of results 871 | time_format = r'[0-5]\d:[0-5]\d:[0-5]\d' 872 | date_format = r'\d{4}-(?:0\d|1[12])-(?:[0-2]\d|3[01])' 873 | datetime_format = date_format + time_format 874 | text = text.replace(' ', '') 875 | 876 | if len(text) == 20: 877 | text = '2022-02-14' + text[10:] 878 | 879 | try: 880 | timestamp_string = re.sub('(\d{4}-(?:0\d|1[12])-(?:[0-2]\d|3[01]))', 881 | r'\1' + r' ', 882 | re.findall(datetime_format, text)[0]) 883 | except: 884 | return None 885 | 886 | return datetime.strptime(timestamp_string, "%Y-%m-%d %H:%M:%S") 887 | ``` 888 | 889 | 890 | ```python 891 | print(f'correct datetime format percentage: {len(valid)/len(imgs) * 100}') 892 | ``` 893 | 894 | correct datetime format percentage: 70.0 895 | 896 | 897 |

I checked the format and found that 70% of the results are returned in a valid datetime format by the model. In particular, it had difficulty reading the digit eight when it appears next to a six. However, since there are about 15 frames per second, the model gets many chances to read the correct time. Here is a sample of the Pytesseract output converted to a Python datetime object.

898 | 899 | 900 | ```python 901 | get_ocr_datetime(imgs[1]) 902 | ``` 903 | 904 | 905 | datetime.datetime(2022, 2, 14, 10, 0, 7) 906 | 907 | 908 | ### b. Datetime information using fps 909 | 910 |

The camera I am using records at 15 fps, which is one of the standard frame rates for CCTV. Using that information, we count the number of frames an employee is sitting at their station and, for every 15 frames, add one second to their timer, as sketched below.

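Here is a minimal sketch of that frame-count bookkeeping, assuming a fixed 15 fps stream; the per-frame taken/vacant flags are made up for illustration rather than taken from real detections.

```python
# A minimal sketch of converting a count of "station taken" frames into elapsed
# time at an assumed 15 fps (the flags below are illustrative, not real output).
from datetime import timedelta

FPS = 15
frames_seated = 0
seated_time = timedelta(0)

for station_taken in [True] * 47:       # pretend the station was taken for 47 frames
    if station_taken:
        frames_seated += 1
        if frames_seated % FPS == 0:    # every full second's worth of frames...
            seated_time += timedelta(seconds=1)   # ...adds one second to the timer

print(seated_time)   # 0:00:03, since 47 frames at 15 fps amounts to 3 full seconds
```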
911 | 912 |

## vi. Integration of the timer with the work stations

913 | 914 |

### a. OCR

915 | 916 | 917 | ```python 918 | # initialize timer per station 919 | # list definition: 920 | # list[0] : total time in work station 921 | # list[1] : last datetime 922 | # list[2] : debt 923 | timer = {'station_' + str(i): [timedelta(0), None, False] for i in range(1,5)} 924 | 925 | %cd /content/ 926 | cap = cv2.VideoCapture('XVR_ch5_main_20220214100004_20220214110005.mp4') 927 | success,frame = cap.read() 928 | 929 | width = 1600 930 | height = 900 931 | resize = True 932 | img_array =[] 933 | for i in tqdm(range(4500)): 934 | # Capture frame-by-frame 935 | ret, frame = cap.read() 936 | 937 | if i <= 2600: 938 | continue 939 | 940 | detections, width_ratio, height_ratio = darknet_helper(frame, 941 | width, 942 | height) 943 | detections = non_max_suppression_fast(detections, 0.65) 944 | detections_bb = [] 945 | for label, confidence, bbox in detections: 946 | left, top, right, bottom = bbox2points(bbox) 947 | left, top, right, bottom = (int(left * width_ratio), 948 | int(top * height_ratio), 949 | int(right * width_ratio), 950 | int(bottom * height_ratio)) 951 | cv2.rectangle(frame, (left, top), (right, bottom), person_color, 2) 952 | cv2.putText(frame, "{} [{:.2f}]".format(label, float(confidence)), 953 | (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 954 | person_color, 4) 955 | 956 | detections_bb.append({ 957 | 'x1' : left, 958 | 'y1' : top, 959 | 'x2' : right, 960 | 'y2' : bottom 961 | }) 962 | 963 | thresh = 0.3 964 | 965 | for stations, coordinate in coordinates.items(): 966 | taken = False 967 | for detection in detections_bb: 968 | iou = get_iou(coordinate, detection) 969 | if iou >= thresh: 970 | taken = True 971 | break 972 | 973 | if taken or timer[stations][2]: 974 | ocr_time = get_ocr_datetime(frame) 975 | if ocr_time is None: 976 | timer[stations][2] = True 977 | continue 978 | else: 979 | timer[stations][2] = False 980 | if timer[stations][1] is None: 981 | timer[stations][1] = ocr_time 982 | else: 983 | if timer[stations][1] > ocr_time: 984 | # invalid time 985 | timer[stations][2] = True 986 | elif (ocr_time - timer[stations][1]) <= timedelta(seconds=3): 987 | timer[stations][0] += (ocr_time - timer[stations][1]) 988 | timer[stations][1] = ocr_time 989 | else: 990 | # invalid time 991 | timer[stations][2] = True 992 | 993 | color = taken_color if taken else vacant_color 994 | 995 | cv2.rectangle(frame, (coordinate['x1'], coordinate['y1']), 996 | (coordinate['x2'], coordinate['y2']), color, 2) 997 | 998 | cv2.putText(frame, f"{stations} [{str(timer[stations][0])}]", 999 | (coordinate['x1'], coordinate['y1'] - 5), 1000 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1001 | color, 2) 1002 | 1003 | if resize: 1004 | frame = cv2.resize(frame, (width, height), 1005 | interpolation=cv2.INTER_AREA) 1006 | img_array.append(frame) 1007 | 1008 | if cv2.waitKey(1) & 0xFF == ord('q'): 1009 | break 1010 | 1011 | # When everything done, release the capture 1012 | cap.release() 1013 | cv2.destroyAllWindows() 1014 | ``` 1015 | 1016 | ### b. 
FPS 1017 | 1018 | 1019 | ```python 1020 | # initialize timer per station 1021 | # list definition: 1022 | # list[0] : total time in work station 1023 | # list[1] : number of frames in each work station 1024 | timer = {'station_' + str(i): [timedelta(0), 0] for i in range(1,5)} 1025 | 1026 | %cd /content/ 1027 | cap = cv2.VideoCapture('XVR_ch5_main_20220214100004_20220214110005.mp4') 1028 | success,frame = cap.read() 1029 | 1030 | width = 1600 1031 | height = 900 1032 | resize = False 1033 | img_array =[] 1034 | for i in tqdm(range(4300)): 1035 | # Capture frame-by-frame 1036 | ret, frame = cap.read() 1037 | 1038 | if i <= 2600: 1039 | continue 1040 | 1041 | detections, width_ratio, height_ratio = darknet_helper(frame, 1042 | width, 1043 | height) 1044 | detections = non_max_suppression_fast(detections, 0.65) 1045 | detections_bb = [] 1046 | for label, confidence, bbox in detections: 1047 | left, top, right, bottom = bbox2points(bbox) 1048 | left, top, right, bottom = (int(left * width_ratio), 1049 | int(top * height_ratio), 1050 | int(right * width_ratio), 1051 | int(bottom * height_ratio)) 1052 | cv2.rectangle(frame, (left, top), (right, bottom), person_color, 2) 1053 | cv2.putText(frame, "{} [{:.2f}]".format(label, float(confidence)), 1054 | (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1055 | person_color, 4) 1056 | 1057 | detections_bb.append({ 1058 | 'x1' : left, 1059 | 'y1' : top, 1060 | 'x2' : right, 1061 | 'y2' : bottom 1062 | }) 1063 | 1064 | thresh = 0.2 1065 | 1066 | for stations, coordinate in coordinates.items(): 1067 | taken = False 1068 | for detection in detections_bb: 1069 | iou = get_iou(coordinate, detection) 1070 | if iou >= thresh: 1071 | taken = True 1072 | break 1073 | 1074 | if taken: 1075 | timer[stations][1] += 1 1076 | if timer[stations][1] % 15 == 0: 1077 | timer[stations][1] = 0 1078 | timer[stations][0] += timedelta(seconds=1) 1079 | 1080 | 1081 | color = taken_color if taken else vacant_color 1082 | 1083 | cv2.rectangle(frame, (coordinate['x1'], coordinate['y1']), 1084 | (coordinate['x2'], coordinate['y2']), color, 2) 1085 | 1086 | cv2.putText(frame, f"{stations} [{str(timer[stations][0])}]", 1087 | (coordinate['x1'], coordinate['y1'] - 5), 1088 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1089 | color, 2) 1090 | 1091 | count_person = len(detections_bb) 1092 | cv2.rectangle(frame, (23, 26), 1093 | (208, 63), (0,0,0), -1) 1094 | 1095 | cv2.putText(frame, f"Count of Person: {count_person:0>2}", 1096 | (23 + 5,26+ 25), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1097 | (255,255, 255), 2) 1098 | 1099 | if resize: 1100 | frame = cv2.resize(frame, (width, height), 1101 | interpolation=cv2.INTER_AREA) 1102 | img_array.append(frame) 1103 | 1104 | if cv2.waitKey(1) & 0xFF == ord('q'): 1105 | break 1106 | 1107 | # When everything done, release the capture 1108 | cap.release() 1109 | cv2.destroyAllWindows() 1110 | ``` 1111 | 1112 | ## VII. Save frames as Video 1113 | 1114 |

Save the frames as an mp4 video and compress it so that it can be displayed in Google Colab. The demo video will be submitted separately so that it does not blow up the size of the notebook.

1115 | 1116 | 1117 | ```python 1118 | cap = cv2.VideoCapture('resized_cctv_full.mp4') 1119 | 1120 | img_array =[] 1121 | success = True 1122 | while success: 1123 | success,frame = cap.read() 1124 | # Capture frame-by-frame 1125 | img_array.append(frame) 1126 | ``` 1127 | 1128 | 1129 | ```python 1130 | %cd /content/ 1131 | fname = 'resized_cctv.mp4' 1132 | if not resize: 1133 | width = 1920 1134 | height = 1080 1135 | 1136 | if any([True if fname in f else False for f in os.listdir()]): 1137 | !rm resized_cctv.mp4 1138 | 1139 | out = cv2.VideoWriter('/content/resized_cctv.mp4', 1140 | cv2.VideoWriter_fourcc(*'MP4V'), 1141 | 20, (1600, 900)) 1142 | 1143 | for i in tqdm(range(len(img_array))): 1144 | out.write(img_array[i]) 1145 | out.release() 1146 | ``` 1147 | 1148 | 1149 | ```python 1150 | %cd darknet 1151 | 1152 | from IPython.display import HTML 1153 | from base64 import b64encode 1154 | import os 1155 | 1156 | # Input video path 1157 | save_path = "/content/resized_cctv_full.mp4" 1158 | 1159 | # Compressed video path 1160 | compressed_path = "/content/resized_cctv_compressed.mp4" 1161 | 1162 | os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}") 1163 | 1164 | # Show video 1165 | mp4 = open(compressed_path,'rb').read() 1166 | data_url = "data:video/mp4;base64," + b64encode(mp4).decode() 1167 | HTML(""" 1168 | 1171 | """ % data_url) 1172 | 1173 | ``` 1174 | 1175 | 1176 | ```python 1177 | !cp /content/resized_cctv.mp4 /content/drive/MyDrive/MSDS/ML3/final_project/resized_cctv_full_fast.mp4 1178 | ``` 1179 | -------------------------------------------------------------------------------- /md/output_0_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_0_0.png -------------------------------------------------------------------------------- /md/output_13_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_13_0.png -------------------------------------------------------------------------------- /md/output_17_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_17_0.png -------------------------------------------------------------------------------- /md/output_25_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_25_0.png -------------------------------------------------------------------------------- /md/output_47_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_47_0.png -------------------------------------------------------------------------------- /md/output_51_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_51_0.png 
-------------------------------------------------------------------------------- /md/output_55_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_55_0.png -------------------------------------------------------------------------------- /md/output_58_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_58_0.png -------------------------------------------------------------------------------- /md/output_62_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/md/output_62_0.png -------------------------------------------------------------------------------- /outputs/cleaned_timer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/cleaned_timer.png -------------------------------------------------------------------------------- /outputs/custom_yolo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/custom_yolo.png -------------------------------------------------------------------------------- /outputs/header.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/header.png -------------------------------------------------------------------------------- /outputs/iou_threshold.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/iou_threshold.png -------------------------------------------------------------------------------- /outputs/methodology.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/methodology.png -------------------------------------------------------------------------------- /outputs/pre_trained_yolo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/pre_trained_yolo.png -------------------------------------------------------------------------------- /outputs/resized_cctv_full_fast.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/resized_cctv_full_fast.mp4 -------------------------------------------------------------------------------- /outputs/resized_cctv_full_fast_cut.mov: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/resized_cctv_full_fast_cut.mov -------------------------------------------------------------------------------- /outputs/resized_cctv_full_fast_cut.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/resized_cctv_full_fast_cut.mp4 -------------------------------------------------------------------------------- /outputs/sample_frame.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/sample_frame.png -------------------------------------------------------------------------------- /outputs/stations.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pgplarosa/Employee-Monitoring-Using-Object-Detection/84165a72930d785d7489e4a6300d4b4b69812957/outputs/stations.png --------------------------------------------------------------------------------