├── README.md
├── media
│   ├── demo.mp4
│   ├── directory_structure.png
│   ├── fps_fp16.png
│   ├── fps_fp32.png
│   ├── fps_int8.png
│   ├── inference_time_fp16.png
│   ├── inference_time_fp32.png
│   ├── inference_time_int8.png
│   ├── model_loading_time_fp16.png
│   ├── model_loading_time_fp32.png
│   └── model_loading_time_int8.png
├── requirements.txt
└── src
    ├── face_detection.py
    ├── facial_landmarks_detection.py
    ├── gaze_estimation.py
    ├── head_pose_estimation.py
    ├── input_feeder.py
    ├── main.py
    └── mouse_controller.py


/README.md:
--------------------------------------------------------------------------------
  1 | # Computer-Pointer-Controller
  2 | 
  3 | ## Introduction
  4 | Computer Pointer Controller is an app that controls the movement of the mouse pointer using the direction of the user's eyes together with the estimated pose of the head. The app takes a video as input, estimates the eye direction and head pose in each frame, and moves the mouse pointer based on those estimates.
  5 | 
  6 | ## Demo video
  7 | [![Demo video](https://img.youtube.com/vi/qR9rQQ4wiMQ/0.jpg)](https://www.youtube.com/watch?v=qR9rQQ4wiMQ)
  8 | 
  9 | ## Project Set Up and Installation
 10 | 
 11 | ### Setup
 12 | 
 13 | #### Prerequisites
 14 |   - OpenVINO must be installed successfully. <br/>
 15 |   See this [guide](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html) for installing OpenVINO.
 16 | 
 17 | #### Step 1
 18 | Clone the repository:- https://github.com/denilDG/Computer-Pointer-Controller
 19 | 
 20 | #### Step 2
 21 | Initialize the OpenVINO environment:-
 22 | ```
 23 | source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5
 24 | ```
 25 | 
 26 | #### Step 3
 27 | 
 28 | Download the following models by using the OpenVINO model downloader:-
 29 | 
 30 | **1. Face Detection Model**
 31 | ```
 32 | python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "face-detection-adas-binary-0001"
 33 | ```
 34 | **2. Facial Landmarks Detection Model**
 35 | ```
 36 | python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "landmarks-regression-retail-0009"
 37 | ```
 38 | **3. Head Pose Estimation Model**
 39 | ```
 40 | python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "head-pose-estimation-adas-0001"
 41 | ```
 42 | **4. Gaze Estimation Model**
 43 | ```
 44 | python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "gaze-estimation-adas-0002"
 45 | ```
 46 | 
 47 | ## Demo
 48 | 
 49 | Open a new terminal and run the following commands:-
 50 | 
 51 | **1. Change the directory to src directory of project repository**
 52 | ```
 53 | cd <project-repo-path>/src
 54 | ```
 55 | **2. Run the main.py file**
 56 | ```
 57 | python main.py -f <Path of xml file of face detection model> \
 58 | -fl <Path of xml file of facial landmarks detection model> \
 59 | -hp <Path of xml file of head pose estimation model> \
 60 | -g <Path of xml file of gaze estimation model> \
 61 | -i <Path of input video file or enter cam for taking input video from webcam> 
 62 | ```
 63 | 
 64 | - To run the app on a GPU:-
 65 | ```
 66 | python main.py -f <Path of xml file of face detection model> \
 67 | -fl <Path of xml file of facial landmarks detection model> \
 68 | -hp <Path of xml file of head pose estimation model> \
 69 | -g <Path of xml file of gaze estimation model> \
 70 | -i <Path of input video file or enter cam for taking input video from webcam> \
 71 | -d GPU
 72 | ```
 73 | - To run the app on an FPGA:-
 74 | ```
 75 | python main.py -f <Path of xml file of face detection model> \
 76 | -fl <Path of xml file of facial landmarks detection model> \
 77 | -hp <Path of xml file of head pose estimation model> \
 78 | -g <Path of xml file of gaze estimation model> \
 79 | -i <Path of input video file or enter cam for taking input video from webcam> \
 80 | -d HETERO:FPGA,CPU
 81 | ```
 82 | 
 83 | ## Documentation
 84 | 
 85 | ### Documentation of the models used
 86 | 
 87 | 1. [Face Detection Model](https://docs.openvinotoolkit.org/latest/_models_intel_face_detection_adas_binary_0001_description_face_detection_adas_binary_0001.html)
 88 | 2. [Facial Landmarks Detection Model](https://docs.openvinotoolkit.org/latest/_models_intel_landmarks_regression_retail_0009_description_landmarks_regression_retail_0009.html)
 89 | 3. [Head Pose Estimation Model](https://docs.openvinotoolkit.org/latest/_models_intel_head_pose_estimation_adas_0001_description_head_pose_estimation_adas_0001.html)
 90 | 4. [Gaze Estimation Model](https://docs.openvinotoolkit.org/latest/_models_intel_gaze_estimation_adas_0002_description_gaze_estimation_adas_0002.html)
 91 | 
 92 | ### Command Line Arguments for Running the app
 93 | 
 94 | The following command-line arguments can be used when running the main.py file ` python main.py ` (run it with `-h` to print information about all of them):-
 95 | 
 96 |   1. -f     (required) : Specify the path of the Face Detection model's xml file
 97 |   2. -fl    (required) : Specify the path of the Facial Landmarks Detection model's xml file
 98 |   3. -hp    (required) : Specify the path of the Head Pose Estimation model's xml file
 99 |   4. -g     (required) : Specify the path of the Gaze Estimation model's xml file
100 |   5. -i     (required) : Specify the path of the input video file, or enter cam to take the input video from the webcam
101 |   6. -d     (optional) : Specify the target device on which to run inference. Supported devices are: CPU, GPU, FPGA (for FPGA use HETERO:FPGA,CPU), MYRIAD.
102 |   7. -l     (optional) : Specify the absolute path of the CPU extension if some layers of the models are not supported on the device.
103 |   8. -prob  (optional) : Specify the probability threshold the face detection model uses to accept a detected face in the video frame.
104 |   9. -flags (optional) : Specify one or more flags from fd, fld, hp, ge to visualize the output of the corresponding models on each frame (separate flags with spaces, e.g. -flags fd fld hp).
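Since main.py is not shown here, the interface listed above can be sketched with `argparse` as follows. The long option names (`--face_detection_model` and so on), the default device `CPU` and the default threshold `0.6` are assumptions for illustration, not values taken from the project.

```python
import argparse


def build_argparser():
    """Hypothetical reconstruction of main.py's CLI; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Computer Pointer Controller")
    parser.add_argument("-f", "--face_detection_model", required=True,
                        help="Path of the Face Detection model's xml file")
    parser.add_argument("-fl", "--facial_landmarks_model", required=True,
                        help="Path of the Facial Landmarks Detection model's xml file")
    parser.add_argument("-hp", "--head_pose_model", required=True,
                        help="Path of the Head Pose Estimation model's xml file")
    parser.add_argument("-g", "--gaze_estimation_model", required=True,
                        help="Path of the Gaze Estimation model's xml file")
    parser.add_argument("-i", "--input", required=True,
                        help="Path of the input video file, or 'cam' for the webcam")
    parser.add_argument("-d", "--device", default="CPU",  # assumed default
                        help="CPU, GPU, HETERO:FPGA,CPU or MYRIAD")
    parser.add_argument("-l", "--cpu_extension", default=None,
                        help="Absolute path of the CPU extension library")
    parser.add_argument("-prob", "--prob_threshold", type=float, default=0.6,  # assumed default
                        help="Probability threshold for face detection")
    parser.add_argument("-flags", "--visualization_flags", nargs="+", default=[],
                        choices=["fd", "fld", "hp", "ge"],
                        help="Models whose output should be drawn on each frame")
    return parser


if __name__ == "__main__":
    args = build_argparser().parse_args()
    print(args)
```

argparse matches an option string exactly before trying anything else, so the short `-f` and the longer `-fl`/`-flags` (and `-h`/`-hp`) can coexist.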
105 | 
106 | ### Directory Structure of the project
107 | 
108 | ![directory_structure_img](media/directory_structure.png)
109 | 
110 | - The src folder contains all the source files:-
111 |   1. face_detection.py
112 |      - Preprocesses the video frame, runs inference on it to detect the face, and postprocesses the outputs.
113 |
114 |   2. facial_landmarks_detection.py
115 |      - Takes the detected face as input, preprocesses it, runs inference on it to detect the eye landmarks, and postprocesses the outputs.
116 |
117 |   3. head_pose_estimation.py
118 |      - Takes the detected face as input, preprocesses it, runs inference on it to estimate the head pose by predicting the yaw, pitch and roll angles, and postprocesses the outputs.
119 |
120 |   4. gaze_estimation.py
121 |      - Takes the left eye image, right eye image and head pose angles as inputs, preprocesses them, runs inference to predict the gaze vector, and postprocesses the outputs.
122 |
123 |   5. input_feeder.py
124 |      - Contains the InputFeeder class, which initializes VideoCapture according to the user's argument and returns the frames one by one.
125 |
126 |   6. mouse_controller.py
127 |      - Contains the MouseController class, which takes the x, y coordinate values, speed and precision, and moves the mouse pointer accordingly using the pyautogui library.
128 |   7. main.py
129 |      - Entry point of the app; run this file to start the app.
130 |
131 | - The media folder contains a demo video that can be used to test the app.
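In gaze_estimation.py, the postprocessing step rotates the x and y components of the predicted gaze vector by the head's roll angle (`hpa[2]`, the `angle_r_fc` output of the head pose model), which appears intended to keep a tilted head from skewing the pointer direction. A standalone sketch of that same rotation (the function name `compensate_roll` is hypothetical):

```python
import math


def compensate_roll(gaze_x, gaze_y, roll_degrees):
    """Rotate the (x, y) part of a gaze vector by the head roll angle.

    Same math as GazeEstimationModel.preprocess_output: a 2D rotation by
    -roll; the z component of the gaze vector is not used for the pointer.
    """
    roll = math.radians(roll_degrees)
    cos_r, sin_r = math.cos(roll), math.sin(roll)
    new_x = gaze_x * cos_r + gaze_y * sin_r
    new_y = -gaze_x * sin_r + gaze_y * cos_r
    return new_x, new_y


# With zero roll the vector is unchanged; a rotation never changes its length.
print(compensate_roll(0.3, -0.2, 0.0))
print(compensate_roll(0.6, 0.8, 37.0))
```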
132 | 
133 | 
134 | 
135 | ## Benchmarks
136 | Benchmark results of the models for each precision.
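The README does not say how the three metrics in the charts were collected. One common approach, sketched here under that assumption, times model loading once and then divides the number of processed frames by the total inference time. `load_model` and `infer` are hypothetical stand-ins for, e.g., a model class's `load_model()` and `predict()` methods.

```python
import time


def benchmark(load_model, infer, frames):
    """Measure model loading time, average inference time and FPS.

    FPS here counts inference time only, not video decoding or drawing.
    """
    start = time.perf_counter()
    load_model()
    loading_time = time.perf_counter() - start

    count = 0
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
        count += 1
    total_inference_time = time.perf_counter() - start

    return {
        "model_loading_time_s": loading_time,
        "avg_inference_time_s": total_inference_time / count if count else 0.0,
        "fps": count / total_inference_time if total_inference_time > 0 else 0.0,
    }


# Dummy callables standing in for a real model.
print(benchmark(lambda: time.sleep(0.01), lambda frame: time.sleep(0.001), range(10)))
```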
137 | 
138 | ### FP32
139 | 
140 | **Inference Time** <br/> 
141 | ![inference_time_fp32_image](media/inference_time_fp32.png "Inference Time")
142 | 
143 | **Frames per Second** <br/> 
144 | ![fps_fp32_image](media/fps_fp32.png "Frames per Second")
145 | 
146 | **Model Loading Time** <br/> 
147 | ![model_loading_time_fp32_image](media/model_loading_time_fp32.png "Model Loading Time")
148 | 
149 | ### FP16
150 | 
151 | **Inference Time** <br/> 
152 | ![inference_time_fp16_image](media/inference_time_fp16.png "Inference Time")
153 | 
154 | **Frames per Second** <br/> 
155 | ![fps_fp16_image](media/fps_fp16.png "Frames per Second")
156 | 
157 | **Model Loading Time** <br/> 
158 | ![model_loading_time_fp16_image](media/model_loading_time_fp16.png "Model Loading Time")
159 | 
160 | ### INT8
161 | **Inference Time** <br/> 
162 | ![inference_time_int8_image](media/inference_time_int8.png "Inference Time")
163 | 
164 | **Frames per Second** <br/> 
165 | ![fps_int8_image](media/fps_int8.png "Frames per Second")
166 | 
167 | **Model Loading Time** <br/> 
168 | ![model_loading_time_int8_image](media/model_loading_time_int8.png "Model Loading Time")
169 | 
170 | ## Results
171 | I ran the models on 5 different hardware platforms:-
172 | 1. Intel Core i5-6500TE CPU 
173 | 2. Intel Core i5-6500TE GPU 
174 | 3. IEI Mustang F100-A10 FPGA 
175 | 4. Intel Xeon E3-1268L v5 CPU 
176 | 5. Intel Atom x7-E3950 UP2 GPU
177 | 
178 | I also compared their performance in terms of inference time, frames per second and model loading time.
179 | 
180 | As the graphs above show, the FPGA took more time for inference than the other devices, because each gate of the FPGA has to be programmed to be compatible with this application. This takes time, but the FPGA has advantages such as:-
181 | - It is flexible: unlike the other hardware, it can be reprogrammed to meet specific requirements.
182 | - It also has a longer lifespan.
183 | 
184 | The GPU processed more frames per second than any other hardware, especially when the model precision is FP16, because the GPU has several execution units whose instruction sets are optimized for 16-bit floating-point data types.
185 | 
186 | - We ran the models at different precisions, and precision affects accuracy. Lowering the precision from FP32 to FP16 or INT8 reduces the model size and makes inference faster, but the model can lose some important information, which can decrease its accuracy.
187 | 
188 | - So a lower-precision model can give lower accuracy than a higher-precision model.
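To make the size/accuracy trade-off concrete, here is a small NumPy sketch. The random array is a stand-in for real model weights, and the plain FP16 cast and simple symmetric INT8 quantization shown are illustrative only; they are not how the OpenVINO Model Optimizer or its post-training quantization tools actually produce FP16/INT8 models.

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)  # stand-in weights

# FP16: a direct cast halves storage but rounds each value.
weights_fp16 = weights_fp32.astype(np.float16)

# INT8: symmetric linear quantization with one scale for the whole tensor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
dequantized = weights_int8.astype(np.float32) * scale

# Worst-case absolute error introduced by each lower precision.
fp16_error = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
int8_error = np.abs(weights_fp32 - dequantized).max()

print(weights_fp32.nbytes, weights_fp16.nbytes, weights_int8.nbytes)  # 40000 20000 10000
print(fp16_error, int8_error)
```

Each halving of the bit width halves the storage, while the rounding error grows, which is the information loss described above.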
189 | 
190 | ## Stand Out Suggestions
191 | 
192 | ### Edge Cases
193 | 
194 | 1. If for some reason the model cannot detect a face, the app prints that it is unable to detect the face and reads the next frame, until it detects a face or the user closes the window.
195 | 
196 | 2. If more than one face is detected in the frame, the model uses the first detected face to control the mouse pointer.
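Both edge cases can be sketched as one pure function over the face detection output. This assumes the `[1, 1, N, 7]` layout that `preprocess_output` in face_detection.py indexes into, where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with normalized coordinates. The name `select_face` is hypothetical, returning `None` stands in for the "unable to detect the face" branch, and clamping the box to the frame is an addition not present in the original description.

```python
import numpy as np


def select_face(detections, prob_threshold, frame_w, frame_h):
    """Return pixel (x_min, y_min, x_max, y_max) of the first face above the
    threshold, or None if no face qualifies (the caller skips the frame)."""
    for det in detections[0][0]:
        if det[2] > prob_threshold:
            # Clamp normalized coords to [0, 1] so the crop stays inside the frame.
            box = np.clip(det[3:7], 0.0, 1.0) * np.array([frame_w, frame_h, frame_w, frame_h])
            return tuple(int(v) for v in box)
    return None


blob = np.zeros((1, 1, 3, 7), dtype=np.float32)
blob[0, 0, 0] = [0, 1, 0.20, 0.10, 0.10, 0.30, 0.30]  # below threshold
blob[0, 0, 1] = [0, 1, 0.90, 0.25, 0.50, 0.75, 1.00]  # first confident face
blob[0, 0, 2] = [0, 1, 0.95, 0.00, 0.00, 0.50, 0.50]  # ignored: not the first
print(select_face(blob, 0.6, 640, 480))  # (160, 240, 480, 480)
```

In a frame loop, `if face is None: continue` would reproduce the first edge case.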
197 | 
198 | 
199 | 


--------------------------------------------------------------------------------
/media/demo.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/demo.mp4


--------------------------------------------------------------------------------
/media/directory_structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/directory_structure.png


--------------------------------------------------------------------------------
/media/fps_fp16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/fps_fp16.png


--------------------------------------------------------------------------------
/media/fps_fp32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/fps_fp32.png


--------------------------------------------------------------------------------
/media/fps_int8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/fps_int8.png


--------------------------------------------------------------------------------
/media/inference_time_fp16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/inference_time_fp16.png


--------------------------------------------------------------------------------
/media/inference_time_fp32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/inference_time_fp32.png


--------------------------------------------------------------------------------
/media/inference_time_int8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/inference_time_int8.png


--------------------------------------------------------------------------------
/media/model_loading_time_fp16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/model_loading_time_fp16.png


--------------------------------------------------------------------------------
/media/model_loading_time_fp32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/model_loading_time_fp32.png


--------------------------------------------------------------------------------
/media/model_loading_time_int8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/denilgabani/Computer-Pointer-Controller/4b5764912edb4ae337b92f01b38a36d44e001afc/media/model_loading_time_int8.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | image==1.5.27
 2 | ipdb==0.12.3
 3 | ipython==7.10.2
 4 | numpy==1.17.4
 5 | Pillow==6.2.1
 6 | requests==2.22.0
 7 | virtualenv==16.7.9
 8 | PyAutoGUI==0.9.50
 9 | 
10 | 


--------------------------------------------------------------------------------
/src/face_detection.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | This is a sample class for a model. You may choose to use it as-is or make any changes to it.
  3 | This has been provided just to give you an idea of how to structure your model class.
  4 | '''
  5 | import cv2
  6 | import numpy as np
  7 | from openvino.inference_engine import IECore
  8 | 
  9 | class FaceDetectionModel:
 10 |     '''
 11 |     Class for the Face Detection Model.
 12 |     '''
 13 |     def __init__(self, model_name, device='CPU', extensions=None):
 14 |         '''
 15 |         TODO: Use this to set your instance variables.
 16 |         '''
 17 |         self.model_name = model_name
 18 |         self.device = device
 19 |         self.extensions = extensions
 20 |         self.model_structure = self.model_name
 21 |         self.model_weights = self.model_name.rsplit('.', 1)[0]+'.bin'  # rsplit: dots in directory names (e.g. ./models) must not truncate the path
 22 |         self.plugin = None
 23 |         self.network = None
 24 |         self.exec_net = None
 25 |         self.input_name = None
 26 |         self.input_shape = None
 27 |         self.output_names = None
 28 |         self.output_shape = None
 29 | 
 30 |     def load_model(self):
 31 |         '''
 32 |         TODO: You will need to complete this method.
 33 |         This method is for loading the model to the device specified by the user.
 34 |         If your model requires any Plugins, this is where you can load them.
 35 |         '''
 36 |         self.plugin = IECore()
 37 |         self.network = self.plugin.read_network(model=self.model_structure, weights=self.model_weights)
 38 |         supported_layers = self.plugin.query_network(network=self.network, device_name=self.device)
 39 |         unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 40 |         
 41 |         
 42 |         if len(unsupported_layers)!=0 and self.device=='CPU':
 43 |             print("unsupported layers found:{}".format(unsupported_layers))
 44 |             if self.extensions is not None:
 45 |                 print("Adding cpu_extension")
 46 |                 self.plugin.add_extension(self.extensions, self.device)
 47 |                 supported_layers = self.plugin.query_network(network = self.network, device_name=self.device)
 48 |                 unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 49 |                 if len(unsupported_layers)!=0:
 50 |                     print("After adding the extension still unsupported layers found")
 51 |                     exit(1)
 52 |                 print("After adding the extension the issue is resolved")
 53 |             else:
 54 |                 print("Give the path of cpu extension")
 55 |                 exit(1)
 56 |                 
 57 |         self.exec_net = self.plugin.load_network(network=self.network, device_name=self.device,num_requests=1)
 58 |         
 59 |         self.input_name = next(iter(self.network.inputs))
 60 |         self.input_shape = self.network.inputs[self.input_name].shape
 61 |         self.output_names = next(iter(self.network.outputs))
 62 |         self.output_shape = self.network.outputs[self.output_names].shape
 63 |         
 64 |     def predict(self, image, prob_threshold):
 65 |         '''
 66 |         TODO: You will need to complete this method.
 67 |         This method is meant for running predictions on the input image.
 68 |         '''
 69 |         
 70 |         img_processed = self.preprocess_input(image.copy())
 71 |         outputs = self.exec_net.infer({self.input_name:img_processed})
 72 |         coords = self.preprocess_output(outputs, prob_threshold)
 73 |         if (len(coords)==0):
 74 |             return 0, 0
 75 |         coords = coords[0] #take the first detected face
 76 |         h=image.shape[0]
 77 |         w=image.shape[1]
 78 |         coords = np.clip(coords, 0, 1) * np.array([w, h, w, h])  # clamp boxes that extend past the frame edge
 79 |         coords = coords.astype(np.int32)
 80 |         
 81 |         cropped_face = image[coords[1]:coords[3], coords[0]:coords[2]]
 82 |         return cropped_face, coords
 83 | 
 84 |     def check_model(self):
 85 |         ''
 86 | 
 87 |     def preprocess_input(self, image):
 88 |         '''
 89 |         Before feeding the data into the model for inference,
 90 |         you might have to preprocess it. This function is where you can do that.
 91 |         '''
 92 |         image_resized = cv2.resize(image, (self.input_shape[3], self.input_shape[2]))
 93 |         img_processed = np.transpose(np.expand_dims(image_resized,axis=0), (0,3,1,2))
 94 |         return img_processed
 95 |             
 96 | 
 97 |     def preprocess_output(self, outputs, prob_threshold):
 98 |         '''
 99 |         Before feeding the output of this model to the next model,
100 |         you might have to preprocess the output. This function is where you can do that.
101 |         '''
102 |         coords =[]
103 |         outs = outputs[self.output_names][0][0]
104 |         for out in outs:
105 |             conf = out[2]
106 |             if conf>prob_threshold:
107 |                 x_min=out[3]
108 |                 y_min=out[4]
109 |                 x_max=out[5]
110 |                 y_max=out[6]
111 |                 coords.append([x_min,y_min,x_max,y_max])
112 |         return coords
113 |         
114 | 
115 | 


--------------------------------------------------------------------------------
/src/facial_landmarks_detection.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | This is a sample class for a model. You may choose to use it as-is or make any changes to it.
  3 | This has been provided just to give you an idea of how to structure your model class.
  4 | '''
  5 | import cv2
  6 | import numpy as np
  7 | from openvino.inference_engine import IECore
  8 | 
  9 | class FacialLandmarksDetectionModel:
 10 |     '''
 11 |     Class for the Facial Landmarks Detection Model.
 12 |     '''
 13 |     def __init__(self, model_name, device='CPU', extensions=None):
 14 |         '''
 15 |         TODO: Use this to set your instance variables.
 16 |         '''
 17 |         self.model_name = model_name
 18 |         self.device = device
 19 |         self.extensions = extensions
 20 |         self.model_structure = self.model_name
 21 |         self.model_weights = self.model_name.rsplit(".", 1)[0]+'.bin'  # rsplit: dots in directory names (e.g. ./models) must not truncate the path
 22 |         self.plugin = None
 23 |         self.network = None
 24 |         self.exec_net = None
 25 |         self.input_name = None
 26 |         self.input_shape = None
 27 |         self.output_names = None
 28 |         self.output_shape = None
 29 | 
 30 |     def load_model(self):
 31 |         '''
 32 |         TODO: You will need to complete this method.
 33 |         This method is for loading the model to the device specified by the user.
 34 |         If your model requires any Plugins, this is where you can load them.
 35 |         '''
 36 |         self.plugin = IECore()
 37 |         self.network = self.plugin.read_network(model=self.model_structure, weights=self.model_weights)
 38 |         supported_layers = self.plugin.query_network(network=self.network, device_name=self.device)
 39 |         unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 40 |         
 41 |         
 42 |         if len(unsupported_layers)!=0 and self.device=='CPU':
 43 |             print("unsupported layers found:{}".format(unsupported_layers))
 44 |             if self.extensions is not None:
 45 |                 print("Adding cpu_extension")
 46 |                 self.plugin.add_extension(self.extensions, self.device)
 47 |                 supported_layers = self.plugin.query_network(network = self.network, device_name=self.device)
 48 |                 unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 49 |                 if len(unsupported_layers)!=0:
 50 |                     print("After adding the extension still unsupported layers found")
 51 |                     exit(1)
 52 |                 print("After adding the extension the issue is resolved")
 53 |             else:
 54 |                 print("Give the path of cpu extension")
 55 |                 exit(1)
 56 |                 
 57 |         self.exec_net = self.plugin.load_network(network=self.network, device_name=self.device,num_requests=1)
 58 |         
 59 |         self.input_name = next(iter(self.network.inputs))
 60 |         self.input_shape = self.network.inputs[self.input_name].shape
 61 |         self.output_names = next(iter(self.network.outputs))
 62 |         self.output_shape = self.network.outputs[self.output_names].shape
 63 |         
 64 |     def predict(self, image):
 65 |         '''
 66 |         TODO: You will need to complete this method.
 67 |         This method is meant for running predictions on the input image.
 68 |         '''
 69 |         img_processed = self.preprocess_input(image.copy())
 70 |         outputs = self.exec_net.infer({self.input_name:img_processed})
 71 |         coords = self.preprocess_output(outputs)
 72 |         h=image.shape[0]
 73 |         w=image.shape[1]
 74 |         coords = coords* np.array([w, h, w, h])
 75 |         coords = coords.astype(np.int32) #(lefteye_x, lefteye_y, righteye_x, righteye_y)
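 76 |         le_xmin=max(coords[0]-10, 0)  # clamp so a landmark near the edge never yields a negative (wrapping) slice index
 77 |         le_ymin=max(coords[1]-10, 0)
 78 |         le_xmax=coords[0]+10
 79 |         le_ymax=coords[1]+10
 80 |         
 81 |         re_xmin=max(coords[2]-10, 0)
 82 |         re_ymin=max(coords[3]-10, 0)
 83 |         re_xmax=coords[2]+10
 84 |         re_ymax=coords[3]+10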
 85 |         #cv2.rectangle(image,(le_xmin,le_ymin),(le_xmax,le_ymax),(255,0,0))
 86 |         #cv2.rectangle(image,(re_xmin,re_ymin),(re_xmax,re_ymax),(255,0,0))
 87 |         #cv2.imshow("Image",image)
 88 |         left_eye =  image[le_ymin:le_ymax, le_xmin:le_xmax]
 89 |         right_eye = image[re_ymin:re_ymax, re_xmin:re_xmax]
 90 |         eye_coords = [[le_xmin,le_ymin,le_xmax,le_ymax], [re_xmin,re_ymin,re_xmax,re_ymax]]
 91 |         return left_eye, right_eye, eye_coords
 92 |         
 93 |     def check_model(self):
 94 |         ''
 95 | 
 96 |     def preprocess_input(self, image):
 97 |         '''
 98 |         Before feeding the data into the model for inference,
 99 |         you might have to preprocess it. This function is where you can do that.
100 |         '''
101 |         image_cvt = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
102 |         image_resized = cv2.resize(image_cvt, (self.input_shape[3], self.input_shape[2]))
103 |         img_processed = np.transpose(np.expand_dims(image_resized,axis=0), (0,3,1,2))
104 |         return img_processed
105 |             
106 | 
107 |     def preprocess_output(self, outputs):
108 |         '''
109 |         Before feeding the output of this model to the next model,
110 |         you might have to preprocess the output. This function is where you can do that.
111 |         '''
112 | 
113 |         outs = outputs[self.output_names][0]
114 |         leye_x = outs[0].tolist()[0][0]
115 |         leye_y = outs[1].tolist()[0][0]
116 |         reye_x = outs[2].tolist()[0][0]
117 |         reye_y = outs[3].tolist()[0][0]
118 |         
119 |         return (leye_x, leye_y, reye_x, reye_y)
120 | 


--------------------------------------------------------------------------------
/src/gaze_estimation.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | This is a sample class for a model. You may choose to use it as-is or make any changes to it.
  3 | This has been provided just to give you an idea of how to structure your model class.
  4 | '''
  5 | import cv2
  6 | import numpy as np
  7 | from openvino.inference_engine import IECore
  8 | import math
  9 | 
 10 | class GazeEstimationModel:
 11 |     '''
 12 |     Class for the Gaze Estimation Model.
 13 |     '''
 14 |     def __init__(self, model_name, device='CPU', extensions=None):
 15 |         '''
 16 |         TODO: Use this to set your instance variables.
 17 |         '''
 18 |         self.model_name = model_name
 19 |         self.device = device
 20 |         self.extensions = extensions
 21 |         self.model_structure = self.model_name
 22 |         self.model_weights = self.model_name.rsplit(".", 1)[0]+'.bin'  # rsplit: dots in directory names (e.g. ./models) must not truncate the path
 23 |         self.plugin = None
 24 |         self.network = None
 25 |         self.exec_net = None
 26 |         self.input_name = None
 27 |         self.input_shape = None
 28 |         self.output_names = None
 29 |         self.output_shape = None
 30 | 
 31 |     def load_model(self):
 32 |         '''
 33 |         TODO: You will need to complete this method.
 34 |         This method is for loading the model to the device specified by the user.
 35 |         If your model requires any Plugins, this is where you can load them.
 36 |         '''
 37 |         self.plugin = IECore()
 38 |         self.network = self.plugin.read_network(model=self.model_structure, weights=self.model_weights)
 39 |         supported_layers = self.plugin.query_network(network=self.network, device_name=self.device)
 40 |         unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 41 |         
 42 |         
 43 |         if len(unsupported_layers)!=0 and self.device=='CPU':
 44 |             print("unsupported layers found:{}".format(unsupported_layers))
 45 |             if self.extensions is not None:
 46 |                 print("Adding cpu_extension")
 47 |                 self.plugin.add_extension(self.extensions, self.device)
 48 |                 supported_layers = self.plugin.query_network(network = self.network, device_name=self.device)
 49 |                 unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
 50 |                 if len(unsupported_layers)!=0:
 51 |                     print("After adding the extension still unsupported layers found")
 52 |                     exit(1)
 53 |                 print("After adding the extension the issue is resolved")
 54 |             else:
 55 |                 print("Give the path of cpu extension")
 56 |                 exit(1)
 57 |                 
 58 |         self.exec_net = self.plugin.load_network(network=self.network, device_name=self.device,num_requests=1)
 59 |         
 60 |         self.input_name = [i for i in self.network.inputs.keys()]
 61 |         self.input_shape = self.network.inputs[self.input_name[1]].shape
 62 |         self.output_names = [i for i in self.network.outputs.keys()]
 63 | 
 64 |         
 65 |     def predict(self, left_eye_image, right_eye_image, hpa):
 66 |         '''
 67 |         TODO: You will need to complete this method.
 68 |         This method is meant for running predictions on the input image.
 69 |         '''
 70 |         le_img_processed, re_img_processed = self.preprocess_input(left_eye_image.copy(), right_eye_image.copy())
 71 |         outputs = self.exec_net.infer({'head_pose_angles':hpa, 'left_eye_image':le_img_processed, 'right_eye_image':re_img_processed})
 72 |         new_mouse_coord, gaze_vector = self.preprocess_output(outputs,hpa)
 73 | 
 74 |         return new_mouse_coord, gaze_vector
 75 | 
 76 |     def check_model(self):
 77 |         ''
 78 | 
 79 |     def preprocess_input(self, left_eye, right_eye):
 80 |         '''
 81 |         Before feeding the data into the model for inference,
 82 |         you might have to preprocess it. This function is where you can do that.
 83 |         '''
 84 |         le_image_resized = cv2.resize(left_eye, (self.input_shape[3], self.input_shape[2]))
 85 |         re_image_resized = cv2.resize(right_eye, (self.input_shape[3], self.input_shape[2]))
 86 |         le_img_processed = np.transpose(np.expand_dims(le_image_resized,axis=0), (0,3,1,2))
 87 |         re_img_processed = np.transpose(np.expand_dims(re_image_resized,axis=0), (0,3,1,2))
 88 |         return le_img_processed, re_img_processed
 89 |             
 90 | 
 91 |     def preprocess_output(self, outputs,hpa):
 92 |         '''
 93 |         Before feeding the output of this model to the next model,
 94 |         you might have to preprocess the output. This function is where you can do that.
 95 |         '''
 96 |         
 97 |         gaze_vector = outputs[self.output_names[0]].tolist()[0]
 98 |         #gaze_vector = gaze_vector / cv2.norm(gaze_vector)
 99 |         rollValue = hpa[2] #angle_r_fc output from HeadPoseEstimation model
100 |         cosValue = math.cos(rollValue * math.pi / 180.0)
101 |         sinValue = math.sin(rollValue * math.pi / 180.0)
102 |         
103 |         newx = gaze_vector[0] * cosValue + gaze_vector[1] * sinValue
104 |         newy = -gaze_vector[0] *  sinValue+ gaze_vector[1] * cosValue
105 |         return (newx,newy), gaze_vector
106 |         
107 |         
108 | 


--------------------------------------------------------------------------------
/src/head_pose_estimation.py:
--------------------------------------------------------------------------------
 1 | '''
 2 | Head Pose Estimation model wrapper: loads the OpenVINO IR model and returns
 3 | the head pose angles (yaw, pitch, roll), in degrees, for a cropped face image.
 4 | '''
 5 | import cv2
 6 | import numpy as np
 7 | from openvino.inference_engine import IECore
 8 | 
 9 | class HeadPoseEstimationModel:
10 |     '''
11 |     Class for the Head Pose Estimation Model.
12 |     '''
13 |     def __init__(self, model_name, device='CPU', extensions=None):
14 |         '''
15 |         Set the instance variables. The .bin weights file is expected next to the .xml file, with the same base name.
16 |         '''
17 |         self.model_name = model_name
18 |         self.device = device
19 |         self.extensions = extensions
20 |         self.model_structure = self.model_name
21 |         self.model_weights = self.model_name.rsplit(".", 1)[0] + '.bin'
22 |         self.plugin = None
23 |         self.network = None
24 |         self.exec_net = None
25 |         self.input_name = None
26 |         self.input_shape = None
27 |         self.output_names = None
28 | 
29 |     def load_model(self):
30 |         '''
31 |         Load the network onto the device specified by the user. On CPU, if some
32 |         layers are unsupported, the CPU extension (if provided) is added and the
33 |         network is checked again; the program exits if layers remain unsupported.
34 |         '''
35 |         self.plugin = IECore()
36 |         self.network = self.plugin.read_network(model=self.model_structure, weights=self.model_weights)
37 |         supported_layers = self.plugin.query_network(network=self.network, device_name=self.device)
38 |         unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
39 |         
40 |         
41 |         if len(unsupported_layers)!=0 and self.device=='CPU':
42 |             print("Unsupported layers found: {}".format(unsupported_layers))
43 |             if self.extensions is not None:
44 |                 print("Adding CPU extension")
45 |                 self.plugin.add_extension(self.extensions, self.device)
46 |                 supported_layers = self.plugin.query_network(network=self.network, device_name=self.device)
47 |                 unsupported_layers = [l for l in self.network.layers.keys() if l not in supported_layers]
48 |                 if len(unsupported_layers) != 0:
49 |                     print("Unsupported layers remain after adding the CPU extension")
50 |                     exit(1)
51 |                 print("All layers are supported after adding the CPU extension")
52 |             else:
53 |                 print("Please provide the path to the CPU extension (-l/--cpu_extension)")
54 |                 exit(1)
55 |                 
56 |         self.exec_net = self.plugin.load_network(network=self.network, device_name=self.device,num_requests=1)
57 |         
58 |         self.input_name = next(iter(self.network.inputs))
59 |         self.input_shape = self.network.inputs[self.input_name].shape
60 |         self.output_names = [i for i in self.network.outputs.keys()]
61 |         
62 |     def predict(self, image):
63 |         '''
64 |         Run head pose inference on a cropped face image (BGR, HWC) and return
65 |         the angles as a list [yaw, pitch, roll] in degrees.
66 |         '''
67 |         img_processed = self.preprocess_input(image.copy())
68 |         outputs = self.exec_net.infer({self.input_name:img_processed})
69 |         finalOutput = self.preprocess_output(outputs)
70 |         return finalOutput
71 |         
72 | 
73 |     def check_model(self):
74 |         ''
75 | 
76 |     def preprocess_input(self, image):
77 |         '''
78 |         Resize the face crop to the model input size (W, H) and convert it
79 |         from HWC to NCHW layout with a batch dimension of 1.
80 |         '''
81 |         image_resized = cv2.resize(image, (self.input_shape[3], self.input_shape[2]))
82 |         img_processed = np.transpose(np.expand_dims(image_resized,axis=0), (0,3,1,2))
83 |         return img_processed
84 |             
85 | 
86 |     def preprocess_output(self, outputs):
87 |         '''
88 |         Extract the yaw, pitch and roll angles from the angle_y_fc, angle_p_fc
89 |         and angle_r_fc output blobs and return them as a list.
90 |         '''
91 |         outs = []
92 |         outs.append(outputs['angle_y_fc'].tolist()[0][0])
93 |         outs.append(outputs['angle_p_fc'].tolist()[0][0])
94 |         outs.append(outputs['angle_r_fc'].tolist()[0][0])
95 |         return outs
96 | 
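`preprocess_input` and `preprocess_output` are plain array manipulations, so they can be checked without OpenVINO. The sketch below assumes a BGR face crop that has already been resized to the model input size (the `cv2.resize` step is left out so that only NumPy is needed) and a hypothetical `outputs` dictionary shaped like the model's three `(1, 1)` angle blobs; the helper names `to_nchw` and `parse_head_pose` are illustrative.

```python
from typing import Dict, List

import numpy as np


def to_nchw(image_hwc: np.ndarray) -> np.ndarray:
    """Add a batch axis and move channels first: (H, W, C) -> (1, C, H, W)."""
    return np.transpose(np.expand_dims(image_hwc, axis=0), (0, 3, 1, 2))


def parse_head_pose(outputs: Dict[str, np.ndarray]) -> List[float]:
    """Return [yaw, pitch, roll] from the three (1, 1) angle output blobs."""
    return [float(outputs[name][0][0]) for name in ("angle_y_fc", "angle_p_fc", "angle_r_fc")]


if __name__ == "__main__":
    face = np.zeros((60, 60, 3), dtype=np.uint8)  # 60x60 is only an example size
    print(to_nchw(face).shape)
    fake_outputs = {  # hypothetical values, not real model output
        "angle_y_fc": np.array([[12.5]]),
        "angle_p_fc": np.array([[-3.0]]),
        "angle_r_fc": np.array([[7.25]]),
    }
    print(parse_head_pose(fake_outputs))
```

After the transpose, element `[0, c, h, w]` of the NCHW array is pixel `[h, w, c]` of the original crop, which is the layout the Inference Engine input blob expects.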


--------------------------------------------------------------------------------
/src/input_feeder.py:
--------------------------------------------------------------------------------
 1 | '''
 2 | This class can be used to feed input from an image, webcam, or video to your model.
 3 | Sample usage:
 4 |     feed=InputFeeder(input_type='video', input_file='video.mp4')
 5 |     feed.load_data()
 6 |     for batch in feed.next_batch():
 7 |         do_something(batch)
 8 |     feed.close()
 9 | '''
10 | import cv2
11 | 
12 | class InputFeeder:
13 |     def __init__(self, input_type, input_file=None):
14 |         '''
15 |         input_type: str, The type of input. Can be 'video' for video file, 'image' for image file,
16 |                     or 'cam' to use webcam feed.
17 |         input_file: str, The file that contains the input image or video file. Leave empty for cam input_type.
18 |         '''
19 |         self.input_type=input_type
20 |         if input_type=='video' or input_type=='image':
21 |             self.input_file=input_file
22 |     
23 |     def load_data(self):
24 |         if self.input_type=='video':
25 |             self.cap=cv2.VideoCapture(self.input_file)
26 |         elif self.input_type=='cam':
27 |             self.cap=cv2.VideoCapture(0)
28 |         else:
29 |             self.cap=cv2.imread(self.input_file)
30 | 
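31 |     def next_batch(self):
32 |         '''Yields (ret, frame). For video or webcam input only every 10th frame is yielded (the 9 frames
33 |         in between are read and discarded); for 'image' input the same image is yielded repeatedly.'''
34 |         while True:
35 |             if self.input_type=='image':
36 |                 ret, frame = self.cap is not None, self.cap
37 |             else:
38 |                 for _ in range(10):
39 |                     ret, frame = self.cap.read()
40 |             yield ret, frame
41 | 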
42 |     def close(self):
43 |         '''
44 |         Closes the VideoCapture.
45 |         '''
46 |         if not self.input_type=='image':
47 |             self.cap.release()
48 | 
49 | 
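`next_batch` reads ten frames for every one it yields, so the pipeline only processes every 10th frame of a video or webcam stream. The generator below is a hypothetical standalone variant of that loop with the skip factor exposed as a parameter `n`. It accepts any `read()` callable that returns `(ret, frame)` like `cv2.VideoCapture.read`, and, unlike the original, stops after yielding `(False, None)` once instead of yielding it forever. `FakeCapture` is a stand-in used only for illustration.

```python
from typing import Any, Callable, Iterator, Optional, Tuple

# A read() callable returning (ret, frame), like cv2.VideoCapture.read.
ReadFn = Callable[[], Tuple[bool, Optional[Any]]]


def every_nth_frame(read: ReadFn, n: int = 10) -> Iterator[Tuple[bool, Optional[Any]]]:
    """Yield every n-th frame; yield (False, None) once when the stream ends, then stop."""
    if n < 1:
        raise ValueError("n must be at least 1")
    while True:
        ret, frame = False, None
        for _ in range(n):
            ret, frame = read()  # the first n - 1 reads of each group are discarded
            if not ret:
                break
        if not ret:
            yield False, None
            return
        yield True, frame


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture whose frames are the integers 1..count."""

    def __init__(self, count: int) -> None:
        self._next = 1
        self._count = count

    def read(self) -> Tuple[bool, Optional[int]]:
        if self._next > self._count:
            return False, None
        frame = self._next
        self._next += 1
        return True, frame


if __name__ == "__main__":
    for ret, frame in every_nth_frame(FakeCapture(25).read, n=10):
        print(ret, frame)
```

Skipping frames trades temporal resolution for throughput: four models run per processed frame, so processing fewer frames keeps the loop closer to real time on slower devices.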


--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | # -*- coding: utf-8 -*-
  3 | """
  4 | Created on Thu Apr 30 17:15:45 2020
  5 | 
  6 | @author: dg
  7 | """
  8 | import cv2
  9 | import os
 10 | import logging
 11 | import numpy as np
 12 | from face_detection import FaceDetectionModel
 13 | from facial_landmarks_detection import FacialLandmarksDetectionModel
 14 | from gaze_estimation import GazeEstimationModel
 15 | from head_pose_estimation import HeadPoseEstimationModel
 16 | from mouse_controller import MouseController
 17 | from argparse import ArgumentParser
 18 | from input_feeder import InputFeeder
 19 | 
 20 | def build_argparser():
 21 |     #Parse command line arguments.
 22 | 
 23 |     #:return: command line arguments
 24 |     parser = ArgumentParser()
 25 |     parser.add_argument("-f", "--facedetectionmodel", required=True, type=str,
 26 |                         help="Specify the path to the .xml file of the Face Detection model.")
 27 |     parser.add_argument("-fl", "--faciallandmarkmodel", required=True, type=str,
 28 |                         help="Specify the path to the .xml file of the Facial Landmark Detection model.")
 29 |     parser.add_argument("-hp", "--headposemodel", required=True, type=str,
 30 |                         help="Specify the path to the .xml file of the Head Pose Estimation model.")
 31 |     parser.add_argument("-g", "--gazeestimationmodel", required=True, type=str,
 32 |                         help="Specify the path to the .xml file of the Gaze Estimation model.")
 33 |     parser.add_argument("-i", "--input", required=True, type=str,
 34 |                         help="Specify the path to a video file, or enter cam to use the webcam.")
 35 |     parser.add_argument("-flags", "--previewFlags", required=False, nargs='+',
 36 |                         default=[],
 37 |                         help="Specify one or more of the flags fd, fld, hp, ge, separated by spaces (e.g. -flags fd hp fld), "
 38 |                              "to visualize the output of each model on every frame: "
 39 |                              "fd for Face Detection, fld for Facial Landmark Detection, "
 40 |                              "hp for Head Pose Estimation, ge for Gaze Estimation.")
 41 |     parser.add_argument("-l", "--cpu_extension", required=False, type=str,
 42 |                         default=None,
 43 |                         help="MKLDNN (CPU)-targeted custom layers. "
 44 |                              "Absolute path to a shared library with the "
 45 |                              "kernels implementation.")
 46 |     parser.add_argument("-prob", "--prob_threshold", required=False, type=float,
 47 |                         default=0.6,
 48 |                         help="Probability threshold for model to detect the face accurately from the video frame.")
 49 |     parser.add_argument("-d", "--device", type=str, default="CPU",
 50 |                         help="Specify the target device to infer on: "
 51 |                              "CPU, GPU, FPGA or MYRIAD is acceptable. The "
 52 |                              "application will look for a suitable plugin for the "
 53 |                              "specified device (CPU by default).")
 54 |     
 55 |     return parser
 56 | 
 57 | 
 58 | 
 59 | def main():
 60 | 
 61 |     # Grab command line args
 62 |     args = build_argparser().parse_args()
 63 |     previewFlags = args.previewFlags
 64 |     
 65 |     logger = logging.getLogger()
 66 |     inputFilePath = args.input
 67 |     inputFeeder = None
 68 |     if inputFilePath.lower()=="cam":
 69 |         inputFeeder = InputFeeder("cam")
 70 |     else:
 71 |         if not os.path.isfile(inputFilePath):
 72 |             logger.error("Unable to find specified video file")
 73 |             exit(1)
 74 |         inputFeeder = InputFeeder("video",inputFilePath)
 75 |     
 76 |     modelPathDict = {'FaceDetectionModel':args.facedetectionmodel, 'FacialLandmarksDetectionModel':args.faciallandmarkmodel, 
 77 |     'GazeEstimationModel':args.gazeestimationmodel, 'HeadPoseEstimationModel':args.headposemodel}
 78 |     
 79 |     for fileNameKey in modelPathDict.keys():
 80 |         if not os.path.isfile(modelPathDict[fileNameKey]):
 81 |             logger.error("Unable to find specified "+fileNameKey+" xml file")
 82 |             exit(1)
 83 |             
 84 |     fdm = FaceDetectionModel(modelPathDict['FaceDetectionModel'], args.device, args.cpu_extension)
 85 |     fldm = FacialLandmarksDetectionModel(modelPathDict['FacialLandmarksDetectionModel'], args.device, args.cpu_extension)
 86 |     gem = GazeEstimationModel(modelPathDict['GazeEstimationModel'], args.device, args.cpu_extension)
 87 |     hpem = HeadPoseEstimationModel(modelPathDict['HeadPoseEstimationModel'], args.device, args.cpu_extension)
 88 |     
 89 |     mc = MouseController('medium','fast')
 90 |     
 91 |     inputFeeder.load_data()
 92 |     fdm.load_model()
 93 |     fldm.load_model()
 94 |     hpem.load_model()
 95 |     gem.load_model()
 96 |     
 97 |     frame_count = 0
 98 |     for ret, frame in inputFeeder.next_batch():
 99 |         if not ret:
100 |             break
101 |         frame_count+=1
102 |         if frame_count%5==0:
103 |             cv2.imshow('video',cv2.resize(frame,(500,500)))
104 |     
105 |         key = cv2.waitKey(60)
106 |         croppedFace, face_coords = fdm.predict(frame.copy(), args.prob_threshold)
107 |         if isinstance(croppedFace, int):
108 |             logger.error("Unable to detect the face.")
109 |             if key==27:
110 |                 break
111 |             continue
112 |         
113 |         hp_out = hpem.predict(croppedFace.copy())
114 |         
115 |         left_eye, right_eye, eye_coords = fldm.predict(croppedFace.copy())
116 |         
117 |         new_mouse_coord, gaze_vector = gem.predict(left_eye, right_eye, hp_out)
118 |         
119 |         if len(previewFlags) != 0:
120 |             preview_frame = frame.copy()
121 |             if 'fd' in previewFlags:
122 |                 #cv2.rectangle(preview_frame, (face_coords[0], face_coords[1]), (face_coords[2], face_coords[3]), (255,0,0), 3)
123 |                 preview_frame = croppedFace
124 |             if 'fld' in previewFlags:
125 |                 cv2.rectangle(croppedFace, (eye_coords[0][0]-10, eye_coords[0][1]-10), (eye_coords[0][2]+10, eye_coords[0][3]+10), (0,255,0), 3)
126 |                 cv2.rectangle(croppedFace, (eye_coords[1][0]-10, eye_coords[1][1]-10), (eye_coords[1][2]+10, eye_coords[1][3]+10), (0,255,0), 3)
127 |                 #preview_frame[face_coords[1]:face_coords[3], face_coords[0]:face_coords[2]] = croppedFace
128 |                 
129 |             if 'hp' in previewFlags:
130 |                 cv2.putText(preview_frame, "Pose Angles: yaw:{:.2f} | pitch:{:.2f} | roll:{:.2f}".format(hp_out[0],hp_out[1],hp_out[2]), (10, 20), cv2.FONT_HERSHEY_COMPLEX, 0.25, (0, 255, 0), 1)
131 |             if 'ge' in previewFlags:
132 |                 x, y, w = int(gaze_vector[0]*12), int(gaze_vector[1]*12), 160
133 |                 le =cv2.line(left_eye.copy(), (x-w, y-w), (x+w, y+w), (255,0,255), 2)
134 |                 cv2.line(le, (x-w, y+w), (x+w, y-w), (255,0,255), 2)
135 |                 re = cv2.line(right_eye.copy(), (x-w, y-w), (x+w, y+w), (255,0,255), 2)
136 |                 cv2.line(re, (x-w, y+w), (x+w, y-w), (255,0,255), 2)
137 |                 croppedFace[eye_coords[0][1]:eye_coords[0][3],eye_coords[0][0]:eye_coords[0][2]] = le
138 |                 croppedFace[eye_coords[1][1]:eye_coords[1][3],eye_coords[1][0]:eye_coords[1][2]] = re
139 |                 #preview_frame[face_coords[1]:face_coords[3], face_coords[0]:face_coords[2]] = croppedFace
140 |                 
141 |             cv2.imshow("visualization",cv2.resize(preview_frame,(500,500)))
142 |         
143 |         if frame_count%5==0:
144 |             mc.move(new_mouse_coord[0],new_mouse_coord[1])    
145 |         if key==27:
146 |             break
147 |     logger.error("VideoStream ended...")
148 |     cv2.destroyAllWindows()
149 |     inputFeeder.close()
150 |      
151 |     
152 | 
153 | if __name__ == '__main__':
154 |     main() 
155 |  
156 | 
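`-flags/--previewFlags` is declared with `nargs='+'`, so the flags are passed space-separated and arrive in `args.previewFlags` as a list, defaulting to `[]` when the option is omitted. As an assumption rather than something `main.py` currently does, adding `choices` would make argparse reject misspelled flags instead of silently ignoring them. A minimal sketch containing only that option (`build_preview_parser` and `parse_flags` are hypothetical names):

```python
from argparse import ArgumentParser
from typing import List, Optional


def build_preview_parser() -> ArgumentParser:
    """Hypothetical parser with only the preview-flag option from main.py, plus choices."""
    parser = ArgumentParser()
    parser.add_argument("-flags", "--previewFlags", required=False, nargs='+',
                        default=[], choices=['fd', 'fld', 'hp', 'ge'],
                        help="Preview flags fd, fld, hp, ge, separated by spaces.")
    return parser


def parse_flags(argv: Optional[List[str]] = None) -> List[str]:
    """Parse argv (or sys.argv[1:] when None) and return the list of preview flags."""
    return build_preview_parser().parse_args(argv).previewFlags


if __name__ == "__main__":
    print(parse_flags(["-flags", "fd", "hp"]))
    print(parse_flags([]))
```

With `choices` set, argparse validates each value given to a `nargs='+'` option and exits with a usage error on an unknown flag such as `fdd`.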


--------------------------------------------------------------------------------
/src/mouse_controller.py:
--------------------------------------------------------------------------------
 1 | '''
 2 | This is a sample class that you can use to control the mouse pointer.
 3 | It uses the pyautogui library. You can set the precision for mouse movement
 4 | (how much the mouse moves) and the speed (how fast it moves) by changing 
 5 | precision_dict and speed_dict.
 6 | Calling the move function with the x and y output of the gaze estimation model
 7 | will move the pointer.
 8 | This class is provided to help get you started; you can choose whether you want to use it or create your own from scratch.
 9 | '''
10 | import pyautogui
11 | 
12 | class MouseController:
13 |     def __init__(self, precision, speed):
14 |         precision_dict={'high':100, 'low':1000, 'medium':500}
15 |         speed_dict={'fast':1, 'slow':10, 'medium':5}
16 | 
17 |         self.precision=precision_dict[precision]
18 |         self.speed=speed_dict[speed]
19 | 
20 |     def move(self, x, y):
21 |         pyautogui.moveRel(x*self.precision, -1*y*self.precision, duration=self.speed)
22 | 
23 | 
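`MouseController.move` scales the gaze `x` and `y` by the precision value and negates `y` before calling `pyautogui.moveRel`, whose screen y axis grows downward. The pure function below reproduces only that arithmetic so it can be checked without a display; `relative_offset` is a hypothetical name, and the two dictionaries are copied from the class.

```python
from typing import Tuple

# Copied from MouseController.__init__ (pixels per unit of gaze, seconds per move).
PRECISION = {'high': 100, 'low': 1000, 'medium': 500}
SPEED = {'fast': 1, 'slow': 10, 'medium': 5}


def relative_offset(x: float, y: float, precision: str = 'medium') -> Tuple[float, float]:
    """Pixel offset (dx, dy) that MouseController.move would pass to pyautogui.moveRel."""
    scale = PRECISION[precision]
    return x * scale, -1 * y * scale


if __name__ == "__main__":
    dx, dy = relative_offset(0.5, -0.25)
    print(dx, dy, "duration:", SPEED['fast'], "s")
```

Note that even the `'fast'` setting passes `duration=1`, so pyautogui animates each move over one second; `main.py` calls `move` only on every fifth processed frame.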


--------------------------------------------------------------------------------