# hand-gesture-recognition-deep-learning
Project to recognize hand gestures using state-of-the-art neural networks.

## Team

Aman Srivastava [https://www.linkedin.com/in/amansrivastava1/]

Nurul Q Khan [https://www.linkedin.com/in/nurulquamar/]

Prakash Srinivasan [https://www.linkedin.com/in/prakash-srinivasan-6641812/]

Tim Kumar [https://www.linkedin.com/in/tim-kumar-b1519252/]

## Problem Statement
Imagine you are working as a data scientist at a home-electronics company which manufactures state-of-the-art smart televisions. You want to develop a cool feature in the smart TV that can recognise five different gestures performed by the user, which will help users control the TV without using a remote.

The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:

- Thumbs up: increase the volume
- Thumbs down: decrease the volume
- Left swipe: 'jump' backwards 10 seconds
- Right swipe: 'jump' forward 10 seconds
- Stop: pause the movie

Each video is a sequence of 30 frames (images).

## Understanding the Dataset
The training data consists of a few hundred videos categorised into one of the five classes. Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames (images). These videos were recorded by various people performing one of the five gestures in front of a webcam - similar to what the smart TV will use.

The data is in a [zip](https://drive.google.com/uc?id=1ehyrYBQ5rbQQe6yL4XbLWe3FMvuVUGiL) file. The zip file contains a 'train' and a 'val' folder, with two CSV files, one for each folder.
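Each training example is therefore a short clip of 30 frame images that must be sampled, cropped, and normalised before it reaches the network. A minimal preprocessing sketch (the frame count kept, crop size, and webcam resolution below are illustrative assumptions, not values documented by the dataset):

```python
import numpy as np

def sample_frame_indices(total_frames=30, num_samples=15):
    """Evenly spaced frame indices, so a 30-frame gesture keeps its temporal shape."""
    return np.linspace(0, total_frames - 1, num_samples).round().astype(int)

def preprocess_video(frames, size=100):
    """Centre-crop each frame to size x size and scale pixels to [0, 1].

    `frames` is a (num_frames, H, W, 3) uint8 array; the 100x100 crop is an
    illustrative choice, not the project's documented input resolution.
    """
    _, h, w, _ = frames.shape
    top, left = (h - size) // 2, (w - size) // 2
    cropped = frames[:, top:top + size, left:left + size, :]
    return cropped.astype(np.float32) / 255.0

# Synthetic stand-in for one 30-frame video (e.g. 120x160 webcam frames):
video = np.random.randint(0, 256, size=(30, 120, 160, 3), dtype=np.uint8)
clip = preprocess_video(video[sample_frame_indices()])
print(clip.shape)  # (15, 100, 100, 3)
```

A batch generator would apply this per video and stack the results into a `(batch, frames, height, width, 3)` tensor for Conv3D or CNN-RNN models.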
## Model Overview

| Model Name | Model Type | Number of Parameters | Augmented Data | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy | Observations |
|---|---|---|---|---|---|---|---|
| conv_3d1_model | Conv3D | 1,117,061 | No | NA | 78% | 99% | Model is over-fitting. Augment the data using cropping. |
| conv_3d2_model | Conv3D | 3,638,981 | Yes | 43.8 | 85% | 91% | Model is not over-fitting. Next we will try to reduce the parameter count. Moreover, since we see minor oscillations in the loss, let's try lowering the learning rate to 0.0002. |
| conv_3d3_model | Conv3D | 1,762,613 | Yes | 21.2 | 85% | 83% | Model has stable results, and we were able to cut the parameter count in half. Let's try adding more layers at the same level of abstraction. |
| conv_3d4_model | Conv3D | 2,556,533 | Yes | 30.8 | 76% | 89% | With more layers added, the model is over-fitting. Let's try adding dropout at the convolution layers. |
| conv_3d5_model | Conv3D | 2,556,533 | Yes | 30.8 | 70% | 89% | Adding dropout further reduced validation accuracy: the model is not able to learn generalisable features and over-fits even more. |
| conv_3d6_model | Conv3D | 696,645 | Yes | 8.46 | 77% | 92% | Reduced the number of network parameters by lowering the image resolution, filter sizes and dense-layer neurons. Comparably good validation accuracy. |
| conv_3d7_model | Conv3D | 504,709 | Yes | 6.15 | 77% | 85% | |
| conv_3d8_model | Conv3D | 230,949 | Yes | 2.87 | 78% | 86% | |
| rnn_cnn1_model | CNN-LSTM | 1,657,445 | Yes | 20 | 75% | 92% | Model is over-fitting. Let's try reducing the number of layers in the next iteration. |

## Models with More Data Augmentation

| Model Name | Model Type | Number of Parameters | Augmented Data | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy |
|---|---|---|---|---|---|---|
| conv_3d10_model | Conv3D | 3,638,981 | Yes | 43.8 | 86% | 86% |
| conv_3d11_model | Conv3D | 1,762,613 | Yes | 21.2 | 78% | 79% |
| conv_3d12_model | Conv3D | 2,556,533 | Yes | 30.8 | 81% | 84% |
| conv_3d13_model | Conv3D | 2,556,533 | Yes | 30.8 | 31% | 78% |
| conv_3d14_model | Conv3D | 696,645 | Yes | 8.46 | 77% | 87% |
| conv_3d15_model | Conv3D | 504,709 | Yes | 6.15 | 75% | 82% |
| conv_3d16_model | Conv3D | 230,949 | Yes | 2.87 | 76% | 77% |
| rnn_cnn2_model | CNN-LSTM | 1,346,021 | Yes | 31 | 78% | 96% |

## Transfer Learning Models (CNN + RNN)
MobileNet was chosen as the backbone because it has fewer parameters than the Inception and ResNet models.

| Model Name | Number of Parameters | Augmented Data | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy | Observations |
|---|---|---|---|---|---|---|
| rnn_cnn_tl_model | 3,840,453 | Yes | 20.4 | 56% | 85% | In this experiment the MobileNet layer weights are not trained. Validation accuracy is very poor, so let's train the MobileNet weights as well. |
| rnn_cnn_tl2_model | 3,692,869 | Yes | 42.3 | 97% | 99% | We get much better accuracy when the MobileNet weights are trained as well. |
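The two transfer-learning rows differ only in whether the MobileNet backbone is trainable. A minimal Keras sketch of that CNN + RNN layout (the frame count, image size, and LSTM width are illustrative assumptions, and `weights=None` keeps the sketch offline where the project would load the pretrained ImageNet weights):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W, NUM_CLASSES = 15, 64, 64, 5  # illustrative sizes, not the project's exact ones

# Per-frame CNN backbone; the project would presumably use weights="imagenet".
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights=None, input_shape=(H, W, 3), pooling="avg")
backbone.trainable = True  # set False to reproduce the frozen rnn_cnn_tl_model variant

inputs = tf.keras.Input(shape=(NUM_FRAMES, H, W, 3))
x = layers.TimeDistributed(backbone)(inputs)       # CNN features for each frame
x = layers.LSTM(64)(x)                             # aggregate features over time
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone (`backbone.trainable = False`) trains only the LSTM and dense head, which matched the poor 56% validation accuracy above; unfreezing it corresponds to the much stronger rnn_cnn_tl2_model run.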
| 66 | 67 | ## Note: If notebook doesnt load then view it here: https://nbviewer.jupyter.org/github/amanrocks11/hand-gesture-recognition-deep-learning/blob/master/Gesture_Recognition_Final.ipynb 68 | --------------------------------------------------------------------------------