├── README.md
├── Team Linear Digressors-Presentation.pdf
├── code
├── Install_Packages.txt
├── Install_Packages_gpu.txt
├── Rotate_images.py
├── cnn_model_train.py
├── create_gestures.py
├── display_gestures.py
├── final.py
├── gesture_db.db
├── hist
├── load_images.py
└── set_hand_histogram.py
└── img
├── Capture.PNG
├── Capture1.PNG
├── asd
├── demo.gif
├── demo1.gif
└── demo2.gif
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | [](https://choosealicense.com/licenses/mit)
5 | [](http://hits.dwyl.io/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning)
6 |
7 | # Sign Language Interpreter using Deep Learning
8 | > A sign language interpreter using live video feed from the camera.
9 | The project was completed in 24 hours as part of HackUNT-19, the University of North Texas's annual Hackathon. You can view the project demo on [YouTube](http://bit.ly/2ZkhqLz).
10 |
11 | ## Table of contents
12 | * [General info](#general-info)
13 | * [Screenshots](#screenshots)
14 | * [Demo](#demo)
15 | * [Technologies and Tools](#technologies-and-tools)
16 | * [Setup](#setup)
17 | * [Process](#process)
18 | * [Code Examples](#code-examples)
19 | * [Features](#features)
20 | * [Status](#status)
21 | * [Contact](#contact)
22 |
23 | ## General info
24 |
25 | The theme at HACK UNT 19 was to use technology to improve accessibility by finding a creative solution to benefit the lives of those with a disability.
26 | We wanted to make it easy for the 70 million people across the world with hearing impairment to be independent of translators for their daily communication needs, so we designed the app to work as a personal 24/7 translator for deaf people.
27 |
28 | ## Demo
29 | 
30 |
31 |
32 |
33 | 
34 |
35 |
36 |
37 | 
38 |
39 |
40 | **The entire demo of the project can be found on [YouTube](http://bit.ly/2ZkhqLz).**
41 |
42 |
43 | ## Screenshots
44 |
45 | 
46 | 
47 |
48 | ## Technologies and Tools
49 | * Python
50 | * TensorFlow
51 | * Keras
52 | * OpenCV
53 |
54 | ## Setup
55 |
56 | * Use the command prompt to set up the environment using the `Install_Packages.txt` (CPU) or `Install_Packages_gpu.txt` (GPU) file.
57 |
58 | `python -m pip install -r Install_Packages.txt`
59 |
60 | This installs all the libraries required for the project.
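
The GPU variant only differs in the requirements file it points at; a minimal sketch of both commands (assuming `python` and `pip` are available on your PATH and you run them from the `code` folder):

````
# CPU-only setup
python -m pip install -r Install_Packages.txt

# GPU setup (additionally installs tensorflow-gpu)
python -m pip install -r Install_Packages_gpu.txt
````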
61 |
62 | ## Process
63 |
64 | * Run `set_hand_histogram.py` to set the hand histogram for creating gestures.
65 | * Once you get a good histogram, save it in the `code` folder, or use the histogram we created, which can be found [here](https://github.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/blob/master/code/hist).
66 | * Add and label gestures through the webcam feed by running `create_gestures.py`, which stores them in a database. Alternatively, you can use the gestures we created [here](https://github.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/blob/master/code/gesture_db.db).
67 | * Add variation to the captured gestures by flipping all the images with `Rotate_images.py`.
68 | * Run `load_images.py` to split all the captured gestures into training, validation and test sets.
69 | * To view all the gestures, run `display_gestures.py`.
70 | * Train the model using Keras by running `cnn_model_train.py`.
71 | * Run `final.py`. This opens the gesture recognition window, which uses your webcam to interpret the trained American Sign Language gestures. The full command sequence is summarised below.
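
For reference, the whole pipeline condenses to the following command sequence (a sketch, assuming the scripts are run from the `code` folder in this order):

````
python set_hand_histogram.py   # press 'c' to capture the skin histogram, 's' to save it
python create_gestures.py      # capture and label gesture images from the webcam
python Rotate_images.py        # augment the dataset by horizontally flipping every image
python load_images.py          # split the images into train / validation / test pickles
python display_gestures.py     # optional: preview a grid of the captured gestures
python cnn_model_train.py      # train the CNN and save cnn_model_keras2.h5
python final.py                # live recognition (text and calculator modes)
````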
72 |
73 | ## Code Examples
74 |
75 | ````
76 | # Model Training using CNN
77 |
78 | import numpy as np
79 | import pickle
80 | import cv2, os
81 | from glob import glob
82 | from keras import optimizers
83 | from keras.models import Sequential
84 | from keras.layers import Dense
85 | from keras.layers import Dropout
86 | from keras.layers import Flatten
87 | from keras.layers.convolutional import Conv2D
88 | from keras.layers.convolutional import MaxPooling2D
89 | from keras.utils import np_utils
90 | from keras.callbacks import ModelCheckpoint
91 | from keras import backend as K
92 | K.set_image_dim_ordering('tf')
93 |
94 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
95 |
96 | def get_image_size():
97 | img = cv2.imread('gestures/1/100.jpg', 0)
98 | return img.shape
99 |
100 | def get_num_of_classes():
101 | return len(glob('gestures/*'))
102 |
103 | image_x, image_y = get_image_size()
104 |
105 | def cnn_model():
106 | num_of_classes = get_num_of_classes()
107 | model = Sequential()
108 | model.add(Conv2D(16, (2,2), input_shape=(image_x, image_y, 1), activation='relu'))
109 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))
110 | model.add(Conv2D(32, (3,3), activation='relu'))
111 | model.add(MaxPooling2D(pool_size=(3, 3), strides=(3, 3), padding='same'))
112 | model.add(Conv2D(64, (5,5), activation='relu'))
113 | model.add(MaxPooling2D(pool_size=(5, 5), strides=(5, 5), padding='same'))
114 | model.add(Flatten())
115 | model.add(Dense(128, activation='relu'))
116 | model.add(Dropout(0.2))
117 | model.add(Dense(num_of_classes, activation='softmax'))
118 | sgd = optimizers.SGD(lr=1e-2)
119 | model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
120 | filepath="cnn_model_keras2.h5"
121 | checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
122 | callbacks_list = [checkpoint1]
123 | #from keras.utils import plot_model
124 | #plot_model(model, to_file='model.png', show_shapes=True)
125 | return model, callbacks_list
126 |
127 | def train():
128 | with open("train_images", "rb") as f:
129 | train_images = np.array(pickle.load(f))
130 | with open("train_labels", "rb") as f:
131 | train_labels = np.array(pickle.load(f), dtype=np.int32)
132 |
133 | with open("val_images", "rb") as f:
134 | val_images = np.array(pickle.load(f))
135 | with open("val_labels", "rb") as f:
136 | val_labels = np.array(pickle.load(f), dtype=np.int32)
137 |
138 | train_images = np.reshape(train_images, (train_images.shape[0], image_x, image_y, 1))
139 | val_images = np.reshape(val_images, (val_images.shape[0], image_x, image_y, 1))
140 | train_labels = np_utils.to_categorical(train_labels)
141 | val_labels = np_utils.to_categorical(val_labels)
142 |
143 | print(val_labels.shape)
144 |
145 | model, callbacks_list = cnn_model()
146 | model.summary()
147 | model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=15, batch_size=500, callbacks=callbacks_list)
148 | scores = model.evaluate(val_images, val_labels, verbose=0)
149 | print("CNN Error: %.2f%%" % (100-scores[1]*100))
150 | #model.save('cnn_model_keras2.h5')
151 |
152 | train()
153 | K.clear_session();
154 |
155 | ````
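
For completeness, here is a minimal inference sketch showing how the saved checkpoint can be used for a single prediction. The 50x50 grayscale input size and the `cnn_model_keras2.h5` file name come from the project's scripts; `predict_gesture` is just an illustrative helper, not part of the project code.

````
# Minimal inference sketch using the checkpoint written by ModelCheckpoint above
import cv2
import numpy as np
from keras.models import load_model

model = load_model('cnn_model_keras2.h5')

def predict_gesture(img_path):
    img = cv2.imread(img_path, 0)                       # load as grayscale
    img = cv2.resize(img, (50, 50)).astype(np.float32)  # gestures are stored as 50x50 images
    img = img.reshape(1, 50, 50, 1)                     # batch of one, single channel
    probs = model.predict(img)[0]
    return int(np.argmax(probs)), float(np.max(probs))  # (predicted class id, confidence)
````

The returned class id corresponds to the `g_id` stored in `gesture_db.db`, which `final.py` looks up to get the gesture name.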
156 |
157 | ## Features
158 | Our model was able to predict the 44 characters in ASL with a prediction accuracy above 95%.
159 |
160 | Features that can be added:
161 | * Deploy the project on cloud and create an API for using it.
162 | * Increase the vocabulary of our model
163 | * Incorporate feedback mechanism to make the model more robust
164 | * Add more sign languages
165 |
166 | ## Status
167 | Project is: _finished_. Our team won the UNT Hackathon 2019. You can find our final submission post on [devpost](http://bit.ly/2WWllwg).
168 |
169 | ## Contact
170 | Created by me with my teammates [Siddharth Oza](https://github.com/siddharthoza), Harsh Gupta, and [Manish Shukla](https://github.com/Manishms18).
171 |
172 |
173 |
174 |
175 | If you loved what you read here and feel like we can collaborate to produce something exciting, or if you
176 | just want to ask a question, please feel free to connect with me via
177 | email or
178 | LinkedIn.
179 |
--------------------------------------------------------------------------------
/Team Linear Digressors-Presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/Team Linear Digressors-Presentation.pdf
--------------------------------------------------------------------------------
/code/Install_Packages.txt:
--------------------------------------------------------------------------------
1 | h5py
2 | numpy
3 | scikit-learn
4 | sklearn
5 | keras
6 | opencv-python
7 | pyttsx3
8 |
--------------------------------------------------------------------------------
/code/Install_Packages_gpu.txt:
--------------------------------------------------------------------------------
1 | h5py
2 | numpy
3 | scikit-learn
4 | sklearn
5 | tensorflow-gpu
6 | keras
7 | opencv-python
8 | pyttsx3
9 |
--------------------------------------------------------------------------------
/code/Rotate_images.py:
--------------------------------------------------------------------------------
1 | import cv2, os
2 |
3 | def flip_images():
4 | gest_folder = "gestures"
5 | images_labels = []
6 | images = []
7 | labels = []
8 | for g_id in os.listdir(gest_folder):
9 | for i in range(1200):
10 | path = gest_folder+"/"+g_id+"/"+str(i+1)+".jpg"
11 | new_path = gest_folder+"/"+g_id+"/"+str(i+1+1200)+".jpg"
12 | print(path)
13 | img = cv2.imread(path, 0)
14 | img = cv2.flip(img, 1)
15 | cv2.imwrite(new_path, img)
16 |
17 | flip_images()
18 |
--------------------------------------------------------------------------------
/code/cnn_model_train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pickle
3 | import cv2, os
4 | from glob import glob
5 | from keras import optimizers
6 | from keras.models import Sequential
7 | from keras.layers import Dense
8 | from keras.layers import Dropout
9 | from keras.layers import Flatten
10 | from keras.layers.convolutional import Conv2D
11 | from keras.layers.convolutional import MaxPooling2D
12 | from keras.utils import np_utils
13 | from keras.callbacks import ModelCheckpoint
14 | from keras import backend as K
15 | K.set_image_dim_ordering('tf')
16 |
17 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
18 |
19 | def get_image_size():
20 | img = cv2.imread('gestures/1/100.jpg', 0)
21 | return img.shape
22 |
23 | def get_num_of_classes():
24 | return len(glob('gestures/*'))
25 |
26 | image_x, image_y = get_image_size()
27 |
28 | def cnn_model():
29 | num_of_classes = get_num_of_classes()
30 | model = Sequential()
31 | model.add(Conv2D(16, (2,2), input_shape=(image_x, image_y, 1), activation='relu'))
32 | model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))
33 | model.add(Conv2D(32, (3,3), activation='relu'))
34 | model.add(MaxPooling2D(pool_size=(3, 3), strides=(3, 3), padding='same'))
35 | model.add(Conv2D(64, (5,5), activation='relu'))
36 | model.add(MaxPooling2D(pool_size=(5, 5), strides=(5, 5), padding='same'))
37 | model.add(Flatten())
38 | model.add(Dense(128, activation='relu'))
39 | model.add(Dropout(0.2))
40 | model.add(Dense(num_of_classes, activation='softmax'))
41 | sgd = optimizers.SGD(lr=1e-2)
42 | model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
43 | filepath="cnn_model_keras2.h5"
44 | checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
45 | callbacks_list = [checkpoint1]
46 | #from keras.utils import plot_model
47 | #plot_model(model, to_file='model.png', show_shapes=True)
48 | return model, callbacks_list
49 |
50 | def train():
51 | with open("train_images", "rb") as f:
52 | train_images = np.array(pickle.load(f))
53 | with open("train_labels", "rb") as f:
54 | train_labels = np.array(pickle.load(f), dtype=np.int32)
55 |
56 | with open("val_images", "rb") as f:
57 | val_images = np.array(pickle.load(f))
58 | with open("val_labels", "rb") as f:
59 | val_labels = np.array(pickle.load(f), dtype=np.int32)
60 |
61 | train_images = np.reshape(train_images, (train_images.shape[0], image_x, image_y, 1))
62 | val_images = np.reshape(val_images, (val_images.shape[0], image_x, image_y, 1))
63 | train_labels = np_utils.to_categorical(train_labels)
64 | val_labels = np_utils.to_categorical(val_labels)
65 |
66 | print(val_labels.shape)
67 |
68 | model, callbacks_list = cnn_model()
69 | model.summary()
70 | model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=15, batch_size=500, callbacks=callbacks_list)
71 | scores = model.evaluate(val_images, val_labels, verbose=0)
72 | print("CNN Error: %.2f%%" % (100-scores[1]*100))
73 | #model.save('cnn_model_keras2.h5')
74 |
75 | train()
76 | K.clear_session();
77 |
--------------------------------------------------------------------------------
/code/create_gestures.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | import pickle, os, sqlite3, random
4 |
5 | image_x, image_y = 50, 50
6 |
7 | def get_hand_hist():
8 | with open("hist", "rb") as f:
9 | hist = pickle.load(f)
10 | return hist
11 |
12 | def init_create_folder_database():
13 | # create the folder and database if not exist
14 | if not os.path.exists("gestures"):
15 | os.mkdir("gestures")
16 | if not os.path.exists("gesture_db.db"):
17 | conn = sqlite3.connect("gesture_db.db")
18 | create_table_cmd = "CREATE TABLE gesture ( g_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, g_name TEXT NOT NULL )"
19 | conn.execute(create_table_cmd)
20 | conn.commit()
21 |
22 | def create_folder(folder_name):
23 | if not os.path.exists(folder_name):
24 | os.mkdir(folder_name)
25 |
26 | def store_in_db(g_id, g_name):
27 | conn = sqlite3.connect("gesture_db.db")
28 | cmd = "INSERT INTO gesture (g_id, g_name) VALUES (%s, \'%s\')" % (g_id, g_name)
29 | try:
30 | conn.execute(cmd)
31 | except sqlite3.IntegrityError:
32 | choice = input("g_id already exists. Want to change the record? (y/n): ")
33 | if choice.lower() == 'y':
34 | cmd = "UPDATE gesture SET g_name = \'%s\' WHERE g_id = %s" % (g_name, g_id)
35 | conn.execute(cmd)
36 | else:
37 | print("Doing nothing...")
38 | return
39 | conn.commit()
40 |
41 | def store_images(g_id):
42 | total_pics = 1200
43 | hist = get_hand_hist()
44 | cam = cv2.VideoCapture(1)
45 | if cam.read()[0]==False:
46 | cam = cv2.VideoCapture(0)
47 | x, y, w, h = 300, 100, 300, 300
48 |
49 | create_folder("gestures/"+str(g_id))
50 | pic_no = 0
51 | flag_start_capturing = False
52 | frames = 0
53 |
54 | while True:
55 | img = cam.read()[1]
56 | img = cv2.flip(img, 1)
57 | imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
58 | dst = cv2.calcBackProject([imgHSV], [0, 1], hist, [0, 180, 0, 256], 1)
59 | disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(10,10))
60 | cv2.filter2D(dst,-1,disc,dst)
61 | blur = cv2.GaussianBlur(dst, (11,11), 0)
62 | blur = cv2.medianBlur(blur, 15)
63 | thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
64 | thresh = cv2.merge((thresh,thresh,thresh))
65 | thresh = cv2.cvtColor(thresh, cv2.COLOR_BGR2GRAY)
66 | thresh = thresh[y:y+h, x:x+w]
67 | contours = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[1]
68 |
69 | if len(contours) > 0:
70 | contour = max(contours, key = cv2.contourArea)
71 | if cv2.contourArea(contour) > 10000 and frames > 50:
72 | x1, y1, w1, h1 = cv2.boundingRect(contour)
73 | pic_no += 1
74 | save_img = thresh[y1:y1+h1, x1:x1+w1]
75 | if w1 > h1:
76 | save_img = cv2.copyMakeBorder(save_img, int((w1-h1)/2) , int((w1-h1)/2) , 0, 0, cv2.BORDER_CONSTANT, (0, 0, 0))
77 | elif h1 > w1:
78 | save_img = cv2.copyMakeBorder(save_img, 0, 0, int((h1-w1)/2) , int((h1-w1)/2) , cv2.BORDER_CONSTANT, (0, 0, 0))
79 | save_img = cv2.resize(save_img, (image_x, image_y))
80 | rand = random.randint(0, 10)
81 | if rand % 2 == 0:
82 | save_img = cv2.flip(save_img, 1)
83 | cv2.putText(img, "Capturing...", (30, 60), cv2.FONT_HERSHEY_TRIPLEX, 2, (127, 255, 255))
84 | cv2.imwrite("gestures/"+str(g_id)+"/"+str(pic_no)+".jpg", save_img)
85 |
86 | cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 2)
87 | cv2.putText(img, str(pic_no), (30, 400), cv2.FONT_HERSHEY_TRIPLEX, 1.5, (127, 127, 255))
88 | cv2.imshow("Capturing gesture", img)
89 | cv2.imshow("thresh", thresh)
90 | keypress = cv2.waitKey(1)
91 | if keypress == ord('c'):
92 | if flag_start_capturing == False:
93 | flag_start_capturing = True
94 | else:
95 | flag_start_capturing = False
96 | frames = 0
97 | if flag_start_capturing == True:
98 | frames += 1
99 | if pic_no == total_pics:
100 | break
101 |
102 | init_create_folder_database()
103 | g_id = input("Enter gesture no.: ")
104 | g_name = input("Enter gesture name/text: ")
105 | store_in_db(g_id, g_name)
106 | store_images(g_id)
107 |
--------------------------------------------------------------------------------
/code/display_gestures.py:
--------------------------------------------------------------------------------
1 | import cv2, os, random
2 | import numpy as np
3 |
4 | def get_image_size():
5 | img = cv2.imread('gestures/0/100.jpg', 0)
6 | return img.shape
7 |
8 | gestures = os.listdir('gestures/')
9 | gestures.sort(key = int)
10 | begin_index = 0
11 | end_index = 5
12 | image_x, image_y = get_image_size()
13 |
14 | if len(gestures)%5 != 0:
15 | rows = int(len(gestures)/5)+1
16 | else:
17 | rows = int(len(gestures)/5)
18 |
19 | full_img = None
20 | for i in range(rows):
21 | col_img = None
22 | for j in range(begin_index, end_index):
23 | img_path = "gestures/%s/%d.jpg" % (j, random.randint(1, 1200))
24 | img = cv2.imread(img_path, 0)
25 | if img is None:
26 | img = np.zeros((image_y, image_x), dtype = np.uint8)
27 | if col_img is None:
28 | col_img = img
29 | else:
30 | col_img = np.hstack((col_img, img))
31 |
32 | begin_index += 5
33 | end_index += 5
34 | if full_img is None:
35 | full_img = col_img
36 | else:
37 | full_img = np.vstack((full_img, col_img))
38 |
39 |
40 | cv2.imshow("gestures", full_img)
41 | cv2.imwrite('full_img.jpg', full_img)
42 | cv2.waitKey(0)
43 |
--------------------------------------------------------------------------------
/code/final.py:
--------------------------------------------------------------------------------
1 | import cv2, pickle
2 | import numpy as np
3 | import tensorflow as tf
4 | # from cnn_tf import cnn_model_fn  # unused import; cnn_tf.py is not part of this repository
5 | import os
6 | import sqlite3, pyttsx3
7 | from keras.models import load_model
8 | from threading import Thread
9 |
10 | engine = pyttsx3.init()
11 | engine.setProperty('rate', 150)
12 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
13 | model = load_model('cnn_model_keras2.h5')
14 |
15 | def get_hand_hist():
16 | with open("hist", "rb") as f:
17 | hist = pickle.load(f)
18 | return hist
19 |
20 | def get_image_size():
21 | img = cv2.imread('gestures/0/100.jpg', 0)
22 | return img.shape
23 |
24 | image_x, image_y = get_image_size()
25 |
26 | def keras_process_image(img):
27 | img = cv2.resize(img, (image_x, image_y))
28 | img = np.array(img, dtype=np.float32)
29 | img = np.reshape(img, (1, image_x, image_y, 1))
30 | return img
31 |
32 | def keras_predict(model, image):
33 | processed = keras_process_image(image)
34 | pred_probab = model.predict(processed)[0]
35 | pred_class = list(pred_probab).index(max(pred_probab))
36 | return max(pred_probab), pred_class
37 |
38 | def get_pred_text_from_db(pred_class):
39 | conn = sqlite3.connect("gesture_db.db")
40 | cmd = "SELECT g_name FROM gesture WHERE g_id="+str(pred_class)
41 | cursor = conn.execute(cmd)
42 | for row in cursor:
43 | return row[0]
44 |
45 | def get_pred_from_contour(contour, thresh):
46 | x1, y1, w1, h1 = cv2.boundingRect(contour)
47 | save_img = thresh[y1:y1+h1, x1:x1+w1]
48 | text = ""
49 | if w1 > h1:
50 | save_img = cv2.copyMakeBorder(save_img, int((w1-h1)/2) , int((w1-h1)/2) , 0, 0, cv2.BORDER_CONSTANT, (0, 0, 0))
51 | elif h1 > w1:
52 | save_img = cv2.copyMakeBorder(save_img, 0, 0, int((h1-w1)/2) , int((h1-w1)/2) , cv2.BORDER_CONSTANT, (0, 0, 0))
53 | pred_probab, pred_class = keras_predict(model, save_img)
54 | if pred_probab*100 > 70:
55 | text = get_pred_text_from_db(pred_class)
56 | return text
57 |
58 | def get_operator(pred_text):
59 | try:
60 | pred_text = int(pred_text)
61 | except:
62 | return ""
63 | operator = ""
64 | if pred_text == 1:
65 | operator = "+"
66 | elif pred_text == 2:
67 | operator = "-"
68 | elif pred_text == 3:
69 | operator = "*"
70 | elif pred_text == 4:
71 | operator = "/"
72 | elif pred_text == 5:
73 | operator = "%"
74 | elif pred_text == 6:
75 | operator = "**"
76 | elif pred_text == 7:
77 | operator = ">>"
78 | elif pred_text == 8:
79 | operator = "<<"
80 | elif pred_text == 9:
81 | operator = "&"
82 | elif pred_text == 0:
83 | operator = "|"
84 | return operator
85 |
86 | hist = get_hand_hist()
87 | x, y, w, h = 300, 100, 300, 300
88 | is_voice_on = True
89 |
90 | def get_img_contour_thresh(img):
91 | img = cv2.flip(img, 1)
92 | imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
93 | dst = cv2.calcBackProject([imgHSV], [0, 1], hist, [0, 180, 0, 256], 1)
94 | disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(10,10))
95 | cv2.filter2D(dst,-1,disc,dst)
96 | blur = cv2.GaussianBlur(dst, (11,11), 0)
97 | blur = cv2.medianBlur(blur, 15)
98 | thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
99 | thresh = cv2.merge((thresh,thresh,thresh))
100 | thresh = cv2.cvtColor(thresh, cv2.COLOR_BGR2GRAY)
101 | thresh = thresh[y:y+h, x:x+w]
102 | contours = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[0]
103 | return img, contours, thresh
104 |
105 | def say_text(text):
106 | if not is_voice_on:
107 | return
108 | while engine._inLoop:
109 | pass
110 | engine.say(text)
111 | engine.runAndWait()
112 |
113 | def calculator_mode(cam):
114 | global is_voice_on
115 | flag = {"first": False, "operator": False, "second": False, "clear": False}
116 | count_same_frames = 0
117 | first, operator, second = "", "", ""
118 | pred_text = ""
119 | calc_text = ""
120 | info = "Enter first number"
121 | Thread(target=say_text, args=(info,)).start()
122 | count_clear_frames = 0
123 | while True:
124 | img = cam.read()[1]
125 | img = cv2.resize(img, (640, 480))
126 | img, contours, thresh = get_img_contour_thresh(img)
127 | old_pred_text = pred_text
128 | if len(contours) > 0:
129 | contour = max(contours, key = cv2.contourArea)
130 | if cv2.contourArea(contour) > 10000:
131 | pred_text = get_pred_from_contour(contour, thresh)
132 | if old_pred_text == pred_text:
133 | count_same_frames += 1
134 | else:
135 | count_same_frames = 0
136 |
137 | if pred_text == "C":
138 | if count_same_frames > 5:
139 | count_same_frames = 0
140 | first, second, operator, pred_text, calc_text = '', '', '', '', ''
141 | flag['first'], flag['operator'], flag['second'], flag['clear'] = False, False, False, False
142 | info = "Enter first number"
143 | Thread(target=say_text, args=(info,)).start()
144 |
145 | elif pred_text == "Best of Luck " and count_same_frames > 15:
146 | count_same_frames = 0
147 | if flag['clear']:
148 | first, second, operator, pred_text, calc_text = '', '', '', '', ''
149 | flag['first'], flag['operator'], flag['second'], flag['clear'] = False, False, False, False
150 | info = "Enter first number"
151 | Thread(target=say_text, args=(info,)).start()
152 | elif second != '':
153 | flag['second'] = True
154 | info = "Clear screen"
155 | #Thread(target=say_text, args=(info,)).start()
156 | second = ''
157 | flag['clear'] = True
158 | try:
159 | calc_text += "= "+str(eval(calc_text))
160 | except:
161 | calc_text = "Invalid operation"
162 | if is_voice_on:
163 | speech = calc_text
164 | speech = speech.replace('-', ' minus ')
165 | speech = speech.replace('/', ' divided by ')
166 | speech = speech.replace('**', ' raised to the power ')
167 | speech = speech.replace('*', ' multiplied by ')
168 | speech = speech.replace('%', ' mod ')
169 | speech = speech.replace('>>', ' bitwise right shift ')
170 | speech = speech.replace('<<', ' bitwise left shift ')
171 | speech = speech.replace('&', ' bitwise and ')
172 | speech = speech.replace('|', ' bitwise or ')
173 | Thread(target=say_text, args=(speech,)).start()
174 | elif first != '':
175 | flag['first'] = True
176 | info = "Enter operator"
177 | Thread(target=say_text, args=(info,)).start()
178 | first = ''
179 |
180 | elif pred_text != "Best of Luck " and pred_text.isnumeric():
181 | if flag['first'] == False:
182 | if count_same_frames > 15:
183 | count_same_frames = 0
184 | Thread(target=say_text, args=(pred_text,)).start()
185 | first += pred_text
186 | calc_text += pred_text
187 | elif flag['operator'] == False:
188 | operator = get_operator(pred_text)
189 | if count_same_frames > 15:
190 | count_same_frames = 0
191 | flag['operator'] = True
192 | calc_text += operator
193 | info = "Enter second number"
194 | Thread(target=say_text, args=(info,)).start()
195 | operator = ''
196 | elif flag['second'] == False:
197 | if count_same_frames > 15:
198 | Thread(target=say_text, args=(pred_text,)).start()
199 | second += pred_text
200 | calc_text += pred_text
201 | count_same_frames = 0
202 |
203 | if count_clear_frames == 30:
204 | first, second, operator, pred_text, calc_text = '', '', '', '', ''
205 | flag['first'], flag['operator'], flag['second'], flag['clear'] = False, False, False, False
206 | info = "Enter first number"
207 | Thread(target=say_text, args=(info,)).start()
208 | count_clear_frames = 0
209 |
210 | blackboard = np.zeros((480, 640, 3), dtype=np.uint8)
211 | cv2.putText(blackboard, "Calculator Mode", (100, 50), cv2.FONT_HERSHEY_TRIPLEX, 1.5, (255, 0,0))
212 | cv2.putText(blackboard, "Predicted text- " + pred_text, (30, 100), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 255, 0))
213 | cv2.putText(blackboard, "Operator " + operator, (30, 140), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 255, 127))
214 | cv2.putText(blackboard, calc_text, (30, 240), cv2.FONT_HERSHEY_TRIPLEX, 2, (255, 255, 255))
215 | cv2.putText(blackboard, info, (30, 440), cv2.FONT_HERSHEY_TRIPLEX, 1, (0, 255, 255) )
216 | if is_voice_on:
217 | cv2.putText(blackboard, " ", (450, 440), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 127, 0))
218 | else:
219 | cv2.putText(blackboard, " ", (450, 440), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 127, 0))
220 | cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 2)
221 | res = np.hstack((img, blackboard))
222 | cv2.imshow("Recognizing gesture", res)
223 | cv2.imshow("thresh", thresh)
224 | keypress = cv2.waitKey(1)
225 | if keypress == ord('q') or keypress == ord('t'):
226 | break
227 | if keypress == ord('v') and is_voice_on:
228 | is_voice_on = False
229 | elif keypress == ord('v') and not is_voice_on:
230 | is_voice_on = True
231 |
232 | if keypress == ord('t'):
233 | return 1
234 | else:
235 | return 0
236 |
237 | def text_mode(cam):
238 | global is_voice_on
239 | text = ""
240 | word = ""
241 | count_same_frame = 0
242 | while True:
243 | img = cam.read()[1]
244 | img = cv2.resize(img, (640, 480))
245 | img, contours, thresh = get_img_contour_thresh(img)
246 | old_text = text
247 | if len(contours) > 0:
248 | contour = max(contours, key = cv2.contourArea)
249 | if cv2.contourArea(contour) > 10000:
250 | text = get_pred_from_contour(contour, thresh)
251 | if old_text == text:
252 | count_same_frame += 1
253 | else:
254 | count_same_frame = 0
255 |
256 | if count_same_frame > 20:
257 | if len(text) == 1:
258 | Thread(target=say_text, args=(text, )).start()
259 | word = word + text
260 | if word.startswith('I/Me '):
261 | word = word.replace('I/Me ', 'I ')
262 | elif word.endswith('I/Me '):
263 | word = word.replace('I/Me ', 'me ')
264 | count_same_frame = 0
265 |
266 | elif cv2.contourArea(contour) < 1000:
267 | if word != '':
268 | #print('yolo')
269 | #say_text(text)
270 | Thread(target=say_text, args=(word, )).start()
271 | text = ""
272 | word = ""
273 | else:
274 | if word != '':
275 | #print('yolo1')
276 | #say_text(text)
277 | Thread(target=say_text, args=(word, )).start()
278 | text = ""
279 | word = ""
280 | blackboard = np.zeros((480, 640, 3), dtype=np.uint8)
281 | cv2.putText(blackboard, " ", (180, 50), cv2.FONT_HERSHEY_TRIPLEX, 1.5, (255, 0,0))
282 | cv2.putText(blackboard, "Predicted text- " + text, (30, 100), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 255, 0))
283 | cv2.putText(blackboard, word, (30, 240), cv2.FONT_HERSHEY_TRIPLEX, 2, (255, 255, 255))
284 | if is_voice_on:
285 | cv2.putText(blackboard, " ", (450, 440), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 127, 0))
286 | else:
287 | cv2.putText(blackboard, " ", (450, 440), cv2.FONT_HERSHEY_TRIPLEX, 1, (255, 127, 0))
288 | cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 2)
289 | res = np.hstack((img, blackboard))
290 | cv2.imshow("Recognizing gesture", res)
291 | cv2.imshow("thresh", thresh)
292 | keypress = cv2.waitKey(1)
293 | if keypress == ord('q') or keypress == ord('c'):
294 | break
295 | if keypress == ord('v') and is_voice_on:
296 | is_voice_on = False
297 | elif keypress == ord('v') and not is_voice_on:
298 | is_voice_on = True
299 |
300 | if keypress == ord('c'):
301 | return 2
302 | else:
303 | return 0
304 |
305 | def recognize():
306 | cam = cv2.VideoCapture(1)
307 | if cam.read()[0]==False:
308 | cam = cv2.VideoCapture(0)
309 | text = ""
310 | word = ""
311 | count_same_frame = 0
312 | keypress = 1
313 | while True:
314 | if keypress == 1:
315 | keypress = text_mode(cam)
316 | elif keypress == 2:
317 | keypress = calculator_mode(cam)
318 | else:
319 | break
320 |
321 | keras_predict(model, np.zeros((50, 50), dtype = np.uint8))
322 | recognize()
323 |
--------------------------------------------------------------------------------
/code/gesture_db.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/code/gesture_db.db
--------------------------------------------------------------------------------
/code/hist:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/code/hist
--------------------------------------------------------------------------------
/code/load_images.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | from glob import glob
3 | import numpy as np
4 | import random
5 | from sklearn.utils import shuffle
6 | import pickle
7 | import os
8 |
9 | def pickle_images_labels():
10 | images_labels = []
11 | images = glob("gestures/*/*.jpg")
12 | images.sort()
13 | for image in images:
14 | print(image)
15 | label = image[image.find(os.sep)+1: image.rfind(os.sep)]
16 | img = cv2.imread(image, 0)
17 | images_labels.append((np.array(img, dtype=np.uint8), int(label)))
18 | return images_labels
19 |
20 | images_labels = pickle_images_labels()
21 | images_labels = shuffle(shuffle(shuffle(shuffle(images_labels))))
22 | images, labels = zip(*images_labels)
23 | print("Length of images_labels", len(images_labels))
24 |
25 | train_images = images[:int(5/6*len(images))]
26 | print("Length of train_images", len(train_images))
27 | with open("train_images", "wb") as f:
28 | pickle.dump(train_images, f)
29 | del train_images
30 |
31 | train_labels = labels[:int(5/6*len(labels))]
32 | print("Length of train_labels", len(train_labels))
33 | with open("train_labels", "wb") as f:
34 | pickle.dump(train_labels, f)
35 | del train_labels
36 |
37 | test_images = images[int(5/6*len(images)):int(11/12*len(images))]
38 | print("Length of test_images", len(test_images))
39 | with open("test_images", "wb") as f:
40 | pickle.dump(test_images, f)
41 | del test_images
42 |
43 | test_labels = labels[int(5/6*len(labels)):int(11/12*len(labels))]
44 | print("Length of test_labels", len(test_labels))
45 | with open("test_labels", "wb") as f:
46 | pickle.dump(test_labels, f)
47 | del test_labels
48 |
49 | val_images = images[int(11/12*len(images)):]
50 | print("Length of test_images", len(val_images))
51 | with open("val_images", "wb") as f:
52 | pickle.dump(val_images, f)
53 | del val_images
54 |
55 | val_labels = labels[int(11/12*len(labels)):]
56 | print("Length of val_labels", len(val_labels))
57 | with open("val_labels", "wb") as f:
58 | pickle.dump(val_labels, f)
59 | del val_labels
60 |
--------------------------------------------------------------------------------
/code/set_hand_histogram.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | import pickle
4 |
5 | def build_squares(img):
6 | x, y, w, h = 420, 140, 10, 10
7 | d = 10
8 | imgCrop = None
9 | crop = None
10 | for i in range(10):
11 | for j in range(5):
12 | if imgCrop is None:
13 | imgCrop = img[y:y+h, x:x+w]
14 | else:
15 | imgCrop = np.hstack((imgCrop, img[y:y+h, x:x+w]))
16 | #print(imgCrop.shape)
17 | cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 1)
18 | x+=w+d
19 | if crop is None:
20 | crop = imgCrop
21 | else:
22 | crop = np.vstack((crop, imgCrop))
23 | imgCrop = None
24 | x = 420
25 | y+=h+d
26 | return crop
27 |
28 | def get_hand_hist():
29 | cam = cv2.VideoCapture(1)
30 | if cam.read()[0]==False:
31 | cam = cv2.VideoCapture(0)
32 | x, y, w, h = 300, 100, 300, 300
33 | flagPressedC, flagPressedS = False, False
34 | imgCrop = None
35 | while True:
36 | img = cam.read()[1]
37 | img = cv2.flip(img, 1)
38 | img = cv2.resize(img, (640, 480))
39 | hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
40 |
41 | keypress = cv2.waitKey(1)
42 | if keypress == ord('c'):
43 | hsvCrop = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2HSV)
44 | flagPressedC = True
45 | hist = cv2.calcHist([hsvCrop], [0, 1], None, [180, 256], [0, 180, 0, 256])
46 | cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
47 | elif keypress == ord('s'):
48 | flagPressedS = True
49 | break
50 | if flagPressedC:
51 | dst = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
52 | dst1 = dst.copy()
53 | disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(10,10))
54 | cv2.filter2D(dst,-1,disc,dst)
55 | blur = cv2.GaussianBlur(dst, (11,11), 0)
56 | blur = cv2.medianBlur(blur, 15)
57 | ret,thresh = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
58 | thresh = cv2.merge((thresh,thresh,thresh))
59 | #cv2.imshow("res", res)
60 | cv2.imshow("Thresh", thresh)
61 | if not flagPressedS:
62 | imgCrop = build_squares(img)
63 | #cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 2)
64 | cv2.imshow("Set hand histogram", img)
65 | cam.release()
66 | cv2.destroyAllWindows()
67 | with open("hist", "wb") as f:
68 | pickle.dump(hist, f)
69 |
70 |
71 | get_hand_hist()
72 |
--------------------------------------------------------------------------------
/img/Capture.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/img/Capture.PNG
--------------------------------------------------------------------------------
/img/Capture1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/img/Capture1.PNG
--------------------------------------------------------------------------------
/img/asd:
--------------------------------------------------------------------------------
1 | sdf
2 |
--------------------------------------------------------------------------------
/img/demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/img/demo.gif
--------------------------------------------------------------------------------
/img/demo1.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/img/demo1.gif
--------------------------------------------------------------------------------
/img/demo2.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashish1993utd/Sign-Language-Interpreter-using-Deep-Learning/752dced17e0b2a7681f4f2d6d45afab2f8a56b7d/img/demo2.gif
--------------------------------------------------------------------------------