├── README.md
├── image_detection.py
├── livefeed_detection.py
├── models
│   └── sign.pt
├── requirements.txt
└── scripts
    ├── creating_data.py
    ├── gifs
    │   ├── gif2.gif
    │   └── proof.gif
    └── model_training_deepstack.ipynb

/README.md:
--------------------------------------------------------------------------------
# Sign Language-to-Speech with DeepStack's Custom API

![](https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API/blob/main/scripts/gifs/gif2.gif)

This project is an end-to-end working prototype that uses artificial intelligence to detect sign language meanings
in images/videos and generate an equivalent, realistic voice for the words communicated by the sign language.


## Steps to run the project
### 1. Install DeepStack using Docker. (Skip this if you already have DeepStack installed)
- Docker needs to be installed first. Mac OS and Windows users can install Docker from
[Docker's website](https://www.docker.com/products/docker-desktop).
- To install on a Linux OS, run the command below:

```
sudo apt-get update && sudo apt-get install docker.io
```
- Install DeepStack. *You might want to grab a coffee while waiting for this to finish its execution :smirk:*
```
docker pull deepquestai/deepstack
```
- Test DeepStack.
```
docker run -e VISION-SCENE=True -v localstorage:/datastore -p 80:5000 deepquestai/deepstack
```
**NOTE:** This works for the CPU variant only. To explore the other installation options, check the
[official tutorial](https://docs.deepstack.cc/#installation-guide-for-cpu-version).

> **Error starting userland proxy: listen tcp4 0.0.0.0:80: bind: address already in use.**

If you come across this error, change `-p 80:5000` to another port, e.g., `-p 88:5000`.
(I'm using **88**, too :blush:).


### 2. Clone the Project Repository and Install Dependencies
- To clone this repo, copy and run the first command below in your shell, then change into the new
directory with the second one.
```
git clone https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API.git
cd Sign-Language-to-Speech-with-DeepStack-Custom-API
```
- To avoid potential *dependency hell*, create a virtual environment and
activate it afterwards.
```
python3 -m venv env
source env/bin/activate
```
- Install the dependencies using `pip install -r requirements.txt`.
- If you are on a Linux OS, TTS engines might not be pre-installed on your platform. Use the command below to install them.
```
sudo apt-get update && sudo apt-get install espeak ffmpeg libespeak1
```


### 3. Spin up the DeepStack custom model's server.
- While still in the project directory's root, spin up the DeepStack custom model's server by running the command below:
```
sudo docker run -v your_local/path/to/Sign-Language-to-Speech-with-DeepStack-Custom-API/models:/modelstore/detection -p 88:5000 deepquestai/deepstack
```
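- (Optional) Once the container is up, you can sanity-check the custom model's endpoint before moving on. The snippet below is a minimal sketch that mirrors the request `livefeed_detection.py` sends; it assumes the server is listening on port 88, and `hello_sign.jpg` is just a placeholder for any test image you have on disk.
```
import requests

# post a test image to DeepStack's custom "sign" endpoint
# ("hello_sign.jpg" is a placeholder -- use any image you have locally)
with open("hello_sign.jpg", "rb") as image_file:
    response = requests.post(
        "http://localhost:88/v1/vision/custom/sign",
        files={"image": image_file},
    ).json()

# a healthy server replies with success=True and a (possibly empty) list of
# predictions, each carrying a label, confidence, and bbox coordinates
print(response.get("success"))
print(response.get("predictions"))
```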
### 4. Detect sign language meanings in image files and generate a realistic voice of the words.
- Run the image_detection script on the image:
```
python image_detection.py image_filename.file_extension
```
My default port number is 88. To specify the port on which the DeepStack server is running, run this instead:
```
python image_detection.py image_filename.file_extension --deepstack-port port_number
```
Running the above command returns two new files in your project root directory -

1. a copy of the image with a bbox around the detected sign and its meaning on top of the box,
2. an audio file of the detected sign language.

![image](https://user-images.githubusercontent.com/45284829/123965899-cfde8080-d9ac-11eb-874e-14d69b2e0c0c.png)
![image](https://user-images.githubusercontent.com/45284829/123966073-f4d2f380-d9ac-11eb-8053-80a92130dedc.png)

### 5. Detect sign language meanings on a live video (via webcam).
- Run the livefeed detection script:
```
python livefeed_detection.py
```
My default port number is 88. To specify the port on which the DeepStack server is running, run this instead:
```
python livefeed_detection.py --deepstack-port port_number
```
This will spin up the webcam and automatically detect any sign language words in view of the camera,
while also displaying the sign's meaning and returning its speech equivalent immediately through the PC's audio system.

To quit, press `q`.

![](https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API/blob/main/scripts/gifs/proof.gif)


## Additional Notes
- **This project has been built and tested successfully on a Linux machine. Other operating systems may raise errors
that have not been accounted for in this documentation**.
- The dataset used in training the model was created via my webcam using an automation script.
[scripts/creating_data.py](https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API/blob/main/scripts/creating_data.py)
is the script used.
- My dataset can be found in [this repository](https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API/tree/main/scripts).
The repo contains both the DeepStack model's data and the TensorFlow Object Detection API's data (I did that about a month before this).
- The dataset was annotated in YOLO format using [LabelImg](https://github.com/tzutalin/labelImg).
- The model was trained on a Colab GPU.
[scripts/model_training_deepstack.ipynb](https://github.com/SteveKola/Sign-Language-to-Speech-with-DeepStack-Custom-API/blob/main/scripts/model_training_deepstack.ipynb)
is the notebook used for that purpose.

## Attributions
- The [DeepStack custom models' official docs](https://docs.deepstack.cc/custom-models/) contain everything that'd be
needed to replicate the whole building process. They are lean and concise.
- A big **thank you** to [Patrick Ryan](https://github.com/youngsoul) for making the project seem
far less herculean in his [article](https://docs.deepstack.cc/custom-models/).
- I got my first introduction to DeepStack's custom models with this
[article](https://medium.com/deepquestai/detect-any-custom-object-with-deepstack-dd0a824a761e).
Having built a few with TensorFlow, I can't appreciate this enough.
--------------------------------------------------------------------------------
/image_detection.py:
--------------------------------------------------------------------------------
import argparse
import pyttsx3
from deepstack_sdk import Detection, ServerConfig


def predict_sign(image, port):
    """
    Detect the sign in an image file, print its label and bbox coordinates,
    and save the annotated image (image with bbox) and an audio file of the
    sign's meaning to the working directory.

    params
        image   name of the image file
        port    deepstack server's port, default is 88

    outputs (saved to disk)
        <image>_new.jpg     copy of the image with the sign's bbox drawn on it
        <image>_audio.mp3   audio of the sign's meaning, produced by the TTS engine
    """
    text_engine = pyttsx3.init()
    config = ServerConfig(f"http://localhost:{port}")
    detector = Detection(config=config, name="sign")

    name_stripped = image.rsplit('.', 1)[0]\
        .replace("'", "_")\
        .replace(" ", "_")

    detections = detector.detectObject(
        image=image,
        output=name_stripped + "_new.jpg")

    for detection in detections:
        print("Name: {}".format(detection.label))
        print("Confidence: {}".format(detection.confidence))
        print("x_min: {}".format(detection.x_min))
        print("x_max: {}".format(detection.x_max))
        print("y_min: {}".format(detection.y_min))
        print("y_max: {}".format(detection.y_max))

        if detection.label:
            text = detection.label
            audioname = name_stripped + "_audio.mp3"
            text_engine.save_to_file(text, audioname)
            text_engine.runAndWait()


if __name__ == '__main__':
    # construct the argument parser and parse the arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("image_filename",
                        type=str,
                        help="name of image to run inference on")
    parser.add_argument("--deepstack-port",
                        type=int,
                        default=88,
                        help="port on which the deepstack server's docker image is running")
    args = vars(parser.parse_args())

    image = args['image_filename']
    port = args['deepstack_port']
    predict_sign(image, port)
--------------------------------------------------------------------------------
/livefeed_detection.py:
--------------------------------------------------------------------------------
import cv2, requests, time

import argparse
import imutils
from imutils.video import VideoStream

import pyttsx3


def predict_sign(frame, url):
    """
    Send a frame to DeepStack's server and return the top prediction,
    which contains the confidence, label and bbox coordinates.

    params
        frame   each image frame of the live video
        url     DeepStack server's localhost URL

    returns
        prediction   the first prediction from the server's JSON response,
                     or None if nothing was detected
    """
    s = time.time()
    response = requests.post(url, files={"image": frame}).json()
    e = time.time()
    print(f"Inference took: {e - s} seconds.")
    print(response)

    if "success" in response and response['success'] and len(response['predictions']) > 0:
        prediction = response['predictions'][0]
        for pred in response["predictions"]:
            print(pred["label"])
    else:
        prediction = None
    return prediction


if __name__ == '__main__':
    # construct the argument parser and parse the arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--deepstack-port",
                        type=int,
                        default=88,
                        help="port on which the deepstack server's docker image is running")
    args = vars(parser.parse_args())

    deepstack_url = f"http://localhost:{args['deepstack_port']}/v1/vision/custom/sign"

    # initialize the video stream and allow the camera sensor to warm up
    print("[INFO] starting video stream...")
    stream = VideoStream(src=0).start()
    time.sleep(2.0)

    color = (0, 255, 0)

    # initialize the text-to-speech engine
    text_engine = pyttsx3.init()

    # loop over the frames from the video stream
    while True:
        # grab the frame from the threaded video stream and resize it
        # to have a maximum width of 400 pixels
        frame = stream.read()
        frame = imutils.resize(frame, width=400)
        success, encoded_image = cv2.imencode('.jpg', frame)
        source_image = encoded_image.tobytes()

        print("Predict...")
        prediction = predict_sign(source_image, deepstack_url)

        label = ''
        if prediction is not None:
            confidence = prediction['confidence']
            label = prediction['label']
            y_min = prediction['y_min']
            y_max = prediction['y_max']
            x_min = prediction['x_min']
            x_max = prediction['x_max']

            # display the label and bounding box rectangle on the output
            # frame
            cv2.putText(frame, f"{label} {confidence}", (x_min, y_min - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color, 2)
        cv2.imshow("Frame", frame)

        # convert the detected label to sound
        if label:
            text = label
            text_engine.say(text)
            text_engine.runAndWait()

        key = cv2.waitKey(1) & 0xFF

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break

    # do a bit of cleanup
    cv2.destroyAllWindows()
    stream.stop()
--------------------------------------------------------------------------------
/models/sign.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stevenkolawole/Sign-Language-to-Speech-with-DeepStack-Custom-API/ca9c95547fa1db0f2e0372f741fc9a3b5aaa227b/models/sign.pt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
opencv-python
imutils
pyttsx3
deepstack-sdk
requests
--------------------------------------------------------------------------------
/scripts/creating_data.py:
--------------------------------------------------------------------------------
import cv2, os, time

IMAGES_PATH = 'dataset'
try:
    os.mkdir(IMAGES_PATH)
except
OSError: 7 | pass 8 | labels = [ 'nice to meet you', 'hello', 'thanks', 'yes', 'no', 9 | 'iloveyou', 'please', 'sorry', 'you\'re welcome',] 10 | NUMBER_IMGS = 15 11 | 12 | for label in labels: 13 | try: 14 | os.mkdir(os.path.join(IMAGES_PATH, label)) 15 | except OSError as error: 16 | pass 17 | capture = cv2.VideoCapture(0) 18 | print(f'Collecting images for {label}') 19 | time.sleep(15) 20 | for image_num in range(NUMBER_IMGS): 21 | ret, frame = capture.read() 22 | image_name = os.path.join(IMAGES_PATH, label, f'{label}_{image_num}.jpg') 23 | cv2.imwrite(image_name, frame) 24 | cv2.imshow('frame', frame) 25 | time.sleep(3) 26 | 27 | if cv2.waitKey(1) & 0xFF == ord('q'): 28 | break 29 | capture.release() -------------------------------------------------------------------------------- /scripts/gifs/gif2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stevenkolawole/Sign-Language-to-Speech-with-DeepStack-Custom-API/ca9c95547fa1db0f2e0372f741fc9a3b5aaa227b/scripts/gifs/gif2.gif -------------------------------------------------------------------------------- /scripts/gifs/proof.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stevenkolawole/Sign-Language-to-Speech-with-DeepStack-Custom-API/ca9c95547fa1db0f2e0372f741fc9a3b5aaa227b/scripts/gifs/proof.gif -------------------------------------------------------------------------------- /scripts/model_training_deepstack.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Sign Language Detector with DeepStack.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "3Bqhevpnutj4" 24 | }, 25 | "source": [ 26 | "## Clone the DeepStack Trainer" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "metadata": { 32 | "colab": { 33 | "base_uri": "https://localhost:8080/" 34 | }, 35 | "id": "MPmZg31Hulj5", 36 | "outputId": "44939712-2d3a-474f-c46c-e46cff35036c" 37 | }, 38 | "source": [ 39 | "!git clone https://github.com/johnolafenwa/deepstack-trainer\n", 40 | "%cd deepstack-trainer\n", 41 | "!pip install -r requirements.txt\n", 42 | "!pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html" 43 | ], 44 | "execution_count": null, 45 | "outputs": [ 46 | { 47 | "output_type": "stream", 48 | "text": [ 49 | "Cloning into 'deepstack-trainer'...\n", 50 | "remote: Enumerating objects: 119, done.\u001b[K\n", 51 | "remote: Counting objects: 100% (119/119), done.\u001b[K\n", 52 | "remote: Compressing objects: 100% (91/91), done.\u001b[K\n", 53 | "remote: Total 119 (delta 40), reused 101 (delta 25), pack-reused 0\u001b[K\n", 54 | "Receiving objects: 100% (119/119), 1001.86 KiB | 1.37 MiB/s, done.\n", 55 | "Resolving deltas: 100% (40/40), done.\n", 56 | "/content/deepstack-trainer\n", 57 | "Requirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 4)) (0.29.23)\n", 58 | "Requirement already satisfied: matplotlib>=3.2.2 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 5)) (3.2.2)\n", 59 | 
"Requirement already satisfied: numpy>=1.18.5 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 6)) (1.19.5)\n", 60 | "Requirement already satisfied: opencv-python>=4.1.2 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 7)) (4.1.2.30)\n", 61 | "Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 8)) (7.1.2)\n", 62 | "Collecting PyYAML>=5.3\n", 63 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/7a/a5/393c087efdc78091afa2af9f1378762f9821c9c1d7a22c5753fb5ac5f97a/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636kB)\n", 64 | "\u001b[K |████████████████████████████████| 645kB 4.2MB/s \n", 65 | "\u001b[?25hRequirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 10)) (1.4.1)\n", 66 | "Requirement already satisfied: tensorboard>=2.2 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 11)) (2.5.0)\n", 67 | "Requirement already satisfied: torch>=1.7.0 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 12)) (1.9.0+cu102)\n", 68 | "Requirement already satisfied: torchvision>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 13)) (0.10.0+cu102)\n", 69 | "Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 14)) (4.41.1)\n", 70 | "Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 20)) (0.11.1)\n", 71 | "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 21)) (1.1.5)\n", 72 | "Collecting thop\n", 73 | " Downloading https://files.pythonhosted.org/packages/6c/8b/22ce44e1c71558161a8bd54471123cc796589c7ebbfc15a7e8932e522f83/thop-0.0.31.post2005241907-py3-none-any.whl\n", 74 | "Requirement already satisfied: pycocotools>=2.0 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 30)) (2.0.2)\n", 75 | "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.2.2->-r requirements.txt (line 5)) (1.3.1)\n", 76 | "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.2.2->-r requirements.txt (line 5)) (2.8.1)\n", 77 | "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.2.2->-r requirements.txt (line 5)) (0.10.0)\n", 78 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.2.2->-r requirements.txt (line 5)) (2.4.7)\n", 79 | "Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (0.6.1)\n", 80 | "Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (2.23.0)\n", 81 | "Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (1.0.1)\n", 82 | "Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (0.12.0)\n", 83 | "Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r 
requirements.txt (line 11)) (57.0.0)\n", 84 | "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (0.4.4)\n", 85 | "Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (1.34.1)\n", 86 | "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (3.3.4)\n", 87 | "Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (1.31.0)\n", 88 | "Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (3.12.4)\n", 89 | "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (1.8.0)\n", 90 | "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2->-r requirements.txt (line 11)) (0.36.2)\n", 91 | "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.7.0->-r requirements.txt (line 12)) (3.7.4.3)\n", 92 | "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->-r requirements.txt (line 21)) (2018.9)\n", 93 | "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib>=3.2.2->-r requirements.txt (line 5)) (1.15.0)\n", 94 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2->-r requirements.txt (line 11)) (3.0.4)\n", 95 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2->-r requirements.txt (line 11)) (1.24.3)\n", 96 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2->-r requirements.txt (line 11)) (2.10)\n", 97 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2->-r requirements.txt (line 11)) (2021.5.30)\n", 98 | "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2->-r requirements.txt (line 11)) (1.3.0)\n", 99 | "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard>=2.2->-r requirements.txt (line 11)) (4.5.0)\n", 100 | "Requirement already satisfied: rsa<5,>=3.1.4; python_version >= \"3.6\" in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2->-r requirements.txt (line 11)) (4.7.2)\n", 101 | "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2->-r requirements.txt (line 11)) (4.2.2)\n", 102 | "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2->-r requirements.txt (line 11)) (0.2.8)\n", 103 | "Requirement already satisfied: oauthlib>=3.0.0 in 
/usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2->-r requirements.txt (line 11)) (3.1.1)\n", 104 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->markdown>=2.6.8->tensorboard>=2.2->-r requirements.txt (line 11)) (3.4.1)\n", 105 | "Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/lib/python3.7/dist-packages (from rsa<5,>=3.1.4; python_version >= \"3.6\"->google-auth<2,>=1.6.3->tensorboard>=2.2->-r requirements.txt (line 11)) (0.4.8)\n", 106 | "Installing collected packages: PyYAML, thop\n", 107 | " Found existing installation: PyYAML 3.13\n", 108 | " Uninstalling PyYAML-3.13:\n", 109 | " Successfully uninstalled PyYAML-3.13\n", 110 | "Successfully installed PyYAML-5.4.1 thop-0.0.31.post2005241907\n", 111 | "Looking in links: https://download.pytorch.org/whl/torch_stable.html\n", 112 | "Collecting torch==1.7.0+cu110\n", 113 | "\u001b[?25l Downloading https://download.pytorch.org/whl/cu110/torch-1.7.0%2Bcu110-cp37-cp37m-linux_x86_64.whl (1137.1MB)\n", 114 | "\u001b[K |███████████████████████▌ | 834.1MB 1.5MB/s eta 0:03:16tcmalloc: large alloc 1147494400 bytes == 0x5584df92c000 @ 0x7faed4ac7615 0x5584a562dcdc 0x5584a570d52a 0x5584a5630afd 0x5584a5721fed 0x5584a56a4988 0x5584a569f4ae 0x5584a56323ea 0x5584a56a47f0 0x5584a569f4ae 0x5584a56323ea 0x5584a56a132a 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a57a53e1 0x5584a57056a9 0x5584a5670cc4 0x5584a5631559 0x5584a56a54f8 0x5584a563230a 0x5584a56a03b5 0x5584a569f7ad 0x5584a56323ea 0x5584a56a03b5 0x5584a563230a 0x5584a56a03b5\n", 115 | "\u001b[K |█████████████████████████████▊ | 1055.7MB 1.4MB/s eta 0:00:57tcmalloc: large alloc 1434370048 bytes == 0x558523f82000 @ 0x7faed4ac7615 0x5584a562dcdc 0x5584a570d52a 0x5584a5630afd 0x5584a5721fed 0x5584a56a4988 0x5584a569f4ae 0x5584a56323ea 0x5584a56a47f0 0x5584a569f4ae 0x5584a56323ea 0x5584a56a132a 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a56a0853 0x5584a5722e36 0x5584a57a53e1 0x5584a57056a9 0x5584a5670cc4 0x5584a5631559 0x5584a56a54f8 0x5584a563230a 0x5584a56a03b5 0x5584a569f7ad 0x5584a56323ea 0x5584a56a03b5 0x5584a563230a 0x5584a56a03b5\n", 116 | "\u001b[K |████████████████████████████████| 1137.1MB 37.3MB/s eta 0:00:01tcmalloc: large alloc 1421369344 bytes == 0x55857976e000 @ 0x7faed4ac7615 0x5584a562dcdc 0x5584a570d52a 0x5584a5630afd 0x5584a5721fed 0x5584a56a4988 0x5584a569f4ae 0x5584a56323ea 0x5584a56a060e 0x5584a569f4ae 0x5584a56323ea 0x5584a56a060e 0x5584a569f4ae 0x5584a56323ea 0x5584a56a060e 0x5584a569f4ae 0x5584a56323ea 0x5584a56a060e 0x5584a569f4ae 0x5584a56323ea 0x5584a56a060e 0x5584a563230a 0x5584a56a060e 0x5584a569f4ae 0x5584a56323ea 0x5584a56a132a 0x5584a569f4ae 0x5584a56323ea 0x5584a56a132a 0x5584a569f4ae 0x5584a5632a81\n", 117 | "\u001b[K |████████████████████████████████| 1137.1MB 16kB/s \n", 118 | "\u001b[?25hCollecting torchvision==0.8.1+cu110\n", 119 | "\u001b[?25l Downloading https://download.pytorch.org/whl/cu110/torchvision-0.8.1%2Bcu110-cp37-cp37m-linux_x86_64.whl (12.9MB)\n", 120 | "\u001b[K |████████████████████████████████| 13.0MB 234kB/s \n", 121 | "\u001b[?25hCollecting torchaudio===0.7.0\n", 122 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/5d/75/5ce994c76cf7b53ff8c577d7a8221fa0c9dfe9e34c0536c6eaf3e466788a/torchaudio-0.7.0-cp37-cp37m-manylinux1_x86_64.whl (7.6MB)\n", 123 | "\u001b[K 
|████████████████████████████████| 7.6MB 4.3MB/s \n", 124 | "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torch==1.7.0+cu110) (1.19.5)\n", 125 | "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.7.0+cu110) (3.7.4.3)\n", 126 | "Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from torch==1.7.0+cu110) (0.16.0)\n", 127 | "Collecting dataclasses\n", 128 | " Downloading https://files.pythonhosted.org/packages/26/2f/1095cdc2868052dd1e64520f7c0d5c8c550ad297e944e641dbf1ffbb9a5d/dataclasses-0.6-py3-none-any.whl\n", 129 | "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.8.1+cu110) (7.1.2)\n", 130 | "\u001b[31mERROR: torchtext 0.10.0 has requirement torch==1.9.0, but you'll have torch 1.7.0+cu110 which is incompatible.\u001b[0m\n", 131 | "Installing collected packages: dataclasses, torch, torchvision, torchaudio\n", 132 | " Found existing installation: torch 1.9.0+cu102\n", 133 | " Uninstalling torch-1.9.0+cu102:\n", 134 | " Successfully uninstalled torch-1.9.0+cu102\n", 135 | " Found existing installation: torchvision 0.10.0+cu102\n", 136 | " Uninstalling torchvision-0.10.0+cu102:\n", 137 | " Successfully uninstalled torchvision-0.10.0+cu102\n", 138 | "Successfully installed dataclasses-0.6 torch-1.7.0+cu110 torchaudio-0.7.0 torchvision-0.8.1+cu110\n" 139 | ], 140 | "name": "stdout" 141 | } 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "id": "Ac4c4ZpHveul" 148 | }, 149 | "source": [ 150 | "## Upload Dataset" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "metadata": { 156 | "colab": { 157 | "base_uri": "https://localhost:8080/", 158 | "height": 106 159 | }, 160 | "id": "nw6nHUFss9Vk", 161 | "outputId": "04b285ba-2985-41ab-9549-04c18d5735c3" 162 | }, 163 | "source": [ 164 | "import gdown \n", 165 | "\n", 166 | "url = 'https://drive.google.com/uc?id=1zjfnPyewHi8LpHzEOvUAdmtnugHsPoSS'\n", 167 | "output = 'dataset.zip'\n", 168 | "\n", 169 | "gdown.download(url, output, quiet=False)" 170 | ], 171 | "execution_count": null, 172 | "outputs": [ 173 | { 174 | "output_type": "stream", 175 | "text": [ 176 | "Downloading...\n", 177 | "From: https://drive.google.com/uc?id=1zjfnPyewHi8LpHzEOvUAdmtnugHsPoSS\n", 178 | "To: /content/deepstack-trainer/dataset.zip\n", 179 | "11.5MB [00:00, 28.8MB/s]\n" 180 | ], 181 | "name": "stderr" 182 | }, 183 | { 184 | "output_type": "execute_result", 185 | "data": { 186 | "application/vnd.google.colaboratory.intrinsic+json": { 187 | "type": "string" 188 | }, 189 | "text/plain": [ 190 | "'dataset.zip'" 191 | ] 192 | }, 193 | "metadata": { 194 | "tags": [] 195 | }, 196 | "execution_count": 3 197 | } 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "metadata": { 203 | "colab": { 204 | "base_uri": "https://localhost:8080/" 205 | }, 206 | "id": "IJrNFU86t5O6", 207 | "outputId": "53ece408-0b40-46f3-9271-5590b79a7ad5" 208 | }, 209 | "source": [ 210 | "!unzip dataset.zip" 211 | ], 212 | "execution_count": null, 213 | "outputs": [ 214 | { 215 | "output_type": "stream", 216 | "text": [ 217 | "Archive: dataset.zip\n", 218 | "replace dataset/test/nice to meet you_7.jpg? 
[y]es, [n]o, [A]ll, [N]one, [r]ename: " 219 | ], 220 | "name": "stdout" 221 | } 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": { 227 | "id": "AAQCr1i7wg7K" 228 | }, 229 | "source": [ 230 | "### Model Training" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "metadata": { 236 | "colab": { 237 | "base_uri": "https://localhost:8080/" 238 | }, 239 | "id": "YSgVo0Of01ER", 240 | "outputId": "95e31eb5-3013-4d70-d581-1df92b48ab92" 241 | }, 242 | "source": [ 243 | "!python3 train.py --dataset-path \"dataset\"" 244 | ], 245 | "execution_count": null, 246 | "outputs": [ 247 | { 248 | "output_type": "stream", 249 | "text": [ 250 | "Using torch 1.7.0+cu110 CUDA:0 (Tesla K80, 11441MB)\n", 251 | "\n", 252 | "Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5m.yaml', classes='', data={'train': 'dataset/train', 'val': 'dataset/test', 'nc': 10, 'names': ['hello', 'i_love_you', 'nice_to_meet_you', 'no', 'please', 'sorry', 'thank_you', 'yes', 'you_are_welcome', '']}, dataset_path='dataset', device='', epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], local_rank=-1, log_imgs=16, model='yolov5m', multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='train-runs/dataset', rect=False, resume=False, save_dir='train-runs/dataset/exp', single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5m.pt', workers=8, world_size=1)\n", 253 | "Start Tensorboard with \"tensorboard --logdir train-runs/dataset\", view at http://localhost:6006/\n", 254 | "2021-06-29 17:09:07.394315: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n", 255 | "Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}\n", 256 | "Downloading https://github.com/ultralytics/yolov5/releases/download/v3.1/yolov5m.pt to yolov5m.pt...\n", 257 | "100% 41.9M/41.9M [00:02<00:00, 16.2MB/s]\n", 258 | "\n", 259 | "Overriding model.yaml nc=80 with nc=10\n", 260 | "\n", 261 | " from n params module arguments \n", 262 | " 0 -1 1 5280 models.common.Focus [3, 48, 3] \n", 263 | " 1 -1 1 41664 models.common.Conv [48, 96, 3, 2] \n", 264 | " 2 -1 1 67680 models.common.BottleneckCSP [96, 96, 2] \n", 265 | " 3 -1 1 166272 models.common.Conv [96, 192, 3, 2] \n", 266 | " 4 -1 1 639168 models.common.BottleneckCSP [192, 192, 6] \n", 267 | " 5 -1 1 664320 models.common.Conv [192, 384, 3, 2] \n", 268 | " 6 -1 1 2550144 models.common.BottleneckCSP [384, 384, 6] \n", 269 | " 7 -1 1 2655744 models.common.Conv [384, 768, 3, 2] \n", 270 | " 8 -1 1 1476864 models.common.SPP [768, 768, [5, 9, 13]] \n", 271 | " 9 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False] \n", 272 | " 10 -1 1 295680 models.common.Conv [768, 384, 1, 1] \n", 273 | " 11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n", 274 | " 12 [-1, 6] 1 0 models.common.Concat [1] \n", 275 | " 13 -1 1 1219968 models.common.BottleneckCSP [768, 384, 2, False] \n", 276 | " 14 -1 1 74112 models.common.Conv [384, 192, 1, 1] \n", 277 | " 15 -1 1 0 
torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n", 278 | " 16 [-1, 4] 1 0 models.common.Concat [1] \n", 279 | " 17 -1 1 305856 models.common.BottleneckCSP [384, 192, 2, False] \n", 280 | " 18 -1 1 332160 models.common.Conv [192, 192, 3, 2] \n", 281 | " 19 [-1, 14] 1 0 models.common.Concat [1] \n", 282 | " 20 -1 1 1072512 models.common.BottleneckCSP [384, 384, 2, False] \n", 283 | " 21 -1 1 1327872 models.common.Conv [384, 384, 3, 2] \n", 284 | " 22 [-1, 10] 1 0 models.common.Concat [1] \n", 285 | " 23 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False] \n", 286 | " 24 [17, 20, 23] 1 60615 models.yolo.Detect [10, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]\n", 287 | "Model Summary: 391 layers, 21522183 parameters, 21522183 gradients, 51.5 GFLOPS\n", 288 | "\n", 289 | "Transferred 506/514 items from yolov5m.pt\n", 290 | "Optimizer groups: 86 .bias, 94 conv.weight, 83 other\n", 291 | "Scanning 'dataset/train' for images and labels... 96 found, 0 missing, 0 empty, 0 corrupted: 100% 96/96 [00:00<00:00, 1509.34it/s]\n", 292 | "New cache created: dataset/train.cache\n", 293 | "Scanning 'dataset/train.cache' for images and labels... 96 found, 0 missing, 0 empty, 0 corrupted: 100% 96/96 [00:00<00:00, 1011691.42it/s]\n", 294 | "Scanning 'dataset/test' for images and labels... 25 found, 0 missing, 0 empty, 0 corrupted: 100% 25/25 [00:00<00:00, 1672.80it/s]\n", 295 | "New cache created: dataset/test.cache\n", 296 | "Scanning 'dataset/test.cache' for images and labels... 25 found, 0 missing, 0 empty, 0 corrupted: 100% 25/25 [00:00<00:00, 133068.02it/s]\n", 297 | "\n", 298 | "Analyzing anchors... anchors/target = 4.74, Best Possible Recall (BPR) = 1.0000\n", 299 | "Image sizes 640 train, 640 test\n", 300 | "Using 2 dataloader workers\n", 301 | "Logging results to train-runs/dataset/exp\n", 302 | "Starting training for 300 epochs...\n", 303 | "\n", 304 | " Epoch gpu_mem box obj cls total targets img_size\n", 305 | " 0% 0/6 [00:00