├── .gitignore ├── README.md ├── app.py ├── config.py ├── data ├── obj.data └── obj.names ├── dependencies.txt ├── init.sh ├── install.sh ├── labelReader.py ├── samples ├── pic1.jpg └── pic2.jpg ├── sendImage.py ├── utils ├── PythonCompleter.py ├── azure_ocr.py ├── classifier.py ├── cosmos_database.py ├── darknet_classify_image.py ├── database.py ├── keras_classify_image.py ├── local_database.py ├── locate_asset.py ├── logger.py ├── lookup_database.py ├── ocr.py ├── rotate.py └── tesseract_ocr.py └── yolo-obj.cfg /.gitignore: -------------------------------------------------------------------------------- 1 | *.jpg 2 | utils/ocr.py 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | downloads/ 17 | eggs/ 18 | .eggs/ 19 | lib/ 20 | lib64/ 21 | parts/ 22 | sdist/ 23 | var/ 24 | wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | .hypothesis/ 50 | .pytest_cache/ 51 | 52 | # Translations 53 | *.mo 54 | *.pot 55 | 56 | # Django stuff: 57 | *.log 58 | local_settings.py 59 | db.sqlite3 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # pyenv 78 | .python-version 79 | 80 | # celery beat schedule file 81 | celerybeat-schedule 82 | 83 | # SageMath parsed files 84 | *.sage.py 85 | 86 | # Environments 87 | .env 88 | .venv 89 | env/ 90 | venv/ 91 | ENV/ 92 | env.bak/ 93 | venv.bak/ 94 | 95 | # Spyder project settings 96 | .spyderproject 97 | .spyproject 98 | 99 | # Rope project settings 100 | .ropeproject 101 | 102 | # mkdocs documentation 103 | /site 104 | 105 | # mypy 106 | .mypy_cache/ 107 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | LabelReader is a general Machine Learning-based solution to identifying labels in a picture. 2 | 3 | A common problem is using OCR on a simple picture. But what if that picture is complicated, has many items, and words that aren't important to the user? LabelReader finds the important label in the picture, crops it out, rotates it, and then reads the label to pinpoint the object you're looking for. 4 | 5 | # Demonstration 6 |

7 | [Demonstration image] 8 |

9 | 10 | # The Approach 11 | 12 | The identifier's approach is straightforward: 13 | 14 | 1. Determine if a nameplate/asset is in the picture 15 | 2. Identify where that nameplate is 16 | 3. Crop out the relevant asset 17 | 4. Rotate the cropped picture so the text is readable 18 | 5. Read characters in the cropped picture 19 | 6. Find the relevant information in a database and present it to the user 20 | 21 | ### Details 22 | 23 | LabelReader uses the [Yolov3 algorithm](https://pjreddie.com) for object detection. You can choose between the following implementations of the algorithm: 24 | * [Darknet](https://github.com/AlexeyAB/darknet) (Fast, C Implementation) 25 | * [Keras-Yolov3](https://github.com/qqwweee/keras-yolo3) (Python Implementation) 26 | 27 | This repository contains a model that has been trained on labels for headphones, and it will need to be tuned for custom images. 28 | For Optical Character Recognition, LabelReader sends the processed images to Azure Cognitive Services. Users need to create a Cognitive Services Vision account to use this option. Since it takes a few seconds to send and receive the request, LabelReader supports an alternative library, [Tesseract](https://github.com/tesseract-ocr/tesseract), for faster OCR. 29 | 30 | The repository contains another model, [RotNet](https://github.com/d4nst/RotNet), to detect how much to rotate the image. This should work for most products, but it may need to be retrained to suit your needs. 31 | 32 | # Getting Started 33 | 34 | LabelReader can run in Docker. We recommend installing Docker and starting from the continuumio/miniconda3 base image: 35 | 36 | ``` 37 | docker pull continuumio/miniconda3 38 | docker run -i -t continuumio/miniconda3 /bin/bash 39 | apt update 40 | ``` 41 | 42 | Then, clone the repository: 43 | 44 | ``` 45 | git clone https://github.com/ecthros/labelReader 46 | cd labelReader 47 | ``` 48 | 49 | To install the necessary dependencies, run: 50 | 51 | `./install.sh` 52 | 53 | This script will install the necessary components and set up LabelReader to run. Once it finishes, run: 54 | 55 | `python labelReader.py [-k/-d] [-c/-t]` 56 | 57 | Make sure to specify whether you want Keras or Darknet for classification, and Cognitive Services or Tesseract for OCR. 58 | 59 | 60 | ### Use Cases 61 | This nameplate identifier can be adapted for many use cases. Identifying and analyzing parts of a picture is a very common problem, and this code is meant to be easily extendable. Simply add your own classes, extending the abstract classes given, or train your own model with the steps above. 62 | 63 | Many users might want to create a REST endpoint on Azure. This code is also included in this repository. Simply push your Docker container to Docker Hub or Azure Container Registry, extending what is written here, and follow these steps (a quick test sketch follows the list): 64 | * Make sure your container automatically launches the web app locally 65 | * The endpoint will launch at /api/v1.0/image 66 | * Navigate to [Azure](https://portal.azure.com) and log in 67 | * Press the green "Create New Item" button 68 | * Select "Web App" 69 | * Enter the App name, subscription, and resource group 70 | * Select "Docker" for the OS 71 | * Select "Container Settings" and fill in the information for your container 72 | * Create the container. 73 | Note that the default web app does not have enough RAM to run most ML models, and you may need to upgrade your plan's App Service pricing tier. 
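Once the web app is deployed, you can smoke-test the endpoint with the included `sendImage.py` (after filling in its `request_url`), or with a minimal sketch like the one below; the URL here is a placeholder you must replace with your own deployment's address:

```
import requests

# Placeholder address - substitute your own web app's hostname
url = "http://<your-app>.azurewebsites.net/api/v1.0/image"

# app.py expects the raw image bytes in the POST body
with open("samples/pic1.jpg", "rb") as f:
    response = requests.post(url, headers={"Content-Type": "application/octet-stream"}, data=f.read())

print(response.json()["return"])
```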
74 | 75 | 76 | ### Classifier Training Notes 77 | 78 | * Training can take several hours to complete, even with an excellent GPU. 79 | * There are many ways to train the classifier, but [Darknet](https://github.com/AlexeyAB/darknet) is easy to use. 80 | * Follow the steps to train [here](https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects). 81 | * You will need approximately a hundred labeled images, taken in a variety of environments and lighting conditions, to train the model. 82 | * Labeling with [VoTT](https://github.com/Microsoft/VoTT) is much easier than anything else I have found. 83 | * VoTT also generates the cfg file, the data files, and the folder structure for you. 84 | * Make sure your images share the same aspect ratio, since Darknet resizes every input to a fixed size (or just change that parameter in Darknet's cfg). 85 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | from flask import Flask, jsonify, request, abort 3 | from labelReader import RobotIdentifier # RobotIdentifier is defined in labelReader.py 4 | 5 | 6 | app = Flask(__name__) 7 | 8 | i = 0 9 | 10 | @app.route('/api/v1.0/image', methods=['POST']) 11 | def classify_image(): 12 | global i 13 | i += 1 14 | print(request) 15 | if not request.data: 16 | print(request.data) 17 | abort(400) 18 | with open("image" + str(i) + ".jpg", "wb") as myfile: 19 | myfile.write(request.data) 20 | return jsonify({'return': identifier.find_and_classify("image" + str(i) + '.jpg')}), 201 21 | 22 | 23 | @app.route('/') 24 | def index(): 25 | return "Hello, World!" 26 | 27 | if __name__ == '__main__': 28 | global identifier 29 | identifier = RobotIdentifier() 30 | app.run(debug=True, host='0.0.0.0', port=80) 31 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | 4 | def parse_args(): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument('-d', '--darknet', dest='DARKNET', action='store_true', help="Specifies to use the Darknet classifier") 7 | parser.add_argument('-k', '--keras', dest='KERAS', action='store_true', help="Specifies to use the Keras classifier") 8 | parser.add_argument('-t', '--tesseract', dest='TESSERACT', action='store_true', help="Use the local Tesseract OCR engine") 9 | parser.add_argument('-c', '--cognitive_services', dest='COGNITIVE_SERVICES', action='store_true', help="Use Cognitive Services for OCR") 10 | parser.add_argument('-s', '--key', dest="SUBSCRIPTION_KEY", default="", help="Subscription Key for Cognitive Services") 11 | parser.add_argument('-l', '--dbl', dest="DARKNET_BINARY_LOCATION", default=None, help="Location of Darknet Binary") 12 | parser.add_argument('--thresh', dest="DARKNET_THRESH", default=.25, type=float, help="Darknet threshold for successful classification (lower = more bounding boxes)") 13 | parser.add_argument('--data', dest="DARKNET_DATA_FILE", default="data/obj.data", help="Darknet data file") 14 | parser.add_argument('--cfg', dest="DARKNET_CFG_FILE", default="yolo-obj.cfg", help="Darknet configuration file") 15 | parser.add_argument('--weights', dest="DARKNET_WEIGHTS", default="yolo-obj_1600.weights", help="Weights for Darknet") 16 | parser.add_argument('-e', '--kl', dest="KERAS_LOCATION", default="keras-yolo3/", help="Location of Keras-yolo3") 17 | parser.add_argument('-r', '--show_response', dest="SHOW_RESPONSE", action='store_false',
help="Shows responses from OCR") 18 | parser.add_argument('-i', '--show_images', dest="SHOW_IMAGES", action='store_true', help="Shows images after cropping and rotating") 19 | parser.add_argument('-n', '--label_name', dest='LABEL_NAME', default='label', help="Name of label for detection") 20 | parser.add_argument('-o', '--rotnet_location', dest='ROTNET_LOCATION', default="./RotNet", help="Location of RotNet") 21 | parser.add_argument('-m', '--model_name', dest='ROTNET_MODEL_NAME', default="rotnet_models/rotnet_street_view_resnet50_keras2.hdf5", help="Location of RotNet Model") 22 | parser.add_argument('-f', '--file_name', dest='ROTNET_SAVE_FILE_NAME', default="tilted.jpg", help="Where to save for RotNet") 23 | parser.add_argument('--local', dest='LOCAL_DATABASE', action='store_true', help="Use local database") 24 | parser.add_argument('-x', '--cosmos', dest='COSMOS_DATABASE', action='store_true', help='Use Cosmos database') 25 | args = parser.parse_args() 26 | if args.KERAS == False and args.DARKNET == False: 27 | parser.error("Either Darknet or Keras must be set, add -k or -d") 28 | if args.TESSERACT == False and args.COGNITIVE_SERVICES == False: 29 | parser.error("Either Tesseract or Cognitive Services must be set, add -t or -c") 30 | if args.COGNITIVE_SERVICES == True and args.SUBSCRIPTION_KEY == "": 31 | parser.error("Cognitive Services needs a subscription key, please provide with -s") 32 | return args 33 | 34 | 35 | ## Change the following variable based on what algorithms you want to use ## 36 | global DARKNET, KERAS, TESSERACT, COGNITIVE_SERVICES, DARKNET_BINARY_LOCATION, DARKNET_THRESH, DARKNET_DATA_FILE, \ 37 | DARKNET_CFG_FILE, DARKNET_WEIGHTS, KERAS, KERAS_LOCATION, SUBSCRIPTION_KEY, SHOW_RESPONSE, SHOW_IMAGES, \ 38 | LABEL_NAME, ROTNET_LOCATION, ROTNET_MODEL_NAME, ROTNET_SAVE_FILE_NAME 39 | 40 | args = parse_args() 41 | 42 | # One of {DARKNET, KERAS} needs to be true 43 | # Specifies which classifier to use 44 | DARKNET = args.DARKNET 45 | KERAS = args.KERAS 46 | 47 | # One of {TESSERACT, COGNITIVE_SERVICES} needs to be true 48 | # Specifies which OCR to use 49 | TESSERACT = args.TESSERACT 50 | COGNITIVE_SERVICES = args.COGNITIVE_SERVICES 51 | 52 | ############################################################################ 53 | 54 | ##### Darknet Information - Change if necessary to fit your needs ##### 55 | 56 | if DARKNET: 57 | if args.DARKNET_BINARY_LOCATION == None: 58 | if os.name == 'nt': 59 | global popen_spawn 60 | from pexpect import popen_spawn 61 | DARKNET_BINARY_LOCATION = "darknet.exe" 62 | else: 63 | DARKNET_BINARY_LOCATION = "./darknet" 64 | else: 65 | DARKNET_BINARY_LOCATION = args.DARKNET_BINARY_LOCATION 66 | 67 | #### Change the following attributes if you move the files/weights #### 68 | DARKNET_THRESH = args.DARKNET_THRESH 69 | DARKNET_DATA_FILE = args.DARKNET_DATA_FILE 70 | DARKNET_CFG_FILE = args.DARKNET_CFG_FILE 71 | DARKNET_WEIGHTS = args.DARKNET_WEIGHTS 72 | 73 | ####################################################################### 74 | 75 | ##### Keras Information - Change if necessary to fit your needs ##### 76 | 77 | elif KERAS: 78 | if os.name == 'nt': 79 | global popen_spawn 80 | from pexpect import popen_spawn 81 | 82 | # Change the location of Keras-yolo3 if you 83 | # move it. You will need to change Keras-yolo's 84 | # source code with the changes for the weights. 
85 | KERAS_LOCATION = args.KERAS_LOCATION 86 | 87 | ##################################################################### 88 | 89 | #### Cognitive Services Information #### 90 | 91 | SUBSCRIPTION_KEY = args.SUBSCRIPTION_KEY 92 | SHOW_RESPONSE = args.SHOW_RESPONSE 93 | 94 | ######################################## 95 | 96 | ################ Locate_asset information ################ 97 | 98 | # Determines if we should show images after cropping them 99 | SHOW_IMAGES = args.SHOW_IMAGES 100 | # Name of the label class used for detection 101 | LABEL_NAME = args.LABEL_NAME 102 | 103 | ########################################################## 104 | 105 | ########################## RotNet Constants ########################### 106 | ### The following constants will most likely not need to be changed ### 107 | 108 | ROTNET_LOCATION = args.ROTNET_LOCATION 109 | ROTNET_MODEL_NAME = args.ROTNET_MODEL_NAME 110 | ROTNET_SAVE_FILE_NAME = args.ROTNET_SAVE_FILE_NAME 111 | 112 | ####################################################################### 113 | 114 | ####################### DATABASE INFO ####################### 115 | 116 | LOCAL_DATABASE = args.LOCAL_DATABASE 117 | COSMOS_DATABASE = args.COSMOS_DATABASE 118 | 119 | ############################################################# 120 | 121 | HOST = "" # Fill in HOST, MASTER_KEY, DATABASE_ID and COLLECTION_ID to use the Cosmos database 122 | MASTER_KEY = "" 123 | DATABASE_ID = "" 124 | COLLECTION_ID = "" 125 | -------------------------------------------------------------------------------- /data/obj.data: -------------------------------------------------------------------------------- 1 | classes = 1 2 | train = data/train.txt 3 | valid = data/test.txt 4 | names = data/obj.names 5 | backup = backup/ -------------------------------------------------------------------------------- /data/obj.names: -------------------------------------------------------------------------------- 1 | label 2 | -------------------------------------------------------------------------------- /dependencies.txt: -------------------------------------------------------------------------------- 1 | pillow 2 | requests 3 | pexpect 4 | 5 | opencv-python 6 | keras 7 | tensorflow 8 | matplotlib 9 | 10 | pyocr 11 | Cython 12 | fuzzywuzzy[speedup] 13 | pydocumentdb 14 | numpy 15 | -------------------------------------------------------------------------------- /init.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | cd /robotidentifier 4 | python app.py -------------------------------------------------------------------------------- /install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #Tested on Ubuntu 16.04. 4 | 5 | 6 | if [ "`which sudo`" = "" ]; then 7 | #if we don't have sudo, we must already be running as root 8 | if [ "$(id -u)" != "0" ]; then 9 | echo "This install script needs to be run as root."
10 | exit 1 11 | fi 12 | echo -e "Installing Python" 13 | apt install -y python 14 | echo -e "\n\nInstalling Python Dependencies\n\n" 15 | apt install -y python-pip python-tk git unzip libsm6 libxext6 tesseract-ocr python-opencv gcc wget 16 | else 17 | echo -e "Installing Python" 18 | sudo apt install -y python 19 | echo -e "\n\nInstalling Python Dependencies\n\n" 20 | sudo apt install -y python-pip python-tk git unzip libsm6 libxext6 tesseract-ocr python-opencv gcc wget 21 | fi 22 | 23 | pip install pillow requests opencv-python keras tensorflow matplotlib pexpect pyocr Cython fuzzywuzzy[speedup] pydocumentdb numpy 24 | 25 | echo -e "\n\nInstalling RotNet\n\n" 26 | git clone https://github.com/ecthros/RotNet 27 | cd RotNet 28 | mkdir rotnet_models 29 | wget https://www.dropbox.com/s/ch5917qg0j9leyj/rotnet_models.zip?dl=0 30 | unzip rotnet_models.zip?dl=0 31 | mv rotnet_* rotnet_models 32 | cd .. 33 | 34 | echo -e "\n\nDownloading Weights\n\n" 35 | wget https://www.dropbox.com/s/zh4cjvuqimgm24s/yolo-obj_1600.weights?dl=0 36 | mv yolo-obj_1600.weights?dl=0 yolo-obj_1600.weights 37 | 38 | echo -e "\n\nDownloading darknet\n\n" 39 | wget https://www.dropbox.com/s/9nxzvyyi53bi4p4/darknet?dl=0 40 | mv darknet?dl=0 darknet 41 | chmod 755 darknet 42 | 43 | echo -e "\n\nDownloading Keras-Yolo3\n\n" 44 | git clone https://github.com/qqwweee/keras-yolo3 45 | cd keras-yolo3 46 | python convert.py ../yolo-obj.cfg ../yolo-obj_1600.weights model_data/yolo.h5 47 | echo "label" > model_data/coco_classes.txt 48 | 49 | # The pre-built darknet binary downloaded above is run from the working directory. 50 | -------------------------------------------------------------------------------- /labelReader.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | from __future__ import print_function 3 | from config import * 4 | from utils.darknet_classify_image import * 5 | from utils.keras_classify_image import * 6 | from utils.azure_ocr import * 7 | from utils.tesseract_ocr import * 8 | import utils.logger as logger 9 | from utils.rotate import * 10 | from utils.lookup_database import * 11 | import sys 12 | from PIL import Image 13 | import time 14 | import os 15 | from RotNet.correct_rotation import * 16 | 17 | PYTHON_VERSION = sys.version_info[0] 18 | OS_VERSION = os.name 19 | 20 | class RobotIdentifier(): 21 | ''' Programmatically finds and determines if a picture contains an asset and where it is.
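Typical use, mirroring the __main__ block at the bottom of this file:
    identifier = RobotIdentifier()
    identifier.find_and_classify('samples/pic1.jpg')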
''' 22 | 23 | def init_vars(self): 24 | try: 25 | self.DARKNET = DARKNET 26 | self.KERAS = KERAS 27 | self.TESSERACT = TESSERACT 28 | self.COGNITIVE_SERVICES = COGNITIVE_SERVICES 29 | 30 | self.COSMOS_DATABASE = COSMOS_DATABASE 31 | self.LOCAL_DATABASE = LOCAL_DATABASE 32 | 33 | return 0 34 | except: 35 | return -1 36 | 37 | def init_classifier(self): 38 | ''' Initializes the classifier ''' 39 | try: 40 | if self.DARKNET: 41 | # Get a child process for speed considerations 42 | logger.good("Initializing Darknet") 43 | self.classifier = DarknetClassifier() 44 | elif self.KERAS: 45 | logger.good("Initializing Keras") 46 | self.classifier = KerasClassifier() 47 | if self.classifier == None or self.classifier == -1: 48 | return -1 49 | return 0 50 | except: 51 | return -1 52 | 53 | def init_ocr(self): 54 | ''' Initializes the OCR engine ''' 55 | try: 56 | if self.TESSERACT: 57 | logger.good("Initializing Tesseract") 58 | self.OCR = TesseractOCR() 59 | elif self.COGNITIVE_SERVICES: 60 | logger.good("Initializing Cognitive Services") 61 | self.OCR = AzureOCR() 62 | if self.OCR == None or self.OCR == -1: 63 | return -1 64 | return 0 65 | except: 66 | return -1 67 | 68 | def init_database(self): 69 | if self.LOCAL_DATABASE: 70 | logger.good("Initializing local database") 71 | from utils.local_database import LocalDatabase 72 | self.database = LocalDatabase() 73 | elif self.COSMOS_DATABASE: 74 | logger.good("Initializing Cosmos Database") 75 | from utils.cosmos_database import CosmosDatabase 76 | self.database = CosmosDatabase() 77 | else: 78 | self.database = -1 79 | if self.database == -1: 80 | return -1 81 | return 0 82 | 83 | 84 | def init_tabComplete(self): 85 | ''' Initializes the tab completer ''' 86 | try: 87 | if OS_VERSION == "posix": 88 | global tabCompleter 89 | global readline 90 | from utils.PythonCompleter import tabCompleter 91 | import readline 92 | comp = tabCompleter() 93 | # we want to treat '/' as part of a word, so override the delimiters 94 | readline.set_completer_delims(' \t\n;') 95 | readline.parse_and_bind("tab: complete") 96 | readline.set_completer(comp.pathCompleter) 97 | if not comp: 98 | return -1 99 | return 0 100 | except: 101 | return -1 102 | 103 | def prompt_input(self): 104 | ''' Prompts the user for input, depending on the python version. 105 | Return: The filename provided by the user. 
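Note: on POSIX systems, init_tabComplete() has enabled readline tab completion, so file paths can be tab-completed at this prompt.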
''' 106 | if PYTHON_VERSION == 3: 107 | filename = str(input(" Specify File >>> ")) 108 | elif PYTHON_VERSION == 2: 109 | filename = str(raw_input(" Specify File >>> ")) 110 | return filename 111 | 112 | from utils.locate_asset import locate_asset 113 | 114 | def initialize(self): 115 | if self.init_vars() != 0: 116 | logger.fatal("Init vars") 117 | if self.init_tabComplete() != 0: 118 | logger.fatal("Init tabcomplete") 119 | if self.init_classifier() != 0: 120 | logger.fatal("Init Classifier") 121 | if self.init_ocr() != 0: 122 | logger.fatal("Init OCR") 123 | if initialize_rotnet() != 0: 124 | logger.fatal("Init RotNet") 125 | if self.init_database() == -1: 126 | logger.info("Not using Database") 127 | 128 | def find_and_classify(self, filename): 129 | start = time.time() 130 | 131 | #### Classify Image #### 132 | logger.good("Classifying Image") 133 | coords = self.classifier.classify_image(filename) 134 | ######################## 135 | 136 | time1 = time.time() 137 | print("Classify Time: " + str(time1-start)) 138 | 139 | #### Crop/rotate Image #### 140 | logger.good("Locating Asset") 141 | cropped_images = self.locate_asset(filename, self.classifier, lines=coords) 142 | ########################### 143 | 144 | time2 = time.time() 145 | print("Rotate Time: " + str(time2-time1)) 146 | 147 | 148 | #### Perform OCR #### 149 | ocr_results = None 150 | if cropped_images == []: 151 | logger.bad("No assets found, so terminating execution") 152 | else: 153 | logger.good("Performing OCR") 154 | ocr_results = self.OCR.ocr(cropped_images) 155 | ##################### 156 | 157 | time3 = time.time() 158 | print("OCR Time: " + str(time3-time2)) 159 | 160 | end = time.time() 161 | logger.good("Elapsed: " + str(end-start)) 162 | 163 | #### Lookup Database #### 164 | if self.database != -1: 165 | products = self.database.lookup_database(ocr_results) 166 | return products 167 | else: 168 | return ocr_results 169 | ######################### 170 | 171 | def __init__(self): 172 | ''' Run RobotIdentifier! 
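initialize() wires up the classifier, OCR engine, RotNet and, optionally, a database, exiting via logger.fatal() if a required component fails.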
''' 173 | self.initialize() 174 | 175 | if __name__ == "__main__": 176 | identifier = RobotIdentifier() 177 | while True: 178 | filename = identifier.prompt_input() 179 | identifier.find_and_classify(filename) 180 | -------------------------------------------------------------------------------- /samples/pic1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ecthros/labelReader/4dea80798fca2a6bb18949f3f00671d76573c62d/samples/pic1.jpg -------------------------------------------------------------------------------- /samples/pic2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ecthros/labelReader/4dea80798fca2a6bb18949f3f00671d76573c62d/samples/pic2.jpg -------------------------------------------------------------------------------- /sendImage.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import requests 3 | import time 4 | 5 | start = time.time() 6 | 7 | if len(sys.argv) != 2: 8 | print("USAGE: sendImage.py <image_file>"); sys.exit(1) # exit so sys.argv[1] below cannot fail 9 | 10 | with open(sys.argv[1], 'rb') as myfile: 11 | image = myfile.read() 12 | 13 | headers = {'Content-Type': 'application/octet-stream'} 14 | request_url = "REQUEST_URL" 15 | 16 | response = requests.post(request_url, headers=headers, data=image) 17 | end = time.time() 18 | print(response) 19 | print(response.json()['return']) 20 | print("Time Elapsed: " + str(end-start)) 21 | -------------------------------------------------------------------------------- /utils/PythonCompleter.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import readline 4 | import glob 5 | 6 | class tabCompleter(object): 7 | ''' A simple tab completer for Linux ''' 8 | def pathCompleter(self,text,state): 9 | line = readline.get_line_buffer().split() 10 | if '~' in text: 11 | text = os.path.expanduser('~') 12 | if os.path.isdir(text): 13 | text += '/' 14 | return [x for x in glob.glob(text+'*')][state] 15 | -------------------------------------------------------------------------------- /utils/azure_ocr.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | import time 4 | from utils.ocr import OCR 5 | from config import * 6 | from io import BytesIO 7 | from typing import Tuple, Dict, List 8 | 9 | class AzureOCR(OCR): 10 | def initialize(self): 11 | self.SUBSCRIPTION_KEY = SUBSCRIPTION_KEY 12 | self.SHOW_RESPONSE = SHOW_RESPONSE 13 | 14 | def print_response(self, area:Tuple[float, float, float, float], response:Dict) -> str: 15 | ''' Prints the response from Cognitive Services.
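Also returns the concatenated text of all recognized lines, so ocr_one_image() can hand the result back to the OCR pipeline.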
16 | Input: 17 | area - Tuple describing the bounding box of the data 18 | response - The response for the image from Cognitive Services 19 | ''' 20 | txt = "" 21 | if response["status"] == "Succeeded": # only parse the lines on success, avoiding a KeyError on failure 22 | txt = "".join(line['text'] + '\n' for line in response['recognitionResult']['lines']) 23 | if self.SHOW_RESPONSE: 24 | if response["status"] == "Succeeded": 25 | #print(response['recognitionResult']['lines']) 26 | print("") 27 | print("==RESULT==" + str(area)) 28 | for line in response['recognitionResult']['lines']: 29 | print(line['text']) 30 | print("==========================") 31 | print("") 32 | else: 33 | print("Processing failed:") 34 | print(response) 35 | return txt 36 | 37 | def ocr_one_image(self, area:Tuple[float, float, float, float], image_data:object, threadList=-1, threadNum=None) -> str: 38 | ''' Performs OCR on a single image 39 | Input: 40 | area - Tuple that describes the bounding box of the data 41 | image_data - PIL image, converted to a JPEG byte string below 42 | ''' 43 | image_data = self.pic_to_string(image_data) 44 | try: 45 | request_url = "https://westus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Printed" 46 | headers = {'Ocp-Apim-Subscription-Key': self.SUBSCRIPTION_KEY, 'Content-Type': "application/octet-stream"} 47 | data = image_data 48 | 49 | # Send the POST request and parse the response 50 | response = requests.request('post', request_url, headers=headers, data=data) 51 | 52 | if response.status_code == 202: 53 | get_response = {} 54 | get_response["status"] = "Running" 55 | #print(get_response) 56 | # Continue polling until it has finished processing 57 | while get_response["status"] == "Running" or get_response["status"] == "NotStarted": 58 | #print(get_response) 59 | time.sleep(.2) 60 | r2 = requests.get(response.headers['Operation-Location'], headers={'Ocp-Apim-Subscription-Key': self.SUBSCRIPTION_KEY}) 61 | get_response = r2.json() 62 | #print(get_response) 63 | res = self.print_response(area, get_response) 64 | if threadList != -1: 65 | threadList[threadNum] = (res) 66 | return res 67 | print(response) 68 | except Exception as e: 69 | print("OCR failed") 70 | print(e) 71 | return None 72 | 73 | 74 | def pic_to_string(self, image) -> bytes: 75 | ''' Uses PIL and BytesIO to save the image to a byte string for further processing 76 | Input: image - an image opened by PIL 77 | Output: A byte string containing all the data of the picture''' 78 | output_string = BytesIO() 79 | image.save(output_string, format="JPEG") 80 | string_contents = output_string.getvalue() 81 | output_string.close() 82 | return string_contents 83 | -------------------------------------------------------------------------------- /utils/classifier.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import Tuple 3 | 4 | class Classifier(ABC): 5 | @abstractmethod 6 | def initialize(self): 7 | ''' Initialize the classifier ''' 8 | pass 9 | 10 | @abstractmethod 11 | def classify_image(self, image): 12 | ''' Classify an image. 13 | Input: The filename of the image to classify''' 14 | pass 15 | 16 | @abstractmethod 17 | def extract_info(self, line:str) -> Tuple: 18 | ''' Extract the information from a line returned by the classifier. 19 | Ex: Many programs do not return in an easily readable format, and need to be parsed. 20 | For example, a line could be: "label (90%) x:1300 y:3400 height:300 width:900". This 21 | should return the area of the bounding box.
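The returned tuple is (xmin, ymin, xmax, ymax), which is the order PIL's Image.crop() expects in locate_asset.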
''' 22 | pass 23 | 24 | def __init__(self): 25 | self.initialize() 26 | -------------------------------------------------------------------------------- /utils/cosmos_database.py: -------------------------------------------------------------------------------- 1 | from utils.database import Database 2 | import pydocumentdb.documents as documents 3 | import pydocumentdb.document_client as document_client 4 | import requests 5 | import traceback 6 | import urllib3 7 | from config import * 8 | from collections import ChainMap 9 | 10 | def test_ssl_connection(client): 11 | try: 12 | databases = list(client.ReadDatabases()) 13 | return True 14 | except requests.exceptions.SSLError as e: 15 | print("SSL error occurred. ", e) 16 | except OSError as e: 17 | print("OSError occurred. ", e) 18 | except Exception as e: 19 | print(traceback.format_exc()) 20 | return False 21 | 22 | def ObtainClient(): 23 | connection_policy = documents.ConnectionPolicy() 24 | connection_policy.SSLConfiguration = documents.SSLConfiguration() 25 | urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) 26 | connection_policy.SSLConfiguration.SSLCaCerts = False 27 | return document_client.DocumentClient(HOST, {'masterKey': MASTER_KEY}, connection_policy) 28 | 29 | def GetDocumentLink(database_id, collection_id, document_id): 30 | return "dbs/" + database_id + "/colls/" + collection_id + "/docs/" + document_id 31 | 32 | class CosmosDatabase(Database): 33 | 34 | def initialize(self): 35 | client = ObtainClient() 36 | if test_ssl_connection(client): 37 | database = client.ReadDocument(GetDocumentLink(DATABASE_ID, COLLECTION_ID, "active_products"))['products'] 38 | if database == []: 39 | return -1 40 | self.database = ChainMap(*database) 41 | else: 42 | return -1 43 | return self.database 44 | 45 | -------------------------------------------------------------------------------- /utils/darknet_classify_image.py: -------------------------------------------------------------------------------- 1 | import pexpect 2 | import os 3 | from utils.classifier import Classifier 4 | from config import * 5 | from typing import Tuple 6 | 7 | class DarknetClassifier(Classifier): 8 | 9 | def initialize(self): 10 | ''' Initialize darknet. We do this once up front for speed. 11 | Configuration (from config.py): 12 | DARKNET_THRESH (float) - specifies the threshold of detection 13 | DARKNET_DATA_FILE (string) - name of the data file for darknet 14 | DARKNET_CFG_FILE (string) - name of the configuration file 15 | DARKNET_WEIGHTS (string) - name of the pre-trained weights 16 | Return: 17 | None, but self.proc (a pexpect process) is used to interact with the running darknet process ''' 18 | command = DARKNET_BINARY_LOCATION + " detector test " + DARKNET_DATA_FILE + " " + DARKNET_CFG_FILE \ 19 | + " " + DARKNET_WEIGHTS + " -thresh " + str(DARKNET_THRESH) + " -ext_output -dont_show" 20 | if os.name == 'nt': 21 | self.proc = popen_spawn.PopenSpawn(command) 22 | else: 23 | self.proc = pexpect.spawn(command) 24 | self.proc.expect('Enter Image Path:') 25 | 26 | def classify_image(self, image:str) -> str: 27 | ''' Classifies a given image. Simply provide the name (string) of the image, and the proc to do it on. 28 | Input: 29 | image (string) - name of the saved image file 30 | self.proc (proc) - Pexpect proc to interact with 31 | Return: 32 | Returns the output from darknet, which gives the location of each bounding box.
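With -ext_output, each detection line looks roughly like 'label: 90% (left_x: 1300 top_y: 3400 width: 900 height: 300)'; extract_info() below picks these fields out of the whitespace-split line.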
''' 33 | self.proc.sendline(image) 34 | self.proc.expect('Enter Image Path:', timeout=90) 35 | res = self.proc.before 36 | return res.decode('utf-8') 37 | def extract_info(self, line:str) -> Tuple: 38 | ''' Extracts the information from a single line that contains a label. 39 | Input: line (string), a line that already contains the label 40 | Output: area (Tuple of four ints), which gives the area of the bounding box. 41 | ''' 42 | nameplate_info = line.split() 43 | nameplate_confidence = nameplate_info[1] 44 | nameplate_left_x = int(nameplate_info[3]) 45 | nameplate_top_y = int(nameplate_info[5]) 46 | nameplate_width = int(nameplate_info[7]) 47 | nameplate_height = int(nameplate_info[9][:-1]) 48 | 49 | area = (nameplate_left_x, nameplate_top_y, (nameplate_left_x + nameplate_width), (nameplate_top_y + nameplate_height)) 50 | 51 | return area 52 | -------------------------------------------------------------------------------- /utils/database.py: -------------------------------------------------------------------------------- 1 | from fuzzywuzzy import fuzz 2 | from fuzzywuzzy import process 3 | from typing import Tuple 4 | from abc import ABC, abstractmethod 5 | from utils.logger import * 6 | 7 | class Database(ABC): # inherit from ABC so @abstractmethod is actually enforced 8 | 9 | @abstractmethod 10 | def initialize(self): 11 | ''' This method should return the self.database parameter. self.database should be a dictionary 12 | with product identifiers as the keys, and values indicating whether they are enabled or not. ''' 13 | pass 14 | 15 | def __init__(self): 16 | self.database = self.initialize() 17 | 18 | def lookup_database(self, txt:Tuple[Tuple[float, float, float, float], str]): 19 | ''' Input: 20 | txt ((area, string) tuple) - Contains the bounding box of the image and the accompanying string. 21 | This method will look up the string and determine if the product is active or disabled.''' 22 | if txt is None: 23 | return 24 | products = "" 25 | for line in txt: 26 | lines = line[1].split('\n') 27 | max = 0 28 | bestGuess = "UNKNOWN" 29 | bestWord = "" 30 | keys = self.database.keys() 31 | for l in lines: 32 | for word in l.split(' '): 33 | if word != "": 34 | (guess, confidence) = process.extractOne(word, keys, scorer=fuzz.token_sort_ratio) 35 | if confidence > max: 36 | max = confidence 37 | bestGuess = guess 38 | bestWord = word 39 | 40 | if bestGuess == "UNKNOWN": 41 | print("Unknown product - " + str(line[0])) 42 | else: 43 | print(bestWord) 44 | product = (str(self.database[bestGuess]) + " product (" + bestGuess + ") - " + str(line[0]) + ", confidence: " + str(max)) 45 | products += product 46 | print(product) 47 | return products 48 | 49 | -------------------------------------------------------------------------------- /utils/keras_classify_image.py: -------------------------------------------------------------------------------- 1 | import pexpect 2 | import os 3 | from utils.classifier import Classifier 4 | from config import * 5 | from typing import Tuple 6 | 7 | class KerasClassifier(Classifier): 8 | 9 | def initialize(self): 10 | ''' Initialize the Keras-yolo model for speed concerns.
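This spawns 'python yolo_video.py --image' inside the keras-yolo3 directory (KERAS_LOCATION) and waits for its 'Input image filename:' prompt.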
11 | Return: None, but self.proc is populated with a procedure that can interface with Keras-Yolo ''' 12 | 13 | command = "python yolo_video.py --image" 14 | if os.name == 'nt': 15 | self.proc = popen_spawn.PopenSpawn(command, cwd=os.path.dirname(KERAS_LOCATION)) 16 | else: 17 | self.proc = pexpect.spawn(command, cwd=os.path.dirname(KERAS_LOCATION)) 18 | self.proc.expect('Input image filename:', timeout=900) 19 | 20 | 21 | def classify_image(self, image:str) -> str: 22 | ''' Classifies a given image using Keras-Yolo3. 23 | Should already be initialized. 24 | Input: 25 | image (string) - Provide the saved filename 26 | Returns: 27 | string of the results from Keras-Yolo3''' 28 | self.proc.sendline("../" + image) # Todo please fix this line 29 | self.proc.expect('Input image filename:', timeout=900) 30 | res = self.proc.before 31 | return res.decode('utf-8') 32 | 33 | def extract_info(self, line:str) -> Tuple: 34 | ''' Extracts the information from a single line that contains a label. 35 | Input: line (string), a line that already contains the label 36 | Output: area (Tuple of four ints), which gives the area of the bounding box. 37 | ''' 38 | nameplate_info = line.split() 39 | nameplate_confidence = nameplate_info[1] 40 | nameplate_left_x = int(nameplate_info[2][1:][:-1]) 41 | nameplate_top_y = int(nameplate_info[3][:-1]) 42 | nameplate_right_x = int(nameplate_info[4][1:][:-1]) 43 | nameplate_bottom_y = int(nameplate_info[5][:-1]) 44 | 45 | area = (nameplate_left_x, nameplate_top_y, nameplate_right_x, (nameplate_bottom_y)) 46 | 47 | return area 48 | -------------------------------------------------------------------------------- /utils/local_database.py: -------------------------------------------------------------------------------- 1 | from utils.database import Database 2 | 3 | 4 | bose_qc25 = { 5 | "065252Z80341129AE": "Active", 6 | "065252Z80571416AE": "Inactive" 7 | } 8 | 9 | DATABASE = { 10 | "715053-0010": bose_qc25 11 | } 12 | 13 | class LocalDatabase(Database): 14 | def initialize(self): 15 | return bose_qc25 16 | -------------------------------------------------------------------------------- /utils/locate_asset.py: -------------------------------------------------------------------------------- 1 | from PIL import Image 2 | from PIL import ImageFilter 3 | import utils.logger as logger 4 | from utils.rotate import rotate 5 | from config import * 6 | from typing import Tuple, List 7 | import sys 8 | i = 0 9 | def crop_image(image, area:Tuple) -> object: 10 | ''' Uses PIL to crop an image, given its area. 
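The crop is rotated with RotNet and scaled down to fit within 3200x3200 pixels before being returned.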
11 | Input: 12 | image - filename of the picture (it is opened with PIL below) 13 | area - Coordinates in tuple (xmin, ymin, xmax, ymax) format, as PIL's crop() expects ''' 14 | img = Image.open(image) 15 | cropped_image = img.crop(area) 16 | 17 | # Rotation should happen here 18 | rotated_image = rotate(cropped_image) 19 | 20 | size = (3200, 3200) 21 | rotated_image.thumbnail(size, Image.ANTIALIAS) 22 | global i 23 | rotated_image.save("asdf" + str(i) + ".jpg", "JPEG") # save each crop to disk for inspection 24 | i += 1 25 | 26 | if SHOW_IMAGES: 27 | logger.good("Showing cropped image") 28 | rotated_image.show() 29 | 30 | return rotated_image 31 | 32 | 33 | def locate_asset(self, image, classifier, lines="") -> List: 34 | ''' Determines where an asset is in the picture, extracting 35 | the bounding box of each detected label from the classifier 36 | output and cropping it out of the original image. 37 | Returns: 38 | [(area, image)] 39 | Area is the coordinates of the bounding box 40 | Image is the image, opened by PIL.''' 41 | cropped_images = [] 42 | 43 | for line in str(lines).split('\n'): 44 | 45 | if LABEL_NAME + ":" in line: 46 | # Extract the nameplate info 47 | area = classifier.extract_info(line) 48 | # Open image 49 | cropped_images.append((area, crop_image(image, area))) 50 | if cropped_images == []: 51 | logger.bad("No label found in image.") 52 | else: 53 | logger.good("Found " + str(len(cropped_images)) + " label(s) in image.") 54 | 55 | return cropped_images 56 | -------------------------------------------------------------------------------- /utils/logger.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | def good(message:str): 4 | ''' Prints a message with [+] at the front to signify success ''' 5 | print("[+] " + str(message)) 6 | 7 | def bad(message:str): 8 | ''' Prints a message with [-] at the front to signify failure ''' 9 | print("[-] " + str(message)) 10 | def info(message:str): 11 | ''' Prints a message to signify information ''' 12 | print("[ ] " + str(message)) 13 | 14 | def fatal(failure:str): 15 | print("[/] " + str(failure) + " failed, exiting now") 16 | sys.exit(1) 17 | -------------------------------------------------------------------------------- /utils/lookup_database.py: -------------------------------------------------------------------------------- 1 | from fuzzywuzzy import fuzz 2 | from fuzzywuzzy import process 3 | from typing import Tuple 4 | bose_qc25 = { 5 | "065252Z80341129AE": "Active", 6 | "065252Z80571416AE": "Inactive" 7 | } 8 | 9 | DATABASE = { 10 | "715053-0010": bose_qc25 11 | } 12 | 13 | 14 | def lookup_database(txt:Tuple[Tuple[float, float, float, float], str]): 15 | ''' Input: 16 | txt ((area, string) tuple) - Contains the bounding box of the image and the accompanying string.
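Each whitespace-separated word of the OCR text is fuzzy-matched against the product keys with fuzzywuzzy's token_sort_ratio, and the best-scoring key is reported.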
17 | This method will look up the string and determine if the product is active or disabled.''' 18 | if txt is None: 19 | return 20 | for line in txt: 21 | lines = line[1].split('\n') 22 | max = 0 23 | bestGuess = "UNKNOWN" 24 | bestWord = "" 25 | keys = bose_qc25.keys() 26 | for l in lines: 27 | for word in l.split(' '): 28 | if word != "": 29 | (guess, confidence) = process.extractOne(word, keys, scorer=fuzz.token_sort_ratio) 30 | if confidence > max: 31 | max = confidence 32 | bestGuess = guess 33 | bestWord = word 34 | 35 | if bestGuess == "UNKNOWN": 36 | print("Unknown product - " + str(line[0])) 37 | else: 38 | print(bestWord) 39 | print(str(bose_qc25[bestGuess]) + " product (" + bestGuess + ") - " + str(line[0]) + ", confidence: " + str(max)) 40 | 41 | -------------------------------------------------------------------------------- /utils/ocr.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import List 3 | import threading 4 | 5 | class OCR(ABC): 6 | @abstractmethod 7 | def initialize(self): 8 | ''' Initialize the OCR ''' 9 | pass 10 | 11 | @abstractmethod 12 | def ocr_one_image(self, area, image, threadList=-1, threadNum=None): 13 | ''' OCR a single image. 14 | Input: the bounding-box area and a pre-processed image, opened by PIL 15 | Return: the text recognized in the image''' 16 | pass 17 | 18 | def ocr(self, images:List) -> List: 19 | '''Runs OCR on each cropped image in its own thread. 20 | Input: images - a list of (area, image) tuples 21 | Returns a list of (area, text) results from the configured OCR engine.''' 22 | threads = [] 23 | threadResults = ["" for i in range(len(images))] 24 | threadNum = 0 25 | results = [] 26 | for image in images: 27 | t = threading.Thread(target=self.ocr_one_image, args=(image[0], image[1]), kwargs={'threadList':threadResults, 'threadNum':threadNum}) 28 | 29 | t.start() 30 | threads.append(t) 31 | threadNum += 1 32 | 33 | for t in threads: 34 | t.join() 35 | i = 0 36 | for result in threadResults: 37 | results.append((images[i][0], result)) 38 | i += 1 39 | return results 40 | 41 | def __init__(self): 42 | self.initialize() 43 | -------------------------------------------------------------------------------- /utils/rotate.py: -------------------------------------------------------------------------------- 1 | from PIL import Image 2 | import utils.logger as logger 3 | import subprocess 4 | import os 5 | import pexpect 6 | from RotNet.correct_rotation import * 7 | from config import * 8 | import time 9 | 10 | def initialize_rotnet() -> int: 11 | ''' For speed concerns, let's load up the model first 12 | Head to the RotNet directory and use correct_rotation to load the model ''' 13 | try: 14 | logger.good("Initializing RotNet") 15 | init_rotnet(ROTNET_LOCATION + "/" + ROTNET_MODEL_NAME) 16 | return 0 17 | except: 18 | return -1 19 | 20 | 21 | def rotate(image:object) -> object: 22 | ''' Uses RotNet's Keras/Tensorflow algorithm to rotate an image.
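The image is first saved to ROTNET_SAVE_FILE_NAME, corrected in place by RotNet's rotate_image(), and then reloaded with PIL.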
23 | Input: image, opened with PIL 24 | Output: Rotated image ''' 25 | 26 | # We need to save the file first for processing 27 | image.save(ROTNET_SAVE_FILE_NAME, "JPEG") 28 | 29 | logger.good("Rotating Image") 30 | rotate_image(ROTNET_SAVE_FILE_NAME) 31 | image = Image.open(ROTNET_SAVE_FILE_NAME) 32 | return image 33 | -------------------------------------------------------------------------------- /utils/tesseract_ocr.py: -------------------------------------------------------------------------------- 1 | from utils.ocr import OCR 2 | import pyocr 3 | import pyocr.builders 4 | import sys 5 | from PIL import Image 6 | from typing import Tuple 7 | 8 | class TesseractOCR(OCR): 9 | 10 | def initialize(self): 11 | ''' Initialize Tesseract and load it up for speed ''' 12 | tools = pyocr.get_available_tools() 13 | if len(tools) == 0: 14 | print("No tools found, do you have Tesseract installed?") 15 | sys.exit(1) 16 | self.tool = tools[0] 17 | self.langs = self.tool.get_available_languages() 18 | 19 | def ocr_one_image(self, area, image, threadList=-1, threadNum=None): 20 | print("Starting image...") 21 | txt = self.tool.image_to_string(image, lang=self.langs[0], builder=pyocr.builders.TextBuilder()) 22 | print("==RESULT==" + str(area) + "\n" + txt + "\n==========================") 23 | if threadList != -1: 24 | threadList[threadNum] = txt 25 | return txt 26 | -------------------------------------------------------------------------------- /yolo-obj.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | ## Testing 3 | #batch=1 4 | #subdivisions=1 5 | # Training 6 | batch=32 7 | subdivisions=8 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 2500 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | 
pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | 
size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 
| [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=18 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=1 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=18 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=1 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=18 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 
62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=1 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | 790 | --------------------------------------------------------------------------------