├── .gitignore
├── README.md
├── app.py
├── config.py
├── data
│   ├── obj.data
│   └── obj.names
├── dependencies.txt
├── init.sh
├── install.sh
├── labelReader.py
├── samples
│   ├── pic1.jpg
│   └── pic2.jpg
├── sendImage.py
├── utils
│   ├── PythonCompleter.py
│   ├── azure_ocr.py
│   ├── classifier.py
│   ├── cosmos_database.py
│   ├── darknet_classify_image.py
│   ├── database.py
│   ├── keras_classify_image.py
│   ├── local_database.py
│   ├── locate_asset.py
│   ├── logger.py
│   ├── lookup_database.py
│   ├── ocr.py
│   ├── rotate.py
│   └── tesseract_ocr.py
└── yolo-obj.cfg
/.gitignore:
--------------------------------------------------------------------------------
1 | *.jpg
2 | utils/ocr.py
3 | # Byte-compiled / optimized / DLL files
4 | __pycache__/
5 | *.py[cod]
6 | *$py.class
7 |
8 | # C extensions
9 | *.so
10 |
11 | # Distribution / packaging
12 | .Python
13 | build/
14 | develop-eggs/
15 | dist/
16 | downloads/
17 | eggs/
18 | .eggs/
19 | lib/
20 | lib64/
21 | parts/
22 | sdist/
23 | var/
24 | wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | .hypothesis/
50 | .pytest_cache/
51 |
52 | # Translations
53 | *.mo
54 | *.pot
55 |
56 | # Django stuff:
57 | *.log
58 | local_settings.py
59 | db.sqlite3
60 |
61 | # Flask stuff:
62 | instance/
63 | .webassets-cache
64 |
65 | # Scrapy stuff:
66 | .scrapy
67 |
68 | # Sphinx documentation
69 | docs/_build/
70 |
71 | # PyBuilder
72 | target/
73 |
74 | # Jupyter Notebook
75 | .ipynb_checkpoints
76 |
77 | # pyenv
78 | .python-version
79 |
80 | # celery beat schedule file
81 | celerybeat-schedule
82 |
83 | # SageMath parsed files
84 | *.sage.py
85 |
86 | # Environments
87 | .env
88 | .venv
89 | env/
90 | venv/
91 | ENV/
92 | env.bak/
93 | venv.bak/
94 |
95 | # Spyder project settings
96 | .spyderproject
97 | .spyproject
98 |
99 | # Rope project settings
100 | .ropeproject
101 |
102 | # mkdocs documentation
103 | /site
104 |
105 | # mypy
106 | .mypy_cache/
107 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | LabelReader is a general Machine Learning-based solution to identifying labels in a picture.
2 |
3 | Running OCR on a simple picture is a common, well-solved problem. But what if the picture is complicated, contains many items, and includes words that aren't important to the user? LabelReader finds the important label in the picture, crops it out, rotates it, and then reads it to pinpoint the object you're looking for.
4 |
5 | # Demonstration
6 |
7 |
8 |
9 |
10 | # The Approach
11 |
12 | The identifier's approach is straightforward (a minimal sketch of this pipeline follows the list):
13 |
14 | 1. Determine if a nameplate/asset is in the picture
15 | 2. Identify where that nameplate is
16 | 3. Crop out the relevant asset
17 | 4. Rotate the cropped picture so the text is readable
18 | 5. Read characters in the cropped picture
19 | 6. Find the relevant information in a database and present it to the user
20 |
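The following is a simplified sketch of how these steps map onto the code in this repository; the real flow is `RobotIdentifier.find_and_classify` in labelReader.py. Note that `config.py` parses the command-line flags at import time, so the snippet only works when invoked with the same flags as labelReader.py (for example `-d -t`).

```
# Simplified sketch of the pipeline implemented by labelReader.py
from labelReader import RobotIdentifier

identifier = RobotIdentifier()                               # steps 1-2: load the YOLO classifier (plus RotNet and OCR)
results = identifier.find_and_classify("samples/pic1.jpg")   # steps 3-6: crop, rotate, OCR, database lookup
print(results)                                               # OCR text, or product matches if a database is enabled
```
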
21 | ### Details
22 |
23 | LabelReader uses the [Yolov3 algorithm](https://pjreddie.com) for object detection. The user can choose between the following to interact with the algorithm:
24 | * [Darknet](https://github.com/AlexeyAB/darknet) (Fast, C Implementation)
25 | * [Keras-Yolov3](https://github.com/qqwweee/keras-yolo3) (Python Implementation)
26 |
27 | This repository contains a model that has been trained on labels for headphones, and will need to be tuned for custom images.
28 | For Optical Character Recognition, LabelReader sends the processed images to Azure Cognitive Services. Users need to create a Cognitive Services Vision account to use this option. Because each request takes a few seconds to send and receive, LabelReader also supports an alternative library, [Tesseract](https://github.com/tesseract-ocr/tesseract), for faster, local OCR.
29 |
30 | The repository uses another model, [RotNet](https://github.com/d4nst/RotNet), to detect how much to rotate the image. This should work for most products, but may need to be retrained to suit your needs.
31 |
32 | # Getting Started
33 |
34 | LabelReader can run in Docker. It is recommended to install Docker and start from the continuumio/miniconda3 base image:
35 |
36 | ```
37 | docker pull continuumio/miniconda3
38 | docker run -i -t continuumio/miniconda3 /bin/bash
39 | apt update
40 | ```
41 |
42 | Then, clone the repository:
43 |
44 | ```
45 | git clone https://github.com/ecthros/labelReader
46 | cd labelReader
47 | ```
48 |
49 | To install necessary dependencies, run:
50 |
51 | `./install.sh`
52 |
53 | This script will install necessary components and set up LabelReader to run. Once finished, run:
54 |
55 | `python labelReader.py [-k/-d] [-c/-t]`
56 |
57 | Make sure to specify if you want Keras or Darknet to classify, and Cognitive Services or Tesseract for OCR.
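For example, `python labelReader.py -d -t` uses Darknet and Tesseract, while `python labelReader.py -k -c -s <subscription_key>` uses Keras and Cognitive Services (Cognitive Services requires the `-s` subscription key; see `config.py` for the full list of options).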
58 |
59 |
60 | ### Use Cases
61 | This nameplate identifier can be adapted to many use cases. Identifying and analyzing parts of a picture is a very common problem, and this code is meant to be easily extensible. Simply add your own classes, extending the abstract classes given, or train your own model with the steps in the Classifier Training Notes below.
62 |
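For instance, a custom detector can be plugged in by subclassing `Classifier` from `utils/classifier.py`. The stub below is a hypothetical sketch (the class name and the detection line it returns are placeholders, modeled on the Darknet output format used elsewhere in this repository):

```
# Hypothetical example of extending the Classifier abstract class (utils/classifier.py).
# MyClassifier and the detection line format are placeholders for your own model.
from typing import Tuple
from utils.classifier import Classifier

class MyClassifier(Classifier):
    def initialize(self):
        ''' Load your model once; called automatically by Classifier.__init__ '''
        self.model = None  # e.g. load weights from disk here

    def classify_image(self, image: str) -> str:
        ''' Return one line per detection, in whatever format extract_info parses '''
        return "label: 90%\t(left_x: 10 top_y: 20 width: 100 height: 50)"

    def extract_info(self, line: str) -> Tuple:
        ''' Parse one detection line into an (xmin, ymin, xmax, ymax) bounding box '''
        parts = line.split()
        x, y, w, h = int(parts[3]), int(parts[5]), int(parts[7]), int(parts[9][:-1])
        return (x, y, x + w, y + h)
```
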
63 | Many users might want to create a REST endpoint on Azure; the code for this (app.py) is also included in this repository. Simply push your Docker container to Docker Hub or Azure Container Registry, extending what is written here, and follow these steps (an example request is shown after them):
64 | * Make sure your container automatically launches the web app locally
65 | * The endpoint will launch at /api/v1.0/image
66 | * Navigate to [Azure](https://portal.azure.com) and log in
67 | * Press the green "Create New Item" button
68 | * Select "Web App"
69 | * Enter the App name, subscription, and resource group
70 | * Select "Docker" for the OS
71 | * Select "Container Settings" and fill in the information for your container
72 | * Create the container.
73 | Note that the default web app does not have enough RAM to run most ML models, so you may need to move your plan to a higher App Service pricing tier.
74 |
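Once deployed, the endpoint accepts the raw bytes of a JPEG in the POST body, the same way sendImage.py sends them. A minimal client sketch (the host name is a placeholder for your deployed web app):

```
# Hypothetical client for the /api/v1.0/image endpoint (see app.py and sendImage.py).
# Replace the host with the URL of your deployed web app.
import requests

with open("samples/pic1.jpg", "rb") as f:
    image = f.read()

response = requests.post(
    "http://<your-app>.azurewebsites.net/api/v1.0/image",
    headers={"Content-Type": "application/octet-stream"},
    data=image,
)
print(response.json()["return"])   # OCR text, or product matches if a database is enabled
```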
75 |
76 | ### Classifier Training Notes
77 |
78 | * Training can take several hours to complete, even with an excellent GPU.
79 | * There are many ways to train the classifier, but [Darknet](https://github.com/AlexeyAB/darknet) is easy to use.
80 | * Follow the steps to train [here](https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects).
81 | * You will need approximately a hundred labeled images, taken in a variety of environments and lighting conditions, to train the model.
82 | * Labeling with [VoTT](https://github.com/Microsoft/VoTT) is much easier than anything else I have found.
83 | * VoTT also creates the cfg and data files, and the folder structure, for you.
84 | * Make sure your images share the same aspect ratio, since Darknet resizes them to a fixed size (or change this parameter in Darknet's configuration).
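
The `data/obj.data` and `data/obj.names` files in this repository show the expected Darknet layout; `train.txt` and `test.txt` referenced there are plain text files listing the training and validation image paths, one per line.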
85 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | from flask import Flask, jsonify, request, abort
3 | from labelReader import RobotIdentifier
4 |
5 |
6 | app = Flask(__name__)
7 |
8 | i = 0
9 |
10 | @app.route('/api/v1.0/image', methods=['POST'])
11 | def classify_image():
12 | global i
13 | i += 1
14 | print(request)
15 | if not request.data:
16 | print(request.data)
17 | abort(400)
18 | with open("image" + str(i) + ".jpg", "wb") as myfile:
19 | myfile.write(request.data)
20 | return jsonify({'return': identifier.find_and_classify("image" + str(i) + '.jpg')}), 201
21 |
22 |
23 | @app.route('/')
24 | def index():
25 | return "Hello, World!"
26 |
27 | if __name__ == '__main__':
28 | global identifier
29 | identifier = RobotIdentifier()
30 | app.run(debug=True, host='0.0.0.0', port=80)
31 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | import os
2 | import argparse
3 |
4 | def parse_args():
5 | parser = argparse.ArgumentParser()
6 | parser.add_argument('-d', '--darknet', dest='DARKNET', action='store_true', help="Specifies to use the Darknet classifier")
7 | parser.add_argument('-k', '--keras', dest='KERAS', action='store_true', help="Specifies to use the Keras classifier")
8 | parser.add_argument('-t', '--tesseract', dest='TESSERACT', action='store_true', help="Use the local Tesseract OCR engine")
9 | parser.add_argument('-c', '--cognitive_services', dest='COGNITIVE_SERVICES', action='store_true', help="Use Cognitive Services for OCR")
10 | parser.add_argument('-s', '--key', dest="SUBSCRIPTION_KEY", default="", help="Subscription Key for Cognitive Services")
11 | parser.add_argument('-l', '--dbl', dest="DARKNET_BINARY_LOCATION", default=None, help="Location of Darknet Binary")
12 | parser.add_argument('--thresh', dest="DARKNET_THRESH", default=.25, type=float, help="Darknet threshold for successful classification (lower = more bounding boxes)")
13 | parser.add_argument('--data', dest="DARKNET_DATA_FILE", default="data/obj.data", help="Darknet data file")
14 | parser.add_argument('--cfg', dest="DARKNET_CFG_FILE", default="yolo-obj.cfg", help="Darknet configuration file")
15 | parser.add_argument('--weights', dest="DARKNET_WEIGHTS", default="yolo-obj_1600.weights", help="Weights for Darknet")
16 | parser.add_argument('-e', '--kl', dest="KERAS_LOCATION", default="keras-yolo3/", help="Location of Keras-yolo3")
17 | parser.add_argument('-r', '--show_response', dest="SHOW_RESPONSE", action='store_false', help="Suppress printing of OCR responses (shown by default)")
18 | parser.add_argument('-i', '--show_images', dest="SHOW_IMAGES", action='store_true', help="Shows images after cropping and rotating")
19 | parser.add_argument('-n', '--label_name', dest='LABEL_NAME', default='label', help="Name of label for detection")
20 | parser.add_argument('-o', '--rotnet_location', dest='ROTNET_LOCATION', default="./RotNet", help="Location of RotNet")
21 | parser.add_argument('-m', '--model_name', dest='ROTNET_MODEL_NAME', default="rotnet_models/rotnet_street_view_resnet50_keras2.hdf5", help="Location of RotNet Model")
22 | parser.add_argument('-f', '--file_name', dest='ROTNET_SAVE_FILE_NAME', default="tilted.jpg", help="Where to save for RotNet")
23 | parser.add_argument('--local', dest='LOCAL_DATABASE', action='store_true', help="Use local database")
24 | parser.add_argument('-x', '--cosmos', dest='COSMOS_DATABASE', action='store_true', help='Use Cosmos database')
25 | args = parser.parse_args()
26 | if args.KERAS == False and args.DARKNET == False:
27 | parser.error("Either Darknet or Keras must be set, add -k or -d")
28 | if args.TESSERACT == False and args.COGNITIVE_SERVICES == False:
29 | parser.error("Either Tesseract or Cognitive Services must be set, add -t or -c")
30 | if args.COGNITIVE_SERVICES == True and args.SUBSCRIPTION_KEY == "":
31 | parser.error("Cognitive Services needs a subscription key, please provide with -s")
32 | return args
33 |
34 |
35 | ## Change the following variables based on which algorithms you want to use ##
36 | global DARKNET, KERAS, TESSERACT, COGNITIVE_SERVICES, DARKNET_BINARY_LOCATION, DARKNET_THRESH, DARKNET_DATA_FILE, \
37 | DARKNET_CFG_FILE, DARKNET_WEIGHTS, KERAS, KERAS_LOCATION, SUBSCRIPTION_KEY, SHOW_RESPONSE, SHOW_IMAGES, \
38 | LABEL_NAME, ROTNET_LOCATION, ROTNET_MODEL_NAME, ROTNET_SAVE_FILE_NAME
39 |
40 | args = parse_args()
41 |
42 | # One of {DARKNET, KERAS} needs to be true
43 | # Specifies which classifier to use
44 | DARKNET = args.DARKNET
45 | KERAS = args.KERAS
46 |
47 | # One of {TESSERACT, COGNITIVE_SERVICES} needs to be true
48 | # Specifies which OCR to use
49 | TESSERACT = args.TESSERACT
50 | COGNITIVE_SERVICES = args.COGNITIVE_SERVICES
51 |
52 | ############################################################################
53 |
54 | ##### Darknet Information - Change if necessary to fit your needs #####
55 |
56 | if DARKNET:
57 | if args.DARKNET_BINARY_LOCATION == None:
58 | if os.name == 'nt':
59 | global popen_spawn
60 | from pexpect import popen_spawn
61 | DARKNET_BINARY_LOCATION = "darknet.exe"
62 | else:
63 | DARKNET_BINARY_LOCATION = "./darknet"
64 | else:
65 | DARKNET_BINARY_LOCATION = args.DARKNET_BINARY_LOCATION
66 |
67 | #### Change the following attributes if you move the files/weights ####
68 | DARKNET_THRESH = args.DARKNET_THRESH
69 | DARKNET_DATA_FILE = args.DARKNET_DATA_FILE
70 | DARKNET_CFG_FILE = args.DARKNET_CFG_FILE
71 | DARKNET_WEIGHTS = args.DARKNET_WEIGHTS
72 |
73 | #######################################################################
74 |
75 | ##### Keras Information - Change if necessary to fit your needs #####
76 |
77 | elif KERAS:
78 | if os.name == 'nt':
79 | global popen_spawn
80 | from pexpect import popen_spawn
81 |
82 | # Change the location of Keras-yolo3 if you
83 | # move it. You will need to change Keras-yolo's
84 | # source code with the changes for the weights.
85 | KERAS_LOCATION = args.KERAS_LOCATION
86 |
87 | #####################################################################
88 |
89 | #### Cognitive Services Information ####
90 |
91 | SUBSCRIPTION_KEY = args.SUBSCRIPTION_KEY
92 | SHOW_RESPONSE = args.SHOW_RESPONSE
93 |
94 | ########################################
95 |
96 | ################ Locate_asset information ################
97 |
98 | # Determines if we should show images after cropping them
99 | SHOW_IMAGES = args.SHOW_IMAGES
100 | # Name of the labels
101 | LABEL_NAME = args.LABEL_NAME
102 |
103 | ##########################################################
104 |
105 | ########################## RotNet Constants ###########################
106 | ### The following constants will most likely not need to be changed ###
107 |
108 | ROTNET_LOCATION = args.ROTNET_LOCATION
109 | ROTNET_MODEL_NAME = args.ROTNET_MODEL_NAME
110 | ROTNET_SAVE_FILE_NAME = args.ROTNET_SAVE_FILE_NAME
111 |
112 | #######################################################################
113 |
114 | ####################### DATABASE INFO #######################
115 |
116 | LOCAL_DATABASE = args.LOCAL_DATABASE
117 | COSMOS_DATABASE = args.COSMOS_DATABASE
118 |
119 | #############################################################
120 |
121 | HOST = ""
122 | MASTER_KEY = ""
123 | DATABASE_ID = ""
124 | COLLECTION_ID = ""
125 |
--------------------------------------------------------------------------------
/data/obj.data:
--------------------------------------------------------------------------------
1 | classes = 1
2 | train = data/train.txt
3 | valid = data/test.txt
4 | names = data/obj.names
5 | backup = backup/
--------------------------------------------------------------------------------
/data/obj.names:
--------------------------------------------------------------------------------
1 | label
2 |
--------------------------------------------------------------------------------
/dependencies.txt:
--------------------------------------------------------------------------------
1 | pillow
2 | requests
3 | pexpect
4 |
5 | opencv-python
6 | keras
7 | tensorflow
8 | matplotlib
9 |
10 |
--------------------------------------------------------------------------------
/init.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | cd /robotidentifier
4 | python app.py
--------------------------------------------------------------------------------
/install.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | #Tested on Ubuntu 16.04.
4 |
5 |
6 | if [ "`which sudo`" = "" ]; then
7 | #if we don't have sudo, we must be running as root
8 | if [ "$(id -u)" != "0" ]; then
9 | echo "This install script needs to be run as root."
10 | exit -1
11 | fi
12 | echo -e "Installing Python"
13 | apt install -y python
14 | echo -e "\n\nInstalling Python Dependencies\n\n"
15 | apt install -y python-pip python-tk git unzip libsm6 libxext6 tesseract-ocr python-opencv gcc wget
16 | else
17 | echo -e "Installing Python"
18 | sudo apt install -y python
19 | echo -e "\n\nInstalling Python Dependencies\n\n"
20 | sudo apt install -y python-pip python-tk git unzip libsm6 libxext6 tesseract-ocr python-opencv gcc wget
21 | fi
22 |
23 | pip install pillow requests opencv-python keras tensorflow matplotlib pexpect pyocr Cython fuzzywuzzy[speedup] pydocumentdb numpy
24 |
25 | echo -e "\n\nInstalling RotNet\n\n"
26 | git clone https://github.com/ecthros/RotNet
27 | cd RotNet
28 | mkdir rotnet_models
29 | wget https://www.dropbox.com/s/ch5917qg0j9leyj/rotnet_models.zip?dl=0
30 | unzip rotnet_models.zip?dl=0
31 | mv rotnet_* rotnet_models
32 | cd ..
33 |
34 | echo -e "\n\nDownloading Weights\n\n"
35 | wget https://www.dropbox.com/s/zh4cjvuqimgm24s/yolo-obj_1600.weights?dl=0
36 | mv yolo-obj_1600.weights?dl=0 yolo-obj_1600.weights
37 |
38 | echo -e "\n\nDownloading darknet\n\n"
39 | wget https://www.dropbox.com/s/9nxzvyyi53bi4p4/darknet?dl=0
40 | mv darknet?dl=0 darknet
41 | chmod 755 darknet
42 |
43 | echo -e "\n\nDownloading Keras-Yolo3\n\n"
44 | git clone https://github.com/qqwweee/keras-yolo3
45 | cd keras-yolo3
46 | python convert.py ../yolo-obj.cfg ../yolo-obj_1600.weights model_data/yolo.h5
47 | echo "label" > model_data/coco_classes.txt
48 |
49 | # The darknet binary downloaded above is needed at runtime; it lives in the current directory.
50 |
--------------------------------------------------------------------------------
/labelReader.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | from __future__ import print_function
3 | from config import *
4 | from utils.darknet_classify_image import *
5 | from utils.keras_classify_image import *
6 | from utils.azure_ocr import *
7 | from utils.tesseract_ocr import *
8 | import utils.logger as logger
9 | from utils.rotate import *
10 | from utils.lookup_database import *
11 | import sys
12 | from PIL import Image
13 | import time
14 | import os
15 | from RotNet.correct_rotation import *
16 |
17 | PYTHON_VERSION = sys.version_info[0]
18 | OS_VERSION = os.name
19 |
20 | class RobotIdentifier():
21 | ''' Programmatically determines whether a picture contains an asset and where it is. '''
22 |
23 | def init_vars(self):
24 | try:
25 | self.DARKNET = DARKNET
26 | self.KERAS = KERAS
27 | self.TESSERACT = TESSERACT
28 | self.COGNITIVE_SERVICES = COGNITIVE_SERVICES
29 |
30 | self.COSMOS_DATABASE = COSMOS_DATABASE
31 | self.LOCAL_DATABASE = LOCAL_DATABASE
32 |
33 | return 0
34 | except:
35 | return -1
36 |
37 | def init_classifier(self):
38 | ''' Initializes the classifier '''
39 | try:
40 | if self.DARKNET:
41 | # Get a child process for speed considerations
42 | logger.good("Initializing Darknet")
43 | self.classifier = DarknetClassifier()
44 | elif self.KERAS:
45 | logger.good("Initializing Keras")
46 | self.classifier = KerasClassifier()
47 | if self.classifier == None or self.classifier == -1:
48 | return -1
49 | return 0
50 | except:
51 | return -1
52 |
53 | def init_ocr(self):
54 | ''' Initializes the OCR engine '''
55 | try:
56 | if self.TESSERACT:
57 | logger.good("Initializing Tesseract")
58 | self.OCR = TesseractOCR()
59 | elif self.COGNITIVE_SERVICES:
60 | logger.good("Initializing Cognitive Services")
61 | self.OCR = AzureOCR()
62 | if self.OCR == None or self.OCR == -1:
63 | return -1
64 | return 0
65 | except:
66 | return -1
67 |
68 | def init_database(self):
69 | if self.LOCAL_DATABASE:
70 | logger.good("Initializing local database")
71 | from utils.local_database import LocalDatabase
72 | self.database = LocalDatabase()
73 | elif self.COSMOS_DATABASE:
74 | logger.good("Initializing Cosmos Database")
75 | from utils.cosmos_database import CosmosDatabase
76 | self.database = CosmosDatabase()
77 | else:
78 | self.database = -1
79 | if self.database == -1:
80 | return -1
81 | return 0
82 |
83 |
84 | def init_tabComplete(self):
85 | ''' Initializes the tab completer '''
86 | try:
87 | if OS_VERSION == "posix":
88 | global tabCompleter
89 | global readline
90 | from utils.PythonCompleter import tabCompleter
91 | import readline
92 | comp = tabCompleter()
93 | # we want to treat '/' as part of a word, so override the delimiters
94 | readline.set_completer_delims(' \t\n;')
95 | readline.parse_and_bind("tab: complete")
96 | readline.set_completer(comp.pathCompleter)
97 | if not comp:
98 | return -1
99 | return 0
100 | except:
101 | return -1
102 |
103 | def prompt_input(self):
104 | ''' Prompts the user for input, depending on the python version.
105 | Return: The filename provided by the user. '''
106 | if PYTHON_VERSION == 3:
107 | filename = str(input(" Specify File >>> "))
108 | elif PYTHON_VERSION == 2:
109 | filename = str(raw_input(" Specify File >>> "))
110 | return filename
111 |
112 | from utils.locate_asset import locate_asset  # imported into the class body so it is bound as the self.locate_asset method
113 |
114 | def initialize(self):
115 | if self.init_vars() != 0:
116 | logger.fatal("Init vars")
117 | if self.init_tabComplete() != 0:
118 | logger.fatal("Init tabcomplete")
119 | if self.init_classifier() != 0:
120 | logger.fatal("Init Classifier")
121 | if self.init_ocr() != 0:
122 | logger.fatal("Init OCR")
123 | if initialize_rotnet() != 0:
124 | logger.fatal("Init RotNet")
125 | if self.init_database() == -1:
126 | logger.info("Not using Database")
127 |
128 | def find_and_classify(self, filename):
129 | start = time.time()
130 |
131 | #### Classify Image ####
132 | logger.good("Classifying Image")
133 | coords = self.classifier.classify_image(filename)
134 | ########################
135 |
136 | time1 = time.time()
137 | print("Classify Time: " + str(time1-start))
138 |
139 | #### Crop/rotate Image ####
140 | logger.good("Locating Asset")
141 | cropped_images = self.locate_asset(filename, self.classifier, lines=coords)
142 | ###########################
143 |
144 | time2 = time.time()
145 | print("Rotate Time: " + str(time2-time1))
146 |
147 |
148 | #### Perform OCR ####
149 | ocr_results = None
150 | if cropped_images == []:
151 | logger.bad("No assets found, so terminating execution")
152 | else:
153 | logger.good("Performing OCR")
154 | ocr_results = self.OCR.ocr(cropped_images)
155 | #####################
156 |
157 | time3 = time.time()
158 | print("OCR Time: " + str(time3-time2))
159 |
160 | end = time.time()
161 | logger.good("Elapsed: " + str(end-start))
162 |
163 | #### Lookup Database ####
164 | if self.database != -1:
165 | products = self.database.lookup_database(ocr_results)
166 | return products
167 | else:
168 | return ocr_results
169 | #########################
170 |
171 | def __init__(self):
172 | ''' Run RobotIdentifier! '''
173 | self.initialize()
174 |
175 | if __name__ == "__main__":
176 | identifier = RobotIdentifier()
177 | while True:
178 | filename = identifier.prompt_input()
179 | identifier.find_and_classify(filename)
180 |
--------------------------------------------------------------------------------
/samples/pic1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ecthros/labelReader/4dea80798fca2a6bb18949f3f00671d76573c62d/samples/pic1.jpg
--------------------------------------------------------------------------------
/samples/pic2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ecthros/labelReader/4dea80798fca2a6bb18949f3f00671d76573c62d/samples/pic2.jpg
--------------------------------------------------------------------------------
/sendImage.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import requests
3 | import time
4 |
5 | start = time.time()
6 |
7 | if len(sys.argv) != 2:
8 | print("USAGE: python sendImage.py <image_file>")
9 | sys.exit(1)
10 | with open(sys.argv[1], 'rb') as myfile:
11 | image = myfile.read()
12 |
13 | headers = {'Content-Type': 'application/octet-stream'}
14 | request_url = "REQUEST_URL"  # replace with your endpoint, e.g. http://<host>/api/v1.0/image
15 |
16 | response = requests.post(request_url, headers=headers, data=image)
17 | end = time.time()
18 | print(response)
19 | print(response.json()['return'])
20 | print("Time Elapsed: " + str(end-start))
21 |
--------------------------------------------------------------------------------
/utils/PythonCompleter.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import readline
4 | import glob
5 |
6 | class tabCompleter(object):
7 | ''' A simple tab completer for linux '''
8 | def pathCompleter(self,text,state):
9 | line = readline.get_line_buffer().split()
10 | if '~' in text:
11 | text = os.path.expanduser(text)
12 | if os.path.isdir(text):
13 | text += '/'
14 | return [x for x in glob.glob(text+'*')][state]
15 |
--------------------------------------------------------------------------------
/utils/azure_ocr.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import json
3 | import time
4 | from utils.ocr import OCR
5 | from config import *
6 | from io import BytesIO
7 | from typing import Tuple, Dict, List
8 |
9 | class AzureOCR(OCR):
10 | def initialize(self):
11 | self.SUBSCRIPTION_KEY = SUBSCRIPTION_KEY
12 | self.SHOW_RESPONSE = SHOW_RESPONSE
13 |
14 | def print_response(self, area:Tuple[float, float, float, float], response:Dict) -> str:
15 | ''' Prints the response from Cognitive Services.
16 | Input:
17 | area - Tuple describing the bounding box of the data
18 | response - The response for the image from Cognitive Services
19 | '''
20 | txt = ""
21 | for line in response['recognitionResult']['lines']:
22 | txt += line['text'] + '\n'
23 | if self.SHOW_RESPONSE:
24 | if response["status"] == "Succeeded":
25 | #print(response['recognitionResult']['lines'])
26 | print("")
27 | print("==RESULT==" + str(area))
28 | for line in response['recognitionResult']['lines']:
29 | print(line['text'])
30 | print("==========================")
31 | print("")
32 | else:
33 | print("Processing failed:")
34 | print(response)
35 | return txt
36 |
37 | def ocr_one_image(self, area:Tuple[float, float, float, float], image_data:object, threadList=-1, threadNum=None) -> None:
38 | ''' Performs OCR on a single image
39 | Input:
40 | area - Tuple that describes the bounding box of the data
41 | image_data - PIL image containing the cropped label
42 | '''
43 | image_data = self.pic_to_string(image_data)
44 | try:
45 | request_url = "https://westus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Printed"
46 | headers = {'Ocp-Apim-Subscription-Key': self.SUBSCRIPTION_KEY, 'Content-Type': "application/octet-stream"}
47 | data = image_data
48 |
49 | # Send the POST request and parse the response
50 | response = requests.request('post', request_url, headers=headers, data=data)
51 |
52 | if response.status_code == 202:
53 | get_response = {}
54 | get_response["status"] = "Running"
55 | #print(get_response)
56 | # Continue sending requests until it finished processing
57 | while get_response["status"] == "Running" or get_response["status"] == "NotStarted":
58 | #print(get_response)
59 | time.sleep(.2)
60 | r2 = requests.get(response.headers['Operation-Location'], headers={'Ocp-Apim-Subscription-Key': self.SUBSCRIPTION_KEY})
61 | get_response = r2.json()
62 | #print(get_response)
63 | res = self.print_response(area, get_response)
64 | if threadList != -1:
65 | threadList[threadNum] = (res)
66 | return res
67 | print(response)
68 | except Exception as e:
69 | print("OCR failed")
70 | print(e)
71 | return None
72 |
73 |
74 | def pic_to_string(self, image) -> bytes:
75 | ''' Uses PIL and BytesIO to save the image to bytes for further processing
76 | Input: image - an image opened by PIL
77 | Output: The JPEG-encoded bytes of the picture'''
78 | output_string = BytesIO()
79 | image.save(output_string, format="JPEG")
80 | string_contents = output_string.getvalue()
81 | output_string.close()
82 | return string_contents
83 |
--------------------------------------------------------------------------------
/utils/classifier.py:
--------------------------------------------------------------------------------
1 | from abc import ABC, abstractmethod
2 | from typing import Tuple
3 |
4 | class Classifier(ABC):
5 | @abstractmethod
6 | def initialize(self):
7 | ''' Initialize the classifier '''
8 | pass
9 |
10 | @abstractmethod
11 | def classify_image(self, image):
12 | ''' Classify an image.
13 | Input: An image, opened by PIL'''
14 | pass
15 |
16 | @abstractmethod
17 | def extract_info(self, line:str) -> Tuple:
18 | ''' Extract the information from a line returned by the classifier.
19 | Ex: Many programs do not return in an easily readable format, and need to be parsed.
20 | For example, a line could be: "label (90%) x:1300 y:3400 height:300 width:900". This
21 | should return the area of the bounding box. '''
22 | pass
23 |
24 | def __init__(self):
25 | self.initialize()
26 |
--------------------------------------------------------------------------------
/utils/cosmos_database.py:
--------------------------------------------------------------------------------
1 | from utils.database import Database
2 | import pydocumentdb.documents as documents
3 | import pydocumentdb.document_client as document_client
4 | import requests
5 | import traceback
6 | import urllib3
7 | from config import *
8 | from collections import ChainMap
9 |
10 | def test_ssl_connection(client):
11 | try:
12 | databases = list(client.ReadDatabases())
13 | return True
14 | except requests.exceptions.SSLError as e:
15 | print("SSL error occurred. ", e)
16 | except OSError as e:
17 | print("OSError occurred. ", e)
18 | except Exception as e:
19 | print(traceback.format_exc())
20 | return False
21 |
22 | def ObtainClient():
23 | connection_policy = documents.ConnectionPolicy()
24 | connection_policy.SSLConfiguration = documents.SSLConfiguration()
25 | urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
26 | connection_policy.SSLConfiguration.SSLCaCerts = False
27 | return document_client.DocumentClient(HOST, {'masterKey': MASTER_KEY}, connection_policy)
28 |
29 | def GetDocumentLink(database_id, collection_id, document_id):
30 | return "dbs/" + database_id + "/colls/" + collection_id + "/docs/" + document_id
31 |
32 | class CosmosDatabase(Database):
33 |
34 | def initialize(self):
35 | client = ObtainClient()
36 | if test_ssl_connection(client) == True:
37 | database = client.ReadDocument(GetDocumentLink(DATABASE_ID, COLLECTION_ID, "active_products"))['products']
38 | if database == []:
39 | return -1
40 | self.database = ChainMap(*database)
41 | else:
42 | return -1
43 | return self.database
44 |
45 |
--------------------------------------------------------------------------------
/utils/darknet_classify_image.py:
--------------------------------------------------------------------------------
1 | import pexpect
2 | import os
3 | from utils.classifier import Classifier
4 | from config import *
5 | from typing import Tuple
6 |
7 | class DarknetClassifier(Classifier):
8 |
9 | def initialize(self):
10 | ''' Initialize darknet. We do this for speed concerns.
11 | Input:
12 | thresh (float) - specifies the threshold of detection
13 | data (string) - name of the data file for darknet
14 | cfg (string) - name of the configuration file
15 | weights (string) - name of the pre-trained weights
16 | Return:
17 | proc (pexpect process), which we use to interact with the running darknet process '''
18 | command = DARKNET_BINARY_LOCATION + " detector test " + DARKNET_DATA_FILE + " " + DARKNET_CFG_FILE \
19 | + " " + DARKNET_WEIGHTS + " -thresh " + str(DARKNET_THRESH) + " -ext_output -dont_show"
20 | if os.name == 'nt':
21 | self.proc = popen_spawn.PopenSpawn(command)
22 | else:
23 | self.proc = pexpect.spawn(command)
24 | self.proc.expect('Enter Image Path:')
25 |
26 | def classify_image(self, image:str) -> str:
27 | ''' Classifies a given image. Simply provide the name (string) of the image, and the proc to do it on.
28 | Input:
29 | image (string) - name of the saved image file
30 | self.proc (proc) - Pexpect proc to interact with
31 | Return:
32 | Returns the output from darknet, which gives the location of each bounding box. '''
33 | self.proc.sendline(image)
34 | self.proc.expect('Enter Image Path:', timeout=90)
35 | res = self.proc.before
36 | return res.decode('utf-8')
37 | def extract_info(self, line:str) -> Tuple:
38 | ''' Extracts the information from a single line that contains a label.
39 | Input: line (string), a line that already contains the label
40 | Output: area (Tuple of four ints), which gives the area of the bounding box.
41 | '''
42 | nameplate_info = line.split()
43 | nameplate_confidence = nameplate_info[1]
44 | nameplate_left_x = int(nameplate_info[3])
45 | nameplate_top_y = int(nameplate_info[5])
46 | nameplate_width = int(nameplate_info[7])
47 | nameplate_height = int(nameplate_info[9][:-1])
48 |
49 | area = (nameplate_left_x, nameplate_top_y, (nameplate_left_x + nameplate_width), (nameplate_top_y + nameplate_height))
50 |
51 | return area
52 |
--------------------------------------------------------------------------------
/utils/database.py:
--------------------------------------------------------------------------------
1 | from fuzzywuzzy import fuzz
2 | from fuzzywuzzy import process
3 | from typing import Tuple
4 | from abc import ABC, abstractmethod
5 | from utils.logger import *
6 |
7 | class Database(ABC):
8 |
9 | @abstractmethod
10 | def initialize(self):
11 | ''' This method should return the self.database parameter. self.database should be a dictionary
12 | with product identifiers as the key and values if they are enabled or not. '''
13 | pass
14 |
15 | def __init__(self):
16 | self.database = self.initialize()
17 |
18 | def lookup_database(self, txt:Tuple[Tuple[float, float, float, float], str]):
19 | ''' Input:
20 | txt ((area, string) tuple) - Contains the bounding box of the image and the accompanying string.
21 | This method will look up the string and determine if the product is active or disabled.'''
22 | if txt is None:
23 | return
24 | products = ""
25 | for line in txt:
26 | lines = line[1].split('\n')
27 | max = 0
28 | bestGuess = "UNKNOWN"
29 | bestWord = ""
30 | keys = self.database.keys()
31 | for l in lines:
32 | for word in l.split(' '):
33 | if word != "":
34 | (guess, confidence) = process.extractOne(word, keys, scorer=fuzz.token_sort_ratio)
35 | if confidence > max:
36 | max = confidence
37 | bestGuess = guess
38 | bestWord = word
39 |
40 | if bestGuess == "UNKNOWN":
41 | print("Unknown product - " + str(line[0]))
42 | else:
43 | print(bestWord)
44 | product = (str(self.database[bestGuess]) + " product (" + bestGuess + ") - " + str(line[0]) + ", confidence: " + str(max))
45 | products += product
46 | print(product)
47 | return products
48 |
49 |
--------------------------------------------------------------------------------
/utils/keras_classify_image.py:
--------------------------------------------------------------------------------
1 | import pexpect
2 | import os
3 | from utils.classifier import Classifier
4 | from config import *
5 | from typing import Tuple
6 |
7 | class KerasClassifier(Classifier):
8 |
9 | def initialize(self):
10 | ''' Initialize the Keras-yolo model for speed concerns.
11 | Return: None, but self.proc is populated with a procedure that can interface with Keras-Yolo '''
12 |
13 | command = "python yolo_video.py --image"
14 | if os.name == 'nt':
15 | self.proc = popen_spawn.PopenSpawn(command, cwd=os.path.dirname(KERAS_LOCATION))
16 | else:
17 | self.proc = pexpect.spawn(command, cwd=os.path.dirname(KERAS_LOCATION))
18 | self.proc.expect('Input image filename:', timeout=900)
19 |
20 |
21 | def classify_image(self, image:str) -> str:
22 | ''' Classifies a given image using Keras-Yolo3.
23 | Should already be initialized.
24 | Input:
25 | image (string) - Provide the saved filename
26 | Returns:
27 | string of the results from Keras-Yolo3'''
28 | self.proc.sendline("../" + image) # Todo please fix this line
29 | self.proc.expect('Input image filename:', timeout=900)
30 | res = self.proc.before
31 | return res.decode('utf-8')
32 |
33 | def extract_info(self, line:str) -> Tuple:
34 | ''' Extracts the information from a single line that contains a label.
35 | Input: line (string), a line that already contains the label
36 | Output: area (Tuple of four ints), which gives the area of the bounding box.
37 | '''
38 | nameplate_info = line.split()
39 | nameplate_confidence = nameplate_info[1]
40 | nameplate_left_x = int(nameplate_info[2][1:][:-1])
41 | nameplate_top_y = int(nameplate_info[3][:-1])
42 | nameplate_right_x = int(nameplate_info[4][1:][:-1])
43 | nameplate_bottom_y = int(nameplate_info[5][:-1])
44 |
45 | area = (nameplate_left_x, nameplate_top_y, nameplate_right_x, (nameplate_bottom_y))
46 |
47 | return area
48 |
--------------------------------------------------------------------------------
/utils/local_database.py:
--------------------------------------------------------------------------------
1 | from utils.database import Database
2 |
3 |
4 | bose_qc25 = {
5 | "065252Z80341129AE": "Active",
6 | "065252Z80571416AE": "Inactive"
7 | }
8 |
9 | DATABASE = {
10 | "715053-0010": bose_qc25
11 | }
12 |
13 | class LocalDatabase(Database):
14 | def initialize(self):
15 | return bose_qc25
16 |
--------------------------------------------------------------------------------
/utils/locate_asset.py:
--------------------------------------------------------------------------------
1 | from PIL import Image
2 | from PIL import ImageFilter
3 | import utils.logger as logger
4 | from utils.rotate import rotate
5 | from config import *
6 | from typing import Tuple, List
7 | import sys
8 | i = 0
9 | def crop_image(image, area:Tuple) -> object:
10 | ''' Uses PIL to crop an image, given its area.
11 | Input:
12 | image - PIL opened image
13 | Area - Coordinates in tuple (xmin, ymin, xmax, ymax) format '''
14 | img = Image.open(image)
15 | cropped_image = img.crop(area)
16 |
17 | # Rotation should happen here
18 | rotated_image = rotate(cropped_image)
19 |
20 | size = (3200, 3200)
21 | rotated_image.thumbnail(size, Image.ANTIALIAS)
22 | global i
23 | rotated_image.save("asdf" + str(i) + ".jpg", "JPEG")
24 | i += 1
25 |
26 | if SHOW_IMAGES:
27 | logger.good("Showing cropped image")
28 | rotated_image.show()
29 |
30 | return rotated_image
31 |
32 |
33 | def locate_asset(self, image, classifier, lines="") -> List:
34 | ''' Determines where each asset is in the picture, returning
35 | the (xmin, ymin, xmax, ymax) bounding box of every tag found,
36 | together with the cropped and rotated image for that tag.
37 | Returns:
38 | [(area, image)]
39 | Area is the coordinates of the bounding box
40 | Image is the image, opened by PIL.'''
41 | cropped_images = []
42 |
43 | for line in str(lines).split('\n'):
44 |
45 | if LABEL_NAME + ":" in line:
46 | # Extract the nameplate info
47 | area = classifier.extract_info(line)
48 | # Open image
49 | cropped_images.append((area, crop_image(image, area)))
50 | if cropped_images == []:
51 | logger.bad("No label found in image.")
52 | else:
53 | logger.good("Found " + str(len(cropped_images)) + " label(s) in image.")
54 |
55 | return cropped_images
56 |
--------------------------------------------------------------------------------
/utils/logger.py:
--------------------------------------------------------------------------------
1 | import sys
2 |
3 | def good(message:str):
4 | ''' Prints a message with [+] at the front to signify success '''
5 | print("[+] " + str(message))
6 |
7 | def bad(message:str):
8 | ''' Prints a message with [-] at the front to signify failure '''
9 | print("[-] " + str(message))
10 | def info(message:str):
11 | ''' Prints a message to signify information '''
12 | print("[ ] " + str(message))
13 |
14 | def fatal(failure:str):
15 | print("[/] " + str(failure) + " failed, exiting now")
16 | sys.exit(1)
17 |
--------------------------------------------------------------------------------
/utils/lookup_database.py:
--------------------------------------------------------------------------------
1 | from fuzzywuzzy import fuzz
2 | from fuzzywuzzy import process
3 | from typing import Tuple
4 | bose_qc25 = {
5 | "065252Z80341129AE": "Active",
6 | "065252Z80571416AE": "Inactive"
7 | }
8 |
9 | DATABASE = {
10 | "715053-0010": bose_qc25
11 | }
12 |
13 |
14 | def lookup_database(txt:Tuple[Tuple[float, float, float, float], str]):
15 | ''' Input:
16 | txt ((area, string) tuple) - Contains the bounding box of the image and the accompanying string.
17 | This method will look up the string and determine if the product is active or disabled.'''
18 | if txt is None:
19 | return
20 | for line in txt:
21 | lines = line[1].split('\n')
22 | max = 0
23 | bestGuess = "UNKNOWN"
24 | bestWord = ""
25 | keys = bose_qc25.keys()
26 | for l in lines:
27 | for word in l.split(' '):
28 | if word != "":
29 | (guess, confidence) = process.extractOne(word, keys, scorer=fuzz.token_sort_ratio)
30 | if confidence > max:
31 | max = confidence
32 | bestGuess = guess
33 | bestWord = word
34 |
35 | if bestGuess == "UNKNOWN":
36 | print("Unknown product - " + str(line[0]))
37 | else:
38 | print(bestWord)
39 | print(str(bose_qc25[bestGuess]) + " product (" + bestGuess + ") - " + str(line[0]) + ", confidence: " + str(max))
40 |
41 |
--------------------------------------------------------------------------------
/utils/ocr.py:
--------------------------------------------------------------------------------
1 | from abc import ABC, abstractmethod
2 | from typing import List
3 | import threading
4 |
5 | class OCR(ABC):
6 | @abstractmethod
7 | def initialize(self):
8 | ''' Initialize the OCR '''
9 | pass
10 |
11 | @abstractmethod
12 | def ocr_one_image(self, area, image, threadList=-1, threadNum=None) -> str:
13 | ''' OCR a single image.
14 | Input: the bounding box area and a PIL image, pre-processed (cropped and rotated)
15 | Return: the text read from the image (also written into threadList[threadNum] if given)'''
16 | pass
17 |
18 | def ocr(self, images:List) -> List:
19 | '''Runs OCR on each cropped image in its own thread.
20 | Input: images (list of (area, image) tuples)
21 | Returns a list of (area, text) results.'''
22 | threads = []
23 | threadResults = ["" for i in range(len(images))]
24 | threadNum = 0
25 | results = []
26 | for image in images:
27 | t = threading.Thread(target=self.ocr_one_image, args=(image[0], image[1]), kwargs={'threadList':threadResults, 'threadNum':threadNum})
28 |
29 | t.start()
30 | threads.append(t)
31 | threadNum += 1
32 |
33 | for t in threads:
34 | t.join()
35 | i = 0
36 | for result in threadResults:
37 | results.append((images[i][0], result))
38 | i += 1
39 | return results
40 |
41 | def __init__(self):
42 | self.initialize()
43 |
--------------------------------------------------------------------------------
/utils/rotate.py:
--------------------------------------------------------------------------------
1 | from PIL import Image
2 | import utils.logger as logger
3 | import subprocess
4 | import os
5 | import pexpect
6 | from RotNet.correct_rotation import *
7 | from config import *
8 | import time
9 |
10 | def initialize_rotnet() -> int:
11 | ''' For speed concerns, let's load up the model first
12 | Head to the RotNet directory and use correct_rotation to load the model '''
13 | try:
14 | logger.good("Initializing RotNet")
15 | init_rotnet(ROTNET_LOCATION + "/" + ROTNET_MODEL_NAME)
16 | return 0
17 | except:
18 | return -1
19 |
20 |
21 | def rotate(image:object) -> object:
22 | ''' Uses RotNet's Keras/Tensorflow algorithm to rotate an image.
23 | Input: image, opened with PIL
24 | Output: Rotated image '''
25 |
26 | # We need to save the file first for processing
27 | image.save(ROTNET_SAVE_FILE_NAME, "JPEG")
28 |
29 | logger.good("Rotating Image")
30 | rotate_image(ROTNET_SAVE_FILE_NAME)
31 | image = Image.open(ROTNET_SAVE_FILE_NAME)
32 | return image
33 |
--------------------------------------------------------------------------------
/utils/tesseract_ocr.py:
--------------------------------------------------------------------------------
1 | from utils.ocr import OCR
2 | import pyocr
3 | import pyocr.builders
4 | import sys
5 | from PIL import Image
6 | from typing import Tuple
7 |
8 | class TesseractOCR(OCR):
9 |
10 | def initialize(self):
11 | ''' Initialize Tesseract and load it up for speed '''
12 | tools = pyocr.get_available_tools()
13 | if len(tools) == 0:
14 | print("No tools found, do you have Tesseract installed?")
15 | sys.exit(1)
16 | self.tool = tools[0]
17 | self.langs = self.tool.get_available_languages()
18 |
19 | def ocr_one_image(self, area, image, threadList=-1, threadNum=None):
20 | print("Starting image...")
21 | txt = self.tool.image_to_string(image, lang=self.langs[0], builder=pyocr.builders.TextBuilder())
22 | print("==RESULT==" + str(area) + "\n" + txt + "\n==========================")
23 | if threadList != -1:
24 | threadList[threadNum] = txt
25 | return txt
26 |
--------------------------------------------------------------------------------
/yolo-obj.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | ## Testing
3 | #batch=1
4 | #subdivisions=1
5 | # Training
6 | batch=32
7 | subdivisions=8
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 2500
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=18
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=1
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .7
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=18
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=1
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .7
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=18
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=1
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .7
787 | truth_thresh = 1
788 | random=1
789 |
790 |
--------------------------------------------------------------------------------