├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── README.md ├── gait └── core │ ├── main.py │ └── model_data.py ├── live_detection.py ├── model.json ├── opencvutils ├── detection │ ├── detection.py │ ├── haarcascade_frontalface_default.xml │ └── test.py └── detection_tests │ ├── frontal_face │ ├── haarcascade_frontalface_default.xml │ └── object_detection.py │ ├── full_body │ └── motion_detector.py │ └── pedestrian │ └── pedestrian_detection.py ├── requirements.txt ├── yoloutils └── detection │ ├── detection.py │ ├── dog.jpg │ ├── general_detection.py │ ├── testing_gen_detection.py │ ├── yolov3.cfg │ └── yolov3.txt ├── yolov3.cfg └── yolov3.txt /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | 5 | --- 6 | 7 | **Describe the bug** 8 | A clear and concise description of what the bug is. 9 | 10 | **To Reproduce** 11 | Steps to reproduce the behavior: 12 | 1. Go to '...' 13 | 2. Click on '....' 14 | 3. Scroll down to '....' 15 | 4. See error 16 | 17 | **Expected behavior** 18 | A clear and concise description of what you expected to happen. 19 | 20 | **Screenshots** 21 | If applicable, add screenshots to help explain your problem. 22 | 23 | **Desktop (please complete the following information):** 24 | - OS: [e.g. iOS] 25 | - Browser [e.g. chrome, safari] 26 | - Version [e.g. 22] 27 | 28 | **Smartphone (please complete the following information):** 29 | - Device: [e.g. iPhone6] 30 | - OS: [e.g. iOS8.1] 31 | - Browser [e.g. stock browser, safari] 32 | - Version [e.g. 22] 33 | 34 | **Additional context** 35 | Add any other context about the problem here. 36 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | 5 | --- 6 | 7 | **Is your feature request related to a problem? Please describe.** 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 9 | 10 | **Describe the solution you'd like** 11 | A clear and concise description of what you want to happen. 12 | 13 | **Describe alternatives you've considered** 14 | A clear and concise description of any alternative solutions or features you've considered. 15 | 16 | **Additional context** 17 | Add any other context or screenshots about the feature request here. 18 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | yolov3.weights 6 | 7 | # C extensions 8 | *.so 9 | 10 | # Distribution / packaging 11 | .Python 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | detection_tools/ 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Idea 37 | *.iml 38 | .idea/ 39 | 40 | # Image sources 41 | imagesrc/ 42 | 43 | # Installer logs 44 | pip-log.txt 45 | pip-delete-this-directory.txt 46 | 47 | # Unit test / coverage reports 48 | htmlcov/ 49 | .tox/ 50 | .coverage 51 | .coverage.* 52 | .cache 53 | nosetests.xml 54 | coverage.xml 55 | *.cover 56 | .hypothesis/ 57 | .pytest_cache/ 58 | 59 | # Translations 60 | *.mo 61 | *.pot 62 | 63 | # Django stuff: 64 | *.log 65 | local_settings.py 66 | db.sqlite3 67 | 68 | # Flask stuff: 69 | instance/ 70 | .webassets-cache 71 | 72 | # Scrapy stuff: 73 | .scrapy 74 | 75 | # Sphinx documentation 76 | docs/_build/ 77 | 78 | # PyBuilder 79 | target/ 80 | 81 | # Jupyter Notebook 82 | .ipynb_checkpoints 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # celery beat schedule file 88 | celerybeat-schedule 89 | 90 | # SageMath parsed files 91 | *.sage.py 92 | 93 | # Environments 94 | .env 95 | .venv 96 | env/ 97 | venv/ 98 | ENV/ 99 | env.bak/ 100 | venv.bak/ 101 | 102 | # Spyder project settings 103 | .spyderproject 104 | .spyproject 105 | 106 | # Rope project settings 107 | .ropeproject 108 | 109 | # mkdocs documentation 110 | /site 111 | 112 | # mypy 113 | .mypy_cache/ 114 | 115 | *.h5 116 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Cross-View Gait Based Human Identification 2 | 3 | Gait is a behavioral biometric whose raw data are video sequences of people walking. It is particularly well suited to long-distance human identification and, unlike biometrics such as fingerprints or iris patterns, requires no explicit cooperation from the subject. 4 | 5 | ## Using multiple images of the same person from different angles
6 | ![gait1](https://i.imgur.com/8QESu5B.png) 7 | 8 | ## One person having multiple outfits 9 | ![gait2](https://i.imgur.com/NUT0kaf.jpg) 10 | 11 | ## The Deep Neural Network Architecture 12 | ![gait3](https://i.imgur.com/4col0vk.png) 13 | 14 | 15 | #### Currently Under Active Development 16 | -------------------------------------------------------------------------------- /gait/core/main.py: -------------------------------------------------------------------------------- 1 | from keras.layers import ( 2 | Convolution2D, 3 | MaxPooling2D, 4 | Flatten, 5 | Dense 6 | ) 7 | from keras.preprocessing.image import ImageDataGenerator 8 | from keras.models import Sequential 9 | from keras.models import model_from_json 10 | import simplejson as sj 11 | 12 | def create_model(): 13 | model = Sequential() 14 | model.add(Convolution2D(4, (3, 3), input_shape=(240, 320, 3), activation='relu')) 15 | model.add(MaxPooling2D(pool_size=(2, 2))) 16 | model.add(Flatten()) 17 | model.add(Dense(output_dim=560, activation='relu')) 18 | model.add(Dense(output_dim=560, activation='relu')) 19 | model.add(Dense(output_dim=560, activation='relu')) 20 | model.add(Dense(output_dim=560, activation='relu')) 21 | model.add(Dense(output_dim=1, activation='sigmoid')) 22 | return model 23 | 24 | def train(model, training_set, test_set): 25 | model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) 26 | model.fit_generator( 27 | training_set, 28 | steps_per_epoch=250, 29 | epochs=25, 30 | verbose=1, 31 | validation_data=test_set, 32 | validation_steps=62.5 33 | ) 34 | 35 | def save_model(model): 36 | print("Saving...") 37 | model.save_weights("model.h5") 38 | print(" [*] Weights") 39 | open("model.json", "w").write( 40 | sj.dumps(sj.loads(model.to_json()), indent=4) 41 | ) 42 | print(" [*] Model") 43 | 44 | def load_model(): 45 | print("Loading...") 46 | json_file = open("model.json", "r") 47 | model = model_from_json(json_file.read()) 48 | print(" [*] Model") 49 | model.load_weights("model.h5") 50 | print(" [*] Weights") 51 | json_file.close() 52 | return model 53 | 54 | def dataset_provider(datagen): 55 | return datagen.flow_from_directory( 56 | 'imagesrc', 57 | target_size=(240, 320), 58 | batch_size=32, 59 | class_mode='binary' 60 | ) 61 | 62 | # Primary datagen 63 | train_datagen = ImageDataGenerator( 64 | shear_range=0.2, 65 | zoom_range=0.2, 66 | horizontal_flip=True) 67 | # Validation datagen 68 | test_datagen = ImageDataGenerator(rescale=1. 
/ 255) 69 | 70 | # Primary Set for training 71 | training_set = dataset_provider(train_datagen) 72 | # Secondary / Test set for validation 73 | test_set = dataset_provider(test_datagen) 74 | 75 | model = create_model() 76 | train(model, training_set, test_set) 77 | save_model(model) -------------------------------------------------------------------------------- /gait/core/model_data.py: -------------------------------------------------------------------------------- 1 | from keras.models import model_from_json 2 | 3 | # load json and create model 4 | json_file = open('model.json', 'r') 5 | loaded_model_json = json_file.read() 6 | json_file.close() 7 | loaded_model = model_from_json(loaded_model_json) 8 | # load weights into new model 9 | loaded_model.load_weights("model.h5") 10 | print("Loaded model from disk") 11 | 12 | # evaluate loaded model on test data 13 | loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy']) 14 | score = loaded_model.evaluate(X, Y, verbose=0) 15 | print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100)) 16 | 17 | -------------------------------------------------------------------------------- /live_detection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import argparse 3 | import time 4 | import cv2 5 | 6 | objectDetected = 'item' 7 | with open('yolov3.txt', 'r') as f: 8 | classes = [line.strip() for line in f.readlines()] 9 | 10 | COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 11 | 12 | def distance_to_camera(knownWidth, focalLength, perWidth): 13 | return (knownWidth * focalLength) / perWidth 14 | 15 | def get_output_layers(net): 16 | layer_names = net.getLayerNames() 17 | output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()] 18 | return output_layers 19 | 20 | def draw_prediction(img, class_id, confidence, x, y, x_plus_w, y_plus_h): 21 | label = str(classes[class_id]) 22 | color = COLORS[class_id] 23 | cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2) 24 | cv2.putText(img, label+'('+str(int(confidence*100))+'%)', (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2) 25 | 26 | net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg') 27 | print("[INFO] starting video stream...") 28 | vs = cv2.VideoCapture(0) 29 | start = time.time() 30 | 31 | frame_count = 0.0 32 | 33 | KNOWN_DISTANCE = 24.0 34 | KNOWN_WIDTH = 11.0 35 | 36 | while True: 37 | ret, frame = vs.read() 38 | cv2.resize(frame, (600, frame.shape[0])) 39 | 40 | scale = 0.00392 41 | (height, width) = frame.shape[:2] 42 | blob = cv2.dnn.blobFromImage(frame, scale, (416, 416), (0, 0, 0), True, crop=False) 43 | net.setInput(blob) 44 | outs = net.forward(get_output_layers(net)) 45 | class_ids = [] 46 | confidences = [] 47 | boxes = [] 48 | conf_threshold = 0.5 49 | nms_threshold = 0.4 50 | for out in outs: 51 | for detection in out: 52 | scores = detection[5:] 53 | class_id = np.argmax(scores) 54 | confidence = scores[class_id] 55 | if confidence > 0.5: 56 | center_x = int(detection[0] * width) 57 | center_y = int(detection[1] * height) 58 | w = int(detection[2] * width) 59 | h = int(detection[3] * height) 60 | x = center_x - w / 2 61 | y = center_y - h / 2 62 | class_ids.append(class_id) 63 | confidences.append(float(confidence)) 64 | boxes.append([x, y, w, h]) 65 | 66 | indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold) 67 | 68 | for i in indices: 69 | i = i[0] 70 | box = boxes[i] 71 | x = box[0] 72 | y = box[1] 73 | w = 
box[2] 74 | h = box[3] 75 | draw_prediction(frame, class_ids[i], confidences[i], round(x), round(y), round(x + w), round(y + h)) 76 | 77 | frame_count += 1 78 | 79 | cv2.imshow("Frame", frame) 80 | key = cv2.waitKey(1) & 0xFF 81 | #if the `q` key was pressed, break from the loop 82 | if key == ord("q"): 83 | break 84 | 85 | end = time.time() 86 | 87 | time_elapsed = end - start 88 | 89 | print("[INFO] elapsed time: {:.2f}".format(time_elapsed)) 90 | print("[INFO] approx. FPS: {:.2f}".format(frame_count / time_elapsed)) 91 | vs.release() 92 | cv2.destroyAllWindows() 93 | -------------------------------------------------------------------------------- /model.json: -------------------------------------------------------------------------------- 1 | { 2 | "class_name": "Sequential", 3 | "config": [ 4 | { 5 | "class_name": "Conv2D", 6 | "config": { 7 | "name": "conv2d_1", 8 | "trainable": true, 9 | "batch_input_shape": [ 10 | null, 11 | 240, 12 | 320, 13 | 3 14 | ], 15 | "dtype": "float32", 16 | "filters": 4, 17 | "kernel_size": [ 18 | 3, 19 | 3 20 | ], 21 | "strides": [ 22 | 1, 23 | 1 24 | ], 25 | "padding": "valid", 26 | "data_format": "channels_last", 27 | "dilation_rate": [ 28 | 1, 29 | 1 30 | ], 31 | "activation": "relu", 32 | "use_bias": true, 33 | "kernel_initializer": { 34 | "class_name": "VarianceScaling", 35 | "config": { 36 | "scale": 1.0, 37 | "mode": "fan_avg", 38 | "distribution": "uniform", 39 | "seed": null 40 | } 41 | }, 42 | "bias_initializer": { 43 | "class_name": "Zeros", 44 | "config": {} 45 | }, 46 | "kernel_regularizer": null, 47 | "bias_regularizer": null, 48 | "activity_regularizer": null, 49 | "kernel_constraint": null, 50 | "bias_constraint": null 51 | } 52 | }, 53 | { 54 | "class_name": "MaxPooling2D", 55 | "config": { 56 | "name": "max_pooling2d_1", 57 | "trainable": true, 58 | "pool_size": [ 59 | 2, 60 | 2 61 | ], 62 | "padding": "valid", 63 | "strides": [ 64 | 2, 65 | 2 66 | ], 67 | "data_format": "channels_last" 68 | } 69 | }, 70 | { 71 | "class_name": "Flatten", 72 | "config": { 73 | "name": "flatten_1", 74 | "trainable": true, 75 | "data_format": "channels_last" 76 | } 77 | }, 78 | { 79 | "class_name": "Dense", 80 | "config": { 81 | "name": "dense_1", 82 | "trainable": true, 83 | "units": 560, 84 | "activation": "relu", 85 | "use_bias": true, 86 | "kernel_initializer": { 87 | "class_name": "VarianceScaling", 88 | "config": { 89 | "scale": 1.0, 90 | "mode": "fan_avg", 91 | "distribution": "uniform", 92 | "seed": null 93 | } 94 | }, 95 | "bias_initializer": { 96 | "class_name": "Zeros", 97 | "config": {} 98 | }, 99 | "kernel_regularizer": null, 100 | "bias_regularizer": null, 101 | "activity_regularizer": null, 102 | "kernel_constraint": null, 103 | "bias_constraint": null 104 | } 105 | }, 106 | { 107 | "class_name": "Dense", 108 | "config": { 109 | "name": "dense_2", 110 | "trainable": true, 111 | "units": 560, 112 | "activation": "relu", 113 | "use_bias": true, 114 | "kernel_initializer": { 115 | "class_name": "VarianceScaling", 116 | "config": { 117 | "scale": 1.0, 118 | "mode": "fan_avg", 119 | "distribution": "uniform", 120 | "seed": null 121 | } 122 | }, 123 | "bias_initializer": { 124 | "class_name": "Zeros", 125 | "config": {} 126 | }, 127 | "kernel_regularizer": null, 128 | "bias_regularizer": null, 129 | "activity_regularizer": null, 130 | "kernel_constraint": null, 131 | "bias_constraint": null 132 | } 133 | }, 134 | { 135 | "class_name": "Dense", 136 | "config": { 137 | "name": "dense_3", 138 | "trainable": true, 139 | "units": 560, 140 | 
"activation": "relu", 141 | "use_bias": true, 142 | "kernel_initializer": { 143 | "class_name": "VarianceScaling", 144 | "config": { 145 | "scale": 1.0, 146 | "mode": "fan_avg", 147 | "distribution": "uniform", 148 | "seed": null 149 | } 150 | }, 151 | "bias_initializer": { 152 | "class_name": "Zeros", 153 | "config": {} 154 | }, 155 | "kernel_regularizer": null, 156 | "bias_regularizer": null, 157 | "activity_regularizer": null, 158 | "kernel_constraint": null, 159 | "bias_constraint": null 160 | } 161 | }, 162 | { 163 | "class_name": "Dense", 164 | "config": { 165 | "name": "dense_4", 166 | "trainable": true, 167 | "units": 560, 168 | "activation": "relu", 169 | "use_bias": true, 170 | "kernel_initializer": { 171 | "class_name": "VarianceScaling", 172 | "config": { 173 | "scale": 1.0, 174 | "mode": "fan_avg", 175 | "distribution": "uniform", 176 | "seed": null 177 | } 178 | }, 179 | "bias_initializer": { 180 | "class_name": "Zeros", 181 | "config": {} 182 | }, 183 | "kernel_regularizer": null, 184 | "bias_regularizer": null, 185 | "activity_regularizer": null, 186 | "kernel_constraint": null, 187 | "bias_constraint": null 188 | } 189 | }, 190 | { 191 | "class_name": "Dense", 192 | "config": { 193 | "name": "dense_5", 194 | "trainable": true, 195 | "units": 1, 196 | "activation": "sigmoid", 197 | "use_bias": true, 198 | "kernel_initializer": { 199 | "class_name": "VarianceScaling", 200 | "config": { 201 | "scale": 1.0, 202 | "mode": "fan_avg", 203 | "distribution": "uniform", 204 | "seed": null 205 | } 206 | }, 207 | "bias_initializer": { 208 | "class_name": "Zeros", 209 | "config": {} 210 | }, 211 | "kernel_regularizer": null, 212 | "bias_regularizer": null, 213 | "activity_regularizer": null, 214 | "kernel_constraint": null, 215 | "bias_constraint": null 216 | } 217 | } 218 | ], 219 | "keras_version": "2.2.2", 220 | "backend": "tensorflow" 221 | } -------------------------------------------------------------------------------- /opencvutils/detection/detection.py: -------------------------------------------------------------------------------- 1 | from time import sleep 2 | import time 3 | import logging as log 4 | import argparse 5 | import cv2 6 | import datetime as dt 7 | import datetime 8 | 9 | class MotionDetection: 10 | 11 | def __init__(self): 12 | print("Motion Detection Ready to Run!") 13 | 14 | def run(self): 15 | ap = argparse.ArgumentParser() 16 | ap.add_argument("-v", "--video", help="path to the video file") 17 | ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size") 18 | args = vars(ap.parse_args()) 19 | 20 | # if the video argument is None, then we are reading from webcam 21 | if args.get("video", None) is None: 22 | vs = cv2.VideoCapture(0) 23 | time.sleep(2.0) 24 | 25 | # otherwise, we are reading from a video file 26 | else: 27 | vs = cv2.VideoCapture(args["video"]) 28 | 29 | # initialize the first frame in the video stream 30 | firstFrame = None 31 | 32 | # loop over the frames of the video 33 | while True: 34 | # grab the current frame and initialize the occupied/unoccupied 35 | # text 36 | ret, frame = vs.read() 37 | text = "Unoccupied" 38 | 39 | # if the frame could not be grabbed, then we have reached the end 40 | # of the video 41 | if frame is None: 42 | break 43 | 44 | # resize the frame, convert it to grayscale, and blur it 45 | frame = cv2.resize(frame, (500, frame.shape[0])) 46 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 47 | gray = cv2.GaussianBlur(gray, (21, 21), 0) 48 | 49 | # if the first frame is None, initialize it 50 
| if firstFrame is None: 51 | firstFrame = gray 52 | continue 53 | 54 | # compute the absolute difference between the current frame and 55 | # first frame 56 | frameDelta = cv2.absdiff(firstFrame, gray) 57 | thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1] 58 | 59 | # dilate the thresholded image to fill in holes, then find contours 60 | # on thresholded image 61 | thresh = cv2.dilate(thresh, None, iterations=2) 62 | cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, 63 | cv2.CHAIN_APPROX_SIMPLE) 64 | cnts = cnts[0] 65 | 66 | # loop over the contours 67 | for c in cnts: 68 | # if the contour is too small, ignore it 69 | if cv2.contourArea(c) < args["min_area"]: 70 | continue 71 | 72 | # compute the bounding box for the contour, draw it on the frame, 73 | # and update the text 74 | (x, y, w, h) = cv2.boundingRect(c) 75 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 76 | # TODO Occupied by whom? Using GAIT, passing the video argument to gait 77 | text = "Occupied" 78 | 79 | # draw the text and timestamp on the frame 80 | cv2.putText(frame, "Room Status: {}".format(text), (10, 20), 81 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2) 82 | cv2.putText(frame, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"), 83 | (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1) 84 | 85 | # show the frame and record if the user presses a key 86 | cv2.imshow("Gait Recognition", frame) 87 | cv2.imshow("Thresh", thresh) 88 | cv2.imshow("Frame Delta", frameDelta) 89 | key = cv2.waitKey(1) & 0xFF 90 | 91 | # if the `q` key is pressed, break from the lop 92 | if key == ord("q"): 93 | break 94 | 95 | # cleanup the camera and close any open windows 96 | vs.release() 97 | cv2.destroyAllWindows() 98 | 99 | 100 | class PedestrianDetection: 101 | 102 | def __init__(self): 103 | print("Pedestrian Detection Ready to Run") 104 | 105 | def run(self): 106 | hog = cv2.HOGDescriptor() 107 | hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) 108 | cap = cv2.VideoCapture("/path/to/test/video") 109 | while True: 110 | r, frame = cap.read() 111 | if r: 112 | start_time = time.time() 113 | frame = cv2.resize(frame, (1280, 720)) # Downscale to improve frame rate 114 | gray_frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY) # HOG needs a grayscale image 115 | 116 | rects, weights = hog.detectMultiScale(gray_frame) 117 | 118 | # Measure elapsed time for detections 119 | end_time = time.time() 120 | print("Elapsed time:", end_time - start_time) 121 | 122 | for i, (x, y, w, h) in enumerate(rects): 123 | if weights[i] < 0.7: 124 | continue 125 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 126 | 127 | cv2.imshow("preview", frame) 128 | k = cv2.waitKey(1) 129 | if k & 0xFF == ord("q"): # Exit condition 130 | break 131 | 132 | 133 | class FaceDetection: 134 | 135 | def __init__(self): 136 | print("Face Detection Ready to run!") 137 | 138 | def run(self): 139 | cascPath = "haarcascade_frontalface_default.xml" 140 | faceCascade = cv2.CascadeClassifier(cascPath) 141 | log.basicConfig(filename='webcam.log', level=log.INFO) 142 | 143 | video_capture = cv2.VideoCapture(0) 144 | anterior = 0 145 | 146 | while True: 147 | if not video_capture.isOpened(): 148 | print('Unable to load camera.') 149 | sleep(5) 150 | pass 151 | 152 | # Capture frame-by-frame 153 | ret, frame = video_capture.read() 154 | 155 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 156 | 157 | faces = faceCascade.detectMultiScale( 158 | gray, 159 | scaleFactor=1.1, 160 | 
minNeighbors=5, 161 | minSize=(30, 30) 162 | ) 163 | 164 | # Draw a rectangle around the faces 165 | for (x, y, w, h) in faces: 166 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 167 | 168 | if anterior != len(faces): 169 | anterior = len(faces) 170 | log.info("faces: " + str(len(faces)) + " at " + str(dt.datetime.now())) 171 | 172 | # Display the resulting frame 173 | cv2.imshow('Video', frame) 174 | 175 | if cv2.waitKey(1) & 0xFF == ord('q'): 176 | break 177 | 178 | # Display the resulting frame 179 | cv2.imshow('Video', frame) 180 | 181 | # When everything is done, release the capture 182 | video_capture.release() 183 | cv2.destroyAllWindows() 184 | -------------------------------------------------------------------------------- /opencvutils/detection/test.py: -------------------------------------------------------------------------------- 1 | from detection import MotionDetection, PedestrianDetection, FaceDetection 2 | 3 | 4 | def motion_test(): 5 | motion_obj = MotionDetection() 6 | motion_obj.run() 7 | 8 | 9 | def face_test(): 10 | face_obj = FaceDetection() 11 | face_obj.run() 12 | 13 | 14 | def ped_test(): 15 | ped_obj = PedestrianDetection() 16 | ped_obj.run() 17 | 18 | 19 | #motion_test()#face_test() 20 | ped_test() 21 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/frontal_face/object_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import sys 3 | import logging as log 4 | import datetime as dt 5 | from time import sleep 6 | 7 | cascPath = "haarcascade_frontalface_default.xml" 8 | faceCascade = cv2.CascadeClassifier(cascPath) 9 | log.basicConfig(filename='webcam.log',level=log.INFO) 10 | 11 | video_capture = cv2.VideoCapture(0) 12 | anterior = 0 13 | 14 | while True: 15 | if not video_capture.isOpened(): 16 | print('Unable to load camera.') 17 | sleep(5) 18 | pass 19 | 20 | # Capture frame-by-frame 21 | ret, frame = video_capture.read() 22 | 23 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 24 | 25 | faces = faceCascade.detectMultiScale( 26 | gray, 27 | scaleFactor=1.1, 28 | minNeighbors=5, 29 | minSize=(30, 30) 30 | ) 31 | 32 | # Draw a rectangle around the faces 33 | for (x, y, w, h) in faces: 34 | cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2) 35 | 36 | if anterior != len(faces): 37 | anterior = len(faces) 38 | log.info("faces: "+str(len(faces))+" at "+str(dt.datetime.now())) 39 | 40 | 41 | # Display the resulting frame 42 | cv2.imshow('Video', frame) 43 | 44 | 45 | if cv2.waitKey(1) & 0xFF == ord('q'): 46 | break 47 | 48 | # Display the resulting frame 49 | cv2.imshow('Video', frame) 50 | 51 | # When everything is done, release the capture 52 | video_capture.release() 53 | cv2.destroyAllWindows() 54 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/full_body/motion_detector.py: -------------------------------------------------------------------------------- 1 | # python motion_detector.py 2 | # python motion_detector.py --video "path to recorder video" 3 | 4 | import argparse 5 | import datetime 6 | import time 7 | import cv2 8 | 9 | # construct the argument parser and parse the arguments 10 | ap = argparse.ArgumentParser() 11 | ap.add_argument("-v", "--video", help="path to the video file") 12 | ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size") 13 | args = vars(ap.parse_args()) 14 | 15 | # if the video argument is None, then 
we are reading from webcam 16 | if args.get("video", None) is None: 17 | vs = cv2.VideoCapture(0) 18 | time.sleep(2.0) 19 | 20 | # otherwise, we are reading from a video file 21 | else: 22 | vs = cv2.VideoCapture(args["video"]) 23 | 24 | # initialize the first frame in the video stream 25 | firstFrame = None 26 | 27 | # loop over the frames of the video 28 | while True: 29 | # grab the current frame and initialize the occupied/unoccupied 30 | # text 31 | ret, frame = vs.read() 32 | text = "Unoccupied" 33 | 34 | # if the frame could not be grabbed, then we have reached the end 35 | # of the video 36 | if frame is None: 37 | break 38 | 39 | # resize the frame, convert it to grayscale, and blur it 40 | frame = cv2.resize(frame, (500, frame.shape[0])) 41 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 42 | gray = cv2.GaussianBlur(gray, (21, 21), 0) 43 | 44 | # if the first frame is None, initialize it 45 | if firstFrame is None: 46 | firstFrame = gray 47 | continue 48 | 49 | # compute the absolute difference between the current frame and 50 | # first frame 51 | frameDelta = cv2.absdiff(firstFrame, gray) 52 | thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1] 53 | 54 | # dilate the thresholded image to fill in holes, then find contours 55 | # on thresholded image 56 | thresh = cv2.dilate(thresh, None, iterations=2) 57 | cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, 58 | cv2.CHAIN_APPROX_SIMPLE) 59 | cnts = cnts[0] if len(cnts) == 2 else cnts[1] # findContours returns (contours, hierarchy) on OpenCV 4.x and (image, contours, hierarchy) on 3.x 60 | 61 | # loop over the contours 62 | for c in cnts: 63 | # if the contour is too small, ignore it 64 | if cv2.contourArea(c) < args["min_area"]: 65 | continue 66 | 67 | # compute the bounding box for the contour, draw it on the frame, 68 | # and update the text 69 | (x, y, w, h) = cv2.boundingRect(c) 70 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 71 | # TODO Occupied by whom?
Using GAIT, passing the video argument to gait 72 | text = "Occupied" 73 | 74 | # draw the text and timestamp on the frame 75 | cv2.putText(frame, "Room Status: {}".format(text), (10, 20), 76 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2) 77 | cv2.putText(frame, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"), 78 | (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1) 79 | 80 | # show the frame and record if the user presses a key 81 | cv2.imshow("Gait Recognition", frame) 82 | cv2.imshow("Thresh", thresh) 83 | cv2.imshow("Frame Delta", frameDelta) 84 | key = cv2.waitKey(1) & 0xFF 85 | 86 | # if the `q` key is pressed, break from the lop 87 | if key == ord("q"): 88 | break 89 | 90 | # cleanup the camera and close any open windows 91 | vs.release() 92 | cv2.destroyAllWindows() 93 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/pedestrian/pedestrian_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import time 3 | 4 | hog = cv2.HOGDescriptor() 5 | hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) 6 | cap = cv2.VideoCapture("/path/to/test/video") 7 | while True: 8 | r, frame = cap.read() 9 | if r: 10 | start_time = time.time() 11 | frame = cv2.resize(frame, (1280, 720)) # Downscale to improve frame rate 12 | gray_frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY) # HOG needs a grayscale image 13 | 14 | rects, weights = hog.detectMultiScale(gray_frame) 15 | 16 | # Measure elapsed time for detections 17 | end_time = time.time() 18 | print("Elapsed time:", end_time - start_time) 19 | 20 | for i, (x, y, w, h) in enumerate(rects): 21 | if weights[i] < 0.7: 22 | continue 23 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 24 | 25 | cv2.imshow("preview", frame) 26 | k = cv2.waitKey(1) 27 | if k & 0xFF == ord("q"): # Exit condition 28 | break 29 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | pandas 3 | keras 4 | tensorflow 5 | imageio 6 | matplotlib 7 | h5py 8 | opencv-python 9 | -------------------------------------------------------------------------------- /yoloutils/detection/detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | 5 | ap = argparse.ArgumentParser() 6 | ap.add_argument('-i', '--image', required=True, 7 | help='path to input image') 8 | ap.add_argument('-c', '--config', required=True, 9 | help='path to yolo config file') 10 | ap.add_argument('-w', '--weights', required=True, 11 | help='path to yolo pre-trained weights') 12 | ap.add_argument('-cl', '--classes', required=True, 13 | help='path to text file containing class names') 14 | args = ap.parse_args() 15 | 16 | 17 | def get_output_layers(net): 18 | 19 | layer_names = net.getLayerNames() 20 | 21 | output_layers = [layer_names[i[0] - 1] 22 | for i in net.getUnconnectedOutLayers()] 23 | 24 | return output_layers 25 | 26 | 27 | def draw_prediction(img, class_id, confidence, x, y, x_plus_w, y_plus_h): 28 | 29 | label = str(classes[class_id]) 30 | 31 | color = COLORS[class_id] 32 | 33 | cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2) 34 | 35 | cv2.putText(img, label, (x-10, y-10), 36 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2) 37 | 38 | 39 | image = cv2.imread(args.image) 40 | 41 | Width = 
image.shape[1] 42 | Height = image.shape[0] 43 | scale = 0.00392 44 | 45 | classes = None 46 | 47 | with open(args.classes, 'r') as f: 48 | classes = [line.strip() for line in f.readlines()] 49 | 50 | COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 51 | 52 | net = cv2.dnn.readNet(args.weights, args.config) 53 | 54 | blob = cv2.dnn.blobFromImage( 55 | image, scale, (416, 416), (0, 0, 0), True, crop=False) 56 | 57 | net.setInput(blob) 58 | 59 | outs = net.forward(get_output_layers(net)) 60 | 61 | class_ids = [] 62 | confidences = [] 63 | boxes = [] 64 | conf_threshold = 0.5 65 | nms_threshold = 0.4 66 | 67 | 68 | for out in outs: 69 | for detection in out: 70 | scores = detection[5:] 71 | class_id = np.argmax(scores) 72 | confidence = scores[class_id] 73 | if confidence > 0.5: 74 | center_x = int(detection[0] * Width) 75 | center_y = int(detection[1] * Height) 76 | w = int(detection[2] * Width) 77 | h = int(detection[3] * Height) 78 | x = center_x - w / 2 79 | y = center_y - h / 2 80 | class_ids.append(class_id) 81 | confidences.append(float(confidence)) 82 | boxes.append([x, y, w, h]) 83 | 84 | 85 | indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold) 86 | 87 | for i in indices: 88 | i = i[0] 89 | box = boxes[i] 90 | x = box[0] 91 | y = box[1] 92 | w = box[2] 93 | h = box[3] 94 | draw_prediction(image, class_ids[i], confidences[i], round( 95 | x), round(y), round(x+w), round(y+h)) 96 | 97 | cv2.imshow("object detection", image) 98 | cv2.waitKey() 99 | 100 | cv2.imwrite("object-detection.jpg", image) 101 | cv2.destroyAllWindows() 102 | -------------------------------------------------------------------------------- /yoloutils/detection/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rshrc/gait-recognition/30dc1a7ed7432a5be510229053ba1e278aefea55/yoloutils/detection/dog.jpg -------------------------------------------------------------------------------- /yoloutils/detection/general_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | 5 | 6 | class DetectObject: 7 | 8 | def __init__(self, ap): 9 | 10 | self.classes = None 11 | self.ap = argparse.ArgumentParser() 12 | self.ap.add_argument('-i', '--image', required=True, 13 | help='path to input image') 14 | self.ap.add_argument('-c', '--config', required=True, 15 | help='path to yolo config file') 16 | self.ap.add_argument('-w', '--weights', required=True, 17 | help='path to yolo pre-trained weights') 18 | self.ap.add_argument('-cl', '--classes', required=True, 19 | help='path to text file containing class names') 20 | self.args = ap.parse_args() 21 | 22 | def get_output_layers(self, net): 23 | 24 | self.layer_names = net.getLayerNames() 25 | 26 | self.output_layers = [self.layer_names[i[0] - 1] 27 | for i in self.net.getUnconnectedOutLayers()] 28 | 29 | return self.output_layers 30 | 31 | def draw_prediction(self, img, class_id, confidence, x, y, x_plus_w, y_plus_h): 32 | 33 | self.label = str(self.classes[self.class_id]) 34 | 35 | self.color = COLORS[class_id] 36 | 37 | self.cv2.rectangle(self.img, (self.x, self.y), 38 | (self.x_plus_w, self.y_plus_h), self.scorescolor, 2) 39 | 40 | self.cv2.putText(self.img, self.label, (x-10, y-10), 41 | self.cv2.FONT_HERSHEY_SIMPLEX, 0.5, self.color, 2) 42 | 43 | def read_image(self): 44 | 45 | self.image = cv2.imread(self.args.image) 46 | 47 | self.Width = self.image.shape[1] 48 | self.Height 
= self.image.shape[0] 49 | self.scale = 0.00392 50 | 51 | with open(self.args.classes, 'r') as f: 52 | self.classes = [line.strip() for line in f.readlines()] 53 | 54 | self.COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 55 | 56 | self.net = self.cv2.dnn.readNet(self.args.weights, self.args.config) 57 | 58 | self.blob = self.cv2.dnn.blobFromImage( 59 | self.image, self.scale, (416, 416), (0, 0, 0), True, crop=False) 60 | 61 | self.net.setInput(blob) 62 | 63 | self.outs = net.forward(get_output_layers(net)) 64 | 65 | self.class_ids = [] 66 | self.confidences = [] 67 | self.boxes = [] 68 | self.conf_threshold = 0.5 69 | self.nms_threshold = 0.4 70 | 71 | for out in self.outs: 72 | for detection in self.out: 73 | self.scores = detection[5:] 74 | self.class_id = np.argmax(self.scores) 75 | self.confidence = scores[class_id] 76 | if self.confidence > 0.5: 77 | self.center_x = int(detection[0] * Width) 78 | self.center_y = int(detection[1] * Height) 79 | self.w = int(detection[2] * Width) 80 | selfh = int(detection[3] * Height) 81 | self.x = center_x - w / 2 82 | self.y = center_y - h / 2 83 | self.class_ids.append(class_id) 84 | self.confidences.append(float(confidence)) 85 | self.boxes.append([x, y, w, h]) 86 | 87 | self.indices = cv2.dnn.NMSBoxes( 88 | boxes, confidences, conf_threshold, nms_threshold) 89 | 90 | for i in indices: 91 | i = i[0] 92 | box = boxes[i] 93 | x = box[0] 94 | y = box[1] 95 | w = box[2] 96 | h = box[3] 97 | draw_prediction(image, class_ids[i], confidences[i], round( 98 | x), round(y), round(x+w), round(y+h)) 99 | 100 | cv2.imshow("object detection", image) 101 | cv2.waitKey() 102 | 103 | cv2.imwrite("object-detection.jpg", image) 104 | cv2.destroyAllWindows() 105 | -------------------------------------------------------------------------------- /yoloutils/detection/testing_gen_detection.py: -------------------------------------------------------------------------------- 1 | from general_detection import DetectObject 2 | -------------------------------------------------------------------------------- /yoloutils/detection/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | 
[shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | 
activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | 
[convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 
| pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 -------------------------------------------------------------------------------- /yoloutils/detection/yolov3.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush -------------------------------------------------------------------------------- /yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 
91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 
| [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | 
batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | 
activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 -------------------------------------------------------------------------------- /yolov3.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush --------------------------------------------------------------------------------
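
The three [yolo] heads in yolov3.cfg predict at three scales (anchor masks 6,7,8, 3,4,5 and 0,1,2 over the nine listed anchors), and each is preceded by a linear 1×1 convolution with filters=255, i.e. 3 anchors × (4 box coordinates + 1 objectness score + 80 class scores) for the 80 labels in yolov3.txt. The sketch below is one minimal, illustrative way to consume these two files with OpenCV's DNN module; it is not the repository's own detection script. It assumes a separately obtained yolov3.weights file, a hypothetical input image path, and a 416×416 input size (adjust to whatever the cfg's [net] section actually specifies).

```python
# Minimal sketch, not the repository's detection code: run the shipped
# yolov3.cfg / yolov3.txt through OpenCV's DNN module.
# Assumptions: yolov3.weights downloaded separately, "dog.jpg" is a
# placeholder input path, and the [net] input size is 416x416.
import cv2
import numpy as np

# One class name per line, as in yolov3.txt
with open("yolov3.txt") as f:
    classes = [line.strip() for line in f if line.strip()]

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

image = cv2.imread("dog.jpg")
h, w = image.shape[:2]

# Scale pixels to [0, 1] and resize to the assumed network input size
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Forward pass through the three [yolo] output layers
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]              # 80 class scores
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # Detections are (cx, cy, bw, bh) relative to the image size
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression drops overlapping duplicate boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    x, y, bw, bh = boxes[i]
    print(f"{classes[class_ids[i]]}: {confidences[i]:.2f} at ({x}, {y}, {bw}, {bh})")
```

Because each [yolo] scale proposes its own boxes, the final cv2.dnn.NMSBoxes step is the usual way to keep only the highest-scoring detection among overlapping candidates when reading Darknet-style YOLO outputs.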