├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── README.md ├── gait └── core │ ├── main.py │ └── model_data.py ├── live_detection.py ├── model.json ├── opencvutils ├── detection │ ├── detection.py │ ├── haarcascade_frontalface_default.xml │ └── test.py └── detection_tests │ ├── frontal_face │ ├── haarcascade_frontalface_default.xml │ └── object_detection.py │ ├── full_body │ └── motion_detector.py │ └── pedestrian │ └── pedestrian_detection.py ├── requirements.txt ├── yoloutils └── detection │ ├── detection.py │ ├── dog.jpg │ ├── general_detection.py │ ├── testing_gen_detection.py │ ├── yolov3.cfg │ └── yolov3.txt ├── yolov3.cfg └── yolov3.txt /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | 5 | --- 6 | 7 | **Describe the bug** 8 | A clear and concise description of what the bug is. 9 | 10 | **To Reproduce** 11 | Steps to reproduce the behavior: 12 | 1. Go to '...' 13 | 2. Click on '....' 14 | 3. Scroll down to '....' 15 | 4. See error 16 | 17 | **Expected behavior** 18 | A clear and concise description of what you expected to happen. 19 | 20 | **Screenshots** 21 | If applicable, add screenshots to help explain your problem. 22 | 23 | **Desktop (please complete the following information):** 24 | - OS: [e.g. iOS] 25 | - Browser [e.g. chrome, safari] 26 | - Version [e.g. 22] 27 | 28 | **Smartphone (please complete the following information):** 29 | - Device: [e.g. iPhone6] 30 | - OS: [e.g. iOS8.1] 31 | - Browser [e.g. stock browser, safari] 32 | - Version [e.g. 22] 33 | 34 | **Additional context** 35 | Add any other context about the problem here. 36 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | 5 | --- 6 | 7 | **Is your feature request related to a problem? Please describe.** 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 9 | 10 | **Describe the solution you'd like** 11 | A clear and concise description of what you want to happen. 12 | 13 | **Describe alternatives you've considered** 14 | A clear and concise description of any alternative solutions or features you've considered. 15 | 16 | **Additional context** 17 | Add any other context or screenshots about the feature request here. 18 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | yolov3.weights 6 | 7 | # C extensions 8 | *.so 9 | 10 | # Distribution / packaging 11 | .Python 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | detection_tools/ 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Idea 37 | *.iml 38 | .idea/ 39 | 40 | # Image sources 41 | imagesrc/ 42 | 43 | # Installer logs 44 | pip-log.txt 45 | pip-delete-this-directory.txt 46 | 47 | # Unit test / coverage reports 48 | htmlcov/ 49 | .tox/ 50 | .coverage 51 | .coverage.* 52 | .cache 53 | nosetests.xml 54 | coverage.xml 55 | *.cover 56 | .hypothesis/ 57 | .pytest_cache/ 58 | 59 | # Translations 60 | *.mo 61 | *.pot 62 | 63 | # Django stuff: 64 | *.log 65 | local_settings.py 66 | db.sqlite3 67 | 68 | # Flask stuff: 69 | instance/ 70 | .webassets-cache 71 | 72 | # Scrapy stuff: 73 | .scrapy 74 | 75 | # Sphinx documentation 76 | docs/_build/ 77 | 78 | # PyBuilder 79 | target/ 80 | 81 | # Jupyter Notebook 82 | .ipynb_checkpoints 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # celery beat schedule file 88 | celerybeat-schedule 89 | 90 | # SageMath parsed files 91 | *.sage.py 92 | 93 | # Environments 94 | .env 95 | .venv 96 | env/ 97 | venv/ 98 | ENV/ 99 | env.bak/ 100 | venv.bak/ 101 | 102 | # Spyder project settings 103 | .spyderproject 104 | .spyproject 105 | 106 | # Rope project settings 107 | .ropeproject 108 | 109 | # mkdocs documentation 110 | /site 111 | 112 | # mypy 113 | .mypy_cache/ 114 | 115 | *.h5 116 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Cross-View Gait Based Human Identification 2 | 3 | Gait is a behavioral biometric whose raw data are video sequences of people walking. It is particularly well suited to long-distance human identification and, unlike biometrics such as fingerprints or iris patterns, requires no explicit cooperation from the subject. 4 | 5 | ## Using multiple images of the same person from different angles
6 | ![gait1](https://i.imgur.com/8QESu5B.png) 7 | 8 | ## One person having multiple outfits 9 | ![gait2](https://i.imgur.com/NUT0kaf.jpg) 10 | 11 | ## The Deep Neural Network Architecture 12 | ![gait3](https://i.imgur.com/4col0vk.png) 13 | 14 | 15 | #### Currently Under Active Development 16 | -------------------------------------------------------------------------------- /gait/core/main.py: -------------------------------------------------------------------------------- 1 | from keras.layers import ( 2 | Convolution2D, 3 | MaxPooling2D, 4 | Flatten, 5 | Dense 6 | ) 7 | from keras.preprocessing.image import ImageDataGenerator 8 | from keras.models import Sequential 9 | from keras.models import model_from_json 10 | import simplejson as sj 11 | 12 | def create_model(): 13 | model = Sequential() 14 | model.add(Convolution2D(4, (3, 3), input_shape=(240, 320, 3), activation='relu')) 15 | model.add(MaxPooling2D(pool_size=(2, 2))) 16 | model.add(Flatten()) 17 | model.add(Dense(output_dim=560, activation='relu')) 18 | model.add(Dense(output_dim=560, activation='relu')) 19 | model.add(Dense(output_dim=560, activation='relu')) 20 | model.add(Dense(output_dim=560, activation='relu')) 21 | model.add(Dense(output_dim=1, activation='sigmoid')) 22 | return model 23 | 24 | def train(model, training_set, test_set): 25 | model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) 26 | model.fit_generator( 27 | training_set, 28 | steps_per_epoch=250, 29 | epochs=25, 30 | verbose=1, 31 | validation_data=test_set, 32 | validation_steps=62.5 33 | ) 34 | 35 | def save_model(model): 36 | print("Saving...") 37 | model.save_weights("model.h5") 38 | print(" [*] Weights") 39 | open("model.json", "w").write( 40 | sj.dumps(sj.loads(model.to_json()), indent=4) 41 | ) 42 | print(" [*] Model") 43 | 44 | def load_model(): 45 | print("Loading...") 46 | json_file = open("model.json", "r") 47 | model = model_from_json(json_file.read()) 48 | print(" [*] Model") 49 | model.load_weights("model.h5") 50 | print(" [*] Weights") 51 | json_file.close() 52 | return model 53 | 54 | def dataset_provider(datagen): 55 | return datagen.flow_from_directory( 56 | 'imagesrc', 57 | target_size=(240, 320), 58 | batch_size=32, 59 | class_mode='binary' 60 | ) 61 | 62 | # Primary datagen 63 | train_datagen = ImageDataGenerator( 64 | shear_range=0.2, 65 | zoom_range=0.2, 66 | horizontal_flip=True) 67 | # Validation datagen 68 | test_datagen = ImageDataGenerator(rescale=1. 
/ 255) 69 | 70 | # Primary Set for training 71 | training_set = dataset_provider(train_datagen) 72 | # Secondary / Test set for validation 73 | test_set = dataset_provider(test_datagen) 74 | 75 | model = create_model() 76 | train(model, training_set, test_set) 77 | save_model(model) -------------------------------------------------------------------------------- /gait/core/model_data.py: -------------------------------------------------------------------------------- 1 | from keras.models import model_from_json 2 | 3 | # load json and create model 4 | json_file = open('model.json', 'r') 5 | loaded_model_json = json_file.read() 6 | json_file.close() 7 | loaded_model = model_from_json(loaded_model_json) 8 | # load weights into new model 9 | loaded_model.load_weights("model.h5") 10 | print("Loaded model from disk") 11 | 12 | # evaluate loaded model on test data 13 | loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy']) 14 | score = loaded_model.evaluate(X, Y, verbose=0) 15 | print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100)) 16 | 17 | -------------------------------------------------------------------------------- /live_detection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import argparse 3 | import time 4 | import cv2 5 | 6 | objectDetected = 'item' 7 | with open('yolov3.txt', 'r') as f: 8 | classes = [line.strip() for line in f.readlines()] 9 | 10 | COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 11 | 12 | def distance_to_camera(knownWidth, focalLength, perWidth): 13 | return (knownWidth * focalLength) / perWidth 14 | 15 | def get_output_layers(net): 16 | layer_names = net.getLayerNames() 17 | output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()] 18 | return output_layers 19 | 20 | def draw_prediction(img, class_id, confidence, x, y, x_plus_w, y_plus_h): 21 | label = str(classes[class_id]) 22 | color = COLORS[class_id] 23 | cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2) 24 | cv2.putText(img, label+'('+str(int(confidence*100))+'%)', (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2) 25 | 26 | net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg') 27 | print("[INFO] starting video stream...") 28 | vs = cv2.VideoCapture(0) 29 | start = time.time() 30 | 31 | frame_count = 0.0 32 | 33 | KNOWN_DISTANCE = 24.0 34 | KNOWN_WIDTH = 11.0 35 | 36 | while True: 37 | ret, frame = vs.read() 38 | cv2.resize(frame, (600, frame.shape[0])) 39 | 40 | scale = 0.00392 41 | (height, width) = frame.shape[:2] 42 | blob = cv2.dnn.blobFromImage(frame, scale, (416, 416), (0, 0, 0), True, crop=False) 43 | net.setInput(blob) 44 | outs = net.forward(get_output_layers(net)) 45 | class_ids = [] 46 | confidences = [] 47 | boxes = [] 48 | conf_threshold = 0.5 49 | nms_threshold = 0.4 50 | for out in outs: 51 | for detection in out: 52 | scores = detection[5:] 53 | class_id = np.argmax(scores) 54 | confidence = scores[class_id] 55 | if confidence > 0.5: 56 | center_x = int(detection[0] * width) 57 | center_y = int(detection[1] * height) 58 | w = int(detection[2] * width) 59 | h = int(detection[3] * height) 60 | x = center_x - w / 2 61 | y = center_y - h / 2 62 | class_ids.append(class_id) 63 | confidences.append(float(confidence)) 64 | boxes.append([x, y, w, h]) 65 | 66 | indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold) 67 | 68 | for i in indices: 69 | i = i[0] 70 | box = boxes[i] 71 | x = box[0] 72 | y = box[1] 73 | w = 
box[2] 74 | h = box[3] 75 | draw_prediction(frame, class_ids[i], confidences[i], round(x), round(y), round(x + w), round(y + h)) 76 | 77 | frame_count += 1 78 | 79 | cv2.imshow("Frame", frame) 80 | key = cv2.waitKey(1) & 0xFF 81 | #if the `q` key was pressed, break from the loop 82 | if key == ord("q"): 83 | break 84 | 85 | end = time.time() 86 | 87 | time_elapsed = end - start 88 | 89 | print("[INFO] elapsed time: {:.2f}".format(time_elapsed)) 90 | print("[INFO] approx. FPS: {:.2f}".format(frame_count / time_elapsed)) 91 | vs.release() 92 | cv2.destroyAllWindows() 93 | -------------------------------------------------------------------------------- /model.json: -------------------------------------------------------------------------------- 1 | { 2 | "class_name": "Sequential", 3 | "config": [ 4 | { 5 | "class_name": "Conv2D", 6 | "config": { 7 | "name": "conv2d_1", 8 | "trainable": true, 9 | "batch_input_shape": [ 10 | null, 11 | 240, 12 | 320, 13 | 3 14 | ], 15 | "dtype": "float32", 16 | "filters": 4, 17 | "kernel_size": [ 18 | 3, 19 | 3 20 | ], 21 | "strides": [ 22 | 1, 23 | 1 24 | ], 25 | "padding": "valid", 26 | "data_format": "channels_last", 27 | "dilation_rate": [ 28 | 1, 29 | 1 30 | ], 31 | "activation": "relu", 32 | "use_bias": true, 33 | "kernel_initializer": { 34 | "class_name": "VarianceScaling", 35 | "config": { 36 | "scale": 1.0, 37 | "mode": "fan_avg", 38 | "distribution": "uniform", 39 | "seed": null 40 | } 41 | }, 42 | "bias_initializer": { 43 | "class_name": "Zeros", 44 | "config": {} 45 | }, 46 | "kernel_regularizer": null, 47 | "bias_regularizer": null, 48 | "activity_regularizer": null, 49 | "kernel_constraint": null, 50 | "bias_constraint": null 51 | } 52 | }, 53 | { 54 | "class_name": "MaxPooling2D", 55 | "config": { 56 | "name": "max_pooling2d_1", 57 | "trainable": true, 58 | "pool_size": [ 59 | 2, 60 | 2 61 | ], 62 | "padding": "valid", 63 | "strides": [ 64 | 2, 65 | 2 66 | ], 67 | "data_format": "channels_last" 68 | } 69 | }, 70 | { 71 | "class_name": "Flatten", 72 | "config": { 73 | "name": "flatten_1", 74 | "trainable": true, 75 | "data_format": "channels_last" 76 | } 77 | }, 78 | { 79 | "class_name": "Dense", 80 | "config": { 81 | "name": "dense_1", 82 | "trainable": true, 83 | "units": 560, 84 | "activation": "relu", 85 | "use_bias": true, 86 | "kernel_initializer": { 87 | "class_name": "VarianceScaling", 88 | "config": { 89 | "scale": 1.0, 90 | "mode": "fan_avg", 91 | "distribution": "uniform", 92 | "seed": null 93 | } 94 | }, 95 | "bias_initializer": { 96 | "class_name": "Zeros", 97 | "config": {} 98 | }, 99 | "kernel_regularizer": null, 100 | "bias_regularizer": null, 101 | "activity_regularizer": null, 102 | "kernel_constraint": null, 103 | "bias_constraint": null 104 | } 105 | }, 106 | { 107 | "class_name": "Dense", 108 | "config": { 109 | "name": "dense_2", 110 | "trainable": true, 111 | "units": 560, 112 | "activation": "relu", 113 | "use_bias": true, 114 | "kernel_initializer": { 115 | "class_name": "VarianceScaling", 116 | "config": { 117 | "scale": 1.0, 118 | "mode": "fan_avg", 119 | "distribution": "uniform", 120 | "seed": null 121 | } 122 | }, 123 | "bias_initializer": { 124 | "class_name": "Zeros", 125 | "config": {} 126 | }, 127 | "kernel_regularizer": null, 128 | "bias_regularizer": null, 129 | "activity_regularizer": null, 130 | "kernel_constraint": null, 131 | "bias_constraint": null 132 | } 133 | }, 134 | { 135 | "class_name": "Dense", 136 | "config": { 137 | "name": "dense_3", 138 | "trainable": true, 139 | "units": 560, 140 | 
"activation": "relu", 141 | "use_bias": true, 142 | "kernel_initializer": { 143 | "class_name": "VarianceScaling", 144 | "config": { 145 | "scale": 1.0, 146 | "mode": "fan_avg", 147 | "distribution": "uniform", 148 | "seed": null 149 | } 150 | }, 151 | "bias_initializer": { 152 | "class_name": "Zeros", 153 | "config": {} 154 | }, 155 | "kernel_regularizer": null, 156 | "bias_regularizer": null, 157 | "activity_regularizer": null, 158 | "kernel_constraint": null, 159 | "bias_constraint": null 160 | } 161 | }, 162 | { 163 | "class_name": "Dense", 164 | "config": { 165 | "name": "dense_4", 166 | "trainable": true, 167 | "units": 560, 168 | "activation": "relu", 169 | "use_bias": true, 170 | "kernel_initializer": { 171 | "class_name": "VarianceScaling", 172 | "config": { 173 | "scale": 1.0, 174 | "mode": "fan_avg", 175 | "distribution": "uniform", 176 | "seed": null 177 | } 178 | }, 179 | "bias_initializer": { 180 | "class_name": "Zeros", 181 | "config": {} 182 | }, 183 | "kernel_regularizer": null, 184 | "bias_regularizer": null, 185 | "activity_regularizer": null, 186 | "kernel_constraint": null, 187 | "bias_constraint": null 188 | } 189 | }, 190 | { 191 | "class_name": "Dense", 192 | "config": { 193 | "name": "dense_5", 194 | "trainable": true, 195 | "units": 1, 196 | "activation": "sigmoid", 197 | "use_bias": true, 198 | "kernel_initializer": { 199 | "class_name": "VarianceScaling", 200 | "config": { 201 | "scale": 1.0, 202 | "mode": "fan_avg", 203 | "distribution": "uniform", 204 | "seed": null 205 | } 206 | }, 207 | "bias_initializer": { 208 | "class_name": "Zeros", 209 | "config": {} 210 | }, 211 | "kernel_regularizer": null, 212 | "bias_regularizer": null, 213 | "activity_regularizer": null, 214 | "kernel_constraint": null, 215 | "bias_constraint": null 216 | } 217 | } 218 | ], 219 | "keras_version": "2.2.2", 220 | "backend": "tensorflow" 221 | } -------------------------------------------------------------------------------- /opencvutils/detection/detection.py: -------------------------------------------------------------------------------- 1 | from time import sleep 2 | import time 3 | import logging as log 4 | import argparse 5 | import cv2 6 | import datetime as dt 7 | import datetime 8 | 9 | class MotionDetection: 10 | 11 | def __init__(self): 12 | print("Motion Detection Ready to Run!") 13 | 14 | def run(self): 15 | ap = argparse.ArgumentParser() 16 | ap.add_argument("-v", "--video", help="path to the video file") 17 | ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size") 18 | args = vars(ap.parse_args()) 19 | 20 | # if the video argument is None, then we are reading from webcam 21 | if args.get("video", None) is None: 22 | vs = cv2.VideoCapture(0) 23 | time.sleep(2.0) 24 | 25 | # otherwise, we are reading from a video file 26 | else: 27 | vs = cv2.VideoCapture(args["video"]) 28 | 29 | # initialize the first frame in the video stream 30 | firstFrame = None 31 | 32 | # loop over the frames of the video 33 | while True: 34 | # grab the current frame and initialize the occupied/unoccupied 35 | # text 36 | ret, frame = vs.read() 37 | text = "Unoccupied" 38 | 39 | # if the frame could not be grabbed, then we have reached the end 40 | # of the video 41 | if frame is None: 42 | break 43 | 44 | # resize the frame, convert it to grayscale, and blur it 45 | frame = cv2.resize(frame, (500, frame.shape[0])) 46 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 47 | gray = cv2.GaussianBlur(gray, (21, 21), 0) 48 | 49 | # if the first frame is None, initialize it 50 
| if firstFrame is None: 51 | firstFrame = gray 52 | continue 53 | 54 | # compute the absolute difference between the current frame and 55 | # first frame 56 | frameDelta = cv2.absdiff(firstFrame, gray) 57 | thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1] 58 | 59 | # dilate the thresholded image to fill in holes, then find contours 60 | # on thresholded image 61 | thresh = cv2.dilate(thresh, None, iterations=2) 62 | cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, 63 | cv2.CHAIN_APPROX_SIMPLE) 64 | cnts = cnts[0] 65 | 66 | # loop over the contours 67 | for c in cnts: 68 | # if the contour is too small, ignore it 69 | if cv2.contourArea(c) < args["min_area"]: 70 | continue 71 | 72 | # compute the bounding box for the contour, draw it on the frame, 73 | # and update the text 74 | (x, y, w, h) = cv2.boundingRect(c) 75 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 76 | # TODO Occupied by whom? Using GAIT, passing the video argument to gait 77 | text = "Occupied" 78 | 79 | # draw the text and timestamp on the frame 80 | cv2.putText(frame, "Room Status: {}".format(text), (10, 20), 81 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2) 82 | cv2.putText(frame, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"), 83 | (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1) 84 | 85 | # show the frame and record if the user presses a key 86 | cv2.imshow("Gait Recognition", frame) 87 | cv2.imshow("Thresh", thresh) 88 | cv2.imshow("Frame Delta", frameDelta) 89 | key = cv2.waitKey(1) & 0xFF 90 | 91 | # if the `q` key is pressed, break from the lop 92 | if key == ord("q"): 93 | break 94 | 95 | # cleanup the camera and close any open windows 96 | vs.release() 97 | cv2.destroyAllWindows() 98 | 99 | 100 | class PedestrianDetection: 101 | 102 | def __init__(self): 103 | print("Pedestrian Detection Ready to Run") 104 | 105 | def run(self): 106 | hog = cv2.HOGDescriptor() 107 | hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) 108 | cap = cv2.VideoCapture("/path/to/test/video") 109 | while True: 110 | r, frame = cap.read() 111 | if r: 112 | start_time = time.time() 113 | frame = cv2.resize(frame, (1280, 720)) # Downscale to improve frame rate 114 | gray_frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY) # HOG needs a grayscale image 115 | 116 | rects, weights = hog.detectMultiScale(gray_frame) 117 | 118 | # Measure elapsed time for detections 119 | end_time = time.time() 120 | print("Elapsed time:", end_time - start_time) 121 | 122 | for i, (x, y, w, h) in enumerate(rects): 123 | if weights[i] < 0.7: 124 | continue 125 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 126 | 127 | cv2.imshow("preview", frame) 128 | k = cv2.waitKey(1) 129 | if k & 0xFF == ord("q"): # Exit condition 130 | break 131 | 132 | 133 | class FaceDetection: 134 | 135 | def __init__(self): 136 | print("Face Detection Ready to run!") 137 | 138 | def run(self): 139 | cascPath = "haarcascade_frontalface_default.xml" 140 | faceCascade = cv2.CascadeClassifier(cascPath) 141 | log.basicConfig(filename='webcam.log', level=log.INFO) 142 | 143 | video_capture = cv2.VideoCapture(0) 144 | anterior = 0 145 | 146 | while True: 147 | if not video_capture.isOpened(): 148 | print('Unable to load camera.') 149 | sleep(5) 150 | pass 151 | 152 | # Capture frame-by-frame 153 | ret, frame = video_capture.read() 154 | 155 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 156 | 157 | faces = faceCascade.detectMultiScale( 158 | gray, 159 | scaleFactor=1.1, 160 | 
minNeighbors=5, 161 | minSize=(30, 30) 162 | ) 163 | 164 | # Draw a rectangle around the faces 165 | for (x, y, w, h) in faces: 166 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 167 | 168 | if anterior != len(faces): 169 | anterior = len(faces) 170 | log.info("faces: " + str(len(faces)) + " at " + str(dt.datetime.now())) 171 | 172 | # Display the resulting frame 173 | cv2.imshow('Video', frame) 174 | 175 | if cv2.waitKey(1) & 0xFF == ord('q'): 176 | break 177 | 178 | # Display the resulting frame 179 | cv2.imshow('Video', frame) 180 | 181 | # When everything is done, release the capture 182 | video_capture.release() 183 | cv2.destroyAllWindows() 184 | -------------------------------------------------------------------------------- /opencvutils/detection/test.py: -------------------------------------------------------------------------------- 1 | from detection import MotionDetection, PedestrianDetection, FaceDetection 2 | 3 | 4 | def motion_test(): 5 | motion_obj = MotionDetection() 6 | motion_obj.run() 7 | 8 | 9 | def face_test(): 10 | face_obj = FaceDetection() 11 | face_obj.run() 12 | 13 | 14 | def ped_test(): 15 | ped_obj = PedestrianDetection() 16 | ped_obj.run() 17 | 18 | 19 | #motion_test()#face_test() 20 | ped_test() 21 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/frontal_face/object_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import sys 3 | import logging as log 4 | import datetime as dt 5 | from time import sleep 6 | 7 | cascPath = "haarcascade_frontalface_default.xml" 8 | faceCascade = cv2.CascadeClassifier(cascPath) 9 | log.basicConfig(filename='webcam.log',level=log.INFO) 10 | 11 | video_capture = cv2.VideoCapture(0) 12 | anterior = 0 13 | 14 | while True: 15 | if not video_capture.isOpened(): 16 | print('Unable to load camera.') 17 | sleep(5) 18 | pass 19 | 20 | # Capture frame-by-frame 21 | ret, frame = video_capture.read() 22 | 23 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 24 | 25 | faces = faceCascade.detectMultiScale( 26 | gray, 27 | scaleFactor=1.1, 28 | minNeighbors=5, 29 | minSize=(30, 30) 30 | ) 31 | 32 | # Draw a rectangle around the faces 33 | for (x, y, w, h) in faces: 34 | cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2) 35 | 36 | if anterior != len(faces): 37 | anterior = len(faces) 38 | log.info("faces: "+str(len(faces))+" at "+str(dt.datetime.now())) 39 | 40 | 41 | # Display the resulting frame 42 | cv2.imshow('Video', frame) 43 | 44 | 45 | if cv2.waitKey(1) & 0xFF == ord('q'): 46 | break 47 | 48 | # Display the resulting frame 49 | cv2.imshow('Video', frame) 50 | 51 | # When everything is done, release the capture 52 | video_capture.release() 53 | cv2.destroyAllWindows() 54 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/full_body/motion_detector.py: -------------------------------------------------------------------------------- 1 | # python motion_detector.py 2 | # python motion_detector.py --video "path to recorder video" 3 | 4 | import argparse 5 | import datetime 6 | import time 7 | import cv2 8 | 9 | # construct the argument parser and parse the arguments 10 | ap = argparse.ArgumentParser() 11 | ap.add_argument("-v", "--video", help="path to the video file") 12 | ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size") 13 | args = vars(ap.parse_args()) 14 | 15 | # if the video argument is None, then 
we are reading from webcam 16 | if args.get("video", None) is None: 17 | vs = cv2.VideoCapture(0) 18 | time.sleep(2.0) 19 | 20 | # otherwise, we are reading from a video file 21 | else: 22 | vs = cv2.VideoCapture(args["video"]) 23 | 24 | # initialize the first frame in the video stream 25 | firstFrame = None 26 | 27 | # loop over the frames of the video 28 | while True: 29 | # grab the current frame and initialize the occupied/unoccupied 30 | # text 31 | ret, frame = vs.read() 32 | text = "Unoccupied" 33 | 34 | # if the frame could not be grabbed, then we have reached the end 35 | # of the video 36 | if frame is None: 37 | break 38 | 39 | # resize the frame, convert it to grayscale, and blur it 40 | frame = cv2.resize(frame, (500, frame.shape[0])) 41 | gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) 42 | gray = cv2.GaussianBlur(gray, (21, 21), 0) 43 | 44 | # if the first frame is None, initialize it 45 | if firstFrame is None: 46 | firstFrame = gray 47 | continue 48 | 49 | # compute the absolute difference between the current frame and 50 | # first frame 51 | frameDelta = cv2.absdiff(firstFrame, gray) 52 | thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1] 53 | 54 | # dilate the thresholded image to fill in holes, then find contours 55 | # on thresholded image 56 | thresh = cv2.dilate(thresh, None, iterations=2) 57 | cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, 58 | cv2.CHAIN_APPROX_SIMPLE) 59 | cnts = cnts[0] if len(cnts) == 2 else cnts[1] # findContours returns (contours, hierarchy) on OpenCV 4.x and (image, contours, hierarchy) on 3.x 60 | 61 | # loop over the contours 62 | for c in cnts: 63 | # if the contour is too small, ignore it 64 | if cv2.contourArea(c) < args["min_area"]: 65 | continue 66 | 67 | # compute the bounding box for the contour, draw it on the frame, 68 | # and update the text 69 | (x, y, w, h) = cv2.boundingRect(c) 70 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 71 | # TODO Occupied by whom?
Using GAIT, passing the video argument to gait 72 | text = "Occupied" 73 | 74 | # draw the text and timestamp on the frame 75 | cv2.putText(frame, "Room Status: {}".format(text), (10, 20), 76 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2) 77 | cv2.putText(frame, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"), 78 | (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1) 79 | 80 | # show the frame and record if the user presses a key 81 | cv2.imshow("Gait Recognition", frame) 82 | cv2.imshow("Thresh", thresh) 83 | cv2.imshow("Frame Delta", frameDelta) 84 | key = cv2.waitKey(1) & 0xFF 85 | 86 | # if the `q` key is pressed, break from the lop 87 | if key == ord("q"): 88 | break 89 | 90 | # cleanup the camera and close any open windows 91 | vs.release() 92 | cv2.destroyAllWindows() 93 | -------------------------------------------------------------------------------- /opencvutils/detection_tests/pedestrian/pedestrian_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import time 3 | 4 | hog = cv2.HOGDescriptor() 5 | hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) 6 | cap = cv2.VideoCapture("/path/to/test/video") 7 | while True: 8 | r, frame = cap.read() 9 | if r: 10 | start_time = time.time() 11 | frame = cv2.resize(frame, (1280, 720)) # Downscale to improve frame rate 12 | gray_frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY) # HOG needs a grayscale image 13 | 14 | rects, weights = hog.detectMultiScale(gray_frame) 15 | 16 | # Measure elapsed time for detections 17 | end_time = time.time() 18 | print("Elapsed time:", end_time - start_time) 19 | 20 | for i, (x, y, w, h) in enumerate(rects): 21 | if weights[i] < 0.7: 22 | continue 23 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 24 | 25 | cv2.imshow("preview", frame) 26 | k = cv2.waitKey(1) 27 | if k & 0xFF == ord("q"): # Exit condition 28 | break 29 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | pandas 3 | keras 4 | tensorflow 5 | imageio 6 | matplotlib 7 | h5py 8 | opencv-python 9 | -------------------------------------------------------------------------------- /yoloutils/detection/detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | 5 | ap = argparse.ArgumentParser() 6 | ap.add_argument('-i', '--image', required=True, 7 | help='path to input image') 8 | ap.add_argument('-c', '--config', required=True, 9 | help='path to yolo config file') 10 | ap.add_argument('-w', '--weights', required=True, 11 | help='path to yolo pre-trained weights') 12 | ap.add_argument('-cl', '--classes', required=True, 13 | help='path to text file containing class names') 14 | args = ap.parse_args() 15 | 16 | 17 | def get_output_layers(net): 18 | 19 | layer_names = net.getLayerNames() 20 | 21 | output_layers = [layer_names[i[0] - 1] 22 | for i in net.getUnconnectedOutLayers()] 23 | 24 | return output_layers 25 | 26 | 27 | def draw_prediction(img, class_id, confidence, x, y, x_plus_w, y_plus_h): 28 | 29 | label = str(classes[class_id]) 30 | 31 | color = COLORS[class_id] 32 | 33 | cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2) 34 | 35 | cv2.putText(img, label, (x-10, y-10), 36 | cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2) 37 | 38 | 39 | image = cv2.imread(args.image) 40 | 41 | Width = 
image.shape[1] 42 | Height = image.shape[0] 43 | scale = 0.00392 44 | 45 | classes = None 46 | 47 | with open(args.classes, 'r') as f: 48 | classes = [line.strip() for line in f.readlines()] 49 | 50 | COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 51 | 52 | net = cv2.dnn.readNet(args.weights, args.config) 53 | 54 | blob = cv2.dnn.blobFromImage( 55 | image, scale, (416, 416), (0, 0, 0), True, crop=False) 56 | 57 | net.setInput(blob) 58 | 59 | outs = net.forward(get_output_layers(net)) 60 | 61 | class_ids = [] 62 | confidences = [] 63 | boxes = [] 64 | conf_threshold = 0.5 65 | nms_threshold = 0.4 66 | 67 | 68 | for out in outs: 69 | for detection in out: 70 | scores = detection[5:] 71 | class_id = np.argmax(scores) 72 | confidence = scores[class_id] 73 | if confidence > 0.5: 74 | center_x = int(detection[0] * Width) 75 | center_y = int(detection[1] * Height) 76 | w = int(detection[2] * Width) 77 | h = int(detection[3] * Height) 78 | x = center_x - w / 2 79 | y = center_y - h / 2 80 | class_ids.append(class_id) 81 | confidences.append(float(confidence)) 82 | boxes.append([x, y, w, h]) 83 | 84 | 85 | indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold) 86 | 87 | for i in indices: 88 | i = i[0] 89 | box = boxes[i] 90 | x = box[0] 91 | y = box[1] 92 | w = box[2] 93 | h = box[3] 94 | draw_prediction(image, class_ids[i], confidences[i], round( 95 | x), round(y), round(x+w), round(y+h)) 96 | 97 | cv2.imshow("object detection", image) 98 | cv2.waitKey() 99 | 100 | cv2.imwrite("object-detection.jpg", image) 101 | cv2.destroyAllWindows() 102 | -------------------------------------------------------------------------------- /yoloutils/detection/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rshrc/gait-recognition/30dc1a7ed7432a5be510229053ba1e278aefea55/yoloutils/detection/dog.jpg -------------------------------------------------------------------------------- /yoloutils/detection/general_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | 5 | 6 | class DetectObject: 7 | 8 | def __init__(self, ap): 9 | 10 | self.classes = None 11 | self.ap = argparse.ArgumentParser() 12 | self.ap.add_argument('-i', '--image', required=True, 13 | help='path to input image') 14 | self.ap.add_argument('-c', '--config', required=True, 15 | help='path to yolo config file') 16 | self.ap.add_argument('-w', '--weights', required=True, 17 | help='path to yolo pre-trained weights') 18 | self.ap.add_argument('-cl', '--classes', required=True, 19 | help='path to text file containing class names') 20 | self.args = ap.parse_args() 21 | 22 | def get_output_layers(self, net): 23 | 24 | self.layer_names = net.getLayerNames() 25 | 26 | self.output_layers = [self.layer_names[i[0] - 1] 27 | for i in self.net.getUnconnectedOutLayers()] 28 | 29 | return self.output_layers 30 | 31 | def draw_prediction(self, img, class_id, confidence, x, y, x_plus_w, y_plus_h): 32 | 33 | self.label = str(self.classes[self.class_id]) 34 | 35 | self.color = COLORS[class_id] 36 | 37 | self.cv2.rectangle(self.img, (self.x, self.y), 38 | (self.x_plus_w, self.y_plus_h), self.scorescolor, 2) 39 | 40 | self.cv2.putText(self.img, self.label, (x-10, y-10), 41 | self.cv2.FONT_HERSHEY_SIMPLEX, 0.5, self.color, 2) 42 | 43 | def read_image(self): 44 | 45 | self.image = cv2.imread(self.args.image) 46 | 47 | self.Width = self.image.shape[1] 48 | self.Height 
= self.image.shape[0] 49 | self.scale = 0.00392 50 | 51 | with open(self.args.classes, 'r') as f: 52 | self.classes = [line.strip() for line in f.readlines()] 53 | 54 | self.COLORS = np.random.uniform(0, 255, size=(len(classes), 3)) 55 | 56 | self.net = self.cv2.dnn.readNet(self.args.weights, self.args.config) 57 | 58 | self.blob = self.cv2.dnn.blobFromImage( 59 | self.image, self.scale, (416, 416), (0, 0, 0), True, crop=False) 60 | 61 | self.net.setInput(blob) 62 | 63 | self.outs = net.forward(get_output_layers(net)) 64 | 65 | self.class_ids = [] 66 | self.confidences = [] 67 | self.boxes = [] 68 | self.conf_threshold = 0.5 69 | self.nms_threshold = 0.4 70 | 71 | for out in self.outs: 72 | for detection in self.out: 73 | self.scores = detection[5:] 74 | self.class_id = np.argmax(self.scores) 75 | self.confidence = scores[class_id] 76 | if self.confidence > 0.5: 77 | self.center_x = int(detection[0] * Width) 78 | self.center_y = int(detection[1] * Height) 79 | self.w = int(detection[2] * Width) 80 | selfh = int(detection[3] * Height) 81 | self.x = center_x - w / 2 82 | self.y = center_y - h / 2 83 | self.class_ids.append(class_id) 84 | self.confidences.append(float(confidence)) 85 | self.boxes.append([x, y, w, h]) 86 | 87 | self.indices = cv2.dnn.NMSBoxes( 88 | boxes, confidences, conf_threshold, nms_threshold) 89 | 90 | for i in indices: 91 | i = i[0] 92 | box = boxes[i] 93 | x = box[0] 94 | y = box[1] 95 | w = box[2] 96 | h = box[3] 97 | draw_prediction(image, class_ids[i], confidences[i], round( 98 | x), round(y), round(x+w), round(y+h)) 99 | 100 | cv2.imshow("object detection", image) 101 | cv2.waitKey() 102 | 103 | cv2.imwrite("object-detection.jpg", image) 104 | cv2.destroyAllWindows() 105 | -------------------------------------------------------------------------------- /yoloutils/detection/testing_gen_detection.py: -------------------------------------------------------------------------------- 1 | from general_detection import DetectObject 2 | -------------------------------------------------------------------------------- /yoloutils/detection/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | 
[shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | 
activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | 
[convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 
| pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 -------------------------------------------------------------------------------- /yoloutils/detection/yolov3.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush -------------------------------------------------------------------------------- /yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 
91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 
| [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | 
batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | 
activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 -------------------------------------------------------------------------------- /yolov3.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | couch 59 | potted plant 60 | bed 61 | dining table 62 | toilet 63 | tv 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush --------------------------------------------------------------------------------
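
The three [yolo] heads in yolov3.cfg predict at three scales (anchor masks 6,7,8, 3,4,5 and 0,1,2 over the nine listed anchors), and each is preceded by a linear 1×1 convolution with filters=255, i.e. 3 anchors × (4 box coordinates + 1 objectness score + 80 class scores) for the 80 labels in yolov3.txt. The sketch below is one minimal, illustrative way to consume these two files with OpenCV's DNN module; it is not the repository's own detection script. It assumes a separately obtained yolov3.weights file, a hypothetical input image path, and a 416×416 input size (adjust to whatever the cfg's [net] section actually specifies).

```python
# Minimal sketch, not the repository's detection code: run the shipped
# yolov3.cfg / yolov3.txt through OpenCV's DNN module.
# Assumptions: yolov3.weights downloaded separately, "dog.jpg" is a
# placeholder input path, and the [net] input size is 416x416.
import cv2
import numpy as np

# One class name per line, as in yolov3.txt
with open("yolov3.txt") as f:
    classes = [line.strip() for line in f if line.strip()]

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

image = cv2.imread("dog.jpg")
h, w = image.shape[:2]

# Scale pixels to [0, 1] and resize to the assumed network input size
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Forward pass through the three [yolo] output layers
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]              # 80 class scores
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # Detections are (cx, cy, bw, bh) relative to the image size
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression drops overlapping duplicate boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    x, y, bw, bh = boxes[i]
    print(f"{classes[class_ids[i]]}: {confidences[i]:.2f} at ({x}, {y}, {bw}, {bh})")
```

Because each [yolo] scale proposes its own boxes, the final cv2.dnn.NMSBoxes step is the usual way to keep only the highest-scoring detection among overlapping candidates when reading Darknet-style YOLO outputs.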