├── requirements.txt ├── README.md ├── source.py └── yolov3-custom.cfg /requirements.txt: -------------------------------------------------------------------------------- 1 | imutils==0.5.4 2 | numpy==1.24.3 3 | opencv_python==4.6.0.66 4 | Pillow==10.0.1 5 | tensorflow==2.13.0 6 | tensorflow_intel==2.13.0 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Helmet-and-Number-Plate-Detection-and-Recognition 2 | Motorcycle accidents have been rising rapidly over the years in many countries, and the helmet is a motorcyclist's main piece of safety equipment. There is therefore a need for an automated system that monitors motorcycles, detects whether riders are wearing helmets, and detects number plates. 3 | 4 | This project proposes an automated system for detecting motorcyclists who are not wearing a helmet and for retrieving motorcycle number plates. The model first checks whether the rider is wearing a helmet; if not, it detects the vehicle's number plate. Object detection is performed with a deep learning algorithm based on a CNN (Convolutional Neural Network), which recognizes specific objects in videos, images, or live feeds. 5 | 6 | This project is a Streamlit-based application for detecting bikes and helmets and recognizing number plates in a video stream. It uses the YOLOv3 object detection model to detect bikes and a CNN model to classify whether a helmet is worn. Number plates are also recognized in the video in real time. 7 | 8 | # Installation 9 | 1. Clone the Repository 10 | 11 | Clone this repository to your local machine. 12 | 2. Install Dependencies: 13 | 14 | Navigate to the project directory and install the required dependencies listed in the requirements.txt file using pip: 15 | 16 | pip install -r requirements.txt 17 | 3. Download the YOLO Weights and Configuration: 18 | 19 | Download the YOLOv3 weights (yolov3-custom_7000.weights) and configuration (yolov3-custom.cfg) files. You can obtain these files from your own YOLOv3 training run or from a pre-trained YOLOv3 model. Place these files in the project directory. 20 | 4. Download the Helmet Detection Model: 21 | 22 | Download the helmet detection model (helmet-nonhelmet_cnn.h5) and place it in the project directory. You can train this model on your own dataset or use a pre-trained one. 23 | 24 | # Usage 25 | 1. Run the Streamlit App 26 | 27 | To run the Streamlit app, use the following command: 28 | 29 | streamlit run source.py 30 | 31 | This will start the Streamlit development server and open the app in your default web browser. 32 | 33 | 2. Upload a Video File 34 | 35 | On the Streamlit app, use the file uploader to select a video file (e.g., MP4 or AVI) that you want to process. 36 | 37 | 3. View the Detection and Recognition Results 38 | 39 | The app will display the video with real-time detections of bikes and riders. Each detected rider is labeled "Helmet" or "No Helmet" based on the helmet detection model's prediction. Number plates, if present, will also be recognized and displayed. 40 | 41 | 4. Interact with the App 42 | 43 | You can pause, resume, and navigate through the video using the app's interface. Observe the real-time results as the video plays. 44 | 45 | # File Structure 46 | 47 | * source.py: The main Streamlit app code for helmet, bike, and number plate detection and recognition.
48 | * requirements.txt: A list of required Python packages and their versions. 49 | * yolov3-custom_7000.weights: YOLOv3 custom-trained weights for object detection. 50 | `https://drive.google.com/file/d/17DWQ1WfYHxYD_wab2OQybHwRaDDGN54K/view?usp=sharing` 51 | * yolov3-custom.cfg: YOLOv3 custom model configuration file. 52 | * helmet-nonhelmet_cnn.h5: Helmet detection CNN model weights. 53 | `https://drive.google.com/file/d/1QW5Fw3sWHqSiJIpzxkYLpREjO_OmqX8W/view?usp=sharing` 54 | 55 | # Screenshots 56 | ![ss](https://github.com/FatimaSidra/Helmet-and-Number-Plate-Detection-and-Recognition/assets/112679516/dc00805f-2ce6-457b-b152-5b97f4b497bd) 57 | ![ss2](https://github.com/FatimaSidra/Helmet-and-Number-Plate-Detection-and-Recognition/assets/112679516/3254988d-1fd7-4cba-a53e-bd3efea3ce12) 58 | ![ss3](https://github.com/FatimaSidra/Helmet-and-Number-Plate-Detection-and-Recognition/assets/112679516/a98592c1-06e0-4933-ac5d-5fb4110a7190) 59 | 60 | # Sample Output Video 61 | https://drive.google.com/file/d/1L4BRoO4WndLfTzfy4bOi-Oa7RpTWwOsU/view?usp=sharing 62 | 63 | # Acknowledgements 64 | * This project uses YOLOv3 for object detection. You can find more information about YOLOv3 at the link below. 65 | 66 | https://pjreddie.com/darknet/yolo/ 67 | 68 | * The helmet detection model is a CNN-based classifier used for detecting helmets on bike riders. Number plate recognition is performed in real time to identify and display number plates. 69 | * Special thanks to the Streamlit community for creating an easy-to-use web framework for data science applications. Visit Streamlit's official website for more information. 70 | 71 | Feel free to customize and extend this project to suit your specific needs or explore other object detection and recognition tasks using Streamlit. 72 | -------------------------------------------------------------------------------- /source.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import cv2 3 | import numpy as np 4 | import os 5 | from PIL import Image 6 | import time 7 | import imutils 8 | from tensorflow.keras.models import load_model 9 | 10 | # Load the custom-trained YOLOv3 model used for bike and number plate detection 11 | net = cv2.dnn.readNet("yolov3-custom_7000.weights", "yolov3-custom.cfg") 12 | net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)  # requires an OpenCV build with CUDA support 13 | net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) 14 | 15 | # Names of the unconnected output layers (the three YOLO detection heads) 16 | output_layers = net.getUnconnectedOutLayersNames() 17 | 18 | # Load helmet detection model 19 | model = load_model('helmet-nonhelmet_cnn.h5') 20 | st.write('Model loaded!!!') 21 | 22 | st.title("Bike, Helmet and Number Plate Detection and Recognition") 23 | 24 | uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "avi"]) 25 | if uploaded_file is not None: 26 | # Save the uploaded file to a temporary directory 27 | os.makedirs("temp", exist_ok=True) 28 | temp_file_path = os.path.join("temp", uploaded_file.name) 29 | with open(temp_file_path, "wb") as temp_file: 30 | temp_file.write(uploaded_file.read()) 31 | 32 | video = cv2.VideoCapture(temp_file_path) 33 | 34 | if not video.isOpened(): 35 | st.error("Error: Could not open video file.") 36 | else: 37 | stframe = st.empty() 38 | 39 | while True: 40 | ret, frame = video.read() 41 | 42 | if not ret: 43 | break 44 | 45 | frame = imutils.resize(frame, height=500) 46 | height, width = frame.shape[:2] 47 | 48 | blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False) 49 | net.setInput(blob) 50 | 51 | # Run a forward pass through all YOLO output layers 52 | outs = net.forward(output_layers)
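53 | # Note on the output format (descriptive comment added for clarity): each row of a YOLO output is 54 | # [center_x, center_y, width, height, objectness, class scores...], with the box coordinates 55 | # normalised to the input size, so they are scaled back to pixel values in the loop below.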
56 | 57 | confidences = [] 58 | boxes = [] 59 | classIds = [] 60 | 61 | for out in outs: 62 | for detection in out: 63 | scores = detection[5:] 64 | class_id = np.argmax(scores) 65 | confidence = scores[class_id] 66 | if confidence > 0.3: 67 | center_x = int(detection[0] * width) 68 | center_y = int(detection[1] * height) 69 | w = int(detection[2] * width) 70 | h = int(detection[3] * height) 71 | x = int(center_x - w / 2) 72 | y = int(center_y - h / 2) 73 | boxes.append([x, y, w, h]) 74 | confidences.append(float(confidence)) 75 | classIds.append(class_id) 76 | 77 | indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4) 78 | 79 | for i in range(len(boxes)): 80 | if i in indexes: 81 | x, y, w, h = boxes[i] 82 | if classIds[i] == 0: # bike 83 | helmet_roi = frame[max(0, y):max(0, y) + max(0, h) // 4, max(0, x):max(0, x) + max(0, w)] # the rider's head is assumed to lie in the top quarter of the bike box 84 | if helmet_roi.shape[0] > 0 and helmet_roi.shape[1] > 0: 85 | helmet_roi = cv2.resize(helmet_roi, (224, 224)) 86 | helmet_roi = np.array(helmet_roi, dtype='float32') 87 | helmet_roi = helmet_roi.reshape(1, 224, 224, 3) 88 | helmet_roi = helmet_roi / 255.0 89 | prediction = int(model.predict(helmet_roi)[0][0] > 0.5) # threshold the CNN output at 0.5 (0 = helmet) instead of truncating it 90 | if prediction == 0: 91 | frame = cv2.putText(frame, 'Helmet', (x, y - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, 92 | (0, 255, 0), 2) 93 | else: 94 | frame = cv2.putText(frame, 'No Helmet', (x, y - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, 95 | (0, 0, 255), 2) 96 | cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) 97 | 98 | stframe.image(frame, channels="BGR", use_column_width=True) 99 | 100 | video.release() 101 | # Remove the temporary video file 102 | os.remove(temp_file_path) 103 | -------------------------------------------------------------------------------- /yolov3-custom.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | #batch=64 4 | #subdivisions=16 5 | # Training 6 | batch=64 7 | subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 6000 21 | policy=steps 22 | steps=4800,5400 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93
| [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 
| activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 
532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=21 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 41,176, 75,283, 123,320, 194,348, 273,373,0,0,0,0,0,0,0,0 610 | classes=2 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .7 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=21 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 41,176, 75,283, 123,320, 194,348, 273,373,0,0,0,0,0,0,0,0 696 | classes=2 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .7 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 
750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=21 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 41,176, 75,283, 123,320, 194,348, 273,373,0,0,0,0,0,0,0,0 783 | classes=2 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .7 787 | truth_thresh = 1 788 | random=1 789 | 790 | --------------------------------------------------------------------------------