├── README.md
├── models
│   ├── best.pt
│   └── yolo11n.pt
├── requirements.txt
├── rolling_video
│   └── Video_20241001164011269.avi
├── tracker.py
├── yolo_detection_tracking.py
└── yolo_detector.py
/README.md:
--------------------------------------------------------------------------------
1 | YOLO Object Detection and Tracking
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 | A real-time object detection and tracking system using YOLO 11 and Deep SORT.
10 |
11 |
12 | ---
13 |
27 |
28 | 📖 Overview
29 |
30 | This project implements real-time object detection and tracking using YOLO and Deep SORT. YOLO detects objects in each frame, and Deep SORT associates those detections across consecutive frames so that each object keeps a persistent ID.
31 |
32 |
33 | ---
34 |
35 | 🌟 Features
36 |
37 |
38 | - Real-time object detection using YOLO.
39 | - Deep SORT object tracking with ID persistence across frames.
40 | - Customizable detection confidence threshold.
41 | - Aspect ratio preserved by padding frames when they are resized to the square YOLO input size.
42 | - Filter to track only objects that appear in the center of the frame (not applied in the listed yolo_detection_tracking.py).
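The padding arithmetic behind the aspect-ratio feature can be sketched in plain Python. This is a minimal sketch that mirrors the formulas in `resize_with_padding` and `correct_bbox` from `yolo_detection_tracking.py`. The helper names `letterbox_params` and `to_original` are hypothetical, and the 1280x720 frame with a 640 target is only an illustrative choice (the script itself uses 416).

```python
def letterbox_params(width, height, target_size):
    """Return (scale, new_w, new_h, pad_w, pad_h) for fitting a frame into a square."""
    # Use the smaller ratio so the whole frame fits inside the square.
    scale = min(target_size / width, target_size / height)
    new_w = int(width * scale)
    new_h = int(height * scale)
    # Split the leftover space evenly; these are the left and top pads.
    pad_w = (target_size - new_w) // 2
    pad_h = (target_size - new_h) // 2
    return scale, new_w, new_h, pad_w, pad_h


def to_original(box, scale, pad_w, pad_h):
    """Map an (x1, y1, x2, y2) box from the padded square back to the source frame."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_w) / scale, (y1 - pad_h) / scale,
            (x2 - pad_w) / scale, (y2 - pad_h) / scale)


scale, new_w, new_h, pad_w, pad_h = letterbox_params(1280, 720, 640)
print(scale, new_w, new_h, pad_w, pad_h)
print(to_original((0, 140, 640, 500), scale, pad_w, pad_h))
```

Because detection and tracking run on the padded square, every box has to pass through the inverse mapping before it is drawn on the original frame.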
43 |
44 |
45 | ---
46 |
47 | 🛠️ Dependencies
48 | Make sure to install the following Python libraries:
49 |
50 | pip install opencv-python ultralytics deep-sort-realtime numpy
51 |
52 |
53 | - opencv-python - For handling video frames and drawing bounding boxes.
54 | - ultralytics - To load and run the YOLO model (pulls in torch as a dependency).
55 | - deep_sort_realtime - For object tracking across frames.
56 | - numpy - General-purpose array operations.
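To check an environment against the versions pinned in this repository's `requirements.txt`, something like the following could be used. It is a minimal sketch built on the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Versions pinned in requirements.txt of this repository.
PINNED = {
    "opencv-python": "4.10.0.84",
    "ultralytics": "8.3.1",
    "deep-sort-realtime": "1.3.2",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, found in installed_versions(PINNED).items():
    status = "ok" if found == PINNED[name] else "differs or missing"
    print(f"{name}: installed={found}, pinned={PINNED[name]} ({status})")
```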
57 |
58 |
59 | ---
60 |
61 | 💻 Installation
62 |
63 |
64 | - Clone the repository:
65 |
66 | git clone https://github.com/iamrukeshduwal/yolov11_real_time_object_detection_with_DeepSORT.git
67 | cd yolov11_real_time_object_detection_with_DeepSORT
68 |
69 |
70 | - Install the required Python libraries:
71 |
72 | pip install -r requirements.txt
73 |
74 | - Ensure your YOLO model weights are placed in the correct directory and update MODEL_PATH in yolo_detection_tracking.py accordingly.
75 |
76 |
77 | ---
78 |
79 | 🚀 Usage
80 | Run the following command to start detecting and tracking objects in a video:
81 |
82 | python yolo_detection_tracking.py
83 |
84 | Modify the video path (VIDEO_PATH) and parameters (e.g., confidence threshold) in yolo_detection_tracking.py to suit your needs.
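If editing the script for every run becomes tedious, the same settings could be exposed on the command line. The sketch below is an assumption, not something the repository provides: the flags `--model`, `--video`, `--conf` and `--fps` are hypothetical, and their defaults are copied from the constants in `yolo_detection_tracking.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line overrides for the script's constants."""
    parser = argparse.ArgumentParser(description="YOLO + Deep SORT tracking")
    parser.add_argument("--model", default="./models/best.pt",
                        help="path to the YOLO weights")
    parser.add_argument("--video",
                        default="./rolling_video/Video_20241001164011269.avi",
                        help="path to the input video")
    parser.add_argument("--conf", type=float, default=0.4,
                        help="detection confidence threshold")
    parser.add_argument("--fps", type=float, default=3,
                        help="display rate in frames per second")
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of sys.argv.
args = parse_args(["--conf", "0.3"])
print(args.model, args.video, args.conf, args.fps)
```

The parsed values would then replace `MODEL_PATH`, `VIDEO_PATH`, the `confidence` argument and `DESIRED_FPS` inside `main()`.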
85 |
86 | ---
87 |
88 | 📝 Code Explanation
89 |
90 | yolo_detection_tracking.py
91 | The main script that handles video input, object detection with YOLO, and tracking with Deep SORT.
92 |
93 | detector = YoloDetector(model_path=MODEL_PATH, confidence=0.4)
94 | tracker = Tracker()
95 |
96 |
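97 | The script runs detection on each resized, padded frame, updates the tracker, and draws every confirmed track's box and ID on the original frame. The listed version contains no center-of-frame filter.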
98 |
99 | yolo_detector.py
100 | Contains the YoloDetector class that loads the YOLO model and performs object detection.
101 |
102 | detections = detector.detect(frame)
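Each detection handed to the tracker is a tuple built from a YOLO corner-format box. As a minimal sketch, the conversion from `(x1, y1, x2, y2)` corners to a left/top/width/height box could look like the following. The helper name `xyxy_to_detection` is hypothetical, and the tuple order `([left, top, w, h], confidence, detection_class)` is the one described in the deep_sort_realtime documentation.

```python
def xyxy_to_detection(x1, y1, x2, y2, confidence, class_id):
    """Convert corner coordinates into a ([left, top, w, h], confidence, class) tuple."""
    left, top = int(x1), int(y1)
    width, height = int(x2) - left, int(y2) - top
    # float() turns a tensor-like or NumPy scalar into a plain Python float.
    return ([left, top, width, height], float(confidence), class_id)


print(xyxy_to_detection(10.6, 20.2, 110.9, 220.7, 0.87, 0))
```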
103 |
104 |
105 | tracker.py
106 | Defines the Tracker class, which implements object tracking using the Deep SORT algorithm.
107 |
108 | tracking_ids, boxes = tracker.track(detections, resized_frame)
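The two returned lists are parallel: `tracking_ids[i]` belongs to `boxes[i]`, and only confirmed tracks are included. The sketch below reproduces that filtering loop without needing the tracker installed. `FakeTrack` is a hypothetical stand-in that offers only the three members `Tracker.track` uses (`is_confirmed()`, `track_id`, `to_ltrb()`).

```python
from dataclasses import dataclass


@dataclass
class FakeTrack:
    """Hypothetical stand-in for a Deep SORT track object."""
    track_id: str
    ltrb: tuple
    confirmed: bool

    def is_confirmed(self):
        return self.confirmed

    def to_ltrb(self):
        return self.ltrb


def collect_confirmed(tracks):
    """Return parallel lists of IDs and (left, top, right, bottom) boxes."""
    tracking_ids, boxes = [], []
    for track in tracks:
        if not track.is_confirmed():
            continue  # tentative tracks are skipped until they are confirmed
        tracking_ids.append(track.track_id)
        boxes.append(track.to_ltrb())
    return tracking_ids, boxes


tracks = [
    FakeTrack("1", (10, 10, 50, 50), True),
    FakeTrack("2", (60, 60, 90, 90), False),
]
ids, boxes = collect_confirmed(tracks)
for tracking_id, (left, top, right, bottom) in zip(ids, boxes):
    print(tracking_id, left, top, right, bottom)
```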
109 |
110 |
111 | ---
112 |
113 | ⚙️ Customization
114 |
115 | Adjusting Detection Confidence
116 |
117 | You can change the detection confidence threshold of the YoloDetector by modifying the following line in yolo_detection_tracking.py:
118 |
119 | detector = YoloDetector(model_path=MODEL_PATH, confidence=0.3)
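In the repository the threshold is passed to `model.predict(image, conf=...)`, so Ultralytics does the filtering internally. The plain-Python sketch below only mimics the effect on a few made-up detections in `(box, confidence, class)` form, to show how raising the threshold trims the list. The helper `filter_by_confidence` and the sample values are hypothetical.

```python
def filter_by_confidence(detections, threshold):
    """Keep detections whose confidence is at least the threshold."""
    return [det for det in detections if det[1] >= threshold]


# Made-up detections: ([left, top, w, h], confidence, class_id)
sample = [
    ([0, 0, 10, 10], 0.25, 0),
    ([5, 5, 10, 10], 0.35, 0),
    ([1, 1, 4, 4], 0.90, 0),
]

for threshold in (0.2, 0.3, 0.5):
    kept = filter_by_confidence(sample, threshold)
    print(threshold, len(kept))
```

A lower threshold passes more candidate boxes to the tracker, which can help with missed objects at the cost of more spurious tracks.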
120 |
121 |
122 | Filtering Objects by Position
123 |
124 | To track only objects detected in the middle of the frame, add center-filtering logic to the main loop in yolo_detection_tracking.py. The script as listed here draws every confirmed track without filtering by position.
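One possible shape for a center filter, as a minimal sketch: keep a box only when its center point falls inside a central window of the frame. The helper `is_in_center` and its `margin` parameter are hypothetical, and the 416x416 frame size matches `YOLO_INPUT_SIZE` in the script.

```python
def is_in_center(box, frame_w, frame_h, margin=0.25):
    """Return True if the box center lies inside the central window of the frame.

    margin is the fraction of the frame excluded on each side, so 0.25 keeps
    the middle 50% horizontally and vertically.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (frame_w * margin <= cx <= frame_w * (1 - margin)
            and frame_h * margin <= cy <= frame_h * (1 - margin))


boxes = [(180, 180, 236, 236), (0, 0, 40, 40)]
centered = [box for box in boxes if is_in_center(box, 416, 416)]
print(centered)
```

Applied to the `(tracking_ids, boxes)` pairs returned by the tracker, this would let the drawing loop skip tracks near the frame edges.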
125 |
126 | ---
127 |
128 | 📜 License
129 | This project is licensed under the MIT License. See the LICENSE file for more details.
130 |
131 | ---
132 |
133 |
134 |
--------------------------------------------------------------------------------
/models/best.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iamrukeshduwal/yolov11_real_time_object_detection_with_DeepSORT/eece29c0576d150f5aa10bc46042cfecda99e2e6/models/best.pt
--------------------------------------------------------------------------------
/models/yolo11n.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iamrukeshduwal/yolov11_real_time_object_detection_with_DeepSORT/eece29c0576d150f5aa10bc46042cfecda99e2e6/models/yolo11n.pt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | opencv-python==4.10.0.84
2 | ultralytics==8.3.1
3 | deep-sort-realtime==1.3.2
4 |
--------------------------------------------------------------------------------
/rolling_video/Video_20241001164011269.avi:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iamrukeshduwal/yolov11_real_time_object_detection_with_DeepSORT/eece29c0576d150f5aa10bc46042cfecda99e2e6/rolling_video/Video_20241001164011269.avi
--------------------------------------------------------------------------------
/tracker.py:
--------------------------------------------------------------------------------
1 | from deep_sort_realtime.deepsort_tracker import DeepSort
2 |
3 |
4 | class Tracker:
5 |     def __init__(self):
6 |         self.object_tracker = DeepSort(
7 |             max_age=5,
8 |             n_init=2,
9 |             nms_max_overlap=0.5,  # Adjusted for better overlap handling
10 |             max_cosine_distance=0.8,
11 |             nn_budget=None,
12 |             override_track_class=None,
13 |             embedder="mobilenet",
14 |             half=True,
15 |             bgr=True,
16 |             embedder_model_name=None,
17 |             embedder_wts=None,
18 |             polygon=False,
19 |             today=None
20 |         )
21 |
22 |
23 |
24 |     def track(self, detections, frame):
25 |         tracks = self.object_tracker.update_tracks(detections, frame=frame)
26 |
27 |         tracking_ids = []
28 |         boxes = []
29 |         for track in tracks:
30 |             if not track.is_confirmed():
31 |                 continue
32 |             tracking_ids.append(track.track_id)
33 |             ltrb = track.to_ltrb()
34 |             boxes.append(ltrb)
35 |
36 |         return tracking_ids, boxes
37 |
--------------------------------------------------------------------------------
/yolo_detection_tracking.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | import time
4 | from yolo_detector import YoloDetector
5 | from tracker import Tracker
6 |
7 | MODEL_PATH = "./models/best.pt"
8 | VIDEO_PATH = "./rolling_video/Video_20241001164011269.avi"
9 |
10 | # Set the desired input size for YOLO (416x416)
11 | YOLO_INPUT_SIZE = 416
12 |
13 | # Set the desired FPS for frame control
14 | DESIRED_FPS = 3  # Slow down to 3 frames per second
15 |
16 | def resize_with_padding(image, target_size):
17 |     h, w = image.shape[:2]  # works for both color and grayscale frames
18 |     scale = min(target_size / w, target_size / h)
19 |     new_w = int(w * scale)
20 |     new_h = int(h * scale)
21 |
22 |     resized_image = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
23 |
24 |     pad_w = (target_size - new_w) // 2
25 |     pad_h = (target_size - new_h) // 2
26 |
27 |     padded_image = cv2.copyMakeBorder(resized_image, pad_h, target_size - new_h - pad_h,
28 |                                       pad_w, target_size - new_w - pad_w, cv2.BORDER_CONSTANT, value=[128, 128, 128])
29 |
30 |     return padded_image, scale, pad_w, pad_h
31 |
32 | def correct_bbox(bbox, scale, pad_w, pad_h, original_w, original_h):
33 |     # Correct bounding box by reversing the scaling and padding
34 |     x1 = (bbox[0] - pad_w) / scale
35 |     y1 = (bbox[1] - pad_h) / scale
36 |     x2 = (bbox[2] - pad_w) / scale
37 |     y2 = (bbox[3] - pad_h) / scale
38 |
39 |     # Clip the bounding box to ensure it's within image bounds
40 |     x1 = max(0, min(original_w, x1))
41 |     y1 = max(0, min(original_h, y1))
42 |     x2 = max(0, min(original_w, x2))
43 |     y2 = max(0, min(original_h, y2))
44 |
45 |     return [int(x1), int(y1), int(x2), int(y2)]
46 |
47 | def main():
48 |     detector = YoloDetector(model_path=MODEL_PATH, confidence=0.4)
49 |     tracker = Tracker()
50 |
51 |     cap = cv2.VideoCapture(VIDEO_PATH)
52 |
53 |     if not cap.isOpened():
54 |         print("Error: Unable to open video file.")
55 |         return
56 |
57 |     # Frame control using a delay (calculated based on desired FPS)
58 |     frame_delay = int(1000 / DESIRED_FPS)  # Delay in milliseconds
59 |
60 |     while True:
61 |         ret, frame = cap.read()
62 |         if not ret:
63 |             break
64 |
65 |         original_h, original_w = frame.shape[:2]
66 |
67 |         # Resize the frame with padding to 416x416
68 |         resized_frame, scale, pad_w, pad_h = resize_with_padding(frame, YOLO_INPUT_SIZE)
69 |
70 |         start_time = time.perf_counter()
71 |
72 |         # Detect objects on the resized 416x416 frame
73 |         detections = detector.detect(resized_frame)
74 |         tracking_ids, boxes = tracker.track(detections, resized_frame)
75 |
76 |         # Draw the bounding boxes on the original frame (resize back)
77 |         for tracking_id, bbox in zip(tracking_ids, boxes):
78 |             corrected_bbox = correct_bbox(bbox, scale, pad_w, pad_h, original_w, original_h)
79 |
80 |             # Draw the bounding box and tracking ID on the original frame
81 |             cv2.rectangle(frame, (corrected_bbox[0], corrected_bbox[1]), (corrected_bbox[2], corrected_bbox[3]), (0, 0, 255), 2)
82 |             cv2.putText(frame, str(tracking_id), (corrected_bbox[0], corrected_bbox[1] - 10),
83 |                         cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
84 |
85 |         end_time = time.perf_counter()
86 |         fps = 1 / (end_time - start_time)
87 |         print(f"Current fps: {fps}")
88 |
89 |         # Show the frame
90 |         cv2.imshow("Frame", frame)
91 |
92 |         # Frame control: wait for a keypress or based on desired FPS
93 |         key = cv2.waitKey(frame_delay) & 0xFF  # Slows down frame processing
94 |
95 |         # Break the loop if 'q' or 'ESC' is pressed
96 |         if key == ord("q") or key == 27:
97 |             break
98 |
99 |     cap.release()
100 |     cv2.destroyAllWindows()
101 |
102 | if __name__ == "__main__":
103 |     main()
104 |
--------------------------------------------------------------------------------
/yolo_detector.py:
--------------------------------------------------------------------------------
1 | from ultralytics import YOLO
2 |
3 |
4 | class YoloDetector:
5 |     def __init__(self, model_path, confidence):
6 |         self.model = YOLO(model_path)
7 |         self.classList = ["data"]
8 |         self.confidence = confidence
9 |
10 |     def detect(self, image):
11 |         results = self.model.predict(image, conf=self.confidence)
12 |         result = results[0]
13 |         detections = self.make_detections(result)
14 |         return detections
15 |
16 |     def make_detections(self, result):
17 |         boxes = result.boxes
18 |         detections = []
19 |         for box in boxes:
20 |             x1, y1, x2, y2 = box.xyxy[0]
21 |             x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
22 |             w, h = x2 - x1, y2 - y1
23 |             class_number = int(box.cls[0])
24 |
25 |             if result.names[class_number] not in self.classList:
26 |                 continue
27 |             conf = float(box.conf[0])  # plain float instead of a tensor
28 |             detections.append(([x1, y1, w, h], conf, class_number))  # (ltwh, confidence, class), the order deep_sort_realtime documents
29 |         return detections
30 |
--------------------------------------------------------------------------------