├── requirements.txt ├── LICENSE ├── README.md └── YO-FLO.py /requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python 2 | torch 3 | transformers 4 | Pillow 5 | numpy 6 | tk 7 | colorama 8 | simpleaudio 9 | requests 10 | matplotlib 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | MIT License 3 | 4 | Copyright (c) 2024 Charles Norton 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # YO-FLO: YOLO-Like Object Detection with Florence Models 2 | 3 | Welcome to YO-FLO, a proof-of-concept implementation of YOLO-like object detection using the Florence-2-base-ft model. Inspired by the powerful YOLO (You Only Look Once) object detection framework, YO-FLO leverages the capabilities of the Florence foundational vision model to achieve real-time inference while maintaining a lightweight footprint. 4 | 5 | ## Table of Contents 6 | 7 | - Introduction 8 | - Features 9 | - Installation 10 | - Usage 11 | - Error Handling 12 | - Contributing 13 | - License 14 | 15 | ## Introduction 16 | 17 | YO-FLO explores whether the new Florence foundational vision model can be implemented in a YOLO-like format for object detection. Florence-2 is designed by Microsoft as a unified vision-language model capable of handling diverse tasks such as object detection, captioning, and segmentation. To achieve this, it uses a sequence-to-sequence framework where images and task-specific prompts are processed to generate the desired text outputs. The model's architecture combines a DaViT vision encoder with a transformer-based multi-modal encoder-decoder, making it versatile and efficient. 18 | 19 | Florence-2 has been trained on the extensive FLD-5B dataset, containing 126 million images and over 5 billion annotations, ensuring high-quality performance across multiple tasks. Despite its relatively small size, Florence-2 demonstrates strong zero-shot and fine-tuning capabilities, making it an excellent choice for real-time applications. 20 | 21 | ## Features 22 | 23 | - **Real-Time Object Detection**: Achieve YOLO-like performance using the Florence-2-base-ft model. 
24 | - **Class-Specific Detection**: Specify the class of objects you want to detect (e.g., 'cat', 'dog').
25 | - **Expression Comprehension**: Ask yes/no questions about the scene (e.g., 'Is the person smiling?') and act on the answers.
26 | - **Beep and Screenshot on Detection**: Toggle options to beep and take screenshots when the target class or phrase is detected.
27 | - **Tkinter GUI**: A user-friendly graphical interface for easy interaction.
28 | - **Cross-Platform Compatibility**: Works on Windows, macOS, and Linux.
29 | - **Toggle Headless Mode**: Enable or disable headless mode for running without a GUI.
30 | - **Inference Rate Display**: Display the number of inferences per second during real-time detection.
31 | - **Screenshot on Yes/No Inference**: Automatically save screenshots based on yes/no answers from expression comprehension.
32 | - **Visual Grounding**: Identify and highlight specific regions in an image based on descriptive phrases.
33 | - **Evaluate Inference Tree**: Use a tree of inference phrases to evaluate multiple conditions in a single run.
34 | - **Plot Bounding Boxes**: Visualize detection results by plotting bounding boxes on the image.
35 | - **Save Screenshots**: Save screenshots of detected objects or regions of interest.
36 | - **Robust Error Handling**: Comprehensive error management for smooth operation.
37 | - **Webcam Detection Control**: Start and stop webcam-based detection with ease.
38 | - **Debug Mode**: Toggle detailed logging for development and troubleshooting purposes.
39 | 
40 | ## Installation
41 | 
42 | ### Prerequisites
43 | 
44 | - Python 3.7 or higher
45 | - pip
46 | 
47 | ### Installing Dependencies
48 | 
49 | ```
50 | pip install torch transformers pillow opencv-python colorama simpleaudio huggingface-hub
51 | ```
52 | 
53 | ## Usage
54 | 
55 | ### Running YO-FLO
56 | 
57 | To start YO-FLO, run the following command:
58 | 
59 | ```
60 | python YO-FLO.py
61 | ```
62 | 
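For reference, every detection pass reduces to a single Florence-2 sequence-to-sequence call through the `transformers` API. The sketch below (with a placeholder image path) mirrors roughly what YO-FLO runs per frame, minus the threading, quantization, and GUI plumbing:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
).eval()
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

image = Image.open("frame.png")  # placeholder: e.g. a frame grabbed from the webcam
task = "<OD>"                    # Florence-2 task token for object detection
inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=1,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(text, task=task, image_size=image.size)
# result -> {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['cat', ...]}}
```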
63 | ### Menu Options
64 | 
65 | 1. **Select Model Path**: Choose a local directory containing the Florence model.
66 | 2. **Download Model from HuggingFace**: Download and initialize the Florence-2-base-ft model from HuggingFace.
67 | 3. **Set Class Name**: Specify the class name you want to detect (leave blank to show all detections).
68 | 4. **Set Phrase**: Enter the phrase for comprehension detection (e.g., 'Is the person smiling?', 'Is the cat lying down?').
69 | 5. **Set Visual Grounding Phrase**: Enter the phrase for visual grounding.
70 | 6. **Set Inference Tree**: Enter multiple inference phrases to evaluate several conditions.
71 | 7. **Toggle Beep on Detection**: Enable or disable the beep sound on detection.
72 | 8. **Toggle Screenshot on Detection**: Enable or disable taking screenshots on detection.
73 | 9. **Toggle Screenshot on Yes/No Inference**: Enable or disable taking screenshots based on yes/no inference results.
74 | 10. **Start Webcam Detection**: Begin real-time object detection using your webcam.
75 | 11. **Stop Webcam Detection**: Stop the webcam detection and return to the menu.
76 | 12. **Toggle Debug Mode**: Enable or disable debug mode for detailed logging.
77 | 13. **Toggle Headless Mode**: Enable or disable headless mode for running without a GUI.
78 | 14. **Exit**: Exit the application.
79 | 
80 | ### Example Workflow
81 | 
82 | 1. Select Model Path or Download Model from HuggingFace.
83 | 2. Set Class Name to specify what you want to detect (e.g., 'cat', 'dog').
84 | 3. Set Phrase for specific phrase-based inference.
85 | 4. Set Visual Grounding Phrase to locate and highlight specific regions of interest.
86 | 5. Set Inference Tree for evaluating multiple conditions.
87 | 6. Toggle Beep on Detection if you want an audible alert.
88 | 7. Toggle Screenshot on Detection if you want to save screenshots of detections.
89 | 8. Toggle Screenshot on Yes/No Inference to save screenshots based on comprehension results.
90 | 9. Start Webcam Detection to begin detecting objects in real time.
91 | 
92 | ## Error Handling
93 | 
94 | YO-FLO includes robust error handling to ensure smooth operation:
95 | 
96 | - **Model Initialization Errors**: Handles cases where the model path is incorrect or the model fails to load.
97 | - **Webcam Access Errors**: Notifies if the webcam cannot be accessed.
98 | - **Image Processing Errors**: Catches errors during frame processing and provides detailed messages.
99 | - **File Not Found Errors**: Alerts if required files (e.g., beep sound file) are missing.
100 | - **General Exception Handling**: Catches and logs any unexpected errors to prevent crashes.
101 | 
102 | ### Example Error Messages
103 | 
104 | - **Error loading model**: Model path not found or model failed to load.
105 | - **Error running object detection**: Issues during the object detection process.
106 | - **Error plotting bounding boxes**: Problems with visualizing detection results.
107 | - **Error toggling beep**: Issues enabling or disabling the beep sound.
108 | - **Error saving screenshot**: Problems saving detection screenshots.
109 | - **OpenCV error**: Errors related to OpenCV operations.
110 | 
111 | ## Contributing
112 | 
113 | We welcome contributions to improve YO-FLO. Please follow these steps:
114 | 
115 | 1. Fork the repository.
116 | 2. Create a new branch (`git checkout -b feature-branch`).
117 | 3. Commit your changes (`git commit -am 'Add new feature'`).
118 | 4. Push to the branch (`git push origin feature-branch`).
119 | 5. Create a new Pull Request.
120 | 
121 | ## License
122 | 
123 | YO-FLO is licensed under the MIT License.
124 | 
125 | ---
126 | 
127 | Thank you for using YO-FLO! We are excited to see what amazing applications you will build with this tool. Happy detecting!
128 | 
--------------------------------------------------------------------------------
/YO-FLO.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import logging
3 | import os
4 | import threading
5 | import time
6 | import sys
7 | import cv2
8 | import torch
9 | from datetime import datetime
10 | from PIL import Image
11 | from transformers import AutoProcessor, AutoModelForCausalLM, BitsAndBytesConfig
12 | from huggingface_hub import snapshot_download, hf_hub_download
13 | from colorama import Fore, Style, init
14 | import tkinter as tk
15 | from tkinter import filedialog, simpledialog, Toplevel
16 | import numpy as np
17 | from dataclasses import dataclass, field
18 | from typing import Optional, List, Tuple, Dict, Any
19 | from concurrent.futures import ThreadPoolExecutor, Future
20 | from queue import Queue, Empty, Full
21 | import re
22 | from pathlib import Path
23 | import gc
24 | 
25 | # Conditional imports for PTZ functionality
26 | PTZ_AVAILABLE = False
27 | try:
28 |     import hid  # For PTZ camera HID
29 |     PTZ_HID_AVAILABLE = True
30 | except ImportError:
31 |     PTZ_HID_AVAILABLE = False
32 |     print(f"{Fore.YELLOW}HID module not available. 
PTZ camera control will be disabled.{Style.RESET_ALL}") 33 | 34 | try: 35 | import msvcrt # For Windows-specific PTZ control with arrow keys 36 | PTZ_MSVCRT_AVAILABLE = True 37 | except ImportError: 38 | PTZ_MSVCRT_AVAILABLE = False 39 | print(f"{Fore.YELLOW}msvcrt module not available. Manual PTZ control with arrow keys will be disabled.{Style.RESET_ALL}") 40 | 41 | # Set overall PTZ availability based on critical components 42 | PTZ_AVAILABLE = PTZ_HID_AVAILABLE 43 | 44 | init(autoreset=True) 45 | 46 | # ============================================================================ 47 | # CONFIGURATION MANAGEMENT 48 | # ============================================================================ 49 | 50 | @dataclass 51 | class AppConfig: 52 | """Central configuration for the YO-FLO application""" 53 | # Model settings 54 | DEFAULT_MODEL: str = "microsoft/Florence-2-base-ft" 55 | MODEL_CACHE_DIR: str = "model" 56 | QUANTIZATION_OPTIONS: List[str] = field(default_factory=lambda: ["none", "4bit"]) 57 | 58 | # PTZ Camera settings 59 | PTZ_VENDOR_ID: int = 0x046D 60 | PTZ_PRODUCT_ID: int = 0x085F 61 | PTZ_USAGE_PAGE: int = 65280 62 | PTZ_USAGE: int = 1 63 | PTZ_COMMAND_DELAY: float = 0.2 64 | 65 | # PTZ Tracking settings 66 | PTZ_DESIRED_RATIO: float = 0.20 67 | PTZ_ZOOM_TOLERANCE: float = 0.4 68 | PTZ_PAN_TILT_TOLERANCE: int = 25 69 | PTZ_PAN_TILT_INTERVAL: float = 0.75 70 | PTZ_ZOOM_INTERVAL: float = 0.5 71 | PTZ_SMOOTHING_FACTOR: float = 0.2 72 | PTZ_MAX_ERRORS: int = 5 73 | 74 | # Recording settings 75 | RECORDING_FPS: float = 20.0 76 | RECORDING_CODEC: str = "XVID" 77 | RECORDING_TIMEOUT: float = 1.0 # Stop recording after 1s of no detection 78 | 79 | # Frame processing settings 80 | MAX_FPS: int = 30 81 | MIN_PROCESS_INTERVAL: float = 1.0 / 30 # Max 30 FPS 82 | FRAME_QUEUE_SIZE: int = 10 83 | PROCESSING_THREADS: int = 4 84 | 85 | # Security settings 86 | ALLOWED_MODEL_EXTENSIONS: List[str] = field(default_factory=lambda: [".bin", ".json", ".safetensors", ".pt"]) 87 | MAX_CLASS_NAME_LENGTH: int = 50 88 | MAX_PHRASE_LENGTH: int = 200 89 | ALLOWED_CLASS_NAME_CHARS: str = r'^[a-zA-Z0-9\s\-_,]+$' 90 | 91 | # Logging settings 92 | LOG_FILE: str = "alerts.log" 93 | LOG_FORMAT: str = "%(asctime)s - %(levelname)s - %(message)s" 94 | 95 | # GUI settings 96 | WINDOW_TITLE: str = "YO-FLO Vision System" 97 | DEFAULT_WEBCAM_INDICES: List[int] = field(default_factory=lambda: [0]) 98 | 99 | # Memory management 100 | CUDA_MEMORY_FRACTION: float = 0.8 101 | CLEANUP_INTERVAL: float = 60.0 # Run cleanup every 60 seconds 102 | 103 | # Global config instance 104 | config = AppConfig() 105 | 106 | # ============================================================================ 107 | # SECURITY UTILITIES 108 | # ============================================================================ 109 | 110 | class SecurityValidator: 111 | """Validates and sanitizes user inputs for security""" 112 | 113 | @staticmethod 114 | def validate_path(path: str, allowed_extensions: List[str] = None) -> Optional[Path]: 115 | """ 116 | Validates a file/directory path for security issues 117 | 118 | :param path: Path to validate 119 | :param allowed_extensions: List of allowed file extensions 120 | :return: Validated Path object or None if invalid 121 | """ 122 | try: 123 | # Convert to Path object for safe handling 124 | safe_path = Path(path).resolve() 125 | 126 | # Check if path exists 127 | if not safe_path.exists(): 128 | logging.warning(f"Path does not exist: {safe_path}") 129 | return None 130 | 131 | # Prevent directory 
traversal 132 | if ".." in str(path): 133 | logging.error(f"Potential directory traversal attempt: {path}") 134 | return None 135 | 136 | # Check file extensions if provided 137 | if allowed_extensions and safe_path.is_file(): 138 | if safe_path.suffix.lower() not in allowed_extensions: 139 | logging.error(f"Invalid file extension: {safe_path.suffix}") 140 | return None 141 | 142 | return safe_path 143 | 144 | except Exception as e: 145 | logging.error(f"Path validation error: {e}") 146 | return None 147 | 148 | @staticmethod 149 | def sanitize_class_names(input_string: str) -> Optional[List[str]]: 150 | """ 151 | Sanitizes class name input from user 152 | 153 | :param input_string: Raw input string 154 | :return: List of sanitized class names or None if invalid 155 | """ 156 | if not input_string or len(input_string) > config.MAX_CLASS_NAME_LENGTH * 10: 157 | return None 158 | 159 | # Check for allowed characters 160 | if not re.match(config.ALLOWED_CLASS_NAME_CHARS, input_string): 161 | logging.warning(f"Invalid characters in class names: {input_string}") 162 | return None 163 | 164 | # Split and clean individual class names 165 | class_names = [] 166 | for name in input_string.split(','): 167 | name = name.strip().lower() 168 | if name and len(name) <= config.MAX_CLASS_NAME_LENGTH: 169 | class_names.append(name) 170 | 171 | return class_names if class_names else None 172 | 173 | @staticmethod 174 | def sanitize_phrase(phrase: str) -> Optional[str]: 175 | """ 176 | Sanitizes phrase input from user 177 | 178 | :param phrase: Raw phrase input 179 | :return: Sanitized phrase or None if invalid 180 | """ 181 | if not phrase or len(phrase) > config.MAX_PHRASE_LENGTH: 182 | return None 183 | 184 | # Remove potentially dangerous characters 185 | sanitized = re.sub(r'[<>\"\'\\]', '', phrase.strip()) 186 | 187 | return sanitized if sanitized else None 188 | 189 | # ============================================================================ 190 | # IMPROVED FRAME PROCESSOR WITH THREADING 191 | # ============================================================================ 192 | 193 | class FrameProcessor: 194 | """Handles frame processing with proper threading and queue management""" 195 | 196 | def __init__(self, max_workers: int = None): 197 | """ 198 | Initialize the frame processor 199 | 200 | :param max_workers: Maximum number of worker threads 201 | """ 202 | self.max_workers = max_workers or config.PROCESSING_THREADS 203 | self.executor = ThreadPoolExecutor(max_workers=self.max_workers) 204 | self.processing_queue = Queue(maxsize=config.FRAME_QUEUE_SIZE) 205 | self.result_queue = Queue(maxsize=config.FRAME_QUEUE_SIZE) 206 | self.active_futures: List[Future] = [] 207 | self.shutdown_flag = threading.Event() 208 | self.last_process_time = time.time() 209 | self.frame_lock = threading.Lock() 210 | self.stats_lock = threading.Lock() 211 | 212 | # Performance statistics 213 | self.frames_processed = 0 214 | self.frames_dropped = 0 215 | self.processing_times = [] 216 | 217 | def should_process_frame(self) -> bool: 218 | """Check if enough time has passed to process next frame (FPS limiting)""" 219 | current_time = time.time() 220 | time_elapsed = current_time - self.last_process_time 221 | 222 | if time_elapsed >= config.MIN_PROCESS_INTERVAL: 223 | self.last_process_time = current_time 224 | return True 225 | return False 226 | 227 | def add_frame(self, frame: np.ndarray, metadata: Dict[str, Any] = None) -> bool: 228 | """ 229 | Add a frame to the processing queue 230 | 231 | :param frame: Frame 
to process 232 | :param metadata: Optional metadata for the frame 233 | :return: True if frame was added, False if queue is full 234 | """ 235 | if self.shutdown_flag.is_set(): 236 | return False 237 | 238 | if not self.should_process_frame(): 239 | with self.stats_lock: 240 | self.frames_dropped += 1 241 | return False 242 | 243 | try: 244 | self.processing_queue.put_nowait({ 245 | 'frame': frame, 246 | 'metadata': metadata or {}, 247 | 'timestamp': time.time() 248 | }) 249 | return True 250 | except Full: 251 | with self.stats_lock: 252 | self.frames_dropped += 1 253 | logging.debug("Frame queue is full, dropping frame") 254 | return False 255 | 256 | def process_frame(self, frame_data: Dict[str, Any], 257 | processing_func: callable) -> Optional[Any]: 258 | """ 259 | Process a single frame with memory management 260 | 261 | :param frame_data: Frame data dictionary 262 | :param processing_func: Function to process the frame 263 | :return: Processing result or None 264 | """ 265 | frame = frame_data['frame'] 266 | start_time = time.time() 267 | result = None 268 | 269 | try: 270 | # Process frame 271 | result = processing_func(frame, frame_data['metadata']) 272 | 273 | # Update statistics 274 | with self.stats_lock: 275 | self.frames_processed += 1 276 | self.processing_times.append(time.time() - start_time) 277 | if len(self.processing_times) > 100: 278 | self.processing_times.pop(0) 279 | 280 | return result 281 | 282 | except Exception as e: 283 | logging.error(f"Error processing frame: {e}") 284 | return None 285 | 286 | finally: 287 | # Memory cleanup 288 | del frame 289 | if torch.cuda.is_available(): 290 | torch.cuda.empty_cache() 291 | gc.collect() 292 | 293 | def submit_frame_batch(self, frames: List[np.ndarray], 294 | processing_func: callable) -> List[Future]: 295 | """ 296 | Submit a batch of frames for processing 297 | 298 | :param frames: List of frames to process 299 | :param processing_func: Function to process each frame 300 | :return: List of futures for the submitted tasks 301 | """ 302 | futures = [] 303 | for frame in frames: 304 | if self.add_frame(frame): 305 | future = self.executor.submit( 306 | self.process_frame, 307 | {'frame': frame, 'metadata': {}, 'timestamp': time.time()}, 308 | processing_func 309 | ) 310 | futures.append(future) 311 | self.active_futures.append(future) 312 | 313 | # Clean up completed futures 314 | self.active_futures = [f for f in self.active_futures if not f.done()] 315 | 316 | return futures 317 | 318 | def get_statistics(self) -> Dict[str, Any]: 319 | """Get processing statistics""" 320 | with self.stats_lock: 321 | avg_time = np.mean(self.processing_times) if self.processing_times else 0 322 | return { 323 | 'frames_processed': self.frames_processed, 324 | 'frames_dropped': self.frames_dropped, 325 | 'average_processing_time': avg_time, 326 | 'queue_size': self.processing_queue.qsize(), 327 | 'active_tasks': len(self.active_futures) 328 | } 329 | 330 | def shutdown(self, wait: bool = True): 331 | """ 332 | Shutdown the frame processor 333 | 334 | :param wait: Whether to wait for pending tasks to complete 335 | """ 336 | self.shutdown_flag.set() 337 | 338 | # Clear queues 339 | while not self.processing_queue.empty(): 340 | try: 341 | self.processing_queue.get_nowait() 342 | except Empty: 343 | break 344 | 345 | # Cancel active futures if not waiting 346 | if not wait: 347 | for future in self.active_futures: 348 | future.cancel() 349 | 350 | self.executor.shutdown(wait=wait) 351 | logging.info("Frame processor shutdown complete") 352 
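    # Example (illustrative sketch, not part of the module): pushing webcam
    # frames through a FrameProcessor. `analyze` stands in for any callable
    # taking (frame, metadata) and returning a result; it is not defined here.
    # Frames arriving faster than the configured max FPS are dropped.
    #
    #     fp = FrameProcessor(max_workers=2)
    #     cap = cv2.VideoCapture(0)
    #     while cap.isOpened():
    #         ok, frame = cap.read()
    #         if not ok:
    #             break
    #         futures = fp.submit_frame_batch([frame], analyze)
    #         for f in futures:
    #             print(f.result())          # result of analyze(frame, metadata)
    #     print(fp.get_statistics())         # processed/dropped counts, avg latency
    #     fp.shutdown()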
| 353 | # ============================================================================ 354 | # IMPROVED LOGGING SETUP 355 | # ============================================================================ 356 | 357 | def setup_logging(log_to_file: bool = False, log_level: int = logging.INFO): 358 | """ 359 | Sets up the logging configuration for the entire application. 360 | 361 | :param log_to_file: Boolean indicating whether to also log to a file. 362 | :param log_level: Logging level (e.g., logging.DEBUG, logging.INFO) 363 | """ 364 | handlers = [logging.StreamHandler()] 365 | 366 | if log_to_file: 367 | # Create logs directory if it doesn't exist 368 | log_dir = Path("logs") 369 | log_dir.mkdir(exist_ok=True) 370 | 371 | # Add timestamp to log filename 372 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 373 | log_file = log_dir / f"yoflo_{timestamp}.log" 374 | 375 | handlers.append(logging.FileHandler(log_file)) 376 | 377 | logging.basicConfig( 378 | level=log_level, 379 | format=config.LOG_FORMAT, 380 | handlers=handlers 381 | ) 382 | 383 | # Set specific loggers to warning to reduce noise 384 | logging.getLogger("transformers").setLevel(logging.WARNING) 385 | logging.getLogger("PIL").setLevel(logging.WARNING) 386 | 387 | # ============================================================================ 388 | # IMPROVED PTZ CONTROLLER 389 | # ============================================================================ 390 | 391 | class PTZController: 392 | """ 393 | Class to control PTZ camera movements via HID commands with improved error handling. 394 | """ 395 | 396 | def __init__(self, vendor_id: int = None, product_id: int = None, 397 | usage_page: int = None, usage: int = None): 398 | """ 399 | Initializes the PTZController with configuration-based defaults 400 | """ 401 | self.vendor_id = vendor_id or config.PTZ_VENDOR_ID 402 | self.product_id = product_id or config.PTZ_PRODUCT_ID 403 | self.usage_page = usage_page or config.PTZ_USAGE_PAGE 404 | self.usage = usage or config.PTZ_USAGE 405 | self.device = None 406 | self.device_lock = threading.Lock() 407 | self.command_count = 0 408 | self.error_count = 0 409 | 410 | self._initialize_device() 411 | 412 | def _initialize_device(self): 413 | """Initialize the HID device connection""" 414 | if not PTZ_HID_AVAILABLE: 415 | logging.warning("PTZ control unavailable - HID module not loaded.") 416 | return 417 | 418 | try: 419 | ptz_path = None 420 | for d in hid.enumerate(self.vendor_id, self.product_id): 421 | if d['usage_page'] == self.usage_page and d['usage'] == self.usage: 422 | ptz_path = d['path'] 423 | break 424 | 425 | if ptz_path: 426 | self.device = hid.device() 427 | self.device.open_path(ptz_path) 428 | logging.info("PTZ HID interface opened successfully.") 429 | else: 430 | logging.warning("No suitable PTZ HID interface found.") 431 | 432 | except IOError as e: 433 | logging.error(f"Error opening PTZ device: {e}") 434 | self.error_count += 1 435 | except Exception as e: 436 | logging.error(f"Unexpected error during PTZ device initialization: {e}") 437 | self.error_count += 1 438 | 439 | def send_command(self, report_id: int, value: int) -> bool: 440 | """ 441 | Sends a command to the PTZ device via HID write with thread safety. 
442 | 443 | :param report_id: The report ID for the PTZ control 444 | :param value: The value representing the command 445 | :return: True if command was sent successfully 446 | """ 447 | if not PTZ_HID_AVAILABLE or not self.device: 448 | logging.debug("PTZ Device not initialized or not available.") 449 | return False 450 | 451 | with self.device_lock: 452 | command = [report_id & 0xFF, value] + [0x00] * 30 453 | 454 | try: 455 | self.device.write(command) 456 | self.command_count += 1 457 | logging.debug(f"PTZ command sent: report_id={report_id}, value={value}") 458 | time.sleep(config.PTZ_COMMAND_DELAY) 459 | return True 460 | 461 | except IOError as e: 462 | logging.error(f"Error sending PTZ command: {e}") 463 | self.error_count += 1 464 | 465 | # Try to reconnect if too many errors 466 | if self.error_count > 5: 467 | self._reconnect() 468 | return False 469 | 470 | except Exception as e: 471 | logging.error(f"Unexpected error sending PTZ command: {e}") 472 | self.error_count += 1 473 | return False 474 | 475 | def _reconnect(self): 476 | """Attempt to reconnect to the PTZ device""" 477 | logging.info("Attempting to reconnect to PTZ device...") 478 | self.close() 479 | time.sleep(1) 480 | self._initialize_device() 481 | 482 | def pan_right(self) -> bool: 483 | """Pans the camera to the right.""" 484 | return self.send_command(0x0B, 0x02) 485 | 486 | def pan_left(self) -> bool: 487 | """Pans the camera to the left.""" 488 | return self.send_command(0x0B, 0x03) 489 | 490 | def tilt_up(self) -> bool: 491 | """Tilts the camera upward.""" 492 | return self.send_command(0x0B, 0x00) 493 | 494 | def tilt_down(self) -> bool: 495 | """Tilts the camera downward.""" 496 | return self.send_command(0x0B, 0x01) 497 | 498 | def zoom_in(self) -> bool: 499 | """Zooms the camera in.""" 500 | return self.send_command(0x0B, 0x04) 501 | 502 | def zoom_out(self) -> bool: 503 | """Zooms the camera out.""" 504 | return self.send_command(0x0B, 0x05) 505 | 506 | def get_statistics(self) -> Dict[str, int]: 507 | """Get PTZ controller statistics""" 508 | return { 509 | 'commands_sent': self.command_count, 510 | 'errors': self.error_count 511 | } 512 | 513 | def close(self): 514 | """Closes the HID device handle safely.""" 515 | if not PTZ_HID_AVAILABLE: 516 | return 517 | 518 | with self.device_lock: 519 | if self.device: 520 | try: 521 | self.device.close() 522 | logging.info("PTZ device closed successfully.") 523 | except Exception as e: 524 | logging.error(f"Error closing PTZ device: {e}") 525 | finally: 526 | self.device = None 527 | 528 | # ============================================================================ 529 | # PTZ TRACKER 530 | # ============================================================================ 531 | 532 | class PTZTracker: 533 | """ 534 | Autonomous PTZ tracking class with improved error handling 535 | """ 536 | 537 | def __init__(self, camera: Optional[PTZController], 538 | desired_ratio: float = None, 539 | zoom_tolerance: float = None, 540 | pan_tilt_tolerance: int = None, 541 | pan_tilt_interval: float = None, 542 | zoom_interval: float = None, 543 | smoothing_factor: float = None, 544 | max_consecutive_errors: int = None): 545 | """ 546 | Initializes the PTZTracker with configuration defaults 547 | """ 548 | # Use config defaults if not specified 549 | self.desired_ratio = desired_ratio or config.PTZ_DESIRED_RATIO 550 | self.zoom_tolerance = zoom_tolerance or config.PTZ_ZOOM_TOLERANCE 551 | self.pan_tilt_tolerance = pan_tilt_tolerance or config.PTZ_PAN_TILT_TOLERANCE 552 | 
self.pan_tilt_interval = pan_tilt_interval or config.PTZ_PAN_TILT_INTERVAL 553 | self.zoom_interval = zoom_interval or config.PTZ_ZOOM_INTERVAL 554 | self.smoothing_factor = smoothing_factor or config.PTZ_SMOOTHING_FACTOR 555 | self.max_consecutive_errors = max_consecutive_errors or config.PTZ_MAX_ERRORS 556 | 557 | # Check if camera is available 558 | if not camera or not PTZ_AVAILABLE: 559 | self.active = False 560 | self.camera = None 561 | logging.info("PTZ Tracker initialized but inactive - PTZ functionality not available.") 562 | return 563 | 564 | # Validate parameters 565 | self._validate_parameters() 566 | 567 | self.camera = camera 568 | self.last_pan_tilt_adjust = 0.0 569 | self.last_zoom_adjust = 0.0 570 | self.smoothed_width = None 571 | self.smoothed_height = None 572 | self.active = False 573 | self.consecutive_errors = 0 574 | self.tracking_lock = threading.Lock() 575 | 576 | def _validate_parameters(self): 577 | """Validate tracker parameters""" 578 | if not (0 < self.smoothing_factor < 1): 579 | raise ValueError("smoothing_factor must be between 0 and 1.") 580 | if self.desired_ratio <= 0 or self.desired_ratio >= 1: 581 | raise ValueError("desired_ratio should be between 0 and 1.") 582 | if self.zoom_tolerance < 0: 583 | raise ValueError("zoom_tolerance must be >= 0.") 584 | if self.pan_tilt_tolerance < 0: 585 | raise ValueError("pan_tilt_tolerance must be >= 0.") 586 | if self.pan_tilt_interval <= 0 or self.zoom_interval <= 0: 587 | raise ValueError("Intervals must be positive.") 588 | if self.max_consecutive_errors < 1: 589 | raise ValueError("max_consecutive_errors must be at least 1.") 590 | 591 | def activate(self, active: bool = True): 592 | """Activate or deactivate PTZ tracking""" 593 | if not PTZ_AVAILABLE or not self.camera: 594 | logging.warning("Cannot activate PTZ tracking - PTZ functionality not available.") 595 | self.active = False 596 | return 597 | 598 | with self.tracking_lock: 599 | self.active = active 600 | if not active: 601 | self.smoothed_width = None 602 | self.smoothed_height = None 603 | self.consecutive_errors = 0 604 | 605 | status = "activated" if active else "deactivated" 606 | logging.info(f"PTZ tracking {status}") 607 | 608 | def adjust_camera(self, bbox: Tuple[float, float, float, float], 609 | frame_width: int, frame_height: int): 610 | """ 611 | Adjusts camera to keep object centered and properly sized 612 | 613 | :param bbox: Bounding box (x1, y1, x2, y2) 614 | :param frame_width: Frame width in pixels 615 | :param frame_height: Frame height in pixels 616 | """ 617 | if not self.active or not PTZ_AVAILABLE or not self.camera: 618 | return 619 | 620 | with self.tracking_lock: 621 | x1, y1, x2, y2 = bbox 622 | 623 | # Validate bounding box 624 | if x1 >= x2 or y1 >= y2: 625 | logging.debug("Invalid bbox coordinates; skipping camera adjustment.") 626 | return 627 | 628 | bbox_width = x2 - x1 629 | bbox_height = y2 - y1 630 | 631 | # Initialize or update smoothed dimensions 632 | if self.smoothed_width is None: 633 | self.smoothed_width = bbox_width 634 | self.smoothed_height = bbox_height 635 | else: 636 | self.smoothed_width = ( 637 | self.smoothing_factor * bbox_width + 638 | (1 - self.smoothing_factor) * self.smoothed_width 639 | ) 640 | self.smoothed_height = ( 641 | self.smoothing_factor * bbox_height + 642 | (1 - self.smoothing_factor) * self.smoothed_height 643 | ) 644 | 645 | # Calculate centers 646 | bbox_center_x = (x1 + x2) / 2 647 | bbox_center_y = (y1 + y2) / 2 648 | frame_center_x = frame_width / 2 649 | frame_center_y = 
frame_height / 2 650 | 651 | # Calculate desired dimensions 652 | desired_width = frame_width * self.desired_ratio 653 | desired_height = frame_height * self.desired_ratio 654 | 655 | min_width = desired_width * (1 - self.zoom_tolerance) 656 | max_width = desired_width * (1 + self.zoom_tolerance) 657 | min_height = desired_height * (1 - self.zoom_tolerance) 658 | max_height = desired_height * (1 + self.zoom_tolerance) 659 | 660 | current_time = time.time() 661 | 662 | # Handle Pan/Tilt 663 | if (current_time - self.last_pan_tilt_adjust) >= self.pan_tilt_interval: 664 | dx = bbox_center_x - frame_center_x 665 | dy = bbox_center_y - frame_center_y 666 | 667 | pan_tilt_moved = False 668 | 669 | if abs(dx) > self.pan_tilt_tolerance: 670 | command = "pan_left" if dx < 0 else "pan_right" 671 | pan_tilt_moved = self._safe_camera_command(command) or pan_tilt_moved 672 | 673 | if abs(dy) > self.pan_tilt_tolerance: 674 | command = "tilt_up" if dy < 0 else "tilt_down" 675 | pan_tilt_moved = self._safe_camera_command(command) or pan_tilt_moved 676 | 677 | if pan_tilt_moved: 678 | self.last_pan_tilt_adjust = current_time 679 | 680 | # Handle Zoom 681 | if (current_time - self.last_zoom_adjust) >= self.zoom_interval: 682 | width_too_small = self.smoothed_width < min_width 683 | height_too_small = self.smoothed_height < min_height 684 | width_too_large = self.smoothed_width > max_width 685 | height_too_large = self.smoothed_height > max_height 686 | 687 | zoom_moved = False 688 | 689 | if width_too_small or height_too_small: 690 | zoom_moved = self._safe_camera_command("zoom_in") 691 | elif width_too_large or height_too_large: 692 | zoom_moved = self._safe_camera_command("zoom_out") 693 | 694 | if zoom_moved: 695 | self.last_zoom_adjust = current_time 696 | 697 | # Check for too many errors 698 | if self.consecutive_errors >= self.max_consecutive_errors: 699 | logging.error("Too many consecutive camera errors, deactivating PTZ tracking.") 700 | self.activate(False) 701 | 702 | def _safe_camera_command(self, command: str) -> bool: 703 | """ 704 | Safely execute a camera command 705 | 706 | :param command: Command name to execute 707 | :return: True if command succeeded 708 | """ 709 | if not PTZ_AVAILABLE or not self.camera: 710 | self.consecutive_errors += 1 711 | return False 712 | 713 | if not hasattr(self.camera, command): 714 | logging.error(f"Camera does not support command '{command}'.") 715 | return False 716 | 717 | try: 718 | method = getattr(self.camera, command) 719 | success = method() 720 | 721 | if success: 722 | self.consecutive_errors = 0 723 | else: 724 | self.consecutive_errors += 1 725 | 726 | return success 727 | 728 | except Exception as e: 729 | self.consecutive_errors += 1 730 | logging.error(f"Error executing camera command '{command}': {e}") 731 | return False 732 | 733 | # ============================================================================ 734 | # MODEL MANAGER 735 | # ============================================================================ 736 | 737 | class ModelManager: 738 | """ 739 | Enhanced model manager with better memory management 740 | """ 741 | 742 | def __init__(self, device: torch.device, quantization: Optional[str] = None): 743 | """ 744 | Initialize the ModelManager 745 | 746 | :param device: Torch device (cuda/cpu) 747 | :param quantization: Quantization mode (None, "4bit") 748 | """ 749 | self.device = device 750 | self.model = None 751 | self.processor = None 752 | self.quantization = quantization 753 | self.model_lock = threading.Lock() 754 | 755 
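    # Example (illustrative sketch): loading the default checkpoint with 4-bit
    # quantization on a CUDA device; both load paths below return True on success.
    #
    #     mm = ModelManager(torch.device("cuda"), quantization="4bit")
    #     if mm.download_and_load_model("microsoft/Florence-2-base-ft"):
    #         model, processor = mm.model, mm.processor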
| def _get_quant_config(self) -> Optional[BitsAndBytesConfig]: 756 | """Get quantization configuration""" 757 | if self.quantization == "4bit": 758 | logging.info("Using 4-bit quantization.") 759 | return BitsAndBytesConfig( 760 | load_in_4bit=True, 761 | bnb_4bit_compute_dtype=torch.float16, 762 | bnb_4bit_use_double_quant=True, 763 | ) 764 | return None 765 | 766 | def load_local_model(self, model_path: str) -> bool: 767 | """ 768 | Load a local model with proper error handling 769 | 770 | :param model_path: Path to model directory 771 | :return: True if successful 772 | """ 773 | with self.model_lock: 774 | if not os.path.exists(model_path): 775 | logging.error(f"Model path {os.path.abspath(model_path)} does not exist.") 776 | return False 777 | 778 | if not os.path.isdir(model_path): 779 | logging.error(f"Model path {os.path.abspath(model_path)} is not a directory.") 780 | return False 781 | 782 | try: 783 | logging.info(f"Loading model from {os.path.abspath(model_path)}") 784 | quant_config = self._get_quant_config() 785 | 786 | # Clear existing model 787 | if self.model: 788 | del self.model 789 | torch.cuda.empty_cache() 790 | 791 | self.model = AutoModelForCausalLM.from_pretrained( 792 | model_path, 793 | trust_remote_code=True, 794 | quantization_config=quant_config, 795 | ).eval() 796 | 797 | if not self.quantization: 798 | self.model.to(self.device) 799 | if torch.cuda.is_available(): 800 | self.model = self.model.half() 801 | logging.info("Using FP16 precision for the model.") 802 | 803 | self.processor = AutoProcessor.from_pretrained( 804 | model_path, trust_remote_code=True 805 | ) 806 | 807 | logging.info(f"Model loaded successfully from {os.path.abspath(model_path)}") 808 | return True 809 | 810 | except (OSError, ValueError, ModuleNotFoundError) as e: 811 | logging.error(f"Error initializing model: {e}") 812 | except Exception as e: 813 | logging.error(f"Unexpected error initializing model: {e}") 814 | 815 | return False 816 | 817 | def download_and_load_model(self, repo_id: str = "microsoft/Florence-2-base-ft") -> bool: 818 | """ 819 | Download and load model from Hugging Face 820 | 821 | :param repo_id: HuggingFace repository ID 822 | :return: True if successful 823 | """ 824 | try: 825 | local_model_dir = config.MODEL_CACHE_DIR 826 | 827 | # Create directory if it doesn't exist 828 | Path(local_model_dir).mkdir(parents=True, exist_ok=True) 829 | 830 | logging.info(f"Downloading model from {repo_id}...") 831 | snapshot_download(repo_id=repo_id, local_dir=local_model_dir) 832 | 833 | if not os.path.exists(local_model_dir): 834 | logging.error(f"Model download failed, directory {local_model_dir} does not exist.") 835 | return False 836 | 837 | logging.info(f"Model downloaded to {os.path.abspath(local_model_dir)}") 838 | return self.load_local_model(local_model_dir) 839 | 840 | except OSError as e: 841 | logging.error(f"OS error during model download: {e}") 842 | except Exception as e: 843 | logging.error(f"Error downloading model: {e}") 844 | 845 | return False 846 | 847 | # ============================================================================ 848 | # RECORDING MANAGER 849 | # ============================================================================ 850 | 851 | class RecordingManager: 852 | """ 853 | Enhanced recording manager with better resource management 854 | """ 855 | 856 | def __init__(self, record_mode: Optional[str] = None): 857 | """ 858 | Initialize the recording manager 859 | 860 | :param record_mode: Recording mode (None, "od", "infy", "infn") 861 | 
""" 862 | self.record_mode = record_mode 863 | self.recording = False 864 | self.video_writer = None 865 | self.video_out_path = None 866 | self.last_detection_time = time.time() 867 | self.writer_lock = threading.Lock() 868 | self.frame_count = 0 869 | self.start_time = None 870 | 871 | def start_recording(self, frame: np.ndarray) -> bool: 872 | """ 873 | Start video recording 874 | 875 | :param frame: Initial frame 876 | :return: True if successful 877 | """ 878 | with self.writer_lock: 879 | if self.recording or not self.record_mode: 880 | return False 881 | 882 | try: 883 | height, width = frame.shape[:2] 884 | timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') 885 | 886 | # Create recordings directory 887 | record_dir = Path("recordings") 888 | record_dir.mkdir(exist_ok=True) 889 | 890 | self.video_out_path = str(record_dir / f"recording_{timestamp}.avi") 891 | 892 | fourcc = cv2.VideoWriter_fourcc(*config.RECORDING_CODEC) 893 | self.video_writer = cv2.VideoWriter( 894 | self.video_out_path, 895 | fourcc, 896 | config.RECORDING_FPS, 897 | (width, height) 898 | ) 899 | 900 | if self.video_writer.isOpened(): 901 | self.recording = True 902 | self.start_time = time.time() 903 | self.frame_count = 0 904 | logging.info(f"Started recording: {self.video_out_path}") 905 | return True 906 | else: 907 | logging.error("Failed to open video writer") 908 | return False 909 | 910 | except Exception as e: 911 | logging.error(f"Error starting recording: {e}") 912 | return False 913 | 914 | def stop_recording(self) -> Optional[str]: 915 | """ 916 | Stop video recording 917 | 918 | :return: Path to recorded video 919 | """ 920 | with self.writer_lock: 921 | if not self.recording: 922 | return None 923 | 924 | try: 925 | if self.video_writer: 926 | self.video_writer.release() 927 | 928 | self.recording = False 929 | duration = time.time() - self.start_time if self.start_time else 0 930 | 931 | logging.info( 932 | f"Stopped recording: {self.video_out_path} " 933 | f"(Duration: {duration:.2f}s, Frames: {self.frame_count})" 934 | ) 935 | 936 | path = self.video_out_path 937 | self.video_out_path = None 938 | self.video_writer = None 939 | self.frame_count = 0 940 | self.start_time = None 941 | 942 | return path 943 | 944 | except Exception as e: 945 | logging.error(f"Error stopping recording: {e}") 946 | return None 947 | 948 | def write_frame(self, frame: np.ndarray) -> bool: 949 | """ 950 | Write a frame to the video 951 | 952 | :param frame: Frame to write 953 | :return: True if successful 954 | """ 955 | with self.writer_lock: 956 | if not self.recording or not self.video_writer: 957 | return False 958 | 959 | try: 960 | self.video_writer.write(frame) 961 | self.frame_count += 1 962 | return True 963 | except Exception as e: 964 | logging.error(f"Error writing frame: {e}") 965 | return False 966 | 967 | def handle_recording_by_detection(self, detections: List, frame: np.ndarray): 968 | """Handle recording based on object detection""" 969 | if not self.record_mode or self.record_mode != "od": 970 | return 971 | 972 | current_time = time.time() 973 | 974 | if detections: 975 | if not self.recording: 976 | self.start_recording(frame) 977 | self.last_detection_time = current_time 978 | self.write_frame(frame) 979 | else: 980 | if self.recording and (current_time - self.last_detection_time) > config.RECORDING_TIMEOUT: 981 | self.stop_recording() 982 | 983 | def handle_recording_by_inference(self, inference_result: str, frame: np.ndarray): 984 | """Handle recording based on inference results""" 985 | if 
not self.record_mode or self.record_mode not in ["infy", "infn"]: 986 | return 987 | 988 | should_record = False 989 | 990 | if self.record_mode == "infy" and inference_result.lower() == "yes": 991 | should_record = True 992 | elif self.record_mode == "infn" and inference_result.lower() == "no": 993 | should_record = True 994 | 995 | if should_record: 996 | if not self.recording: 997 | self.start_recording(frame) 998 | self.write_frame(frame) 999 | else: 1000 | if self.recording: 1001 | self.stop_recording() 1002 | 1003 | def cleanup(self): 1004 | """Clean up resources""" 1005 | if self.recording: 1006 | self.stop_recording() 1007 | 1008 | # ============================================================================ 1009 | # IMAGE UTILITIES 1010 | # ============================================================================ 1011 | 1012 | class ImageUtils: 1013 | """Utility class for image operations""" 1014 | 1015 | @staticmethod 1016 | def plot_bbox(image: np.ndarray, detections: List[Tuple[List[float], str]]) -> np.ndarray: 1017 | """ 1018 | Draw bounding boxes on image 1019 | 1020 | :param image: Input image 1021 | :param detections: List of (bbox, label) tuples 1022 | :return: Image with bounding boxes 1023 | """ 1024 | try: 1025 | for bbox, label in detections: 1026 | x1, y1, x2, y2 = map(int, bbox) 1027 | cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2) 1028 | cv2.putText( 1029 | image, 1030 | label, 1031 | (x1, y1 - 10), 1032 | cv2.FONT_HERSHEY_SIMPLEX, 1033 | 0.5, 1034 | (0, 255, 0), 1035 | 2, 1036 | ) 1037 | return image 1038 | except cv2.error as e: 1039 | logging.error(f"OpenCV error plotting bounding boxes: {e}") 1040 | except Exception as e: 1041 | logging.error(f"Error plotting bounding boxes: {e}") 1042 | return image 1043 | 1044 | @staticmethod 1045 | def save_screenshot(frame: np.ndarray) -> Optional[str]: 1046 | """ 1047 | Save a screenshot with timestamp 1048 | 1049 | :param frame: Frame to save 1050 | :return: Path to saved screenshot 1051 | """ 1052 | try: 1053 | # Create screenshots directory 1054 | screenshot_dir = Path("screenshots") 1055 | screenshot_dir.mkdir(exist_ok=True) 1056 | 1057 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 1058 | filename = str(screenshot_dir / f"screenshot_{timestamp}.png") 1059 | 1060 | cv2.imwrite(filename, frame) 1061 | logging.info(f"Screenshot saved: {filename}") 1062 | return filename 1063 | 1064 | except cv2.error as e: 1065 | logging.error(f"OpenCV error saving screenshot: {e}") 1066 | except Exception as e: 1067 | logging.error(f"Error saving screenshot: {e}") 1068 | 1069 | return None 1070 | 1071 | # ============================================================================ 1072 | # ALERT LOGGER 1073 | # ============================================================================ 1074 | 1075 | class AlertLogger: 1076 | """Enhanced alert logging with thread safety""" 1077 | 1078 | def __init__(self, log_file: str = None): 1079 | """ 1080 | Initialize alert logger 1081 | 1082 | :param log_file: Path to log file 1083 | """ 1084 | self.log_file = log_file or config.LOG_FILE 1085 | self.log_lock = threading.Lock() 1086 | 1087 | # Create logs directory 1088 | log_dir = Path("logs") 1089 | log_dir.mkdir(exist_ok=True) 1090 | 1091 | self.log_path = log_dir / self.log_file 1092 | 1093 | def log_alert(self, message: str): 1094 | """ 1095 | Log an alert message 1096 | 1097 | :param message: Alert message 1098 | """ 1099 | with self.log_lock: 1100 | try: 1101 | timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") 
1102 | log_entry = f"{timestamp} - {message}\n" 1103 | 1104 | with open(self.log_path, "a") as log_file: 1105 | log_file.write(log_entry) 1106 | 1107 | logging.info(f"Alert logged: {message}") 1108 | 1109 | except IOError as e: 1110 | logging.error(f"IO error logging alert: {e}") 1111 | except Exception as e: 1112 | logging.error(f"Error logging alert: {e}") 1113 | 1114 | # ============================================================================ 1115 | # PTZ CONTROL THREAD 1116 | # ============================================================================ 1117 | 1118 | def ptz_control_thread(ptz_camera: PTZController): 1119 | """ 1120 | Thread for manual PTZ control using keyboard 1121 | 1122 | :param ptz_camera: PTZ camera controller 1123 | """ 1124 | if not PTZ_MSVCRT_AVAILABLE: 1125 | print("Cannot start PTZ control thread - msvcrt module not available.") 1126 | return 1127 | 1128 | if not PTZ_HID_AVAILABLE or not ptz_camera: 1129 | print("Cannot start PTZ control thread - PTZ camera not available.") 1130 | return 1131 | 1132 | print("PTZ control started. Use arrow keys to pan/tilt, +/- to zoom, q to quit.") 1133 | 1134 | while True: 1135 | try: 1136 | ch = msvcrt.getch() 1137 | 1138 | if ch == b'\xe0': # Arrow key prefix 1139 | arrow = msvcrt.getch() 1140 | if arrow == b'H': # Up arrow 1141 | ptz_camera.tilt_up() 1142 | elif arrow == b'P': # Down arrow 1143 | ptz_camera.tilt_down() 1144 | elif arrow == b'K': # Left arrow 1145 | ptz_camera.pan_left() 1146 | elif arrow == b'M': # Right arrow 1147 | ptz_camera.pan_right() 1148 | elif ch == b'+': 1149 | ptz_camera.zoom_in() 1150 | elif ch == b'-': 1151 | ptz_camera.zoom_out() 1152 | elif ch == b'q': 1153 | print("Quitting PTZ control.") 1154 | break 1155 | 1156 | except Exception as e: 1157 | logging.error(f"Error in PTZ control thread: {e}") 1158 | break 1159 | 1160 | # ============================================================================ 1161 | # MAIN YO-FLO APPLICATION CLASS 1162 | # ============================================================================ 1163 | 1164 | class YO_FLO: 1165 | def __init__(self): 1166 | """Initialize YO-FLO with all attributes properly initialized""" 1167 | 1168 | # Device configuration 1169 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 1170 | 1171 | # Model and processor 1172 | self.model = None 1173 | self.processor = None 1174 | self.model_path = None 1175 | self.model_manager = None 1176 | self.quantization = None 1177 | 1178 | # GUI elements 1179 | self.root = tk.Tk() 1180 | self.root.withdraw() 1181 | self.caption_label = None 1182 | self.inference_rate_label = None 1183 | self.inference_result_label = None 1184 | self.inference_phrases_result_labels = [] 1185 | 1186 | # Detection settings 1187 | self.class_names = [] 1188 | self.detections = [] 1189 | self.phrase = None 1190 | self.visual_grounding_phrase = None 1191 | self.inference_title = None 1192 | self.inference_phrases = [] 1193 | 1194 | # Feature flags 1195 | self.headless_mode = False 1196 | self.object_detection_active = False 1197 | self.expression_comprehension_active = False 1198 | self.visual_grounding_active = False 1199 | self.inference_tree_active = False 1200 | self.beep_active = False 1201 | self.screenshot_active = False 1202 | self.screenshot_on_yes_active = False 1203 | self.screenshot_on_no_active = False 1204 | self.debug = False 1205 | self.log_to_file_active = False 1206 | 1207 | # Tracking and timing 1208 | self.target_detected = False 1209 | self.last_beep_time = 0 1210 | 
self.inference_start_time = None 1211 | self.inference_count = 0 1212 | self.last_process_time = time.time() 1213 | 1214 | # Image handling 1215 | self.latest_image = None 1216 | self.frame_lock = threading.Lock() 1217 | 1218 | # Recording 1219 | self.record = None 1220 | self.recording_manager = None 1221 | 1222 | # PTZ 1223 | self.ptz_camera = None 1224 | self.ptz_tracker = None 1225 | self.track_object_name = None 1226 | 1227 | # Threading 1228 | self.webcam_threads = [] 1229 | self.webcam_indices = config.DEFAULT_WEBCAM_INDICES 1230 | self.stop_webcam_flag = threading.Event() 1231 | self.frame_processor = FrameProcessor() 1232 | 1233 | # Performance 1234 | self.scaler = torch.cuda.amp.GradScaler() 1235 | 1236 | # Cleanup 1237 | self.cleanup_thread = None 1238 | self.cleanup_flag = threading.Event() 1239 | 1240 | # Security validator 1241 | self.validator = SecurityValidator() 1242 | 1243 | # Alert logger 1244 | self.alert_logger = AlertLogger() 1245 | 1246 | # Start periodic cleanup 1247 | self._start_cleanup_thread() 1248 | 1249 | def _start_cleanup_thread(self): 1250 | """Start a thread for periodic memory cleanup""" 1251 | def cleanup_worker(): 1252 | while not self.cleanup_flag.is_set(): 1253 | time.sleep(config.CLEANUP_INTERVAL) 1254 | self._periodic_cleanup() 1255 | 1256 | self.cleanup_thread = threading.Thread(target=cleanup_worker, daemon=True) 1257 | self.cleanup_thread.start() 1258 | 1259 | def _periodic_cleanup(self): 1260 | """Perform periodic memory cleanup""" 1261 | try: 1262 | gc.collect() 1263 | if torch.cuda.is_available(): 1264 | torch.cuda.empty_cache() 1265 | logging.debug("Periodic memory cleanup completed") 1266 | except Exception as e: 1267 | logging.error(f"Error during periodic cleanup: {e}") 1268 | 1269 | # ------------------------------------------------------------------------- 1270 | # Model Management 1271 | # ------------------------------------------------------------------------- 1272 | 1273 | def init_model_manager(self, quantization_mode: Optional[str] = None): 1274 | """Initialize the ModelManager with proper validation""" 1275 | if quantization_mode and quantization_mode not in config.QUANTIZATION_OPTIONS: 1276 | logging.warning(f"Invalid quantization mode: {quantization_mode}") 1277 | quantization_mode = None 1278 | 1279 | self.quantization = quantization_mode 1280 | self.model_manager = ModelManager(self.device, self.quantization) 1281 | 1282 | def load_local_model(self, model_path: str): 1283 | """Load local model with path validation""" 1284 | safe_path = self.validator.validate_path(model_path) 1285 | if not safe_path: 1286 | print(f"{Fore.RED}Invalid or unsafe model path: {model_path}{Style.RESET_ALL}") 1287 | return 1288 | 1289 | if not safe_path.is_dir(): 1290 | print(f"{Fore.RED}Model path must be a directory: {safe_path}{Style.RESET_ALL}") 1291 | return 1292 | 1293 | if not self.model_manager: 1294 | self.init_model_manager() 1295 | 1296 | ok = self.model_manager.load_local_model(str(safe_path)) 1297 | if ok: 1298 | self.model = self.model_manager.model 1299 | self.processor = self.model_manager.processor 1300 | self.model_path = str(safe_path) 1301 | print(f"{Fore.GREEN}Model loaded successfully from {safe_path}{Style.RESET_ALL}") 1302 | else: 1303 | print(f"{Fore.RED}Failed to load model from {safe_path}{Style.RESET_ALL}") 1304 | 1305 | def download_model(self, repo_id: str = "microsoft/Florence-2-base-ft"): 1306 | """Download and load model from Hugging Face""" 1307 | if not self.model_manager: 1308 | self.init_model_manager() 1309 | 
1310 |         ok = self.model_manager.download_and_load_model(repo_id)
1311 |         if ok:
1312 |             self.model = self.model_manager.model
1313 |             self.processor = self.model_manager.processor
1314 |             print(f"{Fore.GREEN}Model downloaded and initialized successfully!{Style.RESET_ALL}")
1315 |         else:
1316 |             print(f"{Fore.RED}Failed to download/initialize model.{Style.RESET_ALL}")
1317 | 
1318 |     # -------------------------------------------------------------------------
1319 |     # Model Inference Methods
1320 |     # -------------------------------------------------------------------------
1321 | 
1322 |     def prepare_inputs(self, task_prompt: str, image: Image.Image, phrase: Optional[str] = None):
1323 |         """Prepare inputs for model inference"""
1324 |         inputs = self.processor(text=task_prompt, images=image, return_tensors="pt").to(self.device)
1325 | 
1326 |         if phrase:
1327 |             inputs["input_ids"] = torch.cat(
1328 |                 [
1329 |                     inputs["input_ids"],
1330 |                     self.processor.tokenizer(phrase, return_tensors="pt")
1331 |                     .input_ids[:, 1:]
1332 |                     .to(self.device),
1333 |                 ],
1334 |                 dim=1,
1335 |             )
1336 | 
1337 |         for k, v in inputs.items():
1338 |             if torch.is_floating_point(v):
1339 |                 inputs[k] = v.half()
1340 | 
1341 |         return inputs
1342 | 
1343 |     def run_model(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
1344 |         """Run model inference"""
1345 |         with torch.amp.autocast("cuda"):
1346 |             generated_ids = self.model.generate(
1347 |                 input_ids=inputs["input_ids"],
1348 |                 pixel_values=inputs.get("pixel_values"),
1349 |                 max_new_tokens=1024,
1350 |                 early_stopping=False,
1351 |                 do_sample=False,
1352 |                 num_beams=1,
1353 |             )
1354 |         return generated_ids
1355 | 
1356 |     def process_object_detection_outputs(self, generated_ids: torch.Tensor,
1357 |                                          image_size: Tuple[int, int]) -> Dict:
1358 |         """Process object detection outputs"""
1359 |         generated_text = self.processor.batch_decode(
1360 |             generated_ids, skip_special_tokens=False
1361 |         )[0]
1362 |         parsed_answer = self.processor.post_process_generation(
1363 |             generated_text, task="<OD>", image_size=image_size
1364 |         )
1365 |         return parsed_answer
1366 | 
1367 |     def process_expression_comprehension_outputs(self, generated_ids: torch.Tensor) -> str:
1368 |         """Process expression comprehension outputs"""
1369 |         generated_text = self.processor.batch_decode(
1370 |             generated_ids, skip_special_tokens=False
1371 |         )[0]
1372 |         return generated_text
1373 | 
1374 |     def run_object_detection(self, image: Image.Image) -> List[Tuple[List[float], str]]:
1375 |         """Run object detection on an image"""
1376 |         try:
1377 |             if not self.model or not self.processor:
1378 |                 raise ValueError("Model or processor is not initialized.")
1379 | 
1380 |             task_prompt = "<OD>"  # Florence-2 object-detection task token
1381 |             if self.debug:
1382 |                 print(f"Running object detection with task prompt: {task_prompt}")
1383 | 
1384 |             inputs = self.prepare_inputs(task_prompt, image)
1385 |             generated_ids = self.run_model(inputs)
1386 | 
1387 |             if self.debug:
1388 |                 print(f"Generated IDs: {generated_ids}")
1389 | 
1390 |             parsed_answer = self.process_object_detection_outputs(generated_ids, image.size)
1391 | 
1392 |             if self.debug:
1393 |                 print(f"Parsed answer: {parsed_answer}")
1394 | 
1395 |             detections = []
1396 |             if parsed_answer and "<OD>" in parsed_answer:
1397 |                 for bbox, label in zip(
1398 |                     parsed_answer["<OD>"]["bboxes"],
1399 |                     parsed_answer["<OD>"]["labels"]
1400 |                 ):
1401 |                     if not self.class_names or label.lower() in self.class_names:
1402 |                         detections.append((bbox, label))
1403 | 
1404 |             return detections
1405 | 
1406 |         except AttributeError as e:
1407 |             logging.error(f"Model or processor not initialized properly: {e}")
1408 |         except Exception as e:
1409 |             logging.error(f"Error running object detection: {e}")
1410 | 
1411 |         return []
1412 | 
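    # Example (illustrative sketch, not wired into the app): a single detection
    # pass outside the webcam loop. Assumes a model has been loaded first via
    # download_model() or load_local_model(); "frame.png" is a placeholder path.
    #
    #     app = YO_FLO()
    #     app.download_model()
    #     for bbox, label in app.run_object_detection(Image.open("frame.png")):
    #         print(label, bbox)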
initialized properly: {e}") 1408 | except Exception as e: 1409 | logging.error(f"Error running object detection: {e}") 1410 | 1411 | return [] 1412 | 1413 | def run_expression_comprehension(self, image: Image.Image, phrase: str) -> Optional[str]: 1414 | """Run expression comprehension on an image""" 1415 | try: 1416 | task_prompt = "" 1417 | 1418 | if self.debug: 1419 | print(f"Running expression comprehension with phrase: {phrase}") 1420 | 1421 | inputs = self.prepare_inputs(task_prompt, image, phrase) 1422 | generated_ids = self.run_model(inputs) 1423 | 1424 | if self.debug: 1425 | print(f"Generated IDs: {generated_ids}") 1426 | 1427 | generated_text = self.process_expression_comprehension_outputs(generated_ids) 1428 | 1429 | if self.debug: 1430 | print(f"Generated text: {generated_text}") 1431 | 1432 | return generated_text 1433 | 1434 | except Exception as e: 1435 | logging.error(f"Error running expression comprehension: {e}") 1436 | return None 1437 | 1438 | def run_visual_grounding(self, image: Image.Image, phrase: str) -> Optional[List[float]]: 1439 | """Run visual grounding on an image""" 1440 | try: 1441 | task_prompt = "" 1442 | inputs = self.prepare_inputs(task_prompt, image, phrase) 1443 | generated_ids = self.run_model(inputs) 1444 | 1445 | if self.debug: 1446 | print(f"Generated IDs: {generated_ids}") 1447 | 1448 | generated_text = self.processor.batch_decode( 1449 | generated_ids, skip_special_tokens=False 1450 | )[0] 1451 | 1452 | if self.debug: 1453 | print(f"Generated text: {generated_text}") 1454 | 1455 | parsed_answer = self.processor.post_process_generation( 1456 | generated_text, task=task_prompt, image_size=image.size 1457 | ) 1458 | 1459 | if self.debug: 1460 | print(f"Parsed answer: {parsed_answer}") 1461 | 1462 | if task_prompt in parsed_answer and parsed_answer[task_prompt]["bboxes"]: 1463 | return parsed_answer[task_prompt]["bboxes"][0] 1464 | 1465 | return None 1466 | 1467 | except Exception as e: 1468 | logging.error(f"Error running visual grounding: {e}") 1469 | return None 1470 | 1471 | def evaluate_inference_tree(self, image: Image.Image) -> Tuple[str, List[bool]]: 1472 | """Evaluate inference tree on an image""" 1473 | try: 1474 | if not self.inference_phrases: 1475 | logging.error("No inference phrases set.") 1476 | return "FAIL", [] 1477 | 1478 | results = [] 1479 | phrase_results = [] 1480 | 1481 | for phrase in self.inference_phrases: 1482 | result = self.run_expression_comprehension(image, phrase) 1483 | if result: 1484 | if "yes" in result.lower(): 1485 | results.append(True) 1486 | phrase_results.append(True) 1487 | else: 1488 | results.append(False) 1489 | phrase_results.append(False) 1490 | 1491 | overall_result = "PASS" if all(results) else "FAIL" 1492 | return overall_result, phrase_results 1493 | 1494 | except Exception as e: 1495 | logging.error(f"Error evaluating inference tree: {e}") 1496 | return "FAIL", [] 1497 | 1498 | # ------------------------------------------------------------------------- 1499 | # GUI Methods 1500 | # ------------------------------------------------------------------------- 1501 | 1502 | def select_model_path(self): 1503 | """Select model path with security validation""" 1504 | try: 1505 | model_path = filedialog.askdirectory( 1506 | title="Select Model Directory", 1507 | initialdir=os.getcwd() 1508 | ) 1509 | 1510 | if model_path: 1511 | self.load_local_model(model_path) 1512 | else: 1513 | print(f"{Fore.YELLOW}Model path selection cancelled.{Style.RESET_ALL}") 1514 | 1515 | except Exception as e: 1516 | 
print(f"{Fore.RED}Error selecting model path: {e}{Style.RESET_ALL}") 1517 | 1518 | def download_model_gui(self): 1519 | """Download model from GUI""" 1520 | try: 1521 | self.download_model(config.DEFAULT_MODEL) 1522 | except Exception as e: 1523 | print(f"{Fore.RED}Error downloading model: {e}{Style.RESET_ALL}") 1524 | 1525 | def set_class_names(self): 1526 | """Set class names with input sanitization""" 1527 | try: 1528 | class_names_input = simpledialog.askstring( 1529 | "Set Class Names", 1530 | "Enter class names separated by commas (e.g., 'cat, dog'):" 1531 | ) 1532 | 1533 | if class_names_input: 1534 | sanitized_names = self.validator.sanitize_class_names(class_names_input) 1535 | 1536 | if sanitized_names: 1537 | self.class_names = sanitized_names 1538 | print(f"{Fore.GREEN}Set to detect: {', '.join(self.class_names)}{Style.RESET_ALL}") 1539 | else: 1540 | print(f"{Fore.RED}Invalid class names input{Style.RESET_ALL}") 1541 | else: 1542 | self.class_names = [] 1543 | print(f"{Fore.GREEN}Showing all detections{Style.RESET_ALL}") 1544 | 1545 | except Exception as e: 1546 | print(f"{Fore.RED}Error setting class names: {e}{Style.RESET_ALL}") 1547 | 1548 | def set_phrase(self): 1549 | """Set phrase with input sanitization""" 1550 | try: 1551 | phrase_input = simpledialog.askstring( 1552 | "Set Phrase", 1553 | "Enter a yes/no question (e.g., 'Is the person smiling?'):" 1554 | ) 1555 | 1556 | if phrase_input: 1557 | sanitized_phrase = self.validator.sanitize_phrase(phrase_input) 1558 | 1559 | if sanitized_phrase: 1560 | self.phrase = sanitized_phrase 1561 | print(f"{Fore.GREEN}Set to comprehend: {self.phrase}{Style.RESET_ALL}") 1562 | else: 1563 | print(f"{Fore.RED}Invalid phrase input{Style.RESET_ALL}") 1564 | else: 1565 | self.phrase = None 1566 | print(f"{Fore.GREEN}No phrase set for comprehension{Style.RESET_ALL}") 1567 | 1568 | except Exception as e: 1569 | print(f"{Fore.RED}Error setting phrase: {e}{Style.RESET_ALL}") 1570 | 1571 | def set_visual_grounding_phrase(self): 1572 | """Set visual grounding phrase""" 1573 | try: 1574 | phrase_input = simpledialog.askstring( 1575 | "Set Visual Grounding Phrase", 1576 | "Enter the phrase for visual grounding:" 1577 | ) 1578 | 1579 | if phrase_input: 1580 | sanitized_phrase = self.validator.sanitize_phrase(phrase_input) 1581 | 1582 | if sanitized_phrase: 1583 | self.visual_grounding_phrase = sanitized_phrase 1584 | print(f"{Fore.GREEN}Set visual grounding phrase: {self.visual_grounding_phrase}{Style.RESET_ALL}") 1585 | else: 1586 | print(f"{Fore.RED}Invalid phrase input{Style.RESET_ALL}") 1587 | else: 1588 | self.visual_grounding_phrase = None 1589 | print(f"{Fore.GREEN}No phrase set for visual grounding{Style.RESET_ALL}") 1590 | 1591 | except Exception as e: 1592 | print(f"{Fore.RED}Error setting visual grounding phrase: {e}{Style.RESET_ALL}") 1593 | 1594 | def set_inference_tree(self): 1595 | """Set up inference tree""" 1596 | try: 1597 | self.inference_title = simpledialog.askstring( 1598 | "Inference Title", 1599 | "Enter the title for the inference tree:" 1600 | ) 1601 | 1602 | self.inference_phrases = [] 1603 | for i in range(3): 1604 | phrase = simpledialog.askstring( 1605 | "Set Inference Phrase", 1606 | f"Enter inference phrase {i+1} (e.g., 'Is it cloudy?'):" 1607 | ) 1608 | 1609 | if phrase: 1610 | sanitized = self.validator.sanitize_phrase(phrase) 1611 | if sanitized: 1612 | self.inference_phrases.append(sanitized) 1613 | else: 1614 | print(f"{Fore.RED}Invalid phrase {i+1}{Style.RESET_ALL}") 1615 | return 1616 | else: 1617 | 
print(f"{Fore.YELLOW}Cancelled setting inference phrase {i+1}.{Style.RESET_ALL}") 1618 | return 1619 | 1620 | if self.inference_title and self.inference_phrases: 1621 | print(f"{Fore.GREEN}Inference tree set with title: {self.inference_title}{Style.RESET_ALL}") 1622 | for phrase in self.inference_phrases: 1623 | print(f"{Fore.GREEN}Inference phrase: {phrase}{Style.RESET_ALL}") 1624 | else: 1625 | print(f"{Fore.YELLOW}Inference tree setting cancelled.{Style.RESET_ALL}") 1626 | 1627 | except Exception as e: 1628 | print(f"{Fore.RED}Error setting inference tree: {e}{Style.RESET_ALL}") 1629 | 1630 | # ------------------------------------------------------------------------- 1631 | # Feature Toggles 1632 | # ------------------------------------------------------------------------- 1633 | 1634 | def toggle_file_logging(self): 1635 | """Toggle file logging""" 1636 | self.log_to_file_active = not self.log_to_file_active 1637 | setup_logging(self.log_to_file_active) 1638 | status = "enabled" if self.log_to_file_active else "disabled" 1639 | print(f"{Fore.GREEN}File logging is now {status}{Style.RESET_ALL}") 1640 | 1641 | def toggle_headless(self): 1642 | """Toggle headless mode""" 1643 | try: 1644 | self.headless_mode = not self.headless_mode 1645 | status = "enabled" if self.headless_mode else "disabled" 1646 | print(f"{Fore.GREEN}Headless mode is now {status}{Style.RESET_ALL}") 1647 | except Exception as e: 1648 | print(f"{Fore.RED}Error toggling headless mode: {e}{Style.RESET_ALL}") 1649 | 1650 | def toggle_object_detection(self): 1651 | """Toggle object detection""" 1652 | self.object_detection_active = not self.object_detection_active 1653 | if not self.object_detection_active: 1654 | self.detections.clear() 1655 | self.class_names = [] 1656 | status = "enabled" if self.object_detection_active else "disabled" 1657 | print(f"{Fore.GREEN}Object detection is now {status}{Style.RESET_ALL}") 1658 | 1659 | def toggle_expression_comprehension(self): 1660 | """Toggle expression comprehension""" 1661 | self.expression_comprehension_active = not self.expression_comprehension_active 1662 | status = "enabled" if self.expression_comprehension_active else "disabled" 1663 | print(f"{Fore.GREEN}Expression comprehension is now {status}{Style.RESET_ALL}") 1664 | 1665 | def toggle_visual_grounding(self): 1666 | """Toggle visual grounding""" 1667 | self.visual_grounding_active = not self.visual_grounding_active 1668 | status = "enabled" if self.visual_grounding_active else "disabled" 1669 | print(f"{Fore.GREEN}Visual grounding is now {status}{Style.RESET_ALL}") 1670 | 1671 | def toggle_inference_tree(self): 1672 | """Toggle inference tree""" 1673 | self.inference_tree_active = not self.inference_tree_active 1674 | status = "enabled" if self.inference_tree_active else "disabled" 1675 | print(f"{Fore.GREEN}Inference tree evaluation is now {status}{Style.RESET_ALL}") 1676 | 1677 | def toggle_beep(self): 1678 | """Toggle beep on detection""" 1679 | self.beep_active = not self.beep_active 1680 | status = "active" if self.beep_active else "inactive" 1681 | print(f"{Fore.GREEN}Beep is now {status}{Style.RESET_ALL}") 1682 | 1683 | def toggle_screenshot(self): 1684 | """Toggle screenshot on detection""" 1685 | self.screenshot_active = not self.screenshot_active 1686 | status = "active" if self.screenshot_active else "inactive" 1687 | print(f"{Fore.GREEN}Screenshot on detection is now {status}{Style.RESET_ALL}") 1688 | 1689 | def toggle_screenshot_on_yes(self): 1690 | """Toggle screenshot on yes inference""" 1691 | 
self.screenshot_on_yes_active = not self.screenshot_on_yes_active 1692 | status = "active" if self.screenshot_on_yes_active else "inactive" 1693 | print(f"{Fore.GREEN}Screenshot on Yes Inference is now {status}{Style.RESET_ALL}") 1694 | 1695 | def toggle_screenshot_on_no(self): 1696 | """Toggle screenshot on no inference""" 1697 | self.screenshot_on_no_active = not self.screenshot_on_no_active 1698 | status = "active" if self.screenshot_on_no_active else "inactive" 1699 | print(f"{Fore.GREEN}Screenshot on No Inference is now {status}{Style.RESET_ALL}") 1700 | 1701 | def toggle_debug(self): 1702 | """Toggle debug mode""" 1703 | self.debug = not self.debug 1704 | status = "enabled" if self.debug else "disabled" 1705 | print(f"{Fore.GREEN}Debug mode is now {status}{Style.RESET_ALL}") 1706 | 1707 | # ------------------------------------------------------------------------- 1708 | # PTZ Control Methods 1709 | # ------------------------------------------------------------------------- 1710 | 1711 | def init_ptz_camera(self): 1712 | """Initialize PTZ camera""" 1713 | if not PTZ_AVAILABLE: 1714 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1715 | return 1716 | 1717 | if not self.ptz_camera: 1718 | self.ptz_camera = PTZController() 1719 | 1720 | def set_ptz_target_class(self): 1721 | """Set PTZ target class""" 1722 | if not PTZ_AVAILABLE: 1723 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1724 | return 1725 | 1726 | try: 1727 | target_class = simpledialog.askstring( 1728 | "PTZ Target Class", 1729 | "Enter the object class name to track (e.g., 'person'):" 1730 | ) 1731 | 1732 | if target_class: 1733 | sanitized = self.validator.sanitize_class_names(target_class) 1734 | if sanitized: 1735 | self.track_object_name = sanitized[0] 1736 | print(f"{Fore.GREEN}PTZ tracking target: {self.track_object_name}{Style.RESET_ALL}") 1737 | else: 1738 | print(f"{Fore.RED}Invalid target class{Style.RESET_ALL}") 1739 | else: 1740 | print(f"{Fore.YELLOW}PTZ target class input cancelled.{Style.RESET_ALL}") 1741 | 1742 | except Exception as e: 1743 | print(f"{Fore.RED}Error setting PTZ target class: {e}{Style.RESET_ALL}") 1744 | 1745 | def start_autonomous_ptz_tracking(self): 1746 | """Start autonomous PTZ tracking""" 1747 | if not PTZ_AVAILABLE: 1748 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1749 | return 1750 | 1751 | self.init_ptz_camera() 1752 | if not self.ptz_tracker: 1753 | self.ptz_tracker = PTZTracker(self.ptz_camera) 1754 | 1755 | self.ptz_tracker.activate(True) 1756 | 1757 | if self.track_object_name: 1758 | print(f"{Fore.GREEN}Autonomous PTZ tracking activated for: {self.track_object_name}{Style.RESET_ALL}") 1759 | else: 1760 | print(f"{Fore.GREEN}Autonomous PTZ tracking activated (no target set).{Style.RESET_ALL}") 1761 | 1762 | def stop_autonomous_ptz_tracking(self): 1763 | """Stop autonomous PTZ tracking""" 1764 | if not PTZ_AVAILABLE: 1765 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1766 | return 1767 | 1768 | if self.ptz_tracker: 1769 | self.ptz_tracker.activate(False) 1770 | print(f"{Fore.GREEN}Autonomous PTZ tracking deactivated.{Style.RESET_ALL}") 1771 | 1772 | def open_manual_ptz_control(self): 1773 | """Open manual PTZ control""" 1774 | if not PTZ_AVAILABLE or not PTZ_MSVCRT_AVAILABLE: 1775 | print(f"{Fore.YELLOW}PTZ manual control not available.{Style.RESET_ALL}") 1776 | return 1777 | 1778 | self.init_ptz_camera() 1779 | if not self.ptz_camera: 1780 
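            # Defensive check: init_ptz_camera can return without creating a
            # controller, so verify one exists before spawning the
            # keyboard-control thread.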
| print(f"{Fore.YELLOW}PTZ camera could not be initialized.{Style.RESET_ALL}") 1781 | return 1782 | 1783 | thread = threading.Thread(target=ptz_control_thread, args=(self.ptz_camera,), daemon=True) 1784 | thread.start() 1785 | 1786 | # ------------------------------------------------------------------------- 1787 | # Recording Control 1788 | # ------------------------------------------------------------------------- 1789 | 1790 | def set_record_mode(self, mode: Optional[str]): 1791 | """Set recording mode""" 1792 | self.record = mode 1793 | self.recording_manager = RecordingManager(self.record) 1794 | mode_str = mode if mode else "None" 1795 | print(f"{Fore.GREEN}Recording mode set to {mode_str}{Style.RESET_ALL}") 1796 | 1797 | # ------------------------------------------------------------------------- 1798 | # Frame Processing 1799 | # ------------------------------------------------------------------------- 1800 | 1801 | def should_process_frame(self) -> bool: 1802 | """Check if enough time has passed for next frame""" 1803 | current_time = time.time() 1804 | if (current_time - self.last_process_time) >= config.MIN_PROCESS_INTERVAL: 1805 | self.last_process_time = current_time 1806 | return True 1807 | return False 1808 | 1809 | def _pick_tracked_object(self, detections: List[Tuple[List[float], str]]) -> Optional[List[float]]: 1810 | """Pick the largest bounding box of the tracked object""" 1811 | if not self.track_object_name: 1812 | return None 1813 | 1814 | candidate_detections = [ 1815 | (bbox, label) 1816 | for bbox, label in detections 1817 | if label.lower() == self.track_object_name.lower() 1818 | ] 1819 | 1820 | if not candidate_detections: 1821 | return None 1822 | 1823 | def bbox_area(bb): 1824 | return (bb[2] - bb[0]) * (bb[3] - bb[1]) 1825 | 1826 | largest_bbox = max(candidate_detections, key=lambda x: bbox_area(x[0]))[0] 1827 | return largest_bbox 1828 | 1829 | def plot_bbox(self, image: np.ndarray) -> np.ndarray: 1830 | """Plot bounding boxes on image""" 1831 | try: 1832 | if not self.detections: 1833 | return image 1834 | return ImageUtils.plot_bbox(image, self.detections) 1835 | except Exception as e: 1836 | logging.error(f"Error plotting bounding boxes: {e}") 1837 | return image 1838 | 1839 | def plot_visual_grounding_bbox(self, image: np.ndarray, bbox: List[float], phrase: str) -> np.ndarray: 1840 | """Plot visual grounding bounding box""" 1841 | try: 1842 | if bbox: 1843 | x1, y1, x2, y2 = map(int, bbox[:4]) 1844 | cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2) 1845 | cv2.putText( 1846 | image, 1847 | phrase, 1848 | (x1, y1 - 10), 1849 | cv2.FONT_HERSHEY_SIMPLEX, 1850 | 0.5, 1851 | (255, 0, 0), 1852 | 2, 1853 | ) 1854 | return image 1855 | except Exception as e: 1856 | logging.error(f"Error plotting visual grounding bbox: {e}") 1857 | return image 1858 | 1859 | def beep_sound(self): 1860 | """Play beep sound""" 1861 | try: 1862 | if os.name == "nt": 1863 | os.system("echo \a") 1864 | else: 1865 | print("\a") 1866 | except Exception as e: 1867 | logging.error(f"Error playing beep sound: {e}") 1868 | 1869 | def update_inference_rate(self): 1870 | """Update inference rate display""" 1871 | if self.inference_start_time is None: 1872 | self.inference_start_time = time.time() 1873 | else: 1874 | elapsed_time = time.time() - self.inference_start_time 1875 | if elapsed_time > 0: 1876 | inferences_per_second = self.inference_count / elapsed_time 1877 | if self.inference_rate_label: 1878 | self.inference_rate_label.config( 1879 | text=f"Inferences/sec: 
{inferences_per_second:.2f}", 1880 | fg="green" 1881 | ) 1882 | 1883 | def update_caption_window(self, caption: str): 1884 | """Update caption window""" 1885 | if self.caption_label: 1886 | if caption.lower() == "yes": 1887 | self.caption_label.config( 1888 | text=caption, 1889 | fg="green", 1890 | bg="black", 1891 | font=("Helvetica", 14, "bold") 1892 | ) 1893 | if self.screenshot_on_yes_active: 1894 | with self.frame_lock: 1895 | if self.latest_image: 1896 | frame_bgr = cv2.cvtColor(np.array(self.latest_image), cv2.COLOR_RGB2BGR) 1897 | ImageUtils.save_screenshot(frame_bgr) 1898 | elif caption.lower() == "no": 1899 | self.caption_label.config( 1900 | text=caption, 1901 | fg="red", 1902 | bg="black", 1903 | font=("Helvetica", 14, "bold") 1904 | ) 1905 | if self.screenshot_on_no_active: 1906 | with self.frame_lock: 1907 | if self.latest_image: 1908 | frame_bgr = cv2.cvtColor(np.array(self.latest_image), cv2.COLOR_RGB2BGR) 1909 | ImageUtils.save_screenshot(frame_bgr) 1910 | else: 1911 | self.caption_label.config( 1912 | text=caption, 1913 | fg="white", 1914 | bg="black", 1915 | font=("Helvetica", 14, "bold") 1916 | ) 1917 | 1918 | def update_inference_result_window(self, result: str, phrase_results: List[bool]): 1919 | """Update inference result window""" 1920 | if self.inference_result_label: 1921 | if result.lower() == "pass": 1922 | self.inference_result_label.config( 1923 | text=result, 1924 | fg="green", 1925 | bg="black", 1926 | font=("Helvetica", 14, "bold") 1927 | ) 1928 | else: 1929 | self.inference_result_label.config( 1930 | text=result, 1931 | fg="red", 1932 | bg="black", 1933 | font=("Helvetica", 14, "bold") 1934 | ) 1935 | 1936 | for idx, phrase_result in enumerate(phrase_results): 1937 | if idx < len(self.inference_phrases_result_labels): 1938 | label = self.inference_phrases_result_labels[idx] 1939 | if phrase_result: 1940 | label.config( 1941 | text=f"Inference {idx+1}: PASS", 1942 | fg="green", 1943 | bg="black", 1944 | font=("Helvetica", 14, "bold") 1945 | ) 1946 | else: 1947 | label.config( 1948 | text=f"Inference {idx+1}: FAIL", 1949 | fg="red", 1950 | bg="black", 1951 | font=("Helvetica", 14, "bold") 1952 | ) 1953 | 1954 | # ------------------------------------------------------------------------- 1955 | # Webcam Detection 1956 | # ------------------------------------------------------------------------- 1957 | 1958 | def start_webcam_detection(self): 1959 | """Start webcam detection""" 1960 | if self.webcam_threads: 1961 | print(f"{Fore.RED}Webcam detection is already running.{Style.RESET_ALL}") 1962 | return 1963 | 1964 | self.stop_webcam_flag.clear() 1965 | 1966 | for index in self.webcam_indices: 1967 | thread = threading.Thread( 1968 | target=self._webcam_detection_thread, 1969 | args=(index,), 1970 | daemon=True 1971 | ) 1972 | thread.start() 1973 | self.webcam_threads.append(thread) 1974 | 1975 | print(f"{Fore.GREEN}Started webcam detection{Style.RESET_ALL}") 1976 | 1977 | def stop_webcam_detection(self): 1978 | """Stop webcam detection""" 1979 | if not self.webcam_threads: 1980 | print(f"{Fore.RED}Webcam detection is not running.{Style.RESET_ALL}") 1981 | return 1982 | 1983 | # Deactivate all features 1984 | self.object_detection_active = False 1985 | self.expression_comprehension_active = False 1986 | self.visual_grounding_active = False 1987 | self.inference_tree_active = False 1988 | 1989 | # Signal threads to stop 1990 | self.stop_webcam_flag.set() 1991 | 1992 | # Wait for threads with timeout 1993 | for thread in self.webcam_threads: 1994 | 
thread.join(timeout=2.0) 1995 | 1996 | self.webcam_threads.clear() 1997 | 1998 | print(f"{Fore.GREEN}Webcam detection stopped successfully.{Style.RESET_ALL}") 1999 | 2000 | def _webcam_detection_thread(self, index: int): 2001 | """Webcam detection thread with enhanced processing""" 2002 | cap = None 2003 | try: 2004 | cap = cv2.VideoCapture(index) 2005 | if not cap.isOpened(): 2006 | print(f"{Fore.RED}Error: Could not open webcam {index}.{Style.RESET_ALL}") 2007 | return 2008 | 2009 | while not self.stop_webcam_flag.is_set(): 2010 | # Frame rate limiting 2011 | if not self.should_process_frame(): 2012 | time.sleep(0.001) 2013 | continue 2014 | 2015 | ret, frame = cap.read() 2016 | if not ret: 2017 | print(f"{Fore.RED}Failed to capture from webcam {index}.{Style.RESET_ALL}") 2018 | break 2019 | 2020 | try: 2021 | # Thread-safe image storage 2022 | with self.frame_lock: 2023 | image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 2024 | image_pil = Image.fromarray(image) 2025 | self.latest_image = image_pil 2026 | 2027 | # Process frame 2028 | self._process_single_frame(image_pil, frame, index) 2029 | 2030 | # Display if not headless 2031 | if not self.headless_mode: 2032 | self._display_frame(frame, index) 2033 | 2034 | if cv2.waitKey(1) & 0xFF == ord('q'): 2035 | break 2036 | 2037 | except Exception as e: 2038 | logging.error(f"Error processing frame from webcam {index}: {e}") 2039 | 2040 | except Exception as e: 2041 | logging.error(f"Error in webcam thread {index}: {e}") 2042 | finally: 2043 | if cap: 2044 | cap.release() 2045 | if not self.headless_mode: 2046 | cv2.destroyWindow(f"Object Detection Webcam {index}") 2047 | 2048 | def _process_single_frame(self, image_pil: Image.Image, frame: np.ndarray, index: int): 2049 | """Process a single frame with all active features""" 2050 | 2051 | # Expression Comprehension 2052 | if self.expression_comprehension_active and self.phrase: 2053 | results = self.run_expression_comprehension(image_pil, self.phrase) 2054 | if results: 2055 | caption = "Yes" if "yes" in results.lower() else "No" 2056 | self.update_caption_window(caption) 2057 | if self.headless_mode: 2058 | print(f"Expression result: {caption}") 2059 | self.inference_count += 1 2060 | self.update_inference_rate() 2061 | 2062 | if self.recording_manager: 2063 | self.recording_manager.handle_recording_by_inference(caption.lower(), frame) 2064 | 2065 | # Object Detection 2066 | if self.object_detection_active: 2067 | self.detections = self.run_object_detection(image_pil) 2068 | if self.headless_mode: 2069 | print(f"Detections from webcam {index}: {self.detections}") 2070 | self.inference_count += 1 2071 | self.update_inference_rate() 2072 | 2073 | # Update target detected flag 2074 | self.target_detected = bool(self.detections) 2075 | 2076 | if self.recording_manager: 2077 | self.recording_manager.handle_recording_by_detection(self.detections, frame) 2078 | 2079 | # PTZ tracking 2080 | if PTZ_AVAILABLE and self.ptz_tracker and self.ptz_tracker.active: 2081 | primary_bbox = self._pick_tracked_object(self.detections) 2082 | if primary_bbox is not None: 2083 | h, w = frame.shape[:2] 2084 | self.ptz_tracker.adjust_camera(primary_bbox, w, h) 2085 | 2086 | # Visual Grounding 2087 | if self.visual_grounding_active and self.visual_grounding_phrase: 2088 | bbox = self.run_visual_grounding(image_pil, self.visual_grounding_phrase) 2089 | if bbox: 2090 | if not self.headless_mode: 2091 | self.plot_visual_grounding_bbox(frame, bbox, self.visual_grounding_phrase) 2092 | else: 2093 | print(f"Visual grounding 
result: {bbox}") 2094 | self.inference_count += 1 2095 | self.update_inference_rate() 2096 | 2097 | # Inference Tree 2098 | if self.inference_tree_active and self.inference_title and self.inference_phrases: 2099 | result, phrase_results = self.evaluate_inference_tree(image_pil) 2100 | self.update_inference_result_window(result, phrase_results) 2101 | if self.headless_mode: 2102 | print(f"Inference tree result: {result}, Details: {phrase_results}") 2103 | self.inference_count += 1 2104 | self.update_inference_rate() 2105 | 2106 | # Recording 2107 | if self.recording_manager and self.recording_manager.recording: 2108 | self.recording_manager.write_frame(frame) 2109 | 2110 | def _display_frame(self, frame: np.ndarray, index: int): 2111 | """Display frame with overlays""" 2112 | try: 2113 | bbox_image = self.plot_bbox(frame.copy()) 2114 | cv2.imshow(f"Object Detection Webcam {index}", bbox_image) 2115 | 2116 | current_time = time.time() 2117 | 2118 | # Beep on detection 2119 | if self.beep_active and self.target_detected: 2120 | if current_time - self.last_beep_time > 1: 2121 | threading.Thread(target=self.beep_sound, daemon=True).start() 2122 | self.last_beep_time = current_time 2123 | 2124 | # Screenshot on detection 2125 | if self.screenshot_active and self.target_detected: 2126 | ImageUtils.save_screenshot(bbox_image) 2127 | 2128 | except Exception as e: 2129 | logging.error(f"Error displaying frame: {e}") 2130 | 2131 | # ------------------------------------------------------------------------- 2132 | # Cleanup 2133 | # ------------------------------------------------------------------------- 2134 | 2135 | def cleanup(self): 2136 | """Comprehensive cleanup method""" 2137 | try: 2138 | logging.info("Starting YO-FLO cleanup...") 2139 | 2140 | # Stop all threads 2141 | self.stop_webcam_detection() 2142 | self.cleanup_flag.set() 2143 | 2144 | # Clean up frame processor 2145 | if self.frame_processor: 2146 | self.frame_processor.shutdown(wait=False) 2147 | 2148 | # Clean up recording manager 2149 | if self.recording_manager: 2150 | self.recording_manager.cleanup() 2151 | 2152 | # Close PTZ camera 2153 | if self.ptz_camera: 2154 | self.ptz_camera.close() 2155 | 2156 | # Clear model from memory 2157 | if self.model: 2158 | del self.model 2159 | self.model = None 2160 | 2161 | if self.processor: 2162 | del self.processor 2163 | self.processor = None 2164 | 2165 | # Clear CUDA cache 2166 | if torch.cuda.is_available(): 2167 | torch.cuda.empty_cache() 2168 | 2169 | # Force garbage collection 2170 | gc.collect() 2171 | 2172 | # Destroy all OpenCV windows 2173 | cv2.destroyAllWindows() 2174 | 2175 | logging.info("YO-FLO cleanup completed") 2176 | 2177 | except Exception as e: 2178 | logging.error(f"Error during cleanup: {e}") 2179 | 2180 | def __del__(self): 2181 | """Destructor to ensure cleanup""" 2182 | self.cleanup() 2183 | 2184 | # ------------------------------------------------------------------------- 2185 | # Main Menu 2186 | # ------------------------------------------------------------------------- 2187 | 2188 | def main_menu(self): 2189 | """Create and display the main GUI menu""" 2190 | self.root.deiconify() 2191 | self.root.title(config.WINDOW_TITLE) 2192 | 2193 | def on_closing(): 2194 | """Handle window closing""" 2195 | self.cleanup() 2196 | self.root.destroy() 2197 | 2198 | self.root.protocol("WM_DELETE_WINDOW", on_closing) 2199 | 2200 | try: 2201 | # Model Management Frame 2202 | model_frame = tk.LabelFrame(self.root, text="Model Management") 2203 | model_frame.pack(fill="x", 
padx=10, pady=5) 2204 | 2205 | tk.Button( 2206 | model_frame, 2207 | text="Select Model Path", 2208 | command=self.select_model_path 2209 | ).pack(fill="x") 2210 | 2211 | tk.Button( 2212 | model_frame, 2213 | text="Download Model from HuggingFace", 2214 | command=self.download_model_gui 2215 | ).pack(fill="x") 2216 | 2217 | tk.Button( 2218 | model_frame, 2219 | text="Toggle File Logging", 2220 | command=self.toggle_file_logging 2221 | ).pack(fill="x") 2222 | 2223 | # Detection Settings Frame 2224 | detection_frame = tk.LabelFrame(self.root, text="Detection Settings") 2225 | detection_frame.pack(fill="x", padx=10, pady=5) 2226 | 2227 | tk.Button( 2228 | detection_frame, 2229 | text="Set Classes for Object Detection", 2230 | command=self.set_class_names 2231 | ).pack(fill="x") 2232 | 2233 | tk.Button( 2234 | detection_frame, 2235 | text="Set Phrase for Yes/No Inference", 2236 | command=self.set_phrase 2237 | ).pack(fill="x") 2238 | 2239 | tk.Button( 2240 | detection_frame, 2241 | text="Set Grounding Phrase", 2242 | command=self.set_visual_grounding_phrase 2243 | ).pack(fill="x") 2244 | 2245 | tk.Button( 2246 | detection_frame, 2247 | text="Set Inference Tree", 2248 | command=self.set_inference_tree 2249 | ).pack(fill="x") 2250 | 2251 | # Feature Toggles Frame 2252 | feature_frame = tk.LabelFrame(self.root, text="Feature Toggles") 2253 | feature_frame.pack(fill="x", padx=10, pady=5) 2254 | 2255 | tk.Button( 2256 | feature_frame, 2257 | text="Object Detection", 2258 | command=self.toggle_object_detection 2259 | ).pack(fill="x") 2260 | 2261 | tk.Button( 2262 | feature_frame, 2263 | text="Yes/No Inference", 2264 | command=self.toggle_expression_comprehension 2265 | ).pack(fill="x") 2266 | 2267 | tk.Button( 2268 | feature_frame, 2269 | text="Visual Grounding", 2270 | command=self.toggle_visual_grounding 2271 | ).pack(fill="x") 2272 | 2273 | tk.Button( 2274 | feature_frame, 2275 | text="Inference Tree", 2276 | command=self.toggle_inference_tree 2277 | ).pack(fill="x") 2278 | 2279 | tk.Button( 2280 | feature_frame, 2281 | text="Headless Mode", 2282 | command=self.toggle_headless 2283 | ).pack(fill="x") 2284 | 2285 | # Triggers Frame 2286 | trigger_frame = tk.LabelFrame(self.root, text="Triggers") 2287 | trigger_frame.pack(fill="x", padx=10, pady=5) 2288 | 2289 | tk.Button( 2290 | trigger_frame, 2291 | text="Beep on Detection", 2292 | command=self.toggle_beep 2293 | ).pack(fill="x") 2294 | 2295 | tk.Button( 2296 | trigger_frame, 2297 | text="Screenshot on Detection", 2298 | command=self.toggle_screenshot 2299 | ).pack(fill="x") 2300 | 2301 | tk.Button( 2302 | trigger_frame, 2303 | text="Screenshot on Yes", 2304 | command=self.toggle_screenshot_on_yes 2305 | ).pack(fill="x") 2306 | 2307 | tk.Button( 2308 | trigger_frame, 2309 | text="Screenshot on No", 2310 | command=self.toggle_screenshot_on_no 2311 | ).pack(fill="x") 2312 | 2313 | # PTZ Control Frame 2314 | if PTZ_AVAILABLE: 2315 | ptz_frame = tk.LabelFrame(self.root, text="PTZ Control") 2316 | ptz_frame.pack(fill="x", padx=10, pady=5) 2317 | 2318 | tk.Button( 2319 | ptz_frame, 2320 | text="Open Manual PTZ Control", 2321 | command=self.open_manual_ptz_control 2322 | ).pack(fill="x") 2323 | 2324 | tk.Button( 2325 | ptz_frame, 2326 | text="Set PTZ Target Class", 2327 | command=self.set_ptz_target_class 2328 | ).pack(fill="x") 2329 | 2330 | tk.Button( 2331 | ptz_frame, 2332 | text="Start Autonomous Tracking", 2333 | command=self.start_autonomous_ptz_tracking 2334 | ).pack(fill="x") 2335 | 2336 | tk.Button( 2337 | ptz_frame, 2338 | text="Stop 
Autonomous Tracking", 2339 | command=self.stop_autonomous_ptz_tracking 2340 | ).pack(fill="x") 2341 | else: 2342 | ptz_frame = tk.LabelFrame(self.root, text="PTZ Control (Unavailable)") 2343 | ptz_frame.pack(fill="x", padx=10, pady=5) 2344 | 2345 | tk.Label( 2346 | ptz_frame, 2347 | text="PTZ functionality not available - missing required modules", 2348 | fg="red" 2349 | ).pack(fill="x") 2350 | 2351 | # Recording Frame 2352 | recording_frame = tk.LabelFrame(self.root, text="Recording Control") 2353 | recording_frame.pack(fill="x", padx=10, pady=5) 2354 | 2355 | tk.Button( 2356 | recording_frame, 2357 | text="No Recording", 2358 | command=lambda: self.set_record_mode(None) 2359 | ).pack(fill="x") 2360 | 2361 | tk.Button( 2362 | recording_frame, 2363 | text="Record on Detection", 2364 | command=lambda: self.set_record_mode("od") 2365 | ).pack(fill="x") 2366 | 2367 | tk.Button( 2368 | recording_frame, 2369 | text='Record on "Yes"', 2370 | command=lambda: self.set_record_mode("infy") 2371 | ).pack(fill="x") 2372 | 2373 | tk.Button( 2374 | recording_frame, 2375 | text='Record on "No"', 2376 | command=lambda: self.set_record_mode("infn") 2377 | ).pack(fill="x") 2378 | 2379 | # Webcam Control Frame 2380 | webcam_frame = tk.LabelFrame(self.root, text="Webcam Control") 2381 | webcam_frame.pack(fill="x", padx=10, pady=5) 2382 | 2383 | tk.Button( 2384 | webcam_frame, 2385 | text="Start Webcam Detection", 2386 | command=self.start_webcam_detection 2387 | ).pack(fill="x") 2388 | 2389 | tk.Button( 2390 | webcam_frame, 2391 | text="Stop Webcam Detection", 2392 | command=self.stop_webcam_detection 2393 | ).pack(fill="x") 2394 | 2395 | # Debug Frame 2396 | debug_frame = tk.LabelFrame(self.root, text="Debug") 2397 | debug_frame.pack(fill="x", padx=10, pady=5) 2398 | 2399 | tk.Button( 2400 | debug_frame, 2401 | text="Toggle Debug Mode", 2402 | command=self.toggle_debug 2403 | ).pack(fill="x") 2404 | 2405 | # Inference Rate Frame 2406 | inference_rate_frame = tk.LabelFrame(self.root, text="Inference Rate") 2407 | inference_rate_frame.pack(fill="x", padx=10, pady=5) 2408 | 2409 | self.inference_rate_label = tk.Label( 2410 | inference_rate_frame, 2411 | text="Inferences/sec: N/A", 2412 | fg="white", 2413 | bg="black", 2414 | font=("Helvetica", 14, "bold") 2415 | ) 2416 | self.inference_rate_label.pack(fill="x") 2417 | 2418 | # Binary Inference Frame 2419 | binary_inference_frame = tk.LabelFrame(self.root, text="Binary Inference") 2420 | binary_inference_frame.pack(fill="x", padx=10, pady=5) 2421 | 2422 | self.caption_label = tk.Label( 2423 | binary_inference_frame, 2424 | text="Binary Inference: N/A", 2425 | fg="white", 2426 | bg="black", 2427 | font=("Helvetica", 14, "bold") 2428 | ) 2429 | self.caption_label.pack(fill="x") 2430 | 2431 | # Inference Tree Frame 2432 | inference_tree_frame = tk.LabelFrame(self.root, text="Inference Tree") 2433 | inference_tree_frame.pack(fill="x", padx=10, pady=5) 2434 | 2435 | self.inference_result_label = tk.Label( 2436 | inference_tree_frame, 2437 | text="Inference Tree: N/A", 2438 | fg="white", 2439 | bg="black", 2440 | font=("Helvetica", 14, "bold") 2441 | ) 2442 | self.inference_result_label.pack(fill="x") 2443 | 2444 | for i in range(3): 2445 | label = tk.Label( 2446 | inference_tree_frame, 2447 | text=f"Inference {i+1}: N/A", 2448 | fg="white", 2449 | bg="black", 2450 | font=("Helvetica", 14, "bold") 2451 | ) 2452 | label.pack(fill="x") 2453 | self.inference_phrases_result_labels.append(label) 2454 | 2455 | # Statistics update 2456 | def update_stats(): 2457 | if 
self.frame_processor: 2458 | stats = self.frame_processor.get_statistics() 2459 | # You could add a stats label here to display this info 2460 | self.root.after(1000, update_stats) 2461 | 2462 | update_stats() 2463 | 2464 | except Exception as e: 2465 | print(f"{Fore.RED}Error creating menu: {e}{Style.RESET_ALL}") 2466 | 2467 | self.root.mainloop() 2468 | 2469 | # ============================================================================ 2470 | # MAIN ENTRY POINT 2471 | # ============================================================================ 2472 | 2473 | def main(): 2474 | """Main entry point with proper error handling and cleanup""" 2475 | app = None 2476 | 2477 | try: 2478 | # Setup logging 2479 | setup_logging(log_to_file=False, log_level=logging.INFO) 2480 | 2481 | # Create application instance 2482 | app = YO_FLO() 2483 | app.init_model_manager(quantization_mode=None) 2484 | 2485 | print(f"{Fore.BLUE}{Style.BRIGHT}YO-FLO Vision System v2.0{Style.RESET_ALL}") 2486 | print(f"{Fore.CYAN}Enhanced with security, threading, and memory management{Style.RESET_ALL}") 2487 | print(f"{Fore.CYAN}Created with comprehensive improvements for production use{Style.RESET_ALL}") 2488 | 2489 | # Run the GUI 2490 | app.main_menu() 2491 | 2492 | except KeyboardInterrupt: 2493 | print(f"\n{Fore.YELLOW}Interrupted by user{Style.RESET_ALL}") 2494 | 2495 | except Exception as e: 2496 | logging.error(f"Fatal error: {e}", exc_info=True) 2497 | print(f"{Fore.RED}Fatal error: {e}{Style.RESET_ALL}") 2498 | 2499 | finally: 2500 | # Ensure cleanup 2501 | if app: 2502 | app.cleanup() 2503 | logging.info("Application shutdown complete") 2504 | 2505 | if __name__ == "__main__": 2506 | main() 2507 | --------------------------------------------------------------------------------