├── requirements.txt ├── LICENSE ├── README.md └── YO-FLO.py /requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python 2 | torch 3 | transformers 4 | Pillow 5 | numpy 6 | tk 7 | colorama 8 | simpleaudio 9 | requests 10 | matplotlib 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | MIT License 3 | 4 | Copyright (c) 2024 Charles Norton 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy 7 | of this software and associated documentation files (the "Software"), to deal 8 | in the Software without restriction, including without limitation the rights 9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 10 | copies of the Software, and to permit persons to whom the Software is 11 | furnished to do so, subject to the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 22 | SOFTWARE. 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # YO-FLO: YOLO-Like Object Detection with Florence Models 2 | 3 | Welcome to YO-FLO, a proof-of-concept implementation of YOLO-like object detection using the Florence-2-base-ft model. Inspired by the powerful YOLO (You Only Look Once) object detection framework, YO-FLO leverages the capabilities of the Florence foundational vision model to achieve real-time inference while maintaining a lightweight footprint. 4 | 5 | ## Table of Contents 6 | 7 | - Introduction 8 | - Features 9 | - Installation 10 | - Usage 11 | - Error Handling 12 | - Contributing 13 | - License 14 | 15 | ## Introduction 16 | 17 | YO-FLO explores whether the new Florence foundational vision model can be implemented in a YOLO-like format for object detection. Florence-2 is designed by Microsoft as a unified vision-language model capable of handling diverse tasks such as object detection, captioning, and segmentation. To achieve this, it uses a sequence-to-sequence framework where images and task-specific prompts are processed to generate the desired text outputs. The model's architecture combines a DaViT vision encoder with a transformer-based multi-modal encoder-decoder, making it versatile and efficient. 18 | 19 | Florence-2 has been trained on the extensive FLD-5B dataset, containing 126 million images and over 5 billion annotations, ensuring high-quality performance across multiple tasks. Despite its relatively small size, Florence-2 demonstrates strong zero-shot and fine-tuning capabilities, making it an excellent choice for real-time applications. 20 | 21 | ## Features 22 | 23 | - **Real-Time Object Detection**: Achieve YOLO-like performance using the Florence-2-base-ft model. 
24 | - **Class-Specific Detection**: Specify the class of objects you want to detect (e.g., 'cat', 'dog').
25 | - **Expression Comprehension**: Ask yes/no questions about the scene (e.g., 'Is the person smiling?') and act on the answers.
26 | - **Beep and Screenshot on Detection**: Toggle options to beep and take screenshots when the target class or phrase is detected.
27 | - **Tkinter GUI**: A user-friendly graphical interface for easy interaction.
28 | - **Cross-Platform Compatibility**: Works on Windows, macOS, and Linux.
29 | - **Toggle Headless Mode**: Enable or disable headless mode for running without a GUI.
30 | - **Inference Rate Display**: Display the number of inferences per second during real-time detection.
31 | - **Screenshot on Yes/No Inference**: Automatically save screenshots based on yes/no answers from expression comprehension.
32 | - **Visual Grounding**: Identify and highlight specific regions in an image based on descriptive phrases.
33 | - **Evaluate Inference Tree**: Use a tree of inference phrases to evaluate multiple conditions in a single run.
34 | - **Plot Bounding Boxes**: Visualize detection results by plotting bounding boxes on the image.
35 | - **Save Screenshots**: Save screenshots of detected objects or regions of interest.
36 | - **Robust Error Handling**: Comprehensive error management for smooth operation.
37 | - **Webcam Detection Control**: Start and stop webcam-based detection with ease.
38 | - **Debug Mode**: Toggle detailed logging for development and troubleshooting purposes.
39 | 
40 | ## Installation
41 | 
42 | ### Prerequisites
43 | 
44 | - Python 3.7 or higher
45 | - pip
46 | 
47 | ### Installing Dependencies
48 | 
49 | ```
50 | pip install torch transformers pillow opencv-python colorama simpleaudio huggingface-hub
51 | ```
52 | 
53 | ## Usage
54 | 
55 | ### Running YO-FLO
56 | 
57 | To start YO-FLO, run the following command:
58 | 
59 | ```
60 | python YO-FLO.py
61 | ```
62 | 
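For reference, every detection pass reduces to a single Florence-2 sequence-to-sequence call through the `transformers` API. The sketch below (with a placeholder image path) mirrors roughly what YO-FLO runs per frame, minus the threading, quantization, and GUI plumbing:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
).eval()
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

image = Image.open("frame.png")  # placeholder: e.g. a frame grabbed from the webcam
task = "<OD>"                    # Florence-2 task token for object detection
inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=1,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(text, task=task, image_size=image.size)
# result -> {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['cat', ...]}}
```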
63 | ### Menu Options
64 | 
65 | 1. **Select Model Path**: Choose a local directory containing the Florence model.
66 | 2. **Download Model from HuggingFace**: Download and initialize the Florence-2-base-ft model from HuggingFace.
67 | 3. **Set Class Name**: Specify the class name you want to detect (leave blank to show all detections).
68 | 4. **Set Phrase**: Enter the phrase for comprehension detection (e.g., 'Is the person smiling?', 'Is the cat lying down?').
69 | 5. **Set Visual Grounding Phrase**: Enter the phrase for visual grounding.
70 | 6. **Set Inference Tree**: Enter multiple inference phrases to evaluate several conditions.
71 | 7. **Toggle Beep on Detection**: Enable or disable the beep sound on detection.
72 | 8. **Toggle Screenshot on Detection**: Enable or disable taking screenshots on detection.
73 | 9. **Toggle Screenshot on Yes/No Inference**: Enable or disable taking screenshots based on yes/no inference results.
74 | 10. **Start Webcam Detection**: Begin real-time object detection using your webcam.
75 | 11. **Stop Webcam Detection**: Stop the webcam detection and return to the menu.
76 | 12. **Toggle Debug Mode**: Enable or disable debug mode for detailed logging.
77 | 13. **Toggle Headless Mode**: Enable or disable headless mode for running without a GUI.
78 | 14. **Exit**: Exit the application.
79 | 
80 | ### Example Workflow
81 | 
82 | 1. Select Model Path or Download Model from HuggingFace.
83 | 2. Set Class Name to specify what you want to detect (e.g., 'cat', 'dog').
84 | 3. Set Phrase for specific phrase-based inference.
85 | 4. Set Visual Grounding Phrase to locate and highlight specific regions of interest.
86 | 5. Set Inference Tree for evaluating multiple conditions.
87 | 6. Toggle Beep on Detection if you want an audible alert.
88 | 7. Toggle Screenshot on Detection if you want to save screenshots of detections.
89 | 8. Toggle Screenshot on Yes/No Inference to save screenshots based on comprehension results.
90 | 9. Start Webcam Detection to begin detecting objects in real time.
91 | 
92 | ## Error Handling
93 | 
94 | YO-FLO includes robust error handling to ensure smooth operation:
95 | 
96 | - **Model Initialization Errors**: Handles cases where the model path is incorrect or the model fails to load.
97 | - **Webcam Access Errors**: Notifies if the webcam cannot be accessed.
98 | - **Image Processing Errors**: Catches errors during frame processing and provides detailed messages.
99 | - **File Not Found Errors**: Alerts if required files (e.g., beep sound file) are missing.
100 | - **General Exception Handling**: Catches and logs any unexpected errors to prevent crashes.
101 | 
102 | ### Example Error Messages
103 | 
104 | - **Error loading model**: Model path not found or model failed to load.
105 | - **Error running object detection**: Issues during the object detection process.
106 | - **Error plotting bounding boxes**: Problems with visualizing detection results.
107 | - **Error toggling beep**: Issues enabling or disabling the beep sound.
108 | - **Error saving screenshot**: Problems saving detection screenshots.
109 | - **OpenCV error**: Errors related to OpenCV operations.
110 | 
111 | ## Contributing
112 | 
113 | We welcome contributions to improve YO-FLO. Please follow these steps:
114 | 
115 | 1. Fork the repository.
116 | 2. Create a new branch (`git checkout -b feature-branch`).
117 | 3. Commit your changes (`git commit -am 'Add new feature'`).
118 | 4. Push to the branch (`git push origin feature-branch`).
119 | 5. Create a new Pull Request.
120 | 
121 | ## License
122 | 
123 | YO-FLO is licensed under the MIT License.
124 | 
125 | ---
126 | 
127 | Thank you for using YO-FLO! We are excited to see what amazing applications you will build with this tool. Happy detecting!
128 | 
--------------------------------------------------------------------------------
/YO-FLO.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import logging
3 | import os
4 | import threading
5 | import time
6 | import sys
7 | import cv2
8 | import torch
9 | from datetime import datetime
10 | from PIL import Image
11 | from transformers import AutoProcessor, AutoModelForCausalLM, BitsAndBytesConfig
12 | from huggingface_hub import snapshot_download, hf_hub_download
13 | from colorama import Fore, Style, init
14 | import tkinter as tk
15 | from tkinter import filedialog, simpledialog, Toplevel
16 | import numpy as np
17 | from dataclasses import dataclass, field
18 | from typing import Optional, List, Tuple, Dict, Any
19 | from concurrent.futures import ThreadPoolExecutor, Future
20 | from queue import Queue, Empty, Full
21 | import re
22 | from pathlib import Path
23 | import gc
24 | 
25 | # Conditional imports for PTZ functionality
26 | PTZ_AVAILABLE = False
27 | try:
28 |     import hid  # For PTZ camera HID
29 |     PTZ_HID_AVAILABLE = True
30 | except ImportError:
31 |     PTZ_HID_AVAILABLE = False
32 |     print(f"{Fore.YELLOW}HID module not available. 
PTZ camera control will be disabled.{Style.RESET_ALL}") 33 | 34 | try: 35 | import msvcrt # For Windows-specific PTZ control with arrow keys 36 | PTZ_MSVCRT_AVAILABLE = True 37 | except ImportError: 38 | PTZ_MSVCRT_AVAILABLE = False 39 | print(f"{Fore.YELLOW}msvcrt module not available. Manual PTZ control with arrow keys will be disabled.{Style.RESET_ALL}") 40 | 41 | # Set overall PTZ availability based on critical components 42 | PTZ_AVAILABLE = PTZ_HID_AVAILABLE 43 | 44 | init(autoreset=True) 45 | 46 | # ============================================================================ 47 | # CONFIGURATION MANAGEMENT 48 | # ============================================================================ 49 | 50 | @dataclass 51 | class AppConfig: 52 | """Central configuration for the YO-FLO application""" 53 | # Model settings 54 | DEFAULT_MODEL: str = "microsoft/Florence-2-base-ft" 55 | MODEL_CACHE_DIR: str = "model" 56 | QUANTIZATION_OPTIONS: List[str] = field(default_factory=lambda: ["none", "4bit"]) 57 | 58 | # PTZ Camera settings 59 | PTZ_VENDOR_ID: int = 0x046D 60 | PTZ_PRODUCT_ID: int = 0x085F 61 | PTZ_USAGE_PAGE: int = 65280 62 | PTZ_USAGE: int = 1 63 | PTZ_COMMAND_DELAY: float = 0.2 64 | 65 | # PTZ Tracking settings 66 | PTZ_DESIRED_RATIO: float = 0.20 67 | PTZ_ZOOM_TOLERANCE: float = 0.4 68 | PTZ_PAN_TILT_TOLERANCE: int = 25 69 | PTZ_PAN_TILT_INTERVAL: float = 0.75 70 | PTZ_ZOOM_INTERVAL: float = 0.5 71 | PTZ_SMOOTHING_FACTOR: float = 0.2 72 | PTZ_MAX_ERRORS: int = 5 73 | 74 | # Recording settings 75 | RECORDING_FPS: float = 20.0 76 | RECORDING_CODEC: str = "XVID" 77 | RECORDING_TIMEOUT: float = 1.0 # Stop recording after 1s of no detection 78 | 79 | # Frame processing settings 80 | MAX_FPS: int = 30 81 | MIN_PROCESS_INTERVAL: float = 1.0 / 30 # Max 30 FPS 82 | FRAME_QUEUE_SIZE: int = 10 83 | PROCESSING_THREADS: int = 4 84 | 85 | # Security settings 86 | ALLOWED_MODEL_EXTENSIONS: List[str] = field(default_factory=lambda: [".bin", ".json", ".safetensors", ".pt"]) 87 | MAX_CLASS_NAME_LENGTH: int = 50 88 | MAX_PHRASE_LENGTH: int = 200 89 | ALLOWED_CLASS_NAME_CHARS: str = r'^[a-zA-Z0-9\s\-_,]+$' 90 | 91 | # Logging settings 92 | LOG_FILE: str = "alerts.log" 93 | LOG_FORMAT: str = "%(asctime)s - %(levelname)s - %(message)s" 94 | 95 | # GUI settings 96 | WINDOW_TITLE: str = "YO-FLO Vision System" 97 | DEFAULT_WEBCAM_INDICES: List[int] = field(default_factory=lambda: [0]) 98 | 99 | # Memory management 100 | CUDA_MEMORY_FRACTION: float = 0.8 101 | CLEANUP_INTERVAL: float = 60.0 # Run cleanup every 60 seconds 102 | 103 | # Global config instance 104 | config = AppConfig() 105 | 106 | # ============================================================================ 107 | # SECURITY UTILITIES 108 | # ============================================================================ 109 | 110 | class SecurityValidator: 111 | """Validates and sanitizes user inputs for security""" 112 | 113 | @staticmethod 114 | def validate_path(path: str, allowed_extensions: List[str] = None) -> Optional[Path]: 115 | """ 116 | Validates a file/directory path for security issues 117 | 118 | :param path: Path to validate 119 | :param allowed_extensions: List of allowed file extensions 120 | :return: Validated Path object or None if invalid 121 | """ 122 | try: 123 | # Convert to Path object for safe handling 124 | safe_path = Path(path).resolve() 125 | 126 | # Check if path exists 127 | if not safe_path.exists(): 128 | logging.warning(f"Path does not exist: {safe_path}") 129 | return None 130 | 131 | # Prevent directory 
traversal 132 | if ".." in str(path): 133 | logging.error(f"Potential directory traversal attempt: {path}") 134 | return None 135 | 136 | # Check file extensions if provided 137 | if allowed_extensions and safe_path.is_file(): 138 | if safe_path.suffix.lower() not in allowed_extensions: 139 | logging.error(f"Invalid file extension: {safe_path.suffix}") 140 | return None 141 | 142 | return safe_path 143 | 144 | except Exception as e: 145 | logging.error(f"Path validation error: {e}") 146 | return None 147 | 148 | @staticmethod 149 | def sanitize_class_names(input_string: str) -> Optional[List[str]]: 150 | """ 151 | Sanitizes class name input from user 152 | 153 | :param input_string: Raw input string 154 | :return: List of sanitized class names or None if invalid 155 | """ 156 | if not input_string or len(input_string) > config.MAX_CLASS_NAME_LENGTH * 10: 157 | return None 158 | 159 | # Check for allowed characters 160 | if not re.match(config.ALLOWED_CLASS_NAME_CHARS, input_string): 161 | logging.warning(f"Invalid characters in class names: {input_string}") 162 | return None 163 | 164 | # Split and clean individual class names 165 | class_names = [] 166 | for name in input_string.split(','): 167 | name = name.strip().lower() 168 | if name and len(name) <= config.MAX_CLASS_NAME_LENGTH: 169 | class_names.append(name) 170 | 171 | return class_names if class_names else None 172 | 173 | @staticmethod 174 | def sanitize_phrase(phrase: str) -> Optional[str]: 175 | """ 176 | Sanitizes phrase input from user 177 | 178 | :param phrase: Raw phrase input 179 | :return: Sanitized phrase or None if invalid 180 | """ 181 | if not phrase or len(phrase) > config.MAX_PHRASE_LENGTH: 182 | return None 183 | 184 | # Remove potentially dangerous characters 185 | sanitized = re.sub(r'[<>\"\'\\]', '', phrase.strip()) 186 | 187 | return sanitized if sanitized else None 188 | 189 | # ============================================================================ 190 | # IMPROVED FRAME PROCESSOR WITH THREADING 191 | # ============================================================================ 192 | 193 | class FrameProcessor: 194 | """Handles frame processing with proper threading and queue management""" 195 | 196 | def __init__(self, max_workers: int = None): 197 | """ 198 | Initialize the frame processor 199 | 200 | :param max_workers: Maximum number of worker threads 201 | """ 202 | self.max_workers = max_workers or config.PROCESSING_THREADS 203 | self.executor = ThreadPoolExecutor(max_workers=self.max_workers) 204 | self.processing_queue = Queue(maxsize=config.FRAME_QUEUE_SIZE) 205 | self.result_queue = Queue(maxsize=config.FRAME_QUEUE_SIZE) 206 | self.active_futures: List[Future] = [] 207 | self.shutdown_flag = threading.Event() 208 | self.last_process_time = time.time() 209 | self.frame_lock = threading.Lock() 210 | self.stats_lock = threading.Lock() 211 | 212 | # Performance statistics 213 | self.frames_processed = 0 214 | self.frames_dropped = 0 215 | self.processing_times = [] 216 | 217 | def should_process_frame(self) -> bool: 218 | """Check if enough time has passed to process next frame (FPS limiting)""" 219 | current_time = time.time() 220 | time_elapsed = current_time - self.last_process_time 221 | 222 | if time_elapsed >= config.MIN_PROCESS_INTERVAL: 223 | self.last_process_time = current_time 224 | return True 225 | return False 226 | 227 | def add_frame(self, frame: np.ndarray, metadata: Dict[str, Any] = None) -> bool: 228 | """ 229 | Add a frame to the processing queue 230 | 231 | :param frame: Frame 
to process 232 | :param metadata: Optional metadata for the frame 233 | :return: True if frame was added, False if queue is full 234 | """ 235 | if self.shutdown_flag.is_set(): 236 | return False 237 | 238 | if not self.should_process_frame(): 239 | with self.stats_lock: 240 | self.frames_dropped += 1 241 | return False 242 | 243 | try: 244 | self.processing_queue.put_nowait({ 245 | 'frame': frame, 246 | 'metadata': metadata or {}, 247 | 'timestamp': time.time() 248 | }) 249 | return True 250 | except Full: 251 | with self.stats_lock: 252 | self.frames_dropped += 1 253 | logging.debug("Frame queue is full, dropping frame") 254 | return False 255 | 256 | def process_frame(self, frame_data: Dict[str, Any], 257 | processing_func: callable) -> Optional[Any]: 258 | """ 259 | Process a single frame with memory management 260 | 261 | :param frame_data: Frame data dictionary 262 | :param processing_func: Function to process the frame 263 | :return: Processing result or None 264 | """ 265 | frame = frame_data['frame'] 266 | start_time = time.time() 267 | result = None 268 | 269 | try: 270 | # Process frame 271 | result = processing_func(frame, frame_data['metadata']) 272 | 273 | # Update statistics 274 | with self.stats_lock: 275 | self.frames_processed += 1 276 | self.processing_times.append(time.time() - start_time) 277 | if len(self.processing_times) > 100: 278 | self.processing_times.pop(0) 279 | 280 | return result 281 | 282 | except Exception as e: 283 | logging.error(f"Error processing frame: {e}") 284 | return None 285 | 286 | finally: 287 | # Memory cleanup 288 | del frame 289 | if torch.cuda.is_available(): 290 | torch.cuda.empty_cache() 291 | gc.collect() 292 | 293 | def submit_frame_batch(self, frames: List[np.ndarray], 294 | processing_func: callable) -> List[Future]: 295 | """ 296 | Submit a batch of frames for processing 297 | 298 | :param frames: List of frames to process 299 | :param processing_func: Function to process each frame 300 | :return: List of futures for the submitted tasks 301 | """ 302 | futures = [] 303 | for frame in frames: 304 | if self.add_frame(frame): 305 | future = self.executor.submit( 306 | self.process_frame, 307 | {'frame': frame, 'metadata': {}, 'timestamp': time.time()}, 308 | processing_func 309 | ) 310 | futures.append(future) 311 | self.active_futures.append(future) 312 | 313 | # Clean up completed futures 314 | self.active_futures = [f for f in self.active_futures if not f.done()] 315 | 316 | return futures 317 | 318 | def get_statistics(self) -> Dict[str, Any]: 319 | """Get processing statistics""" 320 | with self.stats_lock: 321 | avg_time = np.mean(self.processing_times) if self.processing_times else 0 322 | return { 323 | 'frames_processed': self.frames_processed, 324 | 'frames_dropped': self.frames_dropped, 325 | 'average_processing_time': avg_time, 326 | 'queue_size': self.processing_queue.qsize(), 327 | 'active_tasks': len(self.active_futures) 328 | } 329 | 330 | def shutdown(self, wait: bool = True): 331 | """ 332 | Shutdown the frame processor 333 | 334 | :param wait: Whether to wait for pending tasks to complete 335 | """ 336 | self.shutdown_flag.set() 337 | 338 | # Clear queues 339 | while not self.processing_queue.empty(): 340 | try: 341 | self.processing_queue.get_nowait() 342 | except Empty: 343 | break 344 | 345 | # Cancel active futures if not waiting 346 | if not wait: 347 | for future in self.active_futures: 348 | future.cancel() 349 | 350 | self.executor.shutdown(wait=wait) 351 | logging.info("Frame processor shutdown complete") 352 
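    # Example (illustrative sketch, not part of the module): pushing webcam
    # frames through a FrameProcessor. `analyze` stands in for any callable
    # taking (frame, metadata) and returning a result; it is not defined here.
    # Frames arriving faster than the configured max FPS are dropped.
    #
    #     fp = FrameProcessor(max_workers=2)
    #     cap = cv2.VideoCapture(0)
    #     while cap.isOpened():
    #         ok, frame = cap.read()
    #         if not ok:
    #             break
    #         futures = fp.submit_frame_batch([frame], analyze)
    #         for f in futures:
    #             print(f.result())          # result of analyze(frame, metadata)
    #     print(fp.get_statistics())         # processed/dropped counts, avg latency
    #     fp.shutdown()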
| 353 | # ============================================================================ 354 | # IMPROVED LOGGING SETUP 355 | # ============================================================================ 356 | 357 | def setup_logging(log_to_file: bool = False, log_level: int = logging.INFO): 358 | """ 359 | Sets up the logging configuration for the entire application. 360 | 361 | :param log_to_file: Boolean indicating whether to also log to a file. 362 | :param log_level: Logging level (e.g., logging.DEBUG, logging.INFO) 363 | """ 364 | handlers = [logging.StreamHandler()] 365 | 366 | if log_to_file: 367 | # Create logs directory if it doesn't exist 368 | log_dir = Path("logs") 369 | log_dir.mkdir(exist_ok=True) 370 | 371 | # Add timestamp to log filename 372 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 373 | log_file = log_dir / f"yoflo_{timestamp}.log" 374 | 375 | handlers.append(logging.FileHandler(log_file)) 376 | 377 | logging.basicConfig( 378 | level=log_level, 379 | format=config.LOG_FORMAT, 380 | handlers=handlers 381 | ) 382 | 383 | # Set specific loggers to warning to reduce noise 384 | logging.getLogger("transformers").setLevel(logging.WARNING) 385 | logging.getLogger("PIL").setLevel(logging.WARNING) 386 | 387 | # ============================================================================ 388 | # IMPROVED PTZ CONTROLLER 389 | # ============================================================================ 390 | 391 | class PTZController: 392 | """ 393 | Class to control PTZ camera movements via HID commands with improved error handling. 394 | """ 395 | 396 | def __init__(self, vendor_id: int = None, product_id: int = None, 397 | usage_page: int = None, usage: int = None): 398 | """ 399 | Initializes the PTZController with configuration-based defaults 400 | """ 401 | self.vendor_id = vendor_id or config.PTZ_VENDOR_ID 402 | self.product_id = product_id or config.PTZ_PRODUCT_ID 403 | self.usage_page = usage_page or config.PTZ_USAGE_PAGE 404 | self.usage = usage or config.PTZ_USAGE 405 | self.device = None 406 | self.device_lock = threading.Lock() 407 | self.command_count = 0 408 | self.error_count = 0 409 | 410 | self._initialize_device() 411 | 412 | def _initialize_device(self): 413 | """Initialize the HID device connection""" 414 | if not PTZ_HID_AVAILABLE: 415 | logging.warning("PTZ control unavailable - HID module not loaded.") 416 | return 417 | 418 | try: 419 | ptz_path = None 420 | for d in hid.enumerate(self.vendor_id, self.product_id): 421 | if d['usage_page'] == self.usage_page and d['usage'] == self.usage: 422 | ptz_path = d['path'] 423 | break 424 | 425 | if ptz_path: 426 | self.device = hid.device() 427 | self.device.open_path(ptz_path) 428 | logging.info("PTZ HID interface opened successfully.") 429 | else: 430 | logging.warning("No suitable PTZ HID interface found.") 431 | 432 | except IOError as e: 433 | logging.error(f"Error opening PTZ device: {e}") 434 | self.error_count += 1 435 | except Exception as e: 436 | logging.error(f"Unexpected error during PTZ device initialization: {e}") 437 | self.error_count += 1 438 | 439 | def send_command(self, report_id: int, value: int) -> bool: 440 | """ 441 | Sends a command to the PTZ device via HID write with thread safety. 
442 | 443 | :param report_id: The report ID for the PTZ control 444 | :param value: The value representing the command 445 | :return: True if command was sent successfully 446 | """ 447 | if not PTZ_HID_AVAILABLE or not self.device: 448 | logging.debug("PTZ Device not initialized or not available.") 449 | return False 450 | 451 | with self.device_lock: 452 | command = [report_id & 0xFF, value] + [0x00] * 30 453 | 454 | try: 455 | self.device.write(command) 456 | self.command_count += 1 457 | logging.debug(f"PTZ command sent: report_id={report_id}, value={value}") 458 | time.sleep(config.PTZ_COMMAND_DELAY) 459 | return True 460 | 461 | except IOError as e: 462 | logging.error(f"Error sending PTZ command: {e}") 463 | self.error_count += 1 464 | 465 | # Try to reconnect if too many errors 466 | if self.error_count > 5: 467 | self._reconnect() 468 | return False 469 | 470 | except Exception as e: 471 | logging.error(f"Unexpected error sending PTZ command: {e}") 472 | self.error_count += 1 473 | return False 474 | 475 | def _reconnect(self): 476 | """Attempt to reconnect to the PTZ device""" 477 | logging.info("Attempting to reconnect to PTZ device...") 478 | self.close() 479 | time.sleep(1) 480 | self._initialize_device() 481 | 482 | def pan_right(self) -> bool: 483 | """Pans the camera to the right.""" 484 | return self.send_command(0x0B, 0x02) 485 | 486 | def pan_left(self) -> bool: 487 | """Pans the camera to the left.""" 488 | return self.send_command(0x0B, 0x03) 489 | 490 | def tilt_up(self) -> bool: 491 | """Tilts the camera upward.""" 492 | return self.send_command(0x0B, 0x00) 493 | 494 | def tilt_down(self) -> bool: 495 | """Tilts the camera downward.""" 496 | return self.send_command(0x0B, 0x01) 497 | 498 | def zoom_in(self) -> bool: 499 | """Zooms the camera in.""" 500 | return self.send_command(0x0B, 0x04) 501 | 502 | def zoom_out(self) -> bool: 503 | """Zooms the camera out.""" 504 | return self.send_command(0x0B, 0x05) 505 | 506 | def get_statistics(self) -> Dict[str, int]: 507 | """Get PTZ controller statistics""" 508 | return { 509 | 'commands_sent': self.command_count, 510 | 'errors': self.error_count 511 | } 512 | 513 | def close(self): 514 | """Closes the HID device handle safely.""" 515 | if not PTZ_HID_AVAILABLE: 516 | return 517 | 518 | with self.device_lock: 519 | if self.device: 520 | try: 521 | self.device.close() 522 | logging.info("PTZ device closed successfully.") 523 | except Exception as e: 524 | logging.error(f"Error closing PTZ device: {e}") 525 | finally: 526 | self.device = None 527 | 528 | # ============================================================================ 529 | # PTZ TRACKER 530 | # ============================================================================ 531 | 532 | class PTZTracker: 533 | """ 534 | Autonomous PTZ tracking class with improved error handling 535 | """ 536 | 537 | def __init__(self, camera: Optional[PTZController], 538 | desired_ratio: float = None, 539 | zoom_tolerance: float = None, 540 | pan_tilt_tolerance: int = None, 541 | pan_tilt_interval: float = None, 542 | zoom_interval: float = None, 543 | smoothing_factor: float = None, 544 | max_consecutive_errors: int = None): 545 | """ 546 | Initializes the PTZTracker with configuration defaults 547 | """ 548 | # Use config defaults if not specified 549 | self.desired_ratio = desired_ratio or config.PTZ_DESIRED_RATIO 550 | self.zoom_tolerance = zoom_tolerance or config.PTZ_ZOOM_TOLERANCE 551 | self.pan_tilt_tolerance = pan_tilt_tolerance or config.PTZ_PAN_TILT_TOLERANCE 552 | 
self.pan_tilt_interval = pan_tilt_interval or config.PTZ_PAN_TILT_INTERVAL 553 | self.zoom_interval = zoom_interval or config.PTZ_ZOOM_INTERVAL 554 | self.smoothing_factor = smoothing_factor or config.PTZ_SMOOTHING_FACTOR 555 | self.max_consecutive_errors = max_consecutive_errors or config.PTZ_MAX_ERRORS 556 | 557 | # Check if camera is available 558 | if not camera or not PTZ_AVAILABLE: 559 | self.active = False 560 | self.camera = None 561 | logging.info("PTZ Tracker initialized but inactive - PTZ functionality not available.") 562 | return 563 | 564 | # Validate parameters 565 | self._validate_parameters() 566 | 567 | self.camera = camera 568 | self.last_pan_tilt_adjust = 0.0 569 | self.last_zoom_adjust = 0.0 570 | self.smoothed_width = None 571 | self.smoothed_height = None 572 | self.active = False 573 | self.consecutive_errors = 0 574 | self.tracking_lock = threading.Lock() 575 | 576 | def _validate_parameters(self): 577 | """Validate tracker parameters""" 578 | if not (0 < self.smoothing_factor < 1): 579 | raise ValueError("smoothing_factor must be between 0 and 1.") 580 | if self.desired_ratio <= 0 or self.desired_ratio >= 1: 581 | raise ValueError("desired_ratio should be between 0 and 1.") 582 | if self.zoom_tolerance < 0: 583 | raise ValueError("zoom_tolerance must be >= 0.") 584 | if self.pan_tilt_tolerance < 0: 585 | raise ValueError("pan_tilt_tolerance must be >= 0.") 586 | if self.pan_tilt_interval <= 0 or self.zoom_interval <= 0: 587 | raise ValueError("Intervals must be positive.") 588 | if self.max_consecutive_errors < 1: 589 | raise ValueError("max_consecutive_errors must be at least 1.") 590 | 591 | def activate(self, active: bool = True): 592 | """Activate or deactivate PTZ tracking""" 593 | if not PTZ_AVAILABLE or not self.camera: 594 | logging.warning("Cannot activate PTZ tracking - PTZ functionality not available.") 595 | self.active = False 596 | return 597 | 598 | with self.tracking_lock: 599 | self.active = active 600 | if not active: 601 | self.smoothed_width = None 602 | self.smoothed_height = None 603 | self.consecutive_errors = 0 604 | 605 | status = "activated" if active else "deactivated" 606 | logging.info(f"PTZ tracking {status}") 607 | 608 | def adjust_camera(self, bbox: Tuple[float, float, float, float], 609 | frame_width: int, frame_height: int): 610 | """ 611 | Adjusts camera to keep object centered and properly sized 612 | 613 | :param bbox: Bounding box (x1, y1, x2, y2) 614 | :param frame_width: Frame width in pixels 615 | :param frame_height: Frame height in pixels 616 | """ 617 | if not self.active or not PTZ_AVAILABLE or not self.camera: 618 | return 619 | 620 | with self.tracking_lock: 621 | x1, y1, x2, y2 = bbox 622 | 623 | # Validate bounding box 624 | if x1 >= x2 or y1 >= y2: 625 | logging.debug("Invalid bbox coordinates; skipping camera adjustment.") 626 | return 627 | 628 | bbox_width = x2 - x1 629 | bbox_height = y2 - y1 630 | 631 | # Initialize or update smoothed dimensions 632 | if self.smoothed_width is None: 633 | self.smoothed_width = bbox_width 634 | self.smoothed_height = bbox_height 635 | else: 636 | self.smoothed_width = ( 637 | self.smoothing_factor * bbox_width + 638 | (1 - self.smoothing_factor) * self.smoothed_width 639 | ) 640 | self.smoothed_height = ( 641 | self.smoothing_factor * bbox_height + 642 | (1 - self.smoothing_factor) * self.smoothed_height 643 | ) 644 | 645 | # Calculate centers 646 | bbox_center_x = (x1 + x2) / 2 647 | bbox_center_y = (y1 + y2) / 2 648 | frame_center_x = frame_width / 2 649 | frame_center_y = 
frame_height / 2 650 | 651 | # Calculate desired dimensions 652 | desired_width = frame_width * self.desired_ratio 653 | desired_height = frame_height * self.desired_ratio 654 | 655 | min_width = desired_width * (1 - self.zoom_tolerance) 656 | max_width = desired_width * (1 + self.zoom_tolerance) 657 | min_height = desired_height * (1 - self.zoom_tolerance) 658 | max_height = desired_height * (1 + self.zoom_tolerance) 659 | 660 | current_time = time.time() 661 | 662 | # Handle Pan/Tilt 663 | if (current_time - self.last_pan_tilt_adjust) >= self.pan_tilt_interval: 664 | dx = bbox_center_x - frame_center_x 665 | dy = bbox_center_y - frame_center_y 666 | 667 | pan_tilt_moved = False 668 | 669 | if abs(dx) > self.pan_tilt_tolerance: 670 | command = "pan_left" if dx < 0 else "pan_right" 671 | pan_tilt_moved = self._safe_camera_command(command) or pan_tilt_moved 672 | 673 | if abs(dy) > self.pan_tilt_tolerance: 674 | command = "tilt_up" if dy < 0 else "tilt_down" 675 | pan_tilt_moved = self._safe_camera_command(command) or pan_tilt_moved 676 | 677 | if pan_tilt_moved: 678 | self.last_pan_tilt_adjust = current_time 679 | 680 | # Handle Zoom 681 | if (current_time - self.last_zoom_adjust) >= self.zoom_interval: 682 | width_too_small = self.smoothed_width < min_width 683 | height_too_small = self.smoothed_height < min_height 684 | width_too_large = self.smoothed_width > max_width 685 | height_too_large = self.smoothed_height > max_height 686 | 687 | zoom_moved = False 688 | 689 | if width_too_small or height_too_small: 690 | zoom_moved = self._safe_camera_command("zoom_in") 691 | elif width_too_large or height_too_large: 692 | zoom_moved = self._safe_camera_command("zoom_out") 693 | 694 | if zoom_moved: 695 | self.last_zoom_adjust = current_time 696 | 697 | # Check for too many errors 698 | if self.consecutive_errors >= self.max_consecutive_errors: 699 | logging.error("Too many consecutive camera errors, deactivating PTZ tracking.") 700 | self.activate(False) 701 | 702 | def _safe_camera_command(self, command: str) -> bool: 703 | """ 704 | Safely execute a camera command 705 | 706 | :param command: Command name to execute 707 | :return: True if command succeeded 708 | """ 709 | if not PTZ_AVAILABLE or not self.camera: 710 | self.consecutive_errors += 1 711 | return False 712 | 713 | if not hasattr(self.camera, command): 714 | logging.error(f"Camera does not support command '{command}'.") 715 | return False 716 | 717 | try: 718 | method = getattr(self.camera, command) 719 | success = method() 720 | 721 | if success: 722 | self.consecutive_errors = 0 723 | else: 724 | self.consecutive_errors += 1 725 | 726 | return success 727 | 728 | except Exception as e: 729 | self.consecutive_errors += 1 730 | logging.error(f"Error executing camera command '{command}': {e}") 731 | return False 732 | 733 | # ============================================================================ 734 | # MODEL MANAGER 735 | # ============================================================================ 736 | 737 | class ModelManager: 738 | """ 739 | Enhanced model manager with better memory management 740 | """ 741 | 742 | def __init__(self, device: torch.device, quantization: Optional[str] = None): 743 | """ 744 | Initialize the ModelManager 745 | 746 | :param device: Torch device (cuda/cpu) 747 | :param quantization: Quantization mode (None, "4bit") 748 | """ 749 | self.device = device 750 | self.model = None 751 | self.processor = None 752 | self.quantization = quantization 753 | self.model_lock = threading.Lock() 754 | 755 
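    # Example (illustrative sketch): loading the default checkpoint with 4-bit
    # quantization on a CUDA device; both load paths below return True on success.
    #
    #     mm = ModelManager(torch.device("cuda"), quantization="4bit")
    #     if mm.download_and_load_model("microsoft/Florence-2-base-ft"):
    #         model, processor = mm.model, mm.processor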
| def _get_quant_config(self) -> Optional[BitsAndBytesConfig]: 756 | """Get quantization configuration""" 757 | if self.quantization == "4bit": 758 | logging.info("Using 4-bit quantization.") 759 | return BitsAndBytesConfig( 760 | load_in_4bit=True, 761 | bnb_4bit_compute_dtype=torch.float16, 762 | bnb_4bit_use_double_quant=True, 763 | ) 764 | return None 765 | 766 | def load_local_model(self, model_path: str) -> bool: 767 | """ 768 | Load a local model with proper error handling 769 | 770 | :param model_path: Path to model directory 771 | :return: True if successful 772 | """ 773 | with self.model_lock: 774 | if not os.path.exists(model_path): 775 | logging.error(f"Model path {os.path.abspath(model_path)} does not exist.") 776 | return False 777 | 778 | if not os.path.isdir(model_path): 779 | logging.error(f"Model path {os.path.abspath(model_path)} is not a directory.") 780 | return False 781 | 782 | try: 783 | logging.info(f"Loading model from {os.path.abspath(model_path)}") 784 | quant_config = self._get_quant_config() 785 | 786 | # Clear existing model 787 | if self.model: 788 | del self.model 789 | torch.cuda.empty_cache() 790 | 791 | self.model = AutoModelForCausalLM.from_pretrained( 792 | model_path, 793 | trust_remote_code=True, 794 | quantization_config=quant_config, 795 | ).eval() 796 | 797 | if not self.quantization: 798 | self.model.to(self.device) 799 | if torch.cuda.is_available(): 800 | self.model = self.model.half() 801 | logging.info("Using FP16 precision for the model.") 802 | 803 | self.processor = AutoProcessor.from_pretrained( 804 | model_path, trust_remote_code=True 805 | ) 806 | 807 | logging.info(f"Model loaded successfully from {os.path.abspath(model_path)}") 808 | return True 809 | 810 | except (OSError, ValueError, ModuleNotFoundError) as e: 811 | logging.error(f"Error initializing model: {e}") 812 | except Exception as e: 813 | logging.error(f"Unexpected error initializing model: {e}") 814 | 815 | return False 816 | 817 | def download_and_load_model(self, repo_id: str = "microsoft/Florence-2-base-ft") -> bool: 818 | """ 819 | Download and load model from Hugging Face 820 | 821 | :param repo_id: HuggingFace repository ID 822 | :return: True if successful 823 | """ 824 | try: 825 | local_model_dir = config.MODEL_CACHE_DIR 826 | 827 | # Create directory if it doesn't exist 828 | Path(local_model_dir).mkdir(parents=True, exist_ok=True) 829 | 830 | logging.info(f"Downloading model from {repo_id}...") 831 | snapshot_download(repo_id=repo_id, local_dir=local_model_dir) 832 | 833 | if not os.path.exists(local_model_dir): 834 | logging.error(f"Model download failed, directory {local_model_dir} does not exist.") 835 | return False 836 | 837 | logging.info(f"Model downloaded to {os.path.abspath(local_model_dir)}") 838 | return self.load_local_model(local_model_dir) 839 | 840 | except OSError as e: 841 | logging.error(f"OS error during model download: {e}") 842 | except Exception as e: 843 | logging.error(f"Error downloading model: {e}") 844 | 845 | return False 846 | 847 | # ============================================================================ 848 | # RECORDING MANAGER 849 | # ============================================================================ 850 | 851 | class RecordingManager: 852 | """ 853 | Enhanced recording manager with better resource management 854 | """ 855 | 856 | def __init__(self, record_mode: Optional[str] = None): 857 | """ 858 | Initialize the recording manager 859 | 860 | :param record_mode: Recording mode (None, "od", "infy", "infn") 861 | 
""" 862 | self.record_mode = record_mode 863 | self.recording = False 864 | self.video_writer = None 865 | self.video_out_path = None 866 | self.last_detection_time = time.time() 867 | self.writer_lock = threading.Lock() 868 | self.frame_count = 0 869 | self.start_time = None 870 | 871 | def start_recording(self, frame: np.ndarray) -> bool: 872 | """ 873 | Start video recording 874 | 875 | :param frame: Initial frame 876 | :return: True if successful 877 | """ 878 | with self.writer_lock: 879 | if self.recording or not self.record_mode: 880 | return False 881 | 882 | try: 883 | height, width = frame.shape[:2] 884 | timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') 885 | 886 | # Create recordings directory 887 | record_dir = Path("recordings") 888 | record_dir.mkdir(exist_ok=True) 889 | 890 | self.video_out_path = str(record_dir / f"recording_{timestamp}.avi") 891 | 892 | fourcc = cv2.VideoWriter_fourcc(*config.RECORDING_CODEC) 893 | self.video_writer = cv2.VideoWriter( 894 | self.video_out_path, 895 | fourcc, 896 | config.RECORDING_FPS, 897 | (width, height) 898 | ) 899 | 900 | if self.video_writer.isOpened(): 901 | self.recording = True 902 | self.start_time = time.time() 903 | self.frame_count = 0 904 | logging.info(f"Started recording: {self.video_out_path}") 905 | return True 906 | else: 907 | logging.error("Failed to open video writer") 908 | return False 909 | 910 | except Exception as e: 911 | logging.error(f"Error starting recording: {e}") 912 | return False 913 | 914 | def stop_recording(self) -> Optional[str]: 915 | """ 916 | Stop video recording 917 | 918 | :return: Path to recorded video 919 | """ 920 | with self.writer_lock: 921 | if not self.recording: 922 | return None 923 | 924 | try: 925 | if self.video_writer: 926 | self.video_writer.release() 927 | 928 | self.recording = False 929 | duration = time.time() - self.start_time if self.start_time else 0 930 | 931 | logging.info( 932 | f"Stopped recording: {self.video_out_path} " 933 | f"(Duration: {duration:.2f}s, Frames: {self.frame_count})" 934 | ) 935 | 936 | path = self.video_out_path 937 | self.video_out_path = None 938 | self.video_writer = None 939 | self.frame_count = 0 940 | self.start_time = None 941 | 942 | return path 943 | 944 | except Exception as e: 945 | logging.error(f"Error stopping recording: {e}") 946 | return None 947 | 948 | def write_frame(self, frame: np.ndarray) -> bool: 949 | """ 950 | Write a frame to the video 951 | 952 | :param frame: Frame to write 953 | :return: True if successful 954 | """ 955 | with self.writer_lock: 956 | if not self.recording or not self.video_writer: 957 | return False 958 | 959 | try: 960 | self.video_writer.write(frame) 961 | self.frame_count += 1 962 | return True 963 | except Exception as e: 964 | logging.error(f"Error writing frame: {e}") 965 | return False 966 | 967 | def handle_recording_by_detection(self, detections: List, frame: np.ndarray): 968 | """Handle recording based on object detection""" 969 | if not self.record_mode or self.record_mode != "od": 970 | return 971 | 972 | current_time = time.time() 973 | 974 | if detections: 975 | if not self.recording: 976 | self.start_recording(frame) 977 | self.last_detection_time = current_time 978 | self.write_frame(frame) 979 | else: 980 | if self.recording and (current_time - self.last_detection_time) > config.RECORDING_TIMEOUT: 981 | self.stop_recording() 982 | 983 | def handle_recording_by_inference(self, inference_result: str, frame: np.ndarray): 984 | """Handle recording based on inference results""" 985 | if 
not self.record_mode or self.record_mode not in ["infy", "infn"]: 986 | return 987 | 988 | should_record = False 989 | 990 | if self.record_mode == "infy" and inference_result.lower() == "yes": 991 | should_record = True 992 | elif self.record_mode == "infn" and inference_result.lower() == "no": 993 | should_record = True 994 | 995 | if should_record: 996 | if not self.recording: 997 | self.start_recording(frame) 998 | self.write_frame(frame) 999 | else: 1000 | if self.recording: 1001 | self.stop_recording() 1002 | 1003 | def cleanup(self): 1004 | """Clean up resources""" 1005 | if self.recording: 1006 | self.stop_recording() 1007 | 1008 | # ============================================================================ 1009 | # IMAGE UTILITIES 1010 | # ============================================================================ 1011 | 1012 | class ImageUtils: 1013 | """Utility class for image operations""" 1014 | 1015 | @staticmethod 1016 | def plot_bbox(image: np.ndarray, detections: List[Tuple[List[float], str]]) -> np.ndarray: 1017 | """ 1018 | Draw bounding boxes on image 1019 | 1020 | :param image: Input image 1021 | :param detections: List of (bbox, label) tuples 1022 | :return: Image with bounding boxes 1023 | """ 1024 | try: 1025 | for bbox, label in detections: 1026 | x1, y1, x2, y2 = map(int, bbox) 1027 | cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2) 1028 | cv2.putText( 1029 | image, 1030 | label, 1031 | (x1, y1 - 10), 1032 | cv2.FONT_HERSHEY_SIMPLEX, 1033 | 0.5, 1034 | (0, 255, 0), 1035 | 2, 1036 | ) 1037 | return image 1038 | except cv2.error as e: 1039 | logging.error(f"OpenCV error plotting bounding boxes: {e}") 1040 | except Exception as e: 1041 | logging.error(f"Error plotting bounding boxes: {e}") 1042 | return image 1043 | 1044 | @staticmethod 1045 | def save_screenshot(frame: np.ndarray) -> Optional[str]: 1046 | """ 1047 | Save a screenshot with timestamp 1048 | 1049 | :param frame: Frame to save 1050 | :return: Path to saved screenshot 1051 | """ 1052 | try: 1053 | # Create screenshots directory 1054 | screenshot_dir = Path("screenshots") 1055 | screenshot_dir.mkdir(exist_ok=True) 1056 | 1057 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 1058 | filename = str(screenshot_dir / f"screenshot_{timestamp}.png") 1059 | 1060 | cv2.imwrite(filename, frame) 1061 | logging.info(f"Screenshot saved: {filename}") 1062 | return filename 1063 | 1064 | except cv2.error as e: 1065 | logging.error(f"OpenCV error saving screenshot: {e}") 1066 | except Exception as e: 1067 | logging.error(f"Error saving screenshot: {e}") 1068 | 1069 | return None 1070 | 1071 | # ============================================================================ 1072 | # ALERT LOGGER 1073 | # ============================================================================ 1074 | 1075 | class AlertLogger: 1076 | """Enhanced alert logging with thread safety""" 1077 | 1078 | def __init__(self, log_file: str = None): 1079 | """ 1080 | Initialize alert logger 1081 | 1082 | :param log_file: Path to log file 1083 | """ 1084 | self.log_file = log_file or config.LOG_FILE 1085 | self.log_lock = threading.Lock() 1086 | 1087 | # Create logs directory 1088 | log_dir = Path("logs") 1089 | log_dir.mkdir(exist_ok=True) 1090 | 1091 | self.log_path = log_dir / self.log_file 1092 | 1093 | def log_alert(self, message: str): 1094 | """ 1095 | Log an alert message 1096 | 1097 | :param message: Alert message 1098 | """ 1099 | with self.log_lock: 1100 | try: 1101 | timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") 
1102 | log_entry = f"{timestamp} - {message}\n" 1103 | 1104 | with open(self.log_path, "a") as log_file: 1105 | log_file.write(log_entry) 1106 | 1107 | logging.info(f"Alert logged: {message}") 1108 | 1109 | except IOError as e: 1110 | logging.error(f"IO error logging alert: {e}") 1111 | except Exception as e: 1112 | logging.error(f"Error logging alert: {e}") 1113 | 1114 | # ============================================================================ 1115 | # PTZ CONTROL THREAD 1116 | # ============================================================================ 1117 | 1118 | def ptz_control_thread(ptz_camera: PTZController): 1119 | """ 1120 | Thread for manual PTZ control using keyboard 1121 | 1122 | :param ptz_camera: PTZ camera controller 1123 | """ 1124 | if not PTZ_MSVCRT_AVAILABLE: 1125 | print("Cannot start PTZ control thread - msvcrt module not available.") 1126 | return 1127 | 1128 | if not PTZ_HID_AVAILABLE or not ptz_camera: 1129 | print("Cannot start PTZ control thread - PTZ camera not available.") 1130 | return 1131 | 1132 | print("PTZ control started. Use arrow keys to pan/tilt, +/- to zoom, q to quit.") 1133 | 1134 | while True: 1135 | try: 1136 | ch = msvcrt.getch() 1137 | 1138 | if ch == b'\xe0': # Arrow key prefix 1139 | arrow = msvcrt.getch() 1140 | if arrow == b'H': # Up arrow 1141 | ptz_camera.tilt_up() 1142 | elif arrow == b'P': # Down arrow 1143 | ptz_camera.tilt_down() 1144 | elif arrow == b'K': # Left arrow 1145 | ptz_camera.pan_left() 1146 | elif arrow == b'M': # Right arrow 1147 | ptz_camera.pan_right() 1148 | elif ch == b'+': 1149 | ptz_camera.zoom_in() 1150 | elif ch == b'-': 1151 | ptz_camera.zoom_out() 1152 | elif ch == b'q': 1153 | print("Quitting PTZ control.") 1154 | break 1155 | 1156 | except Exception as e: 1157 | logging.error(f"Error in PTZ control thread: {e}") 1158 | break 1159 | 1160 | # ============================================================================ 1161 | # MAIN YO-FLO APPLICATION CLASS 1162 | # ============================================================================ 1163 | 1164 | class YO_FLO: 1165 | def __init__(self): 1166 | """Initialize YO-FLO with all attributes properly initialized""" 1167 | 1168 | # Device configuration 1169 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 1170 | 1171 | # Model and processor 1172 | self.model = None 1173 | self.processor = None 1174 | self.model_path = None 1175 | self.model_manager = None 1176 | self.quantization = None 1177 | 1178 | # GUI elements 1179 | self.root = tk.Tk() 1180 | self.root.withdraw() 1181 | self.caption_label = None 1182 | self.inference_rate_label = None 1183 | self.inference_result_label = None 1184 | self.inference_phrases_result_labels = [] 1185 | 1186 | # Detection settings 1187 | self.class_names = [] 1188 | self.detections = [] 1189 | self.phrase = None 1190 | self.visual_grounding_phrase = None 1191 | self.inference_title = None 1192 | self.inference_phrases = [] 1193 | 1194 | # Feature flags 1195 | self.headless_mode = False 1196 | self.object_detection_active = False 1197 | self.expression_comprehension_active = False 1198 | self.visual_grounding_active = False 1199 | self.inference_tree_active = False 1200 | self.beep_active = False 1201 | self.screenshot_active = False 1202 | self.screenshot_on_yes_active = False 1203 | self.screenshot_on_no_active = False 1204 | self.debug = False 1205 | self.log_to_file_active = False 1206 | 1207 | # Tracking and timing 1208 | self.target_detected = False 1209 | self.last_beep_time = 0 1210 | 
self.inference_start_time = None 1211 | self.inference_count = 0 1212 | self.last_process_time = time.time() 1213 | 1214 | # Image handling 1215 | self.latest_image = None 1216 | self.frame_lock = threading.Lock() 1217 | 1218 | # Recording 1219 | self.record = None 1220 | self.recording_manager = None 1221 | 1222 | # PTZ 1223 | self.ptz_camera = None 1224 | self.ptz_tracker = None 1225 | self.track_object_name = None 1226 | 1227 | # Threading 1228 | self.webcam_threads = [] 1229 | self.webcam_indices = config.DEFAULT_WEBCAM_INDICES 1230 | self.stop_webcam_flag = threading.Event() 1231 | self.frame_processor = FrameProcessor() 1232 | 1233 | # Performance 1234 | self.scaler = torch.cuda.amp.GradScaler() 1235 | 1236 | # Cleanup 1237 | self.cleanup_thread = None 1238 | self.cleanup_flag = threading.Event() 1239 | 1240 | # Security validator 1241 | self.validator = SecurityValidator() 1242 | 1243 | # Alert logger 1244 | self.alert_logger = AlertLogger() 1245 | 1246 | # Start periodic cleanup 1247 | self._start_cleanup_thread() 1248 | 1249 | def _start_cleanup_thread(self): 1250 | """Start a thread for periodic memory cleanup""" 1251 | def cleanup_worker(): 1252 | while not self.cleanup_flag.is_set(): 1253 | time.sleep(config.CLEANUP_INTERVAL) 1254 | self._periodic_cleanup() 1255 | 1256 | self.cleanup_thread = threading.Thread(target=cleanup_worker, daemon=True) 1257 | self.cleanup_thread.start() 1258 | 1259 | def _periodic_cleanup(self): 1260 | """Perform periodic memory cleanup""" 1261 | try: 1262 | gc.collect() 1263 | if torch.cuda.is_available(): 1264 | torch.cuda.empty_cache() 1265 | logging.debug("Periodic memory cleanup completed") 1266 | except Exception as e: 1267 | logging.error(f"Error during periodic cleanup: {e}") 1268 | 1269 | # ------------------------------------------------------------------------- 1270 | # Model Management 1271 | # ------------------------------------------------------------------------- 1272 | 1273 | def init_model_manager(self, quantization_mode: Optional[str] = None): 1274 | """Initialize the ModelManager with proper validation""" 1275 | if quantization_mode and quantization_mode not in config.QUANTIZATION_OPTIONS: 1276 | logging.warning(f"Invalid quantization mode: {quantization_mode}") 1277 | quantization_mode = None 1278 | 1279 | self.quantization = quantization_mode 1280 | self.model_manager = ModelManager(self.device, self.quantization) 1281 | 1282 | def load_local_model(self, model_path: str): 1283 | """Load local model with path validation""" 1284 | safe_path = self.validator.validate_path(model_path) 1285 | if not safe_path: 1286 | print(f"{Fore.RED}Invalid or unsafe model path: {model_path}{Style.RESET_ALL}") 1287 | return 1288 | 1289 | if not safe_path.is_dir(): 1290 | print(f"{Fore.RED}Model path must be a directory: {safe_path}{Style.RESET_ALL}") 1291 | return 1292 | 1293 | if not self.model_manager: 1294 | self.init_model_manager() 1295 | 1296 | ok = self.model_manager.load_local_model(str(safe_path)) 1297 | if ok: 1298 | self.model = self.model_manager.model 1299 | self.processor = self.model_manager.processor 1300 | self.model_path = str(safe_path) 1301 | print(f"{Fore.GREEN}Model loaded successfully from {safe_path}{Style.RESET_ALL}") 1302 | else: 1303 | print(f"{Fore.RED}Failed to load model from {safe_path}{Style.RESET_ALL}") 1304 | 1305 | def download_model(self, repo_id: str = "microsoft/Florence-2-base-ft"): 1306 | """Download and load model from Hugging Face""" 1307 | if not self.model_manager: 1308 | self.init_model_manager() 1309 | 
1310 |         ok = self.model_manager.download_and_load_model(repo_id)
1311 |         if ok:
1312 |             self.model = self.model_manager.model
1313 |             self.processor = self.model_manager.processor
1314 |             print(f"{Fore.GREEN}Model downloaded and initialized successfully!{Style.RESET_ALL}")
1315 |         else:
1316 |             print(f"{Fore.RED}Failed to download/initialize model.{Style.RESET_ALL}")
1317 | 
1318 |     # -------------------------------------------------------------------------
1319 |     # Model Inference Methods
1320 |     # -------------------------------------------------------------------------
1321 | 
1322 |     def prepare_inputs(self, task_prompt: str, image: Image.Image, phrase: Optional[str] = None):
1323 |         """Prepare inputs for model inference"""
1324 |         inputs = self.processor(text=task_prompt, images=image, return_tensors="pt").to(self.device)
1325 | 
1326 |         if phrase:
1327 |             inputs["input_ids"] = torch.cat(
1328 |                 [
1329 |                     inputs["input_ids"],
1330 |                     self.processor.tokenizer(phrase, return_tensors="pt")
1331 |                     .input_ids[:, 1:]
1332 |                     .to(self.device),
1333 |                 ],
1334 |                 dim=1,
1335 |             )
1336 | 
1337 |         for k, v in inputs.items():
1338 |             if torch.is_floating_point(v):
1339 |                 inputs[k] = v.half()
1340 | 
1341 |         return inputs
1342 | 
1343 |     def run_model(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
1344 |         """Run model inference"""
1345 |         with torch.amp.autocast("cuda"):
1346 |             generated_ids = self.model.generate(
1347 |                 input_ids=inputs["input_ids"],
1348 |                 pixel_values=inputs.get("pixel_values"),
1349 |                 max_new_tokens=1024,
1350 |                 early_stopping=False,
1351 |                 do_sample=False,
1352 |                 num_beams=1,
1353 |             )
1354 |         return generated_ids
1355 | 
1356 |     def process_object_detection_outputs(self, generated_ids: torch.Tensor,
1357 |                                          image_size: Tuple[int, int]) -> Dict:
1358 |         """Process object detection outputs"""
1359 |         generated_text = self.processor.batch_decode(
1360 |             generated_ids, skip_special_tokens=False
1361 |         )[0]
1362 |         parsed_answer = self.processor.post_process_generation(
1363 |             generated_text, task="<OD>", image_size=image_size
1364 |         )
1365 |         return parsed_answer
1366 | 
1367 |     def process_expression_comprehension_outputs(self, generated_ids: torch.Tensor) -> str:
1368 |         """Process expression comprehension outputs"""
1369 |         generated_text = self.processor.batch_decode(
1370 |             generated_ids, skip_special_tokens=False
1371 |         )[0]
1372 |         return generated_text
1373 | 
1374 |     def run_object_detection(self, image: Image.Image) -> List[Tuple[List[float], str]]:
1375 |         """Run object detection on an image"""
1376 |         try:
1377 |             if not self.model or not self.processor:
1378 |                 raise ValueError("Model or processor is not initialized.")
1379 | 
1380 |             task_prompt = "<OD>"  # Florence-2 object-detection task token
1381 |             if self.debug:
1382 |                 print(f"Running object detection with task prompt: {task_prompt}")
1383 | 
1384 |             inputs = self.prepare_inputs(task_prompt, image)
1385 |             generated_ids = self.run_model(inputs)
1386 | 
1387 |             if self.debug:
1388 |                 print(f"Generated IDs: {generated_ids}")
1389 | 
1390 |             parsed_answer = self.process_object_detection_outputs(generated_ids, image.size)
1391 | 
1392 |             if self.debug:
1393 |                 print(f"Parsed answer: {parsed_answer}")
1394 | 
1395 |             detections = []
1396 |             if parsed_answer and "<OD>" in parsed_answer:
1397 |                 for bbox, label in zip(
1398 |                     parsed_answer["<OD>"]["bboxes"],
1399 |                     parsed_answer["<OD>"]["labels"]
1400 |                 ):
1401 |                     if not self.class_names or label.lower() in self.class_names:
1402 |                         detections.append((bbox, label))
1403 | 
1404 |             return detections
1405 | 
1406 |         except AttributeError as e:
1407 |             logging.error(f"Model or processor not initialized properly: {e}")
1408 |         except Exception as e:
1409 |             logging.error(f"Error running object detection: {e}")
1410 | 
1411 |         return []
1412 | 
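    # Example (illustrative sketch, not wired into the app): a single detection
    # pass outside the webcam loop. Assumes a model has been loaded first via
    # download_model() or load_local_model(); "frame.png" is a placeholder path.
    #
    #     app = YO_FLO()
    #     app.download_model()
    #     for bbox, label in app.run_object_detection(Image.open("frame.png")):
    #         print(label, bbox)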
initialized properly: {e}") 1408 | except Exception as e: 1409 | logging.error(f"Error running object detection: {e}") 1410 | 1411 | return [] 1412 | 1413 | def run_expression_comprehension(self, image: Image.Image, phrase: str) -> Optional[str]: 1414 | """Run expression comprehension on an image""" 1415 | try: 1416 | task_prompt = "" 1417 | 1418 | if self.debug: 1419 | print(f"Running expression comprehension with phrase: {phrase}") 1420 | 1421 | inputs = self.prepare_inputs(task_prompt, image, phrase) 1422 | generated_ids = self.run_model(inputs) 1423 | 1424 | if self.debug: 1425 | print(f"Generated IDs: {generated_ids}") 1426 | 1427 | generated_text = self.process_expression_comprehension_outputs(generated_ids) 1428 | 1429 | if self.debug: 1430 | print(f"Generated text: {generated_text}") 1431 | 1432 | return generated_text 1433 | 1434 | except Exception as e: 1435 | logging.error(f"Error running expression comprehension: {e}") 1436 | return None 1437 | 1438 | def run_visual_grounding(self, image: Image.Image, phrase: str) -> Optional[List[float]]: 1439 | """Run visual grounding on an image""" 1440 | try: 1441 | task_prompt = "" 1442 | inputs = self.prepare_inputs(task_prompt, image, phrase) 1443 | generated_ids = self.run_model(inputs) 1444 | 1445 | if self.debug: 1446 | print(f"Generated IDs: {generated_ids}") 1447 | 1448 | generated_text = self.processor.batch_decode( 1449 | generated_ids, skip_special_tokens=False 1450 | )[0] 1451 | 1452 | if self.debug: 1453 | print(f"Generated text: {generated_text}") 1454 | 1455 | parsed_answer = self.processor.post_process_generation( 1456 | generated_text, task=task_prompt, image_size=image.size 1457 | ) 1458 | 1459 | if self.debug: 1460 | print(f"Parsed answer: {parsed_answer}") 1461 | 1462 | if task_prompt in parsed_answer and parsed_answer[task_prompt]["bboxes"]: 1463 | return parsed_answer[task_prompt]["bboxes"][0] 1464 | 1465 | return None 1466 | 1467 | except Exception as e: 1468 | logging.error(f"Error running visual grounding: {e}") 1469 | return None 1470 | 1471 | def evaluate_inference_tree(self, image: Image.Image) -> Tuple[str, List[bool]]: 1472 | """Evaluate inference tree on an image""" 1473 | try: 1474 | if not self.inference_phrases: 1475 | logging.error("No inference phrases set.") 1476 | return "FAIL", [] 1477 | 1478 | results = [] 1479 | phrase_results = [] 1480 | 1481 | for phrase in self.inference_phrases: 1482 | result = self.run_expression_comprehension(image, phrase) 1483 | if result: 1484 | if "yes" in result.lower(): 1485 | results.append(True) 1486 | phrase_results.append(True) 1487 | else: 1488 | results.append(False) 1489 | phrase_results.append(False) 1490 | 1491 | overall_result = "PASS" if all(results) else "FAIL" 1492 | return overall_result, phrase_results 1493 | 1494 | except Exception as e: 1495 | logging.error(f"Error evaluating inference tree: {e}") 1496 | return "FAIL", [] 1497 | 1498 | # ------------------------------------------------------------------------- 1499 | # GUI Methods 1500 | # ------------------------------------------------------------------------- 1501 | 1502 | def select_model_path(self): 1503 | """Select model path with security validation""" 1504 | try: 1505 | model_path = filedialog.askdirectory( 1506 | title="Select Model Directory", 1507 | initialdir=os.getcwd() 1508 | ) 1509 | 1510 | if model_path: 1511 | self.load_local_model(model_path) 1512 | else: 1513 | print(f"{Fore.YELLOW}Model path selection cancelled.{Style.RESET_ALL}") 1514 | 1515 | except Exception as e: 1516 | 
print(f"{Fore.RED}Error selecting model path: {e}{Style.RESET_ALL}") 1517 | 1518 | def download_model_gui(self): 1519 | """Download model from GUI""" 1520 | try: 1521 | self.download_model(config.DEFAULT_MODEL) 1522 | except Exception as e: 1523 | print(f"{Fore.RED}Error downloading model: {e}{Style.RESET_ALL}") 1524 | 1525 | def set_class_names(self): 1526 | """Set class names with input sanitization""" 1527 | try: 1528 | class_names_input = simpledialog.askstring( 1529 | "Set Class Names", 1530 | "Enter class names separated by commas (e.g., 'cat, dog'):" 1531 | ) 1532 | 1533 | if class_names_input: 1534 | sanitized_names = self.validator.sanitize_class_names(class_names_input) 1535 | 1536 | if sanitized_names: 1537 | self.class_names = sanitized_names 1538 | print(f"{Fore.GREEN}Set to detect: {', '.join(self.class_names)}{Style.RESET_ALL}") 1539 | else: 1540 | print(f"{Fore.RED}Invalid class names input{Style.RESET_ALL}") 1541 | else: 1542 | self.class_names = [] 1543 | print(f"{Fore.GREEN}Showing all detections{Style.RESET_ALL}") 1544 | 1545 | except Exception as e: 1546 | print(f"{Fore.RED}Error setting class names: {e}{Style.RESET_ALL}") 1547 | 1548 | def set_phrase(self): 1549 | """Set phrase with input sanitization""" 1550 | try: 1551 | phrase_input = simpledialog.askstring( 1552 | "Set Phrase", 1553 | "Enter a yes/no question (e.g., 'Is the person smiling?'):" 1554 | ) 1555 | 1556 | if phrase_input: 1557 | sanitized_phrase = self.validator.sanitize_phrase(phrase_input) 1558 | 1559 | if sanitized_phrase: 1560 | self.phrase = sanitized_phrase 1561 | print(f"{Fore.GREEN}Set to comprehend: {self.phrase}{Style.RESET_ALL}") 1562 | else: 1563 | print(f"{Fore.RED}Invalid phrase input{Style.RESET_ALL}") 1564 | else: 1565 | self.phrase = None 1566 | print(f"{Fore.GREEN}No phrase set for comprehension{Style.RESET_ALL}") 1567 | 1568 | except Exception as e: 1569 | print(f"{Fore.RED}Error setting phrase: {e}{Style.RESET_ALL}") 1570 | 1571 | def set_visual_grounding_phrase(self): 1572 | """Set visual grounding phrase""" 1573 | try: 1574 | phrase_input = simpledialog.askstring( 1575 | "Set Visual Grounding Phrase", 1576 | "Enter the phrase for visual grounding:" 1577 | ) 1578 | 1579 | if phrase_input: 1580 | sanitized_phrase = self.validator.sanitize_phrase(phrase_input) 1581 | 1582 | if sanitized_phrase: 1583 | self.visual_grounding_phrase = sanitized_phrase 1584 | print(f"{Fore.GREEN}Set visual grounding phrase: {self.visual_grounding_phrase}{Style.RESET_ALL}") 1585 | else: 1586 | print(f"{Fore.RED}Invalid phrase input{Style.RESET_ALL}") 1587 | else: 1588 | self.visual_grounding_phrase = None 1589 | print(f"{Fore.GREEN}No phrase set for visual grounding{Style.RESET_ALL}") 1590 | 1591 | except Exception as e: 1592 | print(f"{Fore.RED}Error setting visual grounding phrase: {e}{Style.RESET_ALL}") 1593 | 1594 | def set_inference_tree(self): 1595 | """Set up inference tree""" 1596 | try: 1597 | self.inference_title = simpledialog.askstring( 1598 | "Inference Title", 1599 | "Enter the title for the inference tree:" 1600 | ) 1601 | 1602 | self.inference_phrases = [] 1603 | for i in range(3): 1604 | phrase = simpledialog.askstring( 1605 | "Set Inference Phrase", 1606 | f"Enter inference phrase {i+1} (e.g., 'Is it cloudy?'):" 1607 | ) 1608 | 1609 | if phrase: 1610 | sanitized = self.validator.sanitize_phrase(phrase) 1611 | if sanitized: 1612 | self.inference_phrases.append(sanitized) 1613 | else: 1614 | print(f"{Fore.RED}Invalid phrase {i+1}{Style.RESET_ALL}") 1615 | return 1616 | else: 1617 | 
print(f"{Fore.YELLOW}Cancelled setting inference phrase {i+1}.{Style.RESET_ALL}") 1618 | return 1619 | 1620 | if self.inference_title and self.inference_phrases: 1621 | print(f"{Fore.GREEN}Inference tree set with title: {self.inference_title}{Style.RESET_ALL}") 1622 | for phrase in self.inference_phrases: 1623 | print(f"{Fore.GREEN}Inference phrase: {phrase}{Style.RESET_ALL}") 1624 | else: 1625 | print(f"{Fore.YELLOW}Inference tree setting cancelled.{Style.RESET_ALL}") 1626 | 1627 | except Exception as e: 1628 | print(f"{Fore.RED}Error setting inference tree: {e}{Style.RESET_ALL}") 1629 | 1630 | # ------------------------------------------------------------------------- 1631 | # Feature Toggles 1632 | # ------------------------------------------------------------------------- 1633 | 1634 | def toggle_file_logging(self): 1635 | """Toggle file logging""" 1636 | self.log_to_file_active = not self.log_to_file_active 1637 | setup_logging(self.log_to_file_active) 1638 | status = "enabled" if self.log_to_file_active else "disabled" 1639 | print(f"{Fore.GREEN}File logging is now {status}{Style.RESET_ALL}") 1640 | 1641 | def toggle_headless(self): 1642 | """Toggle headless mode""" 1643 | try: 1644 | self.headless_mode = not self.headless_mode 1645 | status = "enabled" if self.headless_mode else "disabled" 1646 | print(f"{Fore.GREEN}Headless mode is now {status}{Style.RESET_ALL}") 1647 | except Exception as e: 1648 | print(f"{Fore.RED}Error toggling headless mode: {e}{Style.RESET_ALL}") 1649 | 1650 | def toggle_object_detection(self): 1651 | """Toggle object detection""" 1652 | self.object_detection_active = not self.object_detection_active 1653 | if not self.object_detection_active: 1654 | self.detections.clear() 1655 | self.class_names = [] 1656 | status = "enabled" if self.object_detection_active else "disabled" 1657 | print(f"{Fore.GREEN}Object detection is now {status}{Style.RESET_ALL}") 1658 | 1659 | def toggle_expression_comprehension(self): 1660 | """Toggle expression comprehension""" 1661 | self.expression_comprehension_active = not self.expression_comprehension_active 1662 | status = "enabled" if self.expression_comprehension_active else "disabled" 1663 | print(f"{Fore.GREEN}Expression comprehension is now {status}{Style.RESET_ALL}") 1664 | 1665 | def toggle_visual_grounding(self): 1666 | """Toggle visual grounding""" 1667 | self.visual_grounding_active = not self.visual_grounding_active 1668 | status = "enabled" if self.visual_grounding_active else "disabled" 1669 | print(f"{Fore.GREEN}Visual grounding is now {status}{Style.RESET_ALL}") 1670 | 1671 | def toggle_inference_tree(self): 1672 | """Toggle inference tree""" 1673 | self.inference_tree_active = not self.inference_tree_active 1674 | status = "enabled" if self.inference_tree_active else "disabled" 1675 | print(f"{Fore.GREEN}Inference tree evaluation is now {status}{Style.RESET_ALL}") 1676 | 1677 | def toggle_beep(self): 1678 | """Toggle beep on detection""" 1679 | self.beep_active = not self.beep_active 1680 | status = "active" if self.beep_active else "inactive" 1681 | print(f"{Fore.GREEN}Beep is now {status}{Style.RESET_ALL}") 1682 | 1683 | def toggle_screenshot(self): 1684 | """Toggle screenshot on detection""" 1685 | self.screenshot_active = not self.screenshot_active 1686 | status = "active" if self.screenshot_active else "inactive" 1687 | print(f"{Fore.GREEN}Screenshot on detection is now {status}{Style.RESET_ALL}") 1688 | 1689 | def toggle_screenshot_on_yes(self): 1690 | """Toggle screenshot on yes inference""" 1691 | 
self.screenshot_on_yes_active = not self.screenshot_on_yes_active 1692 | status = "active" if self.screenshot_on_yes_active else "inactive" 1693 | print(f"{Fore.GREEN}Screenshot on Yes Inference is now {status}{Style.RESET_ALL}") 1694 | 1695 | def toggle_screenshot_on_no(self): 1696 | """Toggle screenshot on no inference""" 1697 | self.screenshot_on_no_active = not self.screenshot_on_no_active 1698 | status = "active" if self.screenshot_on_no_active else "inactive" 1699 | print(f"{Fore.GREEN}Screenshot on No Inference is now {status}{Style.RESET_ALL}") 1700 | 1701 | def toggle_debug(self): 1702 | """Toggle debug mode""" 1703 | self.debug = not self.debug 1704 | status = "enabled" if self.debug else "disabled" 1705 | print(f"{Fore.GREEN}Debug mode is now {status}{Style.RESET_ALL}") 1706 | 1707 | # ------------------------------------------------------------------------- 1708 | # PTZ Control Methods 1709 | # ------------------------------------------------------------------------- 1710 | 1711 | def init_ptz_camera(self): 1712 | """Initialize PTZ camera""" 1713 | if not PTZ_AVAILABLE: 1714 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1715 | return 1716 | 1717 | if not self.ptz_camera: 1718 | self.ptz_camera = PTZController() 1719 | 1720 | def set_ptz_target_class(self): 1721 | """Set PTZ target class""" 1722 | if not PTZ_AVAILABLE: 1723 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1724 | return 1725 | 1726 | try: 1727 | target_class = simpledialog.askstring( 1728 | "PTZ Target Class", 1729 | "Enter the object class name to track (e.g., 'person'):" 1730 | ) 1731 | 1732 | if target_class: 1733 | sanitized = self.validator.sanitize_class_names(target_class) 1734 | if sanitized: 1735 | self.track_object_name = sanitized[0] 1736 | print(f"{Fore.GREEN}PTZ tracking target: {self.track_object_name}{Style.RESET_ALL}") 1737 | else: 1738 | print(f"{Fore.RED}Invalid target class{Style.RESET_ALL}") 1739 | else: 1740 | print(f"{Fore.YELLOW}PTZ target class input cancelled.{Style.RESET_ALL}") 1741 | 1742 | except Exception as e: 1743 | print(f"{Fore.RED}Error setting PTZ target class: {e}{Style.RESET_ALL}") 1744 | 1745 | def start_autonomous_ptz_tracking(self): 1746 | """Start autonomous PTZ tracking""" 1747 | if not PTZ_AVAILABLE: 1748 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1749 | return 1750 | 1751 | self.init_ptz_camera() 1752 | if not self.ptz_tracker: 1753 | self.ptz_tracker = PTZTracker(self.ptz_camera) 1754 | 1755 | self.ptz_tracker.activate(True) 1756 | 1757 | if self.track_object_name: 1758 | print(f"{Fore.GREEN}Autonomous PTZ tracking activated for: {self.track_object_name}{Style.RESET_ALL}") 1759 | else: 1760 | print(f"{Fore.GREEN}Autonomous PTZ tracking activated (no target set).{Style.RESET_ALL}") 1761 | 1762 | def stop_autonomous_ptz_tracking(self): 1763 | """Stop autonomous PTZ tracking""" 1764 | if not PTZ_AVAILABLE: 1765 | print(f"{Fore.YELLOW}PTZ camera functionality not available.{Style.RESET_ALL}") 1766 | return 1767 | 1768 | if self.ptz_tracker: 1769 | self.ptz_tracker.activate(False) 1770 | print(f"{Fore.GREEN}Autonomous PTZ tracking deactivated.{Style.RESET_ALL}") 1771 | 1772 | def open_manual_ptz_control(self): 1773 | """Open manual PTZ control""" 1774 | if not PTZ_AVAILABLE or not PTZ_MSVCRT_AVAILABLE: 1775 | print(f"{Fore.YELLOW}PTZ manual control not available.{Style.RESET_ALL}") 1776 | return 1777 | 1778 | self.init_ptz_camera() 1779 | if not self.ptz_camera: 1780 
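            # Defensive check: init_ptz_camera can return without creating a
            # controller, so verify one exists before spawning the
            # keyboard-control thread.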
| print(f"{Fore.YELLOW}PTZ camera could not be initialized.{Style.RESET_ALL}") 1781 | return 1782 | 1783 | thread = threading.Thread(target=ptz_control_thread, args=(self.ptz_camera,), daemon=True) 1784 | thread.start() 1785 | 1786 | # ------------------------------------------------------------------------- 1787 | # Recording Control 1788 | # ------------------------------------------------------------------------- 1789 | 1790 | def set_record_mode(self, mode: Optional[str]): 1791 | """Set recording mode""" 1792 | self.record = mode 1793 | self.recording_manager = RecordingManager(self.record) 1794 | mode_str = mode if mode else "None" 1795 | print(f"{Fore.GREEN}Recording mode set to {mode_str}{Style.RESET_ALL}") 1796 | 1797 | # ------------------------------------------------------------------------- 1798 | # Frame Processing 1799 | # ------------------------------------------------------------------------- 1800 | 1801 | def should_process_frame(self) -> bool: 1802 | """Check if enough time has passed for next frame""" 1803 | current_time = time.time() 1804 | if (current_time - self.last_process_time) >= config.MIN_PROCESS_INTERVAL: 1805 | self.last_process_time = current_time 1806 | return True 1807 | return False 1808 | 1809 | def _pick_tracked_object(self, detections: List[Tuple[List[float], str]]) -> Optional[List[float]]: 1810 | """Pick the largest bounding box of the tracked object""" 1811 | if not self.track_object_name: 1812 | return None 1813 | 1814 | candidate_detections = [ 1815 | (bbox, label) 1816 | for bbox, label in detections 1817 | if label.lower() == self.track_object_name.lower() 1818 | ] 1819 | 1820 | if not candidate_detections: 1821 | return None 1822 | 1823 | def bbox_area(bb): 1824 | return (bb[2] - bb[0]) * (bb[3] - bb[1]) 1825 | 1826 | largest_bbox = max(candidate_detections, key=lambda x: bbox_area(x[0]))[0] 1827 | return largest_bbox 1828 | 1829 | def plot_bbox(self, image: np.ndarray) -> np.ndarray: 1830 | """Plot bounding boxes on image""" 1831 | try: 1832 | if not self.detections: 1833 | return image 1834 | return ImageUtils.plot_bbox(image, self.detections) 1835 | except Exception as e: 1836 | logging.error(f"Error plotting bounding boxes: {e}") 1837 | return image 1838 | 1839 | def plot_visual_grounding_bbox(self, image: np.ndarray, bbox: List[float], phrase: str) -> np.ndarray: 1840 | """Plot visual grounding bounding box""" 1841 | try: 1842 | if bbox: 1843 | x1, y1, x2, y2 = map(int, bbox[:4]) 1844 | cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2) 1845 | cv2.putText( 1846 | image, 1847 | phrase, 1848 | (x1, y1 - 10), 1849 | cv2.FONT_HERSHEY_SIMPLEX, 1850 | 0.5, 1851 | (255, 0, 0), 1852 | 2, 1853 | ) 1854 | return image 1855 | except Exception as e: 1856 | logging.error(f"Error plotting visual grounding bbox: {e}") 1857 | return image 1858 | 1859 | def beep_sound(self): 1860 | """Play beep sound""" 1861 | try: 1862 | if os.name == "nt": 1863 | os.system("echo \a") 1864 | else: 1865 | print("\a") 1866 | except Exception as e: 1867 | logging.error(f"Error playing beep sound: {e}") 1868 | 1869 | def update_inference_rate(self): 1870 | """Update inference rate display""" 1871 | if self.inference_start_time is None: 1872 | self.inference_start_time = time.time() 1873 | else: 1874 | elapsed_time = time.time() - self.inference_start_time 1875 | if elapsed_time > 0: 1876 | inferences_per_second = self.inference_count / elapsed_time 1877 | if self.inference_rate_label: 1878 | self.inference_rate_label.config( 1879 | text=f"Inferences/sec: 
{inferences_per_second:.2f}", 1880 | fg="green" 1881 | ) 1882 | 1883 | def update_caption_window(self, caption: str): 1884 | """Update caption window""" 1885 | if self.caption_label: 1886 | if caption.lower() == "yes": 1887 | self.caption_label.config( 1888 | text=caption, 1889 | fg="green", 1890 | bg="black", 1891 | font=("Helvetica", 14, "bold") 1892 | ) 1893 | if self.screenshot_on_yes_active: 1894 | with self.frame_lock: 1895 | if self.latest_image: 1896 | frame_bgr = cv2.cvtColor(np.array(self.latest_image), cv2.COLOR_RGB2BGR) 1897 | ImageUtils.save_screenshot(frame_bgr) 1898 | elif caption.lower() == "no": 1899 | self.caption_label.config( 1900 | text=caption, 1901 | fg="red", 1902 | bg="black", 1903 | font=("Helvetica", 14, "bold") 1904 | ) 1905 | if self.screenshot_on_no_active: 1906 | with self.frame_lock: 1907 | if self.latest_image: 1908 | frame_bgr = cv2.cvtColor(np.array(self.latest_image), cv2.COLOR_RGB2BGR) 1909 | ImageUtils.save_screenshot(frame_bgr) 1910 | else: 1911 | self.caption_label.config( 1912 | text=caption, 1913 | fg="white", 1914 | bg="black", 1915 | font=("Helvetica", 14, "bold") 1916 | ) 1917 | 1918 | def update_inference_result_window(self, result: str, phrase_results: List[bool]): 1919 | """Update inference result window""" 1920 | if self.inference_result_label: 1921 | if result.lower() == "pass": 1922 | self.inference_result_label.config( 1923 | text=result, 1924 | fg="green", 1925 | bg="black", 1926 | font=("Helvetica", 14, "bold") 1927 | ) 1928 | else: 1929 | self.inference_result_label.config( 1930 | text=result, 1931 | fg="red", 1932 | bg="black", 1933 | font=("Helvetica", 14, "bold") 1934 | ) 1935 | 1936 | for idx, phrase_result in enumerate(phrase_results): 1937 | if idx < len(self.inference_phrases_result_labels): 1938 | label = self.inference_phrases_result_labels[idx] 1939 | if phrase_result: 1940 | label.config( 1941 | text=f"Inference {idx+1}: PASS", 1942 | fg="green", 1943 | bg="black", 1944 | font=("Helvetica", 14, "bold") 1945 | ) 1946 | else: 1947 | label.config( 1948 | text=f"Inference {idx+1}: FAIL", 1949 | fg="red", 1950 | bg="black", 1951 | font=("Helvetica", 14, "bold") 1952 | ) 1953 | 1954 | # ------------------------------------------------------------------------- 1955 | # Webcam Detection 1956 | # ------------------------------------------------------------------------- 1957 | 1958 | def start_webcam_detection(self): 1959 | """Start webcam detection""" 1960 | if self.webcam_threads: 1961 | print(f"{Fore.RED}Webcam detection is already running.{Style.RESET_ALL}") 1962 | return 1963 | 1964 | self.stop_webcam_flag.clear() 1965 | 1966 | for index in self.webcam_indices: 1967 | thread = threading.Thread( 1968 | target=self._webcam_detection_thread, 1969 | args=(index,), 1970 | daemon=True 1971 | ) 1972 | thread.start() 1973 | self.webcam_threads.append(thread) 1974 | 1975 | print(f"{Fore.GREEN}Started webcam detection{Style.RESET_ALL}") 1976 | 1977 | def stop_webcam_detection(self): 1978 | """Stop webcam detection""" 1979 | if not self.webcam_threads: 1980 | print(f"{Fore.RED}Webcam detection is not running.{Style.RESET_ALL}") 1981 | return 1982 | 1983 | # Deactivate all features 1984 | self.object_detection_active = False 1985 | self.expression_comprehension_active = False 1986 | self.visual_grounding_active = False 1987 | self.inference_tree_active = False 1988 | 1989 | # Signal threads to stop 1990 | self.stop_webcam_flag.set() 1991 | 1992 | # Wait for threads with timeout 1993 | for thread in self.webcam_threads: 1994 | 
thread.join(timeout=2.0) 1995 | 1996 | self.webcam_threads.clear() 1997 | 1998 | print(f"{Fore.GREEN}Webcam detection stopped successfully.{Style.RESET_ALL}") 1999 | 2000 | def _webcam_detection_thread(self, index: int): 2001 | """Webcam detection thread with enhanced processing""" 2002 | cap = None 2003 | try: 2004 | cap = cv2.VideoCapture(index) 2005 | if not cap.isOpened(): 2006 | print(f"{Fore.RED}Error: Could not open webcam {index}.{Style.RESET_ALL}") 2007 | return 2008 | 2009 | while not self.stop_webcam_flag.is_set(): 2010 | # Frame rate limiting 2011 | if not self.should_process_frame(): 2012 | time.sleep(0.001) 2013 | continue 2014 | 2015 | ret, frame = cap.read() 2016 | if not ret: 2017 | print(f"{Fore.RED}Failed to capture from webcam {index}.{Style.RESET_ALL}") 2018 | break 2019 | 2020 | try: 2021 | # Thread-safe image storage 2022 | with self.frame_lock: 2023 | image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 2024 | image_pil = Image.fromarray(image) 2025 | self.latest_image = image_pil 2026 | 2027 | # Process frame 2028 | self._process_single_frame(image_pil, frame, index) 2029 | 2030 | # Display if not headless 2031 | if not self.headless_mode: 2032 | self._display_frame(frame, index) 2033 | 2034 | if cv2.waitKey(1) & 0xFF == ord('q'): 2035 | break 2036 | 2037 | except Exception as e: 2038 | logging.error(f"Error processing frame from webcam {index}: {e}") 2039 | 2040 | except Exception as e: 2041 | logging.error(f"Error in webcam thread {index}: {e}") 2042 | finally: 2043 | if cap: 2044 | cap.release() 2045 | if not self.headless_mode: 2046 | cv2.destroyWindow(f"Object Detection Webcam {index}") 2047 | 2048 | def _process_single_frame(self, image_pil: Image.Image, frame: np.ndarray, index: int): 2049 | """Process a single frame with all active features""" 2050 | 2051 | # Expression Comprehension 2052 | if self.expression_comprehension_active and self.phrase: 2053 | results = self.run_expression_comprehension(image_pil, self.phrase) 2054 | if results: 2055 | caption = "Yes" if "yes" in results.lower() else "No" 2056 | self.update_caption_window(caption) 2057 | if self.headless_mode: 2058 | print(f"Expression result: {caption}") 2059 | self.inference_count += 1 2060 | self.update_inference_rate() 2061 | 2062 | if self.recording_manager: 2063 | self.recording_manager.handle_recording_by_inference(caption.lower(), frame) 2064 | 2065 | # Object Detection 2066 | if self.object_detection_active: 2067 | self.detections = self.run_object_detection(image_pil) 2068 | if self.headless_mode: 2069 | print(f"Detections from webcam {index}: {self.detections}") 2070 | self.inference_count += 1 2071 | self.update_inference_rate() 2072 | 2073 | # Update target detected flag 2074 | self.target_detected = bool(self.detections) 2075 | 2076 | if self.recording_manager: 2077 | self.recording_manager.handle_recording_by_detection(self.detections, frame) 2078 | 2079 | # PTZ tracking 2080 | if PTZ_AVAILABLE and self.ptz_tracker and self.ptz_tracker.active: 2081 | primary_bbox = self._pick_tracked_object(self.detections) 2082 | if primary_bbox is not None: 2083 | h, w = frame.shape[:2] 2084 | self.ptz_tracker.adjust_camera(primary_bbox, w, h) 2085 | 2086 | # Visual Grounding 2087 | if self.visual_grounding_active and self.visual_grounding_phrase: 2088 | bbox = self.run_visual_grounding(image_pil, self.visual_grounding_phrase) 2089 | if bbox: 2090 | if not self.headless_mode: 2091 | self.plot_visual_grounding_bbox(frame, bbox, self.visual_grounding_phrase) 2092 | else: 2093 | print(f"Visual grounding 
result: {bbox}") 2094 | self.inference_count += 1 2095 | self.update_inference_rate() 2096 | 2097 | # Inference Tree 2098 | if self.inference_tree_active and self.inference_title and self.inference_phrases: 2099 | result, phrase_results = self.evaluate_inference_tree(image_pil) 2100 | self.update_inference_result_window(result, phrase_results) 2101 | if self.headless_mode: 2102 | print(f"Inference tree result: {result}, Details: {phrase_results}") 2103 | self.inference_count += 1 2104 | self.update_inference_rate() 2105 | 2106 | # Recording 2107 | if self.recording_manager and self.recording_manager.recording: 2108 | self.recording_manager.write_frame(frame) 2109 | 2110 | def _display_frame(self, frame: np.ndarray, index: int): 2111 | """Display frame with overlays""" 2112 | try: 2113 | bbox_image = self.plot_bbox(frame.copy()) 2114 | cv2.imshow(f"Object Detection Webcam {index}", bbox_image) 2115 | 2116 | current_time = time.time() 2117 | 2118 | # Beep on detection 2119 | if self.beep_active and self.target_detected: 2120 | if current_time - self.last_beep_time > 1: 2121 | threading.Thread(target=self.beep_sound, daemon=True).start() 2122 | self.last_beep_time = current_time 2123 | 2124 | # Screenshot on detection 2125 | if self.screenshot_active and self.target_detected: 2126 | ImageUtils.save_screenshot(bbox_image) 2127 | 2128 | except Exception as e: 2129 | logging.error(f"Error displaying frame: {e}") 2130 | 2131 | # ------------------------------------------------------------------------- 2132 | # Cleanup 2133 | # ------------------------------------------------------------------------- 2134 | 2135 | def cleanup(self): 2136 | """Comprehensive cleanup method""" 2137 | try: 2138 | logging.info("Starting YO-FLO cleanup...") 2139 | 2140 | # Stop all threads 2141 | self.stop_webcam_detection() 2142 | self.cleanup_flag.set() 2143 | 2144 | # Clean up frame processor 2145 | if self.frame_processor: 2146 | self.frame_processor.shutdown(wait=False) 2147 | 2148 | # Clean up recording manager 2149 | if self.recording_manager: 2150 | self.recording_manager.cleanup() 2151 | 2152 | # Close PTZ camera 2153 | if self.ptz_camera: 2154 | self.ptz_camera.close() 2155 | 2156 | # Clear model from memory 2157 | if self.model: 2158 | del self.model 2159 | self.model = None 2160 | 2161 | if self.processor: 2162 | del self.processor 2163 | self.processor = None 2164 | 2165 | # Clear CUDA cache 2166 | if torch.cuda.is_available(): 2167 | torch.cuda.empty_cache() 2168 | 2169 | # Force garbage collection 2170 | gc.collect() 2171 | 2172 | # Destroy all OpenCV windows 2173 | cv2.destroyAllWindows() 2174 | 2175 | logging.info("YO-FLO cleanup completed") 2176 | 2177 | except Exception as e: 2178 | logging.error(f"Error during cleanup: {e}") 2179 | 2180 | def __del__(self): 2181 | """Destructor to ensure cleanup""" 2182 | self.cleanup() 2183 | 2184 | # ------------------------------------------------------------------------- 2185 | # Main Menu 2186 | # ------------------------------------------------------------------------- 2187 | 2188 | def main_menu(self): 2189 | """Create and display the main GUI menu""" 2190 | self.root.deiconify() 2191 | self.root.title(config.WINDOW_TITLE) 2192 | 2193 | def on_closing(): 2194 | """Handle window closing""" 2195 | self.cleanup() 2196 | self.root.destroy() 2197 | 2198 | self.root.protocol("WM_DELETE_WINDOW", on_closing) 2199 | 2200 | try: 2201 | # Model Management Frame 2202 | model_frame = tk.LabelFrame(self.root, text="Model Management") 2203 | model_frame.pack(fill="x", 
padx=10, pady=5) 2204 | 2205 | tk.Button( 2206 | model_frame, 2207 | text="Select Model Path", 2208 | command=self.select_model_path 2209 | ).pack(fill="x") 2210 | 2211 | tk.Button( 2212 | model_frame, 2213 | text="Download Model from HuggingFace", 2214 | command=self.download_model_gui 2215 | ).pack(fill="x") 2216 | 2217 | tk.Button( 2218 | model_frame, 2219 | text="Toggle File Logging", 2220 | command=self.toggle_file_logging 2221 | ).pack(fill="x") 2222 | 2223 | # Detection Settings Frame 2224 | detection_frame = tk.LabelFrame(self.root, text="Detection Settings") 2225 | detection_frame.pack(fill="x", padx=10, pady=5) 2226 | 2227 | tk.Button( 2228 | detection_frame, 2229 | text="Set Classes for Object Detection", 2230 | command=self.set_class_names 2231 | ).pack(fill="x") 2232 | 2233 | tk.Button( 2234 | detection_frame, 2235 | text="Set Phrase for Yes/No Inference", 2236 | command=self.set_phrase 2237 | ).pack(fill="x") 2238 | 2239 | tk.Button( 2240 | detection_frame, 2241 | text="Set Grounding Phrase", 2242 | command=self.set_visual_grounding_phrase 2243 | ).pack(fill="x") 2244 | 2245 | tk.Button( 2246 | detection_frame, 2247 | text="Set Inference Tree", 2248 | command=self.set_inference_tree 2249 | ).pack(fill="x") 2250 | 2251 | # Feature Toggles Frame 2252 | feature_frame = tk.LabelFrame(self.root, text="Feature Toggles") 2253 | feature_frame.pack(fill="x", padx=10, pady=5) 2254 | 2255 | tk.Button( 2256 | feature_frame, 2257 | text="Object Detection", 2258 | command=self.toggle_object_detection 2259 | ).pack(fill="x") 2260 | 2261 | tk.Button( 2262 | feature_frame, 2263 | text="Yes/No Inference", 2264 | command=self.toggle_expression_comprehension 2265 | ).pack(fill="x") 2266 | 2267 | tk.Button( 2268 | feature_frame, 2269 | text="Visual Grounding", 2270 | command=self.toggle_visual_grounding 2271 | ).pack(fill="x") 2272 | 2273 | tk.Button( 2274 | feature_frame, 2275 | text="Inference Tree", 2276 | command=self.toggle_inference_tree 2277 | ).pack(fill="x") 2278 | 2279 | tk.Button( 2280 | feature_frame, 2281 | text="Headless Mode", 2282 | command=self.toggle_headless 2283 | ).pack(fill="x") 2284 | 2285 | # Triggers Frame 2286 | trigger_frame = tk.LabelFrame(self.root, text="Triggers") 2287 | trigger_frame.pack(fill="x", padx=10, pady=5) 2288 | 2289 | tk.Button( 2290 | trigger_frame, 2291 | text="Beep on Detection", 2292 | command=self.toggle_beep 2293 | ).pack(fill="x") 2294 | 2295 | tk.Button( 2296 | trigger_frame, 2297 | text="Screenshot on Detection", 2298 | command=self.toggle_screenshot 2299 | ).pack(fill="x") 2300 | 2301 | tk.Button( 2302 | trigger_frame, 2303 | text="Screenshot on Yes", 2304 | command=self.toggle_screenshot_on_yes 2305 | ).pack(fill="x") 2306 | 2307 | tk.Button( 2308 | trigger_frame, 2309 | text="Screenshot on No", 2310 | command=self.toggle_screenshot_on_no 2311 | ).pack(fill="x") 2312 | 2313 | # PTZ Control Frame 2314 | if PTZ_AVAILABLE: 2315 | ptz_frame = tk.LabelFrame(self.root, text="PTZ Control") 2316 | ptz_frame.pack(fill="x", padx=10, pady=5) 2317 | 2318 | tk.Button( 2319 | ptz_frame, 2320 | text="Open Manual PTZ Control", 2321 | command=self.open_manual_ptz_control 2322 | ).pack(fill="x") 2323 | 2324 | tk.Button( 2325 | ptz_frame, 2326 | text="Set PTZ Target Class", 2327 | command=self.set_ptz_target_class 2328 | ).pack(fill="x") 2329 | 2330 | tk.Button( 2331 | ptz_frame, 2332 | text="Start Autonomous Tracking", 2333 | command=self.start_autonomous_ptz_tracking 2334 | ).pack(fill="x") 2335 | 2336 | tk.Button( 2337 | ptz_frame, 2338 | text="Stop 
Autonomous Tracking", 2339 | command=self.stop_autonomous_ptz_tracking 2340 | ).pack(fill="x") 2341 | else: 2342 | ptz_frame = tk.LabelFrame(self.root, text="PTZ Control (Unavailable)") 2343 | ptz_frame.pack(fill="x", padx=10, pady=5) 2344 | 2345 | tk.Label( 2346 | ptz_frame, 2347 | text="PTZ functionality not available - missing required modules", 2348 | fg="red" 2349 | ).pack(fill="x") 2350 | 2351 | # Recording Frame 2352 | recording_frame = tk.LabelFrame(self.root, text="Recording Control") 2353 | recording_frame.pack(fill="x", padx=10, pady=5) 2354 | 2355 | tk.Button( 2356 | recording_frame, 2357 | text="No Recording", 2358 | command=lambda: self.set_record_mode(None) 2359 | ).pack(fill="x") 2360 | 2361 | tk.Button( 2362 | recording_frame, 2363 | text="Record on Detection", 2364 | command=lambda: self.set_record_mode("od") 2365 | ).pack(fill="x") 2366 | 2367 | tk.Button( 2368 | recording_frame, 2369 | text='Record on "Yes"', 2370 | command=lambda: self.set_record_mode("infy") 2371 | ).pack(fill="x") 2372 | 2373 | tk.Button( 2374 | recording_frame, 2375 | text='Record on "No"', 2376 | command=lambda: self.set_record_mode("infn") 2377 | ).pack(fill="x") 2378 | 2379 | # Webcam Control Frame 2380 | webcam_frame = tk.LabelFrame(self.root, text="Webcam Control") 2381 | webcam_frame.pack(fill="x", padx=10, pady=5) 2382 | 2383 | tk.Button( 2384 | webcam_frame, 2385 | text="Start Webcam Detection", 2386 | command=self.start_webcam_detection 2387 | ).pack(fill="x") 2388 | 2389 | tk.Button( 2390 | webcam_frame, 2391 | text="Stop Webcam Detection", 2392 | command=self.stop_webcam_detection 2393 | ).pack(fill="x") 2394 | 2395 | # Debug Frame 2396 | debug_frame = tk.LabelFrame(self.root, text="Debug") 2397 | debug_frame.pack(fill="x", padx=10, pady=5) 2398 | 2399 | tk.Button( 2400 | debug_frame, 2401 | text="Toggle Debug Mode", 2402 | command=self.toggle_debug 2403 | ).pack(fill="x") 2404 | 2405 | # Inference Rate Frame 2406 | inference_rate_frame = tk.LabelFrame(self.root, text="Inference Rate") 2407 | inference_rate_frame.pack(fill="x", padx=10, pady=5) 2408 | 2409 | self.inference_rate_label = tk.Label( 2410 | inference_rate_frame, 2411 | text="Inferences/sec: N/A", 2412 | fg="white", 2413 | bg="black", 2414 | font=("Helvetica", 14, "bold") 2415 | ) 2416 | self.inference_rate_label.pack(fill="x") 2417 | 2418 | # Binary Inference Frame 2419 | binary_inference_frame = tk.LabelFrame(self.root, text="Binary Inference") 2420 | binary_inference_frame.pack(fill="x", padx=10, pady=5) 2421 | 2422 | self.caption_label = tk.Label( 2423 | binary_inference_frame, 2424 | text="Binary Inference: N/A", 2425 | fg="white", 2426 | bg="black", 2427 | font=("Helvetica", 14, "bold") 2428 | ) 2429 | self.caption_label.pack(fill="x") 2430 | 2431 | # Inference Tree Frame 2432 | inference_tree_frame = tk.LabelFrame(self.root, text="Inference Tree") 2433 | inference_tree_frame.pack(fill="x", padx=10, pady=5) 2434 | 2435 | self.inference_result_label = tk.Label( 2436 | inference_tree_frame, 2437 | text="Inference Tree: N/A", 2438 | fg="white", 2439 | bg="black", 2440 | font=("Helvetica", 14, "bold") 2441 | ) 2442 | self.inference_result_label.pack(fill="x") 2443 | 2444 | for i in range(3): 2445 | label = tk.Label( 2446 | inference_tree_frame, 2447 | text=f"Inference {i+1}: N/A", 2448 | fg="white", 2449 | bg="black", 2450 | font=("Helvetica", 14, "bold") 2451 | ) 2452 | label.pack(fill="x") 2453 | self.inference_phrases_result_labels.append(label) 2454 | 2455 | # Statistics update 2456 | def update_stats(): 2457 | if 
self.frame_processor: 2458 | stats = self.frame_processor.get_statistics() 2459 | # You could add a stats label here to display this info 2460 | self.root.after(1000, update_stats) 2461 | 2462 | update_stats() 2463 | 2464 | except Exception as e: 2465 | print(f"{Fore.RED}Error creating menu: {e}{Style.RESET_ALL}") 2466 | 2467 | self.root.mainloop() 2468 | 2469 | # ============================================================================ 2470 | # MAIN ENTRY POINT 2471 | # ============================================================================ 2472 | 2473 | def main(): 2474 | """Main entry point with proper error handling and cleanup""" 2475 | app = None 2476 | 2477 | try: 2478 | # Setup logging 2479 | setup_logging(log_to_file=False, log_level=logging.INFO) 2480 | 2481 | # Create application instance 2482 | app = YO_FLO() 2483 | app.init_model_manager(quantization_mode=None) 2484 | 2485 | print(f"{Fore.BLUE}{Style.BRIGHT}YO-FLO Vision System v2.0{Style.RESET_ALL}") 2486 | print(f"{Fore.CYAN}Enhanced with security, threading, and memory management{Style.RESET_ALL}") 2487 | print(f"{Fore.CYAN}Created with comprehensive improvements for production use{Style.RESET_ALL}") 2488 | 2489 | # Run the GUI 2490 | app.main_menu() 2491 | 2492 | except KeyboardInterrupt: 2493 | print(f"\n{Fore.YELLOW}Interrupted by user{Style.RESET_ALL}") 2494 | 2495 | except Exception as e: 2496 | logging.error(f"Fatal error: {e}", exc_info=True) 2497 | print(f"{Fore.RED}Fatal error: {e}{Style.RESET_ALL}") 2498 | 2499 | finally: 2500 | # Ensure cleanup 2501 | if app: 2502 | app.cleanup() 2503 | logging.info("Application shutdown complete") 2504 | 2505 | if __name__ == "__main__": 2506 | main() 2507 | --------------------------------------------------------------------------------