├── readme.md
└── main.py


/readme.md:
--------------------------------------------------------------------------------
  1 | # 🌐🤖 WebSurferAI: Your Autonomous Web Navigator
  2 | 
  3 | [![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://example.com) <!-- Replace with your actual build status badge if you have one -->
  4 | [![Python Version](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
  5 | [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE) <!-- Add a LICENSE file to your repo, e.g., MIT -->
  6 | 
  7 | 
  8 | **WebSurferAI** is an autonomous web assistant that uses Google's Gemini vision models and Playwright to interact with websites much as a human would. At each step it takes a screenshot, asks Gemini for the next action, and can then navigate pages, fill forms, extract information, and pause so you can solve CAPTCHAs manually. This project is in active development, and we're looking for passionate contributors to make it even more amazing! 🚀✨
  9 | 
 10 | ## 🌟 Features
 11 | 
 12 | *   **Autonomous Task Execution:**  Give it a task, and it will try its best to complete it.
 13 | *   **Intelligent Navigation:**  Sends a screenshot of the current page to a Gemini vision model, which decides the next action.
 14 | *   **Dynamic Interaction:** Clicks buttons, fills forms, scrolls pages, and more.
 15 | *   **Information Extraction:**  Pulls relevant text and data from websites.
 16 | *   **Memory System:** Remembers past interactions and learns from them (stores in `memory.json`).
 17 | *   **CAPTCHA Handling:**  Detects CAPTCHAs and prompts you for manual assistance.
 18 | *   **Website Exploration Mode:**  Can explore websites to discover functionalities, especially useful for initial task discovery.
 19 | *   **Error Recovery:** Attempts to recover from errors and continue the task.
 20 | *   **Debug Mode:**  Provides detailed logging and saves screenshots for each step.
 21 | *   **Internal Monologue:**  Logs Gemini's reasoning and decision-making process (in debug mode).
 22 | *   **Dialog Handling:**  Handles dialogs that appear during a task.
 23 | 
 24 | ## 🚀 Getting Started
 25 | 
 26 | ### Prerequisites
 27 | 
 28 | 1.  **Python:**  Make sure you have Python 3.7 or higher installed.  You can check by running `python --version` or `python3 --version` in your terminal.
 29 | 2.  **Playwright:**  The project uses Playwright for browser automation.  It will be installed in the next step, but you'll need the browser binaries.
 30 | 3.  **Google Gemini API Key:** You'll need an API key for Google Gemini. You can get one from [Google AI Studio](https://makersuite.google.com/app/apikey).
 31 | 4.  **Node.js (optional):** The `playwright` Python package ships with its own driver, so you do not need a separate Node.js or npm installation to run this project.
 32 | 
 33 | ### Installation
 34 | 
 35 | 1.  **Clone the repository:**
 36 | 
 37 |     ```bash
 38 |     git clone https://github.com/yourusername/WebSurferAI.git  # Replace with your repo URL
 39 |     cd WebSurferAI
 40 |     ```
 41 | 
 42 | 2.  **Create a virtual environment (recommended):**
 43 | 
 44 |     ```bash
 45 |     python3 -m venv venv
 46 |     source venv/bin/activate  # On Windows: venv\Scripts\activate
 47 |     ```
 48 | 
 49 | 3.  **Install dependencies:**
 50 | 
 51 |     ```bash
 52 |     pip install -r requirements.txt
 53 |     ```
 54 |     If the repository does not yet contain a `requirements.txt`, create one with the following contents before running the command above:
 55 |     ```
 56 |     playwright
 57 |     Pillow
 58 |     python-dotenv
 59 |     google-generativeai
 60 |     ```
 61 | 
 62 | 4.  **Install Playwright browsers:**
 63 | 
 64 |     ```bash
 65 |     playwright install
 66 |     ```
 67 |     This command downloads the necessary browser binaries (Chromium, Firefox, WebKit).  This might take a few minutes.
 68 | 
 69 | 5.  **Set up your API Key:**
 70 | 
 71 |     *   Create a `.env` file in the root directory of the project.
 72 |     *   Add your Gemini API key to the `.env` file:
 73 | 
 74 |         ```
 75 |         GEMINI_API_KEY=your_api_key_here
 76 |         ```
 77 |     *   **Important:**  *Never* commit your `.env` file to version control.  Add `.env` to your `.gitignore` file.
 78 | 
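As a minimal sketch of how a script could check the key from step 5 before doing any work: the helper name `require_api_key` and the `PLACEHOLDER` constant are hypothetical and not part of `main.py`, which currently falls back to a placeholder key and logs a warning instead. The sketch uses only the standard library and assumes the `.env` values have already been loaded into the environment.

```python
import os

# Hypothetical constant: the sample value shown in the .env snippet above.
PLACEHOLDER = "your_api_key_here"


def require_api_key(env=None):
    """Return GEMINI_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = (env.get("GEMINI_API_KEY") or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early like this is one possible design choice; it surfaces a missing key at startup rather than at the first API call.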
 79 | ### Running the Assistant
 80 | 
 81 | 1.  **From the command line:**
 82 | 
 83 |     ```bash
 84 |     python main.py "Your task here"
 85 |     ```
 86 |     Replace `"Your task here"` with the task you want the assistant to perform.  For example:
 87 | 
 88 |     ```bash
 89 |     python main.py "Find the price of a Tesla Model 3 on the Tesla website"
 90 |     ```
 91 | 
 92 |     *   **Headless mode:** To run without showing the browser window, use the `--headless` flag:
 93 | 
 94 |         ```bash
 95 |         python main.py "Your task here" --headless
 96 |         ```
 97 | 
 98 |     *   **Debug mode:** For more detailed output and screenshots, use the `--debug` flag:
 99 | 
100 |         ```bash
101 |         python main.py "Your task here" --debug
102 |         ```
103 |     *   **Custom memory file (optional):** Pass `--memory_file` to choose where memory is stored; it defaults to `memory.json`:
104 | 
105 |         ```bash
106 |         python main.py "Your task here" --memory_file my_custom_memory.json
107 |         ```
108 | 
109 | 2.  **Interactive Mode:** If you run the script without a task argument, it will start in interactive mode:
110 | 
111 |     ```bash
112 |     python main.py
113 |     ```
114 | 
115 |     You can then enter tasks one by one.  You can also use the following commands:
116 |     *   `exit` or `quit`:  End the program.
117 |     *   `clear memory`:  Clears the assistant's memory.  You can also specify a category: `clear memory website`.
118 |     *   `show memory`: Displays the current contents of the memory.
119 | 
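The flags and interactive commands described above could be parsed roughly as follows. This is a sketch under assumptions, not the code in `main.py`: the function names `build_parser` and `parse_command` and the returned tuples are hypothetical, and only the flag names, the default `memory.json`, and the command strings come from this README.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags; the task is optional."""
    parser = argparse.ArgumentParser(description="WebSurferAI command-line sketch")
    parser.add_argument("task", nargs="?", default=None,
                        help="task to run; omit to start interactive mode")
    parser.add_argument("--headless", action="store_true")
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--memory_file", default="memory.json")
    return parser


def parse_command(line):
    """Map one interactive input line to a (command, argument) tuple."""
    text = line.strip()
    words = text.lower().split()
    if words in (["exit"], ["quit"]):
        return ("exit", None)
    if words == ["show", "memory"]:
        return ("show_memory", None)
    if words[:2] == ["clear", "memory"] and len(words) <= 3:
        # An optional third word is treated as the memory category.
        category = text.split()[2] if len(words) == 3 else None
        return ("clear_memory", category)
    # Anything else is treated as a task for the assistant.
    return ("task", text)
```

For example, `build_parser().parse_args([])` leaves `task` as `None`, which a caller could use as the signal to enter interactive mode.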
120 | ## 🤝 Contributing
121 | 
122 | We ❤️ contributions!  WebSurferAI is a community project, and we welcome anyone who wants to help make it better. Whether you're a seasoned developer or just starting out, there are many ways to contribute:
123 | 
124 | *   **Bug Reports:** If you find a bug, please open an issue on GitHub.  Be as detailed as possible, including steps to reproduce the bug.
125 | *   **Feature Requests:**  Have an idea for a new feature?  Open an issue and describe it!
126 | *   **Code Contributions:**
127 |     *   Fork the repository.
128 |     *   Create a new branch for your feature or bug fix: `git checkout -b my-new-feature`
129 |     *   Make your changes.
130 |     *   Write tests for your code (if applicable).
131 |     *   Ensure your code follows the existing style (use a linter like `flake8` or `pylint`).
132 |     *   Commit your changes: `git commit -m "Add some amazing feature"`
133 |     *   Push to your branch: `git push origin my-new-feature`
134 |     *   Open a pull request on GitHub.
135 | *   **Documentation:**  Improve the README, add docstrings to the code, or create tutorials.
136 | *   **Testing:** Help us test the assistant on different websites and with different tasks.
137 | *   **Ideas and Feedback:** Share your thoughts and suggestions on how to improve the project.
138 | 
139 | We especially need help with:
140 | 
141 | *   **Improving the prompt engineering:** Refining the prompts sent to Gemini can significantly enhance the assistant's performance.
142 | *   **Expanding error handling:**  Making the assistant more robust to unexpected website behavior.
143 | *   **Adding support for more websites:**  Testing and adapting the assistant to work with a wider range of websites.
144 | *   **Developing a user interface:**  A graphical user interface would make the assistant more accessible.
145 | *   **Creating more sophisticated memory management:**  Improving how the assistant stores and retrieves information.
146 | *   **Parallel task execution:**  Allowing multiple simultaneous actions, if possible.
147 | 
148 | ## 🗺️ Project Structure
149 | 
150 | *   `main.py`:  The main script containing the `AutonomousWebAssistant` class and the command-line interface.
151 | *   `screenshots/`:  Directory where screenshots are saved (created automatically).
152 | *   `memory.json`:  The default file where the assistant's memory is stored (created automatically).
153 | *   `.env`:  File for storing your API key (you need to create this).
154 | *   `requirements.txt`:  List of Python dependencies.
155 | 
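Because `main.py` exposes the `AutonomousWebAssistant` class, it can presumably also be driven from another script. The sketch below shows one way to do that while always closing the browser. The helper `run_task` and the `FakeAssistant` stand-in are hypothetical; only the method names `execute_task` and `close_browser` and the `headless`/`debug` keyword arguments come from `main.py`.

```python
def run_task(assistant_cls, task, **options):
    """Run one task and always release the browser, even if the task fails."""
    assistant = assistant_cls(**options)
    try:
        return assistant.execute_task(task)
    finally:
        assistant.close_browser()


class FakeAssistant:
    """Stand-in with the same methods, so the sketch runs without a browser."""
    closed = False

    def __init__(self, headless=False, debug=False):
        self.headless = headless
        self.debug = debug

    def execute_task(self, user_task):
        return {"task": user_task, "steps": 0}

    def close_browser(self):
        FakeAssistant.closed = True


result = run_task(FakeAssistant, "Find the price of a Tesla Model 3", headless=True)
```

With the real class, the call would look like `run_task(AutonomousWebAssistant, "...", headless=True)` after `from main import AutonomousWebAssistant`. Note that importing `main` configures the Gemini client at import time.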
156 | ## 📝 License
157 | 
158 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. (You need to create a LICENSE file).
159 | 
160 | ## 🙏 Acknowledgements
161 | 
162 | *   Google Gemini Team
163 | *   Playwright Team
164 | *   All the contributors!
165 | 
166 | Let's build the future of web automation together! 🌐✨🤖
167 | 


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
   1 | import os
   2 | import time
   3 | import argparse
   4 | import json
   5 | import base64
   6 | from urllib.parse import urlparse, urljoin
   7 | from datetime import datetime
   8 | import uuid
   9 | from dotenv import load_dotenv
  10 | import logging
  11 | from playwright.sync_api import sync_playwright
  12 | from PIL import Image
  13 | from io import BytesIO
  14 | import google.generativeai as genai
  15 | 
  16 | # --- Setup Logging ---
  17 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
  18 | 
  19 | # Load environment variables
  20 | load_dotenv()
  21 | 
  22 | # Configure Google Gemini API
  23 | API_KEY = os.getenv("GEMINI_API_KEY")
  24 | if not API_KEY:
  25 |     API_KEY = "YOUR_API_KEY_HERE"  # Default API KEY
  26 | if API_KEY == "YOUR_API_KEY_HERE":
  27 |     logging.warning("Using default API key. Set your GEMINI_API_KEY in .env file for proper use.")
  28 | 
  29 | genai.configure(api_key=API_KEY)
  30 | 
  31 | # Initialize the Gemini model - using the visual model
  32 | MODEL_NAME = "gemini-2.0-flash-thinking-exp"  # "gemini-2.0-flash" or "gemini-1.0-pro-vision-001",  gemini-pro
  33 | model = genai.GenerativeModel(MODEL_NAME)
  34 | 
  35 | class AutonomousWebAssistant:
  36 |     def __init__(self, headless=False, debug=False, screenshot_dir="screenshots", memory_file="memory.json"):
  37 |         self.playwright = sync_playwright().start()
  38 |         self.browser = self.playwright.chromium.launch(headless=headless)
  39 |         self.page = None  # Playwright Page object will be initialized in initialize_browser
  40 |         self.headless = headless
  41 |         self.debug = debug
  42 |         self.screenshot_dir = screenshot_dir
  43 |         self.screenshot_count = 0
  44 |         self.current_task = None
  45 |         self.task_history = []
  46 |         self.action_history = []
  47 |         self.memory_file = memory_file
  48 |         self.memory = self.load_memory()
  49 |         self.captcha_solving_active = False
  50 |         self.element_search_timeout = 10
  51 |         self.explored_urls = set()
  52 |         self.internal_monologue = []
  53 | 
  54 |         # Create the screenshot directory if needed (no error if it already exists)
  55 |         os.makedirs(self.screenshot_dir, exist_ok=True)
  56 | 
  57 |         self.initialize_browser()
  58 | 
  59 |     def initialize_browser(self):
  60 |         """Initialize the Playwright browser and page."""
  61 |         if self.page:
  62 |             self.page.close() # Close existing page if any before creating new one
  63 |         self.page = self.browser.new_page()
  64 |         self.page.set_viewport_size({"width": 1920, "height": 1080}) # Consistent viewport size
  65 |         logging.info("Playwright browser initialized.")
  66 | 
  67 |     def load_memory(self):
  68 |         """Loads memory from the memory file."""
  69 |         try:
  70 |             with open(self.memory_file, 'r') as f:
  71 |                 return json.load(f)
  72 |         except (FileNotFoundError, json.JSONDecodeError):
  73 |             logging.info("Memory file not found or invalid. Starting with an empty memory.")
  74 |             return {}
  75 | 
  76 |     def save_memory(self):
  77 |         """Saves the current memory to the memory file."""
  78 |         try:
  79 |             with open(self.memory_file, 'w') as f:
  80 |                 json.dump(self.memory, f, indent=4)
  81 |         except Exception as e:
  82 |             logging.error(f"Error saving memory to file: {e}")
  83 | 
  84 |     def add_memory(self, key, value, category="general"):
  85 |         """Adds a new memory entry using UUIDs for unique keys."""
  86 |         memory_id = str(uuid.uuid4())
  87 |         self.memory[memory_id] = {
  88 |             "key": key,
  89 |             "value": value,
  90 |             "category": category,
  91 |             "timestamp": datetime.now().isoformat()
  92 |         }
  93 |         self.save_memory()
  94 |         return memory_id
  95 | 
  96 |     def retrieve_memory(self, key, category=None):
  97 |         """Retrieves memory entries based on key and optionally category."""
  98 |         results = []
  99 |         for mem_id, mem_data in self.memory.items():
 100 |             if mem_data['key'] == key and (category is None or mem_data['category'] == category):
 101 |                 results.append(mem_data)
 102 |         return results
 103 | 
 104 |     def clear_memory(self, category=None):
 105 |         """Clears memory entries, optionally filtering by category."""
 106 |         if category:
 107 |             keys_to_delete = [mem_id for mem_id, mem_data in self.memory.items() if mem_data['category'] == category]
 108 |             for mem_id in keys_to_delete:
 109 |                 del self.memory[mem_id]
 110 |         else:
 111 |             self.memory = {}
 112 |         self.save_memory()
 113 | 
 114 |     def close_browser(self):
 115 |         """Close the Playwright browser and context."""
 116 |         if self.browser:
 117 |             self.browser.close()
 118 |             self.playwright.stop()
 119 |             logging.info("Playwright browser closed.")
 120 | 
 121 |     def take_screenshot(self, filename=None):
 122 |         """Take a screenshot, save it to disk, and return (png_bytes, filename)."""
 123 |         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
 124 |         self.screenshot_count += 1
 125 |         if filename is None:
 126 |             filename = os.path.join(self.screenshot_dir, f"screenshot_{timestamp}_{self.screenshot_count}.png")
 127 | 
 128 |         screenshot = self.page.screenshot() # Take screenshot using Playwright
 129 | 
 130 |         if filename:
 131 |             with open(filename, "wb") as f:
 132 |                 f.write(screenshot)
 133 |             logging.info(f"📸 Screenshot saved: {filename}")
 134 |         return screenshot, filename
 135 | 
 136 |     def get_screenshot_as_base64(self):
 137 |         """Get the current screenshot as base64 string for API requests."""
 138 |         screenshot = self.page.screenshot()
 139 |         return base64.b64encode(screenshot).decode('utf-8')
 140 | 
 141 |     def execute_task(self, user_task):
 142 |         """Main method to process and execute a user task autonomously."""
 143 |         logging.info(f"🤖 Understanding task: {user_task}")
 144 |         self.current_task = user_task
 145 |         self.task_history.append(user_task)
 146 |         self.internal_monologue = []
 147 | 
 148 |         if not self.action_history or self.action_history[-1]['action'] == "TASK_COMPLETE":
 149 |             self.navigate_to_url("https://www.google.com")
 150 | 
 151 |         max_steps = 30
 152 |         current_step = 0
 153 |         exploration_depth = 2
 154 |         retry_attempts = 0
 155 |         max_retry_attempts = 3
 156 | 
 157 |         while current_step < max_steps:
 158 |             current_step += 1
 159 |             logging.info(f"\n🔄 Step {current_step}/{max_steps}: Taking screenshot and determining next action...")
 160 | 
 161 |             screenshot, filename = self.take_screenshot()
 162 | 
 163 |             next_action = self.get_next_action_from_gemini(screenshot, user_task, current_step)
 164 | 
 165 |             self.internal_monologue.append({
 166 |                 "step": current_step,
 167 |                 "gemini_reasoning": next_action.get("reasoning", "No reasoning provided"),
 168 |                 "action": next_action["action"],
 169 |                 "details": next_action.get("details", {}),
 170 |                 "self_assessment": "Evaluating action..."
 171 |             })
 172 | 
 173 |             self.action_history.append({
 174 |                 "step": current_step,
 175 |                 "action": next_action["action"],
 176 |                 "details": next_action.get("details", {}),
 177 |                 "screenshot": filename
 178 |             })
 179 | 
 180 |             if next_action["action"] == "TASK_COMPLETE":
 181 |                 logging.info(f"✅ Task completed: {next_action.get('message', 'Gemini determined the task is complete')}")
 182 |                 break
 183 |             elif next_action["action"] == "MANUAL_CAPTCHA":
 184 |                 logging.warning("🚨 Captcha detected! Pausing automation. Please solve the captcha manually in the browser.")
 185 |                 self.captcha_solving_active = True
 186 |                 input("Press Enter after you have solved the captcha...")
 187 |                 self.captcha_solving_active = False
 188 |                 logging.info("Resuming automation...")
 189 |                 continue
 190 |             elif next_action["action"] == "EXPLORE_WEBSITE":
 191 |                 logging.info("🌐 Initiating website exploration...")
 192 |                 self.explore_website(url=self.page.url, max_depth=exploration_depth)
 193 |                 logging.info("Exploration complete. Resuming task execution.")
 194 |                 continue
 195 |             elif next_action["action"] == "RETRY":
 196 |                 logging.info("🔄 Gemini suggested to retry the last action...")
 197 |                 retry_attempts += 1
 198 |                 if retry_attempts > max_retry_attempts:
 199 |                     logging.error(f"❌ Max retry attempts reached ({max_retry_attempts}).  Aborting.")
 200 |                     break
 201 |                 continue
 202 |             else:
 203 |                 retry_attempts = 0
 204 | 
 205 |             status = self.execute_action(next_action)
 206 |             self.internal_monologue[-1]["action_result"] = status
 207 | 
 208 |             if status.get("status") == "ERROR":
 209 |                 logging.error(f"❌ Error executing action: {status.get('message')}")
 210 |                 recovery_screenshot, _ = self.take_screenshot()
 211 |                 recovery_action = self.get_recovery_action(recovery_screenshot, status.get("message"), user_task)
 212 | 
 213 |                 if recovery_action["action"] == "ABORT":
 214 |                     logging.error("❌ Cannot recover from error, aborting task")
 215 |                     break
 216 | 
 217 |                 self.internal_monologue[-1]["recovery_action"] = recovery_action
 218 |                 recovery_status = self.execute_action(recovery_action)
 219 |                 self.internal_monologue[-1]["recovery_result"] = recovery_status
 220 | 
 221 |                 if recovery_status.get("status") == "ERROR":
 222 |                     logging.error(f"❌ Recovery action failed: {recovery_status.get('message')}. Aborting.")
 223 |                     break
 224 | 
 225 |             time.sleep(1)
 226 | 
 227 |         summary = self.generate_task_summary(user_task)
 228 |         logging.info("\n📊 Task Summary:")
 229 |         logging.info(summary)
 230 | 
 231 |         if self.debug:
 232 |             logging.info("\n🧠 Internal Monologue:")
 233 |             for thought in self.internal_monologue:
 234 |                 logging.info(thought)
 235 | 
 236 |         return {
 237 |             "task": user_task,
 238 |             "steps": current_step,
 239 |             "actions": self.action_history,
 240 |             "summary": summary,
 241 |             "internal_monologue": self.internal_monologue
 242 |         }
 243 | 
 244 |     def get_next_action_from_gemini(self, screenshot, task, step_number):
 245 |         """Use Gemini to analyze screenshot and determine next action."""
 246 |         if isinstance(screenshot, bytes):
 247 |             image_bytes = screenshot
 248 |         else:
 249 |             with open(screenshot, "rb") as f:
 250 |                 image_bytes = f.read()
 251 | 
 252 |         image_parts = [
 253 |             {
 254 |                 "inline_data": {
 255 |                     "data": base64.b64encode(image_bytes).decode("utf-8"),
 256 |                     "mime_type": "image/png"
 257 |                 }
 258 |             }
 259 |         ]
 260 | 
 261 |         relevant_memories = []
 262 |         relevant_memories.extend(list(self.memory.values())[-5:])
 263 |         relevant_memories.extend(self.retrieve_memory(key=urlparse(self.page.url).netloc, category="website"))
 264 | 
 265 |         memory_context = ""
 266 |         if relevant_memories:
 267 |             memory_context = "\n**Relevant Memories:**\n"
 268 |             for mem in relevant_memories:
 269 |                 memory_context += f"- {mem['key']}: {mem['value']}\n"
 270 | 
 271 |         prompt = f"""
 272 |                 You are an expert web automation assistant using Playwright.
 273 | 
 274 |                 **Current Task:** {task}
 275 |                 **Step Number:** {step_number}
 276 |                 **Current URL:** {self.page.url}
 277 |                 **Page Title:** {self.page.title()}
 278 |                 **Previous Actions:** (Summarized) {self.summarize_action_history()}
 279 |                 {memory_context}
 280 | 
 281 |                 **Your Goal:** Autonomously complete the user's task by interacting with the webpage using Playwright.
 282 | 
 283 |                 **Consider these capabilities and instructions when deciding the next action:**
 284 | 
 285 |                 1.  **Task Understanding and Goal Decomposition:** Understand the overall task. Break it down into smaller steps.
 286 | 
 287 |                 2.  **Website Exploration for Task Discovery (NEW FEATURE):** If needed for vague tasks, suggest action: "EXPLORE_WEBSITE".
 288 | 
 289 |                 3.  **CAPTCHA Handling:** If CAPTCHA is visible, suggest "MANUAL_CAPTCHA".
 290 | 
 291 |                 4.  **Action Selection Strategy:** Choose the MOST RELEVANT SINGLE NEXT ACTION.
 292 | 
 293 |                 5.  **Element Identification (Playwright Locators):** For "CLICK" and "TYPE" actions, use robust Playwright locators:
 294 |                     *   Prioritize **text-based locators** (e.g., `"text=Submit"`, or `"text='Log In'"` with quotes for an exact match).
 295 |                     *   Use **CSS selectors** when text locators are insufficient (e.g., `"#id"`, `".class"`, `"div > button"`).
 296 |                     *   Consider **role-based locators** for accessibility (e.g., `"[role='button']"`, `"role=link[name='Learn more']"`).
 297 |                     *   For complex scenarios, use **chained locators** (e.g., `".parent >> .child"`).
 298 |                     *   If multiple elements match, append `>> nth=index` (zero-based, e.g., `"button >> nth=0"`) to target a specific one.
 299 |                     *   Suggest the **most specific and reliable locator** in 'details'.
 300 |                     *   If text is reliable, use text locators. Otherwise, use CSS or other suitable locators.
 301 | 
 302 |                 6.  **Recovery and Retry:** Suggest "action: RETRY" for transient errors.
 303 | 
 304 |                 7.  **TASK_COMPLETE Recognition:** Suggest "action: TASK_COMPLETE" when the task is fulfilled.
 305 | 
 306 |                 8. **Memory Utilization:** Use provided memories to inform decisions.
 307 | 
 308 |                 **Output Format:** Return JSON object:
 309 |                 {{
 310 |                   "action":  (CLICK, TYPE, NAVIGATE, SCROLL, WAIT, EXTRACT, TASK_COMPLETE, MANUAL_CAPTCHA, EXPLORE_WEBSITE, ABORT, RETRY),
 311 |                   "details": {{ ...action-specific details... }},
 312 |                   "reasoning": "Explain action choice.",
 313 |                   "message": "User-friendly action description."
 314 |                 }}
 315 |                 **Examples:**
 316 |                 {{"action": "CLICK", "details": {{"locator": "text=Sign In"}}, "reasoning": "User needs to log in", "message": "Clicking 'Sign In' button."}}
 317 |                 {{"action": "TYPE", "details": {{"locator": "#search-query", "text": "product search"}}, "reasoning": "Searching for products", "message": "Typing 'product search' in search box."}}
 318 |                 {{"action": "NAVIGATE", "details": {{"url": "https://example.com/pricing"}}, "reasoning": "Navigating to pricing page", "message": "Navigating to pricing page."}}
 319 |                 {{"action": "TASK_COMPLETE", "reasoning": "Task completed", "message": "Task completed."}}
 320 |                 {{"action": "MANUAL_CAPTCHA", "reasoning": "Captcha detected", "message": "Solve CAPTCHA manually."}}
 321 |                 {{"action": "EXPLORE_WEBSITE", "details": {{}}, "reasoning": "Exploring website for testing", "message": "Initiating website exploration."}}
 322 |                 {{"action": "RETRY", "reasoning": "Retrying last action", "message": "Retrying last action."}}
 323 | 
 324 |                 **IMPORTANT:** Respond with JSON ONLY.
 325 |                 """
 326 | 
 327 |         try:
 328 |             response = model.generate_content([prompt] + image_parts)
 329 |             response_text = response.text.strip()
 330 | 
 331 |             try:
 332 |                 if response_text.startswith("```json"):
 333 |                     json_text = response_text.split("```json")[1].split("```")[0].strip()
 334 |                 elif response_text.startswith("```"):
 335 |                     json_text = response_text.split("```")[1].strip()
 336 |                 else:
 337 |                     json_text = response_text
 338 | 
 339 |                 action_data = json.loads(json_text)
 340 | 
 341 |                 logging.info(f"💭 Gemini's reasoning: {action_data.get('reasoning', 'No reasoning provided')}")
 342 |                 logging.info(f"🚀 Next action: {action_data.get('message', action_data.get('action', 'Unknown action'))}")
 343 |                 return action_data
 344 | 
 345 |             except (json.JSONDecodeError, IndexError) as e:
 346 |                 logging.error(f"❌ Error parsing Gemini response as JSON: {e}")
 347 |                 logging.error(f"Response text: {response_text}")
 348 |                 return {
 349 |                     "action": "WAIT",
 350 |                     "details": {"seconds": 5},
 351 |                     "reasoning": "JSON parsing failed. Waiting and will re-prompt.",
 352 |                     "message": "Waiting for 5 seconds due to API response error. Re-prompting."
 353 |                 }
 354 | 
 355 |         except Exception as e:
 356 |             logging.error(f"❌ Error getting next action from Gemini (API error): {e}")
 357 |             return {
 358 |                 "action": "WAIT",
 359 |                 "details": {"seconds": 10},
 360 |                 "reasoning": "Gemini API call failed. Waiting and will re-prompt.",
 361 |                 "message": "Waiting for 10 seconds due to API error. Re-prompting."
 362 |             }
 363 | 
 364 |     def summarize_action_history(self, num_actions=5):
 365 |         """Summarize recent action history for Gemini context."""
 366 |         if not self.action_history:
 367 |             return "No actions taken yet."
 368 |         recent_actions = self.action_history[-num_actions:]
 369 |         summary = []
 370 |         for action_data in recent_actions:
 371 |             action_type = action_data['action']
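 372 |             message = action_data.get('message') or f"{action_type} {json.dumps(action_data.get('details', {}))}"  # history entries store no 'message', so fall back to the action details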
 373 |             summary.append(f"Step {action_data['step']}: {message}")
 374 |         return "; ".join(summary)
 375 | 
 376 |     def get_recovery_action(self, screenshot, error_message, task):
 377 |         """Get a recovery action from Gemini when an action fails."""
 378 |         prompt = f"""
 379 |                 There was an error during web automation.
 380 | 
 381 |                 **Task:** {task}
 382 |                 **Error Message:** {error_message}
 383 |                 **Current URL:** {self.page.url}
 384 |                 **Error Screenshot:** Analyze screenshot to understand error context.
 385 |                 **Recent Actions:** (Summarized) {self.summarize_action_history()}
 386 | 
 387 |                 Determine a recovery action to continue the task.
 388 | 
 389 |                 **Recovery Action Considerations:**
 390 |                 1.  Analyze screenshot and error message to understand *why* the action failed.
 391 |                 2.  Is the error transient or a logical mistake?
 392 |                 3.  Suggest a recovery action to resolve the issue.
 393 |                 4.  Available actions: CLICK, TYPE, NAVIGATE, SCROLL, WAIT, ABORT, RETRY.
 394 | 
 395 |                 **When to Use RETRY:** If error is temporary or due to loading, retry the *same* action after delay.
 396 | 
 397 |                 **Explain Reasoning:** In "reasoning", explain *why* the recovery action is suggested.
 398 | 
 399 |                 **Output Format:** JSON object:
 400 |                 {{
 401 |                   "action":  (CLICK, TYPE, NAVIGATE, SCROLL, WAIT, ABORT, RETRY)
 402 |                   "details": {{ ...action-specific details... }}
 403 |                   "reasoning": "Explain recovery action."
 404 |                 }}
 405 | 
 406 |                 **If recovery is impossible, use "action": "ABORT".**
 407 | 
 408 |                 **IMPORTANT:** Respond with JSON ONLY.
 409 |                 """
 410 |         if isinstance(screenshot, bytes):
 411 |             image_bytes = screenshot
 412 |         else:
 413 |             with open(screenshot, "rb") as f:
 414 |                 image_bytes = f.read()
 415 | 
 416 |         image_parts = [
 417 |             {
 418 |                 "inline_data": {
 419 |                     "data": base64.b64encode(image_bytes).decode("utf-8"),
 420 |                     "mime_type": "image/png"
 421 |                 }
 422 |             }
 423 |         ]
 424 | 
 425 |         try:
 426 |             response = model.generate_content([prompt] + image_parts)
 427 |             response_text = response.text.strip()
 428 | 
 429 |             try:
 430 |                 if response_text.startswith("```json"):
 431 |                     json_text = response_text.split("```json")[1].split("```")[0].strip()
 432 |                 elif response_text.startswith("```"):
 433 |                     json_text = response_text.split("```")[1].strip()
 434 |                 else:
 435 |                     json_text = response_text
 436 | 
 437 |                 recovery_action = json.loads(json_text)
 438 |                 logging.info(f"🛠️ Recovery action suggested by Gemini: {recovery_action.get('reasoning', 'No reasoning provided')}")
 439 |                 return recovery_action
 440 | 
 441 |             except (json.JSONDecodeError, IndexError) as e:
 442 |                 logging.error(f"❌ Error parsing recovery action JSON: {e}")
 443 |                 return {"action": "ABORT", "reasoning": "Could not parse recovery action from Gemini."}
 444 | 
 445 |         except Exception as e:
 446 |             logging.error(f"❌ Error getting recovery action from Gemini (API error): {e}")
 447 |             return {"action": "ABORT", "reasoning": f"API error during recovery action request: {str(e)}"}
 448 | 
 449 |     def execute_action(self, action_data):
 450 |         """Execute an action based on action type and details using Playwright."""
 451 |         action_type = action_data.get("action", "").upper()
 452 |         details = action_data.get("details", {})
 453 | 
 454 |         logging.info(f"⚙️ Executing: {action_data.get('message', action_type)}")
 455 | 
 456 |         try:
 457 |             if action_type == "CLICK":
 458 |                 locator_str = details.get("locator", "")
 459 |                 text = details.get("text", "") # Text might be used as fallback locator if locator_str is not provided or fails
 460 | 
 461 |                 return self.click_element(locator_str=locator_str, text=text)
 462 | 
 463 |             elif action_type == "TYPE":
 464 |                 locator_str = details.get("locator", "")
 465 |                 text = details.get("text", "")
 466 |                 return self.type_text(locator_str=locator_str, text=text)
 467 | 
 468 |             elif action_type == "NAVIGATE":
 469 |                 url = details.get("url", "")
 470 |                 return self.navigate_to_url(url)
 471 | 
 472 |             elif action_type == "SCROLL":
 473 |                 direction = details.get("direction", "down")
 474 |                 amount = details.get("amount", 300)
 475 |                 return self.scroll_page(direction, amount)
 476 | 
 477 |             elif action_type == "WAIT":
 478 |                 seconds = details.get("seconds", 3)
 479 |                 return self.wait_for(seconds)
 480 | 
 481 |             elif action_type == "EXTRACT":
 482 |                 extract_type = details.get("type", "text")
 483 |                 locator_str = details.get("locator") # Locator for extraction
 484 |                 return self.extract_content(extract_type, locator_str=locator_str)
 485 | 
 486 |             elif action_type == "EXPLORE_WEBSITE":
 487 |                 return {"status": "SUCCESS", "message": "Website exploration action acknowledged."}
 488 | 
 489 |             elif action_type in ["TASK_COMPLETE", "ABORT", "MANUAL_CAPTCHA", "RETRY"]:
 490 |                 return {"status": "SUCCESS", "message": "Action acknowledged"}
 491 | 
 492 |             else:
 493 |                 return {"status": "ERROR", "message": f"Unknown action type: {action_type}"}
 494 | 
 495 |         except Exception as e:
 496 |             return {"status": "ERROR", "message": str(e)}
 497 | 
 498 |     def explore_website(self, url, max_depth, current_depth=0):
 499 |         """Recursively explore a website using Playwright."""
 500 |         if current_depth >= max_depth or url in self.explored_urls:
 501 |             return
 502 | 
 503 |         try:
 504 |             logging.info(f"\n🌐 Exploring URL: {url}, Depth: {current_depth}")
 505 |             nav_status = self.navigate_to_url(url)
 506 |             if nav_status.get("status") == "ERROR":
 507 |                 logging.error(f"❌ Error navigating to {url} during exploration: {nav_status.get('message')}")
 508 |                 return
 509 | 
 510 |             self.explored_urls.add(url)
 511 | 
 512 |             self.take_screenshot()
 513 |             extract_result = self.extract_content(extract_type="text")
 514 | 
 515 |             if extract_result.get("status") == "SUCCESS":
 516 |                 logging.info(f"📄 Extracted content from: {url} (excerpt): {extract_result['data']['text'][:150]}...")
 517 |                 self.add_memory(key=urlparse(url).netloc, value=extract_result['data']['text'][:500], category="website")
 518 | 
 519 |             else:
 520 |                 logging.warning(f"⚠️  Failed to extract content from: {url}")
 521 | 
 522 |             # Find all 'a' tags using Playwright locator for links
 523 |             links_locator = self.page.locator('a')
 524 |             links_count = links_locator.count() # Number of links to iterate over; each nth(i) lookup below is a separate round trip to the browser
 525 | 
 526 |             urls_to_explore = set()
 527 |             base_url_parsed = urlparse(url)
 528 | 
 529 |             for i in range(links_count): # Iterate through links using index
 530 |                 try:
 531 |                     link_element = links_locator.nth(i) # Get link element by index
 532 |                     href = link_element.get_attribute('href') # Get href attribute using Playwright
 533 |                     absolute_url = urljoin(url, href)
 534 | 
 535 |                     if absolute_url and absolute_url.startswith(('http://', 'https://')):
 536 |                         url_parsed = urlparse(absolute_url)
 537 |                         if url_parsed.netloc == base_url_parsed.netloc and absolute_url not in self.explored_urls:
 538 |                             urls_to_explore.add(absolute_url)
 539 | 
 540 |                 except Exception as e: # Catch any issues during link processing
 541 |                     logging.warning(f"Issue processing link during exploration: {e}")
 542 |                     continue
 543 | 
 544 |             for next_url in urls_to_explore:
 545 |                 self.explore_website(next_url, max_depth, current_depth + 1)
 546 | 
 547 |         except Exception as e:
 548 |             logging.error(f"🔥 Error during website exploration of {url}: {e}")
 549 | 
 550 |     def find_element_by_locator(self, locator_str, text=None, index=0):
 551 |         """Find an element using Playwright locator or fallback to text if locator fails."""
 552 |         start_time = time.time()
 553 | 
 554 |         while time.time() - start_time < self.element_search_timeout:
 555 |             try:
 556 |                 if locator_str:
 557 |                     locator = self.page.locator(locator_str) # Use Playwright locator directly
 558 |                     count = locator.count() # Check if elements are found
 559 | 
 560 |                     if count > 0:
 561 |                         if 0 <= index < count:
 562 |                             element_locator = locator.nth(index) # Get specific element if index is within range
 563 |                         else:
 564 |                             element_locator = locator.first # Default to the first element if index is out of range
 565 | 
 566 |                         if self.debug:
 567 |                             element_locator.evaluate("element => { element.style.border = '3px solid red'; }") # Highlight element
 568 |                             time.sleep(0.5)
 569 |                         return element_locator # Return Playwright Locator object
 570 | 
 571 |                 if text: # Fallback to text based search if locator_str is not provided or initial locator didn't find element
 572 |                     # Playwright's text selector engine accepts three forms:
 573 |                     #   text="..."  exact, case-sensitive match
 574 |                     #   text=...    case-insensitive substring match
 575 |                     #   text=/.../  JavaScript-style regular expression
 576 |                     # Prefixes such as "regexp:", "iregex:", "*", or "localized:" are not part of
 577 |                     # that syntax; they are treated as literal text to search for and never match.
 578 |                     # The requested index is applied below via locator.nth(index).
 579 |                     text_locator_strategies = [
 580 |                         f'text="{text}"', # Exact, case-sensitive match
 581 |                         f"text='{text}'", # Same exact match; usable when text contains double quotes
 582 |                         f"text={text}", # Case-insensitive substring match
 583 |                     ]
 584 |                     for strategy in text_locator_strategies:
 585 |                         locator = self.page.locator(strategy)
 586 |                         count = locator.count()
 587 |                         if count > 0:
 588 |                             if 0 <= index < count:
 589 |                                 element_locator = locator.nth(index)
 590 |                             else:
 591 |                                 element_locator = locator.first
 592 | 
 593 |                             if self.debug:
 594 |                                 element_locator.evaluate("element => { element.style.border = '3px solid blue'; }")
 595 |                                 time.sleep(0.5)
 596 |                             return element_locator # Return Playwright Locator object
 597 | 
 598 |             except Exception as e:
 599 |                 logging.warning(f"Error finding element with locator '{locator_str}' or text '{text}': {e}. Retrying...")
 600 | 
 601 |             time.sleep(0.25) # Pause between polls instead of busy-waiting until the timeout expires
 602 |         return None # Element not found
 603 | 
 604 |     def click_element(self, locator_str=None, text=None, index=0):
 605 |         """Click on an element using Playwright locator or text. Demonstrates various click options."""
 606 |         try:
 607 |             logging.info(f"🖱️ Clicking: {text if text else locator_str}")
 608 |             element_locator = self.find_element_by_locator(locator_str, text, index)
 609 | 
 610 |             if element_locator:
 611 | 
 612 |                 # --- Playwright Click Actions and Options ---
 613 |                 # 1. Basic Click:
 614 |                 # element_locator.click()
 615 | 
 616 |                 # 2. Force Click (Bypasses visibility checks - use cautiously):
 617 |                 # element_locator.click(force=True)
 618 | 
 619 |                 # 3. Positioned Click (Click at specific coordinates within the element):
 620 |                 # bounding_box = element_locator.bounding_box()
 621 |                 # if bounding_box:
 622 |                 #     x = bounding_box['x'] + bounding_box['width'] / 2 # Center X
 623 |                 #     y = bounding_box['y'] + bounding_box['height'] / 2 # Center Y
 624 |                 #     self.page.mouse.click(x, y)
 625 |                 # else:
 626 |                 #     element_locator.click() # Fallback if bounding box fails
 627 | 
 628 |                 # 4. Click with Delay (Simulate user-like click):
 629 |                 # element_locator.click(delay=100) # 100ms delay
 630 | 
 631 |                 # 5. No Wait After (For faster navigation in some cases - use with care):
 632 |                 # element_locator.click(no_wait_after=True)
 633 | 
 634 |                 # 6. Timeout for Click (Control how long to wait for element to be actionable):
 635 |                 # element_locator.click(timeout=5000) # 5 seconds timeout
 636 | 
 637 |                 # 7. Multiple Clicks (Double click, Triple click etc.):
 638 |                 # element_locator.click(click_count=2) # Double click
 639 | 
 640 |                 # Using a standard click for now for general use case:
 641 |                 element_locator.click()
 642 | 
 643 |                 # --- Waiting after Click ---
 644 |                 # 1. Wait for Load State (Most common for page navigation):
 645 |                 self.page.wait_for_load_state("load") # "load", "domcontentloaded", "networkidle"
 646 | 
 647 |                 # 2. Expect Navigation (wrap the click itself so the navigation it triggers is not missed):
 648 |                 # with self.page.expect_navigation(): element_locator.click()
 649 | 
 650 |                 # 3. Wait for Selector (Wait for an element to appear after click):
 651 |                 # self.page.wait_for_selector(".next-page-content")
 652 | 
 653 |                 # 4. Explicit Timeout (If specific wait is needed):
 654 |                 # time.sleep(2) # Wait for 2 seconds
 655 | 
 656 |                 if self.debug:
 657 |                     self.take_screenshot()
 658 | 
 659 |                 return {
 660 |                     "status": "SUCCESS",
 661 |                     "message": f"Clicked on element with locator: '{locator_str}' or text: '{text}'",
 662 |                     "title": self.page.title(),
 663 |                     "current_url": self.page.url
 664 |                 }
 665 |             else:
 666 |                 return {"status": "ERROR", "message": f"Element not found for click: locator='{locator_str}', text='{text}'"}
 667 |         except Exception as e:
 668 |             return {"status": "ERROR", "message": str(e)}
 669 | 
 670 |     def type_text(self, locator_str=None, text=None):
 671 |         """Type text into an input element using Playwright. Demonstrates various typing methods."""
 672 |         if not text:
 673 |             return {"status": "ERROR", "message": "No text provided to type"}
 674 | 
 675 |         try:
 676 |             logging.info(f"⌨️ Typing: {text}")
 677 | 
 678 |             element_locator = None
 679 |             if locator_str:
 680 |                 element_locator = self.find_element_by_locator(locator_str)
 681 | 
 682 |             if not element_locator:
 683 |                 # Fallback to find any input, textarea, or editable element if locator fails
 684 |                 input_locators = [
 685 |                     "input", "textarea", "[contenteditable='true']", "[role='textbox']"
 686 |                 ]
 687 |                 for sel in input_locators:
 688 |                     temp_locator = self.page.locator(sel)
 689 |                     if temp_locator.count() > 0:
 690 |                         element_locator = temp_locator.first # Take the first one if multiple are found
 691 |                         break
 692 | 
 693 |             if element_locator:
 694 |                 # --- Playwright Typing Actions and Options ---
 695 |                 # 1. Fill (Recommended for input fields - clears existing content and types):
 696 |                 # element_locator.fill(text)
 697 | 
 698 |                 # 2. Type (Simulates keyboard typing - appends to existing content, can use delay):
 699 |                 # element_locator.type(text) # Basic type
 700 |                 # element_locator.type(text, delay=50) # Type with 50ms delay per character
 701 | 
 702 |                 # 3. Press Sequences (Send special keys, combinations):
 703 |                 # element_locator.press("Enter")
 704 |                 # element_locator.press("Shift+Tab")
 705 |                 # element_locator.press_sequentially(text, delay=50) # Key-by-key typing with delay; the Python API uses snake_case names
 706 | 
 707 |                 # 4. Clear and Type (Manual clear before typing):
 708 |                 # element_locator.clear() # Playwright's clear is robust
 709 |                 # element_locator.type(text)
 710 | 
 711 |                 # Using fill for robustness in most input scenarios:
 712 |                 element_locator.fill(text)
 713 | 
 714 |                 return {"status": "SUCCESS", "message": f"Typed '{text}' into input field using locator: '{locator_str}'"}
 715 |             else:
 716 |                 # Fallback to typing into focused element if no specific input is found
 717 |                 self.page.keyboard.type(text) # Type into currently focused element
 718 |                 return {"status": "SUCCESS", "message": f"Typed '{text}' into active element (fallback)"}
 719 | 
 720 |         except Exception as e:
 721 |             return {"status": "ERROR", "message": str(e)}
 722 | 
 723 |     def navigate_to_url(self, url):
 724 |         """Navigate to a specific URL using Playwright."""
 725 |         if not url:
 726 |             return {"status": "ERROR", "message": "No URL provided"}
 727 | 
 728 |         try:
 729 |             if not url.startswith(('http://', 'https://')):
 730 |                 url = 'https://' + url
 731 | 
 732 |             logging.info(f"🌐 Navigating to: {url}")
 733 |             self.page.goto(url, wait_until="load", timeout=30000) # Playwright's goto with wait_until and timeout
 734 | 
 735 |             self.handle_dialogs() # Handle dialogs after navigation
 736 | 
 737 |             if self.debug:
 738 |                 self.take_screenshot()
 739 | 
 740 |             return {
 741 |                 "status": "SUCCESS",
 742 |                 "message": f"Navigated to {url}",
 743 |                 "title": self.page.title(),
 744 |                 "current_url": self.page.url
 745 |             }
 746 |         except Exception as e:
 747 |             return {"status": "ERROR", "message": str(e)}
 748 | 
 749 |     def scroll_page(self, direction="down", amount=300):
 750 |         """Scroll the page using Playwright. Demonstrates different scroll options."""
 751 |         try:
 752 |             logging.info(f"📜 Scrolling {direction}")
 753 | 
 754 |             # --- Playwright Scrolling Options ---
 755 |             # 1. JavaScript Scroll (Similar to Selenium, but using Playwright's evaluate):
 756 |             if direction.lower() == "down":
 757 |                 self.page.evaluate(f"window.scrollBy(0, {amount})")
 758 |             elif direction.lower() == "up":
 759 |                 self.page.evaluate(f"window.scrollBy(0, -{amount})")
 760 |             elif direction.lower() == "top":
 761 |                 self.page.evaluate("window.scrollTo(0, 0)")
 762 |             elif direction.lower() == "bottom":
 763 |                 self.page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
 764 |             elif direction.lower() == "right":
 765 |                 self.page.evaluate(f"window.scrollBy({amount}, 0)")
 766 |             elif direction.lower() == "left":
 767 |                 self.page.evaluate(f"window.scrollBy(-{amount}, 0)")
 768 | 
 769 |             # 2. Playwright's built-in scrolling (More control over element scrolling - for specific elements, not whole page directly)
 770 |             # For whole page scrolling, JavaScript approach is still common and effective.
 771 | 
 772 |             time.sleep(1)
 773 | 
 774 |             if self.debug:
 775 |                 self.take_screenshot()
 776 | 
 777 |             return {"status": "SUCCESS", "message": f"Scrolled {direction}"}
 778 |         except Exception as e:
 779 |             return {"status": "ERROR", "message": str(e)}
 780 | 
 781 |     def wait_for(self, seconds=3):
 782 |         """Wait for the specified number of seconds using Playwright."""
 783 |         try:
 784 |             logging.info(f"⏱️ Waiting for {seconds} seconds")
 785 |             self.page.wait_for_timeout(seconds * 1000) # Playwright's wait_for_timeout takes milliseconds
 786 |             return {"status": "SUCCESS", "message": f"Waited for {seconds} seconds"}
 787 |         except Exception as e:
 788 |             return {"status": "ERROR", "message": str(e)}
 789 | 
 790 |     def extract_content(self, extract_type="text", locator_str=None):
 791 |         """Extract content from the page using Playwright. Demonstrates various extraction methods."""
 792 |         try:
 793 |             logging.info(f"📄 Extracting {extract_type} content")
 794 | 
 795 |             if extract_type == "text":
 796 |                 # Extract main text content, optionally using a locator
 797 | 
 798 |                 if locator_str:
 799 |                     element_locator = self.find_element_by_locator(locator_str=locator_str)
 800 |                     if element_locator:
 801 |                         # --- Playwright Text Extraction Methods ---
 802 |                         # 1. textContent() - Get text content of the element and its children
 803 |                         text_content = element_locator.text_content()
 804 | 
 805 |                         # 2. innerText() - Get rendered text content (similar to browser's innerText property)
 806 |                         # text_content = element_locator.inner_text()
 807 | 
 808 |                         # 3. innerHTML() - Get the inner HTML content of the element
 809 |                         # html_content = element_locator.inner_html()
 810 |                         # text_content = html_content # Or process HTML as needed
 811 | 
 812 |                         # 4. getAttribute() - Get specific attribute value
 813 |                         # attribute_value = element_locator.get_attribute("href")
 814 |                         # text_content = attribute_value # Or process attribute value
 815 | 
 816 |                     else:
 817 |                         return {"status": "ERROR", "message": f"Could not find element with locator: {locator_str}"}
 818 |                 else:
 819 |                     # Extract from whole body if no locator specified
 820 |                     text_content = self.page.locator("body").text_content() # Extract text from body
 821 |                 text_content = text_content or "" # text_content() may return None; avoid len(None) below
 822 |                 main_text = text_content[:2000] + "..." if len(text_content) > 2000 else text_content
 823 | 
 824 |                 return {
 825 |                     "status": "SUCCESS",
 826 |                     "message": "Extracted text content",
 827 |                     "data": {
 828 |                         "title": self.page.title(),
 829 |                         "url": self.page.url,
 830 |                         "text": main_text
 831 |                     }
 832 |                 }
 833 | 
 834 |             elif extract_type == "links":
 835 |                 # Extract links
 836 |                 links = []
 837 |                 link_elements_locator = self.page.locator("a") # Locator for all 'a' tags
 838 |                 link_count = link_elements_locator.count() # Get count of links for iteration
 839 | 
 840 |                 for i in range(min(link_count, 20)): # Limit to first 20 links
 841 |                     try:
 842 |                         link_element = link_elements_locator.nth(i)
 843 |                         href = link_element.get_attribute("href") # Get 'href' attribute
 844 |                         text = link_element.text_content().strip() # Get link text
 845 | 
 846 |                         if href and text and len(text) > 1:
 847 |                             links.append({"url": href, "text": text})
 848 |                     except Exception: # Skip links that cannot be read; do not swallow KeyboardInterrupt
 849 |                         continue
 850 | 
 851 |                 return {
 852 |                     "status": "SUCCESS",
 853 |                     "message": f"Extracted {len(links)} links",
 854 |                     "data": {
 855 |                         "title": self.page.title(),
 856 |                         "url": self.page.url,
 857 |                         "links": links
 858 |                     }
 859 |                 }
 860 | 
 861 |             elif extract_type == "search_results":
 862 |                 # Extract search results (Google Search example)
 863 |                 results = []
 864 |                 search_result_selectors = [
 865 |                     "div.g", "div[data-sokoban-container]", "div.v7W49e" # Common Google search result containers
 866 |                 ]
 867 | 
 868 |                 for selector in search_result_selectors:
 869 |                     result_elements_locator = self.page.locator(selector)
 870 |                     result_count = result_elements_locator.count()
 871 | 
 872 |                     if result_count > 0:
 873 |                         for i in range(min(result_count, 10)): # Limit to first 10 results
 874 |                             try:
 875 |                                 result_element = result_elements_locator.nth(i)
 876 | 
 877 |                                 # --- Chained Locators for deeper element selection ---
 878 |                                 title_locator = result_element.locator("h3") # Find h3 within result
 879 |                                 title = title_locator.text_content()
 880 | 
 881 |                                 link_locator = title_locator.locator("xpath=./ancestor::a") # Find parent 'a' tag using XPath relative to title
 882 |                                 link = link_locator.get_attribute("href")
 883 | 
 884 |                                 desc_locator = result_element.locator("div.VwiC3b, div.s") # Find description
 885 |                                 description = desc_locator.first.text_content() if desc_locator.count() > 0 else "" # Optional description; .first avoids a strict-mode error on multiple matches
 886 | 
 887 |                                 results.append({
 888 |                                     "title": title,
 889 |                                     "url": link,
 890 |                                     "description": description
 891 |                                 })
 892 |                             except Exception: # Skip results missing a title or link
 893 |                                 continue
 894 |                         if results:
 895 |                             break # Stop if results are found for a selector
 896 | 
 897 |                 return {
 898 |                     "status": "SUCCESS",
 899 |                     "message": f"Extracted {len(results)} search results",
 900 |                     "data": {
 901 |                         "query": self.page.title().replace(" - Google Search", ""),
 902 |                         "url": self.page.url,
 903 |                         "results": results
 904 |                     }
 905 |                 }
 906 |             elif extract_type == "element_text" and locator_str: # Extract text from a specific element using locator
 907 |                 element_locator = self.find_element_by_locator(locator_str=locator_str)
 908 |                 if element_locator:
 909 |                     return {
 910 |                         "status": "SUCCESS",
 911 |                         "message": f"Extracted text from element with locator '{locator_str}'",
 912 |                         "data": {
 913 |                             "text": element_locator.text_content(),
 914 |                             "url": self.page.url
 915 |                         }
 916 |                     }
 917 |                 else:
 918 |                     return {"status": "ERROR", "message": f"Element with locator '{locator_str}' not found for extraction."}
 919 | 
 920 |             else:
 921 |                 return {"status": "ERROR", "message": f"Unknown extract type: {extract_type}"}
 922 | 
 923 |         except Exception as e:
 924 |             return {"status": "ERROR", "message": str(e)}
 925 | 
 926 |     def handle_dialogs(self):
 927 |         """Handle common dialogs like cookie notices and popups using Playwright."""
 928 |         dismiss_selectors = [
 929 |             "#L2AGLb",  # Google cookie notice
 930 |             "button[aria-label='Accept all']",
 931 |             "button[aria-label='Accept']",
 932 |             "text=Accept", # Text based locator example
 933 |             "text=Accept all",
 934 |             "text=I agree",
 935 |             "text=Agree",
 936 |             "text=Allow",
 937 |             "text=Close",
 938 |             "text=No thanks",
 939 |             "text=Got it",
 940 |             ".modal button",
 941 |             ".popup button",
 942 |             "[aria-label='Close']",
 943 |             ".cookie-banner button",
 944 |             "#consent-banner button",
 945 |             ".consent button"
 946 |         ]
 947 | 
 948 |         for selector in dismiss_selectors:
 949 |             try:
 950 |                 dialog_locator = self.page.locator(selector).first # .first avoids strict-mode errors when several elements match
 951 |                 if dialog_locator.count() > 0: # Check if dialog element exists
 952 |                     if dialog_locator.is_visible(): # Check for visibility to ensure it's actually displayed
 953 |                         dialog_locator.click(timeout=5000) # Click to dismiss, with a timeout
 954 |                         logging.info(f"🍪 Dismissed dialog with selector: {selector}")
 955 |                         break # Dismiss only one dialog at a time per handle_dialogs call
 956 |             except Exception as e:
 957 |                 logging.warning(f"Issue handling dialog with selector '{selector}': {e}")
 958 |                 continue
 959 | 
 960 |     def generate_task_summary(self, task):
 961 |         """Generate a summary of the task execution."""
 962 |         non_step_actions = {'TASK_COMPLETE', 'ABORT', 'MANUAL_CAPTCHA', 'RETRY', 'EXPLORE_WEBSITE'}
 963 |         successful_steps = sum(
 964 |             1 for action in self.action_history
 965 |             if action.get('action') not in non_step_actions and action.get('status') == 'SUCCESS'
 966 |         )
 967 |         error_steps = sum(1 for action in self.action_history if action.get('status') == 'ERROR')
 968 |         manual_captcha_steps = sum(1 for action in self.action_history if action.get('action') == 'MANUAL_CAPTCHA')
 969 |         exploration_steps = sum(1 for action in self.action_history if action.get('action') == 'EXPLORE_WEBSITE')
 970 | 
 971 |         final_screenshot, _ = self.take_screenshot()
 972 | 
 973 |         summary = [
 974 |             f"Task: {task}",
 975 |             f"Completed {successful_steps} action(s) with {error_steps} error(s) encountered.",
 976 |             f"Manual Captcha Handled: {manual_captcha_steps} time(s).",
 977 |             f"Website Exploration Steps: {exploration_steps} initiated.",
 978 |             f"Final URL: {self.page.url}",
 979 |             f"Final page title: {self.page.title()}"
 980 |         ]
 981 | 
 982 |         if exploration_steps > 0:
 983 |             summary.append("Note: Website exploration was performed.")
 984 | 
 985 |         try:
 986 |             extract_result = self.extract_content("text")
 987 |             if extract_result.get("status") == "SUCCESS":
 988 |                 summary.append(f"Page content (excerpt): {extract_result['data']['text'][:200]}...")
 989 |         except Exception: # The excerpt is optional; the summary is still returned without it
 990 |             pass
 991 | 
 992 |         return "\n".join(summary)
 993 | 
 994 | def run_assistant():
 995 |     """Main function to run the autonomous web assistant."""
 996 |     parser = argparse.ArgumentParser(description="Autonomous Web Assistant powered by Gemini and Playwright")
 997 |     parser.add_argument("task", nargs="?", help="The task to perform")
 998 |     parser.add_argument("--headless", action="store_true", help="Run in headless mode (no browser UI)")
 999 |     parser.add_argument("--debug", action="store_true", help="Enable debug mode with more screenshots and logging")
1000 |     parser.add_argument("--memory_file", default="memory.json", help="Path to the memory file (JSON format).")
1001 |     args = parser.parse_args()
1002 | 
1003 |     assistant = AutonomousWebAssistant(headless=args.headless, debug=args.debug, memory_file=args.memory_file)
1004 | 
1005 |     try:
1006 |         if args.task:
1007 |             assistant.execute_task(args.task)
1008 |         else:
1009 |             print("🤖 Autonomous Web Assistant powered by Gemini and Playwright")
1010 |             print("Type 'exit' or 'quit' to end, 'clear memory' to clear, or 'show memory' to display memory.")
1011 | 
1012 |             while True:
1013 |                 task = input("Enter a task (or command): ")
1014 |                 if task.lower() in ['exit', 'quit']:
1015 |                     break
1016 |                 elif task.lower() == 'clear memory':
1017 |                     category = input("Clear all memory or specific category? (all/[category_name]): ").strip()
1018 |                     if category.lower() == 'all':
1019 |                         assistant.clear_memory()
1020 |                     else:
1021 |                         assistant.clear_memory(category=category)
1022 |                     print("Memory cleared.")
1023 |                 elif task.lower() == 'show memory':
1024 |                     print(json.dumps(assistant.memory, indent=4))
1025 |                 else:
1026 |                     assistant.execute_task(task)
1027 |     finally:
1028 |         assistant.close_browser()
1029 | 
1030 | if __name__ == "__main__":
1031 |     run_assistant()
1032 | 


--------------------------------------------------------------------------------
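
The link-collection loop in `explore_website` resolves each `href` against the current page and keeps only same-host `http(s)` URLs that have not been visited. The sketch below isolates that filtering step so it can be run without a browser. The helper name `same_site_links` is hypothetical and is not part of `main.py`. Stripping `#fragment` suffixes with `urldefrag` is an added assumption, made so that in-page anchors are not queued as separate pages.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(base_url, hrefs, already_seen=()):
    """Return absolute, fragment-free, same-host http(s) URLs found in hrefs.

    Hypothetical helper for illustration; main.py performs this logic inline.
    """
    base_host = urlparse(base_url).netloc
    seen = set(already_seen)
    result = []
    for href in hrefs:
        if not href:  # get_attribute('href') can return None for anchors without href
            continue
        # Resolve relative links, then drop any "#fragment" suffix (assumption).
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel:, and similar schemes
        if parsed.netloc != base_host or absolute in seen:
            continue  # other host, or already visited/queued
        seen.add(absolute)
        result.append(absolute)
    return result


if __name__ == "__main__":
    page = "https://example.com/docs/"
    found = same_site_links(
        page,
        ["intro.html", "/about", "#top", None, "/about#team",
         "mailto:a@example.com", "https://other.example.org/"],
        already_seen=[page],
    )
    print(found)
```

Returning a list rather than a set keeps the crawl order deterministic, which may make exploration logs easier to compare between runs.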