├── readme.md
└── main.py


/readme.md:
--------------------------------------------------------------------------------
# 🌐🤖 WebSurferAI: Your Autonomous Web Navigator

[![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://example.com)
[![Python Version](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)


**WebSurferAI** is an autonomous web assistant that uses Google's Gemini Pro Vision API and Playwright to interact with websites like a human. It can understand complex tasks, navigate web pages, fill forms, extract information, and even handle CAPTCHAs (with your help!). This project is in active development, and we're looking for passionate contributors to make it even more amazing! 🚀✨

## 🌟 Features

*   **Autonomous Task Execution:** Give it a task, and it will try its best to complete it.
*   **Intelligent Navigation:** Uses Gemini Pro Vision to understand web pages and decide the best course of action.
*   **Dynamic Interaction:** Clicks buttons, fills forms, scrolls pages, and more.
*   **Information Extraction:** Pulls relevant text and data from websites.
*   **Memory System:** Remembers past interactions and learns from them (stores in `memory.json`).
*   **CAPTCHA Handling:** Detects CAPTCHAs and prompts you for manual assistance.
*   **Website Exploration Mode:** Can explore websites to discover functionalities, especially useful for initial task discovery.
*   **Error Recovery:** Attempts to recover from errors and continue the task.
*   **Debug Mode:** Provides detailed logging and saves screenshots for each step.
*   **Internal Monologue:** Logs Gemini's reasoning and decision-making process (in debug mode).
*   **Dialog Handling:** Handles browser dialogs.

## 🚀 Getting Started

### Prerequisites

1.  **Python:** Make sure you have Python 3.7 or higher installed. You can check by running `python --version` or `python3 --version` in your terminal. Recent releases of the dependencies may require a newer Python version, so check their supported versions if installation fails.
2.  **Playwright:** The project uses Playwright for browser automation. The Python package is installed in the next section and bundles its own driver, so a separate Node.js/npm installation is not needed. You will still need to download the browser binaries (step 4 of the installation).
3.  **Google Gemini API Key:** You'll need an API key for Google Gemini. You can get one from [Google AI Studio](https://makersuite.google.com/app/apikey).

### Installation

1.  **Clone the repository:**

    ```bash
    git clone https://github.com/yourusername/WebSurferAI.git  # Replace with your repo URL
    cd WebSurferAI
    ```

2.  **Create a virtual environment (recommended):**

    ```bash
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    ```

3.  **Install dependencies:**

    If the repository does not already contain a `requirements.txt` file, create one with these contents:
    ```
    playwright
    Pillow
    python-dotenv
    google-generativeai
    ```
    Then install the dependencies:

    ```bash
    pip install -r requirements.txt
    ```

4.  **Install Playwright browsers:**

    ```bash
    playwright install
    ```
    This command downloads the necessary browser binaries (Chromium, Firefox, WebKit). This might take a few minutes.

5.  **Set up your API Key:**

    *   Create a `.env` file in the root directory of the project.
    *   Add your Gemini API key to the `.env` file:

        ```
        GEMINI_API_KEY=your_api_key_here
        ```
    *   **Important:** *Never* commit your `.env` file to version control. Add `.env` to your `.gitignore` file.

### Running the Assistant

1.  **From the command line:**

    ```bash
    python main.py "Your task here"
    ```
    Replace `"Your task here"` with the task you want the assistant to perform.
    For example:

    ```bash
    python main.py "Find the price of a Tesla Model 3 on the Tesla website"
    ```

    *   **Headless mode:** To run without showing the browser window, use the `--headless` flag:

        ```bash
        python main.py "Your task here" --headless
        ```

    *   **Debug mode:** For more detailed output and screenshots, use the `--debug` flag:

        ```bash
        python main.py "Your task here" --debug
        ```
    *   **Specify memory file (optional):** Use the `--memory_file` flag to choose where memory is stored. It defaults to `memory.json`.

        ```bash
        python main.py "Your task here" --memory_file my_custom_memory.json
        ```

2.  **Interactive Mode:** If you run the script without a task argument, it will start in interactive mode:

    ```bash
    python main.py
    ```

    You can then enter tasks one by one. You can also use the following commands:
    *   `exit` or `quit`: End the program.
    *   `clear memory`: Clears the assistant's memory. You can also specify a category: `clear memory website`.
    *   `show memory`: Displays the current contents of the memory.

## 🤝 Contributing

We ❤️ contributions! WebSurferAI is a community project, and we welcome anyone who wants to help make it better. Whether you're a seasoned developer or just starting out, there are many ways to contribute:

*   **Bug Reports:** If you find a bug, please open an issue on GitHub. Be as detailed as possible, including steps to reproduce the bug.
*   **Feature Requests:** Have an idea for a new feature? Open an issue and describe it!
*   **Code Contributions:**
    *   Fork the repository.
    *   Create a new branch for your feature or bug fix: `git checkout -b my-new-feature`
    *   Make your changes.
    *   Write tests for your code (if applicable).
    *   Ensure your code follows the existing style (use a linter like `flake8` or `pylint`).
    *   Commit your changes: `git commit -m "Add some amazing feature"`
    *   Push to your branch: `git push origin my-new-feature`
    *   Open a pull request on GitHub.
*   **Documentation:** Improve the README, add docstrings to the code, or create tutorials.
*   **Testing:** Help us test the assistant on different websites and with different tasks.
*   **Ideas and Feedback:** Share your thoughts and suggestions on how to improve the project.

We especially need help with:

*   **Improving the prompt engineering:** Refining the prompts sent to Gemini can significantly enhance the assistant's performance.
*   **Expanding error handling:** Making the assistant more robust to unexpected website behavior.
*   **Adding support for more websites:** Testing and adapting the assistant to work with a wider range of websites.
*   **Developing a user interface:** A graphical user interface would make the assistant more accessible.
*   **Creating more sophisticated memory management:** Improving how the assistant stores and retrieves information.
*   **Parallel task execution:** Allowing for multiple simultaneous actions, if possible.

## 🗺️ Project Structure

*   `main.py`: The main script containing the `AutonomousWebAssistant` class and the command-line interface.
*   `screenshots/`: Directory where screenshots are saved (created automatically).
*   `memory.json`: The default file where the assistant's memory is stored (created automatically).
*   `.env`: File for storing your API key (you need to create this).
*   `requirements.txt`: List of Python dependencies.

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. (You need to create a LICENSE file).

## 🙏 Acknowledgements

*   Google Gemini Team
*   Playwright Team
*   All the contributors!
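
## 🧠 Memory Entry Format (Sketch)

For contributors looking at memory management, it may help to see the shape of an entry in `memory.json`. The sketch below mirrors the `add_memory` and `retrieve_memory` methods in `main.py`, but as standalone functions that take the memory dict as an argument. That signature, the helper names as free functions, and the example values are assumptions made for illustration and are not part of the project.

```python
import json
import uuid
from datetime import datetime


def add_memory(memory, key, value, category="general"):
    """Store one entry under a fresh UUID (illustrative stand-in for the class method)."""
    memory_id = str(uuid.uuid4())
    memory[memory_id] = {
        "key": key,
        "value": value,
        "category": category,
        "timestamp": datetime.now().isoformat(),
    }
    return memory_id


def retrieve_memory(memory, key, category=None):
    """Return every entry whose key matches, optionally filtered by category."""
    return [
        entry
        for entry in memory.values()
        if entry["key"] == key and (category is None or entry["category"] == category)
    ]


# Hypothetical example data: two entries sharing a key but not a category.
memory = {}
add_memory(memory, "example.com", "Landing page with a pricing link", category="website")
add_memory(memory, "example.com", "A general note about this site")

# This is the structure that ends up in memory.json.
print(json.dumps(memory, indent=4))

website_entries = retrieve_memory(memory, "example.com", category="website")
all_entries = retrieve_memory(memory, "example.com")
```

Because entries are keyed by UUID rather than by `key`, several entries can share the same `key`, and lookups scan all values. Any change to memory management should keep that in mind.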

Let's build the future of web automation together! 🌐✨🤖

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os
import time
import argparse
import json
import base64
from urllib.parse import urlparse, urljoin
from datetime import datetime
import uuid
from dotenv import load_dotenv
import logging
from playwright.sync_api import sync_playwright
from PIL import Image
from io import BytesIO
import google.generativeai as genai

# --- Setup Logging ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Load environment variables
load_dotenv()

# Configure Google Gemini API
API_KEY = os.getenv("GEMINI_API_KEY")
if not API_KEY:
    API_KEY = "YOUR_API_KEY_HERE"  # Placeholder key; API calls will fail until a real key is set
if API_KEY == "YOUR_API_KEY_HERE":
    logging.warning("Using default API key. Set your GEMINI_API_KEY in .env file for proper use.")

genai.configure(api_key=API_KEY)

# Initialize the Gemini model - using the visual model
MODEL_NAME = "gemini-2.0-flash-thinking-exp"  # "gemini-2.0-flash" or "gemini-1.0-pro-vision-001", gemini-pro
model = genai.GenerativeModel(MODEL_NAME)


class AutonomousWebAssistant:
    def __init__(self, headless=False, debug=False, screenshot_dir="screenshots", memory_file="memory.json"):
        self.playwright = sync_playwright().start()
        self.browser = self.playwright.chromium.launch(headless=headless)
        self.page = None  # Playwright Page object will be initialized in initialize_browser
        self.headless = headless
        self.debug = debug
        self.screenshot_dir = screenshot_dir
        self.screenshot_count = 0
        self.current_task = None
        self.task_history = []
        self.action_history = []
        self.memory_file = memory_file
        self.memory = self.load_memory()
        self.captcha_solving_active = False
        self.element_search_timeout = 10
        self.explored_urls = set()
        self.internal_monologue = []

        if not os.path.exists(self.screenshot_dir):
            os.makedirs(self.screenshot_dir)

        self.initialize_browser()

    def initialize_browser(self):
        """Initialize the Playwright browser and page."""
        if self.page:
            self.page.close()  # Close existing page if any before creating new one
        self.page = self.browser.new_page()
        self.page.set_viewport_size({"width": 1920, "height": 1080})  # Consistent viewport size
        logging.info("Playwright browser initialized.")

    def load_memory(self):
        """Loads memory from the memory file."""
        try:
            with open(self.memory_file, 'r') as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            logging.info("Memory file not found or invalid. Starting with an empty memory.")
            return {}

    def save_memory(self):
        """Saves the current memory to the memory file."""
        try:
            with open(self.memory_file, 'w') as f:
                json.dump(self.memory, f, indent=4)
        except Exception as e:
            logging.error(f"Error saving memory to file: {e}")

    def add_memory(self, key, value, category="general"):
        """Adds a new memory entry using UUIDs for unique keys."""
        memory_id = str(uuid.uuid4())
        self.memory[memory_id] = {
            "key": key,
            "value": value,
            "category": category,
            "timestamp": datetime.now().isoformat()
        }
        self.save_memory()
        return memory_id

    def retrieve_memory(self, key, category=None):
        """Retrieves memory entries based on key and optionally category."""
        results = []
        for mem_id, mem_data in self.memory.items():
            if mem_data['key'] == key and (category is None or mem_data['category'] == category):
                results.append(mem_data)
        return results

    def clear_memory(self, category=None):
        """Clears memory entries, optionally filtering by category."""
        if category:
            keys_to_delete = [mem_id for mem_id, mem_data in self.memory.items() if mem_data['category'] == category]
            for mem_id in keys_to_delete:
                del self.memory[mem_id]
        else:
            self.memory = {}
        self.save_memory()

    def close_browser(self):
        """Close the Playwright browser and context."""
        if self.browser:
            self.browser.close()
        self.playwright.stop()
        logging.info("Playwright browser closed.")

    def take_screenshot(self, filename=None):
        """Take a screenshot, save it to disk, and return (bytes, filename)."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.screenshot_count += 1
        if filename is None:
            filename = f"{self.screenshot_dir}/screenshot_{timestamp}_{self.screenshot_count}.png"

        screenshot = self.page.screenshot()  # Take screenshot using Playwright

        if filename:
            with open(filename, "wb") as f:
                f.write(screenshot)
            logging.info(f"📸 Screenshot saved: {filename}")
        return screenshot, filename

    def get_screenshot_as_base64(self):
        """Get the current screenshot as base64 string for API requests."""
        screenshot = self.page.screenshot()
        return base64.b64encode(screenshot).decode('utf-8')

    def execute_task(self, user_task):
        """Main method to process and execute a user task autonomously."""
        logging.info(f"🤖 Understanding task: {user_task}")
        self.current_task = user_task
        self.task_history.append(user_task)
        self.internal_monologue = []

        if not self.action_history or self.action_history[-1]['action'] == "TASK_COMPLETE":
            self.navigate_to_url("https://www.google.com")

        max_steps = 30
        current_step = 0
        exploration_depth = 2
        retry_attempts = 0
        max_retry_attempts = 3

        while current_step < max_steps:
            current_step += 1
            logging.info(f"\n🔄 Step {current_step}/{max_steps}: Taking screenshot and determining next action...")

            screenshot, filename = self.take_screenshot()

            next_action = self.get_next_action_from_gemini(screenshot, user_task, current_step)

            self.internal_monologue.append({
                "step": current_step,
                "gemini_reasoning": next_action.get("reasoning", "No reasoning provided"),
                "action": next_action["action"],
                "details": next_action.get("details", {}),
                "self_assessment": "Evaluating action..."
            })

            self.action_history.append({
                "step": current_step,
                "action": next_action["action"],
                "details": next_action.get("details", {}),
                # Stored so summarize_action_history can show the readable description
                "message": next_action.get("message", next_action["action"]),
                "screenshot": filename
            })

            if next_action["action"] == "TASK_COMPLETE":
                logging.info(f"✅ Task completed: {next_action.get('message', 'Gemini determined the task is complete')}")
                break
            elif next_action["action"] == "MANUAL_CAPTCHA":
                logging.warning("🚨 Captcha detected! Pausing automation. Please solve the captcha manually in the browser.")
                self.captcha_solving_active = True
                input("Press Enter after you have solved the captcha...")
                self.captcha_solving_active = False
                logging.info("Resuming automation...")
                continue
            elif next_action["action"] == "EXPLORE_WEBSITE":
                logging.info("🌐 Initiating website exploration...")
                self.explore_website(url=self.page.url, max_depth=exploration_depth)
                logging.info("Exploration complete. Resuming task execution.")
                continue
            elif next_action["action"] == "RETRY":
                logging.info("🔄 Gemini suggested to retry the last action...")
                retry_attempts += 1
                if retry_attempts > max_retry_attempts:
                    logging.error(f"❌ Max retry attempts reached ({max_retry_attempts}). Aborting.")
                    break
                continue
            else:
                retry_attempts = 0

            status = self.execute_action(next_action)
            self.internal_monologue[-1]["action_result"] = status

            if status.get("status") == "ERROR":
                logging.error(f"❌ Error executing action: {status.get('message')}")
                recovery_screenshot, _ = self.take_screenshot()
                recovery_action = self.get_recovery_action(recovery_screenshot, status.get("message"), user_task)

                if recovery_action["action"] == "ABORT":
                    logging.error("❌ Cannot recover from error, aborting task")
                    break

                self.internal_monologue[-1]["recovery_action"] = recovery_action
                recovery_status = self.execute_action(recovery_action)
                self.internal_monologue[-1]["recovery_result"] = recovery_status

                if recovery_status.get("status") == "ERROR":
                    logging.error(f"❌ Recovery action failed: {recovery_status.get('message')}. Aborting.")
                    break

            time.sleep(1)

        summary = self.generate_task_summary(user_task)
        logging.info("\n📊 Task Summary:")
        logging.info(summary)

        if self.debug:
            logging.info("\n🧠 Internal Monologue:")
            for thought in self.internal_monologue:
                logging.info(thought)

        return {
            "task": user_task,
            "steps": current_step,
            "actions": self.action_history,
            "summary": summary,
            "internal_monologue": self.internal_monologue
        }

    def get_next_action_from_gemini(self, screenshot, task, step_number):
        """Use Gemini to analyze screenshot and determine next action."""
        if isinstance(screenshot, bytes):
            image_bytes = screenshot
        else:
            with open(screenshot, "rb") as f:
                image_bytes = f.read()

        image_parts = [
            {
                "inline_data": {
                    "data": base64.b64encode(image_bytes).decode("utf-8"),
                    "mime_type": "image/png"
                }
            }
        ]

        relevant_memories = []
        relevant_memories.extend(list(self.memory.values())[-5:])
        relevant_memories.extend(self.retrieve_memory(key=urlparse(self.page.url).netloc, category="website"))

        memory_context = ""
        if relevant_memories:
            memory_context = "\n**Relevant Memories:**\n"
            for mem in relevant_memories:
                memory_context += f"- {mem['key']}: {mem['value']}\n"

        prompt = f"""
        You are an expert web automation assistant using Playwright.

        **Current Task:** {task}
        **Step Number:** {step_number}
        **Current URL:** {self.page.url}
        **Page Title:** {self.page.title()}
        **Previous Actions:** (Summarized) {self.summarize_action_history()}
        {memory_context}

        **Your Goal:** Autonomously complete the user's task by interacting with the webpage using Playwright.

        **Consider these capabilities and instructions when deciding the next action:**

        1. **Task Understanding and Goal Decomposition:** Understand the overall task. Break it down into smaller steps.

        2. **Website Exploration for Task Discovery (NEW FEATURE):** If needed for vague tasks, suggest action: "EXPLORE_WEBSITE".

        3. **CAPTCHA Handling:** If CAPTCHA is visible, suggest "MANUAL_CAPTCHA".

        4. **Action Selection Strategy:** Choose the MOST RELEVANT SINGLE NEXT ACTION.

        5. **Element Identification (Playwright Locators):** For "CLICK" and "TYPE" actions, use robust Playwright locators:
            * Prioritize **text-based locators** (e.g., `"text=Submit"` for a substring match, `'text="Log In"'` for an exact match).
            * Use **CSS selectors** when text locators are insufficient (e.g., `"#id"`, `".class"`, `"div > button"`).
            * Consider **role-based locators** for accessibility (e.g., `"[role='button']"`, `'role=link[name="Learn more"]'`).
            * For complex scenarios, use **chained locators** (e.g., `".parent >> .child"`).
            * If multiple elements match, append `>> nth=index` to the locator (e.g., `"text=Submit >> nth=0"`) to target a specific one.
            * Suggest the **most specific and reliable locator** in 'details'.
            * If text is reliable, use text locators. Otherwise, use CSS or other suitable locators.

        6. **Recovery and Retry:** Suggest "action: RETRY" for transient errors.

        7. **TASK_COMPLETE Recognition:** Suggest "action: TASK_COMPLETE" when the task is fulfilled.

        8. **Memory Utilization:** Use provided memories to inform decisions.

        **Output Format:** Return JSON object:
        {{
            "action": (CLICK, TYPE, NAVIGATE, SCROLL, WAIT, EXTRACT, TASK_COMPLETE, MANUAL_CAPTCHA, EXPLORE_WEBSITE, ABORT, RETRY)
            "details": {{ ...action-specific details... }}
            "reasoning": "Explain action choice."
            "message": "User-friendly action description."
        }}
        **Examples:**
        {{"action": "CLICK", "details": {{"locator": "text=Sign In"}}, "reasoning": "User needs to log in", "message": "Clicking 'Sign In' button."}}
        {{"action": "TYPE", "details": {{"locator": "#search-query", "text": "product search"}}, "reasoning": "Searching for products", "message": "Typing 'product search' in search box."}}
        {{"action": "NAVIGATE", "details": {{"url": "https://example.com/pricing"}}, "reasoning": "Navigating to pricing page", "message": "Navigating to pricing page."}}
        {{"action": "TASK_COMPLETE", "reasoning": "Task completed", "message": "Task completed."}}
        {{"action": "MANUAL_CAPTCHA", "reasoning": "Captcha detected", "message": "Solve CAPTCHA manually."}}
        {{"action": "EXPLORE_WEBSITE", "details": {{}}, "reasoning": "Exploring website for testing", "message": "Initiating website exploration."}}
        {{"action": "RETRY", "reasoning": "Retrying last action", "message": "Retrying last action."}}

        **IMPORTANT:** Respond with JSON ONLY.
        """

        try:
            response = model.generate_content([prompt] + image_parts)
            response_text = response.text.strip()

            try:
                # Strip a Markdown code fence if the model wrapped its JSON in one
                if response_text.startswith("```json"):
                    json_text = response_text.split("```json")[1].split("```")[0].strip()
                elif response_text.startswith("```"):
                    json_text = response_text.split("```")[1].strip()
                else:
                    json_text = response_text

                action_data = json.loads(json_text)

                logging.info(f"💭 Gemini's reasoning: {action_data.get('reasoning', 'No reasoning provided')}")
                logging.info(f"🚀 Next action: {action_data.get('message', action_data.get('action', 'Unknown action'))}")
                return action_data

            except json.JSONDecodeError as e:
                logging.error(f"❌ Error parsing Gemini response as JSON: {e}")
                logging.error(f"Response text: {response_text}")
                return {
                    "action": "WAIT",
                    "details": {"seconds": 5},
                    "reasoning": "JSON parsing failed. Waiting and will re-prompt.",
                    "message": "Waiting for 5 seconds due to API response error. Re-prompting."
                }

        except Exception as e:
            logging.error(f"❌ Error getting next action from Gemini (API error): {e}")
            return {
                "action": "WAIT",
                "details": {"seconds": 10},
                "reasoning": "Gemini API call failed. Waiting and will re-prompt.",
                "message": "Waiting for 10 seconds due to API error. Re-prompting."
            }

    def summarize_action_history(self, num_actions=5):
        """Summarize recent action history for Gemini context."""
        if not self.action_history:
            return "No actions taken yet."
        recent_actions = self.action_history[-num_actions:]
        summary = []
        for action_data in recent_actions:
            action_type = action_data['action']
            # Fall back to the action type when no readable message was recorded
            message = action_data.get('message', action_type)
            summary.append(f"Step {action_data['step']}: {message}")
        return "; ".join(summary)

    def get_recovery_action(self, screenshot, error_message, task):
        """Get a recovery action from Gemini when an action fails."""
        prompt = f"""
        There was an error during web automation.

        **Task:** {task}
        **Error Message:** {error_message}
        **Current URL:** {self.page.url}
        **Error Screenshot:** Analyze screenshot to understand error context.
        **Recent Actions:** (Summarized) {self.summarize_action_history()}

        Determine a recovery action to continue the task.

        **Recovery Action Considerations:**
        1. Analyze screenshot and error message to understand *why* the action failed.
        2. Is the error transient or a logical mistake?
        3. Suggest a recovery action to resolve the issue.
        4. Available actions: CLICK, TYPE, NAVIGATE, SCROLL, WAIT, ABORT, RETRY.

        **When to Use RETRY:** If error is temporary or due to loading, retry the *same* action after delay.

        **Explain Reasoning:** In "reasoning", explain *why* the recovery action is suggested.

        **Output Format:** JSON object:
        {{
            "action": (CLICK, TYPE, NAVIGATE, SCROLL, WAIT, ABORT, RETRY)
            "details": {{ ...action-specific details... }}
            "reasoning": "Explain recovery action."
        }}

        **If recovery is impossible, use "action": "ABORT".**

        **IMPORTANT:** Respond with JSON ONLY.
        """
        if isinstance(screenshot, bytes):
            image_bytes = screenshot
        else:
            with open(screenshot, "rb") as f:
                image_bytes = f.read()

        image_parts = [
            {
                "inline_data": {
                    "data": base64.b64encode(image_bytes).decode("utf-8"),
                    "mime_type": "image/png"
                }
            }
        ]

        try:
            response = model.generate_content([prompt] + image_parts)
            response_text = response.text.strip()

            try:
                # Strip a Markdown code fence if the model wrapped its JSON in one
                if response_text.startswith("```json"):
                    json_text = response_text.split("```json")[1].split("```")[0].strip()
                elif response_text.startswith("```"):
                    json_text = response_text.split("```")[1].strip()
                else:
                    json_text = response_text

                recovery_action = json.loads(json_text)
                logging.info(f"🛠️ Recovery action suggested by Gemini: {recovery_action.get('reasoning', 'No reasoning provided')}")
                return recovery_action

            except (json.JSONDecodeError, IndexError) as e:
                logging.error(f"❌ Error parsing recovery action JSON: {e}")
                return {"action": "ABORT", "reasoning": "Could not parse recovery action from Gemini."}

        except Exception as e:
            logging.error(f"❌ Error getting recovery action from Gemini (API error): {e}")
            return {"action": "ABORT", "reasoning": f"API error during recovery action request: {str(e)}"}

    def execute_action(self, action_data):
        """Execute an action based on action type and details using Playwright."""
        action_type = action_data.get("action", "").upper()
        details = action_data.get("details", {})

        logging.info(f"⚙️ Executing: {action_data.get('message', action_type)}")

        try:
            if action_type == "CLICK":
                locator_str = details.get("locator", "")
                text = details.get("text", "")  # Text might be used as fallback locator if locator_str is not provided or fails

                return self.click_element(locator_str=locator_str,
                                          text=text)

            elif action_type == "TYPE":
                locator_str = details.get("locator", "")
                text = details.get("text", "")
                return self.type_text(locator_str=locator_str, text=text)

            elif action_type == "NAVIGATE":
                url = details.get("url", "")
                return self.navigate_to_url(url)

            elif action_type == "SCROLL":
                direction = details.get("direction", "down")
                amount = details.get("amount", 300)
                return self.scroll_page(direction, amount)

            elif action_type == "WAIT":
                seconds = details.get("seconds", 3)
                return self.wait_for(seconds)

            elif action_type == "EXTRACT":
                extract_type = details.get("type", "text")
                locator_str = details.get("locator")  # Locator for extraction
                return self.extract_content(extract_type, locator_str=locator_str)

            elif action_type == "EXPLORE_WEBSITE":
                return {"status": "SUCCESS", "message": "Website exploration action acknowledged."}

            elif action_type in ["TASK_COMPLETE", "ABORT", "MANUAL_CAPTCHA", "RETRY"]:
                return {"status": "SUCCESS", "message": "Action acknowledged"}

            else:
                return {"status": "ERROR", "message": f"Unknown action type: {action_type}"}

        except Exception as e:
            return {"status": "ERROR", "message": str(e)}

    def explore_website(self, url, max_depth, current_depth=0):
        """Recursively explore a website using Playwright."""
        if current_depth >= max_depth or url in self.explored_urls:
            return

        try:
            logging.info(f"\n🌐 Exploring URL: {url}, Depth: {current_depth}")
            nav_status = self.navigate_to_url(url)
            if nav_status.get("status") == "ERROR":
                logging.error(f"❌ Error navigating to {url} during exploration: {nav_status.get('message')}")
                return

            self.explored_urls.add(url)

            self.take_screenshot()
            extract_result = self.extract_content(extract_type="text")

            if extract_result.get("status") == "SUCCESS":
                logging.info(f"📄 Extracted content from: {url} (excerpt): {extract_result['data']['text'][:150]}...")
                self.add_memory(key=urlparse(url).netloc, value=extract_result['data']['text'][:500], category="website")

            else:
                logging.warning(f"⚠️ Failed to extract content from: {url}")

            # Find all 'a' tags using Playwright locator for links
            links_locator = self.page.locator('a')
            links_count = links_locator.count()  # Number of links on the page, used for index-based iteration

            urls_to_explore = set()
            base_url_parsed = urlparse(url)

            for i in range(links_count):  # Iterate through links using index
                try:
                    link_element = links_locator.nth(i)  # Get link element by index
                    href = link_element.get_attribute('href')  # Get href attribute using Playwright
                    if not href:
                        continue  # Skip anchors without an href
                    absolute_url = urljoin(url, href)

                    if absolute_url and absolute_url.startswith(('http://', 'https://')):
                        url_parsed = urlparse(absolute_url)
                        # Stay on the same host and skip pages already visited
                        if url_parsed.netloc == base_url_parsed.netloc and absolute_url not in self.explored_urls:
                            urls_to_explore.add(absolute_url)

                except Exception as e:  # Catch any issues during link processing
                    logging.warning(f"Issue processing link during exploration: {e}")
                    continue

            for next_url in urls_to_explore:
                self.explore_website(next_url, max_depth, current_depth + 1)

        except Exception as e:
            logging.error(f"🔥 Error during website exploration of {url}: {e}")

    def find_element_by_locator(self, locator_str, text=None, index=0):
        """Find an element using Playwright locator or fallback to text if locator fails."""
        start_time = time.time()

        while time.time() - start_time < self.element_search_timeout:
            try:
                if locator_str:
                    locator = self.page.locator(locator_str)  # Use Playwright locator directly
                    count = locator.count()  # Check if elements are found

                    if count > 0:
                        if 0 <= index < count:
                            element_locator = locator.nth(index)  # Get specific element if index is within range
                        else:
                            element_locator = locator.first  # Default to the first element if index is out of range

                        if self.debug:
                            element_locator.evaluate("element => { element.style.border = '3px solid red'; }")  # Highlight element
                            time.sleep(0.5)
                        return element_locator  # Return Playwright Locator object

                if text:  # Fallback to text based search if locator_str is not provided or initial locator didn't find element
                    # Only selector forms supported by Playwright's text engine are used here.
                    text_locator_strategies = [
                        f'text="{text}"',  # Exact, case-sensitive match
                        f"text={text}",  # Case-insensitive substring match
                    ]
                    for strategy in text_locator_strategies:
                        locator = self.page.locator(strategy)
                        count = locator.count()
                        if count > 0:
                            if 0 <= index < count:
                                element_locator = locator.nth(index)
                            else:
                                element_locator = locator.first

                            if self.debug:
                                element_locator.evaluate("element => { element.style.border = '3px solid blue'; }")
                                time.sleep(0.5)
                            return element_locator  # Return Playwright Locator object

            except Exception as e:
                logging.warning(f"Error finding element with locator '{locator_str}' or text '{text}': {e}. Retrying...")

            time.sleep(0.25)  # Brief pause between polls to avoid a busy loop

        return None  # Element not found

    def click_element(self, locator_str=None, text=None, index=0):
        """Click on an element using Playwright locator or text. Demonstrates various click options."""
        try:
            logging.info(f"🖱️ Clicking: {text if text else locator_str}")
            element_locator = self.find_element_by_locator(locator_str, text, index)

            if element_locator:

                # --- Playwright Click Actions and Options ---
                # 1. Basic Click:
                # element_locator.click()

                # 2. Force Click (Bypasses actionability checks - use cautiously):
                # element_locator.click(force=True)

                # 3. Positioned Click (Click at specific coordinates within the element):
                # bounding_box = element_locator.bounding_box()
                # if bounding_box:
                #     x = bounding_box['x'] + bounding_box['width'] / 2  # Center X
                #     y = bounding_box['y'] + bounding_box['height'] / 2  # Center Y
                #     self.page.mouse.click(x, y)
                # else:
                #     element_locator.click()  # Fallback if bounding box fails

                # 4. Click with Delay (Simulate user-like click):
                # element_locator.click(delay=100)  # 100ms delay

                # 5. No Wait After (For faster navigation in some cases - use with care):
                # element_locator.click(no_wait_after=True)

                # 6. Timeout for Click (Control how long to wait for element to be actionable):
                # element_locator.click(timeout=5000)  # 5 seconds timeout

                # 7. Multiple Clicks (Double click, Triple click etc.):
                # element_locator.click(click_count=2)  # Double click

                # Using a standard click for now for general use case:
                element_locator.click()

                # --- Waiting after Click ---
                # 1. Wait for Load State (Most common for page navigation):
                self.page.wait_for_load_state("load")  # "load", "domcontentloaded", "networkidle"

                # 2. Wait for Navigation (Specifically for navigation actions; wraps the click itself):
                # with self.page.expect_navigation():
                #     element_locator.click()

                # 3. Wait for Selector (Wait for an element to appear after click):
                # self.page.wait_for_selector(".next-page-content")

                # 4. Explicit Timeout (If specific wait is needed):
                # time.sleep(2)  # Wait for 2 seconds

                if self.debug:
                    self.take_screenshot()

                return {
                    "status": "SUCCESS",
                    "message": f"Clicked on element with locator: '{locator_str}' or text: '{text}'",
                    "title": self.page.title(),
                    "current_url": self.page.url
                }
            else:
                return {"status": "ERROR", "message": f"Element not found for click: locator='{locator_str}', text='{text}'"}
        except Exception as e:
            return {"status": "ERROR", "message": str(e)}

    def type_text(self, locator_str=None, text=None):
        """Type text into an input element using Playwright. Demonstrates various typing methods."""
        if not text:
            return {"status": "ERROR", "message": "No text provided to type"}

        try:
            logging.info(f"⌨️ Typing: {text}")

            element_locator = None
            if locator_str:
                element_locator = self.find_element_by_locator(locator_str)

            if not element_locator:
                # Fallback to find any input, textarea, or editable element if locator fails
                input_locators = [
                    "input", "textarea", "[contenteditable='true']", "[role='textbox']"
                ]
                for sel in input_locators:
                    temp_locator = self.page.locator(sel)
                    if temp_locator.count() > 0:
                        element_locator = temp_locator.first  # Take the first one if multiple are found
                        break

            if element_locator:
                # --- Playwright Typing Actions and Options ---
                # 1. Fill (Recommended for input fields - clears existing content and types):
                # element_locator.fill(text)

                # 2.
Type (Simulates keyboard typing - appends to existing content, can use delay): 699 | # element_locator.type(text) # Basic type 700 | # element_locator.type(text, delay=50) # Type with 50ms delay per character 701 | 702 | # 3. Press Sequences (Send special keys, combinations): 703 | # element_locator.press("Enter") 704 | # element_locator.press("Shift+Tab") 705 | # element_locator.pressSequentially(text, delay=50) # Type with delay, like .type but can handle special characters better 706 | 707 | # 4. Clear and Type (Manual clear before typing): 708 | # element_locator.clear() # Playwright's clear is robust 709 | # element_locator.type(text) 710 | 711 | # Using fill for robustness in most input scenarios: 712 | element_locator.fill(text) 713 | 714 | return {"status": "SUCCESS", "message": f"Typed '{text}' into input field using locator: '{locator_str}'"} 715 | else: 716 | # Fallback to typing into focused element if no specific input is found 717 | self.page.keyboard.type(text) # Type into currently focused element 718 | return {"status": "SUCCESS", "message": f"Typed '{text}' into active element (fallback)"} 719 | 720 | except Exception as e: 721 | return {"status": "ERROR", "message": str(e)} 722 | 723 | def navigate_to_url(self, url): 724 | """Navigate to a specific URL using Playwright.""" 725 | if not url: 726 | return {"status": "ERROR", "message": "No URL provided"} 727 | 728 | try: 729 | if not url.startswith(('http://', 'https://')): 730 | url = 'https://' + url 731 | 732 | logging.info(f"🌐 Navigating to: {url}") 733 | self.page.goto(url, wait_until="load", timeout=30000) # Playwright's goto with wait_until and timeout 734 | 735 | self.handle_dialogs() # Handle dialogs after navigation 736 | 737 | if self.debug: 738 | self.take_screenshot() 739 | 740 | return { 741 | "status": "SUCCESS", 742 | "message": f"Navigated to {url}", 743 | "title": self.page.title(), 744 | "current_url": self.page.url 745 | } 746 | except Exception as e: 747 | return {"status": 
"ERROR", "message": str(e)} 748 | 749 | def scroll_page(self, direction="down", amount=300): 750 | """Scroll the page using Playwright. Demonstrates different scroll options.""" 751 | try: 752 | logging.info(f"📜 Scrolling {direction}") 753 | 754 | # --- Playwright Scrolling Options --- 755 | # 1. JavaScript Scroll (Similar to Selenium, but using Playwright's evaluate): 756 | if direction.lower() == "down": 757 | self.page.evaluate(f"window.scrollBy(0, {amount})") 758 | elif direction.lower() == "up": 759 | self.page.evaluate(f"window.scrollBy(0, -{amount})") 760 | elif direction.lower() == "top": 761 | self.page.evaluate("window.scrollTo(0, 0)") 762 | elif direction.lower() == "bottom": 763 | self.page.evaluate("window.scrollTo(0, document.body.scrollHeight)") 764 | elif direction.lower() == "right": 765 | self.page.evaluate(f"window.scrollBy({amount}, 0)") 766 | elif direction.lower() == "left": 767 | self.page.evaluate(f"window.scrollBy(-{amount}, 0)") 768 | 769 | # 2. Playwright's built-in scrolling (More control over element scrolling - for specific elements, not whole page directly) 770 | # For whole page scrolling, JavaScript approach is still common and effective. 
771 | 772 | time.sleep(1) 773 | 774 | if self.debug: 775 | self.take_screenshot() 776 | 777 | return {"status": "SUCCESS", "message": f"Scrolled {direction}"} 778 | except Exception as e: 779 | return {"status": "ERROR", "message": str(e)} 780 | 781 | def wait_for(self, seconds=3): 782 | """Wait for the specified number of seconds using Playwright.""" 783 | try: 784 | logging.info(f"⏱️ Waiting for {seconds} seconds") 785 | self.page.wait_for_timeout(seconds * 1000) # Playwright's wait_for_timeout takes milliseconds 786 | return {"status": "SUCCESS", "message": f"Waited for {seconds} seconds"} 787 | except Exception as e: 788 | return {"status": "ERROR", "message": str(e)} 789 | 790 | def extract_content(self, extract_type="text", locator_str=None): 791 | """Extract content from the page using Playwright. Demonstrates various extraction methods.""" 792 | try: 793 | logging.info(f"📄 Extracting {extract_type} content") 794 | 795 | if extract_type == "text": 796 | # Extract main text content, optionally using a locator 797 | 798 | if locator_str: 799 | element_locator = self.find_element_by_locator(locator_str=locator_str) 800 | if element_locator: 801 | # --- Playwright Text Extraction Methods --- 802 | # 1. textContent() - Get text content of the element and its children 803 | text_content = element_locator.text_content() 804 | 805 | # 2. innerText() - Get rendered text content (similar to browser's innerText property) 806 | # text_content = element_locator.inner_text() 807 | 808 | # 3. innerHTML() - Get the inner HTML content of the element 809 | # html_content = element_locator.inner_html() 810 | # text_content = html_content # Or process HTML as needed 811 | 812 | # 4. 
getAttribute() - Get specific attribute value 813 | # attribute_value = element_locator.get_attribute("href") 814 | # text_content = attribute_value # Or process attribute value 815 | 816 | else: 817 | return {"status": "ERROR", "message": f"Could not find element with locator: {locator_str}"} 818 | else: 819 | # Extract from whole body if no locator specified 820 | text_content = self.page.locator("body").text_content() # Extract text from body 821 | 822 | main_text = text_content[:2000] + "..." if len(text_content) > 2000 else text_content 823 | 824 | return { 825 | "status": "SUCCESS", 826 | "message": f"Extracted text content", 827 | "data": { 828 | "title": self.page.title(), 829 | "url": self.page.url, 830 | "text": main_text 831 | } 832 | } 833 | 834 | elif extract_type == "links": 835 | # Extract links 836 | links = [] 837 | link_elements_locator = self.page.locator("a") # Locator for all 'a' tags 838 | link_count = link_elements_locator.count() # Get count of links for iteration 839 | 840 | for i in range(min(link_count, 20)): # Limit to first 20 links 841 | try: 842 | link_element = link_elements_locator.nth(i) 843 | href = link_element.get_attribute("href") # Get 'href' attribute 844 | text = link_element.text_content().strip() # Get link text 845 | 846 | if href and text and len(text) > 1: 847 | links.append({"url": href, "text": text}) 848 | except: 849 | continue 850 | 851 | return { 852 | "status": "SUCCESS", 853 | "message": f"Extracted {len(links)} links", 854 | "data": { 855 | "title": self.page.title(), 856 | "url": self.page.url, 857 | "links": links 858 | } 859 | } 860 | 861 | elif extract_type == "search_results": 862 | # Extract search results (Google Search example) 863 | results = [] 864 | search_result_selectors = [ 865 | "div.g", "div[data-sokoban-container]", "div.v7W49e" # Common Google search result containers 866 | ] 867 | 868 | for selector in search_result_selectors: 869 | result_elements_locator = self.page.locator(selector) 870 | 
result_count = result_elements_locator.count() 871 | 872 | if result_count > 0: 873 | for i in range(min(result_count, 10)): # Limit to first 10 results 874 | try: 875 | result_element = result_elements_locator.nth(i) 876 | 877 | # --- Chained Locators for deeper element selection --- 878 | title_locator = result_element.locator("h3") # Find h3 within result 879 | title = title_locator.text_content() 880 | 881 | link_locator = title_locator.locator("xpath=./ancestor::a") # Find parent 'a' tag using XPath relative to title 882 | link = link_locator.get_attribute("href") 883 | 884 | desc_locator = result_element.locator("div.VwiC3b, div.s") # Find description 885 | description = desc_locator.text_content() if desc_locator.count() > 0 else "" # Optional description 886 | 887 | results.append({ 888 | "title": title, 889 | "url": link, 890 | "description": description 891 | }) 892 | except: 893 | continue 894 | if results: 895 | break # Stop if results are found for a selector 896 | 897 | return { 898 | "status": "SUCCESS", 899 | "message": f"Extracted {len(results)} search results", 900 | "data": { 901 | "query": self.page.title().replace(" - Google Search", ""), 902 | "url": self.page.url, 903 | "results": results 904 | } 905 | } 906 | elif extract_type == "element_text" and locator_str: # Extract text from a specific element using locator 907 | element_locator = self.find_element_by_locator(locator_str=locator_str) 908 | if element_locator: 909 | return { 910 | "status": "SUCCESS", 911 | "message": f"Extracted text from element with locator '{locator_str}'", 912 | "data": { 913 | "text": element_locator.text_content(), 914 | "url": self.page.url 915 | } 916 | } 917 | else: 918 | return {"status": "ERROR", "message": f"Element with locator '{locator_str}' not found for extraction."} 919 | 920 | else: 921 | return {"status": "ERROR", "message": f"Unknown extract type: {extract_type}"} 922 | 923 | except Exception as e: 924 | return {"status": "ERROR", "message": 
str(e)} 925 | 926 | def handle_dialogs(self): 927 | """Handle common dialogs like cookie notices and popups using Playwright.""" 928 | dismiss_selectors = [ 929 | "#L2AGLb", # Google cookie notice 930 | "button[aria-label='Accept all']", 931 | "button[aria-label='Accept']", 932 | "text=Accept", # Text based locator example 933 | "text=Accept all", 934 | "text=I agree", 935 | "text=Agree", 936 | "text=Allow", 937 | "text=Close", 938 | "text=No thanks", 939 | "text=Got it", 940 | ".modal button", 941 | ".popup button", 942 | "[aria-label='Close']", 943 | ".cookie-banner button", 944 | "#consent-banner button", 945 | ".consent button" 946 | ] 947 | 948 | for selector in dismiss_selectors: 949 | try: 950 | dialog_locator = self.page.locator(selector) 951 | if dialog_locator.count() > 0: # Check if dialog element exists 952 | if dialog_locator.is_visible(): # Check for visibility to ensure it's actually displayed 953 | dialog_locator.click(timeout=5000) # Click to dismiss, with a timeout 954 | logging.info(f"🍪 Dismissed dialog with selector: {selector}") 955 | break # Dismiss only one dialog at a time per handle_dialogs call 956 | except Exception as e: 957 | logging.warning(f"Issue handling dialog with selector '{selector}': {e}") 958 | continue 959 | 960 | def generate_task_summary(self, task): 961 | """Generate a summary of the task execution.""" 962 | successful_steps = sum( 963 | 1 for action in self.action_history if action.get('action') not in ['TASK_COMPLETE', 'ABORT', 'MANUAL_CAPTCHA', 964 | 'RETRY', 965 | 'EXPLORE_WEBSITE'] and action.get( 966 | 'status') == 'SUCCESS') 967 | error_steps = sum(1 for action in self.action_history if action.get('status') == 'ERROR') 968 | manual_captcha_steps = sum(1 for action in self.action_history if action.get('action') == 'MANUAL_CAPTCHA') 969 | exploration_steps = sum(1 for action in self.action_history if action.get('action') == 'EXPLORE_WEBSITE') 970 | 971 | final_screenshot, _ = self.take_screenshot() 972 | 973 | summary 
= [ 974 | f"Task: {task}", 975 | f"Completed {successful_steps} action(s) with {error_steps} error(s) encountered.", 976 | f"Manual Captcha Handled: {manual_captcha_steps} time(s).", 977 | f"Website Exploration Steps: {exploration_steps} initiated.", 978 | f"Final URL: {self.page.url}", 979 | f"Final page title: {self.page.title()}" 980 | ] 981 | 982 | if exploration_steps > 0: 983 | summary.append("Note: Website exploration was performed.") 984 | 985 | try: 986 | extract_result = self.extract_content("text") 987 | if extract_result.get("status") == "SUCCESS": 988 | summary.append(f"Page content (excerpt): {extract_result['data']['text'][:200]}...") 989 | except: 990 | pass 991 | 992 | return "\n".join(summary) 993 | 994 | def run_assistant(): 995 | """Main function to run the autonomous web assistant.""" 996 | parser = argparse.ArgumentParser(description="Autonomous Web Assistant powered by Gemini and Playwright") 997 | parser.add_argument("task", nargs="?", help="The task to perform") 998 | parser.add_argument("--headless", action="store_true", help="Run in headless mode (no browser UI)") 999 | parser.add_argument("--debug", action="store_true", help="Enable debug mode with more screenshots and logging") 1000 | parser.add_argument("--memory_file", default="memory.json", help="Path to the memory file (JSON format).") 1001 | args = parser.parse_args() 1002 | 1003 | assistant = AutonomousWebAssistant(headless=args.headless, debug=args.debug, memory_file=args.memory_file) 1004 | 1005 | try: 1006 | if args.task: 1007 | assistant.execute_task(args.task) 1008 | else: 1009 | print("🤖 Autonomous Web Assistant powered by Gemini and Playwright") 1010 | print("Type 'exit' or 'quit' to end, 'clear memory' to clear, or 'show memory' to display memory.") 1011 | 1012 | while True: 1013 | task = input("Enter a task (or command): ") 1014 | if task.lower() in ['exit', 'quit']: 1015 | break 1016 | elif task.lower() == 'clear memory': 1017 | category = input("Clear all memory or 
specific category? (all/[category_name]): ").strip() 1018 | if category.lower() == 'all': 1019 | assistant.clear_memory() 1020 | else: 1021 | assistant.clear_memory(category=category) 1022 | print("Memory cleared.") 1023 | elif task.lower() == 'show memory': 1024 | print(json.dumps(assistant.memory, indent=4)) 1025 | else: 1026 | assistant.execute_task(task) 1027 | finally: 1028 | assistant.close_browser() 1029 | 1030 | if __name__ == "__main__": 1031 | run_assistant() 1032 | --------------------------------------------------------------------------------
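
The link-filtering step in `explore_website` above (resolve each `href` against the current URL, keep only `http(s)` links on the same host, skip pages already explored) can be tried without a browser. The following is a minimal sketch under stated assumptions, not part of `main.py`: the helper name `same_domain_links` and its parameters are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(base_url, hrefs, explored=frozenset()):
    """Hypothetical helper: resolve hrefs against base_url and keep
    unexplored http(s) URLs that share base_url's host."""
    base_host = urlparse(base_url).netloc
    found = set()
    for href in hrefs:
        if not href:  # Anchors without an href attribute yield None
            continue
        absolute = urljoin(base_url, href)  # Resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # Drop mailto:, javascript:, tel: and similar schemes
        if parsed.netloc == base_host and absolute not in explored:
            found.add(absolute)
    return found


links = same_domain_links(
    "https://example.com/docs/",
    ["intro.html", "/about", None, "mailto:hi@example.com", "https://other.org/x"],
    explored={"https://example.com/about"},
)
print(sorted(links))
```

Returning a set mirrors `urls_to_explore` in the method, so duplicate links on a page are visited only once. Note that the host comparison is exact, so `www.example.com` and `example.com` count as different sites.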