├── requirements.txt
├── assets
    └── llamaindex-agentworkflow.gif
├── LICENSE
├── README.md
└── main.py


/requirements.txt:
--------------------------------------------------------------------------------
llama-index
openai
tavily-python
selenium
helium
pillow
python-dotenv

--------------------------------------------------------------------------------
/assets/llamaindex-agentworkflow.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lesteroliver911/llamaindex-agentworkflow-browse-agent/HEAD/assets/llamaindex-agentworkflow.gif

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Lester Oliver

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AI-Powered Web Automation Agents using LlamaIndex AgentWorkflow 🤖

Automate web interactions using LLMs and LlamaIndex AgentWorkflow.

![LlamaIndex Agent Workflow Demo](https://github.com/lesteroliver911/llamaindex-agentworkflow-browse-agent/blob/main/assets/llamaindex-agentworkflow.gif)

## Why This Exists

Traditional web automation is brittle and high-maintenance. This system uses GPT-4 and LlamaIndex to create resilient, natural-language-driven automation that:

- Reduces engineering time spent on web scraping/testing
- Adapts to UI changes without code updates
- Enables non-technical team members to create automation workflows

## Core Features

- **Natural Language Control**: Write instructions in plain English
- **Intelligent Navigation**: Automatically finds and interacts with UI elements
- **Content Analysis**: Extracts and analyzes web content using GPT-4 (the model configured in `main.py`)
- **State Management**: Maintains context across multi-step workflows

## Quick Start

```bash
# Clone the repo
git clone https://github.com/lesteroliver911/llamaindex-agentworkflow-browse-agent
cd llamaindex-agentworkflow-browse-agent

# Install dependencies
pip install -r requirements.txt

# Set up environment variables: main.py reads OPENAI_API_KEY from a .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# Run, then type a browsing instruction at the prompt
python main.py
```

### Content Research
```python
instruction = """
Go to competitor.com/pricing,
compare all plan features,
screenshot the comparison table,
analyze pricing strategy
"""

# run() returns an awaitable handler, so call it from an async function
response = await workflow.run(user_msg=instruction)
```

### Web Testing
```python
instruction = """
Navigate to app.com/signup,
try creating account with invalid email,
verify error message,
screenshot results
"""

# run() returns an awaitable handler, so call it from an async function
response = await workflow.run(user_msg=instruction)
```

## Requirements

- Python 3.8+
- OpenAI API key (with access to the model set in `main.py`, `gpt-4` by default)
- Chrome/Chromium browser
- 8GB+ RAM recommended

### Custom Agent Tools
```python
async def custom_action(ctx: Context) -> str:
    """Describe the action here; the agent's LLM reads this docstring."""
    # Your custom automation logic
    return "Action completed"

# Tools are plain functions: add custom_action to the `tools` list
# passed to FunctionAgent when browser_agent is constructed in main.py.
```

### Workflow Customization
```python
workflow = AgentWorkflow(
    agents=[browser_agent, analysis_agent],
    root_agent="BrowserAgent",
    initial_state={
        "custom_data": {},
        "screenshots": []
    }
)
```

## Production Considerations

- Rate Limiting: Implement appropriate delays for web requests
- Error Handling: Add retry logic for network issues
- State Management: Consider persistent storage for workflow state
- Monitoring: Add logging for production debugging

## Roadmap

- [ ] Parallel agent execution
- [ ] Custom LLM support
- [ ] Browser profile management
- [ ] API endpoint wrapper


## License

MIT

Built with ❤️ using [LlamaIndex](https://github.com/jerryjliu/llama_index)

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import asyncio
import os
from io import BytesIO

from llama_index.llms.openai import OpenAI
from tavily import AsyncTavilyClient
from llama_index.core.workflow import Context
from llama_index.core.agent.workflow import FunctionAgent, ReActAgent
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.agent.workflow import (
    AgentInput,
    AgentOutput,
    ToolCall,
    ToolCallResult,
    AgentStream,
)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import ElementNotInteractableException, TimeoutException
import helium
from PIL import Image
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize OpenAI client with API key from environment variable
llm = OpenAI(model="gpt-4", api_key=os.getenv("OPENAI_API_KEY"))

# Initialize Chrome driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--force-device-scale-factor=1")
chrome_options.add_argument("--window-size=1000,1300")
chrome_options.add_argument("--disable-pdf-viewer")
driver = helium.start_chrome(headless=False, options=chrome_options)

# Web interaction tools
async def navigate_to(url: str) -> str:
    """Navigate to a specific URL."""
    helium.go_to(url)
    return f"Navigated to {url}"

async def click_element(text: str, element_type: str = "button") -> str:
    """Click an element with specific text."""
    try:
        if element_type == "link":
            helium.click(helium.Link(text))
        else:
            helium.click(text)
        return f"Clicked {element_type} with text: {text}"
    except Exception as e:
        # Return the error as text so the agent can decide what to do next
        return f"Error clicking element: {str(e)}"

async def search_text(text: str, nth_result: int = 1) -> str:
    """Search for text on the current page."""
    # Note: this XPath breaks if `text` itself contains a single quote
    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
    if nth_result > len(elements):
        return f"Match n°{nth_result} not found (only {len(elements)} matches found)"
    elem = elements[nth_result - 1]
    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
    return f"Found {len(elements)} matches for '{text}'. Focused on element {nth_result}"

async def take_screenshot(ctx: Context) -> str:
    """Take a screenshot of the current page."""
    # Give the page a moment to settle without blocking the event loop
    await asyncio.sleep(1.0)
    png_bytes = driver.get_screenshot_as_png()
    image = Image.open(BytesIO(png_bytes))
    # Append the image to the shared workflow state
    current_state = await ctx.get("state")
    current_state["screenshots"].append(image)
    await ctx.set("state", current_state)
    return f"Screenshot taken: {image.size} pixels"

# Define specialized agents
browser_agent = FunctionAgent(
    name="BrowserAgent",
    description="Agent capable of web browsing and interaction",
    system_prompt=(
        "You are a web browsing agent that can navigate websites, click elements, "
        "and search for text on pages. You can also take screenshots of the current page."
    ),
    llm=llm,
    tools=[navigate_to, click_element, search_text, take_screenshot],
    can_handoff_to=["AnalysisAgent"],
)

analysis_agent = FunctionAgent(
    name="AnalysisAgent",
    description="Agent for analyzing web content and screenshots",
    system_prompt=(
        "You analyze web content and screenshots to extract relevant information "
        "and provide insights based on the browsing results."
    ),
    llm=llm,
    tools=[search_text, take_screenshot],
    can_handoff_to=["BrowserAgent"],
)

# Create workflow
agent_workflow = AgentWorkflow(
    agents=[browser_agent, analysis_agent],
    root_agent=browser_agent.name,
    initial_state={
        "screenshots": [],
        "current_url": "",
        "extracted_info": "",
    },
)

# Main execution function
async def main():
    # Get user input
    user_input = input("Enter your browsing instruction: ")

    handler = agent_workflow.run(
        user_msg=user_input
    )

    # Print a banner whenever control passes to a different agent
    current_agent = None
    async for event in handler.stream_events():
        if hasattr(event, "current_agent_name") and event.current_agent_name != current_agent:
            current_agent = event.current_agent_name
            print(f"\n{'='*50}")
            print(f"🤖 Agent: {current_agent}")
            print(f"{'='*50}\n")

    # Wait for the workflow to finish and print the final response
    response = await handler
    print(response)

if __name__ == "__main__":
    try:
        asyncio.run(main())
    finally:
        # Close the browser that helium started
        helium.kill_browser()

--------------------------------------------------------------------------------
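
The README's "Production Considerations" list mentions retry logic for network issues without showing any. The following is a minimal sketch of one way to do that for async tools such as `navigate_to`. The `with_retries` decorator, its parameters, and the `flaky_navigate` stand-in are hypothetical names introduced for illustration; they are not part of the repository or of LlamaIndex.

```python
import asyncio
import functools


def with_retries(max_attempts: int = 3, base_delay: float = 0.5):
    """Hypothetical helper: retry an async tool, doubling the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)  # keep the tool's name and docstring intact
        async def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts:
                        # Return text instead of raising so an agent can react to it
                        return f"Failed after {max_attempts} attempts: {exc}"
                    await asyncio.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


# Stand-in for a browser tool: fails twice, then succeeds (illustration only)
calls = {"count": 0}


@with_retries(max_attempts=3, base_delay=0.01)
async def flaky_navigate(url: str) -> str:
    """Navigate to a specific URL."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network error")
    return f"Navigated to {url}"


result = asyncio.run(flaky_navigate("https://example.com"))
print(result)
```

Returning an error string on the final failure mirrors how `click_element` in `main.py` reports errors, which lets the agent see what went wrong rather than crashing the workflow.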
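
`search_text` in `main.py` places the search string directly inside single quotes in an XPath expression, so a string such as `What's new` produces an invalid query. One possible fix, sketched below, is to build a proper XPath 1.0 string literal first. The helper name `xpath_literal` is an assumption made for this example and does not exist in the repository or in Selenium.

```python
def xpath_literal(text: str) -> str:
    """Hypothetical helper: return `text` as a valid XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: XPath 1.0 has no escape character,
    # so join the single-quote-free pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def contains_text_xpath(text: str) -> str:
    """Build the same kind of query search_text uses, with safe quoting."""
    return f"//*[contains(text(), {xpath_literal(text)})]"


print(contains_text_xpath("Sign up"))
print(contains_text_xpath("What's new"))
```

The resulting string could be passed to `driver.find_elements(By.XPATH, ...)` in place of the f-string currently used in `search_text`.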