├── requirements.txt
├── assets
│   └── llamaindex-agentworkflow.gif
├── LICENSE
├── README.md
└── main.py


/requirements.txt:
--------------------------------------------------------------------------------
1 | llama-index
2 | openai
3 | tavily-python
4 | selenium
5 | helium
6 | pillow
7 | python-dotenv 
8 | 


--------------------------------------------------------------------------------
/assets/llamaindex-agentworkflow.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lesteroliver911/llamaindex-agentworkflow-browse-agent/HEAD/assets/llamaindex-agentworkflow.gif


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Lester Oliver
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # AI-Powered Web Automation Agents using LlamaIndex AgentWorkflow 🤖
  2 | 
  3 | Automate web interactions using LLMs and LlamaIndex AgentWorkflow.
  4 | 
  5 | ![LlamaIndex Agent Workflow Demo](https://github.com/lesteroliver911/llamaindex-agentworkflow-browse-agent/blob/main/assets/llamaindex-agentworkflow.gif)
  6 | 
  7 | ## Why This Exists
  8 | 
  9 | Traditional web automation is brittle and high-maintenance. This system uses GPT-4 and LlamaIndex to create resilient, natural-language-driven automation that:
 10 | 
 11 | - Reduces engineering time spent on web scraping/testing
 12 | - Adapts to UI changes without code updates
 13 | - Enables non-technical team members to create automation workflows
 14 | 
 15 | ## Core Features
 16 | 
 17 | - **Natural Language Control**: Write instructions in plain English
 18 | - **Intelligent Navigation**: Automatically finds and interacts with UI elements
 19 | - **Content Analysis**: Extracts and analyzes web content using GPT-4
 20 | - **State Management**: Maintains context across multi-step workflows
 21 | 
 22 | ## Quick Start
 23 | 
 24 | ```bash
 25 | # Clone the repo
 26 | git clone https://github.com/lesteroliver911/llamaindex-agentworkflow-browse-agent
 27 | cd llamaindex-agentworkflow-browse-agent
 28 | 
 29 | # Install dependencies
 30 | pip install -r requirements.txt
 31 | 
 32 | # Set up environment variables (main.py reads OPENAI_API_KEY from .env)
 33 | echo "OPENAI_API_KEY=your-key-here" > .env
 34 | 
 35 | # Run the agent (it prompts for a browsing instruction)
 36 | python main.py
 37 | ```
 38 | 
 39 | ### Content Research
 40 | ```python
 41 | instruction = """
 42 | Go to competitor.com/pricing,
 43 | compare all plan features,
 44 | screenshot the comparison table,
 45 | analyze pricing strategy
 46 | """
 47 | 
 48 | await agent_workflow.run(user_msg=instruction)  # call from inside an async function
 49 | ```
 50 | 
 51 | ### Web Testing
 52 | ```python
 53 | instruction = """
 54 | Navigate to app.com/signup,
 55 | try creating account with invalid email,
 56 | verify error message,
 57 | screenshot results
 58 | """
 59 | 
 60 | await agent_workflow.run(user_msg=instruction)  # call from inside an async function
 61 | ```
 62 | 
 63 | ## Requirements
 64 | 
 65 | - Python 3.8+
 66 | - OpenAI API key (GPT-4 access required; `main.py` sets `model="gpt-4"`)
 67 | - Chrome/Chromium browser
 68 | - 8GB+ RAM recommended
 69 | 
 70 | ### Custom Agent Tools
 71 | ```python
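 72 | # Tools are plain async functions; the docstring describes the tool to the LLM
 73 | async def custom_action(ctx: Context) -> str:
 74 |     """Your custom automation logic."""
 75 |     return "Action completed"
 76 | 
 77 | # Register it in main.py by adding it to the agent's list: tools=[..., custom_action]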
 78 | ```
 79 | 
 80 | ### Workflow Customization
 81 | ```python
 82 | workflow = AgentWorkflow(
 83 |     agents=[browser_agent, analysis_agent],
 84 |     root_agent="BrowserAgent",
 85 |     initial_state={
 86 |         "custom_data": {},
 87 |         "screenshots": []
 88 |     }
 89 | )
 90 | ```
 91 | 
 92 | ## Production Considerations
 93 | 
 94 | - Rate Limiting: Implement appropriate delays for web requests
 95 | - Error Handling: Add retry logic for network issues
 96 | - State Management: Consider persistent storage for workflow state
 97 | - Monitoring: Add logging for production debugging
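
The retry, delay, and logging points above can be combined in one small wrapper. The following is a minimal sketch using only the standard library, not part of this repo: `with_retries`, `flaky_navigate`, and the `browse_agent` logger name are hypothetical, and the broad `except Exception` is an assumption that should be narrowed to the network or Selenium errors you actually expect.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("browse_agent")  # hypothetical logger name


async def with_retries(action, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call an async action, retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await action(*args, **kwargs)
        except Exception as exc:  # assumption: narrow this in real use
            if attempt == attempts:
                logger.error("%s failed after %d attempts: %s",
                             action.__name__, attempts, exc)
                raise
            # Delay grows as base_delay, 2*base_delay, 4*base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("%s failed (attempt %d/%d), retrying in %.1fs",
                           action.__name__, attempt, attempts, delay)
            # The pause also spaces out repeated requests to the same site
            await asyncio.sleep(delay)


async def _demo():
    calls = {"n": 0}

    # Stand-in for a tool such as navigate_to: fails twice, then succeeds
    async def flaky_navigate(url: str) -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated network error")
        return f"Navigated to {url}"

    return await with_retries(flaky_navigate, "https://example.com", base_delay=0.1)


if __name__ == "__main__":
    print(asyncio.run(_demo()))
```

To apply this to an existing tool, one option is a thin wrapper such as `return await with_retries(navigate_to, url)` inside a new async function, and passing that wrapper in the agent's `tools` list instead of the original.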
 98 | 
 99 | ## Roadmap
100 | 
101 | - [ ] Parallel agent execution
102 | - [ ] Custom LLM support
103 | - [ ] Browser profile management
104 | - [ ] API endpoint wrapper
105 | 
106 | 
107 | ## License
108 | 
109 | MIT
110 | 
111 | Built with ❤️ using [LlamaIndex](https://github.com/run-llama/llama_index)
112 | 


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
  1 | from llama_index.llms.openai import OpenAI
  2 | from tavily import AsyncTavilyClient
  3 | from llama_index.core.workflow import Context
  4 | from llama_index.core.agent.workflow import FunctionAgent, ReActAgent
  5 | from llama_index.core.agent.workflow import AgentWorkflow
  6 | from llama_index.core.agent.workflow import (
  7 |     AgentInput,
  8 |     AgentOutput,
  9 |     ToolCall,
 10 |     ToolCallResult,
 11 |     AgentStream,
 12 | )
 13 | from selenium import webdriver
 14 | from selenium.webdriver.common.by import By
 15 | from selenium.webdriver.support import expected_conditions as EC
 16 | from selenium.webdriver.support.ui import WebDriverWait
 17 | from selenium.common.exceptions import ElementNotInteractableException, TimeoutException
 18 | import helium
 19 | from PIL import Image
 20 | from io import BytesIO
 21 | from time import sleep
 22 | import os
 23 | from dotenv import load_dotenv
 24 | 
 25 | # Load environment variables
 26 | load_dotenv()
 27 | 
 28 | # Initialize OpenAI client with API key from environment variable
 29 | llm = OpenAI(model="gpt-4", api_key=os.getenv("OPENAI_API_KEY"))
 30 | 
 31 | # Initialize Chrome driver
 32 | chrome_options = webdriver.ChromeOptions()
 33 | chrome_options.add_argument("--force-device-scale-factor=1")
 34 | chrome_options.add_argument("--window-size=1000,1300")
 35 | chrome_options.add_argument("--disable-pdf-viewer")
 36 | driver = helium.start_chrome(headless=False, options=chrome_options)
 37 | 
 38 | # Web interaction tools
 39 | async def navigate_to(url: str) -> str:
 40 |     """Navigate to a specific URL."""
 41 |     helium.go_to(url)
 42 |     return f"Navigated to {url}"
 43 | 
 44 | async def click_element(text: str, element_type: str = "button") -> str:
 45 |     """Click an element with specific text."""
 46 |     try:
 47 |         if element_type == "link":
 48 |             helium.click(helium.Link(text))
 49 |         else:
 50 |             helium.click(text)
 51 |         return f"Clicked {element_type} with text: {text}"
 52 |     except Exception as e:
 53 |         return f"Error clicking element: {str(e)}"
 54 | 
 55 | async def search_text(text: str, nth_result: int = 1) -> str:
 56 |     """Search for text on the current page."""
 57 |     elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
 58 |     if nth_result > len(elements):
 59 |         return f"Match n°{nth_result} not found (only {len(elements)} matches found)"
 60 |     elem = elements[nth_result - 1]
 61 |     driver.execute_script("arguments[0].scrollIntoView(true);", elem)
 62 |     return f"Found {len(elements)} matches for '{text}'. Focused on element {nth_result}"
 63 | 
 64 | async def take_screenshot(ctx: Context) -> str:
 65 |     """Take a screenshot of the current page."""
 66 |     sleep(1.0)
 67 |     png_bytes = driver.get_screenshot_as_png()
 68 |     image = Image.open(BytesIO(png_bytes))
 69 |     current_state = await ctx.get("state")
 70 |     current_state["screenshots"].append(image)
 71 |     await ctx.set("state", current_state)
 72 |     return f"Screenshot taken: {image.size} pixels"
 73 | 
 74 | # Define specialized agents
 75 | browser_agent = FunctionAgent(
 76 |     name="BrowserAgent",
 77 |     description="Agent capable of web browsing and interaction",
 78 |     system_prompt=(
 79 |         "You are a web browsing agent that can navigate websites, click elements, "
 80 |         "and search for text on pages. You can also take screenshots of the current page."
 81 |     ),
 82 |     llm=llm,
 83 |     tools=[navigate_to, click_element, search_text, take_screenshot],
 84 |     can_handoff_to=["AnalysisAgent"],
 85 | )
 86 | 
 87 | analysis_agent = FunctionAgent(
 88 |     name="AnalysisAgent",
 89 |     description="Agent for analyzing web content and screenshots",
 90 |     system_prompt=(
 91 |         "You analyze web content and screenshots to extract relevant information "
 92 |         "and provide insights based on the browsing results."
 93 |     ),
 94 |     llm=llm,
 95 |     tools=[search_text, take_screenshot],
 96 |     can_handoff_to=["BrowserAgent"],
 97 | )
 98 | 
 99 | # Create workflow
100 | agent_workflow = AgentWorkflow(
101 |     agents=[browser_agent, analysis_agent],
102 |     root_agent=browser_agent.name,
103 |     initial_state={
104 |         "screenshots": [],
105 |         "current_url": "",
106 |         "extracted_info": "",
107 |     },
108 | )
109 | 
110 | # Main execution function
111 | async def main():
112 |     # Get user input
113 |     user_input = input("Enter your browsing instruction: ")
114 |     
115 |     handler = agent_workflow.run(
116 |         user_msg=user_input
117 |     )
118 | 
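119 |     # Stream events so agent handoffs and tool results are visible as they happen
120 |     current_agent = None
121 |     async for event in handler.stream_events():
122 |         if hasattr(event, "current_agent_name") and event.current_agent_name != current_agent:
123 |             current_agent = event.current_agent_name
124 |             print(f"\n{'='*50}")
125 |             print(f"🤖 Agent: {current_agent}")
126 |             print(f"{'='*50}\n")
127 |         elif isinstance(event, ToolCallResult):
128 |             print(f"🔧 {event.tool_name}({event.tool_kwargs}) -> {event.tool_output}")
129 | 
130 |     # Await the handler to get the final response and show it to the user
131 |     response = await handler
132 |     print(str(response))
133 | 
134 | if __name__ == "__main__":
135 |     import asyncio
136 |     try:
137 |         asyncio.run(main())
138 |     finally:
139 |         helium.kill_browser()  # close the Chrome window started above
140 | 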


--------------------------------------------------------------------------------