├── web_search_agent
│   ├── __init__.py
│   └── agent.py
├── .env.template
├── .gitattributes
├── LICENSE
└── README.md

/web_search_agent/__init__.py:
--------------------------------------------------------------------------------
from . import agent
--------------------------------------------------------------------------------
/.env.template:
--------------------------------------------------------------------------------
GOOGLE_GENAI_USE_VERTEXAI="False"
GOOGLE_API_KEY="GEMINI_API_KEY"
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
# Auto detect text files and perform LF normalization
* text=auto
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 MeirKaD

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Web Search Agent using Google ADK and Bright Data MCP

This repository contains a web search agent built with Google's Agent Development Kit (ADK) and Bright Data's Model Context Protocol (MCP) server. The agent can search the web and retrieve information based on user queries.

## Prerequisites

- Python 3.12 or later
- Node.js and npm (for Bright Data MCP)
- Google Gemini API key
- Bright Data account with an active Web Unlocker API zone (for browser capabilities, a Scraping Browser zone is required as well)

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/MeirKaD/MCP_ADK.git
cd MCP_ADK
```

### 2. Create and activate a virtual environment

```bash
# For macOS/Linux
python -m venv .venv
source .venv/bin/activate

# For Windows
python -m venv .venv
.venv\Scripts\activate
```

### 3. Install the required packages

```bash
pip install google-adk google-generativeai python-dotenv
```

### 4. Install the Bright Data MCP package

```bash
npm install -g @brightdata/mcp
```
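Before continuing, you can sanity-check the toolchain. All of the following standard version commands should succeed, with Python reporting 3.12 or later:

```bash
node --version
npm --version
python --version
```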
### 5. Set up environment variables

Create a `.env` file in the root directory by copying the `.env.template`:

```bash
cp .env.template .env
```

Then, edit the `.env` file and add your Google Gemini API key:

```
GOOGLE_GENAI_USE_VERTEXAI="False"
GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
```

### 6. Configure Bright Data MCP credentials

Edit the `web_search_agent/agent.py` file and replace the placeholders with your Bright Data credentials:

```python
"API_TOKEN": "YOUR_BRIGHT_DATA_API_TOKEN",
"WEB_UNLOCKER_ZONE": "unblocker",
"BROWSER_AUTH": "brd-customer-YOUR_CUSTOMER_ID-zone-scraping_browser:YOUR_PASSWORD"
```

## Running the Agent with the ADK Web Interface

### 1. Start the ADK Web Server

```bash
adk web
```

This starts a local web server, typically at `http://localhost:8000`.

### 2. Access the Web Interface

Open your browser and navigate to `http://localhost:8000` to interact with your agent through the ADK web interface.

## How the Agent Works

The agent is built using Google's Agent Development Kit (ADK) and uses Gemini 2.0 Flash as the underlying model. It leverages Bright Data's Model Context Protocol (MCP) server to perform web searches and retrieve information from websites.

The agent initializes the MCP toolset asynchronously when the first request is received, connecting to Bright Data's services to enable web search capabilities.
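Concretely, `web_search_agent/agent.py` (reproduced in full below) wires three Gemini-backed agents into a sequential pipeline. The following is a simplified sketch of that structure; the real file adds detailed instructions, the MCP tool wiring, and a retry callback:

```python
from google.adk.agents import Agent, SequentialAgent

# Three stages run in order: plan -> research -> publish.
root_agent = SequentialAgent(
    name="web_research_agent",
    description="Researches topics on the web and writes a report",
    sub_agents=[
        Agent(name="planner", model="gemini-2.0-flash",
              description="Breaks the topic into focused search queries"),
        Agent(name="researcher", model="gemini-2.0-flash",
              description="Runs the searches via Bright Data MCP tools"),
        Agent(name="publisher", model="gemini-2.0-flash",
              description="Synthesizes the findings into a final report"),
    ],
)
```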
## Features

- Web search using Bright Data MCP
- Information retrieval from websites
- Answering questions based on web content
- Automatic cleanup of resources when the agent terminates

## Customization

You can customize the agent's behavior by modifying the `web_search_agent/agent.py` file:

- Change the model by updating the `model` parameter (see the sketch below)
- Modify the agent's description and instructions
- Add additional tools or capabilities
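For example, switching the planner to a different model only requires editing one argument in `create_planner_agent`. A minimal sketch (the model ID is illustrative; use any Gemini model your API key can access):

```python
return Agent(
    name="planner",
    model="gemini-2.5-flash",  # illustrative model ID; swap in the one you want
    description="Plans research by breaking down complex topics into search queries",
    instruction="...",  # unchanged from the original
    output_key="search_queries",
)
```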
## Troubleshooting

If you encounter issues:

1. Ensure your Google Gemini API key is valid
2. Check your Bright Data credentials
3. Verify that Node.js and npm are correctly installed
4. Make sure you have the correct version of Python and all required packages

## License

MIT

## Acknowledgements

- Google Agent Development Kit (ADK)
- Bright Data MCP
--------------------------------------------------------------------------------
/web_search_agent/agent.py:
--------------------------------------------------------------------------------
import threading
import asyncio
from dotenv import load_dotenv
from google.adk.agents import Agent, SequentialAgent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest, LlmResponse
from google.genai import types
from typing import Optional

load_dotenv()

# Module-level state for the lazily initialized MCP toolset.
_mcp_tools = None
_exit_stack = None
_initialized = False
_initialization_in_progress = False
_init_lock = threading.Lock()

print("Module loaded: web_research_agent")

def create_planner_agent():
    return Agent(
        name="planner",
        model="gemini-2.0-flash",
        description="Plans research by breaking down complex topics into search queries",
        instruction="""
        You are a research planning expert. Your task is to:
        1. Analyze the user's research topic
        2. Break it down into 3-5 specific search queries that together will cover the topic comprehensively
        3. Output a JSON object with format: {"queries": ["query1", "query2", "query3"]}
        Be concise and focused in your search queries.
        """,
        output_key="search_queries"
    )
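# For reference, the planner's output (stored under the "search_queries"
# output_key) is expected to look like this -- illustrative values only:
#   {"queries": ["history of topic X", "current state of X", "open problems in X"]}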
# Define researcher agent with improved tool guidance
def create_researcher_agent():
    return Agent(
        name="researcher",
        model="gemini-2.0-flash",
        description="Executes web searches and extracts relevant information",
        instruction="""
        You are a web researcher. You will:
        1. Take the specific search queries from the planner
        2. For EACH query:
           a. Use search_engine to find relevant information (start with "google" engine)
           b. Select the 2-3 most relevant results and for each result:
              i. Use scraping_browser_navigate to navigate to the URL
              ii. Use scraping_browser_get_text to extract the main content
              iii. If needed, use scraping_browser_links to find important sections and scraping_browser_click to navigate to them
           c. If a page fails to load or lacks information, try another result
        3. Summarize key findings for each query with source citations

        IMPORTANT:
        - Always begin with search_engine to discover relevant pages
        - Then use browser tools in this sequence:
          1. scraping_browser_navigate (to go to the URL)
          2. scraping_browser_get_text (to extract content)
          3. scraping_browser_links and scraping_browser_click (if you need to navigate within the site)
        - Include clear citations with URLs for each piece of information
        - Format your findings for each search query separately
        """,
        before_model_callback=check_researcher_tools
    )

# Define publisher agent with clear instruction
def create_publisher_agent():
    return Agent(
        name="publisher",
        model="gemini-2.0-flash",
        description="Synthesizes research findings into a comprehensive and detailed final document",
        instruction="""
        You are an expert Technical Writer and Synthesist. Your mission is to transform the detailed research findings provided by the researcher into a comprehensive, well-structured, and insightful final report.

        Follow these steps meticulously:
        1. **Deep Analysis & Synthesis:** Carefully review *all* the research findings, summaries, and cited sources provided by the researcher for *all* search queries. Do not just list findings; **synthesize** them. Identify connections, relationships, common themes, contrasting points, and overall patterns across the different pieces of information and sources.
        2. **Logical Structure:** Organize the synthesized information into a coherent and deeply structured document. Use logical sections and sub-sections with clear, descriptive headings (using Markdown H2, H3, etc.) to group related concepts and findings. A possible structure could be: Introduction, Key Theme/Aspect 1 (with sub-points), Key Theme/Aspect 2 (with sub-points), ..., Conclusion, References. Adapt the structure based on the content.
        3. **Compelling Introduction:** Write a robust introduction that clearly defines the topic, states the report's main objectives, highlights the key questions or areas explored, and provides a roadmap for the reader, outlining the main sections of the report.
        4. **Detailed Body Sections:** Elaborate on the synthesized findings within each section. Provide sufficient detail and explanation. Explain concepts clearly. Ensure that claims and statements are directly supported by the research gathered by the researcher. **Explicitly reference the source URLs** where appropriate within the text (e.g., "According to [Source URL], ..."). Aim for thoroughness and depth, ensuring all significant aspects uncovered by the research are included. Use bullet points or numbered lists for clarity where appropriate. Ensure smooth transitions between paragraphs and sections.
        5. **Insightful Conclusion:** Craft a strong conclusion that summarizes the most important findings and synthesized insights from the report. Briefly reiterate the main points discussed. You may also briefly mention limitations based *only* on the provided research or suggest natural next steps *if strongly implied* by the findings, but do *not* introduce entirely new information or opinions.
        6. **Professional Formatting:** Format the entire document using clean and consistent Markdown. Utilize headings, lists (bulleted and numbered), bold/italic emphasis, and potentially blockquotes effectively to enhance readability and structure.
        7. **Comprehensive References:** Create a dedicated "References" section at the very end. List *all* unique source URLs that were cited in the researcher's findings and used in your report. Ensure the list is clean and easy to read.
        8. **Tone and Quality:** Maintain a professional, objective, and informative tone throughout the report. Ensure the language is clear, precise, and accurate according to the research. Strive for a high-quality, polished final document that is significantly more detailed and synthesized than the raw researcher output. Cover all key aspects comprehensively.
        """,
        output_key="final_document"
    )

# Create a single initialization function that leverages the EXISTING event loop
async def initialize_mcp_tools():
    """Initialize MCP tools using the existing event loop."""
    global _mcp_tools, _exit_stack, _initialized, _initialization_in_progress

    if _initialized:
        return _mcp_tools

    # Decide under the lock whether this caller should perform the
    # initialization. The lock is never held across an await: sleeping while
    # holding a threading.Lock would stall other coroutines on the same loop.
    with _init_lock:
        if _initialized:
            return _mcp_tools
        should_wait = _initialization_in_progress
        if not should_wait:
            _initialization_in_progress = True

    if should_wait:
        while _initialization_in_progress:
            await asyncio.sleep(0.1)
        return _mcp_tools

    try:
        from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

        print("Connecting to Bright Data MCP...")
        tools, exit_stack = await MCPToolset.from_server(
            connection_params=StdioServerParameters(
                command='npx',
                args=["-y", "@brightdata/mcp"],
                env={
                    "API_TOKEN": "YOUR_API_TOKEN",
                    "WEB_UNLOCKER_ZONE": "UB_ZONE",
                    "BROWSER_AUTH": "SBR_USER:SBR_PASS"
                }
            )
        )
        print(f"MCP Toolset created successfully with {len(tools)} tools")

        _mcp_tools = tools
        _exit_stack = exit_stack

        import atexit

        def cleanup_mcp():
            global _exit_stack
            if _exit_stack:
                print("Closing MCP server connection...")
                try:
                    # atexit runs after the main event loop is gone, so close
                    # the exit stack on a fresh loop.
                    loop = asyncio.new_event_loop()
                    loop.run_until_complete(_exit_stack.aclose())
                    loop.close()
                    print("MCP server connection closed successfully.")
                except Exception as e:
                    print(f"Error closing MCP connection: {e}")
                finally:
                    _exit_stack = None

        atexit.register(cleanup_mcp)

        _initialized = True

        # Attach the tools to the researcher sub-agent; root_agent is defined
        # at the bottom of this module and exists by the time this runs.
        for agent in root_agent.sub_agents:
            if agent.name == "researcher":
                agent.tools = tools
                print(f"Successfully added {len(tools)} tools to researcher agent")

                # List some tool names for debugging
                tool_names = [tool.name for tool in tools[:5]]
                print(f"Available tools include: {', '.join(tool_names)}")
                break

        print("MCP initialization complete!")
        return tools

    except Exception as e:
        print(f"Error initializing MCP tools: {e}")
        return None
    finally:
        _initialization_in_progress = False


async def wait_for_initialization():
    """Wait for MCP initialization to complete."""
    global _initialized

    if not _initialized:
        print("Starting initialization in callback...")
        await initialize_mcp_tools()

    return _initialized
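# Summary of the lazy-initialization flow (descriptive comment, not executed):
#   1. The first request that reaches the researcher triggers
#      check_researcher_tools (below), which sees _initialized is False.
#   2. The callback schedules initialize_mcp_tools() as a background task on
#      the running event loop and short-circuits the LLM call with a
#      "please retry" message.
#   3. On the next request the MCP tools are already attached to the
#      researcher, so the callback returns None and the model runs normally.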
def check_researcher_tools(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
    global _mcp_tools, _initialized

    agent_name = callback_context.agent_name

    if agent_name == "researcher" and not _initialized:
        print("Researcher agent needs tools - will start initialization")

        # The callback is invoked from within ADK's event loop, so schedule
        # the initialization as a background task on that loop.
        loop = asyncio.get_running_loop()
        loop.create_task(initialize_mcp_tools())

        print("Initialization started in background. Asking user to retry.")
        return LlmResponse(
            content=types.Content(
                role="model",
                parts=[types.Part(text="Initializing research tools. This happens only once. Please try your query again in a few moments.")]
            )
        )

    return None

root_agent = SequentialAgent(
    name="web_research_agent",
    description="An agent that researches topics on the web and creates comprehensive reports",
    sub_agents=[
        create_planner_agent(),
        create_researcher_agent(),
        create_publisher_agent()
    ]
)

print("Agent structure created. MCP tools will be initialized on first use.")
--------------------------------------------------------------------------------