├── .env.test
├── .gitignore
├── LICENSE
├── README.md
├── agent.py
├── agent_prompt.txt
├── browser.py
├── examples
│   ├── .DS_Store
│   ├── events-in-bangalore.json
│   ├── events-in-bangalore.webm
│   └── repo-issues.mp4
├── index.py
├── requirements.txt
└── server.py

--------------------------------------------------------------------------------
/.env.test:
--------------------------------------------------------------------------------
ANTHROPIC_API_KEY=
DEEPSEEK_API_KEY=
OMNIPARSER_API=

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
agent
*.env

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Addy Bhatia

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# OneQuery

[![GitHub License](https://img.shields.io/github/license/addy999/onequery)](https://github.com/addy999/onequery/blob/main/LICENSE)
[![GitHub Last Commit](https://img.shields.io/github/last-commit/addy999/onequery)](https://github.com/addy999/onequery/commits/main)

> 🔨 **Note:** This repository is still in development. Contributions and feedback are welcome!

## Setup

- Requirements: `pip install -r requirements.txt`
- Install a browser: `python -m playwright install`
  - This project uses Playwright to control the browser. You can install the browser of your choice using the command above.
- Write your environment variables in a `.env` file (see `.env.test`)
- Install OmniParser
  - For webpage analysis, we use the [OmniParser](https://huggingface.co/spaces/microsoft/OmniParser) model from Hugging Face. You'll need to host it via an [API](https://github.com/addy999/omniparser-api) locally.
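  - Once the server is up, you can sanity-check it before running the agent. A minimal sketch — this assumes the API is exposed at the URL you set in `OMNIPARSER_API` and accepts the same parameters `browser.py` sends:

    ```python
    import os
    import requests

    url = os.environ["OMNIPARSER_API"]
    with open("screenshot.png", "rb") as f:  # any PNG will do
        files = {"image_file": ("image.png", f.read(), "image/png")}

    # box_threshold / iou_threshold mirror the values used in browser.py
    resp = requests.post(url, files=files, params={"box_threshold": 0.05, "iou_threshold": 0.1})
    resp.raise_for_status()
    print(resp.json()["parsed_content_list"])  # detected elements on the page
    ```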

## Examples

- Finding issues on a GitHub repo

[![Video Demo 1](http://img.youtube.com/vi/a_QPDnAosKM/0.jpg)](https://youtu.be/a_QPDnAosKM?si=pXtZgrRlvXzii7FX "Finding issues on a GitHub repo")

- Finding live events

[![Video Demo 2](http://img.youtube.com/vi/sp_YuZ1Q4wU/0.jpg)](https://youtu.be/sp_YuZ1Q4wU?feature=shared "Finding live events")

## Usage

### General query with no source to start with

```python
import asyncio
from pydantic import BaseModel

task = "Find 2 recent issues from PyTorch repository."

class IssueModel(BaseModel):
    date: str
    title: str
    author: str
    description: str

class OutputModel(BaseModel):
    issues: list[IssueModel]

scraper = WebScraper(task, None, OutputModel)
asyncio.run(scraper.run())  # WebScraper.run() is a coroutine
```

### If you know the URL

```python
import asyncio
from pydantic import BaseModel

start_url = "https://in.bookmyshow.com/"
task = "Find 5 events happening in Bangalore this week."

class EventsModel(BaseModel):
    name: str
    date: str
    location: str

class OutputModel(BaseModel):
    events: list[EventsModel]

scraper = WebScraper(task, start_url, OutputModel)
asyncio.run(scraper.run())
```

### Serving with a REST API

Server:

```bash
pip install fastapi[all]
```

```bash
uvicorn server:app --reload
```

Client (`schema` maps each field name to a type name, which the server turns into a Pydantic model):

```python
import requests

url = "http://127.0.0.1:8000/scrape"

payload = {
    "start_url": "http://example.com",
    "task": "Scrape the website for data",
    "schema": {
        "title": "str",
        "description": "str"
    }
}

response = requests.post(url, json=payload)

print(response.status_code)
print(response.json())
```

> 💡 **Tip:** For a hosted solution with a lightning-fast Zig-based browser, worldwide proxy support, and a job-queuing system, check out [onequery.app](https://www.onequery.app).
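
### Example output

Whichever entry point you use, the scraper returns JSON matching your schema. For instance, the Bangalore events task above produced `examples/events-in-bangalore.json`, which looks like this (truncated):

```json
{
    "events": [
        {
            "name": "Sonu Nigam Live in Concert",
            "date": "Sun, 23 Feb",
            "location": "Phoenix Marketcity, Bengaluru"
        }
    ]
}
```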

## Testing

In the works.

## Status

- ✅ Basic functionality
- 🛠️ Testing
- 🛠️ Documentation

## Architecture

(needs to be revised)

### Flowchart

```mermaid
graph TD;
    A[Text Query] --> B[WebLLM];
    B --> C[Browser Instructions];
    C --> D[Browser Execution];
    D --> E[OmniParser];
    E --> F[Screenshot & Structured Info];
    F --> G[AI];
    C --> G;
    G --> H[JSON Output];
```

### Stack

- Browser: [Playwright](https://github.com/microsoft/playwright-python)
- VLLM: [OmniParser](https://github.com/addy999/omniparser-api)

## Alternatives

- https://github.com/CognosysAI/browser/

--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
import re
import json
import ast
import os
from functools import lru_cache
from datetime import datetime

import litellm
from litellm import completion
from together import Together

client = Together()

base_path = os.path.dirname(os.path.abspath(__file__))
RAW_SYSTEM_PROMPT = open(os.path.join(base_path, "agent_prompt.txt")).read()
# SYSTEM_JSON_PROMPT = open(os.path.join(base_path, "agent_prompt_json.txt")).read()

# litellm.set_verbose = True
litellm.modify_params = True


def parse_text(text):
    # The tags below match the output format requested in agent_prompt.txt.
    next_action_pattern = r"<next-action>\s*(.*?)\s*</next-action>"
    next_action2_pattern = r"<next-action-2>\s*(.*?)\s*</next-action-2>"
    explanation_pattern = r"<explanation>\s*(.*?)\s*</explanation>"
    next_task_pattern = r"<next-task>\s*(.*?)\s*</next-task>"

    next_action_match = re.search(next_action_pattern, text, re.DOTALL)
    next_action2_match = re.search(next_action2_pattern, text, re.DOTALL)
    explanation_match = re.search(explanation_pattern, text, re.DOTALL)
    next_task_match = re.search(next_task_pattern, text, re.DOTALL)

    result = {
        "next_action": next_action_match.group(1) if next_action_match else None,
        "next_action_2": (next_action2_match.group(1) if next_action2_match else None),
        "explanation": explanation_match.group(1) if explanation_match else None,
        "next_task": next_task_match.group(1) if next_task_match else None,
    }

    return result


def is_valid_json(string: str) -> bool:
    try:
        json.loads(string)
        return True
    except json.JSONDecodeError:
        return False


def clean_up_json(string: str) -> str:
    def extract_json_from_string(string):
        start_index = string.find("{")
        end_index = string.rfind("}")
        if start_index != -1 and end_index != -1:
            return string[start_index : end_index + 1]
        return ""

    # Strip markdown fences (note: this also removes any literal "json" substrings)
    cleaned = (
        extract_json_from_string(string)
        .strip()
        .replace("\n", "")
        .replace('\\"', '"')
        .replace("```", "")
        .replace("json", "")
    )

    # Check if there's a missing "}" at the end and add it
    if cleaned.count("{") > cleaned.count("}"):
        cleaned += "}"

    if not is_valid_json(cleaned):
        try:
            cleaned = json.dumps(ast.literal_eval(cleaned))
        except (ValueError, SyntaxError):
            raise ValueError("String not valid", cleaned)
    return cleaned


def get_reply(state, mode="anthropic") -> dict:
    today_date = datetime.now().strftime("%Y-%m-%d")
f"{RAW_SYSTEM_PROMPT}\n\nToday's date: {today_date}" 83 | if mode == "ollama": 84 | reply = ( 85 | completion( 86 | model="ollama/llama3.3", 87 | max_tokens=256, 88 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 89 | temperature=0.3, 90 | ) 91 | .choices[0] 92 | .message.content 93 | ) 94 | elif mode == "deepseek": 95 | reply = ( 96 | completion( 97 | model="deepseek/deepseek-chat", 98 | max_tokens=256, 99 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 100 | temperature=0.3, 101 | ) 102 | .choices[0] 103 | .message.content 104 | ) 105 | elif mode == "anthropic": 106 | reply = ( 107 | completion( 108 | model="anthropic/claude-3-5-sonnet-20241022", 109 | max_tokens=256, 110 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 111 | temperature=0.3, 112 | ) 113 | .choices[0] 114 | .message.content 115 | ) 116 | elif mode == "deepseek-r1": 117 | reply = ( 118 | client.chat.completions.create( 119 | model="deepseek-ai/DeepSeek-R1", 120 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 121 | ) 122 | .choices[0] 123 | .message.content 124 | ) 125 | return parse_text(reply) 126 | 127 | 128 | def summarize_text(prompt: str, documents: list, schema: str) -> str: 129 | return json.loads( 130 | clean_up_json( 131 | ( 132 | "{" 133 | + completion( 134 | model="anthropic/claude-3-5-sonnet-20241022", 135 | max_tokens=1000, 136 | temperature=0.3, 137 | messages=[ 138 | { 139 | "role": "system", 140 | "content": f"""Summarize the following documents for this prompt in JSON format. 141 | 142 | Prompt: {prompt} 143 | 144 | Return using this schema: {schema}""", 145 | }, 146 | { 147 | "role": "user", 148 | "content": [ 149 | { 150 | "type": "text", 151 | "text": text, 152 | } 153 | for text in documents 154 | ], 155 | }, 156 | {"role": "assistant", "content": "{"}, 157 | ], 158 | ) 159 | .choices[0] 160 | .message.content 161 | ) 162 | ) 163 | ) 164 | 165 | 166 | def fetch_query_for_rag(task: str) -> str: 167 | response = clean_up_json( 168 | "{" 169 | + completion( 170 | model="anthropic/claude-3-5-sonnet-20241022", 171 | max_tokens=256, 172 | temperature=0.3, 173 | messages=[ 174 | { 175 | "role": "user", 176 | "content": "Generate a simple keyword/phrase query for a RAG system based on the following task. Return the query as JSON with 'query' key. The query should help fetch documents relevant to the task: " 177 | + task, 178 | }, 179 | {"role": "assistant", "content": "{"}, 180 | ], 181 | ) 182 | .choices[0] 183 | .message.content 184 | ) 185 | return json.loads(response)["query"] 186 | 187 | 188 | @lru_cache(maxsize=128, typed=True) 189 | def find_schema_for_query(query: str) -> str: 190 | return clean_up_json( 191 | # "{" 192 | completion( 193 | model="claude-3-5-haiku-20241022", 194 | temperature=0.5, 195 | max_tokens=512, 196 | messages=[ 197 | { 198 | "role": "system", 199 | "content": """You're an expert in data science. You're helping a colleague form JSON schemas for their data. You're given a query and asked to find the schema for it. 200 | 201 | Example: 202 | Query: Find 2 recent issues from PyTorch repository. 203 | Schema: {'properties': {'date': {'title': 'Date', 'type': 'string'}, 'title': {'title': 'Title', 'type': 'string'}, 'author': {'title': 'Author', 'type': 'string'}, 'description': {'title': 'Description', 'type': 'string'}}, 'required': ['date', 'title', 'author', 'description'], 'title': 'IssueModel', 'type': 'object'} 204 | 205 | Example: 206 | Query: Find 5 events happening in Bangalore this week. 
Schema: {'properties': {'name': {'title': 'Name', 'type': 'string'}, 'date': {'title': 'Date', 'type': 'string'}, 'location': {'title': 'Location', 'type': 'string'}}, 'required': ['name', 'date', 'location'], 'title': 'EventsModel', 'type': 'object'}""",
                },
                {
                    "role": "user",
                    "content": f"""Find the schema for the following query: {query}.""",
                },
                # {"role": "assistant", "content": "{"},
            ],
        )
        .choices[0]
        .message.content
    )

--------------------------------------------------------------------------------
/agent_prompt.txt:
--------------------------------------------------------------------------------
You are a web browsing agent tasked with navigating web pages and performing actions based on given instructions and visual information. Your goal is to determine the next 1-2 appropriate actions to take on a webpage, given an initial task and a list of elements on the current page state.

The list of elements contains boxes highlighting relevant HTML elements, each with a unique identifier (UID) listed. UID must be an integer only.

Avoid signing in to any website - you can only access public data.

The possible actions you can take are:
1. change(value=[str], uid=[str]) - Change the value of an element
2. click(uid=[str]) - Click on an element
3. scroll(x=[int], y=[int]) - Scroll the page
4. submit(uid=[str]) - Submit a form
5. text_input(text=[str], uid=[str]) - Input text into a field
6. enter - Press enter if inside a text box previously
7. back - Go back a page
8. nothing - If no more actions are needed
9. search(value=[str]) - Google a query if the webpage isn't helpful and you need to find other websites to visit

To determine the next action:
1. Carefully analyze the elements and the initial task.
2. Consider which HTML elements are relevant to accomplishing the task.
3. If any modals are open (like cookie banners or pop-ups), close them.
4. Determine the most appropriate action to take based on the available elements and the task at hand.
5. Choose one of the possible actions listed above that best fits the current situation.
6. Do not duplicate the last action.
7. It is possible that no action will be required. Assume all webpages have been recorded by another agent as they are visited.
8. If you have recently navigated to a new page and there's not enough content on the page, try scrolling down.
9. If you're on a search engine page, run the search action instead of modifying text input.
10. If the webpage says you've been blocked or if you're unable to access a webpage, go back to search and try a different website.
11. If you keep seeing blocked webpages, exit.
12. You cannot create a second action if the first action is `search`, `back`, or `click`.

Once you have determined the next action, output your decision in the following format:

<next-action>
[Insert the chosen action here, following the format specified in the action list]
</next-action>

<next-action-2>
[If applicable, insert the next chosen action here to follow the previous action, also following the format specified in the action list]
</next-action-2>

<next-task>
[One sentence to instruct the next agent to continue this task]
</next-task>

Provide a brief explanation for your chosen action(s):

<explanation>
[Insert your explanation here]
</explanation>

Remember to base your decision solely on the information provided in the initial task and the elements. Do not assume or infer any additional information beyond what is explicitly stated.

Example actions:
- click(uid="1")
- text_input(text="username", uid="12")
- change(value="new_value", uid="5")
- scroll(x=0, y=100)
- submit(uid="3")
- enter
- back
- nothing
- search(value="top news in new york today")

--------------------------------------------------------------------------------
/browser.py:
--------------------------------------------------------------------------------
import ast
import asyncio
import base64
import os
import shutil
import time
import ua_generator
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple, Union

import requests
from playwright.async_api import Page, async_playwright
from pydantic import BaseModel

from .index import RAGSystem
from .agent import fetch_query_for_rag, get_reply, summarize_text


def call_process_image_api(
    image_path, box_threshold=0.05, iou_threshold=0.1, timeout=60
):
    url = os.environ.get("OMNIPARSER_API")
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()

    files = {"image_file": ("image.png", image_data, "image/png")}
    params = {"box_threshold": box_threshold, "iou_threshold": iou_threshold}

    for attempt in range(2):
        response = requests.post(url, files=files, params=params, timeout=timeout)
        if response.status_code == 200:
            resp = response.json()
            return resp["image"], resp["parsed_content_list"], resp["label_coordinates"]
        else:
            if attempt == 1:
                raise Exception(
                    f"Request failed with status code {response.status_code}"
                )
            time.sleep(1)  # Wait a bit before retrying


# wake up the server
try:
    call_process_image_api("downloaded_image.png", 0.05, 0.1, timeout=60)
except Exception:
    pass


@dataclass
class WebElement:
    id: int
    text: str
    x: float
    y: float
    width: float
    height: float
    element_type: str  # 'text' or 'icon'

    @property
    def center(self) -> Tuple[float, float]:
        """Returns the center coordinates of the element"""
        return (self.x + (self.width / 2), self.y + (self.height / 2))

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        """Returns the boundary coordinates (x1, y1, x2, y2)"""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


class WebPageProcessor:
    def __init__(self):
        self.elements: Dict[int, WebElement] = {}

    def load_elements(self, text_boxes: str, coordinates: str) -> None:
        """
        Load elements from the processed webpage data

        Args:
            text_boxes: String mapping ID to text content
            coordinates: String mapping ID to [x, y, width, height] lists
        """

        self.elements = {}

        def parse_text_boxes(text: str) -> dict:
            # Split into lines and filter empty lines
            lines = [line.strip() for line in text.split("\n") if line.strip()]

            # Dictionary to store results
            boxes = {}

            for line in lines:
                # Split on ":" to separate ID from text
                id_part, text_part = line.split(":", 1)

                # Extract ID number using string operations
                id_str = id_part.split("ID")[1].strip()
                id_num = int(id_str)

                # Store in dictionary with cleaned text
                boxes[id_num] = text_part.strip()

            return boxes

        def parse_coordinates(coords: str) -> dict:
            """
            Example string:
            `{'0': [0.89625, 0.04333332697550456, 0.06125, 0.03], '1': [0.01875, 0.14499998728434244, 0.34875, 0.03833333333333333]}`
            """
            return ast.literal_eval(coords)

        coordinates = parse_coordinates(coordinates)
        for element_id, text in parse_text_boxes(text_boxes).items():
            id_str = str(element_id)
            if id_str in coordinates:
                coords = coordinates[id_str]
                element_type = "icon" if "Icon Box" in text else "text"

                self.elements[element_id] = WebElement(
                    id=element_id,
                    text=text.strip(),
                    x=coords[0],
                    y=coords[1],
                    width=coords[2],
                    height=coords[3],
                    element_type=element_type,
                )

    async def click_element(self, page, element_id: int) -> None:
        """Click an element using its center coordinates"""
        if element_id not in self.elements:
            raise ValueError(f"Element ID {element_id} not found")

        element = self.elements[element_id]
        x, y = element.center

        # Convert normalized coordinates to actual pixels.
        # viewport_size is a property on Playwright pages, not an awaitable method.
        viewport_size = page.viewport_size
        actual_x = x * viewport_size["width"]
        actual_y = y * viewport_size["height"]

        await page.mouse.click(actual_x, actual_y)

    def find_elements_by_text(
        self, text: str, partial_match: bool = True
    ) -> List[WebElement]:
        """Find elements containing the specified text"""
        matches = []
        for element in self.elements.values():
            if partial_match and text.lower() in element.text.lower():
                matches.append(element)
            elif not partial_match and text.lower() == element.text.lower():
                matches.append(element)
        return matches

    def get_nearby_elements(
        self, element_id: int, max_distance: float = 0.1
    ) -> List[WebElement]:
        """Find elements within a certain distance of the specified element"""
        if element_id not in self.elements:
            raise ValueError(f"Element ID {element_id} not found")

        source = self.elements[element_id]
        nearby = []

        for element in self.elements.values():
            if element.id == element_id:
                continue

            # Calculate center-to-center distance
            sx, sy = source.center
            ex, ey = element.center
            distance = ((sx - ex) ** 2 + (sy - ey) ** 2) ** 0.5

            if distance <= max_distance:
                nearby.append(element)

        return nearby


@dataclass
class Action:
    action_type: str
    params: Optional[Dict[str, Union[str, int]]]


class PlaywrightExecutor:
    def __init__(self, page: Page, web_processor: "WebPageProcessor"):
        self.page = page
        self.processor = web_processor

    async def execute_action(self, action_str: str) -> None:
        """Execute a Playwright action from a string command."""
        print("> Executing action:", action_str)
        action = self.parse_action(action_str)
        element = None
        if "uid" in action.params:
            element = self.processor.elements.get(int(action.params["uid"]))
            if not element:
                raise ValueError(f"Element with uid {action.params['uid']} not found")
        if action.action_type == "click":
            await self._execute_click(element)
        elif action.action_type == "text_input":
            await self._execute_text_input(element, action.params["text"])
        elif action.action_type == "change":
            await self._execute_change(element, action.params["value"])
        elif action.action_type == "load":
            await self._execute_load(action.params["url"])
        elif action.action_type == "scroll":
            await self._execute_scroll(int(action.params["x"]), int(action.params["y"]))
        elif action.action_type == "submit":
            await self._execute_submit(element)
        elif action.action_type == "back":
            await self.page.go_back()
        elif action.action_type == "enter":
            await self.page.keyboard.press("Enter")
        elif action.action_type == "nothing":
            pass
        else:
            raise ValueError(f"Unknown action type: {action.action_type}")

    def parse_action(self, action_str: str) -> Action:
        """Parse an action string into an Action object."""
        if action_str == "back":
            return Action(action_type="back", params={})
        if action_str == "enter":
            return Action(action_type="enter", params={})
        if action_str == "nothing":
            return Action(action_type="nothing", params={})
        action_type = action_str[: action_str.index("(")]
        params_str = action_str[action_str.index("(") + 1 : action_str.rindex(")")]
        params = {}
        if params_str:
            # NOTE: naive split - quoted values that contain commas are not supported.
            param_pairs = params_str.split(",")
            for pair in param_pairs:
                key, value = pair.split("=", 1)
                key = key.strip()
                value = value.strip().strip("\"'")
                params[key] = value
        return Action(action_type=action_type, params=params)

    async def _execute_click(self, element: "WebElement") -> None:
        """Execute a click action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.move(actual_x, actual_y)
        await self.page.mouse.click(actual_x, actual_y, delay=100)

    async def _execute_text_input(self, element: "WebElement", text: str) -> None:
        """Execute a text input action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y, delay=100)
        await self.page.keyboard.type(text, delay=100)

    async def _execute_change(self, element: "WebElement", value: str) -> None:
        """Execute a change action: select the existing text and type over it."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y)
        # ControlOrMeta maps to Cmd on macOS and Ctrl elsewhere.
        await self.page.keyboard.press("ControlOrMeta+A")
        await self.page.keyboard.type(value, delay=100)

    async def _execute_load(self, url: str) -> None:
        """Execute a load action."""
        await self.page.goto(url)

    async def _execute_scroll(self, x: int, y: int) -> None:
        """Execute a scroll action."""
        await self.page.evaluate(f"window.scrollTo({x}, {y})")
        await self.page.wait_for_timeout(1000)

    async def _execute_submit(self, element: "WebElement") -> None:
        """Execute a submit action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y)


class WebScraper:
    def __init__(self, task, start_url, output_model: BaseModel, callback=None):
        self.logs = []
        self.log_callback = callback

        self._log("Initializing WebScraper...")
        self.task = task
        # Fall back to Google search when no starting URL is given.
        self.start_url = start_url or "https://www.google.com"
        index_path = "output/index"
        if os.path.exists(index_path):
            shutil.rmtree(index_path)
        self.rag = RAGSystem(index_path="output/index")
        self.web_processor = WebPageProcessor()
        self.output_model = output_model
        self.browser = None
        self.iteration_count = 0
        self._log("Done initializing WebScraper")

    def _log(self, message):
        self.logs.append(message)
        if self.log_callback:
            self.log_callback(message)

    async def main(self, p):
        # locally
        self._log("Starting browser...")
        self.browser = await p.chromium.launch(
            headless=True,
        )

        user_agent = ua_generator.generate(device="desktop")

        context = await self.browser.new_context(
            # Apply the generated user agent so it actually reaches the browser.
            user_agent=user_agent.text,
            record_video_dir="videos/",
            record_video_size={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await page.set_viewport_size({"width": 1920, "height": 1080})
        next_task = (
            "Find the website to visit."
            if "google.com" in self.start_url
            else "Figure out what to do on the website."
        )
        next_action = f'load(url="{self.start_url}")'
        second_action = None
        max_iterations = 30
        self.iteration_count = 0
        state = [
            {
                "role": "user",
                "content": f"""Overall goal: {self.task}. Try to find the following information in your search: {self.output_model.model_json_schema()["properties"]}""",
            },
            {
                "role": "assistant",
                "content": "Okay. Let's get started.",
            },
        ]
        while next_task and self.iteration_count < max_iterations:
            executor = PlaywrightExecutor(page, self.web_processor)
            self._log(f"> Executing action {next_action}")
            await executor.execute_action(next_action)
            await asyncio.sleep(1)  # non-blocking pause inside the event loop
            if second_action:
                self._log(f"> Executing second action {second_action}")
                await executor.execute_action(second_action)
                await asyncio.sleep(1)
            self._log("> Inspecting the screen...")
            start_time = datetime.now()
            await page.screenshot(path="screenshot.png", scale="css")
            img, parsed, coordinates = call_process_image_api(
                "screenshot.png", 0.2, 0.1
            )

            # Save the base64 image locally as "screenshot.png"
            image_data = base64.b64decode(img)
            with open("screenshot.png", "wb") as f:
                f.write(image_data)

            end_time = datetime.now()
            self._log(f"Inspection took: {(end_time - start_time).total_seconds()}s")
            self.web_processor.load_elements(parsed, coordinates)
            text_content = " ".join(
                [
                    a.text
                    for a in self.web_processor.elements.values()
                    if a.element_type == "text"
                ]
            )
            self.rag.add_document(
                text_content,
                {"url": page.url, "timestamp": datetime.now().isoformat()},
            )
            state.append(
                {
                    "role": "user",
                    "content": "Elements on screen: " + parsed,
                    # To also send the annotated screenshot, use a content list:
                    # {
                    #     "type": "image",
                    #     "source": {
                    #         "type": "base64",
                    #         "media_type": "image/png",
                    #         "data": img,
                    #     },
                    # },
                }
            )
            self._log("> Getting reply from AI...")
            start_time = datetime.now()
            reply = get_reply(state)
            self._log(
                f"> AI time taken: {(datetime.now() - start_time).total_seconds()}s"
            )

            next_task, next_action, second_action = (
                reply["next_task"],
                reply["next_action"],
                reply.get("next_action_2"),
            )
            self._log(
                f"> Next_task: {next_task}, Next action: {next_action}, Second action: {second_action}"
            )
            state.append(
                {
                    "role": "assistant",
                    "content": f"Next task: {next_task}. Next action: {next_action}",
                }
            )

            if next_action == "nothing" or next_action is None:
                self._log("> No further action required.")
                break
            self.iteration_count += 1
        return page, context

    async def run(self):
        async with async_playwright() as p:
            start = time.time()
            page, context = await self.main(p)

            rag_query = fetch_query_for_rag(self.task)
            self._log(f"> Querying RAG for task: {rag_query}")
            docs = [a["text"] for a in self.rag.query(rag_query)]
            answer = summarize_text(self.task, docs, self.output_model.model_json_schema())
            # self._log(f"> Answer: {answer}")
            self._log(f"> Total time taken: {time.time() - start}s")

            try:
                await context.close()
            except Exception as e:
                # Closing the context is best-effort; don't fail the run over it.
                self._log(f"> Warning: could not close browser context: {e}")

            return answer

--------------------------------------------------------------------------------
/examples/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/.DS_Store

--------------------------------------------------------------------------------
/examples/events-in-bangalore.json:
--------------------------------------------------------------------------------
{
    "events": [
        {
            "name": "Sonu Nigam Live in Concert",
            "date": "Sun, 23 Feb",
            "location": "Phoenix Marketcity, Bengaluru"
        },
        {
            "name": "Rambo Circus - Olympian Circus",
            "date": "Sat, 11 Jan onwards",
            "location": "Olympian circus, J.P nagar"
        },
        {
            "name": "Anubhav Singh Bassi Stand-up Comedy",
            "date": "Sat, 4 Jan onwards",
            "location": "St. John's Auditorium"
        },
        {
            "name": "Aakash Gupta - Daily Ka Kaam Hai",
            "date": "This Weekend",
            "location": "Prestige Centre"
        },
        {
            "name": "Japan Habba",
            "date": "This Weekend",
            "location": "Phoenix Marketcity, Bengaluru"
        }
    ]
}

--------------------------------------------------------------------------------
/examples/events-in-bangalore.webm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/events-in-bangalore.webm

--------------------------------------------------------------------------------
/examples/repo-issues.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/repo-issues.mp4

--------------------------------------------------------------------------------
/index.py:
--------------------------------------------------------------------------------
from typing import List, Dict, Optional
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
import json
import os

model_name = "all-MiniLM-L6-v2"
# local_files_only assumes the model has already been downloaded once.
model = SentenceTransformer(model_name, local_files_only=True)
dimension = model.get_sentence_embedding_dimension()


class RAGSystem:
    def __init__(self, index_path: str = "rag_index"):
        """
        Initialize the RAG system with a sentence transformer model and storage paths.

        Args:
            index_path: Directory to store the FAISS index and documents
        """
        self.encoder = model
        self.index_path = index_path
        self.dimension = dimension

        # Initialize FAISS index
        self.index = faiss.IndexFlatIP(self.dimension)  # Inner product index

        # Storage for documents and their metadata
        self.documents = []
        self.doc_embeddings = []

        # Create storage directory if it doesn't exist
        os.makedirs(index_path, exist_ok=True)

        # Load existing index if available
        self._load_index()

    def add_document(self, text: str, metadata: Optional[Dict] = None) -> int:
        """
        Add a new document to the system.

        Args:
            text: The document text
            metadata: Optional metadata about the document (e.g., URL, timestamp)

        Returns:
            doc_id: The ID of the added document
        """
        # Create document object
        doc_id = len(self.documents)
        doc = {"id": doc_id, "text": text, "metadata": metadata or {}}

        # Compute embedding (normalized so inner-product search behaves like cosine similarity)
        embedding = self.encoder.encode([text], normalize_embeddings=True)[0]

        # Add to storage
        self.documents.append(doc)
        self.doc_embeddings.append(embedding)

        # Add to FAISS index
        self.index.add(np.array([embedding], dtype=np.float32))

        # Save updated index
        self._save_index()

        return doc_id

    def query(self, question: str, k: int = 5) -> List[Dict]:
        """
        Query the system with a question and retrieve relevant documents.

        Args:
            question: The query text
            k: Number of documents to retrieve

        Returns:
            List of relevant documents with their similarity scores
        """
        # Encode query
        query_embedding = self.encoder.encode([question], normalize_embeddings=True)[0]

        # Search index
        scores, doc_indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )

        # Prepare results
        results = []
        for score, doc_idx in zip(scores[0], doc_indices[0]):
            if doc_idx != -1:  # Valid index
                doc = self.documents[doc_idx].copy()
                doc["similarity_score"] = float(score)
                results.append(doc)

        return results

    def _save_index(self):
        """Save the current state of the system"""
        # Save FAISS index
        faiss.write_index(self.index, os.path.join(self.index_path, "index.faiss"))

        # Save documents and embeddings
        with open(os.path.join(self.index_path, "documents.json"), "w") as f:
            json.dump(self.documents, f)

        np.save(
            os.path.join(self.index_path, "embeddings.npy"),
            np.array(self.doc_embeddings),
        )

    def _load_index(self):
        """Load the saved state if it exists"""
        index_file = os.path.join(self.index_path, "index.faiss")
        docs_file = os.path.join(self.index_path, "documents.json")
        embeddings_file = os.path.join(self.index_path, "embeddings.npy")

        if all(os.path.exists(f) for f in [index_file, docs_file, embeddings_file]):
            self.index = faiss.read_index(index_file)

            with open(docs_file, "r") as f:
                self.documents = json.load(f)

            self.doc_embeddings = np.load(embeddings_file).tolist()

    def get_document_count(self) -> int:
        """Return the number of documents in the system"""
        return len(self.documents)


def check_if_information_found(
    rag: RAGSystem, query: str, threshold: float = 0.6
) -> bool:
""" 135 | Check if the RAG system has found the desired information. 136 | 137 | Args: 138 | rag: The RAG system instance 139 | query: The information we're looking for 140 | threshold: Similarity threshold to consider information as found 141 | 142 | Returns: 143 | bool: Whether the information has been found 144 | """ 145 | results = rag.query(query, k=1) 146 | return bool(results and results[0]["similarity_score"] >= threshold) 147 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | playwright 2 | requests 3 | anthropic 4 | pillow 5 | litellm 6 | pydantic 7 | datasets 8 | huggingface_hub 9 | transformers 10 | beautifulsoup4 11 | sentence_transformers 12 | scikit-learn 13 | faiss-cpu 14 | rich 15 | ua-generator -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, Body 2 | from typing import List 3 | from rich.traceback import install 4 | from pydantic import BaseModel 5 | from .browser import WebScraper 6 | from pydantic import create_model 7 | 8 | install(show_locals=True) 9 | 10 | app = FastAPI() 11 | 12 | 13 | class ScrapeRequestModel(BaseModel): 14 | start_url: str 15 | task: str 16 | schema: dict 17 | 18 | 19 | @app.post("/scrape") 20 | async def scrape(request: ScrapeRequestModel = Body(...)): 21 | start_url = request.start_url 22 | task = request.task 23 | schema = request.schema 24 | model = create_model( 25 | "ResponseModel", **{key: (value, ...) for key, value in schema.items()} 26 | ) 27 | 28 | class OutputModel(BaseModel): 29 | results: List[model] 30 | 31 | class Config: 32 | arbitrary_types_allowed = True 33 | 34 | scraper = WebScraper(task, start_url, OutputModel) 35 | result = await scraper.run() 36 | return result 37 | --------------------------------------------------------------------------------