├── .env.test
├── .gitignore
├── LICENSE
├── README.md
├── agent.py
├── agent_prompt.txt
├── browser.py
├── examples
│   ├── .DS_Store
│   ├── events-in-bangalore.json
│   ├── events-in-bangalore.webm
│   └── repo-issues.mp4
├── index.py
├── requirements.txt
└── server.py
/.env.test:
--------------------------------------------------------------------------------
1 | ANTHROPIC_API_KEY=
2 | DEEPSEEK_API_KEY=
3 | OMNIPARSER_API=
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | agent
2 | *.env
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 Addy Bhatia
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OneQuery
2 |
3 | [License](https://github.com/addy999/onequery/blob/main/LICENSE)
4 | [Commits](https://github.com/addy999/onequery/commits/main)
5 |
6 | > 🔨 **Note:** This repository is still in development. Contributions and feedback are welcome!
7 |
8 | ## Setup
9 |
10 | - Install the requirements: `pip install -r requirements.txt`
11 | - Install a browser: `python -m playwright install`
12 |   - This project uses Playwright to control the browser. The command above downloads the browser binaries Playwright needs; you can also install a specific browser, e.g. `python -m playwright install chromium`.
13 | - Write your environment variables to a `.env` file (see `.env.test`)
14 | - Install OmniParser
15 |   - For webpage analysis, we use the [OmniParser](https://huggingface.co/spaces/microsoft/OmniParser) model from Hugging Face. You'll need to host it locally via an [API](https://github.com/addy999/omniparser-api).
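16 | 
17 | Nothing in this repo loads `.env` automatically, so either export the variables in your shell or load them yourself. A minimal sanity check, assuming `python-dotenv` (an extra dependency, not in `requirements.txt`):
18 | 
19 | ```python
20 | # Hypothetical check -- python-dotenv is an assumption, not a project dependency.
21 | import os
22 | 
23 | from dotenv import load_dotenv
24 | 
25 | load_dotenv()  # reads .env from the current directory
26 | 
27 | for key in ("ANTHROPIC_API_KEY", "DEEPSEEK_API_KEY", "OMNIPARSER_API"):
28 |     assert os.environ.get(key), f"{key} is not set"
29 | ```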
16 |
17 | ## Examples
18 |
19 | - Finding issues on a github repo
20 |
21 | [Finding issues on a GitHub repo](https://youtu.be/a_QPDnAosKM?si=pXtZgrRlvXzii7FX "Finding issues on a GitHub repo")
22 |
23 | - Finding live events
24 |
25 | [Finding live events](https://youtu.be/sp_YuZ1Q4wU?feature=shared "Finding live events")
26 |
27 | ## Usage
28 |
29 | ### General query with no source to start with
30 |
31 | ```python
32 | import asyncio
33 | 
34 | from pydantic import BaseModel
35 | 
36 | from browser import WebScraper
37 | 
38 | task = "Find 2 recent issues from the PyTorch repository."
39 | 
40 | class IssueModel(BaseModel):
41 |     date: str
42 |     title: str
43 |     author: str
44 |     description: str
45 | 
46 | class OutputModel(BaseModel):
47 |     issues: list[IssueModel]
48 | 
49 | scraper = WebScraper(task, None, OutputModel)
50 | result = asyncio.run(scraper.run())
51 | ```
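52 | 
53 | `run()` is a coroutine, so it is driven with `asyncio.run()` above; in an already-running event loop (e.g. a notebook), use `await scraper.run()` instead.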
46 |
47 | ### If you know the URL
48 |
49 | ```python
50 | # Imports as in the previous example.
51 | start_url = "https://in.bookmyshow.com/"
52 | task = "Find 5 events happening in Bangalore this week."
53 | 
54 | class EventsModel(BaseModel):
55 |     name: str
56 |     date: str
57 |     location: str
58 | 
59 | class OutputModel(BaseModel):
60 |     events: list[EventsModel]
61 | 
62 | scraper = WebScraper(task, start_url, OutputModel)
63 | result = asyncio.run(scraper.run())
64 | ```
64 |
65 | ### Serving with a REST API
66 |
67 | Server:
68 |
69 | ```bash
70 | pip install "fastapi[all]"
71 | ```
72 |
73 | ```bash
74 | uvicorn server:app --reload
75 | ```
76 |
77 | Client:
78 |
79 | ```python
80 | import requests
81 |
82 | url = "http://0.0.0.0:8000/scrape"
83 |
84 | payload = {
85 |     "start_url": "http://example.com",
86 |     "task": "Scrape the website for data",
87 |     # JSON cannot encode Python types, so field types are sent as plain
88 |     # strings; server.py turns each entry into a pydantic field.
89 |     "schema": {
90 |         "title": "str",
91 |         "description": "str"
92 |     }
93 | }
92 |
93 | response = requests.post(url, json=payload)
94 |
95 | print(response.status_code)
96 | print(response.json())
97 | ```
98 |
99 | > 💡 **Tip:** For a hosted solution with a lightning-fast Zig-based browser, worldwide proxy support, and a job-queuing system, check out [onequery.app](https://www.onequery.app).
100 |
101 | ## Testing
102 |
103 | In the works
104 |
105 | ## Status
106 |
107 | - ✅ Basic functionality
108 | - 🛠️ Testing
109 | - 🛠️ Documentation
110 |
111 | ## Architecture
112 |
113 | (needs to be revised)
114 |
115 | ### Flowchart
116 |
117 | ```mermaid
118 | graph TD;
119 | A[Text Query] --> B[WebLLM];
120 | B --> C[Browser Instructions];
121 | C --> D[Browser Execution];
122 | D --> E[OmniParser];
123 | E --> F[Screenshot & Structured Info];
124 | F --> G[AI];
125 | C --> G;
126 | G --> H[JSON Output];
127 | ```
128 |
129 | ### Stack
130 |
131 | - Browser: [Playwright](https://github.com/microsoft/playwright-python)
132 | - VLM: [OmniParser](https://github.com/addy999/omniparser-api)
133 |
134 |
135 | ## Alternatives
136 | - https://github.com/CognosysAI/browser/
137 |
--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
1 | import re
2 | import json
3 | import ast
4 | from pydantic import BaseModel
5 | import litellm
6 | from litellm import completion
7 | import os
8 | from functools import lru_cache
9 | from datetime import datetime
10 | from together import Together
11 |
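12 | # Together serves the "deepseek-r1" mode in get_reply below; the client reads
13 | # TOGETHER_API_KEY from the environment (note: it is not listed in .env.test).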
12 | client = Together()
13 |
14 | base_path = os.path.dirname(os.path.abspath(__file__))
15 | RAW_SYSTEM_PROMPT = open(os.path.join(base_path, "agent_prompt.txt")).read()
16 | # SYSTEM_JSON_PROMPT = open(os.path.join(base_path, "agent_prompt_json.txt")).read()
17 |
18 | # litellm.set_verbose = True
19 | litellm.modify_params = True
20 |
21 |
22 | def parse_text(text):
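23 |     r"""Extract the tagged fields from a model reply, e.g.
24 |     '<next-action>\nclick(uid="1")\n</next-action>' yields
25 |     {"next_action": 'click(uid="1")', ...}."""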
26 |     # The reply wraps each field in XML-style tags (see the output format in
27 |     # agent_prompt.txt); the tag names are reconstructed to match the keys below.
28 |     next_action_pattern = r"<next-action>\n(.*?)\n</next-action>"
29 |     next_action2_pattern = r"<next-action-2>\n(.*?)\n</next-action-2>"
30 |     explanation_pattern = r"<explanation>\n(.*?)\n</explanation>"
31 |     next_task_pattern = r"<next-task>\n(.*?)\n</next-task>"
27 |
28 | next_action_match = re.search(next_action_pattern, text, re.DOTALL)
29 | next_action2_match = re.search(next_action2_pattern, text, re.DOTALL)
30 | explanation_match = re.search(explanation_pattern, text, re.DOTALL)
31 | next_task_match = re.search(next_task_pattern, text, re.DOTALL)
32 |
33 | result = {
34 | "next_action": next_action_match.group(1) if next_action_match else None,
35 | "next_action_2": (next_action2_match.group(1) if next_action2_match else None),
36 | "explanation": explanation_match.group(1) if explanation_match else None,
37 | "next_task": next_task_match.group(1) if next_task_match else None,
38 | }
39 |
40 | return result
41 |
42 |
43 | def is_valid_json(string: str) -> bool:
44 | try:
45 | json.loads(string)
46 | return True
47 | except json.JSONDecodeError:
48 | return False
49 |
50 |
51 | def clean_up_json(string: str) -> str:
52 | def extract_json_from_string(string):
53 | start_index = string.find("{")
54 | end_index = string.rfind("}")
55 | if start_index != -1 and end_index != -1:
56 | return string[start_index : end_index + 1]
57 | return ""
58 |
59 |     cleaned = (
60 |         extract_json_from_string(string)
61 |         .strip()
62 |         .replace("\n", "")
63 |         .replace('\\"', '"')
64 |         .replace("```json", "")
65 |         .replace("```", "")
66 |     )
67 |
68 | # Check if there's a missing "}" at the end and add it
69 | if cleaned.count("{") > cleaned.count("}"):
70 | cleaned += "}"
71 |
72 | if not is_valid_json(cleaned):
73 | try:
74 | cleaned = json.dumps(ast.literal_eval(cleaned))
75 | except (ValueError, SyntaxError):
76 | raise ValueError("String not valid", cleaned)
77 | return cleaned
78 |
79 |
80 | def get_reply(state, mode="anthropic") -> dict:
81 |     today_date = datetime.now().strftime("%Y-%m-%d")
82 |     SYSTEM_PROMPT = f"{RAW_SYSTEM_PROMPT}\n\nToday's date: {today_date}"
83 |     messages = [{"role": "system", "content": SYSTEM_PROMPT}] + state
84 |     # All litellm-routed modes share the same call shape, so they collapse
85 |     # into a model lookup instead of near-identical branches.
86 |     litellm_models = {
87 |         "ollama": "ollama/llama3.3",
88 |         "deepseek": "deepseek/deepseek-chat",
89 |         "anthropic": "anthropic/claude-3-5-sonnet-20241022",
90 |     }
91 |     if mode in litellm_models:
92 |         reply = (
93 |             completion(
94 |                 model=litellm_models[mode],
95 |                 max_tokens=256,
96 |                 messages=messages,
97 |                 temperature=0.3,
98 |             )
99 |             .choices[0]
100 |             .message.content
101 |         )
102 |     elif mode == "deepseek-r1":
103 |         reply = (
104 |             client.chat.completions.create(
105 |                 model="deepseek-ai/DeepSeek-R1",
106 |                 messages=messages,
107 |             )
108 |             .choices[0]
109 |             .message.content
110 |         )
111 |     else:
112 |         raise ValueError(f"Unknown mode: {mode}")
113 |     return parse_text(reply)
126 |
127 |
128 | def summarize_text(prompt: str, documents: list, schema: dict) -> dict:
129 | return json.loads(
130 | clean_up_json(
131 | (
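132 |                 # The assistant turn below is prefilled with "{" to force JSON
133 |                 # output, so the brace is prepended back onto the reply here.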
132 | "{"
133 | + completion(
134 | model="anthropic/claude-3-5-sonnet-20241022",
135 | max_tokens=1000,
136 | temperature=0.3,
137 | messages=[
138 | {
139 | "role": "system",
140 | "content": f"""Summarize the following documents for this prompt in JSON format.
141 |
142 | Prompt: {prompt}
143 |
144 | Return using this schema: {schema}""",
145 | },
146 | {
147 | "role": "user",
148 | "content": [
149 | {
150 | "type": "text",
151 | "text": text,
152 | }
153 | for text in documents
154 | ],
155 | },
156 | {"role": "assistant", "content": "{"},
157 | ],
158 | )
159 | .choices[0]
160 | .message.content
161 | )
162 | )
163 | )
164 |
165 |
166 | def fetch_query_for_rag(task: str) -> str:
167 | response = clean_up_json(
168 | "{"
169 | + completion(
170 | model="anthropic/claude-3-5-sonnet-20241022",
171 | max_tokens=256,
172 | temperature=0.3,
173 | messages=[
174 | {
175 | "role": "user",
176 | "content": "Generate a simple keyword/phrase query for a RAG system based on the following task. Return the query as JSON with 'query' key. The query should help fetch documents relevant to the task: "
177 | + task,
178 | },
179 | {"role": "assistant", "content": "{"},
180 | ],
181 | )
182 | .choices[0]
183 | .message.content
184 | )
185 | return json.loads(response)["query"]
186 |
187 |
188 | @lru_cache(maxsize=128, typed=True)
189 | def find_schema_for_query(query: str) -> str:
190 | return clean_up_json(
191 | # "{"
192 | completion(
193 | model="claude-3-5-haiku-20241022",
194 | temperature=0.5,
195 | max_tokens=512,
196 | messages=[
197 | {
198 | "role": "system",
199 | "content": """You're an expert in data science. You're helping a colleague form JSON schemas for their data. You're given a query and asked to find the schema for it.
200 |
201 | Example:
202 | Query: Find 2 recent issues from PyTorch repository.
203 | Schema: {'properties': {'date': {'title': 'Date', 'type': 'string'}, 'title': {'title': 'Title', 'type': 'string'}, 'author': {'title': 'Author', 'type': 'string'}, 'description': {'title': 'Description', 'type': 'string'}}, 'required': ['date', 'title', 'author', 'description'], 'title': 'IssueModel', 'type': 'object'}
204 |
205 | Example:
206 | Query: Find 5 events happening in Bangalore this week.
207 | Schema: {'properties': {'name': {'title': 'Name', 'type': 'string'}, 'date': {'title': 'Date', 'type': 'string'}, 'location': {'title': 'Location', 'type': 'string'}}, 'required': ['name', 'date', 'location'], 'title': 'EventsModel', 'type': 'object'}""",
208 | },
209 | {
210 | "role": "user",
211 | "content": f"""Find the schema for the following query: {query}.""",
212 | },
213 | # {"role": "assistant", "content": "{"},
214 | ],
215 | )
216 | .choices[0]
217 | .message.content
218 | )
219 |
--------------------------------------------------------------------------------
/agent_prompt.txt:
--------------------------------------------------------------------------------
1 | You are a web browsing agent tasked with navigating web pages and performing actions based on given instructions and visual information. Your goal is to determine the next 1-2 appropriate actions to take on a webpage, given an initial task and a list of elements on the current page state.
2 |
3 | The list of elements contains boxes highlighting relevant HTML elements, each with a unique identifier (UID) listed. UID must be an integer only.
4 |
5 | Avoid signing in to any website - you can only access public data.
6 |
7 | The possible actions you can take are:
8 | 1. change(value=[str], uid=[str]) - Change the value of an element
9 | 2. click(uid=[str]) - Click on an element
10 | 3. scroll(x=[int], y=[int]) - Scroll the page
11 | 4. submit(uid=[str]) - Submit a form
12 | 5. text_input(text=[str], uid=[str]) - Input text into a field
13 | 6. enter - Press Enter (use after typing in a text box)
14 | 7. back - go back a page
15 | 8. nothing - if no more actions are needed.
16 | 9. search(value=[str]) - Google a query if the webpage isn't helpful and you need to find other websites to visit.
17 |
18 | To determine the next action:
19 | 1. Carefully analyze the elements and the initial task.
20 | 2. Consider which HTML elements are relevant to accomplishing the task.
21 | 3. If there are any modals open (like cookie banners or pop-ups), close them.
22 | 4. Determine the most appropriate action to take based on the available elements and the task at hand.
23 | 5. Choose one of the possible actions listed above that best fits the current situation.
24 | 6. Do not duplicate the last action
25 | 7. It is possible that no action will be required. Assume all webpages have been recorded by another agent as they are visited.
26 | 8. If you have recently navigated to a new page and there's not enough content on the page, try scrolling down.
27 | 9. If you're on a search engine page, run the search action instead of modifying text input.
28 | 10. If the webpage says you've been blocked or if you're unable to access a webpage, go back to search and try a different website.
29 | 11. If continually seeing blocked webpages, exit.
30 | 12. You cannot create a second action if the first action is `search`, `back`, or `click`.
31 |
32 |
33 | Once you have determined the next action, output your decision in the following format:
34 |
35 | <next-action>
36 | [Insert the chosen action here, following the format specified in the action list]
37 | </next-action>
38 | 
39 | <next-action-2>
40 | [If applicable, insert the next chosen action here to follow the previous action, also following the format specified in the action list]
41 | </next-action-2>
42 | 
43 | <next-task>
44 | [One sentence to instruct the next agent to continue this task]
45 | </next-task>
46 | 
47 | Provide a brief explanation for your chosen action(s):
48 | 
49 | <explanation>
50 | [Insert your explanation here]
51 | </explanation>
50 |
51 | Remember to base your decision solely on the information provided in the initial task and the elements. Do not assume or infer any additional information beyond what is explicitly stated.
52 |
53 | Example actions:
54 | - click(uid="1")
55 | - text_input(text="username", uid="12")
56 | - change(value="new_value", uid="5")
57 | - scroll(x=0, y=100)
58 | - submit(uid="3")
59 | - enter
60 | - back
61 | - nothing
62 | - search(value="top news in new york today")
--------------------------------------------------------------------------------
/browser.py:
--------------------------------------------------------------------------------
1 | import json
2 | import ast
3 | import random
4 | import asyncio
5 | import base64
6 | import os
7 | import shutil
8 | import time
9 | import urllib.parse
9 | import ua_generator
10 | from dataclasses import dataclass
11 | from datetime import datetime
12 | from typing import Dict, List, Optional, Tuple, Union
13 |
14 | import requests
15 | from playwright.async_api import Page, async_playwright
16 | from pydantic import BaseModel
17 |
18 | from index import RAGSystem
19 | from agent import fetch_query_for_rag, get_reply, summarize_text
20 |
21 |
22 | def call_process_image_api(
23 | image_path, box_threshold=0.05, iou_threshold=0.1, timeout=60
24 | ):
25 | start = time.time()
26 | url = os.environ.get("OMNIPARSER_API")
27 | with open(image_path, "rb") as image_file:
28 | image_data = image_file.read()
29 |
30 | files = {"image_file": ("image.png", image_data, "image/png")}
31 | params = {"box_threshold": box_threshold, "iou_threshold": iou_threshold}
32 |
33 | for attempt in range(2):
34 | response = requests.post(url, files=files, params=params, timeout=timeout)
35 | if response.status_code == 200:
36 | resp = response.json()
37 | return resp["image"], resp["parsed_content_list"], resp["label_coordinates"]
38 | else:
39 | if attempt == 1:
40 | raise Exception(
41 | f"Request failed with status code {response.status_code}"
42 | )
43 | time.sleep(1) # Wait a bit before retrying
44 |
45 |
46 | # wake up the server
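47 | # ("downloaded_image.png" need not exist; any failure is deliberately
48 | # swallowed, since this request only warms up the OmniParser endpoint)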
47 | try:
48 | call_process_image_api("downloaded_image.png", 0.05, 0.1, timeout=60)
49 | except Exception:
50 | pass
51 |
52 |
53 | @dataclass
54 | class WebElement:
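55 |     """Element parsed from OmniParser output; x/y/width/height are
56 |     normalized (0-1) fractions of the viewport."""
57 | 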
55 | id: int
56 | text: str
57 | x: float
58 | y: float
59 | width: float
60 | height: float
61 | element_type: str # 'text' or 'icon'
62 |
63 | @property
64 | def center(self) -> Tuple[float, float]:
65 | """Returns the center coordinates of the element"""
66 | return (self.x + (self.width / 2), self.y + (self.height / 2))
67 |
68 | @property
69 | def bounds(self) -> Tuple[float, float, float, float]:
70 | """Returns the boundary coordinates (x1, y1, x2, y2)"""
71 | return (self.x, self.y, self.x + self.width, self.y + self.height)
72 |
73 |
74 | class WebPageProcessor:
75 | def __init__(self):
76 | self.elements: Dict[int, WebElement] = {}
77 |
78 | def load_elements(self, text_boxes: str, coordinates: str) -> None:
79 | """
80 | Load elements from the processed webpage data
81 |
82 | Args:
83 | text_boxes: String mapping ID to text content
84 | coordinates: String mapping ID to [x, y, width, height] lists
85 | """
86 |
87 | self.elements = {}
88 |
89 | def parse_text_boxes(text: str) -> dict:
90 | # Split into lines and filter empty lines
91 | lines = [line.strip() for line in text.split("\n") if line.strip()]
92 |
93 | # Dictionary to store results
94 | boxes = {}
95 |
96 | for line in lines:
97 |                 # Split on ":" to separate ID from text; skip lines that
98 |                 # don't match the expected "... ID <n>: <text>" shape
99 |                 if ":" not in line or "ID" not in line:
100 |                     continue
101 |                 id_part, text_part = line.split(":", 1)
99 |
100 | # Extract ID number using string operations
101 | id_str = id_part.split("ID")[1].strip()
102 | id_num = int(id_str)
103 |
104 | # Store in dictionary with cleaned text
105 | boxes[id_num] = text_part.strip()
106 |
107 | return boxes
108 |
109 | def parse_coordinates(coords: str) -> dict:
110 | """
111 | Example string:
112 | `{'0': [0.89625, 0.04333332697550456, 0.06125, 0.03], '1': [0.01875, 0.14499998728434244, 0.34875, 0.03833333333333333]}`
113 | """
114 | return ast.literal_eval(coords)
115 |
116 | coordinates = parse_coordinates(coordinates)
117 | for element_id, text in parse_text_boxes(text_boxes).items():
118 | id_str = str(element_id)
119 | if id_str in coordinates:
120 | coords = coordinates[id_str]
121 | element_type = "icon" if "Icon Box" in text else "text"
122 |
123 | self.elements[element_id] = WebElement(
124 | id=element_id,
125 | text=text.strip(),
126 | x=coords[0],
127 | y=coords[1],
128 | width=coords[2],
129 | height=coords[3],
130 | element_type=element_type,
131 | )
132 |
133 | async def click_element(self, page, element_id: int) -> None:
134 | """Click an element using its center coordinates"""
135 | if element_id not in self.elements:
136 | raise ValueError(f"Element ID {element_id} not found")
137 |
138 | element = self.elements[element_id]
139 | x, y = element.center
140 |
141 |         # Convert normalized coordinates to actual pixels
142 |         # (viewport_size is a property in Playwright's async API, not a coroutine)
143 |         viewport_size = page.viewport_size
144 |         actual_x = x * viewport_size["width"]
145 |         actual_y = y * viewport_size["height"]
145 |
146 | await page.mouse.click(actual_x, actual_y)
147 |
148 | def find_elements_by_text(
149 | self, text: str, partial_match: bool = True
150 | ) -> List[WebElement]:
151 | """Find elements containing the specified text"""
152 | matches = []
153 | for element in self.elements.values():
154 | if partial_match and text.lower() in element.text.lower():
155 | matches.append(element)
156 | elif not partial_match and text.lower() == element.text.lower():
157 | matches.append(element)
158 | return matches
159 |
160 | def get_nearby_elements(
161 | self, element_id: int, max_distance: float = 0.1
162 | ) -> List[WebElement]:
163 | """Find elements within a certain distance of the specified element"""
164 | if element_id not in self.elements:
165 | raise ValueError(f"Element ID {element_id} not found")
166 |
167 | source = self.elements[element_id]
168 | nearby = []
169 |
170 | for element in self.elements.values():
171 | if element.id == element_id:
172 | continue
173 |
174 | # Calculate center-to-center distance
175 | sx, sy = source.center
176 | ex, ey = element.center
177 | distance = ((sx - ex) ** 2 + (sy - ey) ** 2) ** 0.5
178 |
179 | if distance <= max_distance:
180 | nearby.append(element)
181 |
182 | return nearby
183 |
184 |
185 | @dataclass
186 | class Action:
187 | action_type: str
188 | params: Optional[Dict[str, Union[str, int]]]
189 |
190 |
191 | class PlaywrightExecutor:
192 | def __init__(self, page: Page, web_processor: "WebPageProcessor"):
193 | self.page = page
194 | self.processor = web_processor
195 |
196 | async def execute_action(self, action_str: str) -> None:
197 | """Execute a Playwright action from a string command."""
198 | print("> Executing action:", action_str)
199 | action = self.parse_action(action_str)
200 | element = None
201 | if "uid" in action.params:
202 | element = self.processor.elements.get(int(action.params["uid"]))
203 | if not element:
204 | raise ValueError(f"Element with uid {action.params['uid']} not found")
205 | if action.action_type == "click":
206 | await self._execute_click(element)
207 |         elif action.action_type == "text_input":
208 |             await self._execute_text_input(element, action.params["text"])
209 | elif action.action_type == "change":
210 | await self._execute_change(element, action.params["value"])
211 | elif action.action_type == "load":
212 | await self._execute_load(action.params["url"])
213 | elif action.action_type == "scroll":
214 | await self._execute_scroll(int(action.params["x"]), int(action.params["y"]))
215 | elif action.action_type == "submit":
216 | await self._execute_submit(element)
217 | elif action.action_type == "back":
218 | await self.page.go_back()
219 | elif action.action_type == "enter":
220 | await self.page.keyboard.press("Enter")
221 |         elif action.action_type == "search":
222 |             # agent_prompt.txt offers a `search` action ("Google a query"); as an
223 |             # assumption, it is implemented here by loading the results page.
224 |             query = urllib.parse.quote_plus(action.params["value"])
225 |             await self._execute_load(f"https://www.google.com/search?q={query}")
226 |         elif action.action_type == "nothing":
227 |             pass
228 |         else:
229 |             raise ValueError(f"Unknown action type: {action.action_type}")
225 |
226 | def parse_action(self, action_str: str) -> Action:
227 | """Parse an action string into an Action object."""
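228 |         # Example: 'click(uid="5")' -> Action(action_type="click", params={"uid": "5"})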
228 | if action_str == "back":
229 | return Action(action_type="back", params={})
230 | if action_str == "enter":
231 | return Action(action_type="enter", params={})
232 | if action_str == "nothing":
233 | return Action(action_type="nothing", params={})
234 | action_type = action_str[: action_str.index("(")]
235 | params_str = action_str[action_str.index("(") + 1 : action_str.rindex(")")]
236 | params = {}
237 | if params_str:
238 | param_pairs = params_str.split(",")
239 | for pair in param_pairs:
240 | key, value = pair.split("=", 1)
241 | key = key.strip()
242 | value = value.strip().strip("\"'")
243 | params[key] = value
244 | return Action(action_type=action_type, params=params)
245 |
246 | async def _execute_click(self, element: "WebElement") -> None:
247 | """Execute a click action."""
248 | x, y = element.center
249 | viewport = self.page.viewport_size
250 | actual_x = x * viewport["width"]
251 | actual_y = y * viewport["height"]
252 | await self.page.mouse.move(actual_x, actual_y)
253 | await self.page.mouse.click(actual_x, actual_y, delay=100)
254 |
255 | async def _execute_text_input(self, element: "WebElement", text: str) -> None:
256 | """Execute a text input action."""
257 | x, y = element.center
258 | viewport = self.page.viewport_size
259 | actual_x = x * viewport["width"]
260 | actual_y = y * viewport["height"]
261 | await self.page.mouse.click(actual_x, actual_y, delay=100)
262 | await self.page.keyboard.type(text, delay=100)
263 |
264 | async def _execute_change(self, element: "WebElement", value: str) -> None:
265 | """Execute a change action."""
266 | x, y = element.center
267 | viewport = self.page.viewport_size
268 | actual_x = x * viewport["width"]
269 | actual_y = y * viewport["height"]
270 | await self.page.mouse.click(actual_x, actual_y)
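271 |         # NOTE: "Meta"+A selects all on macOS only; on Windows/Linux this
272 |         # shortcut would need "Control" instead.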
271 | await self.page.keyboard.down("Meta")
272 | await self.page.keyboard.press("A")
273 | await self.page.keyboard.up("Meta")
274 | await self.page.keyboard.type(value, delay=100)
275 |
276 | async def _execute_load(self, url: str) -> None:
277 | """Execute a load action."""
278 | await self.page.goto(url)
279 |
280 | async def _execute_scroll(self, x: int, y: int) -> None:
281 | """Execute a scroll action."""
282 | await self.page.evaluate(f"window.scrollTo({x}, {y})")
283 | await self.page.wait_for_timeout(1000)
284 |
285 | async def _execute_submit(self, element: "WebElement") -> None:
286 | """Execute a submit action."""
287 | x, y = element.center
288 | viewport = self.page.viewport_size
289 | actual_x = x * viewport["width"]
290 | actual_y = y * viewport["height"]
291 | await self.page.mouse.click(actual_x, actual_y)
292 |
293 |
294 | class WebScraper:
295 | def __init__(self, task, start_url, output_model: BaseModel, callback=None):
296 | self.logs = []
297 | self.log_callback = callback
298 |
299 | self._log("Initializing WebScraper...")
300 | self.task = task
301 | self.start_url = start_url
302 | index_path = "output/index"
303 | if os.path.exists(index_path):
304 | shutil.rmtree(index_path)
305 | self.rag = RAGSystem(index_path="output/index")
306 | self.web_processor = WebPageProcessor()
307 | self.output_model = output_model
308 | self.browser = None
309 | self.iteration_count = 0
310 | self._log("Done initializing WebScraper")
311 |
312 | def _log(self, message):
313 | self.logs.append(message)
314 | if self.log_callback:
315 | self.log_callback(message)
316 |
317 | async def main(self, p):
318 | # locally
319 | self._log("Starting browser...")
320 | self.browser = await p.chromium.launch(
321 | headless=True,
322 | )
323 |
324 | user_agent = ua_generator.generate(
325 | device="desktop",
326 | )
327 |
328 | context = await self.browser.new_context(
329 | record_video_dir="videos/",
330 | record_video_size={"width": 1920, "height": 1080},
331 | )
332 | page = await context.new_page()
333 | await page.set_viewport_size({"width": 1920, "height": 1080})
334 |         # With no start_url (general queries), fall back to Google so the
335 |         # agent can search for a site on its own.
336 |         start_url = self.start_url or "https://www.google.com"
337 |         next_task = (
338 |             "Find the website to visit."
339 |             if "google.com" in start_url
340 |             else "Figure out what to do on the website."
341 |         )
342 |         next_action = f'load(url="{start_url}")'
340 | second_action = None
341 | max_iterations = 30
342 | self.iteration_count = 0
343 | state = [
344 | {
345 | "role": "user",
346 |                 # model_json_schema() (pydantic v2) -- the model class itself is not subscriptable
347 |                 "content": f"""Overall goal: {self.task}. Try to find the following information in your search: {self.output_model.model_json_schema()["properties"]}""",
347 | },
348 | {
349 | "role": "assistant",
350 | "content": "Okay. Let's get started.",
351 | },
352 | ]
353 | while next_task and self.iteration_count < max_iterations:
354 | executor = PlaywrightExecutor(page, self.web_processor)
355 | self._log(f"> Executing action {next_action}")
356 | await executor.execute_action(next_action)
357 |             await asyncio.sleep(1)  # yield instead of blocking the event loop
358 | if second_action:
359 | self._log(f"> Executing second action {second_action}")
360 | await executor.execute_action(second_action)
361 |                 await asyncio.sleep(1)
362 | self._log("> Inspecting the screen...")
363 | start_time = datetime.now()
364 | await page.screenshot(path="screenshot.png", scale="css")
365 | img, parsed, coordinates = call_process_image_api(
366 | "screenshot.png", 0.2, 0.1
367 | )
368 |
369 | # Save the base64 image locally as "screenshot.png"
370 | image_data = base64.b64decode(img)
371 | with open("screenshot.png", "wb") as f:
372 | f.write(image_data)
373 |
374 | end_time = datetime.now()
375 | self._log(f"Inspection took: {(end_time - start_time).total_seconds()}s")
376 | self.web_processor.load_elements(parsed, coordinates)
377 | text_content = " ".join(
378 | [
379 | a.text
380 | for a in self.web_processor.elements.values()
381 | if a.element_type == "text"
382 | ]
383 | )
384 | self.rag.add_document(
385 | text_content,
386 | {"url": page.url, "timestamp": datetime.now().isoformat()},
387 | )
388 | state.append(
389 | {
390 | "role": "user",
391 | "content": "Elements on screen: " + parsed,
392 | # {
393 | # "type": "image",
394 | # "source": {
395 | # "type": "base64",
396 | # "media_type": "image/png",
397 | # "data": img,
398 | # },
399 | # },
400 | # ],
401 | }
402 | )
403 | self._log("> Getting reply from AI...")
404 | start_time = datetime.now()
405 | reply = get_reply(state)
406 | self._log(
407 | f"> AI time taken: {(datetime.now() - start_time).total_seconds()}"
408 | )
409 |
410 | next_task, next_action, second_action = (
411 | reply["next_task"],
412 | reply["next_action"],
413 | reply.get("next_action_2"),
414 | )
415 | self._log(
416 | f"> Next_task: {next_task}, Next action: {next_action}, Second action: {second_action}"
417 | )
418 | state.append(
419 | {
420 | "role": "assistant",
421 | "content": f"Next task: {next_task}. Next action: {next_action}",
422 | }
423 | )
424 |
425 |             if next_action == "nothing" or next_action is None:
426 |                 self._log("> No further action required.")
427 |                 break
428 |             self.iteration_count += 1
430 | return page, context
431 |
432 | async def run(self):
433 | async with async_playwright() as p:
434 | start = time.time()
435 | page, context = await self.main(p)
436 |
437 | rag_query = fetch_query_for_rag(self.task)
438 | self._log(f"> Querying RAG for task: {rag_query}")
439 | docs = [a["text"] for a in self.rag.query(rag_query)]
440 |             answer = summarize_text(self.task, docs, self.output_model.model_json_schema())
441 | # self._log(f"> Answer: {answer}")
442 | self._log(f"> Total time taken: {time.time() - start}")
443 |
444 | try:
445 | await context.close()
446 |             except Exception as e:
447 |                 self._log(f"> Failed to close browser context: {e}")
448 |
449 | return answer
450 |
--------------------------------------------------------------------------------
/examples/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/.DS_Store
--------------------------------------------------------------------------------
/examples/events-in-bangalore.json:
--------------------------------------------------------------------------------
1 | {
2 | "events": [
3 | {
4 | "name": "Sonu Nigam Live in Concert",
5 | "date": "Sun, 23 Feb",
6 | "location": "Phoenix Marketcity, Bengaluru"
7 | },
8 | {
9 | "name": "Rambo Circus - Olympian Circus",
10 | "date": "Sat, 11 Jan onwards",
11 | "location": "Olympian circus, J.P nagar"
12 | },
13 | {
14 | "name": "Anubhav Singh Bassi Stand-up Comedy",
15 | "date": "Sat, 4 Jan onwards",
16 | "location": "St. John's Auditorium"
17 | },
18 | {
19 | "name": "Aakash Gupta - Daily Ka Kaam Hai",
20 | "date": "This Weekend",
21 | "location": "Prestige Centre"
22 | },
23 | {
24 | "name": "Japan Habba",
25 | "date": "This Weekend",
26 | "location": "Phoenix Marketcity, Bengaluru"
27 | }
28 | ]
29 | }
--------------------------------------------------------------------------------
/examples/events-in-bangalore.webm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/events-in-bangalore.webm
--------------------------------------------------------------------------------
/examples/repo-issues.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/repo-issues.mp4
--------------------------------------------------------------------------------
/index.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Optional
2 | import numpy as np
3 | from sentence_transformers import SentenceTransformer
4 | import faiss
5 | import json
6 | import os
7 |
8 | model_name = "all-MiniLM-L6-v2"
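9 | # local_files_only=True assumes the model weights are already downloaded;
10 | # remove the flag to let sentence-transformers fetch them on first run.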
9 | model = SentenceTransformer(model_name, local_files_only=True)
10 | dimension = model.get_sentence_embedding_dimension()
11 |
12 |
13 | class RAGSystem:
14 | def __init__(self, index_path: str = "rag_index"):
15 | """
16 | Initialize the RAG system with a sentence transformer model and storage paths.
17 |
18 |         Args:
19 |             index_path: Directory to store the FAISS index and documents
21 | """
22 | self.encoder = model
23 | self.index_path = index_path
24 | self.dimension = dimension
25 |
26 | # Initialize FAISS index
27 |         self.index = faiss.IndexFlatIP(self.dimension)  # inner product; equals cosine for normalized embeddings
28 |
29 | # Storage for documents and their metadata
30 | self.documents = []
31 | self.doc_embeddings = []
32 |
33 | # Create storage directory if it doesn't exist
34 | os.makedirs(index_path, exist_ok=True)
35 |
36 | # Load existing index if available
37 | self._load_index()
38 |
39 | def add_document(self, text: str, metadata: Optional[Dict] = None) -> int:
40 | """
41 | Add a new document to the system.
42 |
43 | Args:
44 | text: The document text
45 | metadata: Optional metadata about the document (e.g., URL, timestamp)
46 |
47 | Returns:
48 | doc_id: The ID of the added document
49 | """
50 | # Create document object
51 | doc_id = len(self.documents)
52 | doc = {"id": doc_id, "text": text, "metadata": metadata or {}}
53 |
54 |         # Compute embedding, normalized so inner-product search behaves
55 |         # like cosine similarity
56 |         embedding = self.encoder.encode([text], normalize_embeddings=True)[0]
56 |
57 | # Add to storage
58 | self.documents.append(doc)
59 | self.doc_embeddings.append(embedding)
60 |
61 | # Add to FAISS index
62 | self.index.add(np.array([embedding], dtype=np.float32))
63 |
64 | # Save updated index
65 | self._save_index()
66 |
67 | return doc_id
68 |
69 | def query(self, question: str, k: int = 5) -> List[Dict]:
70 | """
71 | Query the system with a question and retrieve relevant documents.
72 |
73 | Args:
74 | question: The query text
75 | k: Number of documents to retrieve
76 |
77 | Returns:
78 | List of relevant documents with their similarity scores
79 | """
80 |         # Encode query (normalized to match the stored embeddings)
81 |         query_embedding = self.encoder.encode([question], normalize_embeddings=True)[0]
82 |
83 | # Search index
84 | scores, doc_indices = self.index.search(
85 | np.array([query_embedding], dtype=np.float32), k
86 | )
87 |
88 | # Prepare results
89 | results = []
90 | for score, doc_idx in zip(scores[0], doc_indices[0]):
91 | if doc_idx != -1: # Valid index
92 | doc = self.documents[doc_idx].copy()
93 | doc["similarity_score"] = float(score)
94 | results.append(doc)
95 |
96 | return results
97 |
98 | def _save_index(self):
99 | """Save the current state of the system"""
100 | # Save FAISS index
101 | faiss.write_index(self.index, os.path.join(self.index_path, "index.faiss"))
102 |
103 | # Save documents and embeddings
104 | with open(os.path.join(self.index_path, "documents.json"), "w") as f:
105 | json.dump(self.documents, f)
106 |
107 | np.save(
108 | os.path.join(self.index_path, "embeddings.npy"),
109 | np.array(self.doc_embeddings),
110 | )
111 |
112 | def _load_index(self):
113 | """Load the saved state if it exists"""
114 | index_file = os.path.join(self.index_path, "index.faiss")
115 | docs_file = os.path.join(self.index_path, "documents.json")
116 | embeddings_file = os.path.join(self.index_path, "embeddings.npy")
117 |
118 | if all(os.path.exists(f) for f in [index_file, docs_file, embeddings_file]):
119 | self.index = faiss.read_index(index_file)
120 |
121 | with open(docs_file, "r") as f:
122 | self.documents = json.load(f)
123 |
124 | self.doc_embeddings = np.load(embeddings_file).tolist()
125 |
126 | def get_document_count(self) -> int:
127 | """Return the number of documents in the system"""
128 | return len(self.documents)
129 |
130 |
131 | def check_if_information_found(
132 | rag: RAGSystem, query: str, threshold: float = 0.6
133 | ) -> bool:
134 | """
135 | Check if the RAG system has found the desired information.
136 |
137 | Args:
138 | rag: The RAG system instance
139 | query: The information we're looking for
140 | threshold: Similarity threshold to consider information as found
141 |
142 | Returns:
143 | bool: Whether the information has been found
144 | """
145 | results = rag.query(query, k=1)
146 | return bool(results and results[0]["similarity_score"] >= threshold)
147 |
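148 | 
149 | if __name__ == "__main__":
150 |     # Minimal usage sketch; assumes the embedding model is available locally.
151 |     rag = RAGSystem(index_path="output/demo_index")
152 |     rag.add_document(
153 |         "Example page text about recent PyTorch releases.",
154 |         {"url": "https://example.com"},
155 |     )
156 |     for hit in rag.query("PyTorch releases", k=1):
157 |         print(round(hit["similarity_score"], 3), hit["text"])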
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | playwright
2 | requests
3 | anthropic
4 | pillow
5 | litellm
6 | pydantic
7 | datasets
8 | huggingface_hub
9 | transformers
10 | beautifulsoup4
11 | sentence_transformers
12 | scikit-learn
13 | faiss-cpu
14 | rich
15 | ua-generator
16 | numpy
17 | together
--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
1 | from fastapi import FastAPI, Body
2 | from typing import List
3 | from rich.traceback import install
4 | from pydantic import BaseModel, create_model
5 | 
6 | from browser import WebScraper
7 |
8 | install(show_locals=True)
9 |
10 | app = FastAPI()
11 |
12 |
13 | class ScrapeRequestModel(BaseModel):
14 | start_url: str
15 | task: str
16 | schema: dict
17 |
18 |
19 | @app.post("/scrape")
20 | async def scrape(request: ScrapeRequestModel = Body(...)):
21 | start_url = request.start_url
22 | task = request.task
23 | schema = request.schema
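24 |     # Field types arrive as strings (e.g. "str") because JSON cannot encode
25 |     # Python types; pydantic should resolve such string annotations as forward
26 |     # references to builtins when the model is built.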
24 | model = create_model(
25 | "ResponseModel", **{key: (value, ...) for key, value in schema.items()}
26 | )
27 |
28 | class OutputModel(BaseModel):
29 | results: List[model]
30 |
31 | class Config:
32 | arbitrary_types_allowed = True
33 |
34 | scraper = WebScraper(task, start_url, OutputModel)
35 | result = await scraper.run()
36 | return result
37 |
--------------------------------------------------------------------------------