├── .env.test
├── .gitignore
├── LICENSE
├── README.md
├── agent.py
├── agent_prompt.txt
├── browser.py
├── examples
│   ├── .DS_Store
│   ├── events-in-bangalore.json
│   ├── events-in-bangalore.webm
│   └── repo-issues.mp4
├── index.py
├── requirements.txt
└── server.py

--------------------------------------------------------------------------------
/.env.test:
--------------------------------------------------------------------------------
ANTHROPIC_API_KEY=
DEEPSEEK_API_KEY=
OMNIPARSER_API=

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
agent
*.env

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 Addy Bhatia

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# OneQuery

[![GitHub License](https://img.shields.io/github/license/addy999/onequery)](https://github.com/addy999/onequery/blob/main/LICENSE)
[![GitHub Last Commit](https://img.shields.io/github/last-commit/addy999/onequery)](https://github.com/addy999/onequery/commits/main)

> 🔨 **Note:** This repository is still in development. Contributions and feedback are welcome!

## Setup

- Requirements: `pip install -r requirements.txt`
- Install a browser: `python -m playwright install`
  - This project uses Playwright to control the browser. You can install the browser of your choice using the command above.
- Write your environment variables in a `.env` file (see `.env.test`)
- Install OmniParser
  - For webpage analysis, we use the [OmniParser](https://huggingface.co/spaces/microsoft/OmniParser) model from Hugging Face. You'll need to host it via an [API](https://github.com/addy999/omniparser-api) locally.
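  - Once the server is up, you can sanity-check it before running the agent. A minimal sketch — this assumes the API is exposed at the URL you set in `OMNIPARSER_API` and accepts the same parameters `browser.py` sends:

    ```python
    import os
    import requests

    url = os.environ["OMNIPARSER_API"]
    with open("screenshot.png", "rb") as f:  # any PNG will do
        files = {"image_file": ("image.png", f.read(), "image/png")}

    # box_threshold / iou_threshold mirror the values used in browser.py
    resp = requests.post(url, files=files, params={"box_threshold": 0.05, "iou_threshold": 0.1})
    resp.raise_for_status()
    print(resp.json()["parsed_content_list"])  # detected elements on the page
    ```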

## Examples

- Finding issues on a GitHub repo

[![Video Demo 1](http://img.youtube.com/vi/a_QPDnAosKM/0.jpg)](https://youtu.be/a_QPDnAosKM?si=pXtZgrRlvXzii7FX "Finding issues on a GitHub repo")

- Finding live events

[![Video Demo 2](http://img.youtube.com/vi/sp_YuZ1Q4wU/0.jpg)](https://youtu.be/sp_YuZ1Q4wU?feature=shared "Finding live events")

## Usage

### General query with no source to start with

```python
import asyncio
from pydantic import BaseModel

task = "Find 2 recent issues from PyTorch repository."

class IssueModel(BaseModel):
    date: str
    title: str
    author: str
    description: str

class OutputModel(BaseModel):
    issues: list[IssueModel]

scraper = WebScraper(task, None, OutputModel)
asyncio.run(scraper.run())  # WebScraper.run() is a coroutine
```

### If you know the URL

```python
import asyncio
from pydantic import BaseModel

start_url = "https://in.bookmyshow.com/"
task = "Find 5 events happening in Bangalore this week."

class EventsModel(BaseModel):
    name: str
    date: str
    location: str

class OutputModel(BaseModel):
    events: list[EventsModel]

scraper = WebScraper(task, start_url, OutputModel)
asyncio.run(scraper.run())
```

### Serving with a REST API

Server:

```bash
pip install fastapi[all]
```

```bash
uvicorn server:app --reload
```

Client (`schema` maps each field name to a type name, which the server turns into a Pydantic model):

```python
import requests

url = "http://127.0.0.1:8000/scrape"

payload = {
    "start_url": "http://example.com",
    "task": "Scrape the website for data",
    "schema": {
        "title": "str",
        "description": "str"
    }
}

response = requests.post(url, json=payload)

print(response.status_code)
print(response.json())
```

> 💡 **Tip:** For a hosted solution with a lightning-fast Zig-based browser, worldwide proxy support, and a job-queuing system, check out [onequery.app](https://www.onequery.app).
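
### Example output

Whichever entry point you use, the scraper returns JSON matching your schema. For instance, the Bangalore events task above produced `examples/events-in-bangalore.json`, which looks like this (truncated):

```json
{
    "events": [
        {
            "name": "Sonu Nigam Live in Concert",
            "date": "Sun, 23 Feb",
            "location": "Phoenix Marketcity, Bengaluru"
        }
    ]
}
```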

## Testing

In the works.

## Status

- ✅ Basic functionality
- 🛠️ Testing
- 🛠️ Documentation

## Architecture

(needs to be revised)

### Flowchart

```mermaid
graph TD;
    A[Text Query] --> B[WebLLM];
    B --> C[Browser Instructions];
    C --> D[Browser Execution];
    D --> E[OmniParser];
    E --> F[Screenshot & Structured Info];
    F --> G[AI];
    C --> G;
    G --> H[JSON Output];
```

### Stack

- Browser: [Playwright](https://github.com/microsoft/playwright-python)
- VLLM: [OmniParser](https://github.com/addy999/omniparser-api)

## Alternatives

- https://github.com/CognosysAI/browser/

--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
import re
import json
import ast
import os
from functools import lru_cache
from datetime import datetime

import litellm
from litellm import completion
from together import Together

client = Together()

base_path = os.path.dirname(os.path.abspath(__file__))
RAW_SYSTEM_PROMPT = open(os.path.join(base_path, "agent_prompt.txt")).read()
# SYSTEM_JSON_PROMPT = open(os.path.join(base_path, "agent_prompt_json.txt")).read()

# litellm.set_verbose = True
litellm.modify_params = True


def parse_text(text):
    # The tags below match the output format requested in agent_prompt.txt.
    next_action_pattern = r"<next-action>\s*(.*?)\s*</next-action>"
    next_action2_pattern = r"<next-action-2>\s*(.*?)\s*</next-action-2>"
    explanation_pattern = r"<explanation>\s*(.*?)\s*</explanation>"
    next_task_pattern = r"<next-task>\s*(.*?)\s*</next-task>"

    next_action_match = re.search(next_action_pattern, text, re.DOTALL)
    next_action2_match = re.search(next_action2_pattern, text, re.DOTALL)
    explanation_match = re.search(explanation_pattern, text, re.DOTALL)
    next_task_match = re.search(next_task_pattern, text, re.DOTALL)

    result = {
        "next_action": next_action_match.group(1) if next_action_match else None,
        "next_action_2": (next_action2_match.group(1) if next_action2_match else None),
        "explanation": explanation_match.group(1) if explanation_match else None,
        "next_task": next_task_match.group(1) if next_task_match else None,
    }

    return result


def is_valid_json(string: str) -> bool:
    try:
        json.loads(string)
        return True
    except json.JSONDecodeError:
        return False


def clean_up_json(string: str) -> str:
    def extract_json_from_string(string):
        start_index = string.find("{")
        end_index = string.rfind("}")
        if start_index != -1 and end_index != -1:
            return string[start_index : end_index + 1]
        return ""

    # Strip markdown fences (note: this also removes any literal "json" substrings)
    cleaned = (
        extract_json_from_string(string)
        .strip()
        .replace("\n", "")
        .replace('\\"', '"')
        .replace("```", "")
        .replace("json", "")
    )

    # Check if there's a missing "}" at the end and add it
    if cleaned.count("{") > cleaned.count("}"):
        cleaned += "}"

    if not is_valid_json(cleaned):
        try:
            cleaned = json.dumps(ast.literal_eval(cleaned))
        except (ValueError, SyntaxError):
            raise ValueError("String not valid", cleaned)
    return cleaned


def get_reply(state, mode="anthropic") -> dict:
    today_date = datetime.now().strftime("%Y-%m-%d")
f"{RAW_SYSTEM_PROMPT}\n\nToday's date: {today_date}" 83 | if mode == "ollama": 84 | reply = ( 85 | completion( 86 | model="ollama/llama3.3", 87 | max_tokens=256, 88 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 89 | temperature=0.3, 90 | ) 91 | .choices[0] 92 | .message.content 93 | ) 94 | elif mode == "deepseek": 95 | reply = ( 96 | completion( 97 | model="deepseek/deepseek-chat", 98 | max_tokens=256, 99 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 100 | temperature=0.3, 101 | ) 102 | .choices[0] 103 | .message.content 104 | ) 105 | elif mode == "anthropic": 106 | reply = ( 107 | completion( 108 | model="anthropic/claude-3-5-sonnet-20241022", 109 | max_tokens=256, 110 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 111 | temperature=0.3, 112 | ) 113 | .choices[0] 114 | .message.content 115 | ) 116 | elif mode == "deepseek-r1": 117 | reply = ( 118 | client.chat.completions.create( 119 | model="deepseek-ai/DeepSeek-R1", 120 | messages=[{"role": "system", "content": SYSTEM_PROMPT}] + state, 121 | ) 122 | .choices[0] 123 | .message.content 124 | ) 125 | return parse_text(reply) 126 | 127 | 128 | def summarize_text(prompt: str, documents: list, schema: str) -> str: 129 | return json.loads( 130 | clean_up_json( 131 | ( 132 | "{" 133 | + completion( 134 | model="anthropic/claude-3-5-sonnet-20241022", 135 | max_tokens=1000, 136 | temperature=0.3, 137 | messages=[ 138 | { 139 | "role": "system", 140 | "content": f"""Summarize the following documents for this prompt in JSON format. 141 | 142 | Prompt: {prompt} 143 | 144 | Return using this schema: {schema}""", 145 | }, 146 | { 147 | "role": "user", 148 | "content": [ 149 | { 150 | "type": "text", 151 | "text": text, 152 | } 153 | for text in documents 154 | ], 155 | }, 156 | {"role": "assistant", "content": "{"}, 157 | ], 158 | ) 159 | .choices[0] 160 | .message.content 161 | ) 162 | ) 163 | ) 164 | 165 | 166 | def fetch_query_for_rag(task: str) -> str: 167 | response = clean_up_json( 168 | "{" 169 | + completion( 170 | model="anthropic/claude-3-5-sonnet-20241022", 171 | max_tokens=256, 172 | temperature=0.3, 173 | messages=[ 174 | { 175 | "role": "user", 176 | "content": "Generate a simple keyword/phrase query for a RAG system based on the following task. Return the query as JSON with 'query' key. The query should help fetch documents relevant to the task: " 177 | + task, 178 | }, 179 | {"role": "assistant", "content": "{"}, 180 | ], 181 | ) 182 | .choices[0] 183 | .message.content 184 | ) 185 | return json.loads(response)["query"] 186 | 187 | 188 | @lru_cache(maxsize=128, typed=True) 189 | def find_schema_for_query(query: str) -> str: 190 | return clean_up_json( 191 | # "{" 192 | completion( 193 | model="claude-3-5-haiku-20241022", 194 | temperature=0.5, 195 | max_tokens=512, 196 | messages=[ 197 | { 198 | "role": "system", 199 | "content": """You're an expert in data science. You're helping a colleague form JSON schemas for their data. You're given a query and asked to find the schema for it. 200 | 201 | Example: 202 | Query: Find 2 recent issues from PyTorch repository. 203 | Schema: {'properties': {'date': {'title': 'Date', 'type': 'string'}, 'title': {'title': 'Title', 'type': 'string'}, 'author': {'title': 'Author', 'type': 'string'}, 'description': {'title': 'Description', 'type': 'string'}}, 'required': ['date', 'title', 'author', 'description'], 'title': 'IssueModel', 'type': 'object'} 204 | 205 | Example: 206 | Query: Find 5 events happening in Bangalore this week. 
Schema: {'properties': {'name': {'title': 'Name', 'type': 'string'}, 'date': {'title': 'Date', 'type': 'string'}, 'location': {'title': 'Location', 'type': 'string'}}, 'required': ['name', 'date', 'location'], 'title': 'EventsModel', 'type': 'object'}""",
                },
                {
                    "role": "user",
                    "content": f"""Find the schema for the following query: {query}.""",
                },
                # {"role": "assistant", "content": "{"},
            ],
        )
        .choices[0]
        .message.content
    )

--------------------------------------------------------------------------------
/agent_prompt.txt:
--------------------------------------------------------------------------------
You are a web browsing agent tasked with navigating web pages and performing actions based on given instructions and visual information. Your goal is to determine the next 1-2 appropriate actions to take on a webpage, given an initial task and a list of elements on the current page state.

The list of elements contains boxes highlighting relevant HTML elements, each with a unique identifier (UID) listed. UID must be an integer only.

Avoid signing in to any website - you can only access public data.

The possible actions you can take are:
1. change(value=[str], uid=[str]) - Change the value of an element
2. click(uid=[str]) - Click on an element
3. scroll(x=[int], y=[int]) - Scroll the page
4. submit(uid=[str]) - Submit a form
5. text_input(text=[str], uid=[str]) - Input text into a field
6. enter - Press enter if inside a text box previously
7. back - Go back a page
8. nothing - If no more actions are needed
9. search(value=[str]) - Google a query if the webpage isn't helpful and you need to find other websites to visit

To determine the next action:
1. Carefully analyze the elements and the initial task.
2. Consider which HTML elements are relevant to accomplishing the task.
3. If any modals are open (like cookie banners or pop-ups), close them.
4. Determine the most appropriate action to take based on the available elements and the task at hand.
5. Choose one of the possible actions listed above that best fits the current situation.
6. Do not duplicate the last action.
7. It is possible that no action will be required. Assume all webpages have been recorded by another agent as they are visited.
8. If you have recently navigated to a new page and there's not enough content on the page, try scrolling down.
9. If you're on a search engine page, run the search action instead of modifying text input.
10. If the webpage says you've been blocked or if you're unable to access a webpage, go back to search and try a different website.
11. If you keep seeing blocked webpages, exit.
12. You cannot create a second action if the first action is `search`, `back`, or `click`.

Once you have determined the next action, output your decision in the following format:

<next-action>
[Insert the chosen action here, following the format specified in the action list]
</next-action>

<next-action-2>
[If applicable, insert the next chosen action here to follow the previous action, also following the format specified in the action list]
</next-action-2>

<next-task>
[One sentence to instruct the next agent to continue this task]
</next-task>

Provide a brief explanation for your chosen action(s):

<explanation>
[Insert your explanation here]
</explanation>

Remember to base your decision solely on the information provided in the initial task and the elements. Do not assume or infer any additional information beyond what is explicitly stated.

Example actions:
- click(uid="1")
- text_input(text="username", uid="12")
- change(value="new_value", uid="5")
- scroll(x=0, y=100)
- submit(uid="3")
- enter
- back
- nothing
- search(value="top news in new york today")

--------------------------------------------------------------------------------
/browser.py:
--------------------------------------------------------------------------------
import ast
import asyncio
import base64
import os
import shutil
import time
import ua_generator
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple, Union

import requests
from playwright.async_api import Page, async_playwright
from pydantic import BaseModel

from .index import RAGSystem
from .agent import fetch_query_for_rag, get_reply, summarize_text


def call_process_image_api(
    image_path, box_threshold=0.05, iou_threshold=0.1, timeout=60
):
    url = os.environ.get("OMNIPARSER_API")
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()

    files = {"image_file": ("image.png", image_data, "image/png")}
    params = {"box_threshold": box_threshold, "iou_threshold": iou_threshold}

    for attempt in range(2):
        response = requests.post(url, files=files, params=params, timeout=timeout)
        if response.status_code == 200:
            resp = response.json()
            return resp["image"], resp["parsed_content_list"], resp["label_coordinates"]
        else:
            if attempt == 1:
                raise Exception(
                    f"Request failed with status code {response.status_code}"
                )
            time.sleep(1)  # Wait a bit before retrying


# wake up the server
try:
    call_process_image_api("downloaded_image.png", 0.05, 0.1, timeout=60)
except Exception:
    pass


@dataclass
class WebElement:
    id: int
    text: str
    x: float
    y: float
    width: float
    height: float
    element_type: str  # 'text' or 'icon'

    @property
    def center(self) -> Tuple[float, float]:
        """Returns the center coordinates of the element"""
        return (self.x + (self.width / 2), self.y + (self.height / 2))

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        """Returns the boundary coordinates (x1, y1, x2, y2)"""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


class WebPageProcessor:
    def __init__(self):
        self.elements: Dict[int, WebElement] = {}

    def load_elements(self, text_boxes: str, coordinates: str) -> None:
        """
        Load elements from the processed webpage data

        Args:
            text_boxes: String mapping ID to text content
            coordinates: String mapping ID to [x, y, width, height] lists
        """

        self.elements = {}

        def parse_text_boxes(text: str) -> dict:
            # Split into lines and filter empty lines
            lines = [line.strip() for line in text.split("\n") if line.strip()]

            # Dictionary to store results
            boxes = {}

            for line in lines:
                # Split on ":" to separate ID from text
                id_part, text_part = line.split(":", 1)

                # Extract ID number using string operations
                id_str = id_part.split("ID")[1].strip()
                id_num = int(id_str)

                # Store in dictionary with cleaned text
                boxes[id_num] = text_part.strip()

            return boxes

        def parse_coordinates(coords: str) -> dict:
            """
            Example string:
            `{'0': [0.89625, 0.04333332697550456, 0.06125, 0.03], '1': [0.01875, 0.14499998728434244, 0.34875, 0.03833333333333333]}`
            """
            return ast.literal_eval(coords)

        coordinates = parse_coordinates(coordinates)
        for element_id, text in parse_text_boxes(text_boxes).items():
            id_str = str(element_id)
            if id_str in coordinates:
                coords = coordinates[id_str]
                element_type = "icon" if "Icon Box" in text else "text"

                self.elements[element_id] = WebElement(
                    id=element_id,
                    text=text.strip(),
                    x=coords[0],
                    y=coords[1],
                    width=coords[2],
                    height=coords[3],
                    element_type=element_type,
                )

    async def click_element(self, page, element_id: int) -> None:
        """Click an element using its center coordinates"""
        if element_id not in self.elements:
            raise ValueError(f"Element ID {element_id} not found")

        element = self.elements[element_id]
        x, y = element.center

        # Convert normalized coordinates to actual pixels.
        # viewport_size is a property on Playwright pages, not an awaitable method.
        viewport_size = page.viewport_size
        actual_x = x * viewport_size["width"]
        actual_y = y * viewport_size["height"]

        await page.mouse.click(actual_x, actual_y)

    def find_elements_by_text(
        self, text: str, partial_match: bool = True
    ) -> List[WebElement]:
        """Find elements containing the specified text"""
        matches = []
        for element in self.elements.values():
            if partial_match and text.lower() in element.text.lower():
                matches.append(element)
            elif not partial_match and text.lower() == element.text.lower():
                matches.append(element)
        return matches

    def get_nearby_elements(
        self, element_id: int, max_distance: float = 0.1
    ) -> List[WebElement]:
        """Find elements within a certain distance of the specified element"""
        if element_id not in self.elements:
            raise ValueError(f"Element ID {element_id} not found")

        source = self.elements[element_id]
        nearby = []

        for element in self.elements.values():
            if element.id == element_id:
                continue

            # Calculate center-to-center distance
            sx, sy = source.center
            ex, ey = element.center
            distance = ((sx - ex) ** 2 + (sy - ey) ** 2) ** 0.5

            if distance <= max_distance:
                nearby.append(element)

        return nearby


@dataclass
class Action:
    action_type: str
    params: Optional[Dict[str, Union[str, int]]]


class PlaywrightExecutor:
    def __init__(self, page: Page, web_processor: "WebPageProcessor"):
        self.page = page
        self.processor = web_processor

    async def execute_action(self, action_str: str) -> None:
        """Execute a Playwright action from a string command."""
        print("> Executing action:", action_str)
        action = self.parse_action(action_str)
        element = None
        if "uid" in action.params:
            element = self.processor.elements.get(int(action.params["uid"]))
            if not element:
                raise ValueError(f"Element with uid {action.params['uid']} not found")
        if action.action_type == "click":
            await self._execute_click(element)
        elif action.action_type == "text_input":
            await self._execute_text_input(element, action.params["text"])
        elif action.action_type == "change":
            await self._execute_change(element, action.params["value"])
        elif action.action_type == "load":
            await self._execute_load(action.params["url"])
        elif action.action_type == "scroll":
            await self._execute_scroll(int(action.params["x"]), int(action.params["y"]))
        elif action.action_type == "submit":
            await self._execute_submit(element)
        elif action.action_type == "back":
            await self.page.go_back()
        elif action.action_type == "enter":
            await self.page.keyboard.press("Enter")
        elif action.action_type == "nothing":
            pass
        else:
            raise ValueError(f"Unknown action type: {action.action_type}")

    def parse_action(self, action_str: str) -> Action:
        """Parse an action string into an Action object."""
        if action_str == "back":
            return Action(action_type="back", params={})
        if action_str == "enter":
            return Action(action_type="enter", params={})
        if action_str == "nothing":
            return Action(action_type="nothing", params={})
        action_type = action_str[: action_str.index("(")]
        params_str = action_str[action_str.index("(") + 1 : action_str.rindex(")")]
        params = {}
        if params_str:
            # NOTE: naive split - quoted values that contain commas are not supported.
            param_pairs = params_str.split(",")
            for pair in param_pairs:
                key, value = pair.split("=", 1)
                key = key.strip()
                value = value.strip().strip("\"'")
                params[key] = value
        return Action(action_type=action_type, params=params)

    async def _execute_click(self, element: "WebElement") -> None:
        """Execute a click action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.move(actual_x, actual_y)
        await self.page.mouse.click(actual_x, actual_y, delay=100)

    async def _execute_text_input(self, element: "WebElement", text: str) -> None:
        """Execute a text input action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y, delay=100)
        await self.page.keyboard.type(text, delay=100)

    async def _execute_change(self, element: "WebElement", value: str) -> None:
        """Execute a change action: select the existing text and type over it."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y)
        # ControlOrMeta maps to Cmd on macOS and Ctrl elsewhere.
        await self.page.keyboard.press("ControlOrMeta+A")
        await self.page.keyboard.type(value, delay=100)

    async def _execute_load(self, url: str) -> None:
        """Execute a load action."""
        await self.page.goto(url)

    async def _execute_scroll(self, x: int, y: int) -> None:
        """Execute a scroll action."""
        await self.page.evaluate(f"window.scrollTo({x}, {y})")
        await self.page.wait_for_timeout(1000)

    async def _execute_submit(self, element: "WebElement") -> None:
        """Execute a submit action."""
        x, y = element.center
        viewport = self.page.viewport_size
        actual_x = x * viewport["width"]
        actual_y = y * viewport["height"]
        await self.page.mouse.click(actual_x, actual_y)


class WebScraper:
    def __init__(self, task, start_url, output_model: BaseModel, callback=None):
        self.logs = []
        self.log_callback = callback

        self._log("Initializing WebScraper...")
        self.task = task
        # Fall back to Google search when no starting URL is given.
        self.start_url = start_url or "https://www.google.com"
        index_path = "output/index"
        if os.path.exists(index_path):
            shutil.rmtree(index_path)
        self.rag = RAGSystem(index_path="output/index")
        self.web_processor = WebPageProcessor()
        self.output_model = output_model
        self.browser = None
        self.iteration_count = 0
        self._log("Done initializing WebScraper")

    def _log(self, message):
        self.logs.append(message)
        if self.log_callback:
            self.log_callback(message)

    async def main(self, p):
        # locally
        self._log("Starting browser...")
        self.browser = await p.chromium.launch(
            headless=True,
        )

        user_agent = ua_generator.generate(device="desktop")

        context = await self.browser.new_context(
            # Apply the generated user agent so it actually reaches the browser.
            user_agent=user_agent.text,
            record_video_dir="videos/",
            record_video_size={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await page.set_viewport_size({"width": 1920, "height": 1080})
        next_task = (
            "Find the website to visit."
            if "google.com" in self.start_url
            else "Figure out what to do on the website."
        )
        next_action = f'load(url="{self.start_url}")'
        second_action = None
        max_iterations = 30
        self.iteration_count = 0
        state = [
            {
                "role": "user",
                "content": f"""Overall goal: {self.task}. Try to find the following information in your search: {self.output_model.model_json_schema()["properties"]}""",
            },
            {
                "role": "assistant",
                "content": "Okay. Let's get started.",
            },
        ]
        while next_task and self.iteration_count < max_iterations:
            executor = PlaywrightExecutor(page, self.web_processor)
            self._log(f"> Executing action {next_action}")
            await executor.execute_action(next_action)
            await asyncio.sleep(1)  # non-blocking pause inside the event loop
            if second_action:
                self._log(f"> Executing second action {second_action}")
                await executor.execute_action(second_action)
                await asyncio.sleep(1)
            self._log("> Inspecting the screen...")
            start_time = datetime.now()
            await page.screenshot(path="screenshot.png", scale="css")
            img, parsed, coordinates = call_process_image_api(
                "screenshot.png", 0.2, 0.1
            )

            # Save the base64 image locally as "screenshot.png"
            image_data = base64.b64decode(img)
            with open("screenshot.png", "wb") as f:
                f.write(image_data)

            end_time = datetime.now()
            self._log(f"Inspection took: {(end_time - start_time).total_seconds()}s")
            self.web_processor.load_elements(parsed, coordinates)
            text_content = " ".join(
                [
                    a.text
                    for a in self.web_processor.elements.values()
                    if a.element_type == "text"
                ]
            )
            self.rag.add_document(
                text_content,
                {"url": page.url, "timestamp": datetime.now().isoformat()},
            )
            state.append(
                {
                    "role": "user",
                    "content": "Elements on screen: " + parsed,
                    # To also send the annotated screenshot, use a content list:
                    # {
                    #     "type": "image",
                    #     "source": {
                    #         "type": "base64",
                    #         "media_type": "image/png",
                    #         "data": img,
                    #     },
                    # },
                }
            )
            self._log("> Getting reply from AI...")
            start_time = datetime.now()
            reply = get_reply(state)
            self._log(
                f"> AI time taken: {(datetime.now() - start_time).total_seconds()}s"
            )

            next_task, next_action, second_action = (
                reply["next_task"],
                reply["next_action"],
                reply.get("next_action_2"),
            )
            self._log(
                f"> Next_task: {next_task}, Next action: {next_action}, Second action: {second_action}"
            )
            state.append(
                {
                    "role": "assistant",
                    "content": f"Next task: {next_task}. Next action: {next_action}",
                }
            )

            if next_action == "nothing" or next_action is None:
                self._log("> No further action required.")
                break
            self.iteration_count += 1
        return page, context

    async def run(self):
        async with async_playwright() as p:
            start = time.time()
            page, context = await self.main(p)

            rag_query = fetch_query_for_rag(self.task)
            self._log(f"> Querying RAG for task: {rag_query}")
            docs = [a["text"] for a in self.rag.query(rag_query)]
            answer = summarize_text(self.task, docs, self.output_model.model_json_schema())
            # self._log(f"> Answer: {answer}")
            self._log(f"> Total time taken: {time.time() - start}s")

            try:
                await context.close()
            except Exception as e:
                # Closing the context is best-effort; don't fail the run over it.
                self._log(f"> Warning: could not close browser context: {e}")

            return answer

--------------------------------------------------------------------------------
/examples/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/.DS_Store

--------------------------------------------------------------------------------
/examples/events-in-bangalore.json:
--------------------------------------------------------------------------------
{
    "events": [
        {
            "name": "Sonu Nigam Live in Concert",
            "date": "Sun, 23 Feb",
            "location": "Phoenix Marketcity, Bengaluru"
        },
        {
            "name": "Rambo Circus - Olympian Circus",
            "date": "Sat, 11 Jan onwards",
            "location": "Olympian circus, J.P nagar"
        },
        {
            "name": "Anubhav Singh Bassi Stand-up Comedy",
            "date": "Sat, 4 Jan onwards",
            "location": "St. John's Auditorium"
        },
        {
            "name": "Aakash Gupta - Daily Ka Kaam Hai",
            "date": "This Weekend",
            "location": "Prestige Centre"
        },
        {
            "name": "Japan Habba",
            "date": "This Weekend",
            "location": "Phoenix Marketcity, Bengaluru"
        }
    ]
}

--------------------------------------------------------------------------------
/examples/events-in-bangalore.webm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/events-in-bangalore.webm

--------------------------------------------------------------------------------
/examples/repo-issues.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/addy999/onequery/f3cafc068a4f2d95baa890069db3b1376cbda272/examples/repo-issues.mp4

--------------------------------------------------------------------------------
/index.py:
--------------------------------------------------------------------------------
from typing import List, Dict, Optional
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
import json
import os

model_name = "all-MiniLM-L6-v2"
# local_files_only assumes the model has already been downloaded once.
model = SentenceTransformer(model_name, local_files_only=True)
dimension = model.get_sentence_embedding_dimension()


class RAGSystem:
    def __init__(self, index_path: str = "rag_index"):
        """
        Initialize the RAG system with a sentence transformer model and storage paths.

        Args:
            index_path: Directory to store the FAISS index and documents
        """
        self.encoder = model
        self.index_path = index_path
        self.dimension = dimension

        # Initialize FAISS index
        self.index = faiss.IndexFlatIP(self.dimension)  # Inner product index

        # Storage for documents and their metadata
        self.documents = []
        self.doc_embeddings = []

        # Create storage directory if it doesn't exist
        os.makedirs(index_path, exist_ok=True)

        # Load existing index if available
        self._load_index()

    def add_document(self, text: str, metadata: Optional[Dict] = None) -> int:
        """
        Add a new document to the system.

        Args:
            text: The document text
            metadata: Optional metadata about the document (e.g., URL, timestamp)

        Returns:
            doc_id: The ID of the added document
        """
        # Create document object
        doc_id = len(self.documents)
        doc = {"id": doc_id, "text": text, "metadata": metadata or {}}

        # Compute embedding (normalized so inner-product search behaves like cosine similarity)
        embedding = self.encoder.encode([text], normalize_embeddings=True)[0]

        # Add to storage
        self.documents.append(doc)
        self.doc_embeddings.append(embedding)

        # Add to FAISS index
        self.index.add(np.array([embedding], dtype=np.float32))

        # Save updated index
        self._save_index()

        return doc_id

    def query(self, question: str, k: int = 5) -> List[Dict]:
        """
        Query the system with a question and retrieve relevant documents.

        Args:
            question: The query text
            k: Number of documents to retrieve

        Returns:
            List of relevant documents with their similarity scores
        """
        # Encode query
        query_embedding = self.encoder.encode([question], normalize_embeddings=True)[0]

        # Search index
        scores, doc_indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )

        # Prepare results
        results = []
        for score, doc_idx in zip(scores[0], doc_indices[0]):
            if doc_idx != -1:  # Valid index
                doc = self.documents[doc_idx].copy()
                doc["similarity_score"] = float(score)
                results.append(doc)

        return results

    def _save_index(self):
        """Save the current state of the system"""
        # Save FAISS index
        faiss.write_index(self.index, os.path.join(self.index_path, "index.faiss"))

        # Save documents and embeddings
        with open(os.path.join(self.index_path, "documents.json"), "w") as f:
            json.dump(self.documents, f)

        np.save(
            os.path.join(self.index_path, "embeddings.npy"),
            np.array(self.doc_embeddings),
        )

    def _load_index(self):
        """Load the saved state if it exists"""
        index_file = os.path.join(self.index_path, "index.faiss")
        docs_file = os.path.join(self.index_path, "documents.json")
        embeddings_file = os.path.join(self.index_path, "embeddings.npy")

        if all(os.path.exists(f) for f in [index_file, docs_file, embeddings_file]):
            self.index = faiss.read_index(index_file)

            with open(docs_file, "r") as f:
                self.documents = json.load(f)

            self.doc_embeddings = np.load(embeddings_file).tolist()

    def get_document_count(self) -> int:
        """Return the number of documents in the system"""
        return len(self.documents)


def check_if_information_found(
    rag: RAGSystem, query: str, threshold: float = 0.6
) -> bool:
""" 135 | Check if the RAG system has found the desired information. 136 | 137 | Args: 138 | rag: The RAG system instance 139 | query: The information we're looking for 140 | threshold: Similarity threshold to consider information as found 141 | 142 | Returns: 143 | bool: Whether the information has been found 144 | """ 145 | results = rag.query(query, k=1) 146 | return bool(results and results[0]["similarity_score"] >= threshold) 147 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | playwright 2 | requests 3 | anthropic 4 | pillow 5 | litellm 6 | pydantic 7 | datasets 8 | huggingface_hub 9 | transformers 10 | beautifulsoup4 11 | sentence_transformers 12 | scikit-learn 13 | faiss-cpu 14 | rich 15 | ua-generator -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, Body 2 | from typing import List 3 | from rich.traceback import install 4 | from pydantic import BaseModel 5 | from .browser import WebScraper 6 | from pydantic import create_model 7 | 8 | install(show_locals=True) 9 | 10 | app = FastAPI() 11 | 12 | 13 | class ScrapeRequestModel(BaseModel): 14 | start_url: str 15 | task: str 16 | schema: dict 17 | 18 | 19 | @app.post("/scrape") 20 | async def scrape(request: ScrapeRequestModel = Body(...)): 21 | start_url = request.start_url 22 | task = request.task 23 | schema = request.schema 24 | model = create_model( 25 | "ResponseModel", **{key: (value, ...) for key, value in schema.items()} 26 | ) 27 | 28 | class OutputModel(BaseModel): 29 | results: List[model] 30 | 31 | class Config: 32 | arbitrary_types_allowed = True 33 | 34 | scraper = WebScraper(task, start_url, OutputModel) 35 | result = await scraper.run() 36 | return result 37 | --------------------------------------------------------------------------------