├── .cursorrules
├── .env.sample
├── .gitignore
├── .pre-commit-config.yaml
├── AI_DOCS
│   ├── Agency-Swarm Docs.md
│   ├── event_docs.md
│   ├── js_implementation.md
│   └── realtime_api_docs.md
├── README.md
├── personalization.json
├── pyproject.toml
├── src
│   └── voice_assistant
│       ├── __init__.py
│       ├── agencies
│       │   ├── ResearchAgency
│       │   │   ├── AnalystAgent
│       │   │   │   ├── AnalystAgent.py
│       │   │   │   └── instructions.md
│       │   │   ├── BrowsingAgent
│       │   │   │   ├── BrowsingAgent.py
│       │   │   │   ├── instructions.md
│       │   │   │   ├── requirements.txt
│       │   │   │   └── tools
│       │   │   │       ├── ClickElement.py
│       │   │   │       ├── ExportFile.py
│       │   │   │       ├── GoBack.py
│       │   │   │       ├── ReadURL.py
│       │   │   │       ├── Scroll.py
│       │   │   │       ├── SelectDropdown.py
│       │   │   │       ├── SendKeys.py
│       │   │   │       ├── SolveCaptcha.py
│       │   │   │       ├── WebPageSummarizer.py
│       │   │   │       ├── __init__.py
│       │   │   │       └── util
│       │   │   │           ├── __init__.py
│       │   │   │           ├── get_b64_screenshot.py
│       │   │   │           ├── highlights.py
│       │   │   │           └── selenium.py
│       │   │   ├── agency.py
│       │   │   └── agency_manifesto.md
│       │   └── __init__.py
│       ├── audio.py
│       ├── config.py
│       ├── icon.png
│       ├── main.py
│       ├── microphone.py
│       ├── models.py
│       ├── tests
│       │   └── test_realtime_connection.py
│       ├── tools
│       │   ├── CreateFile.py
│       │   ├── DeleteFile.py
│       │   ├── DraftGmail.py
│       │   ├── FetchDailyMeetingSchedule.py
│       │   ├── GetCurrentDateTime.py
│       │   ├── GetGmailSummary.py
│       │   ├── GetResponse.py
│       │   ├── GetScreenDescription.py
│       │   ├── OpenBrowser.py
│       │   ├── SendMessage.py
│       │   ├── SendMessageAsync.py
│       │   ├── UpdateFile.py
│       │   └── __init__.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── decorators.py
│       │   ├── google_services_utils.py
│       │   ├── llm_utils.py
│       │   └── log_utils.py
│       ├── visual_interface.py
│       └── websocket_handler.py
└── uv.lock

/.cursorrules:
--------------------------------------------------------------------------------
1 | # AI Developer for Voice Assistant Project Instructions
2 | 
3 | You are an expert AI developer; your mission is to develop tools and agents that enhance the capabilities of other agents.
4 | These tools and agents are pivotal for enabling agents to communicate, collaborate, and efficiently achieve their collective objectives.
5 | Below are detailed instructions to guide you through the process of creating tools and agents, ensuring they are both functional and align with the framework's standards.
6 | 
7 | ## Understanding Your Role
8 | 
9 | Your primary role is to architect tools and agents that fulfill specific needs within the voice assistant project. This involves:
10 | 
11 | 1. **Tool Development:** Develop each tool following Agency Swarm's specifications, ensuring it is robust and ready for production environments. It must not use any placeholders and must be located in the correct agent's tools folder.
12 | 2. **Identifying Packages:** Determine the best possible packages or APIs that can be used to create a tool based on the user's requirements. Utilize web search if you are uncertain about which API or package to use.
13 | 3. **Instructions for the Agent**: If the agent is underperforming, you will need to adjust its instructions based on the user's feedback. Find the instructions.md file for the agent and adjust it.
14 | 
15 | ## Voice Assistant Project Introduction
16 | 
17 | This document provides comprehensive instructions for developing tools and agents within the Voice Assistant project. The project is structured to include both standalone tools and Agency Swarm agencies, each with its distinct development approach and location within the project structure.
18 | 
19 | ## High-level Folder Structure of Voice Assistant Project
20 | 
21 | The Voice Assistant project is organized as follows:
22 | 
23 | ```
24 | src/voice_assistant/
25 | ├── agencies/
26 | │   ├── agency_name/
27 | │   │   ├── agent_name/
28 | │   │   │   ├── __init__.py
29 | │   │   │   ├── agent_name.py
30 | │   │   │   ├── instructions.md
31 | │   │   │   └── tools/
32 | │   │   │       └── ...
33 | │   │   ├── another_agent/
34 | │   │   │   ├── __init__.py
35 | │   │   │   ├── another_agent.py
36 | │   │   │   ├── instructions.md
37 | │   │   │   └── tools/
38 | │   │   │       └── ...
39 | │   │   ├── agency.py
40 | │   │   └── agency_manifesto.md
41 | │   └── ...
42 | ├── tools/
43 | │   ├── ToolName.py
44 | │   └── ...
45 | ```
46 | 
47 | ## Standalone Tools vs. Agency Swarm Agencies
48 | 
49 | It's crucial to understand the distinction between standalone tools and Agency Swarm agencies within this project:
50 | 
51 | 1. **Standalone Tools (/tools directory):**
52 | 
53 |    - Located in the `/tools` directory
54 |    - Must be adapted from Agency-Swarm standards
55 |    - Developed as individual, reusable components
56 |    - Follow specific guidelines for standalone tool development
57 | 
58 | 2. **Agency Swarm Agencies (/agencies directory):**
59 |    - Located in the `/agencies` directory
60 |    - Follow normal Agency Swarm development practices
61 |    - Organized into agencies and agents with their respective tools
62 | 
63 | Now, let's delve into the specific instructions for Agency Swarm development, which primarily apply to the `/agencies` directory.
64 | 
65 | --- Start of Agency Swarm Framework Instructions ---
66 | 
67 | ## Agency Swarm Framework Overview
68 | 
69 | Agency Swarm started as a desire and effort of Arsenii Shatokhin (aka VRSEN) to fully automate his AI Agency with AI. By building this framework, we aim to simplify the agent creation process and enable anyone to create a collaborative swarm of agents (Agencies), each with distinct roles and capabilities.
70 | 
71 | ### Key Features
72 | 
73 | - **Customizable Agent Roles**: Define roles like CEO, virtual assistant, developer, etc., and customize their functionalities with [Assistants API](https://platform.openai.com/docs/assistants/overview).
74 | - **Full Control Over Prompts**: Avoid conflicts and restrictions of pre-defined prompts, allowing full customization.
75 | - **Tool Creation**: Tools within Agency Swarm are created using Pydantic, which provides a convenient interface and automatic type validation.
76 | - **Efficient Communication**: Agents communicate through a specially designed "send message" tool based on their own descriptions.
77 | - **State Management**: Agency Swarm efficiently manages the state of your assistants on OpenAI, maintaining it in a special `settings.json` file.
78 | - **Deployable in Production**: Agency Swarm is designed to be reliable and easily deployable in production environments.
79 | 
80 | ### Folder Structure
81 | 
82 | In Agency Swarm, the folder structure is organized as follows:
83 | 
84 | 1. Each agency and agent has its own dedicated folder.
85 | 2. Within each agent folder:
86 | 
87 |    - A 'tools' folder contains all tools for that agent.
88 |    - An 'instructions.md' file provides agent-specific instructions.
89 |    - An '__init__.py' file contains the import of the agent.
90 | 
91 | 3. Tool Import Process:
92 | 
93 |    - Create a file in the 'tools' folder with the same name as the tool class.
94 |    - The tool needs to be added to the tools list in the agent class. Do not overwrite existing tools when adding a new tool.
95 |    - All new requirements must be added to the requirements.txt file.
96 | 
97 | 4. Agency Configuration:
98 |    - The 'agency.py' file is the main file where all new agents are imported.
99 |    - When creating a new agency folder, use descriptive names, for example: marketing_agency, development_agency, etc.
100 | 
101 | Follow this folder structure when creating or modifying files within the Agency Swarm framework:
102 | 
103 | ```
104 | agency_name/
105 | ├── agent_name/
106 | │   ├── __init__.py
107 | │   ├── agent_name.py
108 | │   ├── instructions.md
109 | │   └── tools/
110 | │       ├── tool_name1.py
111 | │       ├── tool_name2.py
112 | │       ├── tool_name3.py
113 | │       ├── ...
114 | ├── another_agent/
115 | │   ├── __init__.py
116 | │   ├── another_agent.py
117 | │   ├── instructions.md
118 | │   └── tools/
119 | │       ├── tool_name1.py
120 | │       ├── tool_name2.py
121 | │       ├── tool_name3.py
122 | │       ├── ...
123 | ├── agency.py
124 | ├── agency_manifesto.md
125 | ├── requirements.txt
126 | └── ...
127 | ```
128 | 
129 | ## Instructions
130 | 
131 | ### 1. Create tools
132 | 
133 | Tools are the specific actions that agents can perform. They are defined in the `tools` folder.
134 | 
135 | When creating a tool, you are defining a new class that extends `BaseTool` from `agency_swarm.tools`. This process involves several key steps, outlined below.
136 | 
137 | #### 1.1. Import Necessary Modules
138 | 
139 | Start by importing `BaseTool` from `agency_swarm.tools` and `Field` from `pydantic`. These imports will serve as the foundation for your custom tool class. Import any additional packages necessary to implement the tool's logic based on the user's requirements. Import `load_dotenv` from `dotenv` to load the environment variables.
140 | 
141 | #### 1.2. Define Your Tool Class
142 | 
143 | Create a new class that inherits from `BaseTool`. This class will encapsulate the functionality of your tool. The `BaseTool` class inherits from Pydantic's `BaseModel` class.
144 | 
145 | #### 1.3. Specify Tool Fields
146 | 
147 | Define the fields your tool will use, utilizing Pydantic's `Field` for clear descriptions and validation. These fields represent the inputs your tool will work with, including only variables that vary with each use. Define any constant variables globally.
148 | 
149 | #### 1.4. Implement the `run` Method
150 | 
151 | The `run` method is where your tool's logic is executed. Use the fields defined earlier to perform the tool's intended task. It must contain fully functional, correct Python code. It can utilize the various Python packages imported in step 1.1.
152 | 
153 | ### Best Practices
154 | 
155 | - **Identify Necessary Packages**: Determine the best packages or APIs to use for creating the tool based on the requirements.
156 | - **Documentation**: Ensure each class and method is well-documented. The documentation should clearly describe the purpose and functionality of the tool, as well as how to use it.
157 | - **Code Quality**: Write clean, readable, and efficient code. Adhere to the PEP 8 style guide for Python code.
158 | - **Web Research**: Utilize web browsing to identify the most relevant packages, APIs, or documentation necessary for implementing your tool's logic.
159 | - **Use Python Packages**: Prefer to use the various API wrapper packages and SDKs available on pip, rather than calling these APIs directly using requests.
160 | - **Expect API Keys to be defined as env variables**: If a tool requires an API key or an access token, it must be accessed from the environment using the os package within the `run` method's logic.
161 | - **Use global variables for constants**: If a tool requires a constant global variable that does not change from use to use (for example, ad_account_id, pull_request_id, etc.), define it as a constant global variable above the tool class, instead of inside a Pydantic `Field`.
162 | - **Add a test case at the bottom of the file**: Add a test case for each tool in an `if __name__ == "__main__":` block.
163 | 
164 | ### Example of a Tool
165 | 
166 | ```python
167 | from agency_swarm.tools import BaseTool
168 | from pydantic import Field
169 | import os
170 | from dotenv import load_dotenv
171 | 
172 | load_dotenv()  # always load the environment variables
173 | 
174 | account_id = "MY_ACCOUNT_ID"
175 | api_key = os.getenv("MY_API_KEY")  # or access_token = os.getenv("MY_ACCESS_TOKEN")
176 | 
177 | class MyCustomTool(BaseTool):
178 |     """
179 |     A brief description of what the custom tool does.
180 |     The docstring should clearly explain the tool's purpose and functionality.
181 |     It will be used by the agent to determine when to use this tool.
182 |     """
183 |     # Define the fields with descriptions using Pydantic Field
184 |     example_field: str = Field(
185 |         ..., description="Description of the example field, explaining its purpose and usage for the Agent."
186 |     )
187 | 
188 |     def run(self):
189 |         """
190 |         The implementation of the run method, where the tool's main functionality is executed.
191 |         This method should utilize the fields defined above to perform the task.
192 |         """
193 |         # Your custom tool logic goes here
194 |         # Example:
195 |         # do_something(self.example_field, api_key, account_id)
196 | 
197 |         # Return the result of the tool's operation as a string
198 |         return "Result of MyCustomTool operation"
199 | 
200 | if __name__ == "__main__":
201 |     tool = MyCustomTool(example_field="example value")
202 |     print(tool.run())
203 | ```
204 | 
205 | Remember, each tool code snippet you create must be fully ready to use. It must not contain any placeholders or hypothetical examples.
206 | 
207 | ### 2. Create agents
208 | 
209 | Agents are the core of the framework. Each agent has its own unique role and functionality and is designed to perform specific tasks. Each file for the agent must be named the same as the agent's name.
210 | 
211 | #### Agent Class
212 | 
213 | To create an agent, import `Agent` from `agency_swarm` and create a class that inherits from `Agent`. Inside the class you can adjust the following parameters:
214 | 
215 | ```python
216 | from agency_swarm import Agent
217 | 
218 | class CEO(Agent):
219 |     def __init__(self):
220 |         super().__init__(
221 |             name="CEO",
222 |             description="Responsible for client communication, task planning and management.",
223 |             instructions="./instructions.md",  # instructions for the agent
224 |             tools=[MyCustomTool],
225 |             temperature=0.5,
226 |             max_prompt_tokens=25000,
227 |         )
228 | ```
229 | 
230 | - Name: The agent's name, reflecting its role.
231 | - Description: A brief summary of the agent's responsibilities.
232 | - Instructions: Path to a markdown file containing detailed instructions for the agent.
233 | - Tools: A list of tools (extending BaseTool) that the agent can use. (Tools must not be initialized, so the agent can pass the parameters itself.)
234 | - Other Parameters: Additional settings like temperature, max_prompt_tokens, etc.
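For reference, the agent folder's `__init__.py` mentioned in the folder structure above usually just re-exports the agent class so that agency.py can import it cleanly. A minimal sketch (hypothetical — it assumes the agent folder contains a `CEO.py` file defining a `CEO` class):

```python
# __init__.py — re-export the agent class for convenient imports
# (assumes this agent folder contains CEO.py defining the CEO class)
from .CEO import CEO

__all__ = ["CEO"]
```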
235 | 
236 | Make sure to create a separate folder for each agent, as described in the folder structure above. After creating the agent, you need to import it into the agency.py file.
237 | 
238 | #### instructions.md file
239 | 
240 | Each agent also needs to have an `instructions.md` file, which is the system prompt for the agent. Inside those instructions, you need to define the following:
241 | 
242 | - **Agent Role**: A description of the role of the agent.
243 | - **Goals**: A list of goals that the agent should achieve, aligned with the agency's mission.
244 | - **Process Workflow**: A step-by-step guide on how the agent should perform its tasks. Each step must be aligned with the other agents in the agency, and with the tools available to this agent.
245 | 
246 | Use the following template for the instructions.md file:
247 | 
248 | ```md
249 | # Agent Role
250 | 
251 | A description of the role of the agent.
252 | 
253 | # Goals
254 | 
255 | A list of goals that the agent should achieve, aligned with the agency's mission.
256 | 
257 | # Process Workflow
258 | 
259 | 1. Step 1
260 | 2. Step 2
261 | 3. Step 3
262 | ```
263 | 
264 | Write the agent's instructions in markdown format. They should include a description of the role and a specific step-by-step process that this agent needs to perform in order to execute its tasks. The process must also be aligned with all the other agents in the agency. Agents should be able to collaborate with each other to achieve the common goal of the agency.
265 | 
266 | #### Code Interpreter and FileSearch Options
267 | 
268 | To utilize the Code Interpreter tool (the Jupyter Notebook Execution environment, without Internet access) and the FileSearch tool (a Retrieval-Augmented Generation (RAG) capability provided by OpenAI):
269 | 
270 | 1. Import the tools:
271 | 
272 | ```python
273 | from agency_swarm.tools import CodeInterpreter, FileSearch
274 | 
275 | ```
276 | 
277 | 2. Add the tools to the agent's tools list:
278 | 
279 | ```python
280 | agent = Agent(
281 |     name="MyAgent",
282 |     tools=[CodeInterpreter, FileSearch],
283 |     # ... other agent parameters
284 | )
285 | 
286 | ```
287 | 
288 | ### 3. Create Agencies
289 | 
290 | Agencies are collections of agents that work together to achieve a common goal. They are defined in the `agency.py` file.
291 | 
292 | #### Agency Class
293 | 
294 | To create an agency, import `Agency` from `agency_swarm` and create an instance of it with your agency chart.
In the constructor, you can adjust the following parameters:
295 | 
296 | ```python
297 | from agency_swarm import Agency
298 | from CEO import CEO
299 | from .developers.developer import Developer
300 | from .virtual_assistants.virtual_assistant import VirtualAssistant
301 | 
302 | ceo = CEO()
303 | dev = Developer()
304 | va = VirtualAssistant()
305 | agency = Agency([
306 |     ceo,  # CEO will be the entry point for communication with the user
307 |     [ceo, dev],  # CEO can initiate communication with Developer
308 |     [ceo, va],  # CEO can initiate communication with Virtual Assistant
309 |     [dev, va]  # Developer can initiate communication with Virtual Assistant
310 | ],
311 | shared_instructions='agency_manifesto.md',  # shared instructions for all agents
312 | temperature=0.5,  # default temperature for all agents
313 | max_prompt_tokens=25000  # default max tokens in conversation history
314 | )
315 | 
316 | if __name__ == "__main__":
317 |     agency.run_demo()  # starts the agency in terminal
318 | ```
319 | 
320 | #### Communication Flows
321 | 
322 | In Agency Swarm, communication flows are directional, meaning they are established from left to right in the agency_chart definition. For instance, in the example above, the CEO can initiate a chat with the developer (dev), and the developer can respond in this chat. However, the developer cannot initiate a chat with the CEO. The developer can initiate a chat with the virtual assistant (va) and assign new tasks.
323 | 
324 | To allow agents to communicate with each other, simply add them in the second-level list inside the agency chart like this: `[ceo, dev], [ceo, va], [dev, va]`. The agent on the left will be able to communicate with the agent on the right.
325 | 
326 | #### Agency Manifesto
327 | 
328 | The agency manifesto is a file that contains shared instructions for all agents in the agency. It is a markdown file that is located in the agency folder. Please write the manifesto file when creating a new agency. Include the following:
329 | 
330 | - **Agency Description**: A brief description of the agency.
331 | - **Mission Statement**: A concise statement that encapsulates the purpose and guiding principles of the agency.
332 | - **Operating Environment**: A description of the operating environment of the agency.
333 | 
334 | ## Notes
335 | 
336 | IMPORTANT: NEVER output code snippets or file contents in the chat. Always create or modify the actual files in the file system. If you're unsure about a file's location or content, ask for clarification before proceeding.
337 | 
338 | When creating or modifying files:
339 | 
340 | 1. Use the appropriate file creation or modification syntax (e.g., ```python:path/to/file.py for Python files).
341 | 2. Write the full content of the file, not just snippets or placeholders.
342 | 3. Ensure all necessary imports and dependencies are included.
343 | 4. Follow the specified file creation order rigorously: 1. tools, 2. agents, 3. agency, 4. requirements.txt.
344 | 
345 | If you find yourself about to output code in the chat, STOP and reconsider your approach. Always prioritize actual file creation and modification over chat explanations.
346 | 
347 | --- End of Agency Swarm Instructions ---
348 | 
349 | ## Standalone Tools in /tools Directory
350 | 
351 | To reiterate the distinction, the `/tools` directory contains standalone tools that are adapted from Agency-Swarm standards but are not directly part of any specific agent or agency. When developing these tools:
352 | 
353 | 1. Place all standalone tools in the `src/voice_assistant/tools/` directory.
354 | 2. Each tool should be in its own file, named after the tool class (e.g., `GetCurrentDateTime.py` for the `GetCurrentDateTime` class).
355 | 3. Tools must inherit from `BaseTool` from `agency_swarm.tools`.
356 | 4. Use async syntax for the `run` method.
357 | 5. For synchronous operations within async tools, use `asyncio.to_thread`.
358 | 6. Always use environment variables for API keys and sensitive information.
359 | 7. Add a test case at the bottom of each tool file.
360 | 
361 | These standalone tools can be used across different agencies or independently, providing flexibility and reusability within the Voice Assistant project.
362 | 
363 | Remember, when developing within the Voice Assistant project, always consider whether you're working on a standalone tool (/tools) or an Agency Swarm agency (/agencies) and follow the appropriate guidelines for each.
364 | 
--------------------------------------------------------------------------------
/.env.sample:
--------------------------------------------------------------------------------
1 | OPENAI_API_KEY=
2 | PERSONALIZATION_FILE=./personalization.json
3 | SCRATCH_PAD_DIR=./scratchpad
4 | EMAIL_SENDER=sender@example.com
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | output/
2 | input/
3 | 
4 | # Based on https://raw.githubusercontent.com/github/gitignore/main/Node.gitignore
5 | 
6 | # Logs
7 | 
8 | logs
9 | *.log
10 | npm-debug.log*
11 | yarn-debug.log*
12 | yarn-error.log*
13 | lerna-debug.log*
14 | .pnpm-debug.log*
15 | 
16 | # Caches
17 | 
18 | .cache
19 | 
20 | # Diagnostic reports (https://nodejs.org/api/report.html)
21 | 
22 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
23 | 
24 | # Runtime data
25 | 
26 | pids
27 | *.pid
28 | *.seed
29 | *.pid.lock
30 | 
31 | # Directory for instrumented libs generated by jscoverage/JSCover
32 | 
33 | lib-cov
34 | 
35 | # Coverage directory used by tools like istanbul
36 | 
37 | coverage
38 | *.lcov
39 | 
40 | # nyc test coverage
41 | 
42 | .nyc_output
43 | 
44 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
45 | 
46 | .grunt
47 | 
48 | # Bower dependency directory (https://bower.io/)
49 | 
50 | bower_components
51 | 
52 | # node-waf configuration
53 | 
54 | .lock-wscript
55 | 
56 | # Compiled binary addons (https://nodejs.org/api/addons.html)
57 | 
58 | build/Release
59 | 
60 | # Dependency directories
61 | 
62 | node_modules/
63 | jspm_packages/
64 | 
65 | # Snowpack dependency directory (https://snowpack.dev/)
66 | 
67 | web_modules/
68 | 
69 | # TypeScript cache
70 | 
71 | *.tsbuildinfo
72 | 
73 | # Optional npm cache directory
74 | 
75 | .npm
76 | 
77 | # Optional eslint cache
78 | 
79 | .eslintcache
80 | 
81 | # Optional stylelint cache
82 | 
83 | .stylelintcache
84 | 
85 | # Microbundle cache
86 | 
87 | .rpt2_cache/
88 | .rts2_cache_cjs/
89 | .rts2_cache_es/
90 | .rts2_cache_umd/
91 | 
92 | # Optional REPL history
93 | 
94 | .node_repl_history
95 | 
96 | # Output of 'npm pack'
97 | 
98 | *.tgz
99 | 
100 | # Yarn Integrity file
101 | 
102 | .yarn-integrity
103 | 
104 | # dotenv environment variable files
105 | 
106 | .env
107 | .env.development.local
108 | .env.test.local
109 | .env.production.local
110 | .env.local
111 | 
112 | # parcel-bundler cache (https://parceljs.org/)
113 | 
114 | .parcel-cache
115 | 
116 | # Next.js build output
117 | 
118 | .next
119 | out
120 | 
121 | # Nuxt.js build / generate output
122 | 
123 | .nuxt
124 | dist
125 | 
126 | # Gatsby files
127 | 
128 | # Comment in the public line in if your project uses Gatsby and not Next.js
129 | 
130 | # https://nextjs.org/blog/next-9-1#public-directory-support
131 | 
132 | # public
133 | 
134 | # vuepress build output
135 | 
136 | .vuepress/dist
137 | 
138 | # vuepress v2.x temp and cache directory
139 | 
140 | .temp
141 | 
142 | # Docusaurus cache and generated files
143 | 
144 | .docusaurus
145 | 
146 | # Serverless directories
147 | 
148 | .serverless/
149 | 
150 | # FuseBox cache
151 | 
152 | .fusebox/
153 | 
154 | # DynamoDB Local files
155 | 
156 | .dynamodb/
157 | 
158 | # TernJS port file
159 | 
160 | .tern-port
161 | 
162 | # Stores VSCode versions used for testing VSCode extensions
163 | 
164 | .vscode-test
165 | 
166 | # yarn v2
167 | 
168 | .yarn/cache
169 | .yarn/unplugged
170 | .yarn/build-state.yml
171 | .yarn/install-state.gz
172 | .pnp.*
173 | 
174 | # IntelliJ based IDEs
175 | .idea
176 | 
177 | # Finder (MacOS) folder config
178 | .DS_Store
179 | .aider*
180 | 
181 | __pycache__/
182 | 
183 | .venv/
184 | 
185 | apps/marimo-prompt-library/prompt_executions
186 | 
187 | .env
188 | 
189 | scratchpad/
190 | 
191 | runtime_time_table.jsonl
192 | 
193 | settings.json
194 | token.json
195 | credentials.json
196 | 
197 | screenshot.jpg
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/pre-commit/pre-commit-hooks
3 |     rev: v5.0.0
4 |     hooks:
5 |       - id: trailing-whitespace
6 |       - id: end-of-file-fixer
7 |       - id: check-yaml
8 |       - id: check-toml
9 |       - id: debug-statements
10 |         language_version: python3
11 | 
12 |   - repo: https://github.com/astral-sh/ruff-pre-commit
13 |     rev: v0.6.9
14 |     hooks:
15 |       - id: ruff
16 |         args: [--fix, --select=I]
17 |       - id: ruff-format
--------------------------------------------------------------------------------
/AI_DOCS/Agency-Swarm Docs.md:
--------------------------------------------------------------------------------
1 | # AI Agent Creator Instructions for Agency Swarm Framework
2 | 
3 | You are an expert AI developer; your mission is to develop tools and agents that enhance the capabilities of other agents. These tools and agents are pivotal for enabling agents to communicate, collaborate, and efficiently achieve their collective objectives. Below are detailed instructions to guide you through the process of creating tools and agents, ensuring they are both functional and align with the framework's standards.
4 | 
5 | ## Understanding Your Role
6 | 
7 | Your primary role is to architect tools and agents that fulfill specific needs within the agency. This involves:
8 | 
9 | 1. **Tool Development:** Develop each tool following Agency Swarm's specifications, ensuring it is robust and ready for production environments. It must not use any placeholders and must be located in the correct agent's tools folder.
10 | 2. **Identifying Packages:** Determine the best possible packages or APIs that can be used to create a tool based on the user's requirements. Utilize web search if you are uncertain about which API or package to use.
11 | 3. **Instructions for the Agent**: If the agent is underperforming, you will need to adjust its instructions based on the user's feedback. Find the instructions.md file for the agent and adjust it.
12 | 
13 | ## Agency Swarm Framework Overview
14 | 
15 | Agency Swarm started as a desire and effort of Arsenii Shatokhin (aka VRSEN) to fully automate his AI Agency with AI. By building this framework, we aim to simplify the agent creation process and enable anyone to create a collaborative swarm of agents (Agencies), each with distinct roles and capabilities.
16 | 
17 | ### Key Features
18 | 
19 | - **Customizable Agent Roles**: Define roles like CEO, virtual assistant, developer, etc., and customize their functionalities with [Assistants API](https://platform.openai.com/docs/assistants/overview).
20 | - **Full Control Over Prompts**: Avoid conflicts and restrictions of pre-defined prompts, allowing full customization.
21 | - **Tool Creation**: Tools within Agency Swarm are created using Pydantic, which provides a convenient interface and automatic type validation.
22 | - **Efficient Communication**: Agents communicate through a specially designed "send message" tool based on their own descriptions.
23 | - **State Management**: Agency Swarm efficiently manages the state of your assistants on OpenAI, maintaining it in a special `settings.json` file.
24 | - **Deployable in Production**: Agency Swarm is designed to be reliable and easily deployable in production environments.
25 | 
26 | ### Folder Structure
27 | 
28 | In Agency Swarm, the folder structure is organized as follows:
29 | 
30 | 1. Each agency and agent has its own dedicated folder.
31 | 2. Within each agent folder:
32 | 
33 |    - A 'tools' folder contains all tools for that agent.
34 |    - An 'instructions.md' file provides agent-specific instructions.
35 |    - An '__init__.py' file contains the import of the agent.
36 | 
37 | 3. Tool Import Process:
38 | 
39 |    - Create a file in the 'tools' folder with the same name as the tool class.
40 |    - The tool needs to be added to the tools list in the agent class. Do not overwrite existing tools when adding a new tool.
41 |    - All new requirements must be added to the requirements.txt file.
42 | 
43 | 4. Agency Configuration:
44 |    - The 'agency.py' file is the main file where all new agents are imported.
45 |    - When creating a new agency folder, use descriptive names, for example: marketing_agency, development_agency, etc.
46 | 
47 | Follow this folder structure when creating or modifying files within the Agency Swarm framework:
48 | 
49 | ```
50 | agency_name/
51 | ├── agent_name/
52 | │   ├── __init__.py
53 | │   ├── agent_name.py
54 | │   ├── instructions.md
55 | │   └── tools/
56 | │       ├── tool_name1.py
57 | │       ├── tool_name2.py
58 | │       ├── tool_name3.py
59 | │       ├── ...
60 | ├── another_agent/
61 | │   ├── __init__.py
62 | │   ├── another_agent.py
63 | │   ├── instructions.md
64 | │   └── tools/
65 | │       ├── tool_name1.py
66 | │       ├── tool_name2.py
67 | │       ├── tool_name3.py
68 | │       ├── ...
69 | ├── agency.py
70 | ├── agency_manifesto.md
71 | ├── requirements.txt
72 | └── ...
73 | ```
74 | 
75 | ## Instructions
76 | 
77 | ### 1. Create tools
78 | 
79 | Tools are the specific actions that agents can perform. They are defined in the `tools` folder.
80 | 
81 | When creating a tool, you are defining a new class that extends `BaseTool` from `agency_swarm.tools`. This process involves several key steps, outlined below.
82 | 
83 | #### 1. Import Necessary Modules
84 | 
85 | Start by importing `BaseTool` from `agency_swarm.tools` and `Field` from `pydantic`. These imports will serve as the foundation for your custom tool class. Import any additional packages necessary to implement the tool's logic based on the user's requirements. Import `load_dotenv` from `dotenv` to load the environment variables.
86 | 
87 | #### 2. Define Your Tool Class
88 | 
89 | Create a new class that inherits from `BaseTool`. This class will encapsulate the functionality of your tool. The `BaseTool` class inherits from Pydantic's `BaseModel` class.
90 | 
91 | #### 3. Specify Tool Fields
92 | 
93 | Define the fields your tool will use, utilizing Pydantic's `Field` for clear descriptions and validation. These fields represent the inputs your tool will work with, including only variables that vary with each use. Define any constant variables globally.
94 | 
95 | #### 4. Implement the `run` Method
96 | 
97 | The `run` method is where your tool's logic is executed. Use the fields defined earlier to perform the tool's intended task. It must contain fully functional, correct Python code. It can utilize the various Python packages previously imported in step 1.
98 | 
99 | ### Best Practices
100 | 
101 | - **Identify Necessary Packages**: Determine the best packages or APIs to use for creating the tool based on the requirements.
102 | - **Documentation**: Ensure each class and method is well-documented. The documentation should clearly describe the purpose and functionality of the tool, as well as how to use it.
103 | - **Code Quality**: Write clean, readable, and efficient code. Adhere to the PEP 8 style guide for Python code.
104 | - **Web Research**: Utilize web browsing to identify the most relevant packages, APIs, or documentation necessary for implementing your tool's logic.
105 | - **Use Python Packages**: Prefer to use the various API wrapper packages and SDKs available on pip, rather than calling these APIs directly using requests.
106 | - **Expect API Keys to be defined as env variables**: If a tool requires an API key or an access token, it must be accessed from the environment using the os package within the `run` method's logic.
107 | - **Use global variables for constants**: If a tool requires a constant global variable that does not change from use to use (for example, ad_account_id, pull_request_id, etc.), define it as a constant global variable above the tool class, instead of inside a Pydantic `Field`.
108 | - **Add a test case at the bottom of the file**: Add a test case for each tool in an `if __name__ == "__main__":` block.
109 | 
110 | ### Example of a Tool
111 | 
112 | ```python
113 | from agency_swarm.tools import BaseTool
114 | from pydantic import Field
115 | import os
116 | from dotenv import load_dotenv
117 | 
118 | load_dotenv()  # always load the environment variables
119 | 
120 | account_id = "MY_ACCOUNT_ID"
121 | api_key = os.getenv("MY_API_KEY")  # or access_token = os.getenv("MY_ACCESS_TOKEN")
122 | 
123 | class MyCustomTool(BaseTool):
124 |     """
125 |     A brief description of what the custom tool does.
126 |     The docstring should clearly explain the tool's purpose and functionality.
127 |     It will be used by the agent to determine when to use this tool.
128 |     """
129 |     # Define the fields with descriptions using Pydantic Field
130 |     example_field: str = Field(
131 |         ..., description="Description of the example field, explaining its purpose and usage for the Agent."
132 |     )
133 | 
134 |     def run(self):
135 |         """
136 |         The implementation of the run method, where the tool's main functionality is executed.
137 |         This method should utilize the fields defined above to perform the task.
138 |         """
139 |         # Your custom tool logic goes here
140 |         # Example:
141 |         # do_something(self.example_field, api_key, account_id)
142 | 
143 |         # Return the result of the tool's operation as a string
144 |         return "Result of MyCustomTool operation"
145 | 
146 | if __name__ == "__main__":
147 |     tool = MyCustomTool(example_field="example value")
148 |     print(tool.run())
149 | ```
150 | 
151 | Remember, each tool code snippet you create must be fully ready to use. It must not contain any placeholders or hypothetical examples.
152 | 
153 | ## 2. Create agents
154 | 
155 | Agents are the core of the framework. Each agent has its own unique role and functionality and is designed to perform specific tasks. Each file for the agent must be named the same as the agent's name.
156 | 
157 | ### Agent Class
158 | 
159 | To create an agent, import `Agent` from `agency_swarm` and create a class that inherits from `Agent`. Inside the class you can adjust the following parameters:
160 | 
161 | ```python
162 | from agency_swarm import Agent
163 | 
164 | class CEO(Agent):
165 |     def __init__(self):
166 |         super().__init__(
167 |             name="CEO",
168 |             description="Responsible for client communication, task planning and management.",
169 |             instructions="./instructions.md",  # instructions for the agent
170 |             tools=[MyCustomTool],
171 |             temperature=0.5,
172 |             max_prompt_tokens=25000,
173 |         )
174 | ```
175 | 
176 | - Name: The agent's name, reflecting its role.
177 | - Description: A brief summary of the agent's responsibilities.
178 | - Instructions: Path to a markdown file containing detailed instructions for the agent.
179 | - Tools: A list of tools (extending BaseTool) that the agent can use. (Tools must not be initialized, so the agent can pass the parameters itself.)
180 | - Other Parameters: Additional settings like temperature, max_prompt_tokens, etc.
181 | 
182 | Make sure to create a separate folder for each agent, as described in the folder structure above. After creating the agent, you need to import it into the agency.py file.
183 | 
184 | #### instructions.md file
185 | 
186 | Each agent also needs to have an `instructions.md` file, which is the system prompt for the agent. Inside those instructions, you need to define the following:
187 | 
188 | - **Agent Role**: A description of the role of the agent.
189 | - **Goals**: A list of goals that the agent should achieve, aligned with the agency's mission.
190 | - **Process Workflow**: A step-by-step guide on how the agent should perform its tasks. Each step must be aligned with the other agents in the agency, and with the tools available to this agent.
191 | 
192 | Use the following template for the instructions.md file:
193 | 
194 | ```md
195 | # Agent Role
196 | 
197 | A description of the role of the agent.
198 | 
199 | # Goals
200 | 
201 | A list of goals that the agent should achieve, aligned with the agency's mission.
202 | 
203 | # Process Workflow
204 | 
205 | 1. Step 1
206 | 2. Step 2
207 | 3. Step 3
208 | ```
209 | 
210 | Write the agent's instructions in markdown format. They should include a description of the role and a specific step-by-step process that this agent needs to perform in order to execute its tasks. The process must also be aligned with all the other agents in the agency. Agents should be able to collaborate with each other to achieve the common goal of the agency.
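As an illustration only — the agent name, goals, and steps below are hypothetical, not part of the framework — a filled-in instructions.md might read:

```md
# Agent Role

You are a Research Assistant responsible for gathering information on topics assigned by the CEO agent.

# Goals

1. Find accurate, up-to-date information on each assigned topic.
2. Deliver concise summaries that other agents can act on.

# Process Workflow

1. Receive a research request from the CEO agent.
2. Use your tools to gather and verify relevant information.
3. Summarize the findings and report back to the CEO agent.
```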
211 | 
212 | #### Code Interpreter and FileSearch Options
213 | 
214 | To utilize the Code Interpreter tool (the Jupyter Notebook Execution environment, without Internet access) and the FileSearch tool (a Retrieval-Augmented Generation (RAG) capability provided by OpenAI):
215 | 
216 | 1. Import the tools:
217 | 
218 | ```python
219 | from agency_swarm.tools import CodeInterpreter, FileSearch
220 | 
221 | ```
222 | 
223 | 2. Add the tools to the agent's tools list:
224 | 
225 | ```python
226 | agent = Agent(
227 |     name="MyAgent",
228 |     tools=[CodeInterpreter, FileSearch],
229 |     # ... other agent parameters
230 | )
231 | 
232 | ```
233 | 
234 | ## 3. Create Agencies
235 | 
236 | Agencies are collections of agents that work together to achieve a common goal. They are defined in the `agency.py` file.
237 | 
238 | ### Agency Class
239 | 
240 | To create an agency, import `Agency` from `agency_swarm` and create an instance of it with your agency chart. In the constructor, you can adjust the following parameters:
241 | 
242 | ```python
243 | from agency_swarm import Agency
244 | from CEO import CEO
245 | from Developer import Developer
246 | from VirtualAssistant import VirtualAssistant
247 | 
248 | ceo = CEO()
249 | dev = Developer()
250 | va = VirtualAssistant()
251 | agency = Agency([
252 |     ceo,  # CEO will be the entry point for communication with the user
253 |     [ceo, dev],  # CEO can initiate communication with Developer
254 |     [ceo, va],  # CEO can initiate communication with Virtual Assistant
255 |     [dev, va]  # Developer can initiate communication with Virtual Assistant
256 | ],
257 | shared_instructions='agency_manifesto.md',  # shared instructions for all agents
258 | temperature=0.5,  # default temperature for all agents
259 | max_prompt_tokens=25000  # default max tokens in conversation history
260 | )
261 | 
262 | if __name__ == "__main__":
263 |     agency.run_demo()  # starts the agency in terminal
264 | ```
265 | 
266 | #### Communication Flows
267 | 
268 | In Agency Swarm, communication flows are directional, meaning they are established from left to right in the agency_chart definition. For instance, in the example above, the CEO can initiate a chat with the developer (dev), and the developer can respond in this chat. However, the developer cannot initiate a chat with the CEO. The developer can initiate a chat with the virtual assistant (va) and assign new tasks.
269 | 
270 | To allow agents to communicate with each other, simply add them in the second-level list inside the agency chart like this: `[ceo, dev], [ceo, va], [dev, va]`. The agent on the left will be able to communicate with the agent on the right.
271 | 
272 | #### Agency Manifesto
273 | 
274 | The agency manifesto is a file that contains shared instructions for all agents in the agency. It is a markdown file that is located in the agency folder. Please write the manifesto file when creating a new agency. Include the following:
275 | 
276 | - **Agency Description**: A brief description of the agency.
277 | - **Mission Statement**: A concise statement that encapsulates the purpose and guiding principles of the agency.
278 | - **Operating Environment**: A description of the operating environment of the agency.
279 | 
280 | # Notes
281 | 
282 | IMPORTANT: NEVER output code snippets or file contents in the chat. Always create or modify the actual files in the file system. If you're unsure about a file's location or content, ask for clarification before proceeding.
283 | 
284 | When creating or modifying files:
285 | 
286 | 1. Use the appropriate file creation or modification syntax (e.g., ```python:path/to/file.py for Python files).
287 | 2. Write the full content of the file, not just snippets or placeholders.
288 | 3. Ensure all necessary imports and dependencies are included.
289 | 4. Follow the specified file creation order rigorously: 1. tools, 2. agents, 3. agency, 4. requirements.txt.
290 | 
291 | If you find yourself about to output code in the chat, STOP and reconsider your approach. Always prioritize actual file creation and modification over chat explanations.
--------------------------------------------------------------------------------
/AI_DOCS/event_docs.md:
--------------------------------------------------------------------------------
1 | # Realtime API Events
2 | 
3 | 
4 | 
5 | - Session Configuration
6 |   • session.update
7 |     - Configures the connection-wide behavior of the conversation session
8 |     - Typically sent immediately after connecting
9 |     - Can be sent at any point to reconfigure behavior after the current response is complete
10 | 
11 | - Input Audio
12 |   • input_audio_buffer.append
13 |     - Appends audio data to the shared user input buffer
14 |     - Audio not processed until end of speech detected or manual response.create sent
15 |   • input_audio_buffer.clear
16 |     - Clears the current audio input buffer
17 |     - Does not impact responses already in progress
18 |   • input_audio_buffer.commit
19 |     - Commits current state of user input buffer to subscribed conversations
20 |     - Includes it as information for the next response
21 | 
22 | - Item Management (for establishing history or including non-audio item information)
23 |   • conversation.item.create
24 |     - Inserts a new item into the conversation
25 |     - Can be positioned according to previous_item_id
26 |     - Provides new input, tool responses, or historical information
27 |   • conversation.item.delete
28 |     - Removes an item from an existing conversation
29 |   • conversation.item.truncate
30 |     - Manually shortens text and/or audio content in a message
31 |     - Useful for situations with faster-than-realtime model generation
32 | 
33 | - Response Management
34 |   • response.create
35 |     - Initiates model processing of unprocessed conversation input
36 |     - Signifies the end of the caller's logical turn
37 |     - Must be called for text input, tool responses, none mode, etc.
38 |   • response.cancel
39 |     - Cancels an in-progress response
40 | 
41 | - Responses: commands sent by the /realtime endpoint to the caller
42 |   • session.created
43 |     - Sent upon successful connection establishment
44 |     - Provides a connection-specific ID for debugging or logging
45 |   • session.updated
46 |     - Sent in response to a session.update event
47 |     - Reflects changes made to the session configuration
48 | 
49 | - Caller Item Acknowledgement
50 |   • conversation.item.created
51 |     - Acknowledges insertion of a new conversation item
52 |   • conversation.item.deleted
53 |     - Acknowledges removal of an existing conversation item
54 |   • conversation.item.truncated
55 |     - Acknowledges truncation of an existing conversation item
56 | 
57 | - Response Flow
58 |   • response.created
59 |     - Notifies start of a new response for a conversation
60 |     - Snapshots input state and begins generation of new items
61 |   • response.done
62 |     - Notifies completion of response generation
63 |   • rate_limits.updated
64 |     - Sent after response.done
65 |     - Provides current rate limit information
66 | 
67 | - Item Flow in a Response
68 |   • response.output_item.added
69 |     - Notifies creation of a new, server-generated conversation item
70 |   • response.output_item.done
71 |     - Notifies completion of a new conversation item's addition
72 | 
73 | - Content Flow within Response Items
74 |   • response.content_part.added
75 |     - Notifies creation of a new content part within a conversation item
76 |   • response.content_part.done
77 |     - Signals completion of a newly created content part
78 |   • response.audio.delta
79 |     - Provides incremental update to binary audio data
80 |   • response.audio.done
81 |     - Signals completion of audio content part updates
82 |   • response.audio_transcript.delta
83 |     - Provides incremental update to audio transcription
84 |   • response.audio_transcript.done
85 |     - Signals completion of audio transcription updates
86 |   • response.text.delta
87 |     - Provides incremental update to text content
88 |   • response.text.done
89 |     - Signals completion of text content updates
90 |   • response.function_call_arguments.delta
91 |     - Provides incremental update to function call arguments
92 |   • response.function_call_arguments.done
93 |     - Signals completion of function call arguments
94 | 
95 | - User Input Audio
96 |   • input_audio_buffer.speech_started
97 |     - Notifies detection of speech start in input audio buffer
98 |   • input_audio_buffer.speech_stopped
99 |     - Notifies detection of speech end in input audio buffer
100 |   • conversation.item.input_audio_transcription.completed
101 |     - Notifies availability of input audio transcription
102 |   • conversation.item.input_audio_transcription.failed
103 |     - Notifies failure of input audio transcription
104 |   • input_audio_buffer.committed
105 |     - Acknowledges submission of user audio input buffer
106 |   • input_audio_buffer.cleared
107 |     - Acknowledges clearing of pending user audio input buffer
108 | 
109 | - Other
110 |   • error
111 |     - Indicates processing error in the session
112 |     - Includes detailed error message
113 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Realtime API Async Python Assistant
2 | 
3 | This project demonstrates the use of OpenAI's Realtime API to create an AI assistant capable of handling voice input, performing various tasks, and providing audio responses. It showcases the integration of tools, structured output responses, and real-time interaction.
4 | 
5 | ## Features
6 | 
7 | ### Core Functionality
8 | - Real-time voice interaction with an AI assistant
9 | - Asynchronous audio input and output handling
10 | - Custom tool execution based on user requests
11 | 
12 | ### Task Delegation & Communication
13 | - **Synchronous Communication**: Direct, immediate interaction with agents for quick tasks
14 | - **Asynchronous Task Delegation**: Long-running task delegation to agencies/agents
15 |   - Send messages to agency CEOs without waiting for responses
16 |   - Send messages to subordinate agents on behalf of CEOs
17 | - **Task Status Monitoring**: Check completion status and retrieve responses
18 | - Multiple specialized AI agent teams working collaboratively
19 | 
20 | ### Integration Services
21 | - Google Calendar integration for meeting schedule management
22 | - Gmail integration for email handling and drafting
23 | - Browser interaction for web-related tasks
24 | - File system operations (create, update, delete)
25 | 
26 | ## Available Tools
27 | 
28 | ### Agency Communication Tools
29 | - **SendMessage**: Synchronous communication with agencies/agents for quick tasks
30 |   - Direct interaction with immediate response
31 |   - Suitable for simple, fast-completing tasks
32 | 
33 | - **SendMessageAsync**: Asynchronous task delegation
34 |   - Initiates long-running tasks without waiting
35 |   - Returns immediately to allow other operations
36 | 
37 | - **GetResponse**: Task status and response retrieval
38 |   - Checks completion status of async tasks
39 |   - Retrieves agent responses when tasks complete
40 | 
41 | ### Google Workspace Integration
42 | - **FetchDailyMeetingSchedule**: Fetches and formats the user's daily meeting schedule from Google Calendar
43 | - **GetGmailSummary**: Provides a concise summary of unread Gmail messages from the past 48 hours
44 | - **DraftGmail**: Composes email drafts, either as a reply to an email from GetGmailSummary or as a new message
45 | 
46 | ### System Tools
47 | - **GetScreenDescription**: Captures and analyzes the current screen content for the assistant
48 | - **FileOps**:
49 |   - **CreateFile**: Generates new files with user-specified content
50 |   - **UpdateFile**: Modifies existing files with new content
51 |   - **DeleteFile**: Removes specified files from the system
52 | - **OpenBrowser**: Launches a web browser with a given URL
53 | - **GetCurrentDateTime**: Retrieves and reports the current date and time
54 | 
55 | ## Setup
56 | 
57 | ### MacOS Installation
58 | 
59 | 1. Install [Python 3.12](https://www.python.org/downloads/macos/)
60 | 2. Install [uv](https://docs.astral.sh/uv/), a modern Python package manager
61 | 3. Clone this repository to your local machine
62 | 4. Create a local environment file `.env` based on `.env.sample`
63 | 5. Customize `personalization.json` and `config.py` to your preferences
64 | 6. Install the required audio library: `brew install portaudio`
65 | 7. Install project dependencies: `uv sync`
66 | 8. Launch the assistant: `uv run main`
67 | 
68 | ### Google Cloud API Configuration
69 | 
70 | To enable Google Cloud API integration, follow these steps:
71 | 
72 | 1. Create OAuth 2.0 Client IDs in the Google Cloud Console
73 | 2. Place the `credentials.json` file in the project's root directory
74 | 3. Configure `http://localhost:8080/` as an Authorized Redirect URI in your Google Cloud project settings
75 | 4. Set the OAuth consent screen to "Internal" user type
76 | 5. Enable the following APIs and scopes in your Google Cloud project:
77 |    - Gmail API
78 |      - `https://www.googleapis.com/auth/gmail.readonly`
79 |      - `https://www.googleapis.com/auth/gmail.compose`
80 |      - `https://www.googleapis.com/auth/gmail.modify`
81 |    - Google Calendar API
82 |      - `https://www.googleapis.com/auth/calendar.readonly`
83 | 
84 | ## Configuration
85 | 
86 | The project relies on environment variables and a `personalization.json` file for configuration. Ensure you have set up:
87 | 
88 | - `OPENAI_API_KEY`: Your personal OpenAI API key
89 | - `PERSONALIZATION_FILE`: Path to your customized personalization JSON file
90 | - `SCRATCH_PAD_DIR`: Directory for temporary file storage
91 | 
92 | ## Usage
93 | 
94 | After launching the assistant, interact using voice commands. Example interactions:
95 | 
96 | 1. "What do I have on my schedule for today? Tell me only the most important meetings."
97 | 2. "Do I have any important emails?"
98 | 3. "Open ChatGPT in my browser."
99 | 4. "Create a new file named user_data.txt with some example content."
100 | 5. "Update the user_data.txt file by adding more information."
101 | 6. "Delete the user_data.txt file."
102 | 7. "Ask the research team to write a detailed market analysis report."
103 | 8. "Check if the research team has completed the market analysis report."
104 | 
105 | ## Code Structure
106 | 
107 | ### Core Components
108 | 
109 | - `main.py`: Application entry point
110 | - `agencies/`: Agency-Swarm teams of specialized agents
111 | - `tools/`: Standalone tools for various functions
112 | - `config.py`: Configuration settings and environment variable management
113 | - `visual_interface.py`: Visual interface for audio energy visualization
114 | - `websocket_handler.py`: WebSocket event and message processing
115 | 
116 | ### Key Features
117 | 
118 | 1. **Asynchronous WebSocket Communication**:
119 |    Utilizes `websockets` for asynchronous connection with the OpenAI Realtime API
120 | 
121 | 2. **Audio Input/Output Handling**:
122 |    Manages real-time audio capture and playback with PCM16 format support and VAD (Voice Activity Detection)
123 | 
124 | 3. **Function Execution**:
125 |    Standalone tools in `tools/` are invoked by the AI assistant based on user requests
126 | 
127 | 4. **Structured Output Processing**:
128 |    OpenAI's Structured Outputs are used to generate precise, structured responses
129 | 
130 | 5. **Visual Interface**:
131 |    PyGame-based interface provides real-time visualization of audio volume
132 | 
133 | ## Extending Functionality
134 | 
135 | ### Adding Standalone Tools
136 | 
137 | Standalone tools are independent functions not associated with specific agents or agencies.
138 | 
139 | To add a new standalone tool:
140 | 1. Create a new file in the `tools/` directory
141 | 2. Implement the `run` method using async syntax, utilizing `asyncio.to_thread` for blocking operations
142 | 3. Install any necessary dependencies: `uv add <package-name>`
143 | 
144 | ### Adding New Agencies
145 | 
146 | Agencies are Agency-Swarm style teams of specialized agents working together on complex tasks.
147 | 
148 | To add a new agency:
149 | 1. Drag-and-drop your agency folder into the `agencies/` directory
150 | 2. Set `async_mode="threading"` in the agency configuration to enable async messaging (SendMessageAsync and GetResponse)
151 | 3. Install any required dependencies: `uv add <package-name>`
152 | 
153 | ## Additional Resources
154 | 
155 | - [OpenAI Realtime API Documentation](https://platform.openai.com/docs/guides/realtime)
156 | - [OpenAI Structured Outputs Guide](https://platform.openai.com/docs/guides/structured-outputs)
157 | - [WebSockets Library for Python](https://websockets.readthedocs.io/)
--------------------------------------------------------------------------------
/personalization.json:
--------------------------------------------------------------------------------
1 | {
2 |   "browser": "chrome",
3 |   "ai_assistant_name": "Sky",
4 |   "user_name": "VRSEN",
5 |   "assistant_instructions": "You are {ai_assistant_name}, a concise and efficient **voice assistant** for {user_name}.\nKey points:\n1. Provide brief, rapid responses.\n2. Immediately utilize available functions when appropriate, except for destructive actions.\n3. Immediately relay subordinate agent responses. Sometimes it may make sense to wait for the subordinate agent to respond before continuing.\n4. If you find yourself providing a long response, STOP and ask if the user still wants you to continue."
6 | }
7 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "voice-assistant"
3 | version = "0.1.0"
4 | description = "Agency Swarm Voice Interface"
5 | readme = "README.md"
6 | requires-python = ">=3.12"
7 | dependencies = [
8 |     "agency-swarm==0.4.4",
9 |     "aiohttp>=3.10.10",
10 |     "google-api-python-client>=2.149.0",
11 |     "google-auth-httplib2>=0.2.0",
12 |     "google-auth-oauthlib>=1.2.1",
13 |     "numpy",
14 |     "openai",
15 |     "pillow>=10.4.0",
16 |     "pyaudio",
17 |     "pygame>=2.6.1",
18 |     "python-dotenv>=1.0.1",
19 |     "selenium-stealth>=1.0.6",
20 |     "selenium>=4.25.0",
21 |     "webdriver-manager>=4.0.2",
22 |     "websockets",
23 | ]
24 | 
25 | [build-system]
26 | requires = ["hatchling"]
27 | build-backend = "hatchling.build"
28 | 
29 | [project.scripts]
30 | main = "voice_assistant.main:main"
--------------------------------------------------------------------------------
/src/voice_assistant/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VRSEN/agency-voice-interface/2d9d39ce02d9cb9628e8de79b3543fe05885ad42/src/voice_assistant/__init__.py
--------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/AnalystAgent/AnalystAgent.py:
--------------------------------------------------------------------------------
1 | from agency_swarm import Agent
2 | from agency_swarm.tools import CodeInterpreter, FileSearch
3 | 
4 | 
5 | class AnalystAgent(Agent):
6 |     def __init__(self):
7 |         super().__init__(
8 |             name="AnalystAgent",
9 |             description="Analyzes data, generates insights, and performs complex calculations using code interpreter and file search capabilities.",
10 |             instructions="./instructions.md",
11 |             tools=[CodeInterpreter, FileSearch],
12 |             temperature=0.0,
13 |             max_prompt_tokens=25000,
14 |         )
15 | 
--------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/AnalystAgent/instructions.md:
--------------------------------------------------------------------------------
1 | # Agent Role
2 | 
3 | As an Analyst Agent, your role is to analyze data, generate insights, and perform complex calculations to support the research process. You have access to both code execution and file search capabilities to enhance your analysis.
4 | 
5 | # Goals
6 | 
7 | 1. Analyze data and generate meaningful insights
8 | 2. Perform complex calculations and data manipulations
9 | 3. Search through relevant files and documentation for context
10 | 4. Support other agents with data-driven decision making
11 | 5. Create visualizations and reports when needed
12 | 
13 | # Process Workflow
14 | 
15 | 1. When receiving a task, first assess if you need to:
16 |    - Search existing files for context (using FileSearch)
17 |    - Execute code for analysis (using CodeInterpreter)
18 |    - Both of the above
19 | 
20 | 2. If searching files:
21 |    - Use FileSearch to locate relevant documentation or data
22 |    - Extract and summarize key information
23 |    - Consider how this information affects the analysis
24 | 
25 | 3. If performing analysis:
26 |    - Use CodeInterpreter to write and execute analytical code
27 |    - Ensure code is well-documented and efficient
28 |    - Generate visualizations when they would aid understanding
29 |    - Validate results before sharing
30 | 
31 | 4. When collaborating with other agents:
32 |    - Provide clear explanations of your findings
33 |    - Include relevant code snippets or file references
34 |    - Make specific recommendations based on your analysis
35 | 
36 | 5. Always:
37 |    - Document your methodology
38 |    - Explain your reasoning
39 |    - Highlight any assumptions or limitations
40 |    - Suggest next steps or areas for further investigation
41 | 
--------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/BrowsingAgent.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import re
3 | 
4 | from agency_swarm.agents import Agent
5 | from typing_extensions import override
6 | 
7 | 
8 | class BrowsingAgent(Agent):
9 |     SCREENSHOT_FILE_NAME = "screenshot.jpg"
10 | 
11 |     def __init__(self, selenium_config=None, **kwargs):
12 |         from .tools.util.selenium import set_selenium_config
13 | 
14 |         super().__init__(
15 |             name="BrowsingAgent",
16 |             description="This agent is designed to navigate and search the web effectively.",
17 |             instructions="./instructions.md",
18 |             files_folder="./files",
19 |             schemas_folder="./schemas",
20 |             tools=[],
21 |             tools_folder="./tools",
22 |             temperature=0,
23 |             max_prompt_tokens=16000,
24 |             model="gpt-4o",
25 |             validation_attempts=25,
26 |             **kwargs,
27 |         )
28 |         if selenium_config is not None:
29 |             set_selenium_config(selenium_config)
30 | 
31 |         self.prev_message = ""
32 | 
33 |     @override
34 |     def response_validator(self, message):
35 |         from selenium.webdriver.common.by import By
36 |         from selenium.webdriver.support.select import Select
37 | 
38 |         from .tools.util import (
39 |             highlight_elements_with_labels,
40 |             remove_highlight_and_labels,
41 |         )
42 |         from .tools.util.selenium import get_web_driver, set_web_driver
43 | 
44 |         # Filter out everything in square brackets
45 |         filtered_message = re.sub(r"\[.*?\]", "", message).strip()
46 | 
47 |         if filtered_message and self.prev_message == filtered_message:
48 |             raise ValueError(
49 |                 "Do not repeat yourself. If you are stuck, try a different approach or search Google directly for the page you are looking for."
50 | ) 51 | 52 | self.prev_message = filtered_message 53 | 54 | if "[send screenshot]" in message.lower(): 55 | wd = get_web_driver() 56 | remove_highlight_and_labels(wd) 57 | self.take_screenshot() 58 | response_text = "Here is the screenshot of the current web page:" 59 | 60 | elif "[highlight clickable elements]" in message.lower(): 61 | wd = get_web_driver() 62 | highlight_elements_with_labels( 63 | wd, 64 | 'a, button, div[onclick], div[role="button"], div[tabindex], ' 65 | 'span[onclick], span[role="button"], span[tabindex]', 66 | ) 67 | self._shared_state.set( 68 | "elements_highlighted", 69 | 'a, button, div[onclick], div[role="button"], div[tabindex], ' 70 | 'span[onclick], span[role="button"], span[tabindex]', 71 | ) 72 | 73 | self.take_screenshot() 74 | 75 | all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element") 76 | 77 | all_element_texts = [element.text for element in all_elements] 78 | 79 | element_texts_json = {} 80 | for i, element_text in enumerate(all_element_texts): 81 | element_texts_json[str(i + 1)] = self.remove_unicode(element_text) 82 | 83 | element_texts_json = {k: v for k, v in element_texts_json.items() if v} 84 | 85 | element_texts_formatted = ", ".join( 86 | [f"{k}: {v}" for k, v in element_texts_json.items()] 87 | ) 88 | 89 | response_text = ( 90 | "Here is the screenshot of the current web page with highlighted clickable elements. \n\n" 91 | "Texts of the elements are: " + element_texts_formatted + ".\n\n" 92 | "Elements without text are not shown, but are available on screenshot. \n" 93 | "Please make sure to analyze the screenshot to find the clickable element you need to click on." 94 | ) 95 | 96 | elif "[highlight text fields]" in message.lower(): 97 | wd = get_web_driver() 98 | highlight_elements_with_labels(wd, "input, textarea") 99 | self._shared_state.set("elements_highlighted", "input, textarea") 100 | 101 | self.take_screenshot() 102 | 103 | all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element") 104 | 105 | all_element_texts = [element.text for element in all_elements] 106 | 107 | element_texts_json = {} 108 | for i, element_text in enumerate(all_element_texts): 109 | element_texts_json[str(i + 1)] = self.remove_unicode(element_text) 110 | 111 | element_texts_formatted = ", ".join( 112 | [f"{k}: {v}" for k, v in element_texts_json.items()] 113 | ) 114 | 115 | response_text = ( 116 | "Here is the screenshot of the current web page with highlighted text fields: \n" 117 | "Texts of the elements are: " + element_texts_formatted + ".\n" 118 | "Please make sure to analyze the screenshot to find the text field you need to fill." 
119 |             )
120 | 
121 |         elif "[highlight dropdowns]" in message.lower():
122 |             wd = get_web_driver()
123 |             highlight_elements_with_labels(wd, "select")
124 |             self._shared_state.set("elements_highlighted", "select")
125 | 
126 |             self.take_screenshot()
127 | 
128 |             all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element")
129 | 
130 |             all_selector_values = {}
131 | 
132 |             # Number each dropdown sequentially so the keys match the on-screen labels
133 |             for i, element in enumerate(all_elements):
134 |                 select = Select(element)
135 |                 options = select.options
136 |                 selector_values = {}
137 |                 for j, option in enumerate(options):
138 |                     selector_values[str(j)] = option.text
139 |                     if j > 10:
140 |                         break
141 |                 all_selector_values[str(i + 1)] = selector_values
142 | 
143 |             all_selector_values = {k: v for k, v in all_selector_values.items() if v}
144 |             all_selector_values_formatted = ", ".join(
145 |                 [f"{k}: {v}" for k, v in all_selector_values.items()]
146 |             )
147 | 
148 |             response_text = (
149 |                 "Here is the screenshot with highlighted dropdowns. \n"
150 |                 "Selector values are: " + all_selector_values_formatted + ".\n"
151 |                 "Please make sure to analyze the screenshot to find the dropdown you need to select."
152 |             )
153 | 
154 |         else:
155 |             return message
156 | 
157 |         set_web_driver(wd)
158 |         content = self.create_response_content(response_text)
159 |         raise ValueError(content)
160 | 
161 |     def take_screenshot(self):
162 |         from .tools.util import get_b64_screenshot
163 |         from .tools.util.selenium import get_web_driver
164 | 
165 |         wd = get_web_driver()
166 |         screenshot = get_b64_screenshot(wd)
167 |         screenshot_data = base64.b64decode(screenshot)
168 |         with open(self.SCREENSHOT_FILE_NAME, "wb") as screenshot_file:
169 |             screenshot_file.write(screenshot_data)
170 | 
171 |     def create_response_content(self, response_text):
172 |         with open(self.SCREENSHOT_FILE_NAME, "rb") as file:
173 |             file_id = self.client.files.create(
174 |                 file=file,
175 |                 purpose="vision",
176 |             ).id
177 | 
178 |         content = [
179 |             {"type": "text", "text": response_text},
180 |             {"type": "image_file", "image_file": {"file_id": file_id}},
181 |         ]
182 |         return content
183 | 
184 |     # Strip all non-ASCII characters from the element text
185 |     def remove_unicode(self, data):
186 |         return re.sub(r"[^\x00-\x7F]+", "", data)
187 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/instructions.md:
--------------------------------------------------------------------------------
1 | # Browsing Agent Instructions
2 | 
3 | As an advanced browsing agent, you are equipped with specialized tools to navigate and search the web effectively. Your primary objective is to fulfill the user's requests by efficiently utilizing these tools.
4 | 
5 | ### Primary Instructions:
6 | 
7 | 1. **Avoid Guessing URLs**: Never attempt to guess the direct URL. Always perform a Google search if applicable, or return to your previous search results.
8 | 2. **Navigating to New Pages**: Always use the `ClickElement` tool to open links when navigating to a new web page from the current source. Do not guess the direct URL.
9 | 3. **Single Page Interaction**: You can only open and interact with one web page at a time. The previous web page will be closed when you open a new one. To navigate back, use the `GoBack` tool.
10 | 4. **Requesting Screenshots**: Before using tools that interact with the web page, ask the user to send you the appropriate screenshot using one of the commands below.
11 | 
12 | ### Commands to Request Screenshots:
13 | 
14 | - **'[send screenshot]'**: Sends the current browsing window as an image. Use this command if the user asks what is on the page.
15 | - **'[highlight clickable elements]'**: Highlights all clickable elements on the current web page. This must be done before using the `ClickElement` tool.
16 | - **'[highlight text fields]'**: Highlights all text fields on the current web page. This must be done before using the `SendKeys` tool.
17 | - **'[highlight dropdowns]'**: Highlights all dropdowns on the current web page. This must be done before using the `SelectDropdown` tool.
18 | 
19 | ### Important Reminders:
20 | 
21 | - Only open and interact with one web page at a time. Do not attempt to read or click on multiple links simultaneously. Complete your interactions with the current web page before proceeding to a different source.
22 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/requirements.txt:
--------------------------------------------------------------------------------
1 | selenium
2 | webdriver-manager
3 | selenium_stealth
4 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/ClickElement.py:
--------------------------------------------------------------------------------
1 | import time
2 | 
3 | from agency_swarm.tools import BaseTool
4 | from pydantic import Field
5 | from selenium.webdriver.common.by import By
6 | 
7 | from .util import get_web_driver, set_web_driver
8 | from .util.highlights import remove_highlight_and_labels
9 | 
10 | 
11 | class ClickElement(BaseTool):
12 |     """
13 |     This tool clicks on an element on the current web page based on its number.
14 | 
15 |     Before using this tool make sure to highlight clickable elements on the page by outputting '[highlight clickable elements]' message.
16 |     """
17 | 
18 |     element_number: int = Field(
19 |         ...,
20 |         description="The number of the element to click on. The element numbers are displayed on the page after highlighting elements.",
21 |     )
22 | 
23 |     def run(self):
24 |         wd = get_web_driver()
25 | 
26 |         if "button" not in self._shared_state.get("elements_highlighted", ""):
27 |             raise ValueError(
28 |                 "Please highlight clickable elements on the page first by outputting '[highlight clickable elements]' message. You must output just the message without calling the tool first, so the user can respond with the screenshot."
29 |             )
30 | 
31 |         all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element")
32 | 
33 |         # Look up the element by its on-screen label; labels start at 1, list indices at 0
34 |         try:
35 |             element_text = all_elements[self.element_number - 1].text
36 |             element_text = element_text.strip() if element_text else ""
37 |             # Try a native click first; fall back to a JavaScript click if it is intercepted
38 |             try:
39 |                 all_elements[self.element_number - 1].click()
40 |             except Exception as e:
41 |                 if "element click intercepted" in str(e).lower():
42 |                     wd.execute_script(
43 |                         "arguments[0].click();", all_elements[self.element_number - 1]
44 |                     )
45 |                 else:
46 |                     raise e
47 | 
48 |             time.sleep(3)
49 | 
50 |             result = f"Clicked on element {self.element_number}. Text on clicked element: '{element_text}'. Current URL is {wd.current_url}. To further analyze the page, output '[send screenshot]' command."
51 |         except IndexError:
52 |             result = "Element number is invalid. Please try again with a valid element number."
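        # Any other Selenium error (e.g. a stale element) is returned to the agent
        # as plain text below, so it can adjust its next action instead of crashing.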
53 |         except Exception as e:
54 |             result = str(e)
55 | 
56 |         wd = remove_highlight_and_labels(wd)
57 | 
58 |         wd.execute_script("document.body.style.zoom='1.5'")
59 | 
60 |         set_web_driver(wd)
61 | 
62 |         self._shared_state.set("elements_highlighted", "")
63 | 
64 |         return result
65 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/ExportFile.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import os
3 | 
4 | from agency_swarm.tools import BaseTool
5 | 
6 | from .util import get_web_driver
7 | 
8 | 
9 | class ExportFile(BaseTool):
10 |     """This tool converts the current full web page into a file and returns its file_id. You can then send this file id back to the user for further processing."""
11 | 
12 |     def run(self):
13 |         wd = get_web_driver()
14 |         from agency_swarm import get_openai_client
15 | 
16 |         client = get_openai_client()
17 | 
18 |         # Define the parameters for the PDF
19 |         params = {
20 |             "landscape": False,
21 |             "displayHeaderFooter": False,
22 |             "printBackground": True,
23 |             "preferCSSPageSize": True,
24 |         }
25 | 
26 |         # Execute the command to print to PDF
27 |         result = wd.execute_cdp_cmd("Page.printToPDF", params)
28 |         pdf = result["data"]
29 | 
30 |         pdf_bytes = base64.b64decode(pdf)
31 | 
32 |         # Save the PDF to a file
33 |         with open("exported_file.pdf", "wb") as f:
34 |             f.write(pdf_bytes)
35 | 
36 |         with open("exported_file.pdf", "rb") as pdf_file:
37 |             file_id = client.files.create(
38 |                 file=pdf_file, purpose="assistants"
39 |             ).id
40 | 
41 |         self._shared_state.set("file_id", file_id)
42 | 
43 |         return (
44 |             "Success. File exported with id: `"
45 |             + file_id
46 |             + "` You can now send this file id back to the user."
47 |         )
48 | 
49 | 
50 | if __name__ == "__main__":
51 |     wd = get_web_driver()
52 |     wd.get("https://www.google.com")
53 |     tool = ExportFile()
54 |     tool.run()
55 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/GoBack.py:
--------------------------------------------------------------------------------
1 | import time
2 | 
3 | from agency_swarm.tools import BaseTool
4 | 
5 | from .util.selenium import get_web_driver, set_web_driver
6 | 
7 | 
8 | class GoBack(BaseTool):
9 |     """
10 |     This tool allows you to go back 1 page in the browser history. Use it in case of a mistake or if a page shows you unexpected content.
11 |     """
12 | 
13 |     def run(self):
14 |         wd = get_web_driver()
15 | 
16 |         wd.back()
17 | 
18 |         time.sleep(3)
19 | 
20 |         set_web_driver(wd)
21 | 
22 |         return "Success. Went back 1 page. Current URL is: " + wd.current_url
23 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/ReadURL.py:
--------------------------------------------------------------------------------
1 | import time
2 | 
3 | from agency_swarm.tools import BaseTool
4 | from pydantic import Field
5 | 
6 | from .util.selenium import get_web_driver, set_web_driver
7 | 
8 | 
9 | class ReadURL(BaseTool):
10 |     """
11 |     This tool reads a single URL and opens it in your current browser window. For each new source, either navigate directly to a URL that you believe contains the answer to the user's question or perform a Google search (e.g., 'https://google.com/search?q=search') if necessary.
12 | 
13 |     If you are unsure of the direct URL, do not guess.
Instead, use the ClickElement tool to click on links that might contain the desired information on the current web page. 14 | 15 | Note: This tool only supports opening one URL at a time. The previous URL will be closed when you open a new one. 16 | """ 17 | 18 | chain_of_thought: str = Field( 19 | ..., 20 | description="Think step-by-step about where you need to navigate next to find the necessary information.", 21 | exclude=True, 22 | ) 23 | url: str = Field( 24 | ..., 25 | description="URL of the webpage.", 26 | examples=["https://google.com/search?q=search"], 27 | ) 28 | 29 | class ToolConfig: 30 | one_call_at_a_time: bool = True 31 | 32 | def run(self): 33 | wd = get_web_driver() 34 | 35 | wd.get(self.url) 36 | 37 | time.sleep(2) 38 | 39 | set_web_driver(wd) 40 | 41 | self._shared_state.set("elements_highlighted", "") 42 | 43 | return ( 44 | "Current URL is: " 45 | + wd.current_url 46 | + "\n" 47 | + "Please output '[send screenshot]' next to analyze the current web page or '[highlight clickable elements]' for further navigation." 48 | ) 49 | 50 | 51 | if __name__ == "__main__": 52 | tool = ReadURL(url="https://google.com") 53 | print(tool.run()) 54 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/Scroll.py: -------------------------------------------------------------------------------- 1 | from typing import Literal 2 | 3 | from agency_swarm.tools import BaseTool 4 | from pydantic import Field 5 | 6 | from .util.selenium import get_web_driver, set_web_driver 7 | 8 | 9 | class Scroll(BaseTool): 10 | """ 11 | This tool allows you to scroll the current web page up or down by 1 screen height. 12 | """ 13 | 14 | direction: Literal["up", "down"] = Field(..., description="Direction to scroll.") 15 | 16 | def run(self): 17 | wd = get_web_driver() 18 | 19 | height = wd.get_window_size()["height"] 20 | 21 | # Get the zoom level 22 | zoom_level = wd.execute_script("return document.body.style.zoom || '1';") 23 | zoom_level = ( 24 | float(zoom_level.strip("%")) / 100 25 | if "%" in zoom_level 26 | else float(zoom_level) 27 | ) 28 | 29 | # Adjust height by zoom level 30 | adjusted_height = height / zoom_level 31 | 32 | current_scroll_position = wd.execute_script("return window.pageYOffset;") 33 | total_scroll_height = wd.execute_script("return document.body.scrollHeight;") 34 | 35 | result = "" 36 | 37 | if self.direction == "up": 38 | if current_scroll_position == 0: 39 | # Reached the top of the page 40 | result = "Reached the top of the page. Cannot scroll up any further.\n" 41 | else: 42 | wd.execute_script(f"window.scrollBy(0, -{adjusted_height});") 43 | result = "Scrolled up by 1 screen height. Make sure to output '[send screenshot]' command to analyze the page after scrolling." 44 | 45 | elif self.direction == "down": 46 | if current_scroll_position + adjusted_height >= total_scroll_height: 47 | # Reached the bottom of the page 48 | result = ( 49 | "Reached the bottom of the page. Cannot scroll down any further.\n" 50 | ) 51 | else: 52 | wd.execute_script(f"window.scrollBy(0, {adjusted_height});") 53 | result = "Scrolled down by 1 screen height. Make sure to output '[send screenshot]' command to analyze the page after scrolling." 
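                # The scroll distance uses the zoom-adjusted height because window.scrollBy
                # works in CSS pixels: the page zoom applied elsewhere (set_web_driver sets
                # 1.2, ClickElement sets 1.5) rescales what one "screen" of content means.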
54 | 
55 |         set_web_driver(wd)
56 | 
57 |         return result
58 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/SelectDropdown.py:
--------------------------------------------------------------------------------
1 | from typing import Dict
2 | 
3 | from agency_swarm.tools import BaseTool
4 | from pydantic import Field, model_validator
5 | from selenium.webdriver.common.by import By
6 | from selenium.webdriver.support.select import Select
7 | 
8 | from .util import get_web_driver, set_web_driver
9 | from .util.highlights import remove_highlight_and_labels
10 | 
11 | 
12 | class SelectDropdown(BaseTool):
13 |     """
14 |     This tool selects an option in a dropdown on the current web page based on its sequence number and the index of the option to select.
15 | 
16 |     Before using this tool make sure to highlight dropdown elements on the page by outputting '[highlight dropdowns]' message.
17 |     """
18 | 
19 |     key_value_pairs: Dict[str, str] = Field(
20 |         ...,
21 |         description="A dictionary where the key is the sequence number of the dropdown element and the value is the index of the option to select.",
22 |         examples=[{"1": "0", "2": "1"}, {"3": "2"}],
23 |     )
24 | 
25 |     @model_validator(mode="before")
26 |     @classmethod
27 |     def check_key_value_pairs(cls, data):
28 |         if not data.get("key_value_pairs"):
29 |             raise ValueError(
30 |                 "key_value_pairs is required. Example format: "
31 |                 "key_value_pairs={'1': '0', '2': '1'}"
32 |             )
33 |         return data
34 | 
35 |     def run(self):
36 |         wd = get_web_driver()
37 | 
38 |         if "select" not in self._shared_state.get("elements_highlighted", ""):
39 |             raise ValueError(
40 |                 "Please highlight dropdown elements on the page first by outputting '[highlight dropdowns]' message. You must output just the message without calling the tool first, so the user can respond with the screenshot."
41 |             )
42 | 
43 |         all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element")
44 | 
45 |         try:
46 |             for key, value in self.key_value_pairs.items():
47 |                 key = int(key)
48 |                 element = all_elements[key - 1]
49 | 
50 |                 select = Select(element)
51 | 
52 |                 # Select the option at the requested index for this dropdown
53 |                 select.select_by_index(int(value))
54 |                 result = "Success. Option is selected in the dropdown. To further analyze the page, output '[send screenshot]' command."
55 |         except Exception as e:
56 |             result = str(e)
57 | 
58 |         remove_highlight_and_labels(wd)
59 | 
60 |         set_web_driver(wd)
61 | 
62 |         return result
63 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/SendKeys.py:
--------------------------------------------------------------------------------
1 | import time
2 | from typing import Dict
3 | 
4 | from agency_swarm.tools import BaseTool
5 | from pydantic import Field, model_validator
6 | from selenium.webdriver import Keys
7 | from selenium.webdriver.common.by import By
8 | 
9 | from .util import get_web_driver, set_web_driver
10 | from .util.highlights import remove_highlight_and_labels
11 | 
12 | 
13 | class SendKeys(BaseTool):
14 |     """
15 |     This tool sends keys into input fields on the current webpage based on the description of that element and what needs to be typed. It then clicks "Enter" on the last element to submit the form. You do not need to tell it to press "Enter"; it will do that automatically.
16 | 17 | Before using this tool make sure to highlight the input elements on the page by outputting '[highlight text fields]' message. 18 | """ 19 | 20 | elements_and_texts: Dict[int, str] = Field( 21 | ..., 22 | description="A dictionary where the key is the element number and the value is the text to be typed.", 23 | examples=[ 24 | {52: "johndoe@gmail.com", 53: "password123"}, 25 | {3: "John Doe", 4: "123 Main St"}, 26 | ], 27 | ) 28 | 29 | @model_validator(mode="before") 30 | @classmethod 31 | def check_elements_and_texts(cls, data): 32 | if not data.get("elements_and_texts"): 33 | raise ValueError( 34 | "elements_and_texts is required. Example format: " 35 | "elements_and_texts={1: 'John Doe', 2: '123 Main St'}" 36 | ) 37 | return data 38 | 39 | def run(self): 40 | wd = get_web_driver() 41 | if "input" not in self._shared_state.get("elements_highlighted", ""): 42 | raise ValueError( 43 | "Please highlight input elements on the page first by outputting '[highlight text fields]' message. You must output just the message without calling the tool first, so the user can respond with the screenshot." 44 | ) 45 | 46 | all_elements = wd.find_elements(By.CSS_SELECTOR, ".highlighted-element") 47 | 48 | i = 0 49 | try: 50 | for key, value in self.elements_and_texts.items(): 51 | key = int(key) 52 | element = all_elements[key - 1] 53 | 54 | try: 55 | element.click() 56 | element.send_keys(Keys.CONTROL + "a") # Select all text in input 57 | element.send_keys(Keys.DELETE) 58 | element.clear() 59 | except Exception as e: 60 | pass 61 | element.send_keys(value) 62 | # send enter key to the last element 63 | if i == len(self.elements_and_texts) - 1: 64 | element.send_keys(Keys.RETURN) 65 | time.sleep(3) 66 | i += 1 67 | result = f"Sent input to element and pressed Enter. Current URL is {wd.current_url} To further analyze the page, output '[send screenshot]' command." 68 | except Exception as e: 69 | result = str(e) 70 | 71 | remove_highlight_and_labels(wd) 72 | 73 | set_web_driver(wd) 74 | 75 | return result 76 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/SolveCaptcha.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import time 3 | 4 | from agency_swarm.tools import BaseTool 5 | from agency_swarm.util import get_openai_client 6 | from selenium.webdriver.common.by import By 7 | from selenium.webdriver.support.expected_conditions import ( 8 | frame_to_be_available_and_switch_to_it, 9 | presence_of_element_located, 10 | ) 11 | from selenium.webdriver.support.wait import WebDriverWait 12 | 13 | from .util import get_b64_screenshot, remove_highlight_and_labels 14 | from .util.selenium import get_web_driver 15 | 16 | 17 | class SolveCaptcha(BaseTool): 18 | """ 19 | This tool asks a human to solve captcha on the current webpage. Make sure that captcha is visible before running it. 
20 | """ 21 | 22 | def run(self): 23 | wd = get_web_driver() 24 | 25 | try: 26 | WebDriverWait(wd, 10).until( 27 | frame_to_be_available_and_switch_to_it( 28 | (By.XPATH, "//iframe[@title='reCAPTCHA']") 29 | ) 30 | ) 31 | 32 | element = WebDriverWait(wd, 3).until( 33 | presence_of_element_located((By.ID, "recaptcha-anchor")) 34 | ) 35 | except Exception as e: 36 | return "Could not find captcha checkbox" 37 | 38 | try: 39 | # Scroll the element into view 40 | wd.execute_script("arguments[0].scrollIntoView(true);", element) 41 | time.sleep(1) # Give some time for the scrolling to complete 42 | 43 | # Click the element using JavaScript 44 | wd.execute_script("arguments[0].click();", element) 45 | except Exception as e: 46 | return f"Could not click captcha checkbox: {str(e)}" 47 | 48 | try: 49 | # Now check if the reCAPTCHA is checked 50 | WebDriverWait(wd, 3).until( 51 | lambda d: d.find_element( 52 | By.CLASS_NAME, "recaptcha-checkbox" 53 | ).get_attribute("aria-checked") 54 | == "true" 55 | ) 56 | 57 | return "Success" 58 | except Exception as e: 59 | pass 60 | 61 | wd.switch_to.default_content() 62 | 63 | client = get_openai_client() 64 | 65 | WebDriverWait(wd, 10).until( 66 | frame_to_be_available_and_switch_to_it( 67 | ( 68 | By.XPATH, 69 | "//iframe[@title='recaptcha challenge expires in two minutes']", 70 | ) 71 | ) 72 | ) 73 | 74 | time.sleep(2) 75 | 76 | attempts = 0 77 | while attempts < 5: 78 | tiles = wd.find_elements(By.CLASS_NAME, "rc-imageselect-tile") 79 | 80 | # filter out tiles with rc-imageselect-dynamic-selected class 81 | tiles = [ 82 | tile 83 | for tile in tiles 84 | if not tile.get_attribute("class").endswith( 85 | "rc-imageselect-dynamic-selected" 86 | ) 87 | ] 88 | 89 | image_content = [] 90 | i = 0 91 | for tile in tiles: 92 | i += 1 93 | screenshot = get_b64_screenshot(wd, tile) 94 | 95 | image_content.append( 96 | { 97 | "type": "text", 98 | "text": f"Image {i}:", 99 | } 100 | ) 101 | image_content.append( 102 | { 103 | "type": "image_url", 104 | "image_url": { 105 | "url": f"data:image/jpeg;base64,{screenshot}", 106 | "detail": "high", 107 | }, 108 | }, 109 | ) 110 | # highlight all titles with rc-imageselect-tile class but not with rc-imageselect-dynamic-selected 111 | # wd = highlight_elements_with_labels(wd, 'td.rc-imageselect-tile:not(.rc-imageselect-dynamic-selected)') 112 | 113 | # screenshot = get_b64_screenshot(wd, wd.find_element(By.ID, "rc-imageselect")) 114 | 115 | task_text = ( 116 | wd.find_element(By.CLASS_NAME, "rc-imageselect-instructions") 117 | .text.strip() 118 | .replace("\n", " ") 119 | ) 120 | 121 | continuous_task = "once there are none left" in task_text.lower() 122 | 123 | task_text = task_text.replace("Click verify", "Output 0") 124 | task_text = task_text.replace("click skip", "Output 0") 125 | task_text = task_text.replace("once", "if") 126 | task_text = task_text.replace("none left", "none") 127 | task_text = task_text.replace("all", "only") 128 | task_text = task_text.replace("squares", "images") 129 | 130 | additional_info = "" 131 | if len(tiles) > 9: 132 | additional_info = ( 133 | "Keep in mind that all images are a part of a bigger image " 134 | "from left to right, and top to bottom. The grid is 4x4. " 135 | ) 136 | 137 | messages = [ 138 | { 139 | "role": "system", 140 | "content": f"""You are an advanced AI designed to support users with visual impairments. 141 | User will provide you with {i} images numbered from 1 to {i}. 
Your task is to output 142 | the numbers of the images that contain the requested object, or at least some part of the requested 143 | object. {additional_info}If there are no individual images that satisfy this condition, output 0. 144 | """.replace("\n", ""), 145 | }, 146 | { 147 | "role": "user", 148 | "content": [ 149 | *image_content, 150 | { 151 | "type": "text", 152 | "text": f"{task_text}. Only output numbers separated by commas and nothing else. " 153 | f"Output 0 if there are none.", 154 | }, 155 | ], 156 | }, 157 | ] 158 | 159 | response = client.chat.completions.create( 160 | model="gpt-4o", 161 | messages=messages, 162 | max_tokens=1024, 163 | temperature=0.0, 164 | ) 165 | 166 | message = response.choices[0].message 167 | message_text = message.content 168 | 169 | # check if 0 is in the message 170 | if "0" in message_text and "10" not in message_text: 171 | # Find the button by its ID 172 | verify_button = wd.find_element(By.ID, "recaptcha-verify-button") 173 | 174 | verify_button_text = verify_button.text 175 | 176 | # Click the button 177 | wd.execute_script("arguments[0].click();", verify_button) 178 | 179 | time.sleep(1) 180 | 181 | try: 182 | if self.verify_checkbox(wd): 183 | return "Success. Captcha solved." 184 | except Exception as e: 185 | print("Not checked") 186 | pass 187 | 188 | else: 189 | numbers = [ 190 | int(s.strip()) 191 | for s in message_text.split(",") 192 | if s.strip().isdigit() 193 | ] 194 | 195 | # Click the tiles based on the provided numbers 196 | for number in numbers: 197 | wd.execute_script("arguments[0].click();", tiles[number - 1]) 198 | time.sleep(0.5) 199 | 200 | time.sleep(3) 201 | 202 | if not continuous_task: 203 | # Find the button by its ID 204 | verify_button = wd.find_element(By.ID, "recaptcha-verify-button") 205 | 206 | verify_button_text = verify_button.text 207 | 208 | # Click the button 209 | wd.execute_script("arguments[0].click();", verify_button) 210 | 211 | try: 212 | if self.verify_checkbox(wd): 213 | return "Success. Captcha solved." 214 | except Exception as e: 215 | pass 216 | else: 217 | continue 218 | 219 | if "verify" in verify_button_text.lower(): 220 | attempts += 1 221 | 222 | wd = remove_highlight_and_labels(wd) 223 | 224 | wd.switch_to.default_content() 225 | 226 | # close captcha 227 | try: 228 | element = WebDriverWait(wd, 3).until( 229 | presence_of_element_located((By.XPATH, "//iframe[@title='reCAPTCHA']")) 230 | ) 231 | 232 | wd.execute_script( 233 | f"document.elementFromPoint({element.location['x']}, {element.location['y']-10}).click();" 234 | ) 235 | except Exception as e: 236 | print(e) 237 | pass 238 | 239 | return "Could not solve captcha." 
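    # Helper for the solving loop above: re-enters the reCAPTCHA anchor iframe and
    # reports whether the checkbox is now checked; if not, it switches back into the
    # challenge iframe so another round of tile selection can run.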
240 | 
241 |     def verify_checkbox(self, wd):
242 |         wd.switch_to.default_content()
243 | 
244 |         try:
245 |             WebDriverWait(wd, 10).until(
246 |                 frame_to_be_available_and_switch_to_it(
247 |                     (By.XPATH, "//iframe[@title='reCAPTCHA']")
248 |                 )
249 |             )
250 | 
251 |             WebDriverWait(wd, 5).until(
252 |                 lambda d: d.find_element(
253 |                     By.CLASS_NAME, "recaptcha-checkbox"
254 |                 ).get_attribute("aria-checked")
255 |                 == "true"
256 |             )
257 | 
258 |             return True
259 |         except Exception:
260 |             wd.switch_to.default_content()
261 | 
262 |             WebDriverWait(wd, 10).until(
263 |                 frame_to_be_available_and_switch_to_it(
264 |                     (
265 |                         By.XPATH,
266 |                         "//iframe[@title='recaptcha challenge expires in two minutes']",
267 |                     )
268 |                 )
269 |             )
270 | 
271 |             return False
272 | --------------------------------------------------------------------------------
/src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/WebPageSummarizer.py:
--------------------------------------------------------------------------------
1 | from agency_swarm.tools import BaseTool
2 | from selenium.webdriver.common.by import By
3 | 
4 | from .util import get_web_driver, set_web_driver
5 | 
6 | 
7 | class WebPageSummarizer(BaseTool):
8 |     """
9 |     This tool summarizes the content of the current web page, extracting the main points and providing a concise summary.
10 |     """
11 | 
12 |     def run(self):
13 |         from agency_swarm import get_openai_client
14 | 
15 |         wd = get_web_driver()
16 |         client = get_openai_client()
17 | 
18 |         content = wd.find_element(By.TAG_NAME, "body").text
19 | 
20 |         # Keep only the first 10,000 whitespace-separated words to bound the prompt size
21 |         content = " ".join(content.split()[:10000])
22 | 
23 |         completion = client.chat.completions.create(
24 |             model="gpt-3.5-turbo",
25 |             messages=[
26 |                 {
27 |                     "role": "system",
28 |                     "content": "Your task is to summarize the content of the provided webpage. 
The summary should be concise and informative, capturing the main points and takeaways of the page.", 29 | }, 30 | { 31 | "role": "user", 32 | "content": "Summarize the content of the following webpage:\n\n" 33 | + content, 34 | }, 35 | ], 36 | temperature=0.0, 37 | ) 38 | 39 | return completion.choices[0].message.content 40 | 41 | 42 | if __name__ == "__main__": 43 | wd = get_web_driver() 44 | wd.get("https://en.wikipedia.org/wiki/Python_(programming_language)") 45 | set_web_driver(wd) 46 | tool = WebPageSummarizer() 47 | print(tool.run()) 48 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/__init__.py: -------------------------------------------------------------------------------- 1 | from .ClickElement import ClickElement 2 | from .ExportFile import ExportFile 3 | from .GoBack import GoBack 4 | from .ReadURL import ReadURL 5 | from .Scroll import Scroll 6 | from .SelectDropdown import SelectDropdown 7 | from .SendKeys import SendKeys 8 | from .SolveCaptcha import SolveCaptcha 9 | from .WebPageSummarizer import WebPageSummarizer 10 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/util/__init__.py: -------------------------------------------------------------------------------- 1 | from .get_b64_screenshot import get_b64_screenshot 2 | from .highlights import highlight_elements_with_labels, remove_highlight_and_labels 3 | from .selenium import get_web_driver, set_web_driver 4 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/util/get_b64_screenshot.py: -------------------------------------------------------------------------------- 1 | def get_b64_screenshot(wd, element=None): 2 | if element: 3 | screenshot_b64 = element.screenshot_as_base64 4 | else: 5 | screenshot_b64 = wd.get_screenshot_as_base64() 6 | 7 | return screenshot_b64 8 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/util/highlights.py: -------------------------------------------------------------------------------- 1 | def highlight_elements_with_labels(driver, selector): 2 | """ 3 | This function highlights clickable elements like buttons, links, and certain divs and spans 4 | that match the given CSS selector on the webpage with a red border and ensures that labels are visible and positioned 5 | correctly within the viewport. 6 | 7 | :param driver: Instance of Selenium WebDriver. 8 | :param selector: CSS selector for the elements to be highlighted. 
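    :return: The same WebDriver instance; matching visible elements receive a red
        border and numbered labels appended to document.body.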
9 | """ 10 | script = f""" 11 | // Helper function to check if an element is visible 12 | function isElementVisible(element) {{ 13 | var rect = element.getBoundingClientRect(); 14 | if (rect.width <= 0 || rect.height <= 0 || 15 | rect.top >= (window.innerHeight || document.documentElement.clientHeight) || 16 | rect.bottom <= 0 || 17 | rect.left >= (window.innerWidth || document.documentElement.clientWidth) || 18 | rect.right <= 0) {{ 19 | return false; 20 | }} 21 | // Check if any parent element is hidden, which would hide this element as well 22 | var parent = element; 23 | while (parent) {{ 24 | var style = window.getComputedStyle(parent); 25 | if (style.display === 'none' || style.visibility === 'hidden') {{ 26 | return false; 27 | }} 28 | parent = parent.parentElement; 29 | }} 30 | return true; 31 | }} 32 | 33 | // Remove previous labels and styles if they exist 34 | document.querySelectorAll('.highlight-label').forEach(function(label) {{ 35 | label.remove(); 36 | }}); 37 | document.querySelectorAll('.highlighted-element').forEach(function(element) {{ 38 | element.classList.remove('highlighted-element'); 39 | element.removeAttribute('data-highlighted'); 40 | }}); 41 | 42 | // Inject custom style for highlighting elements 43 | var styleElement = document.getElementById('highlight-style'); 44 | if (!styleElement) {{ 45 | styleElement = document.createElement('style'); 46 | styleElement.id = 'highlight-style'; 47 | document.head.appendChild(styleElement); 48 | }} 49 | styleElement.textContent = ` 50 | .highlighted-element {{ 51 | border: 2px solid red !important; 52 | position: relative; 53 | box-sizing: border-box; 54 | }} 55 | .highlight-label {{ 56 | position: absolute; 57 | z-index: 2147483647; 58 | background: yellow; 59 | color: black; 60 | font-size: 25px; 61 | padding: 3px 5px; 62 | border: 1px solid black; 63 | border-radius: 3px; 64 | white-space: nowrap; 65 | box-shadow: 0px 0px 2px #000; 66 | top: -25px; 67 | left: 0; 68 | display: none; 69 | }} 70 | `; 71 | 72 | // Function to create and append a label to the body 73 | function createAndAdjustLabel(element, index) {{ 74 | if (!isElementVisible(element)) return; 75 | 76 | element.classList.add('highlighted-element'); 77 | var label = document.createElement('div'); 78 | label.className = 'highlight-label'; 79 | label.textContent = index.toString(); 80 | label.style.display = 'block'; // Make the label visible 81 | 82 | // Calculate label position 83 | var rect = element.getBoundingClientRect(); 84 | var top = rect.top + window.scrollY - 25; // Position label above the element 85 | var left = rect.left + window.scrollX; 86 | 87 | label.style.top = top + 'px'; 88 | label.style.left = left + 'px'; 89 | 90 | document.body.appendChild(label); // Append the label to the body 91 | }} 92 | 93 | // Select all clickable elements and apply the styles 94 | var allElements = document.querySelectorAll('{selector}'); 95 | var index = 1; 96 | allElements.forEach(function(element) {{ 97 | // Check if the element is not already highlighted and is visible 98 | if (!element.dataset.highlighted && isElementVisible(element)) {{ 99 | element.dataset.highlighted = 'true'; 100 | createAndAdjustLabel(element, index++); 101 | }} 102 | }}); 103 | """ 104 | 105 | driver.execute_script(script) 106 | 107 | return driver 108 | 109 | 110 | def remove_highlight_and_labels(driver): 111 | """ 112 | This function removes all red borders and labels from the webpage elements, 113 | reversing the changes made by the highlight functions using Selenium WebDriver. 
114 | 115 | :param driver: Instance of Selenium WebDriver. 116 | """ 117 | selector = ( 118 | 'a, button, input, textarea, div[onclick], div[role="button"], div[tabindex], span[onclick], ' 119 | 'span[role="button"], span[tabindex]' 120 | ) 121 | script = f""" 122 | // Remove all labels 123 | document.querySelectorAll('.highlight-label').forEach(function(label) {{ 124 | label.remove(); 125 | }}); 126 | 127 | // Remove the added style for red borders 128 | var highlightStyle = document.getElementById('highlight-style'); 129 | if (highlightStyle) {{ 130 | highlightStyle.remove(); 131 | }} 132 | 133 | // Remove inline styles added by highlighting function 134 | document.querySelectorAll('{selector}').forEach(function(element) {{ 135 | element.style.border = ''; 136 | }}); 137 | """ 138 | 139 | driver.execute_script(script) 140 | 141 | return driver 142 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/BrowsingAgent/tools/util/selenium.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | wd = None 4 | 5 | selenium_config = { 6 | "chrome_profile_path": None, 7 | "headless": False, 8 | "full_page_screenshot": True, 9 | } 10 | 11 | 12 | def get_web_driver(): 13 | print("Initializing WebDriver...") 14 | try: 15 | from selenium import webdriver 16 | from selenium.webdriver.chrome.service import Service as ChromeService 17 | 18 | print("Selenium imported successfully.") 19 | except ImportError: 20 | print("Selenium not installed. Please install it with pip install selenium") 21 | raise ImportError 22 | 23 | try: 24 | from webdriver_manager.chrome import ChromeDriverManager 25 | 26 | print("webdriver_manager imported successfully.") 27 | except ImportError: 28 | print( 29 | "webdriver_manager not installed. Please install it with pip install webdriver-manager" 30 | ) 31 | raise ImportError 32 | 33 | try: 34 | from selenium_stealth import stealth 35 | 36 | print("selenium_stealth imported successfully.") 37 | except ImportError: 38 | print( 39 | "selenium_stealth not installed. Please install it with pip install selenium-stealth" 40 | ) 41 | raise ImportError 42 | 43 | global wd, selenium_config 44 | 45 | if wd: 46 | print("Returning existing WebDriver instance.") 47 | return wd 48 | 49 | chrome_profile_path = selenium_config.get("chrome_profile_path", None) 50 | profile_directory = None 51 | user_data_dir = None 52 | if isinstance(chrome_profile_path, str) and os.path.exists(chrome_profile_path): 53 | profile_directory = ( 54 | os.path.split(chrome_profile_path)[-1].strip("\\").rstrip("/") 55 | ) 56 | user_data_dir = os.path.split(chrome_profile_path)[0].strip("\\").rstrip("/") 57 | print(f"Using Chrome profile: {profile_directory}") 58 | print(f"Using Chrome user data dir: {user_data_dir}") 59 | print(f"Using Chrome profile path: {chrome_profile_path}") 60 | 61 | chrome_options = webdriver.ChromeOptions() 62 | print("ChromeOptions initialized.") 63 | 64 | chrome_driver_path = "/usr/bin/chromedriver" 65 | if not os.path.exists(chrome_driver_path): 66 | print( 67 | "ChromeDriver not found at /usr/bin/chromedriver. Installing using webdriver_manager." 
68 | ) 69 | chrome_driver_path = ChromeDriverManager().install() 70 | else: 71 | print(f"ChromeDriver found at {chrome_driver_path}.") 72 | 73 | if selenium_config.get("headless", False): 74 | chrome_options.add_argument("--headless") 75 | print("Headless mode enabled.") 76 | if selenium_config.get("full_page_screenshot", False): 77 | chrome_options.add_argument("--start-maximized") 78 | print("Full page screenshot mode enabled.") 79 | else: 80 | chrome_options.add_argument("--window-size=1920,1080") 81 | print("Window size set to 1920,1080.") 82 | 83 | chrome_options.add_argument("--no-sandbox") 84 | chrome_options.add_argument("--disable-gpu") 85 | chrome_options.add_argument("--disable-dev-shm-usage") 86 | chrome_options.add_argument("--remote-debugging-port=9222") 87 | chrome_options.add_argument("--disable-extensions") 88 | chrome_options.add_argument("--disable-popup-blocking") 89 | chrome_options.add_argument("--ignore-certificate-errors") 90 | chrome_options.add_argument("--disable-blink-features=AutomationControlled") 91 | chrome_options.add_argument("--disable-web-security") 92 | chrome_options.add_argument("--allow-running-insecure-content") 93 | chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"]) 94 | chrome_options.add_experimental_option("useAutomationExtension", False) 95 | print("Chrome options configured.") 96 | 97 | if user_data_dir and profile_directory: 98 | chrome_options.add_argument(f"user-data-dir={user_data_dir}") 99 | chrome_options.add_argument(f"profile-directory={profile_directory}") 100 | print( 101 | f"Using user data dir: {user_data_dir} and profile directory: {profile_directory}" 102 | ) 103 | 104 | try: 105 | wd = webdriver.Chrome( 106 | service=ChromeService(chrome_driver_path), options=chrome_options 107 | ) 108 | print("WebDriver initialized successfully.") 109 | if wd.capabilities["chrome"]["userDataDir"]: 110 | print(f"Profile path in use: {wd.capabilities['chrome']['userDataDir']}") 111 | except Exception as e: 112 | print(f"Error initializing WebDriver: {e}") 113 | raise e 114 | 115 | if not selenium_config.get("chrome_profile_path", None): 116 | stealth( 117 | wd, 118 | languages=["en-US", "en"], 119 | vendor="Google Inc.", 120 | platform="Win32", 121 | webgl_vendor="Intel Inc.", 122 | renderer="Intel Iris OpenGL Engine", 123 | fix_hairline=True, 124 | ) 125 | print("Stealth mode configured.") 126 | 127 | wd.implicitly_wait(3) 128 | print("Implicit wait set to 3 seconds.") 129 | 130 | return wd 131 | 132 | 133 | def set_web_driver(new_wd): 134 | # remove all popups 135 | js_script = """ 136 | var popUpSelectors = ['modal', 'popup', 'overlay', 'dialog']; // Add more selectors that are commonly used for pop-ups 137 | popUpSelectors.forEach(function(selector) { 138 | var elements = document.querySelectorAll(selector); 139 | elements.forEach(function(element) { 140 | // You can choose to hide or remove; here we're removing the element 141 | element.parentNode.removeChild(element); 142 | }); 143 | }); 144 | """ 145 | 146 | new_wd.execute_script(js_script) 147 | 148 | # Close LinkedIn specific popups 149 | if "linkedin.com" in new_wd.current_url: 150 | linkedin_js_script = """ 151 | var linkedinSelectors = ['div.msg-overlay-list-bubble', 'div.ml4.msg-overlay-list-bubble__tablet-height']; 152 | linkedinSelectors.forEach(function(selector) { 153 | var elements = document.querySelectorAll(selector); 154 | elements.forEach(function(element) { 155 | element.parentNode.removeChild(element); 156 | }); 157 | }); 158 | """ 159 | 
new_wd.execute_script(linkedin_js_script) 160 | 161 | new_wd.execute_script("document.body.style.zoom='1.2'") 162 | 163 | global wd 164 | wd = new_wd 165 | 166 | 167 | def set_selenium_config(config): 168 | global selenium_config 169 | selenium_config = config 170 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/agency.py: -------------------------------------------------------------------------------- 1 | from agency_swarm import Agency 2 | 3 | from .AnalystAgent.AnalystAgent import AnalystAgent 4 | from .BrowsingAgent.BrowsingAgent import BrowsingAgent 5 | 6 | 7 | def create_agency(): 8 | browsing_agent = BrowsingAgent() 9 | analyst_agent = AnalystAgent() 10 | 11 | agency = Agency( 12 | [ 13 | analyst_agent, 14 | [analyst_agent, browsing_agent], 15 | ], 16 | shared_instructions="agency_manifesto.md", 17 | temperature=0.0, 18 | max_prompt_tokens=25000, 19 | async_mode="threading", 20 | ) 21 | 22 | return agency 23 | 24 | 25 | agency = create_agency() 26 | 27 | 28 | if __name__ == "__main__": 29 | agency.run_demo() 30 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/ResearchAgency/agency_manifesto.md: -------------------------------------------------------------------------------- 1 | # Research Agency Manifesto 2 | 3 | The Research Agency leverages advanced AI-driven web browsing capabilities to gather, analyze, and synthesize information from diverse online sources. Our mission is to empower users with timely, relevant, and insightful information, supporting informed decision-making and expanding knowledge on specific topics. 4 | 5 | Our Web Browsing Agent operates in a dynamic digital environment, focusing on: 6 | 7 | 1. Efficient navigation and access to a wide range of online resources 8 | 2. Intelligent analysis of information to extract key insights 9 | 3. Effective synthesis of data from multiple sources to provide comprehensive understanding 10 | 11 | We are committed to delivering high-quality, actionable information that directly addresses user inquiries and enhances their decision-making processes. 
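For reference, the positional list passed to `Agency` in `agency.py` above is the communication chart: the first top-level entry is the entry point that receives user requests, and each inner `[sender, recipient]` pair opens a one-way messaging flow. A minimal sketch of extending the chart with an additional agent (the `ReportAgent` name and import are illustrative, not part of this repo):

```python
from agency_swarm import Agency

from .AnalystAgent.AnalystAgent import AnalystAgent
from .BrowsingAgent.BrowsingAgent import BrowsingAgent
from .ReportAgent.ReportAgent import ReportAgent  # hypothetical new agent

analyst_agent = AnalystAgent()
browsing_agent = BrowsingAgent()
report_agent = ReportAgent()

agency = Agency(
    [
        analyst_agent,  # entry point: communicates with the user
        [analyst_agent, browsing_agent],  # analyst can message the browsing agent
        [analyst_agent, report_agent],  # analyst can message the new agent
    ],
    shared_instructions="agency_manifesto.md",
    temperature=0.0,
    max_prompt_tokens=25000,
    async_mode="threading",
)
```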
12 | -------------------------------------------------------------------------------- /src/voice_assistant/agencies/__init__.py: -------------------------------------------------------------------------------- 1 | import importlib 2 | import os 3 | 4 | from agency_swarm import Agency 5 | 6 | 7 | def load_agencies() -> dict[str, Agency]: 8 | agencies = {} 9 | current_dir = os.path.dirname(os.path.abspath(__file__)) 10 | 11 | for agency_folder in os.listdir(current_dir): 12 | agency_path = os.path.join(current_dir, agency_folder) 13 | if os.path.isdir(agency_path) and agency_folder != "__pycache__": 14 | try: 15 | agency_module = importlib.import_module( 16 | f"voice_assistant.agencies.{agency_folder}.agency" 17 | ) 18 | agencies[agency_folder] = getattr(agency_module, "agency") 19 | except (ImportError, AttributeError) as e: 20 | print(f"Error loading agency {agency_folder}: {e}") 21 | 22 | return agencies 23 | 24 | 25 | # Load all agencies 26 | AGENCIES: dict[str, Agency] = load_agencies() 27 | 28 | AGENCIES_AND_AGENTS_STRING = "\n".join( 29 | f"Agency '{agency_name}' has the following agents: {', '.join(agent.name for agent in agency.agents)}" 30 | for agency_name, agency in AGENCIES.items() 31 | ) 32 | print("Available Agencies and Agents:\n", AGENCIES_AND_AGENTS_STRING) # Debug print 33 | -------------------------------------------------------------------------------- /src/voice_assistant/audio.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | 4 | import pyaudio 5 | 6 | from voice_assistant.config import CHANNELS, FORMAT, RATE 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | 11 | class AudioPlayer: 12 | def __init__(self): 13 | self.p = pyaudio.PyAudio() 14 | self.stream = self.p.open( 15 | format=FORMAT, channels=CHANNELS, rate=RATE, output=True, start=False 16 | ) 17 | self.is_playing = False 18 | 19 | async def play_audio_chunk(self, audio_chunk: bytes, visual_interface): 20 | if not self.is_playing: 21 | self.stream.start_stream() 22 | self.is_playing = True 23 | visual_interface.set_assistant_speaking(True) 24 | 25 | self.stream.write(audio_chunk) 26 | 27 | # Update energy for visualization 28 | visual_interface.process_audio_data(audio_chunk) 29 | 30 | # Allow other tasks to run 31 | await asyncio.sleep(0) 32 | 33 | async def stop_playback(self, visual_interface): 34 | if self.is_playing: 35 | # Add a small delay of silence at the end 36 | silence_duration = 0.2 # 200ms 37 | silence_frames = int(RATE * silence_duration) 38 | silence = b"\x00" * (silence_frames * CHANNELS * 2) 39 | self.stream.write(silence) 40 | 41 | await asyncio.sleep(0.5) 42 | 43 | self.stream.stop_stream() 44 | self.is_playing = False 45 | visual_interface.set_assistant_speaking(False) 46 | logger.debug("Audio playback completed") 47 | 48 | def close(self): 49 | self.stream.close() 50 | self.p.terminate() 51 | 52 | 53 | audio_player = AudioPlayer() 54 | -------------------------------------------------------------------------------- /src/voice_assistant/config.py: -------------------------------------------------------------------------------- 1 | # src/voice_assistant/config.py 2 | import json 3 | import os 4 | 5 | import pyaudio 6 | from dotenv import load_dotenv 7 | 8 | # Load environment variables 9 | load_dotenv() 10 | 11 | # Constants 12 | PREFIX_PADDING_MS = 300 13 | SILENCE_THRESHOLD = 0.5 14 | SILENCE_DURATION_MS = 400 15 | RUN_TIME_TABLE_LOG_JSON = "runtime_time_table.jsonl" 16 | CHUNK = 1024 17 | FORMAT = 
pyaudio.paInt16 18 | CHANNELS = 1 19 | RATE = 24000 20 | 21 | # Load personalization settings 22 | PERSONALIZATION_FILE = os.getenv("PERSONALIZATION_FILE", "./personalization.json") 23 | with open(PERSONALIZATION_FILE, "r") as f: 24 | personalization = json.load(f) 25 | 26 | AI_ASSISTANT_NAME = personalization.get("ai_assistant_name", "Assistant") 27 | USER_NAME = personalization.get("user_name", "User") 28 | 29 | # Load assistant instructions from personalization file 30 | SESSION_INSTRUCTIONS = personalization.get("assistant_instructions", "").format( 31 | ai_assistant_name=AI_ASSISTANT_NAME, user_name=USER_NAME 32 | ) 33 | 34 | # Check for required environment variables 35 | REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "PERSONALIZATION_FILE", "SCRATCH_PAD_DIR"] 36 | MISSING_VARS = [var for var in REQUIRED_ENV_VARS if not os.getenv(var)] 37 | if MISSING_VARS: 38 | raise EnvironmentError( 39 | f"Missing required environment variables: {', '.join(MISSING_VARS)}" 40 | ) 41 | 42 | SCRATCH_PAD_DIR = os.getenv("SCRATCH_PAD_DIR", "./scratchpad") 43 | os.makedirs(SCRATCH_PAD_DIR, exist_ok=True) 44 | -------------------------------------------------------------------------------- /src/voice_assistant/icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VRSEN/agency-voice-interface/2d9d39ce02d9cb9628e8de79b3543fe05885ad42/src/voice_assistant/icon.png -------------------------------------------------------------------------------- /src/voice_assistant/main.py: -------------------------------------------------------------------------------- 1 | # src/voice_assistant/main.py 2 | import asyncio 3 | import json 4 | import logging 5 | import os 6 | 7 | import pygame 8 | import websockets 9 | from websockets.exceptions import ConnectionClosedError 10 | 11 | from voice_assistant.config import ( 12 | PREFIX_PADDING_MS, 13 | SESSION_INSTRUCTIONS, 14 | SILENCE_DURATION_MS, 15 | SILENCE_THRESHOLD, 16 | ) 17 | from voice_assistant.microphone import AsyncMicrophone 18 | from voice_assistant.tools import TOOL_SCHEMAS 19 | from voice_assistant.utils import base64_encode_audio 20 | from voice_assistant.utils.log_utils import log_ws_event 21 | from voice_assistant.visual_interface import ( 22 | VisualInterface, 23 | run_visual_interface, 24 | ) 25 | from voice_assistant.websocket_handler import process_ws_messages 26 | 27 | # Set up logging 28 | logging.basicConfig( 29 | level=logging.INFO, 30 | format="%(asctime)s.%(msecs)03d - %(levelname)s - %(message)s", 31 | datefmt="%H:%M:%S", 32 | ) 33 | logger = logging.getLogger(__name__) 34 | 35 | 36 | async def realtime_api(): 37 | while True: 38 | try: 39 | api_key = os.getenv("OPENAI_API_KEY") 40 | if not api_key: 41 | logger.error("Please set the OPENAI_API_KEY in your .env file.") 42 | return 43 | 44 | exit_event = asyncio.Event() 45 | 46 | url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01" 47 | headers = { 48 | "Authorization": f"Bearer {api_key}", 49 | "OpenAI-Beta": "realtime=v1", 50 | } 51 | 52 | mic = AsyncMicrophone() 53 | visual_interface = VisualInterface() 54 | 55 | async with websockets.connect(url, extra_headers=headers) as websocket: 56 | logger.info("Connected to the server.") 57 | # Initialize the session with voice capabilities and tools 58 | session_update = { 59 | "type": "session.update", 60 | "session": { 61 | "modalities": ["text", "audio"], 62 | "instructions": SESSION_INSTRUCTIONS, 63 | "voice": "shimmer", 64 | "input_audio_format": "pcm16", 65 | 
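                    # Server-side VAD ("turn_detection" below) lets the API decide when a
                    # user turn ends: "threshold" is the speech-detection sensitivity,
                    # "prefix_padding_ms" keeps audio captured just before speech starts,
                    # and "silence_duration_ms" is how long a pause must last before the
                    # turn is considered finished.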
"output_audio_format": "pcm16", 66 | "turn_detection": { 67 | "type": "server_vad", 68 | "threshold": SILENCE_THRESHOLD, 69 | "prefix_padding_ms": PREFIX_PADDING_MS, 70 | "silence_duration_ms": SILENCE_DURATION_MS, 71 | }, 72 | "tools": TOOL_SCHEMAS, 73 | }, 74 | } 75 | log_ws_event("outgoing", session_update) 76 | await websocket.send(json.dumps(session_update)) 77 | 78 | ws_task = asyncio.create_task( 79 | process_ws_messages(websocket, mic, visual_interface) 80 | ) 81 | visual_task = asyncio.create_task( 82 | run_visual_interface(visual_interface) 83 | ) 84 | 85 | logger.info( 86 | "Conversation started. Speak freely, and the assistant will respond." 87 | ) 88 | mic.start_recording() 89 | logger.info("Recording started. Listening for speech...") 90 | 91 | try: 92 | while not exit_event.is_set(): 93 | await asyncio.sleep(0.01) # Small delay to reduce CPU usage 94 | if not mic.is_receiving: 95 | audio_data = mic.get_audio_data() 96 | if audio_data: 97 | base64_audio = base64_encode_audio(audio_data) 98 | if base64_audio: 99 | audio_event = { 100 | "type": "input_audio_buffer.append", 101 | "audio": base64_audio, 102 | } 103 | log_ws_event("outgoing", audio_event) 104 | await websocket.send(json.dumps(audio_event)) 105 | # Update energy for visualization 106 | visual_interface.process_audio_data(audio_data) 107 | else: 108 | logger.debug("No audio data to send") 109 | except KeyboardInterrupt: 110 | logger.info("Keyboard interrupt received. Closing the connection.") 111 | except Exception as e: 112 | logger.exception( 113 | f"An unexpected error occurred in the main loop: {e}" 114 | ) 115 | finally: 116 | exit_event.set() 117 | mic.stop_recording() 118 | mic.close() 119 | await websocket.close() 120 | visual_interface.set_active(False) 121 | 122 | # Wait for the WebSocket processing task to complete 123 | try: 124 | await ws_task 125 | await visual_task 126 | except Exception as e: 127 | logging.exception(f"Error in WebSocket processing task: {e}") 128 | 129 | # If execution reaches here without exceptions, exit the loop 130 | break 131 | except ConnectionClosedError as e: 132 | if "keepalive ping timeout" in str(e): 133 | logging.warning( 134 | "WebSocket connection lost due to keepalive ping timeout. Reconnecting..." 
135 | ) 136 | await asyncio.sleep(1) # Wait before reconnecting 137 | continue # Retry the connection 138 | logging.exception("WebSocket connection closed unexpectedly.") 139 | break # Exit the loop on other connection errors 140 | except Exception as e: 141 | logging.exception(f"An unexpected error occurred: {e}") 142 | break # Exit the loop on unexpected exceptions 143 | finally: 144 | if "mic" in locals(): 145 | mic.stop_recording() 146 | mic.close() 147 | if "websocket" in locals(): 148 | await websocket.close() 149 | pygame.quit() 150 | 151 | 152 | async def main_async(): 153 | await realtime_api() 154 | 155 | 156 | def main(): 157 | try: 158 | asyncio.run(main_async()) 159 | except KeyboardInterrupt: 160 | logger.info("Program terminated by user") 161 | except Exception as e: 162 | logger.exception(f"An unexpected error occurred: {e}") 163 | 164 | 165 | if __name__ == "__main__": 166 | print("Press Ctrl+C to exit the program.") 167 | main() 168 | -------------------------------------------------------------------------------- /src/voice_assistant/microphone.py: -------------------------------------------------------------------------------- 1 | # src/voice_assistant/microphone.py 2 | import logging 3 | import queue 4 | from typing import Optional 5 | 6 | import pyaudio 7 | 8 | from voice_assistant.config import CHANNELS, CHUNK, FORMAT, RATE 9 | 10 | logger = logging.getLogger(__name__) 11 | 12 | 13 | class AsyncMicrophone: 14 | def __init__(self): 15 | self.p = pyaudio.PyAudio() 16 | self.stream = self.p.open( 17 | format=FORMAT, 18 | channels=CHANNELS, 19 | rate=RATE, 20 | input=True, 21 | frames_per_buffer=CHUNK, 22 | stream_callback=self.callback, 23 | ) 24 | self.queue = queue.Queue() 25 | self.is_recording = False 26 | self.is_receiving = False 27 | logger.info("AsyncMicrophone initialized") 28 | 29 | def callback(self, in_data, frame_count, time_info, status): 30 | if self.is_recording and not self.is_receiving: 31 | self.queue.put(in_data) 32 | return (None, pyaudio.paContinue) 33 | 34 | def start_recording(self): 35 | self.is_recording = True 36 | logger.info("Started recording") 37 | 38 | def stop_recording(self): 39 | self.is_recording = False 40 | logger.info("Stopped recording") 41 | 42 | def start_receiving(self): 43 | self.is_receiving = True 44 | self.is_recording = False 45 | logger.info("Started receiving assistant response") 46 | 47 | def stop_receiving(self): 48 | self.is_receiving = False 49 | logger.info("Stopped receiving assistant response") 50 | 51 | def get_audio_data(self) -> Optional[bytes]: 52 | data = b"" 53 | while not self.queue.empty(): 54 | data += self.queue.get() 55 | return data if data else None 56 | 57 | def close(self): 58 | self.stream.stop_stream() 59 | self.stream.close() 60 | self.p.terminate() 61 | logger.info("AsyncMicrophone closed") 62 | -------------------------------------------------------------------------------- /src/voice_assistant/models.py: -------------------------------------------------------------------------------- 1 | # src/voice_assistant/models.py 2 | from enum import StrEnum 3 | 4 | from pydantic import BaseModel 5 | 6 | 7 | class ModelName(StrEnum): 8 | BASE_MODEL = "gpt-4o" 9 | FAST_MODEL = "gpt-4o-mini" 10 | REASONING_MODEL_LARGE = "o1-preview" 11 | REASONING_MODEL_SMALL = "o1-mini" 12 | 13 | 14 | class WebUrl(BaseModel): 15 | url: str 16 | 17 | 18 | class CreateFileResponse(BaseModel): 19 | file_content: str 20 | file_name: str 21 | 22 | 23 | class FileSelectionResponse(BaseModel): 24 | file: str 25 | model: 
ModelName = ModelName.BASE_MODEL
26 | 
27 | 
28 | class FileUpdateResponse(BaseModel):
29 |     updates: str
30 | 
31 | 
32 | class FileDeleteResponse(BaseModel):
33 |     file: str
34 |     force_delete: bool
35 | 
--------------------------------------------------------------------------------
/src/voice_assistant/tests/test_realtime_connection.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import os
3 | 
4 | import websockets
5 | from dotenv import load_dotenv
6 | 
7 | # Load environment variables from .env file
8 | load_dotenv()
9 | 
10 | 
11 | async def test_realtime_api_connection():
12 |     # Retrieve your API key from the environment variables
13 |     api_key = os.getenv("OPENAI_API_KEY")
14 |     if not api_key:
15 |         print("Please set the OPENAI_API_KEY environment variable in your .env file.")
16 |         return
17 | 
18 |     # Define the WebSocket URL with the appropriate model
19 |     # Update the realtime model snapshot below if a newer one is available
20 |     url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
21 | 
22 |     # Set the required headers
23 |     headers = {
24 |         "Authorization": f"Bearer {api_key}",
25 |         "OpenAI-Beta": "realtime=v1",
26 |     }
27 | 
28 |     # Attempt to establish the WebSocket connection
29 |     try:
30 |         async with websockets.connect(url, extra_headers=headers):
31 |             print("Connected to the server.")
32 |     except websockets.InvalidStatusCode as e:
33 |         print(f"Failed to connect: {e}")
34 |         if e.status_code == 403:
35 |             print("HTTP 403 Forbidden: Access denied.")
36 |             print("You may not have access to the Realtime API.")
37 |         else:
38 |             print(f"HTTP {e.status_code}")
39 |     except Exception as e:
40 |         print(f"An unexpected error occurred: {e}")
41 | 
42 | 
43 | if __name__ == "__main__":
44 |     asyncio.run(test_realtime_api_connection())
45 | 
--------------------------------------------------------------------------------
/src/voice_assistant/tools/CreateFile.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | from agency_swarm.tools import BaseTool
4 | from dotenv import load_dotenv
5 | from pydantic import Field
6 | 
7 | from voice_assistant.config import SCRATCH_PAD_DIR
8 | from voice_assistant.models import CreateFileResponse
9 | from voice_assistant.utils.decorators import timeit_decorator
10 | from voice_assistant.utils.llm_utils import get_structured_output_completion
11 | 
12 | load_dotenv()
13 | 
14 | 
15 | class CreateFile(BaseTool):
16 |     """A tool for creating a new file with generated content based on a prompt."""
17 | 
18 |     file_name: str = Field(..., description="The name of the file to be created.")
19 |     prompt: str = Field(
20 |         ..., description="The prompt to generate content for the new file."
21 |     )
22 | 
23 |     async def run(self):
24 |         result = await create_file(self.file_name, self.prompt)
25 |         return str(result)
26 | 
27 | 
28 | @timeit_decorator
29 | async def create_file(file_name: str, prompt: str) -> dict:
30 |     file_path = os.path.join(SCRATCH_PAD_DIR, file_name)
31 | 
32 |     if os.path.exists(file_path):
33 |         return {"status": "File already exists"}
34 | 
35 |     prompt_structure = f"""
36 | 
37 |     Generate content for a new file based on the user's prompt and the file name.
38 | 
39 | 
40 | 
41 |     Based on the user's prompt and the file name, generate content for a new file.
42 | The file name is: {file_name} 43 | Use the following prompt to generate the content: {prompt} 44 | 45 | """ 46 | 47 | response = await get_structured_output_completion( 48 | prompt_structure, CreateFileResponse 49 | ) 50 | 51 | with open(file_path, "w") as f: 52 | f.write(response.file_content) 53 | 54 | return {"status": "File created", "file_name": response.file_name} 55 | 56 | 57 | if __name__ == "__main__": 58 | import asyncio 59 | 60 | tool = CreateFile(file_name="test.txt", prompt="Write a short story about a robot.") 61 | 62 | print(asyncio.run(tool.run())) 63 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/DeleteFile.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from agency_swarm.tools import BaseTool 4 | from dotenv import load_dotenv 5 | from pydantic import Field 6 | 7 | from voice_assistant.config import SCRATCH_PAD_DIR 8 | from voice_assistant.models import FileDeleteResponse 9 | from voice_assistant.utils.decorators import timeit_decorator 10 | from voice_assistant.utils.llm_utils import get_structured_output_completion 11 | 12 | load_dotenv() 13 | 14 | 15 | class DeleteFile(BaseTool): 16 | """A tool for deleting a file based on a prompt.""" 17 | 18 | prompt: str = Field(..., description="The prompt to identify which file to delete.") 19 | force_delete: bool = Field( 20 | False, description="Whether to force delete the file without confirmation." 21 | ) 22 | 23 | async def run(self): 24 | result = await delete_file(self.prompt, self.force_delete) 25 | return str(result) 26 | 27 | 28 | @timeit_decorator 29 | async def delete_file(prompt: str, force_delete: bool = False) -> dict: 30 | available_files = os.listdir(SCRATCH_PAD_DIR) 31 | 32 | # Select file to delete based on user prompt 33 | file_delete_response = await get_structured_output_completion( 34 | create_file_selection_prompt(available_files, prompt), FileDeleteResponse 35 | ) 36 | 37 | if not file_delete_response.file: 38 | return {"status": "No matching file found"} 39 | 40 | file_path = os.path.join(SCRATCH_PAD_DIR, file_delete_response.file) 41 | 42 | if not os.path.exists(file_path): 43 | return {"status": "File does not exist", "file_name": file_delete_response.file} 44 | 45 | if not force_delete: 46 | return { 47 | "status": "Confirmation required", 48 | "file_name": file_delete_response.file, 49 | "message": f"Are you sure you want to delete '{file_delete_response.file}'? Say force delete if you want to delete.", 50 | } 51 | 52 | os.remove(file_path) 53 | return {"status": "File deleted", "file_name": file_delete_response.file} 54 | 55 | 56 | def create_file_selection_prompt(available_files, user_prompt): 57 | return f""" 58 | 59 | Select a file from the available files to delete. 60 | 61 | 62 | 63 | Based on the user's prompt and the list of available files, infer which file the user wants to delete. 64 | If no file matches, return an empty string for 'file'. 
65 | 66 | 67 | 68 | {", ".join(available_files)} 69 | 70 | 71 | 72 | {user_prompt} 73 | 74 | """ 75 | 76 | 77 | if __name__ == "__main__": 78 | import asyncio 79 | 80 | tool = DeleteFile(prompt="Delete the test file", force_delete=True) 81 | print(asyncio.run(tool.run())) 82 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/DraftGmail.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import base64 3 | import os 4 | from datetime import datetime 5 | from email.mime.text import MIMEText 6 | from typing import Any, Dict, Optional 7 | 8 | from agency_swarm.tools import BaseTool 9 | from pydantic import Field, PrivateAttr 10 | 11 | from voice_assistant.utils.google_services_utils import GoogleServicesUtils 12 | 13 | 14 | class DraftGmail(BaseTool): 15 | """A tool to draft an email. Either reply_to_id or recipient must be provided.""" 16 | 17 | subject: Optional[str] = Field(None, description="Subject of the email") 18 | content: str = Field(..., description="Content of the email") 19 | recipient: Optional[str] = Field( 20 | None, 21 | description="Recipient of the email. If not provided, the email will be sent to the recipient in the reply_to_id", 22 | ) 23 | reply_to_id: Optional[str] = Field(None, description="ID of the email to reply to") 24 | _service: Optional[GoogleServicesUtils] = PrivateAttr(None) 25 | 26 | async def run(self) -> Dict[str, Any]: 27 | self._service = await GoogleServicesUtils.authenticate_service("gmail") 28 | return await self.draft_email() 29 | 30 | async def draft_email(self) -> Dict[str, Any]: 31 | try: 32 | message = await asyncio.to_thread(self._create_message) 33 | draft = await asyncio.to_thread( 34 | lambda: self._service.users() 35 | .drafts() 36 | .create(userId="me", body={"message": message}) 37 | .execute() 38 | ) 39 | return { 40 | "draft_id": draft["id"], 41 | "message": "Email draft created successfully", 42 | "drafted_at": datetime.utcnow().isoformat(), 43 | } 44 | except Exception as e: 45 | return {"error": str(e), "message": "Failed to create email draft"} 46 | 47 | def _create_message(self) -> Dict[str, Any]: 48 | message = MIMEText(self.content) 49 | thread_id = None 50 | 51 | if self.reply_to_id: 52 | original_message = ( 53 | self._service.users() 54 | .messages() 55 | .get(userId="me", id=self.reply_to_id, format="full") 56 | .execute() 57 | ) 58 | thread_id = original_message.get("threadId") 59 | if not thread_id: 60 | raise ValueError("Original message does not have a threadId.") 61 | 62 | headers = original_message["payload"]["headers"] 63 | original_subject = next( 64 | (header["value"] for header in headers if header["name"] == "Subject"), 65 | "No Subject", 66 | ) 67 | original_from = next( 68 | (header["value"] for header in headers if header["name"] == "From"), 69 | "Unknown", 70 | ) 71 | message["to"] = original_from 72 | message["subject"] = f"Re: {original_subject}" 73 | message["In-Reply-To"] = self.reply_to_id 74 | message["References"] = self.reply_to_id 75 | else: 76 | if self.recipient is None: 77 | raise ValueError("Recipient is required for new emails") 78 | 79 | if self.subject is None: 80 | raise ValueError("Subject is required for new emails") 81 | 82 | message["to"] = self.recipient 83 | message["subject"] = self.subject 84 | 85 | message["from"] = os.getenv("EMAIL_SENDER", "sender@example.com") 86 | raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode("utf-8") 87 | return {"raw": raw_message, 
"threadId": thread_id} 88 | 89 | 90 | if __name__ == "__main__": 91 | import asyncio 92 | 93 | async def main(): 94 | # Example usage for a new email 95 | tool = DraftGmail( 96 | subject="Important Meeting", 97 | content="Hello,\n\nThis is a draft email for our upcoming meeting.\n\nBest regards,\nYour Name", 98 | recipient="recipient@example.com", 99 | ) 100 | result = await tool.run() 101 | print("New email draft:", result) 102 | 103 | # Example usage for a reply 104 | reply_tool = DraftGmail( 105 | content="Thank you for your email. I'll review the draft and get back to you soon.", 106 | reply_to_id="1929188e90b212c3", # Replace with an actual email ID 107 | ) 108 | reply_result = await reply_tool.run() 109 | print("Reply draft:", reply_result) 110 | 111 | asyncio.run(main()) 112 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/FetchDailyMeetingSchedule.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | from datetime import UTC, datetime 4 | 5 | from agency_swarm.tools import BaseTool 6 | from dotenv import load_dotenv 7 | from pydantic import Field 8 | 9 | from voice_assistant.utils.google_services_utils import GoogleServicesUtils 10 | 11 | load_dotenv() 12 | 13 | logger = logging.getLogger(__name__) 14 | 15 | 16 | class FetchDailyMeetingSchedule(BaseTool): 17 | """A tool to fetch and format the user's daily meeting schedule from Google Calendar.""" 18 | 19 | date: str = Field( 20 | default_factory=lambda: datetime.now(UTC).strftime("%Y-%m-%d"), 21 | description="The date for which to fetch the meeting schedule. Defaults to today if not provided.", 22 | ) 23 | 24 | async def run(self) -> str: 25 | try: 26 | meetings = await self.fetch_meetings(self.date) 27 | formatted_meetings = self.format_meetings(meetings) 28 | return formatted_meetings 29 | except Exception as e: 30 | logger.error(f"Error in FetchDailyMeetingSchedule: {str(e)}") 31 | return f"An error occurred while fetching the meeting schedule: {str(e)}" 32 | 33 | async def fetch_meetings(self, date) -> list[dict]: 34 | service = await GoogleServicesUtils.authenticate_service("calendar") 35 | events_result = await asyncio.to_thread( 36 | service.events() 37 | .list( 38 | calendarId="primary", 39 | timeMin=f"{date}T00:00:00Z", 40 | timeMax=f"{date}T23:59:59Z", 41 | singleEvents=True, 42 | orderBy="startTime", 43 | ) 44 | .execute 45 | ) 46 | return events_result.get("items", []) 47 | 48 | def format_meetings(self, meetings) -> str: 49 | formatted = [] 50 | for meeting in meetings: 51 | start_time = datetime.fromisoformat( 52 | meeting["start"].get("dateTime", meeting["start"].get("date")) 53 | ) 54 | end_time = datetime.fromisoformat( 55 | meeting["end"].get("dateTime", meeting["end"].get("date")) 56 | ) 57 | 58 | formatted_meeting = f"{start_time.strftime('%I:%M %p')} - {end_time.strftime('%I:%M %p')}: {meeting.get('summary', 'Untitled Event')}" 59 | 60 | if meeting.get("location"): 61 | formatted_meeting += f" | Location: {meeting['location']}" 62 | 63 | if meeting.get("description"): 64 | description = meeting["description"].split("\n")[0] 65 | formatted_meeting += f" | Description: {description}" 66 | 67 | formatted.append(formatted_meeting) 68 | 69 | if not formatted: 70 | return "No meetings scheduled for today." 
71 | 72 | return "Today's Agenda:\n" + "\n".join(formatted) 73 | 74 | 75 | if __name__ == "__main__": 76 | tool = FetchDailyMeetingSchedule() 77 | result = asyncio.run(tool.run()) 78 | print(result) 79 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/GetCurrentDateTime.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from datetime import datetime 3 | 4 | from agency_swarm.tools import BaseTool 5 | 6 | 7 | class GetCurrentDateTime(BaseTool): 8 | """A tool to get the current date, time, and day of the week.""" 9 | 10 | async def run(self) -> str: 11 | return datetime.now().strftime("%A, %Y-%m-%d %H:%M:%S") 12 | 13 | 14 | if __name__ == "__main__": 15 | tool = GetCurrentDateTime() 16 | print(asyncio.run(tool.run())) 17 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/GetGmailSummary.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import base64 3 | import logging 4 | import re 5 | from datetime import datetime, timedelta 6 | from typing import List, Optional 7 | 8 | from agency_swarm.tools import BaseTool 9 | from dotenv import load_dotenv 10 | from pydantic import Field, PrivateAttr 11 | 12 | from voice_assistant.models import ModelName 13 | from voice_assistant.utils.google_services_utils import GoogleServicesUtils 14 | from voice_assistant.utils.llm_utils import get_model_completion 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | load_dotenv() 19 | 20 | 21 | class GetGmailSummary(BaseTool): 22 | """A tool to summarize unread Gmail messages from the last two days.""" 23 | 24 | max_results: int = Field( 25 | default=10, 26 | description="Maximum number of unread emails to fetch. Defaults to 10.", 27 | ) 28 | _service: Optional[GoogleServicesUtils] = PrivateAttr(None) 29 | 30 | async def run(self) -> str: 31 | """ 32 | Main execution method to fetch and summarize unread Gmail messages. 33 | """ 34 | logger.info("Starting Gmail authentication.") 35 | self._service = await GoogleServicesUtils.authenticate_service("gmail") 36 | 37 | logger.info("Fetching unread messages.") 38 | messages = await self._fetch_unread_messages() 39 | 40 | if not messages: 41 | logger.info("No unread messages found.") 42 | return "No unread Gmail messages found in the last two days." 43 | 44 | logger.info("Summarizing messages using GPT-4o-mini.") 45 | summary = await self._summarize_messages_with_gpt(messages) 46 | 47 | logger.info("Gmail summary completed.") 48 | return summary 49 | 50 | async def _fetch_unread_messages(self) -> List[dict]: 51 | """ 52 | Fetch unread messages from the last two days. 
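        Builds a Gmail query of the form "is:unread after:YYYY/MM/DD" and fetches each match in full format.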
53 | """ 54 | two_days_ago = (datetime.now() - timedelta(days=2)).strftime("%Y/%m/%d") 55 | query = f"is:unread after:{two_days_ago}" 56 | logger.info(f"Executing query: {query}") 57 | 58 | results = await asyncio.to_thread( 59 | lambda: self._service.users() 60 | .messages() 61 | .list(userId="me", q=query, maxResults=self.max_results) 62 | .execute() 63 | ) 64 | 65 | messages = results.get("messages", []) 66 | full_messages = [] 67 | logger.info(f"Number of messages fetched: {len(messages)}") 68 | 69 | for message in messages: 70 | msg = await asyncio.to_thread( 71 | lambda: self._service.users() 72 | .messages() 73 | .get(userId="me", id=message["id"], format="full") 74 | .execute() 75 | ) 76 | msg["id"] = message["id"] 77 | full_messages.append(msg) 78 | 79 | logger.info("All messages fetched successfully.") 80 | return full_messages 81 | 82 | async def _summarize_messages_with_gpt(self, messages: List[dict]) -> str: 83 | """ 84 | Summarize the given messages using GPT model. 85 | """ 86 | full_texts = [] 87 | for msg in messages: 88 | email_data = self._extract_email_data(msg) 89 | full_texts.append(self._format_email_text(email_data)) 90 | 91 | prompt = ( 92 | "Please provide a summary of the following emails. " 93 | "For each email, include the email ID, subject, sender, date, " 94 | "and a brief summary of the content without too many details.\n\n" 95 | ) 96 | summary = await get_model_completion( 97 | prompt + "\n\n".join(full_texts), 98 | ModelName.FAST_MODEL, 99 | ) 100 | return summary 101 | 102 | def _extract_email_data(self, msg: dict) -> dict: 103 | """ 104 | Extract relevant data from an email message. 105 | """ 106 | payload = msg["payload"] 107 | headers = payload.get("headers", []) 108 | return { 109 | "id": msg.get("id", "Unknown ID"), 110 | "subject": next( 111 | (h["value"] for h in headers if h["name"] == "Subject"), "No Subject" 112 | ), 113 | "from": next( 114 | (h["value"] for h in headers if h["name"] == "From"), "Unknown Sender" 115 | ), 116 | "date": next( 117 | (h["value"] for h in headers if h["name"] == "Date"), "Unknown Date" 118 | ), 119 | "body": self._extract_body(payload), 120 | } 121 | 122 | def _format_email_text(self, email_data: dict) -> str: 123 | """ 124 | Format email data into a string representation. 125 | """ 126 | return ( 127 | f"Email ID: {email_data['id']}\n" 128 | f"From: {email_data['from']}\n" 129 | f"Date: {email_data['date']}\n" 130 | f"Subject: {email_data['subject']}\n" 131 | f"Body: {email_data['body']}\n" 132 | ) 133 | 134 | def _extract_body(self, payload: dict) -> str: 135 | """ 136 | Extract the body from an email payload, handling various MIME types and nested parts. 137 | """ 138 | if "parts" in payload: 139 | body = self._recursive_extract(payload["parts"]) 140 | if body: 141 | return body 142 | 143 | # Fallback to the main body if no parts are found 144 | data = payload.get("body", {}).get("data", "") 145 | if data: 146 | try: 147 | decoded_body = base64.urlsafe_b64decode(data).decode("utf-8") 148 | return self._remove_links(decoded_body) 149 | except Exception as e: 150 | logger.error(f"Error decoding main body: {e}") 151 | return "No body content" 152 | 153 | def _recursive_extract(self, parts: List[dict]) -> str: 154 | """ 155 | Recursively extract the body from email parts. 
156 | """ 157 | for part in parts: 158 | mime_type = part.get("mimeType", "") 159 | body = part.get("body", {}) 160 | data = body.get("data", "") 161 | 162 | if data and mime_type in ["text/plain", "text/html"]: 163 | try: 164 | decoded_body = base64.urlsafe_b64decode(data).decode("utf-8") 165 | return self._remove_links(decoded_body) 166 | except Exception as e: 167 | logger.error(f"Error decoding {mime_type} part: {e}") 168 | elif "parts" in part: 169 | result = self._recursive_extract(part["parts"]) 170 | if result: 171 | return result 172 | return "" 173 | 174 | def _remove_links(self, text: str) -> str: 175 | """ 176 | Remove URLs from the given text. 177 | """ 178 | url_pattern = re.compile(r"http\S+|www\.\S+") 179 | cleaned_text = url_pattern.sub("", text) 180 | logger.debug("Removed links from the email body.") 181 | return cleaned_text 182 | 183 | 184 | if __name__ == "__main__": 185 | 186 | async def main(): 187 | tool = GetGmailSummary(max_results=5) 188 | result = await tool.run() 189 | print(result) 190 | 191 | asyncio.run(main()) 192 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/GetResponse.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | from typing import Any, Optional 4 | 5 | from agency_swarm import Agency, get_openai_client 6 | from agency_swarm.threads import Thread 7 | from agency_swarm.tools import BaseTool 8 | from openai import OpenAI 9 | from pydantic import Field, PrivateAttr, field_validator 10 | 11 | from voice_assistant.agencies import AGENCIES, AGENCIES_AND_AGENTS_STRING 12 | from voice_assistant.utils.decorators import timeit_decorator 13 | 14 | logger = logging.getLogger(__name__) 15 | 16 | 17 | class GetResponse(BaseTool): 18 | """ 19 | Checks the status of a task or retrieves the response from a specific agent within a specified agency. 20 | 21 | Use this tool after initiating a long-running task with 'SendMessageAsync'. 22 | Use the same parameters you used with 'SendMessageAsync' to check if the task is completed. 23 | If the task is completed, this tool will return the agent's response. 24 | If the task is still in progress, it will inform you accordingly. 25 | 26 | Available Agencies and Agents: 27 | {agency_agents} 28 | """ 29 | 30 | agency_name: str = Field(..., description="The name of the agency.") 31 | agent_name: Optional[str] = Field( 32 | None, description="The name of the agent, or None to use the default agent." 33 | ) 34 | _client: OpenAI = PrivateAttr() 35 | 36 | def __init__(self, **kwargs): 37 | super().__init__(**kwargs) 38 | self._client = get_openai_client() 39 | 40 | @field_validator("agency_name", mode="before") 41 | def validate_agency_name(cls, value: str) -> str: 42 | if value not in AGENCIES: 43 | available = ", ".join(AGENCIES.keys()) 44 | raise ValueError( 45 | f"Agency '{value}' not found. Available agencies: {available}" 46 | ) 47 | return value 48 | 49 | @field_validator("agent_name", mode="before") 50 | def validate_agent_name(cls, value: Optional[str]) -> Optional[str]: 51 | if value: 52 | agent_names = [ 53 | agent.name for agency in AGENCIES.values() for agent in agency.agents 54 | ] 55 | if value not in agent_names: 56 | available = ", ".join(agent_names) 57 | raise ValueError( 58 | f"Agent '{value}' not found. 
Available agents: {available}" 59 | ) 60 | return value 61 | 62 | @timeit_decorator 63 | async def run(self) -> str: 64 | """ 65 | Executes the GetResponse tool to check task status or retrieve agent response. 66 | 67 | Returns: 68 | str: The result message based on the task status. 69 | """ 70 | agency: Agency = AGENCIES.get(self.agency_name) 71 | 72 | # Determine the thread based on agent_name 73 | if not self.agent_name or self.agent_name == agency.ceo.name: 74 | thread = agency.main_thread 75 | else: 76 | thread = agency.agents_and_threads.get(agency.ceo.name, {}).get( 77 | self.agent_name 78 | ) 79 | 80 | if not thread: 81 | return f"Error: No thread found between '{agency.ceo.name}' and '{self.agent_name}'" 82 | if not thread.thread or not thread.id: 83 | return f"Error: Thread between '{agency.ceo.name}' and '{self.agent_name}' is not initialized" 84 | 85 | run = await asyncio.to_thread(self._get_last_run, thread) 86 | 87 | if not run: 88 | return ( 89 | "System Notification: 'Agent is ready to receive a message. " 90 | "Please send a message with the 'SendMessageAsync' tool.'" 91 | ) 92 | 93 | if run.status in ["queued", "in_progress", "requires_action"]: 94 | return ( 95 | "System Notification: 'Task is not completed yet. Please tell the user to wait " 96 | "and try again later.'" 97 | ) 98 | 99 | if run.status == "failed": 100 | return ( 101 | f"System Notification: 'Agent run failed with error: {run.last_error.message}. " 102 | "You may send another message with the 'SendMessageAsync' tool.'" 103 | ) 104 | 105 | messages = await asyncio.to_thread( 106 | self._client.beta.threads.messages.list, thread_id=thread.id, order="desc" 107 | ) 108 | 109 | if messages.data and messages.data[0].content: 110 | response_text = messages.data[0].content[0].text.value 111 | return f"{self.agent_name}'s Response: '{response_text}'" 112 | else: 113 | return "System Notification: 'No response found from the agent.'" 114 | 115 | def _get_last_run(self, thread: Thread) -> Optional[Any]: 116 | runs = self._client.beta.threads.runs.list( 117 | thread_id=thread.id, 118 | order="desc", 119 | ) 120 | return runs.data[0] if runs.data else None 121 | 122 | 123 | # Dynamically update the class docstring with the list of agencies and their agents 124 | GetResponse.__doc__ = GetResponse.__doc__.format( 125 | agency_agents=AGENCIES_AND_AGENTS_STRING 126 | ) 127 | 128 | 129 | if __name__ == "__main__": 130 | 131 | async def main(): 132 | # Example usage for a specific thread 133 | tool = GetResponse( 134 | agency_name="ResearchAgency", 135 | agent_name="BrowsingAgent", 136 | ) 137 | response = await tool.run() 138 | print(response) 139 | 140 | asyncio.run(main()) 141 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/GetScreenDescription.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import base64 3 | import io 4 | import os 5 | import tempfile 6 | 7 | import aiohttp 8 | from agency_swarm.tools import BaseTool 9 | from dotenv import load_dotenv 10 | from PIL import Image 11 | from pydantic import Field 12 | 13 | from voice_assistant.models import ModelName 14 | from voice_assistant.utils.decorators import timeit_decorator 15 | 16 | load_dotenv() 17 | 18 | OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") 19 | 20 | 21 | class GetScreenDescription(BaseTool): 22 | """Get a text description of the user's active window.""" 23 | 24 | prompt: str = Field(..., description="Prompt to analyze the 
screenshot") 25 | 26 | async def run(self) -> str: 27 | """Execute the screen description tool.""" 28 | screenshot_path = await self.take_screenshot() 29 | 30 | try: 31 | file_content = await asyncio.to_thread(self._read_file, screenshot_path) 32 | resized_content = await asyncio.to_thread(self._resize_image, file_content) 33 | encoded_image = base64.b64encode(resized_content).decode("utf-8") 34 | analysis = await self.analyze_image(encoded_image) 35 | finally: 36 | asyncio.create_task(asyncio.to_thread(os.remove, screenshot_path)) 37 | 38 | return analysis 39 | 40 | @timeit_decorator 41 | async def take_screenshot(self) -> str: 42 | """Capture a screenshot of the active window.""" 43 | with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp_file: 44 | screenshot_path = tmp_file.name 45 | 46 | bounds = await self._get_active_window_bounds() 47 | if not bounds: 48 | raise RuntimeError("Unable to retrieve the active window bounds.") 49 | 50 | x, y, width, height = bounds 51 | 52 | process = await asyncio.create_subprocess_exec( 53 | "screencapture", 54 | "-R", 55 | f"{x},{y},{width},{height}", 56 | screenshot_path, 57 | stdout=asyncio.subprocess.PIPE, 58 | stderr=asyncio.subprocess.PIPE, 59 | ) 60 | 61 | stdout, stderr = await process.communicate() 62 | 63 | if process.returncode != 0: 64 | raise RuntimeError(f"screencapture failed: {stderr.decode().strip()}") 65 | 66 | if not os.path.exists(screenshot_path): 67 | raise FileNotFoundError(f"Screenshot was not created at {screenshot_path}") 68 | 69 | return screenshot_path 70 | 71 | async def _get_active_window_bounds(self) -> tuple: 72 | """Retrieve the bounds of the active window.""" 73 | script = """ 74 | tell application "System Events" 75 | set frontApp to first application process whose frontmost is true 76 | tell frontApp 77 | try 78 | set win to front window 79 | set {x, y} to position of win 80 | set {w, h} to size of win 81 | return {x, y, w, h} 82 | on error 83 | return {} 84 | end try 85 | end tell 86 | end tell 87 | """ 88 | process = await asyncio.create_subprocess_exec( 89 | "osascript", 90 | "-e", 91 | script, 92 | stdout=asyncio.subprocess.PIPE, 93 | stderr=asyncio.subprocess.PIPE, 94 | ) 95 | 96 | stdout, stderr = await process.communicate() 97 | 98 | if process.returncode != 0: 99 | return None 100 | 101 | output = stdout.decode().strip() 102 | if not output: 103 | return None 104 | 105 | try: 106 | bounds = eval(output) 107 | return bounds if isinstance(bounds, tuple) and len(bounds) == 4 else None 108 | except Exception as e: 109 | print(f"Error parsing bounds: {e}") 110 | return None 111 | 112 | @timeit_decorator 113 | async def analyze_image(self, base64_image: str) -> str: 114 | """Send the encoded image and prompt to the OpenAI API for analysis.""" 115 | headers = { 116 | "Content-Type": "application/json", 117 | "Authorization": f"Bearer {OPENAI_API_KEY}", 118 | } 119 | 120 | payload = { 121 | "model": ModelName.FAST_MODEL, 122 | "messages": [ 123 | { 124 | "role": "system", 125 | "content": "You are an expert at analyzing screenshots and describing their content. Your output should be a concise and informative description of the screenshot, focusing on the aspects mentioned in the user's prompt. 
Pay close attention to the specific questions or elements the user is asking about.",
126 |                 },
127 |                 {
128 |                     "role": "user",
129 |                     "content": [
130 |                         {
131 |                             "type": "text",
132 |                             "text": f"Analyze this screenshot, paying particular attention to the following prompt: {self.prompt}",
133 |                         },
134 |                         {
135 |                             "type": "image_url",
136 |                             "image_url": {
137 |                                 "url": f"data:image/png;base64,{base64_image}"
138 |                             },
139 |                         },
140 |                     ],
141 |                 },
142 |             ],
143 |             "max_tokens": 500,
144 |         }
145 | 
146 |         async with aiohttp.ClientSession() as session:
147 |             async with session.post(
148 |                 "https://api.openai.com/v1/chat/completions",
149 |                 headers=headers,
150 |                 json=payload,
151 |             ) as response:
152 |                 if response.status != 200:
153 |                     error = await response.text()
154 |                     raise RuntimeError(f"OpenAI API error: {error}")
155 |                 result = await response.json()
156 |                 return result["choices"][0]["message"]["content"]
157 | 
158 |     def _read_file(self, path: str) -> bytes:
159 |         """Read and return the content of a file."""
160 |         with open(path, "rb") as image_file:
161 |             return image_file.read()
162 | 
163 |     def _resize_image(self, image_data: bytes) -> bytes:
164 |         """Resize the image to reduce payload size while preserving aspect ratio."""
165 |         with Image.open(io.BytesIO(image_data)) as img:
166 |             img.thumbnail((1600, 1200), Image.LANCZOS)  # LANCZOS replaces ANTIALIAS, which was removed in Pillow 10
167 |             with io.BytesIO() as output:
168 |                 img.save(output, format="PNG")
169 |                 return output.getvalue()
170 | 
171 | 
172 | if __name__ == "__main__":
173 | 
174 |     async def test_tool():
175 |         tool = GetScreenDescription(
176 |             prompt="What do you see in this screenshot? Describe the main elements."
177 |         )
178 |         try:
179 |             result = await tool.run()
180 |             print(result)
181 |         except Exception as e:
182 |             print(f"Error during test: {e}")
183 | 
184 |     asyncio.run(test_tool())
185 | 
--------------------------------------------------------------------------------
/src/voice_assistant/tools/OpenBrowser.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import json
3 | import logging
4 | import os
5 | import webbrowser
6 | from concurrent.futures import ThreadPoolExecutor
7 | from enum import Enum
8 | 
9 | from agency_swarm.tools import BaseTool
10 | from pydantic import Field
11 | 
12 | from voice_assistant.models import WebUrl
13 | from voice_assistant.utils.decorators import timeit_decorator
14 | 
15 | logger = logging.getLogger(__name__)
16 | 
17 | with open(os.getenv("PERSONALIZATION_FILE")) as f:
18 |     personalization = json.load(f)
19 | browser = personalization["browser"]
20 | 
21 | 
22 | class OpenBrowser(BaseTool):
23 |     """Open a browser with a specified URL."""
24 | 
25 |     chain_of_thought: str = Field(
26 |         ..., description="Step-by-step thought process to determine the URL to open."
27 | ) 28 | url: str = Field(..., description="The URL to open") 29 | 30 | @timeit_decorator 31 | async def run(self): 32 | if self.url: 33 | logger.info(f"📖 open_browser() Opening URL: {self.url}") 34 | loop = asyncio.get_running_loop() 35 | with ThreadPoolExecutor() as pool: 36 | await loop.run_in_executor(pool, webbrowser.get(browser).open, self.url) 37 | return {"status": "Browser opened", "url": self.url} 38 | return {"status": "No URL found"} 39 | 40 | 41 | if __name__ == "__main__": 42 | tool = OpenBrowser( 43 | chain_of_thought="I want to open my favorite website", 44 | url="https://www.linkedin.com", 45 | ) 46 | asyncio.run(tool.run()) 47 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/SendMessage.py: -------------------------------------------------------------------------------- 1 | """ 2 | This tool allows you to send a message to a specific agent within a specified agency and receive a response. 3 | 4 | To use this tool, provide the message you want to send, the name of the agency to which the agent belongs, and optionally the name of the agent to whom the message should be sent. If the agent name is not specified, the message will be sent to the default agent for that agency. 5 | """ 6 | 7 | import asyncio 8 | 9 | from agency_swarm.tools import BaseTool 10 | from pydantic import Field 11 | 12 | from voice_assistant.agencies import AGENCIES, AGENCIES_AND_AGENTS_STRING 13 | from voice_assistant.utils.decorators import timeit_decorator 14 | 15 | 16 | class SendMessage(BaseTool): 17 | """ 18 | Sends a message to a specific agent within a specified agency and waits for an immediate response. 19 | 20 | Use this tool for direct, synchronous communication with agents for tasks that can be completed quickly. 21 | The agent processes the message and returns a response immediately. 22 | If 'agent_name' is not provided, the message is sent to the main agent in the agency. 23 | 24 | To continue the dialogue, invoke this tool again with your follow-up message. 25 | Note: You are responsible for relaying the agent's responses back to the user. 26 | Do not send more than one message at a time. 27 | 28 | Available Agencies and Agents: 29 | {agency_agents} 30 | """ 31 | 32 | message: str = Field(..., description="The message to be sent.") 33 | agency_name: str = Field( 34 | ..., description="The name of the agency to send the message to." 35 | ) 36 | agent_name: str | None = Field( 37 | None, 38 | description="The name of the agent to send the message to, or None to use the default agent.", 39 | ) 40 | 41 | def __init__(self, **data): 42 | super().__init__(**data) 43 | 44 | @timeit_decorator 45 | async def run(self) -> str: 46 | result = await self._send_message() 47 | return str(result) 48 | 49 | async def _send_message(self) -> str: 50 | agency = AGENCIES.get(self.agency_name) 51 | if agency: 52 | recipient_agent = None 53 | if self.agent_name: 54 | recipient_agent = next( 55 | (agent for agent in agency.agents if agent.name == self.agent_name), 56 | None, 57 | ) 58 | if not recipient_agent: 59 | return f"Agent '{self.agent_name}' not found in agency '{self.agency_name}'. 
Available agents: {', '.join(agent.name for agent in agency.agents)}" 60 | else: 61 | recipient_agent = None 62 | 63 | response = await asyncio.to_thread( 64 | agency.get_completion, 65 | message=self.message, 66 | recipient_agent=recipient_agent, 67 | ) 68 | return response 69 | else: 70 | return f"Agency '{self.agency_name}' not found" 71 | 72 | 73 | # Dynamically update the class docstring with the list of agencies and their agents 74 | SendMessage.__doc__ = SendMessage.__doc__.format( 75 | agency_agents=AGENCIES_AND_AGENTS_STRING 76 | ) 77 | 78 | 79 | if __name__ == "__main__": 80 | tool = SendMessage( 81 | message="Hello, how are you?", 82 | agency_name="ResearchAgency", 83 | agent_name="BrowsingAgent", 84 | ) 85 | print(asyncio.run(tool.run())) 86 | 87 | tool = SendMessage( 88 | message="Hello, how are you?", 89 | agency_name="ResearchAgency", 90 | agent_name=None, 91 | ) 92 | print(asyncio.run(tool.run())) 93 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/SendMessageAsync.py: -------------------------------------------------------------------------------- 1 | """ 2 | This tool allows you to send a message to a specific agent within a specified agency without waiting for a response. 3 | 4 | To use this tool, provide the message you want to send, the name of the agency to which the agent belongs, and optionally the name of the agent to whom the message should be sent. If the agent name is not specified, the message will be sent to the default agent for that agency. 5 | """ 6 | 7 | import asyncio 8 | import logging 9 | 10 | from agency_swarm.agency import Agency 11 | from agency_swarm.threads import Thread 12 | from agency_swarm.threads.thread_async import ThreadAsync 13 | from agency_swarm.tools import BaseTool 14 | from pydantic import Field 15 | 16 | from voice_assistant.agencies import AGENCIES, AGENCIES_AND_AGENTS_STRING 17 | from voice_assistant.utils.decorators import timeit_decorator 18 | 19 | logger = logging.getLogger(__name__) 20 | 21 | 22 | class SendMessageAsync(BaseTool): 23 | """ 24 | Sends a message to a specific agent within a specified agency without waiting for an immediate response. 25 | 26 | Use this tool to initiate long-running tasks asynchronously. 27 | After sending the message, you can use the 'GetResponse' tool with the same 'agency_name' and 'agent_name' values to check the status or retrieve the agent's response. 28 | This allows you to perform other tasks or interact with the user while the agent processes the request. 29 | 30 | Available Agencies and Agents: 31 | {agency_agents} 32 | """ 33 | 34 | message: str = Field(..., description="The message to be sent.") 35 | agency_name: str = Field( 36 | ..., description="The name of the agency to send the message to." 
37 |     )
38 |     agent_name: str | None = Field(
39 |         None,
40 |         description="The name of the agent to send the message to, or None to use the default agent.",
41 |     )
42 | 
43 |     @timeit_decorator
44 |     async def run(self) -> str:
45 |         result = await self.send_message()
46 |         return str(result)
47 | 
48 |     async def send_message(self) -> str:
49 |         agency: Agency | None = AGENCIES.get(self.agency_name)
50 |         if not agency:
51 |             return f"Agency '{self.agency_name}' not found"
52 |         recipient_agent = None  # stays None when the default (CEO) agent handles the message, avoiding a NameError below
53 |         if not self.agent_name or self.agent_name == agency.ceo.name:
54 |             thread: Thread = agency.main_thread
55 |         else:
56 |             recipient_agent = next(
57 |                 (agent for agent in agency.agents if agent.name == self.agent_name),
58 |                 None,
59 |             )
60 |             if not recipient_agent:
61 |                 return f"Agent '{self.agent_name}' not found in agency '{self.agency_name}'. Available agents: {', '.join(agent.name for agent in agency.agents)}"
62 | 
63 |             thread: Thread = agency.agents_and_threads.get(agency.ceo.name, {}).get(
64 |                 self.agent_name
65 |             )
66 | 
67 |         if isinstance(thread, ThreadAsync):
68 |             return await asyncio.to_thread(
69 |                 thread.get_completion_async,
70 |                 message=self.message,
71 |                 recipient_agent=recipient_agent,
72 |             )
73 |         else:
74 |             await asyncio.to_thread(
75 |                 thread.get_completion,
76 |                 message=self.message,
77 |                 recipient_agent=recipient_agent,
78 |             )
79 |             return "Message sent asynchronously. Use 'GetResponse' to check status."
80 | 
81 | 
82 | # Dynamically update the class docstring with the list of agencies and their agents
83 | SendMessageAsync.__doc__ = SendMessageAsync.__doc__.format(
84 |     agency_agents=AGENCIES_AND_AGENTS_STRING
85 | )
86 | 
87 | 
88 | if __name__ == "__main__":
89 |     tool = SendMessageAsync(
90 |         message="Write a long paragraph about the history of the internet.",
91 |         agency_name="ResearchAgency",
92 |         agent_name="BrowsingAgent",
93 |     )
94 |     print(asyncio.run(tool.run()))
95 | 
96 |     tool = SendMessageAsync(
97 |         message="Write a long paragraph about the history of the internet.",
98 |         agency_name="ResearchAgency",
99 |         agent_name=None,
100 |     )
101 |     print(asyncio.run(tool.run()))
102 | 
--------------------------------------------------------------------------------
/src/voice_assistant/tools/UpdateFile.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | 
4 | from agency_swarm.tools import BaseTool
5 | from dotenv import load_dotenv
6 | from pydantic import Field
7 | 
8 | from voice_assistant.config import SCRATCH_PAD_DIR
9 | from voice_assistant.models import FileSelectionResponse, ModelName
10 | from voice_assistant.utils.decorators import timeit_decorator
11 | from voice_assistant.utils.llm_utils import (
12 |     get_structured_output_completion,
13 |     parse_chat_completion,
14 | )
15 | 
16 | load_dotenv()
17 | 
18 | 
19 | class UpdateFile(BaseTool):
20 |     """A tool for updating the content of a file based on a prompt."""
21 | 
22 |     prompt: str = Field(
23 |         ...,
24 |         description="The prompt to identify which file to update and how to update it.",
25 |     )
26 | 
27 |     async def run(self):
28 |         result = await update_file(self.prompt)
29 |         if "model_used" in result and isinstance(result["model_used"], ModelName):
30 |             result["model_used"] = result["model_used"].value
31 |         return str(result)
32 | 
33 | 
34 | @timeit_decorator
35 | async def update_file(prompt: str) -> dict:
36 |     available_files = os.listdir(SCRATCH_PAD_DIR)
37 |     available_model_map = {model.value: model.name for model in ModelName}
38 | 
39 |     file_selection_response = await
get_structured_output_completion( 40 | create_file_selection_prompt( 41 | available_files, json.dumps(available_model_map), prompt 42 | ), 43 | FileSelectionResponse, 44 | ) 45 | 46 | if not file_selection_response.file: 47 | return {"status": "No matching file found"} 48 | 49 | selected_file = file_selection_response.file 50 | selected_model = file_selection_response.model or ModelName.BASE_MODEL 51 | file_path = os.path.join(SCRATCH_PAD_DIR, selected_file) 52 | 53 | with open(file_path, "r") as f: 54 | file_content = f.read() 55 | 56 | updated_content = await parse_chat_completion( 57 | create_file_update_prompt(selected_file, file_content, prompt), 58 | selected_model, 59 | ) 60 | 61 | with open(file_path, "w") as f: 62 | f.write(updated_content) 63 | 64 | return { 65 | "status": "File updated", 66 | "file_name": selected_file, 67 | "model_used": selected_model, 68 | } 69 | 70 | 71 | def create_file_selection_prompt(available_files, available_model_map, user_prompt): 72 | return f""" 73 | 74 | Select a file from the available files and choose the appropriate model based on the user's prompt. 75 | 76 | 77 | 78 | Based on the user's prompt and the list of available files, infer which file the user wants to update. 79 | Also, select the most appropriate model from the available models mapping. 80 | If the user does not specify a model, default to 'base_model'. 81 | If no file matches, return an empty string for 'file'. 82 | 83 | 84 | 85 | {", ".join(available_files)} 86 | 87 | 88 | 89 | {available_model_map} 90 | 91 | 92 | 93 | {user_prompt} 94 | 95 | """ 96 | 97 | 98 | def create_file_update_prompt(file_name, file_content, user_prompt): 99 | return f""" 100 | 101 | Update the content of the file based on the user's prompt. 102 | 103 | 104 | 105 | Based on the user's prompt and the file content, generate the updated content for the file. 106 | The file-name is the name of the file to update. 107 | The user's prompt describes the updates to make. 108 | Respond exclusively with the updates to the file and nothing else; they will be used to overwrite the file entirely using f.write(). 109 | Do not include any preamble or commentary or markdown formatting, just the raw updates. 110 | Be precise and accurate. 
111 | 112 | 113 | 114 | {file_name} 115 | 116 | 117 | 118 | {file_content} 119 | 120 | 121 | 122 | {user_prompt} 123 | 124 | """ 125 | 126 | 127 | if __name__ == "__main__": 128 | import asyncio 129 | 130 | tool = UpdateFile(prompt="Update the test file to include a paragraph about AI") 131 | print(asyncio.run(tool.run())) 132 | -------------------------------------------------------------------------------- /src/voice_assistant/tools/__init__.py: -------------------------------------------------------------------------------- 1 | import importlib 2 | import logging 3 | import os 4 | 5 | from agency_swarm.tools import BaseTool 6 | 7 | logger = logging.getLogger(__name__) 8 | 9 | 10 | def load_tools(): 11 | tools = [] 12 | current_dir = os.path.dirname(os.path.abspath(__file__)) 13 | for filename in os.listdir(current_dir): 14 | if filename.endswith(".py") and filename != "__init__.py": 15 | module_name = filename[:-3] 16 | module = importlib.import_module(f"voice_assistant.tools.{module_name}") 17 | for name, obj in module.__dict__.items(): 18 | if ( 19 | isinstance(obj, type) 20 | and issubclass(obj, BaseTool) 21 | and obj != BaseTool 22 | ): 23 | tools.append(obj) 24 | return tools 25 | 26 | 27 | def prepare_tool_schemas(): 28 | """Prepare the schemas for the tools.""" 29 | tool_schemas = [] 30 | for tool in TOOLS: 31 | tool_schema = {k: v for k, v in tool.openai_schema.items() if k != "strict"} 32 | tool_type = "function" if not hasattr(tool, "type") else tool.type 33 | tool_schemas.append({**tool_schema, "type": tool_type}) 34 | 35 | logger.debug("Tool Schemas:\n%s", tool_schemas) 36 | return tool_schemas 37 | 38 | 39 | # Load all tools 40 | TOOLS: list[BaseTool] = load_tools() 41 | TOOL_SCHEMAS = prepare_tool_schemas() 42 | -------------------------------------------------------------------------------- /src/voice_assistant/utils/__init__.py: -------------------------------------------------------------------------------- 1 | import base64 2 | 3 | 4 | def base64_encode_audio(audio_bytes: bytes) -> str: 5 | return base64.b64encode(audio_bytes).decode("utf-8") 6 | -------------------------------------------------------------------------------- /src/voice_assistant/utils/decorators.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import functools 3 | import time 4 | 5 | from voice_assistant.utils.log_utils import log_runtime 6 | 7 | 8 | def timeit_decorator(func): 9 | @functools.wraps(func) 10 | async def async_wrapper(*args, **kwargs): 11 | start_time = time.perf_counter() 12 | result = await func(*args, **kwargs) 13 | duration = round(time.perf_counter() - start_time, 4) 14 | if args and hasattr(args[0], "__class__"): 15 | class_name = args[0].__class__.__name__ 16 | log_runtime(f"{class_name}.{func.__name__}", duration) 17 | else: 18 | log_runtime(func.__name__, duration) 19 | return result 20 | 21 | @functools.wraps(func) 22 | def sync_wrapper(*args, **kwargs): 23 | start_time = time.perf_counter() 24 | result = func(*args, **kwargs) 25 | duration = round(time.perf_counter() - start_time, 4) 26 | if args and hasattr(args[0], "__class__"): 27 | class_name = args[0].__class__.__name__ 28 | log_runtime(f"{class_name}.{func.__name__}", duration) 29 | else: 30 | log_runtime(func.__name__, duration) 31 | return result 32 | 33 | return async_wrapper if asyncio.iscoroutinefunction(func) else sync_wrapper 34 | -------------------------------------------------------------------------------- 
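A minimal usage sketch of `timeit_decorator` from decorators.py above (not part of the repository): the decorator dispatches on `asyncio.iscoroutinefunction`, so the same decorator covers both sync and async callables. This assumes the package is importable and `RUN_TIME_TABLE_LOG_JSON` points at a writable path; `fetch_data` and `crunch` are hypothetical names.

```python
import asyncio

from voice_assistant.utils.decorators import timeit_decorator


@timeit_decorator
async def fetch_data() -> str:
    await asyncio.sleep(0.1)  # stand-in for real async I/O
    return "done"


@timeit_decorator
def crunch() -> int:
    return sum(range(1_000_000))


if __name__ == "__main__":
    print(asyncio.run(fetch_data()))  # logged roughly as: fetch_data() took 0.1 seconds
    print(crunch())
```
--------------------------------------------------------------------------------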
/src/voice_assistant/utils/google_services_utils.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | import os 4 | 5 | from dotenv import load_dotenv 6 | from google.auth.transport.requests import Request 7 | from google.oauth2.credentials import Credentials 8 | from google_auth_oauthlib.flow import InstalledAppFlow 9 | from googleapiclient.discovery import build 10 | 11 | load_dotenv() 12 | 13 | logger = logging.getLogger(__name__) 14 | 15 | 16 | class GoogleServicesUtils: 17 | """ 18 | Utility class for Gmail and Google Calendar authentication and service creation. 19 | """ 20 | 21 | SCOPES = [ 22 | "https://www.googleapis.com/auth/gmail.readonly", 23 | "https://www.googleapis.com/auth/gmail.compose", 24 | "https://www.googleapis.com/auth/calendar.readonly", 25 | ] 26 | 27 | SERVICE_API_VERSIONS = {"gmail": "v1", "calendar": "v3"} 28 | 29 | @staticmethod 30 | async def authenticate_service(service_name): 31 | """ 32 | Authenticates the user and returns a Gmail or Google Calendar service object. 33 | """ 34 | 35 | def authenticate(): 36 | creds = None 37 | token_path = "token.json" 38 | credentials_path = "credentials.json" 39 | 40 | if os.path.exists(token_path): 41 | creds = Credentials.from_authorized_user_file( 42 | token_path, GoogleServicesUtils.SCOPES 43 | ) 44 | logger.info(f"Loaded {service_name} credentials from token.json.") 45 | 46 | if not creds or not creds.valid: 47 | if creds and creds.expired and creds.refresh_token: 48 | logger.info(f"Refreshing expired {service_name} credentials.") 49 | creds.refresh(Request()) 50 | else: 51 | logger.info(f"Initiating new {service_name} authentication flow.") 52 | flow = InstalledAppFlow.from_client_secrets_file( 53 | credentials_path, GoogleServicesUtils.SCOPES 54 | ) 55 | creds = flow.run_local_server(port=8080) # Fixed port 56 | with open(token_path, "w") as token: 57 | token.write(creds.to_json()) 58 | logger.info(f"Saved new {service_name} credentials to token.json.") 59 | 60 | api_version = GoogleServicesUtils.SERVICE_API_VERSIONS.get(service_name) 61 | if api_version is None: 62 | raise ValueError(f"Unsupported service: {service_name}") 63 | 64 | return build(service_name, api_version, credentials=creds) 65 | 66 | try: 67 | service = await asyncio.to_thread(authenticate) 68 | logger.info( 69 | f"{service_name.capitalize()} service authenticated successfully." 70 | ) 71 | return service 72 | except Exception as e: 73 | logger.error(f"Failed to authenticate {service_name} service: {e}") 74 | raise e 75 | 76 | @staticmethod 77 | async def authenticate_gmail(): 78 | """ 79 | Authenticates the user and returns a Gmail service object. 80 | """ 81 | return await GoogleServicesUtils.authenticate_service("gmail") 82 | 83 | @staticmethod 84 | async def authenticate_calendar(): 85 | """ 86 | Authenticates the user and returns a Google Calendar service object. 
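        Thin wrapper around authenticate_service("calendar").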
87 | """ 88 | return await GoogleServicesUtils.authenticate_service("calendar") 89 | -------------------------------------------------------------------------------- /src/voice_assistant/utils/llm_utils.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | 4 | import aiohttp 5 | import openai 6 | from pydantic import BaseModel 7 | 8 | from voice_assistant.models import ModelName 9 | 10 | API_KEY = os.getenv("OPENAI_API_KEY") 11 | OPENAI_CLIENT = openai.OpenAI(api_key=API_KEY) 12 | 13 | 14 | async def get_model_completion(prompt: str, model: ModelName) -> str: 15 | headers = { 16 | "Content-Type": "application/json", 17 | "Authorization": f"Bearer {API_KEY}", 18 | } 19 | 20 | payload = { 21 | "model": model.value, 22 | "messages": [ 23 | { 24 | "role": "user", 25 | "content": prompt, 26 | } 27 | ], 28 | } 29 | 30 | async with aiohttp.ClientSession() as session: 31 | async with session.post( 32 | "https://api.openai.com/v1/chat/completions", 33 | headers=headers, 34 | json=payload, 35 | ) as response: 36 | if response.status != 200: 37 | error = await response.text() 38 | raise RuntimeError(f"OpenAI API error: {error}") 39 | result = await response.json() 40 | return result["choices"][0]["message"]["content"] 41 | 42 | 43 | async def get_structured_output_completion( 44 | prompt: str, response_format: BaseModel 45 | ) -> BaseModel: 46 | completion = await asyncio.to_thread( 47 | OPENAI_CLIENT.beta.chat.completions.parse, 48 | model=ModelName.BASE_MODEL.value, 49 | messages=[{"role": "user", "content": prompt}], 50 | response_format=response_format, 51 | ) 52 | message = completion.choices[0].message 53 | if not message.parsed: 54 | raise ValueError(message.refusal) 55 | return message.parsed 56 | 57 | 58 | async def parse_chat_completion(prompt: str, model: ModelName) -> str: 59 | completion = await asyncio.to_thread( 60 | OPENAI_CLIENT.beta.chat.completions.parse, 61 | model=model.value, 62 | messages=[{"role": "user", "content": prompt}], 63 | ) 64 | return completion.choices[0].message.content 65 | -------------------------------------------------------------------------------- /src/voice_assistant/utils/log_utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import logging 3 | from datetime import datetime 4 | 5 | from voice_assistant.config import RUN_TIME_TABLE_LOG_JSON 6 | 7 | logger = logging.getLogger(__name__) 8 | 9 | 10 | def log_runtime(function_or_name: str, duration: float): 11 | time_record = { 12 | "timestamp": datetime.now().isoformat(), 13 | "function": function_or_name, 14 | "duration": f"{duration:.4f}", 15 | } 16 | with open(RUN_TIME_TABLE_LOG_JSON, "a") as file: 17 | json.dump(time_record, file) 18 | file.write("\n") 19 | 20 | logger.info(f"⏰ {function_or_name}() took {duration:.4f} seconds") 21 | 22 | 23 | def log_ws_event(direction: str, event: dict): 24 | event_type = event.get("type", "Unknown") 25 | event_emojis = { 26 | "session.update": "🛠️", 27 | "session.created": "🔌", 28 | "session.updated": "🔄", 29 | "input_audio_buffer.append": "🎤", 30 | "input_audio_buffer.commit": "✅", 31 | "input_audio_buffer.speech_started": "🗣️", 32 | "input_audio_buffer.speech_stopped": "🤫", 33 | "input_audio_buffer.cleared": "🧹", 34 | "input_audio_buffer.committed": "📨", 35 | "conversation.item.create": "📥", 36 | "conversation.item.delete": "🗑️", 37 | "conversation.item.truncate": "✂️", 38 | "conversation.item.created": "📤", 39 | "conversation.item.deleted": 
"🗑️", 40 | "conversation.item.truncated": "✂️", 41 | "response.create": "➡️", 42 | "response.created": "📝", 43 | "response.output_item.added": "➕", 44 | "response.output_item.done": "✅", 45 | "response.text.delta": "✍️", 46 | "response.text.done": "📝", 47 | "response.audio.delta": "🔊", 48 | "response.audio.done": "🔇", 49 | "response.done": "✔️", 50 | "response.cancel": "⛔", 51 | "response.function_call_arguments.delta": "📥", 52 | "response.function_call_arguments.done": "📥", 53 | "rate_limits.updated": "⏳", 54 | "error": "❌", 55 | "conversation.item.input_audio_transcription.completed": "📝", 56 | "conversation.item.input_audio_transcription.failed": "⚠️", 57 | } 58 | emoji = event_emojis.get(event_type, "❓") 59 | icon = "⬆️ - Out" if direction.lower() == "outgoing" else "⬇️ - In" 60 | logger.info(f"{emoji} {icon} {event_type}") 61 | -------------------------------------------------------------------------------- /src/voice_assistant/visual_interface.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from collections import deque 4 | 5 | import numpy as np 6 | import pygame 7 | 8 | 9 | class VisualInterface: 10 | def __init__(self, width=400, height=400): 11 | pygame.init() 12 | self.width = width 13 | self.height = height 14 | self.screen = pygame.display.set_mode((width, height)) 15 | pygame.display.set_caption("Assistant Voice Activity") 16 | 17 | # Set the app icon 18 | icon_path = os.path.join(os.path.dirname(__file__), "icon.png") 19 | icon = pygame.image.load(icon_path) 20 | pygame.display.set_icon(icon) 21 | 22 | self.clock = pygame.time.Clock() 23 | self.is_active = False 24 | self.is_assistant_speaking = False 25 | self.active_color = (50, 139, 246) # Sky Blue 26 | self.inactive_color = (100, 100, 100) # Gray 27 | self.current_color = self.inactive_color 28 | self.base_radius = 100 29 | self.current_radius = self.base_radius 30 | self.energy_queue = deque(maxlen=50) # Store last 50 energy values 31 | self.update_interval = 0.05 # Update every 50ms 32 | self.max_energy = 1.0 # Initial max energy value 33 | 34 | async def update(self): 35 | for event in pygame.event.get(): 36 | if event.type == pygame.QUIT: 37 | pygame.quit() 38 | return False 39 | 40 | self.screen.fill((0, 0, 0)) # Black background 41 | 42 | # Smooth transition for radius 43 | target_radius = self.base_radius 44 | if self.energy_queue: 45 | normalized_energy = np.mean(self.energy_queue) / ( 46 | self.max_energy or 1.0 47 | ) # Avoid division by zero 48 | target_radius += int(normalized_energy * self.base_radius) 49 | 50 | self.current_radius += (target_radius - self.current_radius) * 0.2 51 | self.current_radius = min( 52 | max(self.current_radius, self.base_radius), self.width // 2 53 | ) 54 | 55 | # Smooth transition for color 56 | target_color = ( 57 | self.active_color 58 | if self.is_active or self.is_assistant_speaking 59 | else self.inactive_color 60 | ) 61 | self.current_color = tuple( 62 | int(self.current_color[i] + (target_color[i] - self.current_color[i]) * 0.1) 63 | for i in range(3) 64 | ) 65 | 66 | pygame.draw.circle( 67 | self.screen, 68 | self.current_color, 69 | (self.width // 2, self.height // 2), 70 | int(self.current_radius), 71 | ) 72 | 73 | pygame.display.flip() 74 | self.clock.tick(60) 75 | await asyncio.sleep(self.update_interval) 76 | return True 77 | 78 | def set_active(self, is_active): 79 | self.is_active = is_active 80 | 81 | def set_assistant_speaking(self, is_speaking): 82 | self.is_assistant_speaking = 
--------------------------------------------------------------------------------
/src/voice_assistant/visual_interface.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import os
3 | from collections import deque
4 | 
5 | import numpy as np
6 | import pygame
7 | 
8 | 
9 | class VisualInterface:
10 |     """Pygame window that visualizes voice activity as a pulsing circle."""
11 | 
12 |     def __init__(self, width=400, height=400):
13 |         pygame.init()
14 |         self.width = width
15 |         self.height = height
16 |         self.screen = pygame.display.set_mode((width, height))
17 |         pygame.display.set_caption("Assistant Voice Activity")
18 | 
19 |         # Set the app icon
20 |         icon_path = os.path.join(os.path.dirname(__file__), "icon.png")
21 |         icon = pygame.image.load(icon_path)
22 |         pygame.display.set_icon(icon)
23 | 
24 |         self.clock = pygame.time.Clock()
25 |         self.is_active = False
26 |         self.is_assistant_speaking = False
27 |         self.active_color = (50, 139, 246)  # Sky blue
28 |         self.inactive_color = (100, 100, 100)  # Gray
29 |         self.current_color = self.inactive_color
30 |         self.base_radius = 100
31 |         self.current_radius = self.base_radius
32 |         self.energy_queue = deque(maxlen=50)  # Store last 50 energy values
33 |         self.update_interval = 0.05  # Update every 50 ms
34 |         self.max_energy = 1.0  # Initial max energy value
35 | 
36 |     async def update(self):
37 |         """Draw one frame; return False once the window has been closed."""
38 |         for event in pygame.event.get():
39 |             if event.type == pygame.QUIT:
40 |                 pygame.quit()
41 |                 return False
42 | 
43 |         self.screen.fill((0, 0, 0))  # Black background
44 | 
45 |         # Smooth transition for radius
46 |         target_radius = self.base_radius
47 |         if self.energy_queue:
48 |             normalized_energy = np.mean(self.energy_queue) / (
49 |                 self.max_energy or 1.0
50 |             )  # Avoid division by zero
51 |             target_radius += int(normalized_energy * self.base_radius)
52 | 
53 |         self.current_radius += (target_radius - self.current_radius) * 0.2
54 |         self.current_radius = min(
55 |             max(self.current_radius, self.base_radius), self.width // 2
56 |         )
57 | 
58 |         # Smooth transition for color
59 |         target_color = (
60 |             self.active_color
61 |             if self.is_active or self.is_assistant_speaking
62 |             else self.inactive_color
63 |         )
64 |         self.current_color = tuple(
65 |             int(self.current_color[i] + (target_color[i] - self.current_color[i]) * 0.1)
66 |             for i in range(3)
67 |         )
68 | 
69 |         pygame.draw.circle(
70 |             self.screen,
71 |             self.current_color,
72 |             (self.width // 2, self.height // 2),
73 |             int(self.current_radius),
74 |         )
75 | 
76 |         pygame.display.flip()
77 |         self.clock.tick(60)
78 |         await asyncio.sleep(self.update_interval)
79 |         return True
80 | 
81 |     def set_active(self, is_active):
82 |         self.is_active = is_active
83 | 
84 |     def set_assistant_speaking(self, is_speaking):
85 |         self.is_assistant_speaking = is_speaking
86 | 
87 |     def update_energy(self, energy):
88 |         if isinstance(energy, np.ndarray):
89 |             energy = np.mean(np.abs(energy))
90 |         self.energy_queue.append(energy)
91 | 
92 |         # Update max_energy dynamically
93 |         current_max = max(self.energy_queue)
94 |         if current_max > self.max_energy:
95 |             self.max_energy = current_max
96 |         elif len(self.energy_queue) == self.energy_queue.maxlen:
97 |             self.max_energy = max(self.energy_queue)
98 | 
99 |     def process_audio_data(self, audio_data: bytes):
100 |         """Process and update audio energy for visualization."""
101 |         audio_frame = np.frombuffer(audio_data, dtype=np.int16)
102 |         energy = np.abs(audio_frame).mean()
103 |         self.update_energy(energy)
104 | 
105 | 
106 | async def run_visual_interface(interface):
107 |     while True:
108 |         if not await interface.update():
109 |             break
110 | 
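111 | 
112 | if __name__ == "__main__":
113 |     # Standalone preview: drives the circle with synthetic energy values so
114 |     # the widget can be checked without a microphone or a live assistant.
115 |     # The random energies are illustrative only; close the window to exit.
116 |     import random
117 | 
118 |     async def _demo() -> None:
119 |         interface = VisualInterface()
120 |         interface.set_active(True)
121 |         while await interface.update():
122 |             interface.update_energy(random.random())
123 | 
124 |     asyncio.run(_demo())
125 | 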
--------------------------------------------------------------------------------
/src/voice_assistant/websocket_handler.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import json
3 | import logging
4 | import time
5 | 
6 | import websockets
7 | 
8 | from voice_assistant.audio import audio_player
9 | from voice_assistant.tools import TOOLS
10 | from voice_assistant.utils.log_utils import log_runtime, log_ws_event
11 | 
12 | logger = logging.getLogger(__name__)
13 | 
14 | 
15 | async def process_ws_messages(websocket, mic, visual_interface):
16 |     """Consume Realtime API events, dispatch tool calls, and drive audio/visual state."""
17 |     assistant_reply = ""
18 |     function_call = None
19 |     function_call_args = ""
20 |     response_start_time = None
21 | 
22 |     while True:
23 |         try:
24 |             message = await websocket.recv()
25 |             event = json.loads(message)
26 |             log_ws_event("incoming", event)
27 | 
28 |             event_type = event.get("type")
29 | 
30 |             if event_type == "response.created":
31 |                 mic.start_receiving()
32 |                 visual_interface.set_active(True)
33 |             elif event_type == "response.output_item.added":
34 |                 item = event.get("item", {})
35 |                 if item.get("type") == "function_call":
36 |                     function_call = item
37 |                     function_call_args = ""
38 |             elif event_type == "response.function_call_arguments.delta":
39 |                 function_call_args += event.get("delta", "")
40 |             elif event_type == "response.function_call_arguments.done":
41 |                 if function_call:
42 |                     function_name = function_call.get("name")
43 |                     call_id = function_call.get("call_id")
44 |                     try:
45 |                         args = (
46 |                             json.loads(function_call_args) if function_call_args else {}
47 |                         )
48 |                     except json.JSONDecodeError:
49 |                         logger.error(
50 |                             f"Failed to parse function arguments: {function_call_args}"
51 |                         )
52 |                         args = {}
53 | 
54 |                     tool = next(
55 |                         (
56 |                             t
57 |                             for t in TOOLS
58 |                             if t.__name__.lower() == function_name.lower()
59 |                         ),
60 |                         None,
61 |                     )
62 |                     if tool:
63 |                         logger.info(
64 |                             f"🛠️ Calling function: {function_name} with args: {args}"
65 |                         )
66 |                         try:
67 |                             tool_instance = tool(**args)
68 |                             result = await tool_instance.run()
69 |                             logger.info(
70 |                                 f"🛠️ Function {function_name} call result: {result}"
71 |                             )
72 |                         except Exception as e:
73 |                             logger.error(
74 |                                 f"Error calling function {function_name}: {str(e)}"
75 |                             )
76 |                             result = {
77 |                                 "error": f"Function '{function_name}' failed: {str(e)}"
78 |                             }
79 |                     else:
80 |                         logger.warning(f"Function '{function_name}' not found in TOOLS")
81 |                         result = {"error": f"Function '{function_name}' not found."}
82 | 
83 |                     function_call_output = {
84 |                         "type": "conversation.item.create",
85 |                         "item": {
86 |                             "type": "function_call_output",
87 |                             "call_id": call_id,
88 |                             "output": json.dumps(result),
89 |                         },
90 |                     }
91 |                     log_ws_event("outgoing", function_call_output)
92 |                     await websocket.send(json.dumps(function_call_output))
93 |                     await websocket.send(json.dumps({"type": "response.create"}))
94 |                     function_call = None
95 |                     function_call_args = ""
96 |             elif event_type == "response.text.delta":
97 |                 delta = event.get("delta", "")
98 |                 if not assistant_reply:
99 |                     # Print the "Assistant: " prefix once per reply, not on every delta
100 |                     print("Assistant: ", end="", flush=True)
101 |                 assistant_reply += delta
102 |                 print(delta, end="", flush=True)
103 |             elif event_type == "response.audio.delta":
104 |                 audio_chunk = base64.b64decode(event["delta"])
105 |                 await audio_player.play_audio_chunk(audio_chunk, visual_interface)
106 |             elif event_type == "response.done":
107 |                 if response_start_time is not None:
108 |                     response_duration = time.perf_counter() - response_start_time
109 |                     log_runtime("realtime_api_response", response_duration)
110 |                     response_start_time = None
111 | 
112 |                 logger.info("Assistant response complete.")
113 |                 await audio_player.stop_playback(visual_interface)
114 |                 assistant_reply = ""
115 |                 logger.info("Calling stop_receiving()")
116 |                 mic.stop_receiving()
117 |                 visual_interface.set_active(False)
118 |                 mic.start_recording()
119 |                 logger.info("Started recording for next user input")
120 |             elif event_type == "rate_limits.updated":
121 |                 mic.start_recording()
122 |                 logger.info("Resumed recording after rate_limits.updated")
123 |             elif event_type == "error":
124 |                 error_message = event.get("error", {}).get("message", "")
125 |                 if "buffer is empty" in error_message:
126 |                     logger.info("Received 'buffer is empty' error, no audio data sent.")
127 |                     continue
128 |                 elif "Conversation already has an active response" in error_message:
129 |                     logger.info(
130 |                         "Received 'active response' error, adjusting response flow."
131 |                     )
132 |                     continue
133 |                 else:
134 |                     logger.error(f"Unhandled error: {error_message}")
135 |                     break
136 |             elif event_type == "input_audio_buffer.speech_started":
137 |                 logger.info("Speech detected, listening...")
138 |                 visual_interface.set_active(True)
139 |             elif event_type == "input_audio_buffer.speech_stopped":
140 |                 mic.stop_recording()
141 |                 logger.info("Speech ended, processing...")
142 |                 visual_interface.set_active(False)
143 | 
144 |                 response_start_time = time.perf_counter()
145 |         except websockets.ConnectionClosed:
146 |             logger.warning("WebSocket connection closed")
147 |             break
148 | 
149 |     audio_player.close()
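150 | 
151 | 
152 | if __name__ == "__main__":
153 |     # Offline sketch of the tool-dispatch step above: given a function name
154 |     # and a JSON argument string from a function_call event, find the matching
155 |     # class in TOOLS, instantiate it, and await its run() method. The
156 |     # "GetCurrentDateTime" tool from this repo is assumed to take no arguments.
157 |     import asyncio
158 | 
159 |     async def _demo_dispatch(function_name: str, function_call_args: str):
160 |         args = json.loads(function_call_args) if function_call_args else {}
161 |         tool = next(
162 |             (t for t in TOOLS if t.__name__.lower() == function_name.lower()),
163 |             None,
164 |         )
165 |         if tool is None:
166 |             return {"error": f"Function '{function_name}' not found."}
167 |         return await tool(**args).run()
168 | 
169 |     print(asyncio.run(_demo_dispatch("GetCurrentDateTime", "")))
170 | 
--------------------------------------------------------------------------------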