├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
├── run.py
└── src
    ├── __init__.py
    ├── anthropic.py
    ├── computer.py
    ├── main.py
    ├── prompt_manager.py
    ├── store.py
    ├── voice_control.py
    └── window.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Python
 2 | __pycache__/
 3 | *.py[cod]
 4 | *$py.class
 5 | *.so
 6 | .Python
 7 | build/
 8 | develop-eggs/
 9 | dist/
10 | downloads/
11 | eggs/
12 | .eggs/
13 | lib/
14 | lib64/
15 | parts/
16 | sdist/
17 | var/
18 | wheels/
19 | *.egg-info/
20 | .installed.cfg
21 | *.egg
22 | 
23 | # Virtual Environment
24 | venv/
25 | env/
26 | ENV/
27 | .env
28 | .venv
29 | env.bak/
30 | venv.bak/
31 | 
32 | # IDE specific files
33 | .idea/
34 | .vscode/
35 | *.swp
36 | *.swo
37 | .DS_Store
38 | 
39 | # Project specific
40 | *.log
41 | logs/
42 | *.db
43 | *.sqlite3
44 | 
45 | # Distribution / packaging
46 | .Python
47 | build/
48 | develop-eggs/
49 | dist/
50 | downloads/
51 | eggs/
52 | .eggs/
53 | lib/
54 | lib64/
55 | parts/
56 | sdist/
57 | var/
58 | wheels/
59 | share/python-wheels/
60 | *.egg-info/
61 | .installed.cfg
62 | *.egg
63 | 
64 | # Unit test / coverage reports
65 | htmlcov/
66 | .tox/
67 | .nox/
68 | .coverage
69 | .coverage.*
70 | .cache
71 | nosetests.xml
72 | coverage.xml
73 | *.cover
74 | *.py,cover
75 | .hypothesis/
76 | .pytest_cache/
77 | cover/
78 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Apache License 2.0
 2 |                     
 3 | Copyright (c) 2024 Ishan Nagpal
 4 | 
 5 | Licensed under the Apache License, Version 2.0 (the "License"); provided that any 
 6 | derivative works, modifications, implementations, or sublicensed variations thereof 
 7 | (hereinafter collectively referred to as "Derivatives") shall be subject to the 
 8 | following irrevocable provisions: all Derivatives shall (i) constitute works for 
 9 | hire as defined under applicable intellectual property law and (ii) automatically 
10 | grant to the Original Rights Holder a perpetual, irrevocable, transferable, 
11 | sublicensable right and license to use, modify, distribute, and commercialize such 
12 | Derivatives without restriction or additional consideration. The foregoing license 
13 | grant shall survive any termination or expiration of the License.
14 | 
15 | http://www.apache.org/licenses/LICENSE-2.0
16 | 
17 | Unless required by applicable law or agreed to in writing, software distributed 
18 | under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
19 | CONDITIONS OF ANY KIND, either express or implied. See the License for the specific 
20 | language governing permissions and limitations under the License.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # 👨🏽‍💻 Grunty
 2 | 
 3 | Self-hosted desktop app to have AI control your computer, powered by the new Claude [computer use](https://www.anthropic.com/news/3-5-models-and-computer-use) capability. Allow Claude to take over your laptop and do your tasks for you (or at least attempt to, lol). Written in Python, using PyQt.
 4 | 
 5 | ## Demo
 6 | Here, I asked it to use [vim](https://vim.rtorr.com/) to create a game in Python, run it, and play it.
 7 | 
 8 | https://github.com/user-attachments/assets/fa9b195e-fae6-4dbc-adb9-dc42519624b1
 9 | 
10 | Video was sped up 8x btw. [Computer use](https://www.anthropic.com/news/3-5-models-and-computer-use) is pretty slow as of today.
11 | 
12 | ## ⚠️ Important Disclaimers
13 | 
14 | 1. **This is experimental software** - It gives an AI control of your mouse and keyboard. Things can and will go wrong.
15 | 
16 | 2. **Tread Lightly** - If it wipes your computer, sends weird emails, or orders 100 pizzas... that's on you. 
17 | 
18 | Anthropic can see your screen through screenshots during actions. Hide sensitive information or private stuff.
19 | 
20 | ## 🎯 Features
21 | - Literally ask AI to do ANYTHING on your computer that you do with a mouse and keyboard. Browse the web, write code, blah blah.
22 | 
23 | # 💻 Platforms
24 | - Anything you can run Python on: MacOS, Windows, Linux, etc.
25 | 
26 | ## 🛠️ Setup
27 | 
28 | Get an Anthropic API key [here]([https://console.anthropic.com/keys](https://console.anthropic.com/dashboard)).
29 | 
30 | ```bash
31 | # Python 3.10+ recommended
32 | python -m venv venv
33 | source venv/bin/activate  # or `venv\Scripts\activate` on Windows
34 | pip install -r requirements.txt
35 | 
36 | # Add API key to .env
37 | echo "ANTHROPIC_API_KEY=your-key-here" > .env
38 | 
39 | # Run
40 | python run.py
41 | ```
42 | 
43 | ## 🔑 Productivity Keybindings
44 | - `Ctrl + Enter`: Execute the current instruction
45 | - `Ctrl + C`: Stop the current agent action
46 | - `Ctrl + W`: Minimize to system tray
47 | - `Ctrl + Q`: Quit application
48 | 
49 | ## 💡 Tips
50 | - Claude really loves Firefox. You might want to install it for better UI detection and accurate mouse clicks.
51 | - Be specific and explicit, help it out a bit
52 | - Always monitor the agent's actions
53 | 
54 | ## 🐛 Known Issues
55 | 
56 | - Sometimes, it doesn't take a screenshot to validate that the input is selected, and types stuff in the wrong place.. Press CMD+C to end the action when this happens, and quit and restart the agent. I'm working on a fix.
57 | 
58 | ## 🤝 Contributing
59 | 
60 | Issues and PRs are most welcome! Made this is in a day so don't really have a roadmap in mind. Hmu on Twitter @ishanxnagpal if you're got interesting ideas you wanna share. 
61 | 
62 | ## 📄 License
63 | 
64 | [Apache License 2.0](LICENSE)
65 | 
66 | ---
67 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | PyQt6
 2 | pyautogui
 3 | requests
 4 | anthropic
 5 | python-dotenv
 6 | pillow
 7 | numpy
 8 | qtawesome
 9 | SpeechRecognition
10 | pyttsx3
11 | keyboard
12 | pyaudio
13 | 


--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
1 | from src.main import main
2 | 
3 | if __name__ == "__main__":
4 |     main()


--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suitedaces/computer-agent/5d360ac21d70d8e96eab2c8a386398606432de1e/src/__init__.py


--------------------------------------------------------------------------------
/src/anthropic.py:
--------------------------------------------------------------------------------
 1 | import anthropic
 2 | from anthropic.types.beta import BetaMessage, BetaTextBlock, BetaToolUseBlock
 3 | import os
 4 | from dotenv import load_dotenv
 5 | import logging
 6 | from .prompt_manager import PromptManager
 7 | 
 8 | class AnthropicClient:
 9 |     def __init__(self):
10 |         load_dotenv()  # Load environment variables from .env file
11 |         self.api_key = os.getenv("ANTHROPIC_API_KEY")
12 |         if not self.api_key:
13 |             raise ValueError("ANTHROPIC_API_KEY not found in environment variables")
14 |         
15 |         try:
16 |             self.client = anthropic.Anthropic(api_key=self.api_key)
17 |             self.prompt_manager = PromptManager()
18 |         except Exception as e:
19 |             raise ValueError(f"Failed to initialize Anthropic client: {str(e)}")
20 |         
21 |     def get_next_action(self, run_history) -> BetaMessage:
22 |         try:
23 |             # Convert BetaMessage objects to dictionaries
24 |             cleaned_history = []
25 |             for message in run_history:
26 |                 if isinstance(message, BetaMessage):
27 |                     cleaned_history.append({
28 |                         "role": message.role,
29 |                         "content": message.content
30 |                     })
31 |                 elif isinstance(message, dict):
32 |                     cleaned_history.append(message)
33 |                 else:
34 |                     raise ValueError(f"Unexpected message type: {type(message)}")
35 |             
36 |             response = self.client.beta.messages.create(
37 |                 model="claude-3-5-sonnet-20241022",
38 |                 max_tokens=1024,
39 |                 tools=[
40 |                     {
41 |                         "type": "computer_20241022",
42 |                         "name": "computer",
43 |                         "display_width_px": 1280,
44 |                         "display_height_px": 800,
45 |                         "display_number": 1,
46 |                     },
47 |                     {
48 |                         "name": "finish_run",
49 |                         "description": "Call this function when you have achieved the goal of the task.",
50 |                         "input_schema": {
51 |                             "type": "object",
52 |                             "properties": {
53 |                                 "success": {
54 |                                     "type": "boolean",
55 |                                     "description": "Whether the task was successful"
56 |                                 },
57 |                                 "error": {
58 |                                     "type": "string",
59 |                                     "description": "The error message if the task was not successful"
60 |                                 }
61 |                             },
62 |                             "required": ["success"]
63 |                         }
64 |                     }
65 |                 ],
66 |                 messages=cleaned_history,
67 |                 system=self.prompt_manager.get_current_prompt(),
68 |                 betas=["computer-use-2024-10-22"],
69 |             )
70 | 
71 |             # If Claude responds with just text (no tool use), create a finish_run action with the message
72 |             has_tool_use = any(isinstance(content, BetaToolUseBlock) for content in response.content)
73 |             if not has_tool_use:
74 |                 text_content = next((content.text for content in response.content if isinstance(content, BetaTextBlock)), "")
75 |                 # Create a synthetic tool use block for finish_run
76 |                 response.content.append(BetaToolUseBlock(
77 |                     id="synthetic_finish",
78 |                     type="tool_use",
79 |                     name="finish_run",
80 |                     input={
81 |                         "success": False,
82 |                         "error": f"Claude needs more information: {text_content}"
83 |                     }
84 |                 ))
85 |                 logging.info(f"Added synthetic finish_run for text-only response: {text_content}")
86 | 
87 |             return response
88 |             
89 |         except anthropic.APIError as e:
90 |             raise Exception(f"API Error: {str(e)}")
91 |         except Exception as e:
92 |             raise Exception(f"Unexpected error: {str(e)}")
93 | 


--------------------------------------------------------------------------------
/src/computer.py:
--------------------------------------------------------------------------------
  1 | import pyautogui
  2 | from PIL import Image
  3 | import io
  4 | import base64
  5 | import time
  6 | 
  7 | class ComputerControl:
  8 |     def __init__(self):
  9 |         self.screen_width, self.screen_height = pyautogui.size()
 10 |         pyautogui.PAUSE = 0.5  # Add a small delay between actions for stability
 11 |         self.last_click_position = None
 12 |         
 13 |     def perform_action(self, action):
 14 |         action_type = action['type']
 15 |         
 16 |         # Take a screenshot before the action
 17 |         before_screenshot = self.take_screenshot()
 18 |         
 19 |         try:
 20 |             if action_type == 'mouse_move':
 21 |                 x, y = self.map_from_ai_space(action['x'], action['y'])
 22 |                 pyautogui.moveTo(x, y)
 23 |                 time.sleep(0.2)  # Wait for move to complete
 24 |                 
 25 |             elif action_type == 'left_click':
 26 |                 pyautogui.click()
 27 |                 time.sleep(0.2)  # Wait for click to register
 28 |                 self.last_click_position = pyautogui.position()
 29 |                 
 30 |             elif action_type == 'right_click':
 31 |                 pyautogui.rightClick()
 32 |                 time.sleep(0.2)
 33 |                 
 34 |             elif action_type == 'middle_click':
 35 |                 pyautogui.middleClick()
 36 |                 time.sleep(0.2)
 37 |                 
 38 |             elif action_type == 'double_click':
 39 |                 pyautogui.doubleClick()
 40 |                 time.sleep(0.2)
 41 |                 self.last_click_position = pyautogui.position()
 42 |                 
 43 |             elif action_type == 'left_click_drag':
 44 |                 start_x, start_y = pyautogui.position()
 45 |                 end_x, end_y = self.map_from_ai_space(action['x'], action['y'])
 46 |                 pyautogui.dragTo(end_x, end_y, button='left', duration=0.5)
 47 |                 time.sleep(0.2)
 48 |                 
 49 |             elif action_type == 'type':
 50 |                 # If we have a last click position, ensure we're still there
 51 |                 if self.last_click_position:
 52 |                     current_pos = pyautogui.position()
 53 |                     if current_pos != self.last_click_position:
 54 |                         pyautogui.click(self.last_click_position)
 55 |                         time.sleep(0.2)
 56 |                 
 57 |                 pyautogui.write(action['text'], interval=0.1)
 58 |                 time.sleep(0.2)
 59 |                 
 60 |             elif action_type == 'key':
 61 |                 pyautogui.press(action['text'])
 62 |                 time.sleep(0.2)
 63 |                 
 64 |             elif action_type == 'screenshot':
 65 |                 return self.take_screenshot()
 66 |                 
 67 |             elif action_type == 'cursor_position':
 68 |                 x, y = pyautogui.position()
 69 |                 return self.map_to_ai_space(x, y)
 70 |                 
 71 |             else:
 72 |                 raise ValueError(f"Unsupported action: {action_type}")
 73 |             
 74 |             # Take a screenshot after the action
 75 |             after_screenshot = self.take_screenshot()
 76 |             return after_screenshot
 77 |             
 78 |         except Exception as e:
 79 |             raise Exception(f"Action failed: {action_type} - {str(e)}")
 80 |         
 81 |     def take_screenshot(self):
 82 |         screenshot = pyautogui.screenshot()
 83 |         ai_screenshot = self.resize_for_ai(screenshot)
 84 |         buffered = io.BytesIO()
 85 |         ai_screenshot.save(buffered, format="PNG")
 86 |         return base64.b64encode(buffered.getvalue()).decode('utf-8')
 87 |         
 88 |     def map_from_ai_space(self, x, y):
 89 |         ai_width, ai_height = 1280, 800
 90 |         return (x * self.screen_width / ai_width, y * self.screen_height / ai_height)
 91 |         
 92 |     def map_to_ai_space(self, x, y):
 93 |         ai_width, ai_height = 1280, 800
 94 |         return (x * ai_width / self.screen_width, y * ai_height / self.screen_height)
 95 |         
 96 |     def resize_for_ai(self, screenshot):
 97 |         return screenshot.resize((1280, 800), Image.LANCZOS)
 98 | 
 99 |     def cleanup(self):
100 |         """Clean up any resources or running processes"""
101 |         # Add cleanup code here if needed
102 |         pass
103 | 


--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import logging
 3 | from PyQt6.QtWidgets import QApplication
 4 | from .window import MainWindow
 5 | from .store import Store
 6 | from .anthropic import AnthropicClient
 7 | 
 8 | logging.basicConfig(filename='agent.log', level=logging.DEBUG, 
 9 |                     format='%(asctime)s - %(levelname)s - %(message)s')
10 | 
11 | def main():
12 |     app = QApplication(sys.argv)
13 |     
14 |     app.setQuitOnLastWindowClosed(False)  # Prevent app from quitting when window is closed
15 |     
16 |     store = Store()
17 |     anthropic_client = AnthropicClient()
18 |     
19 |     window = MainWindow(store, anthropic_client)
20 |     window.show()  # Just show normally, no maximize
21 |     
22 |     sys.exit(app.exec())
23 | 
24 | if __name__ == "__main__":
25 |     main()
26 | 


--------------------------------------------------------------------------------
/src/prompt_manager.py:
--------------------------------------------------------------------------------
 1 | import json
 2 | import os
 3 | from pathlib import Path
 4 | 
 5 | DEFAULT_SYSTEM_PROMPT = """The user will ask you to perform a task and you should use their computer to do so. After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: 'I have evaluated step X...' If not correct, try again. Only when you confirm a step was executed correctly should you move on to the next one. Note that you have to click into the browser address bar before typing a URL. You should always call a tool! Always return a tool call. Remember call the finish_run tool when you have achieved the goal of the task. Do not explain you have finished the task, just call the tool. Use keyboard shortcuts to navigate whenever possible. Please remember to take a screenshot after EVERY step to confirm you have achieved the right outcome."""
 6 | 
 7 | class PromptManager:
 8 |     def __init__(self):
 9 |         self.config_dir = Path.home() / ".grunty"
10 |         self.config_file = self.config_dir / "prompts.json"
11 |         self.current_prompt = self.load_prompt()
12 | 
13 |     def load_prompt(self) -> str:
14 |         """Load the system prompt from the config file or return the default"""
15 |         try:
16 |             if not self.config_dir.exists():
17 |                 self.config_dir.mkdir(parents=True)
18 |             
19 |             if not self.config_file.exists():
20 |                 self.save_prompt(DEFAULT_SYSTEM_PROMPT)
21 |                 return DEFAULT_SYSTEM_PROMPT
22 | 
23 |             with open(self.config_file, 'r') as f:
24 |                 data = json.load(f)
25 |                 return data.get('system_prompt', DEFAULT_SYSTEM_PROMPT)
26 |         except Exception as e:
27 |             print(f"Error loading prompt: {e}")
28 |             return DEFAULT_SYSTEM_PROMPT
29 | 
30 |     def save_prompt(self, prompt: str) -> bool:
31 |         """Save the system prompt to the config file"""
32 |         try:
33 |             with open(self.config_file, 'w') as f:
34 |                 json.dump({'system_prompt': prompt}, f, indent=2)
35 |             self.current_prompt = prompt
36 |             return True
37 |         except Exception as e:
38 |             print(f"Error saving prompt: {e}")
39 |             return False
40 | 
41 |     def reset_to_default(self) -> bool:
42 |         """Reset the system prompt to the default value"""
43 |         return self.save_prompt(DEFAULT_SYSTEM_PROMPT)
44 | 
45 |     def get_current_prompt(self) -> str:
46 |         """Get the current system prompt"""
47 |         return self.current_prompt
48 | 


--------------------------------------------------------------------------------
/src/store.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | from .anthropic import AnthropicClient
  3 | from .computer import ComputerControl
  4 | from anthropic.types.beta import BetaMessage, BetaToolUseBlock, BetaTextBlock
  5 | import json
  6 | 
  7 | 
  8 | logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
  9 | logger = logging.getLogger(__name__)
 10 | 
 11 | class Store:
 12 |     def __init__(self):
 13 |         self.instructions = ""
 14 |         self.fully_auto = True
 15 |         self.running = False
 16 |         self.error = None
 17 |         self.run_history = []
 18 |         self.last_tool_use_id = None
 19 |         
 20 |         try:
 21 |             self.anthropic_client = AnthropicClient()
 22 |         except ValueError as e:
 23 |             self.error = str(e)
 24 |             logger.error(f"AnthropicClient initialization error: {self.error}")
 25 |         self.computer_control = ComputerControl()
 26 |         
 27 |     def set_instructions(self, instructions):
 28 |         self.instructions = instructions
 29 |         logger.info(f"Instructions set: {instructions}")
 30 |         
 31 |     def run_agent(self, update_callback):
 32 |         if self.error:
 33 |             update_callback(f"Error: {self.error}")
 34 |             logger.error(f"Agent run failed due to initialization error: {self.error}")
 35 |             return
 36 | 
 37 |         self.running = True
 38 |         self.error = None
 39 |         self.run_history = [{"role": "user", "content": self.instructions}]
 40 |         logger.info("Starting agent run")
 41 |         
 42 |         while self.running:
 43 |             try:
 44 |                 message = self.anthropic_client.get_next_action(self.run_history)
 45 |                 self.run_history.append(message)
 46 |                 logger.debug(f"Received message from Anthropic: {message}")
 47 |                 
 48 |                 # Display assistant's message in the chat
 49 |                 self.display_assistant_message(message, update_callback)
 50 |                 
 51 |                 action = self.extract_action(message)
 52 |                 logger.info(f"Extracted action: {action}")
 53 |                 
 54 |                 if action['type'] == 'error':
 55 |                     self.error = action['message']
 56 |                     update_callback(f"Error: {self.error}")
 57 |                     logger.error(f"Action extraction error: {self.error}")
 58 |                     self.running = False
 59 |                     break
 60 |                 elif action['type'] == 'finish':
 61 |                     update_callback("Task completed successfully.")
 62 |                     logger.info("Task completed successfully")
 63 |                     self.running = False
 64 |                     break
 65 |                 
 66 |                 try:
 67 |                     # Perform the action and get the screenshot
 68 |                     screenshot = self.computer_control.perform_action(action)
 69 |                     
 70 |                     if screenshot:  # Only add screenshot if one was returned
 71 |                         self.run_history.append({
 72 |                             "role": "user",
 73 |                             "content": [
 74 |                                 {
 75 |                                     "type": "tool_result",
 76 |                                     "tool_use_id": self.last_tool_use_id,
 77 |                                     "content": [
 78 |                                         {"type": "text", "text": "Here is a screenshot after the action was executed"},
 79 |                                         {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}}
 80 |                                     ]
 81 |                                 }
 82 |                             ]
 83 |                         })
 84 |                         logger.debug("Screenshot added to run history")
 85 |                     
 86 |                 except Exception as action_error:
 87 |                     error_msg = f"Action failed: {str(action_error)}"
 88 |                     update_callback(f"Error: {error_msg}")
 89 |                     logger.error(error_msg)
 90 |                     # Don't stop running, let the AI handle the error
 91 |                     self.run_history.append({
 92 |                         "role": "user",
 93 |                         "content": [{"type": "text", "text": error_msg}]
 94 |                     })
 95 |                 
 96 |             except Exception as e:
 97 |                 self.error = str(e)
 98 |                 update_callback(f"Error: {self.error}")
 99 |                 logger.exception(f"Unexpected error during agent run: {self.error}")
100 |                 self.running = False
101 |                 break
102 |         
103 |     def stop_run(self):
104 |         """Stop the current agent run and clean up resources"""
105 |         self.running = False
106 |         if hasattr(self, 'computer_control'):
107 |             self.computer_control.cleanup()
108 |         logger.info("Agent run stopped")
109 |         # Add a message to the run history to indicate stopping
110 |         self.run_history.append({
111 |             "role": "user",
112 |             "content": [{"type": "text", "text": "Agent run stopped by user."}]
113 |         })
114 |         
115 |     def extract_action(self, message):
116 |         logger.debug(f"Extracting action from message: {message}")
117 |         if not isinstance(message, BetaMessage):
118 |             logger.error(f"Unexpected message type: {type(message)}")
119 |             return {'type': 'error', 'message': 'Unexpected message type'}
120 |         
121 |         for item in message.content:
122 |             if isinstance(item, BetaToolUseBlock):
123 |                 tool_use = item
124 |                 logger.debug(f"Found tool use: {tool_use}")
125 |                 self.last_tool_use_id = tool_use.id
126 |                 if tool_use.name == 'finish_run':
127 |                     return {'type': 'finish'}
128 |                 
129 |                 if tool_use.name != 'computer':
130 |                     logger.error(f"Unexpected tool: {tool_use.name}")
131 |                     return {'type': 'error', 'message': f"Unexpected tool: {tool_use.name}"}
132 |                 
133 |                 input_data = tool_use.input
134 |                 action_type = input_data.get('action')
135 |                 
136 |                 if action_type in ['mouse_move', 'left_click_drag']:
137 |                     if 'coordinate' not in input_data or len(input_data['coordinate']) != 2:
138 |                         logger.error(f"Invalid coordinate for mouse action: {input_data}")
139 |                         return {'type': 'error', 'message': 'Invalid coordinate for mouse action'}
140 |                     return {
141 |                         'type': action_type,
142 |                         'x': input_data['coordinate'][0],
143 |                         'y': input_data['coordinate'][1]
144 |                     }
145 |                 elif action_type in ['left_click', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position']:
146 |                     return {'type': action_type}
147 |                 elif action_type in ['type', 'key']:
148 |                     if 'text' not in input_data:
149 |                         logger.error(f"Missing text for keyboard action: {input_data}")
150 |                         return {'type': 'error', 'message': 'Missing text for keyboard action'}
151 |                     return {'type': action_type, 'text': input_data['text']}
152 |                 else:
153 |                     logger.error(f"Unsupported action: {action_type}")
154 |                     return {'type': 'error', 'message': f"Unsupported action: {action_type}"}
155 |         
156 |         logger.error("No tool use found in message")
157 |         return {'type': 'error', 'message': 'No tool use found in message'}
158 | 
159 |     def display_assistant_message(self, message, update_callback):
160 |         if isinstance(message, BetaMessage):
161 |             for item in message.content:
162 |                 if isinstance(item, BetaTextBlock):
163 |                     # Clean and format the text
164 |                     text = item.text.strip()
165 |                     if text:  # Only send non-empty messages
166 |                         update_callback(f"Assistant: {text}")
167 |                 elif isinstance(item, BetaToolUseBlock):
168 |                     # Format tool use in a more readable way
169 |                     tool_name = item.name
170 |                     tool_input = item.input
171 |                     
172 |                     # Convert tool use to a more readable format
173 |                     if tool_name == 'computer':
174 |                         action = {
175 |                             'type': tool_input.get('action'),
176 |                             'x': tool_input.get('coordinate', [0, 0])[0] if 'coordinate' in tool_input else None,
177 |                             'y': tool_input.get('coordinate', [0, 0])[1] if 'coordinate' in tool_input else None,
178 |                             'text': tool_input.get('text')
179 |                         }
180 |                         update_callback(f"Performed action: {json.dumps(action)}")
181 |                     elif tool_name == 'finish_run':
182 |                         update_callback("Assistant: Task completed! ")
183 |                     else:
184 |                         update_callback(f"Assistant action: {tool_name} - {json.dumps(tool_input)}")
185 | 
186 |     def cleanup(self):
187 |         if hasattr(self, 'computer_control'):
188 |             self.computer_control.cleanup()
189 | 


--------------------------------------------------------------------------------
/src/voice_control.py:
--------------------------------------------------------------------------------
  1 | import speech_recognition as sr
  2 | import pyttsx3
  3 | import keyboard
  4 | import threading
  5 | import time
  6 | from PyQt6.QtCore import QObject, pyqtSignal
  7 | 
  8 | class VoiceController(QObject):
  9 |     voice_input_signal = pyqtSignal(str)  # Signal to emit when voice input is received
 10 |     status_signal = pyqtSignal(str)  # Signal to emit status updates
 11 |     
 12 |     def __init__(self):
 13 |         super().__init__()
 14 |         self.recognizer = sr.Recognizer()
 15 |         self.engine = pyttsx3.init()
 16 |         self.is_listening = False
 17 |         self.is_processing = False  # Flag to track if we're processing a command
 18 |         self.listening_thread = None
 19 |         self.wake_word = "hey grunty"  # Wake word to activate voice control
 20 |         
 21 |         # Configure text-to-speech
 22 |         self.engine.setProperty('rate', 150)  # Speed of speech
 23 |         voices = self.engine.getProperty('voices')
 24 |         self.engine.setProperty('voice', voices[1].id)  # Use female voice
 25 |         
 26 |     def speak(self, text):
 27 |         """Text-to-speech output with enhanced status updates"""
 28 |         if not text:
 29 |             return
 30 |             
 31 |         self.status_signal.emit("Initializing speech...")
 32 |         try:
 33 |             # Configure voice settings for this utterance
 34 |             self.engine.setProperty('rate', 150)
 35 |             self.status_signal.emit("Starting to speak...")
 36 |             
 37 |             # Break text into sentences for better status updates
 38 |             sentences = text.split('.')
 39 |             for i, sentence in enumerate(sentences, 1):
 40 |                 if sentence.strip():
 41 |                     self.status_signal.emit(f"Speaking {i}/{len(sentences)}: {sentence.strip()}")
 42 |                     self.engine.say(sentence)
 43 |                     self.engine.runAndWait()
 44 |                     
 45 |             self.status_signal.emit("Finished speaking")
 46 |         except Exception as e:
 47 |             self.status_signal.emit(f"Speech error: {str(e)}")
 48 |         finally:
 49 |             self.status_signal.emit("Ready")
 50 |             
 51 |     def listen_for_command(self):
 52 |         """Listen for voice input with enhanced status updates"""
 53 |         with sr.Microphone() as source:
 54 |             try:
 55 |                 self.status_signal.emit("Adjusting for ambient noise...")
 56 |                 self.recognizer.adjust_for_ambient_noise(source, duration=0.5)
 57 |                 
 58 |                 self.status_signal.emit("Listening for wake word...")
 59 |                 audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
 60 |                 
 61 |                 self.status_signal.emit("Processing audio...")
 62 |                 text = self.recognizer.recognize_google(audio).lower().strip()
 63 |                 self.status_signal.emit(f"Heard: {text}")  # Debug what was heard
 64 |                 
 65 |                 # More flexible wake word detection
 66 |                 if any(text.startswith(word) for word in ["hey grunty", "hey gruny", "hi grunty", "hi gruny"]):
 67 |                     self.status_signal.emit("Wake word detected! Listening for command...")
 68 |                     audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
 69 |                     self.status_signal.emit("Processing command...")
 70 |                     command = self.recognizer.recognize_google(audio).lower()
 71 |                     self.status_signal.emit(f"Command received: {command}")
 72 |                     return command
 73 |                 else:
 74 |                     self.status_signal.emit("Wake word not detected, continuing to listen...")
 75 |                     return None
 76 |                 
 77 |             except sr.WaitTimeoutError:
 78 |                 self.status_signal.emit("Listening timed out")
 79 |             except sr.UnknownValueError:
 80 |                 self.status_signal.emit("Could not understand audio")
 81 |             except sr.RequestError as e:
 82 |                 self.status_signal.emit(f"Speech recognition error: {str(e)}")
 83 |             except Exception as e:
 84 |                 self.status_signal.emit(f"Error: {str(e)}")
 85 |             finally:
 86 |                 self.status_signal.emit("Ready")
 87 |             
 88 |             return None
 89 |             
 90 |     def voice_control_loop(self):
 91 |         """Main loop for voice control"""
 92 |         while self.is_listening:
 93 |             if not self.is_processing:
 94 |                 try:
 95 |                     self.is_processing = True
 96 |                     command = self.listen_for_command()
 97 |                     if command:
 98 |                         self.voice_input_signal.emit(command)
 99 |                 finally:
100 |                     self.is_processing = False
101 |             time.sleep(0.1)  # Small delay to prevent CPU hogging
102 |                 
103 |     def toggle_voice_control(self):
104 |         """Toggle voice control on/off"""
105 |         if not self.is_listening:
106 |             self.is_listening = True
107 |             self.listening_thread = threading.Thread(target=self.voice_control_loop)
108 |             self.listening_thread.daemon = True
109 |             self.listening_thread.start()
110 |             self.status_signal.emit("Voice control activated - Say 'hey Grunty' to start")
111 |             self.speak("Voice control activated")
112 |         else:
113 |             self.is_listening = False
114 |             if self.listening_thread:
115 |                 self.listening_thread.join(timeout=1)
116 |             self.status_signal.emit("Voice control deactivated")
117 |             self.speak("Voice control deactivated")
118 |             
119 |     def finish_processing(self):
120 |         """Call this when command processing is complete"""
121 |         self.is_processing = False
122 |         self.speak("Ready for next command")
123 | 
124 |     def cleanup(self):
125 |         """Clean up voice control resources"""
126 |         # Stop voice control if it's running
127 |         if self.is_listening:
128 |             self.toggle_voice_control()  # This will stop the listening thread
129 |             
130 |         # Stop any pending speech
131 |         if hasattr(self, 'speak_queue'):
132 |             self.speak_queue.put(None)  # Signal speak thread to stop
133 |             if hasattr(self, 'speak_thread'):
134 |                 self.speak_thread.join(timeout=1.0)
135 | 


--------------------------------------------------------------------------------
/src/window.py:
--------------------------------------------------------------------------------
  1 | from PyQt6.QtWidgets import (QMainWindow, QVBoxLayout, QHBoxLayout, QWidget, QTextEdit, 
  2 |                              QPushButton, QLabel, QProgressBar, QSystemTrayIcon, QMenu, QApplication, QDialog, QLineEdit, QMenuBar, QStatusBar)
  3 | from PyQt6.QtCore import Qt, QPoint, pyqtSignal, QThread, QUrl, QSettings
  4 | from PyQt6.QtGui import QFont, QKeySequence, QShortcut, QAction, QTextCursor, QDesktopServices
  5 | from .store import Store
  6 | from .anthropic import AnthropicClient  
  7 | from .voice_control import VoiceController
  8 | from .prompt_manager import PromptManager
  9 | import logging
 10 | import qtawesome as qta
 11 | 
 12 | logger = logging.getLogger(__name__)
 13 | 
 14 | class AgentThread(QThread):
 15 |     update_signal = pyqtSignal(str)
 16 |     finished_signal = pyqtSignal()
 17 | 
 18 |     def __init__(self, store):
 19 |         super().__init__()
 20 |         self.store = store
 21 | 
 22 |     def run(self):
 23 |         self.store.run_agent(self.update_signal.emit)
 24 |         self.finished_signal.emit()
 25 | 
 26 | class SystemPromptDialog(QDialog):
 27 |     def __init__(self, parent=None, prompt_manager=None):
 28 |         super().__init__(parent)
 29 |         self.prompt_manager = prompt_manager
 30 |         self.setWindowTitle("Edit System Prompt")
 31 |         self.setFixedSize(800, 600)
 32 |         
 33 |         layout = QVBoxLayout()
 34 |         
 35 |         # Description
 36 |         desc_label = QLabel("Edit the system prompt that defines the agent's behavior. Be careful with changes as they may affect functionality.")
 37 |         desc_label.setWordWrap(True)
 38 |         desc_label.setStyleSheet("color: #666; margin: 10px 0;")
 39 |         layout.addWidget(desc_label)
 40 |         
 41 |         # Prompt editor
 42 |         self.prompt_editor = QTextEdit()
 43 |         self.prompt_editor.setPlainText(self.prompt_manager.get_current_prompt())
 44 |         self.prompt_editor.setStyleSheet("""
 45 |             QTextEdit {
 46 |                 background-color: #262626;
 47 |                 border: 1px solid #333333;
 48 |                 border-radius: 8px;
 49 |                 color: #ffffff;
 50 |                 padding: 12px;
 51 |                 font-family: Inter;
 52 |                 font-size: 14px;
 53 |             }
 54 |         """)
 55 |         layout.addWidget(self.prompt_editor)
 56 |         
 57 |         # Buttons
 58 |         button_layout = QHBoxLayout()
 59 |         
 60 |         reset_btn = QPushButton("Reset to Default")
 61 |         reset_btn.clicked.connect(self.reset_prompt)
 62 |         reset_btn.setStyleSheet("""
 63 |             QPushButton {
 64 |                 background-color: #666666;
 65 |                 color: white;
 66 |                 border: none;
 67 |                 padding: 10px 20px;
 68 |                 border-radius: 5px;
 69 |             }
 70 |             QPushButton:hover {
 71 |                 background-color: #777777;
 72 |             }
 73 |         """)
 74 |         
 75 |         save_btn = QPushButton("Save Changes")
 76 |         save_btn.clicked.connect(self.save_changes)
 77 |         save_btn.setStyleSheet("""
 78 |             QPushButton {
 79 |                 background-color: #4CAF50;
 80 |                 color: white;
 81 |                 border: none;
 82 |                 padding: 10px 20px;
 83 |                 border-radius: 5px;
 84 |             }
 85 |             QPushButton:hover {
 86 |                 background-color: #45a049;
 87 |             }
 88 |         """)
 89 |         
 90 |         button_layout.addWidget(reset_btn)
 91 |         button_layout.addStretch()
 92 |         button_layout.addWidget(save_btn)
 93 |         
 94 |         layout.addLayout(button_layout)
 95 |         self.setLayout(layout)
 96 |     
 97 |     def reset_prompt(self):
 98 |         if self.prompt_manager.reset_to_default():
 99 |             self.prompt_editor.setPlainText(self.prompt_manager.get_current_prompt())
100 |     
101 |     def save_changes(self):
102 |         new_prompt = self.prompt_editor.toPlainText()
103 |         if self.prompt_manager.save_prompt(new_prompt):
104 |             self.accept()
105 |         else:
106 |             # Show error message
107 |             pass
108 | 
109 | class MainWindow(QMainWindow):
110 |     def __init__(self, store, anthropic_client):
111 |         super().__init__()
112 |         self.store = store
113 |         self.anthropic_client = anthropic_client
114 |         self.prompt_manager = PromptManager()
115 |         
116 |         # Initialize theme settings
117 |         self.settings = QSettings('Grunty', 'Preferences')
118 |         self.dark_mode = self.settings.value('dark_mode', True, type=bool)
119 |         
120 |         # Initialize voice control
121 |         self.voice_controller = VoiceController()
122 |         self.voice_controller.voice_input_signal.connect(self.handle_voice_input)
123 |         self.voice_controller.status_signal.connect(self.update_status)
124 |         
125 |         # Status bar for voice feedback
126 |         self.status_bar = QStatusBar()
127 |         self.setStatusBar(self.status_bar)
128 |         self.status_bar.showMessage("Voice control ready")
129 |         
130 |         # Check if API key is missing
131 |         if self.store.error and "ANTHROPIC_API_KEY not found" in self.store.error:
132 |             self.show_api_key_dialog()
133 |         
134 |         self.setWindowTitle("Grunty 👨💻")
135 |         self.setGeometry(100, 100, 400, 600)
136 |         self.setMinimumSize(400, 500)  # Increased minimum size for better usability
137 |         
138 |         # Set rounded corners and border
139 |         self.setWindowFlags(Qt.WindowType.FramelessWindowHint)
140 |         self.setAttribute(Qt.WidgetAttribute.WA_TranslucentBackground)
141 |         
142 |         self.setup_ui()
143 |         self.setup_tray()
144 |         self.setup_shortcuts()
145 |         
146 |     def show_api_key_dialog(self):
147 |         dialog = QDialog(self)
148 |         dialog.setWindowTitle("API Key Required")
149 |         dialog.setFixedWidth(400)
150 |         
151 |         layout = QVBoxLayout()
152 |         
153 |         # Icon and title
154 |         title_layout = QHBoxLayout()
155 |         icon_label = QLabel()
156 |         icon_label.setPixmap(qta.icon('fa5s.key', color='#4CAF50').pixmap(32, 32))
157 |         title_layout.addWidget(icon_label)
158 |         title_label = QLabel("Anthropic API Key Required")
159 |         title_label.setStyleSheet("font-size: 16px; font-weight: bold; color: #4CAF50;")
160 |         title_layout.addWidget(title_label)
161 |         layout.addLayout(title_layout)
162 |         
163 |         # Description
164 |         desc_label = QLabel("Please enter your Anthropic API key to continue. You can find this in your Anthropic dashboard.")
165 |         desc_label.setWordWrap(True)
166 |         desc_label.setStyleSheet("color: #666; margin: 10px 0;")
167 |         layout.addWidget(desc_label)
168 |         
169 |         # API Key input
170 |         self.api_key_input = QLineEdit()
171 |         self.api_key_input.setPlaceholderText("sk-ant-...")
172 |         self.api_key_input.setStyleSheet("""
173 |             QLineEdit {
174 |                 padding: 10px;
175 |                 border: 2px solid #4CAF50;
176 |                 border-radius: 5px;
177 |                 font-size: 14px;
178 |             }
179 |         """)
180 |         layout.addWidget(self.api_key_input)
181 |         
182 |         # Save button
183 |         save_btn = QPushButton("Save API Key")
184 |         save_btn.setStyleSheet("""
185 |             QPushButton {
186 |                 background-color: #4CAF50;
187 |                 color: white;
188 |                 border: none;
189 |                 padding: 10px;
190 |                 border-radius: 5px;
191 |                 font-size: 14px;
192 |                 font-weight: bold;
193 |             }
194 |             QPushButton:hover {
195 |                 background-color: #45a049;
196 |             }
197 |         """)
198 |         save_btn.clicked.connect(lambda: self.save_api_key(dialog))
199 |         layout.addWidget(save_btn)
200 |         
201 |         dialog.setLayout(layout)
202 |         dialog.exec()
203 | 
204 |     def save_api_key(self, dialog):
205 |         api_key = self.api_key_input.text().strip()
206 |         if not api_key:
207 |             return
208 |             
209 |         # Save to .env file
210 |         with open('.env', 'w') as f:
211 |             f.write(f'ANTHROPIC_API_KEY={api_key}')
212 |             
213 |         # Reinitialize the store and anthropic client
214 |         self.store = Store()
215 |         self.anthropic_client = AnthropicClient()
216 |         dialog.accept()
217 | 
218 |     def setup_ui(self):
219 |         central_widget = QWidget()
220 |         self.setCentralWidget(central_widget)
221 |         
222 |         # Create main layout
223 |         main_layout = QVBoxLayout()
224 |         main_layout.setContentsMargins(15, 15, 15, 15)
225 |         central_widget.setLayout(main_layout)
226 |         
227 |         # Container widget for rounded corners
228 |         self.container = QWidget()  # Make it an instance variable
229 |         self.container.setObjectName("container")
230 |         container_layout = QVBoxLayout()
231 |         container_layout.setSpacing(0)  # Remove spacing between elements
232 |         self.container.setLayout(container_layout)
233 |         
234 |         # Create title bar
235 |         title_bar = QWidget()
236 |         title_bar.setObjectName("titleBar")
237 |         title_bar_layout = QHBoxLayout(title_bar)
238 |         title_bar_layout.setContentsMargins(10, 5, 10, 5)
239 |         
240 |         # Add Grunty title with robot emoji
241 |         title_label = QLabel("Grunty 🤖")
242 |         title_label.setObjectName("titleLabel")
243 |         title_bar_layout.addWidget(title_label)
244 |         
245 |         # Add File Menu
246 |         file_menu = QMenu("File")
247 |         new_task_action = QAction("New Task", self)
248 |         new_task_action.setShortcut("Ctrl+N")
249 |         edit_prompt_action = QAction("Edit System Prompt", self)
250 |         edit_prompt_action.setShortcut("Ctrl+E")
251 |         edit_prompt_action.triggered.connect(self.show_prompt_dialog)
252 |         quit_action = QAction("Quit", self)
253 |         quit_action.setShortcut("Ctrl+Q")
254 |         quit_action.triggered.connect(self.quit_application)
255 |         file_menu.addAction(new_task_action)
256 |         file_menu.addAction(edit_prompt_action)
257 |         file_menu.addSeparator()
258 |         file_menu.addAction(quit_action)
259 |         
260 |         file_button = QPushButton("File")
261 |         file_button.setObjectName("menuButton")
262 |         file_button.clicked.connect(lambda: file_menu.exec(file_button.mapToGlobal(QPoint(0, file_button.height()))))
263 |         title_bar_layout.addWidget(file_button)
264 |         
265 |         # Add spacer to push remaining items to the right
266 |         title_bar_layout.addStretch()
267 |         
268 |         # Theme toggle button
269 |         self.theme_button = QPushButton()
270 |         self.theme_button.setObjectName("titleBarButton")
271 |         self.theme_button.clicked.connect(self.toggle_theme)
272 |         self.update_theme_button()
273 |         title_bar_layout.addWidget(self.theme_button)
274 |         
275 |         # Minimize and close buttons
276 |         minimize_button = QPushButton("−")
277 |         minimize_button.setObjectName("titleBarButton")
278 |         minimize_button.clicked.connect(self.showMinimized)
279 |         title_bar_layout.addWidget(minimize_button)
280 |         
281 |         close_button = QPushButton("×")
282 |         close_button.setObjectName("titleBarButton")
283 |         close_button.clicked.connect(self.close)
284 |         title_bar_layout.addWidget(close_button)
285 |         
286 |         container_layout.addWidget(title_bar)
287 |         
288 |         # Action log with modern styling
289 |         self.action_log = QTextEdit()
290 |         self.action_log.setReadOnly(True)
291 |         self.action_log.setStyleSheet("""
292 |             QTextEdit {
293 |                 background-color: #262626;
294 |                 border: none;
295 |                 border-radius: 0;
296 |                 color: #ffffff;
297 |                 padding: 16px;
298 |                 font-family: Inter;
299 |                 font-size: 13px;
300 |             }
301 |         """)
302 |         container_layout.addWidget(self.action_log, stretch=1)  # Give it flexible space
303 |         
304 |         # Progress bar - Now above input area
305 |         self.progress_bar = QProgressBar()
306 |         self.progress_bar.setRange(0, 0)
307 |         self.progress_bar.setTextVisible(False)
308 |         self.progress_bar.setStyleSheet("""
309 |             QProgressBar {
310 |                 border: none;
311 |                 background-color: #262626;
312 |                 height: 2px;
313 |                 margin: 0;
314 |             }
315 |             QProgressBar::chunk {
316 |                 background-color: #4CAF50;
317 |             }
318 |         """)
319 |         self.progress_bar.hide()
320 |         container_layout.addWidget(self.progress_bar)
321 | 
322 |         # Input section container - Fixed height at bottom
323 |         input_section = QWidget()
324 |         input_section.setObjectName("input_section")
325 |         input_section.setStyleSheet("""
326 |             QWidget {
327 |                 background-color: #1e1e1e;
328 |                 border-top: 1px solid #333333;
329 |             }
330 |         """)
331 |         input_layout = QVBoxLayout()
332 |         input_layout.setContentsMargins(16, 16, 16, 16)
333 |         input_layout.setSpacing(12)
334 |         input_section.setLayout(input_layout)
335 | 
336 |         # Input area with modern styling
337 |         self.input_area = QTextEdit()
338 |         self.input_area.setPlaceholderText("What can I do for you today?")
339 |         self.input_area.setFixedHeight(100)  # Fixed height for input
340 |         self.input_area.setStyleSheet("""
341 |             QTextEdit {
342 |                 background-color: #262626;
343 |                 border: 1px solid #333333;
344 |                 border-radius: 8px;
345 |                 color: #ffffff;
346 |                 padding: 12px;
347 |                 font-family: Inter;
348 |                 font-size: 14px;
349 |                 selection-background-color: #4CAF50;
350 |             }
351 |             QTextEdit:focus {
352 |                 border: 1px solid #4CAF50;
353 |             }
354 |         """)
355 |         # Connect textChanged signal
356 |         self.input_area.textChanged.connect(self.update_run_button)
357 |         input_layout.addWidget(self.input_area)
358 | 
359 |         # Control buttons with modern styling
360 |         control_layout = QHBoxLayout()
361 |         
362 |         self.run_button = QPushButton(qta.icon('fa5s.play', color='white'), "Start")
363 |         self.stop_button = QPushButton(qta.icon('fa5s.stop', color='white'), "Stop")
364 |         
365 |         # Connect button signals
366 |         self.run_button.clicked.connect(self.run_agent)
367 |         self.stop_button.clicked.connect(self.stop_agent)
368 |         
369 |         # Initialize button states
370 |         self.run_button.setEnabled(True)
371 |         self.stop_button.setEnabled(False)
372 |         
373 |         for button in (self.run_button, self.stop_button):
374 |             button.setFixedHeight(40)
375 |             if button == self.run_button:
376 |                 button.setStyleSheet("""
377 |                     QPushButton {
378 |                         background-color: #4CAF50;
379 |                         color: white;
380 |                         border: none;
381 |                         border-radius: 8px;
382 |                         padding: 0 24px;
383 |                         font-family: Inter;
384 |                         font-size: 14px;
385 |                         font-weight: bold;
386 |                     }
387 |                     QPushButton:hover {
388 |                         background-color: #45a049;
389 |                     }
390 |                     QPushButton:disabled {
391 |                         background-color: #333333;
392 |                         color: #666666;
393 |                     }
394 |                 """)
395 |             else:  # Stop button
396 |                 button.setStyleSheet("""
397 |                     QPushButton {
398 |                         background-color: #ff4444;
399 |                         color: white;
400 |                         border: none;
401 |                         border-radius: 8px;
402 |                         padding: 0 24px;
403 |                         font-family: Inter;
404 |                         font-size: 14px;
405 |                         font-weight: bold;
406 |                     }
407 |                     QPushButton:hover {
408 |                         background-color: #ff3333;
409 |                     }
410 |                     QPushButton:disabled {
411 |                         background-color: #333333;
412 |                         color: #666666;
413 |                     }
414 |                 """)
415 |             control_layout.addWidget(button)
416 |         
417 |         # Add voice control button to control layout
418 |         self.voice_button = QPushButton(qta.icon('fa5s.microphone', color='white'), "Voice")
419 |         self.voice_button.setFixedHeight(40)
420 |         self.voice_button.setStyleSheet("""
421 |             QPushButton {
422 |                 background-color: #4CAF50;
423 |                 color: white;
424 |                 border: none;
425 |                 border-radius: 8px;
426 |                 padding: 0 24px;
427 |                 font-family: Inter;
428 |                 font-size: 14px;
429 |                 font-weight: bold;
430 |             }
431 |             QPushButton:hover {
432 |                 background-color: #45a049;
433 |             }
434 |             QPushButton:checked {
435 |                 background-color: #ff4444;
436 |             }
437 |         """)
438 |         self.voice_button.setCheckable(True)
439 |         self.voice_button.clicked.connect(self.toggle_voice_control)
440 |         control_layout.addWidget(self.voice_button)
441 |         
442 |         input_layout.addLayout(control_layout)
443 | 
444 |         # Add input section to main container
445 |         container_layout.addWidget(input_section)
446 | 
447 |         # Add the container to the main layout
448 |         main_layout.addWidget(self.container)
449 |         
450 |         # Apply theme after all widgets are set up
451 |         self.apply_theme()
452 |         
453 |     def update_theme_button(self):
454 |         if self.dark_mode:
455 |             self.theme_button.setIcon(qta.icon('fa5s.sun', color='white'))
456 |             self.theme_button.setToolTip("Switch to Light Mode")
457 |         else:
458 |             self.theme_button.setIcon(qta.icon('fa5s.moon', color='black'))
459 |             self.theme_button.setToolTip("Switch to Dark Mode")
460 | 
461 |     def toggle_theme(self):
462 |         self.dark_mode = not self.dark_mode
463 |         self.settings.setValue('dark_mode', self.dark_mode)
464 |         self.update_theme_button()
465 |         self.apply_theme()
466 | 
467 |     def apply_theme(self):
468 |         # Apply styles based on theme
469 |         colors = {
470 |             'bg': '#1a1a1a' if self.dark_mode else '#ffffff',
471 |             'text': '#ffffff' if self.dark_mode else '#000000',
472 |             'button_bg': '#333333' if self.dark_mode else '#f0f0f0',
473 |             'button_text': '#ffffff' if self.dark_mode else '#000000',
474 |             'button_hover': '#4CAF50' if self.dark_mode else '#e0e0e0',
475 |             'border': '#333333' if self.dark_mode else '#e0e0e0'
476 |         }
477 | 
478 |         # Container style
479 |         container_style = f"""
480 |             QWidget#container {{
481 |                 background-color: {colors['bg']};
482 |                 border-radius: 12px;
483 |                 border: 1px solid {colors['border']};
484 |             }}
485 |         """
486 |         self.container.setStyleSheet(container_style)  # Use instance variable
487 | 
488 |         # Update title label
489 |         self.findChild(QLabel, "titleLabel").setStyleSheet(f"color: {colors['text']}; padding: 5px;")
490 | 
491 |         # Update action log
492 |         self.action_log.setStyleSheet(f"""
493 |             QTextEdit {{
494 |                 background-color: {colors['bg']};
495 |                 border: none;
496 |                 border-radius: 0;
497 |                 color: {colors['text']};
498 |                 padding: 16px;
499 |                 font-family: Inter;
500 |                 font-size: 13px;
501 |             }}
502 |         """)
503 | 
504 |         # Update input area
505 |         self.input_area.setStyleSheet(f"""
506 |             QTextEdit {{
507 |                 background-color: {colors['bg']};
508 |                 border: 1px solid {colors['border']};
509 |                 border-radius: 8px;
510 |                 color: {colors['text']};
511 |                 padding: 12px;
512 |                 font-family: Inter;
513 |                 font-size: 14px;
514 |                 selection-background-color: {colors['button_hover']};
515 |             }}
516 |             QTextEdit:focus {{
517 |                 border: 1px solid {colors['button_hover']};
518 |             }}
519 |         """)
520 | 
521 |         # Update progress bar
522 |         self.progress_bar.setStyleSheet(f"""
523 |             QProgressBar {{
524 |                 border: none;
525 |                 background-color: {colors['bg']};
526 |                 height: 2px;
527 |                 margin: 0;
528 |             }}
529 |             QProgressBar::chunk {{
530 |                 background-color: {colors['button_hover']};
531 |             }}
532 |         """)
533 | 
534 |         # Update input section
535 |         input_section_style = f"""
536 |             QWidget {{
537 |                 background-color: {colors['button_bg']};
538 |                 border-top: 1px solid {colors['border']};
539 |             }}
540 |         """
541 |         self.findChild(QWidget, "input_section").setStyleSheet(input_section_style)
542 | 
543 |         # Update window controls style
544 |         window_control_style = f"""
545 |             QPushButton {{
546 |                 color: {colors['button_text']};
547 |                 background-color: transparent;
548 |                 border-radius: 8px;
549 |                 padding: 4px 12px;
550 |                 font-weight: bold;
551 |             }}
552 |             QPushButton:hover {{
553 |                 background-color: {colors['button_hover']};
554 |             }}
555 |         """
556 | 
557 |         # Apply to all window control buttons
558 |         for button in [self.theme_button, 
559 |                       self.findChild(QPushButton, "menuButton"),
560 |                       self.findChild(QPushButton, "titleBarButton")]:
561 |             if button:
562 |                 button.setStyleSheet(window_control_style)
563 | 
564 |         # Update theme button icon
565 |         if self.dark_mode:
566 |             self.theme_button.setIcon(qta.icon('fa5s.sun', color=colors['button_text']))
567 |         else:
568 |             self.theme_button.setIcon(qta.icon('fa5s.moon', color=colors['button_text']))
569 | 
570 |         # Update tray menu style if needed
571 |         if hasattr(self, 'tray_icon') and self.tray_icon.contextMenu():
572 |             self.tray_icon.contextMenu().setStyleSheet(f"""
573 |                 QMenu {{
574 |                     background-color: {colors['bg']};
575 |                     color: {colors['text']};
576 |                     border: 1px solid {colors['border']};
577 |                     border-radius: 6px;
578 |                     padding: 5px;
579 |                 }}
580 |                 QMenu::item {{
581 |                     padding: 8px 25px 8px 8px;
582 |                     border-radius: 4px;
583 |                 }}
584 |                 QMenu::item:selected {{
585 |                     background-color: {colors['button_hover']};
586 |                     color: white;
587 |                 }}
588 |                 QMenu::separator {{
589 |                     height: 1px;
590 |                     background: {colors['border']};
591 |                     margin: 5px 0px;
592 |                 }}
593 |             """)
594 |         
595 |     def update_run_button(self):
596 |         self.run_button.setEnabled(bool(self.input_area.toPlainText().strip()))
597 |         
598 |     def setup_tray(self):
599 |         self.tray_icon = QSystemTrayIcon(self)
600 |         # Make the icon larger and more visible
601 |         icon = qta.icon('fa5s.robot', scale_factor=1.5, color='white')
602 |         self.tray_icon.setIcon(icon)
603 |         
604 |         # Create the tray menu
605 |         tray_menu = QMenu()
606 |         
607 |         # Add a title item (non-clickable)
608 |         title_action = tray_menu.addAction("Grunty 👨🏽‍💻")
609 |         title_action.setEnabled(False)
610 |         tray_menu.addSeparator()
611 |         
612 |         # Add "New Task" option with icon
613 |         new_task = tray_menu.addAction(qta.icon('fa5s.plus', color='white'), "New Task")
614 |         new_task.triggered.connect(self.show)
615 |         
616 |         # Add "Show/Hide" toggle with icon
617 |         toggle_action = tray_menu.addAction(qta.icon('fa5s.eye', color='white'), "Show/Hide")
618 |         toggle_action.triggered.connect(self.toggle_window)
619 |         
620 |         tray_menu.addSeparator()
621 |         
622 |         # Add Quit option with icon
623 |         quit_action = tray_menu.addAction(qta.icon('fa5s.power-off', color='white'), "Quit")
624 |         quit_action.triggered.connect(self.quit_application)
625 |         
626 |         # Style the menu for dark mode
627 |         tray_menu.setStyleSheet("""
628 |             QMenu {
629 |                 background-color: #333333;
630 |                 color: white;
631 |                 border: 1px solid #444444;
632 |                 border-radius: 6px;
633 |                 padding: 5px;
634 |             }
635 |             QMenu::item {
636 |                 padding: 8px 25px 8px 8px;
637 |                 border-radius: 4px;
638 |             }
639 |             QMenu::item:selected {
640 |                 background-color: #4CAF50;
641 |             }
642 |             QMenu::separator {
643 |                 height: 1px;
644 |                 background: #444444;
645 |                 margin: 5px 0px;
646 |             }
647 |         """)
648 |         
649 |         self.tray_icon.setContextMenu(tray_menu)
650 |         self.tray_icon.show()
651 |         
652 |         # Show a notification when the app starts
653 |         self.tray_icon.showMessage(
654 |             "Grunty is running",
655 |             "Click the robot icon in the menu bar to get started!",
656 |             QSystemTrayIcon.MessageIcon.Information,
657 |             3000
658 |         )
659 |         
660 |         # Connect double-click to toggle window
661 |         self.tray_icon.activated.connect(self.tray_icon_activated)
662 | 
663 |     def tray_icon_activated(self, reason):
664 |         if reason == QSystemTrayIcon.ActivationReason.DoubleClick:
665 |             self.toggle_window()
666 | 
667 |     def toggle_window(self):
668 |         if self.isVisible():
669 |             self.hide()
670 |         else:
671 |             self.show()
672 |             self.raise_()
673 |             self.activateWindow()
674 | 
675 |     def run_agent(self):
676 |         instructions = self.input_area.toPlainText()
677 |         if not instructions:
678 |             self.update_log("Please enter instructions before running the agent.")
679 |             return
680 |         
681 |         self.store.set_instructions(instructions)
682 |         self.run_button.setEnabled(False)
683 |         self.stop_button.setEnabled(True)
684 |         self.progress_bar.show()
685 |         self.action_log.clear()
686 |         self.input_area.clear()  # Clear the input area after starting the agent
687 |         
688 |         self.agent_thread = AgentThread(self.store)
689 |         self.agent_thread.update_signal.connect(self.update_log)
690 |         self.agent_thread.finished_signal.connect(self.agent_finished)
691 |         self.agent_thread.start()
692 |         
693 |     def stop_agent(self):
694 |         self.store.stop_run()
695 |         self.stop_button.setEnabled(False)
696 |         
697 |     def agent_finished(self):
698 |         self.run_button.setEnabled(True)
699 |         self.stop_button.setEnabled(False)
700 |         self.progress_bar.hide()
701 |         
702 |         # Yellow completion message with sparkle emoji
703 |         completion_message = '''
704 |             <div style="margin: 6px 0;">
705 |                 <span style="
706 |                     display: inline-flex;
707 |                     align-items: center;
708 |                     background-color: rgba(45, 45, 45, 0.95);
709 |                     border: 1px solid rgba(255, 255, 255, 0.1);
710 |                     border-radius: 100px;
711 |                     padding: 4px 12px;
712 |                     color: #FFD700;
713 |                     font-family: Inter, -apple-system, system-ui, sans-serif;
714 |                     font-size: 13px;
715 |                     line-height: 1.4;
716 |                     white-space: nowrap;
717 |                 ">✨ Agent run completed</span>
718 |             </div>
719 |         '''
720 |         self.action_log.append(completion_message)
721 |         
722 |         # Notify voice controller that processing is complete
723 |         if hasattr(self, 'voice_controller'):
724 |             self.voice_controller.finish_processing()
725 |         
726 |         
727 |     def update_log(self, message):
728 |         if message.startswith("Performed action:"):
729 |             action_text = message.replace("Performed action:", "").strip()
730 |             
731 |             # Pill-shaped button style with green text
732 |             button_style = '''
733 |                 <div style="margin: 6px 0;">
734 |                     <span style="
735 |                         display: inline-flex;
736 |                         align-items: center;
737 |                         background-color: rgba(45, 45, 45, 0.95);
738 |                         border: 1px solid rgba(255, 255, 255, 0.1);
739 |                         border-radius: 100px;
740 |                         padding: 4px 12px;
741 |                         color: #4CAF50;
742 |                         font-family: Inter, -apple-system, system-ui, sans-serif;
743 |                         font-size: 13px;
744 |                         line-height: 1.4;
745 |                         white-space: nowrap;
746 |                     ">{}</span>
747 |                 </div>
748 |             '''
749 |             
750 |             try:
751 |                 import json
752 |                 action_data = json.loads(action_text)
753 |                 action_type = action_data.get('type', '').lower()
754 |                 
755 |                 if action_type == "type":
756 |                     text = action_data.get('text', '')
757 |                     msg = f'⌨️ <span style="margin: 0 4px; color: #4CAF50;">Typed</span> <span style="color: #4CAF50">"{text}"</span>'
758 |                     self.action_log.append(button_style.format(msg))
759 |                     
760 |                 elif action_type == "key":
761 |                     key = action_data.get('text', '')
762 |                     msg = f'⌨️ <span style="margin: 0 4px; color: #4CAF50;">Pressed</span> <span style="color: #4CAF50">{key}</span>'
763 |                     self.action_log.append(button_style.format(msg))
764 |                     
765 |                 elif action_type == "mouse_move":
766 |                     x = action_data.get('x', 0)
767 |                     y = action_data.get('y', 0)
768 |                     msg = f'🖱️ <span style="margin: 0 4px; color: #4CAF50;">Moved to</span> <span style="color: #4CAF50">({x}, {y})</span>'
769 |                     self.action_log.append(button_style.format(msg))
770 |                     
771 |                 elif action_type == "screenshot":
772 |                     msg = '📸 <span style="margin: 0 4px; color: #4CAF50;">Captured Screenshot</span>'
773 |                     self.action_log.append(button_style.format(msg))
774 |                     
775 |                 elif "click" in action_type:
776 |                     x = action_data.get('x', 0)
777 |                     y = action_data.get('y', 0)
778 |                     click_map = {
779 |                         "left_click": "Left Click",
780 |                         "right_click": "Right Click",
781 |                         "middle_click": "Middle Click",
782 |                         "double_click": "Double Click"
783 |                     }
784 |                     click_type = click_map.get(action_type, "Click")
785 |                     msg = f'👆 <span style="margin: 0 4px; color: #4CAF50;">{click_type}</span> <span style="color: #4CAF50">({x}, {y})</span>'
786 |                     self.action_log.append(button_style.format(msg))
787 |                     
788 |             except json.JSONDecodeError:
789 |                 self.action_log.append(button_style.format(action_text))
790 | 
791 |         # Clean assistant message style without green background
792 |         elif message.startswith("Assistant:"):
793 |             message_style = '''
794 |                 <div style="
795 |                     border-left: 2px solid #666;
796 |                     padding: 8px 16px;
797 |                     margin: 8px 0;
798 |                     font-family: Inter, -apple-system, system-ui, sans-serif;
799 |                     font-size: 13px;
800 |                     line-height: 1.5;
801 |                     color: #e0e0e0;
802 |                 ">{}</div>
803 |             '''
804 |             clean_message = message.replace("Assistant:", "").strip()
805 |             self.action_log.append(message_style.format(f'💬 {clean_message}'))
806 | 
807 |         # Subtle assistant action style
808 |         elif message.startswith("Assistant action:"):
809 |             action_style = '''
810 |                 <div style="
811 |                     color: #666;
812 |                     font-style: italic;
813 |                     padding: 4px 0;
814 |                     font-size: 12px;
815 |                     font-family: Inter, -apple-system, system-ui, sans-serif;
816 |                     line-height: 1.4;
817 |                 ">🤖 {}</div>
818 |             '''
819 |             clean_message = message.replace("Assistant action:", "").strip()
820 |             self.action_log.append(action_style.format(clean_message))
821 | 
822 |         # Regular message style
823 |         else:
824 |             regular_style = '''
825 |                 <div style="
826 |                     padding: 4px 0;
827 |                     color: #e0e0e0;
828 |                     font-family: Inter, -apple-system, system-ui, sans-serif;
829 |                     font-size: 13px;
830 |                     line-height: 1.4;
831 |                 ">{}</div>
832 |             '''
833 |             self.action_log.append(regular_style.format(message))
834 | 
835 |         # Scroll to bottom
836 |         self.action_log.verticalScrollBar().setValue(
837 |             self.action_log.verticalScrollBar().maximum()
838 |         )
839 |         
840 |     def handle_voice_input(self, text):
841 |         """Handle voice input by setting it in the input area and running the agent"""
842 |         self.input_area.setText(text)
843 |         if text.strip():  # Only run if there's actual text
844 |             self.run_agent()
845 |         
846 |     def update_status(self, message):
847 |         """Update status bar with voice control status"""
848 |         self.status_bar.showMessage(message)
849 |         
850 |     def update_voice_status(self, status):
851 |         """Update the action log with voice control status"""
852 |         status_style = '''
853 |             <div style="margin: 6px 0;">
854 |                 <span style="
855 |                     display: inline-flex;
856 |                     align-items: center;
857 |                     background-color: rgba(45, 45, 45, 0.95);
858 |                     border: 1px solid rgba(255, 255, 255, 0.1);
859 |                     border-radius: 100px;
860 |                     padding: 4px 12px;
861 |                     color: #4CAF50;
862 |                     font-family: Inter, -apple-system, system-ui, sans-serif;
863 |                     font-size: 13px;
864 |                     line-height: 1.4;
865 |                     white-space: nowrap;
866 |                 ">🎤 {}</span>
867 |             </div>
868 |         '''
869 |         self.action_log.append(status_style.format(status))
870 |         
871 |     def toggle_voice_control(self):
872 |         """Toggle voice control on/off"""
873 |         if self.voice_button.isChecked():
874 |             self.voice_controller.toggle_voice_control()
875 |         else:
876 |             self.voice_controller.toggle_voice_control()
877 |             
878 |     def setup_shortcuts(self):
879 |         # Essential shortcuts
880 |         close_window = QShortcut(QKeySequence("Ctrl+W"), self)
881 |         close_window.activated.connect(self.close)
882 |         
883 |         # Add Ctrl+C to stop agent
884 |         stop_agent = QShortcut(QKeySequence("Ctrl+C"), self)
885 |         stop_agent.activated.connect(self.stop_agent)
886 |         
887 |         # Add Ctrl+Enter to send message
888 |         send_message = QShortcut(QKeySequence("Ctrl+Return"), self)
889 |         send_message.activated.connect(self.run_agent)
890 |         
891 |         # Add Alt+V shortcut for voice control
892 |         voice_shortcut = QShortcut(QKeySequence("Alt+V"), self)
893 |         voice_shortcut.activated.connect(lambda: self.voice_button.click())
894 |         
895 |         # Allow tab for indentation
896 |         self.input_area.setTabChangesFocus(False)
897 |         
898 |         # Custom text editing handlers
899 |         self.input_area.keyPressEvent = self.handle_input_keypress
900 | 
901 |     def handle_input_keypress(self, event):
902 |         # Handle tab key for indentation
903 |         if event.key() == Qt.Key.Key_Tab:
904 |             cursor = self.input_area.textCursor()
905 |             cursor.insertText("    ")  # Insert 4 spaces for tab
906 |             return
907 |             
908 |         # Handle Ctrl+Enter to run agent
909 |         if event.key() == Qt.Key.Key_Return and event.modifiers() == Qt.KeyboardModifier.ControlModifier:
910 |             self.run_agent()
911 |             return
912 |             
913 |         # For all other keys, use default handling
914 |         QTextEdit.keyPressEvent(self.input_area, event)
915 |         
916 |     def mousePressEvent(self, event):
917 |         self.oldPos = event.globalPosition().toPoint()
918 | 
919 |     def mouseMoveEvent(self, event):
920 |         delta = QPoint(event.globalPosition().toPoint() - self.oldPos)
921 |         self.move(self.x() + delta.x(), self.y() + delta.y())
922 |         self.oldPos = event.globalPosition().toPoint()
923 |         
924 |     def closeEvent(self, event):
925 |         """Handle window close event - properly quit the application"""
926 |         self.quit_application()
927 |         event.accept()  # Allow the close
928 |         
929 |     def quit_application(self):
930 |         """Clean up resources and quit the application"""
931 |         # Stop any running agent
932 |         self.store.stop_run()
933 |         
934 |         # Clean up voice control
935 |         if hasattr(self, 'voice_controller'):
936 |             self.voice_controller.cleanup()
937 |         
938 |         # Save settings
939 |         self.settings.sync()
940 |         
941 |         # Hide tray icon before quitting
942 |         if hasattr(self, 'tray_icon'):
943 |             self.tray_icon.hide()
944 |         
945 |         # Actually quit the application
946 |         QApplication.quit()
947 | 
948 |     def show_prompt_dialog(self):
949 |         dialog = SystemPromptDialog(self, self.prompt_manager)
950 |         dialog.exec()


--------------------------------------------------------------------------------