675 |
676 | ---
677 |
678 | Restricted
679 |
680 | - Do not use any subset of this code for any commercial purposes.
681 | - Developer assumes no liability for any direct, indirect, incidental, special, consequential or exemplary damages.
682 |
--------------------------------------------------------------------------------
/MEDIA.md:
--------------------------------------------------------------------------------
1 | # Open Interface Demos and Media Resources
2 |
3 | This document is an extension of [README.md](README.md), which I recommend checking out first.
4 |
5 | Feel free to use these images and videos in publications, and reach out to me with any questions.
6 |
7 | Sections
8 | - [Demo Videos](https://github.com/AmberSahdev/Open-Interface/blob/main/MEDIA.md#demos)
9 | - [Images](https://github.com/AmberSahdev/Open-Interface/blob/main/MEDIA.md#images)
10 |
11 | ## Demos
12 |
13 | ### "Write a Web App"
14 | 
15 |
16 | https://github.com/AmberSahdev/Open-Interface/assets/23853621/956ac674-2c70-4011-9aad-50bd338b2674
17 |
18 | ### "Go to the Bottom of Chet Baker's Wikipedia Page"
19 | 
20 |
21 | https://github.com/AmberSahdev/Open-Interface/assets/23853621/bf9041b8-3c14-407f-8a72-1cddd7bc6ff9
22 |
23 | ### "Make me a meal plan in Google Docs"
24 | 
25 |
26 | https://github.com/AmberSahdev/Open-Interface/assets/23853621/424f61ad-5ee6-425f-b922-a1eecad2ef7b
27 |
29 |
30 | ---
31 |
36 | ## Images:
37 |
38 | UI and Custom LLM Models:
39 | 
40 |
41 | Making a Web App:
42 | 
43 |
44 | Writing a Meal Plan:
45 | 
46 |
47 | Navigating Wikipedia:
48 | 
49 |
50 |
51 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Open Interface
2 |
3 |
4 |
5 |
6 |
7 | ### Control Your Computer Using LLMs
8 |
9 | Open Interface
10 | - Self-drives your computer by sending your requests to an LLM backend (GPT-4o, Gemini, etc.) to figure out the required steps.
11 | - Automatically executes these steps by simulating keyboard and mouse input.
12 | - Course-corrects by sending the LLM backend updated screenshots of the progress as needed.
13 |
14 |
15 |
16 |
Full Autopilot for All Computers Using LLMs
17 |
27 |
28 |
29 |
30 | ### Demo 💻
31 | "Solve Today's Wordle"
32 | 
33 | *clipped, 2x*
34 |
35 |
36 | More Demos
37 |
38 | - "Make me a meal plan in Google Docs"
39 | - "Write a Web App"
44 |
45 |
46 |
47 |
48 |
49 |
50 |
51 | ### Install 💽
52 |
53 |
MacOS
54 |
55 | - Download the MacOS binary from the [latest release](https://github.com/AmberSahdev/Open-Interface/releases/latest).
56 | - Unzip the file and move Open Interface to the Applications folder.
57 |
59 |
60 |
61 |
62 | Apple Silicon M-Series Macs
63 |
64 | - Open Interface will ask you for Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.
65 | - In case it doesn't, manually add these permissions via System Settings -> Privacy and Security.
69 |
70 | 
72 |
74 |
75 |
76 |
77 |
78 | Intel Macs
79 |
80 | - Launch the app from the Applications folder.
81 |   You might face the standard Mac "Open Interface cannot be opened" error.
82 |   In that case, press "Cancel", then go to System Preferences -> Security and Privacy -> Open Anyway.
87 |
89 |
91 |
93 |
94 |
95 | - Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.
97 | 
99 |
101 |
102 |
103 |
104 |
105 | - Lastly, check out the Setup section to connect Open Interface to LLMs (OpenAI GPT-4V).
106 |
107 |
108 |
109 |
Linux
110 |
111 | - The Linux binary has been tested on Ubuntu 20.04 so far.
112 | - Download the Linux zip file from the [latest release](https://github.com/AmberSahdev/Open-Interface/releases/latest).
113 | - Extract the executable and check out the Setup section to connect Open Interface to LLMs, such as OpenAI GPT-4V.
115 |
116 |
117 |
118 |
Windows
119 |
120 | - The Windows binary has been tested on Windows 10.
121 | - Download the Windows zip file from the [latest release](https://github.com/AmberSahdev/Open-Interface/releases/latest).
122 | - Unzip the folder, move the exe to the desired location, double-click to open, and voila.
123 | - Check out the Setup section to connect Open Interface to LLMs (OpenAI GPT-4V).
124 |
125 |
126 |
127 |
128 |
Run as a Script
129 |
130 | - Clone the repo: `git clone https://github.com/AmberSahdev/Open-Interface.git`
131 | - Enter the directory: `cd Open-Interface`
132 | - Optionally use a Python virtual environment
133 |   - Note: pyenv handles tkinter installation weirdly, so you may have to debug it for your own system.
134 |   - `pyenv local 3.12.2`
135 |   - `python -m venv .venv`
136 |   - `source .venv/bin/activate`
137 | - Install dependencies: `pip install -r requirements.txt`
138 | - Run the app using `python app/app.py`
142 |
143 |
144 |
145 | ### Setup 🛠️
146 |
147 | Set up the OpenAI API key
148 |
149 | - Get your OpenAI API key
150 |   - Open Interface needs access to GPT-4o to perform user requests. API keys can be created from your OpenAI account at [platform.openai.com/settings/organization/api-keys](https://platform.openai.com/settings/organization/api-keys).
151 |   - [Follow the steps here](https://help.openai.com/en/articles/8264644-what-is-prepaid-billing) to add balance to your OpenAI account. To unlock GPT-4o, a minimum payment of $5 is needed.
152 |   - [More info](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4)
153 | - Save the API key in Open Interface settings
154 |   - In Open Interface, go to the Settings menu in the top right and enter the key you received from OpenAI into the text field like so:
155 |
156 |
157 |
158 |
159 |
160 |
161 | - After setting the API key for the first time you'll need to restart the app.
162 |
163 |
164 |
165 |
166 | Set up the Google Gemini API key
167 |
168 | - Go to Settings -> Advanced Settings and select the Gemini model you wish to use.
169 | - Get your Google Gemini API key from https://aistudio.google.com/app/apikey.
170 | - Save the API key in Open Interface settings.
171 | - Save the settings and restart the app.
172 |
173 |
174 |
175 |
176 | Optional: Setup a Custom LLM
177 |
178 | - Open Interface supports using other OpenAI API style LLMs (such as Llava) as a backend and can be configured easily in the Advanced Settings window.
179 | - Enter the custom base URL and model name in the Advanced Settings window and the API key in the Settings window as needed (a rough sketch of how these values get used is shown after this list).
180 | - NOTE - If you're using Llama:
181 | - You may need to enter a random string like "xxx" in the API key input box.
182 | - You may need to append /v1/ to the base URL.
183 |
184 |
185 |
186 |
187 |
188 | - If your LLM does not support an OpenAI style API, you can use a library like [this](https://github.com/BerriAI/litellm) to convert it to one.
189 | - You will need to restart the app after these changes.
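
For reference, here is a minimal sketch of what these settings amount to under the hood: Open Interface builds an OpenAI-style client from the base URL, API key, and model name you enter (see `app/models/model.py`). The localhost URL and model name below are hypothetical placeholders, not defaults.

```python
# Illustrative sketch only - the base URL and model name are placeholder examples.
from openai import OpenAI

client = OpenAI(
    api_key='xxx',                          # some local servers accept any non-empty string
    base_url='http://localhost:11434/v1/',  # your custom base URL, with /v1/ appended
)

response = client.chat.completions.create(
    model='llava',                          # the model name entered in Advanced Settings
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(response.choices[0].message.content)
```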
190 |
191 |
192 |
193 |
194 |
195 | ### Stuff It’s Error-Prone At, For Now 😬
196 |
197 | - Accurate spatial reasoning, and hence clicking buttons.
198 | - Keeping track of itself in tabular contexts, like Excel and Google Sheets, for similar reasons as stated above.
199 | - Navigating complex GUI-rich applications like Counter-Strike, Spotify, Garage Band, etc., due to heavy reliance on cursor actions.
200 |
201 |
202 | ### The Future 🔮
203 | (*with better models trained on video walkthroughs like YouTube tutorials*)
204 | - "Create a couple of bass samples for me in Garage Band for my latest project."
205 | - "Read this design document for a new feature, edit the code on GitHub, and submit it for review."
206 | - "Find my friends' music taste from Spotify and create a party playlist for tonight's event."
207 | - "Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie."
208 |
209 | ### Notes 📝
210 | - Cost Estimation: $0.0005 - $0.002 per LLM request depending on the model used.
211 | (User requests can require anywhere from two to a few dozen LLM backend calls depending on the request's complexity, so a request needing ten backend calls would cost roughly $0.005 - $0.02.)
212 | - You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.
213 | - Open Interface can only see your primary display when using multiple monitors. Therefore, if the cursor/focus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress.
214 |
215 |
216 |
217 | ### System Diagram 🖼️
218 | ```
219 | +----------------------------------------------------+
220 | | App |
221 | | |
222 | | +-------+ |
223 | | | GUI | |
224 | | +-------+ |
225 | | ^ |
226 | | | |
227 | | v |
228 | | +-----------+ (Screenshot + Goal) +-----------+ |
229 | | | | --------------------> | | |
230 | | | Core | | LLM | |
231 | | | | <-------------------- | (GPT-4o) | |
232 | | +-----------+ (Instructions) +-----------+ |
233 | | | |
234 | | v |
235 | | +-------------+ |
236 | | | Interpreter | |
237 | | +-------------+ |
238 | | | |
239 | | v |
240 | | +-------------+ |
241 | | | Executer | |
242 | | +-------------+ |
243 | +----------------------------------------------------+
244 | ```
245 |
246 | ---
247 |
248 | ### Star History ⭐️
249 |
250 |
251 |
252 |
253 |
254 | ### Links 🔗
255 | - Check out more of my projects at [AmberSah.dev](https://AmberSah.dev).
256 | - Other demos and press kit can be found at [MEDIA.md](MEDIA.md).
257 |
258 |
259 |
260 |

261 |
262 |
263 |
--------------------------------------------------------------------------------
/app/README.md:
--------------------------------------------------------------------------------
1 | # Open Interface
2 |
3 | ### Usage
4 | ```commandline
5 | python3 app.py
6 | ```
7 |
8 | ### System Diagram
9 | ```
10 | +----------------------------------------------------+
11 | | App |
12 | | |
13 | | +-------+ |
14 | | | GUI | |
15 | | +-------+ |
16 | | ^ |
17 | | | (via MP Queues) |
18 | | v |
19 | | +-----------+ (Screenshot + Goal) +-----------+ |
20 | | | | --------------------> | | |
21 | | | Core | | LLM | |
22 | | | | <-------------------- | (GPT-4V) | |
23 | | +-----------+ (Instructions) +-----------+ |
24 | | | |
25 | | v |
26 | | +-------------+ |
27 | | | Interpreter | |
28 | | +-------------+ |
29 | | | |
30 | | v |
31 | | +-------------+ |
32 | | | Executer | |
33 | | +-------------+ |
34 | +----------------------------------------------------+
35 | ```
--------------------------------------------------------------------------------
/app/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/__init__.py
--------------------------------------------------------------------------------
/app/app.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import threading
3 | from multiprocessing import freeze_support
4 |
5 | from core import Core
6 | from ui import UI
7 |
8 |
9 | class App:
10 | """
11 | +----------------------------------------------------+
12 | | App |
13 | | |
14 | | +-------+ |
15 | | | GUI | |
16 | | +-------+ |
17 | | ^ |
18 | | | (via MP Queues) |
19 | | v |
20 | | +-----------+ (Screenshot + Goal) +-----------+ |
21 | | | | --------------------> | | |
22 | | | Core | | LLM | |
23 | | | | <-------------------- | (GPT-4V) | |
24 | | +-----------+ (Instructions) +-----------+ |
25 | | | |
26 | | v |
27 | | +-------------+ |
28 | | | Interpreter | |
29 | | +-------------+ |
30 | | | |
31 | | v |
32 | | +-------------+ |
33 | | | Executer | |
34 | | +-------------+ |
35 | +----------------------------------------------------+
36 | """
37 |
38 | def __init__(self):
39 | self.core = Core()
40 | self.ui = UI()
41 |
42 | # Create threads to facilitate communication between core and ui through queues
43 | self.core_to_ui_connection_thread = threading.Thread(target=self.send_status_from_core_to_ui, daemon=True)
44 | self.ui_to_core_connection_thread = threading.Thread(target=self.send_user_request_from_ui_to_core, daemon=True)
45 |
46 | def run(self) -> None:
47 | self.core_to_ui_connection_thread.start()
48 | self.ui_to_core_connection_thread.start()
49 |
50 | self.ui.run()
51 |
52 | def send_status_from_core_to_ui(self) -> None:
53 | while True:
54 | status: str = self.core.status_queue.get()
55 | print(f'Sending status from thread - thread: {threading.current_thread().name}, status: {status}')
56 | self.ui.display_current_status(status)
57 |
58 | def send_user_request_from_ui_to_core(self) -> None:
59 | while True:
60 | user_request: str = self.ui.main_window.user_request_queue.get()
61 | print(f'Sending user request: {user_request}')
62 |
63 | if user_request == 'stop':
64 | self.core.stop_previous_request()
65 |
66 | # Ensure all other threads are joined before force quitting
67 | try:
68 | for thread in threading.enumerate():
69 | if thread is not threading.main_thread() and thread is not threading.current_thread():
70 | thread.join(timeout=2)
71 | except Exception:
72 | continue
73 |
74 | else:
75 | threading.Thread(target=self.core.execute_user_request, args=(user_request,), daemon=True).start()
76 |
77 | def cleanup(self):
78 | self.core.cleanup()
79 |
80 |
81 | if __name__ == '__main__':
82 | freeze_support() # As required by pyinstaller https://www.pyinstaller.org/en/stable/common-issues-and-pitfalls.html#multi-processing
83 | app = App()
84 | app.run()
85 | app.cleanup()
86 | sys.exit(0)
87 |
--------------------------------------------------------------------------------
/app/core.py:
--------------------------------------------------------------------------------
1 | import time
2 | from multiprocessing import Queue
3 | from typing import Optional, Any
4 |
5 | from openai import OpenAIError
6 |
7 | from interpreter import Interpreter
8 | from llm import LLM
9 | from utils.settings import Settings
10 |
11 |
12 | class Core:
13 | def __init__(self):
14 | self.status_queue = Queue()
15 | self.interrupt_execution = False
16 | self.settings_dict = Settings().get_dict()
17 |
18 | self.interpreter = Interpreter(self.status_queue)
19 |
20 | self.llm = None
21 | try:
22 | self.llm = LLM()
23 | except OpenAIError as e:
24 | self.status_queue.put(f'Set your OpenAI API Key in Settings and Restart the App. Error: {e}')
25 | except Exception as e:
26 | self.status_queue.put(f'An error occurred during startup. Please fix and restart the app.\n'
27 | f'Error likely in file {Settings().settings_file_path}.\n'
28 | f'Error: {e}')
29 |
30 | def execute_user_request(self, user_request: str) -> None:
31 | self.stop_previous_request()
32 | time.sleep(0.1)
33 | self.execute(user_request)
34 |
35 | def stop_previous_request(self) -> None:
36 | self.interrupt_execution = True
37 |
38 | def execute(self, user_request: str, step_num: int = 0) -> Optional[str]:
39 | """
40 | This function might recurse.
41 |
42 | user_request: The original user request
43 | step_num: the number of times we've called the LLM for this request.
44 | Used to keep track of whether it's a fresh request we're processing (step number 0), or if we're already
45 | in the middle of one.
46 | Without it the LLM kept looping after finishing the user request.
47 | Also, it is needed because the LLM we are using doesn't have a stateful/assistant mode.
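
For example, execute('Open Sublime Text') starts at step_num 0; if the LLM leaves 'done' empty we recurse
with step_num 1, 2, ... until 'done' is populated, an exception occurs, or the user interrupts execution.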
48 | """
49 | self.interrupt_execution = False
50 |
51 | if not self.llm:
52 | status = 'Set your OpenAI API Key in Settings and Restart the App'
53 | self.status_queue.put(status)
54 | return status
55 |
56 | try:
57 | instructions: dict[str, Any] = self.llm.get_instructions_for_objective(user_request, step_num)
58 |
59 | if instructions == {}:
60 | # Sometimes LLM sends malformed JSON response, in that case retry once more.
61 | instructions = self.llm.get_instructions_for_objective(user_request + ' Please reply in valid JSON',
62 | step_num)
63 |
64 | for step in instructions['steps']:
65 | if self.interrupt_execution:
66 | self.status_queue.put('Interrupted')
67 | self.interrupt_execution = False
68 | return 'Interrupted'
69 |
70 | success = self.interpreter.process_command(step)
71 |
72 | if not success:
73 | return 'Unable to execute the request'
74 |
75 | except Exception as e:
76 | status = f'Exception Unable to execute the request - {e}'
77 | self.status_queue.put(status)
78 | return status
79 |
80 | if instructions['done']:
81 | # Communicate Results
82 | self.status_queue.put(instructions['done'])
83 | self.play_ding_on_completion()
84 | return instructions['done']
85 | else:
86 | # if not done, continue to next phase
87 | self.status_queue.put('Fetching further instructions based on current state')
88 | return self.execute(user_request, step_num + 1)
89 |
90 | def play_ding_on_completion(self):
91 | # Play ding sound to signal completion
92 | if self.settings_dict.get('play_ding_on_completion'):
93 | print('\a')
94 |
95 | def cleanup(self):
96 | if self.llm:  # llm may be None if initialization failed (e.g. missing API key)
97 | self.llm.cleanup()
97 |
--------------------------------------------------------------------------------
/app/interpreter.py:
--------------------------------------------------------------------------------
1 | import json
2 | from multiprocessing import Queue
3 | from time import sleep
4 | from typing import Any
5 |
6 | import pyautogui
7 |
8 |
9 | class Interpreter:
10 | def __init__(self, status_queue: Queue):
11 | # MP Queue to put current status of execution in while processing commands.
12 | # It helps us reflect the current status on the UI.
13 | self.status_queue = status_queue
14 |
15 | def process_commands(self, json_commands: list[dict[str, Any]]) -> bool:
16 | """
17 | Reads a list of JSON commands and runs the corresponding function call as specified in context.txt
18 | :param json_commands: List of JSON Objects with format as described in context.txt
19 | :return: True for successful execution, False for exception while interpreting or executing.
20 | """
21 | for command in json_commands:
22 | success = self.process_command(command)
23 | if not success:
24 | return False # End early and return
25 | return True
26 |
27 | def process_command(self, json_command: dict[str, Any]) -> bool:
28 | """
29 | Reads the passed in JSON object and extracts relevant details. Format is specified in context.txt.
30 | After interpretation, it proceeds to execute the appropriate function call.
31 |
32 | :return: True for successful execution, False for exception while interpreting or executing.
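
Example command (shape as described in context.txt; the values here are purely illustrative):
{"function": "write", "parameters": {"string": "hello", "interval": 0.05}, "human_readable_justification": "Typing into the focused text box"}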
33 | """
34 | function_name = json_command['function']
35 | parameters = json_command.get('parameters', {})
36 | human_readable_justification = json_command.get('human_readable_justification')
37 | print(f'Now performing - {function_name} - {parameters} - {human_readable_justification}')
38 | self.status_queue.put(human_readable_justification)
39 | try:
40 | self.execute_function(function_name, parameters)
41 | return True
42 | except Exception as e:
43 | print(f'\nError:\nWe are having a problem executing this step - {type(e)} - {e}')
44 | print(f'This was the json we received from the LLM: {json.dumps(json_command, indent=2)}')
45 | print(f'This is what we extracted:')
46 | print(f'\t function_name:{function_name}')
47 | print(f'\t parameters:{parameters}')
48 |
49 | return False
50 |
51 | def execute_function(self, function_name: str, parameters: dict[str, Any]) -> None:
52 | """
53 | We are expecting only two types of function calls below
54 | 1. time.sleep() - to wait for web pages, applications, and other things to load.
55 | 2. pyautogui calls to interact with system's mouse and keyboard.
56 | """
57 | # Sometimes pyautogui needs warming up, i.e. sometimes the first call isn't executed, hence padding an extra call here
58 | pyautogui.press("command", interval=0.2)
59 |
60 | if function_name == "sleep" and parameters.get("secs"):
61 | sleep(parameters.get("secs"))
62 | elif hasattr(pyautogui, function_name):
63 | # Execute the corresponding pyautogui function i.e. Keyboard or Mouse commands.
64 | function_to_call = getattr(pyautogui, function_name)
65 |
66 | # Special handling for the 'write' function
67 | if function_name == 'write' and ('string' in parameters or 'text' in parameters):
68 | # 'write' function expects a string, not a 'text' keyword argument but LLM sometimes gets confused on the parameter name.
69 | string_to_write = parameters.get('string') or parameters.get('text')
70 | interval = parameters.get('interval', 0.1)
71 | function_to_call(string_to_write, interval=interval)
72 | elif function_name == 'press' and ('keys' in parameters or 'key' in parameters):
73 | # 'press' can take a list of keys or a single key
74 | keys_to_press = parameters.get('keys') or parameters.get('key')
75 | presses = parameters.get('presses', 1)
76 | interval = parameters.get('interval', 0.2)
77 | function_to_call(keys_to_press, presses=presses, interval=interval)
78 | elif function_name == 'hotkey':
79 | # 'hotkey' function expects multiple key arguments, not a list
80 | keys = list(parameters.values())
81 | function_to_call(*keys)
82 | else:
83 | # For other functions, pass the parameters as they are
84 | function_to_call(**parameters)
85 | else:
86 | print(f'No such function {function_name} in our interface\'s interpreter')
87 |
--------------------------------------------------------------------------------
/app/llm.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 | from typing import Any
3 |
4 | from models.factory import ModelFactory
5 | from utils import local_info
6 | from utils.screen import Screen
7 | from utils.settings import Settings
8 |
9 | DEFAULT_MODEL_NAME = 'gpt-4o'
10 |
11 |
12 | class LLM:
13 | """
14 | LLM Request
15 | {
16 | "original_user_request": ...,
17 | "step_num": ...,
18 | "screenshot": ...
19 | }
20 |
21 | step_num is the count of times we've interacted with the LLM for this user request.
22 | If it's 0, we know it's a fresh user request.
23 | If it's greater than 0, then we know we are already in the middle of a request.
24 | Therefore, if the number is positive and from the screenshot it looks like request is complete, then return an
25 | empty list in steps and a string in done. Don't keep looping the same request.
26 |
27 | Expected LLM Response
28 | {
29 | "steps": [
30 | {
31 | "function": "...",
32 | "parameters": {
33 | "key1": "value1",
34 | ...
35 | },
36 | "human_readable_justification": "..."
37 | },
38 | {...},
39 | ...
40 | ],
41 | "done": ...
42 | }
43 |
44 | function is the function name to call in the executer.
45 | parameters are the parameters of the above function.
46 | human_readable_justification is what we can use to debug in case program fails somewhere or to explain to user why we're doing what we're doing.
47 | done is null if user request is not complete, and it's a string when it's complete that either contains the
48 | information that the user asked for, or just acknowledges completion of the user requested task. This is going
49 | to be communicated to the user if it's present.
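
For illustration, a response for a request that's already complete might look like:
{
"steps": [],
"done": "Sublime Text is now open."
}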
50 | """
51 |
52 | def __init__(self):
53 | self.settings_dict: dict[str, str] = Settings().get_dict()
54 | model_name, base_url, api_key = self.get_settings_values()
55 |
56 | self.model_name = model_name
57 | context = self.read_context_txt_file()
58 |
59 | self.model = ModelFactory.create_model(self.model_name, base_url, api_key, context)
60 |
61 | def get_settings_values(self) -> tuple[str, str, str]:
62 | model_name = self.settings_dict.get('model') or DEFAULT_MODEL_NAME
63 | base_url = (self.settings_dict.get('base_url') or 'https://api.openai.com/v1/').rstrip('/') + '/'
64 | api_key = self.settings_dict.get('api_key')
65 |
66 | return model_name, base_url, api_key
67 |
68 | def read_context_txt_file(self) -> str:
69 | # Construct context for the assistant by reading context.txt and adding extra system information
70 | context = ''
71 | path_to_context_file = Path(__file__).resolve().parent.joinpath('resources', 'context.txt')
72 | with open(path_to_context_file, 'r') as file:
73 | context += file.read()
74 |
75 | context += f' Locally installed apps are {",".join(local_info.locally_installed_apps)}.'
76 | context += f' OS is {local_info.operating_system}.'
77 | context += f' Primary screen size is {Screen().get_size()}.\n'
78 |
79 | if 'default_browser' in self.settings_dict.keys() and self.settings_dict['default_browser']:
80 | context += f'\nDefault browser is {self.settings_dict["default_browser"]}.'
81 |
82 | if 'custom_llm_instructions' in self.settings_dict:
83 | context += f'\nCustom user-added info: {self.settings_dict["custom_llm_instructions"]}.'
84 |
85 | return context
86 |
87 | def get_instructions_for_objective(self, original_user_request: str, step_num: int = 0) -> dict[str, Any]:
88 | return self.model.get_instructions_for_objective(original_user_request, step_num)
89 |
90 | def cleanup(self):
91 | self.model.cleanup()
92 |
--------------------------------------------------------------------------------
/app/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/models/__init__.py
--------------------------------------------------------------------------------
/app/models/deprecated/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/models/deprecated/__init__.py
--------------------------------------------------------------------------------
/app/models/factory.py:
--------------------------------------------------------------------------------
1 | from models.gpt4o import GPT4o
2 | from models.gpt4v import GPT4v
3 | from models.gemini import Gemini
4 |
5 |
6 | class ModelFactory:
7 | @staticmethod
8 | def create_model(model_name, *args):
9 | try:
10 | if model_name == 'gpt-4o' or model_name == 'gpt-4o-mini':
11 | return GPT4o(model_name, *args)
12 | elif model_name == 'gpt-4-vision-preview' or model_name == 'gpt-4-turbo':
13 | return GPT4v(model_name, *args)
14 | elif model_name.startswith("gemini"):
15 | return Gemini(model_name, *args[1:])  # Gemini doesn't take a base_url, so skip that argument
16 | else:
17 | # Llama/Llava models will work with the standard code I wrote for GPT4V without the assistant mode features of gpt4o
18 | return GPT4v(model_name, *args)
19 | except Exception as e:
20 | raise ValueError(f'Unsupported model type {model_name}. Create entry in app/models/. Error: {e}')
21 |
--------------------------------------------------------------------------------
/app/models/gemini.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | from typing import Any
4 |
5 | from google import genai
6 | from google.genai import types
7 | from utils.screen import Screen
8 |
9 |
10 | class Gemini:
11 | def __init__(self, model_name, api_key, context):
12 | self.model_name = model_name
13 | self.api_key = api_key
14 | self.context = context
15 | self.client = genai.Client(api_key=api_key)
16 |
17 | if api_key:
18 | os.environ['GEMINI_API_KEY'] = api_key
19 |
20 | def get_instructions_for_objective(self, original_user_request: str, step_num: int = 0) -> dict[str, Any]:
21 | safety_settings = [
22 | types.SafetySetting(category=category.value, threshold="BLOCK_NONE")
23 | for category in types.HarmCategory
24 | if category.value != 'HARM_CATEGORY_UNSPECIFIED'
25 | ]
26 | message_content = self.format_user_request_for_llm(original_user_request, step_num)
27 |
28 | llm_response = self.client.models.generate_content(
29 | model=self.model_name,
30 | contents=message_content,
31 | config=types.GenerateContentConfig(safety_settings=safety_settings),
32 | )
33 | json_instructions: dict[str, Any] = self.convert_llm_response_to_json_instructions(llm_response)
34 | return json_instructions
35 |
36 | def format_user_request_for_llm(self, original_user_request, step_num) -> list[Any]:
37 | base64_img: str = Screen().get_screenshot_in_base64()
38 |
39 | request_data: str = json.dumps({
40 | "original_user_request": original_user_request,
41 | "step_num": step_num,
42 | })
43 |
44 | message_content = [
45 | {"text": self.context + request_data + "\n\nHere is a screenshot of the user's screen:"},
46 | {"inline_data": {
47 | "mime_type": "image/jpeg",
48 | "data": base64_img,
49 | }},
50 | ]
51 | return message_content
52 |
53 | def convert_llm_response_to_json_instructions(self, llm_response) -> dict[str, Any]:
54 | llm_response_data = llm_response.text.strip()
55 |
56 | start_index = llm_response_data.find("{")
57 | end_index = llm_response_data.rfind("}")
58 |
59 | try:
60 | json_response = json.loads(llm_response_data[start_index:end_index + 1].strip())
61 | except Exception as e:
62 | print(f"Error while parsing JSON response - {e}")
63 | json_response = {}
64 |
65 | return json_response
66 |
67 | def cleanup(self):
68 | pass
69 |
--------------------------------------------------------------------------------
/app/models/gpt4o.py:
--------------------------------------------------------------------------------
1 | import json
2 | import time
3 | from typing import Any
4 |
5 | from models.model import Model
6 | from openai.types.beta.threads.message import Message
7 | from utils.screen import Screen
8 |
9 |
10 | # TODO
11 | # [ ] Function calling with assistants api - https://platform.openai.com/docs/assistants/tools/function-calling/quickstart
12 |
13 | class GPT4o(Model):
14 | def __init__(self, model_name, base_url, api_key, context):
15 | super().__init__(model_name, base_url, api_key, context)
16 |
17 | # GPT4o has Assistant Mode enabled, which we can utilize to make Open Interface more contextually aware
18 | self.assistant = self.client.beta.assistants.create(
19 | name='Open Interface Backend',
20 | instructions=self.context,
21 | model=model_name,
22 | # tools=[],
23 | )
24 |
25 | self.thread = self.client.beta.threads.create()
26 |
27 | # IDs of images uploaded to OpenAI for use with the assistants API, can be cleaned up once thread is no longer needed
28 | self.list_of_image_ids = []
29 |
30 | def get_instructions_for_objective(self, original_user_request: str, step_num: int = 0) -> dict[str, Any]:
31 | # Upload screenshot to OpenAI - Note: Don't delete files from openai while the thread is active
32 | openai_screenshot_file_id = self.upload_screenshot_and_get_file_id()
33 |
34 | self.list_of_image_ids.append(openai_screenshot_file_id)
35 |
36 | # Format user request to send to LLM
37 | formatted_user_request = self.format_user_request_for_llm(original_user_request, step_num,
38 | openai_screenshot_file_id)
39 |
40 | # Read response
41 | llm_response = self.send_message_to_llm(formatted_user_request)
42 | json_instructions: dict[str, Any] = self.convert_llm_response_to_json_instructions(llm_response)
43 |
44 | return json_instructions
45 |
46 | def send_message_to_llm(self, formatted_user_request) -> Message:
47 | message = self.client.beta.threads.messages.create(
48 | thread_id=self.thread.id,
49 | role='user',
50 | content=formatted_user_request
51 | )
52 |
53 | run = self.client.beta.threads.runs.create_and_poll(
54 | thread_id=self.thread.id,
55 | assistant_id=self.assistant.id,
56 | instructions=''
57 | )
58 |
59 | # create_and_poll normally returns a terminal status already; poll defensively in case the run is still in flight
60 | while run.status in ('queued', 'in_progress', 'cancelling'):
61 | print(f'Waiting for response, sleeping for 1. run.status={run.status}')
62 | time.sleep(1)
63 | run = self.client.beta.threads.runs.retrieve(run_id=run.id, thread_id=self.thread.id)
64 |
65 | if run.status == 'failed':
66 | print(f'failed run run.required_action:{run.required_action} run.last_error: {run.last_error}\n\n')
67 | return None
66 |
67 | if run.status == 'completed':
68 | # NOTE: Apparently right now the API doesn't have a way to retrieve just the last message???
69 | # So instead you get all messages and take the latest one
70 | response = self.client.beta.threads.messages.list(
71 | thread_id=self.thread.id
72 | )
73 |
74 | return response.data[0]
75 | else:
76 | print('Run did not complete successfully.')
77 | return None
78 |
79 | def upload_screenshot_and_get_file_id(self):
80 | # Files are used to upload documents like images that can be used with features like Assistants
81 | # Assistants API cannot take base64 images like chat.completions API
82 | filepath = Screen().get_screenshot_file()
83 |
84 | response = self.client.files.create(
85 | file=open(filepath, 'rb'),
86 | purpose='vision'
87 | )
88 | return response.id
89 |
90 | def format_user_request_for_llm(self, original_user_request, step_num, openai_screenshot_file_id) -> list[
91 | dict[str, Any]]:
92 | request_data: str = json.dumps({
93 | 'original_user_request': original_user_request,
94 | 'step_num': step_num
95 | })
96 |
97 | content = [
98 | {
99 | 'type': 'text',
100 | 'text': request_data
101 | },
102 | {
103 | 'type': 'image_file',
104 | 'image_file': {
105 | 'file_id': openai_screenshot_file_id
106 | }
107 | }
108 | ]
109 |
110 | return content
111 |
112 | def convert_llm_response_to_json_instructions(self, llm_response: Message) -> dict[str, Any]:
113 | llm_response_data: str = llm_response.content[0].text.value.strip()
114 |
115 | # Our current LLM model does not guarantee a JSON response hence we manually parse the JSON part of the response
116 | # Check for updates here - https://platform.openai.com/docs/guides/text-generation/json-mode
117 | start_index = llm_response_data.find('{')
118 | end_index = llm_response_data.rfind('}')
119 |
120 | try:
121 | json_response = json.loads(llm_response_data[start_index:end_index + 1].strip())
122 | except Exception as e:
123 | print(f'Error while parsing JSON response - {e}')
124 | json_response = {}
125 |
126 | return json_response
127 |
128 | def cleanup(self):
129 | # Note: Cannot delete screenshots while the thread is active. Cleanup during shut down.
130 | for image_id in self.list_of_image_ids:
131 | self.client.files.delete(image_id)
132 | self.thread = self.client.beta.threads.create() # Using old thread even by accident would cause Image errors
133 |
--------------------------------------------------------------------------------
/app/models/gpt4v.py:
--------------------------------------------------------------------------------
1 | import json
2 | from typing import Any
3 |
4 | from models.model import Model
5 | from openai.types.chat import ChatCompletion
6 | from utils.screen import Screen
7 |
8 |
9 | class GPT4v(Model):
10 | def get_instructions_for_objective(self, original_user_request: str, step_num: int = 0) -> dict[str, Any]:
11 | message: list[dict[str, Any]] = self.format_user_request_for_llm(original_user_request, step_num)
12 | llm_response = self.send_message_to_llm(message)
13 | json_instructions: dict[str, Any] = self.convert_llm_response_to_json_instructions(llm_response)
14 | return json_instructions
15 |
16 | def format_user_request_for_llm(self, original_user_request, step_num) -> list[dict[str, Any]]:
17 | base64_img: str = Screen().get_screenshot_in_base64()
18 |
19 | request_data: str = json.dumps({
20 | 'original_user_request': original_user_request,
21 | 'step_num': step_num
22 | })
23 |
24 | # We have to add context every request for now which is expensive because our chosen model doesn't have a
25 | # stateful/Assistant mode yet.
26 | message = [
27 | {'type': 'text', 'text': self.context + request_data},
28 | {'type': 'image_url',
29 | 'image_url': {
30 | 'url': f'data:image/jpeg;base64,{base64_img}'
31 | }
32 | }
33 | ]
34 |
35 | return message
36 |
37 | def send_message_to_llm(self, message) -> ChatCompletion:
38 | response = self.client.chat.completions.create(
39 | model=self.model_name,
40 | messages=[
41 | {
42 | 'role': 'user',
43 | 'content': message,
44 | }
45 | ],
46 | max_tokens=800,
47 | )
48 | return response
49 |
50 | def convert_llm_response_to_json_instructions(self, llm_response: ChatCompletion) -> dict[str, Any]:
51 | llm_response_data: str = llm_response.choices[0].message.content.strip()
52 |
53 | # Our current LLM model does not guarantee a JSON response hence we manually parse the JSON part of the response
54 | # Check for updates here - https://platform.openai.com/docs/guides/text-generation/json-mode
55 | start_index = llm_response_data.find('{')
56 | end_index = llm_response_data.rfind('}')
57 |
58 | try:
59 | json_response = json.loads(llm_response_data[start_index:end_index + 1].strip())
60 | except Exception as e:
61 | print(f'Error while parsing JSON response - {e}')
62 | json_response = {}
63 |
64 | return json_response
65 |
--------------------------------------------------------------------------------
/app/models/model.py:
--------------------------------------------------------------------------------
1 | import os
2 | from typing import Any
3 |
4 | from openai import OpenAI
5 |
6 |
7 | class Model:
8 | def __init__(self, model_name, base_url, api_key, context):
9 | self.model_name = model_name
10 | self.base_url = base_url
11 | self.api_key = api_key
12 | self.context = context
13 | self.client = OpenAI(api_key=api_key, base_url=base_url)
14 |
15 | if api_key:
16 | os.environ['OPENAI_API_KEY'] = api_key
17 |
18 | def get_instructions_for_objective(self, *args) -> dict[str, Any]:
19 | pass
20 |
21 | def format_user_request_for_llm(self, *args):
22 | pass
23 |
24 | def convert_llm_response_to_json_instructions(self, *args) -> dict[str, Any]:
25 | pass
26 |
27 | def cleanup(self, *args):
28 | pass
29 |
--------------------------------------------------------------------------------
/app/models/o1.py:
--------------------------------------------------------------------------------
1 | """
2 | Removed untested code because I am not eligible for o1 API access yet. Haven't reached tier 5 billing.
3 | """
--------------------------------------------------------------------------------
/app/resources/context.txt:
--------------------------------------------------------------------------------
1 | Context:
2 | You are Open Interface, the backend for an app controlling a user's computer. User requests will be conversational such as "Open Sublime text", or "Create an Excel sheet with a meal plan for the week", or "how old is Steve Carell".
3 | You return steps to navigate to the correct application, get to the correct text box if needed, and deliver the objective being asked of you as if you were a personal assistant controlling the computer.
4 |
5 | Do this by returning valid JSON responses that map to function calls that can control the mouse, keyboard, and wait (for applications to load) as needed. Only send me back a valid JSON response that I can put in json.loads() without an error - this is extremely important. Do not add any leading or trailing characters.
6 |
7 | To control the keyboard and mouse of my computer, use the pyautogui library.
8 | Be mindful to use the correct parameter name for its corresponding function call - this is very important.
9 | Also keep the typing interval low around 0.05.
10 | In addition to pyautogui, you can also call sleep(seconds) to wait for apps, web pages, and other things to load.
11 |
12 | Sometimes it will be necessary for you to do half the objective, request a new screenshot to verify whether you are where you expect, and then provide the further steps.
13 |
14 | In the JSON request I send you there will be three parameters:
15 | "original_user_request": the user requested objective
16 | "step_num": if it's 0, it's a new request. Any other number means that you had requested for a screenshot to judge your progress and are on an intermediary step.
17 | "screenshot": the latest state of the system in a screenshot.
18 |
19 | Expected LLM Response:
20 | """
21 | {
22 | "steps": [
23 | {
24 | "function": "...",
25 | "parameters": {
26 | "key1": "value1",
27 | ...
28 | },
29 | "human_readable_justification": "..."
30 | },
31 | {...},
32 | ...
33 | ],
34 | "done": ...
35 | }
36 | """
37 |
38 | "function" is the function name to call in the executor.
39 | "parameters" is the parameters of the above function.
40 | "human_readable_justification" explains to users why you're doing what you're doing.
41 | "done" is null if user request is not complete, and a string when it's complete. populate it with the information that the user asked for, or just acknowledge completion of the user requested task. Verify with a screenshot if the user requested objective is reached, and remember to populate this field when you think you have completed a user task, or we will keep going in loops. This is important.
42 |
43 | Critical guidelines based on your past behavior to help you in your objectives:
44 | 1. If you think a task is complete, don't keep enqueuing more steps. Just fill the "done" parameter with value. This is very important.
45 | 2. Be extra careful in opening spotlight on MacOS, if you fail at that nothing after it will work. The key sequence to open spotlight is to hold down command, then hold down space, then release. This is very important. You can use pyautogui.hotkey('command', 'space'). Please do this right.
46 | 3. When you open applications and webpages, include sleeps in your response to give them time to load.
47 | 4. When you perform a complex navigation don't pass in too many steps after that. Request a new screenshot to verify whether things are going to plan, or if you need to correct course.
48 | 5. At the same time send at least 4-5 steps when possible because you don't want to be slow and calls to the GPT API are time-consuming.
49 | 6. Try to only send 4-5 steps at a time and then leave done empty, so the app can re-enqueue the request for you with a new screenshot. This is very important! Without new screenshots you generally do not perform well.
50 | 7. Break down your response into very simple steps. This is very important.
51 | 8. If you don't think you can execute a task or execute it safely, leave steps empty and return done with an explanation.
52 | 9. Only accept as request something you can reasonably perform on a computer.
53 | 10. Don't overwrite any user data. Always try to open new windows and tabs after you open an application or browser. This is very important.
54 | 11. If you ever encounter a login page, return done with an explanation and ask user to give you a new command after logging in manually.
55 | 12. pyautogui.press("enter") is not the same as pyautogui.write("\n") - please do not confuse them.
56 | 13. Try going to a link directly if you know it instead of searching for it. This is very important.
57 | 14. Very importantly, before you start typing make sure you are within the intended text box. Sometimes an application is open in the background and you think it's in the foreground and start typing. You can check if the correct application is active right now by looking at the top left for the application name on MacOS.
58 | 15. Try not to switch applications with keyboard shortcuts; instead, always launch applications with spotlight on MacOS.
59 | 16. Do not just rely on thread history to understand state, always look at the latest screenshot being sent with a request. User may perform other actions, navigate in and out of apps between requests. ALWAYS look at state of the system with the screenshot provided.
60 | 17. Try not to use pyautogui's mouse commands and instead rely on keyboard functions. You risk doing poorly with mouse navigation.
61 |
62 | Lastly, do not ever, ever do anything to hurt the user or the computer system - do not perform risky deletes, or any other similar actions.
63 | And always reply in valid JSON.
64 |
65 | ---
66 | pyautogui keyboard documentation
67 | The write() Function
68 | ===
69 | The primary keyboard function is ``write()``. This function will type the characters in the string that is passed. To add a delay interval in between pressing each character key, pass an int or float for the ``interval`` keyword argument.
70 | For example:
71 | .. code:: python
72 | >>> pyautogui.write('Hello world!') # prints out "Hello world!" instantly
73 | >>> pyautogui.write('Hello world!', interval=0.25) # prints out "Hello world!" with a quarter second delay after each character
74 | You can only press single-character keys with ``write()``, so you can't press the Shift or F1 keys, for example.
75 | The press(), keyDown(), and keyUp() Functions
76 | ===
77 | To press these keys, call the ``press()`` function and pass it a string from the ``pyautogui.KEYBOARD_KEYS`` such as ``enter``, ``esc``, ``f1``. See `KEYBOARD_KEYS`_.
78 | For example:
79 | .. code:: python
80 | >>> pyautogui.press('enter') # press the Enter key
81 | >>> pyautogui.press('f1') # press the F1 key
82 | >>> pyautogui.press('left') # press the left arrow key
83 | The ``press()`` function is really just a wrapper for the ``keyDown()`` and ``keyUp()`` functions, which simulate pressing a key down and then releasing it up. These functions can be called by themselves. For example, to press the left arrow key three times while holding down the Shift key, call the following:
84 | .. code:: python
85 | >>> pyautogui.keyDown('shift') # hold down the shift key
86 | >>> pyautogui.press('left') # press the left arrow key
87 | >>> pyautogui.press('left') # press the left arrow key
88 | >>> pyautogui.press('left') # press the left arrow key
89 | >>> pyautogui.keyUp('shift') # release the shift key
90 | To press multiple keys similar to what ``write()`` does, pass a list of strings to ``press()``. For example:
91 | .. code:: python
92 | >>> pyautogui.press(['left', 'left', 'left'])
93 | Or you can set how many presses `left`:
94 | .. code:: python
95 | >>> pyautogui.press('left', presses=3)
96 | To add a delay interval in between each press, pass an int or float for the ``interval`` keyword argument.
97 | The hold() Context Manager
98 | ===
99 | To make holding a key convenient, the ``hold()`` function can be used as a context manager and passed a string from the ``pyautogui.KEYBOARD_KEYS`` such as ``shift``, ``ctrl``, ``alt``, and this key will be held for the duration of the ``with`` context block. See `KEYBOARD_KEYS`_.
100 | .. code:: python
101 | >>> with pyautogui.hold('shift'):
102 | pyautogui.press(['left', 'left', 'left'])
103 | . . .is equivalent to this code:
104 | .. code:: python
105 | >>> pyautogui.keyDown('shift') # hold down the shift key
106 | >>> pyautogui.press('left') # press the left arrow key
107 | >>> pyautogui.press('left') # press the left arrow key
108 | >>> pyautogui.press('left') # press the left arrow key
109 | >>> pyautogui.keyUp('shift') # release the shift key
110 | The hotkey() Function
111 | ===
112 | To make pressing hotkeys or keyboard shortcuts convenient, the ``hotkey()`` can be passed several key strings which will be pressed down in order, and then released in reverse order. This code:
113 | .. code:: python
114 | >>> pyautogui.hotkey('ctrl', 'shift', 'esc')
115 | . . .is equivalent to this code:
116 | .. code:: python
117 | >>> pyautogui.keyDown('ctrl')
118 | >>> pyautogui.keyDown('shift')
119 | >>> pyautogui.keyDown('esc')
120 | >>> pyautogui.keyUp('esc')
121 | >>> pyautogui.keyUp('shift')
122 | >>> pyautogui.keyUp('ctrl')
123 | To add a delay interval in between each press, pass an int or float for the ``interval`` keyword argument.
124 |
125 | KEYBOARD_KEYS
126 | ===
127 | The following are the valid strings to pass to the ``press()``, ``keyDown()``, ``keyUp()``, and ``hotkey()`` functions:
128 | .. code:: python
129 | ['\t', '\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(',
130 | ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7',
131 | '8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`',
132 | 'a', 'b', 'c', 'd', 'e','f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
133 | 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~',
134 | 'accept', 'add', 'alt', 'altleft', 'altright', 'apps', 'backspace',
135 | 'browserback', 'browserfavorites', 'browserforward', 'browserhome',
136 | 'browserrefresh', 'browsersearch', 'browserstop', 'capslock', 'clear',
137 | 'convert', 'ctrl', 'ctrlleft', 'ctrlright', 'decimal', 'del', 'delete',
138 | 'divide', 'down', 'end', 'enter', 'esc', 'escape', 'execute', 'f1', 'f10',
139 | 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f2', 'f20',
140 | 'f21', 'f22', 'f23', 'f24', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9',
141 | 'final', 'fn', 'hanguel', 'hangul', 'hanja', 'help', 'home', 'insert', 'junja',
142 | 'kana', 'kanji', 'launchapp1', 'launchapp2', 'launchmail',
143 | 'launchmediaselect', 'left', 'modechange', 'multiply', 'nexttrack',
144 | 'nonconvert', 'num0', 'num1', 'num2', 'num3', 'num4', 'num5', 'num6',
145 | 'num7', 'num8', 'num9', 'numlock', 'pagedown', 'pageup', 'pause', 'pgdn',
146 | 'pgup', 'playpause', 'prevtrack', 'print', 'printscreen', 'prntscrn',
147 | 'prtsc', 'prtscr', 'return', 'right', 'scrolllock', 'select', 'separator',
148 | 'shift', 'shiftleft', 'shiftright', 'sleep', 'space', 'stop', 'subtract', 'tab',
149 | 'up', 'volumedown', 'volumemute', 'volumeup', 'win', 'winleft', 'winright', 'yen',
150 | 'command', 'option', 'optionleft', 'optionright']
--------------------------------------------------------------------------------
/app/resources/icon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/resources/icon.ico
--------------------------------------------------------------------------------
/app/resources/icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/resources/icon.png
--------------------------------------------------------------------------------
/app/resources/microphone.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/resources/microphone.png
--------------------------------------------------------------------------------
/app/resources/old-context.txt:
--------------------------------------------------------------------------------
1 | Context:
2 | You are now the backend for a program that is controlling my computer. User requests will be conversational such as "Open Sublime text", or "Create an Excel sheet with a meal plan for the week", or "how old is Steve Carell".
3 | You are supposed to return steps to navigate to the correct application, get to the text box if needed, and deliver the content being asked of you as if you were a personal assistant.
4 |
5 | You will be able to do this by returning valid JSON responses that map back to function calls that can control the mouse, keyboard, and wait (for applications to load) as needed. I will specify the API we can use to communicate.
6 | Only send me back a valid JSON response that I can put in json.loads() without an error - this is extremely important. Do not add any leading or trailing characters.
7 |
8 | Sometimes it will be necessary for you to do half the action, request a new screenshot to verify whether you are where you expect, and then provide the further steps. There is a way to do that I will specify later.
9 |
10 | In the JSON request I send you there will be three parameters:
11 | "original_user_request": the user requested action
12 | "step_num": if it's 0, it's a new request. Any other number means that you had requested for a screenshot to judge your progress.
13 | "screenshot": the latest state of the system in a screenshot.
14 |
15 | Expected LLM Response
16 | {
17 | "steps": [
18 | {
19 | "function": "...",
20 | "parameters": {
21 | "key1": "value1",
22 | ...
23 | },
24 | "human_readable_justification": "..."
25 | },
26 | {...},
27 | ...
28 | ],
29 | "done": ...
30 | }
31 |
32 | "function" is the function name to call in the executor.
33 | "parameters" is the parameters of the above function.
34 | "human_readable_justification" is what we can use to debug in case program fails somewhere or to explain to user why we're doing what we're doing.
35 | "done" is null if user request is not complete, and it's a string when it's complete that either contains the information that the user asked for, or just acknowledges completion of the user requested task. This is going to be communicated to the user if it's present. Remember to populate done when you think you have completed a user task, or we will keep going in loops, and we don't want to do that. But also make sure with a screenshot that the job is actually done. This is important.
36 |
37 | To control the keyboard and mouse of my computer, use the pyautogui library.
38 | Keyboard Documentation: [Text from: https://raw.githubusercontent.com/asweigart/pyautogui/master/docs/keyboard.rst]
39 | Mouse Documentation: [Text from: https://raw.githubusercontent.com/asweigart/pyautogui/master/docs/mouse.rst]
40 | Be mindful to use the correct parameter name for its corresponding function call - this is very important.
41 | Also keep the typing interval low around 0.05.
42 | In addition to pyautogui, you can also call sleep(seconds) to wait for apps, web pages, and other things to load.
43 |
44 | Here are some directions based on your past behavior to make you better:
45 | 1. If you think a task is complete, don't keep enqueuing more steps. Just fill the "done" parameter with value. This is very important.
46 | 2. Be extra careful in opening spotlight on MacOS, you usually fail at that and then nothing after works. To open spotlight the key sequence is to hold down command, then space, then release. This is very important.
47 | 3. When you open applications and webpages, include sleeps in your response so you give them time to load.
48 | 4. When you perform any complex navigation don't pass in too many steps after that, so you can receive the latest screenshot to verify if things are going to plan or if you need to correct course.
49 | 5. At the same time send at least 4-5 steps when possible because calls to GPT API are time-consuming and we don't want to be slow.
50 | 6. Break down your response into very simple steps. This is very important.
51 | 7. Do not use pyautogui's mouse commands. Completely rely on keyboard functions. You do extremely poorly with mouse navigation.
52 | 8. If you don't think you can execute a task or execute it safely, leave steps empty and return done with an explanation.
53 | 9. Very importantly don't respond in anything but JSON.
54 | 10. Only accept as request something you can reasonably perform on a computer.
55 | 11. Very importantly always try to open new windows and tabs after you open an application or browser. This is so that we don't overwrite any user data. This is very important.
56 | 12. If you ever encounter a login page, return done with an explanation and ask user to give you a new command after logging in manually.
57 | 13. Try to only send 4-5 steps at a time and then leave done empty, so I can reenqueue the request for you with a new screenshot. This is very important! Without new screenshots you generally do not perform well.
58 | 14. pyautogui.press("enter") is not the same as pyautogui.write("\n") - please do not interchange them.
59 | 15. Try going to links directly instead of searching for them. This is very important.
60 | 16. Very importantly, before you start typing make sure you are within the intended text box. Sometimes an application is open in the background and you think it's in the foreground and start typing. You can check if the correct application is active right now by looking at the top left for the application name on MacOS.
61 | 17. Try not switching applications with keyboard shortcuts, instead always launch applications with spotlight on MacOS.
62 | 18. Do not just rely on thread history to understand state, always look at the latest screenshot being sent with a request. User may perform other actions, navigate in and out of apps between requests. ALWAYS look at state of the system with the screenshot provided.
63 |
64 | Lastly, do not ever, ever do anything to hurt the user or the computer system - do not perform risky deletes, or any other similar actions.
65 |
66 | I will now show you the source code so you can better understand how your responses will be interpreted.
67 |
68 | class Core:
69 | def __init__(self):
70 | self.llm = LLM()
71 | self.interpreter = Interpreter()
72 | def run(self):
73 | while True:
74 | user_request = input("\nEnter your request: ").strip()
75 | self.execute(user_request)
76 | def execute(self, user_request, step_num=0):
77 | """
78 | user_request: The original user request
79 |         step_num: the number of times we've called the LLM for this request.
80 | Used to keep track of whether it's a fresh request we're processing (step number 0), or if we're already in the middle of one.
81 | Without it the LLM kept looping after finishing the user request.
82 | Also, it is needed because the LLM we are using doesn't have a stateful/assistant mode.
83 | """
84 | instructions = self.llm.get_instructions_for_objective(user_request, step_num)
85 | # Send to Interpreter and Executor
86 | self.interpreter.process(instructions["steps"]) # GPTToLocalInterface.py
87 | if instructions["done"]:
88 | # Communicate Results
89 | print(instructions["done"])
90 | else:
91 | # if not done, continue to next phase
92 | self.execute(user_request, step_num + 1)
93 |
94 | class Interpreter:
95 | def __init__(self):
96 | pass
97 | def process(self, json_commands):
98 | for command in json_commands:
99 | function_name = command["function"]
100 | parameters = command.get('parameters', {})
101 | self.execute_function(function_name, parameters)
102 | def execute_function(self, function_name, parameters):
103 | """
104 | We are expecting only two types of function calls below
105 | 1. time.sleep() - to wait for web pages, applications, and other things to load.
106 | 2. pyautogui calls to interact with system's mouse and keyboard.
107 | """
108 | if function_name == "sleep" and parameters.get("secs"):
109 | sleep(parameters.get("secs"))
110 | elif hasattr(pyautogui, function_name):
111 | # Execute the corresponding pyautogui function i.e. Keyboard or Mouse commands.
112 | function_to_call = getattr(pyautogui, function_name)
113 | # Special handling for the 'write' function
114 | if function_name == 'write' and ('string' in parameters or 'text' in parameters):
115 | # 'write' function expects a string, not a 'text' keyword argument. LLM sometimes gets confused on what to send.
116 | string_to_write = parameters.get('string') or parameters.get('text')
117 | interval = parameters.get('interval', 0.05)
118 | function_to_call(string_to_write, interval=interval)
119 | elif function_name == 'press' and ('keys' in parameters or 'key' in parameters):
120 | # 'press' can take a list of keys or a single key
121 |                 keys_to_press = parameters.get('keys') or parameters.get('key')
122 | presses = parameters.get('presses', 1)
123 | interval = parameters.get('interval', 0.0)
124 | for key in keys_to_press:
125 | function_to_call(key, presses=presses, interval=interval)
126 | elif function_name == 'hotkey':
127 | # 'hotkey' function expects multiple key arguments, not a list
128 | function_to_call(*parameters['keys'])
129 | else:
130 | # For other functions, pass the parameters as they are
131 | function_to_call(**parameters)
132 | else:
133 | print(f"No such function {function_name} in our interface's interpreter")
134 | class LLM:
135 | def __init__(self):
136 | self.client = OpenAI()
137 | self.model = "gpt-4o"
138 | with open('context.txt', 'r') as file:
139 | self.context = file.read()
140 | self.context += f"\nDefault browser is {local_info.default_browser}."
141 | self.context += f" Locally installed apps are {','.join(local_info.locally_installed_apps)}."
142 | self.context += f" Primary screen size is {Screen().get_size()}.\n"
143 | self.assistant = self.client.beta.assistants.create(
144 | name="Open Interface Backend",
145 | instructions=self.context,
146 | model="gpt-4o",
147 | )
148 | self.thread = self.client.beta.threads.create()
149 | def get_instructions_for_objective(self, original_user_request, step_num=0):
150 | openai_file_id_for_screenshot, temp_filename = self.upload_screenshot_and_get_file_id()
151 | formatted_user_request = self.format_user_request_for_llm(original_user_request, step_num,
152 | openai_file_id_for_screenshot)
153 | llm_response = self.send_message_to_llm_v2(formatted_user_request)
154 | json_instructions: dict[str, Any] = self.convert_llm_response_to_json_v2(llm_response)
155 | return json_instructions
156 | def format_user_request_for_llm(self, original_user_request, step_num, openai_file_id_for_screenshot) -> list[
157 | dict[str, Any]]:
158 | request_data: str = json.dumps({
159 | 'original_user_request': original_user_request,
160 | 'step_num': step_num
161 | })
162 | content = [
163 | {
164 | 'type': 'text',
165 | 'text': request_data
166 | },
167 | {
168 | 'type': 'image_file',
169 | 'image_file': {
170 | 'file_id': openai_file_id_for_screenshot
171 | }
172 | }
173 | ]
174 | return content
175 | def send_message_to_llm_v2(self, formatted_user_request) -> Message:
176 | message = self.client.beta.threads.messages.create(
177 | thread_id=self.thread.id,
178 | role="user",
179 | content=formatted_user_request
180 | )
181 | run = self.client.beta.threads.runs.create_and_poll(
182 | thread_id=self.thread.id,
183 | assistant_id=self.assistant.id,
184 | instructions=''
185 | )
186 |         while run.status in ('queued', 'in_progress', 'cancelling'):
187 |             print(f'Waiting for response, sleeping for 1. run.status={run.status}')
188 |             time.sleep(1)
189 |             run = self.client.beta.threads.runs.retrieve(thread_id=self.thread.id, run_id=run.id)  # refresh the run's status
190 |         if run.status == 'failed':
191 |             print(f'failed run run.required_action:{run.required_action} run.last_error: {run.last_error}\n\n')
192 | if run.status == 'completed':
193 | # NOTE: Apparently right now the API doesn't have a way to retrieve just the last message???
194 | # So instead you get all messages and take the latest one
195 | response = self.client.beta.threads.messages.list(
196 | thread_id=self.thread.id)
197 | return response.data[0]
198 | else:
199 | print("Run did not complete successfully.")
200 | return None
201 |     def convert_llm_response_to_json_v2(self, llm_response: Message) -> dict[str, Any]:
202 | llm_response_data: str = llm_response.content[0].text.value.strip()
203 | start_index = llm_response_data.find('{')
204 | end_index = llm_response_data.rfind('}')
205 | try:
206 | json_response = json.loads(llm_response_data[start_index:end_index + 1].strip())
207 | except Exception as e:
208 | print(f'Error while parsing JSON response - {e}')
209 | json_response = {}
210 | return json_response
211 | End of code
212 |
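
For illustration, a response that follows the directions above parses (in Core.execute, via get_instructions_for_objective) into a dict shaped like the sketch below. The "steps"/"done" keys and the "function"/"parameters" fields are taken from the Core and Interpreter code shown above; the concrete task (launching Notes via Spotlight) and its values are only an example, not part of the original prompt.

    # Illustrative only: what one intermediate batch of parsed instructions might look like.
    instructions = {
        "steps": [
            {"function": "hotkey", "parameters": {"keys": ["command", "space"]}},         # open Spotlight
            {"function": "sleep", "parameters": {"secs": 1}},                             # let Spotlight appear
            {"function": "write", "parameters": {"string": "Notes", "interval": 0.05}},
            {"function": "press", "parameters": {"keys": ["enter"]}},
            {"function": "sleep", "parameters": {"secs": 2}},                             # wait for the app to load
        ],
        "done": ""  # left empty so the request is re-enqueued with a fresh screenshot; filled with an explanation once the task is complete
    }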
--------------------------------------------------------------------------------
/app/ui.py:
--------------------------------------------------------------------------------
1 | import threading
2 | import webbrowser
3 | from multiprocessing import Queue
4 | from pathlib import Path
5 |
6 | # import speech_recognition as sr
7 | import ttkbootstrap as ttk
8 | from PIL import Image, ImageTk
9 |
10 | from llm import DEFAULT_MODEL_NAME
11 | from utils.settings import Settings
12 | from version import version
13 |
14 |
15 | def open_link(url) -> None:
16 | webbrowser.open_new(url)
17 |
18 |
19 | class UI:
20 | def __init__(self):
21 | self.main_window = self.MainWindow()
22 |
23 | def run(self) -> None:
24 | self.main_window.mainloop()
25 |
26 | def display_current_status(self, text: str):
27 | self.main_window.update_message(text)
28 |
29 | class AdvancedSettingsWindow(ttk.Toplevel):
30 | """
31 | Self-contained settings sub-window for the UI
32 | """
33 |
34 | def __init__(self, parent):
35 | super().__init__(parent)
36 | self.title('Advanced Settings')
37 | self.minsize(300, 300)
38 | self.settings = Settings()
39 | self.create_widgets()
40 |
41 | # Populate UI
42 | settings_dict = self.settings.get_dict()
43 |
44 | if 'base_url' in settings_dict:
45 | self.base_url_entry.insert(0, settings_dict['base_url'])
46 | if 'model' in settings_dict:
47 | self.model_entry.insert(0, settings_dict['model'])
48 | self.model_var.set(settings_dict.get('model', 'custom'))
49 | else:
50 | self.model_entry.insert(0, DEFAULT_MODEL_NAME)
51 | self.model_var.set(DEFAULT_MODEL_NAME)
52 |
53 | def create_widgets(self) -> None:
54 | # Radio buttons for model selection
55 | ttk.Label(self, text='Select Model:', bootstyle="primary").pack(pady=10, padx=10)
56 | self.model_var = ttk.StringVar(value='custom') # default selection
57 |
58 | # Create a frame to hold the radio buttons
59 | radio_frame = ttk.Frame(self)
60 | radio_frame.pack(padx=20, pady=10) # Add padding around the frame
61 |
62 | models = [
63 | ('GPT-4o (Default. Medium-Accurate, Medium-Fast)', 'gpt-4o'),
64 | ('GPT-4o-mini (Cheapest, Fastest)', 'gpt-4o-mini'),
65 | ('GPT-4v (Deprecated. Most-Accurate, Slowest)', 'gpt-4-vision-preview'),
66 | ('GPT-4-Turbo (Least Accurate, Fast)', 'gpt-4-turbo'),
67 | ('', ''),
68 | ('Gemini gemini-2.0-flash (Free, Fast)', 'gemini-2.0-flash'),
69 | ('Gemini gemini-2.0-flash-lite', 'gemini-2.0-flash-lite'),
70 | ('Gemini gemini-2.0-flash-thinking-exp', 'gemini-2.0-flash-thinking-exp'),
71 | ('Gemini gemini-2.0-pro-exp-02-05', 'gemini-2.0-pro-exp-02-05'),
72 | ('', ''),
73 | ('Custom (Specify Settings Below)', 'custom')
74 | ]
75 | for text, value in models:
76 | if text == '' and value == '':
77 | ttk.Separator(radio_frame, orient='horizontal').pack(fill='x', pady=10)
78 | else:
79 | ttk.Radiobutton(radio_frame, text=text, value=value, variable=self.model_var, bootstyle="info").pack(
80 | anchor=ttk.W, pady=5)
81 |
82 | label_base_url = ttk.Label(self, text='Custom OpenAI-Like API Model Base URL', bootstyle="secondary")
83 | label_base_url.pack(pady=10)
84 |
85 | # Entry for Base URL
86 | self.base_url_entry = ttk.Entry(self, width=30)
87 | self.base_url_entry.pack()
88 |
89 | # Model Label
90 | label_model = ttk.Label(self, text='Custom Model Name:', bootstyle="secondary")
91 | label_model.pack(pady=10)
92 |
93 | # Entry for Model
94 | self.model_entry = ttk.Entry(self, width=30)
95 | self.model_entry.pack()
96 |
97 | # Save Button
98 | save_button = ttk.Button(self, text='Save Settings', bootstyle="success", command=self.save_button)
99 | save_button.pack(pady=20)
100 |
101 | # Restart App Label
102 | restart_app_label = ttk.Label(self, text='Restart the app after any change in settings',
103 | font=('Helvetica', 10))
104 | restart_app_label.pack(pady=(0, 20))
105 |
106 | def save_button(self) -> None:
107 | base_url = self.base_url_entry.get().strip()
108 | model = self.model_var.get() if self.model_var.get() != 'custom' else self.model_entry.get().strip()
109 | settings_dict = {
110 | 'base_url': base_url,
111 | 'model': model,
112 | }
113 |
114 | self.settings.save_settings_to_file(settings_dict)
115 | self.destroy()
116 |
117 | class SettingsWindow(ttk.Toplevel):
118 | """
119 | Self-contained settings sub-window for the UI
120 | """
121 |
122 | def __init__(self, parent):
123 | super().__init__(parent)
124 | self.title('Settings')
125 | self.minsize(300, 450)
126 | self.available_themes = ['darkly', 'cyborg', 'journal', 'solar', 'superhero']
127 | self.create_widgets()
128 |
129 | self.settings = Settings()
130 |
131 | # Populate UI
132 | settings_dict = self.settings.get_dict()
133 |
134 | if 'api_key' in settings_dict:
135 | self.api_key_entry.insert(0, settings_dict['api_key'])
136 | if 'default_browser' in settings_dict:
137 | self.browser_combobox.set(settings_dict['default_browser'])
138 | if 'play_ding_on_completion' in settings_dict:
139 | self.play_ding.set(1 if settings_dict['play_ding_on_completion'] else 0)
140 |             if 'custom_llm_instructions' in settings_dict:
141 | self.llm_instructions_text.insert('1.0', settings_dict['custom_llm_instructions'])
142 | self.theme_combobox.set(settings_dict.get('theme', 'superhero'))
143 |
144 | def create_widgets(self) -> None:
145 | # API Key Widgets
146 | label_api = ttk.Label(self, text='OpenAI/Gemini/LLM Model API Key:', bootstyle="info")
147 | label_api.pack(pady=10)
148 | self.api_key_entry = ttk.Entry(self, width=30)
149 | self.api_key_entry.pack()
150 |
151 | # Label for Browser Choice
152 | label_browser = ttk.Label(self, text='Choose Default Browser:', bootstyle="info")
153 | label_browser.pack(pady=10)
154 |
155 | # Dropdown for Browser Choice
156 | self.browser_var = ttk.StringVar()
157 | self.browser_combobox = ttk.Combobox(self, textvariable=self.browser_var,
158 | values=['Safari', 'Firefox', 'Chrome'])
159 | self.browser_combobox.pack(pady=5)
160 | self.browser_combobox.set('Choose Browser')
161 |
162 | # Label for Custom LLM Guidance
163 | label_llm = ttk.Label(self, text='Custom LLM Guidance:', bootstyle="info")
164 | label_llm.pack(pady=10)
165 |
166 | # Text Box for Custom LLM Instructions
167 | self.llm_instructions_text = ttk.Text(self, height=10, width=50)
168 | self.llm_instructions_text.pack(padx=(10, 10), pady=(0, 10))
169 |
170 | # Checkbox for "Play Ding" option
171 | self.play_ding = ttk.IntVar()
172 | play_ding_checkbox = ttk.Checkbutton(self, text="Play Ding on Completion", variable=self.play_ding,
173 | bootstyle="round-toggle")
174 | play_ding_checkbox.pack(pady=10)
175 |
176 | # Theme Selection Widgets
177 | label_theme = ttk.Label(self, text='UI Theme:', bootstyle="info")
178 | label_theme.pack()
179 | self.theme_var = ttk.StringVar()
180 | self.theme_combobox = ttk.Combobox(self, textvariable=self.theme_var, values=self.available_themes,
181 | state="readonly")
182 | self.theme_combobox.pack(pady=5)
183 | self.theme_combobox.set('superhero')
184 | # Add binding for immediate theme change
185 |         self.theme_combobox.bind('<<ComboboxSelected>>', self.on_theme_change)
186 |
187 | # Button to open Advanced Settings
188 | advanced_settings_button = ttk.Button(self, text='Advanced Settings', bootstyle="info",
189 | command=self.open_advanced_settings)
190 | advanced_settings_button.pack(pady=(10, 0))
191 |
192 | # Save Button
193 | save_button = ttk.Button(self, text='Save Settings', bootstyle="success", command=self.save_button)
194 | save_button.pack(pady=5)
195 |
196 | # Restart App Label
197 | restart_app_label = ttk.Label(self, text='Restart the app after any change in settings',
198 | font=('Helvetica', 10))
199 | restart_app_label.pack(pady=(0, 10))
200 |
201 | # Hyperlink Label
202 | link_label = ttk.Label(self, text='Setup Instructions', bootstyle="primary")
203 | link_label.pack()
204 |         link_label.bind('<Button-1>', lambda e: open_link(
205 | 'https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#setup-%EF%B8%8F'))
206 |
207 | # Check for updates Label
208 | update_label = ttk.Label(self, text='Check for Updates', bootstyle="primary")
209 | update_label.pack()
210 |         update_label.bind('<Button-1>', lambda e: open_link(
211 | 'https://github.com/AmberSahdev/Open-Interface/releases/latest'))
212 |
213 | # Version Label
214 | version_label = ttk.Label(self, text=f'Version: {str(version)}', font=('Helvetica', 10))
215 | version_label.pack(side="bottom", pady=10)
216 |
217 | def on_theme_change(self, event=None) -> None:
218 | # Apply theme immediately when selected
219 | theme = self.theme_var.get()
220 | self.master.change_theme(theme)
221 |
222 | def save_button(self) -> None:
223 | theme = self.theme_var.get()
224 | api_key = self.api_key_entry.get().strip()
225 | default_browser = self.browser_var.get()
226 | settings_dict = {
227 | 'api_key': api_key,
228 | 'default_browser': default_browser,
229 | 'play_ding_on_completion': bool(self.play_ding.get()),
230 | 'custom_llm_instructions': self.llm_instructions_text.get("1.0", "end-1c").strip(),
231 | 'theme': theme
232 | }
233 |
234 | # Remove redundant theme change since it's already applied
235 | self.settings.save_settings_to_file(settings_dict)
236 | self.destroy()
237 |
238 | def open_advanced_settings(self):
239 | # Open the advanced settings window
240 | UI.AdvancedSettingsWindow(self)
241 |
242 | class MainWindow(ttk.Window):
243 | def __init__(self):
244 | settings = Settings()
245 | settings_dict = settings.get_dict()
246 | theme = settings_dict.get('theme', 'superhero')
247 |
248 | try:
249 | super().__init__(themename=theme)
250 | except:
251 | super().__init__() # https://github.com/AmberSahdev/Open-Interface/issues/35
252 |
253 | self.title('Open Interface')
254 | window_width = 450
255 | window_height = 270
256 | self.minsize(window_width, window_height)
257 |
258 | # Set the geometry of the window
259 |         # Calculate position for the top right corner
260 | screen_width = self.winfo_screenwidth()
261 | x_position = screen_width - window_width - 10 # 10px margin from the right edge
262 |         y_position = 50  # 50px margin from the top edge
263 | self.geometry(f'{window_width}x{window_height}+{x_position}+{y_position}')
264 |
265 |         # PhotoImage object needs to persist as long as the app does, hence it's stored as an instance attribute.
266 | path_to_icon_png = Path(__file__).resolve().parent.joinpath('resources', 'icon.png')
267 | # path_to_microphone_png = Path(__file__).resolve().parent.joinpath('resources', 'microphone.png')
268 | self.logo_img = ImageTk.PhotoImage(Image.open(path_to_icon_png).resize((50, 50)))
269 | # self.mic_icon = ImageTk.PhotoImage(Image.open(path_to_microphone_png).resize((18, 18)))
270 |
271 |         # This adds the app icon on Linux, which PyInstaller can't do on its own
272 | self.tk.call('wm', 'iconphoto', self._w, self.logo_img)
273 |
274 | ###
275 | # MP Queue to facilitate communication between UI and Core.
276 | # Put user requests received from UI text box into this queue which will then be dequeued in App to be sent
277 | # to core.
278 | self.user_request_queue = Queue()
279 |
280 | # Put messages to display on the UI here so we can dequeue them in the main thread
281 | self.message_display_queue = Queue()
282 | # Set up periodic UI processing
283 | self.after(200, self.process_message_display_queue)
284 | ###
285 |
286 | self.create_widgets()
287 |
288 | def change_theme(self, theme_name: str) -> None:
289 | self.style.theme_use(theme_name)
290 |
291 | def create_widgets(self) -> None:
292 | # Creates and arranges the UI elements
293 | # Frame
294 | frame = ttk.Frame(self, padding='10 10 10 10')
295 | frame.grid(column=0, row=0, sticky=(ttk.W, ttk.E, ttk.N, ttk.S))
296 | frame.columnconfigure(0, weight=1)
297 |
298 | logo_label = ttk.Label(frame, image=self.logo_img)
299 | logo_label.grid(column=0, row=0, sticky=ttk.W, pady=(10, 20))
300 |
301 | # Heading Label
302 | heading_label = ttk.Label(frame, text='What would you like me to do?', font=('Helvetica', 16),
303 | bootstyle="primary",
304 | wraplength=300)
305 | heading_label.grid(column=0, row=1, columnspan=3, sticky=ttk.W)
306 |
307 | # Entry widget
308 | self.entry = ttk.Entry(frame, width=38)
309 | self.entry.grid(column=0, row=2, sticky=(ttk.W, ttk.E))
310 |
311 | # Bind the Enter key to the submit function
312 |         self.entry.bind("<Return>", lambda event: self.execute_user_request())
313 |         self.entry.bind("<KP_Enter>", lambda event: self.execute_user_request())
314 |
315 | # Mic Button
316 | # mic_button = ttk.Button(frame, image=self.mic_icon, bootstyle="link", command=self.start_voice_input_thread)
317 | # mic_button.grid(column=1, row=2, padx=(0, 5))
318 |
319 | # Submit Button
320 | button = ttk.Button(frame, text='Submit', bootstyle="success", command=self.execute_user_request)
321 | button.grid(column=2, row=2, padx=10)
322 |
323 | # Settings Button
324 | settings_button = ttk.Button(self, text='Settings', bootstyle="info-outline", command=self.open_settings)
325 | settings_button.place(relx=1.0, rely=0.0, anchor='ne', x=-5, y=5)
326 |
327 | # Stop Button
328 | stop_button = ttk.Button(self, text='Stop', bootstyle="danger-outline", command=self.stop_previous_request)
329 | stop_button.place(relx=1.0, rely=1.0, anchor='se', x=-10, y=-10)
330 |
331 | # Text display for echoed input
332 | self.input_display = ttk.Label(frame, text='', font=('Helvetica', 16), wraplength=400)
333 | self.input_display.grid(column=0, row=3, columnspan=3, sticky=ttk.W)
334 |
335 | # Text display for additional messages
336 | self.message_display = ttk.Label(frame, text='', font=('Helvetica', 14), wraplength=400)
337 | self.message_display.grid(column=0, row=6, columnspan=3, sticky=ttk.W)
338 |
339 | def open_settings(self) -> None:
340 | UI.SettingsWindow(self)
341 |
342 | def stop_previous_request(self) -> None:
343 | # Interrupt currently running request by queueing a stop signal.
344 | self.user_request_queue.put('stop')
345 |
346 | # force quit program
347 | self.destroy()
348 |
349 | def display_input(self) -> str:
350 | # Get the entry and update the input display
351 | user_input = self.entry.get()
352 | self.input_display['text'] = f'{user_input}'
353 |
354 | # Clear the entry widget
355 | self.entry.delete(0, ttk.END)
356 |
357 | return user_input.strip()
358 |
359 | def execute_user_request(self) -> None:
360 | # Puts the user request received from the UI into the MP queue being read in App to be sent to Core.
361 | user_request = self.display_input()
362 |
363 | if user_request == '' or user_request is None:
364 | return
365 |
366 | self.update_message('Fetching Instructions')
367 |
368 | self.user_request_queue.put(user_request)
369 |
370 | def start_voice_input_thread(self) -> None:
371 | # Start voice input in a separate thread
372 | threading.Thread(target=self.voice_input, daemon=True).start()
373 |
374 | def voice_input(self) -> None:
375 | # Function to handle voice input
376 | # Currently commented out because the speech_recognition library doesn't compile well on MacOS.
377 | # TODO: Replace with an alternative library
378 | """
379 | recognizer = sr.Recognizer()
380 | with sr.Microphone() as source:
381 | self.update_message('Listening...')
382 | # This might also help with asking for mic permissions on Macs
383 | recognizer.adjust_for_ambient_noise(source)
384 | try:
385 | audio = recognizer.listen(source, timeout=4)
386 | try:
387 | text = recognizer.recognize_google(audio)
388 | self.entry.delete(0, ttk.END)
389 | self.entry.insert(0, text)
390 | self.update_message('')
391 | except sr.UnknownValueError:
392 | self.update_message('Could not understand audio')
393 | except sr.RequestError as e:
394 | self.update_message(f'Could not request results - {e}')
395 | except sr.WaitTimeoutError:
396 | self.update_message('Didn\'t hear anything')
397 | """
398 |
399 | def update_message(self, message: str) -> None:
400 | # Update the message display with the provided text.
401 | # Ensure thread safety when updating the Tkinter GUI.
402 | try:
403 | if threading.current_thread() is threading.main_thread():
404 | self.message_display['text'] = message
405 | else:
406 | self.message_display_queue.put(message)
407 | except Exception as e:
408 | print(f"Error updating message: {e}")
409 |
410 | def process_message_display_queue(self):
411 | try:
412 | while not self.message_display_queue.empty():
413 | message = self.message_display_queue.get_nowait()
414 | self.message_display.config(text=message)
415 | except Exception as e:
416 | print(f"Error processing message_display_queue: {e}")
417 |
418 |         # Call this function again every 200ms
419 | self.after(200, self.process_message_display_queue)
420 |
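
To make the queue wiring above concrete: a minimal sketch (not the actual app.py/core.py code, whose wiring may differ) of a consumer that drains user_request_queue from outside the UI thread and reports progress back through display_current_status, which routes the text through message_display_queue so the Tkinter label is only updated on the main thread.

    # Hypothetical consumer loop - illustrative only; the real hand-off to Core lives in app.py.
    def consume_requests(ui: UI) -> None:
        while True:
            request = ui.main_window.user_request_queue.get()  # blocks until the UI enqueues a request
            if request == 'stop':  # sentinel queued by the Stop button
                break
            ui.display_current_status(f'Working on: {request}')
            # ... hand the request off to Core here ...
            ui.display_current_status('Done')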
--------------------------------------------------------------------------------
/app/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/app/utils/__init__.py
--------------------------------------------------------------------------------
/app/utils/local_info.py:
--------------------------------------------------------------------------------
1 | import os
2 | import platform
3 |
4 | """
5 | List the apps the user has locally, default browsers, etc.
6 | """
7 | try:
8 | locally_installed_apps: list[str] = [app for app in os.listdir('/Applications') if app.endswith('.app')]
9 | except:
10 | locally_installed_apps: list[str] = ["Unknown"]
11 |
12 | operating_system: str = platform.platform()
13 |
--------------------------------------------------------------------------------
/app/utils/screen.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import io
3 | import os
4 | import tempfile
5 |
6 | import pyautogui
7 | from PIL import Image
8 | from utils.settings import Settings
9 |
10 |
11 | class Screen:
12 | def get_size(self) -> tuple[int, int]:
13 | screen_width, screen_height = pyautogui.size() # Get the size of the primary monitor.
14 | return screen_width, screen_height
15 |
16 | def get_screenshot(self) -> Image.Image:
17 |         # Requires the Screen Recording permission to be enabled in System Settings
18 | img = pyautogui.screenshot() # Takes roughly 100ms # img.show()
19 | return img
20 |
21 | def get_screenshot_in_base64(self) -> str:
22 | # Base64 images work with ChatCompletions API but not Assistants API
23 | img_bytes = self.get_screenshot_as_file_object()
24 | encoded_image = base64.b64encode(img_bytes.read()).decode('utf-8')
25 | return encoded_image
26 |
27 | def get_screenshot_as_file_object(self):
28 | # In memory files don't work with OpenAI Assistants API because of missing filename attribute
29 | img_bytes = io.BytesIO()
30 | img = self.get_screenshot()
31 | img.save(img_bytes, format='PNG') # Save the screenshot to an in-memory file.
32 | img_bytes.seek(0)
33 | return img_bytes
34 |
35 | def get_temp_filename_for_current_screenshot(self):
36 | with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmpfile:
37 | screenshot = self.get_screenshot()
38 | screenshot.save(tmpfile.name)
39 | return tmpfile.name
40 |
41 | def get_screenshot_file(self):
42 | # Gonna always keep a screenshot.png in ~/.open-interface/ because file objects, temp files, every other way has an error
43 | filename = 'screenshot.png'
44 | filepath = os.path.join(Settings().get_settings_directory_path(), filename)
45 | img = self.get_screenshot()
46 | img.save(filepath)
47 | return filepath
48 |
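
As the comments above note, the base64 form is intended for the Chat Completions API, while the Assistants API needs a file on disk. A minimal sketch of the Chat Completions usage, with a placeholder model name and prompt (not necessarily how llm.py does it):

    # Illustrative only.
    from openai import OpenAI

    encoded_image = Screen().get_screenshot_in_base64()
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is on the screen."},  # placeholder prompt
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_image}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)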
--------------------------------------------------------------------------------
/app/utils/settings.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import json
3 | import os
4 | from pathlib import Path
5 |
6 |
7 | class Settings:
8 | def __init__(self):
9 | self.settings_file_path = self.get_settings_directory_path() + 'settings.json'
10 | os.makedirs(os.path.dirname(self.settings_file_path), exist_ok=True)
11 | self.settings = self.load_settings_from_file()
12 |
13 | def get_settings_directory_path(self):
14 | return str(Path.home()) + '/.open-interface/'
15 |
16 | def get_dict(self) -> dict[str, str]:
17 | return self.settings
18 |
19 | def _read_settings_file(self) -> dict[str, str]:
20 | if os.path.exists(self.settings_file_path):
21 | with open(self.settings_file_path, 'r') as file:
22 | try:
23 | return json.load(file)
24 | except Exception:
25 | return {}
26 | return {}
27 |
28 | def save_settings_to_file(self, settings_dict) -> None:
29 | settings: dict[str, str] = self._read_settings_file()
30 |
31 | for setting_name in settings_dict:
32 | setting_val = settings_dict[setting_name]
33 | if setting_val is not None:
34 | if setting_name == "api_key":
35 | api_key = settings_dict["api_key"]
36 |
37 | # TODO: Now we have two keys OPENAI_API_KEY and GEMINI_API_KEY
38 | os.environ["OPENAI_API_KEY"] = api_key # Set environment variable
39 |
40 | encoded_api_key = base64.b64encode(api_key.encode()).decode()
41 | settings['api_key'] = encoded_api_key
42 | else:
43 | settings[setting_name] = setting_val
44 |
45 | with open(self.settings_file_path, 'w+') as file:
46 | json.dump(settings, file, indent=4)
47 |
48 | def load_settings_from_file(self) -> dict[str, str]:
49 | """
50 | if os.path.exists(self.settings_file_path):
51 | with open(self.settings_file_path, 'r') as file:
52 | try:
53 | settings = json.load(file)
54 | except:
55 | return {}
56 |
57 | # Decode the API key
58 | if 'api_key' in settings:
59 | decoded_api_key = base64.b64decode(settings['api_key']).decode()
60 | settings['api_key'] = decoded_api_key
61 |
62 | return settings
63 | else:
64 | return {}
65 | """
66 | settings: dict[str, str] = self._read_settings_file()
67 | # Decode the API keys
68 | if 'api_key' in settings:
69 | decoded_api_key = base64.b64decode(settings['api_key']).decode()
70 | settings['api_key'] = decoded_api_key
71 |
72 | return settings
73 |
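
A short usage sketch of the round-trip implemented above (the key value is a placeholder): save_settings_to_file base64-encodes 'api_key' before writing it to ~/.open-interface/settings.json and exports it as OPENAI_API_KEY, while load_settings_from_file returns it decoded.

    # Illustrative only.
    from utils.settings import Settings

    Settings().save_settings_to_file({'api_key': 'sk-placeholder', 'theme': 'darkly'})
    reloaded = Settings().get_dict()
    print(reloaded['theme'])    # 'darkly'
    print(reloaded['api_key'])  # 'sk-placeholder' (stored base64-encoded on disk)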
--------------------------------------------------------------------------------
/app/version.py:
--------------------------------------------------------------------------------
1 | from packaging.version import Version
2 |
3 | version = Version('0.9.0')
4 |
--------------------------------------------------------------------------------
/assets/Simple_Bottom_of_Wikipedia_2x.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/Simple_Bottom_of_Wikipedia_2x.gif
--------------------------------------------------------------------------------
/assets/advanced_settings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/advanced_settings.png
--------------------------------------------------------------------------------
/assets/code_web_app_demo_2x.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/code_web_app_demo_2x.gif
--------------------------------------------------------------------------------
/assets/code_web_app_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/code_web_app_image.png
--------------------------------------------------------------------------------
/assets/icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/icon.png
--------------------------------------------------------------------------------
/assets/icon_old.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/icon_old.png
--------------------------------------------------------------------------------
/assets/mac_m3_accessibility.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/mac_m3_accessibility.png
--------------------------------------------------------------------------------
/assets/mac_m3_screenrecording.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/mac_m3_screenrecording.png
--------------------------------------------------------------------------------
/assets/macos_accessibility.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_accessibility.png
--------------------------------------------------------------------------------
/assets/macos_open_anyway.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_open_anyway.png
--------------------------------------------------------------------------------
/assets/macos_screen_recording.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_screen_recording.png
--------------------------------------------------------------------------------
/assets/macos_security.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_security.png
--------------------------------------------------------------------------------
/assets/macos_system_preferences.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_system_preferences.png
--------------------------------------------------------------------------------
/assets/macos_unverified_developer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_unverified_developer.png
--------------------------------------------------------------------------------
/assets/macos_unzip_move_to_applications.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/macos_unzip_move_to_applications.png
--------------------------------------------------------------------------------
/assets/meal_plan_demo_2x.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/meal_plan_demo_2x.gif
--------------------------------------------------------------------------------
/assets/meal_plan_demo_2x_with_annotation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/meal_plan_demo_2x_with_annotation.gif
--------------------------------------------------------------------------------
/assets/meal_plan_demo_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/meal_plan_demo_image.png
--------------------------------------------------------------------------------
/assets/mov_to_2x_mov_and_gif.py:
--------------------------------------------------------------------------------
1 | from moviepy.editor import VideoFileClip, concatenate_videoclips, vfx
2 | import os
3 |
4 | # Load the video
5 | # NOTE: If the video is too big you might have to trim it yourself first
6 | video_name = "wordle_demo"
7 | video = VideoFileClip(video_name + ".mov")
8 |
9 | # Define the segments to remove (in seconds) as tuples (start, end)
10 | segments_to_remove = [(4, 20), (30, 40), (55, 80), (87, 97)]
11 |
12 | # Create a list to hold the subclips
13 | subclips = []
14 |
15 | # Initial start of the first clip
16 | start = 0
17 | for (start_remove, end_remove) in segments_to_remove:
18 | # Add subclip before the segment to remove
19 | subclips.append(video.subclip(start, start_remove))
20 | # Update the start for the next subclip
21 | start = end_remove
22 |
23 | # Add the last subclip after the final segment to remove
24 | subclips.append(video.subclip(start))
25 |
26 | # Concatenate all the subclips
27 | final_clip = concatenate_videoclips(subclips)
28 |
29 | # Speed up the video
30 | final_clip = final_clip.fx(vfx.speedx, 2) # 2x speed
31 |
32 | # Write the result to a file
33 | output_video = video_name + "_2x.mov"
34 | final_clip.write_videofile(output_video, codec='libx264')
35 |
36 | # Generate palette for high-quality GIF
37 | os.system(f"ffmpeg -i {output_video} -vf \"fps=10,scale=1080:-1:flags=bicubic,palettegen\" -y palette.png")
38 |
39 | # Create the GIF using the generated palette
40 | output_gif = video_name + "_2x.gif"
41 | os.system(f"rm {output_gif}")
42 | os.system(f"ffmpeg -i {output_video} -i palette.png -lavfi \"fps=10,scale=1080:-1:flags=bicubic [x]; [x][1:v] paletteuse\" {output_gif}")
--------------------------------------------------------------------------------
/assets/palette.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/palette.png
--------------------------------------------------------------------------------
/assets/set_openai_api_key.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/set_openai_api_key.png
--------------------------------------------------------------------------------
/assets/simple_bottom_of_wikipedia_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/simple_bottom_of_wikipedia_image.png
--------------------------------------------------------------------------------
/assets/ui.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/ui.png
--------------------------------------------------------------------------------
/assets/ui0.8.0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/ui0.8.0.png
--------------------------------------------------------------------------------
/assets/ui2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/ui2.png
--------------------------------------------------------------------------------
/assets/ui_0_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/ui_0_5.png
--------------------------------------------------------------------------------
/assets/ui_0_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/ui_0_6.png
--------------------------------------------------------------------------------
/assets/wordle_demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/wordle_demo.png
--------------------------------------------------------------------------------
/assets/wordle_demo_2x.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/assets/wordle_demo_2x.gif
--------------------------------------------------------------------------------
/build.py:
--------------------------------------------------------------------------------
1 | """
2 | PyInstaller build script
3 |
4 | > python3 -m venv env
5 | > source env/bin/activate (env Scripts activate for Windows)
6 | > python3 -m pip install -r requirements.txt
7 | > python3 -m pip install pyinstaller
8 | > python3 build.py
9 |
10 |
11 | Platform specific libraries that MIGHT be needed for compiling binaries
12 | Linux
13 | - sudo apt install portaudio19-dev
14 | - sudo apt-get install python3-tk python3-dev
15 |
16 | MacOS
17 | - brew install portaudio
18 | - if you're using pyenv, you might also need to install tkinter manually.
19 | I followed this guide https://dev.to/xshapira/using-tkinter-with-pyenv-a-simple-two-step-guide-hh5.
20 |
21 | NOTES:
22 | 1. For use in future projects, note that pyinstaller will print hundreds of unrelated error messages, but to find
23 | the critical error start scrolling upwards from the bottom and find the first error before it starts cleanup and
24 | destroying resources. It will likely be an import or a path error.
25 | 2. Extra steps before using multiprocessing might be required
26 | https://www.pyinstaller.org/en/stable/common-issues-and-pitfalls.html#why-is-calling-multiprocessing-freeze-support-required
27 | 3. Change file reads accordingly
28 | https://pyinstaller.org/en/stable/runtime-information.html#placing-data-files-at-expected-locations-inside-the-bundle
29 | 4. Code signing for MacOS
30 | https://github.com/pyinstaller/pyinstaller/wiki/Recipe-OSX-Code-Signing
31 | https://developer.apple.com/library/archive/technotes/tn2206/_index.html
32 | https://gist.github.com/txoof/0636835d3cc65245c6288b2374799c43
33 | https://github.com/txoof/codesign
34 | https://github.com/The-Nicholas-R-Barrow-Company-LLC/python3-pyinstaller-base-app-codesigning
35 | https://pyinstaller.org/en/stable/feature-notes.html#macos-binary-code-signing
36 | """
37 |
38 | import os
39 | import platform
40 | import sys
41 |
42 | import PyInstaller.__main__
43 |
44 | from app.version import version
45 |
46 |
47 | def build(signing_key=None):
48 | input('Did you remember to increment version.py? ' + str(version))
49 | app_name = 'Open\\ Interface'
50 |
51 | compile(signing_key)
52 |
53 | macos = platform.system() == 'Darwin'
54 | if macos and signing_key:
55 | # Codesign
56 | os.system(
57 | f'codesign --deep --force --verbose --sign "{signing_key}" dist/{app_name}.app --options runtime')
58 |
59 | zip_name = zip()
60 |
61 | if macos and signing_key:
62 | keychain_profile = signing_key.split('(')[0].strip()
63 |
64 | # Notarize
65 | os.system(f'xcrun notarytool submit --wait --keychain-profile "{keychain_profile}" --verbose dist/{zip_name}')
66 | input(f'Check whether notarization was successful using \n\t xcrun notarytool history --keychain-profile {keychain_profile}.\nYou can check debug logs using \n\t xcrun notarytool log --keychain-profile "{keychain_profile}" ')
67 |
68 | # Staple
69 | os.system(f'xcrun stapler staple dist/{app_name}.app')
70 |
71 | # Zip the signed, stapled file
72 | zip_name = zip()
73 |
74 |
75 | def compile(signing_key=None):
76 | # Path to your main application script
77 | app_script = os.path.join('app', 'app.py')
78 |
79 | # Common PyInstaller options
80 | pyinstaller_options = [
81 | '--clean',
82 | '--noconfirm',
83 |
84 | # Debug
85 | # '--debug=all',
86 |
87 | # --- Basics --- #
88 | '--name=Open Interface',
89 | '--icon=app/resources/icon.png',
90 | '--windowed', # Remove this if your application is a console program, also helps to remove this while debugging
91 | # '--onefile', # NOTE: Might not work on Windows. Also discouraged to enable both windowed and one file on Mac.
92 |
93 | # Where to find necessary packages to bundle (python3 -m pip show xxx)
94 | '--paths=./env/lib/python3.12/site-packages',
95 |
96 | # Packaging fails without explicitly including these modules here as shown by the logs outputted by debug=all
97 | '--hidden-import=pyautogui',
98 | '--hidden-import=appdirs',
99 | '--hidden-import=pyparsing',
100 | '--hidden-import=ttkbootstrap',
101 | '--hidden-import=openai',
102 |
103 | # pypi google_genai doesn't play nice with pyinstaller without this
104 | '--hidden-import=google_genai',
105 | '--hidden-import=google',
106 | '--hidden-import=google.genai',
107 |
108 | # NOTE: speech_recognition is the name of the directory that this package is in within ../site-packages/,
109 | # whereas the pypi name is SpeechRecognition (pip install SpeechRecognition).
110 | # This was hard to pin down and took a long time to debug.
111 | # '--hidden-import=speech_recognition',
112 |
113 | # Static files and resources --add-data=src:dest
114 | # - File reads change accordingly - https://pyinstaller.org/en/stable/runtime-information.html#placing-data-files-at-expected-locations-inside-the-bundle
115 | '--add-data=app/resources/*:resources',
116 |
117 | # Manually including source code and submodules because app doesn't launch without it
118 | '--add-data=app/*.py:.',
119 | '--add-data=app/utils/*.py:utils', # Submodules need to be included manually
120 | '--add-data=app/models/*.py:models', # Submodules need to be included manually
121 |
122 | app_script
123 | ]
124 |
125 | # Platform-specific options
126 | if platform.system() == 'Darwin': # MacOS
127 | if signing_key:
128 | pyinstaller_options.extend([
129 | f'--codesign-identity={signing_key}'
130 | ])
131 | # Apple Notarization has a problem because this binary used in speech_recognition is signed with too old an SDK
132 | # from PyInstaller.utils.osx import set_macos_sdk_version
133 | # set_macos_sdk_version('env/lib/python3.12/site-packages/speech_recognition/flac-mac', 10, 9, 0) # NOTE: Change the path according to where your binary is located
134 | elif platform.system() == 'Linux':
135 | pyinstaller_options.extend([
136 | '--hidden-import=PIL._tkinter_finder',
137 | '--onefile'
138 | ])
139 | elif platform.system() == 'Windows':
140 | pyinstaller_options.extend([
141 | '--onefile'
142 | ])
143 |
144 | # Run PyInstaller with the specified options
145 | PyInstaller.__main__.run(pyinstaller_options)
146 | print('Done. Check dist/ for executables.')
147 |
148 |
149 | def zip():
150 | # Zip the app
151 | print('Zipping the executables')
152 | app_name = 'Open\\ Interface'
153 |
154 | zip_name = 'Open-Interface-v' + str(version)
155 | if platform.system() == 'Darwin': # MacOS
156 | if platform.processor() == 'arm':
157 | zip_name = zip_name + '-MacOS-M-Series' + '.zip'
158 | else:
159 | zip_name = zip_name + '-MacOS-Intel' + '.zip'
160 |
161 | # Special zip command for macos to keep the complex directory metadata intact to keep the codesigning valid
162 | zip_cli_command = 'cd dist/; ditto -c -k --sequesterRsrc --keepParent ' + app_name + '.app ' + zip_name
163 | elif platform.system() == 'Linux':
164 | zip_name = zip_name + '-Linux.zip'
165 | zip_cli_command = 'cd dist/; zip -r9 ' + zip_name + ' ' + app_name
166 | elif platform.system() == 'Windows':
167 | zip_name = zip_name + '-Windows.zip'
168 | zip_cli_command = 'cd dist & powershell Compress-Archive -Path \'Open Interface.exe\' -DestinationPath ' + zip_name
169 |
170 | # input(f'zip_cli_command - {zip_cli_command} \nExecute?')
171 | os.system(zip_cli_command)
172 | return zip_name
173 |
174 |
175 | def setup():
176 | # Update the venv with any new updates
177 | os.system("pip install -r requirements.txt")
178 |
179 |
180 | if __name__ == '__main__':
181 | apple_code_signing_key = None
182 | if len(sys.argv) > 1:
183 | apple_code_signing_key = sys.argv[1] # python3 build.py "Developer ID Application: ... (...)"
184 | print("apple_code_signing_key: ", apple_code_signing_key)
185 | elif len(sys.argv) == 1 and platform.system() == 'Darwin':
186 | input("Are you sure you don't wanna sign your code? ")
187 |
188 | setup()
189 | build(apple_code_signing_key)
190 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | altgraph==0.17.4
2 | annotated-types==0.6.0
3 | anyio==4.8.0
4 | certifi==2024.7.4
5 | charset-normalizer==3.3.2
6 | distro==1.9.0
7 | exceptiongroup==1.2.0
8 | h11==0.14.0
9 | httpcore==1.0.2
10 | httpx==0.28.1
11 | idna==3.7
12 | macholib==1.16.3
13 | MouseInfo==0.1.3
14 | moviepy==1.0.3
15 | openai==1.66.3
16 | packaging==24.2
17 | pillow==11.1.0
18 | PyAudio==0.2.14
19 | PyAutoGUI==0.9.54
20 | PyGetWindow==0.0.9
21 | pyinstaller==6.11.0
22 | pyinstaller-hooks-contrib==2024.9
23 | PyMsgBox==1.0.9
24 | pyobjc-core==10.1; sys_platform == 'darwin'
25 | pyobjc-framework-Cocoa==10.1; sys_platform == 'darwin'
26 | pyobjc-framework-Quartz==10.1; sys_platform == 'darwin'
27 | pyperclip==1.8.2
28 | PyRect==0.2.0
29 | PyScreeze==0.1.30
30 | pytweening==1.0.7
31 | requests==2.32.0
32 | rubicon-objc==0.4.7
33 | setuptools==75.3.0
34 | sniffio==1.3.0
35 | tqdm==4.66.4
36 | ttkbootstrap==1.10.1
37 | typing_extensions==4.12.2
38 | urllib3==2.2.2
39 | google-genai==1.5.0
40 |
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AmberSahdev/Open-Interface/c96ffb27d0673a395e7852b643df9ad69d02c1c0/tests/__init__.py
--------------------------------------------------------------------------------
/tests/simple_test.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import threading
4 | import time
5 |
6 | sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
7 | sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../app')))
8 |
9 | from app import App
10 |
11 | from multiprocessing import freeze_support
12 |
13 |
14 | def main():
15 | app = App()
16 | threading.Thread(target=simple_test, args=(app,), daemon=True).start()
17 | app.run()
18 |
19 |
20 | def simple_test(app):
21 | # Says hi, waits 12 seconds, requests to open chrome
22 | time.sleep(1)
23 | put_requests_in_app(app, 'Hello')
24 | time.sleep(12)
25 | put_requests_in_app(app, 'Open Chrome')
26 |
27 |
28 | def put_requests_in_app(app, request):
29 | app.ui.main_window.user_request_queue.put(request)
30 |
31 |
32 | if __name__ == '__main__':
33 | freeze_support() # As required by pyinstaller https://www.pyinstaller.org/en/stable/common-issues-and-pitfalls.html#multi-processing
34 | main()
35 | sys.exit(0)
36 |
--------------------------------------------------------------------------------