├── .env.template ├── .gitignore ├── LICENSE ├── README.md ├── legacy ├── realtime-classes.py └── realtime-simple.py ├── requirements.txt └── src ├── AudioIO.py ├── Realtime.py ├── Socket.py └── main.py /.env.template: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY= 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .venv/ 3 | __pycache__/ 4 | .env 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Pi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OpenAI Realtime API Python Edition 2 | 3 | Python implementation of OpenAI's realtime API 4 | 5 | OpenAI have a Node.js + JavaScript wrapper [here](https://github.com/openai/openai-realtime-api-beta), as well as an [openai-realtime-console](https://github.com/openai/openai-realtime-console) demo project, but as yet nothing in Python, so here's a start at fixing that! 6 | 7 | ### A couple of useful links: 8 | - [Guide](https://platform.openai.com/docs/guides/realtime) 9 | - [API Reference](https://platform.openai.com/docs/api-reference/realtime-client-events) 10 | 11 | 12 | # Getting it running 13 | 14 | - Create a Virtual Environment if you want to: `python -m venv .venv ; source .venv/bin/activate` (note: the activate script must be `source`d, not executed) 15 | 16 | - `pip install -r requirements.txt` 17 | 18 | - Create a `.env` file like `.env.template`, filling in your OpenAI API key 19 | 20 | - Run it 21 | 22 | You can run the legacy files: `python legacy/realtime-simple.py` or `python legacy/realtime-classes.py`, which work while being minimal (especially the first one). Probably good for getting a feel for how it works. 23 | 24 | Alternatively `cd src; python main.py` -- this is the codebase I'll be building off moving forwards. 25 | 26 | 27 | # Notes: 28 | 29 | ## legacy/ 30 | - `legacy/realtime-simple.py` is "Least number of lines that gets the job done" 31 | - `legacy/realtime-classes.py` is arguably tidier 32 | 33 | Both work! The AI talks to you, and you can talk back. 34 | 35 | I have to mute the mic while the AI is speaking, else it gets back its own audio and gets very (entertainingly) confused. "Hello, I'm a helpful assistant!" "Gosh, so am I!" "What a coincidence!" "I know, right?!", etc.
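A quick orientation, since both scripts bury this in threading plumbing: the wire format is tiny. Mic audio goes up base64-encoded in `input_audio_buffer.append` events, and reply audio comes back base64-encoded in `response.audio.delta` events. Here's a minimal sketch of just that encode/decode step (helper names are mine, not part of the repo):

```python
import base64
import json

def encode_mic_chunk(pcm16):
    """Wrap raw PCM16 mic bytes in an input_audio_buffer.append event, ready to send."""
    return json.dumps({
        'type': 'input_audio_buffer.append',
        'audio': base64.b64encode(pcm16).decode('utf-8'),
    })

def decode_audio_delta(raw):
    """Return playable PCM16 bytes from a response.audio.delta event, else None."""
    event = json.loads(raw)
    if event.get('type') == 'response.audio.delta':
        return base64.b64decode(event['delta'])
    return None
```

Everything else in the scripts is plumbing: getting mic bytes into the first helper fast enough, and getting the second helper's output to the speaker without glitching.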
36 | 37 | ## src/ (current/future) 38 | I've abstracted websocket-stuff and audioIO-stuff into Socket.py and AudioIO.py, which leaves Realtime.py free to make more sense. 39 | 40 | I did take a run at doing this async with Trio, but at this point it just gets in the way. Maybe I'll return to an async model. I'm not sold on it, much as I love Trio; exception-handling and teardown are a pain. 41 | 42 | ## Additional note (7 Oct 2024) 43 | After some testing, it's clear that legacy/realtime-simple.py functions crisply, and there is some responsiveness issue with src/. 44 | 45 | This could be a locking issue, with audio arriving from the websocket into an input buffer, which is, on a worker thread, drained to the speakers. It could be with mic-data, which is buffered in a worker thread and drained to the websocket. It could be both, and/or something else. 46 | 47 | Python is not an ideal language for realtime audio processing, and this likely factored into the OpenAI team's decision to initially publish only a Node.js implementation. 48 | 49 | # Vision 50 | 51 | It would be nice to clean this up to act as a fully-featured Python API for this service. 52 | 53 | 54 | # TODO 55 | 56 | - Firstly the code needs picking through, to ensure a clean / robust skeleton. 57 | 58 | - EDIT: Actually the architecture in src/ needs to be revised, to account for the above Additional Note. 59 | 60 | - Need some thought on what such a lib should expose & how to expose it (e.g. callbacks). 61 | 62 | - Fleshing out API support (it's quite a big API). 63 | 64 | - Tool-Use / Function-Calling. 65 | 66 | - User-interruption support via feedback cancellation (currently I'm having to mute the mic while OpenAI audio is playing out of the speakers, which means I can't interrupt it). There's WebRTC AEC (Acoustic Echo Cancellation), but I can't find any off-the-shelf pip library that doesn't require fiddling (building deps). Maybe `pip install adaptfilt` is a good solution.
This looks doable. 67 | 68 | 69 | # Do involve! 70 | 71 | Contributions are invited, in which case you are welcome to contact the author (You'll find a link to the sap.ient.ai Discord on https://github.com/sap-ient-ai upon which I exist as `_p_i_`). 72 | 73 | 74 | # Thanks 75 | 76 | Thanks to https://www.naptha.ai/ for providing vital funding that allows me to Do My Own Thing. 77 | -------------------------------------------------------------------------------- /legacy/realtime-classes.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import pyaudio 3 | import queue 4 | import base64 5 | import json 6 | import os 7 | import time 8 | from websocket import create_connection, WebSocketConnectionClosedException 9 | from dotenv import load_dotenv 10 | import logging 11 | 12 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 13 | 14 | load_dotenv() 15 | 16 | CHUNK_SIZE = 1024 17 | RATE = 24000 18 | FORMAT = pyaudio.paInt16 19 | REENGAGE_DELAY_MS = 500 20 | 21 | 22 | class Socket: 23 | def __init__(self, api_key, ws_url): 24 | self.api_key = api_key 25 | self.ws_url = ws_url 26 | self.ws = None 27 | self.on_msg = None 28 | self._stop_event = threading.Event() 29 | self.recv_thread = None 30 | self.lock = threading.Lock() 31 | 32 | def connect(self): 33 | self.ws = create_connection(self.ws_url, header=[f'Authorization: Bearer {self.api_key}', 'OpenAI-Beta: realtime=v1']) 34 | logging.info('Connected to WebSocket.') 35 | self.recv_thread = threading.Thread(target=self._receive_messages) 36 | self.recv_thread.start() 37 | 38 | def _receive_messages(self): 39 | while not self._stop_event.is_set(): 40 | try: 41 | message = self.ws.recv() 42 | if message and self.on_msg: 43 | self.on_msg(json.loads(message)) 44 | except WebSocketConnectionClosedException: 45 | logging.error('WebSocket connection closed.') 46 | break 47 | except Exception as e: 48 | logging.error(f'Error 
receiving message: {e}') 49 | logging.info('Exiting WebSocket receiving thread.') 50 | 51 | def send(self, data): 52 | try: 53 | with self.lock: 54 | if self.ws: 55 | self.ws.send(json.dumps(data)) 56 | except WebSocketConnectionClosedException: 57 | logging.error('WebSocket connection closed.') 58 | except Exception as e: 59 | logging.error(f'Error sending message: {e}') 60 | 61 | def kill(self): 62 | self._stop_event.set() 63 | if self.ws: 64 | try: 65 | self.ws.send_close() 66 | self.ws.close() 67 | logging.info('WebSocket connection closed.') 68 | except Exception as e: 69 | logging.error(f'Error closing WebSocket: {e}') 70 | if self.recv_thread: 71 | self.recv_thread.join() 72 | 73 | class AudioIO: 74 | def __init__(self, chunk_size=CHUNK_SIZE, rate=RATE, format=FORMAT): 75 | self.chunk_size = chunk_size 76 | self.rate = rate 77 | self.format = format 78 | self.audio_buffer = bytearray() 79 | self.mic_queue = queue.Queue() 80 | self.mic_on_at = 0 81 | self.mic_active = None 82 | self._stop_event = threading.Event() 83 | self.p = pyaudio.PyAudio() 84 | 85 | def _mic_callback(self, in_data, frame_count, time_info, status): 86 | if time.time() > self.mic_on_at: 87 | if not self.mic_active: 88 | logging.info('🎙️🟢 Mic active') 89 | self.mic_active = True 90 | self.mic_queue.put(in_data) 91 | else: 92 | if self.mic_active: 93 | logging.info('🎙️🔴 Mic suppressed') 94 | self.mic_active = False 95 | return (None, pyaudio.paContinue) 96 | 97 | def _spkr_callback(self, in_data, frame_count, time_info, status): 98 | bytes_needed = frame_count * 2 99 | current_buffer_size = len(self.audio_buffer) 100 | 101 | if current_buffer_size >= bytes_needed: 102 | audio_chunk = bytes(self.audio_buffer[:bytes_needed]) 103 | self.audio_buffer = self.audio_buffer[bytes_needed:] 104 | self.mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 105 | else: 106 | audio_chunk = bytes(self.audio_buffer) + b'\x00' * (bytes_needed - current_buffer_size) 107 | self.audio_buffer.clear() 108 | 109 | 
return (audio_chunk, pyaudio.paContinue) 110 | 111 | def start_streams(self): 112 | self.mic_stream = self.p.open( 113 | format=self.format, 114 | channels=1, 115 | rate=self.rate, 116 | input=True, 117 | stream_callback=self._mic_callback, 118 | frames_per_buffer=self.chunk_size 119 | ) 120 | self.spkr_stream = self.p.open( 121 | format=self.format, 122 | channels=1, 123 | rate=self.rate, 124 | output=True, 125 | stream_callback=self._spkr_callback, 126 | frames_per_buffer=self.chunk_size 127 | ) 128 | self.mic_stream.start_stream() 129 | self.spkr_stream.start_stream() 130 | 131 | def stop_streams(self): 132 | self.mic_stream.stop_stream() 133 | self.mic_stream.close() 134 | self.spkr_stream.stop_stream() 135 | self.spkr_stream.close() 136 | self.p.terminate() 137 | 138 | def send_mic_audio(self, socket): 139 | while not self._stop_event.is_set(): 140 | if not self.mic_queue.empty(): 141 | mic_chunk = self.mic_queue.get() 142 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data.') 143 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 144 | socket.send({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 145 | else: time.sleep(0.01)  # yield briefly rather than busy-spinning at 100% CPU 146 | def receive_audio(self, audio_chunk): 147 | self.audio_buffer.extend(audio_chunk) 148 | 149 | 150 | class Realtime: 151 | def __init__(self, api_key, ws_url): 152 | self.socket = Socket(api_key, ws_url) 153 | self.audio_io = AudioIO() 154 | 155 | def start(self): 156 | self.socket.on_msg = self.handle_message 157 | self.socket.connect() 158 | 159 | # Still works if we omit this, just doesn't speak to us first. 160 | self.socket.send({ 161 | 'type': 'response.create', 162 | 'response': { 163 | 'modalities': ['audio', 'text'], 164 | 'instructions': 'Please assist the user.' 
165 | } 166 | }) 167 | 168 | audio_send_thread = threading.Thread(target=self.audio_io.send_mic_audio, args=(self.socket,)) 169 | audio_send_thread.start() 170 | 171 | self.audio_io.start_streams() 172 | 173 | def handle_message(self, message): 174 | event_type = message.get('type') 175 | logging.info(f'Received message type: {event_type}') 176 | 177 | if event_type == 'response.audio.delta': 178 | audio_content = base64.b64decode(message['delta']) 179 | self.audio_io.receive_audio(audio_content) 180 | logging.info(f'Received {len(audio_content)} bytes of audio data.') 181 | 182 | elif event_type == 'response.audio.done': 183 | logging.info('AI finished speaking.') 184 | 185 | def stop(self): 186 | logging.info('Shutting down Realtime session.') 187 | self.audio_io._stop_event.set()  # signal send_mic_audio's thread to exit, else the process never terminates 188 | self.audio_io.stop_streams() 189 | self.socket.kill() 190 | 191 | def main(): 192 | api_key = os.getenv('OPENAI_API_KEY') 193 | ws_url = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 194 | 195 | realtime = Realtime(api_key, ws_url) 196 | 197 | try: 198 | realtime.start() 199 | while True: 200 | time.sleep(0.1) 201 | except KeyboardInterrupt: 202 | logging.info('Gracefully shutting down...') 203 | realtime.stop() 204 | 205 | 206 | if __name__ == '__main__': 207 | main() -------------------------------------------------------------------------------- /legacy/realtime-simple.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.system('cls' if os.name == 'nt' else 'clear') 3 | 4 | import threading 5 | import pyaudio 6 | import queue 7 | import base64 8 | import json 9 | import time 10 | from websocket import create_connection, WebSocketConnectionClosedException 11 | from dotenv import load_dotenv 12 | import logging 13 | 14 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 15 | 16 | load_dotenv() 17 | 18 | CHUNK_SIZE = 1024 19 | RATE = 24000 20 | FORMAT = pyaudio.paInt16 21 | API_KEY = 
os.getenv('OPENAI_API_KEY') 22 | WS_URL = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 23 | 24 | audio_buffer = bytearray() 25 | mic_queue = queue.Queue() 26 | 27 | stop_event = threading.Event() 28 | 29 | mic_on_at = 0 30 | mic_active = None 31 | REENGAGE_DELAY_MS = 500 32 | 33 | 34 | def mic_callback(in_data, frame_count, time_info, status): 35 | global mic_on_at, mic_active 36 | 37 | if time.time() > mic_on_at: 38 | if mic_active is not True: 39 | logging.info('🎙️🟢 Mic active') 40 | mic_active = True 41 | mic_queue.put(in_data) 42 | else: 43 | if mic_active is not False: 44 | logging.info('🎙️🔴 Mic suppressed') 45 | mic_active = False 46 | 47 | return (None, pyaudio.paContinue) 48 | 49 | 50 | def send_mic_audio_to_websocket(ws): 51 | try: 52 | while not stop_event.is_set(): 53 | if mic_queue.empty(): time.sleep(0.01); continue  # yield briefly rather than busy-spinning 54 | mic_chunk = mic_queue.get() 55 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data.') 56 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 57 | message = json.dumps({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 58 | try: 59 | ws.send(message) 60 | except WebSocketConnectionClosedException: 61 | logging.error('WebSocket connection closed.') 62 | break 63 | except Exception as e: 64 | logging.error(f'Error sending mic audio: {e}') 65 | except Exception as e: 66 | logging.error(f'Exception in send_mic_audio_to_websocket thread: {e}') 67 | finally: 68 | logging.info('Exiting send_mic_audio_to_websocket thread.') 69 | 70 | 71 | def spkr_callback(in_data, frame_count, time_info, status): 72 | global audio_buffer, mic_on_at 73 | 74 | bytes_needed = frame_count * 2 75 | current_buffer_size = len(audio_buffer) 76 | 77 | if current_buffer_size >= bytes_needed: 78 | audio_chunk = bytes(audio_buffer[:bytes_needed]) 79 | audio_buffer = audio_buffer[bytes_needed:] 80 | mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 81 | else: 82 | audio_chunk = bytes(audio_buffer) + b'\x00' * (bytes_needed - 
current_buffer_size) 83 | audio_buffer.clear() 84 | 85 | return (audio_chunk, pyaudio.paContinue) 86 | 87 | 88 | def receive_audio_from_websocket(ws): 89 | global audio_buffer 90 | 91 | try: 92 | while not stop_event.is_set(): 93 | try: 94 | message = ws.recv() 95 | if not message: # Handle empty message (EOF or connection close) 96 | logging.info('🔵 Received empty message (possibly EOF or WebSocket closing).') 97 | break 98 | 99 | # Now handle valid JSON messages only 100 | message = json.loads(message) 101 | event_type = message['type'] 102 | logging.info(f'⚡️ Received WebSocket event: {event_type}') 103 | 104 | if event_type == 'response.audio.delta': 105 | audio_content = base64.b64decode(message['delta']) 106 | audio_buffer.extend(audio_content) 107 | logging.info(f'🔵 Received {len(audio_content)} bytes, total buffer size: {len(audio_buffer)}') 108 | 109 | elif event_type == 'response.audio.done': 110 | logging.info('🔵 AI finished speaking.') 111 | 112 | except WebSocketConnectionClosedException: 113 | logging.error('WebSocket connection closed.') 114 | break 115 | except Exception as e: 116 | logging.error(f'Error receiving audio: {e}') 117 | except Exception as e: 118 | logging.error(f'Exception in receive_audio_from_websocket thread: {e}') 119 | finally: 120 | logging.info('Exiting receive_audio_from_websocket thread.') 121 | 122 | 123 | def connect_to_openai(): 124 | ws = None 125 | try: 126 | ws = create_connection(WS_URL, header=[f'Authorization: Bearer {API_KEY}', 'OpenAI-Beta: realtime=v1']) 127 | logging.info('Connected to OpenAI WebSocket.') 128 | 129 | ws.send(json.dumps({ 130 | 'type': 'response.create', 131 | 'response': { 132 | 'modalities': ['audio', 'text'], 133 | 'instructions': 'Please assist the user.' 
134 | } 135 | })) 136 | 137 | # Start the recv and send threads 138 | receive_thread = threading.Thread(target=receive_audio_from_websocket, args=(ws,)) 139 | receive_thread.start() 140 | 141 | mic_thread = threading.Thread(target=send_mic_audio_to_websocket, args=(ws,)) 142 | mic_thread.start() 143 | 144 | # Wait for stop_event to be set 145 | while not stop_event.is_set(): 146 | time.sleep(0.1) 147 | 148 | # Send a close frame and close the WebSocket gracefully 149 | logging.info('Sending WebSocket close frame.') 150 | ws.send_close() 151 | 152 | receive_thread.join() 153 | mic_thread.join() 154 | 155 | logging.info('WebSocket closed and threads terminated.') 156 | except Exception as e: 157 | logging.error(f'Failed to connect to OpenAI: {e}') 158 | finally: 159 | if ws is not None: 160 | try: 161 | ws.close() 162 | logging.info('WebSocket connection closed.') 163 | except Exception as e: 164 | logging.error(f'Error closing WebSocket connection: {e}') 165 | 166 | 167 | def main(): 168 | p = pyaudio.PyAudio() 169 | 170 | mic_stream = p.open( 171 | format=FORMAT, 172 | channels=1, 173 | rate=RATE, 174 | input=True, 175 | stream_callback=mic_callback, 176 | frames_per_buffer=CHUNK_SIZE 177 | ) 178 | 179 | spkr_stream = p.open( 180 | format=FORMAT, 181 | channels=1, 182 | rate=RATE, 183 | output=True, 184 | stream_callback=spkr_callback, 185 | frames_per_buffer=CHUNK_SIZE 186 | ) 187 | 188 | try: 189 | mic_stream.start_stream() 190 | spkr_stream.start_stream() 191 | 192 | connect_to_openai() 193 | 194 | while mic_stream.is_active() and spkr_stream.is_active(): 195 | time.sleep(0.1) 196 | 197 | except KeyboardInterrupt: 198 | logging.info('Gracefully shutting down...') 199 | stop_event.set() 200 | 201 | finally: 202 | mic_stream.stop_stream() 203 | mic_stream.close() 204 | spkr_stream.stop_stream() 205 | spkr_stream.close() 206 | 207 | p.terminate() 208 | logging.info('Audio streams stopped and resources released. 
Exiting.') 209 | 210 | 211 | if __name__ == '__main__': 212 | main() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | python-dotenv 2 | pyaudio 3 | websocket-client 4 | -------------------------------------------------------------------------------- /src/AudioIO.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import queue 3 | import time 4 | import logging 5 | import threading 6 | 7 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 8 | 9 | CHUNK_SIZE = 1024 10 | RATE = 24000 11 | FORMAT = pyaudio.paInt16 12 | REENGAGE_DELAY_MS = 500 13 | 14 | 15 | class AudioIO: 16 | def __init__(self, chunk_size=CHUNK_SIZE, rate=RATE, format=FORMAT, on_audio_callback=None): 17 | self.chunk_size = chunk_size 18 | self.rate = rate 19 | self.format = format 20 | self.audio_buffer = bytearray() 21 | self.mic_queue = queue.Queue() 22 | self.mic_on_at = 0 23 | self.mic_active = None 24 | self._stop_event = threading.Event() 25 | self.p = pyaudio.PyAudio() 26 | self.on_audio_callback = on_audio_callback # Callback for audio data 27 | 28 | def _mic_callback(self, in_data, frame_count, time_info, status): 29 | """ Microphone callback that queues audio chunks. """ 30 | if time.time() > self.mic_on_at: 31 | if not self.mic_active: 32 | logging.info('🎙️🟢 Mic active') 33 | self.mic_active = True 34 | self.mic_queue.put(in_data) 35 | else: 36 | if self.mic_active: 37 | logging.info('🎙️🔴 Mic suppressed') 38 | self.mic_active = False 39 | return (None, pyaudio.paContinue) 40 | 41 | def _spkr_callback(self, in_data, frame_count, time_info, status): 42 | """ Speaker callback that plays audio. 
""" 43 | bytes_needed = frame_count * 2 44 | current_buffer_size = len(self.audio_buffer) 45 | 46 | if current_buffer_size >= bytes_needed: 47 | audio_chunk = bytes(self.audio_buffer[:bytes_needed]) 48 | self.audio_buffer = self.audio_buffer[bytes_needed:] 49 | self.mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 50 | else: 51 | audio_chunk = bytes(self.audio_buffer) + b'\x00' * (bytes_needed - current_buffer_size) 52 | self.audio_buffer.clear() 53 | 54 | return (audio_chunk, pyaudio.paContinue) 55 | 56 | def start_streams(self): 57 | """ Start microphone and speaker streams. """ 58 | self.mic_stream = self.p.open( 59 | format=self.format, 60 | channels=1, 61 | rate=self.rate, 62 | input=True, 63 | stream_callback=self._mic_callback, 64 | frames_per_buffer=self.chunk_size 65 | ) 66 | self.spkr_stream = self.p.open( 67 | format=self.format, 68 | channels=1, 69 | rate=self.rate, 70 | output=True, 71 | stream_callback=self._spkr_callback, 72 | frames_per_buffer=self.chunk_size 73 | ) 74 | self.mic_stream.start_stream() 75 | self.spkr_stream.start_stream() 76 | 77 | def stop_streams(self): 78 | """ Stop and close audio streams. """ 79 | self.mic_stream.stop_stream() 80 | self.mic_stream.close() 81 | self.spkr_stream.stop_stream() 82 | self.spkr_stream.close() 83 | self.p.terminate() 84 | 85 | def process_mic_audio(self): 86 | """ Process microphone audio and call back when new audio is ready. 
""" 87 | while not self._stop_event.is_set(): 88 | if not self.mic_queue.empty(): 89 | mic_chunk = self.mic_queue.get() 90 | logging.info(f'🎤 Processing {len(mic_chunk)} bytes of audio data.') 91 | if self.on_audio_callback: 92 | self.on_audio_callback(mic_chunk) # Pass the audio chunk to the callback 93 | else: 94 | time.sleep(0.05) # Avoid tight loop when no audio is available 95 | 96 | def receive_audio(self, audio_chunk): 97 | """Appends audio data to the buffer for playback.""" 98 | self.audio_buffer.extend(audio_chunk) -------------------------------------------------------------------------------- /src/Realtime.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import threading 4 | 5 | from Socket import Socket 6 | from AudioIO import AudioIO 7 | 8 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 9 | 10 | class Realtime: 11 | def __init__(self, api_key, ws_url): 12 | self.socket = Socket(api_key, ws_url, on_msg=self.handle_message) 13 | self.audio_io = AudioIO(on_audio_callback=self.send_audio_to_socket) 14 | self.audio_thread = None # Store thread references 15 | self.recv_thread = None 16 | 17 | def start(self): 18 | """ Start WebSocket and audio processing. """ 19 | self.socket.connect() 20 | 21 | # Send initial request to start the conversation 22 | self.socket.send({ 23 | 'type': 'response.create', 24 | 'response': { 25 | 'modalities': ['audio', 'text'], 26 | 'instructions': 'Please assist the user.' 27 | } 28 | }) 29 | 30 | # Start processing microphone audio 31 | self.audio_thread = threading.Thread(target=self.audio_io.process_mic_audio) 32 | self.audio_thread.start() 33 | 34 | # Start audio streams (mic and speaker) 35 | self.audio_io.start_streams() 36 | 37 | def send_audio_to_socket(self, mic_chunk): 38 | """ Callback function to send audio data to the socket. 
""" 39 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data to socket.') 40 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 41 | self.socket.send({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 42 | 43 | def handle_message(self, message): 44 | """ Handle incoming WebSocket messages. """ 45 | event_type = message.get('type') 46 | logging.info(f'Received message type: {event_type}') 47 | 48 | if event_type == 'response.audio.delta': 49 | audio_content = base64.b64decode(message['delta']) 50 | self.audio_io.receive_audio(audio_content) 51 | logging.info(f'Received {len(audio_content)} bytes of audio data.') 52 | 53 | elif event_type == 'response.audio.done': 54 | logging.info('AI finished speaking.') 55 | 56 | def stop(self): 57 | """ Stop all processes cleanly. """ 58 | logging.info('Shutting down Realtime session.') 59 | 60 | # Signal threads to stop 61 | self.audio_io._stop_event.set() 62 | self.socket.kill() 63 | 64 | # Stop audio streams 65 | self.audio_io.stop_streams() 66 | 67 | # Join threads to ensure they exit cleanly 68 | if self.audio_thread: 69 | self.audio_thread.join() 70 | logging.info('Audio processing thread terminated.') -------------------------------------------------------------------------------- /src/Socket.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import queue 3 | import json 4 | import logging 5 | import select 6 | from websocket import create_connection, WebSocketConnectionClosedException 7 | 8 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 9 | 10 | class Socket: 11 | def __init__(self, api_key, ws_url, on_msg=None): 12 | self.api_key = api_key 13 | self.ws_url = ws_url 14 | self.ws = None 15 | self.on_msg = on_msg # Callback for when a message is received 16 | self.send_queue = queue.Queue() # Outgoing message queue 17 | self._stop_event = threading.Event() 18 | self.loop_thread = None # 
Store thread reference 19 | 20 | def connect(self): 21 | """ Connect to WebSocket and start main loop. """ 22 | self.ws = create_connection(self.ws_url, header=[f'Authorization: Bearer {self.api_key}', 'OpenAI-Beta: realtime=v1']) 23 | logging.info('Connected to WebSocket.') 24 | 25 | # Start a unified loop for sending and receiving messages 26 | self.loop_thread = threading.Thread(target=self._socket_loop) 27 | self.loop_thread.start() 28 | 29 | def _socket_loop(self): 30 | """ Main loop that handles both sending and receiving messages. """ 31 | while not self._stop_event.is_set(): 32 | try: 33 | # Use select to check if the WebSocket's underlying socket has data to read 34 | rlist, _, _ = select.select([self.ws.sock], [], [], 0.1) 35 | # NB: websocket-client can buffer a parsed frame internally, so select on the raw socket may report nothing to read while a message is already waiting -- a possible contributor to the responsiveness issue noted in the README 36 | # If there's incoming data, receive it 37 | if rlist: 38 | message = self.ws.recv() 39 | if message and self.on_msg: 40 | logging.info(f'Received message: {message}') 41 | self.on_msg(json.loads(message)) # Call the user-provided callback 42 | 43 | # Check if there's a message in the queue to send 44 | try: 45 | outgoing_message = self.send_queue.get_nowait() 46 | self.ws.send(json.dumps(outgoing_message)) 47 | logging.info(f'Sent message: {outgoing_message}') 48 | except queue.Empty: 49 | continue # No message to send, loop back 50 | except WebSocketConnectionClosedException: 51 | logging.error('WebSocket connection closed.') 52 | break 53 | except Exception as e: 54 | logging.error(f'Error in socket loop: {e}') 55 | break 56 | 57 | def send(self, data): 58 | """ Enqueue the message to be sent. """ 59 | self.send_queue.put(data) 60 | 61 | def kill(self): 62 | """ Cleanly shut down the WebSocket and stop the loop. 
""" 63 | logging.info('Shutting down WebSocket.') 64 | self._stop_event.set() 65 | 66 | # Close WebSocket 67 | if self.ws: 68 | try: 69 | self.ws.send_close() 70 | self.ws.close() 71 | logging.info('WebSocket connection closed.') 72 | except Exception as e: 73 | logging.error(f'Error closing WebSocket: {e}') 74 | 75 | # Ensure the loop thread is joined 76 | if self.loop_thread: 77 | self.loop_thread.join() 78 | logging.info('WebSocket loop thread terminated.') -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import signal 3 | import time 4 | import logging 5 | from dotenv import load_dotenv 6 | 7 | from Realtime import Realtime 8 | 9 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 10 | 11 | # Load environment variables from a .env file 12 | load_dotenv() 13 | 14 | quitFlag = False 15 | 16 | def signal_handler(sig, frame, realtime_instance): 17 | """Handle Ctrl+C and initiate graceful shutdown.""" 18 | logging.info('Received Ctrl+C! 
Initiating shutdown...') 19 | global quitFlag 20 | quitFlag = True  # main() falls out of its loop; cleanup happens once, in its finally block 21 | 22 | 23 | def main(): 24 | api_key = os.getenv('OPENAI_API_KEY') 25 | ws_url = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 26 | 27 | if not api_key: 28 | logging.error('OPENAI_API_KEY not found in environment variables!') 29 | return 30 | 31 | realtime = Realtime(api_key, ws_url) 32 | 33 | signal.signal(signal.SIGINT, lambda sig, frame: signal_handler(sig, frame, realtime)) 34 | 35 | try: 36 | realtime.start() 37 | while not quitFlag: 38 | time.sleep(0.1) 39 | 40 | except Exception as e: 41 | logging.error(f'Error in main loop: {e}') 42 | 43 | 44 | finally: 45 | logging.info('Exiting main.') 46 | realtime.stop()  # Single cleanup point: stopping only here avoids double-stopping the audio streams 47 | 48 | if __name__ == '__main__': 49 | main() 50 | --------------------------------------------------------------------------------