├── .env.template ├── .gitignore ├── LICENSE ├── README.md ├── legacy ├── realtime-classes.py └── realtime-simple.py ├── requirements.txt └── src ├── AudioIO.py ├── Realtime.py ├── Socket.py └── main.py /.env.template: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY= 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .venv/ 3 | __pycache__/ 4 | .env 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Pi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OpenAI Realtime API Python Edition 2 | 3 | Python implementation of OpenAI's realtime API 4 | 5 | OpenAI have a Node.js + JavaScript wrapper [here](https://github.com/openai/openai-realtime-api-beta), as well as an [openai-realtime-console](https://github.com/openai/openai-realtime-console) demo project, but as yet nothing in Python, so here's a start at fixing that! 6 | 7 | ### A couple of useful links: 8 | - [Guide](https://platform.openai.com/docs/guides/realtime) 9 | - [API Reference](https://platform.openai.com/docs/api-reference/realtime-client-events) 10 | 11 | 12 | # Getting it running 13 | 14 | - Create a Virtual Environment if you want to: `python -m venv .venv ; source .venv/bin/activate` (note: the activate script must be `source`d, not executed) 15 | 16 | - `pip install -r requirements.txt` 17 | 18 | - Create a `.env` file like `.env.template`, filling in your OpenAI API key 19 | 20 | - Run it 21 | 22 | You can run the legacy files: `python legacy/realtime-simple.py` or `python legacy/realtime-classes.py`, which work while being minimal (especially the first one). Probably good for getting a feel for how it works. 23 | 24 | Alternatively `cd src; python main.py` -- this is the codebase I'll be building off moving forwards. 25 | 26 | 27 | # Notes: 28 | 29 | ## legacy/ 30 | - `legacy/realtime-simple.py` is "Least number of lines that gets the job done" 31 | - `legacy/realtime-classes.py` is arguably tidier 32 | 33 | Both work! The AI talks to you, and you can talk back. 34 | 35 | I have to mute the mic while the AI is speaking, else it gets back its own audio and gets very (entertainingly) confused. "Hello, I'm a helpful assistant!" "Gosh, so am I!" "What a coincidence!" "I know, right?!", etc.
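A quick orientation, since both scripts bury this in threading plumbing: the wire format is tiny. Mic audio goes up base64-encoded in `input_audio_buffer.append` events, and reply audio comes back base64-encoded in `response.audio.delta` events. Here's a minimal sketch of just that encode/decode step (helper names are mine, not part of the repo):

```python
import base64
import json

def encode_mic_chunk(pcm16):
    """Wrap raw PCM16 mic bytes in an input_audio_buffer.append event, ready to send."""
    return json.dumps({
        'type': 'input_audio_buffer.append',
        'audio': base64.b64encode(pcm16).decode('utf-8'),
    })

def decode_audio_delta(raw):
    """Return playable PCM16 bytes from a response.audio.delta event, else None."""
    event = json.loads(raw)
    if event.get('type') == 'response.audio.delta':
        return base64.b64decode(event['delta'])
    return None
```

Everything else in the scripts is plumbing: getting mic bytes into the first helper fast enough, and getting the second helper's output to the speaker without glitching.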
36 | 37 | ## src/ (current/future) 38 | I've abstracted websocket-stuff and audioIO-stuff into Socket.py and AudioIO.py, which leaves Realtime.py free to make more sense. 39 | 40 | I did take a run at doing this async with Trio, but at this point it just gets in the way. Maybe I'll return to an async model. I'm not sold on it, much as I love Trio; exception-handling and teardown are a pain. 41 | 42 | ## Additional note (7 Oct 2024) 43 | After some testing, it's clear that legacy/realtime-simple.py functions crisply, and there is some responsiveness issue with src/. 44 | 45 | This could be a locking issue, with audio arriving from the websocket into an input buffer, which is, on a worker thread, drained to the speakers. It could be with mic-data, which is buffered in a worker thread and drained to the websocket. It could be both, and/or something else. 46 | 47 | Python is not an ideal language for realtime audio processing, and this likely factored into the OpenAI team's decision to initially publish only a Node.js implementation. 48 | 49 | # Vision 50 | 51 | It would be nice to clean this up to act as a fully-featured Python API for this service. 52 | 53 | 54 | # TODO 55 | 56 | - Firstly the code needs picking through, to ensure a clean / robust skeleton. 57 | 58 | - EDIT: Actually the architecture in src/ needs to be revised, to account for the above Additional Note. 59 | 60 | - Need some thought on what such a lib should expose & how to expose it (e.g. callbacks). 61 | 62 | - Fleshing out API support (it's quite a big API). 63 | 64 | - Tool-Use / Function-Calling. 65 | 66 | - User-interruption support via feedback cancellation (currently I'm having to mute the mic while OpenAI audio is playing out of the speakers, which means I can't interrupt it). There's WebRTC AEC (Acoustic Echo Cancellation), but I can't find any off-the-shelf pip library that doesn't require fiddling (building deps). Maybe `pip install adaptfilt` is a good solution.
This looks doable. 67 | 68 | 69 | # Do involve! 70 | 71 | Contributions are invited, in which case you are welcome to contact the author (You'll find a link to the sap.ient.ai Discord on https://github.com/sap-ient-ai upon which I exist as `_p_i_`). 72 | 73 | 74 | # Thanks 75 | 76 | Thanks to https://www.naptha.ai/ for providing vital funding that allows me to Do My Own Thing. 77 | -------------------------------------------------------------------------------- /legacy/realtime-classes.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import pyaudio 3 | import queue 4 | import base64 5 | import json 6 | import os 7 | import time 8 | from websocket import create_connection, WebSocketConnectionClosedException 9 | from dotenv import load_dotenv 10 | import logging 11 | 12 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 13 | 14 | load_dotenv() 15 | 16 | CHUNK_SIZE = 1024 17 | RATE = 24000 18 | FORMAT = pyaudio.paInt16 19 | REENGAGE_DELAY_MS = 500 20 | 21 | 22 | class Socket: 23 | def __init__(self, api_key, ws_url): 24 | self.api_key = api_key 25 | self.ws_url = ws_url 26 | self.ws = None 27 | self.on_msg = None 28 | self._stop_event = threading.Event() 29 | self.recv_thread = None 30 | self.lock = threading.Lock() 31 | 32 | def connect(self): 33 | self.ws = create_connection(self.ws_url, header=[f'Authorization: Bearer {self.api_key}', 'OpenAI-Beta: realtime=v1']) 34 | logging.info('Connected to WebSocket.') 35 | self.recv_thread = threading.Thread(target=self._receive_messages) 36 | self.recv_thread.start() 37 | 38 | def _receive_messages(self): 39 | while not self._stop_event.is_set(): 40 | try: 41 | message = self.ws.recv() 42 | if message and self.on_msg: 43 | self.on_msg(json.loads(message)) 44 | except WebSocketConnectionClosedException: 45 | logging.error('WebSocket connection closed.') 46 | break 47 | except Exception as e: 48 | logging.error(f'Error 
receiving message: {e}') 49 | logging.info('Exiting WebSocket receiving thread.') 50 | 51 | def send(self, data): 52 | try: 53 | with self.lock: 54 | if self.ws: 55 | self.ws.send(json.dumps(data)) 56 | except WebSocketConnectionClosedException: 57 | logging.error('WebSocket connection closed.') 58 | except Exception as e: 59 | logging.error(f'Error sending message: {e}') 60 | 61 | def kill(self): 62 | self._stop_event.set() 63 | if self.ws: 64 | try: 65 | self.ws.send_close() 66 | self.ws.close() 67 | logging.info('WebSocket connection closed.') 68 | except Exception as e: 69 | logging.error(f'Error closing WebSocket: {e}') 70 | if self.recv_thread: 71 | self.recv_thread.join() 72 | 73 | class AudioIO: 74 | def __init__(self, chunk_size=CHUNK_SIZE, rate=RATE, format=FORMAT): 75 | self.chunk_size = chunk_size 76 | self.rate = rate 77 | self.format = format 78 | self.audio_buffer = bytearray() 79 | self.mic_queue = queue.Queue() 80 | self.mic_on_at = 0 81 | self.mic_active = None 82 | self._stop_event = threading.Event() 83 | self.p = pyaudio.PyAudio() 84 | 85 | def _mic_callback(self, in_data, frame_count, time_info, status): 86 | if time.time() > self.mic_on_at: 87 | if not self.mic_active: 88 | logging.info('🎙️🟢 Mic active') 89 | self.mic_active = True 90 | self.mic_queue.put(in_data) 91 | else: 92 | if self.mic_active: 93 | logging.info('🎙️🔴 Mic suppressed') 94 | self.mic_active = False 95 | return (None, pyaudio.paContinue) 96 | 97 | def _spkr_callback(self, in_data, frame_count, time_info, status): 98 | bytes_needed = frame_count * 2 99 | current_buffer_size = len(self.audio_buffer) 100 | 101 | if current_buffer_size >= bytes_needed: 102 | audio_chunk = bytes(self.audio_buffer[:bytes_needed]) 103 | self.audio_buffer = self.audio_buffer[bytes_needed:] 104 | self.mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 105 | else: 106 | audio_chunk = bytes(self.audio_buffer) + b'\x00' * (bytes_needed - current_buffer_size) 107 | self.audio_buffer.clear() 108 | 109 | 
return (audio_chunk, pyaudio.paContinue) 110 | 111 | def start_streams(self): 112 | self.mic_stream = self.p.open( 113 | format=self.format, 114 | channels=1, 115 | rate=self.rate, 116 | input=True, 117 | stream_callback=self._mic_callback, 118 | frames_per_buffer=self.chunk_size 119 | ) 120 | self.spkr_stream = self.p.open( 121 | format=self.format, 122 | channels=1, 123 | rate=self.rate, 124 | output=True, 125 | stream_callback=self._spkr_callback, 126 | frames_per_buffer=self.chunk_size 127 | ) 128 | self.mic_stream.start_stream() 129 | self.spkr_stream.start_stream() 130 | 131 | def stop_streams(self): 132 | self.mic_stream.stop_stream() 133 | self.mic_stream.close() 134 | self.spkr_stream.stop_stream() 135 | self.spkr_stream.close() 136 | self.p.terminate() 137 | 138 | def send_mic_audio(self, socket): 139 | while not self._stop_event.is_set(): 140 | if not self.mic_queue.empty(): 141 | mic_chunk = self.mic_queue.get() 142 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data.') 143 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 144 | socket.send({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 145 | else: time.sleep(0.01)  # yield briefly rather than busy-spinning at 100% CPU 146 | def receive_audio(self, audio_chunk): 147 | self.audio_buffer.extend(audio_chunk) 148 | 149 | 150 | class Realtime: 151 | def __init__(self, api_key, ws_url): 152 | self.socket = Socket(api_key, ws_url) 153 | self.audio_io = AudioIO() 154 | 155 | def start(self): 156 | self.socket.on_msg = self.handle_message 157 | self.socket.connect() 158 | 159 | # Still works if we omit this, just doesn't speak to us first. 160 | self.socket.send({ 161 | 'type': 'response.create', 162 | 'response': { 163 | 'modalities': ['audio', 'text'], 164 | 'instructions': 'Please assist the user.' 
165 | } 166 | }) 167 | 168 | audio_send_thread = threading.Thread(target=self.audio_io.send_mic_audio, args=(self.socket,)) 169 | audio_send_thread.start() 170 | 171 | self.audio_io.start_streams() 172 | 173 | def handle_message(self, message): 174 | event_type = message.get('type') 175 | logging.info(f'Received message type: {event_type}') 176 | 177 | if event_type == 'response.audio.delta': 178 | audio_content = base64.b64decode(message['delta']) 179 | self.audio_io.receive_audio(audio_content) 180 | logging.info(f'Received {len(audio_content)} bytes of audio data.') 181 | 182 | elif event_type == 'response.audio.done': 183 | logging.info('AI finished speaking.') 184 | 185 | def stop(self): 186 | logging.info('Shutting down Realtime session.') 187 | self.audio_io._stop_event.set()  # signal send_mic_audio's thread to exit, else the process never terminates 188 | self.audio_io.stop_streams() 189 | self.socket.kill() 190 | 191 | def main(): 192 | api_key = os.getenv('OPENAI_API_KEY') 193 | ws_url = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 194 | 195 | realtime = Realtime(api_key, ws_url) 196 | 197 | try: 198 | realtime.start() 199 | while True: 200 | time.sleep(0.1) 201 | except KeyboardInterrupt: 202 | logging.info('Gracefully shutting down...') 203 | realtime.stop() 204 | 205 | 206 | if __name__ == '__main__': 207 | main() -------------------------------------------------------------------------------- /legacy/realtime-simple.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.system('cls' if os.name == 'nt' else 'clear') 3 | 4 | import threading 5 | import pyaudio 6 | import queue 7 | import base64 8 | import json 9 | import time 10 | from websocket import create_connection, WebSocketConnectionClosedException 11 | from dotenv import load_dotenv 12 | import logging 13 | 14 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 15 | 16 | load_dotenv() 17 | 18 | CHUNK_SIZE = 1024 19 | RATE = 24000 20 | FORMAT = pyaudio.paInt16 21 | API_KEY = 
os.getenv('OPENAI_API_KEY') 22 | WS_URL = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 23 | 24 | audio_buffer = bytearray() 25 | mic_queue = queue.Queue() 26 | 27 | stop_event = threading.Event() 28 | 29 | mic_on_at = 0 30 | mic_active = None 31 | REENGAGE_DELAY_MS = 500 32 | 33 | 34 | def mic_callback(in_data, frame_count, time_info, status): 35 | global mic_on_at, mic_active 36 | 37 | if time.time() > mic_on_at: 38 | if mic_active is not True: 39 | logging.info('🎙️🟢 Mic active') 40 | mic_active = True 41 | mic_queue.put(in_data) 42 | else: 43 | if mic_active is not False: 44 | logging.info('🎙️🔴 Mic suppressed') 45 | mic_active = False 46 | 47 | return (None, pyaudio.paContinue) 48 | 49 | 50 | def send_mic_audio_to_websocket(ws): 51 | try: 52 | while not stop_event.is_set(): 53 | if mic_queue.empty(): time.sleep(0.01); continue  # yield briefly rather than busy-spinning 54 | mic_chunk = mic_queue.get() 55 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data.') 56 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 57 | message = json.dumps({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 58 | try: 59 | ws.send(message) 60 | except WebSocketConnectionClosedException: 61 | logging.error('WebSocket connection closed.') 62 | break 63 | except Exception as e: 64 | logging.error(f'Error sending mic audio: {e}') 65 | except Exception as e: 66 | logging.error(f'Exception in send_mic_audio_to_websocket thread: {e}') 67 | finally: 68 | logging.info('Exiting send_mic_audio_to_websocket thread.') 69 | 70 | 71 | def spkr_callback(in_data, frame_count, time_info, status): 72 | global audio_buffer, mic_on_at 73 | 74 | bytes_needed = frame_count * 2 75 | current_buffer_size = len(audio_buffer) 76 | 77 | if current_buffer_size >= bytes_needed: 78 | audio_chunk = bytes(audio_buffer[:bytes_needed]) 79 | audio_buffer = audio_buffer[bytes_needed:] 80 | mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 81 | else: 82 | audio_chunk = bytes(audio_buffer) + b'\x00' * (bytes_needed - 
current_buffer_size) 83 | audio_buffer.clear() 84 | 85 | return (audio_chunk, pyaudio.paContinue) 86 | 87 | 88 | def receive_audio_from_websocket(ws): 89 | global audio_buffer 90 | 91 | try: 92 | while not stop_event.is_set(): 93 | try: 94 | message = ws.recv() 95 | if not message: # Handle empty message (EOF or connection close) 96 | logging.info('🔵 Received empty message (possibly EOF or WebSocket closing).') 97 | break 98 | 99 | # Now handle valid JSON messages only 100 | message = json.loads(message) 101 | event_type = message['type'] 102 | logging.info(f'⚡️ Received WebSocket event: {event_type}') 103 | 104 | if event_type == 'response.audio.delta': 105 | audio_content = base64.b64decode(message['delta']) 106 | audio_buffer.extend(audio_content) 107 | logging.info(f'🔵 Received {len(audio_content)} bytes, total buffer size: {len(audio_buffer)}') 108 | 109 | elif event_type == 'response.audio.done': 110 | logging.info('🔵 AI finished speaking.') 111 | 112 | except WebSocketConnectionClosedException: 113 | logging.error('WebSocket connection closed.') 114 | break 115 | except Exception as e: 116 | logging.error(f'Error receiving audio: {e}') 117 | except Exception as e: 118 | logging.error(f'Exception in receive_audio_from_websocket thread: {e}') 119 | finally: 120 | logging.info('Exiting receive_audio_from_websocket thread.') 121 | 122 | 123 | def connect_to_openai(): 124 | ws = None 125 | try: 126 | ws = create_connection(WS_URL, header=[f'Authorization: Bearer {API_KEY}', 'OpenAI-Beta: realtime=v1']) 127 | logging.info('Connected to OpenAI WebSocket.') 128 | 129 | ws.send(json.dumps({ 130 | 'type': 'response.create', 131 | 'response': { 132 | 'modalities': ['audio', 'text'], 133 | 'instructions': 'Please assist the user.' 
134 | } 135 | })) 136 | 137 | # Start the recv and send threads 138 | receive_thread = threading.Thread(target=receive_audio_from_websocket, args=(ws,)) 139 | receive_thread.start() 140 | 141 | mic_thread = threading.Thread(target=send_mic_audio_to_websocket, args=(ws,)) 142 | mic_thread.start() 143 | 144 | # Wait for stop_event to be set 145 | while not stop_event.is_set(): 146 | time.sleep(0.1) 147 | 148 | # Send a close frame and close the WebSocket gracefully 149 | logging.info('Sending WebSocket close frame.') 150 | ws.send_close() 151 | 152 | receive_thread.join() 153 | mic_thread.join() 154 | 155 | logging.info('WebSocket closed and threads terminated.') 156 | except Exception as e: 157 | logging.error(f'Failed to connect to OpenAI: {e}') 158 | finally: 159 | if ws is not None: 160 | try: 161 | ws.close() 162 | logging.info('WebSocket connection closed.') 163 | except Exception as e: 164 | logging.error(f'Error closing WebSocket connection: {e}') 165 | 166 | 167 | def main(): 168 | p = pyaudio.PyAudio() 169 | 170 | mic_stream = p.open( 171 | format=FORMAT, 172 | channels=1, 173 | rate=RATE, 174 | input=True, 175 | stream_callback=mic_callback, 176 | frames_per_buffer=CHUNK_SIZE 177 | ) 178 | 179 | spkr_stream = p.open( 180 | format=FORMAT, 181 | channels=1, 182 | rate=RATE, 183 | output=True, 184 | stream_callback=spkr_callback, 185 | frames_per_buffer=CHUNK_SIZE 186 | ) 187 | 188 | try: 189 | mic_stream.start_stream() 190 | spkr_stream.start_stream() 191 | 192 | connect_to_openai() 193 | 194 | while mic_stream.is_active() and spkr_stream.is_active(): 195 | time.sleep(0.1) 196 | 197 | except KeyboardInterrupt: 198 | logging.info('Gracefully shutting down...') 199 | stop_event.set() 200 | 201 | finally: 202 | mic_stream.stop_stream() 203 | mic_stream.close() 204 | spkr_stream.stop_stream() 205 | spkr_stream.close() 206 | 207 | p.terminate() 208 | logging.info('Audio streams stopped and resources released. 
Exiting.') 209 | 210 | 211 | if __name__ == '__main__': 212 | main() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | python-dotenv 2 | pyaudio 3 | websocket-client 4 | -------------------------------------------------------------------------------- /src/AudioIO.py: -------------------------------------------------------------------------------- 1 | import pyaudio 2 | import queue 3 | import time 4 | import logging 5 | import threading 6 | 7 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 8 | 9 | CHUNK_SIZE = 1024 10 | RATE = 24000 11 | FORMAT = pyaudio.paInt16 12 | REENGAGE_DELAY_MS = 500 13 | 14 | 15 | class AudioIO: 16 | def __init__(self, chunk_size=CHUNK_SIZE, rate=RATE, format=FORMAT, on_audio_callback=None): 17 | self.chunk_size = chunk_size 18 | self.rate = rate 19 | self.format = format 20 | self.audio_buffer = bytearray() 21 | self.mic_queue = queue.Queue() 22 | self.mic_on_at = 0 23 | self.mic_active = None 24 | self._stop_event = threading.Event() 25 | self.p = pyaudio.PyAudio() 26 | self.on_audio_callback = on_audio_callback # Callback for audio data 27 | 28 | def _mic_callback(self, in_data, frame_count, time_info, status): 29 | """ Microphone callback that queues audio chunks. """ 30 | if time.time() > self.mic_on_at: 31 | if not self.mic_active: 32 | logging.info('🎙️🟢 Mic active') 33 | self.mic_active = True 34 | self.mic_queue.put(in_data) 35 | else: 36 | if self.mic_active: 37 | logging.info('🎙️🔴 Mic suppressed') 38 | self.mic_active = False 39 | return (None, pyaudio.paContinue) 40 | 41 | def _spkr_callback(self, in_data, frame_count, time_info, status): 42 | """ Speaker callback that plays audio. 
""" 43 | bytes_needed = frame_count * 2 44 | current_buffer_size = len(self.audio_buffer) 45 | 46 | if current_buffer_size >= bytes_needed: 47 | audio_chunk = bytes(self.audio_buffer[:bytes_needed]) 48 | self.audio_buffer = self.audio_buffer[bytes_needed:] 49 | self.mic_on_at = time.time() + REENGAGE_DELAY_MS / 1000 50 | else: 51 | audio_chunk = bytes(self.audio_buffer) + b'\x00' * (bytes_needed - current_buffer_size) 52 | self.audio_buffer.clear() 53 | 54 | return (audio_chunk, pyaudio.paContinue) 55 | 56 | def start_streams(self): 57 | """ Start microphone and speaker streams. """ 58 | self.mic_stream = self.p.open( 59 | format=self.format, 60 | channels=1, 61 | rate=self.rate, 62 | input=True, 63 | stream_callback=self._mic_callback, 64 | frames_per_buffer=self.chunk_size 65 | ) 66 | self.spkr_stream = self.p.open( 67 | format=self.format, 68 | channels=1, 69 | rate=self.rate, 70 | output=True, 71 | stream_callback=self._spkr_callback, 72 | frames_per_buffer=self.chunk_size 73 | ) 74 | self.mic_stream.start_stream() 75 | self.spkr_stream.start_stream() 76 | 77 | def stop_streams(self): 78 | """ Stop and close audio streams. """ 79 | self.mic_stream.stop_stream() 80 | self.mic_stream.close() 81 | self.spkr_stream.stop_stream() 82 | self.spkr_stream.close() 83 | self.p.terminate() 84 | 85 | def process_mic_audio(self): 86 | """ Process microphone audio and call back when new audio is ready. 
""" 87 | while not self._stop_event.is_set(): 88 | if not self.mic_queue.empty(): 89 | mic_chunk = self.mic_queue.get() 90 | logging.info(f'🎤 Processing {len(mic_chunk)} bytes of audio data.') 91 | if self.on_audio_callback: 92 | self.on_audio_callback(mic_chunk) # Pass the audio chunk to the callback 93 | else: 94 | time.sleep(0.05) # Avoid tight loop when no audio is available 95 | 96 | def receive_audio(self, audio_chunk): 97 | """Appends audio data to the buffer for playback.""" 98 | self.audio_buffer.extend(audio_chunk) -------------------------------------------------------------------------------- /src/Realtime.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import threading 4 | 5 | from Socket import Socket 6 | from AudioIO import AudioIO 7 | 8 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 9 | 10 | class Realtime: 11 | def __init__(self, api_key, ws_url): 12 | self.socket = Socket(api_key, ws_url, on_msg=self.handle_message) 13 | self.audio_io = AudioIO(on_audio_callback=self.send_audio_to_socket) 14 | self.audio_thread = None # Store thread references 15 | self.recv_thread = None 16 | 17 | def start(self): 18 | """ Start WebSocket and audio processing. """ 19 | self.socket.connect() 20 | 21 | # Send initial request to start the conversation 22 | self.socket.send({ 23 | 'type': 'response.create', 24 | 'response': { 25 | 'modalities': ['audio', 'text'], 26 | 'instructions': 'Please assist the user.' 27 | } 28 | }) 29 | 30 | # Start processing microphone audio 31 | self.audio_thread = threading.Thread(target=self.audio_io.process_mic_audio) 32 | self.audio_thread.start() 33 | 34 | # Start audio streams (mic and speaker) 35 | self.audio_io.start_streams() 36 | 37 | def send_audio_to_socket(self, mic_chunk): 38 | """ Callback function to send audio data to the socket. 
""" 39 | logging.info(f'🎤 Sending {len(mic_chunk)} bytes of audio data to socket.') 40 | encoded_chunk = base64.b64encode(mic_chunk).decode('utf-8') 41 | self.socket.send({'type': 'input_audio_buffer.append', 'audio': encoded_chunk}) 42 | 43 | def handle_message(self, message): 44 | """ Handle incoming WebSocket messages. """ 45 | event_type = message.get('type') 46 | logging.info(f'Received message type: {event_type}') 47 | 48 | if event_type == 'response.audio.delta': 49 | audio_content = base64.b64decode(message['delta']) 50 | self.audio_io.receive_audio(audio_content) 51 | logging.info(f'Received {len(audio_content)} bytes of audio data.') 52 | 53 | elif event_type == 'response.audio.done': 54 | logging.info('AI finished speaking.') 55 | 56 | def stop(self): 57 | """ Stop all processes cleanly. """ 58 | logging.info('Shutting down Realtime session.') 59 | 60 | # Signal threads to stop 61 | self.audio_io._stop_event.set() 62 | self.socket.kill() 63 | 64 | # Stop audio streams 65 | self.audio_io.stop_streams() 66 | 67 | # Join threads to ensure they exit cleanly 68 | if self.audio_thread: 69 | self.audio_thread.join() 70 | logging.info('Audio processing thread terminated.') -------------------------------------------------------------------------------- /src/Socket.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import queue 3 | import json 4 | import logging 5 | import select 6 | from websocket import create_connection, WebSocketConnectionClosedException 7 | 8 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 9 | 10 | class Socket: 11 | def __init__(self, api_key, ws_url, on_msg=None): 12 | self.api_key = api_key 13 | self.ws_url = ws_url 14 | self.ws = None 15 | self.on_msg = on_msg # Callback for when a message is received 16 | self.send_queue = queue.Queue() # Outgoing message queue 17 | self._stop_event = threading.Event() 18 | self.loop_thread = None # 
Store thread reference 19 | 20 | def connect(self): 21 | """ Connect to WebSocket and start main loop. """ 22 | self.ws = create_connection(self.ws_url, header=[f'Authorization: Bearer {self.api_key}', 'OpenAI-Beta: realtime=v1']) 23 | logging.info('Connected to WebSocket.') 24 | 25 | # Start a unified loop for sending and receiving messages 26 | self.loop_thread = threading.Thread(target=self._socket_loop) 27 | self.loop_thread.start() 28 | 29 | def _socket_loop(self): 30 | """ Main loop that handles both sending and receiving messages. """ 31 | while not self._stop_event.is_set(): 32 | try: 33 | # Use select to check if the WebSocket's underlying socket has data to read 34 | rlist, _, _ = select.select([self.ws.sock], [], [], 0.1) 35 | # NB: websocket-client can buffer a parsed frame internally, so select on the raw socket may report nothing to read while a message is already waiting -- a possible contributor to the responsiveness issue noted in the README 36 | # If there's incoming data, receive it 37 | if rlist: 38 | message = self.ws.recv() 39 | if message and self.on_msg: 40 | logging.info(f'Received message: {message}') 41 | self.on_msg(json.loads(message)) # Call the user-provided callback 42 | 43 | # Check if there's a message in the queue to send 44 | try: 45 | outgoing_message = self.send_queue.get_nowait() 46 | self.ws.send(json.dumps(outgoing_message)) 47 | logging.info(f'Sent message: {outgoing_message}') 48 | except queue.Empty: 49 | continue # No message to send, loop back 50 | except WebSocketConnectionClosedException: 51 | logging.error('WebSocket connection closed.') 52 | break 53 | except Exception as e: 54 | logging.error(f'Error in socket loop: {e}') 55 | break 56 | 57 | def send(self, data): 58 | """ Enqueue the message to be sent. """ 59 | self.send_queue.put(data) 60 | 61 | def kill(self): 62 | """ Cleanly shut down the WebSocket and stop the loop. 
""" 63 | logging.info('Shutting down WebSocket.') 64 | self._stop_event.set() 65 | 66 | # Close WebSocket 67 | if self.ws: 68 | try: 69 | self.ws.send_close() 70 | self.ws.close() 71 | logging.info('WebSocket connection closed.') 72 | except Exception as e: 73 | logging.error(f'Error closing WebSocket: {e}') 74 | 75 | # Ensure the loop thread is joined 76 | if self.loop_thread: 77 | self.loop_thread.join() 78 | logging.info('WebSocket loop thread terminated.') -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import signal 3 | import time 4 | import logging 5 | from dotenv import load_dotenv 6 | 7 | from Realtime import Realtime 8 | 9 | logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s') 10 | 11 | # Load environment variables from a .env file 12 | load_dotenv() 13 | 14 | quitFlag = False 15 | 16 | def signal_handler(sig, frame, realtime_instance): 17 | """Handle Ctrl+C and initiate graceful shutdown.""" 18 | logging.info('Received Ctrl+C! 
Initiating shutdown...') 19 | global quitFlag 20 | quitFlag = True  # main() falls out of its loop; cleanup happens once, in its finally block 21 | 22 | 23 | def main(): 24 | api_key = os.getenv('OPENAI_API_KEY') 25 | ws_url = 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01' 26 | 27 | if not api_key: 28 | logging.error('OPENAI_API_KEY not found in environment variables!') 29 | return 30 | 31 | realtime = Realtime(api_key, ws_url) 32 | 33 | signal.signal(signal.SIGINT, lambda sig, frame: signal_handler(sig, frame, realtime)) 34 | 35 | try: 36 | realtime.start() 37 | while not quitFlag: 38 | time.sleep(0.1) 39 | 40 | except Exception as e: 41 | logging.error(f'Error in main loop: {e}') 42 | 43 | 44 | finally: 45 | logging.info('Exiting main.') 46 | realtime.stop()  # Single cleanup point: stopping only here avoids double-stopping the audio streams 47 | 48 | if __name__ == '__main__': 49 | main() 50 | --------------------------------------------------------------------------------