├── LICENCE
├── README.md
├── main.py
├── requirements.txt
└── static
    ├── index.html
    └── js
        └── app.js

/LICENCE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 echohive

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Real-time Voice Chat Template

A barebones template for building real-time voice applications with OpenAI's Realtime API and WebRTC. Features built-in arXiv paper search and customizable AI instructions.
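Under the hood, the backend's `/session` endpoint mints an ephemeral token by POSTing a small JSON payload to OpenAI's `/v1/realtime/sessions` endpoint; the browser then uses the returned `client_secret` for the WebRTC handshake. A minimal sketch of that request shape (`build_session_request` is a hypothetical helper shown only for illustration — nothing is sent here; the model name and voice mirror `main.py`):

```python
# Hypothetical helper illustrating the payload main.py POSTs to
# https://api.openai.com/v1/realtime/sessions. Nothing is sent here.
def build_session_request(api_key: str, instructions: str) -> dict:
    headers = {
        "Authorization": f"Bearer {api_key}",  # server-side API key, never the ephemeral one
        "Content-Type": "application/json",
    }
    payload = {
        "model": "gpt-4o-realtime-preview-2024-12-17",  # model pinned in main.py
        "voice": "verse",                               # voice used by the template
        "instructions": instructions,                   # DEFAULT_INSTRUCTIONS in main.py
    }
    return {"headers": headers, "json": payload}
```

The `client_secret.value` field of the response is what `app.js` later reads as the ephemeral key.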

## Features

- Real-time voice communication with AI
- Built-in arXiv paper search functionality
- Live event display
- Customizable AI instructions
- Clean, dark-themed UI
- WebRTC-based audio streaming

## ❤️ Support & Get 400+ AI Projects

This is one of 400+ fascinating projects in my collection! [Support me on Patreon](https://www.patreon.com/c/echohive42/membership) to get:

- 🎯 Access to 400+ AI projects (and growing daily!)
  - Including advanced projects like [2 Agent Real-time voice template with turn taking](https://www.patreon.com/posts/2-agent-real-you-118330397)
- 📥 Full source code & detailed explanations
- 📚 1000x Cursor Course
- 🎓 Live coding sessions & AMAs
- 💬 1-on-1 consultations (higher tiers)
- 🎁 Exclusive discounts on AI tools & platforms (up to $180 value)

## Technical Stack

- **Backend**: FastAPI
- **Frontend**: HTML, JavaScript, TailwindCSS, DaisyUI
- **Real-time Communication**: WebRTC
- **API Integration**: OpenAI Realtime API, arXiv API
- **Styling**: Dark mode with animations

## Custom Instructions System

The template includes a two-layer instruction system:

1. **Backend Instructions** (`main.py`):
   - Basic AI behavior guidelines
   - Function-specific instructions
   - Voice interaction rules
2. **Frontend Instructions** (`app.js`):
   - Detailed AI role configuration
   - Context and guidelines
   - Function usage specifications
   - Real-time session updates

## ArXiv Integration

Built-in functionality to search and fetch the latest papers:

- Real-time paper search
- Automatic result parsing
- Title, author, summary, and link extraction
- Configurable result limits

## Setup

1. 
Set your OpenAI API key:

   ```bash
   # Windows
   set OPENAI_API_KEY=your_api_key_here

   # Linux/Mac
   export OPENAI_API_KEY=your_api_key_here
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python main.py
   ```

4. Open in browser:

   ```
   http://127.0.0.1:8000
   ```

## Usage Examples

1. **Start Voice Chat**:
   - Click "Start Voice Chat"
   - Grant microphone permissions
   - Wait for connection confirmation
2. **Search Papers**:
   - "Find recent papers about machine learning"
   - "Show me the latest research on AI"
   - "Search for papers about neural networks"
3. **End Session**:
   - Click "Stop Voice Chat"
   - Resources are automatically cleaned up

## Requirements

- Python 3.7+
- Modern web browser with WebRTC support
- Microphone access
- OpenAI API key

## Customization

1. **Modify AI Instructions**:
   - Edit `DEFAULT_INSTRUCTIONS` in `main.py`
   - Update `SESSION_CONFIG` in `app.js`
2. **Adjust ArXiv Settings**:
   - Modify the `maxResults` parameter
   - Customize paper formatting
   - Add additional search parameters
3. 
**UI Customization**:
   - Modify TailwindCSS classes
   - Update the DaisyUI theme
   - Customize animations
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
from fastapi import FastAPI
from fastapi.responses import JSONResponse, FileResponse
from fastapi.staticfiles import StaticFiles
import os
import httpx
from termcolor import colored

# Constants
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4o-realtime-preview-2024-12-17"
DEFAULT_INSTRUCTIONS = """You are a helpful AI assistant. You can engage in voice conversations and help with various tasks.
When using the getLatestArxivPapers function:
1. Always inform the user before making the call
2. Summarize the findings in a clear, concise way
3. Focus on the most relevant papers
4. Provide brief insights about each paper"""

# Initialize FastAPI
app = FastAPI(title="Voice Chat")

# Mount static files
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/")
async def read_root():
    try:
        print(colored("Serving index page...", "green"))
        return FileResponse("static/index.html")
    except Exception as e:
        print(colored(f"Error serving index page: {str(e)}", "red"))
        return JSONResponse(
            status_code=500,
            content={"error": "Internal server error"}
        )

@app.get("/session")
async def create_session():
    try:
        print(colored("Creating new session...", "cyan"))
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.openai.com/v1/realtime/sessions",
                headers={
                    "Authorization": f"Bearer {OPENAI_API_KEY}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": MODEL,
                    "voice": "verse",
                    "instructions":
                        DEFAULT_INSTRUCTIONS
                }
            )
            response.raise_for_status()  # surface non-2xx responses instead of returning them as-is
            print(colored("Session created successfully!", "green"))
            return response.json()
    except Exception as e:
        print(colored(f"Error creating session: {str(e)}", "red"))
        return JSONResponse(
            status_code=500,
            content={"error": "Failed to create session"}
        )

if __name__ == "__main__":
    import uvicorn
    print(colored("Starting server...", "yellow"))
    uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
fastapi
uvicorn
httpx
termcolor
--------------------------------------------------------------------------------
/static/index.html:
--------------------------------------------------------------------------------
[The HTML markup of this file was lost in extraction; only stray text nodes survived ("Voice Chat", "Disconnected"). The page provides the elements that static/js/app.js looks up by id — statusIndicator, statusText, loadingAnimation, messages, startButton, and stopButton — styled with TailwindCSS/DaisyUI.]
--------------------------------------------------------------------------------
/static/js/app.js:
--------------------------------------------------------------------------------
// DOM Elements
const startButton = document.getElementById('startButton');
const stopButton = document.getElementById('stopButton');
const statusIndicator = document.getElementById('statusIndicator');
const statusText = document.getElementById('statusText');
const loadingAnimation = document.getElementById('loadingAnimation');
const messagesDiv = document.getElementById('messages');

// Global variables
let peerConnection = null;
let dataChannel = null;
let mediaStream = null;
let audioElement = null;

// Session configuration
const SESSION_CONFIG = {
    modalities: ['text', 'audio'],
    instructions: {
        role: 'You are a helpful AI assistant with expertise in scientific research and paper analysis.',
        context: 'You can engage in voice conversations and help users find and understand scientific papers.',
        guidelines: [
            'Always acknowledge user requests clearly',
            'When searching for papers, explain what you are looking for',
            'Provide concise but informative summaries',
            'Focus on the most relevant and recent findings',
            'Highlight key insights and potential applications'
        ],
        functionUsage: {
            getLatestArxivPapers: {
                beforeCall: 'Inform user about initiating the search',
                afterCall: 'Summarize findings and provide insights',
                presentation: 'Present papers in a clear, structured format'
            }
        }
    }
};

// Helper functions
function updateStatus(status, isConnected = false) {
    statusText.textContent = status;
    statusIndicator.className = `w-3 h-3 rounded-full ${isConnected ?
        'bg-success' : 'bg-error'}`;
}

function addMessage(text, type = 'info') {
    const messageDiv = document.createElement('div');
    messageDiv.className = `message ${type}`;
    messageDiv.textContent = text;
    messagesDiv.appendChild(messageDiv);

    // Force a reflow to ensure the animation triggers
    messageDiv.offsetHeight;

    // Add the show class on the next frame to trigger the animation
    requestAnimationFrame(() => {
        messageDiv.classList.add('show');
    });

    // Smooth scroll to the new message
    messageDiv.scrollIntoView({ behavior: 'smooth', block: 'end' });

    // Limit the number of messages to prevent memory issues
    const messages = messagesDiv.getElementsByClassName('message');
    if (messages.length > 100) {
        messagesDiv.removeChild(messages[0]);
    }
}

function showLoading(show) {
    loadingAnimation.className = show ? 'mt-4' : 'hidden mt-4';
    startButton.disabled = show;
}

// arXiv paper fetching function
async function getLatestArxivPapers(query = 'AI', maxResults = 5) {
    try {
        // Use HTTPS so the request also works when the page is served over TLS
        const response = await fetch(`https://export.arxiv.org/api/query?search_query=all:${query}&start=0&max_results=${maxResults}&sortBy=submittedDate&sortOrder=descending`);
        const xmlText = await response.text();
        const parser = new DOMParser();
        const xmlDoc = parser.parseFromString(xmlText, 'text/xml');

        const entries = xmlDoc.getElementsByTagName('entry');
        const papers = Array.from(entries).map(entry => ({
            title: entry.getElementsByTagName('title')[0]?.textContent?.trim(),
            authors: Array.from(entry.getElementsByTagName('author')).map(author =>
                author.getElementsByTagName('name')[0]?.textContent?.trim()
            ),
            summary: entry.getElementsByTagName('summary')[0]?.textContent?.trim(),
            link: entry.getElementsByTagName('id')[0]?.textContent?.trim(),
            published:
                entry.getElementsByTagName('published')[0]?.textContent?.trim()
        }));

        return papers;
    } catch (error) {
        console.error('Error fetching arXiv papers:', error);
        throw error;
    }
}

// WebRTC implementation
async function initializeWebRTC() {
    try {
        showLoading(true);
        updateStatus('Initializing...');

        // Get ephemeral token
        const tokenResponse = await fetch('/session');
        const data = await tokenResponse.json();

        if (!data.client_secret?.value) {
            throw new Error('Failed to get ephemeral token');
        }

        const EPHEMERAL_KEY = data.client_secret.value;

        // Create peer connection
        peerConnection = new RTCPeerConnection();

        // Set up audio element
        audioElement = document.createElement('audio');
        audioElement.autoplay = true;
        peerConnection.ontrack = e => audioElement.srcObject = e.streams[0];

        // Add local audio track
        mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
        peerConnection.addTrack(mediaStream.getTracks()[0]);

        // Set up data channel
        dataChannel = peerConnection.createDataChannel('oai-events');
        dataChannel.addEventListener('message', handleDataChannelMessage);
        dataChannel.addEventListener('open', () => {
            updateStatus('Connected', true);
            stopButton.disabled = false;
            startButton.disabled = true;
            addMessage('Connection established');
            configureTools(); // Configure available tools after connection
        });

        // Create and set local description
        const offer = await peerConnection.createOffer();
        await peerConnection.setLocalDescription(offer);

        // Connect to OpenAI Realtime API
        const baseUrl = 'https://api.openai.com/v1/realtime';
        const model = 'gpt-4o-realtime-preview-2024-12-17';
        const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
            method:
                'POST',
            body: offer.sdp,
            headers: {
                Authorization: `Bearer ${EPHEMERAL_KEY}`,
                'Content-Type': 'application/sdp'
            },
        });

        if (!sdpResponse.ok) {
            throw new Error('Failed to connect to OpenAI Realtime API');
        }

        const answer = {
            type: 'answer',
            sdp: await sdpResponse.text(),
        };
        await peerConnection.setRemoteDescription(answer);

        showLoading(false);
    } catch (error) {
        console.error('Error:', error);
        showLoading(false);
        updateStatus('Error: ' + error.message);
        addMessage(error.message, 'error');
    }
}

function configureTools() {
    const event = {
        type: 'session.update',
        session: {
            ...SESSION_CONFIG,
            tools: [
                {
                    type: 'function',
                    name: 'getLatestArxivPapers',
                    description: 'Get the latest papers from arXiv based on a search query. Always inform the user before searching and provide insights after finding papers.',
                    parameters: {
                        type: 'object',
                        properties: {
                            query: {
                                type: 'string',
                                description: 'Search query for papers (e.g., "AI", "machine learning")'
                            },
                            maxResults: {
                                type: 'number',
                                description: 'Maximum number of papers to return (default: 5)'
                            }
                        },
                        required: ['query']
                    }
                }
            ]
        }
    };
    dataChannel.send(JSON.stringify(event));

    // Send additional context as a user message (conversation items use the
    // 'message' type with a content array, not a bare 'text' item)
    const contextUpdate = {
        type: 'conversation.item.create',
        item: {
            type: 'message',
            role: 'user',
            content: [{
                type: 'input_text',
                text: 'Session configured with custom instructions. Ready to assist with paper searches and analysis.'
            }]
        }
    };
    dataChannel.send(JSON.stringify(contextUpdate));
}

async function handleDataChannelMessage(event) {
    try {
        const msg = JSON.parse(event.data);

        // Handle function calls
        if (msg.type === 'response.function_call_arguments.done') {
            if (msg.name === 'getLatestArxivPapers') {
                const args = JSON.parse(msg.arguments);
                try {
                    const papers = await getLatestArxivPapers(args.query, args.maxResults);
                    const responseEvent = {
                        type: 'conversation.item.create',
                        item: {
                            type: 'function_call_output',
                            call_id: msg.call_id,
                            output: JSON.stringify(papers)
                        }
                    };
                    dataChannel.send(JSON.stringify(responseEvent));
                    // Ask the model to respond to the function output
                    dataChannel.send(JSON.stringify({ type: 'response.create' }));
                    addMessage(`Found ${papers.length} papers for query: ${args.query}`);
                } catch (error) {
                    addMessage(`Error fetching papers: ${error.message}`, 'error');
                }
            }
        } else {
            addMessage(`AI: ${msg.content || JSON.stringify(msg)}`);
        }
    } catch (error) {
        console.error('Error parsing message:', error);
        addMessage(`Error parsing message: ${error.message}`, 'error');
    }
}

function cleanup() {
    try {
        // Stop all media tracks
        if (mediaStream) {
            mediaStream.getTracks().forEach(track => {
                track.stop();
            });
            mediaStream = null;
        }

        // Clean up audio element
        if (audioElement) {
            audioElement.srcObject = null;
            audioElement.remove();
            audioElement = null;
        }

        // Close data channel
        if (dataChannel) {
            dataChannel.close();
            dataChannel = null;
        }

        // Close peer connection
        if (peerConnection) {
            peerConnection.close();
            peerConnection = null;
        }

        updateStatus('Disconnected');
        startButton.disabled = false;
        stopButton.disabled = true;
        addMessage('Connection closed and resources cleaned up');
    } catch
    (error) {
        console.error('Error during cleanup:', error);
        addMessage(`Error during cleanup: ${error.message}`, 'error');
    }
}

// Event listeners
startButton.addEventListener('click', initializeWebRTC);
stopButton.addEventListener('click', cleanup);

// Initial setup
updateStatus('Disconnected');
addMessage('Ready to start voice chat');
--------------------------------------------------------------------------------
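Addendum: the Atom feed shape that `getLatestArxivPapers` parses can be checked offline. A small Python sketch that mirrors the same field extraction (`title`, `authors`, `summary`, `link`, `published`) using only the standard library — the sample `<entry>` below is illustrative, not a real arXiv record:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # arXiv responses use the Atom namespace

# Illustrative sample of one Atom <entry>; real arXiv responses carry many.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example Paper</title>
    <summary>A short abstract.</summary>
    <published>2025-01-01T00:00:00Z</published>
    <author><name>Ada Lovelace</name></author>
  </entry>
</feed>"""

def parse_papers(xml_text):
    """Extract the same fields app.js pulls from each <entry>."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM_NS}title", "").strip(),
            "authors": [a.findtext(f"{ATOM_NS}name", "").strip()
                        for a in entry.findall(f"{ATOM_NS}author")],
            "summary": entry.findtext(f"{ATOM_NS}summary", "").strip(),
            "link": entry.findtext(f"{ATOM_NS}id", "").strip(),
            "published": entry.findtext(f"{ATOM_NS}published", "").strip(),
        })
    return papers

print(parse_papers(SAMPLE)[0]["title"])  # prints "An Example Paper"
```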