├── LICENCE
├── README.md
├── main.py
├── requirements.txt
└── static
    ├── index.html
    └── js
        └── app.js

/LICENCE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 echohive

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Real-time Voice Chat Template

A barebones template for building real-time voice applications with OpenAI's Realtime API and WebRTC. Features built-in arXiv paper search and customizable AI instructions.
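Under the hood, the backend's `/session` endpoint mints an ephemeral token by POSTing a small JSON payload to OpenAI's `/v1/realtime/sessions` endpoint; the browser then uses the returned `client_secret` for the WebRTC handshake. A minimal sketch of that request shape (`build_session_request` is a hypothetical helper shown only for illustration — nothing is sent here; the model name and voice mirror `main.py`):

```python
# Hypothetical helper illustrating the payload main.py POSTs to
# https://api.openai.com/v1/realtime/sessions. Nothing is sent here.
def build_session_request(api_key: str, instructions: str) -> dict:
    headers = {
        "Authorization": f"Bearer {api_key}",  # server-side API key, never the ephemeral one
        "Content-Type": "application/json",
    }
    payload = {
        "model": "gpt-4o-realtime-preview-2024-12-17",  # model pinned in main.py
        "voice": "verse",                               # voice used by the template
        "instructions": instructions,                   # DEFAULT_INSTRUCTIONS in main.py
    }
    return {"headers": headers, "json": payload}
```

The `client_secret.value` field of the response is what `app.js` later reads as the ephemeral key.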

## Features

- Real-time voice communication with AI
- Built-in arXiv paper search functionality
- Live event display
- Customizable AI instructions
- Clean, dark-themed UI
- WebRTC-based audio streaming

## ❤️ Support & Get 400+ AI Projects

This is one of 400+ fascinating projects in my collection! [Support me on Patreon](https://www.patreon.com/c/echohive42/membership) to get:

- 🎯 Access to 400+ AI projects (and growing daily!)
  - Including advanced projects like [2 Agent Real-time voice template with turn taking](https://www.patreon.com/posts/2-agent-real-you-118330397)
- 📥 Full source code & detailed explanations
- 📚 1000x Cursor Course
- 🎓 Live coding sessions & AMAs
- 💬 1-on-1 consultations (higher tiers)
- 🎁 Exclusive discounts on AI tools & platforms (up to $180 value)

## Technical Stack

- **Backend**: FastAPI
- **Frontend**: HTML, JavaScript, TailwindCSS, DaisyUI
- **Real-time Communication**: WebRTC
- **API Integration**: OpenAI Realtime API, arXiv API
- **Styling**: Dark mode with animations

## Custom Instructions System

The template includes a two-layer instruction system:

1. **Backend Instructions** (`main.py`):
   - Basic AI behavior guidelines
   - Function-specific instructions
   - Voice interaction rules
2. **Frontend Instructions** (`app.js`):
   - Detailed AI role configuration
   - Context and guidelines
   - Function usage specifications
   - Real-time session updates

## ArXiv Integration

Built-in functionality to search and fetch the latest papers:

- Real-time paper search
- Automatic result parsing
- Title, author, summary, and link extraction
- Configurable result limits

## Setup

1. 
Set your OpenAI API key:

   ```bash
   # Windows
   set OPENAI_API_KEY=your_api_key_here

   # Linux/Mac
   export OPENAI_API_KEY=your_api_key_here
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python main.py
   ```

4. Open in browser:

   ```
   http://127.0.0.1:8000
   ```

## Usage Examples

1. **Start Voice Chat**:
   - Click "Start Voice Chat"
   - Grant microphone permissions
   - Wait for connection confirmation
2. **Search Papers**:
   - "Find recent papers about machine learning"
   - "Show me the latest research on AI"
   - "Search for papers about neural networks"
3. **End Session**:
   - Click "Stop Voice Chat"
   - Resources are automatically cleaned up

## Requirements

- Python 3.7+
- Modern web browser with WebRTC support
- Microphone access
- OpenAI API key

## Customization

1. **Modify AI Instructions**:
   - Edit `DEFAULT_INSTRUCTIONS` in `main.py`
   - Update `SESSION_CONFIG` in `app.js`
2. **Adjust ArXiv Settings**:
   - Modify the `maxResults` parameter
   - Customize paper formatting
   - Add additional search parameters
3. 
**UI Customization**:
   - Modify TailwindCSS classes
   - Update the DaisyUI theme
   - Customize animations
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
from fastapi import FastAPI
from fastapi.responses import JSONResponse, FileResponse
from fastapi.staticfiles import StaticFiles
import os
import httpx
from termcolor import colored

# Constants
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4o-realtime-preview-2024-12-17"
DEFAULT_INSTRUCTIONS = """You are a helpful AI assistant. You can engage in voice conversations and help with various tasks.
When using the getLatestArxivPapers function:
1. Always inform the user before making the call
2. Summarize the findings in a clear, concise way
3. Focus on the most relevant papers
4. Provide brief insights about each paper"""

# Initialize FastAPI
app = FastAPI(title="Voice Chat")

# Mount static files
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/")
async def read_root():
    try:
        print(colored("Serving index page...", "green"))
        return FileResponse("static/index.html")
    except Exception as e:
        print(colored(f"Error serving index page: {str(e)}", "red"))
        return JSONResponse(
            status_code=500,
            content={"error": "Internal server error"}
        )

@app.get("/session")
async def create_session():
    try:
        print(colored("Creating new session...", "cyan"))
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.openai.com/v1/realtime/sessions",
                headers={
                    "Authorization": f"Bearer {OPENAI_API_KEY}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": MODEL,
                    "voice": "verse",
                    "instructions":
                        DEFAULT_INSTRUCTIONS
                }
            )
            response.raise_for_status()  # surface non-2xx responses instead of returning them as-is
            print(colored("Session created successfully!", "green"))
            return response.json()
    except Exception as e:
        print(colored(f"Error creating session: {str(e)}", "red"))
        return JSONResponse(
            status_code=500,
            content={"error": "Failed to create session"}
        )

if __name__ == "__main__":
    import uvicorn
    print(colored("Starting server...", "yellow"))
    uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
fastapi
uvicorn
httpx
termcolor
--------------------------------------------------------------------------------
/static/index.html:
--------------------------------------------------------------------------------
[The HTML markup of this file was lost in extraction; only stray text nodes survived ("Voice Chat", "Disconnected"). The page provides the elements that static/js/app.js looks up by id — statusIndicator, statusText, loadingAnimation, messages, startButton, and stopButton — styled with TailwindCSS/DaisyUI.]
--------------------------------------------------------------------------------
/static/js/app.js:
--------------------------------------------------------------------------------
// DOM Elements
const startButton = document.getElementById('startButton');
const stopButton = document.getElementById('stopButton');
const statusIndicator = document.getElementById('statusIndicator');
const statusText = document.getElementById('statusText');
const loadingAnimation = document.getElementById('loadingAnimation');
const messagesDiv = document.getElementById('messages');

// Global variables
let peerConnection = null;
let dataChannel = null;
let mediaStream = null;
let audioElement = null;

// Session configuration
const SESSION_CONFIG = {
    modalities: ['text', 'audio'],
    instructions: {
        role: 'You are a helpful AI assistant with expertise in scientific research and paper analysis.',
        context: 'You can engage in voice conversations and help users find and understand scientific papers.',
        guidelines: [
            'Always acknowledge user requests clearly',
            'When searching for papers, explain what you are looking for',
            'Provide concise but informative summaries',
            'Focus on the most relevant and recent findings',
            'Highlight key insights and potential applications'
        ],
        functionUsage: {
            getLatestArxivPapers: {
                beforeCall: 'Inform user about initiating the search',
                afterCall: 'Summarize findings and provide insights',
                presentation: 'Present papers in a clear, structured format'
            }
        }
    }
};

// Helper functions
function updateStatus(status, isConnected = false) {
    statusText.textContent = status;
    statusIndicator.className = `w-3 h-3 rounded-full ${isConnected ?
        'bg-success' : 'bg-error'}`;
}

function addMessage(text, type = 'info') {
    const messageDiv = document.createElement('div');
    messageDiv.className = `message ${type}`;
    messageDiv.textContent = text;
    messagesDiv.appendChild(messageDiv);

    // Force a reflow to ensure the animation triggers
    messageDiv.offsetHeight;

    // Add the show class on the next frame to trigger the animation
    requestAnimationFrame(() => {
        messageDiv.classList.add('show');
    });

    // Smooth scroll to the new message
    messageDiv.scrollIntoView({ behavior: 'smooth', block: 'end' });

    // Limit the number of messages to prevent memory issues
    const messages = messagesDiv.getElementsByClassName('message');
    if (messages.length > 100) {
        messagesDiv.removeChild(messages[0]);
    }
}

function showLoading(show) {
    loadingAnimation.className = show ? 'mt-4' : 'hidden mt-4';
    startButton.disabled = show;
}

// arXiv paper fetching function
async function getLatestArxivPapers(query = 'AI', maxResults = 5) {
    try {
        // Use HTTPS so the request also works when the page is served over TLS
        const response = await fetch(`https://export.arxiv.org/api/query?search_query=all:${query}&start=0&max_results=${maxResults}&sortBy=submittedDate&sortOrder=descending`);
        const xmlText = await response.text();
        const parser = new DOMParser();
        const xmlDoc = parser.parseFromString(xmlText, 'text/xml');

        const entries = xmlDoc.getElementsByTagName('entry');
        const papers = Array.from(entries).map(entry => ({
            title: entry.getElementsByTagName('title')[0]?.textContent?.trim(),
            authors: Array.from(entry.getElementsByTagName('author')).map(author =>
                author.getElementsByTagName('name')[0]?.textContent?.trim()
            ),
            summary: entry.getElementsByTagName('summary')[0]?.textContent?.trim(),
            link: entry.getElementsByTagName('id')[0]?.textContent?.trim(),
            published:
                entry.getElementsByTagName('published')[0]?.textContent?.trim()
        }));

        return papers;
    } catch (error) {
        console.error('Error fetching arXiv papers:', error);
        throw error;
    }
}

// WebRTC implementation
async function initializeWebRTC() {
    try {
        showLoading(true);
        updateStatus('Initializing...');

        // Get ephemeral token
        const tokenResponse = await fetch('/session');
        const data = await tokenResponse.json();

        if (!data.client_secret?.value) {
            throw new Error('Failed to get ephemeral token');
        }

        const EPHEMERAL_KEY = data.client_secret.value;

        // Create peer connection
        peerConnection = new RTCPeerConnection();

        // Set up audio element
        audioElement = document.createElement('audio');
        audioElement.autoplay = true;
        peerConnection.ontrack = e => audioElement.srcObject = e.streams[0];

        // Add local audio track
        mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
        peerConnection.addTrack(mediaStream.getTracks()[0]);

        // Set up data channel
        dataChannel = peerConnection.createDataChannel('oai-events');
        dataChannel.addEventListener('message', handleDataChannelMessage);
        dataChannel.addEventListener('open', () => {
            updateStatus('Connected', true);
            stopButton.disabled = false;
            startButton.disabled = true;
            addMessage('Connection established');
            configureTools(); // Configure available tools after connection
        });

        // Create and set local description
        const offer = await peerConnection.createOffer();
        await peerConnection.setLocalDescription(offer);

        // Connect to OpenAI Realtime API
        const baseUrl = 'https://api.openai.com/v1/realtime';
        const model = 'gpt-4o-realtime-preview-2024-12-17';
        const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
            method:
                'POST',
            body: offer.sdp,
            headers: {
                Authorization: `Bearer ${EPHEMERAL_KEY}`,
                'Content-Type': 'application/sdp'
            },
        });

        if (!sdpResponse.ok) {
            throw new Error('Failed to connect to OpenAI Realtime API');
        }

        const answer = {
            type: 'answer',
            sdp: await sdpResponse.text(),
        };
        await peerConnection.setRemoteDescription(answer);

        showLoading(false);
    } catch (error) {
        console.error('Error:', error);
        showLoading(false);
        updateStatus('Error: ' + error.message);
        addMessage(error.message, 'error');
    }
}

function configureTools() {
    const event = {
        type: 'session.update',
        session: {
            ...SESSION_CONFIG,
            tools: [
                {
                    type: 'function',
                    name: 'getLatestArxivPapers',
                    description: 'Get the latest papers from arXiv based on a search query. Always inform the user before searching and provide insights after finding papers.',
                    parameters: {
                        type: 'object',
                        properties: {
                            query: {
                                type: 'string',
                                description: 'Search query for papers (e.g., "AI", "machine learning")'
                            },
                            maxResults: {
                                type: 'number',
                                description: 'Maximum number of papers to return (default: 5)'
                            }
                        },
                        required: ['query']
                    }
                }
            ]
        }
    };
    dataChannel.send(JSON.stringify(event));

    // Send additional context as a user message (conversation items use the
    // 'message' type with a content array, not a bare 'text' item)
    const contextUpdate = {
        type: 'conversation.item.create',
        item: {
            type: 'message',
            role: 'user',
            content: [{
                type: 'input_text',
                text: 'Session configured with custom instructions. Ready to assist with paper searches and analysis.'
            }]
        }
    };
    dataChannel.send(JSON.stringify(contextUpdate));
}

async function handleDataChannelMessage(event) {
    try {
        const msg = JSON.parse(event.data);

        // Handle function calls
        if (msg.type === 'response.function_call_arguments.done') {
            if (msg.name === 'getLatestArxivPapers') {
                const args = JSON.parse(msg.arguments);
                try {
                    const papers = await getLatestArxivPapers(args.query, args.maxResults);
                    const responseEvent = {
                        type: 'conversation.item.create',
                        item: {
                            type: 'function_call_output',
                            call_id: msg.call_id,
                            output: JSON.stringify(papers)
                        }
                    };
                    dataChannel.send(JSON.stringify(responseEvent));
                    // Ask the model to respond to the function output
                    dataChannel.send(JSON.stringify({ type: 'response.create' }));
                    addMessage(`Found ${papers.length} papers for query: ${args.query}`);
                } catch (error) {
                    addMessage(`Error fetching papers: ${error.message}`, 'error');
                }
            }
        } else {
            addMessage(`AI: ${msg.content || JSON.stringify(msg)}`);
        }
    } catch (error) {
        console.error('Error parsing message:', error);
        addMessage(`Error parsing message: ${error.message}`, 'error');
    }
}

function cleanup() {
    try {
        // Stop all media tracks
        if (mediaStream) {
            mediaStream.getTracks().forEach(track => {
                track.stop();
            });
            mediaStream = null;
        }

        // Clean up audio element
        if (audioElement) {
            audioElement.srcObject = null;
            audioElement.remove();
            audioElement = null;
        }

        // Close data channel
        if (dataChannel) {
            dataChannel.close();
            dataChannel = null;
        }

        // Close peer connection
        if (peerConnection) {
            peerConnection.close();
            peerConnection = null;
        }

        updateStatus('Disconnected');
        startButton.disabled = false;
        stopButton.disabled = true;
        addMessage('Connection closed and resources cleaned up');
    } catch
    (error) {
        console.error('Error during cleanup:', error);
        addMessage(`Error during cleanup: ${error.message}`, 'error');
    }
}

// Event listeners
startButton.addEventListener('click', initializeWebRTC);
stopButton.addEventListener('click', cleanup);

// Initial setup
updateStatus('Disconnected');
addMessage('Ready to start voice chat');
--------------------------------------------------------------------------------
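Addendum: the Atom feed shape that `getLatestArxivPapers` parses can be checked offline. A small Python sketch that mirrors the same field extraction (`title`, `authors`, `summary`, `link`, `published`) using only the standard library — the sample `<entry>` below is illustrative, not a real arXiv record:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # arXiv responses use the Atom namespace

# Illustrative sample of one Atom <entry>; real arXiv responses carry many.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example Paper</title>
    <summary>A short abstract.</summary>
    <published>2025-01-01T00:00:00Z</published>
    <author><name>Ada Lovelace</name></author>
  </entry>
</feed>"""

def parse_papers(xml_text):
    """Extract the same fields app.js pulls from each <entry>."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM_NS}title", "").strip(),
            "authors": [a.findtext(f"{ATOM_NS}name", "").strip()
                        for a in entry.findall(f"{ATOM_NS}author")],
            "summary": entry.findtext(f"{ATOM_NS}summary", "").strip(),
            "link": entry.findtext(f"{ATOM_NS}id", "").strip(),
            "published": entry.findtext(f"{ATOM_NS}published", "").strip(),
        })
    return papers

print(parse_papers(SAMPLE)[0]["title"])  # prints "An Example Paper"
```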