├── LICENCE
├── README.md
├── main.py
├── requirements.txt
└── static
    ├── index.html
    └── js
        └── app.js
/LICENCE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 echohive
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Real-time Voice Chat Template
2 |
3 | A bare-bones template for building real-time voice applications with OpenAI's Realtime API and WebRTC. Features built-in arXiv paper search and customizable AI instructions.
4 |
5 | ## Features
6 |
7 | - Real-time voice communication with AI
8 | - Built-in arXiv paper search functionality
9 | - Live event display
10 | - Customizable AI instructions
11 | - Clean, dark-themed UI
12 | - WebRTC-based audio streaming
13 |
14 | ## ❤️ Support & Get 400+ AI Projects
15 |
16 | This is one of 400+ fascinating projects in my collection! [Support me on Patreon](https://www.patreon.com/c/echohive42/membership) to get:
17 |
18 | - 🎯 Access to 400+ AI projects (and growing daily!)
19 | - Including advanced projects like [2 Agent Real-time voice template with turn taking](https://www.patreon.com/posts/2-agent-real-you-118330397)
20 | - 📥 Full source code & detailed explanations
21 | - 📚 1000x Cursor Course
22 | - 🎓 Live coding sessions & AMAs
23 | - 💬 1-on-1 consultations (higher tiers)
24 | - 🎁 Exclusive discounts on AI tools & platforms (up to $180 value)
25 |
26 | ## Technical Stack
27 |
28 | - **Backend**: FastAPI
29 | - **Frontend**: HTML, JavaScript, TailwindCSS, DaisyUI
30 | - **Real-time Communication**: WebRTC
31 | - **API Integration**: OpenAI Realtime API, arXiv API
32 | - **Styling**: Dark mode with animations
33 |
34 | ## Custom Instructions System
35 |
36 | The template includes a two-layer instruction system:
37 |
38 | 1. **Backend Instructions** (`main.py`):
39 |
40 |    - Basic AI behavior guidelines
41 |    - Function-specific instructions
42 |    - Voice interaction rules
43 |
44 | 2. **Frontend Instructions** (`app.js`):
45 |
46 |    - Detailed AI role configuration
47 |    - Context and guidelines
48 |    - Function usage specifications
49 |    - Real-time session updates (see the sketch below)
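
A minimal sketch of the frontend layer in practice, assuming an open WebRTC data channel named `dataChannel` (as in `static/js/app.js`):

```javascript
// Sketch: apply the frontend instruction layer to a live session.
// The Realtime API expects `instructions` to be a single string;
// function definitions can be attached under `tools` in the same event.
const sessionUpdate = {
    type: 'session.update',
    session: {
        instructions: 'You are a helpful AI assistant with expertise in scientific research.',
        tools: [] // see configureTools() in static/js/app.js for a full tool definition
    }
};
dataChannel.send(JSON.stringify(sessionUpdate));
```
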
50 | ## arXiv Integration
51 |
52 | Built-in functionality to search for and fetch the latest papers (see the sketch below):
53 |
54 | - Real-time paper search
55 | - Automatic result parsing
56 | - Title, authors, summary, and link extraction
57 | - Configurable result limits
58 |
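
Under the hood this is a plain GET against the arXiv Atom feed, parsed in the browser with `DOMParser` (see `getLatestArxivPapers` in `static/js/app.js`). A trimmed-down sketch, runnable in the browser console:

```javascript
// Sketch: fetch the 3 most recent arXiv entries matching a query.
const query = encodeURIComponent('neural networks');
const url = `https://export.arxiv.org/api/query?search_query=all:${query}` +
    `&start=0&max_results=3&sortBy=submittedDate&sortOrder=descending`;
const xml = await (await fetch(url)).text();
const doc = new DOMParser().parseFromString(xml, 'text/xml');
const titles = Array.from(doc.getElementsByTagName('entry'))
    .map(entry => entry.getElementsByTagName('title')[0]?.textContent?.trim());
console.log(titles);
```
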
59 | ## Setup
60 |
61 | 1. Set your OpenAI API key:
62 |
63 | ```bash
64 | # Windows (Command Prompt)
65 | set OPENAI_API_KEY=your_api_key_here
66 |
67 | # Linux/Mac
68 | export OPENAI_API_KEY=your_api_key_here
69 | ```
70 |
71 | 2. Install dependencies:
72 |
73 | ```bash
74 | pip install -r requirements.txt
75 | ```
76 |
77 | 3. Run the application:
78 |
79 | ```bash
80 | python main.py
81 | ```
82 |
83 | 4. Open in browser:
84 |
85 | ```
86 | http://127.0.0.1:8000
87 | ```
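
To sanity-check the backend before starting a voice chat, you can hit the `/session` endpoint directly (for example from the browser console); a successful response contains the ephemeral `client_secret.value` the frontend uses to authenticate with OpenAI:

```javascript
// Sketch: verify that the backend can mint an ephemeral Realtime session.
const session = await (await fetch('http://127.0.0.1:8000/session')).json();
console.log(session.client_secret?.value ? 'OK: got ephemeral token' : session);
```
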
88 |
89 | ## Usage Examples
90 |
91 | 1. **Start Voice Chat**:
92 |    - Click "Start Voice Chat"
93 |    - Grant microphone permissions
94 |    - Wait for connection confirmation
95 |
96 | 2. **Search Papers**:
97 |    - "Find recent papers about machine learning"
98 |    - "Show me the latest research on AI"
99 |    - "Search for papers about neural networks"
100 |
101 | 3. **End Session**:
102 |    - Click "Stop Voice Chat"
103 |    - Resources are automatically cleaned up
104 |
105 |
106 | ## Requirements
107 |
108 | - Python 3.7+
109 | - Modern web browser with WebRTC support
110 | - Microphone access
111 | - OpenAI API key
112 |
113 | ## Customization
114 |
115 | 1. **Modify AI Instructions**:
116 |    - Edit `DEFAULT_INSTRUCTIONS` in `main.py`
117 |    - Update `SESSION_CONFIG` in `app.js`
118 |
119 | 2. **Adjust arXiv Settings** (see the sketch below):
120 |    - Modify the `maxResults` parameter
121 |    - Customize paper formatting
122 |    - Add additional search parameters
123 |
124 | 3. **UI Customization**:
125 |    - Modify TailwindCSS classes
126 |    - Update the DaisyUI theme
127 |    - Customize animations
128 |
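For example, widening the paper search is a small change to `getLatestArxivPapers` in `static/js/app.js`; the category filter below is a hypothetical addition using the arXiv API's `cat:` field:

```javascript
// Sketch: raise the default result count and restrict results to one
// arXiv category ('cat:cs.LG' is machine learning; the filter is optional).
async function getLatestArxivPapers(query = 'AI', maxResults = 10) {
    const url = `https://export.arxiv.org/api/query` +
        `?search_query=all:${encodeURIComponent(query)}+AND+cat:cs.LG` +
        `&start=0&max_results=${maxResults}&sortBy=submittedDate&sortOrder=descending`;
    const xml = await (await fetch(url)).text();
    return new DOMParser().parseFromString(xml, 'text/xml'); // parse entries as in the original
}
```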
129 |
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | from fastapi import FastAPI
2 | from fastapi.responses import JSONResponse, FileResponse
3 | from fastapi.staticfiles import StaticFiles
4 | import os
5 | import httpx
6 | from termcolor import colored
7 |
8 | # Constants
9 | OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
10 | MODEL = "gpt-4o-realtime-preview-2024-12-17"
11 | DEFAULT_INSTRUCTIONS = """You are a helpful AI assistant. You can engage in voice conversations and help with various tasks.
12 | When using the getLatestArxivPapers function:
13 | 1. Always inform the user before making the call
14 | 2. Summarize the findings in a clear, concise way
15 | 3. Focus on the most relevant papers
16 | 4. Provide brief insights about each paper"""
17 |
18 | # Initialize FastAPI
19 | app = FastAPI(title="Voice Chat")
20 |
21 | # Mount static files
22 | app.mount("/static", StaticFiles(directory="static"), name="static")
23 |
24 | @app.get("/")
25 | async def read_root():
26 | try:
27 | print(colored("Serving index page...", "green"))
28 | return FileResponse("static/index.html")
29 | except Exception as e:
30 | print(colored(f"Error serving index page: {str(e)}", "red"))
31 | return JSONResponse(
32 | status_code=500,
33 | content={"error": "Internal server error"}
34 | )
35 |
36 | @app.get("/session")
37 | async def create_session():
38 | try:
39 | print(colored("Creating new session...", "cyan"))
40 | async with httpx.AsyncClient() as client:
41 | response = await client.post(
42 | "https://api.openai.com/v1/realtime/sessions",
43 | headers={
44 | "Authorization": f"Bearer {OPENAI_API_KEY}",
45 | "Content-Type": "application/json"
46 | },
47 | json={
48 | "model": MODEL,
49 | "voice": "verse",
50 | "instructions": DEFAULT_INSTRUCTIONS
51 | }
52 | )
53 | print(colored("Session created successfully!", "green"))
54 | return response.json()
55 | except Exception as e:
56 | print(colored(f"Error creating session: {str(e)}", "red"))
57 | return JSONResponse(
58 | status_code=500,
59 | content={"error": "Failed to create session"}
60 | )
61 |
62 | if __name__ == "__main__":
63 | import uvicorn
64 | print(colored("Starting server...", "yellow"))
65 | uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | fastapi
2 | uvicorn
3 | httpx
4 | termcolor
--------------------------------------------------------------------------------
/static/index.html:
--------------------------------------------------------------------------------
1 | <!DOCTYPE html>
2 | <!-- NOTE: the original markup, including its inline <style> and CDN <link>/<script>
3 |      tags, was lost when this file was captured. The skeleton below is a
4 |      reconstruction: it keeps the recoverable text ("Voice Chat", "Disconnected")
5 |      and the element IDs and classes that static/js/app.js expects; the CDN
6 |      builds of TailwindCSS and DaisyUI are assumptions. -->
7 | <html lang="en" data-theme="dark">
8 | <head>
9 |     <meta charset="UTF-8">
10 |     <meta name="viewport" content="width=device-width, initial-scale=1.0">
11 |     <title>Voice Chat</title>
12 |     <script src="https://cdn.tailwindcss.com"></script>
13 |     <link href="https://cdn.jsdelivr.net/npm/daisyui@4/dist/full.min.css" rel="stylesheet" type="text/css">
14 |     <style>
15 |         /* Placeholder for the lost inline styles: app.js toggles .message / .show */
16 |         .message { opacity: 0; transform: translateY(4px); transition: all 0.3s ease; }
17 |         .message.show { opacity: 1; transform: translateY(0); }
18 |     </style>
19 | </head>
20 | <body class="min-h-screen bg-base-200 flex items-center justify-center">
21 |     <div class="card bg-base-100 shadow-xl p-6 w-full max-w-xl">
22 |         <h1 class="text-2xl font-bold text-center">Voice Chat</h1>
23 |         <div class="flex items-center justify-center gap-2 mt-4">
24 |             <div id="statusIndicator" class="w-3 h-3 rounded-full bg-error"></div>
25 |             <span id="statusText">Disconnected</span>
26 |         </div>
27 |         <div class="flex justify-center gap-4 mt-4">
28 |             <button id="startButton" class="btn btn-primary">Start Voice Chat</button>
29 |             <button id="stopButton" class="btn btn-error" disabled>Stop Voice Chat</button>
30 |         </div>
31 |         <div id="loadingAnimation" class="hidden mt-4 text-center">
32 |             <span class="loading loading-dots loading-lg"></span>
33 |         </div>
34 |         <div id="messages" class="mt-4 space-y-2 max-h-64 overflow-y-auto"></div>
35 |     </div>
36 |     <script src="/static/js/app.js"></script>
37 | </body>
38 | </html>
--------------------------------------------------------------------------------
/static/js/app.js:
--------------------------------------------------------------------------------
1 | // DOM Elements
2 | const startButton = document.getElementById('startButton');
3 | const stopButton = document.getElementById('stopButton');
4 | const statusIndicator = document.getElementById('statusIndicator');
5 | const statusText = document.getElementById('statusText');
6 | const loadingAnimation = document.getElementById('loadingAnimation');
7 | const messagesDiv = document.getElementById('messages');
8 |
9 | // Global variables
10 | let peerConnection = null;
11 | let dataChannel = null;
12 | let mediaStream = null;
13 | let audioElement = null;
14 |
15 | // Session configuration
16 | // NOTE: the Realtime API expects `instructions` to be a single string, so the
17 | // structured prompt below is flattened with join('\n') before it is sent.
18 | const SESSION_CONFIG = {
19 |     modalities: ['text', 'audio'],
20 |     instructions: [
21 |         'You are a helpful AI assistant with expertise in scientific research and paper analysis.',
22 |         'You can engage in voice conversations and help users find and understand scientific papers.',
23 |         '',
24 |         'Guidelines:',
25 |         '- Always acknowledge user requests clearly',
26 |         '- When searching for papers, explain what you are looking for',
27 |         '- Provide concise but informative summaries',
28 |         '- Focus on the most relevant and recent findings',
29 |         '- Highlight key insights and potential applications',
30 |         '',
31 |         'When using the getLatestArxivPapers function:',
32 |         '- Inform the user before initiating the search',
33 |         '- Summarize the findings and provide insights after the call',
34 |         '- Present papers in a clear, structured format'
35 |     ].join('\n')
36 | };
37 |
38 | // Helper functions
39 | function updateStatus(status, isConnected = false) {
40 | statusText.textContent = status;
41 | statusIndicator.className = `w-3 h-3 rounded-full ${isConnected ? 'bg-success' : 'bg-error'}`;
42 | }
43 |
44 | function addMessage(text, type = 'info') {
45 | const messageDiv = document.createElement('div');
46 | messageDiv.className = `message ${type}`;
47 | messageDiv.textContent = text;
48 | messagesDiv.appendChild(messageDiv);
49 |
50 | // Force a reflow to ensure the animation triggers
51 | messageDiv.offsetHeight;
52 |
53 | // Add show class after a small delay to trigger the animation
54 | requestAnimationFrame(() => {
55 | messageDiv.classList.add('show');
56 | });
57 |
58 | // Smooth scroll to the new message
59 | messageDiv.scrollIntoView({ behavior: 'smooth', block: 'end' });
60 |
61 | // Limit the number of messages to prevent memory issues
62 | const messages = messagesDiv.getElementsByClassName('message');
63 | if (messages.length > 100) {
64 | messagesDiv.removeChild(messages[0]);
65 | }
66 | }
67 |
68 | function showLoading(show) {
69 | loadingAnimation.className = show ? 'mt-4' : 'hidden mt-4';
70 | startButton.disabled = show;
71 | }
72 |
73 | // arXiv paper fetching function
74 | async function getLatestArxivPapers(query = 'AI', maxResults = 5) {
75 | try {
76 |         const response = await fetch(`https://export.arxiv.org/api/query?search_query=all:${encodeURIComponent(query)}&start=0&max_results=${maxResults}&sortBy=submittedDate&sortOrder=descending`);
77 | const xmlText = await response.text();
78 | const parser = new DOMParser();
79 | const xmlDoc = parser.parseFromString(xmlText, 'text/xml');
80 |
81 | const entries = xmlDoc.getElementsByTagName('entry');
82 | const papers = Array.from(entries).map(entry => ({
83 | title: entry.getElementsByTagName('title')[0]?.textContent?.trim(),
84 | authors: Array.from(entry.getElementsByTagName('author')).map(author =>
85 | author.getElementsByTagName('name')[0]?.textContent?.trim()
86 | ),
87 | summary: entry.getElementsByTagName('summary')[0]?.textContent?.trim(),
88 | link: entry.getElementsByTagName('id')[0]?.textContent?.trim(),
89 | published: entry.getElementsByTagName('published')[0]?.textContent?.trim()
90 | }));
91 |
92 | return papers;
93 | } catch (error) {
94 | console.error('Error fetching arXiv papers:', error);
95 | throw error;
96 | }
97 | }
98 |
99 | // WebRTC implementation
100 | async function initializeWebRTC() {
101 | try {
102 | showLoading(true);
103 | updateStatus('Initializing...');
104 |
105 | // Get ephemeral token
106 | const tokenResponse = await fetch('/session');
107 | const data = await tokenResponse.json();
108 |
109 | if (!data.client_secret?.value) {
110 | throw new Error('Failed to get ephemeral token');
111 | }
112 |
113 | const EPHEMERAL_KEY = data.client_secret.value;
114 |
115 | // Create peer connection
116 | peerConnection = new RTCPeerConnection();
117 |
118 | // Set up audio element
119 | audioElement = document.createElement('audio');
120 | audioElement.autoplay = true;
121 | peerConnection.ontrack = e => audioElement.srcObject = e.streams[0];
122 |
123 | // Add local audio track
124 | mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
125 | peerConnection.addTrack(mediaStream.getTracks()[0]);
126 |
127 | // Set up data channel
128 | dataChannel = peerConnection.createDataChannel('oai-events');
129 | dataChannel.addEventListener('message', handleDataChannelMessage);
130 | dataChannel.addEventListener('open', () => {
131 | updateStatus('Connected', true);
132 | stopButton.disabled = false;
133 | startButton.disabled = true;
134 | addMessage('Connection established');
135 | configureTools(); // Configure available tools after connection
136 | });
137 |
138 | // Create and set local description
139 | const offer = await peerConnection.createOffer();
140 | await peerConnection.setLocalDescription(offer);
141 |
142 | // Connect to OpenAI Realtime API
143 | const baseUrl = 'https://api.openai.com/v1/realtime';
144 | const model = 'gpt-4o-realtime-preview-2024-12-17';
145 | const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
146 | method: 'POST',
147 | body: offer.sdp,
148 | headers: {
149 | Authorization: `Bearer ${EPHEMERAL_KEY}`,
150 | 'Content-Type': 'application/sdp'
151 | },
152 | });
153 |
154 | if (!sdpResponse.ok) {
155 | throw new Error('Failed to connect to OpenAI Realtime API');
156 | }
157 |
158 | const answer = {
159 | type: 'answer',
160 | sdp: await sdpResponse.text(),
161 | };
162 | await peerConnection.setRemoteDescription(answer);
163 |
164 | showLoading(false);
165 | } catch (error) {
166 | console.error('Error:', error);
167 | showLoading(false);
168 | updateStatus('Error: ' + error.message);
169 | addMessage(error.message, 'error');
170 | }
171 | }
172 |
173 | function configureTools() {
174 | const event = {
175 | type: 'session.update',
176 | session: {
177 | ...SESSION_CONFIG,
178 | tools: [
179 | {
180 | type: 'function',
181 | name: 'getLatestArxivPapers',
182 | description: 'Get the latest papers from arXiv based on a search query. Always inform the user before searching and provide insights after finding papers.',
183 | parameters: {
184 | type: 'object',
185 | properties: {
186 | query: {
187 | type: 'string',
188 | description: 'Search query for papers (e.g., "AI", "machine learning")'
189 | },
190 | maxResults: {
191 | type: 'number',
192 | description: 'Maximum number of papers to return (default: 5)'
193 | }
194 | },
195 | required: ['query']
196 | }
197 | }
198 | ]
199 | }
200 | };
201 | dataChannel.send(JSON.stringify(event));
202 |
203 | // Send additional context update
204 |     const contextUpdate = {
205 |         type: 'conversation.item.create',
206 |         item: {
207 |             type: 'message', role: 'user',  // Realtime conversation items are 'message' objects with typed content parts
208 |             content: [{ type: 'input_text', text: 'Session configured with custom instructions. Ready to assist with paper searches and analysis.' }]
209 |         }
210 |     };
211 |     dataChannel.send(JSON.stringify(contextUpdate));
212 | }
213 |
214 | async function handleDataChannelMessage(event) {
215 | try {
216 | const msg = JSON.parse(event.data);
217 |
218 | // Handle function calls
219 | if (msg.type === 'response.function_call_arguments.done') {
220 | if (msg.name === 'getLatestArxivPapers') {
221 | const args = JSON.parse(msg.arguments);
222 | try {
223 | const papers = await getLatestArxivPapers(args.query, args.maxResults);
224 | const responseEvent = {
225 | type: 'conversation.item.create',
226 | item: {
227 | type: 'function_call_output',
228 | call_id: msg.call_id,
229 | output: JSON.stringify(papers)
230 | }
231 | };
232 |                     dataChannel.send(JSON.stringify(responseEvent)); dataChannel.send(JSON.stringify({ type: 'response.create' })); // response.create prompts the model to speak the tool output
233 | addMessage(`Found ${papers.length} papers for query: ${args.query}`);
234 | } catch (error) {
235 | addMessage(`Error fetching papers: ${error.message}`, 'error');
236 | }
237 | }
238 | } else {
239 |             addMessage(`Event: ${msg.type || JSON.stringify(msg)}`);
240 | }
241 | } catch (error) {
242 | console.error('Error parsing message:', error);
243 | addMessage(`Error parsing message: ${error.message}`, 'error');
244 | }
245 | }
246 |
247 | function cleanup() {
248 | try {
249 | // Stop all media tracks
250 | if (mediaStream) {
251 | mediaStream.getTracks().forEach(track => {
252 | track.stop();
253 | });
254 | mediaStream = null;
255 | }
256 |
257 | // Clean up audio element
258 | if (audioElement) {
259 | audioElement.srcObject = null;
260 | audioElement.remove();
261 | audioElement = null;
262 | }
263 |
264 | // Close data channel
265 | if (dataChannel) {
266 | dataChannel.close();
267 | dataChannel = null;
268 | }
269 |
270 | // Close peer connection
271 | if (peerConnection) {
272 | peerConnection.close();
273 | peerConnection = null;
274 | }
275 |
276 | updateStatus('Disconnected');
277 | startButton.disabled = false;
278 | stopButton.disabled = true;
279 | addMessage('Connection closed and resources cleaned up');
280 | } catch (error) {
281 | console.error('Error during cleanup:', error);
282 | addMessage(`Error during cleanup: ${error.message}`, 'error');
283 | }
284 | }
285 |
286 | // Event listeners
287 | startButton.addEventListener('click', initializeWebRTC);
288 | stopButton.addEventListener('click', cleanup);
289 |
290 | // Initial setup
291 | updateStatus('Disconnected');
292 | addMessage('Ready to start voice chat');
--------------------------------------------------------------------------------