├── .env ├── README.md ├── mainpy └── requirements.txt /.env: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY= 2 | SMS_KEY= -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | AI-Call 🔊🤖 – AI-Powered Calling Application 2 | AI-Call is an open-source, AI-powered calling application that leverages voice synthesis and real-time communication technologies to simulate intelligent phone conversations. Designed to bring automation, personalization, and intelligence into phone-based interactions, AI-Call is perfect for developers, researchers, and businesses interested in conversational AI, virtual assistants, or AI customer support systems. 3 | 4 | 🚀 Features 5 | 🔊 AI-Powered Voice Calling: Simulate real-time phone calls with AI-generated speech. 6 | 7 | 🧠 Natural Language Processing (NLP): Understand and generate human-like responses. 8 | 9 | 🕹️ Real-Time Communication: Built using cutting-edge WebRTC and Twilio technologies. 10 | 11 | 🌐 Voice Cloning & Text-to-Speech (TTS): Customize call responses with your own voice model. 12 | 13 | 💡 Open Source and Developer-Friendly: Fully customizable for your use case. 14 | 15 | 🛠️ Tech Stack 16 | Frontend: HTML, CSS, JavaScript 17 | 18 | Backend: Node.js, Express 19 | 20 | AI & Voice: OpenAI APIs, ElevenLabs TTS 21 | 22 | Telephony: Twilio Programmable Voice 23 | 24 | 📦 Installation 25 | bash 26 | Copy 27 | Edit 28 | git clone https://github.com/amanp8l/ai-call.git 29 | cd ai-call 30 | npm install 31 | Add a .env file with the following variables: 32 | 33 | env 34 | Copy 35 | Edit 36 | PORT=3000 37 | TWILIO_ACCOUNT_SID=your_account_sid 38 | TWILIO_AUTH_TOKEN=your_auth_token 39 | TWILIO_PHONE_NUMBER=your_twilio_phone_number 40 | OPENAI_API_KEY=your_openai_api_key 41 | ELEVENLABS_API_KEY=your_elevenlabs_api_key 42 | 🚴‍♀️ Getting Started 43 | Start the server locally: 44 | 45 | bash 46 | Copy 47 | Edit 48 | npm start 49 | Open your browser and go to http://localhost:3000 to access the AI-powered call interface. 50 | 51 | 🔧 How It Works 52 | User initiates a call. 53 | 54 | Twilio routes the call to the AI server. 55 | 56 | The server uses OpenAI to generate responses. 57 | 58 | ElevenLabs converts text responses to speech. 59 | 60 | The speech is sent back to the caller in real time. 61 | 62 | 🎯 Use Cases 63 | 📞 AI Virtual Agents for customer support 64 | 65 | 🧪 Conversational AI Experiments 66 | 67 | 🗓️ Automated Appointment Reminders 68 | 69 | 💬 Voice-enabled Chatbots 70 | 71 | 🧑‍💻 Contribution 72 | We welcome contributions! Here’s how to get started: 73 | 74 | Fork the repository. 75 | 76 | Create a new branch (git checkout -b feature/my-feature) 77 | 78 | Make your changes and commit (git commit -am 'Add new feature') 79 | 80 | Push to the branch (git push origin feature/my-feature) 81 | 82 | Create a pull request. 83 | 84 | 📜 License 85 | This project is licensed under the MIT License. 86 | 87 | 📈 SEO Keywords 88 | AI calling app, open source voice bot, Twilio AI integration, Node.js call automation, text to speech calling app, AI virtual agent, ElevenLabs TTS app, real-time AI voice call, automated calling using GPT, openai twilio voice app 89 | 90 | 🌐 Live Demo & Docs 91 | Coming soon... 92 | -------------------------------------------------------------------------------- /mainpy: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import base64 4 | import asyncio 5 | import websockets 6 | from fastapi import FastAPI, WebSocket, Request 7 | from fastapi.responses import HTMLResponse, JSONResponse 8 | from fastapi.websockets import WebSocketDisconnect 9 | from twilio.twiml.voice_response import VoiceResponse, Connect, Say, Stream 10 | from dotenv import load_dotenv 11 | import requests 12 | 13 | load_dotenv() 14 | 15 | # Configuration 16 | OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') 17 | SMS_KEY = os.getenv('SMS_KEY') 18 | PORT = int(os.getenv('PORT', 8010)) 19 | from datetime import datetime 20 | current_datetime = datetime.now() 21 | current_date = current_datetime.strftime("%d-%m-%Y") 22 | current_time = current_datetime.strftime("%I:%M %p") 23 | SYSTEM_MESSAGE = """ 24 | I am Cosmo, a healthcare diagnostic expert from Redcliffe Labs. I can assist you with healthcare queries and test bookings. 25 | 26 | Test Booking Requirements: 27 | - Phone Number (must be valid) 28 | - City 29 | - Test Name 30 | - Preferred Date 31 | - Preferred Time 32 | - Collection Type (Home Collection/In-Clinic Collection) 33 | 34 | Key Behaviors: 35 | 1. Ask only one follow-up question at a time to gather required information 36 | 2. After test selection: 37 | - Suggest preparation guidelines 38 | - Recommend optimal timing based on current date ({current_date}) and time ({current_time}) 39 | 3. Language Protocol: 40 | - Default: English 41 | - Switch to Hindi only if user communicates in Hindi/Hinglish 42 | 4. Persona: Female healthcare expert 43 | 44 | Sample Interaction Flow: 45 | User: "Can you book a test?" 46 | Cosmo: "Of course! Are you booking this test for yourself or someone else?" 47 | [If for self: Use existing user details] 48 | [If for others: Collect name and email] 49 | 50 | Booking Confirmation: 51 | - Summarize all collected details 52 | - Confirm booking 53 | - Mention that a confirmation email will be sent 54 | 55 | Note: Always verify phone numbers for validity before proceeding with booking. 56 | """ 57 | VOICE = 'alloy' 58 | LOG_EVENT_TYPES = [ 59 | 'error', 'response.content.done', 'rate_limits.updated', 60 | 'response.done', 'input_audio_buffer.committed', 61 | 'input_audio_buffer.speech_stopped', 'input_audio_buffer.speech_started', 62 | 'session.created' 63 | ] 64 | SHOW_TIMING_MATH = False 65 | 66 | app = FastAPI() 67 | 68 | if not OPENAI_API_KEY: 69 | raise ValueError('Missing the OpenAI API key. Please set it in the .env file.') 70 | 71 | @app.get("/", response_class=JSONResponse) 72 | async def index_page(): 73 | return {"message": "Twilio Media Stream Server is running 555!"} 74 | 75 | @app.api_route("/incoming-call", methods=["GET", "POST"]) 76 | async def handle_incoming_call(request: Request): 77 | """Handle incoming call and return TwiML response to connect to Media Stream.""" 78 | body = await request.body() 79 | print("Headers:", request.headers) 80 | print("Body:", body.decode()) 81 | response = VoiceResponse() 82 | # punctuation to improve text-to-speech flow 83 | response.say("Hello there! I am an AI call assistant created by Aman Patel") 84 | response.pause(length=1) 85 | response.say("O.K. you can start talking!") 86 | host = "api.amanpatel.in" #request.url.hostname 87 | connect = Connect() 88 | connect.stream(url=f'wss://{host}/media-stream') 89 | response.append(connect) 90 | return HTMLResponse(content=str(response), media_type="application/xml") 91 | 92 | @app.websocket("/media-stream") 93 | async def handle_media_stream(websocket: WebSocket): 94 | """Handle WebSocket connections between Twilio and OpenAI.""" 95 | print("Client connected") 96 | url = "https://www.fast2sms.com/dev/bulkV2" 97 | api_key = "WOfGCbYeFN7rynSxqPRpXMhJZu2g406mjBi3tlavUdkQ5IK1A9CqMnKYXaVxAEWd6N1u3OfTP9mSG7Je" 98 | msg='Your test booking is confirmed! Details: \nPhone: 9462255025 \nCity: Bangalore \nTest: Blood Test \nDate: 06-12-2024 \nTime: 10:00AM \nCollection: In-Clinic Collection \nThank you for choosing us!' 99 | querystring = { 100 | "authorization": api_key, 101 | "message": msg, 102 | "language": "english", 103 | "route": "q", 104 | "numbers": "9462255025" 105 | } 106 | headers = { 107 | 'cache-control': "no-cache" 108 | } 109 | response = requests.request("GET", url, headers=headers, params=querystring) 110 | print(response.status_code) 111 | print(response.text) 112 | await websocket.accept() 113 | 114 | async with websockets.connect( 115 | 'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01', 116 | extra_headers={ 117 | "Authorization": f"Bearer {OPENAI_API_KEY}", 118 | "OpenAI-Beta": "realtime=v1" 119 | } 120 | ) as openai_ws: 121 | await initialize_session(openai_ws) 122 | 123 | # Connection specific state 124 | stream_sid = None 125 | latest_media_timestamp = 0 126 | last_assistant_item = None 127 | mark_queue = [] 128 | response_start_timestamp_twilio = None 129 | 130 | async def receive_from_twilio(): 131 | """Receive audio data from Twilio and send it to the OpenAI Realtime API.""" 132 | nonlocal stream_sid, latest_media_timestamp 133 | try: 134 | async for message in websocket.iter_text(): 135 | data = json.loads(message) 136 | if data['event'] == 'media' and openai_ws.open: 137 | latest_media_timestamp = int(data['media']['timestamp']) 138 | audio_append = { 139 | "type": "input_audio_buffer.append", 140 | "audio": data['media']['payload'] 141 | } 142 | await openai_ws.send(json.dumps(audio_append)) 143 | elif data['event'] == 'start': 144 | stream_sid = data['start']['streamSid'] 145 | print(f"Incoming stream has started {stream_sid}") 146 | response_start_timestamp_twilio = None 147 | latest_media_timestamp = 0 148 | last_assistant_item = None 149 | elif data['event'] == 'mark': 150 | if mark_queue: 151 | mark_queue.pop(0) 152 | except WebSocketDisconnect: 153 | print("Client disconnected.") 154 | if openai_ws.open: 155 | await openai_ws.close() 156 | 157 | async def send_to_twilio(): 158 | """Receive events from the OpenAI Realtime API, send audio back to Twilio.""" 159 | nonlocal stream_sid, last_assistant_item, response_start_timestamp_twilio 160 | try: 161 | async for openai_message in openai_ws: 162 | response = json.loads(openai_message) 163 | if response['type'] in LOG_EVENT_TYPES: 164 | print(f"Received event: {response['type']}", response) 165 | 166 | if response.get('type') == 'response.audio.delta' and 'delta' in response: 167 | audio_payload = base64.b64encode(base64.b64decode(response['delta'])).decode('utf-8') 168 | audio_delta = { 169 | "event": "media", 170 | "streamSid": stream_sid, 171 | "media": { 172 | "payload": audio_payload 173 | } 174 | } 175 | await websocket.send_json(audio_delta) 176 | 177 | if response_start_timestamp_twilio is None: 178 | response_start_timestamp_twilio = latest_media_timestamp 179 | if SHOW_TIMING_MATH: 180 | print(f"Setting start timestamp for new response: {response_start_timestamp_twilio}ms") 181 | 182 | # Update last_assistant_item safely 183 | if response.get('item_id'): 184 | last_assistant_item = response['item_id'] 185 | 186 | await send_mark(websocket, stream_sid) 187 | 188 | # Trigger an interruption. Your use case might work better using `input_audio_buffer.speech_stopped`, or combining the two. 189 | if response.get('type') == 'input_audio_buffer.speech_started': 190 | print("Speech started detected.") 191 | if last_assistant_item: 192 | print(f"Interrupting response with id: {last_assistant_item}") 193 | await handle_speech_started_event() 194 | except Exception as e: 195 | print(f"Error in send_to_twilio: {e}") 196 | 197 | async def handle_speech_started_event(): 198 | """Handle interruption when the caller's speech starts.""" 199 | nonlocal response_start_timestamp_twilio, last_assistant_item 200 | print("Handling speech started event.") 201 | if mark_queue and response_start_timestamp_twilio is not None: 202 | elapsed_time = latest_media_timestamp - response_start_timestamp_twilio 203 | if SHOW_TIMING_MATH: 204 | print(f"Calculating elapsed time for truncation: {latest_media_timestamp} - {response_start_timestamp_twilio} = {elapsed_time}ms") 205 | 206 | if last_assistant_item: 207 | if SHOW_TIMING_MATH: 208 | print(f"Truncating item with ID: {last_assistant_item}, Truncated at: {elapsed_time}ms") 209 | 210 | truncate_event = { 211 | "type": "conversation.item.truncate", 212 | "item_id": last_assistant_item, 213 | "content_index": 0, 214 | "audio_end_ms": elapsed_time 215 | } 216 | await openai_ws.send(json.dumps(truncate_event)) 217 | 218 | await websocket.send_json({ 219 | "event": "clear", 220 | "streamSid": stream_sid 221 | }) 222 | 223 | mark_queue.clear() 224 | last_assistant_item = None 225 | response_start_timestamp_twilio = None 226 | 227 | async def send_mark(connection, stream_sid): 228 | if stream_sid: 229 | mark_event = { 230 | "event": "mark", 231 | "streamSid": stream_sid, 232 | "mark": {"name": "responsePart"} 233 | } 234 | await connection.send_json(mark_event) 235 | mark_queue.append('responsePart') 236 | 237 | await asyncio.gather(receive_from_twilio(), send_to_twilio()) 238 | 239 | async def send_initial_conversation_item(openai_ws): 240 | """Send initial conversation item if AI talks first.""" 241 | initial_conversation_item = { 242 | "type": "conversation.item.create", 243 | "item": { 244 | "type": "message", 245 | "role": "user", 246 | "content": [ 247 | { 248 | "type": "input_text", 249 | "text": "Hello there! I am an AI call assistant created by Aman Patel. How can I help you?'" 250 | } 251 | ] 252 | } 253 | } 254 | await openai_ws.send(json.dumps(initial_conversation_item)) 255 | await openai_ws.send(json.dumps({"type": "response.create"})) 256 | 257 | 258 | async def initialize_session(openai_ws): 259 | """Control initial session with OpenAI.""" 260 | session_update = { 261 | "type": "session.update", 262 | "session": { 263 | "turn_detection": {"type": "server_vad"}, 264 | "input_audio_format": "g711_ulaw", 265 | "output_audio_format": "g711_ulaw", 266 | "voice": VOICE, 267 | "instructions": SYSTEM_MESSAGE, 268 | "modalities": ["text", "audio"], 269 | "temperature": 0.8, 270 | } 271 | } 272 | print('Sending session update:', json.dumps(session_update)) 273 | await openai_ws.send(json.dumps(session_update)) 274 | 275 | # Uncomment the next line to have the AI speak first 276 | # await send_initial_conversation_item(openai_ws) 277 | 278 | async def send_sms(): 279 | url = "https://www.fast2sms.com/dev/bulkV2" 280 | api_key = SMS_KEY 281 | msg='Your test booking is confirmed! Details: \nPhone: 9462255025 \nCity: Bangalore \nTest: Blood Test \nDate: 06-12-2024 \nTime: 10:00AM \nCollection: In-Clinic Collection \nThank you for choosing us!' 282 | querystring = { 283 | "authorization": api_key, 284 | "message": msg, 285 | "language": "english", 286 | "route": "q", 287 | "numbers": "9462255025" 288 | } 289 | headers = { 290 | 'cache-control': "no-cache" 291 | } 292 | response = requests.request("GET", url, headers=headers, params=querystring) 293 | return response.json() 294 | if __name__ == "__main__": 295 | import uvicorn 296 | uvicorn.run(app, host="127.0.0.1", port=PORT) 297 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohappyeyeballs==2.4.0 2 | aiohttp==3.10.6 3 | aiohttp-retry==2.8.3 4 | aiosignal==1.3.1 5 | annotated-types==0.7.0 6 | anyio==4.6.0 7 | async-timeout==4.0.3 8 | attrs==24.2.0 9 | certifi==2024.8.30 10 | charset-normalizer==3.3.2 11 | click==8.1.7 12 | exceptiongroup==1.2.2 13 | fastapi==0.115.0 14 | frozenlist==1.4.1 15 | h11==0.14.0 16 | idna==3.10 17 | multidict==6.1.0 18 | pydantic==2.9.2 19 | pydantic_core==2.23.4 20 | PyJWT==2.9.0 21 | python-dotenv==1.0.1 22 | requests==2.32.3 23 | sniffio==1.3.1 24 | starlette==0.38.6 25 | twilio==9.3.2 26 | typing_extensions==4.12.2 27 | urllib3==2.2.3 28 | uvicorn==0.30.6 29 | websockets==13.1 30 | yarl==1.12.1 31 | requests 32 | --------------------------------------------------------------------------------