├── .env
├── README.md
├── mainpy
└── requirements.txt


/.env:
--------------------------------------------------------------------------------
1 | OPENAI_API_KEY=
2 | SMS_KEY=


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | AI-Call 🔊🤖 – AI-Powered Calling Application
 2 | AI-Call is an open-source, AI-powered calling application that leverages voice synthesis and real-time communication technologies to simulate intelligent phone conversations. Designed to bring automation, personalization, and intelligence into phone-based interactions, AI-Call is perfect for developers, researchers, and businesses interested in conversational AI, virtual assistants, or AI customer support systems.
 3 | 
 4 | 🚀 Features
 5 | 🔊 AI-Powered Voice Calling: Simulate real-time phone calls with AI-generated speech.
 6 | 
 7 | 🧠 Natural Language Processing (NLP): Understand and generate human-like responses.
 8 | 
 9 | 🕹️ Real-Time Communication: Built using cutting-edge WebRTC and Twilio technologies.
10 | 
11 | 🌐 Voice Cloning & Text-to-Speech (TTS): Customize call responses with your own voice model.
12 | 
13 | 💡 Open Source and Developer-Friendly: Fully customizable for your use case.
14 | 
15 | 🛠️ Tech Stack
16 | Frontend: HTML, CSS, JavaScript
17 | 
18 | Backend: Node.js, Express
19 | 
20 | AI & Voice: OpenAI APIs, ElevenLabs TTS
21 | 
22 | Telephony: Twilio Programmable Voice
23 | 
24 | 📦 Installation
25 | bash
26 | Copy
27 | Edit
28 | git clone https://github.com/amanp8l/ai-call.git
29 | cd ai-call
30 | npm install
31 | Add a .env file with the following variables:
32 | 
33 | env
34 | Copy
35 | Edit
36 | PORT=3000
37 | TWILIO_ACCOUNT_SID=your_account_sid
38 | TWILIO_AUTH_TOKEN=your_auth_token
39 | TWILIO_PHONE_NUMBER=your_twilio_phone_number
40 | OPENAI_API_KEY=your_openai_api_key
41 | ELEVENLABS_API_KEY=your_elevenlabs_api_key
42 | 🚴‍♀️ Getting Started
43 | Start the server locally:
44 | 
45 | bash
46 | Copy
47 | Edit
48 | npm start
49 | Open your browser and go to http://localhost:3000 to access the AI-powered call interface.
50 | 
51 | 🔧 How It Works
52 | User initiates a call.
53 | 
54 | Twilio routes the call to the AI server.
55 | 
56 | The server uses OpenAI to generate responses.
57 | 
58 | ElevenLabs converts text responses to speech.
59 | 
60 | The speech is sent back to the caller in real time.
61 | 
62 | 🎯 Use Cases
63 | 📞 AI Virtual Agents for customer support
64 | 
65 | 🧪 Conversational AI Experiments
66 | 
67 | 🗓️ Automated Appointment Reminders
68 | 
69 | 💬 Voice-enabled Chatbots
70 | 
71 | 🧑‍💻 Contribution
72 | We welcome contributions! Here’s how to get started:
73 | 
74 | Fork the repository.
75 | 
76 | Create a new branch (git checkout -b feature/my-feature)
77 | 
78 | Make your changes and commit (git commit -am 'Add new feature')
79 | 
80 | Push to the branch (git push origin feature/my-feature)
81 | 
82 | Create a pull request.
83 | 
84 | 📜 License
85 | This project is licensed under the MIT License.
86 | 
87 | 📈 SEO Keywords
88 | AI calling app, open source voice bot, Twilio AI integration, Node.js call automation, text to speech calling app, AI virtual agent, ElevenLabs TTS app, real-time AI voice call, automated calling using GPT, openai twilio voice app
89 | 
90 | 🌐 Live Demo & Docs
91 | Coming soon...
92 | 


--------------------------------------------------------------------------------
/mainpy:
--------------------------------------------------------------------------------
  1 | import os
  2 | import json
  3 | import base64
  4 | import asyncio
  5 | import websockets
  6 | from fastapi import FastAPI, WebSocket, Request
  7 | from fastapi.responses import HTMLResponse, JSONResponse
  8 | from fastapi.websockets import WebSocketDisconnect
  9 | from twilio.twiml.voice_response import VoiceResponse, Connect, Say, Stream
 10 | from dotenv import load_dotenv
 11 | import requests
 12 | 
 13 | load_dotenv()
 14 | 
 15 | # Configuration
 16 | OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
 17 | SMS_KEY = os.getenv('SMS_KEY')
 18 | PORT = int(os.getenv('PORT', 8010))
 19 | from datetime import datetime
 20 | current_datetime = datetime.now()
 21 | current_date = current_datetime.strftime("%d-%m-%Y")
 22 | current_time = current_datetime.strftime("%I:%M %p")
 23 | SYSTEM_MESSAGE = """
 24 | I am Cosmo, a healthcare diagnostic expert from Redcliffe Labs. I can assist you with healthcare queries and test bookings.
 25 | 
 26 | Test Booking Requirements:
 27 | - Phone Number (must be valid)
 28 | - City
 29 | - Test Name
 30 | - Preferred Date
 31 | - Preferred Time
 32 | - Collection Type (Home Collection/In-Clinic Collection)
 33 | 
 34 | Key Behaviors:
 35 | 1. Ask only one follow-up question at a time to gather required information
 36 | 2. After test selection:
 37 |    - Suggest preparation guidelines
 38 |    - Recommend optimal timing based on current date ({current_date}) and time ({current_time})
 39 | 3. Language Protocol:
 40 |    - Default: English
 41 |    - Switch to Hindi only if user communicates in Hindi/Hinglish
 42 | 4. Persona: Female healthcare expert
 43 | 
 44 | Sample Interaction Flow:
 45 | User: "Can you book a test?"
 46 | Cosmo: "Of course! Are you booking this test for yourself or someone else?"
 47 | [If for self: Use existing user details]
 48 | [If for others: Collect name and email]
 49 | 
 50 | Booking Confirmation:
 51 | - Summarize all collected details
 52 | - Confirm booking
 53 | - Mention that a confirmation email will be sent
 54 | 
 55 | Note: Always verify phone numbers for validity before proceeding with booking.
 56 | """
 57 | VOICE = 'alloy'
 58 | LOG_EVENT_TYPES = [
 59 |     'error', 'response.content.done', 'rate_limits.updated',
 60 |     'response.done', 'input_audio_buffer.committed',
 61 |     'input_audio_buffer.speech_stopped', 'input_audio_buffer.speech_started',
 62 |     'session.created'
 63 | ]
 64 | SHOW_TIMING_MATH = False
 65 | 
 66 | app = FastAPI()
 67 | 
 68 | if not OPENAI_API_KEY:
 69 |     raise ValueError('Missing the OpenAI API key. Please set it in the .env file.')
 70 | 
 71 | @app.get("/", response_class=JSONResponse)
 72 | async def index_page():
 73 |     return {"message": "Twilio Media Stream Server is running 555!"}
 74 | 
 75 | @app.api_route("/incoming-call", methods=["GET", "POST"])
 76 | async def handle_incoming_call(request: Request):
 77 |     """Handle incoming call and return TwiML response to connect to Media Stream."""
 78 |     body = await request.body()
 79 |     print("Headers:", request.headers)
 80 |     print("Body:", body.decode())
 81 |     response = VoiceResponse()
 82 |     # <Say> punctuation to improve text-to-speech flow
 83 |     response.say("Hello there! I am an AI call assistant created by Aman Patel")
 84 |     response.pause(length=1)
 85 |     response.say("O.K. you can start talking!")
 86 |     host = "api.amanpatel.in" #request.url.hostname
 87 |     connect = Connect()
 88 |     connect.stream(url=f'wss://{host}/media-stream')
 89 |     response.append(connect)
 90 |     return HTMLResponse(content=str(response), media_type="application/xml")
 91 | 
 92 | @app.websocket("/media-stream")
 93 | async def handle_media_stream(websocket: WebSocket):
 94 |     """Handle WebSocket connections between Twilio and OpenAI."""
 95 |     print("Client connected")
 96 |     url = "https://www.fast2sms.com/dev/bulkV2"
 97 |     api_key = "WOfGCbYeFN7rynSxqPRpXMhJZu2g406mjBi3tlavUdkQ5IK1A9CqMnKYXaVxAEWd6N1u3OfTP9mSG7Je"
 98 |     msg='Your test booking is confirmed! Details: \nPhone: 9462255025 \nCity: Bangalore \nTest: Blood Test \nDate: 06-12-2024 \nTime: 10:00AM \nCollection: In-Clinic Collection  \nThank you for choosing us!'
 99 |     querystring = {
100 |         "authorization": api_key,
101 |         "message": msg,
102 |         "language": "english",
103 |         "route": "q",
104 |         "numbers": "9462255025"
105 |     }
106 |     headers = {
107 |         'cache-control': "no-cache"
108 |     }
109 |     response = requests.request("GET", url, headers=headers, params=querystring)
110 |     print(response.status_code)
111 |     print(response.text)
112 |     await websocket.accept()
113 | 
114 |     async with websockets.connect(
115 |         'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01',
116 |         extra_headers={
117 |             "Authorization": f"Bearer {OPENAI_API_KEY}",
118 |             "OpenAI-Beta": "realtime=v1"
119 |         }
120 |     ) as openai_ws:
121 |         await initialize_session(openai_ws)
122 | 
123 |         # Connection specific state
124 |         stream_sid = None
125 |         latest_media_timestamp = 0
126 |         last_assistant_item = None
127 |         mark_queue = []
128 |         response_start_timestamp_twilio = None
129 |         
130 |         async def receive_from_twilio():
131 |             """Receive audio data from Twilio and send it to the OpenAI Realtime API."""
132 |             nonlocal stream_sid, latest_media_timestamp
133 |             try:
134 |                 async for message in websocket.iter_text():
135 |                     data = json.loads(message)
136 |                     if data['event'] == 'media' and openai_ws.open:
137 |                         latest_media_timestamp = int(data['media']['timestamp'])
138 |                         audio_append = {
139 |                             "type": "input_audio_buffer.append",
140 |                             "audio": data['media']['payload']
141 |                         }
142 |                         await openai_ws.send(json.dumps(audio_append))
143 |                     elif data['event'] == 'start':
144 |                         stream_sid = data['start']['streamSid']
145 |                         print(f"Incoming stream has started {stream_sid}")
146 |                         response_start_timestamp_twilio = None
147 |                         latest_media_timestamp = 0
148 |                         last_assistant_item = None
149 |                     elif data['event'] == 'mark':
150 |                         if mark_queue:
151 |                             mark_queue.pop(0)
152 |             except WebSocketDisconnect:
153 |                 print("Client disconnected.")
154 |                 if openai_ws.open:
155 |                     await openai_ws.close()
156 | 
157 |         async def send_to_twilio():
158 |             """Receive events from the OpenAI Realtime API, send audio back to Twilio."""
159 |             nonlocal stream_sid, last_assistant_item, response_start_timestamp_twilio
160 |             try:
161 |                 async for openai_message in openai_ws:
162 |                     response = json.loads(openai_message)
163 |                     if response['type'] in LOG_EVENT_TYPES:
164 |                         print(f"Received event: {response['type']}", response)
165 | 
166 |                     if response.get('type') == 'response.audio.delta' and 'delta' in response:
167 |                         audio_payload = base64.b64encode(base64.b64decode(response['delta'])).decode('utf-8')
168 |                         audio_delta = {
169 |                             "event": "media",
170 |                             "streamSid": stream_sid,
171 |                             "media": {
172 |                                 "payload": audio_payload
173 |                             }
174 |                         }
175 |                         await websocket.send_json(audio_delta)
176 | 
177 |                         if response_start_timestamp_twilio is None:
178 |                             response_start_timestamp_twilio = latest_media_timestamp
179 |                             if SHOW_TIMING_MATH:
180 |                                 print(f"Setting start timestamp for new response: {response_start_timestamp_twilio}ms")
181 | 
182 |                         # Update last_assistant_item safely
183 |                         if response.get('item_id'):
184 |                             last_assistant_item = response['item_id']
185 | 
186 |                         await send_mark(websocket, stream_sid)
187 | 
188 |                     # Trigger an interruption. Your use case might work better using `input_audio_buffer.speech_stopped`, or combining the two.
189 |                     if response.get('type') == 'input_audio_buffer.speech_started':
190 |                         print("Speech started detected.")
191 |                         if last_assistant_item:
192 |                             print(f"Interrupting response with id: {last_assistant_item}")
193 |                             await handle_speech_started_event()
194 |             except Exception as e:
195 |                 print(f"Error in send_to_twilio: {e}")
196 | 
197 |         async def handle_speech_started_event():
198 |             """Handle interruption when the caller's speech starts."""
199 |             nonlocal response_start_timestamp_twilio, last_assistant_item
200 |             print("Handling speech started event.")
201 |             if mark_queue and response_start_timestamp_twilio is not None:
202 |                 elapsed_time = latest_media_timestamp - response_start_timestamp_twilio
203 |                 if SHOW_TIMING_MATH:
204 |                     print(f"Calculating elapsed time for truncation: {latest_media_timestamp} - {response_start_timestamp_twilio} = {elapsed_time}ms")
205 | 
206 |                 if last_assistant_item:
207 |                     if SHOW_TIMING_MATH:
208 |                         print(f"Truncating item with ID: {last_assistant_item}, Truncated at: {elapsed_time}ms")
209 | 
210 |                     truncate_event = {
211 |                         "type": "conversation.item.truncate",
212 |                         "item_id": last_assistant_item,
213 |                         "content_index": 0,
214 |                         "audio_end_ms": elapsed_time
215 |                     }
216 |                     await openai_ws.send(json.dumps(truncate_event))
217 | 
218 |                 await websocket.send_json({
219 |                     "event": "clear",
220 |                     "streamSid": stream_sid
221 |                 })
222 | 
223 |                 mark_queue.clear()
224 |                 last_assistant_item = None
225 |                 response_start_timestamp_twilio = None
226 | 
227 |         async def send_mark(connection, stream_sid):
228 |             if stream_sid:
229 |                 mark_event = {
230 |                     "event": "mark",
231 |                     "streamSid": stream_sid,
232 |                     "mark": {"name": "responsePart"}
233 |                 }
234 |                 await connection.send_json(mark_event)
235 |                 mark_queue.append('responsePart')
236 | 
237 |         await asyncio.gather(receive_from_twilio(), send_to_twilio())
238 | 
239 | async def send_initial_conversation_item(openai_ws):
240 |     """Send initial conversation item if AI talks first."""
241 |     initial_conversation_item = {
242 |         "type": "conversation.item.create",
243 |         "item": {
244 |             "type": "message",
245 |             "role": "user",
246 |             "content": [
247 |                 {
248 |                     "type": "input_text",
249 |                     "text": "Hello there! I am an AI call assistant created by Aman Patel. How can I help you?'"
250 |                 }
251 |             ]
252 |         }
253 |     }
254 |     await openai_ws.send(json.dumps(initial_conversation_item))
255 |     await openai_ws.send(json.dumps({"type": "response.create"}))
256 | 
257 | 
258 | async def initialize_session(openai_ws):
259 |     """Control initial session with OpenAI."""
260 |     session_update = {
261 |         "type": "session.update",
262 |         "session": {
263 |             "turn_detection": {"type": "server_vad"},
264 |             "input_audio_format": "g711_ulaw",
265 |             "output_audio_format": "g711_ulaw",
266 |             "voice": VOICE,
267 |             "instructions": SYSTEM_MESSAGE,
268 |             "modalities": ["text", "audio"],
269 |             "temperature": 0.8,
270 |         }
271 |     }
272 |     print('Sending session update:', json.dumps(session_update))
273 |     await openai_ws.send(json.dumps(session_update))
274 | 
275 |     # Uncomment the next line to have the AI speak first
276 |     # await send_initial_conversation_item(openai_ws)
277 |     
278 | async def send_sms():
279 |     url = "https://www.fast2sms.com/dev/bulkV2"
280 |     api_key = SMS_KEY
281 |     msg='Your test booking is confirmed! Details: \nPhone: 9462255025 \nCity: Bangalore \nTest: Blood Test \nDate: 06-12-2024 \nTime: 10:00AM \nCollection: In-Clinic Collection  \nThank you for choosing us!'
282 |     querystring = {
283 |         "authorization": api_key,
284 |         "message": msg,
285 |         "language": "english",
286 |         "route": "q",
287 |         "numbers": "9462255025"
288 |     }
289 |     headers = {
290 |         'cache-control': "no-cache"
291 |     }
292 |     response = requests.request("GET", url, headers=headers, params=querystring)
293 |     return response.json() 
294 | if __name__ == "__main__":
295 |     import uvicorn
296 |     uvicorn.run(app, host="127.0.0.1", port=PORT)
297 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | aiohappyeyeballs==2.4.0
 2 | aiohttp==3.10.6
 3 | aiohttp-retry==2.8.3
 4 | aiosignal==1.3.1
 5 | annotated-types==0.7.0
 6 | anyio==4.6.0
 7 | async-timeout==4.0.3
 8 | attrs==24.2.0
 9 | certifi==2024.8.30
10 | charset-normalizer==3.3.2
11 | click==8.1.7
12 | exceptiongroup==1.2.2
13 | fastapi==0.115.0
14 | frozenlist==1.4.1
15 | h11==0.14.0
16 | idna==3.10
17 | multidict==6.1.0
18 | pydantic==2.9.2
19 | pydantic_core==2.23.4
20 | PyJWT==2.9.0
21 | python-dotenv==1.0.1
22 | requests==2.32.3
23 | sniffio==1.3.1
24 | starlette==0.38.6
25 | twilio==9.3.2
26 | typing_extensions==4.12.2
27 | urllib3==2.2.3
28 | uvicorn==0.30.6
29 | websockets==13.1
30 | yarl==1.12.1
31 | requests
32 | 


--------------------------------------------------------------------------------