├── .gitignore
├── requirements.txt
├── README.md
└── agent.py


/.gitignore:
--------------------------------------------------------------------------------
.env.local
venv/


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
livekit-api
livekit-agents
livekit-plugins-openai
python-dotenv


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
> [!WARNING]
> This example is outdated. See https://github.com/livekit-examples/python-agents-examples/tree/main/telephony for the latest Telephony examples for LiveKit Agents.

# LiveKit Phone Assistant Agent - SIP REFER Example

## Overview

This repository contains an example implementation of a voice-enabled phone assistant using LiveKit and OpenAI. The `agent.py` module demonstrates how to handle voice interactions, DTMF signals, and SIP REFER transfers to different departments based on user input.

The assistant lets callers transfer to the Billing, Technical Support, or Customer Service department by pressing the corresponding digit.

## Features

- **Voice Interaction**: Engages with users through voice using OpenAI's language models.
- **DTMF Handling**: Listens for DTMF signals (keypad inputs) and responds accordingly.
- **SIP REFER Transfer**: Transfers calls to different departments using SIP REFER requests.
- **Multimodal Agent**: Utilizes LiveKit's multimodal capabilities to handle both audio and text modalities.
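
As a rough illustration of how the DTMF and transfer features fit together, the sketch below maps a keypad digit to a `tel:` URI read from the environment. The helper name `resolve_transfer` and the module-level `DEPARTMENTS` table are hypothetical and are not part of `agent.py`; only the digit-to-variable mapping mirrors the one used by the agent.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical lookup table mirroring the digit mapping used in agent.py.
DEPARTMENTS = {
    "1": ("BILLING_PHONE_NUMBER", "Billing"),
    "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
    "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service"),
}


def resolve_transfer(
    digit: str, env: Mapping[str, str] = os.environ
) -> Optional[Tuple[str, str]]:
    """Return (tel URI, department name) for a digit.

    Returns None when the digit is not a menu option or when the
    matching environment variable is missing or empty.
    """
    entry = DEPARTMENTS.get(digit)
    if entry is None:
        return None
    env_var, department = entry
    number = env.get(env_var)
    if not number:
        return None
    return f"tel:{number}", department
```

Passing the environment as a parameter keeps the lookup easy to exercise without a real `.env.local` file, for example `resolve_transfer("1", {"BILLING_PHONE_NUMBER": "+12345678901"})`.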

## Prerequisites

- Python 3.9 or higher
- A LiveKit Cloud account or self-hosted LiveKit server
- OpenAI API key
- Required Python packages listed in `requirements.txt`
- A SIP Trunk with Twilio, connected to your LiveKit account as detailed [here](https://docs.livekit.io/sip/)

## Setup

### Clone the Repository

```bash
git clone https://github.com/ShayneP/phone-assistant.git
cd phone-assistant
```

### Create a Virtual Environment

We recommend using a virtual environment to manage dependencies.

```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

### Install Dependencies

```bash
pip install -r requirements.txt
```

### Environment Variables

Create a `.env.local` file in the root of the project with the following content:

```bash
OPENAI_API_KEY=your-openai-api-key
BILLING_PHONE_NUMBER=+12345678901
TECH_SUPPORT_PHONE_NUMBER=+12345678901
CUSTOMER_SERVICE_PHONE_NUMBER=+12345678901
LIVEKIT_URL=wss://your-livekit-url.livekit.cloud
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
```

Replace the placeholder values with your actual API keys and phone numbers.

## Running the Assistant

To start the phone assistant agent in development mode, run:

```bash
python agent.py dev
```

When callers dial the phone number attached to your SIP trunk, their calls are routed into LiveKit rooms.
When a room is created, your agent joins, waits for the caller to finish connecting, and then greets the caller.

## How It Works

### Entry Point

The `entrypoint` function serves as the main entry for the assistant.
It initializes the `PhoneAssistant` class and manages the connection lifecycle.

### PhoneAssistant Class

The `PhoneAssistant` class encapsulates the logic for:

- Connecting to a LiveKit room.
- Setting up event handlers for DTMF signals.
- Initializing and starting the multimodal agent.
- Handling SIP REFER transfers.

#### Connecting to the Room

The assistant connects to the LiveKit room and waits for a participant to join.

```python
participant = await assistant.connect_to_room()
```

#### Starting the Agent

Once connected, the assistant initializes the OpenAI model with specific instructions and starts the multimodal agent.

```python
assistant.start_agent(participant)
```

#### Greeting the Caller

Upon starting, the assistant greets the caller and provides options.

```python
greeting = (
    "Hi, thanks for calling Vandelay Industries! "
    "You can press 1 for Billing, 2 for Technical Support, "
    "or 3 for Customer Service. You can also just talk to me, since I'm a LiveKit agent."
)
asyncio.create_task(assistant.say(greeting))
```

#### Handling DTMF Signals

The assistant sets up an event handler for DTMF signals to detect which digits the caller presses.

```python
@room.on("sip_dtmf_received")
def handle_dtmf(dtmf_event: rtc.SipDTMF):
    # Logic to handle DTMF digits and initiate transfer
    ...
```

#### SIP REFER Transfer

If the caller selects an option, the assistant uses SIP REFER to transfer the call to the appropriate department.

```python
await assistant.transfer_call(identity, transfer_number)
```

### Cleanup

After the call ends or the room is disconnected, the resources used by the agent are cleaned up.

```python
await assistant.cleanup()
```

## Customization

### Updating Department Options

You can customize the department options by modifying the `department_numbers` dictionary in the `_setup_event_handlers` method, and then changing the names of the phone numbers in your `.env.local` config file.

```python
department_numbers = {
    "1": ("BILLING_PHONE_NUMBER", "Billing"),
    "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
    "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service")
}
```

### Changing Greetings and Messages

Update the `greeting` variable and messages within the `say` method calls to change what the assistant says to the caller.

> Note: It's important to relay the application's intent to use *voice* in the `say` method, or OpenAI will occasionally respond with a stream of text.

## Logging

Logging is configured to output information to help with debugging and monitoring.

```python
logger = logging.getLogger("phone-assistant")
logger.setLevel(logging.INFO)
```

## References

- [LiveKit Python SDK](https://docs.livekit.io/guides/python)
- [LiveKit SIP Guide](https://docs.livekit.io/sip/)
- [OpenAI Realtime Integration Guide](https://docs.livekit.io/agents/openai/overview/)


--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
"""
LiveKit Phone Assistant Agent
This module implements a voice/text-enabled phone assistant using LiveKit and OpenAI.
"""
from __future__ import annotations

import asyncio
import logging
import os
from dotenv import load_dotenv
from livekit import rtc, api
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.protocol import sip as proto_sip
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai


# Initialize environment variables
# The .env.local file should look like:
# OPENAI_API_KEY=your-key-here
# BILLING_PHONE_NUMBER=+12345678901
# TECH_SUPPORT_PHONE_NUMBER=+12345678901
# CUSTOMER_SERVICE_PHONE_NUMBER=+12345678901
# LIVEKIT_URL=wss://your-url-goes-here.livekit.cloud
# LIVEKIT_API_KEY=your-key-here
# LIVEKIT_API_SECRET=your-secret-here
load_dotenv(dotenv_path=".env.local")

# Initialize logging
logger = logging.getLogger("phone-assistant")
logger.setLevel(logging.INFO)


class PhoneAssistant:
    """
    A simple multimodal phone assistant that handles voice interactions. You can transfer the call to a department
    based on the DTMF digit pressed by the user.
    """

    def __init__(self, context: JobContext):
        """
        Initialize the PhoneAssistant with the context about the room, participant, etc.

        Args:
            context (JobContext): The context for the job.
        """
        self.context = context
        self.assistant = None
        self.model = None
        self.livekit_api = None

    async def say(self, message: str) -> None:
        """
        Ask the assistant to speak a message to the user. The assistant needs to be told to use its
        voice to respond. If you don't do this, the assistant may respond with text instead of voice,
        which doesn't make much sense on a phone call.

        Args:
            message (str): The message to say.
        """
        # Skip silently when no realtime session exists yet (avoids an IndexError on an empty list).
        if self.model and getattr(self.model, "sessions", None):
            session = self.model.sessions[0]
            session.conversation.item.create(
                llm.ChatMessage(
                    role="assistant",
                    content=f"Using your voice to respond, please say: {message}"
                )
            )
            session.response.create()
            logger.debug(f"Asked assistant to say: {message}")

    async def connect_to_room(self) -> rtc.Participant:
        """
        Connect to the LiveKit room and wait for a participant to join.

        Returns:
            rtc.Participant: The connected participant.
        """
        room_name = self.context.room.name
        logger.info(f"Connecting to room: {room_name}")
        await self.context.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
        self._setup_event_handlers(self.context.room)
        participant = await self.context.wait_for_participant()
        return participant

    def _setup_event_handlers(self, room: rtc.Room) -> None:
        """
        Set up event handlers for any room events we care about. In this case, it's only the DTMF codes,
        but you could handle any other room events too.

        Args:
            room (rtc.Room): The LiveKit room instance.
        """

        @room.on("sip_dtmf_received")
        def handle_dtmf(dtmf_event: rtc.SipDTMF):
            """
            Handle DTMF (Dual-Tone Multi-Frequency) signals received from SIP. (These are the sounds
            that are made when a user presses a number on a phone keypad.)

            Args:
                dtmf_event (rtc.SipDTMF): The DTMF event data.
            """
            code = dtmf_event.code
            digit = dtmf_event.digit
            identity = dtmf_event.participant.identity
            logger.info(f"DTMF received - Code: {code}, Digit: '{digit}'")

            # Define department mapping
            department_numbers = {
                "1": ("BILLING_PHONE_NUMBER", "Billing"),
                "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
                "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service")
            }
            logger.info(f"Department numbers: {department_numbers}")
            if digit in department_numbers:
                env_var, dept_name = department_numbers[digit]
                number = os.getenv(env_var)
                if not number:
                    # An unset variable would otherwise produce the invalid target "tel:None".
                    logger.error(f"{env_var} is not set; cannot transfer to {dept_name}")
                    asyncio.create_task(self.say("I'm sorry, I couldn't transfer your call. Is there something else I can help with?"))
                    return
                transfer_number = f"tel:{number}"
                asyncio.create_task(self._handle_transfer(identity, transfer_number, dept_name))
            else:
                asyncio.create_task(self.say("I'm sorry, please choose one of the options I mentioned earlier."))


    async def _handle_transfer(self, identity: str, transfer_number: str, department: str) -> None:
        """
        Handle the transfer process with department-specific messaging.

        Args:
            identity (str): The participant's identity
            transfer_number (str): The number to transfer to
            department (str): The name of the department
        """
        await self.say(f"Transferring you to our {department} department in a moment. Please hold.")
        # Give the assistant time to finish speaking before the call is handed off
        await asyncio.sleep(6)
        await self.transfer_call(identity, transfer_number)


    def start_agent(self, participant: rtc.Participant) -> None:
        """
        Initialize and start the multimodal agent.

        Args:
            participant (rtc.Participant): The participant to interact with.
        """

        # Initialize the OpenAI model with updated instructions
        self.model = openai.realtime.RealtimeModel(
            instructions=(
                "You are a friendly assistant providing support. "
                "Please inform users they can:\n"
                "- Press 1 for Billing\n"
                "- Press 2 for Technical Support\n"
                "- Press 3 for Customer Service"
            ),
            # We use audio for voice, and text to feed the model context behind the scenes.
            # Whenever we use text, it's important to make sure the model knows it's supposed
            # to respond with voice. We do this with prompt engineering throughout the agent.
            modalities=["audio", "text"],
            voice="sage"
        )

        # Create and start the multimodal agent
        self.assistant = MultimodalAgent(model=self.model)
        self.assistant.start(self.context.room, participant)

        # Greeting with menu options. This is the first thing the assistant says to the user.
        # You don't need to have a greeting, but it's a good idea to have one if calls are incoming.
        greeting = (
            "Hi, thanks for calling Vandelay Industries, global leader in fine latex goods! "
            "You can press 1 for Billing, 2 for Technical Support, "
            "or 3 for Customer Service. You can also just talk to me, since I'm a LiveKit agent."
        )
        asyncio.create_task(self.say(greeting))

    async def transfer_call(self, participant_identity: str, transfer_to: str) -> None:
        """
        Transfer the SIP call to another number. This will essentially end the current call and start a new one;
        the PhoneAssistant will no longer be active on the call.

        Args:
            participant_identity (str): The identity of the participant.
            transfer_to (str): The phone number to transfer the call to.
        """
        logger.info(f"Transferring call for participant {participant_identity} to {transfer_to}")

        try:
            # Initialize LiveKit API client if not already done
            if not self.livekit_api:
                livekit_url = os.getenv('LIVEKIT_URL')
                api_key = os.getenv('LIVEKIT_API_KEY')
                api_secret = os.getenv('LIVEKIT_API_SECRET')
                logger.debug(f"Initializing LiveKit API client with URL: {livekit_url}")
                self.livekit_api = api.LiveKitAPI(
                    url=livekit_url,
                    api_key=api_key,
                    api_secret=api_secret
                )

            # Create transfer request
            transfer_request = proto_sip.TransferSIPParticipantRequest(
                participant_identity=participant_identity,
                room_name=self.context.room.name,
                transfer_to=transfer_to,
                play_dialtone=True
            )
            logger.debug(f"Transfer request: {transfer_request}")

            # Perform transfer
            await self.livekit_api.sip.transfer_sip_participant(transfer_request)
            logger.info(f"Successfully transferred participant {participant_identity} to {transfer_to}")

        except Exception as e:
            logger.error(f"Failed to transfer call: {e}", exc_info=True)
            await self.say("I'm sorry, I couldn't transfer your call. Is there something else I can help with?")

    async def cleanup(self) -> None:
        """
        Clean up resources before shutting down.
        """
        if self.livekit_api:
            await self.livekit_api.aclose()
            self.livekit_api = None


async def entrypoint(context: JobContext) -> None:
    """
    The main entry point for the phone assistant application.

    Args:
        context (JobContext): The context for the job.
    """
    assistant = PhoneAssistant(context)
    disconnect_event = asyncio.Event()

    @context.room.on("disconnected")
    def on_room_disconnect(*args):
        disconnect_event.set()

    try:
        participant = await assistant.connect_to_room()
        assistant.start_agent(participant)
        # Wait until the room is disconnected
        await disconnect_event.wait()
    finally:
        await assistant.cleanup()


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))


--------------------------------------------------------------------------------