├── .gitignore
├── requirements.txt
├── README.md
└── agent.py


/.gitignore:
--------------------------------------------------------------------------------
1 | .env.local
2 | venv/


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | livekit-api
2 | livekit-agents
3 | livekit-plugins-openai
4 | python-dotenv


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | > [!WARNING]
  2 | > This example is outdated. See https://github.com/livekit-examples/python-agents-examples/tree/main/telephony for the latest Telephony examples for LiveKit Agents
  3 | 
  4 | # LiveKit Phone Assistant Agent - SIP REFER Example
  5 | 
  6 | ## Overview
  7 | 
  8 | This repository contains an example implementation of a voice-enabled phone assistant using LiveKit and OpenAI. The `agent.py` module demonstrates how to handle voice interactions, DTMF signals, and SIP REFER transfers to different departments based on user input.
  9 | 
 10 | The assistant provides options for callers to be transferred to Billing, Technical Support, or Customer Service departments by pressing corresponding digits.
 11 | 
 12 | ## Features
 13 | 
 14 | - **Voice Interaction**: Engages with users through voice using OpenAI's language models.
 15 | - **DTMF Handling**: Listens for DTMF signals (keypad inputs) and responds accordingly.
 16 | - **SIP REFER Transfer**: Transfers calls to different departments using SIP REFER requests.
 17 | - **Multimodal Agent**: Utilizes LiveKit's multimodal capabilities to handle both audio and text modalities.
 18 | 
 19 | ## Prerequisites
 20 | 
  21 | - Python 3.9 or higher
 22 | - A LiveKit Cloud account or self-hosted LiveKit server
 23 | - OpenAI API key
 24 | - Required Python packages listed in `requirements.txt`
 25 | - A SIP Trunk with Twilio, connected to your LiveKit account as detailed [here](https://docs.livekit.io/sip/)
 26 | 
 27 | ## Setup
 28 | 
 29 | ### Clone the Repository
 30 | 
 31 | ```bash
 32 | git clone https://github.com/ShayneP/phone-assistant.git
 33 | cd phone-assistant
 34 | ```
 35 | 
 36 | ### Create a Virtual Environment
 37 | 
  38 | Use a virtual environment to keep this project's dependencies isolated from your system Python.
 39 | 
 40 | ```bash
 41 | python -m venv venv
 42 | source venv/bin/activate  # On Windows use `venv\Scripts\activate`
 43 | ```
 44 | 
 45 | ### Install Dependencies
 46 | 
 47 | ```bash
 48 | pip install -r requirements.txt
 49 | ```
 50 | 
 51 | ### Environment Variables
 52 | 
 53 | Create a `.env.local` file in the root of the project with the following content:
 54 | 
 55 | ```bash
 56 | OPENAI_API_KEY=your-openai-api-key
 57 | BILLING_PHONE_NUMBER=+12345678901
 58 | TECH_SUPPORT_PHONE_NUMBER=+12345678901
 59 | CUSTOMER_SERVICE_PHONE_NUMBER=+12345678901
 60 | LIVEKIT_URL=wss://your-livekit-url.livekit.cloud
 61 | LIVEKIT_API_KEY=your-livekit-api-key
 62 | LIVEKIT_API_SECRET=your-livekit-api-secret
 63 | ```
 64 | 
 65 | Replace the placeholder values with your actual API keys and phone numbers.
 66 | 
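Every variable listed above is read at runtime, so a quick startup check can catch a missing value before the first call arrives. The following is a minimal sketch, not part of `agent.py`: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here, and the check assumes `.env.local` has already been loaded into the process environment (for example with `load_dotenv`).

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the .env.local example above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "BILLING_PHONE_NUMBER",
    "TECH_SUPPORT_PHONE_NUMBER",
    "CUSTOMER_SERVICE_PHONE_NUMBER",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` before starting the worker and logging or raising on a non-empty result would be one way to fail fast.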
 67 | ## Running the Assistant
 68 | 
 69 | To start the phone assistant agent in development mode, run:
 70 | 
 71 | ```bash
 72 | python agent.py dev
 73 | ```
 74 | 
  75 | When someone calls the phone number attached to your SIP trunk, the call is routed into a LiveKit room.
  76 | When the room is created, the agent joins, waits for the caller to finish connecting, and then greets them.
 77 | 
 78 | ## How It Works
 79 | 
 80 | ### Entry Point
 81 | 
 82 | The `entrypoint` function serves as the main entry for the assistant. It initializes the `PhoneAssistant` class and manages the connection lifecycle.
 83 | 
 84 | ### PhoneAssistant Class
 85 | 
 86 | The `PhoneAssistant` class encapsulates the logic for:
 87 | 
 88 | - Connecting to a LiveKit room.
 89 | - Setting up event handlers for DTMF signals.
 90 | - Initializing and starting the multimodal agent.
 91 | - Handling SIP REFER transfers.
 92 | 
 93 | #### Connecting to the Room
 94 | 
 95 | The assistant connects to the LiveKit room and waits for a participant to join.
 96 | 
 97 | ```python
 98 | participant = await assistant.connect_to_room()
 99 | ```
100 | 
101 | #### Starting the Agent
102 | 
103 | Once connected, the assistant initializes the OpenAI model with specific instructions and starts the multimodal agent.
104 | 
105 | ```python
106 | assistant.start_agent(participant)
107 | ```
108 | 
109 | #### Greeting the Caller
110 | 
111 | Upon starting, the assistant greets the caller and provides options.
112 | 
113 | ```python
114 | greeting = (
 115 |     "Hi, thanks for calling Vandelay Industries! "
116 |     "You can press 1 for Billing, 2 for Technical Support, "
117 |     "or 3 for Customer Service. You can also just talk to me, since I'm a LiveKit agent."
118 | )
119 | asyncio.create_task(assistant.say(greeting))
120 | ```
121 | 
122 | #### Handling DTMF Signals
123 | 
124 | The assistant sets up an event handler for DTMF signals to determine if the caller presses any digits.
125 | 
126 | ```python
127 | @room.on("sip_dtmf_received")
128 | def handle_dtmf(dtmf_event: rtc.SipDTMF):
 129 |     ...  # Logic to handle DTMF digits and initiate transfer
130 | ```
131 | 
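The lookup that the handler performs can be sketched on its own, without any LiveKit objects. This is an illustrative helper rather than code from `agent.py`: `resolve_transfer` is a hypothetical name, and unlike the original handler it returns `None` when the environment variable is unset instead of building a `tel:None` target.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

# Digit -> (environment variable name, spoken department name),
# mirroring the department_numbers mapping used by the agent.
DEPARTMENT_NUMBERS = {
    "1": ("BILLING_PHONE_NUMBER", "Billing"),
    "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
    "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service"),
}


def resolve_transfer(
    digit: str, env: Mapping[str, str] = os.environ
) -> Optional[tuple[str, str]]:
    """Return (tel URI, department name) for a digit, or None if it cannot be routed."""
    entry = DEPARTMENT_NUMBERS.get(digit)
    if entry is None:
        return None  # the digit is not on the menu
    env_var, department = entry
    number = env.get(env_var)
    if not number:
        return None  # variable unset or empty; avoid producing "tel:None"
    return f"tel:{number}", department
```

Passing `env` explicitly keeps the function easy to exercise in isolation, for example `resolve_transfer("1", {"BILLING_PHONE_NUMBER": "+12345678901"})`.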
132 | #### SIP REFER Transfer
133 | 
134 | If the caller selects an option, the assistant uses SIP REFER to transfer the call to the appropriate department.
135 | 
136 | ```python
137 | await assistant.transfer_call(identity, transfer_number)
138 | ```
139 | 
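In `agent.py` the transfer is wrapped in a short sequence: announce the department, pause so the announcement can play, then transfer, and apologize if the transfer fails. The sketch below reproduces that flow with stand-in callables so it runs without a SIP trunk. `announce_then_transfer` and `demo` are hypothetical names, and the real agent splits this logic across `_handle_transfer` and `transfer_call`.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def announce_then_transfer(
    say: Callable[[str], Awaitable[None]],
    transfer: Callable[[], Awaitable[None]],
    department: str,
    delay_seconds: float = 6.0,
) -> bool:
    """Announce the transfer, wait, then transfer. Return True on success."""
    await say(f"Transferring you to our {department} department in a moment. Please hold.")
    # Give the announcement time to play before the call leaves the room.
    await asyncio.sleep(delay_seconds)
    try:
        await transfer()
    except Exception:
        await say("I'm sorry, I couldn't transfer your call. Is there something else I can help with?")
        return False
    return True


async def demo() -> Tuple[bool, bool, List[str]]:
    """Run one successful and one failing transfer against fake callables."""
    spoken: List[str] = []

    async def fake_say(message: str) -> None:
        spoken.append(message)

    async def ok_transfer() -> None:
        return None

    async def failing_transfer() -> None:
        raise RuntimeError("simulated SIP failure")

    ok = await announce_then_transfer(fake_say, ok_transfer, "Billing", delay_seconds=0)
    failed = await announce_then_transfer(fake_say, failing_transfer, "Billing", delay_seconds=0)
    return ok, failed, spoken
```

The six-second default matches the pause used in `agent.py`; a shorter announcement could probably use a shorter delay.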
140 | ### Cleanup
141 | 
142 | After the call ends or the room is disconnected, the resources used by the agent are cleaned up.
143 | 
144 | ```python
145 | await assistant.cleanup()
146 | ```
147 | 
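The cleanup guarantee comes from a `try`/`finally` around an `asyncio.Event` that is set when the room disconnects. Here is a minimal, LiveKit-free sketch of that pattern. `FakeAssistant`, `run_until_disconnected`, and `demo` are hypothetical stand-ins, and the timer simulates the room's `disconnected` callback.

```python
import asyncio


class FakeAssistant:
    """Stand-in for PhoneAssistant that records whether cleanup ran."""

    def __init__(self) -> None:
        self.cleaned_up = False

    async def cleanup(self) -> None:
        self.cleaned_up = True


async def run_until_disconnected(assistant: FakeAssistant, disconnected: asyncio.Event) -> None:
    """Block until the disconnect event fires, then always clean up."""
    try:
        await disconnected.wait()
    finally:
        # Runs on normal disconnect, on errors, and on task cancellation.
        await assistant.cleanup()


async def demo() -> bool:
    assistant = FakeAssistant()
    disconnected = asyncio.Event()
    # Simulate the room's "disconnected" callback firing shortly after start.
    asyncio.get_running_loop().call_later(0.01, disconnected.set)
    await run_until_disconnected(assistant, disconnected)
    return assistant.cleaned_up
```

Running `asyncio.run(demo())` exercises the full wait-then-cleanup path.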
148 | ## Customization
149 | 
150 | ### Updating Department Options
151 | 
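 152 | You can customize the department options by editing the `department_numbers` dictionary in the `_setup_event_handlers` method and updating the matching environment variable names in your `.env.local` file. Each entry maps a keypad digit to the environment variable that holds the phone number and to the department name the assistant speaks.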
153 | 
154 | ```python
155 | department_numbers = {
156 |     "1": ("BILLING_PHONE_NUMBER", "Billing"),
157 |     "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
158 |     "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service")
159 | }
160 | ```
161 | 
162 | ### Changing Greetings and Messages
163 | 
164 | Update the `greeting` variable and messages within the `say` method calls to change what the assistant says to the caller.
165 | 
166 | > Note: It's important to relay the application's intent to use *voice* in the `say` method, or OpenAI will occasionally respond with a stream of text.
167 | 
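The `say` method in `agent.py` handles this by prefixing every message with an explicit instruction to speak. The wrapping step can be isolated as a small helper. `VOICE_PREFIX` and `voice_prompt` are hypothetical names used only for this sketch, and the prefix text is copied from `agent.py`.

```python
VOICE_PREFIX = "Using your voice to respond, please say: "


def voice_prompt(message: str) -> str:
    """Wrap a message so the model is told to speak it rather than reply in text."""
    return f"{VOICE_PREFIX}{message.strip()}"
```

Keeping the prefix in one place makes it easier to adjust the wording if the model still answers in text.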
168 | ## Logging
169 | 
170 | Logging is configured to output information to help with debugging and monitoring.
171 | 
172 | ```python
173 | logger = logging.getLogger("phone-assistant")
174 | logger.setLevel(logging.INFO)
175 | ```
176 | 
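Setting a level on the logger does not attach a handler by itself. With no handler configured anywhere, Python's last-resort handler only emits warnings and above, so `INFO` messages would not appear. The sketch below shows one way to configure console output if you reuse the logger outside the worker CLI. `configure_logging` is a hypothetical helper, and the worker may already set up handlers of its own when started with `python agent.py dev`.

```python
import logging


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach a console handler to the root logger (if it has none) and return the app logger."""
    # basicConfig is a no-op when the root logger already has handlers.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger = logging.getLogger("phone-assistant")
    logger.setLevel(level)
    return logger
```

Passing `logging.DEBUG` surfaces the agent's `logger.debug` calls, such as the transfer request details.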
177 | ## References
178 | 
179 | - [LiveKit Python SDK](https://docs.livekit.io/guides/python)
180 | - [LiveKit SIP Guide](https://docs.livekit.io/sip/)
181 | - [OpenAI Realtime Integration Guide](https://docs.livekit.io/agents/openai/overview/)
182 | 


--------------------------------------------------------------------------------
/agent.py:
--------------------------------------------------------------------------------
  1 | """
  2 | LiveKit Phone Assistant Agent
  3 | This module implements a voice/text-enabled phone assistant using LiveKit and OpenAI.
  4 | """
  5 | from __future__ import annotations
  6 | 
  7 | import asyncio
  8 | import logging
  9 | import os
 10 | from dotenv import load_dotenv
 11 | from livekit import rtc, api
 12 | from livekit.agents import (
 13 |     AutoSubscribe,
 14 |     JobContext,
 15 |     WorkerOptions,
 16 |     cli,
 17 |     llm,
 18 | )
 19 | from livekit.protocol import sip as proto_sip
 20 | from livekit.agents.multimodal import MultimodalAgent
 21 | from livekit.plugins import openai
 22 | 
 23 | 
 24 | # Initialize environment variables
 25 | # The .env.local file should look like:
 26 | #   OPENAI_API_KEY=your-key-here
 27 | #   BILLING_PHONE_NUMBER=+12345678901
 28 | #   TECH_SUPPORT_PHONE_NUMBER=+12345678901
 29 | #   CUSTOMER_SERVICE_PHONE_NUMBER=+12345678901
 30 | #   LIVEKIT_URL=wss://your-url-goes-here.livekit.cloud
 31 | #   LIVEKIT_API_KEY=your-key-here
 32 | #   LIVEKIT_API_SECRET=your-secret-here
 33 | load_dotenv(dotenv_path=".env.local")
 34 | 
 35 | # Initialize logging
 36 | logger = logging.getLogger("phone-assistant")
 37 | logger.setLevel(logging.INFO)
 38 | 
 39 | 
 40 | class PhoneAssistant:
 41 |     """
  42 |     A simple multimodal phone assistant that handles voice interactions. It transfers the call to a department
  43 |     based on the DTMF digit pressed by the caller.
 44 |     """
 45 | 
 46 |     def __init__(self, context: JobContext):
 47 |         """
 48 |         Initialize the PhoneAssistant with the context about the room, participant, etc.
 49 | 
 50 |         Args:
 51 |             context (JobContext): The context for the job.
 52 |         """
 53 |         self.context = context
 54 |         self.assistant = None
 55 |         self.model = None
 56 |         self.livekit_api = None
 57 | 
 58 |     async def say(self, message: str) -> None:
 59 |         """
 60 |         Ask the assistant to speak a message to the user. The assistant needs to be told to use its
 61 |         voice to respond. If you don't do this, the assistant may respond with text instead of voice,
 62 |         which doesn't make much sense on a phone call.
 63 | 
 64 |         Args:
 65 |             message (str): The message to say.
 66 |         """
  67 |         if self.model and getattr(self.model, "sessions", None):
 68 |             session = self.model.sessions[0]
 69 |             session.conversation.item.create(
 70 |                 llm.ChatMessage(
 71 |                     role="assistant",
 72 |                     content=f"Using your voice to respond, please say: {message}"
 73 |                 )
 74 |             )
 75 |             session.response.create()
 76 |             logger.debug(f"Asked assistant to say: {message}")
 77 | 
 78 |     async def connect_to_room(self) -> rtc.Participant:
 79 |         """
 80 |         Connect to the LiveKit room and wait for a participant to join.
 81 | 
 82 |         Returns:
 83 |             rtc.Participant: The connected participant.
 84 |         """
 85 |         room_name = self.context.room.name
 86 |         logger.info(f"Connecting to room: {room_name}")
 87 |         await self.context.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
 88 |         self._setup_event_handlers(self.context.room)
 89 |         participant = await self.context.wait_for_participant()
 90 |         return participant
 91 | 
 92 |     def _setup_event_handlers(self, room: rtc.Room) -> None:
 93 |         """
 94 |         Set up event handlers for any room events we care about. In this case, it's only the DTMF codes,
 95 |         but you could handle any other room events too.
 96 | 
 97 |         Args:
 98 |             room (rtc.Room): The LiveKit room instance.
 99 |         """
100 | 
101 |         @room.on("sip_dtmf_received")
102 |         def handle_dtmf(dtmf_event: rtc.SipDTMF):
103 |             """
104 |             Handle DTMF (Dual-Tone Multi-Frequency) signals received from SIP. (These are the sounds
105 |             that are made when a user presses a number on a phone keypad.)
106 | 
107 |             Args:
108 |                 dtmf_event (rtc.SipDTMF): The DTMF event data.
109 |             """
110 |             code = dtmf_event.code
111 |             digit = dtmf_event.digit
112 |             identity = dtmf_event.participant.identity
113 |             logger.info(f"DTMF received - Code: {code}, Digit: '{digit}'")
114 | 
115 |             # Define department mapping
116 |             department_numbers = {
117 |                 "1": ("BILLING_PHONE_NUMBER", "Billing"),
118 |                 "2": ("TECH_SUPPORT_PHONE_NUMBER", "Tech Support"),
119 |                 "3": ("CUSTOMER_SERVICE_PHONE_NUMBER", "Customer Service")
120 |             }
 121 |             if digit in department_numbers:
 122 |                 env_var, dept_name = department_numbers[digit]
 123 |                 if not os.getenv(env_var):
 124 |                     logger.warning(f"{env_var} is not set; the transfer to {dept_name} is expected to fail")
 125 |                 asyncio.create_task(self._handle_transfer(identity, f"tel:{os.getenv(env_var)}", dept_name))
126 |             else:
127 |                 asyncio.create_task(self.say("I'm sorry, please choose one of the options I mentioned earlier."))
128 | 
129 | 
130 |     async def _handle_transfer(self, identity: str, transfer_number: str, department: str) -> None:
131 |         """
132 |         Handle the transfer process with department-specific messaging.
133 | 
134 |         Args:
135 |             identity (str): The participant's identity
136 |             transfer_number (str): The number to transfer to
137 |             department (str): The name of the department
138 |         """
139 |         await self.say(f"Transferring you to our {department} department in a moment. Please hold.")
140 |         await asyncio.sleep(6)
141 |         await self.transfer_call(identity, transfer_number)
142 | 
143 | 
144 |     def start_agent(self, participant: rtc.Participant) -> None:
145 |         """
146 |         Initialize and start the multimodal agent.
147 | 
148 |         Args:
149 |             participant (rtc.Participant): The participant to interact with.
150 |         """
151 | 
152 |         # Initialize the OpenAI model with updated instructions
153 |         self.model = openai.realtime.RealtimeModel(
154 |             instructions=(
155 |                 "You are a friendly assistant providing support. "
156 |                 "Please inform users they can:\n"
157 |                 "- Press 1 for Billing\n"
158 |                 "- Press 2 for Technical Support\n"
159 |                 "- Press 3 for Customer Service"
160 |             ),
161 |             # We use Audio for voice, and text to feed the model context behind the scenes.
162 |             # Whenever we use text, it's important to make sure the model knows it's supposed 
163 |             # to respond with voice. We do this with prompt engineering throughout the agent.
164 |             modalities=["audio", "text"],
165 |             voice="sage"
166 |         )
167 | 
168 |         # Create and start the multimodal agent
169 |         self.assistant = MultimodalAgent(model=self.model)
170 |         self.assistant.start(self.context.room, participant)
171 | 
172 |         # Greeting with menu options. This is the first thing the assistant says to the user.
173 |         # You don't need to have a greeting, but it's a good idea to have one if calls are incoming.
174 |         greeting = (
 175 |             "Hi, thanks for calling Vandelay Industries, global leader in fine latex goods! "
176 |             "You can press 1 for Billing, 2 for Technical Support, "
177 |             "or 3 for Customer Service. You can also just talk to me, since I'm a LiveKit agent."
178 |         )
179 |         asyncio.create_task(self.say(greeting))
180 | 
181 |     async def transfer_call(self, participant_identity: str, transfer_to: str) -> None:
182 |         """
 183 |         Transfer the SIP call to another number. This essentially ends the current call and starts a new one,
 184 |         so the PhoneAssistant is no longer active on the call.
185 | 
186 |         Args:
187 |             participant_identity (str): The identity of the participant.
188 |             transfer_to (str): The phone number to transfer the call to.
189 |         """
190 |         logger.info(f"Transferring call for participant {participant_identity} to {transfer_to}")
191 | 
192 |         try:
193 |             # Initialize LiveKit API client if not already done
194 |             if not self.livekit_api:
195 |                 livekit_url = os.getenv('LIVEKIT_URL')
196 |                 api_key = os.getenv('LIVEKIT_API_KEY')
197 |                 api_secret = os.getenv('LIVEKIT_API_SECRET')
198 |                 logger.debug(f"Initializing LiveKit API client with URL: {livekit_url}")
199 |                 self.livekit_api = api.LiveKitAPI(
200 |                     url=livekit_url,
201 |                     api_key=api_key,
202 |                     api_secret=api_secret
203 |                 )
204 | 
205 |             # Create transfer request
206 |             transfer_request = proto_sip.TransferSIPParticipantRequest(
207 |                 participant_identity=participant_identity,
208 |                 room_name=self.context.room.name,
209 |                 transfer_to=transfer_to,
210 |                 play_dialtone=True
211 |             )
212 |             logger.debug(f"Transfer request: {transfer_request}")
213 | 
214 |             # Perform transfer
215 |             await self.livekit_api.sip.transfer_sip_participant(transfer_request)
216 |             logger.info(f"Successfully transferred participant {participant_identity} to {transfer_to}")
217 | 
218 |         except Exception as e:
219 |             logger.error(f"Failed to transfer call: {e}", exc_info=True)
220 |             await self.say("I'm sorry, I couldn't transfer your call. Is there something else I can help with?")
221 | 
222 |     async def cleanup(self) -> None:
223 |         """
224 |         Clean up resources before shutting down.
225 |         """
226 |         if self.livekit_api:
227 |             await self.livekit_api.aclose()
228 |             self.livekit_api = None
229 | 
230 | 
231 | async def entrypoint(context: JobContext) -> None:
232 |     """
233 |     The main entry point for the phone assistant application.
234 | 
235 |     Args:
236 |         context (JobContext): The context for the job.
237 |     """
238 |     assistant = PhoneAssistant(context)
239 |     disconnect_event = asyncio.Event()
240 | 
241 |     @context.room.on("disconnected")
242 |     def on_room_disconnect(*args):
243 |         disconnect_event.set()
244 | 
245 |     try:
246 |         participant = await assistant.connect_to_room()
247 |         assistant.start_agent(participant)
248 |         # Wait until the room is disconnected
249 |         await disconnect_event.wait()
250 |     finally:
251 |         await assistant.cleanup()
252 | 
253 | 
254 | if __name__ == "__main__":
255 |     cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))


--------------------------------------------------------------------------------