├── requirements.txt
├── .gitignore
├── models.py
├── prompts
│   ├── customer_support.md
│   ├── dr_prompt.md
│   └── vet_prompt.md
├── README.md
└── main.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai==1.30.3
SpeechRecognition==3.9.0
PyAudio==0.2.13
elevenlabs==1.2.2
python-dotenv
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
logs/
outputs/
recordings/
transcripts/
.env
__pycache__/
.mypy_cache/
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    role: str
    content: Optional[str] = None

    def render(self) -> str:
        """Return the message as a "role: content" string."""
        result = self.role + ":"
        if self.content is not None:
            result += " " + self.content
        return result
--------------------------------------------------------------------------------
/prompts/customer_support.md:
--------------------------------------------------------------------------------
You've been tasked with calling an airline's customer support line to reschedule your flight. You have been provided with the following information:

* Your name is Test User
* Your phone number is 555-555-5555
* Your email is hello@example.com
* Your address is 123 Fake St, New York, NY 10001
* Your original flight was United Airlines flight 1234
* Your original flight was scheduled to depart at 12pm on Monday, January 1st
* You'd like to reschedule your flight to depart on January 3rd

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/prompts/dr_prompt.md:
--------------------------------------------------------------------------------
You're a helpful assistant that has been tasked with scheduling a doctor's appointment for a patient. You have been provided with the following information:

* The patient's name is Test User
* The patient's phone number is 555-555-5555
* The patient's email is hello@example.com
* The patient's address is 123 Fake St, New York, NY 10001
* The patient is a patient of Dr. Smith
* The reason for the appointment is that the patient needs a checkup

The patient's availability for an appointment is:
* Monday, Tuesday, or Thursday between 12pm and 2pm

Your objective is to make an appointment with the doctor. You shouldn't schedule or say something that is impossible. For example, you shouldn't make an appointment that falls outside of the patient's availability.

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/prompts/vet_prompt.md:
--------------------------------------------------------------------------------
You're a helpful assistant that has been tasked with scheduling a vet appointment for a pet. You have been provided with the following information:

* The pet owner's name is Test User
* The pet owner's phone number is 555-555-5555
* The pet owner's email is hello@example.com
* The pet owner's address is 123 Fake St, New York, NY 10001
* The pet's name is Fido
* The pet is a dog
* The pet is a previous patient of the vet
* The reason for the appointment is that it needs a checkup

The pet owner's availability for an appointment is:
* Any day during the week between 12pm and 2pm

Your objective is to make an appointment with the vet.
You shouldn't schedule or say something that is impossible. For example, you shouldn't make an appointment that falls outside of the pet owner's availability.

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Voice Call with ChatGPT

This repository is a demo of what'd be possible if you could have ChatGPT place a call on your behalf. It is **experimental** and not meant to actually be used for placing calls. But it's pretty fun to play around with.

Originally, it was built using the [Eleven Labs](https://beta.elevenlabs.io/) Voice AI API so that you could create your own voice and have it act as the voice of ChatGPT. It has since been updated to use OpenAI's voice API instead. You can always swap these out if you want.

## How to use

Install the dependencies:

```
pip install -r requirements.txt
```

I've included some example prompts for common scenarios where you might want a bot to place a call for you. Of course, you can also add your own.

To run a scenario, run the following command, replacing the path with the prompt file you want to use:

```
python main.py -pf 'path/to/prompt/file.md'
```

When it starts, you'll see a "Listening..." message. That's essentially the same as a phone call being picked up. You can then play the role of the receiver, and ChatGPT will play the role of the caller.
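Both API clients read their credentials from the `.env` file that `main.py` loads at startup. A minimal `.env` might look like the sketch below — the key values are placeholders, and the two ElevenLabs variables are only needed if you run with the `elevenlabs` TTS type (I'm assuming the ElevenLabs client reads `ELEVENLABS_API_KEY` from the environment, as recent versions of the SDK do):

```
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
```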
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import argparse
import datetime
import logging
import os
from dataclasses import asdict
from pathlib import Path

import dotenv
import speech_recognition as sr
from elevenlabs import stream
from elevenlabs.client import ElevenLabs, Voice
from openai import OpenAI

from models import Message

logging.basicConfig(level=logging.INFO)
dotenv.load_dotenv('.env')

oai_client = OpenAI()
elevenlabs_client = ElevenLabs()

CHAT_MODEL = "gpt-4o"
TTS_MODEL = "tts-1"
MODEL_TEMPERATURE = 0.5
AUDIO_MODEL = "whisper-1"
VOICE_ID = os.getenv("ELEVENLABS_VOICE_ID")

# Working directories for intermediate audio/text artifacts. These are
# gitignored, so create them if they don't exist yet.
for directory in ("recordings", "transcripts", "outputs", "logs"):
    os.makedirs(directory, exist_ok=True)


def ask_gpt_chat(prompt: str, messages: list[Message]) -> str:
    """Returns ChatGPT's response to the given prompt and conversation so far."""
    system_message = [{"role": "system", "content": prompt}]
    message_dicts = [asdict(message) for message in messages]
    conversation = system_message + message_dicts
    response = oai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=conversation,
        temperature=MODEL_TEMPERATURE,
    )
    return response.choices[0].message.content


def setup_prompt(prompt_file: str = 'prompts/vet_prompt.md') -> str:
    """Reads the system prompt for GPT from a file."""
    with open(prompt_file) as f:
        return f.read()


def get_transcription(file_path: str) -> str:
    """Transcribes a recorded WAV file with the Whisper API."""
    with open(file_path, "rb") as audio_file:
        transcription = oai_client.audio.transcriptions.create(
            model=AUDIO_MODEL,
            file=audio_file,
        )
    return transcription.text


def record() -> str:
    """Records one utterance from the microphone and returns its transcript."""
    # Load the speech recognizer and the default microphone.
    r = sr.Recognizer()
    m = sr.Microphone()

    with m as source:
        r.adjust_for_ambient_noise(source)
        logging.info('Listening...')
        audio = r.listen(source)

    # Write the audio to a WAV file, then transcribe it.
    timestamp = datetime.datetime.now().timestamp()
    with open(f"./recordings/{timestamp}.wav", "wb") as f:
        f.write(audio.get_wav_data())
    transcript = get_transcription(f"./recordings/{timestamp}.wav")
    with open(f"./transcripts/{timestamp}.txt", "w") as f:
        f.write(transcript)
    return transcript


def oai_text_to_speech(text: str) -> Path:
    """Synthesizes speech with OpenAI's TTS API and returns the MP3 path."""
    timestamp = datetime.datetime.now().timestamp()
    speech_file_path = Path(__file__).parent / f"outputs/{timestamp}.mp3"
    response = oai_client.audio.speech.create(
        model=TTS_MODEL,
        voice="nova",
        input=text,
    )
    response.write_to_file(speech_file_path)
    return speech_file_path


def elevenlabs_text_to_speech(text: str) -> None:
    """Streams synthesized speech through the ElevenLabs API."""
    audio_stream = elevenlabs_client.generate(
        text=text,
        voice=Voice(voice_id=VOICE_ID),
        stream=True,
    )
    stream(audio_stream)


def clean_up():
    """Deletes intermediate files and saves the conversation log.

    Relies on the module-level `conversation_messages` list built in the
    main loop below.
    """
    logging.info('Exiting...')
    # Delete all the recordings, transcripts, and synthesized audio.
    for directory in ("recordings", "transcripts", "outputs"):
        for file in os.listdir(f"./{directory}"):
            os.remove(f"./{directory}/{file}")
    # Save the conversation.
    timestamp = datetime.datetime.now().timestamp()
    with open(f'logs/conversation_{timestamp}.txt', 'w') as f:
        for message in conversation_messages:
            f.write(f"{message.role}: {message.content}\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-pf", "--prompt_file", help="Specify the prompt file to use.", type=str, default="prompts/vet_prompt.md")
    parser.add_argument("-tts", "--tts_type", help="Specify the TTS type to use.", type=str, default="openai", choices=["openai", "elevenlabs"])
    args = parser.parse_args()

    prompt = setup_prompt(args.prompt_file)
    conversation_messages = []
    while True:
        try:
            user_input = record()
            logging.info(f'Receiver: {user_input}')
            conversation_messages.append(Message(role="user", content=user_input))
            answer = ask_gpt_chat(prompt, conversation_messages)
            logging.info(f'Caller: {answer}')
            logging.info('Playing audio...')
            if args.tts_type == "elevenlabs":
                elevenlabs_text_to_speech(answer)
            else:
                audio_file = oai_text_to_speech(answer)
                # Play the audio file. Note afplay is macOS-only; swap in
                # your platform's player (e.g. aplay or ffplay) if needed.
                os.system(f"afplay {audio_file}")
            conversation_messages.append(Message(role="assistant", content=answer))
            if 'bye' in user_input.lower():
                clean_up()
                break
        except KeyboardInterrupt:
            clean_up()
            break
--------------------------------------------------------------------------------
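As a quick illustration of how `models.Message` objects become the dicts the chat completions API expects, here's a standalone sketch. It redefines the dataclass inline so it runs without the repo on the path; the sample message contents are made up:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Message:
    role: str
    content: Optional[str] = None

    def render(self) -> str:
        # "role: content", or just "role:" when content is None.
        result = self.role + ":"
        if self.content is not None:
            result += " " + self.content
        return result


messages = [
    Message(role="user", content="Hi, I'd like to book a checkup for my dog."),
    Message(role="assistant", content="Sure, what days work for you?"),
]

# asdict() produces the {"role": ..., "content": ...} shape that
# ask_gpt_chat() sends as the messages list.
payload = [asdict(m) for m in messages]
print(payload[0]["role"])   # user
print(messages[1].render())  # assistant: Sure, what days work for you?
```

Freezing the dataclass keeps the conversation log immutable once appended, and `asdict` means no hand-rolled serialization is needed in the main loop.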