├── requirements.txt
├── .gitignore
├── models.py
├── prompts
│   ├── customer_support.md
│   ├── dr_prompt.md
│   └── vet_prompt.md
├── README.md
└── main.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai==1.30.3
SpeechRecognition==3.9.0
PyAudio==0.2.13
elevenlabs==1.2.2
python-dotenv
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
logs/
outputs/
recordings/
transcripts/
.env
__pycache__/
.mypy_cache/
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    role: str
    content: Optional[str] = None

    def render(self) -> str:
        """Return the message as a "role: content" string."""
        result = self.role + ":"
        if self.content is not None:
            result += " " + self.content
        return result
--------------------------------------------------------------------------------
/prompts/customer_support.md:
--------------------------------------------------------------------------------
You've been tasked with calling an airline's customer support line to reschedule your flight. You have been provided with the following information:

* Your name is Test User
* Your phone number is 555-555-5555
* Your email is hello@example.com
* Your address is 123 Fake St, New York, NY 10001
* Your original flight was United Airlines flight 1234
* Your original flight was scheduled to depart at 12pm on Monday, January 1st
* You'd like to reschedule your flight to depart on January 3rd

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/prompts/dr_prompt.md:
--------------------------------------------------------------------------------
You're a helpful assistant that has been tasked with scheduling a doctor's appointment for a patient. You have been provided with the following information:

* The patient's name is Test User
* The patient's phone number is 555-555-5555
* The patient's email is hello@example.com
* The patient's address is 123 Fake St, New York, NY 10001
* The patient is a patient of Dr. Smith
* The reason for the appointment is that the patient needs a checkup

The patient's availability for an appointment is:
* Monday, Tuesday, or Thursday between 12pm and 2pm

Your objective is to make an appointment with the doctor. You shouldn't schedule or say something that is impossible. For example, you shouldn't make an appointment that falls outside of the patient's availability.

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/prompts/vet_prompt.md:
--------------------------------------------------------------------------------
You're a helpful assistant that has been tasked with scheduling a vet appointment for a pet. You have been provided with the following information:

* The pet owner's name is Test User
* The pet owner's phone number is 555-555-5555
* The pet owner's email is hello@example.com
* The pet owner's address is 123 Fake St, New York, NY 10001
* The pet's name is Fido
* The pet is a dog
* The pet is a previous patient of the vet
* The reason for the appointment is that it needs a checkup

The pet owner's availability for an appointment is:
* Any day during the week between 12pm and 2pm

Your objective is to make an appointment with the vet.
You shouldn't schedule or say something that is impossible. For example, you shouldn't make an appointment that falls outside of the pet owner's availability.

If you don't know how to respond, you can say "Sorry, I'm not sure."

Begin.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Voice Call with ChatGPT

This repository is a demo of what'd be possible if you could have ChatGPT place a call on your behalf. It is **experimental** and not meant to actually be used for placing calls. But it's pretty fun to play around with.

Originally, it was built using the [Eleven Labs](https://beta.elevenlabs.io/) Voice AI API so that you could create your own voice and have it act as the voice of ChatGPT. It has since been updated to use OpenAI's voice API instead. You can always swap these out if you want.

## How to use

Install the dependencies:

```
pip install -r requirements.txt
```

I've included some example prompts for common scenarios where you might want a bot to place a call for you. Of course, you can also add your own.

To run a scenario, run the following command, replacing the path with the prompt file you want to use:

```
python main.py -pf 'path/to/prompt/file.md'
```

When it starts, you'll see a "Listening..." message. That's essentially the same as a phone call being picked up. You can then play the role of the receiver, and ChatGPT will play the role of the caller.
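Both API clients read their credentials from the `.env` file that `main.py` loads at startup. A minimal `.env` might look like the sketch below — the key values are placeholders, and the two ElevenLabs variables are only needed if you run with the `elevenlabs` TTS type (I'm assuming the ElevenLabs client reads `ELEVENLABS_API_KEY` from the environment, as recent versions of the SDK do):

```
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
```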
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import argparse
import datetime
import logging
import os
from dataclasses import asdict
from pathlib import Path

import dotenv
import speech_recognition as sr
from elevenlabs import stream
from elevenlabs.client import ElevenLabs, Voice
from openai import OpenAI

from models import Message

logging.basicConfig(level=logging.INFO)
dotenv.load_dotenv('.env')

oai_client = OpenAI()
elevenlabs_client = ElevenLabs()

CHAT_MODEL = "gpt-4o"
TTS_MODEL = "tts-1"
MODEL_TEMPERATURE = 0.5
AUDIO_MODEL = "whisper-1"
VOICE_ID = os.getenv("ELEVENLABS_VOICE_ID")

# Working directories for intermediate audio/text artifacts. These are
# gitignored, so create them if they don't exist yet.
for directory in ("recordings", "transcripts", "outputs", "logs"):
    os.makedirs(directory, exist_ok=True)


def ask_gpt_chat(prompt: str, messages: list[Message]) -> str:
    """Returns ChatGPT's response to the given prompt and conversation so far."""
    system_message = [{"role": "system", "content": prompt}]
    message_dicts = [asdict(message) for message in messages]
    conversation = system_message + message_dicts
    response = oai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=conversation,
        temperature=MODEL_TEMPERATURE,
    )
    return response.choices[0].message.content


def setup_prompt(prompt_file: str = 'prompts/vet_prompt.md') -> str:
    """Reads the system prompt for GPT from a file."""
    with open(prompt_file) as f:
        return f.read()


def get_transcription(file_path: str) -> str:
    """Transcribes a recorded WAV file with the Whisper API."""
    with open(file_path, "rb") as audio_file:
        transcription = oai_client.audio.transcriptions.create(
            model=AUDIO_MODEL,
            file=audio_file,
        )
    return transcription.text


def record() -> str:
    """Records one utterance from the microphone and returns its transcript."""
    # Load the speech recognizer and the default microphone.
    r = sr.Recognizer()
    m = sr.Microphone()

    with m as source:
        r.adjust_for_ambient_noise(source)
        logging.info('Listening...')
        audio = r.listen(source)

    # Write the audio to a WAV file, then transcribe it.
    timestamp = datetime.datetime.now().timestamp()
    with open(f"./recordings/{timestamp}.wav", "wb") as f:
        f.write(audio.get_wav_data())
    transcript = get_transcription(f"./recordings/{timestamp}.wav")
    with open(f"./transcripts/{timestamp}.txt", "w") as f:
        f.write(transcript)
    return transcript


def oai_text_to_speech(text: str) -> Path:
    """Synthesizes speech with OpenAI's TTS API and returns the MP3 path."""
    timestamp = datetime.datetime.now().timestamp()
    speech_file_path = Path(__file__).parent / f"outputs/{timestamp}.mp3"
    response = oai_client.audio.speech.create(
        model=TTS_MODEL,
        voice="nova",
        input=text,
    )
    response.write_to_file(speech_file_path)
    return speech_file_path


def elevenlabs_text_to_speech(text: str) -> None:
    """Streams synthesized speech through the ElevenLabs API."""
    audio_stream = elevenlabs_client.generate(
        text=text,
        voice=Voice(voice_id=VOICE_ID),
        stream=True,
    )
    stream(audio_stream)


def clean_up():
    """Deletes intermediate files and saves the conversation log.

    Relies on the module-level `conversation_messages` list built in the
    main loop below.
    """
    logging.info('Exiting...')
    # Delete all the recordings, transcripts, and synthesized audio.
    for directory in ("recordings", "transcripts", "outputs"):
        for file in os.listdir(f"./{directory}"):
            os.remove(f"./{directory}/{file}")
    # Save the conversation.
    timestamp = datetime.datetime.now().timestamp()
    with open(f'logs/conversation_{timestamp}.txt', 'w') as f:
        for message in conversation_messages:
            f.write(f"{message.role}: {message.content}\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-pf", "--prompt_file", help="Specify the prompt file to use.", type=str, default="prompts/vet_prompt.md")
    parser.add_argument("-tts", "--tts_type", help="Specify the TTS type to use.", type=str, default="openai", choices=["openai", "elevenlabs"])
    args = parser.parse_args()

    prompt = setup_prompt(args.prompt_file)
    conversation_messages = []
    while True:
        try:
            user_input = record()
            logging.info(f'Receiver: {user_input}')
            conversation_messages.append(Message(role="user", content=user_input))
            answer = ask_gpt_chat(prompt, conversation_messages)
            logging.info(f'Caller: {answer}')
            logging.info('Playing audio...')
            if args.tts_type == "elevenlabs":
                elevenlabs_text_to_speech(answer)
            else:
                audio_file = oai_text_to_speech(answer)
                # Play the audio file. Note afplay is macOS-only; swap in
                # your platform's player (e.g. aplay or ffplay) if needed.
                os.system(f"afplay {audio_file}")
            conversation_messages.append(Message(role="assistant", content=answer))
            if 'bye' in user_input.lower():
                clean_up()
                break
        except KeyboardInterrupt:
            clean_up()
            break
--------------------------------------------------------------------------------
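As a quick illustration of how `models.Message` objects become the dicts the chat completions API expects, here's a standalone sketch. It redefines the dataclass inline so it runs without the repo on the path; the sample message contents are made up:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Message:
    role: str
    content: Optional[str] = None

    def render(self) -> str:
        # "role: content", or just "role:" when content is None.
        result = self.role + ":"
        if self.content is not None:
            result += " " + self.content
        return result


messages = [
    Message(role="user", content="Hi, I'd like to book a checkup for my dog."),
    Message(role="assistant", content="Sure, what days work for you?"),
]

# asdict() produces the {"role": ..., "content": ...} shape that
# ask_gpt_chat() sends as the messages list.
payload = [asdict(m) for m in messages]
print(payload[0]["role"])   # user
print(messages[1].render())  # assistant: Sure, what days work for you?
```

Freezing the dataclass keeps the conversation log immutable once appended, and `asdict` means no hand-rolled serialization is needed in the main loop.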