├── .env.example
├── .gitignore
├── LICENSE
├── Readme.md
├── main.py
├── preview.mp4
└── requirements.txt

/.env.example:
--------------------------------------------------------------------------------
GROQ_API_KEY="Your Groq API Key"
--------------------------------------------------------------------------------

/.gitignore:
--------------------------------------------------------------------------------
.env
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Shreyansh Rajput

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/Readme.md:
--------------------------------------------------------------------------------
# Personal Voice Assistant

A personal voice assistant that can play music from YouTube, help fix on-screen errors, and chat with you like a regular chatbot. It is built in Python and leverages several libraries and APIs to provide its functionality.

## Preview

[Watch the video](preview.mp4)

## Features

1. **Play Music from YouTube**: Ask the assistant to play music from YouTube based on a search query.
2. **Fix Errors**: The assistant can take a screenshot, extract the text from the image, and attempt to fix any errors it finds.
3. **Chatbot**: The assistant can engage in normal conversation.

### Error Fixing Process

When you report an error, the assistant takes a screenshot of the current screen to capture the exact issue. The image is preprocessed with OpenCV and the text is extracted with Tesseract OCR. The extracted text is sent to the LLaMA 3 language model, which analyzes it and generates a fix. The assistant then relays the suggested correction or troubleshooting steps back to you.
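At its core, that pipeline is only a few calls. A minimal sketch (assuming Tesseract is installed, `GROQ_API_KEY` is set, and you are on a platform where `ImageGrab` can capture the screen):

```python
from PIL import ImageGrab      # screen capture (Windows/macOS; needs extra setup on Linux)
import pytesseract             # OCR wrapper around the Tesseract binary
from groq import Groq

screenshot = ImageGrab.grab()                    # 1. capture the screen
text = pytesseract.image_to_string(screenshot)   # 2. extract the visible text
client = Groq()                                  # reads GROQ_API_KEY from the environment
reply = client.chat.completions.create(          # 3. ask LLaMA 3 for a fix
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": f"Fix the error in this text:\n{text}"}],
)
print(reply.choices[0].message.content)
```

`main.py` (below) wraps the same idea, adding Otsu thresholding for cleaner OCR and routing the answer into the GUI chat box.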
## Getting Started

### Prerequisites

Make sure you have Python installed on your system. You can download it from [python.org](https://www.python.org/).

### Installation

1. Clone the repository:
   ```sh
   git clone https://github.com/kiritoInd/Personal-Voice-Assistant.git
   cd Personal-Voice-Assistant
   ```

2. Install the required packages:
   ```sh
   pip install -r requirements.txt
   ```

### Environment Variables

Create a `.env` file in the root directory of the project and add your Groq API key:

```env
GROQ_API_KEY=your_groq_api_key
```

## Running the Assistant

Run the following command to start the voice assistant:

```sh
python main.py
```

## Usage

- **Start the Assistant**: Click the "Start Bot" button in the GUI to start the assistant.
- **Trigger Word**: Say "hello" to activate the assistant.
- **Commands**:
  - **Play Music**: "Play [song name] from YouTube."
  - **Fix Error**: "Can you fix this error?"
  - **Chat**: Engage in a normal conversation.

## Adding More Functions

You can add more functionality to the assistant through the function-calling list; the same approach works with Meta's LLaMA 3. Learn more about function calling in [DataCamp's OpenAI Function Calling Tutorial](https://www.datacamp.com/tutorial/open-ai-function-calling-tutorial).

To add a new function, register its signature in the `function_calling_template` in the code:

```python
function_calling_template = """
{
    "name": "Your Function",
    "description": "Description of the function",
    "parameters": {
        "type": "object",
        "properties": {},
        "required": [],
    },
}
"""
```

Then dispatch the new tool in `process_command`, as sketched below.
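For example, wiring in a hypothetical `tell_time` tool (the name and behavior here are purely illustrative) takes two small changes:

```python
import datetime

def tell_time():
    """Hypothetical tool: report the current local time."""
    return datetime.datetime.now().strftime("The time is %H:%M")

# 1. Describe it in function_calling_template so the model knows it exists:
#    {"name": "tell_time", "description": "Tell the current local time",
#     "parameters": {"type": "object", "properties": {}, "required": []}}
#
# 2. Route it in process_command alongside the existing tools:
#    elif function_name == "tell_time":
#        return tell_time()
```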
## Libraries and APIs Used

- `json`
- `speech_recognition`
- `pyttsx3`
- `groq`
- `Pillow`
- `opencv-python-headless`
- `pytesseract`
- `datasets`
- `torch`
- `transformers`
- `soundfile`
- `sounddevice`
- `requests`
- `beautifulsoup4`
- `keyboard`
- `tkinter`

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Groq API](https://groq.com/)
- [Meta](https://github.com/meta-llama/llama3)
- [DataCamp](https://www.datacamp.com/)
--------------------------------------------------------------------------------

/main.py:
--------------------------------------------------------------------------------
import os
import json
import io
import re
import time
import urllib.parse
import urllib.request
import webbrowser
from threading import Thread

import speech_recognition as sr
import pyttsx3
import keyboard
import cv2
import pytesseract
import torch
import soundfile as sf
import sounddevice as sd
import requests
import tkinter as tk
from tkinter.scrolledtext import ScrolledText

from PIL import ImageGrab, Image
from bs4 import BeautifulSoup
from groq import Groq
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from dotenv import load_dotenv

load_dotenv()

GroqApiKey = os.getenv("GROQ_API_KEY")

client = Groq(api_key=GroqApiKey)

# SpeechT5 text-to-speech pipeline (processor -> model -> vocoder)
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load a speaker embedding (x-vector) so SpeechT5 has a voice to imitate
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

def synthesize(text):
    """Convert text to speech with SpeechT5 and play it on the default output device."""
    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    speech_numpy = speech.numpy()
    buf = io.BytesIO()
    sf.write(buf, speech_numpy, samplerate=16000, format='WAV', subtype='PCM_16')
    buf.seek(0)
    data, samplerate = sf.read(buf)
    sd.play(data, samplerate)
    sd.wait()

def Can_you_fix_error(chat_box):
    """Screenshot the screen, OCR it, and ask the model to correct any error it finds."""
    try:
        screenshot = ImageGrab.grab()
        cwd = os.getcwd()
        file_path = os.path.join(cwd, "screenshot.png")
        screenshot.save(file_path)
        Text = extract_text_from_image(file_path)
        Query = "Can you solve this problem? Provide only the corrected code, nothing else."
        response = ask_question_about_text(Text, Query)
        chat_box.config(state=tk.NORMAL)
        chat_box.insert(tk.END, f"Bot: {response}\n")
        chat_box.config(state=tk.DISABLED)
        return "Check your chat"
    except Exception as e:
        return f"An error occurred while taking the screenshot: {e}"

def extract_text_from_image(image_path):
    """OCR an image: grayscale, Otsu-threshold for contrast, then run Tesseract."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    filename = "{}.png".format(os.getpid())
    cv2.imwrite(filename, gray)
    text = pytesseract.image_to_string(Image.open(filename))
    os.remove(filename)
    return text.strip() if text else "No text detected."
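# Note: pytesseract only wraps the Tesseract binary, which must be installed
# separately. If the binary is not on your PATH (common on Windows), point
# pytesseract at it explicitly before calling the OCR helpers, e.g.:
#   pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# (the path above is illustrative; adjust it to your installation).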
def ask_question_about_text(extracted_text, question):
    """Send OCR output plus a question to LLaMA 3 via Groq and return its answer."""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Please provide a solution to any error found in the extracted text."
        },
        {
            "role": "user",
            "content": f"The following is the extracted text from an image:\n\n{extracted_text}\n\nNow answer the following question based on the above text:\n\n{question}"
        }
    ]

    chat_completion = client.chat.completions.create(
        messages=messages,
        model="llama3-8b-8192"
    )

    response = chat_completion.choices[0].message.content
    return response.strip()

def get_youtube_video_url(query):
    """Scrape YouTube's search results page and return the URL of the first video hit."""
    query_string = urllib.parse.urlencode({"search_query": query})
    formatUrl = urllib.request.urlopen("https://www.youtube.com/results?" + query_string)
    # Video IDs are the 11-character tokens after "watch?v=" in the results page
    search_results = re.findall(r"watch\?v=(\S{11})", formatUrl.read().decode())
    if not search_results:
        return None
    video_url = "https://www.youtube.com/watch?v=" + search_results[0]
    return video_url

def play_music_from_youtube(query):
    """Open the first YouTube search result for the query in the default browser."""
    video_url = get_youtube_video_url(query)
    if not video_url:
        return "No results found"
    webbrowser.open(video_url)
    time.sleep(2)
    return f"Playing: {query}"

def play_pause_media():
    """Toggle the OS-level play/pause media key."""
    keyboard.send('play/pause media')
    return "Toggled play/pause"
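# The template below implements a lightweight, prompt-based form of function
# calling: the model is shown JSON tool signatures inside <tools> tags and is
# asked to answer with a <tool_call>{...}</tool_call> block whenever a tool
# applies. process_command() then parses that block and dispatches to the
# matching Python function defined above.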
function_calling_template = """
system

You are a virtual assistant AI model. You have to call a function depending on the user's request; there may also be normal conversation requests, and if no function is provided for a specific problem, do not use a tool call in the conversation. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call, return a JSON object with the function name and arguments within <tool_call></tool_call> XML tags, as follows:

<tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</tool_call>

Here are the available tools:
<tools>
[
    {
        "name": "Can_you_fix_error",
        "description": "Fix the error currently shown on screen",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "play_pause_media",
        "description": "Play or pause the current music",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "play_music_from_youtube",
        "description": "Play music from YouTube based on a search query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to find the music on YouTube"
                }
            },
            "required": ["query"]
        }
    }
]
</tools>
"""

def listen_for_trigger_word(trigger_word="hello"):
    """Block until the trigger word is heard on the microphone."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        while True:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_google(audio).lower()
                if trigger_word in text:
                    return
            except sr.UnknownValueError:
                pass

def speak(text):
    """Fallback TTS via pyttsx3 (faster than SpeechT5 for long responses)."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def process_command(command, chat_box):
    """Send the spoken command to the model and dispatch any tool call it returns."""
    messages = [
        {
            "role": "system",
            "content": function_calling_template
        },
        {
            "role": "user",
            "content": command
        }
    ]
    chat_completion = client.chat.completions.create(
        messages=messages,
        model="llama3-8b-8192",
    )
    response = chat_completion.choices[0].message.content

    if "<tool_call>" in response:
        function_call = response.split("<tool_call>")[1].split("</tool_call>")[0]
        function_call_json = json.loads(function_call)
        function_name = function_call_json['name']
        if function_name == "Can_you_fix_error":
            return Can_you_fix_error(chat_box)
        elif function_name == "play_music_from_youtube":
            query = function_call_json['arguments']['query']
            return play_music_from_youtube(query)
        elif function_name == "play_pause_media":
            return play_pause_media()

    return response
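# Example round trip: process_command("play lofi beats from youtube", chat_box)
# should make the model reply with something like
#   <tool_call>{"name": "play_music_from_youtube", "arguments": {"query": "lofi beats"}}</tool_call>
# which the parser above routes to play_music_from_youtube("lofi beats").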
def start_bot(chat_box):
    """Main loop: wait for the trigger word, then listen for and handle one command."""
    trigger_word = "hello"

    while True:
        listen_for_trigger_word(trigger_word)
        st = "Hey, how can I help you?"
        synthesize(st)
        chat_box.config(state=tk.NORMAL)
        chat_box.insert(tk.END, f"Bot: {st}\n")
        chat_box.config(state=tk.DISABLED)

        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            chat_box.config(state=tk.NORMAL)
            chat_box.insert(tk.END, "Listening for command...\n")
            chat_box.config(state=tk.DISABLED)
            audio = recognizer.listen(source)
            try:
                command = recognizer.recognize_google(audio).lower()
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, f"You: {command}\n")
                response = process_command(command, chat_box)
                if "```" in response:
                    # Render fenced code blocks with the "code" text tag
                    chat_box.insert(tk.END, "Bot: ")
                    parts = response.split("```")
                    for i, part in enumerate(parts):
                        if i % 2 == 0:
                            chat_box.insert(tk.END, part)
                        else:
                            chat_box.insert(tk.END, part, "code")
                    chat_box.insert(tk.END, "\n")
                else:
                    chat_box.insert(tk.END, f"Bot: {response}\n")
                chat_box.config(state=tk.DISABLED)
                # Long answers go to the faster pyttsx3 engine; short ones to SpeechT5
                if len(response) > 600:
                    speak(response)
                else:
                    synthesize(response)
            except sr.UnknownValueError:
                synthesize("Sorry, I didn't catch that.")
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, "Bot: Sorry, I didn't catch that.\n")
                chat_box.config(state=tk.DISABLED)
            except Exception as e:
                synthesize("An unexpected error occurred. Please try again.")
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, f"Bot: An unexpected error occurred. Please try again. Error: {e}\n")
                chat_box.config(state=tk.DISABLED)
                break


def on_start_button_click(chat_box):
    """Run the bot loop on a worker thread so the GUI stays responsive."""
    bot_thread = Thread(target=start_bot, args=(chat_box,), daemon=True)
    bot_thread.start()

def create_gui():
    root = tk.Tk()
    root.title("Voice Assistant")

    chat_box = ScrolledText(root, wrap=tk.WORD, state='disabled')
    chat_box.tag_config("code", font=("Courier", 10))  # style for code blocks in responses
    chat_box.pack(padx=10, pady=10, fill=tk.BOTH, expand=True)

    start_button = tk.Button(root, text="Start Bot", command=lambda: on_start_button_click(chat_box))
    start_button.pack(pady=10)

    root.mainloop()

if __name__ == "__main__":
    create_gui()
--------------------------------------------------------------------------------

/preview.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kiritoInd/Personal-Voice-Assistant-Using-LLM-FunctionCalling/3f83fbb15bc0530176855cdb7a9bcd1471b73b2d/preview.mp4
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
speechrecognition
pyttsx3
groq
Pillow
opencv-python-headless
pytesseract
datasets
torch
transformers
python-dotenv
soundfile
sounddevice
requests
beautifulsoup4
keyboard
--------------------------------------------------------------------------------