├── .env.example
├── .gitignore
├── LICENSE
├── Readme.md
├── main.py
├── preview.mp4
└── requirements.txt

/.env.example:
--------------------------------------------------------------------------------
GROQ_API_KEY="Your Groq API Key"
--------------------------------------------------------------------------------

/.gitignore:
--------------------------------------------------------------------------------
.env
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Shreyansh Rajput

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/Readme.md:
--------------------------------------------------------------------------------
# Personal Voice Assistant

A personal voice assistant that can play music from YouTube, help fix on-screen errors, and chat with you like a regular chatbot. It is built in Python and leverages several libraries and APIs to provide its functionality.

## Preview

[Watch the video](preview.mp4)

## Features

1. **Play Music from YouTube**: Ask the assistant to play music from YouTube based on a search query.
2. **Fix Errors**: The assistant can take a screenshot, extract the text from the image, and attempt to fix any errors it finds.
3. **Chatbot**: The assistant can engage in normal conversation.

### Error Fixing Process

When you report an error, the assistant takes a screenshot of the current screen to capture the exact issue. The image is preprocessed with OpenCV and the text is extracted with Tesseract OCR. The extracted text is sent to the LLaMA 3 language model, which analyzes it and generates a fix. The assistant then relays the suggested correction or troubleshooting steps back to you.
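At its core, that pipeline is only a few calls. A minimal sketch (assuming Tesseract is installed, `GROQ_API_KEY` is set, and you are on a platform where `ImageGrab` can capture the screen):

```python
from PIL import ImageGrab      # screen capture (Windows/macOS; needs extra setup on Linux)
import pytesseract             # OCR wrapper around the Tesseract binary
from groq import Groq

screenshot = ImageGrab.grab()                    # 1. capture the screen
text = pytesseract.image_to_string(screenshot)   # 2. extract the visible text
client = Groq()                                  # reads GROQ_API_KEY from the environment
reply = client.chat.completions.create(          # 3. ask LLaMA 3 for a fix
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": f"Fix the error in this text:\n{text}"}],
)
print(reply.choices[0].message.content)
```

`main.py` (below) wraps the same idea, adding Otsu thresholding for cleaner OCR and routing the answer into the GUI chat box.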
## Getting Started

### Prerequisites

Make sure you have Python installed on your system. You can download it from [python.org](https://www.python.org/).

### Installation

1. Clone the repository:
   ```sh
   git clone https://github.com/kiritoInd/Personal-Voice-Assistant.git
   cd Personal-Voice-Assistant
   ```

2. Install the required packages:
   ```sh
   pip install -r requirements.txt
   ```

### Environment Variables

Create a `.env` file in the root directory of the project and add your Groq API key:

```env
GROQ_API_KEY=your_groq_api_key
```

## Running the Assistant

Run the following command to start the voice assistant:

```sh
python main.py
```

## Usage

- **Start the Assistant**: Click the "Start Bot" button in the GUI to start the assistant.
- **Trigger Word**: Say "hello" to activate the assistant.
- **Commands**:
  - **Play Music**: "Play [song name] from YouTube."
  - **Fix Error**: "Can you fix this error?"
  - **Chat**: Engage in a normal conversation.

## Adding More Functions

You can add more functionality to the assistant through the function-calling list; the same approach works with Meta's LLaMA 3. Learn more about function calling in [DataCamp's OpenAI Function Calling Tutorial](https://www.datacamp.com/tutorial/open-ai-function-calling-tutorial).

To add a new function, register its signature in the `function_calling_template` in the code:

```python
function_calling_template = """
{
    "name": "Your Function",
    "description": "Description of the function",
    "parameters": {
        "type": "object",
        "properties": {},
        "required": [],
    },
}
"""
```

Then dispatch the new tool in `process_command`, as sketched below.
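For example, wiring in a hypothetical `tell_time` tool (the name and behavior here are purely illustrative) takes two small changes:

```python
import datetime

def tell_time():
    """Hypothetical tool: report the current local time."""
    return datetime.datetime.now().strftime("The time is %H:%M")

# 1. Describe it in function_calling_template so the model knows it exists:
#    {"name": "tell_time", "description": "Tell the current local time",
#     "parameters": {"type": "object", "properties": {}, "required": []}}
#
# 2. Route it in process_command alongside the existing tools:
#    elif function_name == "tell_time":
#        return tell_time()
```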
## Libraries and APIs Used

- `json`
- `speech_recognition`
- `pyttsx3`
- `groq`
- `Pillow`
- `opencv-python-headless`
- `pytesseract`
- `datasets`
- `torch`
- `transformers`
- `soundfile`
- `sounddevice`
- `requests`
- `beautifulsoup4`
- `keyboard`
- `tkinter`

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Groq API](https://groq.com/)
- [Meta](https://github.com/meta-llama/llama3)
- [DataCamp](https://www.datacamp.com/)
--------------------------------------------------------------------------------

/main.py:
--------------------------------------------------------------------------------
import os
import json
import io
import re
import time
import urllib.parse
import urllib.request
import webbrowser
from threading import Thread

import speech_recognition as sr
import pyttsx3
import keyboard
import cv2
import pytesseract
import torch
import soundfile as sf
import sounddevice as sd
import requests
import tkinter as tk
from tkinter.scrolledtext import ScrolledText

from PIL import ImageGrab, Image
from bs4 import BeautifulSoup
from groq import Groq
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from dotenv import load_dotenv

load_dotenv()

GroqApiKey = os.getenv("GROQ_API_KEY")

client = Groq(api_key=GroqApiKey)

# SpeechT5 text-to-speech pipeline (processor -> model -> vocoder)
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load a speaker embedding (x-vector) so SpeechT5 has a voice to imitate
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

def synthesize(text):
    """Convert text to speech with SpeechT5 and play it on the default output device."""
    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    speech_numpy = speech.numpy()
    buf = io.BytesIO()
    sf.write(buf, speech_numpy, samplerate=16000, format='WAV', subtype='PCM_16')
    buf.seek(0)
    data, samplerate = sf.read(buf)
    sd.play(data, samplerate)
    sd.wait()

def Can_you_fix_error(chat_box):
    """Screenshot the screen, OCR it, and ask the model to correct any error it finds."""
    try:
        screenshot = ImageGrab.grab()
        cwd = os.getcwd()
        file_path = os.path.join(cwd, "screenshot.png")
        screenshot.save(file_path)
        Text = extract_text_from_image(file_path)
        Query = "Can you solve this problem? Provide only the corrected code, nothing else."
        response = ask_question_about_text(Text, Query)
        chat_box.config(state=tk.NORMAL)
        chat_box.insert(tk.END, f"Bot: {response}\n")
        chat_box.config(state=tk.DISABLED)
        return "Check your chat"
    except Exception as e:
        return f"An error occurred while taking the screenshot: {e}"

def extract_text_from_image(image_path):
    """OCR an image: grayscale, Otsu-threshold for contrast, then run Tesseract."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    filename = "{}.png".format(os.getpid())
    cv2.imwrite(filename, gray)
    text = pytesseract.image_to_string(Image.open(filename))
    os.remove(filename)
    return text.strip() if text else "No text detected."
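# Note: pytesseract only wraps the Tesseract binary, which must be installed
# separately. If the binary is not on your PATH (common on Windows), point
# pytesseract at it explicitly before calling the OCR helpers, e.g.:
#   pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# (the path above is illustrative; adjust it to your installation).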
def ask_question_about_text(extracted_text, question):
    """Send OCR output plus a question to LLaMA 3 via Groq and return its answer."""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Please provide a solution to any error found in the extracted text."
        },
        {
            "role": "user",
            "content": f"The following is the extracted text from an image:\n\n{extracted_text}\n\nNow answer the following question based on the above text:\n\n{question}"
        }
    ]

    chat_completion = client.chat.completions.create(
        messages=messages,
        model="llama3-8b-8192"
    )

    response = chat_completion.choices[0].message.content
    return response.strip()

def get_youtube_video_url(query):
    """Scrape YouTube's search results page and return the URL of the first video hit."""
    query_string = urllib.parse.urlencode({"search_query": query})
    formatUrl = urllib.request.urlopen("https://www.youtube.com/results?" + query_string)
    # Video IDs are the 11-character tokens after "watch?v=" in the results page
    search_results = re.findall(r"watch\?v=(\S{11})", formatUrl.read().decode())
    if not search_results:
        return None
    video_url = "https://www.youtube.com/watch?v=" + search_results[0]
    return video_url

def play_music_from_youtube(query):
    """Open the first YouTube search result for the query in the default browser."""
    video_url = get_youtube_video_url(query)
    if not video_url:
        return "No results found"
    webbrowser.open(video_url)
    time.sleep(2)
    return f"Playing: {query}"

def play_pause_media():
    """Toggle the OS-level play/pause media key."""
    keyboard.send('play/pause media')
    return "Toggled play/pause"
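# The template below implements a lightweight, prompt-based form of function
# calling: the model is shown JSON tool signatures inside <tools> tags and is
# asked to answer with a <tool_call>{...}</tool_call> block whenever a tool
# applies. process_command() then parses that block and dispatches to the
# matching Python function defined above.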
function_calling_template = """
system

You are a virtual assistant AI model. You have to call a function depending on the user's request; there may also be normal conversation requests, and if no function is provided for a specific problem, do not use a tool call in the conversation. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call, return a JSON object with the function name and arguments within <tool_call></tool_call> XML tags, as follows:

<tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</tool_call>

Here are the available tools:
<tools>
[
    {
        "name": "Can_you_fix_error",
        "description": "Fix the error currently shown on screen",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "play_pause_media",
        "description": "Play or pause the current music",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "play_music_from_youtube",
        "description": "Play music from YouTube based on a search query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to find the music on YouTube"
                }
            },
            "required": ["query"]
        }
    }
]
</tools>
"""

def listen_for_trigger_word(trigger_word="hello"):
    """Block until the trigger word is heard on the microphone."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        while True:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_google(audio).lower()
                if trigger_word in text:
                    return
            except sr.UnknownValueError:
                pass

def speak(text):
    """Fallback TTS via pyttsx3 (faster than SpeechT5 for long responses)."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def process_command(command, chat_box):
    """Send the spoken command to the model and dispatch any tool call it returns."""
    messages = [
        {
            "role": "system",
            "content": function_calling_template
        },
        {
            "role": "user",
            "content": command
        }
    ]
    chat_completion = client.chat.completions.create(
        messages=messages,
        model="llama3-8b-8192",
    )
    response = chat_completion.choices[0].message.content

    if "<tool_call>" in response:
        function_call = response.split("<tool_call>")[1].split("</tool_call>")[0]
        function_call_json = json.loads(function_call)
        function_name = function_call_json['name']
        if function_name == "Can_you_fix_error":
            return Can_you_fix_error(chat_box)
        elif function_name == "play_music_from_youtube":
            query = function_call_json['arguments']['query']
            return play_music_from_youtube(query)
        elif function_name == "play_pause_media":
            return play_pause_media()

    return response
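# Example round trip: process_command("play lofi beats from youtube", chat_box)
# should make the model reply with something like
#   <tool_call>{"name": "play_music_from_youtube", "arguments": {"query": "lofi beats"}}</tool_call>
# which the parser above routes to play_music_from_youtube("lofi beats").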
def start_bot(chat_box):
    """Main loop: wait for the trigger word, then listen for and handle one command."""
    trigger_word = "hello"

    while True:
        listen_for_trigger_word(trigger_word)
        st = "Hey, how can I help you?"
        synthesize(st)
        chat_box.config(state=tk.NORMAL)
        chat_box.insert(tk.END, f"Bot: {st}\n")
        chat_box.config(state=tk.DISABLED)

        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            chat_box.config(state=tk.NORMAL)
            chat_box.insert(tk.END, "Listening for command...\n")
            chat_box.config(state=tk.DISABLED)
            audio = recognizer.listen(source)
            try:
                command = recognizer.recognize_google(audio).lower()
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, f"You: {command}\n")
                response = process_command(command, chat_box)
                if "```" in response:
                    # Render fenced code blocks with the "code" text tag
                    chat_box.insert(tk.END, "Bot: ")
                    parts = response.split("```")
                    for i, part in enumerate(parts):
                        if i % 2 == 0:
                            chat_box.insert(tk.END, part)
                        else:
                            chat_box.insert(tk.END, part, "code")
                    chat_box.insert(tk.END, "\n")
                else:
                    chat_box.insert(tk.END, f"Bot: {response}\n")
                chat_box.config(state=tk.DISABLED)
                # Long answers go to the faster pyttsx3 engine; short ones to SpeechT5
                if len(response) > 600:
                    speak(response)
                else:
                    synthesize(response)
            except sr.UnknownValueError:
                synthesize("Sorry, I didn't catch that.")
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, "Bot: Sorry, I didn't catch that.\n")
                chat_box.config(state=tk.DISABLED)
            except Exception as e:
                synthesize("An unexpected error occurred. Please try again.")
                chat_box.config(state=tk.NORMAL)
                chat_box.insert(tk.END, f"Bot: An unexpected error occurred. Please try again. Error: {e}\n")
                chat_box.config(state=tk.DISABLED)
                break


def on_start_button_click(chat_box):
    """Run the bot loop on a worker thread so the GUI stays responsive."""
    bot_thread = Thread(target=start_bot, args=(chat_box,), daemon=True)
    bot_thread.start()

def create_gui():
    root = tk.Tk()
    root.title("Voice Assistant")

    chat_box = ScrolledText(root, wrap=tk.WORD, state='disabled')
    chat_box.tag_config("code", font=("Courier", 10))  # style for code blocks in responses
    chat_box.pack(padx=10, pady=10, fill=tk.BOTH, expand=True)

    start_button = tk.Button(root, text="Start Bot", command=lambda: on_start_button_click(chat_box))
    start_button.pack(pady=10)

    root.mainloop()

if __name__ == "__main__":
    create_gui()
--------------------------------------------------------------------------------

/preview.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kiritoInd/Personal-Voice-Assistant-Using-LLM-FunctionCalling/3f83fbb15bc0530176855cdb7a9bcd1471b73b2d/preview.mp4
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
speechrecognition
pyttsx3
groq
Pillow
opencv-python-headless
pytesseract
datasets
torch
transformers
python-dotenv
soundfile
sounddevice
requests
beautifulsoup4
keyboard
--------------------------------------------------------------------------------