├── requirements.txt
├── LICENSE
├── README.md
├── livewhisper.py
├── mediactl.py
└── assistant.py

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
openai-whisper
numpy
sounddevice
scipy

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Nikolaus Stromberg

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# LiveWhisper - Whisper based transcription

`livewhisper.py` outputs pseudo-live, sentence-by-sentence dictation to the terminal,
using [OpenAI's Whisper](https://github.com/openai/whisper) model and the sounddevice library to listen to the microphone.
Audio from the mic is buffered while it stays above a volume & frequency threshold;
once silence is detected, the audio is saved to a temp file and sent to Whisper.

*Dependencies:* Whisper, numpy, scipy, sounddevice

LiveWhisper can work, to some extent, as an alternative to [SpeechRecognition](https://github.com/Uberi/speech_recognition).
Although that now has its own Whisper support, so it's up to you. ;)
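
For example, here is a minimal sketch of driving the transcriber from another script.
The `MyApp` class is illustrative, but `running`, `talking`, and `analyze` are the
attributes `StreamHandler` actually expects:

```python
from livewhisper import StreamHandler

class MyApp:
    running = True    # StreamHandler keeps listening while this stays True
    talking = False   # set True while your app speaks, so it won't hear itself
    def analyze(self, text):  # called with each completed transcription
        print("Heard:", text)

StreamHandler(MyApp()).listen()  # or StreamHandler().listen() for plain dictation; blocks until Ctrl+C
```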

---

## Whisper Assistant

I've also included `assistant.py`, which, using livewhisper as a base, is my
attempt at making a simple voice-command assistant like Siri, Alexa, or Jarvis.

Same dependencies as livewhisper, as well as requests, pyttsx3, wikipedia, and bs4.
*Also needs:* espeak and python3-espeak.

The voice assistant can be activated by saying its name, "computer" by default;
"hey computer" or "okay computer" also work. You can then wait for the computer to
respond, or immediately make a request or ask a question without pausing.
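
The wake word is simply the `AIname` constant near the top of `assistant.py`,
so renaming the assistant is a one-line edit, for example:

```python
AIname = "jarvis"  # "hey jarvis" and "okay jarvis" will then work too
```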

Available features: weather, date & time, telling jokes, and Wikipedia searches.
It can also handle some other requests, like basic math or really simple trivia,
though that relies on Google's instant-answer snippets & sometimes doesn't work.

Control media players using: play, pause, next, previous, stop, what's playing?
Media controls need some form of noise/echo cancellation enabled to work right
(typically PulseAudio's echo-cancel module).
See [this page](https://www.linuxuprising.com/2020/09/how-to-enable-echo-noise-cancellation.html) for more information on how to enable that in Linux PulseAudio.

You can close the assistant via `ctrl+c`, or by saying its name & "terminate".

---

If you like my projects and want to help me keep making more,
please consider donating on [my Ko-fi page](https://ko-fi.com/nik85)! Thanks!

[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/F1F4GRRWB)

--------------------------------------------------------------------------------
/livewhisper.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import whisper, os
import numpy as np
import sounddevice as sd
from scipy.io.wavfile import write

# This is my attempt to make pseudo-live transcription of speech using Whisper.
# Since my system can't use pyaudio, I'm using sounddevice instead.
# This terminal implementation can run standalone, or be imported by assistant.py.
# by Nik Stromberg - nikorasu85@gmail.com - MIT 2022 - copilot

Model = 'small'     # Whisper model size (tiny, base, small, medium, large)
English = True      # Use English-only model?
Translate = False   # Translate non-English to English?
SampleRate = 44100  # Stream device recording frequency
BlockSize = 30      # Block size in milliseconds
Threshold = 0.1     # Minimum volume threshold to activate listening
Vocals = [50, 1000] # Frequency range to detect sounds that could be speech
EndBlocks = 40      # Number of blocks to wait before sending to Whisper

class StreamHandler:
    def __init__(self, assist=None):
        if assist is None:  # If not being run by my assistant, just run as terminal transcriber.
            class fakeAsst(): running, talking, analyze = True, False, None
            self.asst = fakeAsst()  # anyone know a better way to do this?
        else: self.asst = assist
        self.running = True
        self.padding = 0
        self.prevblock = self.buffer = np.zeros((0,1))
        self.fileready = False
        print("\033[96mLoading Whisper Model..\033[0m", end='', flush=True)
        self.model = whisper.load_model(f'{Model}{".en" if English else ""}')
        print("\033[90m Done.\033[0m")

    def callback(self, indata, frames, time, status):
        #if status: print(status) # for debugging, prints stream errors.
        if not any(indata):
            print('\033[31m.\033[0m', end='', flush=True)  # if no input, prints red dots
            #print("\033[31mNo input or device is muted.\033[0m") # old way
            #self.running = False  # used to terminate if no input
            return
        # A few alternative methods exist for detecting speech.. #indata.max() > Threshold
        #zero_crossing_rate = np.sum(np.abs(np.diff(np.sign(indata)))) / (2 * indata.shape[0]) # threshold 20
        freq = np.argmax(np.abs(np.fft.rfft(indata[:, 0]))) * SampleRate / frames  # dominant frequency of this block
        if np.sqrt(np.mean(indata**2)) > Threshold and Vocals[0] <= freq <= Vocals[1] and not self.asst.talking:
            print('.', end='', flush=True)
            if self.padding < 1: self.buffer = self.prevblock.copy()
            self.buffer = np.concatenate((self.buffer, indata))
            self.padding = EndBlocks
        else:
            self.padding -= 1
            if self.padding > 1:
                self.buffer = np.concatenate((self.buffer, indata))
            elif self.padding < 1 < self.buffer.shape[0] > SampleRate:  # if enough silence has passed, write to file.
                self.fileready = True
                write('dictate.wav', SampleRate, self.buffer)  # I'd rather send data to Whisper directly..
                self.buffer = np.zeros((0,1))
            elif self.padding < 1 < self.buffer.shape[0] < SampleRate:  # if recording not long enough, reset buffer.
                self.buffer = np.zeros((0,1))
                print("\033[2K\033[0G", end='', flush=True)
            else:
                self.prevblock = indata.copy()  #np.concatenate((self.prevblock[-int(SampleRate/10):], indata)) # SLOW

    def process(self):
        if self.fileready:
            print("\n\033[90mTranscribing..\033[0m")
            result = self.model.transcribe('dictate.wav', fp16=False, language='en' if English else '', task='translate' if Translate else 'transcribe')
            print(f"\033[1A\033[2K\033[0G{result['text']}")
            if self.asst.analyze is not None: self.asst.analyze(result['text'])
            self.fileready = False

    def listen(self):
        print("\033[32mListening.. \033[37m(Ctrl+C to Quit)\033[0m")
        with sd.InputStream(channels=1, callback=self.callback, blocksize=int(SampleRate * BlockSize / 1000), samplerate=SampleRate):
            while self.running and self.asst.running: self.process()

def main():
    try:
        handler = StreamHandler()
        handler.listen()
    except (KeyboardInterrupt, SystemExit): pass
    finally:
        print("\n\033[93mQuitting..\033[0m")
        if os.path.exists('dictate.wav'): os.remove('dictate.wav')

if __name__ == '__main__':
    main()  # by Nik
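
# Note: the "I'd rather send data to Whisper directly" idea above should be doable,
# since whisper's transcribe() also accepts a float32 numpy array (mono audio,
# expected at 16 kHz). A rough, untested sketch, assuming scipy resamples the buffer:
#
#   from scipy.signal import resample
#   audio = resample(self.buffer[:, 0], int(self.buffer.shape[0] * 16000 / SampleRate))
#   result = self.model.transcribe(audio.astype(np.float32), fp16=False)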

--------------------------------------------------------------------------------
/mediactl.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

# The following are basic functions for controlling available media players on a Linux system, using dbus.
# Intended for use with only one media player running, though it works with multiple, just without separate controls.
# If you get a dbus error, try setting include-system-site-packages = true in the virtual environment's pyvenv.cfg file.
# by Nik Stromberg - nikorasu85@gmail.com - MIT 2022 - copilot

from dbus import SessionBus, Interface

bus = SessionBus()

def _playerlist() -> list:
    """Returns a list of all available media player services, for mediactl functions."""
    return [service for service in bus.list_names() if service.startswith('org.mpris.MediaPlayer2.')]

def playpause() -> int:
    """Toggles play/pause for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            player.PlayPause(dbus_interface='org.mpris.MediaPlayer2.Player')
        except Exception:
            worked -= 1
    return worked

def next() -> int:  # note: shadows Python's builtin next() within this module
    """Goes to the next track for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            player.Next(dbus_interface='org.mpris.MediaPlayer2.Player')
        except Exception:
            worked -= 1
    return worked

def prev() -> int:
    """Goes to the previous track for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            player.Previous(dbus_interface='org.mpris.MediaPlayer2.Player')
        except Exception:
            worked -= 1
    return worked

def stop() -> int:
    """Stops playback for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            player.Stop(dbus_interface='org.mpris.MediaPlayer2.Player')
        except Exception:
            worked -= 1
    return worked

def volumeup() -> int:
    """Increases volume for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            properties = Interface(player, dbus_interface='org.freedesktop.DBus.Properties')
            volume = properties.Get('org.mpris.MediaPlayer2.Player', 'Volume')
            properties.Set('org.mpris.MediaPlayer2.Player', 'Volume', volume+0.2)  # players typically clamp Volume to their valid range
        except Exception:
            worked -= 1
    return worked

def volumedown() -> int:
    """Decreases volume for all available media players, returns how many succeeded."""
    players = _playerlist()
    worked = len(players)
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            properties = Interface(player, dbus_interface='org.freedesktop.DBus.Properties')
            volume = properties.Get('org.mpris.MediaPlayer2.Player', 'Volume')
            properties.Set('org.mpris.MediaPlayer2.Player', 'Volume', volume-0.2)
        except Exception:
            worked -= 1
    return worked

def status() -> list:
    """Returns a list of dicts containing title, artist, & status for each media player."""
    players = _playerlist()
    details = []
    for player in players:
        try:
            player = bus.get_object(player, '/org/mpris/MediaPlayer2')
            properties = Interface(player, dbus_interface='org.freedesktop.DBus.Properties')
            metadata = properties.Get('org.mpris.MediaPlayer2.Player', 'Metadata')
            Title = metadata['xesam:title'] if 'xesam:title' in metadata else 'Unknown'
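
# Example usage as a library (a minimal sketch, using only the functions above):
#
#   import mediactl
#   mediactl.playpause()  # toggles every player found on the session bus
#   for track in mediactl.status():
#       print(f"{track['status']}: {track['title']} by {track['artist']}")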
            Artist = metadata['xesam:artist'][0] if 'xesam:artist' in metadata else 'Unknown'
            PlayStatus = properties.Get('org.mpris.MediaPlayer2.Player', 'PlaybackStatus')
            details.append({'status': str(PlayStatus), 'title': str(Title), 'artist': str(Artist)})
        except Exception:
            pass
    return details

#if __name__ == '__main__': # If I decide to make this a standalone media controller later.

--------------------------------------------------------------------------------
/assistant.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
from livewhisper import StreamHandler
from bs4 import BeautifulSoup
from subprocess import call
import wikipedia, requests
import pyttsx3, mediactl
import time, os, re
#import webbrowser #wip might use later

# My simple AI assistant using my LiveWhisper as a base. Can perform simple tasks such as:
# searching wikipedia, telling the date/time/weather/jokes, basic math and trivia, and more.
# ToDo: dictation to xed or similar, dynamically open requested sites/apps, or find a simpler way.
# by Nik Stromberg - nikorasu85@gmail.com - MIT 2022 - copilot

AIname = "computer" # Name to call the assistant, such as "computer" or "jarvis". Activates further commands.
City = ''           # Default city for weather, Google uses + for spaces. (uses IP location if not specified)

# possibly redundant settings, but keeping them for easy debugging, for now.
Model = 'small'     # Whisper model size (tiny, base, small, medium, large)
English = True      # Use English-only model?
Translate = False   # Translate non-English to English?
SampleRate = 44100  # Stream device recording frequency
BlockSize = 30      # Block size in milliseconds
Threshold = 0.1     # Minimum volume threshold to activate listening
Vocals = [50, 1000] # Frequency range to detect sounds that could be speech
EndBlocks = 40      # Number of blocks to wait before sending to Whisper

class Assistant:
    def __init__(self):
        self.running = True
        self.talking = False
        self.prompted = False
        self.espeak = pyttsx3.init()
        self.espeak.setProperty('rate', 180)  # speed of speech, 175 is terminal default, 200 is pyttsx3 default
        self.askwiki = False
        self.weatherSave = ['', 0]
        self.ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0'

    def analyze(self, text):  # This is the decision tree for the assistant
        string = "".join(ch for ch in text if ch not in ",.?!'").lower()  # Removes punctuation Whisper adds
        query = string.split()  # Split into words
        if query in ([AIname], ["hey",AIname], ["okay",AIname], ["ok",AIname]):  # if that's all they said, prompt for more input
            self.speak('Yes?')
            self.prompted = True
        if queried := self.prompted or string[1:].startswith((AIname,"hey "+AIname,"okay "+AIname,"ok "+AIname)):  # [1:] skips Whisper's usual leading space #AIname in query
            query = [word for word in query if word not in {"hey","okay","ok",AIname}]  # remake query without AIname prompts
        if self.askwiki or (queried and ("wikipedia" in query or "wiki" in query)):
            wikiwords = {"okay","hey",AIname,"please","could","would","do","a","check","i","need","wikipedia",
                         "search","for","on","what","whats","who","whos","is","was","an","does","say","can",
                         "you","tell","give","get","me","results","info","information","about","something","ok"}
            query = [word for word in query if word not in wikiwords]  # remake query without wikiwords
            if query == [] and not self.askwiki:  # if query is empty after removing wikiwords, ask user for search term
                self.speak("What would you like to know about?")
                self.askwiki = True
            elif query == [] and self.askwiki:  # if query is still empty, cancel search
                self.speak("No search term given, canceling.")
                self.askwiki = False
            else:
                self.speak(self.getwiki(" ".join(query)))  # search wikipedia for query
                self.askwiki = False
            self.prompted = False
        elif queried and re.search(r"(song|title|track|name|playing)+", ' '.join(query)):
            self.speak(mediactl.status()[0]['title'])
            self.prompted = False
        elif queried and re.search(r"(play|pause|unpause|resume)+", ' '.join(query)):
            mediactl.playpause()
            self.prompted = False
        elif queried and "stop" in query:
            #self.espeak.stop() #could check .isBusy()
            mediactl.stop()
            self.prompted = False
        elif queried and ("next" in query or "forward" in query or "skip" in query):
            mediactl.next()
            self.prompted = False
        elif queried and ("previous" in query or "back" in query or "last" in query):
            mediactl.prev()
            self.prompted = False
        elif queried and re.search(r"^(volume (up|louder)|(louder|more) (music|volume)|turn (it|the (music|volume|sound)) up( more)?|turn up the (music|volume|sound)|(increase|raise) the (volume|sound))( more)?$", ' '.join(query)):
            mediactl.volumeup()
            self.prompted = False
        elif queried and re.search(r"^(volume (down|lower)|(lower|less) (music|volume)|turn (it|the (music|volume|sound)) down( more)?|turn down the (music|volume|sound)|(decrease|lower) the (volume|sound))( more)?$", ' '.join(query)):
            mediactl.volumedown()
            self.prompted = False
        elif queried and "weather" in query:  # get weather for preset {City}. ToDo: allow user to specify city in prompt
            self.speak(self.getweather())
            self.prompted = False
        elif queried and "time" in query:
            self.speak(time.strftime("The time is %-I:%M %p."))
            self.prompted = False
        elif queried and "date" in query:
            self.speak(time.strftime(f"Today's date is %B {self.orday()} %Y."))
            self.prompted = False
        elif queried and ("day" in query or "today" in query):  # and ("what" in query or "whats" in query): # might need this in a few places
            self.speak(time.strftime(f"It's %A the {self.orday()}."))
            self.prompted = False
        elif queried and ("joke" in query or "jokes" in query or "funny" in query):
            try:
                joke = requests.get('https://icanhazdadjoke.com', headers={'Accept':'text/plain','User-Agent':self.ua}).text
            except requests.exceptions.ConnectionError:
                joke = "I can't think of any jokes right now. Connection Error."
            self.speak(joke)
            self.prompted = False
        elif queried and "terminate" in query:  # still deciding on best phrase to close the assistant
            self.running = False
            self.speak("Closing Assistant.")
        elif queried and len(query) > 2:  # tries to detect anything else, but if user mistakenly said prompt word, ignores
            self.speak(self.getother('+'.join(query)))
            self.prompted = False
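
    # Example walkthrough of analyze(), for a hypothetical utterance:
    #   " Hey computer, what's the weather?" -> string " hey computer whats the weather"
    #   -> query ["hey","computer","whats","the","weather"], queried is True
    #   -> prompt words removed -> ["whats","the","weather"] -> the "weather" branch runs.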

    def speak(self, text):
        self.talking = True  # if I wanna add stop ability, I think this function needs to be its own object
        print(f"\n\033[92m{text}\033[0m\n")
        self.espeak.say(text)  #call(['espeak',text]) #'-v','en-us' #without pyttsx3
        self.espeak.runAndWait()
        self.talking = False

    def getweather(self) -> str:
        curTime = time.time()
        if curTime - self.weatherSave[1] > 300 or self.weatherSave[1] == 0:  # if last weather request was over 5 minutes ago
            try:
                html = requests.get("https://www.google.com/search?q=weather"+City, headers={'User-Agent':self.ua}).content
                soup = BeautifulSoup(html, 'html.parser')
                loc = soup.find("span", attrs={"class":"BNeawe tAd8D AP7Wnd"}).text.split(',')[0]
                skyc = soup.find('div', attrs={'class':'BNeawe tAd8D AP7Wnd'}).text.split('\n')[1]
                temp = soup.find('div', attrs={'class':'BNeawe iBp4i AP7Wnd'}).text
                temp += 'ahrenheit' if temp[-1] == 'F' else 'elsius'  # expands the trailing F or C
                self.weatherSave[0] = f'Current weather in {loc} is {skyc}, with a temperature of {temp}.'
                #weather = requests.get(f'http://wttr.in/{City}?format=%C+with+a+temperature+of+%t') #alternative weather API
                #self.weatherSave[0] = f"Current weather in {City} is {weather.text.replace('+','')}."
                self.weatherSave[1] = curTime
            except requests.exceptions.ConnectionError:
                return "I couldn't connect to the weather service."
        return self.weatherSave[0]

    def getwiki(self, text) -> str:
        try:
            wikisum = wikipedia.summary(text, sentences=2, auto_suggest=False)
            wikipage = wikipedia.page(text, auto_suggest=False)  #auto_suggest=False prevents random results
            try:
                call(['notify-send','Wikipedia',wikipage.url])  #with plyer: notification.notify('Wikipedia',wikipage.url,'Assistant')
            finally:
                return 'According to Wikipedia:\n'+wikisum  #self.speak(wikisum)
        except (wikipedia.exceptions.PageError, wikipedia.exceptions.WikipediaException):
            return "I couldn't find that right now, maybe phrase it differently?"

    def getother(self, text) -> str:  # scrapes Google's instant-answer snippet, for basic math/trivia/etc.
        try:
            html = requests.get("https://www.google.com/search?q="+text, headers={'User-Agent':self.ua}).content
            soup = BeautifulSoup(html, 'html.parser')
            return soup.find('div', attrs={'class':'BNeawe iBp4i AP7Wnd'}).text
        except Exception:
            return "Sorry, I'm afraid I can't do that."

    def orday(self) -> str:  # Returns day of the month with ordinal suffix: 1st, 2nd, 3rd, 4th, etc.
        day = time.strftime("%-d")
        return day+'th' if int(day) in [11,12,13] else day+{1:'st',2:'nd',3:'rd'}.get(int(day)%10,'th')
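
    # For example, orday()'s suffix rule gives: 1 -> "1st", 2 -> "2nd", 3 -> "3rd",
    # 4 -> "4th", 11/12/13 -> "11th"/"12th"/"13th", 21 -> "21st", and 23 -> "23rd".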

def main():
    try:
        AIstant = Assistant()  #voice object before this?
        handler = StreamHandler(AIstant)
        handler.listen()
    except (KeyboardInterrupt, SystemExit): pass
    finally:
        print("\n\033[93mQuitting..\033[0m")
        if os.path.exists('dictate.wav'): os.remove('dictate.wav')

if __name__ == '__main__':
    main()  # by Nik

--------------------------------------------------------------------------------