├── README.org ├── formatted-output.png ├── setup.py └── src └── voicenotes2org.py /README.org: -------------------------------------------------------------------------------- 1 | #+TITLE: voicenotes2org 2 | 3 | [[./formatted-output.png]] 4 | 5 | =voicenotes2org= is a Python script which collects WAV files in a given directory, sends them to Google Cloud Platform (GCP) for transcription, and then formats the resulting transcripts into a combined org file, including links back to the original audio. 6 | 7 | Each note becomes a heading in the org file, and includes: 8 | 1. The date and time of the note 9 | 2. An org-link of type =voicenote=, which, when followed, plays the original audio file in EMMS 10 | 3. Google's transcript of the note, broken down into 10-second segments. Each segment begins with a =voicenote= link which will play the original audio file /at the time offset which corresponds to that segment/. 11 | 12 | * Prerequisites 13 | 14 | This uses Google's Cloud Speech-to-Text API, and *you will need your own GCP account*. Make sure you have a service account JSON. 15 | 16 | Other than that, you'll need =python3= and =ffmpeg= on your system. Only tested on Arch Linux. 17 | 18 | * Installation 19 | 20 | Clone this repo, then install like so: 21 | 22 | #+begin_src sh 23 | git clone https://github.com/bgutter/voicenotes2org 24 | cd voicenotes2org 25 | pip install . # optionally with sudo, depending on your system 26 | #+end_src 27 | 28 | It's also on PyPI as =voicenotes2org=, but not usually up to date there. 29 | 30 | #+BEGIN_SRC sh 31 | sudo pip install voicenotes2org 32 | #+END_SRC 33 | 34 | * Basic Usage 35 | 36 | Transcription jobs can be defined on the command line, or in a config file. 37 | 38 | CLI Example: 39 | 40 | #+BEGIN_SRC bash 41 | > voicenotes2org --voice_notes_dir=~/new-voice-notes/ --archive_dir=~/org/archived-voice-notes/ --org_transcript_file=~/org/unfiled-voice-notes.org 42 | #+END_SRC 43 | 44 | ...or... 
45 | 46 | Config File: 47 | 48 | #+BEGIN_SRC bash 49 | > cat ~/.config/voicenotes2org/default.toml 50 | voice_notes_dir="~/new-voice-notes/" 51 | archive_dir="~/org/archived-voice-notes/" 52 | org_transcript_file="~/org/unfiled-voice-notes.org" 53 | 54 | > voicenotes2org 55 | #+END_SRC 56 | 57 | Note that, in the config file, all relative paths will be interpreted as relative to the config file. For example, "filename_regex.txt" in =~/.config/voicenotes2org/default.toml= will be treated as =~/.config/voicenotes2org/filename_regex.txt=. 58 | 59 | In both cases, the script will find every WAV file in =~/new-voice-notes/= and transcribe it. After transcription, each file will be moved to =~/org/archived-voice-notes/=. If =~/org/unfiled-voice-notes.org= does not exist, it will be created with an eval header statement which defines the =voicenote= link type. If the file already exists, voicenotes2org will only append content, leaving existing content unmodified. There will be one new heading for each WAV file transcribed. 60 | 61 | *Optional Arguments* 62 | | Option | Meaning | 63 | |-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 64 | | =--gcp_credentials_path= | Path to JSON file. If provided, use this to access the Google Speech-to-Text API. If missing, you must have configured the GOOGLE_APPLICATION_CREDENTIALS environment variable! | 65 | | =--voicenote_filename_regex_path= | Path to a text file containing a Python regex which will be used to match and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm are local date/time (or, whatever you want, really), 12-hour clock. ampm should be either literally am or pm. 
*This is an unsanitized input.* Be smart. | 66 | | =--max_concurrent_requests= | Maximum number of concurrent transcription requests. | 67 | | =--verbose= | Boolean. Default false. If true, print the name of each WAV file as it is transcribed. | 68 | | =--just_copy= | Boolean. Default false. If true, don't remove audio from the original folder. | 69 | 70 | If you prefer to avoid eval statements in your file headers, you may instead include this somewhere in your init code: 71 | 72 | #+BEGIN_SRC emacs-lisp 73 | (org-link-set-parameters "voicenote" 74 | :follow (lambda (content) 75 | (cl-multiple-value-bind (file seconds) 76 | (split-string content ":") 77 | (emms-play-file file) 78 | (sit-for 0.5) 79 | (emms-seek-to (string-to-number seconds))))) 80 | #+END_SRC 81 | 82 | * Example Output 83 | 84 | Formatted Output: 85 | 86 | [[./formatted-output.png]] 87 | 88 | Plain Text: 89 | 90 | #+BEGIN_SRC text 91 | # -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to (string-to-number seconds))))) -*- 92 | #+TITLE: Unfiled Voice Notes 93 | 94 | C-c C-o on any link to play clip starting from that offset. 
95 | 96 | * New Voice Note 97 | [2020-01-01 Wed 00:52] 98 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][Archived Clip]] 99 | 100 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][00:00]] this is a second voice note I am talking into a phone right now roses are red violets are blue 101 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:10][00:10]] blah blah blah 102 | 103 | 104 | * New Voice Note 105 | [2020-01-01 Wed 00:52] 106 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][Archived Clip]] 107 | 108 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][00:00]] this is a voice note for testing this is the first one that I will do I'm going to talk about nothing 109 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:10][00:10]] because I don't know what else to say 110 | 111 | 112 | * New Voice Note 113 | [2020-01-01 Wed 00:53] 114 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][Archived Clip]] 115 | 116 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][00:00]] Mona Lisa lost her smile the painters hands are trembling now and if she's out 117 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:10][00:10]] there running wild it's just because I taught her how the Masterpiece that we had planned is laying shattered 118 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:20][00:20]] on the ground Mona Lisa lost her smile and the painters hands are trembling now and the eyes that used to burn for 119 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:30][00:30]] me now they no longer look my way and the love that used to be why it just got lost in yesterday 120 | [[voicenote:~/org/archived-voice-notes/My 
recording 2020-01-01 12-53 AM 144.wav:40][00:40]] and if she seems cold to the touch well there used to be burn a flame I gave to a little took 121 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:50][00:50]] too much til I erased the painter's name ... too much till I erased the painter's name 122 | #+END_SRC 123 | 124 | * WAV file naming rules 125 | 126 | Unless you define your own regex file, WAV files must be named according to the following pattern: 127 | 128 | .* YYYY-MM-DD H-MM AM|PM .*.wav 129 | 130 | Where: 131 | - =YYYY= is the year. 132 | - =MM= is zero-padded month. 133 | - =DD= is zero-padded day. 134 | - =H= is unpadded (sorry) hour in 12-hour format. 135 | - =MM= is zero-padded minute. 136 | - =AM|PM= is literally just "AM" or "PM". 137 | - Everything is whitespace delimited. 138 | 139 | * 🚨 Limitations 🚨 140 | 141 | Many corners have been cut in the making of this script. If literally anyone else ever uses this code, these issues might be worth fixing some day. 142 | 143 | ** Only WAV files are supported 144 | 145 | Wouldn't be hard to figure out the file format, but Google's transcription API requires non-WAV formats specify things like sample rate and encoding. I did not need this. 146 | 147 | ** Ugliness caused by avoiding Google Cloud Storage 148 | 149 | Google caps the duration of audio which has been inlined into the transcription request at 1 minute. Anything longer than that, and you need to configure a Google Cloud Storage bucket. I didn't want to, so I split each voice note into 55-second chunks with a 5-second overlap. 150 | 151 | For example, a 3 minute long voice note is actually transcribed in 4 separate chunks: 152 | 1. 0:00 to 0:55 -- 55 seconds 153 | 2. 0:50 to 1:45 -- 55 seconds, first 5 overlap 154 | 3. 1:40 to 2:35 -- 55 seconds, first 5 overlap 155 | 4. 2:30 to 3:00 -- 30 seconds, first 5 overlap 156 | 157 | To reduce (or, maybe produce) confusion, I insert an ellipsis (...) 
into the transcription wherever we're about to start inserting overlapped content. For example: 158 | 159 | #+BEGIN_SRC 160 | and we went to the store for some ... the store for some candy to bring with us 161 | #+END_SRC 162 | 163 | This is ugly and lazy and later versions might improve this. 164 | 165 | * Example Workflow 166 | 167 | This is how I integrate my voice recordings into org-mode. 168 | 169 | *Convenient Voice Recording* 170 | 171 | I record voice notes on my Android device using "Easy Voice Recorder". I use this app specifically because it provides a system shortcut to toggle recording. The first invocation of this shortcut begins recording, and the second stops recording, saving the audio to a new WAV file. A third invocation would start recording again, but with another new file. 172 | 173 | This app also lets you specify how audio files should be named, which makes it easy to encode date and time. 174 | 175 | Most importantly, I use the "Button Mapper" app to *bind a long-press of the volume-up key to this shortcut*. This works even when the screen is off. 176 | 177 | With this setup, ideas, tasks, and notes can be recorded instantly and effortlessly. Just long-press the volume-up key, say whatever needs to be said, and long-press again to complete the file. No unlocking the phone, and no interacting with the touchscreen. 178 | 179 | Alternatively, if you don't mind carrying a second device, a dedicated voice recorder would work at least as well. 181 | *Syncing The Audio Files* 182 | 183 | I use Syncthing to sync the voice notes directory on my Android device to a directory on my PC. This is probably the easiest way to achieve near-realtime syncing, and Syncthing is FOSS! 184 | 185 | Alternatively, you can manually copy the files every evening over USB, or SSH, or Google Drive, or...well, you get the idea. 
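The chunk boundaries described under =🚨 Limitations 🚨= are easy to compute. This is a standalone sketch of the boundary math only (the script itself slices =pydub= audio segments, not bare offsets): 55-second windows, each starting 5 seconds before the previous one ends.

```python
def chunk_offsets(duration_ms, segment_ms=55_000, overlap_ms=5_000):
    """Yield (start_ms, end_ms) windows covering duration_ms,
    each overlapping the previous window by overlap_ms."""
    offset = 0
    while offset < duration_ms:
        yield offset, min(offset + segment_ms, duration_ms)
        offset += segment_ms - overlap_ms

# A 3-minute note (180 000 ms) yields the four chunks from the Limitations example:
print(list(chunk_offsets(180_000)))
# [(0, 55000), (50000, 105000), (100000, 155000), (150000, 180000)]
```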
186 | 187 | *Transcription* 188 | 189 | In my org directory structure, I have a file dedicated to receiving transcribed, but not yet properly filed, voice notes. Let's say that this is at =~/org/unfiled-voice-notes.org=. Let's also assume that my untranscribed voice notes are synced -- by Syncthing -- to =~/new-voice-notes/=. 190 | 191 | If I run the example command under the =Basic Usage= heading, then absent any errors, =~/new-voice-notes/= will be cleared out. This frees up space on the phone, though it otherwise isn't all that important. What is important is that, for each processed audio file, a new heading will be appended to =~/org/unfiled-voice-notes.org=. The audio file will now live in =~/org/archived-voice-notes/=, and any file links in the org entries will point to this location. Because the links are absolute, the headings can be moved around wherever you'd like and will not break. 192 | 193 | *Filing* 194 | 195 | Once =voicenotes2org= has returned, you should open =~/org/unfiled-voice-notes.org= in Emacs, then use =org-refile= to pop each entry into a more proper location in your org directory structure. Make sure you've configured =org-refile-targets= first! 
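For reference, the default filename pattern can be exercised directly from Python. The regex below mirrors =DEFAULT_FNAME_PARSER= in =src/voicenotes2org.py= (group names follow the documented year/month/day/hour/minute/ampm requirement), applied to a filename taken from the example output above:

```python
import re

# Mirror of the script's default filename parser; a custom regex file passed
# via --voicenote_filename_regex_path must define the same named groups.
pattern = re.compile(
    r"[^0-9]*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s+"
    r"(?P<hour>\d+)-(?P<minute>\d+)\s+(?P<ampm>\S*).*\.wav"
)

m = pattern.match("My recording 2020-01-01 12-52 AM 143.wav")
print(m.groupdict())
# {'year': '2020', 'month': '01', 'day': '01', 'hour': '12', 'minute': '52', 'ampm': 'AM'}
```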
196 | -------------------------------------------------------------------------------- /formatted-output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bgutter/voicenotes2org/3518c927489da0950a7a89dc56d3ec70c18f6f21/formatted-output.png -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | voicenotes2org package config 3 | """ 4 | 5 | from setuptools import setup 6 | 7 | setup( 8 | name='voicenotes2org', 9 | version='1.0.1', 10 | description='Transcribe voice recordings using the Google Cloud Speech-To-Text API, and export the results to Emacs org-mode headings.', 11 | long_description="See https://github.com/bgutter/voicenotes2org for all details!", 12 | url='https://github.com/bgutter/voicenotes2org', 13 | keywords='emacs org org-mode transcribe voice text', 14 | package_dir={'': 'src'}, 15 | py_modules=["voicenotes2org"], 16 | python_requires='>=3.5', 17 | install_requires=['pydub','google-cloud-speech','appdirs','toml'], 18 | entry_points={ 19 | 'console_scripts': [ 20 | 'voicenotes2org = voicenotes2org:main', 21 | ], 22 | }, 23 | ) 24 | -------------------------------------------------------------------------------- /src/voicenotes2org.py: -------------------------------------------------------------------------------- 1 | """ 2 | voicenotes2org.py 3 | 4 | Batch convert a collection of voice notes (WAV files) to an org-mode file. 
5 | """ 6 | 7 | from google.cloud import speech_v1 8 | from google.cloud.speech_v1 import enums 9 | 10 | import pprint 11 | import pydub 12 | import appdirs 13 | import toml 14 | 15 | import argparse 16 | import shutil 17 | import glob 18 | import io 19 | import os 20 | import re 21 | import multiprocessing as mp 22 | from datetime import datetime 23 | 24 | # 25 | # These can be tinkered with to adjust the output file format 26 | # 27 | 28 | TRANSCRIPTION_CHUNK_SIZE = 10 # seconds 29 | SPLICE_STR = "..." 30 | ORG_FILE_HEADER = """# -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to (string-to-number seconds))))) -*- 31 | 32 | C-c C-o on any link to play clip starting from that offset. 33 | 34 | """ 35 | ENTRY_TEMPLATE = """ 36 | * New Voice Note 37 | [{time_part_str}] 38 | [[voicenote:{link_path}:0][Archived Clip]] 39 | 40 | {body} 41 | """ 42 | TRANSCRIPTION_CHUNK_TEMPLATE = "[[voicenote:{filepath}:{abssecond}][{minute}:{relsecond}]] {text}\n" 43 | DEFAULT_FNAME_PARSER = re.compile( r"[^0-9]*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s+(?P<hour>\d+)-(?P<minute>\d+)\s+(?P<ampm>\S*).*\.wav" ) 44 | 45 | def create_api_client( gcp_credentials_path=None ): 46 | """ 47 | Open a connection 48 | """ 49 | if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ: 50 | # Need explicit credentials -- complain if they aren't defined. 51 | if gcp_credentials_path is None: 52 | raise ValueError( "You gotta provide a GCP credentials JSON if it's not set as an environment variable. See https://cloud.google.com/docs/authentication/production." ) 53 | client = speech_v1.SpeechClient.from_service_account_json( gcp_credentials_path ) 54 | 55 | else: 56 | # It should figure things out automatically. 
57 | client = speech_v1.SpeechClient() 58 | return client 59 | 60 | def transcribe_wav( local_file_path, gcp_credentials_path=None, language_code="en-US", client=None ): 61 | """ 62 | Pass in path to local WAV file, get a time-indexed transcription. 63 | 64 | Also pass in the path to your GCP credentials JSON, unless you've configured 65 | the GOOGLE_APPLICATION_CREDENTIALS environment variable. 66 | 67 | Return value is a tuple. The first item is the full transcription. The second is 68 | a list of tuples, where the first value in each tuple is offset, in seconds, 69 | since the beginning of the file, and the second value is a transcribed word. 70 | 71 | IE: 72 | text, timemap = transcribe_wav( "./something.wav" ) 73 | print( text ) # This is a test 74 | print( timemap ) # [ (1, "this"), (1, "is"), (1, "a"), (2, "test") ]. 75 | """ 76 | SEGMENT_SIZE = 55 * 1000 # 55 seconds 77 | OVERLAP_SIZE = 5 * 1000 # 5 seconds 78 | 79 | # 80 | # Instantiate a client 81 | # 82 | if client is None: 83 | client = create_api_client( gcp_credentials_path ) 84 | 85 | # 86 | # Build the request. Because we only support WAV, don't need to define encoding 87 | # or sample rate. 88 | # 89 | config = { 90 | "model": "video", # premium model, but cost is basically nothing for single user anyway. Works MUCH better. 91 | "language_code": language_code, 92 | "enable_word_time_offsets": True, 93 | } 94 | 95 | # 96 | # GCP inline audio is restricted to just one minute. To avoid needing to setup 97 | # a GCP bucket, we'll split any provided audio files into 55-second chunks with 98 | # 5 seconds of overlap (since we'll probably split a word). IE, chunk 1 is from 99 | # 0:00 to 0:55, two is from 0:50 to 1:45, etc... 
100 | # 101 | full_text = "" 102 | time_map = [] 103 | full_recording = pydub.AudioSegment.from_file( local_file_path, format="wav" ) 104 | full_duration_ms = len( full_recording ) 105 | offset = 0 106 | while offset < full_duration_ms: 107 | 108 | # If we're splitting into chunks, insert a hint 109 | if offset > 0: 110 | full_text += " " + SPLICE_STR + " " 111 | time_map.append( ( int( offset / 1000 ), SPLICE_STR ) ) 112 | 113 | # Segment the clip into a RAM file 114 | this_clip = full_recording[ offset : min( offset + SEGMENT_SIZE, full_duration_ms ) ] 115 | segment_wav = io.BytesIO() 116 | this_clip.export( segment_wav, format="wav" ) 117 | segment_wav.seek(0) 118 | audio = { "content": segment_wav.read() } 119 | 120 | # 121 | # Submit the request & wait synchronously 122 | # 123 | operation = client.long_running_recognize( config, audio ) 124 | response = operation.result() 125 | 126 | # 127 | # Process the response. Only take the first alternative. 128 | # 129 | for result in response.results: 130 | if len( result.alternatives ) < 1: 131 | continue 132 | best_guess = result.alternatives[0] 133 | full_text += best_guess.transcript 134 | time_map.extend( [ ( x.start_time.seconds + int( offset / 1000 ), x.word ) for x in best_guess.words ] ) 135 | 136 | # Next offset 137 | offset += ( SEGMENT_SIZE - OVERLAP_SIZE ) 138 | 139 | return ( full_text, time_map ) 140 | 141 | def recording_date_from_full_path( wav_file_path, regex ): 142 | """ 143 | Return a datetime given a filename. 144 | Throws ValueError if the WAV file doesn't match the regex. 145 | """ 146 | # 147 | # Extract date, time, and ID from wav_file_path 148 | # 149 | match = regex.match( os.path.basename( wav_file_path ) ) 150 | if match is None: 151 | raise ValueError( "Name does not match pattern!" 
) 152 | parts = match.groupdict() 153 | 154 | # Convert hours from AMPM to 24 hour, then create a datetime object 155 | dt_args = [ int( parts[p] ) for p in [ "year", "month", "day", "hour", "minute" ] ] 156 | if parts["ampm"].lower() == "am": 157 | # AM: 12->0, 1->1 ... 11->11 158 | if dt_args[-2] == 12: 159 | dt_args[-2] = 0 160 | else: 161 | # PM: 12->12, 1->13, 2->14, ... 11->23 162 | if dt_args[-2] < 12: 163 | dt_args[-2] += 12 164 | return datetime( *dt_args ) 165 | 166 | def path_as_archived( wav_file_path, archive_dir ): 167 | """ 168 | Return the intended path of this wav file after archiving. 169 | """ 170 | return os.path.join( archive_dir, os.path.basename( wav_file_path ) ) 171 | 172 | def format_org_entry( wav_file_path, text, timestamp_map, archive_dir, voicenote_filename_regex ): 173 | """ 174 | Return a string which represents the org-mode heading for this transcription. Includes 175 | links which will play the archived version of the note starting every 10 seconds. 176 | """ 177 | dt = recording_date_from_full_path( wav_file_path, voicenote_filename_regex ) 178 | time_part_str = dt.strftime( "%Y-%m-%d %a %H:%M" ) 179 | 180 | # 181 | # Accumulate words by offset, inserting links & chunks of text every N seconds 182 | # 183 | offset_limit = TRANSCRIPTION_CHUNK_SIZE 184 | words_this_chunk = [] 185 | annotated_transcription = "" 186 | 187 | def append_chunk( running_body, words_this_chunk, offset_limit ): 188 | text = " ".join( words_this_chunk ) 189 | abssecond = offset_limit - TRANSCRIPTION_CHUNK_SIZE 190 | relsecond = abssecond % 60 191 | minute = int( abssecond / 60 ) 192 | running_body += TRANSCRIPTION_CHUNK_TEMPLATE.format( 193 | filepath=path_as_archived( wav_file_path, archive_dir ), 194 | abssecond=abssecond, 195 | minute="{:02d}".format( minute ), 196 | relsecond="{:02d}".format( relsecond ), 197 | text=text ) 198 | words_this_chunk = [ word ] 199 | offset_limit = ( int( word_offset / TRANSCRIPTION_CHUNK_SIZE ) + 1 ) * 
TRANSCRIPTION_CHUNK_SIZE 200 | return running_body, words_this_chunk, offset_limit 201 | 202 | for ( word_offset, word ) in timestamp_map: 203 | if word_offset < offset_limit: 204 | # Keep accumulating words 205 | words_this_chunk.append( word ) 206 | else: 207 | # Finished a chunk -- write it and start the next 208 | annotated_transcription, words_this_chunk, offset_limit = append_chunk( annotated_transcription, words_this_chunk, offset_limit ) 209 | 210 | # Clear out whatever we have, if anything 211 | if len( words_this_chunk ) > 0: 212 | annotated_transcription, words_this_chunk, offset_limit = append_chunk( annotated_transcription, words_this_chunk, offset_limit ) 213 | 214 | # 215 | # Fill in the entry template 216 | # 217 | return ENTRY_TEMPLATE.format( 218 | time_part_str=time_part_str, 219 | link_path=path_as_archived( wav_file_path, archive_dir ), 220 | body=annotated_transcription ) 221 | 222 | def org_transcribe( voice_notes_dir, archive_dir, org_transcript_file, just_copy=False, gcp_credentials_path=None, verbose=False, max_concurrent_requests=5, voicenote_filename_regex=DEFAULT_FNAME_PARSER ): 223 | """ 224 | Root transcription function. Performs the following steps: 225 | """ 226 | 227 | # 228 | # Filter out anything that doesn't match the filename regex 229 | # TODO: We call recording_date_from_full_path() like 3 times for each record (maybe more). 230 | # Might as well just cache it somewhere. 231 | # 232 | all_wavs = glob.glob( os.path.join( voice_notes_dir, "*.wav" ) ) 233 | correctly_named_wavs = [] 234 | for wav in all_wavs: 235 | try: 236 | recording_date_from_full_path( wav, voicenote_filename_regex ) # Just testing for exception 237 | correctly_named_wavs.append( wav ) 238 | except ValueError: 239 | pass 240 | 241 | # 242 | # Don't create more threads than there are files to transcribe. 
243 | # 244 | max_concurrent_requests = min( max_concurrent_requests, len( correctly_named_wavs ) ) 245 | 246 | # 247 | # Get all of the Google transcription results 248 | # 249 | if len( correctly_named_wavs ) > 0: 250 | pool = mp.Pool( max_concurrent_requests, initializer=worker_init_func, initargs=(subprocess_transcribe_function, gcp_credentials_path, verbose) ) 251 | results = [] 252 | for wav_file_path in correctly_named_wavs: 253 | results.append( pool.apply_async( subprocess_transcribe_function, args=( wav_file_path, voicenote_filename_regex ) ) ) 254 | pool.close() 255 | pool.join() 256 | results = [ r.get() for r in results ] 257 | results = [ r for r in results if r is not None ] 258 | else: 259 | results = [] 260 | 261 | # 262 | # Get formatted org entries for all successful transcriptions 263 | # 264 | org_entries = [] 265 | for ( date, wav_file_path, ( text, timestamp_map ) ) in results: 266 | org_entries.append( ( date, wav_file_path, format_org_entry( wav_file_path, text, timestamp_map, archive_dir, voicenote_filename_regex ) ) ) 267 | org_entries = sorted( org_entries, key=lambda x: x[0] ) 268 | 269 | # 270 | # Open file to append headings -- create if needed. 271 | # 272 | if not os.path.exists( org_transcript_file ): 273 | fout = open( org_transcript_file, "w" ) 274 | fout.write( ORG_FILE_HEADER ) 275 | else: 276 | fout = open( org_transcript_file, "a" ) 277 | 278 | # 279 | # Write each heading, move WAV files to archive if it looks 280 | # like the transcription worked. 281 | # 282 | for _, wav_file_path, org_entry in org_entries: 283 | if org_entry is not None: 284 | fout.write( org_entry ) 285 | dst_path = path_as_archived( wav_file_path, archive_dir ) 286 | if just_copy: 287 | shutil.copy2( wav_file_path, dst_path ) 288 | else: 289 | shutil.move( wav_file_path, dst_path ) 290 | else: 291 | print( "Possible failure on file {}?".format( wav_file_path ) ) 292 | fout.close() 293 | 294 | # 295 | # Done! 
296 | # 297 | if verbose: 298 | print( "Done!" ) 299 | 300 | def subprocess_transcribe_function( fname, voicenote_filename_regex ): 301 | """ 302 | This is performed in another process. 303 | """ 304 | if not hasattr( subprocess_transcribe_function, "client" ): 305 | # Init function failed. 306 | return None 307 | if subprocess_transcribe_function.verbose: 308 | # TODO: We should (probably?) queue these messages and print() on a single thread/process...but.... 309 | print( "Transcribing {}...".format( fname ) ) 310 | try: 311 | ret = ( recording_date_from_full_path( fname, voicenote_filename_regex ), fname, transcribe_wav( fname, client=subprocess_transcribe_function.client ) ) 312 | except BaseException as e: 313 | # Do NOT kill the program. We'll leave the audio file in the unprocessed directory. 314 | print( "ERROR:" ) 315 | print( e ) 316 | ret = None 317 | return ret 318 | 319 | def worker_init_func( the_mapped_function, credentials_path, verbose ): 320 | """ 321 | Create a client and attach it to the function. 322 | This is called once per worker. 323 | It works because each worker is an independent process, and has its own copy 324 | of the subprocess_transcribe_function() function. 325 | """ 326 | if verbose: 327 | print( "Creating a new client..." ) 328 | try: 329 | the_mapped_function.client = create_api_client( credentials_path ) 330 | the_mapped_function.verbose = verbose 331 | except BaseException as e: 332 | # Probably failed to create a client. We want to exit, but can't from a subprocess. 333 | # subprocess_transcribe_function will return None 334 | print( e ) 335 | 336 | def main(): 337 | """ 338 | CLI for this package. Just wraps org_transcribe(). 339 | """ 340 | # 341 | # Parse CLI 342 | # 343 | parser = argparse.ArgumentParser( description="Transcribe a directory of wav files into a single Emacs org-mode file." ) 344 | parser.add_argument( "--voice_notes_dir", type=str, help="Directory of WAV files which will be searched non-recursively." 
) 345 | parser.add_argument( "--archive_dir", type=str, help="Directory where WAV files will be placed after transcription." ) 346 | parser.add_argument( "--org_transcript_file", type=str, help="Org file where transcription headings will be appended. Will be created if it doesn't exist." ) 347 | parser.add_argument( "--just_copy", action="store_true", default=None, help="If given, don't remove files from voice_notes_dir." ) # type=bool would treat any non-empty string as True; default=None keeps the config-file merge working 348 | parser.add_argument( "--gcp_credentials_path", type=str, help="Path to GCP credentials JSON, if environment variables are unconfigured." ) 349 | parser.add_argument( "--verbose", action="store_true", default=None, help="Print the name of each WAV file as it is transcribed." ) 350 | parser.add_argument( "--max_concurrent_requests", type=int, help="Maximum number of concurrent transcription requests." ) 351 | parser.add_argument( "--voicenote_filename_regex_path", type=str, help="Path to a text file containing a Python regex, which will be used to match " 352 | "and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm " 353 | "are local date/time (or, whatever you want, really), 12 hour clock. 
ampm should be either literally am or pm.") 354 | cli_kwargs = { k: v for k, v in vars( parser.parse_args() ).items() if v is not None } 355 | 356 | # 357 | # If a config file exists, find anything missing there 358 | # 359 | config_file_path = os.path.join( appdirs.user_config_dir( "voicenotes2org", "voicenotes2org" ), "default.toml" ) 360 | kwargs = {} 361 | if os.path.exists( config_file_path ): 362 | with open( config_file_path, "r" ) as fin: 363 | try: 364 | 365 | # Read the kwargs from the TOML 366 | kwargs = toml.load( fin ) 367 | 368 | # Check args and expand paths (like ~ and $VAR); 369 | # also convert any relative paths to be relative /to the config file/, not CWD 370 | path_args = [ "voice_notes_dir", "archive_dir", "org_transcript_file", "voicenote_filename_regex_path" ] 371 | for p in path_args: 372 | if p in kwargs: 373 | kwargs[ p ] = os.path.expanduser( os.path.expandvars( kwargs[ p ] ) ) 374 | if not os.path.isabs( kwargs[ p ] ): 375 | kwargs[ p ] = os.path.join( os.path.dirname( config_file_path ), kwargs[ p ] ) 376 | 377 | except toml.decoder.TomlDecodeError as e: 378 | print( "\nInvalid config file at {}!".format( config_file_path ) ) 379 | print( str( e ) ) 380 | print( ) 381 | exit( -1 ) 382 | 383 | # 384 | # Determine final kwargs -- CLI always overwrites config file 385 | # 386 | kwargs.update( cli_kwargs ) 387 | 388 | # 389 | # If user supplied a voicenote_filename_regex_path, replace it with a compiled regex. 390 | # 391 | if "voicenote_filename_regex_path" in kwargs: 392 | with open( kwargs[ "voicenote_filename_regex_path" ], "r" ) as fin: 393 | content = [ line for line in fin.readlines() if not line.startswith( "#" ) ] 394 | content = "".join( content ) 395 | try: 396 | regex = re.compile( content ) 397 | kwargs[ "voicenote_filename_regex" ] = regex; del kwargs[ "voicenote_filename_regex_path" ] # org_transcribe takes a compiled regex, not a path 398 | except re.error as e: 399 | print( "Invalid regex!" 
) 400 | print( str( e ) ) 401 | exit( -1 ) 402 | 403 | # 404 | # Explain ourselves 405 | # 406 | if "verbose" in kwargs and kwargs[ "verbose" ]: 407 | print() 408 | print( "Config Options:" ) 409 | pprint.pprint( kwargs ) 410 | print() 411 | 412 | # 413 | # Go! 414 | # 415 | org_transcribe( **kwargs ) 416 | 417 | if __name__ == "__main__": 418 | main() 419 | --------------------------------------------------------------------------------