├── README.org ├── formatted-output.png ├── setup.py └── src └── voicenotes2org.py /README.org: -------------------------------------------------------------------------------- 1 | #+TITLE: voicenotes2org 2 | 3 | [[./formatted-output.png]] 4 | 5 | =voicenotes2org= is a Python script which collects WAV files in a given directory, sends them to Google Cloud Platform (GCP) for transcription, and then formats the resulting transcripts into a combined org file, including links back to the original audio. 6 | 7 | Each note becomes a heading in the org file, and includes: 8 | 1. The date and time of the note 9 | 2. An org-link of type =voicenote=, which, when followed, plays the original audio file in EMMS 10 | 3. Google's transcript of the note, broken down into 10-second segments. Each segment begins with a =voicenote= link which will play the original audio file /at the time offset which corresponds to that segment/. 11 | 12 | * Prerequisites 13 | 14 | This uses Google's Cloud Speech-to-Text API, and *you will need your own GCP account*. Make sure you have a service account JSON. 15 | 16 | Other than that, you'll need =python3= and =ffmpeg= on your system. Only tested on Arch Linux. 17 | 18 | * Installation 19 | 20 | Clone this repo, then install like so: 21 | 22 | #+begin_src sh 23 | git clone https://github.com/bgutter/voicenotes2org 24 | cd voicenotes2org 25 | pip install . # optionally with sudo, depending on your system 26 | #+end_src 27 | 28 | It's also on PyPI as =voicenotes2org=, but not usually up to date there. 29 | 30 | #+BEGIN_SRC sh 31 | sudo pip install voicenotes2org 32 | #+END_SRC 33 | 34 | * Basic Usage 35 | 36 | Transcription jobs can be defined on the command line, or in a config file. 37 | 38 | CLI Example: 39 | 40 | #+BEGIN_SRC bash 41 | > voicenotes2org --voice_notes_dir=~/new-voice-notes/ --archive_dir=~/org/archived-voice-notes/ --org_transcript_file=~/org/unfiled-voice-notes.org 42 | #+END_SRC 43 | 44 | ...or... 
45 | 46 | Config File: 47 | 48 | #+BEGIN_SRC bash 49 | > cat ~/.config/voicenotes2org/default.toml 50 | voice_notes_dir="~/new-voice-notes/" 51 | archive_dir="~/org/archived-voice-notes/" 52 | org_transcript_file="~/org/unfiled-voice-notes.org" 53 | 54 | > voicenotes2org 55 | #+END_SRC 56 | 57 | Note that, in the config file, all relative paths will be interpreted as relative to the config file. For example, "filename_regex.txt" in =~/.config/voicenotes2org/default.toml= will be treated as =~/.config/voicenotes2org/filename_regex.txt=. 58 | 59 | In both cases, the script will find every WAV file in =~/new-voice-notes/= and transcribe it. After transcription, each file will be moved to =~/org/archived-voice-notes/=. If =~/org/unfiled-voice-notes.org= does not exist, it will be created with an eval header statement which defines the =voicenote= link type. If the file already exists, voicenotes2org will only append content, leaving existing content unmodified. There will be one new heading for each WAV file transcribed. 60 | 61 | *Optional Arguments* 62 | | Option | Meaning | 63 | |-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 64 | | =--gcp_credentials_path= | Path to JSON file. If provided, use this to access the Google Speech-to-Text API. If missing, you must have configured the GOOGLE_APPLICATION_CREDENTIALS environment variable! | 65 | | =--voicenote_filename_regex_path= | Path to a text file containing a Python regex which will be used to match and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm are local date/time (or, whatever you want, really), 12-hour clock. ampm should be either literally am or pm. 
*This is an unsanitized input.* Be smart. | 66 | | =--max_concurrent_requests= | Maximum number of concurrent transcription requests. | 67 | | =--verbose= | Boolean. Default false. If true, print the name of each WAV file as it is transcribed. | 68 | | =--just_copy= | Boolean. Default false. If true, don't remove audio from the original folder. | 69 | 70 | If you prefer to avoid eval statements in your file headers, you may instead include this somewhere in your init code: 71 | 72 | #+BEGIN_SRC emacs-lisp 73 | (org-link-set-parameters "voicenote" 74 | :follow (lambda (content) 75 | (cl-multiple-value-bind (file seconds) 76 | (split-string content ":") 77 | (emms-play-file file) 78 | (sit-for 0.5) 79 | (emms-seek-to (string-to-number seconds))))) 80 | #+END_SRC 81 | 82 | * Example Output 83 | 84 | Formatted Output: 85 | 86 | [[./formatted-output.png]] 87 | 88 | Plain Text: 89 | 90 | #+BEGIN_SRC text 91 | # -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to (string-to-number seconds))))) -*- 92 | #+TITLE: Unfiled Voice Notes 93 | 94 | C-c C-o on any link to play clip starting from that offset. 
95 | 96 | * New Voice Note 97 | [2020-01-01 Wed 00:52] 98 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][Archived Clip]] 99 | 100 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][00:00]] this is a second voice note I am talking into a phone right now roses are red violets are blue 101 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:10][00:10]] blah blah blah 102 | 103 | 104 | * New Voice Note 105 | [2020-01-01 Wed 00:52] 106 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][Archived Clip]] 107 | 108 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][00:00]] this is a voice note for testing this is the first one that I will do I'm going to talk about nothing 109 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:10][00:10]] because I don't know what else to say 110 | 111 | 112 | * New Voice Note 113 | [2020-01-01 Wed 00:53] 114 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][Archived Clip]] 115 | 116 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][00:00]] Mona Lisa lost her smile the painters hands are trembling now and if she's out 117 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:10][00:10]] there running wild it's just because I taught her how the Masterpiece that we had planned is laying shattered 118 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:20][00:20]] on the ground Mona Lisa lost her smile and the painters hands are trembling now and the eyes that used to burn for 119 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:30][00:30]] me now they no longer look my way and the love that used to be why it just got lost in yesterday 120 | [[voicenote:~/org/archived-voice-notes/My 
recording 2020-01-01 12-53 AM 144.wav:40][00:40]] and if she seems cold to the touch well there used to be burn a flame I gave to a little took 121 | [[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:50][00:50]] too much til I erased the painter's name ... too much till I erased the painter's name 122 | #+END_SRC 123 | 124 | * WAV file naming rules 125 | 126 | Unless you define your own regex file, WAV files must be named according to the following pattern: 127 | 128 | .* YYYY-MM-DD H-MM AM|PM .*.wav 129 | 130 | Where: 131 | - =YYYY= is the year. 132 | - =MM= is zero-padded month. 133 | - =DD= is zero-padded day. 134 | - =H= is unpadded (sorry) hour in 12-hour format. 135 | - =MM= is zero-padded minute. 136 | - =AM|PM= is literally just "AM" or "PM". 137 | - Everything is whitespace delimited. 138 | 139 | * 🚨 Limitations 🚨 140 | 141 | Many corners have been cut in the making of this script. If literally anyone else ever uses this code, these issues might be worth fixing some day. 142 | 143 | ** Only WAV files are supported 144 | 145 | Wouldn't be hard to figure out the file format, but Google's transcription API requires non-WAV formats specify things like sample rate and encoding. I did not need this. 146 | 147 | ** Ugliness caused by avoiding Google Cloud Storage 148 | 149 | Google caps the duration of audio which has been inlined into the transcription request at 1 minute. Anything longer than that, and you need to configure a Google Cloud Storage bucket. I didn't want to, so I split each voice note into 55-second chunks with a 5-second overlap. 150 | 151 | For example, a 3 minute long voice note is actually transcribed in 4 separate chunks: 152 | 1. 0:00 to 0:55 -- 55 seconds 153 | 2. 0:50 to 1:45 -- 55 seconds, first 5 overlap 154 | 3. 1:40 to 2:35 -- 55 seconds, first 5 overlap 155 | 4. 2:30 to 3:00 -- 30 seconds, first 5 overlap 156 | 157 | To reduce (or, maybe produce) confusion, I insert an ellipsis (...) 
into the transcription wherever we're about to start inserting overlapped content. For example: 158 | 159 | #+BEGIN_SRC 160 | and we went to the store for some ... the store for some candy to bring with us 161 | #+END_SRC 162 | 163 | This is ugly and lazy and later versions might improve this. 164 | 165 | * Example Workflow 166 | 167 | This is how I integrate my voice recordings into org-mode. 168 | 169 | *Convenient Voice Recording* 170 | 171 | I record voice notes on my Android device using "Easy Voice Recorder". I use this app specifically because it provides a system shortcut to toggle recording. The first invocation of this shortcut begins recording, and the second stops recording, saving the audio to a new WAV file. A third invocation would start recording again, but with another new file. 172 | 173 | This app also lets you specify how audio files should be named, which makes it easy to encode date and time. 174 | 175 | Most importantly, I use the "Button Mapper" app to *bind a long-press of the volume-up key to this shortcut*. This works even when the screen is off. 176 | 177 | With this setup, ideas, tasks, and notes can be recorded instantly and effortlessly. Just long-press the volume-up key, say whatever needs to be said, and long-press again to complete the file. No unlocking the phone, and no interacting with the touchscreen. 178 | 179 | Alternatively, if you don't mind carrying a second device, a dedicated voice recorder would work at least as well. 181 | *Syncing The Audio Files* 182 | 183 | I use Syncthing to sync the voice notes directory on my Android device to a directory on my PC. This is probably the easiest way to achieve near-realtime syncing, and Syncthing is FOSS! 184 | 185 | Alternatively, you can manually copy the files every evening over USB, or SSH, or Google Drive, or...well, you get the idea. 
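The chunk boundaries described under =🚨 Limitations 🚨= are easy to compute. This is a standalone sketch of the boundary math only (the script itself slices =pydub= audio segments, not bare offsets): 55-second windows, each starting 5 seconds before the previous one ends.

```python
def chunk_offsets(duration_ms, segment_ms=55_000, overlap_ms=5_000):
    """Yield (start_ms, end_ms) windows covering duration_ms,
    each overlapping the previous window by overlap_ms."""
    offset = 0
    while offset < duration_ms:
        yield offset, min(offset + segment_ms, duration_ms)
        offset += segment_ms - overlap_ms

# A 3-minute note (180 000 ms) yields the four chunks from the Limitations example:
print(list(chunk_offsets(180_000)))
# [(0, 55000), (50000, 105000), (100000, 155000), (150000, 180000)]
```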
186 | 187 | *Transcription* 188 | 189 | In my org directory structure, I have a file dedicated to receiving transcribed, but not yet properly filed, voice notes. Let's say that this is at =~/org/unfiled-voice-notes.org=. Let's also assume that my untranscribed voice notes are synced -- by Syncthing -- to =~/new-voice-notes/=. 190 | 191 | If I run the example command under the =Basic Usage= heading, then absent any errors, =~/new-voice-notes/= will be cleared out. This frees up space on the phone, though it otherwise isn't all that important. What is important is that, for each processed audio file, a new heading will be appended to =~/org/unfiled-voice-notes.org=. The audio file will now live in =~/org/archived-voice-notes/=, and any file links in the org entries will point to this location. Because the links are absolute, the headings can be moved around wherever you'd like and will not break. 192 | 193 | *Filing* 194 | 195 | Once =voicenotes2org= has returned, you should open =~/org/unfiled-voice-notes.org= in Emacs, then use =org-refile= to pop each entry into a more proper location in your org directory structure. Make sure you've configured =org-refile-targets= first! 
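For reference, the default filename pattern can be exercised directly from Python. The regex below mirrors =DEFAULT_FNAME_PARSER= in =src/voicenotes2org.py= (group names follow the documented year/month/day/hour/minute/ampm requirement), applied to a filename taken from the example output above:

```python
import re

# Mirror of the script's default filename parser; a custom regex file passed
# via --voicenote_filename_regex_path must define the same named groups.
pattern = re.compile(
    r"[^0-9]*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s+"
    r"(?P<hour>\d+)-(?P<minute>\d+)\s+(?P<ampm>\S*).*\.wav"
)

m = pattern.match("My recording 2020-01-01 12-52 AM 143.wav")
print(m.groupdict())
# {'year': '2020', 'month': '01', 'day': '01', 'hour': '12', 'minute': '52', 'ampm': 'AM'}
```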
196 | -------------------------------------------------------------------------------- /formatted-output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bgutter/voicenotes2org/3518c927489da0950a7a89dc56d3ec70c18f6f21/formatted-output.png -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | voicenotes2org package config 3 | """ 4 | 5 | from setuptools import setup 6 | 7 | setup( 8 | name='voicenotes2org', 9 | version='1.0.1', 10 | description='Transcribe voice recordings using the Google Cloud Speech-To-Text API, and export the results to Emacs org-mode headings.', 11 | long_description="See https://github.com/bgutter/voicenotes2org for all details!", 12 | url='https://github.com/bgutter/voicenotes2org', 13 | keywords='emacs org org-mode transcribe voice text', 14 | package_dir={'': 'src'}, 15 | py_modules=["voicenotes2org"], 16 | python_requires='>=3.5', 17 | install_requires=['pydub','google-cloud-speech','appdirs','toml'], 18 | entry_points={ 19 | 'console_scripts': [ 20 | 'voicenotes2org = voicenotes2org:main', 21 | ], 22 | }, 23 | ) 24 | -------------------------------------------------------------------------------- /src/voicenotes2org.py: -------------------------------------------------------------------------------- 1 | """ 2 | voicenotes2org.py 3 | 4 | Batch convert a collection of voice notes (WAV files) to an org-mode file. 
5 | """ 6 | 7 | from google.cloud import speech_v1 8 | from google.cloud.speech_v1 import enums 9 | 10 | import pprint 11 | import pydub 12 | import appdirs 13 | import toml 14 | 15 | import argparse 16 | import shutil 17 | import glob 18 | import io 19 | import os 20 | import re 21 | import multiprocessing as mp 22 | from datetime import datetime 23 | 24 | # 25 | # These can be tinkered with to adjust the output file format 26 | # 27 | 28 | TRANSCRIPTION_CHUNK_SIZE = 10 # seconds 29 | SPLICE_STR = "..." 30 | ORG_FILE_HEADER = """# -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to (string-to-number seconds))))) -*- 31 | 32 | C-c C-o on any link to play clip starting from that offset. 33 | 34 | """ 35 | ENTRY_TEMPLATE = """ 36 | * New Voice Note 37 | [{time_part_str}] 38 | [[voicenote:{link_path}:0][Archived Clip]] 39 | 40 | {body} 41 | """ 42 | TRANSCRIPTION_CHUNK_TEMPLATE = "[[voicenote:{filepath}:{abssecond}][{minute}:{relsecond}]] {text}\n" 43 | DEFAULT_FNAME_PARSER = re.compile( r"[^0-9]*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s+(?P<hour>\d+)-(?P<minute>\d+)\s+(?P<ampm>\S*).*\.wav" ) 44 | 45 | def create_api_client( gcp_credentials_path=None ): 46 | """ 47 | Open a connection 48 | """ 49 | if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ: 50 | # Need explicit credentials -- complain if they aren't defined. 51 | if gcp_credentials_path is None: 52 | raise ValueError( "You gotta provide a GCP credentials JSON if it's not set as an environment variable. See https://cloud.google.com/docs/authentication/production." ) 53 | client = speech_v1.SpeechClient.from_service_account_json( gcp_credentials_path ) 54 | 55 | else: 56 | # It should figure things out automatically. 
57 | client = speech_v1.SpeechClient() 58 | return client 59 | 60 | def transcribe_wav( local_file_path, gcp_credentials_path=None, language_code="en-US", client=None ): 61 | """ 62 | Pass in path to local WAV file, get a time-indexed transcription. 63 | 64 | Also pass in the path to your GCP credentials JSON, unless you've configured 65 | the GOOGLE_APPLICATION_CREDENTIALS environment variable. 66 | 67 | Return value is a tuple. The first item is the full transcription. The second is 68 | a list of tuples, where the first value in each tuple is offset, in seconds, 69 | since the beginning of the file, and the second value is a transcribed word. 70 | 71 | IE: 72 | text, timemap = transcribe_wav( "./something.wav" ) 73 | print( text ) # This is a test 74 | print( timemap ) # [ (1, "this"), (1, "is"), (1, "a"), (2, "test") ]. 75 | """ 76 | SEGMENT_SIZE = 55 * 1000 # 55 seconds 77 | OVERLAP_SIZE = 5 * 1000 # 5 seconds 78 | 79 | # 80 | # Instantiate a client 81 | # 82 | if client is None: 83 | client = create_api_client( gcp_credentials_path ) 84 | 85 | # 86 | # Build the request. Because we only support WAV, don't need to define encoding 87 | # or sample rate. 88 | # 89 | config = { 90 | "model": "video", # premium model, but cost is basically nothing for single user anyway. Works MUCH better. 91 | "language_code": language_code, 92 | "enable_word_time_offsets": True, 93 | } 94 | 95 | # 96 | # GCP inline audio is restricted to just one minute. To avoid needing to setup 97 | # a GCP bucket, we'll split any provided audio files into 55-second chunks with 98 | # 5 seconds of overlap (since we'll probably split a word). IE, chunk 1 is from 99 | # 0:00 to 0:55, two is from 0:50 to 1:45, etc... 
100 | # 101 | full_text = "" 102 | time_map = [] 103 | full_recording = pydub.AudioSegment.from_file( local_file_path, format="wav" ) 104 | full_duration_ms = len( full_recording ) 105 | offset = 0 106 | while offset < full_duration_ms: 107 | 108 | # If we're splitting into chunks, insert a hint 109 | if offset > 0: 110 | full_text += " " + SPLICE_STR + " " 111 | time_map.append( ( int( offset / 1000 ), SPLICE_STR ) ) 112 | 113 | # Segment the clip into a RAM file 114 | this_clip = full_recording[ offset : min( offset + SEGMENT_SIZE, full_duration_ms ) ] 115 | segment_wav = io.BytesIO() 116 | this_clip.export( segment_wav, format="wav" ) 117 | segment_wav.seek(0) 118 | audio = { "content": segment_wav.read() } 119 | 120 | # 121 | # Submit the request & wait synchronously 122 | # 123 | operation = client.long_running_recognize( config, audio ) 124 | response = operation.result() 125 | 126 | # 127 | # Process the response. Only take the first alternative. 128 | # 129 | for result in response.results: 130 | if len( result.alternatives ) < 1: 131 | continue 132 | best_guess = result.alternatives[0] 133 | full_text += best_guess.transcript 134 | time_map.extend( [ ( x.start_time.seconds + int( offset / 1000 ), x.word ) for x in best_guess.words ] ) 135 | 136 | # Next offset 137 | offset += ( SEGMENT_SIZE - OVERLAP_SIZE ) 138 | 139 | return ( full_text, time_map ) 140 | 141 | def recording_date_from_full_path( wav_file_path, regex ): 142 | """ 143 | Return a datetime given a filename. 144 | Throws ValueError if the WAV file doesn't match the regex. 145 | """ 146 | # 147 | # Extract date, time, and ID from wav_file_path 148 | # 149 | match = regex.match( os.path.basename( wav_file_path ) ) 150 | if match is None: 151 | raise ValueError( "Name does not match pattern!" 
) 152 | parts = match.groupdict() 153 | 154 | # Convert hours from AMPM to 24 hour, then create a datetime object 155 | dt_args = [ int( parts[p] ) for p in [ "year", "month", "day", "hour", "minute" ] ] 156 | if parts["ampm"].lower() == "am": 157 | # AM: 12->0, 1->1 ... 11->11 158 | if dt_args[-2] == 12: 159 | dt_args[-2] = 0 160 | else: 161 | # PM: 12->12, 1->13, 2->14, ... 11->23 162 | if dt_args[-2] < 12: 163 | dt_args[-2] += 12 164 | return datetime( *dt_args ) 165 | 166 | def path_as_archived( wav_file_path, archive_dir ): 167 | """ 168 | Return the intended path of this wav file after archiving. 169 | """ 170 | return os.path.join( archive_dir, os.path.basename( wav_file_path ) ) 171 | 172 | def format_org_entry( wav_file_path, text, timestamp_map, archive_dir, voicenote_filename_regex ): 173 | """ 174 | Return a string which represents the org-mode heading for this transcription. Includes 175 | links which will play the archived version of the note starting every 10 seconds. 176 | """ 177 | dt = recording_date_from_full_path( wav_file_path, voicenote_filename_regex ) 178 | time_part_str = dt.strftime( "%Y-%m-%d %a %H:%M" ) 179 | 180 | # 181 | # Accumulate words by offset, inserting links & chunks of text every N seconds 182 | # 183 | offset_limit = TRANSCRIPTION_CHUNK_SIZE 184 | words_this_chunk = [] 185 | annotated_transcription = "" 186 | 187 | def append_chunk( running_body, words_this_chunk, offset_limit ): 188 | text = " ".join( words_this_chunk ) 189 | abssecond = offset_limit - TRANSCRIPTION_CHUNK_SIZE 190 | relsecond = abssecond % 60 191 | minute = int( abssecond / 60 ) 192 | running_body += TRANSCRIPTION_CHUNK_TEMPLATE.format( 193 | filepath=path_as_archived( wav_file_path, archive_dir ), 194 | abssecond=abssecond, 195 | minute="{:02d}".format( minute ), 196 | relsecond="{:02d}".format( relsecond ), 197 | text=text ) 198 | words_this_chunk = [ word ] 199 | offset_limit = ( int( word_offset / TRANSCRIPTION_CHUNK_SIZE ) + 1 ) * 
TRANSCRIPTION_CHUNK_SIZE 200 | return running_body, words_this_chunk, offset_limit 201 | 202 | for ( word_offset, word ) in timestamp_map: 203 | if word_offset < offset_limit: 204 | # Keep accumulating words 205 | words_this_chunk.append( word ) 206 | else: 207 | # Finished a chunk -- write it and start the next 208 | annotated_transcription, words_this_chunk, offset_limit = append_chunk( annotated_transcription, words_this_chunk, offset_limit ) 209 | 210 | # Clear out whatever we have, if anything 211 | if len( words_this_chunk ) > 0: 212 | annotated_transcription, words_this_chunk, offset_limit = append_chunk( annotated_transcription, words_this_chunk, offset_limit ) 213 | 214 | # 215 | # Fill in the entry template 216 | # 217 | return ENTRY_TEMPLATE.format( 218 | time_part_str=time_part_str, 219 | link_path=path_as_archived( wav_file_path, archive_dir ), 220 | body=annotated_transcription ) 221 | 222 | def org_transcribe( voice_notes_dir, archive_dir, org_transcript_file, just_copy=False, gcp_credentials_path=None, verbose=False, max_concurrent_requests=5, voicenote_filename_regex=DEFAULT_FNAME_PARSER ): 223 | """ 224 | Root transcription function. Performs the following steps: 225 | """ 226 | 227 | # 228 | # Filter out anything that doesn't match the filename regex 229 | # TODO: We call recording_date_from_full_path() like 3 times for each record (maybe more). 230 | # Might as well just cache it somewhere. 231 | # 232 | all_wavs = glob.glob( os.path.join( voice_notes_dir, "*.wav" ) ) 233 | correctly_named_wavs = [] 234 | for wav in all_wavs: 235 | try: 236 | recording_date_from_full_path( wav, voicenote_filename_regex ) # Just testing for exception 237 | correctly_named_wavs.append( wav ) 238 | except ValueError: 239 | pass 240 | 241 | # 242 | # Don't create more threads than there are files to transcribe. 
243 | # 244 | max_concurrent_requests = min( max_concurrent_requests, len( correctly_named_wavs ) ) 245 | 246 | # 247 | # Get all of the Google transcription results 248 | # 249 | if len( correctly_named_wavs ) > 0: 250 | pool = mp.Pool( max_concurrent_requests, initializer=worker_init_func, initargs=(subprocess_transcribe_function, gcp_credentials_path, verbose) ) 251 | results = [] 252 | for wav_file_path in correctly_named_wavs: 253 | results.append( pool.apply_async( subprocess_transcribe_function, args=( wav_file_path, voicenote_filename_regex ) ) ) 254 | pool.close() 255 | pool.join() 256 | results = [ r.get() for r in results ] 257 | results = [ r for r in results if r is not None ] 258 | else: 259 | results = [] 260 | 261 | # 262 | # Get formatted org entries for all successful transcriptions 263 | # 264 | org_entries = [] 265 | for ( date, wav_file_path, ( text, timestamp_map ) ) in results: 266 | org_entries.append( ( date, wav_file_path, format_org_entry( wav_file_path, text, timestamp_map, archive_dir, voicenote_filename_regex ) ) ) 267 | org_entries = sorted( org_entries, key=lambda x: x[0] ) 268 | 269 | # 270 | # Open file to append headings -- create if needed. 271 | # 272 | if not os.path.exists( org_transcript_file ): 273 | fout = open( org_transcript_file, "w" ) 274 | fout.write( ORG_FILE_HEADER ) 275 | else: 276 | fout = open( org_transcript_file, "a" ) 277 | 278 | # 279 | # Write each heading, move WAV files to archive if it looks 280 | # like the transcription worked. 281 | # 282 | for _, wav_file_path, org_entry in org_entries: 283 | if org_entry is not None: 284 | fout.write( org_entry ) 285 | dst_path = path_as_archived( wav_file_path, archive_dir ) 286 | if just_copy: 287 | shutil.copy2( wav_file_path, dst_path ) 288 | else: 289 | shutil.move( wav_file_path, dst_path ) 290 | else: 291 | print( "Possible failure on file {}?".format( wav_file_path ) ) 292 | fout.close() 293 | 294 | # 295 | # Done! 
296 | # 297 | if verbose: 298 | print( "Done!" ) 299 | 300 | def subprocess_transcribe_function( fname, voicenote_filename_regex ): 301 | """ 302 | This is performed in another process. 303 | """ 304 | if not hasattr( subprocess_transcribe_function, "client" ): 305 | # Init function failed. 306 | return None 307 | if subprocess_transcribe_function.verbose: 308 | # TODO: We should (probably?) queue these messages and print() on a single thread/process...but.... 309 | print( "Transcribing {}...".format( fname ) ) 310 | try: 311 | ret = ( recording_date_from_full_path( fname, voicenote_filename_regex ), fname, transcribe_wav( fname, client=subprocess_transcribe_function.client ) ) 312 | except BaseException as e: 313 | # Do NOT kill the program. We'll leave the audio file in the unprocessed directory. 314 | print( "ERROR:" ) 315 | print( e ) 316 | ret = None 317 | return ret 318 | 319 | def worker_init_func( the_mapped_function, credentials_path, verbose ): 320 | """ 321 | Create a client and attach it to the function. 322 | This is called once per worker. 323 | It works because each worker is an independent process, and has its own copy 324 | of the subprocess_transcribe_function() function. 325 | """ 326 | if verbose: 327 | print( "Creating a new client..." ) 328 | try: 329 | the_mapped_function.client = create_api_client( credentials_path ) 330 | the_mapped_function.verbose = verbose 331 | except BaseException as e: 332 | # Probably failed to create a client. We want to exit, but can't from a subprocess. 333 | # subprocess_transcribe_function will return None 334 | print( e ) 335 | 336 | def main(): 337 | """ 338 | CLI for this package. Just wraps org_transcribe(). 339 | """ 340 | # 341 | # Parse CLI 342 | # 343 | parser = argparse.ArgumentParser( description="Transcribe a directory of wav files into a single Emacs org-mode file." ) 344 | parser.add_argument( "--voice_notes_dir", type=str, help="Directory of WAV files which will be searched non-recursively." 
) 345 | parser.add_argument( "--archive_dir", type=str, help="Directory where WAV files will be placed after transcription." ) 346 | parser.add_argument( "--org_transcript_file", type=str, help="Org file where transcription headings will be appended. Will be created if it doesn't exist." ) 347 | parser.add_argument( "--just_copy", action="store_true", default=None, help="If given, don't remove files from voice_notes_dir." ) # type=bool would treat any non-empty string as True; default=None keeps the config-file merge working 348 | parser.add_argument( "--gcp_credentials_path", type=str, help="Path to GCP credentials JSON, if environment variables are unconfigured." ) 349 | parser.add_argument( "--verbose", action="store_true", default=None, help="Print the name of each WAV file as it is transcribed." ) 350 | parser.add_argument( "--max_concurrent_requests", type=int, help="Maximum number of concurrent transcription requests." ) 351 | parser.add_argument( "--voicenote_filename_regex_path", type=str, help="Path to a text file containing a Python regex, which will be used to match " 352 | "and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm " 353 | "are local date/time (or, whatever you want, really), 12 hour clock. 
ampm should be either literally am or pm.") 354 | cli_kwargs = { k: v for k, v in vars( parser.parse_args() ).items() if v is not None } 355 | 356 | # 357 | # If a config file exists, find anything missing there 358 | # 359 | config_file_path = os.path.join( appdirs.user_config_dir( "voicenotes2org", "voicenotes2org" ), "default.toml" ) 360 | kwargs = {} 361 | if os.path.exists( config_file_path ): 362 | with open( config_file_path, "r" ) as fin: 363 | try: 364 | 365 | # Read the kwargs from the TOML 366 | kwargs = toml.load( fin ) 367 | 368 | # Check args and expand paths (like ~ and $VAR); 369 | # also convert any relative paths to be relative /to the config file/, not CWD 370 | path_args = [ "voice_notes_dir", "archive_dir", "org_transcript_file", "voicenote_filename_regex_path" ] 371 | for p in path_args: 372 | if p in kwargs: 373 | kwargs[ p ] = os.path.expanduser( os.path.expandvars( kwargs[ p ] ) ) 374 | if not os.path.isabs( kwargs[ p ] ): 375 | kwargs[ p ] = os.path.join( os.path.dirname( config_file_path ), kwargs[ p ] ) 376 | 377 | except toml.decoder.TomlDecodeError as e: 378 | print( "\nInvalid config file at {}!".format( config_file_path ) ) 379 | print( str( e ) ) 380 | print( ) 381 | exit( -1 ) 382 | 383 | # 384 | # Determine final kwargs -- CLI always overwrites config file 385 | # 386 | kwargs.update( cli_kwargs ) 387 | 388 | # 389 | # If user supplied a voicenote_filename_regex_path, replace it with a compiled regex. 390 | # 391 | if "voicenote_filename_regex_path" in kwargs: 392 | with open( kwargs[ "voicenote_filename_regex_path" ], "r" ) as fin: 393 | content = [ line for line in fin.readlines() if not line.startswith( "#" ) ] 394 | content = "".join( content ) 395 | try: 396 | regex = re.compile( content ) 397 | kwargs[ "voicenote_filename_regex" ] = regex; del kwargs[ "voicenote_filename_regex_path" ] # org_transcribe takes a compiled regex, not a path 398 | except re.error as e: 399 | print( "Invalid regex!" 
) 400 | print( str( e ) ) 401 | exit( -1 ) 402 | 403 | # 404 | # Explain ourselves 405 | # 406 | if "verbose" in kwargs and kwargs[ "verbose" ]: 407 | print() 408 | print( "Config Options:" ) 409 | pprint.pprint( kwargs ) 410 | print() 411 | 412 | # 413 | # Go! 414 | # 415 | org_transcribe( **kwargs ) 416 | 417 | if __name__ == "__main__": 418 | main() 419 | --------------------------------------------------------------------------------