├── tmp
│   └── temp.txt
├── output
│   └── temp.txt
├── README.md
└── main.py

--------------------------------------------------------------------------------
/tmp/temp.txt:
--------------------------------------------------------------------------------
1 | temp

--------------------------------------------------------------------------------
/output/temp.txt:
--------------------------------------------------------------------------------
1 | temp

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # autoPodcastEditor
2 | A program made for video podcasts to expedite the editing process by automatically switching video clips based on who is talking
3 | 
4 | ### See it in action!
5 | [![Demo Video](https://i.imgur.com/ClN1S6f.png)](https://www.youtube.com/watch?v=kMJ4Bx4BBBo&feature=youtu.be)
6 | 
7 | Made a short video explaining what autoPodcastEditor does with examples (9:43).
8 | Timestamps
9 | - 0:00 - What it does and why
10 | - 2:30 - Introduction to example clips
11 | - 3:40 - First input clip display
12 | - 4:27 - Second input clip display
13 | - 5:35 - Quick explanation of global variables (sample rate, tolerance, exceed req., no audio overlap)
14 | - 6:57 - First example of program combining clips (audio overlap)
15 | - 8:19 - Second example of program combining clips (no audio overlap)
16 | 
17 | ### What it does and why
18 | - Many modern video podcasts have cameras pointed at each participant, which makes the editing process long and tedious. The process of selecting whose camera to show when they're talking is a chore that could be done autonomously. This program aims to solve that.
19 | - autoPodcastEditor exports a final video clip that switches between the cameras of each podcast participant depending on who's talking to make the editing process a breeze.
20 | - Use cases besides podcasts include things like D&D campaigns, group video game sessions, or even security camera footage to highlight significant events (assuming the footage has sound).
21 | 
22 | ### Dependencies
23 | - **ffmpeg** - converts video clips to audio waveform arrays (great installation instructions can be found [here](https://www.wikihow.com/Install-FFmpeg-on-Windows))
24 | - **subprocess** - calls ffmpeg commands via the command line
25 | - **moviepy** - concatenates and exports video clips
26 | - **math** - processes split point calculations
27 | 
28 | ### How it works
29 | - ffmpeg converts the input files to audio clips (.wav files), which are then read into integer arrays representing the audio waveform levels throughout the clips
30 | - parseAudioData() cleans the audio arrays and shrinks them to an appropriate size based on the SAMPLE_RATE global
31 | - compareAudioArrays() 'normalizes' all of the arrays (i.e. makes them all the same length by appending zeroes, representing no sound, to shorter arrays), compares each entry of the cleaned audio arrays (using returnHighestIndex() to do so), and outputs an array of numbers representing which clip should be displayed based on its audio level at a given time (numbers are zero-based, from 0 to number_input_clips - 1)
32 | - returnHighestIndex() compares the audio level at a given time between all clips, but gives the current clip priority - the EXCEEDS_BY global indicates the factor by which another clip must exceed the current clip's sound level to take priority; it returns the index of the clip that should be given priority
33 | - After compareAudioArrays() generates the outputArray (a priority timeline of which clip should be shown when), moviepy grabs the snippets of the video clips and concatenates them appropriately
34 | 
35 | ### Notes
36 | - All video clips must be synced at the start (i.e. synced at time = 0s); differing clip lengths are fine, though (longer clips will override)
37 | - Does not currently support separate video + audio clips (planning on adding support soon)
38 | - Plan to add an option for overlapping audio so you can hear audio from all clips simultaneously (mainly for podcasts where multiple people may be talking at once, since the program's main purpose is video switching)
39 | - No testing has been done for quality retention and file size - unsure whether quality is dropped by the script or if there is a maximum file size limit
40 | 
41 | ### To do
42 | - Clips and audio must already be set to the right start point
43 | - Quality testing
44 | - Bound SAMPLE_RATE to be positive and at most the native sample rate
45 | 
46 | 
47 | ### Compatibility
48 | - Confirmed working on Windows 10; not yet confirmed on other operating systems
49 | 
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | from moviepy.editor import *
2 | from tkinter import *
3 | from tkinter.filedialog import askopenfilename
4 | from scipy.io import wavfile
5 | import subprocess
6 | import math
7 | 
8 | # GLOBALS
9 | INPUT_FILES = [] # List of input files (in local directory)
10 | TEMP_FOLDER = 'tmp/' # Temp folder name
11 | OUTPUT_FOLDER = 'output/' # Output folder name
12 | OUTPUT_FILE_NAME = "output" # Output file name
13 | SAMPLE_RATE = 24 # Number of samples to take per second to check volume level
14 | THRESHOLD = 5 # Required # of consecutive highest indices needed to take priority
15 | EXCEEDS_BY = 4 # Factor by which other clip(s) must exceed the current clip's volume to overtake it
16 | NO_OVERLAP_AUDIO = True # Restricts audio overlapping (False = overlap audio)
17 | 
18 | 
19 | # INTERNAL GLOBALS (DO NOT TOUCH)
20 | checkpoints = [] # Global storing tuples of array length + associated index, sorted from min-max by array length
21 | checkpoint_counter = 0 # Determines which checkpoint we are currently at
22 | ul_x = 10
23 | ul_y = 10
24 | 
25 | # GUI CREATION
26 | class Window(Frame):
27 |     def __init__(self, master=None):
28 |         Frame.__init__(self, master)
29 |         self.master = master
30 |         self.init_window()
31 | 
32 |     def init_window(self):
33 |         self.master.title("AutoPodcastEditor")
34 |         self.pack(fill=BOTH, expand=1)
35 |         sync_notice = Label(self, text="Please ensure all input clips are in sync at start and don't go out of sync!")
36 |         sync_notice.place(x=ul_x, y=ul_y)
37 |         browseFileDir = Button(self, text="Add File", command=self.addFile)
38 |         browseFileDir.place(x=ul_x, y=ul_y+25)
39 | 
40 |         sampleRateLabel = Label(self, text="Sample Rate")
41 |         sampleRateLabel.place(x=ul_x, y=ul_y + 320)
42 |         self.sampleRateEntry = Entry(self, width=3)
43 |         self.sampleRateEntry.place(x=ul_x + 73, y=ul_y + 321)
44 |         self.sampleRateEntry.insert(END, str(SAMPLE_RATE))
45 | 
46 |         thresholdLabel = Label(self, text="Threshold")
47 |         thresholdLabel.place(x=ul_x + 120, y=ul_y + 320)
48 |         self.thresholdEntry = Entry(self, width=3)
49 |         self.thresholdEntry.place(x=ul_x + 65 + 120, y=ul_y + 321)
50 |         self.thresholdEntry.insert(END, str(THRESHOLD))
51 | 
52 |         exceedsLabel = Label(self, text="Exceeds By")
53 |         exceedsLabel.place(x=ul_x + 235, y=ul_y + 320)
54 |         self.exceedsEntry = Entry(self, width=3)
55 |         self.exceedsEntry.place(x=ul_x + 65 + 235, y=ul_y + 321)
56 |         self.exceedsEntry.insert(END, str(EXCEEDS_BY))
57 | 
58 |         overlapAudioLabel = Label(self, text="Overlap Audio")
59 |         overlapAudioLabel.place(x=ul_x + 345, y=ul_y + 320)
60 |         self.overlapAudioBox = Checkbutton(self, command=self.toggleAudio)
61 |         self.overlapAudioBox.place(x=ul_x + 65 + 360, y=ul_y + 319)
62 | 
63 |         outputNameLabel = Label(self, text="Output File Name")
64 |         outputNameLabel.place(x=ul_x, y=ul_y + 345)
65 |         self.outputNameEntry = Entry(self, width=57)
66 |         self.outputNameEntry.place(x=ul_x + 102, y=ul_y + 346)
67 |         self.outputNameEntry.insert(END, OUTPUT_FILE_NAME)
68 | 
69 |         processButton = Button(self, text="Process", command=self.confirmSettings, width=15, height=3)
70 |         processButton.place(x=ul_x + 460, y=ul_y + 310)
71 | 
72 |     def confirmSettings(self):
73 |         global SAMPLE_RATE
74 |         SAMPLE_RATE = int(self.sampleRateEntry.get())
75 |         global THRESHOLD
76 |         THRESHOLD = int(self.thresholdEntry.get())
77 |         global EXCEEDS_BY
78 |         EXCEEDS_BY = float(self.exceedsEntry.get())
79 |         global OUTPUT_FILE_NAME
80 |         OUTPUT_FILE_NAME = self.outputNameEntry.get()
81 |         self.spliceClips()
82 | 
83 |     def toggleAudio(self):
84 |         global NO_OVERLAP_AUDIO
85 |         NO_OVERLAP_AUDIO = not NO_OVERLAP_AUDIO
86 | 
87 |     def addFile(self):
88 |         filename = askopenfilename()
89 |         if filename != '':
90 |             INPUT_FILES.append(filename)
91 |             fileDir = Label(self, text=filename)
92 |             fileDir.place(x=ul_x, y=ul_y+28+23*len(INPUT_FILES))
93 | 
94 |     # HELPER FUNCTIONS
95 | 
96 |     # TAKES RAW WAVEFORM DATA AND CREATES INTEGER WAVEFORM ARRAY
97 |     # INPUT: audioRate = audio sample rate
98 |     #        audioArray = associated audio waveform array
99 |     # OUTPUT: outputArray = downscaled audioArray based on SAMPLE_RATE
100 |     def parseAudioData(self, audioRate, audioArray):
101 |         sampleDivider = math.floor(audioRate / SAMPLE_RATE)
102 |         outputArray = []
103 |         sampleCounter = 0
104 |         while sampleCounter < audioArray.shape[0]:  # '<' (not '<=') avoids indexing past the last sample
105 |             outputArray.append(audioArray[sampleCounter][0])
106 |             sampleCounter += sampleDivider
107 |         return outputArray
108 | 
109 |     # TAKES ARRAY OF AUDIOARRAYS AND OUTPUTS INDEX ARRAY INDICATING WHICH ARRAY IS LOUDEST AT GIVEN TIME
110 |     # INPUT: audioArrays = array of audioArrays for each clip to compare audio
111 |     # OUTPUT: outputArray = array with indices 0..numArrays-1 indicating which clip should overlay
112 |     def compareAudioArrays(self, audioArrays):
113 |         priorityArray = 0 # Current array that should have video priority (zero-based)
114 |         consecutiveArray = 0 # Which array currently has the highest waveform integer
115 |         prevArray = 0 # Which array on the previous iteration had the highest waveform integer
116 |         consecutiveCount = 0 # Number of consecutive times audio is larger than others
117 |         counter = 0 # Current array index to compare
118 |         outputArray = []
119 | 
120 |         audioArrays = self.normalizeArrays(audioArrays) # Set arrays to equal lengths by appending zeroes
121 | 
122 |         while counter < len(audioArrays[0]):
123 |             consecutiveArray = self.returnHighestIndex(audioArrays, counter, priorityArray) # Find index of loudest clip
124 |             # If the loudest clip differs from the previous loudest clip, reset the consecutive counter; otherwise increment it
125 |             if (consecutiveArray != prevArray):
126 |                 prevArray = consecutiveArray
127 |                 consecutiveCount = 1
128 |             else:
129 |                 consecutiveCount += 1
130 |                 # If the overriding loudest clip has been louder >= THRESHOLD times, make it the priority clip
131 |                 if (consecutiveCount >= THRESHOLD):
132 |                     priorityArray = consecutiveArray
133 |             for checkpoint in checkpoints:
134 |                 if checkpoint == counter:
135 |                     priorityArray = consecutiveArray
136 |             outputArray.append(priorityArray)
137 |             counter += 1
138 |         # Write output data to text file (for debugging)
139 |         f = open(TEMP_FOLDER + "audioData.txt", "w")
140 |         f.write(str(outputArray))
141 |         f.close()
142 |         return outputArray
143 | 
144 |     # COMPARES AUDIO WAVEFORMS AT GIVEN TIME AND RETURNS INDEX OF LOUDEST
145 |     # INPUT: audioArrays = array of audio waveforms
146 |     #        index = index to compare waveform integers
147 |     #        currentPriority = current array that has priority (for EXCEEDS_BY)
148 |     # OUTPUT: returnIndex = index of audioArray with highest waveform integer
149 |     def returnHighestIndex(self, audioArrays, index, currentPriority):
150 |         maxVal = 0 # Maximum waveform value found so far
151 |         returnIndex = 0 # Index of array with highest waveform value
152 |         for c in range(len(audioArrays)):
153 |             if c != currentPriority:
154 |                 if abs(audioArrays[c][index]) > maxVal:
155 |                     maxVal = abs(audioArrays[c][index])
156 |                     returnIndex = c
157 |             else:
158 |                 if abs(audioArrays[c][index]) * EXCEEDS_BY > maxVal:
159 |                     maxVal = abs(audioArrays[c][index]) * EXCEEDS_BY
160 |                     returnIndex = c
161 |         return returnIndex
162 | 
163 |     # SETS ALL AUDIO ARRAYS TO SAME LENGTH BY APPENDING ZEROES TO SHORTER ARRAYS
164 |     # INPUT: audioArrays = array of audio arrays containing integer waveform data
165 |     # OUTPUT: outputArray = array of audio arrays, all equal length (appends 0 to shorter arrays)
166 |     def normalizeArrays(self, audioArrays):
167 |         maxArrayLen = 0
168 |         outputArray = []
169 |         # Get the length of the longest array
170 |         for array in audioArrays:
171 |             if len(array) > maxArrayLen:
172 |                 maxArrayLen = len(array)
173 |             # checkpoints.append(len(array))
174 |         # Fill shorter arrays with trailing zeroes
175 |         for array in audioArrays:
176 |             for c in range(maxArrayLen - len(array)):
177 |                 array.append(0)
178 |             outputArray.append(array)
179 |         return outputArray
180 | 
181 |     def spliceClips(self):
182 |         # MAIN PROCESS
183 |         audioDataArrays = []
184 |         # Generate .wav files for each video clip, create associated outputArray
185 |         for i in range(len(INPUT_FILES)):
186 |             # Extract audio via ffmpeg; list-form args avoid the shell and handle paths with spaces
187 |             command = ["ffmpeg", "-i", INPUT_FILES[i], "-ab", "160k", "-ac", "2",
188 |                        "-y", "-vn", TEMP_FOLDER + "audio" + str(i) + ".wav"]
189 |             subprocess.call(command)
190 |             audioRate, audioArray = wavfile.read(TEMP_FOLDER + 'audio' + str(i) + '.wav')
191 |             audioDataArrays.append(self.parseAudioData(audioRate, audioArray))
192 |         outputArray = self.compareAudioArrays(audioDataArrays)
193 | 
194 |         # Utilizes outputArray to determine which clips should be split and inserted where
195 |         outputClipList = []
196 |         audioClipList = [] # Only used if NO_OVERLAP_AUDIO is False
197 |         counter = 0
198 |         prevPriority = -1
199 |         prevEndPt = -1
200 |         while counter < len(outputArray):
201 |             # Initialization of loop
202 |             if prevEndPt == -1:
203 |                 prevPriority = outputArray[counter]
204 |                 prevEndPt = 0
205 |             # If the 'priority clip' is different than the previous one, finalize the previous clip and add it to the clip list
206 |             elif prevPriority != outputArray[counter]:
207 |                 print(str(counter) + " [" + str(prevPriority) + "] || SPLIT_PT: " + "start: " + str(
208 |                     prevEndPt) + " end: " + str(counter / SAMPLE_RATE))
209 |                 outputClipList.append(
210 |                     VideoFileClip(INPUT_FILES[prevPriority], audio=NO_OVERLAP_AUDIO).subclip(prevEndPt,
211 |                                                                                             counter / SAMPLE_RATE))
212 |                 prevPriority = outputArray[counter]
213 |                 prevEndPt = counter / SAMPLE_RATE
214 |             counter += 1
215 |         print(str(counter) + " [" + str(prevPriority) + "] || SPLIT_PT: " + "start: " + str(prevEndPt) + " end: " + str(
216 |             (counter - 1) / SAMPLE_RATE))
217 |         outputClipList.append(
218 |             VideoFileClip(INPUT_FILES[prevPriority], audio=NO_OVERLAP_AUDIO).subclip(prevEndPt, (
219 |                 counter - 1) / SAMPLE_RATE))
220 |         # Concatenate clips and output
221 |         videoOutput = concatenate_videoclips(outputClipList)
222 | 
223 |         # If audio should be overlapped, create an audio track with all clips overlapped and mix it with the video
224 |         if not NO_OVERLAP_AUDIO:
225 |             for wavFileIndex in range(len(INPUT_FILES)):
226 |                 audioClipList.append(AudioFileClip(TEMP_FOLDER + "audio" + str(wavFileIndex) + ".wav"))
227 |             audioOutput = CompositeAudioClip(audioClipList)
228 |             videoOutput = videoOutput.set_audio(audioOutput)
229 | 
230 |         videoOutput.write_videofile(
231 |             OUTPUT_FOLDER + OUTPUT_FILE_NAME + ".mp4")
232 | 
233 | root = Tk()
234 | root.geometry("600x400")
235 | app = Window(root)
236 | root.mainloop()
237 | 
--------------------------------------------------------------------------------
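As a supplement to the listing above: the clip-selection logic in main.py (downsampling, the EXCEEDS_BY bias, and the THRESHOLD streak) can be exercised without ffmpeg, scipy, or moviepy. The sketch below is not part of the repo; the function names (`downsample`, `loudest_index`, `build_timeline`) are illustrative stand-ins for parseAudioData(), returnHighestIndex(), and compareAudioArrays(), reduced to plain Python lists.

```python
import math

# Stand-ins for the globals in main.py (same defaults)
SAMPLE_RATE = 24   # samples per second to inspect
EXCEEDS_BY = 4     # factor a challenger must exceed the current clip by
THRESHOLD = 5      # consecutive wins required to steal priority


def downsample(audio_rate, samples):
    # Keep one sample every audio_rate / SAMPLE_RATE entries (cf. parseAudioData)
    step = math.floor(audio_rate / SAMPLE_RATE)
    return [samples[i] for i in range(0, len(samples), step)]


def loudest_index(levels, current):
    # Index of the loudest clip at one instant; the current clip's level is
    # multiplied by EXCEEDS_BY so others must beat it by that factor
    # (cf. returnHighestIndex)
    best_val, best_idx = 0, 0
    for c, level in enumerate(levels):
        val = abs(level) * (EXCEEDS_BY if c == current else 1)
        if val > best_val:
            best_val, best_idx = val, c
    return best_idx


def build_timeline(clips):
    # Priority timeline over equal-length level lists: a new clip takes over
    # only after winning THRESHOLD samples in a row (cf. compareAudioArrays)
    priority, prev, streak, out = 0, 0, 0, []
    for t in range(len(clips[0])):
        winner = loudest_index([clip[t] for clip in clips], priority)
        streak = streak + 1 if winner == prev else 1
        prev = winner
        if streak >= THRESHOLD:
            priority = winner
        out.append(priority)
    return out
```

With two clips where the second goes loud halfway through, `build_timeline` holds clip 0 until clip 1 has won THRESHOLD consecutive samples, which is the debouncing behavior the README describes.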