├── .gitignore
├── README.md
├── audio_file.wav
├── examples.ipynb
├── images
│   ├── cqtchromagram.png
│   ├── cqtkernel.png
│   ├── cqtspectrogram.png
│   ├── dct.png
│   ├── dst.png
│   ├── imdct.png
│   ├── istft.png
│   ├── mdct.png
│   ├── melfilterbank.png
│   ├── melspectrogram.png
│   ├── mfcc.png
│   └── stft.png
└── zaf.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.ipynb_checkpoints
__pycache__
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Zaf-Python

Zafar's Audio Functions in **Python** for audio signal analysis.

Files:
- [`zaf.py`](#zafpy): Python module with the audio functions.
- [`examples.ipynb`](#examplesipynb): Jupyter notebook with some examples.
- [`audio_file.wav`](#audio_filewav): audio file used for the examples.

See also:
- [Zaf-Matlab](https://github.com/zafarrafii/Zaf-Matlab): Zafar's Audio Functions in **Matlab** for audio signal analysis.
- [Zaf-Julia](https://github.com/zafarrafii/Zaf-Julia): Zafar's Audio Functions in **Julia** for audio signal analysis.

## zaf.py

This Python module implements a number of functions for audio signal analysis.

Simply copy the file `zaf.py` into your working directory and you are good to go. Make sure you have Python 3, NumPy, and SciPy installed.

Functions:
- [`stft`](#stft) - Compute the short-time Fourier transform (STFT).
- [`istft`](#istft) - Compute the inverse STFT.
- [`melfilterbank`](#melfilterbank) - Compute the mel filterbank.
- [`melspectrogram`](#melspectrogram) - Compute the mel spectrogram using a mel filterbank.
- [`mfcc`](#mfcc) - Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
- [`cqtkernel`](#cqtkernel) - Compute the constant-Q transform (CQT) kernel.
- [`cqtspectrogram`](#cqtspectrogram) - Compute the CQT spectrogram using a CQT kernel.
- [`cqtchromagram`](#cqtchromagram) - Compute the CQT chromagram using a CQT kernel.
- [`dct`](#dct) - Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
- [`dst`](#dst) - Compute the discrete sine transform (DST) using the FFT.
- [`mdct`](#mdct) - Compute the modified discrete cosine transform (MDCT) using the FFT.
- [`imdct`](#imdct) - Compute the inverse MDCT using the FFT.

Other:
- `wavread` - Read a WAVE file (using SciPy).
- `wavwrite` - Write a WAVE file (using SciPy).
- `sigplot` - Plot a signal in seconds.
- `specshow` - Display a spectrogram in dB, seconds, and Hz.
- `melspecshow` - Display a mel spectrogram in dB, seconds, and Hz.
- `mfccshow` - Display MFCCs in seconds.
- `cqtspecshow` - Display a CQT spectrogram in dB, seconds, and Hz.
- `cqtchromshow` - Display a CQT chromagram in seconds.


### stft

Compute the short-time Fourier transform (STFT).

```
audio_stft = zaf.stft(audio_signal, window_function, step_length)

Inputs:
    audio_signal: audio signal (number_samples,)
    window_function: window function (window_length,)
    step_length: step length in samples
Output:
    audio_stft: audio STFT (window_length, number_frames)
```

#### Example: Compute and display the spectrogram from an audio file.
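Before the full example, the output shape can be sanity-checked on a synthetic signal (a minimal sketch; the test-tone values are arbitrary):

```
# Quick shape check on a synthetic 1-second tone (illustrative values)
import numpy as np
import scipy.signal
import zaf

sampling_frequency = 8000
audio_signal = np.sin(2*np.pi*440*np.arange(sampling_frequency)/sampling_frequency)

# 256-sample periodic Hamming window with half-window steps
window_function = scipy.signal.hamming(256, sym=False)
step_length = 128

# One row per FFT bin, roughly one column per step_length samples
audio_stft = zaf.stft(audio_signal, window_function, step_length)
print(np.shape(audio_stft))  # (256, 64) with this module's centering
```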
61 | 62 | ``` 63 | # Import the needed modules 64 | import numpy as np 65 | import scipy.signal 66 | import zaf 67 | import matplotlib.pyplot as plt 68 | 69 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels 70 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 71 | audio_signal = np.mean(audio_signal, 1) 72 | 73 | # Set the window duration in seconds (audio is stationary around 40 milliseconds) 74 | window_duration = 0.04 75 | 76 | # Derive the window length in samples (use powers of 2 for faster FFT and constant overlap-add (COLA)) 77 | window_length = pow(2, int(np.ceil(np.log2(window_duration*sampling_frequency)))) 78 | 79 | # Compute the window function (use SciPy's periodic Hamming window for COLA as NumPy's Hamming window is symmetric) 80 | window_function = scipy.signal.hamming(window_length, sym=False) 81 | 82 | # Set the step length in samples (half of the window length for COLA) 83 | step_length = int(window_length/2) 84 | 85 | # Compute the STFT 86 | audio_stft = zaf.stft(audio_signal, window_function, step_length) 87 | 88 | # Derive the magnitude spectrogram (without the DC component and the mirrored frequencies) 89 | audio_spectrogram = np.absolute(audio_stft[1:int(window_length/2)+1, :]) 90 | 91 | # Display the spectrogram in dB, seconds, and Hz 92 | number_samples = len(audio_signal) 93 | plt.figure(figsize=(14, 7)) 94 | zaf.specshow(audio_spectrogram, number_samples, sampling_frequency, xtick_step=1, ytick_step=1000) 95 | plt.title("Spectrogram (dB)") 96 | plt.tight_layout() 97 | plt.show() 98 | ``` 99 | 100 | 101 | 102 | 103 | ### istft 104 | 105 | Compute the inverse short-time Fourier transform (STFT). 106 | 107 | ``` 108 | audio_signal = zaf.istft(audio_stft, window_function, step_length) 109 | 110 | Inputs: 111 | audio_stft: audio STFT (window_length, number_frames) 112 | window_function: window function (window_length,) 113 | step_length: step length in samples 114 | Output: 115 | audio_signal: audio signal (number_samples,) 116 | ``` 117 | 118 | #### Example: Estimate the center and the sides from a stereo audio file. 
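Since this example relies on inverting modified STFTs, it helps to first confirm that the analysis-synthesis chain alone is lossless; a quick round-trip check (minimal sketch, arbitrary test values):

```
# Round-trip check: istft(stft(x)) should return x up to machine precision
import numpy as np
import scipy.signal
import zaf

sampling_frequency = 8000
audio_signal = np.sin(2*np.pi*440*np.arange(sampling_frequency)/sampling_frequency)

# Periodic Hamming window with half-window steps satisfies COLA
window_function = scipy.signal.hamming(256, sym=False)
step_length = 128

audio_stft = zaf.stft(audio_signal, window_function, step_length)
audio_signal2 = zaf.istft(audio_stft, window_function, step_length)

# The resynthesis can be slightly longer; compare over the original length
print(np.max(np.abs(audio_signal-audio_signal2[0:len(audio_signal)])))  # ~1e-16
```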
119 | 120 | ``` 121 | # Import the needed modules 122 | import numpy as np 123 | import scipy.signal 124 | import zaf 125 | import matplotlib.pyplot as plt 126 | 127 | # Read the (stereo) audio signal with its sampling frequency in Hz 128 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 129 | 130 | # Set the parameters for the STFT 131 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency)))) 132 | window_function = scipy.signal.hamming(window_length, sym=False) 133 | step_length = int(window_length/2) 134 | 135 | # Compute the STFTs for the left and right channels 136 | audio_stft1 = zaf.stft(audio_signal[:, 0], window_function, step_length) 137 | audio_stft2 = zaf.stft(audio_signal[:, 1], window_function, step_length) 138 | 139 | # Derive the magnitude spectrograms (with DC component) for the left and right channels 140 | number_frequencies = int(window_length/2)+1 141 | audio_spectrogram1 = abs(audio_stft1[0:number_frequencies, :]) 142 | audio_spectrogram2 = abs(audio_stft2[0:number_frequencies, :]) 143 | 144 | # Estimate the time-frequency masks for the left and right channels for the center 145 | center_mask1 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram1 146 | center_mask2 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram2 147 | 148 | # Derive the STFTs for the left and right channels for the center (with mirrored frequencies) 149 | center_stft1 = np.multiply(np.concatenate((center_mask1, center_mask1[-2:0:-1, :])), audio_stft1) 150 | center_stft2 = np.multiply(np.concatenate((center_mask2, center_mask2[-2:0:-1, :])), audio_stft2) 151 | 152 | # Synthesize the signals for the left and right channels for the center 153 | center_signal1 = zaf.istft(center_stft1, window_function, step_length) 154 | center_signal2 = zaf.istft(center_stft2, window_function, step_length) 155 | 156 | # Derive the final stereo center and sides signals 157 | center_signal = np.stack((center_signal1, center_signal2), axis=1) 158 | center_signal = center_signal[0:np.shape(audio_signal)[0], :] 159 | sides_signal = audio_signal-center_signal 160 | 161 | # Write the center and sides signals 162 | zaf.wavwrite(center_signal, sampling_frequency, "center_file.wav") 163 | zaf.wavwrite(sides_signal, sampling_frequency, "sides_file.wav") 164 | 165 | # Display the original, center, and sides signals in seconds 166 | xtick_step = 1 167 | plt.figure(figsize=(14, 7)) 168 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step) 169 | plt.ylim(-1, 1), plt.title("Original signal") 170 | plt.subplot(3, 1, 2), zaf.sigplot(center_signal, sampling_frequency, xtick_step) 171 | plt.ylim(-1, 1), plt.title("Center signal") 172 | plt.subplot(3, 1, 3), zaf.sigplot(sides_signal, sampling_frequency, xtick_step) 173 | plt.ylim(-1, 1), plt.title("Sides signal") 174 | plt.tight_layout() 175 | plt.show() 176 | ``` 177 | 178 | 179 | 180 | 181 | ### melfilterbank 182 | 183 | Compute the mel filterbank. 184 | 185 | ``` 186 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels) 187 | 188 | Inputs: 189 | sampling_frequency: sampling frequency in Hz 190 | window_length: window length for the Fourier analysis in samples 191 | number_mels: number of mel filters 192 | 193 | Output: 194 | mel_filterbank: mel filterbank (sparse) (number_mels, number_frequencies) 195 | ``` 196 | 197 | #### Example: Compute and display the mel filterbank. 
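The filters are half-overlapping triangles spaced uniformly on the mel scale, using the common correspondence mel = 2595*log10(1 + f/700) (the same one used inside the module); a minimal sketch of the conversion (hypothetical helper names):

```
# Hertz-to-mel correspondence used by the filterbank, and its inverse
import numpy as np

def hertz_to_mel(frequency_value):
    return 2595*np.log10(1+frequency_value/700)

def mel_to_hertz(mel_value):
    return 700*(np.power(10, mel_value/2595)-1)

# 1 kHz sits near 1000 mels by construction of the scale
print(hertz_to_mel(1000))  # ~1000
```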

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Compute the mel filterbank using some parameters
sampling_frequency = 44100
window_length = pow(2, int(np.ceil(np.log2(0.04 * sampling_frequency))))
number_mels = 128
mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)

# Display the mel filterbank
plt.figure(figsize=(14, 5))
plt.imshow(mel_filterbank.toarray(), aspect="auto", cmap="jet", origin="lower")
plt.title("Mel filterbank")
plt.xlabel("Frequency index")
plt.ylabel("Mel index")
plt.tight_layout()
plt.show()
```


### melspectrogram

Compute the mel spectrogram using a mel filterbank.

```
mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)

Inputs:
    audio_signal: audio signal (number_samples,)
    window_function: window function (window_length,)
    step_length: step length in samples
    mel_filterbank: mel filterbank (number_mels, number_frequencies)
Output:
    mel_spectrogram: mel spectrogram (number_mels, number_times)
```

#### Example: Compute and display the mel spectrogram.

```
# Import the needed modules
import numpy as np
import scipy.signal
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Set the parameters for the Fourier analysis
window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
window_function = scipy.signal.hamming(window_length, sym=False)
step_length = int(window_length/2)

# Compute the mel filterbank
number_mels = 128
mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)

# Compute the mel spectrogram using the filterbank
mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)

# Display the mel spectrogram in dB, seconds, and Hz
number_samples = len(audio_signal)
plt.figure(figsize=(14, 5))
zaf.melspecshow(mel_spectrogram, number_samples, sampling_frequency, window_length, xtick_step=1)
plt.title("Mel spectrogram (dB)")
plt.tight_layout()
plt.show()
```


### mfcc

Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.

```
audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients)

Inputs:
    audio_signal: audio signal (number_samples,)
    window_function: window function (window_length,)
    step_length: step length in samples
    mel_filterbank: mel filterbank (number_mels, number_frequencies)
    number_coefficients: number of coefficients (without the 0th coefficient)
Output:
    audio_mfcc: audio MFCCs (number_coefficients, number_times)
```

#### Example: Compute and display the MFCCs, delta MFCCs, and delta-delta MFCCs.
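The example below computes delta features with np.diff, which shortens the time axis by one frame per order; if frame-aligned features are needed, the differences can be padded back (a minimal sketch, hypothetical helper name):

```
# Pad first-order differences so all feature matrices keep the same number of frames
import numpy as np

def delta_features(feature_matrix):
    difference_matrix = np.diff(feature_matrix, n=1, axis=1)
    return np.pad(difference_matrix, ((0, 0), (1, 0)), "edge")
```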
294 | 295 | ``` 296 | # Import the needed modules 297 | import numpy as np 298 | import scipy.signal 299 | import zaf 300 | import matplotlib.pyplot as plt 301 | 302 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels 303 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 304 | audio_signal = np.mean(audio_signal, 1) 305 | 306 | # Set the parameters for the Fourier analysis 307 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency)))) 308 | window_function = scipy.signal.hamming(window_length, sym=False) 309 | step_length = int(window_length/2) 310 | 311 | # Compute the mel filterbank 312 | number_mels = 40 313 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels) 314 | 315 | # Compute the MFCCs using the filterbank 316 | number_coefficients = 20 317 | audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients) 318 | 319 | # Compute the delta and delta-delta MFCCs 320 | audio_dmfcc = np.diff(audio_mfcc, n=1, axis=1) 321 | audio_ddmfcc = np.diff(audio_dmfcc, n=1, axis=1) 322 | 323 | # Display the MFCCs, delta MFCCs, and delta-delta MFCCs in seconds 324 | number_samples = len(audio_signal) 325 | xtick_step = 1 326 | plt.figure(figsize=(14, 7)) 327 | plt.subplot(3, 1, 1) 328 | zaf.mfccshow(audio_mfcc, number_samples, sampling_frequency, xtick_step), plt.title("MFCCs") 329 | plt.subplot(3, 1, 2) 330 | zaf.mfccshow(audio_dmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta MFCCs") 331 | plt.subplot(3, 1, 3) 332 | zaf.mfccshow(audio_ddmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta-delta MFCCs") 333 | plt.tight_layout() 334 | plt.show() 335 | ``` 336 | 337 | 338 | 339 | 340 | ### cqtkernel 341 | 342 | Compute the constant-Q transform (CQT) kernel. 343 | 344 | ``` 345 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency) 346 | 347 | Inputs: 348 | sampling_frequency: sampling frequency in Hz 349 | octave_resolution: number of frequency channels per octave 350 | minimum_frequency: minimum frequency in Hz 351 | maximum_frequency: maximum frequency in Hz 352 | Output: 353 | cqt_kernel: CQT kernel (sparse) (number_frequencies, fft_length) 354 | ``` 355 | 356 | #### Example: Compute and display the CQT kernel. 357 | 358 | ``` 359 | # Import the needed modules 360 | import numpy as np 361 | import zaf 362 | import matplotlib.pyplot as plt 363 | 364 | # Set the parameters for the CQT kernel 365 | sampling_frequency = 44100 366 | octave_resolution = 24 367 | minimum_frequency = 55 368 | maximum_frequency = sampling_frequency/2 369 | 370 | # Compute the CQT kernel 371 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency) 372 | 373 | # Display the magnitude CQT kernel 374 | plt.figure(figsize=(14, 5)) 375 | plt.imshow(np.absolute(cqt_kernel).toarray(), aspect="auto", cmap="jet", origin="lower") 376 | plt.title("Magnitude CQT kernel") 377 | plt.xlabel("FFT index") 378 | plt.ylabel("CQT index") 379 | plt.tight_layout() 380 | plt.show() 381 | ``` 382 | 383 | 384 | 385 | 386 | ### cqtspectrogram 387 | 388 | Compute the constant-Q transform (CQT) spectrogram using a CQT kernel. 
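(With, for example, time_resolution = 25 time frames per second at a sampling frequency of 44100 Hz, successive frames are round(44100/25) = 1764 samples apart.)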

```
cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)

Inputs:
    audio_signal: audio signal (number_samples,)
    sampling_frequency: sampling frequency in Hz
    time_resolution: number of time frames per second
    cqt_kernel: CQT kernel (number_frequencies, fft_length)
Output:
    cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)
```

#### Example: Compute and display the CQT spectrogram.

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Compute the CQT kernel
octave_resolution = 24
minimum_frequency = 55
maximum_frequency = 3520
cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)

# Compute the CQT spectrogram using the kernel
time_resolution = 25
cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)

# Display the CQT spectrogram in dB, seconds, and Hz
plt.figure(figsize=(14, 5))
zaf.cqtspecshow(cqt_spectrogram, time_resolution, octave_resolution, minimum_frequency, xtick_step=1)
plt.title("CQT spectrogram (dB)")
plt.tight_layout()
plt.show()
```


### cqtchromagram

Compute the constant-Q transform (CQT) chromagram using a CQT kernel.

```
cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)

Inputs:
    audio_signal: audio signal (number_samples,)
    sampling_frequency: sampling frequency in Hz
    time_resolution: number of time frames per second
    octave_resolution: number of frequency channels per octave
    cqt_kernel: CQT kernel (number_frequencies, fft_length)
Output:
    cqt_chromagram: CQT chromagram (number_chromas, number_times)
```

#### Example: Compute and display the CQT chromagram.

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Compute the CQT kernel
octave_resolution = 24
minimum_frequency = 55
maximum_frequency = 3520
cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)

# Compute the CQT chromagram using the kernel
time_resolution = 25
cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)

# Display the CQT chromagram in seconds
plt.figure(figsize=(14, 3))
zaf.cqtchromshow(cqt_chromagram, time_resolution, xtick_step=1)
plt.title("CQT chromagram")
plt.tight_layout()
plt.show()
```


### dct

Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
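For instance, the DCT-II of a length-N frame can be obtained from a same-length FFT after an even-odd reordering of the samples (one standard construction, shown here for intuition; not necessarily the exact mapping used internally):

```
# DCT-II via a same-length FFT (even-odd reordering), in the "ortho" convention
import numpy as np
import scipy.fftpack

def dct2_via_fft(audio_segment):
    window_length = len(audio_segment)
    # Even-indexed samples first, then the odd-indexed samples reversed
    reordered_segment = np.concatenate((audio_segment[0::2], audio_segment[1::2][::-1]))
    phase_factor = np.exp(-1j*np.pi*np.arange(window_length)/(2*window_length))
    audio_dct = np.real(phase_factor*np.fft.fft(reordered_segment))
    # Orthonormal scaling
    audio_dct[0] = audio_dct[0]/np.sqrt(window_length)
    audio_dct[1:] = audio_dct[1:]*np.sqrt(2/window_length)
    return audio_dct

# Should agree with SciPy to machine precision
audio_segment = np.random.randn(1024)
print(np.max(np.abs(dct2_via_fft(audio_segment)-scipy.fftpack.dct(audio_segment, type=2, norm="ortho"))))
```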

```
audio_dct = zaf.dct(audio_signal, dct_type)

Inputs:
    audio_signal: audio signal (window_length,)
    dct_type: DCT type (1, 2, 3, or 4)
Output:
    audio_dct: audio DCT (number_frequencies,)
```

#### Example: Compute the 4 different DCTs and compare them to SciPy's DCTs.

```
# Import the needed modules
import numpy as np
import zaf
import scipy.fftpack
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Get an audio segment for a given window length
window_length = 1024
audio_segment = audio_signal[0:window_length]

# Compute the DCT-I, II, III, and IV
audio_dct1 = zaf.dct(audio_segment, 1)
audio_dct2 = zaf.dct(audio_segment, 2)
audio_dct3 = zaf.dct(audio_segment, 3)
audio_dct4 = zaf.dct(audio_segment, 4)

# Compute SciPy's DCT-I, II, III, and IV (orthogonalized)
scipy_dct1 = scipy.fftpack.dct(audio_segment, type=1, norm="ortho")
scipy_dct2 = scipy.fftpack.dct(audio_segment, type=2, norm="ortho")
scipy_dct3 = scipy.fftpack.dct(audio_segment, type=3, norm="ortho")
scipy_dct4 = scipy.fftpack.dct(audio_segment, type=4, norm="ortho")

# Plot the DCT-I, II, III, and IV, SciPy's versions, and their differences
plt.figure(figsize=(14, 7))
plt.subplot(3, 4, 1), plt.plot(audio_dct1), plt.autoscale(tight=True), plt.title("DCT-I")
plt.subplot(3, 4, 2), plt.plot(audio_dct2), plt.autoscale(tight=True), plt.title("DCT-II")
plt.subplot(3, 4, 3), plt.plot(audio_dct3), plt.autoscale(tight=True), plt.title("DCT-III")
plt.subplot(3, 4, 4), plt.plot(audio_dct4), plt.autoscale(tight=True), plt.title("DCT-IV")
plt.subplot(3, 4, 5), plt.plot(scipy_dct1), plt.autoscale(tight=True), plt.title("SciPy's DCT-I")
plt.subplot(3, 4, 6), plt.plot(scipy_dct2), plt.autoscale(tight=True), plt.title("SciPy's DCT-II")
plt.subplot(3, 4, 7), plt.plot(scipy_dct3), plt.autoscale(tight=True), plt.title("SciPy's DCT-III")
plt.subplot(3, 4, 8), plt.plot(scipy_dct4), plt.autoscale(tight=True), plt.title("SciPy's DCT-IV")
plt.subplot(3, 4, 9), plt.plot(audio_dct1-scipy_dct1), plt.autoscale(tight=True), plt.title("DCT-I - SciPy's DCT-I")
plt.subplot(3, 4, 10), plt.plot(audio_dct2-scipy_dct2), plt.autoscale(tight=True), plt.title("DCT-II - SciPy's DCT-II")
plt.subplot(3, 4, 11), plt.plot(audio_dct3-scipy_dct3), plt.autoscale(tight=True), plt.title("DCT-III - SciPy's DCT-III")
plt.subplot(3, 4, 12), plt.plot(audio_dct4-scipy_dct4), plt.autoscale(tight=True), plt.title("DCT-IV - SciPy's DCT-IV")
plt.tight_layout()
plt.show()
```


### dst

Compute the discrete sine transform (DST) using the fast Fourier transform (FFT).

```
audio_dst = zaf.dst(audio_signal, dst_type)

Inputs:
    audio_signal: audio signal (window_length,)
    dst_type: DST type (1, 2, 3, or 4)
Output:
    audio_dst: audio DST (number_frequencies,)
```

#### Example: Compute the 4 different DSTs and compare their respective inverses with the original audio.
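In the orthonormal convention used here, DST-I and DST-IV are their own inverses while DST-II and DST-III invert each other, which is what the example below exploits; a compact statement of the pairing (minimal sketch, hypothetical helper):

```
# Inverse DST types: I and IV are self-inverse, II and III invert each other
import zaf

inverse_dst_type = {1: 1, 2: 3, 3: 2, 4: 4}

def idst(audio_dst, dst_type):
    return zaf.dst(audio_dst, inverse_dst_type[dst_type])
```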

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Get an audio segment for a given window length
window_length = 1024
audio_segment = audio_signal[0:window_length]

# Compute the DST-I, II, III, and IV
audio_dst1 = zaf.dst(audio_segment, 1)
audio_dst2 = zaf.dst(audio_segment, 2)
audio_dst3 = zaf.dst(audio_segment, 3)
audio_dst4 = zaf.dst(audio_segment, 4)

# Compute their respective inverses, i.e., DST-I, III, II, and IV
audio_idst1 = zaf.dst(audio_dst1, 1)
audio_idst2 = zaf.dst(audio_dst2, 3)
audio_idst3 = zaf.dst(audio_dst3, 2)
audio_idst4 = zaf.dst(audio_dst4, 4)

# Plot the DST-I, II, III, and IV, their respective inverses, and their differences with the original audio segment
plt.figure(figsize=(14, 7))
plt.subplot(3, 4, 1), plt.plot(audio_dst1), plt.autoscale(tight=True), plt.title("DST-I")
plt.subplot(3, 4, 2), plt.plot(audio_dst2), plt.autoscale(tight=True), plt.title("DST-II")
plt.subplot(3, 4, 3), plt.plot(audio_dst3), plt.autoscale(tight=True), plt.title("DST-III")
plt.subplot(3, 4, 4), plt.plot(audio_dst4), plt.autoscale(tight=True), plt.title("DST-IV")
plt.subplot(3, 4, 5), plt.plot(audio_idst1), plt.autoscale(tight=True), plt.title("Inverse DST-I (DST-I)")
plt.subplot(3, 4, 6), plt.plot(audio_idst2), plt.autoscale(tight=True), plt.title("Inverse DST-II (DST-III)")
plt.subplot(3, 4, 7), plt.plot(audio_idst3), plt.autoscale(tight=True), plt.title("Inverse DST-III (DST-II)")
plt.subplot(3, 4, 8), plt.plot(audio_idst4), plt.autoscale(tight=True), plt.title("Inverse DST-IV (DST-IV)")
plt.subplot(3, 4, 9), plt.plot(audio_idst1-audio_segment), plt.autoscale(tight=True)
plt.title("Inverse DST-I - audio segment")
plt.subplot(3, 4, 10), plt.plot(audio_idst2-audio_segment), plt.autoscale(tight=True)
plt.title("Inverse DST-II - audio segment")
plt.subplot(3, 4, 11), plt.plot(audio_idst3-audio_segment), plt.autoscale(tight=True)
plt.title("Inverse DST-III - audio segment")
plt.subplot(3, 4, 12), plt.plot(audio_idst4-audio_segment), plt.autoscale(tight=True)
plt.title("Inverse DST-IV - audio segment")
plt.tight_layout()
plt.show()
```


### mdct

Compute the modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).

```
audio_mdct = zaf.mdct(audio_signal, window_function)

Inputs:
    audio_signal: audio signal (number_samples,)
    window_function: window function (window_length,)
Output:
    audio_mdct: audio MDCT (number_frequencies, number_times)
```

#### Example: Compute and display the MDCT as used in the AC-3 audio coding format.
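AC-3 uses a Kaiser-Bessel-derived (KBD) window; for perfect reconstruction, an MDCT window must satisfy the Princen-Bradley condition w(n)^2 + w(n+N/2)^2 = 1. The sine (slope) window used in the inverse MDCT example further below satisfies it exactly, which is easy to verify (minimal sketch):

```
# Princen-Bradley check for the sine (slope) window
import numpy as np

window_length = 512
window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))

half_length = int(window_length/2)
condition = window_function[0:half_length]**2 + window_function[half_length:window_length]**2
print(np.max(np.abs(condition-1)))  # ~1e-16
```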

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Compute the Kaiser-Bessel-derived (KBD) window as used in the AC-3 audio coding format
window_length = 512
alpha_value = 5
window_function = np.kaiser(int(window_length/2)+1, alpha_value*np.pi)
window_function2 = np.cumsum(window_function[1:int(window_length/2)])
window_function = np.sqrt(np.concatenate((window_function2, window_function2[int(window_length/2)::-1]))
                          /np.sum(window_function))

# Compute the MDCT
audio_mdct = zaf.mdct(audio_signal, window_function)

# Display the MDCT in dB, seconds, and Hz
number_samples = len(audio_signal)
plt.figure(figsize=(14, 7))
zaf.specshow(np.absolute(audio_mdct), number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
plt.title("MDCT (dB)")
plt.tight_layout()
plt.show()
```


### imdct

Compute the inverse modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).

```
audio_signal = zaf.imdct(audio_mdct, window_function)

Inputs:
    audio_mdct: audio MDCT (number_frequencies, number_times)
    window_function: window function (window_length,)
Output:
    audio_signal: audio signal (number_samples,)
```

#### Example: Verify that the MDCT is perfectly invertible.

```
# Import the needed modules
import numpy as np
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Compute the MDCT with a slope function as used in the Vorbis audio coding format
window_length = 2048
window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))
audio_mdct = zaf.mdct(audio_signal, window_function)

# Compute the inverse MDCT
audio_signal2 = zaf.imdct(audio_mdct, window_function)
audio_signal2 = audio_signal2[0:len(audio_signal)]

# Compute the differences between the original signal and the resynthesized one
audio_differences = audio_signal-audio_signal2
y_max = np.max(np.absolute(audio_differences))

# Display the original and resynthesized signals, and their differences in seconds
xtick_step = 1
plt.figure(figsize=(14, 7))
plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
plt.ylim(-1, 1), plt.title("Original signal")
plt.subplot(3, 1, 2), zaf.sigplot(audio_signal2, sampling_frequency, xtick_step)
plt.ylim(-1, 1), plt.title("Resynthesized signal")
plt.subplot(3, 1, 3), zaf.sigplot(audio_differences, sampling_frequency, xtick_step)
plt.ylim(-y_max, y_max), plt.title("Original - resynthesized signal")
plt.tight_layout()
plt.show()
```


## examples.ipynb

This Jupyter notebook shows some examples for the different functions of the Python module `zaf`.
723 | 724 | See [Jupyter notebook viewer](https://nbviewer.jupyter.org/github/zafarrafii/Zaf-Python/blob/master/examples.ipynb). 725 | 726 | 727 | ## audio_file.wav 728 | 729 | 23 second audio excerpt from the song *Que Pena Tanto Faz* performed by *Tamy*. 730 | 731 | 732 | # Author 733 | 734 | - Zafar Rafii 735 | - http://zafarrafii.com/ 736 | - [CV](http://zafarrafii.com/Zafar%20Rafii%20-%20C.V..pdf) 737 | - [GitHub](https://github.com/zafarrafii) 738 | - [LinkedIn](https://www.linkedin.com/in/zafarrafii/) 739 | - [Google Scholar](https://scholar.google.com/citations?user=8wbS2EsAAAAJ&hl=en) 740 | -------------------------------------------------------------------------------- /audio_file.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/audio_file.wav -------------------------------------------------------------------------------- /images/cqtchromagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtchromagram.png -------------------------------------------------------------------------------- /images/cqtkernel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtkernel.png -------------------------------------------------------------------------------- /images/cqtspectrogram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtspectrogram.png -------------------------------------------------------------------------------- /images/dct.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/dct.png -------------------------------------------------------------------------------- /images/dst.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/dst.png -------------------------------------------------------------------------------- /images/imdct.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/imdct.png -------------------------------------------------------------------------------- /images/istft.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/istft.png -------------------------------------------------------------------------------- /images/mdct.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/mdct.png -------------------------------------------------------------------------------- /images/melfilterbank.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/melfilterbank.png
--------------------------------------------------------------------------------
/images/melspectrogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/melspectrogram.png
--------------------------------------------------------------------------------
/images/mfcc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/mfcc.png
--------------------------------------------------------------------------------
/images/stft.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/stft.png
--------------------------------------------------------------------------------
/zaf.py:
--------------------------------------------------------------------------------
"""
This Python module implements a number of functions for audio signal analysis.

Functions:
    stft - Compute the short-time Fourier transform (STFT).
    istft - Compute the inverse STFT.
    melfilterbank - Compute the mel filterbank.
    melspectrogram - Compute the mel spectrogram using a mel filterbank.
    mfcc - Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
    cqtkernel - Compute the constant-Q transform (CQT) kernel.
    cqtspectrogram - Compute the CQT spectrogram using a CQT kernel.
    cqtchromagram - Compute the CQT chromagram using a CQT kernel.
    dct - Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
    dst - Compute the discrete sine transform (DST) using the FFT.
    mdct - Compute the modified discrete cosine transform (MDCT) using the FFT.
    imdct - Compute the inverse MDCT using the FFT.

Other:
    wavread - Read a WAVE file (using SciPy).
    wavwrite - Write a WAVE file (using SciPy).
    sigplot - Plot a signal in seconds.
    specshow - Display a spectrogram in dB, seconds, and Hz.
    melspecshow - Display a mel spectrogram in dB, seconds, and Hz.
    mfccshow - Display MFCCs in seconds.
    cqtspecshow - Display a CQT spectrogram in dB, seconds, and Hz.
    cqtchromshow - Display a CQT chromagram in seconds.

Author:
    Zafar Rafii
    zafarrafii@gmail.com
    http://zafarrafii.com
    https://github.com/zafarrafii
    https://www.linkedin.com/in/zafarrafii/
    08/24/21
"""

import numpy as np
import scipy.sparse
import scipy.signal
import scipy.fftpack
import scipy.io.wavfile
import matplotlib.pyplot as plt


def stft(audio_signal, window_function, step_length):
    """
    Compute the short-time Fourier transform (STFT).

    Inputs:
        audio_signal: audio signal (number_samples,)
        window_function: window function (window_length,)
        step_length: step length in samples
    Output:
        audio_stft: audio STFT (window_length, number_frames)

    Example: Compute and display the spectrogram from an audio file.
57 | # Import the needed modules 58 | import numpy as np 59 | import scipy.signal 60 | import zaf 61 | import matplotlib.pyplot as plt 62 | 63 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels 64 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 65 | audio_signal = np.mean(audio_signal, 1) 66 | 67 | # Set the window duration in seconds (audio is stationary around 40 milliseconds) 68 | window_duration = 0.04 69 | 70 | # Derive the window length in samples (use powers of 2 for faster FFT and constant overlap-add (COLA)) 71 | window_length = pow(2, int(np.ceil(np.log2(window_duration*sampling_frequency)))) 72 | 73 | # Compute the window function (use SciPy's periodic Hamming window for COLA as NumPy's Hamming window is symmetric) 74 | window_function = scipy.signal.hamming(window_length, sym=False) 75 | 76 | # Set the step length in samples (half of the window length for COLA) 77 | step_length = int(window_length/2) 78 | 79 | # Compute the STFT 80 | audio_stft = zaf.stft(audio_signal, window_function, step_length) 81 | 82 | # Derive the magnitude spectrogram (without the DC component and the mirrored frequencies) 83 | audio_spectrogram = np.absolute(audio_stft[1:int(window_length/2)+1, :]) 84 | 85 | # Display the spectrogram in dB, seconds, and Hz 86 | number_samples = len(audio_signal) 87 | plt.figure(figsize=(14, 7)) 88 | zaf.specshow(audio_spectrogram, number_samples, sampling_frequency, xtick_step=1, ytick_step=1000) 89 | plt.title("Spectrogram (dB)") 90 | plt.tight_layout() 91 | plt.show() 92 | """ 93 | 94 | # Get the number of samples and the window length in samples 95 | number_samples = len(audio_signal) 96 | window_length = len(window_function) 97 | 98 | # Derive the zero-padding length at the start and at the end of the signal to center the windows 99 | padding_length = int(np.floor(window_length / 2)) 100 | 101 | # Compute the number of time frames given the zero-padding at the start and at the end of the signal 102 | number_times = ( 103 | int( 104 | np.ceil( 105 | ((number_samples + 2 * padding_length) - window_length) / step_length 106 | ) 107 | ) 108 | + 1 109 | ) 110 | 111 | # Zero-pad the start and the end of the signal to center the windows 112 | audio_signal = np.pad( 113 | audio_signal, 114 | ( 115 | padding_length, 116 | ( 117 | number_times * step_length 118 | + (window_length - step_length) 119 | - padding_length 120 | ) 121 | - number_samples, 122 | ), 123 | "constant", 124 | constant_values=0, 125 | ) 126 | 127 | # Initialize the STFT 128 | audio_stft = np.zeros((window_length, number_times)) 129 | 130 | # Loop over the time frames 131 | i = 0 132 | for j in range(number_times): 133 | 134 | # Window the signal 135 | audio_stft[:, j] = audio_signal[i : i + window_length] * window_function 136 | i = i + step_length 137 | 138 | # Compute the Fourier transform of the frames using the FFT 139 | audio_stft = np.fft.fft(audio_stft, axis=0) 140 | 141 | return audio_stft 142 | 143 | 144 | def istft(audio_stft, window_function, step_length): 145 | """ 146 | Compute the inverse short-time Fourier transform (STFT). 147 | 148 | Inputs: 149 | audio_stft: audio STFT (window_length, number_frames) 150 | window_function: window function (window_length,) 151 | step_length: step length in samples 152 | Output: 153 | audio_signal: audio signal (number_samples,) 154 | 155 | Example: Estimate the center and the sides from a stereo audio file. 
156 | # Import the needed modules 157 | import numpy as np 158 | import scipy.signal 159 | import zaf 160 | import matplotlib.pyplot as plt 161 | 162 | # Read the (stereo) audio signal with its sampling frequency in Hz 163 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 164 | 165 | # Set the parameters for the STFT 166 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency)))) 167 | window_function = scipy.signal.hamming(window_length, sym=False) 168 | step_length = int(window_length/2) 169 | 170 | # Compute the STFTs for the left and right channels 171 | audio_stft1 = zaf.stft(audio_signal[:, 0], window_function, step_length) 172 | audio_stft2 = zaf.stft(audio_signal[:, 1], window_function, step_length) 173 | 174 | # Derive the magnitude spectrograms (with DC component) for the left and right channels 175 | number_frequencies = int(window_length/2)+1 176 | audio_spectrogram1 = abs(audio_stft1[0:number_frequencies, :]) 177 | audio_spectrogram2 = abs(audio_stft2[0:number_frequencies, :]) 178 | 179 | # Estimate the time-frequency masks for the left and right channels for the center 180 | center_mask1 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram1 181 | center_mask2 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram2 182 | 183 | # Derive the STFTs for the left and right channels for the center (with mirrored frequencies) 184 | center_stft1 = np.multiply(np.concatenate((center_mask1, center_mask1[-2:0:-1, :])), audio_stft1) 185 | center_stft2 = np.multiply(np.concatenate((center_mask2, center_mask2[-2:0:-1, :])), audio_stft2) 186 | 187 | # Synthesize the signals for the left and right channels for the center 188 | center_signal1 = zaf.istft(center_stft1, window_function, step_length) 189 | center_signal2 = zaf.istft(center_stft2, window_function, step_length) 190 | 191 | # Derive the final stereo center and sides signals 192 | center_signal = np.stack((center_signal1, center_signal2), axis=1) 193 | center_signal = center_signal[0:np.shape(audio_signal)[0], :] 194 | sides_signal = audio_signal-center_signal 195 | 196 | # Write the center and sides signals 197 | zaf.wavwrite(center_signal, sampling_frequency, "center_file.wav") 198 | zaf.wavwrite(sides_signal, sampling_frequency, "sides_file.wav") 199 | 200 | # Display the original, center, and sides signals in seconds 201 | xtick_step = 1 202 | plt.figure(figsize=(14, 7)) 203 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step) 204 | plt.ylim(-1, 1), plt.title("Original signal") 205 | plt.subplot(3, 1, 2), zaf.sigplot(center_signal, sampling_frequency, xtick_step) 206 | plt.ylim(-1, 1), plt.title("Center signal") 207 | plt.subplot(3, 1, 3), zaf.sigplot(sides_signal, sampling_frequency, xtick_step) 208 | plt.ylim(-1, 1), plt.title("Sides signal") 209 | plt.tight_layout() 210 | plt.show() 211 | """ 212 | 213 | # Get the window length in samples and the number of time frames 214 | window_length, number_times = np.shape(audio_stft) 215 | 216 | # Compute the number of samples for the signal 217 | number_samples = number_times * step_length + (window_length - step_length) 218 | 219 | # Initialize the signal 220 | audio_signal = np.zeros(number_samples) 221 | 222 | # Compute the inverse Fourier transform of the frames and take the real part to ensure real values 223 | audio_stft = np.real(np.fft.ifft(audio_stft, axis=0)) 224 | 225 | # Loop over the time frames 226 | i = 0 227 | for j in range(number_times): 228 | 229 | # Perform a constant 
overlap-add (COLA) of the signal (with proper window function and step length)
        audio_signal[i : i + window_length] = (
            audio_signal[i : i + window_length] + audio_stft[:, j]
        )
        i = i + step_length

    # Remove the zero-padding at the start and at the end of the signal
    audio_signal = audio_signal[
        window_length - step_length : number_samples - (window_length - step_length)
    ]

    # Normalize the signal by the gain introduced by the COLA (if any)
    audio_signal = audio_signal / sum(window_function[0:window_length:step_length])

    return audio_signal


def melfilterbank(sampling_frequency, window_length, number_mels):
    """
    Compute the mel filterbank.

    Inputs:
        sampling_frequency: sampling frequency in Hz
        window_length: window length for the Fourier analysis in samples
        number_mels: number of mel filters
    Output:
        mel_filterbank: mel filterbank (sparse) (number_mels, number_frequencies)

    Example: Compute and display the mel filterbank.
        # Import the needed modules
        import numpy as np
        import zaf
        import matplotlib.pyplot as plt

        # Compute the mel filterbank using some parameters
        sampling_frequency = 44100
        window_length = pow(2, int(np.ceil(np.log2(0.04 * sampling_frequency))))
        number_mels = 128
        mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)

        # Display the mel filterbank
        plt.figure(figsize=(14, 5))
        plt.imshow(mel_filterbank.toarray(), aspect="auto", cmap="jet", origin="lower")
        plt.title("Mel filterbank")
        plt.xlabel("Frequency index")
        plt.ylabel("Mel index")
        plt.tight_layout()
        plt.show()
    """

    # Compute the minimum and maximum mels
    minimum_mel = 2595 * np.log10(1 + (sampling_frequency / window_length) / 700)
    maximum_mel = 2595 * np.log10(1 + (sampling_frequency / 2) / 700)

    # Derive the width of the half-overlapping filters in the mel scale (constant)
    filter_width = 2 * (maximum_mel - minimum_mel) / (number_mels + 1)

    # Compute the start and end indices of the filters in the mel scale (linearly spaced)
    filter_indices = np.arange(minimum_mel, maximum_mel + 1, filter_width / 2)

    # Derive the indices of the filters in the linear frequency scale (log spaced)
    filter_indices = np.round(
        700
        * (np.power(10, filter_indices / 2595) - 1)
        * window_length
        / sampling_frequency
    ).astype(int)

    # Initialize the mel filterbank
    mel_filterbank = np.zeros((number_mels, int(window_length / 2)))

    # Loop over the filters
    for i in range(number_mels):

        # Compute the left and right sides of the triangular filters
        # (this is more accurate than creating triangular filters directly)
        mel_filterbank[i, filter_indices[i] - 1 : filter_indices[i + 1]] = np.linspace(
            0,
            1,
            num=filter_indices[i + 1] - filter_indices[i] + 1,
        )
        mel_filterbank[
            i, filter_indices[i + 1] - 1 : filter_indices[i + 2]
        ] = np.linspace(
            1,
            0,
            num=filter_indices[i + 2] - filter_indices[i + 1] + 1,
        )

    # Make the mel filterbank sparse by saving it as a compressed sparse row matrix
    mel_filterbank = scipy.sparse.csr_matrix(mel_filterbank)

    return mel_filterbank


def melspectrogram(audio_signal, 
window_function, step_length, mel_filterbank): 325 | """ 326 | Compute the mel spectrogram using a mel filterbank. 327 | 328 | Inputs: 329 | audio_signal: audio signal (number_samples,) 330 | window_function: window function (window_length,) 331 | step_length: step length in samples 332 | mel_filterbank: mel filterbank (number_mels, number_frequencies) 333 | Output: 334 | mel_spectrogram: mel spectrogram (number_mels, number_times) 335 | 336 | Example: Compute and display the mel spectrogram. 337 | # Import the needed modules 338 | import numpy as np 339 | import scipy.signal 340 | import zaf 341 | import matplotlib.pyplot as plt 342 | 343 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels 344 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 345 | audio_signal = np.mean(audio_signal, 1) 346 | 347 | # Set the parameters for the Fourier analysis 348 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency)))) 349 | window_function = scipy.signal.hamming(window_length, sym=False) 350 | step_length = int(window_length/2) 351 | 352 | # Compute the mel filterbank 353 | number_mels = 128 354 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels) 355 | 356 | # Compute the mel spectrogram using the filterbank 357 | mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank) 358 | 359 | # Display the mel spectrogram in dB, seconds, and Hz 360 | number_samples = len(audio_signal) 361 | plt.figure(figsize=(14, 5)) 362 | zaf.melspecshow(mel_spectrogram, number_samples, sampling_frequency, window_length, xtick_step=1) 363 | plt.title("Mel spectrogram (dB)") 364 | plt.tight_layout() 365 | plt.show() 366 | """ 367 | 368 | # Compute the magnitude spectrogram (without the DC component and the mirrored frequencies) 369 | audio_stft = stft(audio_signal, window_function, step_length) 370 | audio_spectrogram = abs(audio_stft[1 : int(len(window_function) / 2) + 1, :]) 371 | 372 | # Compute the mel spectrogram by using the filterbank 373 | mel_spectrogram = np.matmul(mel_filterbank.toarray(), audio_spectrogram) 374 | 375 | return mel_spectrogram 376 | 377 | 378 | def mfcc( 379 | audio_signal, window_function, step_length, mel_filterbank, number_coefficients 380 | ): 381 | """ 382 | Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank. 383 | 384 | Inputs: 385 | audio_signal: audio signal (number_samples,) 386 | window_function: window function (window_length,) 387 | step_length: step length in samples 388 | mel_filterbank: mel filterbank (number_mels, number_frequencies) 389 | number_coefficients: number of coefficients (without the 0th coefficient) 390 | Output: 391 | audio_mfcc: audio MFCCs (number_coefficients, number_times) 392 | 393 | Example: Compute and display the MFCCs, delta MFCCs, and delta-delta MFCCs. 
394 | # Import the needed modules 395 | import numpy as np 396 | import scipy.signal 397 | import zaf 398 | import matplotlib.pyplot as plt 399 | 400 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels 401 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav") 402 | audio_signal = np.mean(audio_signal, 1) 403 | 404 | # Set the parameters for the Fourier analysis 405 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency)))) 406 | window_function = scipy.signal.hamming(window_length, sym=False) 407 | step_length = int(window_length/2) 408 | 409 | # Compute the mel filterbank 410 | number_mels = 40 411 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels) 412 | 413 | # Compute the MFCCs using the filterbank 414 | number_coefficients = 20 415 | audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients) 416 | 417 | # Compute the delta and delta-delta MFCCs 418 | audio_dmfcc = np.diff(audio_mfcc, n=1, axis=1) 419 | audio_ddmfcc = np.diff(audio_dmfcc, n=1, axis=1) 420 | 421 | # Display the MFCCs, delta MFCCs, and delta-delta MFCCs in seconds 422 | number_samples = len(audio_signal) 423 | xtick_step = 1 424 | plt.figure(figsize=(14, 7)) 425 | plt.subplot(3, 1, 1) 426 | zaf.mfccshow(audio_mfcc, number_samples, sampling_frequency, xtick_step), plt.title("MFCCs") 427 | plt.subplot(3, 1, 2) 428 | zaf.mfccshow(audio_dmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta MFCCs") 429 | plt.subplot(3, 1, 3) 430 | zaf.mfccshow(audio_ddmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta-delta MFCCs") 431 | plt.tight_layout() 432 | plt.show() 433 | """ 434 | 435 | # Compute the power spectrogram (without the DC component and the mirrored frequencies) 436 | audio_stft = stft(audio_signal, window_function, step_length) 437 | audio_spectrogram = np.power( 438 | abs(audio_stft[1 : int(len(window_function) / 2) + 1, :]), 2 439 | ) 440 | 441 | # Compute the discrete cosine transform of the log magnitude spectrogram 442 | # mapped onto the mel scale using the filter bank 443 | audio_mfcc = scipy.fftpack.dct( 444 | np.log( 445 | np.matmul(mel_filterbank.toarray(), audio_spectrogram) + np.finfo(float).eps 446 | ), 447 | axis=0, 448 | norm="ortho", 449 | ) 450 | 451 | # Keep only the first coefficients (without the 0th) 452 | audio_mfcc = audio_mfcc[1 : number_coefficients + 1, :] 453 | 454 | return audio_mfcc 455 | 456 | 457 | def cqtkernel( 458 | sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency 459 | ): 460 | """ 461 | Compute the constant-Q transform (CQT) kernel. 462 | 463 | Inputs: 464 | sampling_frequency: sampling frequency in Hz 465 | octave_resolution: number of frequency channels per octave 466 | minimum_frequency: minimum frequency in Hz 467 | maximum_frequency: maximum frequency in Hz 468 | Output: 469 | cqt_kernel: CQT kernel (sparse) (number_frequencies, fft_length) 470 | 471 | Example: Compute and display a CQT kernel. 
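        # With these parameters, the quality factor will be Q = 1/(2**(1/24)-1), about 34.1,
        # and the FFT length the next power of two of 34.1*44100/55 (about 27363), i.e., 32768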
472 | # Import the needed modules 473 | import numpy as np 474 | import zaf 475 | import matplotlib.pyplot as plt 476 | 477 | # Set the parameters for the CQT kernel 478 | sampling_frequency = 44100 479 | octave_resolution = 24 480 | minimum_frequency = 55 481 | maximum_frequency = sampling_frequency/2 482 | 483 | # Compute the CQT kernel 484 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency) 485 | 486 | # Display the magnitude CQT kernel 487 | plt.figure(figsize=(14, 5)) 488 | plt.imshow(np.absolute(cqt_kernel).toarray(), aspect="auto", cmap="jet", origin="lower") 489 | plt.title("Magnitude CQT kernel") 490 | plt.xlabel("FFT index") 491 | plt.ylabel("CQT index") 492 | plt.tight_layout() 493 | plt.show() 494 | """ 495 | 496 | # Compute the constant ratio of frequency to resolution (= fk/(fk+1-fk)) 497 | quality_factor = 1 / (pow(2, 1 / octave_resolution) - 1) 498 | 499 | # Compute the number of frequency channels for the CQT 500 | number_frequencies = round( 501 | octave_resolution * np.log2(maximum_frequency / minimum_frequency) 502 | ) 503 | 504 | # Compute the window length for the FFT (= longest window for the minimum frequency) 505 | fft_length = int( 506 | pow( 507 | 2, np.ceil(np.log2(quality_factor * sampling_frequency / minimum_frequency)) 508 | ) 509 | ) 510 | 511 | # Initialize the (complex) CQT kernel 512 | cqt_kernel = np.zeros((number_frequencies, fft_length), dtype=complex) 513 | 514 | # Loop over the frequency channels 515 | for i in range(number_frequencies): 516 | 517 | # Derive the frequency value in Hz 518 | frequency_value = minimum_frequency * pow(2, i / octave_resolution) 519 | 520 | # Compute the window length in samples (nearest odd value to center the temporal kernel on 0) 521 | window_length = ( 522 | 2 * round(quality_factor * sampling_frequency / frequency_value / 2) + 1 523 | ) 524 | 525 | # Compute the temporal kernel for the current frequency (odd and symmetric) 526 | temporal_kernel = ( 527 | np.hamming(window_length) 528 | * np.exp( 529 | 2 530 | * np.pi 531 | * 1j 532 | * quality_factor 533 | * np.arange(-(window_length - 1) / 2, (window_length - 1) / 2 + 1) 534 | / window_length 535 | ) 536 | / window_length 537 | ) 538 | 539 | # Derive the pad width to center the temporal kernels 540 | pad_width = int((fft_length - window_length + 1) / 2) 541 | 542 | # Save the current temporal kernel at the center 543 | # (the zero-padded temporal kernels are not perfectly symmetric anymore because of the even length here) 544 | cqt_kernel[i, pad_width : pad_width + window_length] = temporal_kernel 545 | 546 | # Derive the spectral kernels by taking the FFT of the temporal kernels 547 | # (the spectral kernels are almost real because the temporal kernels are almost symmetric) 548 | cqt_kernel = np.fft.fft(cqt_kernel, axis=1) 549 | 550 | # Make the CQT kernel sparser by zeroing magnitudes below a threshold 551 | cqt_kernel[np.absolute(cqt_kernel) < 0.01] = 0 552 | 553 | # Make the CQT kernel sparse by saving it as a compressed sparse row matrix 554 | cqt_kernel = scipy.sparse.csr_matrix(cqt_kernel) 555 | 556 | # Get the final CQT kernel by using Parseval's theorem 557 | cqt_kernel = np.conjugate(cqt_kernel) / fft_length 558 | 559 | return cqt_kernel 560 | 561 | 562 | def cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel): 563 | """ 564 | Compute the constant-Q transform (CQT) spectrogram using a CQT kernel. 
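    (The spectrogram is computed frame by frame: with time_resolution = 25 and, for example, a 44.1 kHz sampling frequency, frames are round(44100/25) = 1764 samples apart, so the 23-second example file would yield floor(23*44100/1764) = 575 time frames.)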

    Inputs:
        audio_signal: audio signal (number_samples,)
        sampling_frequency: sampling frequency in Hz
        time_resolution: number of time frames per second
        cqt_kernel: CQT kernel (number_frequencies, fft_length)
    Output:
        cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)

    Example: Compute and display the CQT spectrogram.
        # Import the needed modules
        import numpy as np
        import zaf
        import matplotlib.pyplot as plt

        # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
        audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
        audio_signal = np.mean(audio_signal, 1)

        # Compute the CQT kernel
        octave_resolution = 24
        minimum_frequency = 55
        maximum_frequency = 3520
        cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)

        # Compute the CQT spectrogram using the kernel
        time_resolution = 25
        cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)

        # Display the CQT spectrogram in dB, seconds, and Hz
        plt.figure(figsize=(14, 5))
        zaf.cqtspecshow(cqt_spectrogram, time_resolution, octave_resolution, minimum_frequency, xtick_step=1)
        plt.title("CQT spectrogram (dB)")
        plt.tight_layout()
        plt.show()
    """

    # Derive the number of time samples per time frame
    step_length = round(sampling_frequency / time_resolution)

    # Compute the number of time frames
    number_times = int(np.floor(len(audio_signal) / step_length))

    # Get the number of frequency channels and the FFT length
    number_frequencies, fft_length = np.shape(cqt_kernel)

    # Zero-pad the signal to center the CQT
    audio_signal = np.pad(
        audio_signal,
        (
            int(np.ceil((fft_length - step_length) / 2)),
            int(np.floor((fft_length - step_length) / 2)),
        ),
        "constant",
        constant_values=(0, 0),
    )

    # Initialize the CQT spectrogram
    cqt_spectrogram = np.zeros((number_frequencies, number_times))

    # Loop over the time frames
    i = 0
    for j in range(number_times):

        # Compute the magnitude CQT using the kernel
        cqt_spectrogram[:, j] = np.absolute(
            cqt_kernel * np.fft.fft(audio_signal[i : i + fft_length])
        )
        i = i + step_length

    return cqt_spectrogram


def cqtchromagram(
    audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel
):
    """
    Compute the constant-Q transform (CQT) chromagram using a CQT kernel.

    Inputs:
        audio_signal: audio signal (number_samples,)
        sampling_frequency: sampling frequency in Hz
        time_resolution: number of time frames per second
        octave_resolution: number of frequency channels per octave
        cqt_kernel: CQT kernel (number_frequencies, fft_length)
    Output:
        cqt_chromagram: CQT chromagram (octave_resolution, number_times)

    Example: Compute and display the CQT chromagram.
654 |         # Import the needed modules
655 |         import numpy as np
656 |         import zaf
657 |         import matplotlib.pyplot as plt
658 | 
659 |         # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
660 |         audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
661 |         audio_signal = np.mean(audio_signal, 1)
662 | 
663 |         # Compute the CQT kernel
664 |         octave_resolution = 24
665 |         minimum_frequency = 55
666 |         maximum_frequency = 3520
667 |         cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
668 | 
669 |         # Compute the CQT chromagram using the kernel
670 |         time_resolution = 25
671 |         cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)
672 | 
673 |         # Display the CQT chromagram in seconds
674 |         plt.figure(figsize=(14, 3))
675 |         zaf.cqtchromshow(cqt_chromagram, time_resolution, xtick_step=1)
676 |         plt.title("CQT chromagram")
677 |         plt.tight_layout()
678 |         plt.show()
679 |     """
680 | 
681 |     # Compute the CQT spectrogram
682 |     cqt_spectrogram = cqtspectrogram(
683 |         audio_signal, sampling_frequency, time_resolution, cqt_kernel
684 |     )
685 | 
686 |     # Get the number of frequency channels and time frames
687 |     number_frequencies, number_times = np.shape(cqt_spectrogram)
688 | 
689 |     # Initialize the CQT chromagram
690 |     cqt_chromagram = np.zeros((octave_resolution, number_times))
691 | 
692 |     # Loop over the chroma channels
693 |     for i in range(octave_resolution):
694 | 
695 |         # Sum the magnitudes of the frequency channels for every chroma
696 |         cqt_chromagram[i, :] = np.sum(
697 |             cqt_spectrogram[i:number_frequencies:octave_resolution, :], axis=0
698 |         )
699 | 
700 |     return cqt_chromagram
701 | 
702 | 
703 | def dct(audio_signal, dct_type):
704 |     """
705 |     Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
706 | 
707 |     Inputs:
708 |         audio_signal: audio signal (window_length,)
709 |         dct_type: DCT type (1, 2, 3, or 4)
710 |     Output:
711 |         audio_dct: audio DCT (number_frequencies,)
712 | 
713 |     Example: Compute the 4 different DCTs and compare them to SciPy's DCTs.
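        # Note: the DCTs below are orthogonal, so they should match SciPy's norm="ortho"
        # versions up to numerical precision (the differences in the last row should be ~0)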
714 |         # Import the needed modules
715 |         import numpy as np
716 |         import zaf
717 |         import scipy.fftpack
718 |         import matplotlib.pyplot as plt
719 | 
720 |         # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
721 |         audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
722 |         audio_signal = np.mean(audio_signal, 1)
723 | 
724 |         # Get an audio segment for a given window length
725 |         window_length = 1024
726 |         audio_segment = audio_signal[0:window_length]
727 | 
728 |         # Compute the DCT-I, II, III, and IV
729 |         audio_dct1 = zaf.dct(audio_segment, 1)
730 |         audio_dct2 = zaf.dct(audio_segment, 2)
731 |         audio_dct3 = zaf.dct(audio_segment, 3)
732 |         audio_dct4 = zaf.dct(audio_segment, 4)
733 | 
734 |         # Compute SciPy's DCT-I, II, III, and IV (orthogonalized)
735 |         scipy_dct1 = scipy.fftpack.dct(audio_segment, type=1, norm="ortho")
736 |         scipy_dct2 = scipy.fftpack.dct(audio_segment, type=2, norm="ortho")
737 |         scipy_dct3 = scipy.fftpack.dct(audio_segment, type=3, norm="ortho")
738 |         scipy_dct4 = scipy.fftpack.dct(audio_segment, type=4, norm="ortho")
739 | 
740 |         # Plot the DCT-I, II, III, and IV, SciPy's versions, and their differences
741 |         plt.figure(figsize=(14, 7))
742 |         plt.subplot(3, 4, 1), plt.plot(audio_dct1), plt.autoscale(tight=True), plt.title("DCT-I")
743 |         plt.subplot(3, 4, 2), plt.plot(audio_dct2), plt.autoscale(tight=True), plt.title("DCT-II")
744 |         plt.subplot(3, 4, 3), plt.plot(audio_dct3), plt.autoscale(tight=True), plt.title("DCT-III")
745 |         plt.subplot(3, 4, 4), plt.plot(audio_dct4), plt.autoscale(tight=True), plt.title("DCT-IV")
746 |         plt.subplot(3, 4, 5), plt.plot(scipy_dct1), plt.autoscale(tight=True), plt.title("SciPy's DCT-I")
747 |         plt.subplot(3, 4, 6), plt.plot(scipy_dct2), plt.autoscale(tight=True), plt.title("SciPy's DCT-II")
748 |         plt.subplot(3, 4, 7), plt.plot(scipy_dct3), plt.autoscale(tight=True), plt.title("SciPy's DCT-III")
749 |         plt.subplot(3, 4, 8), plt.plot(scipy_dct4), plt.autoscale(tight=True), plt.title("SciPy's DCT-IV")
750 |         plt.subplot(3, 4, 9), plt.plot(audio_dct1-scipy_dct1), plt.autoscale(tight=True), plt.title("DCT-I - SciPy's DCT-I")
751 |         plt.subplot(3, 4, 10), plt.plot(audio_dct2-scipy_dct2), plt.autoscale(tight=True), plt.title("DCT-II - SciPy's DCT-II")
752 |         plt.subplot(3, 4, 11), plt.plot(audio_dct3-scipy_dct3), plt.autoscale(tight=True), plt.title("DCT-III - SciPy's DCT-III")
753 |         plt.subplot(3, 4, 12), plt.plot(audio_dct4-scipy_dct4), plt.autoscale(tight=True), plt.title("DCT-IV - SciPy's DCT-IV")
754 |         plt.tight_layout()
755 |         plt.show()
756 |     """
757 | 
758 |     # Check if the DCT type is I, II, III, or IV
759 |     if dct_type == 1:
760 | 
761 |         # Get the number of samples
762 |         window_length = len(audio_signal)
763 | 
764 |         # Pre-process the signal to make the DCT-I matrix orthogonal
765 |         # (copy the signal to avoid modifying it outside of the function)
766 |         audio_signal = audio_signal.copy()
767 |         audio_signal[[0, -1]] = audio_signal[[0, -1]] * np.sqrt(2)
768 | 
769 |         # Compute the DCT-I using the FFT
770 |         audio_dct = np.concatenate((audio_signal, audio_signal[-2:0:-1]))
771 |         audio_dct = np.fft.fft(audio_dct)
772 |         audio_dct = np.real(audio_dct[0:window_length]) / 2
773 | 
774 |         # Post-process the results to make the DCT-I matrix orthogonal
775 |         audio_dct[[0, -1]] = audio_dct[[0, -1]] / np.sqrt(2)
776 |         audio_dct = audio_dct * np.sqrt(2 / (window_length - 1))
777 | 
778 |         return audio_dct
779 | 
780 |     elif dct_type == 2:
781 | 
782 |         # Get the number of samples
783 |         window_length = len(audio_signal)
784 | 
785 |         # Compute the DCT-II using the FFT
786 |         audio_dct = np.zeros(4 * window_length)
787 |         audio_dct[1 : 2 * window_length : 2] = audio_signal
788 |         audio_dct[2 * window_length + 1 : 4 * window_length : 2] = audio_signal[::-1]
789 |         audio_dct = np.fft.fft(audio_dct)
790 |         audio_dct = np.real(audio_dct[0:window_length]) / 2
791 | 
792 |         # Post-process the results to make the DCT-II matrix orthogonal
793 |         audio_dct[0] = audio_dct[0] / np.sqrt(2)
794 |         audio_dct = audio_dct * np.sqrt(2 / window_length)
795 | 
796 |         return audio_dct
797 | 
798 |     elif dct_type == 3:
799 | 
800 |         # Get the number of samples
801 |         window_length = len(audio_signal)
802 | 
803 |         # Pre-process the signal to make the DCT-III matrix orthogonal
804 |         # (copy the signal to avoid modifying it outside of the function)
805 |         audio_signal = audio_signal.copy()
806 |         audio_signal[0] = audio_signal[0] * np.sqrt(2)
807 | 
808 |         # Compute the DCT-III using the FFT
809 |         audio_dct = np.zeros(4 * window_length)
810 |         audio_dct[0:window_length] = audio_signal
811 |         audio_dct[window_length + 1 : 2 * window_length + 1] = -audio_signal[::-1]
812 |         audio_dct[2 * window_length + 1 : 3 * window_length] = -audio_signal[1:]
813 |         audio_dct[3 * window_length + 1 : 4 * window_length] = audio_signal[:0:-1]
814 |         audio_dct = np.fft.fft(audio_dct)
815 |         audio_dct = np.real(audio_dct[1 : 2 * window_length : 2]) / 4
816 | 
817 |         # Post-process the results to make the DCT-III matrix orthogonal
818 |         audio_dct = audio_dct * np.sqrt(2 / window_length)
819 | 
820 |         return audio_dct
821 | 
822 |     elif dct_type == 4:
823 | 
824 |         # Get the number of samples
825 |         window_length = len(audio_signal)
826 | 
827 |         # Compute the DCT-IV using the FFT
828 |         audio_dct = np.zeros(8 * window_length)
829 |         audio_dct[1 : 2 * window_length : 2] = audio_signal
830 |         audio_dct[2 * window_length + 1 : 4 * window_length : 2] = -audio_signal[::-1]
831 |         audio_dct[4 * window_length + 1 : 6 * window_length : 2] = -audio_signal
832 |         audio_dct[6 * window_length + 1 : 8 * window_length : 2] = audio_signal[::-1]
833 |         audio_dct = np.fft.fft(audio_dct)
834 |         audio_dct = np.real(audio_dct[1 : 2 * window_length : 2]) / 4
835 | 
836 |         # Post-process the results to make the DCT-IV matrix orthogonal
837 |         audio_dct = audio_dct * np.sqrt(2 / window_length)
838 | 
839 |         return audio_dct
840 | 
841 | 
842 | def dst(audio_signal, dst_type):
843 |     """
844 |     Compute the discrete sine transform (DST) using the fast Fourier transform (FFT).
845 | 
846 |     Inputs:
847 |         audio_signal: audio signal (window_length,)
848 |         dst_type: DST type (1, 2, 3, or 4)
849 |     Output:
850 |         audio_dst: audio DST (number_frequencies,)
851 | 
852 |     Example: Compute the 4 different DSTs and compare their respective inverses with the original audio.
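        # Note: the orthogonal DST-I and DST-IV are each their own inverse, while the
        # DST-II and the DST-III are inverses of each other, hence the pairings used below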
853 |         # Import the needed modules
854 |         import numpy as np
855 |         import zaf
856 |         import matplotlib.pyplot as plt
857 | 
858 |         # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
859 |         audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
860 |         audio_signal = np.mean(audio_signal, 1)
861 | 
862 |         # Get an audio segment for a given window length
863 |         window_length = 1024
864 |         audio_segment = audio_signal[0:window_length]
865 | 
866 |         # Compute the DST-I, II, III, and IV
867 |         audio_dst1 = zaf.dst(audio_segment, 1)
868 |         audio_dst2 = zaf.dst(audio_segment, 2)
869 |         audio_dst3 = zaf.dst(audio_segment, 3)
870 |         audio_dst4 = zaf.dst(audio_segment, 4)
871 | 
872 |         # Compute their respective inverses, i.e., DST-I, III, II, and IV
873 |         audio_idst1 = zaf.dst(audio_dst1, 1)
874 |         audio_idst2 = zaf.dst(audio_dst2, 3)
875 |         audio_idst3 = zaf.dst(audio_dst3, 2)
876 |         audio_idst4 = zaf.dst(audio_dst4, 4)
877 | 
878 |         # Plot the DST-I, II, III, and IV, their respective inverses, and their differences with the original audio segment
879 |         plt.figure(figsize=(14, 7))
880 |         plt.subplot(3, 4, 1), plt.plot(audio_dst1), plt.autoscale(tight=True), plt.title("DST-I")
881 |         plt.subplot(3, 4, 2), plt.plot(audio_dst2), plt.autoscale(tight=True), plt.title("DST-II")
882 |         plt.subplot(3, 4, 3), plt.plot(audio_dst3), plt.autoscale(tight=True), plt.title("DST-III")
883 |         plt.subplot(3, 4, 4), plt.plot(audio_dst4), plt.autoscale(tight=True), plt.title("DST-IV")
884 |         plt.subplot(3, 4, 5), plt.plot(audio_idst1), plt.autoscale(tight=True), plt.title("Inverse DST-I (DST-I)")
885 |         plt.subplot(3, 4, 6), plt.plot(audio_idst2), plt.autoscale(tight=True), plt.title("Inverse DST-II (DST-III)")
886 |         plt.subplot(3, 4, 7), plt.plot(audio_idst3), plt.autoscale(tight=True), plt.title("Inverse DST-III (DST-II)")
887 |         plt.subplot(3, 4, 8), plt.plot(audio_idst4), plt.autoscale(tight=True), plt.title("Inverse DST-IV (DST-IV)")
888 |         plt.subplot(3, 4, 9), plt.plot(audio_idst1-audio_segment), plt.autoscale(tight=True)
889 |         plt.title("Inverse DST-I - audio segment")
890 |         plt.subplot(3, 4, 10), plt.plot(audio_idst2-audio_segment), plt.autoscale(tight=True)
891 |         plt.title("Inverse DST-II - audio segment")
892 |         plt.subplot(3, 4, 11), plt.plot(audio_idst3-audio_segment), plt.autoscale(tight=True)
893 |         plt.title("Inverse DST-III - audio segment")
894 |         plt.subplot(3, 4, 12), plt.plot(audio_idst4-audio_segment), plt.autoscale(tight=True)
895 |         plt.title("Inverse DST-IV - audio segment")
896 |         plt.tight_layout()
897 |         plt.show()
898 |     """
899 | 
900 |     # Check if the DST type is I, II, III, or IV
901 |     if dst_type == 1:
902 | 
903 |         # Get the number of samples
904 |         window_length = len(audio_signal)
905 | 
906 |         # Compute the DST-I using the FFT
907 |         audio_dst = np.zeros(2 * window_length + 2)
908 |         audio_dst[1 : window_length + 1] = audio_signal
909 |         audio_dst[window_length + 2 :] = -audio_signal[::-1]
910 |         audio_dst = np.fft.fft(audio_dst)
911 |         audio_dst = -np.imag(audio_dst[1 : window_length + 1]) / 2
912 | 
913 |         # Post-process the results to make the DST-I matrix orthogonal
914 |         audio_dst = audio_dst * np.sqrt(2 / (window_length + 1))
915 | 
916 |         return audio_dst
917 | 
918 |     elif dst_type == 2:
919 | 
920 |         # Get the number of samples
921 |         window_length = len(audio_signal)
922 | 
923 |         # Compute the DST-II using the FFT
924 |         audio_dst = np.zeros(4 * window_length)
925 |         audio_dst[1 : 2 * window_length : 2] = audio_signal
926 |         audio_dst[2 * window_length + 1 : 4 * window_length : 2] = -audio_signal[-1::-1]
927 |         audio_dst = np.fft.fft(audio_dst)
928 |         audio_dst = -np.imag(audio_dst[1 : window_length + 1]) / 2
929 | 
930 |         # Post-process the results to make the DST-II matrix orthogonal
931 |         audio_dst[-1] = audio_dst[-1] / np.sqrt(2)
932 |         audio_dst = audio_dst * np.sqrt(2 / window_length)
933 | 
934 |         return audio_dst
935 | 
936 |     elif dst_type == 3:
937 | 
938 |         # Get the number of samples
939 |         window_length = len(audio_signal)
940 | 
941 |         # Pre-process the signal to make the DST-III matrix orthogonal
942 |         # (copy the signal to avoid modifying it outside of the function)
943 |         audio_signal = audio_signal.copy()
944 |         audio_signal[-1] = audio_signal[-1] * np.sqrt(2)
945 | 
946 |         # Compute the DST-III using the FFT
947 |         audio_dst = np.zeros(4 * window_length)
948 |         audio_dst[1 : window_length + 1] = audio_signal
949 |         audio_dst[window_length + 1 : 2 * window_length] = audio_signal[-2::-1]
950 |         audio_dst[2 * window_length + 1 : 3 * window_length + 1] = -audio_signal
951 |         audio_dst[3 * window_length + 1 : 4 * window_length] = -audio_signal[-2::-1]
952 |         audio_dst = np.fft.fft(audio_dst)
953 |         audio_dst = -np.imag(audio_dst[1 : 2 * window_length : 2]) / 4
954 | 
955 |         # Post-process the results to make the DST-III matrix orthogonal
956 |         audio_dst = audio_dst * np.sqrt(2 / window_length)
957 | 
958 |         return audio_dst
959 | 
960 |     elif dst_type == 4:
961 | 
962 |         # Initialize the DST-IV
963 |         window_length = len(audio_signal)
964 |         audio_dst = np.zeros(8 * window_length)
965 | 
966 |         # Compute the DST-IV using the FFT
967 |         audio_dst[1 : 2 * window_length : 2] = audio_signal
968 |         audio_dst[2 * window_length + 1 : 4 * window_length : 2] = audio_signal[
969 |             window_length - 1 :: -1
970 |         ]
971 |         audio_dst[4 * window_length + 1 : 6 * window_length : 2] = -audio_signal
972 |         audio_dst[6 * window_length + 1 : 8 * window_length : 2] = -audio_signal[
973 |             window_length - 1 :: -1
974 |         ]
975 |         audio_dst = np.fft.fft(audio_dst)
976 |         audio_dst = -np.imag(audio_dst[1 : 2 * window_length : 2]) / 4
977 | 
978 |         # Post-process the results to make the DST-IV matrix orthogonal
979 |         audio_dst = audio_dst * np.sqrt(2 / window_length)
980 | 
981 |         return audio_dst
982 | 
983 | 
984 | def mdct(audio_signal, window_function):
985 |     """
986 |     Compute the modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
987 | 
988 |     Inputs:
989 |         audio_signal: audio signal (number_samples,)
990 |         window_function: window function (window_length,)
991 |     Output:
992 |         audio_mdct: audio MDCT (number_frequencies, number_times)
993 | 
994 |     Example: Compute and display the MDCT as used in the AC-3 audio coding format.
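        # Note: the KBD window built below satisfies the Princen-Bradley condition
        # (window_function[n]**2 + window_function[n+window_length//2]**2 == 1), which
        # the MDCT needs for perfect reconstruction with 50% overlap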
995 |         # Import the needed modules
996 |         import numpy as np
997 |         import zaf
998 |         import matplotlib.pyplot as plt
999 | 
1000 |         # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
1001 |         audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
1002 |         audio_signal = np.mean(audio_signal, 1)
1003 | 
1004 |         # Compute the Kaiser-Bessel-derived (KBD) window as used in the AC-3 audio coding format
1005 |         window_length = 512
1006 |         alpha_value = 5
1007 |         window_function = np.kaiser(int(window_length/2)+1, alpha_value*np.pi)
1008 |         window_function2 = np.cumsum(window_function[0:int(window_length/2)])
1009 |         window_function = np.sqrt(np.concatenate((window_function2, window_function2[::-1]))
1010 |                                 /np.sum(window_function))
1011 | 
1012 |         # Compute the MDCT
1013 |         audio_mdct = zaf.mdct(audio_signal, window_function)
1014 | 
1015 |         # Display the MDCT in dB, seconds, and Hz
1016 |         number_samples = len(audio_signal)
1017 |         plt.figure(figsize=(14, 7))
1018 |         zaf.specshow(np.absolute(audio_mdct), number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
1019 |         plt.title("MDCT (dB)")
1020 |         plt.tight_layout()
1021 |         plt.show()
1022 |     """
1023 | 
1024 |     # Get the number of samples and the window length in samples
1025 |     number_samples = len(audio_signal)
1026 |     window_length = len(window_function)
1027 | 
1028 |     # Derive the step length and the number of frequencies (for clarity)
1029 |     step_length = int(window_length / 2)
1030 |     number_frequencies = int(window_length / 2)
1031 | 
1032 |     # Derive the number of time frames
1033 |     number_times = int(np.ceil(number_samples / step_length)) + 1
1034 | 
1035 |     # Zero-pad the start and the end of the signal to center the windows
1036 |     audio_signal = np.pad(
1037 |         audio_signal,
1038 |         (step_length, (number_times + 1) * step_length - number_samples),
1039 |         "constant",
1040 |         constant_values=0,
1041 |     )
1042 | 
1043 |     # Initialize the MDCT
1044 |     audio_mdct = np.zeros((number_frequencies, number_times))
1045 | 
1046 |     # Prepare the pre-processing and post-processing arrays
1047 |     preprocessing_array = np.exp(
1048 |         -1j * np.pi / window_length * np.arange(0, window_length)
1049 |     )
1050 |     postprocessing_array = np.exp(
1051 |         -1j
1052 |         * np.pi
1053 |         / window_length
1054 |         * (window_length / 2 + 1)
1055 |         * np.arange(0.5, window_length / 2 + 0.5)
1056 |     )
1057 | 
1058 |     # Loop over the time frames
1059 |     # (Do the pre- and post-processing, and take the FFT in the loop to avoid storing frames twice as long)
1060 |     i = 0
1061 |     for j in range(number_times):
1062 | 
1063 |         # Window the signal
1064 |         audio_segment = audio_signal[i : i + window_length] * window_function
1065 |         i = i + step_length
1066 | 
1067 |         # Compute the Fourier transform of the windowed segment using the FFT after pre-processing
1068 |         audio_segment = np.fft.fft(audio_segment * preprocessing_array)
1069 | 
1070 |         # Truncate to the first half before post-processing (and take the real part to ensure real values)
1071 |         audio_mdct[:, j] = np.real(
1072 |             audio_segment[0:number_frequencies] * postprocessing_array
1073 |         )
1074 | 
1075 |     return audio_mdct
1076 | 
1077 | 
1078 | def imdct(audio_mdct, window_function):
1079 |     """
1080 |     Compute the inverse modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
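    The frames are inverse-transformed with the FFT, windowed again, and overlap-added
    with a half-window step, so that the time-domain aliasing cancels out (TDAC).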
1081 | 
1082 |     Inputs:
1083 |         audio_mdct: audio MDCT (number_frequencies, number_times)
1084 |         window_function: window function (window_length,)
1085 |     Output:
1086 |         audio_signal: audio signal (number_samples,)
1087 | 
1088 |     Example: Verify that the MDCT is perfectly invertible.
1089 |         # Import the needed modules
1090 |         import numpy as np
1091 |         import zaf
1092 |         import matplotlib.pyplot as plt
1093 | 
1094 |         # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
1095 |         audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
1096 |         audio_signal = np.mean(audio_signal, 1)
1097 | 
1098 |         # Compute the MDCT with a slope function as used in the Vorbis audio coding format
1099 |         window_length = 2048
1100 |         window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))
1101 |         audio_mdct = zaf.mdct(audio_signal, window_function)
1102 | 
1103 |         # Compute the inverse MDCT
1104 |         audio_signal2 = zaf.imdct(audio_mdct, window_function)
1105 |         audio_signal2 = audio_signal2[0:len(audio_signal)]
1106 | 
1107 |         # Compute the differences between the original signal and the resynthesized one
1108 |         audio_differences = audio_signal-audio_signal2
1109 |         y_max = np.max(np.absolute(audio_differences))
1110 | 
1111 |         # Display the original and resynthesized signals, and their differences in seconds
1112 |         xtick_step = 1
1113 |         plt.figure(figsize=(14, 7))
1114 |         plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
1115 |         plt.ylim(-1, 1), plt.title("Original signal")
1116 |         plt.subplot(3, 1, 2), zaf.sigplot(audio_signal2, sampling_frequency, xtick_step)
1117 |         plt.ylim(-1, 1), plt.title("Resynthesized signal")
1118 |         plt.subplot(3, 1, 3), zaf.sigplot(audio_differences, sampling_frequency, xtick_step)
1119 |         plt.ylim(-y_max, y_max), plt.title("Original - resynthesized signal")
1120 |         plt.tight_layout()
1121 |         plt.show()
1122 |     """
1123 | 
1124 |     # Get the number of frequency channels and time frames
1125 |     number_frequencies, number_times = np.shape(audio_mdct)
1126 | 
1127 |     # Derive the window length and the step length in samples (for clarity)
1128 |     window_length = 2 * number_frequencies
1129 |     step_length = number_frequencies
1130 | 
1131 |     # Derive the number of samples for the signal
1132 |     number_samples = step_length * (number_times + 1)
1133 | 
1134 |     # Initialize the audio signal
1135 |     audio_signal = np.zeros(number_samples)
1136 | 
1137 |     # Prepare the pre-processing and post-processing arrays
1138 |     preprocessing_array = np.exp(
1139 |         -1j
1140 |         * np.pi
1141 |         / (2 * number_frequencies)
1142 |         * (number_frequencies + 1)
1143 |         * np.arange(0, number_frequencies)
1144 |     )
1145 |     postprocessing_array = (
1146 |         np.exp(
1147 |             -1j
1148 |             * np.pi
1149 |             / (2 * number_frequencies)
1150 |             * np.arange(
1151 |                 0.5 + number_frequencies / 2,
1152 |                 2 * number_frequencies + number_frequencies / 2 + 0.5,
1153 |             )
1154 |         )
1155 |         / number_frequencies
1156 |     )
1157 | 
1158 |     # Compute the Fourier transform of the frames using the FFT after pre-processing (zero-pad to get twice the length)
1159 |     audio_mdct = np.fft.fft(
1160 |         audio_mdct * preprocessing_array[:, np.newaxis],
1161 |         n=2 * number_frequencies,
1162 |         axis=0,
1163 |     )
1164 | 
1165 |     # Apply the window function to the frames after post-processing (take the real part to ensure real values)
1166 |     audio_mdct = 2 * (
1167 |         np.real(audio_mdct * postprocessing_array[:, np.newaxis])
1168 |         * window_function[:, np.newaxis]
1169 |     )
1170 | 
1171 |     # Loop over the time frames
1172 |     i = 0
1173 |     for j in range(number_times):
1174 | 
1175 |         # Recover the signal with the time-domain aliasing cancellation (TDAC) principle
1176 |         audio_signal[i : i + window_length] = (
1177 |             audio_signal[i : i + window_length] + audio_mdct[:, j]
1178 |         )
1179 |         i = i + step_length
1180 | 
1181 |     # Remove the zero-padding at the start and at the end of the signal
1182 |     audio_signal = audio_signal[step_length : -step_length - 1]
1183 | 
1184 |     return audio_signal
1185 | 
1186 | 
1187 | def wavread(audio_file):
1188 |     """
1189 |     Read a WAVE file (using SciPy).
1190 | 
1191 |     Input:
1192 |         audio_file: path to an audio file
1193 |     Outputs:
1194 |         audio_signal: audio signal (number_samples, number_channels)
1195 |         sampling_frequency: sampling frequency in Hz
1196 |     """
1197 | 
1198 |     # Read the audio file and return the sampling frequency in Hz and the non-normalized signal using SciPy
1199 |     sampling_frequency, audio_signal = scipy.io.wavfile.read(audio_file)
1200 | 
1201 |     # Normalize the signal by the data range given the size of an item in bytes
1202 |     audio_signal = audio_signal / pow(2, audio_signal.itemsize * 8 - 1)
1203 | 
1204 |     return audio_signal, sampling_frequency
1205 | 
1206 | 
1207 | def wavwrite(audio_signal, sampling_frequency, audio_file):
1208 |     """
1209 |     Write a WAVE file (using SciPy).
1210 | 
1211 |     Inputs:
1212 |         audio_signal: audio signal (number_samples, number_channels)
1213 |         sampling_frequency: sampling frequency in Hz
1214 |     Output:
1215 |         audio_file: path to the audio file to write
1216 |     """
1217 | 
1218 |     # Write the audio signal using SciPy
1219 |     scipy.io.wavfile.write(audio_file, sampling_frequency, audio_signal)
1220 | 
1221 | 
1222 | def sigplot(
1223 |     audio_signal,
1224 |     sampling_frequency,
1225 |     xtick_step=1,
1226 | ):
1227 |     """
1228 |     Plot a signal in seconds.
1229 | 
1230 |     Inputs:
1231 |         audio_signal: audio signal (number_samples, number_channels) (number_channels>=1)
1232 |         sampling_frequency: sampling frequency in Hz
1233 |         xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1234 |     """
1235 | 
1236 |     # Get the number of samples
1237 |     number_samples = np.shape(audio_signal)[0]
1238 | 
1239 |     # Prepare the tick locations and labels for the x-axis
1240 |     xtick_locations = np.arange(
1241 |         xtick_step * sampling_frequency,
1242 |         number_samples,
1243 |         xtick_step * sampling_frequency,
1244 |     )
1245 |     xtick_labels = np.arange(
1246 |         xtick_step, number_samples / sampling_frequency, xtick_step
1247 |     ).astype(int)
1248 | 
1249 |     # Plot the signal in seconds
1250 |     plt.plot(audio_signal)
1251 |     plt.autoscale(tight=True)
1252 |     plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1253 |     plt.xlabel("Time (s)")
1254 | 
1255 | 
1256 | def specshow(
1257 |     audio_spectrogram,
1258 |     number_samples,
1259 |     sampling_frequency,
1260 |     xtick_step=1,
1261 |     ytick_step=1000,
1262 | ):
1263 |     """
1264 |     Display a spectrogram in dB, seconds, and Hz.
1265 | 1266 | Inputs: 1267 | audio_spectrogram: audio spectrogram (without DC and mirrored frequencies) (number_frequencies, number_times) 1268 | number_samples: number of samples from the original signal 1269 | sampling_frequency: sampling frequency from the original signal in Hz 1270 | xtick_step: step for the x-axis ticks in seconds (default: 1 second) 1271 | ytick_step: step for the y-axis ticks in Hz (default: 1000 Hz) 1272 | """ 1273 | 1274 | # Get the number of frequency channels and time frames 1275 | number_frequencies, number_times = np.shape(audio_spectrogram) 1276 | 1277 | # Derive the number of seconds and Hertz 1278 | number_seconds = number_samples / sampling_frequency 1279 | number_hertz = sampling_frequency / 2 1280 | 1281 | # Derive the number of time frames per second and the number of frequency channels per Hz 1282 | time_resolution = number_times / number_seconds 1283 | frequency_resolution = number_frequencies / number_hertz 1284 | 1285 | # Prepare the tick locations and labels for the x-axis 1286 | xtick_locations = np.arange( 1287 | xtick_step * time_resolution, 1288 | number_times, 1289 | xtick_step * time_resolution, 1290 | ) 1291 | xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int) 1292 | 1293 | # Prepare the tick locations and labels for the y-axis 1294 | ytick_locations = np.arange( 1295 | ytick_step * frequency_resolution, 1296 | number_frequencies, 1297 | ytick_step * frequency_resolution, 1298 | ) 1299 | ytick_labels = np.arange(ytick_step, number_hertz, ytick_step).astype(int) 1300 | 1301 | # Display the spectrogram in dB, seconds, and Hz 1302 | plt.imshow( 1303 | 20 * np.log10(audio_spectrogram), aspect="auto", cmap="jet", origin="lower" 1304 | ) 1305 | plt.xticks(ticks=xtick_locations, labels=xtick_labels) 1306 | plt.yticks(ticks=ytick_locations, labels=ytick_labels) 1307 | plt.xlabel("Time (s)") 1308 | plt.ylabel("Frequency (Hz)") 1309 | 1310 | 1311 | def melspecshow( 1312 | mel_spectrogram, 1313 | number_samples, 1314 | sampling_frequency, 1315 | window_length, 1316 | xtick_step=1, 1317 | ): 1318 | """ 1319 | Display a mel spectrogram in dB, seconds, and Hz. 
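    (The y-axis labels are obtained by mapping a linearly spaced mel scale back to Hz,
    so they are logarithmically spaced.)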
1320 | 1321 | Inputs: 1322 | mel_spectrogram: mel spectrogram (number_mels, number_times) 1323 | number_samples: number of samples from the original signal 1324 | sampling_frequency: sampling frequency from the original signal in Hz 1325 | window_length: window length from the Fourier analysis in number of samples 1326 | xtick_step: step for the x-axis ticks in seconds (default: 1 second) 1327 | """ 1328 | 1329 | # Get the number of mels and time frames 1330 | number_mels, number_times = np.shape(mel_spectrogram) 1331 | 1332 | # Derive the number of seconds and the number of time frames per second 1333 | number_seconds = number_samples / sampling_frequency 1334 | time_resolution = number_times / number_seconds 1335 | 1336 | # Derive the minimum and maximum mel 1337 | minimum_mel = 2595 * np.log10(1 + (sampling_frequency / window_length) / 700) 1338 | maximum_mel = 2595 * np.log10(1 + (sampling_frequency / 2) / 700) 1339 | 1340 | # Compute the mel scale (linearly spaced) 1341 | mel_scale = np.linspace(minimum_mel, maximum_mel, number_mels) 1342 | 1343 | # Derive the Hertz scale (log spaced) 1344 | hertz_scale = 700 * (np.power(10, mel_scale / 2595) - 1) 1345 | 1346 | # Prepare the tick locations and labels for the x-axis 1347 | xtick_locations = np.arange( 1348 | xtick_step * time_resolution, 1349 | number_times, 1350 | xtick_step * time_resolution, 1351 | ) 1352 | xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int) 1353 | 1354 | # Prepare the tick locations and labels for the y-axis 1355 | ytick_locations = np.arange(0, number_mels, 8) 1356 | ytick_labels = hertz_scale[::8].astype(int) 1357 | 1358 | # Display the mel spectrogram in dB, seconds, and Hz 1359 | plt.imshow( 1360 | 20 * np.log10(mel_spectrogram), aspect="auto", cmap="jet", origin="lower" 1361 | ) 1362 | plt.xticks(ticks=xtick_locations, labels=xtick_labels) 1363 | plt.yticks(ticks=ytick_locations, labels=ytick_labels) 1364 | plt.xlabel("Time (s)") 1365 | plt.ylabel("Frequency (Hz)") 1366 | 1367 | 1368 | def mfccshow( 1369 | audio_mfcc, 1370 | number_samples, 1371 | sampling_frequency, 1372 | xtick_step=1, 1373 | ): 1374 | """ 1375 | Display MFCCs in seconds. 
1376 | 
1377 |     Inputs:
1378 |         audio_mfcc: audio MFCCs (number_coefficients, number_times)
1379 |         number_samples: number of samples from the original signal
1380 |         sampling_frequency: sampling frequency from the original signal in Hz
1381 |         xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1382 |     """
1383 | 
1384 |     # Get the number of time frames
1385 |     number_times = np.shape(audio_mfcc)[1]
1386 | 
1387 |     # Derive the number of seconds and the number of time frames per second
1388 |     number_seconds = number_samples / sampling_frequency
1389 |     time_resolution = number_times / number_seconds
1390 | 
1391 |     # Prepare the tick locations and labels for the x-axis
1392 |     xtick_locations = np.arange(
1393 |         xtick_step * time_resolution,
1394 |         number_times,
1395 |         xtick_step * time_resolution,
1396 |     )
1397 |     xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int)
1398 | 
1399 |     # Display the MFCCs in seconds
1400 |     plt.imshow(audio_mfcc, aspect="auto", cmap="jet", origin="lower")
1401 |     plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1402 |     plt.xlabel("Time (s)")
1403 |     plt.ylabel("Coefficients")
1404 | 
1405 | 
1406 | def cqtspecshow(
1407 |     cqt_spectrogram,
1408 |     time_resolution,
1409 |     octave_resolution,
1410 |     minimum_frequency,
1411 |     xtick_step=1,
1412 | ):
1413 |     """
1414 |     Display a CQT spectrogram in dB, seconds, and Hz.
1415 | 
1416 |     Inputs:
1417 |         cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)
1418 |         time_resolution: number of time frames per second
1419 |         octave_resolution: number of frequency channels per octave
1420 |         minimum_frequency: minimum frequency in Hz
1421 |         xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1422 |     """
1423 | 
1424 |     # Get the number of frequency channels and time frames
1425 |     number_frequencies, number_times = np.shape(cqt_spectrogram)
1426 | 
1427 |     # Prepare the tick locations and labels for the x-axis
1428 |     xtick_locations = np.arange(
1429 |         xtick_step * time_resolution,
1430 |         number_times,
1431 |         xtick_step * time_resolution,
1432 |     )
1433 |     xtick_labels = np.arange(
1434 |         xtick_step, number_times / time_resolution, xtick_step
1435 |     ).astype(int)
1436 | 
1437 |     # Prepare the tick locations and labels for the y-axis
1438 |     ytick_locations = np.arange(0, number_frequencies, octave_resolution)
1439 |     ytick_labels = (
1440 |         minimum_frequency * pow(2, ytick_locations / octave_resolution)
1441 |     ).astype(int)
1442 | 
1443 |     # Display the CQT spectrogram in dB, seconds, and Hz
1444 |     plt.imshow(
1445 |         20 * np.log10(cqt_spectrogram), aspect="auto", cmap="jet", origin="lower"
1446 |     )
1447 |     plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1448 |     plt.yticks(ticks=ytick_locations, labels=ytick_labels)
1449 |     plt.xlabel("Time (s)")
1450 |     plt.ylabel("Frequency (Hz)")
1451 | 
1452 | 
1453 | def cqtchromshow(
1454 |     cqt_chromagram,
1455 |     time_resolution,
1456 |     xtick_step=1,
1457 | ):
1458 |     """
1459 |     Display a CQT chromagram in seconds.
1460 | 
1461 |     Inputs:
1462 |         cqt_chromagram: CQT chromagram (number_chromas, number_times)
1463 |         time_resolution: number of time frames per second
1464 |         xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1465 |     """
1466 | 
1467 |     # Get the number of time frames
1468 |     number_times = np.shape(cqt_chromagram)[1]
1469 | 
1470 |     # Prepare the tick locations and labels for the x-axis
1471 |     xtick_locations = np.arange(
1472 |         xtick_step * time_resolution,
1473 |         number_times,
1474 |         xtick_step * time_resolution,
1475 |     )
1476 |     xtick_labels = np.arange(
1477 |         xtick_step, number_times / time_resolution, xtick_step
1478 |     ).astype(int)
1479 | 
1480 |     # Display the CQT chromagram in seconds
1481 |     plt.imshow(cqt_chromagram, aspect="auto", cmap="jet", origin="lower")
1482 |     plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1483 |     plt.xlabel("Time (s)")
1484 |     plt.ylabel("Chroma")
--------------------------------------------------------------------------------