├── .gitignore
├── README.md
├── audio_file.wav
├── examples.ipynb
├── images
│   ├── cqtchromagram.png
│   ├── cqtkernel.png
│   ├── cqtspectrogram.png
│   ├── dct.png
│   ├── dst.png
│   ├── imdct.png
│   ├── istft.png
│   ├── mdct.png
│   ├── melfilterbank.png
│   ├── melspectrogram.png
│   ├── mfcc.png
│   └── stft.png
└── zaf.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | __pycache__
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Zaf-Python
2 |
3 | Zafar's Audio Functions in **Python** for audio signal analysis.
4 |
5 | Files:
6 | - [`zaf.py`](#zafpy): Python module with the audio functions.
7 | - [`examples.ipynb`](#examplesipynb): Jupyter notebook with some examples.
8 | - [`audio_file.wav`](#audio_filewav): audio file used for the examples.
9 |
10 | See also:
11 | - [Zaf-Matlab](https://github.com/zafarrafii/Zaf-Matlab): Zafar's Audio Functions in **Matlab** for audio signal analysis.
12 | - [Zaf-Julia](https://github.com/zafarrafii/Zaf-Julia): Zafar's Audio Functions in **Julia** for audio signal analysis.
13 |
14 | ## zaf.py
15 |
16 | This Python module implements a number of functions for audio signal analysis.
17 |
18 | Simply copy the file `zaf.py` to your working directory and you are good to go. Make sure you have Python 3, NumPy, SciPy, and Matplotlib installed (Matplotlib is needed for the display functions).
19 |
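For instance, a minimal session (assuming the dependencies are installed, e.g., with `pip install numpy scipy matplotlib`) could look like this:

```
# Minimal sketch: read the example audio file and compute an STFT with zaf
import numpy as np
import scipy.signal
import zaf

audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)
window_function = scipy.signal.hamming(2048, sym=False)
audio_stft = zaf.stft(audio_signal, window_function, 1024)
```
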
20 | Functions:
21 | - [`stft`](#stft) - Compute the short-time Fourier transform (STFT).
22 | - [`istft`](#istft) - Compute the inverse STFT.
23 | - [`melfilterbank`](#melfilterbank) - Compute the mel filterbank.
24 | - [`melspectrogram`](#melspectrogram) - Compute the mel spectrogram using a mel filterbank.
25 | - [`mfcc`](#mfcc) - Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
26 | - [`cqtkernel`](#cqtkernel) - Compute the constant-Q transform (CQT) kernel.
27 | - [`cqtspectrogram`](#cqtspectrogram) - Compute the CQT spectrogram using a CQT kernel.
28 | - [`cqtchromagram`](#cqtchromagram) - Compute the CQT chromagram using a CQT kernel.
29 | - [`dct`](#dct) - Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
30 | - [`dst`](#dst) - Compute the discrete sine transform (DST) using the FFT.
31 | - [`mdct`](#mdct) - Compute the modified discrete cosine transform (MDCT) using the FFT.
32 | - [`imdct`](#imdct) - Compute the inverse MDCT using the FFT.
33 |
34 | Other:
35 | - `wavread` - Read a WAVE file (using SciPy).
36 | - `wavwrite` - Write a WAVE file (using SciPy).
37 | - `sigplot` - Plot a signal in seconds.
38 | - `specshow` - Display a spectrogram in dB, seconds, and Hz.
39 | - `melspecshow` - Display a mel spectrogram in dB, seconds, and Hz.
40 | - `mfccshow` - Display MFCCs in seconds.
41 | - `cqtspecshow` - Display a CQT spectrogram in dB, seconds, and Hz.
42 | - `cqtchromshow` - Display a CQT chromagram in seconds.
43 |
44 |
45 | ### stft
46 |
47 | Compute the short-time Fourier transform (STFT).
48 |
49 | ```
50 | audio_stft = zaf.stft(audio_signal, window_function, step_length)
51 |
52 | Inputs:
53 | audio_signal: audio signal (number_samples,)
54 | window_function: window function (window_length,)
55 | step_length: step length in samples
56 | Output:
57 | audio_stft: audio STFT (window_length, number_frames)
58 | ```
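
A periodic Hamming window at a step of half the window length satisfies the constant overlap-add (COLA) constraint, which is what makes the STFT invertible (see [`istft`](#istft)). As a quick sanity check, here is a sketch using SciPy's `check_COLA` (note that recent SciPy versions also expose the window as `scipy.signal.windows.hamming`):

```
# Check that the periodic Hamming window is COLA at 50% overlap
import scipy.signal

window_length = 2048
window_function = scipy.signal.hamming(window_length, sym=False)
step_length = int(window_length/2)
print(scipy.signal.check_COLA(window_function, window_length, window_length-step_length))  # True
```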
59 |
60 | #### Example: Compute and display the spectrogram from an audio file.
61 |
62 | ```
63 | # Import the needed modules
64 | import numpy as np
65 | import scipy.signal
66 | import zaf
67 | import matplotlib.pyplot as plt
68 |
69 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
70 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
71 | audio_signal = np.mean(audio_signal, 1)
72 |
73 | # Set the window duration in seconds (audio is stationary around 40 milliseconds)
74 | window_duration = 0.04
75 |
76 | # Derive the window length in samples (use powers of 2 for faster FFT and constant overlap-add (COLA))
77 | window_length = pow(2, int(np.ceil(np.log2(window_duration*sampling_frequency))))
78 |
79 | # Compute the window function (use SciPy's periodic Hamming window for COLA as NumPy's Hamming window is symmetric)
80 | window_function = scipy.signal.hamming(window_length, sym=False)
81 |
82 | # Set the step length in samples (half of the window length for COLA)
83 | step_length = int(window_length/2)
84 |
85 | # Compute the STFT
86 | audio_stft = zaf.stft(audio_signal, window_function, step_length)
87 |
88 | # Derive the magnitude spectrogram (without the DC component and the mirrored frequencies)
89 | audio_spectrogram = np.absolute(audio_stft[1:int(window_length/2)+1, :])
90 |
91 | # Display the spectrogram in dB, seconds, and Hz
92 | number_samples = len(audio_signal)
93 | plt.figure(figsize=(14, 7))
94 | zaf.specshow(audio_spectrogram, number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
95 | plt.title("Spectrogram (dB)")
96 | plt.tight_layout()
97 | plt.show()
98 | ```
99 |
100 |
101 |
102 |
103 | ### istft
104 |
105 | Compute the inverse short-time Fourier transform (STFT).
106 |
107 | ```
108 | audio_signal = zaf.istft(audio_stft, window_function, step_length)
109 |
110 | Inputs:
111 | audio_stft: audio STFT (window_length, number_frames)
112 | window_function: window function (window_length,)
113 | step_length: step length in samples
114 | Output:
115 | audio_signal: audio signal (number_samples,)
116 | ```
117 |
118 | #### Example: Estimate the center and the sides from a stereo audio file.
119 |
120 | ```
121 | # Import the needed modules
122 | import numpy as np
123 | import scipy.signal
124 | import zaf
125 | import matplotlib.pyplot as plt
126 |
127 | # Read the (stereo) audio signal with its sampling frequency in Hz
128 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
129 |
130 | # Set the parameters for the STFT
131 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
132 | window_function = scipy.signal.hamming(window_length, sym=False)
133 | step_length = int(window_length/2)
134 |
135 | # Compute the STFTs for the left and right channels
136 | audio_stft1 = zaf.stft(audio_signal[:, 0], window_function, step_length)
137 | audio_stft2 = zaf.stft(audio_signal[:, 1], window_function, step_length)
138 |
139 | # Derive the magnitude spectrograms (with DC component) for the left and right channels
140 | number_frequencies = int(window_length/2)+1
141 | audio_spectrogram1 = abs(audio_stft1[0:number_frequencies, :])
142 | audio_spectrogram2 = abs(audio_stft2[0:number_frequencies, :])
143 |
144 | # Estimate the time-frequency masks for the left and right channels for the center
145 | center_mask1 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram1
146 | center_mask2 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram2
147 |
148 | # Derive the STFTs for the left and right channels for the center (with mirrored frequencies)
149 | center_stft1 = np.multiply(np.concatenate((center_mask1, center_mask1[-2:0:-1, :])), audio_stft1)
150 | center_stft2 = np.multiply(np.concatenate((center_mask2, center_mask2[-2:0:-1, :])), audio_stft2)
151 |
152 | # Synthesize the signals for the left and right channels for the center
153 | center_signal1 = zaf.istft(center_stft1, window_function, step_length)
154 | center_signal2 = zaf.istft(center_stft2, window_function, step_length)
155 |
156 | # Derive the final stereo center and sides signals
157 | center_signal = np.stack((center_signal1, center_signal2), axis=1)
158 | center_signal = center_signal[0:np.shape(audio_signal)[0], :]
159 | sides_signal = audio_signal-center_signal
160 |
161 | # Write the center and sides signals
162 | zaf.wavwrite(center_signal, sampling_frequency, "center_file.wav")
163 | zaf.wavwrite(sides_signal, sampling_frequency, "sides_file.wav")
164 |
165 | # Display the original, center, and sides signals in seconds
166 | xtick_step = 1
167 | plt.figure(figsize=(14, 7))
168 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
169 | plt.ylim(-1, 1), plt.title("Original signal")
170 | plt.subplot(3, 1, 2), zaf.sigplot(center_signal, sampling_frequency, xtick_step)
171 | plt.ylim(-1, 1), plt.title("Center signal")
172 | plt.subplot(3, 1, 3), zaf.sigplot(sides_signal, sampling_frequency, xtick_step)
173 | plt.ylim(-1, 1), plt.title("Sides signal")
174 | plt.tight_layout()
175 | plt.show()
176 | ```
177 |
178 |
179 |
180 |
181 | ### melfilterbank
182 |
183 | Compute the mel filterbank.
184 |
185 | ```
186 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
187 |
188 | Inputs:
189 | sampling_frequency: sampling frequency in Hz
190 | window_length: window length for the Fourier analysis in samples
191 | number_mels: number of mel filters
193 | Output:
194 | mel_filterbank: mel filterbank (sparse) (number_mels, number_frequencies)
195 | ```
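
The filters are spaced uniformly on the mel scale, using the common conversion m = 2595 log10(1 + f/700) (the same formula as in `zaf.py`). A small sketch of the mapping:

```
# Sketch of the hertz-mel conversion behind the filterbank (same formula as in zaf.py)
import numpy as np

def hertz_to_mel(frequency_value):
    return 2595*np.log10(1+frequency_value/700)

def mel_to_hertz(mel_value):
    return 700*(np.power(10, mel_value/2595)-1)

print(hertz_to_mel(1000))                # ~1000, the mel scale is roughly linear below 1 kHz
print(mel_to_hertz(hertz_to_mel(8000)))  # 8000.0, the two functions are inverses
```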
196 |
197 | #### Example: Compute and display the mel filterbank.
198 |
199 | ```
200 | # Import the needed modules
201 | import numpy as np
202 | import zaf
203 | import matplotlib.pyplot as plt
204 |
205 | # Compute the mel filterbank using some parameters
206 | sampling_frequency = 44100
207 | window_length = pow(2, int(np.ceil(np.log2(0.04 * sampling_frequency))))
208 | number_mels = 128
209 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
210 |
211 | # Display the mel filterbank
212 | plt.figure(figsize=(14, 5))
213 | plt.imshow(mel_filterbank.toarray(), aspect="auto", cmap="jet", origin="lower")
214 | plt.title("Mel filterbank")
215 | plt.xlabel("Frequency index")
216 | plt.ylabel("Mel index")
217 | plt.tight_layout()
218 | plt.show()
219 | ```
220 |
221 |
222 |
223 |
224 | ### melspectrogram
225 |
226 | Compute the mel spectrogram using a mel filterbank.
227 |
228 | ```
229 | mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)
230 |
231 | Inputs:
232 | audio_signal: audio signal (number_samples,)
233 | window_function: window function (window_length,)
234 | step_length: step length in samples
235 | mel_filterbank: mel filterbank (number_mels, number_frequencies)
236 | Output:
237 | mel_spectrogram: mel spectrogram (number_mels, number_times)
238 | ```
239 |
240 | #### Example: Compute and display the mel spectrogram.
241 |
242 | ```
243 | # Import the needed modules
244 | import numpy as np
245 | import scipy.signal
246 | import zaf
247 | import matplotlib.pyplot as plt
248 |
249 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
250 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
251 | audio_signal = np.mean(audio_signal, 1)
252 |
253 | # Set the parameters for the Fourier analysis
254 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
255 | window_function = scipy.signal.hamming(window_length, sym=False)
256 | step_length = int(window_length/2)
257 |
258 | # Compute the mel filterbank
259 | number_mels = 128
260 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
261 |
262 | # Compute the mel spectrogram using the filterbank
263 | mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)
264 |
265 | # Display the mel spectrogram in dB, seconds, and Hz
266 | number_samples = len(audio_signal)
267 | plt.figure(figsize=(14, 5))
268 | zaf.melspecshow(mel_spectrogram, number_samples, sampling_frequency, window_length, xtick_step=1)
269 | plt.title("Mel spectrogram (dB)")
270 | plt.tight_layout()
271 | plt.show()
272 | ```
273 |
274 |
275 |
276 |
277 | ### mfcc
278 |
279 | Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
280 |
281 | ```
282 | audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients)
283 |
284 | Inputs:
285 |     audio_signal: audio signal (number_samples,)
286 |     window_function: window function (window_length,)
287 |     step_length: step length in samples
288 |     mel_filterbank: mel filterbank (number_mels, number_frequencies)
289 |     number_coefficients: number of coefficients (without the 0th coefficient)
290 | Output:
291 |     audio_mfcc: audio MFCCs (number_coefficients, number_times)
291 | ```
292 |
293 | #### Example: Compute and display the MFCCs, delta MFCCs, and delta-delta MFCCs.
294 |
295 | ```
296 | # Import the needed modules
297 | import numpy as np
298 | import scipy.signal
299 | import zaf
300 | import matplotlib.pyplot as plt
301 |
302 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
303 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
304 | audio_signal = np.mean(audio_signal, 1)
305 |
306 | # Set the parameters for the Fourier analysis
307 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
308 | window_function = scipy.signal.hamming(window_length, sym=False)
309 | step_length = int(window_length/2)
310 |
311 | # Compute the mel filterbank
312 | number_mels = 40
313 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
314 |
315 | # Compute the MFCCs using the filterbank
316 | number_coefficients = 20
317 | audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients)
318 |
319 | # Compute the delta and delta-delta MFCCs
320 | audio_dmfcc = np.diff(audio_mfcc, n=1, axis=1)
321 | audio_ddmfcc = np.diff(audio_dmfcc, n=1, axis=1)
322 |
323 | # Display the MFCCs, delta MFCCs, and delta-delta MFCCs in seconds
324 | number_samples = len(audio_signal)
325 | xtick_step = 1
326 | plt.figure(figsize=(14, 7))
327 | plt.subplot(3, 1, 1)
328 | zaf.mfccshow(audio_mfcc, number_samples, sampling_frequency, xtick_step), plt.title("MFCCs")
329 | plt.subplot(3, 1, 2)
330 | zaf.mfccshow(audio_dmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta MFCCs")
331 | plt.subplot(3, 1, 3)
332 | zaf.mfccshow(audio_ddmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta-delta MFCCs")
333 | plt.tight_layout()
334 | plt.show()
335 | ```
336 |
337 |
338 |
339 |
340 | ### cqtkernel
341 |
342 | Compute the constant-Q transform (CQT) kernel.
343 |
344 | ```
345 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
346 |
347 | Inputs:
348 | sampling_frequency: sampling frequency in Hz
349 | octave_resolution: number of frequency channels per octave
350 | minimum_frequency: minimum frequency in Hz
351 | maximum_frequency: maximum frequency in Hz
352 | Output:
353 | cqt_kernel: CQT kernel (sparse) (number_frequencies, fft_length)
354 | ```
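
The kernel dimensions follow from the parameters: the quality factor is Q = 1/(2^(1/octave_resolution)-1), the number of frequency channels is octave_resolution*log2(maximum_frequency/minimum_frequency), and the FFT length is the next power of 2 that fits the longest (lowest-frequency) window (these are the formulas used in `zaf.py`). A small sketch for the parameters of the example below:

```
# Sketch: derive the CQT kernel dimensions for the example parameters (formulas from zaf.py)
import numpy as np

sampling_frequency = 44100
octave_resolution = 24
minimum_frequency = 55
maximum_frequency = sampling_frequency/2

quality_factor = 1/(pow(2, 1/octave_resolution)-1)
number_frequencies = round(octave_resolution*np.log2(maximum_frequency/minimum_frequency))
fft_length = int(pow(2, np.ceil(np.log2(quality_factor*sampling_frequency/minimum_frequency))))
print(quality_factor, number_frequencies, fft_length)  # ~34.13, 208, 32768
```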
355 |
356 | #### Example: Compute and display the CQT kernel.
357 |
358 | ```
359 | # Import the needed modules
360 | import numpy as np
361 | import zaf
362 | import matplotlib.pyplot as plt
363 |
364 | # Set the parameters for the CQT kernel
365 | sampling_frequency = 44100
366 | octave_resolution = 24
367 | minimum_frequency = 55
368 | maximum_frequency = sampling_frequency/2
369 |
370 | # Compute the CQT kernel
371 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
372 |
373 | # Display the magnitude CQT kernel
374 | plt.figure(figsize=(14, 5))
375 | plt.imshow(np.absolute(cqt_kernel).toarray(), aspect="auto", cmap="jet", origin="lower")
376 | plt.title("Magnitude CQT kernel")
377 | plt.xlabel("FFT index")
378 | plt.ylabel("CQT index")
379 | plt.tight_layout()
380 | plt.show()
381 | ```
382 |
383 |
384 |
385 |
386 | ### cqtspectrogram
387 |
388 | Compute the constant-Q transform (CQT) spectrogram using a CQT kernel.
389 |
390 | ```
391 | cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)
392 |
393 | Inputs:
394 | audio_signal: audio signal (number_samples,)
395 | sampling_frequency: sampling frequency in Hz
396 | time_resolution: number of time frames per second
397 | cqt_kernel: CQT kernel (number_frequencies, fft_length)
398 | Output:
399 | cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)
400 | ```
401 |
402 | #### Example: Compute and display the CQT spectrogram.
403 |
404 | ```
405 | # Import the needed modules
406 | import numpy as np
407 | import zaf
408 | import matplotlib.pyplot as plt
409 |
410 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
411 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
412 | audio_signal = np.mean(audio_signal, 1)
413 |
414 | # Compute the CQT kernel
415 | octave_resolution = 24
416 | minimum_frequency = 55
417 | maximum_frequency = 3520
418 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
419 |
420 | # Compute the CQT spectrogram using the kernel
421 | time_resolution = 25
422 | cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)
423 |
424 | # Display the CQT spectrogram in dB, seconds, and Hz
425 | plt.figure(figsize=(14, 5))
426 | zaf.cqtspecshow(cqt_spectrogram, time_resolution, octave_resolution, minimum_frequency, xtick_step=1)
427 | plt.title("CQT spectrogram (dB)")
428 | plt.tight_layout()
429 | plt.show()
430 | ```
431 |
432 |
433 |
434 |
435 | ### cqtchromagram
436 |
437 | Compute the constant-Q transform (CQT) chromagram using a CQT kernel.
438 |
439 | ```
440 | cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)
441 |
442 | Inputs:
443 | audio_signal: audio signal (number_samples,)
444 | sampling_frequency: sampling frequency in Hz
445 | time_resolution: number of time frames per second
446 | octave_resolution: number of frequency channels per octave
447 | cqt_kernel: CQT kernel (number_frequencies, fft_length)
448 | Output:
449 | cqt_chromagram: CQT chromagram (number_chromas, number_times)
450 | ```
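
A chromagram folds CQT bins that are whole octaves apart (i.e., `octave_resolution` bins apart) into a single pitch class. As an illustration of the idea (not necessarily the exact code in `zaf.py`):

```
# Sketch: fold a CQT spectrogram into chroma bins by summing bins one octave apart
import numpy as np

def fold_to_chroma(cqt_spectrogram, octave_resolution):
    number_times = np.shape(cqt_spectrogram)[1]
    cqt_chromagram = np.zeros((octave_resolution, number_times))
    for i in range(octave_resolution):
        # Bins i, i+octave_resolution, i+2*octave_resolution, ... share the same chroma
        cqt_chromagram[i, :] = np.sum(cqt_spectrogram[i::octave_resolution, :], axis=0)
    return cqt_chromagram
```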
451 |
452 | #### Example: Compute and display the CQT chromagram.
453 |
454 | ```
455 | # Import the needed modules
456 | import numpy as np
457 | import zaf
458 | import matplotlib.pyplot as plt
459 |
460 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
461 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
462 | audio_signal = np.mean(audio_signal, 1)
463 |
464 | # Compute the CQT kernel
465 | octave_resolution = 24
466 | minimum_frequency = 55
467 | maximum_frequency = 3520
468 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
469 |
470 | # Compute the CQT chromagram using the kernel
471 | time_resolution = 25
472 | cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)
473 |
474 | # Display the CQT chromagram in seconds
475 | plt.figure(figsize=(14, 3))
476 | zaf.cqtchromshow(cqt_chromagram, time_resolution, xtick_step=1)
477 | plt.title("CQT chromagram")
478 | plt.tight_layout()
479 | plt.show()
480 | ```
481 |
482 |
483 |
484 |
485 | ### dct
486 |
487 | Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
488 |
489 | ```
490 | audio_dct = zaf.dct(audio_signal, dct_type)
491 |
492 | Inputs:
493 | audio_signal: audio signal (window_length,)
494 |     dct_type: DCT type (1, 2, 3, or 4)
495 | Output:
496 | audio_dct: audio DCT (number_frequencies,)
497 | ```
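
Computing a DCT through the FFT is a standard trick: for instance, the DCT-II of a signal can be obtained from the FFT of the signal concatenated with its mirror image, followed by a phase rotation. A sketch of that identity (an illustration, not necessarily the exact code in `zaf.py`), checked against SciPy:

```
# Sketch: DCT-II via the FFT of the mirrored signal, checked against SciPy
import numpy as np
import scipy.fftpack

def dct2_via_fft(audio_segment):
    window_length = len(audio_segment)
    # Concatenate the signal with its mirror image and keep the first half of the FFT
    audio_fft = np.fft.fft(np.concatenate((audio_segment, audio_segment[::-1])))[0:window_length]
    # Rotate the phase, take the real part, and normalize to match norm="ortho"
    audio_dct = np.real(np.exp(-1j*np.pi*np.arange(window_length)/(2*window_length))*audio_fft)
    audio_dct = audio_dct*np.sqrt(1/(2*window_length))
    audio_dct[0] = audio_dct[0]/np.sqrt(2)
    return audio_dct

audio_segment = np.random.rand(1024)
print(np.allclose(dct2_via_fft(audio_segment), scipy.fftpack.dct(audio_segment, type=2, norm="ortho")))  # True
```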
498 |
499 | #### Example: Compute the 4 different DCTs and compare them to SciPy's DCTs.
500 |
501 | ```
502 | # Import the needed modules
503 | import numpy as np
504 | import zaf
505 | import scipy.fftpack
506 | import matplotlib.pyplot as plt
507 |
508 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
509 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
510 | audio_signal = np.mean(audio_signal, 1)
511 |
512 | # Get an audio segment for a given window length
513 | window_length = 1024
514 | audio_segment = audio_signal[0:window_length]
515 |
516 | # Compute the DCT-I, II, III, and IV
517 | audio_dct1 = zaf.dct(audio_segment, 1)
518 | audio_dct2 = zaf.dct(audio_segment, 2)
519 | audio_dct3 = zaf.dct(audio_segment, 3)
520 | audio_dct4 = zaf.dct(audio_segment, 4)
521 |
522 | # Compute SciPy's DCT-I, II, III, and IV (orthogonalized)
523 | scipy_dct1 = scipy.fftpack.dct(audio_segment, type=1, norm="ortho")
524 | scipy_dct2 = scipy.fftpack.dct(audio_segment, type=2, norm="ortho")
525 | scipy_dct3 = scipy.fftpack.dct(audio_segment, type=3, norm="ortho")
526 | scipy_dct4 = scipy.fftpack.dct(audio_segment, type=4, norm="ortho")
527 |
528 | # Plot the DCT-I, II, III, and IV, SciPy's versions, and their differences
529 | plt.figure(figsize=(14, 7))
530 | plt.subplot(3, 4, 1), plt.plot(audio_dct1), plt.autoscale(tight=True), plt.title("DCT-I")
531 | plt.subplot(3, 4, 2), plt.plot(audio_dct2), plt.autoscale(tight=True), plt.title("DCT-II")
532 | plt.subplot(3, 4, 3), plt.plot(audio_dct3), plt.autoscale(tight=True), plt.title("DCT-III")
533 | plt.subplot(3, 4, 4), plt.plot(audio_dct4), plt.autoscale(tight=True), plt.title("DCT-IV")
534 | plt.subplot(3, 4, 5), plt.plot(scipy_dct1), plt.autoscale(tight=True), plt.title("SciPy's DCT-I")
535 | plt.subplot(3, 4, 6), plt.plot(scipy_dct2), plt.autoscale(tight=True), plt.title("SciPy's DCT-II")
536 | plt.subplot(3, 4, 7), plt.plot(scipy_dct3), plt.autoscale(tight=True), plt.title("SciPy's DCT-III")
537 | plt.subplot(3, 4, 8), plt.plot(scipy_dct4), plt.autoscale(tight=True), plt.title("SciPy's DCT-IV")
538 | plt.subplot(3, 4, 9), plt.plot(audio_dct1-scipy_dct1), plt.autoscale(tight=True), plt.title("DCT-I - SciPy's DCT-I")
539 | plt.subplot(3, 4, 10), plt.plot(audio_dct2-scipy_dct2), plt.autoscale(tight=True), plt.title("DCT-II - SciPy's DCT-II")
540 | plt.subplot(3, 4, 11), plt.plot(audio_dct3-scipy_dct3), plt.autoscale(tight=True), plt.title("DCT-III - SciPy's DCT-III")
541 | plt.subplot(3, 4, 12), plt.plot(audio_dct4-scipy_dct4), plt.autoscale(tight=True), plt.title("DCT-IV - SciPy's DCT-IV")
542 | plt.tight_layout()
543 | plt.show()
544 | ```
545 |
546 |
547 |
548 |
549 | ### dst
550 |
551 | Compute the discrete sine transform (DST) using the fast Fourier transform (FFT).
552 |
553 | ```
554 | audio_dst = zaf.dst(audio_signal, dst_type)
555 |
556 | Inputs:
557 | audio_signal: audio signal (window_length,)
558 | dst_type: DST type (1, 2, 3, or 4)
559 | Output:
560 | audio_dst: audio DST (number_frequencies,)
561 | ```
562 |
563 | #### Example: Compute the 4 different DSTs and compare their respective inverses with the original audio.
564 |
565 | ```
566 | # Import the needed modules
567 | import numpy as np
568 | import zaf
569 | import matplotlib.pyplot as plt
570 |
571 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
572 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
573 | audio_signal = np.mean(audio_signal, 1)
574 |
575 | # Get an audio segment for a given window length
576 | window_length = 1024
577 | audio_segment = audio_signal[0:window_length]
578 |
579 | # Compute the DST-I, II, III, and IV
580 | audio_dst1 = zaf.dst(audio_segment, 1)
581 | audio_dst2 = zaf.dst(audio_segment, 2)
582 | audio_dst3 = zaf.dst(audio_segment, 3)
583 | audio_dst4 = zaf.dst(audio_segment, 4)
584 |
585 | # Compute their respective inverses, i.e., DST-I, III, II, and IV
586 | audio_idst1 = zaf.dst(audio_dst1, 1)
587 | audio_idst2 = zaf.dst(audio_dst2, 3)
588 | audio_idst3 = zaf.dst(audio_dst3, 2)
589 | audio_idst4 = zaf.dst(audio_dst4, 4)
590 |
591 | # Plot the DST-I, II, III, and IV, their respective inverses, and their differences with the original audio segment
592 | plt.figure(figsize=(14, 7))
593 | plt.subplot(3, 4, 1), plt.plot(audio_dst1), plt.autoscale(tight=True), plt.title("DST-I")
594 | plt.subplot(3, 4, 2), plt.plot(audio_dst2), plt.autoscale(tight=True), plt.title("DST-II")
595 | plt.subplot(3, 4, 3), plt.plot(audio_dst3), plt.autoscale(tight=True), plt.title("DST-III")
596 | plt.subplot(3, 4, 4), plt.plot(audio_dst4), plt.autoscale(tight=True), plt.title("DST-IV")
597 | plt.subplot(3, 4, 5), plt.plot(audio_idst1), plt.autoscale(tight=True), plt.title("Inverse DST-I (DST-I)")
598 | plt.subplot(3, 4, 6), plt.plot(audio_idst2), plt.autoscale(tight=True), plt.title("Inverse DST-II (DST-III)")
599 | plt.subplot(3, 4, 7), plt.plot(audio_idst3), plt.autoscale(tight=True), plt.title("Inverse DST-III (DST-II)")
600 | plt.subplot(3, 4, 8), plt.plot(audio_idst4), plt.autoscale(tight=True), plt.title("Inverse DST-IV (DST-IV)")
601 | plt.subplot(3, 4, 9), plt.plot(audio_idst1-audio_segment), plt.autoscale(tight=True)
602 | plt.title("Inverse DST-I - audio segment")
603 | plt.subplot(3, 4, 10), plt.plot(audio_idst2-audio_segment), plt.autoscale(tight=True)
604 | plt.title("Inverse DST-II - audio segment")
605 | plt.subplot(3, 4, 11), plt.plot(audio_idst3-audio_segment), plt.autoscale(tight=True)
606 | plt.title("Inverse DST-III - audio segment")
607 | plt.subplot(3, 4, 12), plt.plot(audio_idst4-audio_segment), plt.autoscale(tight=True)
608 | plt.title("Inverse DST-IV - audio segment")
609 | plt.tight_layout()
610 | plt.show()
611 | ```
612 |
613 |
614 |
615 |
616 | ### mdct
617 |
618 | Compute the modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
619 |
620 | ```
621 | audio_mdct = zaf.mdct(audio_signal, window_function)
622 |
623 | Inputs:
624 | audio_signal: audio signal (number_samples,)
625 | window_function: window function (window_length,)
626 | Output:
627 | audio_mdct: audio MDCT (number_frequencies, number_times)
628 | ```
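
The MDCT maps frames of `window_length` samples, overlapping by half, to `window_length/2` frequency bins. For reference, the transform of a single windowed frame follows the textbook definition below (a direct, non-FFT sketch, up to the normalization used in `zaf.py`):

```
# Sketch: direct MDCT of one windowed frame (textbook definition, for reference only)
import numpy as np

def mdct_frame(windowed_frame):
    window_length = len(windowed_frame)
    sample_indices = np.arange(window_length)[:, None]
    frequency_indices = np.arange(int(window_length/2))[None, :]
    # X[k] = sum_n x[n]*cos(2*pi/N*(n+0.5+N/4)*(k+0.5))
    mdct_basis = np.cos(2*np.pi/window_length*(sample_indices+0.5+window_length/4)*(frequency_indices+0.5))
    return windowed_frame @ mdct_basis
```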
629 |
630 | #### Example: Compute and display the MDCT as used in the AC-3 audio coding format.
631 |
632 | ```
633 | # Import the needed modules
634 | import numpy as np
635 | import zaf
636 | import matplotlib.pyplot as plt
637 |
638 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
639 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
640 | audio_signal = np.mean(audio_signal, 1)
641 |
642 | # Compute the Kaiser-Bessel-derived (KBD) window as used in the AC-3 audio coding format
643 | window_length = 512
644 | alpha_value = 5
645 | window_function = np.kaiser(int(window_length/2)+1, alpha_value*np.pi)
646 | window_function2 = np.cumsum(window_function[1:int(window_length/2)])
647 | window_function = np.sqrt(np.concatenate((window_function2, window_function2[int(window_length/2)::-1]))
648 | /np.sum(window_function))
649 |
650 | # Compute the MDCT
651 | audio_mdct = zaf.mdct(audio_signal, window_function)
652 |
653 | # Display the MDCT in dB, seconds, and Hz
654 | number_samples = len(audio_signal)
655 | plt.figure(figsize=(14, 7))
656 | zaf.specshow(np.absolute(audio_mdct), number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
657 | plt.title("MDCT (dB)")
658 | plt.tight_layout()
659 | plt.show()
660 | ```
661 |
662 |
663 |
664 |
665 | ### imdct
666 |
667 | Compute the inverse modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
668 |
669 | ```
670 | audio_signal = zaf.imdct(audio_mdct, window_function)
671 |
672 | Inputs:
673 | audio_mdct: audio MDCT (number_frequencies, number_times)
674 | window_function: window function (window_length,)
675 | Output:
676 | audio_signal: audio signal (number_samples,)
677 | ```
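
Perfect reconstruction relies on time-domain aliasing cancellation, which requires the window to satisfy the Princen-Bradley condition w[n]^2 + w[n+N/2]^2 = 1. The Vorbis slope window used in the example below satisfies it, as this quick sketch checks:

```
# Sketch: check the Princen-Bradley condition for the Vorbis window used below
import numpy as np

window_length = 2048
window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))
half_length = int(window_length/2)
print(np.allclose(window_function[0:half_length]**2+window_function[half_length:]**2, 1))  # True
```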
678 |
679 | #### Example: Verify that the MDCT is perfectly invertible.
680 |
681 | ```
682 | # Import the needed modules
683 | import numpy as np
684 | import zaf
685 | import matplotlib.pyplot as plt
686 |
687 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
688 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
689 | audio_signal = np.mean(audio_signal, 1)
690 |
691 | # Compute the MDCT with a slope function as used in the Vorbis audio coding format
692 | window_length = 2048
693 | window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))
694 | audio_mdct = zaf.mdct(audio_signal, window_function)
695 |
696 | # Compute the inverse MDCT
697 | audio_signal2 = zaf.imdct(audio_mdct, window_function)
698 | audio_signal2 = audio_signal2[0:len(audio_signal)]
699 |
700 | # Compute the differences between the original signal and the resynthesized one
701 | audio_differences = audio_signal-audio_signal2
702 | y_max = np.max(np.absolute(audio_differences))
703 |
704 | # Display the original and resynthesized signals, and their differences in seconds
705 | xtick_step = 1
706 | plt.figure(figsize=(14, 7))
707 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
708 | plt.ylim(-1, 1), plt.title("Original signal")
709 | plt.subplot(3, 1, 2), zaf.sigplot(audio_signal2, sampling_frequency, xtick_step)
710 | plt.ylim(-1, 1), plt.title("Resynthesized signal")
711 | plt.subplot(3, 1, 3), zaf.sigplot(audio_differences, sampling_frequency, xtick_step)
712 | plt.ylim(-y_max, y_max), plt.title("Original - resynthesized signal")
713 | plt.tight_layout()
714 | plt.show()
715 | ```
716 |
717 |
718 |
719 |
720 | ## examples.ipynb
721 |
722 | This Jupyter notebook shows some examples for the different functions of the Python module `zaf`.
723 |
724 | See [Jupyter notebook viewer](https://nbviewer.jupyter.org/github/zafarrafii/Zaf-Python/blob/master/examples.ipynb).
725 |
726 |
727 | ## audio_file.wav
728 |
729 | 23-second audio excerpt from the song *Que Pena Tanto Faz* performed by *Tamy*.
730 |
731 |
732 | ## Author
733 |
734 | - Zafar Rafii
735 | - http://zafarrafii.com/
736 | - [CV](http://zafarrafii.com/Zafar%20Rafii%20-%20C.V..pdf)
737 | - [GitHub](https://github.com/zafarrafii)
738 | - [LinkedIn](https://www.linkedin.com/in/zafarrafii/)
739 | - [Google Scholar](https://scholar.google.com/citations?user=8wbS2EsAAAAJ&hl=en)
740 |
--------------------------------------------------------------------------------
/audio_file.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/audio_file.wav
--------------------------------------------------------------------------------
/images/cqtchromagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtchromagram.png
--------------------------------------------------------------------------------
/images/cqtkernel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtkernel.png
--------------------------------------------------------------------------------
/images/cqtspectrogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/cqtspectrogram.png
--------------------------------------------------------------------------------
/images/dct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/dct.png
--------------------------------------------------------------------------------
/images/dst.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/dst.png
--------------------------------------------------------------------------------
/images/imdct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/imdct.png
--------------------------------------------------------------------------------
/images/istft.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/istft.png
--------------------------------------------------------------------------------
/images/mdct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/mdct.png
--------------------------------------------------------------------------------
/images/melfilterbank.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/melfilterbank.png
--------------------------------------------------------------------------------
/images/melspectrogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/melspectrogram.png
--------------------------------------------------------------------------------
/images/mfcc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/mfcc.png
--------------------------------------------------------------------------------
/images/stft.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zafarrafii/Zaf-Python/9589f61f8da499ba53bed3465215b8cc54d287a7/images/stft.png
--------------------------------------------------------------------------------
/zaf.py:
--------------------------------------------------------------------------------
1 | """
2 | This Python module implements a number of functions for audio signal analysis.
3 |
4 | Functions:
5 | stft - Compute the short-time Fourier transform (STFT).
6 | istft - Compute the inverse STFT.
7 | melfilterbank - Compute the mel filterbank.
8 | melspectrogram - Compute the mel spectrogram using a mel filterbank.
9 | mfcc - Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
10 | cqtkernel - Compute the constant-Q transform (CQT) kernel.
11 | cqtspectrogram - Compute the CQT spectrogram using a CQT kernel.
12 | cqtchromagram - Compute the CQT chromagram using a CQT kernel.
13 | dct - Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
14 | dst - Compute the discrete sine transform (DST) using the FFT.
15 | mdct - Compute the modified discrete cosine transform (MDCT) using the FFT.
16 | imdct - Compute the inverse MDCT using the FFT.
17 |
18 | Other:
19 | wavread - Read a WAVE file (using SciPy).
20 | wavwrite - Write a WAVE file (using SciPy).
21 | sigplot - Plot a signal in seconds.
22 | specshow - Display an spectrogram in dB, seconds, and Hz.
23 | melspecshow - Display a mel spectrogram in dB, seconds, and Hz.
24 | mfccshow - Display MFCCs in seconds.
25 | cqtspecshow - Display a CQT spectrogram in dB, seconds, and Hz.
26 | cqtchromshow - Display a CQT chromagram in seconds.
27 |
28 | Author:
29 | Zafar Rafii
30 | zafarrafii@gmail.com
31 | http://zafarrafii.com
32 | https://github.com/zafarrafii
33 | https://www.linkedin.com/in/zafarrafii/
34 | 08/24/21
35 | """
36 |
37 | import numpy as np
38 | import scipy.sparse
39 | import scipy.signal
40 | import scipy.fftpack
41 | import scipy.io.wavfile
42 | import matplotlib.pyplot as plt
43 |
44 |
45 | def stft(audio_signal, window_function, step_length):
46 | """
47 | Compute the short-time Fourier transform (STFT).
48 |
49 | Inputs:
50 | audio_signal: audio signal (number_samples,)
51 | window_function: window function (window_length,)
52 | step_length: step length in samples
53 | Output:
54 | audio_stft: audio STFT (window_length, number_frames)
55 |
56 | Example: Compute and display the spectrogram from an audio file.
57 | # Import the needed modules
58 | import numpy as np
59 | import scipy.signal
60 | import zaf
61 | import matplotlib.pyplot as plt
62 |
63 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
64 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
65 | audio_signal = np.mean(audio_signal, 1)
66 |
67 | # Set the window duration in seconds (audio is stationary around 40 milliseconds)
68 | window_duration = 0.04
69 |
70 | # Derive the window length in samples (use powers of 2 for faster FFT and constant overlap-add (COLA))
71 | window_length = pow(2, int(np.ceil(np.log2(window_duration*sampling_frequency))))
72 |
73 | # Compute the window function (use SciPy's periodic Hamming window for COLA as NumPy's Hamming window is symmetric)
74 | window_function = scipy.signal.hamming(window_length, sym=False)
75 |
76 | # Set the step length in samples (half of the window length for COLA)
77 | step_length = int(window_length/2)
78 |
79 | # Compute the STFT
80 | audio_stft = zaf.stft(audio_signal, window_function, step_length)
81 |
82 | # Derive the magnitude spectrogram (without the DC component and the mirrored frequencies)
83 | audio_spectrogram = np.absolute(audio_stft[1:int(window_length/2)+1, :])
84 |
85 | # Display the spectrogram in dB, seconds, and Hz
86 | number_samples = len(audio_signal)
87 | plt.figure(figsize=(14, 7))
88 | zaf.specshow(audio_spectrogram, number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
89 | plt.title("Spectrogram (dB)")
90 | plt.tight_layout()
91 | plt.show()
92 | """
93 |
94 | # Get the number of samples and the window length in samples
95 | number_samples = len(audio_signal)
96 | window_length = len(window_function)
97 |
98 | # Derive the zero-padding length at the start and at the end of the signal to center the windows
99 | padding_length = int(np.floor(window_length / 2))
100 |
101 | # Compute the number of time frames given the zero-padding at the start and at the end of the signal
102 | number_times = (
103 | int(
104 | np.ceil(
105 | ((number_samples + 2 * padding_length) - window_length) / step_length
106 | )
107 | )
108 | + 1
109 | )
110 |
111 | # Zero-pad the start and the end of the signal to center the windows
112 | audio_signal = np.pad(
113 | audio_signal,
114 | (
115 | padding_length,
116 | (
117 | number_times * step_length
118 | + (window_length - step_length)
119 | - padding_length
120 | )
121 | - number_samples,
122 | ),
123 | "constant",
124 | constant_values=0,
125 | )
126 |
127 | # Initialize the STFT
128 | audio_stft = np.zeros((window_length, number_times))
129 |
130 | # Loop over the time frames
131 | i = 0
132 | for j in range(number_times):
133 |
134 | # Window the signal
135 | audio_stft[:, j] = audio_signal[i : i + window_length] * window_function
136 | i = i + step_length
137 |
138 | # Compute the Fourier transform of the frames using the FFT
139 | audio_stft = np.fft.fft(audio_stft, axis=0)
140 |
141 | return audio_stft
142 |
143 |
144 | def istft(audio_stft, window_function, step_length):
145 | """
146 | Compute the inverse short-time Fourier transform (STFT).
147 |
148 | Inputs:
149 | audio_stft: audio STFT (window_length, number_frames)
150 | window_function: window function (window_length,)
151 | step_length: step length in samples
152 | Output:
153 | audio_signal: audio signal (number_samples,)
154 |
155 | Example: Estimate the center and the sides from a stereo audio file.
156 | # Import the needed modules
157 | import numpy as np
158 | import scipy.signal
159 | import zaf
160 | import matplotlib.pyplot as plt
161 |
162 | # Read the (stereo) audio signal with its sampling frequency in Hz
163 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
164 |
165 | # Set the parameters for the STFT
166 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
167 | window_function = scipy.signal.hamming(window_length, sym=False)
168 | step_length = int(window_length/2)
169 |
170 | # Compute the STFTs for the left and right channels
171 | audio_stft1 = zaf.stft(audio_signal[:, 0], window_function, step_length)
172 | audio_stft2 = zaf.stft(audio_signal[:, 1], window_function, step_length)
173 |
174 | # Derive the magnitude spectrograms (with DC component) for the left and right channels
175 | number_frequencies = int(window_length/2)+1
176 | audio_spectrogram1 = abs(audio_stft1[0:number_frequencies, :])
177 | audio_spectrogram2 = abs(audio_stft2[0:number_frequencies, :])
178 |
179 | # Estimate the time-frequency masks for the left and right channels for the center
180 | center_mask1 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram1
181 | center_mask2 = np.minimum(audio_spectrogram1, audio_spectrogram2)/audio_spectrogram2
182 |
183 | # Derive the STFTs for the left and right channels for the center (with mirrored frequencies)
184 | center_stft1 = np.multiply(np.concatenate((center_mask1, center_mask1[-2:0:-1, :])), audio_stft1)
185 | center_stft2 = np.multiply(np.concatenate((center_mask2, center_mask2[-2:0:-1, :])), audio_stft2)
186 |
187 | # Synthesize the signals for the left and right channels for the center
188 | center_signal1 = zaf.istft(center_stft1, window_function, step_length)
189 | center_signal2 = zaf.istft(center_stft2, window_function, step_length)
190 |
191 | # Derive the final stereo center and sides signals
192 | center_signal = np.stack((center_signal1, center_signal2), axis=1)
193 | center_signal = center_signal[0:np.shape(audio_signal)[0], :]
194 | sides_signal = audio_signal-center_signal
195 |
196 | # Write the center and sides signals
197 | zaf.wavwrite(center_signal, sampling_frequency, "center_file.wav")
198 | zaf.wavwrite(sides_signal, sampling_frequency, "sides_file.wav")
199 |
200 | # Display the original, center, and sides signals in seconds
201 | xtick_step = 1
202 | plt.figure(figsize=(14, 7))
203 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
204 | plt.ylim(-1, 1), plt.title("Original signal")
205 | plt.subplot(3, 1, 2), zaf.sigplot(center_signal, sampling_frequency, xtick_step)
206 | plt.ylim(-1, 1), plt.title("Center signal")
207 | plt.subplot(3, 1, 3), zaf.sigplot(sides_signal, sampling_frequency, xtick_step)
208 | plt.ylim(-1, 1), plt.title("Sides signal")
209 | plt.tight_layout()
210 | plt.show()
211 | """
212 |
213 | # Get the window length in samples and the number of time frames
214 | window_length, number_times = np.shape(audio_stft)
215 |
216 | # Compute the number of samples for the signal
217 | number_samples = number_times * step_length + (window_length - step_length)
218 |
219 | # Initialize the signal
220 | audio_signal = np.zeros(number_samples)
221 |
222 | # Compute the inverse Fourier transform of the frames and take the real part to ensure real values
223 | audio_stft = np.real(np.fft.ifft(audio_stft, axis=0))
224 |
225 | # Loop over the time frames
226 | i = 0
227 | for j in range(number_times):
228 |
229 | # Perform a constant overlap-add (COLA) of the signal (with proper window function and step length)
230 | audio_signal[i : i + window_length] = (
231 | audio_signal[i : i + window_length] + audio_stft[:, j]
232 | )
233 | i = i + step_length
234 |
235 | # Remove the zero-padding at the start and at the end of the signal
236 | audio_signal = audio_signal[
237 | window_length - step_length : number_samples - (window_length - step_length)
238 | ]
239 |
240 | # Normalize the signal by the gain introduced by the COLA (if any)
241 | audio_signal = audio_signal / sum(window_function[0:window_length:step_length])
242 |
243 | return audio_signal
244 |
245 |
246 | def melfilterbank(sampling_frequency, window_length, number_filters):
247 | """
248 | Compute the mel filterbank.
249 |
250 | Inputs:
251 | sampling_frequency: sampling frequency in Hz
252 | window_length: window length for the Fourier analysis in samples
253 |         number_filters: number of mel filters
254 | Output:
255 | mel_filterbank: mel filterbank (sparse) (number_mels, number_frequencies)
256 |
257 | Example: Compute and display the mel filterbank.
258 | # Import the needed modules
259 | import numpy as np
260 | import zaf
261 | import matplotlib.pyplot as plt
262 |
263 | # Compute the mel filterbank using some parameters
264 | sampling_frequency = 44100
265 | window_length = pow(2, int(np.ceil(np.log2(0.04 * sampling_frequency))))
266 | number_mels = 128
267 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
268 |
269 | # Display the mel filterbank
270 | plt.figure(figsize=(14, 5))
271 | plt.imshow(mel_filterbank.toarray(), aspect="auto", cmap="jet", origin="lower")
272 | plt.title("Mel filterbank")
273 | plt.xlabel("Frequency index")
274 | plt.ylabel("Mel index")
275 | plt.tight_layout()
276 | plt.show()
277 | """
278 |
279 | # Compute the minimum and maximum mels
280 | minimum_mel = 2595 * np.log10(1 + (sampling_frequency / window_length) / 700)
281 | maximum_mel = 2595 * np.log10(1 + (sampling_frequency / 2) / 700)
282 |
283 | # Derive the width of the half-overlapping filters in the mel scale (constant)
284 | filter_width = 2 * (maximum_mel - minimum_mel) / (number_filters + 1)
285 |
286 | # Compute the start and end indices of the filters in the mel scale (linearly spaced)
287 | filter_indices = np.arange(minimum_mel, maximum_mel + 1, filter_width / 2)
288 |
289 | # Derive the indices of the filters in the linear frequency scale (log spaced)
290 | filter_indices = np.round(
291 | 700
292 | * (np.power(10, filter_indices / 2595) - 1)
293 | * window_length
294 | / sampling_frequency
295 | ).astype(int)
296 |
297 | # Initialize the mel filterbank
298 | mel_filterbank = np.zeros((number_filters, int(window_length / 2)))
299 |
300 | # Loop over the filters
301 | for i in range(number_filters):
302 |
303 | # Compute the left and right sides of the triangular filters
304 | # (this is more accurate than creating triangular filters directly)
305 | mel_filterbank[i, filter_indices[i] - 1 : filter_indices[i + 1]] = np.linspace(
306 | 0,
307 | 1,
308 | num=filter_indices[i + 1] - filter_indices[i] + 1,
309 | )
310 | mel_filterbank[
311 | i, filter_indices[i + 1] - 1 : filter_indices[i + 2]
312 | ] = np.linspace(
313 | 1,
314 | 0,
315 | num=filter_indices[i + 2] - filter_indices[i + 1] + 1,
316 | )
317 |
318 | # Make the mel filterbank sparse by saving it as a compressed sparse row matrix
319 | mel_filterbank = scipy.sparse.csr_matrix(mel_filterbank)
320 |
321 | return mel_filterbank
322 |
323 |
324 | def melspectrogram(audio_signal, window_function, step_length, mel_filterbank):
325 | """
326 | Compute the mel spectrogram using a mel filterbank.
327 |
328 | Inputs:
329 | audio_signal: audio signal (number_samples,)
330 | window_function: window function (window_length,)
331 | step_length: step length in samples
332 | mel_filterbank: mel filterbank (number_mels, number_frequencies)
333 | Output:
334 | mel_spectrogram: mel spectrogram (number_mels, number_times)
335 |
336 | Example: Compute and display the mel spectrogram.
337 | # Import the needed modules
338 | import numpy as np
339 | import scipy.signal
340 | import zaf
341 | import matplotlib.pyplot as plt
342 |
343 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
344 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
345 | audio_signal = np.mean(audio_signal, 1)
346 |
347 | # Set the parameters for the Fourier analysis
348 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
349 | window_function = scipy.signal.hamming(window_length, sym=False)
350 | step_length = int(window_length/2)
351 |
352 | # Compute the mel filterbank
353 | number_mels = 128
354 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
355 |
356 | # Compute the mel spectrogram using the filterbank
357 | mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)
358 |
359 | # Display the mel spectrogram in dB, seconds, and Hz
360 | number_samples = len(audio_signal)
361 | plt.figure(figsize=(14, 5))
362 | zaf.melspecshow(mel_spectrogram, number_samples, sampling_frequency, window_length, xtick_step=1)
363 | plt.title("Mel spectrogram (dB)")
364 | plt.tight_layout()
365 | plt.show()
366 | """
367 |
368 | # Compute the magnitude spectrogram (without the DC component and the mirrored frequencies)
369 | audio_stft = stft(audio_signal, window_function, step_length)
370 | audio_spectrogram = abs(audio_stft[1 : int(len(window_function) / 2) + 1, :])
371 |
372 | # Compute the mel spectrogram by using the filterbank
373 | mel_spectrogram = np.matmul(mel_filterbank.toarray(), audio_spectrogram)
374 |
375 | return mel_spectrogram
376 |
377 |
378 | def mfcc(
379 | audio_signal, window_function, step_length, mel_filterbank, number_coefficients
380 | ):
381 | """
382 | Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
383 |
384 | Inputs:
385 | audio_signal: audio signal (number_samples,)
386 | window_function: window function (window_length,)
387 | step_length: step length in samples
388 | mel_filterbank: mel filterbank (number_mels, number_frequencies)
389 | number_coefficients: number of coefficients (without the 0th coefficient)
390 | Output:
391 | audio_mfcc: audio MFCCs (number_coefficients, number_times)
392 |
393 | Example: Compute and display the MFCCs, delta MFCCs, and delta-delta MFCCs.
394 | # Import the needed modules
395 | import numpy as np
396 | import scipy.signal
397 | import zaf
398 | import matplotlib.pyplot as plt
399 |
400 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
401 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
402 | audio_signal = np.mean(audio_signal, 1)
403 |
404 | # Set the parameters for the Fourier analysis
405 | window_length = pow(2, int(np.ceil(np.log2(0.04*sampling_frequency))))
406 | window_function = scipy.signal.hamming(window_length, sym=False)
407 | step_length = int(window_length/2)
408 |
409 | # Compute the mel filterbank
410 | number_mels = 40
411 | mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
412 |
413 | # Compute the MFCCs using the filterbank
414 | number_coefficients = 20
415 | audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients)
416 |
417 | # Compute the delta and delta-delta MFCCs
418 | audio_dmfcc = np.diff(audio_mfcc, n=1, axis=1)
419 | audio_ddmfcc = np.diff(audio_dmfcc, n=1, axis=1)
420 |
421 | # Display the MFCCs, delta MFCCs, and delta-delta MFCCs in seconds
422 | number_samples = len(audio_signal)
423 | xtick_step = 1
424 | plt.figure(figsize=(14, 7))
425 | plt.subplot(3, 1, 1)
426 | zaf.mfccshow(audio_mfcc, number_samples, sampling_frequency, xtick_step), plt.title("MFCCs")
427 | plt.subplot(3, 1, 2)
428 | zaf.mfccshow(audio_dmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta MFCCs")
429 | plt.subplot(3, 1, 3)
430 | zaf.mfccshow(audio_ddmfcc, number_samples, sampling_frequency, xtick_step), plt.title("Delta-delta MFCCs")
431 | plt.tight_layout()
432 | plt.show()
433 | """
434 |
435 | # Compute the power spectrogram (without the DC component and the mirrored frequencies)
436 | audio_stft = stft(audio_signal, window_function, step_length)
437 | audio_spectrogram = np.power(
438 | abs(audio_stft[1 : int(len(window_function) / 2) + 1, :]), 2
439 | )
440 |
441 | # Compute the discrete cosine transform of the log magnitude spectrogram
442 | # mapped onto the mel scale using the filter bank
443 | audio_mfcc = scipy.fftpack.dct(
444 | np.log(
445 | np.matmul(mel_filterbank.toarray(), audio_spectrogram) + np.finfo(float).eps
446 | ),
447 | axis=0,
448 | norm="ortho",
449 | )
450 |
451 | # Keep only the first coefficients (without the 0th)
452 | audio_mfcc = audio_mfcc[1 : number_coefficients + 1, :]
453 |
454 | return audio_mfcc
455 |
456 |
457 | def cqtkernel(
458 | sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency
459 | ):
460 | """
461 | Compute the constant-Q transform (CQT) kernel.
462 |
463 | Inputs:
464 | sampling_frequency: sampling frequency in Hz
465 | octave_resolution: number of frequency channels per octave
466 | minimum_frequency: minimum frequency in Hz
467 | maximum_frequency: maximum frequency in Hz
468 | Output:
469 | cqt_kernel: CQT kernel (sparse) (number_frequencies, fft_length)
470 |
471 | Example: Compute and display a CQT kernel.
472 | # Import the needed modules
473 | import numpy as np
474 | import zaf
475 | import matplotlib.pyplot as plt
476 |
477 | # Set the parameters for the CQT kernel
478 | sampling_frequency = 44100
479 | octave_resolution = 24
480 | minimum_frequency = 55
481 | maximum_frequency = sampling_frequency/2
482 |
483 | # Compute the CQT kernel
484 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
485 |
486 | # Display the magnitude CQT kernel
487 | plt.figure(figsize=(14, 5))
488 | plt.imshow(np.absolute(cqt_kernel).toarray(), aspect="auto", cmap="jet", origin="lower")
489 | plt.title("Magnitude CQT kernel")
490 | plt.xlabel("FFT index")
491 | plt.ylabel("CQT index")
492 | plt.tight_layout()
493 | plt.show()
494 | """
495 |
496 | # Compute the constant ratio of frequency to resolution (= fk/(fk+1-fk))
497 | quality_factor = 1 / (pow(2, 1 / octave_resolution) - 1)
498 |
499 | # Compute the number of frequency channels for the CQT
500 | number_frequencies = round(
501 | octave_resolution * np.log2(maximum_frequency / minimum_frequency)
502 | )
503 |
504 | # Compute the window length for the FFT (= longest window for the minimum frequency)
505 | fft_length = int(
506 | pow(
507 | 2, np.ceil(np.log2(quality_factor * sampling_frequency / minimum_frequency))
508 | )
509 | )
510 |
511 | # Initialize the (complex) CQT kernel
512 | cqt_kernel = np.zeros((number_frequencies, fft_length), dtype=complex)
513 |
514 | # Loop over the frequency channels
515 | for i in range(number_frequencies):
516 |
517 | # Derive the frequency value in Hz
518 | frequency_value = minimum_frequency * pow(2, i / octave_resolution)
519 |
520 | # Compute the window length in samples (nearest odd value to center the temporal kernel on 0)
521 | window_length = (
522 | 2 * round(quality_factor * sampling_frequency / frequency_value / 2) + 1
523 | )
524 |
525 | # Compute the temporal kernel for the current frequency (odd and symmetric)
526 | temporal_kernel = (
527 | np.hamming(window_length)
528 | * np.exp(
529 | 2
530 | * np.pi
531 | * 1j
532 | * quality_factor
533 | * np.arange(-(window_length - 1) / 2, (window_length - 1) / 2 + 1)
534 | / window_length
535 | )
536 | / window_length
537 | )
538 |
539 | # Derive the pad width to center the temporal kernels
540 | pad_width = int((fft_length - window_length + 1) / 2)
541 |
542 | # Save the current temporal kernel at the center
543 | # (the zero-padded temporal kernels are not perfectly symmetric anymore because of the even length here)
544 | cqt_kernel[i, pad_width : pad_width + window_length] = temporal_kernel
545 |
546 | # Derive the spectral kernels by taking the FFT of the temporal kernels
547 | # (the spectral kernels are almost real because the temporal kernels are almost symmetric)
548 | cqt_kernel = np.fft.fft(cqt_kernel, axis=1)
549 |
550 | # Make the CQT kernel sparser by zeroing magnitudes below a threshold
551 | cqt_kernel[np.absolute(cqt_kernel) < 0.01] = 0
552 |
553 | # Make the CQT kernel sparse by saving it as a compressed sparse row matrix
554 | cqt_kernel = scipy.sparse.csr_matrix(cqt_kernel)
555 |
556 | # Get the final CQT kernel by using Parseval's theorem
557 | cqt_kernel = np.conjugate(cqt_kernel) / fft_length
558 |
559 | return cqt_kernel
560 |
561 |
562 | def cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel):
563 | """
564 | Compute the constant-Q transform (CQT) spectrogram using a CQT kernel.
565 |
566 | Inputs:
567 | audio_signal: audio signal (number_samples,)
568 | sampling_frequency: sampling frequency in Hz
569 | time_resolution: number of time frames per second
570 | cqt_kernel: CQT kernel (number_frequencies, fft_length)
571 | Output:
572 | cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)
573 |
574 | Example: Compute and display the CQT spectrogram.
575 | # Import the modules
576 | import numpy as np
577 | import zaf
578 | import matplotlib.pyplot as plt
579 |
580 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
581 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
582 | audio_signal = np.mean(audio_signal, 1)
583 |
584 | # Compute the CQT kernel
585 | octave_resolution = 24
586 | minimum_frequency = 55
587 | maximum_frequency = 3520
588 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
589 |
590 | # Compute the CQT spectrogram using the kernel
591 | time_resolution = 25
592 | cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)
593 |
594 | # Display the CQT spectrogram in dB, seconds, and Hz
595 | plt.figure(figsize=(14, 5))
596 | zaf.cqtspecshow(cqt_spectrogram, time_resolution, octave_resolution, minimum_frequency, xtick_step=1)
597 | plt.title("CQT spectrogram (dB)")
598 | plt.tight_layout()
599 | plt.show()
600 | """
601 |
602 | # Derive the number of time samples per time frame
603 | step_length = round(sampling_frequency / time_resolution)
604 |
605 | # Compute the number of time frames
606 | number_times = int(np.floor(len(audio_signal) / step_length))
607 |
608 | # Get the number of frequency channels and the FFT length
609 | number_frequencies, fft_length = np.shape(cqt_kernel)
610 |
611 | # Zero-pad the signal to center the CQT
612 | audio_signal = np.pad(
613 | audio_signal,
614 | (
615 | int(np.ceil((fft_length - step_length) / 2)),
616 | int(np.floor((fft_length - step_length) / 2)),
617 | ),
618 | "constant",
619 | constant_values=(0, 0),
620 | )
621 |
622 | # Initialize the CQT spectrogram
623 | cqt_spectrogram = np.zeros((number_frequencies, number_times))
624 |
625 | # Loop over the time frames
626 | i = 0
627 | for j in range(number_times):
628 |
629 | # Compute the magnitude CQT using the kernel (a sparse matrix-vector product with the FFT of the frame)
630 | cqt_spectrogram[:, j] = np.absolute(
631 | cqt_kernel * np.fft.fft(audio_signal[i : i + fft_length])
632 | )
633 | i = i + step_length
634 |
635 | return cqt_spectrogram
636 |
637 |
638 | def cqtchromagram(
639 | audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel
640 | ):
641 | """
642 | Compute the constant-Q transform (CQT) chromagram using a CQT kernel.
643 |
644 | Inputs:
645 | audio_signal: audio signal (number_samples,)
646 | sampling_frequency: sampling frequency in Hz
647 | time_resolution: number of time frames per second
648 | octave_resolution: number of frequency channels per octave
649 | cqt_kernel: CQT kernel (number_frequencies, fft_length)
650 | Output:
651 | cqt_chromagram: CQT chromagram (octave_resolution, number_times)
652 |
653 | Example: Compute and display the CQT chromagram.
654 | # Import the needed modules
655 | import numpy as np
656 | import zaf
657 | import matplotlib.pyplot as plt
658 |
659 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
660 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
661 | audio_signal = np.mean(audio_signal, 1)
662 |
663 | # Compute the CQT kernel
664 | octave_resolution = 24
665 | minimum_frequency = 55
666 | maximum_frequency = 3520
667 | cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
668 |
669 | # Compute the CQT chromagram using the kernel
670 | time_resolution = 25
671 | cqt_chromagram = zaf.cqtchromagram(audio_signal, sampling_frequency, time_resolution, octave_resolution, cqt_kernel)
672 |
673 | # Display the CQT chromagram in seconds
674 | plt.figure(figsize=(14, 3))
675 | zaf.cqtchromshow(cqt_chromagram, time_resolution, xtick_step=1)
676 | plt.title("CQT chromagram")
677 | plt.tight_layout()
678 | plt.show()
679 | """
680 |
681 | # Compute the CQT spectrogram
682 | cqt_spectrogram = cqtspectrogram(
683 | audio_signal, sampling_frequency, time_resolution, cqt_kernel
684 | )
685 |
686 | # Get the number of frequency channels and time frames
687 | number_frequencies, number_times = np.shape(cqt_spectrogram)
688 |
689 | # Initialize the CQT chromagram
690 | cqt_chromagram = np.zeros((octave_resolution, number_times))
691 |
692 | # Loop over the chroma channels
693 | for i in range(octave_resolution):
694 |
695 | # Sum the energy of the frequency channels for every chroma
696 | cqt_chromagram[i, :] = np.sum(
697 | cqt_spectrogram[i:number_frequencies:octave_resolution, :], axis=0
698 | )
699 |
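# (When number_frequencies is a multiple of octave_resolution, this loop is
# equivalent to reshaping the spectrogram to
# (number_frequencies // octave_resolution, octave_resolution, number_times)
# and summing over the first axis.)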
700 | return cqt_chromagram
701 |
702 |
703 | def dct(audio_signal, dct_type):
704 | """
705 | Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
706 |
707 | Inputs:
708 | audio_signal: audio signal (window_length,)
709 | dct_type: DCT type (1, 2, 3, or 4)
710 | Output:
711 | audio_dct: audio DCT (number_frequencies,)
712 |
713 | Example: Compute the 4 different DCTs and compare them to SciPy's DCTs.
714 | # Import the needed modules
715 | import numpy as np
716 | import zaf
717 | import scipy.fftpack
718 | import matplotlib.pyplot as plt
719 |
720 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
721 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
722 | audio_signal = np.mean(audio_signal, 1)
723 |
724 | # Get an audio segment for a given window length
725 | window_length = 1024
726 | audio_segment = audio_signal[0:window_length]
727 |
728 | # Compute the DCT-I, II, III, and IV
729 | audio_dct1 = zaf.dct(audio_segment, 1)
730 | audio_dct2 = zaf.dct(audio_segment, 2)
731 | audio_dct3 = zaf.dct(audio_segment, 3)
732 | audio_dct4 = zaf.dct(audio_segment, 4)
733 |
734 | # Compute SciPy's DCT-I, II, III, and IV (orthogonalized)
735 | scipy_dct1 = scipy.fftpack.dct(audio_segment, type=1, norm="ortho")
736 | scipy_dct2 = scipy.fftpack.dct(audio_segment, type=2, norm="ortho")
737 | scipy_dct3 = scipy.fftpack.dct(audio_segment, type=3, norm="ortho")
738 | scipy_dct4 = scipy.fftpack.dct(audio_segment, type=4, norm="ortho")
739 |
740 | # Plot the DCT-I, II, III, and IV, SciPy's versions, and their differences
741 | plt.figure(figsize=(14, 7))
742 | plt.subplot(3, 4, 1), plt.plot(audio_dct1), plt.autoscale(tight=True), plt.title("DCT-I")
743 | plt.subplot(3, 4, 2), plt.plot(audio_dct2), plt.autoscale(tight=True), plt.title("DCT-II")
744 | plt.subplot(3, 4, 3), plt.plot(audio_dct3), plt.autoscale(tight=True), plt.title("DCT-III")
745 | plt.subplot(3, 4, 4), plt.plot(audio_dct4), plt.autoscale(tight=True), plt.title("DCT-IV")
746 | plt.subplot(3, 4, 5), plt.plot(scipy_dct1), plt.autoscale(tight=True), plt.title("SciPy's DCT-I")
747 | plt.subplot(3, 4, 6), plt.plot(scipy_dct2), plt.autoscale(tight=True), plt.title("SciPy's DCT-II")
748 | plt.subplot(3, 4, 7), plt.plot(scipy_dct3), plt.autoscale(tight=True), plt.title("SciPy's DCT-III")
749 | plt.subplot(3, 4, 8), plt.plot(scipy_dct4), plt.autoscale(tight=True), plt.title("SciPy's DCT-IV")
750 | plt.subplot(3, 4, 9), plt.plot(audio_dct1-scipy_dct1), plt.autoscale(tight=True), plt.title("DCT-I - SciPy's DCT-I")
751 | plt.subplot(3, 4, 10), plt.plot(audio_dct2-scipy_dct2), plt.autoscale(tight=True), plt.title("DCT-II - SciPy's DCT-II")
752 | plt.subplot(3, 4, 11), plt.plot(audio_dct3-scipy_dct3), plt.autoscale(tight=True), plt.title("DCT-III - SciPy's DCT-III")
753 | plt.subplot(3, 4, 12), plt.plot(audio_dct4-scipy_dct4), plt.autoscale(tight=True), plt.title("DCT-IV - SciPy's DCT-IV")
754 | plt.tight_layout()
755 | plt.show()
756 | """
757 |
758 | # Check if the DCT type is I, II, III, or IV
759 | if dct_type == 1:
760 |
761 | # Get the number of samples
762 | window_length = len(audio_signal)
763 |
764 | # Pre-process the signal to make the DCT-I matrix orthogonal
765 | # (copy the signal to avoid modifying it outside of the function)
766 | audio_signal = audio_signal.copy()
767 | audio_signal[[0, -1]] = audio_signal[[0, -1]] * np.sqrt(2)
768 |
769 | # Compute the DCT-I using the FFT
770 | audio_dct = np.concatenate((audio_signal, audio_signal[-2:0:-1]))
771 | audio_dct = np.fft.fft(audio_dct)
772 | audio_dct = np.real(audio_dct[0:window_length]) / 2
773 |
774 | # Post-process the results to make the DCT-I matrix orthogonal
775 | audio_dct[[0, -1]] = audio_dct[[0, -1]] / np.sqrt(2)
776 | audio_dct = audio_dct * np.sqrt(2 / (window_length - 1))
777 |
778 | return audio_dct
779 |
780 | elif dct_type == 2:
781 |
782 | # Get the number of samples
783 | window_length = len(audio_signal)
784 |
785 | # Compute the DCT-II using the FFT
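# (The signal is placed at the odd indices of a length-4N array, with a
# time-reversed copy mirrored in the second half, so that the real part of the
# FFT directly gives the DCT-II up to a factor of 2.)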
786 | audio_dct = np.zeros(4 * window_length)
787 | audio_dct[1 : 2 * window_length : 2] = audio_signal
788 | audio_dct[2 * window_length + 1 : 4 * window_length : 2] = audio_signal[::-1]
789 | audio_dct = np.fft.fft(audio_dct)
790 | audio_dct = np.real(audio_dct[0:window_length]) / 2
791 |
792 | # Post-process the results to make the DCT-II matrix orthogonal
793 | audio_dct[0] = audio_dct[0] / np.sqrt(2)
794 | audio_dct = audio_dct * np.sqrt(2 / window_length)
795 |
796 | return audio_dct
797 |
798 | elif dct_type == 3:
799 |
800 | # Get the number of samples
801 | window_length = len(audio_signal)
802 |
803 | # Pre-process the signal to make the DCT-III matrix orthogonal
804 | # (copy the signal to avoid modifying it outside of the function)
805 | audio_signal = audio_signal.copy()
806 | audio_signal[0] = audio_signal[0] * np.sqrt(2)
807 |
808 | # Compute the DCT-III using the FFT
809 | audio_dct = np.zeros(4 * window_length)
810 | audio_dct[0:window_length] = audio_signal
811 | audio_dct[window_length + 1 : 2 * window_length + 1] = -audio_signal[::-1]
812 | audio_dct[2 * window_length + 1 : 3 * window_length] = -audio_signal[1:]
813 | audio_dct[3 * window_length + 1 : 4 * window_length] = audio_signal[:0:-1]
814 | audio_dct = np.fft.fft(audio_dct)
815 | audio_dct = np.real(audio_dct[1 : 2 * window_length : 2]) / 4
816 |
817 | # Post-process the results to make the DCT-III matrix orthogonal
818 | audio_dct = audio_dct * np.sqrt(2 / window_length)
819 |
820 | return audio_dct
821 |
822 | elif dct_type == 4:
823 |
824 | # Get the number of samples
825 | window_length = len(audio_signal)
826 |
827 | # Compute the DCT-IV using the FFT
828 | audio_dct = np.zeros(8 * window_length)
829 | audio_dct[1 : 2 * window_length : 2] = audio_signal
830 | audio_dct[2 * window_length + 1 : 4 * window_length : 2] = -audio_signal[::-1]
831 | audio_dct[4 * window_length + 1 : 6 * window_length : 2] = -audio_signal
832 | audio_dct[6 * window_length + 1 : 8 * window_length : 2] = audio_signal[::-1]
833 | audio_dct = np.fft.fft(audio_dct)
834 | audio_dct = np.real(audio_dct[1 : 2 * window_length : 2]) / 4
835 |
836 | # Post-process the results to make the DCT-IV matrix orthogonal
837 | audio_dct = audio_dct * np.sqrt(2 / window_length)
838 |
839 | return audio_dct
840 |
841 |
842 | def dst(audio_signal, dst_type):
843 | """
844 | Compute the discrete sine transform (DST) using the fast Fourier transform (FFT).
845 |
846 | Inputs:
847 | audio_signal: audio signal (window_length,)
848 | dst_type: DST type (1, 2, 3, or 4)
849 | Output:
850 | audio_dst: audio DST (number_frequencies,)
851 |
852 | Example: Compute the 4 different DSTs and compare their respective inverses with the original audio.
853 | # Import the needed modules
854 | import numpy as np
855 | import zaf
856 | import matplotlib.pyplot as plt
857 |
858 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
859 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
860 | audio_signal = np.mean(audio_signal, 1)
861 |
862 | # Get an audio segment for a given window length
863 | window_length = 1024
864 | audio_segment = audio_signal[0:window_length]
865 |
866 | # Compute the DST-I, II, III, and IV
867 | audio_dst1 = zaf.dst(audio_segment, 1)
868 | audio_dst2 = zaf.dst(audio_segment, 2)
869 | audio_dst3 = zaf.dst(audio_segment, 3)
870 | audio_dst4 = zaf.dst(audio_segment, 4)
871 |
872 | # Compute their respective inverses, i.e., DST-I, II, III, and IV
873 | audio_idst1 = zaf.dst(audio_dst1, 1)
874 | audio_idst2 = zaf.dst(audio_dst2, 3)
875 | audio_idst3 = zaf.dst(audio_dst3, 2)
876 | audio_idst4 = zaf.dst(audio_dst4, 4)
877 |
878 | # Plot the DST-I, II, III, and IV, their respective inverses, and their differences with the original audio segment
879 | plt.figure(figsize=(14, 7))
880 | plt.subplot(3, 4, 1), plt.plot(audio_dst1), plt.autoscale(tight=True), plt.title("DST-I")
881 | plt.subplot(3, 4, 2), plt.plot(audio_dst2), plt.autoscale(tight=True), plt.title("DST-II")
882 | plt.subplot(3, 4, 3), plt.plot(audio_dst3), plt.autoscale(tight=True), plt.title("DST-III")
883 | plt.subplot(3, 4, 4), plt.plot(audio_dst4), plt.autoscale(tight=True), plt.title("DST-IV")
884 | plt.subplot(3, 4, 5), plt.plot(audio_idst1), plt.autoscale(tight=True), plt.title("Inverse DST-I (DST-I)")
885 | plt.subplot(3, 4, 6), plt.plot(audio_idst2), plt.autoscale(tight=True), plt.title("Inverse DST-II (DST-III)")
886 | plt.subplot(3, 4, 7), plt.plot(audio_idst3), plt.autoscale(tight=True), plt.title("Inverse DST-III (DST-II)")
887 | plt.subplot(3, 4, 8), plt.plot(audio_idst4), plt.autoscale(tight=True), plt.title("Inverse DST-IV (DST-IV)")
888 | plt.subplot(3, 4, 9), plt.plot(audio_idst1-audio_segment), plt.autoscale(tight=True)
889 | plt.title("Inverse DST-I - audio segment")
890 | plt.subplot(3, 4, 10), plt.plot(audio_idst2-audio_segment), plt.autoscale(tight=True)
891 | plt.title("Inverse DST-II - audio segment")
892 | plt.subplot(3, 4, 11), plt.plot(audio_idst3-audio_segment), plt.autoscale(tight=True)
893 | plt.title("Inverse DST-III - audio segment")
894 | plt.subplot(3, 4, 12), plt.plot(audio_idst4-audio_segment), plt.autoscale(tight=True)
895 | plt.title("Inverse DST-IV - audio segment")
896 | plt.tight_layout()
897 | plt.show()
898 | """
899 |
900 | # Check if the DST type is I, II, III, or IV
901 | if dst_type == 1:
902 |
903 | # Get the number of samples
904 | window_length = len(audio_signal)
905 |
906 | # Compute the DST-I using the FFT
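# (The signal is oddly extended into a length-(2N+2) array, so that the negative
# imaginary part of the FFT directly gives the DST-I up to a factor of 2.)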
907 | audio_dst = np.zeros(2 * window_length + 2)
908 | audio_dst[1 : window_length + 1] = audio_signal
909 | audio_dst[window_length + 2 :] = -audio_signal[::-1]
910 | audio_dst = np.fft.fft(audio_dst)
911 | audio_dst = -np.imag(audio_dst[1 : window_length + 1]) / 2
912 |
913 | # Post-process the results to make the DST-I matrix orthogonal
914 | audio_dst = audio_dst * np.sqrt(2 / (window_length + 1))
915 |
916 | return audio_dst
917 |
918 | elif dst_type == 2:
919 |
920 | # Get the number of samples
921 | window_length = len(audio_signal)
922 |
923 | # Compute the DST-II using the FFT
924 | audio_dst = np.zeros(4 * window_length)
925 | audio_dst[1 : 2 * window_length : 2] = audio_signal
926 | audio_dst[2 * window_length + 1 : 4 * window_length : 2] = -audio_signal[::-1]
927 | audio_dst = np.fft.fft(audio_dst)
928 | audio_dst = -np.imag(audio_dst[1 : window_length + 1]) / 2
929 |
930 | # Post-process the results to make the DST-II matrix orthogonal
931 | audio_dst[-1] = audio_dst[-1] / np.sqrt(2)
932 | audio_dst = audio_dst * np.sqrt(2 / window_length)
933 |
934 | return audio_dst
935 |
936 | elif dst_type == 3:
937 |
938 | # Get the number of samples
939 | window_length = len(audio_signal)
940 |
941 | # Pre-process the signal to make the DST-III matrix orthogonal
942 | # (copy the signal to avoid modifying it outside of the function)
943 | audio_signal = audio_signal.copy()
944 | audio_signal[-1] = audio_signal[-1] * np.sqrt(2)
945 |
946 | # Compute the DST-III using the FFT
947 | audio_dst = np.zeros(4 * window_length)
948 | audio_dst[1 : window_length + 1] = audio_signal
949 | audio_dst[window_length + 1 : 2 * window_length] = audio_signal[-2::-1]
950 | audio_dst[2 * window_length + 1 : 3 * window_length + 1] = -audio_signal
951 | audio_dst[3 * window_length + 1 : 4 * window_length] = -audio_signal[-2::-1]
952 | audio_dst = np.fft.fft(audio_dst)
953 | audio_dst = -np.imag(audio_dst[1 : 2 * window_length : 2]) / 4
954 |
955 | # Post-process the results to make the DST-III matrix orthogonal
956 | audio_dst = audio_dst * np.sqrt(2 / window_length)
957 |
958 | return audio_dst
959 |
960 | elif dst_type == 4:
961 |
962 | # Initialize the DST-IV
963 | window_length = len(audio_signal)
964 | audio_dst = np.zeros(8 * window_length)
965 |
966 | # Compute the DST-IV using the FFT
967 | audio_dst[1 : 2 * window_length : 2] = audio_signal
968 | audio_dst[2 * window_length + 1 : 4 * window_length : 2] = audio_signal[
969 | window_length - 1 :: -1
970 | ]
971 | audio_dst[4 * window_length + 1 : 6 * window_length : 2] = -audio_signal
972 | audio_dst[6 * window_length + 1 : 8 * window_length : 2] = -audio_signal[
973 | window_length - 1 :: -1
974 | ]
975 | audio_dst = np.fft.fft(audio_dst)
976 | audio_dst = -np.imag(audio_dst[1 : 2 * window_length : 2]) / 4
977 |
978 | # Post-process the results to make the DST-IV matrix orthogonal
979 | audio_dst = audio_dst * np.sqrt(2 / window_length)
980 |
981 | return audio_dst
982 |
983 |
984 | def mdct(audio_signal, window_function):
985 | """
986 | Compute the modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
987 |
988 | Inputs:
989 | audio_signal: audio signal (number_samples,)
990 | window_function: window function (window_length,)
991 | Output:
992 | audio_mdct: audio MDCT (number_frequencies, number_times)
993 |
994 | Example: Compute and display the MDCT as used in the AC-3 audio coding format.
995 | # Import the needed modules
996 | import numpy as np
997 | import zaf
998 | import matplotlib.pyplot as plt
999 |
1000 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
1001 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
1002 | audio_signal = np.mean(audio_signal, 1)
1003 |
1004 | # Compute the Kaiser-Bessel-derived (KBD) window as used in the AC-3 audio coding format
1005 | window_length = 512
1006 | alpha_value = 5
1007 | window_function = np.kaiser(int(window_length/2)+1, alpha_value*np.pi)
1008 | window_function2 = np.cumsum(window_function[1:int(window_length/2)])
1009 | window_function = np.sqrt(np.concatenate((window_function2, window_function2[int(window_length/2)::-1]))
1010 | /np.sum(window_function))
1011 |
1012 | # Compute the MDCT
1013 | audio_mdct = zaf.mdct(audio_signal, window_function)
1014 |
1015 | # Display the MDCT in dB, seconds, and Hz
1016 | number_samples = len(audio_signal)
1017 | plt.figure(figsize=(14, 7))
1018 | zaf.specshow(np.absolute(audio_mdct), number_samples, sampling_frequency, xtick_step=1, ytick_step=1000)
1019 | plt.title("MDCT (dB)")
1020 | plt.tight_layout()
1021 | plt.show()
1022 | """
1023 |
1024 | # Get the number of samples and the window length in samples
1025 | number_samples = len(audio_signal)
1026 | window_length = len(window_function)
1027 |
1028 | # Derive the step length and the number of frequencies (for clarity)
1029 | step_length = int(window_length / 2)
1030 | number_frequencies = int(window_length / 2)
1031 |
1032 | # Derive the number of time frames
1033 | number_times = int(np.ceil(number_samples / step_length)) + 1
1034 |
1035 | # Zero-pad the start and the end of the signal to center the windows
1036 | audio_signal = np.pad(
1037 | audio_signal,
1038 | (step_length, (number_times + 1) * step_length - number_samples),
1039 | "constant",
1040 | constant_values=0,
1041 | )
1042 |
1043 | # Initialize the MDCT
1044 | audio_mdct = np.zeros((number_frequencies, number_times))
1045 |
1046 | # Prepare the pre-processing and post-processing arrays
1047 | preprocessing_array = np.exp(
1048 | -1j * np.pi / window_length * np.arange(0, window_length)
1049 | )
1050 | postprocessing_array = np.exp(
1051 | -1j
1052 | * np.pi
1053 | / window_length
1054 | * (window_length / 2 + 1)
1055 | * np.arange(0.5, window_length / 2 + 0.5)
1056 | )
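
# (The MDCT is thus computed as the real part of a pre- and post-twiddled FFT:
# the windowed frame is multiplied by exp(-1j*pi*n/N) before the FFT and the
# first N/2 bins by exp(-1j*pi*(N/2+1)*(k+1/2)/N) after, with N the window length.)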
1057 |
1058 | # Loop over the time frames
1059 | # (Do the pre- and post-processing, and take the FFT in the loop to avoid storing frames that are twice as long)
1060 | i = 0
1061 | for j in range(number_times):
1062 |
1063 | # Window the signal
1064 | audio_segment = audio_signal[i : i + window_length] * window_function
1065 | i = i + step_length
1066 |
1067 | # Compute the Fourier transform of the windowed segment using the FFT after pre-processing
1068 | audio_segment = np.fft.fft(audio_segment * preprocessing_array)
1069 |
1070 | # Truncate to the first half before post-processing (and take the real to ensure real values)
1071 | audio_mdct[:, j] = np.real(
1072 | audio_segment[0:number_frequencies] * postprocessing_array
1073 | )
1074 |
1075 | return audio_mdct
1076 |
1077 |
1078 | def imdct(audio_mdct, window_function):
1079 | """
1080 | Compute the inverse modified discrete cosine transform (inverse MDCT) using the fast Fourier transform (FFT).
1081 |
1082 | Inputs:
1083 | audio_mdct: audio MDCT (number_frequencies, number_times)
1084 | window_function: window function (window_length,)
1085 | Output:
1086 | audio_signal: audio signal (number_samples,)
1087 |
1088 | Example: Verify that the MDCT is perfectly invertible.
1089 | # Import the needed modules
1090 | import numpy as np
1091 | import zaf
1092 | import matplotlib.pyplot as plt
1093 |
1094 | # Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
1095 | audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
1096 | audio_signal = np.mean(audio_signal, 1)
1097 |
1098 | # Compute the MDCT with a slope function as used in the Vorbis audio coding format
1099 | window_length = 2048
1100 | window_function = np.sin(np.pi/2*pow(np.sin(np.pi/window_length*np.arange(0.5, window_length+0.5)), 2))
1101 | audio_mdct = zaf.mdct(audio_signal, window_function)
1102 |
1103 | # Compute the inverse MDCT
1104 | audio_signal2 = zaf.imdct(audio_mdct, window_function)
1105 | audio_signal2 = audio_signal2[0:len(audio_signal)]
1106 |
1107 | # Compute the differences between the original signal and the resynthesized one
1108 | audio_differences = audio_signal-audio_signal2
1109 | y_max = np.max(np.absolute(audio_differences))
1110 |
1111 | # Display the original and resynthesized signals, and their differences in seconds
1112 | xtick_step = 1
1113 | plt.figure(figsize=(14, 7))
1114 | plt.subplot(3, 1, 1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step)
1115 | plt.ylim(-1, 1), plt.title("Original signal")
1116 | plt.subplot(3, 1, 2), zaf.sigplot(audio_signal2, sampling_frequency, xtick_step)
1117 | plt.ylim(-1, 1), plt.title("Resynthesized signal")
1118 | plt.subplot(3, 1, 3), zaf.sigplot(audio_differences, sampling_frequency, xtick_step)
1119 | plt.ylim(-y_max, y_max), plt.title("Original - resynthesized signal")
1120 | plt.tight_layout()
1121 | plt.show()
1122 | """
1123 |
1124 | # Get the number of frequency channels and time frames
1125 | number_frequencies, number_times = np.shape(audio_mdct)
1126 |
1127 | # Derive the window length and the step length in samples (for clarity)
1128 | window_length = 2 * number_frequencies
1129 | step_length = number_frequencies
1130 |
1131 | # Derive the number of samples for the signal
1132 | number_samples = step_length * (number_times + 1)
1133 |
1134 | # Initialize the audio signal
1135 | audio_signal = np.zeros(number_samples)
1136 |
1137 | # Prepare the pre-processing and post-processing arrays
1138 | preprocessing_array = np.exp(
1139 | -1j
1140 | * np.pi
1141 | / (2 * number_frequencies)
1142 | * (number_frequencies + 1)
1143 | * np.arange(0, number_frequencies)
1144 | )
1145 | postprocessing_array = (
1146 | np.exp(
1147 | -1j
1148 | * np.pi
1149 | / (2 * number_frequencies)
1150 | * np.arange(
1151 | 0.5 + number_frequencies / 2,
1152 | 2 * number_frequencies + number_frequencies / 2 + 0.5,
1153 | )
1154 | )
1155 | / number_frequencies
1156 | )
1157 |
1158 | # Compute the Fourier transform of the frames using the FFT after pre-processing (zero-pad to get twice the length)
1159 | audio_mdct = np.fft.fft(
1160 | audio_mdct * preprocessing_array[:, np.newaxis],
1161 | n=2 * number_frequencies,
1162 | axis=0,
1163 | )
1164 |
1165 | # Apply the window function to the frames after post-processing (take the real to ensure real values)
1166 | audio_mdct = 2 * (
1167 | np.real(audio_mdct * postprocessing_array[:, np.newaxis])
1168 | * window_function[:, np.newaxis]
1169 | )
1170 |
1171 | # Loop over the time frames
1172 | i = 0
1173 | for j in range(number_times):
1174 |
1175 | # Recover the signal with the time-domain aliasing cancellation (TDAC) principle
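# (the overlap-add cancels the time-domain aliasing provided the window satisfies
# the Princen-Bradley condition w(n)^2 + w(n+N/2)^2 = 1, which the KBD and Vorbis
# windows used in the examples do)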
1176 | audio_signal[i : i + window_length] = (
1177 | audio_signal[i : i + window_length] + audio_mdct[:, j]
1178 | )
1179 | i = i + step_length
1180 |
1181 | # Remove the zero-padding at the start and at the end of the signal
1182 | audio_signal = audio_signal[step_length : -step_length - 1]
1183 |
1184 | return audio_signal
1185 |
1186 |
1187 | def wavread(audio_file):
1188 | """
1189 | Read a WAVE file (using SciPy).
1190 |
1191 | Input:
1192 | audio_file: path to an audio file
1193 | Outputs:
1194 | audio_signal: audio signal (number_samples, number_channels)
1195 | sampling_frequency: sampling frequency in Hz
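
Example: Read and display an audio file (a minimal usage sketch based on the other functions in this module).
# Import the needed modules
import zaf
import matplotlib.pyplot as plt

# Read the audio signal (normalized) with its sampling frequency in Hz
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")

# Display the signal in seconds
plt.figure(figsize=(14, 3))
zaf.sigplot(audio_signal, sampling_frequency, xtick_step=1)
plt.title("Audio signal")
plt.tight_layout()
plt.show()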
1196 | """
1197 |
1198 | # Read the audio file and return the sampling frequency in Hz and the non-normalized signal using SciPy
1199 | sampling_frequency, audio_signal = scipy.io.wavfile.read(audio_file)
1200 |
1201 | # Normalize the signal by the data range given the size of an item in bytes
1202 | audio_signal = audio_signal / pow(2, audio_signal.itemsize * 8 - 1)
1203 |
1204 | return audio_signal, sampling_frequency
1205 |
1206 |
1207 | def wavwrite(audio_signal, sampling_frequency, audio_file):
1208 | """
1209 | Write a WAVE file (using SciPy).
1210 |
1211 | Inputs:
1212 | audio_signal: audio signal (number_samples, number_channels)
1213 | sampling_frequency: sampling frequency in Hz
1214 | Output:
1215 | audio_file: path to an audio file
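
Example: Create a mono version of an audio file (a minimal sketch; the output path is arbitrary).
# Import the needed modules
import numpy as np
import zaf

# Read the audio signal (normalized) with its sampling frequency in Hz, and average it over its channels
audio_signal, sampling_frequency = zaf.wavread("audio_file.wav")
audio_signal = np.mean(audio_signal, 1)

# Write the mono signal (here as 64-bit floats)
zaf.wavwrite(audio_signal, sampling_frequency, "audio_file_mono.wav")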
1216 | """
1217 |
1218 | # Write the audio signal using SciPy
1219 | scipy.io.wavfile.write(audio_file, sampling_frequency, audio_signal)
1220 |
1221 |
1222 | def sigplot(
1223 | audio_signal,
1224 | sampling_frequency,
1225 | xtick_step=1,
1226 | ):
1227 | """
1228 | Plot a signal in seconds.
1229 |
1230 | Inputs:
1231 | audio_signal: audio signal ((number_samples,) or (number_samples, number_channels))
1232 | sampling_frequency: sampling frequency in Hz
1233 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1234 | """
1235 |
1236 | # Get the number of samples
1237 | number_samples = np.shape(audio_signal)[0]
1238 |
1239 | # Prepare the tick locations and labels for the x-axis
1240 | xtick_locations = np.arange(
1241 | xtick_step * sampling_frequency,
1242 | number_samples,
1243 | xtick_step * sampling_frequency,
1244 | )
1245 | xtick_labels = np.arange(
1246 | xtick_step, number_samples / sampling_frequency, xtick_step
1247 | ).astype(int)
1248 |
1249 | # Plot the signal in seconds
1250 | plt.plot(audio_signal)
1251 | plt.autoscale(tight=True)
1252 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1253 | plt.xlabel("Time (s)")
1254 |
1255 |
1256 | def specshow(
1257 | audio_spectrogram,
1258 | number_samples,
1259 | sampling_frequency,
1260 | xtick_step=1,
1261 | ytick_step=1000,
1262 | ):
1263 | """
1264 | Display a spectrogram in dB, seconds, and Hz.
1265 |
1266 | Inputs:
1267 | audio_spectrogram: audio spectrogram (without DC and mirrored frequencies) (number_frequencies, number_times)
1268 | number_samples: number of samples from the original signal
1269 | sampling_frequency: sampling frequency from the original signal in Hz
1270 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1271 | ytick_step: step for the y-axis ticks in Hz (default: 1000 Hz)
1272 | """
1273 |
1274 | # Get the number of frequency channels and time frames
1275 | number_frequencies, number_times = np.shape(audio_spectrogram)
1276 |
1277 | # Derive the number of seconds and Hertz
1278 | number_seconds = number_samples / sampling_frequency
1279 | number_hertz = sampling_frequency / 2
1280 |
1281 | # Derive the number of time frames per second and the number of frequency channels per Hz
1282 | time_resolution = number_times / number_seconds
1283 | frequency_resolution = number_frequencies / number_hertz
1284 |
1285 | # Prepare the tick locations and labels for the x-axis
1286 | xtick_locations = np.arange(
1287 | xtick_step * time_resolution,
1288 | number_times,
1289 | xtick_step * time_resolution,
1290 | )
1291 | xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int)
1292 |
1293 | # Prepare the tick locations and labels for the y-axis
1294 | ytick_locations = np.arange(
1295 | ytick_step * frequency_resolution,
1296 | number_frequencies,
1297 | ytick_step * frequency_resolution,
1298 | )
1299 | ytick_labels = np.arange(ytick_step, number_hertz, ytick_step).astype(int)
1300 |
1301 | # Display the spectrogram in dB, seconds, and Hz
1302 | plt.imshow(
1303 | 20 * np.log10(audio_spectrogram), aspect="auto", cmap="jet", origin="lower"
1304 | )
1305 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1306 | plt.yticks(ticks=ytick_locations, labels=ytick_labels)
1307 | plt.xlabel("Time (s)")
1308 | plt.ylabel("Frequency (Hz)")
1309 |
1310 |
1311 | def melspecshow(
1312 | mel_spectrogram,
1313 | number_samples,
1314 | sampling_frequency,
1315 | window_length,
1316 | xtick_step=1,
1317 | ):
1318 | """
1319 | Display a mel spectrogram in dB, seconds, and Hz.
1320 |
1321 | Inputs:
1322 | mel_spectrogram: mel spectrogram (number_mels, number_times)
1323 | number_samples: number of samples from the original signal
1324 | sampling_frequency: sampling frequency from the original signal in Hz
1325 | window_length: window length from the Fourier analysis in number of samples
1326 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1327 | """
1328 |
1329 | # Get the number of mels and time frames
1330 | number_mels, number_times = np.shape(mel_spectrogram)
1331 |
1332 | # Derive the number of seconds and the number of time frames per second
1333 | number_seconds = number_samples / sampling_frequency
1334 | time_resolution = number_times / number_seconds
1335 |
1336 | # Derive the minimum and maximum mel
1337 | minimum_mel = 2595 * np.log10(1 + (sampling_frequency / window_length) / 700)
1338 | maximum_mel = 2595 * np.log10(1 + (sampling_frequency / 2) / 700)
1339 |
1340 | # Compute the mel scale (linearly spaced)
1341 | mel_scale = np.linspace(minimum_mel, maximum_mel, number_mels)
1342 |
1343 | # Derive the Hertz scale (log spaced)
1344 | hertz_scale = 700 * (np.power(10, mel_scale / 2595) - 1)
1345 |
1346 | # Prepare the tick locations and labels for the x-axis
1347 | xtick_locations = np.arange(
1348 | xtick_step * time_resolution,
1349 | number_times,
1350 | xtick_step * time_resolution,
1351 | )
1352 | xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int)
1353 |
1354 | # Prepare the tick locations and labels for the y-axis
1355 | ytick_locations = np.arange(0, number_mels, 8)
1356 | ytick_labels = hertz_scale[::8].astype(int)
1357 |
1358 | # Display the mel spectrogram in dB, seconds, and Hz
1359 | plt.imshow(
1360 | 20 * np.log10(mel_spectrogram), aspect="auto", cmap="jet", origin="lower"
1361 | )
1362 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1363 | plt.yticks(ticks=ytick_locations, labels=ytick_labels)
1364 | plt.xlabel("Time (s)")
1365 | plt.ylabel("Frequency (Hz)")
1366 |
1367 |
1368 | def mfccshow(
1369 | audio_mfcc,
1370 | number_samples,
1371 | sampling_frequency,
1372 | xtick_step=1,
1373 | ):
1374 | """
1375 | Display MFCCs in seconds.
1376 |
1377 | Inputs:
1378 | audio_mfcc: audio MFCCs (number_coefficients, number_times)
1379 | number_samples: number of samples from the original signal
1380 | sampling_frequency: sampling frequency from the original signal in Hz
1381 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1382 | """
1383 |
1384 | # Get the number of time frames
1385 | number_times = np.shape(audio_mfcc)[1]
1386 |
1387 | # Derive the number of seconds and the number of time frames per second
1388 | number_seconds = number_samples / sampling_frequency
1389 | time_resolution = number_times / number_seconds
1390 |
1391 | # Prepare the tick locations and labels for the x-axis
1392 | xtick_locations = np.arange(
1393 | xtick_step * time_resolution,
1394 | number_times,
1395 | xtick_step * time_resolution,
1396 | )
1397 | xtick_labels = np.arange(xtick_step, number_seconds, xtick_step).astype(int)
1398 |
1399 | # Display the MFCCs in seconds
1400 | plt.imshow(audio_mfcc, aspect="auto", cmap="jet", origin="lower")
1401 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1402 | plt.xlabel("Time (s)")
1403 | plt.ylabel("Coefficients")
1404 |
1405 |
1406 | def cqtspecshow(
1407 | cqt_spectrogram,
1408 | time_resolution,
1409 | octave_resolution,
1410 | minimum_frequency,
1411 | xtick_step=1,
1412 | ):
1413 | """
1414 | Display a CQT spectrogram in dB, seconds, and Hz.
1415 |
1416 | Inputs:
1417 | cqt_spectrogram: CQT spectrogram (number_frequencies, number_times)
1418 | time_resolution: number of time frames per second
1419 | octave_resolution: number of frequency channels per octave
1420 | minimum_frequency: minimum frequency in Hz
1421 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1422 | """
1423 |
1424 | # Get the number of frequency channels and time frames
1425 | number_frequencies, number_times = np.shape(cqt_spectrogram)
1426 |
1427 | # Prepare the tick locations and labels for the x-axis
1428 | xtick_locations = np.arange(
1429 | xtick_step * time_resolution,
1430 | number_times,
1431 | xtick_step * time_resolution,
1432 | )
1433 | xtick_labels = np.arange(
1434 | xtick_step, number_times / time_resolution, xtick_step
1435 | ).astype(int)
1436 |
1437 | # Prepare the tick locations and labels for the y-axis
1438 | ytick_locations = np.arange(0, number_frequencies, octave_resolution)
1439 | ytick_labels = (
1440 | minimum_frequency * pow(2, ytick_locations / octave_resolution)
1441 | ).astype(int)
1442 |
1443 | # Display the CQT spectrogram in dB, seconds, and Hz
1444 | plt.imshow(
1445 | 20 * np.log10(cqt_spectrogram), aspect="auto", cmap="jet", origin="lower"
1446 | )
1447 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1448 | plt.yticks(ticks=ytick_locations, labels=ytick_labels)
1449 | plt.xlabel("Time (s)")
1450 | plt.ylabel("Frequency (Hz)")
1451 |
1452 |
1453 | def cqtchromshow(
1454 | cqt_chromagram,
1455 | time_resolution,
1456 | xtick_step=1,
1457 | ):
1458 | """
1459 | Display a CQT chromagram in seconds.
1460 |
1461 | Inputs:
1462 | cqt_chromagram: CQT chromagram (number_chromas, number_times)
1463 | time_resolution: number of time frames per second
1464 | xtick_step: step for the x-axis ticks in seconds (default: 1 second)
1465 | """
1466 |
1467 | # Get the number of time frames
1468 | number_times = np.shape(cqt_chromagram)[1]
1469 |
1470 | # Prepare the tick locations and labels for the x-axis
1471 | xtick_locations = np.arange(
1472 | xtick_step * time_resolution,
1473 | number_times,
1474 | xtick_step * time_resolution,
1475 | )
1476 | xtick_labels = np.arange(
1477 | xtick_step, number_times / time_resolution, xtick_step
1478 | ).astype(int)
1479 |
1480 | # Display the CQT chromagram in seconds
1481 | plt.imshow(cqt_chromagram, aspect="auto", cmap="jet", origin="lower")
1482 | plt.xticks(ticks=xtick_locations, labels=xtick_labels)
1483 | plt.xlabel("Time (s)")
1484 | plt.ylabel("Chroma")
--------------------------------------------------------------------------------