├── ComplexFlux.py ├── .gitignore ├── README.md └── SuperFlux.py /ComplexFlux.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | """ 4 | Simple wrapper for calling SuperFlux with the correct default values. 5 | 6 | """ 7 | 8 | from SuperFlux import parser, main 9 | 10 | if __name__ == '__main__': 11 | # parse arguments 12 | args = parser(lgd=True, threshold=0.25) 13 | # and run the main SuperFlux program 14 | main(args) -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | 3 | # C extensions 4 | *.so 5 | 6 | # Packages 7 | *.egg 8 | *.egg-info 9 | dist 10 | build 11 | eggs 12 | parts 13 | bin 14 | var 15 | sdist 16 | develop-eggs 17 | .installed.cfg 18 | lib 19 | lib64 20 | 21 | # Installer logs 22 | pip-log.txt 23 | 24 | # Unit test / coverage reports 25 | .coverage 26 | .tox 27 | nosetests.xml 28 | 29 | # Translations 30 | *.mo 31 | 32 | # Mr Developer 33 | .mr.developer.cfg 34 | .project 35 | .pydevproject 36 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | SuperFlux 2 | ========= 3 | 4 | Python reference implementation of the SuperFlux onset detection algorithm as 5 | described in: 6 | 7 | "Maximum Filter Vibrato Suppression for Onset Detection" 8 | by Sebastian Böck and Gerhard Widmer. 9 | Proceedings of the 16th International Conference on Digital Audio Effects 10 | (DAFx-13), Maynooth, Ireland, September 2013. 11 | 12 | and the additional local group delay (LGD) based weighting scheme described in: 13 | 14 | "Local group delay based vibrato and tremolo suppression for onset detection" 15 | by Sebastian Böck and Gerhard Widmer. 
16 | Proceedings of the 13th International Society for Music Information 17 | Retrieval Conference (ISMIR), Curitiba, Brazil, November 2013. 18 | 19 | The papers can be downloaded from: 20 | 21 | 22 | 23 | 24 | 25 | If you use this software, please cite the corresponding paper. 26 | 27 | ``` 28 | @inproceedings{Boeck2013, 29 | Author = {B{\"o}ck, Sebastian and Widmer, Gerhard}, 30 | Title = {Maximum Filter Vibrato Suppression for Onset Detection}, 31 | Booktitle = {{Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13)}}, 32 | Pages = {55--61}, 33 | Address = {Maynooth, Ireland}, 34 | Month = {September}, 35 | Year = {2013} 36 | } 37 | 38 | @inproceedings{Boeck2013a, 39 | Author = {B{\"o}ck, Sebastian and Widmer, Gerhard}, 40 | Title = {Local Group Delay based Vibrato and Tremolo Suppression for Onset Detection}, 41 | Booktitle = {{Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 2013.}}, 42 | Pages = {589--594}, 43 | Address = {Curitiba, Brazil}, 44 | Month = {November}, 45 | Year = {2013} 46 | } 47 | ``` 48 | 49 | 50 | Usage 51 | ----- 52 | `SuperFlux.py input.wav` processes the audio file and writes the detected 53 | onsets to a file named `input.superflux.txt`. 54 | 55 | Please see the `-h` option to get a more detailed description of the available 56 | options, e.g. changing the suffix for the detection files. 57 | 58 | Requirements 59 | ------------ 60 | * Python 2.7 61 | * Numpy 62 | * Scipy 63 | 64 | -------------------------------------------------------------------------------- /SuperFlux.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | """ 4 | Copyright (c) 2012 - 2014 Sebastian Böck 5 | All rights reserved. 6 | 7 | Redistribution and use in source and binary forms, with or without 8 | modification, are permitted provided that the following conditions are met: 9 | 10 | 1. 
Redistributions of source code must retain the above copyright notice, this 11 | list of conditions and the following disclaimer. 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 20 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 23 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | 27 | """ 28 | 29 | """ 30 | Please note that this program released together with the paper 31 | 32 | "Maximum Filter Vibrato Suppression for Onset Detection" 33 | Sebastian Böck and Gerhard Widmer. 34 | Proceedings of the 16th International Conference on Digital Audio Effects 35 | (DAFx-13), Maynooth, Ireland, September 2013 36 | 37 | is not tuned in any way for speed/memory efficiency. However, it can be used 38 | as a reference implementation for the described onset detection with a maximum 39 | filter for vibrato suppression. 40 | 41 | It also serves as a reference implementation of the local group delay (LGD) 42 | based weighting extension described in: 43 | 44 | "Local group delay based vibrato and tremolo suppression for onset detection" 45 | Sebastian Böck and Gerhard Widmer. 
46 | Proceedings of the 13th International Society for Music Information 47 | Retrieval Conference (ISMIR), 2013. 48 | 49 | If you use this software, please cite the corresponding paper. 50 | 51 | Please send any comments, enhancements, errata, etc. to the main author. 52 | 53 | """ 54 | 55 | import numpy as np 56 | import scipy.fftpack as fft 57 | from scipy.io import wavfile 58 | from scipy.ndimage.filters import (maximum_filter, maximum_filter1d, 59 | uniform_filter1d) 60 | 61 | 62 | class Filter(object): 63 | """ 64 | Filter Class. 65 | 66 | """ 67 | def __init__(self, num_fft_bins, fs, bands=24, fmin=30, fmax=17000, equal=False): 68 | """ 69 | Creates a new Filter object instance. 70 | 71 | :param num_fft_bins: number of FFT coefficients 72 | :param fs: sample rate of the audio file 73 | :param bands: number of filter bands 74 | :param fmin: the minimum frequency [Hz] 75 | :param fmax: the maximum frequency [Hz] 76 | :param equal: normalize the area of each band to 1 77 | 78 | """ 79 | # sample rate 80 | self.fs = fs 81 | # reduce fmax if necessary 82 | if fmax > fs / 2: 83 | fmax = fs / 2 84 | # get a list of frequencies 85 | frequencies = self.frequencies(bands, fmin, fmax) 86 | # conversion factor for mapping of frequencies to spectrogram bins 87 | factor = (fs / 2.0) / num_fft_bins 88 | # map the frequencies to the spectrogram bins 89 | frequencies = np.round(np.asarray(frequencies) / factor).astype(int) 90 | # only keep unique bins 91 | frequencies = np.unique(frequencies) 92 | # filter out all frequencies outside the valid range 93 | frequencies = [f for f in frequencies if f < num_fft_bins] 94 | # number of bands 95 | bands = len(frequencies) - 2 96 | assert bands >= 3, 'cannot create filterbank with less than 3 ' \ 97 | 'frequencies' 98 | # init the filter matrix with size: number of FFT bins x filter bands 99 | self.filterbank = np.zeros([num_fft_bins, bands], dtype=np.float) 100 | # process all bands 101 | for band in range(bands): 102 | # edge & 
center frequencies 103 | start, mid, stop = frequencies[band:band + 3] 104 | # create a triangular filter 105 | triangular_filter = self.triangular_filter(start, mid, stop, equal) 106 | self.filterbank[start:stop, band] = triangular_filter 107 | 108 | @staticmethod 109 | def frequencies(bands, fmin, fmax, a=440): 110 | """ 111 | Returns a list of frequencies aligned on a logarithmic scale. 112 | 113 | :param bands: number of filter bands per octave 114 | :param fmin: the minimum frequency [Hz] 115 | :param fmax: the maximum frequency [Hz] 116 | :param a: reference tuning frequency, i.e. A4 [Hz] 117 | :returns: a list of frequencies 118 | 119 | Using 12 bands per octave and a=440 corresponds to the MIDI notes. 120 | 121 | """ 122 | # factor by which two adjacent frequencies are apart 123 | factor = 2.0 ** (1.0 / bands) 124 | # start with the reference frequency (A4) 125 | freq = a 126 | frequencies = [freq] 127 | # go upwards till fmax 128 | while freq <= fmax: 129 | # multiply once more, since the included frequency is a frequency 130 | # which is only used as the right corner of a (triangular) filter 131 | freq *= factor 132 | frequencies.append(freq) 133 | # restart with a and go downwards till fmin 134 | freq = a 135 | while freq >= fmin: 136 | # divide once more, since the included frequency is a frequency 137 | # which is only used as the left corner of a (triangular) filter 138 | freq /= factor 139 | frequencies.append(freq) 140 | # sort frequencies 141 | frequencies.sort() 142 | # return the list 143 | return frequencies 144 | 145 | @staticmethod 146 | def triangular_filter(start, mid, stop, equal=False): 147 | """ 148 | Calculates a triangular filter of the given size. 
149 | 150 | :param start: start bin (with value 0, included in the filter) 151 | :param mid: center bin (of height 1, unless equal is True) 152 | :param stop: end bin (with value 0, not included in the filter) 153 | :param equal: normalize the area of the filter to 1 154 | :returns: a triangular shaped filter 155 | 156 | """ 157 | # height of the filter 158 | height = 1. 159 | # normalize the height 160 | if equal: 161 | height = 2. / (stop - start) 162 | # init the filter 163 | triangular_filter = np.empty(stop - start) 164 | # rising edge 165 | rising = np.linspace(0, height, (mid - start), endpoint=False) 166 | triangular_filter[:mid - start] = rising 167 | # falling edge 168 | falling = np.linspace(height, 0, (stop - mid), endpoint=False) 169 | triangular_filter[mid - start:] = falling 170 | # return 171 | return triangular_filter 172 | 173 | 174 | class Wav(object): 175 | """ 176 | Wav Class is a simple wrapper around scipy.io.wavfile. 177 | 178 | """ 179 | def __init__(self, filename): 180 | """ 181 | Creates a new Wav object instance of the given file. 182 | 183 | :param filename: name of the .wav file 184 | 185 | """ 186 | # read in the audio 187 | self.sample_rate, self.audio = wavfile.read(filename) 188 | # set the length 189 | self.num_samples = np.shape(self.audio)[0] 190 | self.length = float(self.num_samples) / self.sample_rate 191 | # set the number of channels 192 | try: 193 | # multi channel files 194 | self.num_channels = np.shape(self.audio)[1] 195 | except IndexError: 196 | # catch mono files 197 | self.num_channels = 1 198 | 199 | def attenuate(self, attenuation): 200 | """ 201 | Attenuate the audio signal. 202 | 203 | :param attenuation: attenuation level given in dB 204 | 205 | """ 206 | att = np.power(np.sqrt(10.), attenuation / 10.) 207 | self.audio = np.asarray(self.audio / att, dtype=self.audio.dtype) 208 | 209 | def downmix(self): 210 | """ 211 | Down-mix the audio signal to mono. 
212 | 213 | """ 214 | if self.num_channels > 1: 215 | self.audio = np.mean(self.audio, axis=-1, dtype=self.audio.dtype) 216 | 217 | def normalize(self): 218 | """ 219 | Normalize the audio signal. 220 | 221 | """ 222 | self.audio = self.audio.astype(np.float) / np.max(self.audio) 223 | 224 | 225 | class Spectrogram(object): 226 | """ 227 | Spectrogram Class. 228 | 229 | """ 230 | def __init__(self, wav, frame_size=2048, fps=200, filterbank=None, 231 | log=False, mul=1, add=1, online=True, block_size=2048, 232 | lgd=False): 233 | """ 234 | Creates a new Spectrogram object instance and performs a STFT on the 235 | given audio. 236 | 237 | :param wav: a Wav object 238 | :param frame_size: the size for the window [samples] 239 | :param fps: frames per second 240 | :param filterbank: use the given filterbank for dimensionality 241 | reduction 242 | :param log: use logarithmic magnitude 243 | :param mul: multiply the magnitude by this factor before taking 244 | the logarithm 245 | :param add: add this value to the magnitude before taking the 246 | logarithm 247 | :param online: work in online mode (i.e. 
use only past information) 248 | :param block_size: perform the filtering in blocks of the given size 249 | :param lgd: compute the local group delay (needed for the 250 | ComplexFlux algorithm) 251 | 252 | """ 253 | # init some variables 254 | self.wav = wav 255 | self.fps = fps 256 | self.filterbank = filterbank 257 | if add <= 0: 258 | raise ValueError("a positive value must be added before taking " 259 | "the logarithm") 260 | if mul <= 0: 261 | raise ValueError("a positive value must be multiplied before " 262 | "taking the logarithm") 263 | # derive some variables 264 | # use floats so that seeking works properly 265 | self.hop_size = float(self.wav.sample_rate) / float(self.fps) 266 | self.num_frames = int(np.ceil(self.wav.num_samples / self.hop_size)) 267 | self.num_fft_bins = int(frame_size / 2) 268 | # initial number of bins equal to fft bins, but those can change if 269 | # filters are used 270 | self.num_bins = int(frame_size / 2) 271 | # init spec matrix 272 | if filterbank is None: 273 | # init with number of FFT frequency bins 274 | self.spec = np.empty([self.num_frames, self.num_fft_bins], 275 | dtype=np.float32) 276 | else: 277 | # init with number of filter bands 278 | self.spec = np.empty([self.num_frames, np.shape(filterbank)[1]], 279 | dtype=np.float32) 280 | # set number of bins 281 | self.num_bins = np.shape(filterbank)[1] 282 | # set the block size 283 | if not block_size or block_size > self.num_frames: 284 | block_size = self.num_frames 285 | # init block counter 286 | block = 0 287 | # init a matrix of that size 288 | spec = np.zeros([block_size, self.num_fft_bins]) 289 | # init the local group delay matrix 290 | self.lgd = None 291 | if lgd: 292 | self.lgd = np.zeros([self.num_frames, self.num_fft_bins], 293 | dtype=np.float32) 294 | # create windowing function for DFT 295 | self.window = np.hanning(frame_size) 296 | try: 297 | # the audio signal is not scaled, scale the window accordingly 298 | max_value = 
np.iinfo(self.wav.audio.dtype).max 299 | self._fft_window = self.window / max_value 300 | except ValueError: 301 | self._fft_window = self.window 302 | # step through all frames 303 | for frame in range(self.num_frames): 304 | # seek to the right position in the audio signal 305 | if online: 306 | # step back one frame_size after moving forward 1 hop_size 307 | # so that the current position is at the end of the window 308 | seek = int((frame + 1) * self.hop_size - frame_size) 309 | else: 310 | # step back half of the frame_size so that the frame represents 311 | # the centre of the window 312 | seek = int(frame * self.hop_size - frame_size / 2) 313 | # read in the right portion of the audio 314 | if seek >= self.wav.num_samples: 315 | # end of file reached 316 | break 317 | elif seek + frame_size >= self.wav.num_samples: 318 | # end behind the actual audio, append zeros accordingly 319 | zeros = np.zeros(seek + frame_size - self.wav.num_samples) 320 | signal = self.wav.audio[seek:] 321 | signal = np.append(signal, zeros) 322 | elif seek < 0: 323 | # start before the actual audio, pad with zeros accordingly 324 | zeros = np.zeros(-seek) 325 | signal = self.wav.audio[0:seek + frame_size] 326 | signal = np.append(zeros, signal) 327 | else: 328 | # normal read operation 329 | signal = self.wav.audio[seek:seek + frame_size] 330 | # multiply the signal with the window function 331 | signal = signal * self._fft_window 332 | # perform DFT 333 | stft = fft.fft(signal)[:self.num_fft_bins] 334 | # compute the local group delay 335 | if lgd: 336 | # unwrap the phase 337 | unwrapped_phase = np.unwrap(np.angle(stft)) 338 | # local group delay is the derivative over frequency 339 | self.lgd[frame, :-1] = (unwrapped_phase[:-1] - 340 | unwrapped_phase[1:]) 341 | # is block-wise processing needed? 
342 | if filterbank is None: 343 | # no filtering needed, thus no block wise processing needed 344 | self.spec[frame] = np.abs(stft) 345 | else: 346 | # filter in blocks 347 | spec[frame % block_size] = np.abs(stft) 348 | # end of a block or end of the signal reached 349 | end_of_block = (frame + 1) // block_size > block 350 | end_of_signal = (frame + 1) == self.num_frames 351 | if end_of_block or end_of_signal: 352 | start = block * block_size 353 | stop = min(start + block_size, self.num_frames) 354 | filtered_spec = np.dot(spec[:stop - start], filterbank) 355 | self.spec[start:stop] = filtered_spec 356 | # increase the block counter 357 | block += 1 358 | # next frame 359 | # take the logarithm 360 | if log: 361 | np.log10(mul * self.spec + add, out=self.spec) 362 | 363 | 364 | class SpectralODF(object): 365 | """ 366 | The SpectralODF class implements most of the common onset detection 367 | functions based on the magnitude or phase information of a spectrogram. 368 | 369 | """ 370 | def __init__(self, spectrogram, ratio=0.5, max_bins=3, diff_frames=None, 371 | temporal_filter=3, temporal_origin=0): 372 | """ 373 | Creates a new ODF object instance. 374 | 375 | :param spectrogram: a Spectrogram object on which the detection 376 | functions operate 377 | :param ratio: calculate the difference to the frame which 378 | has the given magnitude ratio 379 | :param max_bins: number of bins for the maximum filter 380 | :param diff_frames: calculate the difference to the N-th previous 381 | frame 382 | :param temporal_filter: temporal maximum filtering of the local group 383 | delay for the ComplexFlux algorithm 384 | :param temporal_origin: origin of the temporal maximum filter 385 | 386 | If no diff_frames are given, they are calculated automatically based on 387 | the given ratio. 
388 | 389 | """ 390 | self.s = spectrogram 391 | # determine the number of diff frames 392 | if diff_frames is None: 393 | # get the first sample with a higher magnitude than given ratio 394 | sample = np.argmax(self.s.window > ratio) 395 | diff_samples = self.s.window.size / 2 - sample 396 | # convert to frames 397 | diff_frames = int(round(diff_samples / self.s.hop_size)) 398 | # set the minimum to 1 399 | if diff_frames < 1: 400 | diff_frames = 1 401 | self.diff_frames = diff_frames 402 | # number of bins used for the maximum filter 403 | self.max_bins = max_bins 404 | self.temporal_filter = temporal_filter 405 | self.temporal_origin = temporal_origin 406 | 407 | @staticmethod 408 | def _superflux_diff_spec(spec, diff_frames=1, max_bins=3): 409 | """ 410 | Calculate the difference spec used for SuperFlux. 411 | 412 | :param spec: magnitude spectrogram 413 | :param diff_frames: calculate the difference to the N-th previous frame 414 | :param max_bins: number of neighboring bins used for maximum 415 | filtering 416 | :return: difference spectrogram used for SuperFlux 417 | 418 | Note: If 'max_bins' is greater than 0, a maximum filter of this size 419 | is applied in the frequency direction. The difference of the 420 | k-th frequency bin of the magnitude spectrogram is then 421 | calculated relative to the maximum over m bins of the N-th 422 | previous frame (e.g. m=3: k-1, k, k+1). 423 | 424 | This method only works properly if the number of bands for the 425 | filterbank is chosen carefully. A value of 24 (i.e. quarter-tone 426 | resolution) usually yields good results. 
427 | 428 | """ 429 | # init diff matrix 430 | diff_spec = np.zeros_like(spec) 431 | if diff_frames < 1: 432 | raise ValueError("number of diff_frames must be >= 1") 433 | # widen the spectrogram in frequency dimension by `max_bins` 434 | max_spec = maximum_filter(spec, size=[1, max_bins]) 435 | # calculate the diff 436 | diff_spec[diff_frames:] = spec[diff_frames:] - max_spec[0:-diff_frames] 437 | # keep only positive values 438 | np.maximum(diff_spec, 0, diff_spec) 439 | # return diff spec 440 | return diff_spec 441 | 442 | @staticmethod 443 | def _lgd_mask(spec, lgd, filterbank=None, temporal_filter=0, 444 | temporal_origin=0): 445 | """ 446 | Calculates a weighting mask for the magnitude spectrogram based on the 447 | local group delay. 448 | 449 | :param spec: the magnitude spectrogram 450 | :param lgd: local group delay of the spectrogram 451 | :param filterbank: filterbank used for dimensionality reduction of 452 | the magnitude spectrogram 453 | :param temporal_filter: temporal maximum filtering of the local group 454 | delay 455 | :param temporal_origin: origin of the temporal maximum filter 456 | 457 | "Local group delay based vibrato and tremolo suppression for onset 458 | detection" 459 | Sebastian Böck and Gerhard Widmer. 460 | Proceedings of the 13th International Society for Music Information 461 | Retrieval Conference (ISMIR), 2013. 
462 | 463 | """ 464 | from scipy.ndimage import maximum_filter, minimum_filter 465 | # take only absolute values of the local group delay 466 | lgd = np.abs(lgd) 467 | 468 | # maximum filter along the temporal axis 469 | if temporal_filter > 0: 470 | lgd = maximum_filter(lgd, size=[temporal_filter, 1], 471 | origin=temporal_origin) 472 | # lgd = uniform_filter(lgd, size=[1, 3]) # better for percussive onsets 473 | 474 | # create the weighting mask 475 | if filterbank is not None: 476 | # if the magnitude spectrogram was filtered, use the minimum local 477 | # group delay value of each filterbank (expanded by one frequency 478 | # bin in both directions) as the mask 479 | mask = np.zeros_like(spec) 480 | num_bins = lgd.shape[1] 481 | for b in range(mask.shape[1]): 482 | # determine the corner bins for the mask 483 | corner_bins = np.nonzero(filterbank[:, b])[0] 484 | # always expand to the next neighbour 485 | start_bin = corner_bins[0] - 1 486 | stop_bin = corner_bins[-1] + 2 487 | # constrain the range 488 | if start_bin < 0: 489 | start_bin = 0 490 | if stop_bin > num_bins: 491 | stop_bin = num_bins 492 | # set mask 493 | mask[:, b] = np.amin(lgd[:, start_bin: stop_bin], axis=1) 494 | else: 495 | # if the spectrogram is not filtered, use a simple minimum filter 496 | # covering only the current bin and its neighbours 497 | mask = minimum_filter(lgd, size=[1, 3]) 498 | # return the normalized mask 499 | return mask / np.pi 500 | 501 | # Onset Detection Functions 502 | def superflux(self): 503 | """ 504 | SuperFlux with a maximum filter based vibrato suppression. 505 | 506 | :return: SuperFlux onset detection function 507 | 508 | "Maximum Filter Vibrato Suppression for Onset Detection" 509 | Sebastian Böck and Gerhard Widmer. 
510 | Proceedings of the 16th International Conference on Digital Audio 511 | Effects (DAFx-13), Maynooth, Ireland, September 2013 512 | 513 | """ 514 | # compute the difference spectrogram as in the SuperFlux algorithm 515 | diff_spec = self._superflux_diff_spec(self.s.spec, self.diff_frames, 516 | self.max_bins) 517 | # sum all positive 1st order max. filtered differences 518 | return np.sum(diff_spec, axis=1) 519 | 520 | def complex_flux(self): 521 | """ 522 | Complex Flux with a local group delay based tremolo suppression. 523 | 524 | Calculates the difference of bin k of the magnitude spectrogram 525 | relative to the N-th previous frame of the (maximum filtered) 526 | spectrogram. 527 | 528 | :return: complex flux onset detection function 529 | 530 | "Local group delay based vibrato and tremolo suppression for onset 531 | detection" 532 | Sebastian Böck and Gerhard Widmer. 533 | Proceedings of the 13th International Society for Music Information 534 | Retrieval Conference (ISMIR), 2013. 535 | 536 | """ 537 | # compute the difference spectrogram as in the SuperFlux algorithm 538 | diff_spec = self._superflux_diff_spec(self.s.spec, self.diff_frames, 539 | self.max_bins) 540 | # create a mask based on the local group delay information 541 | mask = self._lgd_mask(self.s.spec, self.s.lgd, self.s.filterbank, 542 | self.temporal_filter, self.temporal_origin) 543 | # weight the differences with the mask 544 | diff_spec *= mask 545 | # sum all positive 1st order max. filtered and weighted differences 546 | return np.sum(diff_spec, axis=1) 547 | 548 | 549 | class Onset(object): 550 | """ 551 | Onset Class. 552 | 553 | """ 554 | def __init__(self, activations, fps, online=True, sep=''): 555 | """ 556 | Creates a new Onset object instance with the given activations of the 557 | ODF (OnsetDetectionFunction). The activations can be read from a file. 
558 | 559 | :param activations: an array containing the activations of the ODF 560 | :param fps: frame rate of the activations 561 | :param online: work in online mode (i.e. use only past 562 | information) 563 | 564 | """ 565 | self.activations = None # activations of the ODF 566 | self.fps = fps # frame rate of the activation function 567 | self.online = online # online peak-picking 568 | self.detections = [] # list of detected onsets (in seconds) 569 | # set / load activations 570 | if isinstance(activations, np.ndarray): 571 | # activations are given as an array 572 | self.activations = activations 573 | else: 574 | # read in the activations from a file 575 | self.load(activations, sep) 576 | 577 | def detect(self, threshold, combine=0.03, pre_avg=0.15, pre_max=0.01, 578 | post_avg=0, post_max=0.05, delay=0): 579 | """ 580 | Detects the onsets. 581 | 582 | :param threshold: threshold for peak-picking 583 | :param combine: only report 1 onset for N seconds 584 | :param pre_avg: use N seconds past information for moving average 585 | :param pre_max: use N seconds past information for moving maximum 586 | :param post_avg: use N seconds future information for moving average 587 | :param post_max: use N seconds future information for moving maximum 588 | :param delay: report the onset N seconds delayed 589 | 590 | In online mode, post_avg and post_max are set to 0. 591 | 592 | Implements the peak-picking method described in: 593 | 594 | "Evaluating the Online Capabilities of Onset Detection Methods" 595 | Sebastian Böck, Florian Krebs and Markus Schedl 596 | Proceedings of the 13th International Society for Music Information 597 | Retrieval Conference (ISMIR), 2012 598 | 599 | """ 600 | # online mode? 
601 | if self.online: 602 | post_max = 0 603 | post_avg = 0 604 | # convert timing information to frames 605 | pre_avg = int(round(self.fps * pre_avg)) 606 | pre_max = int(round(self.fps * pre_max)) 607 | post_max = int(round(self.fps * post_max)) 608 | post_avg = int(round(self.fps * post_avg)) 609 | # `combine` and `delay` are already given in seconds (see docstring), 610 | # so they are used without any further unit conversion 611 | 612 | # init detections 613 | self.detections = [] 614 | # moving maximum 615 | max_length = pre_max + post_max + 1 616 | max_origin = int(np.floor((pre_max - post_max) / 2)) 617 | mov_max = maximum_filter1d(self.activations, max_length, 618 | mode='constant', origin=max_origin) 619 | # moving average 620 | avg_length = pre_avg + post_avg + 1 621 | avg_origin = int(np.floor((pre_avg - post_avg) / 2)) 622 | mov_avg = uniform_filter1d(self.activations, avg_length, 623 | mode='constant', origin=avg_origin) 624 | # detections are activations equal to the moving maximum 625 | detections = self.activations * (self.activations == mov_max) 626 | # detections must be greater than or equal to the mov. average + threshold 627 | detections *= (detections >= mov_avg + threshold) 628 | # convert detected onsets to a list of timestamps 629 | detections = np.nonzero(detections)[0].astype(np.float) / self.fps 630 | # shift if necessary 631 | if delay != 0: 632 | detections += delay 633 | # always use the first detection and all others if none was reported 634 | # within the last `combine` seconds 635 | if detections.size > 1: 636 | # filter all detections which occur within `combine` seconds 637 | combined_detections = detections[1:][np.diff(detections) > combine] 638 | # add them after the first detection 639 | self.detections = np.append(detections[0], combined_detections) 640 | else: 641 | self.detections = detections 642 | 643 | def write(self, filename): 644 | """ 645 | Write the detected onsets to the given file. 646 | 647 | :param filename: the target file name 648 | 649 | Only useful if detect() was invoked before. 
650 | 651 | """ 652 | with open(filename, 'w') as f: 653 | for pos in self.detections: 654 | f.write(str(pos) + '\n') 655 | 656 | def save(self, filename, sep): 657 | """ 658 | Save the onset activations to the given file. 659 | 660 | :param filename: the target file name 661 | :param sep: separator between activation values 662 | 663 | Note: using an empty separator ('') results in a binary numpy array. 664 | 665 | """ 666 | self.activations.tofile(filename, sep=sep) 667 | 668 | def load(self, filename, sep): 669 | """ 670 | Load the onset activations from the given file. 671 | 672 | :param filename: the target file name 673 | :param sep: separator between activation values 674 | 675 | Note: using an empty separator ('') results in a binary numpy array. 676 | 677 | """ 678 | self.activations = np.fromfile(filename, sep=sep) 679 | 680 | 681 | def parser(lgd=False, threshold=1.1): 682 | """ 683 | Parses the command line arguments. 684 | 685 | :param lgd: use local group delay weighting by default 686 | :param threshold: default value for threshold 687 | 688 | """ 689 | import argparse 690 | # define parser 691 | p = argparse.ArgumentParser( 692 | formatter_class=argparse.RawDescriptionHelpFormatter, description=""" 693 | If invoked without any parameters, the software detects all onsets in 694 | the given files according to the method proposed in: 695 | 696 | "Maximum Filter Vibrato Suppression for Onset Detection" 697 | Sebastian Böck and Gerhard Widmer. 698 | Proceedings of the 16th International Conference on Digital Audio Effects 699 | (DAFx-13), Maynooth, Ireland, September 2013 700 | 701 | If the '--lgd' switch is set, it additionally applies a local group delay 702 | based weighting according to the method proposed in: 703 | 704 | "Local group delay based vibrato and tremolo suppression for onset 705 | detection" 706 | Sebastian Böck and Gerhard Widmer. 
    Proceedings of the 13th International Society for Music Information
    Retrieval Conference (ISMIR), 2013.

    The single most important parameter is the threshold ('-t'). Adjusting
    this parameter might help to improve performance considerably. Please note
    that if the local group delay weighting scheme is applied, the threshold
    should be adjusted to a lower value, e.g. 0.25.

    """)
    # general options
    p.add_argument('files', metavar='files', nargs='+',
                   help='files to be processed')
    p.add_argument('-v', dest='verbose', action='store_true',
                   help='be verbose')
    p.add_argument('-s', dest='save', action='store_true', default=False,
                   help='save the activations of the onset detection function')
    p.add_argument('-l', dest='load', action='store_true', default=False,
                   help='load the activations of the onset detection function')
    p.add_argument('--sep', action='store', default='',
                   help='separator for saving/loading the onset detection '
                        'function [default=numpy binary]')
    p.add_argument('--act_suffix', action='store', default='.act',
                   help='filename suffix of the activations files '
                        '[default=%(default)s]')
    p.add_argument('--det_suffix', action='store', default='.superflux.txt',
                   help='filename suffix of the detection files '
                        '[default=%(default)s]')
    # online / offline mode
    p.add_argument('--online', action='store_true', default=False,
                   help='operate in online mode (i.e. no future information '
                        'will be used for computation)')
    # wav options
    wav = p.add_argument_group('audio arguments')
    wav.add_argument('--norm', action='store_true', default=None,
                     help='normalize the audio (switches to offline mode)')
    wav.add_argument('--att', action='store', type=float, default=None,
                     help='attenuate the audio by ATT dB')
    # spectrogram options
    spec = p.add_argument_group('spectrogram arguments')
    spec.add_argument('--fps', action='store', default=200, type=int,
                      help='frames per second [default=%(default)s]')
    spec.add_argument('--frame_size', action='store', type=int, default=2048,
                      help='frame size [samples, default=%(default)s]')
    spec.add_argument('--ratio', action='store', type=float, default=0.5,
                      help='window magnitude ratio to calc number of diff '
                           'frames [default=%(default)s]')
    spec.add_argument('--diff_frames', action='store', type=int, default=None,
                      help='diff frames')
    spec.add_argument('--max_bins', action='store', type=int, default=3,
                      help='bins used for maximum filtering '
                           '[default=%(default)s]')
    # LGD stuff
    mask = p.add_argument_group('local group delay based weighting')
    mask.add_argument('--lgd', action='store_true', default=lgd,
                      help='apply local group delay based weighting '
                           '[default=%(default)s]')
    mask.add_argument('--temporal_filter', action='store', default=3, type=int,
                      help='apply a temporal filter of N frames before '
                           'calculating the LGD weighting mask '
                           '[default=%(default)s]')
    # filtering
    filt = p.add_argument_group('magnitude spectrogram filtering arguments')
    filt.add_argument('--no_filter', dest='filter', action='store_false',
                      default=True, help='do not filter the magnitude '
                                         'spectrogram with a filterbank')
    filt.add_argument('--fmin', action='store', default=30, type=float,
                      help='minimum frequency of filter '
                           '[Hz, default=%(default)s]')
    filt.add_argument('--fmax', action='store', default=17000, type=float,
                      help='maximum frequency of filter '
                           '[Hz, default=%(default)s]')
    filt.add_argument('--bands', action='store', type=int, default=24,
                      help='number of bands per octave [default=%(default)s]')
    filt.add_argument('--equal', action='store_true', default=False,
                      help='equalize triangular windows to have equal area')
    filt.add_argument('--block_size', action='store', default=2048, type=int,
                      help='perform filtering in blocks of N frames '
                           '[default=%(default)s]')
    # logarithm
    log = p.add_argument_group('logarithmic magnitude spectrogram arguments')
    log.add_argument('--no_log', dest='log', action='store_false',
                     default=True, help='use linear magnitude scale')
    log.add_argument('--mul', action='store', default=1, type=float,
                     help='multiplier (before taking the log) '
                          '[default=%(default)s]')
    log.add_argument('--add', action='store', default=1, type=float,
                     help='value added (before taking the log) '
                          '[default=%(default)s]')
    # onset detection
    onset = p.add_argument_group('onset peak-picking arguments')
    onset.add_argument('-t', dest='threshold', action='store', type=float,
                       default=threshold, help='detection threshold '
                                               '[default=%(default)s]')
    onset.add_argument('--combine', action='store', type=float, default=0.03,
                       help='combine onsets within N seconds '
                            '[default=%(default)s]')
    onset.add_argument('--pre_avg', action='store', type=float, default=0.15,
                       help='build average over N previous seconds '
                            '[default=%(default)s]')
    onset.add_argument('--pre_max', action='store', type=float, default=0.01,
                       help='search maximum over N previous seconds '
                            '[default=%(default)s]')
    onset.add_argument('--post_avg', action='store', type=float, default=0,
                       help='build average over N following seconds '
                            '[default=%(default)s]')
    onset.add_argument('--post_max', action='store', type=float, default=0.05,
                       help='search maximum over N following seconds '
                            '[default=%(default)s]')
    onset.add_argument('--delay', action='store', type=float, default=0,
                       help='report the onsets N seconds delayed '
                            '[default=%(default)s]')
    # version
    p.add_argument('--version', action='version',
                   version='%(prog)s 1.03 (2014-11-02)')
    # parse arguments
    args = p.parse_args()
    # print arguments
    if args.verbose:
        print args
    # return args
    return args


def main(args):
    """
    Main SuperFlux program.

    :param args: parsed arguments

    """
    import os.path
    import glob
    import fnmatch
    # determine the files to process
    files = []
    for f in args.files:
        # check what we have (file/path)
        if os.path.isdir(f):
            # add all files in the given path (extend instead of assign, so
            # files collected from earlier arguments are not discarded)
            files.extend(glob.glob(f + '/*.wav'))
        else:
            # file was given, append to list
            files.append(f)
    # only process .wav files
    files = fnmatch.filter(files, '*.wav')
    files.sort()
    # init filterbank
    filt = None
    filterbank = None
    # process the files
    for f in files:
        if args.verbose:
            print 'processing file %s' % f
        # use the name of the file without the extension
        filename = os.path.splitext(f)[0]
        # do the processing stuff unless the activations are loaded from file
        if args.load:
            # load the activations from file (same suffix as used for saving)
            o = Onset("%s%s" % (filename, args.act_suffix), args.fps,
                      args.online, args.sep)
        else:
            # open the wav file
            w = Wav(f)
            # normalize audio
            if args.norm:
                w.normalize()
                args.online = False  # switch to offline mode
            # down-mix to mono
            if w.num_channels > 1:
                w.downmix()
            # attenuate signal
            if args.att:
                w.attenuate(args.att)
            # create filterbank if needed
            if args.filter:
                # re-create filterbank if the sample rate of the audio changes
                if filt is None or filt.fs != w.sample_rate:
                    filt = Filter(args.frame_size / 2, w.sample_rate,
                                  args.bands, args.fmin, args.fmax, args.equal)
                    filterbank = filt.filterbank
            # spectrogram
            s = Spectrogram(w, frame_size=args.frame_size, fps=args.fps,
                            filterbank=filterbank, log=args.log,
                            mul=args.mul, add=args.add, online=args.online,
                            block_size=args.block_size, lgd=args.lgd)
            # use the spectrogram to create a SpectralODF object
            sodf = SpectralODF(s, ratio=args.ratio, max_bins=args.max_bins,
                               diff_frames=args.diff_frames)
            # perform detection function on the object
            if args.lgd:
                act = sodf.complex_flux()
            else:
                act = sodf.superflux()
            # create an Onset object with the activations
            o = Onset(act, args.fps, args.online)
            if args.save:
                # save the raw ODF activations
                o.save("%s%s" % (filename, args.act_suffix), args.sep)
        # detect the onsets
        o.detect(args.threshold, args.combine, args.pre_avg, args.pre_max,
                 args.post_avg, args.post_max, args.delay)
        # write the onsets to a file
        o.write("%s%s" % (filename, args.det_suffix))
        # also output them to stdout if verbose
        if args.verbose:
            print 'detections:', o.detections
        # continue with next file

if __name__ == '__main__':
    # parse arguments
    args = parser()
    # and run the main SuperFlux program
    main(args)

--------------------------------------------------------------------------------
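The file-gathering step at the top of `main` can be tried in isolation. The following is only a minimal sketch: the helper name `collect_wav_files` is hypothetical and not part of SuperFlux.py, and it assumes that a directory argument should add its `.wav` files to those collected from earlier arguments rather than replace them.

```python
import fnmatch
import glob
import os.path


def collect_wav_files(paths):
    """Hypothetical helper (not in SuperFlux.py): gather .wav files to process.

    Directories contribute the .wav files directly inside them (no recursion);
    any other path is taken as given. Non-.wav entries are dropped and the
    result is sorted, mirroring the steps in `main`.
    """
    files = []
    for path in paths:
        if os.path.isdir(path):
            # assumption: accumulate across arguments instead of overwriting
            files.extend(glob.glob(os.path.join(path, '*.wav')))
        else:
            files.append(path)
    # only keep .wav files, in sorted order
    return sorted(fnmatch.filter(files, '*.wav'))
```

Calling `collect_wav_files(['take1.wav', 'recordings/'])` would return `take1.wav` together with every `.wav` file found directly inside `recordings/` (both names are placeholders).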