├── ComplexFlux.py
├── .gitignore
├── README.md
└── SuperFlux.py


/ComplexFlux.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # encoding: utf-8
 3 | """
 4 | Simple wrapper for calling SuperFlux with the correct default values for
 4 | ComplexFlux (local group delay weighting enabled).
 5 | 
 6 | """
 7 | 
 8 | from SuperFlux import parser, main
 9 | 
10 | if __name__ == '__main__':
11 |     # parse arguments
12 |     args = parser(lgd=True, threshold=0.25)
13 |     # and run the main SuperFlux program
14 |     main(args)


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | *.py[cod]
 2 | 
 3 | # C extensions
 4 | *.so
 5 | 
 6 | # Packages
 7 | *.egg
 8 | *.egg-info
 9 | dist
10 | build
11 | eggs
12 | parts
13 | bin
14 | var
15 | sdist
16 | develop-eggs
17 | .installed.cfg
18 | lib
19 | lib64
20 | 
21 | # Installer logs
22 | pip-log.txt
23 | 
24 | # Unit test / coverage reports
25 | .coverage
26 | .tox
27 | nosetests.xml
28 | 
29 | # Translations
30 | *.mo
31 | 
32 | # Mr Developer
33 | .mr.developer.cfg
34 | .project
35 | .pydevproject
36 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | SuperFlux
 2 | =========
 3 | 
 4 | Python reference implementation of the SuperFlux onset detection algorithm as
 5 | described in:
 6 | 
 7 | "Maximum Filter Vibrato Suppression for Onset Detection"
 8 | by Sebastian Böck and Gerhard Widmer.
 9 | Proceedings of the 16th International Conference on Digital Audio Effects
10 | (DAFx-13), Maynooth, Ireland, September 2013.
11 | 
12 | and the additional local group delay (LGD) based weighting scheme described in:
13 | 
14 | "Local group delay based vibrato and tremolo suppression for onset detection"
15 | by Sebastian Böck and Gerhard Widmer.
16 | Proceedings of the 14th International Society for Music Information
17 | Retrieval Conference (ISMIR), Curitiba, Brazil, November 2013.
18 | 
19 | The papers can be downloaded from: 
20 | 
21 | <http://phenicx.upf.edu/system/files/publications/Boeck_DAFx-13.pdf>
22 | 
23 | <http://phenicx.upf.edu/system/files/publications/Boeck_ISMIR_2013.pdf>
24 | 
25 | If you use this software, please cite the corresponding paper.
26 | 
27 | ```
28 | @inproceedings{Boeck2013,
29 | 	Author = {B{\"o}ck, Sebastian and Widmer, Gerhard},
30 | 	Title = {Maximum Filter Vibrato Suppression for Onset Detection},
31 | 	Booktitle = {{Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13)}},
32 | 	Pages = {55--61},
33 | 	Address = {Maynooth, Ireland},
34 | 	Month = {September},
35 | 	Year = {2013}
36 | }
37 | 
38 | @inproceedings{Boeck2013a,
39 | 	Author = {B{\"o}ck, Sebastian and Widmer, Gerhard},
40 | 	Title = {Local Group Delay based Vibrato and Tremolo Suppression for Onset Detection},
41 | 	Booktitle = {{Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR)}},
42 | 	Pages = {589--594},
43 | 	Address = {Curitiba, Brazil},
44 | 	Month = {November},
45 | 	Year = {2013}
46 | }
47 | ```
48 | 
49 | 
50 | Usage
51 | -----
52 | `SuperFlux.py input.wav` processes the audio file and writes the detected
53 | onsets to a file named `input.superflux.txt`.
54 | 
55 | Please see the `-h` option to get a more detailed description of the available
56 | options, e.g. changing the suffix for the detection files.
57 | 
58 | Requirements
59 | ------------
60 | * Python 2.7
61 | * Numpy
62 | * Scipy
63 | 
64 | 


--------------------------------------------------------------------------------
/SuperFlux.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # encoding: utf-8
  3 | """
  4 | Copyright (c) 2012 - 2014 Sebastian Böck <sebastian.boeck@jku.at>
  5 | All rights reserved.
  6 | 
  7 | Redistribution and use in source and binary forms, with or without
  8 | modification, are permitted provided that the following conditions are met:
  9 | 
 10 | 1. Redistributions of source code must retain the above copyright notice, this
 11 |    list of conditions and the following disclaimer.
 12 | 2. Redistributions in binary form must reproduce the above copyright notice,
 13 |    this list of conditions and the following disclaimer in the documentation
 14 |    and/or other materials provided with the distribution.
 15 | 
 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 20 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 23 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 26 | 
 27 | """
 28 | 
 29 | """
 30 | Please note that this program released together with the paper
 31 | 
 32 | "Maximum Filter Vibrato Suppression for Onset Detection"
 33 | Sebastian Böck and Gerhard Widmer.
 34 | Proceedings of the 16th International Conference on Digital Audio Effects
 35 | (DAFx-13), Maynooth, Ireland, September 2013
 36 | 
 37 | is not tuned in any way for speed/memory efficiency. However, it can be used
 38 | as a reference implementation for the described onset detection with a maximum
 39 | filter for vibrato suppression.
 40 | 
 41 | It also serves as a reference implementation of the local group delay (LGD)
 42 | based weighting extension described in:
 43 | 
 44 | "Local group delay based vibrato and tremolo suppression for onset detection"
 45 | Sebastian Böck and Gerhard Widmer.
 46 | Proceedings of the 14th International Society for Music Information
 47 | Retrieval Conference (ISMIR), 2013.
 48 | 
 49 | If you use this software, please cite the corresponding paper.
 50 | 
 51 | Please send any comments, enhancements, errata, etc. to the main author.
 52 | 
 53 | """
 54 | 
 55 | import numpy as np
 56 | import scipy.fftpack as fft
 57 | from scipy.io import wavfile
 58 | from scipy.ndimage import (maximum_filter, maximum_filter1d,
 59 |                            uniform_filter1d)
 60 | 
 61 | 
 62 | class Filter(object):
 63 |     """
 64 |     Filter Class.
 65 | 
 66 |     """
 67 |     def __init__(self, num_fft_bins, fs, bands=24, fmin=30, fmax=17000, equal=False):
 68 |         """
 69 |         Creates a new Filter object instance.
 70 | 
 71 |         :param num_fft_bins: number of FFT coefficients
 72 |         :param fs:           sample rate of the audio file
 73 |         :param bands:        number of filter bands per octave
 74 |         :param fmin:         the minimum frequency [Hz]
 75 |         :param fmax:         the maximum frequency [Hz]
 76 |         :param equal:        normalize the area of each band to 1
 77 | 
 78 |         """
 79 |         # sample rate
 80 |         self.fs = fs
 81 |         # reduce fmax if necessary
 82 |         if fmax > fs / 2:
 83 |             fmax = fs / 2
 84 |         # get a list of frequencies
 85 |         frequencies = self.frequencies(bands, fmin, fmax)
 86 |         # conversion factor for mapping of frequencies to spectrogram bins
 87 |         factor = (fs / 2.0) / num_fft_bins
 88 |         # map the frequencies to the spectrogram bins
 89 |         frequencies = np.round(np.asarray(frequencies) / factor).astype(int)
 90 |         # only keep unique bins
 91 |         frequencies = np.unique(frequencies)
 92 |         # filter out all frequencies outside the valid range
 93 |         frequencies = [f for f in frequencies if f < num_fft_bins]
 94 |         # number of bands
 95 |         bands = len(frequencies) - 2
 96 |         assert bands >= 3, 'cannot create filterbank with fewer than 3 ' \
 97 |                            'bands'
 98 |         # init the filter matrix with size: number of FFT bins x filter bands
 99 |         self.filterbank = np.zeros([num_fft_bins, bands], dtype=float)
100 |         # process all bands
101 |         for band in range(bands):
102 |             # edge & center frequencies
103 |             start, mid, stop = frequencies[band:band + 3]
104 |             # create a triangular filter
105 |             triangular_filter = self.triangular_filter(start, mid, stop, equal)
106 |             self.filterbank[start:stop, band] = triangular_filter
107 | 
108 |     @staticmethod
109 |     def frequencies(bands, fmin, fmax, a=440):
110 |         """
111 |         Returns a list of frequencies aligned on a logarithmic scale.
112 | 
113 |         :param bands: number of filter bands per octave
114 |         :param fmin:  the minimum frequency [Hz]
115 |         :param fmax:  the maximum frequency [Hz]
116 |         :param a:     reference frequency of A4 [Hz]
117 |         :returns:     a list of frequencies
118 | 
119 |         Using 12 bands per octave and a=440 corresponds to the MIDI notes.
120 | 
121 |         """
122 |         # factor by which adjacent frequencies are apart
123 |         factor = 2.0 ** (1.0 / bands)
124 |         # start with A4
125 |         freq = a
126 |         frequencies = [freq]
127 |         # go upwards till fmax
128 |         while freq <= fmax:
129 |             # multiply once more, since the included frequency is a frequency
130 |             # which is only used as the right corner of a (triangular) filter
131 |             freq *= factor
132 |             frequencies.append(freq)
133 |         # restart with a and go downwards till fmin
134 |         freq = a
135 |         while freq >= fmin:
136 |             # divide once more, since the included frequency is a frequency
137 |             # which is only used as the left corner of a (triangular) filter
138 |             freq /= factor
139 |             frequencies.append(freq)
140 |         # sort frequencies
141 |         frequencies.sort()
142 |         # return the list
143 |         return frequencies
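The frequency grid built by `Filter.frequencies` can be illustrated without NumPy. The following is a minimal sketch, not part of the original module; the name `log_frequencies` is hypothetical. It assumes the same procedure as the method above: start at the reference frequency `a` and step up and down by a constant factor of `2 ** (1 / bands)`, overshooting `fmax` and `fmin` by one step each so that the outermost values can serve as filter corners.

```python
def log_frequencies(bands_per_octave, fmin, fmax, a=440.0):
    """Hypothetical stand-alone sketch of a logarithmic frequency grid."""
    # ratio between two adjacent frequencies
    factor = 2.0 ** (1.0 / bands_per_octave)
    freqs = [a]
    # go upwards; the last appended value lies above fmax
    freq = a
    while freq <= fmax:
        freq *= factor
        freqs.append(freq)
    # go downwards; the last appended value lies below fmin
    freq = a
    while freq >= fmin:
        freq /= factor
        freqs.append(freq)
    return sorted(freqs)


# 12 bands per octave gives semitone spacing around a = 440 Hz
freqs = log_frequencies(12, 30, 17000)
```

In the module itself, these frequencies are subsequently rounded to FFT bin indices and de-duplicated, so the low end of the grid usually collapses to fewer distinct bins.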
144 | 
145 |     @staticmethod
146 |     def triangular_filter(start, mid, stop, equal=False):
147 |         """
148 |         Calculates a triangular filter of the given size.
149 | 
150 |         :param start: start bin (with value 0, included in the filter)
151 |         :param mid:   center bin (of height 1, unless equal is True)
152 |         :param stop:  end bin (with value 0, not included in the filter)
153 |         :param equal: normalize the area of the filter to 1
154 |         :returns:     a triangular shaped filter
155 | 
156 |         """
157 |         # height of the filter
158 |         height = 1.
159 |         # normalize the height
160 |         if equal:
161 |             height = 2. / (stop - start)
162 |         # init the filter
163 |         triangular_filter = np.empty(stop - start)
164 |         # rising edge
165 |         rising = np.linspace(0, height, (mid - start), endpoint=False)
166 |         triangular_filter[:mid - start] = rising
167 |         # falling edge
168 |         falling = np.linspace(height, 0, (stop - mid), endpoint=False)
169 |         triangular_filter[mid - start:] = falling
170 |         # return
171 |         return triangular_filter
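To make the shape of a single band concrete, here is a small pure-Python sketch of the same construction. It is an illustration only; `triangular_weights` is a hypothetical name, and plain lists stand in for the NumPy arrays used by the module. As in the method above, the rising edge starts at 0 and excludes the peak, and the falling edge starts at the peak and excludes the final 0.

```python
def triangular_weights(start, mid, stop, equal=False):
    """Hypothetical sketch of a triangular filter over bins [start, stop)."""
    # peak height; scaled by the band width if equal-area is requested
    height = 2.0 / (stop - start) if equal else 1.0
    # rising edge: from 0 up to (but excluding) the peak
    rising = [height * i / (mid - start) for i in range(mid - start)]
    # falling edge: from the peak down to (but excluding) 0
    falling = [height * (stop - mid - i) / (stop - mid)
               for i in range(stop - mid)]
    return rising + falling


# a band spanning bins 2..8 with its peak at bin 5
weights = triangular_weights(2, 5, 9)
normalized = triangular_weights(2, 5, 9, equal=True)
```

For this example the `equal=True` variant has weights that sum to 1, which is what keeps wide high-frequency bands from dominating narrow low-frequency ones.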
172 | 
173 | 
174 | class Wav(object):
175 |     """
176 |     Wav Class is a simple wrapper around scipy.io.wavfile.
177 | 
178 |     """
179 |     def __init__(self, filename):
180 |         """
181 |         Creates a new Wav object instance of the given file.
182 | 
183 |         :param filename: name of the .wav file
184 | 
185 |         """
186 |         # read in the audio
187 |         self.sample_rate, self.audio = wavfile.read(filename)
188 |         # set the length
189 |         self.num_samples = np.shape(self.audio)[0]
190 |         self.length = float(self.num_samples) / self.sample_rate
191 |         # set the number of channels
192 |         try:
193 |             # multi channel files
194 |             self.num_channels = np.shape(self.audio)[1]
195 |         except IndexError:
196 |             # catch mono files
197 |             self.num_channels = 1
198 | 
199 |     def attenuate(self, attenuation):
200 |         """
201 |         Attenuate the audio signal.
202 | 
203 |         :param attenuation: attenuation level given in dB
204 | 
205 |         """
206 |         att = np.power(np.sqrt(10.), attenuation / 10.)
207 |         self.audio = np.asarray(self.audio / att, dtype=self.audio.dtype)
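The divisor used in `attenuate` is the usual amplitude conversion from decibels: `sqrt(10) ** (dB / 10)` equals `10 ** (dB / 20)`. A minimal sketch of just that conversion follows; the function name `attenuation_factor` is hypothetical and only the standard library is used.

```python
import math


def attenuation_factor(attenuation_db):
    """Hypothetical helper: linear amplitude divisor for a level in dB."""
    # sqrt(10) ** (dB / 10) is the same as 10 ** (dB / 20)
    return math.pow(math.sqrt(10.0), attenuation_db / 10.0)


# 20 dB of attenuation divides the amplitude by (approximately) 10
factor_20db = attenuation_factor(20.0)
# 0 dB leaves the signal unchanged
factor_0db = attenuation_factor(0.0)
```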
208 | 
209 |     def downmix(self):
210 |         """
211 |         Down-mix the audio signal to mono.
212 | 
213 |         """
214 |         if self.num_channels > 1:
215 |             self.audio = np.mean(self.audio, axis=-1).astype(self.audio.dtype)
216 | 
217 |     def normalize(self):
218 |         """
219 |         Normalize the audio signal.
220 | 
221 |         """
222 |         self.audio = self.audio.astype(float) / np.max(self.audio)
223 | 
224 | 
225 | class Spectrogram(object):
226 |     """
227 |     Spectrogram Class.
228 | 
229 |     """
230 |     def __init__(self, wav, frame_size=2048, fps=200, filterbank=None,
231 |                  log=False, mul=1, add=1, online=True, block_size=2048,
232 |                  lgd=False):
233 |         """
234 |         Creates a new Spectrogram object instance and performs a STFT on the
235 |         given audio.
236 | 
237 |         :param wav:        a Wav object
238 |         :param frame_size: the size for the window [samples]
239 |         :param fps:        frames per second
240 |         :param filterbank: use the given filterbank for dimensionality
241 |                            reduction
242 |         :param log:        use logarithmic magnitude
243 |         :param mul:        multiply the magnitude by this factor before taking
244 |                            the logarithm
245 |         :param add:        add this value to the magnitude before taking the
246 |                            logarithm
247 |         :param online:     work in online mode (i.e. use only past information)
248 |         :param block_size: perform the filtering in blocks of the given size
249 |         :param lgd:        compute the local group delay (needed for the
250 |                            ComplexFlux algorithm)
251 | 
252 |         """
253 |         # init some variables
254 |         self.wav = wav
255 |         self.fps = fps
256 |         self.filterbank = filterbank
257 |         if add <= 0:
258 |             raise ValueError("a positive value must be added before taking "
259 |                              "the logarithm")
260 |         if mul <= 0:
261 |             raise ValueError("a positive value must be multiplied before "
262 |                              "taking the logarithm")
263 |         # derive some variables
264 |         # use floats so that seeking works properly
265 |         self.hop_size = float(self.wav.sample_rate) / float(self.fps)
266 |         self.num_frames = int(np.ceil(self.wav.num_samples / self.hop_size))
267 |         self.num_fft_bins = int(frame_size / 2)
268 |         # initial number of bins equal to fft bins, but those can change if
269 |         # filters are used
270 |         self.num_bins = int(frame_size / 2)
271 |         # init spec matrix
272 |         if filterbank is None:
273 |             # init with number of FFT frequency bins
274 |             self.spec = np.empty([self.num_frames, self.num_fft_bins],
275 |                                  dtype=np.float32)
276 |         else:
277 |             # init with number of filter bands
278 |             self.spec = np.empty([self.num_frames, np.shape(filterbank)[1]],
279 |                                  dtype=np.float32)
280 |             # set number of bins
281 |             self.num_bins = np.shape(filterbank)[1]
282 |             # set the block size
283 |             if not block_size or block_size > self.num_frames:
284 |                 block_size = self.num_frames
285 |             # init block counter
286 |             block = 0
287 |             # init a matrix of that size
288 |             spec = np.zeros([block_size, self.num_fft_bins])
289 |         # init the local group delay matrix
290 |         self.lgd = None
291 |         if lgd:
292 |             self.lgd = np.zeros([self.num_frames, self.num_fft_bins],
293 |                                 dtype=np.float32)
294 |         # create windowing function for DFT
295 |         self.window = np.hanning(frame_size)
296 |         try:
297 |             # the audio signal is not scaled, scale the window accordingly
298 |             max_value = np.iinfo(self.wav.audio.dtype).max
299 |             self._fft_window = self.window / max_value
300 |         except ValueError:
301 |             self._fft_window = self.window
302 |         # step through all frames
303 |         for frame in range(self.num_frames):
304 |             # seek to the right position in the audio signal
305 |             if online:
306 |                 # step back one frame_size after moving forward 1 hop_size
307 |                 # so that the current position is at the end of the window
308 |                 seek = int((frame + 1) * self.hop_size - frame_size)
309 |             else:
310 |                 # step back half of the frame_size so that the frame represents
311 |                 # the centre of the window
312 |                 seek = int(frame * self.hop_size - frame_size / 2)
313 |             # read in the right portion of the audio
314 |             if seek >= self.wav.num_samples:
315 |                 # end of file reached
316 |                 break
317 |             elif seek + frame_size >= self.wav.num_samples:
318 |                 # end behind the actual audio, append zeros accordingly
319 |                 zeros = np.zeros(seek + frame_size - self.wav.num_samples)
320 |                 signal = self.wav.audio[seek:]
321 |                 signal = np.append(signal, zeros)
322 |             elif seek < 0:
323 |                 # start before the actual audio, pad with zeros accordingly
324 |                 zeros = np.zeros(-seek)
325 |                 signal = self.wav.audio[0:seek + frame_size]
326 |                 signal = np.append(zeros, signal)
327 |             else:
328 |                 # normal read operation
329 |                 signal = self.wav.audio[seek:seek + frame_size]
330 |             # multiply the signal with the window function
331 |             signal = signal * self._fft_window
332 |             # perform DFT
333 |             stft = fft.fft(signal)[:self.num_fft_bins]
334 |             # compute the local group delay
335 |             if lgd:
336 |                 # unwrap the phase
337 |                 unwrapped_phase = np.unwrap(np.angle(stft))
338 |                 # local group delay is the derivative over frequency
339 |                 self.lgd[frame, :-1] = (unwrapped_phase[:-1] -
340 |                                         unwrapped_phase[1:])
341 |             # is block-wise processing needed?
342 |             if filterbank is None:
343 |                 # no filtering needed, thus no block wise processing needed
344 |                 self.spec[frame] = np.abs(stft)
345 |             else:
346 |                 # filter in blocks
347 |                 spec[frame % block_size] = np.abs(stft)
348 |                 # end of a block or end of the signal reached
349 |                 end_of_block = (frame + 1) // block_size > block
350 |                 end_of_signal = (frame + 1) == self.num_frames
351 |                 if end_of_block or end_of_signal:
352 |                     start = block * block_size
353 |                     stop = min(start + block_size, self.num_frames)
354 |                     filtered_spec = np.dot(spec[:stop - start], filterbank)
355 |                     self.spec[start:stop] = filtered_spec
356 |                     # increase the block counter
357 |                     block += 1
358 |             # next frame
359 |         # take the logarithm
360 |         if log:
361 |             np.log10(mul * self.spec + add, out=self.spec)
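The difference between online and offline framing in the constructor above comes down to where each analysis window starts. The sketch below reproduces only that arithmetic; `frame_start` is a hypothetical name, and the sample rate of 44100 Hz is an assumption chosen for illustration (the frame size and frame rate are the constructor defaults).

```python
def frame_start(frame, sample_rate=44100, fps=200, frame_size=2048,
                online=True):
    """Hypothetical sketch: first sample index of the given frame."""
    # fractional hop size, as in the constructor above
    hop_size = float(sample_rate) / fps
    if online:
        # the window ends at the current position (only past samples)
        return int((frame + 1) * hop_size - frame_size)
    # the window is centred on the current position
    return int(frame * hop_size - frame_size / 2)


online_start = frame_start(0, online=True)
offline_start = frame_start(0, online=False)
```

Both values are negative for the first frame, which is why the constructor pads the start of the signal with zeros.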
362 | 
363 | 
364 | class SpectralODF(object):
365 |     """
366 |     The SpectralODF class implements most of the common onset detection
367 |     functions based on the magnitude or phase information of a spectrogram.
368 | 
369 |     """
370 |     def __init__(self, spectrogram, ratio=0.5, max_bins=3, diff_frames=None,
371 |                  temporal_filter=3, temporal_origin=0):
372 |         """
373 |         Creates a new ODF object instance.
374 | 
375 |         :param spectrogram:     a Spectrogram object on which the detection
376 |                                 functions operate
377 |         :param ratio:           calculate the difference to the frame which
378 |                                 has the given magnitude ratio
379 |         :param max_bins:        number of bins for the maximum filter
380 |         :param diff_frames:     calculate the difference to the N-th previous
381 |                                 frame
382 |         :param temporal_filter: temporal maximum filtering of the local group
383 |                                 delay for the ComplexFlux algorithms
384 |         :param temporal_origin: origin of the temporal maximum filter
385 | 
386 |         If no diff_frames are given, they are calculated automatically based on
387 |         the given ratio.
388 | 
389 |         """
390 |         self.s = spectrogram
391 |         # determine the number of diff frames
392 |         if diff_frames is None:
393 |             # get the first sample with a higher magnitude than given ratio
394 |             sample = np.argmax(self.s.window > ratio)
395 |             diff_samples = self.s.window.size / 2 - sample
396 |             # convert to frames
397 |             diff_frames = int(round(diff_samples / self.s.hop_size))
398 |             # set the minimum to 1
399 |             if diff_frames < 1:
400 |                 diff_frames = 1
401 |         self.diff_frames = diff_frames
402 |         # number of bins used for the maximum filter
403 |         self.max_bins = max_bins
404 |         self.temporal_filter = temporal_filter
405 |         self.temporal_origin = temporal_origin
406 | 
407 |     @staticmethod
408 |     def _superflux_diff_spec(spec, diff_frames=1, max_bins=3):
409 |         """
410 |         Calculate the difference spec used for SuperFlux.
411 | 
412 |         :param spec:        magnitude spectrogram
413 |         :param diff_frames: calculate the difference to the N-th previous frame
414 |         :param max_bins:    number of neighboring bins used for maximum
415 |                             filtering
416 |         :return:            difference spectrogram used for SuperFlux
417 | 
418 |         Note: If 'max_bins' is greater than 0, a maximum filter of this size
419 |               is applied in the frequency direction. The difference of the
420 |               k-th frequency bin of the magnitude spectrogram is then
421 |               calculated relative to the maximum over m bins of the N-th
422 |               previous frame (e.g. m=3: k-1, k, k+1).
423 | 
424 |               This method only works properly if the number of bands for the
425 |               filterbank is chosen carefully. A value of 24 (i.e. quarter-tone
426 |               resolution) usually yields good results.
427 | 
428 |         """
429 |         # init diff matrix
430 |         diff_spec = np.zeros_like(spec)
431 |         if diff_frames < 1:
432 |             raise ValueError("number of diff_frames must be >= 1")
433 |         # widen the spectrogram in frequency dimension by `max_bins`
434 |         max_spec = maximum_filter(spec, size=[1, max_bins])
435 |         # calculate the diff
436 |         diff_spec[diff_frames:] = spec[diff_frames:] - max_spec[0:-diff_frames]
437 |         # keep only positive values
438 |         np.maximum(diff_spec, 0, diff_spec)
439 |         # return diff spec
440 |         return diff_spec
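The effect of the maximum filter is easiest to see on a toy spectrogram. The following pure-Python sketch is an illustration under simplifying assumptions, not the module's implementation: `superflux_odf` is a hypothetical name, the spectrogram is a list of lists, and the filter window is simply clipped at the lowest and highest bin instead of using SciPy's boundary handling. It computes the half-wave rectified difference to the maximum-filtered earlier frame and sums it over all bins.

```python
def superflux_odf(spec, diff_frames=1, max_bins=3):
    """Hypothetical list-based sketch of the SuperFlux detection function."""
    half = max_bins // 2
    num_bins = len(spec[0])
    odf = [0.0] * len(spec)
    for n in range(diff_frames, len(spec)):
        prev = spec[n - diff_frames]
        total = 0.0
        for k in range(num_bins):
            # maximum over the neighbouring bins of the earlier frame
            lo = max(0, k - half)
            hi = min(num_bins, k + half + 1)
            diff = spec[n][k] - max(prev[lo:hi])
            # keep only positive differences
            if diff > 0:
                total += diff
        odf[n] = total
    return odf


# a partial drifting from bin 1 to bin 2 (vibrato-like movement)
vibrato = [[0, 1, 0, 0], [0, 0, 1, 0]]
# a partial appearing from silence (a real onset)
onset = [[0, 0, 0, 0], [0, 1, 0, 0]]

suppressed = superflux_odf(vibrato, max_bins=3)
plain_flux = superflux_odf(vibrato, max_bins=1)
detected = superflux_odf(onset, max_bins=3)
```

With `max_bins=1` the sketch reduces to plain spectral flux, which reports the drifting partial as new energy; with `max_bins=3` the drift is absorbed by the widened earlier frame while the real onset is still reported.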
441 | 
442 |     @staticmethod
443 |     def _lgd_mask(spec, lgd, filterbank=None, temporal_filter=0,
444 |                   temporal_origin=0):
445 |         """
446 |         Calculates a weighting mask for the magnitude spectrogram based on the
447 |         local group delay.
448 | 
449 |         :param spec:            the magnitude spectrogram
450 |         :param lgd:             local group delay of the spectrogram
451 |         :param filterbank:      filterbank used for dimensionality reduction of
452 |                                 the magnitude spectrogram
453 |         :param temporal_filter: temporal maximum filtering of the local group
454 |                                 delay
455 |         :param temporal_origin: origin of the temporal maximum filter
456 | 
457 |         "Local group delay based vibrato and tremolo suppression for onset
458 |          detection"
459 |         Sebastian Böck and Gerhard Widmer.
460 |         Proceedings of the 14th International Society for Music Information
461 |         Retrieval Conference (ISMIR), 2013.
462 | 
463 |         """
464 |         from scipy.ndimage import maximum_filter, minimum_filter
465 |         # take only absolute values of the local group delay
466 |         lgd = np.abs(lgd)
467 | 
468 |         # maximum filter along the temporal axis
469 |         if temporal_filter > 0:
470 |             lgd = maximum_filter(lgd, size=[temporal_filter, 1],
471 |                                  origin=temporal_origin)
472 |         # lgd = uniform_filter(lgd, size=[1, 3])  # better for percussive onsets
473 | 
474 |         # create the weighting mask
475 |         if filterbank is not None:
476 |             # if the magnitude spectrogram was filtered, use the minimum local
477 |             # group delay value of each filterbank (expanded by one frequency
478 |             # bin in both directions) as the mask
479 |             mask = np.zeros_like(spec)
480 |             num_bins = lgd.shape[1]
481 |             for b in range(mask.shape[1]):
482 |                 # determine the corner bins for the mask
483 |                 corner_bins = np.nonzero(filterbank[:, b])[0]
484 |                 # always expand to the next neighbour
485 |                 start_bin = corner_bins[0] - 1
486 |                 stop_bin = corner_bins[-1] + 2
487 |                 # constrain the range
488 |                 if start_bin < 0:
489 |                     start_bin = 0
490 |                 if stop_bin > num_bins:
491 |                     stop_bin = num_bins
492 |                 # set mask
493 |                 mask[:, b] = np.amin(lgd[:, start_bin: stop_bin], axis=1)
494 |         else:
495 |             # if the spectrogram is not filtered, use a simple minimum filter
496 |             # covering only the current bin and its neighbours
497 |             mask = minimum_filter(lgd, size=[1, 3])
498 |         # return the normalized mask
499 |         return mask / np.pi
500 | 
501 |     # Onset Detection Functions
502 |     def superflux(self):
503 |         """
504 |         SuperFlux with a maximum filter based vibrato suppression.
505 | 
506 |         :return: SuperFlux onset detection function
507 | 
508 |         "Maximum Filter Vibrato Suppression for Onset Detection"
509 |         Sebastian Böck and Gerhard Widmer.
510 |         Proceedings of the 16th International Conference on Digital Audio
511 |         Effects (DAFx-13), Maynooth, Ireland, September 2013
512 | 
513 |         """
514 |         # compute the difference spectrogram as in the SuperFlux algorithm
515 |         diff_spec = self._superflux_diff_spec(self.s.spec, self.diff_frames,
516 |                                               self.max_bins)
517 |         # sum all positive 1st order max. filtered differences
518 |         return np.sum(diff_spec, axis=1)
519 | 
520 |     def complex_flux(self):
521 |         """
522 |         Complex Flux with a local group delay based tremolo suppression.
523 | 
524 |         Calculates the difference of bin k of the magnitude spectrogram
525 |         relative to the N-th previous frame of the (maximum filtered)
526 |         spectrogram.
527 | 
528 |         :return: complex flux onset detection function
529 | 
530 |         "Local group delay based vibrato and tremolo suppression for onset
531 |          detection"
532 |         Sebastian Böck and Gerhard Widmer.
533 |         Proceedings of the 14th International Society for Music Information
534 |         Retrieval Conference (ISMIR), 2013.
535 | 
536 |         """
537 |         # compute the difference spectrogram as in the SuperFlux algorithm
538 |         diff_spec = self._superflux_diff_spec(self.s.spec, self.diff_frames,
539 |                                               self.max_bins)
540 |         # create a mask based on the local group delay information
541 |         mask = self._lgd_mask(self.s.spec, self.s.lgd, self.s.filterbank,
542 |                               self.temporal_filter, self.temporal_origin)
543 |         # weight the differences with the mask
544 |         diff_spec *= mask
545 |         # sum all positive 1st order max. filtered and weighted differences
546 |         return np.sum(diff_spec, axis=1)
547 | 
548 | 
549 | class Onset(object):
550 |     """
551 |     Onset Class.
552 | 
553 |     """
554 |     def __init__(self, activations, fps, online=True, sep=''):
555 |         """
556 |         Creates a new Onset object instance with the given activations of the
557 |         ODF (OnsetDetectionFunction). The activations can be read from a file.
558 | 
559 |         :param activations: an array containing the activations of the ODF
560 |         :param fps:         frame rate of the activations
561 |         :param online:      work in online mode (i.e. use only past
562 |                             information)
563 | 
564 |         """
565 |         self.activations = None     # activations of the ODF
566 |         self.fps = fps              # frame rate of the activation function
567 |         self.online = online        # online peak-picking
568 |         self.detections = []        # list of detected onsets (in seconds)
569 |         # set / load activations
570 |         if isinstance(activations, np.ndarray):
571 |             # activations are given as an array
572 |             self.activations = activations
573 |         else:
574 |             # read in the activations from a file
575 |             self.load(activations, sep)
576 | 
577 |     def detect(self, threshold, combine=0.03, pre_avg=0.15, pre_max=0.01,
578 |                post_avg=0, post_max=0.05, delay=0):
579 |         """
580 |         Detects the onsets.
581 | 
582 |         :param threshold: threshold for peak-picking
583 |         :param combine:   only report 1 onset for N milliseconds
584 |         :param pre_avg:   use N seconds past information for moving average
585 |         :param pre_max:   use N seconds past information for moving maximum
586 |         :param post_avg:  use N seconds future information for moving average
587 |         :param post_max:  use N seconds future information for moving maximum
588 |         :param delay:     report the onset N milliseconds delayed
589 | 
590 |         In online mode, post_avg and post_max are set to 0.
591 | 
592 |         Implements the peak-picking method described in:
593 | 
594 |         "Evaluating the Online Capabilities of Onset Detection Methods"
595 |         Sebastian Böck, Florian Krebs and Markus Schedl
596 |         Proceedings of the 13th International Society for Music Information
597 |         Retrieval Conference (ISMIR), 2012
598 | 
599 |         """
600 |         # online mode?
601 |         if self.online:
602 |             post_max = 0
603 |             post_avg = 0
604 |         # convert timing information to frames
605 |         pre_avg = int(round(self.fps * pre_avg))
606 |         pre_max = int(round(self.fps * pre_max))
607 |         post_max = int(round(self.fps * post_max))
608 |         post_avg = int(round(self.fps * post_avg))
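609 |         # `combine` and `delay` are already given in seconds (see the
610 |         # docstring and the command line defaults), so unlike the window
611 |         # lengths above they need no conversion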
612 |         # init detections
613 |         self.detections = []
614 |         # moving maximum
615 |         max_length = pre_max + post_max + 1
616 |         max_origin = int(np.floor((pre_max - post_max) / 2))
617 |         mov_max = maximum_filter1d(self.activations, max_length,
618 |                                    mode='constant', origin=max_origin)
619 |         # moving average
620 |         avg_length = pre_avg + post_avg + 1
621 |         avg_origin = int(np.floor((pre_avg - post_avg) / 2))
622 |         mov_avg = uniform_filter1d(self.activations, avg_length,
623 |                                    mode='constant', origin=avg_origin)
624 |         # detections are activations equal to the moving maximum
625 |         detections = self.activations * (self.activations == mov_max)
626 |         # detections must be greater than or equal to the mov. avg + threshold
627 |         detections *= (detections >= mov_avg + threshold)
628 |         # convert detected onsets to a list of timestamps
629 |         detections = np.nonzero(detections)[0].astype(float) / self.fps
630 |         # shift if necessary
631 |         if delay != 0:
632 |             detections += delay
633 |         # always use the first detection and all others if none was reported
634 |         # within the last `combine` seconds
635 |         if detections.size > 1:
636 |             # filter all detections which occur within `combine` seconds
637 |             combined_detections = detections[1:][np.diff(detections) > combine]
638 |             # add them after the first detection
639 |             self.detections = np.append(detections[0], combined_detections)
640 |         else:
641 |             self.detections = detections
642 | 
643 |     def write(self, filename):
644 |         """
645 |         Write the detected onsets to the given file.
646 | 
647 |         :param filename: the target file name
648 | 
649 |         Only useful if detect() was invoked before.
650 | 
651 |         """
652 |         with open(filename, 'w') as f:
653 |             for pos in self.detections:
654 |                 f.write(str(pos) + '\n')
655 | 
656 |     def save(self, filename, sep):
657 |         """
658 |         Save the onset activations to the given file.
659 | 
660 |         :param filename: the target file name
661 |         :param sep: separator between activation values
662 | 
663 |         Note: using an empty separator ('') results in a binary numpy array.
664 | 
665 |         """
666 |         self.activations.tofile(filename, sep=sep)
667 | 
668 |     def load(self, filename, sep):
669 |         """
670 |         Load the onset activations from the given file.
671 | 
672 |         :param filename: the source file name
673 |         :param sep: separator between activation values
674 | 
675 |         Note: using an empty separator ('') results in a binary numpy array.
676 | 
677 |         """
678 |         self.activations = np.fromfile(filename, sep=sep)
679 | 
680 | 
681 | def parser(lgd=False, threshold=1.1):
682 |     """
683 |     Parses the command line arguments.
684 | 
685 |     :param lgd:       use local group delay weighting by default
686 |     :param threshold: default value for threshold
687 | 
688 |     """
689 |     import argparse
690 |     # define parser
691 |     p = argparse.ArgumentParser(
692 |         formatter_class=argparse.RawDescriptionHelpFormatter, description="""
693 |     If invoked without any parameters, the software detects all onsets in
694 |     the given files according to the method proposed in:
695 | 
696 |     "Maximum Filter Vibrato Suppression for Onset Detection"
697 |     Sebastian Böck and Gerhard Widmer.
698 |     Proceedings of the 16th International Conference on Digital Audio Effects
699 |     (DAFx-13), Maynooth, Ireland, September 2013
700 | 
701 |     If the '--lgd' switch is set, it additionally applies a local group delay
702 |     based weighting according to the method proposed in:
703 | 
704 |     "Local group delay based vibrato and tremolo suppression for onset
705 |      detection"
706 |     Sebastian Böck and Gerhard Widmer.
707 |     Proceedings of the 14th International Society for Music Information
708 |     Retrieval Conference (ISMIR), 2013.
709 | 
710 |     The single most important parameter is the threshold ('-t'). Adjusting
711 |     this parameter might help to improve performance considerably. Please note
712 |     that if the local group delay weighting scheme is applied, the threshold
713 |     should be adjusted to a lower value, e.g. 0.25.
714 | 
715 |     """)
716 |     # general options
717 |     p.add_argument('files', metavar='files', nargs='+',
718 |                    help='files to be processed')
719 |     p.add_argument('-v', dest='verbose', action='store_true',
720 |                    help='be verbose')
721 |     p.add_argument('-s', dest='save', action='store_true', default=False,
722 |                    help='save the activations of the onset detection function')
723 |     p.add_argument('-l', dest='load', action='store_true', default=False,
724 |                    help='load the activations of the onset detection function')
725 |     p.add_argument('--sep', action='store', default='',
726 |                    help='separator for saving/loading the onset detection '
727 |                         'function [default=numpy binary]')
728 |     p.add_argument('--act_suffix', action='store', default='.act',
729 |                    help='filename suffix of the activations files '
730 |                         '[default=%(default)s]')
731 |     p.add_argument('--det_suffix', action='store', default='.superflux.txt',
732 |                    help='filename suffix of the detection files '
733 |                         '[default=%(default)s]')
734 |     # online / offline mode
735 |     p.add_argument('--online', action='store_true', default=False,
736 |                    help='operate in online mode (i.e. no future information '
737 |                         'will be used for computation)')
738 |     # wav options
739 |     wav = p.add_argument_group('audio arguments')
740 |     wav.add_argument('--norm', action='store_true', default=None,
741 |                      help='normalize the audio (switches to offline mode)')
742 |     wav.add_argument('--att', action='store', type=float, default=None,
743 |                      help='attenuate the audio by ATT dB')
744 |     # spectrogram options
745 |     spec = p.add_argument_group('spectrogram arguments')
746 |     spec.add_argument('--fps', action='store', default=200, type=int,
747 |                       help='frames per second [default=%(default)s]')
748 |     spec.add_argument('--frame_size', action='store', type=int, default=2048,
749 |                       help='frame size [samples, default=%(default)s]')
750 |     spec.add_argument('--ratio', action='store', type=float, default=0.5,
751 |                       help='window magnitude ratio to calc number of diff '
752 |                            'frames [default=%(default)s]')
753 |     spec.add_argument('--diff_frames', action='store', type=int, default=None,
754 |                       help='diff frames')
755 |     spec.add_argument('--max_bins', action='store', type=int, default=3,
756 |                       help='bins used for maximum filtering '
757 |                            '[default=%(default)s]')
758 |     # LGD stuff
759 |     mask = p.add_argument_group('local group delay based weighting')
760 |     mask.add_argument('--lgd', action='store_true', default=lgd,
761 |                       help='apply local group delay based weighting '
762 |                            '[default=%(default)s]')
763 |     mask.add_argument('--temporal_filter', action='store', default=3, type=int,
764 |                       help='apply a temporal filter of N frames before '
765 |                            'calculating the LGD weighting mask '
766 |                            '[default=%(default)s]')
767 |     # filtering
768 |     filt = p.add_argument_group('magnitude spectrogram filtering arguments')
769 |     filt.add_argument('--no_filter', dest='filter', action='store_false',
770 |                       default=True, help='do not filter the magnitude '
771 |                                          'spectrogram with a filterbank')
772 |     filt.add_argument('--fmin', action='store', default=30, type=float,
773 |                       help='minimum frequency of filter '
774 |                            '[Hz, default=%(default)s]')
775 |     filt.add_argument('--fmax', action='store', default=17000, type=float,
776 |                       help='maximum frequency of filter '
777 |                            '[Hz, default=%(default)s]')
778 |     filt.add_argument('--bands', action='store', type=int, default=24,
779 |                       help='number of bands per octave [default=%(default)s]')
780 |     filt.add_argument('--equal', action='store_true', default=False,
781 |                       help='equalize triangular windows to have equal area')
782 |     filt.add_argument('--block_size', action='store', default=2048, type=int,
783 |                       help='perform filtering in blocks of N frames '
784 |                            '[default=%(default)s]')
785 |     # logarithm
786 |     log = p.add_argument_group('logarithmic magnitude spectrogram arguments')
787 |     log.add_argument('--no_log', dest='log', action='store_false',
788 |                      default=True, help='use linear magnitude scale')
789 |     log.add_argument('--mul', action='store', default=1, type=float,
790 |                      help='multiplier (before taking the log) '
791 |                           '[default=%(default)s]')
792 |     log.add_argument('--add', action='store', default=1, type=float,
793 |                      help='value added (before taking the log) '
794 |                           '[default=%(default)s]')
795 |     # onset detection
796 |     onset = p.add_argument_group('onset peak-picking arguments')
797 |     onset.add_argument('-t', dest='threshold', action='store', type=float,
798 |                        default=threshold, help='detection threshold '
799 |                                                '[default=%(default)s]')
800 |     onset.add_argument('--combine', action='store', type=float, default=0.03,
801 |                        help='combine onsets within N seconds '
802 |                             '[default=%(default)s]')
803 |     onset.add_argument('--pre_avg', action='store', type=float, default=0.15,
804 |                        help='build average over N previous seconds '
805 |                             '[default=%(default)s]')
806 |     onset.add_argument('--pre_max', action='store', type=float, default=0.01,
807 |                        help='search maximum over N previous seconds '
808 |                             '[default=%(default)s]')
809 |     onset.add_argument('--post_avg', action='store', type=float, default=0,
810 |                        help='build average over N following seconds '
811 |                             '[default=%(default)s]')
812 |     onset.add_argument('--post_max', action='store', type=float, default=0.05,
813 |                        help='search maximum over N following seconds '
814 |                             '[default=%(default)s]')
815 |     onset.add_argument('--delay', action='store', type=float, default=0,
816 |                        help='report the onsets N seconds delayed '
817 |                             '[default=%(default)s]')
818 |     # version
819 |     p.add_argument('--version', action='version',
820 |                    version='%(prog)s 1.03 (2014-11-02)')
821 |     # parse arguments
822 |     args = p.parse_args()
823 |     # print arguments
824 |     if args.verbose:
825 |         print(args)
826 |     # return args
827 |     return args
828 | 
829 | 
830 | def main(args):
831 |     """
832 |     Main SuperFlux program.
833 | 
834 |     :param args: parsed arguments
835 | 
836 |     """
837 |     import os.path
838 |     import glob
839 |     import fnmatch
840 |     # determine the files to process
841 |     files = []
842 |     for f in args.files:
843 |         # check what we have (file/path)
844 |         if os.path.isdir(f):
845 |             # use all files in the given path
846 |             files.extend(glob.glob(os.path.join(f, '*.wav')))
847 |         else:
848 |             # file was given, append to list
849 |             files.append(f)
850 |     # only process .wav files
851 |     files = fnmatch.filter(files, '*.wav')
852 |     files.sort()
853 |     # init filterbank
854 |     filt = None
855 |     filterbank = None
856 |     # process the files
857 |     for f in files:
858 |         if args.verbose:
859 |             print('processing file %s' % f)
860 |         # use the name of the file without the extension
861 |         filename = os.path.splitext(f)[0]
862 |         # do the processing stuff unless the activations are loaded from file
863 |         if args.load:  # load the activations from file
864 |             o = Onset("%s%s" % (filename, args.act_suffix), args.fps,
865 |                       args.online, args.sep)
866 |         else:
867 |             # open the wav file
868 |             w = Wav(f)
869 |             # normalize audio
870 |             if args.norm:
871 |                 w.normalize()
872 |                 args.online = False  # switch to offline mode
873 |             # down-mix to mono
874 |             if w.num_channels > 1:
875 |                 w.downmix()
876 |             # attenuate signal
877 |             if args.att:
878 |                 w.attenuate(args.att)
879 |             # create filterbank if needed
880 |             if args.filter:
881 |                 # re-create filterbank if the sample rate of the audio changes
882 |                 if filt is None or filt.fs != w.sample_rate:
883 |                     filt = Filter(args.frame_size / 2, w.sample_rate,
884 |                                   args.bands, args.fmin, args.fmax, args.equal)
885 |                     filterbank = filt.filterbank
886 |             # spectrogram
887 |             s = Spectrogram(w, frame_size=args.frame_size, fps=args.fps,
888 |                             filterbank=filterbank, log=args.log,
889 |                             mul=args.mul, add=args.add, online=args.online,
890 |                             block_size=args.block_size, lgd=args.lgd)
891 |             # use the spectrogram to create a SpectralODF object
892 |             sodf = SpectralODF(s, ratio=args.ratio, max_bins=args.max_bins,
893 |                                diff_frames=args.diff_frames)
894 |             # perform detection function on the object
895 |             if args.lgd:
896 |                 act = sodf.complex_flux()
897 |             else:
898 |                 act = sodf.superflux()
899 |             # create an Onset object with the activations
900 |             o = Onset(act, args.fps, args.online)
901 |             if args.save:
902 |                 # save the raw ODF activations
903 |                 o.save("%s%s" % (filename, args.act_suffix), args.sep)
904 |         # detect the onsets
905 |         o.detect(args.threshold, args.combine, args.pre_avg, args.pre_max,
906 |                  args.post_avg, args.post_max, args.delay)
907 |         # write the onsets to a file
908 |         o.write("%s%s" % (filename, args.det_suffix))
909 |         # also output them to stdout if verbose
910 |         if args.verbose:
911 |             print('detections: ' + str(o.detections))
912 |         # continue with next file
913 | 
914 | if __name__ == '__main__':
915 |     # parse arguments
916 |     args = parser()
917 |     # and run the main SuperFlux program
918 |     main(args)
919 | 
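
The peak-picking done by `Onset.detect` above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the reference implementation: `pick_peaks` is a hypothetical stand-alone helper, it assumes SciPy is installed, it expects `combine` in seconds, and it uses a boolean mask instead of multiplying the activations, so it may behave differently from `detect` for zero or negative activation values.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d, uniform_filter1d


def pick_peaks(activations, fps, threshold, pre_avg=0.15, pre_max=0.01,
               post_avg=0.0, post_max=0.05, combine=0.03):
    """Return onset times in seconds (hypothetical helper, not in SuperFlux)."""
    activations = np.asarray(activations, dtype=float)
    # convert the window lengths from seconds to frames
    pre_avg, pre_max, post_avg, post_max = (
        int(round(fps * t)) for t in (pre_avg, pre_max, post_avg, post_max))
    # the windows are asymmetric, so shift the filter origin accordingly:
    # a positive origin moves the window towards past frames
    mov_max = maximum_filter1d(activations, pre_max + post_max + 1,
                               mode='constant',
                               origin=(pre_max - post_max) // 2)
    mov_avg = uniform_filter1d(activations, pre_avg + post_avg + 1,
                               mode='constant',
                               origin=(pre_avg - post_avg) // 2)
    # a peak equals the moving maximum and is at least `threshold`
    # above the moving average
    peaks = (activations == mov_max) & (activations >= mov_avg + threshold)
    times = np.nonzero(peaks)[0] / float(fps)
    # keep the first detection, plus every detection that is more than
    # `combine` seconds after its predecessor
    if times.size > 1:
        times = np.append(times[0], times[1:][np.diff(times) > combine])
    return times


# made-up activation function: two clear peaks and one weaker neighbour
act = np.zeros(100)
act[20], act[21], act[60] = 5.0, 3.0, 4.0
onsets = pick_peaks(act, fps=100, threshold=1.0)
```

With these made-up values only frames 20 and 60 survive: frame 21 is not the maximum of its window, and the silent frames never exceed the moving average by the threshold.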

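
`Onset.save` and `Onset.load` above rely on `ndarray.tofile` and `np.fromfile`. The sketch below illustrates both separator modes; the `roundtrip` helper and the sample values are made up for illustration. It also shows a caveat of the binary mode: the file stores raw bytes without dtype information and `np.fromfile` reads 64-bit floats by default, so an array of another dtype (float32 is assumed here purely as an example) would not be restored correctly.

```python
import os
import tempfile

import numpy as np


def roundtrip(activations, sep):
    """Write with ndarray.tofile and read back with np.fromfile."""
    fd, path = tempfile.mkstemp(suffix='.act')
    os.close(fd)
    try:
        activations.tofile(path, sep=sep)
        # same call as in Onset.load: the dtype defaults to 64-bit float
        return np.fromfile(path, sep=sep)
    finally:
        os.remove(path)


act64 = np.array([0.0, 0.5, 1.25, 3.0])
binary = roundtrip(act64, '')      # raw bytes, no shape or dtype header
text = roundtrip(act64, '\n')      # one value per line, human readable

# four float32 values occupy 16 bytes, which are read back as two float64
act32 = act64.astype(np.float32)
misread = roundtrip(act32, '')
```

Text mode avoids the dtype pitfall at the cost of larger files; in binary mode the array would have to be float64 (or the reader would need to pass a matching `dtype`) for the round trip to work.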

--------------------------------------------------------------------------------