├── AUTHORS ├── COPYING ├── Makefile.inc ├── Makefile.linux ├── Makefile.mingw32 ├── Makefile.mingw64 ├── Makefile.osx ├── README.md ├── bbc-vamp-plugins.cat ├── bbc-vamp-plugins.doxyfile ├── bbc-vamp-plugins.n3 └── src ├── Energy.cpp ├── Energy.h ├── Intensity.cpp ├── Intensity.h ├── Peaks.cpp ├── Peaks.h ├── Rhythm.cpp ├── Rhythm.h ├── SpectralContrast.cpp ├── SpectralContrast.h ├── SpectralFlux.cpp ├── SpectralFlux.h ├── SpeechMusicSegmenter.cpp ├── SpeechMusicSegmenter.h ├── plugins.cpp ├── vamp-plugin.list └── vamp-plugin.map /AUTHORS: -------------------------------------------------------------------------------- 1 | AUTHORS 2 | 3 | British Broadcasting Corporation 4 | -------------------------------- 5 | 6 | - Chris Baume 7 | - Yves Raimond 8 | -------------------------------------------------------------------------------- /COPYING: -------------------------------------------------------------------------------- 1 | (c) 2011-2013 British Broadcasting Corporation and contributors 2 | See "AUTHORS" file for full details. 3 | 4 | All code here, except where otherwise indicated is licensed under the 5 | Apache Licence version 2.0. 6 | 7 | ---------------------------------------------------------------------------- 8 | 9 | Apache License 10 | Version 2.0, January 2004 11 | http://www.apache.org/licenses/ 12 | 13 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 14 | 15 | 1. Definitions. 16 | 17 | "License" shall mean the terms and conditions for use, reproduction, 18 | and distribution as defined by Sections 1 through 9 of this document. 19 | 20 | "Licensor" shall mean the copyright owner or entity authorized by 21 | the copyright owner that is granting the License. 22 | 23 | "Legal Entity" shall mean the union of the acting entity and all 24 | other entities that control, are controlled by, or are under common 25 | control with that entity. 
For the purposes of this definition, 26 | "control" means (i) the power, direct or indirect, to cause the 27 | direction or management of such entity, whether by contract or 28 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 29 | outstanding shares, or (iii) beneficial ownership of such entity. 30 | 31 | "You" (or "Your") shall mean an individual or Legal Entity 32 | exercising permissions granted by this License. 33 | 34 | "Source" form shall mean the preferred form for making modifications, 35 | including but not limited to software source code, documentation 36 | source, and configuration files. 37 | 38 | "Object" form shall mean any form resulting from mechanical 39 | transformation or translation of a Source form, including but 40 | not limited to compiled object code, generated documentation, 41 | and conversions to other media types. 42 | 43 | "Work" shall mean the work of authorship, whether in Source or 44 | Object form, made available under the License, as indicated by a 45 | copyright notice that is included in or attached to the work 46 | (an example is provided in the Appendix below). 47 | 48 | "Derivative Works" shall mean any work, whether in Source or Object 49 | form, that is based on (or derived from) the Work and for which the 50 | editorial revisions, annotations, elaborations, or other modifications 51 | represent, as a whole, an original work of authorship. For the 52 | purposes 53 | of this License, Derivative Works shall not include works that remain 54 | separable from, or merely link (or bind by name) to the interfaces of, 55 | the Work and Derivative Works thereof. 
56 | 57 | "Contribution" shall mean any work of authorship, including 58 | the original version of the Work and any modifications or additions 59 | to that Work or Derivative Works thereof, that is intentionally 60 | submitted to Licensor for inclusion in the Work by the copyright owner 61 | or by an individual or Legal Entity authorized to submit on behalf of 62 | the copyright owner. For the purposes of this definition, "submitted" 63 | means any form of electronic, verbal, or written communication sent 64 | to the Licensor or its representatives, including but not limited to 65 | communication on electronic mailing lists, source code control 66 | systems, 67 | and issue tracking systems that are managed by, or on behalf of, the 68 | Licensor for the purpose of discussing and improving the Work, but 69 | excluding communication that is conspicuously marked or otherwise 70 | designated in writing by the copyright owner as "Not a Contribution." 71 | 72 | "Contributor" shall mean Licensor and any individual or Legal Entity 73 | on behalf of whom a Contribution has been received by Licensor and 74 | subsequently incorporated within the Work. 75 | 76 | 2. Grant of Copyright License. Subject to the terms and conditions of 77 | this License, each Contributor hereby grants to You a perpetual, 78 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 79 | copyright license to reproduce, prepare Derivative Works of, 80 | publicly display, publicly perform, sublicense, and distribute the 81 | Work and such Derivative Works in Source or Object form. 82 | 83 | 3. Grant of Patent License. 
Subject to the terms and conditions of 84 | this License, each Contributor hereby grants to You a perpetual, 85 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 86 | (except as stated in this section) patent license to make, have made, 87 | use, offer to sell, sell, import, and otherwise transfer the Work, 88 | where such license applies only to those patent claims licensable 89 | by such Contributor that are necessarily infringed by their 90 | Contribution(s) alone or by combination of their Contribution(s) 91 | with the Work to which such Contribution(s) was submitted. If You 92 | institute patent litigation against any entity (including a 93 | cross-claim or counterclaim in a lawsuit) alleging that the Work 94 | or a Contribution incorporated within the Work constitutes direct 95 | or contributory patent infringement, then any patent licenses 96 | granted to You under this License for that Work shall terminate 97 | as of the date such litigation is filed. 98 | 99 | 4. Redistribution. 
You may reproduce and distribute copies of the 100 | Work or Derivative Works thereof in any medium, with or without 101 | modifications, and in Source or Object form, provided that You 102 | meet the following conditions: 103 | 104 | (a) You must give any other recipients of the Work or 105 | Derivative Works a copy of this License; and 106 | 107 | (b) You must cause any modified files to carry prominent notices 108 | stating that You changed the files; and 109 | 110 | (c) You must retain, in the Source form of any Derivative Works 111 | that You distribute, all copyright, patent, trademark, and 112 | attribution notices from the Source form of the Work, 113 | excluding those notices that do not pertain to any part of 114 | the Derivative Works; and 115 | 116 | (d) If the Work includes a "NOTICE" text file as part of its 117 | distribution, then any Derivative Works that You distribute must 118 | include a readable copy of the attribution notices contained 119 | within such NOTICE file, excluding those notices that do not 120 | pertain to any part of the Derivative Works, in at least one 121 | of the following places: within a NOTICE text file distributed 122 | as part of the Derivative Works; within the Source form or 123 | documentation, if provided along with the Derivative Works; or, 124 | within a display generated by the Derivative Works, if and 125 | wherever such third-party notices normally appear. The contents 126 | of the NOTICE file are for informational purposes only and 127 | do not modify the License. You may add Your own attribution 128 | notices within Derivative Works that You distribute, alongside 129 | or as an addendum to the NOTICE text from the Work, provided 130 | that such additional attribution notices cannot be construed 131 | as modifying the License. 
132 | 133 | You may add Your own copyright statement to Your modifications and 134 | may provide additional or different license terms and conditions 135 | for use, reproduction, or distribution of Your modifications, or 136 | for any such Derivative Works as a whole, provided Your use, 137 | reproduction, and distribution of the Work otherwise complies with 138 | the conditions stated in this License. 139 | 140 | 5. Submission of Contributions. Unless You explicitly state otherwise, 141 | any Contribution intentionally submitted for inclusion in the Work 142 | by You to the Licensor shall be under the terms and conditions of 143 | this License, without any additional terms or conditions. 144 | Notwithstanding the above, nothing herein shall supersede or modify 145 | the terms of any separate license agreement you may have executed 146 | with Licensor regarding such Contributions. 147 | 148 | 6. Trademarks. This License does not grant permission to use the trade 149 | names, trademarks, service marks, or product names of the Licensor, 150 | except as required for reasonable and customary use in describing the 151 | origin of the Work and reproducing the content of the NOTICE file. 152 | 153 | 7. Disclaimer of Warranty. Unless required by applicable law or 154 | agreed to in writing, Licensor provides the Work (and each 155 | Contributor provides its Contributions) on an "AS IS" BASIS, 156 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 157 | implied, including, without limitation, any warranties or conditions 158 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 159 | PARTICULAR PURPOSE. You are solely responsible for determining the 160 | appropriateness of using or redistributing the Work and assume any 161 | risks associated with Your exercise of permissions under this License. 162 | 163 | 8. Limitation of Liability. 
In no event and under no legal theory, 164 | whether in tort (including negligence), contract, or otherwise, 165 | unless required by applicable law (such as deliberate and grossly 166 | negligent acts) or agreed to in writing, shall any Contributor be 167 | liable to You for damages, including any direct, indirect, special, 168 | incidental, or consequential damages of any character arising as a 169 | result of this License or out of the use or inability to use the 170 | Work (including but not limited to damages for loss of goodwill, 171 | work stoppage, computer failure or malfunction, or any and all 172 | other commercial damages or losses), even if such Contributor 173 | has been advised of the possibility of such damages. 174 | 175 | 9. Accepting Warranty or Additional Liability. While redistributing 176 | the Work or Derivative Works thereof, You may choose to offer, 177 | and charge a fee for, acceptance of support, warranty, indemnity, 178 | or other liability obligations and/or rights consistent with this 179 | License. However, in accepting such obligations, You may act only 180 | on Your own behalf and on Your sole responsibility, not on behalf 181 | of any other Contributor, and only if You agree to indemnify, 182 | defend, and hold each Contributor harmless for any liability 183 | incurred by, or claims asserted against, such Contributor by reason 184 | of your accepting any such warranty or additional liability. 185 | 186 | END OF TERMS AND CONDITIONS 187 | 188 | APPENDIX: How to apply the Apache License to your work. 189 | 190 | To apply the Apache License to your work, attach the following 191 | boilerplate notice, with the fields enclosed by brackets "[]" 192 | replaced with your own identifying information. (Don't include 193 | the brackets!) The text should be enclosed in the appropriate 194 | comment syntax for the file format. 
We also recommend that a 195 | file or class name and description of purpose be included on the 196 | same "printed page" as the copyright notice for easier 197 | identification within third-party archives. 198 | 199 | Copyright [yyyy] [name of copyright owner] 200 | 201 | Licensed under the Apache License, Version 2.0 (the "License"); 202 | you may not use this file except in compliance with the License. 203 | You may obtain a copy of the License at 204 | 205 | http://www.apache.org/licenses/LICENSE-2.0 206 | 207 | Unless required by applicable law or agreed to in writing, software 208 | distributed under the License is distributed on an "AS IS" BASIS, 209 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 210 | See the License for the specific language governing permissions and 211 | limitations under the License. 212 | -------------------------------------------------------------------------------- /Makefile.inc: -------------------------------------------------------------------------------- 1 | # Edit this to the location of the Vamp plugin SDK, relative to your project directory 2 | VAMP_SDK_DIR := ../vamp-plugin-sdk-2.6 3 | 4 | PLUGIN_LIBRARY_NAME := bbc-vamp-plugins 5 | 6 | SOURCES := src/Energy.cpp \ 7 | src/Intensity.cpp \ 8 | src/SpectralFlux.cpp \ 9 | src/Rhythm.cpp \ 10 | src/SpectralContrast.cpp \ 11 | src/SpeechMusicSegmenter.cpp \ 12 | src/Peaks.cpp \ 13 | src/plugins.cpp 14 | 15 | HEADERS := src/Energy.h \ 16 | src/Intensity.h \ 17 | src/SpectralFlux.h \ 18 | src/Rhythm.h \ 19 | src/SpectralContrast.h \ 20 | src/SpeechMusicSegmenter.h \ 21 | src/Peaks.h 22 | -------------------------------------------------------------------------------- /Makefile.linux: -------------------------------------------------------------------------------- 1 | include Makefile.inc 2 | 3 | CXXFLAGS := -I$(VAMP_SDK_DIR) -fPIC 4 | PLUGIN_EXT := .so 5 | LDFLAGS = -shared -Wl,-soname=$(PLUGIN) $(VAMP_SDK_DIR)/libvamp-sdk.a 
-Wl,--version-script=src/vamp-plugin.map 6 | 7 | PLUGIN ?= $(PLUGIN_LIBRARY_NAME)$(PLUGIN_EXT) 8 | CXX ?= g++ 9 | CC ?= gcc 10 | 11 | OBJECTS := $(SOURCES:.cpp=.o) 12 | OBJECTS := $(OBJECTS:.c=.o) 13 | 14 | $(PLUGIN): $(OBJECTS) 15 | $(CXX) -o $@ $^ $(LDFLAGS) 16 | 17 | clean: 18 | rm $(OBJECTS) 19 | 20 | distclean: clean 21 | rm $(PLUGIN) 22 | -------------------------------------------------------------------------------- /Makefile.mingw32: -------------------------------------------------------------------------------- 1 | include Makefile.inc 2 | 3 | PLUGIN_EXT := .dll 4 | PLUGIN := $(PLUGIN_LIBRARY_NAME)$(PLUGIN_EXT) 5 | CXXFLAGS := -I$(VAMP_SDK_DIR) 6 | LDFLAGS := $(LDFLAGS) -fno-exceptions -static -static-libgcc 7 | DYNAMIC_LDFLAGS = -shared -Wl,-Bsymbolic 8 | PLUGIN_LDFLAGS = $(DYNAMIC_LDFLAGS) -Wl,--retain-symbols-file=$(VAMP_SDK_DIR)/build/vamp-plugin.list 9 | PLUGIN_LIBS = $(VAMP_SDK_DIR)/libvamp-sdk.a 10 | 11 | CXX := i686-w64-mingw32-g++ 12 | CC := i686-w64-mingw32-gcc 13 | 14 | OBJECTS := $(SOURCES:.cpp=.o) 15 | OBJECTS := $(OBJECTS:.c=.o) 16 | 17 | $(PLUGIN): $(OBJECTS) 18 | $(CXX) $(LDFLAGS) $(PLUGIN_LDFLAGS) -o $@ $^ $(PLUGIN_LIBS) 19 | 20 | clean: 21 | rm $(OBJECTS) 22 | 23 | distclean: clean 24 | rm $(PLUGIN) 25 | -------------------------------------------------------------------------------- /Makefile.mingw64: -------------------------------------------------------------------------------- 1 | include Makefile.inc 2 | 3 | PLUGIN_EXT := .dll 4 | PLUGIN := $(PLUGIN_LIBRARY_NAME)$(PLUGIN_EXT) 5 | CXXFLAGS := -I$(VAMP_SDK_DIR) 6 | LDFLAGS := $(LDFLAGS) -fno-exceptions -static -static-libgcc 7 | DYNAMIC_LDFLAGS = -shared -Wl,-Bsymbolic 8 | PLUGIN_LDFLAGS = $(DYNAMIC_LDFLAGS) -Wl,--version-script=$(VAMP_SDK_DIR)/build/vamp-plugin.map 9 | PLUGIN_LIBS = $(VAMP_SDK_DIR)/libvamp-sdk.a 10 | 11 | CXX := x86_64-w64-mingw32-g++ 12 | CC := x86_64-w64-mingw32-gcc 13 | 14 | OBJECTS := $(SOURCES:.cpp=.o) 15 | OBJECTS := $(OBJECTS:.c=.o) 16 | 17 | $(PLUGIN): 
$(OBJECTS) 18 | $(CXX) $(LDFLAGS) $(PLUGIN_LDFLAGS) -o $@ $^ $(PLUGIN_LIBS) 19 | 20 | clean: 21 | rm $(OBJECTS) 22 | 23 | distclean: clean 24 | rm $(PLUGIN) 25 | -------------------------------------------------------------------------------- /Makefile.osx: -------------------------------------------------------------------------------- 1 | include Makefile.inc 2 | 3 | CFLAGS := -O3 -arch i386 -arch x86_64 -I$(VAMP_SDK_DIR) 4 | CXXFLAGS := $(CFLAGS) 5 | PLUGIN_EXT := .dylib 6 | LDFLAGS := -arch i386 -arch x86_64 -dynamiclib $(VAMP_SDK_DIR)/libvamp-sdk.a -exported_symbols_list src/vamp-plugin.list -install_name $(PLUGIN_LIBRARY_NAME)$(PLUGIN_EXT) 7 | 8 | PLUGIN := $(PLUGIN_LIBRARY_NAME)$(PLUGIN_EXT) 9 | CXX := g++ 10 | CC := gcc 11 | 12 | OBJECTS := $(SOURCES:.cpp=.o) 13 | OBJECTS := $(OBJECTS:.c=.o) 14 | 15 | $(PLUGIN): $(OBJECTS) 16 | $(CXX) -o $@ $^ $(LDFLAGS) 17 | 18 | clean: 19 | rm $(OBJECTS) 20 | 21 | distclean: clean 22 | rm $(PLUGIN) 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | BBC Vamp plugin collection 2 | === 3 | 4 | ## Introduction 5 | 6 | This is a collection of audio feature extraction algorithms written in the 7 | [Vamp plugin format](http://vamp-plugins.org) by BBC Research and Development. 8 | 9 | Below is a list of plugins and their outputs. Detailed information about each 10 | of the features and the algorithms used is contained in the full documentation, 11 | which is available to download from the [releases 12 | page](https://github.com/bbcrd/bbc-vamp-plugins/releases). 13 | 14 | * __Peaks__ 15 | 1. Peak/trough 16 | * __Energy__ 17 | 1. RMS energy 18 | 1. RMS energy delta 19 | 1. Moving average 20 | 1. Dip probability 21 | 1. Low energy ratio 22 | * __Intensity__ 23 | 1. Intensity 24 | 1. Intensity ratio 25 | * __Rhythm__ 26 | 1. Onset detection curve 27 | 1. 
Moving average of the onset detection curve 28 | 1. Difference between outputs 1 and 2 29 | 1. Onsets 30 | 1. Average onset frequency 31 | 1. Rhythm strength 32 | 1. Autocorrelation 33 | 1. Mean correlation peak 34 | 1. Peak valley ratio 35 | 1. Tempo 36 | * __Spectral Contrast__ 37 | 1. Valleys 38 | 1. Peaks 39 | 1. Mean 40 | * __Spectral Flux__ 41 | 1. Spectral flux 42 | * __Speech/music segmenter__ 43 | 1. Segmentation 44 | 1. Detection function 45 | 46 | ## Binary installation (recommended) 47 | Download the correct plugin for your platform from the [releases 48 | page](https://github.com/bbcrd/bbc-vamp-plugins/releases) and extract the 49 | contents into the [Vamp system plugin 50 | folder](http://vamp-plugins.org/download.html#install). 51 | 52 | ## Installation from source 53 | 54 | ### Linux (Ubuntu/Debian) 55 | 56 | First, you will need a C++ compiler: 57 | 58 | sudo apt-get install build-essential 59 | 60 | Download the Vamp SDK 61 | 62 | wget https://code.soundsoftware.ac.uk/attachments/download/1514/vamp-plugin-sdk-2.6.tar.gz 63 | tar xvf vamp-plugin-sdk-2.6.tar.gz 64 | cd vamp-plugin-sdk-2.6 65 | 66 | Compile the SDK 67 | 68 | ./configure 69 | make sdk 70 | 71 | In Makefile.inc, set VAMP\_SDK\_DIR to the SDK path 72 | 73 | cd /path/to/bbc-vamp-plugins 74 | nano Makefile.inc 75 | 76 | Build the plugin 77 | 78 | make -f Makefile.linux 79 | 80 | Install the plugin 81 | 82 | mv bbc-vamp-plugins.so bbc-vamp-plugins.cat bbc-vamp-plugins.n3 /usr/local/lib/vamp/ 83 | 84 | ### OS X 85 | 86 | Install [Xcode](http://developer.apple.com/xcode/) if you haven't already. 
87 | 88 | Download the Vamp SDK 89 | 90 | wget https://code.soundsoftware.ac.uk/attachments/download/1514/vamp-plugin-sdk-2.6.tar.gz 91 | tar xvf vamp-plugin-sdk-2.6.tar.gz 92 | cd vamp-plugin-sdk-2.6 93 | 94 | Compile the SDK 95 | 96 | make -f build/Makefile.osx sdk 97 | 98 | In Makefile.inc, set VAMP\_SDK\_DIR to the SDK path 99 | 100 | cd /path/to/bbc-vamp-plugins 101 | nano Makefile.inc 102 | 103 | Build the plugin 104 | 105 | make -f Makefile.osx 106 | 107 | Install the plugin 108 | 109 | mv bbc-vamp-plugins.dylib bbc-vamp-plugins.cat bbc-vamp-plugins.n3 /Library/Audio/Plug-Ins/Vamp/ 110 | 111 | ### Windows (cross-compiled) 112 | 113 | To compile a Windows binary from a Linux environment, install MinGW: 114 | 115 | sudo apt-get install mingw-w64 116 | 117 | Download the Vamp SDK 118 | 119 | wget https://code.soundsoftware.ac.uk/attachments/download/1514/vamp-plugin-sdk-2.6.tar.gz 120 | tar xvf vamp-plugin-sdk-2.6.tar.gz 121 | cd vamp-plugin-sdk-2.6 122 | 123 | Compile the SDK 124 | 125 | make -f build/Makefile.mingw32 sdk 126 | 127 | In Makefile.inc, set VAMP\_SDK\_DIR to the SDK path 128 | 129 | cd /path/to/bbc-vamp-plugins 130 | nano Makefile.inc 131 | 132 | Build the plugin 133 | 134 | make -f Makefile.mingw32 135 | 136 | Install the plugin by putting bbc-vamp-plugins.dll, bbc-vamp-plugins.cat and 137 | bbc-vamp-plugins.n3 in the [Vamp system plugin 138 | folder](http://vamp-plugins.org/download.html#install). 139 | 140 | ## Documentation 141 | 142 | To generate the documentation, install [Doxygen](http://www.doxygen.org) and 143 | run the following command from the src folder. The generated documentation will appear in 144 | doc/html/index.html. 145 | 146 | doxygen ../bbc-vamp-plugins.doxyfile 147 | 148 | ## Usage 149 | 150 | The two main programs that use Vamp plugins are 151 | [Sonic Annotator](http://www.omras2.org/sonicannotator) and 152 | [Sonic Visualiser](http://www.sonicvisualiser.org/). 
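Before running an extraction, it can help to confirm that the host can actually see the installed plugins. The sketch below is only an illustration: it assumes `sonic-annotator` is on your `PATH` and that its `-l` option lists the transforms it finds. The helper name `filter_bbc_transforms` is hypothetical and not part of this repository; the prefix it matches is taken from the transform id used in the tempo example in this README.

```shell
#!/bin/sh
# Hypothetical helper: keep only transform ids that belong to this
# plugin library. "|| true" avoids a failing exit status when nothing matches.
filter_bbc_transforms() {
    grep '^vamp:bbc-vamp-plugins:' || true
}

# Assumption: sonic-annotator is installed and "-l" lists known transforms.
# If the tool is missing, the check is skipped silently.
if command -v sonic-annotator >/dev/null 2>&1; then
    sonic-annotator -l | filter_bbc_transforms
fi
```

If nothing is printed, the plugin files are probably not in a directory that the host searches.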
153 | 154 | Below is an example of how to extract the tempo of an audio file using sonic 155 | annotator and default settings: 156 | 157 | sonic-annotator -d vamp:bbc-vamp-plugins:bbc-rhythm:tempo audio.wav -w csv --csv-stdout 158 | 159 | ## Further reading 160 | 161 | * [Vamp plugins](http://vamp-plugins.org) 162 | * [BBC R&D](http://www.bbc.co.uk/rd) 163 | * [QMUL Centre for Digital Music](http://www.elec.qmul.ac.uk/digitalmusic/) 164 | 165 | ## Licensing terms and authorship 166 | 167 | Please refer to the 'COPYING' and 'AUTHORS' files. 168 | -------------------------------------------------------------------------------- /bbc-vamp-plugins.cat: -------------------------------------------------------------------------------- 1 | vamp:bbc-vamp-plugins:bbc-energy::Low Level Features 2 | vamp:bbc-vamp-plugins:bbc-intensity::Low Level Features 3 | vamp:bbc-vamp-plugins:bbc-rhythm::Low Level Features 4 | vamp:bbc-vamp-plugins:bbc-spectral-contrast::Low Level Features 5 | vamp:bbc-vamp-plugins:bbc-spectral-flux::Low Level Features 6 | vamp:bbc-vamp-plugins:bbc-speechmusic-segmenter::Classification 7 | vamp:bbc-vamp-plugins:bbc-peaks::Low Level Features 8 | -------------------------------------------------------------------------------- /bbc-vamp-plugins.n3: -------------------------------------------------------------------------------- 1 | @prefix rdfs: . 2 | @prefix xsd: . 3 | @prefix vamp: . 4 | @prefix plugbase: . 5 | @prefix owl: . 6 | @prefix dc: . 7 | @prefix af: . 8 | @prefix foaf: . 9 | @prefix cc: . 10 | @prefix : <#> . 11 | 12 | <> a vamp:PluginDescription ; 13 | foaf:maker ; 14 | foaf:primaryTopic . 
15 | 16 | :bbc-vamp-plugins a vamp:PluginLibrary ; 17 | vamp:identifier "bbc-vamp-plugins" ; 18 | vamp:available_plugin plugbase:bbc-energy ; 19 | vamp:available_plugin plugbase:bbc-intensity ; 20 | vamp:available_plugin plugbase:bbc-rhythm ; 21 | vamp:available_plugin plugbase:bbc-spectral-contrast ; 22 | vamp:available_plugin plugbase:bbc-spectral-flux ; 23 | vamp:available_plugin plugbase:bbc-speechmusic-segmenter ; 24 | # foaf:page ; 25 | . 26 | 27 | plugbase:bbc-energy a vamp:Plugin ; 28 | dc:title "Energy" ; 29 | vamp:name "Energy" ; 30 | dc:description """""" ; 31 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 32 | dc:rights """(c) 2013 British Broadcasting Corporation""" ; 33 | # cc:license ; 34 | vamp:identifier "bbc-energy" ; 35 | vamp:vamp_API_version vamp:api_version_2 ; 36 | owl:versionInfo "2" ; 37 | vamp:input_domain vamp:TimeDomain ; 38 | 39 | vamp:parameter plugbase:bbc-energy_param_threshold ; 40 | vamp:parameter plugbase:bbc-energy_param_root ; 41 | 42 | vamp:output plugbase:bbc-energy_output_rmsenergy ; 43 | vamp:output plugbase:bbc-energy_output_lowenergy ; 44 | . 45 | plugbase:bbc-energy_param_threshold a vamp:Parameter ; 46 | vamp:identifier "threshold" ; 47 | dc:title "Low energy threshold" ; 48 | dc:format "" ; 49 | vamp:min_value 0 ; 50 | vamp:max_value 10 ; 51 | vamp:unit "" ; 52 | vamp:default_value 1 ; 53 | vamp:value_names (); 54 | . 55 | plugbase:bbc-energy_param_root a vamp:QuantizedParameter ; 56 | vamp:identifier "root" ; 57 | dc:title "Use root" ; 58 | dc:format "" ; 59 | vamp:min_value 0 ; 60 | vamp:max_value 1 ; 61 | vamp:unit "" ; 62 | vamp:quantize_step 1 ; 63 | vamp:default_value 1 ; 64 | vamp:value_names (); 65 | . 
66 | plugbase:bbc-energy_output_rmsenergy a vamp:DenseOutput ; 67 | vamp:identifier "rmsenergy" ; 68 | dc:title "RMS Energy" ; 69 | dc:description """RMS of the signal.""" ; 70 | vamp:fixed_bin_count "true" ; 71 | vamp:unit "" ; 72 | vamp:bin_count 1 ; 73 | # vamp:computes_event_type ; 74 | # vamp:computes_feature ; 75 | # vamp:computes_signal_type ; 76 | . 77 | plugbase:bbc-energy_output_lowenergy a vamp:SparseOutput ; 78 | vamp:identifier "lowenergy" ; 79 | dc:title "Low Energy" ; 80 | dc:description """Percentage of track which is below the low energy threshold.""" ; 81 | vamp:fixed_bin_count "true" ; 82 | vamp:unit "" ; 83 | vamp:bin_count 1 ; 84 | vamp:sample_type vamp:VariableSampleRate ; 85 | # vamp:computes_event_type ; 86 | # vamp:computes_feature ; 87 | # vamp:computes_signal_type ; 88 | . 89 | plugbase:bbc-intensity a vamp:Plugin ; 90 | dc:title "Intensity" ; 91 | vamp:name "Intensity" ; 92 | dc:description """""" ; 93 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 94 | dc:rights """(c) 2013 British Broadcasting Corporation""" ; 95 | # cc:license ; 96 | vamp:identifier "bbc-intensity" ; 97 | vamp:vamp_API_version vamp:api_version_2 ; 98 | owl:versionInfo "1" ; 99 | vamp:input_domain vamp:FrequencyDomain ; 100 | 101 | 102 | vamp:parameter plugbase:bbc-intensity_param_numBands ; 103 | 104 | vamp:output plugbase:bbc-intensity_output_intensity ; 105 | vamp:output plugbase:bbc-intensity_output_intensity-ratio ; 106 | . 107 | plugbase:bbc-intensity_param_numBands a vamp:QuantizedParameter ; 108 | vamp:identifier "numBands" ; 109 | dc:title "Sub-bands" ; 110 | dc:format "" ; 111 | vamp:min_value 2 ; 112 | vamp:max_value 50 ; 113 | vamp:unit "" ; 114 | vamp:quantize_step 1 ; 115 | vamp:default_value 7 ; 116 | vamp:value_names (); 117 | . 
118 | plugbase:bbc-intensity_output_intensity a vamp:DenseOutput ; 119 | vamp:identifier "intensity" ; 120 | dc:title "Intensity" ; 121 | dc:description """Sum of the FFT bin absolute values.""" ; 122 | vamp:fixed_bin_count "true" ; 123 | vamp:unit "" ; 124 | vamp:bin_count 1 ; 125 | # vamp:computes_event_type ; 126 | # vamp:computes_feature ; 127 | # vamp:computes_signal_type ; 128 | . 129 | plugbase:bbc-intensity_output_intensity-ratio a vamp:DenseOutput ; 130 | vamp:identifier "intensity-ratio" ; 131 | dc:title "Intensity Ratio" ; 132 | dc:description """Sum of each sub-band's absolute values.""" ; 133 | vamp:fixed_bin_count "true" ; 134 | vamp:unit "" ; 135 | vamp:bin_count 7 ; 136 | # vamp:computes_event_type ; 137 | # vamp:computes_feature ; 138 | # vamp:computes_signal_type ; 139 | . 140 | plugbase:bbc-rhythm a vamp:Plugin ; 141 | dc:title "Rhythm" ; 142 | vamp:name "Rhythm" ; 143 | dc:description """""" ; 144 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 145 | dc:rights """(c) 2013 British Broadcasting Corporation""" ; 146 | # cc:license ; 147 | vamp:identifier "bbc-rhythm" ; 148 | vamp:vamp_API_version vamp:api_version_2 ; 149 | owl:versionInfo "1" ; 150 | vamp:input_domain vamp:FrequencyDomain ; 151 | 152 | 153 | vamp:parameter plugbase:bbc-rhythm_param_numBands ; 154 | vamp:parameter plugbase:bbc-rhythm_param_threshold ; 155 | vamp:parameter plugbase:bbc-rhythm_param_average_window ; 156 | vamp:parameter plugbase:bbc-rhythm_param_peak_window ; 157 | vamp:parameter plugbase:bbc-rhythm_param_min_bpm ; 158 | vamp:parameter plugbase:bbc-rhythm_param_max_bpm ; 159 | 160 | vamp:output plugbase:bbc-rhythm_output_onset_curve ; 161 | vamp:output plugbase:bbc-rhythm_output_average ; 162 | vamp:output plugbase:bbc-rhythm_output_diff ; 163 | vamp:output plugbase:bbc-rhythm_output_onset ; 164 | vamp:output plugbase:bbc-rhythm_output_avg-onset-freq ; 165 | vamp:output plugbase:bbc-rhythm_output_rhythm-strength ; 166 | vamp:output 
plugbase:bbc-rhythm_output_autocor ; 167 | vamp:output plugbase:bbc-rhythm_output_mean-correlation-peak ; 168 | vamp:output plugbase:bbc-rhythm_output_peak-valley-ratio ; 169 | vamp:output plugbase:bbc-rhythm_output_tempo ; 170 | . 171 | plugbase:bbc-rhythm_param_numBands a vamp:QuantizedParameter ; 172 | vamp:identifier "numBands" ; 173 | dc:title "Sub-bands" ; 174 | dc:format "" ; 175 | vamp:min_value 2 ; 176 | vamp:max_value 50 ; 177 | vamp:unit "" ; 178 | vamp:quantize_step 1 ; 179 | vamp:default_value 7 ; 180 | vamp:value_names (); 181 | . 182 | plugbase:bbc-rhythm_param_threshold a vamp:Parameter ; 183 | vamp:identifier "threshold" ; 184 | dc:title "Threshold" ; 185 | dc:format "" ; 186 | vamp:min_value 0 ; 187 | vamp:max_value 10 ; 188 | vamp:unit "" ; 189 | vamp:default_value 1 ; 190 | vamp:value_names (); 191 | . 192 | plugbase:bbc-rhythm_param_average_window a vamp:QuantizedParameter ; 193 | vamp:identifier "average_window" ; 194 | dc:title "Moving average window length" ; 195 | dc:format "frames" ; 196 | vamp:min_value 1 ; 197 | vamp:max_value 500 ; 198 | vamp:unit "frames" ; 199 | vamp:quantize_step 1 ; 200 | vamp:default_value 200 ; 201 | vamp:value_names (); 202 | . 203 | plugbase:bbc-rhythm_param_peak_window a vamp:QuantizedParameter ; 204 | vamp:identifier "peak_window" ; 205 | dc:title "Onset peak window length" ; 206 | dc:format "frames" ; 207 | vamp:min_value 1 ; 208 | vamp:max_value 20 ; 209 | vamp:unit "frames" ; 210 | vamp:quantize_step 1 ; 211 | vamp:default_value 6 ; 212 | vamp:value_names (); 213 | . 214 | plugbase:bbc-rhythm_param_min_bpm a vamp:QuantizedParameter ; 215 | vamp:identifier "min_bpm" ; 216 | dc:title "Minimum BPM" ; 217 | dc:format "bpm" ; 218 | vamp:min_value 5 ; 219 | vamp:max_value 300 ; 220 | vamp:unit "bpm" ; 221 | vamp:quantize_step 1 ; 222 | vamp:default_value 12 ; 223 | vamp:value_names (); 224 | . 
225 | plugbase:bbc-rhythm_param_max_bpm a vamp:QuantizedParameter ; 226 | vamp:identifier "max_bpm" ; 227 | dc:title "Maximum BPM" ; 228 | dc:format "bpm" ; 229 | vamp:min_value 50 ; 230 | vamp:max_value 400 ; 231 | vamp:unit "bpm" ; 232 | vamp:quantize_step 1 ; 233 | vamp:default_value 300 ; 234 | vamp:value_names (); 235 | . 236 | plugbase:bbc-rhythm_output_onset_curve a vamp:SparseOutput ; 237 | vamp:identifier "onset_curve" ; 238 | dc:title "Onset curve" ; 239 | dc:description """Onset detection curve.""" ; 240 | vamp:fixed_bin_count "true" ; 241 | vamp:unit "" ; 242 | vamp:bin_count 1 ; 243 | vamp:sample_type vamp:VariableSampleRate ; 244 | # vamp:computes_event_type ; 245 | # vamp:computes_feature ; 246 | # vamp:computes_signal_type ; 247 | . 248 | plugbase:bbc-rhythm_output_average a vamp:SparseOutput ; 249 | vamp:identifier "average" ; 250 | dc:title "Average" ; 251 | dc:description """Moving average of onset curve.""" ; 252 | vamp:fixed_bin_count "true" ; 253 | vamp:unit "" ; 254 | vamp:bin_count 1 ; 255 | vamp:sample_type vamp:VariableSampleRate ; 256 | # vamp:computes_event_type ; 257 | # vamp:computes_feature ; 258 | # vamp:computes_signal_type ; 259 | . 260 | plugbase:bbc-rhythm_output_diff a vamp:SparseOutput ; 261 | vamp:identifier "diff" ; 262 | dc:title "Difference" ; 263 | dc:description """Difference between onset and average.""" ; 264 | vamp:fixed_bin_count "true" ; 265 | vamp:unit "" ; 266 | vamp:bin_count 1 ; 267 | vamp:sample_type vamp:VariableSampleRate ; 268 | # vamp:computes_event_type ; 269 | # vamp:computes_feature ; 270 | # vamp:computes_signal_type ; 271 | . 
272 | plugbase:bbc-rhythm_output_onset a vamp:SparseOutput ; 273 | vamp:identifier "onset" ; 274 | dc:title "Onset" ; 275 | dc:description """Point of onsets.""" ; 276 | vamp:fixed_bin_count "true" ; 277 | vamp:unit "" ; 278 | vamp:bin_count 0 ; 279 | vamp:sample_type vamp:VariableSampleRate ; 280 | # vamp:computes_event_type ; 281 | # vamp:computes_feature ; 282 | # vamp:computes_signal_type ; 283 | . 284 | plugbase:bbc-rhythm_output_avg-onset-freq a vamp:SparseOutput ; 285 | vamp:identifier "avg-onset-freq" ; 286 | dc:title "Average Onset Frequency" ; 287 | dc:description """Rate of onsets per minute.""" ; 288 | vamp:fixed_bin_count "true" ; 289 | vamp:unit "" ; 290 | vamp:bin_count 1 ; 291 | vamp:sample_type vamp:VariableSampleRate ; 292 | # vamp:computes_event_type ; 293 | # vamp:computes_feature ; 294 | # vamp:computes_signal_type ; 295 | . 296 | plugbase:bbc-rhythm_output_rhythm-strength a vamp:SparseOutput ; 297 | vamp:identifier "rhythm-strength" ; 298 | dc:title "Rhythm Strength" ; 299 | dc:description """Average value of peaks in onset curve.""" ; 300 | vamp:fixed_bin_count "true" ; 301 | vamp:unit "" ; 302 | vamp:bin_count 1 ; 303 | vamp:sample_type vamp:VariableSampleRate ; 304 | # vamp:computes_event_type ; 305 | # vamp:computes_feature ; 306 | # vamp:computes_signal_type ; 307 | . 308 | plugbase:bbc-rhythm_output_autocor a vamp:SparseOutput ; 309 | vamp:identifier "autocor" ; 310 | dc:title "Autocorrelation" ; 311 | dc:description """Autocorrelation of onset detection curve.""" ; 312 | vamp:fixed_bin_count "true" ; 313 | vamp:unit "" ; 314 | vamp:bin_count 1 ; 315 | vamp:sample_type vamp:VariableSampleRate ; 316 | # vamp:computes_event_type ; 317 | # vamp:computes_feature ; 318 | # vamp:computes_signal_type ; 319 | . 
320 | plugbase:bbc-rhythm_output_mean-correlation-peak a vamp:SparseOutput ; 321 | vamp:identifier "mean-correlation-peak" ; 322 | dc:title "Mean Correlation Peak" ; 323 | dc:description """Mean of the peak autocorrelation values.""" ; 324 | vamp:fixed_bin_count "true" ; 325 | vamp:unit "" ; 326 | vamp:bin_count 1 ; 327 | vamp:sample_type vamp:VariableSampleRate ; 328 | # vamp:computes_event_type ; 329 | # vamp:computes_feature ; 330 | # vamp:computes_signal_type ; 331 | . 332 | plugbase:bbc-rhythm_output_peak-valley-ratio a vamp:SparseOutput ; 333 | vamp:identifier "peak-valley-ratio" ; 334 | dc:title "Peak-Valley Ratio" ; 335 | dc:description """Ratio of the mean correlation peak to the mean correlation valley.""" ; 336 | vamp:fixed_bin_count "true" ; 337 | vamp:unit "" ; 338 | vamp:bin_count 1 ; 339 | vamp:sample_type vamp:VariableSampleRate ; 340 | # vamp:computes_event_type ; 341 | # vamp:computes_feature ; 342 | # vamp:computes_signal_type ; 343 | . 344 | plugbase:bbc-rhythm_output_tempo a vamp:SparseOutput ; 345 | vamp:identifier "tempo" ; 346 | dc:title "Tempo" ; 347 | dc:description """Overall tempo of the track in BPM.""" ; 348 | vamp:fixed_bin_count "true" ; 349 | vamp:unit "bpm" ; 350 | vamp:bin_count 1 ; 351 | vamp:sample_type vamp:VariableSampleRate ; 352 | # vamp:computes_event_type ; 353 | # vamp:computes_feature ; 354 | # vamp:computes_signal_type ; 355 | . 
356 | plugbase:bbc-spectral-contrast a vamp:Plugin ; 357 | dc:title "Spectral Contrast" ; 358 | vamp:name "Spectral Contrast" ; 359 | dc:description """""" ; 360 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 361 | dc:rights """(c) 2013 British Broadcasting Corporation""" ; 362 | # cc:license ; 363 | vamp:identifier "bbc-spectral-contrast" ; 364 | vamp:vamp_API_version vamp:api_version_2 ; 365 | owl:versionInfo "1" ; 366 | vamp:input_domain vamp:FrequencyDomain ; 367 | 368 | 369 | vamp:parameter plugbase:bbc-spectral-contrast_param_alpha ; 370 | vamp:parameter plugbase:bbc-spectral-contrast_param_numBands ; 371 | 372 | vamp:output plugbase:bbc-spectral-contrast_output_valleys ; 373 | vamp:output plugbase:bbc-spectral-contrast_output_peaks ; 374 | vamp:output plugbase:bbc-spectral-contrast_output_mean ; 375 | . 376 | plugbase:bbc-spectral-contrast_param_alpha a vamp:Parameter ; 377 | vamp:identifier "alpha" ; 378 | dc:title "Alpha" ; 379 | dc:format "" ; 380 | vamp:min_value 0 ; 381 | vamp:max_value 1 ; 382 | vamp:unit "" ; 383 | vamp:default_value 0.02 ; 384 | vamp:value_names (); 385 | . 386 | plugbase:bbc-spectral-contrast_param_numBands a vamp:QuantizedParameter ; 387 | vamp:identifier "numBands" ; 388 | dc:title "Sub-bands" ; 389 | dc:format "" ; 390 | vamp:min_value 2 ; 391 | vamp:max_value 50 ; 392 | vamp:unit "" ; 393 | vamp:quantize_step 1 ; 394 | vamp:default_value 7 ; 395 | vamp:value_names (); 396 | . 397 | plugbase:bbc-spectral-contrast_output_valleys a vamp:DenseOutput ; 398 | vamp:identifier "valleys" ; 399 | dc:title "Spectral Valleys" ; 400 | dc:description """Valley of the spectrum.""" ; 401 | vamp:fixed_bin_count "true" ; 402 | vamp:unit "" ; 403 | vamp:bin_count 7 ; 404 | # vamp:computes_event_type ; 405 | # vamp:computes_feature ; 406 | # vamp:computes_signal_type ; 407 | . 
408 | plugbase:bbc-spectral-contrast_output_peaks a vamp:DenseOutput ; 409 | vamp:identifier "peaks" ; 410 | dc:title "Spectral Peaks" ; 411 | dc:description """Peak of the spectrum.""" ; 412 | vamp:fixed_bin_count "true" ; 413 | vamp:unit "" ; 414 | vamp:bin_count 7 ; 415 | # vamp:computes_event_type ; 416 | # vamp:computes_feature ; 417 | # vamp:computes_signal_type ; 418 | . 419 | plugbase:bbc-spectral-contrast_output_mean a vamp:DenseOutput ; 420 | vamp:identifier "mean" ; 421 | dc:title "Spectral Mean" ; 422 | dc:description """Mean of the spectrum.""" ; 423 | vamp:fixed_bin_count "true" ; 424 | vamp:unit "" ; 425 | vamp:bin_count 7 ; 426 | # vamp:computes_event_type ; 427 | # vamp:computes_feature ; 428 | # vamp:computes_signal_type ; 429 | . 430 | plugbase:bbc-spectral-flux a vamp:Plugin ; 431 | dc:title "Spectral Flux" ; 432 | vamp:name "Spectral Flux" ; 433 | dc:description """""" ; 434 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 435 | dc:rights """(c) 2013 British Broadcasting Corporation""" ; 436 | # cc:license ; 437 | vamp:identifier "bbc-spectral-flux" ; 438 | vamp:vamp_API_version vamp:api_version_2 ; 439 | owl:versionInfo "1" ; 440 | vamp:input_domain vamp:FrequencyDomain ; 441 | 442 | 443 | vamp:parameter plugbase:bbc-spectral-flux_param_usel2 ; 444 | 445 | vamp:output plugbase:bbc-spectral-flux_output_spectral-flux ; 446 | . 447 | plugbase:bbc-spectral-flux_param_usel2 a vamp:QuantizedParameter ; 448 | vamp:identifier "usel2" ; 449 | dc:title "Use L2 norm over L1" ; 450 | dc:format "" ; 451 | vamp:min_value 0 ; 452 | vamp:max_value 1 ; 453 | vamp:unit "" ; 454 | vamp:quantize_step 1 ; 455 | vamp:default_value 0 ; 456 | vamp:value_names (); 457 | . 
458 | plugbase:bbc-spectral-flux_output_spectral-flux a vamp:DenseOutput ; 459 | vamp:identifier "spectral-flux" ; 460 | dc:title "Spectral Flux" ; 461 | dc:description """Difference between FFT bin values.""" ; 462 | vamp:fixed_bin_count "true" ; 463 | vamp:unit "" ; 464 | vamp:bin_count 1 ; 465 | # vamp:computes_event_type ; 466 | # vamp:computes_feature ; 467 | # vamp:computes_signal_type ; 468 | . 469 | plugbase:bbc-speechmusic-segmenter a vamp:Plugin ; 470 | dc:title "Speech/Music segmenter" ; 471 | vamp:name "Speech/Music segmenter" ; 472 | dc:description """A simple speech/music segmenter""" ; 473 | foaf:maker [ foaf:name "BBC" ] ; # FIXME could give plugin author's URI here 474 | dc:rights """(c) 2011 British Broadcasting Corporation""" ; 475 | # cc:license ; 476 | vamp:identifier "bbc-speechmusic-segmenter" ; 477 | vamp:vamp_API_version vamp:api_version_2 ; 478 | owl:versionInfo "1" ; 479 | vamp:input_domain vamp:TimeDomain ; 480 | 481 | vamp:parameter plugbase:bbc-speechmusic-segmenter_param_resolution ; 482 | vamp:parameter plugbase:bbc-speechmusic-segmenter_param_change_threshold ; 483 | vamp:parameter plugbase:bbc-speechmusic-segmenter_param_decision_threshold ; 484 | vamp:parameter plugbase:bbc-speechmusic-segmenter_param_min_music_length ; 485 | vamp:parameter plugbase:bbc-speechmusic-segmenter_param_margin ; 486 | 487 | vamp:output plugbase:bbc-speechmusic-segmenter_output_segmentation ; 488 | vamp:output plugbase:bbc-speechmusic-segmenter_output_skewness ; 489 | . 490 | plugbase:bbc-speechmusic-segmenter_param_resolution a vamp:QuantizedParameter ; 491 | vamp:identifier "resolution" ; 492 | dc:title "Resolution" ; 493 | dc:format "" ; 494 | vamp:min_value 1 ; 495 | vamp:max_value 1024 ; 496 | vamp:unit "" ; 497 | vamp:quantize_step 1 ; 498 | vamp:default_value 256 ; 499 | vamp:value_names (); 500 | . 
501 | plugbase:bbc-speechmusic-segmenter_param_change_threshold a vamp:Parameter ; 502 | vamp:identifier "change_threshold" ; 503 | dc:title "Change threshold" ; 504 | dc:format "" ; 505 | vamp:min_value 0 ; 506 | vamp:max_value 1 ; 507 | vamp:unit "" ; 508 | vamp:default_value 0.0781 ; 509 | vamp:value_names (); 510 | . 511 | plugbase:bbc-speechmusic-segmenter_param_decision_threshold a vamp:Parameter ; 512 | vamp:identifier "decision_threshold" ; 513 | dc:title "Decision threshold" ; 514 | dc:format "" ; 515 | vamp:min_value 0 ; 516 | vamp:max_value 1 ; 517 | vamp:unit "" ; 518 | vamp:default_value 0.2734 ; 519 | vamp:value_names (); 520 | . 521 | plugbase:bbc-speechmusic-segmenter_param_min_music_length a vamp:Parameter ; 522 | vamp:identifier "min_music_length" ; 523 | dc:title "Minimum music segment length" ; 524 | dc:format "" ; 525 | vamp:min_value 0 ; 526 | vamp:max_value 100 ; 527 | vamp:unit "" ; 528 | vamp:default_value 0 ; 529 | vamp:value_names (); 530 | . 531 | plugbase:bbc-speechmusic-segmenter_param_margin a vamp:Parameter ; 532 | vamp:identifier "margin" ; 533 | dc:title "Margin" ; 534 | dc:format "" ; 535 | vamp:min_value 0 ; 536 | vamp:max_value 50 ; 537 | vamp:unit "" ; 538 | vamp:default_value 14 ; 539 | vamp:value_names (); 540 | . 541 | plugbase:bbc-speechmusic-segmenter_output_segmentation a vamp:SparseOutput ; 542 | vamp:identifier "segmentation" ; 543 | dc:title "Segmentation" ; 544 | dc:description """Segmentation""" ; 545 | vamp:fixed_bin_count "true" ; 546 | vamp:unit "segment-type" ; 547 | a vamp:QuantizedOutput ; 548 | vamp:quantize_step 1 ; 549 | a vamp:KnownExtentsOutput ; 550 | vamp:min_value 0 ; 551 | vamp:max_value 2 ; 552 | vamp:bin_count 1 ; 553 | vamp:sample_type vamp:VariableSampleRate ; 554 | # vamp:computes_event_type ; 555 | # vamp:computes_feature ; 556 | # vamp:computes_signal_type ; 557 | . 
558 | plugbase:bbc-speechmusic-segmenter_output_skewness a vamp:SparseOutput ; 559 | vamp:identifier "skewness" ; 560 | dc:title "Detection function" ; 561 | dc:description """Detection function""" ; 562 | vamp:fixed_bin_count "true" ; 563 | vamp:unit "segment-type" ; 564 | a vamp:QuantizedOutput ; 565 | vamp:quantize_step 1 ; 566 | a vamp:KnownExtentsOutput ; 567 | vamp:min_value 0 ; 568 | vamp:max_value 2 ; 569 | vamp:bin_count 1 ; 570 | vamp:sample_type vamp:VariableSampleRate ; 571 | # vamp:computes_event_type ; 572 | # vamp:computes_feature ; 573 | # vamp:computes_signal_type ; 574 | . 575 | 576 | -------------------------------------------------------------------------------- /src/Energy.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2014 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 
17 |  */
18 | #include "Energy.h"
19 | /// @cond
20 |
21 | Energy::Energy(float inputSampleRate):Plugin(inputSampleRate)
22 | {
23 |     sampleRate = inputSampleRate;
24 |     threshRatio = 1;
25 |     useRoot = true;
26 |     prevRMS=0;
27 |     avgWindowLength=1;
28 |     avgPercentile=3;
29 |     dipThresh=3;
30 | }
31 |
32 | Energy::~Energy()
33 | {
34 | }
35 |
36 | string
37 | Energy::getIdentifier() const
38 | {
39 |     return "bbc-energy";
40 | }
41 |
42 | string
43 | Energy::getName() const
44 | {
45 |     return "Energy";
46 | }
47 |
48 | string
49 | Energy::getDescription() const
50 | {
51 |     return "";
52 | }
53 |
54 | string
55 | Energy::getMaker() const
56 | {
57 |     return "BBC";
58 | }
59 |
60 | int
61 | Energy::getPluginVersion() const
62 | {
63 |     return 3;
64 | }
65 |
66 | string
67 | Energy::getCopyright() const
68 | {
69 |     return "(c) 2014 British Broadcasting Corporation";
70 | }
71 |
72 | Energy::InputDomain
73 | Energy::getInputDomain() const
74 | {
75 |     return TimeDomain;
76 | }
77 |
78 | size_t
79 | Energy::getPreferredBlockSize() const
80 | {
81 |     return 1024;
82 | }
83 |
84 | size_t
85 | Energy::getPreferredStepSize() const
86 | {
87 |     return 1024;
88 | }
89 |
90 | size_t
91 | Energy::getMinChannelCount() const
92 | {
93 |     return 1;
94 | }
95 |
96 | size_t
97 | Energy::getMaxChannelCount() const
98 | {
99 |     return 1;
100 | }
101 |
102 | Energy::ParameterList
103 | Energy::getParameterDescriptors() const
104 | {
105 |     ParameterList list;
106 |
107 |     ParameterDescriptor root;
108 |     root.identifier = "root";
109 |     root.name = "Use root";
110 |     root.description = "Whether to apply root to energy calc.";
111 |     root.unit = "";
112 |     root.minValue = 0;
113 |     root.maxValue = 1;
114 |     root.defaultValue = 1;
115 |     root.isQuantized = true;
116 |     root.quantizeStep = 1;
117 |     list.push_back(root);
118 |
119 |     ParameterDescriptor avgwindow;
120 |     avgwindow.identifier = "avgwindow";
121 |     avgwindow.name = "Moving average window size";
122 |     avgwindow.description = "Size of moving average window, in seconds.";
123 |     avgwindow.unit = "seconds";
124 |     avgwindow.minValue = 0.001;
125 |     avgwindow.maxValue = 10;
126 |     avgwindow.defaultValue = 1;
127 |     avgwindow.isQuantized = false;
128 |     list.push_back(avgwindow);
129 |
130 |     ParameterDescriptor avgpercentile;
131 |     avgpercentile.identifier = "avgpercentile";
132 |     avgpercentile.name = "Moving average percentile";
133 |     avgpercentile.description = "Percentile to use when calculating moving average.";
134 |     avgpercentile.unit = "";
135 |     avgpercentile.minValue = 0;
136 |     avgpercentile.maxValue = 100;
137 |     avgpercentile.defaultValue = 3;
138 |     avgpercentile.isQuantized = false;
139 |     list.push_back(avgpercentile);
140 |
141 |     ParameterDescriptor dipthresh;
142 |     dipthresh.identifier = "dipthresh";
143 |     dipthresh.name = "Dip threshold";
144 |     dipthresh.description = "Threshold for calculating dips, as multiple of the moving average.";
145 |     dipthresh.unit = "";
146 |     dipthresh.minValue = 0;
147 |     dipthresh.maxValue = 10;
148 |     dipthresh.defaultValue = 3;
149 |     dipthresh.isQuantized = false;
150 |     list.push_back(dipthresh);
151 |
152 |     ParameterDescriptor threshold;
153 |     threshold.identifier = "threshold";
154 |     threshold.name = "Low energy threshold";
155 |     threshold.description = "Threshold to use for low energy, as a multiple of mean energy.";
156 |     threshold.unit = "";
157 |     threshold.minValue = 0;
158 |     threshold.maxValue = 10;
159 |     threshold.defaultValue = 1;
160 |     threshold.isQuantized = false;
161 |     list.push_back(threshold);
162 |
163 |     return list;
164 | }
165 |
166 | float
167 | Energy::getParameter(string identifier) const
168 | {
169 |     if (identifier == "threshold") {
170 |         return threshRatio;
171 |     }
172 |     else if (identifier == "root")
173 |     {
174 |         return useRoot;
175 |     }
176 |     else if (identifier == "avgwindow")
177 |     {
178 |         return avgWindowLength;
179 |     }
180 |     else if (identifier == "avgpercentile")
181 |     {
182 |         return avgPercentile;
183 |     }
184 |     else if (identifier == "dipthresh")
185 | { 186 | return dipThresh; 187 | } 188 | 189 | return 0; 190 | } 191 | 192 | void 193 | Energy::setParameter(string identifier, float value) 194 | { 195 | if (identifier == "threshold") { 196 | threshRatio = value; 197 | } 198 | else if (identifier == "root") 199 | { 200 | if (value == 1) 201 | useRoot = true; 202 | else 203 | useRoot = false; 204 | } 205 | else if (identifier == "avgwindow") 206 | { 207 | avgWindowLength = value; 208 | } 209 | else if (identifier == "avgpercentile") 210 | { 211 | avgPercentile = value; 212 | } 213 | else if (identifier == "dipthresh") 214 | { 215 | dipThresh = value; 216 | } 217 | } 218 | 219 | Energy::ProgramList 220 | Energy::getPrograms() const 221 | { 222 | ProgramList list; 223 | 224 | return list; 225 | } 226 | 227 | string 228 | Energy::getCurrentProgram() const 229 | { 230 | return ""; 231 | } 232 | 233 | void 234 | Energy::selectProgram(string name) 235 | { 236 | } 237 | 238 | Energy::OutputList 239 | Energy::getOutputDescriptors() const 240 | { 241 | OutputList list; 242 | 243 | OutputDescriptor rmsenergy; 244 | rmsenergy.identifier = "rmsenergy"; 245 | rmsenergy.name = "RMS Energy"; 246 | rmsenergy.description = "RMS of the signal."; 247 | rmsenergy.unit = ""; 248 | rmsenergy.hasFixedBinCount = true; 249 | rmsenergy.binCount = 1; 250 | rmsenergy.hasKnownExtents = false; 251 | rmsenergy.isQuantized = false; 252 | rmsenergy.sampleType = OutputDescriptor::OneSamplePerStep; 253 | rmsenergy.hasDuration = false; 254 | list.push_back(rmsenergy); 255 | 256 | OutputDescriptor rmsdelta; 257 | rmsdelta.identifier = "rmsdelta"; 258 | rmsdelta.name = "RMS Energy Delta"; 259 | rmsdelta.description = "Difference between RMS of previous and current blocks."; 260 | rmsdelta.unit = ""; 261 | rmsdelta.hasFixedBinCount = true; 262 | rmsdelta.binCount = 1; 263 | rmsdelta.hasKnownExtents = false; 264 | rmsdelta.isQuantized = false; 265 | rmsdelta.sampleType = OutputDescriptor::OneSamplePerStep; 266 | rmsdelta.hasDuration = false; 267 | 
list.push_back(rmsdelta); 268 | 269 | OutputDescriptor lowenergy; 270 | lowenergy.identifier = "lowenergy"; 271 | lowenergy.name = "Low Energy"; 272 | lowenergy.description = "Percentage of track which is below the low energy threshold."; 273 | lowenergy.unit = ""; 274 | lowenergy.hasFixedBinCount = true; 275 | lowenergy.binCount = 1; 276 | lowenergy.hasKnownExtents = false; 277 | lowenergy.isQuantized = false; 278 | lowenergy.sampleType = OutputDescriptor::VariableSampleRate; 279 | lowenergy.sampleRate = 0; 280 | lowenergy.hasDuration = false; 281 | list.push_back(lowenergy); 282 | 283 | OutputDescriptor average; 284 | average.identifier = "average"; 285 | average.name = "Moving Average"; 286 | average.description = "Mean of RMS values over moving average window."; 287 | average.unit = ""; 288 | average.hasFixedBinCount = true; 289 | average.binCount = 1; 290 | average.hasKnownExtents = false; 291 | average.isQuantized = false; 292 | average.sampleType = OutputDescriptor::FixedSampleRate; 293 | average.sampleRate = (float)sampleRate/(float)m_stepSize; 294 | average.hasDuration = false; 295 | list.push_back(average); 296 | 297 | OutputDescriptor pdip; 298 | pdip.identifier = "pdip"; 299 | pdip.name = "Dip probability"; 300 | pdip.description = "Probability of the RMS energy dipping below the threshold."; 301 | pdip.unit = ""; 302 | pdip.hasFixedBinCount = true; 303 | pdip.binCount = 1; 304 | pdip.hasKnownExtents = false; 305 | pdip.isQuantized = false; 306 | pdip.sampleType = OutputDescriptor::FixedSampleRate; 307 | pdip.sampleRate = (float)sampleRate/(float)m_stepSize; 308 | pdip.hasDuration = false; 309 | list.push_back(pdip); 310 | 311 | return list; 312 | } 313 | 314 | bool 315 | Energy::initialise(size_t channels, size_t stepSize, size_t blockSize) 316 | { 317 | if (channels < getMinChannelCount() || 318 | channels > getMaxChannelCount()) return false; 319 | 320 | m_blockSize = blockSize; 321 | m_stepSize = stepSize; 322 | reset(); 323 | 324 | return true; 325 
| } 326 | 327 | void 328 | Energy::reset() 329 | { 330 | rmsEnergy.clear(); 331 | prevRMS=0; 332 | } 333 | 334 | Energy::FeatureSet 335 | Energy::process(const float *const *inputBuffers, Vamp::RealTime timestamp) 336 | { 337 | FeatureSet output; 338 | Feature fRMS, fDelta; 339 | float totalEnergy = 0.f; 340 | float rms; 341 | 342 | // find total energy for frame 343 | for (int i=0; i rmsAvg; 372 | float total = 0.f, average = 0.f; 373 | float lowEnergy = 0.f, highEnergy = 0.f; 374 | 375 | // set window size 376 | float avgWindowSize = avgWindowLength*sampleRate/(float)m_blockSize; 377 | int avgWindowOffsetL = (int)floor(avgWindowSize/2.0); 378 | int avgWindowOffsetR = (int)ceil(avgWindowSize/2.0); 379 | 380 | for (unsigned i=0; i=rmsEnergy.size()) end = rmsEnergy.size()-1; 390 | 391 | // copy window 392 | vector::const_iterator first = rmsEnergy.begin() + start; 393 | vector::const_iterator last = rmsEnergy.begin() + end + 1; 394 | vector window(first, last); 395 | 396 | // sort window 397 | std::sort(window.begin(), window.end()); 398 | 399 | // find Xth percentile of window 400 | int pos = (int)((float)(window.size()-1) / 100.0 * avgPercentile); 401 | rmsAvg.push_back(window[pos]); 402 | 403 | // return moving average 404 | Feature fAvg; 405 | fAvg.values.push_back(rmsAvg[i]); 406 | output[3].push_back(fAvg); 407 | } 408 | 409 | // find mean of all RMS values 410 | if (rmsEnergy.size() != 0) 411 | average = total / (float)rmsEnergy.size(); 412 | 413 | // find threshold value 414 | float threshLowEnergy = average * threshRatio; 415 | 416 | for (unsigned i=0; i=rmsEnergy.size()) end = rmsEnergy.size()-1; 429 | 430 | // count dips below moving average * dipThresh 431 | float dipCount = 0; 432 | float threshDip = rmsAvg[i]*dipThresh; 433 | for (unsigned int j=start; j<=end; j++) 434 | { 435 | if (rmsEnergy[j] < threshDip) 436 | dipCount++; 437 | } 438 | 439 | // return dip probability 440 | Feature fProb; 441 | fProb.values.push_back(dipCount/(float)(end-start)); 
442 | output[4].push_back(fProb); 443 | } 444 | 445 | // calculate low energy ratio 446 | float lowEnergyRatio = 0.f; 447 | if (lowEnergy + highEnergy != 0) 448 | lowEnergyRatio = (100.f * lowEnergy) / (lowEnergy + highEnergy); 449 | 450 | // return low energy 451 | Feature fLowEnergy; 452 | fLowEnergy.hasTimestamp = true; 453 | fLowEnergy.timestamp = Vamp::RealTime::fromSeconds(0); 454 | fLowEnergy.values.push_back(lowEnergyRatio); 455 | output[2].push_back(fLowEnergy); 456 | 457 | return output; 458 | } 459 | 460 | /// @endcond 461 | -------------------------------------------------------------------------------- /src/Energy.h: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #ifndef _ENERGY_H_ 19 | #define _ENERGY_H_ 20 | 21 | #include 22 | #include 23 | #include 24 | #include 25 | 26 | using std::string; 27 | using std::vector; 28 | 29 | /*! 30 | * \brief Calculates the RMS energy and related features 31 | * 32 | * \section Outputs 33 | * \par RMS energy 34 | * The root mean square energy of the signal. 35 | * \par RMS energy delta 36 | * The difference between the RMS energy of the current frame and the last. 37 | * \par Moving average 38 | * The Xth percentile of the RMS energy within a window. 
39 |  * \par Dip probability
40 |  * The ratio of frames that have dipped below the dip threshold within the
41 |  * averaging window, where the threshold is a multiple of the moving average.
42 |  * \par Low energy ratio
43 |  * Percentage of frames in the file whose energy falls below a threshold, which
44 |  * is a multiple of the overall mean energy.
45 |  *
46 |  * \section Parameters
47 |  * \par Use root
48 |  * Whether to apply the square root in RMS calculation. (default = 1)
49 |  * \par Moving average window size
50 |  * The size of the averaging window, in seconds. (default = 1.0)
51 |  * \par Moving average percentile
52 |  * The percentile used to calculate the average. (default = 3.0)
53 |  * \par Dip threshold
54 |  * The threshold for calculating the dip, which is multiplied by the moving
55 |  * average. (default = 3.0)
56 |  * \par Low energy threshold
57 |  * The threshold for calculating low energy, which is multiplied by the overall
58 |  * mean RMS energy. (default = 1.0)
59 |  *
60 |  * \section Description
61 |  *
62 |  * RMS energy for each block of \f$n\f$ samples is calculated as follows. The square root
63 |  * can be removed using the 'Use root' parameter (default = true).
64 |  * \f[ RMS = \sqrt{\frac{1}{n}\displaystyle\sum\limits_{i=0}^{n-1} x_i^2} \f]
65 |  *
66 |  * The dip probability is a simple but effective speech/music
67 |  * discriminator. It is defined as the ratio of frames in a moving window which
68 |  * fall below a threshold, where the threshold is a multiple of the moving
69 |  * average.
70 |  *
71 |  * The low energy ratio is the percentage of blocks which fall below
72 |  * a certain RMS energy threshold. The threshold is set using the 'Low energy
73 |  * threshold' parameter, which is a multiple of the overall mean RMS energy (default = 1).
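A minimal sketch of the RMS and low energy ratio calculations described above, assuming each block is held in a std::vector<float>. The helper names blockRms and lowEnergyPercent are hypothetical and are not part of this plugin; dividing by the block length follows the usual definition of RMS rather than anything confirmed by the listing.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: RMS of one block. With useRoot == false the
// mean of the squared samples is returned instead.
float blockRms(const std::vector<float>& block, bool useRoot = true)
{
    if (block.empty()) return 0.f;
    float total = 0.f;
    for (std::size_t i = 0; i < block.size(); ++i)
        total += block[i] * block[i];
    const float mean = total / static_cast<float>(block.size());
    return useRoot ? std::sqrt(mean) : mean;
}

// Hypothetical helper: percentage of blocks whose RMS lies below
// threshRatio times the mean RMS of all blocks.
float lowEnergyPercent(const std::vector<float>& rms, float threshRatio = 1.f)
{
    if (rms.empty()) return 0.f;
    float total = 0.f;
    for (std::size_t i = 0; i < rms.size(); ++i)
        total += rms[i];
    const float thresh = threshRatio * total / static_cast<float>(rms.size());
    std::size_t low = 0;
    for (std::size_t i = 0; i < rms.size(); ++i)
        if (rms[i] < thresh) ++low;
    return 100.f * static_cast<float>(low) / static_cast<float>(rms.size());
}
```

In this sketch a caller would collect one blockRms value per block and pass the whole collection to lowEnergyPercent once the input has ended, which mirrors the way the low energy output is only reported for the track as a whole.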
74 |  */
75 | class Energy : public Vamp::Plugin
76 | {
77 | public:
78 |     /// @cond
79 |     Energy(float inputSampleRate);
80 |     virtual ~Energy();
81 |     string getIdentifier() const;
82 |     string getName() const;
83 |     string getDescription() const;
84 |     string getMaker() const;
85 |     int getPluginVersion() const;
86 |     string getCopyright() const;
87 |     InputDomain getInputDomain() const;
88 |     size_t getPreferredBlockSize() const;
89 |     size_t getPreferredStepSize() const;
90 |     size_t getMinChannelCount() const;
91 |     size_t getMaxChannelCount() const;
92 |     ParameterList getParameterDescriptors() const;
93 |     float getParameter(string identifier) const;
94 |     void setParameter(string identifier,
95 |                       float value);
96 |     ProgramList getPrograms() const;
97 |     string getCurrentProgram() const;
98 |     void selectProgram(string name);
99 |     OutputList getOutputDescriptors() const;
100 |     bool initialise(size_t channels,
101 |                     size_t stepSize,
102 |                     size_t blockSize);
103 |     void reset();
104 |     FeatureSet process(const float *const *inputBuffers,
105 |                        Vamp::RealTime timestamp);
106 |     FeatureSet getRemainingFeatures();
107 |     /// @endcond
108 |
109 | protected:
110 |     /// @cond
111 |     int m_blockSize, m_stepSize;
112 |     /// @endcond
113 |
114 |     float sampleRate; /*!< Variable to store input sample rate, used for calculating window sizes */
115 |     bool useRoot; /*!< Flag to indicate whether to find root of mean energy */
116 |     float threshRatio; /*!< Ratio of threshold to average energy */
117 |     vector<float> rmsEnergy; /*!< Variable to store RMS values from previous blocks in order to calculate mean */
118 |     float prevRMS; /*!< Variable to store RMS value of previous block */
119 |     float avgWindowLength; /*!< Length of window to use for averaging, in seconds */
120 |     float avgPercentile; /*!< Percentile to calculate as average. */
121 |     float dipThresh; /*!< Threshold to use for calculating dips, as a multiple of the moving average.
*/ 122 | }; 123 | 124 | 125 | 126 | #endif 127 | -------------------------------------------------------------------------------- /src/Intensity.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #include "Intensity.h" 19 | /// @cond 20 | 21 | Intensity::Intensity(float inputSampleRate):Plugin(inputSampleRate) 22 | { 23 | m_sampleRate = inputSampleRate; 24 | numBands = 7; 25 | bandHighFreq = NULL; 26 | calculateBandFreqs(); 27 | } 28 | 29 | Intensity::~Intensity() 30 | { 31 | delete [] bandHighFreq; 32 | } 33 | 34 | string 35 | Intensity::getIdentifier() const 36 | { 37 | return "bbc-intensity"; 38 | } 39 | 40 | string 41 | Intensity::getName() const 42 | { 43 | return "Intensity"; 44 | } 45 | 46 | string 47 | Intensity::getDescription() const 48 | { 49 | return ""; 50 | } 51 | 52 | string 53 | Intensity::getMaker() const 54 | { 55 | return "BBC"; 56 | } 57 | 58 | int 59 | Intensity::getPluginVersion() const 60 | { 61 | return 1; 62 | } 63 | 64 | string 65 | Intensity::getCopyright() const 66 | { 67 | return "(c) 2013 British Broadcasting Corporation"; 68 | } 69 | 70 | Intensity::InputDomain 71 | Intensity::getInputDomain() const 72 | { 73 | return FrequencyDomain; 74 | } 75 | 76 | size_t 77 | 
Intensity::getPreferredBlockSize() const 78 | { 79 | return 1024; 80 | } 81 | 82 | size_t 83 | Intensity::getPreferredStepSize() const 84 | { 85 | return 1024; 86 | } 87 | 88 | size_t 89 | Intensity::getMinChannelCount() const 90 | { 91 | return 1; 92 | } 93 | 94 | size_t 95 | Intensity::getMaxChannelCount() const 96 | { 97 | return 1; 98 | } 99 | 100 | Intensity::ParameterList 101 | Intensity::getParameterDescriptors() const 102 | { 103 | ParameterList list; 104 | 105 | ParameterDescriptor numBandsParam; 106 | numBandsParam.identifier = "numBands"; 107 | numBandsParam.name = "Sub-bands"; 108 | numBandsParam.description = "Number of sub-bands."; 109 | numBandsParam.unit = ""; 110 | numBandsParam.minValue = 2; 111 | numBandsParam.maxValue = 50; 112 | numBandsParam.defaultValue = 7; 113 | numBandsParam.isQuantized = true; 114 | numBandsParam.quantizeStep = 1.0; 115 | list.push_back(numBandsParam); 116 | 117 | return list; 118 | } 119 | 120 | float 121 | Intensity::getParameter(string identifier) const 122 | { 123 | if (identifier == "numBands") 124 | return numBands; 125 | return 0; 126 | } 127 | 128 | void 129 | Intensity::setParameter(string identifier, float value) 130 | { 131 | if (identifier == "numBands") { 132 | numBands = value; 133 | calculateBandFreqs(); 134 | } 135 | } 136 | 137 | Intensity::ProgramList 138 | Intensity::getPrograms() const 139 | { 140 | ProgramList list; 141 | 142 | return list; 143 | } 144 | 145 | string 146 | Intensity::getCurrentProgram() const 147 | { 148 | return ""; 149 | } 150 | 151 | void 152 | Intensity::selectProgram(string name) 153 | { 154 | } 155 | 156 | Intensity::OutputList 157 | Intensity::getOutputDescriptors() const 158 | { 159 | OutputList list; 160 | 161 | OutputDescriptor intensity; 162 | intensity.identifier = "intensity"; 163 | intensity.name = "Intensity"; 164 | intensity.description = "Sum of the FFT bin absolute values."; 165 | intensity.unit = ""; 166 | intensity.hasFixedBinCount = true; 167 | intensity.binCount 
= 1;
168 |     intensity.hasKnownExtents = false;
169 |     intensity.isQuantized = false;
170 |     intensity.sampleType = OutputDescriptor::OneSamplePerStep;
171 |     intensity.hasDuration = false;
172 |     list.push_back(intensity);
173 |
174 |     OutputDescriptor intensityRatio;
175 |     intensityRatio.identifier = "intensity-ratio";
176 |     intensityRatio.name = "Intensity Ratio";
177 |     intensityRatio.description = "Sum of each sub-band's absolute values.";
178 |     intensityRatio.unit = "";
179 |     intensityRatio.hasFixedBinCount = true;
180 |     intensityRatio.binCount = numBands;
181 |     intensityRatio.hasKnownExtents = false;
182 |     intensityRatio.isQuantized = false;
183 |     intensityRatio.sampleType = OutputDescriptor::OneSamplePerStep;
184 |     intensityRatio.hasDuration = false;
185 |     list.push_back(intensityRatio);
186 |
187 |     return list;
188 | }
189 |
190 | bool
191 | Intensity::initialise(size_t channels, size_t stepSize, size_t blockSize)
192 | {
193 |     if (channels < getMinChannelCount() ||
194 |         channels > getMaxChannelCount()) return false;
195 |
196 |     m_blockSize = blockSize;
197 |     m_stepSize = stepSize;
198 |     reset();
199 |
200 |     return true;
201 | }
202 |
203 | void
204 | Intensity::reset()
205 | {
206 | }
207 |
208 | Intensity::FeatureSet
209 | Intensity::process(const float *const *inputBuffers, Vamp::RealTime timestamp)
210 | {
211 |     FeatureSet output;
212 |     float total = 0;
213 |     int currentBand = 0;
214 |     float *bandTotal = new float[numBands];
215 |
216 |     // set band totals to zero
217 |     for (int i=0; i(inputBuffers[0][i*2], inputBuffers[0][i*2+1]));
225 |
226 |         // add contents of this bin to total
227 |         total += binVal;
228 |
229 |         // find centre frequency of this bin
230 |         float freq = (i+1)*m_sampleRate / (float)m_blockSize;
231 |
232 |         // locate which band this bin belongs in
233 |         while (freq > bandHighFreq[currentBand]) {
234 |             currentBand++;
235 |             if (currentBand >= numBands) break;
236 |         }
237 |
238 |         // add bin value to relevant band
239 |         bandTotal[currentBand] +=
binVal; 240 | } 241 | 242 | // send intensity outputs 243 | Feature intensity; 244 | intensity.values.push_back(total); 245 | output[0].push_back(intensity); 246 | 247 | // send intensity ratio outputs 248 | Feature intensityRatio; 249 | for (int i=0; i 22 | #include 23 | #include 24 | #include 25 | 26 | using std::string; 27 | using std::vector; 28 | using std::abs; 29 | using std::complex; 30 | 31 | /*! 32 | * \brief Calculates the intensity of a signal and the intensity ratio for a number of sub-bands 33 | * 34 | * \section Outputs 35 | * \par Intensity 36 | * The sum of the magnitude of the FFT bins. 37 | * \par Intensity ratio 38 | * The ratio between the intensity of each sub-band to the overall intensity. 39 | * 40 | * \section Parameters 41 | * \par Sub-bands 42 | * The number of sub-bands to use. (default = 7) 43 | * 44 | * \section Description 45 | * 46 | * The intensity features are based on those published in [1], section 3A. 47 | * 48 | * Firstly the signal is divided into \f$i\f$ sub-bands with the following frequency ranges. 49 | * \f[ \left(0,\frac{F_s}{2^i}\right) , \left(\frac{F_s}{2^i},\frac{F_s}{2^{i-1}}\right) 50 | * , \ldots \left(\frac{F_s}{2^2},\frac{F_s}{2^1}\right) \f] 51 | * 52 | * The intensity of each frame \f$n\f$ is calculated by summing the magnitude \f$A\f$ of each 53 | * frequency bin \f$k\f$. 54 | * \f[ I(n) = \displaystyle\sum\limits_{k=0}^{F_s/2} A(n,k) \f] 55 | * 56 | * For each sub-band \f$i\f$ with a frequency range from \f$L_i\f$ to \f$H_i\f$, the intensity 57 | * ratio is the ratio of that sub-band's intensity to the overall intensity. 58 | * \f[ D_i(n) = \frac{1}{I(n)} \displaystyle\sum\limits_{k=L_i}^{H_i} A(n,k) \f] 59 | * 60 | * [1] Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic Mood Detection and Tracking of Music 61 | * Audio Signals. IEEE Transactions on Audio, Speech and Language Processing (Vol. 14, pp. 5-18). 
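A rough sketch of the sub-band edges and intensity ratio described above, assuming the spectrum of one frame is available as complex FFT bins. The helper names bandUpperEdges and intensityRatios are hypothetical and are not part of this plugin, and bin k is assumed to sit at k*Fs/blockSize, which may differ from the bin-to-frequency mapping used in Intensity.cpp.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: upper edge of each of numBands octave-spaced
// sub-bands, i.e. Fs/2^numBands, ..., Fs/4, Fs/2.
std::vector<float> bandUpperEdges(float sampleRate, int numBands)
{
    std::vector<float> edges(numBands);
    for (int b = 0; b < numBands; ++b)
        edges[b] = sampleRate / std::pow(2.f, static_cast<float>(numBands - b));
    return edges;
}

// Hypothetical helper: returns the intensity I(n) of one frame and fills
// ratios with D_i(n), the share of each sub-band in that intensity.
float intensityRatios(const std::vector<std::complex<float> >& spectrum,
                      float sampleRate, std::size_t blockSize,
                      const std::vector<float>& edges,
                      std::vector<float>& ratios)
{
    ratios.assign(edges.size(), 0.f);
    if (edges.empty()) return 0.f;
    float total = 0.f;
    std::size_t band = 0;
    for (std::size_t k = 0; k < spectrum.size(); ++k) {
        const float mag = std::abs(spectrum[k]);
        const float freq = static_cast<float>(k) * sampleRate
                           / static_cast<float>(blockSize);
        // move to the next band once the bin frequency passes the edge,
        // but never step past the last band
        while (band + 1 < edges.size() && freq > edges[band]) ++band;
        ratios[band] += mag;
        total += mag;
    }
    if (total > 0.f)
        for (std::size_t b = 0; b < ratios.size(); ++b) ratios[b] /= total;
    return total;
}
```

Clamping the band index to the last band is a choice made for this sketch so that a bin above the final edge cannot index past the end of the arrays.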
62 | */ 63 | class Intensity : public Vamp::Plugin 64 | { 65 | public: 66 | /// @cond 67 | Intensity(float inputSampleRate); 68 | virtual ~Intensity(); 69 | string getIdentifier() const; 70 | string getName() const; 71 | string getDescription() const; 72 | string getMaker() const; 73 | int getPluginVersion() const; 74 | string getCopyright() const; 75 | InputDomain getInputDomain() const; 76 | size_t getPreferredBlockSize() const; 77 | size_t getPreferredStepSize() const; 78 | size_t getMinChannelCount() const; 79 | size_t getMaxChannelCount() const; 80 | ParameterList getParameterDescriptors() const; 81 | float getParameter(string identifier) const; 82 | void setParameter(string identifier, 83 | float value); 84 | ProgramList getPrograms() const; 85 | string getCurrentProgram() const; 86 | void selectProgram(string name); 87 | OutputList getOutputDescriptors() const; 88 | bool initialise(size_t channels, 89 | size_t stepSize, 90 | size_t blockSize); 91 | void reset(); 92 | FeatureSet process(const float *const *inputBuffers, 93 | Vamp::RealTime timestamp); 94 | FeatureSet getRemainingFeatures(); 95 | /// @endcond 96 | 97 | protected: 98 | void calculateBandFreqs(); 99 | 100 | /// @cond 101 | int m_blockSize, m_stepSize; 102 | float m_sampleRate; 103 | /// @endcond 104 | 105 | int numBands; /*!< Number of sub-bands to use */ 106 | float *bandHighFreq; /*!< Upper frequency range of each sub-band */ 107 | }; 108 | 109 | #endif 110 | -------------------------------------------------------------------------------- /src/Peaks.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2014 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #include "Peaks.h" 19 | /// @cond 20 | 21 | Peaks::Peaks(float inputSampleRate):Plugin(inputSampleRate) 22 | { 23 | } 24 | 25 | Peaks::~Peaks() 26 | { 27 | } 28 | 29 | string 30 | Peaks::getIdentifier() const 31 | { 32 | return "bbc-peaks"; 33 | } 34 | 35 | string 36 | Peaks::getName() const 37 | { 38 | return "Peaks"; 39 | } 40 | 41 | string 42 | Peaks::getDescription() const 43 | { 44 | return ""; 45 | } 46 | 47 | string 48 | Peaks::getMaker() const 49 | { 50 | return "BBC"; 51 | } 52 | 53 | int 54 | Peaks::getPluginVersion() const 55 | { 56 | return 1; 57 | } 58 | 59 | string 60 | Peaks::getCopyright() const 61 | { 62 | return "(c) 2014 British Broadcasting Corporation"; 63 | } 64 | 65 | Peaks::InputDomain 66 | Peaks::getInputDomain() const 67 | { 68 | return TimeDomain; 69 | } 70 | 71 | size_t 72 | Peaks::getPreferredBlockSize() const 73 | { 74 | return 256; 75 | } 76 | 77 | size_t 78 | Peaks::getPreferredStepSize() const 79 | { 80 | return 256; 81 | } 82 | 83 | size_t 84 | Peaks::getMinChannelCount() const 85 | { 86 | return 1; 87 | } 88 | 89 | size_t 90 | Peaks::getMaxChannelCount() const 91 | { 92 | return 1; 93 | } 94 | 95 | Peaks::ParameterList 96 | Peaks::getParameterDescriptors() const 97 | { 98 | ParameterList list; 99 | return list; 100 | } 101 | 102 | float 103 | Peaks::getParameter(string identifier) const 104 | { 105 | return 0; 106 | } 107 | 108 | void 109 | Peaks::setParameter(string identifier, float value) 110 | { 111 | } 112 | 113 | Peaks::ProgramList 114 | Peaks::getPrograms() const 115 | 
{ 116 | ProgramList list; 117 | return list; 118 | } 119 | 120 | string 121 | Peaks::getCurrentProgram() const 122 | { 123 | return ""; 124 | } 125 | 126 | void 127 | Peaks::selectProgram(string name) 128 | { 129 | } 130 | 131 | Peaks::OutputList 132 | Peaks::getOutputDescriptors() const 133 | { 134 | OutputList list; 135 | 136 | OutputDescriptor peaks; 137 | peaks.identifier = "peaks"; 138 | peaks.name = "Peaks"; 139 | peaks.description = "Peak and trough, in order of occurrence."; 140 | peaks.unit = ""; 141 | peaks.hasFixedBinCount = true; 142 | peaks.binCount = 1; 143 | peaks.hasKnownExtents = false; 144 | peaks.isQuantized = false; 145 | peaks.sampleType = OutputDescriptor::OneSamplePerStep; 146 | peaks.hasDuration = false; 147 | list.push_back(peaks); 148 | 149 | return list; 150 | } 151 | 152 | bool 153 | Peaks::initialise(size_t channels, size_t stepSize, size_t blockSize) 154 | { 155 | if (channels < getMinChannelCount() || 156 | channels > getMaxChannelCount()) return false; 157 | 158 | m_blockSize = blockSize; 159 | m_stepSize = stepSize; 160 | reset(); 161 | 162 | return true; 163 | } 164 | 165 | void 166 | Peaks::reset() 167 | { 168 | } 169 | 170 | Peaks::FeatureSet 171 | Peaks::process(const float *const *inputBuffers, Vamp::RealTime timestamp) 172 | { 173 | float min=1.f; 174 | int minPoint=0; 175 | float max=-1.f; 176 | int maxPoint=0; 177 | for (int i=0; i<m_blockSize; i++) 178 | { 179 | if (inputBuffers[0][i] < min) 180 | { 181 | min=inputBuffers[0][i]; 182 | minPoint=i; 183 | } 184 | if (inputBuffers[0][i] > max) 185 | { 186 | max=inputBuffers[0][i]; 187 | maxPoint=i; 188 | } 189 | } 190 | 191 | FeatureSet output; 192 | Feature f; 193 | if (minPoint 22 | #include 23 | #include 24 | 25 | using std::string; 26 | using std::vector; 27 | 28 | class Peaks : public Vamp::Plugin 29 | { 30 | public: 31 | /// @cond 32 | Peaks(float inputSampleRate); 33 | virtual ~Peaks(); 34 | string getIdentifier() const; 35 | string getName() const; 36 | string getDescription() const; 37 | string getMaker() const; 38 | int getPluginVersion() const; 39 | string getCopyright() const; 40 | InputDomain getInputDomain() const; 41 
| size_t getPreferredBlockSize() const; 42 | size_t getPreferredStepSize() const; 43 | size_t getMinChannelCount() const; 44 | size_t getMaxChannelCount() const; 45 | ParameterList getParameterDescriptors() const; 46 | float getParameter(string identifier) const; 47 | void setParameter(string identifier, 48 | float value); 49 | ProgramList getPrograms() const; 50 | string getCurrentProgram() const; 51 | void selectProgram(string name); 52 | OutputList getOutputDescriptors() const; 53 | bool initialise(size_t channels, 54 | size_t stepSize, 55 | size_t blockSize); 56 | void reset(); 57 | FeatureSet process(const float *const *inputBuffers, 58 | Vamp::RealTime timestamp); 59 | FeatureSet getRemainingFeatures(); 60 | /// @endcond 61 | 62 | protected: 63 | /// @cond 64 | int m_blockSize, m_stepSize; 65 | /// @endcond 66 | }; 67 | 68 | 69 | 70 | #endif 71 | -------------------------------------------------------------------------------- /src/Rhythm.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 
17 | */ 18 | #include "Rhythm.h" 19 | /// @cond 20 | 21 | Rhythm::Rhythm(float inputSampleRate) 22 | : Plugin(inputSampleRate) { 23 | m_sampleRate = inputSampleRate; 24 | numBands = 7; 25 | bandHighFreq = NULL; 26 | calculateBandFreqs(); 27 | 28 | // calculate and save half-hanning window 29 | halfHannLength = 12; 30 | halfHannWindow = new float[halfHannLength]; 31 | for (int i = 0; i < halfHannLength; i++) 32 | halfHannWindow[i] = halfHanning((float) i); 33 | 34 | // calculate and save canny window 35 | cannyLength = 12; 36 | cannyShape = 4.f; 37 | cannyWindow = new float[cannyLength * 2 + 1]; 38 | for (int i = cannyLength * -1; i < cannyLength + 1; i++) 39 | cannyWindow[i + cannyLength] = canny((float) i); 40 | 41 | // set up parameters 42 | threshold = 1; 43 | average_window = 200; 44 | peak_window = 6; 45 | max_bpm = 300; 46 | min_bpm = 12; 47 | } 48 | 49 | Rhythm::~Rhythm() { 50 | delete[] halfHannWindow; 51 | delete[] cannyWindow; 52 | delete[] bandHighFreq; 53 | } 54 | 55 | string Rhythm::getIdentifier() const { 56 | return "bbc-rhythm"; 57 | } 58 | 59 | string Rhythm::getName() const { 60 | return "Rhythm"; 61 | } 62 | 63 | string Rhythm::getDescription() const { 64 | return ""; 65 | } 66 | 67 | string Rhythm::getMaker() const { 68 | return "BBC"; 69 | } 70 | 71 | int Rhythm::getPluginVersion() const { 72 | return 1; 73 | } 74 | 75 | string Rhythm::getCopyright() const { 76 | return "(c) 2013 British Broadcasting Corporation"; 77 | } 78 | 79 | Rhythm::InputDomain Rhythm::getInputDomain() const { 80 | return FrequencyDomain; 81 | } 82 | 83 | size_t Rhythm::getPreferredBlockSize() const { 84 | return 1024; 85 | } 86 | 87 | size_t Rhythm::getPreferredStepSize() const { 88 | return 256; 89 | } 90 | 91 | size_t Rhythm::getMinChannelCount() const { 92 | return 1; 93 | } 94 | 95 | size_t Rhythm::getMaxChannelCount() const { 96 | return 1; 97 | } 98 | 99 | Rhythm::ParameterList Rhythm::getParameterDescriptors() const { 100 | ParameterList list; 101 | 102 | 
ParameterDescriptor numBandsParam; 103 | numBandsParam.identifier = "numBands"; 104 | numBandsParam.name = "Sub-bands"; 105 | numBandsParam.description = "Number of sub-bands."; 106 | numBandsParam.unit = ""; 107 | numBandsParam.minValue = 2; 108 | numBandsParam.maxValue = 50; 109 | numBandsParam.defaultValue = 7; 110 | numBandsParam.isQuantized = true; 111 | numBandsParam.quantizeStep = 1.0; 112 | list.push_back(numBandsParam); 113 | 114 | ParameterDescriptor thresholdParam; 115 | thresholdParam.identifier = "threshold"; 116 | thresholdParam.name = "Threshold"; 117 | thresholdParam.description = "For peak picker."; 118 | thresholdParam.unit = ""; 119 | thresholdParam.minValue = 0; 120 | thresholdParam.maxValue = 10; 121 | thresholdParam.defaultValue = 1; 122 | thresholdParam.isQuantized = false; 123 | list.push_back(thresholdParam); 124 | 125 | ParameterDescriptor average_windowParam; 126 | average_windowParam.identifier = "average_window"; 127 | average_windowParam.name = "Moving average window length"; 128 | average_windowParam.description = "Length of window used for moving average."; 129 | average_windowParam.unit = "frames"; 130 | average_windowParam.minValue = 1; 131 | average_windowParam.maxValue = 500; 132 | average_windowParam.defaultValue = 200; 133 | average_windowParam.isQuantized = true; 134 | average_windowParam.quantizeStep = 1.0; 135 | list.push_back(average_windowParam); 136 | 137 | ParameterDescriptor peak_windowParam; 138 | peak_windowParam.identifier = "peak_window"; 139 | peak_windowParam.name = "Onset peak window length"; 140 | peak_windowParam.description = "Length of window used for peak picking."; 141 | peak_windowParam.unit = "frames"; 142 | peak_windowParam.minValue = 1; 143 | peak_windowParam.maxValue = 20; 144 | peak_windowParam.defaultValue = 6; 145 | peak_windowParam.isQuantized = true; 146 | peak_windowParam.quantizeStep = 1.0; 147 | list.push_back(peak_windowParam); 148 | 149 | ParameterDescriptor min_bpmParam; 150 | 
min_bpmParam.identifier = "min_bpm"; 151 | min_bpmParam.name = "Minimum BPM"; 152 | min_bpmParam.description = "Minimum BPM calculated for autocorrelation."; 153 | min_bpmParam.unit = "bpm"; 154 | min_bpmParam.minValue = 5; 155 | min_bpmParam.maxValue = 300; 156 | min_bpmParam.defaultValue = 12; 157 | min_bpmParam.isQuantized = true; 158 | min_bpmParam.quantizeStep = 1.0; 159 | list.push_back(min_bpmParam); 160 | 161 | ParameterDescriptor max_bpmParam; 162 | max_bpmParam.identifier = "max_bpm"; 163 | max_bpmParam.name = "Maximum BPM"; 164 | max_bpmParam.description = "Maximum BPM calculated for autocorrelation."; 165 | max_bpmParam.unit = "bpm"; 166 | max_bpmParam.minValue = 50; 167 | max_bpmParam.maxValue = 400; 168 | max_bpmParam.defaultValue = 300; 169 | max_bpmParam.isQuantized = true; 170 | max_bpmParam.quantizeStep = 1.0; 171 | list.push_back(max_bpmParam); 172 | 173 | return list; 174 | } 175 | 176 | float Rhythm::getParameter(string identifier) const { 177 | if (identifier == "numBands") 178 | return numBands; 179 | else if (identifier == "threshold") 180 | return threshold; 181 | else if (identifier == "average_window") 182 | return average_window; 183 | else if (identifier == "peak_window") 184 | return peak_window; 185 | else if (identifier == "min_bpm") 186 | return min_bpm; 187 | else if (identifier == "max_bpm") 188 | return max_bpm; 189 | return 0; 190 | } 191 | 192 | void Rhythm::setParameter(string identifier, float value) { 193 | if (identifier == "numBands") { 194 | numBands = value; 195 | calculateBandFreqs(); 196 | } else if (identifier == "threshold") { 197 | threshold = value; 198 | } else if (identifier == "average_window") { 199 | average_window = (int) value; 200 | } else if (identifier == "peak_window") { 201 | peak_window = (int) value; 202 | } else if (identifier == "min_bpm") { 203 | min_bpm = (int) value; 204 | } else if (identifier == "max_bpm") { 205 | max_bpm = (int) value; 206 | } 207 | } 208 | 209 | Rhythm::ProgramList 
Rhythm::getPrograms() const { 210 | ProgramList list; 211 | 212 | return list; 213 | } 214 | 215 | string Rhythm::getCurrentProgram() const { 216 | return ""; 217 | } 218 | 219 | void Rhythm::selectProgram(string name) { 220 | } 221 | 222 | Rhythm::OutputList Rhythm::getOutputDescriptors() const { 223 | OutputList list; 224 | 225 | OutputDescriptor onset_curve; 226 | onset_curve.identifier = "onset_curve"; 227 | onset_curve.name = "Onset curve"; 228 | onset_curve.description = "Onset detection curve."; 229 | onset_curve.unit = ""; 230 | onset_curve.hasFixedBinCount = true; 231 | onset_curve.binCount = 1; 232 | onset_curve.hasKnownExtents = false; 233 | onset_curve.isQuantized = false; 234 | onset_curve.sampleType = OutputDescriptor::VariableSampleRate; 235 | onset_curve.sampleRate = 0; 236 | onset_curve.hasDuration = false; 237 | list.push_back(onset_curve); 238 | 239 | OutputDescriptor average; 240 | average.identifier = "average"; 241 | average.name = "Average"; 242 | average.description = "Moving average of onset curve."; 243 | average.unit = ""; 244 | average.hasFixedBinCount = true; 245 | average.binCount = 1; 246 | average.hasKnownExtents = false; 247 | average.isQuantized = false; 248 | average.sampleType = OutputDescriptor::VariableSampleRate; 249 | average.sampleRate = 0; 250 | average.hasDuration = false; 251 | list.push_back(average); 252 | 253 | OutputDescriptor diff; 254 | diff.identifier = "diff"; 255 | diff.name = "Difference"; 256 | diff.description = "Difference between onset and average."; 257 | diff.unit = ""; 258 | diff.hasFixedBinCount = true; 259 | diff.binCount = 1; 260 | diff.hasKnownExtents = false; 261 | diff.isQuantized = false; 262 | diff.sampleType = OutputDescriptor::VariableSampleRate; 263 | diff.sampleRate = 0; 264 | diff.hasDuration = false; 265 | list.push_back(diff); 266 | 267 | OutputDescriptor onset; 268 | onset.identifier = "onset"; 269 | onset.name = "Onset"; 270 | onset.description = "Point of onsets."; 271 | onset.unit = ""; 
272 | onset.hasFixedBinCount = true; 273 | onset.binCount = 0; 274 | onset.sampleType = OutputDescriptor::VariableSampleRate; 275 | onset.sampleRate = 0; 276 | list.push_back(onset); 277 | 278 | OutputDescriptor avg_onset_freq; 279 | avg_onset_freq.identifier = "avg-onset-freq"; 280 | avg_onset_freq.name = "Average Onset Frequency"; 281 | avg_onset_freq.description = "Rate of onsets per minute."; 282 | avg_onset_freq.unit = ""; 283 | avg_onset_freq.hasFixedBinCount = true; 284 | avg_onset_freq.binCount = 1; 285 | avg_onset_freq.sampleType = OutputDescriptor::VariableSampleRate; 286 | avg_onset_freq.sampleRate = 0; 287 | avg_onset_freq.hasKnownExtents = false; 288 | avg_onset_freq.isQuantized = false; 289 | avg_onset_freq.hasDuration = false; 290 | list.push_back(avg_onset_freq); 291 | 292 | OutputDescriptor rhythm_strength; 293 | rhythm_strength.identifier = "rhythm-strength"; 294 | rhythm_strength.name = "Rhythm Strength"; 295 | rhythm_strength.description = "Average value of peaks in onset curve."; 296 | rhythm_strength.unit = ""; 297 | rhythm_strength.hasFixedBinCount = true; 298 | rhythm_strength.binCount = 1; 299 | rhythm_strength.sampleType = OutputDescriptor::VariableSampleRate; 300 | rhythm_strength.sampleRate = 0; 301 | rhythm_strength.hasKnownExtents = false; 302 | rhythm_strength.isQuantized = false; 303 | rhythm_strength.hasDuration = false; 304 | list.push_back(rhythm_strength); 305 | 306 | OutputDescriptor autocor; 307 | autocor.identifier = "autocor"; 308 | autocor.name = "Autocorrelation"; 309 | autocor.description = "Autocorrelation of onset detection curve."; 310 | autocor.unit = ""; 311 | autocor.hasFixedBinCount = true; 312 | autocor.binCount = 1; 313 | autocor.hasKnownExtents = false; 314 | autocor.isQuantized = false; 315 | autocor.sampleType = OutputDescriptor::VariableSampleRate; 316 | autocor.sampleRate = 0; 317 | autocor.hasDuration = false; 318 | list.push_back(autocor); 319 | 320 | OutputDescriptor mean_correlation_peak; 321 | 
mean_correlation_peak.identifier = "mean-correlation-peak"; 322 | mean_correlation_peak.name = "Mean Correlation Peak"; 323 | mean_correlation_peak.description = 324 | "Mean of the peak autocorrelation values."; 325 | mean_correlation_peak.unit = ""; 326 | mean_correlation_peak.hasFixedBinCount = true; 327 | mean_correlation_peak.binCount = 1; 328 | mean_correlation_peak.hasKnownExtents = false; 329 | mean_correlation_peak.isQuantized = false; 330 | mean_correlation_peak.sampleType = OutputDescriptor::VariableSampleRate; 331 | mean_correlation_peak.sampleRate = 0; 332 | mean_correlation_peak.hasDuration = false; 333 | list.push_back(mean_correlation_peak); 334 | 335 | OutputDescriptor peak_valley_ratio; 336 | peak_valley_ratio.identifier = "peak-valley-ratio"; 337 | peak_valley_ratio.name = "Peak-Valley Ratio"; 338 | peak_valley_ratio.description = 339 | "Ratio of the mean correlation peak to the mean correlation valley."; 340 | peak_valley_ratio.unit = ""; 341 | peak_valley_ratio.hasFixedBinCount = true; 342 | peak_valley_ratio.binCount = 1; 343 | peak_valley_ratio.hasKnownExtents = false; 344 | peak_valley_ratio.isQuantized = false; 345 | peak_valley_ratio.sampleType = OutputDescriptor::VariableSampleRate; 346 | peak_valley_ratio.sampleRate = 0; 347 | peak_valley_ratio.hasDuration = false; 348 | list.push_back(peak_valley_ratio); 349 | 350 | OutputDescriptor tempo; 351 | tempo.identifier = "tempo"; 352 | tempo.name = "Tempo"; 353 | tempo.description = "Overall tempo of the track in BPM."; 354 | tempo.unit = "bpm"; 355 | tempo.hasFixedBinCount = true; 356 | tempo.binCount = 1; 357 | tempo.hasKnownExtents = false; 358 | tempo.isQuantized = false; 359 | tempo.sampleType = OutputDescriptor::VariableSampleRate; 360 | tempo.sampleRate = 0; 361 | tempo.hasDuration = false; 362 | list.push_back(tempo); 363 | 364 | return list; 365 | } 366 | 367 | bool Rhythm::initialise(size_t channels, size_t stepSize, size_t blockSize) { 368 | if (channels < getMinChannelCount() || 
channels > getMaxChannelCount()) 369 | return false; 370 | 371 | m_blockSize = blockSize; 372 | m_stepSize = stepSize; 373 | reset(); 374 | 375 | return true; 376 | } 377 | 378 | void Rhythm::reset() { 379 | intensity.clear(); 380 | } 381 | 382 | Rhythm::FeatureSet Rhythm::process(const float * const *inputBuffers, 383 | Vamp::RealTime timestamp) { 384 | FeatureSet output; 385 | float total = 0; 386 | int currentBand = 0; 387 | vector<float> bandTotal; 388 | 389 | // set band totals to zero 390 | for (int i = 0; i < numBands; i++) 391 | bandTotal.push_back(0.f); 392 | 393 | // for each frequency bin 394 | for (int i = 0; i < m_blockSize / 2; i++) { 395 | // get absolute value 396 | float binVal = abs( 397 | complex<float>(inputBuffers[0][i * 2], inputBuffers[0][i * 2 + 1])); 398 | 399 | // add contents of this bin to total 400 | total += binVal; 401 | 402 | // find centre frequency of this bin 403 | float freq = (i + 1) * m_sampleRate / (float) m_blockSize; 404 | 405 | // locate which band this bin belongs in 406 | while (freq > bandHighFreq[currentBand]) { 407 | currentBand++; 408 | if (currentBand >= numBands) 409 | break; 410 | } 411 | 412 | // add bin value to relevant band 413 | bandTotal.at(currentBand) += binVal; 414 | } 415 | 416 | intensity.push_back(bandTotal); 417 | 418 | return output; 419 | } 420 | 421 | Rhythm::FeatureSet Rhythm::getRemainingFeatures() { 422 | FeatureSet output; 423 | int frames = intensity.size(); 424 | 425 | if (frames == 0) 426 | return output; 427 | 428 | // find envelope by convolving each subband with half-hanning window 429 | vector<vector<float> > envelope; 430 | halfHannConvolve(envelope); 431 | 432 | // find onset curve by convolving each subband of envelope with canny window 433 | vector<float> onset; 434 | cannyConvolve(envelope, onset); 435 | 436 | // normalise onset curve 437 | vector<float> onsetNorm; 438 | normalise(onset, onsetNorm); 439 | 440 | // push normalised onset curve 441 | Feature f_onset; 442 | f_onset.hasTimestamp = true; 443 | for (unsigned i = 0; 
i < onsetNorm.size(); i++) { 444 | f_onset.timestamp = Vamp::RealTime::frame2RealTime(i * m_stepSize, 445 | m_sampleRate); 446 | f_onset.values.clear(); 447 | f_onset.values.push_back(onsetNorm.at(i)); 448 | output[0].push_back(f_onset); 449 | } 450 | 451 | // find moving average of onset curve and difference 452 | vector<float> onsetAverage; 453 | vector<float> onsetDiff; 454 | movingAverage(onsetNorm, average_window, threshold, onsetAverage, onsetDiff); 455 | 456 | // push moving average 457 | Feature f_avg; 458 | f_avg.hasTimestamp = true; 459 | for (unsigned i = 0; i < onsetAverage.size(); i++) { 460 | f_avg.timestamp = Vamp::RealTime::frame2RealTime(i * m_stepSize, 461 | m_sampleRate); 462 | f_avg.values.clear(); 463 | f_avg.values.push_back(onsetAverage.at(i)); 464 | output[1].push_back(f_avg); 465 | } 466 | 467 | // push difference from average 468 | Feature f_diff; 469 | f_diff.hasTimestamp = true; 470 | for (unsigned i = 0; i < onsetDiff.size(); i++) { 471 | f_diff.timestamp = Vamp::RealTime::frame2RealTime(i * m_stepSize, 472 | m_sampleRate); 473 | f_diff.values.clear(); 474 | f_diff.values.push_back(onsetDiff.at(i)); 475 | output[2].push_back(f_diff); 476 | } 477 | 478 | // choose peaks 479 | vector<int> peaks; 480 | findOnsetPeaks(onsetDiff, peak_window, peaks); 481 | int onsetCount = (int) peaks.size(); 482 | 483 | // push peaks 484 | Feature f_peak; 485 | f_peak.hasTimestamp = true; 486 | for (unsigned i = 0; i < peaks.size(); i++) { 487 | f_peak.timestamp = Vamp::RealTime::frame2RealTime(peaks.at(i) * m_stepSize, 488 | m_sampleRate); 489 | output[3].push_back(f_peak); 490 | } 491 | 492 | // calculate average onset frequency 493 | float averageOnsetFreq = (float) onsetCount 494 | / (float) (frames * m_stepSize / m_sampleRate); 495 | Feature f_avgOnsetFreq; 496 | f_avgOnsetFreq.hasTimestamp = true; 497 | f_avgOnsetFreq.timestamp = Vamp::RealTime::fromSeconds(0.0); 498 | f_avgOnsetFreq.values.push_back(averageOnsetFreq); 499 | output[4].push_back(f_avgOnsetFreq); 500 | 
501 | // calculate rhythm strength 502 | float rhythmStrength = findMeanPeak(onset, peaks, 0); 503 | Feature f_rhythmStrength; 504 | f_rhythmStrength.hasTimestamp = true; 505 | f_rhythmStrength.timestamp = Vamp::RealTime::fromSeconds(0.0); 506 | f_rhythmStrength.values.push_back(rhythmStrength); 507 | output[5].push_back(f_rhythmStrength); 508 | 509 | // find shift range for autocor 510 | float firstShift = (int) round(60.f / max_bpm * m_sampleRate / m_stepSize); 511 | float lastShift = (int) round(60.f / min_bpm * m_sampleRate / m_stepSize); 512 | 513 | // autocorrelation 514 | vector<float> autocor; 515 | autocorrelation(onsetDiff, firstShift, lastShift, autocor); 516 | Feature f_autoCor; 517 | f_autoCor.hasTimestamp = true; 518 | for (float shift = firstShift; shift < lastShift; shift++) { 519 | f_autoCor.timestamp = Vamp::RealTime::frame2RealTime(shift * m_stepSize, 520 | m_sampleRate); 521 | f_autoCor.values.clear(); 522 | f_autoCor.values.push_back(autocor.at(shift - firstShift)); 523 | output[6].push_back(f_autoCor); 524 | } 525 | 526 | // find peaks in autocor 527 | float percentile = 95; 528 | int autocorWindowLength = 3; 529 | vector<int> autocorPeaks; 530 | vector<int> autocorValleys; 531 | findCorrelationPeaks(autocor, percentile, autocorWindowLength, firstShift, 532 | autocorPeaks, autocorValleys); 533 | 534 | // find average correlation peak 535 | float meanCorrelationPeak = findMeanPeak(autocor, autocorPeaks, firstShift); 536 | Feature f_meanCorrelationPeak; 537 | f_meanCorrelationPeak.hasTimestamp = true; 538 | f_meanCorrelationPeak.timestamp = Vamp::RealTime::fromSeconds(0.0); 539 | f_meanCorrelationPeak.values.push_back(meanCorrelationPeak); 540 | output[7].push_back(f_meanCorrelationPeak); 541 | 542 | // find peak/valley ratio 543 | float meanCorrelationValley = findMeanPeak(autocor, autocorValleys, 544 | firstShift) + 0.0001; 545 | Feature f_peakValleyRatio; 546 | f_peakValleyRatio.hasTimestamp = true; 547 | f_peakValleyRatio.timestamp = 
Vamp::RealTime::fromSeconds(0.0); 548 | f_peakValleyRatio.values.push_back( 549 | meanCorrelationPeak / meanCorrelationValley); 550 | output[8].push_back(f_peakValleyRatio); 551 | 552 | // find tempo from peaks 553 | float tempo = findTempo(autocorPeaks); 554 | Feature f_tempo; 555 | f_tempo.hasTimestamp = true; 556 | f_tempo.timestamp = Vamp::RealTime::fromSeconds(0.0); 557 | f_tempo.values.push_back(tempo); 558 | output[9].push_back(f_tempo); 559 | 560 | return output; 561 | } 562 | 563 | /// @endcond 564 | 565 | void Rhythm::calculateBandFreqs() { 566 | delete[] bandHighFreq; 567 | bandHighFreq = new float[numBands]; 568 | 569 | for (int k = 0; k < numBands; k++) { 570 | bandHighFreq[k] = m_sampleRate / pow(2.f, numBands - k); 571 | } 572 | } 573 | 574 | float Rhythm::halfHanning(float n) { 575 | return 0.5f 576 | + 0.5f * cos(2.f * M_PI * (n / (2.f * (float) halfHannLength - 1.f))); 577 | } 578 | 579 | float Rhythm::canny(float n) { 580 | return n / (cannyShape * cannyShape) 581 | * exp(-1 * (n * n) / (2 * cannyShape * cannyShape)); 582 | } 583 | 584 | float Rhythm::findRemainder(vector<int> peaks, int thisPeak) { 585 | float total = 0; 586 | for (unsigned i = 0; i < peaks.size(); i++) { 587 | float ratio = (float) peaks.at(i) / (float) thisPeak; 588 | total += abs(ratio - round(ratio)); 589 | } 590 | return total; 591 | } 592 | 593 | float Rhythm::findTempo(vector<int> peaks) { 594 | if (peaks.empty()) return 0.f; 595 | float min = findRemainder(peaks, peaks.at(0)); 596 | int minPos = 0; 597 | for (unsigned i = 1; i < peaks.size(); i++) { 598 | float result = findRemainder(peaks, peaks.at(i)); 599 | if (result < min) { 600 | min = result; 601 | minPos = i; 602 | } 603 | } 604 | return 60.f / (peaks.at(minPos) * m_stepSize / m_sampleRate); 605 | } 606 | 607 | float Rhythm::findMeanPeak(vector<float> signal, vector<int> peaks, int shift) { 608 | float total = 0; 609 | for (unsigned i = 0; i < peaks.size(); i++) 610 | total += signal.at(peaks.at(i) - shift); 611 | return total / 
peaks.size(); 612 | } 613 | 614 | void Rhythm::findCorrelationPeaks(vector<float> autocor_in, float percentile_in, 615 | int windowLength_in, int shift_in, 616 | vector<int>& peaks_out, 617 | vector<int>& valleys_out) { 618 | if (autocor_in.empty()) return; 619 | 620 | vector<float> autocorSorted(autocor_in); 621 | std::sort(autocorSorted.begin(), autocorSorted.end()); 622 | float autocorThreshold = autocorSorted.at( 623 | percentile_in / 100.f * (autocorSorted.size() - 1)); 624 | 625 | int autocorValleyPos = 0; 626 | float autocorValleyValue = autocorThreshold; 627 | 628 | for (unsigned i = 0; i < autocor_in.size(); i++) { 629 | bool success = true; 630 | 631 | // check for valley 632 | if (autocor_in.at(i) < autocorValleyValue) { 633 | autocorValleyPos = i; 634 | autocorValleyValue = autocor_in.at(i); 635 | } 636 | 637 | // if below the threshold, move on to next element 638 | if (autocor_in.at(i) < autocorThreshold) 639 | continue; 640 | 641 | // check for other peaks in the area 642 | for (int j = windowLength_in * -1; j < windowLength_in + 1; j++) { 643 | if (i + j >= 0 && i + j < autocor_in.size()) { 644 | if (autocor_in.at(i + j) > autocor_in.at(i)) 645 | success = false; 646 | } 647 | } 648 | 649 | // save peak and valley 650 | if (success) { 651 | peaks_out.push_back(shift_in + i); 652 | valleys_out.push_back(shift_in + autocorValleyPos); 653 | autocorValleyValue = autocor_in.at(i); 654 | } 655 | } 656 | } 657 | 658 | void Rhythm::autocorrelation(vector<float> signal_in, int startShift_in, 659 | int endShift_in, vector<float>& autocor_out) { 660 | for (float shift = startShift_in; shift < endShift_in; shift++) { 661 | float result = 0; 662 | for (unsigned frame = 0; frame < signal_in.size(); frame++) { 663 | if (frame + shift < signal_in.size()) 664 | result += signal_in.at(frame) * signal_in.at(frame + shift); 665 | } 666 | autocor_out.push_back(result / signal_in.size()); 667 | } 668 | } 669 | 670 | void Rhythm::findOnsetPeaks(vector<float> onset_in, int windowLength_in, 671 | vector<int>& peaks_out) { 
672 | for (unsigned frame = 0; frame < onset_in.size(); frame++) { 673 | bool success = true; 674 | 675 | // ignore 0 values 676 | if (onset_in.at(frame) <= 0) 677 | continue; 678 | 679 | // if any frames within windowSize have a bigger value, this is not the peak 680 | for (int i = windowLength_in * -1; i < windowLength_in + 1; i++) { 681 | if (frame + i >= 0 && frame + i < onset_in.size()) { 682 | if (onset_in.at(frame + i) > onset_in.at(frame)) 683 | success = false; 684 | } 685 | } 686 | 687 | // push result out 688 | if (success) { 689 | peaks_out.push_back(frame); 690 | } 691 | } 692 | } 693 | 694 | void Rhythm::movingAverage(vector<float> signal_in, int windowLength_in, 695 | float threshold_in, vector<float>& average_out, 696 | vector<float>& difference_out) { 697 | float avgWindowLength = (windowLength_in * 2) + 1; 698 | for (unsigned frame = 0; frame < signal_in.size(); frame++) { 699 | float result = 0; 700 | for (int i = windowLength_in * -1; i < windowLength_in + 1; i++) { 701 | if (frame + i >= 0 && frame + i < signal_in.size()) 702 | result += abs(signal_in.at(frame + i)); 703 | } 704 | 705 | // calculate average and difference results 706 | float average = result / avgWindowLength + threshold_in; 707 | float difference = signal_in.at(frame) - average; 708 | if (difference < 0) 709 | difference = 0; 710 | 711 | average_out.push_back(average); 712 | difference_out.push_back(difference); 713 | } 714 | } 715 | 716 | void Rhythm::normalise(vector<float> signal_in, vector<float>& normalised_out) { 717 | // find mean 718 | float total = 0; 719 | for (unsigned i = 0; i < signal_in.size(); i++) 720 | total += signal_in.at(i); 721 | float mean = total / signal_in.size(); 722 | 723 | // find std dev 724 | float std = 0; 725 | for (unsigned i = 0; i < signal_in.size(); i++) 726 | std += pow(signal_in.at(i) - mean, 2); 727 | std = sqrt(std / signal_in.size()); 728 | 729 | // normalise and rectify 730 | for (unsigned i = 0; i < signal_in.size(); i++) { 731 | 
normalised_out.push_back((signal_in.at(i) - mean) / std); 732 | if (normalised_out.at(i) < 0) 733 | normalised_out.at(i) = 0; 734 | } 735 | } 736 | 737 | void Rhythm::halfHannConvolve(vector<vector<float> >& envelope_out) { 738 | for (unsigned frame = 0; frame < intensity.size(); frame++) { 739 | vector<float> frameResult; 740 | for (int subBand = 0; subBand < numBands; subBand++) { 741 | float result = 0; 742 | for (int shift = 0; shift < halfHannLength; shift++) { 743 | if (frame + shift < intensity.size()) 744 | result += intensity.at(frame + shift).at(subBand) 745 | * halfHannWindow[shift]; 746 | } 747 | frameResult.push_back(result); 748 | } 749 | envelope_out.push_back(frameResult); 750 | } 751 | } 752 | 753 | void Rhythm::cannyConvolve(vector<vector<float> > envelope_in, 754 | vector<float>& onset_out) { 755 | for (unsigned frame = 0; frame < envelope_in.size(); frame++) { 756 | // reset feature details 757 | float sum = 0; 758 | 759 | // for each sub-band 760 | for (int subBand = 0; subBand < numBands; subBand++) { 761 | // convolve the canny window with the envelope of that sub-band 762 | for (int shift = cannyLength * -1; shift < cannyLength; shift++) { 763 | if (frame + shift >= 0 && frame + shift < envelope_in.size()) 764 | sum += envelope_in.at(frame + shift).at(subBand) 765 | * cannyWindow[shift + cannyLength]; 766 | } 767 | } 768 | 769 | // save result 770 | onset_out.push_back(sum); 771 | } 772 | } 773 | -------------------------------------------------------------------------------- /src/Rhythm.h: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #ifndef _RHYTHM_H_ 19 | #define _RHYTHM_H_ 20 | 21 | #define _USE_MATH_DEFINES 22 | 23 | #include 24 | #include 25 | #include 26 | #include 27 | #include 28 | 29 | using std::string; 30 | using std::vector; 31 | using std::complex; 32 | using std::abs; 33 | using std::cos; 34 | 35 | /*! 36 | * \brief Calculates rhythmic features of a signal, including onsets and tempo 37 | * 38 | * \section Outputs 39 | * \par Onset Curve 40 | * The filtered and half-wave rectified intensity of the signal, used to detect 41 | * onsets. 42 | * \par Average 43 | * The moving average of the onset curve, plus the threshold - used for 44 | * selecting where the peaks of the onset curve are. 45 | * \par Difference 46 | * The difference between the onset curve and its moving average. Used as the 47 | * input for peak-picking. 48 | * \par Onset 49 | * The detected note onsets. 50 | * \par Average onset frequency 51 | * The mean number of onsets per minute. 52 | * \par Rhythm strength 53 | * The mean value of the peaks in the onset curve. 54 | * \par Autocorrelation 55 | * The autocorrelation of the difference curve. 56 | * \par Mean Correlation Peak 57 | * The mean value of the peaks in the autocorrelation. 58 | * \par Peak-Valley Ratio 59 | * The mean peak-valley ratio of the autocorrelation. 60 | * \par Tempo 61 | * The estimated tempo in beats per minute. 62 | * 63 | * \section Parameters 64 | * \par Sub-bands 65 | * Number of sub-bands to divide the signal into for applying the half-hanning 66 | * window. 
A higher number increases accuracy at the cost of processing time. 67 | * (default = 7) 68 | * \par Threshold 69 | * Amount by which to increase the moving average filter. A higher number 70 | * produces fewer onsets. (default = 1.0) 71 | * \par Moving average window length 72 | * Length of moving average window. A higher number produces a smoother curve. 73 | * (default = 200) 74 | * \par Onset peak window length 75 | * Length of window used to select peaks in the difference curve. (default = 6) 76 | * \par Minimum BPM 77 | * Minimum tempo calculated using the autocorrelation. (default = 12) 78 | * \par Maximum BPM 79 | * Maximum tempo calculated using the autocorrelation. (default = 300) 80 | * 81 | * \section Description 82 | * 83 | * The rhythm features are based on the features described in [1] (section 3C), 84 | * combined with some techniques from [2]. 85 | * 86 | * Firstly the spectrum is divided into \f$n\f$ sub-bands with the following 87 | * frequency ranges. 88 | * \f[ \left(0,\frac{F_s}{2^n}\right) , \left(\frac{F_s}{2^n}, 89 | * \frac{F_s}{2^{n-1}}\right) , \ldots \left(\frac{F_s}{2^2}, 90 | * \frac{F_s}{2^1}\right) \f] 91 | * 92 | * For each sub-band, the magnitudes of the FFT bins are summed, producing 93 | * \f$n\f$ signals. Each of the signals is convolved with a half-hanning 94 | * window, where \f$L\f$ is set as 12. 95 | * \f[ H(w) = 0.5 + 0.5\cos\left(2\pi \cdot \frac{w}{2L-1} \right) \hspace{20px} 96 | * w\in[0, L-1] \f] 97 | * 98 | * Subsequently, each of the signals is convolved with a peak-enhancing canny 99 | * window, where 100 | * \f$L\f$ is set as 12 and \f$\sigma\f$ is set as 4. 101 | * \f[ C(w) = \frac{w}{\sigma^2}e^{-\frac{w^2}{2\sigma^2}} \hspace{20px} 102 | * w\in[-L,L] \f] 103 | * 104 | * The \f$n\f$ signals are summed and half-wave rectified to produce the 105 | * onset curve. 
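As an illustration only, the two window formulas above can be evaluated as in the following sketch. The function names `halfHannWindowSketch` and `cannyWindowSketch` are hypothetical and are not part of the plugin; the sketch follows the documented formulas and does not claim to match how the plugin fills its own coefficient arrays.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: half-Hann coefficients
// H(w) = 0.5 + 0.5 cos(2 pi w / (2L - 1)) for w in [0, L-1].
std::vector<float> halfHannWindowSketch(int L) {
    const double pi = 3.14159265358979323846;
    std::vector<float> win(L);
    for (int w = 0; w < L; ++w) {
        win[w] = static_cast<float>(
            0.5 + 0.5 * std::cos(2.0 * pi * w / (2.0 * L - 1.0)));
    }
    return win;
}

// Hypothetical helper: canny coefficients
// C(w) = (w / sigma^2) exp(-w^2 / (2 sigma^2)) for w in [-L, L].
// The coefficient for offset w is stored at index w + L.
std::vector<float> cannyWindowSketch(int L, float sigma) {
    std::vector<float> win(2 * L + 1);
    for (int w = -L; w <= L; ++w) {
        win[w + L] = static_cast<float>(
            w / (sigma * sigma) * std::exp(-(w * w) / (2.0 * sigma * sigma)));
    }
    return win;
}
```

With the documented values, `halfHannWindowSketch(12)` yields 12 coefficients starting at 1 and decaying towards 0, and `cannyWindowSketch(12, 4.0f)` yields 25 antisymmetric coefficients that are zero at the centre.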
106 | * 107 | * The moving average \f$A\f$ of the onset curve \f$O\f$ is produced from 108 | * the mean value of a rectangular window of length \f$(2L+1)\f$, plus a 109 | * threshold \f$t\f$. The threshold and moving average window 110 | * length parameters control \f$t\f$ and \f$L\f$ respectively. 111 | * \f[ A(x) = \displaystyle\sum\limits_{y=-L}^{L} \frac{O(x+y)}{2L+1} + t \f] 112 | * 113 | * The difference signal is created by subtracting the moving average 114 | * from the onset curve and applying half-wave rectification. 115 | * 116 | * An onset is detected when a sample is the maximum within a given 117 | * window of length \f$(2L+1)\f$, where \f$L\f$ is set by the parameter onset 118 | * peak window length. 119 | * 120 | * The average onset frequency is the total number of onsets divided by 121 | * the length of the track in minutes. 122 | * 123 | * The rhythm strength is the mean value of the peaks of the onset curve 124 | * (pre-averaging). 125 | * 126 | * The autocorrelation is the autocorrelation of the difference signal 127 | * between delays of \f$\frac{60}{T_{max}}\cdot\frac{F_s}{s}\f$ frames and 128 | * \f$\frac{60}{T_{min}}\cdot\frac{F_s}{s}\f$ frames, where \f$T_{min}\f$ and 129 | * \f$T_{max}\f$ are the min/max tempo in BPM and \f$s\f$ is the step size in 130 | * number of frames. 131 | * 132 | * The peaks of the autocorrelation - \f$P_i\f$ - are defined as those which are 133 | * above a certain threshold, defined as the 95% confidence interval, and whose 134 | * value is the maximum within a 7-sample window. The mean correlation 135 | * peak is the mean value of the selected peaks, and the peak-valley 136 | * ratio is the ratio between the mean correlation peak and the mean value 137 | * of the valleys. A valley is defined as the minimum value between two peaks. 138 | * 139 | * The tempo is defined as the greatest common divisor of the detected 140 | * peaks. 
It is found by minimising the function below: 141 | * \f[ T = \underset{P_k}{argmin} \displaystyle\sum\limits_{i=1}^{N} 142 | * \left|\frac{P_i}{P_k}-\text{round}\left(\frac{P_i}{P_k}\right)\right|\f] 143 | * 144 | * \section References 145 | * 146 | * [1] Lu, L., Liu, D., & Zhang, H.-J. (2006). Automatic Mood Detection and 147 | * Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech and 148 | * Language Processing (Vol. 14, pp. 5-18). 149 | * 150 | * [2] Dixon, S. (2006). Onset Detection Revisited. International Conference 151 | * on Digital Audio Effects (DAFx) (pp. 133-137). 152 | */ 153 | class Rhythm : public Vamp::Plugin { 154 | public: 155 | /// @cond 156 | Rhythm(float inputSampleRate); 157 | virtual ~Rhythm(); 158 | string getIdentifier() const; 159 | string getName() const; 160 | string getDescription() const; 161 | string getMaker() const; 162 | int getPluginVersion() const; 163 | string getCopyright() const; 164 | InputDomain getInputDomain() const; 165 | size_t getPreferredBlockSize() const; 166 | size_t getPreferredStepSize() const; 167 | size_t getMinChannelCount() const; 168 | size_t getMaxChannelCount() const; 169 | ParameterList getParameterDescriptors() const; 170 | float getParameter(string identifier) const; 171 | void setParameter(string identifier, float value); 172 | ProgramList getPrograms() const; 173 | string getCurrentProgram() const; 174 | void selectProgram(string name); 175 | OutputList getOutputDescriptors() const; 176 | bool initialise(size_t channels, size_t stepSize, size_t blockSize); 177 | void reset(); 178 | FeatureSet process(const float * const *inputBuffers, 179 | Vamp::RealTime timestamp); 180 | FeatureSet getRemainingFeatures(); 181 | /// @endcond 182 | 183 | protected: 184 | void calculateBandFreqs(); 185 | float halfHanning(float n); 186 | float canny(float n); 187 | float findRemainder(vector peaks, int thisPeak); 188 | float findTempo(vector peaks); 189 | float findMeanPeak(vector signal, vector peaks, 
int shift); 190 | void findCorrelationPeaks(vector autocor_in, float percentile_in, 191 | int windowLength_in, int shift_in, 192 | vector& peaks_out, vector& valleys_out); 193 | void autocorrelation(vector signal_in, int startShift_in, 194 | int endShift_in, vector& autocor_out); 195 | void findOnsetPeaks(vector onset_in, int windowLength_in, 196 | vector& peaks_out); 197 | void movingAverage(vector signal_in, int windowLength_in, 198 | float threshold_in, vector& average_out, 199 | vector& difference_out); 200 | void normalise(vector signal_in, vector& normalised_out); 201 | void halfHannConvolve(vector >& envelope_out); 202 | void cannyConvolve(vector > envelope_in, 203 | vector& onset_out); 204 | 205 | /// @cond 206 | int m_blockSize, m_stepSize; 207 | float m_sampleRate; 208 | /// @endcond 209 | 210 | int numBands; /*!< Number of sub-bands */ 211 | float *bandHighFreq; /*!< Upper frequency of each sub-band */ 212 | int halfHannLength; /*!< Length of half-hanning window */ 213 | float *halfHannWindow;/*!< Co-efficients of half-hanning window */ 214 | int cannyLength; /*!< Length of canny window */ 215 | float cannyShape; /*!< Shape of canny window */ 216 | float *cannyWindow; /*!< Co-efficients of canny window */ 217 | vector > intensity; /*!< Intensity value for each block */ 218 | float threshold; /*!< Threshold value added to moving average */ 219 | int average_window; /*!< Length of moving average window */ 220 | int peak_window; /*!< Length of peak-picking window */ 221 | int max_bpm; /*!< Maximum BPM detected in autocorrelation */ 222 | int min_bpm; /*!< Minimum BPM detected in autocorrelation */ 223 | }; 224 | 225 | #endif 226 | -------------------------------------------------------------------------------- /src/SpectralContrast.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the 
Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #include "SpectralContrast.h" 19 | /// @cond 20 | 21 | SpectralContrast::SpectralContrast(float inputSampleRate):Plugin(inputSampleRate) 22 | { 23 | m_sampleRate = inputSampleRate; 24 | alpha = 0.02; 25 | numBands = 7; 26 | bandHighFreq = NULL; 27 | calculateBandFreqs(); 28 | } 29 | 30 | SpectralContrast::~SpectralContrast() 31 | { 32 | delete[] bandHighFreq; 33 | } 34 | 35 | string 36 | SpectralContrast::getIdentifier() const 37 | { 38 | return "bbc-spectral-contrast"; 39 | } 40 | 41 | string 42 | SpectralContrast::getName() const 43 | { 44 | return "Spectral Contrast"; 45 | } 46 | 47 | string 48 | SpectralContrast::getDescription() const 49 | { 50 | return ""; 51 | } 52 | 53 | string 54 | SpectralContrast::getMaker() const 55 | { 56 | return "BBC"; 57 | } 58 | 59 | int 60 | SpectralContrast::getPluginVersion() const 61 | { 62 | return 1; 63 | } 64 | 65 | string 66 | SpectralContrast::getCopyright() const 67 | { 68 | return "(c) 2013 British Broadcasting Corporation"; 69 | } 70 | 71 | SpectralContrast::InputDomain 72 | SpectralContrast::getInputDomain() const 73 | { 74 | return FrequencyDomain; 75 | } 76 | 77 | size_t 78 | SpectralContrast::getPreferredBlockSize() const 79 | { 80 | return 1024; 81 | } 82 | 83 | size_t 84 | SpectralContrast::getPreferredStepSize() const 85 | { 86 | return 512; 87 | } 88 | 89 | size_t 90 | SpectralContrast::getMinChannelCount() const 91 | { 92 | return 1; 93 | } 94 
| 95 | size_t 96 | SpectralContrast::getMaxChannelCount() const 97 | { 98 | return 1; 99 | } 100 | 101 | SpectralContrast::ParameterList 102 | SpectralContrast::getParameterDescriptors() const 103 | { 104 | ParameterList list; 105 | 106 | ParameterDescriptor paramAlpha; 107 | paramAlpha.identifier = "alpha"; 108 | paramAlpha.name = "Alpha"; 109 | paramAlpha.description = "Ratio of FFT bins used to find average"; 110 | paramAlpha.unit = ""; 111 | paramAlpha.minValue = 0; 112 | paramAlpha.maxValue = 1; 113 | paramAlpha.defaultValue = 0.02; 114 | paramAlpha.isQuantized = false; 115 | list.push_back(paramAlpha); 116 | 117 | ParameterDescriptor numBandsParam; 118 | numBandsParam.identifier = "numBands"; 119 | numBandsParam.name = "Sub-bands"; 120 | numBandsParam.description = "Number of sub-bands."; 121 | numBandsParam.unit = ""; 122 | numBandsParam.minValue = 2; 123 | numBandsParam.maxValue = 50; 124 | numBandsParam.defaultValue = 7; 125 | numBandsParam.isQuantized = true; 126 | numBandsParam.quantizeStep = 1.0; 127 | list.push_back(numBandsParam); 128 | 129 | return list; 130 | } 131 | 132 | float 133 | SpectralContrast::getParameter(string identifier) const 134 | { 135 | if (identifier == "alpha") 136 | return alpha; 137 | if (identifier == "numBands") 138 | return numBands; 139 | return 0; 140 | } 141 | 142 | void 143 | SpectralContrast::setParameter(string identifier, float value) 144 | { 145 | if (identifier == "alpha") { 146 | alpha = value; 147 | } 148 | if (identifier == "numBands") { 149 | numBands = value; 150 | calculateBandFreqs(); 151 | } 152 | } 153 | 154 | SpectralContrast::ProgramList 155 | SpectralContrast::getPrograms() const 156 | { 157 | ProgramList list; 158 | 159 | return list; 160 | } 161 | 162 | string 163 | SpectralContrast::getCurrentProgram() const 164 | { 165 | return ""; 166 | } 167 | 168 | void 169 | SpectralContrast::selectProgram(string name) 170 | { 171 | } 172 | 173 | SpectralContrast::OutputList 174 | 
SpectralContrast::getOutputDescriptors() const 175 | { 176 | OutputList list; 177 | 178 | OutputDescriptor SpectralValleys; 179 | SpectralValleys.identifier = "valleys"; 180 | SpectralValleys.name = "Spectral Valleys"; 181 | SpectralValleys.description = "Valley of the spectrum."; 182 | SpectralValleys.unit = ""; 183 | SpectralValleys.hasFixedBinCount = true; 184 | SpectralValleys.binCount = numBands; 185 | SpectralValleys.hasKnownExtents = false; 186 | SpectralValleys.isQuantized = false; 187 | SpectralValleys.sampleType = OutputDescriptor::OneSamplePerStep; 188 | SpectralValleys.hasDuration = false; 189 | list.push_back(SpectralValleys); 190 | 191 | OutputDescriptor SpectralPeaks; 192 | SpectralPeaks.identifier = "peaks"; 193 | SpectralPeaks.name = "Spectral Peaks"; 194 | SpectralPeaks.description = "Peak of the spectrum."; 195 | SpectralPeaks.unit = ""; 196 | SpectralPeaks.hasFixedBinCount = true; 197 | SpectralPeaks.binCount = numBands; 198 | SpectralPeaks.hasKnownExtents = false; 199 | SpectralPeaks.isQuantized = false; 200 | SpectralPeaks.sampleType = OutputDescriptor::OneSamplePerStep; 201 | SpectralPeaks.hasDuration = false; 202 | list.push_back(SpectralPeaks); 203 | 204 | OutputDescriptor SpectralMean; 205 | SpectralMean.identifier = "mean"; 206 | SpectralMean.name = "Spectral Mean"; 207 | SpectralMean.description = "Mean of the spectrum."; 208 | SpectralMean.unit = ""; 209 | SpectralMean.hasFixedBinCount = true; 210 | SpectralMean.binCount = numBands; 211 | SpectralMean.hasKnownExtents = false; 212 | SpectralMean.isQuantized = false; 213 | SpectralMean.sampleType = OutputDescriptor::OneSamplePerStep; 214 | SpectralMean.hasDuration = false; 215 | list.push_back(SpectralMean); 216 | 217 | return list; 218 | } 219 | 220 | bool 221 | SpectralContrast::initialise(size_t channels, size_t stepSize, size_t blockSize) 222 | { 223 | if (channels < getMinChannelCount() || channels > getMaxChannelCount()) return false; 224 | 225 | m_blockSize = blockSize; 226 | 
m_stepSize = stepSize; 227 | reset(); 228 | 229 | return true; 230 | } 231 | 232 | void 233 | SpectralContrast::reset() 234 | { 235 | } 236 | 237 | SpectralContrast::FeatureSet 238 | SpectralContrast::process(const float *const *inputBuffers, Vamp::RealTime timestamp) 239 | { 240 | FeatureSet output; 241 | Feature valleysOut; 242 | Feature peaksOut; 243 | Feature meanOut; 244 | int currentBand = 0; 245 | 246 | // create vector of vectors 247 | vector empty; 248 | vector< vector > bins; 249 | bins.push_back(empty); 250 | 251 | // for each frequency bin 252 | for (int i=0; i(inputBuffers[0][i*2], inputBuffers[0][i*2+1])); 256 | 257 | // find centre frequency of this bin 258 | float freq = (i+1)*m_sampleRate / (float)m_blockSize; 259 | 260 | // locate which band this bin belongs in 261 | while (freq > bandHighFreq[currentBand]) { 262 | currentBand++; 263 | if (currentBand >= numBands) break; 264 | bins.push_back(empty); 265 | } 266 | 267 | // add the bin to the relevant band vector 268 | bins.at(currentBand).push_back(binVal); 269 | } 270 | 271 | // for each band 272 | for (int band=0; band= (1/alpha)) end = round(bins.at(band).size()*alpha); 281 | float valleySum = 0; 282 | 283 | // find average of those bins 284 | for (int i=start; i= (1/alpha)) start = bins.at(band).size() - round(bins.at(band).size()*alpha); 293 | end = bins.at(band).size(); 294 | float peakSum = 0; 295 | 296 | // find average of those bins 297 | for (int i=start; i 22 | #include 23 | #include 24 | #include 25 | #include 26 | 27 | using std::string; 28 | using std::vector; 29 | using std::complex; 30 | using std::abs; 31 | 32 | /*! 
33 | * \brief Calculates the peaks and valleys of the spectral contrast feature 34 | * 35 | * \section Outputs 36 | * \par Valleys 37 | * The valley of each frequency sub-band 38 | * \par Peaks 39 | * The peak of each frequency sub-band 40 | * \par Mean 41 | * The mean of each frequency sub-band 42 | * 43 | * \section Parameters 44 | * \par Alpha 45 | * Ratio of FFT bins used to find the peak/valley in each sub-band (default = 0.02) 46 | * \par Sub-bands 47 | * The number of sub-bands to use. (default = 7) 48 | * 49 | * \section Description 50 | * 51 | * This simple algorithm, taken from [1], divides a signal into N sub-bands and 52 | * sorts the FFT bins in each sub-band by magnitude. The peak and valley are 53 | * found by taking a proportion (defined as alpha) of FFT bins from the 54 | * top/bottom of the sorted bins and finding the mean of those. The mean of all 55 | * the FFT bins in each sub-band is also calculated. The 'spectral contrast' 56 | * can be found by subtracting the valley from the peak in each sub-band, 57 | * although this isn't calculated in the plugin. 58 | * 59 | * [1] Jiang, D.-N., Lu, L., & Zhang, H.-J. (2002). Music type classification 60 | * by spectral contrast feature. IEEE International Conference on Multimedia 61 | * and Expo (pp. 113–116). 62 | * 63 | * Thanks to Erik Schmidt at Drexel for providing a reference MATLAB implementation. 
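As a rough sketch of the peak/valley/mean step described above for a single sub-band, assuming the bin magnitudes of that sub-band have already been collected. The names `BandContrast` and `bandContrastSketch` are hypothetical, and the rule of always using at least one bin is an assumption of this sketch, not a statement about the plugin's exact behaviour.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: one value per output for a single sub-band.
struct BandContrast {
    float valley;
    float peak;
    float mean;
};

// Sort the magnitudes of one sub-band, then average the lowest and the
// highest round(alpha * N) bins (at least one bin) and all bins.
BandContrast bandContrastSketch(std::vector<float> mags, float alpha) {
    BandContrast out = {0.f, 0.f, 0.f};
    if (mags.empty()) return out;

    std::sort(mags.begin(), mags.end());
    const std::size_t n = mags.size();
    std::size_t count = static_cast<std::size_t>(std::lround(n * alpha));
    count = std::max<std::size_t>(1, std::min(count, n));
    const std::ptrdiff_t c = static_cast<std::ptrdiff_t>(count);

    out.valley = std::accumulate(mags.begin(), mags.begin() + c, 0.f)
                 / static_cast<float>(count);
    out.peak = std::accumulate(mags.end() - c, mags.end(), 0.f)
               / static_cast<float>(count);
    out.mean = std::accumulate(mags.begin(), mags.end(), 0.f)
               / static_cast<float>(n);
    return out;
}
```

The spectral contrast of the sub-band would then be `peak - valley`, which, as noted above, the plugin leaves to the caller.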
64 | */ 65 | class SpectralContrast : public Vamp::Plugin 66 | { 67 | public: 68 | /// @cond 69 | SpectralContrast(float inputSampleRate); 70 | virtual ~SpectralContrast(); 71 | string getIdentifier() const; 72 | string getName() const; 73 | string getDescription() const; 74 | string getMaker() const; 75 | int getPluginVersion() const; 76 | string getCopyright() const; 77 | InputDomain getInputDomain() const; 78 | size_t getPreferredBlockSize() const; 79 | size_t getPreferredStepSize() const; 80 | size_t getMinChannelCount() const; 81 | size_t getMaxChannelCount() const; 82 | ParameterList getParameterDescriptors() const; 83 | float getParameter(string identifier) const; 84 | void setParameter(string identifier, 85 | float value); 86 | ProgramList getPrograms() const; 87 | string getCurrentProgram() const; 88 | void selectProgram(string name); 89 | OutputList getOutputDescriptors() const; 90 | bool initialise(size_t channels, 91 | size_t stepSize, 92 | size_t blockSize); 93 | void reset(); 94 | FeatureSet process(const float *const *inputBuffers, 95 | Vamp::RealTime timestamp); 96 | FeatureSet getRemainingFeatures(); 97 | void calculateBandFreqs(); 98 | /// @endcond 99 | 100 | protected: 101 | /// @cond 102 | int m_blockSize, m_stepSize; 103 | float m_sampleRate; 104 | /// @endcond 105 | 106 | float alpha; /*!< Alpha parameter of spectral contrast algorithm*/ 107 | int numBands; /*!< Number of sub-bands to use */ 108 | float *bandHighFreq; /*!< Upper frequency range of each sub-band */ 109 | }; 110 | 111 | #endif 112 | -------------------------------------------------------------------------------- /src/SpectralFlux.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #include "SpectralFlux.h" 19 | /// @cond 20 | 21 | SpectralFlux::SpectralFlux(float inputSampleRate):Plugin(inputSampleRate) 22 | { 23 | l2norm = false; 24 | } 25 | 26 | SpectralFlux::~SpectralFlux() 27 | { 28 | } 29 | 30 | string 31 | SpectralFlux::getIdentifier() const 32 | { 33 | return "bbc-spectral-flux"; 34 | } 35 | 36 | string 37 | SpectralFlux::getName() const 38 | { 39 | return "Spectral Flux"; 40 | } 41 | 42 | string 43 | SpectralFlux::getDescription() const 44 | { 45 | return ""; 46 | } 47 | 48 | string 49 | SpectralFlux::getMaker() const 50 | { 51 | return "BBC"; 52 | } 53 | 54 | int 55 | SpectralFlux::getPluginVersion() const 56 | { 57 | return 1; 58 | } 59 | 60 | string 61 | SpectralFlux::getCopyright() const 62 | { 63 | return "(c) 2013 British Broadcasting Corporation"; 64 | } 65 | 66 | SpectralFlux::InputDomain 67 | SpectralFlux::getInputDomain() const 68 | { 69 | return FrequencyDomain; 70 | } 71 | 72 | size_t 73 | SpectralFlux::getPreferredBlockSize() const 74 | { 75 | return 1024; 76 | } 77 | 78 | size_t 79 | SpectralFlux::getPreferredStepSize() const 80 | { 81 | return 1024; 82 | } 83 | 84 | size_t 85 | SpectralFlux::getMinChannelCount() const 86 | { 87 | return 1; 88 | } 89 | 90 | size_t 91 | SpectralFlux::getMaxChannelCount() const 92 | { 93 | return 1; 94 | } 95 | 96 | SpectralFlux::ParameterList 97 | SpectralFlux::getParameterDescriptors() const 98 | { 99 | ParameterList list; 100 | 101 | ParameterDescriptor usel2; 102 | usel2.identifier = "usel2"; 103 | usel2.name = "Use L2 norm over 
L1"; 104 | usel2.description = "Replaces L1 normalisation with L2."; 105 | usel2.unit = ""; 106 | usel2.minValue = 0; 107 | usel2.maxValue = 1; 108 | usel2.defaultValue = 0; 109 | usel2.isQuantized = true; 110 | usel2.quantizeStep = 1.0; 111 | list.push_back(usel2); 112 | 113 | return list; 114 | } 115 | 116 | float 117 | SpectralFlux::getParameter(string identifier) const 118 | { 119 | if (identifier == "usel2") 120 | return l2norm; 121 | return 0; 122 | } 123 | 124 | void 125 | SpectralFlux::setParameter(string identifier, float value) 126 | { 127 | if (identifier == "usel2") { 128 | l2norm = value; 129 | } 130 | } 131 | 132 | SpectralFlux::ProgramList 133 | SpectralFlux::getPrograms() const 134 | { 135 | ProgramList list; 136 | 137 | return list; 138 | } 139 | 140 | string 141 | SpectralFlux::getCurrentProgram() const 142 | { 143 | return ""; 144 | } 145 | 146 | void 147 | SpectralFlux::selectProgram(string name) 148 | { 149 | } 150 | 151 | SpectralFlux::OutputList 152 | SpectralFlux::getOutputDescriptors() const 153 | { 154 | OutputList list; 155 | 156 | OutputDescriptor spectralflux; 157 | spectralflux.identifier = "spectral-flux"; 158 | spectralflux.name = "Spectral Flux"; 159 | spectralflux.description = "Difference between FFT bin values."; 160 | spectralflux.unit = ""; 161 | spectralflux.hasFixedBinCount = true; 162 | spectralflux.binCount = 1; 163 | spectralflux.hasKnownExtents = false; 164 | spectralflux.isQuantized = false; 165 | spectralflux.sampleType = OutputDescriptor::OneSamplePerStep; 166 | spectralflux.hasDuration = false; 167 | list.push_back(spectralflux); 168 | 169 | return list; 170 | } 171 | 172 | bool 173 | SpectralFlux::initialise(size_t channels, size_t stepSize, size_t blockSize) 174 | { 175 | if (channels < getMinChannelCount() || 176 | channels > getMaxChannelCount()) return false; 177 | 178 | m_blockSize = blockSize; 179 | m_stepSize = stepSize; 180 | reset(); 181 | 182 | return true; 183 | } 184 | 185 | void 186 | 
SpectralFlux::reset() 187 | { 188 | prevBin.clear(); 189 | } 190 | 191 | SpectralFlux::FeatureSet 192 | SpectralFlux::process(const float *const *inputBuffers, Vamp::RealTime timestamp) 193 | { 194 | FeatureSet output; 195 | float total = 0; 196 | 197 | // for each frequency bin 198 | for (int i=0; i= prevBin.size()) 202 | { 203 | prevBin.push_back(0.f); 204 | } 205 | 206 | // get absolute value 207 | float bin = abs(complex(inputBuffers[0][i*2], inputBuffers[0][i*2+1])); 208 | 209 | // find difference from prev frame 210 | float diff = bin - prevBin.at(i); 211 | 212 | // save current frame 213 | prevBin.at(i) = bin; 214 | 215 | // half-wave rectify: keep only positive changes, as H(x) in SpectralFlux.h 216 | if (diff < 0) diff = 0; 217 | 218 | // square if L2 norm 219 | if (l2norm) diff = diff*diff; 220 | 221 | // add to total 222 | total += diff; 223 | } 224 | 225 | // find root of total if L2 norm 226 | if (l2norm) total = sqrt(total); 227 | 228 | // send SpectralFlux outputs 229 | Feature flux; 230 | flux.values.push_back(total); 231 | output[0].push_back(flux); 232 | 233 | return output; 234 | } 235 | 236 | SpectralFlux::FeatureSet 237 | SpectralFlux::getRemainingFeatures() 238 | { 239 | return FeatureSet(); 240 | } 241 | 242 | /// @endcond 243 | -------------------------------------------------------------------------------- /src/SpectralFlux.h: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #ifndef _FLUX_H_ 19 | #define _FLUX_H_ 20 | 21 | #include 22 | #include 23 | #include 24 | #include 25 | 26 | using std::string; 27 | using std::vector; 28 | using std::complex; 29 | using std::abs; 30 | 31 | /*! 32 | * \brief Calculates the spectral flux 33 | * 34 | * \section Outputs 35 | * \par Spectral flux 36 | * The spectral difference between successive frames. 37 | * 38 | * \section Parameters 39 | * \par Use L2 norm 40 | * Whether to use L2 normalisation over L1 (default = 0) 41 | * 42 | * \section Description 43 | * 44 | * The algorithm is defined in [1], section 2.1: 45 | * "Spectral flux measures the change in magnitude in each frequency bin. It is 46 | * restricted to the positive changes and summed across all frequency bins." 47 | * 48 | * When using L1 norm, the algorithm is as follows: 49 | * \f[ SF(n) = \sum_{k=0}^{F_s/2} H\left( | X(n,k)| - |X(n-1,k)| \right) \f] 50 | * 51 | * When L2 norm is selected, the following is used: 52 | * \f[ SF(n) = \sqrt{ \sum_{k=0}^{F_s/2} H\left( | X(n,k)| - |X(n-1,k)| \right)^2 } \f] 53 | * 54 | * In both cases, \f$ H(x) = \frac{x+|x|}{2} \f$ 55 | * 56 | * [1] Dixon, S. (2006). Onset Detection Revisited. International Conference on 57 | * Digital Audio Effects (DAFx) (pp. 133–137). 
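For illustration, the two formulas above can be written as one small function operating on the magnitude spectra of two successive frames. The name `spectralFluxSketch` and its signature are hypothetical; this is a sketch of the documented formulas under the assumption that missing previous bins count as zero, not a copy of the plugin's process() method.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: spectral flux between the magnitude spectrum of the
// current frame and that of the previous frame.
float spectralFluxSketch(const std::vector<float>& current,
                         const std::vector<float>& previous,
                         bool useL2) {
    float total = 0.f;
    for (std::size_t k = 0; k < current.size(); ++k) {
        // Assumption: a bin with no previous value is compared against zero.
        const float prev = k < previous.size() ? previous[k] : 0.f;
        const float diff = current[k] - prev;
        // H(x) = (x + |x|) / 2 keeps positive changes and zeroes negative ones.
        const float rectified = (diff + std::fabs(diff)) / 2.f;
        total += useL2 ? rectified * rectified : rectified;
    }
    // L2: square root of the sum of squares; L1: plain sum.
    return useL2 ? std::sqrt(total) : total;
}
```

For example, with current magnitudes {4, 5, 1} and previous magnitudes {1, 1, 3}, the rectified differences are {3, 4, 0}, giving 7 with the L1 norm and 5 with the L2 norm.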
58 | */ 59 | class SpectralFlux : public Vamp::Plugin 60 | { 61 | public: 62 | /// @cond 63 | SpectralFlux(float inputSampleRate); 64 | virtual ~SpectralFlux(); 65 | string getIdentifier() const; 66 | string getName() const; 67 | string getDescription() const; 68 | string getMaker() const; 69 | int getPluginVersion() const; 70 | string getCopyright() const; 71 | InputDomain getInputDomain() const; 72 | size_t getPreferredBlockSize() const; 73 | size_t getPreferredStepSize() const; 74 | size_t getMinChannelCount() const; 75 | size_t getMaxChannelCount() const; 76 | ParameterList getParameterDescriptors() const; 77 | float getParameter(string identifier) const; 78 | void setParameter(string identifier, 79 | float value); 80 | ProgramList getPrograms() const; 81 | string getCurrentProgram() const; 82 | void selectProgram(string name); 83 | OutputList getOutputDescriptors() const; 84 | bool initialise(size_t channels, 85 | size_t stepSize, 86 | size_t blockSize); 87 | void reset(); 88 | FeatureSet process(const float *const *inputBuffers, 89 | Vamp::RealTime timestamp); 90 | FeatureSet getRemainingFeatures(); 91 | /// @endcond 92 | 93 | protected: 94 | /// @cond 95 | int m_blockSize, m_stepSize; 96 | vector prevBin; 97 | /// @endcond 98 | 99 | bool l2norm; /*!< Flag to indicate use of L2 normalisation */ 100 | }; 101 | 102 | #endif 103 | -------------------------------------------------------------------------------- /src/SpeechMusicSegmenter.cpp: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #include "SpeechMusicSegmenter.h" 19 | /// @cond 20 | 21 | SpeechMusicSegmenter::SpeechMusicSegmenter(float inputSampleRate) : 22 | Plugin(inputSampleRate), 23 | m_blockSize(0), 24 | m_nframes(0), 25 | resolution(256), 26 | margin(14), 27 | change_threshold(0.0781), 28 | decision_threshold(0.2734), 29 | min_music_length(0) 30 | // Also be sure to set your plugin parameters (presumably stored 31 | // in member variables) to their default values here -- the host 32 | // will not do that for you 33 | { 34 | } 35 | 36 | SpeechMusicSegmenter::~SpeechMusicSegmenter() 37 | { 38 | } 39 | 40 | string 41 | SpeechMusicSegmenter::getIdentifier() const 42 | { 43 | return "bbc-speechmusic-segmenter"; 44 | } 45 | 46 | string 47 | SpeechMusicSegmenter::getName() const 48 | { 49 | return "Speech/Music segmenter"; 50 | } 51 | 52 | string 53 | SpeechMusicSegmenter::getDescription() const 54 | { 55 | return "A simple speech/music segmenter"; 56 | } 57 | 58 | string 59 | SpeechMusicSegmenter::getMaker() const 60 | { 61 | return "BBC"; 62 | } 63 | 64 | int 65 | SpeechMusicSegmenter::getPluginVersion() const 66 | { 67 | // Increment this each time you release a version that behaves 68 | // differently from the previous one 69 | return 1; 70 | } 71 | 72 | string 73 | SpeechMusicSegmenter::getCopyright() const 74 | { 75 | // This function is not ideally named. It does not necessarily 76 | // need to say who made the plugin -- getMaker does that -- but it 77 | // should indicate the terms under which it is distributed. 
For 78 | // example, "Copyright (year). All Rights Reserved", or "GPL" 79 | return "(c) 2011 British Broadcasting Corporation"; 80 | } 81 | 82 | SpeechMusicSegmenter::InputDomain 83 | SpeechMusicSegmenter::getInputDomain() const 84 | { 85 | return TimeDomain; 86 | } 87 | 88 | size_t 89 | SpeechMusicSegmenter::getPreferredBlockSize() const 90 | { 91 | return 0; // 0 means "I can handle any block size" 92 | } 93 | 94 | size_t 95 | SpeechMusicSegmenter::getPreferredStepSize() const 96 | { 97 | return 0; // 0 means "anything sensible"; in practice this 98 | // means the same as the block size for TimeDomain 99 | // plugins, or half of it for FrequencyDomain plugins 100 | } 101 | 102 | size_t 103 | SpeechMusicSegmenter::getMinChannelCount() const 104 | { 105 | return 1; 106 | } 107 | 108 | size_t 109 | SpeechMusicSegmenter::getMaxChannelCount() const 110 | { 111 | return 1; 112 | } 113 | 114 | SpeechMusicSegmenter::ParameterList 115 | SpeechMusicSegmenter::getParameterDescriptors() const 116 | { 117 | ParameterList list; 118 | 119 | // If the plugin has no adjustable parameters, return an empty 120 | // list here (and there's no need to provide implementations of 121 | // getParameter and setParameter in that case either). 122 | 123 | // Note that it is your responsibility to make sure the parameters 124 | // start off having their default values (e.g. in the constructor 125 | // above). The host needs to know the default value so it can do 126 | // things like provide a "reset to default" function, but it will 127 | // not explicitly set your parameters to their defaults for you if 128 | // they have not changed in the mean time. 
129 | 130 | ParameterDescriptor d; 131 | d.identifier = "resolution"; 132 | d.name = "Resolution"; 133 | d.description = "Resolution (in number of frames) at which segment boundaries can be found"; 134 | d.unit = ""; 135 | d.minValue = 1; 136 | d.maxValue = 1024; 137 | d.defaultValue = 256; 138 | d.isQuantized = true; 139 | d.quantizeStep = 1; 140 | list.push_back(d); 141 | 142 | ParameterDescriptor d21; 143 | d21.identifier = "change_threshold"; 144 | d21.name = "Change threshold"; 145 | d21.description = "Threshold the detection function needs to exceed for a corresponding segment change to be taken into account"; 146 | d21.unit = ""; 147 | d21.minValue = 0; 148 | d21.maxValue = 1; 149 | d21.defaultValue = 0.0781; 150 | d21.isQuantized = false; 151 | list.push_back(d21); 152 | 153 | ParameterDescriptor d22; 154 | d22.identifier = "decision_threshold"; 155 | d22.name = "Decision threshold"; 156 | d22.description = "Mean of detection function above threshold: speech; Mean of detection function below threshold: music"; 157 | d22.unit = ""; 158 | d22.minValue = 0; 159 | d22.maxValue = 1; 160 | d22.defaultValue = 0.2734; 161 | d22.isQuantized = false; 162 | list.push_back(d22); 163 | 164 | ParameterDescriptor d23; 165 | d23.identifier = "min_music_length"; 166 | d23.name = "Minimum music segment length"; 167 | d23.description = "The minimum length of a music segment"; 168 | d23.unit = ""; 169 | d23.minValue = 0; 170 | d23.maxValue = 100; 171 | d23.defaultValue = 0; 172 | d23.isQuantized = false; 173 | list.push_back(d23); 174 | 175 | ParameterDescriptor d3; 176 | d3.identifier = "margin"; 177 | d3.name = "Margin"; 178 | d3.description = "Margin around mean ZCR under which no value is taken into account in the detection function"; 179 | d3.unit = ""; 180 | d3.minValue = 0; 181 | d3.defaultValue = 14; 182 | d3.maxValue = 50; 183 | d3.isQuantized = false; 184 | list.push_back(d3); 185 | 186 | return list; 187 | } 188 | 189 | float 190 | 
SpeechMusicSegmenter::getParameter(string identifier) const 191 | { 192 | if (identifier == "resolution") { 193 | return resolution; 194 | } 195 | 196 | if (identifier == "change_threshold") { 197 | return change_threshold; 198 | } 199 | 200 | if (identifier == "decision_threshold") { 201 | return decision_threshold; 202 | } 203 | 204 | if (identifier == "min_music_length") { 205 | return min_music_length; 206 | } 207 | 208 | if (identifier == "margin") { 209 | return margin; 210 | } 211 | 212 | std::cerr << "WARNING: SpeechMusicSegmenter::getParameter: unknown parameter \"" 213 | << identifier << "\"" << std::endl; 214 | return 0.0; 215 | } 216 | 217 | void 218 | SpeechMusicSegmenter::setParameter(string identifier, float value) 219 | { 220 | if (identifier == "resolution") { 221 | resolution = value; 222 | return; 223 | } 224 | 225 | if (identifier == "change_threshold") { 226 | change_threshold = value; 227 | return; 228 | } 229 | 230 | if (identifier == "decision_threshold") { 231 | decision_threshold = value; 232 | return; 233 | } 234 | 235 | if (identifier == "min_music_length") { 236 | min_music_length = value; 237 | return; 238 | } 239 | 240 | if (identifier == "margin") { 241 | margin = value; 242 | return; 243 | } 244 | 245 | std::cerr << "WARNING: SpeechMusicSegmenter::setParameter: unknown parameter \"" 246 | << identifier << "\"" << std::endl; 247 | } 248 | 249 | SpeechMusicSegmenter::ProgramList 250 | SpeechMusicSegmenter::getPrograms() const 251 | { 252 | ProgramList list; 253 | 254 | // If you have no programs, return an empty list (or simply don't 255 | // implement this function or getCurrentProgram/selectProgram) 256 | 257 | return list; 258 | } 259 | 260 | string 261 | SpeechMusicSegmenter::getCurrentProgram() const 262 | { 263 | return ""; // no programs 264 | } 265 | 266 | void 267 | SpeechMusicSegmenter::selectProgram(string name) 268 | { 269 | } 270 | 271 | SpeechMusicSegmenter::OutputList 272 | SpeechMusicSegmenter::getOutputDescriptors() const 273 |
{ 274 | OutputList list; 275 | 276 | OutputDescriptor segmentation; 277 | segmentation.identifier = "segmentation"; 278 | segmentation.name = "Segmentation"; 279 | segmentation.description = "Segmentation"; 280 | segmentation.unit = "segment-type"; 281 | segmentation.hasFixedBinCount = true; 282 | segmentation.binCount = 1; 283 | segmentation.hasKnownExtents = true; 284 | segmentation.minValue = 0; 285 | segmentation.maxValue = 2; 286 | segmentation.isQuantized = true; 287 | segmentation.quantizeStep = 1; 288 | segmentation.sampleType = OutputDescriptor::VariableSampleRate; 289 | segmentation.sampleRate = 0; // no fixed resolution; getPreferredStepSize() returns 0, so it cannot be used as a divisor 290 | 291 | OutputDescriptor skewness; 292 | skewness.identifier = "skewness"; 293 | skewness.name = "Detection function"; 294 | skewness.description = "Detection function"; 295 | skewness.unit = "segment-type"; 296 | skewness.hasFixedBinCount = true; 297 | skewness.binCount = 1; 298 | skewness.hasKnownExtents = true; 299 | skewness.minValue = 0; 300 | skewness.maxValue = 2; 301 | skewness.isQuantized = true; 302 | skewness.quantizeStep = 1; 303 | skewness.sampleType = OutputDescriptor::VariableSampleRate; 304 | skewness.sampleRate = 0; // no fixed resolution; getPreferredStepSize() returns 0, so it cannot be used as a divisor 305 | 306 | list.push_back(segmentation); 307 | list.push_back(skewness); 308 | 309 | return list; 310 | } 311 | 312 | bool 313 | SpeechMusicSegmenter::initialise(size_t channels, size_t stepSize, size_t blockSize) 314 | { 315 | if (channels < getMinChannelCount() || 316 | channels > getMaxChannelCount()) return false; 317 | 318 | // Real initialisation work goes here!
319 | m_blockSize = blockSize; 320 | 321 | return true; 322 | } 323 | 324 | void 325 | SpeechMusicSegmenter::reset() 326 | { 327 | // Clear buffers, reset stored values, etc 328 | m_zcr.erase(m_zcr.begin(), m_zcr.end()); 329 | m_nframes = 0; 330 | } 331 | 332 | SpeechMusicSegmenter::FeatureSet 333 | SpeechMusicSegmenter::process(const float *const *inputBuffers, Vamp::RealTime timestamp) 334 | { 335 | // Extracting ZCR per frame 336 | size_t i = 1; 337 | double zc = 0.0; 338 | 339 | while (i < m_blockSize) { 340 | if ((inputBuffers[0][i] * inputBuffers[0][i - 1]) < 0) zc += 1; 341 | i += 1; 342 | } 343 | zc /= (m_blockSize - 1); 344 | m_zcr.push_back(zc); 345 | 346 | m_nframes += 1; 347 | 348 | return FeatureSet(); 349 | } 350 | 351 | SpeechMusicSegmenter::FeatureSet 352 | SpeechMusicSegmenter::getRemainingFeatures() 353 | { 354 | FeatureSet features; 355 | vector<double> skewness = getSkewnessFunction(); 356 | double old_mean = 0.0; 357 | int feature_size = 0; 358 | for (int n = 0; n < m_nframes / resolution; n++) { 359 | double mean = 0.0; 360 | for (int i = 0; i < resolution; i++) { 361 | mean += skewness[n * resolution + i]; 362 | } 363 | mean /= resolution; 364 | if ((n > 0 && std::abs(mean - old_mean) > change_threshold) || n == 0) { 365 | Feature feature; feature.hasTimestamp = true; 366 | feature.timestamp = Vamp::RealTime::frame2RealTime((n * resolution + resolution / 2.0) * m_blockSize, static_cast<unsigned int>(m_inputSampleRate)); 367 | vector<float> floatval; 368 | floatval.push_back(mean); 369 | if (mean < decision_threshold) { 370 | feature.label = "Music"; 371 | } else { 372 | feature.label = "Speech"; 373 | } 374 | feature.values = floatval; 375 | if (feature_size == 0 || (feature_size > 0 && feature.label != features[0].back().label)) { 376 | if (feature_size > 0 && features[0].back().label == "Music" && 377 | (feature.timestamp - features[0].back().timestamp < Vamp::RealTime::fromSeconds(min_music_length)) 378 | ) { 379 | features[0].pop_back(); 380 | feature_size -= 1; 381
| } else { 382 | if (feature_size == 0) feature.timestamp = Vamp::RealTime::fromSeconds(0); 383 | features[0].push_back(feature); 384 | feature_size += 1; 385 | } 386 | } 387 | } 388 | old_mean = mean; 389 | } 390 | 391 | for (unsigned int n = 1; n < skewness.size(); n++) { 392 | Feature feature; 393 | feature.hasTimestamp = true; 394 | feature.timestamp = Vamp::RealTime::frame2RealTime(n * m_blockSize, static_cast<unsigned int>(m_inputSampleRate)); 395 | vector<float> floatval; 396 | floatval.push_back(skewness[n]); 397 | feature.values = floatval; 398 | features[1].push_back(feature); 399 | } 400 | 401 | return features; 402 | } 403 | 404 | vector<double> 405 | SpeechMusicSegmenter::getSkewnessFunction() 406 | { 407 | double threshold_d = margin / 1000; 408 | vector<double> skewness; 409 | 410 | double mean_zcr; 411 | for (int n = 0; n < m_nframes; n++) { 412 | int i = 0; 413 | mean_zcr = 0.0; 414 | while (i < resolution && n+i < m_zcr.size()) { 415 | mean_zcr += m_zcr[n + i]; 416 | i += 1; 417 | } 418 | mean_zcr /= resolution; 419 | i = 0; 420 | int above = 0; 421 | int below = 0; 422 | while (i < resolution && n+i < m_zcr.size()) { 423 | if (m_zcr[n + i] > (mean_zcr + threshold_d)) above += 1; 424 | if (m_zcr[n + i] < (mean_zcr - threshold_d)) below += 1; 425 | i += 1; 426 | } 427 | double skewness_value = below - above; 428 | skewness_value /= resolution; 429 | skewness.push_back(skewness_value); 430 | } 431 | return skewness; 432 | } 433 | /// @endcond 434 | -------------------------------------------------------------------------------- /src/SpeechMusicSegmenter.h: -------------------------------------------------------------------------------- 1 | /** 2 | * BBC Vamp plugin collection 3 | * 4 | * Copyright (c) 2011-2013 British Broadcasting Corporation 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | #ifndef _SPEECHMUSIC_PLUGIN_H_ 19 | #define _SPEECHMUSIC_PLUGIN_H_ 20 | 21 | #include <vamp-sdk/Plugin.h> 22 | #include <cmath> 23 | #include <iostream> 24 | #include <vector> 25 | 26 | using std::string; 27 | using std::vector; 28 | 29 | /*! 30 | * \brief Calculates boundaries between speech and music 31 | * 32 | * \section Outputs 33 | * \par Segmentation 34 | * Impulses at the boundary points. 35 | * \par Detection function 36 | * Function used to find boundaries. 37 | * 38 | * \section Parameters 39 | * \par Resolution 40 | * The number of frames defining the window at which candidate changes might 41 | * be found (default = 256) 42 | * \par Change threshold 43 | * The threshold of skewness difference at which a candidate change will be marked 44 | * (default = 0.0781) 45 | * \par Decision threshold 46 | * The threshold used to classify segments as speech or music (default = 0.2734) 47 | * \par Margin 48 | * A parameter for the generation of the ZCR skewness (margin around mean ZCR where 49 | * no ZCR samples will be taken into account) (default = 14) 50 | * \par Minimum music segment length 51 | * Music segments that are shorter than this minimum length will be dismissed 52 | * (default = 0) 53 | * 54 | * \section Description 55 | * 56 | * This Vamp plugin is heavily inspired by the approach described in [1].
57 | * 58 | * The algorithm works as follows: 59 | * 60 | * -# Measure the skewness of the distribution of zero-crossing rate across the audio file; 61 | * -# Find points at which this distribution changes drastically; 62 | * -# For each candidate change point found, classify the corresponding segment as follows: 63 | * - Mean skewness > threshold: speech 64 | * - Mean skewness < threshold: music 65 | * -# If the segment has the same type as the previous one, merge it with 66 | * the previous one. 67 | * 68 | * This is a very early prototype, so it is not very accurate. It is relatively fast 69 | * (around 1s to process a 20 minute file). 70 | * 71 | * \section References 72 | * [1] J. Saunders, "Real-time discrimination of broadcast speech/music," 73 | * IEEE International Conference on Acoustics, Speech, and Signal Processing, 74 | * vol.2, pp.993-999, 7-10 May 1996 75 | */ 76 | class SpeechMusicSegmenter : public Vamp::Plugin 77 | { 78 | public: 79 | /// @cond 80 | SpeechMusicSegmenter(float inputSampleRate); 81 | virtual ~SpeechMusicSegmenter(); 82 | 83 | string getIdentifier() const; 84 | string getName() const; 85 | string getDescription() const; 86 | string getMaker() const; 87 | int getPluginVersion() const; 88 | string getCopyright() const; 89 | 90 | InputDomain getInputDomain() const; 91 | size_t getPreferredBlockSize() const; 92 | size_t getPreferredStepSize() const; 93 | size_t getMinChannelCount() const; 94 | size_t getMaxChannelCount() const; 95 | 96 | ParameterList getParameterDescriptors() const; 97 | float getParameter(string identifier) const; 98 | void setParameter(string identifier, float value); 99 | 100 | ProgramList getPrograms() const; 101 | string getCurrentProgram() const; 102 | void selectProgram(string name); 103 | 104 | OutputList getOutputDescriptors() const; 105 | 106 | bool initialise(size_t channels, size_t stepSize, size_t blockSize); 107 | void reset(); 108 | 109 | FeatureSet process(const float *const *inputBuffers, 110 |
Vamp::RealTime timestamp); 111 | 112 | FeatureSet getRemainingFeatures(); 113 | vector<double> getSkewnessFunction(); 114 | /// @endcond 115 | 116 | protected: 117 | /// @cond 118 | size_t m_blockSize; 119 | /// @endcond 120 | vector<double> m_zcr; 121 | int m_nframes; 122 | int resolution; 123 | double margin; 124 | double change_threshold; 125 | double decision_threshold; 126 | double min_music_length; 127 | }; 128 | 129 | 130 | 131 | #endif 132 | -------------------------------------------------------------------------------- /src/plugins.cpp: -------------------------------------------------------------------------------- 1 | // This is a skeleton file for use in creating your own plugin 2 | // libraries. Replace MyPlugin and myPlugin throughout with the name 3 | // of your first plugin class, and fill in the gaps as appropriate. 4 | 5 | #include <vamp/vamp.h> 6 | #include <vamp-sdk/PluginAdapter.h> 7 | 8 | #include "Energy.h" 9 | #include "Intensity.h" 10 | #include "SpectralFlux.h" 11 | #include "Rhythm.h" 12 | #include "SpectralContrast.h" 13 | #include "SpeechMusicSegmenter.h" 14 | #include "Peaks.h" 15 | 16 | // Declare one static adapter here for each plugin class in this library. 17 | 18 | static Vamp::PluginAdapter<Energy> energy; 19 | static Vamp::PluginAdapter<Intensity> intensity; 20 | static Vamp::PluginAdapter<SpectralFlux> flux; 21 | static Vamp::PluginAdapter<Rhythm> rhythm; 22 | static Vamp::PluginAdapter<SpectralContrast> spectralcontrast; 23 | static Vamp::PluginAdapter<SpeechMusicSegmenter> speechMusicSegmenter; 24 | static Vamp::PluginAdapter<Peaks> peaks; 25 | 26 | // This is the entry-point for the library, and the only function that 27 | // needs to be publicly exported. 28 | 29 | const VampPluginDescriptor * 30 | vampGetPluginDescriptor(unsigned int version, unsigned int index) { 31 | if (version < 1) 32 | return 0; 33 | 34 | // Return a different plugin adaptor's descriptor for each index, 35 | // and return 0 for the first index after you run out of plugins. 36 | // (That's how the host finds out how many plugins are in this 37 | // library.)
38 | 39 | switch (index) { 40 | case 0: 41 | return energy.getDescriptor(); 42 | case 1: 43 | return intensity.getDescriptor(); 44 | case 2: 45 | return flux.getDescriptor(); 46 | case 3: 47 | return rhythm.getDescriptor(); 48 | case 4: 49 | return spectralcontrast.getDescriptor(); 50 | case 5: 51 | return speechMusicSegmenter.getDescriptor(); 52 | case 6: 53 | return peaks.getDescriptor(); 54 | default: 55 | return 0; 56 | } 57 | } 58 | 59 | -------------------------------------------------------------------------------- /src/vamp-plugin.list: -------------------------------------------------------------------------------- 1 | _vampGetPluginDescriptor 2 | -------------------------------------------------------------------------------- /src/vamp-plugin.map: -------------------------------------------------------------------------------- 1 | { 2 | global: vampGetPluginDescriptor; 3 | local: *; 4 | }; 5 | --------------------------------------------------------------------------------
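The detection function documented in src/SpeechMusicSegmenter.h (per-block zero-crossing rate, then a skewness count over a sliding window) can be tried outside the Vamp SDK. The following is a minimal standalone sketch, not the plugin's code: the function names zeroCrossingRate and zcrSkewness are hypothetical, and the margin argument is assumed to be already scaled (the plugin divides its "margin" parameter by 1000 before use).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction of adjacent sample pairs whose signs differ.
// Mirrors the per-block computation in SpeechMusicSegmenter::process().
double zeroCrossingRate(const std::vector<float>& block) {
    if (block.size() < 2) return 0.0;
    std::size_t crossings = 0;
    for (std::size_t i = 1; i < block.size(); ++i) {
        if (block[i] * block[i - 1] < 0) ++crossings;
    }
    return static_cast<double>(crossings) / (block.size() - 1);
}

// Hypothetical helper: for each frame n, inspect the window
// [n, n + resolution) of ZCR values and return
// (count below mean - margin) - (count above mean + margin), over resolution.
// As in getSkewnessFunction(), windows truncated at the end of the
// input are still divided by the full resolution.
std::vector<double> zcrSkewness(const std::vector<double>& zcr,
                                std::size_t resolution,
                                double margin) {
    assert(resolution > 0);
    std::vector<double> out;
    out.reserve(zcr.size());
    for (std::size_t n = 0; n < zcr.size(); ++n) {
        const std::size_t end = std::min(n + resolution, zcr.size());

        double mean = 0.0;
        for (std::size_t i = n; i < end; ++i) mean += zcr[i];
        mean /= static_cast<double>(resolution);

        int above = 0;
        int below = 0;
        for (std::size_t i = n; i < end; ++i) {
            if (zcr[i] > mean + margin) ++above;
            if (zcr[i] < mean - margin) ++below;
        }
        out.push_back(static_cast<double>(below - above) /
                      static_cast<double>(resolution));
    }
    return out;
}
```

With the plugin's defaults this would be called as zcrSkewness(zcr, 256, 14 / 1000.0). In the plugin, the mean of this function over each window of "resolution" frames is then compared against the decision threshold: values above it are labelled speech, values below it music.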