├── LICENSE.md
├── README.md
├── getAudioFingerprinter.m
├── getEnergy.m
├── getLMSPredictor.m
├── getLandmarks.m
├── getLocalMaximum.m
├── getSpecgram.m
├── getZCR.m
└── pic
    ├── LMSPredictor.JPG
    ├── ZCR.JPG
    ├── ZeroCrossingRate.png
    ├── audiofingerprinter.jpg
    ├── bitDerivation.JPG
    ├── fingerprinter.png
    ├── getEnergy.png
    ├── landmark-2d.png
    ├── landmark-3d.png
    ├── list
    ├── sgn.JPG
    ├── short-time-energy.png
    └── specgram.png


/LICENSE.md:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 DandelionLau

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AudioProcessing
A toolbox for audio processing in MATLAB. The MATLAB version used is R2016a.
The audio samples in this toolbox are sampled at 8000 Hz with 16 bits per sample.

## 1. getAudioFingerprinter
+ [getAudioFingerprinter](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getAudioFingerprinter.m) calculates an audio fingerprint from the energy differences between adjacent frequency sub-bands, as shown below.
![Extraction process](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/audiofingerprinter.jpg)

+ Specifically, the extraction process includes the following steps:
    1. divide the audio into frames; the overlap length is adjustable
    2. compute the FFT of each frame
    3. divide the frequency spectrum into 33 sub-bands (the 32 differences between adjacent sub-bands give 32 fingerprint bits per frame)
    4. calculate the energy of each sub-band
    5. calculate the audio fingerprint as follows
    ![Bit Derivation](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/bitDerivation.JPG)

+ The result is shown below
![printer](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/fingerprinter.png)

## 2. getZCR
+ [getZCR](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getZCR.m) calculates the zero-crossing rate, which can be used as a rough estimate of the signal's frequency.
+ Specifically, the extraction process includes the following steps:
    1. divide the audio into frames
    2. calculate the ZCR of each frame using the following equation
    ![ZCR](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/ZCR.JPG)
    where sgn(x) is given by
    ![sgn(x)](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/sgn.JPG)

+ The result is shown below
![zcrpic](https://github.com/DandelionLau/AudioProcessing/blob/master/pic/ZeroCrossingRate.png)

## 3. getLandmarks
+ [getLandmarks](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getLandmarks.m) calculates the maximum-energy points (also called landmarks) in the frequency spectrum.

+ Specifically, the extraction process includes the following steps:
    1. divide the audio into frames; the overlap length is adjustable
    2. compute the FFT of each frame
    3. calculate the energy (log-magnitude in dB) of each frequency bin
    4. take the local maxima as landmarks
+ The result is shown below
![landmark-2d](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/pic/landmark-2d.png)
![landmark-3d](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/pic/landmark-3d.png)

## 4. getEnergy
+ [getEnergy](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getEnergy.m) calculates the short-time energy.
+ Specifically, the extraction process includes the following steps:
    1. divide the audio into frames (the code uses non-overlapping frames of 240 samples)
    2. calculate the energy of each frame using the following equation:
    ![stenergy](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/pic/short-time-energy.png)

+ The result is shown below
![getEnergy](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/pic/getEnergy.png)

## 5. getSpecgram
+ [getSpecgram](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getSpecgram.m) calculates the spectrogram.
+ Specifically, the extraction process includes the following steps:
    1. divide the signal into frames in the time domain
    2. compute the FFT of each frame
+ The result is shown below
![specgram](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/pic/specgram.png)

## 6. getLMSPredictor
+ [getLMSPredictor](https://github.com/DandelionLau/AudioProcessing-toolbox/blob/master/getLMSPredictor.m) trains a least mean square (LMS) adaptive predictor.
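
The recursion in `getLMSPredictor.m` can be written out as equations. What follows is only a transcription of the code as it appears in this repository, not a derivation from an external reference. The symbols $x$, $\hat{x}$, $k_i$, $C_i$ and $V_i$ are notation chosen here for `Xrec`, `Xest`, `K1`/`K2`, `COR0`/`COR1` and `VAR0`/`VAR1`; they do not appear in the source.

```latex
\begin{aligned}
k_1(n) &= b\,\frac{C_0(n-1)}{V_0(n-1)}, \qquad
k_2(n) = b\,\frac{C_1(n-1)}{V_1(n-1)} \\
\hat{x}(n) &= k_1(n)\,r_0(n-1) + k_2(n)\,r_1(n-1) \\
e_0(n) &= x(n), \qquad
e_1(n) = e_0(n) - k_1(n)\,r_0(n-1) \\
V_i(n) &= u\,V_i(n-1) + \tfrac{1}{2}\bigl(r_i(n-1)^2 + e_i(n)^2\bigr), \quad i \in \{0,1\} \\
C_i(n) &= u\,C_i(n-1) + r_i(n-1)\,e_i(n), \quad i \in \{0,1\} \\
r_1(n) &= a\,\bigl(r_0(n-1) - k_1(n)\,e_0(n)\bigr), \qquad
r_0(n) = a\,e_0(n) \\
\mathrm{err}(n) &= x(n) - \hat{x}(n)
\end{aligned}
```

In the code, $a = b = 0.953125$ and $u = 0.90625$, the initial values are $C_i = 0$, $V_i = 1$ and $r_i = 0$, and the recursion starts at $n = 2$. The prediction $\hat{x}(n)$ depends only on quantities from step $n-1$, so it is available before $x(n)$ is observed.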

--------------------------------------------------------------------------------
/getAudioFingerprinter.m:
--------------------------------------------------------------------------------
function [F] = getAudioFingerprinter(path)
% path: the path of the audio sample
% F:    the audio fingerprint of the sample, one row of 32 bits per frame
% Note: enframe is not a MATLAB built-in; it is assumed here to be the
%       enframe function from the VOICEBOX toolbox.
[data,f] = audioread(path);
data = filter([1 -0.9375],1,data);           % pre-emphasis
signal = data(1:3*f);                        % take the first 3 s of the sample
tframe = enframe(signal,hanning(2048),80);   % frame in the time domain: Hanning window of 2048 samples, frame increment of 80 samples
tn = size(tframe,1);
E = zeros(tn,33);
F = zeros(tn,32);
for i = 1:tn
    Y = fft(tframe(i,:));
    L = length(tframe(i,:));
    P2 = abs(Y/L);
    P1 = P2(1:L/2+1);                        % single-sided amplitude spectrum
    P1(2:end-1) = 2*P1(2:end-1);
    fframe = enframe(P1,fix(length(P1)/33)); % frame in the frequency domain: 33 sub-bands in total
    fn = size(fframe,1);
    for j = 1:fn
        E(i,j) = sum(fframe(j,:).^2);        % energy of each sub-band
    end
end

for n = 2:tn
    for m = 1:32
        % fingerprint bit: sign of the change, between consecutive frames,
        % of the energy difference between adjacent sub-bands
        if E(n,m)-E(n,m+1)-(E(n-1,m)-E(n-1,m+1)) > 0
            F(n,m) = 1;
        else
            F(n,m) = 0;
        end
    end
end

end

--------------------------------------------------------------------------------
/getEnergy.m:
--------------------------------------------------------------------------------
function [E] = getEnergy(path)
% path: the path of the audio sample
% E:    the short-time energy of each frame
signal = audioread(path);
framelength = 240;                               % window length in samples
framenumber = fix(length(signal)/framelength);
E = zeros(1,framenumber);                        % preallocate the output
for i = 1:framenumber                            % split into non-overlapping frames
    framesignal = signal((i-1)*framelength+1:i*framelength);
    E(i) = sum(framesignal.^2);                  % energy of the frame
end

--------------------------------------------------------------------------------
/getLMSPredictor.m:
--------------------------------------------------------------------------------
function [Xest,Xerr] = getLMSPredictor(Xrec)
% Xrec : the input signal
% Xest : the predicted signal
% Xerr : the prediction error
N = length(Xrec);
a = 0.953125;            % attenuation factor applied to the states r0 and r1
b = 0.953125;            % attenuation factor applied to the coefficients K1 and K2
u = 0.90625;             % forgetting factor for the VAR and COR estimates
% preallocate the state and output vectors
r0 = zeros(1,N);
r1 = zeros(1,N);
COR0 = zeros(1,N);
COR1 = zeros(1,N);
VAR0 = ones(1,N);
VAR1 = ones(1,N);
e0 = zeros(1,N);
e1 = zeros(1,N);
K1 = zeros(1,N);
K2 = zeros(1,N);
Xest = zeros(1,N);
Xerr = zeros(1,N);
for n = 2:N
    K1(n) = b*COR0(n-1)/VAR0(n-1);
    K2(n) = b*COR1(n-1)/VAR1(n-1);
    Xest(n) = K1(n)*r0(n-1) + K2(n)*r1(n-1);
    % update coefficients
    e0(n) = Xrec(n);
    e1(n) = e0(n) - K1(n)*r0(n-1);
    VAR0(n) = u*VAR0(n-1) + 0.5*(r0(n-1)^2+e0(n)^2);
    COR0(n) = u*COR0(n-1) + r0(n-1)*e0(n);
    VAR1(n) = u*VAR1(n-1) + 0.5*(r1(n-1)^2+e1(n)^2);
    COR1(n) = u*COR1(n-1) + r1(n-1)*e1(n);
    r1(n) = a*(r0(n-1)-K1(n)*e0(n));
    r0(n) = a*e0(n);
    Xerr(n) = Xrec(n) - Xest(n);
end
end

--------------------------------------------------------------------------------
/getLandmarks.m:
--------------------------------------------------------------------------------
function [landmark] = getLandmarks(path)
% path:     the path of the audio sample
% landmark: one row per landmark, formatted as [frame index, frequency-bin index, value in dB]
data = audioread(path);
data = filter([1 -0.9375],1,data);           % pre-emphasis
signal = data(1:8000);                       % take 1 s of audio (assumes an 8000 Hz sampling rate)
tframe = enframe(signal,hanning(1024),80);   % frame in the time domain: Hanning window of 1024 samples, frame increment of 80 samples
[tn,tm] = size(tframe);
X = zeros(tn,tm/2+1);
for i = 1:tn
    Y = fft(tframe(i,:));
    L = length(tframe(i,:));
    P2 = 20*log10(abs(Y/L));                 % magnitude spectrum in dB
    P1 = P2(1:L/2+1);                        % keep the non-negative frequencies
    X(i,:) = P1;
end
% find the landmarks
landmark = getLocalMaximum(X,32,64,1);

--------------------------------------------------------------------------------
/getLocalMaximum.m:
--------------------------------------------------------------------------------
function [local] = getLocalMaximum(M,m1,m2,np)
% Find the np largest values in each m1-by-m2 block of a matrix.
% M      : the target matrix
% m1, m2 : the number of rows and columns of each local block
% np     : the number of maxima to keep per block
% local  : one row per maximum, formatted as [row, column, value]
[a,b] = size(M);
local = zeros(fix(a/m1*b/m2)*np,3);
n = 0;
for x = 1:m1/2:a                  % blocks overlap by half a block along the rows
    for y = 1:m2:b
        if x+m1-1 <= a && y+m2-1 <= b
            subMatrix = M(x:x+m1-1,y:y+m2-1);
            [s,idx] = sort(subMatrix(:),'descend');
            for k = 1:np
                [r,c] = ind2sub([m1,m2],idx(k));   % position inside the block
                n = n+1;
                local(n,:) = [x+r-1,y+c-1,s(k)];   % position in M
            end
        end
    end
end
local = local(1:n,:);             % drop unused preallocated rows

--------------------------------------------------------------------------------
/getSpecgram.m:
--------------------------------------------------------------------------------
function [X] = getSpecgram(path)
% path: the path of the audio sample
% X:    the magnitude spectrogram, one row per frame
signal = audioread(path);
signal = filter([1 -0.9375],1,signal);        % pre-emphasis
tframe = enframe(signal,hanning(2048),160);   % frame in the time domain: Hanning window of 2048 samples, frame increment of 160 samples
[tn,tm] = size(tframe);
X = zeros(tn,tm/2+1);
for i = 1:tn
    Y = fft(tframe(i,:));
    L = length(tframe(i,:));
    P2 = abs(Y);                              % magnitude spectrum
    P1 = P2(1:L/2+1);                         % keep the non-negative frequencies
    X(i,:) = P1;
end

--------------------------------------------------------------------------------
/getZCR.m:
--------------------------------------------------------------------------------
function zcr = getZCR(wavName)
% wavName: the path of the audio sample
% zcr:     the zero-crossing rate of each frame
signal = audioread(wavName);
framelength = 240;                            % the length of each frame
framenumber = fix(length(signal)/framelength);
zcr = zeros(1,framenumber);                   % preallocate the output
for i = 1:framenumber                         % split into non-overlapping frames
    framesignal = signal((i-1)*framelength+1:i*framelength);   % take one frame
    for j = 2:framelength                     % count the zero crossings
        zcr(i) = zcr(i) + abs(sgn(framesignal(j))-sgn(framesignal(j-1)));
    end
end
zcr = zcr/(2*framelength);                    % normalize by the frame length
end

function s = sgn(x)
% Assumed definition (the repository defines sgn only in pic/sgn.JPG):
% 1 for x >= 0, -1 otherwise
if x >= 0
    s = 1;
else
    s = -1;
end
end

--------------------------------------------------------------------------------
/pic/LMSPredictor.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/LMSPredictor.JPG

--------------------------------------------------------------------------------
/pic/ZCR.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/ZCR.JPG

--------------------------------------------------------------------------------
/pic/ZeroCrossingRate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/ZeroCrossingRate.png

--------------------------------------------------------------------------------
/pic/audiofingerprinter.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/audiofingerprinter.jpg

--------------------------------------------------------------------------------
/pic/bitDerivation.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/bitDerivation.JPG

--------------------------------------------------------------------------------
/pic/fingerprinter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/fingerprinter.png

--------------------------------------------------------------------------------
/pic/getEnergy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/getEnergy.png

--------------------------------------------------------------------------------
/pic/landmark-2d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/landmark-2d.png

--------------------------------------------------------------------------------
/pic/landmark-3d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/landmark-3d.png

--------------------------------------------------------------------------------
/pic/list:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/pic/sgn.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/sgn.JPG

--------------------------------------------------------------------------------
/pic/short-time-energy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/short-time-energy.png

--------------------------------------------------------------------------------
/pic/specgram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ryuk17/AudioProcessing-toolbox/1c46a68d22ca481a1f84e44ea5386044468ba3d4/pic/specgram.png

--------------------------------------------------------------------------------