├── .gitignore
├── README.md
├── Room.py
├── SoundSource.py
├── beamforming.py
├── bin
│   └── README.md
├── constants.py
├── figure_Measures1.py
├── figure_Measures2.py
├── figure_SumNorm.py
├── figure_beam_scenarios.py
├── figure_filter_avg_ir.py
├── figure_quality.sh
├── figure_quality_plot.py
├── figure_quality_sim.py
├── figure_spectrograms.py
├── figures
│   ├── README.md
│   ├── beam_scenarios.png
│   └── spectrograms.png
├── make_all_figures.sh
├── metrics.py
├── output_samples
│   ├── README.md
│   ├── input_mic.wav
│   ├── output_maxsinr.wav
│   └── output_rake-maxsinr.wav
├── phat.py
├── samples
│   ├── Homer.wav
│   ├── fq_sample1_8000.wav
│   ├── fq_sample2_8000.wav
│   ├── german_speech.wav
│   ├── german_speech_44100.wav
│   ├── german_speech_8000.wav
│   ├── noreverb.wav
│   ├── singing.wav
│   ├── singing_16000.wav
│   ├── singing_44100.wav
│   ├── singing_8000.wav
│   ├── speech.wav
│   └── sputnk1b.wav
├── sim_data
│   ├── README.md
│   └── fig10
│       ├── quality_20150109-070951.npz
│       ├── quality_20150109-095429.npz
│       └── quality_20150109-201321.npz
├── stft.py
├── trinicon.py
├── utilities.py
├── wav_resample.py
└── windows.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.swp
3 | output_samples/fq*
4 | *.npz
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Raking the Cocktail Party
2 | =========================
3 |
4 | This repository contains all the code to reproduce the results of the paper
5 | [*Raking the Cocktail Party*](http://infoscience.epfl.ch/record/200336).
6 |
7 | We created a simple object-oriented Python framework for the simulation of
8 | room acoustics and use it to perform the numerical experiments related to
9 | this paper. All the figures and sound samples can be recreated by calling
10 | simple scripts that leverage this framework. We strongly hope that this code
11 | will be useful beyond the scope of this paper and plan to develop it into a
12 | standalone Python package in the future.
13 |
14 | We are happy to answer any question or request relating to either the code
15 | or the theory behind it. Just ask!
16 |
17 | Abstract
18 | --------
19 |
20 | We present the concept of an acoustic rake receiver (ARR) — a microphone
21 | beamformer that uses echoes to improve the noise and interference suppression.
22 | The rake idea is well-known in wireless communications. It involves
23 | constructively combining different multipath components that arrive at the
24 | receiver antennas. Unlike typical spread-spectrum signals used in wireless
25 | communications, speech signals are not orthogonal to their shifts, which makes
26 | acoustic raking a more challenging problem. That is why the correct way to
27 | think about it is spatial. Instead of explicitly estimating the channel, we
28 | create correspondences between early echoes in time and image sources in space.
29 | These multiple sources of the desired and interfering signals offer additional
30 | spatial diversity that we can exploit in the beamformer design.
31 |
32 | We present several "intuitive" and optimal formulations of ARRs, and show
33 | theoretically and numerically that the rake formulation of the maximum
34 | signal-to-interference-and-noise beamformer offers significant performance
35 | boosts in terms of noise suppression and interference cancellation. We
36 | accompany the paper by the complete simulation and processing chain written in
37 | Python.
38 |
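In the notation of the code (method `rakeMaxSINRWeights` in `beamforming.py`),
with a_s the sum of the steering vectors of the desired source and its image
sources, a_q the same quantity for the interferer, and R_n the noise
covariance, the Rake-Max-SINR weights are, per frequency,

    w = \frac{R_{nq}^{-1} a_s}{a_s^H R_{nq}^{-1} a_s},
    \qquad R_{nq} = a_q a_q^H + R_n .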
39 |
40 | Authors
41 | -------
42 |
43 | Ivan Dokmanić, Robin Scheibler, and Martin Vetterli are with the
44 | Laboratory for Audiovisual Communications ([LCAV](http://lcav.epfl.ch)) at
45 | [EPFL](http://www.epfl.ch).
46 |
47 |
48 |
49 | #### Contact
50 |
51 | [Ivan Dokmanić](mailto:ivan[dot]dokmanic[at]epfl[dot]ch)
52 | EPFL-IC-LCAV
53 | BC Building
54 | Station 14
55 | 1015 Lausanne
56 |
57 |
58 | Selected results from the paper
59 | -------------------------------
60 |
61 | ### Spectrograms and Sound Samples
62 |
63 | ![Spectrograms](figures/spectrograms.png)
64 | 
65 | Comparison of the conventional Max-SINR and Rake-Max-SINR beamformer on a real
66 | speech sample. Spectrograms of (A) clean signal of interest, (B) signal
67 | corrupted by an interferer and additive white Gaussian noise at the microphone
68 | input, outputs of (C) conventional Max-SINR and (D) Rake-Max-SINR beamformers.
69 | Time naturally goes from left to right, and frequency increases from zero at
70 | the bottom up to Fs/2. To highlight the improvement of Rake-Max-SINR over
71 | Max-SINR, we blow up three parts of the spectrograms in the lower part of the
72 | figure. The boxes and the corresponding part of the original spectrogram are
73 | numbered in (A). The numbering is the same but omitted in the rest of the
74 | figure for clarity.
75 |
76 | The corresponding sound samples:
77 |
78 | * [A](https://github.com/LCAV/AcousticRakeReceiver/raw/master/samples/singing_8000.wav) Desired signal.
79 | * [B](https://github.com/LCAV/AcousticRakeReceiver/raw/master/output_samples/input_mic.wav) Simulated microphone input signal.
80 | * [C](https://github.com/LCAV/AcousticRakeReceiver/raw/master/output_samples/output_maxsinr.wav) Output of conventional Max-SINR beamformer.
81 | * [D](https://github.com/LCAV/AcousticRakeReceiver/raw/master/output_samples/output_rake-maxsinr.wav) Output of proposed Rake-Max-SINR beamformer.
82 |
83 | ### Beam Patterns
84 |
85 | ![Beam patterns](figures/beam_scenarios.png)
86 |
87 | Beam patterns in different scenarios. The rectangular room is 4 by 6 metres and
88 | contains a source of interest (•) and an interferer (✭) ((B), (C), (D) only).
89 | The first order image sources are also displayed. The weight computation of the
90 | beamformer includes the direct source and the first order image sources of both
91 | desired source and interferer (when applicable). (A) Rake-Max-SINR, no
92 | interferer, (B) Rake-Max-SINR, one interferer, (C) Rake-Max-UDR, one
93 | interferer, (D) Rake-Max-SINR, interferer is in direct path.
94 |
95 | Dependencies
96 | ------------
97 |
98 | * A working distribution of [Python 2.7](https://www.python.org/downloads/).
99 | * The code relies heavily on [Numpy](http://www.numpy.org/), [Scipy](http://www.scipy.org/), and [matplotlib](http://matplotlib.org).
100 | * We use the [Anaconda](https://store.continuum.io/cshop/anaconda/) distribution to simplify the setup of the environment.
101 |
102 | ### PESQ Tool
103 |
104 | Download the [source files](http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en) of the ITU P.862
105 | compliance tool from the ITU website.
106 |
107 | #### Unix compilation (Linux/Mac OS X)
108 |
109 | Execute the following sequence of commands to download and unpack the source code.
110 |
111 | mkdir PESQ
112 | cd PESQ
113 | wget 'https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-P.862-200511-I!Amd2!SOFT-ZST-E&type=items'
114 | unzip dologin_pub.asp\?lang\=e\&id\=T-REC-P.862-200511-I\!Amd2\!SOFT-ZST-E\&type\=items
115 | cd Software
116 | unzip 'P862_annex_A_2005_CD wav final.zip'
117 | cd P862_annex_A_2005_CD/source/
118 |
119 | In the `Software/P862_annex_A_2005_CD/source/` directory, create a file called `Makefile` and copy
120 | the following into it.
121 |
122 | CC=gcc
123 | CFLAGS=-O2
124 |
125 | OBJS=dsp.o pesqdsp.o pesqio.o pesqmod.o pesqmain.o
126 | DEPS=dsp.h pesq.h pesqpar.h
127 |
128 | %.o: %.c $(DEPS)
129 | $(CC) -c -o $@ $< $(CFLAGS)
130 |
131 | pesq: $(OBJS)
132 | $(CC) -o $@ $^ $(CFLAGS)
133 |
134 | .PHONY : clean
135 | clean :
136 | -rm pesq $(OBJS)
137 |
138 | Compile by typing the following.
139 |
140 | make pesq
141 |
142 | Finally, move the `pesq` binary to the repository's `bin` folder.
143 |
144 | Notes:
145 | * The input files to the pesq utility must be 16-bit PCM wav files.
146 | * File names longer than 14 characters (suffix included) cause the utility to
147 | crash with the message `Abort trap(6)` or similar.
148 |
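For reference, a minimal sketch of how the compiled tool can be called from
Python. The file names here are illustrative, and `+8000` selects the
sampling rate; check the usage string printed by the binary for the exact
syntax.

    import subprocess

    # hypothetical example: compare a reference and a degraded file at 8 kHz
    out = subprocess.check_output(['./bin/pesq', '+8000', 'ref.wav', 'deg.wav'])
    print out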
149 | #### Windows compilation
150 |
151 | 1. Open Visual Studio, create a new project from existing files, and select the directory
152 |    containing the source code of PESQ (`Software\P862_annex_A_2005_CD\source\`).
153 |
154 | FILE -> New -> Project From Existing Code...
155 |
156 | 2. Select `Visual C++` from the dropdown menu, then click Next.
157 |     * *Project file location* : the directory containing the source code of pesq (`Software\P862_annex_A_2005_CD\source\`).
158 |     * *Project Name* : pesq
159 |     * Then click Next.
160 |     * As *project type*, select `Console application` project.
161 |     * Then click Finish.
162 |
163 | 3. Go to
164 |
165 | BUILD -> Configuration Manager...
166 |
167 |    and change the active solution configuration from `Debug` to `Release`. Then close.
168 |
169 | 4. Then
170 |
171 | BUILD -> Build Solution
172 |
173 | 5. Copy the executable `Release\pesq.exe` to the bin folder.
174 |
175 | *(tested with Microsoft Windows Server 2012)*
176 |
177 | Recreate the figures and sound samples
178 | --------------------------------------
179 |
180 | In a UNIX terminal, run the following script.
181 |
182 | ./make_all_figures.sh
183 |
184 | Alternatively, type the following commands in an ipython shell.
185 |
186 | run figure_spectrograms.py
187 | run figure_beam_scenarios.py
188 | run figure_Measures1.py
189 | run figure_Measures2.py
190 | run figure_SumNorm.py
191 | run figure_quality_sim.py -s 10000
192 | run figure_quality_plot.py
193 |
194 | The figures and sound samples generated are collected in `figures` and
195 | `output_samples`, respectively.
196 |
197 | The script `figure_quality_sim.py` is computationally very heavy. Above, 10000
198 | is the number of Monte-Carlo loops; this number can be decreased when testing
199 | the code. It is also possible to run it in parallel. Open a shell and type the
200 | following.
201 |
202 |     ipcluster start -n <N>
203 |     ipython figure_quality_sim.py 10000
204 | 
205 | On the first line, we start `<N>` ipython workers. Notice that we omit the `-s`
206 | option on the second line. This will run `<N>` parallel jobs.
207 | Be sure to *deactivate* the MKL extensions if you have them enabled to make sure
208 | you get maximum efficiency.
209 |
210 | License
211 | -------
212 |
213 | Copyright (c) 2014, Ivan Dokmanić, Robin Scheibler, Martin Vetterli
214 |
215 | This code is free to reuse for non-commercial purposes such as academic or
216 | educational use. For any other use, please contact the authors.
217 |
218 | 
Acoustic Rake Receiver by Ivan Dokmanić, Robin Scheibler, Martin Vetterli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://github.com/LCAV/AcousticRakeReceiver.
219 |
220 |
--------------------------------------------------------------------------------
/Room.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 |
4 | import beamforming as bf
5 | from SoundSource import SoundSource
6 |
7 | import constants
8 |
9 | '''
10 | Room
11 | A room geometry is defined by all the sources and all their images
12 | '''
13 |
14 | class Room(object):
15 |
16 | def __init__(
17 | self,
18 | corners,
19 | Fs,
20 | t0=0.,
21 | absorption=1.,
22 | max_order=1,
23 | sigma2_awgn=None,
24 | sources=None,
25 | mics=None):
26 |
27 | # make sure we have an ndarray of the right size
28 | corners = np.array(corners)
29 |         if (corners.ndim != 2):
30 |             raise NameError('Room corners must be a 2D array.')
31 |
32 | # make sure the corners are anti-clockwise
33 | if (self.area(corners) <= 0):
34 | raise NameError('Room corners must be anti-clockwise')
35 |
36 | self.corners = corners
37 | self.dim = corners.shape[0]
38 |
39 | # sampling frequency and time offset
40 | self.Fs = Fs
41 | self.t0 = t0
42 |
43 | # circular wall vectors (counter clockwise)
44 | self.walls = self.corners - \
45 | self.corners[:, xrange(-1, corners.shape[1] - 1)]
46 |
47 | # compute normals (outward pointing)
48 | self.normals = self.walls[[1, 0], :]/np.linalg.norm(self.walls, axis=0)[np.newaxis,:]
49 | self.normals[1, :] *= -1;
50 |
51 | # list of attenuation factors for the wall reflections
52 | absorption = np.array(absorption, dtype='float64')
53 | if (absorption.ndim == 0):
54 | self.absorption = absorption * np.ones(self.corners.shape[1])
55 | elif (absorption.ndim > 1 or self.corners.shape[1] != len(absorption)):
56 |             raise NameError('Absorption and corners must be the same size')
57 | else:
58 | self.absorption = absorption
59 |
60 | # a list of sources
61 | if (sources is None):
62 | self.sources = []
63 |         elif (isinstance(sources, list)):
64 |             self.sources = sources
65 | else:
66 | raise NameError('Room needs a source or list of sources.')
67 |
68 |         # a microphone array
69 |         if (mics is None):
70 |             self.micArray = None
71 |         else:
72 |             self.micArray = mics
73 |
74 |         # maximum order for image source computation
75 | self.max_order = max_order
76 |
77 | # pre-compute RIR if needed
78 | if (len(self.sources) > 0 and self.micArray is not None):
79 | self.compute_RIR()
80 | else:
81 | self.rir = []
82 |
83 |         # ambient additive white Gaussian noise level
84 | self.sigma2_awgn = sigma2_awgn
85 |
86 |
87 | def plot(self, img_order=None, freq=None, **kwargs):
88 |
89 | import matplotlib
90 | from matplotlib.patches import Circle, Wedge, Polygon
91 | from matplotlib.collections import PatchCollection
92 | import matplotlib.pyplot as plt
93 |
94 | # get current figure and axis
95 | fig = plt.gcf()
96 | ax = plt.gca()
97 |
98 | # we always want equal aspect ratio
99 | ax.set_aspect('equal')
100 |
101 | # set the properties of the plot
102 | for key in kwargs:
103 | plt.setp(ax, key, kwargs[key])
104 |
105 | # draw room
106 | polygons = [Polygon(self.corners.T, True)]
107 | p = PatchCollection(polygons, cmap=matplotlib.cm.jet,
108 | facecolor=np.array([1,1,1]), edgecolor=np.array([0,0,0]))
109 | ax.add_collection(p)
110 |
111 | # draw the microphones
112 | if (self.micArray is not None):
113 | for mic in self.micArray.R.T:
114 | ax.scatter(mic[0], mic[1],
115 | marker='x', linewidth=0.5, s=2, c='k')
116 |
117 | # draw the beam pattern of the beamformer if requested (and
118 | # available)
119 | if freq is not None \
120 | and type(self.micArray) is bf.Beamformer \
121 | and self.micArray.weights is not None:
122 |
123 | freq = np.array(freq)
124 |             if freq.ndim == 0:
125 | freq = np.array([freq])
126 |
127 | # define a new set of colors for the beam patterns
128 | newmap = plt.get_cmap('autumn')
129 | desat = 0.7
130 | ax.set_color_cycle([newmap(k) for k in desat*np.linspace(0,1,len(freq))])
131 |
132 |
133 | phis = np.arange(360) * 2 * np.pi / 360.
134 | newfreq = np.zeros(freq.shape)
135 | H = np.zeros((len(freq), len(phis)), dtype=complex)
136 | for i,f in enumerate(freq):
137 | newfreq[i], H[i] = self.micArray.response(phis, f)
138 |
139 | # normalize max amplitude to one
140 | H = np.abs(H)**2/np.abs(H).max()**2
141 |
142 | # a normalization factor according to room size
143 | norm = np.linalg.norm(
144 | (self.corners - self.micArray.center),
145 | axis=0).max()
146 |
147 | # plot all the beam patterns
148 | i = 0
149 | for f,h in zip(newfreq, H):
150 | x = np.cos(phis) * h * norm + self.micArray.center[0, 0]
151 | y = np.sin(phis) * h * norm + self.micArray.center[1, 0]
152 | l = ax.plot(x, y, '-', linewidth=1.0)
153 | #lbl = '%.2f' % f
154 | #i0 = i*360/len(freq)
155 | #ax.text(x[i0], y[i0], lbl, color=plt.getp(l[0], 'color'))
156 | #i += 1
157 |
158 | #ax.legend(freq)
159 |
160 | # define some markers for different sources and colormap for damping
161 | markers = ['o', '$\mathbf{+}$', '*', 'v', 's', '.']
162 | cmap = plt.get_cmap('YlGnBu')
163 | # draw the scatter of images
164 | for i, source in enumerate(self.sources):
165 | # draw source
166 | ax.scatter(
167 | source.position[0],
168 | source.position[1],
169 | c=cmap(1.),
170 | s=20,
171 | marker=markers[
172 | i %
173 | len(markers)],
174 | edgecolor=cmap(1.))
175 | #ax.text(source.position[0]+0.1, source.position[1]+0.1, str(i))
176 |
177 | # draw images
178 | if (img_order is None):
179 | img_order = self.max_order
180 | for o in xrange(img_order):
181 | # map the damping to a log scale (mapping 1 to 1)
182 | val = (np.log2(source.damping[o]) + 10.) / 10.
183 | # plot the images
184 | ax.scatter(source.images[o][0, :], source.images[o][1,:], \
185 | c=cmap(val), s=20,
186 | marker=markers[i % len(markers)], edgecolor=cmap(val))
187 |
188 | # keep axis equal, or the symmetry is lost
189 | #ax.axis('equal')
190 |
191 | def plotRIR(self):
192 |
193 |         if self.rir is None or len(self.rir) == 0:
194 | self.compute_RIR()
195 |
196 | import matplotlib.pyplot as plt
197 |
198 | M = self.micArray.M
199 | S = len(self.sources)
200 | for r in xrange(M):
201 | for s in xrange(S):
202 | h = self.rir[r][s]
203 | plt.subplot(M, S, r*S + s + 1)
204 | plt.plot(np.arange(len(h)) / float(self.Fs), h)
205 | plt.title('RIR: mic'+str(r)+' source'+str(s))
206 | if r == M-1:
207 | plt.xlabel('Time [s]')
208 |
209 |
210 | def addMicrophoneArray(self, micArray):
211 | self.micArray = micArray
212 |
213 | def addSource(self, position, signal=None, delay=0):
214 |
215 | # generate first order images
216 | i, d = self.firstOrderImages(np.array(position))
217 | images = [i]
218 | damping = [d]
219 |
220 | # generate all higher order images up to max_order
221 | o = 1
222 | while o < self.max_order:
223 | # generate all images of images of previous order
224 | img = np.zeros((self.dim, 0))
225 | dmp = np.array([])
226 | for si, sd in zip(images[o - 1].T, damping[o - 1]):
227 | i, d = self.firstOrderImages(si)
228 | img = np.concatenate((img, i), axis=1)
229 | dmp = np.concatenate((dmp, d * sd))
230 |
231 | # remove duplicates
232 | ordering = np.lexsort(img)
233 | img = img[:, ordering]
234 | dmp = dmp[ordering]
235 | diff = np.diff(img, axis=1)
236 | ui = np.ones(img.shape[1], 'bool')
237 | ui[1:] = (diff != 0).any(axis=0)
238 |
239 | # add to array of images
240 | images.append(img[:, ui])
241 | damping.append(dmp[ui])
242 |
243 | # next order
244 | o += 1
245 |
246 | # add a new source to the source list
247 | self.sources.append(
248 | SoundSource(
249 | position,
250 | images=images,
251 | damping=damping,
252 | signal=signal,
253 | delay=delay))
254 |
255 | def firstOrderImages(self, source_position):
256 |
257 | # projected length onto normal
258 | ip = np.sum(
259 | self.normals * (self.corners - source_position[:, np.newaxis]), axis=0)
260 |
261 | # projected vector from source to wall
262 | d = ip * self.normals
263 |
264 | # compute images points, positivity is to get only the reflections
265 | # outside the room
266 | images = source_position[:, np.newaxis] + 2 * d[:, ip > 0]
267 |
268 | # collect absorption factors of reflecting walls
269 | damping = self.absorption[ip > 0]
270 |
271 | return images, damping
272 |
273 | def compute_RIR(self, c=constants.c, window=False):
274 | '''
275 | Compute the room impulse response between every source and microphone
276 | '''
277 | self.rir = []
278 |
279 | for mic in self.micArray.R.T:
280 |
281 | h = []
282 |
283 | for source in self.sources:
284 |
285 | # stack source and all images
286 | img = source.getImages(self.max_order)
287 | dmp = source.getDamping(self.max_order)
288 |
289 | # compute the distance
290 | dist = np.sqrt(np.sum((img - mic[:, np.newaxis]) ** 2, axis=0))
291 | time = dist / c + self.t0
292 | alpha = dmp/(4.*np.pi*dist)
293 |
294 | # the number of samples needed
295 | N = np.ceil((time.max() + self.t0) * self.Fs)
296 |
297 | t = np.arange(N)/float(self.Fs)
298 | ir = np.zeros(t.shape)
299 |
300 | for ti, ai in zip(time, alpha):
301 | ir += np.sinc(self.Fs*(t-ti))*ai
302 |
303 | h.append(ir)
304 |
305 | self.rir.append(h)
306 |
307 | def simulate(self, recompute_rir=False):
308 | '''
309 | Simulate the microphone signal at every microphone in the array
310 | '''
311 |
312 | # import convolution routine
313 | from scipy.signal import fftconvolve
314 |
315 | # Throw an error if we are missing some hardware in the room
316 |         if (len(self.sources) == 0):
317 | raise NameError('There are no sound sources in the room.')
318 | if (self.micArray is None):
319 | raise NameError('There is no microphone in the room.')
320 |
321 | # compute RIR if necessary
322 | if len(self.rir) == 0 or recompute_rir:
323 | self.compute_RIR()
324 |
325 | # number of mics and sources
326 | M = self.micArray.M
327 | S = len(self.sources)
328 |
329 | # compute the maximum signal length
330 | from itertools import product
331 | max_len_rir = np.array([len(self.rir[i][j])
332 | for i, j in product(xrange(M), xrange(S))]).max()
333 | f = lambda i: len(
334 | self.sources[i].signal) + np.floor(self.sources[i].delay * self.Fs)
335 | max_sig_len = np.array([f(i) for i in xrange(S)]).max()
336 |         L = int(max_len_rir + max_sig_len - 1)
337 |         if L % 2 == 1: L += 1
338 |
339 | # the array that will receive all the signals
340 | self.micArray.signals = np.zeros((M, L))
341 |
342 | # compute the signal at every microphone in the array
343 | for m in np.arange(M):
344 | rx = self.micArray.signals[m]
345 | for s in np.arange(S):
346 | sig = self.sources[s].signal
347 | if sig is None:
348 | continue
349 |                 d = int(np.floor(self.sources[s].delay * self.Fs))
350 | h = self.rir[m][s]
351 | rx[d:d + len(sig) + len(h) - 1] += fftconvolve(h, sig)
352 |
353 | # add white gaussian noise if necessary
354 | if self.sigma2_awgn is not None:
355 | rx += np.sqrt(self.sigma2_awgn)*np.random.normal(0., 1., rx.shape)
356 |
357 |
358 | def dSNR(self, x, source=0):
359 | ''' direct Signal-to-Noise Ratio'''
360 |
361 | if source >= len(self.sources):
362 | raise NameError('No such source')
363 |
364 | if self.sources[source].signal is None:
365 | raise NameError('No signal defined for source ' + str(source))
366 |
367 | if self.sigma2_awgn is None:
368 | return float('inf')
369 |
370 | x = np.array(x)
371 |
372 |         sigma2_s = np.mean(self.sources[source].signal**2)
373 |
374 | d2 = np.sum((x - self.sources[source].position)**2)
375 |
376 | return sigma2_s/self.sigma2_awgn/(16*np.pi**2*d2)
377 |
378 |
379 | @classmethod
380 | def shoeBox2D(cls, p1, p2, Fs, **kwargs):
381 | '''
382 | Create a new Shoe Box room geometry.
383 | Arguments:
384 | p1: the lower left corner of the room
385 | p2: the upper right corner of the room
386 |         Any other keyword arguments (e.g. max_order, absorption) are passed to the Room constructor.
387 | '''
388 |
389 | # compute room characteristics
390 | corners = np.array(
391 | [[p1[0], p2[0], p2[0], p1[0]], [p1[1], p1[1], p2[1], p2[1]]])
392 |
393 | return Room(corners, Fs, **kwargs)
394 |
395 | @classmethod
396 | def area(cls, corners):
397 | '''
398 | Compute the area of a 2D room represented by its corners
399 | '''
400 | x = corners[0, :] - corners[0, xrange(-1, corners.shape[1]-1)]
401 | y = corners[1, :] + corners[1, xrange(-1, corners.shape[1]-1)]
402 | return -0.5 * (x * y).sum()
403 |
404 | @classmethod
405 | def isAntiClockwise(cls, corners):
406 | '''
407 | Return true if the corners of the room are arranged anti-clockwise
408 | '''
409 | return (cls.area(corners) > 0)
410 |
411 | @classmethod
412 | def ccw3p(cls, p):
413 | '''
414 |         Argument: p, a (2,3)-ndarray whose columns are the vertices of a 2D triangle
415 |         Returns
416 |         1: if triangle vertices are counter-clockwise
417 |         -1: if triangle vertices are clockwise
418 |         0: if vertices are colinear
419 | 
420 |         Ref: https://en.wikipedia.org/wiki/Curve_orientation
421 |         '''
422 |         if (p.shape != (2, 3)):
423 |             raise NameError(
424 |                 'Room.ccw3p is for three 2D points, input is a 2x3 ndarray')
425 | D = (p[0, 1] - p[0, 0]) * (p[1, 2] - p[1, 0]) - \
426 | (p[0, 2] - p[0, 0]) * (p[1, 1] - p[1, 0])
427 |
428 | if (np.abs(D) < constants.eps):
429 | return 0
430 | elif (D > 0):
431 | return 1
432 | else:
433 | return -1
434 |
--------------------------------------------------------------------------------
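A minimal usage sketch for the `Room` class, assuming the API above (all
parameter values are illustrative):

    import Room as rg
    import beamforming as bf

    Fs = 8000

    # shoebox room with image sources up to second order
    room = rg.Room.shoeBox2D([0, 0], [4, 6], Fs, max_order=2, absorption=0.8)

    # a source; no signal is needed to compute impulse responses
    room.addSource([1.2, 1.5])

    # a small linear microphone array centered at [2, 3]
    room.addMicrophoneArray(bf.MicrophoneArray.linear2D(Fs, [2, 3], 4, 0., 0.1))

    # room impulse responses, indexed as room.rir[mic][source]
    room.compute_RIR()
    print len(room.rir), len(room.rir[0])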
/SoundSource.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 |
4 | '''
5 | A class to represent sound sources
6 | '''
7 |
8 |
9 | class SoundSource(object):
10 |
11 | def __init__(
12 | self,
13 | position,
14 | images=None,
15 | damping=None,
16 | signal=None,
17 | delay=0):
18 |
19 | self.position = np.array(position)
20 |
21 | if (images is None):
22 | # set to empty list if nothing provided
23 | self.images = []
24 | self.damping = []
25 |
26 | else:
27 | # save list if provided
28 | self.images = images
29 |
30 | # we need to have damping factors for every image
31 |             if (damping is None):
32 |                 # set to one if not set
33 |                 self.damping = []
34 |                 for o in images:
35 |                     self.damping.append(np.ones(o.shape[1]))
36 | else:
37 | # check damping is the same size as images
38 | if (len(damping) != len(images)):
39 | raise NameError('Images and damping must have same shape')
40 | for i in range(len(damping)):
41 | if (damping[i].shape[0] != images[i].shape[1]):
42 | raise NameError(
43 | 'Images and damping must have same shape')
44 |
45 | # copy over if correct
46 | self.damping = damping
47 |
48 | # The sound signal of the source
49 | self.signal = signal
50 | self.delay = delay
51 |
52 |     def addSignal(self, signal):
53 |
54 | self.signal = signal
55 |
56 | def getImages(self, max_order=None, max_distance=None, n_nearest=None, ref_point=None):
57 |
58 | # TO DO: Add also n_strongest
59 |
60 |         # TO DO: Make some of these things exclusive (e.g. can't have n_nearest
61 |         # AND n_strongest, although one could have max_order AND n_nearest)
62 |
63 | # TO DO: Make this more efficient if bottleneck (unlikely)
64 |
65 | if (max_order is None):
66 | max_order = len(self.images)
67 |
68 | # stack source and all images
69 | img = np.array([self.position]).T
70 | for o in xrange(max_order):
71 | img = np.concatenate((img, self.images[o]), axis=1)
72 |
73 | if (n_nearest is not None):
74 | dist = np.sum((img - ref_point)**2, axis=0)
75 | i_nearest = dist.argsort()[0:n_nearest]
76 | img = img[:,i_nearest]
77 |
78 | return img
79 |
80 |     def getDamping(self, max_order=None):
81 |         if (max_order is None):
82 |             max_order = len(self.images)
83 |
84 | # stack source and all images
85 | dmp = np.array([1.])
86 | for o in xrange(max_order):
87 | dmp = np.concatenate((dmp, self.damping[o]))
88 |
89 | return dmp
90 |
--------------------------------------------------------------------------------
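A small sketch of the `n_nearest` selection, continuing the `room` and array
from the previous example (the numbers are illustrative):

    # direct source plus the 4 nearest image sources, seen from the array center
    imgs = room.sources[0].getImages(n_nearest=5, ref_point=room.micArray.center)
    print imgs.shape   # (2, 5): one column per retained source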
/beamforming.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.linalg import pinv, eig, inv
3 | from time import sleep
4 |
5 | import constants
6 |
7 | import windows
8 | import stft
9 |
10 |
11 | #=========================================================================
12 | # Free (non-class-member) functions related to beamformer design
13 | #=========================================================================
14 |
15 |
16 | def H(A, **kwargs):
17 | '''Returns the conjugate (Hermitian) transpose of a matrix.'''
18 |
19 | return np.transpose(A, **kwargs).conj()
20 |
21 | def sumcols(A):
22 | '''Sums the columns of a matrix (np.array). The output is a 2D np.array
23 | of dimensions M x 1.'''
24 |
25 | return np.sum(A, axis=1, keepdims=1)
26 |
27 |
28 | def mdot(*args):
29 | '''Left-to-right associative matrix multiplication of multiple 2D
30 | ndarrays'''
31 |
32 | ret = args[0]
33 | for a in args[1:]:
34 | ret = np.dot(ret,a)
35 |
36 | return ret
37 |
38 |
39 | def distance(X, Y):
40 | '''
41 | X and Y are DxN ndarray containing N D-dimensional vectors
42 | distance(X,Y) computes the distance matrix E where E[i,j] = sqrt(sum((X[:,i]-Y[:,j])**2))
43 | '''
44 | # Assume X, Y are arrays, *not* matrices
45 | X = np.array(X)
46 | Y = np.array(Y)
47 |
48 | return np.sqrt((X[0,:,np.newaxis]-Y[0,:])**2 + (X[1,:,np.newaxis]-Y[1,:])**2)
49 |
50 |
51 | def unit_vec2D(phi):
52 | return np.array([[np.cos(phi), np.sin(phi)]]).T
53 |
54 |
55 | def linear2DArray(center, M, phi, d):
56 | u = unit_vec2D(phi)
57 | return np.array(center)[:, np.newaxis] + d * \
58 | (np.arange(M)[np.newaxis, :] - (M - 1.) / 2.) * u
59 |
60 |
61 | def circular2DArray(center, M, phi0, radius):
62 | phi = np.arange(M) * 2. * np.pi / M
63 | return np.array(center)[:, np.newaxis] + radius * \
64 | np.vstack((np.cos(phi + phi0), np.sin(phi + phi0)))
65 |
66 |
67 | def fir_approximation_ls(weights, T, n1, n2):
68 |
69 | freqs_plus = np.array(weights.keys())[:, np.newaxis]
70 | freqs = np.vstack([freqs_plus,
71 | -freqs_plus])
72 | omega = 2 * np.pi * freqs
73 | omega_discrete = omega * T
74 |
75 | n = np.arange(n1, n2)
76 |
77 | # Create the DTFT transform matrix corresponding to a discrete set of
78 | # frequencies and the FIR filter indices
79 | F = np.exp(-1j * omega_discrete * n)
81 |
82 | w_plus = np.array(weights.values())[:, :, 0]
83 | w = np.vstack([w_plus,
84 | w_plus.conj()])
85 |
86 | return pinv(F).dot(w)
87 |
88 |
89 | #=========================================================================
90 | # Classes (microphone array and beamformer related)
91 | #=========================================================================
92 |
93 |
94 | class MicrophoneArray(object):
95 |
96 | """Microphone array class."""
97 |
98 | def __init__(self, R, Fs):
99 | self.dim = R.shape[0] # are we in 2D or in 3D
100 | self.M = R.shape[1] # number of microphones
101 | self.R = R # array geometry
102 |
103 | self.Fs = Fs # sampling frequency of microphones
104 |
105 | self.signals = None
106 |
107 | self.center = np.mean(R, axis=1, keepdims=True)
108 |
109 |
110 | def to_wav(self, filename, mono=False, norm=False, type=float):
111 | '''
112 | Save all the signals to wav files
113 | '''
114 | from scipy.io import wavfile
115 |
116 | if mono is True:
117 | signal = self.signals[self.M/2]
118 | else:
119 | signal = self.signals.T # each column is a channel
120 |
121 | if type is float:
122 | bits = None
123 | elif type is np.int8:
124 | bits = 8
125 | elif type is np.int16:
126 | bits = 16
127 | elif type is np.int32:
128 | bits = 32
129 | elif type is np.int64:
130 | bits = 64
131 | else:
132 | raise NameError('No such type.')
133 |
134 | if norm is True:
135 | from utilities import normalize
136 | signal = normalize(signal, bits=bits)
137 |
138 | signal = np.array(signal, dtype=type)
139 |
140 | wavfile.write(filename, self.Fs, signal)
141 |
142 | @classmethod
143 | def linear2D(cls, Fs, center, M, phi, d):
144 | return MicrophoneArray(linear2DArray(center, M, phi, d), Fs)
145 |
146 | @classmethod
147 | def circular2D(cls, Fs, center, M, phi, radius):
148 | return MicrophoneArray(circular2DArray(center, M, phi, radius), Fs)
149 |
150 |
151 | class Beamformer(MicrophoneArray):
152 |
153 | """Beamformer class. At some point, in some nice way, the design methods
154 | should also go here. Probably with generic arguments."""
155 |
156 | def __init__(self, R, Fs):
157 | MicrophoneArray.__init__(self, R, Fs)
158 |
159 | # All of these will be defined in setProcessing
160 | self.processing = None # Time or frequency domain
161 | self.N = None
162 | self.L = None
163 | self.hop = None
164 | self.zpf = None
165 | self.zpb = None
166 |
167 | self.frequencies = None # frequencies of weights are defined in processing
168 |
169 | # weights will be computed later, the array is of shape (M, N/2+1)
170 | self.weights = None
171 |
172 |
173 | def setProcessing(self, processing, *args):
174 | """ Setup the processing type and parameters """
175 |
176 | self.processing = processing
177 |
178 | if processing == 'FrequencyDomain':
179 | self.L = args[0] # frame size
180 |             if self.L % 2 != 0: self.L += 1 # ensure even length
181 | self.hop = args[1] # hop between two successive frames
182 | self.zpf = args[2] # zero-padding front
183 | self.zpb = args[3] # zero-padding back
184 | self.N = self.L + self.zpf + self.zpb
185 |             if self.N % 2 != 0: # ensure even length
186 | self.N += 1
187 | self.zpb += 1
188 | elif processing == 'TimeDomain':
189 | self.N = args[0] # filter length
190 |             if self.N % 2 != 0: self.N += 1 # ensure even length
191 | elif processing == 'Total':
192 | self.N = self.signals.shape[1]
193 | else:
194 | raise NameError(processing + ': No such type of processing')
195 |
196 | # for now only support equally spaced frequencies
197 | self.frequencies = np.arange(0, self.N/2+1)/float(self.N)*float(self.Fs)
198 |
199 | def __add__(self, y):
200 | """ Concatenates two beamformers together """
201 |
202 | return Beamformer(np.concatenate((self.R, y.R), axis=1), self.Fs)
203 |
204 |
205 | # def steering_vector_2D_ff(self, frequency, phi, attn=False):
206 | # phi = np.array([phi]).reshape(phi.size)
207 | # omega = 2*np.pi*frequency
208 |
209 | # return np.exp(-1j*omega*)
210 |
211 |
212 | def steering_vector_2D(self, frequency, phi, dist, attn=False):
213 |
214 | phi = np.array([phi]).reshape(phi.size)
215 |
216 | # Assume phi and dist are measured from the array's center
217 | X = dist * np.array([np.cos(phi), np.sin(phi)]) + self.center
218 |
219 | D = distance(self.R, X)
220 | omega = 2 * np.pi * frequency
221 |
222 | if attn:
223 | # TO DO 1: This will mean slightly different absolute value for
224 | # every entry, even within the same steering vector. Perhaps a
225 | # better paradigm is far-field with phase carrier.
226 | return 1. / (4 * np.pi) / D * np.exp(-1j * omega * D / constants.c)
227 | else:
228 | return np.exp(-1j * omega * D / constants.c)
229 |
230 |
231 | def steering_vector_2D_from_point(self, frequency, source, attn=True, ff=False):
232 | """ Creates a steering vector for a particular frequency and source
233 |
234 | Args:
235 | frequency
236 | source: location in cartesian coordinates
237 | attn: include attenuation factor if True
238 | ff: uses far-field distance if true
239 |
240 |         Return:
241 |             An M x K ndarray containing the steering vectors (one column per source point)
242 | """
243 |         X = np.array(source, dtype=float)
244 |         if X.ndim == 1:
245 |             X = X[:, np.newaxis]
246 |
247 | # normalize for far-field if requested
248 | if (ff):
249 | X -= self.center
250 | Xn = np.sqrt(np.sum(X**2, axis=0))
251 | X *= constants.ffdist/Xn
252 | X += self.center
253 |
254 | D = distance(self.R, X)
255 | omega = 2 * np.pi * frequency
256 |
257 | if attn:
258 | # TO DO 1: This will mean slightly different absolute value for
259 | # every entry, even within the same steering vector. Perhaps a
260 | # better paradigm is far-field with phase carrier.
261 | return 1. / (4 * np.pi) / D * np.exp(-1j * omega * D / constants.c)
262 | else:
263 | return np.exp(-1j * omega * D / constants.c)
264 |
265 |
266 | def response(self, phi_list, frequency):
267 |
268 | i_freq = np.argmin(np.abs(self.frequencies - frequency))
269 |
270 | # For the moment assume that we are in 2D
271 | bfresp = np.dot(H(self.weights[:,i_freq]), self.steering_vector_2D(
272 | self.frequencies[i_freq], phi_list, constants.ffdist))
273 |
274 | return self.frequencies[i_freq], bfresp
275 |
276 |
277 | def response_from_point(self, x, frequency):
278 |
279 | i_freq = np.argmin(np.abs(self.frequencies - frequency))
280 |
281 | # For the moment assume that we are in 2D
282 | bfresp = np.dot(H(self.weights[:,i_freq]), self.steering_vector_2D_from_point(
283 | self.frequencies[i_freq], x, attn=True, ff=False))
284 |
285 | return self.frequencies[i_freq], bfresp
286 |
287 |
288 | def plot_response_from_point(self, x, legend=None):
289 |
290 |         if x.ndim == 1:
291 |             x = x[:, np.newaxis]
292 |
293 | import matplotlib.pyplot as plt
294 |
295 | HF = np.zeros((x.shape[1], self.frequencies.shape[0]), dtype=complex)
296 | for k,p in enumerate(x.T):
297 | for i,f in enumerate(self.frequencies):
298 | r = np.dot(H(self.weights[:,i]),
299 | self.steering_vector_2D_from_point(f, p, attn=True, ff=False))
300 | HF[k,i] = r[0]
301 |
302 |
303 | plt.subplot(2,1,1)
304 | plt.title('Beamformer response')
305 | for hf in HF:
306 | plt.plot(self.frequencies, np.abs(hf))
307 | plt.ylabel('Modulus')
308 | plt.axis('tight')
309 | plt.legend(legend)
310 |
311 | plt.subplot(2,1,2)
312 | for hf in HF:
313 | plt.plot(self.frequencies, np.unwrap(np.angle(hf)))
314 | plt.ylabel('Phase')
315 | plt.xlabel('Frequency [Hz]')
316 | plt.axis('tight')
317 | plt.legend(legend)
318 |
319 |
320 | def plot_beam_response(self):
321 |
322 | phi = np.linspace(-np.pi, np.pi-np.pi/180, 360)
323 | freq = self.frequencies
324 | #freq = self.frequencies[self.frequencies > constants.fc_hp]
325 |
326 | resp = np.zeros((freq.shape[0], phi.shape[0]), dtype=complex)
327 |
328 | for i,f in enumerate(freq):
329 | # For the moment assume that we are in 2D
330 | resp[i,:] = np.dot(H(self.weights[:,i]), self.steering_vector_2D(
331 | f, phi, constants.ffdist))
332 |
333 | H_abs = np.abs(resp)**2
334 | H_abs /= H_abs.max()
335 | H_abs = 10*np.log10(H_abs)
336 |
337 | p_min = 0
338 | p_max = 100
339 | vmin, vmax = np.percentile(H_abs.flatten(), [p_min, p_max])
340 |
341 | import matplotlib.pyplot as plt
342 |
343 | plt.imshow(H_abs,
344 | aspect='auto',
345 | origin='lower',
346 | interpolation='sinc',
347 | vmax=vmax, vmin=vmin)
348 |
349 | plt.xlabel('Angle [rad]')
350 | xticks = [-np.pi, -np.pi/2, 0, np.pi/2, np.pi]
351 | for i,p in enumerate(xticks):
352 | xticks[i] = np.argmin(np.abs(p - phi))
353 | xticklabels = ['$-\pi$', '$-\pi/2$', '0', '$\pi/2$', '$\pi$']
354 | plt.setp(plt.gca(), 'xticks', xticks)
355 | plt.setp(plt.gca(), 'xticklabels', xticklabels)
356 |
357 | plt.ylabel('Freq [kHz]')
358 | yticks = np.zeros(4)
359 | f_0 = np.floor(self.Fs/8000.)
360 | for i in np.arange(1,5):
361 | yticks[i-1] = np.argmin(np.abs(freq - 1000.*i*f_0))
362 | #yticks = np.array(plt.getp(plt.gca(), 'yticks'), dtype=np.int)
363 | plt.setp(plt.gca(), 'yticks', yticks)
364 | plt.setp(plt.gca(), 'yticklabels', np.arange(1,5)*f_0)
365 |
366 |
367 | def farFieldWeights(self, phi):
368 | '''
369 | This method computes weight for a far field at infinity
370 |
371 | phi: direction of beam
372 | '''
373 |
374 | u = unit_vec2D(phi)
375 | proj = np.dot(u.T, self.R - self.center)[0]
376 |
377 | # normalize the first arriving signal to ensure a causal filter
378 | proj -= proj.max()
379 |
380 | self.weights = np.exp(2j * np.pi *
381 | self.frequencies[:, np.newaxis] * proj / constants.c).T
382 |
383 |
384 | def rakeDelayAndSumWeights(self, source, interferer=None, R_n=None, attn=True, ff=False):
385 |
386 | self.weights = np.zeros((self.M, self.frequencies.shape[0]), dtype=complex)
387 |
388 | K = source.shape[1] - 1
389 |
390 | for i, f in enumerate(self.frequencies):
391 | W = self.steering_vector_2D_from_point(f, source, attn=attn, ff=ff)
392 | self.weights[:,i] = 1.0/self.M/(K+1) * np.sum(W, axis=1)
393 |
394 |
395 |
396 | def rakeOneForcingWeights(self, source, interferer, R_n=None, ff=False, attn=True):
397 |
398 | if R_n is None:
399 | R_n = np.zeros((self.M, self.M))
400 |
401 | self.weights = np.zeros((self.M, self.frequencies.shape[0]), dtype=complex)
402 |
403 | for i, f in enumerate(self.frequencies):
404 | if interferer is None:
405 | A_bad = np.array([[]])
406 | else:
407 | A_bad = self.steering_vector_2D_from_point(f, interferer, attn=attn, ff=ff)
408 |
409 | R_nq = R_n + sumcols(A_bad).dot(H(sumcols(A_bad)))
410 |
411 | A_s = self.steering_vector_2D_from_point(f, source, attn=attn, ff=ff)
412 | R_nq_inv = pinv(R_nq)
413 | D = pinv(mdot(H(A_s), R_nq_inv, A_s))
414 |
415 | self.weights[:,i] = sumcols( mdot( R_nq_inv, A_s, D ) )[:,0]
416 |
417 | def rakeMaxSINRWeights(self, source, interferer, R_n=None,
418 | rcond=0., ff=False, attn=True):
419 | '''
420 | This method computes a beamformer focusing on a number of specific sources
421 | and ignoring a number of interferers.
422 |
423 | INPUTS
424 | * source : source locations
425 | * interferer : interferer locations
426 | '''
427 |
428 | if R_n is None:
429 | R_n = np.zeros((self.M, self.M))
430 |
431 | self.weights = np.zeros((self.M, self.frequencies.shape[0]), dtype=complex)
432 |
433 | for i,f in enumerate(self.frequencies):
434 |
435 | A_good = self.steering_vector_2D_from_point(f, source, attn=attn, ff=ff)
436 |
437 | if interferer is None:
438 | A_bad = np.array([[]])
439 | else:
440 | A_bad = self.steering_vector_2D_from_point(f, interferer, attn=attn, ff=ff)
441 |
442 | a_good = sumcols(A_good)
443 | a_bad = sumcols(A_bad)
444 |
445 | # TO DO: Fix this (check for numerical rank, use the low rank approximation)
446 |             K_inv = pinv(a_bad.dot(H(a_bad)) + R_n + rcond * np.eye(self.M))
447 | self.weights[:,i] = (K_inv.dot(a_good) / mdot(H(a_good), K_inv, a_good))[:,0]
448 |
449 |
450 | def rakeMaxUDRWeights(self, source, interferer, R_n=None, ff=False, attn=True):
451 |
452 | if source.shape[1] == 1:
453 | self.rakeMaxSINRWeights(source, interferer, R_n=R_n, ff=ff, attn=attn)
454 | return
455 |
456 | if R_n is None:
457 | R_n = np.zeros((self.M, self.M))
458 |
459 | self.weights = np.zeros((self.M, self.frequencies.shape[0]), dtype=complex)
460 |
461 | for i, f in enumerate(self.frequencies):
462 | A_good = self.steering_vector_2D_from_point(f, source, attn=attn, ff=ff)
463 |
464 | if interferer is None:
465 | A_bad = np.array([[]])
466 | else:
467 | A_bad = self.steering_vector_2D_from_point(f, interferer, attn=attn, ff=ff)
468 |
469 | R_nq = R_n + sumcols(A_bad).dot(H(sumcols(A_bad)))
470 |
471 | C = np.linalg.cholesky(R_nq)
472 | l, v = eig( mdot( inv(C), A_good, H(A_good), H(inv(C)) ) )
473 |
474 |             self.weights[:,i] = inv(H(C)).dot(v[:, np.argmax(np.abs(l))])  # eig does not sort eigenvalues
475 |
476 |
477 | def SNR(self, source, interferer, f, R_n=None, dB=False):
478 |
479 | i_f = np.argmin(np.abs(self.frequencies - f))
480 |
481 | # This works at a single frequency because otherwise we need to pass
482 | # many many covariance matrices. Easy to change though (you can also
483 | # have frequency independent R_n).
484 |
485 | if R_n is None:
486 | R_n = np.zeros((self.M, self.M))
487 |
488 | # To compute the SNR, we /must/ use the real steering vectors, so no
489 | # far field, and attn=True
490 | A_good = self.steering_vector_2D_from_point(self.frequencies[i_f], source, attn=True, ff=False)
491 |
492 | if interferer is not None:
493 | A_bad = self.steering_vector_2D_from_point(self.frequencies[i_f], interferer, attn=True, ff=False)
494 | R_nq = R_n + sumcols(A_bad) * H(sumcols(A_bad))
495 | else:
496 | R_nq = R_n
497 |
498 | w = self.weights[:,i_f]
499 | a_1 = sumcols(A_good)
500 |
501 | SNR = np.real(mdot(H(w), a_1, H(a_1), w) / mdot(H(w), R_nq, w))
502 |
503 | if dB is True:
504 | SNR = 10 * np.log10(SNR)
505 |
506 | return SNR
507 |
508 |
509 | def UDR(self, source, interferer, f, R_n=None, dB=False):
510 |
511 | i_f = np.argmin(np.abs(self.frequencies - f))
512 |
513 | if R_n is None:
514 | R_n = np.zeros((self.M, self.M))
515 |
516 | A_good = self.steering_vector_2D_from_point(self.frequencies[i_f], source, attn=True, ff=False)
517 |
518 | if interferer is not None:
519 | A_bad = self.steering_vector_2D_from_point(self.frequencies[i_f], interferer, attn=True, ff=False)
520 | R_nq = R_n + sumcols(A_bad).dot(H(sumcols(A_bad)))
521 | else:
522 | R_nq = R_n
523 |
524 | w = self.weights[:,i_f]
525 |
526 | UDR = np.real(mdot(H(w), A_good, H(A_good), w) / mdot(H(w), R_nq, w))
527 | if dB is True:
528 | UDR = 10 * np.log10(UDR)
529 |
530 | return UDR
531 |
532 |
533 | def process(self):
534 |
535 | if (self.signals is None or len(self.signals) == 0):
536 | raise NameError('No signal to beamform')
537 |
538 |         if self.processing == 'FrequencyDomain':
539 |
540 | # create window function
541 | win = np.concatenate((np.zeros(self.zpf),
542 | windows.hann(self.L),
543 | np.zeros(self.zpb)))
544 |
545 | # do real STFT of first signal
546 | tfd_sig = stft.stft(self.signals[0],
547 | self.L,
548 | self.hop,
549 | zp_back=self.zpb,
550 | zp_front=self.zpf,
551 | transform=np.fft.rfft,
552 | win=win) * np.conj(self.weights[0])
553 | for i in xrange(1, self.M):
554 | tfd_sig += stft.stft(self.signals[i],
555 | self.L,
556 | self.hop,
557 | zp_back=self.zpb,
558 | zp_front=self.zpf,
559 | transform=np.fft.rfft,
560 | win=win) * np.conj(self.weights[i])
561 |
562 | # now reconstruct the signal
563 | output = stft.istft(
564 | tfd_sig,
565 | self.L,
566 | self.hop,
567 | zp_back=self.zpb,
568 | zp_front=self.zpf,
569 | transform=np.fft.irfft)
570 |
571 | # remove the zero padding from output signal
572 |             if self.zpb == 0:
573 | output = output[self.zpf:]
574 | else:
575 | output = output[self.zpf:-self.zpb]
576 |
577 |         elif self.processing == 'TimeDomain':
578 |
579 | # go back to time domain and shift DC to center
580 | tw = np.sqrt(self.weights.shape[1])*np.fft.irfft(np.conj(self.weights), axis=1)
581 | tw = np.concatenate((tw[:, self.N/2:], tw[:, :self.N/2]), axis=1)
582 |
583 | from scipy.signal import fftconvolve
584 |
585 |             # filter each signal with the time-domain weights and sum
586 | output = fftconvolve(tw[0], self.signals[0])
587 | for i in xrange(1, len(self.signals)):
588 | output += fftconvolve(tw[i], self.signals[i])
589 |
590 |         elif self.processing == 'Total':
591 |
592 | W = np.concatenate((self.weights, np.conj(self.weights[:,-2:0:-1])), axis=1)
593 | W[:,0] = np.real(W[:,0])
594 | W[:,self.N/2] = np.real(W[:,self.N/2])
595 |
596 | F_sig = np.zeros(self.signals.shape[1], dtype=complex)
597 | for i in xrange(self.M):
598 | F_sig += np.fft.fft(self.signals[i])*np.conj(W[i,:])
599 |
604 | output = np.real(np.fft.ifft(F_sig))
605 |
606 | return output
607 |
608 |
609 | def plot(self, sum_ir=False):
610 |
611 | import matplotlib.pyplot as plt
612 |
613 | plt.subplot(2, 2, 1)
614 | plt.plot(self.frequencies, np.abs(self.weights.T))
615 | plt.title('Beamforming weights [modulus]')
616 | plt.xlabel('Frequency [Hz]')
617 | plt.ylabel('Weight modulus')
618 |
619 | plt.subplot(2, 2, 2)
620 | plt.plot(self.frequencies, np.unwrap(np.angle(self.weights.T), axis=0))
621 | plt.title('Beamforming weights [phase]')
622 | plt.xlabel('Frequency [Hz]')
623 | plt.ylabel('Unwrapped phase')
624 |
625 | plt.subplot(2, 1, 2)
626 |
627 | self.plot_IR(sum_ir=sum_ir)
628 |
629 | plt.title('Beamforming filters')
630 | plt.xlabel('Time [s]')
631 | plt.ylabel('Filter amplitude')
632 | plt.axis('tight')
633 |
634 |
635 | def ir(self, sum_ir=False, norm=None, zp=1, **kwargs):
636 | ''' compute time domain impulse response of the beamformer'''
637 |
638 | # go back to time domain and shift DC to center
639 | tw = np.fft.irfft(np.conj(self.weights), axis=1, n=zp*self.N)
640 |
641 | tw = np.concatenate((tw[:,-self.N/2:], tw[:, :self.N/2]), axis=1)
642 |
643 | if sum_ir is True:
644 | tw = np.sum(tw.T, axis=1)
645 | else:
646 | tw = tw.T
647 |
648 | if norm is not None:
649 | tw *= norm/np.abs(tw).max()
650 |
651 | return tw
652 |
653 | def plot_IR(self, sum_ir=False, norm=None, zp=1, **kwargs):
654 |
655 | tw = self.ir(sum_ir=sum_ir, norm=norm, zp=zp, **kwargs)
656 |
657 | import matplotlib.pyplot as plt
658 |
659 | plt.plot(np.arange(tw.shape[0])/float(self.Fs), tw, **kwargs)
660 |
661 |
662 | @classmethod
663 | def linear2D(cls, Fs, center, M, phi, d):
664 | ''' Create linear beamformer '''
665 | return Beamformer(linear2DArray(center, M, phi, d), Fs)
666 |
667 | @classmethod
668 | def circular2D(cls, Fs, center, M, phi, radius):
669 | ''' Create circular beamformer'''
670 | return Beamformer(circular2DArray(center, M, phi, radius), Fs)
671 |
672 | @classmethod
673 | def poisson(cls, Fs, center, M, d):
674 | ''' Create beamformer with microphone positions drawn from Poisson process '''
675 |
676 | from numpy.random import standard_exponential, randint
677 |
678 | R = d*standard_exponential((2, M))*(2*randint(0,2, (2,M)) - 1)
679 | R = R.cumsum(axis=1)
680 | R -= R.mean(axis=1)[:,np.newaxis]
681 | R += np.array([center]).T
682 |
683 | return Beamformer(R, Fs)
684 |
685 |
--------------------------------------------------------------------------------
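A minimal end-to-end sketch of the intended workflow, assuming the classes
above (STFT parameters and source positions are illustrative, and white-noise
test signals stand in for the wav samples): simulate the microphone signals,
design Rake-Max-SINR weights from the image sources of the desired source and
of the interferer, then beamform in the frequency domain.

    import numpy as np
    import Room as rg
    import beamforming as bf

    Fs = 8000
    room = rg.Room.shoeBox2D([0, 0], [4, 6], Fs, max_order=1,
                             absorption=0.8, sigma2_awgn=1e-3)
    room.addSource([1.2, 1.5], signal=np.random.randn(Fs))  # desired source
    room.addSource([2.5, 2.0], signal=np.random.randn(Fs))  # interferer

    # circular array of 12 microphones, processing 1024-sample frames
    mics = bf.Beamformer.circular2D(Fs, [2, 3], 12, 0., 0.3)
    mics.setProcessing('FrequencyDomain', 1024, 512, 512, 512)
    room.addMicrophoneArray(mics)
    room.simulate()

    # rake the direct sources and their first-order images
    mics.rakeMaxSINRWeights(room.sources[0].getImages(max_order=1),
                            room.sources[1].getImages(max_order=1),
                            R_n=1e-3 * np.eye(mics.M))

    output = mics.process()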
/bin/README.md:
--------------------------------------------------------------------------------
1 | Binary Blobs
2 | ===============
3 |
4 | Put the PESQ binary blob in this directory.
5 |
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
1 | '''
2 | This file defines the main physical constants of the system
3 | '''
4 |
5 | # Speed of sound c=343 m/s
6 | c = 343.
7 |
8 | # distance to the far field
9 | ffdist = 10.
10 |
11 | # cut-off frequency of standard high-pass filter
12 | fc_hp = 300.
13 |
14 | # tolerance for computations
15 | eps = 1e-10
16 |
--------------------------------------------------------------------------------
/figure_Measures1.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib
3 | import constants
4 | matplotlib.use('TkAgg')
5 |
6 | import matplotlib.pyplot as plt
7 | import matplotlib.colors as colors
8 | import matplotlib.cm as cmx
9 |
10 |
11 | import Room as rg
12 | import beamforming as bf
13 | from scipy.io import wavfile
14 |
15 | # Room 1 : Shoe box
16 | p1 = np.array([0, 0])
17 | p2 = np.array([4, 6])
18 |
19 | # The desired signal
20 | source1 = [1.2, 1.5]
21 |
22 | # The interferer
23 | source2 = [2.5, 2]
24 |
25 | # Some simulation parameters
26 | Fs = 44100
27 | absorption = 0.8
28 | max_order = 4
29 |
30 | # create a microphone array
31 | mic1 = [2, 3]
32 | M = 12
33 | d = 0.3
34 | freqs = np.array([1000])
35 | f = 1000
36 | sigma2 = 1e-3
37 |
38 | mics = bf.Beamformer.circular2D(Fs, mic1, M, 0, d)
39 | mics.frequencies = freqs
40 |
41 | # How much to simulate?
42 | max_K = 21
43 | n_monte_carlo = 20000
44 |
45 | beamformer_names = ['DS',
46 | 'Max-SINR',
47 | 'Rake-DS',
48 | 'Rake-MaxSINR',
49 | 'Rake-MaxUDR']
50 | # 'Rake-OF']
51 | bf_weights_fun = [mics.rakeDelayAndSumWeights,
52 | mics.rakeMaxSINRWeights,
53 | mics.rakeDelayAndSumWeights,
54 | mics.rakeMaxSINRWeights,
55 | mics.rakeMaxUDRWeights]
56 | # mics.rakeOneForcingWeights]
57 |
58 | SNR = {}
59 | SNR_ci = {}
60 | SNR_ci_minus = {}
61 | SNR_ci_plus = {}
62 |
63 | UDR = {}
64 | UDR_ci = {}
65 |
66 | for bf in beamformer_names:
67 | SNR.update({bf: np.zeros((max_K, n_monte_carlo))})
68 | SNR_ci.update({bf: np.float(0)})
69 | UDR.update({bf: np.zeros((max_K, n_monte_carlo))})
70 | UDR_ci.update({bf: np.float(0)})
71 |
72 | SNR_ci_minus = SNR_ci.copy()
73 | SNR_ci_plus = SNR_ci.copy()
74 |
75 | for K in range(0, max_K):
76 | for n in xrange(n_monte_carlo):
77 |
78 | # create the room with sources
79 | room1 = rg.Room.shoeBox2D(
80 | p1,
81 | p2,
82 | Fs,
83 | max_order=max_order,
84 | absorption=absorption)
85 |
86 | source1 = p1 + np.random.rand(2) * (p2 - p1)
87 | source2 = p1 + np.random.rand(2) * (p2 - p1)
88 |
89 | room1.addSource(source1)
90 | room1.addSource(source2)
91 |
92 | # Create different beamformers and evaluate corresponding performance measures
93 | for i, bf in enumerate(beamformer_names):
94 |
95 |             if (bf == 'DS') or (bf == 'Max-SINR'):
96 | n_nearest = 1
97 | else:
98 | n_nearest = K+1
99 |
100 |
101 | bf_weights_fun[i](room1.sources[0].getImages(n_nearest=n_nearest, ref_point=mics.center),
102 | room1.sources[1].getImages(n_nearest=n_nearest, ref_point=mics.center),
103 | R_n=sigma2 * np.eye(mics.M),
104 | ff=False,
105 | attn=True)
106 |
107 | room1.addMicrophoneArray(mics)
108 |
109 | SNR[bf][K][n] = mics.SNR(room1.sources[0].getImages(n_nearest=K+1, ref_point=mics.center),
110 | room1.sources[1].getImages(n_nearest=max_K+1, ref_point=mics.center),
111 | f,
112 | R_n=sigma2 * np.eye(mics.M),
113 | dB=True)
114 | UDR[bf][K][n] = mics.UDR(room1.sources[0].getImages(n_nearest=K+1, ref_point=mics.center),
115 | room1.sources[1].getImages(n_nearest=max_K+1, ref_point=mics.center),
116 | f,
117 | R_n=sigma2 * np.eye(mics.M),
118 | dB=True)
119 |
120 | print 'Computed for K =', K
121 |
122 |
123 | # Compute the confidence regions, symmetrically, and then separately for
124 | # positive and for negative differences
125 | p = 0.5
126 | for bf in beamformer_names:
127 | err_SNR = SNR[bf][K] - np.median(SNR[bf][K])
128 | n_plus = np.sum(err_SNR >= 0)
129 | n_minus = np.sum(err_SNR < 0)
130 |     SNR_ci[bf] = np.sort(np.abs(err_SNR))[int(np.floor(p*n_monte_carlo))]
131 |     SNR_ci_plus[bf] = np.sort(err_SNR[err_SNR >= 0])[int(np.floor(p*n_plus))]
132 |     SNR_ci_minus[bf] = np.sort(-err_SNR[err_SNR < 0])[int(np.floor(p*n_minus))]
133 |
134 | err_UDR = UDR[bf][K] - np.median(UDR[bf][K])
135 |     UDR_ci[bf] = np.sort(np.abs(err_UDR))[int(np.floor(p*n_monte_carlo))]
136 |
137 |
138 | #---------------------------------------------------------------------
139 | # Export the SNR figure
140 | #---------------------------------------------------------------------
141 |
142 | plt.figure(figsize=(4, 3))
143 |
144 | newmap = plt.get_cmap('gist_heat')
145 | ax1 = plt.gca()
146 | ax1.set_color_cycle([newmap( k ) for k in np.linspace(0.25,0.9,len(beamformer_names))])
147 |
148 | from itertools import cycle
149 | lines = ['-s','-o','-v','-D','->']
150 | linecycler = cycle(lines)
151 |
152 | for i, bf in enumerate(beamformer_names):
153 | p, = plt.plot(range(0, max_K),
154 | np.median(SNR[bf], axis=1),
155 | next(linecycler),
156 | linewidth=1,
157 | markersize=4,
158 | markeredgewidth=.5,
159 | clip_on=False)
160 |
161 | plt.fill_between(range(0, max_K),
162 | np.median(SNR['Rake-MaxSINR'], axis=1) - SNR_ci['Rake-MaxSINR'],
163 | np.median(SNR['Rake-MaxSINR'], axis=1) + SNR_ci['Rake-MaxSINR'],
164 | color='grey',
165 | linewidth=0.3,
166 | edgecolor='k',
167 | alpha=0.7)
168 |
169 | # Hide right and top axes
170 | ax1.spines['top'].set_visible(False)
171 | ax1.spines['right'].set_visible(False)
172 | ax1.spines['bottom'].set_position(('outward', 10))
173 | ax1.spines['left'].set_position(('outward', 15))
174 | ax1.yaxis.set_ticks_position('left')
175 | ax1.xaxis.set_ticks_position('bottom')
176 |
177 | # Make ticks nicer
178 | ax1.xaxis.set_tick_params(width=.3, length=3)
179 | ax1.yaxis.set_tick_params(width=.3, length=3)
180 |
181 | # Make axis lines thinner
182 | for axis in ['bottom','left']:
183 | ax1.spines[axis].set_linewidth(0.3)
184 |
185 | # Set ticks fontsize
186 | plt.xticks(size=9)
187 | plt.yticks(size=9)
188 |
189 | # Set labels
190 | plt.xlabel(r'Number of images $K$', fontsize=10)
191 | plt.ylabel('Output SINR [dB]', fontsize=10)
192 | plt.tight_layout()
193 |
194 |
195 | plt.legend(beamformer_names, fontsize=7, loc='upper left', frameon=False, labelspacing=0)
196 |
197 | plt.savefig('figures/SINR_vs_K.pdf')
198 |
199 | plt.close()
200 |
201 | #---------------------------------------------------------------------
202 | # Export the UDR figure
203 | #---------------------------------------------------------------------
204 |
205 | plt.figure(figsize=(4, 3))
206 |
207 | newmap = plt.get_cmap('gist_heat')
208 | ax1 = plt.gca()
209 | ax1.set_color_cycle([newmap( k ) for k in np.linspace(0.25,0.9,len(beamformer_names))])
210 |
211 | for i, bf in enumerate(beamformer_names):
212 | p, = plt.plot(range(0, max_K),
213 | np.median(UDR[bf], axis=1),
214 | next(linecycler),
215 | linewidth=1,
216 | markersize=4,
217 | markeredgewidth=.5,
218 | clip_on=False)
219 |
220 | plt.fill_between(range(0, max_K),
221 | np.median(UDR['Rake-MaxUDR'], axis=1) - UDR_ci['Rake-MaxUDR'],
222 | np.median(UDR['Rake-MaxUDR'], axis=1) + UDR_ci['Rake-MaxUDR'],
223 | color='grey',
224 | linewidth=0.3,
225 | edgecolor='k',
226 | alpha=0.7)
227 |
228 | # Hide right and top axes
229 | ax1.spines['top'].set_visible(False)
230 | ax1.spines['right'].set_visible(False)
231 | ax1.spines['bottom'].set_position(('outward', 10))
232 | ax1.spines['left'].set_position(('outward', 15))
233 | ax1.yaxis.set_ticks_position('left')
234 | ax1.xaxis.set_ticks_position('bottom')
235 |
236 | # Make ticks nicer
237 | ax1.xaxis.set_tick_params(width=.3, length=3)
238 | ax1.yaxis.set_tick_params(width=.3, length=3)
239 |
240 | # Make axis lines thinner
241 | for axis in ['bottom','left']:
242 | ax1.spines[axis].set_linewidth(0.3)
243 |
244 | # Set ticks fontsize
245 | plt.xticks(size=9)
246 | plt.yticks(size=9)
247 |
248 | # Set labels
249 | plt.xlabel(r'Number of images $K$', fontsize=10)
250 | plt.ylabel('Output UDR [dB]', fontsize=10)
251 | plt.tight_layout()
252 |
253 |
254 | plt.legend(beamformer_names, fontsize=7, loc='upper left', frameon=False, labelspacing=0)
255 |
256 | plt.savefig('figures/UDR_vs_K.pdf')
257 |
--------------------------------------------------------------------------------
/figure_Measures2.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib
3 | import constants
4 | matplotlib.use('TkAgg')
5 |
6 | import matplotlib.pyplot as plt
7 | import matplotlib.colors as colors
8 | import matplotlib.cm as cmx
9 |
10 | import Room as rg
11 | import beamforming as bf
12 | from scipy.io import wavfile
13 |
14 | # Room 1 : Shoe box
15 | p1 = np.array([0, 0])
16 | p2 = np.array([4, 6])
17 |
18 | # The first signal is Homer
19 | source1 = [1.2, 1.5]
20 |
21 | # the second signal is some speech
22 | source2 = [2.5, 2]
23 |
24 | # Some simulation parameters
25 | Fs = 44100
26 | absorption = 0.8
27 | max_order = 4
28 |
29 | # create a microphone array
30 | mic1 = [2, 3]
31 | M = 12
32 | d = 0.3
33 | freqs = np.arange(100,4000,200)
34 | sigma2 = 1e-3
35 |
36 | mics = bf.Beamformer.circular2D(Fs, mic1, M, 0, d)
37 | mics.frequencies = freqs
38 |
39 | # How many Monte Carlo runs?
40 | n_monte_carlo = 20000
41 |
42 | beamformer_names = ['DS',
43 | 'Max-SINR',
44 | 'Rake-DS',
45 | 'Rake-MaxSINR',
46 | 'Rake-MaxUDR']
47 | # 'Rake-OF']
48 | bf_weights_fun = [mics.rakeDelayAndSumWeights,
49 | mics.rakeMaxSINRWeights,
50 | mics.rakeDelayAndSumWeights,
51 | mics.rakeMaxSINRWeights,
52 | mics.rakeMaxUDRWeights]
53 | # mics.rakeOneForcingWeights]
54 |
55 | SNR = {}
56 | UDR = {}
57 | for bf in beamformer_names:
58 | SNR.update({bf: np.zeros((freqs.size, n_monte_carlo))})
59 | UDR.update({bf: np.zeros((freqs.size, n_monte_carlo))})
60 |
61 | K = 10
62 |
63 | # How many images are there in the first 15 generations?
64 | max_K = 1000
65 |
66 | for n in xrange(n_monte_carlo):
67 |
68 | # create the room with sources
69 | room1 = rg.Room.shoeBox2D(
70 | p1,
71 | p2,
72 | Fs,
73 | max_order=max_order,
74 | absorption=absorption)
75 |
76 | source1 = p1 + np.random.rand(2) * (p2 - p1)
77 | source2 = p1 + np.random.rand(2) * (p2 - p1)
78 |
79 | room1.addSource(source1)
80 | room1.addSource(source2)
81 |
82 | # Create different beamformers and evaluate corresponding performance measures
83 | for i_bf, bf in enumerate(beamformer_names):
84 |
85 |         if (bf == 'DS') or (bf == 'Max-SINR'):
86 | n_nearest = 1
87 | else:
88 | n_nearest = K+1
89 |
90 | bf_weights_fun[i_bf](room1.sources[0].getImages(n_nearest=n_nearest, ref_point=mics.center),
91 | room1.sources[1].getImages(n_nearest=n_nearest, ref_point=mics.center),
92 | R_n=sigma2 * np.eye(mics.M),
93 | ff=False,
94 | attn=True)
95 |
96 | room1.addMicrophoneArray(mics)
97 |
98 |         # TO DO: Average in dB or on a linear scale?
99 | for i_f, f in enumerate(freqs):
100 | SNR[bf][i_f][n] = mics.SNR(room1.sources[0].getImages(n_nearest=K+1, ref_point=mics.center),
101 | room1.sources[1].getImages(n_nearest=max_K+1, ref_point=mics.center),
102 | f,
103 | R_n=sigma2 * np.eye(mics.M),
104 | dB=True)
105 | UDR[bf][i_f][n] = mics.UDR(room1.sources[0].getImages(n_nearest=K+1, ref_point=mics.center),
106 | room1.sources[1].getImages(n_nearest=max_K+1, ref_point=mics.center),
107 | f,
108 | R_n=sigma2 * np.eye(mics.M),
109 | dB=True)
110 |
111 | print 'Computed for n =', n
112 |
113 | # Plot the results
117 | plt.figure(figsize=(4, 3))
118 |
119 | from itertools import cycle
120 | lines = ['-s','-o','-v','-D','->']
121 | linecycler = cycle(lines)
122 |
123 | newmap = plt.get_cmap('gist_heat')
124 | ax1 = plt.gca()
125 | ax1.set_color_cycle([newmap( k ) for k in np.linspace(0.25,0.9,len(beamformer_names))])
126 |
127 | for i, bf in enumerate(beamformer_names):
128 | p, = plt.plot(freqs,
129 | np.mean(SNR[bf], axis=1),
130 | next(linecycler),
131 | linewidth=1,
132 | markersize=4,
133 | markeredgewidth=.5)
134 |
135 | # Hide right and top axes
136 | ax1 = plt.gca()
137 | ax1.spines['top'].set_visible(False)
138 | ax1.spines['right'].set_visible(False)
139 | ax1.spines['bottom'].set_position(('outward', 10))
140 | ax1.spines['left'].set_position(('outward', 15))
141 | ax1.yaxis.set_ticks_position('left')
142 | ax1.xaxis.set_ticks_position('bottom')
143 |
144 | # Make ticks nicer
145 | ax1.xaxis.set_tick_params(width=.3, length=3)
146 | ax1.yaxis.set_tick_params(width=.3, length=3)
147 |
148 | # Make axis lines thinner
149 | for axis in ['bottom','left']:
150 | ax1.spines[axis].set_linewidth(0.3)
151 |
152 | # Set ticks fontsize
153 | plt.xticks(size=9)
154 | plt.yticks(size=9)
155 |
156 | # Set labels
157 | plt.xlabel(r'Frequency [Hz]', fontsize=10)
158 | plt.ylabel('Output SINR [dB]', fontsize=10)
159 | plt.tight_layout()
160 |
161 |
162 | plt.legend(beamformer_names, fontsize=7, loc='lower right', frameon=False, labelspacing=0)
163 |
164 | plt.savefig('figures/SINR_vs_freq.pdf')
165 |
166 |
167 |
168 |
--------------------------------------------------------------------------------
/figure_SumNorm.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import scipy.special as spfun
4 |
5 | import matplotlib
6 | import constants
7 |
8 | import matplotlib.colors as colors
9 | import matplotlib.cm as cmx
10 |
11 | matplotlib.use('TkAgg')
12 | import matplotlib.pyplot as plt
13 |
14 | import Room as rg
15 | import beamforming as bf
16 |
17 | # Room 1 : Shoe box
18 | p1 = np.array([0, 0])
19 | p2 = np.array([4, 6])
20 | mic1 = [2, 3]
21 | Fs = 44100
22 | absorption = 0.8
23 | max_order = 4
24 |
25 | # Parameters for the theoretical curve
26 | a = 5
27 | b = 10
28 | Delta = b-a
29 |
30 | # Create a microphone array
31 | M = 12
32 | d = 0.2
33 | frequencies = np.arange(25, 600, 5)
34 |
35 | mics = bf.Beamformer.linear2D(Fs, mic1, M, 0, d)
36 |
37 | K_list = [16, 8]
38 | n_monte_carlo = 1000
39 |
40 | SNR_gain = np.zeros((len(K_list), frequencies.size))
41 | SNR_gain_theory = np.zeros((len(K_list), frequencies.size))
42 |
43 | for i_K, K in enumerate(K_list):
44 | for i, f in enumerate(frequencies):
45 | print 'Simulating for the frequency', f
46 | for n in range(0, n_monte_carlo):
47 |
48 | # Generate a source at a random location. TO DO: Add a bounding box for
49 | # sources!
50 | source1 = p1 + np.random.rand(2) * (p2 - p1)
51 |
52 | # Create the room
53 | room1 = rg.Room.shoeBox2D(
54 | p1,
55 | p2,
56 | Fs,
57 | max_order=max_order,
58 | absorption=absorption)
59 | room1.addSource(source1)
60 | room1.addMicrophoneArray(mics)
61 |
62 | A = mics.steering_vector_2D_from_point(f, room1.sources[0].getImages(n_nearest=K+1, ref_point=mics.center), attn=False)
63 | SNR_gain[i_K][i] += np.linalg.norm(np.sum(A, axis=1))**2 / np.linalg.norm(A[:, 0])**2
64 |
65 | SNR_gain[i_K][i] /= n_monte_carlo
66 |
67 | m = np.arange(M)
68 | kappa = 2*np.pi*f / constants.c
69 | SNR_gain_theory[i_K][i] = np.sum(np.abs(A[0,:]))*np.sum(1 + 2*spfun.jv(0, m*d*kappa)**2 * (1-np.cos(Delta * kappa)) / (Delta * kappa)**2)/np.linalg.norm(A[:, 0])**2
70 |
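For reference, the theoretical curve computed just above is a direct transcription of the closed-form SNR gain expression from the paper's theorem as it is implemented here (a transcription of the code, not an independent derivation). With wavenumber $\kappa = 2\pi f / c$, spread $\Delta = b - a$, and $J_0$ the zeroth-order Bessel function of the first kind (`scipy.special.jv(0, .)`):

$$
G_{\text{theory}}(f) = \frac{\sum_{k=0}^{K} |A_{0,k}|}{\|A_{:,0}\|^2} \sum_{m=0}^{M-1} \left( 1 + 2\, J_0(m d \kappa)^2 \, \frac{1 - \cos(\Delta\kappa)}{(\Delta\kappa)^2} \right)
$$

where $A$ is the $M \times (K+1)$ matrix of steering vectors and $A_{:,0}$ its direct-path column.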
71 | # Plot the results
72 | plt.figure(figsize=(4, 2.5))
73 | ax1 = plt.gca()
74 |
75 | newmap = plt.get_cmap('gist_heat')
76 | ax1.set_color_cycle([newmap( k ) for k in np.linspace(0.25,0.8,2)])
77 |
78 | plt.plot(frequencies, 10*np.log10(SNR_gain.T))
79 | plt.plot(frequencies, 10*np.log10(SNR_gain_theory.T), 'o', markersize=2.5, markeredgewidth=.3)
80 |
81 | # Hide right and top axes
82 | ax1.spines['top'].set_visible(False)
83 | ax1.spines['right'].set_visible(False)
84 | ax1.spines['bottom'].set_position(('outward', 10))
85 | ax1.spines['left'].set_position(('outward', 15))
86 | ax1.yaxis.set_ticks_position('left')
87 | ax1.xaxis.set_ticks_position('bottom')
88 |
89 | # Make ticks nicer
90 | ax1.xaxis.set_tick_params(width=.3, length=3)
91 | ax1.yaxis.set_tick_params(width=.3, length=3)
92 |
93 | # Make axis lines thinner
94 | for axis in ['bottom','left']:
95 | ax1.spines[axis].set_linewidth(0.3)
96 |
97 | # Set ticks
98 | plt.xticks(size=9)
99 | plt.yticks(size=9)
100 |
101 | # Do the legend
102 | plt.legend([r'Simulation, $K=16$',
103 | r'Simulation, $K=8$',
104 | r'Theorem, $K=16$',
105 | r'Theorem, $K=8$'], fontsize=7, loc='upper right', frameon=False, labelspacing=0)
106 |
107 | # Set labels
108 | plt.xlabel(r'Frequency [Hz]', fontsize=10)
109 | plt.ylabel('SNR gain [dB]', fontsize=10)
110 | plt.tight_layout()
111 |
112 | plt.savefig('figures/SNR_gain.pdf')
113 |
114 |
--------------------------------------------------------------------------------
/figure_beam_scenarios.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import matplotlib
4 | import matplotlib.pyplot as plt
5 | from scipy.io import wavfile
6 | from scipy.signal import resample
7 |
8 | import Room as rg
9 | import beamforming as bf
10 | import windows
11 | import utilities as u
12 |
13 | # Beam pattern figure properties
14 | freq=[800, 1600]
15 | figsize=(4*1.88,2.24)
16 | xlim=[-4,8]
17 | ylim=[-5.2,10]
18 |
19 | # Some simulation parameters
20 | Fs = 8000
21 | t0 = 1./(Fs*np.pi*1e-2) # starting time function of sinc decay in RIR response
22 | absorption = 0.90
23 | max_order_sim = 10
24 | sigma2_n = 1e-7
25 |
26 | # Room 1 : Shoe box
27 | room_dim = [4, 6]
28 |
29 | # the good source is fixed for all
30 | good_source = [1, 4.5] # good source
31 | normal_interferer = [2.8, 4.3] # interferer
32 | hard_interferer = [1.5, 3] # interferer in direct path
33 |
34 | # microphone array design parameters
35 | mic1 = [2, 1.5] # position
36 | M = 8 # number of microphones
37 | d = 0.08 # distance between microphones
38 | phi = 0. # angle from horizontal
39 | max_order_design = 1 # maximum image generation used in design
40 | shape = 'Linear' # array shape
41 |
42 | # create a microphone array
43 | if shape == 'Circular':
44 | mics = bf.Beamformer.circular2D(Fs, mic1, M, phi, d*M/(2*np.pi))
45 | else:
46 | mics = bf.Beamformer.linear2D(Fs, mic1, M, phi, d)
47 |
48 | # define the array processing type
49 | L = 4096 # frame length
50 | hop = 2048 # hop between frames
51 | zp = 2048 # zero padding (front + back)
52 | mics.setProcessing('FrequencyDomain', L, hop, zp, zp)
53 |
54 | # The first signal (of interest) is singing
55 | rate1, signal1 = wavfile.read('samples/singing_'+str(Fs)+'.wav')
56 | signal1 = np.array(signal1, dtype=float)
57 | signal1 = u.normalize(signal1)
58 | signal1 = u.highpass(signal1, Fs)
59 | delay1 = 0.
60 |
61 | # the second signal (interferer) is some german speech
62 | rate2, signal2 = wavfile.read('samples/german_speech_'+str(Fs)+'.wav')
63 | signal2 = np.array(signal2, dtype=float)
64 | signal2 = u.normalize(signal2)
65 | signal2 = u.highpass(signal2, Fs)
66 | delay2 = 1.
67 |
68 | # create the room with sources and mics
69 | room1 = rg.Room.shoeBox2D(
70 | [0,0],
71 | room_dim,
72 | Fs,
73 | t0 = t0,
74 | max_order=max_order_sim,
75 | absorption=absorption,
76 | sigma2_awgn=sigma2_n)
77 |
78 | # add mic and good source to room
79 | room1.addSource(good_source, signal=signal1, delay=delay1)
80 | room1.addMicrophoneArray(mics)
81 |
82 | # start a figure
83 | fig = plt.figure(figsize=figsize)
84 |
85 | #rect = fig.patch
86 | #rect.set_facecolor('white')
87 | #rect.set_alpha(0.15)
88 |
89 | def nice_room_plot(label, leg=None):
90 | ax = plt.gca()
91 |
92 | room1.plot(img_order=np.minimum(room1.max_order, 1),
93 | freq=freq,
94 | xlim=xlim, ylim=ylim,
95 | autoscale_on=False)
96 |
97 | if leg is not None:
98 | l = ax.legend(leg, loc=(0.005,0.85), fontsize=7, frameon=False)
99 |
100 | ax.text(xlim[1]-1.1, ylim[1]-1.1, label, weight='bold')
101 |
102 | ax.axis('on')
103 | ax.tick_params(\
104 | axis='both', # changes apply to the x-axis
105 | which='both', # both major and minor ticks are affected
106 | bottom='off', # ticks along the bottom edge are off
107 | left='off',
108 | right='off',
109 | top='off', # ticks along the top edge are off
110 | labelbottom='off',
111 | labelleft='off') #
112 |
113 | ax.spines['right'].set_visible(False)
114 | ax.spines['left'].set_visible(False)
115 | ax.spines['bottom'].set_visible(False)
116 | ax.spines['top'].set_visible(False)
117 |
118 | ax.patch.set_facecolor('grey')
119 | ax.patch.set_alpha(0.15)
120 |     ax.patch.set_edgecolor('none')
121 |     ax.patch.set_linewidth(0)
124 |
125 |
126 | '''
127 | SCENARIO 1
128 | Only one source of interest
129 | Max-SINR
130 | '''
131 | print 'Scenario1...'
132 |
133 | # Compute the beamforming weights depending on room geometry
134 | good_sources = room1.sources[0].getImages(max_order=max_order_design)
135 | mics.rakeMaxSINRWeights(good_sources, None,
136 | R_n = sigma2_n*np.eye(mics.M),
137 | rcond=0.,
138 | attn=True, ff=False)
139 |
140 | # plot the room and beamformer
141 | ax = plt.subplot(1,4,1)
142 | nice_room_plot('A', leg=('800 Hz', '1600 Hz'))
143 |
144 | '''
145 | SCENARIO 2
146 | One source of interest and one interferer (easy)
147 | Max-SINR
148 | '''
149 | print 'Scenario2...'
150 |
151 | room1.addSource(normal_interferer, signal=signal2, delay=delay2)
152 |
153 | # Compute the beamforming weights depending on room geometry
154 | bad_sources = room1.sources[1].getImages(max_order=max_order_design)
155 | mics.rakeMaxSINRWeights(good_sources, bad_sources,
156 | R_n = sigma2_n*np.eye(mics.M),
157 | rcond=0.,
158 | attn=True, ff=False)
159 |
160 | # plot the room and beamformer
161 | ax = plt.subplot(1,4,2)
162 | nice_room_plot('B')
163 |
164 |
165 | '''
166 | SCENARIO 3
167 | One source of interest and one interferer (easy)
168 | Max-UDR (eSNR)
169 | '''
170 | print 'Scenario3...'
171 |
172 | # Compute the beamforming weights depending on room geometry
173 | mics.rakeMaxUDRWeights(good_sources, bad_sources,
174 | R_n = sigma2_n*np.eye(mics.M),
175 | attn=True, ff=False)
176 |
177 | # plot the room and beamformer
178 | plt.subplot(1,4,3)
179 | nice_room_plot('C')
180 |
181 | '''
182 | SCENARIO 4
183 | One source and one interferer in the direct path (hard)
184 | Max-SINR
185 | '''
186 | print 'Scenario4...'
187 |
188 | room1.sources.pop()
189 | room1.addSource(hard_interferer, signal=signal2, delay=delay2)
190 |
191 | # Compute the beamforming weights depending on room geometry
192 | bad_sources = room1.sources[1].getImages(max_order=max_order_design)
193 | mics.rakeMaxSINRWeights(good_sources, bad_sources,
194 | R_n = sigma2_n*np.eye(mics.M),
195 | rcond=0.,
196 | attn=True, ff=False)
197 |
198 | # plot the room and beamformer
199 | ax = plt.subplot(1,4,4)
200 | nice_room_plot('D')
201 |
202 | plt.subplots_adjust(left=0.0, right=1., bottom=0., top=1., wspace=0.05, hspace=0.02)
203 |
204 | fig.savefig('figures/beam_scenarios.pdf')
205 | fig.savefig('figures/beam_scenarios.png',dpi=300)
206 |
207 | plt.show()
208 |
209 |
--------------------------------------------------------------------------------
/figure_filter_avg_ir.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import matplotlib
4 | import matplotlib.pyplot as plt
5 | from scipy.io import wavfile
6 | from scipy.signal import resample
7 |
8 | import Room as rg
9 | import beamforming as bf
10 | import windows
11 | import utilities as u
12 |
13 | # Beam pattern figure properties
14 | freq=[800, 1600]
15 | figsize=(1.88,2.24)
16 | xlim=[-4,8]
17 | ylim=[-4.9,9.4]
18 |
19 | # Some simulation parameters
20 | Fs = 8000
21 | t0 = 1./(Fs*np.pi*1e-2) # starting time function of sinc decay in RIR response
22 | absorption = 0.90
23 | max_order_sim = 10
24 | sigma2_n = 1e-7
25 |
26 | # Room 1 : Shoe box
27 | room_dim = [4, 6]
28 |
29 | # the good source is fixed for all
30 | good_source = [1, 4.5] # good source
31 | normal_interferer = [3, 4] # interferer
32 | hard_interferer = [1.5, 3] # interferer in direct path
33 |
34 | # microphone array design parameters
35 | mic1 = [2, 1.5] # position
36 | M = 8 # number of microphones
37 | d = 0.08 # distance between microphones
38 | phi = 0. # angle from horizontal
39 | max_order_design = 1 # maximum image generation used in design
40 | shape = 'Linear' # array shape
41 |
42 | # create a microphone array
43 | if shape == 'Circular':
44 | mics = bf.Beamformer.circular2D(Fs, mic1, M, phi, d*M/(2*np.pi))
45 | else:
46 | mics = bf.Beamformer.linear2D(Fs, mic1, M, phi, d)
47 |
48 | # define the array processing type
49 | N = int(1.5*Fs) # frame length
50 | zero_padding_factor = 2
51 | mics.setProcessing('TimeDomain', N)
52 |
53 | # The first signal (of interest) is singing
54 | rate1, signal1 = wavfile.read('samples/singing_'+str(Fs)+'.wav')
55 | signal1 = np.array(signal1, dtype=float)
56 | signal1 = u.normalize(signal1)
57 | signal1 = u.highpass(signal1, Fs)
58 | delay1 = 0.
59 |
60 | # the second signal (interferer) is some german speech
61 | rate2, signal2 = wavfile.read('samples/german_speech_'+str(Fs)+'.wav')
62 | signal2 = np.array(signal2, dtype=float)
63 | signal2 = u.normalize(signal2)
64 | signal2 = u.highpass(signal2, Fs)
65 | delay2 = 1.
66 |
67 | # create the room with sources and mics
68 | room1 = rg.Room.shoeBox2D(
69 | [0,0],
70 | room_dim,
71 | Fs,
72 | t0 = t0,
73 | max_order=max_order_sim,
74 | absorption=absorption,
75 | sigma2_awgn=sigma2_n)
76 |
77 | # add mic and good source to room
78 | room1.addSource(good_source, signal=signal1, delay=delay1)
79 | room1.addSource(normal_interferer, signal=signal2, delay=delay2)
80 | room1.addMicrophoneArray(mics)
81 |
82 | # plot the room and beamformer
83 | fig = plt.figure(figsize=(4,3))
84 |
85 | # define a new set of colors for the beam patterns
86 | newmap = plt.get_cmap('autumn')
87 | desat = 0.7
88 | plt.gca().set_color_cycle([newmap(k) for k in desat*np.linspace(0,1,3)])
89 |
90 |
91 | '''
92 | BEAMFORMER 1
93 | Rake-MaxSINR
94 | '''
95 | print 'Beamformer 1...'
96 |
97 | # Compute the beamforming weights depending on room geometry
98 | good_sources = room1.sources[0].getImages(max_order=max_order_design)
99 | bad_sources = room1.sources[1].getImages(max_order=max_order_design)
100 | mics.rakeMaxSINRWeights(good_sources, bad_sources,
101 | R_n = sigma2_n*np.eye(mics.M),
102 | rcond=0.,
103 | attn=True, ff=False)
104 |
105 | mics.plot_IR(sum_ir=True, norm=1., zp=zero_padding_factor, linewidth=0.5)
106 |
107 | '''
108 | BEAMFORMER 2
109 | Rake-MaxUDR (eSNR)
110 | '''
111 | print 'Beamformer 2...'
112 |
113 | # Compute the beamforming weights depending on room geometry
114 | mics.rakeMaxUDRWeights(good_sources, bad_sources,
115 | R_n = sigma2_n*np.eye(mics.M),
116 | attn=True, ff=False)
117 |
118 | mics.plot_IR(sum_ir=True, norm=1., zp=zero_padding_factor, linewidth=0.5)
119 |
120 | '''
121 | BEAMFORMER 3
122 | MaxSINR (MVDR)
123 | '''
124 | print 'Beamformer 3...'
125 |
126 | # Compute the beamforming weights depending on room geometry
127 | mics.rakeMaxSINRWeights(room1.sources[0].getImages(max_order=0),
128 | room1.sources[1].getImages(max_order=0),
129 | R_n = sigma2_n*np.eye(mics.M),
130 | rcond=0.,
131 | attn=True, ff=False)
132 |
133 | mics.plot_IR(sum_ir=True, norm=1., zp=zero_padding_factor, linewidth=0.5)
134 |
135 | '''
136 | FINISH PLOT
137 | '''
138 |
139 |
140 | leg = ('Rake-MaxSINR', 'Rake-MaxUDR', 'MaxSINR')
141 | plt.legend(leg, fontsize=7, loc='upper left', frameon=False, labelspacing=0)
142 |
143 | # Hide right and top axes
144 | ax1 = plt.gca()
145 |
146 | # prepare axis
147 | #ax1.autoscale(tight=True, axis='x')
148 | ax1.spines['top'].set_visible(False)
149 | ax1.spines['right'].set_visible(False)
150 | ax1.spines['left'].set_visible(False)
151 | ax1.spines['bottom'].set_position(('outward', 5))
152 | ax1.yaxis.set_ticks_position('left')
153 | ax1.xaxis.set_ticks_position('bottom')
154 |
155 | # set x axis limit
156 | #ax1.set_xlim(0.5, 1.5)
157 |
158 | # Set ticks
159 | plt.xticks(np.arange(0, float(N)/Fs+1, 0.5), size=9)
160 | plt.xlim(0, 1.5)
161 | plt.yticks([])
162 |
163 | # Set labels
164 | plt.xlabel(r'Time [s]', fontsize=10)
165 | plt.ylabel('')
166 | plt.tight_layout()
167 |
168 | fig.savefig('figures/AvgIR.pdf')
169 |
170 | # show all plots
171 | plt.show()
172 |
--------------------------------------------------------------------------------
/figure_quality.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # This script dispatches the perceptual quality evaluation
4 | # to multiple processes to use most of the available computing resources.
5 |
6 | LOOPS=1000
7 |
8 | # dispatch 11 simulation batches, one per screen session
9 | for i in {1..11}
10 | do
11 | echo python figure_quality_sim.py ${i} ${LOOPS}
12 | screen -d -m python figure_quality_sim.py ${i} ${LOOPS}
13 | done
14 |
--------------------------------------------------------------------------------
/figure_quality_plot.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | import utilities as u
4 | import metrics as metrics
5 |
6 | import sys
7 | import os
8 | import fnmatch
9 |
10 | max_sources = 11
11 | sim_data_dir = './sim_data/'
12 |
13 | beamformer_names = ['Rake-DS',
14 | 'Rake-MaxSINR',
15 | 'Rake-MaxUDR']
16 | bf_dict = dict(zip(beamformer_names,
17 | range(len(beamformer_names))))
18 | NBF = len(beamformer_names)
19 |
20 | loops = 0
21 |
22 | if len(sys.argv) == 1:
23 |     # if no argument is specified, use all available files
24 |     name_pattern = 'quality_2015*.npz'
25 |     files = [sim_data_dir + file for file in os.listdir(sim_data_dir) if fnmatch.fnmatch(file, name_pattern)]
26 | else:
27 | files = sys.argv[1:]
28 |
29 | # Empty data containers
30 | good_source = np.zeros((0,2))
31 | bad_source = np.zeros((0,2))
32 | ipesq = np.zeros((0,2))
33 | opesq_tri = np.zeros((0,2,2))
34 | opesq_bf = np.zeros((0,2,NBF,max_sources))
35 | isinr = np.zeros((0))
36 | osinr_tri = np.zeros((0,2))
37 | osinr_bf = np.zeros((0,NBF,max_sources))
38 |
39 | # Read in all the data
40 | for fname in files:
41 | print 'Loading from',fname
42 |
43 | a = np.load(fname)
44 |
45 | good_source = np.concatenate((good_source, a['good_source']), axis=0)
46 | bad_source = np.concatenate((bad_source, a['bad_source']), axis=0)
47 |
48 | isinr = np.concatenate((isinr,u.dB(a['isinr'])), axis=0)
49 | osinr_bf = np.concatenate((osinr_bf,u.dB(a['osinr_bf'])), axis=0)
50 | osinr_tri = np.concatenate((osinr_tri,u.dB(a['osinr_trinicon'])), axis=0)
51 | ipesq = np.concatenate((ipesq,a['pesq_input']), axis=0)
52 | opesq_bf = np.concatenate((opesq_bf,a['pesq_bf']), axis=0)
53 | opesq_tri = np.concatenate((opesq_tri,a['pesq_trinicon']), axis=0)
54 |
55 | loops = good_source.shape[0]
56 |
57 | print 'Number of loops:',loops
58 | print 'Median input Raw MOS',np.median(ipesq[:,0])
59 | print 'Median input MOS LQO',np.median(ipesq[:,1])
60 | print 'Median input SINR',np.median(isinr[:])
61 |
62 | # Trinicon is blind so we have PESQ for both output channels
63 | # Select the channel with the highest Raw MOS for evaluation
64 | I_tri = np.argmax(opesq_tri[:,0,:], axis=1)
65 | opesq_tri_max = np.array([opesq_tri[i,:,I_tri[i]] for i in xrange(opesq_tri.shape[0])])
66 | osinr_tri_max = np.array([osinr_tri[i,I_tri[i]] for i in xrange(osinr_tri.shape[0])])
67 |
68 | print 'Median Trinicon Raw MOS',np.median(opesq_tri_max[:,0])
69 | print 'Median Trinicon MOS LQO',np.median(opesq_tri_max[:,1])
70 | print 'Median Trinicon SINR',np.median(osinr_tri_max[:])
71 |
72 | def nice_plot(x, ylabel, bf_order=None):
73 | '''
74 |     Plot the data in a consistent style across figures
75 | '''
76 |
77 | if bf_order is None:
78 | bf_order = beamformer_names
79 |
80 | ax1 = plt.gca()
81 |
82 | newmap = plt.get_cmap('gist_heat')
83 | from itertools import cycle
84 |
85 | # totally a hack to get the same line styles as Fig6/7
86 | lines = ['-D','-v','->','-s','-o']
87 | linecycler = cycle(lines)
88 |
89 |     # reorder the colors to match the line styles of Fig. 6/7
90 | map1 = [newmap( k ) for k in np.linspace(0.25,0.9,5)]
91 | map2 = [map1[3],map1[2],map1[4],map1[0],map1[1]]
92 |
93 | ax1.set_color_cycle(map2)
94 |
95 | # no clipping of the beautiful markers
96 | plt.setp(ax1,'clip_on',False)
97 |
98 | for bf in bf_order:
99 | i = bf_dict[bf]
100 | p, = plt.plot(range(0, max_sources),
101 | np.median(x[:,i,:], axis=0),
102 | next(linecycler),
103 | linewidth=1,
104 | markersize=4,
105 | markeredgewidth=.5,
106 | clip_on=False)
107 |
108 | if bf == 'Rake-MaxSINR':
109 | plt.fill_between(range(0, max_sources),
110 | np.percentile(x[:,i,:], 25, axis=0),
111 | np.percentile(x[:,i,:], 75, axis=0),
112 | color='grey',
113 | linewidth=0.3,
114 | edgecolor='k',
115 | alpha=0.7)
116 |
117 | # Hide right and top axes
118 | ax1.spines['top'].set_visible(False)
119 | ax1.spines['right'].set_visible(False)
120 | ax1.spines['bottom'].set_position(('outward', 10))
121 | ax1.spines['left'].set_position(('outward', 15))
122 | ax1.yaxis.set_ticks_position('left')
123 | ax1.xaxis.set_ticks_position('bottom')
124 |
125 | # Make ticks nicer
126 | ax1.xaxis.set_tick_params(width=.3, length=3)
127 | ax1.yaxis.set_tick_params(width=.3, length=3)
128 |
129 | # Make axis lines thinner
130 | for axis in ['bottom','left']:
131 | ax1.spines[axis].set_linewidth(0.3)
132 |
133 | # Set ticks fontsize
134 | plt.xticks(size=9)
135 | plt.yticks(size=9)
136 |
137 | # Set labels
138 | plt.xlabel(r'Number of images $K$', fontsize=10)
139 | plt.ylabel(ylabel, fontsize=10)
140 |
141 | plt.legend(bf_order, fontsize=7, loc='upper left', frameon=False, labelspacing=0)
142 |
143 |
144 | '''
145 | # Here is a larger figure with all performance measures.
146 | plt.figure(figsize=(12,6))
147 |
148 | plt.subplot(2,3,1)
149 | nice_plot(opesq_bf[:,0,:,:], 'PESQ [Raw MOS]')
150 | plt.xlabel('Number of sources')
151 | plt.ylabel('Raw MOS')
152 |
153 | plt.subplot(2,3,2)
154 | nice_plot(opesq_bf[:,1,:,:], 'PESQ [MOS LQO]')
155 |
156 | plt.subplot(2,3,3)
157 | nice_plot(osinr_bf, 'SINR [dB]')
158 | plt.xlabel('Number of sources')
159 | plt.ylabel('output SINR')
160 |
161 | plt.subplot(2,3,4)
162 | nice_plot(opesq_bf[:,0,:,:] - ipesq[:,0,np.newaxis,np.newaxis], 'Improvement PESQ [Raw MOS]')
163 | plt.xlabel('Number of sources')
164 | plt.ylabel('Improvement Raw MOS')
165 |
166 | plt.subplot(2,3,5)
167 | nice_plot(opesq_bf[:,1,:,:] - ipesq[:,1,np.newaxis,np.newaxis], 'Improvement PESQ [MOS LQO]')
168 | plt.xlabel('Number of sources')
169 | plt.ylabel('Improvement MOS LQO')
170 |
171 | plt.subplot(2,3,6)
172 | nice_plot(osinr_bf[:,:,:] - isinr[:,np.newaxis,np.newaxis], 'Improvement SINR [dB]')
173 | plt.xlabel('Number of sources')
174 | plt.ylabel('Improvement SINR')
175 |
176 | plt.tight_layout(pad=0.2)
177 | '''
178 |
179 | # Here we plot the figure used in the paper (Fig. 10)
180 | plt.figure(figsize=(4,3))
181 | nice_plot(opesq_bf[:,0,:,:], 'PESQ [MOS]',
182 | bf_order=['Rake-MaxSINR','Rake-DS','Rake-MaxUDR'])
183 | #plt.plot(np.arange(max_sources), np.median(ipesq[:,0])*np.ones(max_sources))
184 | #plt.plot(np.arange(max_sources), np.median(opesq_tri_max[:,0])*np.ones(max_sources))
185 | plt.tight_layout()
186 | plt.savefig('figures/perceptual_quality.pdf')
187 |
188 |
--------------------------------------------------------------------------------
/figure_quality_sim.py:
--------------------------------------------------------------------------------
1 |
2 | def perceptual_quality_evaluation(good_source, bad_source):
3 | '''
4 | Perceputal Quality evaluation simulation
5 | Inner Loop
6 | '''
7 |
8 | # Imports are done in the function so that it can be easily
9 | # parallelized
10 | import numpy as np
11 | from scipy.io import wavfile
12 | from scipy.signal import resample
13 | from os import getpid
14 |
15 | from Room import Room
16 | from beamforming import Beamformer, MicrophoneArray
17 | from trinicon import trinicon
18 |
19 | from utilities import normalize, to_16b, highpass
20 | from phat import time_align
21 | from metrics import snr, pesq
22 |
23 |     # the different numbers of image sources to rake over
24 | n_sources = np.arange(1,12)
25 | S = n_sources.shape[0]
26 |
27 |     # the speech samples used
28 | speech_sample1 = 'samples/fq_sample1_8000.wav'
29 | speech_sample2 = 'samples/fq_sample2_8000.wav'
30 |
31 | # Some simulation parameters
32 | Fs = 8000
33 | t0 = 1./(Fs*np.pi*1e-2) # starting time function of sinc decay in RIR response
34 | absorption = 0.90
35 | max_order_sim = 10
36 | SNR_at_mic = 20 # SNR at center of microphone array in dB
37 |
38 | # Room 1 : Shoe box
39 | room_dim = [4, 6]
40 |
41 | # microphone array design parameters
42 | mic1 = [2, 1.5] # position
43 | M = 8 # number of microphones
44 | d = 0.08 # distance between microphones
45 | phi = 0. # angle from horizontal
46 | shape = 'Linear' # array shape
47 |
48 | # create a microphone array
49 |     if shape == 'Circular':
50 | mics = Beamformer.circular2D(Fs, mic1, M, phi, d*M/(2*np.pi))
51 | else:
52 | mics = Beamformer.linear2D(Fs, mic1, M, phi, d)
53 |
54 | # create a single reference mic at center of array
55 | ref_mic = MicrophoneArray(mics.center, Fs)
56 |
57 | # define the array processing type
58 | L = 4096 # frame length
59 | hop = 2048 # hop between frames
60 | zp = 2048 # zero padding (front + back)
61 | mics.setProcessing('FrequencyDomain', L, hop, zp, zp)
62 |
63 | # data receptacles
64 | beamformer_names = ['Rake-DS',
65 | 'Rake-MaxSINR',
66 | 'Rake-MaxUDR']
67 | bf_weights_fun = [mics.rakeDelayAndSumWeights,
68 | mics.rakeMaxSINRWeights,
69 | mics.rakeMaxUDRWeights]
70 | bf_fnames = ['1','2','3']
71 | NBF = len(beamformer_names)
72 |
73 | # receptacle arrays
74 | pesq_input = np.zeros(2)
75 | pesq_trinicon = np.zeros((2,2))
76 | pesq_bf = np.zeros((2,NBF,S))
77 | isinr = 0
78 | osinr_trinicon = np.zeros(2)
79 | osinr_bf = np.zeros((NBF,S))
80 |
81 |     # since we may run multiple processes, we need to uniquely identify filenames
82 | pid = str(getpid())
83 |
84 | file_ref = 'output_samples/fqref' + pid + '.wav'
85 | file_suffix = '-' + pid + '.wav'
86 | files_tri = ['output_samples/fqt' + str(i+1) + file_suffix for i in xrange(2)]
87 | files_bf = ['output_samples/fq' + str(i+1) + file_suffix for i in xrange(NBF)]
88 | file_raw = 'output_samples/fqraw' + pid + '.wav'
89 |
90 | # Read the two speech samples used
91 | rate, good_signal = wavfile.read(speech_sample1)
92 | good_signal = np.array(good_signal, dtype=float)
93 | good_signal = normalize(good_signal)
94 | good_signal = highpass(good_signal, rate)
95 | good_len = good_signal.shape[0]/float(Fs)
96 |
97 | rate, bad_signal = wavfile.read(speech_sample2)
98 | bad_signal = np.array(bad_signal, dtype=float)
99 | bad_signal = normalize(bad_signal)
100 | bad_signal = highpass(bad_signal, rate)
101 | bad_len = bad_signal.shape[0]/float(Fs)
102 |
103 | # variance of good signal
104 | good_sigma2 = np.mean(good_signal**2)
105 |
106 |     # scale the interference signal to have the same power as the desired signal
107 |     bad_signal *= np.sqrt(good_sigma2/np.mean(bad_signal**2))
108 |
109 |     # distance from the array center to the good source
110 |     good_distance = np.linalg.norm(mics.center[:,0] - np.array(good_source))
111 |
112 |     # distance from the array center to the bad source
113 |     bad_distance = np.linalg.norm(mics.center[:,0] - np.array(bad_source))
114 |
115 | if good_len > bad_len:
116 | good_delay = 0
117 | bad_delay = (good_len - bad_len)/2.
118 | else:
119 | bad_delay = 0
120 | good_delay = (bad_len - good_len)/2.
121 |
122 | # compute the noise variance at center of array wrt good signal and SNR
123 | sigma2_n = good_sigma2/(4*np.pi*good_distance)**2/10**(SNR_at_mic/10)
124 |
125 | # create the reference room for freespace, noisless, no interference simulation
126 | ref_room = Room.shoeBox2D(
127 | [0,0],
128 | room_dim,
129 | Fs,
130 | t0 = t0,
131 | max_order=0,
132 | absorption=absorption,
133 | sigma2_awgn=0.)
134 | ref_room.addSource(good_source, signal=good_signal, delay=good_delay)
135 | ref_room.addMicrophoneArray(ref_mic)
136 | ref_room.compute_RIR()
137 | ref_room.simulate()
138 | reference = ref_mic.signals[0]
139 | reference_n = normalize(reference)
140 |
141 | # save the reference desired signal
142 | wavfile.write(file_ref, Fs, to_16b(reference_n))
143 |
144 | # create the 'real' room with sources and mics
145 | room1 = Room.shoeBox2D(
146 | [0,0],
147 | room_dim,
148 | Fs,
149 | t0 = t0,
150 | max_order=max_order_sim,
151 | absorption=absorption,
152 | sigma2_awgn=sigma2_n)
153 |
154 | # add sources to room
155 | room1.addSource(good_source, signal=good_signal, delay=good_delay)
156 | room1.addSource(bad_source, signal=bad_signal, delay=bad_delay)
157 |
158 | # Record first the degraded signal at reference mic (center of array)
159 | room1.addMicrophoneArray(ref_mic)
160 | room1.compute_RIR()
161 | room1.simulate()
162 | raw_n = normalize(highpass(ref_mic.signals[0], Fs))
163 |
164 | # save degraded reference signal
165 | wavfile.write(file_raw, Fs, to_16b(raw_n))
166 |
167 | # Compute PESQ and SINR of raw degraded reference signal
168 | isinr = snr(reference_n, raw_n[:reference_n.shape[0]])
169 | pesq_input[:] = pesq(file_ref, file_raw, Fs=Fs).T
170 |
171 | # Now record input of microphone array
172 | room1.addMicrophoneArray(mics)
173 | room1.compute_RIR()
174 | room1.simulate()
175 |
176 | # Run the Trinicon algorithm
177 | double_sig = mics.signals.copy()
178 | for i in xrange(2):
179 | double_sig = np.concatenate((double_sig, mics.signals), axis=1)
180 | sig_len = mics.signals.shape[1]
181 | output_trinicon = trinicon(double_sig)[:,-sig_len:]
182 |
183 | # normalize time-align and save to file
184 | output_tri1 = normalize(highpass(output_trinicon[0,:], Fs))
185 | output_tri1 = time_align(reference_n, output_tri1)
186 | wavfile.write(files_tri[0], Fs, to_16b(output_tri1))
187 | output_tri2 = normalize(highpass(output_trinicon[1,:], Fs))
188 | output_tri2 = time_align(reference_n, output_tri2)
189 | wavfile.write(files_tri[1], Fs, to_16b(output_tri2))
190 |
191 | # evaluate
192 |     # Measure PESQ and SINR for both output signals; we'll sort them out later
193 | pesq_trinicon = pesq(file_ref, files_tri, Fs=Fs)
194 | osinr_trinicon[0] = snr(reference_n, output_tri1)
195 | osinr_trinicon[1] = snr(reference_n, output_tri2)
196 |
197 | # Run all the beamformers
198 | for k,s in enumerate(n_sources):
199 |
200 | '''
201 | BEAMFORMING PART
202 | '''
203 | # Extract image sources locations and create noise covariance matrix
204 | good_sources = room1.sources[0].getImages(n_nearest=s,
205 | ref_point=mics.center)
206 | bad_sources = room1.sources[1].getImages(n_nearest=s,
207 | ref_point=mics.center)
208 | Rn = sigma2_n*np.eye(mics.M)
209 |
210 | # run for all beamformers considered
211 | for i, bfr in enumerate(beamformer_names):
212 |
213 | # compute the beamforming weights
214 | bf_weights_fun[i](good_sources, bad_sources,
215 |                     R_n = Rn,
216 | attn=True, ff=False)
217 |
218 | output = mics.process()
219 | output = normalize(highpass(output, Fs))
220 | output = time_align(reference_n, output)
221 |
222 | # save files for PESQ evaluation
223 | wavfile.write(files_bf[i], Fs, to_16b(output))
224 |
225 | # compute output SINR
226 | osinr_bf[i,k] = snr(reference_n, output)
227 |
228 | # compute PESQ
229 | pesq_bf[:,i,k] = pesq(file_ref, files_bf[i], Fs=Fs).T
230 |
231 | # end of beamformers loop
232 |
233 | # end of number of sources loop
234 |
235 | return pesq_input, pesq_trinicon, pesq_bf, isinr, osinr_trinicon, osinr_bf
236 |
237 |
238 |
239 | if __name__ == '__main__':
240 |
241 | import numpy as np
242 | import sys
243 | import time
244 |
245 | if len(sys.argv) == 3 and sys.argv[1] == '-s':
246 | parallel = False
247 | Loops = int(sys.argv[2])
248 | elif len(sys.argv) == 2:
249 | parallel = True
250 | Loops = int(sys.argv[1])
251 | else:
252 |         print 'Usage: ipython figure_quality_sim.py -- [-s] Loops'
253 |         print '  -s: serial loop (no parallelism); Loops: number of Monte Carlo runs.'
254 | sys.exit(0)
255 |
256 |     # we restrict sources to a bounding box at least 1m away from every wall and from the array
257 | bbox_size = np.array([[2.,2.5]])
258 | bbox_origin = np.array([[1.,2.5]])
259 |
260 | # draw all target and interferer at random
261 | good_source = np.random.random((Loops,2))*bbox_size + bbox_origin
262 | bad_source = np.random.random((Loops,2))*bbox_size + bbox_origin
263 |
264 | # start timing simulation
265 | start = time.time()
266 |
267 | if parallel is True:
268 | # Launch many workers!
269 | from IPython import parallel
270 |
271 | # setup parallel computation env
272 | c = parallel.Client()
273 | print c.ids
274 | c.blocks = True
275 | view = c.load_balanced_view()
276 |
277 | out = view.map_sync(perceptual_quality_evaluation, good_source, bad_source)
278 |
279 | else:
280 | # Just one boring loop...
281 | out = []
282 | for i in xrange(Loops):
283 | out.append(perceptual_quality_evaluation(good_source[i,:], bad_source[i,:]))
284 |
285 |     # How long did the simulation take?
286 |     elapsed = time.time() - start
287 |
288 |     print('Time elapsed: ' + str(elapsed))
290 |
291 | # recover all the data
292 | pesq_input = np.array([o[0] for o in out])
293 | pesq_trinicon = np.array([o[1] for o in out])
294 | pesq_bf = np.array([o[2] for o in out])
295 | isinr = np.array([o[3] for o in out])
296 | osinr_trinicon = np.array([o[4] for o in out])
297 | osinr_bf = np.array([o[5] for o in out])
298 |
299 | # save the simulation results to file
300 | filename = 'sim_data/quality_' + time.strftime('%Y%m%d-%H%M%S') + '.npz'
301 | np.savez_compressed(filename, good_source=good_source, bad_source=bad_source,
302 | isinr=isinr, osinr_bf=osinr_bf, osinr_trinicon=osinr_trinicon,
303 | pesq_bf=pesq_bf, pesq_input=pesq_input, pesq_trinicon=pesq_trinicon)
304 |
305 |
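The parallel branch above relies on the legacy `IPython.parallel` interface and assumes a cluster of engines is already running, typically started with `ipcluster start -n <workers>`. A minimal sketch of the same dispatch pattern, with a toy `square` function standing in for `perceptual_quality_evaluation`:

```python
# Minimal sketch of the load-balanced dispatch used above (legacy IPython.parallel API).
# Assumes engines were started beforehand, e.g. with `ipcluster start -n 8`.
from IPython import parallel

def square(x):
    return x * x

c = parallel.Client()          # connect to the running engines
view = c.load_balanced_view()  # scheduler that balances tasks across engines
results = view.map_sync(square, range(16))  # blocking parallel map, ordered results
print results
```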
--------------------------------------------------------------------------------
/figure_spectrograms.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import matplotlib
4 | import matplotlib.pyplot as plt
5 | from scipy.io import wavfile
6 | from scipy.signal import resample
7 |
8 | import Room as rg
9 | import beamforming as bf
10 |
11 | from constants import eps
12 | from stft import stft, spectroplot
13 | import windows
14 | import utilities as u
15 |
16 | # Spectrogram figure properties
17 | figsize=(7.87, 1.65) # figure size
18 | figsize2=(7.87, 1.5*1.65) # figure size
19 | fft_size = 512 # fft size for analysis
20 | fft_hop = 8 # hop between analysis frame
21 | fft_zp = 512
22 | analysis_window = np.concatenate((windows.hann(fft_size), np.zeros(fft_zp)))
23 | t_cut = 0.83 # length in [s] to remove at end of signal (no sound)
24 |
25 | # Some simulation parameters
26 | Fs = 8000
27 | t0 = 1./(Fs*np.pi*1e-2) # starting time function of sinc decay in RIR response
28 | absorption = 0.90
29 | max_order_sim = 10
30 | SNR_at_mic = 20 # SNR at center of microphone array in dB
31 |
32 | # Room 1 : Shoe box
33 | room_dim = [4, 6]
34 |
35 | # the good source is fixed for all
36 | good_source = [1, 4.5] # good source
37 | normal_interferer = [2.8, 4.3] # interferer
38 |
39 | # microphone array design parameters
40 | mic1 = [2, 1.5] # position
41 | M = 8 # number of microphones
42 | d = 0.08 # distance between microphones
43 | phi = 0. # angle from horizontal
44 | design_order_good = 3 # maximum image generation used in design
45 | design_order_bad = 3 # maximum image generation used in design
46 | shape = 'Linear' # array shape
47 |
48 | # create a microphone array
49 | if shape == 'Circular':
50 | mics = bf.Beamformer.circular2D(Fs, mic1, M, phi, d*M/(2*np.pi))
51 | else:
52 | mics = bf.Beamformer.linear2D(Fs, mic1, M, phi, d)
53 |
54 | # define the array processing type
55 | L = 4096 # frame length
56 | hop = 2048 # hop between frames
57 | zp = 2048 # zero padding (front + back)
58 | mics.setProcessing('FrequencyDomain', L, hop, zp, zp)
59 |
60 | # The first signal (of interest) is singing
61 | rate1, signal1 = wavfile.read('samples/singing_'+str(Fs)+'.wav')
62 | signal1 = np.array(signal1, dtype=float)
63 | signal1 = u.normalize(signal1)
64 | signal1 = u.highpass(signal1, Fs)
65 | delay1 = 0.
66 |
67 | # the second signal (interferer) is some german speech
68 | rate2, signal2 = wavfile.read('samples/german_speech_'+str(Fs)+'.wav')
69 | signal2 = np.array(signal2, dtype=float)
70 | signal2 = u.normalize(signal2)
71 | signal2 = u.highpass(signal2, Fs)
72 | delay2 = 1.
73 |
74 | # compute the noise variance at center of array wrt signal1 and SNR
75 | sigma2_signal1 = np.mean(signal1**2)
76 | distance = np.linalg.norm(mics.center[:,0] - np.array(good_source))
77 | sigma2_n = sigma2_signal1/(4*np.pi*distance)**2/10**(SNR_at_mic/10)
78 |
79 | # create the room with sources and mics
80 | room1 = rg.Room.shoeBox2D(
81 | [0,0],
82 | room_dim,
83 | Fs,
84 | t0 = t0,
85 | max_order=max_order_sim,
86 | absorption=absorption,
87 | sigma2_awgn=sigma2_n)
88 |
89 | # add mic and sources to room
90 | room1.addSource(good_source, signal=signal1, delay=delay1)
91 | room1.addSource(normal_interferer, signal=signal2, delay=delay2)
92 | room1.addMicrophoneArray(mics)
93 |
94 | # Compute RIR and simulate propagation of signals
95 | room1.compute_RIR()
96 | room1.simulate()
97 |
98 | '''
99 | BEAMFORMER 1: Max SINR
100 | '''
101 | print 'Max SINR...'
102 |
103 | # Compute the beamforming weights depending on room geometry
104 | good_sources = room1.sources[0].getImages(max_order=0)
105 | bad_sources = room1.sources[1].getImages(max_order=0)
106 | mics.rakeMaxSINRWeights(good_sources, bad_sources,
107 | R_n = sigma2_n*np.eye(mics.M),
108 | rcond=0.,
109 | attn=True, ff=False)
110 |
111 | output_mvdr = mics.process()
112 |
113 | # high-pass and normalize
114 | output_mvdr = u.highpass(output_mvdr, Fs)
115 | output_mvdr = u.normalize(output_mvdr)
116 |
117 | '''
118 | BEAMFORMER 2: Rake MaxSINR
119 | '''
120 | print 'Rake MaxSINR...'
121 |
122 |
123 | # Compute the beamforming weights depending on room geometry
124 | good_sources = room1.sources[0].getImages(max_order=design_order_good)
125 | bad_sources = room1.sources[1].getImages(max_order=design_order_bad)
126 | mics.rakeMaxSINRWeights(good_sources, bad_sources,
127 | R_n = sigma2_n*np.eye(mics.M),
128 | rcond=0.,
129 | attn=True, ff=False)
130 |
131 | output_maxsinr = mics.process()
132 |
133 | # high-pass and normalize
134 | output_maxsinr = u.highpass(output_maxsinr, Fs)
135 | output_maxsinr = u.normalize(output_maxsinr)
136 |
137 | '''
138 | PLOT SPECTROGRAM
139 | '''
140 |
141 | dSNR = u.dB(room1.dSNR(mics.center[:,0], source=0), power=True)
142 | print 'The direct SNR for good source is ' + str(dSNR)
143 |
144 | # for comparison, pick the central microphone signal
145 | input_mic = mics.signals[mics.M/2]
146 |
147 | # high-pass and normalize
148 | input_mic = u.highpass(input_mic, Fs)
149 | input_mic = u.normalize(input_mic)
150 |
151 | # remove a bit of signal at the end and time-align all signals.
152 | # the delays were visually measured by plotting the signals
153 | n_lim = int(np.ceil(len(input_mic) - t_cut*Fs))
154 | input_clean = signal1[:n_lim]
155 | input_mic = input_mic[105:n_lim+105]
156 | output_mvdr = output_mvdr[31:n_lim+31]
157 | output_maxsinr = output_maxsinr[31:n_lim+31]
158 |
159 | # save all files for listening test
160 | wavfile.write('output_samples/input_mic.wav', Fs, input_mic)
161 | wavfile.write('output_samples/output_maxsinr.wav', Fs, output_mvdr)
162 | wavfile.write('output_samples/output_rake-maxsinr.wav', Fs, output_maxsinr)
163 |
164 | # compute time-frequency planes
165 | F0 = stft(input_clean, fft_size, fft_hop,
166 | win=analysis_window,
167 | zp_back=fft_zp)
168 | F1 = stft(input_mic, fft_size, fft_hop,
169 | win=analysis_window,
170 | zp_back=fft_zp)
171 | F2 = stft(output_mvdr, fft_size, fft_hop,
172 | win=analysis_window,
173 | zp_back=fft_zp)
174 | F3 = stft(output_maxsinr, fft_size, fft_hop,
175 | win=analysis_window,
176 | zp_back=fft_zp)
177 |
178 | # (not so) fancy way to set the scale to avoid having the spectrum
179 | # dominated by a few outliers
180 | p_min = 7
181 | p_max = 100
182 | all_vals = np.concatenate((u.dB(F1+eps),
183 | u.dB(F2+eps),
184 | u.dB(F3+eps),
185 | u.dB(F0+eps))).flatten()
186 | vmin, vmax = np.percentile(all_vals, [p_min, p_max])
187 |
188 | #cmap = 'afmhot'
189 | interpolation='sinc'
190 | cmap = 'Purples'
191 | #cmap = 'YlGnBu'
192 | #cmap = 'PuRd'
193 | cmap = 'binary'
194 | #interpolation='none'
195 |
196 | # We want to blow up some parts of the spectrogram to highlight differences
197 | # Define some boxes here
198 | from matplotlib.patches import Circle, Wedge, Polygon
199 | from matplotlib.collections import PatchCollection
200 | import matplotlib.pyplot as plt
201 | top = F0.shape[1]/2+1
202 | end = F0.shape[0]
203 | x1 = np.floor(end*np.array([0.045, 0.13]))
204 | y1 = np.floor(top*np.array([0.74, 0.908]))
205 | box1 = [[x1[0],y1[0]],[x1[0],y1[1]],[x1[1],y1[1]],[x1[1],y1[0]],[x1[0],y1[0]]]
206 |
207 | x2 = np.floor(end*np.array([0.50, 0.66]))
208 | y2 = np.floor(top*np.array([0.84, 0.96]))
209 | box2 = [[x2[0],y2[0]],[x2[0],y2[1]],[x2[1],y2[1]],[x2[1],y2[0]],[x2[0],y2[0]]]
210 |
211 | x3 = np.floor(end*np.array([0.48, 0.64]))
212 | y3 = np.floor(top*np.array([0.44, 0.56]))
213 | box3 = [[x3[0],y3[0]],[x3[0],y3[1]],[x3[1],y3[1]],[x3[1],y3[0]],[x3[0],y3[0]]]
214 |
215 | boxes = [Polygon(box1, True, fill=False, facecolor='none'),
216 | Polygon(box2, True, fill=False, facecolor='none'),
217 | Polygon(box3, True, fill=False, facecolor='none'),]
218 | ec=np.array([0,0,0])
219 | lw = 0.5
220 |
221 | # Draw first the spectrograms with boxes on top
222 | fig, ax = plt.subplots(figsize=figsize2, nrows=2, ncols=4)
223 |
224 | ax = plt.subplot(2,4,1)
225 | spectroplot(F0.T, fft_size+fft_zp, fft_hop, Fs, vmin=vmin, vmax=vmax,
226 | cmap=plt.get_cmap(cmap), interpolation=interpolation, colorbar=False)
227 | ax.add_collection(PatchCollection(boxes, facecolor='none', edgecolor=ec, linewidth=lw))
228 | ax.text(F0.shape[0]-300, F0.shape[1]/2-60, 'A', weight='bold')
229 | ax.set_ylabel('')
230 | ax.set_xlabel('')
231 | aspect = ax.get_aspect()
232 | ax.axis('off')
233 |
234 | ax = plt.subplot(2,4,2)
235 | spectroplot(F1.T, fft_size+fft_zp, fft_hop, Fs, vmin=vmin, vmax=vmax,
236 | cmap=plt.get_cmap(cmap), interpolation=interpolation, colorbar=False)
237 | ax.add_collection(PatchCollection(boxes, facecolor='none', edgecolor=ec, linewidth=lw))
238 | ax.text(F0.shape[0]-300, F0.shape[1]/2-60, 'B', weight='bold')
239 | ax.set_ylabel('')
240 | ax.set_xlabel('')
241 | ax.axis('off')
242 |
243 | ax = plt.subplot(2,4,3)
244 | spectroplot(F2.T, fft_size+fft_zp, fft_hop, Fs, vmin=vmin, vmax=vmax,
245 | cmap=plt.get_cmap(cmap), interpolation=interpolation, colorbar=False)
246 | ax.add_collection(PatchCollection(boxes, facecolor='none', edgecolor=ec, linewidth=lw))
247 | ax.text(F0.shape[0]-300, F0.shape[1]/2-60, 'C', weight='bold')
248 | ax.set_ylabel('')
249 | ax.set_xlabel('')
250 | ax.axis('off')
251 |
252 | ax = plt.subplot(2,4,4)
253 | spectroplot(F3.T, fft_size+fft_zp, fft_hop, Fs, vmin=vmin, vmax=vmax,
254 | cmap=plt.get_cmap(cmap), interpolation=interpolation, colorbar=False)
255 | ax.add_collection(PatchCollection(boxes, facecolor='none', edgecolor=ec, linewidth=lw))
256 | ax.text(F0.shape[0]-300, F0.shape[1]/2-60, 'D', weight='bold')
257 | ax.set_ylabel('')
258 | ax.set_xlabel('')
259 | ax.axis('off')
260 |
261 | # conserve aspect ratio from top plot
262 | aspect = float(top)/end
263 | w = figsize2[0]/4
264 | h = figsize2[1]/2
265 | aspect = (h/top)/(w/end)
266 |
267 | z1 = 0.5*end/(x1[1]-x1[0]+1)
268 | z2 = 0.5*end/(x2[1]-x2[0]+1)
269 | z3 = 0.5*end/(x3[1]-x3[0]+1)
270 |
271 | # 3x zoom on blown up boxes
272 | zoom = 3.
273 |
274 | # define a function to plot the blown-up part
275 | # with proper aspect ratio and zoom
276 | def blow_up(F, x, y, aspect, ax, zoom=None):
277 | w = x[1]+1-x[0]
278 | h = y[1]+1-y[0]
279 | extent = [0,w,0,h]
280 | plt.imshow(u.dB(F[x[0]:x[1]+1,y[0]:y[1]+1].T),
281 | aspect=aspect,
282 | origin='lower', extent=extent,
283 | vmin=vmin, vmax=vmax, cmap=cmap, interpolation=interpolation)
284 | if zoom is not None:
285 | wo = w*(1-zoom)/zoom
286 | ho = h*(1-zoom)/zoom
287 | ax.set_xlim(-wo/2,w+wo/2)
288 | ax.set_ylim(-ho/2,h+ho/2)
289 | ax.set_ylabel('')
290 | ax.set_xlabel('')
291 | ax.axis('off')
292 |
293 | # plot the blown up boxes
294 | ax = plt.subplot(2,8,9)
295 | blow_up(F0,x1,y1,aspect,ax,zoom=zoom/z1)
296 | ax = plt.subplot(4,8,18)
297 | blow_up(F0,x2,y2,aspect,ax,zoom=zoom/z2)
298 | ax = plt.subplot(4,8,26)
299 | blow_up(F0,x3,y3,aspect,ax,zoom=zoom/z3)
300 |
301 | ax = plt.subplot(2,8,11)
302 | blow_up(F1,x1,y1,aspect,ax,zoom=zoom/z1)
303 | ax = plt.subplot(4,8,20)
304 | blow_up(F1,x2,y2,aspect,ax,zoom=zoom/z2)
305 | ax = plt.subplot(4,8,28)
306 | blow_up(F1,x3,y3,aspect,ax,zoom=zoom/z3)
307 |
308 | ax = plt.subplot(2,8,13)
309 | blow_up(F2,x1,y1,aspect,ax,zoom=zoom/z1)
310 | ax = plt.subplot(4,8,22)
311 | blow_up(F2,x2,y2,aspect,ax,zoom=zoom/z2)
312 | ax = plt.subplot(4,8,30)
313 | blow_up(F2,x3,y3,aspect,ax,zoom=zoom/z3)
314 |
315 | ax = plt.subplot(2,8,15)
316 | blow_up(F3,x1,y1,aspect,ax,zoom=zoom/z1)
317 | ax = plt.subplot(4,8,24)
318 | blow_up(F3,x2,y2,aspect,ax,zoom=zoom/z2)
319 | ax = plt.subplot(4,8,32)
320 | blow_up(F3,x3,y3,aspect,ax,zoom=zoom/z3)
321 |
322 | plt.subplots_adjust(left=0.0, right=1., bottom=0., top=1., wspace=0.02, hspace=0.02)
323 |
324 | fig.savefig('figures/spectrograms.pdf', dpi=600)
325 | fig.savefig('figures/spectrograms.png', dpi=300)
326 |
327 | plt.show()
328 |
--------------------------------------------------------------------------------
/figures/README.md:
--------------------------------------------------------------------------------
1 | Figures
2 | =======
3 |
4 | This directory will contain all the figures of the paper.
5 |
6 | The correspondence between files and figures in the paper is as follows.
7 |
8 | * Fig. 3 `SNR_gain.pdf`
9 | * Fig. 6 `beam_scenarios.pdf`
10 | * Fig. 7 `SINR_vs_K.pdf`
11 | * Fig. 8 `UDR_vs_K.pdf`
12 | * Fig. 9 `SINR_vs_freq.pdf`
13 | * Fig. 10 `perceptual_quality.pdf`
14 | * Fig. 11 `spectrograms.pdf`
15 |
--------------------------------------------------------------------------------
/figures/beam_scenarios.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/figures/beam_scenarios.png
--------------------------------------------------------------------------------
/figures/spectrograms.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/figures/spectrograms.png
--------------------------------------------------------------------------------
/make_all_figures.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Create all figures and sound samples
4 |
5 | ipython figure_spectrograms.py
6 |
7 | ipython figure_beam_scenarios.py
8 |
9 | ipython figure_Measures1.py
10 |
11 | ipython figure_Measures2.py
12 |
13 | ipython figure_SumNorm.py
14 |
15 | # Here one can launch a cluster of ipython
16 | # workers and remove the '-s' option for a large
17 | # speed gain.
18 | ipython figure_quality_sim.py -- -s 10000
19 |
20 | ipython figure_quality_plot.py
21 |
22 |
--------------------------------------------------------------------------------
/metrics.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import os
4 | from stft import stft
5 |
6 | import platform
7 |
8 | def median(x):
9 | '''
10 | m, ci = median(x)
11 |     computes the median and its 95% confidence interval.
12 |     x: 1D ndarray
13 |     m: median
14 |     ci: [le, ue]
15 |     The confidence interval is [m+le, m+ue] (le is typically negative).
16 | '''
17 |     x = np.sort(x)
18 |     n = x.shape[0]
19 |
20 |     if n % 2 == 1:
21 |         # if n is odd, take the central element (0-indexed)
22 |         m = x[(n-1)/2]
23 |     else:
24 |         # if n is even, average the two central elements
25 |         m = 0.5*(x[n/2 - 1] + x[n/2])
26 |
27 | # This table is taken from the Performance Evaluation lecture notes by J-Y Le Boudec
28 | # available at: http://perfeval.epfl.ch/lectureNotes.htm
29 | CI = [[1,6], [1,7], [1,7], [2,8], [2,9], [2,10], [3,10], [3,11], [3,11],[4,12], \
30 | [4,12], [5,13], [5,14], [5,15], [6,15], [6,16], [6,16], [7,17], [7,17],[8,18], \
31 | [8,19], [8,20], [9,20], [9,21], [10,21],[10,22],[10,22],[11,23],[11,23], \
32 | [12,24],[12,24],[13,25],[13,26],[13,27],[14,27],[14,28],[15,28],[15,29], \
33 | [16,29],[16,30],[16,30],[17,31],[17,31],[18,32],[18,32],[19,33],[19,34], \
34 | [19,35],[20,35],[20,36],[21,36],[21,37],[22,37],[22,38],[23,39],[23,39], \
35 | [24,40],[24,40],[24,40],[25,41],[25,41],[26,42],[26,43],[26,44],[27,44]];
36 | CI = np.array(CI)
37 |
38 | # adjust to indexing from 0
39 | CI -= 1
40 |
41 | if n < 6:
42 | # If we have less than 6 samples, we cannot have a confidence interval
43 | ci = np.array([0,0])
44 | elif n <= 70:
45 | # For 6 <= n <= 70, we use exact values from the table
46 | j = CI[n-6,0]
47 | k = CI[n-6,1]
48 | ci = np.array([x[j]-m,x[k]-m])
49 |     else:
50 |         # For 70 < n, we use the large-sample approximation (0.98 = 1.96/2)
51 |         j = int(np.floor(0.5*n - 0.98*np.sqrt(n)))
52 |         k = int(np.ceil(0.5*n + 1 + 0.98*np.sqrt(n)))
53 |         ci = np.array([x[j]-m, x[k]-m])
54 |
55 | return m, ci
56 |
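A minimal usage sketch of `median` with hypothetical data; note that the returned offsets are added to the median to obtain the interval endpoints:

```python
import numpy as np

# hypothetical sample of 20 measurements
x = np.random.randn(20)
m, ci = median(x)
print 'median %.3f, 95%% CI [%.3f, %.3f]' % (m, m + ci[0], m + ci[1])
```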
57 | # Simple mean squared error function
58 | def mse(x1, x2):
59 | return (np.abs(x1-x2)**2).sum()/len(x1)
60 |
61 |
62 | # Itakura-Saito distance function
63 | def itakura_saito(x1, x2, sigma2_n, stft_L=128, stft_hop=128):
64 |
65 | P1 = np.abs(stft(x1, stft_L, stft_hop))**2
66 | P2 = np.abs(stft(x2, stft_L, stft_hop))**2
67 |
68 | VAD1 = P1.mean(axis=1) > 2*stft_L**2*sigma2_n
69 | VAD2 = P2.mean(axis=1) > 2*stft_L**2*sigma2_n
70 | VAD = np.logical_or(VAD1, VAD2)
71 |
72 |     if P1.shape != P2.shape:
73 |         raise ValueError("Itakura-Saito requires both spectrograms to have the same shape")
74 |
75 | R = P1[VAD,:]/P2[VAD,:]
76 |
77 | IS = (R - np.log(R) - 1.).mean(axis=1)
78 |
79 | return np.median(IS)
80 |
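In other words, restricted to the voice-active frames, the function evaluates the standard Itakura-Saito divergence between the two power spectra bin by bin, averages over frequency within each frame, and reports the median across frames:

$$
d_{\mathrm{IS}} = \operatorname*{median}_{t \in \mathrm{VAD}} \left[ \frac{1}{F} \sum_{f} \left( R_{t,f} - \log R_{t,f} - 1 \right) \right], \qquad R_{t,f} = \frac{P_1(t,f)}{P_2(t,f)}.
$$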
81 | def snr(ref, deg):
82 |
83 | return np.sum(ref**2)/np.sum((ref-deg)**2)
84 |
85 | # Perceptual Evaluation of Speech Quality for multiple files using multiple threads
86 | def pesq(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq'):
87 | '''
88 |     pesq_vals = pesq(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')
89 | Uses the utility obtained from ITU P.862
90 | http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
91 |
92 | Arguments
93 | ---------
94 | ref_file: The filename of the reference file.
95 | deg_files: A list of degraded sound files names.
96 |     Fs: Sample rate of the sound files [8000 or 16000 Hz, default 8000].
97 |     swap: Swap the byte order of the input samples [default: False].
98 | wb: Use wideband algorithm [default: False].
99 | bin: Location of pesq executable [default: ./bin/pesq].
100 |
101 | Return
102 | ------
103 | pesq_vals: A 2xN ndarray containing Raw MOS and MOS LQO in rows 0 and 1,
104 | respectively, and has one column per degraded file name in deg_files.
105 | '''
106 |
107 | if isinstance(deg_files, str):
108 | deg_files = [deg_files]
109 |
110 |     if platform.system() == 'Windows':
111 | bin = bin + '.exe'
112 |
113 |     if not os.path.isfile(ref_file):
114 |         raise ValueError('Reference file ' + ref_file + ' does not exist')
115 |     for f in deg_files:
116 |         if not os.path.isfile(f):
117 |             raise ValueError('Degraded file ' + f + ' does not exist')
118 |
119 | if Fs not in (8000, 16000):
120 | raise ValueError('sample rate must be 8000 or 16000')
121 |
122 | args = [ bin, '+%d' % int(Fs) ]
123 |
124 | if swap is True:
125 | args.append('+swap')
126 |
127 | if wb is True:
128 | args.append('+wb')
129 |
130 | args.append(ref_file)
131 |
132 | # array to receive all output values
133 | pesq_vals = np.zeros((2,len(deg_files)))
134 |
135 | # launch pesq for each degraded file in a different process
136 | import subprocess
137 | pipes = [ subprocess.Popen(args+[deg], stdout=subprocess.PIPE) for deg in deg_files ]
138 |     states = np.ones(len(pipes), dtype=bool)
139 |
140 | # Recover output as the processes finish
141 | while states.any():
142 |
143 | for i,p in enumerate(pipes):
144 | if states[i] == True and p.poll() is not None:
145 | states[i] = False
146 | out = p.stdout.readlines()
147 |                 last_line = out[-1].strip()
148 |
149 | if wb is True:
150 | if not last_line.startswith('P.862.2 Prediction'):
151 | raise ValueError(last_line)
152 | pesq_vals[:,i] = np.array([0, float(last_line.split()[-1])])
153 | else:
154 | if not last_line.startswith('P.862 Prediction'):
155 | raise ValueError(last_line)
156 | pesq_vals[:,i] = np.array(map(float, last_line.split()[-2:]))
157 |
158 | return pesq_vals
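A usage sketch with hypothetical file names; it assumes the compiled ITU utility sits at `./bin/pesq` and that the files are sampled at 8 kHz:

```python
# one subprocess is spawned per degraded file
scores = pesq('ref.wav', ['deg1.wav', 'deg2.wav'], Fs=8000)
print 'Raw MOS:', scores[0, :]
print 'MOS LQO:', scores[1, :]
```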
159 |
--------------------------------------------------------------------------------
/output_samples/README.md:
--------------------------------------------------------------------------------
1 | Sound Samples
2 | =============
3 |
4 | A directory to store all generated output sound samples. All samples have been
5 | normalized to have maximum amplitude 1.
6 |
7 | * `input_mic.wav` is the signal recorded at one of the central microphones
8 |   of the array, for reference.
9 |
10 | * `output_maxsinr.wav` is the output of the processing by the conventional
11 | Max-SINR beamformer.
12 |
13 | * `output_rake-maxsinr.wav` is the output of the Rake-Max-SINR beamformer.
14 |
--------------------------------------------------------------------------------
/output_samples/input_mic.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/output_samples/input_mic.wav
--------------------------------------------------------------------------------
/output_samples/output_maxsinr.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/output_samples/output_maxsinr.wav
--------------------------------------------------------------------------------
/output_samples/output_rake-maxsinr.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/output_samples/output_rake-maxsinr.wav
--------------------------------------------------------------------------------
/phat.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 |
4 | def phat(x1, x2):
5 |     '''
6 |     Estimate the delay between x1 and x2 by Generalized Cross-Correlation
7 |     with Phase Transform (GCC-PHAT).
8 |     '''
9 | 
10 |     N1 = x1.shape[0]
11 |     N2 = x2.shape[0]
12 | 
13 |     # length of the linear cross-correlation
14 |     N = N1 + N2 - 1
15 | 
16 |     # whitened spectra: keep only the phase information
17 |     X1 = np.fft.rfft(x1, n=N)
18 |     X1 /= np.abs(X1)
19 | 
20 |     X2 = np.fft.rfft(x2, n=N)
21 |     X2 /= np.abs(X2)
22 | 
23 |     r_12 = np.fft.irfft(X1*np.conj(X2), n=N)
24 | 
25 |     # index of the maximum; indices i >= N1 wrap around to negative lags
26 |     i = np.argmax(np.abs(r_12))
27 | 
28 |     if i < N1:
29 |         return i
30 |     else:
31 |         return i - N1 - N2 + 1
32 | 
33 | 
34 | def correlation(x1, x2):
35 |     '''
36 |     Estimate the delay between x1 and x2 by plain cross-correlation.
37 |     '''
38 | 
39 |     N1 = x1.shape[0]
40 |     N2 = x2.shape[0]
41 | 
42 |     N = N1 + N2 - 1
43 | 
44 |     # zero-pad both signals to the full correlation length
45 |     x1_p = np.zeros(N)
46 |     x1_p[:N1] = x1
47 |     x2_p = np.zeros(N)
48 |     x2_p[:N2] = x2
49 | 
50 |     X1 = np.fft.fft(x1_p)
51 |     X2 = np.fft.fft(x2_p)
52 | 
53 |     # the imaginary part is zero up to numerical precision
54 |     r_12 = np.real(np.fft.ifft(X1*np.conj(X2)))
55 | 
56 |     # index of the maximum; indices i >= N1 wrap around to negative lags
57 |     i = np.argmax(r_12)
58 | 
59 |     if i < N1:
60 |         return i
61 |     else:
62 |         return i - N1 - N2 + 1
63 | 
64 | 
65 | def delay_estimation(x1, x2, L):
66 |     '''
67 |     Estimate the delay between x1 and x2.
68 |     L is the block length used for phat.
69 |     '''
70 | 
71 |     # number of whole blocks of length L available in both signals
72 |     K = int(np.minimum(x1.shape[0], x2.shape[0]) / L)
73 | 
74 |     # estimate the delay in every block and take the median
75 |     delays = np.zeros(K)
76 |     for k in xrange(K):
77 |         delays[k] = phat(x1[k*L:(k+1)*L], x2[k*L:(k+1)*L])
78 | 
79 |     return int(np.median(delays))
80 | 
81 | 
82 | def time_align(ref, deg, L=4096):
83 |     '''
84 |     Return a copy of deg time-aligned with, and of the same length as, ref.
85 |     L is the block length used for the correlations.
86 |     '''
87 | 
88 |     # estimate the delay of the degraded signal
89 |     delay = delay_estimation(ref, deg, L)
90 | 
91 |     # time-align with the reference segment for error metric computation
92 |     sig = np.zeros(ref.shape[0])
93 |     if delay >= 0:
94 |         length = np.minimum(deg.shape[0], ref.shape[0] - delay)
95 |         sig[delay:length+delay] = deg[:length]
96 |     else:
97 |         length = np.minimum(deg.shape[0] + delay, ref.shape[0])
98 |         sig[:length] = deg[-delay:-delay+length]
99 | 
100 |     return sig
101 | 
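102 | # Example usage (a sketch; the signal length and the 100-sample shift
103 | # are arbitrary): align a delayed copy of a signal back onto the reference.
104 | #
105 | #   ref = np.random.randn(4 * 8000)
106 | #   deg = np.concatenate((np.zeros(100), ref))  # ref delayed by 100 samples
107 | #   aligned = time_align(ref, deg)              # aligned[n] ~= ref[n]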
--------------------------------------------------------------------------------
/samples/Homer.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/Homer.wav
--------------------------------------------------------------------------------
/samples/fq_sample1_8000.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/fq_sample1_8000.wav
--------------------------------------------------------------------------------
/samples/fq_sample2_8000.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/fq_sample2_8000.wav
--------------------------------------------------------------------------------
/samples/german_speech.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/german_speech.wav
--------------------------------------------------------------------------------
/samples/german_speech_44100.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/german_speech_44100.wav
--------------------------------------------------------------------------------
/samples/german_speech_8000.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/german_speech_8000.wav
--------------------------------------------------------------------------------
/samples/noreverb.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/noreverb.wav
--------------------------------------------------------------------------------
/samples/singing.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/singing.wav
--------------------------------------------------------------------------------
/samples/singing_16000.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/singing_16000.wav
--------------------------------------------------------------------------------
/samples/singing_44100.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/singing_44100.wav
--------------------------------------------------------------------------------
/samples/singing_8000.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/singing_8000.wav
--------------------------------------------------------------------------------
/samples/speech.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/speech.wav
--------------------------------------------------------------------------------
/samples/sputnk1b.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/samples/sputnk1b.wav
--------------------------------------------------------------------------------
/sim_data/README.md:
--------------------------------------------------------------------------------
1 | Simulation Data
2 | ===============
3 |
4 | A directory to store all generated simulation data.
5 |
--------------------------------------------------------------------------------
/sim_data/fig10/quality_20150109-070951.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/sim_data/fig10/quality_20150109-070951.npz
--------------------------------------------------------------------------------
/sim_data/fig10/quality_20150109-095429.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/sim_data/fig10/quality_20150109-095429.npz
--------------------------------------------------------------------------------
/sim_data/fig10/quality_20150109-201321.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LCAV/AcousticRakeReceiver/3786e6470662430dc08a61948a9a688c90aee423/sim_data/fig10/quality_20150109-201321.npz
--------------------------------------------------------------------------------
/stft.py:
--------------------------------------------------------------------------------
1 | '''Collection of STFT, overlap-add, and spectrogram plotting routines.'''
2 |
3 | import sys
4 | import numpy as np
5 | from scipy.signal import correlate
6 | import matplotlib.pyplot as plt
7 |
8 | from numpy.lib.stride_tricks import as_strided
9 |
10 | # a routine for long convolutions using overlap add method
11 |
12 |
13 | def overlap_add(in1, in2, L):
14 |     '''Convolve in1 and in2 block-wise using the overlap-add method, with block length L.'''
15 | # set the shortest sequence as the filter
16 | if (len(in1) > len(in2)):
17 | x = in1
18 | h = in2
19 | else:
20 | h = in1
21 | x = in2
22 |
23 | # filter length
24 | M = len(h)
25 |
26 | # FFT size
27 | N = L + M - 1
28 |
29 | # frequency domain filter (zero-padded)
30 | H = np.fft.rfft(h, N)
31 |
32 | # prepare output signal
33 | ylen = int(np.ceil(len(x) / float(L)) * L + M - 1)
34 | y = np.zeros(ylen)
35 |
36 | # overlap add
37 | i = 0
38 | while (i < len(x)):
39 | y[i:i + N] += np.fft.irfft(np.fft.rfft(x[i:i + L], N) * H, N)
40 | i += L
41 |
42 | return y[:len(x) + M - 1]
43 |
44 |
45 | # Nicely plot the spectrogram
46 | def spectroplot(Z, N, hop, Fs, fdiv=None, tdiv=None,
47 | vmin=None, vmax=None, cmap=None, interpolation='none', colorbar=True):
48 |
49 | plt.imshow(
50 | 20 * np.log10(np.abs(Z[:N / 2 + 1, :])),
51 | aspect='auto',
52 | origin='lower',
53 | vmin=vmin, vmax=vmax, cmap=cmap, interpolation=interpolation)
54 |
55 | # label y axis correctly
56 | plt.ylabel('Freq [Hz]')
57 | yticks = plt.getp(plt.gca(), 'yticks')
58 | plt.setp(plt.gca(), 'yticklabels', np.round(yticks / float(N) * Fs))
59 | if (fdiv is not None):
60 | tick_lbls = np.arange(0, Fs / 2, fdiv)
61 | tick_locs = tick_lbls * N / Fs
62 | plt.yticks(tick_locs, tick_lbls)
63 |
64 | # label x axis correctly
65 | plt.xlabel('Time [s]')
66 | xticks = plt.getp(plt.gca(), 'xticks')
67 | plt.setp(plt.gca(), 'xticklabels', xticks / float(Fs) * hop)
68 | if (tdiv is not None):
69 | unit = float(hop) / Fs
70 | length = unit * Z.shape[1]
71 | tick_lbls = np.arange(0, int(length), tdiv)
72 | tick_locs = tick_lbls * Fs / hop
73 | plt.xticks(tick_locs, tick_lbls)
74 |
75 | if colorbar is True:
76 | plt.colorbar(orientation='horizontal')
77 |
78 | # A more general implementation of STFT
79 |
80 |
81 | def stft(x, L, hop, transform=np.fft.fft, win=None, zp_back=0, zp_front=0):
82 | '''
83 | Arguments:
84 | x: input signal
85 | L: frame size
86 | hop: shift size between frames
87 | transform: the transform routine to apply (default FFT)
88 | win: the window to apply (default None)
89 | zp_back: zero padding to apply at the end of the frame
90 | zp_front: zero padding to apply at the beginning of the frame
91 | Return:
92 | The STFT of x
93 | '''
94 |
95 | # the transform size
96 | N = L + zp_back + zp_front
97 |
98 | # window needs to be same size as transform
99 | if (win is not None and len(win) != N):
100 |         raise ValueError(
101 |             'Window length must equal frame length plus zero padding.')
102 |
103 | # reshape
104 | new_strides = (hop * x.strides[0], x.strides[0])
105 | new_shape = ((len(x) - L) / hop + 1, L)
106 | y = as_strided(x, shape=new_shape, strides=new_strides)
107 |
108 | # add the zero-padding
109 | y = np.concatenate(
110 | (np.zeros(
111 | (y.shape[0], zp_front)), y, np.zeros(
112 | (y.shape[0], zp_back))), axis=1)
113 |
114 | # apply window if needed
115 | if (win is not None):
116 |         # the window broadcasts across all frames
117 |         y = win * y
118 |
119 | # transform along rows
120 | Z = transform(y, axis=1)
121 |
122 |     # return the STFT with one frame per row
123 |     return Z
124 |
125 |
126 | # inverse STFT
127 | def istft(X, L, hop, transform=np.fft.ifft, win=None, zp_back=0, zp_front=0):
128 |     '''Inverse STFT of X with frame size L, frame shift hop, and optional synthesis window win.'''
129 | # the transform size
130 | N = L + zp_back + zp_front
131 |
132 | # window needs to be same size as transform
133 | if (win is not None and len(win) != N):
134 |         raise ValueError(
135 |             'Window length must equal frame length plus zero padding.')
136 |
137 | # inverse transform
138 | iX = transform(X, axis=1)
139 | if (iX.dtype == 'complex128'):
140 | iX = np.real(iX)
141 |
142 | # apply synthesis window if necessary
143 | if (win is not None):
144 | iX *= win
145 |
146 | # create output signal
147 | x = np.zeros(X.shape[0] * hop + (L - hop) + zp_back + zp_front)
148 |
149 | # overlap add
150 | for i in xrange(X.shape[0]):
151 | x[i * hop:i * hop + N] += iX[i]
152 |
153 | return x
154 |
155 |
156 | # FreqVec: given FFT size and sampling rate, returns a vector of real
157 | # frequencies
158 | def freqvec(N, Fs, centered=False):
159 | '''
160 | N: FFT length
161 | Fs: sampling rate of the signal
162 |     centered: False if the DC is at the beginning, True if the DC is centered
163 | '''
164 |
165 | # Create a centered vector. The (1-N%2) is to correct for even/odd length
166 | vec = np.arange(-N / 2 + (1 - N % 2), N / 2 + 1) * float(Fs) / float(N)
167 |
168 | # Shift positive/negative frequencies if needed. Again (1-N%2) for
169 | # even/odd length
170 | if centered:
171 | return vec
172 | else:
173 |         return np.concatenate((vec[N / 2 - (1 - N % 2):], vec[0:N / 2 - (1 - N % 2)]))
174 |
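175 | # Example usage (a sketch): a Hann window at 50% overlap satisfies the
176 | # constant overlap-add condition, so analysis followed by synthesis
177 | # reconstructs the interior of the signal.
178 | #
179 | #   import windows
180 | #   x = np.random.randn(8000)
181 | #   N, hop = 512, 256
182 | #   X = stft(x, N, hop, win=windows.hann(N))
183 | #   x_rec = istft(X, N, hop)   # x_rec[N:-N] ~= x[N:len(x_rec)-N]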
--------------------------------------------------------------------------------
/trinicon.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | from scipy.signal import fftconvolve
4 | 
5 |
6 | def trinicon(signals):
7 | '''
8 | Implementation of the TRINICON Blind Source Separation algorithm as described in
9 |
10 | Aichner, R., Buchner, H., Yan, F., & Kellermann, W. (2006).
11 | A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments.
12 | Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022
13 |
14 | Specifically, adaptation of the pseudo-code from Table 1.
15 |
16 | The implementation is hard-coded for 2 output channels.
17 | '''
18 |
19 | P = signals.shape[0] # number of microphones
20 | Q = 2 # number of output channels
21 |
22 | K = 8 # Number of successive blocks processed at the same time
23 | L = 4096 # Filter length
24 | N = 2*L # Block length
25 | alpha_on = K # online overlap factor
26 | alpha_off = 1 # offline overlap factor (not used here)
27 |
28 | j_max = 10 # number of offline iterations
29 |
30 | delta_max = 1e-4 # regularization parameter, this sets the maximum value of the regularization term
31 | sigma2_0 = 1e-7 # regularization parameter, this sets the reference (machine?) noise level in the regularization
32 |
33 | mu = 0.0010 # offline update step size
34 | lambd_a = 0.2 # online forgetting factor
35 |
36 | # the filters
37 | w = np.zeros((P,Q,L))
38 | w[:P/2,0,L/2] = 1
39 | w[P/2:,1,L/2] = 1
40 |
41 | hop = K*L/alpha_on
42 |
43 | # pad with zeros to have a whole number of online blocks
44 | if signals.shape[1] % hop != 0:
45 |         signals = np.concatenate((signals, np.zeros((P, hop - (signals.shape[1]%hop)))), axis=1)
46 |
47 | S = signals.shape[1] # total signal length
48 | M = S / hop # number of online blocks
49 |
50 | y = np.zeros((Q,S)) # the processed output signal
51 |
52 | m = 1 # online block index
53 | while m <= M: # online loop
54 |
55 | # new chunk of input signal
56 | x = np.zeros((P,K*L+N))
57 |         if m*hop > S:
58 |             # we need some zero padding at the back
59 |             le = S - (m*hop - K*L - N)
60 |             x[:,:le] = signals[:,m*hop-K*L-N:]
61 |         elif m*hop >= K*L+N:
62 |             x = signals[:,m*hop-K*L-N:m*hop]
63 |         else:
64 |             # we need some zero padding at the beginning
65 |             x[:,-m*hop:] = signals[:,:m*hop]
66 |
67 | # use filter from previous iteration to initialize offline part
68 | w_new = w.copy()
69 |
70 | for j in xrange(j_max): # offline update loop
71 |
72 | y_c = np.zeros((Q,K*L+N-L)) # c stands for chunk
73 | y_blocks = np.zeros((Q,K,N))
74 |
75 | for q in xrange(Q):
76 | # convolve with filters
77 | for p in xrange(P):
78 | # We discard the 'oldest' output of the convolution according
79 | # to the filter matrix definition (6) in the paper
80 | y_c[q,:] += fftconvolve(x[p,:], w_new[p,q,:], mode='valid')[1:]
81 |
82 | # split into smaller blocks
83 | for i in xrange(K):
84 | y_blocks[q,i,:] = y_c[q,i*L:i*L+N]
85 |
86 | # blocks energy
87 | sigma2 = np.sum(y_blocks**2, axis=2)
88 |
89 | # cross-correlations
90 | r_cross = np.zeros((Q,K,2*L-1))
91 | for i in xrange(K):
92 | y0 = y_c[0,i*L:i*L+N]
93 | y1 = y_c[1,i*L:i*L+N]
94 | r = fftconvolve(y1, y0[::-1], mode='full')
95 | r_cross[0,i,:] = r[N-L:N+L-1] # r_y1y0
96 | r_cross[1,i,:] = r_cross[0,i,::-1] # r_y0y1 by symmetry is just r_y1y0 reversed
97 |
98 | # regularization term
99 | delta = delta_max*np.exp(-sigma2/sigma2_0)
100 |
101 | # offline update
102 | delta_w = np.zeros((P,Q,L))
103 | for q in xrange(Q):
104 | for p in xrange(P):
105 | for i in xrange(K):
106 | # this implements the row-wise sylvester constraint as explained in Fig. 4 (b) of paper
107 | delta_w[p,q,:] += fftconvolve(r_cross[q,i,:]/(sigma2[q,i]+delta[q,i]), w_new[p,1-q,::-1], mode='valid')[::-1]
108 | delta_w[p,q,:] /= K
109 |
110 | w_new = w_new - mu*delta_w
111 |
112 | # online update
113 | w = lambd_a*w + (1-lambd_a)*w_new
114 |
115 | # compute output signal
116 | for q in xrange(Q):
117 | for p in xrange(P):
118 | y[q,(m-1)*hop:m*hop] += fftconvolve(x[p,-hop-L+1:], w[p,q,:], mode='valid')
119 |
120 | # next block
121 | m += 1
122 |
123 | return y
124 |
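125 | # Example usage (a sketch; the input file name is a placeholder for a
126 | # two-channel recording): signals must have shape (microphones, samples).
127 | #
128 | #   from scipy.io import wavfile
129 | #   rate, data = wavfile.read('two_mic_recording.wav')
130 | #   y = trinicon(np.array(data.T, dtype=float))  # y has shape (2, samples)
131 | #   wavfile.write('separated_0.wav', rate, y[0])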
--------------------------------------------------------------------------------
/utilities.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import constants
4 |
5 | def to_16b(signal):
6 | '''
7 |     converts a float signal (-1 to 1) to a signed 16-bit representation.
8 |     No clipping is performed; you are responsible for ensuring the signal
9 |     is within the correct interval.
10 | '''
11 | return ((2**15-1)*signal).astype(np.int16)
12 |
13 |
14 | def clip(signal, high, low):
15 | '''
16 | Clip a signal from above at high and from below at low.
17 | '''
18 | s = signal.copy()
19 |
20 | s[np.where(s > high)] = high
21 | s[np.where(s < low)] = low
22 |
23 | return s
24 |
25 |
26 | def normalize(signal, bits=None):
27 | '''
28 |     Normalize a signal to a given range. The default is to normalize the
29 |     maximum amplitude to one. The optional bits argument normalizes the
30 |     signal to the range of a signed integer representation with that many bits.
31 | '''
32 |
33 | s = signal.copy()
34 |
35 | s /= np.abs(s).max()
36 |
37 | # if one wants to scale for bits allocated
38 | if bits is not None:
39 | s *= 2 ** (bits - 1)
40 |         s = clip(s, 2 ** (bits - 1) - 1, -2 ** (bits - 1))
41 |
42 | return s
43 |
44 |
45 | def angle_from_points(x1, x2):
46 |     '''Angle of the vector pointing from point x2 to point x1 (2D column vectors).'''
47 | return np.angle((x1[0,0]-x2[0,0]) + 1j*(x1[1,0] - x2[1,0]))
48 |
49 |
50 | def normalize_pwr(sig1, sig2):
51 | '''
52 | normalize sig1 to have the same power as sig2
53 | '''
54 |
55 | # average power per sample
56 | p1 = np.mean(sig1 ** 2)
57 | p2 = np.mean(sig2 ** 2)
58 |
59 | # normalize
60 | return sig1.copy() * np.sqrt(p2 / p1)
61 |
62 |
63 | def highpass(signal, Fs, fc=constants.fc_hp, plot=False):
64 | '''
65 |     Remove frequencies below the cut-off, which defaults to 50 Hz.
66 | '''
67 |
68 | # have some predefined parameters
69 |     rp = 5   # maximum ripple in dB in the pass-band
70 |     rs = 60  # minimum attenuation in dB in the stop-band
71 | n = 4 # order of the filter
72 | type = 'butter'
73 |
74 | # normalized cut-off frequency
75 | wc = 2. * fc / Fs
76 |
77 | # design the filter
78 | from scipy.signal import iirfilter, lfilter, freqz
79 | b, a = iirfilter(n, Wn=wc, rp=rp, rs=rs, btype='highpass', ftype=type)
80 |
81 | # plot frequency response of filter if requested
82 | if (plot):
83 | import matplotlib.pyplot as plt
84 | w, h = freqz(b, a)
85 |
86 | plt.figure()
87 | plt.title('Digital filter frequency response')
88 | plt.plot(w, 20 * np.log10(np.abs(h)))
89 |         plt.ylabel('Amplitude Response [dB]')
90 |         plt.xlabel('Frequency [rad/sample]')
91 |         plt.grid()
92 | 
93 |
94 | # apply the filter
95 | signal = lfilter(b, a, signal.copy())
96 |
97 | return signal
98 |
99 |
100 | def time_dB(signal, Fs, bits=16):
101 | '''
102 | Compute the signed dB amplitude of the oscillating signal
103 | normalized wrt the number of bits used for the signal
104 | '''
105 |
106 | import matplotlib.pyplot as plt
107 |
108 | # min dB (least significant bit in dB)
109 | lsb = -20 * np.log10(2.) * (bits - 1)
110 |
111 | # magnitude in dB (clipped)
112 | pos = clip(signal, 2. ** (bits - 1) - 1, 1.) / 2. ** (bits - 1)
113 | neg = -clip(signal, -1., -2. ** (bits - 1)) / 2. ** (bits - 1)
114 |
115 | mag_pos = np.zeros(signal.shape)
116 | Ip = np.where(pos > 0)
117 | mag_pos[Ip] = 20 * np.log10(pos[Ip]) + lsb + 1
118 |
119 | mag_neg = np.zeros(signal.shape)
120 | In = np.where(neg > 0)
121 | mag_neg[In] = 20 * np.log10(neg[In]) + lsb + 1
122 |
123 | plt.plot(np.arange(len(signal)) / float(Fs), mag_pos - mag_neg)
124 | plt.xlabel('Time [s]')
125 | plt.ylabel('Amplitude [dB]')
126 | plt.axis('tight')
127 | plt.ylim(lsb-1, -lsb+1)
128 |
129 | # draw ticks corresponding to decibels
130 | div = 20
131 | n = int(-lsb/div)+1
132 | yticks = np.zeros(2*n)
133 | yticks[:n] = lsb - 1 + np.arange(0, n*div, div)
134 | yticks[n:] = -lsb + 1 - np.arange((n-1)*div, -1, -div)
135 |     # tick labels: dB values, mirrored for the negative amplitudes
136 |     yticklabels = range(0, -n*div, -div) + range(-(n-1)*div, 1, div)
137 | plt.setp(plt.gca(), 'yticks', yticks)
138 | plt.setp(plt.gca(), 'yticklabels', yticklabels)
139 |
140 | plt.setp(plt.getp(plt.gca(), 'ygridlines'), 'ls', '--')
141 |
142 |
143 | def spectrum(signal, Fs, N):
144 |
145 | import stft
146 | import windows
147 |
148 | F = stft.stft(signal, N, N / 2, win=windows.hann(N))
149 | stft.spectroplot(F.T, N, N / 2, Fs)
150 |
151 |
152 | def dB(signal, power=False):
153 | if power is True:
154 | return 10*np.log10(np.abs(signal))
155 | else:
156 | return 20*np.log10(np.abs(signal))
157 |
158 |
159 | def comparePlot(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None):
160 |
161 | import matplotlib.pyplot as plt
162 |
163 | td_amp = np.maximum(np.abs(signal1).max(), np.abs(signal2).max())
164 |
165 |     if norm:
166 |         if equal:
167 |             signal1 = signal1 / np.abs(signal1).max()
168 |             signal2 = signal2 / np.abs(signal2).max()
169 |         else:
170 |             signal1 = signal1 / td_amp
171 |             signal2 = signal2 / td_amp
172 |         td_amp = 1.
173 |
174 | plt.subplot(2,2,1)
175 | plt.plot(np.arange(len(signal1))/float(Fs), signal1)
176 | plt.axis('tight')
177 | plt.ylim(-td_amp, td_amp)
178 | if title1 is not None:
179 | plt.title(title1)
180 |
181 | plt.subplot(2,2,2)
182 | plt.plot(np.arange(len(signal2))/float(Fs), signal2)
183 | plt.axis('tight')
184 | plt.ylim(-td_amp, td_amp)
185 | if title2 is not None:
186 | plt.title(title2)
187 |
188 | from constants import eps
189 | import stft
190 | import windows
191 |
192 | F1 = stft.stft(signal1, fft_size, fft_size / 2, win=windows.hann(fft_size))
193 | F2 = stft.stft(signal2, fft_size, fft_size / 2, win=windows.hann(fft_size))
194 |
195 | # try a fancy way to set the scale to avoid having the spectrum
196 | # dominated by a few outliers
197 | p_min = 1
198 | p_max = 99.5
199 | all_vals = np.concatenate((dB(F1+eps), dB(F2+eps))).flatten()
200 | vmin, vmax = np.percentile(all_vals, [p_min, p_max])
201 |
202 | cmap = 'jet'
203 | interpolation='sinc'
204 |
205 | plt.subplot(2,2,3)
206 | stft.spectroplot(F1.T, fft_size, fft_size / 2, Fs, vmin=vmin, vmax=vmax,
207 | cmap=plt.get_cmap(cmap), interpolation=interpolation)
208 |
209 | plt.subplot(2,2,4)
210 | stft.spectroplot(F2.T, fft_size, fft_size / 2, Fs, vmin=vmin, vmax=vmax,
211 | cmap=plt.get_cmap(cmap), interpolation=interpolation)
212 |
213 |
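214 | # Example usage (a sketch; the output file name is a placeholder):
215 | # condition a signal and save it as a 16-bit wav file.
216 | #
217 | #   from scipy.io import wavfile
218 | #   Fs, x = wavfile.read('samples/singing_8000.wav')
219 | #   x = highpass(np.array(x, dtype=float), Fs)  # remove content below 50 Hz
220 | #   x = normalize(x)                            # peak amplitude of one
221 | #   wavfile.write('conditioned.wav', Fs, to_16b(x))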
--------------------------------------------------------------------------------
/wav_resample.py:
--------------------------------------------------------------------------------
1 | '''
2 | Resample a wav file to a new sampling rate.
3 | 
4 | Usage: python wav_resample.py <new_rate> <file.wav>
5 | 
6 | The result is saved as <file>_<new_rate>.wav.
7 | '''
8 | 
9 | import numpy as np
10 | from scipy.io import wavfile
11 | from scipy.signal import resample
12 | import sys
13 | 
14 | Fs = int(sys.argv[1])
15 | filename = sys.argv[2]
16 | 
17 | base, suffix = filename.rsplit('.', 1)
18 | 
19 | rate, signal = wavfile.read(filename)
20 | 
21 | if rate == Fs:
22 |     print 'Sampling rate is already matching.'
23 |     sys.exit(1)
24 | 
25 | # resample to the new rate, preserving the duration of the signal
26 | n_samples = int(np.ceil(len(signal) / float(rate) * Fs))
27 | signal = resample(np.array(signal, dtype=float), n_samples)
28 | 
29 | # save under the new name <base>_<Fs>.<suffix>
30 | wavfile.write(base + '_' + str(Fs) + '.' + suffix, Fs, signal)
31 | 
--------------------------------------------------------------------------------
/windows.py:
--------------------------------------------------------------------------------
1 | '''A collection of windowing functions.'''
2 |
3 | import numpy as np
4 |
5 | # cosine window function
6 | def cosine(N, flag='asymmetric', length='full'):
7 |
8 | # first choose the indexes of points to compute
9 | if (length == 'left'): # left side of window
10 | t = np.arange(0, N / 2)
11 | elif(length == 'right'): # right side of window
12 | t = np.arange(N / 2, N)
13 | else: # full window by default
14 | t = np.arange(0, N)
15 |
16 | # if asymmetric window, denominator is N, if symmetric it is N-1
17 | if (flag == 'symmetric' or flag == 'mdct'):
18 | t = t / float(N - 1)
19 | else:
20 | t = t / float(N)
21 |
22 | w = np.cos(np.pi * (t - 0.5)) ** 2
23 |
24 | # make the window respect MDCT condition
25 | if (flag == 'mdct'):
26 | w **= 2
27 | d = w[:N / 2] + w[N / 2:]
28 | w[:N / 2] *= 1. / d
29 | w[N / 2:] *= 1. / d
30 |
31 |     # return the window
32 |     return w
33 |
34 |
35 | # triangular window function
36 | def triang(N, flag='asymmetric', length='full'):
37 |
38 | # first choose the indexes of points to compute
39 | if (length == 'left'): # left side of window
40 | t = np.arange(0, N / 2)
41 | elif(length == 'right'): # right side of window
42 | t = np.arange(N / 2, N)
43 | else: # full window by default
44 | t = np.arange(0, N)
45 |
46 | # if asymmetric window, denominator is N, if symmetric it is N-1
47 | if (flag == 'symmetric' or flag == 'mdct'):
48 | t = t / float(N - 1)
49 | else:
50 | t = t / float(N)
51 |
52 | w = 1. - np.abs(2. * t - 1.)
53 |
54 | # make the window respect MDCT condition
55 | if (flag == 'mdct'):
56 | d = w[:N / 2] + w[N / 2:]
57 | w[:N / 2] *= 1. / d
58 | w[N / 2:] *= 1. / d
59 |
60 |     # return the window
61 |     return w
62 |
63 |
64 | # hann window function
65 | def hann(N, flag='asymmetric', length='full'):
66 |
67 | # first choose the indexes of points to compute
68 | if (length == 'left'): # left side of window
69 | t = np.arange(0, N / 2)
70 | elif(length == 'right'): # right side of window
71 | t = np.arange(N / 2, N)
72 | else: # full window by default
73 | t = np.arange(0, N)
74 |
75 | # if asymmetric window, denominator is N, if symmetric it is N-1
76 | if (flag == 'symmetric' or flag == 'mdct'):
77 | t = t / float(N - 1)
78 | else:
79 | t = t / float(N)
80 |
81 | w = 0.5 * (1 - np.cos(2 * np.pi * t))
82 |
83 | # make the window respect MDCT condition
84 | if (flag == 'mdct'):
85 | d = w[:N / 2] + w[N / 2:]
86 | w[:N / 2] *= 1. / d
87 | w[N / 2:] *= 1. / d
88 |
89 |     # return the window
90 |     return w
91 |
92 |
93 | # Blackman-Harris window
94 | def blackman_harris(N, flag='asymmetric', length='full'):
95 |
96 | # coefficients
97 | a = np.array([.35875, .48829, .14128, .01168])
98 |
99 | # first choose the indexes of points to compute
100 | if (length == 'left'): # left side of window
101 | t = np.arange(0, N / 2)
102 | elif(length == 'right'): # right side of window
103 | t = np.arange(N / 2, N)
104 | else: # full window by default
105 | t = np.arange(0, N)
106 |
107 | # if asymmetric window, denominator is N, if symmetric it is N-1
108 | if (flag == 'symmetric'):
109 | t = t / float(N - 1)
110 | else:
111 | t = t / float(N)
112 |
113 | pi = np.pi
114 | w = a[0] - a[1]*np.cos(2*pi*t) + a[2]*np.cos(4*pi*t) + a[3]*np.cos(6*pi*t)
115 |
116 | return w
117 |
118 | # Rectangular window function
119 | def rect(N):
120 | return np.ones(N)
121 |
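122 | # Example check (a sketch): the asymmetric Hann window at 50% overlap
123 | # sums to a constant, the property that allows perfect overlap-add
124 | # reconstruction in stft.py.
125 | #
126 | #   N = 512
127 | #   w = hann(N)
128 | #   print np.allclose(w[:N / 2] + w[N / 2:], 1.)   # -> True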
--------------------------------------------------------------------------------