├── README.md
├── cqt_nsgt_pytorch
    ├── CQT_nsgt.py
    ├── __init__.py
    ├── __pycache__
    │   ├── CQT_nsgt.cpython-310.pyc
    │   ├── CQT_nsgt.cpython-38.pyc
    │   ├── __init__.cpython-310.pyc
    │   ├── __init__.cpython-38.pyc
    │   ├── fscale.cpython-310.pyc
    │   ├── fscale.cpython-38.pyc
    │   ├── nsdual.cpython-310.pyc
    │   ├── nsdual.cpython-38.pyc
    │   ├── nsgfwin.cpython-310.pyc
    │   ├── nsgfwin.cpython-38.pyc
    │   ├── util.cpython-310.pyc
    │   └── util.cpython-38.pyc
    ├── fscale.py
    ├── nsdual.py
    ├── nsgfwin.py
    └── util.py
├── setup.py
└── tests
    ├── test.m
    └── test_notebook.ipynb


/README.md:
--------------------------------------------------------------------------------
 1 | # CQT_pytorch
 2 | 
 3 | Pytorch implementation of the invertible CQT based on Non-stationary Gabor filters.
 4 | 
 5 | The transform has near-perfect reconstruction, is differentiable and GPU-efficient.
 6 | 
 7 | ## Install
 8 | 
 9 | ```bash
10 | pip install cqt-nsgt-pytorch
11 | ```
12 | ## Usage
13 | ```py
14 | from cqt_nsgt_pytorch import CQT_nsgt
15 | 
16 | #parameter examples
17 | numocts=9 
18 | binsoct=64
19 | fs=44100 
20 | Ls=131072 
21 | 
22 | cqt=CQT_nsgt(numocts, binsoct, mode="matrix_complete",fs=fs, audio_len=Ls, device="cuda", dtype=torch.float32)
23 | 
24 | audio=#load some audio file shape=[Batch, channels, time]
25 | 
26 | X=cqt.fwd(audio)# forward transform
27 | #X.shape=[batch, channels, frequency, time]
28 | audio_reconstructed=cqt.bwd(X) #backward transform
29 | 
30 | ```
31 | ## Modes of operation
32 | 
33 | Different versions of the transform are implemented. They can be selected by choosing the 'mode' parameter. Except "matrix" and "oct, that discard DC and Nyquist bands, the rest have perfect reconstruction.
34 | 
35 | mode          | Description  |  Output shape
36 | ------------- | ------------- | -------------
37 | "critical"      | (default) critical sampling (no redundancy) (slow implementation) |  list of tensors, each with different time resolution 
38 | "matrix"      |  Equal time resolution per frequency band. maximum redundancy (discards DC and Nyquist) | 2d-Tensor \[binsoct \times numocts, T\]
39 | "matrix_complete  | Same as above, but DC and Nyquist are included.  | 2d-Tensor \[binsoct \times numocts + 2, T\]
40 | "matrix_slow" | Slower version of "matrix_complete". Might show similar efficiency in CPU and consumes way less memory | 2d-Tensor \[binsoct \times numocts + 2, T\]
41 | "oct" | Tradeoff between structure and redundancy. THe frequency bins are grouped by octave bands, each octave with a different time resolution. The time lengths are restricted to be powers of 2. (Discards DC and Nyquist) | list of tensors, one per octave band, each with different time resolution
42 | "oct_complete" | Same as above, but DC and Nyquist are included | list of tensors, one per octave band,DC and Nyquist, each with a different time resolution
43 | 
44 | 
45 | 
46 | ## TODO
47 | - [x] On "matrix" mode, give the option to output also the DC and Nyq. Same in "oct" mode. Document how this disacrding thing is implemented.
48 | - [ ] Do some proper documentation
49 | - [ ] Test it for mixed precision. problems with powers of 2, etc. Maybe this will require zero padding...
50 | - [ ] Make the apply_hpf_DC() and apply_lpf_DC() more handy and clear. Document the usage of those.
51 | - [ ] Accelerate the "critical" mode, similar method as in "oct" could also apply. (update: seems a bit tricky memory-wise)
52 | - [ ] Clean the whole __init__() method as now it is a mess. 
53 | - [ ] Report the efficiency of the implementation in GPU. (time and frequency). Briefly: It is fast as everything is vectorized but maybe consumes too much memory, specially on the backward pass.
54 | - [x] Check if there is more redundancy to get rid of. Apparently, there is not
55 | 


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/CQT_nsgt.py:
--------------------------------------------------------------------------------
  1 | import torch 
  2 | #from src.nsgt.cq  import NSGT
  3 | 
  4 | from .fscale  import LogScale , FlexLogOctScale
  5 | 
  6 | from .nsgfwin import nsgfwin
  7 | from .nsdual import nsdual
  8 | from .util import calcwinrange
  9 | 
 10 | import math
 11 | from math import ceil
 12 | 
 13 | def next_power_of_2(x):
 14 |     return 1 if x == 0 else 2**math.ceil(math.log2(x))
 15 | 
 16 | 
 17 | 
 18 | class CQT_nsgt():
 19 |     def __init__(self,numocts, binsoct,  mode="critical", window="hann", flex_Q=None, fs=44100, audio_len=44100, device="cpu", dtype=torch.float32):
 20 |         """
 21 |             args:
 22 |                 numocts (int) number of octaves
 23 |                 binsoct (int) number of bins per octave. Can be a list if mode="flex_oct"
 24 |                 mode (string) defines the mode of operation:
 25 |                      "critical": (default) critical sampling (no redundancy) returns a list of tensors, each with different time resolution (slow implementation)
 26 |                      "critical_fast": notimplemented
 27 |                      "matrix": returns a 2d-matrix maximum redundancy (discards DC and Nyquist)
 28 |                      "matrix_pow2": returns a 2d-matrix maximum redundancy (discards DC and Nyquist) (time-resolution is rounded up to a power of 2)
 29 |                      "matrix_complete": returns a 2d-matrix maximum redundancy (with DC and Nyquist)
 30 |                      "matrix_slow": returns a 2d-matrix maximum redundancy (slow implementation)
 31 |                      "oct": octave-wise rasterization ( modearate redundancy) returns a list of tensors, each from a different octave with different time resolution (discards DC and Nyquist)
 32 |                      "oct_complete": octave-wise rasterization ( modearate redundancy) returns a list of tensors, each from a different octave with different time resolution (with DC and Nyquist)
 33 |                 fs (float) sampling frequency
 34 |                 audio_len (int) sample length
 35 |                 device
 36 |         """
 37 | 
 38 |         fmax=fs/2 -10**-6 #the maximum frequency is Nyquist
 39 |         self.Ls=audio_len #the length is given
 40 | 
 41 |         fmin=fmax/(2**numocts)
 42 |         fbins=int(binsoct*numocts) 
 43 |         self.numocts=numocts
 44 |         self.binsoct=binsoct
 45 |        
 46 |         if mode=="flex_oct":
 47 |             self.scale = FlexLogOctScale(fs, self.numocts, self.binsoct, time_reductions)
 48 |         else:
 49 |             self.scale = LogScale(fmin, fmax, fbins)
 50 | 
 51 |         self.fs=fs
 52 | 
 53 |         self.device=torch.device(device)
 54 |         self.mode=mode
 55 |         self.dtype=dtype
 56 | 
 57 |         self.frqs,self.q = self.scale() 
 58 | 
 59 |         self.g,rfbas,self.M = nsgfwin(self.frqs, self.q, self.fs, self.Ls, dtype=self.dtype, device=self.device, min_win=4, window=window)
 60 | 
 61 |         sl = slice(0,len(self.g)//2+1)
 62 | 
 63 |         # coefficients per slice
 64 |         self.ncoefs = max(int(math.ceil(float(len(gii))/mii))*mii for mii,gii in zip(self.M[sl],self.g[sl]))        
 65 |         if mode=="matrix" or mode=="matrix_complete" or mode=="matrix_slow":
 66 |             #just use the maximum resolution everywhere
 67 |             self.M[:] = self.M.max()
 68 |         elif mode=="matrix_pow2":
 69 |             self.size_per_oct=[]
 70 |             self.M[:]=next_power_of_2(self.M.max())
 71 | 
 72 |         elif mode=="oct" or mode=="oct_complete":
 73 |             #round uo all the lengths of an octave to the next power of 2
 74 |             self.size_per_oct=[]
 75 |             idx=1
 76 |             for i in range(numocts):
 77 |                 value=next_power_of_2(self.M[idx:idx+binsoct].max())
 78 | 
 79 |                 #value=M[idx:idx+binsoct].max()
 80 |                 self.size_per_oct.append(value)
 81 |                 self.M[idx:idx+binsoct]=value
 82 |                 self.M[-idx-binsoct:-idx]=value
 83 |                 idx+=binsoct
 84 | 
 85 | 
 86 |         # calculate shifts
 87 |         self.wins,self.nn = calcwinrange(self.g, rfbas, self.Ls, device=self.device)
 88 |         # calculate dual windows
 89 |         self.gd = nsdual(self.g, self.wins, self.nn, self.M, dtype=self.dtype, device=self.device)
 90 | 
 91 |         #filter DC
 92 |         self.Hlpf=torch.zeros(self.Ls, dtype=self.dtype, device=self.device)
 93 |         self.Hlpf[0:len(self.g[0])//2]=self.g[0][:len(self.g[0])//2]*self.gd[0][:len(self.g[0])//2]*self.M[0]
 94 |         self.Hlpf[-len(self.g[0])//2:]=self.g[0][len(self.g[0])//2:]*self.gd[0][len(self.g[0])//2:]*self.M[0]
 95 |         #filter nyquist
 96 |         nyquist_idx=len(self.g)//2
 97 |         Lg=len(self.g[nyquist_idx])
 98 |         self.Hlpf[self.wins[nyquist_idx][0:(Lg+1)//2]]+=self.g[nyquist_idx][(Lg)//2:]*self.gd[nyquist_idx][(Lg)//2:]*self.M[nyquist_idx]
 99 |         self.Hlpf[self.wins[nyquist_idx][-(Lg-1)//2:]]+=self.g[nyquist_idx][:(Lg)//2]*self.gd[nyquist_idx][:(Lg)//2]*self.M[nyquist_idx]
100 | 
101 |         self.Hhpf=1-self.Hlpf
102 | 
103 |         #FORWARD!! this is from nsgtf
104 |         #self.forward = lambda s: nsgtf(s, self.g, self.wins, self.nn, self.M, mode=self.mode , device=self.device)
105 |         #sl = slice(0,len(self.g)//2+1)
106 |         if mode=="matrix" or mode=="oct" or mode=="matrix_pow2":
107 |             sl = slice(1,len(self.g)//2) #getting rid of the DC component and the Nyquist
108 |         else:
109 |             sl = slice(0,len(self.g)//2+1)
110 | 
111 |         self.maxLg_enc = max(int(ceil(float(len(gii))/mii))*mii for mii,gii in zip(self.M[sl], self.g[sl]))
112 |     
113 |         self.loopparams_enc = []
114 |         for mii,gii,win_range in zip(self.M[sl],self.g[sl],self.wins[sl]):
115 |             Lg = len(gii)
116 |             col = int(ceil(float(Lg)/mii))
117 |             assert col*mii >= Lg
118 |             assert col == 1
119 |             p = (mii,win_range,Lg,col)
120 |             self.loopparams_enc.append(p)
121 |     
122 | 
123 |         def get_ragged_giis(g, wins, ms, mode):
124 |             #ragged_giis = [torch.nn.functional.pad(torch.unsqueeze(gii, dim=0), (0, self.maxLg_enc-gii.shape[0])) for gii in gd[sl]]
125 |             #ragged_giis=[]
126 |             c=torch.zeros((len(g),self.Ls//2+1),dtype=self.dtype,device=self.device)
127 |             ix=[]
128 |             if mode=="oct":
129 |                 for i in range(self.numocts):
130 |                     ix.append(torch.zeros((self.binsoct,self.size_per_oct[i]),dtype=torch.int64,device=self.device))
131 |             elif mode=="matrix" or mode=="matrix_pow2":
132 |                 ix.append(torch.zeros((len(g),self.maxLg_enc),dtype=torch.int64,device=self.device))
133 | 
134 |             elif mode=="oct_complete" or mode=="matrix_complete":
135 |                 ix.append(torch.zeros((1,ms[0]),dtype=torch.int64,device=self.device))
136 |                 count=0
137 |                 for i in range(1,len(g)-1):
138 |                     if count==0 or ms[i] == ms[i-1]:
139 |                         count+=1
140 |                     else:
141 |                         ix.append(torch.zeros((count,ms[i-1]),dtype=torch.int64,device=self.device))
142 |                         count=1
143 | 
144 |                 ix.append(torch.zeros((count,ms[i-1]),dtype=torch.int64,device=self.device))
145 | 
146 |                 ix.append(torch.zeros((1,ms[-1]),dtype=torch.int64,device=self.device))
147 | 
148 |             j=0
149 |             k=0
150 |             for i,(gii, win_range) in enumerate(zip(g,wins)):
151 |                 if i>0:
152 |                     if ms[i]!=ms[i-1] or ((mode=="oct_complete" or mode=="matrix_complete") and (j==0 or i==len(g)-1)):
153 |                         j+=1
154 |                         k=0
155 | 
156 |                 gii=torch.fft.fftshift(gii).unsqueeze(0)
157 |                 Lg=gii.shape[1]
158 | 
159 |                 if (i==0 or i==len(g)-1) and (mode=="oct_complete" or mode=="matrix_complete"):
160 |                     #special case for the DC and Nyquist, as we don't want to use the mirrored frequencies, take this into account during forward! we would just need to conjugate or sth!
161 |                     if i==0:
162 |                         c[i,win_range[Lg//2:]]=gii[...,Lg//2:]
163 | 
164 |                         ix[j][0,:(Lg+1)//2]=win_range[Lg//2:].unsqueeze(0)
165 |                         ix[j][0,-(Lg//2):]=torch.flip(win_range[Lg//2:].unsqueeze(0),(-1,))
166 |                     if i==len(g)-1:
167 |                         c[i,win_range[:(Lg+1)//2]]=gii[...,:(Lg+1)//2]
168 | 
169 |                         ix[j][0,:(Lg+1)//2]=torch.flip(win_range[:(Lg+1)//2].unsqueeze(0),(-1,)) #rethink this
170 |                         ix[j][0,-(Lg//2):]=win_range[:(Lg)//2].unsqueeze(0)
171 |                 else:
172 |                     c[i,win_range]=gii 
173 | 
174 |                     ix[j][k,:(Lg+1)//2]=win_range[Lg//2:].unsqueeze(0)
175 |                     ix[j][k,-(Lg//2):]=win_range[:Lg//2].unsqueeze(0)
176 | 
177 |                 k+=1
178 |                 #a=torch.unsqueeze(gii, dim=0)
179 |                 #b=torch.nn.functional.pad(a, (0, self.maxLg_enc-gii.shape[0]))
180 |                 #ragged_giis.append(b)
181 |             #dirty unsqueeze
182 |             return  torch.conj(c), ix
183 | 
184 | 
185 |         if self.mode=="matrix" or self.mode=="matrix_complete" or self.mode=="matrix_pow2":
186 |             self.giis, self.idx_enc=get_ragged_giis(self.g[sl], self.wins[sl], self.M[sl],self.mode)
187 |             #self.idx_enc=self.idx_enc[0]
188 |             #self.idx_enc=self.idx_enc.unsqueeze(0).unsqueeze(0)
189 |         elif self.mode=="oct" or self.mode=="oct_complete":
190 |             self.giis, self.idx_enc=get_ragged_giis(self.g[sl], self.wins[sl], self.M[sl], self.mode)
191 |             #self.idx_enc=self.idx_enc.unsqueeze(0).unsqueeze(0)
192 |         elif self.mode=="critical" or self.mode=="matrix_slow":
193 |             #self.giis, self.idx_enc=get_ragged_giis(self.g[sl], self.wins[sl], self.M[sl], self.mode)
194 |             
195 |             ragged_giis = [torch.nn.functional.pad(torch.unsqueeze(gii, dim=0), (0, self.maxLg_enc-gii.shape[0])) for gii in self.g[sl]]
196 |             self.giis = torch.conj(torch.cat(ragged_giis))
197 |             #ragged_giis = [torch.nn.functional.pad(torch.unsqueeze(gii, dim=0), (0, self.maxLg_enc-gii.shape[0])) for gii in self.g[sl]]
198 | 
199 |             #self.giis = torch.conj(torch.cat(ragged_giis))
200 | 
201 |         #FORWARD!! this is from nsigtf
202 |         #self.backward = lambda c: nsigtf(c, self.gd, self.wins, self.nn, self.Ls, mode=self.mode,  device=self.device)
203 | 
204 |         self.maxLg_dec = max(len(gdii) for gdii in self.gd)
205 |         if self.mode=="matrix_pow2":
206 |             self.maxLg_dec=self.maxLg_enc
207 |         #self.maxLg_dec=self.maxLg_enc
208 |         #print(self.maxLg_enc, self.maxLg_dec)
209 |        
210 |         #ragged_gdiis = [torch.nn.functional.pad(torch.unsqueeze(gdii, dim=0), (0, self.maxLg_dec-gdii.shape[0])) for gdii in self.gd]
211 |         #self.gdiis = torch.conj(torch.cat(ragged_gdiis))
212 | 
213 |         def get_ragged_gdiis(gd, wins, mode, ms=None):
214 |             ragged_gdiis=[]
215 |             ix=torch.zeros((len(gd),self.Ls//2+1),dtype=torch.int64,device=self.device)+self.maxLg_dec//2#I initialize the index with the center to make sure that it points to a 0
216 |             for i,(g, win_range) in enumerate(zip(gd, wins)):
217 |                 Lg=g.shape[0]
218 |                 gl=g[:(Lg+1)//2]
219 |                 gr=g[(Lg+1)//2:]
220 |                 zeros = torch.zeros(self.maxLg_dec-Lg ,dtype=g.dtype, device=g.device)  # pre-allocation
221 |                 paddedg=torch.cat((gl, zeros, gr),0).unsqueeze(0)
222 |                 ragged_gdiis.append(paddedg)
223 | 
224 |                 wr1 = win_range[:(Lg)//2]
225 |                 wr2 = win_range[-((Lg+1)//2):]
226 |                 if mode=="matrix_complete" and i==0:
227 |                     #ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64) #the end part
228 |                     ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64).to(self.device) #the start part
229 |                 elif mode=="matrix_complete" and i==len(gd)-1:
230 |                     ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64).to(self.device) #the end part
231 |                     #ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64) #the start part
232 |                 else:
233 |                     ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64).to(self.device) #the end part
234 |                     ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64).to(self.device) #the start part
235 | 
236 |                 
237 |             return torch.conj(torch.cat(ragged_gdiis)).to(self.dtype)*self.maxLg_dec, ix
238 | 
239 |         def get_ragged_gdiis_critical(gd, ms):
240 |             seq_gdiis=[]
241 |             ragged_gdiis=[]
242 |             mprev=-1
243 |             for i,(g,m) in enumerate(zip(gd, ms)):
244 |                 if i>0 and m!=mprev:
245 |                     gdii=torch.conj(torch.cat(ragged_gdiis))
246 |                     if len(gdii.shape)==1:
247 |                         gdii=gdii.unsqueeze(0)
248 |                     #seq_gdiis.append(gdii[0:gdii.shape[0]//2 +1])
249 |                     seq_gdiis.append(gdii)
250 |                     ragged_gdiis=[]
251 |                     
252 |                 Lg=g.shape[0]
253 |                 gl=g[:(Lg+1)//2]
254 |                 gr=g[(Lg+1)//2:]
255 |                 zeros = torch.zeros(m-Lg ,dtype=g.dtype, device=g.device)  # pre-allocation
256 |                 paddedg=torch.cat((gl, zeros, gr),0).unsqueeze(0)*m
257 |                 ragged_gdiis.append(paddedg)
258 |                 mprev=m
259 |             
260 |             gdii=torch.conj(torch.cat(ragged_gdiis))
261 |             seq_gdiis.append(gdii)
262 |             #seq_gdiis.append(gdii[0:gdii.shape[0]//2 +1])
263 |             return seq_gdiis
264 | 
265 |         def get_ragged_gdiis_oct(gd, ms, wins, mode):
266 |             seq_gdiis=[]
267 |             ragged_gdiis=[]
268 |             mprev=-1
269 |             ix=[] 
270 |             if mode=="oct_complete":
271 |                 ix+=[torch.zeros((1,self.Ls//2+1),dtype=torch.int64,device=self.device)+ms[0]//2]
272 | 
273 |             ix+=[torch.zeros((self.binsoct,self.Ls//2+1),dtype=torch.int64,device=self.device)+self.size_per_oct[j]//2 for j in range(len(self.size_per_oct))]
274 |             if mode=="oct_complete":
275 |                 ix+=[torch.zeros((1,self.Ls//2+1),dtype=torch.int64,device=self.device)+ms[-1]//2]
276 |             
277 |             #I nitialize the index with the center to make sure that it points to a 0
278 |             j=0
279 |             k=0
280 |             for i,(g,m, win_range) in enumerate(zip(gd, ms, wins)):
281 |                 if i>0 and m!=mprev or (mode=="oct_complete" and i==len(gd)-1):
282 |                     #take care when size of DC is the same as the next octave, or last octave has the same size as nyquist!
283 |                     gdii=torch.conj(torch.cat(ragged_gdiis))
284 |                     if len(gdii.shape)==1:
285 |                         gdii=gdii.unsqueeze(0)
286 |                     #seq_gdiis.append(gdii[0:gdii.shape[0]//2 +1])
287 |                     seq_gdiis.append(gdii.to(self.dtype))
288 |                     ragged_gdiis=[]
289 |                     j+=1
290 |                     k=0
291 |                     
292 |                 Lg=g.shape[0]
293 |                 gl=g[:(Lg+1)//2]
294 |                 gr=g[(Lg+1)//2:]
295 |                 zeros = torch.zeros(m-Lg ,dtype=g.dtype, device=g.device)  # pre-allocation
296 |                 paddedg=torch.cat((gl, zeros, gr),0).unsqueeze(0)*m
297 |                 ragged_gdiis.append(paddedg)
298 |                 mprev=m
299 | 
300 |                 wr1 = win_range[:(Lg)//2]
301 |                 wr2 = win_range[-((Lg+1)//2):]
302 |                 if mode=="oct_complete" and i==0:
303 |                     #ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64) #the end part
304 |                     #ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64) #the start part
305 |                     ix[0][k,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(self.device).to(torch.int64) #the start part
306 |                 elif mode=="oct_complete" and i==len(gd)-1:
307 |                     #ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64) #the end part
308 |                     ix[-1][k,wr1]=torch.Tensor([m-(Lg//2)+i for i in range(len(wr1))]).to(self.device).to(torch.int64) #the end part
309 |                     #ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64) #the start part
310 |                 else:
311 |                     #ix[i,wr1]=torch.Tensor([self.maxLg_dec-(Lg//2)+i for i in range(len(wr1))]).to(torch.int64) #the end part
312 |                     #ix[i,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(torch.int64) #the start part
313 | 
314 |                     ix[j][k,wr1]=torch.Tensor([m-(Lg//2)+i for i in range(len(wr1))]).to(self.device).to(torch.int64) #the end part
315 |                     ix[j][k,wr2]=torch.Tensor([i for i in range(len(wr2))]).to(self.device).to(torch.int64) #the start part
316 |                 k+=1
317 |             
318 |             gdii=torch.conj(torch.cat(ragged_gdiis))
319 |             seq_gdiis.append(gdii.to(self.dtype))
320 |             #seq_gdiis.append(gdii[0:gdii.shape[0]//2 +1])
321 | 
322 |             return seq_gdiis, ix
323 | 
324 |         if self.mode=="matrix" or self.mode=="matrix_complete":
325 |             self.gdiis, self.idx_dec= get_ragged_gdiis(self.gd[sl], self.wins[sl], self.mode)
326 |             #self.gdiis = self.gdiis[sl]
327 |             #self.gdiis = self.gdiis[0:(self.gdiis.shape[0]//2 +1)]
328 |         elif self.mode=="matrix_pow2":
329 |             self.gdiis, self.idx_dec= get_ragged_gdiis(self.gd[sl], self.wins[sl], self.mode, ms=self.M[sl])
330 |         elif self.mode=="oct" or self.mode=="oct_complete":
331 |             self.gdiis, self.idx_dec=get_ragged_gdiis_oct(self.gd[sl], self.M[sl], self.wins[sl], self.mode)
332 |             for gdiis in self.gdiis:
333 |                 gdiis.to(self.dtype)
334 | 
335 |         elif self.mode=="critical":
336 |             self.gdiis =get_ragged_gdiis_critical(self.gd[sl], self.M[sl])
337 |         elif self.mode=="matrix_slow":
338 |             ragged_gdiis = [torch.nn.functional.pad(torch.unsqueeze(gdii, dim=0), (0, self.maxLg_dec-gdii.shape[0])) for gdii in self.gd]
339 |             self.gdiis = torch.conj(torch.cat(ragged_gdiis))
340 | 
341 |         self.loopparams_dec = []
342 |         for gdii,win_range in zip(self.gd[sl], self.wins[sl]):
343 |             Lg = len(gdii)
344 |             wr1 = win_range[:(Lg)//2]
345 |             wr2 = win_range[-((Lg+1)//2):]
346 |             p = (wr1,wr2,Lg)
347 |             self.loopparams_dec.append(p)
348 | 
349 |     def apply_hpf_DC(self, x):
350 |         Lin=x.shape[-1]
351 |         if Lin<self.Ls:
352 |             #pad zeros
353 |             x=torch.nn.functional.pad(x, (0, self.Ls-Lin))
354 |         elif Lin> self.Ls:
355 |             raise ValueError("Input signal is longer than the maximum length. I could have patched it, but I didn't. sorry :(")
356 | 
357 |         X=torch.fft.fft(x)
358 |         X=X*torch.conj(self.Hhpf)
359 |         out= torch.fft.ifft(X).real
360 |         if Lin<self.Ls:
361 |             out=out[..., :Lin]
362 |         return out
363 | 
364 | 
365 |     def apply_lpf_DC(self, x):
366 |         Lin=x.shape[-1]
367 |         if Lin<self.Ls:
368 |             #pad zeros
369 |             x=torch.nn.functional.pad(x, (0, self.Ls-Lin))
370 |         elif Lin> self.Ls:
371 |             raise ValueError("Input signal is longer than the maximum length. I could have patched it, but I didn't. sorry :(")
372 |         X=torch.fft.fft(x)
373 |         X=X*torch.conj(self.Hlpf)
374 |         out= torch.fft.ifft(X).real
375 |         if Lin<self.Ls:
376 |             out=out[..., :Lin]
377 |         return out
378 | 
379 | 
380 |     def nsgtf(self,f):
381 |         """
382 |             forward transform
383 |             args:
384 |                 t: Tensor shape(B, C, T) time-domain waveform
385 |             returns:
386 |                 if mode = "matrix" 
387 |                     ret: Tensor shape (B, C, F, T') 2d spectrogram spectrogram matrix
388 |                 else 
389 |                     ret: list([Tensor]) list of tensors of shape (B, C, Fbibs, T') , representing the bands with the same time-resolution.
390 |                     if mode="oct", the elements on the lists correspond to different octaves
391 |                 
392 |         """
393 |         
394 | 
395 |         ft = torch.fft.fft(f)
396 |     
397 |         Ls = f.shape[-1]
398 | 
399 |         assert self.nn == Ls
400 |     
401 |         if self.mode=="matrix" or self.mode=="matrix_pow2":
402 |             ft=ft[...,:self.Ls//2+1]
403 |             #c = torch.zeros(*f.shape[:2], len(self.loopparams_enc), self.maxLg_enc, dtype=ft.dtype, device=torch.device(self.device))
404 |     
405 |             t=ft.unsqueeze(-2)*self.giis #this has a lot of rendundant operations and, probably, consumes a lot of memory. Anyways, it is parallelizable, so it is not a big deal, I guess.
406 |             #c=torch.gather(t, 3, self.idx_enc)
407 |             a=torch.gather(t, 3, self.idx_enc[0].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index. 
408 | 
409 |             #a=torch.cat(a,torch.conj(torch.fliplr(a[...,0:-1])),dim=-1)
410 | 
411 |             return torch.fft.ifft(a)
412 |     
413 |         elif self.mode=="oct": 
414 |             ft=ft[...,:self.Ls//2 +1]
415 |             #block_ptr = -1
416 |             #bucketed_tensors = []
417 |             ret = []
418 |             #ret2 = []
419 |         
420 |             t=ft.unsqueeze(-2)*self.giis #this has a lot of rendundant operations and, probably, consumes a lot of memory. Anyways, it is parallelizable, so it is not a big deal, I guess.
421 | 
422 |             for i in range(self.numocts):
423 |                 #c=torch.gather(t[...,i*self.binsoct:(i+1)*self.binsoct,:], 3, self.idx_enc[i]) 
424 |                 #ret.append(torch.fft.ifft(torch.cat(bucketed_tensors,2)))
425 |                 a=torch.gather(t[...,i*self.binsoct:(i+1)*self.binsoct,:], 3, self.idx_enc[i].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index.
426 |                 ret.append(torch.fft.ifft(a))
427 | 
428 |             return ret
429 |         elif self.mode=="oct_complete": 
430 |             ft=ft[...,:self.Ls//2 +1]
431 |             #block_ptr = -1
432 |             #bucketed_tensors = []
433 |             ret = []
434 |             #ret2 = []
435 |         
436 |             t=ft.unsqueeze(-2)*self.giis #this has a lot of rendundant operations and, probably, consumes a lot of memory. Anyways, it is parallelizable, so it is not a big deal, I guess.
437 | 
438 |             L=self.idx_enc[0].shape[-1]
439 |             a=torch.gather(t[...,0,:].unsqueeze(-2), 3, self.idx_enc[0].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index.
440 |             a[...,(L+1)//2:]=torch.conj(a[...,(L+1)//2:])
441 |             #conjugate one of the partsk
442 |             ret.append(torch.fft.ifft(a))
443 | 
444 |             for i in range(self.numocts):
445 |                 #c=torch.gather(t[...,i*self.binsoct:(i+1)*self.binsoct,:], 3, self.idx_enc[i]) 
446 |                 #ret.append(torch.fft.ifft(torch.cat(bucketed_tensors,2)))
447 |                 a=torch.gather(t[...,i*self.binsoct+1:(i+1)*self.binsoct+1,:], 3, self.idx_enc[i+1].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index.
448 |                 ret.append(torch.fft.ifft(a))
449 |             
450 |             L=self.idx_enc[-1].shape[-1]
451 |             a=torch.gather(t[...,-1,:].unsqueeze(-2), 3, self.idx_enc[-1].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index. 
452 |             a[...,:(L)//2]=torch.conj(a[...,:(L)//2]) #conjugate one of the parts (here the first)
453 |             ret.append(torch.fft.ifft(a))
454 | 
455 |             return ret
456 | 
457 |         elif self.mode=="matrix_complete":
458 |             ft=ft[...,:self.Ls//2+1]
459 |             #c = torch.zeros(*f.shape[:2], len(self.loopparams_enc), self.maxLg_enc, dtype=ft.dtype, device=torch.device(self.device))
460 | 
461 |     
462 |             t=ft.unsqueeze(-2)*self.giis #this has a lot of rendundant operations and, probably, consumes a lot of memory. Anyways, it is parallelizable, so it is not a big deal, I guess.
463 |             #c=torch.gather(t, 3, self.idx_enc)
464 |             ret=[]
465 |             i=0 #DC be careful!
466 |             L=self.idx_enc[0].shape[-1]
467 |             a=torch.gather(t[...,0,:].unsqueeze(-2), 3, self.idx_enc[0].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index.
468 |             a[...,(L+1)//2:]=torch.conj(a[...,(L+1)//2:])
469 |             #conjugate one of the partsk
470 |             ret.append(torch.fft.ifft(a))
471 | 
472 |             #normal
473 |             a=torch.gather(t[...,1:-1,:], 3, self.idx_enc[1].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index. 
474 |             ret.append(torch.fft.ifft(a))
475 |             #nyquist be careful!
476 |             i=-1 
477 |             a=torch.gather(t[...,-1,:].unsqueeze(-2), 3, self.idx_enc[-1].unsqueeze(0).unsqueeze(0).expand(t.shape[0],t.shape[1],-1,-1)) #To make torch.gather broadcast, I need to add a dimension to the index. 
478 |             a[...,:(L)//2]=torch.conj(a[...,:(L)//2]) #conjugate one of the parts (here the first)
479 |             ret.append(torch.fft.ifft(a))
480 |             #conjugate one of the partsk
481 |             return torch.cat(ret,dim=2)
482 | 
483 |         elif self.mode=="matrix_slow":
484 |             c = torch.zeros(*f.shape[:2], len(self.loopparams_enc), self.maxLg_enc, dtype=ft.dtype, device=torch.device(self.device))
485 | 
486 |             for j, (mii,win_range,Lg,col) in enumerate(self.loopparams_enc):
487 |                 t = ft[:, :, win_range]*torch.fft.fftshift(self.giis[j, :Lg])
488 | 
489 |                 sl1 = slice(None,(Lg+1)//2)
490 |                 sl2 = slice(-(Lg//2),None)
491 | 
492 |                 c[:, :, j, sl1] = t[:, :, Lg//2:]  # if mii is odd, this is of length mii-mii//2
493 |                 c[:, :, j, sl2] = t[:, :, :Lg//2]  # if mii is odd, this is of length mii//2
494 | 
495 |             return torch.fft.ifft(c)
496 |         elif self.mode=="critical": 
497 |             block_ptr = -1
498 |             bucketed_tensors = []
499 |             ret = []
500 |         
501 |             for j, (mii,win_range,Lg,col) in enumerate(self.loopparams_enc):
502 | 
503 |                 c = torch.zeros(*f.shape[:2], 1, mii, dtype=ft.dtype, device=torch.device(self.device))
504 |         
505 |                 t = ft[:, :, win_range]*torch.fft.fftshift(self.giis[j, :Lg]) #this needs to be parallelized!
506 |         
507 |                 sl1 = slice(None,(Lg+1)//2)
508 |                 sl2 = slice(-(Lg//2),None)
509 |         
510 |                 c[:, :, 0, sl1] = t[:, :, Lg//2:]  # if mii is odd, this is of length mii-mii//2
511 |                 c[:, :, 0, sl2] = t[:, :, :Lg//2]  # if mii is odd, this is of length mii//2
512 |         
513 |                 # start a new block
514 |                 if block_ptr == -1 or bucketed_tensors[block_ptr][0].shape[-1] != mii:
515 |                     bucketed_tensors.append(c)
516 |                     block_ptr += 1
517 |                 else:
518 |                     # concat block to previous contiguous frequency block with same time resolution
519 |                     bucketed_tensors[block_ptr] = torch.cat([bucketed_tensors[block_ptr], c], dim=2)
520 |         
521 |             # bucket-wise ifft
522 |             for bucketed_tensor in bucketed_tensors:
523 |                 ret.append(torch.fft.ifft(bucketed_tensor))
524 |         
525 |             return ret
526 | 
527 |     def nsigtf(self,cseq):
528 |         """
529 |         mode: "matrix"
530 |             args
531 |                 cseq: Time-frequency Tensor with shape (B, C, Freq, Time)
532 |             returns
533 |                 sig: Time-domain Tensor with shape (B, C, Time)
534 |                 
535 |         """
536 | 
537 | 
538 |         if self.mode!="matrix" and self.mode!="matrix_slow" and self.mode!="matrix_complete" and self.mode!="matrix_pow2":
539 |             #print(cseq)
540 |             assert type(cseq) == list
541 |             nfreqs = 0
542 |             for i, cseq_tsor in enumerate(cseq):
543 |                 cseq_dtype = cseq_tsor.dtype
544 |                 cseq[i] = torch.fft.fft(cseq_tsor)
545 |                 nfreqs += cseq_tsor.shape[2]
546 |             cseq_shape = (*cseq_tsor.shape[:2], nfreqs)
547 |         else:
548 |             assert type(cseq) == torch.Tensor
549 |             cseq_shape = cseq.shape[:3]
550 |             cseq_dtype = cseq.dtype
551 |             fc = torch.fft.fft(cseq)
552 |         
553 |         fbins = cseq_shape[2]
554 |         #temp0 = torch.empty(*cseq_shape[:2], self.maxLg_dec, dtype=fr.dtype, device=torch.device(self.device))  # pre-allocation
555 |         
556 |         
557 | 
558 |         # The overlap-add procedure including multiplication with the synthesis windows
559 |         #tart=time.time()
560 |         if self.mode=="matrix_slow":
561 |             fr = torch.zeros(*cseq_shape[:2], self.nn, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
562 |             temp0 = torch.empty(*cseq_shape[:2], self.maxLg_dec, dtype=fr.dtype, device=torch.device(self.device))  # pre-allocation
563 | 
564 |             for i,(wr1,wr2,Lg) in enumerate(self.loopparams_dec[:fbins]):
565 |                 t = fc[:, :, i]
566 | 
567 |                 r = (Lg+1)//2
568 |                 l = (Lg//2)
569 | 
570 |                 t1 = temp0[:, :, :r]
571 |                 t2 = temp0[:, :, Lg-l:Lg]
572 | 
573 |                 t1[:, :, :] = t[:, :, :r]
574 |                 t2[:, :, :] = t[:, :, self.maxLg_dec-l:self.maxLg_dec]
575 | 
576 |                 temp0[:, :, :Lg] *= self.gdiis[i, :Lg] 
577 |                 temp0[:, :, :Lg] *= self.maxLg_dec
578 | 
579 |                 fr[:, :, wr1] += t2
580 |                 fr[:, :, wr2] += t1
581 | 
582 |         elif self.mode=="matrix" or self.mode=="matrix_complete" or self.mode=="matrix_pow2":
583 |             fr = torch.zeros(*cseq_shape[:2], self.nn//2+1, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
584 |             temp0=fc*self.gdiis.unsqueeze(0).unsqueeze(0)
585 |             fr=torch.gather(temp0, 3, self.idx_dec.unsqueeze(0).unsqueeze(0).expand(temp0.shape[0], temp0.shape[1], -1, -1)).sum(2)
586 | 
587 |         elif self.mode=="oct" or self.mode=="oct_complete":
588 |             fr = torch.zeros(*cseq_shape[:2], self.nn//2+1, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
589 |             # frequencies are bucketed by same time resolution
590 |             fbin_ptr = 0
591 |             for j, (fc, gdii_j) in enumerate(zip(cseq, self.gdiis)):
592 |                 Lg_outer = fc.shape[-1]
593 |         
594 |                 nb_fbins = fc.shape[2]
595 |                 temp0 = torch.zeros(*cseq_shape[:2],nb_fbins, Lg_outer, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
596 |         
597 |                 temp0=fc*gdii_j.unsqueeze(0).unsqueeze(0)
598 |                 fr+=torch.gather(temp0, 3, self.idx_dec[j].unsqueeze(0).unsqueeze(0).expand(temp0.shape[0], temp0.shape[1], -1, -1)).sum(2)
599 | 
600 |         else:
601 |             # speed uniefficient but save mode
602 |             # frequencies are bucketed by same time resolution
603 |             fr = torch.zeros(*cseq_shape[:2], self.nn, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
604 |             fbin_ptr = 0
605 |             for j, (fc, gdii_j) in enumerate(zip(cseq, self.gdiis)):
606 |                 Lg_outer = fc.shape[-1]
607 |         
608 |                 nb_fbins = fc.shape[2]
609 |                 temp0 = torch.zeros(*cseq_shape[:2],nb_fbins, Lg_outer, dtype=cseq_dtype, device=torch.device(self.device))  # Allocate output
610 |         
611 |                 temp0=fc*gdii_j.unsqueeze(0).unsqueeze(0)
612 | 
613 |                 for i,(wr1,wr2,Lg) in enumerate(self.loopparams_dec[fbin_ptr:fbin_ptr+nb_fbins][:fbins]):
614 |                     r = (Lg+1)//2
615 |                     l = (Lg//2)
616 |         
617 |                     fr[:, :, wr1] += temp0[:,:,i,Lg_outer-l:Lg_outer]
618 |                     fr[:, :, wr2] += temp0[:,:,i, :r]
619 | 
620 |                 fbin_ptr += nb_fbins
621 |         
622 |         #end=time.time()
623 |         #rint("in for loop",end-start)
624 |         ftr = fr[:, :, :self.nn//2+1]
625 |         sig = torch.fft.irfft(ftr, n=self.nn)
626 |         sig = sig[:, :, :self.Ls] # Truncate the signal to original length (if given)
627 |         return sig
628 | 
629 |     def fwd(self,x):
630 |         """
631 |             x: [B,C,T]
632 |         """
633 |         c = self.nsgtf(x)
634 |         return c
635 | 
636 |     def bwd(self,c):
637 |         s = self.nsigtf(c) #messing out with the channels agains...
638 |         return s
639 | 
640 | 


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__init__.py:
--------------------------------------------------------------------------------
1 | from .CQT_nsgt import CQT_nsgt


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/CQT_nsgt.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/CQT_nsgt.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/CQT_nsgt.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/CQT_nsgt.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/__init__.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/__init__.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/__init__.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/fscale.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/fscale.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/fscale.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/fscale.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/nsdual.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/nsdual.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/nsdual.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/nsdual.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/nsgfwin.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/nsgfwin.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/nsgfwin.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/nsgfwin.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/util.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/util.cpython-310.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/__pycache__/util.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eloimoliner/CQT_pytorch/1f2e9f649202693c97d425dcc9d964ba24174212/cqt_nsgt_pytorch/__pycache__/util.cpython-38.pyc


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/fscale.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8
  2 | 
  3 | """
  4 | Python implementation of Non-Stationary Gabor Transform (NSGT)
  5 | derived from MATLAB code by NUHAG, University of Vienna, Austria
  6 | 
  7 | Thomas Grill, 2011-2015
  8 | http://grrrr.org/nsgt
  9 | 
 10 | Austrian Research Institute for Artificial Intelligence (OFAI)
 11 | AudioMiner project, supported by Vienna Science and Technology Fund (WWTF)
 12 | """
 13 | 
 14 | import numpy as np
 15 | 
 16 | class Scale:
 17 |     dbnd = 1.e-8
 18 |     
 19 |     def __init__(self, bnds):
 20 |         self.bnds = bnds
 21 |         
 22 |     def __len__(self):
 23 |         return self.bnds
 24 |     
 25 |     def Q(self, bnd=None):
 26 |         # numerical differentiation (if self.Q not defined by sub-class)
 27 |         if bnd is None:
 28 |             bnd = np.arange(self.bnds)
 29 |         return self.F(bnd)*self.dbnd/(self.F(bnd+self.dbnd)-self.F(bnd-self.dbnd))
 30 |     
 31 |     def __call__(self):
 32 |         f = np.array([self.F(b) for b in range(self.bnds)],dtype=float)
 33 |         q = np.array([self.Q(b) for b in range(self.bnds)],dtype=float)
 34 |         return f,q
 35 | 
 36 |     def suggested_sllen_trlen(self, sr):
 37 |         f,q = self()
 38 | 
 39 |         Ls = int(np.ceil(max((q*8.*sr)/f)))
 40 | 
 41 |         # make sure its divisible by 4
 42 |         Ls = Ls + -Ls % 4
 43 | 
 44 |         sllen = Ls
 45 | 
 46 |         trlen = sllen//4
 47 |         trlen = trlen + -trlen % 2 # make trlen divisible by 2
 48 | 
 49 |         return sllen, trlen
 50 | 
 51 | 
 52 | class LogScale(Scale):
 53 |     def __init__(self, fmin, fmax, bnds, beyond=0):
 54 |         """
 55 |         @param fmin: minimum frequency (Hz)
 56 |         @param fmax: maximum frequency (Hz)
 57 |         @param bnds: number of frequency bands (int)
 58 |         @param beyond: number of frequency bands below fmin and above fmax (int)
 59 |         """
 60 |         Scale.__init__(self, bnds+beyond*2)
 61 |         lfmin = np.log2(fmin)
 62 |         lfmax = np.log2(fmax)
 63 |         odiv = (lfmax-lfmin)/(bnds-1)
 64 |         lfmin_ = lfmin-odiv*beyond
 65 |         lfmax_ = lfmax+odiv*beyond
 66 |         self.fmin = 2**lfmin_
 67 |         self.fmax = 2**lfmax_
 68 |         self.pow2n = 2**odiv
 69 |         self.q = np.sqrt(self.pow2n)/(self.pow2n-1.)/2.
 70 |         
 71 |     def F(self, bnd=None):
 72 |         return self.fmin*self.pow2n**(bnd if bnd is not None else np.arange(self.bnds))
 73 |     
 74 |     def Q(self, bnd=None):
 75 |         return self.q
 76 | 
 77 | class FlexLogOctScale():
 78 |     def __init__(self, fs, numocts, binsoct, flex_Q):
 79 |         """
 80 |         @param fmin: minimum frequency (Hz)
 81 |         @param fmax: maximum frequency (Hz)
 82 |         @param bnds: number of frequency bands (int)
 83 |         @param beyond: number of frequency bands below fmin and above fmax (int)
 84 |         """
 85 |         fmax=fs/2
 86 |         fmin=fmax/(2**numocts)
 87 | 
 88 |         self.bnds=0
 89 |         for i in range(numocts):
 90 |             self.bnds+=binsoct[i]
 91 | 
 92 |         lfmin = np.log2(fmin)
 93 |         lfmax = np.log2(fmax)
 94 | 
 95 |         odiv = (lfmax-lfmin)/(bnds-1)
 96 |         lfmin_ = lfmin-odiv*beyond
 97 |         lfmax_ = lfmax+odiv*beyond
 98 |         self.fmin = 2**lfmin_
 99 |         self.fmax = 2**lfmax_
100 |         self.pow2n = 2**odiv
101 |         self.q = np.sqrt(self.pow2n)/(self.pow2n-1.)/2.
102 | 
103 |         self.bnds=bnds #number of frequency bands
104 |         
105 |     def F(self, bnd=None):
106 |         return self.fmin*self.pow2n**(bnd if bnd is not None else np.arange(self.bnds))
107 |     
108 |     def Q(self, bnd=None):
109 |         return self.q
110 | 
111 |     def Q(self, bnd=None):
112 |         # numerical differentiation (if self.Q not defined by sub-class)
113 |         if bnd is None:
114 |             bnd = np.arange(self.bnds)
115 |         return self.F(bnd)*self.dbnd/(self.F(bnd+self.dbnd)-self.F(bnd-self.dbnd))
116 |     
117 |     def __call__(self):
118 |         f = np.array([self.F(b) for b in range(self.bnds)],dtype=float)
119 |         q = np.array([self.Q(b) for b in range(self.bnds)],dtype=float)
120 |         return f,q
121 | 
122 | 
123 | 
124 | 
125 | 


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/nsdual.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8
 2 | 
 3 | """
 4 | Thomas Grill, 2011-2015
 5 | http://grrrr.org/nsgt
 6 | 
 7 | Austrian Research Institute for Artificial Intelligence (OFAI)
 8 | AudioMiner project, supported by Vienna Science and Technology Fund (WWTF)
 9 | 
10 | --
11 | Original matlab code comments follow:
12 | 
13 | NSDUAL.M - Nicki Holighaus 02.02.11
14 | 
15 | Computes (for the painless case) the dual frame corresponding to a given 
16 | non-stationary Gabor frame specified by the windows 'g' and time shifts
17 | 'shift'.
18 | 
19 | Note, the time shifts corresponding to the dual window sequence is the
20 | same as the original shift sequence and as such already given.
21 | 
22 | This routine's output can be used to achieve reconstruction of a signal 
23 | from its non-stationary Gabor coefficients using the inverse 
24 | non-stationary Gabor transform 'nsigt'.
25 | 
26 | More information on Non-stationary Gabor transforms
27 | can be found at:
28 | 
29 | http://www.univie.ac.at/nonstatgab/
30 | 
31 | minor edit by Gino Velasco 23.02.11
32 | """
33 | 
34 | import numpy as np
35 | import torch
36 | 
37 | #from .util import chkM
38 | 
39 | 
40 | def nsdual(g, wins, nn, M=None, dtype=torch.float32, device="cpu"):
41 |     #M = chkM(M,g)
42 | 
43 |     # Construct the diagonal of the frame operator matrix explicitly
44 |     x = torch.zeros((nn,), dtype=dtype, device=torch.device(device))
45 |     for gi,mii,sl in zip(g, M, wins):
46 |         xa = torch.square(torch.fft.fftshift(gi))
47 |         xa *= mii
48 | 
49 |         x[sl] += xa
50 | 
51 |     gd = [gi/torch.fft.ifftshift(x[wi]) for gi,wi in zip(g,wins)]
52 |     return gd
53 | 


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/nsgfwin.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8
  2 | 
  3 | """
  4 | Thomas Grill, 2011-2016
  5 | http://grrrr.org/nsgt
  6 | 
  7 | --
  8 | Original matlab code comments follow:
  9 | 
 10 | NSGFWIN.M
 11 | ---------------------------------------------------------------
 12 |  [g,rfbas,M]=nsgfwin(fmin,bins,sr,Ls) creates a set of windows whose
 13 |  centers correspond to center frequencies to be
 14 |  used for the nonstationary Gabor transform with varying Q-factor. 
 15 | ---------------------------------------------------------------
 16 | 
 17 | INPUT : fmin ...... Minimum frequency (in Hz)
 18 |         bins ...... Vector consisting of the number of bins per octave
 19 |         sr ........ Sampling rate (in Hz)
 20 |         Ls ........ Length of signal (in samples)
 21 | 
 22 | OUTPUT : g ......... Cell array of window functions.
 23 |          rfbas ..... Vector of positions of the center frequencies.
 24 |          M ......... Vector of lengths of the window functions.
 25 | 
 26 | AUTHOR(s) : Monika Dörfler, Gino Angelo Velasco, Nicki Holighaus, 2010
 27 | 
 28 | COPYRIGHT : (c) NUHAG, Dept.Math., University of Vienna, AUSTRIA
 29 | http://nuhag.eu/
 30 | Permission is granted to modify and re-distribute this
 31 | code in any manner as long as this notice is preserved.
 32 | All standard disclaimers apply.
 33 | 
 34 | EXTERNALS : firwin
 35 | """
 36 | 
 37 | import numpy as np
 38 | from .util import hannwin, blackharr, kaiserwin
 39 | from math import ceil
 40 | from warnings import warn
 41 | from itertools import chain
 42 | #import torch
 43 | 
 44 | 
 45 | def nsgfwin(f, q, sr, Ls,  min_win=4, Qvar=1, dowarn=True, dtype=np.float64, device="cpu", window="hann"):
 46 |     nf = sr/2.
 47 | 
 48 |     lim = np.argmax(f > 0)
 49 |     if lim != 0:
 50 |         # f partly <= 0 
 51 |         f = f[lim:]
 52 |         q = q[lim:]
 53 |             
 54 |     lim = np.argmax(f >= nf)
 55 |     if lim != 0:
 56 |         # f partly >= nf 
 57 |         f = f[:lim]
 58 |         q = q[:lim]
 59 |     
 60 |     assert len(f) == len(q)
 61 |     assert np.all((f[1:]-f[:-1]) > 0)  # frequencies must be increasing
 62 |     assert np.all(q > 0)  # all q must be > 0
 63 |     
 64 |     qneeded = f*(Ls/(8.*sr))
 65 |     #if np.any(q >= qneeded) and dowarn:
 66 |     #    warn("Q-factor too high for frequencies %s"%",".join("%.2f"%fi for fi in f[q >= qneeded]))
 67 |     
 68 |     fbas = f
 69 |     lbas = len(fbas)
 70 |     
 71 |     frqs = np.concatenate(((0.,),fbas,(nf,)))
 72 |     
 73 |     fbas = np.concatenate((frqs,sr-frqs[-2:0:-1]))
 74 | 
 75 |     # at this point: fbas.... frequencies in Hz
 76 |     
 77 |     fbas *= float(Ls)/sr
 78 |     
 79 |     # Omega[k] in the paper
 80 |     M = np.zeros(fbas.shape, dtype=int)
 81 |     M[0] = np.round(2*fbas[1])
 82 |     #M[1]=
 83 |     M[1] = np.round(fbas[1]/q[0])
 84 |     for k in range(2,lbas+1):
 85 |         #M[k] = np.round(fbas[k]/q[k-1])
 86 |         M[k]= np.round(fbas[k+1]-fbas[k-1]) #this is nyq!
 87 |         #M[k] =
 88 |     #M[lbas]=np.round(fbas[lbas]/q[-1])
 89 |     M[lbas+1]= np.round(fbas[k+1]-fbas[k-1]) #this is nyq!
 90 |     M[lbas+2:]=M[lbas:0:-1] #symmetry!
 91 |     
 92 |     #M[-1] = np.round(Ls-fbas[-2])
 93 |         
 94 |     np.clip(M, min_win, np.inf, out=M)
 95 | 
 96 |     
 97 |     if window=="hann":
 98 |         print("using a hann window")
 99 |         g = [hannwin(m, device=device).to(dtype) for m in M]
100 |     elif window=="blackharr":
101 |         print("using a blackharr window")
102 |         g = [blackharr(m, device=device).to(dtype) for m in M]
103 |     elif window[0]=="kaiser":
104 |         print("using a kaiser window with beta=",window[1])
105 |         str, beta= window
106 |         g = [kaiserwin(m,beta, device=device).to(dtype) for m in M]
107 | 
108 |     #g[0]=tukeywin(M[0], 0.2, device=device).to(dtype)
109 |     
110 |     fbas[lbas] = (fbas[lbas-1]+fbas[lbas+1])/2
111 |     fbas[lbas+2] = Ls-fbas[lbas]
112 |     rfbas = np.round(fbas).astype(int)
113 |         
114 | 
115 |     return g,rfbas,M
116 | 


--------------------------------------------------------------------------------
/cqt_nsgt_pytorch/util.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8
 2 | 
 3 | """
 4 | Python implementation of Non-Stationary Gabor Transform (NSGT)
 5 | derived from MATLAB code by NUHAG, University of Vienna, Austria
 6 | 
 7 | Thomas Grill, 2011-2015
 8 | http://grrrr.org/nsgt
 9 | 
10 | Austrian Research Institute for Artificial Intelligence (OFAI)
11 | AudioMiner project, supported by Vienna Science and Technology Fund (WWTF)
12 | """
13 | 
14 | import numpy as np
15 | import torch
16 | from math import exp, floor, ceil, pi
17 | #import scipy.signal
18 | 
19 | 
20 | def hannwin(l, device="cpu"):
21 |     r = torch.arange(l,dtype=float, device=torch.device(device))
22 |     r *= np.pi*2./l
23 |     r = torch.cos(r)
24 |     r += 1.
25 |     r *= 0.5
26 |     return r
27 | 
28 | #design a kaiser window
29 | def kaiserwin(l, beta, device="cpu"):
30 |     beta=torch.tensor(beta, dtype=float, device=torch.device(device))
31 |     r = torch.arange(l,dtype=float, device=torch.device(device))
32 |     r *= np.pi*2./l
33 |     r = torch.cos(r)
34 |     r += 1.
35 |     r *= 0.5
36 |     r = torch.sqrt(r)
37 |     r = torch.i0(beta*torch.sqrt(1.-r**2))/(2.*torch.i0(beta))
38 |     r=torch.roll(r, l//2)
39 |     return r
40 | 
41 | 
42 | 
43 | #alternative windows!! maybe could be interesting to switch to get better time or freq resolution, who knows...
44 | def blackharr(n, l=None, mod=True, device="cpu"):
45 |     if l is None: 
46 |         l = n
47 |     nn = (n//2)*2
48 |     k = torch.arange(n, device=torch.device(device))
49 |     if not mod:
50 |         bh = 0.35875 - 0.48829*torch.cos(k*(2*pi/nn)) + 0.14128*torch.cos(k*(4*pi/nn)) -0.01168*torch.cos(k*(6*pi/nn))
51 |     else:
52 |         bh = 0.35872 - 0.48832*torch.cos(k*(2*pi/nn)) + 0.14128*torch.cos(k*(4*pi/nn)) -0.01168*torch.cos(k*(6*pi/nn))
53 |     bh = torch.hstack((bh,torch.zeros(l-n,dtype=bh.dtype,device=torch.device(device))))
54 |     bh = torch.hstack((bh[-n//2:],bh[:-n//2]))
55 |     return bh
56 | 
57 | def blackharrcw(bandwidth,corr_shift):
58 |     flip = -1 if corr_shift < 0 else 1
59 |     corr_shift *= flip
60 |     
61 |     M = np.ceil(bandwidth/2+corr_shift-1)*2
62 |     win = np.concatenate((np.arange(M//2,M), np.arange(0,M//2)))-corr_shift
63 |     win = (0.35872 - 0.48832*np.cos(win*(2*np.pi/bandwidth))+ 0.14128*np.cos(win*(4*np.pi/bandwidth)) -0.01168*np.cos(win*(6*np.pi/bandwidth)))*(win <= bandwidth)*(win >= 0)
64 | 
65 |     return win[::flip],M
66 | 
67 | 
68 | 
69 | def _isseq(x):
70 |     try:
71 |         len(x)
72 |     except TypeError:
73 |         return False
74 |     return True        
75 | 
76 | 
77 | def calcwinrange(g, rfbas, Ls, device="cpu"):
78 |     shift = np.concatenate(((np.mod(-rfbas[-1],Ls),), rfbas[1:]-rfbas[:-1]))
79 |     
80 |     timepos = np.cumsum(shift)
81 |     nn = timepos[-1]
82 |     timepos -= shift[0] # Calculate positions from shift vector
83 |     
84 |     wins = []
85 |     for gii,tpii in zip(g, timepos):
86 |         Lg = len(gii)
87 |         win_range = torch.arange(-(Lg//2)+tpii, Lg-(Lg//2)+tpii, dtype=int, device=torch.device(device))
88 |         win_range %= nn
89 | 
90 |         wins.append(win_range)
91 |         
92 |     return wins,nn
93 | 
94 | 
95 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import find_packages, setup
 2 | 
 3 | setup(
 4 |     name="cqt-nsgt-pytorch",
 5 |     packages=find_packages(exclude=[]),
 6 |     version="0.0.9",
 7 |     license="MIT",
 8 |     description="Pytorch implementation of an invertible and differentiable Constant-Q Transform based Non-stationary Gabor Transform (NSGT) for audio processing.",
 9 |     long_description_content_type="text/markdown",
10 |     author="Eloi Moliner",
11 |     author_email="eloi.moliner@aalto.fi",
12 |     url="https://github.com/eloimoliner/CQT_pytorch",
13 |     keywords=["audio processing", "constant-q transform", "deep learning", "pytorch", "nsgt"],
14 |     install_requires=[
15 |         "torch>=1.13.0",
16 |         "numpy>=1.19.5",
17 |     ],
18 |     classifiers=[
19 |         "Development Status :: 4 - Beta",
20 |         "Intended Audience :: Developers",
21 |         "License :: OSI Approved :: MIT License",
22 |         "Programming Language :: Python :: 3.6",
23 |     ],
24 | )
25 | 


--------------------------------------------------------------------------------
/tests/test.m:
--------------------------------------------------------------------------------
 1 | clear all;
 2 | close all;
 3 | [a,fs]=audioread("test_dir/0.wav");
 4 | 
 5 | a=a(1:131072);
 6 | 
 7 | A=fft(a);
 8 | N=length(A)
 9 | 
10 | L=512
11 | k=1:L
12 | 
13 | g=(0.5+0.5*cos(k.*pi*2/L))
14 | H=zeros(length(A),1)
15 | H(1:L/2)=g(1:L/2)
16 | H(N-L/2+1:end)=g(L/2+1:end)
17 | 
18 | Hhpf=1-H
19 | 
20 | B=A.*Hhpf
21 | b=real(ifft(B))
22 | 
23 | C=fft(b)
24 | 
25 | 


--------------------------------------------------------------------------------