├── README.md
├── misc
│   ├── diagram.png
│   └── xvec_config.png
└── tdnn.py

/README.md:
--------------------------------------------------------------------------------
# TDNN
Simple Time Delay Neural Network (TDNN) implementation in PyTorch. Uses `F.unfold` to slide a context window over the input sequence.

![Alt text](misc/diagram.png?raw=true "Diagram")

[1] https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf

# Factorized TDNN (TDNN-F)

I've also implemented the Factorized TDNN from Kaldi (TDNN-F) in PyTorch here: https://github.com/cvqluu/Factorized-TDNN

## Usage

To recreate the TDNN part of the x-vector network in [2]:

```python
from tdnn import TDNN

# Assuming 24-dim MFCCs per frame

frame1 = TDNN(input_dim=24, output_dim=512, context_size=5, dilation=1)
frame2 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=2)
frame3 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=3)
frame4 = TDNN(input_dim=512, output_dim=512, context_size=1, dilation=1)
frame5 = TDNN(input_dim=512, output_dim=1500, context_size=1, dilation=1)

# Input to frame1 is of shape (batch_size, T, 24)
# Output of frame5 will be (batch_size, T-14, 1500)
```

![Alt text](misc/xvec_config.png?raw=true "Diagram")

[2] https://www.danielpovey.com/files/2018_icassp_xvectors.pdf
--------------------------------------------------------------------------------
/misc/diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cvqluu/TDNN/c9d3df7b342ca016067da1cb3bdeba0566a97877/misc/diagram.png
--------------------------------------------------------------------------------
/misc/xvec_config.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cvqluu/TDNN/c9d3df7b342ca016067da1cb3bdeba0566a97877/misc/xvec_config.png
--------------------------------------------------------------------------------
/tdnn.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F


class TDNN(nn.Module):

    def __init__(
        self,
        input_dim=23,
        output_dim=512,
        context_size=5,
        stride=1,
        dilation=1,
        batch_norm=True,
        dropout_p=0.0
    ):
        '''
        TDNN as defined by https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf

        The affine transformation is not applied globally to all frames,
        but to smaller windows with local context.

        batch_norm: True to include batch normalisation after the non-linearity

        Context size and dilation determine the frames selected
        (although context size is not really defined in the traditional sense).
        For example:
            context size 5 and dilation 1 is equivalent to [-2,-1,0,1,2]
            context size 3 and dilation 2 is equivalent to [-2, 0, 2]
            context size 1 and dilation 1 is equivalent to [0]
        '''
        super(TDNN, self).__init__()
        self.context_size = context_size
        self.stride = stride
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.dilation = dilation
        self.dropout_p = dropout_p
        self.batch_norm = batch_norm

        # One affine transform shared across all temporal windows
        self.kernel = nn.Linear(input_dim * context_size, output_dim)
        self.nonlinearity = nn.ReLU()
        if self.batch_norm:
            self.bn = nn.BatchNorm1d(output_dim)
        if self.dropout_p:
            self.drop = nn.Dropout(p=self.dropout_p)

    def forward(self, x):
        '''
        input: size (batch, seq_len, input_features)
        output: size (batch, new_seq_len, output_features)
        '''
        _, _, d = x.shape
        assert d == self.input_dim, \
            'Input dimension was wrong. Expected ({}), got ({})'.format(self.input_dim, d)
        x = x.unsqueeze(1)

        # Unfold input into smaller temporal contexts
        x = F.unfold(
            x,
            (self.context_size, self.input_dim),
            stride=(self.stride, self.input_dim),
            dilation=(self.dilation, 1)
        )

        # N, input_dim * context_size, new_t = x.shape
        x = x.transpose(1, 2)
        x = self.kernel(x)
        x = self.nonlinearity(x)

        if self.dropout_p:
            x = self.drop(x)

        if self.batch_norm:
            # BatchNorm1d expects (batch, features, seq_len)
            x = x.transpose(1, 2)
            x = self.bn(x)
            x = x.transpose(1, 2)

        return x
--------------------------------------------------------------------------------
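The README's claim that the five-layer x-vector stack shrinks T by 14 frames follows from the "valid" (no padding) convolution length formula that `F.unfold` obeys along the time axis. A minimal standalone sketch of that arithmetic (the helper `tdnn_output_len` is hypothetical, written here for illustration, and is not part of tdnn.py):

```python
# Each TDNN layer behaves like a valid (no-padding) 1-D convolution over
# time, so its output length follows the standard convolution formula.
# tdnn_output_len is a hypothetical helper for illustration only.

def tdnn_output_len(T, context_size, dilation, stride=1):
    # Output length of a valid convolution with the given kernel geometry.
    return (T - dilation * (context_size - 1) - 1) // stride + 1

# (context_size, dilation) for the five x-vector layers in the README:
layers = [(5, 1), (3, 2), (3, 3), (1, 1), (1, 1)]

T = 100
for context_size, dilation in layers:
    T = tdnn_output_len(T, context_size, dilation)

print(T)  # 86, i.e. 100 - 14, matching the README's (batch_size, T-14, 1500)
```

Each (context_size, dilation) pair removes dilation * (context_size - 1) frames: 4 + 4 + 6 + 0 + 0 = 14 in total.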