├── README.md
├── misc
│   ├── diagram.png
│   └── xvec_config.png
└── tdnn.py

/README.md:
--------------------------------------------------------------------------------
# TDNN
Simple Time Delay Neural Network (TDNN) implementation in PyTorch. Uses `F.unfold` to slide a context window over the input sequence.

![Alt text](misc/diagram.png?raw=true "Diagram")

[1] https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf

# Factorized TDNN (TDNN-F)

I've also implemented the Factorized TDNN from Kaldi (TDNN-F) in PyTorch here: https://github.com/cvqluu/Factorized-TDNN

## Usage

To recreate the TDNN part of the x-vector network in [2]:

```python
from tdnn import TDNN

# Assuming 24-dim MFCCs per frame

frame1 = TDNN(input_dim=24, output_dim=512, context_size=5, dilation=1)
frame2 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=2)
frame3 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=3)
frame4 = TDNN(input_dim=512, output_dim=512, context_size=1, dilation=1)
frame5 = TDNN(input_dim=512, output_dim=1500, context_size=1, dilation=1)

# Input to frame1 is of shape (batch_size, T, 24)
# Output of frame5 will be (batch_size, T-14, 1500)
```

![Alt text](misc/xvec_config.png?raw=true "Diagram")

[2] https://www.danielpovey.com/files/2018_icassp_xvectors.pdf
--------------------------------------------------------------------------------
/misc/diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cvqluu/TDNN/c9d3df7b342ca016067da1cb3bdeba0566a97877/misc/diagram.png
--------------------------------------------------------------------------------
/misc/xvec_config.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cvqluu/TDNN/c9d3df7b342ca016067da1cb3bdeba0566a97877/misc/xvec_config.png
--------------------------------------------------------------------------------
/tdnn.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F


class TDNN(nn.Module):

    def __init__(
        self,
        input_dim=23,
        output_dim=512,
        context_size=5,
        stride=1,
        dilation=1,
        batch_norm=True,
        dropout_p=0.0
    ):
        '''
        TDNN as defined by https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf

        The affine transformation is not applied globally to all frames,
        but to smaller windows with local context.

        batch_norm: True to include batch normalisation after the non-linearity

        Context size and dilation determine the frames selected
        (although context size is not really defined in the traditional sense).
        For example:
            context size 5 and dilation 1 is equivalent to [-2,-1,0,1,2]
            context size 3 and dilation 2 is equivalent to [-2, 0, 2]
            context size 1 and dilation 1 is equivalent to [0]
        '''
        super(TDNN, self).__init__()
        self.context_size = context_size
        self.stride = stride
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.dilation = dilation
        self.dropout_p = dropout_p
        self.batch_norm = batch_norm

        # One affine transform shared across all temporal windows
        self.kernel = nn.Linear(input_dim * context_size, output_dim)
        self.nonlinearity = nn.ReLU()
        if self.batch_norm:
            self.bn = nn.BatchNorm1d(output_dim)
        if self.dropout_p:
            self.drop = nn.Dropout(p=self.dropout_p)

    def forward(self, x):
        '''
        input: size (batch, seq_len, input_features)
        output: size (batch, new_seq_len, output_features)
        '''
        _, _, d = x.shape
        assert d == self.input_dim, \
            'Input dimension was wrong. Expected ({}), got ({})'.format(self.input_dim, d)
        x = x.unsqueeze(1)

        # Unfold input into smaller temporal contexts
        x = F.unfold(
            x,
            (self.context_size, self.input_dim),
            stride=(self.stride, self.input_dim),
            dilation=(self.dilation, 1)
        )

        # N, input_dim * context_size, new_t = x.shape
        x = x.transpose(1, 2)
        x = self.kernel(x)
        x = self.nonlinearity(x)

        if self.dropout_p:
            x = self.drop(x)

        if self.batch_norm:
            # BatchNorm1d expects (batch, features, seq_len)
            x = x.transpose(1, 2)
            x = self.bn(x)
            x = x.transpose(1, 2)

        return x
--------------------------------------------------------------------------------
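The README's claim that the five-layer x-vector stack shrinks T by 14 frames follows from the "valid" (no padding) convolution length formula that `F.unfold` obeys along the time axis. A minimal standalone sketch of that arithmetic (the helper `tdnn_output_len` is hypothetical, written here for illustration, and is not part of tdnn.py):

```python
# Each TDNN layer behaves like a valid (no-padding) 1-D convolution over
# time, so its output length follows the standard convolution formula.
# tdnn_output_len is a hypothetical helper for illustration only.

def tdnn_output_len(T, context_size, dilation, stride=1):
    # Output length of a valid convolution with the given kernel geometry.
    return (T - dilation * (context_size - 1) - 1) // stride + 1

# (context_size, dilation) for the five x-vector layers in the README:
layers = [(5, 1), (3, 2), (3, 3), (1, 1), (1, 1)]

T = 100
for context_size, dilation in layers:
    T = tdnn_output_len(T, context_size, dilation)

print(T)  # 86, i.e. 100 - 14, matching the README's (batch_size, T-14, 1500)
```

Each (context_size, dilation) pair removes dilation * (context_size - 1) frames: 4 + 4 + 6 + 0 + 0 = 14 in total.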