├── README.md
├── logger.py
├── create_spectrograms.py
└── main_topcoder.py

/README.md:
--------------------------------------------------------------------------------
# Spoken-language-identification

Spoken language (dialect) classification in PyTorch, following the approach of the
[topcoder contest write-up](http://yerevann.github.io/2016/06/26/combining-cnn-and-rnn-for-spoken-language-identification/).

See the accompanying [Zhihu post](https://zhuanlan.zhihu.com/p/45104360).

## topcoder raw audio data

[Training Data Files](http://www.topcoder.com/contest/problem/SpokenLanguages2/trainingdata.zip)

[Testing Data Files](http://www.topcoder.com/contest/problem/SpokenLanguages2/testingdata.zip)

[Training Dataset List](http://www.topcoder.com/contest/problem/SpokenLanguages2/trainingData.csv)

[Testing Dataset List](http://www.topcoder.com/contest/problem/SpokenLanguages2/testingData.csv)

## Scripts

1. create_spectrograms.py converts audio files into spectrogram images.
   * --input_dir: directory containing the input audio files
   * --save_img_dir: directory where the generated spectrogram images are saved
   * --audio_type: audio format (mp3 or wav)

2. logger.py provides the TensorBoard visualization helper.

3. main_topcoder.py is the main script.
   * batch_gen_imgdata reads the spectrogram images in batches and turns them into tensors PyTorch can consume.
   * Network_CNN_RNN is the model class. The dimensions after the convolutions differ slightly from the Theano
     reference, which does not affect the results. The network targets the topcoder audio, whose length is fixed
     at 10 s; each spectrogram is 256 * 858, so the PyTorch input is 100 * 1 * 256 * 858 (a batch of 100).

## Tips

1. The dev-set accuracy is computed every epoch, and the model is saved based on the dev accuracy.

2. TensorBoard is used to visualize the training loss and accuracy as well as the dev accuracy.

3. rnn_input_size has to be set in advance; whenever the spectrogram size or the conv layers change, this value
   must be updated as well (see the sketch at the end of this README).

## Run

trainEqual.csv has the following format:

> /movie/audio/topcoder/topcoder_train_png/000w3fewuqj.png,57
> /movie/audio/topcoder/topcoder_train_png/000ylhu4sxl.png,55
> /movie/audio/topcoder/topcoder_train_png/0014x3zvjrl.png,155
> /movie/audio/topcoder/topcoder_train_png/001xjmtk2wx.png,148
> /movie/audio/topcoder/topcoder_train_png/002hrjhbsnk.png,110

python -u main_topcoder.py --mode=train --datalist_path=/movie/audio/topcoder --use_gpu=1 --use_pretrained=0
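## Deriving rnn_input_size

A minimal sketch of how the value can be checked (illustrative only, not one of the scripts above):
push a dummy spectrogram through the same four conv blocks as Network_CNN_RNN and multiply the
channel and height dimensions of the result. With a 256 * 858 input this gives 32 * 13 = 416, the
value hard-coded in main_topcoder.py.

```python
import torch
import torch.nn as nn

# The four conv blocks of Network_CNN_RNN (main_topcoder.py), without the GRU head.
convs = nn.Sequential(
    nn.Conv2d(1, 16, 7),  nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d((3, 3), (2, 2), 1),
    nn.Conv2d(16, 32, 5), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(3, 2, 1),
    nn.Conv2d(32, 32, 3), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(3, 2, 1),
    nn.Conv2d(32, 32, 3), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(3, 2),
)

with torch.no_grad():
    out = convs(torch.zeros(1, 1, 256, 858))   # one dummy 256 x 858 spectrogram
print(out.shape)                    # torch.Size([1, 32, 13, 51])
print(out.shape[1] * out.shape[2])  # 416 -> rnn_input_size
```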
--------------------------------------------------------------------------------
/logger.py:
--------------------------------------------------------------------------------
# Code referenced from https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
import tensorflow as tf
import numpy as np
import scipy.misc
try:
    from StringIO import StringIO  # Python 2.7
except ImportError:
    from io import BytesIO  # Python 3.x


class Logger(object):

    def __init__(self, log_dir):
        """Create a summary writer logging to log_dir."""
        self.writer = tf.summary.FileWriter(log_dir)

    def scalar_summary(self, tag, value, step):
        """Log a scalar variable."""
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
        self.writer.add_summary(summary, step)

    def image_summary(self, tag, images, step):
        """Log a list of images."""

        img_summaries = []
        for i, img in enumerate(images):
            # Write the image to a string
            try:
                s = StringIO()
            except:
                s = BytesIO()
            scipy.misc.toimage(img).save(s, format="png")

            # Create an Image object
            img_sum = tf.Summary.Image(encoded_image_string=s.getvalue(),
                                       height=img.shape[0],
                                       width=img.shape[1])
            # Create a Summary value
            img_summaries.append(tf.Summary.Value(tag='%s/%d' % (tag, i), image=img_sum))

        # Create and write Summary
        summary = tf.Summary(value=img_summaries)
        self.writer.add_summary(summary, step)

    def histo_summary(self, tag, values, step, bins=1000):
        """Log a histogram of the tensor of values."""

        # Create a histogram using numpy
        counts, bin_edges = np.histogram(values, bins=bins)

        # Fill the fields of the histogram proto
        hist = tf.HistogramProto()
        hist.min = float(np.min(values))
        hist.max = float(np.max(values))
        hist.num = int(np.prod(values.shape))
        hist.sum = float(np.sum(values))
        hist.sum_squares = float(np.sum(values**2))

        # Drop the start of the first bin
        bin_edges = bin_edges[1:]

        # Add bin edges and counts
        for edge in bin_edges:
            hist.bucket_limit.append(edge)
        for c in counts:
            hist.bucket.append(c)

        # Create and write Summary
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, histo=hist)])
        self.writer.add_summary(summary, step)
        self.writer.flush()
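
# Usage sketch: how main_topcoder.py drives this Logger (TensorFlow 1.x API --
# tf.summary.FileWriter no longer exists in TF 2.x). The log directory is arbitrary.
if __name__ == '__main__':
    logger = Logger('./log_topcoder/logs_train')
    for step in range(10):
        logger.scalar_summary('loss', 1.0 / (step + 1), step)  # dummy scalar
    # inspect with: tensorboard --logdir ./log_topcoder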
--------------------------------------------------------------------------------
/create_spectrograms.py:
--------------------------------------------------------------------------------
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
import PIL.Image as Image
import os
import argparse
import glob

parser = argparse.ArgumentParser()
parser.add_argument("--input_dir", type=str, default='/movie/audio/topcoder/topcoder_train', help="directory of the input audio files")
parser.add_argument("--save_img_dir", type=str, default="/movie/audio/topcoder/topcoder_train_png", help="directory for the generated spectrogram images")
parser.add_argument("--audio_type", type=str, default="mp3", help="audio type (mp3 | wav)")

opt = parser.parse_args()
print(opt)


""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize)) + 1)
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    return np.fft.rfft(frames)

""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20., alpha=1.0, f0=0.9, fmax=1):
    spec = spec[:, 0:256]
    timebins, freqbins = np.shape(spec)
    scale = np.linspace(0, 1, freqbins)  # ** factor

    # http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=650310&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel4%2F89%2F14168%2F00650310
    scale = np.array(list(map(lambda x: x * alpha if x <= f0 else (fmax-alpha*f0)/(fmax-f0)*(x-f0)+alpha*f0, scale)))
    scale *= (freqbins-1)/max(scale)

    newspec = np.complex128(np.zeros([timebins, freqbins]))
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = [0.0 for i in range(freqbins)]
    totw = [0.0 for i in range(freqbins)]
    for i in range(0, freqbins):
        if (i < 1 or i + 1 >= freqbins):
            newspec[:, i] += spec[:, i]
            freqs[i] += allfreqs[i]
            totw[i] += 1.0
            continue
        else:
            # scale[15] = 17.2
            w_up = scale[i] - np.floor(scale[i])
            w_down = 1 - w_up
            j = int(np.floor(scale[i]))

            newspec[:, j] += w_down * spec[:, i]
            freqs[j] += w_down * allfreqs[i]
            totw[j] += w_down

            newspec[:, j + 1] += w_up * spec[:, i]
            freqs[j + 1] += w_up * allfreqs[i]
            totw[j + 1] += w_up

    for i in range(len(freqs)):
        if (totw[i] > 1e-6):
            freqs[i] /= totw[i]

    return newspec, freqs

""" plot spectrogram """
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="gray", channel=0, name='tmp.png', alpha=1, offset=0):
    samplerate, samples = wav.read(audiopath)
    samples = samples if len(samples.shape) <= 1 else samples[:, channel]
    s = stft(samples, binsize)  # 431 * 513

    # sshow: 431 * 256
    sshow, freq = logscale_spec(s, factor=1, sr=samplerate, alpha=alpha)
    sshow = sshow[2:, :]
    ims = 20.*np.log10(np.abs(sshow)/10e-6)  # amplitude to decibel
    timebins, freqbins = np.shape(ims)

    ims = np.transpose(ims)
    # ims = ims[0:256, offset:offset+768]  # 0-11khz, ~9s interval
    ims = ims[0:256, :]  # 0-11khz, ~10s interval
    # print("ims.shape", ims.shape)

    image = Image.fromarray(ims)
    image = image.convert('L')
    image.save(name)

def create_spec(input_dir, save_img_dir, audio_type):

    # file = open(input_dir, 'r')
    for iter, line in enumerate(glob.glob(os.path.join(input_dir, "*.%s" % audio_type))):  # enumerate(file.readlines()[1:]): the first line of trainingData.csv is a header (only for trainingData.csv)
        # filepath = line.split(',')[0]
        filename = line.split("/")[-1].split(".")[0]
        # file_split.pop(-1)
        # filename = "_".join(file_split)
        if audio_type == "mp3":
            wavfile = os.path.join(input_dir, filename + '.wav')
            # os.system('ffmpeg -i ' + line + " -ar 8000 tmp.wav")
            # os.system("ffmpeg -i tmp.wav -ar 8000 tmp.mp3")
            # os.system("ffmpeg -i tmp.mp3 -ar 8000 -ac 1 " + wavfile)
            os.system('ffmpeg -i ' + line + " " + wavfile)

            # we create only one spectrogram for each speech sample
            # we don't do vocal tract length perturbation (alpha=1.0)
            # also we don't crop a 9 s part from the speech
            plotstft(wavfile, channel=0, name=os.path.join(save_img_dir, filename + '.png'), alpha=1.0)
            os.remove(wavfile)
            # os.remove("tmp.mp3")
            # os.remove("tmp.wav")
        elif audio_type == "wav":
            plotstft(line, channel=0, name=os.path.join(save_img_dir, filename + '_1.png'), alpha=1.0)
        print("processed %d files" % (iter + 1))


if __name__ == '__main__':
    create_spec(input_dir=opt.input_dir, save_img_dir=opt.save_img_dir, audio_type=opt.audio_type)
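
# Usage sketch (paths are placeholders): the same functions can also be driven from
# Python instead of the command line, e.g.
#
#     from create_spectrograms import plotstft, create_spec
#     plotstft('sample.wav', channel=0, name='sample.png', alpha=1.0)
#     create_spec(input_dir='/movie/audio/topcoder/topcoder_train',
#                 save_img_dir='/movie/audio/topcoder/topcoder_train_png',
#                 audio_type='mp3')
#
# Note that importing the module also runs its argparse block, so do this from a
# script invoked without extra command-line flags.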
--------------------------------------------------------------------------------
/main_topcoder.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 8/10/18 4:48 PM
# @Author  : renxiaoming@julive.com
# @Site    :
# @File    : main_topcoder.py
# @Software: PyCharm

import torch
import yaml
import glob
import time
import PIL.Image as Image
import torch.utils.data as data
import Levenshtein
# from torchviz import make_dot
import os
if torch.cuda.is_available():
    import torch.cuda as device
else:
    import torch as device

from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
from torch.optim.lr_scheduler import MultiStepLR
import numpy as np
import argparse
from logger import Logger

parser = argparse.ArgumentParser()
# parser.add_argument('--root_dir', type=str, default='/Users/comjia/Downloads/code/pytorch_seq2seq/pytorch_seq2seq', help='')
parser.add_argument("--mode", type=str, default='train', help="train | dev | test | predict")
parser.add_argument("--datalist_path", type=str, default="/movie/audio/topcoder", help="directory containing the spectrogram list files")
parser.add_argument("--use_gpu", type=int, default=0, help="use gpu = 1; ")
parser.add_argument("--use_pretrained", type=int, default=0, help="use_pretrained = 1; ")
parser.add_argument("--model_path", type=str, default="checkpoint/spoken_Lang_id_topcoder", help="model path ")
parser.add_argument("--img_path", type=str, default="", help="path of the spectrogram used for prediction")

opt = parser.parse_args()
print(opt)

class batch_gen_imgdata(data.Dataset):
    def __init__(self, data_listfile):
        # self.root = os.path.expanduser(root)
        # self.processed_imglist = glob.glob(os.path.join(data_path, "*.png"))
        with open(data_listfile, "r") as f:
            self.labels = f.readlines()
        self.num_samples = len(self.labels)

    def __getitem__(self, index):
        img_path_label = self.labels[index]
        # print(img_path_label)
        img_split = img_path_label.split(",")
        label = int(img_split[1][:-1])  # kind_map[img_path.split("/")[-1].split("_")[0]]
        img = Image.open(img_split[0])
        img = np.array(img)
        return img if img.shape[1] == 154 else img[:, :-1], label, img_split[0]

    def __len__(self):
        return self.num_samples

    def collate_fn(self, batch):
        # batch.sort(key=lambda x: len(x[1]), reverse=True)
        imgs, labels, img_paths = zip(*batch)

        # 64 * 256 * 858
        batch_imgs = np.array(imgs)
        # 64 * 1
        batch_labels = np.array(labels)[:, np.newaxis]
        return batch_imgs, batch_labels
class Network_CNN_RNN(nn.Module):
    def __init__(self, rnn_input_size, rnn_hidden_size, rnn_num_layers, use_gpu=False):
        super(Network_CNN_RNN, self).__init__()

        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,    # input channels
                out_channels=16,  # n_filters
                kernel_size=7,    # filter size
                stride=1,         # filter movement/step
                padding=0,        # use padding=(kernel_size-1)/2 to keep the size when stride=1
            ),
            nn.BatchNorm2d(16),
            nn.ReLU(),  # activation
            nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=1),  # output shape (16, 125, 426)
        )
        self.conv2 = nn.Sequential(  # input shape (16, 125, 426)
            nn.Conv2d(16, 32, 5, 1, 0),
            nn.BatchNorm2d(32),
            nn.ReLU(),  # activation
            nn.MaxPool2d(3, 2, 1),  # output shape (32, 61, 211)
        )
        self.conv3 = nn.Sequential(  # input shape (32, 61, 211)
            nn.Conv2d(32, 32, 3, 1, 0),
            nn.BatchNorm2d(32),
            nn.ReLU(),  # activation
            nn.MaxPool2d(3, 2, 1),  # output shape (32, 30, 105)
        )
        self.conv4 = nn.Sequential(  # input shape (32, 30, 105)
            nn.Conv2d(32, 32, 3, 1, 0),
            nn.BatchNorm2d(32),
            nn.ReLU(),  # activation
            nn.MaxPool2d(3, 2),  # output shape (32, 13, 51)
        )
        self.gru = nn.GRU(input_size=rnn_input_size, hidden_size=rnn_hidden_size,
                          num_layers=rnn_num_layers, batch_first=True)
        # self.batch_norm = nn.BatchNorm2d()
        self.out = nn.Linear(rnn_hidden_size, lang_num)

        self.use_gpu = use_gpu
        if self.use_gpu:
            self.cuda()

    def forward(self, x):
        # x: batch * 1 * 256 * 858
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        # x: batch * 32 * 13 * 51 -> batch * 51 * 416 for the GRU
        x = x.view(x.size(0), -1, x.size(-1))
        x = x.transpose(1, 2)
        gru_output, _ = self.gru(x)
        gru_output = F.dropout(gru_output, training=self.training)
        # gru_output = self.batch_norm(gru_output)
        output = self.out(gru_output[:, -1, :])
        # output = F.log_softmax(output, dim=1)
        return output
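
# Shape check (a sketch): run it in this module's namespace, since lang_num is read
# from the module globals that are set in the __main__ block below.
#
#   lang_num = 179
#   net = Network_CNN_RNN(rnn_input_size=416, rnn_hidden_size=500, rnn_num_layers=1)
#   print(net(torch.zeros(2, 1, 256, 858)).shape)   # torch.Size([2, 179])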
def train(network):
    best_model_acc = 0.0
    optimizer = torch.optim.Adam(network.parameters(), lr=0.01, weight_decay=1e-5)
    scheduler_ = MultiStepLR(optimizer, milestones=[5, 10, 100], gamma=0.1)
    objective = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        network.train()
        tr_loss = 0.0
        tr_acc = 0.0
        scheduler_.step()
        for batch_index, (batch_imgs, batch_labels) in enumerate(train_set):
            batch_imgs = Variable(torch.FloatTensor(batch_imgs))
            batch_labels = Variable(torch.LongTensor(batch_labels))

            if use_gpu:
                batch_imgs = batch_imgs.cuda()
                batch_labels = batch_labels.cuda()

            pred = network(batch_imgs.unsqueeze(1))

            loss = objective(input=pred, target=batch_labels.squeeze(1))

            optimizer.zero_grad()
            loss.backward()   # backpropagation, compute gradients
            optimizer.step()  # apply gradients
            batch_loss = loss.cpu().data.numpy()  # / batch_labels.shape[0]

            _, keys = torch.topk(pred, 1)
            pre = keys.cpu().data.numpy().T.tolist()[0]
            tar = batch_labels.cpu().data.numpy().reshape([-1])
            batch_acc = np.mean(np.equal(pre, tar))

            tr_loss += batch_loss
            tr_acc += batch_acc

            # ========================= Log ======================
            step = epoch * imgdata_train.num_samples + batch_index
            # (1) Log the scalar values
            info = {'loss': batch_loss, 'accuracy': batch_acc}

            for tag, value in info.items():
                logger_train.scalar_summary(tag, value, step)

            if (batch_index + 1) % verbose_step == 0:
                print(
                    training_msg.format(time.asctime(), epoch + 1, batch_index + 1,
                                        tr_loss / verbose_step, tr_acc / verbose_step))
                tr_loss = 0.0
                tr_acc = 0.0

        if epoch % per_epoch_save_model == 0:
            dev_acc = dev(network, epoch=epoch)
            if dev_acc > best_model_acc:
                best_model_acc = dev_acc
                torch.save(network.state_dict(), model_path)


def dev(network, epoch=0):
    network.eval()
    sum_acc = 0.0
    for batch_index, (batch_imgs, batch_labels) in enumerate(test_set):
        batch_imgs = Variable(torch.FloatTensor(batch_imgs))
        batch_labels = Variable(torch.LongTensor(batch_labels))

        if use_gpu:
            batch_imgs = batch_imgs.cuda()
            batch_labels = batch_labels.cuda()

        pred = network(batch_imgs.unsqueeze(1))
        _, keys = torch.topk(pred, 1)
        pre = keys.cpu().data.numpy().T.tolist()[0]
        tar = batch_labels.cpu().data.numpy().reshape([-1])
        batch_acc = np.mean(np.equal(pre, tar))
        sum_acc += batch_acc * len(batch_labels)
        print("step_{:3d}_acc_{:.4f}".format(batch_index + 1, batch_acc))

        # ========================= Log ======================
        step = epoch * imgdata_test.num_samples + batch_index
        # (1) Log the scalar values
        info = {'accuracy': batch_acc}

        for tag, value in info.items():
            logger_dev.scalar_summary(tag, value, step)
    sum_acc = sum_acc / imgdata_test.num_samples
    print("sum_acc = %.4f" % sum_acc)
    return sum_acc


def test():
    pass


def predict(network, img_path):
    network.eval()

    img = np.array(Image.open(img_path))  # load the spectrogram as a 2-D array
    batch_imgs = Variable(torch.FloatTensor(img))
    if use_gpu:
        batch_imgs = batch_imgs.cuda()

    # add batch and channel dimensions: 1 * 1 * H * W
    pred = network(batch_imgs.unsqueeze(0).unsqueeze(0))
    _, keys = torch.topk(pred, 1)
    pre = keys.cpu().data.numpy().T.tolist()[0]
    print(pre)
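
# Typical invocations (a sketch; the flags are defined in the argparse block at the top,
# and the paths below are placeholders):
#   python -u main_topcoder.py --mode=train   --datalist_path=/movie/audio/topcoder --use_gpu=1 --use_pretrained=0
#   python -u main_topcoder.py --mode=dev     --datalist_path=/movie/audio/topcoder --use_pretrained=1
#   python -u main_topcoder.py --mode=predict --img_path=/path/to/spectrogram.png --use_pretrained=1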

if __name__ == '__main__':

    use_gpu = True if 1 == opt.use_gpu else False
    use_pretrained = True if opt.use_pretrained == 1 else False
    model_path = opt.model_path
    lang_num = 179
    rnn_hidden_size = 500
    rnn_num_layers = 1
    rnn_input_size = 416  # 448
    num_epochs = 100
    verbose_step = 10
    per_epoch_save_model = 1

    logger_train = Logger('./log_topcoder/logs_train')
    logger_dev = Logger('./log_topcoder/logs_dev')
    net = Network_CNN_RNN(rnn_input_size=rnn_input_size, rnn_hidden_size=rnn_hidden_size,
                          rnn_num_layers=rnn_num_layers, use_gpu=use_gpu)
    print(net)
    params = list(net.parameters())
    k = 0
    for i in params:
        l = 1
        for j in i.size():
            l *= j
        k = k + l
    print("all params num:" + str(k))

    if use_pretrained:
        net.load_state_dict(torch.load(model_path))

    training_msg = 'time_{}_epoch_{:2d}_step_{:3d}_TrLoss_{:.4f}_acc_{:.4f}'
    imgdata_train = batch_gen_imgdata(data_listfile=os.path.join(opt.datalist_path, "trainEqual.csv"))
    train_set = torch.utils.data.DataLoader(imgdata_train,
                                            batch_size=64,
                                            shuffle=True,
                                            num_workers=1,
                                            collate_fn=imgdata_train.collate_fn)

    imgdata_test = batch_gen_imgdata(data_listfile=os.path.join(opt.datalist_path, "valEqaul.csv"))
    test_set = torch.utils.data.DataLoader(imgdata_test,
                                           batch_size=64,
                                           shuffle=False,
                                           num_workers=1,
                                           collate_fn=imgdata_test.collate_fn)

    if opt.mode == "train":
        train(net)
    elif opt.mode == "dev":
        dev(net)
    elif opt.mode == "test":
        test()
    elif opt.mode == "predict":
        predict(net, opt.img_path)
--------------------------------------------------------------------------------