├── .gitignore ├── IWSLT ├── dev │ └── placeholder ├── en │ └── placeholder ├── log │ └── placeholder ├── model │ └── placeholder ├── test │ └── placeholder └── zh │ └── placeholder ├── README.md ├── mxwrap ├── __init__.py ├── attention │ ├── BasicAttention.py │ ├── ConcatAttention.py │ └── __init__.py ├── rnn │ ├── BaseCell.py │ ├── GRU.py │ ├── GRUv0.py │ ├── LSTM.py │ ├── SimpleRNN.py │ └── __init__.py └── seq2seq │ ├── __init__.py │ ├── decoder.py │ └── encoder.py ├── nmt ├── dict_gen.py ├── inference.py ├── inference_mask.py ├── main.py ├── masked_bucket_io.py ├── masked_bucket_io_new.py ├── tester.py ├── trainer.py ├── xcallback.py ├── xconfig.py ├── xmetric.py ├── xsymbol.py └── xutils.py └── trainingLog.txt /.gitignore: -------------------------------------------------------------------------------- 1 | dist 2 | *.egg-info 3 | build 4 | *.pyc 5 | *.paramas 6 | *.params -------------------------------------------------------------------------------- /IWSLT/dev/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/dev/placeholder -------------------------------------------------------------------------------- /IWSLT/en/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/en/placeholder -------------------------------------------------------------------------------- /IWSLT/log/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/log/placeholder -------------------------------------------------------------------------------- /IWSLT/model/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/model/placeholder -------------------------------------------------------------------------------- /IWSLT/test/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/test/placeholder -------------------------------------------------------------------------------- /IWSLT/zh/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/zh/placeholder -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MXNMT: MXNet based Neural Machine Translation 2 | 3 | This is an implementation of seq2seq with attention for neural machine translation with MXNet. 4 | 5 | ## Warning: 6 | This repo is no longer maintained. 7 | I recommend https://github.com/magic282/PyTorch_seq2seq 8 | 9 | ## Data 10 | 11 | The current code uses IWSLT 2009 Chinese-English corpus as training, development and test data. Please request this data set or **use other available parallel corpus**. Data statistics, 12 | 13 | | training | dev | test | 14 | |----------|-----|------| 15 | | 81819 | 446 | 504 | 16 | 17 | ## Attention 18 | * This code does work with the latest mxnet. 
I made a new version with improved performance in the [next](https://github.com/magic282/MXNMT/tree/next) branch, and it runs with mxnet 0.9.5. However, that branch is not complete since it lacks the decoding part. **I would really appreciate contributions to this branch.** Also, I ***strongly*** recommend using this commit (138344683e65c87af20250e3f4cdcc5a72ac3cc5) of mxnet because of [this issue](https://github.com/dmlc/mxnet/issues/5816). 19 | * The author cannot distribute this dataset. **Emails requesting this dataset from the author will not be answered.** 20 | 21 | ### Dev/Test Data Format 22 | The IWSLT 2009 Ch-En dev/test sets provide 7 references per source sentence, for example: 23 | ``` 24 | 在 找 给 家里 人 的 礼物 . 25 | 26 | i 'm searching for some gifts for my family . 27 | i want to find something for my family as presents . 28 | i 'm about to buy some presents for my family . 29 | i 'd like to buy my family something as a gift . 30 | i 'm looking for a gift for my family . 31 | i 'm looking for a present for my family . 32 | i need a gift for my family . 33 | 有 $number 块 钱 以下 的 茶 吗 ? |||| {1 ||| 1 ||| one thousand ||| $number ||| 一千} 34 | 35 | do you have any tea under one thousand yen ? 36 | i 'd like to take a look at some tea cheaper than one thousand yen . 37 | is there any tea less than one thousand yen here ? 38 | i 'm looking for some tea under one thousand yen . 39 | do you have any tea lower than one thousand yen ? 40 | do you have any tea less than one thousand yen ? 41 | i would like to buy some tea cheaper than one thousand yen . 42 | ``` 43 | 44 | ## Result 45 | 46 | In my tests, this code achieves a 44.18 BLEU score (with beam search) on the IWSLT dev set, without post-processing, after 53 iterations. Specifically, 47 | `1gram=72.65% 2gram=49.63% 3gram=37.62% 4gram=28.08% BP = 1.0000 BLEU = 0.4418` 48 | 49 | 50 | ## Known Issues 51 | * Compatibility: the current version requires Python 3, since handling Chinese text encodings under Python 2 is error-prone. 52 | * In the attention part, `h.dot(U)` should be pre-computed; however, it does not seem to work properly when pre-computed. 53 | * The BLEU evaluator, an external exe file that is not included, should be replaced by the NLTK evaluator in the future (a sketch follows below). 54 | * The model can be modified to reach about 50 BLEU on this data set. 
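For the evaluator item above, a minimal multi-reference corpus BLEU sketch using NLTK's `corpus_bleu` could look like the following. The file names and layout (one tokenized hypothesis per line, seven parallel reference files matching the Dev/Test Data Format section) are placeholders, not files shipped with this repo:
```
# Hedged sketch: multi-reference corpus BLEU via NLTK.
# 'decode_output.txt' and 'ref0.txt' ... 'ref6.txt' are hypothetical file names.
from nltk.translate.bleu_score import corpus_bleu

def load_tokenized(path):
    with open(path, encoding='utf-8') as f:
        return [line.strip().split() for line in f]

hypotheses = load_tokenized('decode_output.txt')                   # one translation per line
references = [load_tokenized('ref%d.txt' % i) for i in range(7)]   # 7 reference files
list_of_references = list(zip(*references))                        # group the 7 refs per hypothesis

print('BLEU = %.4f' % corpus_bleu(list_of_references, hypotheses))
```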
55 | -------------------------------------------------------------------------------- /mxwrap/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/__init__.py -------------------------------------------------------------------------------- /mxwrap/attention/BasicAttention.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | 4 | class BasicAttention: 5 | def __init__(self, batch_size, attend_dim, state_dim): 6 | self.e_weight_W = mx.sym.Variable('energy_W_weight', shape=(state_dim, state_dim)) 7 | self.e_weight_U = mx.sym.Variable('energy_U_weight', shape=(attend_dim, state_dim)) 8 | self.e_weight_v = mx.sym.Variable('energy_v_bias', shape=(state_dim, 1)) 9 | self.batch_size = batch_size 10 | self.attend_dim = attend_dim 11 | self.state_dim = state_dim 12 | self.pre_compute_buf = {} 13 | 14 | def getHdotU(self, attended, idx): 15 | if idx not in self.pre_compute_buf: 16 | h = attended[idx] # (batch, attend_dim) 17 | expr = mx.sym.dot(h, self.e_weight_U, name='_energy_1_{0:03d}'.format(idx)) 18 | self.pre_compute_buf[idx] = expr 19 | return self.pre_compute_buf[idx] 20 | 21 | def attend(self, attended, concat_attended, state, attend_masks, use_masking): 22 | ''' 23 | 24 | :param attended: list [seq_len, (batch, attend_dim)] 25 | :param concat_attended: (batch, seq_len, attend_dim ) 26 | :param state: (batch, state_dim) 27 | :param attend_masks: list [seq_len, (batch, 1)] 28 | :param use_masking: boolean 29 | :return: 30 | ''' 31 | seq_len = len(attended) 32 | energy_all = [] 33 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_0') 34 | for idx in range(seq_len): 35 | h = attended[idx] # (batch, attend_dim) 36 | energy = pre_compute + mx.sym.dot(h, self.e_weight_U, 37 | name='_energy_1_{0:03d}'.format(idx)) # (batch, state_dim) 38 | # energy = pre_compute + self.getHdotU(attended, idx) 39 | energy = mx.sym.Activation(energy, act_type="tanh", 40 | name='_energy_2_{0:03d}'.format(idx)) # (batch, state_dim) 41 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_3_{0:03d}'.format(idx)) # (batch, 1) 42 | if use_masking: 43 | energy = energy * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-10000.0) # (batch, 1) 44 | energy_all.append(energy) 45 | 46 | all_energy = mx.sym.Concat(*energy_all, dim=1, name='_all_energy_1') # (batch, seq_len) 47 | 48 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (batch, seq_len) 49 | alpha = mx.sym.Reshape(data=alpha, shape=(self.batch_size, seq_len, 1), 50 | name='_alpha_2') # (batch, seq_len, 1) 51 | 52 | weighted_attended = mx.sym.broadcast_mul(alpha, concat_attended, 53 | name='_weighted_attended_1') # (batch, seq_len, attend_dim) 54 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=1, 55 | name='_weighted_attended_2') # (batch, attend_dim) 56 | return alpha, weighted_attended 57 | -------------------------------------------------------------------------------- /mxwrap/attention/ConcatAttention.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | 4 | class ConcatAttention: 5 | def __init__(self, batch_size, attend_dim, state_dim): 6 | self.e_weight_W = mx.sym.Variable('energy_W_weight', shape=(state_dim, state_dim)) 7 | self.e_weight_U = mx.sym.Variable('energy_U_weight', shape=(attend_dim, state_dim)) 8 | self.e_weight_v = 
mx.sym.Variable('energy_v_bias', shape=(state_dim, 1)) 9 | self.batch_size = batch_size 10 | self.attend_dim = attend_dim 11 | self.state_dim = state_dim 12 | 13 | def pre_compute(self, attended): 14 | seq_len = len(attended) 15 | res = [None for i in range(seq_len)] 16 | for idx in range(seq_len): 17 | h = attended[idx] 18 | res[idx] = mx.sym.dot(h, self.e_weight_U, name='_energy_1_{0:03d}'.format(idx)) 19 | return res 20 | 21 | def attend(self, source_pre_computed, attended, concat_attended, state, attend_masks, use_masking): 22 | ''' 23 | 24 | :param attended: list [seq_len, (batch, attend_dim)] 25 | :param concat_attended: (batch, seq_len, attend_dim ) 26 | :param state: (batch, state_dim) 27 | :param attend_masks: list [seq_len, (batch, 1)] 28 | :param use_masking: boolean 29 | :return: 30 | ''' 31 | seq_len = len(attended) 32 | energy_all = [] 33 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_0') 34 | for idx in range(seq_len): 35 | energy = pre_compute + source_pre_computed[idx] 36 | energy = mx.sym.Activation(energy, act_type="tanh", 37 | name='_energy_2_{0:03d}'.format(idx)) # (batch, state_dim) 38 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_3_{0:03d}'.format(idx)) # (batch, 1) 39 | if use_masking: 40 | energy = energy * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-10000.0) # (batch, 1) 41 | energy_all.append(energy) 42 | 43 | all_energy = mx.sym.Concat(*energy_all, dim=1, name='_all_energy_1') # (batch, seq_len) 44 | 45 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (batch, seq_len) 46 | alpha = mx.sym.Reshape(data=alpha, shape=(self.batch_size, seq_len, 1), 47 | name='_alpha_2') # (batch, seq_len, 1) 48 | 49 | weighted_attended = mx.sym.broadcast_mul(alpha, concat_attended, 50 | name='_weighted_attended_1') # (batch, seq_len, attend_dim) 51 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=1, 52 | name='_weighted_attended_2') # (batch, attend_dim) 53 | return alpha, weighted_attended 54 | 55 | def pre_compute_fast(self, attended): 56 | seq_len = len(attended) 57 | buf = [] 58 | for s in attended: 59 | buf.append(mx.sym.expand_dims(data=s, axis=0)) 60 | time_major_concat = mx.sym.concat(*buf, dim=0, name='time_major_concat') # (seq, batch, dim) 61 | time_major_concat = mx.sym.dot(time_major_concat, self.e_weight_U, name='_expr01') # (seq, batch, dim) 62 | return time_major_concat 63 | 64 | def attend_fast(self, source_pre_computed, seq_len, state, attend_masks, use_masking): 65 | ''' 66 | 67 | :param source_pre_computed: 68 | :param seq_len: 69 | :param state: 70 | :param attend_masks: 71 | :param use_masking: 72 | :return: 73 | ''' 74 | energy_all = [] 75 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_00') # (batch, dim) 76 | pre_compute = mx.sym.expand_dims(data=pre_compute, axis=0, name='_energy_10') # (1, batch, dim) 77 | 78 | energy = mx.sym.broadcast_add(source_pre_computed, pre_compute, name='_b_add') 79 | energy = mx.sym.Activation(energy, act_type="tanh", name='_energy_20') # (seq, batch, dim) 80 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_30') # (seq, batch, 1) 81 | energy = mx.sym.reshape(energy, shape=(seq_len, -1)) # (seq, batch) 82 | energy = mx.sym.split(energy, axis=0, num_outputs=seq_len, squeeze_axis=True) # [seq, (batch,)] 83 | 84 | for idx in range(seq_len): 85 | this_e = energy[idx] 86 | if use_masking: 87 | this_e = this_e * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-1e6) # (batch,) 88 | this_e = mx.sym.expand_dims(data=this_e, axis=0, 
name='_this_e_10') 89 | energy_all.append(this_e) 90 | 91 | all_energy = mx.sym.Concat(*energy_all, dim=0, name='_all_energy_1') # (seq, batch) 92 | 93 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (seq, batch) 94 | alpha = mx.sym.expand_dims(data=alpha, axis=2, name='_alpha_2') # (seq, batch, 1) 95 | 96 | weighted_attended = mx.sym.broadcast_mul(source_pre_computed, alpha, 97 | name='_weighted_attended_1') # (seq, batch, attend_dim) 98 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=0, 99 | name='_weighted_attended_2') # (batch, attend_dim) 100 | return alpha, weighted_attended 101 | -------------------------------------------------------------------------------- /mxwrap/attention/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/attention/__init__.py -------------------------------------------------------------------------------- /mxwrap/rnn/BaseCell.py: -------------------------------------------------------------------------------- 1 | from abc import abstractmethod, ABCMeta 2 | 3 | 4 | class BaseCell(object): 5 | __metaclass__ = ABCMeta 6 | 7 | def __init__(self, *args, **kwargs): 8 | pass 9 | 10 | @abstractmethod 11 | def apply(self): 12 | raise NotImplementedError 13 | -------------------------------------------------------------------------------- /mxwrap/rnn/GRU.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | from .BaseCell import BaseCell 4 | 5 | 6 | class GRU(BaseCell): 7 | def __init__(self, name, num_hidden, **kwargs): 8 | super(BaseCell, self).__init__() 9 | self.name = name + '_GRU' 10 | self.num_hidden = num_hidden 11 | self.W = mx.sym.Variable("{0}_W_weight".format(self.name)) 12 | self.B = mx.sym.Variable("{0}_W_bias".format(self.name)) 13 | self.U = mx.sym.Variable("{0}_U_weight".format(self.name)) 14 | 15 | def apply(self, indata, prev_h, seqidx, mask=None): 16 | xW = mx.sym.FullyConnected(data=indata, 17 | weight=self.W, 18 | bias=self.B, 19 | num_hidden=self.num_hidden * 3, 20 | name="{0}_xW_{1}".format(self.name, seqidx) 21 | ) 22 | # hU = mx.sym.dot(prev_state.h, param.gru_U_weight) 23 | hU = mx.sym.FullyConnected(data=prev_h, 24 | weight=self.U, 25 | num_hidden=self.num_hidden * 3, 26 | no_bias=True, 27 | name="{0}_hU_{1}".format(self.name, seqidx) 28 | ) 29 | xW_s = mx.sym.split(num_outputs=3, data=xW) 30 | hU_s = mx.sym.split(num_outputs=3, data=hU) 31 | r = mx.sym.Activation(data=(xW_s[0] + hU_s[0]), act_type='sigmoid') 32 | z = mx.sym.Activation(data=(xW_s[1] + hU_s[1]), act_type='sigmoid') 33 | h1 = mx.sym.Activation(data=(xW_s[2] + r * hU_s[2]), act_type='tanh') 34 | 35 | h = (h1 - prev_h) * z + prev_h 36 | if mask: 37 | h = mx.sym.broadcast_mul(mask, h, name='bm_1') + mx.sym.broadcast_mul((1 - mask), prev_h, name='bm_2') 38 | return h 39 | -------------------------------------------------------------------------------- /mxwrap/rnn/GRUv0.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | GRUState = namedtuple("GRUState", ["h"]) 5 | GRUParam = namedtuple("GRUParam", ["gates_i2h_weight", "gates_i2h_bias", 6 | "gates_h2h_weight", "gates_h2h_bias", 7 | "trans_i2h_weight", "trans_i2h_bias", 8 | "trans_h2h_weight", "trans_h2h_bias"]) 9 | GRUModel = namedtuple("GRUModel", ["rnn_exec", 
"symbol", 10 | "init_states", "last_states", 11 | "seq_data", "seq_labels", "seq_outputs", 12 | "param_blocks"]) 13 | 14 | 15 | def gru(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.): 16 | """ 17 | GRU Cell symbol 18 | Reference: 19 | * Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural 20 | networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014). 21 | """ 22 | if dropout > 0.: 23 | indata = mx.sym.Dropout(data=indata, p=dropout) 24 | i2h = mx.sym.FullyConnected(data=indata, 25 | weight=param.gates_i2h_weight, 26 | bias=param.gates_i2h_bias, 27 | num_hidden=num_hidden * 2, 28 | name="t%d_l%d_gates_i2h" % (seqidx, layeridx)) 29 | h2h = mx.sym.FullyConnected(data=prev_state.h, 30 | weight=param.gates_h2h_weight, 31 | bias=param.gates_h2h_bias, 32 | num_hidden=num_hidden * 2, 33 | name="t%d_l%d_gates_h2h" % (seqidx, layeridx)) 34 | gates = i2h + h2h 35 | slice_gates = mx.sym.SliceChannel(gates, num_outputs=2, 36 | name="t%d_l%d_slice" % (seqidx, layeridx)) 37 | update_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid") 38 | reset_gate = mx.sym.Activation(slice_gates[1], act_type="sigmoid") 39 | # The transform part of GRU is a little magic 40 | htrans_i2h = mx.sym.FullyConnected(data=indata, 41 | weight=param.trans_i2h_weight, 42 | bias=param.trans_i2h_bias, 43 | num_hidden=num_hidden, 44 | name="t%d_l%d_trans_i2h" % (seqidx, layeridx)) 45 | h_after_reset = prev_state.h * reset_gate 46 | htrans_h2h = mx.sym.FullyConnected(data=h_after_reset, 47 | weight=param.trans_h2h_weight, 48 | bias=param.trans_h2h_bias, 49 | num_hidden=num_hidden, 50 | name="t%d_l%d_trans_i2h" % (seqidx, layeridx)) 51 | h_trans = htrans_i2h + htrans_h2h 52 | h_trans_active = mx.sym.Activation(h_trans, act_type="tanh") 53 | next_h = prev_state.h + update_gate * (h_trans_active - prev_state.h) 54 | return GRUState(h=next_h) 55 | -------------------------------------------------------------------------------- /mxwrap/rnn/LSTM.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | 5 | LSTMState = namedtuple("LSTMState", ["c", "h"]) 6 | LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias", 7 | "h2h_weight", "h2h_bias"]) 8 | LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol", 9 | "init_states", "last_states", 10 | "seq_data", "seq_labels", "seq_outputs", 11 | "param_blocks"]) 12 | 13 | 14 | def lstm(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.): 15 | """LSTM Cell symbol""" 16 | if dropout > 0.: 17 | indata = mx.sym.Dropout(data=indata, p=dropout) 18 | i2h = mx.sym.FullyConnected(data=indata, 19 | weight=param.i2h_weight, 20 | bias=param.i2h_bias, 21 | num_hidden=num_hidden * 4, 22 | name="t%d_l%d_i2h" % (seqidx, layeridx)) 23 | h2h = mx.sym.FullyConnected(data=prev_state.h, 24 | weight=param.h2h_weight, 25 | bias=param.h2h_bias, 26 | num_hidden=num_hidden * 4, 27 | name="t%d_l%d_h2h" % (seqidx, layeridx)) 28 | gates = i2h + h2h 29 | slice_gates = mx.sym.SliceChannel(gates, num_outputs=4, 30 | name="t%d_l%d_slice" % (seqidx, layeridx)) 31 | in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid") 32 | in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh") 33 | forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid") 34 | out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid") 35 | next_c = (forget_gate * prev_state.c) + (in_gate * in_transform) 36 | next_h = out_gate * 
mx.sym.Activation(next_c, act_type="tanh") 37 | return LSTMState(c=next_c, h=next_h) 38 | -------------------------------------------------------------------------------- /mxwrap/rnn/SimpleRNN.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | RNNState = namedtuple("RNNState", ["h"]) 5 | RNNParam = namedtuple("RNNParam", ["i2h_weight", "i2h_bias", 6 | "h2h_weight", "h2h_bias"]) 7 | RNNModel = namedtuple("RNNModel", ["rnn_exec", "symbol", 8 | "init_states", "last_states", 9 | "seq_data", "seq_labels", "seq_outputs", 10 | "param_blocks"]) 11 | 12 | 13 | def rnn(num_hidden, in_data, prev_state, param, seqidx, layeridx, dropout=0., batch_norm=False): 14 | if dropout > 0. : 15 | in_data = mx.sym.Dropout(data=in_data, p=dropout) 16 | i2h = mx.sym.FullyConnected(data=in_data, 17 | weight=param.i2h_weight, 18 | bias=param.i2h_bias, 19 | num_hidden=num_hidden, 20 | name="t%d_l%d_i2h" % (seqidx, layeridx)) 21 | h2h = mx.sym.FullyConnected(data=prev_state.h, 22 | weight=param.h2h_weight, 23 | bias=param.h2h_bias, 24 | num_hidden=num_hidden, 25 | name="t%d_l%d_h2h" % (seqidx, layeridx)) 26 | hidden = i2h + h2h 27 | 28 | hidden = mx.sym.Activation(data=hidden, act_type="tanh") 29 | if batch_norm == True: 30 | hidden = mx.sym.BatchNorm(data=hidden) 31 | return RNNState(h=hidden) -------------------------------------------------------------------------------- /mxwrap/rnn/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/rnn/__init__.py -------------------------------------------------------------------------------- /mxwrap/seq2seq/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/seq2seq/__init__.py -------------------------------------------------------------------------------- /mxwrap/seq2seq/decoder.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from ..rnn.GRU import GRU 4 | 5 | 6 | class GruAttentionDecoder(object): 7 | def __init__(self, use_masking, 8 | state_dim, 9 | input_dim, output_dim, 10 | vocab_size, embed_dim, 11 | dropout=0.0, num_of_layer=1, 12 | attention=None, **kwargs): 13 | self.use_masking = use_masking 14 | self.state_dim = state_dim 15 | self.input_dim = input_dim 16 | self.output_dim = output_dim 17 | self.vocab_size = vocab_size 18 | self.embed_dim = embed_dim 19 | self.dropout = dropout 20 | self.num_of_layer = num_of_layer 21 | self.attention = attention 22 | self.kwargs = kwargs 23 | self.gru = GRU('decode', self.state_dim) 24 | # declare variables 25 | self.embed_weight = mx.sym.Variable("target_embed_weight") 26 | self.cls_weight = mx.sym.Variable("target_cls_weight") 27 | self.cls_bias = mx.sym.Variable("target_cls_bias") 28 | self.init_weight = mx.sym.Variable("target_init_weight") 29 | self.init_bias = mx.sym.Variable("target_init_bias") 30 | 31 | def decode(self, target_len, encoded_for_init_state, encoded, encoded_mask): 32 | # last_encoded = encoded[-1] 33 | 34 | data = mx.sym.Variable('target') # target input data 35 | label = mx.sym.Variable('target_softmax_label') # target label data 36 | 37 | hidden_all = [None for _ in range(target_len)] 38 | context_all = [None for _ in range(target_len)] 39 | all_weights = [None for 
_ in range(target_len)] 40 | readout_all = [None for _ in range(target_len)] 41 | 42 | init_h = mx.sym.FullyConnected(data=encoded_for_init_state, num_hidden=self.state_dim * self.num_of_layer, 43 | weight=self.init_weight, bias=self.init_bias, name='init_fc') 44 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 45 | 46 | # embedding layer 47 | embed = mx.sym.Embedding(data=data, input_dim=self.vocab_size, 48 | weight=self.embed_weight, output_dim=self.embed_dim, name='target_embed') 49 | wordvec = mx.sym.split(data=embed, num_outputs=target_len, squeeze_axis=1) 50 | # split mask 51 | if self.use_masking: 52 | input_mask = mx.sym.Variable('target_mask') 53 | # masks = mx.sym.split(data=input_mask, num_outputs=target_len, name='sliced_target_mask') 54 | 55 | source_attention_pre_compute = self.attention.pre_compute_fast(encoded) 56 | 57 | for seq_idx in range(target_len): 58 | # mask = masks[seq_idx] if self.use_masking else None 59 | if seq_idx == 0: 60 | hidden_all[seq_idx] = init_h 61 | else: 62 | in_x = mx.sym.Concat(wordvec[seq_idx], context_all[seq_idx - 1]) 63 | # hidden_all[seq_idx] = self.gru.apply(in_x, hidden_all[seq_idx - 1], seq_idx, mask) 64 | hidden_all[seq_idx] = self.gru.apply(in_x, hidden_all[seq_idx - 1], seq_idx) 65 | 66 | weights, weighted_encoded = self.attention.attend_fast(source_pre_computed=source_attention_pre_compute, 67 | seq_len=len(encoded), 68 | state=hidden_all[seq_idx], 69 | attend_masks=encoded_mask, 70 | use_masking=True) 71 | context_all[seq_idx] = weighted_encoded 72 | all_weights[seq_idx] = weights 73 | readout_all[seq_idx] = mx.sym.Concat(wordvec[seq_idx], context_all[seq_idx], hidden_all[seq_idx]) 74 | 75 | hidden_concat = mx.sym.Concat(*readout_all, dim=0) 76 | pred = mx.sym.FullyConnected(data=hidden_concat, num_hidden=self.output_dim, 77 | weight=self.cls_weight, bias=self.cls_bias, name='target_pred') 78 | 79 | label = mx.sym.transpose(data=label) 80 | label = mx.sym.Reshape(data=label, shape=(-1,)) 81 | 82 | sm = mx.sym.SoftmaxOutput(data=pred, label=label, 83 | use_ignore=True, ignore_label=0, normalization='valid', 84 | name='target_softmax') 85 | return sm 86 | 87 | # loss = mx.sym.softmax_cross_entropy(pred, label) 88 | # loss = mx.sym.MakeLoss(loss) 89 | # return loss 90 | -------------------------------------------------------------------------------- /mxwrap/seq2seq/encoder.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from ..rnn.GRU import GRU 4 | 5 | 6 | class BiDirectionalGruEncoder(object): 7 | def __init__(self, use_masking, 8 | state_dim, 9 | input_dim, output_dim, 10 | vocab_size, embed_dim, 11 | dropout=0.0, num_of_layer=1): 12 | self.use_masking = use_masking 13 | self.state_dim = state_dim 14 | self.input_dim = input_dim 15 | self.output_dim = output_dim 16 | self.vocab_size = vocab_size 17 | self.embed_dim = embed_dim 18 | self.dropout = dropout 19 | self.num_of_layer = num_of_layer 20 | # declare variables 21 | self.forward_gru = GRU('forward_source', self.state_dim) 22 | self.backward_gru = GRU('backward_source', self.state_dim) 23 | self.embed_weight = mx.sym.Variable("source_embed_weight") 24 | 25 | def encode(self, seq_len): 26 | data = mx.sym.Variable('source') # input data, source 27 | 28 | # embedding layer 29 | embed = mx.sym.Embedding(data=data, input_dim=self.vocab_size, 30 | weight=self.embed_weight, output_dim=self.embed_dim, name='source_embed') 31 | wordvec = mx.sym.split(data=embed, num_outputs=seq_len, 
squeeze_axis=1) 32 | 33 | # split mask 34 | if self.use_masking: 35 | input_mask = mx.sym.Variable('source_mask') 36 | enc_masks = mx.sym.split(data=input_mask, num_outputs=seq_len, squeeze_axis='False', 37 | name='sliced_source_mask') 38 | att_masks = mx.sym.split(data=input_mask, num_outputs=seq_len, squeeze_axis='True', 39 | name='sliced_source_mask') 40 | 41 | forward_hidden = [None for i in range(seq_len)] 42 | backward_hidden = [None for i in range(seq_len)] 43 | bi_hidden = [] 44 | for seq_idx in range(seq_len): 45 | word = wordvec[seq_idx] 46 | mask = enc_masks[seq_idx] if self.use_masking else None 47 | if seq_idx == 0: 48 | forward_hidden[seq_idx] = mx.sym.Variable("forward_source_l0_init_h") 49 | else: 50 | forward_hidden[seq_idx] = self.forward_gru.apply(word, forward_hidden[seq_idx - 1], seq_idx, mask) 51 | 52 | for seq_idx in range(seq_len - 1, -1, -1): 53 | word = wordvec[seq_idx] 54 | mask = enc_masks[seq_idx] if self.use_masking else None 55 | if seq_idx == seq_len - 1: 56 | backward_hidden[seq_idx] = mx.sym.Variable("backward_source_l0_init_h") 57 | else: 58 | backward_hidden[seq_idx] = self.backward_gru.apply(word, backward_hidden[seq_idx + 1], seq_idx, mask) 59 | 60 | # for seq_idx in range(self.seq_len): 61 | for f, b in zip(forward_hidden, backward_hidden): 62 | bi = mx.sym.Concat(f, b, dim=1) 63 | bi_hidden.append(bi) 64 | 65 | if self.use_masking: 66 | return forward_hidden, backward_hidden, bi_hidden, att_masks 67 | else: 68 | return forward_hidden, backward_hidden, bi_hidden 69 | -------------------------------------------------------------------------------- /nmt/dict_gen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import pickle 5 | import gzip 6 | import bz2 7 | import logging 8 | import os 9 | 10 | import numpy 11 | import tables 12 | 13 | from collections import Counter 14 | from operator import add 15 | from numpy.lib.stride_tricks import as_strided 16 | 17 | parser = argparse.ArgumentParser( 18 | description=""" 19 | This takes a list of .txt or .txt.gz files and does word counting and 20 | creating a dictionary (potentially limited by size). It uses this 21 | dictionary to binarize the text into a numeric format (replacing OOV 22 | words with 1) and create n-grams of a fixed size (padding the sentence 23 | with 0 for EOS and BOS markers as necessary). The n-gram data can be 24 | split up in a training and validation set. 25 | 26 | The n-grams are saved to HDF5 format whereas the dictionary, word counts 27 | and binarized text are all pickled Python objects. 28 | """, formatter_class=argparse.RawTextHelpFormatter) 29 | parser.add_argument("input", type=argparse.FileType('r', encoding='utf-8'), nargs="+", 30 | help="The input files") 31 | parser.add_argument("-b", "--binarized-text", default='binarized_text.pkl', 32 | help="the name of the pickled binarized text file") 33 | parser.add_argument("-d", "--dictionary", default='vocab.pkl', 34 | help="the name of the pickled binarized text file") 35 | parser.add_argument("-n", "--ngram", type=int, metavar="N", 36 | help="create n-grams") 37 | parser.add_argument("-v", "--vocab", type=int, metavar="N", 38 | help="limit vocabulary size to this number, which must " 39 | "include BOS/EOS and OOV markers") 40 | parser.add_argument("-p", "--pickle", action="store_true", 41 | help="pickle the text as a list of lists of ints") 42 | parser.add_argument("-s", "--split", type=float, metavar="N", 43 | help="create a validation set. 
If >= 1 take this many " 44 | "samples for the validation set, if < 1, take this " 45 | "fraction of the samples") 46 | parser.add_argument("-o", "--overwrite", action="store_true", 47 | help="overwrite earlier created files, also forces the " 48 | "program not to reuse count files") 49 | parser.add_argument("-e", "--each", action="store_true", 50 | help="output files for each separate input file") 51 | parser.add_argument("-c", "--count", action="store_true", 52 | help="save the word counts") 53 | parser.add_argument("-t", "--char", action="store_true", 54 | help="character-level processing") 55 | parser.add_argument("-l", "--lowercase", action="store_true", 56 | help="lowercase") 57 | 58 | 59 | def open_files(): 60 | base_filenames = [] 61 | for i, input_file in enumerate(args.input): 62 | dirname, filename = os.path.split(input_file.name) 63 | if filename.split(os.extsep)[-1] == 'gz': 64 | base_filename = filename.rstrip('.gz') 65 | elif filename.split(os.extsep)[-1] == 'bz2': 66 | base_filename = filename.rstrip('.bz2') 67 | else: 68 | base_filename = filename 69 | if base_filename.split(os.extsep)[-1] == 'txt': 70 | base_filename = base_filename.rstrip('.txt') 71 | if filename.split(os.extsep)[-1] == 'gz': 72 | args.input[i] = gzip.GzipFile(input_file.name, input_file.mode, 73 | 9, input_file) 74 | elif filename.split(os.extsep)[-1] == 'bz2': 75 | args.input[i] = bz2.BZ2File(input_file.name, input_file.mode) 76 | base_filenames.append(base_filename) 77 | return base_filenames 78 | 79 | 80 | def safe_pickle(obj, filename): 81 | if os.path.isfile(filename) and not args.overwrite: 82 | logger.warning("Not saving %s, already exists." % (filename)) 83 | else: 84 | if os.path.isfile(filename): 85 | logger.info("Overwriting %s." % filename) 86 | else: 87 | logger.info("Saving to %s." % filename) 88 | with open(filename, 'wb') as f: 89 | pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL) 90 | 91 | 92 | def safe_hdf(array, name): 93 | if os.path.isfile(name + '.hdf') and not args.overwrite: 94 | logger.warning("Not saving %s, already exists." % (name + '.hdf')) 95 | else: 96 | if os.path.isfile(name + '.hdf'): 97 | logger.info("Overwriting %s." % (name + '.hdf')) 98 | else: 99 | logger.info("Saving to %s." 
% (name + '.hdf')) 100 | with tables.openFile(name + '.hdf', 'w') as f: 101 | atom = tables.Atom.from_dtype(array.dtype) 102 | filters = tables.Filters(complib='blosc', complevel=5) 103 | ds = f.createCArray(f.root, name.replace('.', ''), atom, 104 | array.shape, filters=filters) 105 | ds[:] = array 106 | 107 | 108 | def create_dictionary(): 109 | # Part I: Counting the words 110 | counters = [] 111 | sentence_counts = [] 112 | global_counter = Counter() 113 | 114 | for input_file, base_filename in zip(args.input, base_filenames): 115 | count_filename = base_filename + '.count.pkl' 116 | input_filename = os.path.basename(input_file.name) 117 | if os.path.isfile(count_filename) and not args.overwrite: 118 | logger.info("Loading word counts for %s from %s" 119 | % (input_filename, count_filename)) 120 | with open(count_filename, 'rb') as f: 121 | counter = pickle.load(f) 122 | sentence_count = sum([1 for line in input_file]) 123 | else: 124 | logger.info("Counting words in %s" % input_filename) 125 | counter = Counter() 126 | sentence_count = 0 127 | for line in input_file: 128 | if args.lowercase: 129 | line = line.lower() 130 | words = None 131 | if args.char: 132 | words = list(line.strip().decode('utf-8')) 133 | else: 134 | words = line.strip().split(' ') 135 | counter.update(words) 136 | global_counter.update(words) 137 | sentence_count += 1 138 | counters.append(counter) 139 | sentence_counts.append(sentence_count) 140 | logger.info("%d unique words in %d sentences with a total of %d words." 141 | % (len(counter), sentence_count, sum(counter.values()))) 142 | if args.each and args.count: 143 | safe_pickle(counter, count_filename) 144 | input_file.seek(0) 145 | 146 | # Part II: Combining the counts 147 | combined_counter = global_counter 148 | logger.info("Total: %d unique words in %d sentences with a total " 149 | "of %d words." 150 | % (len(combined_counter), sum(sentence_counts), 151 | sum(combined_counter.values()))) 152 | if args.count: 153 | safe_pickle(combined_counter, 'combined.count.pkl') 154 | 155 | # Part III: Creating the dictionary 156 | if args.vocab is not None: 157 | if args.vocab <= 2: 158 | logger.info('Building a dictionary with all unique words') 159 | args.vocab = len(combined_counter) + 2 160 | vocab_count = combined_counter.most_common(args.vocab - 2) 161 | logger.info("Creating dictionary of %s most common words, covering " 162 | "%2.1f%% of the text." 163 | % (args.vocab, 164 | 100.0 * sum([count for word, count in vocab_count]) / 165 | sum(combined_counter.values()))) 166 | else: 167 | logger.info("Creating dictionary of all words") 168 | vocab_count = counter.most_common() 169 | vocab = {} 170 | idx = 4 # place for pad 171 | ban_words = {'', '', ''} 172 | for word, count in vocab_count: 173 | if word not in ban_words: 174 | vocab[word] = idx 175 | idx += 1 176 | safe_pickle(vocab, args.dictionary) 177 | return combined_counter, sentence_counts, counters, vocab 178 | 179 | 180 | def binarize(): 181 | if args.ngram: 182 | assert numpy.iinfo(numpy.uint16).max > len(vocab) 183 | ngrams = numpy.empty((sum(combined_counter.values()) + 184 | sum(sentence_counts), args.ngram), 185 | dtype='uint16') 186 | binarized_corpora = [] 187 | total_ngram_count = 0 188 | for input_file, base_filename, sentence_count in \ 189 | zip(args.input, base_filenames, sentence_counts): 190 | input_filename = os.path.basename(input_file.name) 191 | logger.info("Binarizing %s." 
% (input_filename)) 192 | binarized_corpus = [] 193 | ngram_count = 0 194 | for sentence_count, sentence in enumerate(input_file): 195 | if args.lowercase: 196 | sentence = sentence.lower() 197 | if args.char: 198 | words = list(sentence.strip().decode('utf-8')) 199 | else: 200 | words = sentence.strip().split(' ') 201 | binarized_sentence = [vocab.get(word, 1) for word in words] 202 | binarized_corpus.append(binarized_sentence) 203 | if args.ngram: 204 | padded_sentence = numpy.asarray( 205 | [0] * (args.ngram - 1) + binarized_sentence + [0] 206 | ) 207 | ngrams[total_ngram_count + ngram_count: 208 | total_ngram_count + ngram_count + len(words) + 1] = \ 209 | as_strided( 210 | padded_sentence, 211 | shape=(len(words) + 1, args.ngram), 212 | strides=(padded_sentence.itemsize, 213 | padded_sentence.itemsize) 214 | ) 215 | ngram_count += len(words) + 1 216 | # endfor sentence in input_file 217 | # Output 218 | if args.each: 219 | if args.pickle: 220 | safe_pickle(binarized_corpus, base_filename + '.pkl') 221 | if args.ngram and args.split: 222 | if args.split >= 1: 223 | rows = int(args.split) 224 | else: 225 | rows = int(ngram_count * args.split) 226 | logger.info("Saving training set (%d samples) and validation " 227 | "set (%d samples)." 228 | % (ngram_count - rows, rows)) 229 | rows = numpy.random.choice(ngram_count, rows, replace=False) 230 | safe_hdf(ngrams[total_ngram_count + rows], 231 | base_filename + '_valid') 232 | safe_hdf( 233 | ngrams[total_ngram_count + numpy.setdiff1d( 234 | numpy.arange(ngram_count), 235 | rows, True 236 | )], base_filename + '_train' 237 | ) 238 | elif args.ngram: 239 | logger.info("Saving n-grams to %s." % (base_filename + '.hdf')) 240 | safe_hdf(ngrams, base_filename) 241 | binarized_corpora += binarized_corpus 242 | total_ngram_count += ngram_count 243 | input_file.seek(0) 244 | # endfor input_file in args.input 245 | if args.pickle: 246 | safe_pickle(binarized_corpora, args.binarized_text) 247 | if args.ngram and args.split: 248 | if args.split >= 1: 249 | rows = int(args.split) 250 | else: 251 | rows = int(total_ngram_count * args.split) 252 | logger.info("Saving training set (%d samples) and validation set (%d " 253 | "samples)." 
254 | % (total_ngram_count - rows, rows)) 255 | rows = numpy.random.choice(total_ngram_count, rows, replace=False) 256 | safe_hdf(ngrams[rows], 'combined_valid') 257 | safe_hdf(ngrams[numpy.setdiff1d(numpy.arange(total_ngram_count), 258 | rows, True)], 'combined_train') 259 | elif args.ngram: 260 | safe_hdf(ngrams, 'combined') 261 | 262 | 263 | if __name__ == "__main__": 264 | logging.basicConfig(level=logging.INFO) 265 | logger = logging.getLogger('preprocess') 266 | args = parser.parse_args() 267 | base_filenames = open_files() 268 | combined_counter, sentence_counts, counters, vocab = create_dictionary() 269 | if args.ngram or args.pickle: 270 | binarize() 271 | -------------------------------------------------------------------------------- /nmt/inference.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from mxwrap.rnn.LSTM import lstm, LSTMModel, LSTMParam, LSTMState 3 | from mxwrap.seq2seq.encoder import LstmEncoder, BiDirectionalLstmEncoder 4 | from mxwrap.attention.BasicAttention import BasicAttention 5 | 6 | 7 | def initial_state_symbol(t_num_lstm_layer, t_num_hidden): 8 | encoded = mx.sym.Variable("encoded") 9 | init_weight = mx.sym.Variable("target_init_weight") 10 | init_bias = mx.sym.Variable("target_init_bias") 11 | init_h = mx.sym.FullyConnected(data=encoded, num_hidden=t_num_hidden, 12 | weight=init_weight, bias=init_bias, name='init_fc') 13 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 14 | init_hs = mx.sym.SliceChannel(data=init_h, num_outputs=t_num_lstm_layer, squeeze_axis=1) 15 | return init_hs 16 | 17 | 18 | class BiS2SInferenceModel(object): 19 | def __init__(self, 20 | s_num_lstm_layer, s_seq_len, s_vocab_size, s_num_hidden, s_num_embed, s_dropout, 21 | t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, t_dropout, 22 | arg_params, 23 | use_masking, 24 | ctx=mx.cpu(), 25 | batch_size=1): 26 | self.encode_sym = bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, 27 | s_vocab_size, s_num_hidden, s_num_embed, 28 | s_dropout) 29 | attention = BasicAttention(batch_size=batch_size, seq_len=s_seq_len, attend_dim=s_num_hidden * 2, 30 | state_dim=t_num_hidden) 31 | self.decode_sym = lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, 32 | t_num_embed, 33 | t_num_label, t_dropout, attention, s_seq_len) 34 | self.init_state_sym = initial_state_symbol(t_num_lstm_layer, t_num_hidden) 35 | 36 | # initialize states for LSTM 37 | forward_source_init_c = [('forward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 38 | range(s_num_lstm_layer)] 39 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 40 | range(s_num_lstm_layer)] 41 | backward_source_init_c = [('backward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 42 | range(s_num_lstm_layer)] 43 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 44 | range(s_num_lstm_layer)] 45 | source_init_states = forward_source_init_c + forward_source_init_h + backward_source_init_c + backward_source_init_h 46 | 47 | target_init_c = [('target_l%d_init_c' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 48 | target_init_h = [('target_l%d_init_h' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 49 | target_init_states = target_init_c + target_init_h 50 | 51 | encode_data_shape = [("source", (batch_size, s_seq_len))] 52 | 
decode_data_shape = [("target", (batch_size,))] 53 | attend_state_shapes = [("attended", (batch_size, s_num_hidden * 2 * s_seq_len))] 54 | init_state_shapes = [("encoded", (batch_size, s_num_hidden * 2))] 55 | 56 | encode_input_shapes = dict(source_init_states + encode_data_shape) 57 | decode_input_shapes = dict(target_init_states + decode_data_shape + attend_state_shapes) 58 | init_input_shapes = dict(init_state_shapes) 59 | self.encode_executor = self.encode_sym.simple_bind(ctx=ctx, grad_req='null', **encode_input_shapes) 60 | self.decode_executor = self.decode_sym.simple_bind(ctx=ctx, grad_req='null', **decode_input_shapes) 61 | self.init_state_executor = self.init_state_sym.simple_bind(ctx=ctx, grad_req='null', **init_input_shapes) 62 | 63 | for key in self.encode_executor.arg_dict.keys(): 64 | if key in arg_params: 65 | arg_params[key].copyto(self.encode_executor.arg_dict[key]) 66 | for key in self.decode_executor.arg_dict.keys(): 67 | if key in arg_params: 68 | arg_params[key].copyto(self.decode_executor.arg_dict[key]) 69 | for key in self.init_state_executor.arg_dict.keys(): 70 | if key in arg_params: 71 | arg_params[key].copyto(self.init_state_executor.arg_dict[key]) 72 | 73 | encode_state_name = [] 74 | decode_state_name = [] 75 | for i in range(s_num_lstm_layer): 76 | encode_state_name.append("forward_source_l%d_init_c" % i) 77 | encode_state_name.append("forward_source_l%d_init_h" % i) 78 | encode_state_name.append("backward_source_l%d_init_c" % i) 79 | encode_state_name.append("backward_source_l%d_init_h" % i) 80 | for i in range(t_num_lstm_layer): 81 | decode_state_name.append("target_l%d_init_c" % i) 82 | decode_state_name.append("target_l%d_init_h" % i) 83 | 84 | self.encode_states_dict = dict(zip(encode_state_name, self.encode_executor.outputs)) 85 | self.decode_states_dict = dict(zip(decode_state_name, self.decode_executor.outputs[1:])) 86 | 87 | def encode(self, input_data): 88 | for key in self.encode_states_dict.keys(): 89 | self.encode_executor.arg_dict[key][:] = 0. 
90 | input_data.copyto(self.encode_executor.arg_dict["source"]) 91 | self.encode_executor.forward() 92 | last_encoded = self.encode_executor.outputs[0] 93 | all_encoded = self.encode_executor.outputs[1] 94 | return last_encoded, all_encoded 95 | 96 | def decode_forward(self, last_encoded, all_encoded, input_data, new_seq): 97 | if new_seq: 98 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 99 | self.init_state_executor.forward() 100 | init_hs = self.init_state_executor.outputs[0] 101 | init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 102 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 103 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 104 | input_data.copyto(self.decode_executor.arg_dict["target"]) 105 | self.decode_executor.forward() 106 | 107 | prob = self.decode_executor.outputs[0].asnumpy() 108 | 109 | self.decode_executor.outputs[1].copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 110 | self.decode_executor.outputs[2].copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 111 | 112 | attention_weights = self.decode_executor.outputs[3].asnumpy() 113 | 114 | return prob, attention_weights 115 | 116 | def decode_forward_with_state(self, last_encoded, all_encoded, input_data, state, new_seq): 117 | if new_seq: 118 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 119 | self.init_state_executor.forward() 120 | init_hs = self.init_state_executor.outputs[0] 121 | # init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 122 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 123 | state = LSTMState(c=self.decode_executor.arg_dict["target_l0_init_c"], h=init_hs) 124 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 125 | input_data.copyto(self.decode_executor.arg_dict["target"]) 126 | state.c.copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 127 | state.h.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 128 | self.decode_executor.forward() 129 | 130 | prob = self.decode_executor.outputs[0] 131 | 132 | c = self.decode_executor.outputs[1] 133 | h = self.decode_executor.outputs[2] 134 | 135 | attention_weights = self.decode_executor.outputs[3] 136 | 137 | return prob, attention_weights, LSTMState(c=c, h=h) 138 | 139 | 140 | def bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, s_vocab_size, s_num_hidden, s_num_embed, 141 | s_dropout): 142 | encoder = BiDirectionalLstmEncoder(seq_len=s_seq_len, use_masking=use_masking, state_dim=s_num_hidden, 143 | input_dim=s_vocab_size, 144 | output_dim=0, 145 | vocab_size=s_vocab_size, embed_dim=s_num_embed, 146 | dropout=s_dropout, num_of_layer=s_num_lstm_layer) 147 | forward_hidden_all, backward_hidden_all, bi_hidden_all = encoder.encode() 148 | concat_encoded = mx.sym.Concat(*bi_hidden_all, dim=1) 149 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 150 | name='encoded_for_init_state') 151 | return mx.sym.Group([encoded_for_init_state, concat_encoded]) 152 | 153 | 154 | def lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, 155 | t_dropout, 156 | attention, source_seq_len): 157 | data = mx.sym.Variable("target") 158 | seqidx = 0 159 | 160 | embed_weight = mx.sym.Variable("target_embed_weight") 161 | cls_weight = mx.sym.Variable("target_cls_weight") 162 | cls_bias = mx.sym.Variable("target_cls_bias") 163 | 164 | input_weight = mx.sym.Variable("target_input_weight") 165 | # input_bias = 
mx.sym.Variable("target_input_bias") 166 | 167 | param_cells = [] 168 | last_states = [] 169 | 170 | for i in range(t_num_lstm_layer): 171 | param_cells.append(LSTMParam(i2h_weight=mx.sym.Variable("target_l%d_i2h_weight" % i), 172 | i2h_bias=mx.sym.Variable("target_l%d_i2h_bias" % i), 173 | h2h_weight=mx.sym.Variable("target_l%d_h2h_weight" % i), 174 | h2h_bias=mx.sym.Variable("target_l%d_h2h_bias" % i))) 175 | state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 176 | h=mx.sym.Variable("target_l%d_init_h" % i)) 177 | # state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 178 | # h=init_hs[i]) 179 | last_states.append(state) 180 | assert (len(last_states) == t_num_lstm_layer) 181 | 182 | hidden = mx.sym.Embedding(data=data, 183 | input_dim=t_vocab_size + 1, 184 | output_dim=t_num_embed, 185 | weight=embed_weight, 186 | name="target_embed") 187 | 188 | all_encoded = mx.sym.Variable("attended") 189 | encoded = mx.sym.SliceChannel(data=all_encoded, axis=1, num_outputs=source_seq_len) 190 | weights, weighted_encoded = attention.attend(attended=encoded, concat_attended=all_encoded, 191 | state=last_states[0].h, 192 | attend_masks=None, 193 | use_masking=False) 194 | con = mx.sym.Concat(hidden, weighted_encoded) 195 | hidden = mx.sym.FullyConnected(data=con, num_hidden=t_num_embed, 196 | weight=input_weight, no_bias=True, name='input_fc') 197 | # hidden = mx.sym.Activation(data=hidden, act_type='tanh', name='input_act') 198 | 199 | # stack LSTM 200 | for i in range(t_num_lstm_layer): 201 | if i == 0: 202 | dp = 0. 203 | else: 204 | dp = t_dropout 205 | next_state = lstm(t_num_hidden, indata=hidden, 206 | prev_state=last_states[i], 207 | param=param_cells[i], 208 | seqidx=seqidx, layeridx=i, dropout=dp) 209 | hidden = next_state.h 210 | last_states[i] = next_state 211 | 212 | fc = mx.sym.FullyConnected(data=hidden, num_hidden=t_num_label, 213 | weight=cls_weight, bias=cls_bias, name='target_pred') 214 | sm = mx.sym.SoftmaxOutput(data=fc, name='target_softmax') 215 | output = [sm] 216 | for state in last_states: 217 | output.append(state.c) 218 | output.append(state.h) 219 | output.append(weights) 220 | return mx.sym.Group(output) 221 | -------------------------------------------------------------------------------- /nmt/inference_mask.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from mxwrap.rnn.LSTM import lstm, LSTMModel, LSTMParam, LSTMState 3 | from mxwrap.seq2seq.encoder import LstmEncoder, BiDirectionalLstmEncoder 4 | from mxwrap.attention.BasicAttention import BasicAttention 5 | 6 | 7 | def initial_state_symbol(t_num_lstm_layer, t_num_hidden): 8 | encoded = mx.sym.Variable("encoded") 9 | init_weight = mx.sym.Variable("target_init_weight") 10 | init_bias = mx.sym.Variable("target_init_bias") 11 | init_h = mx.sym.FullyConnected(data=encoded, num_hidden=t_num_hidden, 12 | weight=init_weight, bias=init_bias, name='init_fc') 13 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 14 | init_hs = mx.sym.SliceChannel(data=init_h, num_outputs=t_num_lstm_layer, squeeze_axis=1) 15 | return init_hs 16 | 17 | 18 | class BiS2SInferenceModel_mask(object): 19 | def __init__(self, 20 | s_num_lstm_layer, s_seq_len, s_vocab_size, s_num_hidden, s_num_embed, s_dropout, 21 | t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, t_dropout, 22 | arg_params, 23 | use_masking, ctx=mx.cpu(), 24 | batch_size=1): 25 | self.encode_sym = bidirectional_encode_symbol(s_num_lstm_layer, 
s_seq_len, use_masking, 26 | s_vocab_size, s_num_hidden, s_num_embed, 27 | s_dropout) 28 | attention = BasicAttention(batch_size=batch_size, seq_len=s_seq_len, attend_dim=s_num_hidden * 2, 29 | state_dim=t_num_hidden) 30 | self.decode_sym = lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, 31 | t_num_embed, 32 | t_num_label, t_dropout, attention, s_seq_len, batch_size) 33 | self.init_state_sym = initial_state_symbol(t_num_lstm_layer, t_num_hidden) 34 | 35 | # initialize states for LSTM 36 | forward_source_init_c = [('forward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 37 | range(s_num_lstm_layer)] 38 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 39 | range(s_num_lstm_layer)] 40 | backward_source_init_c = [('backward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 41 | range(s_num_lstm_layer)] 42 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 43 | range(s_num_lstm_layer)] 44 | source_init_states = forward_source_init_c + forward_source_init_h + backward_source_init_c + backward_source_init_h 45 | 46 | target_init_c = [('target_l%d_init_c' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 47 | target_init_h = [('target_l%d_init_h' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 48 | target_init_states = target_init_c + target_init_h 49 | 50 | encode_data_shape = [("source", (batch_size, s_seq_len))] 51 | mask_data_shape = [("source_mask", (batch_size, s_seq_len))] 52 | decode_data_shape = [("target", (batch_size,))] 53 | attend_state_shapes = [("attended", (batch_size, s_num_hidden * 2 * s_seq_len))] 54 | attend_mask = [("encoded_mask", (batch_size, s_seq_len))] 55 | init_state_shapes = [("encoded", (batch_size, s_num_hidden * 2))] 56 | 57 | encode_input_shapes = dict(source_init_states + encode_data_shape + mask_data_shape) 58 | decode_input_shapes = dict(target_init_states + decode_data_shape + attend_state_shapes + attend_mask) 59 | init_input_shapes = dict(init_state_shapes) 60 | self.encode_executor = self.encode_sym.simple_bind(ctx=ctx, grad_req='null', **encode_input_shapes) 61 | self.decode_executor = self.decode_sym.simple_bind(ctx=ctx, grad_req='null', **decode_input_shapes) 62 | self.init_state_executor = self.init_state_sym.simple_bind(ctx=ctx, grad_req='null', **init_input_shapes) 63 | 64 | for key in self.encode_executor.arg_dict.keys(): 65 | if key in arg_params: 66 | arg_params[key].copyto(self.encode_executor.arg_dict[key]) 67 | for key in self.decode_executor.arg_dict.keys(): 68 | if key in arg_params: 69 | arg_params[key].copyto(self.decode_executor.arg_dict[key]) 70 | for key in self.init_state_executor.arg_dict.keys(): 71 | if key in arg_params: 72 | arg_params[key].copyto(self.init_state_executor.arg_dict[key]) 73 | 74 | encode_state_name = [] 75 | decode_state_name = [] 76 | for i in range(s_num_lstm_layer): 77 | encode_state_name.append("forward_source_l%d_init_c" % i) 78 | encode_state_name.append("forward_source_l%d_init_h" % i) 79 | encode_state_name.append("backward_source_l%d_init_c" % i) 80 | encode_state_name.append("backward_source_l%d_init_h" % i) 81 | for i in range(t_num_lstm_layer): 82 | decode_state_name.append("target_l%d_init_c" % i) 83 | decode_state_name.append("target_l%d_init_h" % i) 84 | 85 | self.encode_states_dict = dict(zip(encode_state_name, self.encode_executor.outputs)) 86 | self.decode_states_dict = dict(zip(decode_state_name, 
self.decode_executor.outputs[1:])) 87 | 88 | def encode(self, input_data, input_mask): 89 | for key in self.encode_states_dict.keys(): 90 | self.encode_executor.arg_dict[key][:] = 0. 91 | input_data.copyto(self.encode_executor.arg_dict["source"]) 92 | input_mask.copyto(self.encode_executor.arg_dict["source_mask"]) 93 | self.encode_executor.forward() 94 | last_encoded = self.encode_executor.outputs[0] 95 | all_encoded = self.encode_executor.outputs[1] 96 | return last_encoded, all_encoded 97 | 98 | def decode_forward(self, last_encoded, all_encoded, mask, input_data, new_seq): 99 | if new_seq: 100 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 101 | self.init_state_executor.forward() 102 | init_hs = self.init_state_executor.outputs[0] 103 | init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 104 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 105 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 106 | mask.copyto(self.decode_executor.arg_dict["encoded_mask"]) 107 | input_data.copyto(self.decode_executor.arg_dict["target"]) 108 | self.decode_executor.forward() 109 | 110 | prob = self.decode_executor.outputs[0].asnumpy() 111 | 112 | self.decode_executor.outputs[1].copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 113 | self.decode_executor.outputs[2].copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 114 | 115 | attention_weights = self.decode_executor.outputs[3].asnumpy() 116 | 117 | return prob, attention_weights 118 | 119 | def decode_forward_with_state(self, last_encoded, all_encoded, mask, input_data, state, new_seq): 120 | if new_seq: 121 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 122 | self.init_state_executor.forward() 123 | init_hs = self.init_state_executor.outputs[0] 124 | # init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 125 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 126 | state = LSTMState(c=self.decode_executor.arg_dict["target_l0_init_c"], h=init_hs) 127 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 128 | mask.copyto(self.decode_executor.arg_dict["encoded_mask"]) 129 | input_data.copyto(self.decode_executor.arg_dict["target"]) 130 | state.c.copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 131 | state.h.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 132 | self.decode_executor.forward() 133 | 134 | prob = self.decode_executor.outputs[0] 135 | 136 | c = self.decode_executor.outputs[1] 137 | h = self.decode_executor.outputs[2] 138 | 139 | attention_weights = self.decode_executor.outputs[3] 140 | 141 | return prob, attention_weights, LSTMState(c=c, h=h) 142 | 143 | 144 | def bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, s_vocab_size, s_num_hidden, s_num_embed, 145 | s_dropout): 146 | encoder = BiDirectionalLstmEncoder(seq_len=s_seq_len, use_masking=use_masking, state_dim=s_num_hidden, 147 | input_dim=s_vocab_size, 148 | output_dim=0, 149 | vocab_size=s_vocab_size, embed_dim=s_num_embed, 150 | dropout=s_dropout, num_of_layer=s_num_lstm_layer) 151 | forward_hidden_all, backward_hidden_all, bi_hidden_all, masks_sliced = encoder.encode() 152 | concat_encoded = mx.sym.Concat(*bi_hidden_all, dim=1) 153 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 154 | name='encoded_for_init_state') 155 | return mx.sym.Group([encoded_for_init_state, concat_encoded]) 156 | 157 | 158 | def lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, 
t_num_hidden, t_num_embed, t_num_label, 159 | t_dropout, 160 | attention, source_seq_len, batch_size): 161 | data = mx.sym.Variable("target") 162 | encoded_mask = mx.sym.Variable("encoded_mask") 163 | encoded_mask = mx.sym.SliceChannel(data=encoded_mask, num_outputs=source_seq_len, name='sliced_source_mask') 164 | seqidx = 0 165 | 166 | embed_weight = mx.sym.Variable("target_embed_weight") 167 | cls_weight = mx.sym.Variable("target_cls_weight") 168 | cls_bias = mx.sym.Variable("target_cls_bias") 169 | 170 | input_weight = mx.sym.Variable("target_input_weight") 171 | # input_bias = mx.sym.Variable("target_input_bias") 172 | 173 | param_cells = [] 174 | last_states = [] 175 | 176 | for i in range(t_num_lstm_layer): 177 | param_cells.append(LSTMParam(i2h_weight=mx.sym.Variable("target_l%d_i2h_weight" % i), 178 | i2h_bias=mx.sym.Variable("target_l%d_i2h_bias" % i), 179 | h2h_weight=mx.sym.Variable("target_l%d_h2h_weight" % i), 180 | h2h_bias=mx.sym.Variable("target_l%d_h2h_bias" % i))) 181 | state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 182 | h=mx.sym.Variable("target_l%d_init_h" % i)) 183 | # state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 184 | # h=init_hs[i]) 185 | last_states.append(state) 186 | assert (len(last_states) == t_num_lstm_layer) 187 | 188 | hidden = mx.sym.Embedding(data=data, 189 | input_dim=t_vocab_size + 1, 190 | output_dim=t_num_embed, 191 | weight=embed_weight, 192 | name="target_embed") 193 | 194 | all_encoded = mx.sym.Variable("attended") 195 | all_attended = mx.sym.Reshape(data=all_encoded, shape=(batch_size, source_seq_len, -1), 196 | name='_reshape_concat_attended') 197 | encoded = mx.sym.SliceChannel(data=all_encoded, axis=1, num_outputs=source_seq_len) 198 | weights, weighted_encoded = attention.attend(attended=encoded, concat_attended=all_attended, 199 | state=last_states[0].h, 200 | attend_masks=encoded_mask, 201 | use_masking=True) 202 | con = mx.sym.Concat(hidden, weighted_encoded) 203 | hidden = mx.sym.FullyConnected(data=con, num_hidden=t_num_embed, 204 | weight=input_weight, no_bias=True, name='input_fc') 205 | # hidden = mx.sym.Activation(data=hidden, act_type='tanh', name='input_act') 206 | 207 | # stack LSTM 208 | for i in range(t_num_lstm_layer): 209 | if i == 0: 210 | dp = 0. 
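        # dropout is skipped for the bottom decoder layer; the stacked layers above it use t_dropout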
211 | else: 212 | dp = t_dropout 213 | next_state = lstm(t_num_hidden, indata=hidden, 214 | prev_state=last_states[i], 215 | param=param_cells[i], 216 | seqidx=seqidx, layeridx=i, dropout=dp) 217 | hidden = next_state.h 218 | last_states[i] = next_state 219 | 220 | fc = mx.sym.FullyConnected(data=hidden, num_hidden=t_num_label, 221 | weight=cls_weight, bias=cls_bias, name='target_pred') 222 | sm = mx.sym.SoftmaxOutput(data=fc, name='target_softmax') 223 | output = [sm] 224 | for state in last_states: 225 | output.append(state.c) 226 | output.append(state.h) 227 | output.append(weights) 228 | return mx.sym.Group(output) 229 | -------------------------------------------------------------------------------- /nmt/main.py: -------------------------------------------------------------------------------- 1 | """ 2 | Encoder-Decoder with attention for neural machine translation 3 | 4 | """ 5 | 6 | import sys 7 | 8 | if sys.version_info[0] < 3: 9 | raise Exception("Must be using Python 3") 10 | 11 | import argparse 12 | import logging 13 | import time 14 | import os 15 | import mxnet as mx 16 | import numpy as np 17 | import xconfig 18 | 19 | np.random.seed(65536) # make it predictable 20 | mx.random.seed(65535) # 2333 21 | 22 | sys.path.append('.') 23 | sys.path.append('..') 24 | 25 | logging.basicConfig(format='%(asctime)s %(levelname)s:%(name)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S') 26 | file_handler = logging.FileHandler(os.path.join(xconfig.log_root, time.strftime("%Y%m%d-%H%M%S") + '.log')) 27 | file_handler.setFormatter(logging.Formatter('%(asctime)s [%(levelname)-5.5s:%(name)s] %(message)s')) 28 | logging.root.addHandler(file_handler) 29 | # logger = logging.getLogger(__name__) 30 | 31 | # Get the arguments 32 | parser = argparse.ArgumentParser() 33 | parser.add_argument( 34 | "--mode", choices=["train", "test"], default='train', 35 | help="The mode to run. In the `train` mode a model is trained." 36 | " In the `test` mode a trained model is used to translate") 37 | args = parser.parse_args() 38 | 39 | logging.info(xconfig.get_config_str()) 40 | 41 | if __name__ == "__main__": 42 | if args.mode == 'train': 43 | logging.info('In train mode.') 44 | from trainer import train 45 | 46 | train() 47 | elif args.mode == 'test': 48 | logging.info('In test mode.') 49 | from tester import test 50 | 51 | test() 52 | -------------------------------------------------------------------------------- /nmt/masked_bucket_io.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme 2 | # pylint: disable=superfluous-parens, no-member, invalid-name 3 | import sys 4 | 5 | sys.path.insert(0, "../../python") 6 | import numpy as np 7 | import mxnet as mx 8 | from mxnet.io import DataBatch 9 | 10 | 11 | # The interface of a data iter that works for bucketing 12 | # 13 | # DataIter 14 | # - default_bucket_key: the bucket key for the default symbol. 15 | # 16 | # DataBatch 17 | # - provide_data: same as DataIter, but specific to this batch 18 | # - provide_label: same as DataIter, but specific to this batch 19 | # - bucket_key: the key for the bucket that should be used for this batch 20 | 21 | def default_read_content(path): 22 | with open(path) as ins: 23 | content = ins.read() 24 | content = content.replace('\n', ' ').replace('. 
', ' ') 25 | return content 26 | 27 | 28 | def default_build_vocab(path): 29 | content = default_read_content(path) 30 | content = content.split(' ') 31 | the_vocab = {} 32 | idx = 1 # 0 is left for zero-padding 33 | the_vocab[' '] = 0 # put a dummy element here so that len(vocab) is correct 34 | for word in content: 35 | if len(word) == 0: 36 | continue 37 | if not word in the_vocab: 38 | the_vocab[word] = idx 39 | idx += 1 40 | return the_vocab 41 | 42 | 43 | def default_text2id(sentence, the_vocab): 44 | words = sentence.split(' ') 45 | words = [the_vocab[w] for w in words if len(w) > 0] 46 | return words 47 | 48 | 49 | def default_gen_buckets(sentences, batch_size, the_vocab): 50 | len_dict = {} 51 | max_len = -1 52 | for sentence in sentences: 53 | words = default_text2id(sentence, the_vocab) 54 | if len(words) == 0: 55 | continue 56 | if len(words) > max_len: 57 | max_len = len(words) 58 | if len(words) in len_dict: 59 | len_dict[len(words)] += 1 60 | else: 61 | len_dict[len(words)] = 1 62 | print(len_dict) 63 | 64 | tl = 0 65 | buckets = [] 66 | for l, n in len_dict.items(): # TODO: There are better heuristic ways to do this 67 | if n + tl >= batch_size: 68 | buckets.append(l) 69 | tl = 0 70 | else: 71 | tl += n 72 | if tl > 0: 73 | buckets.append(max_len) 74 | return buckets 75 | 76 | 77 | class SimpleBatch(object): 78 | def __init__(self, data_names, data, label_names, label, bucket_key): 79 | self.data = data 80 | self.label = label 81 | self.data_names = data_names 82 | self.label_names = label_names 83 | self.bucket_key = bucket_key 84 | 85 | self.pad = 0 86 | self.index = None # TODO: what is index? 87 | 88 | @property 89 | def provide_data(self): 90 | return [(n, x.shape) for n, x in zip(self.data_names, self.data)] 91 | 92 | @property 93 | def provide_label(self): 94 | return [(n, x.shape) for n, x in zip(self.label_names, self.label)] 95 | 96 | 97 | class DummyIter(mx.io.DataIter): 98 | '''A dummy iterator that always return the same batch, used for speed testing''' 99 | 100 | def __init__(self, real_iter): 101 | super(DummyIter, self).__init__() 102 | self.real_iter = real_iter 103 | self.provide_data = real_iter.provide_data 104 | self.provide_label = real_iter.provide_label 105 | self.batch_size = real_iter.batch_size 106 | 107 | for batch in real_iter: 108 | self.the_batch = batch 109 | break 110 | 111 | def __iter__(self): 112 | return self 113 | 114 | def next(self): 115 | return self.the_batch 116 | 117 | 118 | class MaskedBucketSentenceIter(mx.io.DataIter): 119 | def __init__(self, source_path, target_path, source_vocab, target_vocab, 120 | buckets, batch_size, 121 | source_init_states, target_init_states, 122 | source_data_name='source', source_mask_name='source_mask', 123 | target_data_name='target', target_mask_name='target_mask', 124 | label_name='target_softmax_label', 125 | seperate_char=' ', text2id=None, read_content=None, max_read_sample=sys.maxsize): 126 | super(MaskedBucketSentenceIter, self).__init__() 127 | 128 | if text2id is None: 129 | self.text2id = default_text2id 130 | else: 131 | self.text2id = text2id 132 | if read_content is None: 133 | self.read_content = default_read_content 134 | else: 135 | self.read_content = read_content 136 | source_sentences = self.read_content(source_path, max_read_sample) 137 | # source_sentences = source_content.split(seperate_char) 138 | 139 | target_sentences = self.read_content(target_path, max_read_sample) 140 | # target_sentences = target_content.split(seperate_char) 141 | 142 | assert len(source_sentences) == 
len(target_sentences) 143 | 144 | self.source_vocab_size = len(source_vocab) 145 | self.target_vocab_size = len(target_vocab) 146 | self.source_data_name = source_data_name 147 | self.target_data_name = target_data_name 148 | self.label_name = label_name 149 | 150 | self.source_mask_name = source_mask_name 151 | self.target_mask_name = target_mask_name 152 | 153 | buckets.sort() 154 | self.buckets = buckets 155 | self.source_data = [[] for _ in buckets] 156 | self.target_data = [[] for _ in buckets] 157 | self.label_data = [[] for _ in buckets] 158 | self.source_mask_data = [[] for _ in buckets] 159 | self.target_mask_data = [[] for _ in buckets] 160 | 161 | # pre-allocate with the largest bucket for better memory sharing 162 | 163 | 164 | num_of_data = len(source_sentences) 165 | for i in range(num_of_data): 166 | source = source_sentences[i] 167 | target = [''] + target_sentences[i] 168 | label = target_sentences[i] + [''] 169 | source_sentence = self.text2id(source, source_vocab) 170 | target_sentence = self.text2id(target, target_vocab) 171 | label_id = self.text2id(label, target_vocab) 172 | if len(source_sentence) == 0 or len(target_sentence) == 0: 173 | continue 174 | for j, bkt in enumerate(buckets): 175 | if bkt[0] >= len(source) and bkt[1] >= len(target): 176 | self.source_data[j].append(source_sentence) 177 | self.target_data[j].append(target_sentence) 178 | self.label_data[j].append(label_id) 179 | break 180 | # we just ignore the sentence it is longer than the maximum 181 | # bucket size here 182 | source_data_clean = [] 183 | target_data_clean = [] 184 | label_data_clean = [] 185 | buckets_clean = [] 186 | for i in range(len(self.source_data)): 187 | if len(self.source_data[i]) >= batch_size: 188 | source_data_clean.append(self.source_data[i]) 189 | target_data_clean.append(self.target_data[i]) 190 | label_data_clean.append(self.label_data[i]) 191 | buckets_clean.append(self.buckets[i]) 192 | 193 | self.source_data = source_data_clean 194 | self.target_data = target_data_clean 195 | self.label_data = label_data_clean 196 | self.buckets = buckets_clean 197 | del buckets 198 | self.default_bucket_key = max(self.buckets) 199 | 200 | # convert data into ndarrays for better speed during training 201 | source_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 202 | source_mask_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 203 | target_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 204 | target_mask_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 205 | label_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.label_data)] 206 | for i_bucket in range(len(self.buckets)): 207 | for j in range(len(self.source_data[i_bucket])): 208 | source = self.source_data[i_bucket][j] 209 | target = self.target_data[i_bucket][j] 210 | label = self.label_data[i_bucket][j] 211 | source_data[i_bucket][j, :len(source)] = source 212 | source_mask_data[i_bucket][j, :len(source)] = 1 213 | target_data[i_bucket][j, :len(target)] = target 214 | target_mask_data[i_bucket][j, :len(target)] = 1 215 | label_data[i_bucket][j, :len(label)] = label 216 | self.source_data = source_data 217 | self.source_mask_data = source_mask_data 218 | self.target_data = target_data 219 | self.target_mask_data = target_mask_data 220 | self.label_data = label_data 221 | 222 | # Get the size of each bucket, so that we could sample 223 | # uniformly from 
the bucket 224 | bucket_sizes = [len(x) for x in self.source_data] 225 | 226 | print("Summary of dataset ==================") 227 | total_count = 0 228 | for bkt, size in zip(self.buckets, bucket_sizes): 229 | print("bucket of {0} : {1} samples".format(bkt, size)) 230 | total_count += size 231 | print('Total: {0} ({1}) in {2} buckets'.format(total_count, num_of_data, len(self.buckets))) 232 | 233 | self.batch_size = batch_size 234 | self.make_data_iter_plan() 235 | 236 | self.source_init_states = source_init_states 237 | self.target_init_states = target_init_states 238 | self.source_init_state_arrays = [mx.nd.zeros(x[1]) for x in source_init_states] 239 | self.target_init_state_arrays = [mx.nd.zeros(x[1]) for x in target_init_states] 240 | 241 | self.provide_data = [(source_data_name, (batch_size, self.default_bucket_key[0])), 242 | (source_mask_name, (batch_size, self.default_bucket_key[0])), 243 | (target_data_name, (batch_size, self.default_bucket_key[1])), 244 | # (target_mask_name, (batch_size, self.default_bucket_key[1])) 245 | ] + source_init_states + target_init_states 246 | 247 | self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key[1]))] 248 | 249 | def make_data_iter_plan(self): 250 | "make a random data iteration plan" 251 | # truncate each bucket into multiple of batch-size 252 | bucket_n_batches = [] 253 | for i in range(len(self.source_data)): 254 | bucket_n_batches.append(len(self.source_data[i]) / self.batch_size) 255 | self.source_data[i] = self.source_data[i][:int(bucket_n_batches[i] * self.batch_size)] 256 | self.source_mask_data[i] = self.source_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 257 | self.target_data[i] = self.target_data[i][:int(bucket_n_batches[i] * self.batch_size)] 258 | self.target_mask_data[i] = self.target_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 259 | 260 | bucket_plan = np.hstack([np.zeros(int(n), int) + i for i, n in enumerate(bucket_n_batches)]) 261 | np.random.shuffle(bucket_plan) 262 | 263 | bucket_idx_all = [np.random.permutation(len(x)) for x in self.source_data] 264 | 265 | self.bucket_plan = bucket_plan 266 | self.bucket_idx_all = bucket_idx_all 267 | self.bucket_curr_idx = [0 for x in self.source_data] 268 | 269 | self.source_data_buffer = [] 270 | self.source_mask_data_buffer = [] 271 | self.target_data_buffer = [] 272 | self.target_mask_data_buffer = [] 273 | self.label_buffer = [] 274 | for i_bucket in range(len(self.source_data)): 275 | source_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 276 | source_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 277 | target_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 278 | target_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 279 | label = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 280 | 281 | self.source_data_buffer.append(source_data) 282 | self.source_mask_data_buffer.append(source_mask_data) 283 | self.target_data_buffer.append(target_data) 284 | self.target_mask_data_buffer.append(target_mask_data) 285 | self.label_buffer.append(label) 286 | self.iterIndex = 0 287 | 288 | def next(self): 289 | """Get next data batch from iterator. 290 | 291 | Returns 292 | ------- 293 | DataBatch 294 | The data of next batch. 
295 | 296 | Raises 297 | ------ 298 | StopIteration 299 | If the end of the data is reached 300 | """ 301 | if self.iterIndex == len(self.bucket_plan): 302 | raise StopIteration 303 | 304 | i_bucket = self.bucket_plan[self.iterIndex] 305 | 306 | source_data = self.source_data_buffer[i_bucket] 307 | source_mask_data = self.source_mask_data_buffer[i_bucket] 308 | target_data = self.target_data_buffer[i_bucket] 309 | target_mask_data = self.target_mask_data_buffer[i_bucket] 310 | label = self.label_buffer[i_bucket] 311 | 312 | i_idx = self.bucket_curr_idx[i_bucket] 313 | idx = self.bucket_idx_all[i_bucket][i_idx:i_idx + self.batch_size] 314 | self.bucket_curr_idx[i_bucket] += self.batch_size 315 | source_data[:] = self.source_data[i_bucket][idx] 316 | source_mask_data[:] = self.source_mask_data[i_bucket][idx] 317 | target_data[:] = self.target_data[i_bucket][idx] 318 | target_mask_data[:] = self.target_mask_data[i_bucket][idx] 319 | label[:] = self.label_data[i_bucket][idx] 320 | 321 | data_all = [mx.nd.array(source_data), mx.nd.array(source_mask_data)] + \ 322 | [mx.nd.array(target_data), 323 | # mx.nd.array(target_mask_data) 324 | ] + \ 325 | self.source_init_state_arrays + self.target_init_state_arrays 326 | label_all = [mx.nd.array(label)] 327 | 328 | bucket_key = self.buckets[i_bucket] 329 | provide_data = [(self.source_data_name, (self.batch_size, bucket_key[0])), 330 | (self.source_mask_name, (self.batch_size, bucket_key[0])), 331 | (self.target_data_name, (self.batch_size, bucket_key[1])), 332 | # (self.target_mask_name, (self.batch_size, bucket_key[1])) 333 | ] + self.source_init_states + self.target_init_states 334 | provide_label = [(self.label_name, (self.batch_size, bucket_key[1]))] 335 | 336 | data_batch = DataBatch(data_all, label_all, pad=0, 337 | bucket_key=bucket_key, 338 | provide_data=provide_data, 339 | provide_label=provide_label) 340 | self.iterIndex += 1 341 | return data_batch 342 | 343 | def reset(self): 344 | self.iterIndex = 0 345 | self.bucket_curr_idx = [0 for x in self.source_data] 346 | -------------------------------------------------------------------------------- /nmt/masked_bucket_io_new.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme 2 | # pylint: disable=superfluous-parens, no-member, invalid-name 3 | import sys 4 | 5 | sys.path.insert(0, "../../python") 6 | import numpy as np 7 | import mxnet as mx 8 | from mxnet.io import DataBatch 9 | 10 | 11 | # The interface of a data iter that works for bucketing 12 | # 13 | # DataIter 14 | # - default_bucket_key: the bucket key for the default symbol. 15 | # 16 | # DataBatch 17 | # - provide_data: same as DataIter, but specific to this batch 18 | # - provide_label: same as DataIter, but specific to this batch 19 | # - bucket_key: the key for the bucket that should be used for this batch 20 | 21 | def default_read_content(path): 22 | with open(path) as ins: 23 | content = ins.read() 24 | content = content.replace('\n', ' ').replace('. 
', ' ') 25 | return content 26 | 27 | 28 | def default_build_vocab(path): 29 | content = default_read_content(path) 30 | content = content.split(' ') 31 | the_vocab = {} 32 | idx = 1 # 0 is left for zero-padding 33 | the_vocab[' '] = 0 # put a dummy element here so that len(vocab) is correct 34 | for word in content: 35 | if len(word) == 0: 36 | continue 37 | if not word in the_vocab: 38 | the_vocab[word] = idx 39 | idx += 1 40 | return the_vocab 41 | 42 | 43 | def default_text2id(sentence, the_vocab): 44 | words = sentence.split(' ') 45 | words = [the_vocab[w] for w in words if len(w) > 0] 46 | return words 47 | 48 | 49 | def default_gen_buckets(sentences, batch_size, the_vocab): 50 | len_dict = {} 51 | max_len = -1 52 | for sentence in sentences: 53 | words = default_text2id(sentence, the_vocab) 54 | if len(words) == 0: 55 | continue 56 | if len(words) > max_len: 57 | max_len = len(words) 58 | if len(words) in len_dict: 59 | len_dict[len(words)] += 1 60 | else: 61 | len_dict[len(words)] = 1 62 | print(len_dict) 63 | 64 | tl = 0 65 | buckets = [] 66 | for l, n in len_dict.items(): # TODO: There are better heuristic ways to do this 67 | if n + tl >= batch_size: 68 | buckets.append(l) 69 | tl = 0 70 | else: 71 | tl += n 72 | if tl > 0: 73 | buckets.append(max_len) 74 | return buckets 75 | 76 | 77 | class SimpleBatch(object): 78 | def __init__(self, data_names, data, label_names, label, bucket_key): 79 | self.data = data 80 | self.label = label 81 | self.data_names = data_names 82 | self.label_names = label_names 83 | self.bucket_key = bucket_key 84 | 85 | self.pad = 0 86 | self.index = None # TODO: what is index? 87 | 88 | @property 89 | def provide_data(self): 90 | return [(n, x.shape) for n, x in zip(self.data_names, self.data)] 91 | 92 | @property 93 | def provide_label(self): 94 | return [(n, x.shape) for n, x in zip(self.label_names, self.label)] 95 | 96 | 97 | class DummyIter(mx.io.DataIter): 98 | '''A dummy iterator that always return the same batch, used for speed testing''' 99 | 100 | def __init__(self, real_iter): 101 | super(DummyIter, self).__init__() 102 | self.real_iter = real_iter 103 | self.provide_data = real_iter.provide_data 104 | self.provide_label = real_iter.provide_label 105 | self.batch_size = real_iter.batch_size 106 | 107 | for batch in real_iter: 108 | self.the_batch = batch 109 | break 110 | 111 | def __iter__(self): 112 | return self 113 | 114 | def next(self): 115 | return self.the_batch 116 | 117 | 118 | class MaskedBucketSentenceIter(mx.io.DataIter): 119 | def __init__(self, source_path, target_path, source_vocab, target_vocab, 120 | buckets, batch_size, 121 | source_init_states, target_init_states, 122 | source_data_name='source', source_mask_name='source_mask', 123 | target_data_name='target', target_mask_name='target_mask', 124 | label_name='target_softmax_label', 125 | text2id=None, read_content=None, 126 | max_read_sample=sys.maxsize): 127 | super(MaskedBucketSentenceIter, self).__init__() 128 | 129 | if text2id is None: 130 | self.text2id = default_text2id 131 | else: 132 | self.text2id = text2id 133 | if read_content is None: 134 | self.read_content = default_read_content 135 | else: 136 | self.read_content = read_content 137 | 138 | source_sentences = self.read_content(source_path, max_read_sample) 139 | target_sentences = self.read_content(target_path, max_read_sample) 140 | assert len(source_sentences) == len(target_sentences) 141 | 142 | self.batch_size = batch_size 143 | self.source_data_name = source_data_name 144 | self.target_data_name = 
target_data_name 145 | self.label_name = label_name 146 | self.source_mask_name = source_mask_name 147 | self.target_mask_name = target_mask_name 148 | 149 | buckets.sort() 150 | self.buckets = buckets 151 | self.source_data = [[] for _ in buckets] 152 | self.target_data = [[] for _ in buckets] 153 | self.label_data = [[] for _ in buckets] 154 | self.source_mask_data = [[] for _ in buckets] 155 | self.target_mask_data = [[] for _ in buckets] 156 | 157 | # pre-allocate with the largest bucket for better memory sharing 158 | num_of_data = len(source_sentences) 159 | for i in range(num_of_data): 160 | source = source_sentences[i] 161 | target = [''] + target_sentences[i] 162 | label = target_sentences[i] + [''] 163 | source_sentence = self.text2id(source, source_vocab) 164 | target_sentence = self.text2id(target, target_vocab) 165 | label_id = self.text2id(label, target_vocab) 166 | if len(source_sentence) == 0 or len(target_sentence) == 0: 167 | continue 168 | for j, bkt in enumerate(buckets): 169 | if bkt[0] >= len(source) and bkt[1] >= len(target): 170 | self.source_data[j].append(source_sentence) 171 | self.target_data[j].append(target_sentence) 172 | self.label_data[j].append(label_id) 173 | break 174 | # we just ignore the sentence it is longer than the maximum 175 | # bucket size here 176 | source_data_clean = [] 177 | target_data_clean = [] 178 | label_data_clean = [] 179 | buckets_clean = [] 180 | for i in range(len(self.source_data)): 181 | if len(self.source_data[i]) > 0: 182 | source_data_clean.append(self.source_data[i]) 183 | target_data_clean.append(self.target_data[i]) 184 | label_data_clean.append(self.label_data[i]) 185 | buckets_clean.append(self.buckets[i]) 186 | 187 | self.source_data = source_data_clean 188 | self.target_data = target_data_clean 189 | self.label_data = label_data_clean 190 | self.buckets = buckets_clean 191 | del buckets 192 | self.default_bucket_key = max(self.buckets) 193 | 194 | # convert data into ndarrays for better speed during training 195 | source_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 196 | source_mask_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 197 | target_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 198 | target_mask_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 199 | label_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.label_data)] 200 | for i_bucket in range(len(self.buckets)): 201 | for j in range(len(self.source_data[i_bucket])): 202 | source = self.source_data[i_bucket][j] 203 | target = self.target_data[i_bucket][j] 204 | label = self.label_data[i_bucket][j] 205 | source_data[i_bucket][j, :len(source)] = source 206 | source_mask_data[i_bucket][j, :len(source)] = 1 207 | target_data[i_bucket][j, :len(target)] = target 208 | target_mask_data[i_bucket][j, :len(target)] = 1 209 | label_data[i_bucket][j, :len(label)] = label 210 | self.source_data = source_data 211 | self.source_mask_data = source_mask_data 212 | self.target_data = target_data 213 | self.target_mask_data = target_mask_data 214 | self.label_data = label_data 215 | 216 | # Get the size of each bucket, so that we could sample 217 | # uniformly from the bucket 218 | bucket_sizes = [len(x) for x in self.source_data] 219 | 220 | print("Summary of dataset ==================") 221 | total_count = 0 222 | for bkt, size in zip(self.buckets, bucket_sizes): 223 | print("bucket of {0} : 
{1} samples".format(bkt, size)) 224 | total_count += size 225 | print('Total: {0} ({1}) in {2} buckets'.format(total_count, num_of_data, len(self.buckets))) 226 | 227 | self.make_data_iter_plan() 228 | 229 | self.source_init_states = source_init_states 230 | self.target_init_states = target_init_states 231 | self.source_init_state_arrays = [mx.nd.zeros(x[1]) for x in source_init_states] 232 | self.target_init_state_arrays = [mx.nd.zeros(x[1]) for x in target_init_states] 233 | 234 | self.provide_data = [(source_data_name, (batch_size, self.default_bucket_key[0])), 235 | (source_mask_name, (batch_size, self.default_bucket_key[0])), 236 | (target_data_name, (batch_size, self.default_bucket_key[1])), 237 | # (target_mask_name, (batch_size, self.default_bucket_key[1])) 238 | ] + source_init_states + target_init_states 239 | 240 | self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key[1]))] 241 | 242 | def make_data_iter_plan(self): 243 | "make a random data iteration plan" 244 | # truncate each bucket into multiple of batch-size 245 | bucket_n_batches = [] 246 | for i in range(len(self.source_data)): 247 | bucket_n_batches.append(len(self.source_data[i]) / self.batch_size) 248 | self.source_data[i] = self.source_data[i][:int(bucket_n_batches[i] * self.batch_size)] 249 | self.source_mask_data[i] = self.source_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 250 | self.target_data[i] = self.target_data[i][:int(bucket_n_batches[i] * self.batch_size)] 251 | self.target_mask_data[i] = self.target_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 252 | 253 | bucket_plan = np.hstack([np.zeros(int(n), int) + i for i, n in enumerate(bucket_n_batches)]) 254 | np.random.shuffle(bucket_plan) 255 | 256 | bucket_idx_all = [np.random.permutation(len(x)) for x in self.source_data] 257 | 258 | self.bucket_plan = bucket_plan 259 | self.bucket_idx_all = bucket_idx_all 260 | self.bucket_curr_idx = [0 for _ in self.source_data] 261 | 262 | self.source_data_buffer = [] 263 | self.source_mask_data_buffer = [] 264 | self.target_data_buffer = [] 265 | self.target_mask_data_buffer = [] 266 | self.label_buffer = [] 267 | for i_bucket in range(len(self.source_data)): 268 | source_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 269 | source_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 270 | target_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 271 | target_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 272 | label = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 273 | 274 | self.source_data_buffer.append(source_data) 275 | self.source_mask_data_buffer.append(source_mask_data) 276 | self.target_data_buffer.append(target_data) 277 | self.target_mask_data_buffer.append(target_mask_data) 278 | self.label_buffer.append(label) 279 | self.iterIndex = 0 280 | 281 | def next(self): 282 | """Get next data batch from iterator. 283 | 284 | Returns 285 | ------- 286 | DataBatch 287 | The data of next batch. 
288 | 289 | Raises 290 | ------ 291 | StopIteration 292 | If the end of the data is reached 293 | """ 294 | if self.iterIndex == len(self.bucket_plan): 295 | raise StopIteration 296 | 297 | i_bucket = self.bucket_plan[self.iterIndex] 298 | 299 | source_data = self.source_data_buffer[i_bucket] 300 | source_mask_data = self.source_mask_data_buffer[i_bucket] 301 | target_data = self.target_data_buffer[i_bucket] 302 | target_mask_data = self.target_mask_data_buffer[i_bucket] 303 | label = self.label_buffer[i_bucket] 304 | 305 | i_idx = self.bucket_curr_idx[i_bucket] 306 | idx = self.bucket_idx_all[i_bucket][i_idx:i_idx + self.batch_size] 307 | self.bucket_curr_idx[i_bucket] += self.batch_size 308 | source_data[:] = self.source_data[i_bucket][idx] 309 | source_mask_data[:] = self.source_mask_data[i_bucket][idx] 310 | target_data[:] = self.target_data[i_bucket][idx] 311 | target_mask_data[:] = self.target_mask_data[i_bucket][idx] 312 | label[:] = self.label_data[i_bucket][idx] 313 | 314 | data_all = [mx.nd.array(source_data), mx.nd.array(source_mask_data)] + \ 315 | [mx.nd.array(target_data), 316 | # mx.nd.array(target_mask_data) 317 | ] + \ 318 | self.source_init_state_arrays + self.target_init_state_arrays 319 | label_all = [mx.nd.array(label)] 320 | 321 | bucket_key = self.buckets[i_bucket] 322 | provide_data = [(self.source_data_name, (self.batch_size, bucket_key[0])), 323 | (self.source_mask_name, (self.batch_size, bucket_key[0])), 324 | (self.target_data_name, (self.batch_size, bucket_key[1])), 325 | # (self.target_mask_name, (self.batch_size, bucket_key[1])) 326 | ] + self.source_init_states + self.target_init_states 327 | provide_label = [(self.label_name, (self.batch_size, bucket_key[1]))] 328 | 329 | data_batch = DataBatch(data_all, label_all, pad=0, 330 | bucket_key=bucket_key, 331 | provide_data=provide_data, 332 | provide_label=provide_label) 333 | self.iterIndex += 1 334 | return data_batch 335 | 336 | def reset(self): 337 | self.iterIndex = 0 338 | self.bucket_curr_idx = [0 for _ in self.source_data] 339 | -------------------------------------------------------------------------------- /nmt/tester.py: -------------------------------------------------------------------------------- 1 | import xconfig 2 | from inference import BiS2SInferenceModel 3 | from inference_mask import BiS2SInferenceModel_mask 4 | from xutils import read_content, load_vocab, sentence2id, word2id 5 | 6 | import mxnet as mx 7 | import numpy as np 8 | import logging 9 | import random 10 | import bisect 11 | from collections import OrderedDict, namedtuple 12 | from mxwrap.rnn.LSTM import LSTMState 13 | 14 | BeamNode = namedtuple("BeamNode", ["father", "content", "score", "acc_score", "finish", "finishLen"]) 15 | 16 | random_sample = False 17 | 18 | 19 | def get_inference_models(buckets, arg_params, source_vocab_size, target_vocab_size, ctx, batch_size): 20 | # build an inference model 21 | model_buckets = OrderedDict() 22 | for bucket in buckets: 23 | model_buckets[bucket] = BiS2SInferenceModel_mask(s_num_lstm_layer=xconfig.num_lstm_layer, s_seq_len=bucket[0], 24 | s_vocab_size=source_vocab_size + 1, 25 | s_num_hidden=xconfig.num_hidden, s_num_embed=xconfig.num_embed, 26 | s_dropout=0, 27 | t_num_lstm_layer=xconfig.num_lstm_layer, t_seq_len=bucket[1], 28 | t_vocab_size=target_vocab_size + 1, 29 | t_num_hidden=xconfig.num_hidden, t_num_embed=xconfig.num_embed, 30 | t_num_label=target_vocab_size + 1, t_dropout=0, 31 | arg_params=arg_params, 32 | use_masking=True, 33 | ctx=ctx, batch_size=batch_size) 34 | return 
model_buckets 35 | 36 | 37 | def get_bucket_model(model_buckets, input_len): 38 | for bucket, m in model_buckets.items(): 39 | if bucket[0] >= input_len: 40 | return m 41 | return None 42 | 43 | 44 | # helper strcuture for prediction 45 | def MakeRevertVocab(vocab): 46 | dic = {} 47 | for k, v in vocab.items(): 48 | dic[v] = k 49 | return dic 50 | 51 | 52 | # make input from char 53 | def MakeInput(sentence, vocab, unroll_len, data_arr, mask_arr): 54 | idx = sentence2id(sentence, vocab) 55 | tmp = np.zeros((1, unroll_len)) 56 | mask = np.zeros((1, unroll_len)) 57 | for i in range(min(len(idx), unroll_len)): 58 | tmp[0][i] = idx[i] 59 | mask[0][i] = 1 60 | data_arr[:] = tmp 61 | mask_arr[:] = mask 62 | 63 | 64 | def MakeInput_beam(sentence, vocab, unroll_len, data_arr, mask_arr, beam_size): 65 | idx = sentence2id(sentence, vocab) 66 | tmp = np.zeros((beam_size, unroll_len)) 67 | mask = np.zeros((beam_size, unroll_len)) 68 | for i in range(min(len(idx), unroll_len)): 69 | for j in range(beam_size): 70 | tmp[j][i] = idx[i] 71 | mask[j][i] = 1 72 | data_arr[:] = tmp 73 | mask_arr[:] = mask 74 | 75 | 76 | def MakeInput_batch(sentences, vocab, unroll_len, data_arr, mask_arr, batch_size): 77 | tmp = np.zeros((batch_size, unroll_len)) 78 | mask = np.zeros((batch_size, unroll_len)) 79 | actual_sample_num = len(sentences) 80 | for i in range(min(batch_size, actual_sample_num)): 81 | idx = sentence2id(sentences[i], vocab) 82 | for j in range(min(len(idx), unroll_len)): 83 | tmp[i][j] = idx[j] 84 | mask[i][j] = 1 85 | data_arr[:] = tmp 86 | mask_arr[:] = mask 87 | 88 | 89 | def MakeTargetInput(char, vocab, arr): 90 | idx = word2id(char, vocab) 91 | tmp = np.zeros((1,)) 92 | tmp[0] = idx 93 | arr[:] = tmp 94 | 95 | 96 | def MakeTargetInput_batch(chars, vocab, arr, batch_size): 97 | tmp = np.zeros((batch_size,)) 98 | actual_sample_num = len(chars) 99 | for idx in range(min(batch_size, actual_sample_num)): 100 | word_id = word2id(chars[idx], vocab) 101 | tmp[idx] = word_id 102 | arr[:] = tmp 103 | 104 | 105 | def MakeTargetInput_beam(beam_nodes, vocab, arr): 106 | tmp = np.zeros((len(beam_nodes),)) 107 | for idx in range(len(beam_nodes)): 108 | word_id = vocab[beam_nodes[idx].content] if beam_nodes[idx].content in vocab else vocab[''] 109 | tmp[idx] = word_id 110 | arr[:] = tmp 111 | 112 | 113 | # helper function for random sample 114 | def _cdf(weights): 115 | total = sum(weights) 116 | result = [] 117 | cumsum = 0 118 | for w in weights: 119 | cumsum += w 120 | result.append(cumsum / total) 121 | return result 122 | 123 | 124 | def _choice(population, weights): 125 | assert len(population) == len(weights) 126 | cdf_vals = _cdf(weights) 127 | x = random.random() 128 | idx = bisect.bisect(cdf_vals, x) 129 | return population[idx] 130 | 131 | 132 | # we can use random output or fixed output by choosing largest probability 133 | def MakeOutput(prob, vocab, sample=False, temperature=1.): 134 | if sample == False: 135 | idx = np.argmax(prob, axis=1)[0] 136 | else: 137 | fix_dict = [""] + [vocab[i] for i in range(1, len(vocab) + 1)] 138 | scale_prob = np.clip(prob, 1e-6, 1 - 1e-6) 139 | rescale = np.exp(np.log(scale_prob) / temperature) 140 | rescale[:] /= rescale.sum() 141 | return _choice(fix_dict, rescale[0, :]) 142 | try: 143 | char = vocab[idx] 144 | except: 145 | char = '' 146 | return char 147 | 148 | 149 | # we can use random output or fixed output by choosing largest probability 150 | def MakeOutput_batch(probs, vocab, sample=False, temperature=1.): 151 | res = [] 152 | for i in 
range(probs.shape[0]): 153 | prob = probs[i] 154 | if sample == False: 155 | idx = np.argmax(prob) 156 | else: 157 | fix_dict = [""] + [vocab[i] for i in range(1, len(vocab) + 1)] 158 | scale_prob = np.clip(prob, 1e-6, 1 - 1e-6) 159 | rescale = np.exp(np.log(scale_prob) / temperature) 160 | rescale[:] /= rescale.sum() 161 | return _choice(fix_dict, rescale[0, :]) 162 | try: 163 | char = vocab[idx] 164 | except: 165 | char = '' 166 | res.append(char) 167 | return res 168 | 169 | 170 | def translate_one(max_decode_len, sentence, model_buckets, unroll_len, source_vocab, target_vocab, revert_vocab, 171 | target_ndarray): 172 | input_length = len(sentence) 173 | cur_model = get_bucket_model(model_buckets, input_length) 174 | input_ndarray = mx.nd.zeros((1, unroll_len)) 175 | mask_ndarray = mx.nd.zeros((1, unroll_len)) 176 | output = [''] 177 | MakeInput(sentence, source_vocab, unroll_len, input_ndarray, mask_ndarray) 178 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 179 | mask_ndarray) # last_encoded means the last time step hidden 180 | for i in range(max_decode_len): 181 | MakeTargetInput(output[-1], target_vocab, target_ndarray) 182 | prob, attention_weights = cur_model.decode_forward(last_encoded, all_encoded, mask_ndarray, target_ndarray, 183 | i == 0) 184 | next_char = MakeOutput(prob, revert_vocab, random_sample) 185 | if next_char == '': 186 | break 187 | output.append(next_char) 188 | return output[1:] 189 | 190 | 191 | def translate_greedy_batch(max_decode_len, sentences, batch_size, model_buckets, unroll_len, source_vocab, target_vocab, 192 | revert_vocab, target_ndarray): 193 | cur_model = get_bucket_model(model_buckets, unroll_len) 194 | input_ndarray = mx.nd.zeros((batch_size, unroll_len)) 195 | mask_ndarray = mx.nd.zeros((batch_size, unroll_len)) 196 | output = [[''] * batch_size] 197 | MakeInput_batch(sentences, source_vocab, unroll_len, input_ndarray, mask_ndarray, batch_size) 198 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 199 | mask_ndarray) # last_encoded means the last time step hidden 200 | for i in range(max_decode_len): 201 | MakeTargetInput_batch(output[-1], target_vocab, target_ndarray, batch_size) 202 | probs, attention_weights = cur_model.decode_forward(last_encoded, all_encoded, mask_ndarray, target_ndarray, 203 | i == 0) 204 | next_chars = MakeOutput_batch(probs, revert_vocab, random_sample) 205 | finished = [ch == '' for ch in next_chars] 206 | if all(finished): 207 | break 208 | output.append(next_chars) 209 | return output[1:] 210 | 211 | 212 | def _smallest(matrix, k, only_first_row=False): 213 | """Find k smallest elements of a matrix. 214 | 215 | Parameters 216 | ---------- 217 | matrix : :class:`numpy.ndarray` 218 | The matrix. 219 | k : int 220 | The number of smallest elements required. 221 | only_first_row : bool, optional 222 | Consider only elements of the first row. 223 | 224 | Returns 225 | ------- 226 | Tuple of ((row numbers, column numbers), values). 
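        Example (illustrative): for ``matrix = [[3., 1.], [2., 0.]]`` and ``k = 2``,
        the two smallest values are 0. and 1., at (row, column) positions (1, 1)
        and (0, 1), so the call returns ((array([1, 0]), array([1, 1])), array([0., 1.])).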
227 | 228 | """ 229 | if only_first_row: 230 | flatten = matrix[:1, :].flatten() 231 | else: 232 | flatten = matrix.flatten() 233 | # flatten = -flatten 234 | args = np.argpartition(flatten, k)[:k] 235 | args = args[np.argsort(flatten[args])] 236 | return np.unravel_index(args, matrix.shape), flatten[args] 237 | 238 | 239 | def translate_one_with_beam(max_decode_len, sentence, model_buckets, unroll_len, source_vocab, target_vocab, 240 | revert_vocab, target_ndarray, beam_size, eos_index): 241 | input_length = len(sentence) 242 | cur_model = get_bucket_model(model_buckets, input_length) 243 | input_ndarray = mx.nd.zeros((beam_size, unroll_len)) 244 | mask_ndarray = mx.nd.zeros((beam_size, unroll_len)) 245 | 246 | beam = [[BeamNode(father=-1, content='', score=0.0, acc_score=0.0, finish=False, finishLen=0) for i in 247 | range(beam_size)]] 248 | beam_state = [None] 249 | 250 | MakeInput_beam(sentence, source_vocab, unroll_len, input_ndarray, mask_ndarray, beam_size) 251 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 252 | mask_ndarray) # last_encoded means the last time step hidden 253 | for i in range(max_decode_len): 254 | MakeTargetInput_beam(beam[-1], target_vocab, target_ndarray) 255 | prob, attention_weights, new_state = cur_model.decode_forward_with_state(last_encoded, all_encoded, 256 | mask_ndarray, target_ndarray, 257 | beam_state[-1], i == 0) 258 | log_prob = -mx.ndarray.log(prob) 259 | finished_beam = [t for t, x in enumerate(beam[-1]) if x.finish] 260 | for idx in range(beam_size): 261 | # log_prob[idx] = mx.nd.add(log_prob[idx], beam[-1][idx].score) 262 | if not beam[-1][idx].finish: 263 | # log_prob[idx] += beam[-1][idx].acc_score 264 | log_prob[idx] = (log_prob[idx] + beam[-1][idx].acc_score * beam[-1][idx].finishLen) / ( 265 | beam[-1][idx].finishLen + 1) 266 | else: 267 | # log_prob[idx] = beam[-1][idx].acc_score 268 | log_prob[idx] = beam[-1][idx].acc_score 269 | for idx in finished_beam: 270 | log_prob[idx][:eos_index] = np.inf 271 | log_prob[idx][eos_index + 1:] = np.inf 272 | 273 | (indexes, outputs), chosen_costs = _smallest(log_prob.asnumpy(), beam_size, only_first_row=(i == 0)) 274 | next_chars = [revert_vocab[idx] if idx in revert_vocab else '' for idx in outputs] 275 | 276 | next_state_h = mx.nd.empty(new_state.h.shape, ctx=mx.gpu(0)) 277 | next_state_c = mx.nd.empty(new_state.c.shape, ctx=mx.gpu(0)) 278 | for idx in range(beam_size): 279 | next_state_h[idx] = new_state.h[np.asscalar(indexes[idx])] 280 | next_state_c[idx] = new_state.c[np.asscalar(indexes[idx])] 281 | next_state = LSTMState(c=next_state_c, h=next_state_h) 282 | beam_state.append(next_state) 283 | 284 | next_beam = [BeamNode(father=indexes[idx], 285 | content=next_chars[idx] if not beam[-1][indexes[idx]].finish else beam[-1][ 286 | indexes[idx]].content, 287 | score=chosen_costs[idx] - beam[-1][indexes[idx]].acc_score, 288 | acc_score=chosen_costs[idx], 289 | finish=(next_chars[idx] == '' or beam[-1][indexes[idx]].finish), 290 | finishLen=(beam[-1][indexes[idx]].finishLen if beam[-1][indexes[idx]].finish else ( 291 | beam[-1][indexes[idx]].finishLen + 1))) for 292 | idx in range(beam_size)] 293 | beam.append(next_beam) 294 | finished = [node.finish for node in beam[-1]] 295 | if all(finished): 296 | break 297 | # output.append(next_char) 298 | all_result = [] 299 | all_score = [] 300 | for aaa in range(beam_size): 301 | ptr = aaa 302 | result = [] 303 | 304 | for idx in range(len(beam) - 1 - 1, 0, -1): 305 | word = beam[idx][ptr].content 306 | if word != '': 307 | result.append(word) 
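            # follow each node's `father` back-pointer into the previous beam step; the collected
            # words run from the last decoding step back to the first and are reversed below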
308 | ptr = beam[idx][ptr].father 309 | result = result[::-1] 310 | all_result.append(' '.join(result)) 311 | all_score.append(beam[-1][aaa].acc_score) 312 | 313 | return all_result, all_score 314 | 315 | 316 | def test_on_file_iwslt(input_file, output_file, model_buckets, source_vocab, target_vocab, revert_vocab, ctx, 317 | unroll_len, 318 | max_decode_len, 319 | do_beam=False, 320 | beam_size=1): 321 | beam_file = open(output_file + '_beam', 'w', encoding='utf-8') if do_beam else None 322 | batch_size = beam_size if do_beam else 1 323 | eos_index = target_vocab[xconfig.eos_word] 324 | target_ndarray = mx.nd.zeros((batch_size,), ctx=ctx) 325 | read_count = 0 326 | with open(input_file, mode='r', encoding='utf-8') as f, open(output_file, 'w', encoding='utf-8') as of: 327 | for line in f: 328 | read_count += 1 329 | if (read_count - 1) % (xconfig.bleu_ref_number + 2) != 0: 330 | continue 331 | 332 | ch = line.split(' |||| ')[0].strip().split(' ') 333 | if do_beam: 334 | # en = translate_one_with_beam(ch, model_buckets, beam_size) 335 | all_en, all_score = translate_one_with_beam(max_decode_len, ch, model_buckets, unroll_len, source_vocab, 336 | target_vocab, revert_vocab, target_ndarray, 337 | beam_size, eos_index) 338 | en = all_en[0] 339 | else: 340 | en = translate_one(max_decode_len, ch, model_buckets, unroll_len, source_vocab, target_vocab, 341 | revert_vocab, 342 | target_ndarray) 343 | en = ' '.join(en) 344 | of.write(en + '\n') 345 | if do_beam: 346 | for idx in range(len(all_en)): 347 | beam_file.write('{0}\t{1}\n'.format(all_en[idx], all_score[idx])) 348 | beam_file.write('\n') 349 | if beam_file: 350 | beam_file.close() 351 | 352 | 353 | def test_on_file_greedy_batch_iwslt(input_file, output_file, model_buckets, source_vocab, target_vocab, revert_vocab, 354 | ctx, 355 | unroll_len, max_decode_len, batch_size): 356 | with open(input_file, mode='r', encoding='utf-8') as f: 357 | lines = f.read().splitlines() 358 | lines = lines[0::(xconfig.bleu_ref_number + 2)] 359 | input_sents = [line.split(' |||| ')[0].strip().split(' ') for line in lines] 360 | batch_sents = [input_sents[i: i + batch_size] for i in range(0, len(input_sents), batch_size)] 361 | eos_index = target_vocab[xconfig.eos_word] 362 | with open(output_file, 'w', encoding='utf-8') as of: 363 | for batch in batch_sents: 364 | target_ndarray = mx.nd.zeros((batch_size,), ctx=ctx) 365 | output_sents = translate_greedy_batch(max_decode_len, batch, batch_size, 366 | model_buckets, unroll_len, source_vocab, 367 | target_vocab, revert_vocab, target_ndarray) 368 | for i in range(len(batch)): 369 | tmp = [] 370 | for j in range(len(output_sents)): 371 | word = output_sents[j][i] 372 | if word == xconfig.eos_word: 373 | break 374 | tmp.append(word) 375 | of.write(' '.join(tmp) + '\n') 376 | 377 | 378 | def test(): 379 | # load vocabulary 380 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 381 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 382 | 383 | revert_vocab = MakeRevertVocab(target_vocab) 384 | 385 | print('source_vocab size: {0}'.format(len(source_vocab))) 386 | print('target_vocab size: {0}'.format(len(target_vocab))) 387 | 388 | # load model from check-point 389 | _, arg_params, __ = mx.model.load_checkpoint(xconfig.model_to_load_prefix, xconfig.model_to_load_number) 390 | 391 | buckets = xconfig.buckets 392 | buckets = [max(buckets)] 393 | 394 | if xconfig.use_batch_greedy_search: 395 | if xconfig.use_beam_search: 396 | logging.warning( 397 | 
'use_batch_greedy_search and use_beam_search both True, fallback to use_batch_greedy_search') 398 | 399 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 400 | xconfig.test_device, batch_size=xconfig.greedy_batch_size) 401 | test_on_file_greedy_batch_iwslt(input_file=xconfig.test_source, output_file=xconfig.test_output, 402 | model_buckets=model_buckets, 403 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 404 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], 405 | max_decode_len=xconfig.max_decode_len, batch_size=xconfig.greedy_batch_size) 406 | else: 407 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 408 | xconfig.test_device, batch_size=xconfig.beam_size) 409 | test_on_file_iwslt(input_file=xconfig.test_source, output_file=xconfig.test_output, model_buckets=model_buckets, 410 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 411 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], max_decode_len=xconfig.max_decode_len, 412 | do_beam=xconfig.use_beam_search, beam_size=xconfig.beam_size) 413 | 414 | del model_buckets 415 | from xmetric import get_bleu 416 | raw_output, scores = get_bleu(xconfig.test_gold, xconfig.test_output) 417 | logging.info(raw_output) 418 | logging.info(str(scores)) 419 | 420 | 421 | def test_use_model_param(arg_params, test_file, output_file, gold_file, use_beam=False, beam_size=-1): 422 | # load vocabulary 423 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 424 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 425 | 426 | revert_vocab = MakeRevertVocab(target_vocab) 427 | 428 | buckets = xconfig.buckets 429 | buckets = [max(buckets)] 430 | b_size = beam_size if use_beam else xconfig.greedy_batch_size 431 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 432 | xconfig.test_device, batch_size=b_size) 433 | if use_beam: 434 | test_on_file_iwslt(input_file=test_file, output_file=output_file, model_buckets=model_buckets, 435 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 436 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], max_decode_len=xconfig.max_decode_len, 437 | do_beam=use_beam, beam_size=beam_size) 438 | else: 439 | test_on_file_greedy_batch_iwslt(input_file=test_file, output_file=output_file, model_buckets=model_buckets, 440 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 441 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], 442 | max_decode_len=xconfig.max_decode_len, batch_size=xconfig.greedy_batch_size) 443 | from xmetric import get_bleu 444 | raw_output, score = get_bleu(gold_file, output_file) 445 | logging.info(raw_output) 446 | del model_buckets 447 | return score 448 | -------------------------------------------------------------------------------- /nmt/trainer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | import mxnet as mx 4 | import logging 5 | 6 | import xconfig 7 | from xsymbol import sym_gen 8 | from xcallback import BatchCheckpoint, CheckBLEUBatch 9 | from xutils import read_content, load_vocab, sentence2id 10 | from xmetric import Perplexity, MyMakeLoss 11 | # from masked_bucket_io import MaskedBucketSentenceIter 12 | from masked_bucket_io_new import MaskedBucketSentenceIter 13 | 14 | 15 | def get_GRU_shape(): 16 | # 
initalize states for LSTM 17 | 18 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 19 | range(xconfig.num_lstm_layer)] 20 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 21 | range(xconfig.num_lstm_layer)] 22 | source_init_states = forward_source_init_h + backward_source_init_h 23 | 24 | target_init_c = [('target_l%d_init_c' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 25 | range(xconfig.num_lstm_layer)] 26 | # target_init_h = [('target_l%d_init_h' % l, (batch_size, num_hidden)) for l in range(num_lstm_layer)] 27 | target_init_states = [] 28 | return source_init_states, target_init_states 29 | 30 | 31 | def train(): 32 | # load vocabulary 33 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 34 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 35 | 36 | logging.info('source_vocab size: {0}'.format(len(source_vocab))) 37 | logging.info('target_vocab size: {0}'.format(len(target_vocab))) 38 | 39 | # get states shapes 40 | source_init_states, target_init_states = get_GRU_shape() 41 | # source_init_states, target_init_states = get_LSTM_shape() 42 | 43 | # build data iterator 44 | data_train = MaskedBucketSentenceIter(xconfig.train_source, xconfig.train_target, source_vocab, target_vocab, 45 | xconfig.buckets, xconfig.batch_size, 46 | source_init_states, target_init_states, 47 | text2id=sentence2id, read_content=read_content, 48 | max_read_sample=xconfig.train_max_samples) 49 | 50 | # data_dev = MaskedBucketSentenceIter(xconfig.dev_source, xconfig.dev_source, source_vocab, target_vocab, 51 | # xconfig.buckets, xconfig.batch_size, 52 | # source_init_states, target_init_states, seperate_char='\n', 53 | # text2id=sentence2id, read_content=read_content, 54 | # max_read_sample=xconfig.dev_max_samples) 55 | 56 | # Train a LSTM network as simple as feedforward network 57 | # optimizer = mx.optimizer.AdaDelta(clip_gradient=10.0) 58 | optimizer = mx.optimizer.Adam(clip_gradient=10.0, rescale_grad=1.0 / xconfig.batch_size) 59 | # optimizer = mx.optimizer.SGD(clip_gradient=10, learning_rate=0.01, rescale_grad=1.0 / xconfig.batch_size) 60 | _arg_params = None 61 | 62 | if xconfig.use_resuming: 63 | logging.info("Try resuming from {0} {1}".format(xconfig.resume_model_prefix, xconfig.resume_model_number)) 64 | try: 65 | _, __arg_params, __ = mx.model.load_checkpoint(xconfig.resume_model_prefix, xconfig.resume_model_number) 66 | logging.info("Resume succeeded.") 67 | _arg_params = __arg_params 68 | except: 69 | logging.error('Resume failed.') 70 | 71 | model = mx.mod.BucketingModule( 72 | sym_gen=sym_gen(len(source_vocab) + 1, len(target_vocab) + 1), 73 | default_bucket_key=data_train.default_bucket_key, 74 | context=xconfig.train_device, 75 | ) 76 | 77 | # Fit it 78 | model.fit(train_data=data_train, 79 | # eval_metric=mx.metric.np(Perplexity), 80 | eval_metric=mx.metric.CustomMetric(Perplexity), 81 | # eval_metric=mx.metric.np(MyMakeLoss), 82 | batch_end_callback=[mx.callback.Speedometer(xconfig.batch_size, xconfig.show_every_x_batch), ], 83 | # optimizer='sgd', 84 | # optimizer_params={'clip_gradient': 10.0, }, 85 | initializer=mx.init.Xavier(factor_type="in", magnitude=2.34, rnd_type='gaussian'), 86 | optimizer=optimizer, 87 | num_epoch=10, 88 | ) 89 | -------------------------------------------------------------------------------- /nmt/xcallback.py: 
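A note on `get_GRU_shape` in `nmt/trainer.py` above: only the bidirectional encoder gets explicit initial-state placeholders; the decoder's initial state is produced from the encoder output, so `target_init_states` is left empty. With the defaults in `nmt/xconfig.py` further below (batch_size = 64, num_hidden = 512, num_lstm_layer = 1), the descriptors handed to `MaskedBucketSentenceIter` expand to the following illustrative values (not a file in the repository):

```
source_init_states = [('forward_source_l0_init_h', (64, 512)),
                      ('backward_source_l0_init_h', (64, 512))]
target_init_states = []   # decoder init state comes from the encoder, not from the iterator
```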
-------------------------------------------------------------------------------- 1 | import xconfig 2 | 3 | import mxnet as mx 4 | import logging 5 | 6 | 7 | class BatchCheckpoint(object): 8 | def __init__(self, save_name, per_x_batch): 9 | self.save_name = save_name 10 | self.per_x_batch = per_x_batch 11 | from mxnet.model import save_checkpoint 12 | self._save = save_checkpoint 13 | 14 | def __call__(self, params): 15 | # batch_end_params = BatchEndParam(epoch=epoch, 16 | # nbatch=nbatch, 17 | # eval_metric=eval_metric, 18 | # locals=locals()) 19 | 20 | if params.nbatch % self.per_x_batch == 0: 21 | executor_manager = params.locals['executor_manager'] 22 | param_names = executor_manager.param_names 23 | param_arrays = executor_manager.param_arrays 24 | 25 | param_dict = {} 26 | for idx, name in enumerate(param_names): 27 | param_dict[name] = param_arrays[idx][0] 28 | 29 | self._save(self.save_name, 0, params.locals['symbol'], 30 | param_dict, params.locals['aux_params']) 31 | # TODO is this the correct way to save aux_params ? 32 | 33 | 34 | class CheckBLEUBatch(object): 35 | def __init__(self, start_epoch, per_batch, use_beam=False, beam_size=-1): 36 | self.best_bleu = -1.0 37 | self.best_epoch = -1 38 | self.start_epoch = start_epoch 39 | self.per_batch = per_batch 40 | self.use_beam_search = use_beam 41 | self.beam_size = beam_size 42 | from mxnet.model import save_checkpoint 43 | self._save = save_checkpoint 44 | # TODO ugly code 2333 45 | from tester import test_use_model_param 46 | self.bleu_computer = test_use_model_param 47 | 48 | def __call__(self, params): 49 | # batch_end_params = BatchEndParam(epoch=epoch, 50 | # nbatch=nbatch, 51 | # eval_metric=eval_metric, 52 | # locals=locals()) 53 | 54 | if params.nbatch % self.per_batch == 0: 55 | if params.epoch < self.start_epoch: 56 | print('Too early to check BLEU at epoch {0}'.format(params.epoch)) 57 | return 58 | logging.info('Checking BLEU for epoch {0} batch {1}'.format(params.epoch, params.nbatch)) 59 | gold = xconfig.dev_source 60 | test = xconfig.dev_source 61 | output = xconfig.dev_output 62 | 63 | executor_manager = params.locals['executor_manager'] 64 | param_names = executor_manager.param_names 65 | param_arrays = executor_manager.param_arrays 66 | 67 | param_dict = {} 68 | for idx, name in enumerate(param_names): 69 | param_dict[name] = param_arrays[idx][0] 70 | 71 | cur_rouge = self.bleu_computer(arg_params=param_dict, test_file=test, output_file=output, gold_file=gold, 72 | use_beam=self.use_beam_search, beam_size=self.beam_size) 73 | logging.info('BLEU: {0} @ epoch {1} batch {2}'.format(cur_rouge, params.epoch, params.nbatch)) 74 | 75 | if cur_rouge > self.best_bleu: 76 | logging.info( 77 | 'Current BLEU: {0} > prev best {1} in epoch {2}'.format(cur_rouge, self.best_bleu, 78 | self.best_epoch)) 79 | self.best_bleu = cur_rouge 80 | self.best_epoch = params.epoch 81 | logging.info('Saving...') 82 | self._save("best_bleu", params.epoch + 1, params.locals['symbol'], 83 | param_dict, params.locals['aux_params']) 84 | # TODO is this the correct way to save aux_params ? 
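`trainer.train()` above only registers a `Speedometer` as its `batch_end_callback`, so the two callbacks in this file are not wired in as committed. Below is a minimal sketch of how they could be constructed, using only names that exist in `nmt/xconfig.py` (next file below); note that both callbacks read `params.locals['executor_manager']` and `params.locals['symbol']`, which the older `mx.model`-style training loop provides but `BucketingModule.fit` may not:

```
import mxnet as mx
import xconfig
from xcallback import BatchCheckpoint, CheckBLEUBatch

batch_callbacks = [
    mx.callback.Speedometer(xconfig.batch_size, xconfig.show_every_x_batch),
    # dump a rolling checkpoint every checkpoint_freq_batch batches
    BatchCheckpoint(save_name=xconfig.checkpoint_name,
                    per_x_batch=xconfig.checkpoint_freq_batch),
    # decode the dev set and keep the best-BLEU model every eval_per_x_batch batches
    CheckBLEUBatch(start_epoch=xconfig.eval_start_epoch,
                   per_batch=xconfig.eval_per_x_batch,
                   use_beam=xconfig.use_beam_search,
                   beam_size=xconfig.beam_size),
]
```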
85 | -------------------------------------------------------------------------------- /nmt/xconfig.py: -------------------------------------------------------------------------------- 1 | import os 2 | import mxnet as mx 3 | 4 | # path 5 | source_root = os.path.abspath(os.path.join(os.getcwd(), os.path.pardir)) 6 | data_root = os.path.join(source_root, 'IWSLT') 7 | model_root = os.path.join(source_root, 'IWSLT', 'model') 8 | log_root = os.path.join(source_root, 'IWSLT', 'log') 9 | 10 | if not os.path.exists(model_root): 11 | os.makedirs(model_root) 12 | if not os.path.exists(log_root): 13 | os.makedirs(log_root) 14 | 15 | # dictionary 16 | bos_word = '' 17 | eos_word = '' 18 | unk_word = '' 19 | special_words = {unk_word: 1, bos_word: 2, eos_word: 3} 20 | source_vocab_path = os.path.join(data_root, 'zh', 'zh.vocab.pkl') 21 | target_vocab_path = os.path.join(data_root, 'en', 'en.vocab.pkl') 22 | 23 | # data set 24 | train_source = os.path.join(data_root, 'zh', 'zh.txt') 25 | train_target = os.path.join(data_root, 'en', 'en.txt') 26 | train_max_samples = 100000 27 | dev_source = os.path.join(data_root, 'dev', 'IWSLT.dev.txt') 28 | dev_target = os.path.join(data_root, 'invalid', 'invalid') 29 | dev_output = os.path.join(data_root, 'dev', 'dev.out') 30 | dev_max_samples = 100000 31 | test_source = os.path.join(data_root, 'test', 'IWSLT.test.txt') 32 | test_gold = os.path.join(data_root, 'test', 'IWSLT.test.txt') 33 | 34 | bleu_ref_number = 7 35 | 36 | # model parameter 37 | batch_size = 64 38 | bucket_stride = 8 39 | buckets = [] 40 | for i in range(8, 128, bucket_stride): 41 | for j in range(8, 128, bucket_stride): 42 | buckets.append((i, j)) 43 | num_hidden = 512 # hidden unit in LSTM cell 44 | num_embed = 512 # embedding dimension 45 | num_lstm_layer = 1 # number of lstm layer 46 | 47 | # training parameter 48 | num_epoch = 60 49 | learning_rate = 1 50 | momentum = 0.1 51 | dropout = 0.5 52 | show_every_x_batch = 100 53 | eval_per_x_batch = 400 54 | eval_start_epoch = 4 55 | 56 | # model save option 57 | model_save_name = os.path.join(model_root, "zh-en-iwslt") 58 | model_save_freq = 1 # every x epoch 59 | checkpoint_name = os.path.join(model_root, 'checkpoint_model') 60 | checkpoint_freq_batch = 1000 # save checkpoint model every x batch 61 | 62 | # train device 63 | train_device = [mx.context.gpu(0)] 64 | # test device 65 | test_device = mx.context.gpu(0) 66 | 67 | # test parameter 68 | model_to_load_prefix = os.path.join(model_root, 'zh-en-iwslt') 69 | model_to_load_number = 1 70 | use_beam_search = True 71 | beam_size = 12 72 | if not use_beam_search: beam_size = 1 73 | test_output = os.path.join(data_root, 'test', 'test.out') 74 | use_batch_greedy_search = False 75 | greedy_batch_size = 32 76 | max_decode_len = 15 77 | 78 | # resume training 79 | use_resuming = False 80 | resume_model_prefix = os.path.join(model_root, "checkpoint_model") 81 | resume_model_number = 0 82 | 83 | 84 | def get_config_str(): 85 | res = '' 86 | res += 'Config:\n' 87 | import collections 88 | hehe = collections.OrderedDict(sorted(globals().items(), key=lambda x: x[0])) 89 | for k, v in hehe.items(): 90 | if k.startswith('__'): continue 91 | if k.startswith('SEPARATOR'): continue 92 | if k.startswith('get'): continue 93 | if type(v) == (type(os)): continue 94 | if len(k) < 2: continue 95 | res += '{0}: {1}\n'.format(k, v) 96 | return res 97 | -------------------------------------------------------------------------------- /nmt/xmetric.py: 
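One detail of `nmt/xconfig.py` above worth spelling out: the nested loop over `range(8, 128, bucket_stride)` enumerates every (source length, target length) pair from 8 to 120 in steps of 8. An illustrative check of that grid (not code from the repository):

```
bucket_stride = 8
buckets = [(i, j) for i in range(8, 128, bucket_stride)
                  for j in range(8, 128, bucket_stride)]
assert len(buckets) == 15 * 15      # 225 bucket shapes in total
assert max(buckets) == (120, 120)   # the largest configured (source, target) bucket
```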
-------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | # Evaluation 5 | def Perplexity(label, pred): 6 | label = label.T.reshape((-1,)) 7 | loss = 0. 8 | mask_count = 0 9 | for i in range(pred.shape[0]): 10 | if int(label[i]) == 0: 11 | mask_count += 1 12 | continue 13 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 14 | return np.exp(loss / (label.size - mask_count)) 15 | 16 | 17 | def MyCrossEntropy(label, pred): 18 | label = label.T.reshape((-1,)) 19 | loss = 0. 20 | for i in range(pred.shape[0]): 21 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 22 | return loss / label.size 23 | 24 | 25 | def MyMakeLoss(label, pred): 26 | # label = label.T.reshape((-1,)) 27 | # loss = 0. 28 | # for i in range(pred.shape[0]): 29 | # loss += -np.log(max(1e-10, pred[i][int(label[i])])) 30 | return pred[0] 31 | 32 | 33 | def MyCrossEntropy_mask(label, pred): 34 | label = label.T.reshape((-1,)) 35 | loss = 0. 36 | mask_count = 0 37 | for i in range(pred.shape[0]): 38 | if int(label[i]) == 0: 39 | mask_count += 1 40 | continue 41 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 42 | return loss / (label.size - mask_count) 43 | 44 | 45 | def get_bleu(gold, test): 46 | import subprocess 47 | bleu_computer = r"CompBleu_new.exe" 48 | rawoutput = subprocess.check_output([bleu_computer, gold, test]) 49 | output = rawoutput.splitlines() 50 | bleu = float(output[-1].decode('utf-8').split('=')[-1].strip()) 51 | return rawoutput, bleu 52 | -------------------------------------------------------------------------------- /nmt/xsymbol.py: -------------------------------------------------------------------------------- 1 | from mxwrap.seq2seq.encoder import BiDirectionalGruEncoder 2 | from mxwrap.seq2seq.decoder import GruAttentionDecoder 3 | from mxwrap.attention.ConcatAttention import ConcatAttention 4 | 5 | import xconfig 6 | import mxnet as mx 7 | 8 | 9 | def s2s_unroll(encoder, attention, decoder, 10 | source_len, target_len, 11 | input_names, output_names, 12 | **kwargs): 13 | forward_hidden_all, backward_hidden_all, source_representations, source_mask_sliced = encoder.encode(source_len) 14 | 15 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 16 | name='encoded_for_init_state') 17 | target_representation = decoder.decode(target_len, encoded_for_init_state, source_representations, 18 | source_mask_sliced) 19 | return target_representation, input_names, output_names 20 | 21 | 22 | def sym_gen(source_vocab_size, target_vocab_size): 23 | input_names = ['source', 'source_mask', 'target', 24 | # 'target_mask', 25 | "forward_source_l0_init_h", 26 | "backward_source_l0_init_h"] 27 | output_names = ['target_softmax_label'] 28 | encoder = BiDirectionalGruEncoder(use_masking=True, state_dim=xconfig.num_hidden, 29 | input_dim=0, output_dim=0, 30 | vocab_size=source_vocab_size, embed_dim=xconfig.num_embed, 31 | dropout=xconfig.dropout, num_of_layer=xconfig.num_lstm_layer) 32 | 33 | attention = ConcatAttention(batch_size=xconfig.batch_size, attend_dim=xconfig.num_hidden * 2, 34 | state_dim=xconfig.num_hidden) 35 | 36 | decoder = GruAttentionDecoder(use_masking=True, state_dim=xconfig.num_hidden, 37 | input_dim=0, output_dim=target_vocab_size, 38 | vocab_size=target_vocab_size, embed_dim=xconfig.num_embed, 39 | dropout=xconfig.dropout, 40 | num_of_layer=xconfig.num_lstm_layer, attention=attention, 41 | batch_size=xconfig.batch_size) 42 | 43 | def _sym_gen(s_t_len): 44 | return s2s_unroll(encoder=encoder, 
45 | attention=attention, 46 | decoder=decoder, 47 | source_len=s_t_len[0], 48 | target_len=s_t_len[1], 49 | input_names=input_names, output_names=output_names, 50 | ) 51 | 52 | return _sym_gen 53 | -------------------------------------------------------------------------------- /nmt/xutils.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | import sys 3 | import pickle 4 | 5 | import xconfig 6 | 7 | 8 | def get_gpu_number(): 9 | for i in range(100): 10 | try: 11 | mx.nd.zeros((1,), ctx=mx.gpu(i)) 12 | except: 13 | return i 14 | 15 | 16 | # Read from doc 17 | def read_content(path, max_read_line=sys.maxsize): 18 | content = [] 19 | count = 0 20 | with open(path, encoding='utf-8') as ins: 21 | while True: 22 | line = ins.readline() 23 | if not line: 24 | break 25 | count += 1 26 | if count > max_read_line: 27 | break 28 | line = line.strip() 29 | content.append(line.split(' ')) 30 | return content 31 | 32 | 33 | def load_vocab(path, special=None): 34 | """ 35 | Load vocab from file, the 0, 1, 2, 3 should be reserved for pad, , , 36 | :param path: the vocab 37 | :param special: 38 | :return: 39 | """ 40 | with open(path, 'rb') as f: 41 | vocab = pickle.load(f) 42 | 43 | if special: 44 | if not isinstance(special, dict): 45 | raise Exception('special words not instance of python dict') 46 | for word, idx in special.items(): 47 | if len(word) == 0: 48 | continue 49 | if word == '\n' or word == ' ': 50 | continue 51 | if word not in vocab: 52 | vocab[word] = idx 53 | return vocab 54 | 55 | 56 | def sentence2id(sentence, the_vocab): 57 | words = list(sentence) 58 | words = [the_vocab[w] if w in the_vocab else the_vocab[xconfig.unk_word] for w in words if len(w) > 0] 59 | return words 60 | 61 | 62 | def word2id(word, the_vocab): 63 | return the_vocab[word] if word in the_vocab else the_vocab[xconfig.unk_word] 64 | -------------------------------------------------------------------------------- /trainingLog.txt: -------------------------------------------------------------------------------- 1 | C:\Anaconda3\python.exe D:/users/home/Projects/mxnmt/nmt/main.py 2 | 20:05:35 INFO:root:Config: 3 | batch_size: 128 4 | beam_size: 12 5 | bleu_ref_number: 7 6 | bos_word: 7 | bucket_stride: 10 8 | buckets: [(10, 10), (10, 20), (10, 30), (10, 40), (10, 50), (10, 60), (20, 10), (20, 20), (20, 30), (20, 40), (20, 50), (20, 60), (30, 10), (30, 20), (30, 30), (30, 40), (30, 50), (30, 60), (40, 10), (40, 20), (40, 30), (40, 40), (40, 50), (40, 60), (50, 10), (50, 20), (50, 30), (50, 40), (50, 50), (50, 60), (60, 10), (60, 20), (60, 30), (60, 40), (60, 50), (60, 60)] 9 | checkpoint_freq_batch: 1000 10 | checkpoint_name: D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 11 | data_root: D:\users\home\Projects\mxnmt\IWSLT 12 | dev_max_samples: 100000 13 | dev_output: D:\users\home\Projects\mxnmt\IWSLT\dev\dev.out 14 | dev_source: D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt 15 | dev_target: D:\users\home\Projects\mxnmt\IWSLT\invalid\invalid 16 | dropout: 0.5 17 | eos_word: 18 | eval_per_x_batch: 400 19 | eval_start_epoch: 4 20 | greedy_batch_size: 32 21 | learning_rate: 1 22 | log_root: D:\users\home\Projects\mxnmt\IWSLT\log 23 | max_decode_len: 15 24 | model_root: D:\users\home\Projects\mxnmt\IWSLT\model 25 | model_save_freq: 1 26 | model_save_name: D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt 27 | model_to_load_number: 1 28 | model_to_load_prefix: D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt 29 | momentum: 0.1 30 | 
num_embed: 512 31 | num_epoch: 60 32 | num_hidden: 512 33 | num_lstm_layer: 1 34 | resume_model_number: 0 35 | resume_model_prefix: D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 36 | show_every_x_batch: 100 37 | source_root: D:\users\home\Projects\mxnmt 38 | source_vocab_path: D:\users\home\Projects\mxnmt\IWSLT\zh\zh.vocab.pkl 39 | special_words: {'': 2, '': 3, '': 1} 40 | target_vocab_path: D:\users\home\Projects\mxnmt\IWSLT\en\en.vocab.pkl 41 | test_device: gpu(0) 42 | test_gold: D:\users\home\Projects\mxnmt\IWSLT\test\IWSLT.test.txt 43 | test_output: D:\users\home\Projects\mxnmt\IWSLT\test\test.out 44 | test_source: D:\users\home\Projects\mxnmt\IWSLT\test\IWSLT.test.txt 45 | train_device: [gpu(0)] 46 | train_max_samples: 100000 47 | train_source: D:\users\home\Projects\mxnmt\IWSLT\zh\zh.txt 48 | train_target: D:\users\home\Projects\mxnmt\IWSLT\en\en.txt 49 | unk_word: 50 | use_batch_greedy_search: False 51 | use_beam_search: True 52 | use_resuming: True 53 | 54 | 20:05:35 INFO:root:In train mode. 55 | 20:05:36 INFO:root:source_vocab size: 9825 56 | 20:05:36 INFO:root:target_vocab size: 9413 57 | Summary of dataset ================== 58 | Total: 81819 in 36 buckets 59 | bucket of (10, 10) : 49266 samples 60 | bucket of (10, 20) : 15701 samples 61 | bucket of (10, 30) : 288 samples 62 | bucket of (10, 40) : 5 samples 63 | bucket of (10, 50) : 0 samples 64 | bucket of (10, 60) : 0 samples 65 | bucket of (20, 10) : 1039 samples 66 | bucket of (20, 20) : 10825 samples 67 | bucket of (20, 30) : 3126 samples 68 | bucket of (20, 40) : 203 samples 69 | bucket of (20, 50) : 10 samples 70 | bucket of (20, 60) : 1 samples 71 | bucket of (30, 10) : 1 samples 72 | bucket of (30, 20) : 118 samples 73 | bucket of (30, 30) : 752 samples 74 | bucket of (30, 40) : 269 samples 75 | bucket of (30, 50) : 38 samples 76 | bucket of (30, 60) : 2 samples 77 | bucket of (40, 10) : 0 samples 78 | bucket of (40, 20) : 0 samples 79 | bucket of (40, 30) : 10 samples 80 | bucket of (40, 40) : 43 samples 81 | bucket of (40, 50) : 31 samples 82 | bucket of (40, 60) : 2 samples 83 | bucket of (50, 10) : 0 samples 84 | bucket of (50, 20) : 0 samples 85 | bucket of (50, 30) : 0 samples 86 | bucket of (50, 40) : 4 samples 87 | bucket of (50, 50) : 25 samples 88 | bucket of (50, 60) : 15 samples 89 | bucket of (60, 10) : 0 samples 90 | bucket of (60, 20) : 0 samples 91 | bucket of (60, 30) : 0 samples 92 | bucket of (60, 40) : 0 samples 93 | bucket of (60, 50) : 10 samples 94 | bucket of (60, 60) : 18 samples 95 | D:\users\home\Projects\mxnmt\nmt\masked_bucket_io.py:239: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future 96 | bucket_plan = np.hstack([np.zeros(n, int) + i for i, n in enumerate(bucket_n_batches)]) 97 | 20:05:39 INFO:root:Try resuming from D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 0 98 | [20:05:39] D:\mxnet\dmlc-core\include\dmlc/logging.h:235: [20:05:39] D:\mxnet\dmlc-core\src\io\local_filesys.cc:154: Check failed: allow_null LocalFileSystem: fail to open "D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model-symbol.json" 99 | 20:05:39 ERROR:root:Resume failed. 
100 | 20:05:40 INFO:root:Start training with [gpu(0)] 101 | 20:06:24 INFO:root:Epoch[0] Batch [100] Speed: 346.40 samples/sec Train-Perplexity=677.456613 102 | 20:07:02 INFO:root:Epoch[0] Batch [200] Speed: 334.68 samples/sec Train-Perplexity=127.235813 103 | 20:07:40 INFO:root:Epoch[0] Batch [300] Speed: 336.18 samples/sec Train-Perplexity=87.583511 104 | 20:08:21 INFO:root:Epoch[0] Batch [400] Speed: 308.59 samples/sec Train-Perplexity=72.805057 105 | Too early to check BLEU at epoch 0 106 | 20:08:59 INFO:root:Epoch[0] Batch [500] Speed: 345.62 samples/sec Train-Perplexity=58.451281 107 | 20:09:38 INFO:root:Epoch[0] Batch [600] Speed: 324.52 samples/sec Train-Perplexity=54.911271 108 | 20:09:52 INFO:root:Epoch[0] Resetting Data Iterator 109 | 20:09:52 INFO:root:Epoch[0] Time cost=245.259 110 | 20:09:52 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0001.params" 111 | 20:10:28 INFO:root:Epoch[1] Batch [100] Speed: 357.79 samples/sec Train-Perplexity=43.068291 112 | 20:11:06 INFO:root:Epoch[1] Batch [200] Speed: 335.12 samples/sec Train-Perplexity=42.659636 113 | 20:11:44 INFO:root:Epoch[1] Batch [300] Speed: 335.83 samples/sec Train-Perplexity=38.860670 114 | 20:12:24 INFO:root:Epoch[1] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=37.914928 115 | Too early to check BLEU at epoch 1 116 | 20:13:01 INFO:root:Epoch[1] Batch [500] Speed: 345.75 samples/sec Train-Perplexity=33.331348 117 | 20:13:41 INFO:root:Epoch[1] Batch [600] Speed: 324.27 samples/sec Train-Perplexity=33.298229 118 | 20:13:55 INFO:root:Epoch[1] Resetting Data Iterator 119 | 20:13:55 INFO:root:Epoch[1] Time cost=242.353 120 | 20:13:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0002.params" 121 | 20:14:31 INFO:root:Epoch[2] Batch [100] Speed: 356.97 samples/sec Train-Perplexity=27.835325 122 | 20:15:09 INFO:root:Epoch[2] Batch [200] Speed: 335.24 samples/sec Train-Perplexity=28.387237 123 | 20:15:48 INFO:root:Epoch[2] Batch [300] Speed: 334.72 samples/sec Train-Perplexity=26.452638 124 | 20:16:28 INFO:root:Epoch[2] Batch [400] Speed: 316.15 samples/sec Train-Perplexity=26.782767 125 | Too early to check BLEU at epoch 2 126 | 20:17:05 INFO:root:Epoch[2] Batch [500] Speed: 345.44 samples/sec Train-Perplexity=23.714569 127 | 20:17:45 INFO:root:Epoch[2] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=24.313119 128 | 20:17:58 INFO:root:Epoch[2] Resetting Data Iterator 129 | 20:17:58 INFO:root:Epoch[2] Time cost=243.027 130 | 20:17:59 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0003.params" 131 | 20:18:35 INFO:root:Epoch[3] Batch [100] Speed: 357.69 samples/sec Train-Perplexity=20.603173 132 | 20:19:13 INFO:root:Epoch[3] Batch [200] Speed: 335.12 samples/sec Train-Perplexity=21.253643 133 | 20:19:51 INFO:root:Epoch[3] Batch [300] Speed: 335.51 samples/sec Train-Perplexity=20.253162 134 | Too early to check BLEU at epoch 3 135 | 20:20:31 INFO:root:Epoch[3] Batch [400] Speed: 320.28 samples/sec Train-Perplexity=20.852583 136 | 20:21:08 INFO:root:Epoch[3] Batch [500] Speed: 346.38 samples/sec Train-Perplexity=18.429250 137 | 20:21:48 INFO:root:Epoch[3] Batch [600] Speed: 325.04 samples/sec Train-Perplexity=19.102989 138 | 20:22:01 INFO:root:Epoch[3] Resetting Data Iterator 139 | 20:22:01 INFO:root:Epoch[3] Time cost=242.185 140 | 20:22:02 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0004.params" 141 | 20:22:38 INFO:root:Epoch[4] Batch [100] Speed: 358.60 
samples/sec Train-Perplexity=16.546370 142 | 20:23:16 INFO:root:Epoch[4] Batch [200] Speed: 335.66 samples/sec Train-Perplexity=17.048965 143 | 20:23:54 INFO:root:Epoch[4] Batch [300] Speed: 336.47 samples/sec Train-Perplexity=16.390798 144 | 20:24:34 INFO:root:Epoch[4] Batch [400] Speed: 320.39 samples/sec Train-Perplexity=17.157482 145 | 20:24:34 INFO:root:Checking BLEU for epoch 4 batch 400 146 | C:\Anaconda3\lib\site-packages\mxnet-0.7.0-py3.5.egg\mxnet\ndarray.py:531: RuntimeWarning: copy an array to itself, is it intended? 147 | RuntimeWarning) 148 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 149 | 20:26:39 INFO:root:b'1gram=65.61% 2gram=39.62% 3gram=27.79% 4gram=19.09% \r\nBP = 0.9639\r\nBLEU = 0.3303\r\n' 150 | 20:26:39 INFO:root:BLEU: 0.3303 @ epoch 4 batch 400 151 | 20:26:39 INFO:root:Current BLEU: 0.3303 > prev best -1.0 in epoch -1 152 | 20:26:39 INFO:root:Saving... 153 | 20:26:39 INFO:root:Saved checkpoint to "best_bleu-0005.params" 154 | 20:27:16 INFO:root:Epoch[4] Batch [500] Speed: 78.94 samples/sec Train-Perplexity=15.243541 155 | 20:27:55 INFO:root:Epoch[4] Batch [600] Speed: 325.14 samples/sec Train-Perplexity=15.828474 156 | 20:28:09 INFO:root:Epoch[4] Resetting Data Iterator 157 | 20:28:09 INFO:root:Epoch[4] Time cost=367.106 158 | 20:28:10 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0005.params" 159 | 20:28:45 INFO:root:Epoch[5] Batch [100] Speed: 357.54 samples/sec Train-Perplexity=13.908643 160 | 20:29:24 INFO:root:Epoch[5] Batch [200] Speed: 335.71 samples/sec Train-Perplexity=14.388900 161 | 20:30:02 INFO:root:Epoch[5] Batch [300] Speed: 336.51 samples/sec Train-Perplexity=13.905681 162 | 20:30:42 INFO:root:Epoch[5] Batch [400] Speed: 320.08 samples/sec Train-Perplexity=14.677042 163 | 20:30:42 INFO:root:Checking BLEU for epoch 5 batch 400 164 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 165 | 20:32:45 INFO:root:b'1gram=68.54% 2gram=42.38% 3gram=29.82% 4gram=21.34% \r\nBP = 0.9224\r\nBLEU = 0.3401\r\n' 166 | 20:32:45 INFO:root:BLEU: 0.3401 @ epoch 5 batch 400 167 | 20:32:45 INFO:root:Current BLEU: 0.3401 > prev best 0.3303 in epoch 4 168 | 20:32:45 INFO:root:Saving... 169 | 20:32:45 INFO:root:Saved checkpoint to "best_bleu-0006.params" 170 | 20:33:22 INFO:root:Epoch[5] Batch [500] Speed: 79.94 samples/sec Train-Perplexity=12.956247 171 | 20:34:01 INFO:root:Epoch[5] Batch [600] Speed: 323.94 samples/sec Train-Perplexity=13.587172 172 | 20:34:15 INFO:root:Epoch[5] Resetting Data Iterator 173 | 20:34:15 INFO:root:Epoch[5] Time cost=365.382 174 | 20:34:16 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0006.params" 175 | 20:34:52 INFO:root:Epoch[6] Batch [100] Speed: 357.26 samples/sec Train-Perplexity=11.914505 176 | 20:35:30 INFO:root:Epoch[6] Batch [200] Speed: 333.97 samples/sec Train-Perplexity=12.512574 177 | 20:36:08 INFO:root:Epoch[6] Batch [300] Speed: 335.77 samples/sec Train-Perplexity=12.020077 178 | 20:36:48 INFO:root:Epoch[6] Batch [400] Speed: 319.38 samples/sec Train-Perplexity=12.936626 179 | 20:36:48 INFO:root:Checking BLEU for epoch 6 batch 400 180 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
181 | 20:38:53 INFO:root:b'1gram=68.32% 2gram=42.65% 3gram=29.93% 4gram=20.46% \r\nBP = 0.9666\r\nBLEU = 0.3533\r\n' 182 | 20:38:53 INFO:root:BLEU: 0.3533 @ epoch 6 batch 400 183 | 20:38:53 INFO:root:Current BLEU: 0.3533 > prev best 0.3401 in epoch 5 184 | 20:38:53 INFO:root:Saving... 185 | 20:38:53 INFO:root:Saved checkpoint to "best_bleu-0007.params" 186 | 20:39:30 INFO:root:Epoch[6] Batch [500] Speed: 79.16 samples/sec Train-Perplexity=11.328022 187 | 20:40:09 INFO:root:Epoch[6] Batch [600] Speed: 324.32 samples/sec Train-Perplexity=11.931129 188 | 20:40:23 INFO:root:Epoch[6] Resetting Data Iterator 189 | 20:40:23 INFO:root:Epoch[6] Time cost=367.294 190 | 20:40:24 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0007.params" 191 | 20:40:59 INFO:root:Epoch[7] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=10.575803 192 | 20:41:38 INFO:root:Epoch[7] Batch [200] Speed: 334.78 samples/sec Train-Perplexity=11.017215 193 | 20:42:16 INFO:root:Epoch[7] Batch [300] Speed: 336.01 samples/sec Train-Perplexity=10.738582 194 | 20:42:56 INFO:root:Epoch[7] Batch [400] Speed: 319.64 samples/sec Train-Perplexity=11.594227 195 | 20:42:56 INFO:root:Checking BLEU for epoch 7 batch 400 196 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 197 | 20:44:57 INFO:root:b'1gram=71.05% 2gram=45.56% 3gram=33.06% 4gram=22.97% \r\nBP = 0.9349\r\nBLEU = 0.3702\r\n' 198 | 20:44:57 INFO:root:BLEU: 0.3702 @ epoch 7 batch 400 199 | 20:44:57 INFO:root:Current BLEU: 0.3702 > prev best 0.3533 in epoch 6 200 | 20:44:57 INFO:root:Saving... 201 | 20:44:58 INFO:root:Saved checkpoint to "best_bleu-0008.params" 202 | 20:45:35 INFO:root:Epoch[7] Batch [500] Speed: 80.60 samples/sec Train-Perplexity=10.112880 203 | 20:46:14 INFO:root:Epoch[7] Batch [600] Speed: 324.06 samples/sec Train-Perplexity=10.736643 204 | 20:46:28 INFO:root:Epoch[7] Resetting Data Iterator 205 | 20:46:28 INFO:root:Epoch[7] Time cost=364.244 206 | 20:46:28 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0008.params" 207 | 20:47:04 INFO:root:Epoch[8] Batch [100] Speed: 357.45 samples/sec Train-Perplexity=9.522396 208 | 20:47:43 INFO:root:Epoch[8] Batch [200] Speed: 334.64 samples/sec Train-Perplexity=9.972395 209 | 20:48:21 INFO:root:Epoch[8] Batch [300] Speed: 329.93 samples/sec Train-Perplexity=9.700864 210 | 20:49:01 INFO:root:Epoch[8] Batch [400] Speed: 319.50 samples/sec Train-Perplexity=10.634506 211 | 20:49:01 INFO:root:Checking BLEU for epoch 8 batch 400 212 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 213 | 20:51:06 INFO:root:b'1gram=70.02% 2gram=44.80% 3gram=32.07% 4gram=22.38% \r\nBP = 0.9731\r\nBLEU = 0.3769\r\n' 214 | 20:51:06 INFO:root:BLEU: 0.3769 @ epoch 8 batch 400 215 | 20:51:06 INFO:root:Current BLEU: 0.3769 > prev best 0.3702 in epoch 7 216 | 20:51:06 INFO:root:Saving... 
217 | 20:51:06 INFO:root:Saved checkpoint to "best_bleu-0009.params" 218 | 20:51:43 INFO:root:Epoch[8] Batch [500] Speed: 79.21 samples/sec Train-Perplexity=9.170730 219 | 20:52:23 INFO:root:Epoch[8] Batch [600] Speed: 323.98 samples/sec Train-Perplexity=9.790732 220 | 20:52:36 INFO:root:Epoch[8] Resetting Data Iterator 221 | 20:52:36 INFO:root:Epoch[8] Time cost=367.819 222 | 20:52:37 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0009.params" 223 | 20:53:13 INFO:root:Epoch[9] Batch [100] Speed: 357.66 samples/sec Train-Perplexity=8.732078 224 | 20:53:51 INFO:root:Epoch[9] Batch [200] Speed: 335.10 samples/sec Train-Perplexity=9.055381 225 | 20:54:29 INFO:root:Epoch[9] Batch [300] Speed: 336.43 samples/sec Train-Perplexity=8.890672 226 | 20:55:09 INFO:root:Epoch[9] Batch [400] Speed: 320.36 samples/sec Train-Perplexity=9.763253 227 | 20:55:09 INFO:root:Checking BLEU for epoch 9 batch 400 228 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 229 | 20:57:13 INFO:root:b'1gram=71.28% 2gram=46.02% 3gram=33.44% 4gram=23.75% \r\nBP = 0.9666\r\nBLEU = 0.3883\r\n' 230 | 20:57:13 INFO:root:BLEU: 0.3883 @ epoch 9 batch 400 231 | 20:57:13 INFO:root:Current BLEU: 0.3883 > prev best 0.3769 in epoch 8 232 | 20:57:13 INFO:root:Saving... 233 | 20:57:13 INFO:root:Saved checkpoint to "best_bleu-0010.params" 234 | 20:57:50 INFO:root:Epoch[9] Batch [500] Speed: 79.62 samples/sec Train-Perplexity=8.448180 235 | 20:58:29 INFO:root:Epoch[9] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=8.975397 236 | 20:58:43 INFO:root:Epoch[9] Resetting Data Iterator 237 | 20:58:43 INFO:root:Epoch[9] Time cost=366.024 238 | 20:58:44 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0010.params" 239 | 20:59:20 INFO:root:Epoch[10] Batch [100] Speed: 357.45 samples/sec Train-Perplexity=8.093092 240 | 20:59:58 INFO:root:Epoch[10] Batch [200] Speed: 334.54 samples/sec Train-Perplexity=8.395097 241 | 21:00:36 INFO:root:Epoch[10] Batch [300] Speed: 335.94 samples/sec Train-Perplexity=8.181741 242 | 21:01:16 INFO:root:Epoch[10] Batch [400] Speed: 319.47 samples/sec Train-Perplexity=9.018671 243 | 21:01:16 INFO:root:Checking BLEU for epoch 10 batch 400 244 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 245 | 21:03:19 INFO:root:b'1gram=71.81% 2gram=47.26% 3gram=34.34% 4gram=24.33% \r\nBP = 0.9623\r\nBLEU = 0.3949\r\n' 246 | 21:03:19 INFO:root:BLEU: 0.3949 @ epoch 10 batch 400 247 | 21:03:19 INFO:root:Current BLEU: 0.3949 > prev best 0.3883 in epoch 9 248 | 21:03:19 INFO:root:Saving... 
249 | 21:03:20 INFO:root:Saved checkpoint to "best_bleu-0011.params" 250 | 21:03:56 INFO:root:Epoch[10] Batch [500] Speed: 79.78 samples/sec Train-Perplexity=7.841984 251 | 21:04:37 INFO:root:Epoch[10] Batch [600] Speed: 318.43 samples/sec Train-Perplexity=8.340074 252 | 21:04:50 INFO:root:Epoch[10] Resetting Data Iterator 253 | 21:04:50 INFO:root:Epoch[10] Time cost=366.643 254 | 21:04:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0011.params" 255 | 21:05:27 INFO:root:Epoch[11] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=7.543190 256 | 21:06:05 INFO:root:Epoch[11] Batch [200] Speed: 335.21 samples/sec Train-Perplexity=7.860304 257 | 21:06:43 INFO:root:Epoch[11] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=7.613764 258 | 21:07:23 INFO:root:Epoch[11] Batch [400] Speed: 318.66 samples/sec Train-Perplexity=8.446283 259 | 21:07:23 INFO:root:Checking BLEU for epoch 11 batch 400 260 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 261 | 21:09:24 INFO:root:b'1gram=71.83% 2gram=46.38% 3gram=33.56% 4gram=23.98% \r\nBP = 0.9468\r\nBLEU = 0.3831\r\n' 262 | 21:09:24 INFO:root:BLEU: 0.3831 @ epoch 11 batch 400 263 | 21:10:01 INFO:root:Epoch[11] Batch [500] Speed: 81.12 samples/sec Train-Perplexity=7.340801 264 | 21:10:41 INFO:root:Epoch[11] Batch [600] Speed: 324.20 samples/sec Train-Perplexity=7.796961 265 | 21:10:54 INFO:root:Epoch[11] Resetting Data Iterator 266 | 21:10:54 INFO:root:Epoch[11] Time cost=363.285 267 | 21:10:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0012.params" 268 | 21:11:31 INFO:root:Epoch[12] Batch [100] Speed: 358.01 samples/sec Train-Perplexity=7.088733 269 | 21:12:09 INFO:root:Epoch[12] Batch [200] Speed: 334.66 samples/sec Train-Perplexity=7.353227 270 | 21:12:47 INFO:root:Epoch[12] Batch [300] Speed: 335.80 samples/sec Train-Perplexity=7.147846 271 | 21:13:27 INFO:root:Epoch[12] Batch [400] Speed: 319.75 samples/sec Train-Perplexity=8.068194 272 | 21:13:27 INFO:root:Checking BLEU for epoch 12 batch 400 273 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 274 | 21:15:30 INFO:root:b'1gram=72.21% 2gram=47.82% 3gram=35.17% 4gram=25.34% \r\nBP = 0.9582\r\nBLEU = 0.4014\r\n' 275 | 21:15:30 INFO:root:BLEU: 0.4014 @ epoch 12 batch 400 276 | 21:15:30 INFO:root:Current BLEU: 0.4014 > prev best 0.3949 in epoch 10 277 | 21:15:30 INFO:root:Saving... 
278 | 21:15:30 INFO:root:Saved checkpoint to "best_bleu-0013.params" 279 | 21:16:07 INFO:root:Epoch[12] Batch [500] Speed: 80.00 samples/sec Train-Perplexity=6.901814 280 | 21:16:47 INFO:root:Epoch[12] Batch [600] Speed: 323.91 samples/sec Train-Perplexity=7.349590 281 | 21:17:00 INFO:root:Epoch[12] Resetting Data Iterator 282 | 21:17:00 INFO:root:Epoch[12] Time cost=365.473 283 | 21:17:01 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0013.params" 284 | 21:17:37 INFO:root:Epoch[13] Batch [100] Speed: 357.26 samples/sec Train-Perplexity=6.668300 285 | 21:18:15 INFO:root:Epoch[13] Batch [200] Speed: 334.11 samples/sec Train-Perplexity=6.903907 286 | 21:18:53 INFO:root:Epoch[13] Batch [300] Speed: 334.95 samples/sec Train-Perplexity=6.779002 287 | 21:19:34 INFO:root:Epoch[13] Batch [400] Speed: 319.28 samples/sec Train-Perplexity=7.538552 288 | 21:19:34 INFO:root:Checking BLEU for epoch 13 batch 400 289 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 290 | 21:21:35 INFO:root:b'1gram=72.78% 2gram=48.05% 3gram=35.43% 4gram=25.77% \r\nBP = 0.9380\r\nBLEU = 0.3965\r\n' 291 | 21:21:35 INFO:root:BLEU: 0.3965 @ epoch 13 batch 400 292 | 21:22:12 INFO:root:Epoch[13] Batch [500] Speed: 80.77 samples/sec Train-Perplexity=6.545751 293 | 21:22:52 INFO:root:Epoch[13] Batch [600] Speed: 323.67 samples/sec Train-Perplexity=6.999107 294 | 21:23:05 INFO:root:Epoch[13] Resetting Data Iterator 295 | 21:23:05 INFO:root:Epoch[13] Time cost=364.298 296 | 21:23:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0014.params" 297 | 21:23:42 INFO:root:Epoch[14] Batch [100] Speed: 357.32 samples/sec Train-Perplexity=6.375863 298 | 21:24:20 INFO:root:Epoch[14] Batch [200] Speed: 334.81 samples/sec Train-Perplexity=6.594310 299 | 21:24:58 INFO:root:Epoch[14] Batch [300] Speed: 336.03 samples/sec Train-Perplexity=6.421297 300 | 21:25:38 INFO:root:Epoch[14] Batch [400] Speed: 319.55 samples/sec Train-Perplexity=7.225313 301 | 21:25:38 INFO:root:Checking BLEU for epoch 14 batch 400 302 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 303 | 21:27:41 INFO:root:b'1gram=71.98% 2gram=48.20% 3gram=36.14% 4gram=26.24% \r\nBP = 0.9393\r\nBLEU = 0.4001\r\n' 304 | 21:27:41 INFO:root:BLEU: 0.4001 @ epoch 14 batch 400 305 | 21:28:18 INFO:root:Epoch[14] Batch [500] Speed: 80.35 samples/sec Train-Perplexity=6.232595 306 | 21:28:57 INFO:root:Epoch[14] Batch [600] Speed: 324.25 samples/sec Train-Perplexity=6.626467 307 | 21:29:11 INFO:root:Epoch[14] Resetting Data Iterator 308 | 21:29:11 INFO:root:Epoch[14] Time cost=364.748 309 | 21:29:11 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0015.params" 310 | 21:29:47 INFO:root:Epoch[15] Batch [100] Speed: 358.29 samples/sec Train-Perplexity=6.163365 311 | 21:30:25 INFO:root:Epoch[15] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=6.239482 312 | 21:31:03 INFO:root:Epoch[15] Batch [300] Speed: 336.43 samples/sec Train-Perplexity=6.124430 313 | 21:31:44 INFO:root:Epoch[15] Batch [400] Speed: 318.66 samples/sec Train-Perplexity=6.912867 314 | 21:31:44 INFO:root:Checking BLEU for epoch 15 batch 400 315 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
316 | 21:33:47 INFO:root:b'1gram=72.72% 2gram=49.09% 3gram=36.69% 4gram=26.89% \r\nBP = 0.9610\r\nBLEU = 0.4163\r\n' 317 | 21:33:47 INFO:root:BLEU: 0.4163 @ epoch 15 batch 400 318 | 21:33:47 INFO:root:Current BLEU: 0.4163 > prev best 0.4014 in epoch 12 319 | 21:33:47 INFO:root:Saving... 320 | 21:33:47 INFO:root:Saved checkpoint to "best_bleu-0016.params" 321 | 21:34:24 INFO:root:Epoch[15] Batch [500] Speed: 79.87 samples/sec Train-Perplexity=5.964563 322 | 21:35:03 INFO:root:Epoch[15] Batch [600] Speed: 324.65 samples/sec Train-Perplexity=6.364883 323 | 21:35:17 INFO:root:Epoch[15] Resetting Data Iterator 324 | 21:35:17 INFO:root:Epoch[15] Time cost=365.577 325 | 21:35:18 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0016.params" 326 | 21:35:53 INFO:root:Epoch[16] Batch [100] Speed: 357.86 samples/sec Train-Perplexity=5.805708 327 | 21:36:32 INFO:root:Epoch[16] Batch [200] Speed: 334.97 samples/sec Train-Perplexity=5.979251 328 | 21:37:11 INFO:root:Epoch[16] Batch [300] Speed: 327.06 samples/sec Train-Perplexity=5.853375 329 | 21:37:51 INFO:root:Epoch[16] Batch [400] Speed: 319.71 samples/sec Train-Perplexity=6.650697 330 | 21:37:51 INFO:root:Checking BLEU for epoch 16 batch 400 331 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 332 | 21:39:56 INFO:root:b'1gram=71.67% 2gram=46.89% 3gram=34.49% 4gram=24.43% \r\nBP = 0.9840\r\nBLEU = 0.4036\r\n' 333 | 21:39:56 INFO:root:BLEU: 0.4036 @ epoch 16 batch 400 334 | 21:40:33 INFO:root:Epoch[16] Batch [500] Speed: 79.14 samples/sec Train-Perplexity=5.721799 335 | 21:41:12 INFO:root:Epoch[16] Batch [600] Speed: 324.26 samples/sec Train-Perplexity=6.081054 336 | 21:41:26 INFO:root:Epoch[16] Resetting Data Iterator 337 | 21:41:26 INFO:root:Epoch[16] Time cost=368.138 338 | 21:41:26 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0017.params" 339 | 21:42:02 INFO:root:Epoch[17] Batch [100] Speed: 357.51 samples/sec Train-Perplexity=5.597813 340 | 21:42:41 INFO:root:Epoch[17] Batch [200] Speed: 334.83 samples/sec Train-Perplexity=5.752815 341 | 21:43:19 INFO:root:Epoch[17] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=5.613985 342 | 21:43:59 INFO:root:Epoch[17] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=6.407925 343 | 21:43:59 INFO:root:Checking BLEU for epoch 17 batch 400 344 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
345 | 21:46:01 INFO:root:b'1gram=72.71% 2gram=48.73% 3gram=36.33% 4gram=26.43% \r\nBP = 0.9536\r\nBLEU = 0.4095\r\n' 346 | 21:46:01 INFO:root:BLEU: 0.4095 @ epoch 17 batch 400 347 | 21:46:38 INFO:root:Epoch[17] Batch [500] Speed: 80.48 samples/sec Train-Perplexity=5.507343 348 | 21:47:17 INFO:root:Epoch[17] Batch [600] Speed: 324.23 samples/sec Train-Perplexity=5.856936 349 | 21:47:31 INFO:root:Epoch[17] Resetting Data Iterator 350 | 21:47:31 INFO:root:Epoch[17] Time cost=364.515 351 | 21:47:32 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0018.params" 352 | 21:48:07 INFO:root:Epoch[18] Batch [100] Speed: 357.96 samples/sec Train-Perplexity=5.517313 353 | 21:48:46 INFO:root:Epoch[18] Batch [200] Speed: 335.06 samples/sec Train-Perplexity=5.540820 354 | 21:49:24 INFO:root:Epoch[18] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=5.418239 355 | 21:50:04 INFO:root:Epoch[18] Batch [400] Speed: 319.89 samples/sec Train-Perplexity=6.152498 356 | 21:50:04 INFO:root:Checking BLEU for epoch 18 batch 400 357 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 358 | 21:52:06 INFO:root:b'1gram=72.94% 2gram=48.72% 3gram=35.73% 4gram=24.87% \r\nBP = 0.9593\r\nBLEU = 0.4044\r\n' 359 | 21:52:06 INFO:root:BLEU: 0.4044 @ epoch 18 batch 400 360 | 21:52:43 INFO:root:Epoch[18] Batch [500] Speed: 80.39 samples/sec Train-Perplexity=5.301571 361 | 21:53:23 INFO:root:Epoch[18] Batch [600] Speed: 319.32 samples/sec Train-Perplexity=5.640023 362 | 21:53:37 INFO:root:Epoch[18] Resetting Data Iterator 363 | 21:53:37 INFO:root:Epoch[18] Time cost=365.165 364 | 21:53:37 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0019.params" 365 | 21:54:13 INFO:root:Epoch[19] Batch [100] Speed: 357.84 samples/sec Train-Perplexity=5.218759 366 | 21:54:52 INFO:root:Epoch[19] Batch [200] Speed: 334.61 samples/sec Train-Perplexity=5.312471 367 | 21:55:30 INFO:root:Epoch[19] Batch [300] Speed: 335.46 samples/sec Train-Perplexity=5.240229 368 | 21:56:10 INFO:root:Epoch[19] Batch [400] Speed: 319.12 samples/sec Train-Perplexity=5.939835 369 | 21:56:10 INFO:root:Checking BLEU for epoch 19 batch 400 370 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 371 | 21:58:14 INFO:root:b'1gram=71.92% 2gram=48.46% 3gram=35.67% 4gram=25.13% \r\nBP = 0.9816\r\nBLEU = 0.4127\r\n' 372 | 21:58:14 INFO:root:BLEU: 0.4127 @ epoch 19 batch 400 373 | 21:58:51 INFO:root:Epoch[19] Batch [500] Speed: 79.56 samples/sec Train-Perplexity=5.138349 374 | 21:59:30 INFO:root:Epoch[19] Batch [600] Speed: 324.29 samples/sec Train-Perplexity=5.474285 375 | 21:59:44 INFO:root:Epoch[19] Resetting Data Iterator 376 | 21:59:44 INFO:root:Epoch[19] Time cost=366.446 377 | 21:59:44 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0020.params" 378 | 22:00:20 INFO:root:Epoch[20] Batch [100] Speed: 357.59 samples/sec Train-Perplexity=5.156831 379 | 22:00:59 INFO:root:Epoch[20] Batch [200] Speed: 335.02 samples/sec Train-Perplexity=5.152715 380 | 22:01:37 INFO:root:Epoch[20] Batch [300] Speed: 335.62 samples/sec Train-Perplexity=5.069593 381 | 22:02:17 INFO:root:Epoch[20] Batch [400] Speed: 319.26 samples/sec Train-Perplexity=5.734110 382 | 22:02:17 INFO:root:Checking BLEU for epoch 20 batch 400 383 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
384 | 22:04:20 INFO:root:b'1gram=72.28% 2gram=48.18% 3gram=36.24% 4gram=26.46% \r\nBP = 0.9763\r\nBLEU = 0.4173\r\n' 385 | 22:04:20 INFO:root:BLEU: 0.4173 @ epoch 20 batch 400 386 | 22:04:20 INFO:root:Current BLEU: 0.4173 > prev best 0.4163 in epoch 15 387 | 22:04:20 INFO:root:Saving... 388 | 22:04:21 INFO:root:Saved checkpoint to "best_bleu-0021.params" 389 | 22:04:57 INFO:root:Epoch[20] Batch [500] Speed: 79.79 samples/sec Train-Perplexity=4.956359 390 | 22:05:37 INFO:root:Epoch[20] Batch [600] Speed: 324.48 samples/sec Train-Perplexity=5.288804 391 | 22:05:50 INFO:root:Epoch[20] Resetting Data Iterator 392 | 22:05:50 INFO:root:Epoch[20] Time cost=365.873 393 | 22:05:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0021.params" 394 | 22:06:27 INFO:root:Epoch[21] Batch [100] Speed: 358.00 samples/sec Train-Perplexity=4.901484 395 | 22:07:05 INFO:root:Epoch[21] Batch [200] Speed: 334.84 samples/sec Train-Perplexity=5.006587 396 | 22:07:43 INFO:root:Epoch[21] Batch [300] Speed: 336.02 samples/sec Train-Perplexity=4.877228 397 | 22:08:23 INFO:root:Epoch[21] Batch [400] Speed: 319.59 samples/sec Train-Perplexity=5.617798 398 | 22:08:23 INFO:root:Checking BLEU for epoch 21 batch 400 399 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 400 | 22:10:28 INFO:root:b'1gram=71.66% 2gram=47.92% 3gram=35.61% 4gram=25.55% \r\nBP = 0.9948\r\nBLEU = 0.4183\r\n' 401 | 22:10:28 INFO:root:BLEU: 0.4183 @ epoch 21 batch 400 402 | 22:10:28 INFO:root:Current BLEU: 0.4183 > prev best 0.4173 in epoch 20 403 | 22:10:28 INFO:root:Saving... 404 | 22:10:28 INFO:root:Saved checkpoint to "best_bleu-0022.params" 405 | 22:11:05 INFO:root:Epoch[21] Batch [500] Speed: 79.31 samples/sec Train-Perplexity=4.851527 406 | 22:11:44 INFO:root:Epoch[21] Batch [600] Speed: 324.12 samples/sec Train-Perplexity=5.100939 407 | 22:11:58 INFO:root:Epoch[21] Resetting Data Iterator 408 | 22:11:58 INFO:root:Epoch[21] Time cost=366.776 409 | 22:11:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0022.params" 410 | 22:12:34 INFO:root:Epoch[22] Batch [100] Speed: 357.98 samples/sec Train-Perplexity=4.722476 411 | 22:13:12 INFO:root:Epoch[22] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=4.857207 412 | 22:13:51 INFO:root:Epoch[22] Batch [300] Speed: 336.11 samples/sec Train-Perplexity=4.766189 413 | 22:14:31 INFO:root:Epoch[22] Batch [400] Speed: 320.01 samples/sec Train-Perplexity=5.428668 414 | 22:14:31 INFO:root:Checking BLEU for epoch 22 batch 400 415 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
416 | 22:16:34 INFO:root:b'1gram=72.78% 2gram=48.23% 3gram=35.50% 4gram=25.64% \r\nBP = 0.9771\r\nBLEU = 0.4131\r\n' 417 | 22:16:34 INFO:root:BLEU: 0.4131 @ epoch 22 batch 400 418 | 22:17:11 INFO:root:Epoch[22] Batch [500] Speed: 79.83 samples/sec Train-Perplexity=4.681073 419 | 22:17:50 INFO:root:Epoch[22] Batch [600] Speed: 324.54 samples/sec Train-Perplexity=4.956886 420 | 22:18:04 INFO:root:Epoch[22] Resetting Data Iterator 421 | 22:18:04 INFO:root:Epoch[22] Time cost=365.562 422 | 22:18:05 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0023.params" 423 | 22:18:41 INFO:root:Epoch[23] Batch [100] Speed: 358.29 samples/sec Train-Perplexity=4.682423 424 | 22:19:19 INFO:root:Epoch[23] Batch [200] Speed: 335.45 samples/sec Train-Perplexity=4.707169 425 | 22:19:57 INFO:root:Epoch[23] Batch [300] Speed: 336.35 samples/sec Train-Perplexity=4.628967 426 | 22:20:37 INFO:root:Epoch[23] Batch [400] Speed: 319.81 samples/sec Train-Perplexity=5.250182 427 | 22:20:37 INFO:root:Checking BLEU for epoch 23 batch 400 428 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 429 | 22:22:43 INFO:root:b'1gram=70.01% 2gram=46.34% 3gram=34.48% 4gram=24.87% \r\nBP = 1.0000\r\nBLEU = 0.4084\r\n' 430 | 22:22:43 INFO:root:BLEU: 0.4084 @ epoch 23 batch 400 431 | 22:23:19 INFO:root:Epoch[23] Batch [500] Speed: 78.69 samples/sec Train-Perplexity=4.552618 432 | 22:23:59 INFO:root:Epoch[23] Batch [600] Speed: 324.72 samples/sec Train-Perplexity=4.859983 433 | 22:24:12 INFO:root:Epoch[23] Resetting Data Iterator 434 | 22:24:12 INFO:root:Epoch[23] Time cost=367.811 435 | 22:24:13 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0024.params" 436 | 22:24:49 INFO:root:Epoch[24] Batch [100] Speed: 357.62 samples/sec Train-Perplexity=4.552113 437 | 22:25:28 INFO:root:Epoch[24] Batch [200] Speed: 328.03 samples/sec Train-Perplexity=4.613325 438 | 22:26:06 INFO:root:Epoch[24] Batch [300] Speed: 335.90 samples/sec Train-Perplexity=4.530597 439 | 22:26:46 INFO:root:Epoch[24] Batch [400] Speed: 320.08 samples/sec Train-Perplexity=5.147030 440 | 22:26:46 INFO:root:Checking BLEU for epoch 24 batch 400 441 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 442 | 22:28:49 INFO:root:b'1gram=71.94% 2gram=48.40% 3gram=36.03% 4gram=26.64% \r\nBP = 0.9717\r\nBLEU = 0.4155\r\n' 443 | 22:28:49 INFO:root:BLEU: 0.4155 @ epoch 24 batch 400 444 | 22:29:26 INFO:root:Epoch[24] Batch [500] Speed: 80.22 samples/sec Train-Perplexity=4.456267 445 | 22:30:05 INFO:root:Epoch[24] Batch [600] Speed: 324.49 samples/sec Train-Perplexity=4.723113 446 | 22:30:19 INFO:root:Epoch[24] Resetting Data Iterator 447 | 22:30:19 INFO:root:Epoch[24] Time cost=365.690 448 | 22:30:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0025.params" 449 | 22:30:55 INFO:root:Epoch[25] Batch [100] Speed: 357.99 samples/sec Train-Perplexity=4.392298 450 | 22:31:34 INFO:root:Epoch[25] Batch [200] Speed: 335.29 samples/sec Train-Perplexity=4.485758 451 | 22:32:12 INFO:root:Epoch[25] Batch [300] Speed: 336.13 samples/sec Train-Perplexity=4.399285 452 | 22:32:52 INFO:root:Epoch[25] Batch [400] Speed: 320.07 samples/sec Train-Perplexity=5.018728 453 | 22:32:52 INFO:root:Checking BLEU for epoch 25 batch 400 454 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
455 | 22:34:56 INFO:root:b'1gram=72.33% 2gram=48.85% 3gram=36.39% 4gram=26.22% \r\nBP = 0.9914\r\nBLEU = 0.4248\r\n' 456 | 22:34:56 INFO:root:BLEU: 0.4248 @ epoch 25 batch 400 457 | 22:34:56 INFO:root:Current BLEU: 0.4248 > prev best 0.4183 in epoch 21 458 | 22:34:56 INFO:root:Saving... 459 | 22:34:57 INFO:root:Saved checkpoint to "best_bleu-0026.params" 460 | 22:35:33 INFO:root:Epoch[25] Batch [500] Speed: 79.07 samples/sec Train-Perplexity=4.335405 461 | 22:36:13 INFO:root:Epoch[25] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=4.627439 462 | 22:36:27 INFO:root:Epoch[25] Resetting Data Iterator 463 | 22:36:27 INFO:root:Epoch[25] Time cost=367.088 464 | 22:36:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0026.params" 465 | 22:37:03 INFO:root:Epoch[26] Batch [100] Speed: 357.85 samples/sec Train-Perplexity=4.306752 466 | 22:37:41 INFO:root:Epoch[26] Batch [200] Speed: 334.62 samples/sec Train-Perplexity=4.379418 467 | 22:38:19 INFO:root:Epoch[26] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=4.283302 468 | 22:38:59 INFO:root:Epoch[26] Batch [400] Speed: 319.95 samples/sec Train-Perplexity=4.912712 469 | 22:38:59 INFO:root:Checking BLEU for epoch 26 batch 400 470 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 471 | 22:41:02 INFO:root:b'1gram=73.43% 2gram=49.63% 3gram=37.07% 4gram=27.09% \r\nBP = 0.9612\r\nBLEU = 0.4204\r\n' 472 | 22:41:02 INFO:root:BLEU: 0.4204 @ epoch 26 batch 400 473 | 22:41:40 INFO:root:Epoch[26] Batch [500] Speed: 79.92 samples/sec Train-Perplexity=4.257015 474 | 22:42:19 INFO:root:Epoch[26] Batch [600] Speed: 324.30 samples/sec Train-Perplexity=4.502108 475 | 22:42:33 INFO:root:Epoch[26] Resetting Data Iterator 476 | 22:42:33 INFO:root:Epoch[26] Time cost=365.570 477 | 22:42:33 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0027.params" 478 | 22:43:09 INFO:root:Epoch[27] Batch [100] Speed: 357.83 samples/sec Train-Perplexity=4.244413 479 | 22:43:48 INFO:root:Epoch[27] Batch [200] Speed: 335.25 samples/sec Train-Perplexity=4.262056 480 | 22:44:26 INFO:root:Epoch[27] Batch [300] Speed: 336.12 samples/sec Train-Perplexity=4.184889 481 | 22:45:06 INFO:root:Epoch[27] Batch [400] Speed: 319.83 samples/sec Train-Perplexity=4.801407 482 | 22:45:06 INFO:root:Checking BLEU for epoch 27 batch 400 483 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 
484 | 22:47:08 INFO:root:b'1gram=73.75% 2gram=49.58% 3gram=37.25% 4gram=27.27% \r\nBP = 0.9561\r\nBLEU = 0.4197\r\n' 485 | 22:47:08 INFO:root:BLEU: 0.4197 @ epoch 27 batch 400 486 | 22:47:44 INFO:root:Epoch[27] Batch [500] Speed: 80.66 samples/sec Train-Perplexity=4.152310 487 | 22:48:24 INFO:root:Epoch[27] Batch [600] Speed: 324.11 samples/sec Train-Perplexity=4.398443 488 | 22:48:37 INFO:root:Epoch[27] Resetting Data Iterator 489 | 22:48:37 INFO:root:Epoch[27] Time cost=364.017 490 | 22:48:38 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0028.params" 491 | 22:49:14 INFO:root:Epoch[28] Batch [100] Speed: 357.90 samples/sec Train-Perplexity=4.110479 492 | 22:49:52 INFO:root:Epoch[28] Batch [200] Speed: 334.73 samples/sec Train-Perplexity=4.186066 493 | 22:50:30 INFO:root:Epoch[28] Batch [300] Speed: 336.05 samples/sec Train-Perplexity=4.109250 494 | 22:51:10 INFO:root:Epoch[28] Batch [400] Speed: 319.06 samples/sec Train-Perplexity=4.731491 495 | 22:51:10 INFO:root:Checking BLEU for epoch 28 batch 400 496 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [244 ms]. 497 | 22:53:14 INFO:root:b'1gram=73.15% 2gram=49.07% 3gram=36.82% 4gram=26.62% \r\nBP = 0.9701\r\nBLEU = 0.4202\r\n' 498 | 22:53:14 INFO:root:BLEU: 0.4202 @ epoch 28 batch 400 499 | 22:53:51 INFO:root:Epoch[28] Batch [500] Speed: 79.84 samples/sec Train-Perplexity=4.068351 500 | 22:54:30 INFO:root:Epoch[28] Batch [600] Speed: 324.61 samples/sec Train-Perplexity=4.302435 501 | 22:54:44 INFO:root:Epoch[28] Resetting Data Iterator 502 | 22:54:44 INFO:root:Epoch[28] Time cost=365.732 503 | 22:54:45 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0029.params" 504 | 22:55:20 INFO:root:Epoch[29] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=4.101044 505 | 22:55:59 INFO:root:Epoch[29] Batch [200] Speed: 335.13 samples/sec Train-Perplexity=4.092799 506 | 22:56:37 INFO:root:Epoch[29] Batch [300] Speed: 336.05 samples/sec Train-Perplexity=4.027847 507 | 22:57:17 INFO:root:Epoch[29] Batch [400] Speed: 319.65 samples/sec Train-Perplexity=4.594741 508 | 22:57:17 INFO:root:Checking BLEU for epoch 29 batch 400 509 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 510 | 22:59:20 INFO:root:b'1gram=73.02% 2gram=49.75% 3gram=37.72% 4gram=27.57% \r\nBP = 0.9887\r\nBLEU = 0.4359\r\n' 511 | 22:59:20 INFO:root:BLEU: 0.4359 @ epoch 29 batch 400 512 | 22:59:20 INFO:root:Current BLEU: 0.4359 > prev best 0.4248 in epoch 25 513 | 22:59:20 INFO:root:Saving... 
514 | 22:59:20 INFO:root:Saved checkpoint to "best_bleu-0030.params" 515 | 22:59:57 INFO:root:Epoch[29] Batch [500] Speed: 79.82 samples/sec Train-Perplexity=3.977610 516 | 23:00:37 INFO:root:Epoch[29] Batch [600] Speed: 324.40 samples/sec Train-Perplexity=4.218835 517 | 23:00:50 INFO:root:Epoch[29] Resetting Data Iterator 518 | 23:00:50 INFO:root:Epoch[29] Time cost=365.691 519 | 23:00:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0030.params" 520 | 23:01:27 INFO:root:Epoch[30] Batch [100] Speed: 358.04 samples/sec Train-Perplexity=3.962719 521 | 23:02:05 INFO:root:Epoch[30] Batch [200] Speed: 334.57 samples/sec Train-Perplexity=4.003199 522 | 23:02:43 INFO:root:Epoch[30] Batch [300] Speed: 335.85 samples/sec Train-Perplexity=3.933537 523 | 23:03:23 INFO:root:Epoch[30] Batch [400] Speed: 319.46 samples/sec Train-Perplexity=4.507263 524 | 23:03:23 INFO:root:Checking BLEU for epoch 30 batch 400 525 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 526 | 23:05:27 INFO:root:b'1gram=72.85% 2gram=48.92% 3gram=36.23% 4gram=26.13% \r\nBP = 0.9853\r\nBLEU = 0.4223\r\n' 527 | 23:05:27 INFO:root:BLEU: 0.4223 @ epoch 30 batch 400 528 | 23:06:04 INFO:root:Epoch[30] Batch [500] Speed: 79.72 samples/sec Train-Perplexity=3.909418 529 | 23:06:43 INFO:root:Epoch[30] Batch [600] Speed: 324.64 samples/sec Train-Perplexity=4.146617 530 | 23:06:57 INFO:root:Epoch[30] Resetting Data Iterator 531 | 23:06:57 INFO:root:Epoch[30] Time cost=365.978 532 | 23:06:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0031.params" 533 | 23:07:33 INFO:root:Epoch[31] Batch [100] Speed: 357.55 samples/sec Train-Perplexity=3.881883 534 | 23:08:12 INFO:root:Epoch[31] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=3.929952 535 | 23:08:50 INFO:root:Epoch[31] Batch [300] Speed: 336.06 samples/sec Train-Perplexity=3.861903 536 | 23:09:30 INFO:root:Epoch[31] Batch [400] Speed: 319.83 samples/sec Train-Perplexity=4.402535 537 | 23:09:30 INFO:root:Checking BLEU for epoch 31 batch 400 538 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 539 | 23:11:34 INFO:root:b'1gram=72.48% 2gram=49.10% 3gram=36.89% 4gram=26.80% \r\nBP = 0.9837\r\nBLEU = 0.4261\r\n' 540 | 23:11:34 INFO:root:BLEU: 0.4261 @ epoch 31 batch 400 541 | 23:12:11 INFO:root:Epoch[31] Batch [500] Speed: 79.59 samples/sec Train-Perplexity=3.829989 542 | 23:12:50 INFO:root:Epoch[31] Batch [600] Speed: 324.17 samples/sec Train-Perplexity=4.054468 543 | 23:13:04 INFO:root:Epoch[31] Resetting Data Iterator 544 | 23:13:04 INFO:root:Epoch[31] Time cost=366.171 545 | 23:13:04 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0032.params" 546 | 23:13:40 INFO:root:Epoch[32] Batch [100] Speed: 357.95 samples/sec Train-Perplexity=3.885665 547 | 23:14:19 INFO:root:Epoch[32] Batch [200] Speed: 331.46 samples/sec Train-Perplexity=3.854284 548 | 23:14:57 INFO:root:Epoch[32] Batch [300] Speed: 335.93 samples/sec Train-Perplexity=3.783954 549 | 23:15:37 INFO:root:Epoch[32] Batch [400] Speed: 319.36 samples/sec Train-Perplexity=4.335291 550 | 23:15:37 INFO:root:Checking BLEU for epoch 32 batch 400 551 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
552 | 23:17:41 INFO:root:b'1gram=72.07% 2gram=48.37% 3gram=36.08% 4gram=26.34% \r\nBP = 0.9963\r\nBLEU = 0.4251\r\n' 553 | 23:17:41 INFO:root:BLEU: 0.4251 @ epoch 32 batch 400 554 | 23:18:18 INFO:root:Epoch[32] Batch [500] Speed: 79.54 samples/sec Train-Perplexity=3.787482 555 | 23:18:57 INFO:root:Epoch[32] Batch [600] Speed: 324.37 samples/sec Train-Perplexity=3.992663 556 | 23:19:11 INFO:root:Epoch[32] Resetting Data Iterator 557 | 23:19:11 INFO:root:Epoch[32] Time cost=366.729 558 | 23:19:12 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0033.params" 559 | 23:19:48 INFO:root:Epoch[33] Batch [100] Speed: 358.02 samples/sec Train-Perplexity=3.789431 560 | 23:20:26 INFO:root:Epoch[33] Batch [200] Speed: 335.04 samples/sec Train-Perplexity=3.802084 561 | 23:21:04 INFO:root:Epoch[33] Batch [300] Speed: 336.21 samples/sec Train-Perplexity=3.733409 562 | 23:21:44 INFO:root:Epoch[33] Batch [400] Speed: 319.87 samples/sec Train-Perplexity=4.257213 563 | 23:21:44 INFO:root:Checking BLEU for epoch 33 batch 400 564 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 565 | 23:23:48 INFO:root:b'1gram=71.47% 2gram=47.45% 3gram=35.10% 4gram=24.91% \r\nBP = 1.0000\r\nBLEU = 0.4150\r\n' 566 | 23:23:48 INFO:root:BLEU: 0.415 @ epoch 33 batch 400 567 | 23:24:25 INFO:root:Epoch[33] Batch [500] Speed: 79.44 samples/sec Train-Perplexity=3.702252 568 | 23:25:05 INFO:root:Epoch[33] Batch [600] Speed: 324.16 samples/sec Train-Perplexity=3.953748 569 | 23:25:18 INFO:root:Epoch[33] Resetting Data Iterator 570 | 23:25:18 INFO:root:Epoch[33] Time cost=366.485 571 | 23:25:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0034.params" 572 | 23:25:55 INFO:root:Epoch[34] Batch [100] Speed: 357.78 samples/sec Train-Perplexity=3.712668 573 | 23:26:33 INFO:root:Epoch[34] Batch [200] Speed: 335.09 samples/sec Train-Perplexity=3.726853 574 | 23:27:11 INFO:root:Epoch[34] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=3.656434 575 | 23:27:51 INFO:root:Epoch[34] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=4.186915 576 | 23:27:51 INFO:root:Checking BLEU for epoch 34 batch 400 577 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 578 | 23:29:56 INFO:root:b'1gram=72.86% 2gram=49.52% 3gram=37.17% 4gram=27.17% \r\nBP = 0.9879\r\nBLEU = 0.4317\r\n' 579 | 23:29:56 INFO:root:BLEU: 0.4317 @ epoch 34 batch 400 580 | 23:30:33 INFO:root:Epoch[34] Batch [500] Speed: 78.88 samples/sec Train-Perplexity=3.626724 581 | 23:31:13 INFO:root:Epoch[34] Batch [600] Speed: 324.34 samples/sec Train-Perplexity=3.852287 582 | 23:31:27 INFO:root:Epoch[34] Resetting Data Iterator 583 | 23:31:27 INFO:root:Epoch[34] Time cost=367.663 584 | 23:31:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0035.params" 585 | 23:32:03 INFO:root:Epoch[35] Batch [100] Speed: 357.82 samples/sec Train-Perplexity=3.642621 586 | 23:32:41 INFO:root:Epoch[35] Batch [200] Speed: 334.81 samples/sec Train-Perplexity=3.670861 587 | 23:33:19 INFO:root:Epoch[35] Batch [300] Speed: 336.10 samples/sec Train-Perplexity=3.608510 588 | 23:34:00 INFO:root:Epoch[35] Batch [400] Speed: 319.11 samples/sec Train-Perplexity=4.110463 589 | 23:34:00 INFO:root:Checking BLEU for epoch 35 batch 400 590 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
591 | 23:36:04 INFO:root:b'1gram=73.04% 2gram=49.33% 3gram=37.01% 4gram=27.20% \r\nBP = 0.9956\r\nBLEU = 0.4345\r\n' 592 | 23:36:04 INFO:root:BLEU: 0.4345 @ epoch 35 batch 400 593 | 23:36:41 INFO:root:Epoch[35] Batch [500] Speed: 79.50 samples/sec Train-Perplexity=3.585162 594 | 23:37:20 INFO:root:Epoch[35] Batch [600] Speed: 324.01 samples/sec Train-Perplexity=3.779326 595 | 23:37:34 INFO:root:Epoch[35] Resetting Data Iterator 596 | 23:37:34 INFO:root:Epoch[35] Time cost=366.483 597 | 23:37:34 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0036.params" 598 | 23:38:10 INFO:root:Epoch[36] Batch [100] Speed: 357.39 samples/sec Train-Perplexity=3.574405 599 | 23:38:48 INFO:root:Epoch[36] Batch [200] Speed: 334.85 samples/sec Train-Perplexity=3.615811 600 | 23:39:27 INFO:root:Epoch[36] Batch [300] Speed: 335.90 samples/sec Train-Perplexity=3.536461 601 | 23:40:07 INFO:root:Epoch[36] Batch [400] Speed: 319.57 samples/sec Train-Perplexity=4.072223 602 | 23:40:07 INFO:root:Checking BLEU for epoch 36 batch 400 603 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 604 | 23:42:11 INFO:root:b'1gram=73.09% 2gram=49.54% 3gram=37.24% 4gram=27.26% \r\nBP = 0.9971\r\nBLEU = 0.4366\r\n' 605 | 23:42:11 INFO:root:BLEU: 0.4366 @ epoch 36 batch 400 606 | 23:42:11 INFO:root:Current BLEU: 0.4366 > prev best 0.4359 in epoch 29 607 | 23:42:11 INFO:root:Saving... 608 | 23:42:11 INFO:root:Saved checkpoint to "best_bleu-0037.params" 609 | 23:42:48 INFO:root:Epoch[36] Batch [500] Speed: 79.34 samples/sec Train-Perplexity=3.511627 610 | 23:43:27 INFO:root:Epoch[36] Batch [600] Speed: 324.38 samples/sec Train-Perplexity=3.713632 611 | 23:43:41 INFO:root:Epoch[36] Resetting Data Iterator 612 | 23:43:41 INFO:root:Epoch[36] Time cost=366.780 613 | 23:43:42 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0037.params" 614 | 23:44:18 INFO:root:Epoch[37] Batch [100] Speed: 357.87 samples/sec Train-Perplexity=3.533548 615 | 23:44:56 INFO:root:Epoch[37] Batch [200] Speed: 334.95 samples/sec Train-Perplexity=3.543121 616 | 23:45:34 INFO:root:Epoch[37] Batch [300] Speed: 335.64 samples/sec Train-Perplexity=3.484517 617 | 23:46:14 INFO:root:Epoch[37] Batch [400] Speed: 319.59 samples/sec Train-Perplexity=3.976594 618 | 23:46:14 INFO:root:Checking BLEU for epoch 37 batch 400 619 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
620 | 23:48:18 INFO:root:b'1gram=71.98% 2gram=47.77% 3gram=35.48% 4gram=25.23% \r\nBP = 0.9977\r\nBLEU = 0.4179\r\n' 621 | 23:48:18 INFO:root:BLEU: 0.4179 @ epoch 37 batch 400 622 | 23:48:55 INFO:root:Epoch[37] Batch [500] Speed: 79.58 samples/sec Train-Perplexity=3.486265 623 | 23:49:34 INFO:root:Epoch[37] Batch [600] Speed: 323.40 samples/sec Train-Perplexity=3.658202 624 | 23:49:48 INFO:root:Epoch[37] Resetting Data Iterator 625 | 23:49:48 INFO:root:Epoch[37] Time cost=366.380 626 | 23:49:49 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0038.params" 627 | 23:50:25 INFO:root:Epoch[38] Batch [100] Speed: 357.33 samples/sec Train-Perplexity=3.494332 628 | 23:51:03 INFO:root:Epoch[38] Batch [200] Speed: 334.01 samples/sec Train-Perplexity=3.510251 629 | 23:51:41 INFO:root:Epoch[38] Batch [300] Speed: 335.70 samples/sec Train-Perplexity=3.437536 630 | 23:52:21 INFO:root:Epoch[38] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=3.915965 631 | 23:52:21 INFO:root:Checking BLEU for epoch 38 batch 400 632 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 633 | 23:54:24 INFO:root:b'1gram=73.34% 2gram=49.38% 3gram=37.60% 4gram=27.72% \r\nBP = 0.9792\r\nBLEU = 0.4316\r\n' 634 | 23:54:24 INFO:root:BLEU: 0.4316 @ epoch 38 batch 400 635 | 23:55:01 INFO:root:Epoch[38] Batch [500] Speed: 80.05 samples/sec Train-Perplexity=3.412933 636 | 23:55:41 INFO:root:Epoch[38] Batch [600] Speed: 324.22 samples/sec Train-Perplexity=3.610383 637 | 23:55:54 INFO:root:Epoch[38] Resetting Data Iterator 638 | 23:55:54 INFO:root:Epoch[38] Time cost=365.524 639 | 23:55:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0039.params" 640 | 23:56:31 INFO:root:Epoch[39] Batch [100] Speed: 357.86 samples/sec Train-Perplexity=3.410397 641 | 23:57:09 INFO:root:Epoch[39] Batch [200] Speed: 334.67 samples/sec Train-Perplexity=3.453946 642 | 23:57:47 INFO:root:Epoch[39] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=3.392457 643 | 23:58:27 INFO:root:Epoch[39] Batch [400] Speed: 319.98 samples/sec Train-Perplexity=3.868850 644 | 23:58:27 INFO:root:Checking BLEU for epoch 39 batch 400 645 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 646 | 00:00:32 INFO:root:b'1gram=72.16% 2gram=47.99% 3gram=35.59% 4gram=25.67% \r\nBP = 1.0000\r\nBLEU = 0.4217\r\n' 647 | 00:00:32 INFO:root:BLEU: 0.4217 @ epoch 39 batch 400 648 | 00:01:09 INFO:root:Epoch[39] Batch [500] Speed: 79.26 samples/sec Train-Perplexity=3.392095 649 | 00:01:48 INFO:root:Epoch[39] Batch [600] Speed: 324.30 samples/sec Train-Perplexity=3.557561 650 | 00:02:02 INFO:root:Epoch[39] Resetting Data Iterator 651 | 00:02:02 INFO:root:Epoch[39] Time cost=366.865 652 | 00:02:02 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0040.params" 653 | 00:02:39 INFO:root:Epoch[40] Batch [100] Speed: 350.17 samples/sec Train-Perplexity=3.356947 654 | 00:03:17 INFO:root:Epoch[40] Batch [200] Speed: 334.80 samples/sec Train-Perplexity=3.408309 655 | 00:03:56 INFO:root:Epoch[40] Batch [300] Speed: 335.42 samples/sec Train-Perplexity=3.341321 656 | 00:04:36 INFO:root:Epoch[40] Batch [400] Speed: 319.48 samples/sec Train-Perplexity=3.813529 657 | 00:04:36 INFO:root:Checking BLEU for epoch 40 batch 400 658 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [244 ms]. 
659 | 00:06:40 INFO:root:b'1gram=73.21% 2gram=49.27% 3gram=37.03% 4gram=27.07% \r\nBP = 0.9908\r\nBLEU = 0.4321\r\n' 660 | 00:06:40 INFO:root:BLEU: 0.4321 @ epoch 40 batch 400 661 | 00:07:16 INFO:root:Epoch[40] Batch [500] Speed: 79.59 samples/sec Train-Perplexity=3.342658 662 | 00:07:56 INFO:root:Epoch[40] Batch [600] Speed: 324.20 samples/sec Train-Perplexity=3.529993 663 | 00:08:10 INFO:root:Epoch[40] Resetting Data Iterator 664 | 00:08:10 INFO:root:Epoch[40] Time cost=367.100 665 | 00:08:10 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0041.params" 666 | 00:08:46 INFO:root:Epoch[41] Batch [100] Speed: 357.20 samples/sec Train-Perplexity=3.363024 667 | 00:09:24 INFO:root:Epoch[41] Batch [200] Speed: 334.37 samples/sec Train-Perplexity=3.352243 668 | 00:10:03 INFO:root:Epoch[41] Batch [300] Speed: 335.77 samples/sec Train-Perplexity=3.289595 669 | 00:10:43 INFO:root:Epoch[41] Batch [400] Speed: 319.19 samples/sec Train-Perplexity=3.754321 670 | 00:10:43 INFO:root:Checking BLEU for epoch 41 batch 400 671 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 672 | 00:12:48 INFO:root:b'1gram=72.25% 2gram=49.30% 3gram=37.14% 4gram=26.91% \r\nBP = 1.0000\r\nBLEU = 0.4344\r\n' 673 | 00:12:48 INFO:root:BLEU: 0.4344 @ epoch 41 batch 400 674 | 00:13:25 INFO:root:Epoch[41] Batch [500] Speed: 79.01 samples/sec Train-Perplexity=3.291721 675 | 00:14:04 INFO:root:Epoch[41] Batch [600] Speed: 323.94 samples/sec Train-Perplexity=3.463519 676 | 00:14:18 INFO:root:Epoch[41] Resetting Data Iterator 677 | 00:14:18 INFO:root:Epoch[41] Time cost=367.667 678 | 00:14:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0042.params" 679 | 00:14:55 INFO:root:Epoch[42] Batch [100] Speed: 357.39 samples/sec Train-Perplexity=3.321909 680 | 00:15:33 INFO:root:Epoch[42] Batch [200] Speed: 334.74 samples/sec Train-Perplexity=3.308056 681 | 00:16:11 INFO:root:Epoch[42] Batch [300] Speed: 334.78 samples/sec Train-Perplexity=3.234425 682 | 00:16:51 INFO:root:Epoch[42] Batch [400] Speed: 319.44 samples/sec Train-Perplexity=3.724314 683 | 00:16:51 INFO:root:Checking BLEU for epoch 42 batch 400 684 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 685 | 00:18:58 INFO:root:b'1gram=70.86% 2gram=47.49% 3gram=35.56% 4gram=25.64% \r\nBP = 1.0000\r\nBLEU = 0.4185\r\n' 686 | 00:18:58 INFO:root:BLEU: 0.4185 @ epoch 42 batch 400 687 | 00:19:35 INFO:root:Epoch[42] Batch [500] Speed: 78.28 samples/sec Train-Perplexity=3.245776 688 | 00:20:14 INFO:root:Epoch[42] Batch [600] Speed: 323.44 samples/sec Train-Perplexity=3.407396 689 | 00:20:28 INFO:root:Epoch[42] Resetting Data Iterator 690 | 00:20:28 INFO:root:Epoch[42] Time cost=369.212 691 | 00:20:28 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0043.params" 692 | 00:21:04 INFO:root:Epoch[43] Batch [100] Speed: 356.92 samples/sec Train-Perplexity=3.250229 693 | 00:21:43 INFO:root:Epoch[43] Batch [200] Speed: 335.56 samples/sec Train-Perplexity=3.274649 694 | 00:22:21 INFO:root:Epoch[43] Batch [300] Speed: 336.32 samples/sec Train-Perplexity=3.214198 695 | 00:23:01 INFO:root:Epoch[43] Batch [400] Speed: 319.96 samples/sec Train-Perplexity=3.701373 696 | 00:23:01 INFO:root:Checking BLEU for epoch 43 batch 400 697 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
698 | 00:25:06 INFO:root:b'1gram=71.57% 2gram=48.42% 3gram=36.15% 4gram=26.46% \r\nBP = 1.0000\r\nBLEU = 0.4267\r\n' 699 | 00:25:06 INFO:root:BLEU: 0.4267 @ epoch 43 batch 400 700 | 00:25:43 INFO:root:Epoch[43] Batch [500] Speed: 79.03 samples/sec Train-Perplexity=3.202524 701 | 00:26:22 INFO:root:Epoch[43] Batch [600] Speed: 324.71 samples/sec Train-Perplexity=3.374590 702 | 00:26:36 INFO:root:Epoch[43] Resetting Data Iterator 703 | 00:26:36 INFO:root:Epoch[43] Time cost=367.202 704 | 00:26:36 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0044.params" 705 | 00:27:12 INFO:root:Epoch[44] Batch [100] Speed: 358.34 samples/sec Train-Perplexity=3.251368 706 | 00:27:50 INFO:root:Epoch[44] Batch [200] Speed: 335.11 samples/sec Train-Perplexity=3.230534 707 | 00:28:28 INFO:root:Epoch[44] Batch [300] Speed: 336.60 samples/sec Train-Perplexity=3.183112 708 | 00:29:08 INFO:root:Epoch[44] Batch [400] Speed: 320.27 samples/sec Train-Perplexity=3.616401 709 | 00:29:08 INFO:root:Checking BLEU for epoch 44 batch 400 710 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 711 | 00:31:50 INFO:root:b'1gram=72.44% 2gram=48.61% 3gram=36.49% 4gram=26.59% \r\nBP = 1.0000\r\nBLEU = 0.4299\r\n' 712 | 00:31:50 INFO:root:BLEU: 0.4299 @ epoch 44 batch 400 713 | 00:32:59 INFO:root:Epoch[44] Batch [500] Speed: 55.43 samples/sec Train-Perplexity=3.160131 714 | 00:33:43 INFO:root:Epoch[44] Batch [600] Speed: 294.19 samples/sec Train-Perplexity=3.314777 715 | 00:34:05 INFO:root:Epoch[44] Resetting Data Iterator 716 | 00:34:05 INFO:root:Epoch[44] Time cost=448.968 717 | 00:34:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0045.params" 718 | 00:35:15 INFO:root:Epoch[45] Batch [100] Speed: 187.53 samples/sec Train-Perplexity=3.168578 719 | 00:35:58 INFO:root:Epoch[45] Batch [200] Speed: 296.46 samples/sec Train-Perplexity=3.186525 720 | 00:36:37 INFO:root:Epoch[45] Batch [300] Speed: 331.03 samples/sec Train-Perplexity=3.124966 721 | 00:37:22 INFO:root:Epoch[45] Batch [400] Speed: 283.18 samples/sec Train-Perplexity=3.576049 722 | 00:37:22 INFO:root:Checking BLEU for epoch 45 batch 400 723 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 724 | 00:39:27 INFO:root:b'1gram=72.85% 2gram=48.60% 3gram=36.19% 4gram=26.58% \r\nBP = 1.0000\r\nBLEU = 0.4296\r\n' 725 | 00:39:27 INFO:root:BLEU: 0.4296 @ epoch 45 batch 400 726 | 00:40:04 INFO:root:Epoch[45] Batch [500] Speed: 78.93 samples/sec Train-Perplexity=3.120963 727 | 00:40:44 INFO:root:Epoch[45] Batch [600] Speed: 324.25 samples/sec Train-Perplexity=3.287161 728 | 00:40:57 INFO:root:Epoch[45] Resetting Data Iterator 729 | 00:40:57 INFO:root:Epoch[45] Time cost=410.759 730 | 00:40:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0046.params" 731 | 00:41:34 INFO:root:Epoch[46] Batch [100] Speed: 358.12 samples/sec Train-Perplexity=3.126530 732 | 00:42:12 INFO:root:Epoch[46] Batch [200] Speed: 335.34 samples/sec Train-Perplexity=3.139847 733 | 00:42:50 INFO:root:Epoch[46] Batch [300] Speed: 336.50 samples/sec Train-Perplexity=3.094464 734 | 00:43:30 INFO:root:Epoch[46] Batch [400] Speed: 318.95 samples/sec Train-Perplexity=3.523277 735 | 00:43:30 INFO:root:Checking BLEU for epoch 46 batch 400 736 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
737 | 00:45:35 INFO:root:b'1gram=72.20% 2gram=48.50% 3gram=35.88% 4gram=26.09% \r\nBP = 1.0000\r\nBLEU = 0.4255\r\n' 738 | 00:45:35 INFO:root:BLEU: 0.4255 @ epoch 46 batch 400 739 | 00:46:12 INFO:root:Epoch[46] Batch [500] Speed: 79.07 samples/sec Train-Perplexity=3.091579 740 | 00:46:51 INFO:root:Epoch[46] Batch [600] Speed: 324.51 samples/sec Train-Perplexity=3.243255 741 | 00:47:05 INFO:root:Epoch[46] Resetting Data Iterator 742 | 00:47:05 INFO:root:Epoch[46] Time cost=367.176 743 | 00:47:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0047.params" 744 | 00:47:42 INFO:root:Epoch[47] Batch [100] Speed: 358.25 samples/sec Train-Perplexity=3.127451 745 | 00:48:20 INFO:root:Epoch[47] Batch [200] Speed: 335.39 samples/sec Train-Perplexity=3.118381 746 | 00:48:58 INFO:root:Epoch[47] Batch [300] Speed: 336.53 samples/sec Train-Perplexity=3.064091 747 | 00:49:38 INFO:root:Epoch[47] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=3.486758 748 | 00:49:38 INFO:root:Checking BLEU for epoch 47 batch 400 749 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 750 | 00:51:41 INFO:root:b'1gram=73.16% 2gram=49.56% 3gram=37.50% 4gram=28.27% \r\nBP = 0.9755\r\nBLEU = 0.4319\r\n' 751 | 00:51:41 INFO:root:BLEU: 0.4319 @ epoch 47 batch 400 752 | 00:52:17 INFO:root:Epoch[47] Batch [500] Speed: 80.20 samples/sec Train-Perplexity=3.042889 753 | 00:52:57 INFO:root:Epoch[47] Batch [600] Speed: 324.67 samples/sec Train-Perplexity=3.219357 754 | 00:53:10 INFO:root:Epoch[47] Resetting Data Iterator 755 | 00:53:10 INFO:root:Epoch[47] Time cost=364.720 756 | 00:53:11 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0048.params" 757 | 00:53:47 INFO:root:Epoch[48] Batch [100] Speed: 358.36 samples/sec Train-Perplexity=3.065208 758 | 00:54:25 INFO:root:Epoch[48] Batch [200] Speed: 335.32 samples/sec Train-Perplexity=3.078973 759 | 00:55:03 INFO:root:Epoch[48] Batch [300] Speed: 336.66 samples/sec Train-Perplexity=3.022346 760 | 00:55:43 INFO:root:Epoch[48] Batch [400] Speed: 319.88 samples/sec Train-Perplexity=3.463543 761 | 00:55:43 INFO:root:Checking BLEU for epoch 48 batch 400 762 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 763 | 00:57:49 INFO:root:b'1gram=72.70% 2gram=49.23% 3gram=36.91% 4gram=27.22% \r\nBP = 1.0000\r\nBLEU = 0.4354\r\n' 764 | 00:57:49 INFO:root:BLEU: 0.4354 @ epoch 48 batch 400 765 | 00:58:26 INFO:root:Epoch[48] Batch [500] Speed: 78.62 samples/sec Train-Perplexity=3.017454 766 | 00:59:05 INFO:root:Epoch[48] Batch [600] Speed: 324.70 samples/sec Train-Perplexity=3.170958 767 | 00:59:19 INFO:root:Epoch[48] Resetting Data Iterator 768 | 00:59:19 INFO:root:Epoch[48] Time cost=367.905 769 | 00:59:20 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0049.params" 770 | 00:59:55 INFO:root:Epoch[49] Batch [100] Speed: 358.41 samples/sec Train-Perplexity=3.014912 771 | 01:00:34 INFO:root:Epoch[49] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=3.051303 772 | 01:01:12 INFO:root:Epoch[49] Batch [300] Speed: 336.77 samples/sec Train-Perplexity=2.991697 773 | 01:01:52 INFO:root:Epoch[49] Batch [400] Speed: 320.15 samples/sec Train-Perplexity=3.408235 774 | 01:01:52 INFO:root:Checking BLEU for epoch 49 batch 400 775 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
776 | 01:03:55 INFO:root:b'1gram=74.00% 2gram=49.88% 3gram=37.87% 4gram=28.41% \r\nBP = 0.9800\r\nBLEU = 0.4375\r\n' 777 | 01:03:55 INFO:root:BLEU: 0.4375 @ epoch 49 batch 400 778 | 01:03:55 INFO:root:Current BLEU: 0.4375 > prev best 0.4366 in epoch 36 779 | 01:03:55 INFO:root:Saving... 780 | 01:03:55 INFO:root:Saved checkpoint to "best_bleu-0050.params" 781 | 01:04:32 INFO:root:Epoch[49] Batch [500] Speed: 79.91 samples/sec Train-Perplexity=2.990173 782 | 01:05:11 INFO:root:Epoch[49] Batch [600] Speed: 324.86 samples/sec Train-Perplexity=3.145794 783 | 01:05:25 INFO:root:Epoch[49] Resetting Data Iterator 784 | 01:05:25 INFO:root:Epoch[49] Time cost=365.229 785 | 01:05:25 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0050.params" 786 | 01:06:01 INFO:root:Epoch[50] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=3.020203 787 | 01:06:40 INFO:root:Epoch[50] Batch [200] Speed: 335.67 samples/sec Train-Perplexity=3.019267 788 | 01:07:18 INFO:root:Epoch[50] Batch [300] Speed: 336.66 samples/sec Train-Perplexity=2.957691 789 | 01:07:58 INFO:root:Epoch[50] Batch [400] Speed: 315.11 samples/sec Train-Perplexity=3.374037 790 | 01:07:58 INFO:root:Checking BLEU for epoch 50 batch 400 791 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 792 | 01:10:01 INFO:root:b'1gram=73.25% 2gram=48.85% 3gram=36.41% 4gram=26.88% \r\nBP = 0.9887\r\nBLEU = 0.4277\r\n' 793 | 01:10:01 INFO:root:BLEU: 0.4277 @ epoch 50 batch 400 794 | 01:10:37 INFO:root:Epoch[50] Batch [500] Speed: 80.42 samples/sec Train-Perplexity=2.954035 795 | 01:11:17 INFO:root:Epoch[50] Batch [600] Speed: 324.71 samples/sec Train-Perplexity=3.105496 796 | 01:11:30 INFO:root:Epoch[50] Resetting Data Iterator 797 | 01:11:30 INFO:root:Epoch[50] Time cost=364.881 798 | 01:11:31 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0051.params" 799 | 01:12:07 INFO:root:Epoch[51] Batch [100] Speed: 358.36 samples/sec Train-Perplexity=2.987536 800 | 01:12:45 INFO:root:Epoch[51] Batch [200] Speed: 335.53 samples/sec Train-Perplexity=2.985420 801 | 01:13:23 INFO:root:Epoch[51] Batch [300] Speed: 336.94 samples/sec Train-Perplexity=2.933420 802 | 01:14:03 INFO:root:Epoch[51] Batch [400] Speed: 319.95 samples/sec Train-Perplexity=3.334243 803 | 01:14:03 INFO:root:Checking BLEU for epoch 51 batch 400 804 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
805 | 01:16:08 INFO:root:b'1gram=72.88% 2gram=49.27% 3gram=36.94% 4gram=27.25% \r\nBP = 1.0000\r\nBLEU = 0.4361\r\n' 806 | 01:16:08 INFO:root:BLEU: 0.4361 @ epoch 51 batch 400 807 | 01:16:45 INFO:root:Epoch[51] Batch [500] Speed: 79.19 samples/sec Train-Perplexity=2.927022 808 | 01:17:24 INFO:root:Epoch[51] Batch [600] Speed: 324.95 samples/sec Train-Perplexity=3.073325 809 | 01:17:38 INFO:root:Epoch[51] Resetting Data Iterator 810 | 01:17:38 INFO:root:Epoch[51] Time cost=366.619 811 | 01:17:38 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0052.params" 812 | 01:18:14 INFO:root:Epoch[52] Batch [100] Speed: 358.80 samples/sec Train-Perplexity=2.942163 813 | 01:18:52 INFO:root:Epoch[52] Batch [200] Speed: 335.97 samples/sec Train-Perplexity=2.951506 814 | 01:19:30 INFO:root:Epoch[52] Batch [300] Speed: 336.97 samples/sec Train-Perplexity=2.895919 815 | 01:20:10 INFO:root:Epoch[52] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=3.298124 816 | 01:20:10 INFO:root:Checking BLEU for epoch 52 batch 400 817 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 818 | 01:22:16 INFO:root:b'1gram=71.71% 2gram=47.97% 3gram=35.51% 4gram=26.02% \r\nBP = 1.0000\r\nBLEU = 0.4222\r\n' 819 | 01:22:16 INFO:root:BLEU: 0.4222 @ epoch 52 batch 400 820 | 01:22:52 INFO:root:Epoch[52] Batch [500] Speed: 79.03 samples/sec Train-Perplexity=2.905427 821 | 01:23:32 INFO:root:Epoch[52] Batch [600] Speed: 325.05 samples/sec Train-Perplexity=3.039335 822 | 01:23:46 INFO:root:Epoch[52] Resetting Data Iterator 823 | 01:23:46 INFO:root:Epoch[52] Time cost=367.316 824 | 01:23:46 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0053.params" 825 | 01:24:22 INFO:root:Epoch[53] Batch [100] Speed: 358.35 samples/sec Train-Perplexity=2.912539 826 | 01:25:00 INFO:root:Epoch[53] Batch [200] Speed: 335.07 samples/sec Train-Perplexity=2.912047 827 | 01:25:38 INFO:root:Epoch[53] Batch [300] Speed: 336.40 samples/sec Train-Perplexity=2.884444 828 | 01:26:18 INFO:root:Epoch[53] Batch [400] Speed: 319.80 samples/sec Train-Perplexity=3.274917 829 | 01:26:18 INFO:root:Checking BLEU for epoch 53 batch 400 830 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 831 | 01:28:23 INFO:root:b'1gram=72.65% 2gram=49.63% 3gram=37.62% 4gram=28.08% \r\nBP = 1.0000\r\nBLEU = 0.4418\r\n' 832 | 01:28:23 INFO:root:BLEU: 0.4418 @ epoch 53 batch 400 833 | 01:28:23 INFO:root:Current BLEU: 0.4418 > prev best 0.4375 in epoch 49 834 | 01:28:23 INFO:root:Saving... 
835 | 01:28:23 INFO:root:Saved checkpoint to "best_bleu-0054.params" 836 | 01:29:00 INFO:root:Epoch[53] Batch [500] Speed: 79.18 samples/sec Train-Perplexity=2.870809 837 | 01:29:40 INFO:root:Epoch[53] Batch [600] Speed: 324.26 samples/sec Train-Perplexity=3.016911 838 | 01:29:53 INFO:root:Epoch[53] Resetting Data Iterator 839 | 01:29:53 INFO:root:Epoch[53] Time cost=366.881 840 | 01:29:54 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0054.params" 841 | 01:30:30 INFO:root:Epoch[54] Batch [100] Speed: 358.38 samples/sec Train-Perplexity=2.883674 842 | 01:31:08 INFO:root:Epoch[54] Batch [200] Speed: 335.57 samples/sec Train-Perplexity=2.884527 843 | 01:31:46 INFO:root:Epoch[54] Batch [300] Speed: 336.49 samples/sec Train-Perplexity=2.842884 844 | 01:32:26 INFO:root:Epoch[54] Batch [400] Speed: 320.14 samples/sec Train-Perplexity=3.233647 845 | 01:32:26 INFO:root:Checking BLEU for epoch 54 batch 400 846 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 847 | 01:34:29 INFO:root:b'1gram=72.82% 2gram=48.45% 3gram=36.25% 4gram=26.67% \r\nBP = 0.9922\r\nBLEU = 0.4264\r\n' 848 | 01:34:30 INFO:root:BLEU: 0.4264 @ epoch 54 batch 400 849 | 01:35:06 INFO:root:Epoch[54] Batch [500] Speed: 79.87 samples/sec Train-Perplexity=2.852245 850 | 01:35:45 INFO:root:Epoch[54] Batch [600] Speed: 325.38 samples/sec Train-Perplexity=2.977500 851 | 01:35:59 INFO:root:Epoch[54] Resetting Data Iterator 852 | 01:35:59 INFO:root:Epoch[54] Time cost=365.198 853 | 01:36:00 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0055.params" 854 | 01:36:35 INFO:root:Epoch[55] Batch [100] Speed: 358.32 samples/sec Train-Perplexity=2.842367 855 | 01:37:14 INFO:root:Epoch[55] Batch [200] Speed: 336.32 samples/sec Train-Perplexity=2.863954 856 | 01:37:52 INFO:root:Epoch[55] Batch [300] Speed: 336.51 samples/sec Train-Perplexity=2.809389 857 | 01:38:32 INFO:root:Epoch[55] Batch [400] Speed: 320.76 samples/sec Train-Perplexity=3.203576 858 | 01:38:32 INFO:root:Checking BLEU for epoch 55 batch 400 859 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 860 | 01:40:37 INFO:root:b'1gram=71.48% 2gram=48.49% 3gram=36.17% 4gram=26.42% \r\nBP = 1.0000\r\nBLEU = 0.4266\r\n' 861 | 01:40:37 INFO:root:BLEU: 0.4266 @ epoch 55 batch 400 862 | 01:41:13 INFO:root:Epoch[55] Batch [500] Speed: 79.12 samples/sec Train-Perplexity=2.816309 863 | 01:41:53 INFO:root:Epoch[55] Batch [600] Speed: 325.54 samples/sec Train-Perplexity=2.954175 864 | 01:42:06 INFO:root:Epoch[55] Resetting Data Iterator 865 | 01:42:06 INFO:root:Epoch[55] Time cost=366.534 866 | 01:42:07 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0056.params" 867 | 01:42:43 INFO:root:Epoch[56] Batch [100] Speed: 359.04 samples/sec Train-Perplexity=2.832930 868 | 01:43:21 INFO:root:Epoch[56] Batch [200] Speed: 335.64 samples/sec Train-Perplexity=2.840716 869 | 01:43:59 INFO:root:Epoch[56] Batch [300] Speed: 336.90 samples/sec Train-Perplexity=2.801349 870 | 01:44:39 INFO:root:Epoch[56] Batch [400] Speed: 320.40 samples/sec Train-Perplexity=3.174728 871 | 01:44:39 INFO:root:Checking BLEU for epoch 56 batch 400 872 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
873 | 01:46:44 INFO:root:b'1gram=72.53% 2gram=48.84% 3gram=36.61% 4gram=26.60% \r\nBP = 1.0000\r\nBLEU = 0.4310\r\n' 874 | 01:46:44 INFO:root:BLEU: 0.431 @ epoch 56 batch 400 875 | 01:47:21 INFO:root:Epoch[56] Batch [500] Speed: 78.99 samples/sec Train-Perplexity=2.816300 876 | 01:48:00 INFO:root:Epoch[56] Batch [600] Speed: 325.36 samples/sec Train-Perplexity=2.920598 877 | 01:48:14 INFO:root:Epoch[56] Resetting Data Iterator 878 | 01:48:14 INFO:root:Epoch[56] Time cost=366.832 879 | 01:48:14 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0057.params" 880 | 01:48:50 INFO:root:Epoch[57] Batch [100] Speed: 358.55 samples/sec Train-Perplexity=2.874106 881 | 01:49:28 INFO:root:Epoch[57] Batch [200] Speed: 335.34 samples/sec Train-Perplexity=2.828585 882 | 01:50:06 INFO:root:Epoch[57] Batch [300] Speed: 337.02 samples/sec Train-Perplexity=2.769679 883 | 01:50:46 INFO:root:Epoch[57] Batch [400] Speed: 320.39 samples/sec Train-Perplexity=3.160472 884 | 01:50:46 INFO:root:Checking BLEU for epoch 57 batch 400 885 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 886 | 01:52:51 INFO:root:b'1gram=72.43% 2gram=49.31% 3gram=36.48% 4gram=26.66% \r\nBP = 1.0000\r\nBLEU = 0.4317\r\n' 887 | 01:52:51 INFO:root:BLEU: 0.4317 @ epoch 57 batch 400 888 | 01:53:27 INFO:root:Epoch[57] Batch [500] Speed: 79.45 samples/sec Train-Perplexity=2.769165 889 | 01:54:07 INFO:root:Epoch[57] Batch [600] Speed: 325.05 samples/sec Train-Perplexity=2.893226 890 | 01:54:20 INFO:root:Epoch[57] Resetting Data Iterator 891 | 01:54:20 INFO:root:Epoch[57] Time cost=366.027 892 | 01:54:21 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0058.params" 893 | 01:54:57 INFO:root:Epoch[58] Batch [100] Speed: 358.92 samples/sec Train-Perplexity=2.794343 894 | 01:55:35 INFO:root:Epoch[58] Batch [200] Speed: 336.01 samples/sec Train-Perplexity=2.797318 895 | 01:56:14 INFO:root:Epoch[58] Batch [300] Speed: 330.93 samples/sec Train-Perplexity=2.754186 896 | 01:56:53 INFO:root:Epoch[58] Batch [400] Speed: 321.14 samples/sec Train-Perplexity=3.105465 897 | 01:56:53 INFO:root:Checking BLEU for epoch 58 batch 400 898 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 899 | 01:58:57 INFO:root:b'1gram=72.48% 2gram=48.46% 3gram=36.07% 4gram=26.21% \r\nBP = 1.0000\r\nBLEU = 0.4269\r\n' 900 | 01:58:57 INFO:root:BLEU: 0.4269 @ epoch 58 batch 400 901 | 01:59:34 INFO:root:Epoch[58] Batch [500] Speed: 79.88 samples/sec Train-Perplexity=2.762731 902 | 02:00:13 INFO:root:Epoch[58] Batch [600] Speed: 326.43 samples/sec Train-Perplexity=2.879713 903 | 02:00:26 INFO:root:Epoch[58] Resetting Data Iterator 904 | 02:00:26 INFO:root:Epoch[58] Time cost=365.398 905 | 02:00:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0059.params" 906 | 02:01:03 INFO:root:Epoch[59] Batch [100] Speed: 359.87 samples/sec Train-Perplexity=2.757996 907 | 02:01:41 INFO:root:Epoch[59] Batch [200] Speed: 337.06 samples/sec Train-Perplexity=2.766533 908 | 02:02:19 INFO:root:Epoch[59] Batch [300] Speed: 337.16 samples/sec Train-Perplexity=2.724831 909 | 02:02:59 INFO:root:Epoch[59] Batch [400] Speed: 320.85 samples/sec Train-Perplexity=3.093099 910 | 02:02:59 INFO:root:Checking BLEU for epoch 59 batch 400 911 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
912 | 02:05:04 INFO:root:b'1gram=71.82% 2gram=48.89% 3gram=36.85% 4gram=27.27% \r\nBP = 1.0000\r\nBLEU = 0.4334\r\n' 913 | 02:05:04 INFO:root:BLEU: 0.4334 @ epoch 59 batch 400 914 | 02:05:41 INFO:root:Epoch[59] Batch [500] Speed: 79.01 samples/sec Train-Perplexity=2.736961 915 | 02:06:20 INFO:root:Epoch[59] Batch [600] Speed: 325.19 samples/sec Train-Perplexity=2.838531 916 | 02:06:34 INFO:root:Epoch[59] Resetting Data Iterator 917 | 02:06:34 INFO:root:Epoch[59] Time cost=366.495 918 | 02:06:34 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0060.params" 919 | 920 | Process finished with exit code 0 921 | --------------------------------------------------------------------------------