├── .gitignore ├── IWSLT ├── dev │ └── placeholder ├── en │ └── placeholder ├── log │ └── placeholder ├── model │ └── placeholder ├── test │ └── placeholder └── zh │ └── placeholder ├── README.md ├── mxwrap ├── __init__.py ├── attention │ ├── BasicAttention.py │ ├── ConcatAttention.py │ └── __init__.py ├── rnn │ ├── BaseCell.py │ ├── GRU.py │ ├── GRUv0.py │ ├── LSTM.py │ ├── SimpleRNN.py │ └── __init__.py └── seq2seq │ ├── __init__.py │ ├── decoder.py │ └── encoder.py ├── nmt ├── dict_gen.py ├── inference.py ├── inference_mask.py ├── main.py ├── masked_bucket_io.py ├── masked_bucket_io_new.py ├── tester.py ├── trainer.py ├── xcallback.py ├── xconfig.py ├── xmetric.py ├── xsymbol.py └── xutils.py └── trainingLog.txt /.gitignore: -------------------------------------------------------------------------------- 1 | dist 2 | *.egg-info 3 | build 4 | *.pyc 5 | *.paramas 6 | *.params -------------------------------------------------------------------------------- /IWSLT/dev/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/dev/placeholder -------------------------------------------------------------------------------- /IWSLT/en/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/en/placeholder -------------------------------------------------------------------------------- /IWSLT/log/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/log/placeholder -------------------------------------------------------------------------------- /IWSLT/model/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/model/placeholder -------------------------------------------------------------------------------- /IWSLT/test/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/test/placeholder -------------------------------------------------------------------------------- /IWSLT/zh/placeholder: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/IWSLT/zh/placeholder -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MXNMT: MXNet based Neural Machine Translation 2 | 3 | This is an implementation of seq2seq with attention for neural machine translation with MXNet. 4 | 5 | ## Warning: 6 | This repo is no longer maintained. 7 | I recommend https://github.com/magic282/PyTorch_seq2seq 8 | 9 | ## Data 10 | 11 | The current code uses IWSLT 2009 Chinese-English corpus as training, development and test data. Please request this data set or **use other available parallel corpus**. Data statistics, 12 | 13 | | training | dev | test | 14 | |----------|-----|------| 15 | | 81819 | 446 | 504 | 16 | 17 | ## Attention 18 | * This code does work with the latest mxnet. 
I made a new version with improved performance in the [next](https://github.com/magic282/MXNMT/tree/next) branch, and it runs with mxnet 0.9.5. However, that branch is not complete since it lacks the decoding part. **I would really appreciate contributions to this branch.** Also, I ***strongly*** recommend using this commit (138344683e65c87af20250e3f4cdcc5a72ac3cc5) of mxnet because of [this issue](https://github.com/dmlc/mxnet/issues/5816). 19 | * The author cannot distribute this dataset. **Emails requesting this dataset from the author will not be answered.** 20 | 21 | ### Dev/Test Data Format 22 | The IWSLT 2009 Ch-En dev/test sets provide 7 references per source sentence, for example: 23 | ``` 24 | 在 找 给 家里 人 的 礼物 . 25 | 26 | i 'm searching for some gifts for my family . 27 | i want to find something for my family as presents . 28 | i 'm about to buy some presents for my family . 29 | i 'd like to buy my family something as a gift . 30 | i 'm looking for a gift for my family . 31 | i 'm looking for a present for my family . 32 | i need a gift for my family . 33 | 有 $number 块 钱 以下 的 茶 吗 ? |||| {1 ||| 1 ||| one thousand ||| $number ||| 一千} 34 | 35 | do you have any tea under one thousand yen ? 36 | i 'd like to take a look at some tea cheaper than one thousand yen . 37 | is there any tea less than one thousand yen here ? 38 | i 'm looking for some tea under one thousand yen . 39 | do you have any tea lower than one thousand yen ? 40 | do you have any tea less than one thousand yen ? 41 | i would like to buy some tea cheaper than one thousand yen . 42 | ``` 43 | 44 | ## Result 45 | 46 | In my tests, this code achieves a 44.18 BLEU score (with beam search) on the IWSLT dev set, without post-processing, after 53 iterations. Specifically, 47 | `1gram=72.65% 2gram=49.63% 3gram=37.62% 4gram=28.08% BP = 1.0000 BLEU = 0.4418` 48 | 49 | 50 | ## Known Issues 51 | * Compatibility: the current version requires Python 3, since handling Chinese text encodings under Python 2 is error-prone. 52 | * In the attention part, `h.dot(U)` should be pre-computed; however, it does not seem to work properly when pre-computed. 53 | * The BLEU evaluator, an external exe file that is not included, should be replaced by the NLTK evaluator in the future (a sketch follows below). 54 | * The model can be modified to reach about 50 BLEU on this data set. 
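For the evaluator item above, a minimal multi-reference corpus BLEU sketch using NLTK's `corpus_bleu` could look like the following. The file names and layout (one tokenized hypothesis per line, seven parallel reference files matching the Dev/Test Data Format section) are placeholders, not files shipped with this repo:
```
# Hedged sketch: multi-reference corpus BLEU via NLTK.
# 'decode_output.txt' and 'ref0.txt' ... 'ref6.txt' are hypothetical file names.
from nltk.translate.bleu_score import corpus_bleu

def load_tokenized(path):
    with open(path, encoding='utf-8') as f:
        return [line.strip().split() for line in f]

hypotheses = load_tokenized('decode_output.txt')                   # one translation per line
references = [load_tokenized('ref%d.txt' % i) for i in range(7)]   # 7 reference files
list_of_references = list(zip(*references))                        # group the 7 refs per hypothesis

print('BLEU = %.4f' % corpus_bleu(list_of_references, hypotheses))
```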
55 | -------------------------------------------------------------------------------- /mxwrap/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/__init__.py -------------------------------------------------------------------------------- /mxwrap/attention/BasicAttention.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | 4 | class BasicAttention: 5 | def __init__(self, batch_size, attend_dim, state_dim): 6 | self.e_weight_W = mx.sym.Variable('energy_W_weight', shape=(state_dim, state_dim)) 7 | self.e_weight_U = mx.sym.Variable('energy_U_weight', shape=(attend_dim, state_dim)) 8 | self.e_weight_v = mx.sym.Variable('energy_v_bias', shape=(state_dim, 1)) 9 | self.batch_size = batch_size 10 | self.attend_dim = attend_dim 11 | self.state_dim = state_dim 12 | self.pre_compute_buf = {} 13 | 14 | def getHdotU(self, attended, idx): 15 | if idx not in self.pre_compute_buf: 16 | h = attended[idx] # (batch, attend_dim) 17 | expr = mx.sym.dot(h, self.e_weight_U, name='_energy_1_{0:03d}'.format(idx)) 18 | self.pre_compute_buf[idx] = expr 19 | return self.pre_compute_buf[idx] 20 | 21 | def attend(self, attended, concat_attended, state, attend_masks, use_masking): 22 | ''' 23 | 24 | :param attended: list [seq_len, (batch, attend_dim)] 25 | :param concat_attended: (batch, seq_len, attend_dim ) 26 | :param state: (batch, state_dim) 27 | :param attend_masks: list [seq_len, (batch, 1)] 28 | :param use_masking: boolean 29 | :return: 30 | ''' 31 | seq_len = len(attended) 32 | energy_all = [] 33 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_0') 34 | for idx in range(seq_len): 35 | h = attended[idx] # (batch, attend_dim) 36 | energy = pre_compute + mx.sym.dot(h, self.e_weight_U, 37 | name='_energy_1_{0:03d}'.format(idx)) # (batch, state_dim) 38 | # energy = pre_compute + self.getHdotU(attended, idx) 39 | energy = mx.sym.Activation(energy, act_type="tanh", 40 | name='_energy_2_{0:03d}'.format(idx)) # (batch, state_dim) 41 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_3_{0:03d}'.format(idx)) # (batch, 1) 42 | if use_masking: 43 | energy = energy * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-10000.0) # (batch, 1) 44 | energy_all.append(energy) 45 | 46 | all_energy = mx.sym.Concat(*energy_all, dim=1, name='_all_energy_1') # (batch, seq_len) 47 | 48 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (batch, seq_len) 49 | alpha = mx.sym.Reshape(data=alpha, shape=(self.batch_size, seq_len, 1), 50 | name='_alpha_2') # (batch, seq_len, 1) 51 | 52 | weighted_attended = mx.sym.broadcast_mul(alpha, concat_attended, 53 | name='_weighted_attended_1') # (batch, seq_len, attend_dim) 54 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=1, 55 | name='_weighted_attended_2') # (batch, attend_dim) 56 | return alpha, weighted_attended 57 | -------------------------------------------------------------------------------- /mxwrap/attention/ConcatAttention.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | 4 | class ConcatAttention: 5 | def __init__(self, batch_size, attend_dim, state_dim): 6 | self.e_weight_W = mx.sym.Variable('energy_W_weight', shape=(state_dim, state_dim)) 7 | self.e_weight_U = mx.sym.Variable('energy_U_weight', shape=(attend_dim, state_dim)) 8 | self.e_weight_v = 
mx.sym.Variable('energy_v_bias', shape=(state_dim, 1)) 9 | self.batch_size = batch_size 10 | self.attend_dim = attend_dim 11 | self.state_dim = state_dim 12 | 13 | def pre_compute(self, attended): 14 | seq_len = len(attended) 15 | res = [None for i in range(seq_len)] 16 | for idx in range(seq_len): 17 | h = attended[idx] 18 | res[idx] = mx.sym.dot(h, self.e_weight_U, name='_energy_1_{0:03d}'.format(idx)) 19 | return res 20 | 21 | def attend(self, source_pre_computed, attended, concat_attended, state, attend_masks, use_masking): 22 | ''' 23 | 24 | :param attended: list [seq_len, (batch, attend_dim)] 25 | :param concat_attended: (batch, seq_len, attend_dim ) 26 | :param state: (batch, state_dim) 27 | :param attend_masks: list [seq_len, (batch, 1)] 28 | :param use_masking: boolean 29 | :return: 30 | ''' 31 | seq_len = len(attended) 32 | energy_all = [] 33 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_0') 34 | for idx in range(seq_len): 35 | energy = pre_compute + source_pre_computed[idx] 36 | energy = mx.sym.Activation(energy, act_type="tanh", 37 | name='_energy_2_{0:03d}'.format(idx)) # (batch, state_dim) 38 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_3_{0:03d}'.format(idx)) # (batch, 1) 39 | if use_masking: 40 | energy = energy * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-10000.0) # (batch, 1) 41 | energy_all.append(energy) 42 | 43 | all_energy = mx.sym.Concat(*energy_all, dim=1, name='_all_energy_1') # (batch, seq_len) 44 | 45 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (batch, seq_len) 46 | alpha = mx.sym.Reshape(data=alpha, shape=(self.batch_size, seq_len, 1), 47 | name='_alpha_2') # (batch, seq_len, 1) 48 | 49 | weighted_attended = mx.sym.broadcast_mul(alpha, concat_attended, 50 | name='_weighted_attended_1') # (batch, seq_len, attend_dim) 51 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=1, 52 | name='_weighted_attended_2') # (batch, attend_dim) 53 | return alpha, weighted_attended 54 | 55 | def pre_compute_fast(self, attended): 56 | seq_len = len(attended) 57 | buf = [] 58 | for s in attended: 59 | buf.append(mx.sym.expand_dims(data=s, axis=0)) 60 | time_major_concat = mx.sym.concat(*buf, dim=0, name='time_major_concat') # (seq, batch, dim) 61 | time_major_concat = mx.sym.dot(time_major_concat, self.e_weight_U, name='_expr01') # (seq, batch, dim) 62 | return time_major_concat 63 | 64 | def attend_fast(self, source_pre_computed, seq_len, state, attend_masks, use_masking): 65 | ''' 66 | 67 | :param source_pre_computed: 68 | :param seq_len: 69 | :param state: 70 | :param attend_masks: 71 | :param use_masking: 72 | :return: 73 | ''' 74 | energy_all = [] 75 | pre_compute = mx.sym.dot(state, self.e_weight_W, name='_energy_00') # (batch, dim) 76 | pre_compute = mx.sym.expand_dims(data=pre_compute, axis=0, name='_energy_10') # (1, batch, dim) 77 | 78 | energy = mx.sym.broadcast_add(source_pre_computed, pre_compute, name='_b_add') 79 | energy = mx.sym.Activation(energy, act_type="tanh", name='_energy_20') # (seq, batch, dim) 80 | energy = mx.sym.dot(energy, self.e_weight_v, name='_energy_30') # (seq, batch, 1) 81 | energy = mx.sym.reshape(energy, shape=(seq_len, -1)) # (seq, batch) 82 | energy = mx.sym.split(energy, axis=0, num_outputs=seq_len, squeeze_axis=True) # [seq, (batch,)] 83 | 84 | for idx in range(seq_len): 85 | this_e = energy[idx] 86 | if use_masking: 87 | this_e = this_e * attend_masks[idx] + (1.0 - attend_masks[idx]) * (-1e6) # (batch,) 88 | this_e = mx.sym.expand_dims(data=this_e, axis=0, 
name='_this_e_10') 89 | energy_all.append(this_e) 90 | 91 | all_energy = mx.sym.Concat(*energy_all, dim=0, name='_all_energy_1') # (seq, batch) 92 | 93 | alpha = mx.sym.SoftmaxActivation(all_energy, name='_alpha_1') # (seq, batch) 94 | alpha = mx.sym.expand_dims(data=alpha, axis=2, name='_alpha_2') # (seq, batch, 1) 95 | 96 | weighted_attended = mx.sym.broadcast_mul(source_pre_computed, alpha, 97 | name='_weighted_attended_1') # (seq, batch, attend_dim) 98 | weighted_attended = mx.sym.sum(data=weighted_attended, axis=0, 99 | name='_weighted_attended_2') # (batch, attend_dim) 100 | return alpha, weighted_attended 101 | -------------------------------------------------------------------------------- /mxwrap/attention/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/attention/__init__.py -------------------------------------------------------------------------------- /mxwrap/rnn/BaseCell.py: -------------------------------------------------------------------------------- 1 | from abc import abstractmethod, ABCMeta 2 | 3 | 4 | class BaseCell(object): 5 | __metaclass__ = ABCMeta 6 | 7 | def __init__(self, *args, **kwargs): 8 | pass 9 | 10 | @abstractmethod 11 | def apply(self): 12 | raise NotImplementedError 13 | -------------------------------------------------------------------------------- /mxwrap/rnn/GRU.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | from .BaseCell import BaseCell 4 | 5 | 6 | class GRU(BaseCell): 7 | def __init__(self, name, num_hidden, **kwargs): 8 | super(BaseCell, self).__init__() 9 | self.name = name + '_GRU' 10 | self.num_hidden = num_hidden 11 | self.W = mx.sym.Variable("{0}_W_weight".format(self.name)) 12 | self.B = mx.sym.Variable("{0}_W_bias".format(self.name)) 13 | self.U = mx.sym.Variable("{0}_U_weight".format(self.name)) 14 | 15 | def apply(self, indata, prev_h, seqidx, mask=None): 16 | xW = mx.sym.FullyConnected(data=indata, 17 | weight=self.W, 18 | bias=self.B, 19 | num_hidden=self.num_hidden * 3, 20 | name="{0}_xW_{1}".format(self.name, seqidx) 21 | ) 22 | # hU = mx.sym.dot(prev_state.h, param.gru_U_weight) 23 | hU = mx.sym.FullyConnected(data=prev_h, 24 | weight=self.U, 25 | num_hidden=self.num_hidden * 3, 26 | no_bias=True, 27 | name="{0}_hU_{1}".format(self.name, seqidx) 28 | ) 29 | xW_s = mx.sym.split(num_outputs=3, data=xW) 30 | hU_s = mx.sym.split(num_outputs=3, data=hU) 31 | r = mx.sym.Activation(data=(xW_s[0] + hU_s[0]), act_type='sigmoid') 32 | z = mx.sym.Activation(data=(xW_s[1] + hU_s[1]), act_type='sigmoid') 33 | h1 = mx.sym.Activation(data=(xW_s[2] + r * hU_s[2]), act_type='tanh') 34 | 35 | h = (h1 - prev_h) * z + prev_h 36 | if mask: 37 | h = mx.sym.broadcast_mul(mask, h, name='bm_1') + mx.sym.broadcast_mul((1 - mask), prev_h, name='bm_2') 38 | return h 39 | -------------------------------------------------------------------------------- /mxwrap/rnn/GRUv0.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | GRUState = namedtuple("GRUState", ["h"]) 5 | GRUParam = namedtuple("GRUParam", ["gates_i2h_weight", "gates_i2h_bias", 6 | "gates_h2h_weight", "gates_h2h_bias", 7 | "trans_i2h_weight", "trans_i2h_bias", 8 | "trans_h2h_weight", "trans_h2h_bias"]) 9 | GRUModel = namedtuple("GRUModel", ["rnn_exec", 
"symbol", 10 | "init_states", "last_states", 11 | "seq_data", "seq_labels", "seq_outputs", 12 | "param_blocks"]) 13 | 14 | 15 | def gru(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.): 16 | """ 17 | GRU Cell symbol 18 | Reference: 19 | * Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural 20 | networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014). 21 | """ 22 | if dropout > 0.: 23 | indata = mx.sym.Dropout(data=indata, p=dropout) 24 | i2h = mx.sym.FullyConnected(data=indata, 25 | weight=param.gates_i2h_weight, 26 | bias=param.gates_i2h_bias, 27 | num_hidden=num_hidden * 2, 28 | name="t%d_l%d_gates_i2h" % (seqidx, layeridx)) 29 | h2h = mx.sym.FullyConnected(data=prev_state.h, 30 | weight=param.gates_h2h_weight, 31 | bias=param.gates_h2h_bias, 32 | num_hidden=num_hidden * 2, 33 | name="t%d_l%d_gates_h2h" % (seqidx, layeridx)) 34 | gates = i2h + h2h 35 | slice_gates = mx.sym.SliceChannel(gates, num_outputs=2, 36 | name="t%d_l%d_slice" % (seqidx, layeridx)) 37 | update_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid") 38 | reset_gate = mx.sym.Activation(slice_gates[1], act_type="sigmoid") 39 | # The transform part of GRU is a little magic 40 | htrans_i2h = mx.sym.FullyConnected(data=indata, 41 | weight=param.trans_i2h_weight, 42 | bias=param.trans_i2h_bias, 43 | num_hidden=num_hidden, 44 | name="t%d_l%d_trans_i2h" % (seqidx, layeridx)) 45 | h_after_reset = prev_state.h * reset_gate 46 | htrans_h2h = mx.sym.FullyConnected(data=h_after_reset, 47 | weight=param.trans_h2h_weight, 48 | bias=param.trans_h2h_bias, 49 | num_hidden=num_hidden, 50 | name="t%d_l%d_trans_i2h" % (seqidx, layeridx)) 51 | h_trans = htrans_i2h + htrans_h2h 52 | h_trans_active = mx.sym.Activation(h_trans, act_type="tanh") 53 | next_h = prev_state.h + update_gate * (h_trans_active - prev_state.h) 54 | return GRUState(h=next_h) 55 | -------------------------------------------------------------------------------- /mxwrap/rnn/LSTM.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | 5 | LSTMState = namedtuple("LSTMState", ["c", "h"]) 6 | LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias", 7 | "h2h_weight", "h2h_bias"]) 8 | LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol", 9 | "init_states", "last_states", 10 | "seq_data", "seq_labels", "seq_outputs", 11 | "param_blocks"]) 12 | 13 | 14 | def lstm(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.): 15 | """LSTM Cell symbol""" 16 | if dropout > 0.: 17 | indata = mx.sym.Dropout(data=indata, p=dropout) 18 | i2h = mx.sym.FullyConnected(data=indata, 19 | weight=param.i2h_weight, 20 | bias=param.i2h_bias, 21 | num_hidden=num_hidden * 4, 22 | name="t%d_l%d_i2h" % (seqidx, layeridx)) 23 | h2h = mx.sym.FullyConnected(data=prev_state.h, 24 | weight=param.h2h_weight, 25 | bias=param.h2h_bias, 26 | num_hidden=num_hidden * 4, 27 | name="t%d_l%d_h2h" % (seqidx, layeridx)) 28 | gates = i2h + h2h 29 | slice_gates = mx.sym.SliceChannel(gates, num_outputs=4, 30 | name="t%d_l%d_slice" % (seqidx, layeridx)) 31 | in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid") 32 | in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh") 33 | forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid") 34 | out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid") 35 | next_c = (forget_gate * prev_state.c) + (in_gate * in_transform) 36 | next_h = out_gate * 
mx.sym.Activation(next_c, act_type="tanh") 37 | return LSTMState(c=next_c, h=next_h) 38 | -------------------------------------------------------------------------------- /mxwrap/rnn/SimpleRNN.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from collections import namedtuple 3 | 4 | RNNState = namedtuple("RNNState", ["h"]) 5 | RNNParam = namedtuple("RNNParam", ["i2h_weight", "i2h_bias", 6 | "h2h_weight", "h2h_bias"]) 7 | RNNModel = namedtuple("RNNModel", ["rnn_exec", "symbol", 8 | "init_states", "last_states", 9 | "seq_data", "seq_labels", "seq_outputs", 10 | "param_blocks"]) 11 | 12 | 13 | def rnn(num_hidden, in_data, prev_state, param, seqidx, layeridx, dropout=0., batch_norm=False): 14 | if dropout > 0. : 15 | in_data = mx.sym.Dropout(data=in_data, p=dropout) 16 | i2h = mx.sym.FullyConnected(data=in_data, 17 | weight=param.i2h_weight, 18 | bias=param.i2h_bias, 19 | num_hidden=num_hidden, 20 | name="t%d_l%d_i2h" % (seqidx, layeridx)) 21 | h2h = mx.sym.FullyConnected(data=prev_state.h, 22 | weight=param.h2h_weight, 23 | bias=param.h2h_bias, 24 | num_hidden=num_hidden, 25 | name="t%d_l%d_h2h" % (seqidx, layeridx)) 26 | hidden = i2h + h2h 27 | 28 | hidden = mx.sym.Activation(data=hidden, act_type="tanh") 29 | if batch_norm == True: 30 | hidden = mx.sym.BatchNorm(data=hidden) 31 | return RNNState(h=hidden) -------------------------------------------------------------------------------- /mxwrap/rnn/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/rnn/__init__.py -------------------------------------------------------------------------------- /mxwrap/seq2seq/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/magic282/MXNMT/18b96a74e5891919e0363eef138cb09c3a0a2592/mxwrap/seq2seq/__init__.py -------------------------------------------------------------------------------- /mxwrap/seq2seq/decoder.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from ..rnn.GRU import GRU 4 | 5 | 6 | class GruAttentionDecoder(object): 7 | def __init__(self, use_masking, 8 | state_dim, 9 | input_dim, output_dim, 10 | vocab_size, embed_dim, 11 | dropout=0.0, num_of_layer=1, 12 | attention=None, **kwargs): 13 | self.use_masking = use_masking 14 | self.state_dim = state_dim 15 | self.input_dim = input_dim 16 | self.output_dim = output_dim 17 | self.vocab_size = vocab_size 18 | self.embed_dim = embed_dim 19 | self.dropout = dropout 20 | self.num_of_layer = num_of_layer 21 | self.attention = attention 22 | self.kwargs = kwargs 23 | self.gru = GRU('decode', self.state_dim) 24 | # declare variables 25 | self.embed_weight = mx.sym.Variable("target_embed_weight") 26 | self.cls_weight = mx.sym.Variable("target_cls_weight") 27 | self.cls_bias = mx.sym.Variable("target_cls_bias") 28 | self.init_weight = mx.sym.Variable("target_init_weight") 29 | self.init_bias = mx.sym.Variable("target_init_bias") 30 | 31 | def decode(self, target_len, encoded_for_init_state, encoded, encoded_mask): 32 | # last_encoded = encoded[-1] 33 | 34 | data = mx.sym.Variable('target') # target input data 35 | label = mx.sym.Variable('target_softmax_label') # target label data 36 | 37 | hidden_all = [None for _ in range(target_len)] 38 | context_all = [None for _ in range(target_len)] 39 | all_weights = [None for 
_ in range(target_len)] 40 | readout_all = [None for _ in range(target_len)] 41 | 42 | init_h = mx.sym.FullyConnected(data=encoded_for_init_state, num_hidden=self.state_dim * self.num_of_layer, 43 | weight=self.init_weight, bias=self.init_bias, name='init_fc') 44 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 45 | 46 | # embedding layer 47 | embed = mx.sym.Embedding(data=data, input_dim=self.vocab_size, 48 | weight=self.embed_weight, output_dim=self.embed_dim, name='target_embed') 49 | wordvec = mx.sym.split(data=embed, num_outputs=target_len, squeeze_axis=1) 50 | # split mask 51 | if self.use_masking: 52 | input_mask = mx.sym.Variable('target_mask') 53 | # masks = mx.sym.split(data=input_mask, num_outputs=target_len, name='sliced_target_mask') 54 | 55 | source_attention_pre_compute = self.attention.pre_compute_fast(encoded) 56 | 57 | for seq_idx in range(target_len): 58 | # mask = masks[seq_idx] if self.use_masking else None 59 | if seq_idx == 0: 60 | hidden_all[seq_idx] = init_h 61 | else: 62 | in_x = mx.sym.Concat(wordvec[seq_idx], context_all[seq_idx - 1]) 63 | # hidden_all[seq_idx] = self.gru.apply(in_x, hidden_all[seq_idx - 1], seq_idx, mask) 64 | hidden_all[seq_idx] = self.gru.apply(in_x, hidden_all[seq_idx - 1], seq_idx) 65 | 66 | weights, weighted_encoded = self.attention.attend_fast(source_pre_computed=source_attention_pre_compute, 67 | seq_len=len(encoded), 68 | state=hidden_all[seq_idx], 69 | attend_masks=encoded_mask, 70 | use_masking=True) 71 | context_all[seq_idx] = weighted_encoded 72 | all_weights[seq_idx] = weights 73 | readout_all[seq_idx] = mx.sym.Concat(wordvec[seq_idx], context_all[seq_idx], hidden_all[seq_idx]) 74 | 75 | hidden_concat = mx.sym.Concat(*readout_all, dim=0) 76 | pred = mx.sym.FullyConnected(data=hidden_concat, num_hidden=self.output_dim, 77 | weight=self.cls_weight, bias=self.cls_bias, name='target_pred') 78 | 79 | label = mx.sym.transpose(data=label) 80 | label = mx.sym.Reshape(data=label, shape=(-1,)) 81 | 82 | sm = mx.sym.SoftmaxOutput(data=pred, label=label, 83 | use_ignore=True, ignore_label=0, normalization='valid', 84 | name='target_softmax') 85 | return sm 86 | 87 | # loss = mx.sym.softmax_cross_entropy(pred, label) 88 | # loss = mx.sym.MakeLoss(loss) 89 | # return loss 90 | -------------------------------------------------------------------------------- /mxwrap/seq2seq/encoder.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | 3 | from ..rnn.GRU import GRU 4 | 5 | 6 | class BiDirectionalGruEncoder(object): 7 | def __init__(self, use_masking, 8 | state_dim, 9 | input_dim, output_dim, 10 | vocab_size, embed_dim, 11 | dropout=0.0, num_of_layer=1): 12 | self.use_masking = use_masking 13 | self.state_dim = state_dim 14 | self.input_dim = input_dim 15 | self.output_dim = output_dim 16 | self.vocab_size = vocab_size 17 | self.embed_dim = embed_dim 18 | self.dropout = dropout 19 | self.num_of_layer = num_of_layer 20 | # declare variables 21 | self.forward_gru = GRU('forward_source', self.state_dim) 22 | self.backward_gru = GRU('backward_source', self.state_dim) 23 | self.embed_weight = mx.sym.Variable("source_embed_weight") 24 | 25 | def encode(self, seq_len): 26 | data = mx.sym.Variable('source') # input data, source 27 | 28 | # embedding layer 29 | embed = mx.sym.Embedding(data=data, input_dim=self.vocab_size, 30 | weight=self.embed_weight, output_dim=self.embed_dim, name='source_embed') 31 | wordvec = mx.sym.split(data=embed, num_outputs=seq_len, 
squeeze_axis=1) 32 | 33 | # split mask 34 | if self.use_masking: 35 | input_mask = mx.sym.Variable('source_mask') 36 | enc_masks = mx.sym.split(data=input_mask, num_outputs=seq_len, squeeze_axis='False', 37 | name='sliced_source_mask') 38 | att_masks = mx.sym.split(data=input_mask, num_outputs=seq_len, squeeze_axis='True', 39 | name='sliced_source_mask') 40 | 41 | forward_hidden = [None for i in range(seq_len)] 42 | backward_hidden = [None for i in range(seq_len)] 43 | bi_hidden = [] 44 | for seq_idx in range(seq_len): 45 | word = wordvec[seq_idx] 46 | mask = enc_masks[seq_idx] if self.use_masking else None 47 | if seq_idx == 0: 48 | forward_hidden[seq_idx] = mx.sym.Variable("forward_source_l0_init_h") 49 | else: 50 | forward_hidden[seq_idx] = self.forward_gru.apply(word, forward_hidden[seq_idx - 1], seq_idx, mask) 51 | 52 | for seq_idx in range(seq_len - 1, -1, -1): 53 | word = wordvec[seq_idx] 54 | mask = enc_masks[seq_idx] if self.use_masking else None 55 | if seq_idx == seq_len - 1: 56 | backward_hidden[seq_idx] = mx.sym.Variable("backward_source_l0_init_h") 57 | else: 58 | backward_hidden[seq_idx] = self.backward_gru.apply(word, backward_hidden[seq_idx + 1], seq_idx, mask) 59 | 60 | # for seq_idx in range(self.seq_len): 61 | for f, b in zip(forward_hidden, backward_hidden): 62 | bi = mx.sym.Concat(f, b, dim=1) 63 | bi_hidden.append(bi) 64 | 65 | if self.use_masking: 66 | return forward_hidden, backward_hidden, bi_hidden, att_masks 67 | else: 68 | return forward_hidden, backward_hidden, bi_hidden 69 | -------------------------------------------------------------------------------- /nmt/dict_gen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import pickle 5 | import gzip 6 | import bz2 7 | import logging 8 | import os 9 | 10 | import numpy 11 | import tables 12 | 13 | from collections import Counter 14 | from operator import add 15 | from numpy.lib.stride_tricks import as_strided 16 | 17 | parser = argparse.ArgumentParser( 18 | description=""" 19 | This takes a list of .txt or .txt.gz files and does word counting and 20 | creating a dictionary (potentially limited by size). It uses this 21 | dictionary to binarize the text into a numeric format (replacing OOV 22 | words with 1) and create n-grams of a fixed size (padding the sentence 23 | with 0 for EOS and BOS markers as necessary). The n-gram data can be 24 | split up in a training and validation set. 25 | 26 | The n-grams are saved to HDF5 format whereas the dictionary, word counts 27 | and binarized text are all pickled Python objects. 28 | """, formatter_class=argparse.RawTextHelpFormatter) 29 | parser.add_argument("input", type=argparse.FileType('r', encoding='utf-8'), nargs="+", 30 | help="The input files") 31 | parser.add_argument("-b", "--binarized-text", default='binarized_text.pkl', 32 | help="the name of the pickled binarized text file") 33 | parser.add_argument("-d", "--dictionary", default='vocab.pkl', 34 | help="the name of the pickled binarized text file") 35 | parser.add_argument("-n", "--ngram", type=int, metavar="N", 36 | help="create n-grams") 37 | parser.add_argument("-v", "--vocab", type=int, metavar="N", 38 | help="limit vocabulary size to this number, which must " 39 | "include BOS/EOS and OOV markers") 40 | parser.add_argument("-p", "--pickle", action="store_true", 41 | help="pickle the text as a list of lists of ints") 42 | parser.add_argument("-s", "--split", type=float, metavar="N", 43 | help="create a validation set. 
If >= 1 take this many " 44 | "samples for the validation set, if < 1, take this " 45 | "fraction of the samples") 46 | parser.add_argument("-o", "--overwrite", action="store_true", 47 | help="overwrite earlier created files, also forces the " 48 | "program not to reuse count files") 49 | parser.add_argument("-e", "--each", action="store_true", 50 | help="output files for each separate input file") 51 | parser.add_argument("-c", "--count", action="store_true", 52 | help="save the word counts") 53 | parser.add_argument("-t", "--char", action="store_true", 54 | help="character-level processing") 55 | parser.add_argument("-l", "--lowercase", action="store_true", 56 | help="lowercase") 57 | 58 | 59 | def open_files(): 60 | base_filenames = [] 61 | for i, input_file in enumerate(args.input): 62 | dirname, filename = os.path.split(input_file.name) 63 | if filename.split(os.extsep)[-1] == 'gz': 64 | base_filename = filename.rstrip('.gz') 65 | elif filename.split(os.extsep)[-1] == 'bz2': 66 | base_filename = filename.rstrip('.bz2') 67 | else: 68 | base_filename = filename 69 | if base_filename.split(os.extsep)[-1] == 'txt': 70 | base_filename = base_filename.rstrip('.txt') 71 | if filename.split(os.extsep)[-1] == 'gz': 72 | args.input[i] = gzip.GzipFile(input_file.name, input_file.mode, 73 | 9, input_file) 74 | elif filename.split(os.extsep)[-1] == 'bz2': 75 | args.input[i] = bz2.BZ2File(input_file.name, input_file.mode) 76 | base_filenames.append(base_filename) 77 | return base_filenames 78 | 79 | 80 | def safe_pickle(obj, filename): 81 | if os.path.isfile(filename) and not args.overwrite: 82 | logger.warning("Not saving %s, already exists." % (filename)) 83 | else: 84 | if os.path.isfile(filename): 85 | logger.info("Overwriting %s." % filename) 86 | else: 87 | logger.info("Saving to %s." % filename) 88 | with open(filename, 'wb') as f: 89 | pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL) 90 | 91 | 92 | def safe_hdf(array, name): 93 | if os.path.isfile(name + '.hdf') and not args.overwrite: 94 | logger.warning("Not saving %s, already exists." % (name + '.hdf')) 95 | else: 96 | if os.path.isfile(name + '.hdf'): 97 | logger.info("Overwriting %s." % (name + '.hdf')) 98 | else: 99 | logger.info("Saving to %s." 
% (name + '.hdf')) 100 | with tables.openFile(name + '.hdf', 'w') as f: 101 | atom = tables.Atom.from_dtype(array.dtype) 102 | filters = tables.Filters(complib='blosc', complevel=5) 103 | ds = f.createCArray(f.root, name.replace('.', ''), atom, 104 | array.shape, filters=filters) 105 | ds[:] = array 106 | 107 | 108 | def create_dictionary(): 109 | # Part I: Counting the words 110 | counters = [] 111 | sentence_counts = [] 112 | global_counter = Counter() 113 | 114 | for input_file, base_filename in zip(args.input, base_filenames): 115 | count_filename = base_filename + '.count.pkl' 116 | input_filename = os.path.basename(input_file.name) 117 | if os.path.isfile(count_filename) and not args.overwrite: 118 | logger.info("Loading word counts for %s from %s" 119 | % (input_filename, count_filename)) 120 | with open(count_filename, 'rb') as f: 121 | counter = pickle.load(f) 122 | sentence_count = sum([1 for line in input_file]) 123 | else: 124 | logger.info("Counting words in %s" % input_filename) 125 | counter = Counter() 126 | sentence_count = 0 127 | for line in input_file: 128 | if args.lowercase: 129 | line = line.lower() 130 | words = None 131 | if args.char: 132 | words = list(line.strip().decode('utf-8')) 133 | else: 134 | words = line.strip().split(' ') 135 | counter.update(words) 136 | global_counter.update(words) 137 | sentence_count += 1 138 | counters.append(counter) 139 | sentence_counts.append(sentence_count) 140 | logger.info("%d unique words in %d sentences with a total of %d words." 141 | % (len(counter), sentence_count, sum(counter.values()))) 142 | if args.each and args.count: 143 | safe_pickle(counter, count_filename) 144 | input_file.seek(0) 145 | 146 | # Part II: Combining the counts 147 | combined_counter = global_counter 148 | logger.info("Total: %d unique words in %d sentences with a total " 149 | "of %d words." 150 | % (len(combined_counter), sum(sentence_counts), 151 | sum(combined_counter.values()))) 152 | if args.count: 153 | safe_pickle(combined_counter, 'combined.count.pkl') 154 | 155 | # Part III: Creating the dictionary 156 | if args.vocab is not None: 157 | if args.vocab <= 2: 158 | logger.info('Building a dictionary with all unique words') 159 | args.vocab = len(combined_counter) + 2 160 | vocab_count = combined_counter.most_common(args.vocab - 2) 161 | logger.info("Creating dictionary of %s most common words, covering " 162 | "%2.1f%% of the text." 163 | % (args.vocab, 164 | 100.0 * sum([count for word, count in vocab_count]) / 165 | sum(combined_counter.values()))) 166 | else: 167 | logger.info("Creating dictionary of all words") 168 | vocab_count = counter.most_common() 169 | vocab = {} 170 | idx = 4 # place for pad 171 | ban_words = {'', '', ''} 172 | for word, count in vocab_count: 173 | if word not in ban_words: 174 | vocab[word] = idx 175 | idx += 1 176 | safe_pickle(vocab, args.dictionary) 177 | return combined_counter, sentence_counts, counters, vocab 178 | 179 | 180 | def binarize(): 181 | if args.ngram: 182 | assert numpy.iinfo(numpy.uint16).max > len(vocab) 183 | ngrams = numpy.empty((sum(combined_counter.values()) + 184 | sum(sentence_counts), args.ngram), 185 | dtype='uint16') 186 | binarized_corpora = [] 187 | total_ngram_count = 0 188 | for input_file, base_filename, sentence_count in \ 189 | zip(args.input, base_filenames, sentence_counts): 190 | input_filename = os.path.basename(input_file.name) 191 | logger.info("Binarizing %s." 
% (input_filename)) 192 | binarized_corpus = [] 193 | ngram_count = 0 194 | for sentence_count, sentence in enumerate(input_file): 195 | if args.lowercase: 196 | sentence = sentence.lower() 197 | if args.char: 198 | words = list(sentence.strip().decode('utf-8')) 199 | else: 200 | words = sentence.strip().split(' ') 201 | binarized_sentence = [vocab.get(word, 1) for word in words] 202 | binarized_corpus.append(binarized_sentence) 203 | if args.ngram: 204 | padded_sentence = numpy.asarray( 205 | [0] * (args.ngram - 1) + binarized_sentence + [0] 206 | ) 207 | ngrams[total_ngram_count + ngram_count: 208 | total_ngram_count + ngram_count + len(words) + 1] = \ 209 | as_strided( 210 | padded_sentence, 211 | shape=(len(words) + 1, args.ngram), 212 | strides=(padded_sentence.itemsize, 213 | padded_sentence.itemsize) 214 | ) 215 | ngram_count += len(words) + 1 216 | # endfor sentence in input_file 217 | # Output 218 | if args.each: 219 | if args.pickle: 220 | safe_pickle(binarized_corpus, base_filename + '.pkl') 221 | if args.ngram and args.split: 222 | if args.split >= 1: 223 | rows = int(args.split) 224 | else: 225 | rows = int(ngram_count * args.split) 226 | logger.info("Saving training set (%d samples) and validation " 227 | "set (%d samples)." 228 | % (ngram_count - rows, rows)) 229 | rows = numpy.random.choice(ngram_count, rows, replace=False) 230 | safe_hdf(ngrams[total_ngram_count + rows], 231 | base_filename + '_valid') 232 | safe_hdf( 233 | ngrams[total_ngram_count + numpy.setdiff1d( 234 | numpy.arange(ngram_count), 235 | rows, True 236 | )], base_filename + '_train' 237 | ) 238 | elif args.ngram: 239 | logger.info("Saving n-grams to %s." % (base_filename + '.hdf')) 240 | safe_hdf(ngrams, base_filename) 241 | binarized_corpora += binarized_corpus 242 | total_ngram_count += ngram_count 243 | input_file.seek(0) 244 | # endfor input_file in args.input 245 | if args.pickle: 246 | safe_pickle(binarized_corpora, args.binarized_text) 247 | if args.ngram and args.split: 248 | if args.split >= 1: 249 | rows = int(args.split) 250 | else: 251 | rows = int(total_ngram_count * args.split) 252 | logger.info("Saving training set (%d samples) and validation set (%d " 253 | "samples)." 
254 | % (total_ngram_count - rows, rows)) 255 | rows = numpy.random.choice(total_ngram_count, rows, replace=False) 256 | safe_hdf(ngrams[rows], 'combined_valid') 257 | safe_hdf(ngrams[numpy.setdiff1d(numpy.arange(total_ngram_count), 258 | rows, True)], 'combined_train') 259 | elif args.ngram: 260 | safe_hdf(ngrams, 'combined') 261 | 262 | 263 | if __name__ == "__main__": 264 | logging.basicConfig(level=logging.INFO) 265 | logger = logging.getLogger('preprocess') 266 | args = parser.parse_args() 267 | base_filenames = open_files() 268 | combined_counter, sentence_counts, counters, vocab = create_dictionary() 269 | if args.ngram or args.pickle: 270 | binarize() 271 | -------------------------------------------------------------------------------- /nmt/inference.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from mxwrap.rnn.LSTM import lstm, LSTMModel, LSTMParam, LSTMState 3 | from mxwrap.seq2seq.encoder import LstmEncoder, BiDirectionalLstmEncoder 4 | from mxwrap.attention.BasicAttention import BasicAttention 5 | 6 | 7 | def initial_state_symbol(t_num_lstm_layer, t_num_hidden): 8 | encoded = mx.sym.Variable("encoded") 9 | init_weight = mx.sym.Variable("target_init_weight") 10 | init_bias = mx.sym.Variable("target_init_bias") 11 | init_h = mx.sym.FullyConnected(data=encoded, num_hidden=t_num_hidden, 12 | weight=init_weight, bias=init_bias, name='init_fc') 13 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 14 | init_hs = mx.sym.SliceChannel(data=init_h, num_outputs=t_num_lstm_layer, squeeze_axis=1) 15 | return init_hs 16 | 17 | 18 | class BiS2SInferenceModel(object): 19 | def __init__(self, 20 | s_num_lstm_layer, s_seq_len, s_vocab_size, s_num_hidden, s_num_embed, s_dropout, 21 | t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, t_dropout, 22 | arg_params, 23 | use_masking, 24 | ctx=mx.cpu(), 25 | batch_size=1): 26 | self.encode_sym = bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, 27 | s_vocab_size, s_num_hidden, s_num_embed, 28 | s_dropout) 29 | attention = BasicAttention(batch_size=batch_size, seq_len=s_seq_len, attend_dim=s_num_hidden * 2, 30 | state_dim=t_num_hidden) 31 | self.decode_sym = lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, 32 | t_num_embed, 33 | t_num_label, t_dropout, attention, s_seq_len) 34 | self.init_state_sym = initial_state_symbol(t_num_lstm_layer, t_num_hidden) 35 | 36 | # initialize states for LSTM 37 | forward_source_init_c = [('forward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 38 | range(s_num_lstm_layer)] 39 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 40 | range(s_num_lstm_layer)] 41 | backward_source_init_c = [('backward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 42 | range(s_num_lstm_layer)] 43 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 44 | range(s_num_lstm_layer)] 45 | source_init_states = forward_source_init_c + forward_source_init_h + backward_source_init_c + backward_source_init_h 46 | 47 | target_init_c = [('target_l%d_init_c' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 48 | target_init_h = [('target_l%d_init_h' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 49 | target_init_states = target_init_c + target_init_h 50 | 51 | encode_data_shape = [("source", (batch_size, s_seq_len))] 52 | 
decode_data_shape = [("target", (batch_size,))] 53 | attend_state_shapes = [("attended", (batch_size, s_num_hidden * 2 * s_seq_len))] 54 | init_state_shapes = [("encoded", (batch_size, s_num_hidden * 2))] 55 | 56 | encode_input_shapes = dict(source_init_states + encode_data_shape) 57 | decode_input_shapes = dict(target_init_states + decode_data_shape + attend_state_shapes) 58 | init_input_shapes = dict(init_state_shapes) 59 | self.encode_executor = self.encode_sym.simple_bind(ctx=ctx, grad_req='null', **encode_input_shapes) 60 | self.decode_executor = self.decode_sym.simple_bind(ctx=ctx, grad_req='null', **decode_input_shapes) 61 | self.init_state_executor = self.init_state_sym.simple_bind(ctx=ctx, grad_req='null', **init_input_shapes) 62 | 63 | for key in self.encode_executor.arg_dict.keys(): 64 | if key in arg_params: 65 | arg_params[key].copyto(self.encode_executor.arg_dict[key]) 66 | for key in self.decode_executor.arg_dict.keys(): 67 | if key in arg_params: 68 | arg_params[key].copyto(self.decode_executor.arg_dict[key]) 69 | for key in self.init_state_executor.arg_dict.keys(): 70 | if key in arg_params: 71 | arg_params[key].copyto(self.init_state_executor.arg_dict[key]) 72 | 73 | encode_state_name = [] 74 | decode_state_name = [] 75 | for i in range(s_num_lstm_layer): 76 | encode_state_name.append("forward_source_l%d_init_c" % i) 77 | encode_state_name.append("forward_source_l%d_init_h" % i) 78 | encode_state_name.append("backward_source_l%d_init_c" % i) 79 | encode_state_name.append("backward_source_l%d_init_h" % i) 80 | for i in range(t_num_lstm_layer): 81 | decode_state_name.append("target_l%d_init_c" % i) 82 | decode_state_name.append("target_l%d_init_h" % i) 83 | 84 | self.encode_states_dict = dict(zip(encode_state_name, self.encode_executor.outputs)) 85 | self.decode_states_dict = dict(zip(decode_state_name, self.decode_executor.outputs[1:])) 86 | 87 | def encode(self, input_data): 88 | for key in self.encode_states_dict.keys(): 89 | self.encode_executor.arg_dict[key][:] = 0. 
90 | input_data.copyto(self.encode_executor.arg_dict["source"]) 91 | self.encode_executor.forward() 92 | last_encoded = self.encode_executor.outputs[0] 93 | all_encoded = self.encode_executor.outputs[1] 94 | return last_encoded, all_encoded 95 | 96 | def decode_forward(self, last_encoded, all_encoded, input_data, new_seq): 97 | if new_seq: 98 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 99 | self.init_state_executor.forward() 100 | init_hs = self.init_state_executor.outputs[0] 101 | init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 102 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 103 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 104 | input_data.copyto(self.decode_executor.arg_dict["target"]) 105 | self.decode_executor.forward() 106 | 107 | prob = self.decode_executor.outputs[0].asnumpy() 108 | 109 | self.decode_executor.outputs[1].copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 110 | self.decode_executor.outputs[2].copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 111 | 112 | attention_weights = self.decode_executor.outputs[3].asnumpy() 113 | 114 | return prob, attention_weights 115 | 116 | def decode_forward_with_state(self, last_encoded, all_encoded, input_data, state, new_seq): 117 | if new_seq: 118 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 119 | self.init_state_executor.forward() 120 | init_hs = self.init_state_executor.outputs[0] 121 | # init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 122 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 123 | state = LSTMState(c=self.decode_executor.arg_dict["target_l0_init_c"], h=init_hs) 124 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 125 | input_data.copyto(self.decode_executor.arg_dict["target"]) 126 | state.c.copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 127 | state.h.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 128 | self.decode_executor.forward() 129 | 130 | prob = self.decode_executor.outputs[0] 131 | 132 | c = self.decode_executor.outputs[1] 133 | h = self.decode_executor.outputs[2] 134 | 135 | attention_weights = self.decode_executor.outputs[3] 136 | 137 | return prob, attention_weights, LSTMState(c=c, h=h) 138 | 139 | 140 | def bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, s_vocab_size, s_num_hidden, s_num_embed, 141 | s_dropout): 142 | encoder = BiDirectionalLstmEncoder(seq_len=s_seq_len, use_masking=use_masking, state_dim=s_num_hidden, 143 | input_dim=s_vocab_size, 144 | output_dim=0, 145 | vocab_size=s_vocab_size, embed_dim=s_num_embed, 146 | dropout=s_dropout, num_of_layer=s_num_lstm_layer) 147 | forward_hidden_all, backward_hidden_all, bi_hidden_all = encoder.encode() 148 | concat_encoded = mx.sym.Concat(*bi_hidden_all, dim=1) 149 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 150 | name='encoded_for_init_state') 151 | return mx.sym.Group([encoded_for_init_state, concat_encoded]) 152 | 153 | 154 | def lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, 155 | t_dropout, 156 | attention, source_seq_len): 157 | data = mx.sym.Variable("target") 158 | seqidx = 0 159 | 160 | embed_weight = mx.sym.Variable("target_embed_weight") 161 | cls_weight = mx.sym.Variable("target_cls_weight") 162 | cls_bias = mx.sym.Variable("target_cls_bias") 163 | 164 | input_weight = mx.sym.Variable("target_input_weight") 165 | # input_bias = 
mx.sym.Variable("target_input_bias") 166 | 167 | param_cells = [] 168 | last_states = [] 169 | 170 | for i in range(t_num_lstm_layer): 171 | param_cells.append(LSTMParam(i2h_weight=mx.sym.Variable("target_l%d_i2h_weight" % i), 172 | i2h_bias=mx.sym.Variable("target_l%d_i2h_bias" % i), 173 | h2h_weight=mx.sym.Variable("target_l%d_h2h_weight" % i), 174 | h2h_bias=mx.sym.Variable("target_l%d_h2h_bias" % i))) 175 | state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 176 | h=mx.sym.Variable("target_l%d_init_h" % i)) 177 | # state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 178 | # h=init_hs[i]) 179 | last_states.append(state) 180 | assert (len(last_states) == t_num_lstm_layer) 181 | 182 | hidden = mx.sym.Embedding(data=data, 183 | input_dim=t_vocab_size + 1, 184 | output_dim=t_num_embed, 185 | weight=embed_weight, 186 | name="target_embed") 187 | 188 | all_encoded = mx.sym.Variable("attended") 189 | encoded = mx.sym.SliceChannel(data=all_encoded, axis=1, num_outputs=source_seq_len) 190 | weights, weighted_encoded = attention.attend(attended=encoded, concat_attended=all_encoded, 191 | state=last_states[0].h, 192 | attend_masks=None, 193 | use_masking=False) 194 | con = mx.sym.Concat(hidden, weighted_encoded) 195 | hidden = mx.sym.FullyConnected(data=con, num_hidden=t_num_embed, 196 | weight=input_weight, no_bias=True, name='input_fc') 197 | # hidden = mx.sym.Activation(data=hidden, act_type='tanh', name='input_act') 198 | 199 | # stack LSTM 200 | for i in range(t_num_lstm_layer): 201 | if i == 0: 202 | dp = 0. 203 | else: 204 | dp = t_dropout 205 | next_state = lstm(t_num_hidden, indata=hidden, 206 | prev_state=last_states[i], 207 | param=param_cells[i], 208 | seqidx=seqidx, layeridx=i, dropout=dp) 209 | hidden = next_state.h 210 | last_states[i] = next_state 211 | 212 | fc = mx.sym.FullyConnected(data=hidden, num_hidden=t_num_label, 213 | weight=cls_weight, bias=cls_bias, name='target_pred') 214 | sm = mx.sym.SoftmaxOutput(data=fc, name='target_softmax') 215 | output = [sm] 216 | for state in last_states: 217 | output.append(state.c) 218 | output.append(state.h) 219 | output.append(weights) 220 | return mx.sym.Group(output) 221 | -------------------------------------------------------------------------------- /nmt/inference_mask.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | from mxwrap.rnn.LSTM import lstm, LSTMModel, LSTMParam, LSTMState 3 | from mxwrap.seq2seq.encoder import LstmEncoder, BiDirectionalLstmEncoder 4 | from mxwrap.attention.BasicAttention import BasicAttention 5 | 6 | 7 | def initial_state_symbol(t_num_lstm_layer, t_num_hidden): 8 | encoded = mx.sym.Variable("encoded") 9 | init_weight = mx.sym.Variable("target_init_weight") 10 | init_bias = mx.sym.Variable("target_init_bias") 11 | init_h = mx.sym.FullyConnected(data=encoded, num_hidden=t_num_hidden, 12 | weight=init_weight, bias=init_bias, name='init_fc') 13 | init_h = mx.sym.Activation(data=init_h, act_type='tanh', name='init_act') 14 | init_hs = mx.sym.SliceChannel(data=init_h, num_outputs=t_num_lstm_layer, squeeze_axis=1) 15 | return init_hs 16 | 17 | 18 | class BiS2SInferenceModel_mask(object): 19 | def __init__(self, 20 | s_num_lstm_layer, s_seq_len, s_vocab_size, s_num_hidden, s_num_embed, s_dropout, 21 | t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, t_num_embed, t_num_label, t_dropout, 22 | arg_params, 23 | use_masking, ctx=mx.cpu(), 24 | batch_size=1): 25 | self.encode_sym = bidirectional_encode_symbol(s_num_lstm_layer, 
s_seq_len, use_masking, 26 | s_vocab_size, s_num_hidden, s_num_embed, 27 | s_dropout) 28 | attention = BasicAttention(batch_size=batch_size, seq_len=s_seq_len, attend_dim=s_num_hidden * 2, 29 | state_dim=t_num_hidden) 30 | self.decode_sym = lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, t_num_hidden, 31 | t_num_embed, 32 | t_num_label, t_dropout, attention, s_seq_len, batch_size) 33 | self.init_state_sym = initial_state_symbol(t_num_lstm_layer, t_num_hidden) 34 | 35 | # initialize states for LSTM 36 | forward_source_init_c = [('forward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 37 | range(s_num_lstm_layer)] 38 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 39 | range(s_num_lstm_layer)] 40 | backward_source_init_c = [('backward_source_l%d_init_c' % l, (batch_size, s_num_hidden)) for l in 41 | range(s_num_lstm_layer)] 42 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (batch_size, s_num_hidden)) for l in 43 | range(s_num_lstm_layer)] 44 | source_init_states = forward_source_init_c + forward_source_init_h + backward_source_init_c + backward_source_init_h 45 | 46 | target_init_c = [('target_l%d_init_c' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 47 | target_init_h = [('target_l%d_init_h' % l, (batch_size, t_num_hidden)) for l in range(t_num_lstm_layer)] 48 | target_init_states = target_init_c + target_init_h 49 | 50 | encode_data_shape = [("source", (batch_size, s_seq_len))] 51 | mask_data_shape = [("source_mask", (batch_size, s_seq_len))] 52 | decode_data_shape = [("target", (batch_size,))] 53 | attend_state_shapes = [("attended", (batch_size, s_num_hidden * 2 * s_seq_len))] 54 | attend_mask = [("encoded_mask", (batch_size, s_seq_len))] 55 | init_state_shapes = [("encoded", (batch_size, s_num_hidden * 2))] 56 | 57 | encode_input_shapes = dict(source_init_states + encode_data_shape + mask_data_shape) 58 | decode_input_shapes = dict(target_init_states + decode_data_shape + attend_state_shapes + attend_mask) 59 | init_input_shapes = dict(init_state_shapes) 60 | self.encode_executor = self.encode_sym.simple_bind(ctx=ctx, grad_req='null', **encode_input_shapes) 61 | self.decode_executor = self.decode_sym.simple_bind(ctx=ctx, grad_req='null', **decode_input_shapes) 62 | self.init_state_executor = self.init_state_sym.simple_bind(ctx=ctx, grad_req='null', **init_input_shapes) 63 | 64 | for key in self.encode_executor.arg_dict.keys(): 65 | if key in arg_params: 66 | arg_params[key].copyto(self.encode_executor.arg_dict[key]) 67 | for key in self.decode_executor.arg_dict.keys(): 68 | if key in arg_params: 69 | arg_params[key].copyto(self.decode_executor.arg_dict[key]) 70 | for key in self.init_state_executor.arg_dict.keys(): 71 | if key in arg_params: 72 | arg_params[key].copyto(self.init_state_executor.arg_dict[key]) 73 | 74 | encode_state_name = [] 75 | decode_state_name = [] 76 | for i in range(s_num_lstm_layer): 77 | encode_state_name.append("forward_source_l%d_init_c" % i) 78 | encode_state_name.append("forward_source_l%d_init_h" % i) 79 | encode_state_name.append("backward_source_l%d_init_c" % i) 80 | encode_state_name.append("backward_source_l%d_init_h" % i) 81 | for i in range(t_num_lstm_layer): 82 | decode_state_name.append("target_l%d_init_c" % i) 83 | decode_state_name.append("target_l%d_init_h" % i) 84 | 85 | self.encode_states_dict = dict(zip(encode_state_name, self.encode_executor.outputs)) 86 | self.decode_states_dict = dict(zip(decode_state_name, 
self.decode_executor.outputs[1:])) 87 | 88 | def encode(self, input_data, input_mask): 89 | for key in self.encode_states_dict.keys(): 90 | self.encode_executor.arg_dict[key][:] = 0. 91 | input_data.copyto(self.encode_executor.arg_dict["source"]) 92 | input_mask.copyto(self.encode_executor.arg_dict["source_mask"]) 93 | self.encode_executor.forward() 94 | last_encoded = self.encode_executor.outputs[0] 95 | all_encoded = self.encode_executor.outputs[1] 96 | return last_encoded, all_encoded 97 | 98 | def decode_forward(self, last_encoded, all_encoded, mask, input_data, new_seq): 99 | if new_seq: 100 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 101 | self.init_state_executor.forward() 102 | init_hs = self.init_state_executor.outputs[0] 103 | init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 104 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 105 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 106 | mask.copyto(self.decode_executor.arg_dict["encoded_mask"]) 107 | input_data.copyto(self.decode_executor.arg_dict["target"]) 108 | self.decode_executor.forward() 109 | 110 | prob = self.decode_executor.outputs[0].asnumpy() 111 | 112 | self.decode_executor.outputs[1].copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 113 | self.decode_executor.outputs[2].copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 114 | 115 | attention_weights = self.decode_executor.outputs[3].asnumpy() 116 | 117 | return prob, attention_weights 118 | 119 | def decode_forward_with_state(self, last_encoded, all_encoded, mask, input_data, state, new_seq): 120 | if new_seq: 121 | last_encoded.copyto(self.init_state_executor.arg_dict["encoded"]) 122 | self.init_state_executor.forward() 123 | init_hs = self.init_state_executor.outputs[0] 124 | # init_hs.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 125 | self.decode_executor.arg_dict["target_l0_init_c"][:] = 0.0 126 | state = LSTMState(c=self.decode_executor.arg_dict["target_l0_init_c"], h=init_hs) 127 | all_encoded.copyto(self.decode_executor.arg_dict["attended"]) 128 | mask.copyto(self.decode_executor.arg_dict["encoded_mask"]) 129 | input_data.copyto(self.decode_executor.arg_dict["target"]) 130 | state.c.copyto(self.decode_executor.arg_dict["target_l0_init_c"]) 131 | state.h.copyto(self.decode_executor.arg_dict["target_l0_init_h"]) 132 | self.decode_executor.forward() 133 | 134 | prob = self.decode_executor.outputs[0] 135 | 136 | c = self.decode_executor.outputs[1] 137 | h = self.decode_executor.outputs[2] 138 | 139 | attention_weights = self.decode_executor.outputs[3] 140 | 141 | return prob, attention_weights, LSTMState(c=c, h=h) 142 | 143 | 144 | def bidirectional_encode_symbol(s_num_lstm_layer, s_seq_len, use_masking, s_vocab_size, s_num_hidden, s_num_embed, 145 | s_dropout): 146 | encoder = BiDirectionalLstmEncoder(seq_len=s_seq_len, use_masking=use_masking, state_dim=s_num_hidden, 147 | input_dim=s_vocab_size, 148 | output_dim=0, 149 | vocab_size=s_vocab_size, embed_dim=s_num_embed, 150 | dropout=s_dropout, num_of_layer=s_num_lstm_layer) 151 | forward_hidden_all, backward_hidden_all, bi_hidden_all, masks_sliced = encoder.encode() 152 | concat_encoded = mx.sym.Concat(*bi_hidden_all, dim=1) 153 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 154 | name='encoded_for_init_state') 155 | return mx.sym.Group([encoded_for_init_state, concat_encoded]) 156 | 157 | 158 | def lstm_attention_decode_symbol(t_num_lstm_layer, t_seq_len, t_vocab_size, 
t_num_hidden, t_num_embed, t_num_label, 159 | t_dropout, 160 | attention, source_seq_len, batch_size): 161 | data = mx.sym.Variable("target") 162 | encoded_mask = mx.sym.Variable("encoded_mask") 163 | encoded_mask = mx.sym.SliceChannel(data=encoded_mask, num_outputs=source_seq_len, name='sliced_source_mask') 164 | seqidx = 0 165 | 166 | embed_weight = mx.sym.Variable("target_embed_weight") 167 | cls_weight = mx.sym.Variable("target_cls_weight") 168 | cls_bias = mx.sym.Variable("target_cls_bias") 169 | 170 | input_weight = mx.sym.Variable("target_input_weight") 171 | # input_bias = mx.sym.Variable("target_input_bias") 172 | 173 | param_cells = [] 174 | last_states = [] 175 | 176 | for i in range(t_num_lstm_layer): 177 | param_cells.append(LSTMParam(i2h_weight=mx.sym.Variable("target_l%d_i2h_weight" % i), 178 | i2h_bias=mx.sym.Variable("target_l%d_i2h_bias" % i), 179 | h2h_weight=mx.sym.Variable("target_l%d_h2h_weight" % i), 180 | h2h_bias=mx.sym.Variable("target_l%d_h2h_bias" % i))) 181 | state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 182 | h=mx.sym.Variable("target_l%d_init_h" % i)) 183 | # state = LSTMState(c=mx.sym.Variable("target_l%d_init_c" % i), 184 | # h=init_hs[i]) 185 | last_states.append(state) 186 | assert (len(last_states) == t_num_lstm_layer) 187 | 188 | hidden = mx.sym.Embedding(data=data, 189 | input_dim=t_vocab_size + 1, 190 | output_dim=t_num_embed, 191 | weight=embed_weight, 192 | name="target_embed") 193 | 194 | all_encoded = mx.sym.Variable("attended") 195 | all_attended = mx.sym.Reshape(data=all_encoded, shape=(batch_size, source_seq_len, -1), 196 | name='_reshape_concat_attended') 197 | encoded = mx.sym.SliceChannel(data=all_encoded, axis=1, num_outputs=source_seq_len) 198 | weights, weighted_encoded = attention.attend(attended=encoded, concat_attended=all_attended, 199 | state=last_states[0].h, 200 | attend_masks=encoded_mask, 201 | use_masking=True) 202 | con = mx.sym.Concat(hidden, weighted_encoded) 203 | hidden = mx.sym.FullyConnected(data=con, num_hidden=t_num_embed, 204 | weight=input_weight, no_bias=True, name='input_fc') 205 | # hidden = mx.sym.Activation(data=hidden, act_type='tanh', name='input_act') 206 | 207 | # stack LSTM 208 | for i in range(t_num_lstm_layer): 209 | if i == 0: 210 | dp = 0. 
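        # dropout is skipped for the bottom decoder layer; the stacked layers above it use t_dropout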
211 | else: 212 | dp = t_dropout 213 | next_state = lstm(t_num_hidden, indata=hidden, 214 | prev_state=last_states[i], 215 | param=param_cells[i], 216 | seqidx=seqidx, layeridx=i, dropout=dp) 217 | hidden = next_state.h 218 | last_states[i] = next_state 219 | 220 | fc = mx.sym.FullyConnected(data=hidden, num_hidden=t_num_label, 221 | weight=cls_weight, bias=cls_bias, name='target_pred') 222 | sm = mx.sym.SoftmaxOutput(data=fc, name='target_softmax') 223 | output = [sm] 224 | for state in last_states: 225 | output.append(state.c) 226 | output.append(state.h) 227 | output.append(weights) 228 | return mx.sym.Group(output) 229 | -------------------------------------------------------------------------------- /nmt/main.py: -------------------------------------------------------------------------------- 1 | """ 2 | Encoder-Decoder with attention for neural machine translation 3 | 4 | """ 5 | 6 | import sys 7 | 8 | if sys.version_info[0] < 3: 9 | raise Exception("Must be using Python 3") 10 | 11 | import argparse 12 | import logging 13 | import time 14 | import os 15 | import mxnet as mx 16 | import numpy as np 17 | import xconfig 18 | 19 | np.random.seed(65536) # make it predictable 20 | mx.random.seed(65535) # 2333 21 | 22 | sys.path.append('.') 23 | sys.path.append('..') 24 | 25 | logging.basicConfig(format='%(asctime)s %(levelname)s:%(name)s:%(message)s', level=logging.INFO, datefmt='%H:%M:%S') 26 | file_handler = logging.FileHandler(os.path.join(xconfig.log_root, time.strftime("%Y%m%d-%H%M%S") + '.log')) 27 | file_handler.setFormatter(logging.Formatter('%(asctime)s [%(levelname)-5.5s:%(name)s] %(message)s')) 28 | logging.root.addHandler(file_handler) 29 | # logger = logging.getLogger(__name__) 30 | 31 | # Get the arguments 32 | parser = argparse.ArgumentParser() 33 | parser.add_argument( 34 | "--mode", choices=["train", "test"], default='train', 35 | help="The mode to run. In the `train` mode a model is trained." 36 | " In the `test` mode a trained model is used to translate") 37 | args = parser.parse_args() 38 | 39 | logging.info(xconfig.get_config_str()) 40 | 41 | if __name__ == "__main__": 42 | if args.mode == 'train': 43 | logging.info('In train mode.') 44 | from trainer import train 45 | 46 | train() 47 | elif args.mode == 'test': 48 | logging.info('In test mode.') 49 | from tester import test 50 | 51 | test() 52 | -------------------------------------------------------------------------------- /nmt/masked_bucket_io.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme 2 | # pylint: disable=superfluous-parens, no-member, invalid-name 3 | import sys 4 | 5 | sys.path.insert(0, "../../python") 6 | import numpy as np 7 | import mxnet as mx 8 | from mxnet.io import DataBatch 9 | 10 | 11 | # The interface of a data iter that works for bucketing 12 | # 13 | # DataIter 14 | # - default_bucket_key: the bucket key for the default symbol. 15 | # 16 | # DataBatch 17 | # - provide_data: same as DataIter, but specific to this batch 18 | # - provide_label: same as DataIter, but specific to this batch 19 | # - bucket_key: the key for the bucket that should be used for this batch 20 | 21 | def default_read_content(path): 22 | with open(path) as ins: 23 | content = ins.read() 24 | content = content.replace('\n', ' ').replace('. 
', ' ') 25 | return content 26 | 27 | 28 | def default_build_vocab(path): 29 | content = default_read_content(path) 30 | content = content.split(' ') 31 | the_vocab = {} 32 | idx = 1 # 0 is left for zero-padding 33 | the_vocab[' '] = 0 # put a dummy element here so that len(vocab) is correct 34 | for word in content: 35 | if len(word) == 0: 36 | continue 37 | if not word in the_vocab: 38 | the_vocab[word] = idx 39 | idx += 1 40 | return the_vocab 41 | 42 | 43 | def default_text2id(sentence, the_vocab): 44 | words = sentence.split(' ') 45 | words = [the_vocab[w] for w in words if len(w) > 0] 46 | return words 47 | 48 | 49 | def default_gen_buckets(sentences, batch_size, the_vocab): 50 | len_dict = {} 51 | max_len = -1 52 | for sentence in sentences: 53 | words = default_text2id(sentence, the_vocab) 54 | if len(words) == 0: 55 | continue 56 | if len(words) > max_len: 57 | max_len = len(words) 58 | if len(words) in len_dict: 59 | len_dict[len(words)] += 1 60 | else: 61 | len_dict[len(words)] = 1 62 | print(len_dict) 63 | 64 | tl = 0 65 | buckets = [] 66 | for l, n in len_dict.items(): # TODO: There are better heuristic ways to do this 67 | if n + tl >= batch_size: 68 | buckets.append(l) 69 | tl = 0 70 | else: 71 | tl += n 72 | if tl > 0: 73 | buckets.append(max_len) 74 | return buckets 75 | 76 | 77 | class SimpleBatch(object): 78 | def __init__(self, data_names, data, label_names, label, bucket_key): 79 | self.data = data 80 | self.label = label 81 | self.data_names = data_names 82 | self.label_names = label_names 83 | self.bucket_key = bucket_key 84 | 85 | self.pad = 0 86 | self.index = None # TODO: what is index? 87 | 88 | @property 89 | def provide_data(self): 90 | return [(n, x.shape) for n, x in zip(self.data_names, self.data)] 91 | 92 | @property 93 | def provide_label(self): 94 | return [(n, x.shape) for n, x in zip(self.label_names, self.label)] 95 | 96 | 97 | class DummyIter(mx.io.DataIter): 98 | '''A dummy iterator that always return the same batch, used for speed testing''' 99 | 100 | def __init__(self, real_iter): 101 | super(DummyIter, self).__init__() 102 | self.real_iter = real_iter 103 | self.provide_data = real_iter.provide_data 104 | self.provide_label = real_iter.provide_label 105 | self.batch_size = real_iter.batch_size 106 | 107 | for batch in real_iter: 108 | self.the_batch = batch 109 | break 110 | 111 | def __iter__(self): 112 | return self 113 | 114 | def next(self): 115 | return self.the_batch 116 | 117 | 118 | class MaskedBucketSentenceIter(mx.io.DataIter): 119 | def __init__(self, source_path, target_path, source_vocab, target_vocab, 120 | buckets, batch_size, 121 | source_init_states, target_init_states, 122 | source_data_name='source', source_mask_name='source_mask', 123 | target_data_name='target', target_mask_name='target_mask', 124 | label_name='target_softmax_label', 125 | seperate_char=' ', text2id=None, read_content=None, max_read_sample=sys.maxsize): 126 | super(MaskedBucketSentenceIter, self).__init__() 127 | 128 | if text2id is None: 129 | self.text2id = default_text2id 130 | else: 131 | self.text2id = text2id 132 | if read_content is None: 133 | self.read_content = default_read_content 134 | else: 135 | self.read_content = read_content 136 | source_sentences = self.read_content(source_path, max_read_sample) 137 | # source_sentences = source_content.split(seperate_char) 138 | 139 | target_sentences = self.read_content(target_path, max_read_sample) 140 | # target_sentences = target_content.split(seperate_char) 141 | 142 | assert len(source_sentences) == 
len(target_sentences) 143 | 144 | self.source_vocab_size = len(source_vocab) 145 | self.target_vocab_size = len(target_vocab) 146 | self.source_data_name = source_data_name 147 | self.target_data_name = target_data_name 148 | self.label_name = label_name 149 | 150 | self.source_mask_name = source_mask_name 151 | self.target_mask_name = target_mask_name 152 | 153 | buckets.sort() 154 | self.buckets = buckets 155 | self.source_data = [[] for _ in buckets] 156 | self.target_data = [[] for _ in buckets] 157 | self.label_data = [[] for _ in buckets] 158 | self.source_mask_data = [[] for _ in buckets] 159 | self.target_mask_data = [[] for _ in buckets] 160 | 161 | # pre-allocate with the largest bucket for better memory sharing 162 | 163 | 164 | num_of_data = len(source_sentences) 165 | for i in range(num_of_data): 166 | source = source_sentences[i] 167 | target = [''] + target_sentences[i] 168 | label = target_sentences[i] + [''] 169 | source_sentence = self.text2id(source, source_vocab) 170 | target_sentence = self.text2id(target, target_vocab) 171 | label_id = self.text2id(label, target_vocab) 172 | if len(source_sentence) == 0 or len(target_sentence) == 0: 173 | continue 174 | for j, bkt in enumerate(buckets): 175 | if bkt[0] >= len(source) and bkt[1] >= len(target): 176 | self.source_data[j].append(source_sentence) 177 | self.target_data[j].append(target_sentence) 178 | self.label_data[j].append(label_id) 179 | break 180 | # we just ignore the sentence it is longer than the maximum 181 | # bucket size here 182 | source_data_clean = [] 183 | target_data_clean = [] 184 | label_data_clean = [] 185 | buckets_clean = [] 186 | for i in range(len(self.source_data)): 187 | if len(self.source_data[i]) >= batch_size: 188 | source_data_clean.append(self.source_data[i]) 189 | target_data_clean.append(self.target_data[i]) 190 | label_data_clean.append(self.label_data[i]) 191 | buckets_clean.append(self.buckets[i]) 192 | 193 | self.source_data = source_data_clean 194 | self.target_data = target_data_clean 195 | self.label_data = label_data_clean 196 | self.buckets = buckets_clean 197 | del buckets 198 | self.default_bucket_key = max(self.buckets) 199 | 200 | # convert data into ndarrays for better speed during training 201 | source_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 202 | source_mask_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 203 | target_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 204 | target_mask_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 205 | label_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.label_data)] 206 | for i_bucket in range(len(self.buckets)): 207 | for j in range(len(self.source_data[i_bucket])): 208 | source = self.source_data[i_bucket][j] 209 | target = self.target_data[i_bucket][j] 210 | label = self.label_data[i_bucket][j] 211 | source_data[i_bucket][j, :len(source)] = source 212 | source_mask_data[i_bucket][j, :len(source)] = 1 213 | target_data[i_bucket][j, :len(target)] = target 214 | target_mask_data[i_bucket][j, :len(target)] = 1 215 | label_data[i_bucket][j, :len(label)] = label 216 | self.source_data = source_data 217 | self.source_mask_data = source_mask_data 218 | self.target_data = target_data 219 | self.target_mask_data = target_mask_data 220 | self.label_data = label_data 221 | 222 | # Get the size of each bucket, so that we could sample 223 | # uniformly from 
the bucket 224 | bucket_sizes = [len(x) for x in self.source_data] 225 | 226 | print("Summary of dataset ==================") 227 | total_count = 0 228 | for bkt, size in zip(self.buckets, bucket_sizes): 229 | print("bucket of {0} : {1} samples".format(bkt, size)) 230 | total_count += size 231 | print('Total: {0} ({1}) in {2} buckets'.format(total_count, num_of_data, len(self.buckets))) 232 | 233 | self.batch_size = batch_size 234 | self.make_data_iter_plan() 235 | 236 | self.source_init_states = source_init_states 237 | self.target_init_states = target_init_states 238 | self.source_init_state_arrays = [mx.nd.zeros(x[1]) for x in source_init_states] 239 | self.target_init_state_arrays = [mx.nd.zeros(x[1]) for x in target_init_states] 240 | 241 | self.provide_data = [(source_data_name, (batch_size, self.default_bucket_key[0])), 242 | (source_mask_name, (batch_size, self.default_bucket_key[0])), 243 | (target_data_name, (batch_size, self.default_bucket_key[1])), 244 | # (target_mask_name, (batch_size, self.default_bucket_key[1])) 245 | ] + source_init_states + target_init_states 246 | 247 | self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key[1]))] 248 | 249 | def make_data_iter_plan(self): 250 | "make a random data iteration plan" 251 | # truncate each bucket into multiple of batch-size 252 | bucket_n_batches = [] 253 | for i in range(len(self.source_data)): 254 | bucket_n_batches.append(len(self.source_data[i]) / self.batch_size) 255 | self.source_data[i] = self.source_data[i][:int(bucket_n_batches[i] * self.batch_size)] 256 | self.source_mask_data[i] = self.source_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 257 | self.target_data[i] = self.target_data[i][:int(bucket_n_batches[i] * self.batch_size)] 258 | self.target_mask_data[i] = self.target_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 259 | 260 | bucket_plan = np.hstack([np.zeros(int(n), int) + i for i, n in enumerate(bucket_n_batches)]) 261 | np.random.shuffle(bucket_plan) 262 | 263 | bucket_idx_all = [np.random.permutation(len(x)) for x in self.source_data] 264 | 265 | self.bucket_plan = bucket_plan 266 | self.bucket_idx_all = bucket_idx_all 267 | self.bucket_curr_idx = [0 for x in self.source_data] 268 | 269 | self.source_data_buffer = [] 270 | self.source_mask_data_buffer = [] 271 | self.target_data_buffer = [] 272 | self.target_mask_data_buffer = [] 273 | self.label_buffer = [] 274 | for i_bucket in range(len(self.source_data)): 275 | source_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 276 | source_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 277 | target_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 278 | target_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 279 | label = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 280 | 281 | self.source_data_buffer.append(source_data) 282 | self.source_mask_data_buffer.append(source_mask_data) 283 | self.target_data_buffer.append(target_data) 284 | self.target_mask_data_buffer.append(target_mask_data) 285 | self.label_buffer.append(label) 286 | self.iterIndex = 0 287 | 288 | def next(self): 289 | """Get next data batch from iterator. 290 | 291 | Returns 292 | ------- 293 | DataBatch 294 | The data of next batch. 
295 | 296 | Raises 297 | ------ 298 | StopIteration 299 | If the end of the data is reached 300 | """ 301 | if self.iterIndex == len(self.bucket_plan): 302 | raise StopIteration 303 | 304 | i_bucket = self.bucket_plan[self.iterIndex] 305 | 306 | source_data = self.source_data_buffer[i_bucket] 307 | source_mask_data = self.source_mask_data_buffer[i_bucket] 308 | target_data = self.target_data_buffer[i_bucket] 309 | target_mask_data = self.target_mask_data_buffer[i_bucket] 310 | label = self.label_buffer[i_bucket] 311 | 312 | i_idx = self.bucket_curr_idx[i_bucket] 313 | idx = self.bucket_idx_all[i_bucket][i_idx:i_idx + self.batch_size] 314 | self.bucket_curr_idx[i_bucket] += self.batch_size 315 | source_data[:] = self.source_data[i_bucket][idx] 316 | source_mask_data[:] = self.source_mask_data[i_bucket][idx] 317 | target_data[:] = self.target_data[i_bucket][idx] 318 | target_mask_data[:] = self.target_mask_data[i_bucket][idx] 319 | label[:] = self.label_data[i_bucket][idx] 320 | 321 | data_all = [mx.nd.array(source_data), mx.nd.array(source_mask_data)] + \ 322 | [mx.nd.array(target_data), 323 | # mx.nd.array(target_mask_data) 324 | ] + \ 325 | self.source_init_state_arrays + self.target_init_state_arrays 326 | label_all = [mx.nd.array(label)] 327 | 328 | bucket_key = self.buckets[i_bucket] 329 | provide_data = [(self.source_data_name, (self.batch_size, bucket_key[0])), 330 | (self.source_mask_name, (self.batch_size, bucket_key[0])), 331 | (self.target_data_name, (self.batch_size, bucket_key[1])), 332 | # (self.target_mask_name, (self.batch_size, bucket_key[1])) 333 | ] + self.source_init_states + self.target_init_states 334 | provide_label = [(self.label_name, (self.batch_size, bucket_key[1]))] 335 | 336 | data_batch = DataBatch(data_all, label_all, pad=0, 337 | bucket_key=bucket_key, 338 | provide_data=provide_data, 339 | provide_label=provide_label) 340 | self.iterIndex += 1 341 | return data_batch 342 | 343 | def reset(self): 344 | self.iterIndex = 0 345 | self.bucket_curr_idx = [0 for x in self.source_data] 346 | -------------------------------------------------------------------------------- /nmt/masked_bucket_io_new.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=C0111,too-many-arguments,too-many-instance-attributes,too-many-locals,redefined-outer-name,fixme 2 | # pylint: disable=superfluous-parens, no-member, invalid-name 3 | import sys 4 | 5 | sys.path.insert(0, "../../python") 6 | import numpy as np 7 | import mxnet as mx 8 | from mxnet.io import DataBatch 9 | 10 | 11 | # The interface of a data iter that works for bucketing 12 | # 13 | # DataIter 14 | # - default_bucket_key: the bucket key for the default symbol. 15 | # 16 | # DataBatch 17 | # - provide_data: same as DataIter, but specific to this batch 18 | # - provide_label: same as DataIter, but specific to this batch 19 | # - bucket_key: the key for the bucket that should be used for this batch 20 | 21 | def default_read_content(path): 22 | with open(path) as ins: 23 | content = ins.read() 24 | content = content.replace('\n', ' ').replace('. 
', ' ') 25 | return content 26 | 27 | 28 | def default_build_vocab(path): 29 | content = default_read_content(path) 30 | content = content.split(' ') 31 | the_vocab = {} 32 | idx = 1 # 0 is left for zero-padding 33 | the_vocab[' '] = 0 # put a dummy element here so that len(vocab) is correct 34 | for word in content: 35 | if len(word) == 0: 36 | continue 37 | if not word in the_vocab: 38 | the_vocab[word] = idx 39 | idx += 1 40 | return the_vocab 41 | 42 | 43 | def default_text2id(sentence, the_vocab): 44 | words = sentence.split(' ') 45 | words = [the_vocab[w] for w in words if len(w) > 0] 46 | return words 47 | 48 | 49 | def default_gen_buckets(sentences, batch_size, the_vocab): 50 | len_dict = {} 51 | max_len = -1 52 | for sentence in sentences: 53 | words = default_text2id(sentence, the_vocab) 54 | if len(words) == 0: 55 | continue 56 | if len(words) > max_len: 57 | max_len = len(words) 58 | if len(words) in len_dict: 59 | len_dict[len(words)] += 1 60 | else: 61 | len_dict[len(words)] = 1 62 | print(len_dict) 63 | 64 | tl = 0 65 | buckets = [] 66 | for l, n in len_dict.items(): # TODO: There are better heuristic ways to do this 67 | if n + tl >= batch_size: 68 | buckets.append(l) 69 | tl = 0 70 | else: 71 | tl += n 72 | if tl > 0: 73 | buckets.append(max_len) 74 | return buckets 75 | 76 | 77 | class SimpleBatch(object): 78 | def __init__(self, data_names, data, label_names, label, bucket_key): 79 | self.data = data 80 | self.label = label 81 | self.data_names = data_names 82 | self.label_names = label_names 83 | self.bucket_key = bucket_key 84 | 85 | self.pad = 0 86 | self.index = None # TODO: what is index? 87 | 88 | @property 89 | def provide_data(self): 90 | return [(n, x.shape) for n, x in zip(self.data_names, self.data)] 91 | 92 | @property 93 | def provide_label(self): 94 | return [(n, x.shape) for n, x in zip(self.label_names, self.label)] 95 | 96 | 97 | class DummyIter(mx.io.DataIter): 98 | '''A dummy iterator that always return the same batch, used for speed testing''' 99 | 100 | def __init__(self, real_iter): 101 | super(DummyIter, self).__init__() 102 | self.real_iter = real_iter 103 | self.provide_data = real_iter.provide_data 104 | self.provide_label = real_iter.provide_label 105 | self.batch_size = real_iter.batch_size 106 | 107 | for batch in real_iter: 108 | self.the_batch = batch 109 | break 110 | 111 | def __iter__(self): 112 | return self 113 | 114 | def next(self): 115 | return self.the_batch 116 | 117 | 118 | class MaskedBucketSentenceIter(mx.io.DataIter): 119 | def __init__(self, source_path, target_path, source_vocab, target_vocab, 120 | buckets, batch_size, 121 | source_init_states, target_init_states, 122 | source_data_name='source', source_mask_name='source_mask', 123 | target_data_name='target', target_mask_name='target_mask', 124 | label_name='target_softmax_label', 125 | text2id=None, read_content=None, 126 | max_read_sample=sys.maxsize): 127 | super(MaskedBucketSentenceIter, self).__init__() 128 | 129 | if text2id is None: 130 | self.text2id = default_text2id 131 | else: 132 | self.text2id = text2id 133 | if read_content is None: 134 | self.read_content = default_read_content 135 | else: 136 | self.read_content = read_content 137 | 138 | source_sentences = self.read_content(source_path, max_read_sample) 139 | target_sentences = self.read_content(target_path, max_read_sample) 140 | assert len(source_sentences) == len(target_sentences) 141 | 142 | self.batch_size = batch_size 143 | self.source_data_name = source_data_name 144 | self.target_data_name = 
target_data_name 145 | self.label_name = label_name 146 | self.source_mask_name = source_mask_name 147 | self.target_mask_name = target_mask_name 148 | 149 | buckets.sort() 150 | self.buckets = buckets 151 | self.source_data = [[] for _ in buckets] 152 | self.target_data = [[] for _ in buckets] 153 | self.label_data = [[] for _ in buckets] 154 | self.source_mask_data = [[] for _ in buckets] 155 | self.target_mask_data = [[] for _ in buckets] 156 | 157 | # pre-allocate with the largest bucket for better memory sharing 158 | num_of_data = len(source_sentences) 159 | for i in range(num_of_data): 160 | source = source_sentences[i] 161 | target = [''] + target_sentences[i] 162 | label = target_sentences[i] + [''] 163 | source_sentence = self.text2id(source, source_vocab) 164 | target_sentence = self.text2id(target, target_vocab) 165 | label_id = self.text2id(label, target_vocab) 166 | if len(source_sentence) == 0 or len(target_sentence) == 0: 167 | continue 168 | for j, bkt in enumerate(buckets): 169 | if bkt[0] >= len(source) and bkt[1] >= len(target): 170 | self.source_data[j].append(source_sentence) 171 | self.target_data[j].append(target_sentence) 172 | self.label_data[j].append(label_id) 173 | break 174 | # we just ignore the sentence it is longer than the maximum 175 | # bucket size here 176 | source_data_clean = [] 177 | target_data_clean = [] 178 | label_data_clean = [] 179 | buckets_clean = [] 180 | for i in range(len(self.source_data)): 181 | if len(self.source_data[i]) > 0: 182 | source_data_clean.append(self.source_data[i]) 183 | target_data_clean.append(self.target_data[i]) 184 | label_data_clean.append(self.label_data[i]) 185 | buckets_clean.append(self.buckets[i]) 186 | 187 | self.source_data = source_data_clean 188 | self.target_data = target_data_clean 189 | self.label_data = label_data_clean 190 | self.buckets = buckets_clean 191 | del buckets 192 | self.default_bucket_key = max(self.buckets) 193 | 194 | # convert data into ndarrays for better speed during training 195 | source_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 196 | source_mask_data = [np.zeros((len(x), self.buckets[i][0])) for i, x in enumerate(self.source_data)] 197 | target_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 198 | target_mask_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.target_data)] 199 | label_data = [np.zeros((len(x), self.buckets[i][1])) for i, x in enumerate(self.label_data)] 200 | for i_bucket in range(len(self.buckets)): 201 | for j in range(len(self.source_data[i_bucket])): 202 | source = self.source_data[i_bucket][j] 203 | target = self.target_data[i_bucket][j] 204 | label = self.label_data[i_bucket][j] 205 | source_data[i_bucket][j, :len(source)] = source 206 | source_mask_data[i_bucket][j, :len(source)] = 1 207 | target_data[i_bucket][j, :len(target)] = target 208 | target_mask_data[i_bucket][j, :len(target)] = 1 209 | label_data[i_bucket][j, :len(label)] = label 210 | self.source_data = source_data 211 | self.source_mask_data = source_mask_data 212 | self.target_data = target_data 213 | self.target_mask_data = target_mask_data 214 | self.label_data = label_data 215 | 216 | # Get the size of each bucket, so that we could sample 217 | # uniformly from the bucket 218 | bucket_sizes = [len(x) for x in self.source_data] 219 | 220 | print("Summary of dataset ==================") 221 | total_count = 0 222 | for bkt, size in zip(self.buckets, bucket_sizes): 223 | print("bucket of {0} : 
{1} samples".format(bkt, size)) 224 | total_count += size 225 | print('Total: {0} ({1}) in {2} buckets'.format(total_count, num_of_data, len(self.buckets))) 226 | 227 | self.make_data_iter_plan() 228 | 229 | self.source_init_states = source_init_states 230 | self.target_init_states = target_init_states 231 | self.source_init_state_arrays = [mx.nd.zeros(x[1]) for x in source_init_states] 232 | self.target_init_state_arrays = [mx.nd.zeros(x[1]) for x in target_init_states] 233 | 234 | self.provide_data = [(source_data_name, (batch_size, self.default_bucket_key[0])), 235 | (source_mask_name, (batch_size, self.default_bucket_key[0])), 236 | (target_data_name, (batch_size, self.default_bucket_key[1])), 237 | # (target_mask_name, (batch_size, self.default_bucket_key[1])) 238 | ] + source_init_states + target_init_states 239 | 240 | self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key[1]))] 241 | 242 | def make_data_iter_plan(self): 243 | "make a random data iteration plan" 244 | # truncate each bucket into multiple of batch-size 245 | bucket_n_batches = [] 246 | for i in range(len(self.source_data)): 247 | bucket_n_batches.append(len(self.source_data[i]) / self.batch_size) 248 | self.source_data[i] = self.source_data[i][:int(bucket_n_batches[i] * self.batch_size)] 249 | self.source_mask_data[i] = self.source_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 250 | self.target_data[i] = self.target_data[i][:int(bucket_n_batches[i] * self.batch_size)] 251 | self.target_mask_data[i] = self.target_mask_data[i][:int(bucket_n_batches[i] * self.batch_size)] 252 | 253 | bucket_plan = np.hstack([np.zeros(int(n), int) + i for i, n in enumerate(bucket_n_batches)]) 254 | np.random.shuffle(bucket_plan) 255 | 256 | bucket_idx_all = [np.random.permutation(len(x)) for x in self.source_data] 257 | 258 | self.bucket_plan = bucket_plan 259 | self.bucket_idx_all = bucket_idx_all 260 | self.bucket_curr_idx = [0 for _ in self.source_data] 261 | 262 | self.source_data_buffer = [] 263 | self.source_mask_data_buffer = [] 264 | self.target_data_buffer = [] 265 | self.target_mask_data_buffer = [] 266 | self.label_buffer = [] 267 | for i_bucket in range(len(self.source_data)): 268 | source_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 269 | source_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][0])) 270 | target_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 271 | target_mask_data = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 272 | label = np.zeros((self.batch_size, self.buckets[i_bucket][1])) 273 | 274 | self.source_data_buffer.append(source_data) 275 | self.source_mask_data_buffer.append(source_mask_data) 276 | self.target_data_buffer.append(target_data) 277 | self.target_mask_data_buffer.append(target_mask_data) 278 | self.label_buffer.append(label) 279 | self.iterIndex = 0 280 | 281 | def next(self): 282 | """Get next data batch from iterator. 283 | 284 | Returns 285 | ------- 286 | DataBatch 287 | The data of next batch. 
288 | 289 | Raises 290 | ------ 291 | StopIteration 292 | If the end of the data is reached 293 | """ 294 | if self.iterIndex == len(self.bucket_plan): 295 | raise StopIteration 296 | 297 | i_bucket = self.bucket_plan[self.iterIndex] 298 | 299 | source_data = self.source_data_buffer[i_bucket] 300 | source_mask_data = self.source_mask_data_buffer[i_bucket] 301 | target_data = self.target_data_buffer[i_bucket] 302 | target_mask_data = self.target_mask_data_buffer[i_bucket] 303 | label = self.label_buffer[i_bucket] 304 | 305 | i_idx = self.bucket_curr_idx[i_bucket] 306 | idx = self.bucket_idx_all[i_bucket][i_idx:i_idx + self.batch_size] 307 | self.bucket_curr_idx[i_bucket] += self.batch_size 308 | source_data[:] = self.source_data[i_bucket][idx] 309 | source_mask_data[:] = self.source_mask_data[i_bucket][idx] 310 | target_data[:] = self.target_data[i_bucket][idx] 311 | target_mask_data[:] = self.target_mask_data[i_bucket][idx] 312 | label[:] = self.label_data[i_bucket][idx] 313 | 314 | data_all = [mx.nd.array(source_data), mx.nd.array(source_mask_data)] + \ 315 | [mx.nd.array(target_data), 316 | # mx.nd.array(target_mask_data) 317 | ] + \ 318 | self.source_init_state_arrays + self.target_init_state_arrays 319 | label_all = [mx.nd.array(label)] 320 | 321 | bucket_key = self.buckets[i_bucket] 322 | provide_data = [(self.source_data_name, (self.batch_size, bucket_key[0])), 323 | (self.source_mask_name, (self.batch_size, bucket_key[0])), 324 | (self.target_data_name, (self.batch_size, bucket_key[1])), 325 | # (self.target_mask_name, (self.batch_size, bucket_key[1])) 326 | ] + self.source_init_states + self.target_init_states 327 | provide_label = [(self.label_name, (self.batch_size, bucket_key[1]))] 328 | 329 | data_batch = DataBatch(data_all, label_all, pad=0, 330 | bucket_key=bucket_key, 331 | provide_data=provide_data, 332 | provide_label=provide_label) 333 | self.iterIndex += 1 334 | return data_batch 335 | 336 | def reset(self): 337 | self.iterIndex = 0 338 | self.bucket_curr_idx = [0 for _ in self.source_data] 339 | -------------------------------------------------------------------------------- /nmt/tester.py: -------------------------------------------------------------------------------- 1 | import xconfig 2 | from inference import BiS2SInferenceModel 3 | from inference_mask import BiS2SInferenceModel_mask 4 | from xutils import read_content, load_vocab, sentence2id, word2id 5 | 6 | import mxnet as mx 7 | import numpy as np 8 | import logging 9 | import random 10 | import bisect 11 | from collections import OrderedDict, namedtuple 12 | from mxwrap.rnn.LSTM import LSTMState 13 | 14 | BeamNode = namedtuple("BeamNode", ["father", "content", "score", "acc_score", "finish", "finishLen"]) 15 | 16 | random_sample = False 17 | 18 | 19 | def get_inference_models(buckets, arg_params, source_vocab_size, target_vocab_size, ctx, batch_size): 20 | # build an inference model 21 | model_buckets = OrderedDict() 22 | for bucket in buckets: 23 | model_buckets[bucket] = BiS2SInferenceModel_mask(s_num_lstm_layer=xconfig.num_lstm_layer, s_seq_len=bucket[0], 24 | s_vocab_size=source_vocab_size + 1, 25 | s_num_hidden=xconfig.num_hidden, s_num_embed=xconfig.num_embed, 26 | s_dropout=0, 27 | t_num_lstm_layer=xconfig.num_lstm_layer, t_seq_len=bucket[1], 28 | t_vocab_size=target_vocab_size + 1, 29 | t_num_hidden=xconfig.num_hidden, t_num_embed=xconfig.num_embed, 30 | t_num_label=target_vocab_size + 1, t_dropout=0, 31 | arg_params=arg_params, 32 | use_masking=True, 33 | ctx=ctx, batch_size=batch_size) 34 | return 
model_buckets 35 | 36 | 37 | def get_bucket_model(model_buckets, input_len): 38 | for bucket, m in model_buckets.items(): 39 | if bucket[0] >= input_len: 40 | return m 41 | return None 42 | 43 | 44 | # helper strcuture for prediction 45 | def MakeRevertVocab(vocab): 46 | dic = {} 47 | for k, v in vocab.items(): 48 | dic[v] = k 49 | return dic 50 | 51 | 52 | # make input from char 53 | def MakeInput(sentence, vocab, unroll_len, data_arr, mask_arr): 54 | idx = sentence2id(sentence, vocab) 55 | tmp = np.zeros((1, unroll_len)) 56 | mask = np.zeros((1, unroll_len)) 57 | for i in range(min(len(idx), unroll_len)): 58 | tmp[0][i] = idx[i] 59 | mask[0][i] = 1 60 | data_arr[:] = tmp 61 | mask_arr[:] = mask 62 | 63 | 64 | def MakeInput_beam(sentence, vocab, unroll_len, data_arr, mask_arr, beam_size): 65 | idx = sentence2id(sentence, vocab) 66 | tmp = np.zeros((beam_size, unroll_len)) 67 | mask = np.zeros((beam_size, unroll_len)) 68 | for i in range(min(len(idx), unroll_len)): 69 | for j in range(beam_size): 70 | tmp[j][i] = idx[i] 71 | mask[j][i] = 1 72 | data_arr[:] = tmp 73 | mask_arr[:] = mask 74 | 75 | 76 | def MakeInput_batch(sentences, vocab, unroll_len, data_arr, mask_arr, batch_size): 77 | tmp = np.zeros((batch_size, unroll_len)) 78 | mask = np.zeros((batch_size, unroll_len)) 79 | actual_sample_num = len(sentences) 80 | for i in range(min(batch_size, actual_sample_num)): 81 | idx = sentence2id(sentences[i], vocab) 82 | for j in range(min(len(idx), unroll_len)): 83 | tmp[i][j] = idx[j] 84 | mask[i][j] = 1 85 | data_arr[:] = tmp 86 | mask_arr[:] = mask 87 | 88 | 89 | def MakeTargetInput(char, vocab, arr): 90 | idx = word2id(char, vocab) 91 | tmp = np.zeros((1,)) 92 | tmp[0] = idx 93 | arr[:] = tmp 94 | 95 | 96 | def MakeTargetInput_batch(chars, vocab, arr, batch_size): 97 | tmp = np.zeros((batch_size,)) 98 | actual_sample_num = len(chars) 99 | for idx in range(min(batch_size, actual_sample_num)): 100 | word_id = word2id(chars[idx], vocab) 101 | tmp[idx] = word_id 102 | arr[:] = tmp 103 | 104 | 105 | def MakeTargetInput_beam(beam_nodes, vocab, arr): 106 | tmp = np.zeros((len(beam_nodes),)) 107 | for idx in range(len(beam_nodes)): 108 | word_id = vocab[beam_nodes[idx].content] if beam_nodes[idx].content in vocab else vocab[''] 109 | tmp[idx] = word_id 110 | arr[:] = tmp 111 | 112 | 113 | # helper function for random sample 114 | def _cdf(weights): 115 | total = sum(weights) 116 | result = [] 117 | cumsum = 0 118 | for w in weights: 119 | cumsum += w 120 | result.append(cumsum / total) 121 | return result 122 | 123 | 124 | def _choice(population, weights): 125 | assert len(population) == len(weights) 126 | cdf_vals = _cdf(weights) 127 | x = random.random() 128 | idx = bisect.bisect(cdf_vals, x) 129 | return population[idx] 130 | 131 | 132 | # we can use random output or fixed output by choosing largest probability 133 | def MakeOutput(prob, vocab, sample=False, temperature=1.): 134 | if sample == False: 135 | idx = np.argmax(prob, axis=1)[0] 136 | else: 137 | fix_dict = [""] + [vocab[i] for i in range(1, len(vocab) + 1)] 138 | scale_prob = np.clip(prob, 1e-6, 1 - 1e-6) 139 | rescale = np.exp(np.log(scale_prob) / temperature) 140 | rescale[:] /= rescale.sum() 141 | return _choice(fix_dict, rescale[0, :]) 142 | try: 143 | char = vocab[idx] 144 | except: 145 | char = '' 146 | return char 147 | 148 | 149 | # we can use random output or fixed output by choosing largest probability 150 | def MakeOutput_batch(probs, vocab, sample=False, temperature=1.): 151 | res = [] 152 | for i in 
range(probs.shape[0]): 153 | prob = probs[i] 154 | if sample == False: 155 | idx = np.argmax(prob) 156 | else: 157 | fix_dict = [""] + [vocab[i] for i in range(1, len(vocab) + 1)] 158 | scale_prob = np.clip(prob, 1e-6, 1 - 1e-6) 159 | rescale = np.exp(np.log(scale_prob) / temperature) 160 | rescale[:] /= rescale.sum() 161 | return _choice(fix_dict, rescale[0, :]) 162 | try: 163 | char = vocab[idx] 164 | except: 165 | char = '' 166 | res.append(char) 167 | return res 168 | 169 | 170 | def translate_one(max_decode_len, sentence, model_buckets, unroll_len, source_vocab, target_vocab, revert_vocab, 171 | target_ndarray): 172 | input_length = len(sentence) 173 | cur_model = get_bucket_model(model_buckets, input_length) 174 | input_ndarray = mx.nd.zeros((1, unroll_len)) 175 | mask_ndarray = mx.nd.zeros((1, unroll_len)) 176 | output = [''] 177 | MakeInput(sentence, source_vocab, unroll_len, input_ndarray, mask_ndarray) 178 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 179 | mask_ndarray) # last_encoded means the last time step hidden 180 | for i in range(max_decode_len): 181 | MakeTargetInput(output[-1], target_vocab, target_ndarray) 182 | prob, attention_weights = cur_model.decode_forward(last_encoded, all_encoded, mask_ndarray, target_ndarray, 183 | i == 0) 184 | next_char = MakeOutput(prob, revert_vocab, random_sample) 185 | if next_char == '': 186 | break 187 | output.append(next_char) 188 | return output[1:] 189 | 190 | 191 | def translate_greedy_batch(max_decode_len, sentences, batch_size, model_buckets, unroll_len, source_vocab, target_vocab, 192 | revert_vocab, target_ndarray): 193 | cur_model = get_bucket_model(model_buckets, unroll_len) 194 | input_ndarray = mx.nd.zeros((batch_size, unroll_len)) 195 | mask_ndarray = mx.nd.zeros((batch_size, unroll_len)) 196 | output = [[''] * batch_size] 197 | MakeInput_batch(sentences, source_vocab, unroll_len, input_ndarray, mask_ndarray, batch_size) 198 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 199 | mask_ndarray) # last_encoded means the last time step hidden 200 | for i in range(max_decode_len): 201 | MakeTargetInput_batch(output[-1], target_vocab, target_ndarray, batch_size) 202 | probs, attention_weights = cur_model.decode_forward(last_encoded, all_encoded, mask_ndarray, target_ndarray, 203 | i == 0) 204 | next_chars = MakeOutput_batch(probs, revert_vocab, random_sample) 205 | finished = [ch == '' for ch in next_chars] 206 | if all(finished): 207 | break 208 | output.append(next_chars) 209 | return output[1:] 210 | 211 | 212 | def _smallest(matrix, k, only_first_row=False): 213 | """Find k smallest elements of a matrix. 214 | 215 | Parameters 216 | ---------- 217 | matrix : :class:`numpy.ndarray` 218 | The matrix. 219 | k : int 220 | The number of smallest elements required. 221 | only_first_row : bool, optional 222 | Consider only elements of the first row. 223 | 224 | Returns 225 | ------- 226 | Tuple of ((row numbers, column numbers), values). 
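        Example (illustrative): for ``matrix = [[3., 1.], [2., 0.]]`` and ``k = 2``,
        the two smallest values are 0. and 1., at (row, column) positions (1, 1)
        and (0, 1), so the call returns ((array([1, 0]), array([1, 1])), array([0., 1.])).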
227 | 228 | """ 229 | if only_first_row: 230 | flatten = matrix[:1, :].flatten() 231 | else: 232 | flatten = matrix.flatten() 233 | # flatten = -flatten 234 | args = np.argpartition(flatten, k)[:k] 235 | args = args[np.argsort(flatten[args])] 236 | return np.unravel_index(args, matrix.shape), flatten[args] 237 | 238 | 239 | def translate_one_with_beam(max_decode_len, sentence, model_buckets, unroll_len, source_vocab, target_vocab, 240 | revert_vocab, target_ndarray, beam_size, eos_index): 241 | input_length = len(sentence) 242 | cur_model = get_bucket_model(model_buckets, input_length) 243 | input_ndarray = mx.nd.zeros((beam_size, unroll_len)) 244 | mask_ndarray = mx.nd.zeros((beam_size, unroll_len)) 245 | 246 | beam = [[BeamNode(father=-1, content='', score=0.0, acc_score=0.0, finish=False, finishLen=0) for i in 247 | range(beam_size)]] 248 | beam_state = [None] 249 | 250 | MakeInput_beam(sentence, source_vocab, unroll_len, input_ndarray, mask_ndarray, beam_size) 251 | last_encoded, all_encoded = cur_model.encode(input_ndarray, 252 | mask_ndarray) # last_encoded means the last time step hidden 253 | for i in range(max_decode_len): 254 | MakeTargetInput_beam(beam[-1], target_vocab, target_ndarray) 255 | prob, attention_weights, new_state = cur_model.decode_forward_with_state(last_encoded, all_encoded, 256 | mask_ndarray, target_ndarray, 257 | beam_state[-1], i == 0) 258 | log_prob = -mx.ndarray.log(prob) 259 | finished_beam = [t for t, x in enumerate(beam[-1]) if x.finish] 260 | for idx in range(beam_size): 261 | # log_prob[idx] = mx.nd.add(log_prob[idx], beam[-1][idx].score) 262 | if not beam[-1][idx].finish: 263 | # log_prob[idx] += beam[-1][idx].acc_score 264 | log_prob[idx] = (log_prob[idx] + beam[-1][idx].acc_score * beam[-1][idx].finishLen) / ( 265 | beam[-1][idx].finishLen + 1) 266 | else: 267 | # log_prob[idx] = beam[-1][idx].acc_score 268 | log_prob[idx] = beam[-1][idx].acc_score 269 | for idx in finished_beam: 270 | log_prob[idx][:eos_index] = np.inf 271 | log_prob[idx][eos_index + 1:] = np.inf 272 | 273 | (indexes, outputs), chosen_costs = _smallest(log_prob.asnumpy(), beam_size, only_first_row=(i == 0)) 274 | next_chars = [revert_vocab[idx] if idx in revert_vocab else '' for idx in outputs] 275 | 276 | next_state_h = mx.nd.empty(new_state.h.shape, ctx=mx.gpu(0)) 277 | next_state_c = mx.nd.empty(new_state.c.shape, ctx=mx.gpu(0)) 278 | for idx in range(beam_size): 279 | next_state_h[idx] = new_state.h[np.asscalar(indexes[idx])] 280 | next_state_c[idx] = new_state.c[np.asscalar(indexes[idx])] 281 | next_state = LSTMState(c=next_state_c, h=next_state_h) 282 | beam_state.append(next_state) 283 | 284 | next_beam = [BeamNode(father=indexes[idx], 285 | content=next_chars[idx] if not beam[-1][indexes[idx]].finish else beam[-1][ 286 | indexes[idx]].content, 287 | score=chosen_costs[idx] - beam[-1][indexes[idx]].acc_score, 288 | acc_score=chosen_costs[idx], 289 | finish=(next_chars[idx] == '' or beam[-1][indexes[idx]].finish), 290 | finishLen=(beam[-1][indexes[idx]].finishLen if beam[-1][indexes[idx]].finish else ( 291 | beam[-1][indexes[idx]].finishLen + 1))) for 292 | idx in range(beam_size)] 293 | beam.append(next_beam) 294 | finished = [node.finish for node in beam[-1]] 295 | if all(finished): 296 | break 297 | # output.append(next_char) 298 | all_result = [] 299 | all_score = [] 300 | for aaa in range(beam_size): 301 | ptr = aaa 302 | result = [] 303 | 304 | for idx in range(len(beam) - 1 - 1, 0, -1): 305 | word = beam[idx][ptr].content 306 | if word != '': 307 | result.append(word) 
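            # follow each node's `father` back-pointer into the previous beam step; the collected
            # words run from the last decoding step back to the first and are reversed below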
308 | ptr = beam[idx][ptr].father 309 | result = result[::-1] 310 | all_result.append(' '.join(result)) 311 | all_score.append(beam[-1][aaa].acc_score) 312 | 313 | return all_result, all_score 314 | 315 | 316 | def test_on_file_iwslt(input_file, output_file, model_buckets, source_vocab, target_vocab, revert_vocab, ctx, 317 | unroll_len, 318 | max_decode_len, 319 | do_beam=False, 320 | beam_size=1): 321 | beam_file = open(output_file + '_beam', 'w', encoding='utf-8') if do_beam else None 322 | batch_size = beam_size if do_beam else 1 323 | eos_index = target_vocab[xconfig.eos_word] 324 | target_ndarray = mx.nd.zeros((batch_size,), ctx=ctx) 325 | read_count = 0 326 | with open(input_file, mode='r', encoding='utf-8') as f, open(output_file, 'w', encoding='utf-8') as of: 327 | for line in f: 328 | read_count += 1 329 | if (read_count - 1) % (xconfig.bleu_ref_number + 2) != 0: 330 | continue 331 | 332 | ch = line.split(' |||| ')[0].strip().split(' ') 333 | if do_beam: 334 | # en = translate_one_with_beam(ch, model_buckets, beam_size) 335 | all_en, all_score = translate_one_with_beam(max_decode_len, ch, model_buckets, unroll_len, source_vocab, 336 | target_vocab, revert_vocab, target_ndarray, 337 | beam_size, eos_index) 338 | en = all_en[0] 339 | else: 340 | en = translate_one(max_decode_len, ch, model_buckets, unroll_len, source_vocab, target_vocab, 341 | revert_vocab, 342 | target_ndarray) 343 | en = ' '.join(en) 344 | of.write(en + '\n') 345 | if do_beam: 346 | for idx in range(len(all_en)): 347 | beam_file.write('{0}\t{1}\n'.format(all_en[idx], all_score[idx])) 348 | beam_file.write('\n') 349 | if beam_file: 350 | beam_file.close() 351 | 352 | 353 | def test_on_file_greedy_batch_iwslt(input_file, output_file, model_buckets, source_vocab, target_vocab, revert_vocab, 354 | ctx, 355 | unroll_len, max_decode_len, batch_size): 356 | with open(input_file, mode='r', encoding='utf-8') as f: 357 | lines = f.read().splitlines() 358 | lines = lines[0::(xconfig.bleu_ref_number + 2)] 359 | input_sents = [line.split(' |||| ')[0].strip().split(' ') for line in lines] 360 | batch_sents = [input_sents[i: i + batch_size] for i in range(0, len(input_sents), batch_size)] 361 | eos_index = target_vocab[xconfig.eos_word] 362 | with open(output_file, 'w', encoding='utf-8') as of: 363 | for batch in batch_sents: 364 | target_ndarray = mx.nd.zeros((batch_size,), ctx=ctx) 365 | output_sents = translate_greedy_batch(max_decode_len, batch, batch_size, 366 | model_buckets, unroll_len, source_vocab, 367 | target_vocab, revert_vocab, target_ndarray) 368 | for i in range(len(batch)): 369 | tmp = [] 370 | for j in range(len(output_sents)): 371 | word = output_sents[j][i] 372 | if word == xconfig.eos_word: 373 | break 374 | tmp.append(word) 375 | of.write(' '.join(tmp) + '\n') 376 | 377 | 378 | def test(): 379 | # load vocabulary 380 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 381 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 382 | 383 | revert_vocab = MakeRevertVocab(target_vocab) 384 | 385 | print('source_vocab size: {0}'.format(len(source_vocab))) 386 | print('target_vocab size: {0}'.format(len(target_vocab))) 387 | 388 | # load model from check-point 389 | _, arg_params, __ = mx.model.load_checkpoint(xconfig.model_to_load_prefix, xconfig.model_to_load_number) 390 | 391 | buckets = xconfig.buckets 392 | buckets = [max(buckets)] 393 | 394 | if xconfig.use_batch_greedy_search: 395 | if xconfig.use_beam_search: 396 | logging.warning( 397 | 
'use_batch_greedy_search and use_beam_search both True, fallback to use_batch_greedy_search') 398 | 399 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 400 | xconfig.test_device, batch_size=xconfig.greedy_batch_size) 401 | test_on_file_greedy_batch_iwslt(input_file=xconfig.test_source, output_file=xconfig.test_output, 402 | model_buckets=model_buckets, 403 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 404 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], 405 | max_decode_len=xconfig.max_decode_len, batch_size=xconfig.greedy_batch_size) 406 | else: 407 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 408 | xconfig.test_device, batch_size=xconfig.beam_size) 409 | test_on_file_iwslt(input_file=xconfig.test_source, output_file=xconfig.test_output, model_buckets=model_buckets, 410 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 411 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], max_decode_len=xconfig.max_decode_len, 412 | do_beam=xconfig.use_beam_search, beam_size=xconfig.beam_size) 413 | 414 | del model_buckets 415 | from xmetric import get_bleu 416 | raw_output, scores = get_bleu(xconfig.test_gold, xconfig.test_output) 417 | logging.info(raw_output) 418 | logging.info(str(scores)) 419 | 420 | 421 | def test_use_model_param(arg_params, test_file, output_file, gold_file, use_beam=False, beam_size=-1): 422 | # load vocabulary 423 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 424 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 425 | 426 | revert_vocab = MakeRevertVocab(target_vocab) 427 | 428 | buckets = xconfig.buckets 429 | buckets = [max(buckets)] 430 | b_size = beam_size if use_beam else xconfig.greedy_batch_size 431 | model_buckets = get_inference_models(buckets, arg_params, len(source_vocab), len(target_vocab), 432 | xconfig.test_device, batch_size=b_size) 433 | if use_beam: 434 | test_on_file_iwslt(input_file=test_file, output_file=output_file, model_buckets=model_buckets, 435 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 436 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], max_decode_len=xconfig.max_decode_len, 437 | do_beam=use_beam, beam_size=beam_size) 438 | else: 439 | test_on_file_greedy_batch_iwslt(input_file=test_file, output_file=output_file, model_buckets=model_buckets, 440 | source_vocab=source_vocab, target_vocab=target_vocab, revert_vocab=revert_vocab, 441 | ctx=xconfig.test_device, unroll_len=max(buckets)[0], 442 | max_decode_len=xconfig.max_decode_len, batch_size=xconfig.greedy_batch_size) 443 | from xmetric import get_bleu 444 | raw_output, score = get_bleu(gold_file, output_file) 445 | logging.info(raw_output) 446 | del model_buckets 447 | return score 448 | -------------------------------------------------------------------------------- /nmt/trainer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | import mxnet as mx 4 | import logging 5 | 6 | import xconfig 7 | from xsymbol import sym_gen 8 | from xcallback import BatchCheckpoint, CheckBLEUBatch 9 | from xutils import read_content, load_vocab, sentence2id 10 | from xmetric import Perplexity, MyMakeLoss 11 | # from masked_bucket_io import MaskedBucketSentenceIter 12 | from masked_bucket_io_new import MaskedBucketSentenceIter 13 | 14 | 15 | def get_GRU_shape(): 16 | # 
initalize states for LSTM 17 | 18 | forward_source_init_h = [('forward_source_l%d_init_h' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 19 | range(xconfig.num_lstm_layer)] 20 | backward_source_init_h = [('backward_source_l%d_init_h' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 21 | range(xconfig.num_lstm_layer)] 22 | source_init_states = forward_source_init_h + backward_source_init_h 23 | 24 | target_init_c = [('target_l%d_init_c' % l, (xconfig.batch_size, xconfig.num_hidden)) for l in 25 | range(xconfig.num_lstm_layer)] 26 | # target_init_h = [('target_l%d_init_h' % l, (batch_size, num_hidden)) for l in range(num_lstm_layer)] 27 | target_init_states = [] 28 | return source_init_states, target_init_states 29 | 30 | 31 | def train(): 32 | # load vocabulary 33 | source_vocab = load_vocab(xconfig.source_vocab_path, xconfig.special_words) 34 | target_vocab = load_vocab(xconfig.target_vocab_path, xconfig.special_words) 35 | 36 | logging.info('source_vocab size: {0}'.format(len(source_vocab))) 37 | logging.info('target_vocab size: {0}'.format(len(target_vocab))) 38 | 39 | # get states shapes 40 | source_init_states, target_init_states = get_GRU_shape() 41 | # source_init_states, target_init_states = get_LSTM_shape() 42 | 43 | # build data iterator 44 | data_train = MaskedBucketSentenceIter(xconfig.train_source, xconfig.train_target, source_vocab, target_vocab, 45 | xconfig.buckets, xconfig.batch_size, 46 | source_init_states, target_init_states, 47 | text2id=sentence2id, read_content=read_content, 48 | max_read_sample=xconfig.train_max_samples) 49 | 50 | # data_dev = MaskedBucketSentenceIter(xconfig.dev_source, xconfig.dev_source, source_vocab, target_vocab, 51 | # xconfig.buckets, xconfig.batch_size, 52 | # source_init_states, target_init_states, seperate_char='\n', 53 | # text2id=sentence2id, read_content=read_content, 54 | # max_read_sample=xconfig.dev_max_samples) 55 | 56 | # Train a LSTM network as simple as feedforward network 57 | # optimizer = mx.optimizer.AdaDelta(clip_gradient=10.0) 58 | optimizer = mx.optimizer.Adam(clip_gradient=10.0, rescale_grad=1.0 / xconfig.batch_size) 59 | # optimizer = mx.optimizer.SGD(clip_gradient=10, learning_rate=0.01, rescale_grad=1.0 / xconfig.batch_size) 60 | _arg_params = None 61 | 62 | if xconfig.use_resuming: 63 | logging.info("Try resuming from {0} {1}".format(xconfig.resume_model_prefix, xconfig.resume_model_number)) 64 | try: 65 | _, __arg_params, __ = mx.model.load_checkpoint(xconfig.resume_model_prefix, xconfig.resume_model_number) 66 | logging.info("Resume succeeded.") 67 | _arg_params = __arg_params 68 | except: 69 | logging.error('Resume failed.') 70 | 71 | model = mx.mod.BucketingModule( 72 | sym_gen=sym_gen(len(source_vocab) + 1, len(target_vocab) + 1), 73 | default_bucket_key=data_train.default_bucket_key, 74 | context=xconfig.train_device, 75 | ) 76 | 77 | # Fit it 78 | model.fit(train_data=data_train, 79 | # eval_metric=mx.metric.np(Perplexity), 80 | eval_metric=mx.metric.CustomMetric(Perplexity), 81 | # eval_metric=mx.metric.np(MyMakeLoss), 82 | batch_end_callback=[mx.callback.Speedometer(xconfig.batch_size, xconfig.show_every_x_batch), ], 83 | # optimizer='sgd', 84 | # optimizer_params={'clip_gradient': 10.0, }, 85 | initializer=mx.init.Xavier(factor_type="in", magnitude=2.34, rnd_type='gaussian'), 86 | optimizer=optimizer, 87 | num_epoch=10, 88 | ) 89 | -------------------------------------------------------------------------------- /nmt/xcallback.py: 
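A note on `get_GRU_shape` in `nmt/trainer.py` above: only the bidirectional encoder gets explicit initial-state placeholders; the decoder's initial state is produced from the encoder output, so `target_init_states` is left empty. With the defaults in `nmt/xconfig.py` further below (batch_size = 64, num_hidden = 512, num_lstm_layer = 1), the descriptors handed to `MaskedBucketSentenceIter` expand to the following illustrative values (not a file in the repository):

```
source_init_states = [('forward_source_l0_init_h', (64, 512)),
                      ('backward_source_l0_init_h', (64, 512))]
target_init_states = []   # decoder init state comes from the encoder, not from the iterator
```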
-------------------------------------------------------------------------------- 1 | import xconfig 2 | 3 | import mxnet as mx 4 | import logging 5 | 6 | 7 | class BatchCheckpoint(object): 8 | def __init__(self, save_name, per_x_batch): 9 | self.save_name = save_name 10 | self.per_x_batch = per_x_batch 11 | from mxnet.model import save_checkpoint 12 | self._save = save_checkpoint 13 | 14 | def __call__(self, params): 15 | # batch_end_params = BatchEndParam(epoch=epoch, 16 | # nbatch=nbatch, 17 | # eval_metric=eval_metric, 18 | # locals=locals()) 19 | 20 | if params.nbatch % self.per_x_batch == 0: 21 | executor_manager = params.locals['executor_manager'] 22 | param_names = executor_manager.param_names 23 | param_arrays = executor_manager.param_arrays 24 | 25 | param_dict = {} 26 | for idx, name in enumerate(param_names): 27 | param_dict[name] = param_arrays[idx][0] 28 | 29 | self._save(self.save_name, 0, params.locals['symbol'], 30 | param_dict, params.locals['aux_params']) 31 | # TODO is this the correct way to save aux_params ? 32 | 33 | 34 | class CheckBLEUBatch(object): 35 | def __init__(self, start_epoch, per_batch, use_beam=False, beam_size=-1): 36 | self.best_bleu = -1.0 37 | self.best_epoch = -1 38 | self.start_epoch = start_epoch 39 | self.per_batch = per_batch 40 | self.use_beam_search = use_beam 41 | self.beam_size = beam_size 42 | from mxnet.model import save_checkpoint 43 | self._save = save_checkpoint 44 | # TODO ugly code 2333 45 | from tester import test_use_model_param 46 | self.bleu_computer = test_use_model_param 47 | 48 | def __call__(self, params): 49 | # batch_end_params = BatchEndParam(epoch=epoch, 50 | # nbatch=nbatch, 51 | # eval_metric=eval_metric, 52 | # locals=locals()) 53 | 54 | if params.nbatch % self.per_batch == 0: 55 | if params.epoch < self.start_epoch: 56 | print('Too early to check BLEU at epoch {0}'.format(params.epoch)) 57 | return 58 | logging.info('Checking BLEU for epoch {0} batch {1}'.format(params.epoch, params.nbatch)) 59 | gold = xconfig.dev_source 60 | test = xconfig.dev_source 61 | output = xconfig.dev_output 62 | 63 | executor_manager = params.locals['executor_manager'] 64 | param_names = executor_manager.param_names 65 | param_arrays = executor_manager.param_arrays 66 | 67 | param_dict = {} 68 | for idx, name in enumerate(param_names): 69 | param_dict[name] = param_arrays[idx][0] 70 | 71 | cur_rouge = self.bleu_computer(arg_params=param_dict, test_file=test, output_file=output, gold_file=gold, 72 | use_beam=self.use_beam_search, beam_size=self.beam_size) 73 | logging.info('BLEU: {0} @ epoch {1} batch {2}'.format(cur_rouge, params.epoch, params.nbatch)) 74 | 75 | if cur_rouge > self.best_bleu: 76 | logging.info( 77 | 'Current BLEU: {0} > prev best {1} in epoch {2}'.format(cur_rouge, self.best_bleu, 78 | self.best_epoch)) 79 | self.best_bleu = cur_rouge 80 | self.best_epoch = params.epoch 81 | logging.info('Saving...') 82 | self._save("best_bleu", params.epoch + 1, params.locals['symbol'], 83 | param_dict, params.locals['aux_params']) 84 | # TODO is this the correct way to save aux_params ? 
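`trainer.train()` above only registers a `Speedometer` as its `batch_end_callback`, so the two callbacks in this file are not wired in as committed. Below is a minimal sketch of how they could be constructed, using only names that exist in `nmt/xconfig.py` (next file below); note that both callbacks read `params.locals['executor_manager']` and `params.locals['symbol']`, which the older `mx.model`-style training loop provides but `BucketingModule.fit` may not:

```
import mxnet as mx
import xconfig
from xcallback import BatchCheckpoint, CheckBLEUBatch

batch_callbacks = [
    mx.callback.Speedometer(xconfig.batch_size, xconfig.show_every_x_batch),
    # dump a rolling checkpoint every checkpoint_freq_batch batches
    BatchCheckpoint(save_name=xconfig.checkpoint_name,
                    per_x_batch=xconfig.checkpoint_freq_batch),
    # decode the dev set and keep the best-BLEU model every eval_per_x_batch batches
    CheckBLEUBatch(start_epoch=xconfig.eval_start_epoch,
                   per_batch=xconfig.eval_per_x_batch,
                   use_beam=xconfig.use_beam_search,
                   beam_size=xconfig.beam_size),
]
```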
85 | -------------------------------------------------------------------------------- /nmt/xconfig.py: -------------------------------------------------------------------------------- 1 | import os 2 | import mxnet as mx 3 | 4 | # path 5 | source_root = os.path.abspath(os.path.join(os.getcwd(), os.path.pardir)) 6 | data_root = os.path.join(source_root, 'IWSLT') 7 | model_root = os.path.join(source_root, 'IWSLT', 'model') 8 | log_root = os.path.join(source_root, 'IWSLT', 'log') 9 | 10 | if not os.path.exists(model_root): 11 | os.makedirs(model_root) 12 | if not os.path.exists(log_root): 13 | os.makedirs(log_root) 14 | 15 | # dictionary 16 | bos_word = '' 17 | eos_word = '' 18 | unk_word = '' 19 | special_words = {unk_word: 1, bos_word: 2, eos_word: 3} 20 | source_vocab_path = os.path.join(data_root, 'zh', 'zh.vocab.pkl') 21 | target_vocab_path = os.path.join(data_root, 'en', 'en.vocab.pkl') 22 | 23 | # data set 24 | train_source = os.path.join(data_root, 'zh', 'zh.txt') 25 | train_target = os.path.join(data_root, 'en', 'en.txt') 26 | train_max_samples = 100000 27 | dev_source = os.path.join(data_root, 'dev', 'IWSLT.dev.txt') 28 | dev_target = os.path.join(data_root, 'invalid', 'invalid') 29 | dev_output = os.path.join(data_root, 'dev', 'dev.out') 30 | dev_max_samples = 100000 31 | test_source = os.path.join(data_root, 'test', 'IWSLT.test.txt') 32 | test_gold = os.path.join(data_root, 'test', 'IWSLT.test.txt') 33 | 34 | bleu_ref_number = 7 35 | 36 | # model parameter 37 | batch_size = 64 38 | bucket_stride = 8 39 | buckets = [] 40 | for i in range(8, 128, bucket_stride): 41 | for j in range(8, 128, bucket_stride): 42 | buckets.append((i, j)) 43 | num_hidden = 512 # hidden unit in LSTM cell 44 | num_embed = 512 # embedding dimension 45 | num_lstm_layer = 1 # number of lstm layer 46 | 47 | # training parameter 48 | num_epoch = 60 49 | learning_rate = 1 50 | momentum = 0.1 51 | dropout = 0.5 52 | show_every_x_batch = 100 53 | eval_per_x_batch = 400 54 | eval_start_epoch = 4 55 | 56 | # model save option 57 | model_save_name = os.path.join(model_root, "zh-en-iwslt") 58 | model_save_freq = 1 # every x epoch 59 | checkpoint_name = os.path.join(model_root, 'checkpoint_model') 60 | checkpoint_freq_batch = 1000 # save checkpoint model every x batch 61 | 62 | # train device 63 | train_device = [mx.context.gpu(0)] 64 | # test device 65 | test_device = mx.context.gpu(0) 66 | 67 | # test parameter 68 | model_to_load_prefix = os.path.join(model_root, 'zh-en-iwslt') 69 | model_to_load_number = 1 70 | use_beam_search = True 71 | beam_size = 12 72 | if not use_beam_search: beam_size = 1 73 | test_output = os.path.join(data_root, 'test', 'test.out') 74 | use_batch_greedy_search = False 75 | greedy_batch_size = 32 76 | max_decode_len = 15 77 | 78 | # resume training 79 | use_resuming = False 80 | resume_model_prefix = os.path.join(model_root, "checkpoint_model") 81 | resume_model_number = 0 82 | 83 | 84 | def get_config_str(): 85 | res = '' 86 | res += 'Config:\n' 87 | import collections 88 | hehe = collections.OrderedDict(sorted(globals().items(), key=lambda x: x[0])) 89 | for k, v in hehe.items(): 90 | if k.startswith('__'): continue 91 | if k.startswith('SEPARATOR'): continue 92 | if k.startswith('get'): continue 93 | if type(v) == (type(os)): continue 94 | if len(k) < 2: continue 95 | res += '{0}: {1}\n'.format(k, v) 96 | return res 97 | -------------------------------------------------------------------------------- /nmt/xmetric.py: 
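One detail of `nmt/xconfig.py` above worth spelling out: the nested loop over `range(8, 128, bucket_stride)` enumerates every (source length, target length) pair from 8 to 120 in steps of 8. An illustrative check of that grid (not code from the repository):

```
bucket_stride = 8
buckets = [(i, j) for i in range(8, 128, bucket_stride)
                  for j in range(8, 128, bucket_stride)]
assert len(buckets) == 15 * 15      # 225 bucket shapes in total
assert max(buckets) == (120, 120)   # the largest configured (source, target) bucket
```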
-------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | # Evaluation 5 | def Perplexity(label, pred): 6 | label = label.T.reshape((-1,)) 7 | loss = 0. 8 | mask_count = 0 9 | for i in range(pred.shape[0]): 10 | if int(label[i]) == 0: 11 | mask_count += 1 12 | continue 13 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 14 | return np.exp(loss / (label.size - mask_count)) 15 | 16 | 17 | def MyCrossEntropy(label, pred): 18 | label = label.T.reshape((-1,)) 19 | loss = 0. 20 | for i in range(pred.shape[0]): 21 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 22 | return loss / label.size 23 | 24 | 25 | def MyMakeLoss(label, pred): 26 | # label = label.T.reshape((-1,)) 27 | # loss = 0. 28 | # for i in range(pred.shape[0]): 29 | # loss += -np.log(max(1e-10, pred[i][int(label[i])])) 30 | return pred[0] 31 | 32 | 33 | def MyCrossEntropy_mask(label, pred): 34 | label = label.T.reshape((-1,)) 35 | loss = 0. 36 | mask_count = 0 37 | for i in range(pred.shape[0]): 38 | if int(label[i]) == 0: 39 | mask_count += 1 40 | continue 41 | loss += -np.log(max(1e-10, pred[i][int(label[i])])) 42 | return loss / (label.size - mask_count) 43 | 44 | 45 | def get_bleu(gold, test): 46 | import subprocess 47 | bleu_computer = r"CompBleu_new.exe" 48 | rawoutput = subprocess.check_output([bleu_computer, gold, test]) 49 | output = rawoutput.splitlines() 50 | bleu = float(output[-1].decode('utf-8').split('=')[-1].strip()) 51 | return rawoutput, bleu 52 | -------------------------------------------------------------------------------- /nmt/xsymbol.py: -------------------------------------------------------------------------------- 1 | from mxwrap.seq2seq.encoder import BiDirectionalGruEncoder 2 | from mxwrap.seq2seq.decoder import GruAttentionDecoder 3 | from mxwrap.attention.ConcatAttention import ConcatAttention 4 | 5 | import xconfig 6 | import mxnet as mx 7 | 8 | 9 | def s2s_unroll(encoder, attention, decoder, 10 | source_len, target_len, 11 | input_names, output_names, 12 | **kwargs): 13 | forward_hidden_all, backward_hidden_all, source_representations, source_mask_sliced = encoder.encode(source_len) 14 | 15 | encoded_for_init_state = mx.sym.Concat(forward_hidden_all[-1], backward_hidden_all[0], dim=1, 16 | name='encoded_for_init_state') 17 | target_representation = decoder.decode(target_len, encoded_for_init_state, source_representations, 18 | source_mask_sliced) 19 | return target_representation, input_names, output_names 20 | 21 | 22 | def sym_gen(source_vocab_size, target_vocab_size): 23 | input_names = ['source', 'source_mask', 'target', 24 | # 'target_mask', 25 | "forward_source_l0_init_h", 26 | "backward_source_l0_init_h"] 27 | output_names = ['target_softmax_label'] 28 | encoder = BiDirectionalGruEncoder(use_masking=True, state_dim=xconfig.num_hidden, 29 | input_dim=0, output_dim=0, 30 | vocab_size=source_vocab_size, embed_dim=xconfig.num_embed, 31 | dropout=xconfig.dropout, num_of_layer=xconfig.num_lstm_layer) 32 | 33 | attention = ConcatAttention(batch_size=xconfig.batch_size, attend_dim=xconfig.num_hidden * 2, 34 | state_dim=xconfig.num_hidden) 35 | 36 | decoder = GruAttentionDecoder(use_masking=True, state_dim=xconfig.num_hidden, 37 | input_dim=0, output_dim=target_vocab_size, 38 | vocab_size=target_vocab_size, embed_dim=xconfig.num_embed, 39 | dropout=xconfig.dropout, 40 | num_of_layer=xconfig.num_lstm_layer, attention=attention, 41 | batch_size=xconfig.batch_size) 42 | 43 | def _sym_gen(s_t_len): 44 | return s2s_unroll(encoder=encoder, 
45 | attention=attention, 46 | decoder=decoder, 47 | source_len=s_t_len[0], 48 | target_len=s_t_len[1], 49 | input_names=input_names, output_names=output_names, 50 | ) 51 | 52 | return _sym_gen 53 | -------------------------------------------------------------------------------- /nmt/xutils.py: -------------------------------------------------------------------------------- 1 | import mxnet as mx 2 | import sys 3 | import pickle 4 | 5 | import xconfig 6 | 7 | 8 | def get_gpu_number(): 9 | for i in range(100): 10 | try: 11 | mx.nd.zeros((1,), ctx=mx.gpu(i)) 12 | except: 13 | return i 14 | 15 | 16 | # Read from doc 17 | def read_content(path, max_read_line=sys.maxsize): 18 | content = [] 19 | count = 0 20 | with open(path, encoding='utf-8') as ins: 21 | while True: 22 | line = ins.readline() 23 | if not line: 24 | break 25 | count += 1 26 | if count > max_read_line: 27 | break 28 | line = line.strip() 29 | content.append(line.split(' ')) 30 | return content 31 | 32 | 33 | def load_vocab(path, special=None): 34 | """ 35 | Load vocab from file, the 0, 1, 2, 3 should be reserved for pad, , , 36 | :param path: the vocab 37 | :param special: 38 | :return: 39 | """ 40 | with open(path, 'rb') as f: 41 | vocab = pickle.load(f) 42 | 43 | if special: 44 | if not isinstance(special, dict): 45 | raise Exception('special words not instance of python dict') 46 | for word, idx in special.items(): 47 | if len(word) == 0: 48 | continue 49 | if word == '\n' or word == ' ': 50 | continue 51 | if word not in vocab: 52 | vocab[word] = idx 53 | return vocab 54 | 55 | 56 | def sentence2id(sentence, the_vocab): 57 | words = list(sentence) 58 | words = [the_vocab[w] if w in the_vocab else the_vocab[xconfig.unk_word] for w in words if len(w) > 0] 59 | return words 60 | 61 | 62 | def word2id(word, the_vocab): 63 | return the_vocab[word] if word in the_vocab else the_vocab[xconfig.unk_word] 64 | -------------------------------------------------------------------------------- /trainingLog.txt: -------------------------------------------------------------------------------- 1 | C:\Anaconda3\python.exe D:/users/home/Projects/mxnmt/nmt/main.py 2 | 20:05:35 INFO:root:Config: 3 | batch_size: 128 4 | beam_size: 12 5 | bleu_ref_number: 7 6 | bos_word: 7 | bucket_stride: 10 8 | buckets: [(10, 10), (10, 20), (10, 30), (10, 40), (10, 50), (10, 60), (20, 10), (20, 20), (20, 30), (20, 40), (20, 50), (20, 60), (30, 10), (30, 20), (30, 30), (30, 40), (30, 50), (30, 60), (40, 10), (40, 20), (40, 30), (40, 40), (40, 50), (40, 60), (50, 10), (50, 20), (50, 30), (50, 40), (50, 50), (50, 60), (60, 10), (60, 20), (60, 30), (60, 40), (60, 50), (60, 60)] 9 | checkpoint_freq_batch: 1000 10 | checkpoint_name: D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 11 | data_root: D:\users\home\Projects\mxnmt\IWSLT 12 | dev_max_samples: 100000 13 | dev_output: D:\users\home\Projects\mxnmt\IWSLT\dev\dev.out 14 | dev_source: D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt 15 | dev_target: D:\users\home\Projects\mxnmt\IWSLT\invalid\invalid 16 | dropout: 0.5 17 | eos_word: 18 | eval_per_x_batch: 400 19 | eval_start_epoch: 4 20 | greedy_batch_size: 32 21 | learning_rate: 1 22 | log_root: D:\users\home\Projects\mxnmt\IWSLT\log 23 | max_decode_len: 15 24 | model_root: D:\users\home\Projects\mxnmt\IWSLT\model 25 | model_save_freq: 1 26 | model_save_name: D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt 27 | model_to_load_number: 1 28 | model_to_load_prefix: D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt 29 | momentum: 0.1 30 | 
num_embed: 512 31 | num_epoch: 60 32 | num_hidden: 512 33 | num_lstm_layer: 1 34 | resume_model_number: 0 35 | resume_model_prefix: D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 36 | show_every_x_batch: 100 37 | source_root: D:\users\home\Projects\mxnmt 38 | source_vocab_path: D:\users\home\Projects\mxnmt\IWSLT\zh\zh.vocab.pkl 39 | special_words: {'': 2, '': 3, '': 1} 40 | target_vocab_path: D:\users\home\Projects\mxnmt\IWSLT\en\en.vocab.pkl 41 | test_device: gpu(0) 42 | test_gold: D:\users\home\Projects\mxnmt\IWSLT\test\IWSLT.test.txt 43 | test_output: D:\users\home\Projects\mxnmt\IWSLT\test\test.out 44 | test_source: D:\users\home\Projects\mxnmt\IWSLT\test\IWSLT.test.txt 45 | train_device: [gpu(0)] 46 | train_max_samples: 100000 47 | train_source: D:\users\home\Projects\mxnmt\IWSLT\zh\zh.txt 48 | train_target: D:\users\home\Projects\mxnmt\IWSLT\en\en.txt 49 | unk_word: 50 | use_batch_greedy_search: False 51 | use_beam_search: True 52 | use_resuming: True 53 | 54 | 20:05:35 INFO:root:In train mode. 55 | 20:05:36 INFO:root:source_vocab size: 9825 56 | 20:05:36 INFO:root:target_vocab size: 9413 57 | Summary of dataset ================== 58 | Total: 81819 in 36 buckets 59 | bucket of (10, 10) : 49266 samples 60 | bucket of (10, 20) : 15701 samples 61 | bucket of (10, 30) : 288 samples 62 | bucket of (10, 40) : 5 samples 63 | bucket of (10, 50) : 0 samples 64 | bucket of (10, 60) : 0 samples 65 | bucket of (20, 10) : 1039 samples 66 | bucket of (20, 20) : 10825 samples 67 | bucket of (20, 30) : 3126 samples 68 | bucket of (20, 40) : 203 samples 69 | bucket of (20, 50) : 10 samples 70 | bucket of (20, 60) : 1 samples 71 | bucket of (30, 10) : 1 samples 72 | bucket of (30, 20) : 118 samples 73 | bucket of (30, 30) : 752 samples 74 | bucket of (30, 40) : 269 samples 75 | bucket of (30, 50) : 38 samples 76 | bucket of (30, 60) : 2 samples 77 | bucket of (40, 10) : 0 samples 78 | bucket of (40, 20) : 0 samples 79 | bucket of (40, 30) : 10 samples 80 | bucket of (40, 40) : 43 samples 81 | bucket of (40, 50) : 31 samples 82 | bucket of (40, 60) : 2 samples 83 | bucket of (50, 10) : 0 samples 84 | bucket of (50, 20) : 0 samples 85 | bucket of (50, 30) : 0 samples 86 | bucket of (50, 40) : 4 samples 87 | bucket of (50, 50) : 25 samples 88 | bucket of (50, 60) : 15 samples 89 | bucket of (60, 10) : 0 samples 90 | bucket of (60, 20) : 0 samples 91 | bucket of (60, 30) : 0 samples 92 | bucket of (60, 40) : 0 samples 93 | bucket of (60, 50) : 10 samples 94 | bucket of (60, 60) : 18 samples 95 | D:\users\home\Projects\mxnmt\nmt\masked_bucket_io.py:239: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future 96 | bucket_plan = np.hstack([np.zeros(n, int) + i for i, n in enumerate(bucket_n_batches)]) 97 | 20:05:39 INFO:root:Try resuming from D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model 0 98 | [20:05:39] D:\mxnet\dmlc-core\include\dmlc/logging.h:235: [20:05:39] D:\mxnet\dmlc-core\src\io\local_filesys.cc:154: Check failed: allow_null LocalFileSystem: fail to open "D:\users\home\Projects\mxnmt\IWSLT\model\checkpoint_model-symbol.json" 99 | 20:05:39 ERROR:root:Resume failed. 
100 | 20:05:40 INFO:root:Start training with [gpu(0)] 101 | 20:06:24 INFO:root:Epoch[0] Batch [100] Speed: 346.40 samples/sec Train-Perplexity=677.456613 102 | 20:07:02 INFO:root:Epoch[0] Batch [200] Speed: 334.68 samples/sec Train-Perplexity=127.235813 103 | 20:07:40 INFO:root:Epoch[0] Batch [300] Speed: 336.18 samples/sec Train-Perplexity=87.583511 104 | 20:08:21 INFO:root:Epoch[0] Batch [400] Speed: 308.59 samples/sec Train-Perplexity=72.805057 105 | Too early to check BLEU at epoch 0 106 | 20:08:59 INFO:root:Epoch[0] Batch [500] Speed: 345.62 samples/sec Train-Perplexity=58.451281 107 | 20:09:38 INFO:root:Epoch[0] Batch [600] Speed: 324.52 samples/sec Train-Perplexity=54.911271 108 | 20:09:52 INFO:root:Epoch[0] Resetting Data Iterator 109 | 20:09:52 INFO:root:Epoch[0] Time cost=245.259 110 | 20:09:52 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0001.params" 111 | 20:10:28 INFO:root:Epoch[1] Batch [100] Speed: 357.79 samples/sec Train-Perplexity=43.068291 112 | 20:11:06 INFO:root:Epoch[1] Batch [200] Speed: 335.12 samples/sec Train-Perplexity=42.659636 113 | 20:11:44 INFO:root:Epoch[1] Batch [300] Speed: 335.83 samples/sec Train-Perplexity=38.860670 114 | 20:12:24 INFO:root:Epoch[1] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=37.914928 115 | Too early to check BLEU at epoch 1 116 | 20:13:01 INFO:root:Epoch[1] Batch [500] Speed: 345.75 samples/sec Train-Perplexity=33.331348 117 | 20:13:41 INFO:root:Epoch[1] Batch [600] Speed: 324.27 samples/sec Train-Perplexity=33.298229 118 | 20:13:55 INFO:root:Epoch[1] Resetting Data Iterator 119 | 20:13:55 INFO:root:Epoch[1] Time cost=242.353 120 | 20:13:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0002.params" 121 | 20:14:31 INFO:root:Epoch[2] Batch [100] Speed: 356.97 samples/sec Train-Perplexity=27.835325 122 | 20:15:09 INFO:root:Epoch[2] Batch [200] Speed: 335.24 samples/sec Train-Perplexity=28.387237 123 | 20:15:48 INFO:root:Epoch[2] Batch [300] Speed: 334.72 samples/sec Train-Perplexity=26.452638 124 | 20:16:28 INFO:root:Epoch[2] Batch [400] Speed: 316.15 samples/sec Train-Perplexity=26.782767 125 | Too early to check BLEU at epoch 2 126 | 20:17:05 INFO:root:Epoch[2] Batch [500] Speed: 345.44 samples/sec Train-Perplexity=23.714569 127 | 20:17:45 INFO:root:Epoch[2] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=24.313119 128 | 20:17:58 INFO:root:Epoch[2] Resetting Data Iterator 129 | 20:17:58 INFO:root:Epoch[2] Time cost=243.027 130 | 20:17:59 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0003.params" 131 | 20:18:35 INFO:root:Epoch[3] Batch [100] Speed: 357.69 samples/sec Train-Perplexity=20.603173 132 | 20:19:13 INFO:root:Epoch[3] Batch [200] Speed: 335.12 samples/sec Train-Perplexity=21.253643 133 | 20:19:51 INFO:root:Epoch[3] Batch [300] Speed: 335.51 samples/sec Train-Perplexity=20.253162 134 | Too early to check BLEU at epoch 3 135 | 20:20:31 INFO:root:Epoch[3] Batch [400] Speed: 320.28 samples/sec Train-Perplexity=20.852583 136 | 20:21:08 INFO:root:Epoch[3] Batch [500] Speed: 346.38 samples/sec Train-Perplexity=18.429250 137 | 20:21:48 INFO:root:Epoch[3] Batch [600] Speed: 325.04 samples/sec Train-Perplexity=19.102989 138 | 20:22:01 INFO:root:Epoch[3] Resetting Data Iterator 139 | 20:22:01 INFO:root:Epoch[3] Time cost=242.185 140 | 20:22:02 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0004.params" 141 | 20:22:38 INFO:root:Epoch[4] Batch [100] Speed: 358.60 
samples/sec Train-Perplexity=16.546370 142 | 20:23:16 INFO:root:Epoch[4] Batch [200] Speed: 335.66 samples/sec Train-Perplexity=17.048965 143 | 20:23:54 INFO:root:Epoch[4] Batch [300] Speed: 336.47 samples/sec Train-Perplexity=16.390798 144 | 20:24:34 INFO:root:Epoch[4] Batch [400] Speed: 320.39 samples/sec Train-Perplexity=17.157482 145 | 20:24:34 INFO:root:Checking BLEU for epoch 4 batch 400 146 | C:\Anaconda3\lib\site-packages\mxnet-0.7.0-py3.5.egg\mxnet\ndarray.py:531: RuntimeWarning: copy an array to itself, is it intended? 147 | RuntimeWarning) 148 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 149 | 20:26:39 INFO:root:b'1gram=65.61% 2gram=39.62% 3gram=27.79% 4gram=19.09% \r\nBP = 0.9639\r\nBLEU = 0.3303\r\n' 150 | 20:26:39 INFO:root:BLEU: 0.3303 @ epoch 4 batch 400 151 | 20:26:39 INFO:root:Current BLEU: 0.3303 > prev best -1.0 in epoch -1 152 | 20:26:39 INFO:root:Saving... 153 | 20:26:39 INFO:root:Saved checkpoint to "best_bleu-0005.params" 154 | 20:27:16 INFO:root:Epoch[4] Batch [500] Speed: 78.94 samples/sec Train-Perplexity=15.243541 155 | 20:27:55 INFO:root:Epoch[4] Batch [600] Speed: 325.14 samples/sec Train-Perplexity=15.828474 156 | 20:28:09 INFO:root:Epoch[4] Resetting Data Iterator 157 | 20:28:09 INFO:root:Epoch[4] Time cost=367.106 158 | 20:28:10 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0005.params" 159 | 20:28:45 INFO:root:Epoch[5] Batch [100] Speed: 357.54 samples/sec Train-Perplexity=13.908643 160 | 20:29:24 INFO:root:Epoch[5] Batch [200] Speed: 335.71 samples/sec Train-Perplexity=14.388900 161 | 20:30:02 INFO:root:Epoch[5] Batch [300] Speed: 336.51 samples/sec Train-Perplexity=13.905681 162 | 20:30:42 INFO:root:Epoch[5] Batch [400] Speed: 320.08 samples/sec Train-Perplexity=14.677042 163 | 20:30:42 INFO:root:Checking BLEU for epoch 5 batch 400 164 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 165 | 20:32:45 INFO:root:b'1gram=68.54% 2gram=42.38% 3gram=29.82% 4gram=21.34% \r\nBP = 0.9224\r\nBLEU = 0.3401\r\n' 166 | 20:32:45 INFO:root:BLEU: 0.3401 @ epoch 5 batch 400 167 | 20:32:45 INFO:root:Current BLEU: 0.3401 > prev best 0.3303 in epoch 4 168 | 20:32:45 INFO:root:Saving... 169 | 20:32:45 INFO:root:Saved checkpoint to "best_bleu-0006.params" 170 | 20:33:22 INFO:root:Epoch[5] Batch [500] Speed: 79.94 samples/sec Train-Perplexity=12.956247 171 | 20:34:01 INFO:root:Epoch[5] Batch [600] Speed: 323.94 samples/sec Train-Perplexity=13.587172 172 | 20:34:15 INFO:root:Epoch[5] Resetting Data Iterator 173 | 20:34:15 INFO:root:Epoch[5] Time cost=365.382 174 | 20:34:16 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0006.params" 175 | 20:34:52 INFO:root:Epoch[6] Batch [100] Speed: 357.26 samples/sec Train-Perplexity=11.914505 176 | 20:35:30 INFO:root:Epoch[6] Batch [200] Speed: 333.97 samples/sec Train-Perplexity=12.512574 177 | 20:36:08 INFO:root:Epoch[6] Batch [300] Speed: 335.77 samples/sec Train-Perplexity=12.020077 178 | 20:36:48 INFO:root:Epoch[6] Batch [400] Speed: 319.38 samples/sec Train-Perplexity=12.936626 179 | 20:36:48 INFO:root:Checking BLEU for epoch 6 batch 400 180 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
181 | 20:38:53 INFO:root:b'1gram=68.32% 2gram=42.65% 3gram=29.93% 4gram=20.46% \r\nBP = 0.9666\r\nBLEU = 0.3533\r\n' 182 | 20:38:53 INFO:root:BLEU: 0.3533 @ epoch 6 batch 400 183 | 20:38:53 INFO:root:Current BLEU: 0.3533 > prev best 0.3401 in epoch 5 184 | 20:38:53 INFO:root:Saving... 185 | 20:38:53 INFO:root:Saved checkpoint to "best_bleu-0007.params" 186 | 20:39:30 INFO:root:Epoch[6] Batch [500] Speed: 79.16 samples/sec Train-Perplexity=11.328022 187 | 20:40:09 INFO:root:Epoch[6] Batch [600] Speed: 324.32 samples/sec Train-Perplexity=11.931129 188 | 20:40:23 INFO:root:Epoch[6] Resetting Data Iterator 189 | 20:40:23 INFO:root:Epoch[6] Time cost=367.294 190 | 20:40:24 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0007.params" 191 | 20:40:59 INFO:root:Epoch[7] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=10.575803 192 | 20:41:38 INFO:root:Epoch[7] Batch [200] Speed: 334.78 samples/sec Train-Perplexity=11.017215 193 | 20:42:16 INFO:root:Epoch[7] Batch [300] Speed: 336.01 samples/sec Train-Perplexity=10.738582 194 | 20:42:56 INFO:root:Epoch[7] Batch [400] Speed: 319.64 samples/sec Train-Perplexity=11.594227 195 | 20:42:56 INFO:root:Checking BLEU for epoch 7 batch 400 196 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 197 | 20:44:57 INFO:root:b'1gram=71.05% 2gram=45.56% 3gram=33.06% 4gram=22.97% \r\nBP = 0.9349\r\nBLEU = 0.3702\r\n' 198 | 20:44:57 INFO:root:BLEU: 0.3702 @ epoch 7 batch 400 199 | 20:44:57 INFO:root:Current BLEU: 0.3702 > prev best 0.3533 in epoch 6 200 | 20:44:57 INFO:root:Saving... 201 | 20:44:58 INFO:root:Saved checkpoint to "best_bleu-0008.params" 202 | 20:45:35 INFO:root:Epoch[7] Batch [500] Speed: 80.60 samples/sec Train-Perplexity=10.112880 203 | 20:46:14 INFO:root:Epoch[7] Batch [600] Speed: 324.06 samples/sec Train-Perplexity=10.736643 204 | 20:46:28 INFO:root:Epoch[7] Resetting Data Iterator 205 | 20:46:28 INFO:root:Epoch[7] Time cost=364.244 206 | 20:46:28 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0008.params" 207 | 20:47:04 INFO:root:Epoch[8] Batch [100] Speed: 357.45 samples/sec Train-Perplexity=9.522396 208 | 20:47:43 INFO:root:Epoch[8] Batch [200] Speed: 334.64 samples/sec Train-Perplexity=9.972395 209 | 20:48:21 INFO:root:Epoch[8] Batch [300] Speed: 329.93 samples/sec Train-Perplexity=9.700864 210 | 20:49:01 INFO:root:Epoch[8] Batch [400] Speed: 319.50 samples/sec Train-Perplexity=10.634506 211 | 20:49:01 INFO:root:Checking BLEU for epoch 8 batch 400 212 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 213 | 20:51:06 INFO:root:b'1gram=70.02% 2gram=44.80% 3gram=32.07% 4gram=22.38% \r\nBP = 0.9731\r\nBLEU = 0.3769\r\n' 214 | 20:51:06 INFO:root:BLEU: 0.3769 @ epoch 8 batch 400 215 | 20:51:06 INFO:root:Current BLEU: 0.3769 > prev best 0.3702 in epoch 7 216 | 20:51:06 INFO:root:Saving... 
217 | 20:51:06 INFO:root:Saved checkpoint to "best_bleu-0009.params" 218 | 20:51:43 INFO:root:Epoch[8] Batch [500] Speed: 79.21 samples/sec Train-Perplexity=9.170730 219 | 20:52:23 INFO:root:Epoch[8] Batch [600] Speed: 323.98 samples/sec Train-Perplexity=9.790732 220 | 20:52:36 INFO:root:Epoch[8] Resetting Data Iterator 221 | 20:52:36 INFO:root:Epoch[8] Time cost=367.819 222 | 20:52:37 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0009.params" 223 | 20:53:13 INFO:root:Epoch[9] Batch [100] Speed: 357.66 samples/sec Train-Perplexity=8.732078 224 | 20:53:51 INFO:root:Epoch[9] Batch [200] Speed: 335.10 samples/sec Train-Perplexity=9.055381 225 | 20:54:29 INFO:root:Epoch[9] Batch [300] Speed: 336.43 samples/sec Train-Perplexity=8.890672 226 | 20:55:09 INFO:root:Epoch[9] Batch [400] Speed: 320.36 samples/sec Train-Perplexity=9.763253 227 | 20:55:09 INFO:root:Checking BLEU for epoch 9 batch 400 228 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 229 | 20:57:13 INFO:root:b'1gram=71.28% 2gram=46.02% 3gram=33.44% 4gram=23.75% \r\nBP = 0.9666\r\nBLEU = 0.3883\r\n' 230 | 20:57:13 INFO:root:BLEU: 0.3883 @ epoch 9 batch 400 231 | 20:57:13 INFO:root:Current BLEU: 0.3883 > prev best 0.3769 in epoch 8 232 | 20:57:13 INFO:root:Saving... 233 | 20:57:13 INFO:root:Saved checkpoint to "best_bleu-0010.params" 234 | 20:57:50 INFO:root:Epoch[9] Batch [500] Speed: 79.62 samples/sec Train-Perplexity=8.448180 235 | 20:58:29 INFO:root:Epoch[9] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=8.975397 236 | 20:58:43 INFO:root:Epoch[9] Resetting Data Iterator 237 | 20:58:43 INFO:root:Epoch[9] Time cost=366.024 238 | 20:58:44 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0010.params" 239 | 20:59:20 INFO:root:Epoch[10] Batch [100] Speed: 357.45 samples/sec Train-Perplexity=8.093092 240 | 20:59:58 INFO:root:Epoch[10] Batch [200] Speed: 334.54 samples/sec Train-Perplexity=8.395097 241 | 21:00:36 INFO:root:Epoch[10] Batch [300] Speed: 335.94 samples/sec Train-Perplexity=8.181741 242 | 21:01:16 INFO:root:Epoch[10] Batch [400] Speed: 319.47 samples/sec Train-Perplexity=9.018671 243 | 21:01:16 INFO:root:Checking BLEU for epoch 10 batch 400 244 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 245 | 21:03:19 INFO:root:b'1gram=71.81% 2gram=47.26% 3gram=34.34% 4gram=24.33% \r\nBP = 0.9623\r\nBLEU = 0.3949\r\n' 246 | 21:03:19 INFO:root:BLEU: 0.3949 @ epoch 10 batch 400 247 | 21:03:19 INFO:root:Current BLEU: 0.3949 > prev best 0.3883 in epoch 9 248 | 21:03:19 INFO:root:Saving... 
249 | 21:03:20 INFO:root:Saved checkpoint to "best_bleu-0011.params" 250 | 21:03:56 INFO:root:Epoch[10] Batch [500] Speed: 79.78 samples/sec Train-Perplexity=7.841984 251 | 21:04:37 INFO:root:Epoch[10] Batch [600] Speed: 318.43 samples/sec Train-Perplexity=8.340074 252 | 21:04:50 INFO:root:Epoch[10] Resetting Data Iterator 253 | 21:04:50 INFO:root:Epoch[10] Time cost=366.643 254 | 21:04:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0011.params" 255 | 21:05:27 INFO:root:Epoch[11] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=7.543190 256 | 21:06:05 INFO:root:Epoch[11] Batch [200] Speed: 335.21 samples/sec Train-Perplexity=7.860304 257 | 21:06:43 INFO:root:Epoch[11] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=7.613764 258 | 21:07:23 INFO:root:Epoch[11] Batch [400] Speed: 318.66 samples/sec Train-Perplexity=8.446283 259 | 21:07:23 INFO:root:Checking BLEU for epoch 11 batch 400 260 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 261 | 21:09:24 INFO:root:b'1gram=71.83% 2gram=46.38% 3gram=33.56% 4gram=23.98% \r\nBP = 0.9468\r\nBLEU = 0.3831\r\n' 262 | 21:09:24 INFO:root:BLEU: 0.3831 @ epoch 11 batch 400 263 | 21:10:01 INFO:root:Epoch[11] Batch [500] Speed: 81.12 samples/sec Train-Perplexity=7.340801 264 | 21:10:41 INFO:root:Epoch[11] Batch [600] Speed: 324.20 samples/sec Train-Perplexity=7.796961 265 | 21:10:54 INFO:root:Epoch[11] Resetting Data Iterator 266 | 21:10:54 INFO:root:Epoch[11] Time cost=363.285 267 | 21:10:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0012.params" 268 | 21:11:31 INFO:root:Epoch[12] Batch [100] Speed: 358.01 samples/sec Train-Perplexity=7.088733 269 | 21:12:09 INFO:root:Epoch[12] Batch [200] Speed: 334.66 samples/sec Train-Perplexity=7.353227 270 | 21:12:47 INFO:root:Epoch[12] Batch [300] Speed: 335.80 samples/sec Train-Perplexity=7.147846 271 | 21:13:27 INFO:root:Epoch[12] Batch [400] Speed: 319.75 samples/sec Train-Perplexity=8.068194 272 | 21:13:27 INFO:root:Checking BLEU for epoch 12 batch 400 273 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 274 | 21:15:30 INFO:root:b'1gram=72.21% 2gram=47.82% 3gram=35.17% 4gram=25.34% \r\nBP = 0.9582\r\nBLEU = 0.4014\r\n' 275 | 21:15:30 INFO:root:BLEU: 0.4014 @ epoch 12 batch 400 276 | 21:15:30 INFO:root:Current BLEU: 0.4014 > prev best 0.3949 in epoch 10 277 | 21:15:30 INFO:root:Saving... 
278 | 21:15:30 INFO:root:Saved checkpoint to "best_bleu-0013.params" 279 | 21:16:07 INFO:root:Epoch[12] Batch [500] Speed: 80.00 samples/sec Train-Perplexity=6.901814 280 | 21:16:47 INFO:root:Epoch[12] Batch [600] Speed: 323.91 samples/sec Train-Perplexity=7.349590 281 | 21:17:00 INFO:root:Epoch[12] Resetting Data Iterator 282 | 21:17:00 INFO:root:Epoch[12] Time cost=365.473 283 | 21:17:01 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0013.params" 284 | 21:17:37 INFO:root:Epoch[13] Batch [100] Speed: 357.26 samples/sec Train-Perplexity=6.668300 285 | 21:18:15 INFO:root:Epoch[13] Batch [200] Speed: 334.11 samples/sec Train-Perplexity=6.903907 286 | 21:18:53 INFO:root:Epoch[13] Batch [300] Speed: 334.95 samples/sec Train-Perplexity=6.779002 287 | 21:19:34 INFO:root:Epoch[13] Batch [400] Speed: 319.28 samples/sec Train-Perplexity=7.538552 288 | 21:19:34 INFO:root:Checking BLEU for epoch 13 batch 400 289 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 290 | 21:21:35 INFO:root:b'1gram=72.78% 2gram=48.05% 3gram=35.43% 4gram=25.77% \r\nBP = 0.9380\r\nBLEU = 0.3965\r\n' 291 | 21:21:35 INFO:root:BLEU: 0.3965 @ epoch 13 batch 400 292 | 21:22:12 INFO:root:Epoch[13] Batch [500] Speed: 80.77 samples/sec Train-Perplexity=6.545751 293 | 21:22:52 INFO:root:Epoch[13] Batch [600] Speed: 323.67 samples/sec Train-Perplexity=6.999107 294 | 21:23:05 INFO:root:Epoch[13] Resetting Data Iterator 295 | 21:23:05 INFO:root:Epoch[13] Time cost=364.298 296 | 21:23:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0014.params" 297 | 21:23:42 INFO:root:Epoch[14] Batch [100] Speed: 357.32 samples/sec Train-Perplexity=6.375863 298 | 21:24:20 INFO:root:Epoch[14] Batch [200] Speed: 334.81 samples/sec Train-Perplexity=6.594310 299 | 21:24:58 INFO:root:Epoch[14] Batch [300] Speed: 336.03 samples/sec Train-Perplexity=6.421297 300 | 21:25:38 INFO:root:Epoch[14] Batch [400] Speed: 319.55 samples/sec Train-Perplexity=7.225313 301 | 21:25:38 INFO:root:Checking BLEU for epoch 14 batch 400 302 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 303 | 21:27:41 INFO:root:b'1gram=71.98% 2gram=48.20% 3gram=36.14% 4gram=26.24% \r\nBP = 0.9393\r\nBLEU = 0.4001\r\n' 304 | 21:27:41 INFO:root:BLEU: 0.4001 @ epoch 14 batch 400 305 | 21:28:18 INFO:root:Epoch[14] Batch [500] Speed: 80.35 samples/sec Train-Perplexity=6.232595 306 | 21:28:57 INFO:root:Epoch[14] Batch [600] Speed: 324.25 samples/sec Train-Perplexity=6.626467 307 | 21:29:11 INFO:root:Epoch[14] Resetting Data Iterator 308 | 21:29:11 INFO:root:Epoch[14] Time cost=364.748 309 | 21:29:11 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0015.params" 310 | 21:29:47 INFO:root:Epoch[15] Batch [100] Speed: 358.29 samples/sec Train-Perplexity=6.163365 311 | 21:30:25 INFO:root:Epoch[15] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=6.239482 312 | 21:31:03 INFO:root:Epoch[15] Batch [300] Speed: 336.43 samples/sec Train-Perplexity=6.124430 313 | 21:31:44 INFO:root:Epoch[15] Batch [400] Speed: 318.66 samples/sec Train-Perplexity=6.912867 314 | 21:31:44 INFO:root:Checking BLEU for epoch 15 batch 400 315 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
316 | 21:33:47 INFO:root:b'1gram=72.72% 2gram=49.09% 3gram=36.69% 4gram=26.89% \r\nBP = 0.9610\r\nBLEU = 0.4163\r\n' 317 | 21:33:47 INFO:root:BLEU: 0.4163 @ epoch 15 batch 400 318 | 21:33:47 INFO:root:Current BLEU: 0.4163 > prev best 0.4014 in epoch 12 319 | 21:33:47 INFO:root:Saving... 320 | 21:33:47 INFO:root:Saved checkpoint to "best_bleu-0016.params" 321 | 21:34:24 INFO:root:Epoch[15] Batch [500] Speed: 79.87 samples/sec Train-Perplexity=5.964563 322 | 21:35:03 INFO:root:Epoch[15] Batch [600] Speed: 324.65 samples/sec Train-Perplexity=6.364883 323 | 21:35:17 INFO:root:Epoch[15] Resetting Data Iterator 324 | 21:35:17 INFO:root:Epoch[15] Time cost=365.577 325 | 21:35:18 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0016.params" 326 | 21:35:53 INFO:root:Epoch[16] Batch [100] Speed: 357.86 samples/sec Train-Perplexity=5.805708 327 | 21:36:32 INFO:root:Epoch[16] Batch [200] Speed: 334.97 samples/sec Train-Perplexity=5.979251 328 | 21:37:11 INFO:root:Epoch[16] Batch [300] Speed: 327.06 samples/sec Train-Perplexity=5.853375 329 | 21:37:51 INFO:root:Epoch[16] Batch [400] Speed: 319.71 samples/sec Train-Perplexity=6.650697 330 | 21:37:51 INFO:root:Checking BLEU for epoch 16 batch 400 331 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 332 | 21:39:56 INFO:root:b'1gram=71.67% 2gram=46.89% 3gram=34.49% 4gram=24.43% \r\nBP = 0.9840\r\nBLEU = 0.4036\r\n' 333 | 21:39:56 INFO:root:BLEU: 0.4036 @ epoch 16 batch 400 334 | 21:40:33 INFO:root:Epoch[16] Batch [500] Speed: 79.14 samples/sec Train-Perplexity=5.721799 335 | 21:41:12 INFO:root:Epoch[16] Batch [600] Speed: 324.26 samples/sec Train-Perplexity=6.081054 336 | 21:41:26 INFO:root:Epoch[16] Resetting Data Iterator 337 | 21:41:26 INFO:root:Epoch[16] Time cost=368.138 338 | 21:41:26 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0017.params" 339 | 21:42:02 INFO:root:Epoch[17] Batch [100] Speed: 357.51 samples/sec Train-Perplexity=5.597813 340 | 21:42:41 INFO:root:Epoch[17] Batch [200] Speed: 334.83 samples/sec Train-Perplexity=5.752815 341 | 21:43:19 INFO:root:Epoch[17] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=5.613985 342 | 21:43:59 INFO:root:Epoch[17] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=6.407925 343 | 21:43:59 INFO:root:Checking BLEU for epoch 17 batch 400 344 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
345 | 21:46:01 INFO:root:b'1gram=72.71% 2gram=48.73% 3gram=36.33% 4gram=26.43% \r\nBP = 0.9536\r\nBLEU = 0.4095\r\n' 346 | 21:46:01 INFO:root:BLEU: 0.4095 @ epoch 17 batch 400 347 | 21:46:38 INFO:root:Epoch[17] Batch [500] Speed: 80.48 samples/sec Train-Perplexity=5.507343 348 | 21:47:17 INFO:root:Epoch[17] Batch [600] Speed: 324.23 samples/sec Train-Perplexity=5.856936 349 | 21:47:31 INFO:root:Epoch[17] Resetting Data Iterator 350 | 21:47:31 INFO:root:Epoch[17] Time cost=364.515 351 | 21:47:32 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0018.params" 352 | 21:48:07 INFO:root:Epoch[18] Batch [100] Speed: 357.96 samples/sec Train-Perplexity=5.517313 353 | 21:48:46 INFO:root:Epoch[18] Batch [200] Speed: 335.06 samples/sec Train-Perplexity=5.540820 354 | 21:49:24 INFO:root:Epoch[18] Batch [300] Speed: 335.82 samples/sec Train-Perplexity=5.418239 355 | 21:50:04 INFO:root:Epoch[18] Batch [400] Speed: 319.89 samples/sec Train-Perplexity=6.152498 356 | 21:50:04 INFO:root:Checking BLEU for epoch 18 batch 400 357 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 358 | 21:52:06 INFO:root:b'1gram=72.94% 2gram=48.72% 3gram=35.73% 4gram=24.87% \r\nBP = 0.9593\r\nBLEU = 0.4044\r\n' 359 | 21:52:06 INFO:root:BLEU: 0.4044 @ epoch 18 batch 400 360 | 21:52:43 INFO:root:Epoch[18] Batch [500] Speed: 80.39 samples/sec Train-Perplexity=5.301571 361 | 21:53:23 INFO:root:Epoch[18] Batch [600] Speed: 319.32 samples/sec Train-Perplexity=5.640023 362 | 21:53:37 INFO:root:Epoch[18] Resetting Data Iterator 363 | 21:53:37 INFO:root:Epoch[18] Time cost=365.165 364 | 21:53:37 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0019.params" 365 | 21:54:13 INFO:root:Epoch[19] Batch [100] Speed: 357.84 samples/sec Train-Perplexity=5.218759 366 | 21:54:52 INFO:root:Epoch[19] Batch [200] Speed: 334.61 samples/sec Train-Perplexity=5.312471 367 | 21:55:30 INFO:root:Epoch[19] Batch [300] Speed: 335.46 samples/sec Train-Perplexity=5.240229 368 | 21:56:10 INFO:root:Epoch[19] Batch [400] Speed: 319.12 samples/sec Train-Perplexity=5.939835 369 | 21:56:10 INFO:root:Checking BLEU for epoch 19 batch 400 370 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 371 | 21:58:14 INFO:root:b'1gram=71.92% 2gram=48.46% 3gram=35.67% 4gram=25.13% \r\nBP = 0.9816\r\nBLEU = 0.4127\r\n' 372 | 21:58:14 INFO:root:BLEU: 0.4127 @ epoch 19 batch 400 373 | 21:58:51 INFO:root:Epoch[19] Batch [500] Speed: 79.56 samples/sec Train-Perplexity=5.138349 374 | 21:59:30 INFO:root:Epoch[19] Batch [600] Speed: 324.29 samples/sec Train-Perplexity=5.474285 375 | 21:59:44 INFO:root:Epoch[19] Resetting Data Iterator 376 | 21:59:44 INFO:root:Epoch[19] Time cost=366.446 377 | 21:59:44 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0020.params" 378 | 22:00:20 INFO:root:Epoch[20] Batch [100] Speed: 357.59 samples/sec Train-Perplexity=5.156831 379 | 22:00:59 INFO:root:Epoch[20] Batch [200] Speed: 335.02 samples/sec Train-Perplexity=5.152715 380 | 22:01:37 INFO:root:Epoch[20] Batch [300] Speed: 335.62 samples/sec Train-Perplexity=5.069593 381 | 22:02:17 INFO:root:Epoch[20] Batch [400] Speed: 319.26 samples/sec Train-Perplexity=5.734110 382 | 22:02:17 INFO:root:Checking BLEU for epoch 20 batch 400 383 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
384 | 22:04:20 INFO:root:b'1gram=72.28% 2gram=48.18% 3gram=36.24% 4gram=26.46% \r\nBP = 0.9763\r\nBLEU = 0.4173\r\n' 385 | 22:04:20 INFO:root:BLEU: 0.4173 @ epoch 20 batch 400 386 | 22:04:20 INFO:root:Current BLEU: 0.4173 > prev best 0.4163 in epoch 15 387 | 22:04:20 INFO:root:Saving... 388 | 22:04:21 INFO:root:Saved checkpoint to "best_bleu-0021.params" 389 | 22:04:57 INFO:root:Epoch[20] Batch [500] Speed: 79.79 samples/sec Train-Perplexity=4.956359 390 | 22:05:37 INFO:root:Epoch[20] Batch [600] Speed: 324.48 samples/sec Train-Perplexity=5.288804 391 | 22:05:50 INFO:root:Epoch[20] Resetting Data Iterator 392 | 22:05:50 INFO:root:Epoch[20] Time cost=365.873 393 | 22:05:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0021.params" 394 | 22:06:27 INFO:root:Epoch[21] Batch [100] Speed: 358.00 samples/sec Train-Perplexity=4.901484 395 | 22:07:05 INFO:root:Epoch[21] Batch [200] Speed: 334.84 samples/sec Train-Perplexity=5.006587 396 | 22:07:43 INFO:root:Epoch[21] Batch [300] Speed: 336.02 samples/sec Train-Perplexity=4.877228 397 | 22:08:23 INFO:root:Epoch[21] Batch [400] Speed: 319.59 samples/sec Train-Perplexity=5.617798 398 | 22:08:23 INFO:root:Checking BLEU for epoch 21 batch 400 399 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 400 | 22:10:28 INFO:root:b'1gram=71.66% 2gram=47.92% 3gram=35.61% 4gram=25.55% \r\nBP = 0.9948\r\nBLEU = 0.4183\r\n' 401 | 22:10:28 INFO:root:BLEU: 0.4183 @ epoch 21 batch 400 402 | 22:10:28 INFO:root:Current BLEU: 0.4183 > prev best 0.4173 in epoch 20 403 | 22:10:28 INFO:root:Saving... 404 | 22:10:28 INFO:root:Saved checkpoint to "best_bleu-0022.params" 405 | 22:11:05 INFO:root:Epoch[21] Batch [500] Speed: 79.31 samples/sec Train-Perplexity=4.851527 406 | 22:11:44 INFO:root:Epoch[21] Batch [600] Speed: 324.12 samples/sec Train-Perplexity=5.100939 407 | 22:11:58 INFO:root:Epoch[21] Resetting Data Iterator 408 | 22:11:58 INFO:root:Epoch[21] Time cost=366.776 409 | 22:11:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0022.params" 410 | 22:12:34 INFO:root:Epoch[22] Batch [100] Speed: 357.98 samples/sec Train-Perplexity=4.722476 411 | 22:13:12 INFO:root:Epoch[22] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=4.857207 412 | 22:13:51 INFO:root:Epoch[22] Batch [300] Speed: 336.11 samples/sec Train-Perplexity=4.766189 413 | 22:14:31 INFO:root:Epoch[22] Batch [400] Speed: 320.01 samples/sec Train-Perplexity=5.428668 414 | 22:14:31 INFO:root:Checking BLEU for epoch 22 batch 400 415 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
416 | 22:16:34 INFO:root:b'1gram=72.78% 2gram=48.23% 3gram=35.50% 4gram=25.64% \r\nBP = 0.9771\r\nBLEU = 0.4131\r\n' 417 | 22:16:34 INFO:root:BLEU: 0.4131 @ epoch 22 batch 400 418 | 22:17:11 INFO:root:Epoch[22] Batch [500] Speed: 79.83 samples/sec Train-Perplexity=4.681073 419 | 22:17:50 INFO:root:Epoch[22] Batch [600] Speed: 324.54 samples/sec Train-Perplexity=4.956886 420 | 22:18:04 INFO:root:Epoch[22] Resetting Data Iterator 421 | 22:18:04 INFO:root:Epoch[22] Time cost=365.562 422 | 22:18:05 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0023.params" 423 | 22:18:41 INFO:root:Epoch[23] Batch [100] Speed: 358.29 samples/sec Train-Perplexity=4.682423 424 | 22:19:19 INFO:root:Epoch[23] Batch [200] Speed: 335.45 samples/sec Train-Perplexity=4.707169 425 | 22:19:57 INFO:root:Epoch[23] Batch [300] Speed: 336.35 samples/sec Train-Perplexity=4.628967 426 | 22:20:37 INFO:root:Epoch[23] Batch [400] Speed: 319.81 samples/sec Train-Perplexity=5.250182 427 | 22:20:37 INFO:root:Checking BLEU for epoch 23 batch 400 428 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 429 | 22:22:43 INFO:root:b'1gram=70.01% 2gram=46.34% 3gram=34.48% 4gram=24.87% \r\nBP = 1.0000\r\nBLEU = 0.4084\r\n' 430 | 22:22:43 INFO:root:BLEU: 0.4084 @ epoch 23 batch 400 431 | 22:23:19 INFO:root:Epoch[23] Batch [500] Speed: 78.69 samples/sec Train-Perplexity=4.552618 432 | 22:23:59 INFO:root:Epoch[23] Batch [600] Speed: 324.72 samples/sec Train-Perplexity=4.859983 433 | 22:24:12 INFO:root:Epoch[23] Resetting Data Iterator 434 | 22:24:12 INFO:root:Epoch[23] Time cost=367.811 435 | 22:24:13 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0024.params" 436 | 22:24:49 INFO:root:Epoch[24] Batch [100] Speed: 357.62 samples/sec Train-Perplexity=4.552113 437 | 22:25:28 INFO:root:Epoch[24] Batch [200] Speed: 328.03 samples/sec Train-Perplexity=4.613325 438 | 22:26:06 INFO:root:Epoch[24] Batch [300] Speed: 335.90 samples/sec Train-Perplexity=4.530597 439 | 22:26:46 INFO:root:Epoch[24] Batch [400] Speed: 320.08 samples/sec Train-Perplexity=5.147030 440 | 22:26:46 INFO:root:Checking BLEU for epoch 24 batch 400 441 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 442 | 22:28:49 INFO:root:b'1gram=71.94% 2gram=48.40% 3gram=36.03% 4gram=26.64% \r\nBP = 0.9717\r\nBLEU = 0.4155\r\n' 443 | 22:28:49 INFO:root:BLEU: 0.4155 @ epoch 24 batch 400 444 | 22:29:26 INFO:root:Epoch[24] Batch [500] Speed: 80.22 samples/sec Train-Perplexity=4.456267 445 | 22:30:05 INFO:root:Epoch[24] Batch [600] Speed: 324.49 samples/sec Train-Perplexity=4.723113 446 | 22:30:19 INFO:root:Epoch[24] Resetting Data Iterator 447 | 22:30:19 INFO:root:Epoch[24] Time cost=365.690 448 | 22:30:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0025.params" 449 | 22:30:55 INFO:root:Epoch[25] Batch [100] Speed: 357.99 samples/sec Train-Perplexity=4.392298 450 | 22:31:34 INFO:root:Epoch[25] Batch [200] Speed: 335.29 samples/sec Train-Perplexity=4.485758 451 | 22:32:12 INFO:root:Epoch[25] Batch [300] Speed: 336.13 samples/sec Train-Perplexity=4.399285 452 | 22:32:52 INFO:root:Epoch[25] Batch [400] Speed: 320.07 samples/sec Train-Perplexity=5.018728 453 | 22:32:52 INFO:root:Checking BLEU for epoch 25 batch 400 454 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
455 | 22:34:56 INFO:root:b'1gram=72.33% 2gram=48.85% 3gram=36.39% 4gram=26.22% \r\nBP = 0.9914\r\nBLEU = 0.4248\r\n' 456 | 22:34:56 INFO:root:BLEU: 0.4248 @ epoch 25 batch 400 457 | 22:34:56 INFO:root:Current BLEU: 0.4248 > prev best 0.4183 in epoch 21 458 | 22:34:56 INFO:root:Saving... 459 | 22:34:57 INFO:root:Saved checkpoint to "best_bleu-0026.params" 460 | 22:35:33 INFO:root:Epoch[25] Batch [500] Speed: 79.07 samples/sec Train-Perplexity=4.335405 461 | 22:36:13 INFO:root:Epoch[25] Batch [600] Speed: 324.77 samples/sec Train-Perplexity=4.627439 462 | 22:36:27 INFO:root:Epoch[25] Resetting Data Iterator 463 | 22:36:27 INFO:root:Epoch[25] Time cost=367.088 464 | 22:36:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0026.params" 465 | 22:37:03 INFO:root:Epoch[26] Batch [100] Speed: 357.85 samples/sec Train-Perplexity=4.306752 466 | 22:37:41 INFO:root:Epoch[26] Batch [200] Speed: 334.62 samples/sec Train-Perplexity=4.379418 467 | 22:38:19 INFO:root:Epoch[26] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=4.283302 468 | 22:38:59 INFO:root:Epoch[26] Batch [400] Speed: 319.95 samples/sec Train-Perplexity=4.912712 469 | 22:38:59 INFO:root:Checking BLEU for epoch 26 batch 400 470 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 471 | 22:41:02 INFO:root:b'1gram=73.43% 2gram=49.63% 3gram=37.07% 4gram=27.09% \r\nBP = 0.9612\r\nBLEU = 0.4204\r\n' 472 | 22:41:02 INFO:root:BLEU: 0.4204 @ epoch 26 batch 400 473 | 22:41:40 INFO:root:Epoch[26] Batch [500] Speed: 79.92 samples/sec Train-Perplexity=4.257015 474 | 22:42:19 INFO:root:Epoch[26] Batch [600] Speed: 324.30 samples/sec Train-Perplexity=4.502108 475 | 22:42:33 INFO:root:Epoch[26] Resetting Data Iterator 476 | 22:42:33 INFO:root:Epoch[26] Time cost=365.570 477 | 22:42:33 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0027.params" 478 | 22:43:09 INFO:root:Epoch[27] Batch [100] Speed: 357.83 samples/sec Train-Perplexity=4.244413 479 | 22:43:48 INFO:root:Epoch[27] Batch [200] Speed: 335.25 samples/sec Train-Perplexity=4.262056 480 | 22:44:26 INFO:root:Epoch[27] Batch [300] Speed: 336.12 samples/sec Train-Perplexity=4.184889 481 | 22:45:06 INFO:root:Epoch[27] Batch [400] Speed: 319.83 samples/sec Train-Perplexity=4.801407 482 | 22:45:06 INFO:root:Checking BLEU for epoch 27 batch 400 483 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 
484 | 22:47:08 INFO:root:b'1gram=73.75% 2gram=49.58% 3gram=37.25% 4gram=27.27% \r\nBP = 0.9561\r\nBLEU = 0.4197\r\n' 485 | 22:47:08 INFO:root:BLEU: 0.4197 @ epoch 27 batch 400 486 | 22:47:44 INFO:root:Epoch[27] Batch [500] Speed: 80.66 samples/sec Train-Perplexity=4.152310 487 | 22:48:24 INFO:root:Epoch[27] Batch [600] Speed: 324.11 samples/sec Train-Perplexity=4.398443 488 | 22:48:37 INFO:root:Epoch[27] Resetting Data Iterator 489 | 22:48:37 INFO:root:Epoch[27] Time cost=364.017 490 | 22:48:38 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0028.params" 491 | 22:49:14 INFO:root:Epoch[28] Batch [100] Speed: 357.90 samples/sec Train-Perplexity=4.110479 492 | 22:49:52 INFO:root:Epoch[28] Batch [200] Speed: 334.73 samples/sec Train-Perplexity=4.186066 493 | 22:50:30 INFO:root:Epoch[28] Batch [300] Speed: 336.05 samples/sec Train-Perplexity=4.109250 494 | 22:51:10 INFO:root:Epoch[28] Batch [400] Speed: 319.06 samples/sec Train-Perplexity=4.731491 495 | 22:51:10 INFO:root:Checking BLEU for epoch 28 batch 400 496 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [244 ms]. 497 | 22:53:14 INFO:root:b'1gram=73.15% 2gram=49.07% 3gram=36.82% 4gram=26.62% \r\nBP = 0.9701\r\nBLEU = 0.4202\r\n' 498 | 22:53:14 INFO:root:BLEU: 0.4202 @ epoch 28 batch 400 499 | 22:53:51 INFO:root:Epoch[28] Batch [500] Speed: 79.84 samples/sec Train-Perplexity=4.068351 500 | 22:54:30 INFO:root:Epoch[28] Batch [600] Speed: 324.61 samples/sec Train-Perplexity=4.302435 501 | 22:54:44 INFO:root:Epoch[28] Resetting Data Iterator 502 | 22:54:44 INFO:root:Epoch[28] Time cost=365.732 503 | 22:54:45 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0029.params" 504 | 22:55:20 INFO:root:Epoch[29] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=4.101044 505 | 22:55:59 INFO:root:Epoch[29] Batch [200] Speed: 335.13 samples/sec Train-Perplexity=4.092799 506 | 22:56:37 INFO:root:Epoch[29] Batch [300] Speed: 336.05 samples/sec Train-Perplexity=4.027847 507 | 22:57:17 INFO:root:Epoch[29] Batch [400] Speed: 319.65 samples/sec Train-Perplexity=4.594741 508 | 22:57:17 INFO:root:Checking BLEU for epoch 29 batch 400 509 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 510 | 22:59:20 INFO:root:b'1gram=73.02% 2gram=49.75% 3gram=37.72% 4gram=27.57% \r\nBP = 0.9887\r\nBLEU = 0.4359\r\n' 511 | 22:59:20 INFO:root:BLEU: 0.4359 @ epoch 29 batch 400 512 | 22:59:20 INFO:root:Current BLEU: 0.4359 > prev best 0.4248 in epoch 25 513 | 22:59:20 INFO:root:Saving... 
514 | 22:59:20 INFO:root:Saved checkpoint to "best_bleu-0030.params" 515 | 22:59:57 INFO:root:Epoch[29] Batch [500] Speed: 79.82 samples/sec Train-Perplexity=3.977610 516 | 23:00:37 INFO:root:Epoch[29] Batch [600] Speed: 324.40 samples/sec Train-Perplexity=4.218835 517 | 23:00:50 INFO:root:Epoch[29] Resetting Data Iterator 518 | 23:00:50 INFO:root:Epoch[29] Time cost=365.691 519 | 23:00:51 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0030.params" 520 | 23:01:27 INFO:root:Epoch[30] Batch [100] Speed: 358.04 samples/sec Train-Perplexity=3.962719 521 | 23:02:05 INFO:root:Epoch[30] Batch [200] Speed: 334.57 samples/sec Train-Perplexity=4.003199 522 | 23:02:43 INFO:root:Epoch[30] Batch [300] Speed: 335.85 samples/sec Train-Perplexity=3.933537 523 | 23:03:23 INFO:root:Epoch[30] Batch [400] Speed: 319.46 samples/sec Train-Perplexity=4.507263 524 | 23:03:23 INFO:root:Checking BLEU for epoch 30 batch 400 525 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 526 | 23:05:27 INFO:root:b'1gram=72.85% 2gram=48.92% 3gram=36.23% 4gram=26.13% \r\nBP = 0.9853\r\nBLEU = 0.4223\r\n' 527 | 23:05:27 INFO:root:BLEU: 0.4223 @ epoch 30 batch 400 528 | 23:06:04 INFO:root:Epoch[30] Batch [500] Speed: 79.72 samples/sec Train-Perplexity=3.909418 529 | 23:06:43 INFO:root:Epoch[30] Batch [600] Speed: 324.64 samples/sec Train-Perplexity=4.146617 530 | 23:06:57 INFO:root:Epoch[30] Resetting Data Iterator 531 | 23:06:57 INFO:root:Epoch[30] Time cost=365.978 532 | 23:06:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0031.params" 533 | 23:07:33 INFO:root:Epoch[31] Batch [100] Speed: 357.55 samples/sec Train-Perplexity=3.881883 534 | 23:08:12 INFO:root:Epoch[31] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=3.929952 535 | 23:08:50 INFO:root:Epoch[31] Batch [300] Speed: 336.06 samples/sec Train-Perplexity=3.861903 536 | 23:09:30 INFO:root:Epoch[31] Batch [400] Speed: 319.83 samples/sec Train-Perplexity=4.402535 537 | 23:09:30 INFO:root:Checking BLEU for epoch 31 batch 400 538 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 539 | 23:11:34 INFO:root:b'1gram=72.48% 2gram=49.10% 3gram=36.89% 4gram=26.80% \r\nBP = 0.9837\r\nBLEU = 0.4261\r\n' 540 | 23:11:34 INFO:root:BLEU: 0.4261 @ epoch 31 batch 400 541 | 23:12:11 INFO:root:Epoch[31] Batch [500] Speed: 79.59 samples/sec Train-Perplexity=3.829989 542 | 23:12:50 INFO:root:Epoch[31] Batch [600] Speed: 324.17 samples/sec Train-Perplexity=4.054468 543 | 23:13:04 INFO:root:Epoch[31] Resetting Data Iterator 544 | 23:13:04 INFO:root:Epoch[31] Time cost=366.171 545 | 23:13:04 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0032.params" 546 | 23:13:40 INFO:root:Epoch[32] Batch [100] Speed: 357.95 samples/sec Train-Perplexity=3.885665 547 | 23:14:19 INFO:root:Epoch[32] Batch [200] Speed: 331.46 samples/sec Train-Perplexity=3.854284 548 | 23:14:57 INFO:root:Epoch[32] Batch [300] Speed: 335.93 samples/sec Train-Perplexity=3.783954 549 | 23:15:37 INFO:root:Epoch[32] Batch [400] Speed: 319.36 samples/sec Train-Perplexity=4.335291 550 | 23:15:37 INFO:root:Checking BLEU for epoch 32 batch 400 551 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
552 | 23:17:41 INFO:root:b'1gram=72.07% 2gram=48.37% 3gram=36.08% 4gram=26.34% \r\nBP = 0.9963\r\nBLEU = 0.4251\r\n' 553 | 23:17:41 INFO:root:BLEU: 0.4251 @ epoch 32 batch 400 554 | 23:18:18 INFO:root:Epoch[32] Batch [500] Speed: 79.54 samples/sec Train-Perplexity=3.787482 555 | 23:18:57 INFO:root:Epoch[32] Batch [600] Speed: 324.37 samples/sec Train-Perplexity=3.992663 556 | 23:19:11 INFO:root:Epoch[32] Resetting Data Iterator 557 | 23:19:11 INFO:root:Epoch[32] Time cost=366.729 558 | 23:19:12 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0033.params" 559 | 23:19:48 INFO:root:Epoch[33] Batch [100] Speed: 358.02 samples/sec Train-Perplexity=3.789431 560 | 23:20:26 INFO:root:Epoch[33] Batch [200] Speed: 335.04 samples/sec Train-Perplexity=3.802084 561 | 23:21:04 INFO:root:Epoch[33] Batch [300] Speed: 336.21 samples/sec Train-Perplexity=3.733409 562 | 23:21:44 INFO:root:Epoch[33] Batch [400] Speed: 319.87 samples/sec Train-Perplexity=4.257213 563 | 23:21:44 INFO:root:Checking BLEU for epoch 33 batch 400 564 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 565 | 23:23:48 INFO:root:b'1gram=71.47% 2gram=47.45% 3gram=35.10% 4gram=24.91% \r\nBP = 1.0000\r\nBLEU = 0.4150\r\n' 566 | 23:23:48 INFO:root:BLEU: 0.415 @ epoch 33 batch 400 567 | 23:24:25 INFO:root:Epoch[33] Batch [500] Speed: 79.44 samples/sec Train-Perplexity=3.702252 568 | 23:25:05 INFO:root:Epoch[33] Batch [600] Speed: 324.16 samples/sec Train-Perplexity=3.953748 569 | 23:25:18 INFO:root:Epoch[33] Resetting Data Iterator 570 | 23:25:18 INFO:root:Epoch[33] Time cost=366.485 571 | 23:25:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0034.params" 572 | 23:25:55 INFO:root:Epoch[34] Batch [100] Speed: 357.78 samples/sec Train-Perplexity=3.712668 573 | 23:26:33 INFO:root:Epoch[34] Batch [200] Speed: 335.09 samples/sec Train-Perplexity=3.726853 574 | 23:27:11 INFO:root:Epoch[34] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=3.656434 575 | 23:27:51 INFO:root:Epoch[34] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=4.186915 576 | 23:27:51 INFO:root:Checking BLEU for epoch 34 batch 400 577 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 578 | 23:29:56 INFO:root:b'1gram=72.86% 2gram=49.52% 3gram=37.17% 4gram=27.17% \r\nBP = 0.9879\r\nBLEU = 0.4317\r\n' 579 | 23:29:56 INFO:root:BLEU: 0.4317 @ epoch 34 batch 400 580 | 23:30:33 INFO:root:Epoch[34] Batch [500] Speed: 78.88 samples/sec Train-Perplexity=3.626724 581 | 23:31:13 INFO:root:Epoch[34] Batch [600] Speed: 324.34 samples/sec Train-Perplexity=3.852287 582 | 23:31:27 INFO:root:Epoch[34] Resetting Data Iterator 583 | 23:31:27 INFO:root:Epoch[34] Time cost=367.663 584 | 23:31:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0035.params" 585 | 23:32:03 INFO:root:Epoch[35] Batch [100] Speed: 357.82 samples/sec Train-Perplexity=3.642621 586 | 23:32:41 INFO:root:Epoch[35] Batch [200] Speed: 334.81 samples/sec Train-Perplexity=3.670861 587 | 23:33:19 INFO:root:Epoch[35] Batch [300] Speed: 336.10 samples/sec Train-Perplexity=3.608510 588 | 23:34:00 INFO:root:Epoch[35] Batch [400] Speed: 319.11 samples/sec Train-Perplexity=4.110463 589 | 23:34:00 INFO:root:Checking BLEU for epoch 35 batch 400 590 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
591 | 23:36:04 INFO:root:b'1gram=73.04% 2gram=49.33% 3gram=37.01% 4gram=27.20% \r\nBP = 0.9956\r\nBLEU = 0.4345\r\n' 592 | 23:36:04 INFO:root:BLEU: 0.4345 @ epoch 35 batch 400 593 | 23:36:41 INFO:root:Epoch[35] Batch [500] Speed: 79.50 samples/sec Train-Perplexity=3.585162 594 | 23:37:20 INFO:root:Epoch[35] Batch [600] Speed: 324.01 samples/sec Train-Perplexity=3.779326 595 | 23:37:34 INFO:root:Epoch[35] Resetting Data Iterator 596 | 23:37:34 INFO:root:Epoch[35] Time cost=366.483 597 | 23:37:34 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0036.params" 598 | 23:38:10 INFO:root:Epoch[36] Batch [100] Speed: 357.39 samples/sec Train-Perplexity=3.574405 599 | 23:38:48 INFO:root:Epoch[36] Batch [200] Speed: 334.85 samples/sec Train-Perplexity=3.615811 600 | 23:39:27 INFO:root:Epoch[36] Batch [300] Speed: 335.90 samples/sec Train-Perplexity=3.536461 601 | 23:40:07 INFO:root:Epoch[36] Batch [400] Speed: 319.57 samples/sec Train-Perplexity=4.072223 602 | 23:40:07 INFO:root:Checking BLEU for epoch 36 batch 400 603 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 604 | 23:42:11 INFO:root:b'1gram=73.09% 2gram=49.54% 3gram=37.24% 4gram=27.26% \r\nBP = 0.9971\r\nBLEU = 0.4366\r\n' 605 | 23:42:11 INFO:root:BLEU: 0.4366 @ epoch 36 batch 400 606 | 23:42:11 INFO:root:Current BLEU: 0.4366 > prev best 0.4359 in epoch 29 607 | 23:42:11 INFO:root:Saving... 608 | 23:42:11 INFO:root:Saved checkpoint to "best_bleu-0037.params" 609 | 23:42:48 INFO:root:Epoch[36] Batch [500] Speed: 79.34 samples/sec Train-Perplexity=3.511627 610 | 23:43:27 INFO:root:Epoch[36] Batch [600] Speed: 324.38 samples/sec Train-Perplexity=3.713632 611 | 23:43:41 INFO:root:Epoch[36] Resetting Data Iterator 612 | 23:43:41 INFO:root:Epoch[36] Time cost=366.780 613 | 23:43:42 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0037.params" 614 | 23:44:18 INFO:root:Epoch[37] Batch [100] Speed: 357.87 samples/sec Train-Perplexity=3.533548 615 | 23:44:56 INFO:root:Epoch[37] Batch [200] Speed: 334.95 samples/sec Train-Perplexity=3.543121 616 | 23:45:34 INFO:root:Epoch[37] Batch [300] Speed: 335.64 samples/sec Train-Perplexity=3.484517 617 | 23:46:14 INFO:root:Epoch[37] Batch [400] Speed: 319.59 samples/sec Train-Perplexity=3.976594 618 | 23:46:14 INFO:root:Checking BLEU for epoch 37 batch 400 619 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
620 | 23:48:18 INFO:root:b'1gram=71.98% 2gram=47.77% 3gram=35.48% 4gram=25.23% \r\nBP = 0.9977\r\nBLEU = 0.4179\r\n' 621 | 23:48:18 INFO:root:BLEU: 0.4179 @ epoch 37 batch 400 622 | 23:48:55 INFO:root:Epoch[37] Batch [500] Speed: 79.58 samples/sec Train-Perplexity=3.486265 623 | 23:49:34 INFO:root:Epoch[37] Batch [600] Speed: 323.40 samples/sec Train-Perplexity=3.658202 624 | 23:49:48 INFO:root:Epoch[37] Resetting Data Iterator 625 | 23:49:48 INFO:root:Epoch[37] Time cost=366.380 626 | 23:49:49 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0038.params" 627 | 23:50:25 INFO:root:Epoch[38] Batch [100] Speed: 357.33 samples/sec Train-Perplexity=3.494332 628 | 23:51:03 INFO:root:Epoch[38] Batch [200] Speed: 334.01 samples/sec Train-Perplexity=3.510251 629 | 23:51:41 INFO:root:Epoch[38] Batch [300] Speed: 335.70 samples/sec Train-Perplexity=3.437536 630 | 23:52:21 INFO:root:Epoch[38] Batch [400] Speed: 319.40 samples/sec Train-Perplexity=3.915965 631 | 23:52:21 INFO:root:Checking BLEU for epoch 38 batch 400 632 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 633 | 23:54:24 INFO:root:b'1gram=73.34% 2gram=49.38% 3gram=37.60% 4gram=27.72% \r\nBP = 0.9792\r\nBLEU = 0.4316\r\n' 634 | 23:54:24 INFO:root:BLEU: 0.4316 @ epoch 38 batch 400 635 | 23:55:01 INFO:root:Epoch[38] Batch [500] Speed: 80.05 samples/sec Train-Perplexity=3.412933 636 | 23:55:41 INFO:root:Epoch[38] Batch [600] Speed: 324.22 samples/sec Train-Perplexity=3.610383 637 | 23:55:54 INFO:root:Epoch[38] Resetting Data Iterator 638 | 23:55:54 INFO:root:Epoch[38] Time cost=365.524 639 | 23:55:55 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0039.params" 640 | 23:56:31 INFO:root:Epoch[39] Batch [100] Speed: 357.86 samples/sec Train-Perplexity=3.410397 641 | 23:57:09 INFO:root:Epoch[39] Batch [200] Speed: 334.67 samples/sec Train-Perplexity=3.453946 642 | 23:57:47 INFO:root:Epoch[39] Batch [300] Speed: 335.97 samples/sec Train-Perplexity=3.392457 643 | 23:58:27 INFO:root:Epoch[39] Batch [400] Speed: 319.98 samples/sec Train-Perplexity=3.868850 644 | 23:58:27 INFO:root:Checking BLEU for epoch 39 batch 400 645 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 646 | 00:00:32 INFO:root:b'1gram=72.16% 2gram=47.99% 3gram=35.59% 4gram=25.67% \r\nBP = 1.0000\r\nBLEU = 0.4217\r\n' 647 | 00:00:32 INFO:root:BLEU: 0.4217 @ epoch 39 batch 400 648 | 00:01:09 INFO:root:Epoch[39] Batch [500] Speed: 79.26 samples/sec Train-Perplexity=3.392095 649 | 00:01:48 INFO:root:Epoch[39] Batch [600] Speed: 324.30 samples/sec Train-Perplexity=3.557561 650 | 00:02:02 INFO:root:Epoch[39] Resetting Data Iterator 651 | 00:02:02 INFO:root:Epoch[39] Time cost=366.865 652 | 00:02:02 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0040.params" 653 | 00:02:39 INFO:root:Epoch[40] Batch [100] Speed: 350.17 samples/sec Train-Perplexity=3.356947 654 | 00:03:17 INFO:root:Epoch[40] Batch [200] Speed: 334.80 samples/sec Train-Perplexity=3.408309 655 | 00:03:56 INFO:root:Epoch[40] Batch [300] Speed: 335.42 samples/sec Train-Perplexity=3.341321 656 | 00:04:36 INFO:root:Epoch[40] Batch [400] Speed: 319.48 samples/sec Train-Perplexity=3.813529 657 | 00:04:36 INFO:root:Checking BLEU for epoch 40 batch 400 658 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [244 ms]. 
659 | 00:06:40 INFO:root:b'1gram=73.21% 2gram=49.27% 3gram=37.03% 4gram=27.07% \r\nBP = 0.9908\r\nBLEU = 0.4321\r\n' 660 | 00:06:40 INFO:root:BLEU: 0.4321 @ epoch 40 batch 400 661 | 00:07:16 INFO:root:Epoch[40] Batch [500] Speed: 79.59 samples/sec Train-Perplexity=3.342658 662 | 00:07:56 INFO:root:Epoch[40] Batch [600] Speed: 324.20 samples/sec Train-Perplexity=3.529993 663 | 00:08:10 INFO:root:Epoch[40] Resetting Data Iterator 664 | 00:08:10 INFO:root:Epoch[40] Time cost=367.100 665 | 00:08:10 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0041.params" 666 | 00:08:46 INFO:root:Epoch[41] Batch [100] Speed: 357.20 samples/sec Train-Perplexity=3.363024 667 | 00:09:24 INFO:root:Epoch[41] Batch [200] Speed: 334.37 samples/sec Train-Perplexity=3.352243 668 | 00:10:03 INFO:root:Epoch[41] Batch [300] Speed: 335.77 samples/sec Train-Perplexity=3.289595 669 | 00:10:43 INFO:root:Epoch[41] Batch [400] Speed: 319.19 samples/sec Train-Perplexity=3.754321 670 | 00:10:43 INFO:root:Checking BLEU for epoch 41 batch 400 671 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 672 | 00:12:48 INFO:root:b'1gram=72.25% 2gram=49.30% 3gram=37.14% 4gram=26.91% \r\nBP = 1.0000\r\nBLEU = 0.4344\r\n' 673 | 00:12:48 INFO:root:BLEU: 0.4344 @ epoch 41 batch 400 674 | 00:13:25 INFO:root:Epoch[41] Batch [500] Speed: 79.01 samples/sec Train-Perplexity=3.291721 675 | 00:14:04 INFO:root:Epoch[41] Batch [600] Speed: 323.94 samples/sec Train-Perplexity=3.463519 676 | 00:14:18 INFO:root:Epoch[41] Resetting Data Iterator 677 | 00:14:18 INFO:root:Epoch[41] Time cost=367.667 678 | 00:14:19 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0042.params" 679 | 00:14:55 INFO:root:Epoch[42] Batch [100] Speed: 357.39 samples/sec Train-Perplexity=3.321909 680 | 00:15:33 INFO:root:Epoch[42] Batch [200] Speed: 334.74 samples/sec Train-Perplexity=3.308056 681 | 00:16:11 INFO:root:Epoch[42] Batch [300] Speed: 334.78 samples/sec Train-Perplexity=3.234425 682 | 00:16:51 INFO:root:Epoch[42] Batch [400] Speed: 319.44 samples/sec Train-Perplexity=3.724314 683 | 00:16:51 INFO:root:Checking BLEU for epoch 42 batch 400 684 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 685 | 00:18:58 INFO:root:b'1gram=70.86% 2gram=47.49% 3gram=35.56% 4gram=25.64% \r\nBP = 1.0000\r\nBLEU = 0.4185\r\n' 686 | 00:18:58 INFO:root:BLEU: 0.4185 @ epoch 42 batch 400 687 | 00:19:35 INFO:root:Epoch[42] Batch [500] Speed: 78.28 samples/sec Train-Perplexity=3.245776 688 | 00:20:14 INFO:root:Epoch[42] Batch [600] Speed: 323.44 samples/sec Train-Perplexity=3.407396 689 | 00:20:28 INFO:root:Epoch[42] Resetting Data Iterator 690 | 00:20:28 INFO:root:Epoch[42] Time cost=369.212 691 | 00:20:28 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0043.params" 692 | 00:21:04 INFO:root:Epoch[43] Batch [100] Speed: 356.92 samples/sec Train-Perplexity=3.250229 693 | 00:21:43 INFO:root:Epoch[43] Batch [200] Speed: 335.56 samples/sec Train-Perplexity=3.274649 694 | 00:22:21 INFO:root:Epoch[43] Batch [300] Speed: 336.32 samples/sec Train-Perplexity=3.214198 695 | 00:23:01 INFO:root:Epoch[43] Batch [400] Speed: 319.96 samples/sec Train-Perplexity=3.701373 696 | 00:23:01 INFO:root:Checking BLEU for epoch 43 batch 400 697 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
698 | 00:25:06 INFO:root:b'1gram=71.57% 2gram=48.42% 3gram=36.15% 4gram=26.46% \r\nBP = 1.0000\r\nBLEU = 0.4267\r\n' 699 | 00:25:06 INFO:root:BLEU: 0.4267 @ epoch 43 batch 400 700 | 00:25:43 INFO:root:Epoch[43] Batch [500] Speed: 79.03 samples/sec Train-Perplexity=3.202524 701 | 00:26:22 INFO:root:Epoch[43] Batch [600] Speed: 324.71 samples/sec Train-Perplexity=3.374590 702 | 00:26:36 INFO:root:Epoch[43] Resetting Data Iterator 703 | 00:26:36 INFO:root:Epoch[43] Time cost=367.202 704 | 00:26:36 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0044.params" 705 | 00:27:12 INFO:root:Epoch[44] Batch [100] Speed: 358.34 samples/sec Train-Perplexity=3.251368 706 | 00:27:50 INFO:root:Epoch[44] Batch [200] Speed: 335.11 samples/sec Train-Perplexity=3.230534 707 | 00:28:28 INFO:root:Epoch[44] Batch [300] Speed: 336.60 samples/sec Train-Perplexity=3.183112 708 | 00:29:08 INFO:root:Epoch[44] Batch [400] Speed: 320.27 samples/sec Train-Perplexity=3.616401 709 | 00:29:08 INFO:root:Checking BLEU for epoch 44 batch 400 710 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [239 ms]. 711 | 00:31:50 INFO:root:b'1gram=72.44% 2gram=48.61% 3gram=36.49% 4gram=26.59% \r\nBP = 1.0000\r\nBLEU = 0.4299\r\n' 712 | 00:31:50 INFO:root:BLEU: 0.4299 @ epoch 44 batch 400 713 | 00:32:59 INFO:root:Epoch[44] Batch [500] Speed: 55.43 samples/sec Train-Perplexity=3.160131 714 | 00:33:43 INFO:root:Epoch[44] Batch [600] Speed: 294.19 samples/sec Train-Perplexity=3.314777 715 | 00:34:05 INFO:root:Epoch[44] Resetting Data Iterator 716 | 00:34:05 INFO:root:Epoch[44] Time cost=448.968 717 | 00:34:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0045.params" 718 | 00:35:15 INFO:root:Epoch[45] Batch [100] Speed: 187.53 samples/sec Train-Perplexity=3.168578 719 | 00:35:58 INFO:root:Epoch[45] Batch [200] Speed: 296.46 samples/sec Train-Perplexity=3.186525 720 | 00:36:37 INFO:root:Epoch[45] Batch [300] Speed: 331.03 samples/sec Train-Perplexity=3.124966 721 | 00:37:22 INFO:root:Epoch[45] Batch [400] Speed: 283.18 samples/sec Train-Perplexity=3.576049 722 | 00:37:22 INFO:root:Checking BLEU for epoch 45 batch 400 723 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 724 | 00:39:27 INFO:root:b'1gram=72.85% 2gram=48.60% 3gram=36.19% 4gram=26.58% \r\nBP = 1.0000\r\nBLEU = 0.4296\r\n' 725 | 00:39:27 INFO:root:BLEU: 0.4296 @ epoch 45 batch 400 726 | 00:40:04 INFO:root:Epoch[45] Batch [500] Speed: 78.93 samples/sec Train-Perplexity=3.120963 727 | 00:40:44 INFO:root:Epoch[45] Batch [600] Speed: 324.25 samples/sec Train-Perplexity=3.287161 728 | 00:40:57 INFO:root:Epoch[45] Resetting Data Iterator 729 | 00:40:57 INFO:root:Epoch[45] Time cost=410.759 730 | 00:40:58 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0046.params" 731 | 00:41:34 INFO:root:Epoch[46] Batch [100] Speed: 358.12 samples/sec Train-Perplexity=3.126530 732 | 00:42:12 INFO:root:Epoch[46] Batch [200] Speed: 335.34 samples/sec Train-Perplexity=3.139847 733 | 00:42:50 INFO:root:Epoch[46] Batch [300] Speed: 336.50 samples/sec Train-Perplexity=3.094464 734 | 00:43:30 INFO:root:Epoch[46] Batch [400] Speed: 318.95 samples/sec Train-Perplexity=3.523277 735 | 00:43:30 INFO:root:Checking BLEU for epoch 46 batch 400 736 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
737 | 00:45:35 INFO:root:b'1gram=72.20% 2gram=48.50% 3gram=35.88% 4gram=26.09% \r\nBP = 1.0000\r\nBLEU = 0.4255\r\n' 738 | 00:45:35 INFO:root:BLEU: 0.4255 @ epoch 46 batch 400 739 | 00:46:12 INFO:root:Epoch[46] Batch [500] Speed: 79.07 samples/sec Train-Perplexity=3.091579 740 | 00:46:51 INFO:root:Epoch[46] Batch [600] Speed: 324.51 samples/sec Train-Perplexity=3.243255 741 | 00:47:05 INFO:root:Epoch[46] Resetting Data Iterator 742 | 00:47:05 INFO:root:Epoch[46] Time cost=367.176 743 | 00:47:06 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0047.params" 744 | 00:47:42 INFO:root:Epoch[47] Batch [100] Speed: 358.25 samples/sec Train-Perplexity=3.127451 745 | 00:48:20 INFO:root:Epoch[47] Batch [200] Speed: 335.39 samples/sec Train-Perplexity=3.118381 746 | 00:48:58 INFO:root:Epoch[47] Batch [300] Speed: 336.53 samples/sec Train-Perplexity=3.064091 747 | 00:49:38 INFO:root:Epoch[47] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=3.486758 748 | 00:49:38 INFO:root:Checking BLEU for epoch 47 batch 400 749 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 750 | 00:51:41 INFO:root:b'1gram=73.16% 2gram=49.56% 3gram=37.50% 4gram=28.27% \r\nBP = 0.9755\r\nBLEU = 0.4319\r\n' 751 | 00:51:41 INFO:root:BLEU: 0.4319 @ epoch 47 batch 400 752 | 00:52:17 INFO:root:Epoch[47] Batch [500] Speed: 80.20 samples/sec Train-Perplexity=3.042889 753 | 00:52:57 INFO:root:Epoch[47] Batch [600] Speed: 324.67 samples/sec Train-Perplexity=3.219357 754 | 00:53:10 INFO:root:Epoch[47] Resetting Data Iterator 755 | 00:53:10 INFO:root:Epoch[47] Time cost=364.720 756 | 00:53:11 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0048.params" 757 | 00:53:47 INFO:root:Epoch[48] Batch [100] Speed: 358.36 samples/sec Train-Perplexity=3.065208 758 | 00:54:25 INFO:root:Epoch[48] Batch [200] Speed: 335.32 samples/sec Train-Perplexity=3.078973 759 | 00:55:03 INFO:root:Epoch[48] Batch [300] Speed: 336.66 samples/sec Train-Perplexity=3.022346 760 | 00:55:43 INFO:root:Epoch[48] Batch [400] Speed: 319.88 samples/sec Train-Perplexity=3.463543 761 | 00:55:43 INFO:root:Checking BLEU for epoch 48 batch 400 762 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 763 | 00:57:49 INFO:root:b'1gram=72.70% 2gram=49.23% 3gram=36.91% 4gram=27.22% \r\nBP = 1.0000\r\nBLEU = 0.4354\r\n' 764 | 00:57:49 INFO:root:BLEU: 0.4354 @ epoch 48 batch 400 765 | 00:58:26 INFO:root:Epoch[48] Batch [500] Speed: 78.62 samples/sec Train-Perplexity=3.017454 766 | 00:59:05 INFO:root:Epoch[48] Batch [600] Speed: 324.70 samples/sec Train-Perplexity=3.170958 767 | 00:59:19 INFO:root:Epoch[48] Resetting Data Iterator 768 | 00:59:19 INFO:root:Epoch[48] Time cost=367.905 769 | 00:59:20 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0049.params" 770 | 00:59:55 INFO:root:Epoch[49] Batch [100] Speed: 358.41 samples/sec Train-Perplexity=3.014912 771 | 01:00:34 INFO:root:Epoch[49] Batch [200] Speed: 335.27 samples/sec Train-Perplexity=3.051303 772 | 01:01:12 INFO:root:Epoch[49] Batch [300] Speed: 336.77 samples/sec Train-Perplexity=2.991697 773 | 01:01:52 INFO:root:Epoch[49] Batch [400] Speed: 320.15 samples/sec Train-Perplexity=3.408235 774 | 01:01:52 INFO:root:Checking BLEU for epoch 49 batch 400 775 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
776 | 01:03:55 INFO:root:b'1gram=74.00% 2gram=49.88% 3gram=37.87% 4gram=28.41% \r\nBP = 0.9800\r\nBLEU = 0.4375\r\n' 777 | 01:03:55 INFO:root:BLEU: 0.4375 @ epoch 49 batch 400 778 | 01:03:55 INFO:root:Current BLEU: 0.4375 > prev best 0.4366 in epoch 36 779 | 01:03:55 INFO:root:Saving... 780 | 01:03:55 INFO:root:Saved checkpoint to "best_bleu-0050.params" 781 | 01:04:32 INFO:root:Epoch[49] Batch [500] Speed: 79.91 samples/sec Train-Perplexity=2.990173 782 | 01:05:11 INFO:root:Epoch[49] Batch [600] Speed: 324.86 samples/sec Train-Perplexity=3.145794 783 | 01:05:25 INFO:root:Epoch[49] Resetting Data Iterator 784 | 01:05:25 INFO:root:Epoch[49] Time cost=365.229 785 | 01:05:25 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0050.params" 786 | 01:06:01 INFO:root:Epoch[50] Batch [100] Speed: 357.71 samples/sec Train-Perplexity=3.020203 787 | 01:06:40 INFO:root:Epoch[50] Batch [200] Speed: 335.67 samples/sec Train-Perplexity=3.019267 788 | 01:07:18 INFO:root:Epoch[50] Batch [300] Speed: 336.66 samples/sec Train-Perplexity=2.957691 789 | 01:07:58 INFO:root:Epoch[50] Batch [400] Speed: 315.11 samples/sec Train-Perplexity=3.374037 790 | 01:07:58 INFO:root:Checking BLEU for epoch 50 batch 400 791 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 792 | 01:10:01 INFO:root:b'1gram=73.25% 2gram=48.85% 3gram=36.41% 4gram=26.88% \r\nBP = 0.9887\r\nBLEU = 0.4277\r\n' 793 | 01:10:01 INFO:root:BLEU: 0.4277 @ epoch 50 batch 400 794 | 01:10:37 INFO:root:Epoch[50] Batch [500] Speed: 80.42 samples/sec Train-Perplexity=2.954035 795 | 01:11:17 INFO:root:Epoch[50] Batch [600] Speed: 324.71 samples/sec Train-Perplexity=3.105496 796 | 01:11:30 INFO:root:Epoch[50] Resetting Data Iterator 797 | 01:11:30 INFO:root:Epoch[50] Time cost=364.881 798 | 01:11:31 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0051.params" 799 | 01:12:07 INFO:root:Epoch[51] Batch [100] Speed: 358.36 samples/sec Train-Perplexity=2.987536 800 | 01:12:45 INFO:root:Epoch[51] Batch [200] Speed: 335.53 samples/sec Train-Perplexity=2.985420 801 | 01:13:23 INFO:root:Epoch[51] Batch [300] Speed: 336.94 samples/sec Train-Perplexity=2.933420 802 | 01:14:03 INFO:root:Epoch[51] Batch [400] Speed: 319.95 samples/sec Train-Perplexity=3.334243 803 | 01:14:03 INFO:root:Checking BLEU for epoch 51 batch 400 804 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 
805 | 01:16:08 INFO:root:b'1gram=72.88% 2gram=49.27% 3gram=36.94% 4gram=27.25% \r\nBP = 1.0000\r\nBLEU = 0.4361\r\n' 806 | 01:16:08 INFO:root:BLEU: 0.4361 @ epoch 51 batch 400 807 | 01:16:45 INFO:root:Epoch[51] Batch [500] Speed: 79.19 samples/sec Train-Perplexity=2.927022 808 | 01:17:24 INFO:root:Epoch[51] Batch [600] Speed: 324.95 samples/sec Train-Perplexity=3.073325 809 | 01:17:38 INFO:root:Epoch[51] Resetting Data Iterator 810 | 01:17:38 INFO:root:Epoch[51] Time cost=366.619 811 | 01:17:38 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0052.params" 812 | 01:18:14 INFO:root:Epoch[52] Batch [100] Speed: 358.80 samples/sec Train-Perplexity=2.942163 813 | 01:18:52 INFO:root:Epoch[52] Batch [200] Speed: 335.97 samples/sec Train-Perplexity=2.951506 814 | 01:19:30 INFO:root:Epoch[52] Batch [300] Speed: 336.97 samples/sec Train-Perplexity=2.895919 815 | 01:20:10 INFO:root:Epoch[52] Batch [400] Speed: 319.99 samples/sec Train-Perplexity=3.298124 816 | 01:20:10 INFO:root:Checking BLEU for epoch 52 batch 400 817 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 818 | 01:22:16 INFO:root:b'1gram=71.71% 2gram=47.97% 3gram=35.51% 4gram=26.02% \r\nBP = 1.0000\r\nBLEU = 0.4222\r\n' 819 | 01:22:16 INFO:root:BLEU: 0.4222 @ epoch 52 batch 400 820 | 01:22:52 INFO:root:Epoch[52] Batch [500] Speed: 79.03 samples/sec Train-Perplexity=2.905427 821 | 01:23:32 INFO:root:Epoch[52] Batch [600] Speed: 325.05 samples/sec Train-Perplexity=3.039335 822 | 01:23:46 INFO:root:Epoch[52] Resetting Data Iterator 823 | 01:23:46 INFO:root:Epoch[52] Time cost=367.316 824 | 01:23:46 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0053.params" 825 | 01:24:22 INFO:root:Epoch[53] Batch [100] Speed: 358.35 samples/sec Train-Perplexity=2.912539 826 | 01:25:00 INFO:root:Epoch[53] Batch [200] Speed: 335.07 samples/sec Train-Perplexity=2.912047 827 | 01:25:38 INFO:root:Epoch[53] Batch [300] Speed: 336.40 samples/sec Train-Perplexity=2.884444 828 | 01:26:18 INFO:root:Epoch[53] Batch [400] Speed: 319.80 samples/sec Train-Perplexity=3.274917 829 | 01:26:18 INFO:root:Checking BLEU for epoch 53 batch 400 830 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 831 | 01:28:23 INFO:root:b'1gram=72.65% 2gram=49.63% 3gram=37.62% 4gram=28.08% \r\nBP = 1.0000\r\nBLEU = 0.4418\r\n' 832 | 01:28:23 INFO:root:BLEU: 0.4418 @ epoch 53 batch 400 833 | 01:28:23 INFO:root:Current BLEU: 0.4418 > prev best 0.4375 in epoch 49 834 | 01:28:23 INFO:root:Saving... 
835 | 01:28:23 INFO:root:Saved checkpoint to "best_bleu-0054.params" 836 | 01:29:00 INFO:root:Epoch[53] Batch [500] Speed: 79.18 samples/sec Train-Perplexity=2.870809 837 | 01:29:40 INFO:root:Epoch[53] Batch [600] Speed: 324.26 samples/sec Train-Perplexity=3.016911 838 | 01:29:53 INFO:root:Epoch[53] Resetting Data Iterator 839 | 01:29:53 INFO:root:Epoch[53] Time cost=366.881 840 | 01:29:54 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0054.params" 841 | 01:30:30 INFO:root:Epoch[54] Batch [100] Speed: 358.38 samples/sec Train-Perplexity=2.883674 842 | 01:31:08 INFO:root:Epoch[54] Batch [200] Speed: 335.57 samples/sec Train-Perplexity=2.884527 843 | 01:31:46 INFO:root:Epoch[54] Batch [300] Speed: 336.49 samples/sec Train-Perplexity=2.842884 844 | 01:32:26 INFO:root:Epoch[54] Batch [400] Speed: 320.14 samples/sec Train-Perplexity=3.233647 845 | 01:32:26 INFO:root:Checking BLEU for epoch 54 batch 400 846 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [241 ms]. 847 | 01:34:29 INFO:root:b'1gram=72.82% 2gram=48.45% 3gram=36.25% 4gram=26.67% \r\nBP = 0.9922\r\nBLEU = 0.4264\r\n' 848 | 01:34:30 INFO:root:BLEU: 0.4264 @ epoch 54 batch 400 849 | 01:35:06 INFO:root:Epoch[54] Batch [500] Speed: 79.87 samples/sec Train-Perplexity=2.852245 850 | 01:35:45 INFO:root:Epoch[54] Batch [600] Speed: 325.38 samples/sec Train-Perplexity=2.977500 851 | 01:35:59 INFO:root:Epoch[54] Resetting Data Iterator 852 | 01:35:59 INFO:root:Epoch[54] Time cost=365.198 853 | 01:36:00 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0055.params" 854 | 01:36:35 INFO:root:Epoch[55] Batch [100] Speed: 358.32 samples/sec Train-Perplexity=2.842367 855 | 01:37:14 INFO:root:Epoch[55] Batch [200] Speed: 336.32 samples/sec Train-Perplexity=2.863954 856 | 01:37:52 INFO:root:Epoch[55] Batch [300] Speed: 336.51 samples/sec Train-Perplexity=2.809389 857 | 01:38:32 INFO:root:Epoch[55] Batch [400] Speed: 320.76 samples/sec Train-Perplexity=3.203576 858 | 01:38:32 INFO:root:Checking BLEU for epoch 55 batch 400 859 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [243 ms]. 860 | 01:40:37 INFO:root:b'1gram=71.48% 2gram=48.49% 3gram=36.17% 4gram=26.42% \r\nBP = 1.0000\r\nBLEU = 0.4266\r\n' 861 | 01:40:37 INFO:root:BLEU: 0.4266 @ epoch 55 batch 400 862 | 01:41:13 INFO:root:Epoch[55] Batch [500] Speed: 79.12 samples/sec Train-Perplexity=2.816309 863 | 01:41:53 INFO:root:Epoch[55] Batch [600] Speed: 325.54 samples/sec Train-Perplexity=2.954175 864 | 01:42:06 INFO:root:Epoch[55] Resetting Data Iterator 865 | 01:42:06 INFO:root:Epoch[55] Time cost=366.534 866 | 01:42:07 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0056.params" 867 | 01:42:43 INFO:root:Epoch[56] Batch [100] Speed: 359.04 samples/sec Train-Perplexity=2.832930 868 | 01:43:21 INFO:root:Epoch[56] Batch [200] Speed: 335.64 samples/sec Train-Perplexity=2.840716 869 | 01:43:59 INFO:root:Epoch[56] Batch [300] Speed: 336.90 samples/sec Train-Perplexity=2.801349 870 | 01:44:39 INFO:root:Epoch[56] Batch [400] Speed: 320.40 samples/sec Train-Perplexity=3.174728 871 | 01:44:39 INFO:root:Checking BLEU for epoch 56 batch 400 872 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 
873 | 01:46:44 INFO:root:b'1gram=72.53% 2gram=48.84% 3gram=36.61% 4gram=26.60% \r\nBP = 1.0000\r\nBLEU = 0.4310\r\n' 874 | 01:46:44 INFO:root:BLEU: 0.431 @ epoch 56 batch 400 875 | 01:47:21 INFO:root:Epoch[56] Batch [500] Speed: 78.99 samples/sec Train-Perplexity=2.816300 876 | 01:48:00 INFO:root:Epoch[56] Batch [600] Speed: 325.36 samples/sec Train-Perplexity=2.920598 877 | 01:48:14 INFO:root:Epoch[56] Resetting Data Iterator 878 | 01:48:14 INFO:root:Epoch[56] Time cost=366.832 879 | 01:48:14 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0057.params" 880 | 01:48:50 INFO:root:Epoch[57] Batch [100] Speed: 358.55 samples/sec Train-Perplexity=2.874106 881 | 01:49:28 INFO:root:Epoch[57] Batch [200] Speed: 335.34 samples/sec Train-Perplexity=2.828585 882 | 01:50:06 INFO:root:Epoch[57] Batch [300] Speed: 337.02 samples/sec Train-Perplexity=2.769679 883 | 01:50:46 INFO:root:Epoch[57] Batch [400] Speed: 320.39 samples/sec Train-Perplexity=3.160472 884 | 01:50:46 INFO:root:Checking BLEU for epoch 57 batch 400 885 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 886 | 01:52:51 INFO:root:b'1gram=72.43% 2gram=49.31% 3gram=36.48% 4gram=26.66% \r\nBP = 1.0000\r\nBLEU = 0.4317\r\n' 887 | 01:52:51 INFO:root:BLEU: 0.4317 @ epoch 57 batch 400 888 | 01:53:27 INFO:root:Epoch[57] Batch [500] Speed: 79.45 samples/sec Train-Perplexity=2.769165 889 | 01:54:07 INFO:root:Epoch[57] Batch [600] Speed: 325.05 samples/sec Train-Perplexity=2.893226 890 | 01:54:20 INFO:root:Epoch[57] Resetting Data Iterator 891 | 01:54:20 INFO:root:Epoch[57] Time cost=366.027 892 | 01:54:21 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0058.params" 893 | 01:54:57 INFO:root:Epoch[58] Batch [100] Speed: 358.92 samples/sec Train-Perplexity=2.794343 894 | 01:55:35 INFO:root:Epoch[58] Batch [200] Speed: 336.01 samples/sec Train-Perplexity=2.797318 895 | 01:56:14 INFO:root:Epoch[58] Batch [300] Speed: 330.93 samples/sec Train-Perplexity=2.754186 896 | 01:56:53 INFO:root:Epoch[58] Batch [400] Speed: 321.14 samples/sec Train-Perplexity=3.105465 897 | 01:56:53 INFO:root:Checking BLEU for epoch 58 batch 400 898 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [242 ms]. 899 | 01:58:57 INFO:root:b'1gram=72.48% 2gram=48.46% 3gram=36.07% 4gram=26.21% \r\nBP = 1.0000\r\nBLEU = 0.4269\r\n' 900 | 01:58:57 INFO:root:BLEU: 0.4269 @ epoch 58 batch 400 901 | 01:59:34 INFO:root:Epoch[58] Batch [500] Speed: 79.88 samples/sec Train-Perplexity=2.762731 902 | 02:00:13 INFO:root:Epoch[58] Batch [600] Speed: 326.43 samples/sec Train-Perplexity=2.879713 903 | 02:00:26 INFO:root:Epoch[58] Resetting Data Iterator 904 | 02:00:26 INFO:root:Epoch[58] Time cost=365.398 905 | 02:00:27 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0059.params" 906 | 02:01:03 INFO:root:Epoch[59] Batch [100] Speed: 359.87 samples/sec Train-Perplexity=2.757996 907 | 02:01:41 INFO:root:Epoch[59] Batch [200] Speed: 337.06 samples/sec Train-Perplexity=2.766533 908 | 02:02:19 INFO:root:Epoch[59] Batch [300] Speed: 337.16 samples/sec Train-Perplexity=2.724831 909 | 02:02:59 INFO:root:Epoch[59] Batch [400] Speed: 320.85 samples/sec Train-Perplexity=3.093099 910 | 02:02:59 INFO:root:Checking BLEU for epoch 59 batch 400 911 | Loading reference data D:\users\home\Projects\mxnmt\IWSLT\dev\IWSLT.dev.txt...446 reference sentences read [240 ms]. 
912 | 02:05:04 INFO:root:b'1gram=71.82% 2gram=48.89% 3gram=36.85% 4gram=27.27% \r\nBP = 1.0000\r\nBLEU = 0.4334\r\n' 913 | 02:05:04 INFO:root:BLEU: 0.4334 @ epoch 59 batch 400 914 | 02:05:41 INFO:root:Epoch[59] Batch [500] Speed: 79.01 samples/sec Train-Perplexity=2.736961 915 | 02:06:20 INFO:root:Epoch[59] Batch [600] Speed: 325.19 samples/sec Train-Perplexity=2.838531 916 | 02:06:34 INFO:root:Epoch[59] Resetting Data Iterator 917 | 02:06:34 INFO:root:Epoch[59] Time cost=366.495 918 | 02:06:34 INFO:root:Saved checkpoint to "D:\users\home\Projects\mxnmt\IWSLT\model\zh-en-iwslt-0060.params" 919 | 920 | Process finished with exit code 0 921 | --------------------------------------------------------------------------------