├── -1. Tensorflow RNN Basic of Basic
    ├── README.MD
    └── dynamic-rnn.png
├── 0. Basic
    ├── README.md
    ├── RNN-TF-dynamic-decode.py
    └── dynamic-rnn-decode.png
├── 1. RNNWrapper
    ├── cell-cell.png
    ├── dynamic-rnn-decode2.png
    ├── math-basicRNN.png
    ├── readme.md
    └── wrapper.png
├── 2. User Defined Helper
    ├── README.md
    └── tacotron-decoder.png
├── 3. User Defined Decoder
    ├── BasicDecoder.png
    ├── BasicDecoder2.png
    └── README.md
├── 4. Attention with Tensorflow
    ├── Attention.png
    ├── AttentionWrapper-API.png
    ├── Bahdanau-Luong-Attention.png
    ├── README.md
    ├── answer.png
    ├── attentioin-dynamic-rnn-decode.png
    ├── attention-layer-size.png
    └── attention-shape.png
├── README.md
├── TF-RNN.png
├── a.bat
└── b.bat


/-1. Tensorflow RNN Basic of Basic/README.MD:
--------------------------------------------------------------------------------
  1 | # Tensorflow RNN Basic of Basic
  2 | <p align="center"><img width="700" src="dynamic-rnn.png" />  </p>
  3 | 
  4 | 
  5 | * The figure above shows how the APIs needed to build a basic RNN model in TensorFlow relate to one another.
  6 | 
  7 | * The example below uses a vocabulary of 5 tokens (3 words + SOS + EOS), embeds them, and passes them through a BasicRNNCell.
  8 | * After the BasicRNNCell, one more fully connected layer (OutputProjectionWrapper) produces the final output.
  9 | * In short, we build a [minimal character-level RNN language model](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) over 5 words.
 10 | * As in the figure above, create four objects {cell, inputs, seq_length, initial_state} and pass them to dynamic_rnn.
 11 | * Let's first look at the whole example code, then go through it piece by piece.
 12 | 
 13 | ```python
 14 | # -*- coding: utf-8 -*-
 15 | import numpy as np
 16 | import tensorflow as tf
 17 | tf.reset_default_graph()
 18 | 
 19 | def dynamic_rnn_test():
 20 | 
 21 |     vocab_size = 5
 22 |     SOS_token = 0
 23 |     EOS_token = 4
 24 |     
 25 |     x_data = np.array([[SOS_token, 3, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 26 |     y_data = np.array([[1,2,0,3,2,EOS_token],[3,2,3,3,1,EOS_token],[3,1,1,2,0,EOS_token]],dtype=np.int32)
 27 |     Y = tf.convert_to_tensor(y_data)
 28 |     print("data shape: ", x_data.shape)
 29 |     sess = tf.InteractiveSession()
 30 |     
 31 |     output_dim = vocab_size
 32 |     batch_size = len(x_data)
 33 |     hidden_dim =6
 34 |     num_layers = 2
 35 |     seq_length = x_data.shape[1]
 36 |     embedding_dim = 8
 37 | 
 38 |     init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1)
 39 |     
 40 |     with tf.variable_scope('test') as scope:
 41 |         cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
 42 |         cell = tf.contrib.rnn.OutputProjectionWrapper(cell,output_dim)
 43 |     
 44 |         embedding = tf.get_variable("embedding", initializer=init.astype(np.float32),dtype = tf.float32)
 45 |         inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
 46 |     
 47 |         initial_state = cell.zero_state(batch_size, tf.float32) #(batch_size x hidden_dim) 
 48 |         outputs, last_state = tf.nn.dynamic_rnn(cell,inputs,sequence_length=[seq_length]*batch_size,initial_state=initial_state)    
 49 | 
 50 |         weights = tf.ones(shape=[batch_size,seq_length])
 51 |         loss =   tf.contrib.seq2seq.sequence_loss(logits=outputs, targets=Y, weights=weights)
 52 |     
 53 |         sess.run(tf.global_variables_initializer())
 54 |         print("initial_state: ", sess.run(initial_state))
 55 |         print("\n\noutputs: ",outputs)
 56 |         o = sess.run(outputs)  #batch_size, seq_length, outputs
 57 |         o2 = sess.run(tf.argmax(outputs,axis=-1))
 58 |         print("\n",o,o2) #batch_size, seq_length, outputs
 59 |     
 60 |         print("\n\nlast_state: ",last_state)
 61 |         print(sess.run(last_state)) # batch_size, hidden_dim
 62 |       
 63 |         p = sess.run(tf.nn.softmax(outputs)).reshape(-1,output_dim)
 64 |         print("loss: {:20.6f}".format(sess.run(loss)))
 65 |         print("manual cal. loss: {:0.6f} ".format(np.average(-np.log(p[np.arange(y_data.size),y_data.flatten()]))) )
 66 | 
 67 | if __name__ == '__main__':
 68 |     dynamic_rnn_test()
 69 |     print('Done')
 70 | ```
 71 | 
 72 | # [Code explanation]
 73 | ```python
 74 |     vocab_size = 5
 75 |     SOS_token = 0
 76 |     EOS_token = 4
 77 |     
 78 |     x_data = np.array([[SOS_token, 3, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 79 |     y_data = np.array([[1,2,0,3,2,EOS_token],[3,2,3,3,1,EOS_token],[3,1,1,2,0,EOS_token]],dtype=np.int32)
 80 |     Y = tf.convert_to_tensor(y_data)
 81 | ```
 82 | 
 83 | * Tokens 0 (SOS), 1, 2, 3, and 4 (EOS) give 5 words in total; the sample data has batch_size=3.
 84 | * Next, let's create the cell, the core of the RNN model.
 85 | 
 86 | 
 87 | ```python
 88 |         cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
 89 |         cell = tf.contrib.rnn.OutputProjectionWrapper(cell,output_dim)
 90 | ```
 91 | 
 92 | * We use BasicRNNCell here, which only needs hidden_dim. The output of BasicRNNCell therefore has dimension hidden_dim, so to produce the final result we add a fully connected layer that maps it to dimension vocab_size (= output_dim).
 93 | * In a TensorFlow RNN model, tf.contrib.rnn.OutputProjectionWrapper provides this fully connected layer.
 94 | * BasicLSTMCell, GRUCell, and other cells can be used in place of BasicRNNCell.
 95 | 
 96 | ---
 97 | * Now for the inputs: create the embedding matrix, then use embedding_lookup to fetch the embedding vector of each word.
 98 | 
 99 | ```python
100 |         embedding = tf.get_variable("embedding", initializer=init.astype(np.float32),dtype = tf.float32)
101 |         inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
102 | ```
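
The lookup described above can be mimicked in plain NumPy. This is only a sketch of the idea, not TensorFlow's implementation: it assumes a single embedding table and reuses the deterministic `arange` initializer from the example, so each looked-up row is easy to recognize.

```python
import numpy as np

vocab_size, embedding_dim = 5, 8

# Same deterministic table as `init` in the example: row i holds 8*i ... 8*i+7.
embedding = np.arange(vocab_size * embedding_dim, dtype=np.float32).reshape(vocab_size, -1)

x_data = np.array([[0, 3, 1, 2, 3, 2],
                   [0, 3, 1, 2, 3, 1],
                   [0, 1, 3, 2, 2, 1]], dtype=np.int32)

# Integer-array indexing picks one table row per token id,
# which mirrors what embedding_lookup does for a single table.
inputs = embedding[x_data]

print(inputs.shape)   # (3, 6, 8) = batch_size x seq_length x embedding_dim
print(inputs[0, 1])   # token id 3 -> [24. 25. 26. 27. 28. 29. 30. 31.]
```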
103 | ---
104 | * Next, create initial_state. The following code creates an all-zero initial_state.
105 | ```python
106 |         initial_state = cell.zero_state(batch_size, tf.float32) #(batch_size x hidden_dim) 
107 | ```
108 | * If needed, initial_state can also be built from values other than 0.
109 | 
110 | ---
111 | * The rest of the code exists for testing and is easy to follow.
112 | 
113 | ---
114 | * tf.nn.dynamic_rnn works in a teacher-forcing manner. Using it for inference would require heavy changes to the model, so use tf.contrib.seq2seq.dynamic_decode instead.
115 | 
116 | 


--------------------------------------------------------------------------------
/-1. Tensorflow RNN Basic of Basic/dynamic-rnn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/-1. Tensorflow RNN Basic of Basic/dynamic-rnn.png


--------------------------------------------------------------------------------
/0. Basic/README.md:
--------------------------------------------------------------------------------
  1 | # How to use tf.contrib.seq2seq.dynamic_decode in TensorFlow
  2 | The figure below compares the input structure of tf.nn.dynamic_rnn with that of tf.contrib.seq2seq.dynamic_decode.
  3 | 
  4 | ![decode](./dynamic-rnn-decode.png)
  5 |  * TensorFlow provides dynamic_rnn and dynamic_decode for working with seq2seq (encoder-decoder) models.
  6 |  * dynamic_rnn has the simpler structure; this page explains dynamic_decode.
  7 |  * The cell can be a BasicRNNCell, BasicLSTMCell, or GRUCell, or a MultiRNNCell that stacks them.
  8 |  * initial_state is the initial value of the hidden state: it can be zero_state, the encoder's last hidden state, or, in a captioning model, an image feature.
  9 |  * Use TrainingHelper during training and GreedyEmbeddingHelper during inference.
 10 |  * GreedyEmbeddingHelper is the inference helper: it feeds the argmax of the previous step's output as the next step's input.
 11 |  
 12 | ### [Code explanation]
 13 |  * The full code is in [RNN-TF-dynamic-decode.py](https://github.com/hccho2/RNN-Tutorial/blob/master/RNN-TF-dynamic-decode.py) and also appears [further down](#full-code) on this page.
 14 |  * Let's walk through the code from the top.
 15 | ```python
 16 | vocab_size = 5
 17 | SOS_token = 0
 18 | EOS_token = 4
 19 | 
 20 | x_data = np.array([[SOS_token, 2, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 21 | y_data = np.array([[2, 1, 2, 3, 2,EOS_token],[3, 1, 2, 3, 1,EOS_token],[ 1, 3, 2, 2, 1,EOS_token]],dtype=np.int32)
 22 | ```
 23 |  * To keep the data simple, the vocabulary size is set to vocab_size = 5. As the given x_data and y_data show, x_data starts with SOS_token and y_data ends with EOS_token.
 24 |  * seq_length is 6. In most real cases the sequences in a batch differ in length; then a Null token is introduced, a maximum length (max_sequence) is fixed, and the tail of each shorter sequence is filled with Null. No Null is used here.
 25 |  * With real data, preprocessing such as reading the data file, mapping words to numbers, and padding with Null can take a lot of time.
 26 |  * Normally a placeholder, TensorFlow's data-input op, would be used, but it is skipped here for simplicity.
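
The Null padding mentioned above is not part of the original example, so the following is only a sketch of what it could look like. The names `NULL_token` and `pad_batch` are hypothetical, and the id 5 is an assumption (it would require enlarging the vocabulary by one entry).

```python
import numpy as np

EOS_token = 4
NULL_token = 5  # hypothetical padding id; the original example defines no such token

def pad_batch(sequences, pad_id):
    """Right-pad variable-length id lists to the longest length in the batch."""
    lengths = np.array([len(s) for s in sequences], dtype=np.int32)
    max_len = int(lengths.max())
    padded = np.full((len(sequences), max_len), pad_id, dtype=np.int32)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded, lengths

targets = [[2, 1, 2, 3, 2, EOS_token],
           [3, 1, EOS_token],
           [1, 3, 2, 2, EOS_token]]
y_padded, seq_lengths = pad_batch(targets, NULL_token)

print(y_padded)     # shorter rows are filled with 5 on the right
print(seq_lengths)  # [6 3 5]
```

The per-row lengths returned here are the kind of value that would replace `[seq_length]*batch_size` once the sequences in a batch no longer share one length.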
 27 | 
 28 | 
 29 | ```python
 30 | output_dim = vocab_size
 31 | batch_size = len(x_data)
 32 | hidden_dim = 6
 33 | num_layers = 2
 34 | seq_length = x_data.shape[1]
 35 | embedding_dim = 8
 36 | 
 37 | state_tuple_mode = True
 38 | init_state_flag = 0
 39 | train_mode = True
 40 | ```
 41 | * output_dim is the output dimension of the FC layer attached to the RNN cell's output. It usually equals the number of words, hence output_dim = vocab_size.
 42 | * batch_size needs no further explanation.
 43 | * hidden_dim is, as the name says, the hidden layer size of the RNN cell.
 44 | * num_layers sets how many RNN layers to stack in a multi-layer RNN model; that is, num_layers LSTM cells are stacked.
 45 | * embedding_dim sets the dimension of the vector each word is mapped to.
 46 | * The remaining three variables (state_tuple_mode, init_state_flag, train_mode) only set options in the code and are not important; they are explained as they come up.
 47 | 
 48 | ```python
 49 | cells = []
 50 | for _ in range(num_layers):
 51 |     cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim,state_is_tuple=state_tuple_mode)
 52 |     cells.append(cell)
 53 | cell = tf.contrib.rnn.MultiRNNCell(cells)    
 54 | ```
 55 | * The RNN cells must be stacked num_layers deep, so a for loop creates as many BasicLSTMCells as needed.
 56 | * The state_is_tuple option of BasicLSTMCell decides whether c_state and h_state (also called m_state) are kept as a tuple or concatenated into a single tensor; it does not affect the model structure.
 57 | * TensorFlow recommends keeping them as a tuple.
 58 | 
 59 | ```python
 60 | init = tf.contrib.layers.xavier_initializer()
 61 | embedding = tf.get_variable("embedding",shape=[vocab_size,embedding_dim], initializer=init,dtype = tf.float32)
 62 | inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
 63 | ```
 64 | 
 65 | * Create the variable that converts each word into an embedding vector. It needs vocab_size x embedding_dim parameters.
 66 | * Once the embedding variable exists, embedding_lookup converts x_data into embedding vectors.
 67 | * As a side note, the embedding can be initialized to 0, 1, 2, ... as below to check how the conversion to embedding vectors works.
 68 | ```python
 69 | init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1).astype(np.float32) # in this case, do not pass shape to the get_variable call for the embedding below.
 70 | embedding = tf.get_variable("embedding", initializer=init,dtype = tf.float32)
 71 | 
 72 | ```
 73 | 
 74 | ---
 75 | * Now look at the code that sets the initial value of the RNN cell's hidden state.
 76 | ```python
 77 | if init_state_flag==0:
 78 |     initial_state = cell.zero_state(batch_size, tf.float32) #(batch_size x hidden_dim) x number of layers
 79 | else:
 80 |     h0 = tf.random_normal([batch_size,hidden_dim]) # in practice, a suitable value must be supplied from outside.
 81 |     if state_tuple_mode: 
 82 |         initial_state=(tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), h0),) + (tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), tf.zeros_like(h0)),)*(num_layers-1)          
 83 |     else:
 84 |         initial_state = (tf.concat((tf.zeros_like(h0),h0), axis=1),) + (tf.concat((tf.zeros_like(h0),tf.zeros_like(h0)), axis=1),) * (num_layers-1)
 85 | ```
 86 | * The initial hidden state is sometimes set to zero, as in cell.zero_state(batch_size, tf.float32).
 87 | * In an encoder-decoder model, the decoder's initial hidden state is sometimes taken from the encoder's last hidden state.
 88 | * In a model that generates captions for images, an abstracted image feature can serve as the initial value.
 89 | * In a simple attention model, the attention vector is sometimes passed in as the initial hidden state.
 90 | * In our code, init_state_flag==0 initializes the state to zero.
 91 | * If init_state_flag is not 0, the state should be initialized with a value received from outside; for the sake of the example, h0 = tf.random_normal([batch_size,hidden_dim]) is used.
 92 | * An LSTM cell has both c_state and h_state, so each must be given a value.
 93 | * Because we stacked multiple LSTM cells, only the bottom layer gets the given value, and the remaining layers are initialized to 0.
 94 | * Even in the bottom layer, c_state is initialized to 0 and h_state to h0.
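
The nested structure described in the bullets above can be inspected without TensorFlow. In this sketch a `namedtuple` called `LSTMState` (a hypothetical stand-in for `tf.contrib.rnn.LSTMStateTuple`, which is also a `(c, h)` named tuple) holds NumPy arrays, and `h0` is a random placeholder for a value that would come from outside in practice.

```python
from collections import namedtuple
import numpy as np

# Stand-in for LSTMStateTuple: a (c, h) pair per layer.
LSTMState = namedtuple("LSTMState", ["c", "h"])

batch_size, hidden_dim, num_layers = 3, 6, 2

h0 = np.random.randn(batch_size, hidden_dim).astype(np.float32)  # placeholder for an external value
zeros = np.zeros_like(h0)

# Tuple mode. Bottom layer: c = 0, h = h0. Every layer above it: c = 0, h = 0.
initial_state = (LSTMState(c=zeros, h=h0),) + (LSTMState(c=zeros, h=zeros),) * (num_layers - 1)

# Non-tuple mode keeps c and h side by side in one array per layer.
flat_bottom = np.concatenate([zeros, h0], axis=1)

for layer, state in enumerate(initial_state):
    print(layer, state.c.shape, state.h.shape, bool(state.h.any()))
print(flat_bottom.shape)  # (3, 12) = batch_size x (2 * hidden_dim)
```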
 95 | ---
 96 | * Now look at the helper.
 97 | ```python
 98 | if train_mode:
 99 |     helper = tf.contrib.seq2seq.TrainingHelper(inputs, np.array([seq_length]*batch_size))
100 | else:
101 |     helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding, start_tokens=tf.tile([SOS_token], [batch_size]), end_token=EOS_token)
102 | ```
103 | * The helper feeds input data to the cell: TrainingHelper is used in training mode and GreedyEmbeddingHelper in inference mode.
104 | * The second argument of TrainingHelper is the sequence length of each item in the batch. Since every item in the batch has been made the same length (6 in this example, padding with Null where necessary), [seq_length]*batch_size is enough.
105 | * If Null was appended to equalize lengths, give the Null positions a weight of 0 when computing the loss so that they are ignored.
106 | * GreedyEmbeddingHelper feeds the argmax of the previous step's output as the next step's input.
107 | * GreedyEmbeddingHelper takes batch_size copies of SOS_token (start_tokens) and the EOS_token (end_token) as parameters. The RNN keeps running until EOS_token is generated; if it is never generated, decoding can fall into an infinite loop.
108 | * To avoid the infinite loop, it is a good idea to set maximum_iterations in the tf.contrib.seq2seq.dynamic_decode call below.
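
The greedy feedback loop and the role of maximum_iterations can be sketched in NumPy. This is an illustration of the idea only, not the helper's actual implementation: `greedy_decode` and `table_model` are hypothetical names, and the "model" is a fixed lookup table that maps each token to one-hot logits for the next token.

```python
import numpy as np

SOS_token, EOS_token, vocab_size = 0, 4, 5

def greedy_decode(step_logits, batch_size, maximum_iterations):
    """Feed each step's argmax back in as the next input until EOS or the step cap."""
    tokens = np.full(batch_size, SOS_token, dtype=np.int64)
    finished = np.zeros(batch_size, dtype=bool)
    outputs = []
    for _ in range(maximum_iterations):
        logits = step_logits(tokens)                          # (batch_size, vocab_size)
        next_tokens = logits.argmax(axis=-1)
        tokens = np.where(finished, EOS_token, next_tokens)   # finished rows stay at EOS
        outputs.append(tokens)
        finished = finished | (tokens == EOS_token)
        if finished.all():
            break
    return np.stack(outputs, axis=1)                          # (batch_size, steps)

def table_model(next_token):
    """Toy model: row t of the table is a one-hot logit vector for next_token[t]."""
    table = np.eye(vocab_size)[next_token]
    return lambda tokens: table[tokens]

# 0 -> 1 -> 2 -> 3 -> 4 reaches EOS, so decoding stops after 4 steps.
reaches_eos = greedy_decode(table_model([1, 2, 3, 4, 4]), batch_size=3, maximum_iterations=10)
# 0 -> 1 -> 2 -> 1 -> 2 ... never emits EOS; only the step cap ends the loop.
never_eos = greedy_decode(table_model([1, 2, 1, 0, 0]), batch_size=3, maximum_iterations=10)

print(reaches_eos[0])   # [1 2 3 4]
print(never_eos.shape)  # (3, 10)
```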
109 | ---
110 | * Finally, look at the last part of the model: BasicDecoder and dynamic_decode.
111 | ```python
112 | output_layer = Dense(output_dim, name='output_projection')
113 | decoder = tf.contrib.seq2seq.BasicDecoder(cell=cell,helper=helper,initial_state=initial_state,output_layer=output_layer)    
114 | outputs, last_state, last_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder=decoder,output_time_major=False,impute_finished=True,maximum_iterations=10)
115 | 
116 | ```
117 | * output_layer specifies the fully connected layer that receives the RNN cell's output. Only the output dimension needs to be set.
118 | * The cell, helper, initial_state, and output_layer built so far are passed to BasicDecoder to create the decoder, and the decoder is then passed to dynamic_decode.
119 | ---
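120 | * Anyone familiar with neural network models can follow the rest of the code without difficulty, so further explanation is omitted.
121 | * The code also checks that the loss computed by tf.contrib.seq2seq.sequence_loss matches a cross-entropy loss computed by hand.
122 | * The targets of tf.contrib.seq2seq.sequence_loss are passed as integer ids, not as one-hot vectors.
123 | * Code that prints various other values worth checking is also included.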
124 | ---
125 | [Additional notes]
126 | * On the part that computes the loss:
127 | ```python
128 | weights = tf.ones(shape=[batch_size,seq_length])
129 | loss =   tf.contrib.seq2seq.sequence_loss(logits=outputs.rnn_output, targets=Y, weights=weights)
130 | ```
131 | * We did not use Null, so every position of every sequence in the batch gets the same weight of 1.
132 | * If Null were used, build the weights so that the loss at the Null positions is ignored.
133 | ```python
134 | weights = tf.to_float(tf.not_equal(y_data, Null))
135 | ```
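
To make the effect of the weights concrete, here is a NumPy sketch of a weighted sequence loss. It assumes the loss is the weighted mean of per-token cross entropy, which agrees with the manual check in the full code when all weights are 1. `masked_sequence_loss` and `NULL_token` are hypothetical names, and the padding id 5 enlarges the vocabulary to 6 entries in this sketch.

```python
import numpy as np

def masked_sequence_loss(logits, targets, weights):
    """Weighted mean of per-token cross entropy; positions with weight 0 are ignored."""
    shifted = logits - logits.max(axis=-1, keepdims=True)      # stabilize the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    batch_idx, time_idx = np.indices(targets.shape)
    token_nll = -log_probs[batch_idx, time_idx, targets]       # (batch, time)
    return float((token_nll * weights).sum() / weights.sum())

NULL_token = 5  # hypothetical padding id
targets = np.array([[2, 1, 4, NULL_token],
                    [3, 3, 1, 4]])
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 6))
weights = (targets != NULL_token).astype(np.float64)           # 0 at the padded position

loss = masked_sequence_loss(logits, targets, weights)

# Changing the logits at the padded position must not change the loss.
perturbed = logits.copy()
perturbed[0, 3] += 10.0
loss_perturbed = masked_sequence_loss(perturbed, targets, weights)
print(loss, loss_perturbed)
```

Because the denominator is the sum of the weights rather than the number of positions, padded positions neither add to the loss nor dilute the average.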
136 | 
137 | 
138 | 
139 | 
140 | 
141 | 
142 | 
143 | 
144 | ---
145 | ### Full CODE
146 | 
147 | ```python
148 | # -*- coding: utf-8 -*-
149 | 
150 | import numpy as np
151 | import tensorflow as tf
152 | 
153 | 
154 | from tensorflow.python.layers.core import Dense
155 | tf.reset_default_graph()
156 | 
157 | vocab_size = 5
158 | SOS_token = 0
159 | EOS_token = 4
160 | 
161 | x_data = np.array([[SOS_token, 2, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
162 | y_data = np.array([[2, 1, 2, 3, 2,EOS_token],[3, 1, 2, 3, 1,EOS_token],[ 1, 3, 2, 2, 1,EOS_token]],dtype=np.int32)
163 | print("data shape: ", x_data.shape)
164 | 
165 | 
166 | output_dim = vocab_size
167 | batch_size = len(x_data)
168 | hidden_dim = 6
169 | num_layers = 2
170 | seq_length = x_data.shape[1]
171 | embedding_dim = 8
172 | 
173 | state_tuple_mode = True
174 | init_state_flag = 0
175 | train_mode = True
176 | 
177 | 
178 | 
179 | with tf.variable_scope('test',reuse=tf.AUTO_REUSE) as scope:
180 |     # Make rnn
181 |     cells = []
182 |     for _ in range(num_layers):
183 |         #cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
184 |         cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim,state_is_tuple=state_tuple_mode)
185 |         cells.append(cell)
186 |     cell = tf.contrib.rnn.MultiRNNCell(cells)    
187 |     #cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
188 | 
189 | 
190 |     #init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1).astype(np.float32) # in this case, do not pass shape to the get_variable call for the embedding below.
191 |     init = tf.contrib.layers.xavier_initializer()
192 |     embedding = tf.get_variable("embedding",shape=[vocab_size,embedding_dim], initializer=init,dtype = tf.float32)
193 |     inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
194 | 
195 |     
196 | 
197 |     if init_state_flag==0:
198 |         initial_state = cell.zero_state(batch_size, tf.float32) #(batch_size x hidden_dim) x number of layers
199 |     else:
200 |         h0 = tf.random_normal([batch_size,hidden_dim]) # in practice, a suitable value must be supplied from outside.
201 |         if state_tuple_mode: 
202 |             initial_state=(tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), h0),) + (tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), tf.zeros_like(h0)),)*(num_layers-1)          
203 |         else:
204 |             initial_state = (tf.concat((tf.zeros_like(h0),h0), axis=1),) + (tf.concat((tf.zeros_like(h0),tf.zeros_like(h0)), axis=1),) * (num_layers-1)
205 |     if train_mode:
206 |         helper = tf.contrib.seq2seq.TrainingHelper(inputs, np.array([seq_length]*batch_size))
207 |     else:
208 |         helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding, start_tokens=tf.tile([SOS_token], [batch_size]), end_token=EOS_token)
209 | 
210 |     output_layer = Dense(output_dim, name='output_projection')
211 |     decoder = tf.contrib.seq2seq.BasicDecoder(cell=cell,helper=helper,initial_state=initial_state,output_layer=output_layer)    
212 |     # Without maximum_iterations, inference falls into an infinite loop if the EOS token is never produced
213 |     outputs, last_state, last_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder=decoder,output_time_major=False,impute_finished=True,maximum_iterations=10)
214 | 
215 |     
216 |     Y = tf.convert_to_tensor(y_data)
217 |     weights = tf.ones(shape=[batch_size,seq_length])
218 |     loss =   tf.contrib.seq2seq.sequence_loss(logits=outputs.rnn_output, targets=Y, weights=weights)
219 | 
220 |     optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
221 |     train = optimizer.minimize(loss)
222 | 
223 | 
224 |     with tf.Session() as sess:
225 |         
226 |         sess.run(tf.global_variables_initializer())
227 |         if train_mode:
228 |             for step in range(100):
229 |                 _,l = sess.run([train,loss])
230 |                 if step %10 ==0:
231 |                     print("step: {}, loss: {}".format(step,l))
232 |             
233 |             p = sess.run(tf.nn.softmax(outputs.rnn_output)).reshape(-1,output_dim)
234 |             print("loss: {:20.6f}".format(sess.run(loss)))
235 |             print("manual cal. loss: {:0.6f} ".format(np.average(-np.log(p[np.arange(y_data.size),y_data.flatten()]))) )     
236 |         
237 |         print("initial_state: ", sess.run(initial_state))
238 |         print("\n\noutputs: ",outputs)
239 |         o = sess.run(outputs.rnn_output)  #batch_size, seq_length, outputs
240 |         o2 = sess.run(tf.argmax(outputs.rnn_output,axis=-1))
241 |         print("\n",o,o2) #batch_size, seq_length, outputs
242 |     
243 |         print("\n\nlast_state: ",last_state)
244 |         print(sess.run(last_state)) # batch_size, hidden_dim
245 |     
246 |         print("\n\nlast_sequence_lengths: ",last_sequence_lengths)
247 |         print(sess.run(last_sequence_lengths)) #  [seq_length]*batch_size    
248 |         
249 |         print("kernel(weight)",sess.run(output_layer.trainable_weights[0]))  # kernel(weight)
250 |         print("bias",sess.run(output_layer.trainable_weights[1]))  # bias
251 |     
252 | ```
253 | 


--------------------------------------------------------------------------------
/0. Basic/RNN-TF-dynamic-decode.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | 
  3 | import numpy as np
  4 | import tensorflow as tf
  5 | 
  6 | 
  7 | from tensorflow.python.layers.core import Dense
  8 | tf.reset_default_graph()
  9 | 
 10 | vocab_size = 5
 11 | SOS_token = 0
 12 | EOS_token = 4
 13 | 
 14 | x_data = np.array([[SOS_token, 2, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 15 | y_data = np.array([[2, 1, 2, 3, 2,EOS_token],[3, 1, 2, 3, 1,EOS_token],[ 1, 3, 2, 2, 1,EOS_token]],dtype=np.int32)
 16 | print("data shape: ", x_data.shape)
 17 | 
 18 | 
 19 | output_dim = vocab_size
 20 | batch_size = len(x_data)
 21 | hidden_dim =6
 22 | num_layers = 2
 23 | seq_length = x_data.shape[1]
 24 | embedding_dim = 8
 25 | state_tuple_mode = True
 26 | init_state_flag = 0
 27 | 
 28 | #init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1).astype(np.float32) # in this case, do not pass shape to the get_variable call for the embedding below.
 29 | init = tf.contrib.layers.xavier_initializer()
 30 | 
 31 | train_mode = True
 32 | with tf.variable_scope('test',reuse=tf.AUTO_REUSE) as scope:
 33 |     # Make rnn
 34 |     cells = []
 35 |     for _ in range(num_layers):
 36 |         #cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
 37 |         cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim,state_is_tuple=state_tuple_mode)
 38 |         cells.append(cell)
 39 |     cell = tf.contrib.rnn.MultiRNNCell(cells)    
 40 |     #cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
 41 | 
 42 |     embedding = tf.get_variable("embedding",shape=[vocab_size,embedding_dim], initializer=init,dtype = tf.float32)
 43 |     inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
 44 | 
 45 |     
 46 | 
 47 |     if init_state_flag==0:
 48 |         initial_state = cell.zero_state(batch_size, tf.float32) #(batch_size x hidden_dim) x number of layers
 49 |     else:
 50 |         if state_tuple_mode:
 51 |             h0 = tf.random_normal([batch_size,hidden_dim]) #h0 = tf.cast(np.random.randn(batch_size,hidden_dim),tf.float32)
 52 |             initial_state=(tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), h0),) + (tf.contrib.rnn.LSTMStateTuple(tf.zeros_like(h0), tf.zeros_like(h0)),)*(num_layers-1)
 53 |             
 54 |         else:
 55 |             h0 = tf.random_normal([batch_size,hidden_dim]) #h0 = tf.cast(np.random.randn(batch_size,hidden_dim),tf.float32)
 56 |             initial_state = (tf.concat((tf.zeros_like(h0),h0), axis=1),) + (tf.concat((tf.zeros_like(h0),tf.zeros_like(h0)), axis=1),) * (num_layers-1)
 57 |     if train_mode:
 58 |         helper = tf.contrib.seq2seq.TrainingHelper(inputs, np.array([seq_length]*batch_size))
 59 |     else:
 60 |         helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding, start_tokens=tf.tile([SOS_token], [batch_size]), end_token=EOS_token)
 61 | 
 62 |     output_layer = Dense(output_dim, name='output_projection')
 63 |     decoder = tf.contrib.seq2seq.BasicDecoder(cell=cell,helper=helper,initial_state=initial_state,output_layer=output_layer)    
 64 |     # Without maximum_iterations, inference falls into an infinite loop if the EOS token is never produced
 65 |     outputs, last_state, last_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder=decoder,output_time_major=False,impute_finished=True,maximum_iterations=10)
 66 | 
 67 |     
 68 |     Y = tf.convert_to_tensor(y_data)
 69 |     weights = tf.ones(shape=[batch_size,seq_length])
 70 | 
 71 |     loss =   tf.contrib.seq2seq.sequence_loss(logits=outputs.rnn_output, targets=Y, weights=weights)
 72 | 
 73 |     optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
 74 |     train = optimizer.minimize(loss)
 75 | 
 76 | 
 77 |     with tf.Session() as sess:
 78 |         
 79 |         sess.run(tf.global_variables_initializer())
 80 |         if train_mode:
 81 |             for step in range(100):
 82 |                 _,l = sess.run([train,loss])
 83 |                 if step %10 ==0:
 84 |                     print("step: {}, loss: {}".format(step,l))
 85 |             
 86 |             p = sess.run(tf.nn.softmax(outputs.rnn_output)).reshape(-1,output_dim)
 87 |             print("loss: {:20.6f}".format(sess.run(loss)))
 88 |             print("manual cal. loss: {:0.6f} ".format(np.average(-np.log(p[np.arange(y_data.size),y_data.flatten()]))) )     
 89 |         
 90 |         print("initial_state: ", sess.run(initial_state))
 91 |         print("\n\noutputs: ",outputs)
 92 |         o = sess.run(outputs.rnn_output)  #batch_size, seq_length, outputs
 93 |         o2 = sess.run(tf.argmax(outputs.rnn_output,axis=-1))
 94 |         print("\n",o,o2) #batch_size, seq_length, outputs
 95 |     
 96 |         print("\n\nlast_state: ",last_state)
 97 |         print(sess.run(last_state)) # batch_size, hidden_dim
 98 |     
 99 |         print("\n\nlast_sequence_lengths: ",last_sequence_lengths)
100 |         print(sess.run(last_sequence_lengths)) #  [seq_length]*batch_size    
101 |         
102 |         print("kernel(weight)",sess.run(output_layer.trainable_weights[0]))  # kernel(weight)
103 |         print("bias",sess.run(output_layer.trainable_weights[1]))  # bias


--------------------------------------------------------------------------------
/0. Basic/dynamic-rnn-decode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/0. Basic/dynamic-rnn-decode.png


--------------------------------------------------------------------------------
/1. RNNWrapper/cell-cell.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/1. RNNWrapper/cell-cell.png


--------------------------------------------------------------------------------
/1. RNNWrapper/dynamic-rnn-decode2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/1. RNNWrapper/dynamic-rnn-decode2.png


--------------------------------------------------------------------------------
/1. RNNWrapper/math-basicRNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/1. RNNWrapper/math-basicRNN.png


--------------------------------------------------------------------------------
/1. RNNWrapper/readme.md:
--------------------------------------------------------------------------------
  1 | # Let's build a user-defined RNN wrapper class that can be passed to BasicDecoder in TensorFlow.
  2 | ### [Contents]
  3 | * [Start with something easy](#start-with-something-easy)
  4 | * [Implementing BasicRNNCell yourself](#basicrnncell을-직접-만들어보자)
  5 | * [User Defined RNNCell](#user-defined-rnncell)
  6 | * [Building a true wrapper](#진정한-wrapper-만들기)
  7 | 
  8 | ### [Start with something easy]
  9 | 
 10 | The figure below shows the input structure of tf.contrib.seq2seq.BasicDecoder (a class) and tf.contrib.seq2seq.dynamic_decode (a function).
 11 | If BasicDecoder and dynamic_decode are unfamiliar, see the earlier post, [RNN-Tutorial](https://github.com/hccho2/RNN-Tutorial).
 12 | ![decode](./dynamic-rnn-decode2.png)
 13 | * TensorFlow's dynamic_decode takes a BasicDecoder, and BasicDecoder takes a cell, a Helper, and so on; together they implement the RNN model.
 14 | * This post looks at how to implement a user-defined RNNCell.
 15 | * Typical cells are BasicRNNCell, GRUCell, and BasicLSTMCell, all implemented in TensorFlow.
 16 | * These cells are classes that inherit from (TensorFlow's) RNNCell.
 17 | * By inheriting from RNNCell, you can write a user-defined RNN wrapper class and pass it to BasicDecoder.
 18 | * Now let's look at sample code for a bare-minimum user-defined RNN wrapper.
 19 | 
 20 | ```python
 21 | from tensorflow.contrib.rnn import RNNCell
 22 | 
 23 | class MyRnnWrapper(RNNCell):
 24 |     # Define two properties (output_size, state_size) and call.
 25 |     def __init__(self,state_dim,name=None):
 26 |         super(MyRnnWrapper, self).__init__(name=name)
 27 |         self.sate_size = state_dim
 28 | 
 29 |     @property
 30 |     def output_size(self):
 31 |         return self.sate_size  
 32 | 
 33 |     @property
 34 |     def state_size(self):
 35 |         return self.sate_size  
 36 | 
 37 |     def call(self, inputs, state):
 38 |         # This call function is what links one cell step to the next.
 39 |         cell_output = inputs
 40 |         next_state = state + 0.1
 41 |         return cell_output, next_state 
 42 | ```
 43 | * The code above implements the class MyRnnWrapper by inheriting from RNNCell.
 44 | * MyRnnWrapper must implement the properties output_size and state_size, and also the special method call(self, inputs, state).
 45 | * output_size is the dimension of the output the RNN model emits, and state_size is the size of the hidden state that links one cell to the next.
 46 | * The call method receives the input and the hidden state handed over from the previous cell, performs the required computation, and produces the next_state and cell_output to pass on to the next step.
 47 | 
 48 | ![decode](./cell-cell.png)
 49 | * The sample code above is a meaningless example: it emits the input unchanged as the output and, as a test, adds 0.1 to the received state to form next_state.
 50 | * Let's build and run a simple example that passes MyRnnWrapper to dynamic_decode. If the code below is unclear, see the [RNN-Tutorial](https://github.com/hccho2/RNN-Tutorial) mentioned above.
 51 | * Why is the printed last state 0.6?
 52 | ```python
 53 | # coding: utf-8
 54 | SOS_token = 0
 55 | EOS_token = 4
 56 | vocab_size = 5
 57 | x_data = np.array([[SOS_token, 3, 3, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 58 | 
 59 | print("data shape: ", x_data.shape)
 60 | sess = tf.InteractiveSession()
 61 | 
 62 | output_dim = vocab_size
 63 | batch_size = len(x_data)
 64 | seq_length = x_data.shape[1]
 65 | embedding_dim = 2
 66 | 
 67 | init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1)
 68 | 
 69 | train_mode = True
 70 | alignment_history_flag = False
 71 | with tf.variable_scope('test') as scope:
 72 |     # Make rnn
 73 |     cell = MyRnnWrapper(embedding_dim,"xxx")
 74 | 
 75 |     embedding = tf.get_variable("embedding", initializer=init.astype(np.float32),dtype = tf.float32)
 76 |     inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
 77 | 
 78 |     #######################################################
 79 | 
 80 |     initial_state = cell.zero_state(batch_size, tf.float32) # batch_size x state_dim (a single cell, so no per-layer tuple)
 81 |     
 82 |     helper = tf.contrib.seq2seq.TrainingHelper(inputs, np.array([seq_length]*batch_size,dtype=np.int32))
 83 | 
 84 |     decoder = tf.contrib.seq2seq.BasicDecoder(cell=cell,helper=helper,initial_state=initial_state,output_layer=None)    
 85 |     outputs, last_state, last_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder=decoder,output_time_major=False,impute_finished=True,maximum_iterations=10)
 86 | 
 87 |     
 88 |     ######################################################
 89 |     sess.run(tf.global_variables_initializer())
 90 |     print("\ninputs: ",inputs)
 91 |     inputs_ = sess.run(inputs) 
 92 |     print("\n",inputs_)    
 93 |     
 94 |     print("\noutputs: ",outputs)
 95 |     outputs_ = sess.run(outputs.rnn_output) 
 96 |     print("\n",outputs_) 
 97 | 
 98 |     print("\nlast state: ",last_state)
 99 |     last_state_ = sess.run(last_state) 
100 |     print("\n",last_state_) 
101 | """
102 | inputs:  Tensor("test/embedding_lookup:0", shape=(3, 6, 2), dtype=float32)
103 | 
104 |  [[[0. 1.]
105 |   [6. 7.]
106 |   [6. 7.]
107 |   [4. 5.]
108 |   [6. 7.]
109 |   [4. 5.]]
110 | 
111 |  [[0. 1.]
112 |   [6. 7.]
113 |   [2. 3.]
114 |   [4. 5.]
115 |   [6. 7.]
116 |   [2. 3.]]
117 | 
118 |  [[0. 1.]
119 |   [2. 3.]
120 |   [6. 7.]
121 |   [4. 5.]
122 |   [4. 5.]
123 |   [2. 3.]]]
124 | 
125 | outputs:  BasicDecoderOutput(rnn_output=<tf.Tensor 'test/decoder/transpose:0' shape=(3, ?, 2) dtype=float32>, sample_id=<tf.Tensor 'test/decoder/transpose_1:0' shape=(3, ?) dtype=int32>)
126 | 
127 |  [[[0. 1.]
128 |   [6. 7.]
129 |   [6. 7.]
130 |   [4. 5.]
131 |   [6. 7.]
132 |   [4. 5.]]
133 | 
134 |  [[0. 1.]
135 |   [6. 7.]
136 |   [2. 3.]
137 |   [4. 5.]
138 |   [6. 7.]
139 |   [2. 3.]]
140 | 
141 |  [[0. 1.]
142 |   [2. 3.]
143 |   [6. 7.]
144 |   [4. 5.]
145 |   [4. 5.]
146 |   [2. 3.]]]
147 | 
148 | last state:  Tensor("test/decoder/while/Exit_3:0", shape=(3, 2), dtype=float32)
149 | 
150 |  [[0.6 0.6]
151 |  [0.6 0.6]
152 |  [0.6 0.6]]
153 | """    
154 | ```
155 | 
156 | ### [Implementing BasicRNNCell Ourselves]
157 | * Now let's build a more meaningful user defined RNN Wrapper class by reimplementing tf.contrib.rnn.BasicRNNCell, which Tensorflow already provides.
158 | ![decode](./math-basicRNN.png)
159 | * BasicRNNCell concatenates the input with the previous state, multiplies the result by a kernel, adds a bias, and applies tanh.
160 | * To build an RNNCell with this structure, define the kernel and bias, and perform the computation in the call method.
161 | * The kernel and bias are defined in a special method, build(self, inputs_shape).
162 | ```python
163 | class MyBasicRNNWrapper(RNNCell):
164 |     # Define the two properties (output_size, state_size) and call.
165 |     def __init__(self,state_dim,name=None):
166 |         super(MyBasicRNNWrapper, self).__init__(name=name)
167 |         self._state_dim = state_dim  # named differently from the state_size property to avoid a clash
168 | 
169 |     @property
170 |     def output_size(self):
171 |         return self._state_dim
172 | 
173 |     @property
174 |     def state_size(self):
175 |         return self._state_dim
176 | 
177 |     def build(self, inputs_shape):
178 |         # Create any trainable variables here, then set self.built = True.
179 |         input_depth = inputs_shape[1].value
180 |         self._kernel = tf.get_variable('kernel', shape=[input_depth + self._state_dim, self._state_dim])
181 |         self._bias = tf.get_variable('bias', shape=[self._state_dim],  initializer=tf.zeros_initializer(dtype=tf.float32))
182 |     
183 |         self.built = True  # Tells the base class that the variables have been created.
184 | 
185 |     def call(self, inputs, state):
186 |         # This call method is what links one time step's cell to the next.
187 |         gate_inputs = tf.matmul(tf.concat((inputs,state),axis=1),self._kernel)
188 |         gate_inputs = tf.nn.bias_add(gate_inputs,self._bias)
189 |         cell_output = tf.tanh(gate_inputs)
190 |         next_state = cell_output
191 |         return cell_output, next_state 
192 | 
193 | ```
194 | * With MyBasicRNNWrapper defined as above, reuse the code that tested MyRnnWrapper and swap in the new Wrapper class.
195 | ```python
196 | hidden_dim = 4
197 | #cell = MyRnnWrapper(embedding_dim,"xxx")
198 | cell = MyBasicRNNWrapper(hidden_dim,"xxx")
199 | ```
200 | 
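The computation described above (concat, kernel multiply, bias add, tanh) can be traced by hand in a framework-free sketch. The snippet below is plain Python rather than Tensorflow, handles a single example without a batch axis, and uses a hypothetical function name (`basic_rnn_step`) and toy sizes chosen only for illustration.

```python
import math

def basic_rnn_step(inputs, state, kernel, bias):
    """One BasicRNNCell-style step for a single example (no batch axis).

    inputs: list of input_depth floats
    state:  list of state_dim floats
    kernel: (input_depth + state_dim) rows x state_dim columns
    bias:   list of state_dim floats
    """
    concat = inputs + state  # concatenate along the feature axis
    new_state = []
    for j in range(len(bias)):
        pre_activation = sum(concat[i] * kernel[i][j] for i in range(len(concat))) + bias[j]
        new_state.append(math.tanh(pre_activation))
    # The output and the next state are the same vector.
    return new_state, new_state

# Hypothetical toy sizes: input_depth=2, state_dim=3, so the kernel is 5 x 3.
kernel = [[0.1] * 3 for _ in range(5)]
bias = [0.0] * 3
state = [0.0] * 3
for x in ([1.0, 0.0], [0.0, 1.0]):
    output, state = basic_rnn_step(x, state, kernel, bias)
print(output)
```

Because the output and the next state are the same vector, output_size and state_size return the same value in MyBasicRNNWrapper.
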
201 | ### [User Defined RNNCell]
202 | * If you want the incoming input to pass through an FC layer before it is combined with the state, modify the call method.
203 | ```python
204 | def call(self, inputs, state):
205 |     # Links one time step's cell to the next; assumes build() stored self.input_depth = inputs_shape[1].value.
206 |     inputs = tf.layers.dense(inputs,units=self.input_depth,kernel_initializer = tf.contrib.layers.xavier_initializer(False), activation=tf.nn.relu)
207 |     gate_inputs = tf.matmul(tf.concat((inputs,state),axis=1),self._kernel)
208 |     gate_inputs = tf.nn.bias_add(gate_inputs,self._bias)
209 |     cell_output = tf.tanh(gate_inputs)
210 |     next_state = cell_output
211 |     return cell_output, next_state 
212 | ```
213 | * Passing the output cell_output through an FC layer before returning it takes only a small modification as well.
214 | * Because the cell can be reshaped into any structure this way, you can build whatever model you need.
215 | 
216 | 
217 | ### [Building a True Wrapper]
218 | * An RNN Wrapper is, as the name says, a wrap around an RNN cell. It can serve as a frame for building new structures around cells that Tensorflow already implements, such as BasicRNNCell, GRUCell, and BasicLSTMCell.
219 | * Besides the built-in RNN cells, an RNNWrapper can also wrap another RNNWrapper like the ones built here.
220 | * To wrap an RNNCell, accept the desired RNNCell in the init function.
221 | ![wrapper](./wrapper.png)
222 | * As the figure above shows, an incoming input is first received by the Wrapper and then passed to the inner RNNCell.
223 | * For example, tf.contrib.seq2seq.AttentionWrapper concatenates each incoming input with the attention to form a new input, and passes this concatenated input to the inner RNNCell.
224 | 
225 | ```python
226 | class MyBasicRNNWrapper2(RNNCell):
227 |     # Define the two properties (output_size, state_size) and call.
228 |     def __init__(self,cell,name=None):
229 |         super(MyBasicRNNWrapper2, self).__init__(name=name)
230 |         self.cell = cell
231 |     @property
232 |     def output_size(self):
233 |         return self.cell.output_size  
234 |     @property
235 |     def state_size(self):
236 |         return self.cell.state_size  
237 | 
238 |     def call(self, inputs, state):
239 |         # Step 1: use inputs and state to build the new inputs and state to hand to self.cell.
240 |         cell_output, next_state = self.cell(inputs,state)
241 |         # Step 2: process the cell_output and next_state returned by self.cell into the return value.
242 |         return cell_output, next_state 
243 | 
244 | hidden_dim = 4
245 | cell = MyBasicRNNWrapper2(tf.contrib.rnn.BasicRNNCell(hidden_dim),"xxx")
246 | ```
247 | * The call function above passes (inputs, state) to self.cell unchanged.
248 | * In practice, you can preprocess them as needed before handing them to self.cell,
249 | * and post-process the result returned by self.cell before returning it.
250 | * For example, you might add the input to the result and emit the sum as the output, as in a residual structure, or concatenate the output and state to form a new output.
251 | ```python
252 | def call(self, inputs, state):
253 | 	cell_output, next_state = self.cell(inputs,state)
254 | 	cell_output = inputs + cell_output  # residual rnn
255 | 	return cell_output, next_state 
256 | ```
257 | 
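The pre-processing and post-processing idea can be seen without Tensorflow. The following is a minimal sketch in plain Python; `ToyTanhCell` and `ResidualToyWrapper` are hypothetical stand-ins, not Tensorflow classes, and the toy cell assumes that inputs and state have the same length.

```python
import math

class ToyTanhCell:
    """Hypothetical stand-in for a built-in cell: state = tanh(input + state)."""
    def __call__(self, inputs, state):
        new_state = [math.tanh(x + s) for x, s in zip(inputs, state)]
        return new_state, new_state

class ResidualToyWrapper:
    """Wraps any callable cell and adds the input to the cell output."""
    def __init__(self, cell):
        self.cell = cell

    def __call__(self, inputs, state):
        cell_output, next_state = self.cell(inputs, state)
        # Post-processing step: residual connection (requires equal sizes).
        output = [x + o for x, o in zip(inputs, cell_output)]
        return output, next_state

cell = ResidualToyWrapper(ToyTanhCell())
# A wrapper can wrap another wrapper, since both expose the same call signature.
nested = ResidualToyWrapper(cell)

output, state = cell([0.5, -0.5], [0.0, 0.0])
nested_output, nested_state = nested([0.5, -0.5], [0.0, 0.0])
print(output, state)
print(nested_output, nested_state)
```

The state passes through both wrappers untouched; only the output is modified, once per wrapping layer.
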
258 | * The inputs can also be fed through an FC layer before entering the cell. The FC layer created this way is trained along with the rest of the model.
259 | ```python
260 | def call(self, inputs, state):
261 |     fc_outputs = tf.layers.dense(inputs,units=5,name='myFC')  # FC layer
262 |     cell_output, next_state = self.cell(fc_outputs,state)
263 |     cell_output = inputs + cell_output  # residual rnn (inputs and cell_output must have the same depth)
264 |     return cell_output, next_state 
265 | ```
266 | 
267 | ### [p.s.]
268 | * So far we have implemented a user defined RNN Wrapper that can be passed to BasicDecoder in Tensorflow's seq2seq model.
269 | * The motivation came from studying the Tacotron model, which involved a user defined Attention (a modified Bahdanau Attention), a user defined Helper, and so on.
270 | * Understanding those requires a solid grasp of the Wrapper class first, hence this write-up on building a user defined Wrapper class.
271 | * Planned follow-ups:
272 | 	+ building a user defined Helper class
273 | 	+ building a user defined Attention
274 | 


--------------------------------------------------------------------------------
/1. RNNWrapper/wrapper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/1. RNNWrapper/wrapper.png


--------------------------------------------------------------------------------
/2. User Defined Helper/README.md:
--------------------------------------------------------------------------------
  1 | # Let's build a user defined Helper class that can be passed to BasicDecoder in Tensorflow.
  2 | A User Defined Helper is implemented by subclassing tensorflow.contrib.seq2seq.Helper.
  3 | 
  4 | ### [Contents]
  5 | * [Why a User Defined Helper Is Needed](#why-a-user-defined-helper-is-needed)
  6 | * [Building a User Defined Helper](#building-a-user-defined-helper)
  7 | * [Building TacotronTestHelper](#building-tacotrontesthelper)
  8 | 
  9 | ### [Why a User Defined Helper Is Needed]
 10 | * TrainingHelper, GreedyEmbeddingHelper, and SampleEmbeddingHelper are the Helpers used most often.
 11 | * Some models cannot be handled by these standard Helpers.
 12 | * For example, the Decoder of the [Tacotron](https://arxiv.org/abs/1703.10135) model produces r outputs per step (r is the reduction factor, 3 in the figure below) and feeds the last of them as the input to the next step (at inference time). Implementing such a model requires a User Defined Helper.
 13 | <p align="center"><img width="300" src="tacotron-decoder.png" />  </p>
 14 | 
 15 | * The train stage and the inference stage each need their own Helper.
 16 | 
 17 | ### [Building a User Defined Helper]
 18 | * Let's build a User Defined Helper modeled on TrainingHelper.
 19 | * It must be implemented by subclassing tensorflow.contrib.seq2seq.Helper.
 20 | 
 21 | ```python
 22 | class MyRnnHelper(Helper):
 23 |     # The properties (batch_size, sample_ids_dtype, sample_ids_shape) and the methods (initialize, sample, next_inputs) must be defined.
 24 |     def __init__(self,embedding,batch_size,output_dim,sequence_length):
 25 |         self._embedding = embedding
 26 |         self._batch_size = batch_size
 27 |         self._output_dim = output_dim
 28 |         self._sequence_length = sequence_length
 29 | 
 30 |     @property
 31 |     def batch_size(self):
 32 |         return self._batch_size
 33 | 
 34 |     @property
 35 |     def sample_ids_dtype(self):
 36 |         return tf.int32
 37 | 
 38 |     @property
 39 |     def sample_ids_shape(self):
 40 |         return tf.TensorShape([])   # sample_ids has shape (batch_size,), so without the batch dimension it is "[]".
 41 | 
 42 |     def next_inputs(self, time, outputs, state, sample_ids, name=None):   # Builds the input for time+1; outputs, state, and sample_ids are the results at the current time step.
 43 |         # sample_ids is the value computed by the sample function; that call happens in the 'step' function of BasicDecoder.
 44 |         # To compute the next input, use either sample_ids or outputs.
 45 |         
 46 |         next_time = time + 1
 47 |         finished = (next_time >= self._sequence_length)
 48 |         next_inputs = tf.nn.embedding_lookup(self._embedding,sample_ids)
 49 |         return (finished, next_inputs, state)  # When finished==True, next_inputs and state are meaningless.
 50 | 
 51 |     def initialize(self, name=None):
 52 |         # Defines the first input.
 53 |         # return (finished, first_inputs). finished is always False at the start.
 54 |         # As an example, first_inputs is built from SOS_token.
 55 |         return (tf.tile([False], [self._batch_size]), tf.nn.embedding_lookup(self._embedding,tf.tile([SOS_token], [self._batch_size])))  
 56 | 
 57 |     def sample(self, time, outputs, state, name=None):
 58 |         return tf.argmax(outputs, axis=-1,output_type=tf.int32)
 59 | ```
 60 | 
 61 | * The three required properties (batch_size, sample_ids_dtype, sample_ids_shape) must be implemented,
 62 | * along with the three member functions (initialize, sample, next_inputs).
 63 | * If implementing these properties and member functions needs extra information, accept it in `__init__`.
 64 | * def initialize(self, name=None): produces the first input data of the RNN model.
 65 | *  def sample(self, time, outputs, state, name=None): builds a sample from the output and state produced at time. For example, TrainingHelper takes the argmax to build the sample, while SampleEmbeddingHelper draws a random sample from the distribution instead of a plain argmax. 
 66 | * def next_inputs(self, time, outputs, state,sample_ids, name=None): builds the input data for time+1 (the next step) from the output and state produced at the time step and the sample_ids produced by the sample function.
 67 | * Because sequence lengths differ across the batch, next_inputs needs the sequence_length received in `__init__` to compute finished correctly.
 68 | 
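To see how the three member functions cooperate, here is a framework-free sketch for a single sequence. `ToyGreedyHelper` and `toy_cell` are hypothetical names, the method signatures drop the `name` argument, and the loop only imitates the order of calls inside BasicDecoder's step function (cell, then sample, then next_inputs).

```python
class ToyGreedyHelper:
    """Framework-free mirror of the three Helper methods, for one sequence."""
    def __init__(self, embedding, sos_token, sequence_length):
        self.embedding = embedding          # list of vectors, indexed by token id
        self.sos_token = sos_token
        self.sequence_length = sequence_length

    def initialize(self):
        # (finished, first_inputs): never finished before the first step.
        return False, self.embedding[self.sos_token]

    def sample(self, time, outputs, state):
        # argmax over the output scores
        return max(range(len(outputs)), key=lambda i: outputs[i])

    def next_inputs(self, time, outputs, state, sample_ids):
        finished = (time + 1) >= self.sequence_length
        return finished, self.embedding[sample_ids], state

def toy_cell(inputs, state):
    # Hypothetical cell: the scores are the input rotated by one position.
    outputs = inputs[1:] + inputs[:1]
    return outputs, state

embedding = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot rows
helper = ToyGreedyHelper(embedding, sos_token=0, sequence_length=4)

finished, inputs = helper.initialize()
state, time, sampled = None, 0, []
while not finished:
    outputs, state = toy_cell(inputs, state)              # 1. run the cell
    sample_ids = helper.sample(time, outputs, state)      # 2. build the sample
    finished, inputs, state = helper.next_inputs(time, outputs, state, sample_ids)  # 3. next input
    sampled.append(sample_ids)
    time += 1
print(sampled)
```

The loop stops after sequence_length steps because next_inputs reports finished once time + 1 reaches the length passed to the constructor.
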
 69 | 
 70 | 
 71 | ### [Building TacotronTestHelper]
 72 | * Let's build a Helper for the inference stage of the Tacotron model. 
 73 | 
 74 | ```python
 75 | class TacoTestHelper(Helper):
 76 |     def __init__(self, batch_size, output_dim, r):
 77 |         with tf.name_scope('TacoTestHelper'):
 78 |             self._batch_size = batch_size
 79 |             self._output_dim = output_dim
 80 |             self._end_token = tf.tile([0.0], [output_dim * r])  # [0.0,0.0,...]
 81 | 
 82 |     @property
 83 |     def batch_size(self):
 84 |         return self._batch_size
 85 |     
 86 |     @property
 87 |     def sample_ids_dtype(self):
 88 |         return tf.int32
 89 | 
 90 |     @property
 91 |     def sample_ids_shape(self):
 92 |         return tf.TensorShape([])
 93 |     
 94 |     def initialize(self, name=None):
 95 |         return (tf.tile([False], [self._batch_size]), tf.tile([[0.0]], [self._batch_size, self._output_dim]))
 96 | 
 97 |     def sample(self, time, outputs, state, name=None):
 98 |         # The sample function has nothing useful to do here, so it just returns garbage.
 99 |         return tf.tile([0], [self._batch_size])  # Return all 0; we ignore them
100 | 
101 |     def next_inputs(self, time, outputs, state, sample_ids, name=None):
102 |         '''Stop on EOS. Otherwise, pass the last output as the next input and pass through state.'''
103 |         with tf.name_scope('TacoTestHelper'):
104 |             finished = tf.reduce_all(tf.equal(outputs, self._end_token), axis=1)
105 |             # Feed last output frame as next input. outputs is [N, output_dim * r]
106 |             next_inputs = outputs[:, -self._output_dim:]  # from outputs of shape (batch_size, output_dim*r), keep only the last output_dim values
107 |             return (finished, next_inputs, state)
108 | ```
109 | 
110 | * The only part worth focusing on is how the next_inputs function computes next_inputs: of the output_dim*r values, it returns only the last output_dim.
111 | 
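The slicing can be checked on a flat list. This is a plain-Python sketch for one example with hypothetical toy sizes (output_dim=2, r=3); `last_frame` and `is_end` are illustrative helpers, not part of the Tacotron code.

```python
def last_frame(outputs, output_dim):
    """Return the last output_dim values of a flat step output of size output_dim * r."""
    return outputs[-output_dim:]

def is_end(outputs):
    """Mimic the finished check: every value equals the all-zero end token."""
    return all(v == 0.0 for v in outputs)

output_dim, r = 2, 3  # hypothetical toy sizes
step_output = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # r frames of output_dim values each
frames = [step_output[i * output_dim:(i + 1) * output_dim] for i in range(r)]
next_input = last_frame(step_output, output_dim)
print(frames)
print(next_input, is_end(step_output))
```

Only the last of the r frames is fed back, so the next input has output_dim values even though each step emits output_dim * r.
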
112 | 
113 | 
114 | ---
115 | ## Reference
116 | - https://github.com/keithito/tacotron
117 | 


--------------------------------------------------------------------------------
/2. User Defined Helper/tacotron-decoder.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/2. User Defined Helper/tacotron-decoder.png


--------------------------------------------------------------------------------
/3. User Defined Decoder/BasicDecoder.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/3. User Defined Decoder/BasicDecoder.png


--------------------------------------------------------------------------------
/3. User Defined Decoder/BasicDecoder2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/3. User Defined Decoder/BasicDecoder2.png


--------------------------------------------------------------------------------
/3. User Defined Decoder/README.md:
--------------------------------------------------------------------------------
 1 | # BasicDecoder is the default choice, but using a User Defined Helper or RNNWrapper may require customizing it.
 2 | 
 3 | ### [Contents]
 4 | * [When a User Defined Decoder Is Needed](#when-a-user-defined-decoder-is-needed)
 5 | * [BasicDecoder Customization](#basicdecoder-customization)
 6 | 
 7 | ---
 8 | 
 9 | 
10 | ### [When a User Defined Decoder Is Needed]
11 | * The `__init__` function of BasicDecoder has the following prototype.
12 | <p align="center"><img width="500" src="BasicDecoder.png" />  </p>
13 | 
14 | * The following is step, a member function of BasicDecoder.
15 | ```python
16 | def step(self, time, inputs, state, name=None):
17 |   """Perform a decoding step.
18 |   Args:
19 |     time: scalar `int32` tensor.
20 |     inputs: A (structure of) input tensors.
21 |     state: A (structure of) state tensors and TensorArrays.
22 |     name: Name scope for any created operations.
23 |   Returns:
24 |     `(outputs, next_state, next_inputs, finished)`.
25 |   """
26 |   with ops.name_scope(name, "BasicDecoderStep", (time, inputs, state)):
27 |     cell_outputs, cell_state = self._cell(inputs, state)
28 |     if self._output_layer is not None:
29 |       cell_outputs = self._output_layer(cell_outputs)
30 |     sample_ids = self._helper.sample(
31 |         time=time, outputs=cell_outputs, state=cell_state)
32 |     (finished, next_inputs, next_state) = self._helper.next_inputs(
33 |         time=time,
34 |         outputs=cell_outputs,
35 |         state=cell_state,
36 |         sample_ids=sample_ids)
37 |   outputs = BasicDecoderOutput(cell_outputs, sample_ids)
38 |   return (outputs, next_state, next_inputs, finished)
39 | ```
40 | 
41 | * You can see that the cell and helper received in `__init__` are used in the step function.
42 | * If the cell or helper passed to `__init__` has been customized away from the standard interface, BasicDecoder cannot be used as is.
43 | * Let's look at this more concretely.
44 | * The prototype of the Helper's next_inputs function is as follows.
45 | ```python
46 | def next_inputs(self, time, outputs, state, sample_ids, name=None):
47 | ```
48 | * Suppose an extra argument is added to the arguments (time, outputs, state, sample_ids, name), as follows.
49 | ```python
50 | def next_inputs(self, time, outputs, state, new_arg, sample_ids, name=None):
51 | ```
52 | * In this case, BasicDecoder must be customized.
53 | 
54 | * As another example, BasicDecoder also cannot be used when the arguments or the return format of the cell's `__call__` function change.
55 | 
56 | ---
57 | 
58 | 
59 | ### [BasicDecoder Customization]
60 | * A look at Tensorflow's BasicDecoder [implementation](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py) shows that it is simpler than one might expect.
61 | <p align="center"><img width="700" src="BasicDecoder2.png" />  </p>
62 | 
63 | * If self._cell(...) or self._helper.next_inputs(...) has been customized, you only need to change the step function of BasicDecoder to match.
64 | * For example, if the cell returns two outputs (e.g. the rnn cell in Tacotron2 returns (cell_outputs, stop_token)), the step function of BasicDecoder must be modified to handle this changed output.
65 | * Note: alternatively, bundle the changed outputs into a single return value and customize only the Helper so that it can handle them.
66 | 
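As an illustration of modifying the step function, the sketch below rewrites the step logic in plain Python for a cell that returns (cell_outputs, stop_token). All names (`toy_cell`, `ToyHelper`, `custom_step`) and the stopping rule are assumptions made for this example; it mirrors the structure of BasicDecoder's step function but is not Tensorflow code.

```python
def toy_cell(inputs, state):
    """Hypothetical cell that returns an extra stop_token next to its output."""
    cell_outputs = [x + state for x in inputs]
    stop_token = 1.0 if state >= 2 else 0.0
    return (cell_outputs, stop_token), state + 1

class ToyHelper:
    def sample(self, time, outputs, state):
        return 0  # unused placeholder, as in TacoTestHelper

    def next_inputs(self, time, outputs, state, sample_ids, stop_token):
        # The extra stop_token argument is what forces a customized step.
        finished = stop_token > 0.5
        return finished, outputs, state

def custom_step(cell, helper, time, inputs, state):
    # Unpack the two-part cell output before handing it to the helper.
    (cell_outputs, stop_token), cell_state = cell(inputs, state)
    sample_ids = helper.sample(time=time, outputs=cell_outputs, state=cell_state)
    finished, next_inputs, next_state = helper.next_inputs(
        time=time, outputs=cell_outputs, state=cell_state,
        sample_ids=sample_ids, stop_token=stop_token)
    return (cell_outputs, stop_token), next_state, next_inputs, finished

inputs, state, time, finished = [0.0, 1.0], 0, 0, False
helper = ToyHelper()
while not finished:
    outputs, state, inputs, finished = custom_step(toy_cell, helper, time, inputs, state)
    time += 1
print(time, outputs)
```

The only differences from the standard step are the tuple unpacking of the cell result and the extra keyword argument passed to next_inputs.
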
67 | ### [dynamic_decode]
68 | * tf.contrib.seq2seq.dynamic_decode is a function, not a class.
69 | * It calls the step function of the decoder (BasicDecoder) repeatedly in a loop.
70 | 
71 | 


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/Attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/Attention.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/AttentionWrapper-API.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/AttentionWrapper-API.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/Bahdanau-Luong-Attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/Bahdanau-Luong-Attention.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/README.md:
--------------------------------------------------------------------------------
  1 | # Let's see how the Attention Model works together with the RNN API in Tensorflow.
  2 | 
  3 | ## Attention Model
  4 | * Representative Attention Models include `Bahdanau Attention` and `Luong Attention`.
  5 | * Adding a monotonic property to these models yields Bahdanau Monotonic Attention and Luong Monotonic Attention.
  6 | * In Tensorflow, an Attention Model is handled through the concept of an `Attention Mechanism`.
  7 | 
  8 | ![decode](./attentioin-dynamic-rnn-decode.png)
  9 | 
 10 | * The figure above shows how the Tensorflow Attention API connects to the Tensorflow RNN API (BasicDecoder, dynamic_decode, etc.) covered in the earlier tutorials.
 11 | * BasicRNNCell, BasicLSTMCell, GRUCell, and the other cells covered in the earlier tutorials are classes that inherit from class RNNCell.
 12 | * To apply an Attention Model, a class called AttentionWrapper takes the place of these cells; AttentionWrapper is itself a class that inherits from RNNCell.
 13 | 
 14 | ![decode](./AttentionWrapper-API.png)
 15 | 
 16 | * The main arguments of AttentionWrapper's `__init__` function are {cell, attention_mechanism, attention_layer_size, output_attention, initial_cell_state}.
 17 | * Let's go through these main arguments one by one.
 18 | 
 19 | ### cell
 20 | * For cell, pass in one of the cells covered so far, such as BasicRNNCell, BasicLSTMCell, or GRUCell.
 21 | 
 22 | 
 23 | ### attention_mechanism
 24 | * attention_mechanism takes an `AttentionMechanism` object.
 25 | * Available `AttentionMechanism` classes include `tf.contrib.seq2seq.BahdanauAttention`, `tf.contrib.seq2seq.LuongAttention`, and `tf.contrib.seq2seq.BahdanauMonotonicAttention`.
 26 | 
 27 | ![decode](./Attention.png)
 28 | * Attention is computed in the order score -> softmax -> alignment -> context -> attention.
 29 | * Bahdanau Attention and Luong Attention differ only in how the score is computed.
 30 | ```python
 31 | attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=11, memory=encoder_outputs,memory_sequence_length=input_lengths,normalize=False)
 32 | ```
 33 | * Here `num_units` sets the depth to which the query and the memory are projected before the score is computed.
 34 | 
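The chain score -> softmax -> alignment -> context can be written out for a single decoder step. The sketch below is plain Python under stated assumptions: it uses the additive (Bahdanau-style) score `v . tanh(W_q q + W_m m_j)` without normalization, and all weights and sizes are hypothetical toy values rather than anything Tensorflow would learn.

```python
import math

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bahdanau_attention(query, memory, w_query, w_memory, v):
    """Additive attention for one example.

    query:  decoder state, length query_dim
    memory: encoder outputs, T_e rows of length memory_dim
    w_query (num_units x query_dim), w_memory (num_units x memory_dim), v (num_units)
    """
    projected_query = matvec(w_query, query)
    scores = []
    for memory_row in memory:
        key = matvec(w_memory, memory_row)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))      # score
    alignments = softmax(scores)                                       # softmax -> alignment
    context = [sum(a * row[d] for a, row in zip(alignments, memory))   # context
               for d in range(len(memory[0]))]
    return scores, alignments, context

# Hypothetical toy sizes: query_dim=2, memory_dim=2, num_units=3, T_e=3.
query = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_query = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
w_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [1.0, 1.0, 1.0]
scores, alignments, context = bahdanau_attention(query, memory, w_query, w_memory, v)
print(alignments, context)
```

There is one score per memory position, while num_units only fixes the length of the projected vectors inside the tanh. The context has the same depth as a memory row.
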
 35 | ![decode](./Bahdanau-Luong-Attention.png)
 36 | * Monotonic Attention computes the alignment from the score with a different procedure instead of the softmax function.
 37 | 
 38 | 
 39 | ### attention_layer_size
 40 | * `attention_layer_size` is used to compute the attention from the context vector.
 41 | * If `attention_layer_size = None`, the attention is the context vector itself.
 42 | ![decode](./attention-layer-size.png)
 43 | * If `attention_layer_size` is not None, the decoder hidden state and the context vector are concatenated and passed through one more Fully-Connected Layer to produce the attention vector.
 44 | 
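The two cases can be compared with a small plain-Python sketch. `attention_vector` is a hypothetical function, the FC layer is modeled as a bare weight matrix with the bias omitted for brevity, and the sizes are toy values.

```python
def attention_vector(cell_output, context, attention_layer=None):
    """Return the attention for one example.

    attention_layer: optional weight matrix with attention_layer_size rows and
    len(cell_output) + len(context) columns.
    """
    if attention_layer is None:
        return context                      # attention is the context vector itself
    concat = cell_output + context          # concat(hidden state, context)
    return [sum(w * x for w, x in zip(row, concat)) for row in attention_layer]

cell_output = [0.1, 0.2]                 # hypothetical hidden_dim = 2
context = [1.0, 2.0, 3.0]                # hypothetical memory depth = 3
layer = [[1.0] * 5 for _ in range(4)]    # attention_layer_size = 4

print(len(attention_vector(cell_output, context)))          # 3
print(len(attention_vector(cell_output, context, layer)))   # 4
```

With the sizes used in this document's full example (hidden dimension 6, encoder depth 30, attention_layer_size=13), the concatenated vector would have 36 entries and the attention 13.
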
 45 | 
 46 | ### output_attention
 47 | * If `output_attention = True`, the attention vector becomes the output of the RNN.
 48 | * If `output_attention = False`, the decoder's hidden state becomes the output of the RNN.
 49 | * Some models use the concatenation of the attention vector and the decoder hidden state as the RNN output; in that case AttentionWrapper must be customized.
 50 | 
 51 | 
 52 | ## [Full Code]
 53 | ```python
 54 | # coding: utf-8
 55 | import tensorflow as tf
 56 | import numpy as np
 57 | 
 58 | tf.reset_default_graph()
 59 | def attention_test():
 60 |     vocab_size = 5
 61 |     SOS_token = 0
 62 |     EOS_token = 4
 63 |     
 64 |     x_data = np.array([[SOS_token, 3, 1, 2, 3, 2],[SOS_token, 3, 1, 2, 3, 1],[SOS_token, 1, 3, 2, 2, 1]], dtype=np.int32)
 65 |     y_data = np.array([[1,2,0,3,2,EOS_token],[3,2,3,3,1,EOS_token],[3,1,1,2,0,EOS_token]],dtype=np.int32)
 66 |     Y = tf.convert_to_tensor(y_data)
 67 |     print("data shape: ", x_data.shape)
 68 |     sess = tf.InteractiveSession()
 69 |     
 70 |     output_dim = vocab_size
 71 |     batch_size = len(x_data)
 72 |     hidden_dim =6
 73 |     seq_length = x_data.shape[1]
 74 |     embedding_dim = 8
 75 | 
 76 |     init = np.arange(vocab_size*embedding_dim).reshape(vocab_size,-1)
 77 |     
 78 |     alignment_history_flag = True   # If True, do not sess.run initial_state or last_state directly: alignment_history is a TensorArray, not a Tensor.
 79 |     with tf.variable_scope('test',reuse=tf.AUTO_REUSE) as scope:
 80 |         # Make rnn cell
 81 |         cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim)
 82 |         
 83 |         
 84 |         embedding = tf.get_variable("embedding", initializer=init.astype(np.float32),dtype = tf.float32)
 85 |         inputs = tf.nn.embedding_lookup(embedding, x_data) # batch_size  x seq_length x embedding_dim
 86 |     
 87 |         # encoder_outputs is the output of the Encoder, usually called the memory. This is a toy model, so it is filled with random values.
 88 |         encoder_outputs = tf.convert_to_tensor(np.random.normal(0,1,[batch_size,20,30]).astype(np.float32)) # 20: encoder sequence length, 30: encoder hidden dim
 89 |         
 90 |         # encoder_outputs has length 20, but the effective length of each example can be set as follows.
 91 |         input_lengths = [5,10,20]  # keeps attention away from padded encoder positions
 92 |         
 93 |         # attention mechanism  # num_units = Na = 11
 94 |         attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=11, memory=encoder_outputs,memory_sequence_length=input_lengths,normalize=False)
 95 | 
 96 |         
 97 |         attention_initial_state = cell.zero_state(batch_size, tf.float32)
 98 |         cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=13,initial_cell_state=attention_initial_state,
 99 |                                                    alignment_history=alignment_history_flag,output_attention=True)
100 |         cell = tf.contrib.rnn.OutputProjectionWrapper(cell,output_dim)
101 |         
102 |         # Calling zero_state here picks up the attention_initial_state passed to the AttentionWrapper above, i.e. AttentionWrapperState.cell_state holds the value we supplied.
103 |         initial_state = cell.zero_state(batch_size, tf.float32) # AttentionWrapperState
104 |  
105 |         helper = tf.contrib.seq2seq.TrainingHelper(inputs, np.array([seq_length]*batch_size))
106 | 
107 |         decoder = tf.contrib.seq2seq.BasicDecoder(cell=cell,helper=helper,initial_state=initial_state)    
108 | 
109 |         outputs, last_state, last_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder=decoder,output_time_major=False,impute_finished=True)
110 |      
111 |         weights = tf.ones(shape=[batch_size,seq_length])
112 |         loss =   tf.contrib.seq2seq.sequence_loss(logits=outputs.rnn_output, targets=Y, weights=weights)
113 |      
114 |         opt = tf.train.AdamOptimizer(0.01).minimize(loss)
115 |         
116 |         sess.run(tf.global_variables_initializer())
117 |         for i in range(100):
118 |             loss_,_ =sess.run([loss,opt])
119 |             print("{} loss: = {}".format(i,loss_))
120 |         
121 |         if alignment_history_flag ==False:
122 |             print("initial_state: ", sess.run(initial_state))
123 |         print("\n\noutputs: ",outputs)
124 |         o = sess.run(outputs.rnn_output)  #batch_size, seq_length, outputs
125 |         o2 = sess.run(tf.argmax(outputs.rnn_output,axis=-1))
126 |         print("\n",o,o2) #batch_size, seq_length, outputs
127 |      
128 |         print("\n\nlast_state: ",last_state)
129 |         if alignment_history_flag == False:
130 |             print(sess.run(last_state)) # batch_size, hidden_dim
131 |         else:
132 |             print("alignment_history: ", last_state.alignment_history.stack())
133 |             alignment_history_ = sess.run(last_state.alignment_history.stack())
134 |             print(alignment_history_)
135 |             print("alignment_history sum: ",np.sum(alignment_history_,axis=-1))
136 |             
137 |             print("cell_state: ", sess.run(last_state.cell_state))
138 |             print("attention: ", sess.run(last_state.attention))
139 |             print("time: ", sess.run(last_state.time))
140 |             
141 |             alignments_ = sess.run(last_state.alignments)
142 |             print("alignments: ", alignments_)
143 |             print('alignments sum: ', np.sum(alignments_,axis=1))   # check that the alignments sum to 1
144 |             print("attention_state: ", sess.run(last_state.attention_state))
145 | 
146 |         print("\n\nlast_sequence_lengths: ",last_sequence_lengths)
147 |         print(sess.run(last_sequence_lengths)) #  [seq_length]*batch_size    
148 |      
149 |         p = sess.run(tf.nn.softmax(outputs.rnn_output)).reshape(-1,output_dim)
150 |         print("loss: {:20.6f}".format(sess.run(loss)))
151 |         print("manual cal. loss: {:0.6f} ".format(np.average(-np.log(p[np.arange(y_data.size),y_data.flatten()]))) )   
152 | 
153 | if __name__ == '__main__':
154 |     attention_test()
155 | ```
156 | 
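The effect of `memory_sequence_length` (the `input_lengths` in the code above) can be pictured as masking the scores of padded positions before the softmax. The sketch below is plain Python with a hypothetical function name and toy scores; it illustrates the idea and is not the Tensorflow implementation.

```python
import math

def masked_softmax(scores, valid_length):
    """Softmax over scores where positions >= valid_length get zero weight.

    Assumes valid_length >= 1 so that at least one score survives the mask.
    """
    masked = [s if i < valid_length else float("-inf") for i, s in enumerate(scores)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]   # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.2, 1.5, -0.3, 0.9, 0.0]   # hypothetical scores for T_e = 5
alignments = masked_softmax(scores, valid_length=3)
print(alignments)
```

The masked positions receive exactly zero weight, so the remaining alignments still sum to 1, which matches the "alignments sum" check printed by the full example.
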
157 | ### Summary
158 | * input dimension = 8, hidden dimension = 6, (attention mechanism) num_units=11, (AttentionWrapper) attention_layer_size=13
159 | * encoder output shape = (N,20,30)
160 | ![decode](./attention-shape.png)
161 | * The matrix product in the computation of e_i can be carried out in two ways. [Code](https://gist.github.com/hccho2/81265eea686465fc0fd7aba5cbb73051)
162 | ### Quiz
163 | * T_e = ?
164 | * Given a: (N,20) and h: (N,20,30), implement the computation of the context c with tensorflow operations.
165 | 
166 | [Answer](https://github.com/hccho2/Tensorflow-RNN-Tutorial/blob/master/4.%20Attention%20with%20Tensorflow/answer.png)
167 | 


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/answer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/answer.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/attentioin-dynamic-rnn-decode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/attentioin-dynamic-rnn-decode.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/attention-layer-size.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/attention-layer-size.png


--------------------------------------------------------------------------------
/4. Attention with Tensorflow/attention-shape.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/4. Attention with Tensorflow/attention-shape.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Tensorflow RNN-tutorial
 2 | 
 3 | <p align="center"><img width="700" src="TF-RNN.png" />  </p>
 4 | 
 5 | 
 6 | #### -1. [Tensorflow RNN Basic of Basic](https://github.com/hccho2/Tensorflow-RNN-Tutorial/tree/master/-1.%20Tensorflow%20RNN%20Basic%20of%20Basic)
 7 | This section covers dynamic_rnn, the API you meet first when starting to study RNN models in Tensorflow.
 8 | - dynamic_rnn is the basic RNN model API to learn before moving on to Seq2Seq (Encoder-Decoder) models.
 9 | 
10 | 
11 | #### 0. [Basic RNN Model](https://github.com/hccho2/RNN-Tutorial/tree/master/0.%20Basic)
12 | The following Tensorflow APIs show how a basic RNN model works.
13 | - Learn how to use Tensorflow RNN models through dynamic_decode.
14 | - BasicRNNCell, BasicLSTMCell, GRUCell
15 | - TrainingHelper, GreedyEmbeddingHelper
16 | - BasicDecoder
17 | - dynamic_decode
18 | 
19 | #### 1. [User Defined RNNWrapper](https://github.com/hccho2/RNN-Tutorial/tree/master/1.%20RNNWrapper) 
20 | Learn how to build a user defined RNN Wrapper. 
21 | - Define a user defined RNN Wrapper class by subclassing RNNCell.
22 | - The RNN Wrapper built here can replace BasicRNNCell.
23 | 
24 | 
25 | 
26 | 
27 | 
28 | #### 2. [User Defined Helper](https://github.com/hccho2/Tensorflow-RNN-Tutorial/tree/master/2.%20User%20Defined%20Helper)
29 | Let's build a user defined Helper that can replace the commonly used TrainingHelper, GreedyEmbeddingHelper, and SampleEmbeddingHelper.
30 | - For models such as Tacotron, implementing the RNN decoder requires a user defined Helper.
31 | 
32 | 
33 | #### 3. [User Defined Decoder](https://github.com/hccho2/Tensorflow-RNN-Tutorial/tree/master/3.%20User%20Defined%20Decoder)
34 | Let's build a user defined Decoder that can replace BasicDecoder.
35 | 
36 | 
37 | 
38 | #### 4. [Attention with Tensorflow](https://github.com/hccho2/Tensorflow-RNN-Tutorial/tree/master/4.%20Attention%20with%20Tensorflow)
39 | - Learn how Bahdanau Attention and Luong Attention work inside Tensorflow.
40 | 
41 | 


--------------------------------------------------------------------------------
/TF-RNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hccho2/Tensorflow-RNN-Tutorial/c27d72d3247b55a3d79b102a37558b5ba2242d1d/TF-RNN.png


--------------------------------------------------------------------------------
/a.bat:
--------------------------------------------------------------------------------
1 | git add --all
2 | git commit -am.
3 | git push origin +master


--------------------------------------------------------------------------------
/b.bat:
--------------------------------------------------------------------------------
1 | git fetch --all
2 | git reset --hard origin/master


--------------------------------------------------------------------------------