├── Convex Hull Accuracy.png ├── README.md ├── convex_hull.png ├── convex_hull_multi_hot.csv ├── convex_hull_one_hot.csv ├── convex_hull_softmax.csv ├── dataset.py ├── main.py ├── pointer.py ├── pointer_networks.m ├── ptr-net2.png ├── seq2seq.png └── seq2seqVSptr.jpg /Convex Hull Accuracy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/Convex Hull Accuracy.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pointer-networks 2 | 3 | Tensorflow implementation of Pointer Networks, modified to use a threshold (or hardmax) pointer instead of a softmax pointer. 4 | ## What is a Pointer Network? 5 | In a normal sequence-to-sequence model, we train a recurrent neural network (RNN) to output a sequence of elements from an output dictionary. In Vinyals et al.'s `Ptr-Net` architecture, we train an RNN to choose an element of the input sequence to output. 6 | 7 | To illustrate the differences between the two models, consider how the two networks treat a toy problem - computing the planar convex hull of the five points (generated randomly using numpy) `(0.248, 0.683), (0.986, 0.224), (0.006, 1.000), (0.127, 0.157), (0.165, 0.274)`. In this case, the input is the sequence `[(0.248, 0.683), (0.986, 0.224), (0.006, 1.000), (0.127, 0.157), (0.165, 0.274)]` and the desired output is a permutation of `1, 2, 3` (the 0-indexed positions of the hull points in the input) - corresponding to the three points on the convex hull, `(0.006, 1.000), (0.127, 0.157), (0.986, 0.224)`: 8 | 9 | ![Convex Hull of Five Points](https://github.com/Chanlaw/pointer-networks/blob/master/convex_hull.png "Convex Hull of Five Points") 10 | 11 | In a sequence-to-sequence model, the decoding RNN is simply a normal RNN: 12 | ![Sequence-to-sequence Model](https://github.com/Chanlaw/pointer-networks/blob/master/seq2seq.png "Sequence-to-sequence model") 13 | In a Pointer Network, the output of the decoding RNN is used to modulate an attention mechanism over the elements of the original input sequence: 14 | ![Pointer Network](https://github.com/Chanlaw/pointer-networks/blob/master/ptr-net2.png "Pointer Network") 15 | 16 | Here we introduce two new variants of the original pointer net: `Hard-Ptr-Net` and `Multi-Ptr-Net`. The difference between the three networks is what gets fed back into the decoder during inference. In the original implementation, we take the softmax over the outputs of the pointer network and use the resulting weights to blend the elements of the input sequence that are fed back to the network. 17 | 18 | For `Hard-Ptr-Net`, we take the argmax of the outputs and use it to choose a single element of the input sequence to feed back to the network. 19 | 20 | For `Multi-Ptr-Net`, we take the average of the elements of the input sequence whose outputs are greater than a threshold (`0.3` by default). (This means that the network "points" to multiple elements of the input sequence.) A short numerical sketch of all three pointer rules is given below, under "Pointer Variants at a Glance". 21 | ## Running and Evaluating Pointer Networks 22 | The `main.py` file contains code for building and training the pointer network. To build the original `Ptr-Net` of Vinyals et al.
and train it on the Convex Hull problem, run: 23 | ``` 24 | python main.py --pointer_type=softmax --problem_type=convex_hull 25 | ``` 26 | To build the `Hard-Ptr-Net` and train it on the convex hull problem: 27 | ``` 28 | python main.py --pointer_type=one_hot --problem_type=convex_hull 29 | ``` 30 | And to build the `Multi-Ptr-Net`: 31 | ``` 32 | python main.py --pointer_type=multi_hot --problem_type=convex_hull 33 | ``` 34 | 35 | ### Other Parameters 36 | In addition to the type of pointer, we can also play around with the following parameters: 37 | ``` 38 | batch_size: Batch size. Default 128. 39 | checkpoint_dir: Directory to store checkpoints. Default 'checkpoints'. 40 | log_dir: Directory to store tensorboard log files. Default 'pointer_logs'. 41 | max_len: The problem size. Default 50. 42 | num_steps: The number of steps to train the network for. Default 100K. 43 | rnn_size: The number of RNN units in each layer. Default 512. 44 | num_layers: The number of layers in the network. Default 1. 45 | problem_type: What kind of problem to train on: one of 'convex_hull' or 'sort'. Default 'convex_hull'. 46 | steps_per_checkpoint: Print averaged train loss, test loss, and accuracy to console after this many steps. Default 200. 47 | learning_rate: The learning rate. Default 0.001. 48 | to_csv: Whether or not to export averaged loss and test accuracies to CSV. Default True. 49 | 50 | ``` 51 | For example, to run a `Ptr-Net` of size `128` on the problem of sorting `10` numbers, run: 52 | ``` 53 | python main.py --pointer_type=softmax --rnn_size=128 --problem_type=sort --max_len=10 54 | ``` 55 | ### Tensorboard Logging 56 | The code supports Tensorboard logging for (test) accuracy, (training) loss, and test loss. The default log directory is `./pointer_logs/`. To run Tensorboard: 57 | ``` 58 | tensorboard --logdir=pointer_logs 59 | ``` 60 | Then navigate to the address Tensorboard is running at. (The default should be `0.0.0.0:6006`.)
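### Pointer Variants at a Glance
The three pointer rules described above differ only in how the attention weights produced at each decoding step are turned into the next decoder input. The numpy sketch below is purely illustrative - it is not the repository code, and the attention scores in it are made up - but it shows the softmax, one-hot, and multi-hot rules side by side:
```
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Five input points and a made-up attention (pointer) distribution for one decoding step.
points = np.array([[0.248, 0.683], [0.986, 0.224], [0.006, 1.000],
                   [0.127, 0.157], [0.165, 0.274]])
attn = softmax(np.array([0.1, 2.0, 1.5, 0.2, 0.3]))

# Ptr-Net (softmax): blend all points, weighted by the attention distribution.
softmax_input = attn @ points

# Hard-Ptr-Net (one_hot): feed back only the point with the largest attention weight.
one_hot = np.eye(len(points))[attn.argmax()]
one_hot_input = one_hot @ points

# Multi-Ptr-Net (multi_hot): average every point whose weight exceeds the threshold
# (0.3 by default); the argmax point is always included.
mask = np.maximum((attn >= 0.3).astype(float), one_hot)
multi_hot_input = (mask / mask.sum()) @ points

print(softmax_input, one_hot_input, multi_hot_input)
```
In `pointer.py` this selection happens inside the decoding loop, and only at inference time (`feed_prev=True`); during training the ground-truth decoder inputs are fed instead.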
61 | ## Reference 62 | - Oriol Vinyals, Meire Fortunato, Navdeep Jaitly, "Pointer Networks" [arXiv:1506.03134](http://arxiv.org/abs/1506.03134) 63 | -------------------------------------------------------------------------------- /convex_hull.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/convex_hull.png -------------------------------------------------------------------------------- /dataset.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division, print_function 2 | 3 | import numpy as np 4 | from scipy.spatial import ConvexHull 5 | 6 | 7 | class DataGenerator(object): 8 | 9 | def __init__(self): 10 | """Construct a DataGenerator.""" 11 | pass 12 | 13 | def next_batch(self, batch_size, N, train_mode=True, convex_hull=False): 14 | """Return the next batch of data.""" 15 | # If training on the convex hull problem: sequence of random points from [0, 1] x [0, 1] 16 | # If training on the sorting problem: sequence of random real numbers in [0, 1] 17 | reader_input_batch = [] 18 | 19 | # Target sequence (shifted by one step) that we feed to the decoder during training 20 | # In inference we feed the unordered input sequence instead 21 | decoder_input_batch = [] 22 | 23 | # Desired outputs: at each step a one-hot vector encoding the position of the chosen element in the input (index 0 is the stop symbol) 24 | writer_outputs_batch = [] 25 | 26 | if convex_hull: 27 | for _ in range(N): 28 | reader_input_batch.append(np.zeros([batch_size, 2])) 29 | for _ in range(N+1): 30 | decoder_input_batch.append(np.zeros([batch_size, 2])) 31 | writer_outputs_batch.append(np.zeros([batch_size, N + 1])) 32 | 33 | for b in range(batch_size): 34 | sequence = np.random.rand(N, 2) 35 | leftmost_point = np.argmin(sequence[:,0]) 36 | hull = ConvexHull(sequence) 37 | v = hull.vertices 38 | v = np.roll(v, -list(v).index(leftmost_point)) # start from the leftmost point 39 | for i in range(N): 40 | reader_input_batch[i][b] = sequence[i] 41 | 42 | for i in range(len(v)): 43 | if train_mode: 44 | decoder_input_batch[i + 1][b] = sequence[v[i]] 45 | else: 46 | decoder_input_batch[i + 1][b] = sequence[i] 47 | writer_outputs_batch[i][b, v[i]+1] = 1.0 48 | 49 | # Write the stop symbol 50 | for i in range(len(v), N): 51 | writer_outputs_batch[i][b, 0] = 1.0 52 | if not train_mode: 53 | decoder_input_batch[i + 1][b] = sequence[i] 54 | writer_outputs_batch[N][b, 0] = 1.0 55 | else: 56 | 57 | for _ in range(N): 58 | reader_input_batch.append(np.zeros([batch_size, 1])) 59 | for _ in range(N + 1): 60 | decoder_input_batch.append(np.zeros([batch_size, 1])) 61 | writer_outputs_batch.append(np.zeros([batch_size, N + 1])) 62 | 63 | for b in range(batch_size): 64 | shuffle = np.random.permutation(N) 65 | sequence = np.sort(np.random.random(N)) 66 | shuffled_sequence = sequence[shuffle] 67 | 68 | for i in range(N): 69 | reader_input_batch[i][b] = shuffled_sequence[i] 70 | if train_mode: 71 | decoder_input_batch[i + 1][b] = sequence[i] 72 | else: 73 | decoder_input_batch[i + 1][b] = shuffled_sequence[i] 74 | writer_outputs_batch[shuffle[i]][b, i + 1] = 1.0 75 | 76 | # Points to the stop symbol 77 | writer_outputs_batch[N][b, 0] = 1.0 78 | 79 | 80 | return reader_input_batch, decoder_input_batch, writer_outputs_batch 81 | if __name__ == "__main__": 82 | dataset = DataGenerator() 83 | r, d, w = dataset.next_batch(1, 5, train_mode=False, convex_hull=True) 84 | print("Reader: ", r) 85 | print("Decoder: ", d) 86 | print("Writer: ", w) 87 |
-------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | """Implementation of Pointer networks: http://arxiv.org/pdf/1506.03134v1.pdf. 2 | """ 3 | 4 | 5 | from __future__ import absolute_import, division, print_function 6 | 7 | import random 8 | import os 9 | import math 10 | import numpy as np 11 | import pandas as pd 12 | import tensorflow as tf 13 | 14 | from dataset import DataGenerator 15 | from pointer import pointer_decoder 16 | 17 | flags = tf.app.flags 18 | flags.DEFINE_integer('batch_size', 128, 'Batch size.') 19 | flags.DEFINE_integer('max_len', 50, 'Size of problem.') 20 | flags.DEFINE_integer('num_steps', 100000, 'Number of steps to train for.') 21 | flags.DEFINE_integer('rnn_size', 512, 'Number of RNN cells in each layer.') 22 | flags.DEFINE_integer('num_layers', 1, 'Number of layers in the network.') 23 | flags.DEFINE_boolean('load_from_checkpoint', False, 'Whether to load from checkpoint.') 24 | flags.DEFINE_string('checkpoint_dir', 'checkpoints', 'Directory to store checkpoints.') 25 | flags.DEFINE_string('log_dir', 'pointer_logs', 'Directory to put tensorboard log files.') 26 | flags.DEFINE_string('problem_type', 'convex_hull', 'What kind of problem to train on: "convex_hull" or "sort".') 27 | flags.DEFINE_string('pointer_type', 'one_hot', 'What kind of pointer to use: "multi_hot", "one_hot", or "softmax".') 28 | flags.DEFINE_integer('steps_per_checkpoint', 200, 'How many training steps to do per checkpoint.') 29 | flags.DEFINE_float('learning_rate', 0.001, "Learning rate.") 30 | flags.DEFINE_boolean('to_csv', True, "If true, export the averaged loss and test accuracies to CSV.") 31 | FLAGS = flags.FLAGS 32 | 33 | class PointerNetwork(object): 34 | 35 | def __init__(self, max_len, input_size, size, num_layers, batch_size, learning_rate): 36 | """Create the network. 37 | 38 | Args: 39 | max_len: maximum length of the model. 40 | input_size: size of the input data. 41 | size: number of units in each layer of the model. 42 | num_layers: number of layers in the model. 43 | batch_size: the size of the batches used during training; 44 | the model construction is independent of batch_size, so it can be 45 | changed after initialization if this is convenient, e.g., for decoding. 46 | learning_rate: learning rate to start with.
47 | """ 48 | self.max_len = max_len 49 | self.batch_size = batch_size 50 | self.learning_rate = learning_rate 51 | self.global_step = tf.Variable(0, trainable=False) 52 | 53 | 54 | cell = tf.nn.rnn_cell.LSTMCell(size, initializer=tf.random_uniform_initializer(-0.08, 0.08)) 55 | if num_layers > 1: 56 | cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers) 57 | 58 | self.encoder_inputs = [] 59 | self.decoder_inputs = [] 60 | self.decoder_targets = [] 61 | self.target_weights = [] 62 | for i in range(max_len): 63 | self.encoder_inputs.append(tf.placeholder( 64 | tf.float32, [batch_size, input_size], name="EncoderInput%d" % i)) 65 | 66 | for i in range(max_len + 1): 67 | self.decoder_inputs.append(tf.placeholder( 68 | tf.float32, [batch_size, input_size], name="DecoderInput%d" % i)) 69 | self.decoder_targets.append(tf.placeholder( 70 | tf.float32, [batch_size, max_len + 1], name="DecoderTarget%d" % i)) # one hot 71 | self.target_weights.append(tf.placeholder( 72 | tf.float32, [batch_size, 1], name="TargetWeight%d" % i)) 73 | 74 | # Need for attention 75 | encoder_outputs, final_state = tf.nn.rnn(cell, self.encoder_inputs, dtype = tf.float32) 76 | 77 | # Need a dummy output to point on it. End of decoding. 78 | encoder_outputs = [tf.zeros([batch_size, size])] + encoder_outputs 79 | 80 | # First calculate a concatenation of encoder outputs to put attention on. 81 | top_states = [tf.reshape(e, [-1, 1, cell.output_size]) 82 | for e in encoder_outputs] 83 | attention_states = tf.concat(1, top_states) 84 | 85 | #For training 86 | with tf.variable_scope("decoder"): 87 | outputs, states, _ = pointer_decoder( 88 | self.decoder_inputs, final_state, attention_states, cell, feed_prev=False, pointer_type=FLAGS.pointer_type) 89 | 90 | #For inference 91 | with tf.variable_scope("decoder", reuse=True): 92 | predictions, _, inps = pointer_decoder( 93 | self.decoder_inputs, final_state, attention_states, cell, feed_prev=True, pointer_type=FLAGS.pointer_type) 94 | 95 | self.predictions = predictions 96 | self.outputs = outputs 97 | self.inps = inps 98 | 99 | 100 | def create_feed_dict(self, encoder_input_data, decoder_input_data, decoder_target_data): 101 | feed_dict = {} 102 | for placeholder, data in zip(self.encoder_inputs, encoder_input_data): 103 | feed_dict[placeholder] = data 104 | 105 | for placeholder, data in zip(self.decoder_inputs, decoder_input_data): 106 | feed_dict[placeholder] = data 107 | 108 | for placeholder, data in zip(self.decoder_targets, decoder_target_data): 109 | feed_dict[placeholder] = data 110 | 111 | for placeholder in self.target_weights: 112 | feed_dict[placeholder] = np.ones([self.batch_size, 1]) 113 | 114 | return feed_dict 115 | 116 | def step(self): 117 | 118 | loss = 0.0 119 | for output, target, weight in zip(self.outputs, self.decoder_targets, self.target_weights): 120 | loss += tf.nn.softmax_cross_entropy_with_logits(output, target) * weight 121 | 122 | loss = tf.reduce_mean(loss) 123 | tf.scalar_summary('loss', loss) 124 | 125 | test_loss = 0.0 126 | for output, target, weight in zip(self.predictions, self.decoder_targets, self.target_weights): 127 | test_loss += tf.nn.softmax_cross_entropy_with_logits(output, target) * weight 128 | 129 | tf.histogram_summary('predictions', self.predictions) 130 | test_loss = tf.reduce_mean(test_loss) 131 | tf.scalar_summary('test_loss', test_loss) 132 | optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate) 133 | train_op = optimizer.minimize(loss) 134 | 135 | train_loss_value = 0.0 136 | test_loss_value = 0.0 137 | 
test_acc_value = 0.0 138 | 139 | correct_order = 0.0 140 | all_order = 0.0 141 | 142 | predictions_order = tf.concat(0,[tf.expand_dims(prediction , 0) for prediction in self.predictions]) 143 | predictions_order = tf.transpose(tf.argmax(predictions_order, 2), perm=[1,0]) 144 | 145 | targets_order = tf.concat(0,[tf.expand_dims(target, 0) for target in self.decoder_targets]) 146 | 147 | targets_order = tf.transpose(tf.argmax(targets_order, 2), perm=[1,0]) 148 | 149 | correct_order += tf.reduce_sum(tf.cast(tf.reduce_all(tf.equal(predictions_order,targets_order), 1), tf.float32)) 150 | all_order += self.batch_size 151 | 152 | acc = correct_order/all_order 153 | tf.scalar_summary('accuracy', acc) 154 | 155 | sess = tf.Session() 156 | previous_losses = [] 157 | test_losses = [] 158 | test_accuracies = [] 159 | merged = tf.merge_all_summaries() 160 | 161 | #add op to save and restore all the variables 162 | saver = tf.train.Saver() 163 | 164 | with sess.as_default(): 165 | train_writer = tf.train.SummaryWriter(FLAGS.log_dir + "/" + FLAGS.problem_type +"/" + FLAGS.pointer_type+ "/train", sess.graph) 166 | test_writer = tf.train.SummaryWriter(FLAGS.log_dir + "/" + FLAGS.problem_type +"/" + FLAGS.pointer_type + "/test", sess.graph) 167 | init = tf.initialize_all_variables() 168 | sess.run(init) 169 | 170 | if FLAGS.load_from_checkpoint: 171 | print("Loading from checkpoint...") 172 | saver.restore(sess, FLAGS.checkpoint_dir+"/" + FLAGS.pointer_type + "/model.ckpt") 173 | print("Training network...") 174 | 175 | for i in xrange(FLAGS.num_steps): 176 | encoder_input_data, decoder_input_data, targets_data = dataset.next_batch( 177 | self.batch_size, self.max_len, convex_hull=(FLAGS.problem_type=="convex_hull")) 178 | # Train 179 | feed_dict = self.create_feed_dict( 180 | encoder_input_data, decoder_input_data, targets_data) 181 | 182 | if (i+1)%FLAGS.steps_per_checkpoint == 0: 183 | #record run metadata 184 | run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE) 185 | run_metadata = tf.RunMetadata() 186 | summary, d_x, l = sess.run([merged, loss, train_op], 187 | feed_dict=feed_dict, 188 | options=run_options, 189 | run_metadata=run_metadata) 190 | train_writer.add_run_metadata(run_metadata, 'step%d'%(i+1)) 191 | train_writer.add_summary(summary, (i+1)) 192 | else: 193 | summary, d_x, l = sess.run([merged, loss, train_op], feed_dict=feed_dict) 194 | train_writer.add_summary(summary, (i+1)) 195 | 196 | train_loss_value += d_x/FLAGS.steps_per_checkpoint 197 | 198 | if (i+1) % FLAGS.steps_per_checkpoint == 0: 199 | print('Step:', i+1, 'Learning rate:', self.learning_rate) 200 | # store checkpoint 201 | saver.save(sess, FLAGS.checkpoint_dir+"/" + FLAGS.pointer_type + "/model.ckpt") 202 | print("Train Loss: ", train_loss_value) 203 | previous_losses.append(train_loss_value) 204 | train_loss_value = 0 205 | 206 | encoder_input_data, decoder_input_data, targets_data = dataset.next_batch( 207 | self.batch_size, self.max_len, train_mode=False, convex_hull=(FLAGS.problem_type=="convex_hull")) 208 | # Test 209 | feed_dict = self.create_feed_dict( 210 | encoder_input_data, decoder_input_data, targets_data) 211 | inps_ = sess.run(self.inps, feed_dict=feed_dict) 212 | predictions = sess.run(self.predictions, feed_dict=feed_dict) 213 | 214 | summary, test_loss_, test_acc = sess.run([merged, test_loss, acc], feed_dict=feed_dict) 215 | test_writer.add_summary(summary, (i+1)) 216 | 217 | test_loss_value += test_loss_/FLAGS.steps_per_checkpoint 218 | test_acc_value += test_acc/FLAGS.steps_per_checkpoint 219 | 
220 | if (i+1) % FLAGS.steps_per_checkpoint == 0: 221 | print("Test Loss: ", test_loss_value) 222 | test_losses.append(test_loss_value) 223 | test_loss_value = 0.0 224 | print('Test Accuracy: %.5f' % test_acc_value) 225 | test_accuracies.append(test_acc_value) 226 | test_acc_value = 0.0 227 | print("----") 228 | # export data to csv 229 | if (FLAGS.to_csv): 230 | output=pd.DataFrame(data={'train_loss': previous_losses, 'test_loss': test_losses, 'test_accuracy': test_accuracies}) 231 | output.to_csv('./pointer_logs/'+ FLAGS.problem_type+'_' + FLAGS.pointer_type+'.csv') 232 | 233 | 234 | if __name__ == "__main__": 235 | # Make log and checkpoint directories if necessary 236 | if not os.path.exists(FLAGS.log_dir): 237 | os.makedirs(FLAGS.log_dir) 238 | if not os.path.exists(FLAGS.checkpoint_dir): 239 | os.makedirs(FLAGS.checkpoint_dir) 240 | print("Creating pointer network...") 241 | pointer_network = PointerNetwork(FLAGS.max_len, 2 - (FLAGS.problem_type == 'sort'), FLAGS.rnn_size, 242 | FLAGS.num_layers, FLAGS.batch_size, FLAGS.learning_rate) 243 | dataset = DataGenerator() 244 | pointer_network.step() 245 | 246 | -------------------------------------------------------------------------------- /pointer.py: -------------------------------------------------------------------------------- 1 | # Copyright 2015 Google Inc. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """A pointer-network helper. 
17 | Based on the attention_decoder implementation from TensorFlow 18 | https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn.py 19 | """ 20 | 21 | from __future__ import absolute_import 22 | from __future__ import division 23 | from __future__ import print_function 24 | 25 | from six.moves import xrange # pylint: disable=redefined-builtin 26 | 27 | import tensorflow as tf 28 | 29 | from tensorflow.python.framework import dtypes 30 | from tensorflow.python.framework import ops 31 | from tensorflow.python.ops import array_ops 32 | from tensorflow.python.ops import control_flow_ops 33 | from tensorflow.python.ops import embedding_ops 34 | from tensorflow.python.ops import math_ops 35 | from tensorflow.python.ops import nn_ops 36 | from tensorflow.python.ops import rnn 37 | from tensorflow.python.ops import rnn_cell 38 | from tensorflow.python.ops import sparse_ops 39 | from tensorflow.python.ops import variable_scope as vs 40 | 41 | def one_hot(inp, attn_length): 42 | output = tf.one_hot(tf.argmax(inp, dimension=1),attn_length) 43 | return output 44 | 45 | def multi_hot(inp, attn_length, threshold=0.3): 46 | output = tf.maximum( 47 | tf.select(tf.greater_equal(inp,tf.fill(tf.shape(inp),threshold)), tf.ones_like(inp) , tf.zeros_like(inp)), 48 | tf.one_hot(tf.argmax(inp, dimension=1),attn_length)) 49 | # normalize output so each row sums to 1 50 | normalizer = tf.expand_dims(tf.reduce_sum(output, reduction_indices=1), 1) 51 | output = tf.div(output, normalizer) 52 | return output 53 | 54 | def pointer_decoder(decoder_inputs, initial_state, attention_states, cell, 55 | feed_prev=True, dtype=dtypes.float32, scope=None, pointer_type="one_hot"): 56 | """RNN decoder with pointer net for the sequence-to-sequence model. 57 | Args: 58 | decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. 59 | initial_state: 2D Tensor [batch_size x cell.state_size]. 60 | attention_states: 3D Tensor [batch_size x attn_length x attn_size]. 61 | cell: rnn_cell.RNNCell defining the cell function and size. 62 | dtype: The dtype to use for the RNN initial state (default: tf.float32). 63 | scope: VariableScope for the created subgraph; default: "pointer_decoder". 64 | Returns: 65 | outputs: A list of the same length as decoder_inputs of 2D Tensors of shape 66 | [batch_size x attn_length]. These represent the generated outputs (pointer distributions over the input positions). 67 | Output i is computed from input i (which is either the i-th element of decoder_inputs or, when feed_prev is True, an input derived from the previous pointer output). 68 | First, we run the cell 69 | on a combination of the input and previous attention masks: 70 | cell_output, new_state = cell(linear(input, prev_attn), prev_state). 71 | Then, we calculate new attention masks: 72 | new_attn = softmax(V^T * tanh(W * attention_states + U * new_state)) 73 | and the attention mask itself is emitted as the output: 74 | output = new_attn. 75 | states: The state of each decoder cell in each time-step. This is a list 76 | with length len(decoder_inputs) -- one item for each time-step. 77 | Each item is a 2D Tensor of shape [batch_size x cell.state_size]. 78 | """ 79 | if not decoder_inputs: 80 | raise ValueError("Must provide at least 1 input to attention decoder.") 81 | if not attention_states.get_shape()[1:2].is_fully_defined(): 82 | raise ValueError("Shape[1] and [2] of attention_states must be known: %s" 83 | % attention_states.get_shape()) 84 | 85 | with vs.variable_scope(scope or "point_decoder") as scope: 86 | batch_size = array_ops.shape(decoder_inputs[0])[0] # Needed for reshaping.
87 | input_size = decoder_inputs[0].get_shape()[1].value 88 | attn_length = attention_states.get_shape()[1].value 89 | attn_size = attention_states.get_shape()[2].value 90 | 91 | # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before. 92 | hidden = array_ops.reshape( 93 | attention_states, [-1, attn_length, 1, attn_size]) 94 | 95 | attention_vec_size = attn_size # Size of query vectors for attention. 96 | k = vs.get_variable("AttnW", [1, 1, attn_size, attention_vec_size]) 97 | hidden_features = nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME") 98 | v = vs.get_variable("AttnV", [attention_vec_size]) 99 | 100 | states = [initial_state] 101 | 102 | def attention(query): 103 | """Compute the pointer (attention) distribution over hidden using hidden_features and query.""" 104 | with vs.variable_scope("Attention"): 105 | y = rnn_cell._linear(query, attention_vec_size, True) 106 | y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size]) 107 | # Attention mask is a softmax of v^T * tanh(...). 108 | a = math_ops.reduce_sum( 109 | v * math_ops.tanh(hidden_features + y), [2, 3]) 110 | a = nn_ops.softmax(a) 111 | return a 112 | 113 | outputs = [] 114 | prev = None 115 | batch_attn_size = array_ops.pack([batch_size, attn_size]) 116 | attns = array_ops.zeros(batch_attn_size, dtype=dtype) 117 | 118 | attns.set_shape([None, attn_size]) 119 | inps = [] 120 | for i in xrange(len(decoder_inputs)): 121 | if i > 0: 122 | vs.get_variable_scope().reuse_variables() 123 | inp = decoder_inputs[i] 124 | 125 | if feed_prev and i > 0: 126 | inp = tf.pack(decoder_inputs) 127 | inp = tf.transpose(inp, perm=[1, 0, 2]) 128 | inp = tf.reshape(inp, [-1, attn_length, input_size]) 129 | if pointer_type == "multi_hot": 130 | inp = tf.reduce_sum(inp * tf.reshape(multi_hot(output, attn_length), [-1, attn_length, 1]), 1) 131 | elif pointer_type == "one_hot": 132 | inp = tf.reduce_sum(inp * tf.reshape(one_hot(output, attn_length), [-1, attn_length, 1]), 1) 133 | elif pointer_type == "softmax": 134 | inp = tf.reduce_sum(inp * tf.reshape(output, [-1, attn_length, 1]), 1) 135 | else: 136 | raise ValueError('Pointer type must be one of "multi_hot", "one_hot", or "softmax".') 137 | inp = tf.stop_gradient(inp) 138 | inps.append(inp) 139 | 140 | # In inference the same (unordered) inputs are reused; the pointer chooses among them internally. 141 | 142 | # Merge input and previous attentions into one vector of the right size. 143 | x = rnn_cell._linear([inp, attns], cell.output_size, True) 144 | # Run the RNN. 145 | cell_output, new_state = cell(x, states[-1]) 146 | states.append(new_state) 147 | # Run the attention mechanism.
148 | output = attention(new_state) 149 | outputs.append(output) 150 | return outputs, states, inps 151 | -------------------------------------------------------------------------------- /pointer_networks.m: -------------------------------------------------------------------------------- 1 | Multi_hot = readtable('convex_hull_multi_hot.csv'); 2 | One_hot = readtable('convex_hull_one_hot.csv'); 3 | Softmax = readtable('convex_hull_softmax.csv'); 4 | 5 | Multi_hot{:,1} = round(Multi_hot{:,1}/5+0.5); 6 | Multi_hot = varfun(@mean,Multi_hot,'GroupingVariables','Var1') 7 | One_hot{:,1} = round(One_hot{:,1}/5+0.5); 8 | One_hot = varfun(@mean,One_hot,'GroupingVariables','Var1') 9 | Softmax{:,1} = round(Softmax{:,1}/5+0.5); 10 | Softmax = varfun(@mean,Softmax,'GroupingVariables','Var1') 11 | figure; 12 | hold on 13 | axis([0,1000,0,1]) 14 | plot(Multi_hot{:,1}, Multi_hot.mean_test_accuracy, 'r'); 15 | plot(One_hot{:,1}, One_hot.mean_test_accuracy, 'b'); 16 | plot(Softmax{:,1}, Softmax.mean_test_accuracy,'g'); 17 | xlabel('Step (x10^3)'); 18 | ylabel('Accuracy'); 19 | legend('Multi-Ptr-Net', 'Hard-Ptr-Net', 'Softmax'); 20 | hold off -------------------------------------------------------------------------------- /ptr-net2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/ptr-net2.png -------------------------------------------------------------------------------- /seq2seq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/seq2seq.png -------------------------------------------------------------------------------- /seq2seqVSptr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/seq2seqVSptr.jpg --------------------------------------------------------------------------------