├── Convex Hull Accuracy.png ├── README.md ├── convex_hull.png ├── convex_hull_multi_hot.csv ├── convex_hull_one_hot.csv ├── convex_hull_softmax.csv ├── dataset.py ├── main.py ├── pointer.py ├── pointer_networks.m ├── ptr-net2.png ├── seq2seq.png └── seq2seqVSptr.jpg /Convex Hull Accuracy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/Convex Hull Accuracy.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pointer-networks 2 | 3 | Tensorflow implementation of Pointer Networks, modified to use a threshold (or hardmax) pointer instead of a softmax pointer. 4 | ## What is a Pointer Network? 5 | In a normal sequence-to-sequence model, we train a recurrent neural network (RNN) to output a sequence of elements from an output dictionary. In Vinyals et al.'s `Ptr-Net` architecture, we train an RNN to choose an element of the input sequence to output. 6 | 7 | To illustrate the differences between the two models, consider how the two networks treat a toy problem - computing the planar convex hull of the five points (generated randomly using numpy) `(0.248, 0.683), (0.986, 0.224), (0.006, 1.000), (0.127, 0.157), (0.165, 0.274)`. In this case, the input is the sequence `[(0.248, 0.683), (0.986, 0.224), (0.006, 1.000), (0.127, 0.157), (0.165, 0.274)]` and the desired output is a permutation of `1, 2, 3` (the 0-indexed positions of the hull points in the input) - corresponding to the three points on the convex hull, `(0.006, 1.000), (0.127, 0.157), (0.986, 0.224)`: 8 | 9 | ![Convex Hull of Five Points](https://github.com/Chanlaw/pointer-networks/blob/master/convex_hull.png "Convex Hull of Five Points") 10 | 11 | In a sequence-to-sequence model, the decoding RNN is simply a normal RNN: 12 | ![Sequence-to-sequence Model](https://github.com/Chanlaw/pointer-networks/blob/master/seq2seq.png "Sequence-to-sequence model") 13 | In a Pointer Network, the output of the decoding RNN is used to modulate an attention mechanism over the elements of the original input sequence: 14 | ![Pointer Network](https://github.com/Chanlaw/pointer-networks/blob/master/ptr-net2.png "Pointer Network") 15 | 16 | Here we introduce two new variants of the original pointer net: `Hard-Ptr-Net` and `Multi-Ptr-Net`. The difference between the three networks is what gets fed back into the decoder during inference. In the original implementation, we take the softmax over the outputs of the pointer network and use the resulting weights to blend the elements of the input sequence that are fed back to the network. 17 | 18 | For `Hard-Ptr-Net`, we take the argmax of the outputs and use it to choose a single element of the input sequence to feed back to the network. 19 | 20 | For `Multi-Ptr-Net`, we take the average of the elements of the input sequence whose outputs are greater than a threshold (`0.3` by default). (This means that the network "points" to multiple elements of the input sequence.) A short numerical sketch of all three pointer rules is given below, under "Pointer Variants at a Glance". 21 | ## Running and Evaluating Pointer Networks 22 | The `main.py` file contains code for building and training the pointer network. To build the original `Ptr-Net` of Vinyals et al.
and train it on the Convex Hull problem, run: 23 | ``` 24 | python main.py --pointer_type=softmax --problem_type=convex_hull 25 | ``` 26 | To build the `Hard-Ptr-Net` and train it on the convex hull problem: 27 | ``` 28 | python main.py --pointer_type=one_hot --problem_type=convex_hull 29 | ``` 30 | And to build the `Multi-Ptr-Net`: 31 | ``` 32 | python main.py --pointer_type=multi_hot --problem_type=convex_hull 33 | ``` 34 | 35 | ### Other Parameters 36 | In addition to the type of pointer, we can also play around with the following parameters: 37 | ``` 38 | batch_size: Batch size. Default 128. 39 | checkpoint_dir: Directory to store checkpoints. Default 'checkpoints'. 40 | log_dir: Directory to store tensorboard log files. Default 'pointer_logs'. 41 | max_len: The problem size. Default 50. 42 | num_steps: The number of steps to train the network for. Default 100K. 43 | rnn_size: The number of RNN units in each layer. Default 512. 44 | num_layers: The number of layers in the network. Default 1. 45 | problem_type: What kind of problem to train on: one of 'convex_hull' or 'sort'. Default 'convex_hull'. 46 | steps_per_checkpoint: Print averaged train loss, test loss, and accuracy to console after this many steps. Default 200. 47 | learning_rate: The learning rate. Default 0.001. 48 | to_csv: Whether or not to export averaged loss and test accuracies to CSV. Default True. 49 | 50 | ``` 51 | For example, to run a `Ptr-Net` of size `128` on the problem of sorting `10` numbers, run: 52 | ``` 53 | python main.py --pointer_type=softmax --rnn_size=128 --problem_type=sort --max_len=10 54 | ``` 55 | ### Tensorboard Logging 56 | The code supports Tensorboard logging for (test) accuracy, (training) loss, and test loss. The default log directory is `./pointer_logs/`. To run Tensorboard: 57 | ``` 58 | tensorboard --logdir=pointer_logs 59 | ``` 60 | Then navigate to the address Tensorboard is running at. (The default should be `0.0.0.0:6006`.)
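### Pointer Variants at a Glance
The three pointer rules described above differ only in how the attention weights produced at each decoding step are turned into the next decoder input. The numpy sketch below is purely illustrative - it is not the repository code, and the attention scores in it are made up - but it shows the softmax, one-hot, and multi-hot rules side by side:
```
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Five input points and a made-up attention (pointer) distribution for one decoding step.
points = np.array([[0.248, 0.683], [0.986, 0.224], [0.006, 1.000],
                   [0.127, 0.157], [0.165, 0.274]])
attn = softmax(np.array([0.1, 2.0, 1.5, 0.2, 0.3]))

# Ptr-Net (softmax): blend all points, weighted by the attention distribution.
softmax_input = attn @ points

# Hard-Ptr-Net (one_hot): feed back only the point with the largest attention weight.
one_hot = np.eye(len(points))[attn.argmax()]
one_hot_input = one_hot @ points

# Multi-Ptr-Net (multi_hot): average every point whose weight exceeds the threshold
# (0.3 by default); the argmax point is always included.
mask = np.maximum((attn >= 0.3).astype(float), one_hot)
multi_hot_input = (mask / mask.sum()) @ points

print(softmax_input, one_hot_input, multi_hot_input)
```
In `pointer.py` this selection happens inside the decoding loop, and only at inference time (`feed_prev=True`); during training the ground-truth decoder inputs are fed instead.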
61 | ## Reference 62 | - Oriol Vinyals, Meire Fortunato, Navdeep Jaitly, "Pointer Networks" [arXiv:1506.03134](http://arxiv.org/abs/1506.03134) 63 | -------------------------------------------------------------------------------- /convex_hull.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/convex_hull.png -------------------------------------------------------------------------------- /dataset.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division, print_function 2 | 3 | import numpy as np 4 | from scipy.spatial import ConvexHull 5 | 6 | 7 | class DataGenerator(object): 8 | 9 | def __init__(self): 10 | """Construct a DataGenerator.""" 11 | pass 12 | 13 | def next_batch(self, batch_size, N, train_mode=True, convex_hull=False): 14 | """Return the next batch of data.""" 15 | # If training on the convex hull problem: sequence of random points from [0, 1] x [0, 1] 16 | # If training on the sorting problem: sequence of random real numbers in [0, 1] 17 | reader_input_batch = [] 18 | 19 | # Target sequence (shifted by one step) that we feed to the decoder during training 20 | # In inference we feed the unordered input sequence instead 21 | decoder_input_batch = [] 22 | 23 | # Desired outputs: at each step a one-hot vector encoding the position of the chosen element in the input (index 0 is the stop symbol) 24 | writer_outputs_batch = [] 25 | 26 | if convex_hull: 27 | for _ in range(N): 28 | reader_input_batch.append(np.zeros([batch_size, 2])) 29 | for _ in range(N+1): 30 | decoder_input_batch.append(np.zeros([batch_size, 2])) 31 | writer_outputs_batch.append(np.zeros([batch_size, N + 1])) 32 | 33 | for b in range(batch_size): 34 | sequence = np.random.rand(N, 2) 35 | leftmost_point = np.argmin(sequence[:,0]) 36 | hull = ConvexHull(sequence) 37 | v = hull.vertices 38 | v = np.roll(v, -list(v).index(leftmost_point)) # start from the leftmost point 39 | for i in range(N): 40 | reader_input_batch[i][b] = sequence[i] 41 | 42 | for i in range(len(v)): 43 | if train_mode: 44 | decoder_input_batch[i + 1][b] = sequence[v[i]] 45 | else: 46 | decoder_input_batch[i + 1][b] = sequence[i] 47 | writer_outputs_batch[i][b, v[i]+1] = 1.0 48 | 49 | # Write the stop symbol 50 | for i in range(len(v), N): 51 | writer_outputs_batch[i][b, 0] = 1.0 52 | if not train_mode: 53 | decoder_input_batch[i + 1][b] = sequence[i] 54 | writer_outputs_batch[N][b, 0] = 1.0 55 | else: 56 | 57 | for _ in range(N): 58 | reader_input_batch.append(np.zeros([batch_size, 1])) 59 | for _ in range(N + 1): 60 | decoder_input_batch.append(np.zeros([batch_size, 1])) 61 | writer_outputs_batch.append(np.zeros([batch_size, N + 1])) 62 | 63 | for b in range(batch_size): 64 | shuffle = np.random.permutation(N) 65 | sequence = np.sort(np.random.random(N)) 66 | shuffled_sequence = sequence[shuffle] 67 | 68 | for i in range(N): 69 | reader_input_batch[i][b] = shuffled_sequence[i] 70 | if train_mode: 71 | decoder_input_batch[i + 1][b] = sequence[i] 72 | else: 73 | decoder_input_batch[i + 1][b] = shuffled_sequence[i] 74 | writer_outputs_batch[shuffle[i]][b, i + 1] = 1.0 75 | 76 | # Points to the stop symbol 77 | writer_outputs_batch[N][b, 0] = 1.0 78 | 79 | 80 | return reader_input_batch, decoder_input_batch, writer_outputs_batch 81 | if __name__ == "__main__": 82 | dataset = DataGenerator() 83 | r, d, w = dataset.next_batch(1, 5, train_mode=False, convex_hull=True) 84 | print("Reader: ", r) 85 | print("Decoder: ", d) 86 | print("Writer: ", w) 87 |
-------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | """Implementation of Pointer networks: http://arxiv.org/pdf/1506.03134v1.pdf. 2 | """ 3 | 4 | 5 | from __future__ import absolute_import, division, print_function 6 | 7 | import random 8 | import os 9 | import math 10 | import numpy as np 11 | import pandas as pd 12 | import tensorflow as tf 13 | 14 | from dataset import DataGenerator 15 | from pointer import pointer_decoder 16 | 17 | flags = tf.app.flags 18 | flags.DEFINE_integer('batch_size', 128, 'Batch size.') 19 | flags.DEFINE_integer('max_len', 50, 'Size of problem.') 20 | flags.DEFINE_integer('num_steps', 100000, 'Number of steps to train for.') 21 | flags.DEFINE_integer('rnn_size', 512, 'Number of RNN cells in each layer.') 22 | flags.DEFINE_integer('num_layers', 1, 'Number of layers in the network.') 23 | flags.DEFINE_boolean('load_from_checkpoint', False, 'Whether to load from checkpoint.') 24 | flags.DEFINE_string('checkpoint_dir', 'checkpoints', 'Directory to store checkpoints.') 25 | flags.DEFINE_string('log_dir', 'pointer_logs', 'Directory to put tensorboard log files.') 26 | flags.DEFINE_string('problem_type', 'convex_hull', 'What kind of problem to train on: "convex_hull" or "sort".') 27 | flags.DEFINE_string('pointer_type', 'one_hot', 'What kind of pointer to use: "multi_hot", "one_hot", or "softmax".') 28 | flags.DEFINE_integer('steps_per_checkpoint', 200, 'How many training steps to do per checkpoint.') 29 | flags.DEFINE_float('learning_rate', 0.001, "Learning rate.") 30 | flags.DEFINE_boolean('to_csv', True, "If true, export the averaged loss and test accuracies to CSV.") 31 | FLAGS = flags.FLAGS 32 | 33 | class PointerNetwork(object): 34 | 35 | def __init__(self, max_len, input_size, size, num_layers, batch_size, learning_rate): 36 | """Create the network. 37 | 38 | Args: 39 | max_len: maximum length of the model. 40 | input_size: size of the input data. 41 | size: number of units in each layer of the model. 42 | num_layers: number of layers in the model. 43 | batch_size: the size of the batches used during training; 44 | the model construction is independent of batch_size, so it can be 45 | changed after initialization if this is convenient, e.g., for decoding. 46 | learning_rate: learning rate to start with.
47 | """ 48 | self.max_len = max_len 49 | self.batch_size = batch_size 50 | self.learning_rate = learning_rate 51 | self.global_step = tf.Variable(0, trainable=False) 52 | 53 | 54 | cell = tf.nn.rnn_cell.LSTMCell(size, initializer=tf.random_uniform_initializer(-0.08, 0.08)) 55 | if num_layers > 1: 56 | cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers) 57 | 58 | self.encoder_inputs = [] 59 | self.decoder_inputs = [] 60 | self.decoder_targets = [] 61 | self.target_weights = [] 62 | for i in range(max_len): 63 | self.encoder_inputs.append(tf.placeholder( 64 | tf.float32, [batch_size, input_size], name="EncoderInput%d" % i)) 65 | 66 | for i in range(max_len + 1): 67 | self.decoder_inputs.append(tf.placeholder( 68 | tf.float32, [batch_size, input_size], name="DecoderInput%d" % i)) 69 | self.decoder_targets.append(tf.placeholder( 70 | tf.float32, [batch_size, max_len + 1], name="DecoderTarget%d" % i)) # one hot 71 | self.target_weights.append(tf.placeholder( 72 | tf.float32, [batch_size, 1], name="TargetWeight%d" % i)) 73 | 74 | # Need for attention 75 | encoder_outputs, final_state = tf.nn.rnn(cell, self.encoder_inputs, dtype = tf.float32) 76 | 77 | # Need a dummy output to point on it. End of decoding. 78 | encoder_outputs = [tf.zeros([batch_size, size])] + encoder_outputs 79 | 80 | # First calculate a concatenation of encoder outputs to put attention on. 81 | top_states = [tf.reshape(e, [-1, 1, cell.output_size]) 82 | for e in encoder_outputs] 83 | attention_states = tf.concat(1, top_states) 84 | 85 | #For training 86 | with tf.variable_scope("decoder"): 87 | outputs, states, _ = pointer_decoder( 88 | self.decoder_inputs, final_state, attention_states, cell, feed_prev=False, pointer_type=FLAGS.pointer_type) 89 | 90 | #For inference 91 | with tf.variable_scope("decoder", reuse=True): 92 | predictions, _, inps = pointer_decoder( 93 | self.decoder_inputs, final_state, attention_states, cell, feed_prev=True, pointer_type=FLAGS.pointer_type) 94 | 95 | self.predictions = predictions 96 | self.outputs = outputs 97 | self.inps = inps 98 | 99 | 100 | def create_feed_dict(self, encoder_input_data, decoder_input_data, decoder_target_data): 101 | feed_dict = {} 102 | for placeholder, data in zip(self.encoder_inputs, encoder_input_data): 103 | feed_dict[placeholder] = data 104 | 105 | for placeholder, data in zip(self.decoder_inputs, decoder_input_data): 106 | feed_dict[placeholder] = data 107 | 108 | for placeholder, data in zip(self.decoder_targets, decoder_target_data): 109 | feed_dict[placeholder] = data 110 | 111 | for placeholder in self.target_weights: 112 | feed_dict[placeholder] = np.ones([self.batch_size, 1]) 113 | 114 | return feed_dict 115 | 116 | def step(self): 117 | 118 | loss = 0.0 119 | for output, target, weight in zip(self.outputs, self.decoder_targets, self.target_weights): 120 | loss += tf.nn.softmax_cross_entropy_with_logits(output, target) * weight 121 | 122 | loss = tf.reduce_mean(loss) 123 | tf.scalar_summary('loss', loss) 124 | 125 | test_loss = 0.0 126 | for output, target, weight in zip(self.predictions, self.decoder_targets, self.target_weights): 127 | test_loss += tf.nn.softmax_cross_entropy_with_logits(output, target) * weight 128 | 129 | tf.histogram_summary('predictions', self.predictions) 130 | test_loss = tf.reduce_mean(test_loss) 131 | tf.scalar_summary('test_loss', test_loss) 132 | optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate) 133 | train_op = optimizer.minimize(loss) 134 | 135 | train_loss_value = 0.0 136 | test_loss_value = 0.0 137 | 
test_acc_value = 0.0 138 | 139 | correct_order = 0.0 140 | all_order = 0.0 141 | 142 | predictions_order = tf.concat(0,[tf.expand_dims(prediction , 0) for prediction in self.predictions]) 143 | predictions_order = tf.transpose(tf.argmax(predictions_order, 2), perm=[1,0]) 144 | 145 | targets_order = tf.concat(0,[tf.expand_dims(target, 0) for target in self.decoder_targets]) 146 | 147 | targets_order = tf.transpose(tf.argmax(targets_order, 2), perm=[1,0]) 148 | 149 | correct_order += tf.reduce_sum(tf.cast(tf.reduce_all(tf.equal(predictions_order,targets_order), 1), tf.float32)) 150 | all_order += self.batch_size 151 | 152 | acc = correct_order/all_order 153 | tf.scalar_summary('accuracy', acc) 154 | 155 | sess = tf.Session() 156 | previous_losses = [] 157 | test_losses = [] 158 | test_accuracies = [] 159 | merged = tf.merge_all_summaries() 160 | 161 | #add op to save and restore all the variables 162 | saver = tf.train.Saver() 163 | 164 | with sess.as_default(): 165 | train_writer = tf.train.SummaryWriter(FLAGS.log_dir + "/" + FLAGS.problem_type +"/" + FLAGS.pointer_type+ "/train", sess.graph) 166 | test_writer = tf.train.SummaryWriter(FLAGS.log_dir + "/" + FLAGS.problem_type +"/" + FLAGS.pointer_type + "/test", sess.graph) 167 | init = tf.initialize_all_variables() 168 | sess.run(init) 169 | 170 | if FLAGS.load_from_checkpoint: 171 | print("Loading from checkpoint...") 172 | saver.restore(sess, FLAGS.checkpoint_dir+"/" + FLAGS.pointer_type + "/model.ckpt") 173 | print("Training network...") 174 | 175 | for i in xrange(FLAGS.num_steps): 176 | encoder_input_data, decoder_input_data, targets_data = dataset.next_batch( 177 | self.batch_size, self.max_len, convex_hull=(FLAGS.problem_type=="convex_hull")) 178 | # Train 179 | feed_dict = self.create_feed_dict( 180 | encoder_input_data, decoder_input_data, targets_data) 181 | 182 | if (i+1)%FLAGS.steps_per_checkpoint == 0: 183 | #record run metadata 184 | run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE) 185 | run_metadata = tf.RunMetadata() 186 | summary, d_x, l = sess.run([merged, loss, train_op], 187 | feed_dict=feed_dict, 188 | options=run_options, 189 | run_metadata=run_metadata) 190 | train_writer.add_run_metadata(run_metadata, 'step%d'%(i+1)) 191 | train_writer.add_summary(summary, (i+1)) 192 | else: 193 | summary, d_x, l = sess.run([merged, loss, train_op], feed_dict=feed_dict) 194 | train_writer.add_summary(summary, (i+1)) 195 | 196 | train_loss_value += d_x/FLAGS.steps_per_checkpoint 197 | 198 | if (i+1) % FLAGS.steps_per_checkpoint == 0: 199 | print('Step:', i+1, 'Learning rate:', self.learning_rate) 200 | # store checkpoint 201 | saver.save(sess, FLAGS.checkpoint_dir+"/" + FLAGS.pointer_type + "/model.ckpt") 202 | print("Train Loss: ", train_loss_value) 203 | previous_losses.append(train_loss_value) 204 | train_loss_value = 0 205 | 206 | encoder_input_data, decoder_input_data, targets_data = dataset.next_batch( 207 | self.batch_size, self.max_len, train_mode=False, convex_hull=(FLAGS.problem_type=="convex_hull")) 208 | # Test 209 | feed_dict = self.create_feed_dict( 210 | encoder_input_data, decoder_input_data, targets_data) 211 | inps_ = sess.run(self.inps, feed_dict=feed_dict) 212 | predictions = sess.run(self.predictions, feed_dict=feed_dict) 213 | 214 | summary, test_loss_, test_acc = sess.run([merged, test_loss, acc], feed_dict=feed_dict) 215 | test_writer.add_summary(summary, (i+1)) 216 | 217 | test_loss_value += test_loss_/FLAGS.steps_per_checkpoint 218 | test_acc_value += test_acc/FLAGS.steps_per_checkpoint 219 | 
220 | if (i+1) % FLAGS.steps_per_checkpoint == 0: 221 | print("Test Loss: ", test_loss_value) 222 | test_losses.append(test_loss_value) 223 | test_loss_value = 0.0 224 | print('Test Accuracy: %.5f' % test_acc_value) 225 | test_accuracies.append(test_acc_value) 226 | test_acc_value = 0.0 227 | print("----") 228 | # export data to csv 229 | if (FLAGS.to_csv): 230 | output=pd.DataFrame(data={'train_loss': previous_losses, 'test_loss': test_losses, 'test_accuracy': test_accuracies}) 231 | output.to_csv('./pointer_logs/'+ FLAGS.problem_type+'_' + FLAGS.pointer_type+'.csv') 232 | 233 | 234 | if __name__ == "__main__": 235 | # Make log and checkpoint directories if necessary 236 | if not os.path.exists(FLAGS.log_dir): 237 | os.makedirs(FLAGS.log_dir) 238 | if not os.path.exists(FLAGS.checkpoint_dir): 239 | os.makedirs(FLAGS.checkpoint_dir) 240 | print("Creating pointer network...") 241 | pointer_network = PointerNetwork(FLAGS.max_len, 2 - (FLAGS.problem_type == 'sort'), FLAGS.rnn_size, 242 | FLAGS.num_layers, FLAGS.batch_size, FLAGS.learning_rate) 243 | dataset = DataGenerator() 244 | pointer_network.step() 245 | 246 | -------------------------------------------------------------------------------- /pointer.py: -------------------------------------------------------------------------------- 1 | # Copyright 2015 Google Inc. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """A pointer-network helper. 
17 | Based on the attention_decoder implementation from TensorFlow 18 | https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn.py 19 | """ 20 | 21 | from __future__ import absolute_import 22 | from __future__ import division 23 | from __future__ import print_function 24 | 25 | from six.moves import xrange # pylint: disable=redefined-builtin 26 | 27 | import tensorflow as tf 28 | 29 | from tensorflow.python.framework import dtypes 30 | from tensorflow.python.framework import ops 31 | from tensorflow.python.ops import array_ops 32 | from tensorflow.python.ops import control_flow_ops 33 | from tensorflow.python.ops import embedding_ops 34 | from tensorflow.python.ops import math_ops 35 | from tensorflow.python.ops import nn_ops 36 | from tensorflow.python.ops import rnn 37 | from tensorflow.python.ops import rnn_cell 38 | from tensorflow.python.ops import sparse_ops 39 | from tensorflow.python.ops import variable_scope as vs 40 | 41 | def one_hot(inp, attn_length): 42 | output = tf.one_hot(tf.argmax(inp, dimension=1),attn_length) 43 | return output 44 | 45 | def multi_hot(inp, attn_length, threshold=0.3): 46 | output = tf.maximum( 47 | tf.select(tf.greater_equal(inp,tf.fill(tf.shape(inp),threshold)), tf.ones_like(inp) , tf.zeros_like(inp)), 48 | tf.one_hot(tf.argmax(inp, dimension=1),attn_length)) 49 | # normalize output so each row sums to 1 50 | normalizer = tf.expand_dims(tf.reduce_sum(output, reduction_indices=1), 1) 51 | output = tf.div(output, normalizer) 52 | return output 53 | 54 | def pointer_decoder(decoder_inputs, initial_state, attention_states, cell, 55 | feed_prev=True, dtype=dtypes.float32, scope=None, pointer_type="one_hot"): 56 | """RNN decoder with pointer net for the sequence-to-sequence model. 57 | Args: 58 | decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. 59 | initial_state: 2D Tensor [batch_size x cell.state_size]. 60 | attention_states: 3D Tensor [batch_size x attn_length x attn_size]. 61 | cell: rnn_cell.RNNCell defining the cell function and size. 62 | dtype: The dtype to use for the RNN initial state (default: tf.float32). 63 | scope: VariableScope for the created subgraph; default: "pointer_decoder". 64 | Returns: 65 | outputs: A list of the same length as decoder_inputs of 2D Tensors of shape 66 | [batch_size x attn_length]. These represent the generated outputs (pointer distributions over the input positions). 67 | Output i is computed from input i (which is either the i-th element of decoder_inputs or, when feed_prev is True, an input derived from the previous pointer output). 68 | First, we run the cell 69 | on a combination of the input and previous attention masks: 70 | cell_output, new_state = cell(linear(input, prev_attn), prev_state). 71 | Then, we calculate new attention masks: 72 | new_attn = softmax(V^T * tanh(W * attention_states + U * new_state)) 73 | and the attention mask itself is emitted as the output: 74 | output = new_attn. 75 | states: The state of each decoder cell in each time-step. This is a list 76 | with length len(decoder_inputs) -- one item for each time-step. 77 | Each item is a 2D Tensor of shape [batch_size x cell.state_size]. 78 | """ 79 | if not decoder_inputs: 80 | raise ValueError("Must provide at least 1 input to attention decoder.") 81 | if not attention_states.get_shape()[1:2].is_fully_defined(): 82 | raise ValueError("Shape[1] and [2] of attention_states must be known: %s" 83 | % attention_states.get_shape()) 84 | 85 | with vs.variable_scope(scope or "point_decoder") as scope: 86 | batch_size = array_ops.shape(decoder_inputs[0])[0] # Needed for reshaping.
87 | input_size = decoder_inputs[0].get_shape()[1].value 88 | attn_length = attention_states.get_shape()[1].value 89 | attn_size = attention_states.get_shape()[2].value 90 | 91 | # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before. 92 | hidden = array_ops.reshape( 93 | attention_states, [-1, attn_length, 1, attn_size]) 94 | 95 | attention_vec_size = attn_size # Size of query vectors for attention. 96 | k = vs.get_variable("AttnW", [1, 1, attn_size, attention_vec_size]) 97 | hidden_features = nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME") 98 | v = vs.get_variable("AttnV", [attention_vec_size]) 99 | 100 | states = [initial_state] 101 | 102 | def attention(query): 103 | """Compute the pointer (attention) distribution over hidden using hidden_features and query.""" 104 | with vs.variable_scope("Attention"): 105 | y = rnn_cell._linear(query, attention_vec_size, True) 106 | y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size]) 107 | # Attention mask is a softmax of v^T * tanh(...). 108 | a = math_ops.reduce_sum( 109 | v * math_ops.tanh(hidden_features + y), [2, 3]) 110 | a = nn_ops.softmax(a) 111 | return a 112 | 113 | outputs = [] 114 | prev = None 115 | batch_attn_size = array_ops.pack([batch_size, attn_size]) 116 | attns = array_ops.zeros(batch_attn_size, dtype=dtype) 117 | 118 | attns.set_shape([None, attn_size]) 119 | inps = [] 120 | for i in xrange(len(decoder_inputs)): 121 | if i > 0: 122 | vs.get_variable_scope().reuse_variables() 123 | inp = decoder_inputs[i] 124 | 125 | if feed_prev and i > 0: 126 | inp = tf.pack(decoder_inputs) 127 | inp = tf.transpose(inp, perm=[1, 0, 2]) 128 | inp = tf.reshape(inp, [-1, attn_length, input_size]) 129 | if pointer_type == "multi_hot": 130 | inp = tf.reduce_sum(inp * tf.reshape(multi_hot(output, attn_length), [-1, attn_length, 1]), 1) 131 | elif pointer_type == "one_hot": 132 | inp = tf.reduce_sum(inp * tf.reshape(one_hot(output, attn_length), [-1, attn_length, 1]), 1) 133 | elif pointer_type == "softmax": 134 | inp = tf.reduce_sum(inp * tf.reshape(output, [-1, attn_length, 1]), 1) 135 | else: 136 | raise ValueError('Pointer type must be one of "multi_hot", "one_hot", or "softmax".') 137 | inp = tf.stop_gradient(inp) 138 | inps.append(inp) 139 | 140 | # In inference the same (unordered) inputs are reused; the pointer chooses among them internally. 141 | 142 | # Merge input and previous attentions into one vector of the right size. 143 | x = rnn_cell._linear([inp, attns], cell.output_size, True) 144 | # Run the RNN. 145 | cell_output, new_state = cell(x, states[-1]) 146 | states.append(new_state) 147 | # Run the attention mechanism.
148 | output = attention(new_state) 149 | outputs.append(output) 150 | return outputs, states, inps 151 | -------------------------------------------------------------------------------- /pointer_networks.m: -------------------------------------------------------------------------------- 1 | Multi_hot = readtable('convex_hull_multi_hot.csv'); 2 | One_hot = readtable('convex_hull_one_hot.csv'); 3 | Softmax = readtable('convex_hull_softmax.csv'); 4 | 5 | Multi_hot{:,1} = round(Multi_hot{:,1}/5+0.5); 6 | Multi_hot = varfun(@mean,Multi_hot,'GroupingVariables','Var1') 7 | One_hot{:,1} = round(One_hot{:,1}/5+0.5); 8 | One_hot = varfun(@mean,One_hot,'GroupingVariables','Var1') 9 | Softmax{:,1} = round(Softmax{:,1}/5+0.5); 10 | Softmax = varfun(@mean,Softmax,'GroupingVariables','Var1') 11 | figure; 12 | hold on 13 | axis([0,1000,0,1]) 14 | plot(Multi_hot{:,1}, Multi_hot.mean_test_accuracy, 'r'); 15 | plot(One_hot{:,1}, One_hot.mean_test_accuracy, 'b'); 16 | plot(Softmax{:,1}, Softmax.mean_test_accuracy,'g'); 17 | xlabel('Step (x10^3)'); 18 | ylabel('Accuracy'); 19 | legend('Multi-Ptr-Net', 'Hard-Ptr-Net', 'Softmax'); 20 | hold off -------------------------------------------------------------------------------- /ptr-net2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/ptr-net2.png -------------------------------------------------------------------------------- /seq2seq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/seq2seq.png -------------------------------------------------------------------------------- /seq2seqVSptr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chanlaw/pointer-networks/cb29f4c3166bf16d22e0461bc85dc45b238a6968/seq2seqVSptr.jpg --------------------------------------------------------------------------------