├── glove └── glove-files-go-here ├── AES-Model.png ├── train_all_sets.sh ├── LICENSE ├── README.md ├── data_utils.py ├── qwk.py ├── memn2n_kv_regression.py ├── memn2n_kv.py ├── regression_train.py ├── train.py └── cv_train.py /glove/glove-files-go-here: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /AES-Model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binhetech/automated-essay-grading/master/AES-Model.png -------------------------------------------------------------------------------- /train_all_sets.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | for ((i=1; i<=8; i++)) 3 | do 4 | echo $i 5 | python train.py --essay_set_id $i --num_samples 2 6 | done 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Siyuan Zhao 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Automated Essay Grading 2 | Source code for the paper [A Memory-Augmented Neural Model for Automated Grading](http://dl.acm.org/citation.cfm?doid=3051457.3053982) in L@S 2017. 3 | 4 | ![Model Structure](AES-Model.png) 5 | 6 | The dataset comes from Kaggle ASAP competition. You can download the data from the link below. 7 | 8 | [https://www.kaggle.com/c/asap-aes/data](https://www.kaggle.com/c/asap-aes/data) 9 | 10 | Glove embeddings are used in this work. Specifically, 42B 300d is used to get the best results. You can download the embeddings from the link below. 11 | 12 | https://nlp.stanford.edu/projects/glove/ 13 | 14 | ### Get Started 15 | 16 | ``` 17 | git clone https://github.com/siyuanzhao/automated-essay-grading.git 18 | ``` 19 | 20 | * Download training data file 'training_set_rel3.tsv' from [Kaggle](https://www.kaggle.com/c/asap-aes/data) and put it under the root folder of this repo. 21 | 22 | * Download 'glove.42B.300d.zip' from [https://nlp.stanford.edu/projects/glove/](https://nlp.stanford.edu/projects/glove/) and unzip all files into 'glove/' folder. 
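If you want to double-check the layout before training, a short optional snippet like the one below can help (it is not part of the repository; it only assumes the file names from the two steps above and reuses the same pandas call that data_utils.py makes):

```
# optional sanity check, run from the repo root after the two download steps
import os
import pandas as pd

assert os.path.exists('training_set_rel3.tsv'), 'put training_set_rel3.tsv in the repo root'
assert os.path.exists('glove/glove.42B.300d.txt'), 'unzip glove.42B.300d.zip into glove/'

# same read call as data_utils.load_training_data
df = pd.read_csv('training_set_rel3.tsv', delimiter='\t')
print(df.groupby('essay_set')['domain1_score'].agg(['count', 'min', 'max']))
```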
23 | 24 | ### Requirements 25 | * TensorFlow 1.10 26 | * scikit-learn 0.19 27 | * six 1.10.0 28 | 29 | ### Usage 30 | ``` 31 | # Train the model on an essay set 32 | python cv_train.py --essay_set_id <id> 33 | ``` 34 | 35 | There are several flags within cv_train.py. Below is an example of training the model on essay set 1 with a specific learning rate and number of epochs. 36 | 37 | ``` 38 | python cv_train.py --essay_set_id 1 --learning_rate 0.005 --epochs 200 39 | ``` 40 | Check all available flags with the following command. 41 | 42 | ``` 43 | python cv_train.py -h 44 | ``` 45 | 46 | **Note**: The model is trained on the training data with 5-fold cross validation. By default, the output layer of the model is a classification layer. There is another model whose output layer is a regression layer in *memn2n_kv_regression.py*. To train the model with the regression output layer, set the flag is_regression to True. For example, 47 | 48 | ``` 49 | python cv_train.py --essay_set_id 1 --learning_rate 0.005 --epochs 200 --is_regression True 50 | ``` 51 | 52 | 53 | -------------------------------------------------------------------------------- /data_utils.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | import numpy as np 4 | import itertools 5 | import pandas as pd 6 | from collections import Counter 7 | 8 | def load_training_data(training_path, essay_set=1): 9 | training_df = pd.read_csv(training_path, delimiter='\t') 10 | # resolved score for the selected essay set 11 | resolved_score = training_df[training_df['essay_set'] == essay_set]['domain1_score'] 12 | essay_ids = training_df[training_df['essay_set'] == essay_set]['essay_id'] 13 | essays = training_df[training_df['essay_set'] == essay_set]['essay'] 14 | essay_list = [] 15 | # turn an essay into a list of words 16 | for idx, essay in essays.iteritems(): 17 | essay = clean_str(essay) 18 | #essay_list.append([w for w in tokenize(essay) if is_ascii(w)]) 19 | essay_list.append(tokenize(essay)) 20 | return essay_list, resolved_score.tolist(), essay_ids.tolist() 21 | 22 | def load_glove(token_num=6, dim=50): 23 | word2vec = [] 24 | word_idx = {} 25 | # first word is nil 26 | word2vec.append([0]*dim) 27 | count = 1 28 | with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), "glove/glove."+str(token_num)+ 29 | "B." + str(dim) + "d.txt")) as f: 30 | for line in f: 31 | l = line.split() 32 | word = l[0] 33 | vector = map(float, l[1:]) 34 | word_idx[word] = count 35 | word2vec.append(vector) 36 | count += 1 37 | 38 | print "==> glove is loaded" 39 | 40 | return word_idx, word2vec 41 | 42 | def tokenize(sent): 43 | '''Return the tokens of a sentence including punctuation. 44 | >>> tokenize('Bob dropped the apple. Where is the apple?') 45 | ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?'] 46 | >>> tokenize('I don't know') 47 | ['I', 'don', '\'', 't', 'know'] 48 | ''' 49 | return [x.strip() for x in re.split('(\W+)?', sent) if x.strip()] 50 | 51 | def clean_str(string): 52 | """ 53 | Tokenization/string cleaning for all datasets except for SST.
54 | Original taken from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py 55 | """ 56 | string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string) 57 | string = re.sub(r"\'s", " \'s", string) 58 | string = re.sub(r"\'ve", " \'ve", string) 59 | string = re.sub(r"n\'t", " n\'t", string) 60 | string = re.sub(r"\'re", " \'re", string) 61 | string = re.sub(r"\'d", " \'d", string) 62 | string = re.sub(r"\'ll", " \'ll", string) 63 | string = re.sub(r",", " , ", string) 64 | string = re.sub(r"!", " ! ", string) 65 | string = re.sub(r"\(", " ( ", string) 66 | string = re.sub(r"\)", " ) ", string) 67 | string = re.sub(r"\?", " ? ", string) 68 | string = re.sub(r"\s{2,}", " ", string) 69 | 70 | return string.strip().lower() 71 | 72 | def build_vocab(sentences, vocab_limit): 73 | """ 74 | Builds a vocabulary mapping from word to index based on the sentences. 75 | Returns vocabulary mapping and inverse vocabulary mapping. 76 | """ 77 | # Build vocabulary 78 | word_counts = Counter(itertools.chain(*sentences)) 79 | print 'Total size of vocab is {}'.format(len(word_counts.most_common())) 80 | # Mapping from index to word 81 | # vocabulary_inv = [x[0] for x in word_counts.most_common(vocab_limit)] 82 | vocabulary_inv = [x[0] for x in word_counts.most_common(vocab_limit)] 83 | 84 | vocabulary_inv = list(sorted(vocabulary_inv)) 85 | # Mapping from word to index 86 | vocabulary = {x: i+1 for i, x in enumerate(vocabulary_inv)} 87 | return [vocabulary, vocabulary_inv] 88 | 89 | # data is DataFrame 90 | def vectorize_data(data, word_idx, sentence_size): 91 | E = [] 92 | for essay in data: 93 | ls = max(0, sentence_size - len(essay)) 94 | wl = [] 95 | for w in essay: 96 | if w in word_idx: 97 | wl.append(word_idx[w]) 98 | else: 99 | #print '{} is not in vocab'.format(w) 100 | wl.append(0) 101 | wl += [0]*ls 102 | E.append(wl) 103 | return E 104 | -------------------------------------------------------------------------------- /qwk.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def confusion_matrix(rater_a, rater_b, min_rating=None, max_rating=None): 5 | """ 6 | Returns the confusion matrix between rater's ratings 7 | """ 8 | assert(len(rater_a) == len(rater_b)) 9 | if min_rating is None: 10 | min_rating = min(rater_a + rater_b) 11 | if max_rating is None: 12 | max_rating = max(rater_a + rater_b) 13 | num_ratings = int(max_rating - min_rating + 1) 14 | conf_mat = [[0 for i in range(num_ratings)] 15 | for j in range(num_ratings)] 16 | for a, b in zip(rater_a, rater_b): 17 | conf_mat[a - min_rating][b - min_rating] += 1 18 | return conf_mat 19 | 20 | 21 | def histogram(ratings, min_rating=None, max_rating=None): 22 | """ 23 | Returns the counts of each type of rating that a rater made 24 | """ 25 | if min_rating is None: 26 | min_rating = min(ratings) 27 | if max_rating is None: 28 | max_rating = max(ratings) 29 | num_ratings = int(max_rating - min_rating + 1) 30 | hist_ratings = [0 for x in range(num_ratings)] 31 | for r in ratings: 32 | hist_ratings[r - min_rating] += 1 33 | return hist_ratings 34 | 35 | 36 | def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None): 37 | """ 38 | Calculates the quadratic weighted kappa 39 | quadratic_weighted_kappa calculates the quadratic weighted kappa 40 | value, which is a measure of inter-rater agreement between two raters 41 | that provide discrete numeric ratings. 
Potential values range from -1 42 | (representing complete disagreement) to 1 (representing complete 43 | agreement). A kappa value of 0 is expected if all agreement is due to 44 | chance. 45 | quadratic_weighted_kappa(rater_a, rater_b), where rater_a and rater_b 46 | each correspond to a list of integer ratings. These lists must have the 47 | same length. 48 | The ratings should be integers, and it is assumed that they contain 49 | the complete range of possible ratings. 50 | quadratic_weighted_kappa(X, min_rating, max_rating), where min_rating 51 | is the minimum possible rating, and max_rating is the maximum possible 52 | rating 53 | """ 54 | rater_a = np.array(rater_a, dtype=int) 55 | rater_b = np.array(rater_b, dtype=int) 56 | assert(len(rater_a) == len(rater_b)) 57 | if min_rating is None: 58 | min_rating = min(min(rater_a), min(rater_b)) 59 | if max_rating is None: 60 | max_rating = max(max(rater_a), max(rater_b)) 61 | conf_mat = confusion_matrix(rater_a, rater_b, 62 | min_rating, max_rating) 63 | num_ratings = len(conf_mat) 64 | num_scored_items = float(len(rater_a)) 65 | 66 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 67 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 68 | 69 | numerator = 0.0 70 | denominator = 0.0 71 | 72 | for i in range(num_ratings): 73 | for j in range(num_ratings): 74 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 75 | / num_scored_items) 76 | d = pow(i - j, 2.0) / pow(num_ratings - 1, 2.0) 77 | numerator += d * conf_mat[i][j] / num_scored_items 78 | denominator += d * expected_count / num_scored_items 79 | 80 | return 1.0 - numerator / denominator 81 | 82 | 83 | def linear_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None): 84 | """ 85 | Calculates the linear weighted kappa 86 | linear_weighted_kappa calculates the linear weighted kappa 87 | value, which is a measure of inter-rater agreement between two raters 88 | that provide discrete numeric ratings. Potential values range from -1 89 | (representing complete disagreement) to 1 (representing complete 90 | agreement). A kappa value of 0 is expected if all agreement is due to 91 | chance. 92 | linear_weighted_kappa(rater_a, rater_b), where rater_a and rater_b 93 | each correspond to a list of integer ratings. These lists must have the 94 | same length. 95 | The ratings should be integers, and it is assumed that they contain 96 | the complete range of possible ratings. 
97 | linear_weighted_kappa(X, min_rating, max_rating), where min_rating 98 | is the minimum possible rating, and max_rating is the maximum possible 99 | rating 100 | """ 101 | assert(len(rater_a) == len(rater_b)) 102 | if min_rating is None: 103 | min_rating = min(rater_a + rater_b) 104 | if max_rating is None: 105 | max_rating = max(rater_a + rater_b) 106 | conf_mat = confusion_matrix(rater_a, rater_b, 107 | min_rating, max_rating) 108 | num_ratings = len(conf_mat) 109 | num_scored_items = float(len(rater_a)) 110 | 111 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 112 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 113 | 114 | numerator = 0.0 115 | denominator = 0.0 116 | 117 | for i in range(num_ratings): 118 | for j in range(num_ratings): 119 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 120 | / num_scored_items) 121 | d = abs(i - j) / float(num_ratings - 1) 122 | numerator += d * conf_mat[i][j] / num_scored_items 123 | denominator += d * expected_count / num_scored_items 124 | 125 | return 1.0 - numerator / denominator 126 | 127 | 128 | def kappa(rater_a, rater_b, min_rating=None, max_rating=None): 129 | """ 130 | Calculates the kappa 131 | kappa calculates the kappa 132 | value, which is a measure of inter-rater agreement between two raters 133 | that provide discrete numeric ratings. Potential values range from -1 134 | (representing complete disagreement) to 1 (representing complete 135 | agreement). A kappa value of 0 is expected if all agreement is due to 136 | chance. 137 | kappa(rater_a, rater_b), where rater_a and rater_b 138 | each correspond to a list of integer ratings. These lists must have the 139 | same length. 140 | The ratings should be integers, and it is assumed that they contain 141 | the complete range of possible ratings. 142 | kappa(X, min_rating, max_rating), where min_rating 143 | is the minimum possible rating, and max_rating is the maximum possible 144 | rating 145 | """ 146 | assert(len(rater_a) == len(rater_b)) 147 | if min_rating is None: 148 | min_rating = min(rater_a + rater_b) 149 | if max_rating is None: 150 | max_rating = max(rater_a + rater_b) 151 | conf_mat = confusion_matrix(rater_a, rater_b, 152 | min_rating, max_rating) 153 | num_ratings = len(conf_mat) 154 | num_scored_items = float(len(rater_a)) 155 | 156 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 157 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 158 | 159 | numerator = 0.0 160 | denominator = 0.0 161 | 162 | for i in range(num_ratings): 163 | for j in range(num_ratings): 164 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 165 | / num_scored_items) 166 | if i == j: 167 | d = 0.0 168 | else: 169 | d = 1.0 170 | numerator += d * conf_mat[i][j] / num_scored_items 171 | denominator += d * expected_count / num_scored_items 172 | 173 | return 1.0 - numerator / denominator 174 | 175 | 176 | def mean_quadratic_weighted_kappa(kappas, weights=None): 177 | """ 178 | Calculates the mean of the quadratic 179 | weighted kappas after applying Fisher's r-to-z transform, which is 180 | approximately a variance-stabilizing transformation. This 181 | transformation is undefined if one of the kappas is 1.0, so all kappa 182 | values are capped in the range (-0.999, 0.999). The reverse 183 | transformation is then applied before returning the result. 
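For example (illustrative numbers, not taken from this code base): kappas of 0.5 and 0.7 with equal weights map to z-values of roughly 0.549 and 0.867, whose mean of about 0.708 transforms back to a mean kappa of roughly 0.61, slightly above the plain average of 0.6 because the z-transform stretches larger kappa values.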
184 | mean_quadratic_weighted_kappa(kappas), where kappas is a vector of 185 | kappa values 186 | mean_quadratic_weighted_kappa(kappas, weights), where weights is a vector 187 | of weights that is the same size as kappas. Weights are applied in the 188 | z-space 189 | """ 190 | kappas = np.array(kappas, dtype=float) 191 | if weights is None: 192 | weights = np.ones(np.shape(kappas)) 193 | else: 194 | weights = weights / np.mean(weights) 195 | 196 | # ensure that kappas are in the range [-.999, .999] 197 | kappas = np.array([min(x, .999) for x in kappas]) 198 | kappas = np.array([max(x, -.999) for x in kappas]) 199 | 200 | z = 0.5 * np.log((1 + kappas) / (1 - kappas)) * weights 201 | z = np.mean(z) 202 | return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1) 203 | 204 | 205 | def weighted_mean_quadratic_weighted_kappa(solution, submission): 206 | predicted_score = submission[submission.columns[-1]].copy() 207 | predicted_score.name = "predicted_score" 208 | if predicted_score.index[0] == 0: 209 | predicted_score = predicted_score[:len(solution)] 210 | predicted_score.index = solution.index 211 | combined = solution.join(predicted_score, how="left") 212 | groups = combined.groupby(by="essay_set") 213 | kappas = [quadratic_weighted_kappa(group[1]["essay_score"], group[1]["predicted_score"]) for group in groups] 214 | weights = [group[1]["essay_weight"].irow(0) for group in groups] 215 | return mean_quadratic_weighted_kappa(kappas, weights=weights) 216 | -------------------------------------------------------------------------------- /memn2n_kv_regression.py: -------------------------------------------------------------------------------- 1 | """Key Value Memory Networks with GRU reader. 2 | The implementation is based on https://arxiv.org/abs/1606.03126 3 | The implementation is based on http://arxiv.org/abs/1503.08895 [1] 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | 8 | import tensorflow as tf 9 | from six.moves import range 10 | import numpy as np 11 | # from attention_reader import Attention_Reader 12 | 13 | def position_encoding(sentence_size, embedding_size): 14 | """ 15 | Position Encoding described in section 4.1 [1] 16 | """ 17 | encoding = np.ones((embedding_size, sentence_size), dtype=np.float32) 18 | ls = sentence_size+1 19 | le = embedding_size+1 20 | for i in range(1, le): 21 | for j in range(1, ls): 22 | encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2) 23 | encoding = 1 + 4 * encoding / embedding_size / sentence_size 24 | return np.transpose(encoding) 25 | 26 | def add_gradient_noise(t, stddev=1e-3, name=None): 27 | """ 28 | Adds gradient noise as described in http://arxiv.org/abs/1511.06807 [2]. 29 | 30 | The input Tensor `t` should be a gradient. 31 | 32 | The output will be `t` + gaussian noise. 33 | 34 | 0.001 was said to be a good fixed value for memory networks [2]. 35 | """ 36 | with tf.name_scope(name, "add_gradient_noise", [t, stddev]) as name: 37 | #r = 0.55 38 | t = tf.convert_to_tensor(t, name="t") 39 | #sd = stddev/(1+step)**r 40 | gn = tf.random_normal(tf.shape(t), stddev=stddev) 41 | return tf.add(t, gn, name=name) 42 | 43 | def zero_nil_slot(t, name=None): 44 | """ 45 | Overwrites the nil_slot (first row) of the input Tensor with zeros. 46 | The nil_slot is a dummy slot and should not be trained and influence 47 | the training algorithm. 
48 | """ 49 | with tf.name_scope(name, "zero_nil_slot", [t]) as name: 50 | t = tf.convert_to_tensor(t, name="t") 51 | s = tf.shape(t)[1] 52 | z = tf.zeros(tf.pack([1, s])) 53 | return tf.concat(0, [z, tf.slice(t, [1, 0], [-1, -1])], name=name) 54 | 55 | class MemN2N_KV(object): 56 | """Key Value Memory Network.""" 57 | def __init__(self, batch_size, vocab_size, 58 | query_size, story_size, memory_key_size, 59 | memory_value_size, embedding_size, 60 | min_score, feature_size=30, 61 | hops=3, 62 | reader='bow', 63 | l2_lambda=0.2, 64 | name='KeyValueMemN2N'): 65 | """Creates an Key Value Memory Network 66 | 67 | Args: 68 | batch_size: The size of the batch. 69 | 70 | vocab_size: The size of the vocabulary (should include the nil word). The nil word one-hot encoding should be 0. 71 | 72 | query_size: largest number of words in question 73 | 74 | story_size: largest number of words in story 75 | 76 | embedding_size: The size of the word embedding. 77 | 78 | memory_key_size: the size of memory slots for keys 79 | memory_value_size: the size of memory slots for values 80 | 81 | feature_size: dimension of feature extraced from word embedding 82 | 83 | hops: The number of hops. A hop consists of reading and addressing a memory slot. 84 | 85 | debug_mode: If true, print some debug info about tensors 86 | name: Name of the End-To-End Memory Network.\ 87 | Defaults to `KeyValueMemN2N`. 88 | """ 89 | self._story_size = story_size 90 | self._batch_size = batch_size 91 | self._vocab_size = vocab_size 92 | self._query_size = query_size 93 | #self._wiki_sentence_size = doc_size 94 | self._memory_key_size = memory_key_size 95 | self._embedding_size = embedding_size 96 | self._hops = hops 97 | self._name = name 98 | self._memory_value_size = memory_value_size 99 | self._encoding = tf.constant(position_encoding(self._story_size, self._embedding_size), name="encoding") 100 | self._reader = reader 101 | self._build_inputs() 102 | 103 | d = feature_size 104 | self._feature_size = feature_size 105 | self._n_hidden = feature_size 106 | self.reader_feature_size = 0 107 | 108 | # trainable variables 109 | if reader == 'bow': 110 | self.reader_feature_size = self._embedding_size 111 | elif reader == 'simple_gru': 112 | self.reader_feature_size = self._n_hidden 113 | 114 | self.A = tf.get_variable('A', shape=[self._feature_size, self.reader_feature_size], 115 | initializer=tf.contrib.layers.xavier_initializer()) 116 | self.A_mvalue = tf.get_variable('A_mvalue', shape=[self._feature_size, self.reader_feature_size], 117 | initializer=tf.contrib.layers.xavier_initializer()) 118 | self.A_mkey = tf.get_variable('A_mkey', shape=[self._feature_size, self.reader_feature_size], 119 | initializer=tf.contrib.layers.xavier_initializer()) 120 | 121 | #self.TK = tf.get_variable('TK', shape=[self._memory_value_size, self.reader_feature_size], 122 | # initializer=tf.contrib.layers.xavier_initializer()) 123 | #self.TV = tf.get_variable('TV', shape=[self._memory_value_size, self.reader_feature_size], 124 | # initializer=tf.contrib.layers.xavier_initializer()) 125 | 126 | # Embedding layer 127 | #nil_word_slot = tf.zeros([1, embedding_size]) 128 | #self.W = tf.concat(0, [nil_word_slot, tf.get_variable('W', shape=[vocab_size-1, embedding_size], 129 | # initializer=tf.contrib.layers.xavier_initializer())]) 130 | self.W = tf.Variable(self.w_placeholder, trainable=False) 131 | self.W_memory = self.W 132 | #self._nil_vars = set([self.W.name, self.W_memory.name]) 133 | # shape: [batch_size, query_size, embedding_size] 134 | 
self.embedded_chars = tf.nn.embedding_lookup(self.W, self._query) 135 | # shape: [batch_size, memory_size, story_size, embedding_size] 136 | self.mkeys_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 137 | # shape: [batch_size, memory_size, story_size, embedding_size] 138 | self.mvalues_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 139 | 140 | if reader == 'bow': 141 | q_r = tf.reduce_sum(self.embedded_chars*self._encoding, 1) 142 | doc_r = tf.reduce_sum(self.mkeys_embedded_chars*self._encoding, 2) 143 | value_r = tf.reduce_sum(self.mvalues_embedded_chars*self._encoding, 2) 144 | 145 | r_list = [] 146 | R = tf.get_variable('R', shape=[self._feature_size, self._feature_size], 147 | initializer=tf.contrib.layers.xavier_initializer()) 148 | 149 | for _ in range(self._hops): 150 | # define R for variables 151 | #R = tf.get_variable('R{}'.format(_), shape=[self._feature_size, self._feature_size], 152 | # initializer=tf.contrib.layers.xavier_initializer()) 153 | r_list.append(R) 154 | 155 | o = self._key_addressing(doc_r, value_r, q_r, r_list) 156 | o = tf.transpose(o) 157 | if reader == 'bow': 158 | #self.B = self.A 159 | self.B = tf.get_variable('B', shape=[self._feature_size, 1], 160 | initializer=tf.truncated_normal_initializer()) 161 | elif reader == 'simple_gru': 162 | #self.B = tf.get_variable('B', shape=[self._feature_size, self._embedding_size], 163 | self.B = tf.get_variable('B', shape=[self._feature_size, self._vocab_size], 164 | initializer=tf.contrib.layers.xavier_initializer()) 165 | logits_bias = tf.get_variable('logits_bias', [1]) 166 | # y_tmp = tf.matmul(self.B, self.W_memory, transpose_b=True) 167 | with tf.name_scope("prediction"): 168 | #logits = tf.matmul(o, y_tmp)# + logits_bias 169 | logits = tf.matmul(o, self.B) + logits_bias 170 | #normed_score = tf.squeeze(tf.nn.sigmoid(tf.cast(logits, tf.float32))) 171 | #score = normed_score * (max_score - min_score) + min_score 172 | score = tf.squeeze(logits) 173 | mse = tf.reduce_mean(tf.square(tf.sub(score, self._score_encoding))) 174 | # loss op 175 | trainable_vars = tf.trainable_variables() 176 | lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) 177 | loss_op = mse + l2_lambda*lossL2 178 | # predict ops 179 | 180 | # assign ops 181 | self.cost = mse 182 | self.loss_op = loss_op 183 | self.predict_op = score 184 | 185 | def _build_inputs(self): 186 | with tf.name_scope("input"): 187 | self._memory_key = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_key') 188 | 189 | self._query = tf.placeholder(tf.int32, [None, self._query_size], name='essay') 190 | 191 | self._score_encoding = tf.placeholder(tf.float32, [None], name='score') 192 | self.keep_prob = tf.placeholder(tf.float32, name='keep_prob') 193 | self.w_placeholder = tf.placeholder(tf.float32, [self._vocab_size, self._embedding_size]) 194 | self._mem_attention_encoding = tf.placeholder(tf.int32, [None, self._memory_key_size]) 195 | 196 | ''' 197 | mkeys: the vector representation for keys in memory 198 | -- shape of each mkeys: [1, embedding_size] 199 | mvalues: the vector representation for values in memory 200 | -- shape of each mvalues: [1, embedding_size] 201 | questions: the vector representation for the question 202 | -- shape of questions: [1, embedding_size] 203 | -- shape of R: [feature_size, feature_size] 204 | -- shape of self.A: [feature_size, embedding_size] 205 | -- shape of self.B: [feature_size, embedding_size] 206 | self.A, self.B and R are the 
parameters to learn 207 | ''' 208 | def _key_addressing(self, mkeys, mvalues, questions, r_list): 209 | self.mem_attention_probs = [] 210 | with tf.variable_scope(self._name): 211 | # [feature_size, batch_size] 212 | u_o = tf.matmul(self.A, questions, transpose_b=True) 213 | u = [u_o] 214 | for _ in range(self._hops): 215 | R = r_list[_] 216 | u_temp = u[-1] 217 | mk_temp = mkeys # + self.TK 218 | # [reader_size, batch_size x memory_size] 219 | k_temp = tf.reshape(tf.transpose(mk_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 220 | # [feature_size, batch_size x memory_size] 221 | a_k_temp = tf.nn.dropout(tf.matmul(self.A_mvalue, k_temp), self.keep_prob) 222 | # [batch_size, memory_size, feature_size] 223 | a_k = tf.reshape(tf.transpose(a_k_temp), [-1, self._memory_key_size, self._feature_size]) 224 | # [batch_size, 1, feature_size] 225 | u_expanded = tf.expand_dims(tf.transpose(u_temp), [1]) 226 | # [batch_size, memory_size] 227 | dotted = tf.reduce_sum(a_k*u_expanded, 2) 228 | 229 | # Calculate probabilities 230 | # [batch_size, memory_size] 231 | probs = tf.nn.softmax(tf.to_float(dotted)) 232 | self.mem_attention_probs.append(probs) 233 | 234 | # [batch_size, memory_size, 1] 235 | probs_expand = tf.expand_dims(probs, -1) 236 | mv_temp = mvalues # + self.TV 237 | # [reader_size, batch_size x memory_size] 238 | v_temp = tf.reshape(tf.transpose(mv_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 239 | # [feature_size, batch_size x memory_size] 240 | a_v_temp = tf.nn.dropout(tf.matmul(self.A_mkey, v_temp), self.keep_prob) 241 | # [batch_size, memory_size, feature_size] 242 | a_v = tf.reshape(tf.transpose(a_v_temp), [-1, self._memory_key_size, self._feature_size]) 243 | # [batch_size, feature_size] 244 | o_k = tf.reduce_sum(probs_expand*a_v, 1) 245 | # [feature_size, batch_size] 246 | o_k = tf.transpose(o_k) 247 | # [feature_size, batch_size] 248 | # test point 249 | #u_k = tf.nn.relu(tf.matmul(R, u_o+o_k)) 250 | u_k = tf.nn.dropout(tf.nn.relu(tf.matmul(R, u[-1]+o_k)), self.keep_prob) 251 | 252 | u.append(u_k) 253 | self.mem_attention_probs = tf.pack(self.mem_attention_probs, axis=1) 254 | # test point 255 | return u[-1] 256 | # return tf.add_n(u)/len(u) 257 | -------------------------------------------------------------------------------- /memn2n_kv.py: -------------------------------------------------------------------------------- 1 | """Key Value Memory Networks with GRU reader. 2 | The implementation is based on https://arxiv.org/abs/1606.03126 3 | The implementation is based on http://arxiv.org/abs/1503.08895 [1] 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | 8 | import tensorflow as tf 9 | from six.moves import range 10 | import numpy as np 11 | # from attention_reader import Attention_Reader 12 | 13 | def position_encoding(sentence_size, embedding_size): 14 | """ 15 | Position Encoding described in section 4.1 [1] 16 | """ 17 | encoding = np.ones((embedding_size, sentence_size), dtype=np.float32) 18 | ls = sentence_size+1 19 | le = embedding_size+1 20 | for i in range(1, le): 21 | for j in range(1, ls): 22 | encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2) 23 | encoding = 1 + 4 * encoding / embedding_size / sentence_size 24 | return np.transpose(encoding) 25 | 26 | def add_gradient_noise(t, stddev=1e-3, name=None): 27 | """ 28 | Adds gradient noise as described in http://arxiv.org/abs/1511.06807 [2]. 29 | 30 | The input Tensor `t` should be a gradient. 31 | 32 | The output will be `t` + gaussian noise. 
33 | 34 | 0.001 was said to be a good fixed value for memory networks [2]. 35 | """ 36 | with tf.name_scope(name, "add_gradient_noise", [t, stddev]) as name: 37 | #r = 0.55 38 | t = tf.convert_to_tensor(t, name="t") 39 | #sd = stddev/(1+step)**r 40 | gn = tf.random_normal(tf.shape(t), stddev=stddev) 41 | return tf.add(t, gn, name=name) 42 | 43 | def zero_nil_slot(t, name=None): 44 | """ 45 | Overwrites the nil_slot (first row) of the input Tensor with zeros. 46 | The nil_slot is a dummy slot and should not be trained and influence 47 | the training algorithm. 48 | """ 49 | with tf.name_scope(name, "zero_nil_slot", [t]) as name: 50 | t = tf.convert_to_tensor(t, name="t") 51 | s = tf.shape(t)[1] 52 | z = tf.zeros(tf.stack([1, s])) 53 | return tf.concat([z, tf.slice(t, [1, 0], [-1, -1])], 0, name=name) 54 | 55 | class MemN2N_KV(object): 56 | """Key Value Memory Network.""" 57 | def __init__(self, batch_size, vocab_size, 58 | query_size, story_size, memory_key_size, 59 | memory_value_size, embedding_size, score_range, 60 | feature_size=30, 61 | hops=3, 62 | reader='bow', 63 | l2_lambda=0.2, 64 | name='KeyValueMemN2N'): 65 | """Creates an Key Value Memory Network 66 | 67 | Args: 68 | batch_size: The size of the batch. 69 | 70 | vocab_size: The size of the vocabulary (should include the nil word). The nil word one-hot encoding should be 0. 71 | 72 | query_size: largest number of words in question 73 | 74 | story_size: largest number of words in story 75 | 76 | embedding_size: The size of the word embedding. 77 | 78 | memory_key_size: the size of memory slots for keys 79 | memory_value_size: the size of memory slots for values 80 | 81 | feature_size: dimension of feature extraced from word embedding 82 | 83 | hops: The number of hops. A hop consists of reading and addressing a memory slot. 84 | 85 | debug_mode: If true, print some debug info about tensors 86 | name: Name of the End-To-End Memory Network.\ 87 | Defaults to `KeyValueMemN2N`. 
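Example (illustrative sketch; the argument values are assumptions, not defaults from this repository): model = MemN2N_KV(batch_size=32, vocab_size=10000, query_size=500, story_size=500, memory_key_size=20, memory_value_size=20, embedding_size=300, score_range=12, feature_size=100, hops=3, reader='bow', l2_lambda=0.3)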
88 | """ 89 | self._story_size = story_size 90 | self._batch_size = batch_size 91 | self._vocab_size = vocab_size 92 | self._query_size = query_size 93 | #self._wiki_sentence_size = doc_size 94 | self._memory_key_size = memory_key_size 95 | self._embedding_size = embedding_size 96 | self._hops = hops 97 | self._name = name 98 | self._memory_value_size = memory_value_size 99 | self._encoding = tf.constant(position_encoding(self._story_size, self._embedding_size), name="encoding") 100 | self._reader = reader 101 | self._build_inputs() 102 | 103 | d = feature_size 104 | self._feature_size = feature_size 105 | self._n_hidden = embedding_size 106 | self.reader_feature_size = 0 107 | 108 | # keep track of attention in memory 109 | self.mem_attention_probs = [] 110 | 111 | # one-hot encoding for scores 112 | self._labels = tf.one_hot(self._score_encoding, score_range, on_value=1.0, off_value=0.0, axis=-1) 113 | # trainable variables 114 | self.reader_feature_size = self._embedding_size 115 | 116 | self.A = tf.get_variable('A', shape=[self._feature_size, self.reader_feature_size], 117 | initializer=tf.contrib.layers.xavier_initializer()) 118 | self.A_mvalue = tf.get_variable('A_mvalue', shape=[self._feature_size, self.reader_feature_size], 119 | initializer=tf.contrib.layers.xavier_initializer()) 120 | self.A_mkey = tf.get_variable('A_mkey', shape=[self._feature_size, self.reader_feature_size], 121 | initializer=tf.contrib.layers.xavier_initializer()) 122 | 123 | # Embedding layer 124 | self.W = tf.Variable(self.w_placeholder, trainable=False) 125 | self.W_memory = self.W 126 | # shape: [batch_size, query_size, embedding_size] 127 | self.embedded_chars = tf.nn.embedding_lookup(self.W, self._query) 128 | # shape: [batch_size, memory_size, story_size, embedding_size] 129 | self.mkeys_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 130 | if reader == 'bow': 131 | # shape: [batch_size, memory_size, story_size, embedding_size] 132 | q_r = tf.reduce_sum(self.embedded_chars*self._encoding, 1) 133 | doc_r = tf.reduce_sum(self.mkeys_embedded_chars*self._encoding, 2) 134 | elif reader == 'gru': 135 | x_tmp = tf.reshape(self.mkeys_embedded_chars, [-1, self._story_size, self._embedding_size]) 136 | x = tf.transpose(x_tmp, [1, 0, 2]) 137 | # Reshape to (n_steps*batch_size, n_input) 138 | x = tf.reshape(x, [-1, self._embedding_size]) 139 | # Split to get a list of 'n_steps' 140 | # tensors of shape (doc_num, n_input) 141 | x = tf.split(x, self._story_size, 0) 142 | 143 | # do the same thing on the essay 144 | q = tf.transpose(self.embedded_chars, [1, 0, 2]) 145 | q = tf.reshape(q, [-1, self._embedding_size]) 146 | q = tf.split(q, self._query_size, 0) 147 | 148 | with tf.variable_scope('gru') as gru_scope: 149 | gru_rnn = tf.nn.rnn_cell.GRUCell(self._n_hidden) 150 | doc_r, _ = tf.contrib.rnn.static_rnn(gru_rnn, x, dtype=tf.float32) 151 | doc_r = tf.reshape(doc_r[-1], [-1, self._memory_key_size, self._n_hidden]) 152 | with tf.variable_scope(gru_scope, reuse=True): 153 | q_r, _ = tf.contrib.rnn.static_rnn(gru_rnn, q, dtype=tf.float32) 154 | q_r = q_r[-1] 155 | 156 | r_list = [] 157 | R = tf.get_variable('R', shape=[self._feature_size, self._feature_size], 158 | initializer=tf.contrib.layers.xavier_initializer()) 159 | 160 | for _ in range(self._hops): 161 | # define R for variables 162 | #R = tf.get_variable('R{}'.format(_), shape=[self._feature_size, self._feature_size], 163 | # initializer=tf.contrib.layers.xavier_initializer()) 164 | r_list.append(R) 165 | 166 | o = 
self._key_addressing(doc_r, doc_r, q_r, r_list) 167 | o = tf.transpose(o) 168 | self.B = tf.get_variable('B', shape=[self._feature_size, score_range], 169 | initializer=tf.contrib.layers.xavier_initializer()) 170 | logits_bias = tf.get_variable('logits_bias', [score_range]) 171 | # y_tmp = tf.matmul(self.B, self.W_memory, transpose_b=True) 172 | with tf.name_scope("prediction"): 173 | #logits = tf.matmul(o, y_tmp)# + logits_bias 174 | logits = tf.matmul(o, self.B) + logits_bias 175 | probs = tf.nn.softmax(tf.cast(logits, tf.float32)) 176 | 177 | cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=tf.cast(self._labels, tf.float32), name='cross_entropy') 178 | cross_entropy_sum = tf.reduce_sum(cross_entropy, name="cross_entropy_sum") 179 | 180 | # loss op 181 | trainable_vars = tf.trainable_variables() 182 | lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) 183 | loss_op = cross_entropy_sum + l2_lambda*lossL2 184 | # predict ops 185 | predict_op = tf.argmax(probs, 1, name="predict_op") 186 | 187 | # assign ops 188 | self.cost = cross_entropy_sum 189 | self.loss_op = loss_op 190 | self.predict_op = predict_op 191 | self.probs = probs 192 | 193 | def _build_inputs(self): 194 | with tf.name_scope("input"): 195 | self._memory_key = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_key') 196 | 197 | self._query = tf.placeholder(tf.int32, [None, self._query_size], name='question') 198 | 199 | #self._memory_value = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_value') 200 | 201 | self._score_encoding = tf.placeholder(tf.int32, [None], name='score_encoding') 202 | self.keep_prob = tf.placeholder(tf.float32, name='keep_prob') 203 | self.w_placeholder = tf.placeholder(tf.float32, [self._vocab_size, self._embedding_size]) 204 | self._mem_attention_encoding = tf.placeholder(tf.int32, [None, self._memory_key_size]) 205 | 206 | ''' 207 | mkeys: the vector representation for keys in memory 208 | -- shape of each mkeys: [1, embedding_size] 209 | mvalues: the vector representation for values in memory 210 | -- shape of each mvalues: [1, embedding_size] 211 | questions: the vector representation for the question 212 | -- shape of questions: [1, embedding_size] 213 | -- shape of R: [feature_size, feature_size] 214 | -- shape of self.A: [feature_size, embedding_size] 215 | -- shape of self.B: [feature_size, embedding_size] 216 | self.A, self.B and R are the parameters to learn 217 | ''' 218 | def _key_addressing(self, mkeys, mvalues, questions, r_list): 219 | self.mem_attention_probs = [] 220 | with tf.variable_scope(self._name): 221 | questions = tf.nn.dropout(questions, self.keep_prob) 222 | # [feature_size, batch_size] 223 | u_o = tf.matmul(self.A, questions, transpose_b=True) 224 | u = [u_o] 225 | hop_probs = [] 226 | for _ in range(self._hops): 227 | R = r_list[_] 228 | u_temp = u[-1] 229 | mk_temp = tf.nn.dropout(mkeys, self.keep_prob) 230 | # [reader_size, batch_size x memory_size] 231 | k_temp = tf.reshape(tf.transpose(mk_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 232 | # [feature_size, batch_size x memory_size] 233 | a_k_temp = tf.matmul(self.A_mvalue, k_temp) 234 | # [batch_size, memory_size, feature_size] 235 | a_k = tf.reshape(tf.transpose(a_k_temp), [-1, self._memory_key_size, self._feature_size]) 236 | # [batch_size, 1, feature_size] 237 | u_expanded = tf.expand_dims(tf.transpose(u_temp), [1]) 238 | # [batch_size, memory_size] 239 | dotted = 
tf.reduce_sum(a_k*u_expanded, 2) 240 | 241 | # Calculate probabilities 242 | # [batch_size, memory_size] 243 | probs = tf.nn.softmax(dotted) 244 | self.mem_attention_probs.append(probs) 245 | # [batch_size, memory_size, 1] 246 | probs_expand = tf.expand_dims(probs, -1) 247 | mv_temp = mk_temp 248 | # [reader_size, batch_size x memory_size] 249 | v_temp = tf.reshape(tf.transpose(mv_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 250 | # [feature_size, batch_size x memory_size] 251 | a_v_temp = tf.matmul(self.A_mkey, v_temp) 252 | # [batch_size, memory_size, feature_size] 253 | a_v = tf.reshape(tf.transpose(a_v_temp), [-1, self._memory_key_size, self._feature_size]) 254 | # [batch_size, feature_size] 255 | o_k = tf.reduce_sum(probs_expand*a_v, 1) 256 | # [feature_size, batch_size] 257 | o_k = tf.transpose(o_k) 258 | # [feature_size, batch_size] 259 | # test point 260 | u_k = tf.nn.relu(tf.matmul(R, u[-1]+o_k)) 261 | #u_k = tf.matmul(R, u[-1]+o_k) 262 | #u_k = tf.nn.relu(tf.matmul(R, u_o + o_k)) 263 | u.append(u_k) 264 | self.mem_attention_probs = tf.stack(self.mem_attention_probs, axis=1) 265 | #TODO: 266 | return u[-1] 267 | #return tf.add_n(u)/len(u) 268 | -------------------------------------------------------------------------------- /regression_train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn import cross_validation 4 | from memn2n_kv_regression import MemN2N_KV 5 | #from skll.metrics import kappa 6 | from qwk import quadratic_weighted_kappa as kappa 7 | import tensorflow as tf 8 | from memn2n_kv_regression import add_gradient_noise 9 | import time 10 | import os 11 | import sys 12 | 13 | print 'start to load flags\n' 14 | 15 | # flags 16 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 17 | tf.flags.DEFINE_float("l2_lambda", 0.1, "Lambda for l2 loss.") 18 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 19 | tf.flags.DEFINE_float("max_grad_norm", 1, "Clip gradients to this norm.") 20 | tf.flags.DEFINE_float("keep_prob", 0.8, "Keep probability for dropout") 21 | tf.flags.DEFINE_integer("evaluation_interval", 3, "Evaluate and print results every x epochs") 22 | tf.flags.DEFINE_integer("batch_size", 32, "Batch size for training.") 23 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 24 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 25 | tf.flags.DEFINE_integer("hops", 1, "Number of hops in the Memory Network.") 26 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 27 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 28 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove") 29 | tf.flags.DEFINE_integer("essay_set_id", 7, "essay set id, 1 <= id <= 8") 30 | tf.flags.DEFINE_string("reader", "bow", "Reader for the model (bow, simple_gru)") 31 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 32 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 33 | # hyper-parameters 34 | FLAGS = tf.flags.FLAGS 35 | FLAGS._parse_flags() 36 | 37 | #vocab_limit = 13000 38 | essay_set_id = FLAGS.essay_set_id 39 | batch_size = FLAGS.batch_size 40 | embedding_size = FLAGS.embedding_size 41 | feature_size = FLAGS.feature_size 42 | l2_lambda = FLAGS.l2_lambda 43 | hops = FLAGS.hops 44 | reader = 'bow' 45 | epochs = FLAGS.epochs 46 | 
num_samples = FLAGS.num_samples 47 | num_tokens = FLAGS.token_num 48 | test_batch_size = batch_size 49 | random_state = 10 50 | 51 | # print flags info 52 | orig_stdout = sys.stdout 53 | timestamp = str(int(time.time())) 54 | folder_name = 'essay_set_{}_{}_regression_{}'.format(essay_set_id, num_samples, timestamp) 55 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 56 | if not os.path.exists(out_dir): 57 | os.makedirs(out_dir) 58 | 59 | # save output to a file 60 | #f = file(out_dir+'/out.txt', 'w') 61 | #sys.stdout = f 62 | print("Writing to {}\n".format(out_dir)) 63 | 64 | print("\nParameters:") 65 | for attr, value in sorted(FLAGS.__flags.items()): 66 | print("{}={}".format(attr.upper(), value)) 67 | print("") 68 | 69 | with open(out_dir+'/params', 'w') as f: 70 | for attr, value in sorted(FLAGS.__flags.items()): 71 | f.write("{}={}".format(attr.upper(), value)) 72 | f.write("\n") 73 | 74 | # hyper-parameters end here 75 | training_path = 'training_set_rel3.tsv' 76 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 77 | 78 | max_score = max(resolved_scores) 79 | min_score = min(resolved_scores) 80 | if essay_set_id == 7: 81 | min_score, max_score = 0, 30 82 | elif essay_set_id == 8: 83 | min_score, max_score = 0, 60 84 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 85 | with open(out_dir+'/params', 'a') as f: 86 | f.write('max_score is {} \t min_score is {} \n'.format(max_score, min_score)) 87 | 88 | # include max score 89 | score_range = range(min_score, max_score+1) 90 | 91 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 92 | 93 | # load glove 94 | word_idx, word2vec = data_utils.load_glove(num_tokens, embedding_size) 95 | vocab_size = len(word_idx) + 1 96 | # stat info on data set 97 | 98 | sent_size_list = map(len, [essay for essay in essay_list]) 99 | max_sent_size = max(sent_size_list) 100 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 101 | 102 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 103 | with open(out_dir+'/params', 'a') as f: 104 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 105 | 106 | print 'The length of score range is {}'.format(len(score_range)) 107 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 108 | 109 | labeled_data = zip(E, resolved_scores, sent_size_list) 110 | 111 | # split the data on the fly 112 | trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 113 | E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 114 | 115 | memory = [] 116 | memory_score = [] 117 | memory_sent_size = [] 118 | memory_essay_ids = [] 119 | # pick sampled essay for each score 120 | for i in score_range: 121 | for j in range(num_samples): 122 | if i in train_scores: 123 | score_idx = train_scores.index(i) 124 | score = train_scores.pop(score_idx) 125 | essay = trainE.pop(score_idx) 126 | #sent_size = sent_size_list.pop(score_idx) 127 | memory.append(essay) 128 | memory_score.append(score) 129 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 130 | memory_size = len(memory) 131 | trainE, evalE, train_scores, eval_scores, train_essay_id, eval_essay_id = cross_validation.train_test_split( 132 | trainE, train_scores, train_essay_id, test_size=.2, random_state=random_state) 133 | 134 | # convert score to one hot encoding 135 | 
#train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 136 | # normalize training score 137 | #normed_train_scores = (np.array(train_scores) - min_score) / (max_score - min_score) 138 | 139 | # data size 140 | n_train = len(trainE) 141 | n_test = len(testE) 142 | n_eval = len(evalE) 143 | 144 | print 'The size of training data: {}'.format(n_train) 145 | print 'The size of testing data: {}'.format(n_test) 146 | print 'The size of evaluation data: {}'.format(n_eval) 147 | with open(out_dir+'/params', 'a') as f: 148 | f.write('The size of training data: {}\n'.format(n_train)) 149 | f.write('The size of testing data: {}'.format(n_test)) 150 | f.write('The size of evaluation data: {}'.format(n_eval)) 151 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 152 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 153 | f.write('\nEssay ids in evaluation:\n{}'.format(eval_essay_id)) 154 | f.write('\nEssay ids in testing:\n{}'.format(test_essay_id)) 155 | 156 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 157 | batches = [(start, end) for start, end in batches] 158 | 159 | with tf.Graph().as_default(): 160 | session_conf = tf.ConfigProto( 161 | allow_soft_placement=FLAGS.allow_soft_placement, 162 | log_device_placement=FLAGS.log_device_placement) 163 | 164 | global_step = tf.Variable(0, name="global_step", trainable=False) 165 | # decay learning rate 166 | starter_learning_rate = FLAGS.learning_rate 167 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 168 | 169 | #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 170 | optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate) 171 | 172 | best_kappa_so_far = 0.0 173 | with tf.Session(config=session_conf) as sess: 174 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 175 | memory_size, embedding_size, min_score, max_score, 176 | feature_size, hops, reader, l2_lambda) 177 | 178 | grads_and_vars = optimizer.compute_gradients( 179 | model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 180 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 181 | for g, v in grads_and_vars if g is not None] 182 | grads_and_vars = [(add_gradient_noise(g), v) for g, v in grads_and_vars] 183 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 184 | 185 | sess.run(tf.global_variables_initializer(), feed_dict={model.w_placeholder: word2vec}) 186 | 187 | saver = tf.train.Saver(tf.global_variables()) 188 | 189 | def train_step(m, e, s): 190 | feed_dict = { 191 | model._query: e, 192 | model._memory_key: m, 193 | model._score_encoding: s, 194 | model.keep_prob: FLAGS.keep_prob 195 | } 196 | start_time = time.time() 197 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 198 | end_time = time.time() 199 | time_cost = end_time - start_time 200 | return predict_op, cost, time_cost 201 | 202 | def test_step(e, m): 203 | feed_dict = { 204 | model._query: e, 205 | model._memory_key: m, 206 | model.keep_prob: 1 207 | } 208 | preds = sess.run(model.predict_op, feed_dict) 209 | return np.round(preds) 210 | 211 | for i in range(1, epochs+1): 212 | train_cost = 0 213 | np.random.shuffle(batches) 214 | for start, end in batches: 215 | e = trainE[start:end] 216 | s = train_scores[start:end] 217 | #s = 
normed_train_scores[start:end] 218 | batched_memory = [] 219 | # batch sized memory 220 | for _ in range(len(e)): 221 | batched_memory.append(memory) 222 | _, cost, _ = train_step(batched_memory, e, s) 223 | train_cost += cost 224 | print 'Finish epoch {}, total training cost is {}'.format(i, train_cost) 225 | # evaluation 226 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 227 | # test on training data 228 | train_preds = [] 229 | for start in range(0, n_train, test_batch_size): 230 | end = min(n_train, start+test_batch_size) 231 | 232 | batched_memory = [] 233 | for _ in range(end-start): 234 | batched_memory.append(memory) 235 | preds = test_step(trainE[start:end], batched_memory) 236 | for ite in preds: 237 | if ite > max_score: 238 | ite = max_score 239 | elif ite < min_score: 240 | ite = min_score 241 | train_preds.append(ite) 242 | # regression 243 | #train_preds = np.array(train_preds)*(max_score-min_score) + min_score 244 | print train_preds[-10:] 245 | train_kappp_score = kappa(train_scores, train_preds, min_score, max_score) 246 | 247 | # test on eval data 248 | eval_preds = [] 249 | for start in range(0, n_eval, test_batch_size): 250 | end = min(n_eval, start+test_batch_size) 251 | 252 | batched_memory = [] 253 | for _ in range(end-start): 254 | batched_memory.append(memory) 255 | preds = test_step(evalE[start:end], batched_memory) 256 | for ite in preds: 257 | if ite > max_score: 258 | ite = max_score 259 | elif ite < min_score: 260 | ite = min_score 261 | 262 | eval_preds.append(ite) 263 | # regression 264 | #eval_preds = np.array(eval_preds)*(max_score-min_score) + min_score 265 | eval_kappp_score = kappa(eval_scores, eval_preds, min_score, max_score) 266 | 267 | # test on test data 268 | test_preds = [] 269 | for start in range(0, n_test, test_batch_size): 270 | end = min(n_test, start+test_batch_size) 271 | 272 | batched_memory = [] 273 | for _ in range(end-start): 274 | batched_memory.append(memory) 275 | preds = test_step(testE[start:end], batched_memory) 276 | for ite in preds: 277 | if ite > max_score: 278 | ite = max_score 279 | elif ite < min_score: 280 | ite = min_score 281 | 282 | test_preds.append(ite) 283 | # regression 284 | #test_preds = np.array(test_preds)*(max_score-min_score) + min_score 285 | test_kappp_score = kappa(test_scores, test_preds, min_score, max_score) 286 | 287 | # save the model if it gets best kappa 288 | if(test_kappp_score > best_kappa_so_far): 289 | best_kappa_so_far = test_kappp_score 290 | #saver.save(sess, out_dir+'/checkpoints', global_step) 291 | print("Training kappa score = {}".format(train_kappp_score)) 292 | print("Validation kappa score = {}".format(eval_kappp_score)) 293 | print("Testing kappa score = {}".format(test_kappp_score)) 294 | with open(out_dir+'/eval', 'a') as f: 295 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 296 | f.write("Validation kappa score = {}\n".format(eval_kappp_score)) 297 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 298 | f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 299 | f.write('*'*10) 300 | f.write('\n') 301 | #sys.stdout = orig_stdout 302 | #f.close() 303 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn import cross_validation 4 | from qwk import quadratic_weighted_kappa 5 | import tensorflow as tf 6 | from memn2n_kv import 
add_gradient_noise 7 | import time 8 | import os 9 | import sys 10 | import pandas as pd 11 | 12 | print 'start to load flags\n' 13 | 14 | # flags 15 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 16 | tf.flags.DEFINE_float("l2_lambda", 0.3, "Lambda for l2 loss.") 17 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 18 | tf.flags.DEFINE_float("max_grad_norm", 10.0, "Clip gradients to this norm.") 19 | tf.flags.DEFINE_float("keep_prob", 0.8, "Keep probability for dropout") 20 | tf.flags.DEFINE_integer("evaluation_interval", 3, "Evaluate and print results every x epochs") 21 | tf.flags.DEFINE_integer("batch_size", 32, "Batch size for training.") 22 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 23 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 24 | tf.flags.DEFINE_integer("hops", 3, "Number of hops in the Memory Network.") 25 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 26 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 27 | tf.flags.DEFINE_integer("essay_set_id", 1, "essay set id, 1 <= id <= 8") 28 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove (6, 42)") 29 | tf.flags.DEFINE_boolean("gated_addressing", False, "Simple gated addressing") 30 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 31 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 32 | # hyper-parameters 33 | FLAGS = tf.flags.FLAGS 34 | FLAGS._parse_flags() 35 | 36 | gated_addressing = FLAGS.gated_addressing 37 | essay_set_id = FLAGS.essay_set_id 38 | batch_size = FLAGS.batch_size 39 | embedding_size = FLAGS.embedding_size 40 | feature_size = FLAGS.feature_size 41 | l2_lambda = FLAGS.l2_lambda 42 | hops = FLAGS.hops 43 | reader = 'bow' 44 | epochs = FLAGS.epochs 45 | num_samples = FLAGS.num_samples 46 | num_tokens = FLAGS.token_num 47 | test_batch_size = batch_size 48 | random_state = 0 49 | if gated_addressing: 50 | from memn2n_g_kv import MemN2N_KV 51 | else: 52 | from memn2n_kv import MemN2N_KV 53 | # print flags info 54 | orig_stdout = sys.stdout 55 | timestamp = time.strftime("%b_%d_%Y_%H:%M:%S", time.localtime()) 56 | folder_name = 'essay_set_{}_{}_{}'.format(essay_set_id, num_samples, timestamp) 57 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 58 | if not os.path.exists(out_dir): 59 | os.makedirs(out_dir) 60 | 61 | # save output to a file 62 | #f = file(out_dir+'/out.txt', 'w') 63 | #sys.stdout = f 64 | print("Writing to {}\n".format(out_dir)) 65 | 66 | print("\nParameters:") 67 | for attr, value in sorted(FLAGS.__flags.items()): 68 | print("{}={}".format(attr.upper(), value)) 69 | print("") 70 | 71 | with open(out_dir+'/params', 'w') as f: 72 | for attr, value in sorted(FLAGS.__flags.items()): 73 | f.write("{}={}".format(attr.upper(), value)) 74 | f.write("\n") 75 | 76 | # hyper-parameters end here 77 | training_path = 'training_set_rel3.tsv' 78 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 79 | 80 | max_score = max(resolved_scores) 81 | min_score = min(resolved_scores) 82 | if essay_set_id == 7: 83 | min_score, max_score = 0, 30 84 | elif essay_set_id == 8: 85 | min_score, max_score = 0, 60 86 | 87 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 88 | with open(out_dir+'/params', 'a') as f: 89 | f.write('max_score 
is {} \t min_score is {} \n'.format(max_score, min_score)) 90 | 91 | # include max score 92 | score_range = range(min_score, max_score+1) 93 | 94 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 95 | 96 | # load glove 97 | word_idx, word2vec = data_utils.load_glove(num_tokens, dim=embedding_size) 98 | 99 | vocab_size = len(word_idx) + 1 100 | # stat info on data set 101 | 102 | sent_size_list = map(len, [essay for essay in essay_list]) 103 | max_sent_size = max(sent_size_list) 104 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 105 | 106 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 107 | with open(out_dir+'/params', 'a') as f: 108 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 109 | 110 | print 'The length of score range is {}'.format(len(score_range)) 111 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 112 | 113 | labeled_data = zip(E, resolved_scores, sent_size_list) 114 | 115 | # split the data on the fly 116 | #trainE, testE, train_scores, test_scores, train_sent_sizes, test_sent_sizes = cross_validation.train_test_split( 117 | # E, resolved_scores, sent_size_list, test_size=.2, random_state=random_state) 118 | 119 | #trainE, evalE, train_scores, eval_scores, train_sent_sizes, eval_sent_sizes = cross_validation.train_test_split( 120 | # trainE, train_scores, train_sent_sizes, test_size=.1, random_state=random_state) 121 | # split the data on the fly 122 | trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 123 | E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 124 | 125 | memory = [] 126 | memory_score = [] 127 | memory_sent_size = [] 128 | memory_essay_ids = [] 129 | # pick sampled essay for each score 130 | for i in score_range: 131 | # test point: limit the number of samples in memory for 8 132 | for j in range(num_samples): 133 | if i in train_scores: 134 | score_idx = train_scores.index(i) 135 | score = train_scores.pop(score_idx) 136 | essay = trainE.pop(score_idx) 137 | sent_size = sent_size_list.pop(score_idx) 138 | memory.append(essay) 139 | memory_score.append(score) 140 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 141 | memory_sent_size.append(sent_size) 142 | memory_size = len(memory) 143 | trainE, evalE, train_scores, eval_scores, train_essay_id, eval_essay_id = cross_validation.train_test_split( 144 | trainE, train_scores, train_essay_id, test_size=.2) 145 | # convert score to one hot encoding 146 | train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 147 | 148 | # data size 149 | n_train = len(trainE) 150 | n_test = len(testE) 151 | n_eval = len(evalE) 152 | 153 | print 'The size of training data: {}'.format(n_train) 154 | print 'The size of testing data: {}'.format(n_test) 155 | print 'The size of evaluation data: {}'.format(n_eval) 156 | with open(out_dir+'/params', 'a') as f: 157 | f.write('The size of training data: {}\n'.format(n_train)) 158 | f.write('The size of testing data: {}\n'.format(n_test)) 159 | f.write('The size of evaluation data: {}\n'.format(n_eval)) 160 | f.write('\nEssay scores in memory:\n{}'.format(memory_score)) 161 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 162 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 163 | f.write('\nEssay ids in evaluation:\n{}'.format(eval_essay_id)) 164 | f.write('\nEssay ids in 
testing:\n{}'.format(test_essay_id)) 165 | 166 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 167 | batches = [(start, end) for start, end in batches] 168 | 169 | with tf.Graph().as_default(): 170 | session_conf = tf.ConfigProto( 171 | allow_soft_placement=FLAGS.allow_soft_placement, 172 | log_device_placement=FLAGS.log_device_placement) 173 | 174 | global_step = tf.Variable(0, name="global_step", trainable=False) 175 | # decay learning rate 176 | starter_learning_rate = FLAGS.learning_rate 177 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 178 | 179 | # test point 180 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 181 | #optimizer = tf.train.AdagradOptimizer(learning_rate) 182 | best_kappa_so_far = 0.0 183 | with tf.Session(config=session_conf) as sess: 184 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 185 | memory_size, embedding_size, len(score_range), feature_size, hops, reader, l2_lambda) 186 | 187 | grads_and_vars = optimizer.compute_gradients(model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 188 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 189 | for g, v in grads_and_vars if g is not None] 190 | grads_and_vars = [(add_gradient_noise(g, 1e-3), v) for g, v in grads_and_vars] 191 | # test point 192 | #nil_grads_and_vars = [] 193 | #for g, v in grads_and_vars: 194 | # if v.name in model._nil_vars: 195 | # nil_grads_and_vars.append((zero_nil_slot(g), v)) 196 | # else: 197 | # nil_grads_and_vars.append((g, v)) 198 | 199 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 200 | 201 | sess.run(tf.initialize_all_variables(), feed_dict={model.w_placeholder: word2vec}) 202 | 203 | saver = tf.train.Saver(tf.all_variables()) 204 | 205 | def train_step(m, e, s, ma): 206 | start_time = time.time() 207 | feed_dict = { 208 | model._query: e, 209 | model._memory_key: m, 210 | model._score_encoding: s, 211 | model._mem_attention_encoding: ma, 212 | model.keep_prob: FLAGS.keep_prob 213 | #model.w_placeholder: word2vec 214 | } 215 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 216 | end_time = time.time() 217 | time_spent = end_time - start_time 218 | return predict_op, cost, time_spent 219 | 220 | def test_step(e, m): 221 | feed_dict = { 222 | model._query: e, 223 | model._memory_key: m, 224 | model.keep_prob: 1 225 | #model.w_placeholder: word2vec 226 | } 227 | preds, mem_attention_probs = sess.run([model.predict_op, model.mem_attention_probs], feed_dict) 228 | return preds, mem_attention_probs 229 | 230 | for i in range(1, epochs+1): 231 | train_cost = 0 232 | total_time = 0 233 | np.random.shuffle(batches) 234 | for start, end in batches: 235 | e = trainE[start:end] 236 | s = train_scores_encoding[start:end] 237 | s_num = train_scores[start:end] 238 | #batched_memory = [] 239 | # batch sized memory 240 | #for _ in range(len(e)): 241 | # batched_memory.append(memory) 242 | mem_atten_encoding = [] 243 | for ite in s_num: 244 | mem_encoding = np.zeros(memory_size) 245 | for j_idx, j in enumerate(memory_score): 246 | if j == ite: 247 | mem_encoding[j_idx] = 1 248 | mem_atten_encoding.append(mem_encoding) 249 | batched_memory = [memory] * (end-start) 250 | _, cost, time_spent = train_step(batched_memory, e, s, mem_atten_encoding) 251 | total_time += time_spent 252 | 
train_cost += cost 253 | print 'Finish epoch {}, total training cost is {}, time spent is {}'.format(i, train_cost, total_time) 254 | # evaluation 255 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 256 | # test on training data 257 | train_preds = [] 258 | for start in range(0, n_train, test_batch_size): 259 | end = min(n_train, start+test_batch_size) 260 | 261 | #batched_memory = [] 262 | #for _ in range(end-start): 263 | # batched_memory.append(memory) 264 | batched_memory = [memory] * (end-start) 265 | preds, _ = test_step(trainE[start:end], batched_memory) 266 | for ite in preds: 267 | train_preds.append(ite) 268 | train_preds = np.add(train_preds, min_score) 269 | #train_kappp_score = kappa(train_scores, train_preds, 'quadratic') 270 | train_kappp_score = quadratic_weighted_kappa( 271 | train_scores, train_preds, min_score, max_score) 272 | # test on eval data 273 | eval_preds = [] 274 | for start in range(0, n_eval, test_batch_size): 275 | end = min(n_eval, start+test_batch_size) 276 | 277 | #batched_memory = [] 278 | #for _ in range(end-start): 279 | # batched_memory.append(memory) 280 | batched_memory = [memory] * (end-start) 281 | preds, _ = test_step(evalE[start:end], batched_memory) 282 | for ite in preds: 283 | eval_preds.append(ite) 284 | 285 | eval_preds = np.add(eval_preds, min_score) 286 | #eval_kappp_score = kappa(eval_scores, eval_preds, 'quadratic') 287 | eval_kappp_score = quadratic_weighted_kappa( 288 | eval_scores, eval_preds, min_score, max_score) 289 | 290 | # test on test data 291 | test_preds = [] 292 | test_atten_probs = [] 293 | for start in range(0, n_test, test_batch_size): 294 | end = min(n_test, start+test_batch_size) 295 | 296 | #batched_memory = [] 297 | #for _ in range(end-start): 298 | # batched_memory.append(memory) 299 | batched_memory = [memory] * (end-start) 300 | preds, mem_attention_probs = test_step(testE[start:end], batched_memory) 301 | for ite in preds: 302 | test_preds.append(ite) 303 | for ite in mem_attention_probs: 304 | test_atten_probs.append(ite) 305 | test_preds = np.add(test_preds, min_score) 306 | #test_kappp_score = kappa(test_scores, test_preds, 'quadratic') 307 | test_kappp_score = quadratic_weighted_kappa( 308 | test_scores, test_preds, min_score, max_score) 309 | stat_dict = {'essay_id': test_essay_id, 'score': test_scores, 'pred_score': test_preds} 310 | stat_df = pd.DataFrame(stat_dict) 311 | # save the model if it gets best kappa 312 | if(test_kappp_score > best_kappa_so_far): 313 | best_kappa_so_far = test_kappp_score 314 | # stats on test 315 | stat_df.to_csv(out_dir+'/stat') 316 | with open(out_dir+'/mem_atten', 'a') as f: 317 | for idx, ite in enumerate(test_essay_id): 318 | f.write('{}\n'.format(ite)) 319 | f.write('{}\n'.format(test_atten_probs[idx])) 320 | #saver.save(sess, out_dir+'/checkpoints', global_step) 321 | print("Training kappa score = {}".format(train_kappp_score)) 322 | print("Validation kappa score = {}".format(eval_kappp_score)) 323 | print("Testing kappa score = {}".format(test_kappp_score)) 324 | with open(out_dir+'/eval', 'a') as f: 325 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 326 | f.write("Validation kappa score = {}\n".format(eval_kappp_score)) 327 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 328 | f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 329 | f.write('*'*10) 330 | f.write('\n') 331 | #sys.stdout = orig_stdout 332 | #f.close() 333 | 
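Both train.py above and cv_train.py below score their predictions with `quadratic_weighted_kappa` imported from qwk.py, which is not reproduced in this dump. The sketch below is illustrative only and is not the repo's qwk.py; it is a minimal NumPy restatement of the standard quadratic weighted kappa definition for integer ratings in [min_rating, max_rating].

```
# qwk_sketch.py -- illustrative only, NOT the repo's qwk.py.
# Standard quadratic weighted kappa for integer ratings in [min_rating, max_rating].
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    # shift ratings so they index rows/columns of the agreement matrices
    rater_a = np.asarray(rater_a, dtype=int) - min_rating
    rater_b = np.asarray(rater_b, dtype=int) - min_rating
    num_ratings = max_rating - min_rating + 1
    # observed agreement matrix
    observed = np.zeros((num_ratings, num_ratings))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1
    # expected matrix: outer product of the two marginal histograms,
    # scaled so it sums to the same total as the observed matrix
    hist_a = np.bincount(rater_a, minlength=num_ratings)
    hist_b = np.bincount(rater_b, minlength=num_ratings)
    expected = np.outer(hist_a, hist_b) / float(len(rater_a))
    # quadratic disagreement weights: 0 on the diagonal, growing with distance
    idx = np.arange(num_ratings)
    weights = (idx[:, None] - idx[None, :]) ** 2 / float((num_ratings - 1) ** 2)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

if __name__ == '__main__':
    # perfect agreement yields 1.0
    print(quadratic_weighted_kappa([2, 4, 6, 8], [2, 4, 6, 8], 2, 12))
```

Perfect agreement gives 1.0 and chance-level agreement gives roughly 0.0, which is how the per-epoch training/validation/testing kappa lines printed by both scripts should be read.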
-------------------------------------------------------------------------------- /cv_train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn.model_selection import KFold 4 | from qwk import quadratic_weighted_kappa 5 | import tensorflow as tf 6 | import time 7 | import os 8 | import sys 9 | import pandas as pd 10 | 11 | print 'start to load flags\n' 12 | 13 | # flags 14 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 15 | tf.flags.DEFINE_float("l2_lambda", 0.3, "Lambda for l2 loss.") 16 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 17 | tf.flags.DEFINE_float("max_grad_norm", 10.0, "Clip gradients to this norm.") 18 | tf.flags.DEFINE_float("keep_prob", 0.9, "Keep probability for dropout") 19 | tf.flags.DEFINE_integer("evaluation_interval", 2, "Evaluate and print results every x epochs") 20 | tf.flags.DEFINE_integer("batch_size", 15, "Batch size for training.") 21 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 22 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 23 | tf.flags.DEFINE_integer("hops", 3, "Number of hops in the Memory Network.") 24 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 25 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 26 | tf.flags.DEFINE_integer("essay_set_id", 1, "essay set id, 1 <= id <= 8") 27 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove (6, 42)") 28 | tf.flags.DEFINE_boolean("gated_addressing", False, "Simple gated addressing") 29 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 30 | tf.flags.DEFINE_boolean("is_regression", False, "The output is regression or classification") 31 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 32 | # hyper-parameters 33 | FLAGS = tf.flags.FLAGS 34 | 35 | early_stop_count = 0 36 | max_step_count = 10 37 | is_regression = FLAGS.is_regression 38 | gated_addressing = FLAGS.gated_addressing 39 | essay_set_id = FLAGS.essay_set_id 40 | batch_size = FLAGS.batch_size 41 | embedding_size = FLAGS.embedding_size 42 | feature_size = FLAGS.feature_size 43 | l2_lambda = FLAGS.l2_lambda 44 | hops = FLAGS.hops 45 | reader = 'bow' # gru may not work 46 | epochs = FLAGS.epochs 47 | num_samples = FLAGS.num_samples 48 | num_tokens = FLAGS.token_num 49 | test_batch_size = batch_size 50 | random_state = 0 51 | if is_regression: 52 | from memn2n_kv_regression import MemN2N_KV 53 | else: 54 | from memn2n_kv import MemN2N_KV 55 | # print flags info 56 | orig_stdout = sys.stdout 57 | timestamp = time.strftime("%b_%d_%Y_%H:%M:%S", time.localtime()) 58 | folder_name = 'essay_set_{}_cv_{}_{}'.format(essay_set_id, num_samples, timestamp) 59 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 60 | if not os.path.exists(out_dir): 61 | os.makedirs(out_dir) 62 | 63 | # save output to a file 64 | #f = file(out_dir+'/out.txt', 'w') 65 | #sys.stdout = f 66 | print("Writing to {}\n".format(out_dir)) 67 | 68 | print("\nParameters:") 69 | for attr, value in sorted(FLAGS.__flags.items()): 70 | print("{}={}".format(attr.upper(), value)) 71 | print("") 72 | 73 | with open(out_dir+'/params', 'w') as f: 74 | for attr, value in sorted(FLAGS.__flags.items()): 75 | f.write("{}={}".format(attr.upper(), value)) 76 | f.write("\n") 77 | 78 | # hyper-parameters 
end here 79 | training_path = 'training_set_rel3.tsv' 80 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 81 | 82 | max_score = max(resolved_scores) 83 | min_score = min(resolved_scores) 84 | if essay_set_id == 7: 85 | min_score, max_score = 0, 30 86 | elif essay_set_id == 8: 87 | min_score, max_score = 0, 60 88 | 89 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 90 | with open(out_dir+'/params', 'a') as f: 91 | f.write('max_score is {} \t min_score is {} \n'.format(max_score, min_score)) 92 | 93 | # include max score 94 | score_range = range(min_score, max_score+1) 95 | 96 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 97 | 98 | # load glove 99 | word_idx, word2vec = data_utils.load_glove(num_tokens, dim=embedding_size) 100 | 101 | vocab_size = len(word_idx) + 1 102 | # stat info on data set 103 | 104 | sent_size_list = map(len, [essay for essay in essay_list]) 105 | max_sent_size = max(sent_size_list) 106 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 107 | 108 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 109 | with open(out_dir+'/params', 'a') as f: 110 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 111 | 112 | print 'The length of score range is {}'.format(len(score_range)) 113 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 114 | 115 | labeled_data = zip(E, resolved_scores, sent_size_list) 116 | 117 | # split the data on the fly 118 | #trainE, testE, train_scores, test_scores, train_sent_sizes, test_sent_sizes = cross_validation.train_test_split( 119 | # E, resolved_scores, sent_size_list, test_size=.2, random_state=random_state) 120 | 121 | #trainE, evalE, train_scores, eval_scores, train_sent_sizes, eval_sent_sizes = cross_validation.train_test_split( 122 | # trainE, train_scores, train_sent_sizes, test_size=.1, random_state=random_state) 123 | # split the data on the fly 124 | 125 | def train_step(m, e, s, ma): 126 | start_time = time.time() 127 | feed_dict = { 128 | model._query: e, 129 | model._memory_key: m, 130 | model._score_encoding: s, 131 | model._mem_attention_encoding: ma, 132 | model.keep_prob: FLAGS.keep_prob 133 | #model.w_placeholder: word2vec 134 | } 135 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 136 | end_time = time.time() 137 | time_spent = end_time - start_time 138 | return predict_op, cost, time_spent 139 | 140 | def test_step(e, m): 141 | feed_dict = { 142 | model._query: e, 143 | model._memory_key: m, 144 | model.keep_prob: 1 145 | #model.w_placeholder: word2vec 146 | } 147 | preds, mem_attention_probs = sess.run([model.predict_op, model.mem_attention_probs], feed_dict) 148 | if is_regression: 149 | preds = np.clip(np.round(preds), min_score, max_score) 150 | return preds, mem_attention_probs 151 | else: 152 | return preds, mem_attention_probs 153 | 154 | fold_count = 0 155 | kf = KFold(n_splits=5, random_state=random_state) 156 | best_kappa_scores = [] 157 | for train_index, test_index in kf.split(essay_id): 158 | early_stop_count = 0 159 | fold_count += 1 160 | trainE = [] 161 | testE = [] 162 | train_scores = [] 163 | test_scores = [] 164 | train_essay_id = [] 165 | test_essay_id = [] 166 | 167 | for ite in train_index: 168 | trainE.append(E[ite]) 169 | train_scores.append(resolved_scores[ite]) 170 | train_essay_id.append(essay_id[ite]) 171 | for 
ite in test_index: 172 | testE.append(E[ite]) 173 | test_scores.append(resolved_scores[ite]) 174 | test_essay_id.append(essay_id[ite]) 175 | 176 | #trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 177 | # E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 178 | 179 | memory = [] 180 | memory_score = [] 181 | memory_sent_size = [] 182 | memory_essay_ids = [] 183 | # pick sampled essay for each score 184 | for i in score_range: 185 | # test point: limit the number of samples in memory for 8 186 | for j in range(num_samples): 187 | if i in train_scores: 188 | score_idx = train_scores.index(i) 189 | score = train_scores.pop(score_idx) 190 | essay = trainE.pop(score_idx) 191 | sent_size = sent_size_list.pop(score_idx) 192 | memory.append(essay) 193 | memory_score.append(score) 194 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 195 | memory_sent_size.append(sent_size) 196 | memory_size = len(memory) 197 | if is_regression: 198 | # bad naming 199 | train_scores_encoding = train_scores 200 | else: 201 | train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 202 | 203 | # data size 204 | n_train = len(trainE) 205 | n_test = len(testE) 206 | 207 | print 'The size of training data: {}'.format(n_train) 208 | print 'The size of testing data: {}'.format(n_test) 209 | with open(out_dir+'/params{}'.format(fold_count), 'a') as f: 210 | f.write('The size of training data: {}\n'.format(n_train)) 211 | f.write('The size of testing data: {}\n'.format(n_test)) 212 | f.write('\nEssay scores in memory:\n{}'.format(memory_score)) 213 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 214 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 215 | f.write('\nEssay ids in testing:\n{}'.format(test_essay_id)) 216 | 217 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 218 | batches = [(start, end) for start, end in batches] 219 | 220 | with tf.Graph().as_default(): 221 | session_conf = tf.ConfigProto( 222 | allow_soft_placement=FLAGS.allow_soft_placement, 223 | log_device_placement=FLAGS.log_device_placement) 224 | 225 | global_step = tf.Variable(0, name="global_step", trainable=False) 226 | # decay learning rate 227 | starter_learning_rate = FLAGS.learning_rate 228 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 229 | 230 | # test point 231 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 232 | #optimizer = tf.train.AdagradOptimizer(learning_rate) 233 | best_kappa_so_far = 0.0 234 | with tf.Session(config=session_conf) as sess: 235 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 236 | memory_size, embedding_size, len(score_range), feature_size, hops, reader, l2_lambda) 237 | 238 | grads_and_vars = optimizer.compute_gradients(model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 239 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 240 | for g, v in grads_and_vars if g is not None] 241 | #grads_and_vars = [(add_gradient_noise(g, 1e-4), v) for g, v in grads_and_vars] 242 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 243 | sess.run(tf.global_variables_initializer(), feed_dict={model.w_placeholder: word2vec}) 244 | saver = tf.train.Saver(tf.global_variables()) 245 | 246 | for i in range(1, epochs+1): 247 | train_cost = 
0 248 | total_time = 0 249 | np.random.shuffle(batches) 250 | for start, end in batches: 251 | e = trainE[start:end] 252 | s = train_scores_encoding[start:end] 253 | s_num = train_scores[start:end] 254 | #batched_memory = [] 255 | # batch sized memory 256 | #for _ in range(len(e)): 257 | # batched_memory.append(memory) 258 | mem_atten_encoding = [] 259 | for ite in s_num: 260 | mem_encoding = np.zeros(memory_size) 261 | for j_idx, j in enumerate(memory_score): 262 | if j == ite: 263 | mem_encoding[j_idx] = 1 264 | mem_atten_encoding.append(mem_encoding) 265 | batched_memory = [memory] * (end-start) 266 | _, cost, time_spent = train_step(batched_memory, e, s, mem_atten_encoding) 267 | total_time += time_spent 268 | train_cost += cost 269 | print 'Finish epoch {}, total training cost is {}, time spent is {}'.format(i, train_cost, total_time) 270 | # evaluation 271 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 272 | # test on training data 273 | train_preds = [] 274 | for start in range(0, n_train, test_batch_size): 275 | end = min(n_train, start+test_batch_size) 276 | 277 | #batched_memory = [] 278 | #for _ in range(end-start): 279 | # batched_memory.append(memory) 280 | batched_memory = [memory] * (end-start) 281 | preds, _ = test_step(trainE[start:end], batched_memory) 282 | if type(preds) is np.float32: 283 | train_preds.append(preds) 284 | else: 285 | for ite in preds: 286 | train_preds.append(ite) 287 | if not is_regression: 288 | train_preds = np.add(train_preds, min_score) 289 | #train_kappp_score = kappa(train_scores, train_preds, 'quadratic') 290 | train_kappp_score = quadratic_weighted_kappa( 291 | train_scores, train_preds, min_score, max_score) 292 | # test on test data 293 | test_preds = [] 294 | test_atten_probs = [] 295 | for start in range(0, n_test, test_batch_size): 296 | end = min(n_test, start+test_batch_size) 297 | 298 | #batched_memory = [] 299 | #for _ in range(end-start): 300 | # batched_memory.append(memory) 301 | batched_memory = [memory] * (end-start) 302 | preds, mem_attention_probs = test_step(testE[start:end], batched_memory) 303 | if type(preds) is np.float32: 304 | test_preds.append(preds) 305 | else: 306 | for ite in preds: 307 | test_preds.append(ite) 308 | for ite in mem_attention_probs: 309 | test_atten_probs.append(ite) 310 | if not is_regression: 311 | test_preds = np.add(test_preds, min_score) 312 | #test_kappp_score = kappa(test_scores, test_preds, 'quadratic') 313 | test_kappp_score = quadratic_weighted_kappa( 314 | test_scores, test_preds, min_score, max_score) 315 | stat_dict = {'essay_id': test_essay_id, 'score': test_scores, 'pred_score': test_preds} 316 | stat_df = pd.DataFrame(stat_dict) 317 | # save the model if it gets best kappa 318 | if(test_kappp_score > best_kappa_so_far): 319 | early_stop_count = 0 320 | best_kappa_so_far = test_kappp_score 321 | # stats on test 322 | stat_df.to_csv(out_dir+'/stat') 323 | with open(out_dir+'/mem_atten', 'a') as f: 324 | for idx, ite in enumerate(test_essay_id): 325 | f.write('{}\n'.format(ite)) 326 | f.write('{}\n'.format(test_atten_probs[idx])) 327 | #saver.save(sess, out_dir+'/checkpoints', global_step) 328 | else: 329 | early_stop_count += 1 330 | print("Training kappa score = {}".format(train_kappp_score)) 331 | print("Testing kappa score = {}".format(test_kappp_score)) 332 | with open(out_dir+'/eval{}'.format(fold_count), 'a') as f: 333 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 334 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 335 | 
f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 336 | f.write('*'*10) 337 | f.write('\n') 338 | if early_stop_count > max_step_count: 339 | break 340 | best_kappa_scores.append(best_kappa_so_far) 341 | 342 | with open(out_dir+'/eval'.format(fold_count), 'a') as f: 343 | f.write('5 fold cv {}\n'.format(best_kappa_scores)) 344 | f.write('final result is {}'.format(np.mean(np.array(best_kappa_scores)))) 345 | 346 | #sys.stdout = orig_stdout 347 | #f.close() 348 | --------------------------------------------------------------------------------