├── glove └── glove-files-go-here ├── AES-Model.png ├── train_all_sets.sh ├── LICENSE ├── README.md ├── data_utils.py ├── qwk.py ├── memn2n_kv_regression.py ├── memn2n_kv.py ├── regression_train.py ├── train.py └── cv_train.py /glove/glove-files-go-here: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /AES-Model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binhetech/automated-essay-grading/master/AES-Model.png -------------------------------------------------------------------------------- /train_all_sets.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | for ((i=1; i<=8; i++)) 3 | do 4 | echo $i 5 | python train.py --essay_set_id $i --num_samples 2 6 | done 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Siyuan Zhao 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Automated Essay Grading 2 | Source code for the paper [A Memory-Augmented Neural Model for Automated Grading](http://dl.acm.org/citation.cfm?doid=3051457.3053982) in L@S 2017. 3 | 4 | ![Model Structure](AES-Model.png) 5 | 6 | The dataset comes from Kaggle ASAP competition. You can download the data from the link below. 7 | 8 | [https://www.kaggle.com/c/asap-aes/data](https://www.kaggle.com/c/asap-aes/data) 9 | 10 | Glove embeddings are used in this work. Specifically, 42B 300d is used to get the best results. You can download the embeddings from the link below. 11 | 12 | https://nlp.stanford.edu/projects/glove/ 13 | 14 | ### Get Started 15 | 16 | ``` 17 | git clone https://github.com/siyuanzhao/automated-essay-grading.git 18 | ``` 19 | 20 | * Download training data file 'training_set_rel3.tsv' from [Kaggle](https://www.kaggle.com/c/asap-aes/data) and put it under the root folder of this repo. 21 | 22 | * Download 'glove.42B.300d.zip' from [https://nlp.stanford.edu/projects/glove/](https://nlp.stanford.edu/projects/glove/) and unzip all files into 'glove/' folder. 
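If you want to double-check the layout before training, a short optional snippet like the one below can help (it is not part of the repository; it only assumes the file names from the two steps above and reuses the same pandas call that data_utils.py makes):

```
# optional sanity check, run from the repo root after the two download steps
import os
import pandas as pd

assert os.path.exists('training_set_rel3.tsv'), 'put training_set_rel3.tsv in the repo root'
assert os.path.exists('glove/glove.42B.300d.txt'), 'unzip glove.42B.300d.zip into glove/'

# same read call as data_utils.load_training_data
df = pd.read_csv('training_set_rel3.tsv', delimiter='\t')
print(df.groupby('essay_set')['domain1_score'].agg(['count', 'min', 'max']))
```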
23 | 24 | ### Requirements 25 | * TensorFlow 1.10 26 | * scikit-learn 0.19 27 | * six 1.10.0 28 | 29 | ### Usage 30 | ``` 31 | # Train the model on an essay set 32 | python cv_train.py --essay_set_id <id> 33 | ``` 34 | 35 | There are several flags within cv_train.py. Below is an example of training the model on essay set 1 with a specific learning rate and number of epochs. 36 | 37 | ``` 38 | python cv_train.py --essay_set_id 1 --learning_rate 0.005 --epochs 200 39 | ``` 40 | Check all available flags with the following command. 41 | 42 | ``` 43 | python cv_train.py -h 44 | ``` 45 | 46 | **Note**: The model is trained on the training data with 5-fold cross validation. By default, the output layer of the model is a classification layer. There is another model whose output layer is a regression layer in *memn2n_kv_regression.py*. To train the model with the regression output layer, set the flag is_regression to True. For example, 47 | 48 | ``` 49 | python cv_train.py --essay_set_id 1 --learning_rate 0.005 --epochs 200 --is_regression True 50 | ``` 51 | 52 | 53 | -------------------------------------------------------------------------------- /data_utils.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | import numpy as np 4 | import itertools 5 | import pandas as pd 6 | from collections import Counter 7 | 8 | def load_training_data(training_path, essay_set=1): 9 | training_df = pd.read_csv(training_path, delimiter='\t') 10 | # resolved score for the selected essay set 11 | resolved_score = training_df[training_df['essay_set'] == essay_set]['domain1_score'] 12 | essay_ids = training_df[training_df['essay_set'] == essay_set]['essay_id'] 13 | essays = training_df[training_df['essay_set'] == essay_set]['essay'] 14 | essay_list = [] 15 | # turn an essay into a list of words 16 | for idx, essay in essays.iteritems(): 17 | essay = clean_str(essay) 18 | #essay_list.append([w for w in tokenize(essay) if is_ascii(w)]) 19 | essay_list.append(tokenize(essay)) 20 | return essay_list, resolved_score.tolist(), essay_ids.tolist() 21 | 22 | def load_glove(token_num=6, dim=50): 23 | word2vec = [] 24 | word_idx = {} 25 | # first word is nil 26 | word2vec.append([0]*dim) 27 | count = 1 28 | with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), "glove/glove."+str(token_num)+ 29 | "B." + str(dim) + "d.txt")) as f: 30 | for line in f: 31 | l = line.split() 32 | word = l[0] 33 | vector = map(float, l[1:]) 34 | word_idx[word] = count 35 | word2vec.append(vector) 36 | count += 1 37 | 38 | print "==> glove is loaded" 39 | 40 | return word_idx, word2vec 41 | 42 | def tokenize(sent): 43 | '''Return the tokens of a sentence including punctuation. 44 | >>> tokenize('Bob dropped the apple. Where is the apple?') 45 | ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?'] 46 | >>> tokenize('I don't know') 47 | ['I', 'don', '\'', 't', 'know'] 48 | ''' 49 | return [x.strip() for x in re.split('(\W+)?', sent) if x.strip()] 50 | 51 | def clean_str(string): 52 | """ 53 | Tokenization/string cleaning for all datasets except for SST.
54 | Original taken from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py 55 | """ 56 | string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string) 57 | string = re.sub(r"\'s", " \'s", string) 58 | string = re.sub(r"\'ve", " \'ve", string) 59 | string = re.sub(r"n\'t", " n\'t", string) 60 | string = re.sub(r"\'re", " \'re", string) 61 | string = re.sub(r"\'d", " \'d", string) 62 | string = re.sub(r"\'ll", " \'ll", string) 63 | string = re.sub(r",", " , ", string) 64 | string = re.sub(r"!", " ! ", string) 65 | string = re.sub(r"\(", " ( ", string) 66 | string = re.sub(r"\)", " ) ", string) 67 | string = re.sub(r"\?", " ? ", string) 68 | string = re.sub(r"\s{2,}", " ", string) 69 | 70 | return string.strip().lower() 71 | 72 | def build_vocab(sentences, vocab_limit): 73 | """ 74 | Builds a vocabulary mapping from word to index based on the sentences. 75 | Returns vocabulary mapping and inverse vocabulary mapping. 76 | """ 77 | # Build vocabulary 78 | word_counts = Counter(itertools.chain(*sentences)) 79 | print 'Total size of vocab is {}'.format(len(word_counts.most_common())) 80 | # Mapping from index to word 81 | # vocabulary_inv = [x[0] for x in word_counts.most_common(vocab_limit)] 82 | vocabulary_inv = [x[0] for x in word_counts.most_common(vocab_limit)] 83 | 84 | vocabulary_inv = list(sorted(vocabulary_inv)) 85 | # Mapping from word to index 86 | vocabulary = {x: i+1 for i, x in enumerate(vocabulary_inv)} 87 | return [vocabulary, vocabulary_inv] 88 | 89 | # data is DataFrame 90 | def vectorize_data(data, word_idx, sentence_size): 91 | E = [] 92 | for essay in data: 93 | ls = max(0, sentence_size - len(essay)) 94 | wl = [] 95 | for w in essay: 96 | if w in word_idx: 97 | wl.append(word_idx[w]) 98 | else: 99 | #print '{} is not in vocab'.format(w) 100 | wl.append(0) 101 | wl += [0]*ls 102 | E.append(wl) 103 | return E 104 | -------------------------------------------------------------------------------- /qwk.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def confusion_matrix(rater_a, rater_b, min_rating=None, max_rating=None): 5 | """ 6 | Returns the confusion matrix between rater's ratings 7 | """ 8 | assert(len(rater_a) == len(rater_b)) 9 | if min_rating is None: 10 | min_rating = min(rater_a + rater_b) 11 | if max_rating is None: 12 | max_rating = max(rater_a + rater_b) 13 | num_ratings = int(max_rating - min_rating + 1) 14 | conf_mat = [[0 for i in range(num_ratings)] 15 | for j in range(num_ratings)] 16 | for a, b in zip(rater_a, rater_b): 17 | conf_mat[a - min_rating][b - min_rating] += 1 18 | return conf_mat 19 | 20 | 21 | def histogram(ratings, min_rating=None, max_rating=None): 22 | """ 23 | Returns the counts of each type of rating that a rater made 24 | """ 25 | if min_rating is None: 26 | min_rating = min(ratings) 27 | if max_rating is None: 28 | max_rating = max(ratings) 29 | num_ratings = int(max_rating - min_rating + 1) 30 | hist_ratings = [0 for x in range(num_ratings)] 31 | for r in ratings: 32 | hist_ratings[r - min_rating] += 1 33 | return hist_ratings 34 | 35 | 36 | def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None): 37 | """ 38 | Calculates the quadratic weighted kappa 39 | quadratic_weighted_kappa calculates the quadratic weighted kappa 40 | value, which is a measure of inter-rater agreement between two raters 41 | that provide discrete numeric ratings. 
Potential values range from -1 42 | (representing complete disagreement) to 1 (representing complete 43 | agreement). A kappa value of 0 is expected if all agreement is due to 44 | chance. 45 | quadratic_weighted_kappa(rater_a, rater_b), where rater_a and rater_b 46 | each correspond to a list of integer ratings. These lists must have the 47 | same length. 48 | The ratings should be integers, and it is assumed that they contain 49 | the complete range of possible ratings. 50 | quadratic_weighted_kappa(X, min_rating, max_rating), where min_rating 51 | is the minimum possible rating, and max_rating is the maximum possible 52 | rating 53 | """ 54 | rater_a = np.array(rater_a, dtype=int) 55 | rater_b = np.array(rater_b, dtype=int) 56 | assert(len(rater_a) == len(rater_b)) 57 | if min_rating is None: 58 | min_rating = min(min(rater_a), min(rater_b)) 59 | if max_rating is None: 60 | max_rating = max(max(rater_a), max(rater_b)) 61 | conf_mat = confusion_matrix(rater_a, rater_b, 62 | min_rating, max_rating) 63 | num_ratings = len(conf_mat) 64 | num_scored_items = float(len(rater_a)) 65 | 66 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 67 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 68 | 69 | numerator = 0.0 70 | denominator = 0.0 71 | 72 | for i in range(num_ratings): 73 | for j in range(num_ratings): 74 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 75 | / num_scored_items) 76 | d = pow(i - j, 2.0) / pow(num_ratings - 1, 2.0) 77 | numerator += d * conf_mat[i][j] / num_scored_items 78 | denominator += d * expected_count / num_scored_items 79 | 80 | return 1.0 - numerator / denominator 81 | 82 | 83 | def linear_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None): 84 | """ 85 | Calculates the linear weighted kappa 86 | linear_weighted_kappa calculates the linear weighted kappa 87 | value, which is a measure of inter-rater agreement between two raters 88 | that provide discrete numeric ratings. Potential values range from -1 89 | (representing complete disagreement) to 1 (representing complete 90 | agreement). A kappa value of 0 is expected if all agreement is due to 91 | chance. 92 | linear_weighted_kappa(rater_a, rater_b), where rater_a and rater_b 93 | each correspond to a list of integer ratings. These lists must have the 94 | same length. 95 | The ratings should be integers, and it is assumed that they contain 96 | the complete range of possible ratings. 
97 | linear_weighted_kappa(X, min_rating, max_rating), where min_rating 98 | is the minimum possible rating, and max_rating is the maximum possible 99 | rating 100 | """ 101 | assert(len(rater_a) == len(rater_b)) 102 | if min_rating is None: 103 | min_rating = min(rater_a + rater_b) 104 | if max_rating is None: 105 | max_rating = max(rater_a + rater_b) 106 | conf_mat = confusion_matrix(rater_a, rater_b, 107 | min_rating, max_rating) 108 | num_ratings = len(conf_mat) 109 | num_scored_items = float(len(rater_a)) 110 | 111 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 112 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 113 | 114 | numerator = 0.0 115 | denominator = 0.0 116 | 117 | for i in range(num_ratings): 118 | for j in range(num_ratings): 119 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 120 | / num_scored_items) 121 | d = abs(i - j) / float(num_ratings - 1) 122 | numerator += d * conf_mat[i][j] / num_scored_items 123 | denominator += d * expected_count / num_scored_items 124 | 125 | return 1.0 - numerator / denominator 126 | 127 | 128 | def kappa(rater_a, rater_b, min_rating=None, max_rating=None): 129 | """ 130 | Calculates the kappa 131 | kappa calculates the kappa 132 | value, which is a measure of inter-rater agreement between two raters 133 | that provide discrete numeric ratings. Potential values range from -1 134 | (representing complete disagreement) to 1 (representing complete 135 | agreement). A kappa value of 0 is expected if all agreement is due to 136 | chance. 137 | kappa(rater_a, rater_b), where rater_a and rater_b 138 | each correspond to a list of integer ratings. These lists must have the 139 | same length. 140 | The ratings should be integers, and it is assumed that they contain 141 | the complete range of possible ratings. 142 | kappa(X, min_rating, max_rating), where min_rating 143 | is the minimum possible rating, and max_rating is the maximum possible 144 | rating 145 | """ 146 | assert(len(rater_a) == len(rater_b)) 147 | if min_rating is None: 148 | min_rating = min(rater_a + rater_b) 149 | if max_rating is None: 150 | max_rating = max(rater_a + rater_b) 151 | conf_mat = confusion_matrix(rater_a, rater_b, 152 | min_rating, max_rating) 153 | num_ratings = len(conf_mat) 154 | num_scored_items = float(len(rater_a)) 155 | 156 | hist_rater_a = histogram(rater_a, min_rating, max_rating) 157 | hist_rater_b = histogram(rater_b, min_rating, max_rating) 158 | 159 | numerator = 0.0 160 | denominator = 0.0 161 | 162 | for i in range(num_ratings): 163 | for j in range(num_ratings): 164 | expected_count = (hist_rater_a[i] * hist_rater_b[j] 165 | / num_scored_items) 166 | if i == j: 167 | d = 0.0 168 | else: 169 | d = 1.0 170 | numerator += d * conf_mat[i][j] / num_scored_items 171 | denominator += d * expected_count / num_scored_items 172 | 173 | return 1.0 - numerator / denominator 174 | 175 | 176 | def mean_quadratic_weighted_kappa(kappas, weights=None): 177 | """ 178 | Calculates the mean of the quadratic 179 | weighted kappas after applying Fisher's r-to-z transform, which is 180 | approximately a variance-stabilizing transformation. This 181 | transformation is undefined if one of the kappas is 1.0, so all kappa 182 | values are capped in the range (-0.999, 0.999). The reverse 183 | transformation is then applied before returning the result. 
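For example (illustrative numbers, not taken from this code base): kappas of 0.5 and 0.7 with equal weights map to z-values of roughly 0.549 and 0.867, whose mean of about 0.708 transforms back to a mean kappa of roughly 0.61, slightly above the plain average of 0.6 because the z-transform stretches larger kappa values.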
184 | mean_quadratic_weighted_kappa(kappas), where kappas is a vector of 185 | kappa values 186 | mean_quadratic_weighted_kappa(kappas, weights), where weights is a vector 187 | of weights that is the same size as kappas. Weights are applied in the 188 | z-space 189 | """ 190 | kappas = np.array(kappas, dtype=float) 191 | if weights is None: 192 | weights = np.ones(np.shape(kappas)) 193 | else: 194 | weights = weights / np.mean(weights) 195 | 196 | # ensure that kappas are in the range [-.999, .999] 197 | kappas = np.array([min(x, .999) for x in kappas]) 198 | kappas = np.array([max(x, -.999) for x in kappas]) 199 | 200 | z = 0.5 * np.log((1 + kappas) / (1 - kappas)) * weights 201 | z = np.mean(z) 202 | return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1) 203 | 204 | 205 | def weighted_mean_quadratic_weighted_kappa(solution, submission): 206 | predicted_score = submission[submission.columns[-1]].copy() 207 | predicted_score.name = "predicted_score" 208 | if predicted_score.index[0] == 0: 209 | predicted_score = predicted_score[:len(solution)] 210 | predicted_score.index = solution.index 211 | combined = solution.join(predicted_score, how="left") 212 | groups = combined.groupby(by="essay_set") 213 | kappas = [quadratic_weighted_kappa(group[1]["essay_score"], group[1]["predicted_score"]) for group in groups] 214 | weights = [group[1]["essay_weight"].irow(0) for group in groups] 215 | return mean_quadratic_weighted_kappa(kappas, weights=weights) 216 | -------------------------------------------------------------------------------- /memn2n_kv_regression.py: -------------------------------------------------------------------------------- 1 | """Key Value Memory Networks with GRU reader. 2 | The implementation is based on https://arxiv.org/abs/1606.03126 3 | The implementation is based on http://arxiv.org/abs/1503.08895 [1] 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | 8 | import tensorflow as tf 9 | from six.moves import range 10 | import numpy as np 11 | # from attention_reader import Attention_Reader 12 | 13 | def position_encoding(sentence_size, embedding_size): 14 | """ 15 | Position Encoding described in section 4.1 [1] 16 | """ 17 | encoding = np.ones((embedding_size, sentence_size), dtype=np.float32) 18 | ls = sentence_size+1 19 | le = embedding_size+1 20 | for i in range(1, le): 21 | for j in range(1, ls): 22 | encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2) 23 | encoding = 1 + 4 * encoding / embedding_size / sentence_size 24 | return np.transpose(encoding) 25 | 26 | def add_gradient_noise(t, stddev=1e-3, name=None): 27 | """ 28 | Adds gradient noise as described in http://arxiv.org/abs/1511.06807 [2]. 29 | 30 | The input Tensor `t` should be a gradient. 31 | 32 | The output will be `t` + gaussian noise. 33 | 34 | 0.001 was said to be a good fixed value for memory networks [2]. 35 | """ 36 | with tf.name_scope(name, "add_gradient_noise", [t, stddev]) as name: 37 | #r = 0.55 38 | t = tf.convert_to_tensor(t, name="t") 39 | #sd = stddev/(1+step)**r 40 | gn = tf.random_normal(tf.shape(t), stddev=stddev) 41 | return tf.add(t, gn, name=name) 42 | 43 | def zero_nil_slot(t, name=None): 44 | """ 45 | Overwrites the nil_slot (first row) of the input Tensor with zeros. 46 | The nil_slot is a dummy slot and should not be trained and influence 47 | the training algorithm. 
48 | """ 49 | with tf.name_scope(name, "zero_nil_slot", [t]) as name: 50 | t = tf.convert_to_tensor(t, name="t") 51 | s = tf.shape(t)[1] 52 | z = tf.zeros(tf.pack([1, s])) 53 | return tf.concat(0, [z, tf.slice(t, [1, 0], [-1, -1])], name=name) 54 | 55 | class MemN2N_KV(object): 56 | """Key Value Memory Network.""" 57 | def __init__(self, batch_size, vocab_size, 58 | query_size, story_size, memory_key_size, 59 | memory_value_size, embedding_size, 60 | min_score, feature_size=30, 61 | hops=3, 62 | reader='bow', 63 | l2_lambda=0.2, 64 | name='KeyValueMemN2N'): 65 | """Creates an Key Value Memory Network 66 | 67 | Args: 68 | batch_size: The size of the batch. 69 | 70 | vocab_size: The size of the vocabulary (should include the nil word). The nil word one-hot encoding should be 0. 71 | 72 | query_size: largest number of words in question 73 | 74 | story_size: largest number of words in story 75 | 76 | embedding_size: The size of the word embedding. 77 | 78 | memory_key_size: the size of memory slots for keys 79 | memory_value_size: the size of memory slots for values 80 | 81 | feature_size: dimension of feature extraced from word embedding 82 | 83 | hops: The number of hops. A hop consists of reading and addressing a memory slot. 84 | 85 | debug_mode: If true, print some debug info about tensors 86 | name: Name of the End-To-End Memory Network.\ 87 | Defaults to `KeyValueMemN2N`. 88 | """ 89 | self._story_size = story_size 90 | self._batch_size = batch_size 91 | self._vocab_size = vocab_size 92 | self._query_size = query_size 93 | #self._wiki_sentence_size = doc_size 94 | self._memory_key_size = memory_key_size 95 | self._embedding_size = embedding_size 96 | self._hops = hops 97 | self._name = name 98 | self._memory_value_size = memory_value_size 99 | self._encoding = tf.constant(position_encoding(self._story_size, self._embedding_size), name="encoding") 100 | self._reader = reader 101 | self._build_inputs() 102 | 103 | d = feature_size 104 | self._feature_size = feature_size 105 | self._n_hidden = feature_size 106 | self.reader_feature_size = 0 107 | 108 | # trainable variables 109 | if reader == 'bow': 110 | self.reader_feature_size = self._embedding_size 111 | elif reader == 'simple_gru': 112 | self.reader_feature_size = self._n_hidden 113 | 114 | self.A = tf.get_variable('A', shape=[self._feature_size, self.reader_feature_size], 115 | initializer=tf.contrib.layers.xavier_initializer()) 116 | self.A_mvalue = tf.get_variable('A_mvalue', shape=[self._feature_size, self.reader_feature_size], 117 | initializer=tf.contrib.layers.xavier_initializer()) 118 | self.A_mkey = tf.get_variable('A_mkey', shape=[self._feature_size, self.reader_feature_size], 119 | initializer=tf.contrib.layers.xavier_initializer()) 120 | 121 | #self.TK = tf.get_variable('TK', shape=[self._memory_value_size, self.reader_feature_size], 122 | # initializer=tf.contrib.layers.xavier_initializer()) 123 | #self.TV = tf.get_variable('TV', shape=[self._memory_value_size, self.reader_feature_size], 124 | # initializer=tf.contrib.layers.xavier_initializer()) 125 | 126 | # Embedding layer 127 | #nil_word_slot = tf.zeros([1, embedding_size]) 128 | #self.W = tf.concat(0, [nil_word_slot, tf.get_variable('W', shape=[vocab_size-1, embedding_size], 129 | # initializer=tf.contrib.layers.xavier_initializer())]) 130 | self.W = tf.Variable(self.w_placeholder, trainable=False) 131 | self.W_memory = self.W 132 | #self._nil_vars = set([self.W.name, self.W_memory.name]) 133 | # shape: [batch_size, query_size, embedding_size] 134 | 
self.embedded_chars = tf.nn.embedding_lookup(self.W, self._query) 135 | # shape: [batch_size, memory_size, story_size, embedding_size] 136 | self.mkeys_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 137 | # shape: [batch_size, memory_size, story_size, embedding_size] 138 | self.mvalues_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 139 | 140 | if reader == 'bow': 141 | q_r = tf.reduce_sum(self.embedded_chars*self._encoding, 1) 142 | doc_r = tf.reduce_sum(self.mkeys_embedded_chars*self._encoding, 2) 143 | value_r = tf.reduce_sum(self.mvalues_embedded_chars*self._encoding, 2) 144 | 145 | r_list = [] 146 | R = tf.get_variable('R', shape=[self._feature_size, self._feature_size], 147 | initializer=tf.contrib.layers.xavier_initializer()) 148 | 149 | for _ in range(self._hops): 150 | # define R for variables 151 | #R = tf.get_variable('R{}'.format(_), shape=[self._feature_size, self._feature_size], 152 | # initializer=tf.contrib.layers.xavier_initializer()) 153 | r_list.append(R) 154 | 155 | o = self._key_addressing(doc_r, value_r, q_r, r_list) 156 | o = tf.transpose(o) 157 | if reader == 'bow': 158 | #self.B = self.A 159 | self.B = tf.get_variable('B', shape=[self._feature_size, 1], 160 | initializer=tf.truncated_normal_initializer()) 161 | elif reader == 'simple_gru': 162 | #self.B = tf.get_variable('B', shape=[self._feature_size, self._embedding_size], 163 | self.B = tf.get_variable('B', shape=[self._feature_size, self._vocab_size], 164 | initializer=tf.contrib.layers.xavier_initializer()) 165 | logits_bias = tf.get_variable('logits_bias', [1]) 166 | # y_tmp = tf.matmul(self.B, self.W_memory, transpose_b=True) 167 | with tf.name_scope("prediction"): 168 | #logits = tf.matmul(o, y_tmp)# + logits_bias 169 | logits = tf.matmul(o, self.B) + logits_bias 170 | #normed_score = tf.squeeze(tf.nn.sigmoid(tf.cast(logits, tf.float32))) 171 | #score = normed_score * (max_score - min_score) + min_score 172 | score = tf.squeeze(logits) 173 | mse = tf.reduce_mean(tf.square(tf.sub(score, self._score_encoding))) 174 | # loss op 175 | trainable_vars = tf.trainable_variables() 176 | lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) 177 | loss_op = mse + l2_lambda*lossL2 178 | # predict ops 179 | 180 | # assign ops 181 | self.cost = mse 182 | self.loss_op = loss_op 183 | self.predict_op = score 184 | 185 | def _build_inputs(self): 186 | with tf.name_scope("input"): 187 | self._memory_key = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_key') 188 | 189 | self._query = tf.placeholder(tf.int32, [None, self._query_size], name='essay') 190 | 191 | self._score_encoding = tf.placeholder(tf.float32, [None], name='score') 192 | self.keep_prob = tf.placeholder(tf.float32, name='keep_prob') 193 | self.w_placeholder = tf.placeholder(tf.float32, [self._vocab_size, self._embedding_size]) 194 | self._mem_attention_encoding = tf.placeholder(tf.int32, [None, self._memory_key_size]) 195 | 196 | ''' 197 | mkeys: the vector representation for keys in memory 198 | -- shape of each mkeys: [1, embedding_size] 199 | mvalues: the vector representation for values in memory 200 | -- shape of each mvalues: [1, embedding_size] 201 | questions: the vector representation for the question 202 | -- shape of questions: [1, embedding_size] 203 | -- shape of R: [feature_size, feature_size] 204 | -- shape of self.A: [feature_size, embedding_size] 205 | -- shape of self.B: [feature_size, embedding_size] 206 | self.A, self.B and R are the 
parameters to learn 207 | ''' 208 | def _key_addressing(self, mkeys, mvalues, questions, r_list): 209 | self.mem_attention_probs = [] 210 | with tf.variable_scope(self._name): 211 | # [feature_size, batch_size] 212 | u_o = tf.matmul(self.A, questions, transpose_b=True) 213 | u = [u_o] 214 | for _ in range(self._hops): 215 | R = r_list[_] 216 | u_temp = u[-1] 217 | mk_temp = mkeys # + self.TK 218 | # [reader_size, batch_size x memory_size] 219 | k_temp = tf.reshape(tf.transpose(mk_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 220 | # [feature_size, batch_size x memory_size] 221 | a_k_temp = tf.nn.dropout(tf.matmul(self.A_mvalue, k_temp), self.keep_prob) 222 | # [batch_size, memory_size, feature_size] 223 | a_k = tf.reshape(tf.transpose(a_k_temp), [-1, self._memory_key_size, self._feature_size]) 224 | # [batch_size, 1, feature_size] 225 | u_expanded = tf.expand_dims(tf.transpose(u_temp), [1]) 226 | # [batch_size, memory_size] 227 | dotted = tf.reduce_sum(a_k*u_expanded, 2) 228 | 229 | # Calculate probabilities 230 | # [batch_size, memory_size] 231 | probs = tf.nn.softmax(tf.to_float(dotted)) 232 | self.mem_attention_probs.append(probs) 233 | 234 | # [batch_size, memory_size, 1] 235 | probs_expand = tf.expand_dims(probs, -1) 236 | mv_temp = mvalues # + self.TV 237 | # [reader_size, batch_size x memory_size] 238 | v_temp = tf.reshape(tf.transpose(mv_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 239 | # [feature_size, batch_size x memory_size] 240 | a_v_temp = tf.nn.dropout(tf.matmul(self.A_mkey, v_temp), self.keep_prob) 241 | # [batch_size, memory_size, feature_size] 242 | a_v = tf.reshape(tf.transpose(a_v_temp), [-1, self._memory_key_size, self._feature_size]) 243 | # [batch_size, feature_size] 244 | o_k = tf.reduce_sum(probs_expand*a_v, 1) 245 | # [feature_size, batch_size] 246 | o_k = tf.transpose(o_k) 247 | # [feature_size, batch_size] 248 | # test point 249 | #u_k = tf.nn.relu(tf.matmul(R, u_o+o_k)) 250 | u_k = tf.nn.dropout(tf.nn.relu(tf.matmul(R, u[-1]+o_k)), self.keep_prob) 251 | 252 | u.append(u_k) 253 | self.mem_attention_probs = tf.pack(self.mem_attention_probs, axis=1) 254 | # test point 255 | return u[-1] 256 | # return tf.add_n(u)/len(u) 257 | -------------------------------------------------------------------------------- /memn2n_kv.py: -------------------------------------------------------------------------------- 1 | """Key Value Memory Networks with GRU reader. 2 | The implementation is based on https://arxiv.org/abs/1606.03126 3 | The implementation is based on http://arxiv.org/abs/1503.08895 [1] 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | 8 | import tensorflow as tf 9 | from six.moves import range 10 | import numpy as np 11 | # from attention_reader import Attention_Reader 12 | 13 | def position_encoding(sentence_size, embedding_size): 14 | """ 15 | Position Encoding described in section 4.1 [1] 16 | """ 17 | encoding = np.ones((embedding_size, sentence_size), dtype=np.float32) 18 | ls = sentence_size+1 19 | le = embedding_size+1 20 | for i in range(1, le): 21 | for j in range(1, ls): 22 | encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2) 23 | encoding = 1 + 4 * encoding / embedding_size / sentence_size 24 | return np.transpose(encoding) 25 | 26 | def add_gradient_noise(t, stddev=1e-3, name=None): 27 | """ 28 | Adds gradient noise as described in http://arxiv.org/abs/1511.06807 [2]. 29 | 30 | The input Tensor `t` should be a gradient. 31 | 32 | The output will be `t` + gaussian noise. 
33 | 34 | 0.001 was said to be a good fixed value for memory networks [2]. 35 | """ 36 | with tf.name_scope(name, "add_gradient_noise", [t, stddev]) as name: 37 | #r = 0.55 38 | t = tf.convert_to_tensor(t, name="t") 39 | #sd = stddev/(1+step)**r 40 | gn = tf.random_normal(tf.shape(t), stddev=stddev) 41 | return tf.add(t, gn, name=name) 42 | 43 | def zero_nil_slot(t, name=None): 44 | """ 45 | Overwrites the nil_slot (first row) of the input Tensor with zeros. 46 | The nil_slot is a dummy slot and should not be trained and influence 47 | the training algorithm. 48 | """ 49 | with tf.name_scope(name, "zero_nil_slot", [t]) as name: 50 | t = tf.convert_to_tensor(t, name="t") 51 | s = tf.shape(t)[1] 52 | z = tf.zeros(tf.stack([1, s])) 53 | return tf.concat([z, tf.slice(t, [1, 0], [-1, -1])], 0, name=name) 54 | 55 | class MemN2N_KV(object): 56 | """Key Value Memory Network.""" 57 | def __init__(self, batch_size, vocab_size, 58 | query_size, story_size, memory_key_size, 59 | memory_value_size, embedding_size, score_range, 60 | feature_size=30, 61 | hops=3, 62 | reader='bow', 63 | l2_lambda=0.2, 64 | name='KeyValueMemN2N'): 65 | """Creates an Key Value Memory Network 66 | 67 | Args: 68 | batch_size: The size of the batch. 69 | 70 | vocab_size: The size of the vocabulary (should include the nil word). The nil word one-hot encoding should be 0. 71 | 72 | query_size: largest number of words in question 73 | 74 | story_size: largest number of words in story 75 | 76 | embedding_size: The size of the word embedding. 77 | 78 | memory_key_size: the size of memory slots for keys 79 | memory_value_size: the size of memory slots for values 80 | 81 | feature_size: dimension of feature extraced from word embedding 82 | 83 | hops: The number of hops. A hop consists of reading and addressing a memory slot. 84 | 85 | debug_mode: If true, print some debug info about tensors 86 | name: Name of the End-To-End Memory Network.\ 87 | Defaults to `KeyValueMemN2N`. 
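Example (illustrative sketch; the argument values are assumptions, not defaults from this repository): model = MemN2N_KV(batch_size=32, vocab_size=10000, query_size=500, story_size=500, memory_key_size=20, memory_value_size=20, embedding_size=300, score_range=12, feature_size=100, hops=3, reader='bow', l2_lambda=0.3)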
88 | """ 89 | self._story_size = story_size 90 | self._batch_size = batch_size 91 | self._vocab_size = vocab_size 92 | self._query_size = query_size 93 | #self._wiki_sentence_size = doc_size 94 | self._memory_key_size = memory_key_size 95 | self._embedding_size = embedding_size 96 | self._hops = hops 97 | self._name = name 98 | self._memory_value_size = memory_value_size 99 | self._encoding = tf.constant(position_encoding(self._story_size, self._embedding_size), name="encoding") 100 | self._reader = reader 101 | self._build_inputs() 102 | 103 | d = feature_size 104 | self._feature_size = feature_size 105 | self._n_hidden = embedding_size 106 | self.reader_feature_size = 0 107 | 108 | # keep track of attention in memory 109 | self.mem_attention_probs = [] 110 | 111 | # one-hot encoding for scores 112 | self._labels = tf.one_hot(self._score_encoding, score_range, on_value=1.0, off_value=0.0, axis=-1) 113 | # trainable variables 114 | self.reader_feature_size = self._embedding_size 115 | 116 | self.A = tf.get_variable('A', shape=[self._feature_size, self.reader_feature_size], 117 | initializer=tf.contrib.layers.xavier_initializer()) 118 | self.A_mvalue = tf.get_variable('A_mvalue', shape=[self._feature_size, self.reader_feature_size], 119 | initializer=tf.contrib.layers.xavier_initializer()) 120 | self.A_mkey = tf.get_variable('A_mkey', shape=[self._feature_size, self.reader_feature_size], 121 | initializer=tf.contrib.layers.xavier_initializer()) 122 | 123 | # Embedding layer 124 | self.W = tf.Variable(self.w_placeholder, trainable=False) 125 | self.W_memory = self.W 126 | # shape: [batch_size, query_size, embedding_size] 127 | self.embedded_chars = tf.nn.embedding_lookup(self.W, self._query) 128 | # shape: [batch_size, memory_size, story_size, embedding_size] 129 | self.mkeys_embedded_chars = tf.nn.embedding_lookup(self.W_memory, self._memory_key) 130 | if reader == 'bow': 131 | # shape: [batch_size, memory_size, story_size, embedding_size] 132 | q_r = tf.reduce_sum(self.embedded_chars*self._encoding, 1) 133 | doc_r = tf.reduce_sum(self.mkeys_embedded_chars*self._encoding, 2) 134 | elif reader == 'gru': 135 | x_tmp = tf.reshape(self.mkeys_embedded_chars, [-1, self._story_size, self._embedding_size]) 136 | x = tf.transpose(x_tmp, [1, 0, 2]) 137 | # Reshape to (n_steps*batch_size, n_input) 138 | x = tf.reshape(x, [-1, self._embedding_size]) 139 | # Split to get a list of 'n_steps' 140 | # tensors of shape (doc_num, n_input) 141 | x = tf.split(x, self._story_size, 0) 142 | 143 | # do the same thing on the essay 144 | q = tf.transpose(self.embedded_chars, [1, 0, 2]) 145 | q = tf.reshape(q, [-1, self._embedding_size]) 146 | q = tf.split(q, self._query_size, 0) 147 | 148 | with tf.variable_scope('gru') as gru_scope: 149 | gru_rnn = tf.nn.rnn_cell.GRUCell(self._n_hidden) 150 | doc_r, _ = tf.contrib.rnn.static_rnn(gru_rnn, x, dtype=tf.float32) 151 | doc_r = tf.reshape(doc_r[-1], [-1, self._memory_key_size, self._n_hidden]) 152 | with tf.variable_scope(gru_scope, reuse=True): 153 | q_r, _ = tf.contrib.rnn.static_rnn(gru_rnn, q, dtype=tf.float32) 154 | q_r = q_r[-1] 155 | 156 | r_list = [] 157 | R = tf.get_variable('R', shape=[self._feature_size, self._feature_size], 158 | initializer=tf.contrib.layers.xavier_initializer()) 159 | 160 | for _ in range(self._hops): 161 | # define R for variables 162 | #R = tf.get_variable('R{}'.format(_), shape=[self._feature_size, self._feature_size], 163 | # initializer=tf.contrib.layers.xavier_initializer()) 164 | r_list.append(R) 165 | 166 | o = 
self._key_addressing(doc_r, doc_r, q_r, r_list) 167 | o = tf.transpose(o) 168 | self.B = tf.get_variable('B', shape=[self._feature_size, score_range], 169 | initializer=tf.contrib.layers.xavier_initializer()) 170 | logits_bias = tf.get_variable('logits_bias', [score_range]) 171 | # y_tmp = tf.matmul(self.B, self.W_memory, transpose_b=True) 172 | with tf.name_scope("prediction"): 173 | #logits = tf.matmul(o, y_tmp)# + logits_bias 174 | logits = tf.matmul(o, self.B) + logits_bias 175 | probs = tf.nn.softmax(tf.cast(logits, tf.float32)) 176 | 177 | cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=tf.cast(self._labels, tf.float32), name='cross_entropy') 178 | cross_entropy_sum = tf.reduce_sum(cross_entropy, name="cross_entropy_sum") 179 | 180 | # loss op 181 | trainable_vars = tf.trainable_variables() 182 | lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) 183 | loss_op = cross_entropy_sum + l2_lambda*lossL2 184 | # predict ops 185 | predict_op = tf.argmax(probs, 1, name="predict_op") 186 | 187 | # assign ops 188 | self.cost = cross_entropy_sum 189 | self.loss_op = loss_op 190 | self.predict_op = predict_op 191 | self.probs = probs 192 | 193 | def _build_inputs(self): 194 | with tf.name_scope("input"): 195 | self._memory_key = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_key') 196 | 197 | self._query = tf.placeholder(tf.int32, [None, self._query_size], name='question') 198 | 199 | #self._memory_value = tf.placeholder(tf.int32, [None, self._memory_value_size, self._story_size], name='memory_value') 200 | 201 | self._score_encoding = tf.placeholder(tf.int32, [None], name='score_encoding') 202 | self.keep_prob = tf.placeholder(tf.float32, name='keep_prob') 203 | self.w_placeholder = tf.placeholder(tf.float32, [self._vocab_size, self._embedding_size]) 204 | self._mem_attention_encoding = tf.placeholder(tf.int32, [None, self._memory_key_size]) 205 | 206 | ''' 207 | mkeys: the vector representation for keys in memory 208 | -- shape of each mkeys: [1, embedding_size] 209 | mvalues: the vector representation for values in memory 210 | -- shape of each mvalues: [1, embedding_size] 211 | questions: the vector representation for the question 212 | -- shape of questions: [1, embedding_size] 213 | -- shape of R: [feature_size, feature_size] 214 | -- shape of self.A: [feature_size, embedding_size] 215 | -- shape of self.B: [feature_size, embedding_size] 216 | self.A, self.B and R are the parameters to learn 217 | ''' 218 | def _key_addressing(self, mkeys, mvalues, questions, r_list): 219 | self.mem_attention_probs = [] 220 | with tf.variable_scope(self._name): 221 | questions = tf.nn.dropout(questions, self.keep_prob) 222 | # [feature_size, batch_size] 223 | u_o = tf.matmul(self.A, questions, transpose_b=True) 224 | u = [u_o] 225 | hop_probs = [] 226 | for _ in range(self._hops): 227 | R = r_list[_] 228 | u_temp = u[-1] 229 | mk_temp = tf.nn.dropout(mkeys, self.keep_prob) 230 | # [reader_size, batch_size x memory_size] 231 | k_temp = tf.reshape(tf.transpose(mk_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 232 | # [feature_size, batch_size x memory_size] 233 | a_k_temp = tf.matmul(self.A_mvalue, k_temp) 234 | # [batch_size, memory_size, feature_size] 235 | a_k = tf.reshape(tf.transpose(a_k_temp), [-1, self._memory_key_size, self._feature_size]) 236 | # [batch_size, 1, feature_size] 237 | u_expanded = tf.expand_dims(tf.transpose(u_temp), [1]) 238 | # [batch_size, memory_size] 239 | dotted = 
tf.reduce_sum(a_k*u_expanded, 2) 240 | 241 | # Calculate probabilities 242 | # [batch_size, memory_size] 243 | probs = tf.nn.softmax(dotted) 244 | self.mem_attention_probs.append(probs) 245 | # [batch_size, memory_size, 1] 246 | probs_expand = tf.expand_dims(probs, -1) 247 | mv_temp = mk_temp 248 | # [reader_size, batch_size x memory_size] 249 | v_temp = tf.reshape(tf.transpose(mv_temp, [2, 0, 1]), [self.reader_feature_size, -1]) 250 | # [feature_size, batch_size x memory_size] 251 | a_v_temp = tf.matmul(self.A_mkey, v_temp) 252 | # [batch_size, memory_size, feature_size] 253 | a_v = tf.reshape(tf.transpose(a_v_temp), [-1, self._memory_key_size, self._feature_size]) 254 | # [batch_size, feature_size] 255 | o_k = tf.reduce_sum(probs_expand*a_v, 1) 256 | # [feature_size, batch_size] 257 | o_k = tf.transpose(o_k) 258 | # [feature_size, batch_size] 259 | # test point 260 | u_k = tf.nn.relu(tf.matmul(R, u[-1]+o_k)) 261 | #u_k = tf.matmul(R, u[-1]+o_k) 262 | #u_k = tf.nn.relu(tf.matmul(R, u_o + o_k)) 263 | u.append(u_k) 264 | self.mem_attention_probs = tf.stack(self.mem_attention_probs, axis=1) 265 | #TODO: 266 | return u[-1] 267 | #return tf.add_n(u)/len(u) 268 | -------------------------------------------------------------------------------- /regression_train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn import cross_validation 4 | from memn2n_kv_regression import MemN2N_KV 5 | #from skll.metrics import kappa 6 | from qwk import quadratic_weighted_kappa as kappa 7 | import tensorflow as tf 8 | from memn2n_kv_regression import add_gradient_noise 9 | import time 10 | import os 11 | import sys 12 | 13 | print 'start to load flags\n' 14 | 15 | # flags 16 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 17 | tf.flags.DEFINE_float("l2_lambda", 0.1, "Lambda for l2 loss.") 18 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 19 | tf.flags.DEFINE_float("max_grad_norm", 1, "Clip gradients to this norm.") 20 | tf.flags.DEFINE_float("keep_prob", 0.8, "Keep probability for dropout") 21 | tf.flags.DEFINE_integer("evaluation_interval", 3, "Evaluate and print results every x epochs") 22 | tf.flags.DEFINE_integer("batch_size", 32, "Batch size for training.") 23 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 24 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 25 | tf.flags.DEFINE_integer("hops", 1, "Number of hops in the Memory Network.") 26 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 27 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 28 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove") 29 | tf.flags.DEFINE_integer("essay_set_id", 7, "essay set id, 1 <= id <= 8") 30 | tf.flags.DEFINE_string("reader", "bow", "Reader for the model (bow, simple_gru)") 31 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 32 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 33 | # hyper-parameters 34 | FLAGS = tf.flags.FLAGS 35 | FLAGS._parse_flags() 36 | 37 | #vocab_limit = 13000 38 | essay_set_id = FLAGS.essay_set_id 39 | batch_size = FLAGS.batch_size 40 | embedding_size = FLAGS.embedding_size 41 | feature_size = FLAGS.feature_size 42 | l2_lambda = FLAGS.l2_lambda 43 | hops = FLAGS.hops 44 | reader = 'bow' 45 | epochs = FLAGS.epochs 46 | 
num_samples = FLAGS.num_samples 47 | num_tokens = FLAGS.token_num 48 | test_batch_size = batch_size 49 | random_state = 10 50 | 51 | # print flags info 52 | orig_stdout = sys.stdout 53 | timestamp = str(int(time.time())) 54 | folder_name = 'essay_set_{}_{}_regression_{}'.format(essay_set_id, num_samples, timestamp) 55 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 56 | if not os.path.exists(out_dir): 57 | os.makedirs(out_dir) 58 | 59 | # save output to a file 60 | #f = file(out_dir+'/out.txt', 'w') 61 | #sys.stdout = f 62 | print("Writing to {}\n".format(out_dir)) 63 | 64 | print("\nParameters:") 65 | for attr, value in sorted(FLAGS.__flags.items()): 66 | print("{}={}".format(attr.upper(), value)) 67 | print("") 68 | 69 | with open(out_dir+'/params', 'w') as f: 70 | for attr, value in sorted(FLAGS.__flags.items()): 71 | f.write("{}={}".format(attr.upper(), value)) 72 | f.write("\n") 73 | 74 | # hyper-parameters end here 75 | training_path = 'training_set_rel3.tsv' 76 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 77 | 78 | max_score = max(resolved_scores) 79 | min_score = min(resolved_scores) 80 | if essay_set_id == 7: 81 | min_score, max_score = 0, 30 82 | elif essay_set_id == 8: 83 | min_score, max_score = 0, 60 84 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 85 | with open(out_dir+'/params', 'a') as f: 86 | f.write('max_score is {} \t min_score is {} \n'.format(max_score, min_score)) 87 | 88 | # include max score 89 | score_range = range(min_score, max_score+1) 90 | 91 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 92 | 93 | # load glove 94 | word_idx, word2vec = data_utils.load_glove(num_tokens, embedding_size) 95 | vocab_size = len(word_idx) + 1 96 | # stat info on data set 97 | 98 | sent_size_list = map(len, [essay for essay in essay_list]) 99 | max_sent_size = max(sent_size_list) 100 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 101 | 102 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 103 | with open(out_dir+'/params', 'a') as f: 104 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 105 | 106 | print 'The length of score range is {}'.format(len(score_range)) 107 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 108 | 109 | labeled_data = zip(E, resolved_scores, sent_size_list) 110 | 111 | # split the data on the fly 112 | trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 113 | E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 114 | 115 | memory = [] 116 | memory_score = [] 117 | memory_sent_size = [] 118 | memory_essay_ids = [] 119 | # pick sampled essay for each score 120 | for i in score_range: 121 | for j in range(num_samples): 122 | if i in train_scores: 123 | score_idx = train_scores.index(i) 124 | score = train_scores.pop(score_idx) 125 | essay = trainE.pop(score_idx) 126 | #sent_size = sent_size_list.pop(score_idx) 127 | memory.append(essay) 128 | memory_score.append(score) 129 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 130 | memory_size = len(memory) 131 | trainE, evalE, train_scores, eval_scores, train_essay_id, eval_essay_id = cross_validation.train_test_split( 132 | trainE, train_scores, train_essay_id, test_size=.2, random_state=random_state) 133 | 134 | # convert score to one hot encoding 135 | 
#train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 136 | # normalize training score 137 | #normed_train_scores = (np.array(train_scores) - min_score) / (max_score - min_score) 138 | 139 | # data size 140 | n_train = len(trainE) 141 | n_test = len(testE) 142 | n_eval = len(evalE) 143 | 144 | print 'The size of training data: {}'.format(n_train) 145 | print 'The size of testing data: {}'.format(n_test) 146 | print 'The size of evaluation data: {}'.format(n_eval) 147 | with open(out_dir+'/params', 'a') as f: 148 | f.write('The size of training data: {}\n'.format(n_train)) 149 | f.write('The size of testing data: {}'.format(n_test)) 150 | f.write('The size of evaluation data: {}'.format(n_eval)) 151 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 152 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 153 | f.write('\nEssay ids in evaluation:\n{}'.format(eval_essay_id)) 154 | f.write('\nEssay ids in testing:\n{}'.format(test_essay_id)) 155 | 156 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 157 | batches = [(start, end) for start, end in batches] 158 | 159 | with tf.Graph().as_default(): 160 | session_conf = tf.ConfigProto( 161 | allow_soft_placement=FLAGS.allow_soft_placement, 162 | log_device_placement=FLAGS.log_device_placement) 163 | 164 | global_step = tf.Variable(0, name="global_step", trainable=False) 165 | # decay learning rate 166 | starter_learning_rate = FLAGS.learning_rate 167 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 168 | 169 | #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 170 | optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate) 171 | 172 | best_kappa_so_far = 0.0 173 | with tf.Session(config=session_conf) as sess: 174 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 175 | memory_size, embedding_size, min_score, max_score, 176 | feature_size, hops, reader, l2_lambda) 177 | 178 | grads_and_vars = optimizer.compute_gradients( 179 | model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 180 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 181 | for g, v in grads_and_vars if g is not None] 182 | grads_and_vars = [(add_gradient_noise(g), v) for g, v in grads_and_vars] 183 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 184 | 185 | sess.run(tf.global_variables_initializer(), feed_dict={model.w_placeholder: word2vec}) 186 | 187 | saver = tf.train.Saver(tf.global_variables()) 188 | 189 | def train_step(m, e, s): 190 | feed_dict = { 191 | model._query: e, 192 | model._memory_key: m, 193 | model._score_encoding: s, 194 | model.keep_prob: FLAGS.keep_prob 195 | } 196 | start_time = time.time() 197 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 198 | end_time = time.time() 199 | time_cost = end_time - start_time 200 | return predict_op, cost, time_cost 201 | 202 | def test_step(e, m): 203 | feed_dict = { 204 | model._query: e, 205 | model._memory_key: m, 206 | model.keep_prob: 1 207 | } 208 | preds = sess.run(model.predict_op, feed_dict) 209 | return np.round(preds) 210 | 211 | for i in range(1, epochs+1): 212 | train_cost = 0 213 | np.random.shuffle(batches) 214 | for start, end in batches: 215 | e = trainE[start:end] 216 | s = train_scores[start:end] 217 | #s = 
normed_train_scores[start:end] 218 | batched_memory = [] 219 | # batch sized memory 220 | for _ in range(len(e)): 221 | batched_memory.append(memory) 222 | _, cost, _ = train_step(batched_memory, e, s) 223 | train_cost += cost 224 | print 'Finish epoch {}, total training cost is {}'.format(i, train_cost) 225 | # evaluation 226 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 227 | # test on training data 228 | train_preds = [] 229 | for start in range(0, n_train, test_batch_size): 230 | end = min(n_train, start+test_batch_size) 231 | 232 | batched_memory = [] 233 | for _ in range(end-start): 234 | batched_memory.append(memory) 235 | preds = test_step(trainE[start:end], batched_memory) 236 | for ite in preds: 237 | if ite > max_score: 238 | ite = max_score 239 | elif ite < min_score: 240 | ite = min_score 241 | train_preds.append(ite) 242 | # regression 243 | #train_preds = np.array(train_preds)*(max_score-min_score) + min_score 244 | print train_preds[-10:] 245 | train_kappp_score = kappa(train_scores, train_preds, min_score, max_score) 246 | 247 | # test on eval data 248 | eval_preds = [] 249 | for start in range(0, n_eval, test_batch_size): 250 | end = min(n_eval, start+test_batch_size) 251 | 252 | batched_memory = [] 253 | for _ in range(end-start): 254 | batched_memory.append(memory) 255 | preds = test_step(evalE[start:end], batched_memory) 256 | for ite in preds: 257 | if ite > max_score: 258 | ite = max_score 259 | elif ite < min_score: 260 | ite = min_score 261 | 262 | eval_preds.append(ite) 263 | # regression 264 | #eval_preds = np.array(eval_preds)*(max_score-min_score) + min_score 265 | eval_kappp_score = kappa(eval_scores, eval_preds, min_score, max_score) 266 | 267 | # test on test data 268 | test_preds = [] 269 | for start in range(0, n_test, test_batch_size): 270 | end = min(n_test, start+test_batch_size) 271 | 272 | batched_memory = [] 273 | for _ in range(end-start): 274 | batched_memory.append(memory) 275 | preds = test_step(testE[start:end], batched_memory) 276 | for ite in preds: 277 | if ite > max_score: 278 | ite = max_score 279 | elif ite < min_score: 280 | ite = min_score 281 | 282 | test_preds.append(ite) 283 | # regression 284 | #test_preds = np.array(test_preds)*(max_score-min_score) + min_score 285 | test_kappp_score = kappa(test_scores, test_preds, min_score, max_score) 286 | 287 | # save the model if it gets best kappa 288 | if(test_kappp_score > best_kappa_so_far): 289 | best_kappa_so_far = test_kappp_score 290 | #saver.save(sess, out_dir+'/checkpoints', global_step) 291 | print("Training kappa score = {}".format(train_kappp_score)) 292 | print("Validation kappa score = {}".format(eval_kappp_score)) 293 | print("Testing kappa score = {}".format(test_kappp_score)) 294 | with open(out_dir+'/eval', 'a') as f: 295 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 296 | f.write("Validation kappa score = {}\n".format(eval_kappp_score)) 297 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 298 | f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 299 | f.write('*'*10) 300 | f.write('\n') 301 | #sys.stdout = orig_stdout 302 | #f.close() 303 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn import cross_validation 4 | from qwk import quadratic_weighted_kappa 5 | import tensorflow as tf 6 | from memn2n_kv import 
add_gradient_noise 7 | import time 8 | import os 9 | import sys 10 | import pandas as pd 11 | 12 | print 'start to load flags\n' 13 | 14 | # flags 15 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 16 | tf.flags.DEFINE_float("l2_lambda", 0.3, "Lambda for l2 loss.") 17 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 18 | tf.flags.DEFINE_float("max_grad_norm", 10.0, "Clip gradients to this norm.") 19 | tf.flags.DEFINE_float("keep_prob", 0.8, "Keep probability for dropout") 20 | tf.flags.DEFINE_integer("evaluation_interval", 3, "Evaluate and print results every x epochs") 21 | tf.flags.DEFINE_integer("batch_size", 32, "Batch size for training.") 22 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 23 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 24 | tf.flags.DEFINE_integer("hops", 3, "Number of hops in the Memory Network.") 25 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 26 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 27 | tf.flags.DEFINE_integer("essay_set_id", 1, "essay set id, 1 <= id <= 8") 28 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove (6, 42)") 29 | tf.flags.DEFINE_boolean("gated_addressing", False, "Simple gated addressing") 30 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 31 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 32 | # hyper-parameters 33 | FLAGS = tf.flags.FLAGS 34 | FLAGS._parse_flags() 35 | 36 | gated_addressing = FLAGS.gated_addressing 37 | essay_set_id = FLAGS.essay_set_id 38 | batch_size = FLAGS.batch_size 39 | embedding_size = FLAGS.embedding_size 40 | feature_size = FLAGS.feature_size 41 | l2_lambda = FLAGS.l2_lambda 42 | hops = FLAGS.hops 43 | reader = 'bow' 44 | epochs = FLAGS.epochs 45 | num_samples = FLAGS.num_samples 46 | num_tokens = FLAGS.token_num 47 | test_batch_size = batch_size 48 | random_state = 0 49 | if gated_addressing: 50 | from memn2n_g_kv import MemN2N_KV 51 | else: 52 | from memn2n_kv import MemN2N_KV 53 | # print flags info 54 | orig_stdout = sys.stdout 55 | timestamp = time.strftime("%b_%d_%Y_%H:%M:%S", time.localtime()) 56 | folder_name = 'essay_set_{}_{}_{}'.format(essay_set_id, num_samples, timestamp) 57 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 58 | if not os.path.exists(out_dir): 59 | os.makedirs(out_dir) 60 | 61 | # save output to a file 62 | #f = file(out_dir+'/out.txt', 'w') 63 | #sys.stdout = f 64 | print("Writing to {}\n".format(out_dir)) 65 | 66 | print("\nParameters:") 67 | for attr, value in sorted(FLAGS.__flags.items()): 68 | print("{}={}".format(attr.upper(), value)) 69 | print("") 70 | 71 | with open(out_dir+'/params', 'w') as f: 72 | for attr, value in sorted(FLAGS.__flags.items()): 73 | f.write("{}={}".format(attr.upper(), value)) 74 | f.write("\n") 75 | 76 | # hyper-parameters end here 77 | training_path = 'training_set_rel3.tsv' 78 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 79 | 80 | max_score = max(resolved_scores) 81 | min_score = min(resolved_scores) 82 | if essay_set_id == 7: 83 | min_score, max_score = 0, 30 84 | elif essay_set_id == 8: 85 | min_score, max_score = 0, 60 86 | 87 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 88 | with open(out_dir+'/params', 'a') as f: 89 | f.write('max_score 
is {} \t min_score is {} \n'.format(max_score, min_score)) 90 | 91 | # include max score 92 | score_range = range(min_score, max_score+1) 93 | 94 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 95 | 96 | # load glove 97 | word_idx, word2vec = data_utils.load_glove(num_tokens, dim=embedding_size) 98 | 99 | vocab_size = len(word_idx) + 1 100 | # stat info on data set 101 | 102 | sent_size_list = map(len, [essay for essay in essay_list]) 103 | max_sent_size = max(sent_size_list) 104 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 105 | 106 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 107 | with open(out_dir+'/params', 'a') as f: 108 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 109 | 110 | print 'The length of score range is {}'.format(len(score_range)) 111 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 112 | 113 | labeled_data = zip(E, resolved_scores, sent_size_list) 114 | 115 | # split the data on the fly 116 | #trainE, testE, train_scores, test_scores, train_sent_sizes, test_sent_sizes = cross_validation.train_test_split( 117 | # E, resolved_scores, sent_size_list, test_size=.2, random_state=random_state) 118 | 119 | #trainE, evalE, train_scores, eval_scores, train_sent_sizes, eval_sent_sizes = cross_validation.train_test_split( 120 | # trainE, train_scores, train_sent_sizes, test_size=.1, random_state=random_state) 121 | # split the data on the fly 122 | trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 123 | E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 124 | 125 | memory = [] 126 | memory_score = [] 127 | memory_sent_size = [] 128 | memory_essay_ids = [] 129 | # pick sampled essay for each score 130 | for i in score_range: 131 | # test point: limit the number of samples in memory for 8 132 | for j in range(num_samples): 133 | if i in train_scores: 134 | score_idx = train_scores.index(i) 135 | score = train_scores.pop(score_idx) 136 | essay = trainE.pop(score_idx) 137 | sent_size = sent_size_list.pop(score_idx) 138 | memory.append(essay) 139 | memory_score.append(score) 140 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 141 | memory_sent_size.append(sent_size) 142 | memory_size = len(memory) 143 | trainE, evalE, train_scores, eval_scores, train_essay_id, eval_essay_id = cross_validation.train_test_split( 144 | trainE, train_scores, train_essay_id, test_size=.2) 145 | # convert score to one hot encoding 146 | train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 147 | 148 | # data size 149 | n_train = len(trainE) 150 | n_test = len(testE) 151 | n_eval = len(evalE) 152 | 153 | print 'The size of training data: {}'.format(n_train) 154 | print 'The size of testing data: {}'.format(n_test) 155 | print 'The size of evaluation data: {}'.format(n_eval) 156 | with open(out_dir+'/params', 'a') as f: 157 | f.write('The size of training data: {}\n'.format(n_train)) 158 | f.write('The size of testing data: {}\n'.format(n_test)) 159 | f.write('The size of evaluation data: {}\n'.format(n_eval)) 160 | f.write('\nEssay scores in memory:\n{}'.format(memory_score)) 161 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 162 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 163 | f.write('\nEssay ids in evaluation:\n{}'.format(eval_essay_id)) 164 | f.write('\nEssay ids in 
testing:\n{}'.format(test_essay_id)) 165 | 166 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 167 | batches = [(start, end) for start, end in batches] 168 | 169 | with tf.Graph().as_default(): 170 | session_conf = tf.ConfigProto( 171 | allow_soft_placement=FLAGS.allow_soft_placement, 172 | log_device_placement=FLAGS.log_device_placement) 173 | 174 | global_step = tf.Variable(0, name="global_step", trainable=False) 175 | # decay learning rate 176 | starter_learning_rate = FLAGS.learning_rate 177 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 178 | 179 | # test point 180 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 181 | #optimizer = tf.train.AdagradOptimizer(learning_rate) 182 | best_kappa_so_far = 0.0 183 | with tf.Session(config=session_conf) as sess: 184 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 185 | memory_size, embedding_size, len(score_range), feature_size, hops, reader, l2_lambda) 186 | 187 | grads_and_vars = optimizer.compute_gradients(model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 188 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 189 | for g, v in grads_and_vars if g is not None] 190 | grads_and_vars = [(add_gradient_noise(g, 1e-3), v) for g, v in grads_and_vars] 191 | # test point 192 | #nil_grads_and_vars = [] 193 | #for g, v in grads_and_vars: 194 | # if v.name in model._nil_vars: 195 | # nil_grads_and_vars.append((zero_nil_slot(g), v)) 196 | # else: 197 | # nil_grads_and_vars.append((g, v)) 198 | 199 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 200 | 201 | sess.run(tf.initialize_all_variables(), feed_dict={model.w_placeholder: word2vec}) 202 | 203 | saver = tf.train.Saver(tf.all_variables()) 204 | 205 | def train_step(m, e, s, ma): 206 | start_time = time.time() 207 | feed_dict = { 208 | model._query: e, 209 | model._memory_key: m, 210 | model._score_encoding: s, 211 | model._mem_attention_encoding: ma, 212 | model.keep_prob: FLAGS.keep_prob 213 | #model.w_placeholder: word2vec 214 | } 215 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 216 | end_time = time.time() 217 | time_spent = end_time - start_time 218 | return predict_op, cost, time_spent 219 | 220 | def test_step(e, m): 221 | feed_dict = { 222 | model._query: e, 223 | model._memory_key: m, 224 | model.keep_prob: 1 225 | #model.w_placeholder: word2vec 226 | } 227 | preds, mem_attention_probs = sess.run([model.predict_op, model.mem_attention_probs], feed_dict) 228 | return preds, mem_attention_probs 229 | 230 | for i in range(1, epochs+1): 231 | train_cost = 0 232 | total_time = 0 233 | np.random.shuffle(batches) 234 | for start, end in batches: 235 | e = trainE[start:end] 236 | s = train_scores_encoding[start:end] 237 | s_num = train_scores[start:end] 238 | #batched_memory = [] 239 | # batch sized memory 240 | #for _ in range(len(e)): 241 | # batched_memory.append(memory) 242 | mem_atten_encoding = [] 243 | for ite in s_num: 244 | mem_encoding = np.zeros(memory_size) 245 | for j_idx, j in enumerate(memory_score): 246 | if j == ite: 247 | mem_encoding[j_idx] = 1 248 | mem_atten_encoding.append(mem_encoding) 249 | batched_memory = [memory] * (end-start) 250 | _, cost, time_spent = train_step(batched_memory, e, s, mem_atten_encoding) 251 | total_time += time_spent 252 | 
train_cost += cost 253 | print 'Finish epoch {}, total training cost is {}, time spent is {}'.format(i, train_cost, total_time) 254 | # evaluation 255 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 256 | # test on training data 257 | train_preds = [] 258 | for start in range(0, n_train, test_batch_size): 259 | end = min(n_train, start+test_batch_size) 260 | 261 | #batched_memory = [] 262 | #for _ in range(end-start): 263 | # batched_memory.append(memory) 264 | batched_memory = [memory] * (end-start) 265 | preds, _ = test_step(trainE[start:end], batched_memory) 266 | for ite in preds: 267 | train_preds.append(ite) 268 | train_preds = np.add(train_preds, min_score) 269 | #train_kappp_score = kappa(train_scores, train_preds, 'quadratic') 270 | train_kappp_score = quadratic_weighted_kappa( 271 | train_scores, train_preds, min_score, max_score) 272 | # test on eval data 273 | eval_preds = [] 274 | for start in range(0, n_eval, test_batch_size): 275 | end = min(n_eval, start+test_batch_size) 276 | 277 | #batched_memory = [] 278 | #for _ in range(end-start): 279 | # batched_memory.append(memory) 280 | batched_memory = [memory] * (end-start) 281 | preds, _ = test_step(evalE[start:end], batched_memory) 282 | for ite in preds: 283 | eval_preds.append(ite) 284 | 285 | eval_preds = np.add(eval_preds, min_score) 286 | #eval_kappp_score = kappa(eval_scores, eval_preds, 'quadratic') 287 | eval_kappp_score = quadratic_weighted_kappa( 288 | eval_scores, eval_preds, min_score, max_score) 289 | 290 | # test on test data 291 | test_preds = [] 292 | test_atten_probs = [] 293 | for start in range(0, n_test, test_batch_size): 294 | end = min(n_test, start+test_batch_size) 295 | 296 | #batched_memory = [] 297 | #for _ in range(end-start): 298 | # batched_memory.append(memory) 299 | batched_memory = [memory] * (end-start) 300 | preds, mem_attention_probs = test_step(testE[start:end], batched_memory) 301 | for ite in preds: 302 | test_preds.append(ite) 303 | for ite in mem_attention_probs: 304 | test_atten_probs.append(ite) 305 | test_preds = np.add(test_preds, min_score) 306 | #test_kappp_score = kappa(test_scores, test_preds, 'quadratic') 307 | test_kappp_score = quadratic_weighted_kappa( 308 | test_scores, test_preds, min_score, max_score) 309 | stat_dict = {'essay_id': test_essay_id, 'score': test_scores, 'pred_score': test_preds} 310 | stat_df = pd.DataFrame(stat_dict) 311 | # save the model if it gets best kappa 312 | if(test_kappp_score > best_kappa_so_far): 313 | best_kappa_so_far = test_kappp_score 314 | # stats on test 315 | stat_df.to_csv(out_dir+'/stat') 316 | with open(out_dir+'/mem_atten', 'a') as f: 317 | for idx, ite in enumerate(test_essay_id): 318 | f.write('{}\n'.format(ite)) 319 | f.write('{}\n'.format(test_atten_probs[idx])) 320 | #saver.save(sess, out_dir+'/checkpoints', global_step) 321 | print("Training kappa score = {}".format(train_kappp_score)) 322 | print("Validation kappa score = {}".format(eval_kappp_score)) 323 | print("Testing kappa score = {}".format(test_kappp_score)) 324 | with open(out_dir+'/eval', 'a') as f: 325 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 326 | f.write("Validation kappa score = {}\n".format(eval_kappp_score)) 327 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 328 | f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 329 | f.write('*'*10) 330 | f.write('\n') 331 | #sys.stdout = orig_stdout 332 | #f.close() 333 | 
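Both train.py above and cv_train.py below score their predictions with `quadratic_weighted_kappa` imported from qwk.py, which is not reproduced in this dump. The sketch below is illustrative only and is not the repo's qwk.py; it is a minimal NumPy restatement of the standard quadratic weighted kappa definition for integer ratings in [min_rating, max_rating].

```
# qwk_sketch.py -- illustrative only, NOT the repo's qwk.py.
# Standard quadratic weighted kappa for integer ratings in [min_rating, max_rating].
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    # shift ratings so they index rows/columns of the agreement matrices
    rater_a = np.asarray(rater_a, dtype=int) - min_rating
    rater_b = np.asarray(rater_b, dtype=int) - min_rating
    num_ratings = max_rating - min_rating + 1
    # observed agreement matrix
    observed = np.zeros((num_ratings, num_ratings))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1
    # expected matrix: outer product of the two marginal histograms,
    # scaled so it sums to the same total as the observed matrix
    hist_a = np.bincount(rater_a, minlength=num_ratings)
    hist_b = np.bincount(rater_b, minlength=num_ratings)
    expected = np.outer(hist_a, hist_b) / float(len(rater_a))
    # quadratic disagreement weights: 0 on the diagonal, growing with distance
    idx = np.arange(num_ratings)
    weights = (idx[:, None] - idx[None, :]) ** 2 / float((num_ratings - 1) ** 2)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

if __name__ == '__main__':
    # perfect agreement yields 1.0
    print(quadratic_weighted_kappa([2, 4, 6, 8], [2, 4, 6, 8], 2, 12))
```

Perfect agreement gives 1.0 and chance-level agreement gives roughly 0.0, which is how the per-epoch training/validation/testing kappa lines printed by both scripts should be read.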
-------------------------------------------------------------------------------- /cv_train.py: -------------------------------------------------------------------------------- 1 | import data_utils 2 | import numpy as np 3 | from sklearn.model_selection import KFold 4 | from qwk import quadratic_weighted_kappa 5 | import tensorflow as tf 6 | import time 7 | import os 8 | import sys 9 | import pandas as pd 10 | 11 | print 'start to load flags\n' 12 | 13 | # flags 14 | tf.flags.DEFINE_float("epsilon", 0.1, "Epsilon value for Adam Optimizer.") 15 | tf.flags.DEFINE_float("l2_lambda", 0.3, "Lambda for l2 loss.") 16 | tf.flags.DEFINE_float("learning_rate", 0.002, "Learning rate") 17 | tf.flags.DEFINE_float("max_grad_norm", 10.0, "Clip gradients to this norm.") 18 | tf.flags.DEFINE_float("keep_prob", 0.9, "Keep probability for dropout") 19 | tf.flags.DEFINE_integer("evaluation_interval", 2, "Evaluate and print results every x epochs") 20 | tf.flags.DEFINE_integer("batch_size", 15, "Batch size for training.") 21 | tf.flags.DEFINE_integer("feature_size", 100, "Feature size") 22 | tf.flags.DEFINE_integer("num_samples", 1, "Number of samples selected from training for each score") 23 | tf.flags.DEFINE_integer("hops", 3, "Number of hops in the Memory Network.") 24 | tf.flags.DEFINE_integer("epochs", 200, "Number of epochs to train for.") 25 | tf.flags.DEFINE_integer("embedding_size", 300, "Embedding size for embedding matrices.") 26 | tf.flags.DEFINE_integer("essay_set_id", 1, "essay set id, 1 <= id <= 8") 27 | tf.flags.DEFINE_integer("token_num", 42, "The number of token in glove (6, 42)") 28 | tf.flags.DEFINE_boolean("gated_addressing", False, "Simple gated addressing") 29 | tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") 30 | tf.flags.DEFINE_boolean("is_regression", False, "The output is regression or classification") 31 | tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") 32 | # hyper-parameters 33 | FLAGS = tf.flags.FLAGS 34 | 35 | early_stop_count = 0 36 | max_step_count = 10 37 | is_regression = FLAGS.is_regression 38 | gated_addressing = FLAGS.gated_addressing 39 | essay_set_id = FLAGS.essay_set_id 40 | batch_size = FLAGS.batch_size 41 | embedding_size = FLAGS.embedding_size 42 | feature_size = FLAGS.feature_size 43 | l2_lambda = FLAGS.l2_lambda 44 | hops = FLAGS.hops 45 | reader = 'bow' # gru may not work 46 | epochs = FLAGS.epochs 47 | num_samples = FLAGS.num_samples 48 | num_tokens = FLAGS.token_num 49 | test_batch_size = batch_size 50 | random_state = 0 51 | if is_regression: 52 | from memn2n_kv_regression import MemN2N_KV 53 | else: 54 | from memn2n_kv import MemN2N_KV 55 | # print flags info 56 | orig_stdout = sys.stdout 57 | timestamp = time.strftime("%b_%d_%Y_%H:%M:%S", time.localtime()) 58 | folder_name = 'essay_set_{}_cv_{}_{}'.format(essay_set_id, num_samples, timestamp) 59 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", folder_name)) 60 | if not os.path.exists(out_dir): 61 | os.makedirs(out_dir) 62 | 63 | # save output to a file 64 | #f = file(out_dir+'/out.txt', 'w') 65 | #sys.stdout = f 66 | print("Writing to {}\n".format(out_dir)) 67 | 68 | print("\nParameters:") 69 | for attr, value in sorted(FLAGS.__flags.items()): 70 | print("{}={}".format(attr.upper(), value)) 71 | print("") 72 | 73 | with open(out_dir+'/params', 'w') as f: 74 | for attr, value in sorted(FLAGS.__flags.items()): 75 | f.write("{}={}".format(attr.upper(), value)) 76 | f.write("\n") 77 | 78 | # hyper-parameters 
end here 79 | training_path = 'training_set_rel3.tsv' 80 | essay_list, resolved_scores, essay_id = data_utils.load_training_data(training_path, essay_set_id) 81 | 82 | max_score = max(resolved_scores) 83 | min_score = min(resolved_scores) 84 | if essay_set_id == 7: 85 | min_score, max_score = 0, 30 86 | elif essay_set_id == 8: 87 | min_score, max_score = 0, 60 88 | 89 | print 'max_score is {} \t min_score is {}\n'.format(max_score, min_score) 90 | with open(out_dir+'/params', 'a') as f: 91 | f.write('max_score is {} \t min_score is {} \n'.format(max_score, min_score)) 92 | 93 | # include max score 94 | score_range = range(min_score, max_score+1) 95 | 96 | #word_idx, _ = data_utils.build_vocab(essay_list, vocab_limit) 97 | 98 | # load glove 99 | word_idx, word2vec = data_utils.load_glove(num_tokens, dim=embedding_size) 100 | 101 | vocab_size = len(word_idx) + 1 102 | # stat info on data set 103 | 104 | sent_size_list = map(len, [essay for essay in essay_list]) 105 | max_sent_size = max(sent_size_list) 106 | mean_sent_size = int(np.mean(map(len, [essay for essay in essay_list]))) 107 | 108 | print 'max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size) 109 | with open(out_dir+'/params', 'a') as f: 110 | f.write('max sentence size: {} \nmean sentence size: {}\n'.format(max_sent_size, mean_sent_size)) 111 | 112 | print 'The length of score range is {}'.format(len(score_range)) 113 | E = data_utils.vectorize_data(essay_list, word_idx, max_sent_size) 114 | 115 | labeled_data = zip(E, resolved_scores, sent_size_list) 116 | 117 | # split the data on the fly 118 | #trainE, testE, train_scores, test_scores, train_sent_sizes, test_sent_sizes = cross_validation.train_test_split( 119 | # E, resolved_scores, sent_size_list, test_size=.2, random_state=random_state) 120 | 121 | #trainE, evalE, train_scores, eval_scores, train_sent_sizes, eval_sent_sizes = cross_validation.train_test_split( 122 | # trainE, train_scores, train_sent_sizes, test_size=.1, random_state=random_state) 123 | # split the data on the fly 124 | 125 | def train_step(m, e, s, ma): 126 | start_time = time.time() 127 | feed_dict = { 128 | model._query: e, 129 | model._memory_key: m, 130 | model._score_encoding: s, 131 | model._mem_attention_encoding: ma, 132 | model.keep_prob: FLAGS.keep_prob 133 | #model.w_placeholder: word2vec 134 | } 135 | _, step, predict_op, cost = sess.run([train_op, global_step, model.predict_op, model.cost], feed_dict) 136 | end_time = time.time() 137 | time_spent = end_time - start_time 138 | return predict_op, cost, time_spent 139 | 140 | def test_step(e, m): 141 | feed_dict = { 142 | model._query: e, 143 | model._memory_key: m, 144 | model.keep_prob: 1 145 | #model.w_placeholder: word2vec 146 | } 147 | preds, mem_attention_probs = sess.run([model.predict_op, model.mem_attention_probs], feed_dict) 148 | if is_regression: 149 | preds = np.clip(np.round(preds), min_score, max_score) 150 | return preds, mem_attention_probs 151 | else: 152 | return preds, mem_attention_probs 153 | 154 | fold_count = 0 155 | kf = KFold(n_splits=5, random_state=random_state) 156 | best_kappa_scores = [] 157 | for train_index, test_index in kf.split(essay_id): 158 | early_stop_count = 0 159 | fold_count += 1 160 | trainE = [] 161 | testE = [] 162 | train_scores = [] 163 | test_scores = [] 164 | train_essay_id = [] 165 | test_essay_id = [] 166 | 167 | for ite in train_index: 168 | trainE.append(E[ite]) 169 | train_scores.append(resolved_scores[ite]) 170 | train_essay_id.append(essay_id[ite]) 171 | for 
ite in test_index: 172 | testE.append(E[ite]) 173 | test_scores.append(resolved_scores[ite]) 174 | test_essay_id.append(essay_id[ite]) 175 | 176 | #trainE, testE, train_scores, test_scores, train_essay_id, test_essay_id = cross_validation.train_test_split( 177 | # E, resolved_scores, essay_id, test_size=.2, random_state=random_state) 178 | 179 | memory = [] 180 | memory_score = [] 181 | memory_sent_size = [] 182 | memory_essay_ids = [] 183 | # pick sampled essay for each score 184 | for i in score_range: 185 | # test point: limit the number of samples in memory for 8 186 | for j in range(num_samples): 187 | if i in train_scores: 188 | score_idx = train_scores.index(i) 189 | score = train_scores.pop(score_idx) 190 | essay = trainE.pop(score_idx) 191 | sent_size = sent_size_list.pop(score_idx) 192 | memory.append(essay) 193 | memory_score.append(score) 194 | memory_essay_ids.append(train_essay_id.pop(score_idx)) 195 | memory_sent_size.append(sent_size) 196 | memory_size = len(memory) 197 | if is_regression: 198 | # bad naming 199 | train_scores_encoding = train_scores 200 | else: 201 | train_scores_encoding = map(lambda x: score_range.index(x), train_scores) 202 | 203 | # data size 204 | n_train = len(trainE) 205 | n_test = len(testE) 206 | 207 | print 'The size of training data: {}'.format(n_train) 208 | print 'The size of testing data: {}'.format(n_test) 209 | with open(out_dir+'/params{}'.format(fold_count), 'a') as f: 210 | f.write('The size of training data: {}\n'.format(n_train)) 211 | f.write('The size of testing data: {}\n'.format(n_test)) 212 | f.write('\nEssay scores in memory:\n{}'.format(memory_score)) 213 | f.write('\nEssay ids in memory:\n{}'.format(memory_essay_ids)) 214 | f.write('\nEssay ids in training:\n{}'.format(train_essay_id)) 215 | f.write('\nEssay ids in testing:\n{}'.format(test_essay_id)) 216 | 217 | batches = zip(range(0, n_train-batch_size, batch_size), range(batch_size, n_train, batch_size)) 218 | batches = [(start, end) for start, end in batches] 219 | 220 | with tf.Graph().as_default(): 221 | session_conf = tf.ConfigProto( 222 | allow_soft_placement=FLAGS.allow_soft_placement, 223 | log_device_placement=FLAGS.log_device_placement) 224 | 225 | global_step = tf.Variable(0, name="global_step", trainable=False) 226 | # decay learning rate 227 | starter_learning_rate = FLAGS.learning_rate 228 | learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 3000, 0.96, staircase=True) 229 | 230 | # test point 231 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=FLAGS.epsilon) 232 | #optimizer = tf.train.AdagradOptimizer(learning_rate) 233 | best_kappa_so_far = 0.0 234 | with tf.Session(config=session_conf) as sess: 235 | model = MemN2N_KV(batch_size, vocab_size, max_sent_size, max_sent_size, memory_size, 236 | memory_size, embedding_size, len(score_range), feature_size, hops, reader, l2_lambda) 237 | 238 | grads_and_vars = optimizer.compute_gradients(model.loss_op, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE) 239 | grads_and_vars = [(tf.clip_by_norm(g, FLAGS.max_grad_norm), v) 240 | for g, v in grads_and_vars if g is not None] 241 | #grads_and_vars = [(add_gradient_noise(g, 1e-4), v) for g, v in grads_and_vars] 242 | train_op = optimizer.apply_gradients(grads_and_vars, name="train_op", global_step=global_step) 243 | sess.run(tf.global_variables_initializer(), feed_dict={model.w_placeholder: word2vec}) 244 | saver = tf.train.Saver(tf.global_variables()) 245 | 246 | for i in range(1, epochs+1): 247 | train_cost = 
0 248 | total_time = 0 249 | np.random.shuffle(batches) 250 | for start, end in batches: 251 | e = trainE[start:end] 252 | s = train_scores_encoding[start:end] 253 | s_num = train_scores[start:end] 254 | #batched_memory = [] 255 | # batch sized memory 256 | #for _ in range(len(e)): 257 | # batched_memory.append(memory) 258 | mem_atten_encoding = [] 259 | for ite in s_num: 260 | mem_encoding = np.zeros(memory_size) 261 | for j_idx, j in enumerate(memory_score): 262 | if j == ite: 263 | mem_encoding[j_idx] = 1 264 | mem_atten_encoding.append(mem_encoding) 265 | batched_memory = [memory] * (end-start) 266 | _, cost, time_spent = train_step(batched_memory, e, s, mem_atten_encoding) 267 | total_time += time_spent 268 | train_cost += cost 269 | print 'Finish epoch {}, total training cost is {}, time spent is {}'.format(i, train_cost, total_time) 270 | # evaluation 271 | if i % FLAGS.evaluation_interval == 0 or i == FLAGS.epochs: 272 | # test on training data 273 | train_preds = [] 274 | for start in range(0, n_train, test_batch_size): 275 | end = min(n_train, start+test_batch_size) 276 | 277 | #batched_memory = [] 278 | #for _ in range(end-start): 279 | # batched_memory.append(memory) 280 | batched_memory = [memory] * (end-start) 281 | preds, _ = test_step(trainE[start:end], batched_memory) 282 | if type(preds) is np.float32: 283 | train_preds.append(preds) 284 | else: 285 | for ite in preds: 286 | train_preds.append(ite) 287 | if not is_regression: 288 | train_preds = np.add(train_preds, min_score) 289 | #train_kappp_score = kappa(train_scores, train_preds, 'quadratic') 290 | train_kappp_score = quadratic_weighted_kappa( 291 | train_scores, train_preds, min_score, max_score) 292 | # test on test data 293 | test_preds = [] 294 | test_atten_probs = [] 295 | for start in range(0, n_test, test_batch_size): 296 | end = min(n_test, start+test_batch_size) 297 | 298 | #batched_memory = [] 299 | #for _ in range(end-start): 300 | # batched_memory.append(memory) 301 | batched_memory = [memory] * (end-start) 302 | preds, mem_attention_probs = test_step(testE[start:end], batched_memory) 303 | if type(preds) is np.float32: 304 | test_preds.append(preds) 305 | else: 306 | for ite in preds: 307 | test_preds.append(ite) 308 | for ite in mem_attention_probs: 309 | test_atten_probs.append(ite) 310 | if not is_regression: 311 | test_preds = np.add(test_preds, min_score) 312 | #test_kappp_score = kappa(test_scores, test_preds, 'quadratic') 313 | test_kappp_score = quadratic_weighted_kappa( 314 | test_scores, test_preds, min_score, max_score) 315 | stat_dict = {'essay_id': test_essay_id, 'score': test_scores, 'pred_score': test_preds} 316 | stat_df = pd.DataFrame(stat_dict) 317 | # save the model if it gets best kappa 318 | if(test_kappp_score > best_kappa_so_far): 319 | early_stop_count = 0 320 | best_kappa_so_far = test_kappp_score 321 | # stats on test 322 | stat_df.to_csv(out_dir+'/stat') 323 | with open(out_dir+'/mem_atten', 'a') as f: 324 | for idx, ite in enumerate(test_essay_id): 325 | f.write('{}\n'.format(ite)) 326 | f.write('{}\n'.format(test_atten_probs[idx])) 327 | #saver.save(sess, out_dir+'/checkpoints', global_step) 328 | else: 329 | early_stop_count += 1 330 | print("Training kappa score = {}".format(train_kappp_score)) 331 | print("Testing kappa score = {}".format(test_kappp_score)) 332 | with open(out_dir+'/eval{}'.format(fold_count), 'a') as f: 333 | f.write("Training kappa score = {}\n".format(train_kappp_score)) 334 | f.write("Testing kappa score = {}\n".format(test_kappp_score)) 335 | 
f.write("Best Testing kappa score so far = {}\n".format(best_kappa_so_far)) 336 | f.write('*'*10) 337 | f.write('\n') 338 | if early_stop_count > max_step_count: 339 | break 340 | best_kappa_scores.append(best_kappa_so_far) 341 | 342 | with open(out_dir+'/eval'.format(fold_count), 'a') as f: 343 | f.write('5 fold cv {}\n'.format(best_kappa_scores)) 344 | f.write('final result is {}'.format(np.mean(np.array(best_kappa_scores)))) 345 | 346 | #sys.stdout = orig_stdout 347 | #f.close() 348 | --------------------------------------------------------------------------------