├── .gitignore
├── README.md
├── config
│   ├── bAbi_task1.yml
│   └── check_tiny.yml
├── data
│   └── en-10
│       ├── qa1_single-supporting-fact_test.txt
│       └── qa1_single-supporting-fact_train.txt
├── data_loader.py
├── dynamic_memory
│   ├── __init__.py
│   ├── encoder.py
│   └── episode.py
├── dynamic_memory_plus
│   ├── __init__.py
│   ├── attn_gru.py
│   ├── encoder.py
│   ├── episode.py
│   └── input.py
├── hook.py
├── images
│   └── ask_me_anything_figure_3.png
├── main.py
├── model.py
├── notebooks
│   └── data_loader.py.ipynb
├── requirements.txt
└── scripts
    ├── fetch_babi_data.sh
    └── fetch_glove_data.sh

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data/
2 | .ipynb_checkpoints/
3 | notebooks/.ipynb_checkpoints/
4 | 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Dynamic Memory Network [![hb-research](https://img.shields.io/badge/hb--research-experiment-green.svg?style=flat&colorA=448C57&colorB=555555)](https://github.com/hb-research)
2 | 
3 | TensorFlow implementation of [Ask Me Anything:
4 | Dynamic Memory Networks for Natural Language Processing](https://arxiv.org/pdf/1506.07285.pdf).
5 | 
6 | ![images](images/ask_me_anything_figure_3.png)
7 | 
8 | 
9 | ## Requirements
10 | 
11 | - Python 3.6
12 | - TensorFlow 1.8
13 | - [hb-config](https://github.com/hb-research/hb-config) (Singleton Config)
14 | - nltk (tokenizer and BLEU score)
15 | - tqdm (progress bar)
16 | 
17 | 
18 | ## Project Structure
19 | 
20 | Project initialized with [hb-base](https://github.com/hb-research/hb-base)
21 | 
22 |     .
23 |     ├── config                  # Config files (.yml, .json) used with hb-config
24 |     ├── data                    # dataset path
25 |     ├── notebooks               # Prototyping with numpy or tf.InteractiveSession
26 |     ├── dynamic_memory          # DMN architecture graphs (from input to output)
27 |     │   ├── __init__.py             # Graph logic
28 |     │   ├── encoder.py              # Encoder
29 |     │   └── episode.py              # Episode and AttentionGate
30 |     ├── data_loader.py          # raw_data -> processed_data -> generate_batch (using Dataset)
31 |     ├── hook.py                 # training or test hook features (e.g. print_variables)
32 |     ├── main.py                 # define experiment_fn
33 |     └── model.py                # define EstimatorSpec
34 | 
35 | Reference : [hb-config](https://github.com/hb-research/hb-config), [Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator), [experiment_fn](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment), [EstimatorSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec)
36 | 
37 | 
38 | ## Todo
39 | 
40 | - Implement DMN+ ([Dynamic Memory Networks for Visual and Textual Question Answering](https://arxiv.org/pdf/1603.01417.pdf) (2016) by C Xiong)
41 | 
42 | 
43 | 
44 | ## Config
45 | 
46 | example: bAbi_task1.yml
47 | 
48 | ```yml
49 | data:
50 |   base_path: 'data/'
51 |   task_path: 'en-10k/'
52 |   task_id: 1
53 |   PAD_ID: 0
54 | 
55 | model:
56 |   batch_size: 16
57 |   use_pretrained: true         # (true or false)
58 |   embed_dim: 50                # if use_pretrained is true, only 50, 100, 200 and 300 are available
59 |   encoder_type: uni            # uni, bi
60 |   cell_type: gru               # lstm, gru, layer_norm_lstm, nas
61 |   num_layers: 1
62 |   num_units: 32
63 |   memory_hob: 3
64 |   dropout: 0.0
65 |   reg_scale: 0.001
66 | 
67 | train:
68 |   learning_rate: 0.0001
69 |   optimizer: 'Adam'            # Adagrad, Adam, Ftrl, Momentum, RMSProp, SGD
70 | 
71 |   train_steps: 100000
72 |   model_dir: 'logs/bAbi_task1'
73 | 
74 |   save_checkpoints_steps: 1000
75 |   check_hook_n_iter: 1000
76 |   min_eval_frequency: 1000
77 | 
78 |   print_verbose: False
79 |   debug: False
80 | ```
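A minimal sketch of how these values are read at runtime with hb-config (mirroring what `main.py` does with `--config`; the exact file resolution is hb-config's, so treat the path comment as an assumption):

```python
from hbconfig import Config

Config("bAbi_task1")             # loads config/bAbi_task1.yml, as main.py does with --config

print(Config.model.batch_size)   # 16
print(Config.train.model_dir)    # 'logs/bAbi_task1'
Config.data.vocab_size = 40      # values can also be set at runtime (see experiment_fn)
```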
81 | 
82 | 
83 | ## Usage
84 | 
85 | Install requirements.
86 | 
87 | ```pip install -r requirements.txt```
88 | 
89 | Then, prepare the dataset and the pre-trained GloVe vectors.
90 | 
91 | ```
92 | sh scripts/fetch_babi_data.sh
93 | sh scripts/fetch_glove_data.sh
94 | ```
95 | 
96 | Finally, start training and evaluating the model.
97 | ```
98 | python main.py --config bAbi_task1 --mode train_and_evaluate
99 | ```
100 | 
101 | ### Experiment modes
102 | 
103 | :white_check_mark: : Working
104 | :white_medium_small_square: : Not tested yet.
105 | 
106 | 
107 | - :white_check_mark: `evaluate` : Evaluate on the evaluation data.
108 | - :white_medium_small_square: `extend_train_hooks` : Extends the hooks for training.
109 | - :white_medium_small_square: `reset_export_strategies` : Resets the export strategies with the new_export_strategies.
110 | - :white_medium_small_square: `run_std_server` : Starts a TensorFlow server and joins the serving thread.
111 | - :white_medium_small_square: `test` : Tests training, evaluating and exporting the estimator for a single step.
112 | - :white_check_mark: `train` : Fit the estimator using the training data.
113 | - :white_check_mark: `train_and_evaluate` : Interleaves training and evaluation.
114 | 
115 | ---
116 | 
117 | 
118 | ### Tensorboard
119 | 
120 | ```tensorboard --logdir logs```
121 | 
122 | 
123 | ## Reference
124 | 
125 | - [Implementing Dynamic memory networks](https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/)
126 | - [arXiv - Ask Me Anything:
127 | Dynamic Memory Networks for Natural Language Processing](https://arxiv.org/abs/1506.07285) (2015. 6) by A Kumar
128 | - [arXiv - Dynamic Memory Networks for Visual and Textual Question Answering](https://arxiv.org/abs/1603.01417) (2016.
3) by C Xiong 129 | 130 | ## Author 131 | 132 | Dongjun Lee (humanbrain.djlee@gmail.com) 133 | -------------------------------------------------------------------------------- /config/bAbi_task1.yml: -------------------------------------------------------------------------------- 1 | data: 2 | base_path: 'data/' 3 | task_path: 'en-10/' 4 | task_id: 1 5 | PAD_ID: 0 6 | 7 | model: 8 | batch_size: 16 9 | use_pretrained: true # (true or false) 10 | embed_dim: 50 # if use_pretrained: only available 50, 100, 200, 300 11 | encoder_type: UNI # uni, bi 12 | cell_type: GRU # lstm, gru, layer_norm_lstm, nas 13 | num_layers: 1 14 | num_units: 32 15 | memory_hob: 3 16 | dropout: 0.0 17 | reg_scale: 0.001 18 | 19 | train: 20 | learning_rate: 0.0001 21 | optimizer: 'Adam' # Adagrad, Adam, Ftrl, Momentum, RMSProp, SGD 22 | 23 | train_steps: 100000 24 | model_dir: 'logs/bAbi_task1' 25 | 26 | save_checkpoints_steps: 1000 27 | check_hook_n_iter: 1000 28 | min_eval_frequency: 1000 29 | 30 | print_verbose: False 31 | debug: False 32 | -------------------------------------------------------------------------------- /config/check_tiny.yml: -------------------------------------------------------------------------------- 1 | data: 2 | base_path: 'data/' 3 | task_path: 'en-10/' 4 | task_id: 1 5 | PAD_ID: 0 6 | 7 | model: 8 | batch_size: 2 9 | use_pretrained: false # (true or false) 10 | embed_dim: 8 # if use_pretrained: only available 50, 100, 200, 300 11 | encoder_type: uni # uni, bi 12 | cell_type: gru # lstm, gru, layer_norm_lstm, nas 13 | num_layers: 2 14 | num_units: 16 15 | memory_hob: 2 16 | dropout: 0.5 17 | reg_scale: 0.001 18 | 19 | train: 20 | learning_rate: 0.001 21 | optimizer: 'Adam' # Adagrad, Adam, Ftrl, Momentum, RMSProp, SGD 22 | 23 | train_steps: 10000 24 | model_dir: 'logs/check_tiny' 25 | 26 | save_checkpoints_steps: 1000 27 | check_hook_n_iter: 100 28 | min_eval_frequency: 100 29 | 30 | print_verbose: False 31 | debug: False 32 | -------------------------------------------------------------------------------- /data/en-10/qa1_single-supporting-fact_test.txt: -------------------------------------------------------------------------------- 1 | 1 Mary moved to the bathroom. 2 | 2 John went to the hallway. 3 | 3 Where is Mary? bathroom 1 4 | 4 Daniel went back to the hallway. 5 | 5 Sandra moved to the garden. 6 | 6 Where is Daniel? hallway 4 7 | 7 John moved to the office. 8 | 8 Sandra journeyed to the bathroom. 9 | 9 Where is Daniel? hallway 4 -------------------------------------------------------------------------------- /data/en-10/qa1_single-supporting-fact_train.txt: -------------------------------------------------------------------------------- 1 | 1 Mary moved to the bathroom. 2 | 2 John went to the hallway. 3 | 3 Where is Mary? bathroom 1 4 | 4 Daniel went back to the hallway. 5 | 5 Sandra moved to the garden. 6 | 6 Where is Daniel? hallway 4 7 | 7 John moved to the office. 8 | 8 Sandra journeyed to the bathroom. 9 | 9 Where is Daniel? 
hallway 4 -------------------------------------------------------------------------------- /data_loader.py: -------------------------------------------------------------------------------- 1 | """ 2 | bAbi data_loader 3 | Original code : https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano/blob/master/utils.py 4 | """ 5 | 6 | import os 7 | 8 | from hbconfig import Config 9 | import numpy as np 10 | import tensorflow as tf 11 | from tqdm import tqdm 12 | 13 | 14 | 15 | class DataLoader: 16 | 17 | def __init__(self, task_path, task_id, task_test_id, w2v_dim=100, input_mask_mode="sentence", use_pretrained=True): 18 | self.base_path = "data/" 19 | self.task_path = task_path 20 | 21 | self.task_id = str(task_id) 22 | self.task_test_id = str(task_test_id) 23 | self.w2v_dim = w2v_dim 24 | self.input_mask_mode = input_mask_mode 25 | self.use_pretrained = use_pretrained 26 | 27 | def make_train_and_test_set(self): 28 | train_raw, test_raw = self.get_babi_raw(self.task_id, self.task_test_id) 29 | self.max_facts_seq_len, self.max_question_seq_len, self.max_input_mask_len = self.get_max_seq_length(train_raw, test_raw) 30 | 31 | if self.use_pretrained: 32 | self.word2vec = self.load_glove(self.w2v_dim) 33 | else: 34 | self.word2vec = {} 35 | self.vocab = {} 36 | self.ivocab = {} 37 | 38 | self.create_vector("unknown") 39 | 40 | train_input, train_question, train_answer, train_input_mask = self.process_input(train_raw) 41 | test_input, test_question, test_answer, test_input_mask = self.process_input(test_raw) 42 | 43 | return { 44 | "train": (train_input, train_input_mask, train_question, train_answer), 45 | "test": (test_input, test_input_mask, test_question, test_answer) 46 | } 47 | 48 | def get_max_seq_length(self, *datasets): 49 | max_facts_length, max_question_length, max_input_mask_length = 0, 0, 0 50 | 51 | def count_punctuation(facts): 52 | return len(list(filter(lambda x: x == ".", facts))) 53 | 54 | for dataset in datasets: 55 | for d in dataset: 56 | max_facts_length = max(max_facts_length, len(d['C'].split())) 57 | max_input_mask_length = max(max_input_mask_length, count_punctuation(d['C'])) 58 | max_question_length = max(max_question_length, len(d['Q'].split())) 59 | return max_facts_length, max_question_length, max_input_mask_length 60 | 61 | def init_babi(self, fname): 62 | print("==> Loading test from %s" % fname) 63 | tasks = [] 64 | task = None 65 | for i, line in enumerate(open(fname)): 66 | id = int(line[0:line.find(' ')]) 67 | if id == 1: 68 | task = {"C": "", "Q": "", "A": ""} 69 | 70 | line = line.strip() 71 | line = line.replace('.', ' . 
') 72 | line = line[line.find(' ')+1:] 73 | if line.find('?') == -1: 74 | task["C"] += line 75 | else: 76 | idx = line.find('?') 77 | tmp = line[idx+1:].split('\t') 78 | task["Q"] = line[:idx] 79 | task["A"] = tmp[1].strip() 80 | tasks.append(task.copy()) 81 | 82 | return tasks 83 | 84 | 85 | def get_babi_raw(self, id, test_id): 86 | babi_map = { 87 | "1": "qa1_single-supporting-fact", 88 | "2": "qa2_two-supporting-facts", 89 | "3": "qa3_three-supporting-facts", 90 | "4": "qa4_two-arg-relations", 91 | "5": "qa5_three-arg-relations", 92 | "6": "qa6_yes-no-questions", 93 | "7": "qa7_counting", 94 | "8": "qa8_lists-sets", 95 | "9": "qa9_simple-negation", 96 | "10": "qa10_indefinite-knowledge", 97 | "11": "qa11_basic-coreference", 98 | "12": "qa12_conjunction", 99 | "13": "qa13_compound-coreference", 100 | "14": "qa14_time-reasoning", 101 | "15": "qa15_basic-deduction", 102 | "16": "qa16_basic-induction", 103 | "17": "qa17_positional-reasoning", 104 | "18": "qa18_size-reasoning", 105 | "19": "qa19_path-finding", 106 | "20": "qa20_agents-motivations", 107 | "MCTest": "MCTest", 108 | "19changed": "19changed", 109 | "joint": "all_shuffled", 110 | "sh1": "../shuffled/qa1_single-supporting-fact", 111 | "sh2": "../shuffled/qa2_two-supporting-facts", 112 | "sh3": "../shuffled/qa3_three-supporting-facts", 113 | "sh4": "../shuffled/qa4_two-arg-relations", 114 | "sh5": "../shuffled/qa5_three-arg-relations", 115 | "sh6": "../shuffled/qa6_yes-no-questions", 116 | "sh7": "../shuffled/qa7_counting", 117 | "sh8": "../shuffled/qa8_lists-sets", 118 | "sh9": "../shuffled/qa9_simple-negation", 119 | "sh10": "../shuffled/qa10_indefinite-knowledge", 120 | "sh11": "../shuffled/qa11_basic-coreference", 121 | "sh12": "../shuffled/qa12_conjunction", 122 | "sh13": "../shuffled/qa13_compound-coreference", 123 | "sh14": "../shuffled/qa14_time-reasoning", 124 | "sh15": "../shuffled/qa15_basic-deduction", 125 | "sh16": "../shuffled/qa16_basic-induction", 126 | "sh17": "../shuffled/qa17_positional-reasoning", 127 | "sh18": "../shuffled/qa18_size-reasoning", 128 | "sh19": "../shuffled/qa19_path-finding", 129 | "sh20": "../shuffled/qa20_agents-motivations", 130 | } 131 | if (test_id == ""): 132 | test_id = id 133 | babi_name = babi_map[id] 134 | babi_test_name = babi_map[test_id] 135 | babi_train_raw = self.init_babi(os.path.join(self.base_path, self.task_path, '%s_train.txt' % babi_name)) 136 | babi_test_raw = self.init_babi(os.path.join(self.base_path, self.task_path, '%s_test.txt' % babi_test_name)) 137 | return babi_train_raw, babi_test_raw 138 | 139 | def load_glove(self, dim): 140 | word2vec = {} 141 | 142 | print("==> loading glove") 143 | with open(os.path.join(self.base_path, "glove/glove.6B." + str(dim) + "d.txt"), 'rb') as f: 144 | for line in tqdm(f): 145 | l = line.decode('utf-8').split() 146 | word2vec[l[0]] = l[1:] 147 | 148 | print("==> glove is loaded") 149 | 150 | return word2vec 151 | 152 | def create_vector(self, word, silent=False): 153 | # if the word is missing from Glove, create some fake vector and store in glove! 
154 | vector = np.random.uniform(0.0, 1.0, (self.w2v_dim,)) 155 | self.word2vec[word] = vector 156 | if (not silent): 157 | print("data_loader.py::create_vector => %s is missing" % word) 158 | return vector 159 | 160 | def process_word(self, word, to_return="word2vec", silent=False): 161 | if not word in self.word2vec: 162 | self.create_vector(word, silent=silent) 163 | if not word in self.vocab: 164 | next_index = len(self.vocab) 165 | self.vocab[word] = next_index 166 | self.ivocab[next_index] = word 167 | 168 | if to_return == "word2vec": 169 | return self.word2vec[word] 170 | elif to_return == "index": 171 | return self.vocab[word] 172 | else: 173 | raise ValueError("return type is 'word2vec' or 'index'") 174 | 175 | def get_norm(self, x): 176 | x = np.array(x) 177 | return np.sum(x * x) 178 | 179 | def process_input(self, data_raw): 180 | questions = [] 181 | inputs = [] 182 | answers = [] 183 | input_masks = [] 184 | 185 | for x in data_raw: 186 | inp = x["C"].lower().split(' ') 187 | inp = [w for w in inp if len(w) > 0] 188 | 189 | q = x["Q"].lower().split(' ') 190 | q = [w for w in q if len(w) > 0] 191 | 192 | inp_vector = [self.process_word(word=w, to_return="word2vec") for w in inp] 193 | inp_vector = self.pad_input(inp_vector, self.max_facts_seq_len, [np.zeros(self.w2v_dim)]) 194 | 195 | q_vector = [self.process_word(word=w, to_return="word2vec") for w in q] 196 | q_vector = self.pad_input(q_vector, self.max_question_seq_len, [np.zeros(self.w2v_dim)]) 197 | 198 | inputs.append(np.vstack(inp_vector).astype(float)) 199 | questions.append(np.vstack(q_vector).astype(float)) 200 | answers.append(self.process_word(word = x["A"], to_return = "index")) 201 | 202 | if self.input_mask_mode == 'word': 203 | input_masks.append(np.array([index for index, w in enumerate(inp)], dtype=np.int32)) 204 | elif self.input_mask_mode == 'sentence': 205 | input_mask = [index for index, w in enumerate(inp) if w == '.'] 206 | input_mask = self.pad_input(input_mask, self.max_input_mask_len, [0]) 207 | input_masks.append(input_mask) 208 | else: 209 | raise ValueError("input_mask_mode is only available (word, sentence)") 210 | 211 | return (np.array(inputs, dtype=np.float32), 212 | np.array(questions, dtype=np.float32), 213 | np.array(answers, dtype=np.int32).reshape(-1, 1), 214 | np.array(input_masks, dtype=np.int32)) 215 | 216 | def pad_input(self, input_, size, pad_item): 217 | return input_ + pad_item * (size - len(input_)) 218 | 219 | def make_batch(self, data, buffer_size=10000, batch_size=64, scope="train"): 220 | 221 | class IteratorInitializerHook(tf.train.SessionRunHook): 222 | """Hook to initialise data iterator after Session is created.""" 223 | 224 | def __init__(self): 225 | super(IteratorInitializerHook, self).__init__() 226 | self.iterator_initializer_func = None 227 | 228 | def after_create_session(self, session, coord): 229 | """Initialise the iterator after the session has been created.""" 230 | self.iterator_initializer_func(session) 231 | 232 | 233 | iterator_initializer_hook = IteratorInitializerHook() 234 | 235 | def get_inputs(): 236 | with tf.name_scope(scope): 237 | 238 | inputs, input_masks, questions, answers = data 239 | 240 | # Define placeholders 241 | input_placeholder = tf.placeholder( 242 | tf.float32, [None, Config.data.max_facts_seq_len, Config.model.embed_dim]) 243 | input_mask_placeholder = tf.placeholder( 244 | tf.int32, [None, Config.data.max_input_mask_length]) 245 | question_placeholder = tf.placeholder( 246 | tf.float32, [None, Config.data.max_question_seq_len, 
Config.model.embed_dim])
247 |                 answer_placeholder = tf.placeholder(
248 |                     tf.int32, [None, 1])
249 | 
250 |                 # Build dataset iterator
251 |                 dataset = tf.data.Dataset.from_tensor_slices(
252 |                     (input_placeholder, input_mask_placeholder,
253 |                      question_placeholder, answer_placeholder))
254 | 
255 |                 if scope == "train":
256 |                     dataset = dataset.repeat(None)  # Infinite iterations
257 |                     dataset = dataset.shuffle(buffer_size=buffer_size)  # shuffle only the training data
258 |                 else:
259 |                     dataset = dataset.repeat(1)  # One epoch; keep evaluation order deterministic
260 | 
261 |                 dataset = dataset.batch(batch_size)
262 | 
263 |                 iterator = dataset.make_initializable_iterator()
264 |                 next_input, next_input_mask, next_question, next_answer = iterator.get_next()
265 | 
266 |                 # Set runhook to initialize iterator
267 |                 iterator_initializer_hook.iterator_initializer_func = \
268 |                     lambda sess: sess.run(
269 |                         iterator.initializer,
270 |                         feed_dict={input_placeholder: inputs,
271 |                                    input_mask_placeholder: input_masks,
272 |                                    question_placeholder: questions,
273 |                                    answer_placeholder: answers})
274 | 
275 |                 # Return batched (features, labels)
276 |                 features = {"input_data": next_input,
277 |                             "input_data_mask": next_input_mask,
278 |                             "question_data": next_question}
279 |                 return (features, next_answer)
280 | 
281 |         # Return function and hook
282 |         return get_inputs, iterator_initializer_hook
283 | 
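For reference, a minimal usage sketch of `DataLoader` (mirroring `notebooks/data_loader.py.ipynb`; the shapes come from the bundled tiny `en-10` sample):

```python
from data_loader import DataLoader

data_loader = DataLoader(task_path="en-10/", task_id="1", task_test_id="1",
                         w2v_dim=50, use_pretrained=False)
data = data_loader.make_train_and_test_set()

train_input, train_input_mask, train_question, train_answer = data["train"]
print(train_input.shape)    # (3, 37, 50): 3 examples, 37 words, 50-dim word vectors
print(train_input_mask[0])  # [ 5 11  0  0  0  0]: indices of the sentence-ending '.' tokens
```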
--------------------------------------------------------------------------------
/dynamic_memory/__init__.py:
--------------------------------------------------------------------------------
1 | 
2 | from hbconfig import Config
3 | import tensorflow as tf
4 | 
5 | from .encoder import Encoder
6 | from .episode import Episode
7 | 
8 | 
9 | 
10 | class Graph:
11 | 
12 |     def __init__(self, mode, dtype=tf.float32):
13 |         self.mode = mode
14 |         self.dtype = dtype
15 | 
16 |     def build(self,
17 |               embedding_input=None,
18 |               input_mask=None,
19 |               embedding_question=None):
20 | 
21 |         facts, question = self._build_input_module(embedding_input, input_mask, embedding_question)
22 |         last_memory = self._build_episodic_memory(facts, question)
23 |         return self._build_answer_decoder(last_memory)
24 | 
25 |     def _build_input_module(self, embedding_input, input_mask, embedding_question):
26 |         encoder = Encoder(
27 |             encoder_type=Config.model.encoder_type,
28 |             num_layers=Config.model.num_layers,
29 |             cell_type=Config.model.cell_type,
30 |             num_units=Config.model.num_units,
31 |             dropout=Config.model.dropout)
32 | 
33 |         # slice zeros padding
34 |         input_length = tf.reduce_max(input_mask, axis=1)
35 |         question_length = tf.reduce_sum(tf.to_int32(
36 |             tf.not_equal(tf.reduce_max(embedding_question, axis=2), Config.data.PAD_ID)), axis=1)
37 | 
38 |         with tf.variable_scope("input-module") as scope:
39 |             input_encoder_outputs, _ = encoder.build(
40 |                 embedding_input, input_length, scope="encoder")
41 | 
42 |         with tf.variable_scope("facts") as scope:
43 |             batch_size = tf.shape(input_mask)[0]
44 |             max_mask_length = tf.shape(input_mask)[1]
45 | 
46 |             def get_encoded_fact(i):
47 |                 # gather the encoder states at the sentence-end ('.') positions
48 |                 mask_lengths = tf.reduce_sum(tf.to_int32(tf.not_equal(input_mask[i], Config.data.PAD_ID)), axis=0)
49 |                 fact_indices = tf.boolean_mask(input_mask[i], tf.sequence_mask(mask_lengths, max_mask_length))
50 | 
51 |                 encoded_facts = tf.gather_nd(input_encoder_outputs[i], tf.reshape(fact_indices, [-1, 1]))
52 |                 padding = tf.zeros(tf.stack([max_mask_length - mask_lengths, Config.model.num_units]))
53 |                 return tf.concat([encoded_facts, padding], 0)
54 | 
55 |             facts_stacked = tf.map_fn(get_encoded_fact, tf.range(start=0, limit=batch_size), dtype=self.dtype)
56 | 
57 |             # max_input_mask_length x [batch_size, num_units]
58 |             facts = tf.unstack(tf.transpose(facts_stacked, [1, 0, 2]), num=Config.data.max_input_mask_length)
59 | 
60 |         with tf.variable_scope("input-module") as scope:
61 |             scope.reuse_variables()
62 |             _, question = encoder.build(
63 |                 embedding_question, question_length, scope="encoder")
64 | 
65 |         return facts, question[0]
66 | 
67 | 
68 |     def _build_episodic_memory(self, facts, question):
69 | 
70 |         with tf.variable_scope('episodic-memory-module') as scope:
71 |             memory = tf.identity(question)
72 | 
73 |             episode = Episode(Config.model.num_units, reg_scale=Config.model.reg_scale)
74 |             rnn = tf.contrib.rnn.GRUCell(Config.model.num_units)
75 | 
76 |             for _ in range(Config.model.memory_hob):
77 |                 updated_memory = episode.update(facts,
78 |                                                 tf.transpose(memory, name="m"),
79 |                                                 tf.transpose(question, name="q"))
80 |                 memory, _ = rnn(updated_memory, memory, scope="memory_rnn")
81 |                 scope.reuse_variables()
82 |         return memory
83 | 
84 |     def _build_answer_decoder(self, last_memory):
85 | 
86 |         with tf.variable_scope('answer-module'):
87 |             w_a = tf.get_variable(
88 |                 "w_a", [Config.model.num_units, Config.data.vocab_size],
89 |                 regularizer=tf.contrib.layers.l2_regularizer(Config.model.reg_scale))
90 |             logits = tf.matmul(last_memory, w_a)
91 |             return logits
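A toy NumPy check of the fact-extraction idea in `get_encoded_fact` above (hypothetical values; the mask row `[5, 11, 0, 0, 0, 0]` is from the `en-10` sample):

```python
import numpy as np

num_units = 8
outputs = np.random.rand(37, num_units)   # encoder states for one batch element
mask = np.array([5, 11, 0, 0, 0, 0])      # '.' positions, zero-padded

n = np.count_nonzero(mask)                # 2 real sentence ends
facts = outputs[mask[:n]]                 # pick the states at the '.' positions
facts = np.vstack([facts, np.zeros((len(mask) - n, num_units))])  # pad back
print(facts.shape)                        # (6, 8), like one row of facts_stacked
```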
--------------------------------------------------------------------------------
/dynamic_memory/encoder.py:
--------------------------------------------------------------------------------
1 | 
2 | import tensorflow as tf
3 | 
4 | 
5 | 
6 | class Encoder:
7 |     """Encoder class is a multi-layer recurrent neural network
8 | 
9 |     The 'Encoder' encodes the sequential input vectors.
10 |     """
11 | 
12 |     UNI_ENCODER_TYPE = "uni"
13 |     BI_ENCODER_TYPE = "bi"
14 | 
15 |     RNN_GRU_CELL = "gru"
16 |     RNN_LSTM_CELL = "lstm"
17 |     RNN_LAYER_NORM_LSTM_CELL = "layer_norm_lstm"
18 |     RNN_NAS_CELL = "nas"
19 | 
20 |     def __init__(self, encoder_type="uni", num_layers=4,
21 |                  cell_type="gru", num_units=512, dropout=0.8,
22 |                  dtype=tf.float32):
23 |         """Constructs an 'Encoder' instance.
24 | 
25 |         * Args:
26 |             encoder_type: rnn encoder type (uni, bi)
27 |             num_layers: number of RNN cells composed sequentially into a multi-layer cell
28 |             cell_type: RNN cell type (lstm, gru, layer_norm_lstm, nas)
29 |             num_units: the number of units in each cell
30 |             dropout: probability of dropping cell inputs (input_keep_prob = 1 - dropout)
31 |             dtype: the dtype of the input
32 | 
33 |         * Returns:
34 |             Encoder instance
35 |         """
36 | 
37 |         self.encoder_type = encoder_type.lower()   # accept 'uni' or 'UNI' from config
38 |         self.num_layers = num_layers
39 |         self.cell_type = cell_type.lower()         # accept 'gru' or 'GRU' from config
40 |         self.num_units = num_units
41 |         self.dropout = dropout
42 |         self.dtype = dtype
43 | 
44 |     def build(self, input_vector, sequence_length, scope=None):
45 |         if self.encoder_type == self.UNI_ENCODER_TYPE:
46 |             self.cells = self._create_rnn_cells()
47 | 
48 |             return self.unidirectional_rnn(input_vector, sequence_length, scope=scope)
49 |         elif self.encoder_type == self.BI_ENCODER_TYPE:
50 |             self.cells_fw = self._create_rnn_cells(is_list=True)
51 |             self.cells_bw = self._create_rnn_cells(is_list=True)
52 | 
53 |             return self.bidirectional_rnn(input_vector, sequence_length, scope=scope)
54 |         else:
55 |             raise ValueError(f"Unknown encoder_type {self.encoder_type}")
56 | 
57 |     def unidirectional_rnn(self, input_vector, sequence_length, scope=None):
58 |         return tf.nn.dynamic_rnn(
59 |             self.cells,
60 |             input_vector,
61 |             sequence_length=sequence_length,
62 |             dtype=self.dtype,
63 |             time_major=False,
64 |             swap_memory=True,
65 |             scope=scope)
66 | 
67 |     def bidirectional_rnn(self, input_vector, sequence_length, scope=None):
68 |         outputs, output_state_fw, output_state_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
69 |             self.cells_fw,
70 |             self.cells_bw,
71 |             input_vector,
72 |             sequence_length=sequence_length,
73 |             dtype=self.dtype,
74 |             scope=scope)
75 | 
76 |         encoder_final_state = tf.concat((output_state_fw[-1], output_state_bw[-1]), axis=1)
77 |         return outputs, encoder_final_state
78 | 
79 |     def _create_rnn_cells(self, is_list=False):
80 |         """Constructs stacked_rnn with num_layers
81 | 
82 |         * Args:
83 |             is_list: True returns the cells as a list (for stack bidirectional),
84 |                      False wraps them in a single MultiRNNCell
85 | 
86 |         * Returns:
87 |             stacked_rnn
88 |         """
89 | 
90 |         stacked_rnn = []
91 |         for _ in range(self.num_layers):
92 |             single_cell = self._rnn_single_cell()
93 |             stacked_rnn.append(single_cell)
94 | 
95 |         if is_list:
96 |             return stacked_rnn
97 |         else:
98 |             return tf.nn.rnn_cell.MultiRNNCell(
99 |                 cells=stacked_rnn,
100 |                 state_is_tuple=True)
101 | 
102 |     def _rnn_single_cell(self):
103 |         """Constructs a single rnn cell"""
104 | 
105 |         if self.cell_type == self.RNN_GRU_CELL:
106 |             single_cell = tf.contrib.rnn.GRUCell(
107 |                 self.num_units,
108 |                 reuse=tf.get_variable_scope().reuse)
109 |         elif self.cell_type == self.RNN_LSTM_CELL:
110 |             single_cell = tf.contrib.rnn.BasicLSTMCell(
111 |                 self.num_units,
112 |                 forget_bias=1.0,
113 |                 reuse=tf.get_variable_scope().reuse)
114 |         elif self.cell_type == self.RNN_LAYER_NORM_LSTM_CELL:
115 |             single_cell = tf.contrib.rnn.LayerNormBasicLSTMCell(
116 |                 self.num_units,
117 |                 forget_bias=1.0,
118 |                 layer_norm=True,
119 |                 reuse=tf.get_variable_scope().reuse)
120 |         elif self.cell_type == self.RNN_NAS_CELL:
121 |             single_cell = tf.contrib.rnn.NASCell(
122 |                 self.num_units)
123 |         else:
124 |             raise ValueError(f"Unknown rnn cell type. {self.cell_type}")
125 | 
126 |         if self.dropout > 0.0:
127 |             single_cell = tf.contrib.rnn.DropoutWrapper(
128 |                 cell=single_cell, input_keep_prob=(1.0 - self.dropout))
129 | 
130 |         return single_cell
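A minimal usage sketch of the `Encoder` above (hypothetical shapes, single-layer GRU):

```python
import tensorflow as tf
from dynamic_memory.encoder import Encoder

encoder = Encoder(encoder_type="uni", num_layers=1,
                  cell_type="gru", num_units=32, dropout=0.0)

inputs = tf.placeholder(tf.float32, [None, 37, 50])   # [batch, time, embed_dim]
lengths = tf.placeholder(tf.int32, [None])
outputs, state = encoder.build(inputs, lengths, scope="encoder")
# outputs: [batch, 37, 32]; state: final MultiRNNCell state (a tuple with one GRU state)
```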
--------------------------------------------------------------------------------
/dynamic_memory/episode.py:
--------------------------------------------------------------------------------
1 | 
2 | import tensorflow as tf
3 | 
4 | 
5 | 
6 | class Episode:
7 |     """Episode class is used to update the memory in the Episodic Memory Module"""
8 | 
9 |     def __init__(self, num_units, reg_scale=0.001):
10 |         self.gate = AttentionGate(hidden_size=num_units, reg_scale=reg_scale)
11 |         self.rnn = tf.contrib.rnn.GRUCell(num_units)
12 | 
13 |     def update(self, c, m_t, q_t):
14 |         """Update memory with attention mechanism
15 | 
16 |         * Args:
17 |             c : encoded raw text, stacked by sentence
18 |                 shape: fact_count x [batch_size, num_units]
19 |             m_t : previous memory
20 |                 shape: [num_units, batch_size]
21 |             q_t : encoded question last state
22 |                 shape: [num_units, batch_size]
23 | 
24 |         * Returns:
25 |             h : updated memory
26 |         """
27 |         h = tf.zeros_like(c[0])
28 | 
29 |         with tf.variable_scope('memory-update') as scope:
30 |             for fact in c:
31 |                 g = self.gate.score(tf.transpose(fact, name="c"), m_t, q_t)
32 |                 h = g * self.rnn(fact, h, scope="episode_rnn")[0] + (1 - g) * h
33 |                 scope.reuse_variables()
34 |         return h
35 | 
36 | 
37 | class AttentionGate:
38 |     """AttentionGate class is a simple two-layer feed-forward neural network with a score function."""
39 | 
40 |     def __init__(self, hidden_size=4, reg_scale=0.001):
41 |         self.w1 = tf.get_variable(
42 |             "w1", [hidden_size, 7*hidden_size],
43 |             regularizer=tf.contrib.layers.l2_regularizer(reg_scale))
44 |         self.b1 = tf.get_variable("b1", [hidden_size, 1])
45 |         self.w2 = tf.get_variable(
46 |             "w2", [1, hidden_size],
47 |             regularizer=tf.contrib.layers.l2_regularizer(reg_scale))
48 |         self.b2 = tf.get_variable("b2", [1, 1])
49 | 
50 |     def score(self, c_t, m_t, q_t):
51 |         """Captures a variety of similarities between the input (c), memory (m) and question (q)
52 | 
53 |         * Args:
54 |             c_t : transpose of one fact (encoded sentence's last state)
55 |                 shape: [num_units, batch_size]
56 |             m_t : transpose of previous memory
57 |                 shape: [num_units, batch_size]
58 |             q_t : transpose of encoded question
59 |                 shape: [num_units, batch_size]
60 | 
61 |         * Returns:
62 |             gate score
63 |                 shape: [batch_size, 1]
64 |         """
65 | 
66 |         with tf.variable_scope('attention_gate'):
67 |             z = tf.concat([c_t, m_t, q_t, c_t*q_t, c_t*m_t, (c_t-q_t)**2, (c_t-m_t)**2], 0)
68 | 
69 |             o1 = tf.nn.tanh(tf.matmul(self.w1, z) + self.b1)
70 |             o2 = tf.nn.sigmoid(tf.matmul(self.w2, o1) + self.b2)
71 |             return tf.transpose(o2)
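A toy shape walkthrough of `AttentionGate.score` above (assumed `batch_size=2`, `num_units=4`):

```python
import tensorflow as tf
from dynamic_memory.episode import AttentionGate

gate = AttentionGate(hidden_size=4)
c_t = tf.zeros([4, 2])   # [num_units, batch_size]
m_t = tf.ones([4, 2])
q_t = tf.ones([4, 2])
g = gate.score(c_t, m_t, q_t)
# z: [28, 2] (7 feature blocks of 4) -> o1: [4, 2] -> o2: [1, 2] -> g: [2, 1]
```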
--------------------------------------------------------------------------------
/dynamic_memory_plus/__init__.py:
--------------------------------------------------------------------------------
1 | 
2 | from hbconfig import Config
3 | import tensorflow as tf
4 | 
5 | 
6 | from .encoder import Encoder
7 | from .input import TextualInput
8 | 
9 | 
10 | class Graph:
11 | 
12 |     def __init__(self, mode, dtype=tf.float32):
13 |         self.mode = mode
14 |         self.dtype = dtype
15 | 
16 |     def build(self,
17 |               input=None,
18 |               input_mask=None,
19 |               question=None):
20 | 
21 |         facts = self._build_textual_input_module(input)
22 |         encoded_question = self._build_question_module(question)
23 |         last_memory = self._build_episodic_memory_module(facts, encoded_question)
24 |         return self._build_answer_module(last_memory)
25 | 
26 |     def _build_textual_input_module(self, input):
27 |         textual_input = TextualInput(embed_dim=Config.model.embed_dim,
28 |                                      vocab_size=Config.data.vocab_size,
29 |                                      dtype=self.dtype)
30 |         facts = textual_input.build(input)
31 |         return facts
32 | 
33 |     def _build_question_module(self, question):
34 |         # TODO (DMN+): pass the real sequence lengths once the question pipeline is wired up
35 |         encoder = Encoder(encoder_type="uni",
36 |                           num_layers=Config.model.num_layers,
37 |                           cell_type="gru",
38 |                           num_units=Config.model.num_units)
39 |         _, question = encoder.build(question, sequence_length=None)
40 |         return question[0]
41 | 
42 |     def _build_episodic_memory_module(self, facts, question):
43 |         # TODO (DMN+): attention mechanism (gate attention + AttnGRU) and memory update
44 |         pass
45 | 
46 |     def _build_answer_module(self, last_memory):
47 |         # TODO (DMN+): decode the answer from the last memory state
48 |         pass

--------------------------------------------------------------------------------
/dynamic_memory_plus/attn_gru.py:
--------------------------------------------------------------------------------
1 | 
2 | from tensorflow.python.ops import array_ops
3 | from tensorflow.python.ops import init_ops
4 | from tensorflow.python.ops import math_ops
5 | from tensorflow.python.ops import variable_scope as vs
6 | from tensorflow.python.ops.rnn_cell_impl import RNNCell, _Linear
7 | 
8 | 
9 | class AttnGRUCell(RNNCell):
10 |     """Attention based GRU (cf. https://arxiv.org/abs/1603.01417).
11 | 
12 |     * Args:
13 |         num_units: int, The number of units in the AttnGRU cell.
14 |         activation: Nonlinearity to use.  Default: `tanh`.
15 |         reuse: (optional) Python boolean describing whether to reuse variables
16 |             in an existing scope.  If not `True`, and the existing scope already has
17 |             the given variables, an error is raised.
18 |         kernel_initializer: (optional) The initializer to use for the weight and
19 |             projection matrices.
20 |         bias_initializer: (optional) The initializer to use for the bias.
21 |     """
22 | 
23 |     def __init__(self,
24 |                  num_units,
25 |                  activation=None,
26 |                  reuse=None,
27 |                  kernel_initializer=None,
28 |                  bias_initializer=None):
29 | 
30 |         super(AttnGRUCell, self).__init__(_reuse=reuse)
31 |         self._num_units = num_units
32 |         self._activation = activation or math_ops.tanh
33 |         self._kernel_initializer = kernel_initializer
34 |         self._bias_initializer = bias_initializer
35 |         self._gate_linear = None
36 |         self._candidate_linear = None
37 | 
38 |     @property
39 |     def state_size(self):
40 |         return self._num_units
41 | 
42 |     @property
43 |     def output_size(self):
44 |         return self._num_units
45 | 
46 |     def call(self, inputs, state):
47 |         """Attention Based GRU with num_units cells.
48 | 
49 |         TODO (DMN+): replace the update gate `u` below with the attention
50 |         gate g_i from the episode module, i.e. new_h = g * c + (1 - g) * state.
51 |         """
52 |         if self._gate_linear is None:
53 |             bias_ones = self._bias_initializer
54 |             if self._bias_initializer is None:
55 |                 bias_ones = init_ops.constant_initializer(1.0, dtype=inputs.dtype)
56 | 
57 |             with vs.variable_scope("gates"):  # Reset gate and update gate.
58 |                 self._gate_linear = _Linear(
59 |                     [inputs, state],
60 |                     2 * self._num_units,
61 |                     True,
62 |                     bias_initializer=bias_ones,
63 |                     kernel_initializer=self._kernel_initializer)
64 | 
65 |         value = math_ops.sigmoid(self._gate_linear([inputs, state]))
66 |         r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
67 | 
68 |         r_state = r * state
69 |         if self._candidate_linear is None:
70 |             with vs.variable_scope("candidate"):
71 |                 self._candidate_linear = _Linear(
72 |                     [inputs, r_state],
73 |                     self._num_units,
74 |                     True,
75 |                     bias_initializer=self._bias_initializer,
76 |                     kernel_initializer=self._kernel_initializer)
77 |         c = self._activation(self._candidate_linear([inputs, r_state]))
78 |         new_h = u * state + (1 - u) * c
79 |         return new_h, new_h
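For finishing `AttnGRUCell`, this is the DMN+ update it should implement: the attention gate g_i takes the place of the GRU update gate (a NumPy sketch with hypothetical values, not repo code):

```python
import numpy as np

g_i = 0.8                              # attention gate for fact i, from the episode module
h_prev = np.zeros(4)                   # previous hidden state h_{i-1}
h_tilde = np.tanh(np.random.rand(4))   # GRU candidate state
h_i = g_i * h_tilde + (1 - g_i) * h_prev   # h_i = g_i * h~_i + (1 - g_i) * h_{i-1}
```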
--------------------------------------------------------------------------------
/dynamic_memory_plus/encoder.py:
--------------------------------------------------------------------------------
1 | 
2 | import tensorflow as tf
3 | 
4 | 
5 | 
6 | class Encoder:
7 |     """Encoder class is a multi-layer recurrent neural network
8 | 
9 |     The 'Encoder' encodes the sequential input vectors.
10 |     """
11 | 
12 |     UNI_ENCODER_TYPE = "uni"
13 |     BI_ENCODER_TYPE = "bi"
14 | 
15 |     RNN_GRU_CELL = "gru"
16 |     RNN_LSTM_CELL = "lstm"
17 |     RNN_LAYER_NORM_LSTM_CELL = "layer_norm_lstm"
18 |     RNN_NAS_CELL = "nas"
19 | 
20 |     def __init__(self, encoder_type="uni", num_layers=4,
21 |                  cell_type="gru", num_units=512, dropout=0.8,
22 |                  dtype=tf.float32):
23 |         """Constructs an 'Encoder' instance.
24 | 
25 |         * Args:
26 |             encoder_type: rnn encoder type (uni, bi)
27 |             num_layers: number of RNN cells composed sequentially into a multi-layer cell
28 |             cell_type: RNN cell type (lstm, gru, layer_norm_lstm, nas)
29 |             num_units: the number of units in each cell
30 |             dropout: probability of dropping cell inputs (input_keep_prob = 1 - dropout)
31 |             dtype: the dtype of the input
32 | 
33 |         * Returns:
34 |             Encoder instance
35 |         """
36 | 
37 |         self.encoder_type = encoder_type.lower()   # accept 'uni' or 'UNI' from config
38 |         self.num_layers = num_layers
39 |         self.cell_type = cell_type.lower()         # accept 'gru' or 'GRU' from config
40 |         self.num_units = num_units
41 |         self.dropout = dropout
42 |         self.dtype = dtype
43 | 
44 |     def build(self, input_vector, sequence_length, scope=None):
45 |         if self.encoder_type == self.UNI_ENCODER_TYPE:
46 |             self.cells = self._create_rnn_cells()
47 | 
48 |             return self.unidirectional_rnn(input_vector, sequence_length, scope=scope)
49 |         elif self.encoder_type == self.BI_ENCODER_TYPE:
50 |             self.cells_fw = self._create_rnn_cells(is_list=True)
51 |             self.cells_bw = self._create_rnn_cells(is_list=True)
52 | 
53 |             return self.bidirectional_rnn(input_vector, sequence_length, scope=scope)
54 |         else:
55 |             raise ValueError(f"Unknown encoder_type {self.encoder_type}")
56 | 
57 |     def unidirectional_rnn(self, input_vector, sequence_length, scope=None):
58 |         return tf.nn.dynamic_rnn(
59 |             self.cells,
60 |             input_vector,
61 |             sequence_length=sequence_length,
62 |             dtype=self.dtype,
63 |             time_major=False,
64 |             swap_memory=True,
65 |             scope=scope)
66 | 
67 |     def bidirectional_rnn(self, input_vector, sequence_length, scope=None):
68 |         outputs, output_state_fw, output_state_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
69 |             self.cells_fw,
70 |             self.cells_bw,
71 |             input_vector,
72 |             sequence_length=sequence_length,
73 |             dtype=self.dtype,
74 |             scope=scope)
75 | 
76 |         encoder_final_state = tf.concat((output_state_fw[-1], output_state_bw[-1]), axis=1)
77 |         return outputs, encoder_final_state
78 | 
79 |     def _create_rnn_cells(self, is_list=False):
80 |         """Constructs stacked_rnn with num_layers
81 | 
82 |         * Args:
83 |             is_list: True returns the cells as a list (for stack bidirectional),
84 |                      False wraps them in a single MultiRNNCell
85 | 
86 |         * Returns:
87 |             stacked_rnn
88 |         """
89 | 
90 |         stacked_rnn = []
91 |         for _ in range(self.num_layers):
92 |             single_cell = self._rnn_single_cell()
93 |             stacked_rnn.append(single_cell)
94 | 
95 |         if is_list:
96 |             return stacked_rnn
97 |         else:
98 |             return tf.nn.rnn_cell.MultiRNNCell(
99 |                 cells=stacked_rnn,
100 |                 state_is_tuple=True)
101 | 
102 |     def _rnn_single_cell(self):
103 |         """Constructs a single rnn cell"""
104 | 
105 |         if self.cell_type == self.RNN_GRU_CELL:
106 |             single_cell = tf.contrib.rnn.GRUCell(
107 |                 self.num_units,
108 |                 reuse=tf.get_variable_scope().reuse)
109 |         elif self.cell_type == self.RNN_LSTM_CELL:
110 |             single_cell = tf.contrib.rnn.BasicLSTMCell(
111 |                 self.num_units,
112 |                 forget_bias=1.0,
113 |                 reuse=tf.get_variable_scope().reuse)
114 |         elif self.cell_type == self.RNN_LAYER_NORM_LSTM_CELL:
115 |             single_cell = tf.contrib.rnn.LayerNormBasicLSTMCell(
116 |                 self.num_units,
117 |                 forget_bias=1.0,
118 |                 layer_norm=True,
119 |                 reuse=tf.get_variable_scope().reuse)
120 |         elif self.cell_type == self.RNN_NAS_CELL:
121 |             single_cell = tf.contrib.rnn.NASCell(
122 |                 self.num_units)
123 |         else:
124 |             raise ValueError(f"Unknown rnn cell type. {self.cell_type}")
125 | 
126 |         if self.dropout > 0.0:
127 |             single_cell = tf.contrib.rnn.DropoutWrapper(
128 |                 cell=single_cell, input_keep_prob=(1.0 - self.dropout))
129 | 
130 |         return single_cell

--------------------------------------------------------------------------------
/dynamic_memory_plus/episode.py:
--------------------------------------------------------------------------------
1 | 
2 | import tensorflow as tf
3 | 
4 | 
5 | 
6 | class Episode:
7 | 
8 |     def __init__(self, num_units):
9 |         self.gate = AttentionGate(hidden_size=num_units)
10 |         self.attn_gru = self._build_attention_based_gru(num_units)
11 |         self.rnn = self._build_attention_based_gru(num_units)
12 | 
13 |     def _build_attention_based_gru(self, num_units):
14 |         # TODO (DMN+): return an AttnGRUCell(num_units) (see attn_gru.py)
15 |         pass
16 | 
17 |     def update(self, f, m_t, q_t):
18 |         h = tf.zeros_like(f[0])
19 | 
20 |         with tf.variable_scope('memory-update') as scope:
21 |             for fact in f:
22 |                 g = self.gate.score(tf.transpose(fact, name="f"), m_t, q_t)
23 |                 h = g * self.rnn(fact, h, scope="episode_rnn")[0] + (1 - g) * h
24 |                 scope.reuse_variables()
25 |         return h
26 | 
27 | 
28 | class AttentionGate:
29 | 
30 |     def __init__(self, hidden_size=4, reg_scale=0.001):
31 |         self.w1 = tf.get_variable(
32 |             "w1", [hidden_size, 4*hidden_size],   # z below concatenates 4 feature blocks
33 |             regularizer=tf.contrib.layers.l2_regularizer(reg_scale))
34 |         self.b1 = tf.get_variable("b1", [hidden_size, 1])
35 |         self.w2 = tf.get_variable(
36 |             "w2", [1, hidden_size],
37 |             regularizer=tf.contrib.layers.l2_regularizer(reg_scale))
38 |         self.b2 = tf.get_variable("b2", [1, 1])
39 | 
40 |     def score(self, f_t, m_t, q_t):
41 | 
42 |         with tf.variable_scope('attention_gate'):
43 |             # z_i = [f * q ; f * m ; |f - q| ; |f - m|]  (DMN+)
44 |             z = tf.concat([f_t * q_t, f_t * m_t, tf.abs(f_t - q_t), tf.abs(f_t - m_t)], axis=0)
45 | 
46 |             o1 = tf.nn.tanh(tf.matmul(self.w1, z) + self.b1)
47 |             o2 = tf.matmul(self.w2, o1) + self.b2
48 |             # TODO (DMN+): the softmax should normalize the gates over all facts of an episode
49 |             o3 = tf.nn.softmax(o2)
50 |             return tf.transpose(o3)
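`input.py` below builds the DMN+ positional encoding l_jk = (1 - j/M) - (k/K)(1 - 2j/M) over M words and K embedding dimensions; a small worked check of the formula (toy sizes, not repo code):

```python
import numpy as np

M, K = 3, 4   # 3 words per sentence, 4 embedding dims
pe = np.array([[(1 - j / M) - (k / K) * (1 - 2 * j / M)
                for k in range(1, K + 1)]
               for j in range(1, M + 1)])
print(pe.shape)   # (3, 4): row j weights word j's embedding before summing over words
```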
--------------------------------------------------------------------------------
/dynamic_memory_plus/input.py:
--------------------------------------------------------------------------------
1 | 
2 | import numpy as np
3 | import tensorflow as tf
4 | 
5 | from .encoder import Encoder
6 | 
7 | 
8 | 
9 | class TextualInput:
10 | 
11 |     def __init__(self, embed_dim, vocab_size, dtype=tf.float32):
12 |         self.embed_dim = embed_dim
13 |         self.vocab_size = vocab_size
14 |         self.dtype = dtype
15 | 
16 |     def build(self, input):
17 |         fs = self.build_sentence_reader(input)
18 |         facts = self.build_input_fusion_layer(fs)
19 |         return facts
20 | 
21 |     def build_sentence_reader(self, input):
22 | 
23 |         with tf.variable_scope("sentence_reader"):
24 |             # f_i = sum over j of (l_j * w_j^i), with l the positional encoding
25 |             num_of_words = input.get_shape().as_list()[-1]   # words per sentence (assumes a static shape)
26 |             pe = self._positional_encoding(num_of_words, self.embed_dim)
27 |             w = tf.nn.embedding_lookup(self._word_embedding(), input)
28 |             return tf.reduce_sum(pe * w, axis=-2)   # element-wise weight, then sum over words
29 | 
30 |     def _word_embedding(self, dtype=tf.float32):
31 |         return tf.get_variable("word_embedding",
32 |                                [self.vocab_size, self.embed_dim], dtype)
33 | 
34 |     def _positional_encoding(self, num_of_words, dim, dtype=tf.float32):
35 |         # l_jk = (1 - j/M) - (k/K) * (1 - 2j/M), with M = num_of_words, K = dim
36 |         M, K = num_of_words, dim
37 |         pe = np.array(
38 |             [[(1 - j / M) - (k / K) * (1 - 2 * j / M) for k in range(1, K + 1)]
39 |              for j in range(1, M + 1)])
40 |         return tf.convert_to_tensor(pe, dtype=dtype, name="positional_encoding")
41 | 
42 |     def build_input_fusion_layer(self, f):
43 |         # TODO (DMN+): fuse the sentence vectors with a bidirectional GRU over the facts
44 |         encoder = Encoder(encoder_type="bi",
45 |                           cell_type="gru",
46 |                           num_units=self.embed_dim)
47 |         facts, _ = encoder.build(f, sequence_length=None)
48 |         return facts

--------------------------------------------------------------------------------
/hook.py:
--------------------------------------------------------------------------------
1 | 
2 | from hbconfig import Config
3 | import numpy as np
4 | import tensorflow as tf
5 | 
6 | 
7 | 
8 | def print_input(variables, vocab=None, every_n_iter=100):
9 | 
10 |     return tf.train.LoggingTensorHook(
11 |         variables,
12 |         every_n_iter=every_n_iter,
13 |         formatter=format_variable(variables, vocab=vocab))
14 | 
15 | 
16 | def format_variable(keys, vocab=None):
17 |     rev_vocab = get_rev_vocab(vocab)
18 | 
19 |     def to_str(sequence):
20 |         tokens = [
21 |             rev_vocab.get(x, '') for x in sequence if x != Config.data.PAD_ID]
22 |         return ' '.join(tokens)
23 | 
24 |     def format(values):
25 |         result = []
26 |         for key in keys:
27 |             if vocab is None:
28 |                 result.append(f"{key} = {values[key]}")
29 |             else:
30 |                 result.append(f"{key} = {to_str(values[key])}")
31 | 
32 |         try:
33 |             return '\n - '.join(result)
34 |         except:
35 |             pass
36 | 
37 |     return format
38 | 
39 | 
40 | def get_rev_vocab(vocab):
41 |     if vocab is None:
42 |         return None
43 |     return {idx: key for key, idx in vocab.items()}
44 | 
45 | 
46 | def print_target(variables, every_n_iter=100):
47 | 
48 |     return tf.train.LoggingTensorHook(
49 |         variables,
50 |         every_n_iter=every_n_iter,
51 |         formatter=print_pos_or_neg(variables))
52 | 
53 | 
54 | def print_pos_or_neg(keys):
55 | 
56 |     def format(values):
57 |         result = []
58 |         for key in keys:
59 |             if type(values[key]) == np.ndarray:
60 |                 value = max(values[key])
61 |             else:
62 |                 value = values[key]
63 |             result.append(f"{key} = {value}")
64 | 
65 |         try:
66 |             return ', '.join(result)
67 |         except:
68 |             pass
69 | 
70 |     return format

--------------------------------------------------------------------------------
/images/ask_me_anything_figure_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DongjunLee/dmn-tensorflow/09796bda5f068d8e6d53cfe71da4a234e67c6a7d/images/ask_me_anything_figure_3.png

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | import argparse
4 | import logging
5 | 
6 | from hbconfig import Config
7 | import tensorflow as tf
8 | from tensorflow.python import debug as tf_debug
9 | 
10 | from data_loader import DataLoader
11 | import hook
12 | from model import
Model 13 | 14 | 15 | def experiment_fn(run_config, params): 16 | 17 | model = Model() 18 | estimator = tf.estimator.Estimator( 19 | model_fn=model.model_fn, 20 | model_dir=Config.train.model_dir, 21 | params=params, 22 | config=run_config) 23 | 24 | data_loader = DataLoader( 25 | task_path=Config.data.task_path, 26 | task_id=Config.data.task_id, 27 | task_test_id=Config.data.task_id, 28 | w2v_dim=Config.model.embed_dim, 29 | use_pretrained=Config.model.use_pretrained) 30 | 31 | data = data_loader.make_train_and_test_set() 32 | 33 | vocab = data_loader.vocab 34 | 35 | # setting data property 36 | Config.data.vocab_size = len(vocab) 37 | Config.data.max_facts_seq_len = data_loader.max_facts_seq_len 38 | Config.data.max_question_seq_len = data_loader.max_question_seq_len 39 | Config.data.max_input_mask_length = data_loader.max_input_mask_len 40 | print("max_facts_seq_len:", data_loader.max_facts_seq_len) 41 | print("max_question_seq_len:", data_loader.max_question_seq_len) 42 | print("max_input_mask_length:", data_loader.max_input_mask_len) 43 | 44 | train_input_fn, train_input_hook = data_loader.make_batch( 45 | data["train"], batch_size=Config.model.batch_size, scope="train") 46 | test_input_fn, test_input_hook = data_loader.make_batch( 47 | data["test"], batch_size=Config.model.batch_size, scope="test") 48 | 49 | train_hooks = [train_input_hook] 50 | if Config.train.print_verbose: 51 | pass 52 | if Config.train.debug: 53 | train_hooks.append(tf_debug.LocalCLIDebugHook()) 54 | 55 | eval_hooks = [test_input_hook] 56 | if Config.train.debug: 57 | eval_hooks.append(tf_debug.LocalCLIDebugHook()) 58 | 59 | experiment = tf.contrib.learn.Experiment( 60 | estimator=estimator, 61 | train_input_fn=train_input_fn, 62 | eval_input_fn=test_input_fn, 63 | train_steps=Config.train.train_steps, 64 | min_eval_frequency=Config.train.min_eval_frequency, 65 | train_monitors=train_hooks, 66 | eval_hooks=eval_hooks 67 | ) 68 | return experiment 69 | 70 | 71 | def main(mode): 72 | params = tf.contrib.training.HParams(**Config.model.to_dict()) 73 | 74 | run_config = tf.contrib.learn.RunConfig( 75 | model_dir=Config.train.model_dir, 76 | save_checkpoints_steps=Config.train.save_checkpoints_steps) 77 | 78 | tf.contrib.learn.learn_runner.run( 79 | experiment_fn=experiment_fn, 80 | run_config=run_config, 81 | schedule=mode, 82 | hparams=params 83 | ) 84 | 85 | 86 | if __name__ == '__main__': 87 | 88 | parser = argparse.ArgumentParser( 89 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 90 | parser.add_argument('--config', type=str, default='config', 91 | help='config file name') 92 | parser.add_argument('--mode', type=str, default='train', 93 | help='Mode (train/test/train_and_evaluate)') 94 | args = parser.parse_args() 95 | 96 | tf.logging.set_verbosity(logging.INFO) 97 | 98 | Config(args.config) 99 | print("Config: ", Config) 100 | if Config.description: 101 | print("Config Description") 102 | for key, value in Config.description.items(): 103 | print(f" - {key}: {value}") 104 | 105 | main(args.mode) 106 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | 4 | from hbconfig import Config 5 | import tensorflow as tf 6 | 7 | import dynamic_memory 8 | 9 | 10 | 11 | class Model: 12 | 13 | def __init__(self): 14 | pass 15 | 16 | def model_fn(self, mode, features, labels, params): 17 | self.dtype = tf.float32 18 | 19 | self.mode = mode 20 | 
self.params = params 21 | 22 | self.loss, self.train_op, self.eval_metric_ops, self.predictions = None, None, None, None 23 | self._init_placeholder(features, labels) 24 | self.build_graph() 25 | 26 | # train mode: required loss and train_op 27 | # eval mode: required loss 28 | # predict mode: required predictions 29 | 30 | return tf.estimator.EstimatorSpec( 31 | mode=mode, 32 | predictions=self.predictions, 33 | loss=self.loss, 34 | train_op=self.train_op, 35 | eval_metric_ops=self._build_metric() 36 | ) 37 | 38 | def _init_placeholder(self, features, labels): 39 | self.input_data = features 40 | if type(features) == dict: 41 | self.embedding_input = features["input_data"] 42 | self.input_mask = features["input_data_mask"] 43 | self.embedding_question = features["question_data"] 44 | 45 | self.targets = labels 46 | 47 | def build_graph(self): 48 | graph = dynamic_memory.Graph(self.mode) 49 | output = graph.build(embedding_input=self.embedding_input, 50 | input_mask = self.input_mask, 51 | embedding_question=self.embedding_question) 52 | self.predictions = tf.argmax(output, axis=1) 53 | 54 | self._build_loss(output) 55 | self._build_optimizer() 56 | 57 | def _build_loss(self, output): 58 | with tf.variable_scope('loss'): 59 | cross_entropy = tf.losses.sparse_softmax_cross_entropy( 60 | self.targets, 61 | output, 62 | scope="cross-entropy") 63 | reg_term = tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)) 64 | 65 | self.loss = tf.add(cross_entropy, reg_term) 66 | 67 | def _build_optimizer(self): 68 | self.train_op = tf.contrib.layers.optimize_loss( 69 | self.loss, tf.train.get_global_step(), 70 | optimizer=Config.train.get('optimizer', 'Adam'), 71 | learning_rate=Config.train.learning_rate, 72 | summaries=['loss', 'gradients', 'learning_rate'], 73 | name="train_op") 74 | 75 | def _build_metric(self): 76 | return { 77 | "accuracy": tf.metrics.accuracy(self.targets, self.predictions) 78 | } 79 | -------------------------------------------------------------------------------- /notebooks/data_loader.py.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 44, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "\"\"\"\n", 10 | "bAbi data_loader\n", 11 | "Original code : https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano/blob/master/utils.py\n", 12 | "\"\"\"\n", 13 | "\n", 14 | "import os as os\n", 15 | "import numpy as np\n", 16 | "from tqdm import tqdm\n", 17 | "\n", 18 | "\n", 19 | "\n", 20 | "class DataLoader:\n", 21 | "\n", 22 | " def __init__(self, task_id, task_test_id, w2v_dim=100, input_mask_mode=\"sentence\", use_pretrained=True):\n", 23 | " self.base_path = os.path.join(\"data/\")\n", 24 | "\n", 25 | " self.task_id = str(task_id)\n", 26 | " self.task_test_id = str(task_test_id)\n", 27 | " self.w2v_dim = w2v_dim\n", 28 | " self.input_mask_mode = input_mask_mode\n", 29 | " self.use_pretrained = use_pretrained\n", 30 | "\n", 31 | " def make_train_and_test_set(self):\n", 32 | " train_raw, test_raw = self.get_babi_raw(self.task_id, self.task_test_id)\n", 33 | " self.max_facts_seq_len, self.max_question_seq_len, self.max_input_mask_len = self.get_max_seq_length(train_raw, test_raw)\n", 34 | " \n", 35 | " if self.use_pretrained:\n", 36 | " self.word2vec = self.load_glove(self.w2v_dim)\n", 37 | " else:\n", 38 | " self.word2vec = {}\n", 39 | " self.vocab = {}\n", 40 | " self.ivocab = {}\n", 41 | " \n", 42 | " self.create_vector(\"unknown\")\n", 43 | 
"\n", 44 | " train_input, train_question, train_answer, train_input_mask = self.process_input(train_raw)\n", 45 | " test_input, test_question, test_answer, test_input_mask = self.process_input(test_raw)\n", 46 | "\n", 47 | " return {\n", 48 | " \"train\": (train_input, train_input_mask, train_question, train_answer),\n", 49 | " \"test\": (test_input, test_input_mask, test_question, test_answer)\n", 50 | " }\n", 51 | " \n", 52 | " def get_max_seq_length(self, *datasets):\n", 53 | " max_facts_length, max_question_length, max_input_mask_length = 0, 0, 0\n", 54 | " \n", 55 | " def count_punctuation(facts):\n", 56 | " return len(list(filter(lambda x: x == \".\", facts)))\n", 57 | " \n", 58 | " for dataset in datasets:\n", 59 | " for d in dataset:\n", 60 | " max_facts_length = max(max_facts_length, len(d['C'].split()))\n", 61 | " max_input_mask_length = max(max_input_mask_length, count_punctuation(d['C']))\n", 62 | " max_question_length = max(max_question_length, len(d['Q'].split()))\n", 63 | " return max_facts_length, max_question_length, max_input_mask_length\n", 64 | "\n", 65 | " def init_babi(self, fname):\n", 66 | " print(\"==> Loading test from %s\" % fname)\n", 67 | " tasks = []\n", 68 | " task = None\n", 69 | " for i, line in enumerate(open(fname)):\n", 70 | " id = int(line[0:line.find(' ')])\n", 71 | " if id == 1:\n", 72 | " task = {\"C\": \"\", \"Q\": \"\", \"A\": \"\"}\n", 73 | "\n", 74 | " line = line.strip()\n", 75 | " line = line.replace('.', ' . ')\n", 76 | " line = line[line.find(' ')+1:]\n", 77 | " if line.find('?') == -1:\n", 78 | " task[\"C\"] += line\n", 79 | " else:\n", 80 | " idx = line.find('?')\n", 81 | " tmp = line[idx+1:].split('\\t')\n", 82 | " task[\"Q\"] = line[:idx]\n", 83 | " task[\"A\"] = tmp[1].strip()\n", 84 | " tasks.append(task.copy())\n", 85 | "\n", 86 | " return tasks\n", 87 | "\n", 88 | "\n", 89 | " def get_babi_raw(self, id, test_id):\n", 90 | " babi_map = {\n", 91 | " \"1\": \"qa1_single-supporting-fact\",\n", 92 | " \"2\": \"qa2_two-supporting-facts\",\n", 93 | " \"3\": \"qa3_three-supporting-facts\",\n", 94 | " \"4\": \"qa4_two-arg-relations\",\n", 95 | " \"5\": \"qa5_three-arg-relations\",\n", 96 | " \"6\": \"qa6_yes-no-questions\",\n", 97 | " \"7\": \"qa7_counting\",\n", 98 | " \"8\": \"qa8_lists-sets\",\n", 99 | " \"9\": \"qa9_simple-negation\",\n", 100 | " \"10\": \"qa10_indefinite-knowledge\",\n", 101 | " \"11\": \"qa11_basic-coreference\",\n", 102 | " \"12\": \"qa12_conjunction\",\n", 103 | " \"13\": \"qa13_compound-coreference\",\n", 104 | " \"14\": \"qa14_time-reasoning\",\n", 105 | " \"15\": \"qa15_basic-deduction\",\n", 106 | " \"16\": \"qa16_basic-induction\",\n", 107 | " \"17\": \"qa17_positional-reasoning\",\n", 108 | " \"18\": \"qa18_size-reasoning\",\n", 109 | " \"19\": \"qa19_path-finding\",\n", 110 | " \"20\": \"qa20_agents-motivations\",\n", 111 | " \"MCTest\": \"MCTest\",\n", 112 | " \"19changed\": \"19changed\",\n", 113 | " \"joint\": \"all_shuffled\",\n", 114 | " \"sh1\": \"../shuffled/qa1_single-supporting-fact\",\n", 115 | " \"sh2\": \"../shuffled/qa2_two-supporting-facts\",\n", 116 | " \"sh3\": \"../shuffled/qa3_three-supporting-facts\",\n", 117 | " \"sh4\": \"../shuffled/qa4_two-arg-relations\",\n", 118 | " \"sh5\": \"../shuffled/qa5_three-arg-relations\",\n", 119 | " \"sh6\": \"../shuffled/qa6_yes-no-questions\",\n", 120 | " \"sh7\": \"../shuffled/qa7_counting\",\n", 121 | " \"sh8\": \"../shuffled/qa8_lists-sets\",\n", 122 | " \"sh9\": \"../shuffled/qa9_simple-negation\",\n", 123 | " \"sh10\": 
\"../shuffled/qa10_indefinite-knowledge\",\n", 124 | " \"sh11\": \"../shuffled/qa11_basic-coreference\",\n", 125 | " \"sh12\": \"../shuffled/qa12_conjunction\",\n", 126 | " \"sh13\": \"../shuffled/qa13_compound-coreference\",\n", 127 | " \"sh14\": \"../shuffled/qa14_time-reasoning\",\n", 128 | " \"sh15\": \"../shuffled/qa15_basic-deduction\",\n", 129 | " \"sh16\": \"../shuffled/qa16_basic-induction\",\n", 130 | " \"sh17\": \"../shuffled/qa17_positional-reasoning\",\n", 131 | " \"sh18\": \"../shuffled/qa18_size-reasoning\",\n", 132 | " \"sh19\": \"../shuffled/qa19_path-finding\",\n", 133 | " \"sh20\": \"../shuffled/qa20_agents-motivations\",\n", 134 | " }\n", 135 | " if (test_id == \"\"):\n", 136 | " test_id = id\n", 137 | " babi_name = babi_map[id]\n", 138 | " babi_test_name = babi_map[test_id]\n", 139 | " babi_train_raw = self.init_babi(os.path.join(self.base_path, 'en-10/%s_train.txt' % babi_name))\n", 140 | " babi_test_raw = self.init_babi(os.path.join(self.base_path, 'en-10/%s_test.txt' % babi_test_name))\n", 141 | " return babi_train_raw, babi_test_raw\n", 142 | "\n", 143 | " def load_glove(self, dim):\n", 144 | " word2vec = {}\n", 145 | "\n", 146 | " print(\"==> loading glove\")\n", 147 | " with open(os.path.join(self.base_path, \"glove/glove.6B.\" + str(dim) + \"d.txt\")) as f:\n", 148 | " for line in tqdm(f):\n", 149 | " l = line.split()\n", 150 | " word2vec[l[0]] = l[1:]\n", 151 | "\n", 152 | " print(\"==> glove is loaded\")\n", 153 | "\n", 154 | " return word2vec\n", 155 | "\n", 156 | " def create_vector(self, word, silent=False):\n", 157 | " # if the word is missing from Glove, create some fake vector and store in glove!\n", 158 | " vector = np.random.uniform(0.0, 1.0, (self.w2v_dim,))\n", 159 | " self.word2vec[word] = vector\n", 160 | " if (not silent):\n", 161 | " print(\"data_loader.py::create_vector => %s is missing\" % word)\n", 162 | " return vector\n", 163 | "\n", 164 | " def process_word(self, word, to_return=\"word2vec\", silent=False):\n", 165 | " if not word in self.word2vec:\n", 166 | " self.create_vector(word, silent=silent)\n", 167 | " if not word in self.vocab:\n", 168 | " next_index = len(self.vocab)\n", 169 | " self.vocab[word] = next_index\n", 170 | " self.ivocab[next_index] = word\n", 171 | "\n", 172 | " if to_return == \"word2vec\":\n", 173 | " return self.word2vec[word]\n", 174 | " elif to_return == \"index\":\n", 175 | " return self.vocab[word]\n", 176 | " else:\n", 177 | " raise ValueError(\"return type is 'word2vec' or 'index'\")\n", 178 | "\n", 179 | " def get_norm(self, x):\n", 180 | " x = np.array(x)\n", 181 | " return np.sum(x * x)\n", 182 | "\n", 183 | " def process_input(self, data_raw):\n", 184 | " questions = []\n", 185 | " inputs = []\n", 186 | " answers = []\n", 187 | " input_masks = []\n", 188 | " \n", 189 | " for x in data_raw:\n", 190 | " inp = x[\"C\"].lower().split(' ')\n", 191 | " inp = [w for w in inp if len(w) > 0]\n", 192 | " \n", 193 | " q = x[\"Q\"].lower().split(' ')\n", 194 | " q = [w for w in q if len(w) > 0]\n", 195 | "\n", 196 | " inp_vector = [self.process_word(word=w, to_return=\"word2vec\") for w in inp]\n", 197 | " inp_vector = self.pad_input(inp_vector, self.max_facts_seq_len, [np.zeros(self.w2v_dim)])\n", 198 | " \n", 199 | " q_vector = [self.process_word(word=w, to_return=\"word2vec\") for w in q]\n", 200 | " q_vector = self.pad_input(q_vector, self.max_question_seq_len, [np.zeros(self.w2v_dim)])\n", 201 | " \n", 202 | " inputs.append(np.vstack(inp_vector).astype(float)) \n", 203 | " 
questions.append(np.vstack(q_vector).astype(float))\n", 204 | " answers.append(self.process_word(word=x[\"A\"], to_return=\"index\"))\n", 205 | "\n", 206 | " if self.input_mask_mode == 'word':\n", 207 | " input_masks.append(np.array([index for index, w in enumerate(inp)], dtype=np.int32))\n", 208 | " elif self.input_mask_mode == 'sentence':\n", 209 | " input_mask = [index for index, w in enumerate(inp) if w == '.']\n", 210 | " input_mask = self.pad_input(input_mask, self.max_input_mask_len, [0])\n", 211 | " input_masks.append(input_mask)\n", 212 | " else:\n", 213 | " raise ValueError(\"input_mask_mode must be 'word' or 'sentence'\")\n", 214 | " \n", 215 | " return (np.array(inputs, dtype=np.float32), \n", 216 | " np.array(questions, dtype=np.float32),\n", 217 | " np.array(answers, dtype=np.int32).reshape(-1, 1), \n", 218 | " np.array(input_masks, dtype=np.int32))\n", 219 | " \n", 220 | " def pad_input(self, input_, size, pad_item):\n", 221 | " return input_ + pad_item * (size - len(input_))" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 45, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "name": "stdout", 231 | "output_type": "stream", 232 | "text": [ 233 | "==> Loading data from data/en-10/qa1_single-supporting-fact_train.txt\n", 234 | "==> Loading data from data/en-10/qa1_single-supporting-fact_test.txt\n", 235 | "data_loader.py::create_vector => unknown is missing\n", 236 | "data_loader.py::create_vector => mary is missing\n", 237 | "data_loader.py::create_vector => moved is missing\n", 238 | "data_loader.py::create_vector => to is missing\n", 239 | "data_loader.py::create_vector => the is missing\n", 240 | "data_loader.py::create_vector => bathroom is missing\n", 241 | "data_loader.py::create_vector => . is missing\n", 242 | "data_loader.py::create_vector => john is missing\n", 243 | "data_loader.py::create_vector => went is missing\n", 244 | "data_loader.py::create_vector => hallway is missing\n", 245 | "data_loader.py::create_vector => where is missing\n", 246 | "data_loader.py::create_vector => is is missing\n", 247 | "data_loader.py::create_vector => daniel is missing\n", 248 | "data_loader.py::create_vector => back is missing\n", 249 | "data_loader.py::create_vector => sandra is missing\n", 250 | "data_loader.py::create_vector => garden is missing\n", 251 | "data_loader.py::create_vector => office is missing\n", 252 | "data_loader.py::create_vector => journeyed is missing\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "data_loader = DataLoader(task_id=\"1\", task_test_id=\"1\", w2v_dim=50, use_pretrained=False)\n", 258 | "data = data_loader.make_train_and_test_set()" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 46, 264 | "metadata": {}, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "==> Loading data from data/en-10/qa1_single-supporting-fact_train.txt\n", 271 | "==> Loading data from data/en-10/qa1_single-supporting-fact_test.txt\n" 272 | ] 273 | } 274 | ], 275 | "source": [ 276 | "train_raw, test_raw = data_loader.get_babi_raw(\"1\", \"1\")" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 47, 282 | "metadata": {}, 283 | "outputs": [ 284 | { 285 | "data": { 286 | "text/plain": [ 287 | "[{'A': 'bathroom',\n", 288 | " 'C': 'Mary moved to the bathroom . John went to the hallway . ',\n", 289 | " 'Q': 'Where is Mary'},\n", 290 | " {'A': 'hallway',\n", 291 | " 'C': 'Mary moved to the bathroom . 
John went to the hallway . Daniel went back to the hallway . Sandra moved to the garden . ',\n", 292 | " 'Q': 'Where is Daniel'},\n", 293 | " {'A': 'hallway',\n", 294 | " 'C': 'Mary moved to the bathroom . John went to the hallway . Daniel went back to the hallway . Sandra moved to the garden . John moved to the office . Sandra journeyed to the bathroom . ',\n", 295 | " 'Q': 'Where is Daniel'}]" 296 | ] 297 | }, 298 | "execution_count": 47, 299 | "metadata": {}, 300 | "output_type": "execute_result" 301 | } 302 | ], 303 | "source": [ 304 | "train_raw" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 48, 310 | "metadata": {}, 311 | "outputs": [ 312 | { 313 | "name": "stdout", 314 | "output_type": "stream", 315 | "text": [ 316 | "(3, 37, 50) (3, 6) (3, 3, 50) (3, 1)\n" 317 | ] 318 | } 319 | ], 320 | "source": [ 321 | "train_input, train_input_mask, train_question, train_answer = data[\"train\"]\n", 322 | "print(train_input.shape, train_input_mask.shape, train_question.shape, train_answer.shape)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 49, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "data": { 332 | "text/plain": [ 333 | "(3, 37, 50)" 334 | ] 335 | }, 336 | "execution_count": 49, 337 | "metadata": {}, 338 | "output_type": "execute_result" 339 | } 340 | ], 341 | "source": [ 342 | "train_input.shape" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 50, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "data": { 352 | "text/plain": [ 353 | "(37, 50)" 354 | ] 355 | }, 356 | "execution_count": 50, 357 | "metadata": {}, 358 | "output_type": "execute_result" 359 | } 360 | ], 361 | "source": [ 362 | "train_input[0].shape" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 51, 368 | "metadata": {}, 369 | "outputs": [ 370 | { 371 | "data": { 372 | "text/plain": [ 373 | "array([ 5, 11, 0, 0, 0, 0], dtype=int32)" 374 | ] 375 | }, 376 | "execution_count": 51, 377 | "metadata": {}, 378 | "output_type": "execute_result" 379 | } 380 | ], 381 | "source": [ 382 | "train_input_mask[0]" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 52, 388 | "metadata": {}, 389 | "outputs": [ 390 | { 391 | "data": { 392 | "text/plain": [ 393 | "(3, 50)" 394 | ] 395 | }, 396 | "execution_count": 52, 397 | "metadata": {}, 398 | "output_type": "execute_result" 399 | } 400 | ], 401 | "source": [ 402 | "train_question[0].shape" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 53, 408 | "metadata": {}, 409 | "outputs": [ 410 | { 411 | "data": { 412 | "text/plain": [ 413 | "array([4], dtype=int32)" 414 | ] 415 | }, 416 | "execution_count": 53, 417 | "metadata": {}, 418 | "output_type": "execute_result" 419 | } 420 | ], 421 | "source": [ 422 | "train_answer[0]" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 54, 428 | "metadata": {}, 429 | "outputs": [ 430 | { 431 | "data": { 432 | "text/plain": [ 433 | "37" 434 | ] 435 | }, 436 | "execution_count": 54, 437 | "metadata": {}, 438 | "output_type": "execute_result" 439 | } 440 | ], 441 | "source": [ 442 | "data_loader.max_facts_seq_len" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 49, 448 | "metadata": {}, 449 | "outputs": [ 450 | { 451 | "data": { 452 | "text/plain": [ 453 | "3" 454 | ] 455 | }, 456 | "execution_count": 49, 457 | "metadata": {}, 458 | "output_type": "execute_result" 459 | } 460 | ], 461 | "source": [ 462 | 
"data_loader.max_question_seq_len" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [] 478 | } 479 | ], 480 | "metadata": { 481 | "kernelspec": { 482 | "display_name": "Python 3.6 (NLP)", 483 | "language": "python", 484 | "name": "nlp" 485 | }, 486 | "language_info": { 487 | "codemirror_mode": { 488 | "name": "ipython", 489 | "version": 3 490 | }, 491 | "file_extension": ".py", 492 | "mimetype": "text/x-python", 493 | "name": "python", 494 | "nbconvert_exporter": "python", 495 | "pygments_lexer": "ipython3", 496 | "version": "3.6.1" 497 | } 498 | }, 499 | "nbformat": 4, 500 | "nbformat_minor": 2 501 | } 502 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | hb-config 2 | tqdm -------------------------------------------------------------------------------- /scripts/fetch_babi_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | url=http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz 4 | fname=`basename $url` 5 | 6 | curl -SLO $url 7 | tar zxvf $fname 8 | mkdir -p data 9 | mv tasks_1-20_v1-2/* data/ 10 | 11 | rm -r tasks_1-20_v1-2 12 | rm tasks_1-20_v1-2.tar.gz 13 | -------------------------------------------------------------------------------- /scripts/fetch_glove_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | url=http://nlp.stanford.edu/data/glove.6B.zip 4 | fname=`basename $url` 5 | 6 | curl -SLO $url 7 | mkdir -p data 8 | unzip $fname -d data/glove/ 9 | --------------------------------------------------------------------------------