├── README.md
├── fair
│   ├── LICENSE
│   ├── README.md
│   ├── main.lua
│   ├── model.lua
│   └── preprocess.lua
├── stanford
│   ├── LICENSE
│   ├── README.md
│   ├── main.lua
│   ├── model.lua
│   ├── model_nngraph.lua
│   ├── preprocess.lua
│   └── utils.lua
└── watson
    ├── LICENSE
    ├── README.md
    ├── main.lua
    ├── model.lua
    ├── model_nngraph.lua
    ├── preprocess.lua
    └── utils.lua

/README.md:
--------------------------------------------------------------------------------
1 | # Teaching Machines to Read and Comprehend CNN News and Children's Books using Torch
2 | 
3 | This repository hosts self-contained implementations of state-of-the-art models for the machine reading comprehension task.
4 | 
5 | | Folder | Reference |
6 | |---|---|
7 | | [watson/](https://github.com/ganeshjawahar/torch-teacher/tree/master/watson) | **Text Understanding with the Attention Sum Reader Network**, *Kadlec et al.*, *ACL 2016*. |
8 | | [stanford/](https://github.com/ganeshjawahar/torch-teacher/tree/master/stanford) | **A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task**, *Chen et al.*, *ACL 2016*. |
9 | | [fair/](https://github.com/ganeshjawahar/torch-teacher/tree/master/fair) | **The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations**, *Hill et al.*, *ICLR 2016*. |
10 | 
11 | ### Benchmarking Training Time
12 | 
13 | | time per batch (time per epoch) | [watson/](https://github.com/ganeshjawahar/torch-teacher/tree/master/watson) | [stanford/](https://github.com/ganeshjawahar/torch-teacher/tree/master/stanford) | [fair/](https://github.com/ganeshjawahar/torch-teacher/tree/master/fair) |
14 | |---|:---:|:---:|:---:|
15 | |GPU\Batch Size|32|32|1|
16 | |[K40](http://www.nvidia.com/object/tesla-servers.html)|`806 ms` (`46m 16s`)|`800 ms` (`2h 40m`)|`18ms` (`34m 8s`)|
17 | |[Titan X](http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications)|`746 ms` (`42m 38s`)|-|`13ms` (`24m 45s`)|
18 | |[1080](http://www.geforce.com/hardware/10series/geforce-gtx-1080)|`889 ms` (`51m 8s`)|-|`13ms` (`25m 29s`)|
19 | 
20 | ### Acknowledgements
21 | This repository would not have been possible without the efforts of the maintainers of the following libraries:
22 | * [Element-Research/rnn](https://github.com/Element-Research/rnn)
23 | * [MemNN](https://github.com/facebook/MemNN)
24 | * [Torch](https://github.com/torch) (Of course!)
25 | 
26 | ### Author
27 | [Ganesh J](https://researchweb.iiit.ac.in/~ganesh.j/)
28 | 
29 | ### Licence
30 | MIT
31 | 
--------------------------------------------------------------------------------
/fair/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 | 
10 | The above copyright notice and this permission notice shall be included in
11 | all copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19 | THE SOFTWARE.
20 | 
--------------------------------------------------------------------------------
/fair/README.md:
--------------------------------------------------------------------------------
1 | ## The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations (ICLR 2016)
2 | 
3 | [Torch](http://torch.ch) implementation of the model 'MemNNs (Window Memory + Self-Supervision)' proposed in [Hill et al.](http://arxiv.org/abs/1511.02301)'s work.
4 | 
5 | ### Features
6 | 
7 | * Train the model with the benchmarked corpus, [Children's Book Test](https://research.facebook.com/research/babi/) (CBT), out of the box.
8 | * Support for choosing the window composition, viz., summation or concatenation of the word vectors within the window.
9 | * Support for tuning other hyper-parameters of the model reported in the paper.
10 | 
11 | ### Quick Start
12 | 
13 | Download and extract the `CBTest.tgz` file from [FB Research](https://research.facebook.com/research/babi/)'s page.
14 | 
15 | To generate and save the data tensors (objects readable by Torch),
16 | 
17 | ```
18 | th preprocess.lua -data CBTest/data/
19 | ```
20 | 
21 | where the data value `CBTest/data/` points to the extracted directory containing the training, validation and testing files for all 4 prediction tasks, viz., `Named Entity` (NE), `Common Noun` (CN), `Preposition` (P) and `Verb` (V).
22 | 
23 | To kick-start the training,
24 | 
25 | ```
26 | th main.lua
27 | ```
28 | 
29 | To list the hyper-parameters relevant to both steps,
30 | 
31 | ```
32 | th preprocess.lua --help
33 | th main.lua --help
34 | ```
35 | 
36 | ### Training options
37 | 
38 | #### `preprocess.lua`
39 | 
40 | * `word_type`: class type of the prediction word. Specify `NE` for Named Entity, `CN` for Common Noun, `P` for Preposition and `V` for Verb
41 | * `data`: path to the data folder containing train, validation and test records
42 | * `b`: size of the window memory (same symbol used in the paper)
43 | * `out`: output file name for the tensors to be saved
44 | 
45 | #### `main.lua`
46 | 
47 | * `input`: input file name for the saved tensors
48 | * `seed`: seed value for the random generator
49 | * `p`: dimensionality of word embeddings (same symbol used in the paper)
50 | * `num_epochs`: number of full passes through the training data
51 | * `lr`: learning rate (note: we are currently waiting for the first author to confirm the optimal learning-rate decay used in the experiments)
52 | * `window_compo`: how to compose the window representation from the word vectors: `sum` or concatenation
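For example, an end-to-end run on the Named Entity task could look like the following (the flag values shown are simply the defaults listed above; adjust the paths to your setup):

```
th preprocess.lua -data CBTest/data/ -word_type NE -b 5 -out dataset.t7
th main.lua -input dataset.t7 -p 300 -num_epochs 5 -lr 0.01 -window_compo sum
```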
53 | 54 | ### Torch Dependencies 55 | * nn 56 | * cunn 57 | * cutorch 58 | * xlua 59 | * tds 60 | 61 | ### Author 62 | [Ganesh J](https://researchweb.iiit.ac.in/~ganesh.j/) 63 | 64 | ### Licence 65 | MIT 66 | -------------------------------------------------------------------------------- /fair/main.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Torch Implementation of the paper 'The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations' 4 | 5 | Precisely we attempt to code for the model 'MemNNs (Window Memory + Self-Supervision)' mentioned in the paper 6 | 7 | ]]-- 8 | 9 | require 'torch' 10 | require 'cutorch' 11 | require 'nn' 12 | require 'cunn' 13 | require 'xlua' 14 | require 'sys' 15 | tds = require('tds') 16 | paths.dofile('model.lua') 17 | 18 | cmd = torch.CmdLine() 19 | cmd:option('-input', 'dataset.t7', [[data tensors input file name]]) 20 | cmd:option('-seed', 123, [[seed for the random generator]]) 21 | cmd:option('-p', 300, [[dimensionality of word embeddings]]) 22 | cmd:option('-num_epochs', 5, [[number of full passes through the training data]]) 23 | cmd:option('-lr', 0.01, [[sgd learning rate]]) 24 | cmd:option('-window_compo', 'sum', [[how to compose the window rep. from the word vectors? 25 | sum or concatenation?]]) 26 | params = cmd:parse(arg) 27 | 28 | torch.manualSeed(params.seed) 29 | 30 | -- load the dataset 31 | print('loading the dataset...') 32 | params.dataset = torch.load(params.input) 33 | params.b = (#(params.dataset.train_tensors[1][1])[1])[1] 34 | 35 | -- initiate the model & criterion 36 | print('initializing the model...') 37 | local record = params.dataset.train_tensors[1] 38 | params.model, params.criterion = get_model(params) 39 | 40 | -- train & evaluate the model 41 | print('training...') 42 | local best_dev_acc, best_test_acc, best_acc_epoch, train_start = -1, -1, -1, sys.clock() 43 | function compute_accuracy(model, data) 44 | model:evaluate() 45 | local correct, total, softmax, center = 0, 0, nn.SoftMax():cuda(), math.ceil(params.b / 2) 46 | for i = 1, #data do 47 | local record = data[i] 48 | local pred = model:forward({record[1], record[2]}) 49 | local soft_pred = softmax:forward(pred:t()) 50 | local _, max_id = soft_pred:max(2) 51 | if record[1][max_id[1][1]][center] == record[3] then 52 | correct = correct + 1 53 | end 54 | total = total + 1 55 | end 56 | return correct/total 57 | end 58 | for epoch = 1, params.num_epochs do 59 | local epoch_start, epoch_loss, epoch_iterations, indices = sys.clock(), 0, 0, torch.randperm(#params.dataset.train_tensors) 60 | params.model:training() 61 | xlua.progress(1, #params.dataset.train_tensors) 62 | for rec_id = 1, #params.dataset.train_tensors do 63 | local record = params.dataset.train_tensors[indices[rec_id]] 64 | local out = params.model:forward({record[1], record[2]}) 65 | local _, m_bar = out:max(1) 66 | local m_o1 = nil 67 | if #record[4] == 1 then 68 | m_o1 = record[4][1] 69 | else 70 | local max_id = -1 71 | for mem_i = 1, #record[4] do 72 | if max_id == -1 or out[record[4][mem_i]][1] > out[record[4][max_id]][1] then 73 | max_id = mem_i 74 | end 75 | end 76 | m_o1 = record[4][max_id] 77 | end 78 | if m_o1 ~= m_bar[1] then 79 | -- update the model 80 | local example_loss = params.criterion:forward(out, m_o1) 81 | epoch_loss = epoch_loss + example_loss 82 | epoch_iterations = epoch_iterations + 1 83 | local mem_grads = params.criterion:backward(out, m_o1) 84 | params.model:zeroGradParameters() 85 | 
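-- Self-supervision step: the target m_o1 is the answer-bearing window that the current
-- model scores highest; this gradient update is applied only when the model's overall
-- top-scoring memory (m_bar) is not already that window.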
params.model:backward({record[1], record[2]}, mem_grads) 86 | params.model:updateParameters(params.lr) 87 | end 88 | if rec_id % 5 == 0 then xlua.progress(rec_id, #params.dataset.train_tensors) end 89 | end 90 | xlua.progress(#params.dataset.train_tensors, #params.dataset.train_tensors) 91 | -- update the best performing model so far 92 | local cur_dev_acc = compute_accuracy(params.model, params.dataset.val_tensors) 93 | if best_dev_acc < cur_dev_acc then 94 | best_dev_acc = cur_dev_acc 95 | best_acc_epoch = epoch 96 | best_test_acc = compute_accuracy(params.model, params.dataset.test_tensors) 97 | end 98 | print(string.format('epoch (%d/%d) loss = %.2f; best dev. acc = %.2f; best test. acc = %.2f (%d); update ratio = %.2f; time = %.2f mins;', 99 | epoch, params.num_epochs, (epoch_loss / epoch_iterations), best_dev_acc, best_test_acc, 100 | best_acc_epoch, (epoch_iterations / #params.dataset.train_tensors), ((sys.clock() - epoch_start)/60))) 101 | end 102 | print(string.format('final accuracy = %.2f (%d); time = %.2f mins;', 103 | best_test_acc, best_acc_epoch, ((sys.clock() - train_start)/60))) 104 | -------------------------------------------------------------------------------- /fair/model.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Model implementation 4 | 5 | ]]-- 6 | 7 | function get_model(params) 8 | -- get the representations for supporting memories 9 | local support_dict_master = nn.ParallelTable() -- one dictionary per window position 10 | for i = 1, params.b do 11 | support_dict_master:add(nn.LookupTable(#params.dataset.index2word, params.p)) 12 | end 13 | local support_mem_model = nn.Sequential():add(nn.Identity()):add(nn.SplitTable(1, 1)) 14 | :add(support_dict_master) 15 | if params.window_compo == 'sum' then 16 | support_mem_model:add(nn.CAddTable()) 17 | else 18 | support_mem_model:add(nn.JoinTable(1, 1)) 19 | end 20 | 21 | -- get the representation for query 22 | local query_dict_master = nn.ParallelTable() 23 | for i = 1, params.b do 24 | query_dict_master:add(nn.LookupTable(#params.dataset.index2word, params.p)) 25 | end 26 | local query_mem_model = nn.Sequential():add(nn.Identity()):add(nn.SplitTable(1, 1)) 27 | :add(query_dict_master) 28 | if params.window_compo == 'sum' then 29 | query_mem_model:add(nn.CAddTable()) 30 | else 31 | query_mem_model:add(nn.JoinTable(1, 1)) 32 | end 33 | 34 | -- build the final scoring model 35 | local model = nn.Sequential() 36 | model:add(nn.ParallelTable()) 37 | model.modules[1]:add(support_mem_model) 38 | model.modules[1]:add(query_mem_model) 39 | model:add(nn.MM(false, true)) 40 | 41 | -- ship it to gpu 42 | model = model:cuda() 43 | 44 | -- IMPORTANT! 
do weight sharing after model is in cuda 45 | for i = 1, params.b do 46 | query_dict_master.modules[i]:share(support_dict_master.modules[i], 'weight', 'bias', 'gradWeight', 'gradBias') 47 | end 48 | 49 | return model, nn.CrossEntropyCriterion():cuda() 50 | end 51 | -------------------------------------------------------------------------------- /fair/preprocess.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Pre-processing scripts which 4 | (i) creates the data tensors (to be used by main.lua) 5 | (ii) saves them to t7 file 6 | 7 | ]]-- 8 | 9 | require 'torch' 10 | require 'cutorch' 11 | require 'io' 12 | require 'xlua' 13 | require 'lfs' 14 | require 'pl.stringx' 15 | require 'pl.file' 16 | tds = require('tds') 17 | 18 | -- Command line arguments 19 | cmd = torch.CmdLine() 20 | cmd:option('-data', '../cbt/', [[path to the data folder containing train, val. & test records]]) 21 | cmd:option('-word_type', 'NE', [[class type of the prediction word. (NE - Named Entity, 22 | CN - Common Noun, V - Verb, P - Preposition]]) 23 | cmd:option('-b', 5, [[size of the window memory]]) 24 | cmd:option('-out', 'dataset.t7', [[data tensors output file name]]) 25 | params = cmd:parse(arg) 26 | 27 | -- Load all the data to memory 28 | print('loading ' .. params.word_type .. ' data...') 29 | params.train_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_train.txt')) 30 | params.val_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_valid_2000ex.txt')) 31 | params.test_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_test_2500ex.txt')) 32 | params.train_size, params.val_size, params.test_size = #params.train_lines/22, #params.val_lines/22, #params.test_lines/22 33 | params.num_pads = math.floor((params.b - 1) / 2) 34 | print('found (' .. params.train_size .. '/' .. params.val_size .. '/' .. params.test_size ..') records') 35 | 36 | -- Build the vocabulary 37 | print('building vocab...') 38 | local start = sys.clock() 39 | params.index2word, params.word2index = tds.hash(), tds.hash() 40 | -- add zero padding before and after a context sentence and question 41 | for i = 1, params.num_pads do 42 | params.index2word[#params.index2word + 1] = '' 43 | params.word2index[''] = #params.index2word 44 | params.index2word[#params.index2word + 1] = '' 45 | params.word2index[''] = #params.index2word 46 | end 47 | function create_vocab(lines) 48 | for i = 1, #lines/22 do 49 | local start = 1 + 22 * (i - 1) 50 | for j = 0, 19 do 51 | local words = stringx.split(lines[start + j]) 52 | for k = 2, #words do 53 | if params.word2index[words[k]] == nil then 54 | params.index2word[#params.index2word + 1] = words[k] 55 | params.word2index[words[k]] = #params.index2word 56 | end 57 | end 58 | end 59 | local words = stringx.split(lines[start + 20]) 60 | for k = 2, (#words - 2) do 61 | words[k] = string.lower(words[k]) 62 | if params.word2index[words[k]] == nil then 63 | params.index2word[#params.index2word + 1] = words[k] 64 | params.word2index[words[k]] = #params.index2word 65 | end 66 | end 67 | end 68 | end 69 | create_vocab(params.train_lines) 70 | create_vocab(params.val_lines) 71 | create_vocab(params.test_lines) 72 | print(string.format('vocab built in %.2f mins; # unique words = %d;', ((sys.clock() - start) / 60), #params.index2word)) 73 | 74 | -- Generate the data tensors 75 | function gen_tensors(lines, label) 76 | print('generating tensors for ' .. label .. 
'...') 77 | function get_context_windows(lines, start, cands, answer) 78 | local windows, rel_mem_ids = {}, {} 79 | function tokenize(sentence, last_offset) 80 | local words = stringx.split(sentence) 81 | local last_index = #words 82 | if last_offset ~= nil then last_index = #words - last_offset end 83 | local tokens = {} 84 | for i = 1, params.num_pads do 85 | table.insert(tokens, '') 86 | end 87 | for i = 2, last_index do 88 | table.insert(tokens, words[i]) 89 | end 90 | for i = 1, params.num_pads do 91 | table.insert(tokens, '') 92 | end 93 | return tokens 94 | end 95 | for i = 1, #cands do 96 | for j = 0, 19 do 97 | local words = tokenize(lines[start + j]) 98 | for k = params.num_pads + 1, #words - params.num_pads do 99 | if words[k] == cands[i] then 100 | local tensor, m = torch.CudaTensor(params.b), 0 101 | for l = k - params.num_pads, k + params.num_pads do 102 | m = m + 1 103 | tensor[m] = params.word2index[words[l]] 104 | end 105 | table.insert(windows, tensor) 106 | if cands[i] == answer then 107 | -- store the memories relevant to the answer 108 | table.insert(rel_mem_ids, #windows) 109 | end 110 | end 111 | end 112 | end 113 | end 114 | assert(#windows ~= 0) 115 | assert(#rel_mem_ids ~= 0) 116 | local memory_tensor = torch.CudaTensor(#windows, params.b) 117 | for i = 1, #windows do 118 | memory_tensor[i] = windows[i] 119 | end 120 | return memory_tensor, rel_mem_ids 121 | end 122 | function get_query_memory(line) 123 | local words = tokenize(line, 2) 124 | for i = 1, #words do 125 | if words[i] == 'XXXXX' then 126 | local query_tensor, j = torch.CudaTensor(1, params.b), 0 127 | for k = i - params.num_pads, i + params.num_pads do 128 | j = j + 1 129 | words[k] = string.lower(words[k]) 130 | assert(params.word2index[words[k]] ~= nil) 131 | query_tensor[1][j] = params.word2index[words[k]] 132 | end 133 | return query_tensor 134 | end 135 | end 136 | error('cannot find the placeholder token.') 137 | end 138 | local dataset = {} 139 | xlua.progress(1, #lines/22) 140 | for i = 1, #lines/22 do 141 | local start = 1 + 22 * (i - 1) 142 | local last_line = stringx.split(lines[start + 20]) 143 | local answer, answer_cand = last_line[#last_line - 1], stringx.split(last_line[#last_line], '|') 144 | local supporting_memories, rel_mem_ids = get_context_windows(lines, start, answer_cand, answer) 145 | local query_memory = get_query_memory(lines[start + 20]) 146 | local answer_id = params.word2index[answer] 147 | assert(answer_id ~= nil) 148 | table.insert(dataset, {supporting_memories, query_memory, answer_id, rel_mem_ids}) 149 | if i % 5 == 0 then xlua.progress(i, #lines/22) end 150 | end 151 | xlua.progress(#lines/22, #lines/22) 152 | return dataset 153 | end 154 | params.train_tensors = gen_tensors(params.train_lines, 'train') 155 | params.val_tensors = gen_tensors(params.val_lines, 'valid') 156 | params.test_tensors = gen_tensors(params.test_lines, 'test') 157 | 158 | -- Save the data tensors 159 | print('saving all the tensors...') 160 | local save_point = { 161 | train_tensors = params.train_tensors, 162 | val_tensors = params.val_tensors, 163 | test_tensors = params.test_tensors, 164 | index2word = params.index2word, 165 | word2index = params.word2index 166 | } 167 | torch.save(params.out, save_point) 168 | -------------------------------------------------------------------------------- /stanford/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining 
a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 | 
10 | The above copyright notice and this permission notice shall be included in
11 | all copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19 | THE SOFTWARE.
20 | 
--------------------------------------------------------------------------------
/stanford/README.md:
--------------------------------------------------------------------------------
1 | ## A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
2 | 
3 | [Torch](http://torch.ch) implementation of the model 'Neural Net (Relabeling)' proposed in [Chen et al.](https://arxiv.org/abs/1606.02858)'s work.
4 | 
5 | ### Features
6 | 
7 | * Train the model with the benchmarked corpus, [CNN News Test](https://github.com/deepmind/rc-data) (CNN), out of the box.
8 | * Easy to try out different recurrent units, such as vanilla RNN, GRU and LSTM.
9 | * Support for tuning other hyper-parameters of the model reported in the paper.
10 | 
11 | ### Quick Start
12 | 
13 | Download and extract the `cnn.tgz` file from [DeepMind Q&A Dataset](http://cs.nyu.edu/~kcho/DMQA/)'s page.
14 | 
15 | To generate and save the data tensors (objects readable by Torch),
16 | 
17 | ```
18 | th preprocess.lua -data cnn/questions/
19 | ```
20 | 
21 | where the data value `cnn/questions/` points to the extracted directory containing the training, validation and testing files for the NE prediction task.
22 | 
23 | To kick-start the training,
24 | 
25 | ```
26 | th main.lua
27 | ```
28 | 
29 | To list the hyper-parameters relevant to both steps,
30 | 
31 | ```
32 | th preprocess.lua --help
33 | th main.lua --help
34 | ```
35 | 
36 | ### Training options
37 | 
38 | #### `preprocess.lua`
39 | 
40 | * `data`: path to the data folder containing train, validation and test records
41 | * `out`: output file name for the tensors to be saved
42 | * `batch_size`: sgd mini-batch size
43 | * `vocab_size`: size of the word vocabulary. This is built from the most frequent words; the rest are replaced with an unknown-word token
44 | * `question_pad`: which side to pad the question so that sequences in a batch have the same length? `left` or `right`?
45 | * `passage_pad`: which side to pad the passage so that sequences in a batch have the same length? `left` or `right`?
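For example, a preprocessing run over the CNN questions that keeps the 50,000 most frequent words could be (the values shown are just the documented defaults):

```
th preprocess.lua -data cnn/questions/ -vocab_size 50000 -batch_size 32 -question_pad left -passage_pad right -out dataset.t7
```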
46 | 47 | #### `main.lua` 48 | 49 | * `input`: input file name for the saved tensors 50 | * `seed`: seed value for the random generator 51 | * `glove_file`: file containing the pre-trained glove word embeddings (downloadable from the [Glove](http://nlp.stanford.edu/projects/glove/)'s official page) 52 | * `dim`: dimensionality of word embeddings 53 | * `hid_size`: RNN's hidden layer size 54 | * `num_epochs`: number of full passes through the training data 55 | * `lr`: adam's learning rate 56 | * `grad_clip`: clip gradients at this value 57 | * `dropout`: dropout for regularization, used before the prediction layer. 0 = no dropout 58 | 59 | ### Torch Dependencies 60 | * nn 61 | * cunn 62 | * cutorch 63 | * rnn 64 | * optim 65 | * xlua 66 | * tds 67 | * nngraph 68 | 69 | ### Author 70 | [Ganesh J](https://researchweb.iiit.ac.in/~ganesh.j/) 71 | 72 | ### Licence 73 | MIT 74 | 75 | 76 | -------------------------------------------------------------------------------- /stanford/main.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Torch Implementation of the paper 'A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task' 4 | 5 | Precisely we attempt to code for the model 'Neural net (relabeling)' mentioned in the paper 6 | 7 | ]]-- 8 | 9 | require 'torch' 10 | require 'cutorch' 11 | require 'nn' 12 | require 'cunn' 13 | require 'optim' 14 | require 'xlua' 15 | require 'sys' 16 | require 'lfs' 17 | tds = require('tds') 18 | paths.dofile('model.lua') 19 | local utils = require 'utils' 20 | 21 | cmd = torch.CmdLine() 22 | cmd:option('-input', 'dataset.t7', [[data tensors input file name]]) 23 | cmd:option('-glove_file', 'data/glove.6B.100d.txt', [[file containing the pre-trained glove word embeddings]]) 24 | cmd:option('-seed', 123, [[seed for the random generator]]) 25 | cmd:option('-dim', 100, [[dimensionality of word embeddings]]) 26 | cmd:option('-hid_size', 128, [[GRU's hidden layer size]]) 27 | cmd:option('-num_epochs', 30, [[number of full passes through the training data]]) 28 | cmd:option('-lr', 0.1, [[sgd learning rate]]) 29 | cmd:option('-grad_clip', 10, [[clip gradients at this value]]) 30 | cmd:option('-dropout', 0.2, [[dropout for regularization, used before the prediction layer. 0 = no dropout]]) 31 | params = cmd:parse(arg) 32 | 33 | torch.manualSeed(params.seed) 34 | params.optim_state = { learningRate = params.lr } 35 | 36 | -- load the dataset 37 | print('loading the dataset...') 38 | params.dataset = torch.load(params.input) 39 | 40 | -- initiate the model & criterion 41 | print('initializing the model...') 42 | params.model = nn.MaskZero(get_model(params), 1):cuda() 43 | params.criterion = nn.MaskZeroCriterion(nn.CrossEntropyCriterion(), 1):cuda() 44 | params.pp, params.gp = params.model:getParameters() -- flatten all the parameters into one fat tensor 45 | 46 | -- pre-initialize the word embeddings from glove 47 | local is_present = lfs.attributes(params.glove_file) or -1 48 | if is_present ~= -1 then 49 | utils.init_word_weights(params, params.passage_word_lookup, params.glove_file) 50 | else 51 | print('>>>WARNING>>> Specified glove embedding file is not found at: ' .. 
params.glove_file) 52 | end 53 | 54 | -- train & evaluate the model 55 | print('training...') 56 | local optim_states, best_dev_acc, best_test_acc, best_acc_epoch, train_start = {}, -1, -1, -1, sys.clock() 57 | for epoch = 1, params.num_epochs do 58 | local epoch_start, epoch_loss, epoch_iterations, indices = sys.clock(), 0, 0, torch.randperm(#params.dataset.train_batches) 59 | params.model:training() 60 | xlua.progress(1, #params.dataset.train_batches) 61 | for batch = 1, #params.dataset.train_batches do 62 | local cur_batch_record = params.dataset.train_batches[indices[batch]] 63 | -- while defining the model, we assume the batch size is constant. (must solve this later) 64 | local feval = function(x) 65 | params.gp:zero() 66 | local out = params.model:forward({cur_batch_record[3], {{cur_batch_record[1], cur_batch_record[2]}, cur_batch_record[1]}}) 67 | local example_batch_loss = params.criterion:forward(out, cur_batch_record[4]) 68 | epoch_loss = epoch_loss + example_batch_loss * (#cur_batch_record[4])[1] 69 | epoch_iterations = epoch_iterations + (#cur_batch_record[4])[1] 70 | local rep_grads = params.criterion:backward(out, cur_batch_record[4]) 71 | params.model:backward({cur_batch_record[3], {{cur_batch_record[1], cur_batch_record[2]}, cur_batch_record[1]}}, rep_grads) 72 | params.gp:clamp(-params.grad_clip, params.grad_clip) 73 | return example_batch_loss, params.gp 74 | end 75 | optim.sgd(feval, params.pp, params.optim_state, optim_states) 76 | -- if you don't call the following 2 lines, the previous state will be used to activate the next 77 | params.question_encoder:forget() 78 | params.passage_encoder:forget() 79 | xlua.progress(batch, #params.dataset.train_batches) 80 | end 81 | xlua.progress(#params.dataset.train_batches, #params.dataset.train_batches) 82 | -- update the best performing model so far 83 | local cur_dev_acc = utils.compute_accuracy(params.model, params.dataset.valid_batches) 84 | if best_dev_acc < cur_dev_acc then 85 | best_dev_acc = cur_dev_acc 86 | best_acc_epoch = epoch 87 | best_test_acc = utils.compute_accuracy(params.model, params.dataset.test_batches) 88 | end 89 | print(string.format('epoch (%d/%d) loss = %.2f; best dev. acc = %.2f; best test. 
acc = %.2f (%d); time = %.2f mins;', 90 | epoch, params.num_epochs, (epoch_loss / epoch_iterations), best_dev_acc, best_test_acc, 91 | best_acc_epoch, ((sys.clock() - epoch_start)/60))) 92 | end 93 | print(string.format('final accuracy = %.2f (%d); time = %.2f mins;', 94 | best_test_acc, best_acc_epoch, ((sys.clock() - train_start)/60))) 95 | -------------------------------------------------------------------------------- /stanford/model.lua: -------------------------------------------------------------------------------- 1 | require 'rnn' 2 | 3 | function get_model(params) 4 | -- question encoder 5 | local word_lookup = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim) 6 | local q_gru_layer = nn.GRU(params.dim, params.hid_size, nil, 0):maskZero(1) 7 | local q_fwd_gru = nn.Sequential():add(word_lookup) 8 | :add(q_gru_layer) 9 | local q_fwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 10 | :add(nn.Sequencer(q_fwd_gru)) 11 | local q_bwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 12 | :add(q_gru_layer:sharedClone()) 13 | local q_bwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 14 | :add(nn.Sequencer(q_bwd_gru)) 15 | local q_parallel = nn.ParallelTable():add(q_fwd_seq) 16 | :add(q_bwd_seq) 17 | local q_encoder = nn.Sequential():add(q_parallel) 18 | :add(nn.MaskZero(nn.ZipTable(), 1)) 19 | :add(nn.Sequencer(nn.MaskZero(nn.JoinTable(1, 1), 1))) -- merges the fwd, seq out at each time step 20 | :add(nn.Sequencer(nn.MaskZero(nn.Select(1, -1), 2))) -- get the last step output 21 | :add(nn.MaskZero(nn.JoinTable(1), 1)) 22 | :add(nn.MaskZero(nn.View(-1, 2 * params.hid_size), 1)) 23 | 24 | -- passage encoder 25 | local p_gru_layer = nn.GRU(params.dim, params.hid_size, nil, 0):maskZero(1) 26 | local p_fwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 27 | :add(p_gru_layer) 28 | local p_fwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 29 | :add(nn.Sequencer(p_fwd_gru)) 30 | local p_bwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 31 | :add(p_gru_layer:sharedClone()) 32 | local p_bwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 33 | :add(nn.Sequencer(p_bwd_gru)) 34 | local p_parallel = nn.ParallelTable():add(p_fwd_seq) 35 | :add(p_bwd_seq) 36 | local p_encoder = nn.Sequential():add(p_parallel) 37 | :add(nn.MaskZero(nn.ZipTable(), 1)) 38 | :add(nn.Sequencer(nn.MaskZero(nn.JoinTable(1, 1), 1))) -- merges the fwd, seq out at each time step 39 | :add(nn.Sequencer(nn.MaskZero(nn.View(1, -1, 2 * params.hid_size), 1))) 40 | :add(nn.MaskZero(nn.JoinTable(1), 3)) 41 | 42 | -- build the attention model 43 | local bilinear_layer = nn.Sequential():add(q_encoder) 44 | :add(nn.MaskZero(nn.Linear(2 * params.hid_size, 2 * params.hid_size), 2)) 45 | :add(nn.MaskZero(nn.Unsqueeze(3), 2)) 46 | local alpha_layer_0 = nn.ParallelTable():add(p_encoder) 47 | :add(bilinear_layer) 48 | local alpha_layer = nn.Sequential():add(alpha_layer_0) 49 | :add(nn.MaskZero(nn.MM(false, false), 2)) 50 | :add(nn.MaskZero(nn.Squeeze(), 3)) 51 | :add(nn.MaskZero(nn.SoftMax(), 1)) 52 | :add(nn.MaskZero(nn.Unsqueeze(3), 2)) 53 | local output_layer_0 = nn.ParallelTable():add(alpha_layer) 54 | :add(p_encoder:sharedClone()) 55 | local output_layer = nn.Sequential():add(output_layer_0) 56 | :add(nn.MaskZero(nn.MM(true, false), 3)) 57 | local output_lookup = nn.LookupTableMaskZero(#params.dataset.global_i2e, 2 * params.hid_size) 58 | local combiner = nn.ParallelTable():add(output_lookup) 59 | :add(output_layer) 60 | local model = nn.Sequential():add(combiner) 61 | :add(nn.MaskZero(nn.MM(false, true), 3)) 62 
| :add(nn.MaskZero(nn.Squeeze(), 3)) 63 | model = model:cuda() 64 | 65 | -- bring back all states to the start of the sequence buffers 66 | params.question_encoder = q_gru_layer 67 | params.question_encoder:forget() 68 | params.passage_encoder = p_gru_layer 69 | params.passage_encoder:forget() 70 | 71 | return model 72 | end -------------------------------------------------------------------------------- /stanford/model_nngraph.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Nngraph based implementation of the NN 4 | 5 | ]]-- 6 | 7 | require 'nngraph' 8 | require 'rnn' 9 | 10 | function get_model(params) 11 | local inputs = { nn.Identity()(), nn.Identity()(), nn.Identity()()} 12 | local question = inputs[1] 13 | local passage = inputs[2] 14 | local answer_candidates = inputs[3] 15 | 16 | -- encode the question 17 | local question_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(question):annotate{name = 'question_word_lookup'} 18 | local question_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1)) 19 | (nn.SplitTable(1, 2)(question_word_vectors)):annotate{name = 'question_encoder'} 20 | local final_q_out = nn.Dropout(params.dropout)(nn.Unsqueeze(3)(nn.SelectTable(-1)(question_encoder))) -- get the last step output 21 | 22 | -- encode the passage 23 | local passage_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(passage):annotate{name = 'passage_word_lookup'} 24 | local passage_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1)) 25 | (nn.SplitTable(1, 2)(passage_word_vectors)):annotate{name = 'passage_encoder'} 26 | local final_p_out = nn.Dropout(params.dropout)(nn.View(params.bsize, -1, 2 * params.hid_size) 27 | (nn.JoinTable(2)(passage_encoder))) -- combine the forward and backward rnns' output 28 | 29 | -- calculate the attention 30 | local attention_probs = nn.SoftMax()(nn.MM(false, false)({final_p_out, 31 | nn.Unsqueeze(3)(nn.Linear(2 * params.hid_size, 2 * params.hid_size)(nn.View(2 * params.hid_size)(final_q_out)))})) 32 | local weighted_out = nn.MM(true, false){final_p_out, attention_probs} 33 | 34 | -- do prediction 35 | local answer_output_lookup = nn.LookupTableMaskZero(#params.dataset.global_i2e, 2 * params.hid_size) 36 | local cand_answer_vectors = answer_output_lookup(answer_candidates):annotate{name = 'cand_lookup'} 37 | local prediction_out = nn.Squeeze()(nn.MM(false, false){cand_answer_vectors, weighted_out}) 38 | 39 | return nn.gModule(inputs, {prediction_out}) 40 | end 41 | -------------------------------------------------------------------------------- /stanford/preprocess.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Pre-processing scripts which 4 | (i) creates the data tensors (to be used by main.lua) 5 | (ii) saves them to t7 file 6 | 7 | ]]-- 8 | 9 | require 'torch' 10 | require 'cutorch' 11 | require 'io' 12 | require 'xlua' 13 | require 'lfs' 14 | require 'pl.stringx' 15 | tds = require('tds') 16 | 17 | -- Command line arguments 18 | cmd = torch.CmdLine() 19 | cmd:option('-data', '../cnn/', [[path to the data folder containing train, val. 
& test records]]) 20 | cmd:option('-out', 'dataset.t7', [[data tensors output file name]]) 21 | cmd:option('-batch_size', 32, [[sgd mini-batch size]]) 22 | cmd:option('-vocab_size', 50000, [[size of the word vocabulary. this is constructed by taking the top most frequent words. 23 | rest are replaced with tokens.']]) 24 | cmd:option('-question_pad', 'left', [[which side to pad the question to make sequences in a batch 25 | to be of same size? left or right? ]]) 26 | cmd:option('-passage_pad', 'right', [[which side to pad the passage to make sequences in a batch 27 | to be of same size? left or right? ]]) 28 | params = cmd:parse(arg) 29 | 30 | params.train_folder = params.data .. 'training/' 31 | params.val_folder = params.data .. 'validation/' 32 | params.test_folder = params.data .. 'test/' 33 | params.unk = '' 34 | 35 | -- Build the vocabulary 36 | print('building vocab...') 37 | local start = sys.clock() 38 | params.vocab, params.global_e2i, params.global_i2e, params.unique_words, params.train_size = tds.hash(), tds.hash(), tds.hash(), 0, 0 39 | for file in lfs.dir(params.train_folder) do 40 | if #file > 2 then -- pass through '.' & '..' 41 | local fptr = io.open(params.train_folder .. file, 'r') 42 | local url = fptr:read() 43 | fptr:read() -- pass through empty line 44 | local passage = fptr:read() 45 | fptr:read() -- pass through empty line 46 | local local_e2i, local_i2e = {}, {} 47 | function add_to_vocab(tokens, update_vocab) 48 | for i = 1, #tokens do 49 | local token = tokens[i] 50 | if stringx.startswith(token, '@') == true then 51 | if local_e2i[token] == nil then 52 | local_i2e[#local_i2e + 1] = token 53 | local_e2i[token] = #local_i2e 54 | end 55 | token = '@entity' .. local_e2i[token] 56 | if params.global_e2i[token] == nil then 57 | params.global_i2e[#params.global_i2e + 1] = token 58 | params.global_e2i[token] = #params.global_i2e 59 | end 60 | end 61 | if update_vocab == true then 62 | if params.vocab[tokens[i]] == nil then 63 | params.vocab[tokens[i]] = 0 64 | params.unique_words = params.unique_words + 1 65 | end 66 | params.vocab[tokens[i]] = params.vocab[tokens[i]] + 1 67 | end 68 | end 69 | end 70 | add_to_vocab(stringx.split(passage), true) -- add tokens in passage to vocab 71 | local question = fptr:read() -- add tokens in question to vocab 72 | add_to_vocab(stringx.split(question), true) 73 | fptr:read() 74 | add_to_vocab({answer}, false) -- add answer token to vocab 75 | fptr:read() 76 | while true do 77 | local line = fptr:read() 78 | if line == nil then break end 79 | local cand_entity = stringx.split(line, ':')[1] 80 | add_to_vocab({cand_entity}, false) -- add candidate answer token to vocab 81 | end 82 | io.close(fptr) 83 | params.train_size = params.train_size + 1 84 | end 85 | end 86 | print(string.format('vocab built in %.2f mins', ((sys.clock() - start) / 60))) 87 | print('#unique candidate answers = '..#params.global_i2e .. ' (from ' .. params.train_size ..' questions)') 88 | print('#unique unigrams before pruning = '..params.unique_words) 89 | function extract_top_k_words() 90 | print('extracting top ' .. params.vocab_size .. 
' words...') 91 | if params.unique_words < params.vocab_size then 92 | print('error: specified vocabulary size cannot be greater than # unique words in the dataset.') 93 | os.exit(0) 94 | end 95 | local words, word_freq_tensor, i = {}, torch.Tensor(params.unique_words), 0 96 | for word, count in pairs(params.vocab) do 97 | table.insert(words, word) 98 | i = i + 1 99 | word_freq_tensor[i] = count 100 | end 101 | local _, idx = torch.sort(word_freq_tensor, true) -- sort the words by decreasing order of frequency 102 | local new_vocab = tds.hash() 103 | for i = 1, params.vocab_size do 104 | new_vocab[words[idx[i]]] = params.vocab[words[idx[i]]] 105 | end 106 | return new_vocab 107 | end 108 | params.vocab = extract_top_k_words() 109 | params.index2word = tds.hash() 110 | params.word2index = tds.hash() 111 | for word, count in pairs(params.vocab) do 112 | params.index2word[#params.index2word + 1] = word 113 | params.word2index[word] = #params.index2word 114 | end 115 | params.index2word[#params.index2word + 1] = params.unk 116 | params.word2index[params.unk] = #params.index2word 117 | params.global_i2e[#params.global_i2e + 1] = params.unk 118 | params.global_e2i[params.unk] = #params.global_i2e 119 | 120 | -- Generate the data tensors 121 | function gen_tensors(folder, label) 122 | print('processing tensors for ' .. label .. '...') 123 | start = sys.clock() 124 | -- get the data id -> passage length mapping 125 | print('generating data id -> passage mapping for ' .. label .. '...') 126 | local d2plen, records = {}, {} 127 | for file in lfs.dir(folder) do 128 | if #file > 2 then 129 | local fptr = io.open(folder .. file, 'r') 130 | local url = fptr:read() 131 | fptr:read() 132 | local local_e2i, local_i2e = {}, {} 133 | function get_entity_text(token) 134 | if local_e2i[token] == nil then 135 | local_i2e[#local_i2e + 1] = token 136 | local_e2i[token] = #local_i2e 137 | end 138 | token = '@entity' .. 
local_e2i[token] 139 | return token 140 | end 141 | function get_tensor(unigrams) 142 | local tensor = torch.CudaTensor(#unigrams) 143 | for i = 1, #unigrams do 144 | local token = unigrams[i] 145 | if stringx.startswith(token, '@') == true then 146 | token = get_entity_text(token) 147 | end 148 | if params.word2index[token] == nil then 149 | token = params.unk 150 | end 151 | tensor[i] = params.word2index[token] 152 | end 153 | return tensor 154 | end 155 | local passage_text = fptr:read() 156 | local passage_tensor = get_tensor(stringx.split(passage_text)) 157 | table.insert(d2plen, (#passage_tensor)[1]) 158 | fptr:read() 159 | local question_text = fptr:read() 160 | local question_tensor = get_tensor(stringx.split(question_text)) 161 | fptr:read() 162 | function get_answer_id(token) 163 | token = get_entity_text(token) 164 | if params.global_e2i[token] == nil then token = params.unk end 165 | return params.global_e2i[token] 166 | end 167 | local answer_text = fptr:read() 168 | local answer_id = get_answer_id(answer_text) 169 | assert(answer_id ~= nil) 170 | fptr:read() 171 | local cand_ids = {} 172 | while true do 173 | local line = fptr:read() 174 | if line == nil then break end 175 | local candidate_text = stringx.split(line, ':')[1] 176 | local cand_id = get_answer_id(candidate_text) 177 | assert(cand_id ~= nil) 178 | table.insert(cand_ids, cand_id) 179 | end 180 | local cand_tensor = torch.CudaTensor(cand_ids) 181 | table.insert(records, {passage_tensor, question_tensor, answer_id, cand_tensor}) 182 | io.close(fptr) 183 | end 184 | end 185 | 186 | local p_sizes, idx = torch.sort(torch.Tensor(d2plen), true) -- Sort the passage ids by decreasing order of length 187 | 188 | function get_cur_batch_stat(start, last) 189 | local max_passage, max_ans_cand, max_question = -1000, -1000, -1000 190 | for i = start, last do 191 | local record = records[idx[i]] 192 | if (#record[1])[1] > max_passage then max_passage = (#record[1])[1] end 193 | if (#record[2])[1] > max_question then max_question = (#record[2])[1] end 194 | if (#record[4])[1] > max_ans_cand then max_ans_cand = (#record[4])[1] end 195 | end 196 | return max_passage, max_ans_cand, max_question 197 | end 198 | 199 | -- create the batches 200 | print('creating the final batches for ' .. label .. 
'...') 201 | local batches, cur_batch, num_batches = {}, 0, math.ceil(#d2plen / params.batch_size) 202 | xlua.progress(1, num_batches) 203 | for i = 1, #d2plen, params.batch_size do 204 | cur_batch = cur_batch + 1 205 | local cur_batch_size = math.min(#d2plen, i + params.batch_size - 1) - i + 1 206 | if cur_batch_size ~= params.batch_size then break end -- (to-do) 207 | local max_passage, max_ans_cand, max_question = get_cur_batch_stat(i, i + cur_batch_size - 1) 208 | local passage_batch_tensor, passage_rev_batch_tensor, question_batch_tensor, question_rev_batch_tensor, answer_cand_batch_tensor = 209 | torch.CudaTensor(cur_batch_size, max_passage):fill(0), torch.CudaTensor(cur_batch_size, max_passage):fill(0), 210 | torch.CudaTensor(cur_batch_size, max_question):fill(0), torch.CudaTensor(cur_batch_size, max_question):fill(0), 211 | torch.CudaTensor(cur_batch_size, max_ans_cand):fill(0) 212 | local answer_batch_tensor = torch.CudaTensor(cur_batch_size, 1) 213 | for j = 1, cur_batch_size do 214 | local record = records[idx[i + j - 1]] 215 | for k = 1, (#record[1])[1] do 216 | if params.passage_pad == 'left' then 217 | passage_batch_tensor[j][k] = record[1][k] 218 | passage_rev_batch_tensor[j][(#record[1])[1] - k + 1] = record[1][k] 219 | else 220 | passage_batch_tensor[j][max_passage - k + 1] = record[1][k] 221 | passage_rev_batch_tensor[j][max_passage - (#record[1])[1] + k] = record[1][k] 222 | end 223 | end 224 | for k = 1, (#record[2])[1] do 225 | if params.question_pad == 'left' then 226 | question_batch_tensor[j][k] = record[2][k] 227 | question_rev_batch_tensor[j][(#record[2])[1] - k + 1] = record[2][k] 228 | else 229 | question_batch_tensor[j][max_passage - k + 1] = record[2][k] 230 | question_rev_batch_tensor[j][max_passage - (#record[2])[1] + k] = record[2][k] 231 | end 232 | end 233 | answer_batch_tensor[j][1] = record[3] 234 | answer_cand_batch_tensor[{ j, { 1, (#record[4])[1] }}] = record[4] 235 | end 236 | if cur_batch % 5 == 0 then xlua.progress(i, num_batches) end 237 | table.insert(batches, {{passage_batch_tensor, passage_rev_batch_tensor}, {question_batch_tensor, question_rev_batch_tensor}, 238 | answer_cand_batch_tensor, answer_batch_tensor}) 239 | end 240 | xlua.progress(num_batches, num_batches) 241 | print(string.format('batches for %s processed in %.2f mins', label, ((sys.clock() - start) / 60))) 242 | return batches 243 | end 244 | 245 | params.train_batches = gen_tensors(params.train_folder, 'train') 246 | params.valid_batches = gen_tensors(params.val_folder, 'valid') 247 | params.test_batches = gen_tensors(params.test_folder, 'test') 248 | 249 | -- Save the batches 250 | print('saving all the tensors...') 251 | local save_point = { 252 | train_batches = params.train_batches, 253 | valid_batches = params.valid_batches, 254 | test_batches = params.test_batches, 255 | index2word = params.index2word, 256 | word2index = params.word2index, 257 | global_i2e = params.global_i2e, 258 | global_e2i = params.global_e2i 259 | } 260 | torch.save(params.out, save_point) 261 | -------------------------------------------------------------------------------- /stanford/utils.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Utilities used by main.lua 4 | 5 | ]]-- 6 | 7 | local utils={} 8 | 9 | -- Function to get any layer from nnGraph module 10 | function utils.get_layer(model, name) 11 | for _, node in ipairs(model.modules[1].forwardnodes) do 12 | if node.data.annotations.name == name then 13 | return node.data.module 14 | end 15 | 
end 16 | return nil 17 | end 18 | 19 | -- Function to compute accuracy of the model 20 | function utils.compute_accuracy(model, data, bsize) 21 | model:evaluate() 22 | local total, correct, softmax_model = 0, 0, nn.SoftMax():cuda() 23 | for i = 1, #data do 24 | local cur_batch_record = data[i] 25 | --if (#cur_batch[1])[1] == params.bsize then 26 | local out = params.model:forward({cur_batch_record[3], {{cur_batch_record[1], cur_batch_record[2]}, cur_batch_record[1]}}) 27 | local soft_out = softmax_model:forward(out) 28 | _, max_ids = soft_out:max(2) 29 | for j = 1, (#cur_batch_record[4])[1] do 30 | if max_ids[j] == cur_batch_record[4][j] then correct = correct + 1 end 31 | total = total + 1 32 | end 33 | --end 34 | end 35 | return correct / total 36 | end 37 | 38 | -- Function to initalize word weights 39 | function utils.init_word_weights(params, lookup, file) 40 | print('initializing word lookup with the pre-trained embeddings...') 41 | local start_time = sys.clock() 42 | local ic = 0 43 | local begin_offset = 1 --[[ since rnn uses lookuptablemaskzero table, 44 | the first index in weight matrix corresponds to the zero padding ]]-- 45 | for line in io.lines(file) do 46 | local content = stringx.split(line) 47 | local word = content[1] 48 | if params.dataset.word2index[word] ~= nil then 49 | local tensor = torch.Tensor(#content - 1) 50 | for i = 2, #content do 51 | tensor[i - 1] = tonumber(content[i]) 52 | end 53 | lookup.weight[begin_offset + params.dataset.word2index[word]] = tensor 54 | ic = ic + 1 55 | end 56 | end 57 | print(string.format("%d out of %d words initialized. Done in %.2f minutes.", 58 | ic, #params.dataset.index2word, (sys.clock() - start_time)/60)) 59 | end 60 | 61 | return utils 62 | -------------------------------------------------------------------------------- /watson/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in 11 | all copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 19 | THE SOFTWARE. 20 | -------------------------------------------------------------------------------- /watson/README.md: -------------------------------------------------------------------------------- 1 | ## Text Understanding with the Attention Sum Reader Network 2 | 3 | [Torch](http://torch.ch) implementation of the model 'AS Reader (Single Model)' proposed in [Kadlec et al.](https://arxiv.org/abs/1603.01547)'s work. 
4 | 
5 | ### Features
6 | 
7 | * Train the model with the benchmarked corpus, [Children's Book Test](https://research.facebook.com/research/babi/) (CBT), out of the box.
8 | * Easy to try out different recurrent units, such as vanilla RNN, GRU and LSTM.
9 | * Support for tuning other hyper-parameters of the model reported in the paper.
10 | 
11 | ### Quick Start
12 | 
13 | Download and extract the `CBTest.tgz` file from [FB Research](https://research.facebook.com/research/babi/)'s page.
14 | 
15 | To generate and save the data tensors (objects readable by Torch),
16 | 
17 | ```
18 | th preprocess.lua -data CBTest/data/
19 | ```
20 | 
21 | where the data value `CBTest/data/` points to the extracted directory containing the training, validation and testing files for all 4 prediction tasks, viz., `Named Entity` (NE), `Common Noun` (CN), `Preposition` (P) and `Verb` (V).
22 | 
23 | To kick-start the training,
24 | 
25 | ```
26 | th main.lua
27 | ```
28 | 
29 | To list the hyper-parameters relevant to both steps,
30 | 
31 | ```
32 | th preprocess.lua --help
33 | th main.lua --help
34 | ```
35 | 
36 | ### Training options
37 | 
38 | #### `preprocess.lua`
39 | 
40 | * `word_type`: class type of the prediction word. Specify `NE` for Named Entity, `CN` for Common Noun, `P` for Preposition and `V` for Verb
41 | * `data`: path to the data folder containing train, validation and test records
42 | * `out`: output file name for the tensors to be saved
43 | * `question_pad`: which side to pad the question so that sequences in a batch have the same length? `left` or `right`?
44 | * `passage_pad`: which side to pad the passage so that sequences in a batch have the same length? `left` or `right`?
45 | 
46 | #### `main.lua`
47 | 
48 | * `input`: input file name for the saved tensors
49 | * `seed`: seed value for the random generator
50 | * `dim`: dimensionality of word embeddings
51 | * `hid_size`: RNN's hidden layer size
52 | * `num_epochs`: number of full passes through the training data
53 | * `lr`: adam's learning rate
54 | * `bsize`: mini-batch size for adam
55 | * `grad_clip`: clip gradients at this value
56 | 
57 | ### Torch Dependencies
58 | * nn
59 | * cunn
60 | * cutorch
61 | * rnn
62 | * optim
63 | * xlua
64 | * tds
65 | * nngraph
66 | 
67 | ### Author
68 | [Ganesh J](https://researchweb.iiit.ac.in/~ganesh.j/)
69 | 
70 | ### Licence
71 | MIT
72 | 
73 | 
--------------------------------------------------------------------------------
/watson/main.lua:
--------------------------------------------------------------------------------
1 | --[[
2 | 
3 | Torch Implementation of the paper 'Text Understanding with the Attention Sum Reader Network'
4 | 
5 | Precisely we attempt to code for the model 'AS Reader (Single model)' mentioned in the paper
6 | 
7 | ]]--
8 | 
9 | require 'torch'
10 | require 'cutorch'
11 | require 'nn'
12 | require 'cunn'
13 | require 'optim'
14 | require 'xlua'
15 | require 'sys'
16 | require 'lfs'
17 | tds = require('tds')
18 | paths.dofile('model.lua')
19 | local utils = require 'utils'
20 | 
21 | cmd = torch.CmdLine()
22 | cmd:option('-input', 'dataset.t7', [[data tensors input file name]])
23 | cmd:option('-seed', 123, [[seed for the random generator]])
24 | cmd:option('-dim', 384, [[dimensionality of word embeddings]])
25 | cmd:option('-hid_size', 384, [[GRU's hidden layer size]])
26 | cmd:option('-num_epochs', 2, [[number of full passes through the training data]])
27 | cmd:option('-lr', 0.001, [[adam learning rate]])
28 | cmd:option('-bsize', 32, [[adam mini-batch size]])
29 | cmd:option('-grad_clip', 10, [[clip gradients at this value]]) 30 | params = cmd:parse(arg) 31 | 32 | torch.manualSeed(params.seed) 33 | params.optim_state = { learningRate = params.lr } 34 | 35 | -- load the dataset 36 | print('loading the dataset...') 37 | params.dataset = torch.load(params.input) 38 | 39 | -- initiate the model & criterion 40 | print('initializing the model...') 41 | params.model = get_model(params):cuda() 42 | params.criterion = nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1):cuda() 43 | params.pp, params.gp = params.model:getParameters() -- flatten all the parameters into one fat tensor 44 | params.pp:uniform(-0.1, 0.1) -- initialize the parameters from uniform distribution 45 | 46 | -- train & evaluate the model 47 | print('training...') 48 | local optim_states, best_dev_acc, best_test_acc, best_acc_epoch, train_start = {}, -1, -1, -1, sys.clock() 49 | local question_master_tensor, question_rev_master_tensor, passage_master_tensor, passage_rev_master_tensor = 50 | torch.CudaTensor(params.bsize, params.dataset.max_question_len), torch.CudaTensor(params.bsize, params.dataset.max_question_len), 51 | torch.CudaTensor(params.bsize, params.dataset.max_passage_len), torch.CudaTensor(params.bsize, params.dataset.max_passage_len) 52 | local sum_master_tensor, new_grads_tensor, label_master_tensor = torch.CudaTensor(params.bsize, params.dataset.max_uniq_words_per_passage), 53 | torch.CudaTensor(params.bsize, params.dataset.max_passage_len), torch.CudaTensor(params.bsize) 54 | for epoch = 1, params.num_epochs do 55 | local epoch_start, epoch_loss, indices, num_batches, batch_id = sys.clock(), 0, 56 | torch.randperm(#params.dataset.train_tensors[1]), math.ceil(#params.dataset.train_tensors[1] / params.bsize), 0 57 | params.model:training() 58 | xlua.progress(1, num_batches) 59 | for i = 1, #params.dataset.train_tensors[1], params.bsize do 60 | local batch_size = math.min(i + params.bsize - 1, #params.dataset.train_tensors[1]) - i + 1 61 | local feval = function(x) 62 | params.gp:zero() 63 | for j = 1, batch_size do 64 | local index = indices[i + j - 1] 65 | local meta_p_local_word_list, meta_local_vocab_size, meta_local_answer_id = unpack(params.dataset.train_tensors[1][index]) 66 | local data_passage_tensor, data_passage_rev_tensor, data_question_tensor, data_question_rev_tensor = params.dataset.train_tensors[2][index], 67 | params.dataset.train_tensors[3][index], params.dataset.train_tensors[4][index], params.dataset.train_tensors[5][index] 68 | question_master_tensor[batch_size]:copy(data_question_tensor:cuda()) 69 | question_rev_master_tensor[batch_size]:copy(data_question_rev_tensor:cuda()) 70 | passage_master_tensor[batch_size]:copy(data_passage_tensor:cuda()) 71 | passage_master_tensor[batch_size]:copy(data_passage_rev_tensor:cuda()) 72 | label_master_tensor[j]= meta_local_answer_id 73 | end 74 | local pred = params.model:forward({{passage_master_tensor[{ {1, batch_size}, {} }], passage_rev_master_tensor[{ {1, batch_size}, {} }]}, 75 | {question_master_tensor[{ {1, batch_size}, {} }], question_rev_master_tensor[{ {1, batch_size}, {} }]}}) 76 | -- compute the total probability per unique word in the passage 77 | sum_master_tensor:fill(0) 78 | for j = 1, batch_size do 79 | local local_word_list = params.dataset.train_tensors[1][indices[i + j - 1]][1] 80 | for k = 1, #local_word_list do 81 | local l = (params.dataset.max_passage_len - #local_word_list + k) 82 | sum_master_tensor[j][local_word_list[k]] = sum_master_tensor[j][local_word_list[k]] + pred[j][l] 83 | end 84 | 
end 85 | local example_loss = params.criterion:forward(sum_master_tensor[{ {1, batch_size}, {} }], label_master_tensor[{ {batch_size} }]) 86 | epoch_loss = epoch_loss + example_loss 87 | local rep_grads = params.criterion:backward(sum_master_tensor[{ {1, batch_size}, {} }], label_master_tensor[{ {batch_size} }]) 88 | new_grads_tensor:fill(0) 89 | for j = 1, batch_size do 90 | local local_word_list = params.dataset.train_tensors[1][indices[i + j - 1]][1] 91 | for k = 1, #local_word_list do 92 | local l = (params.dataset.max_passage_len - #local_word_list + k) 93 | new_grads_tensor[j][l] = rep_grads[j][local_word_list[k]] 94 | end 95 | end 96 | params.gp:clamp(-params.grad_clip, params.grad_clip) 97 | return example_loss, params.gp 98 | end 99 | optim.adam(feval, params.pp, params.optim_state, optim_states) 100 | -- if you don't call the following 2 lines, the previous state will be used to activate the next 101 | params.q_gru_layer:forget() 102 | params.p_gru_layer:forget() 103 | batch_id = batch_id + 1 104 | xlua.progress(batch_id, num_batches) 105 | end 106 | xlua.progress(num_batches, num_batches) 107 | -- update the best performing model so far 108 | local cur_dev_acc = utils.compute_accuracy(params.model, params.dataset.val_tensors, params) 109 | if best_dev_acc < cur_dev_acc then 110 | best_dev_acc = cur_dev_acc 111 | best_acc_epoch = epoch 112 | best_test_acc = utils.compute_accuracy(params.model, params.dataset.test_tensors, params) 113 | end 114 | print(string.format('epoch (%d/%d) loss = %.2f; best dev. acc = %.2f; best test. acc = %.2f (%d); time = %.2f mins;', 115 | epoch, params.num_epochs, (epoch_loss / #params.dataset.train_tensors[1]), best_dev_acc, best_test_acc, 116 | best_acc_epoch, ((sys.clock() - epoch_start)/60))) 117 | end 118 | print(string.format('final accuracy = %.2f (%d); time = %.2f mins;', 119 | best_test_acc, best_acc_epoch, ((sys.clock() - train_start)/60))) 120 | -------------------------------------------------------------------------------- /watson/model.lua: -------------------------------------------------------------------------------- 1 | require 'rnn' 2 | 3 | function get_model(params) 4 | -- question encoder 5 | local word_lookup = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim) 6 | local q_gru_layer = nn.GRU(params.dim, params.hid_size, params.max_question_len, 0):maskZero(1) 7 | local q_fwd_gru = nn.Sequential():add(word_lookup) 8 | :add(q_gru_layer) 9 | local q_fwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 10 | :add(nn.Sequencer(q_fwd_gru)) 11 | local q_bwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 12 | :add(q_gru_layer:sharedClone()) 13 | local q_bwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 14 | :add(nn.Sequencer(q_bwd_gru)) 15 | local q_parallel = nn.ParallelTable():add(q_fwd_seq) 16 | :add(q_bwd_seq) 17 | local q_encoder = nn.Sequential():add(q_parallel) 18 | :add(nn.MaskZero(nn.ZipTable(), 1)) 19 | :add(nn.Sequencer(nn.MaskZero(nn.JoinTable(1, 1), 1))) -- merges the fwd, seq out at each time step 20 | :add(nn.Sequencer(nn.MaskZero(nn.Select(1, -1), 2))) -- get the last step output 21 | :add(nn.MaskZero(nn.JoinTable(1), 1)) 22 | :add(nn.MaskZero(nn.View(-1, 2 * params.hid_size), 1)) 23 | :add(nn.MaskZero(nn.Unsqueeze(3), 2)) 24 | 25 | -- passage encoder 26 | local p_gru_layer = nn.GRU(params.dim, params.hid_size, params.max_passage_len, 0):maskZero(1) 27 | local p_fwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 28 | :add(p_gru_layer) 29 | local p_fwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 30 
| :add(nn.Sequencer(p_fwd_gru)) 31 | local p_bwd_gru = nn.Sequential():add(word_lookup:sharedClone()) 32 | :add(p_gru_layer:sharedClone()) 33 | local p_bwd_seq = nn.Sequential():add(nn.SplitTable(1, 2)) 34 | :add(nn.Sequencer(p_bwd_gru)) 35 | local p_parallel = nn.ParallelTable():add(p_fwd_seq) 36 | :add(p_bwd_seq) 37 | local p_encoder = nn.Sequential():add(p_parallel) 38 | :add(nn.MaskZero(nn.ZipTable(), 1)) 39 | :add(nn.Sequencer(nn.MaskZero(nn.JoinTable(1, 1), 1))) -- merges the fwd, seq out at each time step 40 | :add(nn.Sequencer(nn.MaskZero(nn.View(1, -1, 2 * params.hid_size), 1))) 41 | :add(nn.MaskZero(nn.JoinTable(1), 3)) 42 | 43 | -- build the attention model 44 | local combiner = nn.ParallelTable():add(p_encoder) 45 | :add(q_encoder) 46 | local model = nn.Sequential():add(combiner):add(nn.MaskZero(nn.MM(false, false), 2)) 47 | :add(nn.MaskZero(nn.Squeeze(), 3)) 48 | :add(nn.MaskZero(nn.SoftMax(), 1)) 49 | 50 | -- bring back all states to the start of the sequence buffers 51 | params.q_gru_layer = q_gru_layer 52 | params.q_gru_layer:forget() 53 | params.p_gru_layer = p_gru_layer 54 | params.p_gru_layer:forget() 55 | 56 | return model 57 | end -------------------------------------------------------------------------------- /watson/model_nngraph.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Nngraph based implementation of the NN 4 | 5 | ]]-- 6 | 7 | require 'nngraph' 8 | require 'rnn' 9 | 10 | function get_model(params) 11 | local inputs = { nn.Identity()(), nn.Identity()()} 12 | local question = inputs[1] 13 | local passage = inputs[2] 14 | 15 | -- encode the question 16 | local question_word_vectors = nn.LookupTable(#params.dataset.index2word, params.dim)(question):annotate{name = 'question_word_lookup'} 17 | local question_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), 18 | nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1)) 19 | (nn.SplitTable(1, 2)(question_word_vectors)):annotate{name = 'question_encoder'} 20 | local final_q_out = nn.SelectTable(-1)(question_encoder) -- get the last step output 21 | 22 | -- encode the passage 23 | local passage_word_vectors = nn.LookupTable(#params.dataset.index2word, params.dim)(passage):annotate{name = 'passage_word_lookup'} 24 | local passage_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), 25 | nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1)) 26 | (nn.SplitTable(1, 2)(passage_word_vectors)):annotate{name = 'passage_encoder'} 27 | local final_p_out = nn.View(-1, 2 * params.hid_size)(nn.JoinTable(2)(passage_encoder)) -- combine the forward and backward rnns' output 28 | 29 | -- calculate the final prob 30 | local soft_out = nn.SoftMax()(nn.Squeeze()(nn.MM(false, true){final_p_out, final_q_out})) 31 | 32 | return nn.gModule(inputs, {soft_out}) 33 | end 34 | -------------------------------------------------------------------------------- /watson/preprocess.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Pre-processing scripts which 4 | (i) creates the data tensors (to be used by main.lua) 5 | (ii) saves them to t7 file 6 | 7 | ]]-- 8 | 9 | require 'torch' 10 | require 'cutorch' 11 | require 'io' 12 | require 'xlua' 13 | require 'lfs' 14 | require 'pl.stringx' 15 | require 'pl.file' 16 | tds = require('tds') 17 | 18 | -- Command line arguments 19 | cmd = torch.CmdLine() 20 | cmd:option('-data', '../cbt/', [[path to the data folder 
containing train, val. & test records]]) 21 | cmd:option('-word_type', 'NE', [[class type of the prediction word. (NE - Named Entity, 22 | CN - Common Noun, V - Verb, P - Preposition]]) 23 | cmd:option('-out', 'dataset.t7', [[data tensors output file name]]) 24 | cmd:option('-question_pad', 'left', [[which side to pad the question to make sequences in a batch to be 25 | of same size? left or right? ]]) 26 | cmd:option('-passage_pad', 'right', [[which side to pad the passage to make sequences in a batch to be 27 | of same size? left or right? ]]) 28 | params = cmd:parse(arg) 29 | 30 | -- Load all the data to memory 31 | print('loading ' .. params.word_type .. ' data...') 32 | params.train_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_train.txt')) 33 | params.val_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_valid_2000ex.txt')) 34 | params.test_lines = stringx.splitlines(file.read(params.data .. 'cbtest_' .. params.word_type .. '_test_2500ex.txt')) 35 | params.train_size, params.val_size, params.test_size = #params.train_lines/22, #params.val_lines/22, #params.test_lines/22 36 | print('found (' .. params.train_size .. '/' .. params.val_size .. '/' .. params.test_size ..') records') 37 | 38 | -- Build the vocabulary 39 | print('building vocab...') 40 | local start = sys.clock() 41 | params.index2word, params.word2index = tds.hash(), tds.hash() 42 | params.max_passage_len, params.max_question_len = -1, -1 43 | function create_vocab(lines) 44 | for i = 1, #lines/22 do 45 | local start, passage_len = (1 + 22 * (i - 1)), 0 46 | for j = 0, 19 do 47 | local words = stringx.split(lines[start + j]) 48 | for k = 2, #words do 49 | if params.word2index[words[k]] == nil then 50 | params.index2word[#params.index2word + 1] = words[k] 51 | params.word2index[words[k]] = #params.index2word 52 | end 53 | end 54 | passage_len = passage_len + #words - 1 55 | end 56 | if params.max_passage_len < passage_len then params.max_passage_len = passage_len end 57 | local words = stringx.split(lines[start + 20]) 58 | for k = 2, (#words - 2) do 59 | words[k] = string.lower(words[k]) 60 | if params.word2index[words[k]] == nil then 61 | params.index2word[#params.index2word + 1] = words[k] 62 | params.word2index[words[k]] = #params.index2word 63 | end 64 | end 65 | if params.max_question_len < (#words - 3) then params.max_question_len = #words - 3 end 66 | end 67 | end 68 | create_vocab(params.train_lines) 69 | create_vocab(params.val_lines) 70 | create_vocab(params.test_lines) 71 | print(string.format('vocab built in %.2f mins; # unique words = %d; max. passage len = %d; max. question len = %d;', 72 | ((sys.clock() - start) / 60), #params.index2word, params.max_passage_len, params.max_question_len)) 73 | 74 | -- Generate the data tensors 75 | params.max_uniq_words_per_passage = -1 76 | function gen_tensors(lines, label) 77 | print('generating tensors for ' .. label .. 
'...') 78 | function tokenize(sentence, last_offset) 79 | local words = stringx.split(sentence) 80 | local last_index = #words 81 | if last_offset ~= nil then last_index = #words - last_offset end 82 | local tokens = {} 83 | for i = 2, last_index do 84 | table.insert(tokens, words[i]) 85 | end 86 | return tokens 87 | end 88 | function get_question(last_line) 89 | local words = tokenize(last_line, 2) 90 | local question_tensor, question_rev_tensor = torch.Tensor(params.max_question_len):fill(0), 91 | torch.Tensor(params.max_question_len):fill(0) 92 | for i = 1, #words do 93 | words[i] = string.lower(words[i]) 94 | assert(params.word2index[words[i]] ~= nil) 95 | if params.question_pad == 'right' then 96 | question_tensor[params.max_question_len - i + 1] = params.word2index[words[i]] 97 | question_rev_tensor[params.max_question_len - #words + i] = params.word2index[words[i]] 98 | else 99 | question_tensor[i] = params.word2index[words[i]] 100 | question_rev_tensor[#words - i + 1] = params.word2index[words[i]] 101 | end 102 | end 103 | return {question_tensor, question_rev_tensor} 104 | end 105 | function get_passage_tensor_n_answer(lines, start, answer, answer_cand) 106 | local p_global_word_list, p_local_word_list, local_index2word, local_word2index, local_answer_id = tds.hash(), tds.hash(), 107 | tds.hash(), tds.hash(), -1 108 | for i = 0, 19 do 109 | local words = tokenize(lines[start + i]) 110 | for j = 1, #words do 111 | assert(params.word2index[words[j]] ~= nil) 112 | p_global_word_list[#p_global_word_list + 1] = params.word2index[words[j]] 113 | if local_word2index[words[j]] == nil then 114 | local_index2word[#local_index2word + 1] = words[j] 115 | local_word2index[words[j]] = #local_index2word 116 | if words[j] == answer then 117 | local_answer_id = local_word2index[answer] 118 | end 119 | end 120 | assert(local_word2index[words[j]] ~= nil) 121 | p_local_word_list[#p_local_word_list + 1] = local_word2index[words[j]] 122 | end 123 | end 124 | assert(local_answer_id ~= -1) 125 | assert(#p_global_word_list == #p_local_word_list) 126 | local passage_tensor, passage_rev_tensor = torch.Tensor(params.max_passage_len):fill(0), 127 | torch.Tensor(params.max_passage_len):fill(0) 128 | for i = 1, #p_global_word_list do 129 | if params.passage_pad == 'right' then 130 | passage_tensor[params.max_passage_len - i + 1] = p_global_word_list[i] 131 | passage_rev_tensor[params.max_passage_len - #p_global_word_list + i] = p_global_word_list[i] 132 | else 133 | passage_tensor[i] = p_global_word_list[i] 134 | passage_rev_tensor[#p_global_word_list - i + 1] = p_global_word_list[i] 135 | end 136 | end 137 | 138 | if #local_index2word > params.max_uniq_words_per_passage then 139 | params.max_uniq_words_per_passage = #local_index2word 140 | end 141 | 142 | return {passage_tensor, passage_rev_tensor}, p_local_word_list, #local_index2word, local_answer_id 143 | end 144 | local meta_info, passage_tensor_master, passage_rev_tensor_master, question_tensor_master, question_rev_tensor_master = {}, 145 | tds.hash(), tds.hash(), tds.hash(), tds.hash() 146 | xlua.progress(1, #lines/22) 147 | for i = 1, #lines/22 do 148 | local start = 1 + 22 * (i - 1) 149 | local last_line = stringx.split(lines[start + 20]) 150 | local answer, answer_cand = last_line[#last_line - 1], stringx.split(last_line[#last_line], '|') 151 | local question_tensor = get_question(lines[start + 20]) 152 | local passage_tensor, p_local_word_list, local_vocab_size, local_answer_id = get_passage_tensor_n_answer( 153 | lines, start, answer, answer_cand) 
154 | passage_tensor_master[#passage_tensor_master + 1] = passage_tensor[1] 155 | passage_rev_tensor_master[#passage_rev_tensor_master + 1] = passage_tensor[2] 156 | question_tensor_master[#question_tensor_master + 1] = question_tensor[1] 157 | question_rev_tensor_master[#question_rev_tensor_master + 1] = question_tensor[2] 158 | table.insert(meta_info, {p_local_word_list, local_vocab_size, local_answer_id}) 159 | if i % 5 == 0 then xlua.progress(i, #lines/22) end 160 | end 161 | xlua.progress(#lines/22, #lines/22) 162 | return {meta_info, passage_tensor_master, passage_rev_tensor_master, question_tensor_master, question_rev_tensor_master} 163 | end 164 | params.train_tensors = gen_tensors(params.train_lines, 'train') 165 | params.val_tensors = gen_tensors(params.val_lines, 'valid') 166 | params.test_tensors = gen_tensors(params.test_lines, 'test') 167 | 168 | -- Save the data tensors 169 | print('saving all the tensors...') 170 | local save_point = { 171 | train_tensors = params.train_tensors, 172 | val_tensors = params.val_tensors, 173 | test_tensors = params.test_tensors, 174 | index2word = params.index2word, 175 | word2index = params.word2index, 176 | max_passage_len = params.max_passage_len, 177 | max_question_len = params.max_question_len, 178 | max_uniq_words_per_passage = params.max_uniq_words_per_passage 179 | } 180 | torch.save(params.out, save_point) 181 | -------------------------------------------------------------------------------- /watson/utils.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | 3 | Utilities used by main.lua 4 | 5 | ]]-- 6 | 7 | local utils={} 8 | 9 | -- Function to get any layer from nnGraph module 10 | function utils.get_layer(model, name) 11 | for _, node in ipairs(model.forwardnodes) do 12 | if node.data.annotations.name == name then 13 | return node.data.module 14 | end 15 | end 16 | return nil 17 | end 18 | 19 | -- Function to compute accuracy of the model on a given data split (val_tensors or test_tensors) 20 | function utils.compute_accuracy(model, data, params) 21 | model:evaluate() 22 | local correct = 0 23 | local question_master_tensor, question_rev_master_tensor, passage_master_tensor, passage_rev_master_tensor = 24 | torch.CudaTensor(params.bsize, params.dataset.max_question_len), torch.CudaTensor(params.bsize, params.dataset.max_question_len), 25 | torch.CudaTensor(params.bsize, params.dataset.max_passage_len), torch.CudaTensor(params.bsize, params.dataset.max_passage_len) 26 | local sum_master_tensor, label_master_tensor = torch.CudaTensor(params.bsize, params.dataset.max_uniq_words_per_passage), torch.CudaTensor(params.bsize) 27 | for i = 1, #data[1], params.bsize do 28 | local batch_size = math.min(i + params.bsize - 1, #data[1]) - i + 1 29 | for j = 1, batch_size do 30 | local index = i + j - 1 31 | local meta_p_local_word_list, meta_local_vocab_size, meta_local_answer_id = unpack(data[1][index]) 32 | local data_passage_tensor, data_passage_rev_tensor, data_question_tensor, data_question_rev_tensor = data[2][index], 33 | data[3][index], data[4][index], data[5][index] 34 | question_master_tensor[j]:copy(data_question_tensor:cuda()) 35 | question_rev_master_tensor[j]:copy(data_question_rev_tensor:cuda()) 36 | passage_master_tensor[j]:copy(data_passage_tensor:cuda()) 37 | passage_rev_master_tensor[j]:copy(data_passage_rev_tensor:cuda()) 38 | label_master_tensor[j] = meta_local_answer_id 39 | end 40 | local pred = model:forward({{passage_master_tensor[{ {1, batch_size}, {} }], passage_rev_master_tensor[{ {1, batch_size}, {} }]}, 41 | {question_master_tensor[{ {1, batch_size}, {} }], question_rev_master_tensor[{ {1, batch_size}, {} }]}}) 42 | sum_master_tensor:fill(0) 43 | for j = 1, batch_size do 44 | local local_word_list = data[1][i + j - 1][1] 45 | for k = 1, #local_word_list do 46 | local l = (params.dataset.max_passage_len - #local_word_list + k) 47 | sum_master_tensor[j][local_word_list[k]] = sum_master_tensor[j][local_word_list[k]] + pred[j][l] -- accumulate attention mass over every position of this candidate word 48 | end 49 | end 50 | local _, ids = sum_master_tensor[{ {1, batch_size}, {} }]:max(2) 51 | for j = 1, batch_size do 52 | if ids[j][1] == label_master_tensor[j] then 53 | correct = correct + 1 54 | end 55 | end 56 | end 57 | return correct / #data[1] 58 | end 59 | 60 | return utils --------------------------------------------------------------------------------