├── .gitignore ├── README.md └── chatbot ├── README.md ├── chatbot.py ├── config.py ├── data.py ├── model.py └── output_convo.txt /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | *.pdf 3 | 4 | *.SUNet 5 | *.pyc 6 | 7 | 8 | examples/checkpoints/* 9 | 10 | examples/chatbot/processed/* 11 | examples/chatbot/checkpoints/* 12 | examples/chatbot/data_analysis.py 13 | 14 | assignments/chatbot/processed/* 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # tf-stanford-tutorials Chatbot 2 | This repository contains the code example for a Chatbot from the course CS 20SI: TensorFlow for Deep Learning Research at Stanford University
3 | 4 | It is a complete but primitive neural chatbot using a sequence-to-sequence model with an attentional decoder in TensorFlow. It was originally created by Chip Huyen as the starter code for an assignment in «TensorFlow for Deep Learning Research» cs20si.stanford.edu 5 | 6 | Original GitHub code repo: 7 | https://goo.gl/QH6M6E 8 | 9 | A companion assignment instruction sheet: web.stanford.edu - 10 | https://goo.gl/vfGQI4 11 | 12 | Claude Coulombe, TÉLUQ / UQAM Montréal, updated the code to be compatible with Python 3 and TensorFlow 1.1. As of May 31, 2017, it seems to work correctly, but I do not have the computing resources to train it for a long period of time.
13 | 14 | Detailed syllabus and lecture notes can be found at http://cs20si.stanford.edu 15 |
16 | 17 | Instructions:
18 | -------------
19 | 20 | 1) Check out this repository.
21 | git clone https://github.com/ClaudeCoulombe/tf-stanford-tutorials.git
22 | 23 | 2) Download and unzip the dataset and put it in the data sub-folder
24 | https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
25 | 26 | 3) Set DATA_PATH in config.py (line 20) to the folder that holds the unzipped dataset
27 | 28 | 4) Then run the data.py file
29 | python data.py
30 | It should create the folder 'processed', and then put a bunch of data files into it.
31 | 32 | 5) Train the model
33 | python chatbot.py --mode train
34 | You can interrupt training after a long time or once the loss is low enough;
35 | the TensorFlow model is saved in the checkpoints folder, so a later run can restore it
36 | 37 | 6) Interact / play with the Chatbot
38 | python chatbot.py --mode chat
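
Before running step 4, it can help to confirm that the DATA_PATH set in step 3 really contains the two corpus files that data.py reads. The following is a minimal sketch, assuming the file names given in config.py (`movie_lines.txt` and `movie_conversations.txt`); the helper `missing_corpus_files` and the example path are hypothetical and not part of the repository.

```python
import os

# File names taken from config.py (LINE_FILE and CONVO_FILE).
REQUIRED_FILES = ("movie_lines.txt", "movie_conversations.txt")


def missing_corpus_files(data_path, required=REQUIRED_FILES):
    """Return the required corpus files that are absent from data_path.

    Hypothetical helper, not part of the repository.
    """
    return [name for name in required
            if not os.path.isfile(os.path.join(data_path, name))]


if __name__ == "__main__":
    # Assumed location; replace it with the DATA_PATH you set in config.py.
    data_path = os.path.join("data", "cornell movie-dialogs corpus")
    missing = missing_corpus_files(data_path)
    if missing:
        print("Missing from {}: {}".format(data_path, ", ".join(missing)))
    else:
        print("Corpus files found; python data.py should be able to read them.")
```

An empty result means both files are where data.py expects them; anything else usually points to a wrong DATA_PATH or an archive that was unzipped into a nested folder.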
39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /chatbot/README.md: -------------------------------------------------------------------------------- 1 | <<<<<<< 2 | A neural chatbot using sequence to sequence model with 3 | attentional decoder. This is a fully functional chatbot. 4 | 5 | This is based on Google Translate Tensorflow model 6 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ 7 | 8 | Sequence to sequence model by Cho et al.(2014) 9 | 10 | Created by Chip Huyen as the starter code for assignment 3, 11 | class CS 20SI: "TensorFlow for Deep Learning Research" 12 | cs20si.stanford.edu 13 | 14 | Updated for Python 3 and TensorFlow 1.1 by Claude Coulombe - TÉLUQ / UQAM Montréal - May 31 2017 15 | 16 | The detailed assignment handout and information on training time can be found at http://web.stanford.edu/class/cs20si/assignments/a3.pdf 17 | 18 |

Sample conversation

19 | HUMAN ++++ hi
20 | BOT ++++ hi . what ' s your name ?
21 | HUMAN ++++ i'm chip
22 | BOT ++++ please .
23 | HUMAN ++++ please what?
24 | BOT ++++ i don ' t know what to say .
25 | HUMAN ++++ where are you from?
26 | BOT ++++ california .
27 | HUMAN ++++ do you like california?
28 | BOT ++++ i ' m okay .
29 | HUMAN ++++ i like california a lot
30 | BOT ++++ of what ?
31 | HUMAN ++++ i live here
32 | BOT ++++ and what is it ?
33 | HUMAN ++++ california
34 | BOT ++++ i don ' t know what to say .
35 | HUMAN ++++ i'm a student here
36 | BOT ++++ and get it out of here .
37 | HUMAN ++++ what do you like to do in your free time?
38 | BOT ++++ and what about the gun ?
39 | HUMAN ++++ do you have a gun?
40 | BOT ++++ yes .
41 | HUMAN ++++ you're dangerous!
42 | BOT ++++ i ' m not .
43 | HUMAN ++++ do you go hunting?
44 | BOT ++++ i can do .
45 | HUMAN ++++ what do you hunt?
46 | BOT ++++ virgin .
47 | HUMAN ++++ that's very bad
48 | BOT ++++ and you in love ?
49 | HUMAN ++++ no i'm not
50 | BOT ++++ please - -
51 | HUMAN ++++ okay, i am
52 | BOT ++++ and you are good .
53 | HUMAN ++++ yeah i am good too.
54 | BOT ++++ good good to you .
55 | HUMAN ++++ are you in love?
56 | BOT ++++ i ' m ready .
57 | HUMAN ++++ ready for what?
58 | BOT ++++ for what ?
59 | HUMAN ++++ for what?
60 | BOT ++++ i don ' t know what to say .
61 | HUMAN ++++ tell me
62 | BOT ++++ i can do it .
63 | HUMAN ++++ okay, tell me what you're ready for
64 | BOT ++++ i ' m doing you .
65 | HUMAN ++++ no you're not
66 | BOT ++++ please !
67 | HUMAN ++++ this is a strange conversation
68 | BOT ++++ i don ' t know what to say .
69 | HUMAN ++++ are you a guy?
70 | BOT ++++ of what ?
71 | HUMAN ++++ are you a boy?
72 | BOT ++++ i ' m fine .
73 | HUMAN ++++ are you a man?
74 | BOT ++++ i ' m a little .
75 | HUMAN ++++ what's your name?
76 | BOT ++++ chris -
77 | HUMAN ++++ chris who?
78 | BOT ++++ cornelius .
79 | HUMAN ++++ so i guess you're a guy.
80 | BOT ++++ i am .
81 | HUMAN ++++ okay
82 | BOT ++++ okay ?
83 | HUMAN ++++ do you have a girlfriend?
84 | BOT ++++ i ' m sure .
85 | HUMAN ++++ what's her name?
86 | BOT ++++ let ' s talk about something else .
87 | 88 | See output_convo.txt for more sample conversations. 89 | 90 |
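
The transcript format shown above (a speaker tag, then `++++`, then the utterance) is simple to read back programmatically, for example to inspect saved conversations. The following is a minimal sketch, assuming one turn per line and sessions separated by a line made only of `=` characters, as chatbot.py writes them; `parse_convo` is a hypothetical helper, not part of the repository.

```python
def parse_convo(lines):
    """Split transcript lines into sessions of (speaker, utterance) pairs.

    Hypothetical helper: assumes each turn looks like 'HUMAN ++++ text' or
    'BOT ++++ text' and that a line made only of '=' ends a session.
    """
    sessions, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # Separator line: close the current session if it has any turns.
            if current:
                sessions.append(current)
                current = []
            continue
        speaker, sep, text = line.partition("++++")
        if sep:  # ignore lines that do not contain the '++++' marker
            current.append((speaker.strip(), text.strip()))
    if current:
        sessions.append(current)
    return sessions


sample = [
    "HUMAN ++++ hi",
    "BOT ++++ hi . what ' s your name ?",
    "=============================================",
    "HUMAN ++++ how are you?",
    "BOT ++++ i ' m okay .",
]
for session in parse_convo(sample):
    print(session)
```

To use it on a real log, pass an open file object, since iterating over a file yields its lines.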

Usage

91 | 92 | Step 1: create a data folder in your project directory, download 93 | the Cornell Movie-Dialogs Corpus from 94 | https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html 95 | Unzip it 96 | 97 | Step 2: python data.py 98 |
This will do all the pre-processing for the Cornell dataset. 99 | 100 | Step 3: 101 | python chatbot.py --mode [train/chat]
102 | If the mode is train, the chatbot is trained. By default, the model will 103 | restore the previously trained weights (if there are any) and continue 104 | training from them. 105 | 106 | If you want to start training from scratch, please delete all the checkpoints 107 | in the checkpoints folder. 108 | 109 | If the mode is chat, you'll go into the interaction mode with the bot. 110 | 111 | By default, all the conversations you have with the chatbot will be written 112 | into the file output_convo.txt in the processed folder. If you run this chatbot, 113 | I kindly ask you to send me the output_convo.txt so that I can improve 114 | the chatbot. My email is huyenn@stanford.edu 115 | 116 | If you find the tutorial helpful, please head over to Anonymous Chatlog Donation 117 | to see how you can help us create the first realistic dialogue dataset. 118 | 119 | Thank you very much! 120 | >>>>>>> origin/master 121 | -------------------------------------------------------------------------------- /chatbot/chatbot.py: -------------------------------------------------------------------------------- 1 | """ A neural chatbot using sequence to sequence model with 2 | attentional decoder. 3 | 4 | This is based on Google Translate Tensorflow model 5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ 6 | 7 | Sequence to sequence model by Cho et al.(2014) 8 | 9 | Created by Chip Huyen as the starter code for assignment 3, 10 | class CS 20SI: "TensorFlow for Deep Learning Research" 11 | cs20si.stanford.edu 12 | 13 | This file contains the code to run the model. 14 | 15 | See readme.md for instruction on how to run the starter code. 
16 | """ 17 | 18 | 19 | 20 | import argparse 21 | import os 22 | import random 23 | import sys 24 | import time 25 | 26 | import numpy as np 27 | import tensorflow as tf 28 | 29 | from model import ChatBotModel 30 | import config 31 | import data 32 | 33 | def _get_random_bucket(train_buckets_scale): 34 | """ Get a random bucket from which to choose a training sample """ 35 | rand = random.random() 36 | return min([i for i in range(len(train_buckets_scale)) 37 | if train_buckets_scale[i] > rand]) 38 | 39 | def _assert_lengths(encoder_size, decoder_size, encoder_inputs, decoder_inputs, decoder_masks): 40 | """ Assert that the encoder inputs, decoder inputs, and decoder masks are 41 | of the expected lengths """ 42 | if len(encoder_inputs) != encoder_size: 43 | raise ValueError("Encoder length must be equal to the one in bucket," 44 | " %d != %d." % (len(encoder_inputs), encoder_size)) 45 | if len(decoder_inputs) != decoder_size: 46 | raise ValueError("Decoder length must be equal to the one in bucket," 47 | " %d != %d." % (len(decoder_inputs), decoder_size)) 48 | if len(decoder_masks) != decoder_size: 49 | raise ValueError("Weights length must be equal to the one in bucket," 50 | " %d != %d." % (len(decoder_masks), decoder_size)) 51 | 52 | def run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, forward_only): 53 | """ Run one step in training. 54 | @forward_only: boolean value to decide whether a backward path should be created 55 | forward_only is set to True when you just want to evaluate on the test set, 56 | or when you want to the bot to be in chat mode. """ 57 | encoder_size, decoder_size = config.BUCKETS[bucket_id] 58 | _assert_lengths(encoder_size, decoder_size, encoder_inputs, decoder_inputs, decoder_masks) 59 | 60 | # input feed: encoder inputs, decoder inputs, target_weights, as provided. 
61 | input_feed = {} 62 | for step in range(encoder_size): 63 | input_feed[model.encoder_inputs[step].name] = encoder_inputs[step] 64 | for step in range(decoder_size): 65 | input_feed[model.decoder_inputs[step].name] = decoder_inputs[step] 66 | input_feed[model.decoder_masks[step].name] = decoder_masks[step] 67 | 68 | last_target = model.decoder_inputs[decoder_size].name 69 | input_feed[last_target] = np.zeros([model.batch_size], dtype=np.int32) 70 | 71 | # output feed: depends on whether we do a backward step or not. 72 | if not forward_only: 73 | output_feed = [model.train_ops[bucket_id], # update op that does SGD. 74 | model.gradient_norms[bucket_id], # gradient norm. 75 | model.losses[bucket_id]] # loss for this batch. 76 | else: 77 | output_feed = [model.losses[bucket_id]] # loss for this batch. 78 | for step in range(decoder_size): # output logits. 79 | output_feed.append(model.outputs[bucket_id][step]) 80 | 81 | outputs = sess.run(output_feed, input_feed) 82 | if not forward_only: 83 | return outputs[1], outputs[2], None # Gradient norm, loss, no outputs. 84 | else: 85 | return None, outputs[0], outputs[1:] # No gradient norm, loss, outputs. 86 | 87 | def _get_buckets(): 88 | """ Load the dataset into buckets based on their lengths. 89 | train_buckets_scale is the inverval that'll help us 90 | choose a random bucket later on. 91 | """ 92 | test_buckets = data.load_data('test_ids.enc', 'test_ids.dec') 93 | data_buckets = data.load_data('train_ids.enc', 'train_ids.dec') 94 | train_bucket_sizes = [len(data_buckets[b]) for b in range(len(config.BUCKETS))] 95 | print("Number of samples in each bucket:\n", train_bucket_sizes) 96 | train_total_size = sum(train_bucket_sizes) 97 | # list of increasing numbers from 0 to 1 that we'll use to select a bucket. 
98 | train_buckets_scale = [sum(train_bucket_sizes[:i + 1]) / train_total_size 99 | for i in range(len(train_bucket_sizes))] 100 | print("Bucket scale:\n", train_buckets_scale) 101 | return test_buckets, data_buckets, train_buckets_scale 102 | 103 | def _get_skip_step(iteration): 104 | """ How many steps should the model train before it saves all the weights. """ 105 | if iteration < 100: 106 | return 30 107 | return 100 108 | 109 | def _check_restore_parameters(sess, saver): 110 | """ Restore the previously trained parameters if there are any. """ 111 | ckpt = tf.train.get_checkpoint_state(os.path.dirname(config.CPT_PATH + '/checkpoint')) 112 | if ckpt and ckpt.model_checkpoint_path: 113 | print("Loading parameters for the Chatbot") 114 | saver.restore(sess, ckpt.model_checkpoint_path) 115 | else: 116 | print("Initializing fresh parameters for the Chatbot") 117 | 118 | def _eval_test_set(sess, model, test_buckets): 119 | """ Evaluate on the test set. """ 120 | for bucket_id in range(len(config.BUCKETS)): 121 | if len(test_buckets[bucket_id]) == 0: 122 | print(" Test: empty bucket %d" % (bucket_id)) 123 | continue 124 | start = time.time() 125 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(test_buckets[bucket_id], 126 | bucket_id, 127 | batch_size=config.BATCH_SIZE) 128 | _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, 129 | decoder_masks, bucket_id, True) 130 | print('Test bucket {}: loss {}, time {}'.format(bucket_id, step_loss, time.time() - start)) 131 | 132 | def train(): 133 | """ Train the bot """ 134 | test_buckets, data_buckets, train_buckets_scale = _get_buckets() 135 | # in train mode, we need to create the backward path, so forwrad_only is False 136 | model = ChatBotModel(False, config.BATCH_SIZE) 137 | model.build_graph() 138 | 139 | saver = tf.train.Saver() 140 | 141 | with tf.Session() as sess: 142 | print('Running session') 143 | sess.run(tf.global_variables_initializer()) 144 | 
_check_restore_parameters(sess, saver) 145 | 146 | iteration = model.global_step.eval() 147 | total_loss = 0 148 | while True: 149 | skip_step = _get_skip_step(iteration) 150 | bucket_id = _get_random_bucket(train_buckets_scale) 151 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(data_buckets[bucket_id], 152 | bucket_id, 153 | batch_size=config.BATCH_SIZE) 154 | start = time.time() 155 | _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, False) 156 | total_loss += step_loss 157 | iteration += 1 158 | 159 | if iteration % skip_step == 0: 160 | print('Iter {}: loss {}, time {}'.format(iteration, total_loss/skip_step, time.time() - start)) 161 | start = time.time() 162 | total_loss = 0 163 | saver.save(sess, os.path.join(config.CPT_PATH, 'chatbot'), global_step=model.global_step) 164 | if iteration % (10 * skip_step) == 0: 165 | # Run evals on development set and print their loss 166 | _eval_test_set(sess, model, test_buckets) 167 | start = time.time() 168 | sys.stdout.flush() 169 | 170 | def _get_user_input(): 171 | """ Get user's input, which will be transformed into encoder input later """ 172 | print("HUMAN ++++>", end="") 173 | sys.stdout.flush() 174 | return sys.stdin.readline() 175 | 176 | def _find_right_bucket(length): 177 | """ Find the proper bucket for an encoder input based on its length """ 178 | return min([b for b in range(len(config.BUCKETS)) 179 | if config.BUCKETS[b][0] >= length]) 180 | 181 | TRACE = False 182 | 183 | def _construct_response(output_logits, inv_dec_vocab): 184 | """ Construct a response to the user's encoder input. 185 | @output_logits: the outputs from sequence to sequence wrapper. 186 | output_logits is decoder_size np array, each of dim 1 x DEC_VOCAB 187 | 188 | This is a greedy decoder - outputs are just argmaxes of output_logits. 
189 | """ 190 | if TRACE: 191 | print(output_logits[0]) 192 | outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits] 193 | # If there is an EOS symbol in outputs, cut them at that point. 194 | if config.EOS_ID in outputs: 195 | outputs = outputs[:outputs.index(config.EOS_ID)] 196 | # Print out sentence corresponding to outputs. 197 | return " ".join([tf.compat.as_str(inv_dec_vocab[output]) for output in outputs]) 198 | 199 | def chat(): 200 | """ In chat mode, we don't need to create the backward path 201 | """ 202 | _, enc_vocab = data.load_vocab(os.path.join(config.PROCESSED_PATH, 'vocab.enc')) 203 | inv_dec_vocab, _ = data.load_vocab(os.path.join(config.PROCESSED_PATH, 'vocab.dec')) 204 | 205 | model = ChatBotModel(True, batch_size=1) 206 | model.build_graph() 207 | 208 | saver = tf.train.Saver() 209 | 210 | with tf.Session() as sess: 211 | sess.run(tf.global_variables_initializer()) 212 | _check_restore_parameters(sess, saver) 213 | output_file = open(os.path.join(config.PROCESSED_PATH, config.OUTPUT_FILE), 'a+') 214 | # Decode from standard input. 215 | max_length = config.BUCKETS[-1][0] 216 | print('Welcome to TensorBro. Say something. Enter to exit. Max length is', max_length) 217 | while True: 218 | line = _get_user_input() 219 | if len(line) > 0 and line[-1] == '\n': 220 | line = line[:-1] 221 | if line == '': 222 | break 223 | output_file.write('HUMAN ++++ ' + line + '\n') 224 | # Get token-ids for the input sentence. 225 | token_ids = data.sentence2id(enc_vocab, str(line)) 226 | if (len(token_ids) > max_length): 227 | print('Max length I can handle is:', max_length) 228 | # The next input is read at the top of the loop; reading it here would discard it. 229 | continue 230 | # Which bucket does it belong to? 231 | bucket_id = _find_right_bucket(len(token_ids)) 232 | # Get a 1-element batch to feed the sentence to the model. 233 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch([(token_ids, [])], 234 | bucket_id, 235 | batch_size=1) 236 | # Get output logits for the sentence. 
237 | _, _, output_logits = run_step(sess, model, encoder_inputs, decoder_inputs, 238 | decoder_masks, bucket_id, True) 239 | response = _construct_response(output_logits, inv_dec_vocab) 240 | print('BOT ++++ ' + response) 241 | output_file.write('BOT ++++ ' + response + '\n') 242 | output_file.write('=============================================\n') 243 | output_file.close() 244 | 245 | def main(): 246 | parser = argparse.ArgumentParser() 247 | parser.add_argument('--mode', choices={'train', 'chat'}, 248 | default='train', help="mode. if not specified, it's in the train mode") 249 | args = parser.parse_args() 250 | 251 | if not os.path.isdir(config.PROCESSED_PATH): 252 | data.prepare_raw_data() 253 | data.process_data() 254 | print('Data ready!') 255 | # create checkpoints folder if there isn't one already 256 | data.make_dir(config.CPT_PATH) 257 | 258 | if args.mode == 'train': 259 | train() 260 | elif args.mode == 'chat': 261 | chat() 262 | 263 | if __name__ == '__main__': 264 | main() 265 | -------------------------------------------------------------------------------- /chatbot/config.py: -------------------------------------------------------------------------------- 1 | """ A neural chatbot using sequence to sequence model with 2 | attentional decoder. 3 | 4 | This is based on Google Translate Tensorflow model 5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ 6 | 7 | Sequence to sequence model by Cho et al.(2014) 8 | 9 | Created by Chip Huyen as the starter code for assignment 3, 10 | class CS 20SI: "TensorFlow for Deep Learning Research" 11 | cs20si.stanford.edu 12 | 13 | This file contains the hyperparameters for the model. 14 | 15 | See readme.md for instruction on how to run the starter code. 
16 | """ 17 | 18 | # parameters for processing the dataset 19 | # DATA_PATH = '/Users/Chip/data/cornell movie-dialogs corpus' 20 | DATA_PATH = '/data/cornell movie-dialogs corpus' 21 | CONVO_FILE = 'movie_conversations.txt' 22 | LINE_FILE = 'movie_lines.txt' 23 | OUTPUT_FILE = 'output_convo.txt' 24 | PROCESSED_PATH = 'processed' 25 | CPT_PATH = 'checkpoints' 26 | 27 | THRESHOLD = 2 28 | 29 | PAD_ID = 0 30 | UNK_ID = 1 31 | START_ID = 2 32 | EOS_ID = 3 33 | 34 | TESTSET_SIZE = 25000 35 | 36 | # model parameters 37 | """ Train encoder length distribution: 38 | [175, 92, 11883, 8387, 10656, 13613, 13480, 12850, 11802, 10165, 39 | 8973, 7731, 7005, 6073, 5521, 5020, 4530, 4421, 3746, 3474, 3192, 40 | 2724, 2587, 2413, 2252, 2015, 1816, 1728, 1555, 1392, 1327, 1248, 41 | 1128, 1084, 1010, 884, 843, 755, 705, 660, 649, 594, 558, 517, 475, 42 | 426, 444, 388, 349, 337] 43 | These buckets size seem to work the best 44 | """ 45 | # [19530, 17449, 17585, 23444, 22884, 16435, 17085, 18291, 18931] 46 | # BUCKETS = [(6, 8), (8, 10), (10, 12), (13, 15), (16, 19), (19, 22), (23, 26), (29, 32), (39, 44)] 47 | 48 | # [37049, 33519, 30223, 33513, 37371] 49 | # BUCKETS = [(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)] 50 | 51 | BUCKETS = [(8, 10), (12, 14), (16, 19)] 52 | 53 | NUM_LAYERS = 3 54 | HIDDEN_SIZE = 256 55 | BATCH_SIZE = 64 56 | 57 | LR = 0.5 58 | MAX_GRAD_NORM = 5.0 59 | 60 | NUM_SAMPLES = 512 61 | ENC_VOCAB = 24449 62 | DEC_VOCAB = 24633 63 | -------------------------------------------------------------------------------- /chatbot/data.py: -------------------------------------------------------------------------------- 1 | """ A neural chatbot using sequence to sequence model with 2 | attentional decoder. 
3 | 4 | This is based on Google Translate Tensorflow model 5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ 6 | 7 | Sequence to sequence model by Cho et al.(2014) 8 | 9 | Created by Chip Huyen as the starter code for assignment 3, 10 | class CS 20SI: "TensorFlow for Deep Learning Research" 11 | cs20si.stanford.edu 12 | 13 | This file contains the code to do the pre-processing for the 14 | Cornell Movie-Dialogs Corpus. 15 | 16 | See readme.md for instruction on how to run the starter code. 17 | """ 18 | 19 | import random 20 | import re 21 | import os 22 | 23 | import numpy as np 24 | 25 | import config 26 | 27 | def get_lines(): 28 | id2line = {} 29 | file_path = os.path.join(config.DATA_PATH, config.LINE_FILE) 30 | print(file_path) 31 | # with open(file_path, 'rb') as f: 32 | with open(file_path, 'r', encoding="latin-1") as f: 33 | lines = f.readlines() 34 | for line in lines: 35 | parts = line.split(' +++$+++ ') 36 | if len(parts) == 5: 37 | if parts[4][-1] == '\n': 38 | parts[4] = parts[4][:-1] 39 | id2line[parts[0]] = parts[4] 40 | return id2line 41 | 42 | def get_convos(): 43 | """ Get conversations from the raw data """ 44 | file_path = os.path.join(config.DATA_PATH, config.CONVO_FILE) 45 | convos = [] 46 | # with open(file_path, 'rb') as f: 47 | with open(file_path, 'r', encoding="latin-1") as f: 48 | for line in f.readlines(): 49 | parts = line.split(' +++$+++ ') 50 | if len(parts) == 4: 51 | convo = [] 52 | for line in parts[3][1:-2].split(', '): 53 | convo.append(line[1:-1]) 54 | convos.append(convo) 55 | 56 | return convos 57 | 58 | def question_answers(id2line, convos): 59 | """ Divide the dataset into two sets: questions and answers. 
""" 60 | questions, answers = [], [] 61 | for convo in convos: 62 | for index, line in enumerate(convo[:-1]): 63 | questions.append(id2line[convo[index]]) 64 | answers.append(id2line[convo[index + 1]]) 65 | assert len(questions) == len(answers) 66 | return questions, answers 67 | 68 | def prepare_dataset(questions, answers): 69 | # create path to store all the train & test encoder & decoder 70 | make_dir(config.PROCESSED_PATH) 71 | 72 | # random convos to create the test set 73 | test_ids = random.sample([i for i in range(len(questions))],config.TESTSET_SIZE) 74 | 75 | filenames = ['train.enc', 'train.dec', 'test.enc', 'test.dec'] 76 | files = [] 77 | for filename in filenames: 78 | # files.append(open(os.path.join(config.PROCESSED_PATH, filename),'wb')) 79 | files.append(open(os.path.join(config.PROCESSED_PATH, filename),'w')) 80 | 81 | for i in range(len(questions)): 82 | if i in test_ids: 83 | files[2].write(questions[i] + '\n') 84 | files[3].write(answers[i] + '\n') 85 | else: 86 | files[0].write(questions[i] + '\n') 87 | files[1].write(answers[i] + '\n') 88 | 89 | for file in files: 90 | file.close() 91 | 92 | def make_dir(path): 93 | """ Create a directory if there isn't one already. """ 94 | try: 95 | os.mkdir(path) 96 | except OSError: 97 | pass 98 | 99 | def basic_tokenizer(line, normalize_digits=True): 100 | """ A basic tokenizer to tokenize text into tokens. 101 | Feel free to change this to suit your need. 
""" 102 | line = re.sub('', '', line) 103 | line = re.sub('', '', line) 104 | line = re.sub('\[', '', line) 105 | line = re.sub('\]', '', line) 106 | words = [] 107 | # _WORD_SPLIT = re.compile(b"([.,!?\"'-<>:;)(])") 108 | _WORD_SPLIT = re.compile("([.,!?\"'-<>:;)(])") 109 | _DIGIT_RE = re.compile(r"\d") 110 | for fragment in line.strip().lower().split(): 111 | for token in re.split(_WORD_SPLIT, fragment): 112 | if not token: 113 | continue 114 | if normalize_digits: 115 | # token = re.sub(_DIGIT_RE, b'#', token) 116 | token = re.sub(_DIGIT_RE, '#', token) 117 | words.append(token) 118 | return words 119 | 120 | def build_vocab(filename, normalize_digits=True): 121 | in_path = os.path.join(config.PROCESSED_PATH, filename) 122 | out_path = os.path.join(config.PROCESSED_PATH, 'vocab.{}'.format(filename[-3:])) 123 | 124 | vocab = {} 125 | # with open(in_path, 'rb') as f: 126 | with open(in_path, 'r', encoding="latin-1") as f: 127 | for line in f.readlines(): 128 | for token in basic_tokenizer(line): 129 | if not token in vocab: 130 | vocab[token] = 0 131 | vocab[token] += 1 132 | 133 | sorted_vocab = sorted(vocab, key=vocab.get, reverse=True) 134 | # with open(out_path, 'wb') as f: 135 | with open(out_path, 'w') as f: 136 | f.write('' + '\n') 137 | f.write('' + '\n') 138 | f.write('' + '\n') 139 | f.write('<\s>' + '\n') 140 | index = 4 141 | for word in sorted_vocab: 142 | if vocab[word] < config.THRESHOLD: 143 | # with open('config.py', 'ab') as cf: 144 | with open('config.py', 'a') as cf: 145 | if filename[-3:] == 'enc': 146 | cf.write('ENC_VOCAB = ' + str(index) + '\n') 147 | else: 148 | cf.write('DEC_VOCAB = ' + str(index) + '\n') 149 | break 150 | f.write(word + '\n') 151 | index += 1 152 | 153 | def load_vocab(vocab_path): 154 | # with open(vocab_path, 'rb') as f: 155 | with open(vocab_path, 'r', encoding='latin-1') as f: 156 | words = f.read().splitlines() 157 | return words, {words[i]: i for i in range(len(words))} 158 | 159 | def sentence2id(vocab, line): 160 
| return [vocab.get(token, config.UNK_ID) for token in basic_tokenizer(line)] 161 | 162 | def token2id(data, mode): 163 | """ Convert all the tokens in the data into their corresponding 164 | index in the vocabulary. """ 165 | vocab_path = 'vocab.' + mode 166 | in_path = data + '.' + mode 167 | out_path = data + '_ids.' + mode 168 | 169 | _, vocab = load_vocab(os.path.join(config.PROCESSED_PATH, vocab_path)) 170 | # in_file = open(os.path.join(config.PROCESSED_PATH, in_path), 'rb') 171 | in_file = open(os.path.join(config.PROCESSED_PATH, in_path), 'r') 172 | # out_file = open(os.path.join(config.PROCESSED_PATH, out_path), 'wb') 173 | out_file = open(os.path.join(config.PROCESSED_PATH, out_path), 'w') 174 | 175 | lines = in_file.read().splitlines() 176 | for line in lines: 177 | if mode == 'dec': # only decoder sequences get the start (START_ID) and end (EOS_ID) symbols 178 | ids = [config.START_ID] 179 | else: 180 | ids = [] 181 | ids.extend(sentence2id(vocab, line)) 182 | # ids.extend([vocab.get(token, config.UNK_ID) for token in basic_tokenizer(line)]) 183 | if mode == 'dec': 184 | ids.append(config.EOS_ID) 185 | out_file.write(' '.join(str(id_) for id_ in ids) + '\n') 186 | 187 | def prepare_raw_data(): 188 | print('Preparing raw data into train set and test set ...') 189 | id2line = get_lines() 190 | convos = get_convos() 191 | questions, answers = question_answers(id2line, convos) 192 | prepare_dataset(questions, answers) 193 | 194 | def process_data(): 195 | print('Preparing data to be model-ready ...') 196 | build_vocab('train.enc') 197 | build_vocab('train.dec') 198 | token2id('train', 'enc') 199 | token2id('train', 'dec') 200 | token2id('test', 'enc') 201 | token2id('test', 'dec') 202 | 203 | def load_data(enc_filename, dec_filename, max_training_size=None): 204 | # encode_file = open(os.path.join(config.PROCESSED_PATH, enc_filename), 'rb') 205 | encode_file = open(os.path.join(config.PROCESSED_PATH, enc_filename), 'r') 206 | # decode_file = open(os.path.join(config.PROCESSED_PATH, dec_filename), 'rb') 
207 | decode_file = open(os.path.join(config.PROCESSED_PATH, dec_filename), 'r') 208 | encode, decode = encode_file.readline(), decode_file.readline() 209 | data_buckets = [[] for _ in config.BUCKETS] 210 | i = 0 211 | while encode and decode: 212 | if (i + 1) % 10000 == 0: 213 | print("Bucketing conversation number", i) 214 | encode_ids = [int(id_) for id_ in encode.split()] 215 | decode_ids = [int(id_) for id_ in decode.split()] 216 | for bucket_id, (encode_max_size, decode_max_size) in enumerate(config.BUCKETS): 217 | if len(encode_ids) <= encode_max_size and len(decode_ids) <= decode_max_size: 218 | data_buckets[bucket_id].append([encode_ids, decode_ids]) 219 | break 220 | encode, decode = encode_file.readline(), decode_file.readline() 221 | i += 1 222 | return data_buckets 223 | 224 | def _pad_input(input_, size): 225 | return input_ + [config.PAD_ID] * (size - len(input_)) 226 | 227 | def _reshape_batch(inputs, size, batch_size): 228 | """ Create batch-major inputs. Batch inputs are just re-indexed inputs 229 | """ 230 | batch_inputs = [] 231 | for length_id in range(size): 232 | batch_inputs.append(np.array([inputs[batch_id][length_id] 233 | for batch_id in range(batch_size)], dtype=np.int32)) 234 | return batch_inputs 235 | 236 | 237 | def get_batch(data_bucket, bucket_id, batch_size=1): 238 | """ Return one batch to feed into the model """ 239 | # only pad to the max length of the bucket 240 | encoder_size, decoder_size = config.BUCKETS[bucket_id] 241 | encoder_inputs, decoder_inputs = [], [] 242 | 243 | for _ in range(batch_size): 244 | encoder_input, decoder_input = random.choice(data_bucket) 245 | # pad both encoder and decoder, reverse the encoder 246 | encoder_inputs.append(list(reversed(_pad_input(encoder_input, encoder_size)))) 247 | decoder_inputs.append(_pad_input(decoder_input, decoder_size)) 248 | 249 | # now we create batch-major vectors from the data selected above. 
250 | batch_encoder_inputs = _reshape_batch(encoder_inputs, encoder_size, batch_size) 251 | batch_decoder_inputs = _reshape_batch(decoder_inputs, decoder_size, batch_size) 252 | 253 | # create decoder_masks to be 0 for decoders that are padding. 254 | batch_masks = [] 255 | for length_id in range(decoder_size): 256 | batch_mask = np.ones(batch_size, dtype=np.float32) 257 | for batch_id in range(batch_size): 258 | # we set mask to 0 if the corresponding target is a PAD symbol. 259 | # the corresponding decoder is decoder_input shifted by 1 forward. 260 | if length_id < decoder_size - 1: 261 | target = decoder_inputs[batch_id][length_id + 1] 262 | if length_id == decoder_size - 1 or target == config.PAD_ID: 263 | batch_mask[batch_id] = 0.0 264 | batch_masks.append(batch_mask) 265 | return batch_encoder_inputs, batch_decoder_inputs, batch_masks 266 | 267 | if __name__ == '__main__': 268 | prepare_raw_data() 269 | process_data() 270 | -------------------------------------------------------------------------------- /chatbot/model.py: -------------------------------------------------------------------------------- 1 | """ A neural chatbot using sequence to sequence model with 2 | attentional decoder. 3 | 4 | This is based on Google Translate Tensorflow model 5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ 6 | 7 | Sequence to sequence model by Cho et al.(2014) 8 | 9 | Created by Chip Huyen as the starter code for assignment 3, 10 | class CS 20SI: "TensorFlow for Deep Learning Research" 11 | cs20si.stanford.edu 12 | 13 | This file contains the code to build the model 14 | 15 | See readme.md for instruction on how to run the starter code. 16 | """ 17 | 18 | 19 | import time 20 | 21 | import numpy as np 22 | import tensorflow as tf 23 | 24 | import config 25 | 26 | class ChatBotModel(object): 27 | def __init__(self, forward_only, batch_size): 28 | """forward_only: if set, we do not construct the backward pass in the model. 
        """
        print('Initialize new model')
        self.fw_only = forward_only
        self.batch_size = batch_size

    def _create_placeholders(self):
        # Feeds for inputs. It's a list of placeholders
        print('Create placeholders')
        self.encoder_inputs = [tf.placeholder(tf.int32, shape=[None], name='encoder{}'.format(i))
                               for i in range(config.BUCKETS[-1][0])]
        self.decoder_inputs = [tf.placeholder(tf.int32, shape=[None], name='decoder{}'.format(i))
                               for i in range(config.BUCKETS[-1][1] + 1)]
        self.decoder_masks = [tf.placeholder(tf.float32, shape=[None], name='mask{}'.format(i))
                              for i in range(config.BUCKETS[-1][1] + 1)]

        # Our targets are decoder inputs shifted by one (to ignore <GO> symbol)
        self.targets = self.decoder_inputs[1:]

    def _inference(self):
        print('Create inference')
        # Default to no projection and the standard softmax loss, so that
        # _create_loss() can read both attributes even when sampled softmax is off.
        self.output_projection = None
        self.softmax_loss_function = None
        # If we use sampled softmax, we need an output projection.
        # Sampled softmax only makes sense if we sample less than vocabulary size.
        if config.NUM_SAMPLES > 0 and config.NUM_SAMPLES < config.DEC_VOCAB:
            w = tf.get_variable('proj_w', [config.HIDDEN_SIZE, config.DEC_VOCAB])
            b = tf.get_variable('proj_b', [config.DEC_VOCAB])
            self.output_projection = (w, b)

            # def sampled_loss(inputs, labels):
            def sampled_loss(labels, inputs):
                labels = tf.reshape(labels, [-1, 1])
                # return tf.nn.sampled_softmax_loss(tf.transpose(w), b, inputs, labels,
                #                                   config.NUM_SAMPLES, config.DEC_VOCAB)
                return tf.nn.sampled_softmax_loss(tf.transpose(w), b, labels, inputs,
                                                  config.NUM_SAMPLES, config.DEC_VOCAB)
            self.softmax_loss_function = sampled_loss

        # single_cell = tf.nn.rnn_cell.GRUCell(config.HIDDEN_SIZE)
        # self.cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * config.NUM_LAYERS)
        # Build one GRUCell per layer: repeating a single cell object with
        # [single_cell] * NUM_LAYERS reuses the same instance for every layer.
        self.cell = tf.contrib.rnn.MultiRNNCell(
            [tf.contrib.rnn.GRUCell(config.HIDDEN_SIZE) for _ in range(config.NUM_LAYERS)])

    def _create_loss(self):
        print('Creating loss... \nIt might take a couple of minutes depending on how many buckets you have.')
        start = time.time()
        def _seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
            # return tf.nn.seq2seq.embedding_attention_seq2seq(
            return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
                    encoder_inputs, decoder_inputs, self.cell,
                    num_encoder_symbols=config.ENC_VOCAB,
                    num_decoder_symbols=config.DEC_VOCAB,
                    embedding_size=config.HIDDEN_SIZE,
                    output_projection=self.output_projection,
                    feed_previous=do_decode)

        if self.fw_only:
            # self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
            self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
                                        self.encoder_inputs,
                                        self.decoder_inputs,
                                        self.targets,
                                        self.decoder_masks,
                                        config.BUCKETS,
                                        lambda x, y: _seq2seq_f(x, y, True),
                                        softmax_loss_function=self.softmax_loss_function)
            # If we use output projection, we need to project outputs for decoding.
            if self.output_projection:
                for bucket in range(len(config.BUCKETS)):
                    self.outputs[bucket] = [tf.matmul(output,
                                            self.output_projection[0]) + self.output_projection[1]
                                            for output in self.outputs[bucket]]
        else:
            # self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
            self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
                                        self.encoder_inputs,
                                        self.decoder_inputs,
                                        self.targets,
                                        self.decoder_masks,
                                        config.BUCKETS,
                                        lambda x, y: _seq2seq_f(x, y, False),
                                        softmax_loss_function=self.softmax_loss_function)
        print('Time:', time.time() - start)

    def _creat_optimizer(self):
        print('Create optimizer... \nIt might take a couple of minutes depending on how many buckets you have.')
        with tf.variable_scope('training') as scope:
            self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

            if not self.fw_only:
                self.optimizer = tf.train.GradientDescentOptimizer(config.LR)
                trainables = tf.trainable_variables()
                self.gradient_norms = []
                self.train_ops = []
                start = time.time()
                for bucket in range(len(config.BUCKETS)):

                    clipped_grads, norm = tf.clip_by_global_norm(tf.gradients(self.losses[bucket],
                                                                 trainables),
                                                                 config.MAX_GRAD_NORM)
                    self.gradient_norms.append(norm)
                    self.train_ops.append(self.optimizer.apply_gradients(list(zip(clipped_grads, trainables)),
                                                            global_step=self.global_step))
                    print('Creating opt for bucket {} took {} seconds'.format(bucket, time.time() - start))
                    start = time.time()


    def _create_summary(self):
        pass

    def build_graph(self):
        self._create_placeholders()
        self._inference()
        self._create_loss()
        self._creat_optimizer()
        self._create_summary()

--------------------------------------------------------------------------------
/chatbot/output_convo.txt:
--------------------------------------------------------------------------------
HUMAN ++++ hi
BOT ++++ hi
HUMAN ++++ how are you?
BOT ++++ i ' m okay .
HUMAN ++++ really?
BOT ++++ i don ' t know what to say .
HUMAN ++++ how're you?
BOT ++++ i ' m fine .
HUMAN ++++ where're you from?
BOT ++++ california .
HUMAN ++++ which city?
BOT ++++ let ' s talk about something else .
HUMAN ++++ you don't like california?
BOT ++++ oh . . .
HUMAN ++++ why not?
BOT ++++ i can ' t .
HUMAN ++++ how old are you?
BOT ++++ eighteen .
=============================================
HUMAN ++++ hi
BOT ++++ hi
HUMAN ++++ how are you?
BOT ++++ i ' m okay .
HUMAN ++++ what's your name?
BOT ++++ let ' s talk about something else .
=============================================
HUMAN ++++ hi
BOT ++++ hi
HUMAN ++++ how are you?
BOT ++++ i ' m okay .
HUMAN ++++ what's your name?
BOT ++++ what ?
HUMAN ++++ what's your name?
BOT ++++ let ' s talk about something else .
HUMAN ++++ what's your name?
BOT ++++ let ' s talk about something else .
HUMAN ++++ what's your name?
BOT ++++ let ' s talk about something else .
HUMAN ++++ what's your name?
BOT ++++ prisoner .
HUMAN ++++ ha ha you really don't want to tell your name
BOT ++++ i ' m not going to do it .
HUMAN ++++ how old are you?
BOT ++++ can you guess ?
HUMAN ++++ eighteen?
BOT ++++ in the bathroom .
HUMAN ++++ in the bathroom?
BOT ++++ in the next time .
HUMAN ++++ why are you talking so funny?
BOT ++++ i ' m looking .
HUMAN ++++ where are you from?
BOT ++++ general .
HUMAN ++++ where are you from?
BOT ++++ how ?
HUMAN ++++ do you like the us?
BOT ++++ i ' m sure .
HUMAN ++++ do you vote?
BOT ++++ i ' m afraid .
HUMAN ++++ what are you afraid of?
BOT ++++ i ' m looking .
HUMAN ++++ what are you looking for?
BOT ++++ can you guess ?
HUMAN ++++ no
BOT ++++ i don ' t know what to say .
=============================================
HUMAN ++++ hi
BOT ++++ hi . what ' s your name ?
HUMAN ++++ i'm chip
BOT ++++ please .
HUMAN ++++ please what?
BOT ++++ i don ' t know what to say .
HUMAN ++++ where are you from?
BOT ++++ california .
HUMAN ++++ do you like california?
BOT ++++ i ' m okay .
HUMAN ++++ i like california a lot
BOT ++++ of what ?
HUMAN ++++ i live here
BOT ++++ and what is it ?
HUMAN ++++ california
BOT ++++ i don ' t know what to say .
HUMAN ++++ i'm a student here
BOT ++++ and get it out of here .
HUMAN ++++ what do you like to do in your free time?
BOT ++++ and what about the gun ?
HUMAN ++++ do you have a gun?
BOT ++++ yes .
HUMAN ++++ you're dangerous!
BOT ++++ i ' m not .
HUMAN ++++ do you go hunting?
BOT ++++ i can do .
HUMAN ++++ what do you hunt?
BOT ++++ virgin .
HUMAN ++++ that's very bad
BOT ++++ and you in love ?
HUMAN ++++ no i'm not
BOT ++++ please - -
HUMAN ++++ okay, i am
BOT ++++ and you are good .
HUMAN ++++ yeah i am good too.
BOT ++++ good good to you .
HUMAN ++++ are you in love?
BOT ++++ i ' m ready .
HUMAN ++++ ready for what?
BOT ++++ for what ?
HUMAN ++++ for what?
BOT ++++ i don ' t know what to say .
HUMAN ++++ tell me
BOT ++++ i can do it .
HUMAN ++++ okay, tell me what you're ready for
BOT ++++ i ' m doing you .
HUMAN ++++ no you're not
BOT ++++ please !
HUMAN ++++ this is a strange conversation
BOT ++++ i don ' t know what to say .
HUMAN ++++ are you a guy?
BOT ++++ of what ?
HUMAN ++++ are you a boy?
BOT ++++ i ' m fine .
HUMAN ++++ are you a man?
BOT ++++ i ' m a little .
HUMAN ++++ what's your name?
BOT ++++ chris -
HUMAN ++++ chris who?
BOT ++++ cornelius .
HUMAN ++++ so i guess you're a guy.
BOT ++++ i am .
HUMAN ++++ okay
BOT ++++ okay ?
HUMAN ++++ do you have a girlfriend?
BOT ++++ i ' m sure .
HUMAN ++++ what's her name?
BOT ++++ let ' s talk about something else .
=============================================
HUMAN ++++ hi chris
BOT ++++ hi
HUMAN ++++ how're you today?
BOT ++++ okay .
HUMAN ++++ the weather is nice today
BOT ++++ i don ' t know what to say .
HUMAN ++++ just say the weather is nice
BOT ++++ i don ' t know what to say .
=============================================
HUMAN ++++ hi
BOT ++++ hi
HUMAN ++++ how're you today?
BOT ++++ okay .
HUMAN ++++ just okay?
BOT ++++ i ' m okay .
HUMAN ++++ what's on your mind?
BOT ++++ i ' m in my room .
HUMAN ++++ doing what?
BOT ++++ i don ' t know what to say .
HUMAN ++++ you're boring.
BOT ++++ i ' m not .
HUMAN ++++ prove it
BOT ++++ and what ?
HUMAN ++++ what what?
BOT ++++ sorry , i ' m still very stupid .
HUMAN ++++ it's okay. we're all stupid sometimes.
BOT ++++ ' s good to see .
HUMAN ++++ what's good to see?
BOT ++++ i ' m in .
HUMAN ++++ you're in your room?
BOT ++++ i ' m here .
HUMAN ++++ where's your room?
BOT ++++ i ' m in my room .
HUMAN ++++ are you alone?
BOT ++++ i ' m fine .
HUMAN ++++ are you by yourself?
BOT ++++ in the morning .
HUMAN ++++ what time is it?
BOT ++++ # # # # # # # .
=============================================
HUMAN ++++ hi
BOT ++++ hi . what ' s your name ?
HUMAN ++++ i'm marie
BOT ++++ you are .
HUMAN ++++ yes i am
BOT ++++ and you are my friend .
HUMAN ++++ yes you are
BOT ++++ and you are my best .
HUMAN ++++ that's so sweet
BOT ++++ please .
HUMAN ++++ do you know what time it is?
BOT ++++ i don ' t know what to say .
=============================================
HUMAN ++++ what time is it?
BOT ++++ 5 9 6 9 1 4 5 .
HUMAN ++++ it doesn't make any sense
BOT ++++ are you ?
HUMAN ++++ i do
BOT ++++ and you are ?
HUMAN ++++ i'm a human.
BOT ++++ you in .
HUMAN ++++ are you in trouble?
BOT ++++ i ' m in .
HUMAN ++++ what trouble?
BOT ++++ i ' m in love .
HUMAN ++++ wow that's nice.
BOT ++++ like that ?
HUMAN ++++ yeah
BOT ++++ i don ' t know what to say .
HUMAN ++++ who are you?
BOT ++++ i don ' t understand .
HUMAN ++++ who are you?
BOT ++++ no idea .
HUMAN ++++ what's your name?
BOT ++++ chris -
HUMAN ++++ chris who?
BOT ++++ cornelius .
HUMAN ++++ where are you from?
BOT ++++ california .
HUMAN ++++ are you american?
BOT ++++ i ' m fine .
HUMAN ++++ of course you're fine.
BOT ++++ and you ?
HUMAN ++++ i'm okay.
BOT ++++ you are .
HUMAN ++++ yes, i am.
BOT ++++ and you are my friend .
HUMAN ++++ yes, i am.
BOT ++++ and you are my friend .
HUMAN ++++ that's correct.
BOT ++++ and you ' re in trouble ?
HUMAN ++++ no, i'm not. why did you think so?
BOT ++++ because it ' s a good idea .
HUMAN ++++ what is your good idea?
BOT ++++ i ' ll be fine .
=============================================
HUMAN ++++ hello
BOT ++++ and you are in trouble ?
HUMAN ++++ what did you say that?
BOT ++++ ' s life .
HUMAN ++++ i'm not in trouble.
BOT ++++ what the hell is that ?
HUMAN ++++ okay, what trouble?
BOT ++++ i ' m looking .
HUMAN ++++ why did you say i'm in trouble?
BOT ++++ let ' s talk about something else .
HUMAN ++++ what do you want to talk about?
BOT ++++ i can ' t .
HUMAN ++++ what can't?
BOT ++++ i ' ll have a drink .
HUMAN ++++ okay, me too.
BOT ++++ you were here ?
HUMAN ++++ i'm still here.
BOT ++++ what is it ?
HUMAN ++++ i thought you wanted a drink.
BOT ++++ i ' m not .
HUMAN ++++ you just said so!
BOT ++++ i was wrong .
HUMAN ++++ it's fine.
BOT ++++ okay ?
HUMAN ++++ yeah
BOT ++++ i don ' t know what to say .
HUMAN ++++ do you drink a lot?
BOT ++++ i can do it .
HUMAN ++++ do you drink often?
BOT ++++ in the time .
HUMAN ++++ that doesn't make any sense.
BOT ++++ are you ?
HUMAN ++++ what do you drink?
BOT ++++ ' s in a drink .
HUMAN ++++ do you drink alcohol?
BOT ++++ in time .
HUMAN ++++ i like beer.
BOT ++++ oh . . .
HUMAN ++++ don't you like beer?
BOT ++++ i don ' t know what to say .
HUMAN ++++ where do you live?
BOT ++++ up .
HUMAN ++++ up where?
BOT ++++ in the hospital .
HUMAN ++++ i'm sorry. are you okay?
BOT ++++ i ' m fine .
HUMAN ++++ why are you in the hospital?
BOT ++++ and what ?
HUMAN ++++ how many words do you know?
BOT ++++ i ' d do it .
HUMAN ++++ how many?
BOT ++++ i ' m not .
HUMAN ++++ how many?
BOT ++++ i ' m not .
HUMAN ++++ how many?
BOT ++++ i ' m not .
HUMAN ++++ whatever
BOT ++++ i don ' t know what to say .
HUMAN ++++ you're boring.
BOT ++++ i ' m not .
HUMAN ++++ yes you are.
BOT ++++ and you are my friend .
HUMAN ++++ no i'm not.
BOT ++++ please .
HUMAN ++++ okay fine. i'm your friend.
BOT ++++ you ' re my mother ?
HUMAN ++++ i have no son like you.
BOT ++++ i don ' t know what to say .
HUMAN ++++ i have no son.
BOT ++++ i see you .
HUMAN ++++ i have no children.
BOT ++++ you are .
HUMAN ++++ omg you're so dumb.
BOT ++++ i ' m not .
HUMAN ++++ you're stupid!
BOT ++++ i ' m not .
HUMAN ++++ do you always talk like this?
BOT ++++ of course .
HUMAN ++++ how old are you?
BOT ++++ eighteen .
HUMAN ++++ you're big enough.
BOT ++++ i ' m not .
HUMAN ++++ are you an adult?
BOT ++++ i ' m going .
HUMAN ++++ you're going to be an adult?
BOT ++++ i ' m looking for you .
HUMAN ++++ i'm in my room.
BOT ++++ and your mother ?
HUMAN ++++ she's in vietnam.
BOT ++++ you ' re here .
HUMAN ++++ yes i'm here.
BOT ++++ and what are you talking about ?
HUMAN ++++ you asked me where i am.
BOT ++++ and you are your friend .
HUMAN ++++ i need to go now.
BOT ++++ i will not .
HUMAN ++++ that's fine. you can stay. i'm leaving.
BOT ++++ oh . . .
HUMAN ++++ bye
BOT ++++ i don ' t know what to say .
HUMAN ++++ goodbye
BOT ++++ i don ' t know what to say .
HUMAN ++++ bye bye
BOT ++++ okay ?
HUMAN ++++ see you later
BOT ++++ i don ' t know what to say .
=============================================
HUMAN ++++ hi
BOT ++++ hi . what ' s your name ?
HUMAN ++++ i'm maria
BOT ++++ please .
HUMAN ++++ i'm jackson
BOT ++++ you are .
HUMAN ++++ i'm jack
BOT ++++ let ' s talk about something else .
HUMAN ++++ i have no go
BOT ++++ you are ?
HUMAN ++++ i have to go
BOT ++++ and what do you mean ?
HUMAN ++++ i'm leaving.
BOT ++++ what ?
HUMAN ++++ yeah, bye
BOT ++++ bye . have a nice day !
=============================================
HUMAN ++++ that's very bad.
BOT ++++ i ' ve got to work .
HUMAN ++++ what do you do?
BOT ++++ i ' ll be there .
HUMAN ++++ where do you work?
BOT ++++ in the trunk .
HUMAN ++++ are you a machenics?
BOT ++++ i ' m not .
HUMAN ++++ what are you?
BOT ++++ no idea .
=============================================

--------------------------------------------------------------------------------
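The transcript in output_convo.txt appears to follow a simple layout: each turn is a speaker tag (HUMAN or BOT), the delimiter ` ++++ `, then the utterance, and conversations are separated by a line made only of `=` characters. A minimal sketch of reading that layout back into Python is shown here. The function name `parse_transcript` and the two constants are hypothetical helpers that are not part of the repository, and the layout itself is inferred from the transcript rather than documented anywhere.

```python
# Hypothetical helper, not part of the repository: parses the transcript
# layout inferred from output_convo.txt.
SEPARATOR_CHAR = '='          # conversations are split by lines of '=' only
TURN_DELIMITER = ' ++++ '     # sits between the speaker tag and the utterance


def parse_transcript(text):
    """Return a list of conversations, each a list of (speaker, utterance)."""
    conversations = []
    current = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if set(line) == {SEPARATOR_CHAR}:
            # A separator line closes the conversation collected so far.
            if current:
                conversations.append(current)
                current = []
            continue
        speaker, delimiter, utterance = line.partition(TURN_DELIMITER)
        if not delimiter:
            # Skip anything that is not a "SPEAKER ++++ utterance" line.
            continue
        current.append((speaker, utterance))
    if current:
        conversations.append(current)
    return conversations


sample = """HUMAN ++++ hi
BOT ++++ hi
=============================================
HUMAN ++++ what's your name?
BOT ++++ let ' s talk about something else .
"""

conversations = parse_transcript(sample)
for index, turns in enumerate(conversations):
    print('conversation', index, 'has', len(turns), 'turns')
```

Grouping turns per conversation makes it easy to, for example, count how often the bot falls back to a stock reply such as "i don ' t know what to say ." in each session.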