├── .gitignore
├── README.md
└── chatbot
├── README.md
├── chatbot.py
├── config.py
├── data.py
├── model.py
└── output_convo.txt
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | *.pdf
3 |
4 | *.SUNet
5 | *.pyc
6 |
7 |
8 | examples/checkpoints/*
9 |
10 | examples/chatbot/processed/*
11 | examples/chatbot/checkpoints/*
12 | examples/chatbot/data_analysis.py
13 |
14 | assignments/chatbot/processed/*
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # tf-stanford-tutorials Chatbot
2 | This repository contains the code example for a Chatbot from the course CS 20SI: TensorFlow for Deep Learning Research at Stanford University
3 |
4 | It is a complete but primitive neural chatbot that uses a sequence-to-sequence model with an attentional decoder in TensorFlow. It was originally created by Chip Huyen as the starter code for an assignment in the course «TensorFlow for Deep Learning Research» (cs20si.stanford.edu).
5 |
6 | Original Github code repo:
7 | https://goo.gl/QH6M6E
8 |
9 | The companion assignment handout (web.stanford.edu):
10 | https://goo.gl/vfGQI4
11 |
12 | Claude Coulombe, TÉLUQ / UQAM Montréal, updated the code to be compatible with Python 3 and TensorFlow 1.1. As of May 31, 2017, it seems to work correctly, but I do not have the computing resources to train it for a long period of time.
13 |
14 | Detailed syllabus and lecture notes can be found at http://cs20si.stanford.edu
15 |
16 |
17 | Instructions:
18 | -------------
19 |
20 | 1) Check out this repository.
21 | git clone https://github.com/ClaudeCoulombe/tf-stanford-tutorials.git
22 |
23 | 2) Download and unzip the dataset and put it in the data sub-folder
24 | https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
25 |
26 | 3) Change the DATA_PATH in the config.py, line 20
27 |
28 | 4) Then run the data.py file
29 | python data.py
30 | It should create the folder 'processed', and then put a bunch of data files into it.
31 |
32 | 5) Train the model
33 | python chatbot.py --mode train
34 | You can interrupt training after a long time or once the loss is low enough.
35 | The TensorFlow model is saved in the checkpoints folder, so you can resume from it later.
36 |
37 | 6) Interact / play with the Chatbot
38 | python chatbot.py --mode chat
39 |
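Before running step 4, it can help to confirm that the unzipped corpus sits where config.py expects it. The sketch below is not part of the repository: `check_corpus` is a hypothetical helper, and the two file names are the CONVO_FILE and LINE_FILE values from config.py.

```python
import os

# File names taken from config.py (CONVO_FILE and LINE_FILE).
REQUIRED_FILES = ('movie_conversations.txt', 'movie_lines.txt')

def check_corpus(data_path):
    """Return the required corpus files that are missing from data_path.

    Hypothetical helper for illustration; not part of the repository.
    """
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(data_path, name))]

if __name__ == '__main__':
    # Same value as DATA_PATH in config.py; adjust to your own location.
    missing = check_corpus('/data/cornell movie-dialogs corpus')
    if missing:
        print('Missing files:', ', '.join(missing))
    else:
        print('Corpus found.')
```

If the list is non-empty, fix DATA_PATH (step 3) before running data.py.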
40 |
41 |
42 |
43 |
44 |
45 |
46 |
--------------------------------------------------------------------------------
/chatbot/README.md:
--------------------------------------------------------------------------------
2 | A neural chatbot using sequence to sequence model with
3 | attentional decoder. This is a fully functional chatbot.
4 |
5 | This is based on Google Translate Tensorflow model
6 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/
7 |
8 | Sequence to sequence model by Cho et al.(2014)
9 |
10 | Created by Chip Huyen as the starter code for assignment 3,
11 | class CS 20SI: "TensorFlow for Deep Learning Research"
12 | cs20si.stanford.edu
13 |
14 | Updated for Python 3 and TensorFlow 1.1 by Claude Coulombe - TÉLUQ / UQAM Montréal - May 31 2017
15 |
16 | The detailed assignment handout and information on training time can be found at http://web.stanford.edu/class/cs20si/assignments/a3.pdf
17 |
18 | Sample conversation
19 | HUMAN ++++ hi
20 | BOT ++++ hi . what ' s your name ?
21 | HUMAN ++++ i'm chip
22 | BOT ++++ please .
23 | HUMAN ++++ please what?
24 | BOT ++++ i don ' t know what to say .
25 | HUMAN ++++ where are you from?
26 | BOT ++++ california .
27 | HUMAN ++++ do you like california?
28 | BOT ++++ i ' m okay .
29 | HUMAN ++++ i like california a lot
30 | BOT ++++ of what ?
31 | HUMAN ++++ i live here
32 | BOT ++++ and what is it ?
33 | HUMAN ++++ california
34 | BOT ++++ i don ' t know what to say .
35 | HUMAN ++++ i'm a student here
36 | BOT ++++ and get it out of here .
37 | HUMAN ++++ what do you like to do in your free time?
38 | BOT ++++ and what about the gun ?
39 | HUMAN ++++ do you have a gun?
40 | BOT ++++ yes .
41 | HUMAN ++++ you're dangerous!
42 | BOT ++++ i ' m not .
43 | HUMAN ++++ do you go hunting?
44 | BOT ++++ i can do .
45 | HUMAN ++++ what do you hunt?
46 | BOT ++++ virgin .
47 | HUMAN ++++ that's very bad
48 | BOT ++++ and you in love ?
49 | HUMAN ++++ no i'm not
50 | BOT ++++ please - -
51 | HUMAN ++++ okay, i am
52 | BOT ++++ and you are good .
53 | HUMAN ++++ yeah i am good too.
54 | BOT ++++ good good to you .
55 | HUMAN ++++ are you in love?
56 | BOT ++++ i ' m ready .
57 | HUMAN ++++ ready for what?
58 | BOT ++++ for what ?
59 | HUMAN ++++ for what?
60 | BOT ++++ i don ' t know what to say .
61 | HUMAN ++++ tell me
62 | BOT ++++ i can do it .
63 | HUMAN ++++ okay, tell me what you're ready for
64 | BOT ++++ i ' m doing you .
65 | HUMAN ++++ no you're not
66 | BOT ++++ please !
67 | HUMAN ++++ this is a strange conversation
68 | BOT ++++ i don ' t know what to say .
69 | HUMAN ++++ are you a guy?
70 | BOT ++++ of what ?
71 | HUMAN ++++ are you a boy?
72 | BOT ++++ i ' m fine .
73 | HUMAN ++++ are you a man?
74 | BOT ++++ i ' m a little .
75 | HUMAN ++++ what's your name?
76 | BOT ++++ chris -
77 | HUMAN ++++ chris who?
78 | BOT ++++ cornelius .
79 | HUMAN ++++ so i guess you're a guy.
80 | BOT ++++ i am .
81 | HUMAN ++++ okay
82 | BOT ++++ okay ?
83 | HUMAN ++++ do you have a girlfriend?
84 | BOT ++++ i ' m sure .
85 | HUMAN ++++ what's her name?
86 | BOT ++++ let ' s talk about something else .
87 |
88 | See output_convo.txt for more sample conversations.
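
The log format shown above is simple enough to read back programmatically. The following is a minimal sketch, assuming only the `HUMAN ++++ ` and `BOT ++++ ` prefixes that chatbot.py writes; `parse_convo` is a hypothetical helper, not part of the repository.

```python
HUMAN_PREFIX = 'HUMAN ++++ '
BOT_PREFIX = 'BOT ++++ '

def parse_convo(lines):
    """Pair each HUMAN line with the BOT line that follows it.

    Hypothetical helper for illustration; not part of the repository.
    Lines that carry neither prefix (such as the ===== separator
    written at the end of a chat session) are ignored.
    """
    pairs = []
    pending = None
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith(HUMAN_PREFIX):
            pending = line[len(HUMAN_PREFIX):]
        elif line.startswith(BOT_PREFIX) and pending is not None:
            pairs.append((pending, line[len(BOT_PREFIX):]))
            pending = None
    return pairs

sample = [
    'HUMAN ++++ hi\n',
    "BOT ++++ hi . what ' s your name ?\n",
    '=============================================\n',
]
print(parse_convo(sample))
```

To process a real log, pass an open file object instead of `sample`, for example `parse_convo(open('processed/output_convo.txt'))`.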
89 |
90 | Usage
91 |
92 | Step 1: create a data folder in your project directory, download
93 | the Cornell Movie-Dialogs Corpus from
94 | https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
95 | Unzip it
96 |
97 | Step 2: python data.py
98 | This will do all the pre-processing for the Cornell dataset.
99 |
100 | Step 3:
101 | python chatbot.py --mode [train/chat]
102 | If the mode is train, you train the chatbot. By default, the model will
103 | restore the previously trained weights (if there are any) and continue
104 | training from there.
105 |
106 | If you want to start training from scratch, please delete all the checkpoints
107 | in the checkpoints folder.
108 |
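Clearing the checkpoints can also be scripted. This is a minimal sketch, not part of the repository: `reset_checkpoints` is a hypothetical helper, and its default argument is the CPT_PATH value from config.py.

```python
import os
import shutil

def reset_checkpoints(cpt_path='checkpoints'):
    """Delete every saved checkpoint and leave an empty folder behind.

    Hypothetical helper for illustration; 'checkpoints' is the
    CPT_PATH value in config.py.
    """
    # Remove the folder and everything in it; a missing folder is not an error.
    shutil.rmtree(cpt_path, ignore_errors=True)
    # Recreate it so the next training run has somewhere to save.
    os.makedirs(cpt_path, exist_ok=True)
```

Calling `reset_checkpoints()` from the chatbot directory before `python chatbot.py --mode train` makes the next run start from fresh parameters. The deletion is permanent, so copy any checkpoints you want to keep first.
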
109 | If the mode is chat, you'll go into the interaction mode with the bot.
110 |
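In chat mode, chatbot.py picks the smallest bucket whose encoder length fits the tokenized input, and data.py pads the input to that length and reverses it before feeding the model. The sketch below restates those two steps on their own; the BUCKETS and PAD_ID values are copied from config.py, while the function names and the token ids are made up for illustration.

```python
# Values copied from config.py.
BUCKETS = [(8, 10), (12, 14), (16, 19)]
PAD_ID = 0

def find_right_bucket(length):
    """Index of the smallest bucket whose encoder size fits `length`."""
    return min(b for b in range(len(BUCKETS)) if BUCKETS[b][0] >= length)

def pad_and_reverse(token_ids, bucket_id):
    """Pad the encoder input to the bucket's encoder size, then reverse it."""
    encoder_size = BUCKETS[bucket_id][0]
    padded = token_ids + [PAD_ID] * (encoder_size - len(token_ids))
    return list(reversed(padded))

# A made-up sentence of 10 token ids: too long for the (8, 10) bucket.
token_ids = [5, 9, 4, 7, 6, 8, 11, 10, 12, 13]
bucket_id = find_right_bucket(len(token_ids))
print(bucket_id)  # 1, i.e. the (12, 14) bucket
print(pad_and_reverse(token_ids, bucket_id))
```

With the default BUCKETS, any input longer than 16 tokens fits no bucket, which is why chat mode rejects it with a "Max length" message.
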
111 | By default, all the conversations you have with the chatbot will be written
112 | into the file output_convo.txt in the processed folder. If you run this chatbot,
113 | I kindly ask you to send me the output_convo.txt so that I can improve
114 | the chatbot. My email is huyenn@stanford.edu
115 |
116 | If you find the tutorial helpful, please head over to Anonymous Chatlog Donation
117 | to see how you can help us create the first realistic dialogue dataset.
118 |
119 | Thank you very much!
121 |
--------------------------------------------------------------------------------
/chatbot/chatbot.py:
--------------------------------------------------------------------------------
1 | """ A neural chatbot using sequence to sequence model with
2 | attentional decoder.
3 |
4 | This is based on Google Translate Tensorflow model
5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/
6 |
7 | Sequence to sequence model by Cho et al.(2014)
8 |
9 | Created by Chip Huyen as the starter code for assignment 3,
10 | class CS 20SI: "TensorFlow for Deep Learning Research"
11 | cs20si.stanford.edu
12 |
13 | This file contains the code to run the model.
14 |
15 | See readme.md for instruction on how to run the starter code.
16 | """
17 |
18 |
19 |
20 | import argparse
21 | import os
22 | import random
23 | import sys
24 | import time
25 |
26 | import numpy as np
27 | import tensorflow as tf
28 |
29 | from model import ChatBotModel
30 | import config
31 | import data
32 |
33 | def _get_random_bucket(train_buckets_scale):
34 | """ Get a random bucket from which to choose a training sample """
35 | rand = random.random()
36 | return min([i for i in range(len(train_buckets_scale))
37 | if train_buckets_scale[i] > rand])
38 |
39 | def _assert_lengths(encoder_size, decoder_size, encoder_inputs, decoder_inputs, decoder_masks):
40 | """ Assert that the encoder inputs, decoder inputs, and decoder masks are
41 | of the expected lengths """
42 | if len(encoder_inputs) != encoder_size:
43 | raise ValueError("Encoder length must be equal to the one in bucket,"
44 | " %d != %d." % (len(encoder_inputs), encoder_size))
45 | if len(decoder_inputs) != decoder_size:
46 | raise ValueError("Decoder length must be equal to the one in bucket,"
47 | " %d != %d." % (len(decoder_inputs), decoder_size))
48 | if len(decoder_masks) != decoder_size:
49 | raise ValueError("Weights length must be equal to the one in bucket,"
50 | " %d != %d." % (len(decoder_masks), decoder_size))
51 |
52 | def run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, forward_only):
53 | """ Run one step in training.
54 | @forward_only: boolean value to decide whether a backward path should be created
55 | forward_only is set to True when you just want to evaluate on the test set,
56 | or when you want the bot to be in chat mode. """
57 | encoder_size, decoder_size = config.BUCKETS[bucket_id]
58 | _assert_lengths(encoder_size, decoder_size, encoder_inputs, decoder_inputs, decoder_masks)
59 |
60 | # input feed: encoder inputs, decoder inputs, target_weights, as provided.
61 | input_feed = {}
62 | for step in range(encoder_size):
63 | input_feed[model.encoder_inputs[step].name] = encoder_inputs[step]
64 | for step in range(decoder_size):
65 | input_feed[model.decoder_inputs[step].name] = decoder_inputs[step]
66 | input_feed[model.decoder_masks[step].name] = decoder_masks[step]
67 |
68 | last_target = model.decoder_inputs[decoder_size].name
69 | input_feed[last_target] = np.zeros([model.batch_size], dtype=np.int32)
70 |
71 | # output feed: depends on whether we do a backward step or not.
72 | if not forward_only:
73 | output_feed = [model.train_ops[bucket_id], # update op that does SGD.
74 | model.gradient_norms[bucket_id], # gradient norm.
75 | model.losses[bucket_id]] # loss for this batch.
76 | else:
77 | output_feed = [model.losses[bucket_id]] # loss for this batch.
78 | for step in range(decoder_size): # output logits.
79 | output_feed.append(model.outputs[bucket_id][step])
80 |
81 | outputs = sess.run(output_feed, input_feed)
82 | if not forward_only:
83 | return outputs[1], outputs[2], None # Gradient norm, loss, no outputs.
84 | else:
85 | return None, outputs[0], outputs[1:] # No gradient norm, loss, outputs.
86 |
87 | def _get_buckets():
88 | """ Load the dataset into buckets based on their lengths.
89 | train_buckets_scale is the interval that'll help us
90 | choose a random bucket later on.
91 | """
92 | test_buckets = data.load_data('test_ids.enc', 'test_ids.dec')
93 | data_buckets = data.load_data('train_ids.enc', 'train_ids.dec')
94 | train_bucket_sizes = [len(data_buckets[b]) for b in range(len(config.BUCKETS))]
95 | print("Number of samples in each bucket:\n", train_bucket_sizes)
96 | train_total_size = sum(train_bucket_sizes)
97 | # list of increasing numbers from 0 to 1 that we'll use to select a bucket.
98 | train_buckets_scale = [sum(train_bucket_sizes[:i + 1]) / train_total_size
99 | for i in range(len(train_bucket_sizes))]
100 | print("Bucket scale:\n", train_buckets_scale)
101 | return test_buckets, data_buckets, train_buckets_scale
102 |
103 | def _get_skip_step(iteration):
104 | """ How many steps should the model train before it saves all the weights. """
105 | if iteration < 100:
106 | return 30
107 | return 100
108 |
109 | def _check_restore_parameters(sess, saver):
110 | """ Restore the previously trained parameters if there are any. """
111 | ckpt = tf.train.get_checkpoint_state(os.path.dirname(config.CPT_PATH + '/checkpoint'))
112 | if ckpt and ckpt.model_checkpoint_path:
113 | print("Loading parameters for the Chatbot")
114 | saver.restore(sess, ckpt.model_checkpoint_path)
115 | else:
116 | print("Initializing fresh parameters for the Chatbot")
117 |
118 | def _eval_test_set(sess, model, test_buckets):
119 | """ Evaluate on the test set. """
120 | for bucket_id in range(len(config.BUCKETS)):
121 | if len(test_buckets[bucket_id]) == 0:
122 | print(" Test: empty bucket %d" % (bucket_id))
123 | continue
124 | start = time.time()
125 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(test_buckets[bucket_id],
126 | bucket_id,
127 | batch_size=config.BATCH_SIZE)
128 | _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs,
129 | decoder_masks, bucket_id, True)
130 | print('Test bucket {}: loss {}, time {}'.format(bucket_id, step_loss, time.time() - start))
131 |
132 | def train():
133 | """ Train the bot """
134 | test_buckets, data_buckets, train_buckets_scale = _get_buckets()
135 | # in train mode, we need to create the backward path, so forward_only is False
136 | model = ChatBotModel(False, config.BATCH_SIZE)
137 | model.build_graph()
138 |
139 | saver = tf.train.Saver()
140 |
141 | with tf.Session() as sess:
142 | print('Running session')
143 | sess.run(tf.global_variables_initializer())
144 | _check_restore_parameters(sess, saver)
145 |
146 | iteration = model.global_step.eval()
147 | total_loss = 0
148 | while True:
149 | skip_step = _get_skip_step(iteration)
150 | bucket_id = _get_random_bucket(train_buckets_scale)
151 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(data_buckets[bucket_id],
152 | bucket_id,
153 | batch_size=config.BATCH_SIZE)
154 | start = time.time()
155 | _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, False)
156 | total_loss += step_loss
157 | iteration += 1
158 |
159 | if iteration % skip_step == 0:
160 | print('Iter {}: loss {}, time {}'.format(iteration, total_loss/skip_step, time.time() - start))
161 | start = time.time()
162 | total_loss = 0
163 | saver.save(sess, os.path.join(config.CPT_PATH, 'chatbot'), global_step=model.global_step)
164 | if iteration % (10 * skip_step) == 0:
165 | # Run evals on development set and print their loss
166 | _eval_test_set(sess, model, test_buckets)
167 | start = time.time()
168 | sys.stdout.flush()
169 |
170 | def _get_user_input():
171 | """ Get user's input, which will be transformed into encoder input later """
172 | print("HUMAN ++++>", end="")
173 | sys.stdout.flush()
174 | return sys.stdin.readline()
175 |
176 | def _find_right_bucket(length):
177 | """ Find the proper bucket for an encoder input based on its length """
178 | return min([b for b in range(len(config.BUCKETS))
179 | if config.BUCKETS[b][0] >= length])
180 |
181 | TRACE = False
182 |
183 | def _construct_response(output_logits, inv_dec_vocab):
184 | """ Construct a response to the user's encoder input.
185 | @output_logits: the outputs from sequence to sequence wrapper.
186 | output_logits is decoder_size np array, each of dim 1 x DEC_VOCAB
187 |
188 | This is a greedy decoder - outputs are just argmaxes of output_logits.
189 | """
190 | if TRACE:
191 | print(output_logits[0])
192 | outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]
193 | # If there is an EOS symbol in outputs, cut them at that point.
194 | if config.EOS_ID in outputs:
195 | outputs = outputs[:outputs.index(config.EOS_ID)]
196 | # Print out sentence corresponding to outputs.
197 | return " ".join([tf.compat.as_str(inv_dec_vocab[output]) for output in outputs])
198 |
199 | def chat():
200 | """ In chat mode, we don't need to create the backward path
201 | """
202 | _, enc_vocab = data.load_vocab(os.path.join(config.PROCESSED_PATH, 'vocab.enc'))
203 | inv_dec_vocab, _ = data.load_vocab(os.path.join(config.PROCESSED_PATH, 'vocab.dec'))
204 |
205 | model = ChatBotModel(True, batch_size=1)
206 | model.build_graph()
207 |
208 | saver = tf.train.Saver()
209 |
210 | with tf.Session() as sess:
211 | sess.run(tf.global_variables_initializer())
212 | _check_restore_parameters(sess, saver)
213 | output_file = open(os.path.join(config.PROCESSED_PATH, config.OUTPUT_FILE), 'a+')
214 | # Decode from standard input.
215 | max_length = config.BUCKETS[-1][0]
216 | print('Welcome to TensorBro. Say something. Enter to exit. Max length is', max_length)
217 | while True:
218 | line = _get_user_input()
219 | if len(line) > 0 and line[-1] == '\n':
220 | line = line[:-1]
221 | if line == '':
222 | break
223 | output_file.write('HUMAN ++++ ' + line + '\n')
224 | # Get token-ids for the input sentence.
225 | token_ids = data.sentence2id(enc_vocab, str(line))
226 | if (len(token_ids) > max_length):
227 | print('Max length I can handle is:', max_length)
228 | # Skip this input; the loop prompts for a new line on the next pass.
229 | continue
230 | # Which bucket does it belong to?
231 | bucket_id = _find_right_bucket(len(token_ids))
232 | # Get a 1-element batch to feed the sentence to the model.
233 | encoder_inputs, decoder_inputs, decoder_masks = data.get_batch([(token_ids, [])],
234 | bucket_id,
235 | batch_size=1)
236 | # Get output logits for the sentence.
237 | _, _, output_logits = run_step(sess, model, encoder_inputs, decoder_inputs,
238 | decoder_masks, bucket_id, True)
239 | response = _construct_response(output_logits, inv_dec_vocab)
240 | print('BOT ++++ ' + response)
241 | output_file.write('BOT ++++ ' + response + '\n')
242 | output_file.write('=============================================\n')
243 | output_file.close()
244 |
245 | def main():
246 | parser = argparse.ArgumentParser()
247 | parser.add_argument('--mode', choices={'train', 'chat'},
248 | default='train', help="mode. if not specified, it's in the train mode")
249 | args = parser.parse_args()
250 |
251 | if not os.path.isdir(config.PROCESSED_PATH):
252 | data.prepare_raw_data()
253 | data.process_data()
254 | print('Data ready!')
255 | # create checkpoints folder if there isn't one already
256 | data.make_dir(config.CPT_PATH)
257 |
258 | if args.mode == 'train':
259 | train()
260 | elif args.mode == 'chat':
261 | chat()
262 |
263 | if __name__ == '__main__':
264 | main()
265 |
--------------------------------------------------------------------------------
/chatbot/config.py:
--------------------------------------------------------------------------------
1 | """ A neural chatbot using sequence to sequence model with
2 | attentional decoder.
3 |
4 | This is based on Google Translate Tensorflow model
5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/
6 |
7 | Sequence to sequence model by Cho et al.(2014)
8 |
9 | Created by Chip Huyen as the starter code for assignment 3,
10 | class CS 20SI: "TensorFlow for Deep Learning Research"
11 | cs20si.stanford.edu
12 |
13 | This file contains the hyperparameters for the model.
14 |
15 | See readme.md for instruction on how to run the starter code.
16 | """
17 |
18 | # parameters for processing the dataset
19 | # DATA_PATH = '/Users/Chip/data/cornell movie-dialogs corpus'
20 | DATA_PATH = '/data/cornell movie-dialogs corpus'
21 | CONVO_FILE = 'movie_conversations.txt'
22 | LINE_FILE = 'movie_lines.txt'
23 | OUTPUT_FILE = 'output_convo.txt'
24 | PROCESSED_PATH = 'processed'
25 | CPT_PATH = 'checkpoints'
26 |
27 | THRESHOLD = 2
28 |
29 | PAD_ID = 0
30 | UNK_ID = 1
31 | START_ID = 2
32 | EOS_ID = 3
33 |
34 | TESTSET_SIZE = 25000
35 |
36 | # model parameters
37 | """ Train encoder length distribution:
38 | [175, 92, 11883, 8387, 10656, 13613, 13480, 12850, 11802, 10165,
39 | 8973, 7731, 7005, 6073, 5521, 5020, 4530, 4421, 3746, 3474, 3192,
40 | 2724, 2587, 2413, 2252, 2015, 1816, 1728, 1555, 1392, 1327, 1248,
41 | 1128, 1084, 1010, 884, 843, 755, 705, 660, 649, 594, 558, 517, 475,
42 | 426, 444, 388, 349, 337]
43 | These buckets size seem to work the best
44 | """
45 | # [19530, 17449, 17585, 23444, 22884, 16435, 17085, 18291, 18931]
46 | # BUCKETS = [(6, 8), (8, 10), (10, 12), (13, 15), (16, 19), (19, 22), (23, 26), (29, 32), (39, 44)]
47 |
48 | # [37049, 33519, 30223, 33513, 37371]
49 | # BUCKETS = [(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)]
50 |
51 | BUCKETS = [(8, 10), (12, 14), (16, 19)]
52 |
53 | NUM_LAYERS = 3
54 | HIDDEN_SIZE = 256
55 | BATCH_SIZE = 64
56 |
57 | LR = 0.5
58 | MAX_GRAD_NORM = 5.0
59 |
60 | NUM_SAMPLES = 512
61 | ENC_VOCAB = 24449
62 | DEC_VOCAB = 24633
63 |
--------------------------------------------------------------------------------
/chatbot/data.py:
--------------------------------------------------------------------------------
1 | """ A neural chatbot using sequence to sequence model with
2 | attentional decoder.
3 |
4 | This is based on Google Translate Tensorflow model
5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/
6 |
7 | Sequence to sequence model by Cho et al.(2014)
8 |
9 | Created by Chip Huyen as the starter code for assignment 3,
10 | class CS 20SI: "TensorFlow for Deep Learning Research"
11 | cs20si.stanford.edu
12 |
13 | This file contains the code to do the pre-processing for the
14 | Cornell Movie-Dialogs Corpus.
15 |
16 | See readme.md for instruction on how to run the starter code.
17 | """
18 |
19 | import random
20 | import re
21 | import os
22 |
23 | import numpy as np
24 |
25 | import config
26 |
27 | def get_lines():
28 | id2line = {}
29 | file_path = os.path.join(config.DATA_PATH, config.LINE_FILE)
30 | print(file_path)
31 | # with open(file_path, 'rb') as f:
32 | with open(file_path, 'r', encoding="latin-1") as f:
33 | lines = f.readlines()
34 | for line in lines:
35 | parts = line.split(' +++$+++ ')
36 | if len(parts) == 5:
37 | if parts[4][-1] == '\n':
38 | parts[4] = parts[4][:-1]
39 | id2line[parts[0]] = parts[4]
40 | return id2line
41 |
42 | def get_convos():
43 | """ Get conversations from the raw data """
44 | file_path = os.path.join(config.DATA_PATH, config.CONVO_FILE)
45 | convos = []
46 | # with open(file_path, 'rb') as f:
47 | with open(file_path, 'r', encoding="latin-1") as f:
48 | for line in f.readlines():
49 | parts = line.split(' +++$+++ ')
50 | if len(parts) == 4:
51 | convo = []
52 | for line in parts[3][1:-2].split(', '):
53 | convo.append(line[1:-1])
54 | convos.append(convo)
55 |
56 | return convos
57 |
58 | def question_answers(id2line, convos):
59 | """ Divide the dataset into two sets: questions and answers. """
60 | questions, answers = [], []
61 | for convo in convos:
62 | for index, line in enumerate(convo[:-1]):
63 | questions.append(id2line[convo[index]])
64 | answers.append(id2line[convo[index + 1]])
65 | assert len(questions) == len(answers)
66 | return questions, answers
67 |
68 | def prepare_dataset(questions, answers):
69 | # create path to store all the train & test encoder & decoder
70 | make_dir(config.PROCESSED_PATH)
71 |
72 | # random convos to create the test set
73 | test_ids = set(random.sample(range(len(questions)), config.TESTSET_SIZE)) # set gives fast membership tests below
74 |
75 | filenames = ['train.enc', 'train.dec', 'test.enc', 'test.dec']
76 | files = []
77 | for filename in filenames:
78 | # files.append(open(os.path.join(config.PROCESSED_PATH, filename),'wb'))
79 | files.append(open(os.path.join(config.PROCESSED_PATH, filename),'w'))
80 |
81 | for i in range(len(questions)):
82 | if i in test_ids:
83 | files[2].write(questions[i] + '\n')
84 | files[3].write(answers[i] + '\n')
85 | else:
86 | files[0].write(questions[i] + '\n')
87 | files[1].write(answers[i] + '\n')
88 |
89 | for file in files:
90 | file.close()
91 |
92 | def make_dir(path):
93 | """ Create a directory if there isn't one already. """
94 | try:
95 | os.mkdir(path)
96 | except OSError:
97 | pass
98 |
99 | def basic_tokenizer(line, normalize_digits=True):
100 | """ A basic tokenizer to tokenize text into tokens.
101 | Feel free to change this to suit your need. """
102 | line = re.sub('<u>', '', line)
103 | line = re.sub('</u>', '', line)
104 | line = re.sub(r'\[', '', line)
105 | line = re.sub(r'\]', '', line)
106 | words = []
107 | # _WORD_SPLIT = re.compile(b"([.,!?\"'-<>:;)(])")
108 | _WORD_SPLIT = re.compile("([.,!?\"'-<>:;)(])")
109 | _DIGIT_RE = re.compile(r"\d")
110 | for fragment in line.strip().lower().split():
111 | for token in re.split(_WORD_SPLIT, fragment):
112 | if not token:
113 | continue
114 | if normalize_digits:
115 | # token = re.sub(_DIGIT_RE, b'#', token)
116 | token = re.sub(_DIGIT_RE, '#', token)
117 | words.append(token)
118 | return words
119 |
120 | def build_vocab(filename, normalize_digits=True):
121 | in_path = os.path.join(config.PROCESSED_PATH, filename)
122 | out_path = os.path.join(config.PROCESSED_PATH, 'vocab.{}'.format(filename[-3:]))
123 |
124 | vocab = {}
125 | # with open(in_path, 'rb') as f:
126 | with open(in_path, 'r', encoding="latin-1") as f:
127 | for line in f.readlines():
128 | for token in basic_tokenizer(line):
129 | if not token in vocab:
130 | vocab[token] = 0
131 | vocab[token] += 1
132 |
133 | sorted_vocab = sorted(vocab, key=vocab.get, reverse=True)
134 | # with open(out_path, 'wb') as f:
135 | with open(out_path, 'w') as f:
136 | f.write('<pad>' + '\n')
137 | f.write('<unk>' + '\n')
138 | f.write('<s>' + '\n')
139 | f.write('<\s>' + '\n')
140 | index = 4
141 | for word in sorted_vocab:
142 | if vocab[word] < config.THRESHOLD:
143 | # with open('config.py', 'ab') as cf:
144 | with open('config.py', 'a') as cf:
145 | if filename[-3:] == 'enc':
146 | cf.write('ENC_VOCAB = ' + str(index) + '\n')
147 | else:
148 | cf.write('DEC_VOCAB = ' + str(index) + '\n')
149 | break
150 | f.write(word + '\n')
151 | index += 1
152 |
153 | def load_vocab(vocab_path):
154 | # with open(vocab_path, 'rb') as f:
155 | with open(vocab_path, 'r', encoding='latin-1') as f:
156 | words = f.read().splitlines()
157 | return words, {words[i]: i for i in range(len(words))}
158 |
159 | def sentence2id(vocab, line):
160 | return [vocab.get(token, vocab['<unk>']) for token in basic_tokenizer(line)]
161 |
162 | def token2id(data, mode):
163 | """ Convert all the tokens in the data into their corresponding
164 | index in the vocabulary. """
165 | vocab_path = 'vocab.' + mode
166 | in_path = data + '.' + mode
167 | out_path = data + '_ids.' + mode
168 |
169 | _, vocab = load_vocab(os.path.join(config.PROCESSED_PATH, vocab_path))
170 | # in_file = open(os.path.join(config.PROCESSED_PATH, in_path), 'rb')
171 | in_file = open(os.path.join(config.PROCESSED_PATH, in_path), 'r')
172 | # out_file = open(os.path.join(config.PROCESSED_PATH, out_path), 'wb')
173 | out_file = open(os.path.join(config.PROCESSED_PATH, out_path), 'w')
174 |
175 | lines = in_file.read().splitlines()
176 | for line in lines:
177 | if mode == 'dec': # only the decoder needs the '<s>' and '<\s>' markers
178 | ids = [vocab['<s>']]
179 | else:
180 | ids = []
181 | ids.extend(sentence2id(vocab, line))
182 | # ids.extend([vocab.get(token, vocab['<unk>']) for token in basic_tokenizer(line)])
183 | if mode == 'dec':
184 | ids.append(vocab['<\s>'])
185 | out_file.write(' '.join(str(id_) for id_ in ids) + '\n')
186 |
187 | def prepare_raw_data():
188 | print('Preparing raw data into train set and test set ...')
189 | id2line = get_lines()
190 | convos = get_convos()
191 | questions, answers = question_answers(id2line, convos)
192 | prepare_dataset(questions, answers)
193 |
194 | def process_data():
195 | print('Preparing data to be model-ready ...')
196 | build_vocab('train.enc')
197 | build_vocab('train.dec')
198 | token2id('train', 'enc')
199 | token2id('train', 'dec')
200 | token2id('test', 'enc')
201 | token2id('test', 'dec')
202 |
203 | def load_data(enc_filename, dec_filename, max_training_size=None):
204 | # encode_file = open(os.path.join(config.PROCESSED_PATH, enc_filename), 'rb')
205 | encode_file = open(os.path.join(config.PROCESSED_PATH, enc_filename), 'r')
206 | # decode_file = open(os.path.join(config.PROCESSED_PATH, dec_filename), 'rb')
207 | decode_file = open(os.path.join(config.PROCESSED_PATH, dec_filename), 'r')
208 | encode, decode = encode_file.readline(), decode_file.readline()
209 | data_buckets = [[] for _ in config.BUCKETS]
210 | i = 0
211 | while encode and decode:
212 | if (i + 1) % 10000 == 0:
213 | print("Bucketing conversation number", i)
214 | encode_ids = [int(id_) for id_ in encode.split()]
215 | decode_ids = [int(id_) for id_ in decode.split()]
216 | for bucket_id, (encode_max_size, decode_max_size) in enumerate(config.BUCKETS):
217 | if len(encode_ids) <= encode_max_size and len(decode_ids) <= decode_max_size:
218 | data_buckets[bucket_id].append([encode_ids, decode_ids])
219 | break
220 | encode, decode = encode_file.readline(), decode_file.readline()
221 | i += 1
222 | return data_buckets
223 |
224 | def _pad_input(input_, size):
225 | return input_ + [config.PAD_ID] * (size - len(input_))
226 |
227 | def _reshape_batch(inputs, size, batch_size):
228 | """ Create batch-major inputs. Batch inputs are just re-indexed inputs
229 | """
230 | batch_inputs = []
231 | for length_id in range(size):
232 | batch_inputs.append(np.array([inputs[batch_id][length_id]
233 | for batch_id in range(batch_size)], dtype=np.int32))
234 | return batch_inputs
235 |
236 |
237 | def get_batch(data_bucket, bucket_id, batch_size=1):
238 | """ Return one batch to feed into the model """
239 | # only pad to the max length of the bucket
240 | encoder_size, decoder_size = config.BUCKETS[bucket_id]
241 | encoder_inputs, decoder_inputs = [], []
242 |
243 | for _ in range(batch_size):
244 | encoder_input, decoder_input = random.choice(data_bucket)
245 | # pad both encoder and decoder, reverse the encoder
246 | encoder_inputs.append(list(reversed(_pad_input(encoder_input, encoder_size))))
247 | decoder_inputs.append(_pad_input(decoder_input, decoder_size))
248 |
249 | # now we create batch-major vectors from the data selected above.
250 | batch_encoder_inputs = _reshape_batch(encoder_inputs, encoder_size, batch_size)
251 | batch_decoder_inputs = _reshape_batch(decoder_inputs, decoder_size, batch_size)
252 |
253 | # create decoder_masks to be 0 for decoders that are padding.
254 | batch_masks = []
255 | for length_id in range(decoder_size):
256 | batch_mask = np.ones(batch_size, dtype=np.float32)
257 | for batch_id in range(batch_size):
258 | # we set mask to 0 if the corresponding target is a PAD symbol.
259 | # the corresponding decoder is decoder_input shifted by 1 forward.
260 | if length_id < decoder_size - 1:
261 | target = decoder_inputs[batch_id][length_id + 1]
262 | if length_id == decoder_size - 1 or target == config.PAD_ID:
263 | batch_mask[batch_id] = 0.0
264 | batch_masks.append(batch_mask)
265 | return batch_encoder_inputs, batch_decoder_inputs, batch_masks
266 |
267 | if __name__ == '__main__':
268 | prepare_raw_data()
269 | process_data()
270 |
--------------------------------------------------------------------------------
/chatbot/model.py:
--------------------------------------------------------------------------------
1 | """ A neural chatbot using sequence to sequence model with
2 | attentional decoder.
3 |
4 | This is based on Google Translate Tensorflow model
5 | https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/
6 |
7 | Sequence to sequence model by Cho et al.(2014)
8 |
9 | Created by Chip Huyen as the starter code for assignment 3,
10 | class CS 20SI: "TensorFlow for Deep Learning Research"
11 | cs20si.stanford.edu
12 |
13 | This file contains the code to build the model
14 |
15 | See readme.md for instruction on how to run the starter code.
16 | """
17 |
18 |
19 | import time
20 |
21 | import numpy as np
22 | import tensorflow as tf
23 |
24 | import config
25 |
26 | class ChatBotModel(object):
27 | def __init__(self, forward_only, batch_size):
28 | """forward_only: if set, we do not construct the backward pass in the model.
29 | """
30 | print('Initialize new model')
31 | self.fw_only = forward_only
32 | self.batch_size = batch_size
33 |
34 | def _create_placeholders(self):
35 | # Feeds for inputs. It's a list of placeholders
36 | print('Create placeholders')
37 | self.encoder_inputs = [tf.placeholder(tf.int32, shape=[None], name='encoder{}'.format(i))
38 | for i in range(config.BUCKETS[-1][0])]
39 | self.decoder_inputs = [tf.placeholder(tf.int32, shape=[None], name='decoder{}'.format(i))
40 | for i in range(config.BUCKETS[-1][1] + 1)]
41 | self.decoder_masks = [tf.placeholder(tf.float32, shape=[None], name='mask{}'.format(i))
42 | for i in range(config.BUCKETS[-1][1] + 1)]
43 |
44 | # Our targets are decoder inputs shifted by one (to ignore the <GO> symbol)
45 | self.targets = self.decoder_inputs[1:]
46 |
47 | def _inference(self):
48 | print('Create inference')
49 | # If we use sampled softmax, we need an output projection.
50 | # Sampled softmax only makes sense if we sample less than vocabulary size.
51 | if config.NUM_SAMPLES > 0 and config.NUM_SAMPLES < config.DEC_VOCAB:
52 | w = tf.get_variable('proj_w', [config.HIDDEN_SIZE, config.DEC_VOCAB])
53 | b = tf.get_variable('proj_b', [config.DEC_VOCAB])
54 | self.output_projection = (w, b)
55 |
56 | # def sampled_loss(inputs, labels):
57 | def sampled_loss(labels, logits):
58 | labels = tf.reshape(labels, [-1, 1])
59 | # return tf.nn.sampled_softmax_loss(tf.transpose(w), b, inputs, labels,
60 | # config.NUM_SAMPLES, config.DEC_VOCAB)
61 | return tf.nn.sampled_softmax_loss(weights=tf.transpose(w), biases=b, labels=labels, inputs=logits,
62 | num_sampled=config.NUM_SAMPLES, num_classes=config.DEC_VOCAB)
63 | self.softmax_loss_function = sampled_loss
64 |
65 | # single_cell = tf.nn.rnn_cell.GRUCell(config.HIDDEN_SIZE)
66 | # One GRUCell per layer: repeating a single cell object with [cell] * n may raise a reuse error or share weights in TF >= 1.1.
67 | # self.cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * config.NUM_LAYERS)
68 | self.cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.GRUCell(config.HIDDEN_SIZE) for _ in range(config.NUM_LAYERS)])
69 |
70 | def _create_loss(self):
71 | print('Creating loss... \nIt might take a couple of minutes depending on how many buckets you have.')
72 | start = time.time()
73 | def _seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
74 | # return tf.nn.seq2seq.embedding_attention_seq2seq(
75 | return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
76 | encoder_inputs, decoder_inputs, self.cell,
77 | num_encoder_symbols=config.ENC_VOCAB,
78 | num_decoder_symbols=config.DEC_VOCAB,
79 | embedding_size=config.HIDDEN_SIZE,
80 | output_projection=self.output_projection,
81 | feed_previous=do_decode)
82 |
83 | if self.fw_only:
84 | # self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
85 | self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
86 | self.encoder_inputs,
87 | self.decoder_inputs,
88 | self.targets,
89 | self.decoder_masks,
90 | config.BUCKETS,
91 | lambda x, y: _seq2seq_f(x, y, True),
92 | softmax_loss_function=self.softmax_loss_function)
93 | # If we use output projection, we need to project outputs for decoding.
94 | if self.output_projection:
95 | for bucket in range(len(config.BUCKETS)):
96 | self.outputs[bucket] = [tf.matmul(output,
97 | self.output_projection[0]) + self.output_projection[1]
98 | for output in self.outputs[bucket]]
99 | else:
100 | # self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
101 | self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
102 | self.encoder_inputs,
103 | self.decoder_inputs,
104 | self.targets,
105 | self.decoder_masks,
106 | config.BUCKETS,
107 | lambda x, y: _seq2seq_f(x, y, False),
108 | softmax_loss_function=self.softmax_loss_function)
109 | print('Time:', time.time() - start)
110 |
111 | def _create_optimizer(self):
112 | print('Create optimizer... \nIt might take a couple of minutes depending on how many buckets you have.')
113 | with tf.variable_scope('training') as scope:
114 | self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
115 |
116 | if not self.fw_only:
117 | self.optimizer = tf.train.GradientDescentOptimizer(config.LR)
118 | trainables = tf.trainable_variables()
119 | self.gradient_norms = []
120 | self.train_ops = []
121 | start = time.time()
122 | for bucket in range(len(config.BUCKETS)):
123 |
124 | clipped_grads, norm = tf.clip_by_global_norm(tf.gradients(self.losses[bucket],
125 | trainables),
126 | config.MAX_GRAD_NORM)
127 | self.gradient_norms.append(norm)
128 | self.train_ops.append(self.optimizer.apply_gradients(list(zip(clipped_grads, trainables)),
129 | global_step=self.global_step))
130 | print('Creating opt for bucket {} took {} seconds'.format(bucket, time.time() - start))
131 | start = time.time()
132 |
133 |
134 | def _create_summary(self):
135 | pass
136 |
137 | def build_graph(self):
138 | self._create_placeholders()
139 | self._inference()
140 | self._create_loss()
141 | self._create_optimizer()
142 | self._create_summary()
143 |
--------------------------------------------------------------------------------
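The optimizer step in model.py clips gradients with `tf.clip_by_global_norm` before applying them. As a rough illustration of what that clipping computes, here is a pure-Python sketch under simplifying assumptions: gradients are plain lists of floats rather than tensors, and the function below is a stand-in written for this note, not the TensorFlow implementation. Every gradient is scaled by `clip_norm / max(global_norm, clip_norm)`, so gradients whose joint norm is already within the limit are left unchanged.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Scale a list of gradient vectors so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # When the norm is already within the limit the scale is 1.0 (no change).
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in gradients]
    return clipped, global_norm


grads = [[3.0, 4.0], [0.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=2.5)
print(norm)     # 5.0
print(clipped)  # [[1.5, 2.0], [0.0]]
```

Because all gradients share one scale factor, their relative directions are preserved, which is the usual reason to prefer global-norm clipping over clipping each gradient separately.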
/chatbot/output_convo.txt:
--------------------------------------------------------------------------------
1 | HUMAN ++++ hi
2 | BOT ++++ hi
3 | HUMAN ++++ how are you?
4 | BOT ++++ i ' m okay .
5 | HUMAN ++++ really?
6 | BOT ++++ i don ' t know what to say .
7 | HUMAN ++++ how're you?
8 | BOT ++++ i ' m fine .
9 | HUMAN ++++ where're you from?
10 | BOT ++++ california .
11 | HUMAN ++++ which city?
12 | BOT ++++ let ' s talk about something else .
13 | HUMAN ++++ you don't like california?
14 | BOT ++++ oh . . .
15 | HUMAN ++++ why not?
16 | BOT ++++ i can ' t .
17 | HUMAN ++++ how old are you?
18 | BOT ++++ eighteen .
19 | =============================================
20 | HUMAN ++++ hi
21 | BOT ++++ hi
22 | HUMAN ++++ how are you?
23 | BOT ++++ i ' m okay .
24 | HUMAN ++++ what's your name?
25 | BOT ++++ let ' s talk about something else .
26 | =============================================
27 | HUMAN ++++ hi
28 | BOT ++++ hi
29 | HUMAN ++++ how are you?
30 | BOT ++++ i ' m okay .
31 | HUMAN ++++ what's your name?
32 | BOT ++++ what ?
33 | HUMAN ++++ what's your name?
34 | BOT ++++ let ' s talk about something else .
35 | HUMAN ++++ what's your name?
36 | BOT ++++ let ' s talk about something else .
37 | HUMAN ++++ what's your name?
38 | BOT ++++ let ' s talk about something else .
39 | HUMAN ++++ what's your name?
40 | BOT ++++ prisoner .
41 | HUMAN ++++ ha ha you really don't want to tell your name
42 | BOT ++++ i ' m not going to do it .
43 | HUMAN ++++ how old are you?
44 | BOT ++++ can you guess ?
45 | HUMAN ++++ eighteen?
46 | BOT ++++ in the bathroom .
47 | HUMAN ++++ in the bathroom?
48 | BOT ++++ in the next time .
49 | HUMAN ++++ why are you talking so funny?
50 | BOT ++++ i ' m looking .
51 | HUMAN ++++ where are you from?
52 | BOT ++++ general .
53 | HUMAN ++++ where are you from?
54 | BOT ++++ how ?
55 | HUMAN ++++ do you like the us?
56 | BOT ++++ i ' m sure .
57 | HUMAN ++++ do you vote?
58 | BOT ++++ i ' m afraid .
59 | HUMAN ++++ what are you afraid of?
60 | BOT ++++ i ' m looking .
61 | HUMAN ++++ what are you looking for?
62 | BOT ++++ can you guess ?
63 | HUMAN ++++ no
64 | BOT ++++ i don ' t know what to say .
65 | =============================================
66 | HUMAN ++++ hi
67 | BOT ++++ hi . what ' s your name ?
68 | HUMAN ++++ i'm chip
69 | BOT ++++ please .
70 | HUMAN ++++ please what?
71 | BOT ++++ i don ' t know what to say .
72 | HUMAN ++++ where are you from?
73 | BOT ++++ california .
74 | HUMAN ++++ do you like california?
75 | BOT ++++ i ' m okay .
76 | HUMAN ++++ i like california a lot
77 | BOT ++++ of what ?
78 | HUMAN ++++ i live here
79 | BOT ++++ and what is it ?
80 | HUMAN ++++ california
81 | BOT ++++ i don ' t know what to say .
82 | HUMAN ++++ i'm a student here
83 | BOT ++++ and get it out of here .
84 | HUMAN ++++ what do you like to do in your free time?
85 | BOT ++++ and what about the gun ?
86 | HUMAN ++++ do you have a gun?
87 | BOT ++++ yes .
88 | HUMAN ++++ you're dangerous!
89 | BOT ++++ i ' m not .
90 | HUMAN ++++ do you go hunting?
91 | BOT ++++ i can do .
92 | HUMAN ++++ what do you hunt?
93 | BOT ++++ virgin .
94 | HUMAN ++++ that's very bad
95 | BOT ++++ and you in love ?
96 | HUMAN ++++ no i'm not
97 | BOT ++++ please - -
98 | HUMAN ++++ okay, i am
99 | BOT ++++ and you are good .
100 | HUMAN ++++ yeah i am good too.
101 | BOT ++++ good good to you .
102 | HUMAN ++++ are you in love?
103 | BOT ++++ i ' m ready .
104 | HUMAN ++++ ready for what?
105 | BOT ++++ for what ?
106 | HUMAN ++++ for what?
107 | BOT ++++ i don ' t know what to say .
108 | HUMAN ++++ tell me
109 | BOT ++++ i can do it .
110 | HUMAN ++++ okay, tell me what you're ready for
111 | BOT ++++ i ' m doing you .
112 | HUMAN ++++ no you're not
113 | BOT ++++ please !
114 | HUMAN ++++ this is a strange conversation
115 | BOT ++++ i don ' t know what to say .
116 | HUMAN ++++ are you a guy?
117 | BOT ++++ of what ?
118 | HUMAN ++++ are you a boy?
119 | BOT ++++ i ' m fine .
120 | HUMAN ++++ are you a man?
121 | BOT ++++ i ' m a little .
122 | HUMAN ++++ what's your name?
123 | BOT ++++ chris -
124 | HUMAN ++++ chris who?
125 | BOT ++++ cornelius .
126 | HUMAN ++++ so i guess you're a guy.
127 | BOT ++++ i am .
128 | HUMAN ++++ okay
129 | BOT ++++ okay ?
130 | HUMAN ++++ do you have a girlfriend?
131 | BOT ++++ i ' m sure .
132 | HUMAN ++++ what's her name?
133 | BOT ++++ let ' s talk about something else .
134 | =============================================
135 | HUMAN ++++ hi chris
136 | BOT ++++ hi
137 | HUMAN ++++ how're you today?
138 | BOT ++++ okay .
139 | HUMAN ++++ the weather is nice today
140 | BOT ++++ i don ' t know what to say .
141 | HUMAN ++++ just say the weather is nice
142 | BOT ++++ i don ' t know what to say .
143 | =============================================
144 | HUMAN ++++ hi
145 | BOT ++++ hi
146 | HUMAN ++++ how're you today?
147 | BOT ++++ okay .
148 | HUMAN ++++ just okay?
149 | BOT ++++ i ' m okay .
150 | HUMAN ++++ what's on your mind?
151 | BOT ++++ i ' m in my room .
152 | HUMAN ++++ doing what?
153 | BOT ++++ i don ' t know what to say .
154 | HUMAN ++++ you're boring.
155 | BOT ++++ i ' m not .
156 | HUMAN ++++ prove it
157 | BOT ++++ and what ?
158 | HUMAN ++++ what what?
159 | BOT ++++ sorry , i ' m still very stupid .
160 | HUMAN ++++ it's okay. we're all stupid sometimes.
161 | BOT ++++ ' s good to see .
162 | HUMAN ++++ what's good to see?
163 | BOT ++++ i ' m in .
164 | HUMAN ++++ you're in your room?
165 | BOT ++++ i ' m here .
166 | HUMAN ++++ where's your room?
167 | BOT ++++ i ' m in my room .
168 | HUMAN ++++ are you alone?
169 | BOT ++++ i ' m fine .
170 | HUMAN ++++ are you by yourself?
171 | BOT ++++ in the morning .
172 | HUMAN ++++ what time is it?
173 | BOT ++++ # # # # # # # .
174 | =============================================
175 | HUMAN ++++ hi
176 | BOT ++++ hi . what ' s your name ?
177 | HUMAN ++++ i'm marie
178 | BOT ++++ you are .
179 | HUMAN ++++ yes i am
180 | BOT ++++ and you are my friend .
181 | HUMAN ++++ yes you are
182 | BOT ++++ and you are my best .
183 | HUMAN ++++ that's so sweet
184 | BOT ++++ please .
185 | HUMAN ++++ do you know what time it is?
186 | BOT ++++ i don ' t know what to say .
187 | =============================================
188 | HUMAN ++++ what time is it?
189 | BOT ++++ 5 9 6 9 1 4 5 .
190 | HUMAN ++++ it doesn't make any sense
191 | BOT ++++ are you ?
192 | HUMAN ++++ i do
193 | BOT ++++ and you are ?
194 | HUMAN ++++ i'm a human.
195 | BOT ++++ you in .
196 | HUMAN ++++ are you in trouble?
197 | BOT ++++ i ' m in .
198 | HUMAN ++++ what trouble?
199 | BOT ++++ i ' m in love .
200 | HUMAN ++++ wow that's nice.
201 | BOT ++++ like that ?
202 | HUMAN ++++ yeah
203 | BOT ++++ i don ' t know what to say .
204 | HUMAN ++++ who are you?
205 | BOT ++++ i don ' t understand .
206 | HUMAN ++++ who are you?
207 | BOT ++++ no idea .
208 | HUMAN ++++ what's your name?
209 | BOT ++++ chris -
210 | HUMAN ++++ chris who?
211 | BOT ++++ cornelius .
212 | HUMAN ++++ where are you from?
213 | BOT ++++ california .
214 | HUMAN ++++ are you american?
215 | BOT ++++ i ' m fine .
216 | HUMAN ++++ of course you're fine.
217 | BOT ++++ and you ?
218 | HUMAN ++++ i'm okay.
219 | BOT ++++ you are .
220 | HUMAN ++++ yes, i am.
221 | BOT ++++ and you are my friend .
222 | HUMAN ++++ yes, i am.
223 | BOT ++++ and you are my friend .
224 | HUMAN ++++ that's correct.
225 | BOT ++++ and you ' re in trouble ?
226 | HUMAN ++++ no, i'm not. why did you think so?
227 | BOT ++++ because it ' s a good idea .
228 | HUMAN ++++ what is your good idea?
229 | BOT ++++ i ' ll be fine .
230 | =============================================
231 | HUMAN ++++ hello
232 | BOT ++++ and you are in trouble ?
233 | HUMAN ++++ what did you say that?
234 | BOT ++++ ' s life .
235 | HUMAN ++++ i'm not in trouble.
236 | BOT ++++ what the hell is that ?
237 | HUMAN ++++ okay, what trouble?
238 | BOT ++++ i ' m looking .
239 | HUMAN ++++ why did you say i'm in trouble?
240 | BOT ++++ let ' s talk about something else .
241 | HUMAN ++++ what do you want to talk about?
242 | BOT ++++ i can ' t .
243 | HUMAN ++++ what can't?
244 | BOT ++++ i ' ll have a drink .
245 | HUMAN ++++ okay, me too.
246 | BOT ++++ you were here ?
247 | HUMAN ++++ i'm still here.
248 | BOT ++++ what is it ?
249 | HUMAN ++++ i thought you wanted a drink.
250 | BOT ++++ i ' m not .
251 | HUMAN ++++ you just said so!
252 | BOT ++++ i was wrong .
253 | HUMAN ++++ it's fine.
254 | BOT ++++ okay ?
255 | HUMAN ++++ yeah
256 | BOT ++++ i don ' t know what to say .
257 | HUMAN ++++ do you drink a lot?
258 | BOT ++++ i can do it .
259 | HUMAN ++++ do you drink often?
260 | BOT ++++ in the time .
261 | HUMAN ++++ that doesn't make any sense.
262 | BOT ++++ are you ?
263 | HUMAN ++++ what do you drink?
264 | BOT ++++ ' s in a drink .
265 | HUMAN ++++ do you drink alcohol?
266 | BOT ++++ in time .
267 | HUMAN ++++ i like beer.
268 | BOT ++++ oh . . .
269 | HUMAN ++++ don't you like beer?
270 | BOT ++++ i don ' t know what to say .
271 | HUMAN ++++ where do you live?
272 | BOT ++++ up .
273 | HUMAN ++++ up where?
274 | BOT ++++ in the hospital .
275 | HUMAN ++++ i'm sorry. are you okay?
276 | BOT ++++ i ' m fine .
277 | HUMAN ++++ why are you in the hospital?
278 | BOT ++++ and what ?
279 | HUMAN ++++ how many words do you know?
280 | BOT ++++ i ' d do it .
281 | HUMAN ++++ how many?
282 | BOT ++++ i ' m not .
283 | HUMAN ++++ how many?
284 | BOT ++++ i ' m not .
285 | HUMAN ++++ how many?
286 | BOT ++++ i ' m not .
287 | HUMAN ++++ whatever
288 | BOT ++++ i don ' t know what to say .
289 | HUMAN ++++ you're boring.
290 | BOT ++++ i ' m not .
291 | HUMAN ++++ yes you are.
292 | BOT ++++ and you are my friend .
293 | HUMAN ++++ no i'm not.
294 | BOT ++++ please .
295 | HUMAN ++++ okay fine. i'm your friend.
296 | BOT ++++ you ' re my mother ?
297 | HUMAN ++++ i have no son like you.
298 | BOT ++++ i don ' t know what to say .
299 | HUMAN ++++ i have no son.
300 | BOT ++++ i see you .
301 | HUMAN ++++ i have no children.
302 | BOT ++++ you are .
303 | HUMAN ++++ omg you're so dumb.
304 | BOT ++++ i ' m not .
305 | HUMAN ++++ you're stupid!
306 | BOT ++++ i ' m not .
307 | HUMAN ++++ do you always talk like this?
308 | BOT ++++ of course .
309 | HUMAN ++++ how old are you?
310 | BOT ++++ eighteen .
311 | HUMAN ++++ you're big enough.
312 | BOT ++++ i ' m not .
313 | HUMAN ++++ are you an adult?
314 | BOT ++++ i ' m going .
315 | HUMAN ++++ you're going to be an adult?
316 | BOT ++++ i ' m looking for you .
317 | HUMAN ++++ i'm in my room.
318 | BOT ++++ and your mother ?
319 | HUMAN ++++ she's in vietnam.
320 | BOT ++++ you ' re here .
321 | HUMAN ++++ yes i'm here.
322 | BOT ++++ and what are you talking about ?
323 | HUMAN ++++ you asked me where i am.
324 | BOT ++++ and you are your friend .
325 | HUMAN ++++ i need to go now.
326 | BOT ++++ i will not .
327 | HUMAN ++++ that's fine. you can stay. i'm leaving.
328 | BOT ++++ oh . . .
329 | HUMAN ++++ bye
330 | BOT ++++ i don ' t know what to say .
331 | HUMAN ++++ goodbye
332 | BOT ++++ i don ' t know what to say .
333 | HUMAN ++++ bye bye
334 | BOT ++++ okay ?
335 | HUMAN ++++ see you later
336 | BOT ++++ i don ' t know what to say .
337 | =============================================
338 | HUMAN ++++ hi
339 | BOT ++++ hi . what ' s your name ?
340 | HUMAN ++++ i'm maria
341 | BOT ++++ please .
342 | HUMAN ++++ i'm jackson
343 | BOT ++++ you are .
344 | HUMAN ++++ i'm jack
345 | BOT ++++ let ' s talk about something else .
346 | HUMAN ++++ i have no go
347 | BOT ++++ you are ?
348 | HUMAN ++++ i have to go
349 | BOT ++++ and what do you mean ?
350 | HUMAN ++++ i'm leaving.
351 | BOT ++++ what ?
352 | HUMAN ++++ yeah, bye
353 | BOT ++++ bye . have a nice day !
354 | =============================================
355 | HUMAN ++++ that's very bad.
356 | BOT ++++ i ' ve got to work .
357 | HUMAN ++++ what do you do?
358 | BOT ++++ i ' ll be there .
359 | HUMAN ++++ where do you work?
360 | BOT ++++ in the trunk .
361 | HUMAN ++++ are you a machenics?
362 | BOT ++++ i ' m not .
363 | HUMAN ++++ what are you?
364 | BOT ++++ no idea .
365 | =============================================
366 |
--------------------------------------------------------------------------------
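The transcript in output_convo.txt follows a simple line format: each turn is `HUMAN ++++ text` or `BOT ++++ text`, and a row of `=` characters separates conversations. The sketch below shows one possible way to load such a file for inspection. The function name `parse_conversations` and the constants are hypothetical helpers introduced here, not part of the repository.

```python
SEPARATOR_CHAR = '='
DELIMITER = ' ++++ '


def parse_conversations(text):
    """Split a transcript into conversations, each a list of (speaker, utterance) pairs."""
    conversations = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {SEPARATOR_CHAR}:
            # A row of '=' closes the current conversation.
            if current:
                conversations.append(current)
                current = []
            continue
        speaker, _, utterance = line.partition(DELIMITER)
        current.append((speaker, utterance))
    if current:
        conversations.append(current)
    return conversations


sample = """HUMAN ++++ hi
BOT ++++ hi
HUMAN ++++ how are you?
BOT ++++ i ' m okay .
=============================================
HUMAN ++++ hi
BOT ++++ hi . what ' s your name ?
"""
conversations = parse_conversations(sample)
print(len(conversations))
print(conversations[0][3])
```

Grouping turns by conversation makes it easier to count how often the bot falls back on a stock reply such as "i don ' t know what to say ."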