├── README.md
├── images
│   └── nn.jpg
└── src
    ├── classes_dict.py
    ├── classes_dict.pyc
    ├── ml_reply.py
    ├── model_laura.tflearn.data-00000-of-00001
    ├── model_laura.tflearn.index
    ├── model_laura.tflearn.meta
    ├── retrieval_based_Reply.py
    └── rule_based_reply.py

/README.md:
--------------------------------------------------------------------------------

# Reply Bot

Smart reply bot implemented in Python.

## What Does It Do?

The objective of this project is to create a bot that suggests short responses to a text or email message using Natural Language Processing. A reply bot is a type of chatbot. The goal of a chatbot is to mimic written or spoken speech so as to simulate a conversation with a human.

The scope of the bot is casual conversation.

From a high level, there are two variants of chatbots, **rule-based** and **self-learning**:
* A **rule-based** bot responds via pattern matching and rules.
* A **self-learning** bot responds using machine learning, and comes in two flavors:
  * A **retrieval-based** chatbot replies by locating the best response in a database (corpus).
  * A **generative-based** chatbot uses deep learning methods to respond and can generate a response it has never seen before.

To understand the tradeoffs between the different types of chatbots, I implemented a rule-based, a retrieval-based, and a machine learning-based reply bot.

## Built With

* Python
* [NLTK](https://www.nltk.org/)
* TensorFlow and TFLearn

## Design

### Rule-Based Bot

A rule-based chatbot is the simplest type of bot. It searches for predefined patterns in the input and uses a set of rules (if-else logic) to suggest replies. The suggested replies, in this case, are predefined. We can implement pattern matching on an input message with regular expressions and apply the rules with if-else statements.

For example, to search for a greeting we can use this as our regex pattern:

```python
greeting_str = r'hi|hello|sup|hey|good\s+(morning|afternoon|evening)'
```

If the pattern is found, we suggest predefined greeting responses:

```python
greeting_response = ["hi", "hey", "hello"]
```

Altogether we have:

```python
if re.search(greeting_str, user_input):
    return greeting_response
```

For the reply bot, I have defined 8 simple rules:
* Greeting
* Goodbye
* How are you
* Thank you
* Do you/will you/can you/would you
* Are you
* When
* What's up

Each rule has an associated reply:

```python
greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
thank_response = ['happy to help', 'don\'t mention it', 'my pleasure']
inquiry_response = ['i\'m doing ok', 'ok', 'i\'ve been better']
future_response = ['yes', 'no', 'maybe']
what_you_response = ['nothing', 'not much']
are_you_response = ['yes', 'no', 'maybe']
when_response = ['soon', 'not now']
no_response = ['[No Suggestion]']
```
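Putting the pieces together, a minimal end-to-end version of the bot looks like the sketch below; it is condensed from src/rule_based_reply.py and wires up only two of the eight rules:

```python
import re

greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
no_response = ['[No Suggestion]']

def bot_response(user_input):
    # first matching rule wins
    if re.match(r'hi|hi\s+there|hello|hey|good\s+(morning|afternoon|evening|day)', user_input):
        return greeting_response
    elif re.match(r'goodbye|bye|see\s+ya|gotta\s+go', user_input):
        return goodbye_response
    else:
        return no_response

for reply in bot_response('hello'):
    print('* ' + reply)  # prints: * hi, * hello, * hey
```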
### Retrieval-Based Bot

The goal of a retrieval bot is to find the most likely response r given the input i. A retrieval-based bot responds to input using a database of utterance-response (Q-R) pairs, known as our corpus. To generate a response, we first classify the user input into one of our predefined classes (if applicable). Once classified, the response associated with that class is suggested.

Below is our database/corpus (in this case, a Python dictionary), which we will use for our self-learning bots.

```python
classes_dict = {}

classes_dict["greeting"] = {}
classes_dict["greeting"]["pattern"] = ["hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["greeting"]["response"] = ["hi", "hey", "hello"]

classes_dict["goodbye"] = {}
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["goodbye"]["response"] = ["bye", "talk to you later"]

classes_dict["thanks"] = {}
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["thanks"]["response"] = ["you're welcome", "my pleasure", "don't mention it"]

classes_dict["how are you"] = {}
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["how are you"]["response"] = ["i'm doing ok", "ok", "i've been better"]

classes_dict["future"] = {}
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["future"]["response"] = ["yes", "no", "maybe"]

classes_dict["are you"] = {}
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["are you"]["response"] = ["yes", "no", "maybe"]

classes_dict["when"] = {}
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["when"]["response"] = ["soon", "not now"]

classes_dict["whats up"] = {}
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
classes_dict["whats up"]["response"] = ["nothing", "not much"]
```

We have 8 classes into which user input is classified. Each class in classes_dict (i.e., "greeting", "goodbye", etc.) has two sub-dictionaries, "pattern" and "response." "pattern" defines a list of expressions that we use to match user input to that class. "response" defines the suggested replies when a match to that class has been found.

For a retrieval-based bot, we do not determine the class of the user input with pattern matching; instead we deconstruct the user input into a feature vector via text processing [4]. Basic text pre-processing includes:

* Converting text into uppercase or lowercase
* Tokenizing – converting normal text strings into a list of tokens
* Removing noise – removing everything that isn't a number or letter (i.e., punctuation)
* Stemming – reducing words to their stem, base, or root form
* Lemmatization – a variant of stemming. Stemming can create non-existent words, whereas lemmas are actual words. For example, the words "better" and "good" share the same lemma [5]
* Removing stop words – removing extremely common words that would be of little value to the algorithm

Once the user input is pre-processed, we transform it into a vector of numbers (one number per word) using Term Frequency-Inverse Document Frequency (TF-IDF). TF-IDF assigns each word a weight: a statistical measure of how important that word is.
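As a rough sketch of the weighting (this is the standard formulation; scikit-learn's TfidfVectorizer, used below, adds smoothing on top of it):

```
tf-idf(t, d) = tf(t, d) * log(N / df(t))
```

Here tf(t, d) is how often term t appears in document d, df(t) is the number of documents containing t, and N is the total number of documents. Roughly, a word that appears across many classes (e.g., "you") gets a low weight, while a word unique to one class (e.g., "thanks") gets a high one.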
We can then use cosine similarity to compare user input to the corpus [1]:

```python
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lem_tokens(tokens):

    lemmer = nltk.stem.WordNetLemmatizer()

    return [lemmer.lemmatize(token) for token in tokens]

def lem_normalize(text):

    return lem_tokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

# add the user input to the corpus as one more "document"
sent_tokens['user'] = user_input
sent_tokens_ = []

for value in sent_tokens:
    sent_tokens_.append(sent_tokens[value])

tfidf_vec = TfidfVectorizer(tokenizer=lem_normalize)
tfidf = tfidf_vec.fit_transform(sent_tokens_)
vals = cosine_similarity(tfidf[-1], tfidf)
# the highest similarity is the user input matched against itself,
# so take the second-highest entry (this assumes 'user' is iterated last)
idx = vals.argsort()[0][-2]
flat = vals.flatten()
flat.sort()
req_tfidf = flat[-2]
```

The class with the highest similarity to the user input is chosen and that class's response is suggested.

```python
error_threshold = 0.1
if req_tfidf < error_threshold:

    robo_response = ["[No Suggestion]"]

else:

    # map the best-matching pattern string back to its class
    for value in sent_tokens:
        match_pattern = sent_tokens_[idx]
        pattern = sent_tokens[value]
        if match_pattern == pattern:
            match_class = value

    # my_dict is the classes_dict passed into my_response()
    robo_response = my_dict[match_class]['response']
```

### Machine Learning-Based Bot

This type of reply bot uses a Neural Network (NN) to suggest a reply. The NN classifies the user input into one of the predefined classes, and we suggest the replies associated with that class [2].

The first step in creating the input training data is to build an array of the words found in our corpus, which we call our 'bag of words.' Recall all the class patterns found in the corpus:

```python
classes_dict["greeting"]["pattern"] = ["hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
```

We tokenize, stem, and sort these patterns to create a bag-of-words array (the sketch after the word list shows how). We have 42 known words in our corpus:

```python
my_words = ["'s", 'afternoon', 'ar', 'bye', 'can', 'day', 'do', 'doing', 'ev', 'go', 'going', 'good', 'goodby', 'got', 'hap', 'hav', 'hello', 'hey', 'hi', 'how', 'i', 'is', 'it', 'lat', 'morn', "n't", 'on', 'see', 'should', 'sup', 'ta', 'talk', 'thank', 'ther', 'to', 'up', 'what', 'when', 'wil', 'would', 'ya', 'you']
```
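This vocabulary is produced by tokenizing every pattern with NLTK and stemming each token with the Lancaster stemmer; the following is condensed from src/ml_reply.py:

```python
import nltk
from nltk.stem.lancaster import LancasterStemmer

from classes_dict import classes_dict

stemmer = LancasterStemmer()
ignore_words = ['?']

my_words = []
for some_class in classes_dict:
    for some_pattern in classes_dict[some_class]["pattern"]:
        for some_word in nltk.word_tokenize(some_pattern):
            if some_word not in ignore_words:
                my_words.append(stemmer.stem(some_word.lower()))

my_words = sorted(set(my_words))  # dedupe and sort
```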
Likewise, we need to create a sorted array for our output classes:

```python
my_classes = ['are you', 'future', 'goodbye', 'greeting', 'how are you', 'thanks', 'whats up', 'when']
```

We can now create our training data. Let's use the 'how are you' class as an example.

```python
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
```

In this class, one pattern is 'how are you doing.' We can tokenize and stem it into a list:

```python
my_pattern = ['how', 'ar', 'you', 'doing']
```

We then create an input training list by placing a 1 at each index where a my_pattern word is found in our list of known words (my_words) and a 0 otherwise. For the pattern 'how are you doing,' the bag has a 1 at the positions of 'ar', 'doing', 'how', and 'you':

```python
bag = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Since this pattern belongs to the 'how are you' class (index 4 of my_classes), the corresponding output row is:

```python
output_row = [0, 0, 0, 0, 1, 0, 0, 0]
```

The architecture of our NN is simple, as can be seen below: just 2 hidden layers.

![NN architecture](images/nn.jpg)

We train this for 1,000 epochs, which is quite fast: the model trains completely in under 30 seconds.
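The model definition and training call below are condensed from src/ml_reply.py, where train_x and train_y are the stacked bag/output_row lists built above (the source ships a pre-trained model, so its fit/save calls are commented out):

```python
import tensorflow as tf
import tflearn

tf.reset_default_graph()  # reset underlying graph data (TF1-era API)

# input layer sized to the 42-word bag, two hidden layers of 8 units,
# softmax output over the 8 classes
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model_laura.tflearn')  # the name the repo's pre-trained model uses
```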
Once the model is trained, we can use it to predict a class for user input. Again we define an error threshold for when to offer a response; here it is 95%. Instead of using an error threshold to determine whether a message gets a smart reply, we could train a binary classifier on all messages to determine whether they will get one [1].

```python
def tokenize_input(sentence):

    sentence_words = nltk.word_tokenize(sentence)  # tokenize pattern

    # stem and lowercase each word
    for some_word in sentence_words:
        sentence_words[sentence_words.index(some_word)] = stemmer.stem(some_word.lower())

    return sentence_words

def bag(user_input):

    input_words = tokenize_input(user_input)

    bag = [0] * len(my_words)

    for input_word in input_words:
        for i in range(0, len(my_words)):
            if input_word == my_words[i]:
                bag[i] = 1

    return np.array(bag)

error_threshold = 0.95

def classify(user_input):

    results = model.predict([bag(user_input)])[0]  # generate probabilities from the model

    filtered_results = []

    for i in range(0, len(results)):
        if results[i] > error_threshold:
            filtered_results.append([i, results[i]])

    filtered_results.sort(key=lambda x: x[1], reverse=True)

    return_list = []

    for i in range(0, len(filtered_results)):
        return_list.append((my_classes[filtered_results[i][0]], filtered_results[i][1]))

    return return_list  # list of (intent, probability) tuples

some_array = classify(user_input)

if len(some_array) != 0:

    for response in classes_dict[some_array[0][0]]["response"]:
        print('* ' + response)
```

## Usage

The scripts are written for Python 2 (they use raw_input). Execute one of the bots, e.g.

```
python rule_based_reply.py
```

(retrieval_based_Reply.py and ml_reply.py run the same way.) The program will then request a message:

```
* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!
>>>
```

Try different messages and the reply bot will generate suggested replies or [No Suggestion]:

```
>>> hello
* hi
* hey
* hello
```

To quit, type quit:

```
>>> quit
```
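Before the first run, the NLTK tokenizer and lemmatizer data must be downloaded (see the commented nltk.download lines in src/retrieval_based_Reply.py), along with the libraries the sources import. A minimal setup sketch, assuming a Python 2 environment where TF1-era packages are still installable:

```
pip install nltk numpy scikit-learn tensorflow tflearn
python -c "import nltk; nltk.download('punkt'); nltk.download('wordnet')"
```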
## Thoughts

No single type of reply bot works best. A retrieval-based bot performs poorly for casual conversation and works better for factual queries. For casual conversation, a hybrid implementation (rule-based plus machine learning-based) may work best.

## Author

**Laura Kocubinski** [laurakoco](https://github.com/laurakoco)

## Acknowledgments

* Boston University MET Master of Science Computer Science Program
* MET CS 664 Artificial Intelligence

[1] https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e

[2] https://chatbotsmagazine.com/contextual-chat-bots-with-tensorflow-4391749d0077
--------------------------------------------------------------------------------
/images/nn.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/images/nn.jpg
--------------------------------------------------------------------------------
/src/classes_dict.py:
--------------------------------------------------------------------------------

classes_dict = {}

classes_dict["greeting"] = {}
classes_dict["greeting"]["pattern"] = ["hi", "hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["greeting"]["response"] = ["hi", "hey", "hello"]

classes_dict["goodbye"] = {}
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["goodbye"]["response"] = ["bye", "talk to you later"]

classes_dict["thanks"] = {}
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["thanks"]["response"] = ["you're welcome", "my pleasure", "don't mention it"]

classes_dict["how are you"] = {}
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["how are you"]["response"] = ["i'm doing ok", "ok", "i've been better"]

classes_dict["future"] = {}
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["future"]["response"] = ["yes", "no", "maybe"]

classes_dict["are you"] = {}
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["are you"]["response"] = ["yes", "no", "maybe"]

classes_dict["when"] = {}
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["when"]["response"] = ["soon", "not now"]

classes_dict["whats up"] = {}
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
classes_dict["whats up"]["response"] = ["nothing", "not much"]
--------------------------------------------------------------------------------
/src/classes_dict.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/classes_dict.pyc
--------------------------------------------------------------------------------
/src/ml_reply.py:
--------------------------------------------------------------------------------

import nltk
import numpy as np
import tflearn
import tensorflow as tf
import random
import pickle

from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()

from classes_dict import *

my_words = []
my_classes = []
my_doc = []
ignore_words = ['?']

# pre-process text from classes_dict: tokenize and stem every pattern
for some_class in classes_dict:

    my_classes.append(some_class)

    for some_pattern in classes_dict[some_class]["pattern"]:

        temp_words = []

        word_tokens = nltk.word_tokenize(some_pattern)

        for some_word in word_tokens:
            if some_word not in ignore_words:
                stemmed_word = stemmer.stem(some_word.lower())
                my_words.append(stemmed_word)
                temp_words.append(stemmed_word)

        my_doc.append((temp_words, some_class))

my_words = sorted(list(set(my_words)))  # remove duplicate words
my_classes = sorted(list(set(my_classes)))

training = []
output = []
output_empty = [0] * len(my_classes)

for some_doc in my_doc:

    bag = []
    pattern_words = some_doc[0]

    # create bag-of-words array
    for some_word in my_words:
        if some_word in pattern_words:
            bag.append(1)
        else:
            bag.append(0)

    # one-hot output row for this pattern's class
    output_row = list(output_empty)
    output_row[my_classes.index(some_doc[1])] = 1

    training.append([bag, output_row])

# shuffle training data and put into array
random.shuffle(training)
training = np.array(training)

# create train lists
train_x = list(training[:, 0])
train_y = list(training[:, 1])

# reset underlying graph data
tf.reset_default_graph()

# build nn model
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# define model and set up tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')

# train model
# model.fit(train_x, train_y, n_epoch = 500, batch_size = 8, show_metric = True)
# model.save('model_stupid.tflearn')
# pickle.dump( {'my_words':my_words, 'my_classes':my_classes, 'train_x':train_x, 'train_y':train_y}, open( "training_data", "wb" ) )
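# NOTE: the repo ships a pre-trained model (the model_laura.tflearn.* files),
# which is loaded below. To retrain from scratch, uncomment the fit/save lines
# above and point save() at the file name that load() expects.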
# load model
model.load('./model_laura.tflearn')

def tokenize_input(sentence):

    sentence_words = nltk.word_tokenize(sentence)  # tokenize pattern

    # stem and lowercase each word
    for some_word in sentence_words:
        sentence_words[sentence_words.index(some_word)] = stemmer.stem(some_word.lower())

    return sentence_words

def bag(user_input):

    input_words = tokenize_input(user_input)

    bag = [0] * len(my_words)

    for input_word in input_words:
        for i in range(0, len(my_words)):
            if input_word == my_words[i]:
                bag[i] = 1

    return np.array(bag)

error_threshold = 0.95

def classify(user_input):

    results = model.predict([bag(user_input)])[0]  # generate probabilities from the model

    filtered_results = []

    for i in range(0, len(results)):
        if results[i] > error_threshold:
            filtered_results.append([i, results[i]])

    filtered_results.sort(key=lambda x: x[1], reverse=True)

    return_list = []

    for i in range(0, len(filtered_results)):
        return_list.append((my_classes[filtered_results[i][0]], filtered_results[i][1]))

    return return_list  # list of (intent, probability) tuples

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()  # get input and convert to lowercase

    if user_input != "quit":

        some_array = classify(user_input)

        if len(some_array) != 0:

            for response in classes_dict[some_array[0][0]]["response"]:
                print('* ' + response)

        else:

            print('[No Suggestion]')

    else:

        flag = False
--------------------------------------------------------------------------------
/src/model_laura.tflearn.data-00000-of-00001:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.data-00000-of-00001
--------------------------------------------------------------------------------
/src/model_laura.tflearn.index:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.index
--------------------------------------------------------------------------------
/src/model_laura.tflearn.meta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.meta
--------------------------------------------------------------------------------
/src/retrieval_based_Reply.py:
--------------------------------------------------------------------------------

import nltk
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from classes_dict import *

# first run: nltk.download('punkt') and nltk.download('wordnet')

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lem_tokens(tokens):

    lemmer = nltk.stem.WordNetLemmatizer()

    return [lemmer.lemmatize(token) for token in tokens]

def lem_normalize(text):

    return lem_tokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
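# my_response() adds the user input to the corpus as one more "document",
# vectorizes everything with TF-IDF, and scores the cosine similarity between
# the user input and each class's concatenated patterns. The top score is the
# input matched against itself, so the second-highest entry is the best class.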
def my_response(my_dict, user_input, sent_tokens):

    robo_response = ''

    sent_tokens['user'] = user_input

    sent_tokens_ = []

    for value in sent_tokens:
        sent_tokens_.append(sent_tokens[value])

    # tfidf_vec = TfidfVectorizer(tokenizer = lem_normalize, stop_words = 'english')
    tfidf_vec = TfidfVectorizer(tokenizer=lem_normalize)

    tfidf = tfidf_vec.fit_transform(sent_tokens_)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]

    error_threshold = 0.1
    if req_tfidf < error_threshold:

        robo_response = ["[No Suggestion]"]
        return robo_response

    else:

        # map the best-matching pattern string back to its class
        for value in sent_tokens:
            match_pattern = sent_tokens_[idx]
            pattern = sent_tokens[value]
            if match_pattern == pattern:
                match_class = value

        robo_response = my_dict[match_class]['response']
        return robo_response

def post_dict(some_dict):

    # join each class's patterns into a single "document" string
    sent_tokens = {}

    for value in some_dict:
        words = some_dict[value]["pattern"]
        words = ' '.join(words)
        sent_tokens[value] = words
        word_tokens = nltk.word_tokenize(words)

    return sent_tokens, word_tokens

sent_tokens, word_tokens = post_dict(classes_dict)

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()

    if user_input != "quit":

        response = my_response(classes_dict, user_input, sent_tokens)

        for i in range(0, len(response)):
            print('* ' + response[i])

        del sent_tokens['user']  # drop the user "document" for the next round

    else:

        flag = False
--------------------------------------------------------------------------------
/src/rule_based_reply.py:
--------------------------------------------------------------------------------

import re

greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
thank_response = ['happy to help', 'don\'t mention it', 'my pleasure']
inquiry_response = ['i\'m doing ok', 'ok', 'i\'ve been better']
future_response = ['yes', 'no', 'maybe']
what_you_response = ['nothing', 'not much']
are_you_response = ['yes', 'no', 'maybe']
when_response = ['soon', 'not now']
no_response = ['[No Suggestion]']
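# bot_response() tests the rules in order with if/elif, so the first matching
# pattern wins and its canned replies are returned; input that matches no rule
# falls through to no_response.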
def bot_response(user_input):

    # pattern search

    # greeting
    if re.match(r'hi|hi\s+there|hello|hey|good\s+(morning|afternoon|evening|day)', user_input):
        return greeting_response

    # goodbye
    elif re.match(r'goodbye|bye|see\s+ya|gotta\s+go', user_input):
        return goodbye_response

    # how are you
    elif re.match(r'how\s+are\s+you.*|how.*going', user_input):
        return inquiry_response

    # thanks
    elif re.match(r'thank', user_input):
        return thank_response

    # future (will you, can you, would you, do you)
    elif re.search(r'(will|can|would|do)\s+you', user_input):
        return future_response

    # are you
    elif re.match(r'are.*you', user_input):
        return are_you_response

    # when
    elif re.match(r'when.*you', user_input):
        return when_response

    # what's up
    elif re.match(r'sup|what.*(happening|up|you)', user_input):
        return what_you_response

    else:
        return no_response

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()  # get input and convert to lowercase

    if not re.search('quit', user_input):

        response = bot_response(user_input)

        for i in range(0, len(response)):
            print('* ' + response[i])

    else:

        flag = False
--------------------------------------------------------------------------------