├── README.md
├── images
│   └── nn.jpg
└── src
    ├── classes_dict.py
    ├── classes_dict.pyc
    ├── ml_reply.py
    ├── model_laura.tflearn.data-00000-of-00001
    ├── model_laura.tflearn.index
    ├── model_laura.tflearn.meta
    ├── retrieval_based_Reply.py
    └── rule_based_reply.py

/README.md:
--------------------------------------------------------------------------------

# Reply Bot

Smart reply bot implemented in Python.

## What Does It Do?

The objective of this project is to create a bot that suggests short responses to a text or email message using Natural Language Processing. A reply bot is a type of chatbot. The goal of a chatbot is to mimic written or spoken speech so as to simulate a conversation with a human.

The scope of the bot is casual conversation.

From a high level, there are two variants of chatbots, **rule-based** and **self-learning**:
* A **rule-based** bot responds via pattern matching and rules.
* A **self-learning** bot responds using machine learning, and comes in two flavors:
  * A **retrieval-based** chatbot replies by locating the best response in a database (corpus).
  * A **generative-based** chatbot uses deep learning methods to respond and can generate a response it has never seen before.

To understand the tradeoffs between the different types of chatbots, I implemented a rule-based, a retrieval-based, and a machine learning-based reply bot.

## Built With

* Python
* [NLTK](https://www.nltk.org/)
* TensorFlow and TFLearn

## Design

### Rule-Based Bot

A rule-based chatbot is the simplest type of bot. It searches for predefined patterns in the input and uses a set of rules (if-else logic) to suggest replies. The suggested replies, in this case, are predefined. We can implement pattern matching on an input message with regular expressions and apply the rules with if-else statements.

For example, to search for a greeting we can use this as our regex pattern:

```python
greeting_str = r'hi|hello|sup|hey|good\s+(morning|afternoon|evening)'
```

If the pattern is found, we suggest predefined greeting responses:

```python
greeting_response = ["hi", "hey", "hello"]
```

Altogether we have:

```python
if re.search(greeting_str, user_input):
    return greeting_response
```

For the reply bot, I have defined 8 simple rules:
* Greeting
* Goodbye
* How are you
* Thank you
* Do you/will you/can you/would you
* Are you
* When
* What's up

Each rule has an associated reply:

```python
greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
thank_response = ['happy to help', 'don\'t mention it', 'my pleasure']
inquiry_response = ['i\'m doing ok', 'ok', 'i\'ve been better']
future_response = ['yes', 'no', 'maybe']
what_you_response = ['nothing', 'not much']
are_you_response = ['yes', 'no', 'maybe']
when_response = ['soon', 'not now']
no_response = ['[No Suggestion]']
```
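Putting the pieces together, a minimal end-to-end version of the bot looks like the sketch below; it is condensed from src/rule_based_reply.py and wires up only two of the eight rules:

```python
import re

greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
no_response = ['[No Suggestion]']

def bot_response(user_input):
    # first matching rule wins
    if re.match(r'hi|hi\s+there|hello|hey|good\s+(morning|afternoon|evening|day)', user_input):
        return greeting_response
    elif re.match(r'goodbye|bye|see\s+ya|gotta\s+go', user_input):
        return goodbye_response
    else:
        return no_response

for reply in bot_response('hello'):
    print('* ' + reply)  # prints: * hi, * hello, * hey
```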
### Retrieval-Based Bot

The goal of a retrieval bot is to find the most likely response r given the input i. A retrieval-based bot responds to input using a database of utterance-response (Q-R) pairs, known as our corpus. To generate a response, we first classify the user input into one of our predefined classes (if applicable). Once classified, the response associated with that class is suggested.

Below is our database/corpus (in this case, a Python dictionary), which we will use for our self-learning bots.

```python
classes_dict = {}

classes_dict["greeting"] = {}
classes_dict["greeting"]["pattern"] = ["hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["greeting"]["response"] = ["hi", "hey", "hello"]

classes_dict["goodbye"] = {}
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["goodbye"]["response"] = ["bye", "talk to you later"]

classes_dict["thanks"] = {}
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["thanks"]["response"] = ["you're welcome", "my pleasure", "don't mention it"]

classes_dict["how are you"] = {}
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["how are you"]["response"] = ["i'm doing ok", "ok", "i've been better"]

classes_dict["future"] = {}
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["future"]["response"] = ["yes", "no", "maybe"]

classes_dict["are you"] = {}
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["are you"]["response"] = ["yes", "no", "maybe"]

classes_dict["when"] = {}
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["when"]["response"] = ["soon", "not now"]

classes_dict["whats up"] = {}
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
classes_dict["whats up"]["response"] = ["nothing", "not much"]
```

We have 8 classes into which user input is classified. Each class in classes_dict (i.e., "greeting", "goodbye", etc.) has two sub-dictionaries, "pattern" and "response." "pattern" defines a list of expressions that we use to match user input to that class. "response" defines the suggested replies when a match to that class has been found.

For a retrieval-based bot, we do not determine the class of the user input with pattern matching; instead we deconstruct the user input into a feature vector via text processing [4]. Basic text pre-processing includes:

* Converting text into uppercase or lowercase
* Tokenizing – converting normal text strings into a list of tokens
* Removing noise – removing everything that isn't a number or letter (i.e., punctuation)
* Stemming – reducing words to their stem, base, or root form
* Lemmatization – a variant of stemming. Stemming can create non-existent words, whereas lemmas are actual words. For example, the words "better" and "good" share the same lemma [5]
* Removing stop words – removing extremely common words that would be of little value to the algorithm

Once the user input is pre-processed, we transform it into a vector of numbers (one number per word) using Term Frequency-Inverse Document Frequency (TF-IDF). TF-IDF assigns each word a weight: a statistical measure of how important that word is.
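As a rough sketch of the weighting (this is the standard formulation; scikit-learn's TfidfVectorizer, used below, adds smoothing on top of it):

```
tf-idf(t, d) = tf(t, d) * log(N / df(t))
```

Here tf(t, d) is how often term t appears in document d, df(t) is the number of documents containing t, and N is the total number of documents. Roughly, a word that appears across many classes (e.g., "you") gets a low weight, while a word unique to one class (e.g., "thanks") gets a high one.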
We can then use cosine similarity to compare user input to the corpus [1]:

```python
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lem_tokens(tokens):

    lemmer = nltk.stem.WordNetLemmatizer()

    return [lemmer.lemmatize(token) for token in tokens]

def lem_normalize(text):

    return lem_tokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

# add the user input to the corpus as one more "document"
sent_tokens['user'] = user_input
sent_tokens_ = []

for value in sent_tokens:
    sent_tokens_.append(sent_tokens[value])

tfidf_vec = TfidfVectorizer(tokenizer=lem_normalize)
tfidf = tfidf_vec.fit_transform(sent_tokens_)
vals = cosine_similarity(tfidf[-1], tfidf)
# the highest similarity is the user input matched against itself,
# so take the second-highest entry (this assumes 'user' is iterated last)
idx = vals.argsort()[0][-2]
flat = vals.flatten()
flat.sort()
req_tfidf = flat[-2]
```

The class with the highest similarity to the user input is chosen and that class's response is suggested.

```python
error_threshold = 0.1
if req_tfidf < error_threshold:

    robo_response = ["[No Suggestion]"]

else:

    # map the best-matching pattern string back to its class
    for value in sent_tokens:
        match_pattern = sent_tokens_[idx]
        pattern = sent_tokens[value]
        if match_pattern == pattern:
            match_class = value

    # my_dict is the classes_dict passed into my_response()
    robo_response = my_dict[match_class]['response']
```

### Machine Learning-Based Bot

This type of reply bot uses a Neural Network (NN) to suggest a reply. The NN classifies the user input into one of the predefined classes, and we suggest the replies associated with that class [2].

The first step in creating the input training data is to build an array of the words found in our corpus, which we call our 'bag of words.' Recall all the class patterns found in the corpus:

```python
classes_dict["greeting"]["pattern"] = ["hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
```

We tokenize, stem, and sort these patterns to create a bag-of-words array (the sketch after the word list shows how). We have 42 known words in our corpus:

```python
my_words = ["'s", 'afternoon', 'ar', 'bye', 'can', 'day', 'do', 'doing', 'ev', 'go', 'going', 'good', 'goodby', 'got', 'hap', 'hav', 'hello', 'hey', 'hi', 'how', 'i', 'is', 'it', 'lat', 'morn', "n't", 'on', 'see', 'should', 'sup', 'ta', 'talk', 'thank', 'ther', 'to', 'up', 'what', 'when', 'wil', 'would', 'ya', 'you']
```
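This vocabulary is produced by tokenizing every pattern with NLTK and stemming each token with the Lancaster stemmer; the following is condensed from src/ml_reply.py:

```python
import nltk
from nltk.stem.lancaster import LancasterStemmer

from classes_dict import classes_dict

stemmer = LancasterStemmer()
ignore_words = ['?']

my_words = []
for some_class in classes_dict:
    for some_pattern in classes_dict[some_class]["pattern"]:
        for some_word in nltk.word_tokenize(some_pattern):
            if some_word not in ignore_words:
                my_words.append(stemmer.stem(some_word.lower()))

my_words = sorted(set(my_words))  # dedupe and sort
```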
Likewise, we need to create a sorted array for our output classes:

```python
my_classes = ['are you', 'future', 'goodbye', 'greeting', 'how are you', 'thanks', 'whats up', 'when']
```

We can now create our training data. Let's use the 'how are you' class as an example.

```python
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
```

In this class, one pattern is 'how are you doing.' We can tokenize and stem it into a list:

```python
my_pattern = ['how', 'ar', 'you', 'doing']
```

We then create an input training list by placing a 1 at each index where a my_pattern word is found in our list of known words (my_words) and a 0 otherwise. For the pattern 'how are you doing,' the bag has a 1 at the positions of 'ar', 'doing', 'how', and 'you':

```python
bag = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Since this pattern belongs to the 'how are you' class (index 4 of my_classes), the corresponding output row is:

```python
output_row = [0, 0, 0, 0, 1, 0, 0, 0]
```

The architecture of our NN is simple, as can be seen below: just 2 hidden layers.

![NN architecture](images/nn.jpg)

We train this for 1,000 epochs, which is quite fast: the model trains completely in under 30 seconds.
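The model definition and training call below are condensed from src/ml_reply.py, where train_x and train_y are the stacked bag/output_row lists built above (the source ships a pre-trained model, so its fit/save calls are commented out):

```python
import tensorflow as tf
import tflearn

tf.reset_default_graph()  # reset underlying graph data (TF1-era API)

# input layer sized to the 42-word bag, two hidden layers of 8 units,
# softmax output over the 8 classes
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model_laura.tflearn')  # the name the repo's pre-trained model uses
```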
Once the model is trained, we can use it to predict a class for user input. Again we define an error threshold for when to offer a response; here it is 95%. Instead of using an error threshold to determine whether a message gets a smart reply, we could train a binary classifier on all messages to determine whether they will get one [1].

```python
def tokenize_input(sentence):

    sentence_words = nltk.word_tokenize(sentence)  # tokenize pattern

    # stem and lowercase each word
    for some_word in sentence_words:
        sentence_words[sentence_words.index(some_word)] = stemmer.stem(some_word.lower())

    return sentence_words

def bag(user_input):

    input_words = tokenize_input(user_input)

    bag = [0] * len(my_words)

    for input_word in input_words:
        for i in range(0, len(my_words)):
            if input_word == my_words[i]:
                bag[i] = 1

    return np.array(bag)

error_threshold = 0.95

def classify(user_input):

    results = model.predict([bag(user_input)])[0]  # generate probabilities from the model

    filtered_results = []

    for i in range(0, len(results)):
        if results[i] > error_threshold:
            filtered_results.append([i, results[i]])

    filtered_results.sort(key=lambda x: x[1], reverse=True)

    return_list = []

    for i in range(0, len(filtered_results)):
        return_list.append((my_classes[filtered_results[i][0]], filtered_results[i][1]))

    return return_list  # list of (intent, probability) tuples

some_array = classify(user_input)

if len(some_array) != 0:

    for response in classes_dict[some_array[0][0]]["response"]:
        print('* ' + response)
```

## Usage

The scripts are written for Python 2 (they use raw_input). Execute one of the bots, e.g.

```
python rule_based_reply.py
```

(retrieval_based_Reply.py and ml_reply.py run the same way.) The program will then request a message:

```
* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!
>>>
```

Try different messages and the reply bot will generate suggested replies or [No Suggestion]:

```
>>> hello
* hi
* hey
* hello
```

To quit, type quit:

```
>>> quit
```
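Before the first run, the NLTK tokenizer and lemmatizer data must be downloaded (see the commented nltk.download lines in src/retrieval_based_Reply.py), along with the libraries the sources import. A minimal setup sketch, assuming a Python 2 environment where TF1-era packages are still installable:

```
pip install nltk numpy scikit-learn tensorflow tflearn
python -c "import nltk; nltk.download('punkt'); nltk.download('wordnet')"
```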
## Thoughts

No single type of reply bot works best. A retrieval-based bot performs poorly for casual conversation and works better for factual queries. For casual conversation, a hybrid implementation (rule-based plus machine learning-based) may work best.

## Author

**Laura Kocubinski** [laurakoco](https://github.com/laurakoco)

## Acknowledgments

* Boston University MET Master of Science Computer Science Program
* MET CS 664 Artificial Intelligence

[1] https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e

[2] https://chatbotsmagazine.com/contextual-chat-bots-with-tensorflow-4391749d0077
--------------------------------------------------------------------------------
/images/nn.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/images/nn.jpg
--------------------------------------------------------------------------------
/src/classes_dict.py:
--------------------------------------------------------------------------------

classes_dict = {}

classes_dict["greeting"] = {}
classes_dict["greeting"]["pattern"] = ["hi", "hi there", "hello", "hey", "good afternoon", "good morning", "good evening", "good day"]
classes_dict["greeting"]["response"] = ["hi", "hey", "hello"]

classes_dict["goodbye"] = {}
classes_dict["goodbye"]["pattern"] = ["bye", "goodbye", "see you later", "gotta go", "i have to go", "see you", "see ya", "talk to you later"]
classes_dict["goodbye"]["response"] = ["bye", "talk to you later"]

classes_dict["thanks"] = {}
classes_dict["thanks"]["pattern"] = ["thanks", "thank you"]
classes_dict["thanks"]["response"] = ["you're welcome", "my pleasure", "don't mention it"]

classes_dict["how are you"] = {}
classes_dict["how are you"]["pattern"] = ["how are you", "how are you doing", "how's it going"]
classes_dict["how are you"]["response"] = ["i'm doing ok", "ok", "i've been better"]

classes_dict["future"] = {}
classes_dict["future"]["pattern"] = ["will you", "can you", "would you", "do you"]
classes_dict["future"]["response"] = ["yes", "no", "maybe"]

classes_dict["are you"] = {}
classes_dict["are you"]["pattern"] = ["are you", "aren't you"]
classes_dict["are you"]["response"] = ["yes", "no", "maybe"]

classes_dict["when"] = {}
classes_dict["when"]["pattern"] = ["when will", "when can", "when would", "when should"]
classes_dict["when"]["response"] = ["soon", "not now"]

classes_dict["whats up"] = {}
classes_dict["whats up"]["pattern"] = ["sup", "what's up", "what up", "whats up", "what is going on", "what's going on", "what's happening", "what are you up to", "what are you doing"]
classes_dict["whats up"]["response"] = ["nothing", "not much"]
--------------------------------------------------------------------------------
/src/classes_dict.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/classes_dict.pyc
--------------------------------------------------------------------------------
/src/ml_reply.py:
--------------------------------------------------------------------------------

import nltk
import numpy as np
import tflearn
import tensorflow as tf
import random
import pickle

from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()

from classes_dict import *

my_words = []
my_classes = []
my_doc = []
ignore_words = ['?']

# pre-process text from classes_dict: tokenize and stem every pattern
for some_class in classes_dict:

    my_classes.append(some_class)

    for some_pattern in classes_dict[some_class]["pattern"]:

        temp_words = []

        word_tokens = nltk.word_tokenize(some_pattern)

        for some_word in word_tokens:
            if some_word not in ignore_words:
                stemmed_word = stemmer.stem(some_word.lower())
                my_words.append(stemmed_word)
                temp_words.append(stemmed_word)

        my_doc.append((temp_words, some_class))

my_words = sorted(list(set(my_words)))  # remove duplicate words
my_classes = sorted(list(set(my_classes)))

training = []
output = []
output_empty = [0] * len(my_classes)

for some_doc in my_doc:

    bag = []
    pattern_words = some_doc[0]

    # create bag-of-words array
    for some_word in my_words:
        if some_word in pattern_words:
            bag.append(1)
        else:
            bag.append(0)

    # one-hot output row for this pattern's class
    output_row = list(output_empty)
    output_row[my_classes.index(some_doc[1])] = 1

    training.append([bag, output_row])

# shuffle training data and put into array
random.shuffle(training)
training = np.array(training)

# create train lists
train_x = list(training[:, 0])
train_y = list(training[:, 1])

# reset underlying graph data
tf.reset_default_graph()

# build nn model
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# define model and set up tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')

# train model
# model.fit(train_x, train_y, n_epoch = 500, batch_size = 8, show_metric = True)
# model.save('model_stupid.tflearn')
# pickle.dump( {'my_words':my_words, 'my_classes':my_classes, 'train_x':train_x, 'train_y':train_y}, open( "training_data", "wb" ) )
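# NOTE: the repo ships a pre-trained model (the model_laura.tflearn.* files),
# which is loaded below. To retrain from scratch, uncomment the fit/save lines
# above and point save() at the file name that load() expects.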
# load model
model.load('./model_laura.tflearn')

def tokenize_input(sentence):

    sentence_words = nltk.word_tokenize(sentence)  # tokenize pattern

    # stem and lowercase each word
    for some_word in sentence_words:
        sentence_words[sentence_words.index(some_word)] = stemmer.stem(some_word.lower())

    return sentence_words

def bag(user_input):

    input_words = tokenize_input(user_input)

    bag = [0] * len(my_words)

    for input_word in input_words:
        for i in range(0, len(my_words)):
            if input_word == my_words[i]:
                bag[i] = 1

    return np.array(bag)

error_threshold = 0.95

def classify(user_input):

    results = model.predict([bag(user_input)])[0]  # generate probabilities from the model

    filtered_results = []

    for i in range(0, len(results)):
        if results[i] > error_threshold:
            filtered_results.append([i, results[i]])

    filtered_results.sort(key=lambda x: x[1], reverse=True)

    return_list = []

    for i in range(0, len(filtered_results)):
        return_list.append((my_classes[filtered_results[i][0]], filtered_results[i][1]))

    return return_list  # list of (intent, probability) tuples

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()  # get input and convert to lowercase

    if user_input != "quit":

        some_array = classify(user_input)

        if len(some_array) != 0:

            for response in classes_dict[some_array[0][0]]["response"]:
                print('* ' + response)

        else:

            print('[No Suggestion]')

    else:

        flag = False
--------------------------------------------------------------------------------
/src/model_laura.tflearn.data-00000-of-00001:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.data-00000-of-00001
--------------------------------------------------------------------------------
/src/model_laura.tflearn.index:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.index
--------------------------------------------------------------------------------
/src/model_laura.tflearn.meta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/laurakoco/reply-bot/1f6d0da885fd27812b048c5d4dd176f52d58483e/src/model_laura.tflearn.meta
--------------------------------------------------------------------------------
/src/retrieval_based_Reply.py:
--------------------------------------------------------------------------------

import nltk
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from classes_dict import *

# first run: nltk.download('punkt') and nltk.download('wordnet')

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lem_tokens(tokens):

    lemmer = nltk.stem.WordNetLemmatizer()

    return [lemmer.lemmatize(token) for token in tokens]

def lem_normalize(text):

    return lem_tokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
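# my_response() adds the user input to the corpus as one more "document",
# vectorizes everything with TF-IDF, and scores the cosine similarity between
# the user input and each class's concatenated patterns. The top score is the
# input matched against itself, so the second-highest entry is the best class.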
def my_response(my_dict, user_input, sent_tokens):

    robo_response = ''

    sent_tokens['user'] = user_input

    sent_tokens_ = []

    for value in sent_tokens:
        sent_tokens_.append(sent_tokens[value])

    # tfidf_vec = TfidfVectorizer(tokenizer = lem_normalize, stop_words = 'english')
    tfidf_vec = TfidfVectorizer(tokenizer=lem_normalize)

    tfidf = tfidf_vec.fit_transform(sent_tokens_)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]

    error_threshold = 0.1
    if req_tfidf < error_threshold:

        robo_response = ["[No Suggestion]"]
        return robo_response

    else:

        # map the best-matching pattern string back to its class
        for value in sent_tokens:
            match_pattern = sent_tokens_[idx]
            pattern = sent_tokens[value]
            if match_pattern == pattern:
                match_class = value

        robo_response = my_dict[match_class]['response']
        return robo_response

def post_dict(some_dict):

    # join each class's patterns into a single "document" string
    sent_tokens = {}

    for value in some_dict:
        words = some_dict[value]["pattern"]
        words = ' '.join(words)
        sent_tokens[value] = words
        word_tokens = nltk.word_tokenize(words)

    return sent_tokens, word_tokens

sent_tokens, word_tokens = post_dict(classes_dict)

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()

    if user_input != "quit":

        response = my_response(classes_dict, user_input, sent_tokens)

        for i in range(0, len(response)):
            print('* ' + response[i])

        del sent_tokens['user']  # drop the user "document" for the next round

    else:

        flag = False
--------------------------------------------------------------------------------
/src/rule_based_reply.py:
--------------------------------------------------------------------------------

import re

greeting_response = ["hi", "hello", "hey"]
goodbye_response = ["bye", "talk to you later"]
thank_response = ['happy to help', 'don\'t mention it', 'my pleasure']
inquiry_response = ['i\'m doing ok', 'ok', 'i\'ve been better']
future_response = ['yes', 'no', 'maybe']
what_you_response = ['nothing', 'not much']
are_you_response = ['yes', 'no', 'maybe']
when_response = ['soon', 'not now']
no_response = ['[No Suggestion]']
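# bot_response() tests the rules in order with if/elif, so the first matching
# pattern wins and its canned replies are returned; input that matches no rule
# falls through to no_response.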
def bot_response(user_input):

    # pattern search

    # greeting
    if re.match(r'hi|hi\s+there|hello|hey|good\s+(morning|afternoon|evening|day)', user_input):
        return greeting_response

    # goodbye
    elif re.match(r'goodbye|bye|see\s+ya|gotta\s+go', user_input):
        return goodbye_response

    # how are you
    elif re.match(r'how\s+are\s+you.*|how.*going', user_input):
        return inquiry_response

    # thanks
    elif re.match(r'thank', user_input):
        return thank_response

    # future (will you, can you, would you, do you)
    elif re.search(r'(will|can|would|do)\s+you', user_input):
        return future_response

    # are you
    elif re.match(r'are.*you', user_input):
        return are_you_response

    # when
    elif re.match(r'when.*you', user_input):
        return when_response

    # what's up
    elif re.match(r'sup|what.*(happening|up|you)', user_input):
        return what_you_response

    else:
        return no_response

print("* Hello! Type in a message and I will suggest some replies! If you'd like to exit please type quit!")

flag = True

while flag:

    user_input = raw_input('>>> ').lower()  # get input and convert to lowercase

    if not re.search('quit', user_input):

        response = bot_response(user_input)

        for i in range(0, len(response)):
            print('* ' + response[i])

    else:

        flag = False
--------------------------------------------------------------------------------