├── models
│   ├── lda_model_neg
│   ├── ldavis_prepared
│   ├── lda_model_neg.state
│   ├── lda_model_neg.id2word
│   ├── lda_model_neg.expElogbeta.npy
│   └── ldavis_html.txt
├── README.md
└── Yelp_chatbot.ipynb

/models/lda_model_neg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuyuhan0306/Yelp_CRM_NLP/HEAD/models/lda_model_neg
--------------------------------------------------------------------------------
/models/ldavis_prepared:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuyuhan0306/Yelp_CRM_NLP/HEAD/models/ldavis_prepared
--------------------------------------------------------------------------------
/models/lda_model_neg.state:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuyuhan0306/Yelp_CRM_NLP/HEAD/models/lda_model_neg.state
--------------------------------------------------------------------------------
/models/lda_model_neg.id2word:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuyuhan0306/Yelp_CRM_NLP/HEAD/models/lda_model_neg.id2word
--------------------------------------------------------------------------------
/models/lda_model_neg.expElogbeta.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yuyuhan0306/Yelp_CRM_NLP/HEAD/models/lda_model_neg.expElogbeta.npy
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Yelp_CRM_NLP
2 | The goal of this capstone project is to respond to negative reviews in real time so as to improve customer satisfaction.
3 | Using the Yelp Open Dataset published on September 1, 2017, we conducted sentiment analysis with the NLTK package to separate positive and negative reviews. To attain our goal, complaints were selected exclusively as the corpus. Then, we used the spaCy package to preprocess the complaint reviews, and the gensim package to train n-gram phrase models and Latent Dirichlet Allocation (LDA) topic models. Finally, we built an automated Telegram chatbot to demonstrate replies to different types of complaints. Some interesting insights emerged from this project:
4 | 
5 | (1) We calculated the difference between the number of stars given for each review and that reviewer’s average star rating across all reviews, yielding a column named star difference (“stars_dif”). There is a large concentration at zero star difference, which suggests that many users give the exact same rating for every review. The value of review stars is therefore questionable for sentiment classification, which motivated us to conduct sentiment analysis directly on the review text.
6 | 
7 | (2) Even a 5-star rated review can be a negative review according to our sentiment analysis.
8 | 
9 | (3) According to the LDA topic model, the three complaint categories are price, food quality, and service quality. The most problematic issue is service quality.
10 | 
11 | (4) A total of 18,749 restaurants are categorized as negatively rated dining places; 3,677 of them are in Arizona, Nevada, and California.
12 | 
13 | (5) The chatbot currently responds with a single reply to the most prominent topic. However, customer complaints sometimes mix several problems, so the chatbot could be improved to reply with a response incorporating multiple solutions. Additionally, the accuracy of the result depends on the volume of the text, so creating a common-word database and tagging those words with categories may improve performance.
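
Insight (1) can be sketched in a few lines of pandas. This is an illustrative sketch, not the project's actual code: the toy data and the `user_avg_stars` column name are assumptions, while `stars_dif` matches the column name used above.

```python
import pandas as pd

# Toy stand-in for the Yelp review table (illustrative values only)
reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "stars":   [5, 3, 2],
})

# Each reviewer's average star rating across all of their reviews,
# broadcast back onto the per-review rows
reviews["user_avg_stars"] = reviews.groupby("user_id")["stars"].transform("mean")

# Star difference: review stars minus that reviewer's overall average.
# A large spike at zero means many users give every review the same rating.
reviews["stars_dif"] = reviews["stars"] - reviews["user_avg_stars"]
print(reviews["stars_dif"].tolist())  # → [1.0, -1.0, 0.0]
```

On the real dataset, a histogram of `stars_dif` shows the concentration at zero described above, which is what casts doubt on raw stars as a sentiment label.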
14 | 
15 | We made a Shiny app to visualize the distribution of restaurants, the LDA topics, and the chatbot. If you would like to know more about the workflow, concepts, and insights, please feel free to check out our blog post: https://blog.nycdatascience.com/student-works/capstone/real-time-yelp-reviews-analysis-response-solutions-restaurant-owners/
16 | 
17 | Slides
18 | https://docs.google.com/presentation/d/1QP6qXirIbS6aCRvcUr1oJglEin3FVzJ5sI9aCQQs-cs/edit?usp=sharing
19 | 
--------------------------------------------------------------------------------
/Yelp_chatbot.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "#load packages for NLP\n",
12 | "import spacy\n",
13 | "nlp = spacy.load('en')\n",
14 | "\n",
15 | "from gensim.models import Phrases\n",
16 | "from gensim.corpora import Dictionary, MmCorpus\n",
17 | "from gensim.models.word2vec import LineSentence\n",
18 | "from gensim.models.ldamulticore import LdaMulticore\n",
19 | "\n",
20 | "import pandas as pd\n",
21 | "import numpy as np"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "#load trained n-gram and LDA models\n",
33 | "trigram_dictionary = Dictionary.load('./models/trigram_dict_all_neg.dict')\n",
34 | "lda = LdaMulticore.load('./models/lda_model_neg')\n",
35 | "bigram_model = Phrases.load('./models/bigram_model_neg.txt')\n",
36 | "trigram_model = Phrases.load('./models/trigram_model_neg.txt')\n",
37 | "\n",
38 | "def punct_space(token):\n",
39 | " \"\"\"\n",
40 | " helper function to eliminate tokens that are\n",
41 | " pure punctuation, whitespace, or numeric\n",
42 | " \"\"\"\n",
43 | " return token.is_punct or token.is_space or token.like_num or token.is_digit\n",
44 | "\n",
45 | "\n",
46 | "def 
lda_descr_chat(review_text, min_topic_freq=0):\n",
47 | " \"\"\"\n",
48 | " accept the original text of a review and (1) parse it with spaCy,\n",
49 | " (2) apply text pre-processing steps, (3) create a bag-of-words\n",
50 | " representation, (4) create an LDA representation, and\n",
51 | " (5) return the ID of the most prominent topic in the LDA representation\n",
52 | " \"\"\" \n",
53 | " # parse the review text with spaCy\n",
54 | " parsed_review = nlp(review_text)\n",
55 | " \n",
56 | " # lemmatize the text and remove punctuation and whitespace\n",
57 | " unigram_review = [token.lemma_ for token in parsed_review\n",
58 | " if not punct_space(token)]\n",
59 | " \n",
60 | " # apply the first-order and second-order phrase models\n",
61 | " bigram_review = bigram_model[unigram_review]\n",
62 | " trigram_review = trigram_model[bigram_review]\n",
63 | " \n",
64 | " # remove any remaining stopwords\n",
65 | " trigram_review = [term for term in trigram_review\n",
66 | " if term not in spacy.en.English.Defaults.stop_words]\n",
67 | " \n",
68 | " # create a bag-of-words representation\n",
69 | " review_bow = trigram_dictionary.doc2bow(trigram_review)\n",
70 | " \n",
71 | " # create an LDA representation\n",
72 | " review_lda = lda[review_bow]\n",
73 | " return max(review_lda, key=lambda item: item[1])[0]"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": null,
79 | "metadata": {
80 | "collapsed": true
81 | },
82 | "outputs": [],
83 | "source": [
84 | "#use the JSON module to parse the JSON responses from Telegram into Python dictionaries\n",
85 | "import json \n",
86 | "#make web requests using Python and use it to interact with the Telegram API \n",
87 | "import requests \n",
88 | "#handle special characters when URL-encoding the reply text\n",
89 | "import urllib.parse \n",
90 | "import time"
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": null,
96 | "metadata": {
97 | "collapsed": true
98 | },
99 | "outputs": [],
100 | "source": [
101 | "#set up global 
variables\n",
102 | "TOKEN = \"(my chatbot's token)\"\n",
103 | "URL = \"https://api.telegram.org/bot{}/\".format(TOKEN)"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": null,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "#download the content from a URL and return it as a string\n",
113 | "def get_url(url):\n",
114 | " response = requests.get(url)\n",
115 | " content = response.content.decode(\"utf8\")\n",
116 | " return content\n",
117 | "\n",
118 | "\n",
119 | "#use the JSON module to parse the JSON responses from Telegram into Python dictionaries\n",
120 | "def get_json_from_url(url):\n",
121 | " content = get_url(url)\n",
122 | " js = json.loads(content)\n",
123 | " return js\n",
124 | "\n",
125 | "\n",
126 | "#retrieve a list of \"updates\" (messages sent to YelpReviews_bot)\n",
127 | "def get_updates(offset=None):\n",
128 | " # use long polling; keeps the connection open until there are updates\n",
129 | " # in URLs, the argument list starts with a ?\n",
130 | " url = URL + \"getUpdates?timeout=100\"\n",
131 | " # don't want to receive any messages with smaller IDs than this\n",
132 | " if offset:\n",
133 | " # in URLs, further arguments are separated with &\n",
134 | " url += \"&offset={}\".format(offset)\n",
135 | " js = get_json_from_url(url)\n",
136 | " return js\n",
137 | "\n",
138 | "\n",
139 | "def get_last_update_id(updates):\n",
140 | " update_ids = []\n",
141 | " for update in updates[\"result\"]:\n",
142 | " update_ids.append(int(update[\"update_id\"]))\n",
143 | " return max(update_ids)\n",
144 | "\n",
145 | "\n",
146 | "def complaint_response(updates):\n",
147 | " for update in updates[\"result\"]:\n",
148 | " try:\n",
149 | " text = update[\"message\"][\"text\"]\n",
150 | " chat = update[\"message\"][\"chat\"][\"id\"]\n",
151 | " send_message(text, chat)\n",
152 | " except KeyError:  # skip updates that carry no text message\n",
153 | " pass \n",
154 | "\n",
155 | " \n",
156 | "def 
get_last_chat_id_and_text(updates):\n",
157 | " num_updates = len(updates[\"result\"])\n",
158 | " last_update = num_updates - 1\n",
159 | " text = updates[\"result\"][last_update][\"message\"][\"text\"]\n",
160 | " chat_id = updates[\"result\"][last_update][\"message\"][\"chat\"][\"id\"]\n",
161 | " return (text, chat_id)\n",
162 | "\n",
163 | "\n",
164 | "#send_message takes the text of the reply and the chat ID of the chat where we want to send it\n",
165 | "greeting_words = (\"hello\", \"hi\", \"hey\", \"greetings\", \"sup\", \"what's up\", \"/start\")\n",
166 | "\n",
167 | "def send_message(text, chat_id):\n",
168 | " if text.lower() in greeting_words:\n",
169 | " answer = \"Hello, I'm the Complainteller. How can I help you?\"\n",
170 | " else:\n",
171 | " complaint_type = lda_descr_chat(text)\n",
172 | " if complaint_type == 0:\n",
173 | " answer = \"We're sorry you were disappointed with your meal. If you have any suggestions, please let us know.\"\n",
174 | " elif complaint_type == 1:\n",
175 | " answer = \"We're sorry to hear that you were disappointed with our service and pricing. Unfortunately the costs are necessary to provide our quality ingredients. Thanks for your understanding.\"\n",
176 | " else:\n",
177 | " answer = \"Thank you for bringing this situation to our attention. We would like to sincerely apologize for the disappointing customer service. 
Providing quality customer service is our primary goal, and we will use this situation as an educational opportunity for our staff.\"\n",
178 | " url = URL + \"sendMessage?text={}&chat_id={}\".format(urllib.parse.quote_plus(answer), chat_id)\n",
179 | " get_url(url)\n",
180 | "\n",
181 | "\n",
182 | "def main():\n",
183 | " last_update_id = None\n",
184 | " # track the last update ID so no message is handled twice\n",
185 | " while True:\n",
186 | " # to check 'get_updates' works\n",
187 | " #print(\"getting updates\")\n",
188 | " updates = get_updates(last_update_id)\n",
189 | " #print(updates)  # inspect the current state and from/to info\n",
190 | " if len(updates[\"result\"]) > 0:\n",
191 | " last_update_id = get_last_update_id(updates) + 1\n",
192 | " complaint_response(updates)\n",
193 | " # poll Telegram for the most recent messages every half second\n",
194 | " time.sleep(0.5) \n",
195 | "\n",
196 | "\n",
197 | " \n",
198 | "#run the bot only when executed directly, so the functions can be imported elsewhere\n",
199 | "if __name__ == '__main__':\n",
200 | " main()"
201 | ]
202 | }
203 | ],
204 | "metadata": {
205 | "kernelspec": {
206 | "display_name": "Python 3",
207 | "language": "python",
208 | "name": "python3"
209 | },
210 | "language_info": {
211 | "codemirror_mode": {
212 | "name": "ipython",
213 | "version": 3
214 | },
215 | "file_extension": ".py",
216 | "mimetype": "text/x-python",
217 | "name": "python",
218 | "nbconvert_exporter": "python",
219 | "pygments_lexer": "ipython3",
220 | "version": "3.6.1"
221 | }
222 | },
223 | "nbformat": 4,
224 | "nbformat_minor": 2
225 | }
226 | 
--------------------------------------------------------------------------------
/models/ldavis_html.txt:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | 
5 | 
6 | --------------------------------------------------------------------------------