├── Presentation.pptx ├── Data └── customer_booking.csv ├── completion_certificate.pdf ├── Predicitve Model Results.pptx ├── Predictive Modeling ├── customer_booking.csv └── .ipynb_checkpoints │ └── Explore-customer-data-checkpoint.ipynb ├── README.md ├── Data Collection ├── getting_started.ipynb ├── .ipynb_checkpoints │ ├── getting_started-checkpoint.ipynb │ └── Data-collection-checkpoint.ipynb └── Data-collection.ipynb └── Data Cleaning ├── Data Cleaning.ipynb └── .ipynb_checkpoints └── Data Cleaning-checkpoint.ipynb /Presentation.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hseju/British-Airways-Good-or-Bad/HEAD/Presentation.pptx -------------------------------------------------------------------------------- /Data/customer_booking.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hseju/British-Airways-Good-or-Bad/HEAD/Data/customer_booking.csv -------------------------------------------------------------------------------- /completion_certificate.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hseju/British-Airways-Good-or-Bad/HEAD/completion_certificate.pdf -------------------------------------------------------------------------------- /Predicitve Model Results.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hseju/British-Airways-Good-or-Bad/HEAD/Predicitve Model Results.pptx -------------------------------------------------------------------------------- /Predictive Modeling/customer_booking.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hseju/British-Airways-Good-or-Bad/HEAD/Predictive Modeling/customer_booking.csv -------------------------------------------------------------------------------- 
/README.md: -------------------------------------------------------------------------------- 1 | # British Airways Good or Bad? 2 | 3 | This project is part of the Data Science virtual internship program offered by Forage with British Airways. 4 | 5 | ## The virtual internship is divided into two main tasks 6 | 7 | 1. Web scraping to gain company insights 8 | 2. Predicting customer buying behaviour 9 | 10 |
11 | 12 | ### Task 1 - Web scraping to gain company insights 13 | 14 | Customers who book a flight with BA will experience many interaction points with the BA brand. Understanding a customer's feelings, needs, and feedback is crucial for any business, including BA. 15 | 16 | This first task is focused on scraping and collecting customer feedback and review data from a third-party source and analysing this data to present any insights you may uncover. 17 | 18 | Customer review data for British Airways was collected from [Skytrax](https://www.airlinequality.com/airline-reviews/british-airways). 19 | 20 | The following insights were uncovered and are summarised in the one-slide presentation. 21 | 22 | ![image](https://user-images.githubusercontent.com/89634505/201470985-159e17d2-605d-46c1-a9f1-8d0cdd147245.png) 23 | 24 | ### Task 2 - Predicting customer buying behaviour 25 | 26 | Customers are more empowered than ever because they have access to a wealth of information at their fingertips. This is one of the reasons the buying cycle is very different to what it used to be. Today, if you’re hoping that a customer purchases your flights or holidays as they come into the airport, you’ve already lost! Being reactive in this situation is not ideal; airlines must be proactive in order to acquire customers before they embark on their holiday. 27 | 28 | This task involves building a high-quality predictive model to predict successful bookings using customer booking data. 
29 | 30 | ![image](https://user-images.githubusercontent.com/89634505/201471191-cdd85024-1691-4136-b9f8-b4b1d8d42f72.png) 31 | -------------------------------------------------------------------------------- /Data Collection/getting_started.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Task 1\n", 8 | "\n", 9 | "---\n", 10 | "\n", 11 | "## Web scraping and analysis\n", 12 | "\n", 13 | "This Jupyter notebook includes some code to get you started with web scraping. We will use a package called `BeautifulSoup` to collect the data from the web. Once you've collected your data and saved it into a local `.csv` file you should start with your analysis.\n", 14 | "\n", 15 | "### Scraping data from Skytrax\n", 16 | "\n", 17 | "If you visit [https://www.airlinequality.com] you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the Airline itself.\n", 18 | "\n", 19 | "If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. Now, we can use `Python` and `BeautifulSoup` to collect all the links to the reviews and then to collect the text data on each of the individual review links." 
20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 16, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import requests\n", 29 | "from bs4 import BeautifulSoup\n", 30 | "import pandas as pd" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 14, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "Scraping page 1\n", 43 | " ---> 100 total reviews\n", 44 | "Scraping page 2\n", 45 | " ---> 200 total reviews\n", 46 | "Scraping page 3\n", 47 | " ---> 300 total reviews\n", 48 | "Scraping page 4\n", 49 | " ---> 400 total reviews\n", 50 | "Scraping page 5\n", 51 | " ---> 500 total reviews\n", 52 | "Scraping page 6\n", 53 | " ---> 600 total reviews\n", 54 | "Scraping page 7\n", 55 | " ---> 700 total reviews\n", 56 | "Scraping page 8\n", 57 | " ---> 800 total reviews\n", 58 | "Scraping page 9\n", 59 | " ---> 900 total reviews\n", 60 | "Scraping page 10\n", 61 | " ---> 1000 total reviews\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "base_url = \"https://www.airlinequality.com/airline-reviews/british-airways\"\n", 67 | "pages = 10\n", 68 | "page_size = 100\n", 69 | "\n", 70 | "reviews = []\n", 71 | "\n", 72 | "# for i in range(1, pages + 1):\n", 73 | "for i in range(1, pages + 1):\n", 74 | "\n", 75 | " print(f\"Scraping page {i}\")\n", 76 | "\n", 77 | " # Create URL to collect links from paginated data\n", 78 | " url = f\"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}\"\n", 79 | "\n", 80 | " # Collect HTML data from this page\n", 81 | " response = requests.get(url)\n", 82 | "\n", 83 | " # Parse content\n", 84 | " content = response.content\n", 85 | " parsed_content = BeautifulSoup(content, 'html.parser')\n", 86 | " for para in parsed_content.find_all(\"div\", {\"class\": \"text_content\"}):\n", 87 | " reviews.append(para.get_text())\n", 88 | " \n", 89 | " print(f\" ---> {len(reviews)} total reviews\")" 90 | ] 91 | }, 92 | { 93 | 
"cell_type": "code", 94 | "execution_count": 17, 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/html": [ 100 | "
" 145 | ], 146 | "text/plain": [ 147 | " reviews\n", 148 | "0 Not Verified | LHR-LAX. I prefer the Boeing 7...\n", 149 | "1 ✅ Trip Verified | Flew back to UK from Miami ...\n", 150 | "2 ✅ Trip Verified | I flew with hand baggage bu...\n", 151 | "3 Not Verified | London to Cairo. First, on this...\n", 152 | "4 ✅ Trip Verified | This review is specifically..." 153 | ] 154 | }, 155 | "execution_count": 17, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "df = pd.DataFrame()\n", 162 | "df[\"reviews\"] = reviews\n", 163 | "df.head()" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 18, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "df.to_csv(\"data/BA_reviews.csv\")" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Congratulations! Now you have your dataset for this task! The loops above collected 1000 reviews by iterating through the paginated pages on the website. However, if you want to collect more data, try increasing the number of pages!\n", 180 | "\n", 181 | " The next thing that you should do is clean this data to remove any unnecessary text from each of the rows. For example, \"✅ Trip Verified\" can be removed from each row if it exists, as it's not relevant to what we want to investigate." 
182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [] 190 | } 191 | ], 192 | "metadata": { 193 | "kernelspec": { 194 | "display_name": "Python 3 (ipykernel)", 195 | "language": "python", 196 | "name": "python3" 197 | }, 198 | "language_info": { 199 | "codemirror_mode": { 200 | "name": "ipython", 201 | "version": 3 202 | }, 203 | "file_extension": ".py", 204 | "mimetype": "text/x-python", 205 | "name": "python", 206 | "nbconvert_exporter": "python", 207 | "pygments_lexer": "ipython3", 208 | "version": "3.9.7" 209 | }, 210 | "vscode": { 211 | "interpreter": { 212 | "hash": "4f7924c4c56b083e0e50eadfe7ef592a7a8ef70df33a0047f82280e6be1afe15" 213 | } 214 | } 215 | }, 216 | "nbformat": 4, 217 | "nbformat_minor": 4 218 | } 219 | -------------------------------------------------------------------------------- /Data Collection/.ipynb_checkpoints/getting_started-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Task 1\n", 8 | "\n", 9 | "---\n", 10 | "\n", 11 | "## Web scraping and analysis\n", 12 | "\n", 13 | "This Jupyter notebook includes some code to get you started with web scraping. We will use a package called `BeautifulSoup` to collect the data from the web. Once you've collected your data and saved it into a local `.csv` file you should start with your analysis.\n", 14 | "\n", 15 | "### Scraping data from Skytrax\n", 16 | "\n", 17 | "If you visit [https://www.airlinequality.com] you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the Airline itself.\n", 18 | "\n", 19 | "If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. 
Now, we can use `Python` and `BeautifulSoup` to collect all the links to the reviews and then to collect the text data on each of the individual review links." 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 16, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import requests\n", 29 | "from bs4 import BeautifulSoup\n", 30 | "import pandas as pd" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 14, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "Scraping page 1\n", 43 | " ---> 100 total reviews\n", 44 | "Scraping page 2\n", 45 | " ---> 200 total reviews\n", 46 | "Scraping page 3\n", 47 | " ---> 300 total reviews\n", 48 | "Scraping page 4\n", 49 | " ---> 400 total reviews\n", 50 | "Scraping page 5\n", 51 | " ---> 500 total reviews\n", 52 | "Scraping page 6\n", 53 | " ---> 600 total reviews\n", 54 | "Scraping page 7\n", 55 | " ---> 700 total reviews\n", 56 | "Scraping page 8\n", 57 | " ---> 800 total reviews\n", 58 | "Scraping page 9\n", 59 | " ---> 900 total reviews\n", 60 | "Scraping page 10\n", 61 | " ---> 1000 total reviews\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "base_url = \"https://www.airlinequality.com/airline-reviews/british-airways\"\n", 67 | "pages = 10\n", 68 | "page_size = 100\n", 69 | "\n", 70 | "reviews = []\n", 71 | "\n", 72 | "# for i in range(1, pages + 1):\n", 73 | "for i in range(1, pages + 1):\n", 74 | "\n", 75 | " print(f\"Scraping page {i}\")\n", 76 | "\n", 77 | " # Create URL to collect links from paginated data\n", 78 | " url = f\"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}\"\n", 79 | "\n", 80 | " # Collect HTML data from this page\n", 81 | " response = requests.get(url)\n", 82 | "\n", 83 | " # Parse content\n", 84 | " content = response.content\n", 85 | " parsed_content = BeautifulSoup(content, 'html.parser')\n", 86 | " for para in parsed_content.find_all(\"div\", {\"class\": 
\"text_content\"}):\n", 87 | " reviews.append(para.get_text())\n", 88 | " \n", 89 | " print(f\" ---> {len(reviews)} total reviews\")" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 17, 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/html": [ 100 | "
" 145 | ], 146 | "text/plain": [ 147 | " reviews\n", 148 | "0 Not Verified | LHR-LAX. I prefer the Boeing 7...\n", 149 | "1 ✅ Trip Verified | Flew back to UK from Miami ...\n", 150 | "2 ✅ Trip Verified | I flew with hand baggage bu...\n", 151 | "3 Not Verified | London to Cairo. First, on this...\n", 152 | "4 ✅ Trip Verified | This review is specifically..." 153 | ] 154 | }, 155 | "execution_count": 17, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "df = pd.DataFrame()\n", 162 | "df[\"reviews\"] = reviews\n", 163 | "df.head()" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 18, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "df.to_csv(\"data/BA_reviews.csv\")" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Congratulations! Now you have your dataset for this task! The loops above collected 1000 reviews by iterating through the paginated pages on the website. However, if you want to collect more data, try increasing the number of pages!\n", 180 | "\n", 181 | " The next thing that you should do is clean this data to remove any unnecessary text from each of the rows. For example, \"✅ Trip Verified\" can be removed from each row if it exists, as it's not relevant to what we want to investigate." 
182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [] 190 | } 191 | ], 192 | "metadata": { 193 | "kernelspec": { 194 | "display_name": "Python 3.9.13 ('venv': venv)", 195 | "language": "python", 196 | "name": "python3" 197 | }, 198 | "language_info": { 199 | "codemirror_mode": { 200 | "name": "ipython", 201 | "version": 3 202 | }, 203 | "file_extension": ".py", 204 | "mimetype": "text/x-python", 205 | "name": "python", 206 | "nbconvert_exporter": "python", 207 | "pygments_lexer": "ipython3", 208 | "version": "3.9.13" 209 | }, 210 | "orig_nbformat": 4, 211 | "vscode": { 212 | "interpreter": { 213 | "hash": "4f7924c4c56b083e0e50eadfe7ef592a7a8ef70df33a0047f82280e6be1afe15" 214 | } 215 | } 216 | }, 217 | "nbformat": 4, 218 | "nbformat_minor": 2 219 | } 220 | -------------------------------------------------------------------------------- /Data Collection/.ipynb_checkpoints/Data-collection-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "8d47a312-9fd2-44a8-b43e-aaaddaf670c4", 6 | "metadata": {}, 7 | "source": [ 8 | "## Data Collection\n", 9 | "\n", 10 | "In this phase we will collect the customer ratings data from the airline quality website called [Skytrax](https://www.airlinequality.com/airline-reviews/british-airways). We will collect data about airline ratings, seat ratings and lounge experience ratings from this website. 
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 199, 16 | "id": "5f43dc5f-ffc2-4ba0-8a73-3b70e9ab0fcf", 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "#imports\n", 21 | "\n", 22 | "import pandas as pd\n", 23 | "import numpy as np\n", 24 | "from bs4 import BeautifulSoup\n", 25 | "import requests " 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 200, 31 | "id": "d5c2c023-7500-4e89-94f7-ddff4864d83c", 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "#create an empty list to collect all reviews\n", 36 | "reviews = []\n", 37 | "\n", 38 | "#create an empty list to collect rating stars\n", 39 | "stars = []\n", 40 | "\n", 41 | "#create an empty list to collect date\n", 42 | "date = []\n", 43 | "\n", 44 | "#create an empty list to collect country the reviewer is from\n", 45 | "country = []" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 201, 51 | "id": "2894e93a-cbd4-45b5-b262-95f88994d83a", 52 | "metadata": {}, 53 | "outputs": [ 54 | { 55 | "name": "stdout", 56 | "output_type": "stream", 57 | "text": [ 58 | "Error on page 29\n", 59 | "Error on page 30\n", 60 | "Error on page 30\n", 61 | "Error on page 33\n", 62 | "Error on page 33\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "for i in range(1, 36):\n", 68 | " page = requests.get(f\"https://www.airlinequality.com/airline-reviews/british-airways/page/{i}/?sortby=post_date%3ADesc&pagesize=100\")\n", 69 | " \n", 70 | " soup = BeautifulSoup(page.content, \"html5\")\n", 71 | " \n", 72 | " for item in soup.find_all(\"div\", class_=\"text_content\"):\n", 73 | " reviews.append(item.text)\n", 74 | " \n", 75 | " for item in soup.find_all(\"div\", class_ = \"rating-10\"):\n", 76 | " try:\n", 77 | " stars.append(item.span.text)\n", 78 | " except:\n", 79 | " print(f\"Error on page {i}\")\n", 80 | " stars.append(\"None\")\n", 81 | " \n", 82 | " #date\n", 83 | " for item in soup.find_all(\"time\"):\n", 84 | " 
date.append(item.text)\n", 85 | " \n", 86 | " #country\n", 87 | " for item in soup.find_all(\"h3\"):\n", 88 | " country.append(item.span.next_sibling.text.strip(\" ()\"))" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 202, 94 | "id": "0982838a-78be-4c65-82c1-351c2075fa6d", 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/plain": [ 100 | "3418" 101 | ] 102 | }, 103 | "execution_count": 202, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "#check the total number of reviews extracted\n", 110 | "len(reviews)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 203, 116 | "id": "82fb4e85-e73e-493d-a0db-b84da71b616d", 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "3418" 123 | ] 124 | }, 125 | "execution_count": 203, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "len(country)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 204, 137 | "id": "df8251db-2d1d-40da-b6e6-a383d09808be", 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "#trim stars to match the number of reviews (3418)\n", 142 | "stars = stars[:3418]" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 205, 148 | "id": "79213047-71dd-4907-8caf-d9a9d157f37a", 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "#create a dataframe from these collected lists of data\n", 153 | "\n", 154 | "df = pd.DataFrame({\"reviews\":reviews,\"stars\": stars, \"date\":date, \"country\": country})" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 206, 160 | "id": "c6532f95-e13b-4fb9-83f3-300c35c20cf6", 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/html": [ 166 | "
" 229 | ], 230 | "text/plain": [ 231 | " reviews \\\n", 232 | "0 Not Verified | Worst experience ever. Outbound... \n", 233 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 234 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 235 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 236 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 237 | "\n", 238 | " stars date country \n", 239 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 240 | "1 1 7th November 2022 Malaysia \n", 241 | "2 5 5th November 2022 United Arab Emirates \n", 242 | "3 1 31st October 2022 United States \n", 243 | "4 1 31st October 2022 United States " 244 | ] 245 | }, 246 | "execution_count": 206, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "df.head()" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 207, 258 | "id": "366096e8-d1f2-4033-a831-5447e52861c2", 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "(3418, 4)" 265 | ] 266 | }, 267 | "execution_count": 207, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "df.shape" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "id": "fc219e55-fdc8-4b10-a3e9-8dda764ee275", 279 | "metadata": {}, 280 | "source": [ 281 | "### Export the data into a csv format" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 208, 287 | "id": "b286e642-59a2-4da3-9120-e9db119ebea6", 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [ 291 | "import os\n", 292 | "\n", 293 | "cwd = os.getcwd()\n", 294 | "df.to_csv(cwd+ \"/BA_reviews.csv\")" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "id": "eb1d69e4-3f4c-4b21-8b04-ebccfb7d6332", 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [] 304 | }, 305 | { 306 | "cell_type": "code", 307 | 
"execution_count": null, 308 | "id": "591157d3-bd0c-4305-9975-c1b09cb680d3", 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [] 312 | } 313 | ], 314 | "metadata": { 315 | "kernelspec": { 316 | "display_name": "Python 3 (ipykernel)", 317 | "language": "python", 318 | "name": "python3" 319 | }, 320 | "language_info": { 321 | "codemirror_mode": { 322 | "name": "ipython", 323 | "version": 3 324 | }, 325 | "file_extension": ".py", 326 | "mimetype": "text/x-python", 327 | "name": "python", 328 | "nbconvert_exporter": "python", 329 | "pygments_lexer": "ipython3", 330 | "version": "3.9.7" 331 | } 332 | }, 333 | "nbformat": 4, 334 | "nbformat_minor": 5 335 | } 336 | -------------------------------------------------------------------------------- /Data Collection/Data-collection.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "8d47a312-9fd2-44a8-b43e-aaaddaf670c4", 6 | "metadata": {}, 7 | "source": [ 8 | "## Data Collection\n", 9 | "\n", 10 | "In this phase we will collect the customer ratings data from the airline quality website called [Skytrax](https://www.airlinequality.com/airline-reviews/british-airways). We will collect data about airline ratings, seat ratings and lounge experience ratings from this website. 
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 199, 16 | "id": "5f43dc5f-ffc2-4ba0-8a73-3b70e9ab0fcf", 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "#imports\n", 21 | "\n", 22 | "import pandas as pd\n", 23 | "import numpy as np\n", 24 | "from bs4 import BeautifulSoup\n", 25 | "import requests " 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 200, 31 | "id": "d5c2c023-7500-4e89-94f7-ddff4864d83c", 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "#create an empty list to collect all reviews\n", 36 | "reviews = []\n", 37 | "\n", 38 | "#create an empty list to collect rating stars\n", 39 | "stars = []\n", 40 | "\n", 41 | "#create an empty list to collect date\n", 42 | "date = []\n", 43 | "\n", 44 | "#create an empty list to collect country the reviewer is from\n", 45 | "country = []" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 201, 51 | "id": "2894e93a-cbd4-45b5-b262-95f88994d83a", 52 | "metadata": {}, 53 | "outputs": [ 54 | { 55 | "name": "stdout", 56 | "output_type": "stream", 57 | "text": [ 58 | "Error on page 29\n", 59 | "Error on page 30\n", 60 | "Error on page 30\n", 61 | "Error on page 33\n", 62 | "Error on page 33\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "for i in range(1, 36):\n", 68 | " page = requests.get(f\"https://www.airlinequality.com/airline-reviews/british-airways/page/{i}/?sortby=post_date%3ADesc&pagesize=100\")\n", 69 | " \n", 70 | " soup = BeautifulSoup(page.content, \"html5\")\n", 71 | " \n", 72 | " for item in soup.find_all(\"div\", class_=\"text_content\"):\n", 73 | " reviews.append(item.text)\n", 74 | " \n", 75 | " for item in soup.find_all(\"div\", class_ = \"rating-10\"):\n", 76 | " try:\n", 77 | " stars.append(item.span.text)\n", 78 | " except:\n", 79 | " print(f\"Error on page {i}\")\n", 80 | " stars.append(\"None\")\n", 81 | " \n", 82 | " #date\n", 83 | " for item in soup.find_all(\"time\"):\n", 84 | " 
date.append(item.text)\n", 85 | " \n", 86 | " #country\n", 87 | " for item in soup.find_all(\"h3\"):\n", 88 | " country.append(item.span.next_sibling.text.strip(\" ()\"))" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 202, 94 | "id": "0982838a-78be-4c65-82c1-351c2075fa6d", 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "data": { 99 | "text/plain": [ 100 | "3418" 101 | ] 102 | }, 103 | "execution_count": 202, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "#check the total number of reviews extracted\n", 110 | "len(reviews)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 203, 116 | "id": "82fb4e85-e73e-493d-a0db-b84da71b616d", 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "3418" 123 | ] 124 | }, 125 | "execution_count": 203, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "len(country)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 204, 137 | "id": "df8251db-2d1d-40da-b6e6-a383d09808be", 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "#trim stars to match the number of reviews (3418)\n", 142 | "stars = stars[:3418]" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 205, 148 | "id": "79213047-71dd-4907-8caf-d9a9d157f37a", 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "#create a dataframe from these collected lists of data\n", 153 | "\n", 154 | "df = pd.DataFrame({\"reviews\":reviews,\"stars\": stars, \"date\":date, \"country\": country})" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 206, 160 | "id": "c6532f95-e13b-4fb9-83f3-300c35c20cf6", 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/html": [ 166 | "
" 229 | ], 230 | "text/plain": [ 231 | " reviews \\\n", 232 | "0 Not Verified | Worst experience ever. Outbound... \n", 233 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 234 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 235 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 236 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 237 | "\n", 238 | " stars date country \n", 239 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 240 | "1 1 7th November 2022 Malaysia \n", 241 | "2 5 5th November 2022 United Arab Emirates \n", 242 | "3 1 31st October 2022 United States \n", 243 | "4 1 31st October 2022 United States " 244 | ] 245 | }, 246 | "execution_count": 206, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "df.head()" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 207, 258 | "id": "366096e8-d1f2-4033-a831-5447e52861c2", 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "(3418, 4)" 265 | ] 266 | }, 267 | "execution_count": 207, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "df.shape" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "id": "fc219e55-fdc8-4b10-a3e9-8dda764ee275", 279 | "metadata": {}, 280 | "source": [ 281 | "### Export the data into a csv format" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 208, 287 | "id": "b286e642-59a2-4da3-9120-e9db119ebea6", 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [ 291 | "import os\n", 292 | "\n", 293 | "cwd = os.getcwd()\n", 294 | "df.to_csv(cwd+ \"/BA_reviews.csv\")" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "id": "eb1d69e4-3f4c-4b21-8b04-ebccfb7d6332", 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [] 304 | }, 305 | { 306 | "cell_type": "code", 307 | 
"execution_count": null, 308 | "id": "591157d3-bd0c-4305-9975-c1b09cb680d3", 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "id": "360952f1-decb-4f41-abbf-54c5ac7b9e59", 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [] 320 | } 321 | ], 322 | "metadata": { 323 | "kernelspec": { 324 | "display_name": "Python 3 (ipykernel)", 325 | "language": "python", 326 | "name": "python3" 327 | }, 328 | "language_info": { 329 | "codemirror_mode": { 330 | "name": "ipython", 331 | "version": 3 332 | }, 333 | "file_extension": ".py", 334 | "mimetype": "text/x-python", 335 | "name": "python", 336 | "nbconvert_exporter": "python", 337 | "pygments_lexer": "ipython3", 338 | "version": "3.9.7" 339 | } 340 | }, 341 | "nbformat": 4, 342 | "nbformat_minor": 5 343 | } 344 | -------------------------------------------------------------------------------- /Data Cleaning/Data Cleaning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "e00b5383-b915-4a16-8ed0-476f979052b7", 6 | "metadata": {}, 7 | "source": [ 8 | "## Data Cleaning\n", 9 | "\n", 10 | "Now that we have extracted the data from the website, it still needs to be cleaned before it can be analyzed. The reviews column will need to be cleaned of punctuation, spelling issues and other stray characters. 
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 83, 16 | "id": "69c14a0d-d637-4b7e-8714-31fb531b4954", 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "#imports\n", 21 | "\n", 22 | "import pandas as pd\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import seaborn as sns\n", 25 | "import os\n", 26 | "\n", 27 | "#regex\n", 28 | "import re" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 84, 34 | "id": "c46b06dc-b9f5-4b9f-b52b-7f9b01ff7200", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "#create a dataframe from csv file\n", 39 | "\n", 40 | "cwd = os.getcwd()\n", 41 | "\n", 42 | "df = pd.read_csv(cwd+\"/BA_reviews.csv\", index_col=0)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 85, 48 | "id": "888d1113-bafd-4e6c-a020-c0e7b721ce50", 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "
" 117 | ], 118 | "text/plain": [ 119 | " reviews \\\n", 120 | "0 Not Verified | Worst experience ever. Outbound... \n", 121 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 122 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 123 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 124 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 125 | "\n", 126 | " stars date country \n", 127 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 128 | "1 1 7th November 2022 Malaysia \n", 129 | "2 5 5th November 2022 United Arab Emirates \n", 130 | "3 1 31st October 2022 United States \n", 131 | "4 1 31st October 2022 United States " 132 | ] 133 | }, 134 | "execution_count": 85, 135 | "metadata": {}, 136 | "output_type": "execute_result" 137 | } 138 | ], 139 | "source": [ 140 | "df.head()" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "id": "7aeb0b06-23f8-48bb-b2d4-b91455040038", 146 | "metadata": {}, 147 | "source": [ 148 | "We will also create a column which mentions if the user is verified or not. " 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 86, 154 | "id": "45908798-e042-4d71-8e01-395f53f2005d", 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "df['verified'] = df.reviews.str.contains(\"Trip Verified\")" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 87, 164 | "id": "ae4e6e23-4ecc-4c94-8526-5f645ff2f6a2", 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "0 False\n", 171 | "1 True\n", 172 | "2 True\n", 173 | "3 True\n", 174 | "4 True\n", 175 | " ... 
\n", 176 | "3413 False\n", 177 | "3414 False\n", 178 | "3415 False\n", 179 | "3416 False\n", 180 | "3417 False\n", 181 | "Name: verified, Length: 3418, dtype: bool" 182 | ] 183 | }, 184 | "execution_count": 87, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "df['verified']" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "id": "cf266fe0-0c33-477a-9f1e-529a214e91c5", 196 | "metadata": {}, 197 | "source": [ 198 | "### Cleaning Reviews" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "id": "ac2bc6a6-8783-4873-b5e7-1719f997cdf2", 204 | "metadata": {}, 205 | "source": [ 206 | "We will extract the reviews column into a separate Series and clean it for semantic analysis." 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 88, 212 | "id": "e4319089-b957-4b60-8f07-fc8cd3487f9e", 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "#for lemmatization of words we will use nltk library\n", 217 | "from nltk.stem import WordNetLemmatizer\n", 218 | "from nltk.corpus import stopwords\n", 219 | "lemma = WordNetLemmatizer()\n", 220 | "stop_words = set(stopwords.words(\"english\"))  # build the stopword set once, not once per word\n", 221 | "# NOTE: str.strip() treats its argument as a set of characters to trim from both ends, not as a prefix, so a leading \"Not Verified |\" is not removed\n", 222 | "reviews_data = df.reviews.str.strip(\"✅ Trip Verified |\")\n", 223 | "\n", 224 | "#create an empty list to collect cleaned data corpus\n", 225 | "corpus =[]\n", 226 | "\n", 227 | "#loop through each review: replace non-letters with spaces, lowercase it, drop stopwords, lemmatize, rejoin and add it to corpus\n", 228 | "for rev in reviews_data:\n", 229 | "    rev = re.sub('[^a-zA-Z]',' ', rev)\n", 230 | "    rev = rev.lower()\n", 231 | "    rev = rev.split()\n", 232 | "    rev = [lemma.lemmatize(word) for word in rev if word not in stop_words]\n", 233 | "    rev = \" \".join(rev)\n", 234 | "    corpus.append(rev)" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 89, 240 | "id": "835b9930-20e0-4d79-8a88-a5737309b4ce", 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "# add the corpus to the 
original dataframe\n", 245 | "\n", 246 | "df['corpus'] = corpus" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 90, 252 | "id": "baa16beb-4ac5-4850-a46a-0ae11d64cf3d", 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/html": [ 258 | "
" 333 | ], 334 | "text/plain": [ 335 | " reviews \\\n", 336 | "0 Not Verified | Worst experience ever. Outbound... \n", 337 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 338 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 339 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 340 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 341 | "\n", 342 | " stars date country \\\n", 343 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 344 | "1 1 7th November 2022 Malaysia \n", 345 | "2 5 5th November 2022 United Arab Emirates \n", 346 | "3 1 31st October 2022 United States \n", 347 | "4 1 31st October 2022 United States \n", 348 | "\n", 349 | " verified corpus \n", 350 | "0 False verified worst experience ever outbound flight... \n", 351 | "1 True check shamble bwi counter open full flight bag... \n", 352 | "2 True beyond disgusted fact baggage yet delivered we... \n", 353 | "3 True july th submitted complaint form regard fact b... \n", 354 | "4 True booked flight oct cancel flight day learning g... 
" 355 | ] 356 | }, 357 | "execution_count": 90, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "df.head()" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "id": "63424d81-bab2-4e74-aa97-6094bd035940", 369 | "metadata": {}, 370 | "source": [ 371 | "### Cleaning/Format date" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 91, 377 | "id": "33f1a718-af12-4fa4-85ae-f5511e8986ae", 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "reviews object\n", 384 | "stars object\n", 385 | "date object\n", 386 | "country object\n", 387 | "verified bool\n", 388 | "corpus object\n", 389 | "dtype: object" 390 | ] 391 | }, 392 | "execution_count": 91, 393 | "metadata": {}, 394 | "output_type": "execute_result" 395 | } 396 | ], 397 | "source": [ 398 | "df.dtypes" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 92, 404 | "id": "c773fca4-23ee-4f7d-a1db-8da648ddb161", 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [ 408 | "# convert the date to datetime format\n", 409 | "\n", 410 | "df.date = pd.to_datetime(df.date)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 93, 416 | "id": "0440c89e-9a89-4ef4-888f-eb25ad833afb", 417 | "metadata": {}, 418 | "outputs": [ 419 | { 420 | "data": { 421 | "text/plain": [ 422 | "0 2022-11-07\n", 423 | "1 2022-11-07\n", 424 | "2 2022-11-05\n", 425 | "3 2022-10-31\n", 426 | "4 2022-10-31\n", 427 | "Name: date, dtype: datetime64[ns]" 428 | ] 429 | }, 430 | "execution_count": 93, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "df.date.head()" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "id": "a7973868-80c4-46b9-9449-6761fac91f64", 442 | "metadata": {}, 443 | "source": [ 444 | "### Cleaning ratings with stars" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 
100, 450 | "id": "4f535799-1f5a-43d2-a858-5639f35b073c", 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "data": { 455 | "text/plain": [ 456 | "array(['\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5', '1', '5', '6', '9', '3', '2', '8',\n", 457 | " '7', '10', '4', 'None'], dtype=object)" 458 | ] 459 | }, 460 | "execution_count": 100, 461 | "metadata": {}, 462 | "output_type": "execute_result" 463 | } 464 | ], 465 | "source": [ 466 | "#check for unique values\n", 467 | "df.stars.unique()" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 102, 473 | "id": "5664b1c3-17d3-4f54-8799-388443a6242d", 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [ 477 | "# remove the \\t and \\n from the ratings\n", 478 | "df.stars = df.stars.str.strip(\"\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\")" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 105, 484 | "id": "9cbfaa06-b519-4ce1-bb50-420692a923d6", 485 | "metadata": {}, 486 | "outputs": [ 487 | { 488 | "data": { 489 | "text/plain": [ 490 | "1 735\n", 491 | "2 382\n", 492 | "3 379\n", 493 | "8 349\n", 494 | "10 306\n", 495 | "7 299\n", 496 | "9 293\n", 497 | "5 259\n", 498 | "4 227\n", 499 | "6 182\n", 500 | "None 5\n", 501 | "Name: stars, dtype: int64" 502 | ] 503 | }, 504 | "execution_count": 105, 505 | "metadata": {}, 506 | "output_type": "execute_result" 507 | } 508 | ], 509 | "source": [ 510 | "df.stars.value_counts()" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "id": "23941023-6f4d-44a3-9ae5-34ba7270a646", 516 | "metadata": {}, 517 | "source": [ 518 | "There are 5 rows having values \"None\" in the ratings. We will drop all these 5 rows. 
" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 110, 524 | "id": "e5d3e147-b454-4515-8b14-54e4ec756b1c", 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [ 528 | "# drop the rows where the value of ratings is None\n", 529 | "df.drop(df[df.stars == \"None\"].index, axis=0, inplace=True)" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": 111, 535 | "id": "e152aed3-2da4-4ee9-90dc-d2bd3553d870", 536 | "metadata": {}, 537 | "outputs": [ 538 | { 539 | "data": { 540 | "text/plain": [ 541 | "array(['5', '1', '6', '9', '3', '2', '8', '7', '10', '4'], dtype=object)" 542 | ] 543 | }, 544 | "execution_count": 111, 545 | "metadata": {}, 546 | "output_type": "execute_result" 547 | } 548 | ], 549 | "source": [ 550 | "#check the unique values again\n", 551 | "df.stars.unique()" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "id": "c29e569e-bab0-4d73-8496-f60aa375be38", 557 | "metadata": {}, 558 | "source": [ 559 | "## Check for null Values" 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": 112, 565 | "id": "e81950f2-0cb3-4c1a-9859-233f4c3304d7", 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "data": { 570 | "text/plain": [ 571 | "reviews stars date country verified corpus\n", 572 | "False False False False False False 3411\n", 573 | "dtype: int64" 574 | ] 575 | }, 576 | "execution_count": 112, 577 | "metadata": {}, 578 | "output_type": "execute_result" 579 | } 580 | ], 581 | "source": [ 582 | "df.isnull().value_counts()" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 113, 588 | "id": "7c53f6a8-3f14-4711-a11a-a99f72015431", 589 | "metadata": {}, 590 | "outputs": [ 591 | { 592 | "data": { 593 | "text/plain": [ 594 | "False 3411\n", 595 | "Name: country, dtype: int64" 596 | ] 597 | }, 598 | "execution_count": 113, 599 | "metadata": {}, 600 | "output_type": "execute_result" 601 | } 602 | ], 603 | "source": [ 604 | 
"df.country.isnull().value_counts()" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "id": "3d9dbc68-9826-49e0-a003-d1c0c5e5d4f6", 610 | "metadata": {}, 611 | "source": [ 612 | "We have two missing values for country. We can simply remove those two reviews (rows) from the dataframe. " 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 114, 618 | "id": "98b5b16a-c25d-475b-8894-e680323195eb", 619 | "metadata": {}, 620 | "outputs": [], 621 | "source": [ 622 | "#drop the rows using index where the country value is null\n", 623 | "df.drop(df[df.country.isnull()].index, axis=0, inplace=True)" 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 115, 629 | "id": "116656f9-0f7a-4d79-8f69-7f4fbf711eca", 630 | "metadata": {}, 631 | "outputs": [ 632 | { 633 | "data": { 634 | "text/plain": [ 635 | "(3411, 6)" 636 | ] 637 | }, 638 | "execution_count": 115, 639 | "metadata": {}, 640 | "output_type": "execute_result" 641 | } 642 | ], 643 | "source": [ 644 | "df.shape" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 116, 650 | "id": "54a2b64d-4730-44f3-b40c-409d098f78f9", 651 | "metadata": {}, 652 | "outputs": [ 653 | { 654 | "data": { 655 | "text/html": [ 656 | "
" 786 | ], 787 | "text/plain": [ 788 | " reviews stars date \\\n", 789 | "0 Not Verified | Worst experience ever. Outbound... 5 2022-11-07 \n", 790 | "1 ✅ Trip Verified | Check in was a shambles at ... 1 2022-11-07 \n", 791 | "2 ✅ Trip Verified | Beyond disgusted with the fa... 5 2022-11-05 \n", 792 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... 1 2022-10-31 \n", 793 | "4 ✅ Trip Verified | I booked the flight on Oct ... 1 2022-10-31 \n", 794 | "... ... ... ... \n", 795 | "3406 This was a bmi Regional operated flight on a R... 1 2012-08-29 \n", 796 | "3407 LHR to HAM. Purser addresses all club passenge... 10 2012-08-28 \n", 797 | "3408 My son who had worked for British Airways urge... 10 2011-10-12 \n", 798 | "3409 London City-New York JFK via Shannon on A318 b... 8 2011-10-11 \n", 799 | "3410 SIN-LHR BA12 B747-436 First Class. Old aircraf... 9 2011-10-09 \n", 800 | "\n", 801 | " country verified \\\n", 802 | "0 Italy False \n", 803 | "1 Malaysia True \n", 804 | "2 United Arab Emirates True \n", 805 | "3 United States True \n", 806 | "4 United States True \n", 807 | "... ... ... \n", 808 | "3406 United Kingdom False \n", 809 | "3407 United Kingdom False \n", 810 | "3408 United Kingdom False \n", 811 | "3409 United States False \n", 812 | "3410 United Kingdom False \n", 813 | "\n", 814 | " corpus \n", 815 | "0 verified worst experience ever outbound flight... \n", 816 | "1 check shamble bwi counter open full flight bag... \n", 817 | "2 beyond disgusted fact baggage yet delivered we... \n", 818 | "3 july th submitted complaint form regard fact b... \n", 819 | "4 booked flight oct cancel flight day learning g... \n", 820 | "... ... \n", 821 | "3406 bmi regional operated flight rj manchester hea... \n", 822 | "3407 lhr ham purser address club passenger name boa... \n", 823 | "3408 son worked british airway urged fly british ai... \n", 824 | "3409 london city new york jfk via shannon really ni... \n", 825 | "3410 sin lhr ba b first class old aircraft seat pri... 
\n", 826 | "\n", 827 | "[3411 rows x 6 columns]" 828 | ] 829 | }, 830 | "execution_count": 116, 831 | "metadata": {}, 832 | "output_type": "execute_result" 833 | } 834 | ], 835 | "source": [ 836 | "#show the dataframe with a reset index (the result is not assigned, so df itself keeps its original index)\n", 837 | "df.reset_index(drop=True)" 838 | ] 839 | }, 840 | { 841 | "cell_type": "markdown", 842 | "id": "9fa4efa3-5d41-4fa7-a5ac-a959e322fcc9", 843 | "metadata": {}, 844 | "source": [ 845 | "*****" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "id": "cbb39abb-e09a-4bad-9fa6-65bba66ed631", 851 | "metadata": {}, 852 | "source": [ 853 | "Our data is now cleaned and ready for visualization and analysis." 854 | ] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": 117, 859 | "id": "d9f0183c-e72a-4028-9c9a-1f3dee87c342", 860 | "metadata": {}, 861 | "outputs": [], 862 | "source": [ 863 | "# export the cleaned data\n", 864 | "\n", 865 | "df.to_csv(cwd + \"/cleaned-BA-reviews.csv\")" 866 | ] 867 | }, 868 | { 869 | "cell_type": "code", 870 | "execution_count": null, 871 | "id": "4f9bad2f-721b-445a-86f9-bc9e16c177ce", 872 | "metadata": {}, 873 | "outputs": [], 874 | "source": [] 875 | } 876 | ], 877 | "metadata": { 878 | "kernelspec": { 879 | "display_name": "Python 3 (ipykernel)", 880 | "language": "python", 881 | "name": "python3" 882 | }, 883 | "language_info": { 884 | "codemirror_mode": { 885 | "name": "ipython", 886 | "version": 3 887 | }, 888 | "file_extension": ".py", 889 | "mimetype": "text/x-python", 890 | "name": "python", 891 | "nbconvert_exporter": "python", 892 | "pygments_lexer": "ipython3", 893 | "version": "3.9.7" 894 | } 895 | }, 896 | "nbformat": 4, 897 | "nbformat_minor": 5 898 | } 899 | -------------------------------------------------------------------------------- /Data Cleaning/.ipynb_checkpoints/Data Cleaning-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": 
"e00b5383-b915-4a16-8ed0-476f979052b7", 6 | "metadata": {}, 7 | "source": [ 8 | "## Data Cleaning\n", 9 | "\n", 10 | "Now since we have extracted data from the website, it is not cleaned and ready to be analyzed yet. The reviews section will need to be cleaned for punctuations, spellings and other characters. " 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 83, 16 | "id": "69c14a0d-d637-4b7e-8714-31fb531b4954", 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "#imports\n", 21 | "\n", 22 | "import pandas as pd\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import seaborn as sns\n", 25 | "import os\n", 26 | "\n", 27 | "#regex\n", 28 | "import re" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 84, 34 | "id": "c46b06dc-b9f5-4b9f-b52b-7f9b01ff7200", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "#create a dataframe from csv file\n", 39 | "\n", 40 | "cwd = os.getcwd()\n", 41 | "\n", 42 | "df = pd.read_csv(cwd+\"/BA_reviews.csv\", index_col=0)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 85, 48 | "id": "888d1113-bafd-4e6c-a020-c0e7b721ce50", 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "
" 117 | ], 118 | "text/plain": [ 119 | " reviews \\\n", 120 | "0 Not Verified | Worst experience ever. Outbound... \n", 121 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 122 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 123 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 124 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 125 | "\n", 126 | " stars date country \n", 127 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 128 | "1 1 7th November 2022 Malaysia \n", 129 | "2 5 5th November 2022 United Arab Emirates \n", 130 | "3 1 31st October 2022 United States \n", 131 | "4 1 31st October 2022 United States " 132 | ] 133 | }, 134 | "execution_count": 85, 135 | "metadata": {}, 136 | "output_type": "execute_result" 137 | } 138 | ], 139 | "source": [ 140 | "df.head()" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "id": "7aeb0b06-23f8-48bb-b2d4-b91455040038", 146 | "metadata": {}, 147 | "source": [ 148 | "We will also create a column which mentions if the user is verified or not. " 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 86, 154 | "id": "45908798-e042-4d71-8e01-395f53f2005d", 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "df['verified'] = df.reviews.str.contains(\"Trip Verified\")" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 87, 164 | "id": "ae4e6e23-4ecc-4c94-8526-5f645ff2f6a2", 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "0 False\n", 171 | "1 True\n", 172 | "2 True\n", 173 | "3 True\n", 174 | "4 True\n", 175 | " ... 
\n", 176 | "3413 False\n", 177 | "3414 False\n", 178 | "3415 False\n", 179 | "3416 False\n", 180 | "3417 False\n", 181 | "Name: verified, Length: 3418, dtype: bool" 182 | ] 183 | }, 184 | "execution_count": 87, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "df['verified']" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "id": "cf266fe0-0c33-477a-9f1e-529a214e91c5", 196 | "metadata": {}, 197 | "source": [ 198 | "### Cleaning Reviews" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "id": "ac2bc6a6-8783-4873-b5e7-1719f997cdf2", 204 | "metadata": {}, 205 | "source": [ 206 | "We will extract the column of reviews into a separate dataframe and clean it for semantic analysis" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 88, 212 | "id": "e4319089-b957-4b60-8f07-fc8cd3487f9e", 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "#for lemmatization of words we will use nltk library\n", 217 | "from nltk.stem import WordNetLemmatizer\n", 218 | "from nltk.corpus import stopwords\n", 219 | "lemma = WordNetLemmatizer()\n", 220 | "\n", 221 | "\n", 222 | "reviews_data = df.reviews.str.strip(\"✅ Trip Verified |\")\n", 223 | "\n", 224 | "#create an empty list to collect cleaned data corpus\n", 225 | "corpus =[]\n", 226 | "\n", 227 | "#loop through each review, remove punctuations, small case it, join it and add it to corpus\n", 228 | "for rev in reviews_data:\n", 229 | " rev = re.sub('[^a-zA-Z]',' ', rev)\n", 230 | " rev = rev.lower()\n", 231 | " rev = rev.split()\n", 232 | " rev = [lemma.lemmatize(word) for word in rev if word not in set(stopwords.words(\"english\"))]\n", 233 | " rev = \" \".join(rev)\n", 234 | " corpus.append(rev)" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 89, 240 | "id": "835b9930-20e0-4d79-8a88-a5737309b4ce", 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "# add the corpus to the 
original dataframe\n", 245 | "\n", 246 | "df['corpus'] = corpus" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 90, 252 | "id": "baa16beb-4ac5-4850-a46a-0ae11d64cf3d", 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/html": [ 258 | "
" 333 | ], 334 | "text/plain": [ 335 | " reviews \\\n", 336 | "0 Not Verified | Worst experience ever. Outbound... \n", 337 | "1 ✅ Trip Verified | Check in was a shambles at ... \n", 338 | "2 ✅ Trip Verified | Beyond disgusted with the fa... \n", 339 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... \n", 340 | "4 ✅ Trip Verified | I booked the flight on Oct ... \n", 341 | "\n", 342 | " stars date country \\\n", 343 | "0 \\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5 7th November 2022 Italy \n", 344 | "1 1 7th November 2022 Malaysia \n", 345 | "2 5 5th November 2022 United Arab Emirates \n", 346 | "3 1 31st October 2022 United States \n", 347 | "4 1 31st October 2022 United States \n", 348 | "\n", 349 | " verified corpus \n", 350 | "0 False verified worst experience ever outbound flight... \n", 351 | "1 True check shamble bwi counter open full flight bag... \n", 352 | "2 True beyond disgusted fact baggage yet delivered we... \n", 353 | "3 True july th submitted complaint form regard fact b... \n", 354 | "4 True booked flight oct cancel flight day learning g... 
" 355 | ] 356 | }, 357 | "execution_count": 90, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "df.head()" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "id": "63424d81-bab2-4e74-aa97-6094bd035940", 369 | "metadata": {}, 370 | "source": [ 371 | "### Cleaning/Fromat date" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 91, 377 | "id": "33f1a718-af12-4fa4-85ae-f5511e8986ae", 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "reviews object\n", 384 | "stars object\n", 385 | "date object\n", 386 | "country object\n", 387 | "verified bool\n", 388 | "corpus object\n", 389 | "dtype: object" 390 | ] 391 | }, 392 | "execution_count": 91, 393 | "metadata": {}, 394 | "output_type": "execute_result" 395 | } 396 | ], 397 | "source": [ 398 | "df.dtypes" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 92, 404 | "id": "c773fca4-23ee-4f7d-a1db-8da648ddb161", 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [ 408 | "# convert the date to datetime format\n", 409 | "\n", 410 | "df.date = pd.to_datetime(df.date)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 93, 416 | "id": "0440c89e-9a89-4ef4-888f-eb25ad833afb", 417 | "metadata": {}, 418 | "outputs": [ 419 | { 420 | "data": { 421 | "text/plain": [ 422 | "0 2022-11-07\n", 423 | "1 2022-11-07\n", 424 | "2 2022-11-05\n", 425 | "3 2022-10-31\n", 426 | "4 2022-10-31\n", 427 | "Name: date, dtype: datetime64[ns]" 428 | ] 429 | }, 430 | "execution_count": 93, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "df.date.head()" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "id": "a7973868-80c4-46b9-9449-6761fac91f64", 442 | "metadata": {}, 443 | "source": [ 444 | "### Cleaning ratings with stars" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 
100, 450 | "id": "4f535799-1f5a-43d2-a858-5639f35b073c", 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "data": { 455 | "text/plain": [ 456 | "array(['\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t5', '1', '5', '6', '9', '3', '2', '8',\n", 457 | " '7', '10', '4', 'None'], dtype=object)" 458 | ] 459 | }, 460 | "execution_count": 100, 461 | "metadata": {}, 462 | "output_type": "execute_result" 463 | } 464 | ], 465 | "source": [ 466 | "#check for unique values\n", 467 | "df.stars.unique()" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 102, 473 | "id": "5664b1c3-17d3-4f54-8799-388443a6242d", 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [ 477 | "# remove the \\t and \\n from the ratings\n", 478 | "df.stars = df.stars.str.strip(\"\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\")" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 105, 484 | "id": "9cbfaa06-b519-4ce1-bb50-420692a923d6", 485 | "metadata": {}, 486 | "outputs": [ 487 | { 488 | "data": { 489 | "text/plain": [ 490 | "1 735\n", 491 | "2 382\n", 492 | "3 379\n", 493 | "8 349\n", 494 | "10 306\n", 495 | "7 299\n", 496 | "9 293\n", 497 | "5 259\n", 498 | "4 227\n", 499 | "6 182\n", 500 | "None 5\n", 501 | "Name: stars, dtype: int64" 502 | ] 503 | }, 504 | "execution_count": 105, 505 | "metadata": {}, 506 | "output_type": "execute_result" 507 | } 508 | ], 509 | "source": [ 510 | "df.stars.value_counts()" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "id": "23941023-6f4d-44a3-9ae5-34ba7270a646", 516 | "metadata": {}, 517 | "source": [ 518 | "There are 5 rows having values \"None\" in the ratings. We will drop all these 5 rows. 
" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 110, 524 | "id": "e5d3e147-b454-4515-8b14-54e4ec756b1c", 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [ 528 | "# drop the rows where the value of ratings is None\n", 529 | "df.drop(df[df.stars == \"None\"].index, axis=0, inplace=True)" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": 111, 535 | "id": "e152aed3-2da4-4ee9-90dc-d2bd3553d870", 536 | "metadata": {}, 537 | "outputs": [ 538 | { 539 | "data": { 540 | "text/plain": [ 541 | "array(['5', '1', '6', '9', '3', '2', '8', '7', '10', '4'], dtype=object)" 542 | ] 543 | }, 544 | "execution_count": 111, 545 | "metadata": {}, 546 | "output_type": "execute_result" 547 | } 548 | ], 549 | "source": [ 550 | "#check the unique values again\n", 551 | "df.stars.unique()" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "id": "c29e569e-bab0-4d73-8496-f60aa375be38", 557 | "metadata": {}, 558 | "source": [ 559 | "## Check for null Values" 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": 112, 565 | "id": "e81950f2-0cb3-4c1a-9859-233f4c3304d7", 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "data": { 570 | "text/plain": [ 571 | "reviews stars date country verified corpus\n", 572 | "False False False False False False 3411\n", 573 | "dtype: int64" 574 | ] 575 | }, 576 | "execution_count": 112, 577 | "metadata": {}, 578 | "output_type": "execute_result" 579 | } 580 | ], 581 | "source": [ 582 | "df.isnull().value_counts()" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 113, 588 | "id": "7c53f6a8-3f14-4711-a11a-a99f72015431", 589 | "metadata": {}, 590 | "outputs": [ 591 | { 592 | "data": { 593 | "text/plain": [ 594 | "False 3411\n", 595 | "Name: country, dtype: int64" 596 | ] 597 | }, 598 | "execution_count": 113, 599 | "metadata": {}, 600 | "output_type": "execute_result" 601 | } 602 | ], 603 | "source": [ 604 | 
"df.country.isnull().value_counts()" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "id": "3d9dbc68-9826-49e0-a003-d1c0c5e5d4f6", 610 | "metadata": {}, 611 | "source": [ 612 | "We have two missing values for country. For this we can just remove those two reviews (rows) from the dataframe. " 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 114, 618 | "id": "98b5b16a-c25d-475b-8894-e680323195eb", 619 | "metadata": {}, 620 | "outputs": [], 621 | "source": [ 622 | "#drop the rows using index where the country value is null\n", 623 | "df.drop(df[df.country.isnull() == True].index, axis=0, inplace=True)" 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 115, 629 | "id": "116656f9-0f7a-4d79-8f69-7f4fbf711eca", 630 | "metadata": {}, 631 | "outputs": [ 632 | { 633 | "data": { 634 | "text/plain": [ 635 | "(3411, 6)" 636 | ] 637 | }, 638 | "execution_count": 115, 639 | "metadata": {}, 640 | "output_type": "execute_result" 641 | } 642 | ], 643 | "source": [ 644 | "df.shape" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 116, 650 | "id": "54a2b64d-4730-44f3-b40c-409d098f78f9", 651 | "metadata": {}, 652 | "outputs": [ 653 | { 654 | "data": { 655 | "text/html": [ 656 | "
\n", 657 | "\n", 670 | "\n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | "
reviewsstarsdatecountryverifiedcorpus
0Not Verified | Worst experience ever. Outbound...52022-11-07ItalyFalseverified worst experience ever outbound flight...
1✅ Trip Verified | Check in was a shambles at ...12022-11-07MalaysiaTruecheck shamble bwi counter open full flight bag...
2✅ Trip Verified | Beyond disgusted with the fa...52022-11-05United Arab EmiratesTruebeyond disgusted fact baggage yet delivered we...
3✅ Trip Verified | On July 19th 2022 I had subm...12022-10-31United StatesTruejuly th submitted complaint form regard fact b...
4✅ Trip Verified | I booked the flight on Oct ...12022-10-31United StatesTruebooked flight oct cancel flight day learning g...
.....................
3406This was a bmi Regional operated flight on a R...12012-08-29United KingdomFalsebmi regional operated flight rj manchester hea...
3407LHR to HAM. Purser addresses all club passenge...102012-08-28United KingdomFalselhr ham purser address club passenger name boa...
3408My son who had worked for British Airways urge...102011-10-12United KingdomFalseson worked british airway urged fly british ai...
3409London City-New York JFK via Shannon on A318 b...82011-10-11United StatesFalselondon city new york jfk via shannon really ni...
3410SIN-LHR BA12 B747-436 First Class. Old aircraf...92011-10-09United KingdomFalsesin lhr ba b first class old aircraft seat pri...
\n", 784 | "

3411 rows × 6 columns

\n", 785 | "
" 786 | ], 787 | "text/plain": [ 788 | " reviews stars date \\\n", 789 | "0 Not Verified | Worst experience ever. Outbound... 5 2022-11-07 \n", 790 | "1 ✅ Trip Verified | Check in was a shambles at ... 1 2022-11-07 \n", 791 | "2 ✅ Trip Verified | Beyond disgusted with the fa... 5 2022-11-05 \n", 792 | "3 ✅ Trip Verified | On July 19th 2022 I had subm... 1 2022-10-31 \n", 793 | "4 ✅ Trip Verified | I booked the flight on Oct ... 1 2022-10-31 \n", 794 | "... ... ... ... \n", 795 | "3406 This was a bmi Regional operated flight on a R... 1 2012-08-29 \n", 796 | "3407 LHR to HAM. Purser addresses all club passenge... 10 2012-08-28 \n", 797 | "3408 My son who had worked for British Airways urge... 10 2011-10-12 \n", 798 | "3409 London City-New York JFK via Shannon on A318 b... 8 2011-10-11 \n", 799 | "3410 SIN-LHR BA12 B747-436 First Class. Old aircraf... 9 2011-10-09 \n", 800 | "\n", 801 | " country verified \\\n", 802 | "0 Italy False \n", 803 | "1 Malaysia True \n", 804 | "2 United Arab Emirates True \n", 805 | "3 United States True \n", 806 | "4 United States True \n", 807 | "... ... ... \n", 808 | "3406 United Kingdom False \n", 809 | "3407 United Kingdom False \n", 810 | "3408 United Kingdom False \n", 811 | "3409 United States False \n", 812 | "3410 United Kingdom False \n", 813 | "\n", 814 | " corpus \n", 815 | "0 verified worst experience ever outbound flight... \n", 816 | "1 check shamble bwi counter open full flight bag... \n", 817 | "2 beyond disgusted fact baggage yet delivered we... \n", 818 | "3 july th submitted complaint form regard fact b... \n", 819 | "4 booked flight oct cancel flight day learning g... \n", 820 | "... ... \n", 821 | "3406 bmi regional operated flight rj manchester hea... \n", 822 | "3407 lhr ham purser address club passenger name boa... \n", 823 | "3408 son worked british airway urged fly british ai... \n", 824 | "3409 london city new york jfk via shannon really ni... \n", 825 | "3410 sin lhr ba b first class old aircraft seat pri... 
\n", 826 | "\n", 827 | "[3411 rows x 6 columns]" 828 | ] 829 | }, 830 | "execution_count": 116, 831 | "metadata": {}, 832 | "output_type": "execute_result" 833 | } 834 | ], 835 | "source": [ 836 | "# reset the index (this returns a new DataFrame; df itself is unchanged unless the result is assigned back)\n", 837 | "df.reset_index(drop=True)" 838 | ] 839 | }, 840 | { 841 | "cell_type": "markdown", 842 | "id": "9fa4efa3-5d41-4fa7-a5ac-a959e322fcc9", 843 | "metadata": {}, 844 | "source": [ 845 | "*****" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "id": "cbb39abb-e09a-4bad-9fa6-65bba66ed631", 851 | "metadata": {}, 852 | "source": [ 853 | "Now our data is all cleaned and ready for data visualization and data analysis." 854 | ] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": 117, 859 | "id": "d9f0183c-e72a-4028-9c9a-1f3dee87c342", 860 | "metadata": {}, 861 | "outputs": [], 862 | "source": [ 863 | "# export the cleaned data\n", 864 | "\n", 865 | "df.to_csv(cwd + \"/cleaned-BA-reviews.csv\")" 866 | ] 867 | }, 868 | { 869 | "cell_type": "code", 870 | "execution_count": null, 871 | "id": "4f9bad2f-721b-445a-86f9-bc9e16c177ce", 872 | "metadata": {}, 873 | "outputs": [], 874 | "source": [] 875 | } 876 | ], 877 | "metadata": { 878 | "kernelspec": { 879 | "display_name": "Python 3 (ipykernel)", 880 | "language": "python", 881 | "name": "python3" 882 | }, 883 | "language_info": { 884 | "codemirror_mode": { 885 | "name": "ipython", 886 | "version": 3 887 | }, 888 | "file_extension": ".py", 889 | "mimetype": "text/x-python", 890 | "name": "python", 891 | "nbconvert_exporter": "python", 892 | "pygments_lexer": "ipython3", 893 | "version": "3.9.7" 894 | } 895 | }, 896 | "nbformat": 4, 897 | "nbformat_minor": 5 898 | } 899 | -------------------------------------------------------------------------------- /Predictive Modeling/.ipynb_checkpoints/Explore-customer-data-checkpoint.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": 
"e356b851-f7a9-4f12-ae70-93d26dee453b", 6 | "metadata": {}, 7 | "source": [ 8 | "## Exploratory Data Analysis on Customer Bookings data for British Airways" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "d7d67fea-8f6b-4e3c-a414-2e821ca8b442", 14 | "metadata": {}, 15 | "source": [ 16 | "We will explore the customer data first to get to know it better in depth.\n" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 37, 22 | "id": "f8340639-d648-4236-8a50-b27bf8157449", 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "#imports\n", 27 | "\n", 28 | "import pandas as pd\n", 29 | "import numpy as np\n", 30 | "import os\n", 31 | "\n", 32 | "import matplotlib.pyplot as plt\n", 33 | "import seaborn as sns" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 6, 39 | "id": "553a9c80-8691-4b5e-9a51-4cb0a72d3737", 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "#get current working directory\n", 44 | "\n", 45 | "cwd = os.getcwd()\n", 46 | "\n", 47 | "#read the csv\n", 48 | "\n", 49 | "df = pd.read_csv(cwd + \"/customer_booking.csv\", encoding=\"ISO-8859-1\")" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 8, 55 | "id": "1f195839-980d-4dfa-adf2-533bfeef3f94", 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/html": [ 61 | "
\n", 62 | "\n", 75 | "\n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | "
num_passengerssales_channeltrip_typepurchase_leadlength_of_stayflight_hourflight_dayroutebooking_originwants_extra_baggagewants_preferred_seatwants_in_flight_mealsflight_durationbooking_complete
02InternetRoundTrip262197SatAKLDELNew Zealand1005.520
11InternetRoundTrip112203SatAKLDELNew Zealand0005.520
22InternetRoundTrip2432217WedAKLDELIndia1105.520
31InternetRoundTrip96314SatAKLDELNew Zealand0015.520
42InternetRoundTrip682215WedAKLDELIndia1015.520
\n", 183 | "
" 184 | ], 185 | "text/plain": [ 186 | " num_passengers sales_channel trip_type purchase_lead length_of_stay \\\n", 187 | "0 2 Internet RoundTrip 262 19 \n", 188 | "1 1 Internet RoundTrip 112 20 \n", 189 | "2 2 Internet RoundTrip 243 22 \n", 190 | "3 1 Internet RoundTrip 96 31 \n", 191 | "4 2 Internet RoundTrip 68 22 \n", 192 | "\n", 193 | " flight_hour flight_day route booking_origin wants_extra_baggage \\\n", 194 | "0 7 Sat AKLDEL New Zealand 1 \n", 195 | "1 3 Sat AKLDEL New Zealand 0 \n", 196 | "2 17 Wed AKLDEL India 1 \n", 197 | "3 4 Sat AKLDEL New Zealand 0 \n", 198 | "4 15 Wed AKLDEL India 1 \n", 199 | "\n", 200 | " wants_preferred_seat wants_in_flight_meals flight_duration \\\n", 201 | "0 0 0 5.52 \n", 202 | "1 0 0 5.52 \n", 203 | "2 1 0 5.52 \n", 204 | "3 0 1 5.52 \n", 205 | "4 0 1 5.52 \n", 206 | "\n", 207 | " booking_complete \n", 208 | "0 0 \n", 209 | "1 0 \n", 210 | "2 0 \n", 211 | "3 0 \n", 212 | "4 0 " 213 | ] 214 | }, 215 | "execution_count": 8, 216 | "metadata": {}, 217 | "output_type": "execute_result" 218 | } 219 | ], 220 | "source": [ 221 | "df.head()" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "id": "127dc3df-583f-424a-892d-697c41410906", 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "(50000, 14)" 234 | ] 235 | }, 236 | "execution_count": 10, 237 | "metadata": {}, 238 | "output_type": "execute_result" 239 | } 240 | ], 241 | "source": [ 242 | "df.shape" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 11, 248 | "id": "c89996b9-621d-4235-88a6-53ed2822d697", 249 | "metadata": {}, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/html": [ 254 | "
\n", 255 | "\n", 268 | "\n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | "
num_passengerspurchase_leadlength_of_stayflight_hourwants_extra_baggagewants_preferred_seatwants_in_flight_mealsflight_durationbooking_complete
count50000.00000050000.00000050000.0000050000.0000050000.00000050000.00000050000.00000050000.00000050000.000000
mean1.59124084.94048023.044569.066340.6687800.2969600.4271407.2775610.149560
std1.02016590.45137833.887675.412660.4706570.4569230.4946681.4968630.356643
min1.0000000.0000000.000000.000000.0000000.0000000.0000004.6700000.000000
25%1.00000021.0000005.000005.000000.0000000.0000000.0000005.6200000.000000
50%1.00000051.00000017.000009.000001.0000000.0000000.0000007.5700000.000000
75%2.000000115.00000028.0000013.000001.0000001.0000001.0000008.8300000.000000
max9.000000867.000000778.0000023.000001.0000001.0000001.0000009.5000001.000000
\n", 382 | "
" 383 | ], 384 | "text/plain": [ 385 | " num_passengers purchase_lead length_of_stay flight_hour \\\n", 386 | "count 50000.000000 50000.000000 50000.00000 50000.00000 \n", 387 | "mean 1.591240 84.940480 23.04456 9.06634 \n", 388 | "std 1.020165 90.451378 33.88767 5.41266 \n", 389 | "min 1.000000 0.000000 0.00000 0.00000 \n", 390 | "25% 1.000000 21.000000 5.00000 5.00000 \n", 391 | "50% 1.000000 51.000000 17.00000 9.00000 \n", 392 | "75% 2.000000 115.000000 28.00000 13.00000 \n", 393 | "max 9.000000 867.000000 778.00000 23.00000 \n", 394 | "\n", 395 | " wants_extra_baggage wants_preferred_seat wants_in_flight_meals \\\n", 396 | "count 50000.000000 50000.000000 50000.000000 \n", 397 | "mean 0.668780 0.296960 0.427140 \n", 398 | "std 0.470657 0.456923 0.494668 \n", 399 | "min 0.000000 0.000000 0.000000 \n", 400 | "25% 0.000000 0.000000 0.000000 \n", 401 | "50% 1.000000 0.000000 0.000000 \n", 402 | "75% 1.000000 1.000000 1.000000 \n", 403 | "max 1.000000 1.000000 1.000000 \n", 404 | "\n", 405 | " flight_duration booking_complete \n", 406 | "count 50000.000000 50000.000000 \n", 407 | "mean 7.277561 0.149560 \n", 408 | "std 1.496863 0.356643 \n", 409 | "min 4.670000 0.000000 \n", 410 | "25% 5.620000 0.000000 \n", 411 | "50% 7.570000 0.000000 \n", 412 | "75% 8.830000 0.000000 \n", 413 | "max 9.500000 1.000000 " 414 | ] 415 | }, 416 | "execution_count": 11, 417 | "metadata": {}, 418 | "output_type": "execute_result" 419 | } 420 | ], 421 | "source": [ 422 | "df.describe()" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 12, 428 | "id": "5c9f1d4b-3a05-49b7-ace4-fc222e19cbcf", 429 | "metadata": {}, 430 | "outputs": [ 431 | { 432 | "name": "stdout", 433 | "output_type": "stream", 434 | "text": [ 435 | "\n", 436 | "RangeIndex: 50000 entries, 0 to 49999\n", 437 | "Data columns (total 14 columns):\n", 438 | " # Column Non-Null Count Dtype \n", 439 | "--- ------ -------------- ----- \n", 440 | " 0 num_passengers 50000 non-null int64 \n", 441 | " 1 
sales_channel 50000 non-null object \n", 442 | " 2 trip_type 50000 non-null object \n", 443 | " 3 purchase_lead 50000 non-null int64 \n", 444 | " 4 length_of_stay 50000 non-null int64 \n", 445 | " 5 flight_hour 50000 non-null int64 \n", 446 | " 6 flight_day 50000 non-null object \n", 447 | " 7 route 50000 non-null object \n", 448 | " 8 booking_origin 50000 non-null object \n", 449 | " 9 wants_extra_baggage 50000 non-null int64 \n", 450 | " 10 wants_preferred_seat 50000 non-null int64 \n", 451 | " 11 wants_in_flight_meals 50000 non-null int64 \n", 452 | " 12 flight_duration 50000 non-null float64\n", 453 | " 13 booking_complete 50000 non-null int64 \n", 454 | "dtypes: float64(1), int64(8), object(5)\n", 455 | "memory usage: 5.3+ MB\n" 456 | ] 457 | } 458 | ], 459 | "source": [ 460 | "df.info()" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "id": "edb28431-ab65-43c0-a49c-99fd0eb536bb", 466 | "metadata": {}, 467 | "source": [ 468 | "### Sales Channel" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 23, 474 | "id": "c003e099-f950-4817-82ef-4b73a9428b94", 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [ 478 | "per_internet = df.sales_channel.value_counts().values[0] / df.sales_channel.count() *100\n", 479 | "per_mobile = df.sales_channel.value_counts().values[1] / df.sales_channel.count() *100" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": 25, 485 | "id": "bde01b1a-3142-4667-b389-2b9a6c52913f", 486 | "metadata": {}, 487 | "outputs": [ 488 | { 489 | "name": "stdout", 490 | "output_type": "stream", 491 | "text": [ 492 | "Percentage of bookings made through Internet: 88.764 %\n", 493 | "Percentage of bookings made through Mobile: 11.236 %\n" 494 | ] 495 | } 496 | ], 497 | "source": [ 498 | "print(f\"Percentage of bookings made through Internet: {per_internet} %\")\n", 499 | "print(f\"Percentage of bookings made through Mobile: {per_mobile} %\")" 500 | ] 501 | }, 502 | { 503 | "cell_type": 
"markdown", 504 | "id": "9252a715-73db-4763-b969-85517922fd06", 505 | "metadata": {}, 506 | "source": [ 507 | "### Trip Type" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": 28, 513 | "id": "7f04c4b2-e90f-408e-9163-09353a1d977b", 514 | "metadata": {}, 515 | "outputs": [], 516 | "source": [ 517 | "per_round = df.trip_type.value_counts().values[0]/ df.trip_type.count() *100\n", 518 | "per_oneway = df.trip_type.value_counts().values[1]/ df.trip_type.count() *100\n", 519 | "per_circle = df.trip_type.value_counts().values[2]/ df.trip_type.count() *100" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 29, 525 | "id": "27523ae3-0d2c-4320-b748-47f6e59fbffc", 526 | "metadata": {}, 527 | "outputs": [ 528 | { 529 | "name": "stdout", 530 | "output_type": "stream", 531 | "text": [ 532 | "Percentage of round trips: 98.994 %\n", 533 | "Percentage of One way trips: 0.774 %\n", 534 | "Percentage of circle trips: 0.232 %\n" 535 | ] 536 | } 537 | ], 538 | "source": [ 539 | "print(f\"Percentage of round trips: {per_round} %\")\n", 540 | "print(f\"Percentage of One way trips: {per_oneway} %\")\n", 541 | "print(f\"Percentage of circle trips: {per_circle} %\")\n" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "id": "03c17de4-2cdc-40c9-8bd0-9f5a9f39cba3", 547 | "metadata": {}, 548 | "source": [ 549 | "### Purchase Lead" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": 54, 555 | "id": "5207bdf6-4389-47ae-82f7-91e6c02f13a6", 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "data": { 560 | "text/plain": [ 561 | "" 562 | ] 563 | }, 564 | "execution_count": 54, 565 | "metadata": {}, 566 | "output_type": "execute_result" 567 | }, 568 | { 569 | "data": { 570 | "image/png": "\n", 571 | "text/plain": [ 572 | "
" 573 | ] 574 | }, 575 | "metadata": { 576 | "needs_background": "light" 577 | }, 578 | "output_type": "display_data" 579 | } 580 | ], 581 | "source": [ 582 | "plt.figure(figsize=(15,5))\n", 583 | "sns.histplot(data=df, x=\"purchase_lead\", binwidth=20,kde=True)" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "id": "67e9da17-7a80-4cee-9e66-f4f195222156", 589 | "metadata": {}, 590 | "source": [ 591 | "There are few bookings that were done more than 2 years before the travel date and it seems very unlikely that book that in advance. However, it might also be because of the cancellation and rebooking in a period of 6 months for twice. Generally airline keep the tickets for rebooking within a year. But at this point we will consider them as outliers which will effect the results of predictive model in a huge way. " 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": 68, 597 | "id": "51af85fc-b939-4f49-b1ef-43376018701c", 598 | "metadata": {}, 599 | "outputs": [ 600 | { 601 | "data": { 602 | "text/plain": [ 603 | "False 49992\n", 604 | "True 8\n", 605 | "Name: purchase_lead, dtype: int64" 606 | ] 607 | }, 608 | "execution_count": 68, 609 | "metadata": {}, 610 | "output_type": "execute_result" 611 | } 612 | ], 613 | "source": [ 614 | "(df.purchase_lead >600).value_counts()" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "id": "71a14c0c-fa30-4bc0-9f3e-214251579ac2", 620 | "metadata": {}, 621 | "source": [ 622 | "If we assume that no customer is booking in advance of more than 1 and half year we will remove all entries with purchase_lead more than 600 days." 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 69, 628 | "id": "956345d4-716f-49a4-9f66-ca014256c9f5", 629 | "metadata": {}, 630 | "outputs": [ 631 | { 632 | "data": { 633 | "text/html": [ 634 | "
\n", 635 | "\n", 648 | "\n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | " \n", 796 | " \n", 797 | " \n", 798 | " \n", 799 | " \n", 800 | " 
\n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | "
num_passengerssales_channeltrip_typepurchase_leadlength_of_stayflight_hourflight_dayroutebooking_originwants_extra_baggagewants_preferred_seatwants_in_flight_mealsflight_durationbooking_complete
8353InternetRoundTrip641466SunAKLKULMalaysia1018.831
61481InternetRoundTrip6141911WedCOKMELAustralia0008.830
241191InternetRoundTrip704238TuePNHSYDAustralia0008.580
383562InternetRoundTrip633510SatHKTOOLAustralia0018.830
394171MobileRoundTrip625515FriICNRGNMyanmar (Burma)0006.620
429161MobileRoundTrip605618ThuBLRMELIndia0008.830
467162InternetRoundTrip60666FriHKTTPEUnited States0014.670
482593InternetRoundTrip86767MonKIXMLEJapan0017.001
\n", 807 | "
" 808 | ], 809 | "text/plain": [ 810 | " num_passengers sales_channel trip_type purchase_lead length_of_stay \\\n", 811 | "835 3 Internet RoundTrip 641 46 \n", 812 | "6148 1 Internet RoundTrip 614 19 \n", 813 | "24119 1 Internet RoundTrip 704 23 \n", 814 | "38356 2 Internet RoundTrip 633 5 \n", 815 | "39417 1 Mobile RoundTrip 625 5 \n", 816 | "42916 1 Mobile RoundTrip 605 6 \n", 817 | "46716 2 Internet RoundTrip 606 6 \n", 818 | "48259 3 Internet RoundTrip 867 6 \n", 819 | "\n", 820 | " flight_hour flight_day route booking_origin wants_extra_baggage \\\n", 821 | "835 6 Sun AKLKUL Malaysia 1 \n", 822 | "6148 11 Wed COKMEL Australia 0 \n", 823 | "24119 8 Tue PNHSYD Australia 0 \n", 824 | "38356 10 Sat HKTOOL Australia 0 \n", 825 | "39417 15 Fri ICNRGN Myanmar (Burma) 0 \n", 826 | "42916 18 Thu BLRMEL India 0 \n", 827 | "46716 6 Fri HKTTPE United States 0 \n", 828 | "48259 7 Mon KIXMLE Japan 0 \n", 829 | "\n", 830 | " wants_preferred_seat wants_in_flight_meals flight_duration \\\n", 831 | "835 0 1 8.83 \n", 832 | "6148 0 0 8.83 \n", 833 | "24119 0 0 8.58 \n", 834 | "38356 0 1 8.83 \n", 835 | "39417 0 0 6.62 \n", 836 | "42916 0 0 8.83 \n", 837 | "46716 0 1 4.67 \n", 838 | "48259 0 1 7.00 \n", 839 | "\n", 840 | " booking_complete \n", 841 | "835 1 \n", 842 | "6148 0 \n", 843 | "24119 0 \n", 844 | "38356 0 \n", 845 | "39417 0 \n", 846 | "42916 0 \n", 847 | "46716 0 \n", 848 | "48259 1 " 849 | ] 850 | }, 851 | "execution_count": 69, 852 | "metadata": {}, 853 | "output_type": "execute_result" 854 | } 855 | ], 856 | "source": [ 857 | "df[df.purchase_lead > 600]" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 71, 863 | "id": "12453b86-8ee1-4e41-9814-9938fd4abcbd", 864 | "metadata": {}, 865 | "outputs": [], 866 | "source": [ 867 | "#filtering the data to have only purchase lead days less than 600 days\n", 868 | "df = df[df.purchase_lead <600 ]" 869 | ] 870 | }, 871 | { 872 | "cell_type": "markdown", 873 | "id": 
"5029fd70-7dc9-4748-85fb-f9f3541280a3", 874 | "metadata": {}, 875 | "source": [ 876 | "### Length Of Stay" 877 | ] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "execution_count": 74, 882 | "id": "53e701c9-5153-45b0-9e5c-bfdd887a30a5", 883 | "metadata": {}, 884 | "outputs": [ 885 | { 886 | "data": { 887 | "text/plain": [ 888 | "" 889 | ] 890 | }, 891 | "execution_count": 74, 892 | "metadata": {}, 893 | "output_type": "execute_result" 894 | }, 895 | { 896 | "data": { 897 | "image/png": "\n", 898 | "text/plain": [ 899 | "
" 900 | ] 901 | }, 902 | "metadata": { 903 | "needs_background": "light" 904 | }, 905 | "output_type": "display_data" 906 | } 907 | ], 908 | "source": [ 909 | "plt.figure(figsize=(15,5))\n", 910 | "sns.histplot(data=df, x=\"length_of_stay\", binwidth=15,kde=True)" 911 | ] 912 | }, 913 | { 914 | "cell_type": "markdown", 915 | "id": "1d4cdb69-f8e7-4254-a3e8-6d6dcf6dcd51", 916 | "metadata": {}, 917 | "source": [ 918 | "Let's see how many entries do we have that exceeds length of stay more than 100 days." 919 | ] 920 | }, 921 | { 922 | "cell_type": "code", 923 | "execution_count": 76, 924 | "id": "f8f4c77f-e2c9-4ca3-ac9f-95563158db01", 925 | "metadata": {}, 926 | "outputs": [ 927 | { 928 | "data": { 929 | "text/plain": [ 930 | "False 49713\n", 931 | "True 279\n", 932 | "Name: length_of_stay, dtype: int64" 933 | ] 934 | }, 935 | "execution_count": 76, 936 | "metadata": {}, 937 | "output_type": "execute_result" 938 | } 939 | ], 940 | "source": [ 941 | "(df.length_of_stay> 200).value_counts()" 942 | ] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "execution_count": 90, 947 | "id": "865f0f4a-d6a7-40f5-99e4-f693eb0c1cfc", 948 | "metadata": {}, 949 | "outputs": [ 950 | { 951 | "data": { 952 | "text/plain": [ 953 | "0 9\n", 954 | "1 1\n", 955 | "Name: booking_complete, dtype: int64" 956 | ] 957 | }, 958 | "execution_count": 90, 959 | "metadata": {}, 960 | "output_type": "execute_result" 961 | } 962 | ], 963 | "source": [ 964 | "df[df.length_of_stay> 500].booking_complete.value_counts()" 965 | ] 966 | }, 967 | { 968 | "cell_type": "markdown", 969 | "id": "9ad14f0a-8b73-4f49-879f-eb3bb45ae05d", 970 | "metadata": {}, 971 | "source": [ 972 | "We need to have more business knowledge to decide whether to remove these entries with more than 600 days of stay. There are could be many reasons for such bookings. But for now, we will just want to focus on bookings done for length of stay less than 500 days. 
" 973 | ] 974 | }, 975 | { 976 | "cell_type": "code", 977 | "execution_count": 91, 978 | "id": "36c921c0-0efb-4054-9b4f-64331e142a3c", 979 | "metadata": {}, 980 | "outputs": [], 981 | "source": [ 982 | "#filtering the data to have only length of stay days less than 500 days\n", 983 | "df = df[df.purchase_lead <500 ]" 984 | ] 985 | }, 986 | { 987 | "cell_type": "markdown", 988 | "id": "1f7b9aee-dcce-4616-8b39-5ffb0da18ed6", 989 | "metadata": {}, 990 | "source": [ 991 | "### Flight Day\n", 992 | "\n", 993 | "We will map the flight day with a number of a week. " 994 | ] 995 | }, 996 | { 997 | "cell_type": "code", 998 | "execution_count": 93, 999 | "id": "24b9f199-1c20-403c-a8ba-2a126e05b2e6", 1000 | "metadata": {}, 1001 | "outputs": [], 1002 | "source": [ 1003 | "mapping = {\n", 1004 | " \"Mon\" : 1,\n", 1005 | " \"Tue\" : 2,\n", 1006 | " \"Wed\" : 3,\n", 1007 | " \"Thu\" : 4,\n", 1008 | " \"Fri\" : 5,\n", 1009 | " \"Sat\" : 6,\n", 1010 | " \"Sun\" : 7\n", 1011 | "}\n", 1012 | "\n", 1013 | "df.flight_day = df.flight_day.map(mapping)" 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "code", 1018 | "execution_count": 98, 1019 | "id": "ba56606a-4c3a-4f23-b45b-943bc376f7aa", 1020 | "metadata": {}, 1021 | "outputs": [ 1022 | { 1023 | "data": { 1024 | "text/plain": [ 1025 | "1 8100\n", 1026 | "3 7671\n", 1027 | "2 7670\n", 1028 | "4 7423\n", 1029 | "5 6759\n", 1030 | "7 6550\n", 1031 | "6 5809\n", 1032 | "Name: flight_day, dtype: int64" 1033 | ] 1034 | }, 1035 | "execution_count": 98, 1036 | "metadata": {}, 1037 | "output_type": "execute_result" 1038 | } 1039 | ], 1040 | "source": [ 1041 | "df.flight_day.value_counts()" 1042 | ] 1043 | }, 1044 | { 1045 | "cell_type": "markdown", 1046 | "id": "cb094eea-5531-4f35-8226-fbb953bc8846", 1047 | "metadata": {}, 1048 | "source": [ 1049 | "Most of the customers want to travel on Monday and choose Saturday as least preffered day as flight day. 
" 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "markdown", 1054 | "id": "07344b29-7a55-4093-b892-b240e46c3f32", 1055 | "metadata": {}, 1056 | "source": [ 1057 | "### Booking Origin" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "code", 1062 | "execution_count": 122, 1063 | "id": "b1be60f0-9c41-47d5-ae29-6cad5c4ba916", 1064 | "metadata": {}, 1065 | "outputs": [ 1066 | { 1067 | "data": { 1068 | "text/plain": [ 1069 | "Text(0, 0.5, 'Number of bookings')" 1070 | ] 1071 | }, 1072 | "execution_count": 122, 1073 | "metadata": {}, 1074 | "output_type": "execute_result" 1075 | }, 1076 | { 1077 | "data": { 1078 | "image/png": "\n", 1079 | "text/plain": [ 1080 | "
" 1081 | ] 1082 | }, 1083 | "metadata": { 1084 | "needs_background": "light" 1085 | }, 1086 | "output_type": "display_data" 1087 | } 1088 | ], 1089 | "source": [ 1090 | "plt.figure(figsize=(15,5))\n", 1091 | "ax = df.booking_origin.value_counts()[:20].plot(kind=\"bar\")\n", 1092 | "ax.set_xlabel(\"Countries\")\n", 1093 | "ax.set_ylabel(\"Number of bookings\")" 1094 | ] 1095 | }, 1096 | { 1097 | "cell_type": "markdown", 1098 | "id": "6816a14d-ec09-4372-9213-da04c5e28b77", 1099 | "metadata": {}, 1100 | "source": [ 1101 | "Above chart shows travellers from which country had maximum booking applications. " 1102 | ] 1103 | }, 1104 | { 1105 | "cell_type": "code", 1106 | "execution_count": 123, 1107 | "id": "03541101-79a5-4cd6-89b8-75786acab617", 1108 | "metadata": {}, 1109 | "outputs": [ 1110 | { 1111 | "data": { 1112 | "text/plain": [ 1113 | "Text(0, 0.5, 'Number of complete bookings')" 1114 | ] 1115 | }, 1116 | "execution_count": 123, 1117 | "metadata": {}, 1118 | "output_type": "execute_result" 1119 | }, 1120 | { 1121 | "data": { 1122 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA4EAAAGDCAYAAACV/RXuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAABIZ0lEQVR4nO3dedz99Zz/8cdTlpCkKVtaaMI01oTIWMJYshtrtiwxtixjJozJNsMYGdmFKBqGwQhJSWQZtEgL+kmWyhaiiCiv3x/vz+l7vtf3uq7v+eY61+dzvudxv92u2znnfbbXd7nO+bw+7/f79UpVIUmSJEmaD1foOwBJkiRJ0uoxCZQkSZKkOWISKEmSJElzxCRQkiRJkuaISaAkSZIkzRGTQEmSJEmaI1fsO4Bp2GqrrWqHHXboOwxJkiRJ6sWJJ57486raerH7NsokcIcdduCEE07oOwxJkiRJ6kWSHyx1n8tBJUmSJGmOmARKkiRJ0hwxCZQkSZKkOWISKEmSJElzZGpJYJJtkxyb5JtJTk+ybzf+0iTnJjm5+7nv2HNemOTMJGckudfY+L27sTOT7DetmCVJkiRpYzfN6qCXAM+vqpOSXAM4McnR3X3/WVWvHX9wkp2BRwJ/DVwf+EySG3d3vxm4J3AOcHySw6vqm1OMXZIkSZI2SlNLAqvqx8CPu+sXJvkWsM0yT3kg8IGquhj4XpIzgdt1951ZVWcBJPlA91iTQEmSJEnaQKuyJzDJDsCtga92Q89MckqSg5NcqxvbBjh77GnndGNLjS98j32SnJDkhPPOO2+l/wiSJEmStFGYehKYZDPgw8BzquoC4K3AjsCtaDOFB6zE+1TVQVW1a1XtuvXWW6/ES0qSJEnSRmeaewJJciVaAnhYVX0EoKp+Onb/O4BPdDfPBbYde/oNujGWGZckSZIkbYBpVgcN8C7gW1X1urHx64097MHAad31w4FHJrlKkhsCOwFfA44HdkpywyRXphWPOXxacUuSJEnSxmyaM4G7A48FTk1ycjf2IuBRSW4FFPB94KkAVXV6kg/SCr5cAjyjqi4FSPJM4NPAJsDBVXX6SgW5w36fXKmXusz3X73nir+mJEmSJK2EaVYH/SKQRe46Ypnn/Cvwr4uMH7Hc8yRJkiRJk1mV6qCSJEmSpGEwCZQkSZKkOWISKEmSJElzxCRQkiRJkuaISaAkSZIkzRGTQEmSJEmaIyaBkiRJkjRHTAIlSZIkaY6YBEqSJEnSHDEJlCRJkqQ5YhIoSZIkSXPEJFCSJEmS5ohJoCRJkiTNEZNASZIkSZojJoGSJEmSNEdMAiVJkiRpjpgESpIkSdIcMQmUJEmSpDliEihJkiRJc8QkUJIkSZLmiEmgJEmSJM0Rk0BJkiRJmiMmgZIkSZI0R0wCJUmSJGmOmARKkiRJ0hwxCZQkSZKkOWISKEmSJElzxCRQkiRJkuaISaAkSZIkzRGTQEmSJEmaIyaBkiRJkjRHTAIlSZIkaY6YBEqSJEnSHDEJlCRJkqQ5YhIoSZIkSXPEJFCSJEmS5ohJoCRJkiTNEZNASZIkSZojJoGSJEmSNEdMAiVJkiRpjpgESpIkSdIcMQmUJEmSpDliEihJkiRJc2RqSWCSbZMcm+SbSU5Psm83vmWSo5N8p7u8VjeeJG9IcmaSU5LsMvZaj+8e/50kj59WzJIkSZK0sZvmTOAlwPOramdgN+AZSXYG9gOOqaqdgGO62wD3AXbqfvYB3gotaQT2B24P3A7Yf5Q4SpIkSZI2zNSSwKr6cVWd1F2/EPgWsA3wQOCQ7mGHAA/qrj8QOLSarwBbJLkecC/g6Kr6ZVWdDxwN3HtacUuSJEnSxmxV9gQm2QG4NfBV4DpV9ePurp8A1+mubwOcPfa0c7qxpcYlSZIkSRtovUlgkocluUZ3/Z+TfGR8v94Ez98M+DDwnKq6YPy+qiqgNjDmpd5nnyQnJDnhvPPOW4mXlCRJkqSNziQzgS+pqguT3Am4B/A
uuv1665PkSrQE8LCq+kg3/NNumSfd5c+68XOBbceefoNubKnxtVTVQVW1a1XtuvXWW08SniRJkiTNnUmSwEu7yz2Bg6rqk8CV1/ekJKEljN+qqteN3XU4MKrw+XjgY2Pjj+uqhO4G/LpbNvpp4G+TXKsrCPO33ZgkSZIkaQNdcYLHnJvk7cA9gX9PchUmSx53Bx4LnJrk5G7sRcCrgQ8meRLwA+Dh3X1HAPcFzgQuAvYGqKpfJnkFcHz3uJdX1S8neH9JkiRJ0gKTJIEPp1XjfG1V/apbwvmC9T2pqr4IZIm7777I4wt4xhKvdTBw8ASxSpIkSZKWMUkSuCnwObisZ9/FwLFTjEmSJEmSNCWTLOs8CTgP+H/Ad7rr309yUpLbTDM4SZIkSdLKmiQJPBq4b1VtVVV/AdwH+CTwdOAt0wxOkiRJkrSyJkkCd6uqy6pxVtVR3dhXgKtMLTJJkiRJ0oqbZE/gj5P8E/CB7vYjaL3+NgH+NLXIJEmSJEkrbpKZwEfTGrT/b/ezXTe2CWvaO0iSJEmSZsB6ZwKr6ufAs5a4+8yVDUeSJEmSNE3rTQKT3Bj4B2CH8cdX1R7TC0uSJEmSNA2T7An8EPA24J3ApdMNR5IkSZI0TZMkgZdU1VunHokkSZIkaeomKQzz8SRPT3K9JFuOfqYemSRJkiRpxU0yE/j47vIFY2MF3Gjlw5EkSZIkTdMk1UFvuBqBSJIkSZKmb8kkMMkeVfXZJA9Z7P6q+sj0wpIkSZIkTcNyM4F3AT4L3H+R+wowCZQkSZKkGbNkElhV+3dXn1ZVF4/fZ2EYSZIkSZpNk1QH/UiSy5LFJNcFjp5eSJIkSZKkaZkkCfxf4ENJNkmyA3AU8MJpBiVJkiRJmo5JqoO+I8mVacngDsBTq+rLU45LkiRJkjQFy1UHfd74TWA74GRgtyS7VdXrphybJEmSJGmFLTcTeI0Ftz+yxLgkSZIkaUYsVx30ZeO3k2zWjf9m2kFJkiRJkqZjvYVhktwsydeB04HTk5yY5K+nH5okSZIkaaVNUh30IOB5VbV9VW0PPB94x3TDkiRJkiRNwyRJ4NWr6tjRjar6HHD1qUUkSZIkSZqa9baIAM5K8hLgvd3txwBnTS8kSZIkSdK0TDIT+ERga1p10I901584zaAkSZIkSdMxSbP484FnJ7lGu2l1UEmSJEmaVZNUB715Vx30NNZUB73Z9EOTJEmSJK20SZaDvp11q4MeNN2wJEmSJEnTYHVQSZIkSZojVgeVJEmSpDlidVBJkiRJmiMbUh30msCfqurC6YclSZIkSZqGSaqD3jbJqcA3gFOTfCPJbaYfmiRJkiRppU2yJ/BdwNOr6gsASe4EvBu4xTQDkyRJkiStvEn2BF46SgABquqLwCXTC0mSJEmSNC1LzgQm2aW7+vkkbwfeDxTwCOBz0w9NkiRJkrTSllsOesCC2/uPXa8pxCJJkiRJmrIlk8CquttqBiJJkiRJmr5J9gRKkiRJkjYSJoGSJEmSNEdMAiVJkiRpjkzSLP5qSV6S5B3d7Z2S3G/6oUmSJEmSVtokM4HvBi4G7tDdPhd45fqelOTgJD9LctrY2EuTnJvk5O7nvmP3vTDJmUnOSHKvsfF7d2NnJtlv4j+ZJEmSJGkdkySBO1bVa4A/AlTVRUAmeN57gHsvMv6fVXWr7ucIgCQ7A48E/rp7zluSbJJkE+DNwH2AnYFHdY+VJEmSJF0Oy/UJHPlDkqvS9QZMsiNtZnBZVXVckh0mjOOBwAeq6mLge0nOBG7X3XdmVZ3VvfcHusd+c8LXlSRJkiSNmWQm8KXAkcC2SQ4DjgH+6c94z2cmOaVbLnqtbmwb4Oyxx5zTjS01vo4k+yQ5IckJ55133p8RniRJkiRtvNabBFbVUcBDgCcA7wd2rapjL+f7vRXYEbgV8GPggMv5OuuoqoOqateq2nXrrbdeqZeVJEmSpI3KJNVBj6mqX1TVJ6vqE1X18yTHXJ43q6qfVtW
lVfUn4B2sWfJ5LrDt2ENv0I0tNS5JkiRJuhyWTAKTbJpkS2CrJNdKsmX3swNLLMlcnyTXG7v5YGBUOfRw4JFJrpLkhsBOwNeA44GdktwwyZVpxWMOvzzvLUmSJElavjDMU4HnANcHThobvwB40/peOMn7gbvSkshzgP2Buya5Fa3IzPe796CqTk/yQVrBl0uAZ1TVpd3rPBP4NLAJcHBVnT7xn06SJEmStJYlk8CqOhA4MMmzquqNG/rCVfWoRYbftczj/xX410XGjwCO2ND3lyRJkiSta5LqoAcn+eckBwEk2SnJ/aYclyRJkiRpCiZKAoE/AHfsbp8LvHJqEUmSJEmSpmaSJHDHqnoN8EeAqroIyFSjkiRJkiRNxSRJ4B+SXJVWzIUkOwIXTzUqSZIkSdJULFcddGR/4Ehg2ySHAbvTGsdLkiRJkmbMepPAqjo6yUnAbrRloPtW1c+nHpkkSZIkacUtmQQm2WXB0I+7y+2SbFdVJy18jiRJkiRp2JabCTxgmfsK2GOFY5EkSZIkTdlyzeLvtpqBSJIkSZKmb717ApNsCjwduBNtBvALwNuq6vdTjk2SJEmStMImqQ56KHAh8Mbu9qOB9wIPm1ZQkiRJkqTpmCQJvFlV7Tx2+9gk35xWQJIkSZKk6ZmkWfxJSXYb3Uhye+CE6YUkSZIkSZqWSWYCbwN8OckPu9vbAWckORWoqrrF1KKTJEmSJK2oSZLAe089CkmSJEnSqlhvElhVP0hyLWDb8cfbLF6SJEmSZs8kLSJeATwB+C6tRQTYLF6SJEmSZtIky0EfDuxYVX+YdjCSJEmSpOmapDroacAWU45DkiRJkrQKJpkJfBXw9SSnARePBqvqAVOLSpIkSZI0FZMkgYcA/w6cCvxpuuFIkiRJkqZpkiTwoqp6w9QjkSRJkiRN3SRJ4BeSvAo4nLWXg9oiQpIkSZJmzCRJ4K27y93GxmwRIUmSJEkzaJJm8XdbjUAkSZIkSdO33hYRSa6Z5HVJTuh+DkhyzdUITpIkSZK0sibpE3gwcCGtafzDgQuAd08zKEmSJEnSdEyyJ3DHqnro2O2XJTl5SvFIkiRJkqZokpnA3yW50+hGkt2B300vJEmSJEnStEwyE/j3wCFj+wDPB54wtYgkSZIkSVMzSXXQk4FbJtm8u33BtIOSJEmSJE3HJNVB/y3JFlV1QVVdkORaSV65GsFJkiRJklbWJHsC71NVvxrdqKrzgftOLSJJkiRJ0tRMkgRukuQqoxtJrgpcZZnHS5IkSZIGapLCMIcBxyQZ9QbcGzhkeiFJkiRJkqZlksIw/57kG8A9uqFXVNWnpxuWJEmSJGkaJpkJpKqOBI6cciySJEmSpCmbZE+gJEmSJGkjYRIoSZIkSXNkySQwyTHd5b+vXjiSJEmSpGlabk/g9ZLcEXhAkg8AGb+zqk6aamSSJEmSpBW3XBL4L8BLgBsAr1twXwF7TCsoSZIkSdJ0LJkEVtX/AP+T5CVV9YpVjEmSJEmSNCXrLQxTVa9I8oAkr+1+7jfJCyc5OMnPkpw2NrZlkqOTfKe7vFY3niRvSHJmklOS7DL2nMd3j/9Oksdfnj+kJEmSJKlZbxKY5FXAvsA3u599k/zbBK/9HuDeC8b2A46pqp2AY7rbAPcBdup+9gHe2r33lsD+wO2B2wH7jxJHSZIkSdKGm6RFxJ7APavq4Ko6mJbYrXc2sKqOA365YPiBwCHd9UOAB42NH1rNV4AtklwPuBdwdFX9sqrOB45m3cRSkiRJkjShSfsEbjF2/Zp/xvtdp6p+3F3/CXCd7vo2wNljjzunG1tqXJIkSZJ0OSxXHXTkVcDXkxxLaxNxZ9Ys47zcqqqS1J/7OiNJ9qEtJWW77bZbqZeVJEmSpI3KJIVh3g/sBnwE+DBwh6r678v5fj/tlnnSXf6sGz8X2HbscTfoxpYaXyzOg6pq16radeutt76c4UmSJEnSxm2i5aBV9eOqOrz7+cmf8X6HA6MKn48
HPjY2/riuSuhuwK+7ZaOfBv42ybW6gjB/241JkiRJki6HSZaDXi5J3g/cFdgqyTm0Kp+vBj6Y5EnAD4CHdw8/ArgvcCZwEbA3QFX9MskrgOO7x728qhYWm5EkSZIkTWhqSWBVPWqJu+6+yGMLeMYSr3MwcPAKhiZJkiRJc2vZ5aBJNkny7dUKRpIkSZI0XcsmgVV1KXBGEsttSpIkSdJGYJLloNcCTk/yNeC3o8GqesDUopIkSZIkTcUkSeBLph6FJEmSJGlVrDcJrKrPJ9ke2KmqPpPkasAm0w9NkiRJkrTS1psEJnkKsA+wJbAjsA3wNhap8qnp2WG/T674a37/1Xuu+GtKkiRJGrZJmsU/A9gduACgqr4DXHuaQUmSJEmSpmOSJPDiqvrD6EaSKwI1vZAkSZIkSdMySRL4+SQvAq6a5J7Ah4CPTzcsSZIkSdI0TJIE7gecB5wKPBU4AvjnaQYlSZIkSZqOSaqD/inJIcBXactAz6gql4NKkiRJ0gyapDronrRqoN8FAtwwyVOr6lPTDk6SJEmStLImaRZ/AHC3qjoTIMmOwCcBk0BJkiRJmjGT7Am8cJQAds4CLpxSPJIkSZKkKVpyJjDJQ7qrJyQ5AvggbU/gw4DjVyE2SZIkSdIKW2456P3Hrv8UuEt3/TzgqlOLSJIkSZI0NUsmgVW192oGIkmSJEmavkmqg94QeBaww/jjq+oB0wtLkiRJkjQNk1QH/V/gXcDHgT9NNRpJkiRJ0lRNkgT+vqreMPVIJEmSJElTN0kSeGCS/YGjgItHg1V10tSikiRJkiRNxSRJ4M2BxwJ7sGY5aHW3JUmSJEkzZJIk8GHAjarqD9MORpIkSZI0XVeY4DGnAVtMOQ5JkiRJ0iqYZCZwC+DbSY5n7T2BtoiQJEmSpBkzSRK4/9SjkCRJkiStivUmgVX1+dUIRJIkSZI0fetNApNcSKsGCnBl4ErAb6tq82kGptm0w36fXPHX/P6r91zx15QkSZLm1SQzgdcYXU8S4IHAbtMMSpIkSZI0HZNUB71MNf8L3Gs64UiSJEmSpmmS5aAPGbt5BWBX4PdTi0iSJEmSNDWTVAe9/9j1S4Dv05aESpIkSZJmzCR7AvdejUAkSZIkSdO3ZBKY5F+WeV5V1SumEI8kSZIkaYqWmwn87SJjVweeBPwFYBIoSZIkSTNmySSwqg4YXU9yDWBfYG/gA8ABSz1PGjp7GUqSJGmeLbsnMMmWwPOAvYBDgF2q6vzVCEySJEmStPKW2xP4H8BDgIOAm1fVb1YtKkmSJEnSVCzXLP75wPWBfwZ+lOSC7ufCJBesTniSJEmSpJW03J7A5RJESZIkSdIMMtGTJEmSpDliEihJkiRJc8QkUJIkSZLmiEmgJEmSJM2RXpLAJN9PcmqSk5Oc0I1tmeToJN/pLq/VjSfJG5KcmeSUJLv0EbMkSZIkbQz6nAm8W1Xdqqp27W7vBxxTVTsBx3S3Ae4D7NT97AO8ddUjlSRJkqSNxJCWgz4QOKS7fgjwoLHxQ6v5CrBFkuv1EJ8kSZIkzby+ksACjkpyYpJ9urHrVNWPu+s/Aa7TXd8GOHvsued0Y2tJsk+SE5KccN55500rbkmSJEmaaUs2i5+yO1XVuUmuDRyd5Nvjd1ZVJakNecGqOgg4CGDXXXfdoOdKkiRJ0rzoJQmsqnO7y58l+ShwO+CnSa5XVT/ulnv+rHv4ucC2Y0+/QTcmbdR22O+TK/6a33/1niv+mpIkSZotq54EJrk6cIWqurC7/rfAy4HDgccDr+4uP9Y95XDgmUk+ANwe+PXYslFJPTNZlSRJmi19zAReB/hoktH7/1dVHZnkeOCDSZ4E/AB4ePf4I4D7AmcCFwF7r37IkmadyaokSVKz6klgVZ0F3HKR8V8Ad19kvIBnrEJokiRJkrTRG1KLCEmSJEnSlJkESpIkSdIcMQmUJEmSpDliEihJkiRJc8QkUJIkSZLmiEmgJEmSJM0Rk0BJkiRJmiM
mgZIkSZI0R0wCJUmSJGmOmARKkiRJ0hwxCZQkSZKkOWISKEmSJElzxCRQkiRJkuaISaAkSZIkzRGTQEmSJEmaI1fsOwBJ0ho77PfJFX/N7796zxV/TUmSNLucCZQkSZKkOWISKEmSJElzxCRQkiRJkuaISaAkSZIkzRGTQEmSJEmaIyaBkiRJkjRHTAIlSZIkaY7YJ1CStMHsZyhJ0uxyJlCSJEmS5ohJoCRJkiTNEZNASZIkSZojJoGSJEmSNEdMAiVJkiRpjpgESpIkSdIcMQmUJEmSpDliEihJkiRJc8QkUJIkSZLmyBX7DkCSpGnZYb9Prujrff/Ve67o60mS1AdnAiVJkiRpjjgTKElSj1Z6thKcsZQkLc+ZQEmSJEmaI84ESpKk9XLGUpI2HiaBkiRpo2GyKknrZxIoSZK0ymYlWZ2VOCVtGPcESpIkSdIccSZQkiRJM80ZS2nDmARKkiRJq2Clk1UTVV1eM5MEJrk3cCCwCfDOqnp1zyFJkiRJGxVnVefDTCSBSTYB3gzcEzgHOD7J4VX1zX4jkyRJkrTaTFb/PDORBAK3A86sqrMAknwAeCBgEihJkiRpkIaarM5KddBtgLPHbp/TjUmSJEmSNkCqqu8Y1ivJ3wH3rqond7cfC9y+qp459ph9gH26mzcBzljhMLYCfr7CrzkNxrmyjHNlzUKcsxAjGOdKM86VZZwrZxZiBONcaca5suY1zu2rauvF7piV5aDnAtuO3b5BN3aZqjoIOGhaASQ5oap2ndbrrxTjXFnGubJmIc5ZiBGMc6UZ58oyzpUzCzGCca4041xZxrmuWVkOejywU5IbJrky8Ejg8J5jkiRJkqSZMxMzgVV1SZJnAp+mtYg4uKpO7zksSZIkSZo5M5EEAlTVEcARPYYwtaWmK8w4V5ZxrqxZiHMWYgTjXGnGubKMc+XMQoxgnCvNOFeWcS4wE4VhJEmSJEkrY1b2BEqSJEmSVoBJoCRJkiTNkZnZEyhpdiTZZbn7q+qk1YplEkmuBewEbDoaq6rj+ototiXZBLgOY98xVfXD/iKS1pXkCsBmVXVB37HMoiU+538N/KCqLlnteKRx3ffQnsAOrP1d9Lq+YlpMkl2BvwGuD/wOOA04uqrOn/p7uydwtiXZFHgS8NesfQD7xN6C2ggk2ZN1/05f3l9EayTZHXgpsD3tgy1AVdWN+oxrXJJju6ubArsC36DFeQvghKq6Q1+xLZTkycC+tP6jJwO7Af9XVXv0GddiklwNeD6wXVU9JclOwE2q6hM9h3aZJM8C9gd+CvypG66qukV/UUlNkv8CngZcSms/tTlwYFX9R6+BLdB9zp9cVb9N8hhgF1qcP+g5tMsk+QotrlNon+83A04Hrgn8fVUd1WN4JDkVWPIg18+kyy/JzYCdWfsY6dD+IlpXkiOA3wOnsua7iKp6WW9BjUmyN/As4HvAicDPaH+fNwZ2pyWDL5nmCVRnApeRZDfgjcBfAVemtaf4bVVt3mtga3sv8G3gXsDLgb2Ab/Ua0RKSbA38E+t+cAzqYDvJ24CrAXcD3gn8HfC1XoNa27uA59I+NC7tOZZFVdXdAJJ8BNilqk7tbt+MlsAOyb7AbYGvVNXdktwU+LeeY1rKu2n/7qMk+lzgQ8BgkkDa3+dNquoXfQeyPrPwmdSdJX4x6570GdwBbJLvschB95BOUAE7V9UFSfYCPgXsR/udGlQSCLwVuGWSW9JO/LwTOBS4S69Rre1HwJNGLbuS7Ew7DvlH4CNAr0kgcL/u8hnd5Xu7y716iGVZST5YVQ9fJHEd3O97kv2Bu9I+N48A7gN8kfb/c0huMKS/t0VcDdi9qn632J1JbkVboWQS2JM30RrTf4g2m/E4WoY+JH9ZVQ9L8sCqOqQ7y/mFvoNawmHAf9Om558GPB44r9eIFnfHqrpFklOq6mVJDqAdLAzFr6tqSPEs5yajBBCgqk5L8ld9BrSI31fV75OQ5CpV9e0kN+k7qCXsWFW
PSPIogKq6KEn6DmqBs2lLwmbBLHwmHQa8gAVnswdq17HrmwIPA7bsKZalXCnJlYAHAW+qqj8mGeKSqEuqqpI8kBbnu5I8qe+gFrjxeM/mqvpmkptW1VlD+FgazZomuWdV3Xrsrv2SnEQ7ATAU+3aX91v2UcPwd8Atga9X1d5JrgO8r+eYFvOpJH/b94z0UqrqzdBm/avqS+P3LTY2DSaB61FVZybZpKouBd6d5OvAC/uOa8wfu8tfdbMsPwGu3WM8y/mL7ots36r6PPD5JMf3HdQiRmdlLkpyfeAXwPV6jGehY5P8B+1M68WjwaHts+uckuSdrPmC2Iu2dGhIzkmyBfC/wNFJzgcGs+RqgT8kuSrdmeIkOzL2f2AgzgI+l+STrP3/c1D7MDqz8Jl0XlUd3ncQk1hk9vf1SU4E/qWPeJbwduD7tCXqxyXZHhjinsALk7wQeCzwN93+xSv1HNNCpyd5K/CB7vYjgG8muQprjk2GIOMH1UnuyMAKI1bVj7vLH3T/J3eqqs90n/dDO1b/XVX9KcklSTanLWPctu+gFvEV4KPd784fWTOrOqTVfNBWHC7cX7vY2Iob2n+sobkoyZWBk5O8BvgxA/vgAA7qilq8BDgc2IxhfeGOG30p/Ljbc/cjhneWGOATXVLwH8BJtAPud/Ya0dpu312On3UvYDBL2MbsDfw9a85yHkdb5jQYVfXg7upLu72M1wSO7DGk5exPi23bJIfR9g08odeI1vXD7ufK3c+QzcJn0v7diZRjWDup/kh/IS1uQaGQK9A+owZ1nFFVbwDeMDb0gyR36yueZTwCeDTwxKr6SZLtGN6S1ScATwee093+EvAPtN+rIf2dPgk4OMk1aYnA+cAg6yYkeQqwD+1zaEfaXvW3AXfvM64FTuiOkd5BW0r9G+D/eo1oca+jbZ04tQZYACXJHYA7Alsned7YXZvTtp9NP4YB/r0MRnc25me0s2/PpR0cvqWqzuw1sBmV5H60parb0s5ybA68bMhnubszmptW1awsb9PlkOROtDOv7+72iW1WVd/rO65x3dnMv6MlA7vRDma+UlU/7zWwGTYLn0lJ3gfclFZwY7zQzuAOYscKQgFcQptxe21VndFPROvqlq79G3D9qrpPt4/tDlX1rp5DW8eCGaGrAZtU1YV9xzWruiSQIX+fJzkZuB3w1dES1iSnVtXNew1sCUl2ADavqqGt8CHJccBdq2qQy+iT3IW2t/JptER/5ELg41X1nanHYBI4m5I8pqret+DswWUGuvRq0JLsUVWfTfKQxe4f0pn3DLh66bisW8kUGFahiG6T+660/Ys37pYAf6iqdu85tHUkOaGqdl3/I/vTJdH/yLr/P4c4Uz14Sc6oqqHuUZ05ST5FK7D04qq6ZZIr0vY2Deoge3xGqKp2TKsE/LaqGsyM0Cx8vsNlJ3MfyrqtAob4nfnVqrp9kq9X1a27/58nDaHASbff89tZogXU0LakJHkPcCNaTYfBbk1Isn23DPhqVXXRar73oJZpDMUyVZqAwZQVvnp3eY1eo9gA3cHhU1j3g3goZ7TvAnwWuP8i9xVtD17vMvzqpeMGX8kUeDBwa9rSX6rqR0mG+nv1mST/QCtm8tvRYFX9sr+Q1jEqtnI/BlpsJck/VtVrkryRxT/jn91DWEv5cpKdq+qbfQcyiRk4QbVVVX2w229HVV2SZIifTc+gmxECqKrvJBnafv9Z+HwH+BitWNWJDG8P9UKfT/Ii4KpJ7klbbvvxnmMaeT7tGO6ARe4b4paU73U/Q9+acP3u5NRmwHZpFYGfWlVPn/YbmwQubvBVmqrq7d3lIPqdTOhjtKVXn2GAXxhVtX93uXffsazH0KuXjpuFSqZ/6KrwjYqtXH19T+jRI7rLZ4yNFe1s51DMQrGVURudE3qNYjK70falf492ADu4kvEjM3KC6rdJ/oI1xZV2Y5jVbC+uqj+kq7LZzQgNbenWLHy+Q2sVcO++g5jQfrQ9jKcCT6W
1YBhETYKqekp3OaT9nksaHR8n2ay7/Zt+I1rS62lt3g4HqKpvJLnzaryxSeAiRlWagJ+zpgrSjWn7MgbxgZfkDcvdP7Az2SNXq6p/6juI9UmyL2250IW0jc+7APsNqMzw0KuXjpuFSqYfTPJ2YItuCdYTaf/ug1NVN+w7hgkMvthKVX28uzyk71gmMCsHrzAbJ6ieRzvY2jHJl4Ctaa0shmbIM0Ijs/D5Dm02/eY11q5oqLr9a+9ggN9BS22VGRnSlhm4rC/xe+m+f5L8HHhcjbU1GYqqOjtrt1VZlYkSk8DlHUcrzXwtWtPT42ln4ofQaPTEvgO4HD6R5L5VdUTfgazHE6vqwCT3Av6CVqL7vfTf+HZkseqlg/vC6Ay6kmnap+5/007wXADcBPiXqjq618CWkNbf7O+B0VnCzwFvr6ohlWN/ZVeA4fmsKbby3H5DWluSj7PMrEpVPWAVw1lWrel1dm3GllgO1CycoDqdtvT/JrRZ1TMYXtVvGPCM0JhBf76PuRPwhBmZTV+4z3IU6xBWe4y2ylybVtXys93tuwFfZiBbZsYcBDyvqo4FSHJX2rHSHXuMaTFnp7Utqe47fl/WrFaZKgvDLCPJSVW1S5JnAVft9pCcXFW36ju2WZTkQtpexosZcM+W7iz2LZIcCHyuqj462qTdd2wLWb30zzfkymsLpbUKuBIwmsF6LHBpVT25v6jWlmTTqvp933Esp6vKBvAQ4Lqs6WP5KOCnVTWYpDXJA2h7cK5Pq1a9PfCtqvrrXgNbRJKX0BL/uwNvpmuvU1Uv6TWwMaPv9fWNaePRVVldx+gEy5Ak+TaL7LOsdXtw9ibJUcDjR6vmklwPeE9V3avfyNaW5BtVdcv1jfUtyVbAgcA9aMfFRwH7rsa/uTOBy0taH4+9aGfkYJV6d0yqK7byT8DODLwSX1UNtdjGQid2H3I3BF7YFQkZTInhJJvSlgbdiXaQ9cUkbx3qgfcMFIo4Kcltq2po+9YWc9sFX2CfTfKN3qJZ3GlJfkrb//sF4ItDO0nR7VUkyQELqq1+PMnQ9gm+grYv8DNdtcC7AY/pOaZFVdUruqsfTvIJBnSCKsl1gW1oyytvTTvYgjZTfbXeAlvCwGeEgMtaLuzPmpUJnwdePpR/85EZm02fhX2W245tmwL4KbBdX8Es46zuxNR7u9uPAc7qMZ5FVWvz1MsKQ5PA5e0LvBD4aFWdnuRGwLHrec5qG1Xi25PhVuKbqbLCtIT/VsBZVXVRV0RgSMViDqXtV3xjd/vRtA+5we1rmZFCEbcH9kryA1rFzcEuFQIuTbJjVX0XoPtMGlSRpar6y7TG1n9D+1x6c5JfDXQFxdWT3KiqzgJIckPWVF4eij9W1S+SXCHJFarq2CSv7zuoxSR5BnBYVf2qqi5OcrUkT6+qt/QdG63wwhNozbfHS8RfCLyoj4DWYxYqbx4MnAY8vLv9WNp++mX3jq22pWbTaScnh2YW9lkek+TTwPu724+gFfwbmicCL2PNMtUvdGODkuQ1wCtpy+mPBG4BPLeq3rfsE1fivV0OurRZ2Eic5MSqus1oCWM3dnxV3bbv2EaSHFRV+2TtRsIjNbRZy26f2F7Ajarq5d0B7XWrahDJS5JvVtXO6xsbgrGltaPLzYBPVdXf9B3byIwtFbo77SDrLFqyuj2w92jPwxAkuQEtAbwLcEvgl7TZwFf1Gtgiktybtm9k/O/zqVX16V4DG5PkM8CDgFcBW9EOYm9bVUPb18Ji2yWGtpQ+yUOr6sN9x7E+6frF9R3Hcpb49x7clplutcQeLJhNr6onreepq26GjpMezJoZ4OOq6qN9xjPLRr8z3d/p/WjFq45bjWWrzgQu7y3dnqv30M5uDmqJQ2cWKvHt013ORFlh4C205Z97AC+nnSn+MDCUxPqkJLtV1VcAktye4Za6H3yhiGpNWm9JS1wAvlBVQ1tiCUBVHZPWNHrUPPyMqhpa36sf0opo/VtVPa3vYJZTVUd
2f5837Ya+PcC/zwcCv6fNCu0FXJP2uTREmyRJdWeXk2zCwPpzVdWHZ2CJOszGjNDvktypqr4Ily1h/d16ntOHmZlNn6HjpC8Dl9C2pAziBPnILBX+6oxysT2BD1XVr7N2pdCpv7EWUVV/k9YaYm/aPrGv0Ta/DqVKJMxAJb5xXQWkHVi7WfyhvQW0uNt3BYG+DlBV5ycZ0oHMbWglr3/Y3d4OOCPJqQxvGeNilUwHVeEurSXIU1izZOR93ez1G5d5Wi8W2Q/6hSRvG9h+0FvT4nt0kv2A7wCfr6p39RvWkm7Dms+kWyYZ1GdSVf127ObQW1ocCfx3WssVaFUtj+wxnnXMyBJ1mI3Km08DDu2OQQDOZ1hbJ0Z+1a1COQ44LMnPaEv/ByfJvyw2PqSTFEkeTvtO/xxtBcUbk7ygqv6n18DWeG13uWjhr14iWt4nuoJAvwP+vqv1sSrf6S4HnUB3NvNBwBtoZeQDvKh67onSxfXsqvrPPuOYVJL3AjsCJ7Nmj0PVwHoaJvkqrYTw8V0yuDVw1FCWNC21fHFkiMsYYbiVTJOcAtxhdLCd1iz+/waWTAOQ5IO0menRl9qjgS2qalD7QbsDrjvRZlcfA1BVy/6/7cMsfCal9eb6d1pZ9jDQqsoASa5AS/zu3g0dTasOOpg9bbOwRH3okjy+xnpsJhn9X/wdcGhVPaqfyBbXfab/nva7M5pNP6wGVHFzJMnzx25uSlse+K2qGsxetm557T2r6mfd7a1pS22HVnXzhAWFvxYdG4IkW9KKAl3a/X+9RlX9ZNrv60zgMpLcgnZWa0/al9n9q+qkblnb/9FzT5TuP8ujgJlIAmlnNHceLRUasDcAHwWuneRfaWeK/7nfkNZYqtJZVf1wySetsizTVLabaRlSP6GwduGFS1lTOXBobrZg7+exSb7ZWzSLSKuueRXacqEvAHce6okJZuMz6TW0755V6Rv156jW6Pqt3c9QDXqJepLHVNX7kjxvsfur6nWLja+yfZNcpaoOAqiqC7oD108A5/Qb2rpmaTa9qg4Yv53ktcBg9ih3rjBKADu/YJi9Nmeh8BcAVfXLseu/ZZVmqk0Cl/dG2nKRF1XVZevcq+pHSYaSFHwpyZtoFUIv+08zsH0DI6fRpuZ/vL4H9qmqDktyIu1sdoAHDekALLNR6ez+y9xXDKup7LuBryYZbWx/EK0y3xDNwn7Q+1TVoCoUL2MWPpN+OqTPn+V0+ytfxbotiwbT1oDhL1EfHaQOuaXSPYAj03qCvqGbCToCOKaq9us5tsuk9SZe7ATPYGfTF3E1WkXbITky61YHPaLHeJbyXOBzSdYq/NVvSMPictAZNwuVpMY26V6D1nrha6y90X1om3RHS22vw9p7Fwcx0zZLlc6GLMm2VXV2d30X2vJFaLNX21TVJ3oLbglJvkUrCrPWflC6Dfp9LmGdkRmMtXSfn7diwJ9JSQ6kJar/y9oxDulECgBJvkjrG/eftBNBe9NmDRbd59SHbgbr4tF1WrL6+wEWBBq0bgnop2iflw8E3lZVB/Yb1ewb7e3vbm4CbE3rvfim/qJaV7fa57LvzKFWB+1+x4dc+KtXzgQuY0bOaj5pNNU9ktY7bEgOpyVUX1gw/jcM8Ax8kmfRDmR+ypqlgUXr3TIEg690NiMJwdFJ7l1V3+9mzk8CSPJE4MW0pU1Dc+++A1jGLMxgLPTSvgOYwObARcDfjo0NbTZ95KpdBdt0S4Bf2q2qGEwSSNvKsQtAd0B4cZKTRmND0X2PHwjsRvv3/j9a77Dem12PLfc/iNZz8Rjg7NH4EE9QwLC3UIy539j1S2grAS7pK5hlfJl2fPQnWjXowVlkW8qOSX4NnLpgOWuvkhxTVXdf39g0mAQu792sOat5N7qzmr1GtK7/Yd0vrw/RKt4NxQOBF9aCnotJfgn8G8NbercvcJMhbhrvjCqdfYHhVjqbhYTgecBRSfasqu8AdNUs96L1uBucIe8Hraq3d5cv6zu
WSVXV5/uOYQLvrKovjQ+kleIfoou74jDfSfJM4Fxgs55jAiDJdYFtgKsmuTVr9v1uTltyNzT/BbwZeHB3+5G05XdD6B04vtz/8AVjgztBMSNbKEYrkD5dVTdd74N7lOTJtBM7n2VNddCXV9XB/Ua2jicBd2BNnHcFTgRu2MX73h5jG1X7vhqwVZJrsfZn0jarEoPLQZeWNY3YT62qm4+PDSC2m9I+wF4DvGDsrs2BF1TVYD7cskzz+vG/26Holojdc6Bn30hyNdZUOnsM7d/8sPGNxZpMWvP1t9P2AT4ZuB2wZ1Wd32dcS1nqYGZgv++b0r58F/ZhG1J1u5nZK5TkpKraZX1jQ5DktrSD6y2AV9CqML5mtIe1T0keDzyBVgzoeNYccF1Ia/00tMTllIXLu5N8Y2gVGGfBLG2hSPIx4FlDOLG3lCRnAHccnShP8hfAl6vqJss/c3V1+xYfV1U/7W5fBziU1iriuKq6Wc/x7Qs8h/Z9/qOxuy4A3rEaS4CdCVzeYM9q0vYF3Y/2ZTt+Vu5CWs+zIdlimfuuulpBbICzaJuJP8nae3B6XcK4xIHr6EDmX5J8F3hxVR2zupEtbegJQbd0bW9av6MvA3vUsHruLfQK2vKwtQ5meo5pofcC3wbuRWtqvhctMRiMqhryDDUASe5Aa1Wz9YJl1ZvT9goNTlWNloX9hoH1i6vW0uCQJA+tqg/3Hc8EPtWtTPgA7XP/EcARaaXk8aTfBhn8Foox1wJOT+tLPV7sbzD7lGnVQC8cu31hNzY0244SwM7PurFfJvljX0GNdHtoD0zyrOqpL7FJ4PL2pU3VPpt28LUH8PheI+pU1ceAjyW5Q1X9X9/xrMcJSZ5SVe8YH+yWFJzYU0zL+WH3c+XuZxCWO3DtlpHcDDisuxyKwSYEY0l1aC0N7g78LMngZoPGDPZgJskVu9nzv6yqhyV5YFUdkuS/WHc/sNbvyrSTjldk7WXVF9Da1gxGksOXu38IB7BJ7g+cMkoA05pyPxT4AbBvVX2vz/gW8fDucmE1w0fSPreGtvd/yGZhC8XIS/oOYCljJ6POpFXU/hjt/+IDgVN6C2xpn0vyCdoWKWi/759La2fyq96iWtfbkzwbuHN3+3PA26tq6omqy0FnXJLXAK+k9T46kla85LlV9b5ln7iKuin4jwJ/YE3StyvtIOfBtQoNMS+P7kuDqvpN37FMKslTR3uzhiDJ17sZq1Fj5ivRKont1ndssyjJZ2hLV18FbEU7s3nbqrpjn3HBmiWKSb5WVbdLchzwdOAnwNcGVlBrZiTZvqp+kORqVXVR3/EsJsl5wNm0PWtfZUGfzSHsvUxyCrBbVV2U5H60giaPAm4NPKyq7tVrgJqa7qD/d7SaDoNuFj8uyVbAL2ogB+pJ9l/u/qHtB+9O6D4UGO2h/hLw4aH8fY4keSdwJdb0sHwscGlVPXnq7z2wv4tByJqWBosawlnNkSQnV9WtkjyYtjz0ebS1zoPbN9AtXRvNUp1eVZ/tM56lJLkZbQZry27o57R15af3F9VsMiFYWd3BzGg/6KAOZsaSwCcDHwZuDryHNpv1kiGdnJgl3bLQdwGbVdV2SW4JPLWqnt5zaJfpViLck5ZU3QL4JPD+IX1mju+nS3IwcEZV/Xt3e3B7LLul9E+nleEv2izW24a0XD3rVl8EGFz1RWgnU4Cdquoz3b76TarqwvU9b7Uk2Q14NfBL2sqz99JO9F2BdvxxZI/haYoW2+u7Wvt/XQ66uNf2HcAGuFJ3uSfwoar6dTv5MTxVdSywWF/DoTkIeF4XL0nuCryDtj9HG+agrurVS2hV5DZjWOXiZ0pVjS9hOmTJB/bj2mPLhUb7wd7cXV59kcdrMq+nLac+HKCqvpHkzss+Y5VV1aW0lShHpvXlehRt2dXLVqO4wYTSre64iLb0+y1j9226+FN6dShtr9Vor9CjaYnBw3qLaF2j6ouj7/W7MqDqiyNJngLsQzuxuyOt8uLbaP8
PhuJNwItoJ/Y+C9ynqr6SVgTw/bTfr0FIsiutjdL2rN1LeShttIDLTlL8O3Bt2onToW71uDTJjlX1XbisPcylq/HGJoGLGMLSlQ3w8STfpi11+PskW9NmCnT5XX2UAAJU1WgNuTZQVb2zu/p53MPyZ1uiONCvgROA51e/PcQ2oSX5i52FcsnJn6Gqzl5wcm9VDhA2RJf87UlLAHcA3kDbBjAUrwdOpu2p/FZVnQCQ1i5icP1qgZtV1c5jt49N8s3eolncFYG/WqT64u2B42hJ6xA8g1b5+asAVfWdtDY7Q3LFqjoKoEugvwJQVd8e4In9w2hV6U+l9QkcqtcA96+qQdQhWMYLaL/fZ9G+P7dnlQprmQQuIzPQLL6q9uv2Bf66qi5N8lvaJl1dfmcleQlrvsAeQ6sYqg3UHRg+lHZQOH7G8OV9xTTjXg+cQ+shFlqRiB1pje4Ppp2J78uP/XedirOT3BGobk/tvgykuNJIkkNpS/2PAF5WVaf1HNI6qurgtJLx1wa+MXbXTxhYJdPOSUl2GyUDSW5PO9kzJIOuvjjm4qr6wyiZSnJFhndiajyZ+t2C+4YW63lVtWwxqIH46QwkgKMq5TvRqv5DW6p+8XLPWSnuCVxGki+ypln8/emaxVfVYJazJXncYuNVdehqx7Kx6JYvvoy2FwPaXoyX1kB7xw1ZkiNpM1UnMjZ7UVUH9BbUAjO0ZGSpvQOjfcG99hAbFQHq6/03Vl1xiAOBe9D+bx5Fq2bZ+z7QkSR/Yk21xfGDisH+Lg1dkm/RDgpH/eK2A84ALqH9nfa+9C7JW2hxjVdfPIc2s/GJqrpbX7GN606U/wp4HPAs2l7Lb1bVi/uMa1ySS2m/Q6G1zhoVgQqwaVVdaannrra0/rqPAo5h7TZaQ+u1eSBwXeB/GXCcAN2Jvh1Y+2T51I/jTQKXkQE3ix9JMt5bZFPaGveTqmpQJcQ1n5KcVj03ZF2fJGcyG0tGSPJ/tJNS/9MN/R1t/+puo2Swx9i2LHuXSSuiK2SypKr6wWrFspQZqr54Bdr+xb+lJVWfBt45tDhnRZL3ATcFTmfNDGbVQPr/jiR59yLDQ4zzvbQVPSez5mR5VdWzp/7e/g4sLcmXabNB/0PbqHsu8OqqusmyT+xRki2AD1TVvfuOZdbMUlXYWZHkIOCNVXVq37EsJcmXqmr39T+yf92G8QNpxRgK+ArwXNpn022q6os9hqcVlOQfq+o13Ym+dT6XVuMAQasvyeZVdUG6pvALeaJlckluUlVnLHHf7lX1pdWOaWOQ5IwhHwfPmm7Wf+c+TkqYBC4jyW1pey+2oJXsvSbwmtEa/SHq9oycXlU37juWWZPkLt3Vh9CWEIx6LT6Ktrb8ub0ENoOSnEY7Q3hFYCfansqLWbM8bAhLmUblze/CjCwZ0fzoigQ8jnaGeB1VNbTqsIO2VFI1MpTkKsknqup+Sb5HS/7Hq4LUkGoSDH0pfbdM+b3AM2pBv98htgWZFd0M239U1dAKFQGzdwItyYeAZ1fVqheosjDMMqrq+O7qbxjmxvGFs1ebAH8FfLC/iGbXqCpskgOqatexuz6eZGgb8oduG+BWfQexHvcfu34RbanQSAGDSwK76r9PYd29A4Na3qIV8QbgP4Dr0T7T319VX+83pJl2ImuSqu2A87vrW9D23d2wt8jGVNX9ustBxLMeQ6++eDptj+JJSR634AT+4EpuzpDdgJO7ExWDOrnbGf1/nJXjtq2Abyb5GmufiJ766jOTwEUkWbbq0cCWBY73NLyElgg+oqdYNhZXT3KjUbn9JDfEPmcb6ntD2LOynKraGxZfFpRkqMtDP0YrVPQZBtgmQCunql4PvL7bG/ZI4OAkV6VVhn1/VX2nz/hmzSipSvIO4KNVdUR3+z7Ag3oMbS1Jlp2dqqqTViuWCQy9+uIfq+rFXVXYw5IcAryyqv7E8CpuzpKhbzf6ISy+WiLJ369+OOv10r7e2OWgi0hyHnA2rUHnV1l
wxqgG1kew63P0aFoT2e/RNmYPpUHvzElyb1rD+PGeLU+tqk/3GtgMSXIO8Lql7q+qJe9bbYstCxrqUqG+i7+oX91n/cHALapqk77jmUXjhd6WG+tLkmOXubuqao9VC2Y9hl59cfxzvKuX8FbaLPBewEeG+Bk/S7pei+Pt0364zMNXTbeU/mFVdeKC8ZfRZq4H8e+eJOvbBzjJY/4czgQu7rrAPWl7wR4NfJJ25vX0XqMak+TGtPgeBfwc+G9aUj+IksyzrKqO7Hq23LQb+natUs+WjchyjcMHIckdgDsCWyd53thdm9PiH6JPJLnvaBZDG7+up9l9aLOBdwc+R49njjcCP0ryz6zZ870X8KMe41nLjH2Hb86wl9Jf9v1TVb8CHpXk8cAXaW0YdDkkeQBwAHB9Wm/I7WlLMP+6z7jGPAz4UJK9qur/uiq2b6W1XLlrr5Gt7dgkHwY+Np5AJ7kyrSjl44FjgfdMKwBnAtcjrdn1o2h7M142lBm2bsPzF4AnVdWZ3dhZQ9o0Psv66tmysRjqTNq4rhDQXYGnAW8bu+tC4ONDXG6X5ELa0uSLgT8ysEIMWjlJRici7wt8DfgA7WDht8s+UcvqCsTsD9yZlrAcB7x8QIVh9qiqz44VrlrLUGbZZkGSp1fVWxYZvxHwj1X1tB7CmnlJvgHsAXymqm6d5G7AY6rqST2HdpkktwA+CjyDto8e4NFDOqGfZFPgibQTUTek9bK8KnAFWj/Yt0x7H7hJ4BK65G9P2pfwDsDhwMFVdW6fcY0keRDtzPDuwJG0A4R3zshm8kHrs2fLxiIz1Dg8yfZD37+o+ZPks7T9fx+uqvP7jmdjk+TqQ0yok7ysqvafhR5nSW4AvJE1fQK/AOxbVef0F5WmLckJVbVrlwzeuqr+lOQbVXXLvmODtSoB70xbqvwZ4Jl0PQ2HcsJnXFfZfyvgd92s9eq8r0ngupIcCtwMOILWc++0nkNaUpKrAw+kJat7AIfSNr0f1WtgM6zPni0bi8xQ4/BuD85iZaSHtPfmplX17aWKRgysWIQ0WN0qj3cCm1XVdkluSdvz/fSeQ5s5SY6mnah4bzf0GGCvqrpnf1Fp2pJ8hlZM6VW0xOVnwG2r6o59xjUy1l4F1iwJHlUGHlSblb6ZBC6iW2o5OkM4/hc06KVXSa5FWwv9iKq6e9/xzKo+e7Zo9SW5zdjNTYGHApdU1T/2FNI6khxUVfssKBpx2WfTkBJWaciSfBX4O+Dw0WqFJKdV1c36jWxt3Wqkh7LutoSX9xXTQosVqrJ41cYryV8C16GtkvodbdniXrQ9gZ9cWIhFw2dhmEVU1RX6juHy6JYMHdT96PLrrWeLVt8iX1xf6v7th+SdSa47KhrRFTd4KPB9LBIibZCqOrvVirjMENutfAz4Na2/4WD2MS3wiySPoVVSh7Yi6Rc9xrOoJHcHvlxVv+s7lhn3euCFY8uo/wQckuTmwL+xdu9dzQCTQGldL+07AK2esf0D0M5s3ga4Zk/hLOVtwD0AktyZtgznWcCtaCd9/q63yKTZcna3JLS6fTj7sqa59JDcoKqG3o/tibQ9gf9JW5nwZWDvXiNa3OOAtyb5JW3f4nHAF91ru8GuU1WnLhysqlOT7NBDPPozmQRKCwytD6Sm7kTW7Be4hNZrczBVzjqbjO2xfARwUFV9GPhwkpP7C0uaOU8DDgS2Ac6lVeEb4n7ALye5+WIH3UPRFdQa/AqZqno8QJLr006YvZnW3sBj4A2zxTL32XJjBvkLIHW68vuLbZId9F5Q/XlmpKLuJkmuWFWX0HrF7TN2n5/j0uRuUlV7jQ8k2R34Uk/xrCXJqbTvoSsCe3eNry9mzffQLfqMDyDJG1n8uxKAoVXS7pas/g1wc1pf5TfRZgS1YU5I8pSqesf4YJIn006mDsKC1T3rmJWidavBwjCS5lq3JOzvaX3DoDXjfntV/bG3oBZI8mJav7ifA9sBu1RVdRv1D6mq3Zd9AUnA4j1
Mh9TXNMn2y90/hHY23Z7kkZfR+i5epqoOWd2Ilpfk58B3acvqj62q7/cb0WxKch1a770/sCbp2xW4MvDgqvpJX7GNG6sOGtr35fnd9S2AH87Iid9VYRIoaa4leSdwJWB04PJY4NKqenJ/Ua0ryW7A9YCjRhvzk9yYVureFhHSMpLcAbgj8BzaHraRzWkHsEPpcbYpbcnqXwKnAu/qVgAM0qz0hE3y17QTfXcCdgLOqKrH9hvVbOqaw4+q6Z5eVZ/tM56lJHkHrWXaEd3t+wAPqqqn9hvZcLiMSNK8u+2CA8DPdk1wB6WqvrLI2P/rIxZpBl0Z2Ix23HONsfELGFZhpUOAP9KWK96H1vB6314jWt7gZxKSbE6bEdqe1nLjmnSNw7XhqupY4Nj1PrB/u1XVU0Y3qupTSV7TZ0BDYxIoad5dmmTHqvouQJIbMcyS8ZIup67g1+eTvGcISyqXsXNV3RwgybuAobWrmUVfHPt5U1Wd03M8Wh0/SvLPwPu623sBP+oxnsExCZQ0714AHNsVYAjtbPEQy5xLupySvL6qngO8Kck6s1cD6gN72V7kqrpkQT/DQVhQRO1qSS4Y3cUAi6iNiukkuVpVXdR3PFo1j6LtV/0o7f/rcd2YOu4JlDT3klwFuEl384yqGmpzZkmXQ5LbVNWJSe6y2P1DaQ2U5FJg1Iw7tNL7FzHQBGsWdPtB30XbP71dklsCT62qIbYG0QpLcvWxBvcaYxIoaS4luS1w9qiiWZLHAQ8FfgC81DLSkjT7knyVtu/z8FERmySnVdXNln+mZlmSOwLvxOR/SVfoOwBJ6snbaaWuSXJn4NXAocCvgYN6jEvSlCTZPcnRSf5fkrOSfK9bCq6NWFWdvWDIfd8bv/8E7gX8AqCqvsGaVlDCPYGS5tcmY7N9jwAOqqoPAx9OcnJ/YUmaoncBz6X1OTMRmA9nd7NC1fWF3Rf4Vs8xaRVU1dkL9tX6Oz/GJFDSvNokyRW7Hlx3B/YZu8/PRmnj9Ouq+lTfQWhVPQ04ENgGOBc4CnhGrxFpNZj8r4d7AiXNpSQvBu4L/JzWQ2qXqqokfwkcUlW79xqgpBWX5NXAJsBHgMsKQFXVSb0FJWnFJdmKlvzfg1ZY6Sjg2e73X8MkUNLcSrIbcD3gqFH1sCQ3pm0k96BQ2sgkWazJdVXVHqsejKYqyb8sc3dV1StWLRituiS7V9WX1jc2z0wCJUmStFFJ8vxFhq8OPAn4i6rabJVD0ipKclJV7bK+sXnmvhdJkrRRS/K8BUNFWwr+xar6Xg8hacqq6oDR9STXoO0J2xv4AHDAUs/TbOv6Qt4R2HrB7/3mtKXg6tgiQpIkbeyuseBnc2BX4FNJHtlnYJqeJFsmeSVwCm3iY5eq+qeq+lnPoWl6rgxsRvv3Hv+dv4DWL1Idl4NKkqS5lGRL4DMuEdv4JPkP4CG0vq9vrqrf9BySVlGS7avqB33HMWQmgZIkaW4l+XpV3brvOLSykvyJVgH2Etry38vuohWG2byXwDRVSV5fVc9J8nHW/ncHoKoe0ENYg+SeQEmSNJeS3A04v+84tPKqyi1P8+m93eVre41iBjgTKEmSNmpJTmXdWYEtgR8Bj6uqb69+VJLUH5NASZK0UUuy/YKhAn4x6g8qaeOSZHfgpcD2tJWPo2XAN+ozriExCZQkSZK00UjybeC5wInApaPxqvpFb0ENjHsCJUmSJG1Mfl1Vn+o7iCFzJlCSJEnSRiPJq2nN4T9CqxILQFWd1FtQA2MSKEmSJGmjkeTYRYarqvZY9WAGyiRQkiRJkuaIewIlSZIkzbwkz1swVMDPgS9W1fd6CGmwbKQpSZIkaWNwjQU/mwO7Ap9K8sg+Axsal4NKkiRJ2mgl2RL4TFXt0ncsQ+FMoCRJkqSNVlX9ktYwXh2TQEmSJEkbrSR3A87vO44hsTCMJEmSpJmX5FRaMZhxWwI/Ah63+hENl3sCJUmSJM28JNsvGCrgF1X12z7iGTKTQEm
SJEmaI+4JlCRJkqQ5YhIoSZIkSXPEJFCSNNeSXDfJB5J8N8mJSY5IcuMVfP27JrnjMvc/IMl+K/V+kiStj3sCJUlzK0mALwOHVNXburFbAptX1RdW6D1eCvymql67yH1XrKpLVuJ9JEmalEmgJGluJdkDeGlV3XnBeIDXAPehVZd7ZVX9d5K7Av9QVffrHvcm4ISqek+S7wOHAPcHrgQ8DPg98BXgUuA84FnAk7rxWwNfAk4Bdq2qZybZGngbsF0XynOq6ktJ7gIc2I0VcOequnCF/zokSXPCPoGSpHl2M+DERcYfAtwKuCWwFXB8kuMmeL2fV9UuSZ5OSxafnORtjM0EJnkScAPgjlV1aZInjD3/QOA/q+qLSbYDPg38FfAPwDO6hHAzWhIpSdLlYhIoSdK67gS8v6ouBX6a5PPAbYEL1vO8j3SXJ9ISyaV8qHvthe4B7NwmIgHYvEv6vgS8LslhwEeq6pwJ/xySJK3DwjCSpHl2OnCbDXj8Jaz93bnpgvsv7i4vZfkTrUs1Lr4CsFtV3ar72aaqflNVrwaeDFwV+FKSm25AzJIkrcUkUJI0zz4LXCXJPqOBJLcAfgU8Iskm3T69OwNfA35Am6m7SpItgLtP8B4XAteYMJ6jaPsGR7HcqrvcsapOrap/B44HTAIlSZebSaAkaW5Vq472YOAeXYuI04FXAf9FK9jyDVqi+I9V9ZOqOhv4IHBad/n1Cd7m48CDk5yc5G/W89hnA7smOSXJN4GndePPSXJaklOAPwKf2rA/qSRJa1gdVJIkSZLmiDOBkiRJkjRHTAIlSZIkaY6YBEqSJEnSHDEJlCRJkqQ5YhIoSZIkSXPEJFCSJEmS5ohJoCRJkiTNEZNASZIkSZoj/x+6OMRbQdGykAAAAABJRU5ErkJggg==\n", 1123 | "text/plain": [ 1124 | "
" 1125 | ] 1126 | }, 1127 | "metadata": { 1128 | "needs_background": "light" 1129 | }, 1130 | "output_type": "display_data" 1131 | } 1132 | ], 1133 | "source": [ 1134 | "plt.figure(figsize=(15,5))\n", 1135 | "ax = df[df.booking_complete ==1].booking_origin.value_counts()[:20].plot(kind=\"bar\")\n", 1136 | "ax.set_xlabel(\"Countries\")\n", 1137 | "ax.set_ylabel(\"Number of complete bookings\")" 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "markdown", 1142 | "id": "8ae2cf98-2b59-4d0c-afa6-b92269949ff4", 1143 | "metadata": {}, 1144 | "source": [ 1145 | "Above chart shows travellers from which country had their booking complete. " 1146 | ] 1147 | }, 1148 | { 1149 | "cell_type": "markdown", 1150 | "id": "92ebdc99-d53b-48ed-92c5-a875e8bf62d3", 1151 | "metadata": {}, 1152 | "source": [ 1153 | "### Booking complete" 1154 | ] 1155 | }, 1156 | { 1157 | "cell_type": "code", 1158 | "execution_count": 117, 1159 | "id": "a7039c39-6989-4c36-b61d-acc4cc8fcbb4", 1160 | "metadata": {}, 1161 | "outputs": [], 1162 | "source": [ 1163 | "successful_booking_per = df.booking_complete.value_counts().values[0] / len(df) * 100" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "code", 1168 | "execution_count": 118, 1169 | "id": "215d81a8-44dd-462d-81c7-acc3b6786ccf", 1170 | "metadata": {}, 1171 | "outputs": [], 1172 | "source": [ 1173 | "unsuccessful_booking_per = 100-successful_booking_per" 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "code", 1178 | "execution_count": 127, 1179 | "id": "3e64b38d-03b0-419e-9fc1-7bf465c0cee6", 1180 | "metadata": {}, 1181 | "outputs": [ 1182 | { 1183 | "name": "stdout", 1184 | "output_type": "stream", 1185 | "text": [ 1186 | "Out of 50000 booking entries only 14.96 % bookings were successfull or complete.\n" 1187 | ] 1188 | } 1189 | ], 1190 | "source": [ 1191 | "print(f\"Out of 50000 booking entries only {round(unsuccessful_booking_per,2)} % bookings were successfull or complete.\")" 1192 | ] 1193 | }, 1194 | { 1195 | "cell_type": "code", 1196 | 
"execution_count": null, 1197 | "id": "6a1f9ef8-5fc9-4632-a6f7-4fc62d6a1828", 1198 | "metadata": {}, 1199 | "outputs": [], 1200 | "source": [ 1201 | "\n" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "code", 1206 | "execution_count": null, 1207 | "id": "b355a529-18e7-4696-b251-1e1df3e865b2", 1208 | "metadata": {}, 1209 | "outputs": [], 1210 | "source": [] 1211 | } 1212 | ], 1213 | "metadata": { 1214 | "kernelspec": { 1215 | "display_name": "Python 3 (ipykernel)", 1216 | "language": "python", 1217 | "name": "python3" 1218 | }, 1219 | "language_info": { 1220 | "codemirror_mode": { 1221 | "name": "ipython", 1222 | "version": 3 1223 | }, 1224 | "file_extension": ".py", 1225 | "mimetype": "text/x-python", 1226 | "name": "python", 1227 | "nbconvert_exporter": "python", 1228 | "pygments_lexer": "ipython3", 1229 | "version": "3.9.7" 1230 | } 1231 | }, 1232 | "nbformat": 4, 1233 | "nbformat_minor": 5 1234 | } 1235 | --------------------------------------------------------------------------------