├── Flight Price Predication using Machine Learning ├── Flight Price Predication Using ML Techniques.ipynb ├── Flight_Price_Prediction.pdf ├── Flight_Price_Prediction.pkl ├── Flight_Price_dataset_2.xlsx ├── Presentation on Flight Price Prediction.pptx ├── Project Report on Flight Price Predication Using ML Techniques.pdf ├── Read me.md ├── Web Scraping Data of Flight prices.ipynb └── sample-documentation.docx ├── Malignant Commentes Classifier - Multi Label Classification Project using NLP ├── Malignant Comments Classifier - Multi Label Classification Project.ipynb ├── Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx ├── Rating Prediction Project presentation - FlipRobo.pptx ├── Read me.md └── test_dataset_predictions.zip ├── Micro Credit Defaulter Project ├── Project Solution Files Micro Credit Project Defaulter.zip └── README.md ├── Product Review Rating Predication Using NLP ├── Problem-Statement.pdf ├── Product Review Rating Predication Using NLP.ipynb ├── Product Review Rating Predication Using NLP.pdf ├── Rating Prediction Data Web Scraping .ipynb ├── Rating Prediction Project presentation - FlipRobo.pptx ├── Rating_Prediction_dataset.csv ├── Read me.md └── sample-documentation.docx ├── Project Customer Retention in Ecommerce sector ├── Customer Retention Project.zip ├── Customer_retention_case study.docx ├── Project Customer Retention Case Study .ipynb ├── Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf ├── Read me.md └── customer_retention_dataset.xlsx ├── Project Used Car price predication using ML ├── Car Price Predication Project.zip ├── Car Price Predication Using ML Part 1.ipynb ├── Car Price Predication Web Scraping script.ipynb ├── Car Price Prediction Part 2 ML Model .ipynb ├── Car price prediction using ML ppt.pptx ├── Project Report Car Price Predication .pdf └── Read me.md ├── README.md ├── Surprise Housing - Housing Price Predication & Analysis Project ├── Project-Housing--2---1- │ └── Project-Housing_splitted │ │ ├── Data Description.txt │ │ ├── HOUSING Use Case 2.pdf │ │ ├── sample documentation.docx │ │ ├── test.csv │ │ └── train.csv ├── README.md ├── Surprise Housing - Housing Price Predication & Analysis Project.ipynb ├── Surprise Housing - Housing Price Predication & Analysis Project.pdf └── Surprise Housing Price Predication .pptx ├── Web Scraping 1 Assignment ├── Read me.md └── Web Scraping Assignment1 BeautifulSoup.ipynb ├── Web Scraping Selenium Assignment 3 ├── Fruits_Cars_ML_google_images.zip ├── Selenium Exception Handling Assignment ├── WEB SCRAPING ASSIGNMENT 3 Selenium Exception Handling.ipynb └── WEB-SCRAPING-ASSIGNMENT-3.pdf ├── WebScraping Assignment 4 Selenium ├── Web Scraping Assignment 4 └── Web Scraping Assignment 4 Selenium Exception .ipynb ├── Webscraping Assignment 2 Selenium ├── Web Scraping Assignment 2 Selenium.ipynb └── Webscraping 2.md └── Worksheet_set_1 ├── Machine Learning Worksheet 1.pdf ├── Python Worksheet 1.ipynb ├── Python Worksheet 1.pdf ├── Statistics Worksheet 1.pdf └── Worksheet_set_1.md /Flight Price Predication using Machine Learning/Flight_Price_Prediction.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pdf -------------------------------------------------------------------------------- /Flight Price Predication using Machine 
Learning/Flight_Price_Prediction.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pkl -------------------------------------------------------------------------------- /Flight Price Predication using Machine Learning/Flight_Price_dataset_2.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_dataset_2.xlsx -------------------------------------------------------------------------------- /Flight Price Predication using Machine Learning/Presentation on Flight Price Prediction.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Presentation on Flight Price Prediction.pptx -------------------------------------------------------------------------------- /Flight Price Predication using Machine Learning/Project Report on Flight Price Predication Using ML Techniques.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Project Report on Flight Price Predication Using ML Techniques.pdf -------------------------------------------------------------------------------- /Flight Price Predication using Machine Learning/Read me.md: -------------------------------------------------------------------------------- 1 | FLIGHT PRICE PREDICTION PROJECT 2 | 3 | Anyone who has booked a flight ticket knows how unexpectedly the prices vary. The cheapest 4 | available ticket on a given flight gets more or less expensive over time. This usually happens as 5 | an attempt to maximize revenue based on - 6 | 1. Time of purchase patterns (making sure last-minute purchases are expensive) 7 | 2. Keeping the flight as full as the airline wants it (raising prices on a flight which is filling up in order 8 | to reduce sales and hold back inventory for those expensive last-minute 9 | purchases) 10 | So, you have to work on a project where you collect data on flight fares along with other features and 11 | build a model to predict flight fares. 12 | 13 | STEPS 14 | 15 | 1. Data Collection : 16 | You have to scrape at least 1500 rows of data. You can scrape more data as well; it’s up to you, and 17 | the more data, the better the model. 18 | In this section you have to scrape the data of flights from different websites (yatra.com, 19 | skyscanner.com, official websites of airlines, etc.). The number of columns for the data doesn’t have a 20 | limit; it’s up to you and your creativity. Generally, these columns are airline name, date of journey, 21 | source, destination, route, departure time, arrival time, duration, total stops and the target variable 22 | price. You can make changes to it, you can add or you can remove some columns; it completely 23 | depends on the website from which you are fetching the data. 24 | 25 | 2. Data Analysis : 26 | After cleaning the data, you have to do some analysis on it. 27 | Do airfares change frequently?
Do they move in small increments or in large jumps? Do they tend 28 | to go up or down over time? 29 | What is the best time to buy so that the consumer can save the most by taking the least risk? 30 | Does the price increase as we get nearer to the departure date? Is Indigo cheaper than Jet Airways? Are 31 | morning flights more expensive? 32 | 33 | 3. Model Building : 34 | After collecting the data, you need to build a machine learning model. Before model building, do 35 | all the data pre-processing steps. Try different models with different hyperparameters and select 36 | the best model. 37 | 38 | Follow the complete life cycle of data science. Include all the steps, such as 39 | 40 | 1. Data Cleaning 41 | 2. Exploratory Data Analysis 42 | 3. Data Pre-processing 43 | 4. Model Building 44 | 5. Model Evaluation 45 | 6. Selecting the best model 46 |
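For reference, a minimal end-to-end sketch of these steps (the file name `flight_fares.xlsx` and the exact column set are assumptions based on the Data Collection step above, not part of the project files):

```python
# Hedged sketch: assumes a scraped dataset 'flight_fares.xlsx' whose columns
# include the features suggested above plus the target column 'price'.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_excel("flight_fares.xlsx").dropna()       # 1. data cleaning
X = pd.get_dummies(df.drop(columns=["price"]))         # 3. pre-processing: encode categoricals
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)  # 4. model building
model.fit(X_train, y_train)
print("R2 on held-out data:", r2_score(y_test, model.predict(X_test)))  # 5. evaluation
```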
-------------------------------------------------------------------------------- /Flight Price Predication using Machine Learning/sample-documentation.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/sample-documentation.docx -------------------------------------------------------------------------------- /Malignant Commentes Classifier - Multi Label Classification Project using NLP/Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx -------------------------------------------------------------------------------- /Malignant Commentes Classifier - Multi Label Classification Project using NLP/Rating Prediction Project presentation - FlipRobo.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Rating Prediction Project presentation - FlipRobo.pptx -------------------------------------------------------------------------------- /Malignant Commentes Classifier - Multi Label Classification Project using NLP/Read me.md: -------------------------------------------------------------------------------- 1 | Problem Statement: 2 | 3 | The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection. Online hate, described as abusive language, aggression, cyberbullying, hatefulness and many others, has been identified as a major threat on online social media platforms. Social media platforms are the most prominent grounds for such toxic behaviour. 4 | There has been a remarkable increase in the cases of cyberbullying and trolling on various social media platforms. Many celebrities and influencers face backlash from people and come across hateful and offensive comments. This can take a toll on anyone and affect them mentally, leading to depression, mental illness, self-hatred and suicidal thoughts. 5 | Internet comments are bastions of hatred and vitriol. While online anonymity has provided a new outlet for aggression and hate speech, machine learning can be used to fight it. The problem we sought to solve was the tagging of internet comments that are aggressive towards other users. This means that insults to third parties such as celebrities will be tagged as inoffensive, but “u are an idiot” is clearly offensive. 6 | 7 | Our goal is to build a prototype of an online hate and abuse comment classifier which can be used to classify hateful and offensive comments so that they can be controlled and restricted from spreading hatred and cyberbullying. 8 | 9 | Data Set Description: 10 | 11 | The data set contains the training set, which has approximately 1,59,000 samples, and the test set, which contains nearly 1,53,000 samples. All the data samples contain 8 fields, which include ‘Id’, ‘Comments’, ‘Malignant’, ‘Highly malignant’, ‘Rude’, ‘Threat’, ‘Abuse’ and ‘Loathe’. Each label can be either 0 or 1, where 0 denotes a NO while 1 denotes a YES. There are various comments which have multiple labels. The first attribute is a unique ID associated with each comment. 12 | 13 | The data set includes: 14 | 15 | 16 | Malignant: It is the label column, which includes values 0 and 1, denoting whether the comment is malignant or not. 17 | 18 | Highly Malignant: It denotes comments that are highly malignant and hurtful. 19 | 20 | Rude: It denotes comments that are very rude and offensive. 21 | 22 | Threat: It indicates comments that threaten someone. Abuse: It is for comments that are abusive in nature. 23 | 24 | Loathe: It describes comments which are hateful and loathing in nature. 25 | 26 | ID: It includes the unique ID associated with each comment text. 27 | 28 | Comment text: This column contains the comments extracted from various social media platforms.
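A minimal sketch of how such a multi-label classifier can be framed (the file name `train.csv` and the exact column spellings are assumptions based on the description above):

```python
# Hedged sketch: one binary classifier per label over shared TF-IDF features.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

labels = ["malignant", "highly_malignant", "rude", "threat", "abuse", "loathe"]
df = pd.read_csv("train.csv")                      # hypothetical file name
X = TfidfVectorizer(max_features=50000, stop_words="english").fit_transform(df["comment_text"])
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, df[labels])            # 2-D 0/1 label matrix -> multi-label fit
print(clf.predict(X[:5]))         # per-comment 0/1 vector across the six labels
```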
29 | -------------------------------------------------------------------------------- /Malignant Commentes Classifier - Multi Label Classification Project using NLP/test_dataset_predictions.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/test_dataset_predictions.zip -------------------------------------------------------------------------------- /Micro Credit Defaulter Project/Project Solution Files Micro Credit Project Defaulter.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Micro Credit Defaulter Project/Project Solution Files Micro Credit Project Defaulter.zip -------------------------------------------------------------------------------- /Micro Credit Defaulter Project/README.md: -------------------------------------------------------------------------------- 1 | Micro Credit Defaulter project submission at Flip Robo 2 | -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/Problem-Statement.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Problem-Statement.pdf -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/Product Review Rating Predication Using NLP.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Product Review Rating Predication Using NLP.pdf -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/Rating Prediction Data Web Scraping .ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "> ### **Web Scraping for Rating Review Prediction Project**\n", 8 | "***\n", 9 | "**By: Lokesh Baviskar**\n", 10 | "\n", 11 | "**Batch : Internship 20**" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "#### **Web Scraping details**:\n", 19 | "- **Data for this project is scraped from Amazon & Flipkart.**\n", 20 | "- **Around 50,000 reviews were scraped for this project.**\n", 21 | "- **Part 1 - Scraping data from Amazon.in**\n", 22 | "- **Part 2 - Scraping data from Flipkart**" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "#Importing required libraries\n", 32 | "import pandas as pd\n", 33 | "import selenium\n", 34 | "from selenium import webdriver\n", 35 | "from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException\n", 36 | "import time\n", 37 | "import warnings\n", 38 | "warnings.filterwarnings('ignore')" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "### **Part 1 - Scraping data from Amazon**" 46 | ] 47 | }, 48 | { 49 |
"cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "#### **1.1.Headphones**" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "#Connect to web driver\n", 62 | "driver =webdriver.Chrome(r\"C:\\chromedriver.exe\")" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 3, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "# Opening the Amazon.in\n", 72 | "driver.get('https://www.amazon.in/')" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 4, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "# Searching headphones in the search bar and clicking the search button\n", 82 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n", 83 | "search_bar.send_keys(\"headphones\")\n", 84 | "driver.find_element_by_id('nav-search-submit-button').click()" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 5, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "# Creating empty lists\n", 94 | "Product_URL=[]\n", 95 | "Ratings=[]\n", 96 | "Review=[]\n", 97 | "\n", 98 | "#Getting URLs of the product\n", 99 | "for i in range(1,4):\n", 100 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n", 101 | " for i in URL:\n", 102 | " Product_URL.append(i.get_attribute('href'))\n", 103 | " \n", 104 | " try:\n", 105 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n", 106 | " except NoSuchElementException:\n", 107 | " pass\n", 108 | " \n", 109 | "for i in Product_URL:\n", 110 | " driver.get(i)\n", 111 | " \n", 112 | " # Clicking the rating\n", 113 | " try:\n", 114 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n", 115 | " except NoSuchElementException:\n", 116 | " print(\"No rating\")\n", 117 | " pass\n", 118 | " \n", 119 | " #Clicking to see all reviews\n", 120 | " try:\n", 121 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n", 122 | " except NoSuchElementException:\n", 123 | " pass\n", 124 | " \n", 125 | " #Scrapping the details\n", 126 | " Start_page=1\n", 127 | " End_page=50\n", 128 | " for page in range(Start_page,End_page+1):\n", 129 | " try:\n", 130 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n", 131 | " for r in Reviews:\n", 132 | " Review.append(r.text.replace('\\n',''))\n", 133 | " except NoSuchElementException:\n", 134 | " Review.append(\"Not Available\")\n", 135 | " try:\n", 136 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n", 137 | " for i in Rating:\n", 138 | " rating=i.get_attribute('title')\n", 139 | " Ratings.append(rating[:3])\n", 140 | " except NoSuchElementException:\n", 141 | " Ratings.append(\"Not available\") \n", 142 | " \n", 143 | " #Looping for going to next page automatically\n", 144 | " try:\n", 145 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n", 146 | " if next_page.text=='Next Page':\n", 147 | " next_page.click()\n", 148 | " time.sleep(2)\n", 149 | " except NoSuchElementException:\n", 150 | " pass" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 6, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "9000 9000\n" 163 | ] 164 | } 165 | ], 166 | "source": [ 
167 | "#Checking the length of data extracted\n", 168 | "print(len(Review),len(Ratings))" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 7, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "#Saving in dataframe\n", 178 | "headphones=pd.DataFrame({'Product_Review':Review[:9000],'Ratings':Ratings[:9000]})" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "#### **1.2.Laptops** " 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 24, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "# Getting the website to driver\n", 195 | "driver.get('https://www.amazon.in/')" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 25, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "# Searching laptops in the search bar and clicking the search button\n", 205 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n", 206 | "search_bar.send_keys(\"laptops\")\n", 207 | "driver.find_element_by_id('nav-search-submit-button').click()" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 26, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "#Creating empty lists\n", 217 | "Product_URL=[]\n", 218 | "Ratings=[]\n", 219 | "Review=[]\n", 220 | "\n", 221 | "#Getting URLs of the product\n", 222 | "for i in range(1,4):\n", 223 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n", 224 | " for i in URL:\n", 225 | " Product_URL.append(i.get_attribute('href'))\n", 226 | " try:\n", 227 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n", 228 | " except NoSuchElementException:\n", 229 | " pass\n", 230 | " \n", 231 | "for i in Product_URL:\n", 232 | " driver.get(i)\n", 233 | " \n", 234 | " #Clicking the rating\n", 235 | " try:\n", 236 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n", 237 | " except NoSuchElementException:\n", 238 | " print(\"No rating\")\n", 239 | " pass\n", 240 | " \n", 241 | " #Clicking to see all reviews\n", 242 | " try:\n", 243 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n", 244 | " except NoSuchElementException:\n", 245 | " pass\n", 246 | " \n", 247 | " #Scrapping the details\n", 248 | " Start_page=1\n", 249 | " End_page=80\n", 250 | " for page in range(Start_page,End_page+1):\n", 251 | " try:\n", 252 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n", 253 | " for r in Reviews:\n", 254 | " Review.append(r.text.replace('\\n',''))\n", 255 | " except NoSuchElementException:\n", 256 | " Review.append(\"Not Available\")\n", 257 | " try:\n", 258 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n", 259 | " for i in Rating:\n", 260 | " rating=i.get_attribute('title')\n", 261 | " Ratings.append(rating[:3])\n", 262 | " except NoSuchElementException:\n", 263 | " Ratings.append(\"Not available\") \n", 264 | " \n", 265 | " # Looping for going to next page automatically\n", 266 | " try:\n", 267 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n", 268 | " if next_page.text=='Next Page':\n", 269 | " next_page.click()\n", 270 | " time.sleep(2)\n", 271 | " except NoSuchElementException:\n", 272 | " pass" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | 
"execution_count": 27, 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "name": "stdout", 282 | "output_type": "stream", 283 | "text": [ 284 | "11000 11000\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "# Checking the length of data extracted\n", 290 | "print(len(Review),len(Ratings))" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 28, 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "#Saving in dataframe\n", 300 | "laptops=pd.DataFrame({'Product_Review':Review[:11000],'Ratings':Ratings[:11000]})" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "#### **1.3.Camera**" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 33, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "# Getting the website to driver\n", 317 | "driver.get('https://www.amazon.in/')" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 34, 323 | "metadata": {}, 324 | "outputs": [], 325 | "source": [ 326 | "#Searching dslr in the search bar and clicking the search button\n", 327 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n", 328 | "search_bar.send_keys(\"dslr\")\n", 329 | "driver.find_element_by_id('nav-search-submit-button').click()" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 35, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "#Creating empty lists\n", 339 | "Product_URL=[]\n", 340 | "Ratings=[]\n", 341 | "Review=[]\n", 342 | "\n", 343 | "#Getting URLs of the product\n", 344 | "for i in range(1,4):\n", 345 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n", 346 | " for i in URL:\n", 347 | " Product_URL.append(i.get_attribute('href'))\n", 348 | " try:\n", 349 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n", 350 | " except NoSuchElementException:\n", 351 | " pass\n", 352 | " \n", 353 | "for i in Product_URL:\n", 354 | " driver.get(i)\n", 355 | " \n", 356 | " #Clicking the rating\n", 357 | " try:\n", 358 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n", 359 | " except NoSuchElementException:\n", 360 | " print(\"No rating\")\n", 361 | " pass\n", 362 | " \n", 363 | " #Clicking to see all reviews\n", 364 | " try:\n", 365 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n", 366 | " except NoSuchElementException:\n", 367 | " pass\n", 368 | " \n", 369 | " #Scrapping the details\n", 370 | " Start_page=1\n", 371 | " End_page=100\n", 372 | " for page in range(Start_page,End_page+1):\n", 373 | " try:\n", 374 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n", 375 | " for r in Reviews:\n", 376 | " Review.append(r.text.replace('\\n',''))\n", 377 | " except NoSuchElementException:\n", 378 | " Review.append(\"Not Available\")\n", 379 | " try:\n", 380 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n", 381 | " for i in Rating:\n", 382 | " rating=i.get_attribute('title')\n", 383 | " Ratings.append(rating[:3])\n", 384 | " except NoSuchElementException:\n", 385 | " Ratings.append(\"Not available\") \n", 386 | " \n", 387 | " #Looping for going to next page automatically\n", 388 | " try:\n", 389 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n", 390 | " if 
next_page.text=='Next Page':\n", 391 | " next_page.click()\n", 392 | " time.sleep(2)\n", 393 | " except NoSuchElementException:\n", 394 | " pass" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 36, 400 | "metadata": {}, 401 | "outputs": [ 402 | { 403 | "name": "stdout", 404 | "output_type": "stream", 405 | "text": [ 406 | "10000 10000\n" 407 | ] 408 | } 409 | ], 410 | "source": [ 411 | "#Checking the length of data extracted\n", 412 | "print(len(Review),len(Ratings))" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 37, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "#Saving in dataframe\n", 422 | "camera=pd.DataFrame({'Product_Review':Review[:10000],'Ratings':Ratings[:10000]})" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "#### **1.4.Smartphones**" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 42, 435 | "metadata": {}, 436 | "outputs": [], 437 | "source": [ 438 | "#Getting the website to driver\n", 439 | "driver.get('https://www.amazon.in/')" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 43, 445 | "metadata": {}, 446 | "outputs": [], 447 | "source": [ 448 | "#Searching phones in the search bar and clicking the search button\n", 449 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n", 450 | "search_bar.send_keys(\"phones\")\n", 451 | "driver.find_element_by_id('nav-search-submit-button').click()" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 44, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "#Creating empty lists\n", 461 | "Product_URL=[]\n", 462 | "Ratings=[]\n", 463 | "Review=[]\n", 464 | "\n", 465 | "#Getting URLs of the product\n", 466 | "for i in range(1,4):\n", 467 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n", 468 | " for i in URL:\n", 469 | " Product_URL.append(i.get_attribute('href'))\n", 470 | " try:\n", 471 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n", 472 | " except NoSuchElementException:\n", 473 | " pass\n", 474 | " \n", 475 | "for i in Product_URL:\n", 476 | " driver.get(i)\n", 477 | " \n", 478 | " #Clicking the rating\n", 479 | " try:\n", 480 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n", 481 | " except NoSuchElementException:\n", 482 | " print(\"No rating\")\n", 483 | " pass\n", 484 | " \n", 485 | " #Clicking to see all reviews\n", 486 | " try:\n", 487 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n", 488 | " except NoSuchElementException:\n", 489 | " pass\n", 490 | " \n", 491 | " #Scrapping the details\n", 492 | " Start_page=1\n", 493 | " End_page=60\n", 494 | " for page in range(Start_page,End_page+1):\n", 495 | " try:\n", 496 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n", 497 | " for r in Reviews:\n", 498 | " Review.append(r.text.replace('\\n',''))\n", 499 | " except NoSuchElementException:\n", 500 | " Review.append(\"Not Available\")\n", 501 | " try:\n", 502 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n", 503 | " for i in Rating:\n", 504 | " rating=i.get_attribute('title')\n", 505 | " Ratings.append(rating[:3])\n", 506 | " except NoSuchElementException:\n", 507 | " Ratings.append(\"Not available\") \n", 508 | " 
\n", 509 | " #Looping for going to next page automatically\n", 510 | " try:\n", 511 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n", 512 | " if next_page.text=='Next Page':\n", 513 | " next_page.click()\n", 514 | " time.sleep(2)\n", 515 | " except NoSuchElementException:\n", 516 | " pass" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 45, 522 | "metadata": {}, 523 | "outputs": [ 524 | { 525 | "name": "stdout", 526 | "output_type": "stream", 527 | "text": [ 528 | "10000 10000\n" 529 | ] 530 | } 531 | ], 532 | "source": [ 533 | "#Checking the length of data extracted\n", 534 | "print(len(Review),len(Ratings))" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 46, 540 | "metadata": {}, 541 | "outputs": [], 542 | "source": [ 543 | "#Saving in dataframe\n", 544 | "phones=pd.DataFrame({'Product_Review':Review[:10000],'Ratings':Ratings[:10000]})" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 49, 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [ 553 | "# Closing the driver\n", 554 | "driver.close()" 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "metadata": {}, 560 | "source": [ 561 | "### **Part 2 - Scrapping data from Flipkart**" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "#### **2.1. I-Phone**" 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "execution_count": 50, 574 | "metadata": {}, 575 | "outputs": [], 576 | "source": [ 577 | "# Connect to web driver\n", 578 | "driver =webdriver.Chrome(r\"C:\\chromedriver.exe \")" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": 51, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [ 587 | "#Getting the website to driver\n", 588 | "driver.get('https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&lid=LSTMOBFWQ6BXGJCEYNYZXSHRJ&marketplace=FLIPKART')" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 52, 594 | "metadata": {}, 595 | "outputs": [], 596 | "source": [ 597 | "#Taking the empty lists\n", 598 | "Ratings=[]\n", 599 | "Review=[]\n", 600 | "\n", 601 | "#As there are nearly 10 reviews per page, we will check for 400+ pages and scrap the required data\n", 602 | "#Now we will take a for loop and scrap\n", 603 | "for i in range(0,410):\n", 604 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n", 605 | " Ratings.append(j.text)\n", 606 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n", 607 | " Review.append(j.text)\n", 608 | " \n", 609 | " #Path for next page as it changes for every page. 
We are appending numbers as pages change \n", 610 | " k=i+1\n", 611 | " next_page=\"https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&lid=LSTMOBFWQ6BXGJCEYNYZXSHRJ&marketplace=FLIPKART&page=\"+str(k) \n", 612 | " driver.get(next_page)" 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 53, 618 | "metadata": {}, 619 | "outputs": [ 620 | { 621 | "name": "stdout", 622 | "output_type": "stream", 623 | "text": [ 624 | "3734 3988\n" 625 | ] 626 | } 627 | ], 628 | "source": [ 629 | "#Checking the length of the data scraped\n", 630 | "print(len(Ratings),len(Review))" 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": 55, 636 | "metadata": {}, 637 | "outputs": [], 638 | "source": [ 639 | "#Saving in dataframe\n", 640 | "f_phones=pd.DataFrame({'Product_Review':Review[:3700],'Ratings':Ratings[:3700]})" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": 56, 646 | "metadata": {}, 647 | "outputs": [ 648 | { 649 | "data": { 650 | "text/html": [ 651 | "
\n", 652 | "\n", 665 | "\n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | "
Product_ReviewRatings
0The Best Phone for the Money\\n\\nThe iPhone 11 ...5
1Really satisfied with the Product I received.....5
2Great iPhone very snappy experience as apple k...5
3Amazing phone with great cameras and better ba...5
4Previously I was using one plus 3t it was a gr...5
\n", 701 | "
" 702 | ], 703 | "text/plain": [ 704 | " Product_Review Ratings\n", 705 | "0 The Best Phone for the Money\\n\\nThe iPhone 11 ... 5\n", 706 | "1 Really satisfied with the Product I received..... 5\n", 707 | "2 Great iPhone very snappy experience as apple k... 5\n", 708 | "3 Amazing phone with great cameras and better ba... 5\n", 709 | "4 Previously I was using one plus 3t it was a gr... 5" 710 | ] 711 | }, 712 | "execution_count": 56, 713 | "metadata": {}, 714 | "output_type": "execute_result" 715 | } 716 | ], 717 | "source": [ 718 | "#Checking first 5 data of the dataframe\n", 719 | "f_phones.head()" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": 57, 725 | "metadata": {}, 726 | "outputs": [ 727 | { 728 | "data": { 729 | "text/html": [ 730 | "
\n", 731 | "\n", 744 | "\n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | "
Product_ReviewRatings
3695Awesome. This is my first iPhone value for money.5
3696good5
369711 is as close & good as 12, The only downside...5
3698Mind-blowing 😍🥳5
3699Amazing product at best price5
\n", 780 | "
" 781 | ], 782 | "text/plain": [ 783 | " Product_Review Ratings\n", 784 | "3695 Awesome. This is my first iPhone value for money. 5\n", 785 | "3696 good 5\n", 786 | "3697 11 is as close & good as 12, The only downside... 5\n", 787 | "3698 Mind-blowing 😍🥳 5\n", 788 | "3699 Amazing product at best price 5" 789 | ] 790 | }, 791 | "execution_count": 57, 792 | "metadata": {}, 793 | "output_type": "execute_result" 794 | } 795 | ], 796 | "source": [ 797 | "#Checking last 5 data of the dataframe\n", 798 | "f_phones.tail()" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": 58, 804 | "metadata": {}, 805 | "outputs": [], 806 | "source": [ 807 | "#Closing the driver\n", 808 | "driver.close()" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": {}, 814 | "source": [ 815 | "#### **2.2. Poco Mobiles**" 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 59, 821 | "metadata": {}, 822 | "outputs": [], 823 | "source": [ 824 | "# Connect to web driver\n", 825 | "driver =webdriver.Chrome(r\"C:\\Users\\Femina\\Downloads\\chromedriver_win32 (1)\\chromedriver.exe \")" 826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 60, 831 | "metadata": {}, 832 | "outputs": [], 833 | "source": [ 834 | "# Getting the website to driver\n", 835 | "driver.get('https://www.flipkart.com/poco-m3-power-black-64-gb/product-reviews/itmb49cc10841be2?pid=MOBFZTCUTAYPJHHR&lid=LSTMOBFZTCUTAYPJHHR2ZVC1N&marketplace=FLIPKART')" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 61, 841 | "metadata": {}, 842 | "outputs": [], 843 | "source": [ 844 | "#Taking the empty lists\n", 845 | "Ratings=[]\n", 846 | "Review=[]\n", 847 | "\n", 848 | "#As there are nearly 10 reviews per page, we will check for 400+ pages and scrap the required data\n", 849 | "#Now we will take a for loop and scrap\n", 850 | "for i in range(0,650):\n", 851 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n", 852 | " Ratings.append(j.text)\n", 853 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n", 854 | " Review.append(j.text)\n", 855 | " \n", 856 | " #Path for next page as it changes for every page. We are appending numbers as pages change \n", 857 | " k=i+1\n", 858 | " next_page=\"https://www.flipkart.com/poco-m3-power-black-64-gb/product-reviews/itmb49cc10841be2?pid=MOBFZTCUTAYPJHHR&lid=LSTMOBFZTCUTAYPJHHR2ZVC1N&marketplace=FLIPKART&page=\"+str(k) \n", 859 | " driver.get(next_page)" 860 | ] 861 | }, 862 | { 863 | "cell_type": "code", 864 | "execution_count": 62, 865 | "metadata": {}, 866 | "outputs": [ 867 | { 868 | "name": "stdout", 869 | "output_type": "stream", 870 | "text": [ 871 | "6411 5408\n" 872 | ] 873 | } 874 | ], 875 | "source": [ 876 | "#Checking the length of the data scraped\n", 877 | "print(len(Review),len(Ratings))" 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": 63, 883 | "metadata": {}, 884 | "outputs": [], 885 | "source": [ 886 | "#Saving in dataframe\n", 887 | "f_poco=pd.DataFrame({'Product_Review':Review[:5200],'Ratings':Ratings[:5200]})" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 64, 893 | "metadata": {}, 894 | "outputs": [ 895 | { 896 | "data": { 897 | "text/html": [ 898 | "
\n", 899 | "\n", 912 | "\n", 913 | " \n", 914 | " \n", 915 | " \n", 916 | " \n", 917 | " \n", 918 | " \n", 919 | " \n", 920 | " \n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | " \n", 928 | " \n", 929 | " \n", 930 | " \n", 931 | " \n", 932 | " \n", 933 | " \n", 934 | " \n", 935 | " \n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | "
Product_ReviewRatings
0Great Phone at this Price point. Superb cool D...5
1Good mobile poco m3\\nPros:\\nFullhd display,\\ns...4
2Good phone battery🔋 And camera This price poin...5
3U will never get this specs for this price...d...5
4One word review \" Value for Money\"\\nIt has the...5
\n", 948 | "
" 949 | ], 950 | "text/plain": [ 951 | " Product_Review Ratings\n", 952 | "0 Great Phone at this Price point. Superb cool D... 5\n", 953 | "1 Good mobile poco m3\\nPros:\\nFullhd display,\\ns... 4\n", 954 | "2 Good phone battery🔋 And camera This price poin... 5\n", 955 | "3 U will never get this specs for this price...d... 5\n", 956 | "4 One word review \" Value for Money\"\\nIt has the... 5" 957 | ] 958 | }, 959 | "execution_count": 64, 960 | "metadata": {}, 961 | "output_type": "execute_result" 962 | } 963 | ], 964 | "source": [ 965 | "#Checking first 5 data of the dataframe\n", 966 | "f_poco.head()" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": 65, 972 | "metadata": {}, 973 | "outputs": [ 974 | { 975 | "data": { 976 | "text/html": [ 977 | "
\n", 978 | "\n", 991 | "\n", 992 | " \n", 993 | " \n", 994 | " \n", 995 | " \n", 996 | " \n", 997 | " \n", 998 | " \n", 999 | " \n", 1000 | " \n", 1001 | " \n", 1002 | " \n", 1003 | " \n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | " \n", 1022 | " \n", 1023 | " \n", 1024 | " \n", 1025 | " \n", 1026 | "
Product_ReviewRatings
5195Good5
5196Best5
5197Good phone4
5198It's awesome... Thank you Flipkart every year ...3
5199Wow 😍😍😍😍 I am so happy very good tq Flipkart 😍😍😍😍5
\n", 1027 | "
" 1028 | ], 1029 | "text/plain": [ 1030 | " Product_Review Ratings\n", 1031 | "5195 Good 5\n", 1032 | "5196 Best 5\n", 1033 | "5197 Good phone 4\n", 1034 | "5198 It's awesome... Thank you Flipkart every year ... 3\n", 1035 | "5199 Wow 😍😍😍😍 I am so happy very good tq Flipkart 😍😍😍😍 5" 1036 | ] 1037 | }, 1038 | "execution_count": 65, 1039 | "metadata": {}, 1040 | "output_type": "execute_result" 1041 | } 1042 | ], 1043 | "source": [ 1044 | "#Checking last 5 data of the dataframe\n", 1045 | "f_poco.tail()" 1046 | ] 1047 | }, 1048 | { 1049 | "cell_type": "code", 1050 | "execution_count": 66, 1051 | "metadata": {}, 1052 | "outputs": [], 1053 | "source": [ 1054 | "#Closing the driver\n", 1055 | "driver.close()" 1056 | ] 1057 | }, 1058 | { 1059 | "cell_type": "markdown", 1060 | "metadata": {}, 1061 | "source": [ 1062 | "#### **2.3. Routers**" 1063 | ] 1064 | }, 1065 | { 1066 | "cell_type": "code", 1067 | "execution_count": 67, 1068 | "metadata": {}, 1069 | "outputs": [], 1070 | "source": [ 1071 | "# Connect to web driver\n", 1072 | "driver =webdriver.Chrome(r\"C:\\Users\\Femina\\Downloads\\chromedriver_win32 (1)\\chromedriver.exe \")" 1073 | ] 1074 | }, 1075 | { 1076 | "cell_type": "code", 1077 | "execution_count": 68, 1078 | "metadata": {}, 1079 | "outputs": [], 1080 | "source": [ 1081 | "# Getting the website to driver\n", 1082 | "driver.get('https://www.flipkart.com/tp-link-tl-wr841n-300mbps-wireless-n-router/product-reviews/itmf48vgyfyx8m4f?pid=RTRD7HN3JJYF6WN2&lid=LSTRTRD7HN3JJYF6WN20ZITXQ&marketplace=FLIPKART')" 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "code", 1087 | "execution_count": 69, 1088 | "metadata": {}, 1089 | "outputs": [], 1090 | "source": [ 1091 | "#Taking the empty lists\n", 1092 | "Ratings=[]\n", 1093 | "Review=[]\n", 1094 | "\n", 1095 | "#As there are nearly 10 reviews per page, we will check for 400+ pages and scrap the required data\n", 1096 | "#Now we will take a for loop and scrap\n", 1097 | "for i in range(0,150):\n", 1098 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n", 1099 | " Ratings.append(j.text)\n", 1100 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n", 1101 | " Review.append(j.text)\n", 1102 | " \n", 1103 | " #Path for next page as it changes for every page. We are appending numbers as pages change \n", 1104 | " k=i+1\n", 1105 | " next_page=\"https://www.flipkart.com/tp-link-tl-wr841n-300mbps-wireless-n-router/product-reviews/itmf48vgyfyx8m4f?pid=RTRD7HN3JJYF6WN2&lid=LSTRTRD7HN3JJYF6WN20ZITXQ&marketplace=FLIPKART&page=\"+str(k) \n", 1106 | " driver.get(next_page)" 1107 | ] 1108 | }, 1109 | { 1110 | "cell_type": "code", 1111 | "execution_count": 70, 1112 | "metadata": {}, 1113 | "outputs": [ 1114 | { 1115 | "name": "stdout", 1116 | "output_type": "stream", 1117 | "text": [ 1118 | "1500 1457\n" 1119 | ] 1120 | } 1121 | ], 1122 | "source": [ 1123 | "# Checking the length of the data scraped\n", 1124 | "print(len(Review),len(Ratings))" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "code", 1129 | "execution_count": 71, 1130 | "metadata": {}, 1131 | "outputs": [], 1132 | "source": [ 1133 | "# Saving in dataframe\n", 1134 | "router=pd.DataFrame({'Product_Review':Review[:1000],'Ratings':Ratings[:1000]})" 1135 | ] 1136 | }, 1137 | { 1138 | "cell_type": "code", 1139 | "execution_count": 72, 1140 | "metadata": {}, 1141 | "outputs": [ 1142 | { 1143 | "data": { 1144 | "text/html": [ 1145 | "
\n", 1146 | "\n", 1159 | "\n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | "
Product_ReviewRatings
0*********EDIT: Today is 8th March 2015, more t...5
1I Used this Router for 30 days, Really worth f...4
2Okay, so... my review of this router is going ...4
3Excellent service and great buying experience ...5
4I bought this product on 25/05/2012 at 11.30 a...5
\n", 1195 | "
" 1196 | ], 1197 | "text/plain": [ 1198 | " Product_Review Ratings\n", 1199 | "0 *********EDIT: Today is 8th March 2015, more t... 5\n", 1200 | "1 I Used this Router for 30 days, Really worth f... 4\n", 1201 | "2 Okay, so... my review of this router is going ... 4\n", 1202 | "3 Excellent service and great buying experience ... 5\n", 1203 | "4 I bought this product on 25/05/2012 at 11.30 a... 5" 1204 | ] 1205 | }, 1206 | "execution_count": 72, 1207 | "metadata": {}, 1208 | "output_type": "execute_result" 1209 | } 1210 | ], 1211 | "source": [ 1212 | "# Checking first 5 data of the dataframe\n", 1213 | "router.head()" 1214 | ] 1215 | }, 1216 | { 1217 | "cell_type": "code", 1218 | "execution_count": 73, 1219 | "metadata": {}, 1220 | "outputs": [ 1221 | { 1222 | "data": { 1223 | "text/html": [ 1224 | "
\n", 1225 | "\n", 1238 | "\n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | " \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | "
Product_ReviewRatings
995I needed to create a WiFi setup for small offi...5
996its very easy interface to work with and i str...4
997Honest rating for this product is 5/5 . Just f...5
998I have bought 3 of these little guys, for my s...4
999Its a great product. Works really fine. It wil...5
\n", 1274 | "
" 1275 | ], 1276 | "text/plain": [ 1277 | " Product_Review Ratings\n", 1278 | "995 I needed to create a WiFi setup for small offi... 5\n", 1279 | "996 its very easy interface to work with and i str... 4\n", 1280 | "997 Honest rating for this product is 5/5 . Just f... 5\n", 1281 | "998 I have bought 3 of these little guys, for my s... 4\n", 1282 | "999 Its a great product. Works really fine. It wil... 5" 1283 | ] 1284 | }, 1285 | "execution_count": 73, 1286 | "metadata": {}, 1287 | "output_type": "execute_result" 1288 | } 1289 | ], 1290 | "source": [ 1291 | "# Checking last 5 data of the dataframe\n", 1292 | "router.tail()" 1293 | ] 1294 | }, 1295 | { 1296 | "cell_type": "code", 1297 | "execution_count": 74, 1298 | "metadata": {}, 1299 | "outputs": [], 1300 | "source": [ 1301 | "# Closing the driver\n", 1302 | "driver.close()" 1303 | ] 1304 | }, 1305 | { 1306 | "cell_type": "markdown", 1307 | "metadata": {}, 1308 | "source": [ 1309 | "#### Exporting data in CSV file" 1310 | ] 1311 | }, 1312 | { 1313 | "cell_type": "code", 1314 | "execution_count": null, 1315 | "metadata": {}, 1316 | "outputs": [], 1317 | "source": [ 1318 | "# Combining all dataframes into a single dataframe\n", 1319 | "ratings_data=headphones.append([laptops,camera,phones,f_phones,f_poco,router],ignore_index=True)\n", 1320 | "ratings_data" 1321 | ] 1322 | }, 1323 | { 1324 | "cell_type": "code", 1325 | "execution_count": 76, 1326 | "metadata": {}, 1327 | "outputs": [], 1328 | "source": [ 1329 | "# Saving the data into a csv file\n", 1330 | "ratings_data.to_csv('Rating_Prediction_dataset.csv')" 1331 | ] 1332 | }, 1333 | { 1334 | "cell_type": "code", 1335 | "execution_count": null, 1336 | "metadata": {}, 1337 | "outputs": [], 1338 | "source": [] 1339 | } 1340 | ], 1341 | "metadata": { 1342 | "kernelspec": { 1343 | "display_name": "Python 3", 1344 | "language": "python", 1345 | "name": "python3" 1346 | }, 1347 | "language_info": { 1348 | "codemirror_mode": { 1349 | "name": "ipython", 1350 | "version": 3 1351 | }, 1352 | "file_extension": ".py", 1353 | "mimetype": "text/x-python", 1354 | "name": "python", 1355 | "nbconvert_exporter": "python", 1356 | "pygments_lexer": "ipython3", 1357 | "version": "3.8.5" 1358 | } 1359 | }, 1360 | "nbformat": 4, 1361 | "nbformat_minor": 4 1362 | } 1363 | -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/Rating Prediction Project presentation - FlipRobo.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Rating Prediction Project presentation - FlipRobo.pptx -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/Read me.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Product Review Rating Predication Using NLP/sample-documentation.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/sample-documentation.docx -------------------------------------------------------------------------------- /Project Customer Retention in Ecommerce sector/Customer Retention Project.zip: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Customer Retention Project.zip -------------------------------------------------------------------------------- /Project Customer Retention in Ecommerce sector/Customer_retention_case study.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Customer_retention_case study.docx -------------------------------------------------------------------------------- /Project Customer Retention in Ecommerce sector/Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf -------------------------------------------------------------------------------- /Project Customer Retention in Ecommerce sector/Read me.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Project Customer Retention in Ecommerce sector/customer_retention_dataset.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/customer_retention_dataset.xlsx -------------------------------------------------------------------------------- /Project Used Car price predication using ML/Car Price Predication Project.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Car Price Predication Project.zip -------------------------------------------------------------------------------- /Project Used Car price predication using ML/Car Price Predication Web Scraping script.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## **Car Price Prediction Project - Part 1: Web Scraping Data on Used Cars**\n", 8 | "***Author: Mr. Lokesh Baviskar***\n", 9 | "\n", 10 | "### **Objective** :\n", 11 | " **To scrape data on used cars in order to predict car prices**\n", 12 | "\n", 13 | "### **Strategy** :\n", 14 | "1. Selenium will be used for web scraping data from cardekho.com.\n", 15 | "2. In the first part, scrape the URLs of used-car listings for different locations in India.\n", 16 | "3. Store the scraped URLs in an Excel file.\n", 17 | "4. Select the car features to be scraped from the website.\n", 18 | "5. In the second part, scrape data from the individual URLs in the Excel file.\n", 19 | "6. Export the final data to an Excel file. A minimal sketch of the scroll-and-collect pattern used throughout Part 1 follows.\n",
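"\n",
"```python\n",
"# Hedged sketch (assumptions: the selenium 3.x API used in this notebook, a local\n",
"# chromedriver, and the 'gsc_col-xs-7 carsName' listing class still present on the site).\n",
"import time\n",
"from selenium import webdriver\n",
"\n",
"driver = webdriver.Chrome(r'C:\\chromedriver.exe')\n",
"driver.get('https://www.cardekho.com/used-cars+in+mumbai')  # hypothetical city page\n",
"for _ in range(10):  # scroll repeatedly so lazy-loaded listings render\n",
"    driver.execute_script('window.scrollBy(0, 1000)')\n",
"    time.sleep(1)\n",
"links = [a.get_attribute('href') for a in\n",
"         driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a')]\n",
"driver.quit()\n",
"```"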
20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### **Part 1 : Scraping URLs of Used Cars from Cardekho.com**" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "- **Importing libraries required for scraping**" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "import pandas as pd\n", 43 | "import numpy as np\n", 44 | "import time\n", 45 | "import selenium\n", 46 | "from selenium import webdriver\n", 47 | "from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | " - **Importing the webdriver**" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "driver=webdriver.Chrome(r'C:\\chromedriver.exe')" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "- **Opening the cardekho website in the browser**" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 113, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "url = \"https://www.cardekho.com/\"\n", 80 | "driver.get(url)\n", 81 | "time.sleep(2)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 16, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "Used_cars=driver.find_element_by_xpath('//li[@data-slug=\"/usedCars\"]/a').get_attribute('href')\n", 91 | "driver.get(Used_cars)\n", 92 | "time.sleep(2)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "- **Collecting URLs of different locations/cities for further scraping**" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "**1.
Extracting data for Ahmedabad city**" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 16, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "url = \"https://www.cardekho.com/used-cars+in+ahmedabad\"\n", 116 | "driver.get(url)\n", 117 | "time.sleep(2)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 17, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stderr", 127 | "output_type": "stream", 128 | "text": [ 129 | "100%|██████████| 200/200 [05:42<00:00, 1.71s/it]\n" 130 | ] 131 | } 132 | ], 133 | "source": [ 134 | "from tqdm import tqdm\n", 135 | "for _ in tqdm(range(0,200)):\n", 136 | " time.sleep(0.5)\n", 137 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n", 138 | " time.sleep(1)\n", 139 | " driver.execute_script(\"window.scrollBy(0,-350)\")" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 18, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stderr", 149 | "output_type": "stream", 150 | "text": [ 151 | "100%|██████████| 509/509 [00:04<00:00, 112.64it/s]\n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "Car_url_ahmedabad = []\n", 157 | "car_url_ahmedabad = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 158 | "for j in tqdm(range(len(car_url_ahmedabad))):\n", 159 | " Car_url_ahmedabad.append(car_url_ahmedabad[j].get_attribute('href'))\n", 160 | "time.sleep(2)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 20, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "data": { 170 | "text/plain": [ 171 | "509" 172 | ] 173 | }, 174 | "execution_count": 20, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "len(Car_url_ahmedabad)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "**2. 
Extracting URL for Bangalore city**" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 33, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [ 196 | "url = \"https://www.cardekho.com/used-cars+in+bangalore\"\n", 197 | "driver.get(url)\n", 198 | "time.sleep(2)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 34, 204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "name": "stderr", 208 | "output_type": "stream", 209 | "text": [ 210 | "100%|██████████| 300/300 [09:45<00:00, 1.95s/it]\n" 211 | ] 212 | } 213 | ], 214 | "source": [ 215 | "from tqdm import tqdm\n", 216 | "for _ in tqdm(range(0,300)):\n", 217 | " time.sleep(0.75)\n", 218 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n", 219 | " time.sleep(1)\n", 220 | " driver.execute_script(\"window.scrollBy(0,-350)\")" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 35, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stderr", 230 | "output_type": "stream", 231 | "text": [ 232 | "100%|██████████| 580/580 [00:05<00:00, 105.58it/s]\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "Car_url_bangalore = []\n", 238 | "car_url_bangalore = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 239 | "for j in tqdm(range(len(car_url_bangalore))):\n", 240 | " Car_url_bangalore.append(car_url_bangalore[j].get_attribute('href'))\n", 241 | "time.sleep(2)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 36, 247 | "metadata": {}, 248 | "outputs": [ 249 | { 250 | "data": { 251 | "text/plain": [ 252 | "580" 253 | ] 254 | }, 255 | "execution_count": 36, 256 | "metadata": {}, 257 | "output_type": "execute_result" 258 | } 259 | ], 260 | "source": [ 261 | "len(Car_url_bangalore)" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "**3. Extracting URL for Chennai**" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 37, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "url = \"https://www.cardekho.com/used-cars+in+chennai\"\n", 278 | "driver.get(url)\n", 279 | "time.sleep(2)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 38, 285 | "metadata": {}, 286 | "outputs": [ 287 | { 288 | "name": "stderr", 289 | "output_type": "stream", 290 | "text": [ 291 | "100%|██████████| 250/250 [06:35<00:00, 1.58s/it]\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "from tqdm import tqdm\n", 297 | "for _ in tqdm(range(0,250)):\n", 298 | " time.sleep(0.5)\n", 299 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n", 300 | " time.sleep(1)\n", 301 | " driver.execute_script(\"window.scrollBy(0,-350)\")" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 39, 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "name": "stderr", 311 | "output_type": "stream", 312 | "text": [ 313 | "100%|██████████| 298/298 [00:02<00:00, 117.45it/s]\n" 314 | ] 315 | } 316 | ], 317 | "source": [ 318 | "Car_url_chennai = []\n", 319 | "car_url_chennai = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 320 | "for j in tqdm(range(len(car_url_chennai))):\n", 321 | " Car_url_chennai.append(car_url_chennai[j].get_attribute('href'))\n", 322 | "time.sleep(2)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "**4. 
Extracting URL for Dehli-NCR**" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 48, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "url = \"https://www.cardekho.com/used-cars+in+delhi-ncr\"\n", 339 | "driver.get(url)\n", 340 | "time.sleep(2)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 49, 346 | "metadata": {}, 347 | "outputs": [ 348 | { 349 | "name": "stderr", 350 | "output_type": "stream", 351 | "text": [ 352 | "100%|██████████| 1100/1100 [53:55<00:00, 2.94s/it]\n" 353 | ] 354 | } 355 | ], 356 | "source": [ 357 | "from tqdm import tqdm\n", 358 | "for _ in tqdm(range(0,1100)):\n", 359 | " time.sleep(0.5)\n", 360 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 361 | " time.sleep(1)\n", 362 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 50, 368 | "metadata": {}, 369 | "outputs": [ 370 | { 371 | "name": "stderr", 372 | "output_type": "stream", 373 | "text": [ 374 | "100%|██████████| 3141/3141 [01:03<00:00, 49.52it/s]\n" 375 | ] 376 | } 377 | ], 378 | "source": [ 379 | "Car_url_delhi_ncr = []\n", 380 | "car_url_delhi_ncr = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 381 | "for j in tqdm(range(len(car_url_delhi_ncr))):\n", 382 | " Car_url_delhi_ncr.append(car_url_delhi_ncr[j].get_attribute('href'))\n", 383 | "time.sleep(2)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "**5. Extracting URL for Gurgaon city**" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 51, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "url = \"https://www.cardekho.com/used-cars+in+gurgaon\"\n", 400 | "driver.get(url)\n", 401 | "time.sleep(2)" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": 52, 407 | "metadata": {}, 408 | "outputs": [ 409 | { 410 | "name": "stderr", 411 | "output_type": "stream", 412 | "text": [ 413 | "100%|██████████| 600/600 [20:22<00:00, 2.04s/it]\n" 414 | ] 415 | } 416 | ], 417 | "source": [ 418 | "from tqdm import tqdm\n", 419 | "for _ in tqdm(range(0,600)):\n", 420 | " time.sleep(0.5)\n", 421 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 422 | " time.sleep(1)\n", 423 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 53, 429 | "metadata": {}, 430 | "outputs": [ 431 | { 432 | "name": "stderr", 433 | "output_type": "stream", 434 | "text": [ 435 | "100%|██████████| 1217/1217 [00:12<00:00, 100.96it/s]\n" 436 | ] 437 | } 438 | ], 439 | "source": [ 440 | "Car_url_gurgaon = []\n", 441 | "car_url_gurgaon = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 442 | "for j in tqdm(range(len(car_url_gurgaon))):\n", 443 | " Car_url_gurgaon.append(car_url_gurgaon[j].get_attribute('href'))\n", 444 | "time.sleep(2)" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "**6. 
Extracting URL for Telangana**" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 54, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "url = \"https://www.cardekho.com/used-cars+in+telangana\"\n", 461 | "driver.get(url)\n", 462 | "time.sleep(2)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 56, 468 | "metadata": {}, 469 | "outputs": [ 470 | { 471 | "name": "stderr", 472 | "output_type": "stream", 473 | "text": [ 474 | "100%|██████████| 400/400 [17:11<00:00, 2.58s/it]\n" 475 | ] 476 | } 477 | ], 478 | "source": [ 479 | "from tqdm import tqdm\n", 480 | "for _ in tqdm(range(0,400)):\n", 481 | " time.sleep(0.5)\n", 482 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 483 | " time.sleep(1)\n", 484 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 57, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "name": "stderr", 494 | "output_type": "stream", 495 | "text": [ 496 | "100%|██████████| 1210/1210 [00:10<00:00, 110.06it/s]\n" 497 | ] 498 | } 499 | ], 500 | "source": [ 501 | "Car_url_telangana = []\n", 502 | "car_url_telangana = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 503 | "for j in tqdm(range(len(car_url_telangana))):\n", 504 | " Car_url_telangana.append(car_url_telangana[j].get_attribute('href'))\n", 505 | "time.sleep(2)" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "**7. Extracting URL for Maharashtra**" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 61, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "url = \"https://www.cardekho.com/used-cars+in+maharashtra\"\n", 522 | "driver.get(url)\n", 523 | "time.sleep(2)" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": 62, 529 | "metadata": {}, 530 | "outputs": [ 531 | { 532 | "name": "stderr", 533 | "output_type": "stream", 534 | "text": [ 535 | "100%|██████████| 1000/1000 [09:30<00:00, 1.75it/s]\n" 536 | ] 537 | } 538 | ], 539 | "source": [ 540 | "from tqdm import tqdm\n", 541 | "for _ in tqdm(range(0,1000)):\n", 542 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 543 | " time.sleep(0.5)\n", 544 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 60, 550 | "metadata": {}, 551 | "outputs": [ 552 | { 553 | "name": "stderr", 554 | "output_type": "stream", 555 | "text": [ 556 | "100%|██████████| 4526/4526 [01:24<00:00, 53.65it/s] \n" 557 | ] 558 | } 559 | ], 560 | "source": [ 561 | "Car_url_Maharashtra = []\n", 562 | "car_url_Maharashtra = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 563 | "for j in tqdm(range(len(car_url_Maharashtra))):\n", 564 | " Car_url_Maharashtra.append(car_url_Maharashtra[j].get_attribute('href'))\n", 565 | "time.sleep(2)" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "**8. 
Extracting URL for Karnataka**" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": 63, 578 | "metadata": {}, 579 | "outputs": [], 580 | "source": [ 581 | "url = \"https://www.cardekho.com/used-cars+in+karnataka\"\n", 582 | "driver.get(url)\n", 583 | "time.sleep(2)" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 64, 589 | "metadata": {}, 590 | "outputs": [ 591 | { 592 | "name": "stderr", 593 | "output_type": "stream", 594 | "text": [ 595 | "100%|██████████| 750/750 [08:08<00:00, 1.53it/s] \n" 596 | ] 597 | } 598 | ], 599 | "source": [ 600 | "from tqdm import tqdm\n", 601 | "for _ in tqdm(range(0,750)):\n", 602 | " \n", 603 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 604 | " time.sleep(0.4)\n", 605 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 65, 611 | "metadata": {}, 612 | "outputs": [ 613 | { 614 | "name": "stderr", 615 | "output_type": "stream", 616 | "text": [ 617 | "100%|██████████| 867/867 [00:07<00:00, 111.79it/s]\n" 618 | ] 619 | } 620 | ], 621 | "source": [ 622 | "Car_url_Karnataka = []\n", 623 | "car_url_Karnataka = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 624 | "for j in tqdm(range(len(car_url_Karnataka))):\n", 625 | " Car_url_Karnataka.append(car_url_Karnataka[j].get_attribute('href'))\n", 626 | "time.sleep(2)" 627 | ] 628 | }, 629 | { 630 | "cell_type": "markdown", 631 | "metadata": {}, 632 | "source": [ 633 | "**9. Extracting URL for Uttar Pradesh**" 634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": 66, 639 | "metadata": {}, 640 | "outputs": [], 641 | "source": [ 642 | "url = \"https://www.cardekho.com/used-cars+in+uttar-pradesh\"\n", 643 | "driver.get(url)\n", 644 | "time.sleep(2)" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 67, 650 | "metadata": {}, 651 | "outputs": [ 652 | { 653 | "name": "stderr", 654 | "output_type": "stream", 655 | "text": [ 656 | "100%|██████████| 700/700 [10:19<00:00, 1.13it/s] \n" 657 | ] 658 | } 659 | ], 660 | "source": [ 661 | "from tqdm import tqdm\n", 662 | "for _ in tqdm(range(0,700)):\n", 663 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 664 | " time.sleep(0.25)\n", 665 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 666 | ] 667 | }, 668 | { 669 | "cell_type": "code", 670 | "execution_count": 68, 671 | "metadata": {}, 672 | "outputs": [ 673 | { 674 | "name": "stderr", 675 | "output_type": "stream", 676 | "text": [ 677 | "100%|██████████| 1380/1380 [00:15<00:00, 89.31it/s]\n" 678 | ] 679 | } 680 | ], 681 | "source": [ 682 | "Car_url_UttarPradesh = []\n", 683 | "car_url_UttarPradesh = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 684 | "for j in tqdm(range(len(car_url_UttarPradesh))):\n", 685 | " Car_url_UttarPradesh.append(car_url_UttarPradesh[j].get_attribute('href'))\n", 686 | "time.sleep(2)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": {}, 692 | "source": [ 693 | "**10. 
Extracting URL for Tamil Nadu**" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": 69, 699 | "metadata": {}, 700 | "outputs": [], 701 | "source": [ 702 | "url = \"https://www.cardekho.com/used-cars+in+tamil-nadu\"\n", 703 | "driver.get(url)\n", 704 | "time.sleep(2)" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "from tqdm import tqdm\n", 714 | "for _ in tqdm(range(0,600)):\n", 715 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 716 | " time.sleep(0.25)\n", 717 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 718 | ] 719 | }, 720 | { 721 | "cell_type": "code", 722 | "execution_count": 73, 723 | "metadata": {}, 724 | "outputs": [ 725 | { 726 | "name": "stderr", 727 | "output_type": "stream", 728 | "text": [ 729 | "100%|██████████| 1750/1750 [02:35<00:00, 11.24it/s]\n" 730 | ] 731 | } 732 | ], 733 | "source": [ 734 | "Car_url_TamilNadu = []\n", 735 | "car_url_TamilNadu = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 736 | "for j in tqdm(range(len(car_url_TamilNadu))):\n", 737 | " Car_url_TamilNadu.append(car_url_TamilNadu[j].get_attribute('href'))\n", 738 | "time.sleep(2)" 739 | ] 740 | }, 741 | { 742 | "cell_type": "markdown", 743 | "metadata": {}, 744 | "source": [ 745 | "**11. Extracting URL for Haryana**" 746 | ] 747 | }, 748 | { 749 | "cell_type": "code", 750 | "execution_count": 75, 751 | "metadata": {}, 752 | "outputs": [], 753 | "source": [ 754 | "url = \"https://www.cardekho.com/used-cars+in+haryana\"\n", 755 | "driver.get(url)\n", 756 | "time.sleep(2)" 757 | ] 758 | }, 759 | { 760 | "cell_type": "code", 761 | "execution_count": null, 762 | "metadata": {}, 763 | "outputs": [], 764 | "source": [ 765 | "from tqdm import tqdm\n", 766 | "for _ in tqdm(range(0,600)):\n", 767 | " \n", 768 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 769 | " time.sleep(0.25)\n", 770 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": 79, 776 | "metadata": {}, 777 | "outputs": [ 778 | { 779 | "name": "stderr", 780 | "output_type": "stream", 781 | "text": [ 782 | "100%|██████████| 1228/1228 [00:12<00:00, 96.30it/s] \n" 783 | ] 784 | } 785 | ], 786 | "source": [ 787 | "Car_url_Haryana = []\n", 788 | "car_url_Haryana = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 789 | "for j in tqdm(range(len(car_url_Haryana))):\n", 790 | " Car_url_Haryana.append(car_url_Haryana[j].get_attribute('href'))\n", 791 | "time.sleep(2)" 792 | ] 793 | }, 794 | { 795 | "cell_type": "code", 796 | "execution_count": 99, 797 | "metadata": {}, 798 | "outputs": [ 799 | { 800 | "data": { 801 | "text/plain": [ 802 | "1228" 803 | ] 804 | }, 805 | "execution_count": 99, 806 | "metadata": {}, 807 | "output_type": "execute_result" 808 | } 809 | ], 810 | "source": [ 811 | "len(Car_url_Haryana)" 812 | ] 813 | }, 814 | { 815 | "cell_type": "markdown", 816 | "metadata": {}, 817 | "source": [ 818 | "**12. 
Extracting URL for Rajasthan**" 819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": 80, 824 | "metadata": {}, 825 | "outputs": [], 826 | "source": [ 827 | "url = \"https://www.cardekho.com/used-cars+in+rajasthan\"\n", 828 | "driver.get(url)\n", 829 | "time.sleep(2)" 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "execution_count": 81, 835 | "metadata": {}, 836 | "outputs": [ 837 | { 838 | "name": "stderr", 839 | "output_type": "stream", 840 | "text": [ 841 | "100%|██████████| 500/500 [04:49<00:00, 1.73it/s]\n" 842 | ] 843 | } 844 | ], 845 | "source": [ 846 | "from tqdm import tqdm\n", 847 | "for _ in tqdm(range(0,500)):\n", 848 | " \n", 849 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 850 | " time.sleep(0.25)\n", 851 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 852 | ] 853 | }, 854 | { 855 | "cell_type": "code", 856 | "execution_count": 82, 857 | "metadata": {}, 858 | "outputs": [ 859 | { 860 | "name": "stderr", 861 | "output_type": "stream", 862 | "text": [ 863 | "100%|██████████| 687/687 [00:06<00:00, 111.65it/s]\n" 864 | ] 865 | } 866 | ], 867 | "source": [ 868 | "Car_url_Rajasthan = []\n", 869 | "car_url_Rajasthan = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 870 | "for j in tqdm(range(len(car_url_Rajasthan))):\n", 871 | " Car_url_Rajasthan.append(car_url_Rajasthan[j].get_attribute('href'))\n", 872 | "time.sleep(2)" 873 | ] 874 | }, 875 | { 876 | "cell_type": "markdown", 877 | "metadata": {}, 878 | "source": [ 879 | "**13. Extracting URL for Kerala**" 880 | ] 881 | }, 882 | { 883 | "cell_type": "code", 884 | "execution_count": 83, 885 | "metadata": {}, 886 | "outputs": [], 887 | "source": [ 888 | "url = \"https://www.cardekho.com/used-cars+in+kerala\"\n", 889 | "driver.get(url)\n", 890 | "time.sleep(2)" 891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "execution_count": 84, 896 | "metadata": {}, 897 | "outputs": [ 898 | { 899 | "name": "stderr", 900 | "output_type": "stream", 901 | "text": [ 902 | "100%|██████████| 400/400 [10:11<00:00, 1.53s/it]\n" 903 | ] 904 | } 905 | ], 906 | "source": [ 907 | "from tqdm import tqdm\n", 908 | "for _ in tqdm(range(0,400)):\n", 909 | " time.sleep(0.5)\n", 910 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n", 911 | " time.sleep(1)\n", 912 | " driver.execute_script(\"window.scrollBy(0,-500)\")" 913 | ] 914 | }, 915 | { 916 | "cell_type": "code", 917 | "execution_count": 85, 918 | "metadata": {}, 919 | "outputs": [ 920 | { 921 | "name": "stderr", 922 | "output_type": "stream", 923 | "text": [ 924 | "100%|██████████| 18/18 [00:00<00:00, 117.58it/s]\n" 925 | ] 926 | } 927 | ], 928 | "source": [ 929 | "Car_url_Kerala = []\n", 930 | "car_url_Kerala = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n", 931 | "for j in tqdm(range(len(car_url_Kerala))):\n", 932 | " Car_url_Kerala.append(car_url_Kerala[j].get_attribute('href'))\n", 933 | "time.sleep(2)" 934 | ] 935 | }, 936 | { 937 | "cell_type": "code", 938 | "execution_count": 87, 939 | "metadata": {}, 940 | "outputs": [], 941 | "source": [ 942 | "Car_url = []" 943 | ] 944 | }, 945 | { 946 | "cell_type": "code", 947 | "execution_count": 89, 948 | "metadata": {}, 949 | "outputs": [], 950 | "source": [ 951 | "Car_url = Car_url_Kerala + Car_url_Rajasthan + Car_url_Haryana + Car_url_TamilNadu + Car_url_UttarPradesh + Car_url_Karnataka + Car_url_telangana + Car_url_gurgaon + Car_url_delhi_ncr + Car_url_chennai + Car_url_ahmedabad" 952 | ] 953 | }, 954 | { 955 | 
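Note that the concatenation above leaves out `Car_url_bangalore` and `Car_url_Maharashtra`, which is why the count that follows is 12305 rather than the roughly 17,400 links collected across all thirteen pages. Since the same listing can also appear on more than one page (a city page and its state page, for example), it may be worth de-duplicating the combined list before exporting it. A minimal, order-preserving sketch (the `Car_url_unique` name is illustrative, not part of the original notebook):

```python
# Keep the first occurrence of each URL. dict.fromkeys preserves insertion
# order (Python 3.7+), so the original scraping order is retained.
Car_url_unique = list(dict.fromkeys(Car_url))
print(len(Car_url), len(Car_url_unique))
```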
"cell_type": "code", 956 | "execution_count": 90, 957 | "metadata": {}, 958 | "outputs": [ 959 | { 960 | "data": { 961 | "text/plain": [ 962 | "12305" 963 | ] 964 | }, 965 | "execution_count": 90, 966 | "metadata": {}, 967 | "output_type": "execute_result" 968 | } 969 | ], 970 | "source": [ 971 | "len(Car_url)" 972 | ] 973 | }, 974 | { 975 | "cell_type": "markdown", 976 | "metadata": {}, 977 | "source": [ 978 | "#### **Creating Excel file of URL for futher webscraping**" 979 | ] 980 | }, 981 | { 982 | "cell_type": "code", 983 | "execution_count": 3, 984 | "metadata": {}, 985 | "outputs": [], 986 | "source": [ 987 | "import pandas as pd" 988 | ] 989 | }, 990 | { 991 | "cell_type": "code", 992 | "execution_count": 97, 993 | "metadata": {}, 994 | "outputs": [], 995 | "source": [ 996 | "Car_URL = pd.DataFrame({})\n", 997 | "Car_URL['Urls'] = Car_url" 998 | ] 999 | }, 1000 | { 1001 | "cell_type": "code", 1002 | "execution_count": 98, 1003 | "metadata": {}, 1004 | "outputs": [], 1005 | "source": [ 1006 | "Car_URL.to_excel('Car_url.xlsx', index = False)" 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": "markdown", 1011 | "metadata": {}, 1012 | "source": [ 1013 | "## **Part 2 : Scraping features from Indiviudal Link**" 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "markdown", 1018 | "metadata": {}, 1019 | "source": [ 1020 | "### **Importing excel file contain URLs.**" 1021 | ] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": 6, 1026 | "metadata": {}, 1027 | "outputs": [], 1028 | "source": [ 1029 | "import pandas as pd" 1030 | ] 1031 | }, 1032 | { 1033 | "cell_type": "code", 1034 | "execution_count": 4, 1035 | "metadata": {}, 1036 | "outputs": [], 1037 | "source": [ 1038 | "df = pd.read_excel('Car_url.xlsx')" 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "code", 1043 | "execution_count": 5, 1044 | "metadata": {}, 1045 | "outputs": [ 1046 | { 1047 | "data": { 1048 | "text/plain": [ 1049 | "(12305, 1)" 1050 | ] 1051 | }, 1052 | "execution_count": 5, 1053 | "metadata": {}, 1054 | "output_type": "execute_result" 1055 | } 1056 | ], 1057 | "source": [ 1058 | "df.shape" 1059 | ] 1060 | }, 1061 | { 1062 | "cell_type": "markdown", 1063 | "metadata": {}, 1064 | "source": [ 1065 | "#### **As we have scrap around 12035 URL, We Will scrap car details in different batchs.**" 1066 | ] 1067 | }, 1068 | { 1069 | "cell_type": "code", 1070 | "execution_count": 6, 1071 | "metadata": {}, 1072 | "outputs": [], 1073 | "source": [ 1074 | "# Making Empty lists\n", 1075 | "Location = []\n", 1076 | "Model = []\n", 1077 | "Variant = []\n", 1078 | "Price = []\n", 1079 | "Make_year =[]\n", 1080 | "Fuel_Type = []\n", 1081 | "KMs_driven = []\n", 1082 | "Engine_displacement = []\n", 1083 | "Transmission = []\n", 1084 | "Milage = []\n", 1085 | "Max_power = []\n", 1086 | "Torque = []\n", 1087 | "Seats = []\n", 1088 | "Color = []\n", 1089 | "Gear_Box =[]\n", 1090 | "Steering_Type =[]\n", 1091 | "Front_Brake_Type = []\n", 1092 | "Rear_Brake_Type = []\n", 1093 | "Tyre_Volume = []\n", 1094 | "Cargo_volume = []\n", 1095 | "Engine_Type = []\n", 1096 | "No_of_cylinder = []\n", 1097 | "Value_Configuration = []\n", 1098 | "Fuel_Suppy_System = []\n", 1099 | "Turbo_charger = []\n", 1100 | "Super_charger = []\n", 1101 | "Length = []\n", 1102 | "Width =[]\n", 1103 | "Height = []\n", 1104 | "Gross_weight = []" 1105 | ] 1106 | }, 1107 | { 1108 | "cell_type": "code", 1109 | "execution_count": 21, 1110 | "metadata": {}, 1111 | "outputs": [], 1112 | "source": [ 1113 | "driver=webdriver.Chrome(r'C:\\chromedriver.exe')" 1114 
| ] 1115 | }, 1116 | { 1117 | "cell_type": "markdown", 1118 | "metadata": {}, 1119 | "source": [ 1120 | "**Extracting details for batch 1 of 500**" 1121 | ] 1122 | }, 1123 | { 1124 | "cell_type": "code", 1125 | "execution_count": 16, 1126 | "metadata": {}, 1127 | "outputs": [ 1128 | { 1129 | "name": "stderr", 1130 | "output_type": "stream", 1131 | "text": [ 1132 | "100%|██████████| 100/100 [07:57<00:00, 4.78s/it]\n" 1133 | ] 1134 | } 1135 | ], 1136 | "source": [ 1137 | "from tqdm import tqdm\n", 1138 | "for i in tqdm(df['Urls'][9800:9900]):\n", 1139 | " driver.get(i)\n", 1140 | " time.sleep(0.5)\n", 1141 | " \n", 1142 | " # Extracting Car Model via xpath\n", 1143 | " try :\n", 1144 | " model = driver.find_element_by_xpath('//div[@class=\"gsc_col-xs-12\"]/h1')\n", 1145 | " Model.append(model.text[5:])\n", 1146 | " except NoSuchElementException:\n", 1147 | " try :\n", 1148 | " model = driver.find_element_by_xpath('//div[@class=\"gsc_container_hold\"]/div/h1[2]')\n", 1149 | " Model.append(model.text) \n", 1150 | " except NoSuchElementException:\n", 1151 | " pass\n", 1152 | " \n", 1153 | " #clicking to view all specifications\n", 1154 | " try:\n", 1155 | " view_more = driver.find_element_by_xpath(\"//*[text() = 'View All Specifications' or text() = 'View More']\")\n", 1156 | " driver.execute_script(\"arguments[0].scrollIntoView();\", view_more)\n", 1157 | " driver.execute_script(\"arguments[0].click();\", view_more)\n", 1158 | " \n", 1159 | " except NoSuchElementException:\n", 1160 | " try:\n", 1161 | " Button= driver.find_element_by_xpath('//*[@id=\"topspec\"]/div[2]/a')\n", 1162 | " Button.click()\n", 1163 | " time.sleep(1)\n", 1164 | " except NoSuchElementException:\n", 1165 | " pass\n", 1166 | " \n", 1167 | " time.sleep(0.75)\n", 1168 | " # Extracting Car Price via xpath\n", 1169 | " try :\n", 1170 | " price = driver.find_element_by_xpath('//div[@class=\"priceSection\"]/span[2]')\n", 1171 | " Price.append(price.text)\n", 1172 | " except NoSuchElementException:\n", 1173 | " try :\n", 1174 | " price = driver.find_element_by_xpath('//div[@class=\"gsc_container_hold\"]/span[1]/span')\n", 1175 | " Price.append(price.text)\n", 1176 | " except NoSuchElementException:\n", 1177 | " pass\n", 1178 | " \n", 1179 | " # Extracting Car Make Year via xpath\n", 1180 | " try :\n", 1181 | " year = driver.find_element_by_xpath('//*[text()=\"Make Year\"]/following-sibling::div')\n", 1182 | " Make_year.append(year.text) \n", 1183 | " except NoSuchElementException:\n", 1184 | " try :\n", 1185 | " year = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[1]/div/div')\n", 1186 | " Make_year.append(year.text)\n", 1187 | " except NoSuchElementException:\n", 1188 | " pass\n", 1189 | " \n", 1190 | " # Extracting Car Fuel Type via xpath\n", 1191 | " try :\n", 1192 | " fuel = driver.find_element_by_xpath('//*[text()=\"Fuel\"]/following-sibling::div')\n", 1193 | " Fuel_Type.append(fuel.text)\n", 1194 | " except NoSuchElementException:\n", 1195 | " try :\n", 1196 | " fuel = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[5]/div/div')\n", 1197 | " Fuel_Type.append(fuel.text)\n", 1198 | " except NoSuchElementException:\n", 1199 | " Fuel_Type.append('-')\n", 1200 | " \n", 1201 | " # Extracting KMS driven via xpath\n", 1202 | " try :\n", 1203 | " kms = driver.find_element_by_xpath('//*[text()=\"KMs Driven\"]/following-sibling::div')\n", 1204 | " KMs_driven.append(kms.text.replace('Kms',''))\n", 1205 | " except NoSuchElementException:\n", 1206 | " try :\n", 1207 | " kms = 
driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[3]/div/div')\n", 1208 | " KMs_driven.append(kms.text.replace('kms',''))\n", 1209 | " except NoSuchElementException:\n", 1210 | " pass\n", 1211 | " \n", 1212 | " # Extracting Engine_displacemet via xpath\n", 1213 | " try :\n", 1214 | " engine_disp = driver.find_element_by_xpath('//*[text()=\"Engine Displacement\"]/following-sibling::div')\n", 1215 | " Engine_displacement.append(engine_disp.text.replace('CC','')) \n", 1216 | " except NoSuchElementException:\n", 1217 | " try :\n", 1218 | " engine_disp = driver.find_element_by_xpath('//*[text()=\"Engine\"]/following-sibling::div')\n", 1219 | " Engine_displacement.append(engine_disp.text.replace('CC',''))\n", 1220 | " except NoSuchElementException:\n", 1221 | " pass\n", 1222 | " \n", 1223 | " # Extracting Transmission via xpath\n", 1224 | " try :\n", 1225 | " transmission = driver.find_element_by_xpath('//*[text()=\"Transmission\"]/following-sibling::div')\n", 1226 | " Transmission.append(transmission.text)\n", 1227 | " except NoSuchElementException:\n", 1228 | " try :\n", 1229 | " transmission = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[6]/div/div')\n", 1230 | " Transmission.append(transmission.text)\n", 1231 | " except NoSuchElementException:\n", 1232 | " pass\n", 1233 | " time.sleep(0.25)\n", 1234 | " # Extracting Milage via xpath\n", 1235 | " try :\n", 1236 | " milage = driver.find_element_by_xpath('//*[text()=\"Mileage\"]/following-sibling::div')\n", 1237 | " Milage.append(milage.text.replace('kmpl',''))\n", 1238 | " \n", 1239 | " except NoSuchElementException:\n", 1240 | " Milage.append('-')\n", 1241 | " \n", 1242 | " # Extracting Max_power via xpath\n", 1243 | " try :\n", 1244 | " maxbhp = driver.find_element_by_xpath('//*[text()=\"Max Power\"]/following-sibling::div')\n", 1245 | " Max_power.append(maxbhp.text.replace('bhp',''))\n", 1246 | " \n", 1247 | " except NoSuchElementException:\n", 1248 | " Max_power.append('-')\n", 1249 | " \n", 1250 | " # Extracting Torque via xpath\n", 1251 | " try :\n", 1252 | " torque = driver.find_element_by_xpath('//*[text()=\"Torque\"]/following-sibling::div')\n", 1253 | " Torque.append(torque.text.replace('Nm',''))\n", 1254 | " \n", 1255 | " except NoSuchElementException:\n", 1256 | " Torque.append('-') \n", 1257 | " \n", 1258 | " # Extracting Seating capacity via xpath\n", 1259 | " try :\n", 1260 | " seats = driver.find_element_by_xpath('//*[text()=\"Seating Capacity\"]/following-sibling::div')\n", 1261 | " Seats.append(seats.text)\n", 1262 | " \n", 1263 | " except NoSuchElementException:\n", 1264 | " Seats.append('-') \n", 1265 | " \n", 1266 | " # Extracting color via xpath\n", 1267 | " try :\n", 1268 | " color = driver.find_element_by_xpath('//*[text()=\"Color\"]/following-sibling::div')\n", 1269 | " Color.append(color.text) \n", 1270 | " except NoSuchElementException:\n", 1271 | " Color.append('-')\n", 1272 | " \n", 1273 | " # Extracting Gear_Box via xpath\n", 1274 | " try :\n", 1275 | " gear_Box = driver.find_element_by_xpath('//*[text()=\"Gear Box\"]/following-sibling::div')\n", 1276 | " Gear_Box.append(gear_Box.text)\n", 1277 | " \n", 1278 | " except NoSuchElementException:\n", 1279 | " Gear_Box.append('-')\n", 1280 | " \n", 1281 | " # Extracting Steering_Type via xpath\n", 1282 | " try :\n", 1283 | " steering_Type = driver.find_element_by_xpath('//*[text()=\"Steering Type\"]/following-sibling::div')\n", 1284 | " Steering_Type.append(steering_Type.text)\n", 1285 | " \n", 1286 | " except 
NoSuchElementException:\n", 1287 | " Steering_Type.append('-')\n", 1288 | " \n", 1289 | " # Extracting Front_Brake_Type via xpath\n", 1290 | " try :\n", 1291 | " front_Brake_Type = driver.find_element_by_xpath('//*[text()=\"Front Brake Type\"]/following-sibling::div')\n", 1292 | " Front_Brake_Type.append(front_Brake_Type.text)\n", 1293 | " \n", 1294 | " except NoSuchElementException:\n", 1295 | " Front_Brake_Type.append('-')\n", 1296 | " \n", 1297 | " # Extracting Rear_Brake_Type via xpath\n", 1298 | " try :\n", 1299 | " rear_Brake_Type = driver.find_element_by_xpath('//*[text()=\"Rear Brake Type\"]/following-sibling::div')\n", 1300 | " Rear_Brake_Type.append(rear_Brake_Type.text)\n", 1301 | " \n", 1302 | " except NoSuchElementException:\n", 1303 | " Rear_Brake_Type.append('-')\n", 1304 | " \n", 1305 | " # Extracting Tyre_Volume via xpath\n", 1306 | " try :\n", 1307 | " tyre_Volume = driver.find_element_by_xpath('//*[text()=\"Tyre Type\"]/following-sibling::div')\n", 1308 | " Tyre_Volume.append(tyre_Volume.text)\n", 1309 | " \n", 1310 | " except NoSuchElementException:\n", 1311 | " Tyre_Volume.append('-')\n", 1312 | " \n", 1313 | " # Extracting Engine_Type via xpath\n", 1314 | " try :\n", 1315 | " engine_Type = driver.find_element_by_xpath('//*[text()=\"Engine Type\"]/following-sibling::div')\n", 1316 | " Engine_Type.append(engine_Type.text)\n", 1317 | " \n", 1318 | " except NoSuchElementException:\n", 1319 | " Engine_Type.append('-')\n", 1320 | " \n", 1321 | " # Extracting No_of_cylinder via xpath\n", 1322 | " try :\n", 1323 | " no_of_cylinder = driver.find_element_by_xpath('//*[text()=\"No of Cylinder\"]/following-sibling::div')\n", 1324 | " No_of_cylinder.append(no_of_cylinder.text) \n", 1325 | " except NoSuchElementException:\n", 1326 | " try :\n", 1327 | " no_of_cylinder = driver.find_element_by_xpath('//*[text()=\"No Of Cylinder\"]/following-sibling::div')\n", 1328 | " No_of_cylinder.append(no_of_cylinder.text)\n", 1329 | " except NoSuchElementException:\n", 1330 | " pass\n", 1331 | " \n", 1332 | " # Extracting Value_Configuration via xpath\n", 1333 | " try :\n", 1334 | " value_Configuration = driver.find_element_by_xpath('//*[text()=\"Value Configuration\"]/following-sibling::div')\n", 1335 | " Value_Configuration.append(value_Configuration.text) \n", 1336 | " except NoSuchElementException:\n", 1337 | " try :\n", 1338 | " value_Configuration = driver.find_element_by_xpath('//*[text()=\"Valve Configuration\"]/following-sibling::div')\n", 1339 | " Value_Configuration.append(value_Configuration.text) \n", 1340 | " except NoSuchElementException:\n", 1341 | " pass\n", 1342 | " \n", 1343 | " # Extracting Turbo_charger via xpath\n", 1344 | " try :\n", 1345 | " turbo_charger = driver.find_element_by_xpath('//*[text()=\"Turbo Charger\"]/following-sibling::div')\n", 1346 | " Turbo_charger.append(turbo_charger.text)\n", 1347 | " \n", 1348 | " except NoSuchElementException:\n", 1349 | " Turbo_charger.append('-')\n", 1350 | " \n", 1351 | " # Extracting Super_charger via xpath\n", 1352 | " try :\n", 1353 | " super_charger = driver.find_element_by_xpath('//*[text()=\"Super Charger\"]/following-sibling::div')\n", 1354 | " Super_charger.append(super_charger.text) \n", 1355 | " except NoSuchElementException:\n", 1356 | " try :\n", 1357 | " super_charger = driver.find_element_by_xpath('//*[text()=\"superCharger\"]/following-sibling::div')\n", 1358 | " Super_charger.append(super_charger.text) \n", 1359 | " except NoSuchElementException:\n", 1360 | " Super_charger.append('-')\n", 1361 | " \n", 1362 | " 
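# The remaining specs below reuse the same try/except pattern: on\n",
 "    # NoSuchElementException a '-' placeholder is appended, so these\n",
 "    # feature lists stay index-aligned with the cars scraped so far.\n",
 "    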
# Extracting Length via xpath\n", 1363 | " try :\n", 1364 | " length = driver.find_element_by_xpath('//*[text()=\"Length\"]/following-sibling::div')\n", 1365 | " Length.append(length.text.replace('mm',''))\n", 1366 | " \n", 1367 | " except NoSuchElementException:\n", 1368 | " Length.append('-')\n", 1369 | " \n", 1370 | " # Extracting Width via xpath\n", 1371 | " try :\n", 1372 | " width = driver.find_element_by_xpath('//*[text()=\"Width\"]/following-sibling::div')\n", 1373 | " Width.append(width.text.replace('mm',''))\n", 1374 | " \n", 1375 | " except NoSuchElementException:\n", 1376 | " Width.append('-')\n", 1377 | " \n", 1378 | " # Extracting Height via xpath\n", 1379 | " try :\n", 1380 | " height = driver.find_element_by_xpath('//*[text()=\"Height\"]/following-sibling::div')\n", 1381 | " Height.append(height.text.replace('mm',''))\n", 1382 | " \n", 1383 | " except NoSuchElementException:\n", 1384 | " Height.append('-')\n", 1385 | " " 1386 | ] 1387 | }, 1388 | { 1389 | "cell_type": "code", 1390 | "execution_count": 18, 1391 | "metadata": { 1392 | "scrolled": true 1393 | }, 1394 | "outputs": [ 1395 | { 1396 | "data": { 1397 | "text/html": [ 1398 | "
\n", 1399 | "\n", 1412 | "\n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | " \n", 1433 | " \n", 1434 | " \n", 1435 | " \n", 1436 | " \n", 1437 | " \n", 1438 | " \n", 1439 | " \n", 1440 | " \n", 1441 | " \n", 1442 | " \n", 1443 | " \n", 1444 | " \n", 1445 | " \n", 1446 | " \n", 1447 | " \n", 1448 | " \n", 1449 | " \n", 1450 | " \n", 1451 | " \n", 1452 | " \n", 1453 | " \n", 1454 | " \n", 1455 | " \n", 1456 | " \n", 1457 | " \n", 1458 | " \n", 1459 | " \n", 1460 | " \n", 1461 | " \n", 1462 | " \n", 1463 | " \n", 1464 | " \n", 1465 | " \n", 1466 | " \n", 1467 | " \n", 1468 | " \n", 1469 | " \n", 1470 | " \n", 1471 | " \n", 1472 | " \n", 1473 | " \n", 1474 | " \n", 1475 | " \n", 1476 | " \n", 1477 | " \n", 1478 | " \n", 1479 | " \n", 1480 | " \n", 1481 | " \n", 1482 | " \n", 1483 | " \n", 1484 | " \n", 1485 | " \n", 1486 | " \n", 1487 | " \n", 1488 | " \n", 1489 | " \n", 1490 | " \n", 1491 | " \n", 1492 | " \n", 1493 | " \n", 1494 | " \n", 1495 | " \n", 1496 | " \n", 1497 | " \n", 1498 | " \n", 1499 | " \n", 1500 | " \n", 1501 | " \n", 1502 | " \n", 1503 | " \n", 1504 | " \n", 1505 | " \n", 1506 | " \n", 1507 | " \n", 1508 | " \n", 1509 | " \n", 1510 | " \n", 1511 | " \n", 1512 | " \n", 1513 | " \n", 1514 | " \n", 1515 | " \n", 1516 | " \n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | " \n", 1558 | " \n", 1559 | " \n", 1560 | " \n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | " \n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | "
Car ModelMake YearFuel TypeKMs drivenEngine Displacement(CC)TransmissionMilage(kmpl)Max Power(bhp)Torque(Nm)Seating CapacityColorGear BoxSteering TypeFront Brake TypeRear Brake TypeTyre VolumeEngine TypeNo of CylinderTurbo ChargerSuper ChargerLength(mm)Width(mm)Height(mm)Price(Rs)
0BMW 5 Series 520d2012Diesel56,0001995Automatic18.4817735.7@ 1,750-3,000(kgm@ rpm)5Brown6 SpeedPowerVentilated discsVentilated discsTubeless,RadialIn-Line Engine4NoYes48411846146810.9 Lakh*
1Audi Q3 35 TDI Quattro Premium Plus2016Diesel97,0001968Automatic15.73174.33380@ 1750-2500rpm5Silver7-Speed S-TronicPowerVentilated DiscDrumTubeless,RadialTDI Diesel Engine4YesNo43852019160817.5 Lakh*
2Hyundai Creta SX Opt Diesel AT2021Diesel8001493Automatic18.5113.42250nm@ 1500-2750rpm5Grey6-SpeedPowerDiscDiscTubeless,Radial1.5L CRDi Diesel4Yes-43001790163520.25 Lakh*
3Maruti Baleno Alpha CVT2020Petrol13,8001197Automatic19.5681.80113@ 4200rpm5BlueCVTElectricDiscDrumTubeless,Radial1.2L VVT Engine4NoNo3995174515109 Lakh*
4Tata Tiago 1.2 Revotron XZ2018Petrol20,4631199Manual23.8484114@ 3500rpmWhite5 Speed4.3 Lakh*
\n", 1580 | "
" 1581 | ], 1582 | "text/plain": [ 1583 | " Car Model Make Year Fuel Type KMs driven \\\n", 1584 | "0 BMW 5 Series 520d 2012 Diesel 56,000 \n", 1585 | "1 Audi Q3 35 TDI Quattro Premium Plus 2016 Diesel 97,000 \n", 1586 | "2 Hyundai Creta SX Opt Diesel AT 2021 Diesel 800 \n", 1587 | "3 Maruti Baleno Alpha CVT 2020 Petrol 13,800 \n", 1588 | "4 Tata Tiago 1.2 Revotron XZ 2018 Petrol 20,463 \n", 1589 | "\n", 1590 | " Engine Displacement(CC) Transmission Milage(kmpl) Max Power(bhp) \\\n", 1591 | "0 1995 Automatic 18.48 177 \n", 1592 | "1 1968 Automatic 15.73 174.33 \n", 1593 | "2 1493 Automatic 18.5 113.42 \n", 1594 | "3 1197 Automatic 19.56 81.80 \n", 1595 | "4 1199 Manual 23.84 84 \n", 1596 | "\n", 1597 | " Torque(Nm) Seating Capacity Color Gear Box \\\n", 1598 | "0 35.7@ 1,750-3,000(kgm@ rpm) 5 Brown 6 Speed \n", 1599 | "1 380@ 1750-2500rpm 5 Silver 7-Speed S-Tronic \n", 1600 | "2 250nm@ 1500-2750rpm 5 Grey 6-Speed \n", 1601 | "3 113@ 4200rpm 5 Blue CVT \n", 1602 | "4 114@ 3500rpm White 5 Speed \n", 1603 | "\n", 1604 | " Steering Type Front Brake Type Rear Brake Type Tyre Volume \\\n", 1605 | "0 Power Ventilated discs Ventilated discs Tubeless,Radial \n", 1606 | "1 Power Ventilated Disc Drum Tubeless,Radial \n", 1607 | "2 Power Disc Disc Tubeless,Radial \n", 1608 | "3 Electric Disc Drum Tubeless,Radial \n", 1609 | "4 \n", 1610 | "\n", 1611 | " Engine Type No of Cylinder Turbo Charger Super Charger Length(mm) \\\n", 1612 | "0 In-Line Engine 4 No Yes 4841 \n", 1613 | "1 TDI Diesel Engine 4 Yes No 4385 \n", 1614 | "2 1.5L CRDi Diesel 4 Yes - 4300 \n", 1615 | "3 1.2L VVT Engine 4 No No 3995 \n", 1616 | "4 \n", 1617 | "\n", 1618 | " Width(mm) Height(mm) Price(Rs) \n", 1619 | "0 1846 1468 10.9 Lakh* \n", 1620 | "1 2019 1608 17.5 Lakh* \n", 1621 | "2 1790 1635 20.25 Lakh* \n", 1622 | "3 1745 1510 9 Lakh* \n", 1623 | "4 4.3 Lakh* " 1624 | ] 1625 | }, 1626 | "execution_count": 18, 1627 | "metadata": {}, 1628 | "output_type": "execute_result" 1629 | } 1630 | ], 1631 | "source": [ 1632 | "data = list(zip(Model, Make_year, Fuel_Type, KMs_driven, Engine_displacement, Transmission, Milage,\n", 1633 | " Max_power, Torque, Seats, Color, Gear_Box, Steering_Type, Front_Brake_Type, Rear_Brake_Type,\n", 1634 | " Tyre_Volume, Engine_Type, No_of_cylinder,\n", 1635 | " Turbo_charger, Super_charger, Length, Width, Height, Price))\n", 1636 | "Batch7 = pd.DataFrame(data, columns=['Car Model', 'Make Year', 'Fuel Type', 'KMs driven', 'Engine Displacement(CC)',\n", 1637 | " 'Transmission', 'Milage(kmpl)', 'Max Power(bhp)', 'Torque(Nm)', 'Seating Capacity',\n", 1638 | " 'Color', 'Gear Box', 'Steering Type', 'Front Brake Type', 'Rear Brake Type',\n", 1639 | " 'Tyre Volume', 'Engine Type', 'No of Cylinder', \n", 1640 | " 'Turbo Charger', 'Super Charger', 'Length(mm)', 'Width(mm)',\n", 1641 | " 'Height(mm)', 'Price(Rs)'])\n", 1642 | "\n", 1643 | "pd.set_option('display.max_columns', None)\n", 1644 | "Batch7.head(5)" 1645 | ] 1646 | }, 1647 | { 1648 | "cell_type": "code", 1649 | "execution_count": 198, 1650 | "metadata": {}, 1651 | "outputs": [ 1652 | { 1653 | "data": { 1654 | "text/plain": [ 1655 | "500" 1656 | ] 1657 | }, 1658 | "execution_count": 198, 1659 | "metadata": {}, 1660 | "output_type": "execute_result" 1661 | } 1662 | ], 1663 | "source": [ 1664 | "len(Batch1)" 1665 | ] 1666 | }, 1667 | { 1668 | "cell_type": "markdown", 1669 | "metadata": {}, 1670 | "source": [ 1671 | "### **For Next Batches same code of batch 1 is Re-Run again & again.**" 1672 | ] 1673 | }, 1674 | { 1675 | "cell_type": "code", 1676 | 
"execution_count": 14, 1677 | "metadata": {}, 1678 | "outputs": [ 1679 | { 1680 | "data": { 1681 | "text/plain": [ 1682 | "994" 1683 | ] 1684 | }, 1685 | "execution_count": 14, 1686 | "metadata": {}, 1687 | "output_type": "execute_result" 1688 | } 1689 | ], 1690 | "source": [ 1691 | "len(Batch2)" 1692 | ] 1693 | }, 1694 | { 1695 | "cell_type": "code", 1696 | "execution_count": 22, 1697 | "metadata": {}, 1698 | "outputs": [ 1699 | { 1700 | "data": { 1701 | "text/plain": [ 1702 | "1457" 1703 | ] 1704 | }, 1705 | "execution_count": 22, 1706 | "metadata": {}, 1707 | "output_type": "execute_result" 1708 | } 1709 | ], 1710 | "source": [ 1711 | "len(Batch3)" 1712 | ] 1713 | }, 1714 | { 1715 | "cell_type": "code", 1716 | "execution_count": 27, 1717 | "metadata": {}, 1718 | "outputs": [ 1719 | { 1720 | "data": { 1721 | "text/plain": [ 1722 | "3436" 1723 | ] 1724 | }, 1725 | "execution_count": 27, 1726 | "metadata": {}, 1727 | "output_type": "execute_result" 1728 | } 1729 | ], 1730 | "source": [ 1731 | "len(Batch4)" 1732 | ] 1733 | }, 1734 | { 1735 | "cell_type": "code", 1736 | "execution_count": 16, 1737 | "metadata": {}, 1738 | "outputs": [ 1739 | { 1740 | "data": { 1741 | "text/plain": [ 1742 | "2430" 1743 | ] 1744 | }, 1745 | "execution_count": 16, 1746 | "metadata": {}, 1747 | "output_type": "execute_result" 1748 | } 1749 | ], 1750 | "source": [ 1751 | "len(Batch5)" 1752 | ] 1753 | }, 1754 | { 1755 | "cell_type": "code", 1756 | "execution_count": 24, 1757 | "metadata": {}, 1758 | "outputs": [ 1759 | { 1760 | "data": { 1761 | "text/plain": [ 1762 | "740" 1763 | ] 1764 | }, 1765 | "execution_count": 24, 1766 | "metadata": {}, 1767 | "output_type": "execute_result" 1768 | } 1769 | ], 1770 | "source": [ 1771 | "len(Batch6)" 1772 | ] 1773 | }, 1774 | { 1775 | "cell_type": "code", 1776 | "execution_count": 19, 1777 | "metadata": {}, 1778 | "outputs": [ 1779 | { 1780 | "data": { 1781 | "text/plain": [ 1782 | "1612" 1783 | ] 1784 | }, 1785 | "execution_count": 19, 1786 | "metadata": {}, 1787 | "output_type": "execute_result" 1788 | } 1789 | ], 1790 | "source": [ 1791 | "len(Batch7)" 1792 | ] 1793 | }, 1794 | { 1795 | "cell_type": "markdown", 1796 | "metadata": {}, 1797 | "source": [ 1798 | "#### **Exporting Batch wise data in Excel file.**" 1799 | ] 1800 | }, 1801 | { 1802 | "cell_type": "code", 1803 | "execution_count": 199, 1804 | "metadata": {}, 1805 | "outputs": [], 1806 | "source": [ 1807 | "# Saving Batch1 data in excel file\n", 1808 | "Batch1.to_excel('Batch1 (0-500).xlsx', index = False)" 1809 | ] 1810 | }, 1811 | { 1812 | "cell_type": "code", 1813 | "execution_count": 15, 1814 | "metadata": {}, 1815 | "outputs": [], 1816 | "source": [ 1817 | "# Saving batch 2 data in excel \n", 1818 | "Batch2.to_excel('Batch1 (500-1500).xlsx', index = False)" 1819 | ] 1820 | }, 1821 | { 1822 | "cell_type": "code", 1823 | "execution_count": 23, 1824 | "metadata": {}, 1825 | "outputs": [], 1826 | "source": [ 1827 | "# Saving batch 3 data in excel \n", 1828 | "Batch3.to_excel('Batch3 (1500-3000).xlsx', index = False)" 1829 | ] 1830 | }, 1831 | { 1832 | "cell_type": "code", 1833 | "execution_count": 28, 1834 | "metadata": {}, 1835 | "outputs": [], 1836 | "source": [ 1837 | "# Saving Batch4 data in excel file 3000-5000\n", 1838 | "Batch4.to_excel('Batch 4 (3000 - 5000).xlsx', index = False)" 1839 | ] 1840 | }, 1841 | { 1842 | "cell_type": "code", 1843 | "execution_count": 17, 1844 | "metadata": {}, 1845 | "outputs": [], 1846 | "source": [ 1847 | "# Saving Batch5 data in excel file 5000 - 7500\n", 1848 | 
"Batch5.to_excel('Batch 5 (5000).xlsx', index = False)" 1849 | ] 1850 | }, 1851 | { 1852 | "cell_type": "code", 1853 | "execution_count": 25, 1854 | "metadata": {}, 1855 | "outputs": [], 1856 | "source": [ 1857 | "# Saving Batch6 data in excel file 7500 - 8500\n", 1858 | "Batch6.to_excel('Batch 6 (7500-8250).xlsx', index = False)" 1859 | ] 1860 | }, 1861 | { 1862 | "cell_type": "code", 1863 | "execution_count": 20, 1864 | "metadata": {}, 1865 | "outputs": [], 1866 | "source": [ 1867 | "# Saving Batch7 data in excel file 7500 - 8500\n", 1868 | "Batch7.to_excel('Batch 7 (8250-9800).xlsx', index = False)" 1869 | ] 1870 | }, 1871 | { 1872 | "cell_type": "code", 1873 | "execution_count": 21, 1874 | "metadata": {}, 1875 | "outputs": [ 1876 | { 1877 | "data": { 1878 | "text/plain": [ 1879 | "(1612, 24)" 1880 | ] 1881 | }, 1882 | "execution_count": 21, 1883 | "metadata": {}, 1884 | "output_type": "execute_result" 1885 | } 1886 | ], 1887 | "source": [ 1888 | "Batch7.shape" 1889 | ] 1890 | }, 1891 | { 1892 | "cell_type": "markdown", 1893 | "metadata": {}, 1894 | "source": [ 1895 | "## **Summary** :\n", 1896 | "- **We have scrape more than 11000 cars details with 24 features.**" 1897 | ] 1898 | }, 1899 | { 1900 | "cell_type": "code", 1901 | "execution_count": null, 1902 | "metadata": {}, 1903 | "outputs": [], 1904 | "source": [] 1905 | } 1906 | ], 1907 | "metadata": { 1908 | "kernelspec": { 1909 | "display_name": "Python 3", 1910 | "language": "python", 1911 | "name": "python3" 1912 | }, 1913 | "language_info": { 1914 | "codemirror_mode": { 1915 | "name": "ipython", 1916 | "version": 3 1917 | }, 1918 | "file_extension": ".py", 1919 | "mimetype": "text/x-python", 1920 | "name": "python", 1921 | "nbconvert_exporter": "python", 1922 | "pygments_lexer": "ipython3", 1923 | "version": "3.8.5" 1924 | } 1925 | }, 1926 | "nbformat": 4, 1927 | "nbformat_minor": 4 1928 | } 1929 | -------------------------------------------------------------------------------- /Project Used Car price predication using ML/Car price prediction using ML ppt.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Car price prediction using ML ppt.pptx -------------------------------------------------------------------------------- /Project Used Car price predication using ML/Project Report Car Price Predication .pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Project Report Car Price Predication .pdf -------------------------------------------------------------------------------- /Project Used Car price predication using ML/Read me.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Internship 2 | ## This Repository includes project done at Flip Robo as part of Data Sciene & Machine Learning Internship 3 | - *Internship Duration - **6 Months** (Sep 2021 - March 2022)* 4 | - *For some Projects data set was provided while for **remaining projects webscraping is done using Selenium** before model building* 5 | - *For each project **Detail Project Report and 
Presentation are prepared along with the ML Model Jupyter Notebook**, which you can find in the respective project repository* 6 | - *A literature review was done for all projects, and you can find the **reference research papers** used in each project in its Project Report* 7 | 8 | ### **During the internship I worked on several Web Scraping, Machine Learning and Natural Language Processing (NLP) projects, as mentioned below:** 9 | #### Web Scraping Assignments 10 | 1. Web Scraping Assignment 1 - Beautiful Soup 11 | 2. Web Scraping Assignment 2 - Selenium 12 | 3. Web Scraping Assignment 3 - Selenium Exception Handling 13 | 4. Web Scraping Assignment 5 - Selenium 14 | 5. Worksheet Assignment on ML, Stats and Python 15 | 16 | #### Machine Learning (ML) Projects 17 | 1. [Micro Credit Defaulter {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Micro%20Credit%20Defaulter%20Project) 18 | 2. [Used Car price predication using ML *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Project%20Used%20Car%20price%20predication%20using%20ML) 19 | 3. [Customer Retention in Ecommerce sector {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Project%20Customer%20Retention%20in%20Ecommerce%20sector) 20 | 4. [Flight Price Predication using Machine Learning *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Flight%20Price%20Predication%20using%20Machine%20Learning) 21 | 5. [Surprise Housing - Housing Price Predication & Analysis Project {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Surprise%20Housing%20-%20Housing%20Price%20Predication%20%26%20Analysis%20Project) 22 | 23 | #### Natural Language Processing (NLP) Projects 24 | 1. [Product Review Rating Predication Using NLP *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Product%20Review%20Rating%20Predication%20Using%20NLP) 25 | 2. [Malignant Commentes Classifier - Multi Label Classification Project using NLP {Data Provided} ](https://github.com/Lab-of-Infinity/Internship/tree/main/Malignant%20Commentes%20Classifier%20-%20Multi%20Label%20Classification%20Project%20using%20NLP) 26 | -------------------------------------------------------------------------------- /Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/Data Description.txt: -------------------------------------------------------------------------------- 1 | MSSubClass: Identifies the type of dwelling involved in the sale. 2 | 3 | 20 1-STORY 1946 & NEWER ALL STYLES 4 | 30 1-STORY 1945 & OLDER 5 | 40 1-STORY W/FINISHED ATTIC ALL AGES 6 | 45 1-1/2 STORY - UNFINISHED ALL AGES 7 | 50 1-1/2 STORY FINISHED ALL AGES 8 | 60 2-STORY 1946 & NEWER 9 | 70 2-STORY 1945 & OLDER 10 | 75 2-1/2 STORY ALL AGES 11 | 80 SPLIT OR MULTI-LEVEL 12 | 85 SPLIT FOYER 13 | 90 DUPLEX - ALL STYLES AND AGES 14 | 120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER 15 | 150 1-1/2 STORY PUD - ALL AGES 16 | 160 2-STORY PUD - 1946 & NEWER 17 | 180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER 18 | 190 2 FAMILY CONVERSION - ALL STYLES AND AGES 19 | 20 | MSZoning: Identifies the general zoning classification of the sale. 
       A        Agriculture
       C        Commercial
       FV       Floating Village Residential
       I        Industrial
       RH       Residential High Density
       RL       Residential Low Density
       RP       Residential Low Density Park
       RM       Residential Medium Density

LotFrontage: Linear feet of street connected to property

LotArea: Lot size in square feet

Street: Type of road access to property

       Grvl     Gravel
       Pave     Paved

Alley: Type of alley access to property

       Grvl     Gravel
       Pave     Paved
       NA       No alley access

LotShape: General shape of property

       Reg      Regular
       IR1      Slightly irregular
       IR2      Moderately Irregular
       IR3      Irregular

LandContour: Flatness of the property

       Lvl      Near Flat/Level
       Bnk      Banked - Quick and significant rise from street grade to building
       HLS      Hillside - Significant slope from side to side
       Low      Depression

Utilities: Type of utilities available

       AllPub   All public Utilities (E,G,W,& S)
       NoSewr   Electricity, Gas, and Water (Septic Tank)
       NoSeWa   Electricity and Gas Only
       ELO      Electricity only

LotConfig: Lot configuration

       Inside   Inside lot
       Corner   Corner lot
       CulDSac  Cul-de-sac
       FR2      Frontage on 2 sides of property
       FR3      Frontage on 3 sides of property

LandSlope: Slope of property

       Gtl      Gentle slope
       Mod      Moderate Slope
       Sev      Severe Slope

Neighborhood: Physical locations within Ames city limits

       Blmngtn  Bloomington Heights
       Blueste  Bluestem
       BrDale   Briardale
       BrkSide  Brookside
       ClearCr  Clear Creek
       CollgCr  College Creek
       Crawfor  Crawford
       Edwards  Edwards
       Gilbert  Gilbert
       IDOTRR   Iowa DOT and Rail Road
       MeadowV  Meadow Village
       Mitchel  Mitchell
       NAmes    North Ames
       NoRidge  Northridge
       NPkVill  Northpark Villa
       NridgHt  Northridge Heights
       NWAmes   Northwest Ames
       OldTown  Old Town
       SWISU    South & West of Iowa State University
       Sawyer   Sawyer
       SawyerW  Sawyer West
       Somerst  Somerset
       StoneBr  Stone Brook
       Timber   Timberland
       Veenker  Veenker

Condition1: Proximity to various conditions

       Artery   Adjacent to arterial street
       Feedr    Adjacent to feeder street
       Norm     Normal
       RRNn     Within 200' of North-South Railroad
       RRAn     Adjacent to North-South Railroad
       PosN     Near positive off-site feature--park, greenbelt, etc.
       PosA     Adjacent to positive off-site feature
       RRNe     Within 200' of East-West Railroad
       RRAe     Adjacent to East-West Railroad

Condition2: Proximity to various conditions (if more than one is present)

       Artery   Adjacent to arterial street
       Feedr    Adjacent to feeder street
       Norm     Normal
       RRNn     Within 200' of North-South Railroad
       RRAn     Adjacent to North-South Railroad
       PosN     Near positive off-site feature--park, greenbelt, etc.
       PosA     Adjacent to positive off-site feature
       RRNe     Within 200' of East-West Railroad
       RRAe     Adjacent to East-West Railroad

BldgType: Type of dwelling

       1Fam     Single-family Detached
       2FmCon   Two-family Conversion; originally built as one-family dwelling
       Duplx    Duplex
       TwnhsE   Townhouse End Unit
       TwnhsI   Townhouse Inside Unit

HouseStyle: Style of dwelling

       1Story   One story
       1.5Fin   One and one-half story: 2nd level finished
       1.5Unf   One and one-half story: 2nd level unfinished
       2Story   Two story
       2.5Fin   Two and one-half story: 2nd level finished
       2.5Unf   Two and one-half story: 2nd level unfinished
       SFoyer   Split Foyer
       SLvl     Split Level

OverallQual: Rates the overall material and finish of the house

       10       Very Excellent
       9        Excellent
       8        Very Good
       7        Good
       6        Above Average
       5        Average
       4        Below Average
       3        Fair
       2        Poor
       1        Very Poor

OverallCond: Rates the overall condition of the house

       10       Very Excellent
       9        Excellent
       8        Very Good
       7        Good
       6        Above Average
       5        Average
       4        Below Average
       3        Fair
       2        Poor
       1        Very Poor

YearBuilt: Original construction date

YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)

RoofStyle: Type of roof

       Flat     Flat
       Gable    Gable
       Gambrel  Gambrel (Barn)
       Hip      Hip
       Mansard  Mansard
       Shed     Shed

RoofMatl: Roof material

       ClyTile  Clay or Tile
       CompShg  Standard (Composite) Shingle
       Membran  Membrane
       Metal    Metal
       Roll     Roll
       Tar&Grv  Gravel & Tar
       WdShake  Wood Shakes
       WdShngl  Wood Shingles

Exterior1st: Exterior covering on house

       AsbShng  Asbestos Shingles
       AsphShn  Asphalt Shingles
       BrkComm  Brick Common
       BrkFace  Brick Face
       CBlock   Cinder Block
       CemntBd  Cement Board
       HdBoard  Hard Board
       ImStucc  Imitation Stucco
       MetalSd  Metal Siding
       Other    Other
       Plywood  Plywood
       PreCast  PreCast
       Stone    Stone
       Stucco   Stucco
       VinylSd  Vinyl Siding
       Wd Sdng  Wood Siding
       WdShing  Wood Shingles

Exterior2nd: Exterior covering on house (if more than one material)

       AsbShng  Asbestos Shingles
       AsphShn  Asphalt Shingles
       BrkComm  Brick Common
       BrkFace  Brick Face
       CBlock   Cinder Block
       CemntBd  Cement Board
       HdBoard  Hard Board
       ImStucc  Imitation Stucco
       MetalSd  Metal Siding
       Other    Other
       Plywood  Plywood
       PreCast  PreCast
       Stone    Stone
       Stucco   Stucco
       VinylSd  Vinyl Siding
       Wd Sdng  Wood Siding
       WdShing  Wood Shingles

MasVnrType: Masonry veneer type

       BrkCmn   Brick Common
       BrkFace  Brick Face
       CBlock   Cinder Block
       None     None
       Stone    Stone

MasVnrArea: Masonry veneer area in square feet

ExterQual: Evaluates the quality of the material on the exterior

       Ex       Excellent
       Gd       Good
       TA       Average/Typical
       Fa       Fair
       Po       Poor

ExterCond: Evaluates the present condition of the material on the exterior

       Ex       Excellent
       Gd       Good
       TA       Average/Typical
       Fa       Fair
       Po       Poor

Foundation: Type of foundation

       BrkTil   Brick & Tile
       CBlock   Cinder Block
       PConc    Poured Concrete
       Slab     Slab
       Stone    Stone
       Wood     Wood

BsmtQual: Evaluates the height of the basement

       Ex       Excellent (100+ inches)
       Gd       Good (90-99 inches)
       TA       Typical (80-89 inches)
       Fa       Fair (70-79 inches)
       Po       Poor (<70 inches)
       NA       No Basement

BsmtCond: Evaluates the general condition of the basement

       Ex       Excellent
       Gd       Good
       TA       Typical - slight dampness allowed
       Fa       Fair - dampness or some cracking or settling
       Po       Poor - Severe cracking, settling, or wetness
       NA       No Basement

BsmtExposure: Refers to walkout or garden level walls

       Gd       Good Exposure
       Av       Average Exposure (split levels or foyers typically score average or above)
       Mn       Minimum Exposure
       No       No Exposure
       NA       No Basement

BsmtFinType1: Rating of basement finished area

       GLQ      Good Living Quarters
       ALQ      Average Living Quarters
       BLQ      Below Average Living Quarters
       Rec      Average Rec Room
       LwQ      Low Quality
       Unf      Unfinished
       NA       No Basement

BsmtFinSF1: Type 1 finished square feet

BsmtFinType2: Rating of basement finished area (if multiple types)

       GLQ      Good Living Quarters
       ALQ      Average Living Quarters
       BLQ      Below Average Living Quarters
       Rec      Average Rec Room
       LwQ      Low Quality
       Unf      Unfinished
       NA       No Basement

BsmtFinSF2: Type 2 finished square feet

BsmtUnfSF: Unfinished square feet of basement area

TotalBsmtSF: Total square feet of basement area

Heating: Type of heating

       Floor    Floor Furnace
       GasA     Gas forced warm air furnace
       GasW     Gas hot water or steam heat
       Grav     Gravity furnace
       OthW     Hot water or steam heat other than gas
       Wall     Wall furnace

HeatingQC: Heating quality and condition

       Ex       Excellent
       Gd       Good
       TA       Average/Typical
       Fa       Fair
       Po       Poor

CentralAir: Central air conditioning

       N        No
       Y        Yes

Electrical: Electrical system

       SBrkr    Standard Circuit Breakers & Romex
       FuseA    Fuse Box over 60 AMP and all Romex wiring (Average)
       FuseF    60 AMP Fuse Box and mostly Romex wiring (Fair)
       FuseP    60 AMP Fuse Box and mostly knob & tube wiring (poor)
       Mix      Mixed

1stFlrSF: First Floor square feet

2ndFlrSF: Second floor square feet

LowQualFinSF: Low quality finished square feet (all floors)

GrLivArea: Above grade (ground) living area square feet

BsmtFullBath: Basement full bathrooms

BsmtHalfBath: Basement half bathrooms

FullBath: Full bathrooms above grade

HalfBath: Half baths above grade

Bedroom: Bedrooms above grade (does NOT include basement bedrooms)

Kitchen: Kitchens above grade

KitchenQual: Kitchen quality

       Ex       Excellent
       Gd       Good
       TA       Typical/Average
       Fa       Fair
       Po       Poor

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality (Assume typical unless deductions are warranted)

       Typ      Typical Functionality
       Min1     Minor Deductions 1
       Min2     Minor Deductions 2
       Mod      Moderate Deductions
       Maj1     Major Deductions 1
       Maj2     Major Deductions 2
       Sev      Severely Damaged
       Sal      Salvage only

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

       Ex       Excellent - Exceptional Masonry Fireplace
       Gd       Good - Masonry Fireplace in main level
       TA       Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
       Fa       Fair - Prefabricated Fireplace in basement
       Po       Poor - Ben Franklin Stove
       NA       No Fireplace

GarageType: Garage location

       2Types   More than one type of garage
       Attchd   Attached to home
       Basment  Basement Garage
       BuiltIn  Built-In (Garage part of house - typically has room above garage)
       CarPort  Car Port
       Detchd   Detached from home
       NA       No Garage

GarageYrBlt: Year garage was built

GarageFinish: Interior finish of the garage

       Fin      Finished
       RFn      Rough Finished
       Unf      Unfinished
       NA       No Garage

GarageCars: Size of garage in car capacity

GarageArea: Size of garage in square feet

GarageQual: Garage quality

       Ex       Excellent
       Gd       Good
       TA       Typical/Average
       Fa       Fair
       Po       Poor
       NA       No Garage

GarageCond: Garage condition

       Ex       Excellent
       Gd       Good
       TA       Typical/Average
       Fa       Fair
       Po       Poor
       NA       No Garage

PavedDrive: Paved driveway

       Y        Paved
       P        Partial Pavement
       N        Dirt/Gravel

WoodDeckSF: Wood deck area in square feet

OpenPorchSF: Open porch area in square feet

EnclosedPorch: Enclosed porch area in square feet

3SsnPorch: Three season porch area in square feet

ScreenPorch: Screen porch area in square feet

PoolArea: Pool area in square feet

PoolQC: Pool quality

       Ex       Excellent
       Gd       Good
       TA       Average/Typical
       Fa       Fair
       NA       No Pool

Fence: Fence quality

       GdPrv    Good Privacy
       MnPrv    Minimum Privacy
       GdWo     Good Wood
       MnWw     Minimum Wood/Wire
       NA       No Fence

MiscFeature: Miscellaneous feature not covered in other categories

       Elev     Elevator
       Gar2     2nd Garage (if not described in garage section)
       Othr     Other
       Shed     Shed (over 100 SF)
       TenC     Tennis Court
       NA       None

MiscVal: $Value of miscellaneous feature

MoSold: Month Sold (MM)

YrSold: Year Sold (YYYY)

SaleType: Type of sale

       WD       Warranty Deed - Conventional
       CWD      Warranty Deed - Cash
       VWD      Warranty Deed - VA Loan
       New      Home just constructed and sold
       COD      Court Officer Deed/Estate
       Con      Contract 15% Down payment regular terms
       ConLw    Contract Low Down payment and low interest
       ConLI    Contract Low Interest
       ConLD    Contract Low Down
       Oth      Other

SaleCondition: Condition of sale

       Normal   Normal Sale
       Abnorml  Abnormal Sale - trade, foreclosure, short sale
       AdjLand  Adjoining Land Purchase
       Alloca   Allocation - two linked properties with separate deeds, typically condo with a garage unit
       Family   Sale between family members
       Partial  Home was not completed when last assessed (associated with New Homes)
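Many of the fields above (ExterQual, ExterCond, BsmtQual, BsmtCond, HeatingQC, KitchenQual, FireplaceQu, GarageQual, GarageCond, PoolQC) share one ordered scale, Ex > Gd > TA > Fa > Po, with NA meaning the feature is absent rather than data being missing. A minimal sketch of how such columns could be ordinally encoded with pandas; the file name train.csv and the column list are illustrative assumptions, not something this description prescribes:

```python
import pandas as pd

# Shared ordinal scale from the description above; NA (feature absent) -> 0.
QUALITY_SCALE = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1, "NA": 0}

# Illustrative subset of the quality/condition columns documented above.
ordinal_cols = ["ExterQual", "ExterCond", "BsmtQual", "BsmtCond", "HeatingQC",
                "KitchenQual", "FireplaceQu", "GarageQual", "GarageCond", "PoolQC"]

# keep_default_na=False so a literal "NA" survives as a category instead of
# being parsed as a missing value; truly empty cells still become NaN.
df = pd.read_csv("train.csv", keep_default_na=False, na_values=[""])

for col in ordinal_cols:
    # Unmapped or empty values fall back to 0 (feature absent).
    df[col] = df[col].map(QUALITY_SCALE).fillna(0).astype(int)

print(df[ordinal_cols].head())
```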
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/HOUSING Use Case 2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/HOUSING Use Case 2.pdf
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/sample documentation.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/sample documentation.docx
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/README.md:
--------------------------------------------------------------------------------
## Surprise Housing - Housing Price Prediction & Analysis Project

Housing is a basic need for people around the globe, and the housing and real estate market is therefore one of the major contributors to the world's economy. It is a very large market with many companies working in the domain. Data science is an important tool for solving problems in this domain: it helps companies increase their overall revenue and profits, improve their marketing strategies, and focus on changing trends in house sales and purchases. Predictive modelling, market mix modelling, and recommendation systems are some of the machine learning techniques used to achieve business goals for housing companies. Our problem is related to one such housing company.

A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual values and flip them at a higher price. For this purpose, the company has collected a data set from the sale of houses in Australia. The data is provided in the CSV file below.

The company is looking at prospective properties to buy in order to enter the market. You are required to build a model using Machine Learning to predict the actual value of the prospective properties and decide whether to invest in them or not. For this, the company wants to know:

• **Which variables are important in predicting the price of a house?**

• **How do these variables describe the price of the house?**

### Business Goal:
You are required to model the price of houses with the available independent variables. This model will then be used by the management to understand how exactly the prices vary with the variables. They can accordingly adjust the strategy of the firm and concentrate on areas that will yield high returns. Further, the model will be a good way for the management to understand the pricing dynamics of a new market.
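The two business questions map naturally onto a regularised regression: coefficients that survive shrinkage indicate which variables matter, and their signs and magnitudes describe how each one moves the price. A minimal sketch under assumptions this README does not fix (training data in train.csv, a target column named SalePrice, scikit-learn available), using Lasso as one reasonable choice rather than the project's actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed file and target names; adjust to the actual dataset.
df = pd.read_csv("train.csv")

# Numeric predictors only, to keep the sketch short; a full solution would
# also encode the categorical columns from the data description.
X = df.select_dtypes(include="number").drop(columns=["SalePrice"]).fillna(0)
y = df["SalePrice"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardise features so the Lasso penalty treats them comparably.
scaler = StandardScaler().fit(X_train)
model = Lasso(alpha=100.0, max_iter=10000).fit(scaler.transform(X_train), y_train)

print("R^2 on held-out data:", model.score(scaler.transform(X_test), y_test))

# Lasso shrinks unimportant coefficients to exactly zero, so the largest
# surviving coefficients answer "which variables are important".
importances = pd.Series(model.coef_, index=X.columns)
print(importances.sort_values(key=abs, ascending=False).head(10))  # pandas >= 1.1
```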
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing - Housing Price Predication & Analysis Project.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing - Housing Price Predication & Analysis Project.pdf
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing Price Predication .pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing Price Predication .pptx
--------------------------------------------------------------------------------
/Web Scraping 1 Assignment/Read me.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/Fruits_Cars_ML_google_images.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Web Scraping Selenium Assignment 3/Fruits_Cars_ML_google_images.zip
--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/Selenium Exception Handling Assignment:
--------------------------------------------------------------------------------
Web Scraping Assignment 3
at FlipRobo
on Selenium Exception Handling
--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/WEB-SCRAPING-ASSIGNMENT-3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Web Scraping Selenium Assignment 3/WEB-SCRAPING-ASSIGNMENT-3.pdf
--------------------------------------------------------------------------------
/WebScraping Assignment 4 Selenium/Web Scraping Assignment 4:
--------------------------------------------------------------------------------
Web scraping assignment 4 on Selenium
--------------------------------------------------------------------------------
/Webscraping Assignment 2 Selenium/Webscraping 2.md:
--------------------------------------------------------------------------------
Selenium web scraping assignment 2
--------------------------------------------------------------------------------
/Worksheet_set_1/Machine Learning Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Machine Learning Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Python Worksheet 1.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11. Write a python program to find the factorial of a number"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter the Number :30\n",
      "Factorial is the product of all positive integers less than or equal to that number\n",
      "The Factorial of 30 is 265252859812191058636308480000000\n"
     ]
    }
   ],
   "source": [
    "# program to find the factorial of a number\n",
    "def factorial(num):\n",
    "\n",
    "    print('Factorial is the product of all positive integers less than or equal to that number')\n",
    "\n",
    "    fact = 1\n",
    "\n",
    "    if num >= 1:\n",
    "        # use a separate loop variable so the input number is not shadowed\n",
    "        for i in range(1, num + 1):\n",
    "            fact = fact * i\n",
    "        print('The Factorial of', num, 'is', fact)\n",
    "\n",
    "    elif num == 0:\n",
    "        print('The Factorial of 0 is 1')\n",
    "\n",
    "    else:\n",
    "        print('The factorial of a negative number does not exist')\n",
    "\n",
    "num = int(input('Enter the Number :'))\n",
    "factorial(num)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Alternate method using factorial from the math library (math.factorial())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter the Number :15\n",
      "Factorial of 15 is 1307674368000\n"
     ]
    }
   ],
   "source": [
    "# program to find the factorial of a number\n",
    "\n",
    "import math\n",
    "\n",
    "num = int(input('Enter the Number :'))\n",
    "print('Factorial of', num, 'is', math.factorial(num))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 12. Write a python program to find whether a number is prime or composite."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter The Number:7\n",
      "7 is a Prime Number\n"
     ]
    }
   ],
   "source": [
    "num = int(input('Enter The Number:'))\n",
    "\n",
    "if num > 1:\n",
    "    for i in range(2, int(num / 2) + 1):\n",
    "        if num % i == 0:\n",
    "            print('The input number', num, 'is not a Prime Number')\n",
    "            break\n",
    "    # the for-else branch runs only when the loop found no divisor\n",
    "    else:\n",
    "        print(num, 'is a Prime Number')\n",
    "\n",
    "# 0 and 1 are not considered prime numbers\n",
    "# primality is defined only for whole numbers greater than 1 (not for negative numbers)\n",
    "\n",
    "else:\n",
    "    print(num, 'is not a Prime Number')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 13. Write a python program to check whether a given string is palindrome or not"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter the string :1256@#@6521\n",
      "The input string is Palindrome\n"
     ]
    }
   ],
   "source": [
    "Input_str = input('Enter the string :')\n",
    "\n",
    "def palindrome(s):\n",
    "    # a palindrome reads the same forwards and backwards\n",
    "    return s == s[::-1]\n",
    "\n",
    "# check whether the input string is a palindrome or not\n",
    "check = palindrome(Input_str)\n",
    "\n",
    "if check:\n",
    "    print('The input string is Palindrome')\n",
    "\n",
    "else:\n",
    "    print('The input string is not Palindrome')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 14. Write a Python program to get the third side of right-angled triangle from two given sides."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Opposite side is denoted by x\n",
      "Adjacent side is denoted by y\n",
      "Hypotenuse is denoted by z\n",
      "\n",
      "\n",
      "Which side (x, y, z) do you want to calculate? z\n",
      "\n",
      "\n",
      "Input the length of side x: 3\n",
      "Input the length of side y: 4\n",
      "The length of the Hypotenuse (z) is 5\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "\n",
    "# Print the notation of the three sides for the user\n",
    "print('Opposite side is denoted by x')\n",
    "print('Adjacent side is denoted by y')\n",
    "print('Hypotenuse is denoted by z')\n",
    "print('\\n')\n",
    "\n",
    "# Ask the user which side needs to be calculated\n",
    "choice = input('Which side (x, y, z) do you want to calculate? ')\n",
    "print('\\n')\n",
    "\n",
    "# Calculate the remaining side of the right-angled triangle using if-elif-else\n",
    "if choice == 'x':\n",
    "    y = float(input('Input the length of side y: '))\n",
    "    z = float(input('Input the length of side z: '))\n",
    "    x = math.sqrt((z * z) - (y * y))\n",
    "    print('The length of the Opposite side (x) is %g' % (x))\n",
    "\n",
    "elif choice == 'y':\n",
    "    x = float(input('Input the length of side x: '))\n",
    "    z = float(input('Input the length of side z: '))\n",
    "    # assign the computed side to y (the variable being asked for)\n",
    "    y = math.sqrt((z * z) - (x * x))\n",
    "    print('The length of the Adjacent side (y) is %g' % (y))\n",
    "\n",
    "elif choice == 'z':\n",
    "    x = float(input('Input the length of side x: '))\n",
    "    y = float(input('Input the length of side y: '))\n",
    "    z = math.sqrt((x * x) + (y * y))\n",
    "    print('The length of the Hypotenuse (z) is %g' % (z))\n",
    "\n",
    "else:\n",
    "    print('Invalid entry:')\n",
    "    print('Choose and enter the correct side to be calculated out of x, y and z')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 15. Write a python program to print the frequency of each of the characters present in a given string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter the input string:Welcome To Python\n",
      "\n",
      "\n",
      "Frequency of all characters in Input string:\n",
      " Counter({'o': 3, 'e': 2, ' ': 2, 't': 2, 'w': 1, 'l': 1, 'c': 1, 'm': 1, 'p': 1, 'y': 1, 'h': 1, 'n': 1})\n"
     ]
    }
   ],
   "source": [
    "# using collections.Counter() to get the frequency of each character in a string\n",
    "from collections import Counter\n",
    "\n",
    "# casefold() ensures that uppercase and lowercase characters are treated the same\n",
    "Input_str = input('Enter the input string:').casefold()\n",
    "print('\\n')\n",
    "res = Counter(Input_str)\n",
    "print('Frequency of all characters in Input string:\\n', res)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------
/Worksheet_set_1/Python Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Python Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Statistics Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Statistics Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Worksheet_set_1.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------