├── Flight Price Predication using Machine Learning
│   ├── Flight Price Predication Using ML Techniques.ipynb
│   ├── Flight_Price_Prediction.pdf
│   ├── Flight_Price_Prediction.pkl
│   ├── Flight_Price_dataset_2.xlsx
│   ├── Presentation on Flight Price Prediction.pptx
│   ├── Project Report on Flight Price Predication Using ML Techniques.pdf
│   ├── Read me.md
│   ├── Web Scraping Data of Flight prices.ipynb
│   └── sample-documentation.docx
├── Malignant Commentes Classifier - Multi Label Classification Project using NLP
│   ├── Malignant Comments Classifier - Multi Label Classification Project.ipynb
│   ├── Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx
│   ├── Rating Prediction Project presentation - FlipRobo.pptx
│   ├── Read me.md
│   └── test_dataset_predictions.zip
├── Micro Credit Defaulter Project
│   ├── Project Solution Files Micro Credit Project Defaulter.zip
│   └── README.md
├── Product Review Rating Predication Using NLP
│   ├── Problem-Statement.pdf
│   ├── Product Review Rating Predication Using NLP.ipynb
│   ├── Product Review Rating Predication Using NLP.pdf
│   ├── Rating Prediction Data Web Scraping .ipynb
│   ├── Rating Prediction Project presentation - FlipRobo.pptx
│   ├── Rating_Prediction_dataset.csv
│   ├── Read me.md
│   └── sample-documentation.docx
├── Project Customer Retention in Ecommerce sector
│   ├── Customer Retention Project.zip
│   ├── Customer_retention_case study.docx
│   ├── Project Customer Retention Case Study .ipynb
│   ├── Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf
│   ├── Read me.md
│   └── customer_retention_dataset.xlsx
├── Project Used Car price predication using ML
│   ├── Car Price Predication Project.zip
│   ├── Car Price Predication Using ML Part 1.ipynb
│   ├── Car Price Predication Web Scraping script.ipynb
│   ├── Car Price Prediction Part 2 ML Model .ipynb
│   ├── Car price prediction using ML ppt.pptx
│   ├── Project Report Car Price Predication .pdf
│   └── Read me.md
├── README.md
├── Surprise Housing - Housing Price Predication & Analysis Project
│   ├── Project-Housing--2---1-
│   │   └── Project-Housing_splitted
│   │       ├── Data Description.txt
│   │       ├── HOUSING Use Case 2.pdf
│   │       ├── sample documentation.docx
│   │       ├── test.csv
│   │       └── train.csv
│   ├── README.md
│   ├── Surprise Housing - Housing Price Predication & Analysis Project.ipynb
│   ├── Surprise Housing - Housing Price Predication & Analysis Project.pdf
│   └── Surprise Housing Price Predication .pptx
├── Web Scraping 1 Assignment
│   ├── Read me.md
│   └── Web Scraping Assignment1 BeautifulSoup.ipynb
├── Web Scraping Selenium Assignment 3
│   ├── Fruits_Cars_ML_google_images.zip
│   ├── Selenium Exception Handling Assignment
│   ├── WEB SCRAPING ASSIGNMENT 3 Selenium Exception Handling.ipynb
│   └── WEB-SCRAPING-ASSIGNMENT-3.pdf
├── WebScraping Assignment 4 Selenium
│   ├── Web Scraping Assignment 4
│   └── Web Scraping Assignment 4 Selenium Exception .ipynb
├── Webscraping Assignment 2 Selenium
│   ├── Web Scraping Assignment 2 Selenium.ipynb
│   └── Webscraping 2.md
└── Worksheet_set_1
    ├── Machine Learning Worksheet 1.pdf
    ├── Python Worksheet 1.ipynb
    ├── Python Worksheet 1.pdf
    ├── Statistics Worksheet 1.pdf
    └── Worksheet_set_1.md
/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pdf
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_Prediction.pkl
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/Flight_Price_dataset_2.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Flight_Price_dataset_2.xlsx
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/Presentation on Flight Price Prediction.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Presentation on Flight Price Prediction.pptx
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/Project Report on Flight Price Predication Using ML Techniques.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/Project Report on Flight Price Predication Using ML Techniques.pdf
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/Read me.md:
--------------------------------------------------------------------------------
1 | FLIGHT PRICE PREDICTION PROJECT
2 |
3 | Anyone who has booked a flight ticket knows how unexpectedly the prices vary. The cheapest
4 | available ticket on a given flight gets more or less expensive over time. This usually happens as
5 | an attempt to maximize revenue based on:
6 | 1. Time-of-purchase patterns (making sure last-minute purchases are expensive)
7 | 2. Keeping the flight as full as they want it (raising prices on a flight which is filling up in order
8 | to reduce sales and hold back inventory for those expensive last-minute
9 | purchases)
10 | So, you have to work on a project where you collect data of flight fares along with other features
11 | and build a model to predict flight fares.
12 |
13 | STEPS
14 |
15 | 1. Data Collection :
16 | You have to scrape at least 1,500 rows of data. You can scrape more data as well; it’s up to you.
17 | The more data, the better the model.
18 | In this section you have to scrape the data of flights from different websites (yatra.com,
19 | skyscanner.com, official websites of airlines, etc.). The number of columns doesn’t have a
20 | limit; it’s up to you and your creativity. Generally, these columns are airline name, date of journey,
21 | source, destination, route, departure time, arrival time, duration, total stops and the target variable
22 | price. You can add or remove some columns; it completely
23 | depends on the website from which you are fetching the data.
24 |
25 | 2. Data Analysis :
26 | After cleaning the data, you have to do some analysis on the data.
27 | Do airfares change frequently? Do they move in small increments or in large jumps? Do they tend
28 | to go up or down over time?
29 | What is the best time to buy so that the consumer can save the most by taking the least risk?
30 | Does price increase as we get near to departure date? Is Indigo cheaper than Jet Airways? Are
31 | morning flights expensive?
32 |
33 | 3. Model Building :
34 | After collecting the data, you need to build a machine learning model. Before model building, do
35 | all data pre-processing steps. Try different models with different hyperparameters and select
36 | the best model.
37 |
38 | Follow the complete life cycle of data science. Include all the steps like
39 |
40 | 1. Data Cleaning
41 | 2. Exploratory Data Analysis
42 | 3. Data Pre-processing
43 | 4. Model Building
44 | 5. Model Evaluation
45 | 6. Selecting the best model
46 |
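47 | A minimal sketch of the model-building step, assuming the scraped data has been cleaned
48 | and that the target column is named "Price" (the column name and the use of
49 | Flight_Price_dataset_2.xlsx here are illustrative assumptions, not the notebook's actual code):
50 |
51 | ```python
52 | import pandas as pd
53 | from sklearn.ensemble import RandomForestRegressor
54 | from sklearn.metrics import r2_score
55 | from sklearn.model_selection import train_test_split
56 |
57 | # Load the scraped dataset (assumed cleaned, with a numeric "Price" target)
58 | df = pd.read_excel("Flight_Price_dataset_2.xlsx")
59 | X = pd.get_dummies(df.drop(columns=["Price"]))  # one-hot encode categorical features
60 | y = df["Price"]
61 |
62 | # Hold out a test set, fit a baseline model, and evaluate it
63 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
64 | model = RandomForestRegressor(n_estimators=200, random_state=42)
65 | model.fit(X_train, y_train)
66 | print("Test R2:", r2_score(y_test, model.predict(X_test)))
67 | ```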
--------------------------------------------------------------------------------
/Flight Price Predication using Machine Learning/sample-documentation.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Flight Price Predication using Machine Learning/sample-documentation.docx
--------------------------------------------------------------------------------
/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Malignant Commentes Classifier - Multi Label Classification Project using NLP - FlipRobo.pptx
--------------------------------------------------------------------------------
/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Rating Prediction Project presentation - FlipRobo.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Rating Prediction Project presentation - FlipRobo.pptx
--------------------------------------------------------------------------------
/Malignant Commentes Classifier - Multi Label Classification Project using NLP/Read me.md:
--------------------------------------------------------------------------------
1 | Problem Statement:
2 |
3 | The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection. Online hate, described as abusive language, aggression, cyberbullying, hatefulness and many others, has been identified as a major threat on online social media platforms. Social media platforms are the most prominent grounds for such toxic behaviour.
4 | There has been a remarkable increase in cases of cyberbullying and trolling on various social media platforms. Many celebrities and influencers face backlash from people and come across hateful and offensive comments. This can take a toll on anyone and affect them mentally, leading to depression, mental illness, self-hatred and suicidal thoughts.
5 | Internet comments are bastions of hatred and vitriol. While online anonymity has provided a new outlet for aggression and hate speech, machine learning can be used to fight it. The problem we sought to solve was the tagging of internet comments that are aggressive towards other users. This means that insults to third parties such as celebrities will be tagged as inoffensive, but “u are an idiot” is clearly offensive.
6 |
7 | Our goal is to build a prototype of an online hate and abuse comment classifier which can be used to classify hateful and offensive comments so that they can be controlled and restricted from spreading hatred and cyberbullying.
8 |
9 | Data Set Description:
10 |
11 | The data set contains the training set, which has approximately 159,000 samples, and the test set, which contains nearly 153,000 samples. All the data samples contain 8 fields, which include ‘Id’, ‘Comments’, ‘Malignant’, ‘Highly malignant’, ‘Rude’, ‘Threat’, ‘Abuse’ and ‘Loathe’. The label can be either 0 or 1, where 0 denotes a NO while 1 denotes a YES. There are various comments which have multiple labels. The first attribute is a unique ID associated with each comment.
12 |
13 | The data set includes:
14 |
15 |
16 | Malignant: It is the Label column, which includes values 0 and 1, denoting if the comment is malignant or not.
17 |
18 | Highly Malignant: It denotes comments that are highly malignant and hurtful.
19 |
20 | Rude: It denotes comments that are very rude and offensive.
21 |
22 | Threat: It indicates comments that threaten someone. Abuse: It is for comments that are abusive in nature.
23 |
24 | Loathe: It describes the comments which are hateful and loathing in nature.
25 |
26 | ID: It includes the unique ID associated with each comment text given.
27 |
28 | Comment text: This column contains the comments extracted from various social media platforms.
29 |
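30 | A minimal sketch of a multi-label baseline for this task: TF-IDF features with one
31 | logistic regression per label via scikit-learn's OneVsRestClassifier (the file name and
32 | the exact column spellings below are illustrative assumptions):
33 |
34 | ```python
35 | import pandas as pd
36 | from sklearn.feature_extraction.text import TfidfVectorizer
37 | from sklearn.linear_model import LogisticRegression
38 | from sklearn.model_selection import train_test_split
39 | from sklearn.multiclass import OneVsRestClassifier
40 |
41 | labels = ["malignant", "highly_malignant", "rude", "threat", "abuse", "loathe"]  # assumed spellings
42 | df = pd.read_csv("train.csv")  # assumed file name for the training set
43 |
44 | # Turn the raw comment text into sparse TF-IDF features
45 | X = TfidfVectorizer(max_features=50000, stop_words="english").fit_transform(df["comment_text"])
46 | y = df[labels]  # 0/1 indicator matrix, one column per label
47 |
48 | # One binary classifier per label handles the multi-label structure
49 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
50 | clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
51 | clf.fit(X_train, y_train)
52 | print("Subset accuracy:", clf.score(X_test, y_test))  # exact match across all six labels
53 | ```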
--------------------------------------------------------------------------------
/Malignant Commentes Classifier - Multi Label Classification Project using NLP/test_dataset_predictions.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Malignant Commentes Classifier - Multi Label Classification Project using NLP/test_dataset_predictions.zip
--------------------------------------------------------------------------------
/Micro Credit Defaulter Project/Project Solution Files Micro Credit Project Defaulter.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Micro Credit Defaulter Project/Project Solution Files Micro Credit Project Defaulter.zip
--------------------------------------------------------------------------------
/Micro Credit Defaulter Project/README.md:
--------------------------------------------------------------------------------
1 | Micro Credit Defaulter project submission at FlipRobo.
2 |
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/Problem-Statement.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Problem-Statement.pdf
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/Product Review Rating Predication Using NLP.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Product Review Rating Predication Using NLP.pdf
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/Rating Prediction Data Web Scraping .ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "> ### **Web Scraping for Rating Review Prediction Project**\n",
8 | "***\n",
9 | "**By: Lokesh Baviskar**\n",
10 | "\n",
11 | "**Batch : Internship 20**"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "#### **Web Scraping details**:\n",
19 | "- **Data for this project is Scrap from Amazon & Flipkart.**\n",
20 | "- **Around 50000 Reviews are scrap for this project.**\n",
21 | "- **Part 1 - Scraping data from Amazon.in**\n",
22 | "- **Part 2 - Scraping data from flipkart**"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 1,
28 | "metadata": {},
29 | "outputs": [],
30 | "source": [
31 | "#Importing required libraries\n",
32 | "import pandas as pd\n",
33 | "import selenium\n",
34 | "from selenium import webdriver\n",
35 | "from selenium.common.exceptions import NoSuchElementException , StaleElementReferenceException\n",
36 | "import time\n",
37 | "import warnings\n",
38 | "warnings.filterwarnings('ignore')"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "### **Part 1- Scrapping data from Amazon**"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "#### **1.1.Headphones**"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 2,
58 | "metadata": {},
59 | "outputs": [],
60 | "source": [
61 | "#Connect to web driver\n",
62 | "driver =webdriver.Chrome(r\"C:\\chromedriver.exe\")"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 3,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "# Opening the Amazon.in\n",
72 | "driver.get('https://www.amazon.in/')"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": 4,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "# Searching headphones in the search bar and clicking the search button\n",
82 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n",
83 | "search_bar.send_keys(\"headphones\")\n",
84 | "driver.find_element_by_id('nav-search-submit-button').click()"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 5,
90 | "metadata": {},
91 | "outputs": [],
92 | "source": [
93 | "# Creating empty lists\n",
94 | "Product_URL=[]\n",
95 | "Ratings=[]\n",
96 | "Review=[]\n",
97 | "\n",
98 | "#Getting URLs of the product\n",
99 | "for i in range(1,4):\n",
100 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n",
101 | " for i in URL:\n",
102 | " Product_URL.append(i.get_attribute('href'))\n",
103 | " \n",
104 | " try:\n",
105 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n",
106 | " except NoSuchElementException:\n",
107 | " pass\n",
108 | " \n",
109 | "for i in Product_URL:\n",
110 | " driver.get(i)\n",
111 | " \n",
112 | " # Clicking the rating\n",
113 | " try:\n",
114 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n",
115 | " except NoSuchElementException:\n",
116 | " print(\"No rating\")\n",
117 | " pass\n",
118 | " \n",
119 | " #Clicking to see all reviews\n",
120 | " try:\n",
121 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n",
122 | " except NoSuchElementException:\n",
123 | " pass\n",
124 | " \n",
125 |     "    #Scraping the details\n",
126 | " Start_page=1\n",
127 | " End_page=50\n",
128 | " for page in range(Start_page,End_page+1):\n",
129 | " try:\n",
130 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n",
131 | " for r in Reviews:\n",
132 | " Review.append(r.text.replace('\\n',''))\n",
133 | " except NoSuchElementException:\n",
134 | " Review.append(\"Not Available\")\n",
135 | " try:\n",
136 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n",
137 | " for i in Rating:\n",
138 | " rating=i.get_attribute('title')\n",
139 | " Ratings.append(rating[:3])\n",
140 | " except NoSuchElementException:\n",
141 | " Ratings.append(\"Not available\") \n",
142 | " \n",
143 | " #Looping for going to next page automatically\n",
144 | " try:\n",
145 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n",
146 | " if next_page.text=='Next Page':\n",
147 | " next_page.click()\n",
148 | " time.sleep(2)\n",
149 | " except NoSuchElementException:\n",
150 | " pass"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 6,
156 | "metadata": {},
157 | "outputs": [
158 | {
159 | "name": "stdout",
160 | "output_type": "stream",
161 | "text": [
162 | "9000 9000\n"
163 | ]
164 | }
165 | ],
166 | "source": [
167 | "#Checking the length of data extracted\n",
168 | "print(len(Review),len(Ratings))"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": 7,
174 | "metadata": {},
175 | "outputs": [],
176 | "source": [
177 | "#Saving in dataframe\n",
178 | "headphones=pd.DataFrame({'Product_Review':Review[:9000],'Ratings':Ratings[:9000]})"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "#### **1.2.Laptops** "
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 24,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "# Getting the website to driver\n",
195 | "driver.get('https://www.amazon.in/')"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 25,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "# Searching laptops in the search bar and clicking the search button\n",
205 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n",
206 | "search_bar.send_keys(\"laptops\")\n",
207 | "driver.find_element_by_id('nav-search-submit-button').click()"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 26,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "#Creating empty lists\n",
217 | "Product_URL=[]\n",
218 | "Ratings=[]\n",
219 | "Review=[]\n",
220 | "\n",
221 | "#Getting URLs of the product\n",
222 | "for i in range(1,4):\n",
223 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n",
224 | " for i in URL:\n",
225 | " Product_URL.append(i.get_attribute('href'))\n",
226 | " try:\n",
227 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n",
228 | " except NoSuchElementException:\n",
229 | " pass\n",
230 | " \n",
231 | "for i in Product_URL:\n",
232 | " driver.get(i)\n",
233 | " \n",
234 | " #Clicking the rating\n",
235 | " try:\n",
236 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n",
237 | " except NoSuchElementException:\n",
238 | " print(\"No rating\")\n",
239 | " pass\n",
240 | " \n",
241 | " #Clicking to see all reviews\n",
242 | " try:\n",
243 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n",
244 | " except NoSuchElementException:\n",
245 | " pass\n",
246 | " \n",
247 |     "    #Scraping the details\n",
248 | " Start_page=1\n",
249 | " End_page=80\n",
250 | " for page in range(Start_page,End_page+1):\n",
251 | " try:\n",
252 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n",
253 | " for r in Reviews:\n",
254 | " Review.append(r.text.replace('\\n',''))\n",
255 | " except NoSuchElementException:\n",
256 | " Review.append(\"Not Available\")\n",
257 | " try:\n",
258 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n",
259 | " for i in Rating:\n",
260 | " rating=i.get_attribute('title')\n",
261 | " Ratings.append(rating[:3])\n",
262 | " except NoSuchElementException:\n",
263 | " Ratings.append(\"Not available\") \n",
264 | " \n",
265 | " # Looping for going to next page automatically\n",
266 | " try:\n",
267 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n",
268 | " if next_page.text=='Next Page':\n",
269 | " next_page.click()\n",
270 | " time.sleep(2)\n",
271 | " except NoSuchElementException:\n",
272 | " pass"
273 | ]
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": 27,
278 | "metadata": {},
279 | "outputs": [
280 | {
281 | "name": "stdout",
282 | "output_type": "stream",
283 | "text": [
284 | "11000 11000\n"
285 | ]
286 | }
287 | ],
288 | "source": [
289 | "# Checking the length of data extracted\n",
290 | "print(len(Review),len(Ratings))"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 28,
296 | "metadata": {},
297 | "outputs": [],
298 | "source": [
299 | "#Saving in dataframe\n",
300 | "laptops=pd.DataFrame({'Product_Review':Review[:11000],'Ratings':Ratings[:11000]})"
301 | ]
302 | },
303 | {
304 | "cell_type": "markdown",
305 | "metadata": {},
306 | "source": [
307 | "#### **1.3.Camera**"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 33,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "# Getting the website to driver\n",
317 | "driver.get('https://www.amazon.in/')"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": 34,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": [
326 | "#Searching dslr in the search bar and clicking the search button\n",
327 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n",
328 | "search_bar.send_keys(\"dslr\")\n",
329 | "driver.find_element_by_id('nav-search-submit-button').click()"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 35,
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "#Creating empty lists\n",
339 | "Product_URL=[]\n",
340 | "Ratings=[]\n",
341 | "Review=[]\n",
342 | "\n",
343 | "#Getting URLs of the product\n",
344 | "for i in range(1,4):\n",
345 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n",
346 | " for i in URL:\n",
347 | " Product_URL.append(i.get_attribute('href'))\n",
348 | " try:\n",
349 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n",
350 | " except NoSuchElementException:\n",
351 | " pass\n",
352 | " \n",
353 | "for i in Product_URL:\n",
354 | " driver.get(i)\n",
355 | " \n",
356 | " #Clicking the rating\n",
357 | " try:\n",
358 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n",
359 | " except NoSuchElementException:\n",
360 | " print(\"No rating\")\n",
361 | " pass\n",
362 | " \n",
363 | " #Clicking to see all reviews\n",
364 | " try:\n",
365 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n",
366 | " except NoSuchElementException:\n",
367 | " pass\n",
368 | " \n",
369 |     "    #Scraping the details\n",
370 | " Start_page=1\n",
371 | " End_page=100\n",
372 | " for page in range(Start_page,End_page+1):\n",
373 | " try:\n",
374 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n",
375 | " for r in Reviews:\n",
376 | " Review.append(r.text.replace('\\n',''))\n",
377 | " except NoSuchElementException:\n",
378 | " Review.append(\"Not Available\")\n",
379 | " try:\n",
380 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n",
381 | " for i in Rating:\n",
382 | " rating=i.get_attribute('title')\n",
383 | " Ratings.append(rating[:3])\n",
384 | " except NoSuchElementException:\n",
385 | " Ratings.append(\"Not available\") \n",
386 | " \n",
387 | " #Looping for going to next page automatically\n",
388 | " try:\n",
389 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n",
390 | " if next_page.text=='Next Page':\n",
391 | " next_page.click()\n",
392 | " time.sleep(2)\n",
393 | " except NoSuchElementException:\n",
394 | " pass"
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": 36,
400 | "metadata": {},
401 | "outputs": [
402 | {
403 | "name": "stdout",
404 | "output_type": "stream",
405 | "text": [
406 | "10000 10000\n"
407 | ]
408 | }
409 | ],
410 | "source": [
411 | "#Checking the length of data extracted\n",
412 | "print(len(Review),len(Ratings))"
413 | ]
414 | },
415 | {
416 | "cell_type": "code",
417 | "execution_count": 37,
418 | "metadata": {},
419 | "outputs": [],
420 | "source": [
421 | "#Saving in dataframe\n",
422 | "camera=pd.DataFrame({'Product_Review':Review[:10000],'Ratings':Ratings[:10000]})"
423 | ]
424 | },
425 | {
426 | "cell_type": "markdown",
427 | "metadata": {},
428 | "source": [
429 | "#### **1.4.Smartphones**"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": 42,
435 | "metadata": {},
436 | "outputs": [],
437 | "source": [
438 | "#Getting the website to driver\n",
439 | "driver.get('https://www.amazon.in/')"
440 | ]
441 | },
442 | {
443 | "cell_type": "code",
444 | "execution_count": 43,
445 | "metadata": {},
446 | "outputs": [],
447 | "source": [
448 | "#Searching phones in the search bar and clicking the search button\n",
449 | "search_bar=driver.find_element_by_id('twotabsearchtextbox')\n",
450 | "search_bar.send_keys(\"phones\")\n",
451 | "driver.find_element_by_id('nav-search-submit-button').click()"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": 44,
457 | "metadata": {},
458 | "outputs": [],
459 | "source": [
460 | "#Creating empty lists\n",
461 | "Product_URL=[]\n",
462 | "Ratings=[]\n",
463 | "Review=[]\n",
464 | "\n",
465 | "#Getting URLs of the product\n",
466 | "for i in range(1,4):\n",
467 | " URL = driver.find_elements_by_xpath(\"//div[@class='a-section a-spacing-medium']/div[2]/div[2]/div/div/h2/a\")\n",
468 | " for i in URL:\n",
469 | " Product_URL.append(i.get_attribute('href'))\n",
470 | " try:\n",
471 | " next_btn=driver.find_element_by_xpath(\"//li[@class='a-last']/a\").click()\n",
472 | " except NoSuchElementException:\n",
473 | " pass\n",
474 | " \n",
475 | "for i in Product_URL:\n",
476 | " driver.get(i)\n",
477 | " \n",
478 | " #Clicking the rating\n",
479 | " try:\n",
480 | " driver.find_element_by_xpath(\"//a[1][@id='acrCustomerReviewLink']\").click()\n",
481 | " except NoSuchElementException:\n",
482 | " print(\"No rating\")\n",
483 | " pass\n",
484 | " \n",
485 | " #Clicking to see all reviews\n",
486 | " try:\n",
487 | " driver.find_element_by_xpath(\"//a[@class='a-link-emphasis a-text-bold']\").click()\n",
488 | " except NoSuchElementException:\n",
489 | " pass\n",
490 | " \n",
491 |     "    #Scraping the details\n",
492 | " Start_page=1\n",
493 | " End_page=60\n",
494 | " for page in range(Start_page,End_page+1):\n",
495 | " try:\n",
496 | " Reviews=driver.find_elements_by_xpath(\"//div[@class='a-row a-spacing-small review-data']/span/span\")\n",
497 | " for r in Reviews:\n",
498 | " Review.append(r.text.replace('\\n',''))\n",
499 | " except NoSuchElementException:\n",
500 | " Review.append(\"Not Available\")\n",
501 | " try:\n",
502 | " Rating=driver.find_elements_by_xpath(\"//div[@class='a-section celwidget']/div[2]/a[1]\")\n",
503 | " for i in Rating:\n",
504 | " rating=i.get_attribute('title')\n",
505 | " Ratings.append(rating[:3])\n",
506 | " except NoSuchElementException:\n",
507 | " Ratings.append(\"Not available\") \n",
508 | " \n",
509 | " #Looping for going to next page automatically\n",
510 | " try:\n",
511 | " next_page=driver.find_element_by_xpath(\"//div[@id='cm_cr-pagination_bar']/ul/li[2]/a\")\n",
512 | " if next_page.text=='Next Page':\n",
513 | " next_page.click()\n",
514 | " time.sleep(2)\n",
515 | " except NoSuchElementException:\n",
516 | " pass"
517 | ]
518 | },
519 | {
520 | "cell_type": "code",
521 | "execution_count": 45,
522 | "metadata": {},
523 | "outputs": [
524 | {
525 | "name": "stdout",
526 | "output_type": "stream",
527 | "text": [
528 | "10000 10000\n"
529 | ]
530 | }
531 | ],
532 | "source": [
533 | "#Checking the length of data extracted\n",
534 | "print(len(Review),len(Ratings))"
535 | ]
536 | },
537 | {
538 | "cell_type": "code",
539 | "execution_count": 46,
540 | "metadata": {},
541 | "outputs": [],
542 | "source": [
543 | "#Saving in dataframe\n",
544 | "phones=pd.DataFrame({'Product_Review':Review[:10000],'Ratings':Ratings[:10000]})"
545 | ]
546 | },
547 | {
548 | "cell_type": "code",
549 | "execution_count": 49,
550 | "metadata": {},
551 | "outputs": [],
552 | "source": [
553 | "# Closing the driver\n",
554 | "driver.close()"
555 | ]
556 | },
557 | {
558 | "cell_type": "markdown",
559 | "metadata": {},
560 | "source": [
561 |     "### **Part 2 - Scraping data from Flipkart**"
562 | ]
563 | },
564 | {
565 | "cell_type": "markdown",
566 | "metadata": {},
567 | "source": [
568 |     "#### **2.1. iPhone**"
569 | ]
570 | },
571 | {
572 | "cell_type": "code",
573 | "execution_count": 50,
574 | "metadata": {},
575 | "outputs": [],
576 | "source": [
577 | "# Connect to web driver\n",
578 |     "driver = webdriver.Chrome(r\"C:\\chromedriver.exe\")"
579 | ]
580 | },
581 | {
582 | "cell_type": "code",
583 | "execution_count": 51,
584 | "metadata": {},
585 | "outputs": [],
586 | "source": [
587 | "#Getting the website to driver\n",
588 | "driver.get('https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&lid=LSTMOBFWQ6BXGJCEYNYZXSHRJ&marketplace=FLIPKART')"
589 | ]
590 | },
591 | {
592 | "cell_type": "code",
593 | "execution_count": 52,
594 | "metadata": {},
595 | "outputs": [],
596 | "source": [
597 | "#Taking the empty lists\n",
598 | "Ratings=[]\n",
599 | "Review=[]\n",
600 | "\n",
601 |     "#As there are nearly 10 reviews per page, we will check 400+ pages and scrape the required data\n",
602 |     "#Now we will loop over the pages and scrape\n",
603 | "for i in range(0,410):\n",
604 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n",
605 | " Ratings.append(j.text)\n",
606 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n",
607 | " Review.append(j.text)\n",
608 | " \n",
609 |     "    #The next-page URL changes for each page; we append the page number to the base URL\n",
610 | " k=i+1\n",
611 | " next_page=\"https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&lid=LSTMOBFWQ6BXGJCEYNYZXSHRJ&marketplace=FLIPKART&page=\"+str(k) \n",
612 | " driver.get(next_page)"
613 | ]
614 | },
615 | {
616 | "cell_type": "code",
617 | "execution_count": 53,
618 | "metadata": {},
619 | "outputs": [
620 | {
621 | "name": "stdout",
622 | "output_type": "stream",
623 | "text": [
624 | "3734 3988\n"
625 | ]
626 | }
627 | ],
628 | "source": [
629 | "#Checking the length of the data scraped\n",
630 | "print(len(Ratings),len(Review))"
631 | ]
632 | },
633 | {
634 | "cell_type": "code",
635 | "execution_count": 55,
636 | "metadata": {},
637 | "outputs": [],
638 | "source": [
639 | "#Saving in dataframe\n",
640 | "f_phones=pd.DataFrame({'Product_Review':Review[:3700],'Ratings':Ratings[:3700]})"
641 | ]
642 | },
643 | {
644 | "cell_type": "code",
645 | "execution_count": 56,
646 | "metadata": {},
647 | "outputs": [
648 | {
649 | "data": {
703 | "text/plain": [
704 | " Product_Review Ratings\n",
705 | "0 The Best Phone for the Money\\n\\nThe iPhone 11 ... 5\n",
706 | "1 Really satisfied with the Product I received..... 5\n",
707 | "2 Great iPhone very snappy experience as apple k... 5\n",
708 | "3 Amazing phone with great cameras and better ba... 5\n",
709 | "4 Previously I was using one plus 3t it was a gr... 5"
710 | ]
711 | },
712 | "execution_count": 56,
713 | "metadata": {},
714 | "output_type": "execute_result"
715 | }
716 | ],
717 | "source": [
718 | "#Checking first 5 data of the dataframe\n",
719 | "f_phones.head()"
720 | ]
721 | },
722 | {
723 | "cell_type": "code",
724 | "execution_count": 57,
725 | "metadata": {},
726 | "outputs": [
727 | {
728 | "data": {
782 | "text/plain": [
783 | " Product_Review Ratings\n",
784 | "3695 Awesome. This is my first iPhone value for money. 5\n",
785 | "3696 good 5\n",
786 | "3697 11 is as close & good as 12, The only downside... 5\n",
787 | "3698 Mind-blowing 😍🥳 5\n",
788 | "3699 Amazing product at best price 5"
789 | ]
790 | },
791 | "execution_count": 57,
792 | "metadata": {},
793 | "output_type": "execute_result"
794 | }
795 | ],
796 | "source": [
797 | "#Checking last 5 data of the dataframe\n",
798 | "f_phones.tail()"
799 | ]
800 | },
801 | {
802 | "cell_type": "code",
803 | "execution_count": 58,
804 | "metadata": {},
805 | "outputs": [],
806 | "source": [
807 | "#Closing the driver\n",
808 | "driver.close()"
809 | ]
810 | },
811 | {
812 | "cell_type": "markdown",
813 | "metadata": {},
814 | "source": [
815 | "#### **2.2. Poco Mobiles**"
816 | ]
817 | },
818 | {
819 | "cell_type": "code",
820 | "execution_count": 59,
821 | "metadata": {},
822 | "outputs": [],
823 | "source": [
824 | "# Connect to web driver\n",
825 |     "driver = webdriver.Chrome(r\"C:\\Users\\Femina\\Downloads\\chromedriver_win32 (1)\\chromedriver.exe\")"
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": 60,
831 | "metadata": {},
832 | "outputs": [],
833 | "source": [
834 | "# Getting the website to driver\n",
835 | "driver.get('https://www.flipkart.com/poco-m3-power-black-64-gb/product-reviews/itmb49cc10841be2?pid=MOBFZTCUTAYPJHHR&lid=LSTMOBFZTCUTAYPJHHR2ZVC1N&marketplace=FLIPKART')"
836 | ]
837 | },
838 | {
839 | "cell_type": "code",
840 | "execution_count": 61,
841 | "metadata": {},
842 | "outputs": [],
843 | "source": [
844 | "#Taking the empty lists\n",
845 | "Ratings=[]\n",
846 | "Review=[]\n",
847 | "\n",
848 |     "#As there are nearly 10 reviews per page, we will check 650 pages and scrape the required data\n",
849 |     "#Now we will loop over the pages and scrape\n",
850 | "for i in range(0,650):\n",
851 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n",
852 | " Ratings.append(j.text)\n",
853 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n",
854 | " Review.append(j.text)\n",
855 | " \n",
856 |     "    #The next-page URL changes for each page; we append the page number to the base URL\n",
857 | " k=i+1\n",
858 | " next_page=\"https://www.flipkart.com/poco-m3-power-black-64-gb/product-reviews/itmb49cc10841be2?pid=MOBFZTCUTAYPJHHR&lid=LSTMOBFZTCUTAYPJHHR2ZVC1N&marketplace=FLIPKART&page=\"+str(k) \n",
859 | " driver.get(next_page)"
860 | ]
861 | },
862 | {
863 | "cell_type": "code",
864 | "execution_count": 62,
865 | "metadata": {},
866 | "outputs": [
867 | {
868 | "name": "stdout",
869 | "output_type": "stream",
870 | "text": [
871 | "6411 5408\n"
872 | ]
873 | }
874 | ],
875 | "source": [
876 | "#Checking the length of the data scraped\n",
877 | "print(len(Review),len(Ratings))"
878 | ]
879 | },
880 | {
881 | "cell_type": "code",
882 | "execution_count": 63,
883 | "metadata": {},
884 | "outputs": [],
885 | "source": [
886 | "#Saving in dataframe\n",
887 | "f_poco=pd.DataFrame({'Product_Review':Review[:5200],'Ratings':Ratings[:5200]})"
888 | ]
889 | },
890 | {
891 | "cell_type": "code",
892 | "execution_count": 64,
893 | "metadata": {},
894 | "outputs": [
895 | {
896 | "data": {
950 | "text/plain": [
951 | " Product_Review Ratings\n",
952 | "0 Great Phone at this Price point. Superb cool D... 5\n",
953 | "1 Good mobile poco m3\\nPros:\\nFullhd display,\\ns... 4\n",
954 | "2 Good phone battery🔋 And camera This price poin... 5\n",
955 | "3 U will never get this specs for this price...d... 5\n",
956 | "4 One word review \" Value for Money\"\\nIt has the... 5"
957 | ]
958 | },
959 | "execution_count": 64,
960 | "metadata": {},
961 | "output_type": "execute_result"
962 | }
963 | ],
964 | "source": [
965 | "#Checking first 5 data of the dataframe\n",
966 | "f_poco.head()"
967 | ]
968 | },
969 | {
970 | "cell_type": "code",
971 | "execution_count": 65,
972 | "metadata": {},
973 | "outputs": [
974 | {
975 | "data": {
1029 | "text/plain": [
1030 | " Product_Review Ratings\n",
1031 | "5195 Good 5\n",
1032 | "5196 Best 5\n",
1033 | "5197 Good phone 4\n",
1034 | "5198 It's awesome... Thank you Flipkart every year ... 3\n",
1035 | "5199 Wow 😍😍😍😍 I am so happy very good tq Flipkart 😍😍😍😍 5"
1036 | ]
1037 | },
1038 | "execution_count": 65,
1039 | "metadata": {},
1040 | "output_type": "execute_result"
1041 | }
1042 | ],
1043 | "source": [
1044 | "#Checking last 5 data of the dataframe\n",
1045 | "f_poco.tail()"
1046 | ]
1047 | },
1048 | {
1049 | "cell_type": "code",
1050 | "execution_count": 66,
1051 | "metadata": {},
1052 | "outputs": [],
1053 | "source": [
1054 | "#Closing the driver\n",
1055 | "driver.close()"
1056 | ]
1057 | },
1058 | {
1059 | "cell_type": "markdown",
1060 | "metadata": {},
1061 | "source": [
1062 | "#### **2.3. Routers**"
1063 | ]
1064 | },
1065 | {
1066 | "cell_type": "code",
1067 | "execution_count": 67,
1068 | "metadata": {},
1069 | "outputs": [],
1070 | "source": [
1071 | "# Connect to web driver\n",
1072 |     "driver = webdriver.Chrome(r\"C:\\Users\\Femina\\Downloads\\chromedriver_win32 (1)\\chromedriver.exe\")"
1073 | ]
1074 | },
1075 | {
1076 | "cell_type": "code",
1077 | "execution_count": 68,
1078 | "metadata": {},
1079 | "outputs": [],
1080 | "source": [
1081 | "# Getting the website to driver\n",
1082 | "driver.get('https://www.flipkart.com/tp-link-tl-wr841n-300mbps-wireless-n-router/product-reviews/itmf48vgyfyx8m4f?pid=RTRD7HN3JJYF6WN2&lid=LSTRTRD7HN3JJYF6WN20ZITXQ&marketplace=FLIPKART')"
1083 | ]
1084 | },
1085 | {
1086 | "cell_type": "code",
1087 | "execution_count": 69,
1088 | "metadata": {},
1089 | "outputs": [],
1090 | "source": [
1091 | "#Taking the empty lists\n",
1092 | "Ratings=[]\n",
1093 | "Review=[]\n",
1094 | "\n",
1095 |     "#As there are nearly 10 reviews per page, we will check 150 pages and scrape the required data\n",
1096 |     "#Now we will loop over the pages and scrape\n",
1097 | "for i in range(0,150):\n",
1098 | " for j in driver.find_elements_by_xpath(\"//div[@class='_3LWZlK _1BLPMq']\"):\n",
1099 | " Ratings.append(j.text)\n",
1100 | " for j in driver.find_elements_by_xpath(\"//div[@class='t-ZTKy']\"):\n",
1101 | " Review.append(j.text)\n",
1102 | " \n",
1103 |     "    #The next-page URL changes for each page; we append the page number to the base URL\n",
1104 | " k=i+1\n",
1105 | " next_page=\"https://www.flipkart.com/tp-link-tl-wr841n-300mbps-wireless-n-router/product-reviews/itmf48vgyfyx8m4f?pid=RTRD7HN3JJYF6WN2&lid=LSTRTRD7HN3JJYF6WN20ZITXQ&marketplace=FLIPKART&page=\"+str(k) \n",
1106 | " driver.get(next_page)"
1107 | ]
1108 | },
1109 | {
1110 | "cell_type": "code",
1111 | "execution_count": 70,
1112 | "metadata": {},
1113 | "outputs": [
1114 | {
1115 | "name": "stdout",
1116 | "output_type": "stream",
1117 | "text": [
1118 | "1500 1457\n"
1119 | ]
1120 | }
1121 | ],
1122 | "source": [
1123 | "# Checking the length of the data scraped\n",
1124 | "print(len(Review),len(Ratings))"
1125 | ]
1126 | },
1127 | {
1128 | "cell_type": "code",
1129 | "execution_count": 71,
1130 | "metadata": {},
1131 | "outputs": [],
1132 | "source": [
1133 | "# Saving in dataframe\n",
1134 | "router=pd.DataFrame({'Product_Review':Review[:1000],'Ratings':Ratings[:1000]})"
1135 | ]
1136 | },
1137 | {
1138 | "cell_type": "code",
1139 | "execution_count": 72,
1140 | "metadata": {},
1141 | "outputs": [
1142 | {
1143 | "data": {
1197 | "text/plain": [
1198 | " Product_Review Ratings\n",
1199 | "0 *********EDIT: Today is 8th March 2015, more t... 5\n",
1200 | "1 I Used this Router for 30 days, Really worth f... 4\n",
1201 | "2 Okay, so... my review of this router is going ... 4\n",
1202 | "3 Excellent service and great buying experience ... 5\n",
1203 | "4 I bought this product on 25/05/2012 at 11.30 a... 5"
1204 | ]
1205 | },
1206 | "execution_count": 72,
1207 | "metadata": {},
1208 | "output_type": "execute_result"
1209 | }
1210 | ],
1211 | "source": [
1212 | "# Checking first 5 data of the dataframe\n",
1213 | "router.head()"
1214 | ]
1215 | },
1216 | {
1217 | "cell_type": "code",
1218 | "execution_count": 73,
1219 | "metadata": {},
1220 | "outputs": [
1221 | {
1222 | "data": {
1276 | "text/plain": [
1277 | " Product_Review Ratings\n",
1278 | "995 I needed to create a WiFi setup for small offi... 5\n",
1279 | "996 its very easy interface to work with and i str... 4\n",
1280 | "997 Honest rating for this product is 5/5 . Just f... 5\n",
1281 | "998 I have bought 3 of these little guys, for my s... 4\n",
1282 | "999 Its a great product. Works really fine. It wil... 5"
1283 | ]
1284 | },
1285 | "execution_count": 73,
1286 | "metadata": {},
1287 | "output_type": "execute_result"
1288 | }
1289 | ],
1290 | "source": [
1291 | "# Checking last 5 data of the dataframe\n",
1292 | "router.tail()"
1293 | ]
1294 | },
1295 | {
1296 | "cell_type": "code",
1297 | "execution_count": 74,
1298 | "metadata": {},
1299 | "outputs": [],
1300 | "source": [
1301 | "# Closing the driver\n",
1302 | "driver.close()"
1303 | ]
1304 | },
1305 | {
1306 | "cell_type": "markdown",
1307 | "metadata": {},
1308 | "source": [
1309 | "#### Exporting data in CSV file"
1310 | ]
1311 | },
1312 | {
1313 | "cell_type": "code",
1314 | "execution_count": null,
1315 | "metadata": {},
1316 | "outputs": [],
1317 | "source": [
1318 | "# Combining all dataframes into a single dataframe\n",
1319 | "ratings_data=headphones.append([laptops,camera,phones,f_phones,f_poco,router],ignore_index=True)\n",
1320 | "ratings_data"
1321 | ]
1322 | },
1323 | {
1324 | "cell_type": "code",
1325 | "execution_count": 76,
1326 | "metadata": {},
1327 | "outputs": [],
1328 | "source": [
1329 | "# Saving the data into a csv file\n",
1330 | "ratings_data.to_csv('Rating_Prediction_dataset.csv')"
1331 | ]
1332 | },
1333 | {
1334 | "cell_type": "code",
1335 | "execution_count": null,
1336 | "metadata": {},
1337 | "outputs": [],
1338 | "source": []
1339 | }
1340 | ],
1341 | "metadata": {
1342 | "kernelspec": {
1343 | "display_name": "Python 3",
1344 | "language": "python",
1345 | "name": "python3"
1346 | },
1347 | "language_info": {
1348 | "codemirror_mode": {
1349 | "name": "ipython",
1350 | "version": 3
1351 | },
1352 | "file_extension": ".py",
1353 | "mimetype": "text/x-python",
1354 | "name": "python",
1355 | "nbconvert_exporter": "python",
1356 | "pygments_lexer": "ipython3",
1357 | "version": "3.8.5"
1358 | }
1359 | },
1360 | "nbformat": 4,
1361 | "nbformat_minor": 4
1362 | }
1363 |
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/Rating Prediction Project presentation - FlipRobo.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/Rating Prediction Project presentation - FlipRobo.pptx
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/Read me.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Product Review Rating Predication Using NLP/sample-documentation.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Product Review Rating Predication Using NLP/sample-documentation.docx
--------------------------------------------------------------------------------
/Project Customer Retention in Ecommerce sector/Customer Retention Project.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Customer Retention Project.zip
--------------------------------------------------------------------------------
/Project Customer Retention in Ecommerce sector/Customer_retention_case study.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Customer_retention_case study.docx
--------------------------------------------------------------------------------
/Project Customer Retention in Ecommerce sector/Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/Project Report on Data Analysis of Customer Retention in Ecommerce Sector .pdf
--------------------------------------------------------------------------------
/Project Customer Retention in Ecommerce sector/Read me.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Project Customer Retention in Ecommerce sector/customer_retention_dataset.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Customer Retention in Ecommerce sector/customer_retention_dataset.xlsx
--------------------------------------------------------------------------------
/Project Used Car price predication using ML/Car Price Predication Project.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Car Price Predication Project.zip
--------------------------------------------------------------------------------
/Project Used Car price predication using ML/Car Price Predication Web Scraping script.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## **Car Price Prediction Project - Part 1 Web Scraping Data of Used Cars**\n",
8 | "***Author: Mr. Lokesh Baviskar***\n",
9 | "\n",
10 | "### **Objective** :\n",
11 | "    **To scrape data of used cars to predict car prices**\n",
12 | "\n",
13 | "### **Strategy** :\n",
14 | "1. Selenium will be used for web scraping data from cardekho.com\n",
15 | "2. In the first part, scraping URLs of used cars for different locations in India.\n",
16 | "3. Storing the scraped URLs in an Excel file.\n",
17 | "4. Selecting the car features to be scraped from the website.\n",
18 | "5. In the second part, scraping data from the individual URLs in the Excel file.\n",
19 | "6. Exporting the final data to an Excel file."
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "### **Part 1 : Scraping URLs of Used Cars from Cardekho.com**"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "- **Importing libraries required for scraping**"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 1,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "import pandas as pd\n",
43 | "import numpy as np\n",
44 | "import time\n",
45 | "import selenium\n",
46 | "from selenium import webdriver\n",
47 | "from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | " - **Importing webdriver**"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 2,
60 | "metadata": {},
61 | "outputs": [],
62 | "source": [
63 | "driver=webdriver.Chrome(r'C:\\chromedriver.exe')"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "- **Opening the Cardekho website in the browser**"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 113,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": [
79 | "url = \"https://www.cardekho.com/\"\n",
80 | "driver.get(url)\n",
81 | "time.sleep(2)"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 16,
87 | "metadata": {},
88 | "outputs": [],
89 | "source": [
90 | "Used_cars=driver.find_element_by_xpath('//li[@data-slug=\"/usedCars\"]/a').get_attribute('href')\n",
91 | "driver.get(Used_cars)\n",
92 | "time.sleep(2)"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "- **Collecting URLs of different locations/cities for further scraping**"
100 | ]
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 |     "**1. Extracting URLs for Ahmedabad city**"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 16,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": [
115 | "url = \"https://www.cardekho.com/used-cars+in+ahmedabad\"\n",
116 | "driver.get(url)\n",
117 | "time.sleep(2)"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 17,
123 | "metadata": {},
124 | "outputs": [
125 | {
126 | "name": "stderr",
127 | "output_type": "stream",
128 | "text": [
129 | "100%|██████████| 200/200 [05:42<00:00, 1.71s/it]\n"
130 | ]
131 | }
132 | ],
133 | "source": [
134 | "from tqdm import tqdm\n",
135 | "for _ in tqdm(range(0,200)):\n",
136 | " time.sleep(0.5)\n",
137 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n",
138 | " time.sleep(1)\n",
139 | " driver.execute_script(\"window.scrollBy(0,-350)\")"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 18,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stderr",
149 | "output_type": "stream",
150 | "text": [
151 | "100%|██████████| 509/509 [00:04<00:00, 112.64it/s]\n"
152 | ]
153 | }
154 | ],
155 | "source": [
156 | "Car_url_ahmedabad = []\n",
157 | "car_url_ahmedabad = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
158 | "for j in tqdm(range(len(car_url_ahmedabad))):\n",
159 | " Car_url_ahmedabad.append(car_url_ahmedabad[j].get_attribute('href'))\n",
160 | "time.sleep(2)"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 20,
166 | "metadata": {},
167 | "outputs": [
168 | {
169 | "data": {
170 | "text/plain": [
171 | "509"
172 | ]
173 | },
174 | "execution_count": 20,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "len(Car_url_ahmedabad)"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 |     "**2. Extracting URLs for Bangalore city**"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": 33,
193 | "metadata": {},
194 | "outputs": [],
195 | "source": [
196 | "url = \"https://www.cardekho.com/used-cars+in+bangalore\"\n",
197 | "driver.get(url)\n",
198 | "time.sleep(2)"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": 34,
204 | "metadata": {},
205 | "outputs": [
206 | {
207 | "name": "stderr",
208 | "output_type": "stream",
209 | "text": [
210 | "100%|██████████| 300/300 [09:45<00:00, 1.95s/it]\n"
211 | ]
212 | }
213 | ],
214 | "source": [
215 | "from tqdm import tqdm\n",
216 | "for _ in tqdm(range(0,300)):\n",
217 | " time.sleep(0.75)\n",
218 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n",
219 | " time.sleep(1)\n",
220 | " driver.execute_script(\"window.scrollBy(0,-350)\")"
221 | ]
222 | },
223 | {
224 | "cell_type": "code",
225 | "execution_count": 35,
226 | "metadata": {},
227 | "outputs": [
228 | {
229 | "name": "stderr",
230 | "output_type": "stream",
231 | "text": [
232 | "100%|██████████| 580/580 [00:05<00:00, 105.58it/s]\n"
233 | ]
234 | }
235 | ],
236 | "source": [
237 | "Car_url_bangalore = []\n",
238 | "car_url_bangalore = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
239 | "for j in tqdm(range(len(car_url_bangalore))):\n",
240 | " Car_url_bangalore.append(car_url_bangalore[j].get_attribute('href'))\n",
241 | "time.sleep(2)"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 36,
247 | "metadata": {},
248 | "outputs": [
249 | {
250 | "data": {
251 | "text/plain": [
252 | "580"
253 | ]
254 | },
255 | "execution_count": 36,
256 | "metadata": {},
257 | "output_type": "execute_result"
258 | }
259 | ],
260 | "source": [
261 | "len(Car_url_bangalore)"
262 | ]
263 | },
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 |     "**3. Extracting URLs for Chennai**"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": 37,
274 | "metadata": {},
275 | "outputs": [],
276 | "source": [
277 | "url = \"https://www.cardekho.com/used-cars+in+chennai\"\n",
278 | "driver.get(url)\n",
279 | "time.sleep(2)"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": 38,
285 | "metadata": {},
286 | "outputs": [
287 | {
288 | "name": "stderr",
289 | "output_type": "stream",
290 | "text": [
291 | "100%|██████████| 250/250 [06:35<00:00, 1.58s/it]\n"
292 | ]
293 | }
294 | ],
295 | "source": [
296 | "from tqdm import tqdm\n",
297 | "for _ in tqdm(range(0,250)):\n",
298 | " time.sleep(0.5)\n",
299 | " driver.execute_script(\"window.scrollBy(0,1000)\",\"\")\n",
300 | " time.sleep(1)\n",
301 | " driver.execute_script(\"window.scrollBy(0,-350)\")"
302 | ]
303 | },
304 | {
305 | "cell_type": "code",
306 | "execution_count": 39,
307 | "metadata": {},
308 | "outputs": [
309 | {
310 | "name": "stderr",
311 | "output_type": "stream",
312 | "text": [
313 | "100%|██████████| 298/298 [00:02<00:00, 117.45it/s]\n"
314 | ]
315 | }
316 | ],
317 | "source": [
318 | "Car_url_chennai = []\n",
319 | "car_url_chennai = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
320 | "for j in tqdm(range(len(car_url_chennai))):\n",
321 | " Car_url_chennai.append(car_url_chennai[j].get_attribute('href'))\n",
322 | "time.sleep(2)"
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "metadata": {},
328 | "source": [
329 |     "**4. Extracting URLs for Delhi-NCR**"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 48,
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "url = \"https://www.cardekho.com/used-cars+in+delhi-ncr\"\n",
339 | "driver.get(url)\n",
340 | "time.sleep(2)"
341 | ]
342 | },
343 | {
344 | "cell_type": "code",
345 | "execution_count": 49,
346 | "metadata": {},
347 | "outputs": [
348 | {
349 | "name": "stderr",
350 | "output_type": "stream",
351 | "text": [
352 | "100%|██████████| 1100/1100 [53:55<00:00, 2.94s/it]\n"
353 | ]
354 | }
355 | ],
356 | "source": [
357 | "from tqdm import tqdm\n",
358 | "for _ in tqdm(range(0,1100)):\n",
359 | " time.sleep(0.5)\n",
360 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
361 | " time.sleep(1)\n",
362 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
363 | ]
364 | },
365 | {
366 | "cell_type": "code",
367 | "execution_count": 50,
368 | "metadata": {},
369 | "outputs": [
370 | {
371 | "name": "stderr",
372 | "output_type": "stream",
373 | "text": [
374 | "100%|██████████| 3141/3141 [01:03<00:00, 49.52it/s]\n"
375 | ]
376 | }
377 | ],
378 | "source": [
379 | "Car_url_delhi_ncr = []\n",
380 | "car_url_delhi_ncr = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
381 | "for j in tqdm(range(len(car_url_delhi_ncr))):\n",
382 | " Car_url_delhi_ncr.append(car_url_delhi_ncr[j].get_attribute('href'))\n",
383 | "time.sleep(2)"
384 | ]
385 | },
386 | {
387 | "cell_type": "markdown",
388 | "metadata": {},
389 | "source": [
390 |     "**5. Extracting URLs for Gurgaon city**"
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": 51,
396 | "metadata": {},
397 | "outputs": [],
398 | "source": [
399 | "url = \"https://www.cardekho.com/used-cars+in+gurgaon\"\n",
400 | "driver.get(url)\n",
401 | "time.sleep(2)"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": 52,
407 | "metadata": {},
408 | "outputs": [
409 | {
410 | "name": "stderr",
411 | "output_type": "stream",
412 | "text": [
413 | "100%|██████████| 600/600 [20:22<00:00, 2.04s/it]\n"
414 | ]
415 | }
416 | ],
417 | "source": [
418 | "from tqdm import tqdm\n",
419 | "for _ in tqdm(range(0,600)):\n",
420 | " time.sleep(0.5)\n",
421 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
422 | " time.sleep(1)\n",
423 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
424 | ]
425 | },
426 | {
427 | "cell_type": "code",
428 | "execution_count": 53,
429 | "metadata": {},
430 | "outputs": [
431 | {
432 | "name": "stderr",
433 | "output_type": "stream",
434 | "text": [
435 | "100%|██████████| 1217/1217 [00:12<00:00, 100.96it/s]\n"
436 | ]
437 | }
438 | ],
439 | "source": [
440 | "Car_url_gurgaon = []\n",
441 | "car_url_gurgaon = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
442 | "for j in tqdm(range(len(car_url_gurgaon))):\n",
443 | " Car_url_gurgaon.append(car_url_gurgaon[j].get_attribute('href'))\n",
444 | "time.sleep(2)"
445 | ]
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "metadata": {},
450 | "source": [
451 |     "**6. Extracting URLs for Telangana**"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": 54,
457 | "metadata": {},
458 | "outputs": [],
459 | "source": [
460 | "url = \"https://www.cardekho.com/used-cars+in+telangana\"\n",
461 | "driver.get(url)\n",
462 | "time.sleep(2)"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": 56,
468 | "metadata": {},
469 | "outputs": [
470 | {
471 | "name": "stderr",
472 | "output_type": "stream",
473 | "text": [
474 | "100%|██████████| 400/400 [17:11<00:00, 2.58s/it]\n"
475 | ]
476 | }
477 | ],
478 | "source": [
479 | "from tqdm import tqdm\n",
480 | "for _ in tqdm(range(0,400)):\n",
481 | " time.sleep(0.5)\n",
482 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
483 | " time.sleep(1)\n",
484 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
485 | ]
486 | },
487 | {
488 | "cell_type": "code",
489 | "execution_count": 57,
490 | "metadata": {},
491 | "outputs": [
492 | {
493 | "name": "stderr",
494 | "output_type": "stream",
495 | "text": [
496 | "100%|██████████| 1210/1210 [00:10<00:00, 110.06it/s]\n"
497 | ]
498 | }
499 | ],
500 | "source": [
501 | "Car_url_telangana = []\n",
502 | "car_url_telangana = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
503 | "for j in tqdm(range(len(car_url_telangana))):\n",
504 | " Car_url_telangana.append(car_url_telangana[j].get_attribute('href'))\n",
505 | "time.sleep(2)"
506 | ]
507 | },
508 | {
509 | "cell_type": "markdown",
510 | "metadata": {},
511 | "source": [
512 |     "**7. Extracting URLs for Maharashtra**"
513 | ]
514 | },
515 | {
516 | "cell_type": "code",
517 | "execution_count": 61,
518 | "metadata": {},
519 | "outputs": [],
520 | "source": [
521 | "url = \"https://www.cardekho.com/used-cars+in+maharashtra\"\n",
522 | "driver.get(url)\n",
523 | "time.sleep(2)"
524 | ]
525 | },
526 | {
527 | "cell_type": "code",
528 | "execution_count": 62,
529 | "metadata": {},
530 | "outputs": [
531 | {
532 | "name": "stderr",
533 | "output_type": "stream",
534 | "text": [
535 | "100%|██████████| 1000/1000 [09:30<00:00, 1.75it/s]\n"
536 | ]
537 | }
538 | ],
539 | "source": [
540 | "from tqdm import tqdm\n",
541 | "for _ in tqdm(range(0,1000)):\n",
542 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
543 | " time.sleep(0.5)\n",
544 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
545 | ]
546 | },
547 | {
548 | "cell_type": "code",
549 | "execution_count": 60,
550 | "metadata": {},
551 | "outputs": [
552 | {
553 | "name": "stderr",
554 | "output_type": "stream",
555 | "text": [
556 | "100%|██████████| 4526/4526 [01:24<00:00, 53.65it/s] \n"
557 | ]
558 | }
559 | ],
560 | "source": [
561 | "Car_url_Maharashtra = []\n",
562 | "car_url_Maharashtra = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
563 | "for j in tqdm(range(len(car_url_Maharashtra))):\n",
564 | " Car_url_Maharashtra.append(car_url_Maharashtra[j].get_attribute('href'))\n",
565 | "time.sleep(2)"
566 | ]
567 | },
568 | {
569 | "cell_type": "markdown",
570 | "metadata": {},
571 | "source": [
572 |     "**8. Extracting URLs for Karnataka**"
573 | ]
574 | },
575 | {
576 | "cell_type": "code",
577 | "execution_count": 63,
578 | "metadata": {},
579 | "outputs": [],
580 | "source": [
581 | "url = \"https://www.cardekho.com/used-cars+in+karnataka\"\n",
582 | "driver.get(url)\n",
583 | "time.sleep(2)"
584 | ]
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": 64,
589 | "metadata": {},
590 | "outputs": [
591 | {
592 | "name": "stderr",
593 | "output_type": "stream",
594 | "text": [
595 | "100%|██████████| 750/750 [08:08<00:00, 1.53it/s] \n"
596 | ]
597 | }
598 | ],
599 | "source": [
600 | "from tqdm import tqdm\n",
601 | "for _ in tqdm(range(0,750)):\n",
602 | " \n",
603 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
604 | " time.sleep(0.4)\n",
605 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
606 | ]
607 | },
608 | {
609 | "cell_type": "code",
610 | "execution_count": 65,
611 | "metadata": {},
612 | "outputs": [
613 | {
614 | "name": "stderr",
615 | "output_type": "stream",
616 | "text": [
617 | "100%|██████████| 867/867 [00:07<00:00, 111.79it/s]\n"
618 | ]
619 | }
620 | ],
621 | "source": [
622 | "Car_url_Karnataka = []\n",
623 | "car_url_Karnataka = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
624 | "for j in tqdm(range(len(car_url_Karnataka))):\n",
625 | " Car_url_Karnataka.append(car_url_Karnataka[j].get_attribute('href'))\n",
626 | "time.sleep(2)"
627 | ]
628 | },
629 | {
630 | "cell_type": "markdown",
631 | "metadata": {},
632 | "source": [
633 |     "**9. Extracting URLs for Uttar Pradesh**"
634 | ]
635 | },
636 | {
637 | "cell_type": "code",
638 | "execution_count": 66,
639 | "metadata": {},
640 | "outputs": [],
641 | "source": [
642 | "url = \"https://www.cardekho.com/used-cars+in+uttar-pradesh\"\n",
643 | "driver.get(url)\n",
644 | "time.sleep(2)"
645 | ]
646 | },
647 | {
648 | "cell_type": "code",
649 | "execution_count": 67,
650 | "metadata": {},
651 | "outputs": [
652 | {
653 | "name": "stderr",
654 | "output_type": "stream",
655 | "text": [
656 | "100%|██████████| 700/700 [10:19<00:00, 1.13it/s] \n"
657 | ]
658 | }
659 | ],
660 | "source": [
661 | "from tqdm import tqdm\n",
662 | "for _ in tqdm(range(0,700)):\n",
663 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
664 | " time.sleep(0.25)\n",
665 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
666 | ]
667 | },
668 | {
669 | "cell_type": "code",
670 | "execution_count": 68,
671 | "metadata": {},
672 | "outputs": [
673 | {
674 | "name": "stderr",
675 | "output_type": "stream",
676 | "text": [
677 | "100%|██████████| 1380/1380 [00:15<00:00, 89.31it/s]\n"
678 | ]
679 | }
680 | ],
681 | "source": [
682 | "Car_url_UttarPradesh = []\n",
683 | "car_url_UttarPradesh = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
684 | "for j in tqdm(range(len(car_url_UttarPradesh))):\n",
685 | " Car_url_UttarPradesh.append(car_url_UttarPradesh[j].get_attribute('href'))\n",
686 | "time.sleep(2)"
687 | ]
688 | },
689 | {
690 | "cell_type": "markdown",
691 | "metadata": {},
692 | "source": [
693 |     "**10. Extracting URLs for Tamil Nadu**"
694 | ]
695 | },
696 | {
697 | "cell_type": "code",
698 | "execution_count": 69,
699 | "metadata": {},
700 | "outputs": [],
701 | "source": [
702 | "url = \"https://www.cardekho.com/used-cars+in+tamil-nadu\"\n",
703 | "driver.get(url)\n",
704 | "time.sleep(2)"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": null,
710 | "metadata": {},
711 | "outputs": [],
712 | "source": [
713 | "from tqdm import tqdm\n",
714 | "for _ in tqdm(range(0,600)):\n",
715 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
716 | " time.sleep(0.25)\n",
717 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
718 | ]
719 | },
720 | {
721 | "cell_type": "code",
722 | "execution_count": 73,
723 | "metadata": {},
724 | "outputs": [
725 | {
726 | "name": "stderr",
727 | "output_type": "stream",
728 | "text": [
729 | "100%|██████████| 1750/1750 [02:35<00:00, 11.24it/s]\n"
730 | ]
731 | }
732 | ],
733 | "source": [
734 | "Car_url_TamilNadu = []\n",
735 | "car_url_TamilNadu = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
736 | "for j in tqdm(range(len(car_url_TamilNadu))):\n",
737 | " Car_url_TamilNadu.append(car_url_TamilNadu[j].get_attribute('href'))\n",
738 | "time.sleep(2)"
739 | ]
740 | },
741 | {
742 | "cell_type": "markdown",
743 | "metadata": {},
744 | "source": [
745 |     "**11. Extracting URLs for Haryana**"
746 | ]
747 | },
748 | {
749 | "cell_type": "code",
750 | "execution_count": 75,
751 | "metadata": {},
752 | "outputs": [],
753 | "source": [
754 | "url = \"https://www.cardekho.com/used-cars+in+haryana\"\n",
755 | "driver.get(url)\n",
756 | "time.sleep(2)"
757 | ]
758 | },
759 | {
760 | "cell_type": "code",
761 | "execution_count": null,
762 | "metadata": {},
763 | "outputs": [],
764 | "source": [
765 | "from tqdm import tqdm\n",
766 | "for _ in tqdm(range(0,600)):\n",
767 | " \n",
768 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
769 | " time.sleep(0.25)\n",
770 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
771 | ]
772 | },
773 | {
774 | "cell_type": "code",
775 | "execution_count": 79,
776 | "metadata": {},
777 | "outputs": [
778 | {
779 | "name": "stderr",
780 | "output_type": "stream",
781 | "text": [
782 | "100%|██████████| 1228/1228 [00:12<00:00, 96.30it/s] \n"
783 | ]
784 | }
785 | ],
786 | "source": [
787 | "Car_url_Haryana = []\n",
788 | "car_url_Haryana = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
789 | "for j in tqdm(range(len(car_url_Haryana))):\n",
790 | " Car_url_Haryana.append(car_url_Haryana[j].get_attribute('href'))\n",
791 | "time.sleep(2)"
792 | ]
793 | },
794 | {
795 | "cell_type": "code",
796 | "execution_count": 99,
797 | "metadata": {},
798 | "outputs": [
799 | {
800 | "data": {
801 | "text/plain": [
802 | "1228"
803 | ]
804 | },
805 | "execution_count": 99,
806 | "metadata": {},
807 | "output_type": "execute_result"
808 | }
809 | ],
810 | "source": [
811 | "len(Car_url_Haryana)"
812 | ]
813 | },
814 | {
815 | "cell_type": "markdown",
816 | "metadata": {},
817 | "source": [
818 |     "**12. Extracting URLs for Rajasthan**"
819 | ]
820 | },
821 | {
822 | "cell_type": "code",
823 | "execution_count": 80,
824 | "metadata": {},
825 | "outputs": [],
826 | "source": [
827 | "url = \"https://www.cardekho.com/used-cars+in+rajasthan\"\n",
828 | "driver.get(url)\n",
829 | "time.sleep(2)"
830 | ]
831 | },
832 | {
833 | "cell_type": "code",
834 | "execution_count": 81,
835 | "metadata": {},
836 | "outputs": [
837 | {
838 | "name": "stderr",
839 | "output_type": "stream",
840 | "text": [
841 | "100%|██████████| 500/500 [04:49<00:00, 1.73it/s]\n"
842 | ]
843 | }
844 | ],
845 | "source": [
846 | "from tqdm import tqdm\n",
847 | "for _ in tqdm(range(0,500)):\n",
848 | " \n",
849 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
850 | " time.sleep(0.25)\n",
851 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
852 | ]
853 | },
854 | {
855 | "cell_type": "code",
856 | "execution_count": 82,
857 | "metadata": {},
858 | "outputs": [
859 | {
860 | "name": "stderr",
861 | "output_type": "stream",
862 | "text": [
863 | "100%|██████████| 687/687 [00:06<00:00, 111.65it/s]\n"
864 | ]
865 | }
866 | ],
867 | "source": [
868 | "Car_url_Rajasthan = []\n",
869 | "car_url_Rajasthan = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
870 | "for j in tqdm(range(len(car_url_Rajasthan))):\n",
871 | " Car_url_Rajasthan.append(car_url_Rajasthan[j].get_attribute('href'))\n",
872 | "time.sleep(2)"
873 | ]
874 | },
875 | {
876 | "cell_type": "markdown",
877 | "metadata": {},
878 | "source": [
879 |     "**13. Extracting URLs for Kerala**"
880 | ]
881 | },
882 | {
883 | "cell_type": "code",
884 | "execution_count": 83,
885 | "metadata": {},
886 | "outputs": [],
887 | "source": [
888 | "url = \"https://www.cardekho.com/used-cars+in+kerala\"\n",
889 | "driver.get(url)\n",
890 | "time.sleep(2)"
891 | ]
892 | },
893 | {
894 | "cell_type": "code",
895 | "execution_count": 84,
896 | "metadata": {},
897 | "outputs": [
898 | {
899 | "name": "stderr",
900 | "output_type": "stream",
901 | "text": [
902 | "100%|██████████| 400/400 [10:11<00:00, 1.53s/it]\n"
903 | ]
904 | }
905 | ],
906 | "source": [
907 | "from tqdm import tqdm\n",
908 | "for _ in tqdm(range(0,400)):\n",
909 | " time.sleep(0.5)\n",
910 | " driver.execute_script(\"window.scrollBy(0,1500)\",\"\")\n",
911 | " time.sleep(1)\n",
912 | " driver.execute_script(\"window.scrollBy(0,-500)\")"
913 | ]
914 | },
915 | {
916 | "cell_type": "code",
917 | "execution_count": 85,
918 | "metadata": {},
919 | "outputs": [
920 | {
921 | "name": "stderr",
922 | "output_type": "stream",
923 | "text": [
924 | "100%|██████████| 18/18 [00:00<00:00, 117.58it/s]\n"
925 | ]
926 | }
927 | ],
928 | "source": [
929 | "Car_url_Kerala = []\n",
930 | "car_url_Kerala = driver.find_elements_by_xpath('//div[@class=\"gsc_col-xs-7 carsName\"]/a') \n",
931 | "for j in tqdm(range(len(car_url_Kerala))):\n",
932 | " Car_url_Kerala.append(car_url_Kerala[j].get_attribute('href'))\n",
933 | "time.sleep(2)"
934 | ]
935 | },
936 | {
937 | "cell_type": "code",
938 | "execution_count": 87,
939 | "metadata": {},
940 | "outputs": [],
941 | "source": [
942 | "Car_url = []"
943 | ]
944 | },
945 | {
946 | "cell_type": "code",
947 | "execution_count": 89,
948 | "metadata": {},
949 | "outputs": [],
950 | "source": [
951 | "Car_url = Car_url_Kerala + Car_url_Rajasthan + Car_url_Haryana + Car_url_TamilNadu + Car_url_UttarPradesh + Car_url_Karnataka + Car_url_telangana + Car_url_gurgaon + Car_url_delhi_ncr + Car_url_chennai + Car_url_ahmedabad"
952 | ]
953 | },
954 | {
955 | "cell_type": "code",
956 | "execution_count": 90,
957 | "metadata": {},
958 | "outputs": [
959 | {
960 | "data": {
961 | "text/plain": [
962 | "12305"
963 | ]
964 | },
965 | "execution_count": 90,
966 | "metadata": {},
967 | "output_type": "execute_result"
968 | }
969 | ],
970 | "source": [
971 | "len(Car_url)"
972 | ]
973 | },
974 | {
975 | "cell_type": "markdown",
976 | "metadata": {},
977 | "source": [
978 |     "#### **Creating an Excel file of URLs for further web scraping**"
979 | ]
980 | },
981 | {
982 | "cell_type": "code",
983 | "execution_count": 3,
984 | "metadata": {},
985 | "outputs": [],
986 | "source": [
987 | "import pandas as pd"
988 | ]
989 | },
990 | {
991 | "cell_type": "code",
992 | "execution_count": 97,
993 | "metadata": {},
994 | "outputs": [],
995 | "source": [
996 | "Car_URL = pd.DataFrame({})\n",
997 | "Car_URL['Urls'] = Car_url"
998 | ]
999 | },
1000 | {
1001 | "cell_type": "code",
1002 | "execution_count": 98,
1003 | "metadata": {},
1004 | "outputs": [],
1005 | "source": [
1006 | "Car_URL.to_excel('Car_url.xlsx', index = False)"
1007 | ]
1008 | },
1009 | {
1010 | "cell_type": "markdown",
1011 | "metadata": {},
1012 | "source": [
1013 |     "## **Part 2 : Scraping Features from Individual Links**"
1014 | ]
1015 | },
1016 | {
1017 | "cell_type": "markdown",
1018 | "metadata": {},
1019 | "source": [
1020 |     "### **Importing the Excel file containing the URLs**"
1021 | ]
1022 | },
1023 | {
1024 | "cell_type": "code",
1025 | "execution_count": 6,
1026 | "metadata": {},
1027 | "outputs": [],
1028 | "source": [
1029 | "import pandas as pd"
1030 | ]
1031 | },
1032 | {
1033 | "cell_type": "code",
1034 | "execution_count": 4,
1035 | "metadata": {},
1036 | "outputs": [],
1037 | "source": [
1038 | "df = pd.read_excel('Car_url.xlsx')"
1039 | ]
1040 | },
1041 | {
1042 | "cell_type": "code",
1043 | "execution_count": 5,
1044 | "metadata": {},
1045 | "outputs": [
1046 | {
1047 | "data": {
1048 | "text/plain": [
1049 | "(12305, 1)"
1050 | ]
1051 | },
1052 | "execution_count": 5,
1053 | "metadata": {},
1054 | "output_type": "execute_result"
1055 | }
1056 | ],
1057 | "source": [
1058 | "df.shape"
1059 | ]
1060 | },
1061 | {
1062 | "cell_type": "markdown",
1063 | "metadata": {},
1064 | "source": [
1065 |     "#### **As we have scraped around 12,305 URLs, we will scrape the car details in batches.**"
1066 | ]
1067 | },
1068 | {
1069 | "cell_type": "code",
1070 | "execution_count": 6,
1071 | "metadata": {},
1072 | "outputs": [],
1073 | "source": [
1074 | "# Making Empty lists\n",
1075 | "Location = []\n",
1076 | "Model = []\n",
1077 | "Variant = []\n",
1078 | "Price = []\n",
1079 | "Make_year =[]\n",
1080 | "Fuel_Type = []\n",
1081 | "KMs_driven = []\n",
1082 | "Engine_displacement = []\n",
1083 | "Transmission = []\n",
1084 | "Milage = []\n",
1085 | "Max_power = []\n",
1086 | "Torque = []\n",
1087 | "Seats = []\n",
1088 | "Color = []\n",
1089 | "Gear_Box =[]\n",
1090 | "Steering_Type =[]\n",
1091 | "Front_Brake_Type = []\n",
1092 | "Rear_Brake_Type = []\n",
1093 | "Tyre_Volume = []\n",
1094 | "Cargo_volume = []\n",
1095 | "Engine_Type = []\n",
1096 | "No_of_cylinder = []\n",
1097 | "Value_Configuration = []\n",
1098 |     "Fuel_Supply_System = []\n",
1099 | "Turbo_charger = []\n",
1100 | "Super_charger = []\n",
1101 | "Length = []\n",
1102 | "Width =[]\n",
1103 | "Height = []\n",
1104 | "Gross_weight = []"
1105 | ]
1106 | },
1107 | {
1108 | "cell_type": "code",
1109 | "execution_count": 21,
1110 | "metadata": {},
1111 | "outputs": [],
1112 | "source": [
1113 | "driver=webdriver.Chrome(r'C:\\chromedriver.exe')"
1114 | ]
1115 | },
1116 | {
1117 | "cell_type": "markdown",
1118 | "metadata": {},
1119 | "source": [
1120 |     "**Extracting details for Batch 1 (first 500 URLs)**"
1121 | ]
1122 | },
1123 | {
1124 | "cell_type": "code",
1125 | "execution_count": 16,
1126 | "metadata": {},
1127 | "outputs": [
1128 | {
1129 | "name": "stderr",
1130 | "output_type": "stream",
1131 | "text": [
1132 | "100%|██████████| 100/100 [07:57<00:00, 4.78s/it]\n"
1133 | ]
1134 | }
1135 | ],
1136 | "source": [
1137 | "from tqdm import tqdm\n",
1138 | "for i in tqdm(df['Urls'][9800:9900]):\n",
1139 | " driver.get(i)\n",
1140 | " time.sleep(0.5)\n",
1141 | " \n",
1142 | " # Extracting Car Model via xpath\n",
1143 | " try :\n",
1144 | " model = driver.find_element_by_xpath('//div[@class=\"gsc_col-xs-12\"]/h1')\n",
1145 | " Model.append(model.text[5:])\n",
1146 | " except NoSuchElementException:\n",
1147 | " try :\n",
1148 | " model = driver.find_element_by_xpath('//div[@class=\"gsc_container_hold\"]/div/h1[2]')\n",
1149 | " Model.append(model.text) \n",
1150 | " except NoSuchElementException:\n",
1151 | " pass\n",
1152 | " \n",
1153 | " #clicking to view all specifications\n",
1154 | " try:\n",
1155 | " view_more = driver.find_element_by_xpath(\"//*[text() = 'View All Specifications' or text() = 'View More']\")\n",
1156 | " driver.execute_script(\"arguments[0].scrollIntoView();\", view_more)\n",
1157 | " driver.execute_script(\"arguments[0].click();\", view_more)\n",
1158 | " \n",
1159 | " except NoSuchElementException:\n",
1160 | " try:\n",
1161 | " Button= driver.find_element_by_xpath('//*[@id=\"topspec\"]/div[2]/a')\n",
1162 | " Button.click()\n",
1163 | " time.sleep(1)\n",
1164 | " except NoSuchElementException:\n",
1165 | " pass\n",
1166 | " \n",
1167 | " time.sleep(0.75)\n",
1168 | " # Extracting Car Price via xpath\n",
1169 | " try :\n",
1170 | " price = driver.find_element_by_xpath('//div[@class=\"priceSection\"]/span[2]')\n",
1171 | " Price.append(price.text)\n",
1172 | " except NoSuchElementException:\n",
1173 | " try :\n",
1174 | " price = driver.find_element_by_xpath('//div[@class=\"gsc_container_hold\"]/span[1]/span')\n",
1175 | " Price.append(price.text)\n",
1176 | " except NoSuchElementException:\n",
1177 | " pass\n",
1178 | " \n",
1179 | " # Extracting Car Make Year via xpath\n",
1180 | " try :\n",
1181 | " year = driver.find_element_by_xpath('//*[text()=\"Make Year\"]/following-sibling::div')\n",
1182 | " Make_year.append(year.text) \n",
1183 | " except NoSuchElementException:\n",
1184 | " try :\n",
1185 | " year = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[1]/div/div')\n",
1186 | " Make_year.append(year.text)\n",
1187 | " except NoSuchElementException:\n",
1188 | " pass\n",
1189 | " \n",
1190 | " # Extracting Car Fuel Type via xpath\n",
1191 | " try :\n",
1192 | " fuel = driver.find_element_by_xpath('//*[text()=\"Fuel\"]/following-sibling::div')\n",
1193 | " Fuel_Type.append(fuel.text)\n",
1194 | " except NoSuchElementException:\n",
1195 | " try :\n",
1196 | " fuel = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[5]/div/div')\n",
1197 | " Fuel_Type.append(fuel.text)\n",
1198 | " except NoSuchElementException:\n",
1199 | " Fuel_Type.append('-')\n",
1200 | " \n",
1201 | " # Extracting KMS driven via xpath\n",
1202 | " try :\n",
1203 | " kms = driver.find_element_by_xpath('//*[text()=\"KMs Driven\"]/following-sibling::div')\n",
1204 | " KMs_driven.append(kms.text.replace('Kms',''))\n",
1205 | " except NoSuchElementException:\n",
1206 | " try :\n",
1207 | " kms = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[3]/div/div')\n",
1208 | " KMs_driven.append(kms.text.replace('kms',''))\n",
1209 | " except NoSuchElementException:\n",
1210 | " pass\n",
1211 | " \n",
1212 |     "    # Extracting Engine_displacement via xpath\n",
1213 | " try :\n",
1214 | " engine_disp = driver.find_element_by_xpath('//*[text()=\"Engine Displacement\"]/following-sibling::div')\n",
1215 | " Engine_displacement.append(engine_disp.text.replace('CC','')) \n",
1216 | " except NoSuchElementException:\n",
1217 | " try :\n",
1218 | " engine_disp = driver.find_element_by_xpath('//*[text()=\"Engine\"]/following-sibling::div')\n",
1219 | " Engine_displacement.append(engine_disp.text.replace('CC',''))\n",
1220 | " except NoSuchElementException:\n",
1221 | " pass\n",
1222 | " \n",
1223 | " # Extracting Transmission via xpath\n",
1224 | " try :\n",
1225 | " transmission = driver.find_element_by_xpath('//*[text()=\"Transmission\"]/following-sibling::div')\n",
1226 | " Transmission.append(transmission.text)\n",
1227 | " except NoSuchElementException:\n",
1228 | " try :\n",
1229 | " transmission = driver.find_element_by_xpath('//div[@class=\"GenDetailBox\"]/ul/li[6]/div/div')\n",
1230 | " Transmission.append(transmission.text)\n",
1231 | " except NoSuchElementException:\n",
1232 | " pass\n",
1233 | " time.sleep(0.25)\n",
1234 | " # Extracting Milage via xpath\n",
1235 | " try :\n",
1236 | " milage = driver.find_element_by_xpath('//*[text()=\"Mileage\"]/following-sibling::div')\n",
1237 | " Milage.append(milage.text.replace('kmpl',''))\n",
1238 | " \n",
1239 | " except NoSuchElementException:\n",
1240 | " Milage.append('-')\n",
1241 | " \n",
1242 | " # Extracting Max_power via xpath\n",
1243 | " try :\n",
1244 | " maxbhp = driver.find_element_by_xpath('//*[text()=\"Max Power\"]/following-sibling::div')\n",
1245 | " Max_power.append(maxbhp.text.replace('bhp',''))\n",
1246 | " \n",
1247 | " except NoSuchElementException:\n",
1248 | " Max_power.append('-')\n",
1249 | " \n",
1250 | " # Extracting Torque via xpath\n",
1251 | " try :\n",
1252 | " torque = driver.find_element_by_xpath('//*[text()=\"Torque\"]/following-sibling::div')\n",
1253 | " Torque.append(torque.text.replace('Nm',''))\n",
1254 | " \n",
1255 | " except NoSuchElementException:\n",
1256 | " Torque.append('-') \n",
1257 | " \n",
1258 | " # Extracting Seating capacity via xpath\n",
1259 | " try :\n",
1260 | " seats = driver.find_element_by_xpath('//*[text()=\"Seating Capacity\"]/following-sibling::div')\n",
1261 | " Seats.append(seats.text)\n",
1262 | " \n",
1263 | " except NoSuchElementException:\n",
1264 | " Seats.append('-') \n",
1265 | " \n",
1266 | " # Extracting color via xpath\n",
1267 | " try :\n",
1268 | " color = driver.find_element_by_xpath('//*[text()=\"Color\"]/following-sibling::div')\n",
1269 | " Color.append(color.text) \n",
1270 | " except NoSuchElementException:\n",
1271 | " Color.append('-')\n",
1272 | " \n",
1273 | " # Extracting Gear_Box via xpath\n",
1274 | " try :\n",
1275 | " gear_Box = driver.find_element_by_xpath('//*[text()=\"Gear Box\"]/following-sibling::div')\n",
1276 | " Gear_Box.append(gear_Box.text)\n",
1277 | " \n",
1278 | " except NoSuchElementException:\n",
1279 | " Gear_Box.append('-')\n",
1280 | " \n",
1281 | " # Extracting Steering_Type via xpath\n",
1282 | " try :\n",
1283 | " steering_Type = driver.find_element_by_xpath('//*[text()=\"Steering Type\"]/following-sibling::div')\n",
1284 | " Steering_Type.append(steering_Type.text)\n",
1285 | " \n",
1286 | " except NoSuchElementException:\n",
1287 | " Steering_Type.append('-')\n",
1288 | " \n",
1289 | " # Extracting Front_Brake_Type via xpath\n",
1290 | " try :\n",
1291 | " front_Brake_Type = driver.find_element_by_xpath('//*[text()=\"Front Brake Type\"]/following-sibling::div')\n",
1292 | " Front_Brake_Type.append(front_Brake_Type.text)\n",
1293 | " \n",
1294 | " except NoSuchElementException:\n",
1295 | " Front_Brake_Type.append('-')\n",
1296 | " \n",
1297 | " # Extracting Rear_Brake_Type via xpath\n",
1298 | " try :\n",
1299 | " rear_Brake_Type = driver.find_element_by_xpath('//*[text()=\"Rear Brake Type\"]/following-sibling::div')\n",
1300 | " Rear_Brake_Type.append(rear_Brake_Type.text)\n",
1301 | " \n",
1302 | " except NoSuchElementException:\n",
1303 | " Rear_Brake_Type.append('-')\n",
1304 | " \n",
1305 | " # Extracting Tyre_Volume via xpath\n",
1306 | " try :\n",
1307 | " tyre_Volume = driver.find_element_by_xpath('//*[text()=\"Tyre Type\"]/following-sibling::div')\n",
1308 | " Tyre_Volume.append(tyre_Volume.text)\n",
1309 | " \n",
1310 | " except NoSuchElementException:\n",
1311 | " Tyre_Volume.append('-')\n",
1312 | " \n",
1313 | " # Extracting Engine_Type via xpath\n",
1314 | " try :\n",
1315 | " engine_Type = driver.find_element_by_xpath('//*[text()=\"Engine Type\"]/following-sibling::div')\n",
1316 | " Engine_Type.append(engine_Type.text)\n",
1317 | " \n",
1318 | " except NoSuchElementException:\n",
1319 | " Engine_Type.append('-')\n",
1320 | " \n",
1321 | " # Extracting No_of_cylinder via xpath\n",
1322 | " try :\n",
1323 | " no_of_cylinder = driver.find_element_by_xpath('//*[text()=\"No of Cylinder\"]/following-sibling::div')\n",
1324 | " No_of_cylinder.append(no_of_cylinder.text) \n",
1325 | " except NoSuchElementException:\n",
1326 | " try :\n",
1327 | " no_of_cylinder = driver.find_element_by_xpath('//*[text()=\"No Of Cylinder\"]/following-sibling::div')\n",
1328 | " No_of_cylinder.append(no_of_cylinder.text)\n",
1329 | " except NoSuchElementException:\n",
1330 | " pass\n",
1331 | " \n",
1332 | " # Extracting Value_Configuration via xpath\n",
1333 | " try :\n",
1334 | " value_Configuration = driver.find_element_by_xpath('//*[text()=\"Value Configuration\"]/following-sibling::div')\n",
1335 | " Value_Configuration.append(value_Configuration.text) \n",
1336 | " except NoSuchElementException:\n",
1337 | " try :\n",
1338 | " value_Configuration = driver.find_element_by_xpath('//*[text()=\"Valve Configuration\"]/following-sibling::div')\n",
1339 | " Value_Configuration.append(value_Configuration.text) \n",
1340 | " except NoSuchElementException:\n",
1341 | " pass\n",
1342 | " \n",
1343 | " # Extracting Turbo_charger via xpath\n",
1344 | " try :\n",
1345 | " turbo_charger = driver.find_element_by_xpath('//*[text()=\"Turbo Charger\"]/following-sibling::div')\n",
1346 | " Turbo_charger.append(turbo_charger.text)\n",
1347 | " \n",
1348 | " except NoSuchElementException:\n",
1349 | " Turbo_charger.append('-')\n",
1350 | " \n",
1351 | " # Extracting Super_charger via xpath\n",
1352 | " try :\n",
1353 | " super_charger = driver.find_element_by_xpath('//*[text()=\"Super Charger\"]/following-sibling::div')\n",
1354 | " Super_charger.append(super_charger.text) \n",
1355 | " except NoSuchElementException:\n",
1356 | " try :\n",
1357 | " super_charger = driver.find_element_by_xpath('//*[text()=\"superCharger\"]/following-sibling::div')\n",
1358 | " Super_charger.append(super_charger.text) \n",
1359 | " except NoSuchElementException:\n",
1360 | " Super_charger.append('-')\n",
1361 | " \n",
1362 | " # Extracting Length via xpath\n",
1363 | " try :\n",
1364 | " length = driver.find_element_by_xpath('//*[text()=\"Length\"]/following-sibling::div')\n",
1365 | " Length.append(length.text.replace('mm',''))\n",
1366 | " \n",
1367 | " except NoSuchElementException:\n",
1368 | " Length.append('-')\n",
1369 | " \n",
1370 | " # Extracting Width via xpath\n",
1371 | " try :\n",
1372 | " width = driver.find_element_by_xpath('//*[text()=\"Width\"]/following-sibling::div')\n",
1373 | " Width.append(width.text.replace('mm',''))\n",
1374 | " \n",
1375 | " except NoSuchElementException:\n",
1376 | " Width.append('-')\n",
1377 | " \n",
1378 | " # Extracting Height via xpath\n",
1379 | " try :\n",
1380 | " height = driver.find_element_by_xpath('//*[text()=\"Height\"]/following-sibling::div')\n",
1381 | " Height.append(height.text.replace('mm',''))\n",
1382 | " \n",
1383 | " except NoSuchElementException:\n",
1384 | " Height.append('-')\n",
1385 | " "
1386 | ]
1387 | },
1388 | {
1389 | "cell_type": "code",
1390 | "execution_count": 18,
1391 | "metadata": {
1392 | "scrolled": true
1393 | },
1394 | "outputs": [
1395 | {
1396 | "data": {
1582 | "text/plain": [
1583 | " Car Model Make Year Fuel Type KMs driven \\\n",
1584 | "0 BMW 5 Series 520d 2012 Diesel 56,000 \n",
1585 | "1 Audi Q3 35 TDI Quattro Premium Plus 2016 Diesel 97,000 \n",
1586 | "2 Hyundai Creta SX Opt Diesel AT 2021 Diesel 800 \n",
1587 | "3 Maruti Baleno Alpha CVT 2020 Petrol 13,800 \n",
1588 | "4 Tata Tiago 1.2 Revotron XZ 2018 Petrol 20,463 \n",
1589 | "\n",
1590 | " Engine Displacement(CC) Transmission Milage(kmpl) Max Power(bhp) \\\n",
1591 | "0 1995 Automatic 18.48 177 \n",
1592 | "1 1968 Automatic 15.73 174.33 \n",
1593 | "2 1493 Automatic 18.5 113.42 \n",
1594 | "3 1197 Automatic 19.56 81.80 \n",
1595 | "4 1199 Manual 23.84 84 \n",
1596 | "\n",
1597 | " Torque(Nm) Seating Capacity Color Gear Box \\\n",
1598 | "0 35.7@ 1,750-3,000(kgm@ rpm) 5 Brown 6 Speed \n",
1599 | "1 380@ 1750-2500rpm 5 Silver 7-Speed S-Tronic \n",
1600 | "2 250nm@ 1500-2750rpm 5 Grey 6-Speed \n",
1601 | "3 113@ 4200rpm 5 Blue CVT \n",
1602 | "4 114@ 3500rpm White 5 Speed \n",
1603 | "\n",
1604 | " Steering Type Front Brake Type Rear Brake Type Tyre Volume \\\n",
1605 | "0 Power Ventilated discs Ventilated discs Tubeless,Radial \n",
1606 | "1 Power Ventilated Disc Drum Tubeless,Radial \n",
1607 | "2 Power Disc Disc Tubeless,Radial \n",
1608 | "3 Electric Disc Drum Tubeless,Radial \n",
1609 | "4 \n",
1610 | "\n",
1611 | " Engine Type No of Cylinder Turbo Charger Super Charger Length(mm) \\\n",
1612 | "0 In-Line Engine 4 No Yes 4841 \n",
1613 | "1 TDI Diesel Engine 4 Yes No 4385 \n",
1614 | "2 1.5L CRDi Diesel 4 Yes - 4300 \n",
1615 | "3 1.2L VVT Engine 4 No No 3995 \n",
1616 | "4 \n",
1617 | "\n",
1618 | " Width(mm) Height(mm) Price(Rs) \n",
1619 | "0 1846 1468 10.9 Lakh* \n",
1620 | "1 2019 1608 17.5 Lakh* \n",
1621 | "2 1790 1635 20.25 Lakh* \n",
1622 | "3 1745 1510 9 Lakh* \n",
1623 | "4 4.3 Lakh* "
1624 | ]
1625 | },
1626 | "execution_count": 18,
1627 | "metadata": {},
1628 | "output_type": "execute_result"
1629 | }
1630 | ],
1631 | "source": [
1632 | "data = list(zip(Model, Make_year, Fuel_Type, KMs_driven, Engine_displacement, Transmission, Milage,\n",
1633 | " Max_power, Torque, Seats, Color, Gear_Box, Steering_Type, Front_Brake_Type, Rear_Brake_Type,\n",
1634 | " Tyre_Volume, Engine_Type, No_of_cylinder,\n",
1635 | " Turbo_charger, Super_charger, Length, Width, Height, Price))\n",
1636 | "Batch7 = pd.DataFrame(data, columns=['Car Model', 'Make Year', 'Fuel Type', 'KMs driven', 'Engine Displacement(CC)',\n",
1637 | " 'Transmission', 'Milage(kmpl)', 'Max Power(bhp)', 'Torque(Nm)', 'Seating Capacity',\n",
1638 | " 'Color', 'Gear Box', 'Steering Type', 'Front Brake Type', 'Rear Brake Type',\n",
1639 | " 'Tyre Volume', 'Engine Type', 'No of Cylinder', \n",
1640 | " 'Turbo Charger', 'Super Charger', 'Length(mm)', 'Width(mm)',\n",
1641 | " 'Height(mm)', 'Price(Rs)'])\n",
1642 | "\n",
1643 | "pd.set_option('display.max_columns', None)\n",
1644 | "Batch7.head(5)"
1645 | ]
1646 | },
1647 | {
1648 | "cell_type": "code",
1649 | "execution_count": 198,
1650 | "metadata": {},
1651 | "outputs": [
1652 | {
1653 | "data": {
1654 | "text/plain": [
1655 | "500"
1656 | ]
1657 | },
1658 | "execution_count": 198,
1659 | "metadata": {},
1660 | "output_type": "execute_result"
1661 | }
1662 | ],
1663 | "source": [
1664 | "len(Batch1)"
1665 | ]
1666 | },
1667 | {
1668 | "cell_type": "markdown",
1669 | "metadata": {},
1670 | "source": [
1671 |     "### **For the next batches, the same code as Batch 1 is re-run with a different URL slice each time.**"
1672 | ]
1673 | },
1674 | {
1675 | "cell_type": "code",
1676 | "execution_count": 14,
1677 | "metadata": {},
1678 | "outputs": [
1679 | {
1680 | "data": {
1681 | "text/plain": [
1682 | "994"
1683 | ]
1684 | },
1685 | "execution_count": 14,
1686 | "metadata": {},
1687 | "output_type": "execute_result"
1688 | }
1689 | ],
1690 | "source": [
1691 | "len(Batch2)"
1692 | ]
1693 | },
1694 | {
1695 | "cell_type": "code",
1696 | "execution_count": 22,
1697 | "metadata": {},
1698 | "outputs": [
1699 | {
1700 | "data": {
1701 | "text/plain": [
1702 | "1457"
1703 | ]
1704 | },
1705 | "execution_count": 22,
1706 | "metadata": {},
1707 | "output_type": "execute_result"
1708 | }
1709 | ],
1710 | "source": [
1711 | "len(Batch3)"
1712 | ]
1713 | },
1714 | {
1715 | "cell_type": "code",
1716 | "execution_count": 27,
1717 | "metadata": {},
1718 | "outputs": [
1719 | {
1720 | "data": {
1721 | "text/plain": [
1722 | "3436"
1723 | ]
1724 | },
1725 | "execution_count": 27,
1726 | "metadata": {},
1727 | "output_type": "execute_result"
1728 | }
1729 | ],
1730 | "source": [
1731 | "len(Batch4)"
1732 | ]
1733 | },
1734 | {
1735 | "cell_type": "code",
1736 | "execution_count": 16,
1737 | "metadata": {},
1738 | "outputs": [
1739 | {
1740 | "data": {
1741 | "text/plain": [
1742 | "2430"
1743 | ]
1744 | },
1745 | "execution_count": 16,
1746 | "metadata": {},
1747 | "output_type": "execute_result"
1748 | }
1749 | ],
1750 | "source": [
1751 | "len(Batch5)"
1752 | ]
1753 | },
1754 | {
1755 | "cell_type": "code",
1756 | "execution_count": 24,
1757 | "metadata": {},
1758 | "outputs": [
1759 | {
1760 | "data": {
1761 | "text/plain": [
1762 | "740"
1763 | ]
1764 | },
1765 | "execution_count": 24,
1766 | "metadata": {},
1767 | "output_type": "execute_result"
1768 | }
1769 | ],
1770 | "source": [
1771 | "len(Batch6)"
1772 | ]
1773 | },
1774 | {
1775 | "cell_type": "code",
1776 | "execution_count": 19,
1777 | "metadata": {},
1778 | "outputs": [
1779 | {
1780 | "data": {
1781 | "text/plain": [
1782 | "1612"
1783 | ]
1784 | },
1785 | "execution_count": 19,
1786 | "metadata": {},
1787 | "output_type": "execute_result"
1788 | }
1789 | ],
1790 | "source": [
1791 | "len(Batch7)"
1792 | ]
1793 | },
1794 | {
1795 | "cell_type": "markdown",
1796 | "metadata": {},
1797 | "source": [
1798 |     "#### **Exporting batch-wise data to Excel files**"
1799 | ]
1800 | },
1801 | {
1802 | "cell_type": "code",
1803 | "execution_count": 199,
1804 | "metadata": {},
1805 | "outputs": [],
1806 | "source": [
1807 | "# Saving Batch1 data in excel file\n",
1808 | "Batch1.to_excel('Batch1 (0-500).xlsx', index = False)"
1809 | ]
1810 | },
1811 | {
1812 | "cell_type": "code",
1813 | "execution_count": 15,
1814 | "metadata": {},
1815 | "outputs": [],
1816 | "source": [
1817 |     "# Saving Batch2 data in excel file\n",
1818 |     "Batch2.to_excel('Batch2 (500-1500).xlsx', index = False)"
1819 | ]
1820 | },
1821 | {
1822 | "cell_type": "code",
1823 | "execution_count": 23,
1824 | "metadata": {},
1825 | "outputs": [],
1826 | "source": [
1827 | "# Saving batch 3 data in excel \n",
1828 | "Batch3.to_excel('Batch3 (1500-3000).xlsx', index = False)"
1829 | ]
1830 | },
1831 | {
1832 | "cell_type": "code",
1833 | "execution_count": 28,
1834 | "metadata": {},
1835 | "outputs": [],
1836 | "source": [
1837 | "# Saving Batch4 data in excel file 3000-5000\n",
1838 | "Batch4.to_excel('Batch 4 (3000 - 5000).xlsx', index = False)"
1839 | ]
1840 | },
1841 | {
1842 | "cell_type": "code",
1843 | "execution_count": 17,
1844 | "metadata": {},
1845 | "outputs": [],
1846 | "source": [
1847 |     "# Saving Batch5 data in excel file 5000 - 7500\n",
1848 |     "Batch5.to_excel('Batch 5 (5000-7500).xlsx', index = False)"
1849 | ]
1850 | },
1851 | {
1852 | "cell_type": "code",
1853 | "execution_count": 25,
1854 | "metadata": {},
1855 | "outputs": [],
1856 | "source": [
1857 |     "# Saving Batch6 data in excel file 7500 - 8250\n",
1858 | "Batch6.to_excel('Batch 6 (7500-8250).xlsx', index = False)"
1859 | ]
1860 | },
1861 | {
1862 | "cell_type": "code",
1863 | "execution_count": 20,
1864 | "metadata": {},
1865 | "outputs": [],
1866 | "source": [
1867 |     "# Saving Batch7 data in excel file 8250 - 9800\n",
1868 | "Batch7.to_excel('Batch 7 (8250-9800).xlsx', index = False)"
1869 | ]
1870 | },
1871 | {
1872 | "cell_type": "code",
1873 | "execution_count": 21,
1874 | "metadata": {},
1875 | "outputs": [
1876 | {
1877 | "data": {
1878 | "text/plain": [
1879 | "(1612, 24)"
1880 | ]
1881 | },
1882 | "execution_count": 21,
1883 | "metadata": {},
1884 | "output_type": "execute_result"
1885 | }
1886 | ],
1887 | "source": [
1888 | "Batch7.shape"
1889 | ]
1890 | },
1891 | {
1892 | "cell_type": "markdown",
1893 | "metadata": {},
1894 | "source": [
1895 | "## **Summary** :\n",
1896 |     "- **We have scraped details of more than 11,000 cars, with 24 features each.**"
1897 | ]
1898 | },
1899 | {
1900 | "cell_type": "code",
1901 | "execution_count": null,
1902 | "metadata": {},
1903 | "outputs": [],
1904 | "source": []
1905 | }
1906 | ],
1907 | "metadata": {
1908 | "kernelspec": {
1909 | "display_name": "Python 3",
1910 | "language": "python",
1911 | "name": "python3"
1912 | },
1913 | "language_info": {
1914 | "codemirror_mode": {
1915 | "name": "ipython",
1916 | "version": 3
1917 | },
1918 | "file_extension": ".py",
1919 | "mimetype": "text/x-python",
1920 | "name": "python",
1921 | "nbconvert_exporter": "python",
1922 | "pygments_lexer": "ipython3",
1923 | "version": "3.8.5"
1924 | }
1925 | },
1926 | "nbformat": 4,
1927 | "nbformat_minor": 4
1928 | }
1929 |
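
A hedged addendum to Part 1 above: city pages (Bangalore, Chennai, Gurgaon) can overlap with their surrounding state/region pages (Karnataka, Tamil Nadu, Haryana, Delhi-NCR), so the concatenated Car_url list may contain duplicate listings. A minimal sketch, assuming the Car_url list built in Part 1, that de-duplicates before export:

import pandas as pd

# Assumes Car_url is the combined list of listing URLs from Part 1.
Car_URL = pd.DataFrame({'Urls': Car_url})
# Drop listings that appeared under both a city page and its state page.
Car_URL = Car_URL.drop_duplicates(subset='Urls').reset_index(drop=True)
Car_URL.to_excel('Car_url.xlsx', index=False)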
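
A second sketch, for Part 2: the extraction branches that end in pass (Model, Make_year, KMs_driven, Engine_displacement, Transmission, No_of_cylinder, Value_Configuration) append nothing when every XPath fails, so the per-feature lists can drift out of alignment and the final zip() silently drops rows. A small helper, assuming the same Selenium 3 find_element_by_xpath API used in the notebook, that always falls back to a placeholder:

from selenium.common.exceptions import NoSuchElementException

def safe_text(driver, xpaths, strip=''):
    """Return the text of the first matching XPath, or '-' if none match."""
    for xp in xpaths:
        try:
            text = driver.find_element_by_xpath(xp).text  # Selenium 3 API, as above
            return text.replace(strip, '') if strip else text
        except NoSuchElementException:
            continue
    return '-'  # placeholder keeps every feature list the same length

# Hypothetical usage inside the loop over df['Urls']:
# Fuel_Type.append(safe_text(driver, [
#     '//*[text()="Fuel"]/following-sibling::div',
#     '//div[@class="GenDetailBox"]/ul/li[5]/div/div',
# ]))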
--------------------------------------------------------------------------------
/Project Used Car price predication using ML/Car price prediction using ML ppt.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Car price prediction using ML ppt.pptx
--------------------------------------------------------------------------------
/Project Used Car price predication using ML/Project Report Car Price Predication .pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Project Used Car price predication using ML/Project Report Car Price Predication .pdf
--------------------------------------------------------------------------------
/Project Used Car price predication using ML/Read me.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Internship
2 | ## This repository includes projects done at Flip Robo as part of a Data Science & Machine Learning internship
3 | - *Internship Duration - **6 Months** (Sep 2021 - March 2022)*
4 | - *For some projects the data set was provided, while for the **remaining projects web scraping was done using Selenium** before model building*
5 | - *For each project a **detailed project report and presentation were prepared along with the ML model Jupyter notebook**, which you can find in the respective project directory*
6 | - *A literature review was done for all projects; you can find the **reference research papers** used in each project in its project report*
7 |
8 | ### **During the internship I worked on several Web Scraping, Machine Learning and Natural Language Processing (NLP) projects, as mentioned below:**
9 | #### Web Scraping Assignments
10 | 1. Web Scraping Assignment 1 - Beautiful Soup
11 | 2. Web Scraping Assignment 2 - Selenium
12 | 3. Web Scraping Assignment 3 - Selenium Exception Handling
13 | 4. Web Scraping Assignment 4 - Selenium
14 | 5. Worksheet Assignment on ML, Stats and Python
15 |
16 | #### Machine Learning (ML) Projects
17 | 1. [Micro Credit Defaulter {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Micro%20Credit%20Defaulter%20Project)
18 | 2. [Used Car price predication using ML *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Project%20Used%20Car%20price%20predication%20using%20ML)
19 | 3. [Customer Retention in Ecommerce sector {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Project%20Customer%20Retention%20in%20Ecommerce%20sector)
20 | 4. [Flight Price Predication using Machine Learning *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Flight%20Price%20Predication%20using%20Machine%20Learning)
21 | 5. [Surprise Housing - Housing Price Predication & Analysis Project {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Surprise%20Housing%20-%20Housing%20Price%20Predication%20%26%20Analysis%20Project)
22 |
23 | #### Natural Language Processing (NLP) Projects
24 | 1. [Product Review Rating Predication Using NLP *{Data Scraped using Selenium before Model Building}*](https://github.com/Lab-of-Infinity/Internship/tree/main/Product%20Review%20Rating%20Predication%20Using%20NLP)
25 | 2. [Malignant Commentes Classifier - Multi Label Classification Project using NLP {Data Provided}](https://github.com/Lab-of-Infinity/Internship/tree/main/Malignant%20Commentes%20Classifier%20-%20Multi%20Label%20Classification%20Project%20using%20NLP)
26 |
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/Data Description.txt:
--------------------------------------------------------------------------------
1 | MSSubClass: Identifies the type of dwelling involved in the sale.
2 |
3 | 20 1-STORY 1946 & NEWER ALL STYLES
4 | 30 1-STORY 1945 & OLDER
5 | 40 1-STORY W/FINISHED ATTIC ALL AGES
6 | 45 1-1/2 STORY - UNFINISHED ALL AGES
7 | 50 1-1/2 STORY FINISHED ALL AGES
8 | 60 2-STORY 1946 & NEWER
9 | 70 2-STORY 1945 & OLDER
10 | 75 2-1/2 STORY ALL AGES
11 | 80 SPLIT OR MULTI-LEVEL
12 | 85 SPLIT FOYER
13 | 90 DUPLEX - ALL STYLES AND AGES
14 | 120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
15 | 150 1-1/2 STORY PUD - ALL AGES
16 | 160 2-STORY PUD - 1946 & NEWER
17 | 180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
18 | 190 2 FAMILY CONVERSION - ALL STYLES AND AGES
19 |
20 | MSZoning: Identifies the general zoning classification of the sale.
21 |
22 | A Agriculture
23 | C Commercial
24 | FV Floating Village Residential
25 | I Industrial
26 | RH Residential High Density
27 | RL Residential Low Density
28 | RP Residential Low Density Park
29 | RM Residential Medium Density
30 |
31 | LotFrontage: Linear feet of street connected to property
32 |
33 | LotArea: Lot size in square feet
34 |
35 | Street: Type of road access to property
36 |
37 | Grvl Gravel
38 | Pave Paved
39 |
40 | Alley: Type of alley access to property
41 |
42 | Grvl Gravel
43 | Pave Paved
44 | NA No alley access
45 |
46 | LotShape: General shape of property
47 |
48 | Reg Regular
49 | IR1 Slightly irregular
50 | IR2 Moderately Irregular
51 | IR3 Irregular
52 |
53 | LandContour: Flatness of the property
54 |
55 | Lvl Near Flat/Level
56 | Bnk Banked - Quick and significant rise from street grade to building
57 | HLS Hillside - Significant slope from side to side
58 | Low Depression
59 |
60 | Utilities: Type of utilities available
61 |
62 | AllPub All public Utilities (E,G,W,& S)
63 | NoSewr Electricity, Gas, and Water (Septic Tank)
64 | NoSeWa Electricity and Gas Only
65 | ELO Electricity only
66 |
67 | LotConfig: Lot configuration
68 |
69 | Inside Inside lot
70 | Corner Corner lot
71 | CulDSac Cul-de-sac
72 | FR2 Frontage on 2 sides of property
73 | FR3 Frontage on 3 sides of property
74 |
75 | LandSlope: Slope of property
76 |
77 | Gtl Gentle slope
78 | Mod Moderate Slope
79 | Sev Severe Slope
80 |
81 | Neighborhood: Physical locations within Ames city limits
82 |
83 | Blmngtn Bloomington Heights
84 | Blueste Bluestem
85 | BrDale Briardale
86 | BrkSide Brookside
87 | ClearCr Clear Creek
88 | CollgCr College Creek
89 | Crawfor Crawford
90 | Edwards Edwards
91 | Gilbert Gilbert
92 | IDOTRR Iowa DOT and Rail Road
93 | MeadowV Meadow Village
94 | Mitchel Mitchell
95 | Names North Ames
96 | NoRidge Northridge
97 | NPkVill Northpark Villa
98 | NridgHt Northridge Heights
99 | NWAmes Northwest Ames
100 | OldTown Old Town
101 | SWISU South & West of Iowa State University
102 | Sawyer Sawyer
103 | SawyerW Sawyer West
104 | Somerst Somerset
105 | StoneBr Stone Brook
106 | Timber Timberland
107 | Veenker Veenker
108 |
109 | Condition1: Proximity to various conditions
110 |
111 | Artery Adjacent to arterial street
112 | Feedr Adjacent to feeder street
113 | Norm Normal
114 | RRNn Within 200' of North-South Railroad
115 | RRAn Adjacent to North-South Railroad
116 | PosN Near positive off-site feature--park, greenbelt, etc.
117 | PosA Adjacent to positive off-site feature
118 | RRNe Within 200' of East-West Railroad
119 | RRAe Adjacent to East-West Railroad
120 |
121 | Condition2: Proximity to various conditions (if more than one is present)
122 |
123 | Artery Adjacent to arterial street
124 | Feedr Adjacent to feeder street
125 | Norm Normal
126 | RRNn Within 200' of North-South Railroad
127 | RRAn Adjacent to North-South Railroad
128 | PosN Near positive off-site feature--park, greenbelt, etc.
129 | PosA Adjacent to positive off-site feature
130 | RRNe Within 200' of East-West Railroad
131 | RRAe Adjacent to East-West Railroad
132 |
133 | BldgType: Type of dwelling
134 |
135 | 1Fam Single-family Detached
136 | 2FmCon Two-family Conversion; originally built as one-family dwelling
137 | Duplx Duplex
138 | TwnhsE Townhouse End Unit
139 | TwnhsI Townhouse Inside Unit
140 |
141 | HouseStyle: Style of dwelling
142 |
143 | 1Story One story
144 | 1.5Fin One and one-half story: 2nd level finished
145 | 1.5Unf One and one-half story: 2nd level unfinished
146 | 2Story Two story
147 | 2.5Fin Two and one-half story: 2nd level finished
148 | 2.5Unf Two and one-half story: 2nd level unfinished
149 | SFoyer Split Foyer
150 | SLvl Split Level
151 |
152 | OverallQual: Rates the overall material and finish of the house
153 |
154 | 10 Very Excellent
155 | 9 Excellent
156 | 8 Very Good
157 | 7 Good
158 | 6 Above Average
159 | 5 Average
160 | 4 Below Average
161 | 3 Fair
162 | 2 Poor
163 | 1 Very Poor
164 |
165 | OverallCond: Rates the overall condition of the house
166 |
167 | 10 Very Excellent
168 | 9 Excellent
169 | 8 Very Good
170 | 7 Good
171 | 6 Above Average
172 | 5 Average
173 | 4 Below Average
174 | 3 Fair
175 | 2 Poor
176 | 1 Very Poor
177 |
178 | YearBuilt: Original construction date
179 |
180 | YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
181 |
182 | RoofStyle: Type of roof
183 |
184 | Flat Flat
185 | Gable Gable
186 | Gambrel Gambrel (Barn)
187 | Hip Hip
188 | Mansard Mansard
189 | Shed Shed
190 |
191 | RoofMatl: Roof material
192 |
193 | ClyTile Clay or Tile
194 | CompShg Standard (Composite) Shingle
195 | Membran Membrane
196 | Metal Metal
197 | Roll Roll
198 | Tar&Grv Gravel & Tar
199 | WdShake Wood Shakes
200 | WdShngl Wood Shingles
201 |
202 | Exterior1st: Exterior covering on house
203 |
204 | AsbShng Asbestos Shingles
205 | AsphShn Asphalt Shingles
206 | BrkComm Brick Common
207 | BrkFace Brick Face
208 | CBlock Cinder Block
209 | CemntBd Cement Board
210 | HdBoard Hard Board
211 | ImStucc Imitation Stucco
212 | MetalSd Metal Siding
213 | Other Other
214 | Plywood Plywood
215 | PreCast PreCast
216 | Stone Stone
217 | Stucco Stucco
218 | VinylSd Vinyl Siding
219 | Wd Sdng Wood Siding
220 | WdShing Wood Shingles
221 |
222 | Exterior2nd: Exterior covering on house (if more than one material)
223 |
224 | AsbShng Asbestos Shingles
225 | AsphShn Asphalt Shingles
226 | BrkComm Brick Common
227 | BrkFace Brick Face
228 | CBlock Cinder Block
229 | CemntBd Cement Board
230 | HdBoard Hard Board
231 | ImStucc Imitation Stucco
232 | MetalSd Metal Siding
233 | Other Other
234 | Plywood Plywood
235 | PreCast PreCast
236 | Stone Stone
237 | Stucco Stucco
238 | VinylSd Vinyl Siding
239 | Wd Sdng Wood Siding
240 | WdShing Wood Shingles
241 |
242 | MasVnrType: Masonry veneer type
243 |
244 | BrkCmn Brick Common
245 | BrkFace Brick Face
246 | CBlock Cinder Block
247 | None None
248 | Stone Stone
249 |
250 | MasVnrArea: Masonry veneer area in square feet
251 |
252 | ExterQual: Evaluates the quality of the material on the exterior
253 |
254 | Ex Excellent
255 | Gd Good
256 | TA Average/Typical
257 | Fa Fair
258 | Po Poor
259 |
260 | ExterCond: Evaluates the present condition of the material on the exterior
261 |
262 | Ex Excellent
263 | Gd Good
264 | TA Average/Typical
265 | Fa Fair
266 | Po Poor
267 |
268 | Foundation: Type of foundation
269 |
270 | BrkTil Brick & Tile
271 | CBlock Cinder Block
272 | PConc Poured Concrete
273 | Slab Slab
274 | Stone Stone
275 | Wood Wood
276 |
277 | BsmtQual: Evaluates the height of the basement
278 |
279 | Ex Excellent (100+ inches)
280 | Gd Good (90-99 inches)
281 | TA Typical (80-89 inches)
282 | Fa Fair (70-79 inches)
283 | Po Poor (<70 inches)
284 | NA No Basement
285 |
286 | BsmtCond: Evaluates the general condition of the basement
287 |
288 | Ex Excellent
289 | Gd Good
290 | TA Typical - slight dampness allowed
291 | Fa Fair - dampness or some cracking or settling
292 | Po Poor - Severe cracking, settling, or wetness
293 | NA No Basement
294 |
295 | BsmtExposure: Refers to walkout or garden level walls
296 |
297 | Gd Good Exposure
298 | Av Average Exposure (split levels or foyers typically score average or above)
299 | Mn Minimum Exposure
300 | No No Exposure
301 | NA No Basement
302 |
303 | BsmtFinType1: Rating of basement finished area
304 |
305 | GLQ Good Living Quarters
306 | ALQ Average Living Quarters
307 | BLQ Below Average Living Quarters
308 | Rec Average Rec Room
309 | LwQ Low Quality
310 | Unf Unfinished
311 | NA No Basement
312 |
313 | BsmtFinSF1: Type 1 finished square feet
314 |
315 | BsmtFinType2: Rating of basement finished area (if multiple types)
316 |
317 | GLQ Good Living Quarters
318 | ALQ Average Living Quarters
319 | BLQ Below Average Living Quarters
320 | Rec Average Rec Room
321 | LwQ Low Quality
322 | Unf Unfinished
323 | NA No Basement
324 |
325 | BsmtFinSF2: Type 2 finished square feet
326 |
327 | BsmtUnfSF: Unfinished square feet of basement area
328 |
329 | TotalBsmtSF: Total square feet of basement area
330 |
331 | Heating: Type of heating
332 |
333 | Floor Floor Furnace
334 | GasA Gas forced warm air furnace
335 | GasW Gas hot water or steam heat
336 | Grav Gravity furnace
337 | OthW Hot water or steam heat other than gas
338 | Wall Wall furnace
339 |
340 | HeatingQC: Heating quality and condition
341 |
342 | Ex Excellent
343 | Gd Good
344 | TA Average/Typical
345 | Fa Fair
346 | Po Poor
347 |
348 | CentralAir: Central air conditioning
349 |
350 | N No
351 | Y Yes
352 |
353 | Electrical: Electrical system
354 |
355 | SBrkr Standard Circuit Breakers & Romex
356 | FuseA Fuse Box over 60 AMP and all Romex wiring (Average)
357 | FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair)
358 | FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor)
359 | Mix Mixed
360 |
361 | 1stFlrSF: First Floor square feet
362 |
363 | 2ndFlrSF: Second floor square feet
364 |
365 | LowQualFinSF: Low quality finished square feet (all floors)
366 |
367 | GrLivArea: Above grade (ground) living area square feet
368 |
369 | BsmtFullBath: Basement full bathrooms
370 |
371 | BsmtHalfBath: Basement half bathrooms
372 |
373 | FullBath: Full bathrooms above grade
374 |
375 | HalfBath: Half baths above grade
376 |
377 | Bedroom: Bedrooms above grade (does NOT include basement bedrooms)
378 |
379 | Kitchen: Kitchens above grade
380 |
381 | KitchenQual: Kitchen quality
382 |
383 | Ex Excellent
384 | Gd Good
385 | TA Typical/Average
386 | Fa Fair
387 | Po Poor
388 |
389 | TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
390 |
391 | Functional: Home functionality (Assume typical unless deductions are warranted)
392 |
393 | Typ Typical Functionality
394 | Min1 Minor Deductions 1
395 | Min2 Minor Deductions 2
396 | Mod Moderate Deductions
397 | Maj1 Major Deductions 1
398 | Maj2 Major Deductions 2
399 | Sev Severely Damaged
400 | Sal Salvage only
401 |
402 | Fireplaces: Number of fireplaces
403 |
404 | FireplaceQu: Fireplace quality
405 |
406 | Ex Excellent - Exceptional Masonry Fireplace
407 | Gd Good - Masonry Fireplace in main level
408 | TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
409 | Fa Fair - Prefabricated Fireplace in basement
410 | Po Poor - Ben Franklin Stove
411 | NA No Fireplace
412 |
413 | GarageType: Garage location
414 |
415 | 2Types More than one type of garage
416 | Attchd Attached to home
417 | Basment Basement Garage
418 | BuiltIn Built-In (Garage part of house - typically has room above garage)
419 | CarPort Car Port
420 | Detchd Detached from home
421 | NA No Garage
422 |
423 | GarageYrBlt: Year garage was built
424 |
425 | GarageFinish: Interior finish of the garage
426 |
427 | Fin Finished
428 | RFn Rough Finished
429 | Unf Unfinished
430 | NA No Garage
431 |
432 | GarageCars: Size of garage in car capacity
433 |
434 | GarageArea: Size of garage in square feet
435 |
436 | GarageQual: Garage quality
437 |
438 | Ex Excellent
439 | Gd Good
440 | TA Typical/Average
441 | Fa Fair
442 | Po Poor
443 | NA No Garage
444 |
445 | GarageCond: Garage condition
446 |
447 | Ex Excellent
448 | Gd Good
449 | TA Typical/Average
450 | Fa Fair
451 | Po Poor
452 | NA No Garage
453 |
454 | PavedDrive: Paved driveway
455 |
456 | Y Paved
457 | P Partial Pavement
458 | N Dirt/Gravel
459 |
460 | WoodDeckSF: Wood deck area in square feet
461 |
462 | OpenPorchSF: Open porch area in square feet
463 |
464 | EnclosedPorch: Enclosed porch area in square feet
465 |
466 | 3SsnPorch: Three season porch area in square feet
467 |
468 | ScreenPorch: Screen porch area in square feet
469 |
470 | PoolArea: Pool area in square feet
471 |
472 | PoolQC: Pool quality
473 |
474 | Ex Excellent
475 | Gd Good
476 | TA Average/Typical
477 | Fa Fair
478 | NA No Pool
479 |
480 | Fence: Fence quality
481 |
482 | GdPrv Good Privacy
483 | MnPrv Minimum Privacy
484 | GdWo Good Wood
485 | MnWw Minimum Wood/Wire
486 | NA No Fence
487 |
488 | MiscFeature: Miscellaneous feature not covered in other categories
489 |
490 | Elev Elevator
491 | Gar2 2nd Garage (if not described in garage section)
492 | Othr Other
493 | Shed Shed (over 100 SF)
494 | TenC Tennis Court
495 | NA None
496 |
497 | MiscVal: $Value of miscellaneous feature
498 |
499 | MoSold: Month Sold (MM)
500 |
501 | YrSold: Year Sold (YYYY)
502 |
503 | SaleType: Type of sale
504 |
505 | WD Warranty Deed - Conventional
506 | CWD Warranty Deed - Cash
507 | VWD Warranty Deed - VA Loan
508 | New Home just constructed and sold
509 | COD Court Officer Deed/Estate
510 | Con Contract 15% Down payment regular terms
511 | ConLw Contract Low Down payment and low interest
512 | ConLI Contract Low Interest
513 | ConLD Contract Low Down
514 | Oth Other
515 |
516 | SaleCondition: Condition of sale
517 |
518 | Normal Normal Sale
519 | Abnorml Abnormal Sale - trade, foreclosure, short sale
520 | AdjLand Adjoining Land Purchase
521 | Alloca Allocation - two linked properties with separate deeds, typically condo with a garage unit
522 | Family Sale between family members
523 | Partial Home was not completed when last assessed (associated with New Homes)
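524 | 
525 | Note: several fields above (ExterQual, ExterCond, BsmtQual, HeatingQC, KitchenQual, FireplaceQu, GarageQual,
526 | GarageCond, PoolQC) share the Ex/Gd/TA/Fa/Po rating scale and are usually treated as ordered categories when
527 | modelling. A minimal pandas sketch of one such mapping (the integer values are a modelling choice, not part of
528 | the data; df is assumed to be the loaded training table):
529 | 
530 |     quality_map = {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1, 'NA': 0}
531 |     df['BsmtQual'] = df['BsmtQual'].map(quality_map)
532 | 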
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/HOUSING Use Case 2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/HOUSING Use Case 2.pdf
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/sample documentation.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Project-Housing--2---1-/Project-Housing_splitted/sample documentation.docx
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/README.md:
--------------------------------------------------------------------------------
1 | ## Surprise Housing - Housing Price Predication & Analysis Project
2 |
3 | Housing is one of the basic needs of every person around the globe, which makes the housing and real estate
4 | market one of the major contributors to the world's economy. It is a very large market, and many companies work
5 | in the domain. Data science is an important tool for solving problems in the domain, helping companies increase
6 | their overall revenue and profits, improve their marketing strategies, and track changing trends in house sales
7 | and purchases. Predictive modelling, market mix modelling, and recommendation systems are some of the machine
8 | learning techniques used to achieve business goals for housing companies. Our problem is related to one such
9 | housing company.
10 |
11 | A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses
12 | data analytics to purchase houses at a price below their actual values and flip them at a higher price. For the same
13 | purpose, the company has collected a data set from the sale of houses in Australia. The data is provided in the CSV file
14 | below.
15 |
16 | The company is looking at prospective properties to buy in order to enter the market. You are required to build a
17 | model using machine learning to predict the actual value of the prospective properties and decide whether or not
18 | to invest in them. For this, the company wants to know:
19 | • **Which variables are important for predicting the price of a house?**
20 |
21 | • **How do these variables describe the price of the house?**
22 |
23 | ### Business Goal:
24 | You are required to model the price of houses with the available independent variables. This model will then be used
25 | by the management to understand how exactly the prices vary with the variables. They can accordingly adjust the
26 | strategy of the firm and concentrate on areas that will yield high returns. Further, the model will be a good way for
27 | the management to understand the pricing dynamics of a new market.
28 |
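29 | As a rough illustration of this goal, the sketch below fits a regularized linear regression and ranks the variables
30 | by coefficient magnitude. It is a minimal sketch rather than the project's actual pipeline: the file name
31 | `train.csv`, the target column `SalePrice`, and the naive one-hot encoding and imputation are assumptions based on
32 | the dataset described above.
33 | 
34 | ```python
35 | import pandas as pd
36 | from sklearn.linear_model import Lasso
37 | from sklearn.model_selection import train_test_split
38 | 
39 | df = pd.read_csv('train.csv')                        # assumed training file
40 | y = df['SalePrice']                                  # assumed target column
41 | X = pd.get_dummies(df.drop(columns=['SalePrice']))   # naive one-hot encoding of categoricals
42 | X = X.fillna(X.median())                             # crude imputation for missing values
43 | 
44 | X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
45 | 
46 | model = Lasso(alpha=100)                             # L1 penalty shrinks unimportant coefficients to zero
47 | model.fit(X_train, y_train)
48 | print('R^2 on held-out data:', model.score(X_test, y_test))
49 | 
50 | # Variables with the largest absolute coefficients drive the predicted price the most
51 | coefs = pd.Series(model.coef_, index=X.columns)
52 | print(coefs.abs().sort_values(ascending=False).head(10))
53 | ```
54 | 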
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing - Housing Price Predication & Analysis Project.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing - Housing Price Predication & Analysis Project.pdf
--------------------------------------------------------------------------------
/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing Price Predication .pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Surprise Housing - Housing Price Predication & Analysis Project/Surprise Housing Price Predication .pptx
--------------------------------------------------------------------------------
/Web Scraping 1 Assignment/Read me.md:
--------------------------------------------------------------------------------
1 | Web Scraping Assignment 1 using BeautifulSoup
2 | 
--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/Fruits_Cars_ML_google_images.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Web Scraping Selenium Assignment 3/Fruits_Cars_ML_google_images.zip
--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/Selenium Exception Handling Assignment:
--------------------------------------------------------------------------------
1 | Web Scraping Assignment 3
2 | at FlipRobo
3 | On Selenium Exception Handling
4 |
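5 | A minimal sketch of the pattern this assignment covers, assuming Selenium 4 with Chrome and a hypothetical
6 | page URL and element class name (not the assignment's actual code):
7 | 
8 |     from selenium import webdriver
9 |     from selenium.webdriver.common.by import By
10 |     from selenium.common.exceptions import NoSuchElementException, TimeoutException
11 | 
12 |     driver = webdriver.Chrome()
13 |     try:
14 |         driver.get('https://example.com/products')            # hypothetical URL
15 |         price = driver.find_element(By.CLASS_NAME, 'price')   # hypothetical class name
16 |         print(price.text)
17 |     except NoSuchElementException:
18 |         # element missing from the page: log it and move on instead of crashing the scrape
19 |         print('price element not found')
20 |     except TimeoutException:
21 |         print('page took too long to load')
22 |     finally:
23 |         driver.quit()                                         # always release the browser
24 | 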
--------------------------------------------------------------------------------
/Web Scraping Selenium Assignment 3/WEB-SCRAPING-ASSIGNMENT-3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Web Scraping Selenium Assignment 3/WEB-SCRAPING-ASSIGNMENT-3.pdf
--------------------------------------------------------------------------------
/WebScraping Assignment 4 Selenium/Web Scraping Assignment 4:
--------------------------------------------------------------------------------
1 | Web scraping assignment 4 on Selenium
2 |
--------------------------------------------------------------------------------
/Webscraping Assignment 2 Selenium/Webscraping 2.md:
--------------------------------------------------------------------------------
1 | Selenium web scraping assignment 2
2 |
--------------------------------------------------------------------------------
/Worksheet_set_1/Machine Learning Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Machine Learning Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Python Worksheet 1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## 11. Write a python program to find the factorial of a number"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 19,
13 | "metadata": {},
14 | "outputs": [
15 | {
16 | "name": "stdout",
17 | "output_type": "stream",
18 | "text": [
19 | "Enter the Number :30\n",
20 | "Factorial is the product of all positive integers less than or equal to that number\n",
21 | "The Factorial of 30 is 265252859812191058636308480000000\n"
22 | ]
23 | }
24 | ],
25 | "source": [
26 | "# program to find the factorial of a number\n",
27 | "def factorial(num):\n",
28 | " \n",
29 | " print('Factorial is the product of all positive integers less than or equal to that number')\n",
30 | " \n",
31 | "    fact = 1\n",
32 | "    \n",
33 | "    if num >= 1:\n",
34 | "        for i in range(1, num + 1):\n",
35 | "            fact = fact * i\n",
36 | "        print('The Factorial of', num, 'is', fact)\n",
37 | "    \n",
38 | "    elif num == 0:\n",
39 | "        print('The Factorial of 0 is 1')\n",
40 | "    \n",
41 | "    else:  # num < 0\n",
42 | "        print('The factorial of a negative number does not exist')\n",
43 | "    \n",
44 | "num = int(input('Enter the Number :'))\n",
45 | "factorial(num)\n",
46 | " \n"
47 | ]
48 | },
49 | {
50 | "cell_type": "markdown",
51 | "metadata": {},
52 | "source": [
53 | "#### Alternate method using factorial from the math library (math.factorial())"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 24,
59 | "metadata": {},
60 | "outputs": [
61 | {
62 | "name": "stdout",
63 | "output_type": "stream",
64 | "text": [
65 | "Enter the Number :15\n",
66 | "Factorial of 15 is 1307674368000\n"
67 | ]
68 | }
69 | ],
70 | "source": [
71 | "# program to find the factorial of a number\n",
72 | "\n",
73 | "import math\n",
74 | "\n",
75 | "num = int(input('Enter the Number :'))\n",
76 | "print('Factorial of', num, 'is', math.factorial(num))"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "## 12. Write a python program to find whether a number is prime or composite."
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 3,
89 | "metadata": {},
90 | "outputs": [
91 | {
92 | "name": "stdout",
93 | "output_type": "stream",
94 | "text": [
95 | "Enter The Number:7\n",
96 | "7 is a Prime Number\n"
97 | ]
98 | }
99 | ],
100 | "source": [
101 | "num = int(input('Enter The Number:'))\n",
102 | "# for-else: the else clause runs only if the loop finishes without hitting break\n",
103 | "if num > 1:\n",
104 | "    for i in range(2, num // 2 + 1):\n",
105 | "        if num % i == 0:\n",
106 | "            print('The input number', num, 'is a Composite Number')\n",
107 | "            break\n",
108 | "    else:\n",
109 | "        print(num, 'is a Prime Number')\n",
110 | "    \n",
111 | "# 0 and 1 are not considered prime numbers\n",
112 | "# Primes exist only among whole numbers greater than 1 (negative numbers cannot be prime)\n",
113 | "\n",
114 | "else:\n",
115 | "    print(num, 'is neither Prime nor Composite')"
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "## 13. Write a python program to check whether a given string is palindrome or not"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 7,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "name": "stdout",
132 | "output_type": "stream",
133 | "text": [
134 | "Enter the string :1256@#@6521\n",
135 | "The input string is Palindrome\n"
136 | ]
137 | }
138 | ],
139 | "source": [
140 | "Input_str = input('Enter the string :') \n",
141 | "\n",
142 | "def Palindrome(s):\n",
143 | " return s == s[::-1]\n",
144 | "\n",
145 | "# check string is palindrome or not\n",
146 | "\n",
147 | "check = Palindrome(Input_str)\n",
148 | "\n",
149 | "if check:\n",
150 | " print('The input string is Palindrome')\n",
151 | " \n",
152 | "else:\n",
153 | " print('The input string is not Palindrome')"
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "## 14. Write a Python program to get the third side of right-angled triangle from two given sides."
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 11,
166 | "metadata": {},
167 | "outputs": [
168 | {
169 | "name": "stdout",
170 | "output_type": "stream",
171 | "text": [
172 | "Opposite side denoted by x\n",
173 | "Adjacent side denoted by y\n",
174 | "Hypotenuse is denoted by z\n",
175 | "\n",
176 | "\n",
177 | "Which side (x,y,z) do you want to calculatez\n",
178 | "\n",
179 | "\n",
180 | "Input the length of side x: 3\n",
181 | "Input the length of side y: 4\n",
182 | "The length of Hypotenus(z) is 5: \n"
183 | ]
184 | }
185 | ],
186 | "source": [
187 | "import math\n",
188 | "# Print the notation of the three sides for the user\n",
189 | "print('Opposite side denoted by x')\n",
190 | "print('Adjacent side denoted by y')\n",
191 | "print('Hypotenuse is denoted by z')\n",
192 | "print('\\n')\n",
193 | "\n",
194 | "# Take input from the user about which side is to be calculated\n",
195 | "choice = input('Which side (x,y,z) do you want to calculate')\n",
196 | "print('\\n')\n",
197 | "\n",
198 | "# Calculate the remaining side of the right-angled triangle using if-elif-else (Pythagoras: x**2 + y**2 == z**2)\n",
199 | "\n",
200 | "if choice == 'x':\n",
201 | " y = float(input('Input the length of side y: '))\n",
202 | " z = float(input('Input the length of side z: '))\n",
203 | " x = math.sqrt((z * z) - (y * y))\n",
204 | " print('The length of Opposite side(x) is %g: ' %(x))\n",
205 | " \n",
206 | "elif choice == 'y':\n",
207 | "    x = float(input('Input the length of side x: '))\n",
208 | "    z = float(input('Input the length of side z: '))\n",
209 | "    y = math.sqrt((z * z) - (x * x))\n",
210 | "    print('The length of Adjacent side(y) is %g: ' %(y))\n",
211 | " \n",
212 | "elif choice == 'z':\n",
213 | " x = float(input('Input the length of side x: '))\n",
214 | " y = float(input('Input the length of side y: '))\n",
215 | " z = math.sqrt((x * x) + (y * y))\n",
216 | " print('The length of Hypotenus(z) is %g: ' %(z))\n",
217 | "\n",
218 | "else:\n",
219 | " print('Invalid Entry :')\n",
220 | "    print('Choose and enter the correct side to calculate: x, y or z')\n",
221 | " "
222 | ]
223 | },
224 | {
225 | "cell_type": "markdown",
226 | "metadata": {},
227 | "source": [
228 | "## 15. Write a python program to print the frequency of each of the characters present in a given string."
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": 18,
234 | "metadata": {},
235 | "outputs": [
236 | {
237 | "name": "stdout",
238 | "output_type": "stream",
239 | "text": [
240 | "Enter the input string:Welcome To Python\n",
241 | "\n",
242 | "\n",
243 | "Frequency of all characters in Input string:\n",
244 | " Counter({'o': 3, 'e': 2, ' ': 2, 't': 2, 'w': 1, 'l': 1, 'c': 1, 'm': 1, 'p': 1, 'y': 1, 'h': 1, 'n': 1})\n"
245 | ]
246 | }
247 | ],
248 | "source": [
249 | "# using collections.Counter() to get the frequency of each character in a string\n",
250 | "from collections import Counter\n",
251 | "\n",
252 | "# casefold() ensures that uppercase and lowercase characters are treated the same\n",
253 | "Input_str = input('Enter the input string:').casefold()\n",
254 | "print('\\n')\n",
255 | "res = Counter(Input_str)\n",
256 | "print('Frequency of all characters in Input string:\\n',res)"
257 | ]
258 | }
259 | ],
260 | "metadata": {
261 | "kernelspec": {
262 | "display_name": "Python 3",
263 | "language": "python",
264 | "name": "python3"
265 | },
266 | "language_info": {
267 | "codemirror_mode": {
268 | "name": "ipython",
269 | "version": 3
270 | },
271 | "file_extension": ".py",
272 | "mimetype": "text/x-python",
273 | "name": "python",
274 | "nbconvert_exporter": "python",
275 | "pygments_lexer": "ipython3",
276 | "version": "3.8.5"
277 | }
278 | },
279 | "nbformat": 4,
280 | "nbformat_minor": 4
281 | }
282 |
--------------------------------------------------------------------------------
/Worksheet_set_1/Python Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Python Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Statistics Worksheet 1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lab-of-Infinity/Internship/03d2add52e6caf1ca733bb6149ef204ca73cce10/Worksheet_set_1/Statistics Worksheet 1.pdf
--------------------------------------------------------------------------------
/Worksheet_set_1/Worksheet_set_1.md:
--------------------------------------------------------------------------------
1 | Worksheet Set 1: Python, Statistics and Machine Learning worksheets.
2 | 
--------------------------------------------------------------------------------