# CAR PRICE PREDICTION MULTIPLE LINEAR REGRESSION

Implementing Multiple Linear Regression to predict Car Prices

## Table of Contents

- [Introduction](#introduction)
- [Data Source](#data-source)
- [Features](#features)
- [Learnings](#learnings)

## Introduction

A car company wants to enter a new market and needs to understand which variables affect car prices.
The goal is to answer two questions:
- Which variables are significant in predicting the price of a car?
- How well do those variables describe the price of a car?

## Data Source
This dataset is taken from Kaggle:
- https://www.kaggle.com/datasets/hellbuoy/car-price-prediction

## Features

This is a detailed, explanatory project for beginners who want to get their hands dirty with Exploratory Data Analysis, Hypothesis Testing, and Linear Regression.

- Exploratory Data Analysis of Categorical and Numerical Variables.
- Using a one-sample t-test to check whether a sample car feature (Average Horsepower in a Sedan) from my dataset is a good representation of the population.
- Using Ordinary Least Squares (OLS) and the Normal Equation.
- Using Gradient Descent and a Cost Function.

## Learnings

This was the first real project I worked on, and it taught me several things:
- Getting comfortable working with a relatively large and noisy dataset.
- Using and showcasing several Python libraries:
    1. NumPy
    2. pandas
    3. SciPy
    4. Matplotlib
    5. Seaborn
    6. scikit-learn
- Using techniques like:
    1. Outlier Detection
    2. Identifying Input Features and Target Variables.
    3. Visualizing different graphs for different variables.
    4. Visualizing relationships amongst different variables.
- Hypothesis Testing:
    1. Deciding which test to conduct.
    2. Choosing a significance level (alpha).
    3. Comparing the p-value with alpha, and the calculated t-statistic with the critical t-value, to draw conclusions about statistical significance and infer information about the population from the sample.
- Employing OLS and the Normal Equation to build a Multiple Linear Regression Model.
- Splitting the dataset into training and testing sets.
- Visualizing the predictions and actual values via scatterplot.
- Using Regression Performance Metrics to check the accuracy of the model:
    1. Mean Absolute Error
    2. Mean Squared Error
    3. Root Mean Squared Error
    4. R² Score
    5. Adjusted R² Score
- Employing Gradient Descent to build the second Multiple Linear Regression Model.
- Feature Engineering:
    1. Deriving new features from the existing ones to reduce overfitting and save computational resources.
    2. Because introducing new features can introduce Multi-Collinearity into the dataset, checking for it with a Heatmap and the Variance Inflation Factor (VIF).
    3. Choosing relevant features for the second model.
- Feature Scaling:
    1. Using Z-Score Normalization to scale the input features for a better and more accurate model.
- Training the second model with Gradient Descent on the scaled features.
- Visualizing the predictions and actual values via scatterplot.
- Using Regression Performance Metrics to check the accuracy of the second model:
    1. Mean Absolute Error
    2. Mean Squared Error
    3. Root Mean Squared Error
    4. R² Score
    5. Adjusted R² Score

Below are a few snippets from the project:

![image](https://github.com/hitesh-hetfield/DS_Projects/assets/151897902/71816b84-6bba-4527-abfe-c08a980c02c3)
![image](https://github.com/hitesh-hetfield/DS_Projects/assets/151897902/4096e75e-2319-4aea-b93c-4dbdb156411b)
![image](https://github.com/hitesh-hetfield/DS_Projects/assets/151897902/ce9d06e0-22a3-4f46-8d04-80092b0bc256)
![image](https://github.com/hitesh-hetfield/DS_Projects/assets/151897902/817487e5-6a78-4e6f-bdd3-90418b913618)
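The two model-building routes listed under Learnings (the Normal Equation, and Gradient Descent on z-score-scaled features) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The data is randomly generated instead of being loaded from the Kaggle file, and the helper names (`add_intercept`, `gradient_descent`, `r2_score`, `adjusted_r2`), the learning rate, and the iteration count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the Kaggle car dataset): 200 rows, 3 features.
n_samples, n_features = 200, 3
X_raw = rng.normal(loc=[100.0, 2500.0, 25.0],
                   scale=[30.0, 500.0, 5.0],
                   size=(n_samples, n_features))
true_w = np.array([50.0, 4.0, -100.0])  # hypothetical coefficients
y = 5000.0 + X_raw @ true_w + rng.normal(scale=500.0, size=n_samples)

# Train/test split (80/20) via a random permutation of row indices.
perm = rng.permutation(n_samples)
train_idx, test_idx = perm[:160], perm[160:]

# Z-score normalization, using statistics from the training rows only.
mu = X_raw[train_idx].mean(axis=0)
sigma = X_raw[train_idx].std(axis=0)
X_scaled = (X_raw - mu) / sigma


def add_intercept(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


Xb_train, y_train = add_intercept(X_scaled[train_idx]), y[train_idx]
Xb_test, y_test = add_intercept(X_scaled[test_idx]), y[test_idx]

# Model 1: Normal Equation, solving (X^T X) theta = X^T y.
theta_ne = np.linalg.solve(Xb_train.T @ Xb_train, Xb_train.T @ y_train)


def gradient_descent(Xb, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the cost J = (1 / 2m) * sum((Xb @ theta - y)^2)."""
    m = len(y)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ theta - y) / m
        theta -= lr * grad
    return theta


# Model 2: Gradient Descent on the same scaled features.
theta_gd = gradient_descent(Xb_train, y_train)


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


y_pred = Xb_test @ theta_gd
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
r2 = r2_score(y_test, y_pred)
adj_r2 = adjusted_r2(r2, n=len(y_test), p=n_features)

print("normal equation :", theta_ne)
print("gradient descent:", theta_gd)
print("test RMSE:", rmse, "R2:", r2, "adjusted R2:", adj_r2)
```

On scaled features, both approaches should arrive at essentially the same coefficients. Scaling matters mainly for Gradient Descent, which converges slowly or diverges when features have very different ranges.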