├── Electric_Vehicle_Population_Size_History_By_County.csv
├── Python_report_12308533.docx
├── README.md
└── project.py


/Python_report_12308533.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Asmit03/Python-Project---ElectroTrend/5fd97cf956277af5b513f645e46054890bd3619a/Python_report_12308533.docx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Electric Vehicle (EV) Population Analysis

## Project Overview
This project analyzes electric vehicle population size history data by county to track EV adoption trends, regional differences, and relationship patterns between different vehicle categories. The analysis provides insights into the growth and distribution of electric vehicles across different counties and states.

## Technologies Used
- **Python 3.x** - Core programming language
- **Pandas** - Data manipulation and analysis
- **Matplotlib** - Data visualization
- **Seaborn** - Enhanced data visualization
- **NumPy** - Numerical operations

## Dataset
The analysis uses the "Electric_Vehicle_Population_Size_History_By_County.csv" dataset, which contains historical data on electric vehicle populations across different counties and states. The dataset includes information on:

- Battery Electric Vehicles (BEVs)
- Plug-In Hybrid Electric Vehicles (PHEVs)
- Total Electric Vehicles
- Non-Electric Vehicles
- Total Vehicles
- Electric Vehicle Percentage
- Geographic information (County, State)
- Date information
- Vehicle primary use types

## Methodology

### Data Preprocessing
The project includes comprehensive data preprocessing steps:

1. 
**Data Exploration** - Initial examination of data structure, statistics, and unique values
2. **Missing Value Handling** - Removing records with missing critical information (County, State, Date)
3. **Data Integrity Checks** - Verifying mathematical relationships (EV Total = BEVs + PHEVs, etc.)
4. **Data Standardization** - Normalizing categorical values (state names, county names)
5. **Duplicate Removal** - Identifying and removing duplicate records
6. **Outlier Detection** - Using IQR method to identify potential outliers in numeric columns

### Analysis Objectives

The project investigates six key objectives:

1. **EV Growth Over Time** - Tracking the trend of electric vehicle adoption over time
2. **EV Adoption by Region** - Analyzing which counties and states have the highest EV adoption rates
3. **Correlation Analysis** - Examining relationships between BEVs, PHEVs, and total EV counts
4. **Outlier Distribution** - Visualizing outliers in electric vehicle percentage
5. **100% EV Counties** - Identifying regions with complete EV adoption
6. **Vehicle Use Analysis** - Investigating the relationship between vehicle total and EV percentage by usage type

## Key Visualizations

The project produces several insightful visualizations:

1. **Line Graph** - Monthly trend of EV growth over time
   ![Screenshot 2025-04-24 205245](https://github.com/user-attachments/assets/9c3a93b1-77d1-4bcd-bcc5-44cc00059644)

2. **Bar Charts** - Top counties and states by average EV percentage
   ![Screenshot 2025-04-24 205257](https://github.com/user-attachments/assets/a42a1e54-811f-409e-9c35-486679bb4450)

3. **Correlation Heatmap** - Relationship strength between BEV, PHEV, and total EV counts
   ![Screenshot 2025-04-24 205311](https://github.com/user-attachments/assets/d2ad3488-4b5a-412e-9b25-a11e9cc59acc)

4. 
**Box Plot** - Distribution and outliers of EV percentage
   ![Screenshot 2025-04-24 205320](https://github.com/user-attachments/assets/d909e3cc-aafe-4dc2-b857-3e9f56238297)

5. **Bar Chart** - Counties with 100% EV concentration
   ![Screenshot 2025-04-24 205331](https://github.com/user-attachments/assets/aecbb2b1-7647-448f-a4d8-ec797245d2ea)

6. **Scatter Plot** - Relationship between total vehicles and EV percentage by primary use type
   ![Screenshot 2025-04-24 205342](https://github.com/user-attachments/assets/904dbb24-8a97-4c6d-ad5c-672eb9f5bde1)


## Findings and Insights

The analysis reveals:
- Temporal trends in EV adoption across the dataset period
- Geographic hotspots for electric vehicle adoption
- Strong correlations between different EV categories
- Outlier regions with unusually high or low EV percentages
- Counties with complete EV adoption
- Relationships between vehicle fleet size and electrification percentage by usage type

## Running the Project

1. Clone this repository: `git clone https://github.com/Asmit03/Python-Project---ElectroTrend.git`
2. Ensure you have all required libraries installed:
   ```
   pip install pandas seaborn matplotlib numpy
   ```
3. Place the "Electric_Vehicle_Population_Size_History_By_County.csv" file in the same directory as `project.py`. Note that the script as committed reads the file from a `CA2` folder relative to the working directory, so adjust the path passed to `pd.read_csv` (or the folder layout) to match.
4. Run the project.py script:
   ```
   python project.py
   ```

## Future Improvements
- Implement predictive modeling to forecast future EV adoption rates
- Create interactive dashboards for more dynamic exploration
- Incorporate additional datasets for deeper analysis (e.g., charging infrastructure, economic indicators)
- Perform geographic clustering analysis to identify regional patterns

## Contributing
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
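
The integrity checks and the IQR outlier rule listed under Data Preprocessing can be tried without the CSV. The sketch below is only an illustration: the column names follow `project.py`, but the toy values and the helper names `consistent_rows` and `iqr_bounds` are made up here and are not part of the repository. It combines both integrity checks into one mask, whereas the script filters in two passes.

```python
import pandas as pd

# Toy rows with invented values; only the column names come from project.py.
toy = pd.DataFrame({
    "Battery Electric Vehicles (BEVs)": [10, 5, 0, 7],
    "Plug-In Hybrid Electric Vehicles (PHEVs)": [2, 5, 1, 3],
    "Electric Vehicle (EV) Total": [12, 10, 1, 11],  # last row: 7 + 3 != 11
    "Non-Electric Vehicle Total": [988, 90, 0, 500],
    "Total Vehicles": [1000, 100, 1, 511],
})


def consistent_rows(df):
    """Keep rows where both totals equal the sum of their parts (hypothetical helper)."""
    ev_ok = df["Electric Vehicle (EV) Total"] == (
        df["Battery Electric Vehicles (BEVs)"]
        + df["Plug-In Hybrid Electric Vehicles (PHEVs)"]
    )
    total_ok = df["Total Vehicles"] == (
        df["Electric Vehicle (EV) Total"] + df["Non-Electric Vehicle Total"]
    )
    return df[ev_ok & total_ok]


def iqr_bounds(series):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR (hypothetical helper)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


clean = consistent_rows(toy)

# Invented percentages: four small values and one extreme value.
percent = pd.Series([1.0, 2.0, 2.5, 3.0, 100.0])
low, high = iqr_bounds(percent)
outliers = percent[(percent < low) | (percent > high)]

print(f"rows kept: {len(clean)} of {len(toy)}")
print(f"fences: {low} .. {high}, outliers: {outliers.tolist()}")
```

On this toy input the inconsistent fourth row is dropped and only the 100.0 value falls outside the fences. Values beyond the fences are flagged, not proven wrong, so a county that really has a very high EV share would also be counted.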
--------------------------------------------------------------------------------
/project.py:
--------------------------------------------------------------------------------
# Project: CA2
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Read the CSV file. A forward slash is used because '\E' inside a normal
# string literal is an invalid escape sequence, and a backslash path only
# works on Windows.
df = pd.read_csv('CA2/Electric_Vehicle_Population_Size_History_By_County.csv')


# Exploratory Data Analysis
df.info()
print(df.describe())  # printed explicitly: a bare expression shows nothing in a script

# Check for unique values in each column
print("\nUnique Values in Each Column:")
print(df.nunique())

# Data Cleaning and Preprocessing

# 1. Check and Handle Missing Values
print("\nMissing Values Before Cleaning:")
print(df.isnull().sum())

# Drop rows where County or State are missing
df = df.dropna(subset=['County', 'State'])

# Convert 'Date' to datetime format and handle errors
# Coerce invalid dates to NaT
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
# Drop rows whose date could not be parsed
df = df.dropna(subset=['Date'])

# Fill missing values in 'Percent Electric Vehicles' with 0
df['Percent Electric Vehicles'] = df['Percent Electric Vehicles'].fillna(0)

# 2. 
# Data Integrity Checks
# EV total = BEV + PHEV
ev_ok = df['Electric Vehicle (EV) Total'] == (
    df['Battery Electric Vehicles (BEVs)'] + df['Plug-In Hybrid Electric Vehicles (PHEVs)']
)
print(f"\nRows with EV Total mismatch: {(~ev_ok).sum()}")
df = df[ev_ok]

# Total vehicles = EV total + non-EV
total_ok = df['Total Vehicles'] == (
    df['Electric Vehicle (EV) Total'] + df['Non-Electric Vehicle Total']
)
print(f"Rows with Total Vehicle mismatch: {(~total_ok).sum()}")
# .copy() so the column assignments below work on an independent frame
df = df[total_ok].copy()

# 3. Standardize Categorical Values
df['State'] = df['State'].str.upper().str.strip()
df['County'] = df['County'].str.title().str.strip()
df['Vehicle Primary Use'] = df['Vehicle Primary Use'].str.title().str.strip()

# 4. Remove Duplicates
duplicates_count = df.duplicated().sum()
print(f"\nDuplicate Records Found: {duplicates_count}")
if duplicates_count > 0:
    df = df.drop_duplicates()
    print("Duplicates removed.")

# 5. 
# Outlier Detection using IQR (counts only; removal is an optional further step)
numeric_cols = [
    'Battery Electric Vehicles (BEVs)',
    'Plug-In Hybrid Electric Vehicles (PHEVs)',
    'Electric Vehicle (EV) Total',
    'Non-Electric Vehicle Total',
    'Total Vehicles',
    'Percent Electric Vehicles'
]

def count_outliers_iqr(series):
    # Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    Q1 = series.quantile(0.25)
    Q3 = series.quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    return ((series < lower) | (series > upper)).sum()

print("\n--- Outlier Count using IQR ---")
for col in numeric_cols:
    outliers = count_outliers_iqr(df[col])
    print(f"{col}: {outliers} outliers")

# Objective 1: EV Growth Over Time
df_monthly = df.groupby(df['Date'].dt.to_period('M'))['Electric Vehicle (EV) Total'].sum().reset_index()
df_monthly['Date'] = df_monthly['Date'].dt.to_timestamp()

plt.figure(figsize=(12,6))
sns.lineplot(data=df_monthly, x='Date', y='Electric Vehicle (EV) Total')
plt.title("EV Growth Trend Over Time")
plt.xlabel("Date")
plt.ylabel("EV Total")
plt.grid(True)
plt.tight_layout()
plt.show()

# Objective 2: EV Adoption by County & State

top_counties = df.groupby('County')['Percent Electric Vehicles'].mean().sort_values(ascending=False).head(10)
top_states = df.groupby('State')['Percent Electric Vehicles'].mean().sort_values(ascending=False).head(10)

fig, ax = plt.subplots(1, 2, figsize=(16,6))
sns.barplot(x=top_counties.values, y=top_counties.index, ax=ax[0])
ax[0].set_title("Top 10 Counties by Avg % EVs")
sns.barplot(x=top_states.values, y=top_states.index, ax=ax[1])
ax[1].set_title("Top 10 States by Avg % EVs")
plt.tight_layout()
plt.show()

# Objective 3: Correlation Between BEV, PHEV, EV Total
plt.figure(figsize=(8,6))
corr = df[['Battery Electric Vehicles (BEVs)', 'Plug-In Hybrid 
Electric Vehicles (PHEVs)', 'Electric Vehicle (EV) Total']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Between BEV, PHEV, EV Total")
plt.tight_layout()
plt.show()

# Objective 4: Boxplot Outlier Visualization for Percent Electric Vehicles
plt.figure(figsize=(8, 4))
sns.boxplot(x=df['Percent Electric Vehicles'], color='skyblue')
plt.title("Boxplot of Percent Electric Vehicles")
plt.xlabel("Percent Electric Vehicles")
plt.tight_layout()
plt.show()


# Objective 5: Counties with 100% EVs
# Counts how many records per county report exactly 100% EVs (top 10 counties)
top_ev_counties = df[df['Percent Electric Vehicles'] == 100]['County'].value_counts().head(10)
# Plot
plt.figure(figsize=(10,6))
sns.barplot(x=top_ev_counties.values, y=top_ev_counties.index, palette='Greens')
plt.title("Top Counties with 100% EV Concentration")
plt.xlabel("Number of Records")
plt.ylabel("County")
plt.tight_layout()
plt.show()

# Objective 6: Scatter Plot - Total Vehicles vs % EVs by Usage Type
plt.figure(figsize=(10,6))
sns.scatterplot(data=df, x='Total Vehicles', y='Percent Electric Vehicles', hue='Vehicle Primary Use', alpha=0.6)
plt.title("EV Adoption by Vehicle Use Type")
plt.xlabel("Total Vehicles")
plt.ylabel("Percent Electric Vehicles")
plt.grid(True)
plt.legend(title='Vehicle Use')
plt.tight_layout()
plt.show()

# git

# git - shub

# git - man

# git - sak
# git - ary
# git-clone - https://github.com/abheeshakespeare/Python-Project---ElectroTrend.git
# git-change - ujj
# git-change - pri
# END
--------------------------------------------------------------------------------
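
The monthly aggregation behind Objective 1 can be checked on a few invented rows. The following is a minimal sketch, assuming only the `Date` and `Electric Vehicle (EV) Total` column names from `project.py`; the county labels and numbers are made up. It applies the same `errors='coerce'` date parsing, `dt.to_period('M')` grouping and `dt.to_timestamp()` conversion that the script uses.

```python
import pandas as pd

# Invented rows: two counties in January, one in February, one unparseable date.
toy = pd.DataFrame({
    "Date": ["2023-01-31", "2023-01-31", "2023-02-28", "not a date"],
    "County": ["A", "B", "A", "B"],
    "Electric Vehicle (EV) Total": [10, 5, 12, 7],
})

# Unparseable dates become NaT and are dropped, as in the cleaning step.
toy["Date"] = pd.to_datetime(toy["Date"], errors="coerce")
toy = toy.dropna(subset=["Date"])

# Group by calendar month and sum the EV totals across all rows of that month.
monthly = (
    toy.groupby(toy["Date"].dt.to_period("M"))["Electric Vehicle (EV) Total"]
    .sum()
    .reset_index()
)

# Convert each monthly Period back to a Timestamp (start of the month) for plotting.
monthly["Date"] = monthly["Date"].dt.to_timestamp()

print(monthly)
```

Here January sums to 15 and February to 12, and the row with the invalid date does not contribute. Because the sum runs over every row in a month, a month that contained more than one snapshot date would count the same vehicles more than once, which is worth checking against the real data before reading the line plot as a fleet size.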