├── Electric_Vehicle_Population_Size_History_By_County.csv
├── Python_report_12308533.docx
├── README.md
└── project.py


/Python_report_12308533.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Asmit03/Python-Project---ElectroTrend/5fd97cf956277af5b513f645e46054890bd3619a/Python_report_12308533.docx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Electric Vehicle (EV) Population Analysis
  2 | 
  3 | ## Project Overview
  4 | This project analyzes electric vehicle population size history data by county to track EV adoption trends, regional differences, and relationship patterns between different vehicle categories. The analysis provides insights into the growth and distribution of electric vehicles across different counties and states.
  5 | 
  6 | ## Technologies Used
  7 | - **Python 3.x** - Core programming language
  8 | - **Pandas** - Data manipulation and analysis
  9 | - **Matplotlib** - Data visualization
 10 | - **Seaborn** - Enhanced data visualization
 11 | - **NumPy** - Numerical operations
 12 | 
 13 | ## Dataset
 14 | The analysis uses the "Electric_Vehicle_Population_Size_History_By_County.csv" dataset, which contains historical data on electric vehicle populations across different counties and states. The dataset includes information on:
 15 | 
 16 | - Battery Electric Vehicles (BEVs)
 17 | - Plug-In Hybrid Electric Vehicles (PHEVs)
 18 | - Total Electric Vehicles
 19 | - Non-Electric Vehicles
 20 | - Total Vehicles
 21 | - Electric Vehicle Percentage
 22 | - Geographic information (County, State)
 23 | - Date information
 24 | - Vehicle primary use types
 25 | 
 26 | ## Methodology
 27 | 
 28 | ### Data Preprocessing
 29 | The project includes comprehensive data preprocessing steps:
 30 | 
 31 | 1. **Data Exploration** - Initial examination of data structure, statistics, and unique values
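 32 | 2. **Missing Value Handling** - Removing records with missing critical information (County, State, Date) and filling missing EV percentages with 0
 33 | 3. **Data Integrity Checks** - Verifying arithmetic relationships (EV Total = BEVs + PHEVs; Total Vehicles = EV Total + Non-Electric Vehicles) and dropping rows that violate them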
 34 | 4. **Data Standardization** - Normalizing categorical values (state names, county names)
 35 | 5. **Duplicate Removal** - Identifying and removing duplicate records
 36 | 6. **Outlier Detection** - Using the IQR method (values more than 1.5 × IQR beyond the quartiles) to count potential outliers in numeric columns
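
The integrity check and the IQR outlier count from the steps above can be sketched on a few rows. This is a minimal illustration, not the full pipeline: the sample values are made up, and `drop_inconsistent_ev_totals` is a hypothetical helper name. The column names and `count_outliers_iqr` follow `project.py`.

```python
import pandas as pd

# Made-up sample rows; the column names match those used in project.py.
sample = pd.DataFrame({
    "Battery Electric Vehicles (BEVs)": [10, 5, 0, 7],
    "Plug-In Hybrid Electric Vehicles (PHEVs)": [2, 1, 3, 1],
    # The last row is deliberately inconsistent: 7 + 1 != 9.
    "Electric Vehicle (EV) Total": [12, 6, 3, 9],
    "Percent Electric Vehicles": [1.2, 0.8, 1.0, 50.0],
})


def drop_inconsistent_ev_totals(df):
    """Keep only rows where EV Total equals BEVs + PHEVs (hypothetical helper)."""
    expected = (
        df["Battery Electric Vehicles (BEVs)"]
        + df["Plug-In Hybrid Electric Vehicles (PHEVs)"]
    )
    return df[df["Electric Vehicle (EV) Total"] == expected]


def count_outliers_iqr(series):
    """Count values more than 1.5 * IQR below Q1 or above Q3."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return int(((series < lower) | (series > upper)).sum())


consistent = drop_inconsistent_ev_totals(sample)
n_outliers = count_outliers_iqr(sample["Percent Electric Vehicles"])
print(f"Rows kept after integrity check: {len(consistent)}")
print(f"Outliers in Percent Electric Vehicles: {n_outliers}")
```

The outlier step only counts values outside the fences. As in `project.py`, no rows are removed on that basis.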
 37 | 
 38 | ### Analysis Objectives
 39 | 
 40 | The project investigates six key objectives:
 41 | 
 42 | 1. **EV Growth Over Time** - Tracking the trend of electric vehicle adoption over time
 43 | 2. **EV Adoption by Region** - Analyzing which counties and states have the highest EV adoption rates
 44 | 3. **Correlation Analysis** - Examining relationships between BEVs, PHEVs, and total EV counts
 45 | 4. **Outlier Distribution** - Visualizing outliers in electric vehicle percentage
 46 | 5. **100% EV Counties** - Identifying counties that have records with a 100% EV share
 47 | 6. **Vehicle Use Analysis** - Investigating the relationship between vehicle total and EV percentage by usage type
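
As a rough sketch of how the first two objectives can be computed, the snippet below aggregates EV totals by month and ranks counties by average EV percentage. The rows are made-up sample data; the column names are the ones `project.py` uses.

```python
import pandas as pd

# Made-up sample records; one row has an unparseable date on purpose.
records = pd.DataFrame({
    "Date": ["2022-01-31", "2022-01-31", "2022-02-28", "not a date"],
    "County": ["King", "Pierce", "King", "King"],
    "Electric Vehicle (EV) Total": [100, 40, 110, 5],
    "Percent Electric Vehicles": [2.0, 1.0, 2.2, 0.1],
})

# Invalid dates become NaT and are dropped before aggregation.
records["Date"] = pd.to_datetime(records["Date"], errors="coerce")
records = records.dropna(subset=["Date"])

# Objective 1: total EVs per calendar month.
monthly = (
    records.groupby(records["Date"].dt.to_period("M"))["Electric Vehicle (EV) Total"]
    .sum()
    .reset_index()
)
monthly["Date"] = monthly["Date"].dt.to_timestamp()

# Objective 2: counties ranked by average EV percentage.
top_counties = (
    records.groupby("County")["Percent Electric Vehicles"]
    .mean()
    .sort_values(ascending=False)
)

print(monthly)
print(top_counties)
```

Grouping on a monthly `Period` collapses all records from the same month into one point, which is what the line graph plots.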
 48 | 
 49 | ## Key Visualizations
 50 | 
 51 | The project produces several insightful visualizations:
 52 | 
 53 | 1. **Line Graph** - Monthly trend of EV growth over time
 54 |  ![Screenshot 2025-04-24 205245](https://github.com/user-attachments/assets/9c3a93b1-77d1-4bcd-bcc5-44cc00059644)
 55 | 
 56 | 2. **Bar Charts** - Top counties and states by average EV percentage
 57 |  ![Screenshot 2025-04-24 205257](https://github.com/user-attachments/assets/a42a1e54-811f-409e-9c35-486679bb4450)
 58 | 
 59 | 3. **Correlation Heatmap** - Relationship strength between BEV, PHEV, and total EV counts
 60 |  ![Screenshot 2025-04-24 205311](https://github.com/user-attachments/assets/d2ad3488-4b5a-412e-9b25-a11e9cc59acc)
 61 | 
 62 | 4. **Box Plot** - Distribution and outliers of EV percentage
 63 |  ![Screenshot 2025-04-24 205320](https://github.com/user-attachments/assets/d909e3cc-aafe-4dc2-b857-3e9f56238297)
 64 | 
 65 | 5. **Bar Chart** - Counties with 100% EV concentration
 66 |  ![Screenshot 2025-04-24 205331](https://github.com/user-attachments/assets/aecbb2b1-7647-448f-a4d8-ec797245d2ea)
 67 | 
 68 | 6. **Scatter Plot** - Relationship between total vehicles and EV percentage by primary use type
 69 | ![Screenshot 2025-04-24 205342](https://github.com/user-attachments/assets/904dbb24-8a97-4c6d-ad5c-672eb9f5bde1)
 70 | 
 71 | 
 72 | ## Findings and Insights
 73 | 
 74 | The analysis reveals:
 75 | - Temporal trends in EV adoption across the dataset period
 76 | - Geographic hotspots for electric vehicle adoption
 77 | - Strong correlations between different EV categories
 78 | - Outlier regions with unusually high or low EV percentages
 79 | - Counties with complete EV adoption
 80 | - Relationships between vehicle fleet size and electrification percentage by usage type
 81 | 
 82 | ## Running the Project
 83 | 
 84 | 1. Clone this repository: `git clone https://github.com/Asmit03/Python-Project---ElectroTrend.git`
 85 | 2. Ensure you have all required libraries installed:
 86 |    ```
 87 |    pip install pandas seaborn matplotlib numpy
 88 |    ```
 89 | 3. Place the "Electric_Vehicle_Population_Size_History_By_County.csv" file in the same directory as project.py
 90 | 4. Run the project.py script:
 91 |    ```
 92 |    python project.py
 93 |    ```
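
Before running the script, it can help to confirm that the CSV has the columns the analysis reads. The sketch below is one possible check, not part of the project: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is taken from the names referenced in `project.py`. An in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Columns referenced in project.py (collected here as an assumption-free list).
REQUIRED_COLUMNS = [
    "County",
    "State",
    "Date",
    "Vehicle Primary Use",
    "Battery Electric Vehicles (BEVs)",
    "Plug-In Hybrid Electric Vehicles (PHEVs)",
    "Electric Vehicle (EV) Total",
    "Non-Electric Vehicle Total",
    "Total Vehicles",
    "Percent Electric Vehicles",
]


def missing_columns(csv_source):
    """Return the required columns absent from the CSV header (hypothetical helper)."""
    # nrows=0 reads only the header row, so this stays cheap for large files.
    header = pd.read_csv(csv_source, nrows=0)
    return [col for col in REQUIRED_COLUMNS if col not in header.columns]


# Stand-in for the real file; pass a file path instead when checking the dataset.
demo_csv = io.StringIO("Date,County,State,Total Vehicles\n")
missing = missing_columns(demo_csv)
print("Missing columns:", missing)
```

To check the actual dataset, pass the path `"Electric_Vehicle_Population_Size_History_By_County.csv"` to `missing_columns`. An empty list means every column the script uses is present.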
 94 | 
 95 | ## Future Improvements
 96 | - Implement predictive modeling to forecast future EV adoption rates
 97 | - Create interactive dashboards for more dynamic exploration
 98 | - Incorporate additional datasets for deeper analysis (e.g., charging infrastructure, economic indicators)
 99 | - Perform geographic clustering analysis to identify regional patterns
100 | 
101 | ## Contributing
102 | Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
103 | 


--------------------------------------------------------------------------------
/project.py:
--------------------------------------------------------------------------------
  1 | # Project: CA2
  2 | import pandas as pd
  3 | import seaborn as sns
  4 | import matplotlib.pyplot as plt
  5 | import numpy as np
  6 | # Read the CSV file (expected in the same directory as this script)
  7 | df=pd.read_csv('Electric_Vehicle_Population_Size_History_By_County.csv')
  8 | 
  9 | 
 10 | # Exploratory Data Analysis
 11 | df.info()
 12 | print(df.describe())
 13 | 
 14 | #check for unique values in each column
 15 | print("\nUnique Values in Each Column:")
 16 | print(df.nunique())
 17 | 
 18 | # Data Cleaning and Preprocessing
 19 | 
 20 | # 1. Check and Handle Missing Values
 21 | print("\nMissing Values Before Cleaning:")
 22 | print(df.isnull().sum())
 23 | 
 24 | # Drop rows where County or State are missing
 25 | df = df.dropna(subset=['County', 'State'])
 26 | 
 27 | # Convert 'Date' to datetime format and handle errors
 28 | # Coerce invalid dates to NaT
 29 | df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
 30 | # Drop rows with invalid dates if needed
 31 | df = df.dropna(subset=['Date'])
 32 | 
 33 | # Fill missing values in 'Percent Electric Vehicles' with 0
 34 | df['Percent Electric Vehicles'] = df['Percent Electric Vehicles'].fillna(0)
 35 | 
 36 | # 2. Data Integrity Checks
 37 | # EV total = BEV + PHEV
 38 | ev_mismatch = df[df['Electric Vehicle (EV) Total'] != (df['Battery Electric Vehicles (BEVs)'] + df['Plug-In Hybrid Electric Vehicles (PHEVs)'])]
 39 | print(f"\nRows with EV Total mismatch: {len(ev_mismatch)}")
 40 | df = df[df['Electric Vehicle (EV) Total'] == (df['Battery Electric Vehicles (BEVs)'] + df['Plug-In Hybrid Electric Vehicles (PHEVs)'])]
 41 | 
 42 | # Total vehicles = EV total + non-EV
 43 | total_mismatch = df[df['Total Vehicles'] != (df['Electric Vehicle (EV) Total'] + df['Non-Electric Vehicle Total'])]
 44 | print(f"Rows with Total Vehicle mismatch: {len(total_mismatch)}")
 45 | df = df[df['Total Vehicles'] == (df['Electric Vehicle (EV) Total'] + df['Non-Electric Vehicle Total'])]
 46 | 
 47 | # 3. Standardize Categorical Values
 48 | df['State'] = df['State'].str.upper().str.strip()
 49 | df['County'] = df['County'].str.title().str.strip()
 50 | df['Vehicle Primary Use'] = df['Vehicle Primary Use'].str.title().str.strip()
 51 | 
 52 | # 4. Remove Duplicates
 53 | duplicates_count = df.duplicated().sum()
 54 | print(f"\nDuplicate Records Found: {duplicates_count}")
 55 | if duplicates_count > 0:
 56 |     df = df.drop_duplicates()
 57 |     print("Duplicates removed.")
 58 | 
 59 | # 5. Outlier Detection using IQR (counts only; no rows are removed)
 60 | numeric_cols = [
 61 |     'Battery Electric Vehicles (BEVs)',
 62 |     'Plug-In Hybrid Electric Vehicles (PHEVs)',
 63 |     'Electric Vehicle (EV) Total',
 64 |     'Non-Electric Vehicle Total',
 65 |     'Total Vehicles',
 66 |     'Percent Electric Vehicles'
 67 | ]
 68 | 
 69 | def count_outliers_iqr(series):
 70 |     Q1 = series.quantile(0.25)
 71 |     Q3 = series.quantile(0.75)
 72 |     IQR = Q3 - Q1
 73 |     lower = Q1 - 1.5 * IQR
 74 |     upper = Q3 + 1.5 * IQR
 75 |     return ((series < lower) | (series > upper)).sum()
 76 | 
 77 | print("\n--- Outlier Count using IQR ---")
 78 | for col in numeric_cols:
 79 |     outliers = count_outliers_iqr(df[col])
 80 |     print(f"{col}: {outliers} outliers")
 81 | 
 82 | # Objective 1: EV Growth Over Time
 83 | df_monthly = df.groupby(df['Date'].dt.to_period('M'))['Electric Vehicle (EV) Total'].sum().reset_index()
 84 | df_monthly['Date'] = df_monthly['Date'].dt.to_timestamp()
 85 | 
 86 | plt.figure(figsize=(12,6))
 87 | sns.lineplot(data=df_monthly, x='Date', y='Electric Vehicle (EV) Total')
 88 | plt.title("EV Growth Trend Over Time")
 89 | plt.xlabel("Date")
 90 | plt.ylabel("EV Total")
 91 | plt.grid(True)
 92 | plt.tight_layout()
 93 | plt.show()
 94 | 
 95 | # Objective 2: EV Adoption by County & State
 96 | 
 97 | top_counties = df.groupby('County')['Percent Electric Vehicles'].mean().sort_values(ascending=False).head(10)
 98 | top_states = df.groupby('State')['Percent Electric Vehicles'].mean().sort_values(ascending=False).head(10)
 99 | 
100 | fig, ax = plt.subplots(1, 2, figsize=(16,6))
101 | sns.barplot(x=top_counties.values, y=top_counties.index, ax=ax[0])
102 | ax[0].set_title("Top 10 Counties by Avg % EVs")
103 | sns.barplot(x=top_states.values, y=top_states.index, ax=ax[1])
104 | ax[1].set_title("Top 10 States by Avg % EVs")
105 | plt.tight_layout()
106 | plt.show()
107 | 
108 | # Objective 3: Correlation Between BEV, PHEV, EV Total
109 | plt.figure(figsize=(8,6))
110 | corr = df[['Battery Electric Vehicles (BEVs)', 'Plug-In Hybrid Electric Vehicles (PHEVs)', 'Electric Vehicle (EV) Total']].corr()
111 | sns.heatmap(corr, annot=True, cmap='coolwarm')
112 | plt.title("Correlation Between BEV, PHEV, EV Total")
113 | plt.tight_layout()
114 | plt.show()
115 | 
116 | # Objective 4: Boxplot Outlier Visualization for Percent Electric Vehicles
117 | plt.figure(figsize=(8, 4))
118 | sns.boxplot(x=df['Percent Electric Vehicles'], color='skyblue')
119 | plt.title("Boxplot of Percent Electric Vehicles")
120 | plt.xlabel("Percent Electric Vehicles")
121 | plt.tight_layout()
122 | plt.show()
123 | 
124 | 
125 | # Objective 5: Counties with 100% EVs
126 | top_ev_counties = df[df['Percent Electric Vehicles'] == 100]['County'].value_counts().head(10)
127 | # Bar chart of the number of 100% EV records per county
128 | plt.figure(figsize=(10,6))
129 | sns.barplot(x=top_ev_counties.values, y=top_ev_counties.index, palette='Greens')
130 | plt.title("Top Counties with 100% EV Concentration")
131 | plt.xlabel("Number of Records")
132 | plt.ylabel("County")
133 | plt.tight_layout()
134 | plt.show()
135 | 
136 | # Objective 6: Scatter Plot - Total Vehicles vs % EVs by Usage Type
137 | plt.figure(figsize=(10,6))
138 | sns.scatterplot(data=df, x='Total Vehicles', y='Percent Electric Vehicles', hue='Vehicle Primary Use', alpha=0.6)
139 | plt.title("EV Adoption by Vehicle Use Type")
140 | plt.xlabel("Total Vehicles")
141 | plt.ylabel("Percent Electric Vehicles")
142 | plt.grid(True)
143 | plt.legend(title='Vehicle Use')
144 | plt.tight_layout()
145 | plt.show()
146 | 
159 | 


--------------------------------------------------------------------------------