├── README.md
├── Screenshot 2025-04-12 134644.png
├── Screenshot 2025-04-12 134707.png
├── Screenshot 2025-04-12 134732.png
├── Screenshot 2025-04-12 134755.png
├── Screenshot 2025-04-12 134821.png
├── Vrinda Store Data Analysis.xlsx
└── index.py

/README.md:
--------------------------------------------------------------------------------
# 📈 Sales Visualization with Python

This project is an interactive sales dashboard built with **Python** and **Excel** to analyze and visualize sales data for **Vrinda Store**. Python processes the data, and Excel presents it as a visual dashboard.

![Dashboard Screenshot](Screenshot%202025-04-12%20134644.png)

---

## 🧾 Overview

The **Sales Visualization Dashboard** is designed to help businesses understand their sales performance. By combining Python for data handling with Excel for charting, it supports data-driven decision making and clearer analysis.

---

## ✅ Features

- 📊 **Sales Trend Charts** – View sales over time using line/bar charts.
- 🧮 **KPIs and Metrics** – Automatically calculated total sales, units sold, and more.
- 🧩 **Clean and Simple Layout** – Visuals that are easy to use and interpret.
- 📂 **Data Automation with Python** – Process and clean raw sales data using a Python script.
- 🔄 **Refreshable Excel Dashboard** – Update the dashboard when new data is available.

---

## 📁 Files Included

- `Vrinda Store Data Analysis.xlsx` – Excel dashboard with charts and visualizations.
- `index.py` – Python script to clean and prepare the data.
- `Screenshot 2025-04-12 134644.png` – Preview of the final dashboard.
- `README.md` – Project documentation.

---

## 🚀 How to Use

1. **Clone the Repository**
2. **Make changes**

--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134644.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134644.png
--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134707.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134707.png
--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134732.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134732.png
--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134755.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134755.png
--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134821.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134821.png
--------------------------------------------------------------------------------
/Vrinda Store Data Analysis.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Vrinda Store Data Analysis.xlsx
--------------------------------------------------------------------------------
/index.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
from scipy import stats
import warnings
# Silence library warnings to keep the console output readable.
warnings.filterwarnings('ignore')


sns.set_style('whitegrid')

def load_data(file_path):
    """Read the Excel workbook into a DataFrame; return None on failure."""
    try:
        df = pd.read_excel(file_path)
        print("Data loaded successfully.")
        return df
    except FileNotFoundError:
        print(f"Error: File {file_path} not found.")
        return None
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

def clean_data(df):
    """Normalize column names, categorical labels, and dtypes."""
    if df is None:
        print("Error: No data to clean.")
        return None

    # The source header has a trailing space in 'Channel '.
    df.rename(columns={'Channel ': 'Channel'}, inplace=True)

    def clean_gender(gender):
        # Map the spelling variants onto 'Female' / 'Male'; anything else becomes NaN.
        if pd.isna(gender):
            return np.nan
        gender = str(gender).strip().lower()
        if gender in ['women', 'w', 'female']:
            return 'Female'
        elif gender in ['men', 'm', 'male']:
            return 'Male'
        else:
            return np.nan

    df['Gender'] = df['Gender'].apply(clean_gender)

    # Clean Qty column: convert spelled-out quantities, then coerce to numeric.
    # Keys are lowercase because the lookup lowercases the value first.
    qty_map = {'one': 1, 'two': 2, 'three': 3}
    df['Qty'] = df['Qty'].apply(lambda x: qty_map.get(x.strip().lower(), x) if isinstance(x, str) else x)
    df['Qty'] = pd.to_numeric(df['Qty'], errors='coerce')

    # Clean Channel column
    df['Channel'] = df['Channel'].str.strip().str.title().replace('', np.nan)

    # Ensure numeric and datetime columns
    df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
    # `unit` only applies to numeric epoch offsets, so it is omitted here.
    # Raw Excel serial numbers would instead need unit='D', origin='1899-12-30'.
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')

    # Create Age Group column
    def get_age_group(age):
        if pd.isna(age):
            return np.nan
        elif 3 <= age <= 18:
            return 'Teenager'
        elif 19 <= age <= 64:
            return 'Adult'
        elif age >= 65:
            return 'Senior'
        else:
            return 'Other'

    df['Age Group'] = df['Age'].apply(get_age_group)

    print("\nDataset Info After Cleaning:")
    df.info()
    return df

def run_statistical_tests(df):
    """Print t-test and z-test comparisons of order amounts."""
    if df is None or df.empty:
        print("Error: No data for statistical testing.")
        return

    print("\n🔬 Running Statistical Tests:")

    # Test 1: T-Test - Compare sales between two genders
    print("\nT-Test between Female and Male Sales:")
    female_sales = df[df['Gender'] == 'Female']['Amount'].dropna()
    male_sales = df[df['Gender'] == 'Male']['Amount'].dropna()

    # Welch's t-test needs at least two observations per group to estimate variance.
    if len(female_sales) > 1 and len(male_sales) > 1:
        t_stat, p_val = stats.ttest_ind(female_sales, male_sales, equal_var=False)

        print(f" T-statistic = {t_stat:.4f}")
        print(f" P-value = {p_val:.4f}")

        if p_val < 0.05:
            print(" → Statistically significant difference in sales between genders.")
        else:
            print(" → No significant difference found in sales between genders.")
    else:
        print(" Insufficient data for gender comparison.")

    # Test 2: T-Test - Compare sales between top two categories
    top_categories = df.groupby('Category')['Amount'].sum().nlargest(2).index.tolist()

    if len(top_categories) >= 2:
        cat1, cat2 = top_categories[0], top_categories[1]
        print(f"\nT-Test between {cat1} and {cat2} Categories:")

        cat1_sales = df[df['Category'] == cat1]['Amount'].dropna()
        cat2_sales = df[df['Category'] == cat2]['Amount'].dropna()

        if len(cat1_sales) > 1 and len(cat2_sales) > 1:
            t_stat, p_val = stats.ttest_ind(cat1_sales, cat2_sales, equal_var=False)
            print(f" T-statistic = {t_stat:.4f}")
            print(f" P-value = {p_val:.4f}")

            if p_val < 0.05:
                print(f" → Statistically significant difference in sales between {cat1} and {cat2}.")
            else:
                print(" → No significant difference found in sales between categories.")
        else:
            print(" Insufficient data for category comparison.")
    else:
        print(" Not enough categories for comparison.")

    # Test 3: Z-Test - Compare channel sales to overall mean
    # value_counts() is empty when every Channel value is missing, so check before indexing.
    channel_counts = df['Channel'].value_counts() if 'Channel' in df.columns else pd.Series(dtype='int64')
    top_channel = channel_counts.index[0] if not channel_counts.empty else None

    if top_channel is not None:
        print(f"\n📌 Z-Test for {top_channel} vs overall sales mean:")

        channel_sales = df[df['Channel'] == top_channel]['Amount'].dropna()
        overall_mean = df['Amount'].mean()
        overall_std = df['Amount'].std()
        sample_size = len(channel_sales)

        if sample_size > 0:
            z_score = (channel_sales.mean() - overall_mean) / (overall_std / np.sqrt(sample_size))
            # Two-tailed p-value
            p_val = 2 * (1 - stats.norm.cdf(abs(z_score)))

            print(f" Z-score = {z_score:.4f}")
            print(f" P-value = {p_val:.4f}")

            if p_val < 0.05:
                print(f" → {top_channel} has significantly different sales from the overall average.")
            else:
                print(" → No significant difference from the overall sales average.")
        else:
            print(f" No data found for channel: {top_channel}")
    else:
        print(" No channel data available for Z-test.")

    # Test 4: Z-Test - Compare state sales to overall mean
    if 'ship-state' in df.columns:
        top_state = df['ship-state'].value_counts().index[0]
        print(f"\n📌 Z-Test for {top_state} vs overall sales mean:")

        state_sales = df[df['ship-state'] == top_state]['Amount'].dropna()
        overall_mean = df['Amount'].mean()
        overall_std = df['Amount'].std()
        sample_size = len(state_sales)
        if sample_size > 0:
            z_score = (state_sales.mean() - overall_mean) / (overall_std / np.sqrt(sample_size))
            # Two-tailed p-value
            p_val = 2 * (1 - stats.norm.cdf(abs(z_score)))

            print(f" Z-score = {z_score:.4f}")
            print(f" P-value = {p_val:.4f}")

            if p_val < 0.05:
                print(f" → {top_state} has significantly different sales from the overall average.")
            else:
                print(" → No significant difference from the overall sales average.")
        else:
            print(f" No data found for state: {top_state}")

def visualize_data(df):
    if df is None or df.empty:
        print("Error: No data for visualization.")
        return

    # Formatter for K format
    def thousands_formatter(x, pos):
        if x >= 1000:
            return f'{int(x/1000)}K'
        return f'{int(x)}'

    # Objective 1: Total Sales by Ship State (Top 10)
    top_states = df.groupby('ship-state')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False).head(10)
    plt.figure(figsize=(12, 8))
    sns.barplot(x='ship-state', y='Amount', data=top_states, palette='Blues_d')
    plt.title('Top 10 Ship States by Sales Amount', fontsize=14)
    plt.xlabel('Ship State', fontsize=12)
    plt.ylabel('Total Sales (INR)', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.gca().yaxis.set_major_formatter(FuncFormatter(thousands_formatter))
    plt.tight_layout()
    plt.show()

    # Objective 2: Sales Distribution by Gender
    sales_by_gender = df.groupby('Gender')['Amount'].sum()
    plt.figure(figsize=(8, 8))
    plt.pie(sales_by_gender, labels=sales_by_gender.index, autopct='%1.1f%%', startangle=90, colors=['#ff9999', '#66b3ff'])
    plt.title('Sales Distribution by Gender', fontsize=14)
    plt.tight_layout()
    plt.show()

    # Objective 3: Category-wise Sales Distribution
    sales_by_category = df.groupby('Category')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False)
    plt.figure(figsize=(10, 6))
    sns.barplot(x='Category', y='Amount', data=sales_by_category, palette='Greens_d')
    plt.title('Sales Distribution by Product Category', fontsize=14)
    plt.xlabel('Category', fontsize=12)
    plt.ylabel('Total Sales (INR)', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.gca().yaxis.set_major_formatter(FuncFormatter(thousands_formatter))
    plt.tight_layout()
    plt.show()

    # NEW VISUALIZATION: Box Plot of Sales Amount by Age Group
    plt.figure(figsize=(12, 8))
    sns.boxplot(x='Age Group', y='Amount', data=df, palette='viridis')
    plt.title('Distribution of Sales Amount by Age Group', fontsize=14)
    plt.xlabel('Age Group', fontsize=12)
    plt.ylabel('Sales Amount (INR)', fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

    # Objective 4: Channel-wise Order Volume
    plt.figure(figsize=(10, 6))
    sns.countplot(y='Channel', data=df, order=df['Channel'].value_counts().index, palette='Purples_d')
    plt.title('Order Volume by Sales Channel', fontsize=14)
    plt.xlabel('Number of Orders', fontsize=12)
    plt.ylabel('Channel', fontsize=12)
    plt.tight_layout()
    plt.show()

    # Objective 5: Order Status Breakdown
    status_counts = df['Status'].value_counts()
    plt.figure(figsize=(8, 8))
    plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('Set2'))
    plt.title('Order Status Breakdown', fontsize=14)
    plt.tight_layout()
    plt.show()

    # NEW VISUALIZATION: Correlation Heatmap
    # Select only numeric columns for correlation
    numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
    if len(numeric_cols) >= 2:  # Need at least 2 numeric columns for correlation
        plt.figure(figsize=(10, 8))
        correlation_matrix = df[numeric_cols].corr()
        mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))  # Create mask for upper triangle
        sns.heatmap(correlation_matrix, mask=mask, annot=True, fmt='.2f', cmap='coolwarm',
                    linewidths=0.5, cbar_kws={'label': 'Correlation Coefficient'})
        plt.title('Correlation Heatmap of Numeric Variables', fontsize=14)
        plt.tight_layout()
        plt.show()
    else:
        print("Not enough numeric columns for correlation heatmap.")

def main():
    # Machine-specific path from the original script; point it at your local copy of the workbook.
    file_path = r'E:\Python Project 1\Vrinda Store Data Analysis.xlsx'

    df = load_data(file_path)
    if df is None:
        # load_data has already reported the problem; do not print the success message.
        return

    df = clean_data(df)

    run_statistical_tests(df)

    visualize_data(df)

    print("\nData cleaning, statistical testing, and visualizations completed successfully.")

if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
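The Z-tests in `run_statistical_tests` (Tests 3 and 4) compare one group's mean `Amount` with the overall mean, scaled by the overall standard deviation and the group size. A minimal standalone sketch of that arithmetic follows. The function name `one_sample_z_test` and the sample figures are hypothetical, and the sketch swaps `scipy.stats.norm` for the standard library's `statistics.NormalDist` so it runs without third-party packages.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, population_mean, population_std):
    """Return (z_score, two_tailed_p_value) for a sample mean vs. a known mean.

    Hypothetical helper, not part of index.py: it mirrors the arithmetic of
    Tests 3 and 4 using only the standard library.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must contain at least one value")
    if population_std <= 0:
        raise ValueError("population_std must be positive")

    # Standard error of the mean for a sample of size n.
    standard_error = population_std / sqrt(n)
    z_score = (mean(sample) - population_mean) / standard_error
    # Two-tailed p-value from the standard normal CDF.
    p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return z_score, p_value


# Made-up order amounts for one channel, compared with an assumed
# overall mean of 10 and overall standard deviation of 2.
z, p = one_sample_z_test([12, 12, 12, 12], population_mean=10, population_std=2)
print(f"Z-score = {z:.4f}, P-value = {p:.4f}")  # prints: Z-score = 2.0000, P-value = 0.0455
```

Because the channel or state rows are themselves part of the overall data, the group and the reference population are not independent, so the p-value from this kind of comparison is probably best read as a rough indicator.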