├── README.md
├── Screenshot 2025-04-12 134644.png
├── Screenshot 2025-04-12 134707.png
├── Screenshot 2025-04-12 134732.png
├── Screenshot 2025-04-12 134755.png
├── Screenshot 2025-04-12 134821.png
├── Vrinda Store Data Analysis.xlsx
└── index.py


/README.md:
--------------------------------------------------------------------------------
 1 | # 📈 Sales Visualization with Python
 2 | 
 3 | This project is an interactive sales dashboard built using **Python** and **Excel** to analyze and visualize sales data for **Vrinda Store**. It uses Python to process the data and Excel to present it in a visually appealing format.
 4 | 
 5 | ![Dashboard Screenshot](Screenshot%202025-04-12%20134644.png) 
 6 | 
 7 | -------------
 8 | 
 9 | ## 🧾 Overview
10 | 
11 | The **Sales Visualization Dashboard** is designed to help businesses easily understand their sales performance. By combining Python for data handling with Excel for charting, it supports data-driven decision making and clearer analysis.
12 | 
13 | -----
14 | 
15 | ## ✅ Features
16 | 
17 | - 📊 **Sales Trend Charts** – View sales over time using line/bar charts.
18 | - 🧮 **KPIs and Metrics** – Automatically calculated total sales, units sold, and more.
19 | - 🧩 **Clean and Simple Layout** – Easy to use and interpret visuals.
20 | - 📂 **Data Automation with Python** – Process and clean raw sales data using a Python script.
21 | - 🔄 **Refreshable Excel Dashboard** – Update dashboard instantly when new data is available.
22 | 
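The KPI feature above can be sketched in a few lines of pandas. This is a minimal illustration, not the workbook's actual formulas: `compute_kpis` is a hypothetical helper, the `Amount` and `Qty` column names are taken from `index.py`, and the sample rows are invented.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame) -> dict:
    """Return headline sales KPIs from order-level rows (hypothetical helper)."""
    # Coerce to numeric so stray text values become NaN instead of raising
    amount = pd.to_numeric(df["Amount"], errors="coerce")
    qty = pd.to_numeric(df["Qty"], errors="coerce")
    return {
        "total_sales": float(amount.sum()),        # NaN rows are skipped by sum()
        "units_sold": int(qty.sum()),
        "orders": int(amount.notna().sum()),       # rows with a usable amount
        "avg_order_value": float(amount.mean()),
    }


# Invented sample rows, for illustration only
sample = pd.DataFrame({"Amount": [500, 1200, 300], "Qty": [1, 2, 1]})
kpis = compute_kpis(sample)
print(kpis)
```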
23 | ---
24 | 
25 | ## 📁 Files Included
26 | 
27 | - `Vrinda Store Data Analysis.xlsx` – Excel dashboard with charts and visualizations.
28 | - `index.py` – Python script that cleans the data, runs statistical tests, and plots charts.
29 | - `Screenshot 2025-04-12 134644.png` – Preview of the final dashboard; four more screenshots are included alongside it.
30 | - `README.md` – Project documentation.
31 | 
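To give a feel for the cleaning that `index.py` performs, here is a small sketch of normalizing the `Gender` column. It accepts the same labels as the script but uses a dictionary lookup instead of a row-by-row function; `GENDER_MAP` and `normalize_gender` are hypothetical names, and the sample values are invented.

```python
import pandas as pd

# Lower-cased raw labels mapped to canonical values (the labels index.py accepts)
GENDER_MAP = {
    "women": "Female", "w": "Female", "female": "Female",
    "men": "Male", "m": "Male", "male": "Male",
}


def normalize_gender(series: pd.Series) -> pd.Series:
    """Map free-text gender labels to 'Female'/'Male'; anything else becomes NaN."""
    # astype(str) turns missing values into the strings "None"/"nan", which are
    # not keys of GENDER_MAP, so map() returns NaN for them.
    cleaned = series.astype(str).str.strip().str.lower()
    return cleaned.map(GENDER_MAP)


# Invented sample values, for illustration only
raw = pd.Series(["Women", " M ", "men", None, "unknown"])
result = normalize_gender(raw)
print(result)
```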
32 | ---
33 | 
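34 | ## 🚀 How to Use
35 | 
36 | 1. **Clone the Repository**  
37 | 2. **Make changes**: update `file_path` in `main()` of `index.py` so it points to the workbook on your machine, then run `python index.py`.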
38 | 
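One way to avoid editing an absolute path on every machine is to build the workbook path from a base directory. The sketch below uses only the standard library; `WORKBOOK_NAME` and `workbook_path` are hypothetical names, not part of `index.py`.

```python
from pathlib import Path

WORKBOOK_NAME = "Vrinda Store Data Analysis.xlsx"


def workbook_path(base_dir=None) -> Path:
    """Return the workbook path inside base_dir (default: current working directory)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / WORKBOOK_NAME


path = workbook_path()
if not path.exists():
    # Nothing is loaded here; this only reports where the file was expected
    print(f"Workbook not found at {path}; pass base_dir or copy the file there.")
```

The returned `Path` can be passed directly to `pd.read_excel`, which accepts path-like objects.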


--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134644.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134644.png


--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134707.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134707.png


--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134732.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134732.png


--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134755.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134755.png


--------------------------------------------------------------------------------
/Screenshot 2025-04-12 134821.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Screenshot 2025-04-12 134821.png


--------------------------------------------------------------------------------
/Vrinda Store Data Analysis.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vivekb638/Sales-Visualization-by-python/782c913e461c0729cc394247d5226e6823b42098/Vrinda Store Data Analysis.xlsx


--------------------------------------------------------------------------------
/index.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import seaborn as sns
  5 | from matplotlib.ticker import FuncFormatter
  6 | from scipy import stats
  7 | import warnings
  8 | warnings.filterwarnings('ignore')
  9 | 
 10 | 
 11 | sns.set_style('whitegrid')
 12 | 
 13 | def load_data(file_path):
 14 |     try:
 15 |         df = pd.read_excel(file_path)
 16 |         print("Data loaded successfully.")
 17 |         return df
 18 |     except FileNotFoundError:
 19 |         print(f"Error: File {file_path} not found.")
 20 |         return None
 21 |     except Exception as e:
 22 |         print(f"Error loading data: {e}")
 23 |         return None
 24 | 
 25 | def clean_data(df):
 26 |     if df is None:
 27 |         print("Error: No data to clean.")
 28 |         return None
 29 |   
 30 |     df.rename(columns={'Channel ': 'Channel'}, inplace=True)
 31 | 
 32 |     def clean_gender(gender):
 33 |         if pd.isna(gender):
 34 |             return np.nan
 35 |         gender = str(gender).strip().lower()
 36 |         if gender in ['women', 'w', 'female']:
 37 |             return 'Female'
 38 |         elif gender in ['men', 'm', 'male']:
 39 |             return 'Male'
 40 |         else:
 41 |             return np.nan
 42 | 
 43 |     df['Gender'] = df['Gender'].apply(clean_gender)
 44 | 
 45 |     # Clean Qty column
 46 |     qty_map = {'one': 1, 'two': 2, 'three': 3}  # keys are lower-case; lookups are lower-cased too
 47 |     df['Qty'] = df['Qty'].apply(lambda x: qty_map.get(x.strip().lower(), x) if isinstance(x, str) else x)
 48 |     df['Qty'] = pd.to_numeric(df['Qty'], errors='coerce')
 49 | 
 50 |     # Clean Channel column
 51 |     df['Channel'] = df['Channel'].str.strip().str.title().replace('', np.nan)
 52 | 
 53 |     # Ensure numeric and datetime columns
 54 |     df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
 55 |     df['Date'] = pd.to_datetime(df['Date'], errors='coerce', unit='D')
 56 |     df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')
 57 | 
 58 |     # Create Age Group column
 59 |     def get_age_group(age):
 60 |         if pd.isna(age):
 61 |             return np.nan
 62 |         elif 3 <= age <= 18:
 63 |             return 'Teenager'
 64 |         elif 19 <= age <= 64:
 65 |             return 'Adult'
 66 |         elif age >= 65:
 67 |             return 'Senior'
 68 |         else:
 69 |             return 'Other'
 70 | 
 71 |     df['Age Group'] = df['Age'].apply(get_age_group)
 72 | 
 73 |     print("\nDataset Info After Cleaning:")
 74 |     df.info()
 75 |     return df
 76 | 
 77 | def run_statistical_tests(df):
 78 |     if df is None or df.empty:
 79 |         print("Error: No data for statistical testing.")
 80 |         return
 81 | 
 82 |     print("\n🔬 Running Statistical Tests:")
 83 | 
 84 |     # Test 1: T-Test - Compare sales between two genders
 85 |     print("\nT-Test between Female and Male Sales:")
 86 |     female_sales = df[df['Gender'] == 'Female']['Amount'].dropna()
 87 |     male_sales = df[df['Gender'] == 'Male']['Amount'].dropna()
 88 | 
 89 |     if len(female_sales) > 0 and len(male_sales) > 0:
 90 |         t_stat, p_val = stats.ttest_ind(female_sales, male_sales, equal_var=False)
 91 |         
 92 |         print(f"   T-statistic = {t_stat:.4f}")
 93 |         print(f"   P-value = {p_val:.4f}")
 94 |         
 95 |         if p_val < 0.05:
 96 |             print("   → Statistically significant difference in sales between genders.")
 97 |         else:
 98 |             print("   → No significant difference found in sales between genders.")
 99 |     else:
100 |         print("   No sufficient data for gender comparison.")
101 | 
102 |     # Test 2: T-Test - Compare sales between top two categories
103 |     top_categories = df.groupby('Category')['Amount'].sum().nlargest(2).index.tolist()
104 |     
105 |     if len(top_categories) >= 2:
106 |         cat1, cat2 = top_categories[0], top_categories[1]
107 |         print(f"\nT-Test between {cat1} and {cat2} Categories:")
108 |         
109 |         cat1_sales = df[df['Category'] == cat1]['Amount'].dropna()
110 |         cat2_sales = df[df['Category'] == cat2]['Amount'].dropna()
111 |         
112 |         if len(cat1_sales) > 0 and len(cat2_sales) > 0:
113 |             t_stat, p_val = stats.ttest_ind(cat1_sales, cat2_sales, equal_var=False)
114 |             
115 |             print(f"   T-statistic = {t_stat:.4f}")
116 |             print(f"   P-value = {p_val:.4f}")
117 |             
118 |             if p_val < 0.05:
119 |                 print(f"   → Statistically significant difference in sales between {cat1} and {cat2}.")
120 |             else:
121 |                 print(f"   → No significant difference found in sales between {cat1} and {cat2}.")
122 |         else:
123 |             print("   No sufficient data for category comparison.")
124 |     else:
125 |         print("   Not enough categories for comparison.")
126 | 
127 |     # Test 3: Z-Test - Compare channel sales to overall mean
128 |     top_channel = df['Channel'].value_counts().index[0] if 'Channel' in df.columns and df['Channel'].notna().any() else None
129 |     
130 |     if top_channel:
131 |         print(f"\n📌 Z-Test for {top_channel} vs overall sales mean:")
132 |         
133 |         channel_sales = df[df['Channel'] == top_channel]['Amount'].dropna()
134 |         overall_mean = df['Amount'].mean()
135 |         overall_std = df['Amount'].std()
136 |         sample_size = len(channel_sales)
137 |         
138 |         if sample_size > 0:
139 |             z_score = (channel_sales.mean() - overall_mean) / (overall_std / np.sqrt(sample_size))
140 |             # Two-tailed p-value
141 |             p_val = 2 * (1 - stats.norm.cdf(abs(z_score)))
142 |             
143 |             print(f"   Z-score = {z_score:.4f}")
144 |             print(f"   P-value = {p_val:.4f}")
145 |             
146 |             if p_val < 0.05:
147 |                 print(f"   → {top_channel} has significantly different sales from the overall average.")
148 |             else:
149 |                 print("   → No significant difference from the overall sales average.")
150 |         else:
151 |             print(f"   No data found for channel: {top_channel}")
152 |     else:
153 |         print("   No channel data available for Z-test.")
154 | 
155 |     # Test 4: Z-Test - Compare state sales to overall mean
156 |     if 'ship-state' in df.columns and df['ship-state'].notna().any():
157 |         top_state = df['ship-state'].value_counts().index[0]
158 |         print(f"\n📌 Z-Test for {top_state} vs overall sales mean:")
159 |         
160 |         state_sales = df[df['ship-state'] == top_state]['Amount'].dropna()
161 |         overall_mean = df['Amount'].mean()
162 |         overall_std = df['Amount'].std()
163 |         sample_size = len(state_sales)
164 |         
165 |         if sample_size > 0:
166 |             z_score = (state_sales.mean() - overall_mean) / (overall_std / np.sqrt(sample_size))
167 |             # Two-tailed p-value
168 |             p_val = 2 * (1 - stats.norm.cdf(abs(z_score)))
169 |             
170 |             print(f"   Z-score = {z_score:.4f}")
171 |             print(f"   P-value = {p_val:.4f}")
172 |             
173 |             if p_val < 0.05:
174 |                 print(f"   → {top_state} has significantly different sales from the overall average.")
175 |             else:
176 |                 print("   → No significant difference from the overall sales average.")
177 |         else:
178 |             print(f"   No data found for state: {top_state}")
179 | 
180 | def visualize_data(df):
181 |     if df is None or df.empty:
182 |         print("Error: No data for visualization.")
183 |         return
184 | 
185 |     # Formatter for K format
186 |     def thousands_formatter(x, pos):
187 |         if x >= 1000:
188 |             return f'{int(x/1000)}K'
189 |         return f'{int(x)}'
190 | 
191 |     # Objective 1: Total Sales by Ship State (Top 10)
192 |     top_states = df.groupby('ship-state')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False).head(10)
193 |     plt.figure(figsize=(12, 8))
194 |     sns.barplot(x='ship-state', y='Amount', data=top_states, palette='Blues_d')
195 |     plt.title('Top 10 Ship States by Sales Amount', fontsize=14)
196 |     plt.xlabel('Ship State', fontsize=12)
197 |     plt.ylabel('Total Sales (INR)', fontsize=12)
198 |     plt.xticks(rotation=45, ha='right')
199 |     plt.gca().yaxis.set_major_formatter(FuncFormatter(thousands_formatter))
200 |     plt.tight_layout()
201 |     plt.show()
202 | 
203 |     # Objective 2: Sales Distribution by Gender
204 |     sales_by_gender = df.groupby('Gender')['Amount'].sum()
205 |     plt.figure(figsize=(8, 8))
206 |     plt.pie(sales_by_gender, labels=sales_by_gender.index, autopct='%1.1f%%', startangle=90, colors=['#ff9999', '#66b3ff'])
207 |     plt.title('Sales Distribution by Gender', fontsize=14)
208 |     plt.tight_layout()
209 |     plt.show()
210 | 
211 |     # Objective 3: Category-wise Sales Distribution
212 |     sales_by_category = df.groupby('Category')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False)
213 |     plt.figure(figsize=(10, 6))
214 |     sns.barplot(x='Category', y='Amount', data=sales_by_category, palette='Greens_d')
215 |     plt.title('Sales Distribution by Product Category', fontsize=14)
216 |     plt.xlabel('Category', fontsize=12)
217 |     plt.ylabel('Total Sales (INR)', fontsize=12)
218 |     plt.xticks(rotation=45, ha='right')
219 |     plt.gca().yaxis.set_major_formatter(FuncFormatter(thousands_formatter))
220 |     plt.tight_layout()
221 |     plt.show()
222 |     
223 |     # NEW VISUALIZATION: Box Plot of Sales Amount by Age Group
224 |     plt.figure(figsize=(12, 8))
225 |     sns.boxplot(x='Age Group', y='Amount', data=df, palette='viridis')
226 |     plt.title('Distribution of Sales Amount by Age Group', fontsize=14)
227 |     plt.xlabel('Age Group', fontsize=12)
228 |     plt.ylabel('Sales Amount (INR)', fontsize=12)
229 |     plt.grid(axis='y', linestyle='--', alpha=0.7)
230 |     plt.tight_layout()
231 |     plt.show()
232 | 
233 |     # Objective 4: Channel-wise Order Volume
234 |     plt.figure(figsize=(10, 6))
235 |     sns.countplot(y='Channel', data=df, order=df['Channel'].value_counts().index, palette='Purples_d')
236 |     plt.title('Order Volume by Sales Channel', fontsize=14)
237 |     plt.xlabel('Number of Orders', fontsize=12)
238 |     plt.ylabel('Channel', fontsize=12)
239 |     plt.tight_layout()
240 |     plt.show()
241 | 
242 |     # Objective 5: Order Status Breakdown
243 |     status_counts = df['Status'].value_counts()
244 |     plt.figure(figsize=(8, 8))
245 |     plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('Set2'))
246 |     plt.title('Order Status Breakdown', fontsize=14)
247 |     plt.tight_layout()
248 |     plt.show()
249 |     
250 |     # NEW VISUALIZATION: Correlation Heatmap
251 |     # Select only numeric columns for correlation
252 |     numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
253 |     if len(numeric_cols) >= 2:  # Need at least 2 numeric columns for correlation
254 |         plt.figure(figsize=(10, 8))
255 |         correlation_matrix = df[numeric_cols].corr()
256 |         mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))  # Create mask for upper triangle
257 |         sns.heatmap(correlation_matrix, mask=mask, annot=True, fmt='.2f', cmap='coolwarm', 
258 |                     linewidths=0.5, cbar_kws={'label': 'Correlation Coefficient'})
259 |         plt.title('Correlation Heatmap of Numeric Variables', fontsize=14)
260 |         plt.tight_layout()
261 |         plt.show()
262 |     else:
263 |         print("Not enough numeric columns for correlation heatmap.")
264 | 
265 | def main():
266 |     # Relative to the working directory; the workbook ships alongside this script
267 |     file_path = 'Vrinda Store Data Analysis.xlsx'
268 | 
269 |     df = clean_data(load_data(file_path))
270 |     if df is None:
271 |         # load_data/clean_data already printed the reason; skip the success message
272 |         return
273 | 
274 |     run_statistical_tests(df)
275 |     visualize_data(df)
276 | 
277 |     print("\nData cleaning, statistical testing, and visualizations completed successfully.")
278 | 
279 | if __name__ == "__main__":
280 |     main()
281 | 


--------------------------------------------------------------------------------
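The Z-tests in `run_statistical_tests` compare a subgroup mean with the overall mean, using the overall standard deviation as the population value. The same arithmetic can be sketched with the standard library alone. `one_sample_z_test` is a hypothetical helper and the numbers in the example call are invented. Because the subgroup is itself part of the overall data, the result is best read as an approximation.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Return (z, two-tailed p-value) for a sample mean against a known mean and std."""
    if n <= 0 or pop_std <= 0:
        raise ValueError("n and pop_std must be positive")
    # Standard error of the mean is pop_std / sqrt(n)
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    # Two-tailed p-value from the standard normal CDF
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Invented numbers: z works out to 5 / (20 / 8) = 2.0
z, p = one_sample_z_test(sample_mean=105.0, pop_mean=100.0, pop_std=20.0, n=64)
print(f"z = {z:.4f}, p = {p:.4f}")
```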