├── README.md
├── code.py
├── ppt.pptx
├── project.docx
└── python project.csv


/README.md:
--------------------------------------------------------------------------------
📊 Business Financial Analysis using Python
This project performs Exploratory Data Analysis (EDA), Data Cleaning, and Financial Insights Extraction on a dataset of business-related financial records. Matplotlib and Seaborn visualizations support the analysis and aid business decision-making.

📁 Dataset
The dataset includes financial information of businesses with fields such as:
industry
line_code
size (e.g., Small, Medium, Large)
level (numerical scale)
value (financial amount)
description

File: python project.csv

🧼 1. Data Cleaning & Preprocessing
Handled missing values using appropriate strategies:
Mode for categorical columns (line_code, industry, size, description)
Mean for numerical columns (level, value)
Replaced special characters in the size column (e.g., \x96 to -)
Removed duplicate rows
Applied Z-score outlier detection to level and value, keeping only rows with |z| < 3 in both columns

📊 2. Exploratory Data Analysis (EDA)
Generated summary statistics
Visualized data distributions and counts using:
Histogram of financial value
Bar chart of industry counts

💡 3. Financial Insights Extraction
Key metrics calculated:
Total Value
Average Value
Maximum Value

📈 4. Data Visualizations
Multiple visualizations were generated to explore relationships and patterns:
📊 Histogram – Distribution of financial values

🏭 Bar Chart – Count of records by industry

📌 Scatter Plot – Total value by industry

🧁 Pie Chart – Business size distribution

📦 Box Plot – Value by business size

📉 Line Graph – Level vs Value trend

🧠 5. Business Decision Support
This section highlights business insights that can help stakeholders make informed decisions:

Top Industries by Revenue (top 5)

Top Business Sizes by Revenue

Top Line Codes by Revenue (top 10)

🛠️ Technologies Used
Python 3

Pandas

NumPy

Matplotlib

Seaborn

SciPy

✅ How to Run
Clone the repository

Install the dependencies listed under Technologies Used

Place python project.csv in the project directory

Run the Python script from the project directory to see output and visualizations:

python code.py

📌 Author
[S.Surendranath Reddy]

[singamsurendra14@gmail.com / https://github.com/singam2006]

--------------------------------------------------------------------------------
/code.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Load the dataset from the project directory (see README, "How to Run").
# ISO-8859-1 decodes every byte, so stray bytes such as \x96 do not raise errors.
df = pd.read_csv("python project.csv", encoding='ISO-8859-1')

# 1: Data Cleaning & Preprocessing

df.info()  # info() prints its report itself and returns None
print("\nMissing Values:")
print(df.isnull().sum())

# Replace the stray \x96 character with a plain hyphen (literal match, no regex needed)
df['size'] = df['size'].str.replace('\x96', '-', regex=False)

# Impute missing values: mode for categorical columns, mean for numerical columns
df['line_code'] = df['line_code'].fillna(df['line_code'].mode()[0])
df['industry'] = df['industry'].fillna(df['industry'].mode()[0])
df['size'] = df['size'].fillna(df['size'].mode()[0])
df['level'] = df['level'].fillna(df['level'].mean())
df['description'] = df['description'].fillna(df['description'].mode()[0])
df['value'] = df['value'].fillna(df['value'].mean())

df.drop_duplicates(inplace=True)

# Keep only rows whose |z-score| is below 3 in both numerical columns
z = np.abs(stats.zscore(df[['level', 'value']]))
df = df[(z < 3).all(axis=1)]

print("\nCleaned Data Info:")
df.info()  # info() prints its report itself and returns None
print("Remaining null values:", df.isnull().sum().sum())

# 2: Exploratory Data Analysis (EDA)

print("\n--- Descriptive Statistics ---")
print(df.describe())

# Value distribution (Histogram)
plt.figure()
sns.histplot(df['value'], bins=30, kde=True)
plt.title("Distribution of Financial Values (Histogram)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()

# Count by industry (Bar Chart), most frequent industry first
plt.figure(figsize=(10, 8))
sns.countplot(y='industry', data=df, order=df['industry'].value_counts().index)
plt.title("Record Count by Industry (Bar Chart)")
plt.tight_layout()
plt.show()

# 3: Financial Insights Extraction
total = df['value'].sum()
average = df['value'].mean()
maximum = df['value'].max()

print("\n--- Financial Insights ---")
print(f"Total Value: {total}")
print(f"Average Value: {average}")
print(f"Maximum Value: {maximum}")

# 4: Data Visualization
industry_values = df.groupby('industry')['value'].sum().sort_values()

# Scatter Plot
plt.figure(figsize=(10, 6))
plt.scatter(industry_values.values, industry_values.index, color='teal')  # x: values, y: industries
plt.title("Total Value by Industry (Scatter Plot)")
plt.xlabel("Value")
plt.ylabel("Industry")
plt.tight_layout()
plt.show()

# Pie Chart
size_counts = df['size'].value_counts()
colors = plt.cm.Set3(np.linspace(0, 1, len(size_counts)))
plt.figure(figsize=(8, 8))
plt.pie(
    size_counts,
    labels=size_counts.index,
    autopct='%1.1f%%',
    startangle=140,
    colors=colors,
    textprops={'fontsize': 10, 'fontweight': 'bold', 'color': 'black'},
    wedgeprops={'edgecolor': 'white', 'linewidth': 1}
)

plt.title("Business Size Distribution (Pie Chart)", fontsize=14, fontweight='bold')
plt.axis('equal')
plt.tight_layout()
plt.show()

# Box Plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='size', y='value', data=df)
plt.title("Value by Business Size (Box Plot)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


# Line Graph (rows sorted by level so the x-axis increases left to right)
df_sorted = df.sort_values(by='level')
plt.figure(figsize=(8, 5))
plt.plot(df_sorted['level'], df_sorted['value'], marker='o', color='blue')
plt.title("Level vs Value (Line Graph)")
plt.xlabel("Level")
plt.ylabel("Value")
plt.grid(True)
plt.tight_layout()
plt.show()

# 5: Business Decision Support

print("\n--- Business Decision Support ---")

print("\nTop Industries by Revenue:")
print(df.groupby('industry')['value'].sum().sort_values(ascending=False).head())

print("\nTop Business Sizes by Revenue:")
print(df.groupby('size')['value'].sum().sort_values(ascending=False))

print("\nTop Line Codes by Revenue:")
print(df.groupby('line_code')['value'].sum().sort_values(ascending=False).head(10))

--------------------------------------------------------------------------------
/ppt.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/ppt.pptx

--------------------------------------------------------------------------------
/project.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/project.docx

--------------------------------------------------------------------------------
/python project.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/python project.csv

--------------------------------------------------------------------------------
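The cleaning steps listed in README.md (mode/mean imputation, duplicate removal, Z-score filtering) can be tried on a small synthetic frame without the CSV. What follows is an illustrative sketch, not the repository's script: the `clean` helper, its parameters, and the toy data are assumptions introduced here. The z-score is computed with pandas using the population standard deviation (`ddof=0`), which is the same default `scipy.stats.zscore` uses.

```python
import numpy as np
import pandas as pd


def clean(df, cat_cols, num_cols, z_max=3.0):
    """Hypothetical helper: impute, de-duplicate, then drop z-score outliers."""
    out = df.copy()
    for col in cat_cols:
        # Fill missing categories with the most frequent value (mode).
        out[col] = out[col].fillna(out[col].mode()[0])
    for col in num_cols:
        # Fill missing numbers with the column mean.
        out[col] = out[col].fillna(out[col].mean())
    out = out.drop_duplicates()
    # Population z-score (ddof=0) per numerical column.
    z = (out[num_cols] - out[num_cols].mean()) / out[num_cols].std(ddof=0)
    # Keep rows whose |z| is below the threshold in every numerical column.
    return out[(z.abs() < z_max).all(axis=1)]


# Toy data, invented for illustration only.
sizes = ["Small", "Medium", "Large", "Small"] * 5
sizes[2] = None                                    # one missing category
values = [100.0 + i for i in range(19)] + [10_000.0]  # last value is an extreme
values[5] = np.nan                                 # one missing number
raw = pd.DataFrame({"size": sizes, "level": [1.0, 2.0, 3.0, 4.0] * 5, "value": values})
raw = pd.concat([raw, raw.iloc[[0]]], ignore_index=True)  # exact duplicate of row 0

cleaned = clean(raw, cat_cols=["size"], num_cols=["level", "value"])
print(len(raw), "rows in,", len(cleaned), "rows out")
print("Remaining null values:", cleaned.isna().sum().sum())
```

As in code.py, imputation runs before outlier removal, so an extreme value still influences the mean used to fill gaps. Filtering first, or imputing with the median, would be possible variations if that skew matters for the analysis.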