├── README.md
├── code.py
├── ppt.pptx
├── project.docx
└── python project.csv


/README.md:
--------------------------------------------------------------------------------
 1 | 📊 Business Financial Analysis using Python
 2 | This project performs Exploratory Data Analysis (EDA), Data Cleaning, and Financial Insights Extraction on a dataset containing business-related financial records. The analysis is supported by data visualizations using Matplotlib and Seaborn to aid in business decision-making.
 3 | 
 4 | 📁 Dataset
 5 | The dataset includes financial information of businesses with fields such as:
 6 | industry
 7 | line_code
 8 | size (e.g., Small, Medium, Large)
 9 | level (numerical scale)
10 | value (financial amount)
11 | description
12 | 
13 | File: python project.csv
14 | 🧼 1. Data Cleaning & Preprocessing
15 | Handled missing values using appropriate strategies:
16 | Mode for categorical columns (line_code, industry, size, description)
17 | Mean for numerical columns (level, value)
18 | Replaced special characters in the size column (e.g., \x96 with -)
19 | Removed duplicate rows
20 | Applied Z-Score outlier detection and removed rows where level or value lies 3 or more standard deviations from its mean
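
The cleaning steps above can be sketched on a small synthetic frame. This is a minimal illustration, not the project's script: the `clean` helper, its parameters, and the toy data are assumptions, and the z-score is computed by hand with `ddof=0` (the default that `scipy.stats.zscore` also uses) so the sketch needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd


def clean(df, cat_cols, num_cols, z_max=3.0):
    """Impute, de-duplicate, and drop z-score outliers (hypothetical helper)."""
    out = df.copy()
    for col in cat_cols:
        # Mode imputation: the most frequent category fills the gaps.
        out[col] = out[col].fillna(out[col].mode()[0])
    for col in num_cols:
        # Mean imputation for numeric gaps.
        out[col] = out[col].fillna(out[col].mean())
    out = out.drop_duplicates()
    # Population z-score (ddof=0) per numeric column.
    z = (out[num_cols] - out[num_cols].mean()) / out[num_cols].std(ddof=0)
    # Keep a row only if every numeric column is within z_max deviations.
    return out[(z.abs() < z_max).all(axis=1)]


# Synthetic data: 20 distinct rows, the last one an extreme value.
raw = pd.DataFrame({
    "size": ["Small", "Large"] * 10,
    "level": np.arange(1.0, 21.0),
    "value": [float(v) for v in range(10, 29)] + [5000.0],
})
raw.loc[2, "value"] = np.nan   # numeric gap
raw.loc[5, "size"] = np.nan    # categorical gap
raw = pd.concat([raw, raw.iloc[[0]]], ignore_index=True)  # exact duplicate of row 0

cleaned = clean(raw, cat_cols=["size"], num_cols=["level", "value"])
print(len(raw), "->", len(cleaned))
```

Note that a z-score filter can only flag a point when the sample is large enough: with `ddof=0`, no value in a sample of n rows can be more than sqrt(n - 1) standard deviations from the mean, so a threshold of 3 never triggers on fewer than 11 rows.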
21 | 
22 | 📊 2. Exploratory Data Analysis (EDA)
23 | Generated summary statistics
24 | Visualized data distributions and counts using:
25 | Histogram of financial value
26 | Bar chart of industry counts
27 | 
28 | 💡 3. Financial Insights Extraction
29 | Key metrics calculated:
30 | Total Value
31 | Average Value
32 | Maximum Value
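
As a compact alternative to computing each metric separately, the three figures can be requested in one call with `Series.agg`. A minimal sketch on made-up numbers; the values below are illustrative and not taken from the dataset.

```python
import pandas as pd

# Made-up figures standing in for the 'value' column.
values = pd.Series([120.0, 80.0, 100.0], name="value")

# One pass over the column returns a Series indexed by metric name.
summary = values.agg(["sum", "mean", "max"])
print(summary)
```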
33 | 
34 | 📈 4. Data Visualizations
35 | Multiple visualizations were generated to explore relationships and patterns:
36 | 📊 Histogram – Distribution of financial values
37 | 
38 | 🏭 Bar Chart – Count of records by industry
39 | 
40 | 📌 Scatter Plot – Total value by industry
41 | 
42 | 🧁 Pie Chart – Business size distribution
43 | 
44 | 📦 Box Plot – Value by business size
45 | 
46 | 📉 Line Graph – Level vs Value trend
47 | 
48 | 🧠 5. Business Decision Support
49 | This section highlights business insights that can help stakeholders make informed decisions:
50 | 
51 | Top Industries by Revenue
52 | 
53 | Top Business Sizes by Revenue
54 | 
55 | Top Line Codes by Revenue
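
Each of these rankings follows the same group, sum, sort pattern. A minimal sketch, assuming a hypothetical `top_by_total` helper and a toy frame; neither is part of the repository.

```python
import pandas as pd


def top_by_total(df, key, value="value", n=5):
    """Return the n groups of `key` with the largest summed `value` (hypothetical helper)."""
    return df.groupby(key)[value].sum().sort_values(ascending=False).head(n)


# Toy records: industry A totals 30, B totals 50, C totals 10.
toy = pd.DataFrame({
    "industry": ["A", "A", "B", "C", "C"],
    "value": [10, 20, 50, 5, 5],
})

top = top_by_total(toy, "industry", n=2)
print(top)
```

Because the metric is a sum, a group with many small records can outrank one with a few large records; swapping `.sum()` for `.mean()` would rank by typical record size instead.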
56 | 
57 | 🛠️ Technologies Used
58 | Python 3
59 | 
60 | Pandas
61 | 
62 | NumPy
63 | 
64 | Matplotlib
65 | 
66 | Seaborn
67 | 
68 | SciPy
69 | 
70 | ✅ How to Run
71 | Clone the repository
72 | 
73 | Install the dependencies: pandas, numpy, matplotlib, seaborn, and scipy (the repository does not include a requirements.txt)
74 | 
75 | Place python project.csv in the project directory
76 | 
77 | Run the Python script to see output and visualizations
78 | 
79 | python code.py
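
Since the instructions place the CSV in the project directory, one way to make the load independent of the current working directory is to resolve the file against the script's own location. A minimal sketch; `load_dataset` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

import pandas as pd


def load_dataset(script_file, name="python project.csv"):
    """Load the CSV that sits next to the given script file (hypothetical helper)."""
    csv_path = Path(script_file).resolve().parent / name
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected dataset at {csv_path}")
    # ISO-8859-1 is the encoding the project's script passes to read_csv.
    return pd.read_csv(csv_path, encoding="ISO-8859-1")
```

Inside the script, the call would look like `df = load_dataset(__file__)`.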
83 | 📌 Author
84 | S. Surendranath Reddy
85 | 
86 | singamsurendra14@gmail.com / https://github.com/singam2006
87 | 
88 | 


--------------------------------------------------------------------------------
/code.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import seaborn as sns
  5 | from scipy import stats
  6 | df = pd.read_csv("python project.csv", encoding='ISO-8859-1')  # relative path: run from the project directory that holds the CSV
  7 | # 1:Data Cleaning & Preprocessing
  8 | 
  9 | print(df.info())
 10 | print("\nMissing Values:")
 11 | print(df.isnull().sum())
 12 | 
 13 | df['size'] = df['size'].str.replace('\x96', '-', regex=False)  # literal character swap, no regex needed
 14 | 
 15 | df['line_code'] = df['line_code'].fillna(df['line_code'].mode()[0])
 16 | df['industry'] = df['industry'].fillna(df['industry'].mode()[0])
 17 | df['size'] = df['size'].fillna(df['size'].mode()[0])
 18 | df['level'] = df['level'].fillna(df['level'].mean())
 19 | df['description'] = df['description'].fillna(df['description'].mode()[0])
 20 | df['value'] = df['value'].fillna(df['value'].mean())
 21 | 
 22 | df.drop_duplicates(inplace=True)
 23 | 
 24 | z = np.abs(stats.zscore(df[['level', 'value']]))
 25 | df = df[(z < 3).all(axis=1)]
 26 | 
 27 | print("\nCleaned Data Info:")
 28 | print(df.info())
 29 | print("Remaining null values:", df.isnull().sum().sum())
 30 | 
 31 | # 2:Exploratory Data Analysis (EDA)
 32 | 
 33 | print("\n--- Descriptive Statistics ---")
 34 | print(df.describe())
 35 | 
 36 | # Value distribution (Histogram)
 37 | plt.figure()
 38 | sns.histplot(df['value'], bins=30, kde=True)
 39 | plt.title("Distribution of Financial Values (Histogram)")
 40 | plt.xlabel("Value")
 41 | plt.ylabel("Frequency")
 42 | plt.tight_layout()
 43 | plt.show()
 44 | 
 45 | # Count by industry (Bar Chart)
 46 | plt.figure(figsize=(10, 8))
 47 | sns.countplot(y='industry', data=df, order=df['industry'].value_counts().index)
 48 | plt.title("Record Count by Industry (Bar Chart)")
 49 | plt.tight_layout()
 50 | plt.show()
 51 | 
 52 | # 3:Financial Insights Extraction
 53 | total = df['value'].sum()
 54 | average = df['value'].mean()
 55 | maximum = df['value'].max()
 56 | 
 57 | print("\n--- Financial Insights ---")
 58 | print(f"Total Value: {total}")
 59 | print(f"Average Value: {average}")
 60 | print(f"Maximum Value: {maximum}")
 61 | 
 62 | # 4:Data Visualization
 63 | industry_values = df.groupby('industry')['value'].sum().sort_values()
 64 | 
 65 | # Scatter plot
 66 | plt.figure(figsize=(10, 6))
 67 | plt.scatter(industry_values.values, industry_values.index, color='teal')  # x: values, y: industries
 68 | plt.title("Total Value by Industry (Scatter Plot)")
 69 | plt.xlabel("Value")
 70 | plt.ylabel("Industry")
 71 | plt.tight_layout()
 72 | plt.show()
 73 | 
 74 | # Pie chart
 75 | size_counts = df['size'].value_counts()
 76 | colors = plt.cm.Set3(np.linspace(0, 1, len(size_counts)))
 77 | plt.figure(figsize=(8, 8))
 78 | plt.pie(
 79 |     size_counts,
 80 |     labels=size_counts.index,
 81 |     autopct='%1.1f%%',
 82 |     startangle=140,
 83 |     colors=colors,
 84 |     textprops={'fontsize': 10, 'fontweight': 'bold', 'color': 'black'},
 85 |     wedgeprops={'edgecolor': 'white', 'linewidth': 1}
 86 | )
 87 | 
 88 | plt.title("Business Size Distribution (Pie Chart)", fontsize=14, fontweight='bold')
 89 | plt.axis('equal')
 90 | plt.tight_layout()
 91 | plt.show()
 92 | 
 93 | # Box plot
 94 | plt.figure(figsize=(10, 6))
 95 | sns.boxplot(x='size', y='value', data=df)
 96 | plt.title("Value by Business Size (Box Plot)")
 97 | plt.xticks(rotation=45)
 98 | plt.tight_layout()
 99 | plt.show()
100 | 
101 | 
102 | # Line Graph
103 | df_sorted = df.sort_values(by='level')
104 | plt.figure(figsize=(8, 5))
105 | plt.plot(df_sorted['level'], df_sorted['value'], marker='o', color='blue')
106 | plt.title("Level vs Value (Line Graph)")
107 | plt.xlabel("Level")
108 | plt.ylabel("Value")
109 | plt.grid(True)
110 | plt.tight_layout()
111 | plt.show()
112 | 
113 | # 5:Business Decision Support
114 | 
115 | print("\n--- Business Decision Support ---")
116 | 
117 | print("\nTop Industries by Revenue:")
118 | print(df.groupby('industry')['value'].sum().sort_values(ascending=False).head())
119 | 
120 | print("\nTop Business Sizes by Revenue:")
121 | print(df.groupby('size')['value'].sum().sort_values(ascending=False))
122 | 
123 | print("\nTop Line Codes by Revenue:")
124 | print(df.groupby('line_code')['value'].sum().sort_values(ascending=False).head(10))
125 | 


--------------------------------------------------------------------------------
/ppt.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/ppt.pptx


--------------------------------------------------------------------------------
/project.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/project.docx


--------------------------------------------------------------------------------
/python project.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/singam2006/Singam-Python-Project/cd1d1e1b62cc0d9f7c4a486e6d7cd613061bfa0a/python project.csv


--------------------------------------------------------------------------------