├── excel project (3).xlsx
├── EDA PYTHON PROJECT REPORT .doc
├── README.md
└── eda project python code.py


/excel project (3).xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PrabhkiratSingh123/EDA-PROJECT-RELATED-TO-Demographics-An-Exploratory-Analysis-of-Census-Data/HEAD/excel project (3).xlsx


--------------------------------------------------------------------------------
/EDA PYTHON PROJECT REPORT .doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PrabhkiratSingh123/EDA-PROJECT-RELATED-TO-Demographics-An-Exploratory-Analysis-of-Census-Data/HEAD/EDA PYTHON PROJECT REPORT .doc


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # EDA-PROJECT-RELATED-TO-Demographics-An-Exploratory-Analysis-of-Census-Data
  2 | # 🧠 DATA SCIENCE TOOLBOX: PYTHON PROGRAMMING  
  3 | ## 📊 Demographics: An Exploratory Analysis of Census Data  
  4 | 
  5 | **Project Semester:** January–April 2025  
  6 | **Course Code:** INT375  
  7 | **Program & Section:** B.Tech (CSE) - K23GD  
  8 | **Submitted By:** Prabhkirat Singh (Reg. No: 12309872)  
  9 | **Guided By:** Mrs. Baljinder Kaur, Discipline of CSE/IT, Lovely Professional University  
 10 | 
 11 | ## 📘 Project Overview
 12 | 
 13 | This project conducts an **Exploratory Data Analysis (EDA)** of the Indian Census dataset to uncover insights about the population based on factors such as:
 14 | 
 15 | - Gender distribution
 16 | - Urban vs. Rural residence
 17 | - Place of birth
 18 | - Age groups
 19 | - Correlation between demographic variables
 20 | 
 21 | The analysis uses **Pandas** (built on **NumPy**) for data handling, and **Matplotlib**, **Seaborn**, and **Squarify** for visualization.
 22 | 
 23 | ---
 24 | 
 25 | ## 📂 Dataset Source
 26 | 
 27 | The dataset is sourced from the [Government of India's Census Portal](https://censusindia.gov.in/nada/index.php/catalog/10717) and contains the following key features:
 28 | 
 29 | - State, District, Area Name
 30 | - Age Group
 31 | - Birthplace
 32 | - Total / Male / Female population
 33 | - Rural and Urban segmentation
 34 | 
 35 | ---
 36 | 
 37 | ## 🔧 EDA Process
 38 | 
 39 | The following steps were performed:
 40 | 
 41 | - **Data Cleaning:** Handling nulls, fixing inconsistent entries
 42 | - **Transformation:** Filtering, type conversions
 43 | - **Aggregation:** Using `groupby()` and `sum()`
 44 | - **Visualization:** Donut, pie, bar, line, treemap, violin, and heatmap plots
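
The cleaning, transformation, and aggregation steps above can be sketched on a tiny made-up table. The column names mirror those used in the project's script, but the rows and values are invented for illustration, `clean_count_column` is a hypothetical helper, and filling nulls with 0 is an assumption rather than the project's documented choice.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from the census file.
raw = pd.DataFrame({
    "Age-group": ["0-4", "0-4", "5-9", "5-9"],
    "Total Persons": ["1,200", "800", "1,500", None],
})

def clean_count_column(series: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to integers.

    Hypothetical helper: unparseable or missing values become 0,
    which is an assumption made for this sketch.
    """
    as_text = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(as_text, errors="coerce").fillna(0).astype("int64")

# Cleaning + type conversion
raw["Total Persons"] = clean_count_column(raw["Total Persons"])

# Aggregation: total persons per age group
totals = raw.groupby("Age-group")["Total Persons"].sum()
print(totals)
```

Coercing with `pd.to_numeric(..., errors="coerce")` keeps the conversion from failing on a stray null or malformed entry, at the cost of silently turning such entries into 0 here.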
 45 | 
 46 | ### 🛠 Libraries Used
 47 | 
 48 | - `pandas`
 49 | - `numpy`
 50 | - `matplotlib`
 51 | - `seaborn`
 52 | - `squarify`
 53 | 
 54 | ---
 55 | 
 56 | ## 📈 Key Analyses & Visualizations
 57 | 
 58 | ### 1. Gender Distribution
 59 | - **Insight:** Slight male dominance, otherwise balanced
 60 | - **Visualization:** Donut chart
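
The percentages shown on a donut chart are each group's share of the combined total. A minimal sketch of that arithmetic, using invented counts rather than the project's actual figures:

```python
import pandas as pd

# Invented counts for illustration only; not the project's actual totals.
gender_counts = pd.Series({"Total Males": 520, "Total Females": 480})

# Percentage share of each group (what the chart's labels display).
shares = (gender_counts / gender_counts.sum() * 100).round(1)

# Gap between the two shares, in percentage points.
gap = round(float(shares["Total Males"] - shares["Total Females"]), 1)

print(shares.to_dict())
print(gap)
```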
 61 | 
 62 | ### 2. Urban vs Rural
 63 | - **Insight:** Rural population still forms the majority
 64 | - **Visualization:** Bar and pie charts
 65 | 
 66 | ### 3. Top Birthplaces
 67 | - **Insight:** Majority born locally; among those born in other states, Uttar Pradesh (UP) is the largest source
 68 | - **Visualization:** Treemap
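
Ranking birthplaces comes down to a group-wise sum followed by a top-N selection. The sketch below uses invented rows and placeholder place names; the `"Birth place "` header with a trailing space echoes the script's column name, and stripping it is a suggested convenience, not something the script does.

```python
import pandas as pd

# Made-up rows; place names are placeholders, not census categories.
df = pd.DataFrame({
    "Birth place ": ["Place A", "Place B", "Place A", "Place C", "Place B"],
    "Total Persons": [500, 300, 250, 100, 50],
})

# Remove stray whitespace from headers so "Birth place " becomes "Birth place".
df.columns = df.columns.str.strip()

# Sum persons per birthplace, then keep the N largest groups.
top_n = 2
top_birthplaces = df.groupby("Birth place")["Total Persons"].sum().nlargest(top_n)
print(top_birthplaces)
```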
 69 | 
 70 | ### 4. Age Group Distribution
 71 | - **Insight:** Peak in working-age population; decline in senior groups
 72 | - **Visualization:** Line plot
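
One thing to watch when plotting age groups as a line: `groupby()` sorts string labels alphabetically, so a label such as `"10-14"` would land before `"5-9"`. A minimal sketch of restoring numeric order, assuming labels of the form `"5-9"` (an assumption about the label format; the real dataset's labels may differ) and using a hypothetical `lower_bound` helper:

```python
import pandas as pd

# Invented labels and counts; the real dataset's age-group labels may differ.
age_group_dist = pd.DataFrame({
    "Age-group": ["0-4", "10-14", "15-19", "5-9"],  # alphabetical order
    "Total Persons": [400, 350, 300, 380],
})

def lower_bound(label: str) -> int:
    """Return the first number in a label like '10-14' (hypothetical helper)."""
    return int(label.split("-")[0])

# Sort rows by the numeric lower bound of each age group.
ordered = age_group_dist.sort_values(
    "Age-group", key=lambda s: s.map(lower_bound)
).reset_index(drop=True)
print(ordered)
```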
 73 | 
 74 | ### 5. Correlation Matrix
 75 | - **Insight:** Strong positive correlations among the total, rural, and urban counts of persons, males, and females
 76 | - **Visualization:** Heatmap
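
Part of this correlation is likely structural: if a total column is the sum of its rural and urban parts, the columns tend to rise and fall together. The sketch below illustrates this with invented numbers in which `"Total Persons"` is built as rural plus urban by construction; it is not a reproduction of the project's heatmap.

```python
import pandas as pd

# Invented counts; "Total Persons" is rural + urban by construction.
df = pd.DataFrame({
    "Rural Persons": [100, 200, 300, 400],
    "Urban Persons": [50, 90, 160, 210],
})
df["Total Persons"] = df["Rural Persons"] + df["Urban Persons"]

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))
```

Such correlations mostly reflect that larger areas have larger counts in every category, so they say little about the relationship between, for example, gender and residence.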
 77 | 
 78 | ### 6. Population by Birthplace
 79 | - **Insight:** Population counts vary unevenly across birthplaces; some show a much wider spread than others
 80 | - **Visualization:** Violin plot
 81 | 
 82 | ---
 83 | 
 84 | ## ✅ Conclusion
 85 | 
 86 | The project highlighted:
 87 | 
 88 | - Significant urban-rural population contrasts  
 89 | - Balanced but slightly male-skewed gender ratios  
 90 | - Insightful birthplace patterns suggesting internal migration  
 91 | - Dominance of the working-age population
 92 | 
 93 | EDA proves to be a powerful step in making census data actionable for policy-making.
 94 | 
 95 | ---
 96 | 
 97 | ## 🚀 Future Scope
 98 | 
 99 | - Predictive Modeling for migration/urban growth
100 | - Geo-mapping with tools like `GeoPandas`
101 | - Interactive Dashboards (Plotly/Tableau)
102 | - Time-series forecasting on census trends
103 | - Machine Learning for clustering and classification
104 | 
105 | ---
106 | 
107 | ## 📚 References
108 | 
109 | - [Census India Dataset](https://censusindia.gov.in/nada/index.php/catalog/10717)  
110 | - [Seaborn](https://seaborn.pydata.org)  
111 | - [Matplotlib](https://matplotlib.org)  
112 | - [Pandas](https://pandas.pydata.org)  
113 | 
114 | ---
115 | 
116 | 
117 | 


--------------------------------------------------------------------------------
/eda project python code.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import matplotlib.pyplot as plt
  3 | import seaborn as sns
  4 | import squarify
  5 | 
  6 | # Load dataset (update this path to wherever the CSV lives on your machine)
  7 | df = pd.read_csv(r"C:\Users\ssard\OneDrive\Documents\Dataset for project.csv")
  8 | 
  9 | # Display dataset description
 10 | print("📝 Dataset Description:\n")
 11 | print(df.describe(include="all"))
 12 | print("\n📊 Columns:\n", df.columns)
 13 | 
 14 | # Labels & sizes for treemap (hard-coded, not computed from df; the categories overlap)
 15 | labels = [
 16 |     "Total Population\n67,151,764",
 17 |     "Born within India\n66,333,160",
 18 |     "Born in the place of enumeration\n38,688,772",
 19 |     "Within the state of enumeration\n40,578,120",
 20 |     "States in India\nbeyond the state of enumeration\n25,755,040",
 21 |     "Uttar Pradesh\n11,619,444"
 22 | ]
 23 | sizes = [67151764, 66333160, 38688772, 40578120, 25755040, 11619444]
 24 | colors = ['#a2d4c5', '#ffffcc', '#ffb3b3', '#d5ccff', '#a3c2c2', '#ffcc99']
 25 | 
 26 | # Clean numeric columns
 27 | cols_to_convert = [
 28 |     "Total Persons", "Total Males", "Total Females",
 29 |     "Rural Persons", "Rural Males", "Rural Females",
 30 |     "Urban Persons", "Urban Males", "Urban Females"
 31 | ]
 32 | for col in cols_to_convert:
 33 |     df[col] = df[col].astype(str).str.replace(",", "", regex=False).astype(int)
 34 | 
 35 | # Aggregated data
 36 | gender_counts = df[["Total Males", "Total Females"]].sum()
 37 | urban_rural = df[["Urban Persons", "Rural Persons"]].sum()
 38 | birthplace_counts = df.groupby("Birth place ")["Total Persons"].sum().sort_values(ascending=False)
 39 | age_group_dist = df.groupby("Age-group")["Total Persons"].sum().reset_index()
 40 | top_birthplaces = birthplace_counts.head(6)
 41 | df_birth_top = df[df["Birth place "].isin(top_birthplaces.index)]
 42 | 
 43 | # Set seaborn style
 44 | sns.set_theme(style="whitegrid")
 45 | 
 46 | # 1. Donut Chart - Gender Distribution
 47 | plt.figure(figsize=(6, 6))
 48 | plt.pie(gender_counts, labels=gender_counts.index, startangle=90,
 49 |         autopct="%1.1f%%", colors=["#5DADE2", "#F1948A"], wedgeprops=dict(width=0.4))
 50 | plt.title("Gender Distribution", fontsize=16)
 51 | plt.show()
 52 | 
 53 | # 2. Barplot - Urban vs Rural
 54 | plt.figure(figsize=(6, 5))
 55 | sns.barplot(x=urban_rural.index, y=urban_rural.values, palette="Accent")
 56 | plt.title("Urban vs Rural Population", fontsize=16)
 57 | plt.ylabel("Population")
 58 | plt.show()
 59 | 
 60 | # 3. Treemap - Top Birthplaces
 61 | plt.figure(figsize=(12, 6))
 62 | squarify.plot(sizes=sizes, label=labels, color=colors, pad=True, text_kwargs={'fontsize': 10})
 63 | plt.axis('off')
 64 | plt.title("Top 6 Birthplaces by Population", fontsize=16)
 65 | plt.show()
 66 | 
 67 | # 4. Lineplot - Age Group
 68 | plt.figure(figsize=(8, 5))
 69 | sns.lineplot(data=age_group_dist, x="Age-group", y="Total Persons", marker="o", color="#E67E22")
 70 | plt.title("Age Group Distribution", fontsize=16)
 71 | plt.xticks(rotation=45)
 72 | plt.show()
 73 | 
 74 | # 5. Heatmap - Correlation
 75 | plt.figure(figsize=(8, 6))
 76 | corr = df[cols_to_convert].corr()
 77 | sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", cbar_kws={"shrink": 0.8})
 78 | plt.title("Demographic Correlation", fontsize=16)
 79 | plt.show()
 80 | 
 81 | # 6. Violin Plot - Population by Birthplace (Top 6)
 82 | plt.figure(figsize=(10, 6))
 83 | sns.violinplot(data=df_birth_top, x="Birth place ", y="Total Persons", palette="pastel")
 84 | plt.title("Population Distribution in Top Birthplaces", fontsize=16)
 85 | plt.xlabel("Birthplace")
 86 | plt.ylabel("Total Persons")
 87 | plt.xticks(rotation=45)
 88 | plt.show()
 89 | 
 90 | 
 91 | 
 92 | # 7. Pie Chart - Urban vs Rural Persons
 93 | plt.figure(figsize=(6, 6))
 94 | plt.pie(urban_rural, labels=urban_rural.index, autopct='%1.1f%%',
 95 |         startangle=140, colors=["#82E0AA", "#F5B7B1"])
 96 | plt.title("Urban vs Rural Split", fontsize=16)
 97 | plt.show()
 98 | 
 99 | # 🔚 Combined Grid of 4 Charts (Donut, Bar, Line, Pie)
100 | fig, axs = plt.subplots(2, 2, figsize=(14, 10))
101 | 
102 | # Gender Donut Chart
103 | axs[0, 0].pie(gender_counts, labels=gender_counts.index, startangle=90,
104 |               autopct="%1.1f%%", colors=["#5DADE2", "#F1948A"], wedgeprops=dict(width=0.4))
105 | axs[0, 0].set_title("Gender Distribution")
106 | 
107 | # Urban vs Rural Barplot
108 | sns.barplot(x=urban_rural.index, y=urban_rural.values, palette="Accent", ax=axs[0, 1])
109 | axs[0, 1].set_title("Urban vs Rural Population")
110 | 
111 | # Age Group Line Plot
112 | sns.lineplot(data=age_group_dist, x="Age-group", y="Total Persons", marker="o",
113 |              color="#E67E22", ax=axs[1, 0])
114 | axs[1, 0].set_title("Age Group Distribution")
115 | axs[1, 0].tick_params(axis='x', rotation=45)
116 | 
117 | # Urban vs Rural Pie Chart
118 | axs[1, 1].pie(urban_rural, labels=urban_rural.index, autopct='%1.1f%%',
119 |               startangle=140, colors=["#82E0AA", "#F5B7B1"])
120 | axs[1, 1].set_title("Urban vs Rural Split")
121 | 
122 | fig.suptitle("📊 Combined Demographic Dashboard", fontsize=18)
123 | fig.tight_layout(rect=(0, 0, 1, 0.96))  # leave room at the top so the title is not clipped
124 | plt.show()
125 | 


--------------------------------------------------------------------------------