├── REPORT.pdf
├── README.md
└── IGNDPS_dataanalysis.py


/REPORT.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ankan00V/Samavesh-Visualizing-Social-Inclusion-in-India/HEAD/REPORT.pdf


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | 📊 IGNDPS Data Analysis and Visualization
 2 | 
 3 | A Data-Driven Evaluation of the Indira Gandhi National Disability Pension Scheme (IGNDPS)
 4 | 
 5 | 📘 Overview
 6 | 
 7 | This project presents an in-depth exploratory and statistical analysis of the Indira Gandhi National Disability Pension Scheme (IGNDPS) dataset, sourced from the Government of India’s official open data platform, data.gov.in. The goal of the project is to evaluate the scheme’s effectiveness by uncovering hidden patterns, identifying inclusion gaps, and generating insights to inform policy-level decisions.
 8 | 
 9 | The study assesses the geographical distribution, digital penetration (Aadhaar and mobile linkage), demographic outreach, and temporal progress of IGNDPS beneficiaries across India. The dataset comprises 14,000+ real-time records from various states and districts, providing a granular view of the scheme’s on-ground implementation.
10 | 
11 | 🗃 Dataset Description
12 | 
13 | Source: data.gov.in
14 | Total Records: 14,000+
15 | Attributes Include:
16 | Administrative: state_name, district_name
17 | Beneficiary Metrics: total_beneficiaries, total_aadhar, total_mobileno
18 | Caste Composition: total_sc, total_st, total_obc, total_gen
19 | Time Series Indicator: lastUpdated
20 | 
21 | 🎯 Analytical Objectives
22 | 
23 | Geographical Coverage Mapping
24 | Assess state-wise and district-wise distribution to identify areas with significant or limited IGNDPS penetration.
25 | Inclusion Index Development
26 | Design and compute a digital inclusion metric using Aadhaar and mobile linkage data to benchmark technological outreach.
27 | Statistical Correlation Analysis
28 | Evaluate dependencies between beneficiary count and digital identification variables using correlation coefficients.
29 | Temporal Trend Monitoring
30 | Track scheme enrollment progression over time using the lastUpdated timestamp field for time-series analysis.
31 | Demographic Impact Modeling
32 | Apply regression analysis to determine how caste-based segments affect scheme participation across regions.
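
As a rough illustration of the demographic regression objective, the sketch below fits a linear model on standardized caste-segment counts using scikit-learn's `StandardScaler` and `LinearRegression`. The numbers are hypothetical toy values, not taken from the IGNDPS dataset, and the target is simply the row sum, so this shows the mechanics only and not the project's actual model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical district-level counts (columns: SC, ST, OBC, General).
# These values are made up for illustration only.
X = np.array([
    [20, 10, 40, 30],
    [10, 5, 20, 15],
    [35, 2, 60, 25],
    [5, 25, 10, 12],
    [15, 8, 33, 20],
    [28, 12, 45, 40],
], dtype=float)

# Toy target: total beneficiaries, assumed here to be the sum of the segments.
y = X.sum(axis=1)

# Standardize features so coefficients are comparable across segments.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

for name, coef in zip(["SC", "ST", "OBC", "General"], model.coef_):
    print(f"{name}: {coef:.2f}")
print(f"R^2: {model.score(X_scaled, y):.3f}")
```

On standardized inputs, a larger coefficient means that a one-standard-deviation change in that segment moves the predicted total more.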
33 | 
34 | 🧪 Methodology and Technologies
35 | 
36 | Language: Python 3.10+
37 | Libraries & Tools:
38 | Data Analysis: pandas, numpy
39 | Visualization: matplotlib, seaborn, plotly.express
40 | Modeling: scikit-learn (Linear Regression, StandardScaler)
41 | Temporal Analysis: datetime
42 | Feature Engineering:
43 | Inclusion Score = (Aadhaar + Mobile) / (2 × Total Beneficiaries), capped at 1.0
44 | Year-Month Index from timestamp for trend analysis
45 | Proportional caste-based demographic ratios
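
The three engineered features listed here can be sketched in pandas as follows. The inclusion score follows the definition used in IGNDPS_dataanalysis.py (the mean of Aadhaar and mobile coverage, capped at 1.0). The sample rows and the `*_ratio` and `year_month` column names are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; input column names follow IGNDPS_dataanalysis.py.
df = pd.DataFrame({
    "total_beneficiaries": [100, 50, 0],
    "total_aadhar": [90, 50, 0],
    "total_mobileno": [70, 60, 0],
    "total_sc": [20, 10, 0],
    "total_st": [10, 5, 0],
    "total_obc": [40, 20, 0],
    "total_gen": [30, 15, 0],
    "lastUpdated": ["15-01-2024", "03-02-2024", "not a date"],
})

# Treat zero-beneficiary rows as missing so ratios become NaN instead of inf.
denom = df["total_beneficiaries"].replace(0, np.nan)

# Inclusion score: mean of Aadhaar and mobile coverage, capped at 1.0.
df["inclusion_score"] = (
    (df["total_aadhar"] + df["total_mobileno"]) / (2 * denom)
).clip(upper=1.0)

# Year-month index from the timestamp; unparseable dates become NaT.
df["lastUpdated"] = pd.to_datetime(
    df["lastUpdated"], format="%d-%m-%Y", errors="coerce"
)
df["year_month"] = df["lastUpdated"].dt.to_period("M")

# Proportional caste ratios relative to total beneficiaries.
for col in ["total_sc", "total_st", "total_obc", "total_gen"]:
    df[col.replace("total_", "") + "_ratio"] = df[col] / denom

# Records per month (rows with NaT are dropped by groupby).
monthly_counts = df.groupby("year_month").size()
```

Replacing zero denominators with NaN is a design choice for this sketch: it keeps districts with no beneficiaries from being scored as fully included after clipping.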
46 | 
47 | 📈 Key Findings
48 | 
49 | States like Jharkhand, Tamil Nadu, and Andaman & Nicobar Islands showed complete digital inclusion (100% Aadhaar + mobile linkage).
50 | Districts such as North 24 Parganas and Kaushambi lead in both absolute and inclusive beneficiary coverage.
51 | A strong positive Pearson correlation (~0.83) exists between Aadhaar linkage and overall beneficiary count.
52 | Time series plots revealed cyclical enrollment spikes, possibly aligning with awareness campaigns or policy mandates.
53 | Demographic modeling highlighted OBC and SC populations as significantly represented in multiple high-performing regions.
54 | 
55 | 📊 Visual Assets
56 | 
57 | State/District Bar Plots – For top contributors and lagging regions
58 | Heatmaps – To visualize correlation between features
59 | Scatter Plots with Color Encodings – For multi-variable exploration
60 | Interactive Dashboards (Plotly) – For advanced data exploration
61 | Line Graphs & Area Charts – To capture temporal trends
62 | 
63 | 🧠 Strategic Insights
64 | 
65 | Digital Identity Integration plays a central role in accessibility and coverage.
66 | Demographic segmentation analysis supports the effectiveness of affirmative action under welfare schemes.
67 | Disparities across states/districts highlight areas requiring targeted outreach and administrative support.
68 | 
69 | 📁 Repository Structure
70 | 
71 | 📦 IGNDPS-Data-Analysis/
72 | ├── data/
73 | │   └── realtime_data.csv
74 | ├── notebooks/
75 | │   └── igndps_analysis.ipynb
76 | ├── visualizations/
77 | │   └── *.png, *.html
78 | ├── report/
79 | │   └── Samavesh_IGNDPS_Report.pdf
80 | └── README.md
81 | 
82 | 🚀 Future Roadmap
83 | 
84 | Integrate geospatial analysis using GeoPandas and state shapefiles.
85 | Develop a real-time Streamlit dashboard for live monitoring and reporting.
86 | Expand the framework to cover other welfare schemes for comparative analytics.
87 | Incorporate machine learning clustering for identifying latent beneficiary patterns.
88 | 
89 | 🧾 Citation & Acknowledgements
90 | 
91 | Dataset Source: Government of India Open Data Platform – https://data.gov.in
92 | This project is part of a broader initiative to apply data for social good, especially in public policy evaluation and digital inclusion.
93 | 


--------------------------------------------------------------------------------
/IGNDPS_dataanalysis.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import seaborn as sns
  5 | from sklearn.preprocessing import StandardScaler
  6 | import statsmodels.api as sm
  7 | import plotly.express as px
  8 | 
  9 | # Set aesthetics for plots
 10 | sns.set_theme(style="whitegrid", palette="muted")
 11 | plt.rcParams.update({'figure.autolayout': True})
 12 | 
 13 | # Load the dataset
 14 | file_path = "/Users/ankanghosh/Desktop/realtime_data.csv"
 15 | df = pd.read_csv(file_path)
 16 | 
 17 | # -------------------------
 18 | # 1. DATA CLEANING & VISUALIZATION
 19 | # -------------------------
 20 | 
 21 | # Check for missing values
 22 | print("Missing values per column:\n", df.isnull().sum())
 23 | 
 24 | # Convert date
 25 | df['lastUpdated'] = pd.to_datetime(df['lastUpdated'], format='%d-%m-%Y', errors='coerce')
 26 | 
 27 | # Drop duplicates
 28 | df = df.drop_duplicates()
 29 | 
 30 | # Total beneficiaries by state
 31 | state_beneficiaries = df.groupby('state_name')['total_beneficiaries'].sum().reset_index().sort_values(by='total_beneficiaries', ascending=False)
 32 | 
 33 | # Bar plot (updated hue to fix warning)
 34 | plt.figure(figsize=(14, 6))
 35 | sns.barplot(data=state_beneficiaries, x='state_name', y='total_beneficiaries', hue='state_name', palette='Spectral', dodge=False)
 36 | plt.xticks(rotation=90)
 37 | plt.title("Total Beneficiaries by State")
 38 | plt.xlabel("State")
 39 | plt.ylabel("Total Beneficiaries")
 40 | plt.legend([], [], frameon=False)
 41 | plt.tight_layout()
 42 | plt.show()
 43 | 
 44 | # -------------------------
 45 | # 2. EDA & STATISTICAL ANALYSIS
 46 | # -------------------------
 47 | 
 48 | # Correlation Matrix Heatmap
 49 | plt.figure(figsize=(10, 7))
 50 | correlation = df[['total_beneficiaries', 'total_aadhar', 'total_mobileno', 'total_sc', 'total_st', 'total_gen', 'total_obc']].corr()
 51 | sns.heatmap(correlation, annot=True, cmap='coolwarm', linewidths=0.5)
 52 | plt.title("Correlation Heatmap")
 53 | plt.tight_layout()
 54 | plt.show()
 55 | 
 56 | # Scatterplot - Aadhaar vs Beneficiaries (clean)
 57 | plt.figure(figsize=(10, 6))
 58 | sns.scatterplot(data=df, x='total_aadhar', y='total_beneficiaries', hue='state_name', palette='tab20', alpha=0.7)
 59 | plt.title("Aadhaar Linked vs Total Beneficiaries")
 60 | plt.xlabel("Aadhaar Linked")
 61 | plt.ylabel("Total Beneficiaries")
 62 | plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize='small')
 63 | plt.tight_layout()
 64 | plt.show()
 65 | 
 66 | # Correlation between aadhar and beneficiaries
 67 | cor_val = df['total_aadhar'].corr(df['total_beneficiaries'])
 68 | print(f"\nCorrelation between Total Aadhar and Total Beneficiaries: {cor_val}")
 69 | 
 70 | # -------------------------
 71 | # 3. CREATIVITY & INNOVATION
 72 | # -------------------------
 73 | 
 74 | # Objective 1: Inclusion Score = avg of mobile + aadhar coverage (zero-beneficiary rows become NaN instead of inf)
 75 | df['inclusion_score'] = (df['total_aadhar'] + df['total_mobileno']) / (2 * df['total_beneficiaries'].replace(0, np.nan))
 76 | df['inclusion_score'] = df['inclusion_score'].clip(upper=1.0)
 77 | 
 78 | # Objective 2: Top 5 inclusive districts
 79 | top_districts = df.groupby('district_name')['inclusion_score'].mean().reset_index().sort_values(by='inclusion_score', ascending=False).head(5)
 80 | print("\nTop 5 Districts by Inclusion Score:\n", top_districts)
 81 | 
 82 | # Objective 3: Top 5 inclusive states
 83 | top_states = df.groupby('state_name')['inclusion_score'].mean().reset_index().sort_values(by='inclusion_score', ascending=False).head(5)
 84 | print("\nTop 5 States by Inclusion Score:\n", top_states)
 85 | 
 86 | # Plot top 5 states with fixed hue
 87 | plt.figure(figsize=(10, 5))
 88 | sns.barplot(data=top_states, x='state_name', y='inclusion_score', hue='state_name', palette='coolwarm', dodge=False)
 89 | plt.title("Top 5 States by Inclusion Score")
 90 | plt.ylabel("Inclusion Score")
 91 | plt.xlabel("State")
 92 | plt.legend([], [], frameon=False)
 93 | plt.tight_layout()
 94 | plt.show()
 95 | 
 96 | # Objective 4: Trend in scheme reporting over time
 97 | monthly_report = df.copy()
 98 | monthly_report['month'] = monthly_report['lastUpdated'].dt.to_period('M').dt.to_timestamp()  # month-start timestamps plot more reliably than Period values
 99 | monthly_trend = monthly_report.groupby('month').size().reset_index(name='report_count')
100 | 
101 | plt.figure(figsize=(10, 5))
102 | sns.lineplot(data=monthly_trend, x='month', y='report_count', marker='o', color='purple')
103 | plt.title("Monthly Scheme Data Reporting Trend")
104 | plt.xticks(rotation=45)
105 | plt.tight_layout()
106 | plt.show()
107 | 
108 | # Objective 5: District-wise beneficiary analysis using interactive Plotly chart
109 | district_plot = df.groupby('district_name')['total_beneficiaries'].sum().reset_index().sort_values(by='total_beneficiaries', ascending=False).head(20)
110 | fig = px.bar(district_plot, x='district_name', y='total_beneficiaries',
111 |              color='total_beneficiaries', title="Top 20 Districts by Total Beneficiaries",
112 |              color_continuous_scale='Viridis')
113 | fig.update_layout(xaxis_tickangle=-45)
114 | fig.show()
115 | 
116 | # -------------------------
117 | # Optional: Save visuals as PDF/HTML
118 | # -------------------------
119 | 
120 | # Export static plots
121 | plt.figure(figsize=(12, 6))
122 | sns.boxplot(data=df[['total_beneficiaries', 'total_aadhar', 'total_mobileno']], palette='Set2')
123 | plt.title("Boxplot for Key Metrics")
124 | plt.tight_layout()
125 | plt.savefig("/Users/ankanghosh/Desktop/key_metrics_boxplot.pdf")
126 | plt.show()
127 | 


--------------------------------------------------------------------------------