├── REPORT.pdf
├── README.md
└── IGNDPS_dataanalysis.py


/REPORT.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ankan00V/Samavesh-Visualizing-Social-Inclusion-in-India/HEAD/REPORT.pdf


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
📊 IGNDPS Data Analysis and Visualization

A Data-Driven Evaluation of the Indira Gandhi National Disability Pension Scheme (IGNDPS)

📘 Overview

This project presents an exploratory and statistical analysis of the Indira Gandhi National Disability Pension Scheme (IGNDPS) dataset, sourced from the Government of India’s official open data platform, data.gov.in. The goal is to evaluate the scheme’s effectiveness by uncovering patterns, identifying inclusion gaps, and generating insights to inform policy decisions.

Using data science methods and visual analytics, the study assesses the geographical distribution, digital penetration (Aadhaar and mobile linkage), demographic outreach, and temporal progress of IGNDPS beneficiaries across India. The dataset comprises 14,000+ real-time records from various states and districts, providing a granular view of the scheme’s on-ground implementation.

🗃 Dataset Description

Source: data.gov.in
Total Records: 14,000+
Attributes Include:
Administrative: state_name, district_name
Beneficiary Metrics: total_beneficiaries, total_aadhar, total_mobileno
Caste Composition: total_sc, total_st, total_obc, total_gen
Time Series Indicator: lastUpdated

🎯 Analytical Objectives

Geographical Coverage Mapping
Assess state-wise and district-wise distribution to identify areas with significant or limited IGNDPS penetration.
Inclusion Index Development
Design and compute a digital inclusion metric using Aadhaar and mobile linkage data to benchmark technological outreach.
Statistical Correlation Analysis
Evaluate dependencies between beneficiary count and digital identification variables using correlation coefficients.
Temporal Trend Monitoring
Track scheme enrollment progression over time using the lastUpdated timestamp field for time-series analysis.
Demographic Impact Modeling
Apply regression analysis to determine how caste-based segments affect scheme participation across regions.

🧪 Methodology and Technologies

Language: Python 3.10+
Libraries & Tools:
Data Analysis: pandas, numpy
Visualization: matplotlib, seaborn, plotly.express
Modeling: scikit-learn (Linear Regression, StandardScaler)
Temporal Analysis: datetime
Feature Engineering:
Inclusion Score = (Aadhaar + Mobile) / (2 × Total Beneficiaries), capped at 1.0
Year-Month Index from timestamp for trend analysis
Proportional caste-based demographic ratios

📈 Key Findings

States like Jharkhand, Tamil Nadu, and Andaman & Nicobar Islands showed complete digital inclusion (100% Aadhaar + mobile linkage).
Districts such as North 24 Parganas and Kaushambi lead in both absolute and inclusive beneficiary coverage.
A strong positive Pearson correlation (~0.83) exists between Aadhaar linkage and overall beneficiary count.
Time series plots revealed cyclical enrollment spikes, possibly aligning with awareness campaigns or policy mandates.
Demographic modeling highlighted OBC and SC populations as significantly represented in multiple high-performing regions.
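
The inclusion score and the Aadhaar correlation reported above can be reproduced on a small scale. The sketch below uses a hypothetical four-row table (the district labels and numbers are made up for illustration, not taken from the dataset) and the column names used in IGNDPS_dataanalysis.py. Treating zero-beneficiary rows as missing is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; values are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "district_name": ["A", "B", "C", "D"],
    "total_beneficiaries": [100, 200, 0, 50],
    "total_aadhar": [100, 150, 0, 50],
    "total_mobileno": [80, 50, 0, 60],
})


def inclusion_score(frame: pd.DataFrame) -> pd.Series:
    """Average of Aadhaar and mobile coverage, capped at 1.0."""
    # Assumption: rows with zero beneficiaries are treated as missing
    # rather than dividing by zero.
    denom = 2 * frame["total_beneficiaries"].replace(0, np.nan)
    score = (frame["total_aadhar"] + frame["total_mobileno"]) / denom
    return score.clip(upper=1.0)


toy["inclusion_score"] = inclusion_score(toy)

# Pearson correlation between Aadhaar linkage and beneficiary count.
pearson_r = toy["total_aadhar"].corr(toy["total_beneficiaries"])

print(toy[["district_name", "inclusion_score"]])
print(f"Pearson r: {pearson_r:.3f}")
```

District D has more linked mobile numbers than beneficiaries, so its raw ratio exceeds 1 and the cap applies. District C has no beneficiaries and gets a missing score instead of a misleading 1.0.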

📊 Visual Assets

State/District Bar Plots – For top contributors and lagging regions
Heatmaps – To visualize correlation between features
Scatter Plots with Color Encodings – For multi-variable exploration
Interactive Dashboards (Plotly) – For advanced data exploration
Line Graphs & Area Charts – To capture temporal trends

🧠 Strategic Insights

Digital Identity Integration plays a central role in accessibility and coverage.
Demographic segmentation analysis supports the effectiveness of affirmative action under welfare schemes.
Disparities across states/districts highlight areas requiring targeted outreach and administrative support.

📁 Repository Structure

📦 IGNDPS-Data-Analysis/
├── data/
│   └── realtime_data.csv
├── notebooks/
│   └── igndps_analysis.ipynb
├── visualizations/
│   └── *.png, *.html
├── report/
│   └── Samavesh_IGNDPS_Report.pdf
└── README.md

🚀 Future Roadmap

Integrate geospatial analysis using GeoPandas and state shapefiles.
Develop a real-time Streamlit dashboard for live monitoring and reporting.
Expand the framework to cover other welfare schemes for comparative analytics.
Incorporate machine learning clustering for identifying latent beneficiary patterns.

🧾 Citation & Acknowledgements

Dataset Source: Government of India Open Data Platform – https://data.gov.in
This project is part of a broader initiative to apply data for social good, especially in public policy evaluation and digital inclusion.
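
🧩 Example: Year-Month Index

The Year-Month index listed under Feature Engineering can be sketched as follows. The sample lastUpdated values are hypothetical and only mimic the day-month-year format that IGNDPS_dataanalysis.py parses. The variable names `raw` and `monthly_counts` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical lastUpdated values in day-month-year format (not real records).
raw = pd.DataFrame({
    "lastUpdated": ["05-01-2024", "19-01-2024", "02-02-2024", "not a date"],
})

# Unparseable entries become NaT instead of raising an error.
raw["lastUpdated"] = pd.to_datetime(
    raw["lastUpdated"], format="%d-%m-%Y", errors="coerce"
)

# Year-month index: every timestamp collapses to its calendar month.
raw["year_month"] = raw["lastUpdated"].dt.to_period("M")

# Records per month; rows whose date could not be parsed are dropped by groupby.
monthly_counts = raw.groupby("year_month").size()
print(monthly_counts)
```

Counting rows per month measures how many records were last updated in that month, which is a proxy for reporting activity rather than a direct enrollment count.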

--------------------------------------------------------------------------------
/IGNDPS_dataanalysis.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
import plotly.express as px

# Set aesthetics for plots
sns.set_theme(style="whitegrid", palette="muted")
plt.rcParams.update({'figure.autolayout': True})

# Load the dataset
file_path = "/Users/ankanghosh/Desktop/realtime_data.csv"
df = pd.read_csv(file_path)

# -------------------------
# 1. DATA CLEANING & VISUALIZATION
# -------------------------

# Check for missing values
print("Missing values per column:\n", df.isnull().sum())

# Convert date (unparseable values become NaT)
df['lastUpdated'] = pd.to_datetime(df['lastUpdated'], format='%d-%m-%Y', errors='coerce')

# Drop duplicates
df = df.drop_duplicates()

# Total beneficiaries by state
state_beneficiaries = (
    df.groupby('state_name')['total_beneficiaries']
    .sum()
    .reset_index()
    .sort_values(by='total_beneficiaries', ascending=False)
)

# Bar plot (updated hue to fix warning)
plt.figure(figsize=(14, 6))
sns.barplot(data=state_beneficiaries, x='state_name', y='total_beneficiaries', hue='state_name', palette='Spectral', dodge=False)
plt.xticks(rotation=90)
plt.title("Total Beneficiaries by State")
plt.xlabel("State")
plt.ylabel("Total Beneficiaries")
plt.legend([], [], frameon=False)
plt.tight_layout()
plt.show()

# -------------------------
# 2. EDA & STATISTICAL ANALYSIS
# -------------------------

# Correlation Matrix Heatmap
plt.figure(figsize=(10, 7))
correlation = df[['total_beneficiaries', 'total_aadhar', 'total_mobileno', 'total_sc', 'total_st', 'total_gen', 'total_obc']].corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.show()

# Scatterplot - Aadhaar vs Beneficiaries (clean)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='total_aadhar', y='total_beneficiaries', hue='state_name', palette='tab20', alpha=0.7)
plt.title("Aadhaar Linked vs Total Beneficiaries")
plt.xlabel("Aadhaar Linked")
plt.ylabel("Total Beneficiaries")
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), fontsize='small')
plt.tight_layout()
plt.show()

# Correlation between aadhar and beneficiaries
cor_val = df['total_aadhar'].corr(df['total_beneficiaries'])
print(f"\nCorrelation between Total Aadhar and Total Beneficiaries: {cor_val}")

# -------------------------
# 3. CREATIVITY & INNOVATION
# -------------------------

# Objective 1: Inclusion Score = avg of mobile + aadhar coverage
# Treat zero beneficiaries as missing so the ratio becomes NaN rather than inf
# (inf would otherwise be clipped to a misleading score of 1.0).
beneficiaries = df['total_beneficiaries'].replace(0, np.nan)
df['inclusion_score'] = (df['total_aadhar'] + df['total_mobileno']) / (2 * beneficiaries)
df['inclusion_score'] = df['inclusion_score'].clip(upper=1.0)

# Objective 2: Top 5 inclusive districts
top_districts = df.groupby('district_name')['inclusion_score'].mean().reset_index().sort_values(by='inclusion_score', ascending=False).head(5)
print("\nTop 5 Districts by Inclusion Score:\n", top_districts)

# Objective 3: Top 5 inclusive states
top_states = df.groupby('state_name')['inclusion_score'].mean().reset_index().sort_values(by='inclusion_score', ascending=False).head(5)
print("\nTop 5 States by Inclusion Score:\n", top_states)

# Plot top 5 states with fixed hue
plt.figure(figsize=(10, 5))
sns.barplot(data=top_states, x='state_name', y='inclusion_score', hue='state_name', palette='coolwarm', dodge=False)
plt.title("Top 5 States by Inclusion Score")
plt.ylabel("Inclusion Score")
plt.xlabel("State")
plt.legend([], [], frameon=False)
plt.tight_layout()
plt.show()

# Objective 4: Trend in scheme reporting over time
monthly_report = df.copy()
monthly_report['month'] = monthly_report['lastUpdated'].dt.to_period('M')
monthly_trend = monthly_report.groupby('month').size().reset_index(name='report_count')
# Convert Period values to timestamps so they plot on a regular date axis
monthly_trend['month'] = monthly_trend['month'].dt.to_timestamp()

plt.figure(figsize=(10, 5))
sns.lineplot(data=monthly_trend, x='month', y='report_count', marker='o', color='purple')
plt.title("Monthly Scheme Data Reporting Trend")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Objective 5: District-wise beneficiary analysis using interactive Plotly chart
district_plot = df.groupby('district_name')['total_beneficiaries'].sum().reset_index().sort_values(by='total_beneficiaries', ascending=False).head(20)

fig = px.bar(district_plot, x='district_name', y='total_beneficiaries',
             color='total_beneficiaries', title="Top 20 Districts by Total Beneficiaries",
             color_continuous_scale='Viridis')
fig.update_layout(xaxis_tickangle=-45)
fig.show()

# -------------------------
# Optional: Save visuals as PDF/HTML
# -------------------------

# Export static plots
plt.figure(figsize=(12, 6))
sns.boxplot(data=df[['total_beneficiaries', 'total_aadhar', 'total_mobileno']], palette='Set2')
plt.title("Boxplot for Key Metrics")
plt.tight_layout()
plt.savefig("/Users/ankanghosh/Desktop/key_metrics_boxplot.pdf")
plt.show()

--------------------------------------------------------------------------------