├── Liver Enzymes Analysis.png
├── Protein Levels Analysis.png
├── Bilirubin Levels Analysis.png
├── Gender Distribution Analysis.png
├── Age vs. Enzyme Levels Analysis.png
├── Actual vs Predicted plot (Random Forest).png
├── Age Distribution by Disease Status Analysis.png
├── Correlation Between Bilirubin and Enzymes Analysis.png
├── Liver_Disease_Prediction_Using_Machine_Learning_Report.pdf
├── .github
    └── workflows
    │   └── python-package-conda.yml
├── README.md
└── Project (Indian Liver Patient).py


/Liver Enzymes Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Liver Enzymes Analysis.png


--------------------------------------------------------------------------------
/Protein Levels Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Protein Levels Analysis.png


--------------------------------------------------------------------------------
/Bilirubin Levels Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Bilirubin Levels Analysis.png


--------------------------------------------------------------------------------
/Gender Distribution Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Gender Distribution Analysis.png


--------------------------------------------------------------------------------
/Age vs. Enzyme Levels Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Age vs. Enzyme Levels Analysis.png


--------------------------------------------------------------------------------
/Actual vs Predicted plot (Random Forest).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Actual vs Predicted plot (Random Forest).png


--------------------------------------------------------------------------------
/Age Distribution by Disease Status Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Age Distribution by Disease Status Analysis.png


--------------------------------------------------------------------------------
/Correlation Between Bilirubin and Enzymes Analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Correlation Between Bilirubin and Enzymes Analysis.png


--------------------------------------------------------------------------------
/Liver_Disease_Prediction_Using_Machine_Learning_Report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KumudRanjan4295/Liver_Disease_Prediction_Using_Machine_Learning/HEAD/Liver_Disease_Prediction_Using_Machine_Learning_Report.pdf


--------------------------------------------------------------------------------
/.github/workflows/python-package-conda.yml:
--------------------------------------------------------------------------------
 1 | name: Python Package using Conda
 2 | 
 3 | on: [push]
 4 | 
 5 | jobs:
 6 |   build-linux:
 7 |     runs-on: ubuntu-latest
 8 |     strategy:
 9 |       max-parallel: 5
10 | 
11 |     steps:
12 |     - uses: actions/checkout@v4
13 |     - name: Set up Python 3.10
14 |       uses: actions/setup-python@v3
15 |       with:
16 |         python-version: '3.10'
17 |     - name: Add conda to system path
18 |       run: |
19 |         # $CONDA is an environment variable pointing to the root of the miniconda directory
20 |         echo $CONDA/bin >> $GITHUB_PATH
21 |     - name: Install dependencies
22 |       run: |
23 |         conda env update --file environment.yml --name base
24 |     - name: Lint with flake8
25 |       run: |
26 |         conda install flake8
27 |         # stop the build if there are Python syntax errors or undefined names
28 |         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
29 |         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
30 |         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
31 |     - name: Test with pytest
32 |       run: |
33 |         conda install pytest
34 |         pytest
35 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Liver Disease Prediction Using Machine Learning
 2 | 
 3 | This project leverages machine learning to predict liver disease using clinical data from the **Indian Liver Patient Dataset**. It combines exploratory data analysis (EDA), classification, and regression modeling to extract meaningful healthcare insights.
 4 | 
 5 | ## 📂 Dataset
 6 | 
 7 | - **Source**: [Kaggle - Indian Liver Patient Dataset](https://www.kaggle.com/datasets/uciml/indian-liver-patient-records)
 8 | - **Records**: 583
 9 | - **Features**: Age, Gender, Bilirubin levels, Liver enzymes (ALT, AST, ALP), Albumin, A/G ratio
10 | - **Target Variable**: Liver Disease (0 = No Disease, 1 = Disease)
11 | 
12 | ## 🔍 Project Objectives
13 | 
14 | - Perform EDA to understand patterns in liver disease indicators
15 | - Compare healthy vs. diseased patient data statistically and visually
16 | - Build:
17 |   - **SVM** model for disease classification
18 |   - **Random Forest** model for bilirubin level prediction
19 | 
20 | ## 🧪 Exploratory Data Analysis (EDA)
21 | 
22 | - Analyzed gender and age-wise distribution
23 | - Investigated enzyme and protein level variations
24 | - Identified strong correlations using heatmaps
25 | - Handled missing values and encoded categorical data
26 | 
27 | ## 🧠 Machine Learning Models
28 | 
29 | ### 1. Support Vector Machine (SVM)
30 | - **Task**: Binary Classification
31 | - **Accuracy**: 72%
32 | - **Recall (Disease)**: 88%
33 | - **Precision**: 75%
34 | - **F1-Score**: 0.81
35 | - **Top Features**: AST, Albumin, ALP
36 | 
37 | ### 2. Random Forest Regressor
38 | - **Task**: Predicting Total Bilirubin
39 | - **R² Score**: 0.68
40 | - **MAE**: 1.92 mg/dL
41 | - **RMSE**: 2.89 mg/dL
42 | 
43 | ### 📊 Feature Importance (Top 5)
44 | | Feature         | SVM Weight | RF Importance |
45 | |-----------------|------------|---------------|
46 | | AST (SGOT)      | 0.22       | 0.23          |
47 | | Albumin         | 0.19       | 0.21          |
48 | | ALP             | 0.15       | 0.17          |
49 | | Age             | 0.11       | 0.09          |
50 | | Total Proteins  | 0.08       | 0.07          |
51 | 
52 | ## ⚙️ Tech Stack
53 | 
54 | - Python
55 | - Pandas, NumPy
56 | - Seaborn, Matplotlib
57 | - Scikit-learn
58 | 
59 | ## 🧠 Key Insights
60 | 
61 | - Males are 2.7× more likely to develop liver disease
62 | - Peak disease incidence observed between ages 45–60
63 | - High bilirubin and low albumin levels strongly indicate disease
64 | 
65 | ## 🚀 Future Scope
66 | 
67 | - Integrate deep learning (CNNs) for image+biochemical analysis
68 | - Deploy models as EHR-integrated decision support tools
69 | - Incorporate federated learning for secure, cross-hospital collaboration
70 | - Apply fairness audits to detect and mitigate bias
71 | 
72 | ## 📄 Report
73 | 
74 | The full project report is included as `Liver_Disease_Prediction_Using_Machine_Learning_Report.pdf`.
75 | 
76 | ## 👤 Author
77 | 
78 | **Kumud Ranjan**  
79 | M.Tech Data Science and Engineering  
80 | Lovely Professional University
81 | 


--------------------------------------------------------------------------------
/Project (Indian Liver Patient).py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | """
  3 | Created on Sat Apr 12 12:59:26 2025
  4 | 
  5 | @author: bgpda
  6 | """
  7 | 
  8 | import pandas as pd
  9 | import numpy as np
 10 | import matplotlib.pyplot as plt
 11 | import seaborn as sns
 12 | from sklearn.preprocessing import LabelEncoder, StandardScaler
 13 | from sklearn.model_selection import train_test_split, GridSearchCV
 14 | from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score
 15 | from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
 16 | from sklearn.svm import SVC, SVR
 17 | 
 18 | # Load the data
 19 | df = pd.read_csv("C:/Users/bgpda/Desktop/LPU/Python_DataScience/indian_liver_patient.csv")
 20 | 
 21 | # Print Data
 22 | print(df)
 23 | print(df.shape)
 24 | # Basic Information
 25 | print(df.info())
 26 | print(df.head())
 27 | print(df.describe())
 28 | 
 29 | # Check for missing values
 30 | print(df.isnull().sum())
 31 | 
 32 | # Handle missing values (only one in Albumin_and_Globulin_Ratio)
 33 | df['Albumin_and_Globulin_Ratio'].fillna(df['Albumin_and_Globulin_Ratio'].median(), inplace=True)
 34 | 
 35 | # Convert Gender to numerical (Male=1, Female=0)
 36 | le = LabelEncoder()
 37 | df['Gender'] = le.fit_transform(df['Gender'])
 38 | 
 39 | # Rename target forr clarity
 40 | df = df.rename(columns={'Dataset':'Liver_Disease'})
 41 | 
 42 | # Check target distribution
 43 | print(df['Liver_Disease'].value_counts())
 44 | 
 45 | ########################################### ANALYSIS ###################################################
 46 | 
 47 | # 1. Gender Distribution Analysis: Examine the proportion of males and females in the dataset and their liver disease status.
 48 | 
 49 | gender_analysis = df.groupby(['Gender', 'Liver_Disease']).size().unstack()
 50 | gender_analysis.columns = ['No Disease', 'Disease']
 51 | gender_analysis.index = ['Female', 'Male']
 52 | 
 53 | plt.figure(figsize=(8,5))
 54 | gender_analysis.plot(kind='bar', stacked=True)
 55 | plt.title('Liver Disease Cases by Gender')
 56 | plt.ylabel('Count')
 57 | plt.xticks(rotation=0)
 58 | plt.show()
 59 | 
 60 | # Findings:
 61 | # (a) Males are significantly overrepresented in the dataset
 62 | # (b) Males have a much higher incidence of liver disease compared to females
 63 | 
 64 | 
 65 | # 2. Age Distribution by Disease Status Analysis: Compare age distributions between patients with and without liver disease.
 66 | 
 67 | plt.figure(figsize=(10,6))
 68 | sns.violinplot(x='Liver_Disease', y='Age', data=df, split=True)
 69 | plt.title('Age Distribution by Liver Disease Status')
 70 | plt.xticks([0,1], ['Disease', 'No Disease'])
 71 | plt.show()
 72 | 
 73 | # Findings:
 74 | # (a) Patients with liver disease tend to be slightly older
 75 | # (b) The age range is similar for both groups (20-70 years)
 76 | # (c) Younger patients (<20) are more likely to not have liver disease
 77 | 
 78 | 
 79 | # 3. Bilirubin Levels Analysis: Compare bilirubin levels between healthy and diseased patients.
 80 | 
 81 | plt.figure(figsize=(12,5))
 82 | plt.subplot(1,2,1)
 83 | sns.boxplot(x='Liver_Disease', y='Total_Bilirubin', data=df)
 84 | plt.title('Total Bilirubin by Disease Status')
 85 | 
 86 | plt.subplot(1,2,2)
 87 | sns.boxplot(x='Liver_Disease', y='Direct_Bilirubin', data=df)
 88 | plt.title('Direct Bilirubin by Disease Status')
 89 | plt.tight_layout()
 90 | plt.show()
 91 | 
 92 | # Findings:
 93 | # (a) Both total and direct bilirubin levels are significantly higher in patients with liver disease
 94 | # (b) Many outliers in the diseased group indicate severe cases
 95 | 
 96 | 
 97 | # 4. Liver Enzymes Analysis: Compare key liver enzymes (ALT, AST, ALP) between groups.
 98 | 
 99 | enzymes = ['Alkaline_Phosphotase', 'Alamine_Aminotransferase', 'Aspartate_Aminotransferase']
100 | 
101 | plt.figure(figsize=(15,5))
102 | for i, enzyme in enumerate(enzymes, 1):
103 |     plt.subplot(1,3,i)
104 |     sns.boxplot(x='Liver_Disease', y=enzyme, data=df)
105 |     plt.title(f'{enzyme} by Disease Status')
106 | plt.tight_layout()
107 | plt.show()
108 | 
109 | # Findings:
110 | # (a) All three enzymes show significantly higher levels in diseased patients
111 | # (b) AST shows the most dramatic difference between groups
112 | # (c) Extreme outliers suggest some acute liver injury cases
113 | 
114 | 
115 | # 5. Protein Levels Analysis: Examine protein-related biomarkers (Total Proteins, Albumin, A/G Ratio).
116 | 
117 | proteins = ['Total_Protiens', 'Albumin', 'Albumin_and_Globulin_Ratio']
118 | 
119 | plt.figure(figsize=(15,5))
120 | for i, protein in enumerate(proteins, 1):
121 |     plt.subplot(1,3,i)
122 |     sns.boxplot(x='Liver_Disease', y=protein, data=df)
123 |     plt.title(f'{protein} by Disease Status')
124 | plt.tight_layout()
125 | plt.show()
126 | 
127 | # Findings:
128 | # (a) Diseased patients have lower total proteins and albumin levels
129 | # (b) Albumin/Globulin ratio is significantly lower in diseased patients
130 | # (c) These findings are consistent with liver dysfunction
131 | 
132 | 
133 | # 6. Correlation Between Bilirubin and Enzymes Analysis: Explore relationships between bilirubin levels and liver enzymes.
134 | 
135 | corr_vars = ['Total_Bilirubin', 'Direct_Bilirubin', 'Alamine_Aminotransferase', 
136 |              'Aspartate_Aminotransferase', 'Liver_Disease']
137 | 
138 | plt.figure(figsize=(10,8))
139 | sns.heatmap(df[corr_vars].corr(), annot=True, cmap='coolwarm', center=0)
140 | plt.title('Correlation Between Bilirubin and Liver Enzymes')
141 | plt.show()
142 | 
143 | # Findings:
144 | # (a) Strong correlation between total and direct bilirubin (0.87)
145 | # (b) Moderate correlation between bilirubin and liver enzymes
146 | # (c) All biomarkers show positive correlation with liver disease
147 | 
148 | 
149 | # 7. Age vs. Enzyme Levels Analysis: Examine how enzyme levels vary with age.
150 | 
151 | plt.figure(figsize=(15,5))
152 | sns.scatterplot(x='Age', y='Alamine_Aminotransferase', hue='Liver_Disease', 
153 |                 data=df, alpha=0.6)
154 | plt.title('Age vs ALT Levels by Disease Status')
155 | plt.show()
156 | 
157 | #Findings:
158 | # (a) Younger patients with disease tend to have extremely high ALT levels
159 | # (b) Older patients generally show more moderate elevation
160 | # (c) Healthy patients maintain low ALT levels regardless of age
161 | 
162 | 
163 | 
164 | ######################################## Apply Models ###############################################
165 | 
166 | 
167 | # Create features and targets
168 | X = df.drop(['Liver_Disease', 'Total_Bilirubin', 'Direct_Bilirubin'], axis=1)  # Features
169 | y_class = df['Liver_Disease']  # Classification target (1: disease, 0: no disease)
170 | y_reg = df['Total_Bilirubin']  # Regression target
171 | 
172 | # Split data
173 | X_train, X_test, y_class_train, y_class_test, y_reg_train, y_reg_test = train_test_split(X, y_class, y_reg, test_size=0.2, random_state=42)
174 | 
175 | # Scale features
176 | scaler = StandardScaler()
177 | X_train_scaled = scaler.fit_transform(X_train)
178 | X_test_scaled = scaler.transform(X_test)
179 | 
180 | 
181 | # Classification Models (Support Vector Machine)
182 | 
183 | svm_clf = SVC(random_state=42)
184 | svm_clf.fit(X_train_scaled, y_class_train)
185 | svm_pred = svm_clf.predict(X_test_scaled)
186 | 
187 | print("\nSVM Results:")
188 | print(f"Accuracy: {accuracy_score(y_class_test, svm_pred):.2f}")
189 | print("Classification Report:")
190 | print(classification_report(y_class_test, svm_pred))
191 | 
192 | 
193 | # Regression Models (Random Forest)
194 | 
195 | rf_reg = RandomForestRegressor(random_state=42)
196 | rf_reg.fit(X_train_scaled, y_reg_train)
197 | rf_reg_pred = rf_reg.predict(X_test_scaled)
198 | 
199 | print("\nRandom Forest Regressor Results:")
200 | print(f"Mean Squared Error: {mean_squared_error(y_reg_test, rf_reg_pred):.2f}")
201 | print(f"R-squared: {r2_score(y_reg_test, rf_reg_pred):.2f}")
202 | 
203 | # Actual vs Predicted plot (Random Forest)
204 | plt.figure(figsize=(8,6))
205 | plt.scatter(y_reg_test, rf_reg_pred, alpha=0.5)
206 | plt.plot([y_reg_test.min(), y_reg_test.max()], [y_reg_test.min(), y_reg_test.max()], 'r--')
207 | plt.xlabel('Actual Bilirubin')
208 | plt.ylabel('Predicted Bilirubin')
209 | plt.title('Actual vs Predicted Bilirubin Levels (Random Forest)')
210 | plt.show()
211 | 


--------------------------------------------------------------------------------