# Stroke Prediction Based on Risk Factors

This project is a machine-learning-based stroke prediction system built on healthcare data. It was created as part of my Data Science Minor Project at Lovely Professional University.

The model predicts the likelihood of a stroke from features such as age, BMI, average glucose level, hypertension, heart disease, and lifestyle factors.

## What I Did

- Cleaned and preprocessed the dataset (handled missing values, encoded categorical features, scaled numerical features)
- Applied SMOTE (Synthetic Minority Over-sampling Technique) to balance the stroke and no-stroke classes
- Trained two models: Logistic Regression (baseline) and Random Forest (after SMOTE)
- Evaluated the models using accuracy, recall, F1-score, the ROC curve, and the confusion matrix

## Key Takeaways

- SMOTE substantially improved the detection of stroke cases
- Random Forest outperformed Logistic Regression on the balanced data
- Data preprocessing and class balancing are crucial for health-related prediction tasks

## Dataset

[Kaggle Stroke Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset)

## Author

Bhavya
B.Tech CSE, Lovely Professional University

---

This project is for academic learning and research purposes.
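## How SMOTE Balances the Classes (Sketch)

SMOTE is central to the results above, so here is a minimal from-scratch sketch of the idea using only the Python standard library. It is not the code used in this project. The helper names `smote` and `balance`, the toy data, and the defaults `k=5` and `seed=0` are all hypothetical. Each synthetic stroke case is placed at a random point on the line segment between a real minority sample and one of its `k` nearest minority-class neighbours. It assumes every feature is already numeric and scaled, which matches the preprocessing step listed earlier. In practice, a library implementation such as imbalanced-learn's `SMOTE(random_state=...).fit_resample(X_train, y_train)` would normally be used. One general caveat applies: oversampling should be fitted on the training split only, so that synthetic points never leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats), all from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority sample, find its k nearest minority-class neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a random minority sample
        j = rng.choice(neighbours[i])     # pick one of its nearest neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # New point lies on the segment between x and its neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy, made-up data: 6 "no stroke" rows (label 0) and 3 "stroke" rows (label 1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [5.2, 5.1], [4.9, 5.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # → 6 6
```

Because new points are interpolated between existing stroke cases rather than copied, the classifier sees a denser minority region instead of exact duplicates. For one-hot encoded categorical columns, interpolation produces fractional values, which is why imbalanced-learn also offers a `SMOTENC` variant for mixed data.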