├── LICENSE
├── README.md
├── data
    └── kindey stone urine analysis.csv
├── machine-learning-tutorial.ipynb
├── model
    ├── optimized_xgb_classifier.pkl
    └── unoptimized_xgb_classifier.pkl
└── requirements.txt

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Ahmed Sameh

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ml-tutorial-for-biologists

## Overview
This repository is designed for beginner to intermediate-level ML/Bioinformatics engineers, students, and enthusiasts. It focuses on creating a simple ensemble classification model to predict whether someone has a kidney stone, using the dataset provided in the repository.

## Understanding the objective
In this tutorial, we use an ensemble classification model, which combines multiple machine learning models to improve prediction accuracy. Ensemble methods leverage the strengths of different models to produce a more robust and reliable prediction. Specifically, this tutorial guides you through building a model to classify whether an individual has a kidney stone based on various features in the dataset.

## Getting Started
To get started, clone the repository and run the tutorial [notebook](machine-learning-tutorial.ipynb) to see the steps and code.

## Prerequisites
- Required Python libraries (listed in requirements.txt)
You can install them using the following command:
```
pip install -r requirements.txt
```

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing
Contributions are welcome! Please feel free to submit a pull request.
--------------------------------------------------------------------------------
/data/kindey stone urine analysis.csv:
--------------------------------------------------------------------------------
gravity,ph,osmo,cond,urea,calc,target
1.021,4.91,725,14,443,2.45,0
1.017,5.74,577,20,296,4.49,0
1.008,7.2,321,14.9,101,2.36,0
1.011,5.51,408,12.6,224,2.15,0
1.005,6.52,187,7.5,91,1.16,0
1.02,5.27,668,25.3,252,3.34,0
1.012,5.62,461,17.4,195,1.4,0
1.029,5.67,1107,35.9,550,8.48,0
1.015,5.41,543,21.9,170,1.16,0
1.021,6.13,779,25.7,382,2.21,0
1.011,6.19,345,11.5,152,1.93,0
1.025,5.53,907,28.4,448,1.27,0
1.006,7.12,242,11.3,64,1.03,0
1.007,5.35,283,9.9,147,1.47,0
1.011,5.21,450,17.9,161,1.53,0
1.018,4.9,684,26.1,284,5.09,0
1.007,6.63,253,8.4,133,1.05,0
1.025,6.81,947,32.6,395,2.03,0
1.008,6.88,395,26.1,95,7.68,0
1.014,6.14,565,23.6,214,1.45,0
1.024,6.3,874,29.9,380,5.16,0
1.019,5.47,760,33.8,199,0.81,0
1.014,7.38,577,30.1,87,1.32,0
1.02,5.96,631,11.2,422,1.55,0
1.023,5.68,749,29,239,1.52,0
1.017,6.76,455,8.8,270,0.77,0
1.017,7.61,527,25.8,75,2.17,0
1.01,6.61,225,9.8,72,0.17,0
1.008,5.87,241,5.1,159,0.83,0
1.02,5.44,781,29,349,3.04,0
1.017,7.92,680,25.3,282,1.06,0
1.019,5.98,579,15.5,297,3.93,0
1.017,6.56,559,15.8,317,5.38,0
1.008,5.94,256,8.1,130,3.53,0
1.023,5.85,970,38,362,4.54,0
1.02,5.66,702,23.6,330,3.98,0
1.008,6.4,341,14.6,125,1.02,0
1.02,6.35,704,24.5,260,3.46,0
1.009,6.37,325,12.2,97,1.19,0
1.018,6.18,694,23.3,311,5.64,0
1.021,5.33,815,26,385,2.66,0
1.009,5.64,386,17.7,104,1.22,0
1.015,6.79,541,20.9,187,2.64,0
1.01,5.97,343,13.4,126,2.31,0
1.02,5.68,876,35.8,308,4.49,0
1.021,5.94,774,27.9,325,6.96,1
1.024,5.77,698,19.5,354,13,1
1.024,5.6,866,29.5,360,5.54,1
1.021,5.53,775,31.2,302,6.19,1
1.024,5.36,853,27.6,364,7.31,1
1.026,5.16,822,26,301,14.34,1
1.013,5.86,531,21.4,197,4.74,1
1.01,6.27,371,11.2,188,2.5,1
1.011,7.01,443,21.4,124,1.27,1
1.022,6.21,442,20.6,398,4.18,1
1.011,6.13,364,10.9,159,3.1,1
1.031,5.73,874,17.4,516,3.01,1
1.02,7.94,567,19.7,212,6.81,1
1.04,6.28,838,14.3,486,8.28,1
1.021,5.56,658,23.6,224,2.33,1
1.025,5.71,854,27,385,7.18,1
1.026,6.19,956,27.6,473,5.67,1
1.034,5.24,1236,27.3,620,12.68,1
1.033,5.58,1032,29.1,430,8.94,1
1.015,5.98,487,14.8,198,3.16,1
1.013,5.58,516,20.8,184,3.3,1
1.014,5.9,456,17.8,164,6.99,1
1.012,6.75,251,5.1,141,0.65,1
1.025,6.9,945,33.6,396,4.18,1
1.026,6.29,833,22.2,457,4.45,1
1.028,4.76,312,12.4,10,0.27,1
1.027,5.4,840,24.5,395,7.64,1
1.018,5.14,703,29,272,6.63,1
1.022,5.09,736,19.8,418,8.53,1
1.025,7.9,721,23.6,301,9.04,1
1.017,4.81,410,13.3,195,0.58,1
1.024,5.4,803,21.8,394,7.82,1
1.016,6.81,594,21.4,255,12.2,1
1.015,6.03,416,12.8,178,9.39,1
--------------------------------------------------------------------------------
/model/optimized_xgb_classifier.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ahhmedsamehh/ml-tutorial-for-biologists/6f54e267b47f0fd61ddde9358aaf324164962aac/model/optimized_xgb_classifier.pkl
--------------------------------------------------------------------------------
/model/unoptimized_xgb_classifier.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ahhmedsamehh/ml-tutorial-for-biologists/6f54e267b47f0fd61ddde9358aaf324164962aac/model/unoptimized_xgb_classifier.pkl
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pandas==1.5.3
seaborn==0.12.2
matplotlib==3.7.1
scikit-learn==1.2.2
xgboost==1.7.6
joblib==1.3.0
--------------------------------------------------------------------------------
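The README describes an ensemble classifier as one that combines several models into a single prediction. The sketch below illustrates that idea with the standard library only, so it runs without the packages in requirements.txt. It is not the notebook's method: the model files suggest the tutorial uses XGBoost (gradient-boosted trees), whereas this is a plain majority vote over three one-feature threshold rules ("stumps"). The helper names `load_rows`, `fit_stump`, and `predict` are hypothetical, the choice of `gravity`, `calc`, and `urea` is arbitrary, and the eight rows are copied from `data/kindey stone urine analysis.csv` purely for illustration.

```python
import csv
import io
from collections import Counter

# Eight rows copied from data/kindey stone urine analysis.csv (not the full dataset).
SAMPLE_CSV = """gravity,ph,osmo,cond,urea,calc,target
1.021,4.91,725,14,443,2.45,0
1.005,6.52,187,7.5,91,1.16,0
1.012,5.62,461,17.4,195,1.4,0
1.007,5.35,283,9.9,147,1.47,0
1.024,5.77,698,19.5,354,13,1
1.026,5.16,822,26,301,14.34,1
1.034,5.24,1236,27.3,620,12.68,1
1.033,5.58,1032,29.1,430,8.94,1
"""


def load_rows(text):
    """Parse CSV text into (feature dict, integer label) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = int(record.pop("target"))
        rows.append(({k: float(v) for k, v in record.items()}, label))
    return rows


def fit_stump(rows, feature):
    """Pick the threshold on one feature with the best training accuracy.

    The stump predicts class 1 when the feature value is >= the threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({x[feature] for x, _ in rows}):
        correct = sum((x[feature] >= candidate) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return feature, best_threshold


def predict(stumps, x):
    """Majority vote over the stumps; a tie falls back to class 0."""
    votes = Counter(int(x[feature] >= threshold) for feature, threshold in stumps)
    return 1 if votes[1] > votes[0] else 0


rows = load_rows(SAMPLE_CSV)
# Hypothetical feature choice: one stump per column, each fitted independently.
stumps = [fit_stump(rows, feature) for feature in ("gravity", "calc", "urea")]
accuracy = sum(predict(stumps, x) == y for x, y in rows) / len(rows)

print(stumps)
print(f"training accuracy on the sample: {accuracy:.2f}")
```

The `urea` stump alone misclassifies the first row (443 is above its threshold, yet the label is 0), but the other two stumps outvote it, which is the effect the README attributes to ensembles. The score is measured on the same eight hand-picked rows used for fitting, so it says nothing about how well a model generalizes to the full dataset.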