├── README.md ├── .gitignore ├── ProjectProposal_Group141_WI24.ipynb ├── oasis_longitudinal.csv └── DataCheckpoint_Group141_WI24.ipynb /README.md: -------------------------------------------------------------------------------- 1 | This is your group repo for your final project for COGS108. 2 | 3 | This repository is private and is only visible to the course instructors and your group mates; it is not visible to anyone else. 4 | 5 | Template notebooks for each component are provided. Work on each notebook only until its due date. After each submission is due, move on to the next notebook (for example, after the proposal is due, start working in the Data Checkpoint notebook). 6 | 7 | This repository will be frozen on the final project due date. No further changes can be made after that time. 8 | 9 | Your project proposal and final project will be graded based solely on the corresponding project notebooks in this repository. 10 | 11 | Template Jupyter notebooks have been included, with your group number replacing the XXX in the following file names. By each due date, make sure this repository contains the corresponding notebook with the following name (where XXX is replaced by your group number): 12 | 13 | - `ProjectProposal_groupXXX.ipynb` 14 | - `DataCheckpoint_groupXXX.ipynb` 15 | - `EDACheckpoint_groupXXX.ipynb` 16 | - `FinalProject_groupXXX.ipynb` 17 | 18 | This is *your* repo. You are free to manage the repo as you see fit, edit this README, add data files, add scripts, etc. As long as the four files above are present by their due dates with the required information, the rest is up to you all. 19 | 20 | Also, you are free and encouraged to share this project after the course and to add it to your portfolio. Just be sure to fork it to your GitHub account at the end of the quarter!
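
As an optional aid (not part of the course template), a short script like the following can list which of the required notebooks are not yet present. This is a minimal sketch assuming Python 3 and that it is run from the repository root; the `missing_notebooks` helper is hypothetical, and it checks only the `groupXXX` naming pattern listed above.

```python
from pathlib import Path

# File name patterns listed in this README; "XXX" stands for the group number.
REQUIRED_PATTERNS = [
    "ProjectProposal_groupXXX.ipynb",
    "DataCheckpoint_groupXXX.ipynb",
    "EDACheckpoint_groupXXX.ipynb",
    "FinalProject_groupXXX.ipynb",
]


def missing_notebooks(group_number, repo_dir="."):
    """Return the required notebook names (in README order) that are not files in repo_dir."""
    repo = Path(repo_dir)
    expected = [pattern.replace("XXX", str(group_number)) for pattern in REQUIRED_PATTERNS]
    return [name for name in expected if not (repo / name).is_file()]


if __name__ == "__main__":
    # Example: check the current directory for group 141's notebooks.
    for name in missing_notebooks(141):
        print("missing:", name)
```

Names must match the pattern exactly, so a notebook saved under a different name (for example, with an extra suffix) is reported as missing.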
21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into 
this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 160 | #.idea/ 161 | 162 | -------------------------------------------------------------------------------- /ProjectProposal_Group141_WI24.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "42a14256-91c8-446e-92a7-ab6bf11055d3", 6 | "metadata": {}, 7 | "source": [ 8 | "# **COGS 108 - Project Proposal**\n", 9 | "\n", 10 | "# **Names**\n", 11 | "\n", 12 | "- Shivangi Gupta\n", 13 | "\n", 14 | "- Joseph Hwang\n", 15 | "\n", 16 | "- Zijun Yang\n", 17 | "\n", 18 | "- Johnny Gonzales\n", 19 | "\n", 20 | "- Tanishq Rathore\n", 21 | "\n", 22 | "# **Research Question**\n", 23 | "\n", 24 | "Using clinical MRI data and personal details of an individual, can a machine learning model predict whether\n", 25 | "the individual will develop Alzheimer's disease? The features the model will be trained on include Mini Mental State Examination (MMSE), visit number, Clinical Dementia Rating (CDR), gender, age, years of education, socioeconomic status, Estimated Total Intracranial Volume (eTIV), Normalized Whole Brain Volume (nWBV), and Atlas Scaling Factor (ASF).\n", 26 | "\n", 27 | "## **Background and Prior Work**\n", 28 | "\n", 29 | "Background\n", 30 | "\n", 31 | "Advancements in healthcare, improvements in living conditions, and\n", 32 | "breakthroughs in medicine have collectively contributed to longer life\n", 33 | "expectancies worldwide; simultaneously, developed countries are also\n", 34 | "experiencing declining fertility\n", 35 | "rates.[1](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/)\n", 36 | "The combination of these two circumstances has caused the\n", 37 | "proportion of older people within populations to increase steadily.
The\n", 38 | "World Health Organization (WHO) reported that “in 2020, the number of\n", 39 | "people aged 60 and older outnumbered children younger than 5\n", 40 | "years”.[2](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health)\n", 41 | "The WHO also states that “between 2015 and 2050, the proportion\n", 42 | "of the world’s population over 60 years will nearly double from 12% to\n", 43 | "22%”. It is therefore reasonable to examine common health\n", 44 | "conditions associated with older age, one of which is Alzheimer’s disease.\n", 45 | "\n", 46 | "So what is Alzheimer’s disease? Alzheimer’s disease is a progressive\n", 47 | "neurodegenerative brain disorder that impairs memory and cognitive\n", 48 | "functions. It is the most common cause of dementia and affects about 6.5\n", 49 | "million people in the United States who are aged 65 and\n", 50 | "older.[3](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 51 | "There is currently no cure for the disease, but medicines may\n", 52 | "improve or slow the progression of\n", 53 | "symptoms.[3](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 54 | "Our project therefore aims to create a model that predicts the onset of\n", 55 | "Alzheimer’s disease from clinical data that include factors indicating\n", 56 | "the risk and progression of the disease.\n", 57 | "\n", 58 | "Several other projects have asked similar questions and\n", 59 | "approached similar problems for other diseases.
For instance, one study\n", 60 | "used machine learning methods to predict the risk of cardiovascular\n", 61 | "disease based on major contributing\n", 62 | "factors.[4](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/)\n", 63 | "Similarly, another paper used machine learning and ranker-based feature\n", 64 | "selection methods to predict eye diseases based on\n", 65 | "symptoms.[5](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/)\n", 66 | "Lastly, a third paper predicted thyroid disease using selective\n", 67 | "features and machine learning\n", 68 | "techniques.[6](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/)\n", 69 | "All three papers appear to have been relatively successful in predicting\n", 70 | "their target disease from distinct features. Evidently, training machine\n", 71 | "learning models on datasets that contain risk factors and indicators\n", 72 | "for a given disease is an established approach rather than a novel one; we\n", 73 | "hope to achieve similar results for Alzheimer’s disease.\n", 74 | "\n", 75 | "In-Depth Study Analysis\n", 76 | "\n", 77 | "Our group analyzed two studies available through the National Institutes of\n", 78 | "Health’s (NIH) PubMed Central database. The first study[7](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/) employed\n", 79 | "machine learning models to predict early-stage Alzheimer's disease using\n", 80 | "Open Access Series of Imaging Studies (OASIS) data, focusing on metrics\n", 81 | "such as precision, recall, accuracy, and F1-score. The authors, with\n", 82 | "backgrounds in technology and health research, aimed to enhance early\n", 83 | "diagnosis, potentially lowering Alzheimer's mortality rates. The study\n", 84 | "demonstrated that machine learning techniques such as decision trees,\n", 85 | "random forests, SVM, gradient boosting, and voting classifiers can\n", 86 | "effectively predict early-stage Alzheimer's disease with an accuracy of\n", 87 | "up to 83%.
This achievement highlights the critical role of data science\n", 88 | "in identifying Alzheimer's disease at an early stage, leveraging feature\n", 89 | "selection and advanced algorithms to enhance diagnostic accuracy. Early\n", 90 | "detection is crucial for timely intervention, potentially mitigating the\n", 91 | "disease's progression and impact on patients and their families (Kavitha\n", 92 | "et al.).\n", 93 | "\n", 94 | "The second study[8](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/), titled\n", 95 | "“Application of machine learning methods for diagnosis of dementia based\n", 96 | "on the 10/66 battery of cognitive function tests in south India”,\n", 97 | "investigated the use of machine learning for diagnosing dementia in\n", 98 | "South India, employing the culturally and educationally fair 10/66\n", 99 | "battery of cognitive function tests designed for use in low and\n", 100 | "middle-income countries. Through the analysis of neuropsychological\n", 101 | "data, demographic information, and normative data, the researchers applied the\n", 102 | "JRip classification algorithm, among others, achieving high diagnostic\n", 103 | "accuracy. This approach demonstrates the potential to streamline the\n", 104 | "diagnostic process, making it quicker and more accessible for clinicians\n", 105 | "and patients in India, thereby addressing the significant healthcare\n", 106 | "challenge of efficiently identifying dementia in community settings\n", 107 | "(Bhagyashree et al.).\n", 108 | "\n", 109 | "In-Depth Analysis of Similar Projects\n", 110 | "\n", 111 | "Our group also examined Kaggle projects that use the same dataset\n", 112 | "we have chosen, covering exploratory data analysis (EDA) and\n", 113 | "prediction models built with scikit-learn and TensorFlow.
The first\n", 114 | "project[9](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri)\n", 115 | "starts with an introduction to Alzheimer's disease and\n", 116 | "the problem statement of estimating the Clinical Dementia Rating (CDR)\n", 117 | "using MRI dataset features. It progresses through data loading and\n", 118 | "preprocessing, including null value handling and normalization, and\n", 119 | "employs machine learning techniques, including model\n", 120 | "training with hyperparameter tuning for XGBClassifier and\n", 121 | "GradientBoostingClassifier. The notebook concludes with predictions and\n", 122 | "a performance evaluation presented through confusion matrix and classification\n", 123 | "report visualizations; the model reached a final accuracy\n", 124 | "of \\~80%.\n", 125 | "\n", 126 | "The second project[10](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf) tried to predict the Clinical Rating of Alzheimer's disease (CRA) by\n", 127 | "integrating data loading, visualization, and extensive machine learning,\n", 128 | "including the use of TensorFlow for neural network models. It explored\n", 129 | "various machine learning models, with a special emphasis on model\n", 130 | "training and evaluation, culminating in the finding that the\n", 131 | "DecisionTreeClassifier performed the best among the models tested. The\n", 132 | "conclusion stressed the need for more data to enhance the precision of\n", 133 | "Alzheimer's disease predictions, highlighting the challenge of data\n", 134 | "scarcity in achieving accurate diagnostic models.\n", 135 | "\n", 136 | "**References**\n", 137 | "\n", 138 | "1. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/) Nargund G. (2009) Declining birth rate in Developed Countries: A\n", 139 | " radical policy re-think is required.
*Facts, views & vision in\n", 140 | " ObGyn, 1(3), 191–193.*\n", 141 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/)\n", 142 | "\n", 143 | "2. [^](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health) World Health Organization. (1 Oct 2022) Ageing and health. *World\n", 144 | " Health Organization*.\n", 145 | " [https://www.who.int/news-room/fact-sheets/detail/ageing-and-health](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health)\n", 146 | "\n", 147 | "3. [^](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447) Mayo Foundation for Medical Education and Research. (30\n", 148 | " August 2023) Alzheimer’s disease. *Mayo Clinic*.\n", 149 | " [https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 150 | "\n", 151 | "4. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/) Peng, M., Hou, F., Cheng, Z., Shen, T., Liu, K., Zhao, C., &\n", 152 | " Zheng, W. (23 Mar 2023) Prediction of cardiovascular disease risk\n", 153 | " based on major contributing features. *Scientific reports, 13(1),\n", 154 | " 4778*.\n", 155 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/)\n", 156 | "\n", 157 | "5. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/) Marouf, A. A., Mottalib, M. M., Alhajj, R., Rokne, J., &\n", 158 | " Jafarullah, O. (24 Dec 2022) An Efficient Approach to Predict Eye\n", 159 | " Diseases from Symptoms Using Machine Learning and Ranker-Based\n", 160 | " Feature Selection Methods. *Bioengineering (Basel, Switzerland),\n", 161 | " 10(1), 25*.\n", 162 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/)\n", 163 | "\n", 164 | "6. 
[^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/) Chaganti, R., Rustam, F., De La Torre Díez, I., Mazón, J. L. V.,\n", 165 | " Rodríguez, C. L., & Ashraf, I. (13 Aug 2022). Thyroid Disease\n", 166 | " Prediction Using Selective Features and Machine Learning\n", 167 | " Techniques. *Cancers, 14(16), 3914*.\n", 168 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/)\n", 169 | "\n", 170 | "7. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/) Kavitha C, Mani V, Srividhya SR, Khalaf OI, Tavera Romero CA.\n", 171 | " Early-Stage Alzheimer's Disease Prediction Using Machine Learning\n", 172 | " Models. Front Public Health. 2022 Mar 3;10:853294. doi:\n", 173 | " 10.3389/fpubh.2022.853294. PMID: 35309200; PMCID: PMC8927715.\n", 174 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/)\n", 175 | "\n", 176 | "8. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/) Bhagyashree SIR, Nagaraj K, Prince M, Fall CHD, Krishna M.\n", 177 | " Diagnosis of Dementia by Machine learning methods in\n", 178 | " Epidemiological studies: a pilot exploratory study from south\n", 179 | " India. Soc Psychiatry Psychiatr Epidemiol. 2018 Jan;53(1):77-86.\n", 180 | " doi: 10.1007/s00127-017-1410-0. Epub 2017 Jul 11. PMID: 28698926;\n", 181 | " PMCID: PMC6138240.\n", 182 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/)\n", 183 | "\n", 184 | "9. [^](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri) Reddy, Shreyas. 2021. Alzheimer's analysis using MRI, February\n", 185 | " 8, 2024.\n", 186 | " [https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri)\n", 187 | "\n", 188 | "10. [^](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf) Andrew. 2017. 
Predict alzheimer disease sl and tf, February 8, 2024.\n", 189 | " [https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf)\n", 190 | "\n", 191 | "# **Hypothesis**\n", 192 | "\n", 193 | "Our project's hypothesis is the following: \"It is possible to predict the onset of Alzheimer's disease based on the combination of (1) clinical data* and (2) personal features such as gender, age, years of education, and socioeconomic status.\" We believe that we can successfully train a model to predict the onset of Alzheimer's disease because, as mentioned in the Background section of the proposal, there have been numerous successful machine learning models trained on clinical data to predict disease. The clinical data provide variables that capture both the cognitive and structural changes associated with the disease's progression. Incorporating personal features such as gender and age in the prediction model is motivated by research indicating that these factors can influence the risk and progression rate of Alzheimer's disease. Together, we believe that these features will provide enough information to train a model that predicts the onset of Alzheimer's disease.\n", 194 | "\n", 195 | "*Clinical data include Mini Mental State Examination (MMSE), visit number, Clinical Dementia Rating (CDR), Estimated Total Intracranial Volume (eTIV), Normalized Whole Brain Volume (nWBV), and the Atlas Scaling Factor (ASF).\n", 196 | "\n", 197 | "# **Data**\n", 198 | "\n", 199 | "1. The ideal dataset to train our machine learning model would contain clinical and personal variables that capture the progression and risk level of Alzheimer's disease. Such data would need to come from reputable medical research centers or from large-scale longitudinal studies focusing on aging and dementia.
Within these studies, MRI scans should be acquired using standardized protocols for consistency across the dataset.\n", 200 | "\n", 201 | " In addition, a large sample size would be ideal, although preliminary searches suggest that clinical datasets for diseases often contain only around 500 samples. We also note that incorporating many variables will make it difficult to determine which factor mainly drives the model's prediction of disease onset for an individual; however, our goal is simply to create a model that can predict onset from a set of commonly available clinical and personal variables. Determining which factor most influences the model's decision may be another interesting avenue to explore.\n", 202 | "\n", 203 | " The clinical variables we want to examine are the following:\n", 204 | " - Mini Mental State Examination (MMSE): A measure of cognitive impairment.\n", 205 | " - Clinical Dementia Rating (CDR): A numerical scale used to quantify the severity of dementia symptoms.\n", 206 | " - Estimated Total Intracranial Volume (eTIV)\n", 207 | " - Normalized Whole Brain Volume (nWBV)\n", 208 | " - Atlas Scaling Factor (ASF): Together with eTIV and nWBV, a measure of brain volume and structure obtained from MRI scans; these measures can indicate brain atrophy associated with Alzheimer's disease.\n", 209 | " - Visit Number\n", 210 | "\n", 211 | " We also want to include the following personal features, which are commonly associated with Alzheimer's risk:\n", 212 | " - Gender\n", 213 | " - Age\n", 214 | " - Years of Education\n", 215 | " - Socioeconomic Status\n", 216 | " - Indicators of cognitive reserve\n", 217 | "\n", 218 | "2. While searching for potential datasets, we found the Open Access Series of Imaging Studies (OASIS), which provides MRI data sets related to Alzheimer's disease.
We were able to download the dataset and found that it includes the clinical and personal variables we wanted to incorporate. OASIS provides these data as CSV files that we can download and use independently. There is also the UK Biobank, a large-scale database with 500,000 participants that includes patient information and brain imaging data, but it does not contain all of the indicators listed above. Lastly, we found data from the National Alzheimer's Coordinating Center (NACC), which collects and stores data from Alzheimer's research centers across the US. These data appear reliable but are not in a standardized format, so they would require additional cleaning.\n", 219 | "\n", 220 | "# **Ethics & Privacy**\n", 221 | "\n", 222 | "Ethics & Privacy Considerations:\n", 223 | "\n", 224 | "Biases/Privacy/Terms of Use Issues with Proposed Data:\n", 225 | "\n", 226 | "1. The potential datasets considered, such as OASIS MRI, UK Biobank,\n", 227 | " and NACC, may have biases and privacy considerations. For\n", 228 | " instance, the OASIS MRI project's terms of use and participant\n", 229 | " selection could introduce biases. UK Biobank, despite its size,\n", 230 | " might not include a diverse representation of certain populations,\n", 231 | " leading to potential biases in the dataset.\n", 232 | "\n", 233 | "Potential Biases in Dataset Composition and Collection:\n", 234 | "\n", 235 | "2. Biases may arise in dataset composition and collection, affecting\n", 236 | " the equitable analysis of Alzheimer's prediction.
For example, if\n", 237 | " the data predominantly includes participants from specific\n", 238 | " demographic groups, it could introduce biases in the model.\n", 239 | " Additionally, variations in data collection methods across\n", 240 | " different research centers, as in the case of NACC, may impact\n", 241 | " standardization, potentially leading to biases.\n", 242 | "\n", 243 | "Detection and Mitigation of Biases:\n", 244 | "\n", 245 | "3. To detect biases, the group will conduct a thorough review of the\n", 246 | " dataset sources, including participant demographics and\n", 247 | " recruitment methods. During data preprocessing, the team will\n", 248 | " analyze variables for potential biases, ensuring a balanced\n", 249 | " representation. The group plans to collaborate with experts in the\n", 250 | " field and seek external input to validate the fairness and\n", 251 | " inclusivity of the dataset.\n", 252 | "\n", 253 | "Other Issues Related to Privacy and Equitable Impact:\n", 254 | "\n", 255 | "4. Privacy concerns arise from the sensitive nature of medical data,\n", 256 | " especially in Alzheimer's research. Ensuring participant anonymity\n", 257 | " and adhering to privacy regulations are paramount. Equitable\n", 258 | " impact considerations involve understanding if the model's\n", 259 | " predictions could disproportionately affect certain groups. It is\n", 260 | " essential to communicate findings responsibly, avoiding\n", 261 | " reinforcing existing biases or stigmatizing specific populations.\n", 262 | "\n", 263 | "Handling Identified Issues:\n", 264 | "\n", 265 | "5. The group commits to transparently communicating any identified\n", 266 | " biases throughout the research process. Mitigation strategies will\n", 267 | " be implemented during data preprocessing and model development.\n", 268 | " The team will consider alternative datasets or additional sampling\n", 269 | " methods if biases persist. 
Ethical review boards will be\n", 270 | " consulted, and the group aims to publish findings with a clear\n", 271 | " acknowledgment of potential limitations and biases, promoting\n", 272 | " responsible and equitable use of the predictive model.\n", 273 | "\n", 274 | "In summary, the group is dedicated to addressing ethical concerns\n", 275 | "comprehensively, from data collection to analysis and post-analysis.\n", 276 | "Transparency, collaboration with experts, and continuous evaluation of\n", 277 | "potential biases will guide the research, ensuring responsible and\n", 278 | "ethical development of the Alzheimer's prediction model.\n", 279 | "\n", 280 | "# **Team Expectations**\n", 281 | "\n", 282 | "1. Clear communication and relatively reasonable responsiveness to messages sent in our Discord group chat.\n", 283 | "2. Shared Responsibility and Accountability: finish all tasked work by the designated due date.\n", 284 | "3. Maintain quality work and attention to detail; ask questions when necessary.\n", 285 | "4. Attendance and Participation: during designated days in which we meet (either through Zoom or in-person), everyone should be present unless notified previously in advance. 
\n", 286 | "\n", 287 | "\n", 288 | "# **Project Timeline Proposal**\n", 289 | "\n", 290 | "| **Meeting Date** | **Meeting Time** | **Completed Before Meeting** | **Discuss at Meeting** |\n", 291 | "|-----------|-----------|-----------------------------|-----------------------|\n", 292 | "| 2/10 | 1 PM | Edit, finalize, and submit proposal; Search for datasets (everyone) | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n", 293 | "| 2/16 | 6 PM | Import & Wrangle Data (Johnny); EDA (Shivangi) | Review/Edit wrangling/EDA; Discuss Analysis Plan |\n", 294 | "| 2/23 | 12 PM | Finalize wrangling/EDA; Begin Analysis (Joseph; Zijun) | Discuss/edit Analysis; Complete project check-in |\n", 295 | "| 3/16 | 12 PM | Complete analysis; Draft results/conclusion/discussion (Tanishq); everyone will make/record our presentation | Discuss/edit full project; Record our video |\n", 296 | "| 3/20 | Before 11:59 PM | Finalize/proofread project conclusion and submit report and video (everyone) | Turn in Final Project & Group Project Surveys |" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "id": "ba0dd509", 302 | "metadata": {}, 303 | "source": [] 304 | } 305 | ], 306 | "metadata": { 307 | "language_info": { 308 | "name": "python" 309 | } 310 | }, 311 | "nbformat": 4, 312 | "nbformat_minor": 5 313 | } 314 | -------------------------------------------------------------------------------- /oasis_longitudinal.csv: -------------------------------------------------------------------------------- 1 | Subject ID,MRI ID,Group,Visit,MR Delay,M/F,Hand,Age,EDUC,SES,MMSE,CDR,eTIV,nWBV,ASF 2 | OAS2_0001,OAS2_0001_MR1,Nondemented,1,0,M,R,87,14,2,27,0,1987,0.696,0.883 3 | OAS2_0001,OAS2_0001_MR2,Nondemented,2,457,M,R,88,14,2,30,0,2004,0.681,0.876 4 | OAS2_0002,OAS2_0002_MR1,Demented,1,0,M,R,75,12,,23,0.5,1678,0.736,1.046 5 | OAS2_0002,OAS2_0002_MR2,Demented,2,560,M,R,76,12,,28,0.5,1738,0.713,1.010 6 | 
OAS2_0002,OAS2_0002_MR3,Demented,3,1895,M,R,80,12,,22,0.5,1698,0.701,1.034 7 | OAS2_0004,OAS2_0004_MR1,Nondemented,1,0,F,R,88,18,3,28,0,1215,0.710,1.444 8 | OAS2_0004,OAS2_0004_MR2,Nondemented,2,538,F,R,90,18,3,27,0,1200,0.718,1.462 9 | OAS2_0005,OAS2_0005_MR1,Nondemented,1,0,M,R,80,12,4,28,0,1689,0.712,1.039 10 | OAS2_0005,OAS2_0005_MR2,Nondemented,2,1010,M,R,83,12,4,29,0.5,1701,0.711,1.032 11 | OAS2_0005,OAS2_0005_MR3,Nondemented,3,1603,M,R,85,12,4,30,0,1699,0.705,1.033 12 | OAS2_0007,OAS2_0007_MR1,Demented,1,0,M,R,71,16,,28,0.5,1357,0.748,1.293 13 | OAS2_0007,OAS2_0007_MR3,Demented,3,518,M,R,73,16,,27,1,1365,0.727,1.286 14 | OAS2_0007,OAS2_0007_MR4,Demented,4,1281,M,R,75,16,,27,1,1372,0.710,1.279 15 | OAS2_0008,OAS2_0008_MR1,Nondemented,1,0,F,R,93,14,2,30,0,1272,0.698,1.380 16 | OAS2_0008,OAS2_0008_MR2,Nondemented,2,742,F,R,95,14,2,29,0,1257,0.703,1.396 17 | OAS2_0009,OAS2_0009_MR1,Demented,1,0,M,R,68,12,2,27,0.5,1457,0.806,1.205 18 | OAS2_0009,OAS2_0009_MR2,Demented,2,576,M,R,69,12,2,24,0.5,1480,0.791,1.186 19 | OAS2_0010,OAS2_0010_MR1,Demented,1,0,F,R,66,12,3,30,0.5,1447,0.769,1.213 20 | OAS2_0010,OAS2_0010_MR2,Demented,2,854,F,R,68,12,3,29,0.5,1482,0.752,1.184 21 | OAS2_0012,OAS2_0012_MR1,Nondemented,1,0,F,R,78,16,2,29,0,1333,0.748,1.316 22 | OAS2_0012,OAS2_0012_MR2,Nondemented,2,730,F,R,80,16,2,29,0,1323,0.738,1.326 23 | OAS2_0012,OAS2_0012_MR3,Nondemented,3,1598,F,R,83,16,2,29,0,1323,0.718,1.327 24 | OAS2_0013,OAS2_0013_MR1,Nondemented,1,0,F,R,81,12,4,30,0,1230,0.715,1.427 25 | OAS2_0013,OAS2_0013_MR2,Nondemented,2,643,F,R,82,12,4,30,0,1212,0.720,1.448 26 | OAS2_0013,OAS2_0013_MR3,Nondemented,3,1456,F,R,85,12,4,29,0,1225,0.710,1.433 27 | OAS2_0014,OAS2_0014_MR1,Demented,1,0,M,R,76,16,3,21,0.5,1602,0.697,1.096 28 | OAS2_0014,OAS2_0014_MR2,Demented,2,504,M,R,77,16,3,16,1,1590,0.696,1.104 29 | OAS2_0016,OAS2_0016_MR1,Demented,1,0,M,R,88,8,4,25,0.5,1651,0.660,1.063 30 | OAS2_0016,OAS2_0016_MR2,Demented,2,707,M,R,90,8,4,23,0.5,1668,0.646,1.052 31 | 
OAS2_0017,OAS2_0017_MR1,Nondemented,1,0,M,R,80,12,3,29,0,1783,0.752,0.985 32 | OAS2_0017,OAS2_0017_MR3,Nondemented,3,617,M,R,81,12,3,27,0.5,1814,0.759,0.968 33 | OAS2_0017,OAS2_0017_MR4,Nondemented,4,1861,M,R,85,12,3,30,0,1820,0.755,0.964 34 | OAS2_0017,OAS2_0017_MR5,Nondemented,5,2400,M,R,86,12,3,27,0,1813,0.761,0.968 35 | OAS2_0018,OAS2_0018_MR1,Converted,1,0,F,R,87,14,1,30,0,1406,0.715,1.248 36 | OAS2_0018,OAS2_0018_MR3,Converted,3,489,F,R,88,14,1,29,0,1398,0.713,1.255 37 | OAS2_0018,OAS2_0018_MR4,Converted,4,1933,F,R,92,14,1,27,0.5,1423,0.696,1.234 38 | OAS2_0020,OAS2_0020_MR1,Converted,1,0,M,R,80,20,1,29,0,1587,0.693,1.106 39 | OAS2_0020,OAS2_0020_MR2,Converted,2,756,M,R,82,20,1,28,0.5,1606,0.677,1.093 40 | OAS2_0020,OAS2_0020_MR3,Converted,3,1563,M,R,84,20,1,26,0.5,1597,0.666,1.099 41 | OAS2_0021,OAS2_0021_MR1,Demented,1,0,M,R,72,20,1,26,0.5,1911,0.719,0.919 42 | OAS2_0021,OAS2_0021_MR2,Demented,2,1164,M,R,76,20,1,25,0.5,1926,0.736,0.911 43 | OAS2_0022,OAS2_0022_MR1,Nondemented,1,0,F,R,61,16,3,30,0,1313,0.805,1.337 44 | OAS2_0022,OAS2_0022_MR2,Nondemented,2,828,F,R,64,16,3,29,0,1316,0.796,1.333 45 | OAS2_0023,OAS2_0023_MR1,Demented,1,0,F,R,86,12,4,21,0.5,1247,0.662,1.407 46 | OAS2_0023,OAS2_0023_MR2,Demented,2,578,F,R,87,12,4,21,0.5,1250,0.652,1.405 47 | OAS2_0026,OAS2_0026_MR1,Demented,1,0,M,R,82,12,3,27,0.5,1420,0.713,1.236 48 | OAS2_0026,OAS2_0026_MR2,Demented,2,673,M,R,84,12,3,27,0.5,1445,0.695,1.214 49 | OAS2_0027,OAS2_0027_MR1,Nondemented,1,0,F,R,69,12,3,29,0,1365,0.783,1.286 50 | OAS2_0027,OAS2_0027_MR2,Nondemented,2,609,F,R,71,12,3,30,0,1360,0.782,1.291 51 | OAS2_0027,OAS2_0027_MR3,Nondemented,3,1234,F,R,73,12,3,30,0,1358,0.775,1.293 52 | OAS2_0027,OAS2_0027_MR4,Nondemented,4,1779,F,R,74,12,3,30,0,1353,0.772,1.297 53 | OAS2_0028,OAS2_0028_MR1,Demented,1,0,M,R,64,18,2,22,0.5,1547,0.737,1.134 54 | OAS2_0028,OAS2_0028_MR2,Demented,2,610,M,R,66,18,2,21,1,1562,0.717,1.124 55 | OAS2_0029,OAS2_0029_MR1,Nondemented,1,0,F,R,77,12,4,29,0,1377,0.734,1.275 56 | 
OAS2_0029,OAS2_0029_MR2,Nondemented,2,1099,F,R,80,12,4,30,0,1390,0.735,1.263 57 | OAS2_0030,OAS2_0030_MR1,Nondemented,1,0,F,R,60,18,1,30,0,1402,0.822,1.252 58 | OAS2_0030,OAS2_0030_MR2,Nondemented,2,932,F,R,62,18,1,30,0,1392,0.817,1.261 59 | OAS2_0031,OAS2_0031_MR1,Converted,1,0,F,R,86,12,3,30,0,1430,0.718,1.227 60 | OAS2_0031,OAS2_0031_MR2,Converted,2,446,F,R,88,12,3,30,0,1445,0.719,1.215 61 | OAS2_0031,OAS2_0031_MR3,Converted,3,1588,F,R,91,12,3,28,0.5,1463,0.696,1.199 62 | OAS2_0032,OAS2_0032_MR1,Demented,1,0,M,R,90,12,3,21,0.5,1307,0.679,1.342 63 | OAS2_0032,OAS2_0032_MR2,Demented,2,642,M,R,92,12,3,24,0.5,1311,0.676,1.339 64 | OAS2_0034,OAS2_0034_MR1,Nondemented,1,0,F,R,79,16,1,29,0,1466,0.703,1.197 65 | OAS2_0034,OAS2_0034_MR2,Nondemented,2,489,F,R,80,16,1,30,0,1450,0.698,1.210 66 | OAS2_0034,OAS2_0034_MR3,Nondemented,3,1287,F,R,82,16,1,30,0,1460,0.695,1.202 67 | OAS2_0034,OAS2_0034_MR4,Nondemented,4,1884,F,R,84,16,1,30,0,1453,0.684,1.208 68 | OAS2_0035,OAS2_0035_MR1,Nondemented,1,0,F,R,88,12,4,30,0,1336,0.738,1.313 69 | OAS2_0035,OAS2_0035_MR2,Nondemented,2,405,F,R,89,12,4,27,0,1329,0.733,1.320 70 | OAS2_0036,OAS2_0036_MR1,Nondemented,1,0,F,R,69,13,4,30,0,1359,0.789,1.291 71 | OAS2_0036,OAS2_0036_MR3,Nondemented,3,713,F,R,70,13,4,30,0,1361,0.783,1.290 72 | OAS2_0036,OAS2_0036_MR4,Nondemented,4,1770,F,R,73,13,4,30,0,1360,0.773,1.291 73 | OAS2_0036,OAS2_0036_MR5,Nondemented,5,2369,F,R,75,13,4,29,0,1349,0.778,1.301 74 | OAS2_0037,OAS2_0037_MR1,Demented,1,0,M,R,82,12,4,27,0.5,1477,0.729,1.188 75 | OAS2_0037,OAS2_0037_MR2,Demented,2,1123,M,R,85,12,4,29,0.5,1487,0.717,1.180 76 | OAS2_0037,OAS2_0037_MR3,Demented,3,2029,M,R,88,12,4,26,0.5,1483,0.709,1.184 77 | OAS2_0037,OAS2_0037_MR4,Demented,4,2508,M,R,89,12,4,26,0.5,1485,0.706,1.181 78 | OAS2_0039,OAS2_0039_MR1,Demented,1,0,F,R,81,18,2,26,0.5,1174,0.742,1.495 79 | OAS2_0039,OAS2_0039_MR2,Demented,2,486,F,R,83,18,2,25,0.5,1179,0.733,1.488 80 | OAS2_0040,OAS2_0040_MR1,Demented,1,0,M,R,84,6,4,25,0.5,1310,0.727,1.339 81 
| OAS2_0040,OAS2_0040_MR2,Demented,2,567,M,R,86,6,4,27,0.5,1320,0.724,1.329 82 | OAS2_0040,OAS2_0040_MR3,Demented,3,1204,M,R,88,6,4,23,0.5,1348,0.713,1.302 83 | OAS2_0041,OAS2_0041_MR1,Converted,1,0,F,R,71,16,1,27,0,1289,0.771,1.362 84 | OAS2_0041,OAS2_0041_MR2,Converted,2,756,F,R,73,16,1,28,0,1295,0.768,1.356 85 | OAS2_0041,OAS2_0041_MR3,Converted,3,1331,F,R,75,16,1,28,0.5,1314,0.760,1.335 86 | OAS2_0042,OAS2_0042_MR1,Nondemented,1,0,F,R,70,17,3,29,0,1640,0.766,1.070 87 | OAS2_0042,OAS2_0042_MR2,Nondemented,2,1008,F,R,73,17,3,29,0,1665,0.748,1.054 88 | OAS2_0043,OAS2_0043_MR1,Demented,1,0,F,R,72,12,4,26,0.5,1453,0.777,1.208 89 | OAS2_0043,OAS2_0043_MR2,Demented,2,491,F,R,73,12,4,26,0.5,1451,0.757,1.210 90 | OAS2_0044,OAS2_0044_MR1,Demented,1,0,M,R,68,14,4,21,1,1333,0.685,1.317 91 | OAS2_0044,OAS2_0044_MR2,Demented,2,352,M,R,69,14,4,15,1,1331,0.678,1.318 92 | OAS2_0044,OAS2_0044_MR3,Demented,3,866,M,R,71,14,4,22,1,1332,0.679,1.317 93 | OAS2_0045,OAS2_0045_MR1,Nondemented,1,0,F,R,75,18,1,30,0,1317,0.737,1.332 94 | OAS2_0045,OAS2_0045_MR2,Nondemented,2,689,F,R,77,18,1,29,0,1322,0.731,1.327 95 | OAS2_0046,OAS2_0046_MR1,Demented,1,0,F,R,83,15,2,20,0.5,1476,0.750,1.189 96 | OAS2_0046,OAS2_0046_MR2,Demented,2,575,F,R,85,15,2,22,1,1483,0.748,1.183 97 | OAS2_0047,OAS2_0047_MR1,Nondemented,1,0,F,R,77,16,2,29,0,1433,0.723,1.225 98 | OAS2_0047,OAS2_0047_MR2,Nondemented,2,486,F,R,78,16,2,27,0,1414,0.727,1.242 99 | OAS2_0048,OAS2_0048_MR1,Demented,1,0,M,R,66,16,1,19,1,1695,0.711,1.036 100 | OAS2_0048,OAS2_0048_MR2,Demented,2,248,M,R,66,16,1,21,1,1708,0.703,1.028 101 | OAS2_0048,OAS2_0048_MR3,Demented,3,647,M,R,68,16,1,19,1,1712,0.691,1.025 102 | OAS2_0048,OAS2_0048_MR4,Demented,4,970,M,R,68,16,1,7,1,1714,0.682,1.024 103 | OAS2_0048,OAS2_0048_MR5,Demented,5,1233,M,R,69,16,1,4,1,1701,0.676,1.032 104 | OAS2_0049,OAS2_0049_MR1,Nondemented,1,0,F,R,69,16,3,30,0,1491,0.794,1.177 105 | OAS2_0049,OAS2_0049_MR2,Nondemented,2,395,F,R,70,16,3,30,0,1505,0.791,1.166 106 | 
OAS2_0049,OAS2_0049_MR3,Nondemented,3,687,F,R,71,16,3,30,0,1503,0.788,1.168 107 | OAS2_0050,OAS2_0050_MR1,Demented,1,0,M,R,71,12,4,20,0.5,1461,0.724,1.202 108 | OAS2_0050,OAS2_0050_MR2,Demented,2,538,M,R,72,12,4,17,1,1483,0.695,1.184 109 | OAS2_0051,OAS2_0051_MR1,Nondemented,1,0,F,R,92,23,1,29,0,1454,0.701,1.207 110 | OAS2_0051,OAS2_0051_MR2,Nondemented,2,457,F,R,94,23,1,29,0,1474,0.696,1.190 111 | OAS2_0051,OAS2_0051_MR3,Nondemented,3,1526,F,R,97,23,1,30,0,1483,0.689,1.184 112 | OAS2_0052,OAS2_0052_MR1,Nondemented,1,0,M,R,74,18,2,29,0,1463,0.737,1.199 113 | OAS2_0052,OAS2_0052_MR2,Nondemented,2,1510,M,R,78,18,2,30,0,1484,0.703,1.183 114 | OAS2_0053,OAS2_0053_MR1,Nondemented,1,0,F,R,82,16,3,29,0,1484,0.760,1.183 115 | OAS2_0053,OAS2_0053_MR2,Nondemented,2,842,F,R,84,16,3,28,0,1500,0.744,1.170 116 | OAS2_0054,OAS2_0054_MR1,Converted,1,0,F,R,85,18,1,29,0,1264,0.701,1.388 117 | OAS2_0054,OAS2_0054_MR2,Converted,2,846,F,R,87,18,1,24,0.5,1275,0.683,1.376 118 | OAS2_0055,OAS2_0055_MR1,Nondemented,1,0,M,R,65,13,3,29,0,1362,0.837,1.289 119 | OAS2_0055,OAS2_0055_MR2,Nondemented,2,726,M,R,67,13,3,27,0,1365,0.827,1.285 120 | OAS2_0056,OAS2_0056_MR1,Nondemented,1,0,F,R,71,14,2,28,0,1461,0.756,1.202 121 | OAS2_0056,OAS2_0056_MR2,Nondemented,2,622,F,R,73,14,2,30,0,1456,0.739,1.205 122 | OAS2_0057,OAS2_0057_MR1,Nondemented,1,0,F,R,81,12,2,30,0,1599,0.755,1.098 123 | OAS2_0057,OAS2_0057_MR2,Nondemented,2,640,F,R,83,12,2,29,0,1569,0.757,1.118 124 | OAS2_0057,OAS2_0057_MR3,Nondemented,3,1340,F,R,85,12,2,30,0,1580,0.739,1.111 125 | OAS2_0058,OAS2_0058_MR1,Demented,1,0,M,R,78,14,3,30,0.5,1315,0.707,1.335 126 | OAS2_0058,OAS2_0058_MR2,Demented,2,212,M,R,79,14,3,26,0.5,1308,0.706,1.341 127 | OAS2_0058,OAS2_0058_MR3,Demented,3,764,M,R,80,14,3,29,0.5,1324,0.695,1.326 128 | OAS2_0060,OAS2_0060_MR1,Demented,1,0,M,R,75,13,4,29,0.5,1416,0.766,1.239 129 | OAS2_0060,OAS2_0060_MR2,Demented,2,1290,M,R,78,13,4,28,0.5,1408,0.757,1.247 130 | 
OAS2_0061,OAS2_0061_MR1,Nondemented,1,0,M,R,68,18,1,30,0,1654,0.747,1.061 131 | OAS2_0061,OAS2_0061_MR2,Nondemented,2,873,M,R,70,18,1,30,0,1660,0.738,1.057 132 | OAS2_0061,OAS2_0061_MR3,Nondemented,3,1651,M,R,72,18,1,30,0,1681,0.729,1.044 133 | OAS2_0062,OAS2_0062_MR1,Nondemented,1,0,F,R,79,18,2,29,0,1641,0.695,1.069 134 | OAS2_0062,OAS2_0062_MR2,Nondemented,2,723,F,R,81,18,2,30,0,1664,0.677,1.055 135 | OAS2_0062,OAS2_0062_MR3,Nondemented,3,1351,F,R,83,18,2,29,0,1667,0.688,1.053 136 | OAS2_0063,OAS2_0063_MR1,Demented,1,0,F,R,80,12,,30,0.5,1430,0.737,1.228 137 | OAS2_0063,OAS2_0063_MR2,Demented,2,490,F,R,81,12,,27,0.5,1453,0.721,1.208 138 | OAS2_0064,OAS2_0064_MR1,Demented,1,0,F,R,78,8,5,23,1,1462,0.691,1.200 139 | OAS2_0064,OAS2_0064_MR2,Demented,2,830,F,R,81,8,5,26,0.5,1459,0.694,1.203 140 | OAS2_0064,OAS2_0064_MR3,Demented,3,1282,F,R,82,8,5,18,0.5,1464,0.682,1.199 141 | OAS2_0066,OAS2_0066_MR1,Demented,1,0,M,R,61,18,1,30,1,1957,0.734,0.897 142 | OAS2_0066,OAS2_0066_MR2,Demented,2,497,M,R,62,18,1,30,0.5,1928,0.731,0.910 143 | OAS2_0067,OAS2_0067_MR1,Nondemented,1,0,M,R,67,12,4,30,0,1440,0.727,1.219 144 | OAS2_0067,OAS2_0067_MR2,Nondemented,2,451,M,R,68,12,4,29,0,1438,0.738,1.220 145 | OAS2_0067,OAS2_0067_MR3,Nondemented,3,1438,M,R,71,12,4,29,0,1455,0.724,1.206 146 | OAS2_0067,OAS2_0067_MR4,Nondemented,4,2163,M,R,73,12,4,28,0,1444,0.722,1.215 147 | OAS2_0068,OAS2_0068_MR1,Nondemented,1,0,F,R,88,12,3,30,0,1428,0.700,1.229 148 | OAS2_0068,OAS2_0068_MR2,Nondemented,2,743,F,R,90,12,3,29,0,1475,0.676,1.190 149 | OAS2_0069,OAS2_0069_MR1,Nondemented,1,0,F,R,81,18,2,29,0,1470,0.687,1.194 150 | OAS2_0069,OAS2_0069_MR2,Nondemented,2,432,F,R,82,18,2,30,0,1471,0.690,1.193 151 | OAS2_0070,OAS2_0070_MR1,Nondemented,1,0,M,R,80,17,1,28,0,1660,0.728,1.057 152 | OAS2_0070,OAS2_0070_MR2,Nondemented,2,672,M,R,82,17,1,29,0,1692,0.723,1.037 153 | OAS2_0070,OAS2_0070_MR3,Nondemented,3,1415,M,R,84,17,1,29,0,1707,0.717,1.028 154 | 
OAS2_0070,OAS2_0070_MR4,Nondemented,4,1870,M,R,85,17,1,30,0,1724,0.704,1.018 155 | OAS2_0070,OAS2_0070_MR5,Nondemented,5,2386,M,R,86,17,1,30,0,1720,0.705,1.020 156 | OAS2_0071,OAS2_0071_MR1,Demented,1,0,F,R,83,13,2,27,1,1391,0.705,1.262 157 | OAS2_0071,OAS2_0071_MR2,Demented,2,365,F,R,84,13,2,28,1,1402,0.695,1.252 158 | OAS2_0073,OAS2_0073_MR1,Nondemented,1,0,F,R,70,14,3,29,0,1524,0.787,1.151 159 | OAS2_0073,OAS2_0073_MR2,Nondemented,2,580,F,R,72,14,3,28,0,1512,0.777,1.161 160 | OAS2_0073,OAS2_0073_MR3,Nondemented,3,1705,F,R,75,14,3,28,0,1507,0.782,1.164 161 | OAS2_0073,OAS2_0073_MR4,Nondemented,4,2288,F,R,76,14,3,29,0,1490,0.774,1.178 162 | OAS2_0073,OAS2_0073_MR5,Nondemented,5,2517,F,R,77,14,3,29,0,1504,0.769,1.167 163 | OAS2_0075,OAS2_0075_MR1,Demented,1,0,F,R,73,8,5,25,0.5,1151,0.743,1.525 164 | OAS2_0075,OAS2_0075_MR2,Demented,2,567,F,R,75,8,5,22,0.5,1143,0.741,1.535 165 | OAS2_0076,OAS2_0076_MR1,Nondemented,1,0,F,R,66,18,2,30,0,1504,0.725,1.167 166 | OAS2_0076,OAS2_0076_MR2,Nondemented,2,956,F,R,69,18,2,29,0,1536,0.719,1.143 167 | OAS2_0076,OAS2_0076_MR3,Nondemented,3,1663,F,R,71,18,2,30,0,1520,0.718,1.155 168 | OAS2_0077,OAS2_0077_MR1,Nondemented,1,0,M,R,69,16,2,28,0,1848,0.737,0.950 169 | OAS2_0077,OAS2_0077_MR2,Nondemented,2,1393,M,R,73,16,2,29,0,1931,0.722,0.909 170 | OAS2_0078,OAS2_0078_MR1,Nondemented,1,0,M,R,89,16,1,28,0,1631,0.674,1.076 171 | OAS2_0078,OAS2_0078_MR2,Nondemented,2,441,M,R,91,16,1,28,0,1640,0.670,1.070 172 | OAS2_0078,OAS2_0078_MR3,Nondemented,3,1019,M,R,92,16,1,30,0,1662,0.682,1.056 173 | OAS2_0079,OAS2_0079_MR1,Demented,1,0,F,R,69,12,4,23,0.5,1447,0.759,1.213 174 | OAS2_0079,OAS2_0079_MR2,Demented,2,584,F,R,71,12,4,16,1,1492,0.725,1.176 175 | OAS2_0079,OAS2_0079_MR3,Demented,3,1435,F,R,73,12,4,16,1,1478,0.696,1.188 176 | OAS2_0080,OAS2_0080_MR1,Demented,1,0,M,R,66,15,2,25,0.5,1548,0.727,1.134 177 | OAS2_0080,OAS2_0080_MR2,Demented,2,580,M,R,68,15,2,30,0.5,1556,0.713,1.128 178 | 
OAS2_0080,OAS2_0080_MR3,Demented,3,1209,M,R,69,15,2,28,0.5,1546,0.724,1.135 179 | OAS2_0081,OAS2_0081_MR1,Demented,1,0,F,R,82,12,4,26,0.5,1271,0.695,1.381 180 | OAS2_0081,OAS2_0081_MR2,Demented,2,659,F,R,84,12,4,26,0.5,1273,0.686,1.378 181 | OAS2_0085,OAS2_0085_MR1,Nondemented,1,0,F,R,78,8,5,29,0,1383,0.756,1.269 182 | OAS2_0085,OAS2_0085_MR2,Nondemented,2,670,F,R,80,8,5,27,0,1381,0.751,1.270 183 | OAS2_0086,OAS2_0086_MR1,Nondemented,1,0,F,R,63,15,2,28,0,1544,0.805,1.136 184 | OAS2_0086,OAS2_0086_MR2,Nondemented,2,802,F,R,65,15,2,28,0,1542,0.792,1.138 185 | OAS2_0087,OAS2_0087_MR1,Demented,1,0,F,R,96,17,1,26,1,1465,0.683,1.198 186 | OAS2_0087,OAS2_0087_MR2,Demented,2,754,F,R,98,17,1,21,2,1503,0.660,1.168 187 | OAS2_0088,OAS2_0088_MR1,Demented,1,0,M,R,78,12,4,21,1,1477,0.672,1.188 188 | OAS2_0088,OAS2_0088_MR2,Demented,2,751,M,R,80,12,4,20,1,1494,0.661,1.175 189 | OAS2_0089,OAS2_0089_MR1,Demented,1,0,M,R,70,12,2,29,0.5,1432,0.692,1.225 190 | OAS2_0089,OAS2_0089_MR3,Demented,3,563,M,R,72,12,2,27,1,1432,0.684,1.226 191 | OAS2_0090,OAS2_0090_MR1,Nondemented,1,0,M,R,73,18,2,29,0,1548,0.773,1.134 192 | OAS2_0090,OAS2_0090_MR2,Nondemented,2,680,M,R,75,18,2,29,0,1534,0.772,1.144 193 | OAS2_0090,OAS2_0090_MR3,Nondemented,3,1345,M,R,76,18,2,30,0,1550,0.758,1.133 194 | OAS2_0091,OAS2_0091_MR1,Nondemented,1,0,M,R,75,12,4,28,0,1511,0.739,1.162 195 | OAS2_0091,OAS2_0091_MR2,Nondemented,2,1047,M,R,78,12,4,29,0,1506,0.715,1.166 196 | OAS2_0092,OAS2_0092_MR1,Converted,1,0,F,R,83,12,2,28,0,1383,0.748,1.269 197 | OAS2_0092,OAS2_0092_MR2,Converted,2,706,F,R,84,12,2,27,0.5,1390,0.728,1.263 198 | OAS2_0094,OAS2_0094_MR1,Nondemented,1,0,F,R,61,16,1,30,0,1513,0.771,1.160 199 | OAS2_0094,OAS2_0094_MR2,Nondemented,2,817,F,R,63,16,1,30,0,1449,0.774,1.212 200 | OAS2_0095,OAS2_0095_MR1,Nondemented,1,0,M,R,71,18,1,30,0,1769,0.699,0.992 201 | OAS2_0095,OAS2_0095_MR2,Nondemented,2,673,M,R,72,18,1,29,0,1785,0.687,0.983 202 | 
OAS2_0095,OAS2_0095_MR3,Nondemented,3,1412,M,R,74,18,1,29,0,1814,0.679,0.967 203 | OAS2_0096,OAS2_0096_MR1,Nondemented,1,0,F,R,89,13,3,29,0,1154,0.750,1.521 204 | OAS2_0096,OAS2_0096_MR2,Nondemented,2,778,F,R,91,13,3,28,0,1165,0.736,1.506 205 | OAS2_0097,OAS2_0097_MR1,Nondemented,1,0,M,R,74,16,2,30,0,1611,0.729,1.089 206 | OAS2_0097,OAS2_0097_MR2,Nondemented,2,1024,M,R,77,16,2,30,0,1628,0.709,1.078 207 | OAS2_0098,OAS2_0098_MR1,Demented,1,0,M,R,66,12,4,30,0.5,1446,0.780,1.214 208 | OAS2_0098,OAS2_0098_MR2,Demented,2,661,M,R,67,12,4,28,0.5,1412,0.783,1.243 209 | OAS2_0099,OAS2_0099_MR1,Demented,1,0,F,R,80,12,,27,0.5,1475,0.762,1.190 210 | OAS2_0099,OAS2_0099_MR2,Demented,2,807,F,R,83,12,,23,0.5,1484,0.750,1.183 211 | OAS2_0100,OAS2_0100_MR1,Nondemented,1,0,F,R,77,11,4,29,0,1583,0.777,1.108 212 | OAS2_0100,OAS2_0100_MR2,Nondemented,2,1218,F,R,80,11,4,30,0,1586,0.757,1.107 213 | OAS2_0100,OAS2_0100_MR3,Nondemented,3,1752,F,R,82,11,4,30,0,1590,0.760,1.104 214 | OAS2_0101,OAS2_0101_MR1,Nondemented,1,0,F,R,71,18,2,30,0,1371,0.769,1.280 215 | OAS2_0101,OAS2_0101_MR2,Nondemented,2,952,F,R,74,18,2,30,0,1400,0.752,1.254 216 | OAS2_0101,OAS2_0101_MR3,Nondemented,3,1631,F,R,76,18,2,30,0,1379,0.757,1.273 217 | OAS2_0102,OAS2_0102_MR1,Demented,1,0,M,R,82,15,3,29,0.5,1499,0.689,1.171 218 | OAS2_0102,OAS2_0102_MR2,Demented,2,610,M,R,84,15,3,29,0.5,1497,0.686,1.172 219 | OAS2_0102,OAS2_0102_MR3,Demented,3,1387,M,R,86,15,3,30,0.5,1498,0.681,1.171 220 | OAS2_0103,OAS2_0103_MR1,Converted,1,0,F,R,69,16,1,30,0,1404,0.750,1.250 221 | OAS2_0103,OAS2_0103_MR2,Converted,2,1554,F,R,74,16,1,30,0.5,1423,0.722,1.233 222 | OAS2_0103,OAS2_0103_MR3,Converted,3,2002,F,R,75,16,1,30,0.5,1419,0.731,1.236 223 | OAS2_0104,OAS2_0104_MR1,Demented,1,0,M,R,70,16,1,25,0.5,1568,0.696,1.119 224 | OAS2_0104,OAS2_0104_MR2,Demented,2,465,M,R,71,16,1,17,1,1562,0.685,1.123 225 | OAS2_0105,OAS2_0105_MR1,Nondemented,1,0,M,R,86,12,4,29,0,1783,0.703,0.984 226 | 
OAS2_0105,OAS2_0105_MR2,Nondemented,2,675,M,R,87,12,4,30,0,1762,0.718,0.996 227 | OAS2_0106,OAS2_0106_MR1,Demented,1,0,F,R,70,11,4,22,1,1445,0.722,1.214 228 | OAS2_0106,OAS2_0106_MR2,Demented,2,729,F,R,72,11,4,21,1,1489,0.686,1.179 229 | OAS2_0108,OAS2_0108_MR1,Demented,1,0,M,R,77,18,1,25,0.5,1604,0.781,1.094 230 | OAS2_0108,OAS2_0108_MR2,Demented,2,883,M,R,79,18,1,27,0.5,1569,0.781,1.118 231 | OAS2_0109,OAS2_0109_MR1,Nondemented,1,0,M,R,81,11,4,28,0,1750,0.670,1.003 232 | OAS2_0109,OAS2_0109_MR2,Nondemented,2,766,M,R,83,11,4,29,0,1744,0.670,1.006 233 | OAS2_0111,OAS2_0111_MR1,Demented,1,0,M,R,62,12,4,17,0.5,1525,0.732,1.151 234 | OAS2_0111,OAS2_0111_MR2,Demented,2,881,M,R,65,12,4,17,0.5,1520,0.699,1.155 235 | OAS2_0112,OAS2_0112_MR1,Demented,1,0,F,R,76,12,3,27,0.5,1315,0.698,1.335 236 | OAS2_0112,OAS2_0112_MR2,Demented,2,558,F,R,78,12,3,20,0.5,1339,0.689,1.311 237 | OAS2_0113,OAS2_0113_MR1,Demented,1,0,F,R,73,13,2,23,0.5,1536,0.725,1.142 238 | OAS2_0113,OAS2_0113_MR2,Demented,2,504,F,R,75,13,2,28,0.5,1520,0.708,1.155 239 | OAS2_0114,OAS2_0114_MR1,Demented,1,0,F,R,76,12,,27,0.5,1316,0.727,1.333 240 | OAS2_0114,OAS2_0114_MR2,Demented,2,570,F,R,78,12,,27,1,1309,0.709,1.341 241 | OAS2_0116,OAS2_0116_MR1,Demented,1,0,F,R,73,12,3,27,0.5,1425,0.769,1.232 242 | OAS2_0116,OAS2_0116_MR2,Demented,2,616,F,R,75,12,3,28,0.5,1407,0.770,1.247 243 | OAS2_0117,OAS2_0117_MR1,Nondemented,1,0,M,R,73,20,2,30,0,1842,0.758,0.953 244 | OAS2_0117,OAS2_0117_MR2,Nondemented,2,576,M,R,74,20,2,30,0,1806,0.759,0.972 245 | OAS2_0117,OAS2_0117_MR3,Nondemented,3,1345,M,R,76,20,2,30,0,1823,0.739,0.963 246 | OAS2_0117,OAS2_0117_MR4,Nondemented,4,1927,M,R,78,20,2,29,0,1826,0.734,0.961 247 | OAS2_0118,OAS2_0118_MR1,Converted,1,0,F,R,67,14,4,30,0,1508,0.794,1.164 248 | OAS2_0118,OAS2_0118_MR2,Converted,2,1422,F,R,71,14,4,26,0.5,1529,0.788,1.147 249 | OAS2_0119,OAS2_0119_MR1,Nondemented,1,0,F,R,81,15,2,28,0,1486,0.754,1.181 250 | 
OAS2_0119,OAS2_0119_MR2,Nondemented,2,733,F,R,83,15,2,29,0,1482,0.751,1.184 251 | OAS2_0119,OAS2_0119_MR3,Nondemented,3,1713,F,R,85,15,2,30,0,1488,0.741,1.180 252 | OAS2_0120,OAS2_0120_MR1,Demented,1,0,F,R,76,14,3,25,1,1409,0.715,1.246 253 | OAS2_0120,OAS2_0120_MR2,Demented,2,595,F,R,78,14,3,15,2,1401,0.700,1.253 254 | OAS2_0121,OAS2_0121_MR1,Nondemented,1,0,F,R,73,11,4,30,0,1475,0.726,1.190 255 | OAS2_0121,OAS2_0121_MR2,Nondemented,2,647,F,R,74,11,4,30,0,1517,0.705,1.157 256 | OAS2_0122,OAS2_0122_MR1,Nondemented,1,0,F,R,86,16,3,30,0,1293,0.747,1.357 257 | OAS2_0122,OAS2_0122_MR2,Nondemented,2,597,F,R,88,16,3,30,0,1295,0.744,1.355 258 | OAS2_0124,OAS2_0124_MR1,Demented,1,0,M,R,70,16,3,29,0.5,1463,0.749,1.200 259 | OAS2_0124,OAS2_0124_MR2,Demented,2,472,M,R,71,16,3,27,0.5,1479,0.750,1.187 260 | OAS2_0126,OAS2_0126_MR1,Nondemented,1,0,F,R,74,12,3,29,0,1344,0.739,1.306 261 | OAS2_0126,OAS2_0126_MR2,Nondemented,2,472,F,R,75,12,3,29,0,1338,0.747,1.312 262 | OAS2_0126,OAS2_0126_MR3,Nondemented,3,1192,F,R,77,12,3,29,0,1344,0.740,1.306 263 | OAS2_0127,OAS2_0127_MR1,Converted,1,0,M,R,79,18,1,29,0,1644,0.729,1.067 264 | OAS2_0127,OAS2_0127_MR2,Converted,2,851,M,R,81,18,1,29,0.5,1654,0.720,1.061 265 | OAS2_0127,OAS2_0127_MR3,Converted,3,1042,M,R,81,18,1,29,0.5,1647,0.717,1.066 266 | OAS2_0127,OAS2_0127_MR4,Converted,4,2153,M,R,84,18,1,29,0.5,1668,0.694,1.052 267 | OAS2_0127,OAS2_0127_MR5,Converted,5,2639,M,R,86,18,1,30,0.5,1670,0.669,1.051 268 | OAS2_0128,OAS2_0128_MR1,Nondemented,1,0,F,R,76,16,1,28,0,1346,0.762,1.304 269 | OAS2_0128,OAS2_0128_MR2,Nondemented,2,1140,F,R,79,16,1,29,0,1354,0.739,1.297 270 | OAS2_0129,OAS2_0129_MR1,Nondemented,1,0,F,R,78,18,1,30,0,1440,0.666,1.219 271 | OAS2_0129,OAS2_0129_MR2,Nondemented,2,737,F,R,80,18,1,30,0,1436,0.663,1.222 272 | OAS2_0129,OAS2_0129_MR3,Nondemented,3,1591,F,R,82,18,1,29,0,1442,0.644,1.217 273 | OAS2_0131,OAS2_0131_MR1,Converted,1,0,F,R,65,12,2,30,0.5,1340,0.754,1.309 274 | 
OAS2_0131,OAS2_0131_MR2,Converted,2,679,F,R,67,12,2,25,0,1331,0.761,1.318 275 | OAS2_0133,OAS2_0133_MR1,Converted,1,0,F,R,78,12,3,29,0,1475,0.731,1.190 276 | OAS2_0133,OAS2_0133_MR3,Converted,3,1006,F,R,81,12,3,28,0.5,1495,0.687,1.174 277 | OAS2_0134,OAS2_0134_MR1,Demented,1,0,F,R,70,11,4,29,0.5,1295,0.748,1.355 278 | OAS2_0134,OAS2_0134_MR2,Demented,2,539,F,R,71,11,4,28,0.5,1284,0.741,1.367 279 | OAS2_0135,OAS2_0135_MR1,Nondemented,1,0,M,R,74,18,2,30,0,1636,0.680,1.073 280 | OAS2_0135,OAS2_0135_MR2,Nondemented,2,1146,M,R,78,18,2,27,0,1645,0.663,1.067 281 | OAS2_0137,OAS2_0137_MR1,Demented,1,0,M,R,74,18,2,28,0.5,1659,0.739,1.058 282 | OAS2_0137,OAS2_0137_MR2,Demented,2,636,M,R,75,18,2,30,0.5,1651,0.737,1.063 283 | OAS2_0138,OAS2_0138_MR1,Nondemented,1,0,F,R,73,16,2,29,0,1123,0.786,1.563 284 | OAS2_0138,OAS2_0138_MR2,Nondemented,2,846,F,R,75,16,2,28,0,1106,0.767,1.587 285 | OAS2_0139,OAS2_0139_MR1,Demented,1,0,F,R,67,16,1,29,0.5,1337,0.766,1.312 286 | OAS2_0139,OAS2_0139_MR2,Demented,2,403,F,R,68,16,1,29,0.5,1344,0.733,1.305 287 | OAS2_0140,OAS2_0140_MR1,Demented,1,0,F,R,76,16,3,26,0.5,1391,0.705,1.262 288 | OAS2_0140,OAS2_0140_MR2,Demented,2,793,F,R,78,16,3,27,0.5,1393,0.690,1.260 289 | OAS2_0140,OAS2_0140_MR3,Demented,3,1655,F,R,81,16,3,25,0.5,1396,0.687,1.257 290 | OAS2_0141,OAS2_0141_MR1,Nondemented,1,0,F,R,65,18,2,30,0,1277,0.812,1.374 291 | OAS2_0141,OAS2_0141_MR2,Nondemented,2,1022,F,R,68,18,2,29,0,1290,0.795,1.361 292 | OAS2_0142,OAS2_0142_MR1,Nondemented,1,0,F,R,69,16,3,29,0,1380,0.819,1.272 293 | OAS2_0142,OAS2_0142_MR2,Nondemented,2,665,F,R,71,16,3,28,0,1390,0.810,1.262 294 | OAS2_0143,OAS2_0143_MR1,Nondemented,1,0,F,R,89,18,2,30,0,1715,0.746,1.023 295 | OAS2_0143,OAS2_0143_MR2,Nondemented,2,561,F,R,91,18,2,30,0,1714,0.741,1.024 296 | OAS2_0143,OAS2_0143_MR3,Nondemented,3,1553,F,R,93,18,2,29,0,1744,0.723,1.006 297 | OAS2_0144,OAS2_0144_MR1,Converted,1,0,M,R,77,16,1,30,0,1704,0.716,1.030 298 | 
OAS2_0144,OAS2_0144_MR2,Converted,2,683,M,R,79,16,1,30,0.5,1722,0.708,1.019 299 | OAS2_0145,OAS2_0145_MR1,Converted,1,0,F,R,68,16,3,30,0,1298,0.799,1.352 300 | OAS2_0145,OAS2_0145_MR2,Converted,2,1707,F,R,73,16,3,29,0.5,1287,0.771,1.364 301 | OAS2_0146,OAS2_0146_MR1,Demented,1,0,F,R,80,15,2,20,1,1732,0.685,1.013 302 | OAS2_0146,OAS2_0146_MR2,Demented,2,525,F,R,82,15,2,20,1,1729,0.698,1.015 303 | OAS2_0147,OAS2_0147_MR1,Nondemented,1,0,F,R,77,13,2,29,0,1351,0.769,1.299 304 | OAS2_0147,OAS2_0147_MR2,Nondemented,2,440,F,R,78,13,2,29,0,1334,0.769,1.316 305 | OAS2_0147,OAS2_0147_MR3,Nondemented,3,1204,F,R,80,13,2,28,0,1337,0.762,1.313 306 | OAS2_0147,OAS2_0147_MR4,Nondemented,4,1806,F,R,82,13,2,30,0,1342,0.747,1.307 307 | OAS2_0149,OAS2_0149_MR1,Nondemented,1,0,F,R,81,13,2,29,0,1345,0.737,1.305 308 | OAS2_0149,OAS2_0149_MR2,Nondemented,2,674,F,R,83,13,2,30,0,1335,0.732,1.314 309 | OAS2_0150,OAS2_0150_MR1,Demented,1,0,F,R,73,12,3,30,0.5,1343,0.720,1.306 310 | OAS2_0150,OAS2_0150_MR2,Demented,2,518,F,R,75,12,3,27,1,1357,0.714,1.293 311 | OAS2_0152,OAS2_0152_MR1,Nondemented,1,0,F,R,66,18,2,29,0,1191,0.785,1.474 312 | OAS2_0152,OAS2_0152_MR2,Nondemented,2,790,F,R,68,18,2,29,0,1194,0.772,1.469 313 | OAS2_0152,OAS2_0152_MR3,Nondemented,3,1329,F,R,69,18,2,29,0,1202,0.770,1.461 314 | OAS2_0154,OAS2_0154_MR1,Nondemented,1,0,F,R,75,18,1,29,0,1436,0.750,1.222 315 | OAS2_0154,OAS2_0154_MR2,Nondemented,2,791,F,R,77,18,1,28,0,1559,0.713,1.125 316 | OAS2_0156,OAS2_0156_MR1,Nondemented,1,0,F,R,78,18,1,30,0,1243,0.748,1.412 317 | OAS2_0156,OAS2_0156_MR2,Nondemented,2,777,F,R,81,18,1,30,0,1256,0.739,1.398 318 | OAS2_0157,OAS2_0157_MR1,Demented,1,0,F,R,73,12,2,19,1,1274,0.728,1.377 319 | OAS2_0157,OAS2_0157_MR2,Demented,2,764,F,R,75,12,2,18,1,1479,0.657,1.187 320 | OAS2_0158,OAS2_0158_MR1,Nondemented,1,0,F,R,73,15,4,29,0,1272,0.697,1.380 321 | OAS2_0158,OAS2_0158_MR2,Nondemented,2,1399,F,R,76,15,4,29,0,1281,0.680,1.370 322 | 
OAS2_0159,OAS2_0159_MR1,Demented,1,0,F,R,73,14,3,29,0.5,1238,0.757,1.418 323 | OAS2_0159,OAS2_0159_MR2,Demented,2,759,F,R,76,14,3,28,0.5,1236,0.764,1.419 324 | OAS2_0160,OAS2_0160_MR1,Demented,1,0,M,R,76,12,,27,0.5,1557,0.705,1.127 325 | OAS2_0160,OAS2_0160_MR2,Demented,2,552,M,R,78,12,,29,1,1569,0.704,1.119 326 | OAS2_0161,OAS2_0161_MR1,Nondemented,1,0,M,R,77,16,1,29,0,1818,0.734,0.965 327 | OAS2_0161,OAS2_0161_MR2,Nondemented,2,454,M,R,79,16,1,30,0,1817,0.736,0.966 328 | OAS2_0161,OAS2_0161_MR3,Nondemented,3,1033,M,R,80,16,1,29,0,1830,0.724,0.959 329 | OAS2_0162,OAS2_0162_MR1,Demented,1,0,M,R,82,14,2,23,0.5,1514,0.678,1.159 330 | OAS2_0162,OAS2_0162_MR2,Demented,2,621,M,R,84,14,2,22,0.5,1550,0.665,1.132 331 | OAS2_0164,OAS2_0164_MR1,Demented,1,0,M,R,77,20,1,23,1,1713,0.756,1.024 332 | OAS2_0164,OAS2_0164_MR2,Demented,2,580,M,R,79,20,1,25,2,1710,0.760,1.026 333 | OAS2_0165,OAS2_0165_MR1,Demented,1,0,M,R,78,12,3,23,1,1491,0.710,1.177 334 | OAS2_0165,OAS2_0165_MR2,Demented,2,736,M,R,80,12,3,17,1,1755,0.696,1.000 335 | OAS2_0169,OAS2_0169_MR1,Nondemented,1,0,F,R,71,18,1,30,0,1426,0.731,1.231 336 | OAS2_0169,OAS2_0169_MR2,Nondemented,2,691,F,R,73,18,1,30,0,1414,0.739,1.241 337 | OAS2_0171,OAS2_0171_MR1,Nondemented,1,0,M,R,76,16,3,30,0,1832,0.769,0.958 338 | OAS2_0171,OAS2_0171_MR2,Nondemented,2,493,M,R,77,16,3,30,0,1820,0.768,0.964 339 | OAS2_0171,OAS2_0171_MR3,Nondemented,3,1695,M,R,81,16,3,30,0,1836,0.744,0.956 340 | OAS2_0172,OAS2_0172_MR1,Demented,1,0,M,R,75,16,1,30,0.5,1891,0.709,0.928 341 | OAS2_0172,OAS2_0172_MR2,Demented,2,1212,M,R,79,16,1,29,0.5,1899,0.700,0.924 342 | OAS2_0174,OAS2_0174_MR1,Nondemented,1,0,M,R,60,12,4,30,0,1379,0.806,1.273 343 | OAS2_0174,OAS2_0174_MR2,Nondemented,2,695,M,R,62,12,4,30,0,1378,0.795,1.274 344 | OAS2_0174,OAS2_0174_MR3,Nondemented,3,1555,M,R,64,12,4,30,0,1370,0.794,1.281 345 | OAS2_0175,OAS2_0175_MR1,Demented,1,0,M,R,70,16,4,26,0.5,1796,0.742,0.977 346 | OAS2_0175,OAS2_0175_MR2,Demented,2,700,M,R,72,16,4,28,0.5,1796,0.732,0.977 
347 | OAS2_0175,OAS2_0175_MR3,Demented,3,1343,M,R,73,16,4,28,0.5,1803,0.731,0.973 348 | OAS2_0176,OAS2_0176_MR1,Converted,1,0,M,R,84,16,2,30,0,1404,0.710,1.250 349 | OAS2_0176,OAS2_0176_MR2,Converted,2,774,M,R,87,16,2,30,0,1398,0.696,1.255 350 | OAS2_0176,OAS2_0176_MR3,Converted,3,1631,M,R,89,16,2,30,0.5,1408,0.679,1.246 351 | OAS2_0177,OAS2_0177_MR1,Nondemented,1,0,M,R,68,14,3,26,0,1444,0.778,1.216 352 | OAS2_0177,OAS2_0177_MR2,Nondemented,2,665,M,R,70,14,3,28,0,1510,0.770,1.162 353 | OAS2_0178,OAS2_0178_MR1,Nondemented,1,0,F,R,89,14,2,29,0,1509,0.756,1.163 354 | OAS2_0178,OAS2_0178_MR2,Nondemented,2,600,F,R,90,14,2,28,0,1495,0.746,1.174 355 | OAS2_0178,OAS2_0178_MR3,Nondemented,3,1447,F,R,93,14,2,30,0,1488,0.735,1.179 356 | OAS2_0179,OAS2_0179_MR1,Demented,1,0,M,R,79,20,1,26,0.5,1548,0.711,1.134 357 | OAS2_0179,OAS2_0179_MR2,Demented,2,652,M,R,81,20,1,26,0.5,1556,0.691,1.128 358 | OAS2_0181,OAS2_0181_MR1,Demented,1,0,F,R,74,12,,26,0.5,1171,0.733,1.499 359 | OAS2_0181,OAS2_0181_MR2,Demented,2,539,F,R,75,12,,,1,1169,0.742,1.501 360 | OAS2_0181,OAS2_0181_MR3,Demented,3,1107,F,R,77,12,,,1,1159,0.733,1.515 361 | OAS2_0182,OAS2_0182_MR1,Demented,1,0,M,R,73,12,,23,0.5,1661,0.698,1.056 362 | OAS2_0182,OAS2_0182_MR2,Demented,2,776,M,R,75,12,,20,0.5,1654,0.696,1.061 363 | OAS2_0183,OAS2_0183_MR1,Nondemented,1,0,F,R,66,13,2,30,0,1495,0.746,1.174 364 | OAS2_0183,OAS2_0183_MR2,Nondemented,2,182,F,R,66,13,2,30,0,1506,0.740,1.165 365 | OAS2_0183,OAS2_0183_MR3,Nondemented,3,732,F,R,68,13,2,30,0,1506,0.740,1.165 366 | OAS2_0183,OAS2_0183_MR4,Nondemented,4,2107,F,R,72,13,2,30,0,1510,0.723,1.162 367 | OAS2_0184,OAS2_0184_MR1,Demented,1,0,F,R,72,16,3,24,0.5,1354,0.733,1.296 368 | OAS2_0184,OAS2_0184_MR2,Demented,2,553,F,R,73,16,3,21,1,1351,0.708,1.299 369 | OAS2_0185,OAS2_0185_MR1,Demented,1,0,M,R,80,16,1,28,0.5,1704,0.711,1.030 370 | OAS2_0185,OAS2_0185_MR2,Demented,2,842,M,R,82,16,1,28,0.5,1693,0.694,1.037 371 | 
OAS2_0185,OAS2_0185_MR3,Demented,3,2297,M,R,86,16,1,26,0.5,1688,0.675,1.040 372 | OAS2_0186,OAS2_0186_MR1,Nondemented,1,0,F,R,61,13,2,30,0,1319,0.801,1.331 373 | OAS2_0186,OAS2_0186_MR2,Nondemented,2,763,F,R,63,13,2,30,0,1327,0.796,1.323 374 | OAS2_0186,OAS2_0186_MR3,Nondemented,3,1608,F,R,65,13,2,30,0,1333,0.801,1.317 375 | -------------------------------------------------------------------------------- /DataCheckpoint_Group141_WI24.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COGS 108 - Data Checkpoint" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Names\n", 15 | "\n", 16 | "- Shivangi Gupta\n", 17 | "- Joseph Hwang\n", 18 | "- Zijun Yang\n", 19 | "- Johnny Gonzales\n", 20 | "- Tanishq Rathore" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Research Question" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "Using clinical MRI data and demographic details of an individual, can a machine learning model predict whether that individual will develop Alzheimer's disease? The model will be trained on features such as Mini-Mental State Examination (MMSE) score, visit number, Clinical Dementia Rating (CDR), gender, age, years of education, socioeconomic status (SES), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas Scaling Factor (ASF)."
35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Background and Prior Work" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Advancements in healthcare, improvements in living conditions, and\n", 49 | "breakthroughs in medicine have collectively contributed to longer life\n", 50 | "expectancies worldwide; simultaneously, developed countries are also\n", 51 | "experiencing declining fertility\n", 52 | "rates.[1](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/)\n", 53 | "Together, these two trends have caused the\n", 54 | "proportion of older people within populations to increase steadily. The\n", 55 | "World Health Organization (WHO) reported that “in 2020, the number of\n", 56 | "people aged 60 and older outnumbered children younger than 5\n", 57 | "years”.[2](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health)\n", 58 | "The WHO also projects that “between 2015 and 2050, the proportion\n", 59 | "of the world’s population over 60 years will nearly double from 12% to\n", 60 | "22%”. As a result, it is worth examining common health\n", 61 | "conditions associated with older age, such as Alzheimer’s disease.\n", 62 | "\n", 63 | "So what is Alzheimer’s disease? Alzheimer’s disease is a progressive\n", 64 | "neurodegenerative brain disorder that impairs memory and cognitive\n", 65 | "functions. 
It is the most common cause of dementia and affects about 6.5\n", 66 | "million people in the United States who are aged 65 and\n", 67 | "older.[3](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 68 | "At present, there is no cure for the disease, but medicines may\n", 69 | "improve symptoms or slow their\n", 70 | "progression.[3](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 71 | "Our project therefore aims to build a model that can predict\n", 72 | "Alzheimer’s disease from clinical data that capture risk factors and\n", 73 | "markers of disease progression.\n", 74 | "\n", 75 | "Several other projects have asked similar questions and\n", 76 | "approached similar problems for other diseases. For instance, one study\n", 77 | "used machine learning methods to predict the risk of cardiovascular\n", 78 | "disease based on major contributing\n", 79 | "factors.[4](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/)\n", 80 | "Similarly, another paper used machine learning and ranker-based feature\n", 81 | "selection methods to predict eye diseases based on\n", 82 | "symptoms.[5](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/)\n", 83 | "Lastly, a third paper predicted thyroid disease using selective\n", 84 | "features and machine learning\n", 85 | "techniques.[6](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/)\n", 86 | "All three papers appear to have been relatively successful in predicting\n", 87 | "their target diseases from distinct features. 
Evidently, training machine\n", 88 | "learning models on datasets that contain risk factors and indicators\n", 89 | "for a given disease is an established approach; we\n", 90 | "hope to achieve similar results for Alzheimer’s disease.\n", 91 | "\n", 92 | "In-Depth Study Analysis\n", 93 | "\n", 94 | "Our group analyzed two studies indexed in the National Institutes of\n", 95 | "Health (NIH) PubMed Central database. The first study[7](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/) employed\n", 96 | "machine learning models to predict early-stage Alzheimer's disease using\n", 97 | "Open Access Series of Imaging Studies (OASIS) data, focusing on metrics\n", 98 | "such as precision, recall, accuracy, and F1-score. The authors, with\n", 99 | "backgrounds in technology and health research, aimed to enhance early\n", 100 | "diagnosis, potentially lowering Alzheimer's mortality rates. The study\n", 101 | "demonstrated that machine learning techniques such as decision trees,\n", 102 | "random forests, SVM, gradient boosting, and voting classifiers can\n", 103 | "effectively predict early-stage Alzheimer's disease with an accuracy of\n", 104 | "up to 83%. This result highlights the role of data science\n", 105 | "in identifying Alzheimer's at an early stage, using feature\n", 106 | "selection and advanced algorithms to improve diagnostic accuracy. 
Early\n", 107 | "detection is crucial for timely intervention, potentially mitigating the\n", 108 | "disease's progression and impact on patients and their families (Kavitha\n", 109 | "et al.).\n", 110 | "\n", 111 | "The second study[8](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/), titled\n", 112 | "“Application of machine learning methods for diagnosis of dementia based\n", 113 | "on the 10/66 battery of cognitive function tests in south India”,\n", 114 | "investigated the use of machine learning for diagnosing dementia in\n", 115 | "South India, employing the culturally and educationally fair 10/66\n", 116 | "battery of cognitive function tests designed for use in low- and\n", 117 | "middle-income countries. Through the analysis of neuropsychological\n", 118 | "data, demographic information, and normative data, the research applied\n", 119 | "the JRip classification algorithm, among others, achieving high diagnostic\n", 120 | "accuracy. This approach demonstrates the potential to streamline the\n", 121 | "diagnostic process, making it quicker and more accessible for clinicians\n", 122 | "and patients in India, thereby addressing the significant healthcare\n", 123 | "challenge of efficiently identifying dementia in community settings\n", 124 | "(Bhagyashree et al.).\n", 125 | "\n", 126 | "In-Depth Analysis of Similar Projects\n", 127 | "\n", 128 | "Our group also examined Kaggle projects that use the same\n", 129 | "dataset we have chosen, covering EDA and\n", 130 | "prediction models built with scikit-learn and TensorFlow. The first\n", 131 | "project[9](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri)\n", 132 | "starts with an introduction to Alzheimer's disease and\n", 133 | "the problem statement of estimating the Clinical Dementia Rating (CDR)\n", 134 | "using MRI dataset features. 
It then covers data loading and\n", 135 | "preprocessing, including null-value handling and normalization, and\n", 136 | "moves on to machine learning, specifically model\n", 137 | "training with hyperparameter tuning for XGBClassifier and\n", 138 | "GradientBoostingClassifier. The notebook concludes with predictions and a\n", 139 | "performance evaluation, visualized with a confusion matrix and a classification\n", 140 | "report; the final model reached an accuracy\n", 141 | "of \\~80%.\n", 142 | "\n", 143 | "The second project[10](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf) tried to predict the Clinical Rating of Alzheimer's disease (CRA) by\n", 144 | "combining data loading, visualization, and extensive machine learning,\n", 145 | "including the use of TensorFlow for neural network models. It compared\n", 146 | "various machine learning models, with particular emphasis on model\n", 147 | "training and evaluation, and found that\n", 148 | "DecisionTreeClassifier performed best among the models tested. The\n", 149 | "notebook's conclusion stressed the need for more data to improve the precision of\n", 150 | "Alzheimer's disease predictions, highlighting how data\n", 151 | "scarcity limits the accuracy of diagnostic models.\n", 152 | "\n", 153 | "**References**\n", 154 | "\n", 155 | "1. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/) Nargund G. (2009) Declining birth rate in Developed Countries: A\n", 156 | " radical policy re-think is required. *Facts, views & vision in\n", 157 | " ObGyn, 1(3), 191–193.*\n", 158 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/)\n", 159 | "\n", 160 | "2. [^](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health) World Health Organization. (1 Oct 2022) Ageing and health.
*World\n", 161 | " Health Organization*.\n", 162 | " [https://www.who.int/news-room/fact-sheets/detail/ageing-and-health](https://www.who.int/news-room/fact-sheets/detail/ageing-and-health)\n", 163 | "\n", 164 | "3. [^](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447) Mayo Foundation for Medical Education and Research. (30\n", 165 | " August 2023) Alzheimer’s disease. *Mayo Clinic*.\n", 166 | " [https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447](https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/symptoms-causes/syc-20350447)\n", 167 | "\n", 168 | "4. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/) Peng, M., Hou, F., Cheng, Z., Shen, T., Liu, K., Zhao, C., &\n", 169 | " Zheng, W. (23 Mar 2023) Prediction of cardiovascular disease risk\n", 170 | " based on major contributing features. *Scientific reports, 13(1),\n", 171 | " 4778*.\n", 172 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036320/)\n", 173 | "\n", 174 | "5. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/) Marouf, A. A., Mottalib, M. M., Alhajj, R., Rokne, J., &\n", 175 | " Jafarullah, O. (24 Dec 2022) An Efficient Approach to Predict Eye\n", 176 | " Diseases from Symptoms Using Machine Learning and Ranker-Based\n", 177 | " Feature Selection Methods. *Bioengineering (Basel, Switzerland),\n", 178 | " 10(1), 25*.\n", 179 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854513/)\n", 180 | "\n", 181 | "6. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/) Chaganti, R., Rustam, F., De La Torre Díez, I., Mazón, J. L. V.,\n", 182 | " Rodríguez, C. L., & Ashraf, I. (13 Aug 2022). Thyroid Disease\n", 183 | " Prediction Using Selective Features and Machine Learning\n", 184 | " Techniques. 
*Cancers, 14(16), 3914*.\n", 185 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405591/)\n", 186 | "\n", 187 | "7. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/) Kavitha C, Mani V, Srividhya SR, Khalaf OI, Tavera Romero CA.\n", 188 | " Early-Stage Alzheimer's Disease Prediction Using Machine Learning\n", 189 | " Models. Front Public Health. 2022 Mar 3;10:853294. doi:\n", 190 | " 10.3389/fpubh.2022.853294. PMID: 35309200; PMCID: PMC8927715.\n", 191 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8927715/)\n", 192 | "\n", 193 | "8. [^](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/) Bhagyashree SIR, Nagaraj K, Prince M, Fall CHD, Krishna M.\n", 194 | " Diagnosis of Dementia by Machine learning methods in\n", 195 | " Epidemiological studies: a pilot exploratory study from south\n", 196 | " India. Soc Psychiatry Psychiatr Epidemiol. 2018 Jan;53(1):77-86.\n", 197 | " doi: 10.1007/s00127-017-1410-0. Epub 2017 Jul 11. PMID: 28698926;\n", 198 | " PMCID: PMC6138240.\n", 199 | " [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138240/)\n", 200 | "\n", 201 | "9. [^](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri) Reddy, Shreyas. 2021. Alzheimer's analysis using MRI, February\n", 202 | " 8, 2024.\n", 203 | " [https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri](https://www.kaggle.com/code/shreyaspj/alzheimer-s-analysis-using-mri)\n", 204 | "\n", 205 | "10. [^](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf) Andrew. 2017. 
Predict alzheimer disease sl and tf, February 8, 2024.\n", 206 | " [https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf](https://www.kaggle.com/code/andrew32bit/predict-alzheimer-disease-sl-and-tf)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "# Hypothesis\n" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "Our project's hypothesis is the following: \"It is possible to predict the onset of Alzheimer's disease based on a combination of (1) clinical data* and (2) personal features such as gender, age, years of education, and socioeconomic status.\" We believe we can train a model that predicts the onset of Alzheimer's disease because, as mentioned in the background section of the proposal, numerous machine learning models have been successfully trained on clinical data to predict the onset of other diseases. The clinical data provide variables that capture both the cognitive and the structural changes associated with the disease's progression. Incorporating personal features such as gender and age is supported by research indicating that these factors can influence the risk and rate of progression of Alzheimer's disease.
Together, we believe these features will provide enough information to train a model to predict the onset of Alzheimer's disease.\n", 221 | "\n", 222 | "*Clinical data include the Mini-Mental State Examination (MMSE), visit number, Clinical Dementia Rating (CDR), Estimated Total Intracranial Volume (eTIV), Normalized Whole Brain Volume (nWBV), and Atlas Scaling Factor (ASF)." 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "# Data" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "## Data overview\n", 237 | "\n", 238 | "- Dataset #1\n", 239 | " - Dataset Name: OASIS-2: Longitudinal MRI Data in Nondemented and Demented Older Adults\n", 240 | " - Link to the dataset: https://www.oasis-brains.org/#data\n", 241 | " - Number of observations: 373\n", 242 | " - Number of variables: 15\n", 243 | "\n", 244 | "The dataset provided by the Open Access Series of Imaging Studies (OASIS) contains 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year, for a total of 373 imaging sessions. All subjects were right-handed, and both men and women were included. Of these, 72 subjects were characterized as nondemented throughout the study, and 64 were characterized as demented at their initial visits and remained so for subsequent scans. The remaining 14 subjects were characterized as nondemented at their initial visit and as demented at a later visit. Important variables are years of education (EDUC), socioeconomic status (SES; 1 to 5), Mini-Mental State Examination score (MMSE), Clinical Dementia Rating (CDR), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and atlas scaling factor (ASF). \n", 245 | "\n", 246 | "These variables may serve as proxies for Alzheimer's disease or cognitive state (conjecture):\n", 247 | "- __EDUC__: Years of education, which may serve as a proxy for cognitive reserve.
Higher education levels may be associated with better cognitive function and a reduced risk of dementia. \n", 248 | "- __SES__: Socioeconomic status on a scale from 1 (highest status) to 5 (lowest status); a proxy for income, education level, and occupation that reflects environmental influences on cognitive health. \n", 249 | "- __MMSE__: A widely used screening tool for cognitive impairment. Scores range from 0 to 30, with higher scores indicating better cognitive function. The examination assesses domains such as orientation, memory, attention, and language. \n", 250 | "- __CDR__: The CDR scale is also commonly used to assess the severity of dementia symptoms. It ranges from 0 to 3, where 0 indicates no dementia, 0.5 questionable dementia, 1 mild dementia, 2 moderate dementia, and 3 severe dementia.\n", 251 | "- __eTIV__: The estimated total volume inside the skull (including brain tissue, cerebrospinal fluid, etc.), measured in cubic centimeters (cc). It helps establish a baseline brain size; decreases in or variation of eTIV may serve as a proxy for cognitive function and Alzheimer's risk.\n", 252 | "- __nWBV__: The volume of the brain normalized to the subject's eTIV (nWBV = brain volume / eTIV). It is expressed as a proportion; for example, 0.696 means brain tissue occupies about 69.6% of the eTIV. Changes in nWBV may indicate brain atrophy, a common feature of Alzheimer's disease and other dementias.\n", 253 | "- __ASF__: A scaling factor used in brain imaging to adjust for individual differences in brain size and shape. It accounts for variability in brain morphology, allowing for accurate comparisons of brain structures across subjects.\n", 254 | "\n", 255 | "The dataset provided by OASIS is relatively clean and organized. However, some cleaning is still required: converting M/F and Group to numeric values, and dropping columns that will not be used in our predictions, such as MRI ID, Visit, and Hand (all subjects are right-handed, so Hand carries no information). All of this can be done with pandas DataFrame operations. 
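As a quick sanity check on the eTIV, ASF, nWBV, and CDR descriptions above, the sketch below multiplies eTIV by ASF for the ten preview rows of oasis_longitudinal.csv displayed in this notebook. In every one of those rows the product falls between roughly 1754.5 and 1755.7, which suggests the two columns are close to inversely proportional and may carry largely redundant information for modeling. This is an observation from the displayed rows only, not a documented formula. `CDR_LABELS`, `cdr_label`, and `nwbv_percent` are hypothetical helper names that simply encode the scales as described above.

```python
# Sketch: check how eTIV and ASF relate in the preview rows, and encode the
# CDR and nWBV descriptions. The (eTIV, ASF) pairs are copied from the
# oasis_longitudinal.csv preview; all helper names here are hypothetical.

ETIV_ASF = [
    (1987, 0.883), (2004, 0.876), (1678, 1.046), (1738, 1.010), (1698, 1.034),
    (1693, 1.037), (1688, 1.040), (1319, 1.331), (1327, 1.323), (1333, 1.317),
]

# CDR values and their meanings, as listed in the variable descriptions.
CDR_LABELS = {
    0.0: "no dementia",
    0.5: "questionable dementia",
    1.0: "mild dementia",
    2.0: "moderate dementia",
    3.0: "severe dementia",
}


def cdr_label(cdr: float) -> str:
    """Map a CDR score to its descriptive label (raises KeyError for other values)."""
    return CDR_LABELS[cdr]


def nwbv_percent(nwbv: float) -> str:
    """Render an nWBV proportion (e.g. 0.696) as a percentage string."""
    return f"{nwbv:.1%}"


# If ASF is roughly inversely proportional to eTIV, this product stays near-constant.
products = [etiv * asf for etiv, asf in ETIV_ASF]
print(f"eTIV * ASF ranges from {min(products):.1f} to {max(products):.1f}")
print(cdr_label(0.5), nwbv_percent(0.696))
```

If the same pattern holds across all 373 sessions, including both eTIV and ASF as features would add little beyond either one alone; checking that on the full dataframe is a one-line extension of this sketch.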
" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "## Dataset #1: oasis_longitudinal.csv" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "### Setup" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 1, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "import pandas as pd\n", 279 | "import numpy as np\n", 280 | "import seaborn as sns\n", 281 | "import matplotlib.pyplot as plt\n", 282 | "%matplotlib inline\n", 283 | "%config InlineBackend.figure_format ='retina'" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 2, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "text/html": [ 294 | "
| | Subject ID | MRI ID | Group | Visit | MR Delay | M/F | Hand | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV | ASF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAS2_0001 | OAS2_0001_MR1 | Nondemented | 1 | 0 | M | R | 87 | 14 | 2.0 | 27.0 | 0.0 | 1987 | 0.696 | 0.883 |
| 1 | OAS2_0001 | OAS2_0001_MR2 | Nondemented | 2 | 457 | M | R | 88 | 14 | 2.0 | 30.0 | 0.0 | 2004 | 0.681 | 0.876 |
| 2 | OAS2_0002 | OAS2_0002_MR1 | Demented | 1 | 0 | M | R | 75 | 12 | NaN | 23.0 | 0.5 | 1678 | 0.736 | 1.046 |
| 3 | OAS2_0002 | OAS2_0002_MR2 | Demented | 2 | 560 | M | R | 76 | 12 | NaN | 28.0 | 0.5 | 1738 | 0.713 | 1.010 |
| 4 | OAS2_0002 | OAS2_0002_MR3 | Demented | 3 | 1895 | M | R | 80 | 12 | NaN | 22.0 | 0.5 | 1698 | 0.701 | 1.034 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 368 | OAS2_0185 | OAS2_0185_MR2 | Demented | 2 | 842 | M | R | 82 | 16 | 1.0 | 28.0 | 0.5 | 1693 | 0.694 | 1.037 |
| 369 | OAS2_0185 | OAS2_0185_MR3 | Demented | 3 | 2297 | M | R | 86 | 16 | 1.0 | 26.0 | 0.5 | 1688 | 0.675 | 1.040 |
| 370 | OAS2_0186 | OAS2_0186_MR1 | Nondemented | 1 | 0 | F | R | 61 | 13 | 2.0 | 30.0 | 0.0 | 1319 | 0.801 | 1.331 |
| 371 | OAS2_0186 | OAS2_0186_MR2 | Nondemented | 2 | 763 | F | R | 63 | 13 | 2.0 | 30.0 | 0.0 | 1327 | 0.796 | 1.323 |
| 372 | OAS2_0186 | OAS2_0186_MR3 | Nondemented | 3 | 1608 | F | R | 65 | 13 | 2.0 | 30.0 | 0.0 | 1333 | 0.801 | 1.317 |

373 rows × 15 columns
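The code that produced the cleaned table is not shown in this section, so the sketch below is one hypothetical way to carry out the cleaning steps described in the Data overview with pandas, applied to six rows copied from the preview above. It keeps each subject's first visit (the cleaned table has 150 rows, one per subject, and every displayed row has MR Delay 0), encodes M/F as Gender (M = 1, F = 0, as in the cleaned table), encodes Group as 0/1, and drops MRI ID, Visit, Hand, and M/F. Two choices are assumptions rather than the notebook's confirmed method: mapping a Converted group label, if present, to 1, and filling missing SES with the median (which happens to reproduce the 2.0 shown for OAS2_0002 in this small subset). `clean_oasis` is a hypothetical function name.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Subject ID", "MRI ID", "Group", "Visit", "MR Delay", "M/F", "Hand",
           "Age", "EDUC", "SES", "MMSE", "CDR", "eTIV", "nWBV", "ASF"]

# Six rows copied from the oasis_longitudinal.csv preview above.
rows = [
    ("OAS2_0001", "OAS2_0001_MR1", "Nondemented", 1, 0, "M", "R", 87, 14, 2.0, 27.0, 0.0, 1987, 0.696, 0.883),
    ("OAS2_0001", "OAS2_0001_MR2", "Nondemented", 2, 457, "M", "R", 88, 14, 2.0, 30.0, 0.0, 2004, 0.681, 0.876),
    ("OAS2_0002", "OAS2_0002_MR1", "Demented", 1, 0, "M", "R", 75, 12, np.nan, 23.0, 0.5, 1678, 0.736, 1.046),
    ("OAS2_0002", "OAS2_0002_MR2", "Demented", 2, 560, "M", "R", 76, 12, np.nan, 28.0, 0.5, 1738, 0.713, 1.010),
    ("OAS2_0186", "OAS2_0186_MR1", "Nondemented", 1, 0, "F", "R", 61, 13, 2.0, 30.0, 0.0, 1319, 0.801, 1.331),
    ("OAS2_0186", "OAS2_0186_MR2", "Nondemented", 2, 763, "F", "R", 63, 13, 2.0, 30.0, 0.0, 1327, 0.796, 1.323),
]
raw = pd.DataFrame(rows, columns=COLUMNS)


def clean_oasis(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the cleaning steps described in the Data overview."""
    # Keep each subject's baseline scan (one row per subject).
    out = df[df["Visit"] == 1].copy()
    # Encode sex numerically (M -> 1, F -> 0), matching the cleaned table.
    out["Gender"] = out["M/F"].map({"M": 1, "F": 0})
    # Encode diagnosis as 0/1; mapping "Converted" to 1 is an assumption.
    out["Group"] = out["Group"].map({"Nondemented": 0, "Demented": 1, "Converted": 1})
    # Median imputation for missing SES is an assumption, not the notebook's confirmed method.
    out["SES"] = out["SES"].fillna(out["SES"].median())
    # Drop identifiers and columns that will not be used as predictors.
    out = out.drop(columns=["MRI ID", "Visit", "Hand", "M/F"])
    return out.reset_index(drop=True)


cleaned = clean_oasis(raw)
print(cleaned)
```

The resulting columns come out in the same order as the cleaned table (Subject ID through ASF, then Gender), which makes it easy to compare this sketch against the notebook's actual output.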
| | Subject ID | Group | MR Delay | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV | ASF | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAS2_0001 | 0 | 0 | 87 | 14 | 2.0 | 27.0 | 0.0 | 1987 | 0.696 | 0.883 | 1 |
| 1 | OAS2_0002 | 1 | 0 | 75 | 12 | 2.0 | 23.0 | 0.5 | 1678 | 0.736 | 1.046 | 1 |
| 2 | OAS2_0004 | 0 | 0 | 88 | 18 | 3.0 | 28.0 | 0.0 | 1215 | 0.710 | 1.444 | 0 |
| 3 | OAS2_0005 | 0 | 0 | 80 | 12 | 4.0 | 28.0 | 0.0 | 1689 | 0.712 | 1.039 | 1 |
| 4 | OAS2_0007 | 1 | 0 | 71 | 16 | 2.0 | 28.0 | 0.5 | 1357 | 0.748 | 1.293 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 145 | OAS2_0182 | 1 | 0 | 73 | 12 | 2.0 | 23.0 | 0.5 | 1661 | 0.698 | 1.056 | 1 |
| 146 | OAS2_0183 | 0 | 0 | 66 | 13 | 2.0 | 30.0 | 0.0 | 1495 | 0.746 | 1.174 | 0 |
| 147 | OAS2_0184 | 1 | 0 | 72 | 16 | 3.0 | 24.0 | 0.5 | 1354 | 0.733 | 1.296 | 0 |
| 148 | OAS2_0185 | 1 | 0 | 80 | 16 | 1.0 | 28.0 | 0.5 | 1704 | 0.711 | 1.030 | 1 |
| 149 | OAS2_0186 | 0 | 0 | 61 | 13 | 2.0 | 30.0 | 0.0 | 1319 | 0.801 | 1.331 | 0 |

150 rows × 12 columns
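Because the cleaned table encodes Group as 0/1, the 14 subjects who went from nondemented to demented must end up in one class or the other, and the text above does not say which. The sketch below uses only the subject counts stated in the Data overview (72 nondemented, 64 demented, 14 converted) to show how that choice shifts the class balance of the 150-subject table. `class_balance` is a hypothetical helper name.

```python
# Sketch: class balance of a binary Group label under the two possible
# encodings of converted subjects. Counts come from the Data overview;
# class_balance is a hypothetical helper, not part of the notebook.

SUBJECT_COUNTS = {"Nondemented": 72, "Demented": 64, "Converted": 14}


def class_balance(converted_as: int) -> dict[int, int]:
    """Return {0: n_nondemented, 1: n_demented} with converted subjects assigned to converted_as."""
    if converted_as not in (0, 1):
        raise ValueError("converted_as must be 0 or 1")
    balance = {0: SUBJECT_COUNTS["Nondemented"], 1: SUBJECT_COUNTS["Demented"]}
    balance[converted_as] += SUBJECT_COUNTS["Converted"]
    return balance


for choice in (0, 1):
    counts = class_balance(choice)
    total = sum(counts.values())
    print(f"Converted -> {choice}: {counts}, demented share = {counts[1] / total:.1%}")
```

With converted subjects mapped to 0 the split is 86 to 64 (about 42.7% demented); mapped to 1 it is 72 to 78 (52.0% demented). Either way the classes are reasonably balanced, but the choice changes what "onset" the model is actually learning to predict.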
\n", 1057 | "