├── heart.xlsx
├── Chapter3DataAndEDA.mlx
├── Chapter3DataAndEDA.pdf
├── Chapter04Spreadsheets.mlx
├── Chapter04Spreadsheets.pdf
├── Chapter5DataWrangling.mlx
├── Chapter5DataWrangling.pdf
├── Chapter02VectorsAndMatrices.mlx
├── Chapter2VectorsAndMatrices.pdf
├── Chapter6CommonStatisticalTests.mlx
├── Chapter6CommonStatisticalTests.pdf
├── Chapter01ArithmeticVariablesContainers.mlx
├── Chapter01ArithmeticVariablesContainers.pdf
├── .github
│   └── workflows
│       └── draft-pdf.yml
├── README.md
├── paper.md
└── paper.bib

/heart.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/heart.xlsx
--------------------------------------------------------------------------------
/Chapter3DataAndEDA.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter3DataAndEDA.mlx
--------------------------------------------------------------------------------
/Chapter3DataAndEDA.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter3DataAndEDA.pdf
--------------------------------------------------------------------------------
/Chapter04Spreadsheets.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter04Spreadsheets.mlx
--------------------------------------------------------------------------------
/Chapter04Spreadsheets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter04Spreadsheets.pdf
--------------------------------------------------------------------------------
/Chapter5DataWrangling.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter5DataWrangling.mlx
--------------------------------------------------------------------------------
/Chapter5DataWrangling.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter5DataWrangling.pdf
--------------------------------------------------------------------------------
/Chapter02VectorsAndMatrices.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter02VectorsAndMatrices.mlx
--------------------------------------------------------------------------------
/Chapter2VectorsAndMatrices.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter2VectorsAndMatrices.pdf
--------------------------------------------------------------------------------
/Chapter6CommonStatisticalTests.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter6CommonStatisticalTests.mlx
--------------------------------------------------------------------------------
/Chapter6CommonStatisticalTests.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter6CommonStatisticalTests.pdf
--------------------------------------------------------------------------------
/Chapter01ArithmeticVariablesContainers.mlx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter01ArithmeticVariablesContainers.mlx
--------------------------------------------------------------------------------
/Chapter01ArithmeticVariablesContainers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juanklopper/MATLAB4DataScience/main/Chapter01ArithmeticVariablesContainers.pdf
--------------------------------------------------------------------------------
/.github/workflows/draft-pdf.yml:
--------------------------------------------------------------------------------
name: Draft PDF
# Run on every push. A workflow may define the `on` key only once.
on: [push]

jobs:
  paper:
    runs-on: ubuntu-latest
    name: Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: paper.pdf
# The file originally ended with a second, path-filtered `on:` block, which
# duplicated the key above. It is kept here as a comment. Note that `paper/**`
# would not match paper.md at the repository root.
# on:
#   push:
#     paths:
#       - paper/**
#       - .github/workflows/draft-pdf.yml

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# MATLAB4HealthDataScience

This repository contains all the course material for a course in health data science. The course was primarily created for postgraduate students in public health who have taken courses in SAS, Python, and R for data analysis.

MATLAB is listed as a required skill in many data science jobs. In healthcare these jobs tend to be with companies and institutions that include medical devices and hardware in their services and products.

The course consists of six chapters.
Each chapter has complete documentation that is provided as a portable document format (PDF) file and as a MATLAB Live Script file. The latter files are interactive coding environments that contain both the learning material and the code. Students can replicate or alter the code as part of their exploration.

The course reviews all concepts related to a data science or research project including how to work with data and import data from a spreadsheet.

Note that MATLAB is proprietary software. Many colleges, universities, and other institutions have site licences. Find out from your administrator if you have free access to MATLAB.

There is also a complete course video. Visit https://www.youtube.com/@datascienceforpublichealth

--------------------------------------------------------------------------------
/paper.md:
--------------------------------------------------------------------------------
---
title: 'Public Health Data Science with MATLAB'
tags:
  - data science
  - public health
  - MATLAB
  - tutorial
authors:
  - name: Juan H Klopper
    orcid: 0000-0002-7325-1906
    affiliation: 1
affiliations:
  - name: Milken Institute School of Public Health, The George Washington University
    index: 1
date: 21 August 2024
bibliography: paper.bib
---

# Summary

Public health research is essential for identifying, preventing, and managing health issues that affect populations [@jutte_administrative_2011]. It plays a critical role in understanding the distribution and determinants of health and diseases, which in turn informs the development of policies and interventions that improve public health outcomes [@bernier_public_2011]. Through research, public health professionals can identify risk factors, evaluate the effectiveness of interventions, and guide evidence-based decision-making to protect and promote the health of communities [@brownson_building_2018].

For domain experts, the ability to work with and analyze data is particularly important [@maier-hein_surgical_2022]. The growing availability of health data, from electronic health records to large-scale surveys, offers opportunities to uncover patterns, predict trends, and optimize resource allocation [@de_mauro_beyond_2016]. By equipping themselves with data science skills, public health students and experts can leverage these data sources to generate insights that are crucial for tackling complex health challenges, such as chronic diseases, pandemics, and health disparities.

Proficiency in statistical tools and software is invaluable, as it allows researchers to perform advanced analyses, develop models, and visualize data, thereby enhancing the quality and impact of their research in public health. The MATLAB language and software environment allows domain experts full access to working with data, either using code or by using a variety of built-in tools that import, wrangle, analyze, and report on findings with the click of a button. This makes MATLAB an ideal tool for the domain expert to contribute to data analysis and research in public health.

The open educational resource built using MATLAB and presented here is structured as follows.

1. Arithmetic, variables, and data containers
2. Storing data as vectors and matrices
3. Data types and exploratory data analysis
4. Working with data in spreadsheets
5. Data wrangling
6. Common statistical tests and models

The audience for this open educational tool includes any student of Public Health, any early-career researcher in Public Health, any expert in Public Health already in employment, and any instructor at a School of Public Health who wishes to introduce the use of MATLAB to their students.

Since its creation, the material in this course has been used by over 100 past students of the author.
The course material consists of a full set of elaborate documents (one for each chapter), available on GitHub in PDF and in MLX (MATLAB Live Script) format [@klopper_juankloppermatlab4datascience_2024]. The latter are interactive documents. Students can use these notebooks to experiment with the code in the course. The GitHub repository also contains the data file used in the course. There is also a five-hour instructional video for the course [@data_science_for_public_health_matlab_2024].

# Statement of Need

The availability of data and the need to use these data to find solutions to public health problems are increasing. The recent COVID-19 pandemic serves as an outstanding example. The tools to manage and analyze data and to present findings have matured over time. There is an increased expectation that public health experts, scientists, and researchers not only understand their domain, but also have a working understanding of applied computer science and statistics to achieve their goals.

Many Schools of Public Health throughout the United States (US) and elsewhere in the world offer technical courses in computer languages and software for Health Data Science. These courses are dominated by SAS [@noauthor_sas_nodate], Python [@noauthor_welcome_2024], and R [@noauthor_r_nodate]. None of the 10 leading Schools in the US offers any courses in MATLAB. MATLAB, however, remains a leading language in Data Science today. This is especially true in technical and engineering fields. The job market for graduates of Schools of Public Health is robust, with opportunities in many job markets. To enable students to expand their opportunities for employment in companies and organizations with health-related technical and engineering positions, the free and open educational resource presented in this paper aims to achieve the following.

1. Provide a learning resource to all students of Public Health and instructors at Schools of Public Health
2. Review important topics in biostatistics
3. Learn critical skills in the creation, wrangling, analysis, and presentation of data using a coding and low-code approach with the MATLAB language and built-in tools
4. Learn to use code and low-code approaches to solve problems in Public Health

# References
--------------------------------------------------------------------------------
/paper.bib:
--------------------------------------------------------------------------------

@article{jutte_administrative_2011,
  title = {Administrative {Record} {Linkage} as a {Tool} for {Public} {Health} {Research}},
  volume = {32},
  issn = {0163-7525, 1545-2093},
  url = {https://www.annualreviews.org/content/journals/10.1146/annurev-publhealth-031210-100700},
  doi = {10.1146/annurev-publhealth-031210-100700},
  abstract = {Linked administrative databases offer a powerful resource for studying important public health issues. Methods developed and implemented in several jurisdictions across the globe have achieved high-quality linkages for conducting health and social research without compromising confidentiality. Key data available for linkage include health services utilization, population registries, place of residence, family ties, educational outcomes, and use of social services. Linking events for large populations of individuals across disparate sources and over time permits a range of research possibilities, including the capacity to study low-prevalence exposure-disease associations, multiple outcome domains within the same cohort of individuals, service utilization and chronic disease patterns, and life course and transgenerational transmission of health.
Limited information on variables such as individual-level socioeconomic status (SES) and social supports is outweighed by strengths that include comprehensive follow-up, continuous data collection, objective measures, and relatively low expense. Ever advancing methodologies and data holdings guarantee that research using linked administrative databases will make increasingly important contributions to public health research.},
  language = {en},
  number = {Volume 32, 2011},
  urldate = {2024-08-21},
  journal = {Annual Review of Public Health},
  author = {Jutte, Douglas P. and Roos, Leslie L. and Brownell, Marni D.},
  month = apr,
  year = {2011},
  note = {Publisher: Annual Reviews},
  pages = {91--108},
  file = {Snapshot:/Users/juanklopper/Zotero/storage/KFRDTTF7/annurev-publhealth-031210-100700.html:text/html},
}

@article{brownson_building_2018,
  title = {Building {Capacity} for {Evidence}-{Based} {Public} {Health}: {Reconciling} the {Pulls} of {Practice} and the {Push} of {Research}},
  volume = {39},
  issn = {0163-7525, 1545-2093},
  shorttitle = {Building {Capacity} for {Evidence}-{Based} {Public} {Health}},
  url = {https://www.annualreviews.org/content/journals/10.1146/annurev-publhealth-040617-014746},
  doi = {10.1146/annurev-publhealth-040617-014746},
  abstract = {Timely implementation of principles of evidence-based public health (EBPH) is critical for bridging the gap between discovery of new knowledge and its application. Public health organizations need sufficient capacity (the availability of resources, structures, and workforce to plan, deliver, and evaluate the preventive dose of an evidence-based intervention) to move science to practice. We review principles of EBPH, the importance of capacity building to advance evidence-based approaches, promising approaches for capacity building, and future areas for research and practice.
Although there is general agreement among practitioners and scientists on the importance of EBPH, there is less clarity on the definition of evidence, how to find it, and how, when, and where to use it. Capacity for EBPH is needed among both individuals and organizations. Capacity can be strengthened via training, use of tools, technical assistance, assessment and feedback, peer networking, and incentives. Modest investments in EBPH capacity building will foster more effective public health practice.},
  language = {en},
  number = {Volume 39, 2018},
  urldate = {2024-08-21},
  journal = {Annual Review of Public Health},
  author = {Brownson, Ross C. and Fielding, Jonathan E. and Green, Lawrence W.},
  month = apr,
  year = {2018},
  note = {Publisher: Annual Reviews},
  pages = {27--53},
  file = {Full Text:/Users/juanklopper/Zotero/storage/LETE7QW4/Brownson et al. - 2018 - Building Capacity for Evidence-Based Public Health.pdf:application/pdf;Snapshot:/Users/juanklopper/Zotero/storage/LF9FGSNL/annurev-publhealth-040617-014746.html:text/html},
}

@misc{noauthor_sas_nodate,
  title = {{SAS} {Viya} – {AI}, {Analytics} and {Data} {Management} on a {Cloud} {Native} {Platform}},
  url = {https://www.sas.com/en_us/software/viya.html},
  abstract = {SAS Viya is a cloud native AI, analytic and data management platform that supports your whole team across the entire analytics life cycle.},
  language = {en},
  urldate = {2024-08-21},
  file = {Snapshot:/Users/juanklopper/Zotero/storage/6GVU4FMP/viya.html:text/html},
}

@misc{noauthor_welcome_2024,
  title = {Welcome to {Python}.org},
  url = {https://www.python.org/},
  abstract = {The official home of the Python Programming Language},
  language = {en},
  urldate = {2024-08-21},
  journal = {Python.org},
  month = aug,
  year = {2024},
  file = {Snapshot:/Users/juanklopper/Zotero/storage/P7PQJWVV/www.python.org.html:text/html},
}

@misc{noauthor_r_nodate,
  title = {R: {What} is {R}?},
  url = {https://www.r-project.org/about.html},
  urldate = {2024-08-21},
  file = {R\: What is R?:/Users/juanklopper/Zotero/storage/ETK4NQRL/about.html:text/html},
}

@article{bernier_public_2011,
  title = {Public health policy research: making the case for a political science approach},
  volume = {26},
  issn = {0957-4824},
  shorttitle = {Public health policy research},
  url = {https://doi.org/10.1093/heapro/daq079},
  doi = {10.1093/heapro/daq079},
  abstract = {The past few years have seen the emergence of claims that the political determinants of health do not get due consideration and a growing demand for better insights into public policy analysis in the health research field. Several public health and health promotion researchers are calling for better training and a stronger research culture in health policy. The development of these studies tends to be more advanced in health promotion than in other areas of public health research, but researchers are still commonly caught in a naïve, idealistic and narrow view of public policy. This article argues that the political science discipline has developed a specific approach to public policy analysis that can help to open up unexplored levers of influence for public health research and practice and that can contribute to a better understanding of public policy as a determinant of health.
It describes and critiques the public health model of policy analysis, analyzes political science's specific approach to public policy analysis, and discusses how the politics of research provides opportunities and barriers to the integration of political science's distinctive contributions to policy analysis in health promotion.},
  number = {1},
  urldate = {2024-08-21},
  journal = {Health Promotion International},
  author = {Bernier, Nicole F. and Clavier, Carole},
  month = mar,
  year = {2011},
  pages = {109--116},
  file = {Full Text PDF:/Users/juanklopper/Zotero/storage/PABD8N3B/Bernier and Clavier - 2011 - Public health policy research making the case for.pdf:application/pdf;Snapshot:/Users/juanklopper/Zotero/storage/7YKDZX2U/682978.html:text/html},
}

@book{de_mauro_beyond_2016,
  title = {Beyond {Data} {Scientists}: a {Review} of {Big} {Data} {Skills} and {Job} {Families}},
  shorttitle = {Beyond {Data} {Scientists}},
  abstract = {Purpose – This paper promises to shed light on the heterogeneous nature of the skills required to ‘win’ with Big Data by analysing a large amount of job posts published online. More specifically we: 1) identify the most important ‘job families’ related to Big Data; 2) recognize homogeneous groups of skills (skillsets) that are most sought after by companies; 3) characterize each job family with the appropriate level of competence required within each Big Data skillset.

Design/methodology/approach – We implement a semi-automated, fully reproducible, analytical methodology that is able to cope with the significant amount of job posts obtained by scraping some of the most popular job search online portals. Job families are identified through the expert evaluation of the most important keywords appearing in job posts’ titles. Skillsets are instead obtained by using Latent Dirichlet Allocation (LDA), an unsupervised machine learning algorithm used for text classification.
Finally, we characterize the job families through a measure of the relative importance of each skillset.

Originality/value – This study represents one of the first attempts to classify jobs in families and describe them in terms of skill requirements by means of a large-scale, semi-automated job post analysis, based on machine learning algorithms. To do so, we propose an original combination of various analytical techniques, which are widely established in previous scientific works. The characterization of job families through text mining and topic modelling techniques is innovative and can be reapplied to similar future studies focusing on any other professional field.

Practical implications – This paper brings clarity to the multifaceted nature of Big Data competency requirements and job role types. Our results can concretely help business leaders and HR managers create clearer strategies for the procurement of the right skills needed to leverage Big Data at best. In addition, the structured classification of job families and skillsets will help establish a common language to be used within the job market, through which supply and demand can more effectively meet.},
  author = {De Mauro, Andrea and Greco, Marco and Grimaldi, Michele and Nobili, Giacomo},
  month = jun,
  year = {2016},
}

@article{maier-hein_surgical_2022,
  title = {Surgical data science – from concepts toward clinical translation},
  volume = {76},
  issn = {1361-8415},
  url = {https://www.sciencedirect.com/science/article/pii/S1361841521003510},
  doi = {10.1016/j.media.2021.102306},
  abstract = {Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data.
While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.},
  urldate = {2024-08-21},
  journal = {Medical Image Analysis},
  author = {Maier-Hein, Lena and Eisenmann, Matthias and Sarikaya, Duygu and März, Keno and Collins, Toby and Malpani, Anand and Fallert, Johannes and Feussner, Hubertus and Giannarou, Stamatia and Mascagni, Pietro and Nakawala, Hirenkumar and Park, Adrian and Pugh, Carla and Stoyanov, Danail and Vedula, Swaroop S. and Cleary, Kevin and Fichtinger, Gabor and Forestier, Germain and Gibaud, Bernard and Grantcharov, Teodor and Hashizume, Makoto and Heckmann-Nötzel, Doreen and Kenngott, Hannes G. and Kikinis, Ron and Mündermann, Lars and Navab, Nassir and Onogur, Sinan and Roß, Tobias and Sznitman, Raphael and Taylor, Russell H. and Tizabi, Minu D. and Wagner, Martin and Hager, Gregory D. and Neumuth, Thomas and Padoy, Nicolas and Collins, Justin and Gockel, Ines and Goedeke, Jan and Hashimoto, Daniel A. and Joyeux, Luc and Lam, Kyle and Leff, Daniel R.
and Madani, Amin and Marcus, Hani J. and Meireles, Ozanan and Seitel, Alexander and Teber, Dogu and Ückert, Frank and Müller-Stich, Beat P. and Jannin, Pierre and Speidel, Stefanie},
  month = feb,
  year = {2022},
  keywords = {Artificial intelligence, Clinical translation, Computer aided surgery, Deep learning, Surgical data science},
  pages = {102306},
  file = {Full Text:/Users/juanklopper/Zotero/storage/CV8TME2U/Maier-Hein et al. - 2022 - Surgical data science – from concepts toward clini.pdf:application/pdf;ScienceDirect Snapshot:/Users/juanklopper/Zotero/storage/AR33HTXH/S1361841521003510.html:text/html},
}

@misc{data_science_for_public_health_matlab_2024,
  title = {{MATLAB} for {Data} {Science} {Complete} {Course}},
  url = {https://www.youtube.com/watch?v=2pVuWLZ-Ac8},
  abstract = {A complete course introducing MATLAB for Data Science.

In this complete course I introduce you to the use of MATLAB for Data Science. While there are many courses on MATLAB here on YouTube, very few use Live Scripts. Live Scripts are living research documents and are absolutely great for Data Science. They are the core of reproducible research and science communication. We can use them to share our work with collaborators and with the World.

Each chapter is accompanied by a complete set of notes that you can find at https://github.com/juanklopper/MATLAB...

There are six chapters in this course and you can find timestamps to each below.

00:05:26 Introducing MATLAB and Live Scripts
01:02:35 Vectors, matrices, and indexing
01:59:22 Creating simulated data and exploratory data analysis
03:01:31 Working with spreadsheets
03:33:12 Data manipulation
04:06:19 Common statistical tests

The spreadsheet that I use in this notebooks can be found at https://github.com/juanklopper/MATLAB...

Please visit my main channel at @drjuanklopper

Visit my University webpage at https://blogs.gwu.edu/juanklopper},
  urldate = {2024-08-21},
  author = {{Data Science for Public Health}},
  month = jul,
  year = {2024},
}

@misc{klopper_juankloppermatlab4datascience_2024,
  title = {juanklopper/{MATLAB4DataScience}},
  url = {https://github.com/juanklopper/MATLAB4DataScience},
  abstract = {Course material for my course on introducing MATLAB for Data Science.},
  urldate = {2024-08-21},
  author = {Klopper, Dr Jay H.},
  month = aug,
  year = {2024},
  note = {original-date: 2024-07-07T05:21:30Z},
}

--------------------------------------------------------------------------------