├── Chunk_Options.jpg
├── rmarkdown-reference.pdf
└── README.md

/Chunk_Options.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maiwen-ch/2025_Data_Analysis_Project/HEAD/Chunk_Options.jpg

--------------------------------------------------------------------------------
/rmarkdown-reference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maiwen-ch/2025_Data_Analysis_Project/HEAD/rmarkdown-reference.pdf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Data Analysis Project 2025 - overview
2 | =====================================
3 |
4 | Responsible teachers:
5 |
6 | Carl Herrmann (carl.herrmann@bioquant.uni-heidelberg.de) - [IPMB & BioQuant](http://www.hdsu.org/)
7 | Maiwen Caudron-Herger (m.caudron@dkfz-heidelberg.de) - Research group - [RNA-Protein Complexes & Cell Proliferation](https://www.dkfz.de/en/rna-protein-complexes-and-cell-proliferation) (B150)
8 |
9 |
10 | What is the goal?
11 | -----------------
12 |
13 | The goal of the Data Analysis Module during the Summer Semester is to provide hands-on experience in the analysis of large-scale datasets and first insights into using computational tools to make a data analysis reproducible.
14 |
15 | After this module, you will have
16 |
17 | > - gained skills in a programming language such as R or Python (depending on the chosen project)
18 | > - learned to use tools for reproducible analysis, like Markdown/Notebooks and GitHub
19 | > - understood the key steps in a large-scale data analysis
20 |
21 |
22 | In a nutshell…
23 | --------------
24 | - choose your team and one project by Monday 21.04 (coordination: semester speakers)
25 | - prepare and present a 10 min. project proposal on Wednesday 14.05
26 | - work on your project during the whole term
27 | - meet your tutor every week (max. 30 min per group)
28 | - complete your repository by Monday 07.07
29 | - give a final poster presentation on Wednesday 09.07
30 |
31 |
32 | Important dates
33 | ---------------
34 | - Introductory lecture Wednesday 16.04
35 |     - overall presentation of the module
36 |     - presentation of the different topics and sub-projects
37 |
38 | - Introductory lecture Wednesday 23.04
39 |     - introduction to Markdown / GitHub
40 |
41 | - Tutorial Wednesday 30.04
42 |     - More specific information about the presentations and evaluations
43 |     - Introduction to visual communication for scientists (slides, poster)
44 |
45 |
46 | Projects
47 | --------
48 | We have defined 5 topics in data and image analysis.
49 | Each project will comprise up to 5 different sub-projects.
50 | Most of the time, these 5 sub-projects are very similar to each other but analyze slightly different datasets.
51 |
52 |
53 | You can find a description of the 5 topics here:
54 |
55 | - Topic 01: [Biomedical Image Analysis](https://github.com/maiwen-ch/2025_Data_Analysis_Topic_01_Biomedical_Image_Analysis) (Karl Rohr & Leonid Kostrykin - Python)
56 |     - Tutor: Bastian Mucha
57 |
58 | - Topic 02: [Gene Regulation of Immune Cells](https://github.com/maiwen-ch/2025_Data_Analysis_Topic_02_Gene_Regulation_of_Immune_Cells.git) (Alexander Sasse - Python)
59 |     - Tutor: Aidana Smugalova
60 |
61 | - Topic 03: [Proteome-wide Screen for RNA-dependent Proteins](https://github.com/maiwen-ch/2025_Data_Analysis_Topic_03_Proteome_Screen) (Maiwen Caudron-Herger - R)
62 |     - Tutor: Michela Pozzi
63 |
64 | - Topic 04: [Antibody-Antigen Interactions](https://github.com/maiwen-ch/2025_Data_Analysis_Topic_04_Antibody_Antigen_Interactions) (Dominik Niopek - Python)
65 |     - Tutor: Enno Schäfer
66 |
67 | - Topic 05: [DNA Methylation](https://github.com/maiwen-ch/2025_Data_Analysis_Topic_05_DNA_Methylation.git) (Michael Scherer - R)
68 |     - Tutor: Franziska Lam
69 |
70 |
71 | Important information:
72 | ----------------------
73 |
74 | - Topic 1 is an image analysis project which will be performed in Python.
75 | - Topics 2, 3, 4 and 5 are data analysis topics/projects which will be conducted in R or Python as indicated above.
76 |
77 |
78 | How to …
79 | --------
80 |
81 | How do I select my project/team?
82 | - check the project description on this page (see above)!
83 | - select your teammates; each sub-project will be worked on by a group of **4 students**.
84 | - once the choice has been made, register your team in a Google Spreadsheet (see below); the choice of sub-project and definition of the teams should be completed by **Monday, 21.04.25, 10 am** (no extension!)
85 | - **create a GitHub account and register your GitHub username in the registration Google Sheet** (coordinated by the semester speakers)
86 |
87 |
88 | Who will help me?
89 | -----------------
90 |
91 | Each project has a tutor assigned to it. Each team within a project will have a **weekly meeting** with its tutor on Wednesday between 10 am and 1 pm, lasting **20-30 minutes**.
92 |
93 |
94 | **VERY IMPORTANT**: as the time the tutor can dedicate to your project each week is limited, you should **carefully prepare your meeting**.
95 |
96 |
97 |
98 | What am I supposed to do?
99 | -------------------------
100 |
101 | - **select your project and your teammates and register before Monday, 21.04, 10 am**
102 | - **Project presentation on 14.05.25**: prepare a 10-minute presentation (+10 minutes discussion/questions) based on the indicated literature for each project, listing the relevant questions/topics that you want to address in your project. During this presentation, you should also explain the datasets you will be working with, and how you want to make use of them.
103 | - **Final RMarkdown/Jupyter notebook** (should be placed in the GitHub repository) **by 07.07.25 at the latest**, as the **repositories will be closed on 07.07. at 12 pm**.
104 | - **Final presentation on 09.07.25** (10 min poster presentation + 10 minutes discussion/questions)
105 |
106 |
107 | How will I be evaluated?
108 | ------------------------
109 |
110 | Each student will have an individual evaluation! This will take into account the 2 presentations listed above.
111 |
112 | Here are the relevant points taken into account during the project proposal presentation:
113 |
114 | - presentation of the **literature** results
115 | - understanding of the **biological question**
116 | - clarity of presentation of planned analysis and milestones
117 | - **knowledge of the datasets**
118 | - **teamwork** aspects
119 | - **quality** of the oral presentation, slides and poster
120 | - During the oral presentations, each student will be asked to explain part of the analysis, especially to explain the code! So everyone should make sure to be involved in the project.
121 |
122 |
123 | The final script will be submitted (or “committed”) to the GitHub repository of the group as a .Rmd or .ipynb file.
124 |
125 | **Important note**: *Science is collaboration!* So please make sure to share your insights/knowledge with other groups! You are free to choose how to do so, e.g. WhatsApp groups or Slack groups.
126 |
127 | > *Practical aspects*
128 | > Depending on the project, you will use either R (Topics 03/05) or Python (Topics 01/02/04).
129 |
130 |
131 | > *R-based projects*
132 | You will use [RStudio](https://www.rstudio.com/), and create an [R Markdown document](https://rmarkdown.rstudio.com/).
133 |
134 |
135 | - RStudio
136 | For those of you who want to work on their own laptops, you can install RStudio using the previous link; however, since some datasets are rather large, you will need a decent laptop to be able to process the data in a reasonable time (i5/i7 CPU with at least 8 GB RAM)
137 |
138 |
139 | - RMarkdown documents
140 | These consist of a mixture of plain text (explanations about the analysis, comments,…) and pieces of code (called **chunks**). All plots will be automatically and dynamically created from the code chunks in the markdown document.
141 |
142 | > Have a look at [this tutorial](https://rmarkdown.rstudio.com/lesson-1.html) or [this one](https://support.rstudio.com/hc/en-us/articles/205368677-R-Markdown-Dynamic-Documents-for-R) to get started with RMarkdown; RMarkdown documents are very easy to generate with RStudio.
143 |
144 |
145 | > *Python-based projects*
146 | Similar to R Markdown documents, the [Jupyter Notebook](https://jupyter.org/) offers a way to mix markdown text together with Python code. Installing Jupyter Notebook requires the installation of Python. Just follow the instructions on the previous link.
147 |
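To make the idea of mixing text and code more concrete, here is a minimal sketch of what a notebook file looks like on the inside. It assumes the notebook format version 4, in which an `.ipynb` file is a JSON document with a list of cells. The helper `make_notebook` and the cell contents are hypothetical and only serve as an illustration; in practice you will create notebooks through the Jupyter interface, not by hand.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook (format version 4) from (cell_type, source) pairs.

    Hypothetical helper for illustration only.
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally carry an execution counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# One markdown cell (explanation) followed by one code cell (analysis step).
notebook = make_notebook([
    ("markdown", "# Hypothetical analysis\nPlain-text explanation of the next step."),
    ("code", "values = [1, 2, 3]\nprint(sum(values))"),
])

# Saving this text in a file ending in .ipynb gives a notebook that Jupyter can open.
notebook_json = json.dumps(notebook, indent=1)
```

The markdown cell plays the role of the plain text in an R Markdown document, and the code cell plays the role of a chunk.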
148 |
149 | > *GitHub*
150 | Git is a system for managing collaborative projects in which each member of the team contributes to the project. You can check [this website](https://guides.github.com/activities/hello-world/) for a simple intro to Git/GitHub.
151 | Git can be used either from the command line, or using [GitHub Desktop](https://desktop.github.com/), a GUI manager which makes committing changes, etc… very easy.
152 |
153 |
154 | This tool will help you (and us…) track the progress of your project.
155 |
156 |
157 | Documentation
158 | -------------
159 |
160 | Introduction to GitHub
161 | > [Intro to GitHub](https://youtu.be/tTwftnbWr6E)
162 | > [Push / Pull changes in GitHub](https://youtu.be/dz7x5MoZdbA)
163 | > [Resolve conflicts](https://youtu.be/2P5FM2WTNcQ)
164 |
165 | These Python notebooks, provided by David Schwarzenbacher, give an introduction to Python:
166 | > [Python notebooks](https://hub.dkfz.de/s/cQdQY5F8Lcm2Nkc)
167 |
--------------------------------------------------------------------------------