├── 2022 ├── readme.md ├── week10handout.pdf ├── week12handout.pdf ├── week13handout.pdf ├── week14handout.pdf ├── week15handout.pdf ├── week1handout.pdf ├── week3handout.pdf ├── week4handout.pdf ├── week5handout.pdf ├── week6handout.pdf ├── week7handout.pdf └── week8handout.pdf ├── 2024 ├── Week 1 │ └── week1handout.pdf ├── Week 10 │ ├── basic LaTeX document.zip │ ├── week10assignment.md │ └── week10handout.pdf ├── Week 11 │ ├── week11assignment.md │ └── week11handout.pdf ├── Week 12 │ ├── big_project.zip │ ├── week12assignment.md │ └── week12handout.pdf ├── Week 13 │ ├── big_project_1.zip │ ├── big_project_2.zip │ ├── corpus1.TXT │ ├── corpus2.TXT │ ├── corpus3.TXT │ ├── week13assignment.md │ └── week13upload.pdf ├── Week 14 │ ├── readme-example.md │ ├── week14assignment.md │ └── week14handout.pdf ├── Week 15 │ └── week15handout.pdf ├── Week 2 │ ├── .Rhistory │ ├── code_APR15.r │ ├── week2assignment.md │ └── week2handout.pdf ├── Week 3 │ ├── code_APR22.r │ ├── week3assignment.md │ └── week3handout.pdf ├── Week 4 │ ├── code_APR29.r │ ├── week4assignment.md │ └── week4handout.pdf ├── Week 5 │ ├── code_MAY06.r │ ├── week5assignment.md │ └── week5handout.pdf ├── Week 6 │ ├── code_MAY13.r │ ├── week6assignment.md │ └── week6handout.pdf ├── Week 8 │ ├── code_MAY27.r │ ├── week8.pdf │ ├── week8assignment.md │ └── week8handout.pdf ├── Week 9 │ ├── quarto-demo.zip │ ├── week9assignment.md │ └── week9handout.pdf └── readme.md ├── 2025 ├── Week 1 │ ├── assignment01.md │ └── week01handout.pdf ├── Week 2 │ ├── assignment02.md │ ├── week02.R │ └── week02handout.pdf ├── Week 3 │ ├── assignment03.R │ ├── assignment03.md │ ├── week03.R │ └── week03handout.pdf ├── Week 4 │ ├── assignment04.R │ ├── assignment04.md │ ├── week04.R │ └── week04handout.pdf ├── Week 5 │ ├── assignment05.R │ ├── assignment05.md │ ├── week05.R │ └── week05handout.pdf ├── Week 6 │ ├── assignment06.R │ └── week06handout.pdf ├── Week 7 │ ├── assignment07.R │ ├── week07.R │ └── week07handout.pdf ├── Week 8 │ ├── assignment08.md │ ├── week08.R │ └── week08handout.pdf ├── Week 9 │ ├── assignment09.md │ ├── quarto_demo.zip │ └── week09handout.pdf └── readme.md ├── R_tutorial ├── .gitignore ├── R_tutorial.Rproj ├── readme.md ├── rsconnect │ └── documents │ │ └── tutorial.Rmd │ │ └── shinyapps.io │ │ └── anna-pryslopska │ │ └── TidyversePractice.dcf ├── tutorial.Rmd └── tutorial.html └── readme.md /2022/readme.md: -------------------------------------------------------------------------------- 1 | # Course description 2 | 3 | This seminar provides a gentle, hands-on introduction to the essential tools for quantitative research for students of the humanities. During the course of the seminar, the students will familiarize themselves with a wide array of software that is rarely taught but is invaluable in developing an efficient, transparent, reusable, and scalable research workflow. From text file, through data visualization, to creating beautiful reports - this course will empower students to improve their skill and help them establish good practices. 4 | 5 | The seminar is targeted at students with little to no experience with programming, who wish to improve their workflow, learn the basics of data handling and document typesetting, prepare for a big project (such as a BA or MA thesis), and learn about scientific project management. 
6 | 7 | Materials: laptop with internet access 8 | 9 | ## Syllabus 10 | 11 | | Week | Topic | Description | Slides | 12 | | --:| -- | -- | -- | 13 | | 1 | Introduction, course overview and software installation | Course overview and expectations, classroom management and assignments/grading etc. | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week1handout.pdf) (some pages missing for privacy) | 14 | | 2 | | *no class* | | 15 | | 3 | Looking at data | Data types, encoding, R and RStudio, installing and loading packages, importing data | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week3handout.pdf) | 16 | | 4 | Reading data, data inspection and manipulation | Basic operators, importing, sorting, filtering, subsetting, removing missing data, data manipulation | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week4handout.pdf) | 17 | | 5 | Data manipulation and error handling | Data manipulation (merging, summary statistics, grouping, if ... else ..., etc.), naming variables, pipelines, documentation, tidy code, error handling and getting help | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week5handout.pdf) | 18 | | 6 | Data visualization | Visualizing in R (`ggplot2`, `esquisse`), choice of visualization, plot types, best practices, exporting plots and data | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week6handout.pdf) | 19 | | 7 | Creating reports with RMarkdown and `knitr` | Pandoc, RMarkdown, basic syntax and elements, export, document, and chunk options, documentation | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week7handout.pdf) | 20 | | 8 | Typesetting documents with LaTeX | What is LaTeX, basic document and file structure, advantages and disadvantages, from R to LaTeX | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week8handout.pdf) | 21 | | 9 | | *no class* | | 22 | | 10 | Typesetting documents with LaTeX | Editing text (commands, whitespace, environments, font properties, figures, and tables), glosses, IPA symbols, semantic formulae, syntactic trees | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week10handout.pdf) | 23 | | 11 | | *no class* | | 24 | | 12 | Typesetting documents with LaTeX and bibliography management | Large projects, citations, references, bibliography styles, bib file structure | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week12handout.pdf) | 25 | | 13 | Literature and reference management, common command line commands | Reference managers, looking up literature, text editors, command line commands (grep, diff, ping, cd, etc.)
| [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week13handout.pdf) (some pages missing for privacy) | 26 | | 14 | Version control and Git | Git, GitHub, version control | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week14handout.pdf) | 27 | | 15| Version control and Git, course wrap-up | Git, GitHub, SSH, reverting to older versions | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2022/week15handout.pdf) | 28 | -------------------------------------------------------------------------------- /2022/week10handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week10handout.pdf -------------------------------------------------------------------------------- /2022/week12handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week12handout.pdf -------------------------------------------------------------------------------- /2022/week13handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week13handout.pdf -------------------------------------------------------------------------------- /2022/week14handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week14handout.pdf -------------------------------------------------------------------------------- /2022/week15handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week15handout.pdf -------------------------------------------------------------------------------- /2022/week1handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week1handout.pdf -------------------------------------------------------------------------------- /2022/week3handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week3handout.pdf -------------------------------------------------------------------------------- /2022/week4handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week4handout.pdf -------------------------------------------------------------------------------- /2022/week5handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week5handout.pdf -------------------------------------------------------------------------------- /2022/week6handout.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week6handout.pdf -------------------------------------------------------------------------------- /2022/week7handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week7handout.pdf -------------------------------------------------------------------------------- /2022/week8handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2022/week8handout.pdf -------------------------------------------------------------------------------- /2024/Week 1/week1handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 1/week1handout.pdf -------------------------------------------------------------------------------- /2024/Week 10/basic LaTeX document.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 10/basic LaTeX document.zip -------------------------------------------------------------------------------- /2024/Week 10/week10assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 10 2 | 3 | Create a basic XeLaTeX document for the Noisy channel experiment (as on page 29 in the handout and the provided LaTeX files). Upload the resulting files to ILIAS. Use the scientific document structure. You DON'T need to include: 4 | 5 | - references, 6 | - abstract, 7 | - tables, 8 | - pictures, 9 | - lists, 10 | - numbered examples, 11 | 12 | You DO need to: 13 | 14 | - describe the experiment in detail, 15 | - include a description of the phenomenon, 16 | - describe the sentences. 17 | -------------------------------------------------------------------------------- /2024/Week 10/week10handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 10/week10handout.pdf -------------------------------------------------------------------------------- /2024/Week 11/week11assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 11 2 | 3 | Re-create the Moses illusion report in LaTeX (don't worry about proper citations). Upload all the files needed for compiling your report (obligatorily the TEX file, but also plots) and the PDF report file. 4 | 5 | - Include at least one table and one figure of the data (with captions). 6 | - Reference and hyperlink the table and figure in the text. 7 | - Create one list (via itemize, enumerate, or exe). 8 | - Make one gloss (e.g. translate a question from the experiment to your native language). 9 | - Make one syntactic tree. 10 | - Make one semantic formula. 11 | - Preserve the scientific article structure (Background, methods, results, etc.). 12 | - Include a table of contents.
13 | -------------------------------------------------------------------------------- /2024/Week 11/week11handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 11/week11handout.pdf -------------------------------------------------------------------------------- /2024/Week 12/big_project.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 12/big_project.zip -------------------------------------------------------------------------------- /2024/Week 12/week12assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 12 2 | 3 | Continue writing the Moses illusion report: make a style file and add a reference section. Upload your TEX, PDF, BIB, image, and style files. 4 | 5 | 1. Put all packages you're loading in a separate style file and load it. 6 | 2. Add 10 different but relevant references to your report, among them 7 | - at least one journal `@article` 8 | - at least one book `@book` 9 | - at least one part of a book `@incollection` 10 | - at least one thesis `@thesis` 11 | 3. Reference all the citations in the text, so that there is at least one of each of these: 12 | - as a parenthetical reference 13 | - as a textual reference 14 | - reference only the author 15 | - reference only the publication year 16 | - reference only the title 17 | - reference it without a citation but include in bibliography 18 | 4. Sort the entries by name, year, and title 19 | 5. Use the `authoryear` or `apa` reference style -------------------------------------------------------------------------------- /2024/Week 12/week12handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 12/week12handout.pdf -------------------------------------------------------------------------------- /2024/Week 13/big_project_1.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 13/big_project_1.zip -------------------------------------------------------------------------------- /2024/Week 13/big_project_2.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 13/big_project_2.zip -------------------------------------------------------------------------------- /2024/Week 13/week13assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 13 2 | 3 | Upload the solution as a single file of your choosing. 4 | 5 | - Find the changes I made in the big project (files: `big_project_1.zip` vs `big_project_2.zip`) → I didn’t compile it so the PDF will look the same. 6 | - How many times does the word "Tagblatt" appear in the files `corpus1.txt`, `corpus2.txt`, and `corpus3.txt`? Count only the lines. 7 | - Count all the lines and instances where the whole article "die" appears in these 3 files. Capitalization is not important, i.e.
Die OK, Dieser not OK 8 | - What are the differences between `corpus2.txt` and `corpus3.txt`? 9 | -------------------------------------------------------------------------------- /2024/Week 13/week13upload.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 13/week13upload.pdf -------------------------------------------------------------------------------- /2024/Week 14/readme-example.md: -------------------------------------------------------------------------------- 1 | # Big project 2 | 3 | What is this? This project is authored by Anna Prysłopska. 4 | 5 | This is my Big Project. This can be my BA thesis, a novel, the next big app that Facebook will buy (I don't subscribe to their rebranding), or anything else. 6 | 7 | ## Table of Contents 8 | - [Installation](#installation) 9 | - [Usage](#usage) 10 | - [Configuration](#configuration) 11 | - [Examples](#examples) 12 | - [Contributing](#contributing) 13 | - [Contact](#contact) 14 | 15 | ## Installation 16 | 17 | You need LaTeX for this. 18 | 19 | ## Usage 20 | 21 | Read it. 22 | 23 | ## Configuration 24 | 25 | Some details. 26 | 27 | ## Examples 28 | 29 | Just read it. If you cannot read, this documentation will not help you. 30 | 31 | ## Contributing 32 | 33 | Pull requests are welcome or not. 34 | 35 | ## Contact 36 | 37 | It's Anna Prysłopska, Wikipedia, and ChatGPT. -------------------------------------------------------------------------------- /2024/Week 14/week14assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 14 2 | 3 | Upload a plain text file about how far you got with #2. 4 | 5 | 1. Make a new git repository for the project we worked on this semester (Moses illusion, noisy channel, and the related files). Add a readme file, then add the R files, the Quarto files, and the LaTeX files AS INDIVIDUAL COMMITS. Include only the necessary files (omit redundant files). You should have at least 4 commits. Write meaningful commit messages. 6 | 2. 
Attempt this exercise and report on how far you got with it: [Bandit](https://overthewire.org/wargames/bandit/bandit0.html) 7 | -------------------------------------------------------------------------------- /2024/Week 14/week14handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 14/week14handout.pdf -------------------------------------------------------------------------------- /2024/Week 15/week15handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 15/week15handout.pdf -------------------------------------------------------------------------------- /2024/Week 2/.Rhistory: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 2/.Rhistory -------------------------------------------------------------------------------- /2024/Week 2/code_APR15.r: -------------------------------------------------------------------------------- 1 | # Lecture 15 APR 2024 ----------------------------------------------------- 2 | # Installing and loading packages 3 | install.packages("dplyr") 4 | library(dplyr) 5 | 6 | # Working directory 7 | setwd() # FIXME remember to add your path! 8 | getwd() 9 | 10 | # First function call 11 | print("Hello world!") 12 | 13 | # Assignment 14 | ten <- 10.2 # works 15 | "rose" -> Rose # works 16 | name = "Anna" # works 17 | true <<- FALSE # works 18 | 13/12 ->> n # doesn't work 19 | 13/12 ->> nrs # works 20 | 21 | # Homework 22 | # 1. Change the layout and color theme of RStudio. 23 | # 2. Make and upload a screenshot of your RStudio installation. 24 | # 3. Install and load the packages: tidyverse, knitr, MASS, and psych 25 | # 4. Write a code that prints a long text (~30 words) and save it to a variable. 26 | # 5. Upload your code to ILIAS. 27 | 28 | # Session information 29 | sessionInfo() 30 | -------------------------------------------------------------------------------- /2024/Week 2/week2assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 2 2 | 3 | Upload 2 files to complete this assignment. 4 | 5 | ## Part 1/file 1 (image) 6 | 7 | - Change the layout and color theme of RStudio. 8 | - Make and upload a screenshot of your RStudio installation. 9 | 10 | ## Part 2/file 2 (r script) 11 | 12 | Upload an R script that does all of the following: 13 | 14 | - Install and load the packages: tidyverse, knitr, MASS, and psych 15 | - Prints a long text (~30 words) and saves it to a variable. 
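A minimal sketch of what such a script could look like. The package names are the ones listed above; the text itself and the variable name are placeholders, since the assignment leaves both up to you:

```r
# Install the required packages (only needed once per machine)
install.packages(c("tidyverse", "knitr", "MASS", "psych"))

# Load them into the current session
library(tidyverse)
library(knitr)
library(MASS)
library(psych)

# Save a long text (~30 words) to a variable, then print it
# (the wording and the variable name are placeholders)
long_text <- "This placeholder sentence simply keeps going until it reaches roughly thirty words, because the assignment asks for a fairly long text that is stored in a single character variable."
print(long_text)
```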
16 | -------------------------------------------------------------------------------- /2024/Week 2/week2handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 2/week2handout.pdf -------------------------------------------------------------------------------- /2024/Week 3/code_APR22.r: -------------------------------------------------------------------------------- 1 | # Lecture 22 APR 2024 ----------------------------------------------------- 2 | library(tidyverse) 3 | library(psych) 4 | 5 | # Data types 6 | log <- TRUE 7 | int <- 1L 8 | dbl <- 1.0 9 | cpx <- 1+0i 10 | chr <- "one" 11 | nan <- NaN 12 | inf <- Inf 13 | ninf <- -Inf 14 | mis <- NA 15 | ntype <- NULL 16 | 17 | # Data type exercises 18 | # = is for assignment; == is for equality 19 | log == int 20 | log == 10 21 | int == dbl 22 | dbl == cpx 23 | cpx == chr 24 | chr == nan 25 | nan == inf 26 | inf == ninf 27 | ninf == mis 28 | mis == mis 29 | mis == ntype 30 | ninf == ntype 31 | 32 | 33 | 5L + 2 34 | 3.7 * 3L 35 | 99999.0e-1 - 3.3e+3 36 | 10 / as.complex(2) 37 | as.character(5) / 5 38 | 39 | # Removing from the environment 40 | x <- "bad" 41 | rm(x) 42 | 43 | # Moses illusion data 44 | moses <- read_csv("moses.csv") 45 | moses 46 | print(moses) 47 | print(moses, n=Inf) 48 | View(moses) 49 | head(moses) 50 | head(as.data.frame(moses)) 51 | tail(as.data.frame(moses), n = 20) 52 | spec(moses) 53 | summary(moses) 54 | describe(moses) 55 | colnames(moses) -------------------------------------------------------------------------------- /2024/Week 3/week3assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 3 2 | 3 | Please complete the following tasks. Submit the assignment as a single R script. Use comments and sections to give your file structure. 4 | 5 | ## According to R, what is the type of the following 6 | 7 | "Anna" 8 | -10 9 | FALSE 10 | 3.14 11 | as.logical(1) 12 | 13 | ## According to R, is the following true 14 | 15 | 7+0i == 7 16 | 9 == 9.0 17 | "zero" == 0L 18 | "cat" == "cat" 19 | TRUE == 1 20 | 21 | ## What is the output of the following operations and why? 22 | 23 | 10 < 1 24 | 5 != 4 25 | 5 - FALSE 26 | 1.0 == 1 27 | 4 *9.1 28 | "a" + 1 29 | 0/0 30 | b* 2 31 | (1-2)/0 32 | 10 <- 20 33 | NA == NA 34 | -Inf == NA 35 | 36 | ## Read and inspect the `noisy.csv` data 37 | 38 | What are the meaningful columns? What should be kept and what can be discarded? 
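One way to start the inspection is sketched below. It assumes `noisy.csv` sits in the working directory and that the `tidyverse` package is installed; which columns turn out to be meaningful depends on what the exported file actually contains:

```r
library(tidyverse)

# Assumes noisy.csv is in the current working directory
noisy <- read_csv("noisy.csv")

head(noisy)      # first few rows
spec(noisy)      # how each column was parsed
summary(noisy)   # per-column summaries
colnames(noisy)  # candidate columns to keep or discard
```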
39 | 40 | (anonymized data tdb) 41 | -------------------------------------------------------------------------------- /2024/Week 3/week3handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 3/week3handout.pdf -------------------------------------------------------------------------------- /2024/Week 4/code_APR29.r: -------------------------------------------------------------------------------- 1 | # Lecture 29 APR 2024 ----------------------------------------------------- 2 | library(tidyverse) 3 | library(psych) 4 | 5 | # Moses illusion data 6 | moses <- read_csv("moses.csv") 7 | 8 | ## Clean up data ----------------------------------------------------------- 9 | # Task 1: Rename and drop columns 10 | moses.ren <- 11 | rename(moses, 12 | ID = MD5.hash.of.participant.s.IP.address, 13 | ANSWER = Value) 14 | 15 | moses.sel <- 16 | select(moses.ren, c(ITEM, CONDITION, ANSWER, ID, 17 | Label, Parameter)) 18 | 19 | # Task 2: Remove missing values 20 | moses.na <- na.omit(moses.sel) 21 | 22 | # Task 3: Remove unnecessary rows 23 | moses.fil <- 24 | filter(moses.na, 25 | Parameter == "Final", 26 | Label != "instructions", 27 | CONDITION %in% 1:2) 28 | 29 | # Task 4: Sort the values 30 | moses.arr <- arrange(moses.fil, ITEM, CONDITION, 31 | desc(ANSWER)) 32 | 33 | # Task 5: Re-code item number 34 | moses.it <- mutate(moses.arr, ITEM = as.numeric(ITEM)) 35 | head(moses.it, n=20) 36 | 37 | # Task 6: Look at possible answers 38 | uk <- unique(select(filter(moses.it, ITEM == 2), ANSWER)) 39 | 40 | ## Noisy channel data ------------------------------------------------------ 41 | noisy <- read_csv("noisy.csv") 42 | 43 | noisy |> 44 | rename(ID = MD5.hash.of.participant.s.IP.address) |> 45 | select(ID, 46 | Label, 47 | PennElementType, 48 | Parameter, 49 | Value, 50 | ITEM, 51 | CONDITION, 52 | Reading.time, 53 | Sentence..or.sentence.MD5.) |> 54 | view() 55 | 56 | # Session information 57 | sessionInfo() 58 | -------------------------------------------------------------------------------- /2024/Week 4/week4assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 4 2 | 3 | Please complete the following tasks. Submit the assignment as a single R script. Use comments and sections to give your file structure. 4 | 5 | ## What do the following evaluate to and why? 6 | 7 | !TRUE 8 | FALSE + 0 9 | 5 & TRUE 10 | 0 & TRUE 11 | 1 - FALSE 12 | FALSE + 1 13 | 1 | FALSE 14 | FALSE | NA 15 | 16 | ## Have the moses.csv data saved in your environment as "moses". Why do the following fail? 17 | 18 | Summary(moses) 19 | read_csv(moses.csv) 20 | tail(moses, n==10) 21 | describe(Moses) 22 | filter(moses, CONDITION == 102) 23 | arragne(moses, ID) 24 | mutate(moses, ITEMS = as.character("ITEM")) 25 | 26 | ## Clean up the Moses illusion data like we did in the tasks in class and save it to a new data frame. Use pipes instead of saving each step to a new data frame. 27 | 28 | - select relevant columns 29 | - rename mislabeled columns 30 | - remove missing data 31 | - remove unnecessary rows 32 | - change the item column to numeric values 33 | - arrange by item, condition, and answer 34 | 35 | ## From the Moses illusion data, make two new variables (printing and dont.know, respectively) with all answers which are supposed to mean "printing (press) and "don't know". 36 | 37 | ## Preprocess noisy channel data. 
38 | 39 | Make two data frames: for reading times and for acceptability judgments. 40 | 41 | ### Acceptability ratings 42 | 43 | - rename the ID column and column with the rating 44 | - only data from the experiment (see `Label`) and where `PennElementType` IS "Scale" 45 | - make sure the column with the rating data is numeric 46 | - select the relevant columns: participant ID, item, condition, rating 47 | - remove missing values 48 | 49 | ### Reading times 50 | 51 | - rename the ID column and column with the full sentence 52 | - only data from the experiment (see `Label`) 53 | - only data where `PennElementType` IS NOT "Scale" or "TextInput" 54 | - only data from where Reading.time is not "NULL" (as a string) 55 | - make a new column with reading times as numeric values 56 | - keep only those rows with realistic reading times (between 80 and 2000 ms) 57 | - select relevant columns: participant ID, item, condition, sentence, reading time, and Parameter 58 | - remove missing values 59 | 60 | ## Solve the logic exercise from the slides. 61 | -------------------------------------------------------------------------------- /2024/Week 4/week4handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 4/week4handout.pdf -------------------------------------------------------------------------------- /2024/Week 5/code_MAY06.r: -------------------------------------------------------------------------------- 1 | # Lecture 6 May 2024 ------------------------------------------------------ 2 | library(tidyverse) 3 | library(psych) 4 | 5 | ## Homework --------------------------------------------------------------- 6 | 7 | moses <- read_csv("moses.csv") 8 | 9 | moses <- 10 | moses |> 11 | rename(ID = MD5.hash.of.participant.s.IP.address, 12 | ANSWER = Value) |> 13 | select(ITEM, CONDITION, ANSWER, ID, Label, Parameter) |> 14 | na.omit() |> 15 | filter(Parameter == "Final", 16 | Label != "instructions", 17 | CONDITION%in%1:2) |> 18 | arrange(ITEM, CONDITION, desc(ANSWER)) |> 19 | mutate(ITEM = as.numeric(ITEM)) 20 | 21 | 22 | 23 | moses |> select(RESPONSE) |> arrange(RESPONSE) |> unique() 24 | dont.know <- c("Don't Know", "Don't know", "don't knoe", "don't know", 25 | "don`t know", "dont know", "don´t know", "i don't know", 26 | "Not sure", "no idea", "forgotten", "I do not know", 27 | "I don't know") 28 | 29 | moses |> filter(ITEM == 108) |> select(RESPONSE) |> unique() 30 | printing <- c("Print", "printer", "Printing", "Printing books", "printing press", 31 | "press", "Press", "letter press", "letterpress", "Letterpressing", 32 | "inventing printing", "inventing the book press/his bibles", 33 | "finding a way to print books", "for inventing the pressing machine", 34 | "Drucka", "Book print", "book printing", "bookpress", "Buchdruck", 35 | "the book printer") 36 | 37 | 38 | 39 | noisy <- read_csv("noisy.csv") 40 | noisy.rt <- 41 | noisy |> 42 | rename(ID = "MD5.hash.of.participant.s.IP.address", 43 | SENTENCE = "Sentence..or.sentence.MD5.") |> 44 | mutate(RT = as.numeric(Reading.time)) |> 45 | filter(Label == "experiment", 46 | PennElementType != "Scale", 47 | PennElementType != "TextInput", 48 | Reading.time != "NULL", 49 | RT > 80 & RT < 2000) |> 50 | select(ID, ITEM, CONDITION, SENTENCE, RT, Parameter) |> 51 | na.omit() 52 | 53 | noisy.aj <- 54 | noisy |> 55 | filter(Label == "experiment", 56 | PennElementType == "Scale") |> 57 | mutate(RATING = 
as.numeric(Value), 58 | ID = "MD5.hash.of.participant.s.IP.address") |> 59 | select(ID, ITEM, CONDITION, RATING) |> 60 | na.omit() 61 | 62 | 63 | ## Naming ----------------------------------------------------------------- 64 | 65 | ueOd2FNRGAP0dRopq4OqU <- 1:10 66 | ueOd2FNRGAPOdRopq4OqU <- c("Passport", "ID", "Driver's license") 67 | ueOb2FNRGAPOdRopq4OqU <- FALSE 68 | ueOd2FNRGAPOdRopq4OqU <- 5L 69 | 70 | He_just.kept_talking.in.one.long_incredibly.unbroken.sentence.moving_from.topic_to.topic <- 1 71 | 72 | ## Joining, cleaning, grouping, summarizing ------------------------------- 73 | moses <- read_csv("moses_clean.csv") # Look, I overwrote the previous 'moses' variable! 74 | questions <- read_csv("questions.csv") 75 | 76 | # Task 1 77 | data_with_answers <- 78 | moses |> 79 | inner_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 80 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) 81 | 82 | moses |> 83 | full_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 84 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) 85 | 86 | moses |> 87 | merge(questions, by=c("ITEM", "CONDITION", "LIST")) |> 88 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) 89 | 90 | # Task 2 91 | moses |> 92 | inner_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 93 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) |> 94 | mutate(ACCURATE = ANSWER == CORRECT_ANSWER) 95 | 96 | # Task 3 97 | moses |> 98 | inner_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 99 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) |> 100 | mutate(ACCURATE = ifelse(CORRECT_ANSWER == ANSWER, 101 | yes = "correct", 102 | no = ifelse(ANSWER == "dont_know", 103 | yes = "dont_know", 104 | no = "incorrect"))) 105 | 106 | # Task 4 107 | moses |> 108 | inner_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 109 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) |> 110 | mutate(ACCURATE = ifelse(CORRECT_ANSWER == ANSWER, 111 | yes = "correct", 112 | no = ifelse(ANSWER == "dont_know", 113 | yes = "dont_know", 114 | no = "incorrect")), 115 | CONDITION = case_when(CONDITION == '1' ~ 'illusion', 116 | CONDITION == '2' ~ 'no illusion', 117 | CONDITION == '100' ~ 'good filler', 118 | CONDITION == '101' ~ 'bad filler')) 119 | 120 | # Task 5 121 | moses |> 122 | full_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 123 | select(ITEM, CONDITION, ID, ANSWER, CORRECT_ANSWER, LIST) |> 124 | mutate(ACCURATE = ifelse(CORRECT_ANSWER == ANSWER, 125 | yes = "correct", 126 | no = ifelse(ANSWER == "dont_know", 127 | yes = "dont_know", 128 | no = "incorrect")), 129 | CONDITION = case_when(CONDITION == '1' ~ 'illusion', 130 | CONDITION == '2' ~ 'no illusion', 131 | CONDITION == '100' ~ 'good filler', 132 | CONDITION == '101' ~ 'bad filler')) |> 133 | group_by(ITEM, ACCURATE) |> 134 | summarise(Count = n()) -------------------------------------------------------------------------------- /2024/Week 5/week5assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 5 2 | 3 | ## Read and preprocess the new Moses illusion data (`moses_clean.csv`) 4 | 5 | 1. Calculate the percentage of "correct", "incorrect", and "don't know" answers in the two critical conditions. 6 | 2. Of all the questions in all conditions, which question was the easiest and which was the hardest? 7 | 3. Of the Moses illusion questions, which question fooled most people? 8 | 4. Which participant was the best in answering questions? Who was the worst? 
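Task 1 boils down to a percentage-per-group computation. As a reminder of the general pattern, here is a self-contained sketch on the built-in `iris` data (deliberately not on the experiment data, so it does not give the answer away):

```r
library(tidyverse)

# Percentage of each Species among flowers with long vs. short sepals.
# Demonstrated on the built-in iris data, not the experiment data; the same
# group_by() + summarise() + mutate() steps apply to counting answer types
# per condition.
iris |>
  mutate(LONG_SEPAL = Sepal.Length > 6) |>
  group_by(LONG_SEPAL, Species) |>
  summarise(count = n(), .groups = "drop_last") |>
  mutate(percentage = count / sum(count) * 100)
```

Swapping in the cleaned Moses data frame and the condition and accuracy columns built in class gives the percentages asked for in task 1.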
9 | 10 | ## Read and inspect the updated new noisy channel data (`noisy_rt.csv` and `noisy_aj.csv`). 11 | 12 | 1. **Acceptability judgment data:** Calculate the mean rating in each condition. How was the data spread out? Did the participants rate the sentences differently? 13 | 2. **Reading times:** calculate the average length people spent reading each sentence fragment in each condition. Did the participant read the sentences differently in each condition? 14 | 3. Make one data frame out of both data frames. Keep all the information but remove redundancy. 15 | -------------------------------------------------------------------------------- /2024/Week 5/week5handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 5/week5handout.pdf -------------------------------------------------------------------------------- /2024/Week 6/code_MAY13.r: -------------------------------------------------------------------------------- 1 | library(tidyverse) 2 | 3 | # Homework ---------------------------------------------------------------- 4 | # Task 1.1: Calculate the percentage of "correct", "incorrect", 5 | # and "don't know" answers in the two critical conditions. 6 | 7 | moses <- read_csv("moses_clean.csv") 8 | questions <- read_csv("questions.csv") 9 | 10 | moses.preprocessed <- 11 | moses |> 12 | inner_join(questions, by=c("ITEM", "CONDITION", "LIST")) |> 13 | select(ID, ITEM, CONDITION, QUESTION, ANSWER, CORRECT_ANSWER) |> 14 | mutate(ACCURATE = ifelse(CORRECT_ANSWER == ANSWER, 15 | yes = "correct", 16 | no = ifelse(ANSWER == "dont_know", 17 | yes = "dont_know", 18 | no = "incorrect")), 19 | CONDITION = case_when(CONDITION == '1' ~ 'illusion', 20 | CONDITION == '2' ~ 'no illusion', 21 | CONDITION == '100' ~ 'good filler', 22 | CONDITION == '101' ~ 'bad filler')) 23 | 24 | moses.preprocessed |> 25 | filter(CONDITION %in% c('illusion', 'no illusion')) |> 26 | group_by(CONDITION, ACCURATE) |> 27 | summarise(count = n()) |> 28 | mutate(percentage = count / sum(count) * 100) 29 | 30 | # Task 1.2: Of all the questions in all conditions, which 31 | # question was the easiest and which was the hardest? 32 | 33 | minmax <- 34 | moses.preprocessed |> 35 | group_by(ITEM, QUESTION, ACCURATE) |> 36 | summarise(count = n()) |> 37 | mutate(CORRECT_ANSWERS = count / sum(count) * 100) |> 38 | arrange(CORRECT_ANSWERS) |> 39 | filter(ACCURATE == "correct") 40 | 41 | head(minmax, 2) 42 | tail(minmax, 2) 43 | 44 | minmax |> 45 | filter(CORRECT_ANSWERS == min(minmax$CORRECT_ANSWERS) | 46 | CORRECT_ANSWERS == max(minmax$CORRECT_ANSWERS)) 47 | 48 | # Task 1.3: Of the Moses illusion questions, which question fooled most people? 49 | 50 | moses.preprocessed |> 51 | group_by(ITEM, CONDITION, QUESTION, ACCURATE) |> 52 | summarise(count = n()) |> 53 | mutate(CORRECT_ANSWERS = count / sum(count) * 100) |> 54 | filter(CONDITION == 'illusion', 55 | ACCURATE == "incorrect") |> 56 | arrange(CORRECT_ANSWERS) |> 57 | print(n=Inf) 58 | 59 | # Task 1.4: Which participant was the best in answering questions? 60 | # Who was the worst? 
61 | 62 | moses.preprocessed |> 63 | group_by(ID, ACCURATE) |> 64 | summarise(count = n()) |> 65 | mutate(CORRECT_ANSWERS = count / sum(count) * 100) |> 66 | filter(ACCURATE == "correct") |> 67 | arrange(CORRECT_ANSWERS) |> 68 | print(n=Inf) 69 | 70 | # Task 2.1 71 | noisy_aj <- read.csv("noisy_aj.csv") 72 | noisy_aj |> 73 | group_by(CONDITION) |> 74 | summarise(MEAN_RATING = mean(RATING), 75 | SD = sd(RATING)) 76 | 77 | # Task 2.2 78 | noisy_rt <- read.csv("noisy_rt.csv") 79 | noisy_rt |> 80 | group_by(IA, CONDITION) |> 81 | summarise(MEAN_RT = mean(RT), 82 | SD = sd(RT)) 83 | 84 | # Task 2.3 85 | 86 | noisy <- noisy_aj |> 87 | full_join(noisy_rt) 88 | # full_join(noisy_rt, by=c("ID", "ITEM", "CONDITION")) |> head() 89 | 90 | # Lecture 13 May 2024 ------------------------------------------------------ 91 | 92 | # Noisy data preparation 93 | noisy <- read_csv("noisy.csv") 94 | noisy.rt <- 95 | noisy |> 96 | rename(ID = "MD5.hash.of.participant.s.IP.address", 97 | SENTENCE = "Sentence..or.sentence.MD5.") |> 98 | mutate(RT = as.numeric(Reading.time)) |> 99 | filter(Label == "experiment", 100 | PennElementType != "Scale", 101 | PennElementType != "TextInput", 102 | Reading.time != "NULL", 103 | RT > 80 & RT < 2000) |> 104 | select(ID, ITEM, CONDITION, SENTENCE, RT, Parameter) |> 105 | na.omit() 106 | 107 | # Plotting 108 | # Data summary with 1 row per observation 109 | noisy.summary <- 110 | noisy.rt |> 111 | group_by(ITEM, CONDITION, Parameter) |> 112 | summarise(RT = mean(RT)) |> 113 | group_by(CONDITION, Parameter) |> 114 | summarise(MeanRT = mean(RT), 115 | SD = sd(RT)) |> 116 | rename(IA = Parameter) 117 | 118 | # Plot object 119 | noisy.summary |> 120 | ggplot() + 121 | aes(x=as.numeric(IA), y=MeanRT, colour=CONDITION) + 122 | geom_line() + 123 | geom_point() + 124 | facet_wrap(.~CONDITION) + 125 | stat_sum() + 126 | # geom_errorbar(aes(ymin=MeanRT-2*SD, ymax=MeanRT+2*SD)) + 127 | coord_polar() + 128 | theme_classic() + 129 | labs(x = "Interest area", 130 | y = "Mean reading time in ms", 131 | title = "Noisy channel data", 132 | subtitle = "Reading times only", 133 | caption = "Additional caption", 134 | colour="Condition", 135 | size = "Count") 136 | 137 | # Esquisse 138 | library(esquisse) 139 | set_i18n("de") # Set language to German 140 | esquisser() 141 | -------------------------------------------------------------------------------- /2024/Week 6/week6assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 6 2 | 3 | **Make 10 plots overall.** 4 | 5 | 9 plots should visualize the data from the two in-class experiments. These plots should follow the WCOG guidelines and show different aspects of the data (e.g. only one condition, only one interest area) Do not make 3 plots that show the same thing, e.g. three times the mean acceptability rating between conditions. 6 | 7 | - 3 plots for the Moses illusion data (line, point, and bar), 8 | - 3 plots for the noisy channel reading time data (line, point, and bar), and 9 | - 3 plots for the noisy channel acceptability rating data (line, point, and bar). 10 | 11 | You can use hybrid plots as well. 12 | 13 | The last plot can be based on any dataset you want and be in any shape you want. It has to be ugly, unreadable, and violate as many WCOG guidelines as it can. 14 | 15 | If you use a dataset outside of the two experiments in class, please upload it with your script file. 
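For the bar plots, one possible `ggplot2` skeleton is sketched below on the built-in `iris` data so that it runs on its own; the same structure (summarise first, then `geom_col()` with informative labels) carries over to the class datasets:

```r
library(tidyverse)

# Bar plot of a per-group mean, with readable labels and a title.
# Demonstrated on the built-in iris data; swap in a summary of the
# experiment data for the actual assignment plots.
iris |>
  group_by(Species) |>
  summarise(MeanPetalLength = mean(Petal.Length)) |>
  ggplot() +
  aes(x = Species, y = MeanPetalLength, fill = Species) +
  geom_col() +
  theme_minimal() +
  labs(x = "Species",
       y = "Mean petal length (cm)",
       title = "Skeleton for a bar plot of group means")
```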
16 | -------------------------------------------------------------------------------- /2024/Week 6/week6handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 6/week6handout.pdf -------------------------------------------------------------------------------- /2024/Week 8/code_MAY27.r: -------------------------------------------------------------------------------- 1 | # Load necessary packages 2 | library(ggplot2) 3 | library(patchwork) 4 | library(cowplot) 5 | 6 | 7 | # Last homework: minimal and maximal values only -------------------------- 8 | minmax <- 9 | moses.preprocessed |> 10 | group_by(ITEM, QUESTION, ACCURATE) |> 11 | summarise(count = n()) |> 12 | mutate(CORRECT_ANSWERS = count / sum(count) * 100) |> 13 | arrange(CORRECT_ANSWERS) |> 14 | filter(ACCURATE == "correct") 15 | 16 | minmax |> 17 | filter(CORRECT_ANSWERS == min(minmax$CORRECT_ANSWERS) | 18 | CORRECT_ANSWERS == max(minmax$CORRECT_ANSWERS)) 19 | 20 | # Generate plots from the R base 'iris' dataframe ------------------------- 21 | # Find out more abotu the iris dataframe by typing: ?iris 22 | plot1 <- 23 | ggplot(iris) + 24 | aes(x = Sepal.Length, 25 | fill = Species) + 26 | geom_density(alpha = 0.5) + 27 | theme_minimal() 28 | 29 | plot2 <- 30 | ggplot(iris) + 31 | aes(x = Sepal.Length, 32 | y = Sepal.Width, 33 | color = Species) + 34 | geom_point() + 35 | theme_minimal() 36 | 37 | plot3 <- 38 | ggplot(iris) + 39 | aes(x = Species, y = Petal.Width, fill = Species) + 40 | geom_boxplot() + 41 | theme_minimal() 42 | 43 | plot4 <- 44 | ggplot(iris) + 45 | aes(x = Petal.Length, 46 | y = Petal.Width, 47 | colour = Species, 48 | group = Species) + 49 | geom_step() + 50 | theme_minimal() 51 | 52 | 53 | # Patchwork --------------------------------------------------------------- 54 | # Join plots and arrange them in two rows 55 | plots <- (plot1 | plot2 | plot3) / plot4 + plot_layout(nrow = 2) 56 | # Keep all legends together and add annotations 57 | plots + plot_layout(guides = 'collect') + plot_annotation(tag_levels = 'A') 58 | 59 | # Export plots 60 | ggsave("patchwork_plots.png", width=1000, units = "px", dpi=100) 61 | ggsave("patchwork_plots.pdf", dpi=100) 62 | 63 | # Remove last plot 64 | dev.off() 65 | 66 | # Cowplot ----------------------------------------------------------------- 67 | # Join plots, remove all legends, add annotations 68 | plots <- 69 | plot_grid(plot1 + theme(legend.position="none"), 70 | plot2 + theme(legend.position="none"), 71 | plot3 + theme(legend.position="none"), 72 | plot4 + theme(legend.position="none"), 73 | labels = c('A', 'B', 'C', 'D'), 74 | label_size = 12) 75 | # Choose one legend to keep 76 | legend <- 77 | get_legend(plot1 + 78 | guides(color = guide_legend(nrow = 1)) + 79 | theme(legend.box.margin = margin(12, 12, 12, 12))) 80 | # Put the plot and legend together 81 | plot_grid(plots, legend, rel_widths = c(3, .4)) 82 | 83 | # Export plots 84 | ggsave("cowplot_plots.png", height=8, dpi=100) 85 | ggsave("cowplot_plots.pdf", dpi=100) 86 | -------------------------------------------------------------------------------- /2024/Week 8/week8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 8/week8.pdf -------------------------------------------------------------------------------- 
/2024/Week 8/week8assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 8 2 | 3 | Upload 1 file. 4 | 5 | - Upload one PNG or PDF file with the plots from last week's homework in one picture (**not one picture per plot**). Use the packages `patchwork` or `cowplot`. 6 | - Vote for the ugliest plot 7 | - Install Quarto: https://quarto.org/docs/get-started/ 8 | - Watch the introductory video: https://youtu.be/_f3latmOhew?si=xxovQvYkUosC_4uB 9 | -------------------------------------------------------------------------------- /2024/Week 8/week8handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 8/week8handout.pdf -------------------------------------------------------------------------------- /2024/Week 9/quarto-demo.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 9/quarto-demo.zip -------------------------------------------------------------------------------- /2024/Week 9/week9assignment.md: -------------------------------------------------------------------------------- 1 | # Assignment Week 9 2 | 3 | Submit two files in which you report on the Moses illusion experiment: a Quarto file (`.qmd`) and an HTML file (`.html`). The Quarto file should have all the code, text, and markup. The HTML file should look like **a beautiful report with no unnecessary code or output.** Treat it like a report or presentation, but be very brief; this is not a term paper. 4 | 5 | - Keep all the code you need for analyzing and visualizing the data. 6 | - Make at least one table. 7 | - Make at least one list. 8 | - Include at least one plot of the data. 9 | - Reference the table, list, and plot in the report text by hyperlinking/cross-referencing. 10 | - Include the session info in full. 11 | 12 | Your Quarto file should include ALL code needed to generate the data. That means you need to include everything from start to finish, so also the code for **loading the packages and data**. Then all the code for preprocessing and generating tables, plots, etc. 13 | -------------------------------------------------------------------------------- /2024/Week 9/week9handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2024/Week 9/week9handout.pdf -------------------------------------------------------------------------------- /2024/readme.md: -------------------------------------------------------------------------------- 1 | # Digital Research Toolkit for Linguists 2 | 3 | Author: `anna.pryslopska[ AT ]ling.uni-stuttgart.de` 4 | 5 | These are the original materials from the course "Digital Research Toolkit for Linguists taught by me in the Summer Semester 2024 at the University of Stuttgart. 6 | 7 | If you want to replicate this course, you can do so with proper attribution. To replicate the data, follow these links for [Experiment 1](https://farm.pcibex.net/r/CuZHnp/) (full Moses illusion experiment) and [Experiment 2](https://farm.pcibex.net/r/zAxKiw/) (demo of self-paced reading with acceptability judgment). 
8 | 9 | ## Schedule and syllabus 10 | 11 | This is a rough overview of the topics discussed every week. These are subject to change, depending on how the class goes. 12 | 13 | | Week | Topic | Description | Assignments | Materials | 14 | | ---- | ----- | ----------- | ----------- | --------- | 15 | | 1 | Introduction & overview | Course overview and expectations, classroom management and assignments/grading etc. Data collection. | Complete [Experiment 1](https://farm.pcibex.net/p/glQRwV/) and [Experiment 2](https://farm.pcibex.net/p/ceZUkj/) and recruit one more person. [Install R](https://www.r-project.org/) and [RStudio](https://posit.co/download/rstudio-desktop/), install [Texmaker](https://www.xm1math.net/texmaker/) or make an [Overleaf](https://www.overleaf.com/) account. | [Slides](https://github.com/a-nap/DRTfL2024/blob/1e3ac235f6957eaaebf8a19f1889d0b6a6f79fb7/Week%201/week1handout.pdf) | 16 | | 2 | Data, R and RStudio | Intro recap, directories, R and RStudio, installing and loading packages, working with scripts | Read chapters 2, 6 and 7 of [R for Data Science](https://r4ds.hadley.nz/), complete [assignment 1](https://github.com/a-nap/DRTfL2024/blob/main/Week%202/week2assignment.md) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%202/week2handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%202/code_APR15.r) | 17 | | 3 | Reading data, data inspection and manipulation | Looking at your data, data types, importing, making sense of the data, intro to sorting, filtering, subsetting, removing missing data, data manipulation | Read chapters 3, 4 and 5 of [R for Data Science](https://r4ds.hadley.nz/), complete [assignment 2](https://github.com/a-nap/DRTfL2024/blob/main/Week%203/week3assignment.md). | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%203/week3handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%203/code_APR22.r), data | 18 | | 4 | Data manipulation | Basic operators, data manipulation (filtering, sorting, subsetting, arranging), pipelines, tidy code, practice. | Compete [assignment 3](https://github.com/a-nap/DRTfL2024/blob/main/Week%204/week4assignment.md) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%204/week4handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%204/code_APR29.r), data | 19 | | 5 | Data manipulation and error handling | Summary statistics, grouping, merging, if ... else, naming variables, tidy code, error handling and getting help. | [Assignment 4](https://github.com/a-nap/DRTfL2024/blob/main/Week%205/week5assignment.md), read the slides from the QCBS R Workshop Series [*Workshop 3: Introduction to data visualisation with `ggplot2`*](https://r.qcbs.ca/workshop03/pres-en/workshop03-pres-en.html) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%205/week5handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%205/code_MAY06.r) | 20 | | 6 | Data visualization | Communicating with graphics, choice of visualization, plot types, best practices, visualizing in R (`ggplot2`, `esquisse`), exporting plots and data | Complete [assignment 5](https://github.com/a-nap/DRTfL2024/blob/main/Week%206/week6assignment.md). 
If you haven't yet, install the package `esquisse` | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%206/week6handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%206/code_MAY13.r) | 21 | | 7 | No class | Holiday | | | 22 | | 8 | Data visualization | Data visualization recap, best practices, lying with plots, practical exercises, exporting/saving plots and data. | Complete [assignment 6](https://github.com/a-nap/DRTfL2024/blob/main/Week%208/week8assignment.md). Install [Quarto](https://quarto.org/docs/get-started/). Watch [the introductory video](https://www.youtube.com/watch?v=_f3latmOhew) | Slides [large](https://github.com/a-nap/DRTfL2024/blob/main/Week%208/week8.pdf) and [compressed](https://github.com/a-nap/DRTfL2024/blob/main/Week%208/week8handout.pdf), [code](https://github.com/a-nap/DRTfL2024/blob/main/Week%208/code_MAY27.r) | 23 | | 9 | Creating reports with Quarto and knitr | Pandoc, markdown, Quarto, basic syntax and elements, export, document, and chunk options, documentation | Complete [assignment 7](https://github.com/a-nap/DRTfL2024/blob/main/Week%209/week9assignment.md). | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%209/week9handout.pdf), [compressed Quarto files](https://github.com/a-nap/DRTfL2024/blob/main/Week%209/quarto-demo.zip) | 24 | | 10 | Typesetting documents with LaTeX | What is LaTeX, basic document and file structure, advantages and disadvantages, from R to LaTeX | Complete [assignment 8](https://github.com/a-nap/DRTfL2024/blob/main/Week%2010/week10assignment.md), read chapter 2 of [*The Not So Short Introduction to LaTeX*](https://tobi.oetiker.ch/lshort/lshort.pdf). | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2010/week10handout.pdf), [basic LaTeX file (zip)](https://github.com/a-nap/DRTfL2024/blob/main/Week%2010/basic%20LaTeX%20document.zip) | 25 | | 11 | Typesetting documents with LaTeX | Editing text (commands, whitespace, environments, font properties, figures, and tables), glosses, IPA symbols, semantic formulae, syntactic trees | Complete [assignment 9](https://github.com/a-nap/DRTfL2024/blob/main/Week%2011/week11assignment.md), read [*Bibliography management with biblatex*](https://www.overleaf.com/learn/latex/Bibliography_management_with_biblatex) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2011/week11handout.pdf) | 26 | | 12 | Typesetting documents with LaTeX and bibliography management | Large projects, citations, references, bibliography styles, bib file structure | Complete [assignment 10](https://github.com/a-nap/DRTfL2024/blob/main/Week%2012/week12assignment.md) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2012/week12handout.pdf), [big project files](https://github.com/a-nap/DRTfL2024/blob/main/Week%2012/big_project.zip) | 27 | | 13 | Literature and reference management, common command line commands | Reference managers, looking up literature, command line commands (grep, diff, ping, cd, etc.) 
| Complete [assignment 11](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/week13assignment.md) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/week13upload.pdf), [corpus1.txt](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/corpus1.TXT), [corpus2.txt](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/corpus2.TXT), [corpus3.txt](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/corpus3.TXT), [big project 1](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/big_project_1.zip), [big project 2](https://github.com/a-nap/DRTfL2024/blob/main/Week%2013/big_project_2.zip) | 28 | | 14 | Text editors, version control and Git | Text editors, Git, GitHub, version control | Complete [assignment 12](https://github.com/a-nap/DRTfL2024/blob/main/Week%2014/week14assignment.md) | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2014/week14handout.pdf), [example readme file](https://github.com/a-nap/DRTfL2024/blob/main/Week%2014/readme-example.md) | 29 | | 15 | Version control and Git | Git, GitHub, SSH, reverting to older versions | In class assignment | [Slides](https://github.com/a-nap/DRTfL2024/blob/main/Week%2015/week15handout.pdf), [SSH for GitHub video](https://vimeo.com/989393245) | 30 | 31 | ## Recommended reading 32 | 33 | ### Git 34 | 35 | - GitHub Git guide: [`https://github.com/git-guides/`](https://github.com/git-guides/) 36 | - Another git guide: [`http://rogerdudler.github.io/git-guide/`](http://rogerdudler.github.io/git-guide/) 37 | - Git tutorial: [`http://git-scm.com/docs/gittutorial`](http://git-scm.com/docs/gittutorial) 38 | - Another git tutorial: [`https://www.w3schools.com/git/`](https://www.w3schools.com/git/) 39 | - Git cheat sheets: [`https://training.github.com/`](https://training.github.com/) 40 | - Where to ask questions: [Stackoverflow](https://stackoverflow.com) 41 | 42 | ### LaTeX 43 | 44 | - Overleaf (n.d.) *Bibliography management with biblatex*. Accessed: 2024-06-24. URL: [`https://www.overleaf.com/learn/latex/Bibliography_management_with_biblatex`](https://www.overleaf.com/learn/latex/Bibliography_management_with_biblatex) 45 | - Dickinson, Markus and Josh Herring (2008). *LaTeX for Linguists*. Accessed: 2024-06-07. URL: 46 | [`https://cl.indiana.edu/~md7/08/latex/slides.pdf`](https://cl.indiana.edu/~md7/08/latex/slides.pdf). 47 | - LaTeX/Linguistics - Wikibooks (2024). Accessed: 2024-06-07. URL: [`https://en.wikibooks.org/wiki/LaTeX/Linguistics`](https://en.wikibooks.org/wiki/LaTeX/Linguistics). 48 | - Oetiker, Tobias et al. (2023). *The Not So Short Introduction to LATEX*. Accessed: 2024-06-07. URL: 49 | [`https://tobi.oetiker.ch/lshort/lshort.pdf`](https://tobi.oetiker.ch/lshort/lshort.pdf). 50 | 51 | ### Quarto 52 | 53 | - Introductory video: [`https://www.youtube.com/watch?v=_f3latmOhew`](https://www.youtube.com/watch?v=_f3latmOhew) 54 | - Documentation: [`https://quarto.org/docs/get-started/`](https://quarto.org/docs/get-started/) 55 | 56 | ### R 57 | 58 | - QCBS R Workshop Series [`https://r.qcbs.ca/`](https://r.qcbs.ca/) 59 | - Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023). *R for data science: import, tidy, transform, visualize, and model data*. 2nd ed. O’Reilly Media, Inc. URL: [`https://r4ds.hadley.nz/`](https://r4ds.hadley.nz/). 60 | 61 | ### Experiments 62 | 63 | - Free-response: Erickson, Thomas D and Mark E Mattson (1981). “From words to meaning: A semantic illusion”. In: *Journal of Verbal Learning and Verbal Behavior* 20.5, pp. 540–551. 
DOI: [`10.1016/s0022-5371(81)90165-1`](https://www.sciencedirect.com/science/article/abs/pii/S0022537181901651). 64 | - Self-paced reading with acceptability judgments: Gibson, Edward, Leon Bergen, and Steven T Piantadosi (2013). “Rational integration of noisy evidence and prior semantic expectations in sentence interpretation”. In: *Proceedings of the National Academy of Sciences* 110.20, pp. 8051–8056. DOI: [`10.1073/pnas.1216438110`](https://www.pnas.org/doi/full/10.1073/pnas.1216438110). 65 | -------------------------------------------------------------------------------- /2025/Week 1/assignment01.md: -------------------------------------------------------------------------------- 1 | # Assignment 01 2 | 3 | Give one answer to each of the three tasks. 4 | 5 | ## Task 1 6 | 7 | 1. Take part in the class experiment: https://farm.pcibex.net/p/glQRwV/ 8 | 2. Which question did you like most? 9 | 10 | ## Task 2 11 | 12 | 1. Install R: https://cran.r-project.org/ 13 | 2. Install RStudio: https://www.rstudio.com/ 14 | 3. Did you successfully install both? 15 | 16 | ## Task 3 17 | 18 | Check if your data is sold: https://netzpolitik.org/2024/databroker-files-jetzt-testen-wurde-mein-handy-standort-verkauft/ 19 | 20 | Was your data sold? -------------------------------------------------------------------------------- /2025/Week 1/week01handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 1/week01handout.pdf -------------------------------------------------------------------------------- /2025/Week 2/assignment02.md: -------------------------------------------------------------------------------- 1 | # Assignment 02 2 | 3 | Upload 2 files to complete this assignment. 4 | 5 | ## Part 1/file 1 (image) 6 | 7 | - Change the layout and color theme of RStudio. 8 | - Make and upload a screenshot of your RStudio installation. 
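If you also want to double-check the installations from assignment 1 before taking the screenshot, two console commands are enough. This is only a suggestion; the second line assumes the `rstudioapi` package is installed (RStudio usually provides it) and it only works when run inside RStudio:

```r
R.version.string                     # prints the installed R version
rstudioapi::versionInfo()$version    # prints the RStudio version (inside RStudio only)
```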
9 | 10 | ## Part 2/file 2 (r script) 11 | 12 | Upload an R script that does all of the following: 13 | 14 | - Install and load the packages: tidyverse, knitr, patchwork, and psych 15 | - Prints a long text (30-50 words) and saves it to a variable called "long_text" 16 | -------------------------------------------------------------------------------- /2025/Week 2/week02.R: -------------------------------------------------------------------------------- 1 | # Week 02 ----------------------------------------------------------------- 2 | # April 15th 2025 3 | 4 | # Working directory 5 | setwd("path here") # For me, this is "~/Linguistics toolkit course/2025/Code" 6 | getwd() # Show the working directory 7 | 8 | # Packages 9 | install.packages(c("NAME", "ANOTHER NAME")) # Install packages called NAME and ANOTHER NAME 10 | library(NAME) # Load one package at a time 11 | sessionInfo() # Current R session information 12 | detach("package:NAME", unload = TRUE) # Unload the package called NAME -------------------------------------------------------------------------------- /2025/Week 2/week02handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 2/week02handout.pdf -------------------------------------------------------------------------------- /2025/Week 3/assignment03.R: -------------------------------------------------------------------------------- 1 | # Homework assignment week 3 2 | # AUTOGRADER INFORMATION/CAUTION 3 | # Type your answers in the comments next to the word "ANSWER". 4 | # Do not make new lines, delete code, or change the code. This might make the autograder fail. 5 | 6 | # 1. According to R, what is the type of the following variables: 7 | "R" # ANSWER 8 | -10 # ANSWER 9 | FALSE # ANSWER 10 | 3.14 # ANSWER 11 | as.logical(1) # ANSWER 12 | as.numeric(TRUE) # ANSWER 13 | 14 | # 2. According to R, are the following two variables equivalent (yes/no): 15 | 7+0i == 7 # ANSWER 16 | 9 == 9.0 # ANSWER 17 | "zero" == 0L # ANSWER 18 | "cat" == "cat" # ANSWER 19 | TRUE == 1 # ANSWER 20 | 21 | # 3. What is the output of the following operations? If there is an error, what caused it? 22 | -10 > 1 # ANSWER 23 | 5 != 4 # ANSWER 24 | 5 - FALSE # ANSWER 25 | 17.0 == 7 # ANSWER 26 | 4 = 9.1 # ANSWER 27 | 0/0 # ANSWER 28 | "toolkit" + 1 # ANSWER 29 | toolkit = 2 # ANSWER 30 | toolkit * 2 # ANSWER 31 | (1-2)/0 # ANSWER 32 | 10 -> 20 # ANSWER 33 | NA == NA # ANSWER 34 | NA == Inf # ANSWER 35 | 36 | # 4. Create a long text (30-50 words) and save it to a variable called "long_text". 37 | -------------------------------------------------------------------------------- /2025/Week 3/assignment03.md: -------------------------------------------------------------------------------- 1 | # Assignment 03 2 | 3 | Upload 1 file to complete this assignment: assignment03.R 4 | 5 | ## AUTOGRADER INFORMATION/CAUTION 6 | 7 | Type your answers in the comments next to the word "ANSWER" for tasks 1-3. 
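You can sanity-check your reasoning for tasks 1-3 by running similar expressions in the console first. A minimal sketch — the values below are deliberately different from the graded items, so you still have to work those out yourself:

```r
typeof("linguistics")   # "character"
typeof(3L)              # "integer" -- the L suffix makes an integer
typeof(2.5)             # "double"  -- plain numbers are doubles by default

5 == 5.0                # TRUE: both sides are doubles
FALSE == 0              # TRUE: logicals are coerced to numbers for comparison
```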
-------------------------------------------------------------------------------- /2025/Week 3/week03.R: -------------------------------------------------------------------------------- 1 | # Week 03 ----------------------------------------------------------------- 2 | # April 22nd 2025 3 | 4 | library(tidyverse) 5 | library(psych) 6 | 7 | # Data types 8 | typeof(1L) 9 | is.numeric(1) 10 | as.character(1) 11 | 12 | # Printing and assigning 13 | print("Hello World") 14 | jabberwocky <- print("Twas brillig, and the slithy toves did gyre and gimble in the wabe: all mimsy were the borogoves, and the mome raths outgrabe.") 15 | rm(jabberwocky) # Removes the variable "jabberwocky" 16 | 17 | # This operator can be used anywhere 18 | ten <- 10.2 19 | "rose" -> Rose 20 | mean(number <- 10) 21 | 22 | # This operator can be used only at the top level 23 | name = "Anna" 24 | mean(number = 10) # This will not work and cause an error 25 | 26 | # This operator assigns the value (used mainly in functions) 27 | true <<- FALSE 28 | 13/12 ->> n 29 | mean(number <<- 10) 30 | 31 | # Type coercion 32 | TRUE + 1 33 | 5L + 2 34 | 3.7 * 3L 35 | 99999.0e-1 - 3.3e+3 36 | 10 / as.complex(2) 37 | as.character(5) / 5 # This will not work and will cause an error 38 | paste(5+0i, "five") 39 | 40 | # Loading data 41 | getwd() # Check your working directory! 42 | moses <- read_csv("moses_raw_data.csv") 43 | 44 | # Inspecting data 45 | View(moses) 46 | moses 47 | print(moses, n=Inf) 48 | head(moses) 49 | tail(moses, n=20) 50 | spec(moses) 51 | summary(moses) 52 | describe(moses) 53 | colnames(moses) 54 | 55 | # This function calculates the probability of getting exactly 6 successes 56 | # out of 9 tries in a binomial experiment, where each try has a 50% (0.5) 57 | # chance of success. In other words: What are the chances of a fair coin landing 58 | # on "heads" 6 times out of 9 throws. Returns a probability between 0 and 1. 59 | # You can translate probability to percent by multiplying the result by 100 60 | # (so around 16.4%). 61 | dbinom(x=6, size=9, prob=0.5) 62 | 63 | min(moses$EventTime) 64 | max(moses$EventTime) 65 | quantile(moses$EventTime) 66 | colnames(moses) 67 | mean(moses$EventTime) 68 | median(moses$EventTime) 69 | min(moses$EventTime) 70 | max(moses$EventTime) 71 | range(moses$EventTime) 72 | sd(moses$EventTime) 73 | skew(moses$EventTime) 74 | kurtosis(moses$EventTime) # requires the package "moments", which we won't use 75 | mean_se(moses$EventTime) 76 | 77 | # Data cleanup 78 | select(WHERE, WHAT) # Select columns 79 | na.omit(WHERE) # Remove missing values 80 | filter(WHERE, TRUE CONDITION) # Select rows, based on a condition 81 | arrange(WHERE, HOW) # Reorder data by rows 82 | rename(WHERE, NEW = OLD) # Rename columns 83 | mutate(WHERE, NEW = FUNCTION(OLD)) # Create new values 84 | 85 | # Optional plot: Normal distribution with standard deviation lines. 86 | # Feel free to ignore. This code gives you a small preview of data visualization 87 | # That we'll be doing later in the course. 
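# A minimal, self-contained illustration of the placeholder clean-up calls above.
# The Moses CSV is not bundled in this repository, so this sketch uses dplyr's
# built-in `starwars` data (available once the tidyverse is loaded at the top of
# this script) instead of the course data; column names are starwars columns.
select(starwars, name, species, height)          # keep three columns
filter(starwars, species == "Droid")             # keep rows that meet a condition
arrange(starwars, desc(height))                  # sort rows, tallest first
rename(starwars, body_mass = mass)               # rename a column (NEW = OLD)
mutate(starwars, height_m = height / 100)        # create a new column from an old one
na.omit(select(starwars, name, height, mass))    # drop rows with missing values

# Quick sanity check of the dbinom() example above, computed by hand:
choose(9, 6) * 0.5^9   # 84 * (1/512) = 0.1640625, the same value dbinom(6, 9, 0.5) returns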
88 | 89 | # Define custom colors I use for the course 90 | dusk <- "#343643" 91 | pine <- "#476938" 92 | meadow <- "#86B047" 93 | sunshine <- "#DABA2E" 94 | 95 | # Set the mean and standard deviation for the normal distribution 96 | mean_value <- 0 97 | sd_value <- 1 98 | 99 | # Create a sequence of values from -4 to 4 (for plotting the bell curve) 100 | x_values <- seq(-4, 4, length.out = 100) 101 | 102 | # Generate the normal distribution values for those x-values 103 | y_values <- dnorm(x_values, mean = mean_value, sd = sd_value) 104 | 105 | # Create a data frame to use with ggplot 106 | data <- data.frame(x = x_values, y = y_values) 107 | 108 | # Plot the normal distribution curve 109 | ggplot(data, aes(x = x, y = y)) + 110 | geom_line(linewidth=2) + # Line for the bell curve 111 | annotate("text", x = mean_value + sd_value, y = 0.4, label = "68%", color = pine, size = 5) + 112 | annotate("text", x = mean_value + 2*sd_value, y = 0.4, label = "95%", color = meadow, size = 5) + 113 | annotate("text", x = mean_value + 3*sd_value, y = 0.4, label = "99.7%", color = sunshine, size = 5) + 114 | annotate("text", x = mean_value - sd_value, y = 0.4, label = "±1 SD", color = pine, size = 5) + 115 | annotate("text", x = mean_value - 2*sd_value, y = 0.4, label = "±2 SD", color = meadow, size = 5) + 116 | annotate("text", x = mean_value - 3*sd_value, y = 0.4, label = "±3 SD", color = sunshine, size = 5) + 117 | geom_histogram(stat="identity", fill="white", color=dusk)+ # Uncomment this line to see the values 118 | geom_vline(xintercept = mean_value, color = dusk, linetype = "dashed") + # Mean line 119 | geom_vline(xintercept = mean_value + sd_value, color = pine, linetype = "dotted", linewidth=1) + # +1 SD line 120 | geom_vline(xintercept = mean_value - sd_value, color = pine, linetype = "dotted", linewidth=1) + # -1 SD line 121 | geom_vline(xintercept = mean_value + 2*sd_value, color = meadow, linetype = "dashed", linewidth=1) + # +2 SD line 122 | geom_vline(xintercept = mean_value - 2*sd_value, color = meadow, linetype = "dashed", linewidth=1) + # -2 SD line 123 | geom_vline(xintercept = mean_value + 3*sd_value, color = sunshine, linetype = "solid", linewidth=1) + # +3 SD line 124 | geom_vline(xintercept = mean_value - 3*sd_value, color = sunshine, linetype = "solid", linewidth=1) + # -3 SD line 125 | labs(title = "Normal distribution with standard deviation lines", x = "Some variable X", 126 | y = "Density (how much data lies here)", 127 | subtitle="AKA Bell curve with ±1, ±2, ±3 SDs") + 128 | theme_bw() + 129 | theme(panel.grid = element_blank()) # Removes grid lines, because I think they're distracting 130 | 131 | -------------------------------------------------------------------------------- /2025/Week 3/week03handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 3/week03handout.pdf -------------------------------------------------------------------------------- /2025/Week 4/assignment04.R: -------------------------------------------------------------------------------- 1 | ########################################################################### 2 | # Assignment Week 4 3 | ########################################################################### 4 | 5 | # Please complete the following 4 tasks. Submit the assignment as a single R script. 6 | # Use comments and sections to give your file structure. 
I should be able to run 7 | # your script without errors. 8 | 9 | # Task 1 ------------------------------------------------------------------ 10 | # Clean up the Moses illusion data like we did in the tasks in class and save it 11 | # to a new data frame. 12 | # - select relevant columns 13 | # - rename mislabeled columns 14 | # - remove missing data 15 | # - remove unnecessary rows 16 | # - arrange by condition, and answer 17 | 18 | 19 | 20 | 21 | 22 | 23 | # Task 2 ------------------------------------------------------------------ 24 | # Have the mosesdata saved in your environment as "moses". 25 | # Why do these functions not work as intended? Fix the code and explain what was 26 | # wrong. 27 | ### IMPORTANT ############################################################# 28 | # Type your answers in the comments next to the word "ANSWER". 29 | 30 | read_csv(moses.csv) # ANSWER 31 | tail(moses, n==10) # ANSWER 32 | Summary(moses) # ANSWER 33 | describe(Moses) # ANSWER 34 | filter(moses, CONDITION == 102) # ANSWER 35 | arragne(moses, ID) # ANSWER 36 | 37 | 38 | # Task 3 ------------------------------------------------------------------ 39 | # From the Moses illusion data, make two new variables (called 'nobel' and 40 | # 'valentines', respectively) with all answers which are supposed to mean 41 | # "Nobel Prize" and "Valentines Day". You will have to figure out which ITEM ID 42 | # corresponds to the questions asking about Nobel Prize and Valentines Day. 43 | # Tip: The questions always come in pairs, so ITEM ID 1 will be present in 44 | # CONDITION 1 and 2. You want to look at both conditions in this assignment. 45 | # Try to figure out which item IDs you need by previewing the data first. 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | # Task 4 ------------------------------------------------------------------ 55 | # Logic exercise from the slides 56 | # Your world has four individuals: octopus, dolphin, llama, and parrot. 57 | # Octopus and dolphin are of the type 'dive', because they can dive. 58 | # Llama and dolphin are of the type 'mammal', because they are mammals. 59 | # Type your answers as a string. For example: 60 | 61 | octopus_dolphin = "dive" 62 | llama_dolphin = "mammal" 63 | octopus_parrot = "!mammal" 64 | 65 | ### IMPORTANT ############################################################# 66 | # Write your answers in between the quotation marks, as in the examples above. 67 | octopus = "" 68 | dolphin = "" 69 | llama = "" 70 | parrot = "" 71 | llama_parrot = "" 72 | parrot_dolphin = "" 73 | llama_octopus_parrot = "" 74 | octopus_llama_dolphin = "" 75 | dolphin_parrot_octopus = "" 76 | octopus_dolphin_llama_parrot = "" 77 | exclude_all = "" 78 | -------------------------------------------------------------------------------- /2025/Week 4/assignment04.md: -------------------------------------------------------------------------------- 1 | # Assignment 04 2 | 3 | Upload 1 file to complete this assignment: assignment04.R 4 | 5 | ## AUTOGRADER INFORMATION/CAUTION 6 | 7 | Please complete the following 4 tasks. Submit the assignment as a single R script. Use comments and sections to give your file structure. I should be able to run your script without errors. 
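For task 3 of the R script, the general pattern is: filter on the item column, keep both critical conditions, and store the result under the required name. A minimal, runnable sketch — the toy data frame and the item ID used here are made up, so you still need to find the real item IDs by previewing the data:

```r
library(tidyverse)

# Toy stand-in for the cleaned Moses data; your real data frame and IDs will differ
toy <- tibble(ITEM      = c(1, 1, 2, 2),
              CONDITION = c(1, 2, 1, 2),
              ANSWER    = c("Moses", "Noah", "two", "don't know"))

# Keep one item in both critical conditions and save it under the required name
nobel <- filter(toy, ITEM == 1, CONDITION %in% c(1, 2))
```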
-------------------------------------------------------------------------------- /2025/Week 4/week04.R: -------------------------------------------------------------------------------- 1 | # Week 04 ----------------------------------------------------------------- 2 | # April 29th 2025 3 | 4 | # Check your working directory 5 | getwd() 6 | 7 | # Load the necessary packages and data 8 | library(tidyverse) 9 | moses <- read_csv("moses_raw_data.csv") 10 | moses 11 | 12 | # Renaming columns 13 | rename(moses, 14 | ID = MD5.hash.of.participant.s.IP.address, 15 | ANSWER = Value) 16 | 17 | # Selecting columns in 3 ways. 18 | select(moses, ID, ITEM, CONDITION, ANSWER) 19 | select(moses, c(ID, ITEM, CONDITION, ANSWER)) 20 | select(moses, c(ID, ITEM:ANSWER)) 21 | 22 | # Printing values 23 | 10 < 1 24 | print(10 < 1) 25 | c(10 < 1) 26 | cat(10 < 1) 27 | 28 | # Removing missing values 29 | na.omit(moses) 30 | na.omit(moses$Item) 31 | na.omit(moses[ , "Item"]) 32 | na.omit(moses[ , 4]) 33 | 34 | # Filtering rows 35 | # Only condition 1 36 | filter(moses, CONDITION == 1) 37 | filter(moses, CONDITION %in% 1) 38 | filter(moses, CONDITION >= 1 & CONDITION < 2) # CONDITION is at least 1 but less than 2 39 | # Conditions 1 and 2 40 | filter(moses, CONDITION == 1 | CONDITION == 2) # CONDITION is either 1 or 2. 41 | filter(moses, CONDITION %in% 1:2) # CONDITION is in the set {1, 2}. Here, the set is a range from 1 to 2. 42 | filter(moses, CONDITION < 100) # CONDITION is less than 100 43 | filter(moses, CONDITION %in% c(1, 2)) # Same syntax as above, but also works for character vectors. 44 | # The next function behaves unexpectedly. R tries to recycle values here. It compares CONDITION[1] == 1, CONDITION[2] == 2, etc. 45 | # So this does not check if CONDITION is 1 or 2, and can lead to confusing or incorrect results. 46 | filter(moses, CONDITION == 1:2) 47 | 48 | # Arranging rows 49 | arrange(moses, ITEM) 50 | arrange(moses, ITEM, CONDITION) 51 | arrange(moses, -ID) # ID is in decreasing order 52 | arrange(moses, desc(is.na(ANSWER))) # ANSWER is in decreasing order 53 | 54 | # Unique values 55 | unique(moses$ANSWER) # Show all the different values in ANSWER without repetitions 56 | unique(select(moses, ANSWER)) # Same as above 57 | print(unique(select(moses, ANSWER)), n=Inf) # Same as above, but print everything (= up to a max. value of infinity) 58 | 59 | # Data cleanup 60 | # To get all these values, I used a combination of selecting columns, filtering rows, 61 | # getting unique values, and arranging them in a reasonable way. 
For the homework 62 | # assignment, you will need to figure out which ITEM ID 63 | cant_answer <- c("Can't Answer", "Can't answer", 64 | "Can't answer the question", "Can't answrer", 65 | "Can't be answered", "Can´t answer", "i can't answer", 66 | "can't andwer" , "can't answer" , 67 | "can't answer (Nobel is given by Norway)", "can't asnwer", 68 | "can't know", "can`t answer", "can`t asnwer" , 69 | "cant answer", "can´t answ", "can´t answer", "no answer") 70 | -------------------------------------------------------------------------------- /2025/Week 4/week04handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 4/week04handout.pdf -------------------------------------------------------------------------------- /2025/Week 5/assignment05.R: -------------------------------------------------------------------------------- 1 | ########################################################################### 2 | # Assignment Week 5 3 | ########################################################################### 4 | 5 | # Please complete the following 3 tasks. Submit the assignment as a single R script. 6 | # Use comments and sections to give your file structure. I should be able to run 7 | # your script without errors. 8 | 9 | 10 | # Task 1 ------------------------------------------------------------------ 11 | # Using pipes, clean up the original, Moses illusion raw data like we did in 12 | # the tasks in class and save it to a new data frame. 13 | # - select relevant columns 14 | # - rename mislabeled columns 15 | # - remove missing data 16 | # - remove unnecessary rows 17 | # - arrange by condition, and answer 18 | # - re-code the ITEM column as a number 19 | # For this task 1, use the original, not preprocessed data: moses_raw_data.csv. 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | # Task 2 ------------------------------------------------------------------ 30 | # Using if-else or case_when statements, change the CONDITION column to have 31 | # more descriptive names for conditions instead of numbers. 32 | # Condition 1 are the Moses illusion questions. 33 | # Condition 2 are control questions, which have a single, predefined correct answer. 34 | # Condition 100 are good filler questions, which have a single, predefined correct answer. 35 | # Condition 101 are bad filler questions, which have no correct answer. 36 | # For this task, use the preprocessed data and questions: moses_clean.csv and questions.csv 37 | # If you save the result of this exercise as a new variable, you can use it for 38 | # the next next exercise. 39 | 40 | 41 | 42 | 43 | 44 | # Task 3 ------------------------------------------------------------------ 45 | # Calculate the percentage of "correct", "incorrect", and "don't know" answers 46 | # in the two critical conditions (think about which conditions these are). 47 | # Include the code for answering these questions. 48 | # For this task, use the preprocessed data and questions: moses_clean.csv and questions.csv 49 | # (i.e. the data frame you ) 50 | 51 | # Task 3A ----------------------------------------------------------------- 52 | # Of all the questions in all conditions, which question was the easiest and 53 | # which was the hardest? 54 | 55 | 56 | # Task 3B ------------------------------------------------------------------ 57 | # Of the Moses illusion questions, which question fooled most people? 
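# Illustration only (not a graded task): the `cant_answer` vector defined at the end
# of week04.R is the kind of lookup you can use here to collapse messy free-text
# responses into a single "dont_know" category before counting answers. A small,
# self-contained toy version of that step, with made-up answers and a shortened lookup:
library(tidyverse)   # if not already loaded for the tasks above
toy_answers <- tibble(ANSWER = c("Switzerland", "can't answer", "cant answer", "Israel"))
toy_lookup  <- c("can't answer", "cant answer")   # stand-in for the full cant_answer vector
toy_answers |>
  mutate(ANSWER = ifelse(ANSWER %in% toy_lookup, "dont_know", ANSWER))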
58 | 59 | 60 | # Task 3C ------------------------------------------------------------------ 61 | # Which participant was the best in answering questions? Who was the worst? 62 | -------------------------------------------------------------------------------- /2025/Week 5/assignment05.md: -------------------------------------------------------------------------------- 1 | # Assignment 05 2 | 3 | Complete and upload the file assignment05.R Remember to rename your file with your name and assignment number. -------------------------------------------------------------------------------- /2025/Week 5/week05.R: -------------------------------------------------------------------------------- 1 | # Week 05 ----------------------------------------------------------------- 2 | # May 6th 2025 3 | library(tidyverse) 4 | library(psych) 5 | 6 | moses <- read_csv("moses_raw_data.csv") 7 | 8 | # Mutating ---------------------------------------------------------------- 9 | # Note: the code from line 13 will not work, because I didn't assign the mutated 10 | # data frame anywhere. 11 | 12 | mutate(moses, CLASS = TRUE) # Make a new column 13 | mutate(moses, NUMBER = 1:20596) # Make a new column 14 | mutate(moses, NUMBERS = NUMBER + 1) # Calculate a new column from existing one 15 | mutate(moses, NUMBER1 = NUMBER == 1) # Evaluate column 16 | mutate(moses, NUMBER = as.character(NUMBER)) # Overwrite column 17 | mutate(moses, NUMBER1 = NULL) # Remove column 18 | 19 | # Pipes ------------------------------------------------------------------- 20 | 21 | moses |> 22 | rename(ANSWER = Value, 23 | ID = MD5.hash.of.participant.s.IP.address) |> 24 | select(ID, ITEM, CONDITION, ANSWER) |> 25 | na.omit() |> 26 | filter(CONDITION != 0) |> 27 | mutate(ITEM = as.numeric(ITEM)) |> 28 | arrange(ITEM, CONDITION) |> 29 | unique() 30 | 31 | # Joins ------------------------------------------------------------------- 32 | moses <- read_csv("moses_clean.csv") 33 | questions <- read_csv("questions.csv") 34 | 35 | 36 | # If else statements ------------------------------------------------------ 37 | 38 | moses |> 39 | mutate(ACCURATE = ifelse(test = CORRECT_ANSWER == 40 | ANSWER, yes = TRUE, no = FALSE)) 41 | 42 | moses |> 43 | mutate(ACCURATE = ifelse(CORRECT_ANSWER == ANSWER, 44 | "correct", "incorrect")) 45 | 46 | 47 | # Case when --------------------------------------------------------------- 48 | 49 | moses |> 50 | mutate(CONDITION = case_when( 51 | CONDITION == '1' ~ 'illusion', 52 | CONDITION == '2' ~ 'no illusion', 53 | CONDITION == '100' ~ 'good filler', 54 | CONDITION == '101' ~ 'bad filler') 55 | ) 56 | 57 | 58 | moses |> 59 | # Code for joins and if-else statements omitted for brevity 60 | mutate(ACCURATE = case_when( 61 | ANSWER == CORRECT_ANSWER ~ "correct", 62 | ANSWER != "dont_know" ~ "incorrect", 63 | TRUE ~ ANSWER)) 64 | 65 | # Task 6 66 | moses.clean |> 67 | group_by(ITEM, ACCURATE) |> 68 | summarise(Count = n()) 69 | -------------------------------------------------------------------------------- /2025/Week 5/week05handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 5/week05handout.pdf -------------------------------------------------------------------------------- /2025/Week 6/assignment06.R: -------------------------------------------------------------------------------- 1 | ########################################################################### 2 | # 
Assignment Week 6 3 | ########################################################################### 4 | 5 | # The goal of this homework is to preprocess the noisy illusion data and compute summary statistics. 6 | 7 | # Meet the data ----------------------------------------------------------- 8 | 9 | # This is a NEW experiment data that you haven't see yet. It's about the 10 | # noisy channel effect and it was done with the same software that I used 11 | # for the Moses illusion experiment. 12 | 13 | # The noisy channel: Humans understand language even in noisy environments 14 | # and can recover meaning from imperfect utterances. 15 | # Semantic cues can pull a comprehender towards plausible meanings, 16 | # but too much noise makes comprehenders switch to the literal 17 | # interpretation. 18 | 19 | # In this study, participants read sentences bit by bit and the goal was to see 20 | # whether one kind of sentence caused people to read for longer (indicating 21 | # comprehension issues). 22 | # There were two kinds of sentences 23 | # • The cook baked Lucy a cake. = grammatical sentence 24 | # • The cook baked a cake Lucy. = ungrammatical sentence 25 | 26 | # !!!!!!!!!!! The reading time is in milliseconds !!!!!!!!!!!!!!!!!!! 27 | 28 | # You can read more about the effect here: 29 | # Gibson et al. (2013). “Rational integration of noisy evidence and prior semantic 30 | # expectations in sentence interpretation”. In: Proceedings of the 31 | # National Academy of Sciences 110.20, pp. 8051–8056. DOI: 32 | # 10.1073/pnas.1216438110. 33 | 34 | # Task 1 ------------------------------------------------------------------ 35 | 36 | # Using pipes, clean up the data like we did in class. 37 | # Save it to a new data frame. 38 | # !!!!!!!!!!! I WAS EVIL AND BROKE THE DATA IN SOME WAYS !!!!!!!!!!! 39 | # You need to think about what kind of data is even possible (e.g. what values 40 | # can reading time even take?). 41 | 42 | # • select relevant columns 43 | # • rename mislabeled columns 44 | # • remove missing data 45 | # • remove unnecessary rows 46 | # • arrange by condition, and reading time 47 | # • re-code the columns to the appropriate types 48 | # • make new columns if needed 49 | 50 | # Hint: Preview the data first in at least 2 or 3 ways to check what nonsense 51 | # I did and what the relevant columns may be. 
52 | 53 | # Task 2 ------------------------------------------------------------------ 54 | 55 | # Calculate for each sentence type: 56 | # • the average reading time 57 | # • the standard deviation 58 | # • the minimal reading time 59 | # • the maximal reading time 60 | 61 | # Task 3 ------------------------------------------------------------------ 62 | 63 | # Calculate for each participant: 64 | # • the average reading time 65 | # • the standard deviation 66 | # • the minimal reading time 67 | # • the maximal deviation 68 | 69 | # Hint: you can reuse the code from Task 2 -------------------------------------------------------------------------------- /2025/Week 6/week06handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 6/week06handout.pdf -------------------------------------------------------------------------------- /2025/Week 7/assignment07.R: -------------------------------------------------------------------------------- 1 | ########################################################################### 2 | # Assignment Week 7 3 | ########################################################################### 4 | 5 | # The goal of this homework is to preview and plot the noisy channel and moses illusion data. 6 | # Preprocess the data as in the previous assignments. You can reuse your code 7 | # or use the solution provided in class. You can use the esquisse package for 8 | # creating plots but then you must copy the code to each task. 9 | 10 | # Task 1: NOISY CHANNEL --------------------------------------------------- 11 | # Create a bar plot with all the reading times (histogram). 12 | 13 | 14 | # Task 2: NOISY CHANEL ---------------------------------------------------- 15 | 16 | # Create a point plot that shows the mean reading times on each phrase (or interest area). 17 | 18 | # Task 3: NOISY CHANNEL --------------------------------------------------- 19 | 20 | # Create a line plot that shows the mean reading times on each phrase (or interest area) 21 | # for each condition. 22 | 23 | # Task 4: MOSES ILLUSION -------------------------------------------------- 24 | 25 | # Create the ugliest plot you can think of of the MOSES ILLUSION data. 26 | # Ensure that it BREAKS all POUR principles. 27 | 28 | -------------------------------------------------------------------------------- /2025/Week 7/week07.R: -------------------------------------------------------------------------------- 1 | # Week 07 ----------------------------------------------------------------- 2 | # May 20th 2025 3 | # Learn more about the datasets used for these expercises: 4 | # https://anna-pryslopska.shinyapps.io/TidyversePractice/#section-introduction 5 | 6 | library(tidyverse) 7 | 8 | # Exercise 1 -------------------------------------------------------------- 9 | # Create a ggplot object from the `mtcars` data. 10 | 11 | ggplot(data = mtcars) 12 | 13 | # Exercise 2 -------------------------------------------------------------- 14 | # Create a ggplot object from the `iris` data. This time, use the pipe operator. 15 | 16 | iris |> 17 | ggplot() 18 | 19 | # Exercise 3 -------------------------------------------------------------- 20 | # Create a ggplot object from the `mtcars` data and put the horsepower on the x axis 21 | # and the miles per gallon on the y axis. 
22 | 23 | ggplot(data = mtcars) + aes(x = hp, y = mpg) 24 | 25 | # Exercise 4 -------------------------------------------------------------- 26 | # Create a ggplot object from the `iris` data and put the petal length on the x axis 27 | # and petal width y axis. This time, use the pipe operator. 28 | 29 | iris |> 30 | ggplot() + 31 | aes(x = Petal.Length, y = Petal.Width) 32 | 33 | # Exercise 5 -------------------------------------------------------------- 34 | # Create a ggplot object from the `mtcars` data and put the horsepower on the x axis 35 | # and the miles per gallon on the y axis. 36 | # Then add the geometry to make it a point plot. 37 | 38 | ggplot(data = mtcars) + 39 | aes(x = hp, y = mpg) + 40 | geom_point() 41 | 42 | # Exercise 6 -------------------------------------------------------------- 43 | # Create a ggplot object from the `iris` data and put the petal length on the x axis. 44 | # Then add the geometry to make it a bar plot. This time, use the pipe operator. 45 | 46 | iris |> 47 | ggplot() + 48 | aes(x = Petal.Length) + 49 | geom_bar() 50 | 51 | # Exercise 7 -------------------------------------------------------------- 52 | # Create a ggplot object from the `iris` data and put the petal length on the x axis 53 | # and petal width y axis. This time, use the pipe operator and make it a column plot. 54 | 55 | iris |> 56 | ggplot() + 57 | aes(x = Petal.Length, y = Petal.Width) + 58 | geom_col() 59 | 60 | # Exercise 8 -------------------------------------------------------------- 61 | # Create a ggplot object from the `airquality` data from May only. 62 | # Put the day of the month on the x axis and the temperature on the y axis. 63 | # Add first a column geometry and then a line geometry. Use the pipe operator. 64 | 65 | airquality |> 66 | filter(Month == 5) |> 67 | ggplot() + 68 | aes(x = Day, y = Temp) + 69 | geom_col() + 70 | geom_line() 71 | 72 | # Exercise 9 -------------------------------------------------------------- 73 | # Create a ggplot object from the `airquality`. Change the month column from 74 | # numbers to characters. Put the temperature on the x axis and the ozone values 75 | # on the y axis and create a column plot. Use the pipe operator. 76 | 77 | airquality |> 78 | mutate(Month = as.character(Month)) |> 79 | ggplot() + 80 | aes(x = Temp, 81 | y = Ozone, 82 | fill = Month) + 83 | geom_col() 84 | 85 | # Exercise 10 ------------------------------------------------------------- 86 | # Create a bar ggplot from the `iris` data. Put the petal length on the y axis. 87 | # Group the data by petal width. Then change the color of the bars to the default gradient. 88 | 89 | iris |> 90 | ggplot() + 91 | aes(y = Petal.Length, 92 | group = Petal.Width, 93 | fill = Petal.Width) + # Map (or assign) the color of the fill based on Petal.Width 94 | geom_bar() + 95 | scale_fill_gradient() 96 | 97 | # Exercise 11 ------------------------------------------------------------- 98 | # Create a point plot with a smoothing line from the `iris` data. Put the petal 99 | # length on the x axis and petal width on the y axis. Group the data by species. 100 | # Then change the color of the points to pink, orchid and purple. 
101 | 102 | iris |> 103 | ggplot() + 104 | aes(x = Petal.Length, 105 | y = Petal.Width, 106 | group = Species, 107 | color = Species) + # Map (or assign) the color of the points based on Petal.Width 108 | geom_point() + 109 | scale_color_manual(values = c("pink", "orchid", "purple")) + 110 | geom_smooth() 111 | 112 | # Exercise 12 ------------------------------------------------------------- 113 | # Create a point ggplot from the `iris` data. Put the petal length on the x axis 114 | # and petal width on the y axis. Group the data by species. Then change the shape 115 | # (default) AND color of the points (to pink, orchid and purple). 116 | 117 | iris |> 118 | ggplot() + 119 | aes(x = Petal.Length, 120 | y = Petal.Width, 121 | group = Species, 122 | color = Species, 123 | shape = Species) + # Assign the shapes of the point based on the values in Species 124 | geom_point() + 125 | scale_color_manual(values = c("pink", "orchid", "purple")) 126 | -------------------------------------------------------------------------------- /2025/Week 7/week07handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 7/week07handout.pdf -------------------------------------------------------------------------------- /2025/Week 8/assignment08.md: -------------------------------------------------------------------------------- 1 | # Assignment 08 2 | 3 | In this task, you will export data relating to the Moses illusion and the noisy channel data sets. Upload a total of THREE files: PNG, PDF, and CSV. 4 | 5 | Use the packages patchwork or cowplot to export the plots. Each file should consist plots of different kinds (6 plots overall in 2 files; e.g. no two point plots in file 1, no two bar plots in file 2, etc.). 6 | 7 | ## Task 1 8 | 9 | Upload one PNG file with three plots visualizing the Moses illusion data. Use the tree different types of plot we spoke about in week 6 (or hybrid plots). 10 | 11 | 1. Did you fall for the illusion? Show the correct answers per question type (% of correct answers in the illusion, control, and each filler type conditions). 12 | 2. How did you do on the questions? Show the average accuracy per participant ONLY in two conditions: illusion and control (% of correct vs. incorrect answers; exclude don't knows). 13 | 3. Which questions fooled the most people? Show the average accuracy per participant ONLY in two conditions: illusion and control (% of correct vs. incorrect answers; exclude don't knows). 14 | 15 | ## Task 2 16 | 17 | Export the preprocessed noisy channel data to CSV. Only clean the data, do not calculate summary statistics. You should have the preprocessing code already from a previous homework. 18 | 19 | ## Task 3 20 | 21 | Upload one PDF file with three plots visualizing the cleaned and preprocessed noisy channel data. Use the tree different types of plot we spoke about in week 6 (or hybrid plots). 22 | 23 | 1. Were all sentences created equal? Show whether there is difference in the total reading times for the whole sentence between all conditions. 24 | 2. Did readers correct errors on the fly? Show the average reading times per sentence segment (also called interest area or IA) in both conditions. 25 | 3. Were some participants fast or slow readers? Show the total reading time per participant. 26 | 27 | ## Task 4 28 | 29 | Vote for the ugliest and least accessible plot. 
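The export mechanics for tasks 1-3 are compact. A minimal sketch using patchwork, with placeholder plots built from ggplot2's built-in `mpg` data and a placeholder data frame standing in for your cleaned noisy channel data — replace all of these with your own objects and file names:

```r
library(tidyverse)
library(patchwork)

# Placeholder plots so the sketch runs on its own -- use your own three plots instead
p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point()
p2 <- ggplot(mpg, aes(class)) + geom_bar()
p3 <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()

combined <- p1 / (p2 + p3)   # one plot on top, two side by side underneath
ggsave("task1_moses_plots.png", combined, width = 20, height = 15, units = "cm", dpi = 150)
ggsave("task3_noisy_plots.pdf", combined, width = 20, height = 15, units = "cm")

noisy_clean <- mpg           # placeholder for your preprocessed noisy channel data
write_csv(noisy_clean, "task2_noisy_channel_clean.csv")
```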
30 | 31 | ## Task 5 32 | 33 | - Install Quarto: https://quarto.org/docs/get-started/ 34 | - Watch the introductory video: https://youtu.be/_f3latmOhew?si=xxovQvYkUosC_4uB 35 | -------------------------------------------------------------------------------- /2025/Week 8/week08.R: -------------------------------------------------------------------------------- 1 | # Week 08 ----------------------------------------------------------------- 2 | # May 27th 2025 3 | 4 | library(tidyverse) 5 | library(patchwork) 6 | 7 | # Plot 1 ------------------------------------------------------------------ 8 | my.plot1 <- 9 | iris |> 10 | ggplot() + 11 | aes(x = Petal.Length, 12 | y = Petal.Width, 13 | group = Species, 14 | color = Species, 15 | shape = Species) + 16 | geom_point() + 17 | scale_color_manual(values = c("pink", "orchid", "purple")) + 18 | theme_light() + 19 | labs(x = "Petal length", 20 | y = "Petal width", 21 | title = "Orchid petal comparison", 22 | subtitle = "Petal length and width in cm") 23 | 24 | # Plot 2 ------------------------------------------------------------------ 25 | my.plot2 <- 26 | iris |> 27 | ggplot() + 28 | aes(x = Petal.Length) + 29 | geom_histogram(fill = "#112446") + 30 | theme_light() + 31 | xlim(0, 8) + 32 | labs(x = "Petal length (cm)", 33 | y = "Count", 34 | title = "Petal distribution") 35 | 36 | # Plot 3 ------------------------------------------------------------------ 37 | my.plot3 <- 38 | iris |> 39 | ggplot() + 40 | aes(x = Species, 41 | y = Sepal.Width, 42 | group = Species, 43 | fill = Species) + 44 | geom_boxplot() + 45 | scale_fill_manual(values = c("pink", "orchid", "purple")) + 46 | theme_light() + 47 | labs(x = "Sepal length", 48 | y = "Sepal width", 49 | title = "Orchid sepal comparison", 50 | subtitle = "Sepal length and width in cm") 51 | 52 | my.plot1 53 | my.plot2 54 | my.plot3 55 | 56 | # Export data ------------------------------------------------------------- 57 | write_csv(iris, "iris.csv") 58 | write_tsv(iris, "learning_data/iris.tsv") 59 | write_delim(iris, "iris.txt", delim=";") 60 | 61 | # Export plots ------------------------------------------------------------ 62 | ggsave("iris1.png", width=10, height=10, units = "cm", dpi=150) 63 | ggsave(plot=my.plot2, "iris2.svg", width=10, height=10, units = "cm", dpi=150) 64 | 65 | all.my.plots <- 66 | (my.plot1 + my.plot3) / my.plot2 + 67 | plot_annotation( 68 | tag_levels = 'A', 69 | title = 'All of my orchid plots', 70 | caption = 'Disclaimer: None of these plots are particularly insightful' 71 | ) + 72 | plot_layout(guides = 'collect') 73 | 74 | ggsave("iris3.pdf", width=20, height=15, units = "cm", dpi=150) 75 | -------------------------------------------------------------------------------- /2025/Week 8/week08handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 8/week08handout.pdf -------------------------------------------------------------------------------- /2025/Week 9/assignment09.md: -------------------------------------------------------------------------------- 1 | # Assignment 09 2 | 3 | Create a quarto document with the collected homework assignments from Week 1 to Week 8 (inclusive). Make sections for each week and subsections for each task. Include the figures and print the plots where needed. I should be able to knit aka create the report on my computer. 4 | 5 | Your quarto document should contain: 6 | 7 | 1. 
The week numbers from Week 1 to Week 8 (inclusive). 8 | 2. The task numbers (if multiple) 9 | 3. The task descriptions (in plain text or as code, as applicable) 10 | 4. The solution code as code 11 | 5. Plots code, where applicable 12 | 6. Images, where applicable 13 | 7. Your session info. 14 | 15 | **Please upload two files: the quarto QMD file and an exported PDF file.** You **don't need** to include external images (e.g. screenshots) and data files, but you **do need** to include code for making plots. I will be able to see them in the PDF report. 16 | 17 | Remember to name your files with your name and assignment number. 18 | -------------------------------------------------------------------------------- /2025/Week 9/quarto_demo.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 9/quarto_demo.zip -------------------------------------------------------------------------------- /2025/Week 9/week09handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/a-nap/Digital-Research-Toolkit/a77600100bb012f00ea4329d576d3e40501eea01/2025/Week 9/week09handout.pdf -------------------------------------------------------------------------------- /2025/readme.md: -------------------------------------------------------------------------------- 1 | # Digital Research Toolkit for Linguists 2 | 3 | Author: `anna.pryslopska[ AT ]ling.uni-stuttgart.de` 4 | 5 | These are the original materials from the course "Digital Research Toolkit for Linguists taught by me in the Summer Semester 2025 at the University of Stuttgart. 6 | 7 | If you want to replicate this course, you can do so with proper attribution. To replicate the data, follow [this link for the experiment](https://farm.pcibex.net/r/CuZHnp/) (full Moses illusion experiment). 8 | 9 | ## Schedule and syllabus 10 | 11 | This is a rough overview of the topics discussed every week. These are subject to change, depending on how the class goes. 
12 | 13 | | Week | Date | Topic | Description | Assignments | Materials | 14 | | ---- | ----- | ----------- | ----------- | --------- | --------- | 15 | | 1 | 08.04 | Course intro, data | General information, syllabus, data security | [Assignment 1](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%201/assignment01.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%201/week01handout.pdf) | 16 | | 2 | 15.04 | Data, R, RStudio | Data sources, directories, R and RStudio, installing and loading packages, working with scripts | [Assignment 2](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%202/assignment02.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%202/week02handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%202/week02.R) | 17 | | 3 | 22.04 | Data, R, RStudio | Scripts, data types, encoding, importing and inspecting data | [Assignment 3](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%203/assignment03.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%203/week03handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%203/week03.R) | 18 | | 4 | 29.04 | Data cleaning and manipulation | Basic operators, data manipulation (filtering, sorting, subsetting, arranging, renaming), dealing with missing data, sets, logic | [Assignment 4](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%204/assignment04.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%204/week04handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%204/week04.R) | 19 | | 5 | 06.05 | Data manipulation | Mutating, pipes, joining data frames, if…else, summary statistics | [Assignment 5](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%205/assignment05.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%205/week05handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%205/week05.R) | 20 | | 6 | 13.05 | Debugging, data visualization | Debugging, MRE, data vis goals, accessibility, plot types | [Assignment 6](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%206/assignment06.R) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%206/week06handout.pdf) | 21 | | 7 | 20.05 | Data visualization | Communicating with graphics, accessibility, visualizing in R (`ggplot2`, `esquisse`) | [Assignment 7](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%207/assignment07.R) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%207/week07handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%207/week07.R) | 22 | | 8 | 27.05 | Data visualization | Best practices, lying with plots, in-class exercises, exporting/saving plots and data. 
| [Assignment 8](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%208/assignment08.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%208/week08handout.pdf), [code](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%208/week08.R) | 23 | | 9 | 03.06 | Documentation, Quarto | Pandoc, markdown, Quarto, `knitr`, basic syntax and elements, export, chunk options, documentation | [Assignment 9](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%209/assignment09.md) | [Slides](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%209/week09handout.pdf), [Quarto demo](https://github.com/a-nap/Digital-Research-Toolkit/blob/main/2025/Week%209/quarto_demo.zip) | 24 | | 10 | 10.06 | *no class* | | | 25 | | 11 | 17.06 | Text editors, writing reports | Plain text editors, writing reports | Assignment 10 | Slides | 26 | | 12 | 24.06 | Reference management | Reference managers, literature research, DOIs | Assignment 11 | Slides | 27 | | 13 | 01.07 | LLM, AI | LLM for humanities, effective AI use | Assignment 12 | Slides | 28 | | 14 | 08.07 | Git, GitHub | Version control, Git, GitHub, SSH | Assignment 13 | Slides | 29 | | 15 | 15.07 | Git, GitHub, course outro | Git, GitHub, reverting to older versions, class recap | In class assignment | Slides | 30 | 31 | ## Recommended reading 32 | 33 | ### Git 34 | 35 | - GitHub Git guide: [`https://github.com/git-guides/`](https://github.com/git-guides/) 36 | - Another git guide: [`http://rogerdudler.github.io/git-guide/`](http://rogerdudler.github.io/git-guide/) 37 | - Git tutorial: [`http://git-scm.com/docs/gittutorial`](http://git-scm.com/docs/gittutorial) 38 | - Another git tutorial: [`https://www.w3schools.com/git/`](https://www.w3schools.com/git/) 39 | - Git cheat sheets: [`https://training.github.com/`](https://training.github.com/) 40 | - Where to ask questions: [Stackoverflow](https://stackoverflow.com) 41 | 42 | ### Quarto 43 | 44 | - Introductory video: [`https://www.youtube.com/watch?v=_f3latmOhew`](https://www.youtube.com/watch?v=_f3latmOhew) 45 | - Documentation: [`https://quarto.org/docs/get-started/`](https://quarto.org/docs/get-started/) 46 | 47 | ### R 48 | 49 | - QCBS R Workshop Series [`https://r.qcbs.ca/`](https://r.qcbs.ca/) 50 | - Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023). *R for data science: import, tidy, transform, visualize, and model data*. 2nd ed. O’Reilly Media, Inc. URL: [`https://r4ds.hadley.nz/`](https://r4ds.hadley.nz/). 51 | - [Tidyverse practice tutorial](https://anna-pryslopska.shinyapps.io/TidyversePractice) for this class (selecting, arranging, filtering, grouping, summarizing etc.) 52 | - [Penguin wrangling `dplyr` tutorial](https://allisonhorst.github.io/posts/2021-02-08-dplyr-learnr/) by Allison Horst. 53 | 54 | ### Experiment 55 | 56 | - Erickson, Thomas D and Mark E Mattson (1981). “From words to meaning: A semantic illusion”. In: *Journal of Verbal Learning and Verbal Behavior* 20.5, pp. 540–551. DOI: [`10.1016/s0022-5371(81)90165-1`](https://www.sciencedirect.com/science/article/abs/pii/S0022537181901651). 
-------------------------------------------------------------------------------- /R_tutorial/.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /R_tutorial/R_tutorial.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | ProjectId: 11be4ae0-cc5b-49e1-a78b-d1ebb484bc9e 3 | 4 | RestoreWorkspace: Default 5 | SaveWorkspace: Default 6 | AlwaysSaveHistory: Default 7 | 8 | EnableCodeIndexing: Yes 9 | UseSpacesForTab: Yes 10 | NumSpacesForTab: 2 11 | Encoding: UTF-8 12 | 13 | RnwWeave: Sweave 14 | LaTeX: XeLaTeX 15 | 16 | AutoAppendNewline: Yes 17 | StripTrailingWhitespace: Yes 18 | 19 | BuildType: Website 20 | 21 | SpellingDictionary: en_US 22 | -------------------------------------------------------------------------------- /R_tutorial/readme.md: -------------------------------------------------------------------------------- 1 | # Tidyverse Practice: Digital Research Toolkit for the Humanities (SoSe 2025) 2 | 3 | Welcome to the repository for Tidyverse Practice, an interactive R tutorial developed for the *Digital Research Toolkit for the Humanities* course in the Summer Semester 2025. This tutorial helps students and researchers new to R gain hands-on experience with the tidyverse ecosystem, focusing on data manipulation with real-world datasets. 4 | 5 | Live demo: [LINK](https://anna-pryslopska.shinyapps.io/TidyversePractice/) 6 | 7 | ## About the Project 8 | 9 | This project uses the `learnr` package to provide an interactive and progressive learning environment. The tutorial covers foundational R concepts and `dplyr` functions with exercises that help learners build confidence in working with data. 10 | 11 | ## Key Features 12 | 13 | - Interactive code exercises with hints and solutions 14 | - Custom themed interface 15 | - Real datasets used in the social sciences and humanities 16 | - Focus on reproducible and readable R code 17 | 18 | ## Topics Covered 19 | 20 | The tutorial includes the following modules: 21 | 22 | 1. Navigating working directories and file structure 23 | 2. Installing, loading, and unloading R packages 24 | 3. Previewing and exploring data 25 | 4. Data preprocessing: 26 | - Selecting and renaming columns 27 | - Filtering values (with an introduction to set theory) 28 | - Handling missing values 29 | - Creating new variables 30 | - Sorting rows 31 | - Identifying unique values 32 | 5. Grouping data and summarizing results 33 | 6. Using conditional logic with if-else statements 34 | 7. Assigning values and data input 35 | 8. Creating dataframes, binding rows and columns 36 | 9. Combining data with joins and merges 37 | 38 | Each section includes multiple hands-on exercises designed to reinforce the concepts covered. 39 | 40 | ## Getting Started 41 | 42 | ### Prerequisites 43 | 44 | Ensure you have the following installed: 45 | 46 | - R (>= 4.0) 47 | - RStudio 48 | - The R packages: `learnr`, `tidyverse`, `psych`, `formattable`, `knitr`, `shiny`, `rmarkdown` 49 | 50 | ### Run the Tutorial Locally 51 | 52 | Clone this repository, open the `.Rmd` file in RStudio and click "Run Document". 53 | 54 | Note: Some exercises may not work in online or restricted R environments (e.g., installing packages or setting the working directory). 
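Concretely, and subject to the note above, a local setup can be as small as the following, run from the `R_tutorial` folder (the package names are the prerequisites listed above; `rmarkdown::run()` is one way to launch the document that is usually equivalent to clicking "Run Document" in RStudio):

```r
# Install the prerequisites once
install.packages(c("learnr", "tidyverse", "psych", "formattable",
                   "knitr", "shiny", "rmarkdown"))

# Then launch the interactive tutorial
rmarkdown::run("tutorial.Rmd")
```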
55 | -------------------------------------------------------------------------------- /R_tutorial/rsconnect/documents/tutorial.Rmd/shinyapps.io/anna-pryslopska/TidyversePractice.dcf: -------------------------------------------------------------------------------- 1 | name: TidyversePractice 2 | title: TidyversePractice 3 | username: 4 | account: anna-pryslopska 5 | server: shinyapps.io 6 | hostUrl: https://api.shinyapps.io/v1 7 | appId: 14699182 8 | bundleId: 10257058 9 | url: https://anna-pryslopska.shinyapps.io/TidyversePractice/ 10 | version: 1 11 | asMultiple: FALSE 12 | asStatic: FALSE 13 | when: 1612315822.30477 14 | 15 | -------------------------------------------------------------------------------- /R_tutorial/tutorial.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Tidyverse Practice" 3 | author: "Anna Pryslopska" 4 | output: 5 | learnr::tutorial: 6 | progressive: TRUE 7 | include_code: FALSE 8 | theme: 9 | bg: "#ffffff" 10 | fg: "#343643" 11 | secondary: "#0A0A1A" 12 | primary: "#000000" 13 | success: "#01AEAD" 14 | info: "#01AEAD" 15 | warning: "#F0AD4E" 16 | danger: "#D9534F" 17 | runtime: shiny_prerendered 18 | --- 19 | 20 | ```{r setup, include=FALSE} 21 | library(shiny) 22 | library(learnr) 23 | library(tidyverse) 24 | # library(fontawesome) 25 | # library(here) 26 | 27 | countries_data <- data.frame( 28 | Country = c("Canada", "Japan", "Brazil", "Egypt", "Germany", "Australia"), 29 | Capital = c("Ottawa", "Tokyo", "Brasília", "Cairo", "Berlin", "Canberra"), 30 | Continent = c("North America", "Asia", "South America", "Africa", "Europe", "Oceania") 31 | ) 32 | 33 | school <- data.frame(Age = 1:18, 34 | School = c("Preschool", "Preschool","Preschool","Preschool", "Preschool", 35 | "Primary school", "Primary school","Primary school","Primary school", 36 | "Middle school","Middle school","Middle school","Middle school","Middle school","Middle school", "High school", "High school", "High school")) 37 | 38 | knitr::opts_chunk$set(echo = FALSE) 39 | 40 | ``` 41 | 42 | ## Introduction 43 | 44 | This interactive tutorial will guide you through common **dplyr**, **ggplot2** and base R functions for data manipulation. 45 | You will practice with real data sets like `mtcars`, `iris`, `starwars`, and `airquality`. 46 | 47 | ### About this tutorial 48 | 49 | These exercises are meant to be complementary to the *Digital Research Toolkit for Linguists* offered in the summer semester 2025 at the University of Stuttgart. 50 | I reference the materials and examples from the seminar in several places. 51 | The relevant slides and exercises are available from the GitHub repository: [LINK](https://github.com/a-nap/Digital-Research-Toolkit). 52 | 53 | For questions and feedback, please contact Anna Prysłopska at `anna . pryslopska [AT] gmail . com` 54 | 55 | Topics covered include: 56 | 57 | 1. Navigating working directories and file structure 58 | 2. Installing, loading, and unloading R packages 59 | 3. Previewing and exploring data 60 | 4. Data preprocessing: 61 | - Selecting and renaming columns 62 | - Filtering values (with an introduction to set theory) 63 | - Handling missing values 64 | - Creating new variables 65 | - Sorting rows 66 | - Identifying unique values 67 | 5. Grouping data and summarizing results 68 | 6. Using conditional logic with if-else statements 69 | 7. Assigning values and data input 70 | 8. Creating dataframes, binding rows and columns 71 | 9. Combining data with joins and merges 72 | 10. 
Visualizing data 73 | 74 | #### How to use this tutorial 75 | 76 | This tutorial consists primarily of exercises. 77 | You will read a short description of the task and see a code chunk window, like the one below. 78 | 79 | ```{r intro, exercise=TRUE} 80 | # There is some code here 81 | print("Hello world!") 82 | ``` 83 | 84 | ```{r intro-solution} 85 | # You can copy this code and it will print "Hello world!" 86 | print("Hello world!") 87 | ``` 88 | 89 | You write your solution in the code window and run it by pressing `shift`+`enter`.The result should appear below the exercise window. 90 | 91 | Unfortunately, the run code button is disabled at the beginning for some reason (probably an error in the package or insufficient memory). 92 | 93 | If you get stuck, you can click on the *Hint* and/or *Solution* button to (incrementally) reveal the solution. You can then copy and paste the solution into the code window. Click on the *Hint* or *Solution* button again to close the popup window and return to the code chunk. 94 | 95 | Your solution is not actually graded in this tutorial, so it does not look to see if your answer matches the solution. Rather, it's a guideline for self-study. 96 | 97 | #### The tidyverse 98 | 99 | In class, we used tidyverse functions over base R. 100 | The [tidyverse](https://www.tidyverse.org/) is a curated set of R packages tailored for data science, all built around a consistent design philosophy, shared grammar, and common structures. 101 | They usually have punny names. 102 | 103 | There is SO MUCH more to both the packages we used in class and the tidyverse overall. 104 | 105 | #### `dplyr` 106 | 107 | [`dplyr`](https://dplyr.tidyverse.org/) is an R package in the tidyverse. It helps you work with data by providing you a set of simple, consistent commands that make it easier to do common tasks like filtering, sorting, and summarizing data. 108 | 109 | #### `ggplot2` 110 | 111 | I class, we have been using the `ggplot2` and `esquisse` packages to visualize data. In this tutorial, you will practice plotting with the former package. 112 | 113 | [`ggplot2`](https://ggplot2.tidyverse.org/) is a tool for making graphs. Its approach to data viz is that of a layered grammar of graphics. You design and construct graphics in 114 | a structured manner from data upwards. 115 | First, you tell it what data to use, how to match data to things like color or position, and what kind of shapes to draw (like bars or lines). Then `ggplot2` builds the graph for you, handling the rest of the work. 116 | 117 | `ggplot2` was created by Hadley Wickham. 118 | 119 | {width="50%"} 120 | 121 | #### `esquisse` 122 | 123 | [`esquisse`](https://dreamrs.github.io/esquisse/) is a package that lets you explore data interactively in a graphical user interface. It uses `ggplot2` for visualization. 124 | You can export the generated graph and save the code to generate it. 125 | It has its limitations but is useful to get a first impression/overview. 126 | 127 | #### `mtcars` 128 | 129 | The **`mtcars`** data set contains car data with fuel consumption and design specs for 32 car models taken from the US magazine *Motor Trend* (1973–74). 130 | 131 | | Column | Explanation | 132 | |--------|------------------------------------------| 133 | | `mpg` | Miles/(US) gallon | 134 | | `cyl` | Number of cylinders | 135 | | `disp` | Displacement (cu.in.) 
120 | 
121 | #### `esquisse`
122 | 
123 | [`esquisse`](https://dreamrs.github.io/esquisse/) is a package that lets you explore data interactively in a graphical user interface. It uses `ggplot2` for visualization.
124 | You can export the generated graph and save the code that generates it.
125 | It has its limitations but is useful for getting a first impression or overview of the data.
126 | 
127 | #### `mtcars`
128 | 
129 | The **`mtcars`** data set contains fuel consumption and design specs for 32 car models, taken from the US magazine *Motor Trend* (1973–74).
130 | 
131 | | Column | Explanation |
132 | |--------|------------------------------------------|
133 | | `mpg` | Miles/(US) gallon |
134 | | `cyl` | Number of cylinders |
135 | | `disp` | Displacement (cu.in.) |
136 | | `hp` | Gross horsepower |
137 | | `drat` | Rear axle ratio |
138 | | `wt` | Weight (1000 lbs) |
139 | | `qsec` | 1/4 mile time |
140 | | `vs` | Engine (0 = V-shaped, 1 = straight) |
141 | | `am` | Transmission (0 = automatic, 1 = manual) |
142 | | `gear` | Number of forward gears |
143 | 
144 | #### `iris` data set
145 | 
146 | The **`iris`** data set contains sepal and petal measurements of 150 iris flowers across three species.
147 | It is also available from [the UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/53/iris).
148 | 
149 | | Column | Explanation |
150 | |----------------|------------------|
151 | | `Sepal.Length` | Length of sepals |
152 | | `Sepal.Width` | Width of sepals |
153 | | `Petal.Length` | Length of petals |
154 | | `Petal.Width` | Width of petals |
155 | | `Species` | One of 3 species |
156 | 
158 | 
159 | #### `starwars` data set
160 | 
161 | The **`starwars`** data set from the `dplyr` package contains [data on characters from the Star Wars universe](https://dplyr.tidyverse.org/reference/starwars.html).
162 | 
163 | | Column | Explanation |
164 | |----|----|
165 | | `name` | Name of the character |
166 | | `height` | Height (cm) |
167 | | `mass` | Weight (kg) |
168 | | `hair_color`, `skin_color`, `eye_color` | Hair, skin, and eye colors |
169 | | `birth_year` | Year born (BBY = Before Battle of Yavin) |
170 | | `sex` | The biological sex of the character: male, female, hermaphroditic, or none (as in the case of Droids) |
171 | | `gender` | The gender role or gender identity of the character, as determined by their personality or the way they were programmed (as in the case of Droids) |
172 | | `homeworld` | Name of homeworld |
173 | | `species` | Name of species |
174 | | `films` | List of films the character appeared in |
175 | | `vehicles` | List of vehicles the character has piloted |
176 | | `starships` | List of starships the character has piloted |
177 | 
178 | #### `airquality` data set
179 | 
180 | The **`airquality`** data set contains daily air quality measurements in New York (May–September 1973).
181 | 
182 | | Column | Explanation |
183 | |----|----|
184 | | `Ozone` | Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island |
185 | | `Solar.R` | Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park |
186 | | `Wind` | Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport |
187 | | `Temp` | Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport |
188 | 
189 | 
190 | ------------------------------------------------------------------------
191 | 
192 | ## 1. Working directory
193 | 
194 | *2 exercises*
195 | 
196 | A **directory** or **folder** is a container for storing files or other folders.
197 | **File structure** (also called **file hierarchy** or **folder organization**) is the way these containers are organized.
198 | 
199 | The working directory is the folder where R looks for files, saves files (visible and hidden), and loads files from by default.
200 | 
202 | 
203 | ### Exercise 1.1: Check your working directory
204 | 
205 | Check what working directory you are in.
206 | 
207 | ```{r directory-ex1, exercise=TRUE}
208 | # Write your code here
209 | 
210 | ```
211 | 
212 | ```{r directory-ex1-solution}
213 | getwd()
214 | ```
215 | 
216 | ### Exercise 1.2: Set your working directory
217 | 
218 | Change the working directory to the one you're using in class.
219 | You won't actually manage to change it online, but try to do it anyway.
220 | 
221 | ```{r directory-ex2, exercise=TRUE}
222 | # Write your code here
223 | 
224 | ```
225 | 
226 | 
227 | ```{r directory-ex2-hint}
228 | setwd("path/to/your/directory") # the path goes in quotes
229 | ```
230 | 
231 | 
232 | ```{r directory-ex2-solution}
233 | setwd("path/to/your/directory") # the path goes in quotes
234 | ```
235 | 
236 | ## 2. Packages
237 | 
238 | *4 exercises*
239 | 
240 | Packages are collections of functions and/or data sets with a common theme (e.g. statistics, spatial analysis, plotting).
241 | Most packages are available through the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) and on GitHub.
242 | 
243 | ### Exercise 2.1: Install packages
244 | 
245 | Install the packages `ggplot2` and `formattable` using the `install.packages()` function.
246 | You won't be able to do this online (and will get an error like *trying to use CRAN without setting a mirror*), but try anyway.
247 | 
248 | ```{r packages-ex1, exercise=TRUE}
249 | # Write your code here
250 | 
251 | ```
252 | 
253 | ```{r packages-ex1-hint}
254 | install.packages(c())
255 | 
256 | ```
257 | 
258 | ```{r packages-ex1-solution}
259 | install.packages(c("ggplot2", "formattable"))
260 | ```
261 | 
262 | ### Exercise 2.2: Load packages
263 | 
264 | Load (aka import or activate) the packages you just installed into the workspace.
265 | 
266 | ```{r packages-ex2, exercise=TRUE}
267 | # Write your code here
268 | 
269 | ```
270 | 
271 | ```{r packages-ex2-hint}
272 | library(ggplot2)
273 | ```
274 | 
275 | 
276 | ```{r packages-ex2-solution}
277 | library(ggplot2)
278 | library(formattable)
279 | ```
280 | 
281 | ### Exercise 2.3: Unload packages
282 | 
283 | Sometimes packages will conflict with one another.
284 | Then you might want to "unload" a package.
285 | Unload the packages you just installed and loaded into the workspace.
286 | 
287 | ```{r packages-ex3, exercise=TRUE}
288 | # Write your code here
289 | 
290 | ```
291 | 
292 | ```{r packages-ex3-hint}
293 | # Option 1:
294 | unloadNamespace("ggplot2")
295 | # Option 2:
296 | detach("package:ggplot2", unload = TRUE)
297 | ```
298 | 
299 | ```{r packages-ex3-solution}
300 | # Option 1
301 | unloadNamespace("ggplot2")
302 | unloadNamespace("formattable")
303 | # Option 2
304 | detach("package:ggplot2", unload = TRUE)
305 | detach("package:formattable", unload = TRUE)
306 | ```
307 | 
308 | ### Exercise 2.4: Session information
309 | 
310 | You should collect information about the current R session, so that you can reproduce the analysis should anything change in R or in the packages you use.
311 | 
312 | Check what packages are loaded in the current session.
313 | 
314 | ```{r packages-ex4, exercise=TRUE}
315 | # Write your code here
316 | 
317 | ```
318 | 
319 | ```{r packages-ex4-solution}
320 | sessionInfo()
321 | ```
322 | 
323 | ------------------------------------------------------------------------
324 | 
325 | ## 3. Preview data
326 | 
327 | *7 exercises*
328 | 
329 | Before starting any kind of analysis, you have to look at what you're dealing with.
330 | 
331 | ```{r, include=FALSE}
332 | library(tidyverse)
333 | library(dplyr)
334 | ```
335 | 
336 | ### Exercise 3.1: No functions
337 | 
338 | Preview the `starwars` data without calling any functions.
339 | 
340 | ```{r preview-ex1, exercise=TRUE}
341 | # Write your code here
342 | 
343 | ```
344 | 
345 | ```{r preview-ex1-solution}
346 | starwars
347 | ```
348 | 
349 | ### Exercise 3.2: Preview the columns
350 | 
351 | What columns does the `mtcars` dataframe have?
352 | 353 | ```{r preview-ex2, exercise=TRUE} 354 | # Write your code here 355 | 356 | ``` 357 | 358 | ```{r preview-ex2-hint} 359 | colnames() 360 | ``` 361 | 362 | ```{r preview-ex2-solution} 363 | colnames(mtcars) 364 | ``` 365 | 366 | ### Exercise 3.3: Data summary 367 | 368 | Print a summary of the `iris` data set. 369 | 370 | ```{r preview-ex3, exercise=TRUE} 371 | # Write your code here 372 | 373 | ``` 374 | 375 | ```{r preview-ex3-hint} 376 | summary() 377 | ``` 378 | 379 | ```{r preview-ex3-solution} 380 | summary(iris) 381 | ``` 382 | 383 | ### Exercise 3.4: Describing data 384 | 385 | Using the package `psych`, show the description of the `airquality` data. 386 | 387 | ```{r preview-ex4, exercise=TRUE} 388 | # Write your code here 389 | 390 | ``` 391 | 392 | ```{r preview-ex4-hint-1} 393 | # Remember to load the package `psych` first. 394 | library(psych) 395 | ``` 396 | 397 | ```{r preview-ex4-hint-2} 398 | # Then use the describe function 399 | library(psych) 400 | describe() 401 | ``` 402 | 403 | ```{r preview-ex4-solution} 404 | library(psych) 405 | describe(airquality) 406 | ``` 407 | 408 | ### Exercise 3.5: Print the whole data 409 | 410 | Print the whole `starwars` data. 411 | All rows and columns. 412 | 413 | ```{r preview-ex5, exercise=TRUE} 414 | # Write your code here 415 | 416 | ``` 417 | 418 | ```{r preview-ex5-hint-1} 419 | # Use the print() function 420 | print() 421 | ``` 422 | 423 | ```{r preview-ex5-hint-2} 424 | # Specify how many rows to print (an infinite amount!) 425 | print(, n=Inf) 426 | ``` 427 | 428 | ```{r preview-ex5-solution} 429 | print(starwars, n=Inf) 430 | ``` 431 | 432 | ### Exercise 3.6: Heads 433 | 434 | Print the first 6 rows of the `mtcars` data. 435 | 436 | ```{r preview-ex6, exercise=TRUE} 437 | # Write your code here 438 | 439 | ``` 440 | 441 | ```{r preview-ex6-hint} 442 | # The top rows are the head 443 | head() 444 | ``` 445 | 446 | 447 | ```{r preview-ex6-solution} 448 | head(mtcars) 449 | ``` 450 | 451 | ### Exercise 3.7: Tails 452 | 453 | Print the last 10 rows of the `iris` data. 454 | 455 | ```{r preview-ex7, exercise=TRUE} 456 | # Write your code here 457 | 458 | ``` 459 | 460 | ```{r preview-ex7-hint-1} 461 | # The bottom rows are the tail 462 | tail() 463 | ``` 464 | 465 | ```{r preview-ex7-hint-2} 466 | # Specify how many rows to show (an infinite amount!) 467 | tail(, n=10) 468 | ``` 469 | 470 | ```{r preview-ex7-solution} 471 | tail(iris, n=10) 472 | ``` 473 | 474 | ------------------------------------------------------------------------ 475 | 476 | ## 4. `select()` 477 | 478 | *4 exercises* 479 | 480 | During data clean up, we selected only those columns that were meaningful for our analysis by using the function `select()`. All the other columns were removed. 481 | 482 | You can use `select()` with operators to select variables, as per the documentation: 483 | 484 | - `:` for selecting a range of consecutive variables. 485 | - `!` for taking the complement of a set of variables. 486 | - `&` and `|` for selecting the intersection or the union of two sets of variables. 487 | - `c()` for combining selections. 488 | 489 | ### Exercise 4.1: Select specific columns 490 | 491 | Use `select()` on `mtcars` data to choose miles per gallon, horse power, and weight columns. 
492 | 493 | ```{r select-ex1, exercise=TRUE} 494 | # Write your code here 495 | 496 | ``` 497 | 498 | ```{r select-ex1-hint-1} 499 | # The data is the first argument you must give to the function 500 | ``` 501 | 502 | 503 | ```{r select-ex1-hint-2} 504 | # The columns you need are: mpg, hp, wt 505 | ``` 506 | 507 | ```{r select-ex1-solution} 508 | select(mtcars, mpg, hp, wt) 509 | ``` 510 | 511 | ### Exercise 4.2: Exclude columns 512 | 513 | Use `select()` on `starwars` to drop hair color and skin color columns. 514 | 515 | ```{r select-ex2, exercise=TRUE} 516 | # Write your code here 517 | 518 | ``` 519 | 520 | ```{r select-ex2-hint} 521 | # The columns you want to remove are hair_color and skin_color 522 | ``` 523 | 524 | ```{r select-ex2-solution} 525 | select(starwars, -hair_color, -skin_color) 526 | ``` 527 | 528 | ### Exercise 4.3: Select the last columns 529 | 530 | Use `select()` on `iris` to pick the last three columns. 531 | 532 | ```{r select-ex3, exercise=TRUE} 533 | # Write your code here 534 | 535 | ``` 536 | 537 | ```{r select-ex3-hint-1} 538 | # Go back to the introduction to check the columns or preview them 539 | colnames(iris) 540 | ``` 541 | 542 | ```{r select-ex3-hint-2} 543 | # Now that you know the column names, you have two options to select the final 3. 544 | # Option 1: using c() for concatenation 545 | # Option 2: using : for the range 546 | ``` 547 | 548 | ```{r select-ex3-solution} 549 | # Option 1: 550 | select(iris, c(Petal.Length, Petal.Width, Species)) 551 | # Option 2: 552 | select(iris, 3:5) 553 | ``` 554 | 555 | ### Exercise 4.4: Select a column range 556 | 557 | Use `select()` to select all the columns between "Ozone" and "Temp" in the `airquality` data. 558 | 559 | ```{r select-ex4, exercise=TRUE} 560 | # Write your code here 561 | 562 | ``` 563 | 564 | ```{r select-ex4-hint} 565 | # You want to take the range of columns, which should also include the "Ozone" and "Temp" columns. 566 | ``` 567 | 568 | ```{r select-ex4-solution} 569 | select(airquality, Ozone:Temp) 570 | ``` 571 | 572 | ------------------------------------------------------------------------ 573 | 574 | ## 5. `rename()` 575 | 576 | *3 exercises* 577 | 578 | Sometimes, columns are named in a annoying or misleading way. 579 | One of the steps of data analysis is to give columns, data, variables etc. meaningful names. 580 | 581 | ### Exercise 5.1: Rename a single column 582 | 583 | In `mtcars`, rename "mpg" to "miles_per_gallon". 584 | 585 | ```{r rename-ex1, exercise=TRUE} 586 | # Write your code here 587 | 588 | ``` 589 | 590 | ```{r rename-ex1-hint} 591 | # The rename() function takes the arguments: 592 | # Data (frame) 593 | # New variable name = Old variable name 594 | ``` 595 | 596 | ```{r rename-ex1-solution} 597 | rename(mtcars, miles_per_gallon = mpg) 598 | ``` 599 | 600 | ### Exercise 5.2: Rename multiple columns 601 | 602 | In `starwars`, rename "birth_year" to "age" and "mass" to "weight." 603 | 604 | ```{r rename-ex2, exercise=TRUE} 605 | # Write your code here 606 | 607 | ``` 608 | 609 | ```{r rename-ex2-hint-1} 610 | # As in the exercise before, you need to specify the data frame and new names for old columns. 611 | # You can simply list the new columns as arguments (comma separated) or use concatenation. 
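# Note the order of each pair: the new name goes on the left, the old name on the right.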
612 | ```
613 | 
614 | ```{r rename-ex2-hint-2}
615 | # Option 1:
616 | rename(df,
617 |        new1 = old1,
618 |        new2 = old2)
619 | ```
620 | 
621 | ```{r rename-ex2-hint-3}
622 | # Option 2:
623 | rename(df,
624 |        c(new1 = old1,
625 |          new2 = old2))
626 | ```
627 | 
628 | ```{r rename-ex2-solution}
629 | # Option 1:
630 | rename(starwars,
631 |        age = birth_year,
632 |        weight = mass)
633 | # Option 2:
634 | rename(starwars,
635 |        c(age = birth_year,
636 |          weight = mass))
637 | ```
638 | 
639 | ### Exercise 5.3: Rename after select
640 | 
641 | Select "species" and "homeworld" from `starwars` and rename them to "type" and "planet".
642 | Do this in two steps, assigning the steps to the variables `df1` and `df2` ("dataframe 1" and "dataframe 2" for short).
643 | 
644 | ```{r rename-ex3, exercise=TRUE}
645 | # Write your code here
646 | 
647 | ```
648 | 
649 | ```{r rename-ex3-hint-1}
650 | # Start by creating a new data frame with the selected columns
651 | df1 <- select(df, column1, column2)
652 | ```
653 | 
654 | ```{r rename-ex3-hint-2}
655 | # Rename the columns
656 | # Remember to pass the correct variable to the second function.
657 | df2 <- rename(df,
658 |               new1 = old1,
659 |               new2 = old2)
660 | ```
661 | 
662 | ```{r rename-ex3-solution}
663 | df1 <- select(starwars, species, homeworld)
664 | df2 <- rename(df1, type = species, planet = homeworld)
665 | ```
666 | 
667 | ------------------------------------------------------------------------
668 | 
669 | ## 6. `filter()`
670 | 
671 | *7 exercises*
672 | 
673 | Filtering rows is analogous to selecting columns.
674 | Some rows are not relevant to us (or perhaps not relevant at all), and removing them makes it easier to see the important information.
675 | 
676 | In class, we spoke about using R as a calculator and the basic operations it can do:
677 | 
678 | | Operation | Symbol |
679 | |:---------------------------------------|---------------------|
680 | | addition | `+` |
681 | | subtraction | `-` |
682 | | division | `/` |
683 | | multiplication | `*` |
684 | | power | `^` |
685 | | equals | `==` |
686 | | does not equal | `!=` |
687 | | greater than | `>` |
688 | | greater than or equal | `>=` |
689 | | less than | `<` |
690 | | less than or equal | `<=` |
691 | | range (from NR1 to NR2) | NR1`:`NR2 |
692 | | identify element (is VALUE in OBJECT?) | VALUE `%in%` OBJECT |
693 | 
694 | We also practiced set theory.
695 | **Set theory** is a part of mathematics and a way of thinking about groups of things and how they relate to each other.
696 | These "things" can be anything: numbers, letters, shapes, people, concepts, ideas, etc.
697 | We call these collections **sets**, and each thing inside a set is called an **element**.
698 | Sets are useful because they allow us to analyze arguments using logic and structure, define categories (e.g. all living beings), and compare them ("some living beings are humans").
699 | You can combine and compare sets by using unions ('or' `|`, so everything in A or B) and intersections ('and' `&`, so only what is in *both* A and B).
700 | 
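A minimal sketch of intersection vs. union inside a filter, shown for illustration only (the `starwars` columns and thresholds are arbitrary choices):

```r
library(dplyr)

# Intersection (&): characters that satisfy BOTH conditions
filter(starwars, height > 180 & mass < 100)

# Union (|): characters that satisfy AT LEAST ONE of the conditions
filter(starwars, height > 180 | mass < 100)
```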
703 | ### Exercise 6.1: Basic filtering
704 | 
705 | Filter `mtcars` for rows where the miles per gallon are more than 25.
706 | 
707 | ```{r filter-ex1, exercise=TRUE}
708 | # Write your code here
709 | 
710 | ```
711 | 
712 | ```{r filter-ex1-hint}
713 | # Replace "condition" with a statement that evaluates to TRUE
714 | # In this case, miles per gallon is over 25.
715 | filter(df, condition)
716 | ```
717 | 
718 | ```{r filter-ex1-solution}
719 | filter(mtcars, mpg > 25)
720 | ```
721 | 
722 | ### Exercise 6.2: Multiple conditions
723 | 
724 | From `starwars`, keep only masculine characters who are taller than 180 cm.
725 | 
726 | ```{r filter-ex2, exercise=TRUE}
727 | # Write your code here
728 | 
729 | ```
730 | 
731 | ```{r filter-ex2-hint-1}
732 | # Option 1:
733 | filter(df,
734 |        condition1,
735 |        condition2)
736 | ```
737 | 
738 | ```{r filter-ex2-hint-2}
739 | # Option 2:
740 | filter(df,
741 |        condition1 & condition2)
742 | ```
743 | 
744 | ```{r filter-ex2-solution}
745 | # Option 1:
746 | filter(starwars,
747 |        gender == 'masculine',
748 |        height > 180)
749 | # Option 2:
750 | filter(starwars,
751 |        gender == 'masculine' &
752 |        height > 180)
753 | ```
754 | 
755 | ### Exercise 6.3: Filter with %in%
756 | 
757 | On `iris`, filter only the setosa and versicolor species.
758 | 
759 | ```{r filter-ex3, exercise=TRUE}
760 | # Write your code here
761 | 
762 | ```
763 | 
764 | ```{r filter-ex3-hint}
765 | # There are multiple ways to solve this exercise:
766 | # Option 1: Using %in%
767 | # Option 2: Using == and |
768 | # Option 3: Using !=
769 | ```
770 | 
771 | ```{r filter-ex3-solution}
772 | # Option 1:
773 | filter(iris, Species %in% c('setosa', 'versicolor'))
774 | # Option 2:
775 | filter(iris, Species == 'setosa' | Species == 'versicolor')
776 | # Option 3:
777 | filter(iris, Species != 'virginica')
778 | ```
779 | 
780 | ### Exercise 6.4: More filters
781 | 
782 | On `iris`, how many flowers of the virginica species have a petal width of at least 2.5 and a petal length of 6 or less?
783 | 
784 | ```{r filter-ex4, exercise=TRUE}
785 | # Write your code here
786 | 
787 | ```
788 | 
789 | ```{r filter-ex4-hint-1}
790 | # As a reminder, you can combine the conditions in two ways:
791 | # Option 1: As a comma-separated list
792 | # Option 2: By using &
793 | ```
794 | 
795 | ```{r filter-ex4-hint-2}
796 | # Option 1:
797 | filter(df,
798 |        condition1,
799 |        condition2,
800 |        condition3)
801 | ```
802 | 
803 | ```{r filter-ex4-hint-3}
804 | # Option 2:
805 | filter(df,
806 |        condition1 &
807 |        condition2 &
808 |        condition3)
809 | ```
810 | 
811 | 
812 | ```{r filter-ex4-solution}
813 | # This is true of only 2 flowers
814 | # Option 1:
815 | filter(iris,
816 |        Species == 'virginica',
817 |        Petal.Width >= 2.5,
818 |        Petal.Length <= 6.0)
819 | # Option 2:
820 | filter(iris,
821 |        Species == 'virginica' &
822 |        Petal.Width >= 2.5 &
823 |        Petal.Length <= 6.0)
824 | ```
825 | 
826 | ### Exercise 6.5: Mixing filters
827 | 
828 | Look at the `starwars` data.
829 | What is the name and species of the man who is between 100 and 180 cm tall, has black or blue eyes, weighs 80 kg or less, does not have black hair, and whose homeworld is neither "Sullust" nor "Bespin"?
830 | 
831 | ```{r filter-ex5, exercise=TRUE}
832 | # Write your code here
833 | 
834 | ```
835 | 
836 | ```{r filter-ex5-solution}
837 | filter(starwars,
838 |        sex == "male",
839 |        height %in% 100:180,
840 |        eye_color %in% c("black", "blue"),
841 |        mass <= 80,
842 |        hair_color != "black",
843 |        !(homeworld %in% c("Sullust", "Bespin")))
844 | ```
845 | 
846 | 
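The same kind of multi-condition filter can also be written as a pipeline. A minimal sketch, assuming the base R pipe `|>` (available in R ≥ 4.1; the tidyverse pipe `%>%` works the same way here):

```r
library(dplyr)

# The pipe passes starwars as the first argument to filter()
starwars |>
  filter(sex == "male",
         height %in% 100:180,
         eye_color %in% c("black", "blue"),
         mass <= 80,
         hair_color != "black",
         !(homeworld %in% c("Sullust", "Bespin")))
```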