├── .gitignore ├── README.md ├── datasets └── Lacourse et al. (2001) Females.csv ├── powercurve-1.png ├── scaling.md ├── tutorial-power-analysis-using-simr.md ├── tutorial-scaling.md ├── tutorials-tidy-models.md └── tutorials-tidy-models_files └── figure-markdown_github ├── unnamed-chunk-6-1.png └── unnamed-chunk-6-2.png /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | Exp-Meth-III-Tutorials.Rproj 6 | *.rmd 7 | .DS_Store 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Intro 2 | 3 | This github contains tutorials for experimental methods III. 4 | 5 | The benefit of this page is 6 | 1) Everyone get to see all the relevant answers to question 7 | 2) You have ONE place rather than multiple slides, emails and facebook post 8 | 3) It get you in the habbit of using github 9 | 10 | The tutorials on this page include: 11 | 1) [An introduction to the benefits of scaling](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorial-scaling.md) 12 | 1) [A tutorial to power analysis using the simr package](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorial-power-analysis-using-simr.md) 13 | 1) [A tutorial in using tidy models](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorials-tidy-models.md) 14 | -------------------------------------------------------------------------------- /datasets/Lacourse et al. (2001) Females.csv: -------------------------------------------------------------------------------- 1 | Age,Age_Group,Drug_Use,Father_Negligence,Gender,Isolation,Marital_Status,Meaninglessness,Metal,Mother_Negligence,Normlessness,Self_Estrangement,Suicide_Risk,Vicarious,Worshipping 2 | 15.83,14-16 Years Old,8,17,Female,6,Together,10,4.83762518844781,10,6,15,Non-Suicidal,5,4 3 | 14.92,14-16 Years Old,9,23,Female,8,Together,26,6,12,8,20,Non-Suicidal,4,6 4 | 15.33,14-16 Years Old,5,15,Female,18,Together,19,6,16,7,17,Non-Suicidal,6,3 5 | 15.83,14-16 Years Old,11,11,Female,9,Separated or Divorced,13,4,10,5,12,Non-Suicidal,3,3 6 | 14.92,14-16 Years Old,7,13,Female,5,Together,13,8,16,3,6,Non-Suicidal,3,9 7 | 14.58,14-16 Years Old,4,29,Female,15,Separated or Divorced,18,7,18,5,15,Suicidal,2,4 8 | 14.5,14-16 Years Old,5,10,Female,8,Together,12,8,9,6,10,Non-Suicidal,3,4 9 | 15.67,14-16 Years Old,7,27,Female,6,Separated or Divorced,18,4,12,7,12,Non-Suicidal,3,4 10 | 14.92,14-16 Years Old,5,23,Female,10,Together,29,14,21,4,28,Non-Suicidal,8,9 11 | 15,14-16 Years Old,4,12,Female,5,Together,22,4,15,7,7,Non-Suicidal,5,9 12 | 15.17,14-16 Years Old,3,19,Female,12,Together,20,8,11,8,19,Non-Suicidal,5,7 13 | 17.17,16-19 Years Old,8,30,Female,14,Separated or Divorced,13,10,29,6,12,Suicidal,6,12 14 | 16.83,16-19 Years Old,11,12,Female,7,Together,22,4,22,7,19,Non-Suicidal,6,7 15 | 16.75,16-19 Years Old,6,16,Female,14,Together,25,7,9,11,10,Non-Suicidal,5,3 16 | 16.58,16-19 Years Old,8.43970783004555,20,Female,5,Together,16,4,15,4,9,Non-Suicidal,8,4 17 | 16.5,16-19 Years Old,5,17,Female,12,Together,17,7,9,6,19,Non-Suicidal,3,7 18 | 16.5,16-19 Years Old,3,24,Female,13,Separated or Divorced,21,7,24,7,16,Non-Suicidal,4,6 19 | 16.75,16-19 Years Old,3,22,Female,16,Together,30,4,31,7,11,Non-Suicidal,4,5 20 | 17,16-19 Years Old,5,20,Female,12,Together,13,4,14,4,12,Non-Suicidal,3,3 21 | 17.25,16-19 Years 
Old,5,31,Female,16,Separated or Divorced,23,5,21,6,22,Suicidal,5,7 22 | 17.17,16-19 Years Old,3,26,Female,15,Together,16,5,19.5824136069868,4,24,Non-Suicidal,5,9 23 | 17.42,16-19 Years Old,7.15478004110386,23,Female,15,Together,18,5,36,9,14,Non-Suicidal,6,3 24 | 17.08,16-19 Years Old,5,16,Female,10.9559005734461,Together,21,6.53961748140933,13,8,15,Non-Suicidal,6,4 25 | 17.92,16-19 Years Old,11,16,Female,6,Together,17,10,13,12,16,Suicidal,7,6 26 | 17.25,16-19 Years Old,7.87460242195194,26,Female,13,Together,24,6,10,8,10,Non-Suicidal,8,7 27 | 18.5,16-19 Years Old,6,33,Female,6,Separated or Divorced,25,4,9,6,14,Non-Suicidal,4,3 28 | 16.92,16-19 Years Old,5,27,Female,23,Together,10,12,19,5,22,Non-Suicidal,5,4 29 | 17.33,16-19 Years Old,12,14,Female,10,Together,10,6,15,7,16,Suicidal,3,5 30 | 17.42,16-19 Years Old,7,16,Female,6,Together,22,5,13,7,18,Non-Suicidal,5,4 31 | 17.17,16-19 Years Old,7,10,Female,5,Together,11,6,9,5,12,Suicidal,7,4 32 | 16.58,16-19 Years Old,8,12,Female,6,Together,13,9,12,6,11,Non-Suicidal,5,10 33 | 16.58,16-19 Years Old,5,21,Female,8,Together,15,4,18,7,14,Non-Suicidal,6,7 34 | 17.42,16-19 Years Old,6,30,Female,7,Together,25,9,24,4,14,Non-Suicidal,5,3 35 | 16.75,16-19 Years Old,11,24,Female,9,Together,15,7,15,8,12,Non-Suicidal,4,4 36 | 15,14-16 Years Old,11,29,Female,7,Together,26,11,10,6,7,Suicidal,5,10 37 | 16.5,16-19 Years Old,9,10,Female,4,Together,10,4,9,7,5,Non-Suicidal,3,4 38 | 16.58,16-19 Years Old,5,16,Female,13,Together,13,7,16,4,17,Non-Suicidal,5,4 39 | 16.67,16-19 Years Old,6,18,Female,11,Together,19,4,14,3,10,Non-Suicidal,5,3 40 | 17,16-19 Years Old,5,16,Female,8,Together,14,8.52820197005175,12,5,5,Suicidal,3,3 41 | 17.25,16-19 Years Old,7,12,Female,4,Together,8,4,16,10,10,Suicidal,2,3 42 | 14.58,14-16 Years Old,5,9,Female,4,Separated or Divorced,8,11,12,6,8,Non-Suicidal,8,6 43 | 14.75,14-16 Years Old,6,15,Female,6,Together,17,5,19,4,17,Non-Suicidal,5,5 44 | 14.5,14-16 Years Old,11,29,Female,10,Together,15,6.91536148045185,21,9,13,Non-Suicidal,2,3 45 | 14.75,14-16 Years Old,3,9,Female,4,Together,13,5,9,3,8,Non-Suicidal,3,9 46 | 14.58,14-16 Years Old,14,22,Female,14,Together,27,4,15,12,18,Suicidal,5,10 47 | 14.92,14-16 Years Old,7,13,Female,6,Separated or Divorced,14,4,13,5,11,Non-Suicidal,3,9 48 | 14.5,14-16 Years Old,14,27,Female,14,Together,25,7,18,8,16,Non-Suicidal,6,11 49 | 16.5,16-19 Years Old,9,10,Female,18,Separated or Divorced,15,6,24,6,26,Suicidal,6,5 50 | 14.67,14-16 Years Old,9,24,Female,6,Together,24,13,26,9,21,Suicidal,3,3 51 | 16.83,16-19 Years Old,13,22,Female,9,Separated or Divorced,15,4,9,9,14,Non-Suicidal,5,3 52 | 15.08,14-16 Years Old,8,29,Female,8,Separated or Divorced,30,9,21,17,29,Suicidal,7,10 53 | 15.33,14-16 Years Old,13,14,Female,5,Separated or Divorced,21,12,10,9,22,Suicidal,6,6 54 | 14.67,14-16 Years Old,9,11,Female,8,Together,19,6,10,7,18,Non-Suicidal,3,7 55 | 15.17,14-16 Years Old,11,22,Female,18,Together,20,12,13,8,22,Suicidal,8,9 56 | 16.17,14-16 Years Old,3,15,Female,18,Together,17,12,32,4,27,Non-Suicidal,7,3 57 | 16.08,14-16 Years Old,7,12,Female,5,Together,13.6292687605901,6,9,8,18,Suicidal,4,4 58 | 16.42,16-19 Years Old,3,13,Female,10,Together,14,6,20,3,10,Non-Suicidal,2,9 59 | 15.83,14-16 Years Old,5,11,Female,5,Together,16,5,11,4,8,Non-Suicidal,3,5 60 | 15.92,14-16 Years Old,10,35,Female,12,Separated or Divorced,25,4,17,8,14,Suicidal,7,5 61 | 16.25,16-19 Years Old,5,13,Female,8,Together,14,5,19,5,14,Non-Suicidal,6,6 62 | 17.75,16-19 Years Old,11,9,Female,9,Together,19,6,9,6,21,Suicidal,3,6 63 | 16,14-16 Years 
Old,6,14,Female,11,Together,17,9,18,11,17,Suicidal,5,10 64 | 15.42,14-16 Years Old,3,17,Female,16,Together,23,14,13,6,18,Non-Suicidal,3,3 65 | 16,14-16 Years Old,11,13,Female,6,Together,30,15,36,8,11,Non-Suicidal,8,6 66 | 15.17,14-16 Years Old,6,12,Female,12,Together,11,6,15,6,18,Non-Suicidal,4,5 67 | 15,14-16 Years Old,7,17,Female,11,Together,16,4,29,9,27,Non-Suicidal,8,4 68 | 14.5,14-16 Years Old,3,21,Female,5,Together,12,5,14,3,6,Non-Suicidal,8,12 69 | 15.08,14-16 Years Old,13,13,Female,7,Separated or Divorced,21,4,12,12,17,Non-Suicidal,8,9 70 | 15,14-16 Years Old,13,17,Female,9,Separated or Divorced,26,6,24,6,24,Non-Suicidal,8,9 71 | 15.08,14-16 Years Old,6,9,Female,7,Separated or Divorced,16,5,16,8,16,Non-Suicidal,4,3 72 | 15.25,14-16 Years Old,5,24,Female,18,Separated or Divorced,23.909975068745702,4,17,15,24,Non-Suicidal,2,4 73 | 15,14-16 Years Old,7,13,Female,11,Together,13,6,9,13,17,Suicidal,4,8 74 | 15.42,14-16 Years Old,13,22,Female,7,Together,18,5,13,5,28,Suicidal,8,6 75 | 14.67,14-16 Years Old,5,13,Female,8,Together,23,6,12,5,18,Non-Suicidal,5,7 76 | 15.42,14-16 Years Old,11,21,Female,6,Together,19,5,20,6,16,Non-Suicidal,5,4 77 | 15.17,14-16 Years Old,15,34,Female,22,Separated or Divorced,27,20,17,12,29,Suicidal,8,10 78 | 15.67,14-16 Years Old,6.36189616889847,11,Female,10,Together,11,5,9,8,15,Non-Suicidal,6,7 79 | 16.42,16-19 Years Old,6,10,Female,18,Together,21,6,14,10,21,Non-Suicidal,3,4 80 | 14.67,14-16 Years Old,4,11,Female,7,Together,11,10,9,4,11,Non-Suicidal,2,10 81 | 16.17,14-16 Years Old,7.35574617962391,15,Female,10,Together,14,7,14,5,20,Non-Suicidal,4,3 82 | 16.25,16-19 Years Old,5,12,Female,5,Together,16,4,13,5,14,Non-Suicidal,2,3 83 | 15.75,14-16 Years Old,13,14,Female,4,Together,12,7,15,12,9,Non-Suicidal,7,9 84 | 16.33,16-19 Years Old,4,15,Female,12,Together,14,5,13,4,10,Non-Suicidal,4,4 85 | 16.42,16-19 Years Old,12,25,Female,5,Together,18,5,16,6,20,Suicidal,3,5 86 | 17.08,16-19 Years Old,10,31,Female,16,Separated or Divorced,20,11,23,7,23,Suicidal,8,5 87 | 16.67,16-19 Years Old,7,15,Female,8,Together,21,6,10,3,11,Non-Suicidal,2,3 88 | 17.67,16-19 Years Old,4,10,Female,10,Together,22,16,10,5,15,Non-Suicidal,4,3 89 | 17.08,16-19 Years Old,7,25,Female,7,Separated or Divorced,13,6,13,5,13,Non-Suicidal,5,3 90 | 16.83,16-19 Years Old,5,17,Female,12,Separated or Divorced,13,4,20,4,14,Non-Suicidal,5,5 91 | 17,16-19 Years Old,5,20,Female,6,Separated or Divorced,11,6,12,5,19,Non-Suicidal,6,3 92 | 17.25,16-19 Years Old,11,27,Female,10,Separated or Divorced,13,12,12,8,17,Suicidal,7,7 93 | 17.25,16-19 Years Old,9,20,Female,9,Together,30,18,9,15,27,Suicidal,8,10 94 | 16.83,16-19 Years Old,11,31,Female,7,Separated or Divorced,21,7,14,13,16,Suicidal,3,5 95 | 16.42,16-19 Years Old,3,14,Female,17,Together,11,4,17,4,12,Non-Suicidal,6,4 96 | 17.42,16-19 Years Old,3,17,Female,12,Together,12,5,14,5,14,Non-Suicidal,6,3 97 | 16.42,16-19 Years Old,5,12,Female,4,Together,15,4,12,4,12,Non-Suicidal,5,5 98 | 16.75,16-19 Years Old,3,19,Female,11,Separated or Divorced,21,6,18,6,13,Non-Suicidal,5,3 99 | 17.25,16-19 Years Old,3,19,Female,8,Together,10,6,23,3,12,Non-Suicidal,2,3 100 | 17.08,16-19 Years Old,11,11,Female,5,Together,14.433093821544,8,10,6,8,Non-Suicidal,2,5 101 | 17.25,16-19 Years Old,7,16,Female,9,Together,23,6,14,10,10,Suicidal,4,3 102 | 17.33,16-19 Years Old,6,13,Female,4,Together,22,8,11,6,9,Non-Suicidal,4,3 103 | 17.08,16-19 Years Old,6,10,Female,10,Together,10,6,9,5,10,Non-Suicidal,5,4 104 | 17.25,16-19 Years Old,3,19,Female,4,Together,14,12,16,3,9,Non-Suicidal,4,3 105 | 
17.42,16-19 Years Old,3,11,Female,9,Together,12,5,10,3,12,Non-Suicidal,2,8 106 | 17.42,16-19 Years Old,5,12,Female,11,Together,26,7,9,5,18,Non-Suicidal,3,4 107 | 15.75,14-16 Years Old,13,13,Female,18,Together,18,14,9,8,21,Suicidal,4,7 108 | 16,14-16 Years Old,5,29,Female,8,Separated or Divorced,19,6,11,7,18,Non-Suicidal,6,6 109 | 16.42,16-19 Years Old,5,10,Female,9,Together,17,4,11.8560541141899,4,17,Non-Suicidal,6,3 110 | 15.58,14-16 Years Old,6,22,Female,18,Together,20,5.06898320036459,15.7517495914531,12,17,Non-Suicidal,6,7 111 | 15.83,14-16 Years Old,5,12,Female,6,Together,13,6,14,3,14,Non-Suicidal,3,4 112 | 15.58,14-16 Years Old,11,21,Female,10,Together,20,4,16,9,21,Suicidal,5,3 113 | 15.83,14-16 Years Old,3,18,Female,9,Together,18,10,14,5,10,Non-Suicidal,3,6 114 | 15.5,14-16 Years Old,3,13,Female,5,Together,19,5,15,6,11,Non-Suicidal,3,4 115 | 15.58,14-16 Years Old,3,10,Female,4,Together,25,4,9,3,14,Non-Suicidal,2,4 116 | 16.33,16-19 Years Old,3,17,Female,5,Together,18,13,13,5,14,Non-Suicidal,4,4 117 | 15.83,14-16 Years Old,11,26,Female,8,Separated or Divorced,22,6,17,9,7,Non-Suicidal,5,4 118 | 15.5,14-16 Years Old,11,12,Female,4,Together,14,18,12,14,15,Non-Suicidal,7,8 119 | 15.58,14-16 Years Old,5,16,Female,14,Together,29,4,12,13,14,Non-Suicidal,5,6 120 | 15.58,14-16 Years Old,4,13,Female,8,Together,15,5,13,8,16,Non-Suicidal,5,3 121 | 15.67,14-16 Years Old,7,20,Female,6,Together,14,6,18,4,11,Non-Suicidal,7,4 122 | 16,14-16 Years Old,5,9,Female,4,Separated or Divorced,11,4,9,4,10,Non-Suicidal,2,3 123 | -------------------------------------------------------------------------------- /powercurve-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/powercurve-1.png -------------------------------------------------------------------------------- /scaling.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | (Note that this is saved as pdf due to the inclusion of formulas) 4 | 5 | This is the start of a tutorial explaining why you should scale your 6 | variables. Some of it is still missing though. 7 | 8 | Scaling formula 9 | =============== 10 | 11 | Scaling (or Z-score normalization) follows the formula 12 | $$ 13 | x' = \\frac{x- \\bar x}{\\sigma} 14 | $$ 15 | Where *x* − *x̄* centers the variables around zero and where dividing by 16 | *σ* constitutes the normalization of the variables. 17 | 18 | The scale function performs a Z-score normalization, which can be seen 19 | here: 20 | 21 | set.seed(1) # set seed to ensure random number generation is the same 22 | x <- runif(7) # generate random numbers 23 | 24 | # Manually scaling 25 | (x - mean(x)) / sd(x) 26 | 27 | ## [1] -1.01951259 -0.68940037 -0.06788275 0.97047346 -1.21713898 0.94007371 28 | ## [7] 1.08338753 29 | 30 | #scale using scale 31 | scale(x) 32 | 33 | ## [,1] 34 | ## [1,] -1.01951259 35 | ## [2,] -0.68940037 36 | ## [3,] -0.06788275 37 | ## [4,] 0.97047346 38 | ## [5,] -1.21713898 39 | ## [6,] 0.94007371 40 | ## [7,] 1.08338753 41 | ## attr(,"scaled:center") 42 | ## [1] 0.5947772 43 | ## attr(,"scaled:scale") 44 | ## [1] 0.3229666 45 | 46 | class(scale(x)) 47 | 48 | ## [1] "matrix" 49 | 50 | *note* that scale is a class matrix, which might cause problems. If you 51 | want scale to return a numeric vector (a list of numbers) use 52 | `scale(x)[[1]]`. 53 | 54 | FAQ 55 | --- 56 | 57 | ### How do i include formula in my R markdown? 
This is done using LaTeX. You first need to install a LaTeX distribution
on your system, e.g. using the install command
`tinytex::install_tinytex()`. You also need to set your header to output
LaTeX. The header for this script is for example:

    ---
    title: "Tutorial_Scaling"
    author: "Kenneth C. Enevoldsen"
    date: "10/14/2019"
    header-includes:
       - \usepackage{bbm}
    output:
      html_document: md_document
    ---

You then include LaTeX using double dollar signs, in the following
way: `$$ x - \bar x$$`
--------------------------------------------------------------------------------
/tutorial-power-analysis-using-simr.md:
--------------------------------------------------------------------------------
# Power Analysis using simr
-------------------------

## Install packages from Git
Prior to this make sure your **R version is 3.5.3** or above. Furthermore
simr might **require dependencies**. Make sure these are met.

``` r
#library(githubinstall)
#githubinstall("simr", lib = .libPaths())
print(pacman::p_path() == .libPaths()) # should be the same!!
```

    ## [1] TRUE

``` r
pacman::p_load(tidyverse, lmerTest, simr)
```

Read the data
-------------
You shouldn't need this, but I have put it here to make sure you can
replicate my results.

``` r
CleanUpData <- function(Demo, LU, Word){

  Speech <- merge(LU, Word) %>%
    rename(
      Child.ID = SUBJ,
      Visit = VISIT) %>%
    mutate(
      Visit = as.numeric(str_extract(Visit, "\\d")),
      Child.ID = gsub("\\.", "", Child.ID)
    ) %>%
    dplyr::select(
      Child.ID, Visit, MOT_MLU, CHI_MLU, types_MOT, types_CHI, tokens_MOT, tokens_CHI
    )

  Demo <- Demo %>%
    dplyr::select(
      Child.ID, Visit, Ethnicity, Diagnosis, Gender, Age, ADOS, MullenRaw, ExpressiveLangRaw, Socialization
    ) %>%
    mutate(
      Child.ID = gsub("\\.", "", Child.ID)
    )

  Data = merge(Demo, Speech, all = T)

  Data1 = Data %>%
    subset(Visit == "1") %>%
    dplyr::select(Child.ID, ADOS, ExpressiveLangRaw, MullenRaw, Socialization) %>%
    rename(Ados1 = ADOS,
           verbalIQ1 = ExpressiveLangRaw,
           nonVerbalIQ1 = MullenRaw,
           Socialization1 = Socialization)

  Data = merge(Data, Data1, all = T) %>%
    mutate(
      Child.ID = as.numeric(as.factor(as.character(Child.ID))),
      Visit = as.numeric(as.character(Visit)),
      Gender = recode(Gender,
                      "1" = "M",
                      "2" = "F"),
      Diagnosis = recode(Diagnosis,
                         "A" = "ASD", # Note that this function is fixed here
                         "B" = "TD")
    )

  return(Data)
}

# Training Data
#setwd('~/Dropbox/2019 - methods 3/Assignments19/Assignment2/solutions/')
Demo <- read_csv('../../Assignment1/data/demo_train.csv')
LU <- read_csv('../../Assignment1/data/LU_train.csv')
Word <- read_csv('../../Assignment1/data/token_train.csv')
TrainData <- CleanUpData(Demo, LU, Word)
Demo <- read_csv('../../Assignment2/data/demo_test.csv')
LU <- read_csv('../../Assignment2/data/LU_test.csv')
Word <- read_csv('../../Assignment2/data/token_test.csv')
TestData <- CleanUpData(Demo, LU, Word)

# merge training and testing
Data <- merge(TrainData, TestData, all = T)

Data <- Data[complete.cases(Data[, c("CHI_MLU", "Visit", "Diagnosis", "verbalIQ1", "Child.ID")]), ]
Data$Child.ID <- as.factor(Data$Child.ID)
```

------------------------------------------------------------------------

Create a model
--------------

``` r
InteractionM <- lmer(CHI_MLU ~ Visit * Diagnosis + (1 + Visit | Child.ID),
                     Data, REML = F,
                     control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)) #optimizer should be relevant
```

This model encodes the following hypotheses and assumptions:

1.  We expect child MLU to vary depending on visit and diagnosis

2.  We expect the effect of visit to vary according to diagnosis
    (interaction)

3.  We have multiple datapoints per child ID

4.  We expect each child to start from a different point (random intercept)
    and to increase MLU at different rates (random slope)

## What is the power of Diagnosis?

``` r
sim = powerSim(InteractionM, fixed("Diagnosis"), nsim = 50, seed = 1, progress = F)
```

    ## Warning in observedPowerWarning(sim): This appears to be an "observed
    ## power" calculation

``` r
sim
```

    ## Power for predictor 'Diagnosis', (95% confidence interval):
    ##       100.0% (92.89, 100.0)
    ## 
    ## Test: unknown test
    ## 
    ## Based on 50 simulations, (50 warnings, 0 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 3 s
    ## 
    ## nb: result might be an observed power calculation

``` r
# progress just turns off the progress bar - makes for a nicer reading experience
# seed = 1 ensures I get the same result also when knitting
```

Oh, 100% - not too shabby. Note that it gives a warning, which is the same
as for Visit; let's just ignore it for now. It also warns us that this is an
"observed power" calculation, which simply means that the effect size used in
the simulation is the one estimated from the model. We can change this using:

`fixef(InteractionM)["DiagnosisTD"] <- 0.4`

This would be the *minimal interesting effect*, i.e. if the effect is smaller
than this we are not interested in it, and consequently we should not base the
power analysis on anything smaller. This is *naturally* context dependent and
you should argue for your minimal interesting effect. You could for example
argue that the mean of the ASD group should differ from the mean of the TD
group by 1 SD (you would need to calculate this), or you could (and probably
should) examine the literature to find out what a relevant effect might be.

------------------------------------------------------------------------

## What about Visit?

``` r
sim = powerSim(InteractionM, fixed("Visit"), nsim = 50, seed = 1, progress = F)
```

    ## Warning in observedPowerWarning(sim): This appears to be an "observed
    ## power" calculation

``` r
sim
```

    ## Power for predictor 'Visit', (95% confidence interval):
    ## 0.00% ( 0.00, 7.11)
    ## 
    ## Test: unknown test
    ## Effect size for Visit is 0.13
    ## 
    ## Based on 50 simulations, (50 warnings, 50 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 4 s
    ## 
    ## nb: result might be an observed power calculation

**50 errors** - that is a problem (if there are any errors you should not
trust the power estimate).
Let's examine:

``` r
print(sim$errors$message[1])
```

    ## [1] "Models have either equal fixed mean stucture or are not nested"

``` r
print(sim$warnings$message[1])
```

    ## [1] "Main effect (Visit) was tested but there were interactions."

So from this we see that simr is warning us about testing a main effect while
an interaction involving it is in the model. If we want to examine the main
effect independently, we should fit a model without the interaction. (Try to
check whether the warning message for Diagnosis is the same.)

**Sidenote:** This is a bad design for a function output, as

1.  the errors aren't clearly visible

2.  the relevant error is 'hidden' in the warning, while the actual
    error message is gibberish

3.  worst of all, someone might interpret these results without noticing
    the errors

------------------------------------------------------------------------

## And, lastly, the interaction

``` r
fixef(InteractionM)["Visit:DiagnosisTD"] <- 0.1 # let's try setting a fixed effect
powerSim(InteractionM, fixed("Visit:Diagnosis"), nsim = 50, seed = 1, progress = F)
```

    ## Power for predictor 'Visit:Diagnosis', (95% confidence interval):
    ## 76.00% (61.83, 86.94)
    ## 
    ## Test: unknown test
    ## 
    ## Based on 50 simulations, (0 warnings, 0 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 10 s

Assuming 0.1 is the minimal interesting effect, we don't have enough power.
Consequently we should add some more participants. Let's try to do that:

``` r
InteractionM <- extend(InteractionM, along = "Child.ID", n = 120) # extend data along child ID

# plot the power curve
powerCurveV1 = powerCurve(InteractionM, fixed("Visit:Diagnosis"), along = "Child.ID",
                          nsim = 10, breaks = seq(from = 10, to = 120, by = 5), seed = 1, progress = F) # waaay too few simulations
# breaks specifies the sample sizes at which to do a power calculation (here every 5th child)
plot(powerCurveV1)
```

![](powercurve-1.png)

Here we see that, given the specified effect, we would only have enough power
with around 75 participants!!

**Is 0.1 a good estimate?** I would argue it is set too low. Why? Well, because
a TD child who only improves by 0.1 MLU per visit is improving rather slowly.
This conclusion naturally takes into consideration the time between each visit
and what we know about the development of children's MLU.

---

## *Additions aka. FAQ*
These are added in response to questions I get, so everyone can get to see them.

### Regarding choosing an effect size - an example
Choosing an effect size can be hard, but it is highly important, and it requires field expertise as well as knowledge of, and assumptions about, the data-generating process.
For example, suppose you wanted to measure whether Cognitive Science students are smarter than the average student at Aarhus by measuring IQ. We would then run a simulation for a power analysis and specify the smallest interesting difference between the groups. Assume the model we choose is:

``` r
lm(IQ ~ is_cogsci)
```

The effect of interest corresponds to the beta estimate (i.e. the slope) of the model. Naturally, a slope of 0-1 would be only a minimal effect: CogSci students would be only **very** slightly smarter than the rest of the students, and you would need a big sample to detect this quite uninteresting result. Alternatively, let's say we only consider a beta estimate above 5 interesting: we then set the minimal interesting effect to 5 and can estimate how many participants we would need to detect an effect of that size.
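To make this concrete, here is a minimal sketch of such a simulation (not part of the original tutorial). It assumes IQ is roughly normal with a mean of 100 and an SD of 15, reuses the hypothetical group indicator `is_cogsci` from the model above, and sets the minimal interesting effect to 5 IQ points; the function name `simulate_power` is made up for illustration:

``` r
# Monte Carlo power estimate for a two-group comparison with lm(),
# assuming IQ ~ Normal(100, 15) and a minimal interesting effect of 5 points
simulate_power <- function(n_per_group, effect = 5, sd = 15, nsim = 1000, alpha = 0.05) {
  p_values <- replicate(nsim, {
    is_cogsci <- rep(c(0, 1), each = n_per_group)                          # group indicator
    IQ <- rnorm(2 * n_per_group, mean = 100 + effect * is_cogsci, sd = sd) # simulated scores
    fit <- lm(IQ ~ is_cogsci)
    summary(fit)$coefficients["is_cogsci", "Pr(>|t|)"]                     # p-value of the slope
  })
  mean(p_values < alpha) # proportion of significant simulations = estimated power
}

set.seed(1)
sapply(c(20, 40, 80, 160), simulate_power) # power at a few sample sizes per group
```

Looping over a grid of effect sizes instead of sample sizes gives the "turned around" version of the question discussed next.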
Note that this can also be turned around. E.g. you could ask: *"given that we
collect 30 participants, what is the smallest effect we can detect while
maintaining a power of 80%?"*. You could answer this by simply looping over a
range of plausible effect sizes and estimating the power for each.

Using field-specific knowledge in this way is also very similar to setting
priors in a Bayesian framework, so it might be an idea to get used to
quantifying your beliefs, as you will have to do this more in the future.


### Regarding alpha at 0.05 and beta at 0.8
The assignment refers to the values alpha = 0.05 and beta = 0.8.
These two values signify:

-   alpha = 0.05: the significance level we will be using
-   beta = 0.8: the power we aim for in the power simulation, i.e. 80%
--------------------------------------------------------------------------------
/tutorial-scaling.md:
--------------------------------------------------------------------------------


This is the start of a tutorial explaining why you should scale your
variables. Some of it is still missing though.

Scaling formula
===============

Scaling (or Z-score normalization) follows the formula
$$
x' = \\frac{x - \\bar x}{\\sigma}
$$
where *x* − *x̄* centers the variables around zero and where dividing by
*σ* constitutes the normalization of the variables.

The scale function performs a Z-score normalization, which can be seen
here:

    set.seed(1) # set seed to ensure random number generation is the same
    x <- runif(7) # generate random numbers

    # Manually scaling
    (x - mean(x)) / sd(x)

    ## [1] -1.01951259 -0.68940037 -0.06788275  0.97047346 -1.21713898  0.94007371
    ## [7]  1.08338753

    # scale using scale
    scale(x)

    ##             [,1]
    ## [1,] -1.01951259
    ## [2,] -0.68940037
    ## [3,] -0.06788275
    ## [4,]  0.97047346
    ## [5,] -1.21713898
    ## [6,]  0.94007371
    ## [7,]  1.08338753
    ## attr(,"scaled:center")
    ## [1] 0.5947772
    ## attr(,"scaled:scale")
    ## [1] 0.3229666

    class(scale(x))

    ## [1] "matrix"

*note* that the output of scale is of class matrix, which might cause
problems. If you want a plain numeric vector (a list of numbers) instead, use
`scale(x)[,1]` or `as.numeric(scale(x))`; `scale(x)[[1]]` would only return the
first element. Furthermore, note that the mean (`scaled:center`) and SD
(`scaled:scale`) are also provided by the scale function, which is useful for
going back to the original values.
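As a small illustration (not in the original text) of using those attributes to go back to the original values:

    # undo the scaling using the attributes stored by scale()
    x_scaled <- scale(x)
    x_original <- x_scaled * attr(x_scaled, "scaled:scale") + attr(x_scaled, "scaled:center")
    all.equal(as.numeric(x_original), x) # TRUE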
Will come in the future
-----------------------

1.  Why scaling improves model fit (and convergence)
2.  Why centering (and scaling) improves model interpretability
3.  Other ways to scale / normalize

FAQ
---

### How do I include formulas in my R Markdown?

This is done using LaTeX. You first need to install a LaTeX distribution
on your system, e.g. using the install command
`tinytex::install_tinytex()`. You also need to set your header to output
LaTeX. The header for this script is for example:

    ---
    title: "Tutorial_Scaling"
    author: "Kenneth C. Enevoldsen"
    date: "10/14/2019"
    header-includes:
       - \usepackage{bbm}
    output:
      html_document: md_document
    ---

You then include LaTeX using double dollar signs, in the following
way: `$$ x - \bar x$$`
--------------------------------------------------------------------------------
/tutorials-tidy-models.md:
--------------------------------------------------------------------------------
Currently the commentary in this file is fairly limited; let me know if
there are specific points which you feel need more explanation.

Load packages
-------------

``` r
pacman::p_load(pacman, tidyverse, tidymodels, groupdata2)
```

For more on the data see [Lacourse et al.
(2001)](https://www.researchgate.net/publication/226157359_Heavy_Metal_Music_and_Adolescent_Suicidal_Risk)

Load Data and Data Preparation
------------------------------

``` r
heavy_data <- read_csv("datasets/Lacourse et al. (2001) Females.csv", col_types = cols()) %>%
  mutate_at(c("Age_Group", "Marital_Status", "Gender", "Suicide_Risk"), as.factor)

#examine dataset
heavy_data %>% head(5) %>% knitr::kable()
```

| Age   | Age\_Group      | Drug\_Use | Father\_Negligence | Gender | Isolation | Marital\_Status        | Meaninglessness | Metal    | Mother\_Negligence | Normlessness | Self\_Estrangement | Suicide\_Risk | Vicarious | Worshipping |
|------:|:----------------|----------:|-------------------:|:-------|----------:|:-----------------------|----------------:|---------:|-------------------:|-------------:|-------------------:|:--------------|----------:|------------:|
| 15.83 | 14-16 Years Old | 8         | 17                 | Female | 6         | Together               | 10              | 4.837625 | 10                 | 6            | 15                 | Non-Suicidal  | 5         | 4           |
| 14.92 | 14-16 Years Old | 9         | 23                 | Female | 8         | Together               | 26              | 6.000000 | 12                 | 8            | 20                 | Non-Suicidal  | 4         | 6           |
| 15.33 | 14-16 Years Old | 5         | 15                 | Female | 18        | Together               | 19              | 6.000000 | 16                 | 7            | 17                 | Non-Suicidal  | 6         | 3           |
| 15.83 | 14-16 Years Old | 11        | 11                 | Female | 9         | Separated or Divorced  | 13              | 4.000000 | 10                 | 5            | 12                 | Non-Suicidal  | 3         | 3           |
| 14.92 | 14-16 Years Old | 7         | 13                 | Female | 5         | Together               | 13              | 8.000000 | 16                 | 3            | 6                  | Non-Suicidal  | 3         | 9           |

``` r
# Prep the dataset
df <- heavy_data %>%
  select(-Age_Group, -Gender) # Age_Group is highly correlated with Age, and Gender is always Female
```

Partitioning the data
---------------------

We use groupdata2; note that you can also use rsample, which follows the
tidymodels framework, but it does not (at least not easily) allow for
categorical and ID columns.

``` r
set.seed(5)
df_list <- partition(df, p = 0.2, cat_col = c("Suicide_Risk"), id_col = NULL, list_out = T)
df_test = df_list[[1]]
df_train = df_list[[2]]
```

If you have an ID column you should remove it after this step; otherwise it
will be included as a predictor.

Let's define a 'recipe' that preprocesses the data
--------------------------------------------------

Note that the naming in recipes (a package within tidymodels) can be
rather odd. They really dig the recipe metaphor.
``` r
#create recipe
rec <- df_train %>%
  recipe(Suicide_Risk ~ .) %>%      # defines the outcome
  step_center(all_numeric()) %>%    # center numeric predictors
  step_scale(all_numeric()) %>%     # scale numeric predictors
  step_corr(all_numeric()) %>%      # remove highly correlated predictors
  check_missing(everything()) %>%   # fail if there are missing values
  prep(training = df_train)

train_baked <- juice(rec) # extract df_train from rec

rec # inspect rec
```

    ## Data Recipe
    ## 
    ## Inputs:
    ## 
    ##       role #variables
    ##    outcome          1
    ##  predictor         12
    ## 
    ## Training data contained 97 data points and no missing data.
    ## 
    ## Operations:
    ## 
    ## Centering for Age, Drug_Use, ... [trained]
    ## Scaling for Age, Drug_Use, ... [trained]
    ## Correlation filter removed no terms [trained]
    ## Check missing values for Age, Drug_Use, ... [trained]

This is naturally just an example of how a recipe could be constructed. It
does not need to check for e.g. missing values or correlations if I know they
are not an issue.

### Apply recipe to test

``` r
test_baked <- rec %>% bake(df_test)
```

**note** that `juice(rec)` simply extracts `df_train` from the recipe (`rec`);
to convince yourself that they are indeed the same, compare `juice(rec)` with
`rec %>% bake(df_train)`.

Creating models
---------------

To see all the available models in parsnip (part of tidymodels) go
[here](https://tidymodels.github.io/parsnip/articles/articles/Models.html).

``` r
#logistic regression
log_fit <-
  logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(Suicide_Risk ~ . , data = train_baked)

#support vector machine
svm_fit <-
  svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab") %>%
  fit(Suicide_Risk ~ . , data = train_baked)
```
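One nice thing about parsnip is that other models follow exactly the same interface, so only the model specification and engine change. As a minimal sketch (not in the original tutorial, and assuming the randomForest package is installed), a random forest could be fitted in the same way:

``` r
#random forest (sketch) - same interface, different model specification and engine
rf_fit <-
  rand_forest(trees = 100) %>%
  set_mode("classification") %>%
  set_engine("randomForest") %>%
  fit(Suicide_Risk ~ . , data = train_baked)
```

Predictions from `rf_fit` would then work exactly like the ones for `log_fit` and `svm_fit` below.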
Apply model to test set
-----------------------

``` r
#predict class
log_class <- log_fit %>%
  predict(new_data = test_baked)
#get probability of class
log_prop <- log_fit %>%
  predict(new_data = test_baked, type = "prob") %>%
  pull(.pred_Suicidal)

#get multiple at once
test_results <-
  test_baked %>%
  select(Suicide_Risk) %>%
  mutate(
    log_class = predict(log_fit, new_data = test_baked) %>%
      pull(.pred_class),
    log_prob = predict(log_fit, new_data = test_baked, type = "prob") %>%
      pull(.pred_Suicidal),
    svm_class = predict(svm_fit, new_data = test_baked) %>%
      pull(.pred_class),
    svm_prob = predict(svm_fit, new_data = test_baked, type = "prob") %>%
      pull(.pred_Suicidal)
  )

test_results %>%
  head(5) %>%
  knitr::kable() #examine the first 5
```

| Suicide\_Risk | log\_class   | log\_prob | svm\_class   | svm\_prob |
|:--------------|:-------------|----------:|:-------------|----------:|
| Non-Suicidal  | Non-Suicidal | 0.0038562 | Non-Suicidal | 0.0354329 |
| Non-Suicidal  | Non-Suicidal | 0.2125898 | Non-Suicidal | 0.5161208 |
| Non-Suicidal  | Non-Suicidal | 0.3828569 | Non-Suicidal | 0.3209701 |
| Non-Suicidal  | Non-Suicidal | 0.0860703 | Non-Suicidal | 0.0462909 |
| Non-Suicidal  | Non-Suicidal | 0.0729597 | Non-Suicidal | 0.3043180 |

Performance metrics
-------------------

``` r
metrics(test_results, truth = Suicide_Risk, estimate = log_class) %>%
  knitr::kable()
```

| .metric  | .estimator | .estimate |
|:---------|:-----------|----------:|
| accuracy | binary     | 0.8750000 |
| kap      | binary     | 0.6470588 |

``` r
metrics(test_results, truth = Suicide_Risk, estimate = svm_class) %>%
  knitr::kable()
```

| .metric  | .estimator | .estimate |
|:---------|:-----------|----------:|
| accuracy | binary     | 0.8333333 |
| kap      | binary     | 0.5000000 |

``` r
#plotting the roc curve:
test_results %>%
  roc_curve(truth = Suicide_Risk, log_prob) %>%
  autoplot()
```

![](tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png)

``` r
test_results %>%
  mutate(log_prob = 1 - log_prob) %>% # for the plot to show correctly (otherwise the line would be flipped)
  gain_curve(truth = Suicide_Risk, log_prob) %>%
  autoplot()
```

![](tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png)

For more on how to interpret ROC curves see this [Towards Data Science
blogpost](https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5)

For more on how to interpret gain curves see this [Stack Exchange
question](https://stats.stackexchange.com/questions/279385/gain-curve-interpretation)

Multiple Cross Validation
-------------------------

``` r
# create 10 folds, 10 times, and make sure Suicide_Risk is balanced across folds
cv_folds <- vfold_cv(df_train, v = 10, repeats = 10, strata = Suicide_Risk)
```

**Note** that to respect ID you should use `group_vfold_cv()` or specify `group`.
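For completeness, a minimal sketch of such an ID-respecting split (hypothetical, since this dataset has no ID column; `ID` is an assumed column name):

``` r
# sketch: keep all rows from the same (hypothetical) participant in the same fold
cv_folds_id <- group_vfold_cv(df_train, group = ID, v = 10)
```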
That was almost too easy. Now let us prepare the dataset and then fit the
model to each subset.

``` r
#prepare data set and fetch train data
cv_folds <- cv_folds %>%
  mutate(recipes = splits %>%
           # prepper is a wrapper for `prep()` which handles `split` objects
           map(prepper, recipe = rec),
         train_data = splits %>% map(training))

# train a model on each fold
# create a non-fitted model
log_fit <-
  logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm")


cv_folds <- cv_folds %>% mutate(
  log_fits = pmap(list(recipes, train_data), #input
                  ~ fit(log_fit, formula(.x), data = bake(object = .x, new_data = .y)) # function to apply
                  ))
```

`pmap` is a multi-input version of `map`. It has the syntax
`pmap(list(x, y), ~ .x + .y)`, where `x` and `y` are the inputs and `~ .x + .y`
is the function to apply. You could also have used `map2`, which is
specialised for exactly two inputs: `map2(x, y, ~ .x + .y)`.

``` r
cv_folds %>% head(5)
```

    ## # A tibble: 5 x 6
    ##   splits id       id2    recipes train_data log_fits
    ## *
    ## 1        Repeat01 Fold01
    ## 2        Repeat01 Fold02
    ## 3        Repeat01 Fold03
    ## 4        Repeat01 Fold04
    ## 5        Repeat01 Fold05

Note how the dataframe looks. Take some time to understand it, and note
especially that the cells contain entire datasets and their respective
recipes and models.

Now it gets slightly more complicated: we create a function which takes in a
split (fold) from the above cross-validation and applies a recipe and a model
to it, returning a tibble containing the actual and predicted results. We then
apply it to each split.

``` r
predict_log <- function(split, rec, model) {
  # IN
  # split: a split (fold) of the data
  # rec: recipe to prepare the data
  # model: a fitted model to predict with
  #
  # OUT
  # a tibble of the actual and predicted results
  baked_test <- bake(rec, testing(split))
  tibble(
    actual = baked_test$Suicide_Risk,
    predicted = predict(model, new_data = baked_test) %>% pull(.pred_class),
    prop_sui = predict(model, new_data = baked_test, type = "prob") %>% pull(.pred_Suicidal),
    prop_non_sui = predict(model, new_data = baked_test, type = "prob") %>% pull(`.pred_Non-Suicidal`)
  )
}

# apply our function to each split, with its respective recipe and model (in this case the logistic fits), and save the result in a new column
cv_folds <- cv_folds %>%
  mutate(pred = pmap(list(splits, recipes, log_fits), predict_log))
```

Performance metrics
-------------------

``` r
eval <-
  cv_folds %>%
  mutate(
    metrics = pmap(list(pred), ~ metrics(., truth = actual, estimate = predicted, prop_sui))) %>%
  select(id, id2, metrics) %>%
  unnest(metrics)

#inspect performance metrics
eval %>%
  select(repeat_n = id, fold_n = id2, metric = .metric, estimate = .estimate) %>%
  spread(metric, estimate) %>%
  head() %>%
  knitr::kable()
```

| repeat\_n | fold\_n | accuracy  | kap        | mn\_log\_loss | roc\_auc  |
|:----------|:--------|----------:|-----------:|--------------:|----------:|
| Repeat01  | Fold01  | 0.6363636 | 0.0833333  | 1.282365      | 0.4583333 |
| Repeat01  | Fold02  | 0.6363636 | -0.1578947 | 2.155142      | 0.7916667 |
| Repeat01  | Fold03  | 0.7272727 | 0.2325581  | 3.027221      | 0.7083333 |
| Repeat01  | Fold04  | 0.8000000 | 0.5238095  | 2.302621      | 0.9523810 |
| Repeat01  | Fold05  | 0.8888889 | 0.6086957  | 2.145611      | 0.5000000 |
| Repeat01  | Fold06  | 0.7777778 | 0.3571429  | 1.693287      | 0.7857143 |
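A natural next step (not shown in the original tutorial) is to summarise these fold-level results, e.g. by averaging each metric across all repeats and folds:

``` r
# average (and spread of) each metric across all repeats and folds
eval %>%
  group_by(.metric) %>%
  summarise(mean_estimate = mean(.estimate),
            sd_estimate = sd(.estimate)) %>%
  knitr::kable()
```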
FAQ
---

### We have problems with merge and IDs, what should we do?

You should first of all check that the following code evaluates to TRUE,
i.e. there should be as many datapoints as you have pitch files.

    nrow(df) == length(list.files(path = filepath, pattern = ".txt", full.names = T))

Furthermore you should have unique participant IDs. You can check this
using the following code. Before you use it, write a comment about what
it does and why it works. What should the maximum value of
`n_studies_pr_id` be?

    df %>% select(id, study) %>% unique() %>% group_by(id) %>% summarise(n_studies_pr_id = n()) %>% View()

### Can I please have some more cvms?

Of course you can! The following is an example sent to me by Ludvig on
how to use tidymodels with cvms. Note that you will need to update cvms
from CRAN before the following works.

``` r
# Attach packages
pacman::p_load(cvms, tidymodels)

# Prepare data
dat <- groupdata2::fold(participant.scores, k = 4,
                        cat_col = 'diagnosis',
                        id_col = 'participant')
dat[["diagnosis"]] <- factor(dat[["diagnosis"]])

# Create a model function (random forest in this case)
# that takes train_data and formula as arguments
# and returns the fitted model object
rf_model_fn <- function(train_data, formula){
  rand_forest(trees = 100, mode = "classification") %>%
    set_engine("randomForest") %>%
    fit(formula, data = train_data)
}

# Create a predict function
# Usually just wraps stats::predict
# Takes test_data, model and formula arguments
# and returns vector with probabilities of class 1
# (this depends on the type of task, gaussian, binomial or multinomial)
rf_predict_fn <- function(test_data, model, formula){
  stats::predict(object = model, new_data = test_data, type = "prob")[[2]]
}

# Now cross-validation
# Note the different argument names from cross_validate()
CV <- cross_validate_fn(
  dat,
  model_fn = rf_model_fn,
  formulas = c("diagnosis~score", "diagnosis~age"),
  fold_cols = '.folds',
  type = 'binomial',
  predict_fn = rf_predict_fn
)

#inspect data
CV %>%
  select(1:6) %>% #select only the first 6 cols
  head(2) %>% #select only the first two rows
  knitr::kable()
```

| Balanced Accuracy | F1        | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value |
|------------------:|----------:|------------:|------------:|---------------:|---------------:|
| 0.5972222         | 0.7179487 | 0.7777778   | 0.4166667   | 0.6666667      | 0.5555556      |
| 0.2916667         | 0.3636364 | 0.3333333   | 0.2500000   | 0.4000000      | 0.2000000      |

### I want to do more!

Well you crazy bastard! I encourage you to do one of these two things
(or both):

1.  do a bootstrapped and nested cross-validation

2.  make an ensemble model, which combines the output of multiple models.
    Does combining models yield better results? What are potential issues
    with this approach?

### I have 100% accuracy!

It seems like you have some kind of data leakage. Try to check the
following:

1.  Check that you don't use a predictor which gives away the diagnosis,
    e.g. filename, symptom severity, etc.

2.  If you use a random intercept for ID, make sure that the model doesn't
    apply the random intercepts it learned during fitting when you predict on
    the test set. One way to do this is to simply change the IDs after fitting
    (e.g. `ID_101` -> `ID_101_test`); see the sketch below. An even simpler
    way is to remove the random intercept entirely. I suggest comparing the
    two approaches and seeing which one performs best.
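A minimal sketch of the ID-renaming idea from point 2, assuming a model `m` fitted with lme4/lmerTest and a test set `test_data` with an `ID` column (the names are illustrative, not from the original):

``` r
# give the test participants IDs the model has never seen,
# then allow new levels so they get population-level (random effect = 0) predictions
test_data <- test_data %>%
  mutate(ID = paste0(ID, "_test"))

predict(m, newdata = test_data, allow.new.levels = TRUE)
```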
### How do I add a random effect to tidy models?

As far as I know you can't (at least not yet); see [this GitHub issue
for more](https://github.com/tidymodels/parsnip/issues/35). But I
recommend looking at the FAQ titled 'Can I please have some more cvms?'
--------------------------------------------------------------------------------
/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png
--------------------------------------------------------------------------------
/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png
--------------------------------------------------------------------------------