├── .gitignore ├── README.md ├── datasets └── Lacourse et al. (2001) Females.csv ├── powercurve-1.png ├── scaling.md ├── tutorial-power-analysis-using-simr.md ├── tutorial-scaling.md ├── tutorials-tidy-models.md └── tutorials-tidy-models_files └── figure-markdown_github ├── unnamed-chunk-6-1.png └── unnamed-chunk-6-2.png /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | Exp-Meth-III-Tutorials.Rproj 6 | *.rmd 7 | .DS_Store 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Intro 2 | 3 | This github contains tutorials for experimental methods III. 4 | 5 | The benefit of this page is 6 | 1) Everyone get to see all the relevant answers to question 7 | 2) You have ONE place rather than multiple slides, emails and facebook post 8 | 3) It get you in the habbit of using github 9 | 10 | The tutorials on this page include: 11 | 1) [An introduction to the benefits of scaling](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorial-scaling.md) 12 | 1) [A tutorial to power analysis using the simr package](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorial-power-analysis-using-simr.md) 13 | 1) [A tutorial in using tidy models](https://github.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/blob/master/tutorials-tidy-models.md) 14 | -------------------------------------------------------------------------------- /datasets/Lacourse et al. (2001) Females.csv: -------------------------------------------------------------------------------- 1 | Age,Age_Group,Drug_Use,Father_Negligence,Gender,Isolation,Marital_Status,Meaninglessness,Metal,Mother_Negligence,Normlessness,Self_Estrangement,Suicide_Risk,Vicarious,Worshipping 2 | 15.83,14-16 Years Old,8,17,Female,6,Together,10,4.83762518844781,10,6,15,Non-Suicidal,5,4 3 | 14.92,14-16 Years Old,9,23,Female,8,Together,26,6,12,8,20,Non-Suicidal,4,6 4 | 15.33,14-16 Years Old,5,15,Female,18,Together,19,6,16,7,17,Non-Suicidal,6,3 5 | 15.83,14-16 Years Old,11,11,Female,9,Separated or Divorced,13,4,10,5,12,Non-Suicidal,3,3 6 | 14.92,14-16 Years Old,7,13,Female,5,Together,13,8,16,3,6,Non-Suicidal,3,9 7 | 14.58,14-16 Years Old,4,29,Female,15,Separated or Divorced,18,7,18,5,15,Suicidal,2,4 8 | 14.5,14-16 Years Old,5,10,Female,8,Together,12,8,9,6,10,Non-Suicidal,3,4 9 | 15.67,14-16 Years Old,7,27,Female,6,Separated or Divorced,18,4,12,7,12,Non-Suicidal,3,4 10 | 14.92,14-16 Years Old,5,23,Female,10,Together,29,14,21,4,28,Non-Suicidal,8,9 11 | 15,14-16 Years Old,4,12,Female,5,Together,22,4,15,7,7,Non-Suicidal,5,9 12 | 15.17,14-16 Years Old,3,19,Female,12,Together,20,8,11,8,19,Non-Suicidal,5,7 13 | 17.17,16-19 Years Old,8,30,Female,14,Separated or Divorced,13,10,29,6,12,Suicidal,6,12 14 | 16.83,16-19 Years Old,11,12,Female,7,Together,22,4,22,7,19,Non-Suicidal,6,7 15 | 16.75,16-19 Years Old,6,16,Female,14,Together,25,7,9,11,10,Non-Suicidal,5,3 16 | 16.58,16-19 Years Old,8.43970783004555,20,Female,5,Together,16,4,15,4,9,Non-Suicidal,8,4 17 | 16.5,16-19 Years Old,5,17,Female,12,Together,17,7,9,6,19,Non-Suicidal,3,7 18 | 16.5,16-19 Years Old,3,24,Female,13,Separated or Divorced,21,7,24,7,16,Non-Suicidal,4,6 19 | 16.75,16-19 Years Old,3,22,Female,16,Together,30,4,31,7,11,Non-Suicidal,4,5 20 | 17,16-19 Years Old,5,20,Female,12,Together,13,4,14,4,12,Non-Suicidal,3,3 21 | 17.25,16-19 Years 
Old,5,31,Female,16,Separated or Divorced,23,5,21,6,22,Suicidal,5,7 22 | 17.17,16-19 Years Old,3,26,Female,15,Together,16,5,19.5824136069868,4,24,Non-Suicidal,5,9 23 | 17.42,16-19 Years Old,7.15478004110386,23,Female,15,Together,18,5,36,9,14,Non-Suicidal,6,3 24 | 17.08,16-19 Years Old,5,16,Female,10.9559005734461,Together,21,6.53961748140933,13,8,15,Non-Suicidal,6,4 25 | 17.92,16-19 Years Old,11,16,Female,6,Together,17,10,13,12,16,Suicidal,7,6 26 | 17.25,16-19 Years Old,7.87460242195194,26,Female,13,Together,24,6,10,8,10,Non-Suicidal,8,7 27 | 18.5,16-19 Years Old,6,33,Female,6,Separated or Divorced,25,4,9,6,14,Non-Suicidal,4,3 28 | 16.92,16-19 Years Old,5,27,Female,23,Together,10,12,19,5,22,Non-Suicidal,5,4 29 | 17.33,16-19 Years Old,12,14,Female,10,Together,10,6,15,7,16,Suicidal,3,5 30 | 17.42,16-19 Years Old,7,16,Female,6,Together,22,5,13,7,18,Non-Suicidal,5,4 31 | 17.17,16-19 Years Old,7,10,Female,5,Together,11,6,9,5,12,Suicidal,7,4 32 | 16.58,16-19 Years Old,8,12,Female,6,Together,13,9,12,6,11,Non-Suicidal,5,10 33 | 16.58,16-19 Years Old,5,21,Female,8,Together,15,4,18,7,14,Non-Suicidal,6,7 34 | 17.42,16-19 Years Old,6,30,Female,7,Together,25,9,24,4,14,Non-Suicidal,5,3 35 | 16.75,16-19 Years Old,11,24,Female,9,Together,15,7,15,8,12,Non-Suicidal,4,4 36 | 15,14-16 Years Old,11,29,Female,7,Together,26,11,10,6,7,Suicidal,5,10 37 | 16.5,16-19 Years Old,9,10,Female,4,Together,10,4,9,7,5,Non-Suicidal,3,4 38 | 16.58,16-19 Years Old,5,16,Female,13,Together,13,7,16,4,17,Non-Suicidal,5,4 39 | 16.67,16-19 Years Old,6,18,Female,11,Together,19,4,14,3,10,Non-Suicidal,5,3 40 | 17,16-19 Years Old,5,16,Female,8,Together,14,8.52820197005175,12,5,5,Suicidal,3,3 41 | 17.25,16-19 Years Old,7,12,Female,4,Together,8,4,16,10,10,Suicidal,2,3 42 | 14.58,14-16 Years Old,5,9,Female,4,Separated or Divorced,8,11,12,6,8,Non-Suicidal,8,6 43 | 14.75,14-16 Years Old,6,15,Female,6,Together,17,5,19,4,17,Non-Suicidal,5,5 44 | 14.5,14-16 Years Old,11,29,Female,10,Together,15,6.91536148045185,21,9,13,Non-Suicidal,2,3 45 | 14.75,14-16 Years Old,3,9,Female,4,Together,13,5,9,3,8,Non-Suicidal,3,9 46 | 14.58,14-16 Years Old,14,22,Female,14,Together,27,4,15,12,18,Suicidal,5,10 47 | 14.92,14-16 Years Old,7,13,Female,6,Separated or Divorced,14,4,13,5,11,Non-Suicidal,3,9 48 | 14.5,14-16 Years Old,14,27,Female,14,Together,25,7,18,8,16,Non-Suicidal,6,11 49 | 16.5,16-19 Years Old,9,10,Female,18,Separated or Divorced,15,6,24,6,26,Suicidal,6,5 50 | 14.67,14-16 Years Old,9,24,Female,6,Together,24,13,26,9,21,Suicidal,3,3 51 | 16.83,16-19 Years Old,13,22,Female,9,Separated or Divorced,15,4,9,9,14,Non-Suicidal,5,3 52 | 15.08,14-16 Years Old,8,29,Female,8,Separated or Divorced,30,9,21,17,29,Suicidal,7,10 53 | 15.33,14-16 Years Old,13,14,Female,5,Separated or Divorced,21,12,10,9,22,Suicidal,6,6 54 | 14.67,14-16 Years Old,9,11,Female,8,Together,19,6,10,7,18,Non-Suicidal,3,7 55 | 15.17,14-16 Years Old,11,22,Female,18,Together,20,12,13,8,22,Suicidal,8,9 56 | 16.17,14-16 Years Old,3,15,Female,18,Together,17,12,32,4,27,Non-Suicidal,7,3 57 | 16.08,14-16 Years Old,7,12,Female,5,Together,13.6292687605901,6,9,8,18,Suicidal,4,4 58 | 16.42,16-19 Years Old,3,13,Female,10,Together,14,6,20,3,10,Non-Suicidal,2,9 59 | 15.83,14-16 Years Old,5,11,Female,5,Together,16,5,11,4,8,Non-Suicidal,3,5 60 | 15.92,14-16 Years Old,10,35,Female,12,Separated or Divorced,25,4,17,8,14,Suicidal,7,5 61 | 16.25,16-19 Years Old,5,13,Female,8,Together,14,5,19,5,14,Non-Suicidal,6,6 62 | 17.75,16-19 Years Old,11,9,Female,9,Together,19,6,9,6,21,Suicidal,3,6 63 | 16,14-16 Years 
Old,6,14,Female,11,Together,17,9,18,11,17,Suicidal,5,10 64 | 15.42,14-16 Years Old,3,17,Female,16,Together,23,14,13,6,18,Non-Suicidal,3,3 65 | 16,14-16 Years Old,11,13,Female,6,Together,30,15,36,8,11,Non-Suicidal,8,6 66 | 15.17,14-16 Years Old,6,12,Female,12,Together,11,6,15,6,18,Non-Suicidal,4,5 67 | 15,14-16 Years Old,7,17,Female,11,Together,16,4,29,9,27,Non-Suicidal,8,4 68 | 14.5,14-16 Years Old,3,21,Female,5,Together,12,5,14,3,6,Non-Suicidal,8,12 69 | 15.08,14-16 Years Old,13,13,Female,7,Separated or Divorced,21,4,12,12,17,Non-Suicidal,8,9 70 | 15,14-16 Years Old,13,17,Female,9,Separated or Divorced,26,6,24,6,24,Non-Suicidal,8,9 71 | 15.08,14-16 Years Old,6,9,Female,7,Separated or Divorced,16,5,16,8,16,Non-Suicidal,4,3 72 | 15.25,14-16 Years Old,5,24,Female,18,Separated or Divorced,23.909975068745702,4,17,15,24,Non-Suicidal,2,4 73 | 15,14-16 Years Old,7,13,Female,11,Together,13,6,9,13,17,Suicidal,4,8 74 | 15.42,14-16 Years Old,13,22,Female,7,Together,18,5,13,5,28,Suicidal,8,6 75 | 14.67,14-16 Years Old,5,13,Female,8,Together,23,6,12,5,18,Non-Suicidal,5,7 76 | 15.42,14-16 Years Old,11,21,Female,6,Together,19,5,20,6,16,Non-Suicidal,5,4 77 | 15.17,14-16 Years Old,15,34,Female,22,Separated or Divorced,27,20,17,12,29,Suicidal,8,10 78 | 15.67,14-16 Years Old,6.36189616889847,11,Female,10,Together,11,5,9,8,15,Non-Suicidal,6,7 79 | 16.42,16-19 Years Old,6,10,Female,18,Together,21,6,14,10,21,Non-Suicidal,3,4 80 | 14.67,14-16 Years Old,4,11,Female,7,Together,11,10,9,4,11,Non-Suicidal,2,10 81 | 16.17,14-16 Years Old,7.35574617962391,15,Female,10,Together,14,7,14,5,20,Non-Suicidal,4,3 82 | 16.25,16-19 Years Old,5,12,Female,5,Together,16,4,13,5,14,Non-Suicidal,2,3 83 | 15.75,14-16 Years Old,13,14,Female,4,Together,12,7,15,12,9,Non-Suicidal,7,9 84 | 16.33,16-19 Years Old,4,15,Female,12,Together,14,5,13,4,10,Non-Suicidal,4,4 85 | 16.42,16-19 Years Old,12,25,Female,5,Together,18,5,16,6,20,Suicidal,3,5 86 | 17.08,16-19 Years Old,10,31,Female,16,Separated or Divorced,20,11,23,7,23,Suicidal,8,5 87 | 16.67,16-19 Years Old,7,15,Female,8,Together,21,6,10,3,11,Non-Suicidal,2,3 88 | 17.67,16-19 Years Old,4,10,Female,10,Together,22,16,10,5,15,Non-Suicidal,4,3 89 | 17.08,16-19 Years Old,7,25,Female,7,Separated or Divorced,13,6,13,5,13,Non-Suicidal,5,3 90 | 16.83,16-19 Years Old,5,17,Female,12,Separated or Divorced,13,4,20,4,14,Non-Suicidal,5,5 91 | 17,16-19 Years Old,5,20,Female,6,Separated or Divorced,11,6,12,5,19,Non-Suicidal,6,3 92 | 17.25,16-19 Years Old,11,27,Female,10,Separated or Divorced,13,12,12,8,17,Suicidal,7,7 93 | 17.25,16-19 Years Old,9,20,Female,9,Together,30,18,9,15,27,Suicidal,8,10 94 | 16.83,16-19 Years Old,11,31,Female,7,Separated or Divorced,21,7,14,13,16,Suicidal,3,5 95 | 16.42,16-19 Years Old,3,14,Female,17,Together,11,4,17,4,12,Non-Suicidal,6,4 96 | 17.42,16-19 Years Old,3,17,Female,12,Together,12,5,14,5,14,Non-Suicidal,6,3 97 | 16.42,16-19 Years Old,5,12,Female,4,Together,15,4,12,4,12,Non-Suicidal,5,5 98 | 16.75,16-19 Years Old,3,19,Female,11,Separated or Divorced,21,6,18,6,13,Non-Suicidal,5,3 99 | 17.25,16-19 Years Old,3,19,Female,8,Together,10,6,23,3,12,Non-Suicidal,2,3 100 | 17.08,16-19 Years Old,11,11,Female,5,Together,14.433093821544,8,10,6,8,Non-Suicidal,2,5 101 | 17.25,16-19 Years Old,7,16,Female,9,Together,23,6,14,10,10,Suicidal,4,3 102 | 17.33,16-19 Years Old,6,13,Female,4,Together,22,8,11,6,9,Non-Suicidal,4,3 103 | 17.08,16-19 Years Old,6,10,Female,10,Together,10,6,9,5,10,Non-Suicidal,5,4 104 | 17.25,16-19 Years Old,3,19,Female,4,Together,14,12,16,3,9,Non-Suicidal,4,3 105 | 
17.42,16-19 Years Old,3,11,Female,9,Together,12,5,10,3,12,Non-Suicidal,2,8 106 | 17.42,16-19 Years Old,5,12,Female,11,Together,26,7,9,5,18,Non-Suicidal,3,4 107 | 15.75,14-16 Years Old,13,13,Female,18,Together,18,14,9,8,21,Suicidal,4,7 108 | 16,14-16 Years Old,5,29,Female,8,Separated or Divorced,19,6,11,7,18,Non-Suicidal,6,6 109 | 16.42,16-19 Years Old,5,10,Female,9,Together,17,4,11.8560541141899,4,17,Non-Suicidal,6,3 110 | 15.58,14-16 Years Old,6,22,Female,18,Together,20,5.06898320036459,15.7517495914531,12,17,Non-Suicidal,6,7 111 | 15.83,14-16 Years Old,5,12,Female,6,Together,13,6,14,3,14,Non-Suicidal,3,4 112 | 15.58,14-16 Years Old,11,21,Female,10,Together,20,4,16,9,21,Suicidal,5,3 113 | 15.83,14-16 Years Old,3,18,Female,9,Together,18,10,14,5,10,Non-Suicidal,3,6 114 | 15.5,14-16 Years Old,3,13,Female,5,Together,19,5,15,6,11,Non-Suicidal,3,4 115 | 15.58,14-16 Years Old,3,10,Female,4,Together,25,4,9,3,14,Non-Suicidal,2,4 116 | 16.33,16-19 Years Old,3,17,Female,5,Together,18,13,13,5,14,Non-Suicidal,4,4 117 | 15.83,14-16 Years Old,11,26,Female,8,Separated or Divorced,22,6,17,9,7,Non-Suicidal,5,4 118 | 15.5,14-16 Years Old,11,12,Female,4,Together,14,18,12,14,15,Non-Suicidal,7,8 119 | 15.58,14-16 Years Old,5,16,Female,14,Together,29,4,12,13,14,Non-Suicidal,5,6 120 | 15.58,14-16 Years Old,4,13,Female,8,Together,15,5,13,8,16,Non-Suicidal,5,3 121 | 15.67,14-16 Years Old,7,20,Female,6,Together,14,6,18,4,11,Non-Suicidal,7,4 122 | 16,14-16 Years Old,5,9,Female,4,Separated or Divorced,11,4,9,4,10,Non-Suicidal,2,3 123 | -------------------------------------------------------------------------------- /powercurve-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/powercurve-1.png -------------------------------------------------------------------------------- /scaling.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | (Note that this is saved as pdf due to the inclusion of formulas) 4 | 5 | This is the start of a tutorial explaining why you should scale your 6 | variables. Some of it is still missing though. 7 | 8 | Scaling formula 9 | =============== 10 | 11 | Scaling (or Z-score normalization) follows the formula 12 | $$ 13 | x' = \\frac{x- \\bar x}{\\sigma} 14 | $$ 15 | Where *x* − *x̄* centers the variables around zero and where dividing by 16 | *σ* constitutes the normalization of the variables. 17 | 18 | The scale function performs a Z-score normalization, which can be seen 19 | here: 20 | 21 | set.seed(1) # set seed to ensure random number generation is the same 22 | x <- runif(7) # generate random numbers 23 | 24 | # Manually scaling 25 | (x - mean(x)) / sd(x) 26 | 27 | ## [1] -1.01951259 -0.68940037 -0.06788275 0.97047346 -1.21713898 0.94007371 28 | ## [7] 1.08338753 29 | 30 | #scale using scale 31 | scale(x) 32 | 33 | ## [,1] 34 | ## [1,] -1.01951259 35 | ## [2,] -0.68940037 36 | ## [3,] -0.06788275 37 | ## [4,] 0.97047346 38 | ## [5,] -1.21713898 39 | ## [6,] 0.94007371 40 | ## [7,] 1.08338753 41 | ## attr(,"scaled:center") 42 | ## [1] 0.5947772 43 | ## attr(,"scaled:scale") 44 | ## [1] 0.3229666 45 | 46 | class(scale(x)) 47 | 48 | ## [1] "matrix" 49 | 50 | *note* that scale is a class matrix, which might cause problems. If you 51 | want scale to return a numeric vector (a list of numbers) use 52 | `scale(x)[[1]]`. 53 | 54 | FAQ 55 | --- 56 | 57 | ### How do i include formula in my R markdown? 
This is done using LaTeX. You first need to install a LaTeX distribution
on your system, e.g. using the install command
`tinytex::install_tinytex()`. You also need to set your header to output
LaTeX. The header for this script is for example:

    ---
    title: "Tutorial_Scaling"
    author: "Kenneth C. Enevoldsen"
    date: "10/14/2019"
    header-includes:
       - \usepackage{bbm}
    output:
      html_document: md_document
    ---

You then include LaTeX using double dollar signs, in the following
way: `$$ x - \bar x$$`
--------------------------------------------------------------------------------
/tutorial-power-analysis-using-simr.md:
--------------------------------------------------------------------------------
# Power Analysis using simr
-------------------------

## Install packages from Git
Prior to this make sure your **R version is 3.5.3** or above. Furthermore
simr might **require dependencies**. Make sure these are met.

``` r
#library(githubinstall)
#githubinstall("simr", lib = .libPaths())
print(pacman::p_path() == .libPaths()) # should be the same!!
```

    ## [1] TRUE

``` r
pacman::p_load(tidyverse, lmerTest, simr)
```

Read the data
-------------
You shouldn't need this, but I have put it here to make sure you can
replicate my results.

``` r
CleanUpData <- function(Demo, LU, Word){

  Speech <- merge(LU, Word) %>%
    rename(
      Child.ID = SUBJ,
      Visit = VISIT) %>%
    mutate(
      Visit = as.numeric(str_extract(Visit, "\\d")),
      Child.ID = gsub("\\.", "", Child.ID)
    ) %>%
    dplyr::select(
      Child.ID, Visit, MOT_MLU, CHI_MLU, types_MOT, types_CHI, tokens_MOT, tokens_CHI
    )

  Demo <- Demo %>%
    dplyr::select(
      Child.ID, Visit, Ethnicity, Diagnosis, Gender, Age, ADOS, MullenRaw, ExpressiveLangRaw, Socialization
    ) %>%
    mutate(
      Child.ID = gsub("\\.", "", Child.ID)
    )

  Data = merge(Demo, Speech, all = T)

  Data1 = Data %>%
    subset(Visit == "1") %>%
    dplyr::select(Child.ID, ADOS, ExpressiveLangRaw, MullenRaw, Socialization) %>%
    rename(Ados1 = ADOS,
           verbalIQ1 = ExpressiveLangRaw,
           nonVerbalIQ1 = MullenRaw,
           Socialization1 = Socialization)

  Data = merge(Data, Data1, all = T) %>%
    mutate(
      Child.ID = as.numeric(as.factor(as.character(Child.ID))),
      Visit = as.numeric(as.character(Visit)),
      Gender = recode(Gender,
                      "1" = "M",
                      "2" = "F"),
      Diagnosis = recode(Diagnosis,
                         "A" = "ASD", # Note that this function is fixed here
                         "B" = "TD")
    )

  return(Data)
}

# Training Data
#setwd('~/Dropbox/2019 - methods 3/Assignments19/Assignment2/solutions/')
Demo <- read_csv('../../Assignment1/data/demo_train.csv')
LU <- read_csv('../../Assignment1/data/LU_train.csv')
Word <- read_csv('../../Assignment1/data/token_train.csv')
TrainData <- CleanUpData(Demo, LU, Word)
Demo <- read_csv('../../Assignment2/data/demo_test.csv')
LU <- read_csv('../../Assignment2/data/LU_test.csv')
Word <- read_csv('../../Assignment2/data/token_test.csv')
TestData <- CleanUpData(Demo, LU, Word)

# merge training and testing
Data <- merge(TrainData, TestData, all = T)

Data <- Data[complete.cases(Data[, c("CHI_MLU", "Visit", "Diagnosis", "verbalIQ1", "Child.ID")]), ]
Data$Child.ID <- as.factor(Data$Child.ID)
```

------------------------------------------------------------------------

Create a model
--------------

``` r
InteractionM <- lmer(CHI_MLU ~ Visit * Diagnosis + (1 + Visit | Child.ID),
                     Data, REML = F,
                     control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)) #optimizer should be relevant
```

This model encodes the following hypotheses and assumptions:

1.  We expect child MLU to vary depending on visit and diagnosis

2.  We expect the effect of visit to vary according to diagnosis
    (interaction)

3.  We have multiple datapoints per child ID

4.  We expect each child to start from a different point (random intercept)
    and to increase MLU at different rates (random slope)

## What is the power of Diagnosis?

``` r
sim = powerSim(InteractionM, fixed("Diagnosis"), nsim = 50, seed = 1, progress = F)
```

    ## Warning in observedPowerWarning(sim): This appears to be an "observed
    ## power" calculation

``` r
sim
```

    ## Power for predictor 'Diagnosis', (95% confidence interval):
    ##       100.0% (92.89, 100.0)
    ## 
    ## Test: unknown test
    ## 
    ## Based on 50 simulations, (50 warnings, 0 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 3 s
    ## 
    ## nb: result might be an observed power calculation

``` r
# progress just turns off the progress bar - makes for a nicer reading experience
# seed = 1 ensures I get the same result also when knitting
```

Oh, 100% - not too shabby. Note that it gives a warning, which is the same
as for Visit; let's just ignore it for now. It also warns us that this is an
"observed power" calculation, which simply means that the effect size used in
the simulation is the one estimated from the model. We can change this using:

`fixef(InteractionM)["DiagnosisTD"] <- 0.4`

This would be the *minimal interesting effect*, i.e. if the effect is smaller
than this we are not interested in it, and consequently we should not base the
power analysis on anything smaller. This is *naturally* context dependent and
you should argue for your minimal interesting effect. You could for example
argue that the mean of the ASD group should differ from the mean of the TD
group by 1 SD (you would need to calculate this), or you could (and probably
should) examine the literature to find out what a relevant effect might be.

------------------------------------------------------------------------

## What about Visit?

``` r
sim = powerSim(InteractionM, fixed("Visit"), nsim = 50, seed = 1, progress = F)
```

    ## Warning in observedPowerWarning(sim): This appears to be an "observed
    ## power" calculation

``` r
sim
```

    ## Power for predictor 'Visit', (95% confidence interval):
    ## 0.00% ( 0.00, 7.11)
    ## 
    ## Test: unknown test
    ## Effect size for Visit is 0.13
    ## 
    ## Based on 50 simulations, (50 warnings, 50 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 4 s
    ## 
    ## nb: result might be an observed power calculation

**50 errors** - that is a problem (if there are any errors you should not
trust the power estimate).
Let's examine:

``` r
print(sim$errors$message[1])
```

    ## [1] "Models have either equal fixed mean stucture or are not nested"

``` r
print(sim$warnings$message[1])
```

    ## [1] "Main effect (Visit) was tested but there were interactions."

So from this we see that simr is warning us about testing a main effect while
an interaction involving it is in the model. If we want to examine the main
effect independently, we should fit a model without the interaction. (Try to
check whether the warning message for Diagnosis is the same.)

**Sidenote:** This is a bad design for a function output, as

1.  the errors aren't clearly visible

2.  the relevant error is 'hidden' in the warning, while the actual
    error message is gibberish

3.  worst of all, someone might interpret these results without noticing
    the errors

------------------------------------------------------------------------

## And, lastly, the interaction

``` r
fixef(InteractionM)["Visit:DiagnosisTD"] <- 0.1 # let's try setting a fixed effect
powerSim(InteractionM, fixed("Visit:Diagnosis"), nsim = 50, seed = 1, progress = F)
```

    ## Power for predictor 'Visit:Diagnosis', (95% confidence interval):
    ## 76.00% (61.83, 86.94)
    ## 
    ## Test: unknown test
    ## 
    ## Based on 50 simulations, (0 warnings, 0 errors)
    ## alpha = 0.05, nrow = 387
    ## 
    ## Time elapsed: 0 h 0 m 10 s

Assuming 0.1 is the minimal interesting effect, we don't have enough power.
Consequently we should add some more participants. Let's try to do that:

``` r
InteractionM <- extend(InteractionM, along = "Child.ID", n = 120) # extend data along child ID

# plot the power curve
powerCurveV1 = powerCurve(InteractionM, fixed("Visit:Diagnosis"), along = "Child.ID",
                          nsim = 10, breaks = seq(from = 10, to = 120, by = 5), seed = 1, progress = F) # waaay too few simulations
# breaks specifies the sample sizes at which to do a power calculation (here every 5th child)
plot(powerCurveV1)
```

![](powercurve-1.png)

Here we see that, given the specified effect, we would only have enough power
with around 75 participants!!

**Is 0.1 a good estimate?** I would argue it is set too low. Why? Well, because
a TD child who only improves by 0.1 MLU per visit is improving rather slowly.
This conclusion naturally takes into consideration the time between each visit
and what we know about the development of children's MLU.

---

## *Additions aka. FAQ*
These are added in response to questions I get, so everyone can get to see them.

### Regarding choosing an effect size - an example
Choosing an effect size can be hard, but it is highly important, and it requires field expertise as well as knowledge of, and assumptions about, the data-generating process.
For example, suppose you wanted to measure whether Cognitive Science students are smarter than the average student at Aarhus by measuring IQ. We would then run a simulation for a power analysis and specify the smallest interesting difference between the groups. Assume the model we choose is:

``` r
lm(IQ ~ is_cogsci)
```

The effect of interest corresponds to the beta estimate (i.e. the slope) of the model. Naturally, a slope of 0-1 would be only a minimal effect: CogSci students would be only **very** slightly smarter than the rest of the students, and you would need a big sample to detect this quite uninteresting result. Alternatively, let's say we only consider a beta estimate above 5 interesting: we then set the minimal interesting effect to 5 and can estimate how many participants we would need to detect an effect of that size.
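To make this concrete, here is a minimal sketch of such a simulation (not part of the original tutorial). It assumes IQ is roughly normal with a mean of 100 and an SD of 15, reuses the hypothetical group indicator `is_cogsci` from the model above, and sets the minimal interesting effect to 5 IQ points; the function name `simulate_power` is made up for illustration:

``` r
# Monte Carlo power estimate for a two-group comparison with lm(),
# assuming IQ ~ Normal(100, 15) and a minimal interesting effect of 5 points
simulate_power <- function(n_per_group, effect = 5, sd = 15, nsim = 1000, alpha = 0.05) {
  p_values <- replicate(nsim, {
    is_cogsci <- rep(c(0, 1), each = n_per_group)                          # group indicator
    IQ <- rnorm(2 * n_per_group, mean = 100 + effect * is_cogsci, sd = sd) # simulated scores
    fit <- lm(IQ ~ is_cogsci)
    summary(fit)$coefficients["is_cogsci", "Pr(>|t|)"]                     # p-value of the slope
  })
  mean(p_values < alpha) # proportion of significant simulations = estimated power
}

set.seed(1)
sapply(c(20, 40, 80, 160), simulate_power) # power at a few sample sizes per group
```

Looping over a grid of effect sizes instead of sample sizes gives the "turned around" version of the question discussed next.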
Note that this can also be turned around. E.g. you could ask: *"given that we
collect 30 participants, what is the smallest effect we can detect while
maintaining a power of 80%?"*. You could answer this by simply looping over a
range of plausible effect sizes and estimating the power for each.

Using field-specific knowledge in this way is also very similar to setting
priors in a Bayesian framework, so it might be an idea to get used to
quantifying your beliefs, as you will have to do this more in the future.


### Regarding alpha at 0.05 and beta at 0.8
The assignment refers to the values alpha = 0.05 and beta = 0.8.
These two values signify:

-   alpha = 0.05: the significance level we will be using
-   beta = 0.8: the power we aim for in the power simulation, i.e. 80%
--------------------------------------------------------------------------------
/tutorial-scaling.md:
--------------------------------------------------------------------------------


This is the start of a tutorial explaining why you should scale your
variables. Some of it is still missing though.

Scaling formula
===============

Scaling (or Z-score normalization) follows the formula
$$
x' = \\frac{x - \\bar x}{\\sigma}
$$
where *x* − *x̄* centers the variables around zero and where dividing by
*σ* constitutes the normalization of the variables.

The scale function performs a Z-score normalization, which can be seen
here:

    set.seed(1) # set seed to ensure random number generation is the same
    x <- runif(7) # generate random numbers

    # Manually scaling
    (x - mean(x)) / sd(x)

    ## [1] -1.01951259 -0.68940037 -0.06788275  0.97047346 -1.21713898  0.94007371
    ## [7]  1.08338753

    # scale using scale
    scale(x)

    ##             [,1]
    ## [1,] -1.01951259
    ## [2,] -0.68940037
    ## [3,] -0.06788275
    ## [4,]  0.97047346
    ## [5,] -1.21713898
    ## [6,]  0.94007371
    ## [7,]  1.08338753
    ## attr(,"scaled:center")
    ## [1] 0.5947772
    ## attr(,"scaled:scale")
    ## [1] 0.3229666

    class(scale(x))

    ## [1] "matrix"

*note* that the output of scale is of class matrix, which might cause
problems. If you want a plain numeric vector (a list of numbers) instead, use
`scale(x)[,1]` or `as.numeric(scale(x))`; `scale(x)[[1]]` would only return the
first element. Furthermore, note that the mean (`scaled:center`) and SD
(`scaled:scale`) are also provided by the scale function, which is useful for
going back to the original values.
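As a small illustration (not in the original text) of using those attributes to go back to the original values:

    # undo the scaling using the attributes stored by scale()
    x_scaled <- scale(x)
    x_original <- x_scaled * attr(x_scaled, "scaled:scale") + attr(x_scaled, "scaled:center")
    all.equal(as.numeric(x_original), x) # TRUE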
Will come in the future
-----------------------

1.  Why scaling improves model fit (and convergence)
2.  Why centering (and scaling) improves model interpretability
3.  Other ways to scale / normalize

FAQ
---

### How do I include formulas in my R Markdown?

This is done using LaTeX. You first need to install a LaTeX distribution
on your system, e.g. using the install command
`tinytex::install_tinytex()`. You also need to set your header to output
LaTeX. The header for this script is for example:

    ---
    title: "Tutorial_Scaling"
    author: "Kenneth C. Enevoldsen"
    date: "10/14/2019"
    header-includes:
       - \usepackage{bbm}
    output:
      html_document: md_document
    ---

You then include LaTeX using double dollar signs, in the following
way: `$$ x - \bar x$$`
--------------------------------------------------------------------------------
/tutorials-tidy-models.md:
--------------------------------------------------------------------------------
Currently the commentary in this file is fairly limited; let me know if
there are specific points which you feel need more explanation.

Load packages
-------------

``` r
pacman::p_load(pacman, tidyverse, tidymodels, groupdata2)
```

For more on the data see [Lacourse et al.
(2001)](https://www.researchgate.net/publication/226157359_Heavy_Metal_Music_and_Adolescent_Suicidal_Risk)

Load Data and Data Preparation
------------------------------

``` r
heavy_data <- read_csv("datasets/Lacourse et al. (2001) Females.csv", col_types = cols()) %>%
  mutate_at(c("Age_Group", "Marital_Status", "Gender", "Suicide_Risk"), as.factor)

#examine dataset
heavy_data %>% head(5) %>% knitr::kable()
```

| Age   | Age\_Group      | Drug\_Use | Father\_Negligence | Gender | Isolation | Marital\_Status        | Meaninglessness | Metal    | Mother\_Negligence | Normlessness | Self\_Estrangement | Suicide\_Risk | Vicarious | Worshipping |
|------:|:----------------|----------:|-------------------:|:-------|----------:|:-----------------------|----------------:|---------:|-------------------:|-------------:|-------------------:|:--------------|----------:|------------:|
| 15.83 | 14-16 Years Old | 8         | 17                 | Female | 6         | Together               | 10              | 4.837625 | 10                 | 6            | 15                 | Non-Suicidal  | 5         | 4           |
| 14.92 | 14-16 Years Old | 9         | 23                 | Female | 8         | Together               | 26              | 6.000000 | 12                 | 8            | 20                 | Non-Suicidal  | 4         | 6           |
| 15.33 | 14-16 Years Old | 5         | 15                 | Female | 18        | Together               | 19              | 6.000000 | 16                 | 7            | 17                 | Non-Suicidal  | 6         | 3           |
| 15.83 | 14-16 Years Old | 11        | 11                 | Female | 9         | Separated or Divorced  | 13              | 4.000000 | 10                 | 5            | 12                 | Non-Suicidal  | 3         | 3           |
| 14.92 | 14-16 Years Old | 7         | 13                 | Female | 5         | Together               | 13              | 8.000000 | 16                 | 3            | 6                  | Non-Suicidal  | 3         | 9           |

``` r
# Prep the dataset
df <- heavy_data %>%
  select(-Age_Group, -Gender) # Age_Group is highly correlated with Age, and Gender is always Female
```

Partitioning the data
---------------------

We use groupdata2; note that you can also use rsample, which follows the
tidymodels framework, but it does not (at least not easily) allow for
categorical and ID columns.

``` r
set.seed(5)
df_list <- partition(df, p = 0.2, cat_col = c("Suicide_Risk"), id_col = NULL, list_out = T)
df_test = df_list[[1]]
df_train = df_list[[2]]
```

If you have an ID column you should remove it after this step; otherwise it
will be included as a predictor.

Let's define a 'recipe' that preprocesses the data
--------------------------------------------------

Note that the naming in recipes (a package within tidymodels) can be
rather odd. They really dig the recipe metaphor.
``` r
#create recipe
rec <- df_train %>%
  recipe(Suicide_Risk ~ .) %>%      # defines the outcome
  step_center(all_numeric()) %>%    # center numeric predictors
  step_scale(all_numeric()) %>%     # scale numeric predictors
  step_corr(all_numeric()) %>%      # remove highly correlated predictors
  check_missing(everything()) %>%   # fail if there are missing values
  prep(training = df_train)

train_baked <- juice(rec) # extract df_train from rec

rec # inspect rec
```

    ## Data Recipe
    ## 
    ## Inputs:
    ## 
    ##       role #variables
    ##    outcome          1
    ##  predictor         12
    ## 
    ## Training data contained 97 data points and no missing data.
    ## 
    ## Operations:
    ## 
    ## Centering for Age, Drug_Use, ... [trained]
    ## Scaling for Age, Drug_Use, ... [trained]
    ## Correlation filter removed no terms [trained]
    ## Check missing values for Age, Drug_Use, ... [trained]

This is naturally just an example of how a recipe could be constructed. It
does not need to check for e.g. missing values or correlations if I know they
are not an issue.

### Apply recipe to test

``` r
test_baked <- rec %>% bake(df_test)
```

**note** that `juice(rec)` simply extracts `df_train` from the recipe (`rec`);
to convince yourself that they are indeed the same, compare `juice(rec)` with
`rec %>% bake(df_train)`.

Creating models
---------------

To see all the available models in parsnip (part of tidymodels) go
[here](https://tidymodels.github.io/parsnip/articles/articles/Models.html).

``` r
#logistic regression
log_fit <-
  logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(Suicide_Risk ~ . , data = train_baked)

#support vector machine
svm_fit <-
  svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab") %>%
  fit(Suicide_Risk ~ . , data = train_baked)
```
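One nice thing about parsnip is that other models follow exactly the same interface, so only the model specification and engine change. As a minimal sketch (not in the original tutorial, and assuming the randomForest package is installed), a random forest could be fitted in the same way:

``` r
#random forest (sketch) - same interface, different model specification and engine
rf_fit <-
  rand_forest(trees = 100) %>%
  set_mode("classification") %>%
  set_engine("randomForest") %>%
  fit(Suicide_Risk ~ . , data = train_baked)
```

Predictions from `rf_fit` would then work exactly like the ones for `log_fit` and `svm_fit` below.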
Apply model to test set
-----------------------

``` r
#predict class
log_class <- log_fit %>%
  predict(new_data = test_baked)
#get probability of class
log_prop <- log_fit %>%
  predict(new_data = test_baked, type = "prob") %>%
  pull(.pred_Suicidal)

#get multiple at once
test_results <-
  test_baked %>%
  select(Suicide_Risk) %>%
  mutate(
    log_class = predict(log_fit, new_data = test_baked) %>%
      pull(.pred_class),
    log_prob = predict(log_fit, new_data = test_baked, type = "prob") %>%
      pull(.pred_Suicidal),
    svm_class = predict(svm_fit, new_data = test_baked) %>%
      pull(.pred_class),
    svm_prob = predict(svm_fit, new_data = test_baked, type = "prob") %>%
      pull(.pred_Suicidal)
  )

test_results %>%
  head(5) %>%
  knitr::kable() #examine the first 5
```

| Suicide\_Risk | log\_class   | log\_prob | svm\_class   | svm\_prob |
|:--------------|:-------------|----------:|:-------------|----------:|
| Non-Suicidal  | Non-Suicidal | 0.0038562 | Non-Suicidal | 0.0354329 |
| Non-Suicidal  | Non-Suicidal | 0.2125898 | Non-Suicidal | 0.5161208 |
| Non-Suicidal  | Non-Suicidal | 0.3828569 | Non-Suicidal | 0.3209701 |
| Non-Suicidal  | Non-Suicidal | 0.0860703 | Non-Suicidal | 0.0462909 |
| Non-Suicidal  | Non-Suicidal | 0.0729597 | Non-Suicidal | 0.3043180 |

Performance metrics
-------------------

``` r
metrics(test_results, truth = Suicide_Risk, estimate = log_class) %>%
  knitr::kable()
```

| .metric  | .estimator | .estimate |
|:---------|:-----------|----------:|
| accuracy | binary     | 0.8750000 |
| kap      | binary     | 0.6470588 |

``` r
metrics(test_results, truth = Suicide_Risk, estimate = svm_class) %>%
  knitr::kable()
```

| .metric  | .estimator | .estimate |
|:---------|:-----------|----------:|
| accuracy | binary     | 0.8333333 |
| kap      | binary     | 0.5000000 |

``` r
#plotting the roc curve:
test_results %>%
  roc_curve(truth = Suicide_Risk, log_prob) %>%
  autoplot()
```

![](tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png)

``` r
test_results %>%
  mutate(log_prob = 1 - log_prob) %>% # for the plot to show correctly (otherwise the line would be flipped)
  gain_curve(truth = Suicide_Risk, log_prob) %>%
  autoplot()
```

![](tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png)

For more on how to interpret ROC curves see this [Towards Data Science
blogpost](https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5)

For more on how to interpret gain curves see this [Stack Exchange
question](https://stats.stackexchange.com/questions/279385/gain-curve-interpretation)

Multiple Cross Validation
-------------------------

``` r
# create 10 folds, 10 times, and make sure Suicide_Risk is balanced across folds
cv_folds <- vfold_cv(df_train, v = 10, repeats = 10, strata = Suicide_Risk)
```

**Note** that to respect ID you should use `group_vfold_cv()` or specify `group`.
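For completeness, a minimal sketch of such an ID-respecting split (hypothetical, since this dataset has no ID column; `ID` is an assumed column name):

``` r
# sketch: keep all rows from the same (hypothetical) participant in the same fold
cv_folds_id <- group_vfold_cv(df_train, group = ID, v = 10)
```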
That was almost too easy. Now let us prepare the dataset and then fit the
model to each subset.

``` r
#prepare data set and fetch train data
cv_folds <- cv_folds %>%
  mutate(recipes = splits %>%
           # prepper is a wrapper for `prep()` which handles `split` objects
           map(prepper, recipe = rec),
         train_data = splits %>% map(training))

# train a model on each fold
# create a non-fitted model
log_fit <-
  logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm")


cv_folds <- cv_folds %>% mutate(
  log_fits = pmap(list(recipes, train_data), #input
                  ~ fit(log_fit, formula(.x), data = bake(object = .x, new_data = .y)) # function to apply
                  ))
```

`pmap` is a multi-input version of `map`. It has the syntax
`pmap(list(x, y), ~ .x + .y)`, where `x` and `y` are the inputs and `~ .x + .y`
is the function to apply. You could also have used `map2`, which is
specialised for exactly two inputs: `map2(x, y, ~ .x + .y)`.

``` r
cv_folds %>% head(5)
```

    ## # A tibble: 5 x 6
    ##   splits id       id2    recipes train_data log_fits
    ## *
    ## 1        Repeat01 Fold01
    ## 2        Repeat01 Fold02
    ## 3        Repeat01 Fold03
    ## 4        Repeat01 Fold04
    ## 5        Repeat01 Fold05

Note how the dataframe looks. Take some time to understand it, and note
especially that the cells contain entire datasets and their respective
recipes and models.

Now it gets slightly more complicated: we create a function which takes in a
split (fold) from the above cross-validation and applies a recipe and a model
to it, returning a tibble containing the actual and predicted results. We then
apply it to each split.

``` r
predict_log <- function(split, rec, model) {
  # IN
  # split: a split (fold) of the data
  # rec: recipe to prepare the data
  # model: a fitted model to predict with
  #
  # OUT
  # a tibble of the actual and predicted results
  baked_test <- bake(rec, testing(split))
  tibble(
    actual = baked_test$Suicide_Risk,
    predicted = predict(model, new_data = baked_test) %>% pull(.pred_class),
    prop_sui = predict(model, new_data = baked_test, type = "prob") %>% pull(.pred_Suicidal),
    prop_non_sui = predict(model, new_data = baked_test, type = "prob") %>% pull(`.pred_Non-Suicidal`)
  )
}

# apply our function to each split, with its respective recipe and model (in this case the logistic fits), and save the result in a new column
cv_folds <- cv_folds %>%
  mutate(pred = pmap(list(splits, recipes, log_fits), predict_log))
```

Performance metrics
-------------------

``` r
eval <-
  cv_folds %>%
  mutate(
    metrics = pmap(list(pred), ~ metrics(., truth = actual, estimate = predicted, prop_sui))) %>%
  select(id, id2, metrics) %>%
  unnest(metrics)

#inspect performance metrics
eval %>%
  select(repeat_n = id, fold_n = id2, metric = .metric, estimate = .estimate) %>%
  spread(metric, estimate) %>%
  head() %>%
  knitr::kable()
```

| repeat\_n | fold\_n | accuracy  | kap        | mn\_log\_loss | roc\_auc  |
|:----------|:--------|----------:|-----------:|--------------:|----------:|
| Repeat01  | Fold01  | 0.6363636 | 0.0833333  | 1.282365      | 0.4583333 |
| Repeat01  | Fold02  | 0.6363636 | -0.1578947 | 2.155142      | 0.7916667 |
| Repeat01  | Fold03  | 0.7272727 | 0.2325581  | 3.027221      | 0.7083333 |
| Repeat01  | Fold04  | 0.8000000 | 0.5238095  | 2.302621      | 0.9523810 |
| Repeat01  | Fold05  | 0.8888889 | 0.6086957  | 2.145611      | 0.5000000 |
| Repeat01  | Fold06  | 0.7777778 | 0.3571429  | 1.693287      | 0.7857143 |
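A natural next step (not shown in the original tutorial) is to summarise these fold-level results, e.g. by averaging each metric across all repeats and folds:

``` r
# average (and spread of) each metric across all repeats and folds
eval %>%
  group_by(.metric) %>%
  summarise(mean_estimate = mean(.estimate),
            sd_estimate = sd(.estimate)) %>%
  knitr::kable()
```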
FAQ
---

### We have problems with merge and IDs, what should we do?

You should first of all check that the following code evaluates to TRUE,
i.e. there should be as many datapoints as you have pitch files.

    nrow(df) == length(list.files(path = filepath, pattern = ".txt", full.names = T))

Furthermore you should have unique participant IDs. You can check this
using the following code. Before you use it, write a comment about what
it does and why it works. What should the maximum value of
`n_studies_pr_id` be?

    df %>% select(id, study) %>% unique() %>% group_by(id) %>% summarise(n_studies_pr_id = n()) %>% View()

### Can I please have some more cvms?

Of course you can! The following is an example sent to me by Ludvig on
how to use tidymodels with cvms. Note that you will need to update cvms
from CRAN before the following works.

``` r
# Attach packages
pacman::p_load(cvms, tidymodels)

# Prepare data
dat <- groupdata2::fold(participant.scores, k = 4,
                        cat_col = 'diagnosis',
                        id_col = 'participant')
dat[["diagnosis"]] <- factor(dat[["diagnosis"]])

# Create a model function (random forest in this case)
# that takes train_data and formula as arguments
# and returns the fitted model object
rf_model_fn <- function(train_data, formula){
  rand_forest(trees = 100, mode = "classification") %>%
    set_engine("randomForest") %>%
    fit(formula, data = train_data)
}

# Create a predict function
# Usually just wraps stats::predict
# Takes test_data, model and formula arguments
# and returns vector with probabilities of class 1
# (this depends on the type of task, gaussian, binomial or multinomial)
rf_predict_fn <- function(test_data, model, formula){
  stats::predict(object = model, new_data = test_data, type = "prob")[[2]]
}

# Now cross-validation
# Note the different argument names from cross_validate()
CV <- cross_validate_fn(
  dat,
  model_fn = rf_model_fn,
  formulas = c("diagnosis~score", "diagnosis~age"),
  fold_cols = '.folds',
  type = 'binomial',
  predict_fn = rf_predict_fn
)

#inspect data
CV %>%
  select(1:6) %>% #select only the first 6 cols
  head(2) %>% #select only the first two rows
  knitr::kable()
```

| Balanced Accuracy | F1        | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value |
|------------------:|----------:|------------:|------------:|---------------:|---------------:|
| 0.5972222         | 0.7179487 | 0.7777778   | 0.4166667   | 0.6666667      | 0.5555556      |
| 0.2916667         | 0.3636364 | 0.3333333   | 0.2500000   | 0.4000000      | 0.2000000      |

### I want to do more!

Well you crazy bastard! I encourage you to do one of these two things
(or both):

1.  do a bootstrapped and nested cross-validation

2.  make an ensemble model, which combines the output of multiple models.
    Does combining models yield better results? What are potential issues
    with this approach?

### I have 100% accuracy!

It seems like you have some kind of data leakage. Try to check the
following:

1.  Check that you don't use a predictor which gives away the diagnosis,
    e.g. filename, symptom severity, etc.

2.  If you use a random intercept for ID, make sure that the model doesn't
    apply the random intercepts it learned during fitting when you predict on
    the test set. One way to do this is to simply change the IDs after fitting
    (e.g. `ID_101` -> `ID_101_test`); see the sketch below. An even simpler
    way is to remove the random intercept entirely. I suggest comparing the
    two approaches and seeing which one performs best.
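A minimal sketch of the ID-renaming idea from point 2, assuming a model `m` fitted with lme4/lmerTest and a test set `test_data` with an `ID` column (the names are illustrative, not from the original):

``` r
# give the test participants IDs the model has never seen,
# then allow new levels so they get population-level (random effect = 0) predictions
test_data <- test_data %>%
  mutate(ID = paste0(ID, "_test"))

predict(m, newdata = test_data, allow.new.levels = TRUE)
```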
### How do I add a random effect to tidy models?

As far as I know you can't (at least not yet); see [this GitHub issue
for more](https://github.com/tidymodels/parsnip/issues/35). But I
recommend looking at the FAQ titled 'Can I please have some more cvms?'
--------------------------------------------------------------------------------
/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-1.png
--------------------------------------------------------------------------------
/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KennethEnevoldsen/Exp-Meth-III-Tutorials/577db60da2d0448a9ba6cae4f9861524837efd00/tutorials-tidy-models_files/figure-markdown_github/unnamed-chunk-6-2.png
--------------------------------------------------------------------------------