├── .gitignore ├── LICENSE ├── README.md ├── datasets ├── README.md ├── coart_test ├── mental_abacus_data.csv └── pragmatic_scales_data.csv ├── intro_to_ggplot2.Rmd ├── intro_to_ggplot2.html ├── intro_to_ggplot2_someanswers.R ├── intro_to_r.R ├── intro_to_r.Rmd ├── mygraph.pdf ├── open_science.md ├── openscience_tutorial_icis2018.Rproj ├── rmarkdown_handout.Rmd └── rmarkdown_handout.html /.gitignore: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | 8 | # Example code in package build process 9 | *-Ex.R 10 | 11 | # Output files from R CMD build 12 | /*.tar.gz 13 | 14 | # Output files from R CMD check 15 | /*.Rcheck/ 16 | 17 | # RStudio files 18 | .Rproj.user/ 19 | 20 | # produced vignettes 21 | vignettes/*.html 22 | vignettes/*.pdf 23 | 24 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 25 | .httr-oauth 26 | 27 | # knitr and R markdown default cache directories 28 | /*_cache/ 29 | /cache/ 30 | 31 | # Temporary files created by R markdown 32 | *.utf8.md 33 | *.knit.md 34 | 35 | # Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html 36 | rsconnect/ 37 | .Rproj.user 38 | 39 | .DS_Store -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Michael Frank 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and 
this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tools for Open Science: Reproducible Data Analysis and Paper Writing in R 2 | ## ICIS 2018 Pre-Conference Workshop 3 | 4 | *Time*: 8:30AM - 4:30PM 5 | 6 | *Place*: Commonwealth AB (2nd floor) 7 | 8 | ### Instructors 9 | 10 | [Michael C. Frank](https://web.stanford.edu/~mcfrank/), Stanford University 11 | 12 | Jessica Kosie, U. Oregon 13 | 14 | [Elika Bergelson](http://bergelsonlab.com/), Duke 15 | 16 | [Melissa Kline](http://melissakline.net), MIT 17 | 18 | ### Pre-workshop Setup 19 | 20 | This workshop will be hands-on throughout! That means you need to install some software: 21 | 22 | * [This link](http://swirlstats.com/students.html) goes to a tutorial program (Swirl), which begins by walking you through the process of downloading R and RStudio. If you have never interacted with a terminal/command line before, you might also find it helpful to go through just the first few lessons of the 'Basics of R Programming' series. If you have difficulty getting these installed on your laptop, we'll be on hand right before the workshop for troubleshooting! 23 | 24 | * Make an account on the [Open Science Framework](http://osf.io).
25 | 26 | ### Schedule (subject to change) 27 | 28 | *8:00 - 8:30 - Installation troubleshooting (Melissa)* 29 | 30 | 8:30 - 8:45 - Getting Started (Mike) 31 | 32 | 8:45 - 11:00 - Introducing R and the Tidyverse (Jessica and Mike) 33 | 34 | 11:00 - 12:00 - Lunch Break (on your own) 35 | 36 | 12:00 - 1:15 - Reproducible Reports in RMarkdown (Mike) 37 | 38 | 1:15 - 2:30 - Intro to GGPlot (Elika) 39 | 40 | 41 | 2:30 - 2:45 - Mid-afternoon break 42 | 43 | 2:45 - 4:00 - Sharing and managing data using the Open Science Framework (Melissa) 44 | 45 | 4:00 - 4:30 - Open Science Panel and Q&A 46 | 47 | ### How to use this repository 48 | 49 | To download a copy of this repository (containing all the files above), click the green button at the top right that says "Clone or Download" and choose "Download Zip" to save a copy to your laptop. We recommend waiting to do this until the workshop itself so you have the most up-to-date versions of everything! 50 | 51 | ### Troubleshooting `tidyverse` installs 52 | 53 | If `tidyverse` doesn't install, try these things: 54 | 55 | + Check if you have the most recent version of R (at time of writing, 3.5). If not, get it from CRAN (google "R CRAN download"), restart R, and try again. 56 | + See if `tidyverse` is trying to "build from source." If it is, try saying "no" to this question and see if that works. 57 | + If that doesn't work and you have to install from source, errors may come from not having the appropriate compiler installed on your system. On a Mac, this is Xcode, which you will need to install from the App Store. 58 | + If all else fails, try installing an earlier version of `tidyverse`, e.g. via the package `devtools`. Try `install.packages("devtools")` then `devtools::install_version("tidyverse", version = "1.0.0", repos = "http://cran.us.r-project.org")`. If that works, you will still find that `library("tidyverse")` won't work, but you can load the component packages one at a time to get the ones that do, e.g.
59 | library(readr) 60 | library(tidyr) 61 | library(dplyr) 62 | 63 | ### Troubleshooting `knit` and Rmd 64 | 65 | [An overview of R and non-Western characters (Cyrillic, Chinese, etc.)](https://www.r-bloggers.com/r-and-foreign-characters/) 66 | 67 | If you have non-Latin characters in your path (e.g., in your username), the button that says `knit` won't work for you. Instead, type the following in the console (not in your `.Rmd` document): 68 | 69 | `knitr::knit("mydocument.Rmd", encoding = "utf-8")` 70 | 71 | You may run into this problem in other contexts too. Things are getting better, but in general (in other programming languages too, not just R), if you see a message saying that a program cannot find a file or that a path doesn't exist, this problem may be the cause! 72 | -------------------------------------------------------------------------------- /datasets/README.md: -------------------------------------------------------------------------------- 1 | There are 3 datasets stored here that we use in the tutorial: 2 | 3 | ## `mental_abacus_data.csv` 4 | 5 | These are data from [Barner et al. (2017), JNC](https://jnc.psychopen.eu/article/view/106), a classroom-randomized controlled trial of an elementary math intervention. The design was a simple pre-post assessment at the beginning and end of the school year. The primary question was whether assignment to group (control vs. mental abacus) produced differences in any of the outcomes (columns `ravens` through `woodcockTotal`). A secondary question was about the role of grade level and baseline math knowledge in the success of the intervention. 6 | 7 | Variables: 8 | * `subid` - subject identifier 9 | * `class_num` - class number 10 | * `grade` - school grade 11 | * `group` - condition: mental abacus vs. control 12 | * `year` - Test year (pre vs.
post) 13 | * `ravens` - Raven's Progressive Matrices (actually a knock-off version) 14 | * `gonogo` - Go/no-go average 15 | * `swm` - Spatial working memory 16 | * `pvAvg` - place value average performance 17 | * `arithmeticTotal` - total arithmetic problems correct 18 | * `arithmeticAverage` - average arithmetic performance 19 | * `woodcockTotal` - Woodcock-Johnson III math concepts 20 | 21 | ## `pragmatic_scales_data.csv` 22 | 23 | These are data from [Stiller, Goodman, & Frank (2015), LLD](http://langcog.stanford.edu/papers_new/stiller-2015-lld.pdf), a study of preschoolers' pragmatic inference abilities. The primary question here was whether children were able to make correct pragmatic inferences A) above chance and B) above control condition performance. A secondary question was about developmental change in this ability. 24 | 25 | Variables: 26 | * `subid` - subject identifier 27 | * `item` - base object for the inference, e.g. "my HOUSE has flowers" 28 | * `correct` - whether the pragmatic inference was made correctly 29 | * `age` - age in years 30 | * `condition` - experimental Label (pragmatic inference) or No Label control (puppet has peanut butter in mouth) 31 | 32 | ## `coart_test` 33 | This is a (very) in-prep dataset from the Bergelson lab (h/t grad student Charlotte Moore), but it is a useful adult eyetracking dataset to play with for trying out your graphing ideas. 34 | These are some adult controls for a baby study we're still sorting out about coarticulation effects in word comprehension. 35 | This is the usual setup: 2 pictures on the screen, one of which gets named, with different conditions. 36 | All you need to know is that this is eyetracking data that's been binned into 20ms bins in a preprocessing pipeline. 37 | 38 | Variables: 39 | * `Nonset` - the onset of the noun in the carrier phrase, e.g. 'look at the X', i.e.
time 0 40 | * `propt` - is a 1 or a 0 in every 20ms bin for whether child was looking at target or distractor 41 | * `TrialType` - is our condition variable, there are 3 conditions, cross, same, unfam; details irrelevant 42 | 43 | -------------------------------------------------------------------------------- /datasets/coart_test: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mcfrank/openscience_tutorial/fec714335860e1b2cb55f5613cb710bd233df5ad/datasets/coart_test -------------------------------------------------------------------------------- /datasets/mental_abacus_data.csv: -------------------------------------------------------------------------------- 1 | subid,class_num,grade,group,year,ravens,gonogo,swm,pvAvg,arithmeticTotal,arithmeticAverage,woodcockTotal 2 | S1-02-03,S1_02,first grade,CNTL,2015,3,0.8153846153846154,1.8333333333333333,0.0,0,0.0,2 3 | S1-02-03,S1_02,first grade,CNTL,2016,15,0.7076923076923077,2.125,0.36,12,0.25,15 4 | S1-02-08,S1_02,first grade,CNTL,2015,4,0.7615384615384615,1.625,0.0,0,0.0,3 5 | S1-02-08,S1_02,first grade,CNTL,2016,12,0.7076923076923077,1.375,0.36,4,0.08,9 6 | S1-02-17,S1_02,first grade,CNTL,2015,5,0.4307692307692308,1.5,0.09,8,0.17,9 7 | S1-02-17,S1_02,first grade,CNTL,2016,10,0.7307692307692307,4.375,NA,NA,NA,10 8 | S1-03-05,S1_03,first grade,MA,2015,8,0.6230769230769231,2.8181818181818183,0.0,4,0.08,5 9 | S1-03-05,S1_03,first grade,MA,2016,18,0.5615384615384615,5.208333333333333,0.91,16,0.33,13 10 | S1-03-14,S1_03,first grade,MA,2015,8,0.7615384615384615,3.0833333333333335,0.0,3,0.06,6 11 | S1-03-14,S1_03,first grade,MA,2016,8,0.7846153846153846,3.5833333333333335,0.27,8,0.17,9 12 | S1-03-15,S1_03,first grade,MA,2015,10,0.36153846153846153,1.125,0.0,0,0.0,5 13 | S1-03-15,S1_03,first grade,MA,2016,14,0.7307692307692307,2.7916666666666665,0.27,4,0.08,11 14 | S1-04-01,S1_04,first grade,MA,2015,9,0.9307692307692308,4.375,0.0,0,0.0,0 15 | 
S1-04-01,S1_04,first grade,MA,2016,21,0.8615384615384616,3.5833333333333335,0.82,13,0.27,10 16 | S1-04-03,S1_04,first grade,MA,2015,7,0.7384615384615385,3.4375,0.0,3,0.06,5 17 | S1-04-03,S1_04,first grade,MA,2016,4,0.8307692307692308,4.5,0.36,10,0.21,11 18 | S1-04-04,S1_04,first grade,MA,2015,6,0.7230769230769231,2.4166666666666665,0.0,3,0.06,5 19 | S1-04-04,S1_04,first grade,MA,2016,6,0.7923076923076923,5.6,0.18,6,0.13,7 20 | S1-04-07,S1_04,first grade,MA,2015,13,0.7786259541984732,NA,0.0,1,0.02,4 21 | S1-04-07,S1_04,first grade,MA,2016,7,0.7307692307692307,3.625,0.27,4,0.08,12 22 | S1-04-08,S1_04,first grade,MA,2015,8,0.8692307692307693,2.75,0.0,3,0.06,7 23 | S1-04-08,S1_04,first grade,MA,2016,15,0.8538461538461538,4.3125,0.36,6,0.13,11 24 | S1-04-11,S1_04,first grade,MA,2015,8,0.7538461538461538,1.4166666666666667,0.0,2,0.04,5 25 | S1-04-11,S1_04,first grade,MA,2016,25,0.8384615384615385,3.25531914893617,0.09,5,0.1,10 26 | S1-04-17,S1_04,first grade,MA,2015,17,0.5384615384615384,4.117647058823529,0.09,4,0.08,7 27 | S1-04-17,S1_04,first grade,MA,2016,21,0.6230769230769231,5.083333333333333,0.45,8,0.17,13 28 | S1-04-18,S1_04,first grade,MA,2015,4,0.5615384615384615,3.409090909090909,0.0,2,0.04,4 29 | S1-04-18,S1_04,first grade,MA,2016,15,0.6076923076923076,5.777777777777778,0.18,12,0.25,11 30 | S1-07-02,S1_07,first grade,CNTL,2015,9,0.7076923076923077,1.6666666666666667,0.0,3,0.06,4 31 | S1-07-02,S1_07,first grade,CNTL,2016,12,0.7,2.375,1.0,15,0.31,13 32 | S1-07-11,S1_07,first grade,CNTL,2015,4,0.8461538461538461,2.8,0.0,2,0.04,4 33 | S1-07-11,S1_07,first grade,CNTL,2016,3,0.8,6.6,0.18,7,0.15,11 34 | S1-07-18,S1_07,first grade,CNTL,2015,0,0.5076923076923077,3.0833333333333335,0.0,1,0.02,3 35 | S1-07-18,S1_07,first grade,CNTL,2016,10,0.7307692307692307,3.8636363636363638,0.36,5,0.1,8 36 | S1-07-20,S1_07,first grade,CNTL,2015,7,0.7692307692307693,6.235294117647059,0.27,4,0.08,5 37 | S1-07-20,S1_07,first 
grade,CNTL,2016,22,0.8846153846153846,5.947368421052632,0.36,8,0.17,8 38 | S1-09-01,S1_09,first grade,CNTL,2015,9,0.8692307692307693,2.2916666666666665,0.18,3,0.06,7 39 | S1-09-01,S1_09,first grade,CNTL,2016,10,0.7692307692307693,2.125,0.18,4,0.08,10 40 | S1-09-05,S1_09,first grade,CNTL,2015,5,0.6153846153846154,3.227272727272727,0.0,2,0.04,3 41 | S1-09-05,S1_09,first grade,CNTL,2016,2,0.6076923076923076,1.4166666666666667,0.36,5,0.1,10 42 | S1-09-06,S1_09,first grade,CNTL,2015,10,0.8538461538461538,3.0434782608695654,0.09,3,0.06,6 43 | S1-09-06,S1_09,first grade,CNTL,2016,14,0.7538461538461538,4.230769230769231,0.09,9,0.19,8 44 | S1-09-07,S1_09,first grade,CNTL,2015,12,0.7153846153846154,3.4545454545454546,0.0,3,0.06,5 45 | S1-09-07,S1_09,first grade,CNTL,2016,15,0.7153846153846154,2.7083333333333335,0.09,8,0.17,11 46 | S1-09-11,S1_09,first grade,CNTL,2015,12,0.8076923076923077,2.875,0.0,3,0.06,5 47 | S1-09-11,S1_09,first grade,CNTL,2016,9,0.9,5.428571428571429,0.27,9,0.19,11 48 | S1-10-01,S1_10,first grade,MA,2015,3,0.6230769230769231,2.7916666666666665,0.0,0,0.0,3 49 | S1-10-01,S1_10,first grade,MA,2016,6,0.6615384615384615,3.6666666666666665,0.36,6,0.13,11 50 | S1-10-02,S1_10,first grade,MA,2015,2,0.7615384615384615,1.9166666666666667,0.0,1,0.02,4 51 | S1-10-02,S1_10,first grade,MA,2016,4,0.823076923076923,2.9166666666666665,0.36,9,0.19,9 52 | S1-10-06,S1_10,first grade,MA,2015,4,0.8538461538461538,1.625,0.0,0,0.0,4 53 | S1-10-06,S1_10,first grade,MA,2016,4,0.9923076923076923,3.761904761904762,0.91,13,0.27,11 54 | S1-10-07,S1_10,first grade,MA,2015,4,0.8461538461538461,4.208333333333333,0.0,2,0.04,5 55 | S1-10-07,S1_10,first grade,MA,2016,11,0.8307692307692308,5.826086956521739,0.73,12,0.25,12 56 | S1-10-08,S1_10,first grade,MA,2015,13,0.8076923076923077,3.0416666666666665,0.0,3,0.06,9 57 | S1-10-08,S1_10,first grade,MA,2016,4,0.7153846153846154,3.25,0.36,13,0.27,12 58 | S1-10-09,S1_10,first grade,MA,2015,5,0.8,3.2916666666666665,0.0,3,0.06,5 59 | 
S1-10-09,S1_10,first grade,MA,2016,8,0.8396946564885496,3.25,0.36,10,0.21,11 60 | S1-10-12,S1_10,first grade,MA,2015,7,0.5769230769230769,2.3333333333333335,0.0,3,0.06,5 61 | S1-10-12,S1_10,first grade,MA,2016,4,0.8778625954198473,3.3181818181818183,0.36,10,0.21,10 62 | S1-10-13,S1_10,first grade,MA,2015,7,0.8923076923076924,7.4,0.0,3,0.06,5 63 | S1-10-13,S1_10,first grade,MA,2016,9,0.8153846153846154,6.1,0.36,12,0.25,11 64 | S1-11-01,S1_11,first grade,MA,2015,3,0.8692307692307693,4.4,0.18,3,0.06,5 65 | S1-11-01,S1_11,first grade,MA,2016,5,0.8307692307692308,3.4166666666666665,0.27,9,0.19,13 66 | S1-11-02,S1_11,first grade,MA,2015,8,0.823076923076923,2.875,0.0,4,0.08,5 67 | S1-11-02,S1_11,first grade,MA,2016,18,0.8384615384615385,2.75,0.45,13,0.27,13 68 | S1-11-03,S1_11,first grade,MA,2015,6,0.5,NA,0.0,2,0.04,7 69 | S1-11-03,S1_11,first grade,MA,2016,11,0.6461538461538462,3.2916666666666665,0.36,3,0.06,10 70 | S1-11-04,S1_11,first grade,MA,2015,6,0.6076923076923076,2.0,0.0,3,0.06,5 71 | S1-11-04,S1_11,first grade,MA,2016,7,0.6076923076923076,2.8181818181818183,0.09,5,0.1,10 72 | S1-11-09,S1_11,first grade,MA,2015,5,0.7538461538461538,4.681818181818182,0.0,3,0.06,5 73 | S1-11-09,S1_11,first grade,MA,2016,9,0.7461538461538462,6.3,0.09,6,0.13,9 74 | S1-11-10,S1_11,first grade,MA,2015,9,0.7846153846153846,3.8333333333333335,0.0,1,0.02,8 75 | S1-11-10,S1_11,first grade,MA,2016,12,0.7615384615384615,3.8333333333333335,0.27,6,0.13,12 76 | S1-11-11,S1_11,first grade,MA,2015,7,0.6692307692307692,2.1666666666666665,0.0,0,0.0,1 77 | S1-11-11,S1_11,first grade,MA,2016,3,0.6538461538461539,5.090909090909091,0.09,5,0.1,8 78 | S1-11-17,S1_11,first grade,MA,2015,8,0.5615384615384615,1.625,0.0,1,0.02,8 79 | S1-11-17,S1_11,first grade,MA,2016,2,0.8615384615384616,1.625,0.09,4,0.08,8 80 | S1-12-01,S1_12,first grade,CNTL,2015,13,0.9230769230769231,1.9090909090909092,0.0,1,0.02,5 81 | S1-12-01,S1_12,first grade,CNTL,2016,3,0.9230769230769231,3.25,0.18,14,0.29,13 82 | 
S1-12-04,S1_12,first grade,CNTL,2015,7,0.676923076923077,1.5,0.0,0,0.0,3 83 | S1-12-04,S1_12,first grade,CNTL,2016,6,0.6384615384615384,1.9583333333333333,0.09,4,0.08,10 84 | S1-12-05,S1_12,first grade,CNTL,2015,7,0.7384615384615385,2.875,0.0,2,0.04,7 85 | S1-12-05,S1_12,first grade,CNTL,2016,10,0.8076923076923077,7.5,0.27,5,0.1,10 86 | S1-12-08,S1_12,first grade,CNTL,2015,9,0.7153846153846154,2.3125,0.0,1,0.02,3 87 | S1-12-08,S1_12,first grade,CNTL,2016,4,0.7538461538461538,3.235294117647059,0.0,2,0.04,4 88 | S1-12-11,S1_12,first grade,CNTL,2015,0,0.6846153846153846,1.625,0.0,1,0.02,7 89 | S1-12-11,S1_12,first grade,CNTL,2016,12,0.6153846153846154,1.2083333333333333,0.18,6,0.13,11 90 | S1-14-01,S1_14,first grade,CNTL,2015,10,0.7076923076923077,2.5416666666666665,0.0,2,0.04,5 91 | S1-14-01,S1_14,first grade,CNTL,2016,17,0.5769230769230769,2.7916666666666665,0.36,9,0.19,11 92 | S1-14-02,S1_14,first grade,CNTL,2015,8,0.5076923076923077,1.5416666666666667,0.0,2,0.04,5 93 | S1-14-02,S1_14,first grade,CNTL,2016,13,0.8,4.125,0.0,4,0.08,10 94 | S1-14-07,S1_14,first grade,CNTL,2015,6,0.8076923076923077,3.25,0.0,2,0.04,6 95 | S1-14-07,S1_14,first grade,CNTL,2016,7,0.8307692307692308,4.166666666666667,0.45,12,0.25,13 96 | S1-14-09,S1_14,first grade,CNTL,2015,11,NA,2.75,0.0,3,0.06,6 97 | S1-14-09,S1_14,first grade,CNTL,2016,18,0.9,4.947368421052632,0.73,12,0.25,11 98 | S1-14-11,S1_14,first grade,CNTL,2015,8,0.6692307692307692,4.125,0.0,1,0.02,6 99 | S1-14-11,S1_14,first grade,CNTL,2016,8,0.8384615384615385,4.0,0.45,16,0.33,11 100 | S1-14-12,S1_14,first grade,CNTL,2015,20,0.8692307692307693,4.65,0.45,9,0.19,12 101 | S1-14-12,S1_14,first grade,CNTL,2016,20,0.7846153846153846,6.117647058823529,0.91,22,0.46,15 102 | S1-14-14,S1_14,first grade,CNTL,2015,16,0.9615384615384616,5.083333333333333,0.0,3,0.06,5 103 | S1-14-14,S1_14,first grade,CNTL,2016,19,0.9384615384615385,2.0,0.36,11,0.23,12 104 | S1-14-15,S1_14,first 
grade,CNTL,2015,10,0.8384615384615385,4.2105263157894735,0.0,4,0.08,5 105 | S1-14-15,S1_14,first grade,CNTL,2016,19,0.8538461538461538,3.15,0.64,13,0.27,12 106 | S1-14-17,S1_14,first grade,CNTL,2015,12,0.6,1.625,0.09,1,0.02,5 107 | S1-14-17,S1_14,first grade,CNTL,2016,4,0.8923076923076924,2.6666666666666665,0.36,11,0.23,12 108 | S1-15-01,S1_15,first grade,MA,2015,0,0.5384615384615384,3.857142857142857,0.0,0,0.0,3 109 | S1-15-01,S1_15,first grade,MA,2016,2,0.5384615384615384,4.25,0.0,4,0.08,10 110 | S1-15-02,S1_15,first grade,MA,2015,6,0.7384615384615385,2.7083333333333335,0.0,2,0.04,5 111 | S1-15-02,S1_15,first grade,MA,2016,8,0.8769230769230769,2.909090909090909,0.36,6,0.13,11 112 | S1-15-04,S1_15,first grade,MA,2015,10,0.7307692307692307,2.1666666666666665,0.0,2,0.04,5 113 | S1-15-04,S1_15,first grade,MA,2016,7,0.823076923076923,8.538461538461538,0.27,4,0.08,13 114 | S1-15-05,S1_15,first grade,MA,2015,8,0.5769230769230769,2.4705882352941178,0.0,1,0.02,5 115 | S1-15-05,S1_15,first grade,MA,2016,6,0.6307692307692307,3.375,0.36,5,0.1,11 116 | S1-15-08,S1_15,first grade,MA,2015,8,0.9461538461538461,8.714285714285714,0.27,4,0.08,8 117 | S1-15-08,S1_15,first grade,MA,2016,23,0.9538461538461539,5.578947368421052,0.91,17,0.35,14 118 | S1-15-09,S1_15,first grade,MA,2015,7,0.6461538461538462,NA,0.09,4,0.08,10 119 | S1-15-09,S1_15,first grade,MA,2016,11,0.6076923076923076,1.9166666666666667,0.73,13,0.27,12 120 | S1-15-10,S1_15,first grade,MA,2015,8,0.7615384615384615,4.944444444444445,0.09,3,0.06,7 121 | S1-15-10,S1_15,first grade,MA,2016,15,0.8538461538461538,7.3076923076923075,0.45,12,0.25,12 122 | S1-15-11,S1_15,first grade,MA,2015,7,0.8692307692307693,2.9166666666666665,0.09,2,0.04,8 123 | S1-15-11,S1_15,first grade,MA,2016,22,0.8153846153846154,4.541666666666667,0.45,13,0.27,13 124 | S2-03-01,S2_03,second grade,MA,2015,15,0.8153846153846154,3.8666666666666667,0.27,4,0.08,6 125 | S2-03-01,S2_03,second grade,MA,2016,23,0.8384615384615385,4.565217391304348,0.82,9,0.19,11 
126 | S2-03-02,S2_03,second grade,MA,2015,8,0.9153846153846154,5.041666666666667,0.36,6,0.13,11 127 | S2-03-02,S2_03,second grade,MA,2016,19,0.8846153846153846,5.0,0.82,11,0.23,11 128 | S2-03-03,S2_03,second grade,MA,2015,10,0.7384615384615385,1.875,0.36,12,0.25,14 129 | S2-03-03,S2_03,second grade,MA,2016,6,0.6307692307692307,2.2083333333333335,0.73,17,0.35,14 130 | S2-03-04,S2_03,second grade,MA,2015,7,0.8384615384615385,2.75,0.36,5,0.1,7 131 | S2-03-04,S2_03,second grade,MA,2016,16,0.6692307692307692,4.041666666666667,0.64,7,0.15,12 132 | S2-03-13,S2_03,second grade,MA,2015,0,0.5076923076923077,1.125,0.18,4,0.08,11 133 | S2-03-13,S2_03,second grade,MA,2016,5,0.8846153846153846,3.7916666666666665,0.18,9,0.19,14 134 | S2-04-01,S2_04,second grade,MA,2015,10,0.8846153846153846,3.5833333333333335,0.64,12,0.25,10 135 | S2-04-01,S2_04,second grade,MA,2016,10,0.9153846153846154,4.625,1.0,29,0.6,16 136 | S2-04-05,S2_04,second grade,MA,2015,13,0.6538461538461539,1.8333333333333333,0.36,6,0.13,11 137 | S2-04-05,S2_04,second grade,MA,2016,12,0.8076923076923077,3.7083333333333335,0.45,13,0.27,14 138 | S2-04-07,S2_04,second grade,MA,2015,15,0.9461538461538461,4.041666666666667,0.91,5,0.1,6 139 | S2-04-07,S2_04,second grade,MA,2016,14,0.8307692307692308,4.791666666666667,0.91,15,0.31,15 140 | S2-04-08,S2_04,second grade,MA,2015,21,0.823076923076923,7.15,0.73,13,0.27,11 141 | S2-04-08,S2_04,second grade,MA,2016,19,0.7846153846153846,7.722222222222222,0.91,17,0.35,15 142 | S2-04-09,S2_04,second grade,MA,2015,14,0.8538461538461538,4.791666666666667,0.36,4,0.08,9 143 | S2-04-09,S2_04,second grade,MA,2016,13,0.8384615384615385,5.590909090909091,1.0,16,0.33,14 144 | S2-04-10,S2_04,second grade,MA,2015,18,0.9615384615384616,6.428571428571429,0.45,9,0.19,11 145 | S2-04-10,S2_04,second grade,MA,2016,15,0.8769230769230769,6.0,0.64,16,0.33,13 146 | S2-04-11,S2_04,second grade,MA,2015,12,0.8846153846153846,3.0,0.73,13,0.27,11 147 | S2-04-11,S2_04,second 
grade,MA,2016,11,0.8769230769230769,3.375,0.73,18,0.38,19 148 | S2-04-12,S2_04,second grade,MA,2015,17,0.8538461538461538,6.157894736842105,0.36,7,0.15,9 149 | S2-04-12,S2_04,second grade,MA,2016,18,0.8384615384615385,4.5,0.64,10,0.21,12 150 | S2-04-16,S2_04,second grade,MA,2015,9,0.8692307692307693,4.333333333333333,0.27,10,0.21,10 151 | S2-04-16,S2_04,second grade,MA,2016,6,0.8307692307692308,4.083333333333333,0.82,14,0.29,13 152 | S2-04-17,S2_04,second grade,MA,2015,11,0.9307692307692308,3.875,0.45,13,0.27,10 153 | S2-04-17,S2_04,second grade,MA,2016,14,0.9076923076923077,4.041666666666667,1.0,26,0.54,16 154 | S2-04-18,S2_04,second grade,MA,2015,21,0.8615384615384616,4.0,0.55,10,0.21,10 155 | S2-04-18,S2_04,second grade,MA,2016,15,0.6923076923076923,4.5,0.91,20,0.42,15 156 | S2-05-02,S2_05,second grade,MA,2015,8,0.8,2.6842105263157894,0.0,3,0.06,5 157 | S2-05-02,S2_05,second grade,MA,2016,2,0.6538461538461539,1.9583333333333333,0.09,6,0.13,11 158 | S2-08-01,S2_08,second grade,CNTL,2015,14,0.8307692307692308,5.130434782608695,0.18,8,0.17,9 159 | S2-08-01,S2_08,second grade,CNTL,2016,14,0.6230769230769231,2.4166666666666665,0.73,12,0.25,17 160 | S2-08-03,S2_08,second grade,CNTL,2015,6,0.7557251908396947,1.3125,0.0,4,0.08,10 161 | S2-08-03,S2_08,second grade,CNTL,2016,6,0.6230769230769231,2.0,0.73,9,0.19,11 162 | S2-09-05,S2_09,second grade,CNTL,2015,15,0.9,5.0,0.27,5,0.1,9 163 | S2-09-05,S2_09,second grade,CNTL,2016,17,0.9,2.2083333333333335,0.73,12,0.25,14 164 | S2-09-07,S2_09,second grade,CNTL,2015,6,0.7461538461538462,2.772727272727273,0.0,4,0.08,5 165 | S2-09-07,S2_09,second grade,CNTL,2016,7,0.8615384615384616,2.25,0.18,10,0.21,13 166 | S2-09-08,S2_09,second grade,CNTL,2015,6,0.8,3.1739130434782608,0.0,4,0.08,8 167 | S2-09-08,S2_09,second grade,CNTL,2016,6,0.8615384615384616,3.0,0.09,9,0.19,11 168 | S2-09-10,S2_09,second grade,CNTL,2015,11,0.8692307692307693,5.909090909090909,0.0,8,0.17,10 169 | S2-09-10,S2_09,second 
grade,CNTL,2016,18,0.9461538461538461,4.636363636363637,0.73,20,0.42,16 170 | S2-09-13,S2_09,second grade,CNTL,2015,14,0.9307692307692308,4.0,0.45,14,0.29,13 171 | S2-09-13,S2_09,second grade,CNTL,2016,21,0.9538461538461539,4.958333333333333,0.91,17,0.35,17 172 | S2-09-19,S2_09,second grade,CNTL,2015,15,0.8384615384615385,1.75,0.0,5,0.1,10 173 | S2-09-19,S2_09,second grade,CNTL,2016,20,0.8307692307692308,5.2727272727272725,0.36,20,0.42,15 174 | S2-09-20,S2_09,second grade,CNTL,2015,15,0.9384615384615385,4.25,0.91,10,0.21,11 175 | S2-09-20,S2_09,second grade,CNTL,2016,17,0.9076923076923077,4.785714285714286,0.91,24,0.5,14 176 | S2-10-02,S2_10,second grade,CNTL,2015,15,0.7461538461538462,1.9166666666666667,0.36,0,0.0,13 177 | S2-10-02,S2_10,second grade,CNTL,2016,21,0.9,5.521739130434782,0.45,25,0.52,13 178 | S2-10-03,S2_10,second grade,CNTL,2015,15,0.7230769230769231,1.75,0.0,5,0.1,10 179 | S2-10-03,S2_10,second grade,CNTL,2016,17,0.8307692307692308,5.368421052631579,0.36,17,0.35,15 180 | S2-10-05,S2_10,second grade,CNTL,2015,5,0.7461538461538462,3.0833333333333335,0.0,4,0.08,8 181 | S2-10-05,S2_10,second grade,CNTL,2016,15,0.7307692307692307,4.25,0.27,6,0.13,14 182 | S2-10-10,S2_10,second grade,CNTL,2015,9,0.5923076923076923,3.625,0.09,4,0.08,8 183 | S2-10-10,S2_10,second grade,CNTL,2016,18,0.8384615384615385,3.75,0.64,15,0.31,16 184 | S2-10-11,S2_10,second grade,CNTL,2015,4,0.8692307692307693,3.0,0.64,0,0.0,12 185 | S2-10-11,S2_10,second grade,CNTL,2016,17,0.8615384615384616,2.5833333333333335,0.73,14,0.29,10 186 | S2-10-13,S2_10,second grade,CNTL,2015,21,0.7923076923076923,3.8947368421052633,0.91,4,0.08,11 187 | S2-10-13,S2_10,second grade,CNTL,2016,23,0.8,5.375,0.91,15,0.31,15 188 | S2-10-14,S2_10,second grade,CNTL,2015,12,0.8,2.875,0.0,11,0.23,13 189 | S2-10-14,S2_10,second grade,CNTL,2016,10,0.8461538461538461,5.166666666666667,0.18,9,0.19,15 190 | S2-10-15,S2_10,second grade,CNTL,2015,9,0.8692307692307693,4.0,0.45,5,0.1,10 191 | S2-10-15,S2_10,second 
grade,CNTL,2016,19,0.9,4.208333333333333,0.64,10,0.21,16 192 | S2-10-18,S2_10,second grade,CNTL,2015,12,0.823076923076923,3.4166666666666665,0.36,4,0.08,9 193 | S2-10-18,S2_10,second grade,CNTL,2016,20,0.9153846153846154,7.095238095238095,0.73,13,0.27,15 194 | S2-10-20,S2_10,second grade,CNTL,2015,14,0.8538461538461538,3.7083333333333335,0.27,8,0.17,10 195 | S2-10-20,S2_10,second grade,CNTL,2016,10,0.7923076923076923,5.25,0.27,11,0.23,14 196 | S2-13-01,S2_13,second grade,CNTL,2015,4,0.8615384615384616,4.375,0.36,11,0.23,10 197 | S2-13-01,S2_13,second grade,CNTL,2016,22,0.8923076923076924,5.416666666666667,0.91,19,0.4,15 198 | S2-13-02,S2_13,second grade,CNTL,2015,9,0.7846153846153846,3.0416666666666665,0.36,10,0.21,13 199 | S2-13-02,S2_13,second grade,CNTL,2016,7,0.7769230769230769,6.526315789473684,0.36,24,0.5,16 200 | S2-13-04,S2_13,second grade,CNTL,2015,13,0.8923076923076924,2.3333333333333335,0.36,9,0.19,10 201 | S2-13-04,S2_13,second grade,CNTL,2016,19,0.8153846153846154,4.043478260869565,0.91,18,0.38,14 202 | S2-13-05,S2_13,second grade,CNTL,2015,16,0.7923076923076923,1.6666666666666667,0.36,8,0.17,10 203 | S2-13-05,S2_13,second grade,CNTL,2016,21,0.8153846153846154,5.956521739130435,1.0,14,0.29,16 204 | S2-13-06,S2_13,second grade,CNTL,2015,4,0.7846153846153846,3.0416666666666665,0.18,4,0.08,12 205 | S2-13-06,S2_13,second grade,CNTL,2016,7,0.9230769230769231,3.6956521739130435,0.45,19,0.4,18 206 | S2-13-09,S2_13,second grade,CNTL,2015,8,0.7099236641221374,4.958333333333333,0.27,7,0.15,8 207 | S2-13-09,S2_13,second grade,CNTL,2016,11,0.7,3.5833333333333335,0.64,6,0.13,13 208 | S2-13-12,S2_13,second grade,CNTL,2015,12,0.8692307692307693,3.25,0.0,8,0.17,12 209 | S2-13-12,S2_13,second grade,CNTL,2016,9,0.8615384615384616,4.083333333333333,0.36,9,0.19,13 210 | S2-13-15,S2_13,second grade,CNTL,2015,3,0.8,4.041666666666667,0.45,10,0.21,15 211 | S2-13-15,S2_13,second grade,CNTL,2016,19,0.823076923076923,4.809523809523809,0.82,18,0.38,13 212 | S2-13-18,S2_13,second 
grade,CNTL,2015,3,0.7615384615384615,2.7916666666666665,0.0,4,0.08,9 213 | S2-13-18,S2_13,second grade,CNTL,2016,8,0.8692307692307693,6.176470588235294,0.0,6,0.13,12 214 | S2-13-19,S2_13,second grade,CNTL,2015,4,0.8615384615384616,4.428571428571429,0.18,5,0.1,10 215 | S2-13-19,S2_13,second grade,CNTL,2016,14,0.8076923076923077,3.347826086956522,0.36,14,0.29,15 216 | S2-13-20,S2_13,second grade,CNTL,2015,6,0.8923076923076924,3.7916666666666665,0.82,12,0.25,13 217 | S2-13-20,S2_13,second grade,CNTL,2016,25,0.9461538461538461,4.611111111111111,0.82,15,0.31,14 218 | S2-13-21,S2_13,second grade,CNTL,2015,4,0.7,4.75,0.0,6,0.13,9 219 | S2-13-21,S2_13,second grade,CNTL,2016,15,0.8923076923076924,4.291666666666667,0.18,10,0.21,13 220 | S2-16-01,S2_16,second grade,MA,2015,19,0.9538461538461539,2.125,0.09,14,0.29,11 221 | S2-16-01,S2_16,second grade,MA,2016,22,0.8384615384615385,4.5,0.91,13,0.27,12 222 | S2-16-05,S2_16,second grade,MA,2015,10,0.8692307692307693,3.2941176470588234,0.27,12,0.25,10 223 | S2-16-05,S2_16,second grade,MA,2016,10,0.9230769230769231,4.166666666666667,0.55,16,0.33,16 224 | S2-16-08,S2_16,second grade,MA,2015,10,0.8615384615384616,3.5833333333333335,0.36,7,0.15,9 225 | S2-16-08,S2_16,second grade,MA,2016,22,0.9615384615384616,4.9523809523809526,0.73,9,0.19,12 226 | S2-16-09,S2_16,second grade,MA,2015,9,0.8692307692307693,3.8333333333333335,0.27,6,0.13,9 227 | S2-16-09,S2_16,second grade,MA,2016,16,0.8384615384615385,4.5,0.82,14,0.29,13 228 | S2-16-11,S2_16,second grade,MA,2015,10,0.8153846153846154,3.875,0.0,4,0.08,8 229 | S2-16-11,S2_16,second grade,MA,2016,18,0.9230769230769231,7.529411764705882,0.0,9,0.19,13 230 | S2-16-14,S2_16,second grade,MA,2015,4,0.7480916030534351,3.0,0.18,5,0.1,9 231 | S2-16-14,S2_16,second grade,MA,2016,8,0.6538461538461539,3.2916666666666665,0.36,9,0.19,11 232 | S2-16-15,S2_16,second grade,MA,2015,4,0.8153846153846154,5.818181818181818,0.55,8,0.17,10 233 | S2-16-15,S2_16,second 
grade,MA,2016,22,0.6230769230769231,5.782608695652174,1.0,15,0.31,14 234 | S2-16-17,S2_16,second grade,MA,2015,8,0.8153846153846154,1.5,0.0,3,0.06,9 235 | S2-16-17,S2_16,second grade,MA,2016,7,0.7923076923076923,3.9166666666666665,0.0,9,0.19,14 236 | S2-16-18,S2_16,second grade,MA,2015,4,0.6538461538461539,1.6666666666666667,0.09,4,0.08,10 237 | S2-16-18,S2_16,second grade,MA,2016,13,0.8384615384615385,3.0833333333333335,0.18,7,0.15,13 238 | S2-16-19,S2_16,second grade,MA,2015,9,0.7153846153846154,NA,0.91,16,0.33,13 239 | S2-16-19,S2_16,second grade,MA,2016,12,0.46153846153846156,5.782608695652174,1.0,19,0.4,20 240 | S2-16-20,S2_16,second grade,MA,2015,4,0.9538461538461539,3.1666666666666665,0.55,4,0.08,10 241 | S2-16-20,S2_16,second grade,MA,2016,13,0.7153846153846154,4.5,0.91,10,0.21,15 242 | S2-17-02,S2_17,second grade,MA,2015,6,0.8769230769230769,2.75,0.27,11,0.23,9 243 | S2-17-02,S2_17,second grade,MA,2016,11,0.9307692307692308,4.583333333333333,0.91,16,0.33,13 244 | S2-17-03,S2_17,second grade,MA,2015,17,0.7846153846153846,3.2916666666666665,0.36,6,0.13,10 245 | S2-17-03,S2_17,second grade,MA,2016,8,0.8307692307692308,6.5,0.27,12,0.25,13 246 | S2-17-04,S2_17,second grade,MA,2015,10,0.7692307692307693,2.9166666666666665,0.09,8,0.17,11 247 | S2-17-04,S2_17,second grade,MA,2016,16,0.8307692307692308,2.9375,0.82,12,0.25,18 248 | S2-17-05,S2_17,second grade,MA,2015,18,0.8307692307692308,6.352941176470588,0.27,4,0.08,10 249 | S2-17-05,S2_17,second grade,MA,2016,20,0.9461538461538461,4.2727272727272725,1.0,9,0.19,18 250 | S2-17-06,S2_17,second grade,MA,2015,14,0.8769230769230769,4.0476190476190474,0.36,8,0.17,12 251 | S2-17-06,S2_17,second grade,MA,2016,18,0.7692307692307693,4.6521739130434785,0.82,14,0.29,13 252 | S2-17-07,S2_17,second grade,MA,2015,17,0.9230769230769231,2.5833333333333335,0.09,5,0.1,12 253 | S2-17-07,S2_17,second grade,MA,2016,18,0.8153846153846154,5.375,0.64,21,0.44,14 254 | S2-17-08,S2_17,second 
grade,MA,2015,19,0.6307692307692307,4.913043478260869,0.09,3,0.06,11 255 | S2-17-08,S2_17,second grade,MA,2016,21,0.8307692307692308,5.083333333333333,0.73,16,0.33,15 256 | S2-17-09,S2_17,second grade,MA,2015,4,0.8923076923076924,4.041666666666667,0.18,8,0.17,10 257 | S2-17-09,S2_17,second grade,MA,2016,18,0.8846153846153846,4.291666666666667,0.27,17,0.35,13 258 | S2-17-10,S2_17,second grade,MA,2015,3,0.8615384615384616,1.9166666666666667,0.64,10,0.21,9 259 | S2-17-10,S2_17,second grade,MA,2016,13,0.823076923076923,4.0,1.0,15,0.31,15 260 | S2-17-11,S2_17,second grade,MA,2015,19,0.8615384615384616,2.7916666666666665,0.27,4,0.08,9 261 | S2-17-11,S2_17,second grade,MA,2016,21,0.8384615384615385,4.5,0.64,19,0.4,15 262 | S2-17-12,S2_17,second grade,MA,2015,5,0.9307692307692308,3.3333333333333335,0.27,8,0.17,8 263 | S2-17-12,S2_17,second grade,MA,2016,16,0.9461538461538461,5.0,0.18,12,0.25,12 264 | S2-17-13,S2_17,second grade,MA,2015,3,0.9076923076923077,3.1052631578947367,0.36,5,0.1,11 265 | S2-17-13,S2_17,second grade,MA,2016,19,0.8769230769230769,4.6,0.55,12,0.25,11 266 | S2-17-14,S2_17,second grade,MA,2015,11,0.7769230769230769,4.0,0.0,5,0.1,9 267 | S2-17-14,S2_17,second grade,MA,2016,16,0.8846153846153846,2.4166666666666665,0.36,7,0.15,11 268 | S2-17-15,S2_17,second grade,MA,2015,4,0.8692307692307693,5.391304347826087,0.22,14,0.29,12 269 | S2-17-15,S2_17,second grade,MA,2016,23,0.8692307692307693,4.833333333333333,1.0,18,0.38,19 270 | S2-17-16,S2_17,second grade,MA,2015,6,0.7461538461538462,3.25,0.27,4,0.08,11 271 | S2-17-16,S2_17,second grade,MA,2016,7,0.8307692307692308,1.875,0.27,13,0.27,15 272 | S2-17-17,S2_17,second grade,MA,2015,13,0.7615384615384615,4.208333333333333,0.55,5,0.1,10 273 | S2-17-17,S2_17,second grade,MA,2016,13,0.8769230769230769,6.055555555555555,0.64,16,0.33,14 274 | S2-17-19,S2_17,second grade,MA,2015,17,0.6461538461538462,2.0,0.36,8,0.17,11 275 | S2-17-19,S2_17,second grade,MA,2016,18,0.8076923076923077,3.357142857142857,0.64,15,0.31,14 276 
| S2-18-01,S2_18,second grade,MA,2015,11,0.8923076923076924,3.0416666666666665,0.64,6,0.13,13 277 | S2-18-01,S2_18,second grade,MA,2016,18,0.7615384615384615,5.473684210526316,0.91,17,0.35,14 278 | S2-18-02,S2_18,second grade,MA,2015,16,0.8769230769230769,7.055555555555555,0.82,12,0.25,11 279 | S2-18-02,S2_18,second grade,MA,2016,21,0.8923076923076924,6.173913043478261,1.0,16,0.33,17 280 | S2-18-06,S2_18,second grade,MA,2015,8,0.6615384615384615,2.0833333333333335,0.0,2,0.04,3 281 | S2-18-06,S2_18,second grade,MA,2016,4,0.5538461538461539,2.1,0.27,4,0.08,11 282 | S2-18-07,S2_18,second grade,MA,2015,8,0.46153846153846156,2.3043478260869565,0.0,0,0.0,6 283 | S2-18-07,S2_18,second grade,MA,2016,17,0.27692307692307694,1.6666666666666667,0.0,5,0.1,11 284 | S2-18-09,S2_18,second grade,MA,2015,20,0.8461538461538461,2.0833333333333335,0.36,4,0.08,6 285 | S2-18-09,S2_18,second grade,MA,2016,6,0.6846153846153846,2.8823529411764706,0.45,7,0.15,12 286 | S2-18-10,S2_18,second grade,MA,2015,4,0.7230769230769231,1.5555555555555556,0.27,4,0.08,10 287 | S2-18-10,S2_18,second grade,MA,2016,4,0.676923076923077,2.4583333333333335,0.45,8,0.17,11 288 | S2-18-11,S2_18,second grade,MA,2015,13,0.8076923076923077,3.375,0.09,5,0.1,9 289 | S2-18-11,S2_18,second grade,MA,2016,14,0.6923076923076923,3.857142857142857,0.36,10,0.21,12 290 | S2-18-12,S2_18,second grade,MA,2015,3,0.7769230769230769,3.0833333333333335,0.45,6,0.13,11 291 | S2-18-12,S2_18,second grade,MA,2016,5,0.8153846153846154,7.25,0.82,21,0.44,15 292 | S2-18-13,S2_18,second grade,MA,2015,20,0.7307692307692307,3.5416666666666665,0.91,7,0.15,12 293 | S2-18-13,S2_18,second grade,MA,2016,21,0.8615384615384616,5.583333333333333,0.91,20,0.42,17 294 | S2-18-15,S2_18,second grade,MA,2015,13,0.9384615384615385,5.857142857142857,0.64,14,0.29,13 295 | S2-18-15,S2_18,second grade,MA,2016,11,0.9769230769230769,6.833333333333333,1.0,19,0.4,15 296 | S2-18-16,S2_18,second grade,MA,2015,8,0.8461538461538461,3.9166666666666665,0.18,6,0.13,10 297 | 
S2-18-16,S2_18,second grade,MA,2016,5,0.8846153846153846,4.761904761904762,0.45,6,0.13,12 298 | S2-18-17,S2_18,second grade,MA,2015,4,0.9307692307692308,4.045454545454546,0.27,8,0.17,5 299 | S2-18-17,S2_18,second grade,MA,2016,10,0.9692307692307692,6.478260869565218,0.64,11,0.23,15 300 | S2-18-19,S2_18,second grade,MA,2015,4,0.9538461538461539,5.136363636363637,0.55,11,0.23,13 301 | S2-18-19,S2_18,second grade,MA,2016,23,0.8153846153846154,3.8823529411764706,1.0,14,0.29,16 302 | S2-18-20,S2_18,second grade,MA,2015,14,0.9,4.416666666666667,0.09,7,0.15,11 303 | S2-18-20,S2_18,second grade,MA,2016,15,0.9538461538461539,4.416666666666667,1.0,13,0.27,14 304 | S2-19-01,S2_19,second grade,CNTL,2015,8,0.8769230769230769,4.086956521739131,0.18,3,0.06,9 305 | S2-19-01,S2_19,second grade,CNTL,2016,13,0.8846153846153846,3.7083333333333335,0.45,14,0.29,16 306 | S2-19-03,S2_19,second grade,CNTL,2015,8,0.8692307692307693,4.818181818181818,0.09,4,0.08,6 307 | S2-19-03,S2_19,second grade,CNTL,2016,4,0.8307692307692308,4.958333333333333,0.36,6,0.13,14 308 | S2-19-04,S2_19,second grade,CNTL,2015,6,0.5769230769230769,2.4166666666666665,0.36,1,0.02,8 309 | S2-19-04,S2_19,second grade,CNTL,2016,17,0.7692307692307693,4.875,0.45,12,0.25,15 310 | S2-19-05,S2_19,second grade,CNTL,2015,6,0.7076923076923077,5.285714285714286,0.27,5,0.1,8 311 | S2-19-05,S2_19,second grade,CNTL,2016,12,0.7769230769230769,2.8823529411764706,0.36,3,0.06,12 312 | S2-19-06,S2_19,second grade,CNTL,2015,9,0.6461538461538462,2.375,0.0,4,0.08,9 313 | S2-19-06,S2_19,second grade,CNTL,2016,10,0.8153846153846154,5.476190476190476,0.0,9,0.19,10 314 | S2-19-07,S2_19,second grade,CNTL,2015,6,0.7076923076923077,5.888888888888889,0.36,6,0.13,9 315 | S2-19-07,S2_19,second grade,CNTL,2016,11,0.7099236641221374,2.9583333333333335,0.27,8,0.17,9 316 | S2-19-08,S2_19,second grade,CNTL,2015,8,0.5538461538461539,5.875,1.0,14,0.29,13 317 | S2-19-08,S2_19,second grade,CNTL,2016,16,0.7692307692307693,4.333333333333333,0.73,17,0.35,17 318 
| S2-19-09,S2_19,second grade,CNTL,2015,13,0.7615384615384615,2.5833333333333335,0.36,5,0.1,10 319 | S2-19-09,S2_19,second grade,CNTL,2016,13,0.8615384615384616,4.75,0.18,11,0.23,12 320 | S2-19-11,S2_19,second grade,CNTL,2015,11,0.8769230769230769,4.368421052631579,0.0,4,0.08,6 321 | S2-19-11,S2_19,second grade,CNTL,2016,10,0.9692307692307692,4.2,0.45,8,0.17,13 322 | S2-19-12,S2_19,second grade,CNTL,2015,14,0.8846153846153846,2.6875,0.64,3,0.06,7 323 | S2-19-12,S2_19,second grade,CNTL,2016,20,0.8538461538461538,2.75,1.0,15,0.31,16 324 | S2-19-13,S2_19,second grade,CNTL,2015,17,0.8076923076923077,4.045454545454546,1.0,14,0.29,13 325 | S2-19-13,S2_19,second grade,CNTL,2016,13,0.823076923076923,4.041666666666667,0.91,24,0.5,19 326 | S2-19-14,S2_19,second grade,CNTL,2015,8,0.6692307692307692,1.35,0.36,6,0.13,10 327 | S2-19-14,S2_19,second grade,CNTL,2016,9,0.7,2.4583333333333335,0.27,7,0.15,13 328 | S2-19-18,S2_19,second grade,CNTL,2015,1,0.7230769230769231,2.0833333333333335,0.36,4,0.08,9 329 | S2-19-18,S2_19,second grade,CNTL,2016,9,0.9076923076923077,3.5,0.27,12,0.25,11 330 | -------------------------------------------------------------------------------- /datasets/pragmatic_scales_data.csv: -------------------------------------------------------------------------------- 1 | subid,item,correct,age,condition 2 | M22,faces,1,2,Label 3 | M22,houses,1,2,Label 4 | M22,pasta,0,2,Label 5 | M22,beds,0,2,Label 6 | T22,beds,0,2.13,Label 7 | T22,faces,0,2.13,Label 8 | T22,houses,1,2.13,Label 9 | T22,pasta,1,2.13,Label 10 | T17,pasta,0,2.32,Label 11 | T17,faces,0,2.32,Label 12 | T17,houses,0,2.32,Label 13 | T17,beds,0,2.32,Label 14 | M3,faces,0,2.38,Label 15 | M3,houses,1,2.38,Label 16 | M3,pasta,1,2.38,Label 17 | M3,beds,1,2.38,Label 18 | T19,faces,0,2.47,Label 19 | T19,houses,0,2.47,Label 20 | T19,pasta,1,2.47,Label 21 | T19,beds,1,2.47,Label 22 | T20,faces,1,2.5,Label 23 | T20,houses,1,2.5,Label 24 | T20,pasta,0,2.5,Label 25 | T20,beds,1,2.5,Label 26 | T21,faces,1,2.58,Label 
27 | T21,houses,1,2.58,Label 28 | T21,pasta,1,2.58,Label 29 | T21,beds,0,2.58,Label 30 | M26,faces,1,2.59,Label 31 | M26,houses,1,2.59,Label 32 | M26,pasta,0,2.59,Label 33 | M26,beds,1,2.59,Label 34 | T18,faces,1,2.61,Label 35 | T18,houses,0,2.61,Label 36 | T18,pasta,1,2.61,Label 37 | T18,beds,0,2.61,Label 38 | T12,beds,0,2.72,Label 39 | T12,faces,0,2.72,Label 40 | T12,houses,1,2.72,Label 41 | T12,pasta,0,2.72,Label 42 | T16,faces,1,2.73,Label 43 | T16,houses,0,2.73,Label 44 | T16,pasta,1,2.73,Label 45 | T16,beds,1,2.73,Label 46 | T7,faces,1,2.74,Label 47 | T7,houses,0,2.74,Label 48 | T7,pasta,0,2.74,Label 49 | T7,beds,0,2.74,Label 50 | T9,houses,0,2.79,Label 51 | T9,faces,1,2.79,Label 52 | T9,pasta,0,2.79,Label 53 | T9,beds,1,2.79,Label 54 | T5,faces,1,2.8,Label 55 | T5,houses,1,2.8,Label 56 | T5,pasta,0,2.8,Label 57 | T5,beds,1,2.8,Label 58 | T14,faces,1,2.83,Label 59 | T14,houses,1,2.83,Label 60 | T14,pasta,0,2.83,Label 61 | T14,beds,1,2.83,Label 62 | T2,houses,0,2.83,Label 63 | T2,faces,0,2.83,Label 64 | T2,pasta,1,2.83,Label 65 | T2,beds,1,2.83,Label 66 | T15,faces,0,2.85,Label 67 | T15,houses,0,2.85,Label 68 | T15,pasta,1,2.85,Label 69 | T15,beds,0,2.85,Label 70 | M13,houses,0,2.88,Label 71 | M13,beds,1,2.88,Label 72 | M13,faces,1,2.88,Label 73 | M13,pasta,0,2.88,Label 74 | M12,faces,1,2.88,Label 75 | M12,houses,0,2.88,Label 76 | M12,pasta,1,2.88,Label 77 | M12,beds,0,2.88,Label 78 | T13,beds,0,2.89,Label 79 | T13,faces,0,2.89,Label 80 | T13,houses,1,2.89,Label 81 | T13,pasta,1,2.89,Label 82 | T8,faces,1,2.91,Label 83 | T8,houses,0,2.91,Label 84 | T8,pasta,1,2.91,Label 85 | T8,beds,1,2.91,Label 86 | T1,faces,1,2.95,Label 87 | T1,houses,0,2.95,Label 88 | T1,pasta,0,2.95,Label 89 | T1,beds,1,2.95,Label 90 | M15,faces,1,2.98,Label 91 | M15,houses,1,2.98,Label 92 | M15,pasta,1,2.98,Label 93 | M15,beds,1,2.98,Label 94 | T11,faces,1,2.99,Label 95 | T11,houses,0,2.99,Label 96 | T11,pasta,1,2.99,Label 97 | T11,beds,1,2.99,Label 98 | T10,faces,0,3,Label 99 | 
T10,houses,1,3,Label 100 | T10,pasta,1,3,Label 101 | T10,beds,1,3,Label 102 | T3,faces,1,3.09,Label 103 | T3,houses,1,3.09,Label 104 | T3,pasta,1,3.09,Label 105 | T3,beds,1,3.09,Label 106 | T6,faces,1,3.1,Label 107 | T6,houses,1,3.1,Label 108 | T6,pasta,1,3.1,Label 109 | T6,beds,1,3.1,Label 110 | M32,beds,1,3.19,Label 111 | M32,faces,1,3.19,Label 112 | M32,houses,0,3.19,Label 113 | M32,pasta,1,3.19,Label 114 | M1,faces,0,3.2,Label 115 | M1,beds,1,3.2,Label 116 | M1,pasta,0,3.2,Label 117 | M1,houses,0,3.2,Label 118 | C16,faces,0,3.22,Label 119 | C16,houses,0,3.22,Label 120 | C16,pasta,1,3.22,Label 121 | C16,beds,1,3.22,Label 122 | T4,faces,1,3.24,Label 123 | T4,houses,0,3.24,Label 124 | T4,pasta,0,3.24,Label 125 | T4,beds,1,3.24,Label 126 | C17,faces,1,3.25,Label 127 | C17,houses,0,3.25,Label 128 | C17,pasta,1,3.25,Label 129 | C17,beds,0,3.25,Label 130 | C6,faces,0,3.26,Label 131 | C6,houses,1,3.26,Label 132 | C6,pasta,1,3.26,Label 133 | C6,beds,1,3.26,Label 134 | M10,faces,1,3.28,Label 135 | M10,houses,1,3.28,Label 136 | M10,beds,1,3.28,Label 137 | M10,pasta,1,3.28,Label 138 | M31,faces,0,3.3,Label 139 | M31,houses,1,3.3,Label 140 | M31,pasta,1,3.3,Label 141 | M31,beds,1,3.3,Label 142 | C3,houses,0,3.46,Label 143 | C3,pasta,1,3.46,Label 144 | C3,beds,1,3.46,Label 145 | C3,faces,1,3.46,Label 146 | C10,faces,0,3.46,Label 147 | C10,houses,0,3.46,Label 148 | C10,pasta,1,3.46,Label 149 | C10,beds,1,3.46,Label 150 | M18,faces,0,3.46,Label 151 | M18,houses,1,3.46,Label 152 | M18,pasta,1,3.46,Label 153 | M18,beds,1,3.46,Label 154 | M16,faces,0,3.5,Label 155 | M16,houses,0,3.5,Label 156 | M16,pasta,0,3.5,Label 157 | M16,beds,1,3.5,Label 158 | M23,faces,1,3.52,Label 159 | M23,houses,0,3.52,Label 160 | M23,pasta,1,3.52,Label 161 | M23,beds,1,3.52,Label 162 | C7,faces,0,3.55,Label 163 | C7,houses,1,3.55,Label 164 | C7,pasta,0,3.55,Label 165 | C7,beds,0,3.55,Label 166 | C12,faces,1,3.56,Label 167 | C12,houses,0,3.56,Label 168 | C12,pasta,1,3.56,Label 169 | C12,beds,1,3.56,Label 
170 | C15,faces,1,3.59,Label 171 | C15,houses,1,3.59,Label 172 | C15,pasta,1,3.59,Label 173 | C15,beds,1,3.59,Label 174 | M29,faces,0,3.72,Label 175 | M29,houses,1,3.72,Label 176 | M29,pasta,1,3.72,Label 177 | M29,beds,1,3.72,Label 178 | C20,faces,1,3.75,Label 179 | C20,houses,1,3.75,Label 180 | C20,pasta,1,3.75,Label 181 | C20,beds,1,3.75,Label 182 | M11,faces,1,3.82,Label 183 | M11,houses,0,3.82,Label 184 | M11,pasta,1,3.82,Label 185 | M11,beds,1,3.82,Label 186 | C9,beds,1,3.82,Label 187 | C9,faces,1,3.82,Label 188 | C9,houses,1,3.82,Label 189 | C9,pasta,1,3.82,Label 190 | C24,faces,1,3.85,Label 191 | C24,houses,0,3.85,Label 192 | C24,pasta,0,3.85,Label 193 | C24,beds,1,3.85,Label 194 | C22,faces,0,3.92,Label 195 | C22,houses,0,3.92,Label 196 | C22,pasta,1,3.92,Label 197 | C22,beds,1,3.92,Label 198 | C8,faces,1,3.92,Label 199 | C8,houses,1,3.92,Label 200 | C8,pasta,1,3.92,Label 201 | C8,beds,1,3.92,Label 202 | M4,faces,1,3.96,Label 203 | M4,houses,1,3.96,Label 204 | M4,pasta,1,3.96,Label 205 | M4,beds,1,3.96,Label 206 | M6,faces,0,4.5,Label 207 | M6,houses,1,4.5,Label 208 | M6,pasta,1,4.5,Label 209 | M6,beds,0,4.5,Label 210 | C19,faces,1,4.14,Label 211 | C19,houses,0,4.14,Label 212 | C19,pasta,0,4.14,Label 213 | C19,beds,1,4.14,Label 214 | C1,faces,1,4.16,Label 215 | C1,houses,1,4.16,Label 216 | C1,pasta,1,4.16,Label 217 | C1,beds,1,4.16,Label 218 | M19,beds,1,4.16,Label 219 | M19,faces,0,4.16,Label 220 | M19,houses,0,4.16,Label 221 | M19,pasta,1,4.16,Label 222 | C11,faces,1,4.22,Label 223 | C11,houses,0,4.22,Label 224 | C11,pasta,1,4.22,Label 225 | C11,beds,1,4.22,Label 226 | M9,faces,1,4.26,Label 227 | M9,houses,1,4.26,Label 228 | M9,pasta,1,4.26,Label 229 | M9,beds,1,4.26,Label 230 | M2,faces,1,4.28,Label 231 | M2,houses,0,4.28,Label 232 | M2,pasta,1,4.28,Label 233 | M2,beds,1,4.28,Label 234 | C5,faces,1,4.29,Label 235 | C5,houses,1,4.29,Label 236 | C5,pasta,1,4.29,Label 237 | C5,beds,1,4.29,Label 238 | M30,beds,1,4.33,Label 239 | M30,faces,1,4.33,Label 240 | 
M30,houses,0,4.33,Label 241 | M30,pasta,1,4.33,Label 242 | C13,faces,0,4.38,Label 243 | C13,houses,1,4.38,Label 244 | C13,pasta,0,4.38,Label 245 | C13,beds,1,4.38,Label 246 | C4,faces,1,4.55,Label 247 | C4,houses,1,4.55,Label 248 | C4,pasta,1,4.55,Label 249 | C4,beds,1,4.55,Label 250 | C14,faces,1,4.57,Label 251 | C14,houses,1,4.57,Label 252 | C14,pasta,0,4.57,Label 253 | C14,beds,1,4.57,Label 254 | M17,faces,1,4.58,Label 255 | M17,houses,1,4.58,Label 256 | M17,pasta,1,4.58,Label 257 | M17,beds,1,4.58,Label 258 | C2,faces,1,4.6,Label 259 | C2,houses,1,4.6,Label 260 | C2,pasta,1,4.6,Label 261 | C2,beds,1,4.6,Label 262 | C23,faces,0,4.62,Label 263 | C23,houses,1,4.62,Label 264 | C23,pasta,1,4.62,Label 265 | C23,beds,0,4.62,Label 266 | M20,faces,0,4.64,Label 267 | M20,houses,0,4.64,Label 268 | M20,pasta,1,4.64,Label 269 | M20,beds,1,4.64,Label 270 | M21,faces,1,4.64,Label 271 | M21,houses,1,4.64,Label 272 | M21,pasta,1,4.64,Label 273 | M21,beds,1,4.64,Label 274 | C21,faces,1,4.73,Label 275 | C21,houses,0,4.73,Label 276 | C21,pasta,1,4.73,Label 277 | C21,beds,1,4.73,Label 278 | M24,faces,1,4.82,Label 279 | M24,houses,1,4.82,Label 280 | M24,pasta,1,4.82,Label 281 | M24,beds,1,4.82,Label 282 | M5,faces,0,4.84,Label 283 | M5,houses,0,4.84,Label 284 | M5,pasta,0,4.84,Label 285 | M5,beds,1,4.84,Label 286 | M7,faces,1,4.89,Label 287 | M7,houses,1,4.89,Label 288 | M7,pasta,1,4.89,Label 289 | M7,beds,0,4.89,Label 290 | M8,faces,1,4.89,Label 291 | M8,houses,1,4.89,Label 292 | M8,pasta,1,4.89,Label 293 | M8,beds,1,4.89,Label 294 | C18,faces,0,4.95,Label 295 | C18,houses,1,4.95,Label 296 | C18,pasta,1,4.95,Label 297 | C18,beds,1,4.95,Label 298 | M25,faces,1,4.96,Label 299 | M25,houses,1,4.96,Label 300 | M25,pasta,1,4.96,Label 301 | M25,beds,1,4.96,Label 302 | MSCH47,faces,1,2.01,No Label 303 | MSCH47,houses,0,2.01,No Label 304 | MSCH47,pasta,1,2.01,No Label 305 | MSCH47,beds,0,2.01,No Label 306 | MSCH50,faces,0,2.03,No Label 307 | MSCH50,houses,0,2.03,No Label 308 | 
MSCH50,pasta,0,2.03,No Label 309 | MSCH50,beds,0,2.03,No Label 310 | MSCH51,faces,0,2.07,No Label 311 | MSCH51,houses,0,2.07,No Label 312 | MSCH51,pasta,0,2.07,No Label 313 | MSCH51,beds,0,2.07,No Label 314 | MSCH44,faces,0,2.25,No Label 315 | MSCH44,houses,0,2.25,No Label 316 | MSCH44,pasta,0,2.25,No Label 317 | MSCH44,beds,0,2.25,No Label 318 | MSCH52,faces,0,2.5,No Label 319 | MSCH52,houses,1,2.5,No Label 320 | MSCH52,pasta,0,2.5,No Label 321 | MSCH52,beds,1,2.5,No Label 322 | MSCH38,faces,0,2.59,No Label 323 | MSCH38,houses,0,2.59,No Label 324 | MSCH38,pasta,1,2.59,No Label 325 | MSCH38,beds,0,2.59,No Label 326 | MSCH43,faces,0,2.71,No Label 327 | MSCH43,houses,0,2.71,No Label 328 | MSCH43,pasta,0,2.71,No Label 329 | MSCH43,beds,0,2.71,No Label 330 | MSCH49,faces,0,2.88,No Label 331 | MSCH49,houses,0,2.88,No Label 332 | MSCH49,pasta,0,2.88,No Label 333 | MSCH49,beds,0,2.88,No Label 334 | MSCH45,faces,0,2.9,No Label 335 | MSCH45,houses,0,2.9,No Label 336 | MSCH45,pasta,0,2.9,No Label 337 | MSCH45,beds,1,2.9,No Label 338 | MSCH42,faces,1,2.93,No Label 339 | MSCH42,houses,0,2.93,No Label 340 | MSCH42,pasta,0,2.93,No Label 341 | MSCH42,beds,0,2.93,No Label 342 | MSCH53,faces,1,2.99,No Label 343 | MSCH53,houses,1,2.99,No Label 344 | MSCH53,pasta,0,2.99,No Label 345 | MSCH53,beds,0,2.99,No Label 346 | SCH35,faces,0,3.02,No Label 347 | SCH35,houses,0,3.02,No Label 348 | SCH35,pasta,0,3.02,No Label 349 | SCH35,beds,0,3.02,No Label 350 | MSCH40,faces,0,3.02,No Label 351 | MSCH40,houses,1,3.02,No Label 352 | MSCH40,pasta,0,3.02,No Label 353 | MSCH40,beds,1,3.02,No Label 354 | SCH34,faces,0,3.06,No Label 355 | SCH34,houses,0,3.06,No Label 356 | SCH34,pasta,0,3.06,No Label 357 | SCH34,beds,0,3.06,No Label 358 | SCH33,faces,0,3.06,No Label 359 | SCH33,houses,0,3.06,No Label 360 | SCH33,pasta,0,3.06,No Label 361 | SCH33,beds,0,3.06,No Label 362 | MSCH41,faces,0,3.18,No Label 363 | MSCH41,houses,0,3.18,No Label 364 | MSCH41,pasta,0,3.18,No Label 365 | MSCH41,beds,0,3.18,No 
Label 366 | SCH37,beds,0,3.27,No Label 367 | SCH37,faces,1,3.27,No Label 368 | SCH37,houses,0,3.27,No Label 369 | SCH37,pasta,1,3.27,No Label 370 | SCH32,faces,1,3.27,No Label 371 | SCH32,houses,0,3.27,No Label 372 | SCH32,pasta,0,3.27,No Label 373 | SCH32,beds,0,3.27,No Label 374 | SCH36,beds,0,3.33,No Label 375 | SCH36,faces,0,3.33,No Label 376 | SCH36,houses,1,3.33,No Label 377 | SCH36,pasta,1,3.33,No Label 378 | SCH11,beds,0,3.41,No Label 379 | SCH12,faces,0,3.41,No Label 380 | SCH12,houses,0,3.41,No Label 381 | SCH12,pasta,0,3.41,No Label 382 | SCH12,beds,0,3.41,No Label 383 | SCH18,faces,0,3.45,No Label 384 | SCH18,houses,0,3.45,No Label 385 | SCH18,pasta,0,3.45,No Label 386 | SCH18,beds,0,3.45,No Label 387 | MSCH48,faces,0,3.5,No Label 388 | MSCH48,houses,1,3.5,No Label 389 | MSCH48,pasta,0,3.5,No Label 390 | MSCH48,beds,0,3.5,No Label 391 | SCH25,faces,0,3.54,No Label 392 | SCH25,houses,1,3.54,No Label 393 | SCH25,pasta,1,3.54,No Label 394 | SCH25,beds,0,3.54,No Label 395 | SCH31,faces,0,3.71,No Label 396 | SCH31,houses,0,3.71,No Label 397 | SCH31,pasta,0,3.71,No Label 398 | SCH31,beds,0,3.71,No Label 399 | MSCH46,faces,0,3.76,No Label 400 | MSCH46,houses,0,3.76,No Label 401 | MSCH46,pasta,1,3.76,No Label 402 | MSCH46,beds,0,3.76,No Label 403 | SCH11,faces,1,3.82,No Label 404 | SCH11,houses,1,3.82,No Label 405 | SCH11,pasta,1,3.82,No Label 406 | SCH29,faces,0,3.83,No Label 407 | SCH29,houses,0,3.83,No Label 408 | SCH29,pasta,0,3.83,No Label 409 | SCH29,beds,0,3.83,No Label 410 | MSCH39,beds,1,3.93,No Label 411 | MSCH39,pasta,0,3.93,No Label 412 | MSCH39,houses,0,3.94,No Label 413 | MSCH39,faces,0,3.94,No Label 414 | SCH28,faces,0,4.02,No Label 415 | SCH28,houses,0,4.02,No Label 416 | SCH28,pasta,0,4.02,No Label 417 | SCH28,beds,0,4.02,No Label 418 | SCH22,faces,0,4.02,No Label 419 | SCH22,houses,0,4.02,No Label 420 | SCH22,pasta,0,4.02,No Label 421 | SCH22,beds,1,4.02,No Label 422 | SCH24,faces,0,4.07,No Label 423 | SCH24,houses,0,4.07,No Label 424 | 
SCH24,pasta,1,4.07,No Label 425 | SCH24,beds,0,4.07,No Label 426 | SCH27,faces,0,4.09,No Label 427 | SCH27,houses,0,4.09,No Label 428 | SCH27,pasta,1,4.09,No Label 429 | SCH27,beds,0,4.09,No Label 430 | SCH17,faces,0,4.25,No Label 431 | SCH17,houses,0,4.25,No Label 432 | SCH17,pasta,1,4.25,No Label 433 | SCH17,beds,0,4.25,No Label 434 | SCH10,faces,0,4.32,No Label 435 | SCH10,houses,0,4.32,No Label 436 | SCH10,pasta,0,4.32,No Label 437 | SCH10,beds,1,4.32,No Label 438 | SCH9,faces,0,4.37,No Label 439 | SCH9,houses,0,4.37,No Label 440 | SCH9,pasta,0,4.37,No Label 441 | SCH9,beds,0,4.37,No Label 442 | SCH20,faces,0,4.39,No Label 443 | SCH20,houses,0,4.39,No Label 444 | SCH20,pasta,0,4.39,No Label 445 | SCH20,beds,0,4.39,No Label 446 | SCH6,faces,0,4.41,No Label 447 | SCH6,houses,0,4.41,No Label 448 | SCH6,pasta,0,4.41,No Label 449 | SCH6,beds,0,4.41,No Label 450 | SCH7,faces,1,4.41,No Label 451 | SCH7,houses,0,4.41,No Label 452 | SCH7,pasta,0,4.41,No Label 453 | SCH7,beds,0,4.41,No Label 454 | SCH15,faces,1,4.42,No Label 455 | SCH15,houses,0,4.42,No Label 456 | SCH15,pasta,0,4.42,No Label 457 | SCH15,beds,0,4.42,No Label 458 | SCH30,faces,0,4.44,No Label 459 | SCH30,houses,0,4.44,No Label 460 | SCH30,pasta,1,4.44,No Label 461 | SCH30,beds,0,4.44,No Label 462 | SCH3,faces,0,4.47,No Label 463 | SCH3,houses,0,4.47,No Label 464 | SCH3,pasta,0,4.47,No Label 465 | SCH3,beds,0,4.47,No Label 466 | SCH26,faces,0,4.47,No Label 467 | SCH26,houses,0,4.47,No Label 468 | SCH26,pasta,1,4.47,No Label 469 | SCH26,beds,0,4.47,No Label 470 | SCH8,faces,0,4.52,No Label 471 | SCH8,houses,0,4.52,No Label 472 | SCH8,pasta,0,4.52,No Label 473 | SCH8,beds,0,4.52,No Label 474 | SCH16,faces,0,4.55,No Label 475 | SCH16,houses,0,4.55,No Label 476 | SCH16,pasta,0,4.55,No Label 477 | SCH16,beds,1,4.55,No Label 478 | SCH14,faces,0,4.58,No Label 479 | SCH14,houses,0,4.58,No Label 480 | SCH14,pasta,0,4.58,No Label 481 | SCH14,beds,1,4.58,No Label 482 | SCH2,faces,0,4.61,No Label 483 | 
SCH2,houses,0,4.61,No Label 484 | SCH2,pasta,0,4.61,No Label 485 | SCH2,beds,0,4.61,No Label 486 | SCH5,faces,0,4.61,No Label 487 | SCH5,houses,0,4.61,No Label 488 | SCH5,pasta,0,4.61,No Label 489 | SCH5,beds,0,4.61,No Label 490 | SCH13,faces,0,4.75,No Label 491 | SCH13,houses,0,4.75,No Label 492 | SCH13,pasta,0,4.75,No Label 493 | SCH13,beds,0,4.75,No Label 494 | SCH21,faces,0,4.76,No Label 495 | SCH21,houses,0,4.76,No Label 496 | SCH21,pasta,0,4.76,No Label 497 | SCH21,beds,0,4.76,No Label 498 | SCH19,faces,0,4.79,No Label 499 | SCH19,houses,0,4.79,No Label 500 | SCH19,pasta,0,4.79,No Label 501 | SCH19,beds,1,4.79,No Label 502 | SCH23,faces,0,4.82,No Label 503 | SCH23,houses,0,4.82,No Label 504 | SCH23,pasta,0,4.82,No Label 505 | SCH23,beds,0,4.82,No Label 506 | SCH1,faces,0,4.82,No Label 507 | SCH1,houses,0,4.82,No Label 508 | SCH1,pasta,0,4.82,No Label 509 | SCH1,beds,0,4.82,No Label 510 | MSCH66,faces,0,3.5,No Label 511 | MSCH66,houses,0,3.5,No Label 512 | MSCH66,pasta,1,3.5,No Label 513 | MSCH66,beds,0,3.5,No Label 514 | MSCH67,faces,0,3.24,No Label 515 | MSCH67,houses,1,3.24,No Label 516 | MSCH67,pasta,0,3.24,No Label 517 | MSCH67,beds,1,3.24,No Label 518 | MSCH68,faces,0,3.94,No Label 519 | MSCH68,houses,0,3.94,No Label 520 | MSCH68,pasta,0,3.94,No Label 521 | MSCH68,beds,0,3.94,No Label 522 | MSCH69,faces,0,2.72,No Label 523 | MSCH69,houses,1,2.72,No Label 524 | MSCH69,pasta,1,2.72,No Label 525 | MSCH69,beds,0,2.72,No Label 526 | MSCH70,faces,0,2.31,No Label 527 | MSCH70,houses,0,2.31,No Label 528 | MSCH70,pasta,0,2.31,No Label 529 | MSCH70,beds,1,2.31,No Label 530 | MSCH71,faces,1,3.14,No Label 531 | MSCH71,houses,1,3.14,No Label 532 | MSCH71,pasta,1,3.14,No Label 533 | MSCH71,beds,0,3.14,No Label 534 | MSCH72,faces,1,3.72,No Label 535 | MSCH72,houses,1,3.72,No Label 536 | MSCH72,pasta,0,3.72,No Label 537 | MSCH72,beds,0,3.72,No Label 538 | MSCH73,faces,0,3.1,No Label 539 | MSCH73,houses,0,3.1,No Label 540 | MSCH73,pasta,0,3.1,No Label 541 | 
MSCH73,beds,0,3.1,No Label 542 | MSCH74,faces,1,2.34,No Label 543 | MSCH74,houses,0,2.34,No Label 544 | MSCH74,pasta,0,2.34,No Label 545 | MSCH74,beds,1,2.34,No Label 546 | MSCH75,faces,0,3.67,No Label 547 | MSCH75,houses,0,3.67,No Label 548 | MSCH75,pasta,0,3.67,No Label 549 | MSCH75,beds,0,3.66,No Label 550 | MSCH76,faces,0,2.58,No Label 551 | MSCH76,houses,0,2.58,No Label 552 | MSCH76,pasta,0,2.58,No Label 553 | MSCH76,beds,0,2.58,No Label 554 | MSCH77,faces,0,2.55,No Label 555 | MSCH77,houses,0,2.55,No Label 556 | MSCH77,pasta,0,2.55,No Label 557 | MSCH77,beds,1,2.55,No Label 558 | MSCH78,faces,0,2.43,No Label 559 | MSCH78,houses,0,2.43,No Label 560 | MSCH78,pasta,0,2.43,No Label 561 | MSCH78,beds,1,2.43,No Label 562 | MSCH79,faces,0,2.7,No Label 563 | MSCH79,houses,1,2.7,No Label 564 | MSCH79,pasta,0,2.7,No Label 565 | MSCH79,beds,1,2.7,No Label 566 | MSCH80,faces,0,2.76,No Label 567 | MSCH80,houses,0,2.76,No Label 568 | MSCH80,pasta,0,2.76,No Label 569 | MSCH80,beds,0,2.76,No Label 570 | MSCH81,faces,1,2.84,No Label 571 | MSCH81,houses,0,2.84,No Label 572 | MSCH81,pasta,0,2.84,No Label 573 | MSCH81,beds,0,2.84,No Label 574 | MSCH82,faces,1,2.46,No Label 575 | MSCH82,houses,0,2.46,No Label 576 | MSCH82,pasta,1,2.46,No Label 577 | MSCH82,beds,0,2.46,No Label 578 | MSCH83,faces,0,2.37,No Label 579 | MSCH83,houses,0,2.37,No Label 580 | MSCH83,pasta,1,2.37,No Label 581 | MSCH83,beds,0,2.37,No Label 582 | MSCH84,faces,0,2.83,No Label 583 | MSCH84,houses,0,2.83,No Label 584 | MSCH84,pasta,1,2.83,No Label 585 | MSCH84,beds,0,2.83,No Label 586 | MSCH85,faces,0,2.69,No Label 587 | MSCH85,houses,0,2.69,No Label 588 | MSCH85,pasta,0,2.69,No Label 589 | MSCH85,beds,0,2.69,No Label 590 | -------------------------------------------------------------------------------- /intro_to_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to ggplot2" 3 | author: "Elika Bergelson" 4 | date: "6/24/2018" 5 | 
output: 6 | html_document: 7 | toc: true 8 | toc_float: true 9 | --- 10 | 11 | # Intro and preliminaries 12 | 13 | If you haven't already, install these packages (you only need to do this once): uncomment the line below and run it. 14 | 15 | ```{r} 16 | #install.packages(c("tidyverse","knitr", "Hmisc")) 17 | ``` 18 | 19 | Then you load them with the "library" command. Confusingly, when you load the tidyverse library, some of its sub-libraries automatically load, and others need to be separately loaded (e.g. `broom`). 20 | 21 | ```{r} 22 | library(tidyverse) 23 | library(knitr) 24 | library(broom) 25 | library(Hmisc) 26 | 27 | # some useful settings (options) ------------------------------------ 28 | options(tibble.width = 300, 29 | dplyr.width = 300) 30 | # these make datasets easier to see when they get displayed on screen 31 | # later, you can mess with them and see what they do if you want. 32 | ``` 33 | 34 | 35 | Reminder: how to get help from R. Put a question mark in front of a function or built-in/loaded dataset, and help will appear! 36 | 37 | ```{r} 38 | ?mean 39 | ?diamonds 40 | #(mean is a function, diamonds is a dataset) 41 | ``` 42 | 43 | 44 | You can get in-line help by pressing tab as you go: R will autocomplete what you're typing, and within a function it will give you hints about the arguments the function takes. Try it out by typing `mean(` in the console below and then hitting tab. 45 | 46 | # Data preprocessing 47 | 48 | ## Reading in data 49 | 50 | you probably already read in the data in the intro script, but if you're just jumping in, here it is again: 51 | 52 | 53 | ```{r} 54 | ma_data <- read_csv("datasets/mental_abacus_data.csv") 55 | ps_data <- read_csv("datasets/pragmatic_scales_data.csv") 56 | 57 | ``` 58 | 59 | remember, you can use summary, glimpse, and View to remind yourself what these data files look like (always good to be very careful with that!)
60 | 61 | ```{r} 62 | summary(ma_data) 63 | # View(ps_data) 64 | glimpse(ma_data) 65 | ``` 66 | 67 | let's just make 2 simple aggregated versions of `ps_data`, by subject and by item 68 | 69 | ## Review of `dplyr` 70 | 71 | ```{r} 72 | ps_data_bysubj_cond <- ps_data %>% #take your dataset 73 | group_by(subid, condition) %>% #retain subject and condition, collapse everything else, i.e. item 74 | summarise(mean_corr = mean(correct, na.rm = TRUE),#create mean for each subj,cond 75 | sum_corr = sum(correct, na.rm = TRUE))# create sum for each subj,cond 76 | 77 | ps_data_byitem_cond <- ps_data %>% #take your dataset 78 | group_by(item, condition) %>% #retain item and condition, collapse everything else, i.e. subj 79 | summarise(mean_corr = mean(correct, na.rm = TRUE),#create mean for each item,cond 80 | sum_corr = sum(correct, na.rm = TRUE))# create sum for each item,cond 81 | 82 | ``` 83 | 84 | 85 | > protip: commenting what every single line does is great practice when you're stuck! 86 | 87 | # On to graphing! 88 | 89 | ## Scatterplots! 90 | 91 | Check out iris. 92 | 93 | ```{r} 94 | ?iris 95 | ``` 96 | 97 | Let's plot. 98 | 99 | ```{r} 100 | ggplot(data = iris)+ 101 | geom_point(mapping = aes(x=Sepal.Length, 102 | y =Species)) 103 | ``` 104 | 105 | Now use the same approach on `ps_data`. 106 | 107 | ```{r} 108 | ggplot(data = ps_data_byitem_cond)+ 109 | geom_point(mapping = aes(x=condition, 110 | y =sum_corr)) 111 | ps_data_byitem_cond 112 | ``` 113 | 114 | What's wrong with this graph? 115 | 116 | ```{r} 117 | ggplot(data = ma_data)+ 118 | geom_point(mapping = aes(x=grade, 119 | y =arithmeticAverage)) 120 | ma_data 121 | ``` 122 | 123 | 124 | ## jitter those points! 125 | 126 | one thing you should always ask yourself is: how many points, bars, or lines should i be seeing?
127 | 128 | ```{r} 129 | ggplot(data = ma_data)+ 130 | geom_jitter(mapping = aes(x = grade, 131 | y = arithmeticAverage)) 132 | ``` 133 | 134 | hm, that's better, but now it feels a little TOO spread out, let's rein it in 135 | 136 | ```{r} 137 | ggplot(data = ma_data)+ 138 | geom_jitter(mapping = aes(x=grade, 139 | y =arithmeticAverage), 140 | width = .2, 141 | height = 0) 142 | ``` 143 | 144 | ## **Exercise.** Task 1. First scatterplots. 145 | + a) using ps_data_byitem_cond, make a scatterplot of mean correct (x axis) by condition (y axis) 146 | + b) using the built-in iris dataset, make a scatterplot of Species by Petal.Width 147 | + c) using ps_data_bysubj_cond, make a scatterplot of sum correct by condition where you can appropriately see the dots 148 | 149 | ## aesthetics 150 | 151 | ```{r} 152 | ggplot(data = iris)+ 153 | geom_point(mapping = aes(x=Sepal.Length, 154 | y =Petal.Width, 155 | color = Species)) 156 | ``` 157 | 158 | ```{r} 159 | ggplot(data = iris)+ 160 | geom_point(mapping = aes(x=Sepal.Length, 161 | y =Petal.Width, 162 | alpha = Species)) 163 | 164 | ``` 165 | 166 | ```{r} 167 | ggplot(data = iris)+ 168 | geom_point(mapping = aes(x=Sepal.Length, 169 | y =Petal.Width, 170 | shape = Species, 171 | alpha = Species)) 172 | ``` 173 | 174 | ```{r} 175 | ggplot(data = iris)+ 176 | geom_point(mapping = aes(x=Sepal.Length, 177 | y =Petal.Width, 178 | size = Species)) 179 | 180 | ``` 181 | 182 | ```{r} 183 | ggplot(data = iris)+ 184 | geom_point(mapping = aes(x=Sepal.Length, 185 | y =Petal.Width), 186 | size = 4) 187 | ``` 188 | 189 | ```{r} 190 | ggplot(data = iris)+ 191 | geom_point(mapping = aes(x=Sepal.Length, 192 | y =Petal.Width))+ 193 | facet_wrap(facets = ~Species) 194 | ``` 195 | 196 | 197 | ## **Exercise.** For Task 2, use ma_data and a scatterplot of your choosing (jittered if needed!).
198 | + a) set the shape of all the dots in a scatterplot to an asterisk 199 | + b) map a continuous variable onto color (hint: use 'summary' to see what's continuous!) 200 | + c) map a discrete variable (a factor or character) onto shape 201 | + d) map a continuous variable (an integer or double) onto shape 202 | + e) make a graph of your choosing using facet_wrap 203 | + f) advanced: make a graph of your choosing using facet_wrap AND one with facet_grid: what's the difference? 204 | 205 | # Moving forward: Other geoms 206 | 207 | `geom_line` graph (we refer to this graph below in task 3c) 208 | 209 | ```{r} 210 | ggplot(data = ma_data, aes(x= factor(year),#this just makes it treat year as a factor 211 | y= arithmeticAverage, 212 | group = subid))+# group keeps the 'unit' at subid 213 | geom_point()+ 214 | geom_line() 215 | ``` 216 | 217 | `geom_hline` and 218 | `geom_text` 219 | 220 | ```{r} 221 | ggplot(data = ps_data_byitem_cond, mapping = aes(x=condition, 222 | y=mean_corr))+ 223 | geom_point()+ 224 | 225 | geom_hline(yintercept = .5)+ #hey, this adds a line! 226 | geom_text(label = "1b", x = .7, y= .2, color = "purple")# this puts '1b' in the corner! 227 | ``` 228 | 229 | the x and y tell it where to put the text; here the `Label` condition sits at 1 on the x axis 230 | 231 | ## Visualizing distributions 232 | 233 | in 1d; we refer to this graph in task 3d 234 | 235 | ```{r} 236 | ggplot(data = ma_data, aes(x=gonogo))+ 237 | geom_histogram(binwidth=.10) 238 | 239 | ``` 240 | 241 | in 2d 242 | 243 | ```{r} 244 | ggplot(data = ps_data_bysubj_cond, aes(x=condition, y = mean_corr))+ 245 | geom_boxplot() 246 | 247 | ``` 248 | 249 | with density info: 250 | 251 | ```{r} 252 | ggplot(data = ps_data_bysubj_cond, aes(x=condition, y = mean_corr))+ 253 | geom_violin() 254 | 255 | ``` 256 | 257 | with density AND dots!
258 | 259 | ```{r} 260 | ggplot(data = ps_data_bysubj_cond, aes(x=condition, y = mean_corr))+ 261 | geom_violin()+ 262 | geom_jitter(width=.1, height=.01, shape =1)# i like shape #1 for legibility 263 | ``` 264 | 265 | ## statistical transformation: smoothers 266 | 267 | (and examples of 'local' vs. 'global' variable setting) 268 | 269 | global x and y, color just for geom_point 270 | 271 | ```{r} 272 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo)) + 273 | geom_point(mapping = aes(color = grade)) + 274 | stat_smooth() 275 | ``` 276 | 277 | all vars global: what's the difference? 278 | 279 | ```{r} 280 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo, color = grade)) + 281 | geom_point() + 282 | stat_smooth() 283 | ``` 284 | 285 | filter the data for a layer 286 | 287 | ```{r} 288 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo, color = grade)) + 289 | geom_point() + 290 | stat_smooth(data = filter(ma_data,grade=="first grade"))# the smoother only gets grade 1 data! 291 | ``` 292 | 293 | take out a class, remove confidence bnd 294 | 295 | ```{r} 296 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo, color = grade)) + 297 | geom_point() + #the points include everyone 298 | stat_smooth(data = filter(ma_data,group != "MA"), 299 | se = FALSE) # but the smoother doesn't see MA group 300 | ``` 301 | 302 | what does `se = FALSE` do? 303 | 304 | `stat_smooth` default is `loess` (local estimator) 305 | 306 | ```{r} 307 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo)) + 308 | geom_point(aes(color = grade)) + 309 | stat_smooth() 310 | ``` 311 | 312 | but you can make it fit a line 313 | 314 | ```{r} 315 | ggplot(data = ma_data, mapping = aes(x = arithmeticTotal, y = gonogo)) + 316 | geom_point(aes(color = grade)) + 317 | stat_smooth( method="lm") 318 | ``` 319 | 320 | ## **Exercise.** Task 3. Geoms, distributions, and smoothers. 
321 | + a): go back to one of the scatter plots from #1 and add a loess smooth, and a line 322 | + b): using ma_data, make a boxplot of swm for every value of woodcockTotal 323 | + c): go back to the geom_line graph above and separate the data by grade (multiple solutions!) 324 | + d): more advanced: come up with a solution so that the histogram only has each subject represented 1x 325 | 326 | # Adding error bars 327 | 328 | mean by condition, no error bars yet 329 | 330 | ```{r} 331 | ggplot(data = ps_data, aes(x = condition, y = correct)) + 332 | stat_summary(fun.y=mean, 333 | na.rm=T, 334 | geom = "bar") 335 | ``` 336 | 337 | barbarplots? [cf twitter] 338 | 339 | 95% confidence interval 340 | 341 | ```{r} 342 | ggplot(data = ps_data, aes(x = condition, y = correct)) + 343 | stat_summary(fun.data = mean_cl_boot, geom = "pointrange") #fun.data, not fun.y! 344 | 345 | ``` 346 | 347 | `mean_cl_boot` gives bootstrapped confidence intervals; you can google what regular normal CIs would be! 348 | 349 | both: 350 | 351 | ```{r} 352 | ggplot(data = ps_data, aes(x = condition, y = correct)) + 353 | stat_summary(fun.y = mean, na.rm=T, geom = "bar")+ 354 | stat_summary(fun.data = mean_cl_boot, geom = "pointrange") 355 | ``` 356 | 357 | ## error bars with violins 358 | 359 | same as violin plot above, but now with an errorbar! 360 | 361 | ```{r} 362 | ggplot(data = ps_data_bysubj_cond, aes(x=condition, y = mean_corr))+ 363 | geom_violin()+ 364 | stat_summary(fun.data=mean_cl_normal, geom = "pointrange") 365 | ``` 366 | 367 | 368 | ## stack and dodge 369 | 370 | > protip: use `fill` with bars not colour! 371 | 372 | bonus question: what does colour do for bars? 
373 | 374 | ```{r} 375 | ggplot(data = ma_data) + 376 | geom_bar(mapping = aes(x = woodcockTotal, fill = grade), position = "fill") 377 | 378 | ggplot(data = ma_data) + 379 | geom_bar(mapping = aes(x = woodcockTotal, fill = grade), position = "stack") 380 | 381 | ggplot(data = ma_data) + 382 | geom_bar(mapping = aes(x = woodcockTotal, fill = grade), position = "dodge") 383 | ``` 384 | 385 | ## **Exercise**. Task 4: error bars, and stack & dodge 386 | + a): using the ps dataset, graph means for each item & add error bars 387 | + b): make a bargraph of the `ps_data_byitem_cond` showing the mean_corr for each condition using geom_bar 388 | (hint: you'll need to use "stat=" inside your `geom_bar()` call) 389 | + c): when would it be most appropriate to use `fill`, `stack`, or `dodge`? 390 | 391 | ## Saving your graph 392 | 393 | ```{r} 394 | ?ggsave() 395 | ``` 396 | 397 | ggsave will save your *last* plot by default, but you can also tell it to save a plot you've assigned. 398 | 399 | ```{r} 400 | mygraph <- ggplot(data = iris)+ 401 | geom_point(mapping = aes(x=Sepal.Length, 402 | y =Petal.Width, 403 | color = Species)) 404 | 405 | mygraph 406 | ggsave("mygraph.pdf",plot = mygraph,dpi = 100) 407 | ``` 408 | 409 | even better than saving your graph: add it to your R Markdown! the awesome thing about using your `.Rmd` file is that you can render graphs there, and they get saved for you! 410 | 411 | there are LOTS of settings you can muck with. (details here https://yihui.name/knitr/options/#plots). we'll do this back in our .rmd file 412 | 413 | # Graph Wishes 414 | 415 | ### **Exercise.** Task 5. Split into groups for task wishes. 416 | + Group A: Individual datapoints + summary stats. 417 | + Group B: Distribution-based Wishes. 418 | + Group C: Time-course graph based wishes. 
419 | 420 | hint for group a 421 | 422 | ```{r} 423 | ggplot(data = ma_data, aes(x= factor(year),#this just makes it treat year as a factor 424 | y= arithmeticAverage, 425 | group = subid))+# this keeps the 'unit' at subid 426 | geom_point()+ 427 | geom_line()+facet_wrap(~grade)+ 428 | stat_summary(color = "red", size = 3, geom="line", fun.y=mean, aes(group =grade)) 429 | ``` 430 | 431 | hint 1 for group b 432 | 433 | ```{r} 434 | xvar <- c(rnorm(1500, mean = -1), rnorm(1500, mean = 1.5)) 435 | yvar <- c(rnorm(1500, mean = 1), rnorm(1500, mean = 1.5)) 436 | zvar <- as.factor(c(rep(1, 1500), rep(2, 1500))) 437 | xy <- data.frame(xvar, yvar, zvar) 438 | ggplot(xy, aes(xvar, yvar)) + geom_point() + geom_rug(col = "darkred", alpha = 0.1) 439 | ``` 440 | 441 | further hints: [here](http://felixfan.github.io/ggplot2-cheatsheet/) and [here](https://stackoverflow.com/questions/35366499/ggplot2-how-to-combine-histogram-rug-plot-and-logistic-regression-prediction) 442 | 443 | hint 2 for Group b 444 | 445 | all the code for this graph appears to be here, BUT this person did not do things the tidy way! 446 | https://micahallen.org/2018/03/15/introducing-raincloud-plots/. exercise for the reader: do his wrangling the tidy way! 447 | 448 | but using our `ps_data_bysubj_cond` and sourcing this: 449 | 450 | ```{r} 451 | source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R") 452 | 453 | ``` 454 | 455 | you should be able to make your raincloud:) 456 | 457 | hint for group c: here's a sample dataset and a graph to get you started in the right direction 458 | 459 | ```{r} 460 | library(feather)# this is part of tidyverse, but not auto-loaded 461 | ``` 462 | 463 | feather is a format that's convenient for various reasons 464 | 465 | ```{r} 466 | coart<- read_feather("datasets/coart_test") 467 | summary(coart) 468 | ``` 469 | 470 | do you know what each of these lines do? can you make errorbars? 
471 | 472 | ```{r} 473 | ggplot(subset(coart, Nonset<5000 & Nonset>-1500), 474 | aes(Nonset, propt, color = TrialType))+ 475 | geom_hline(yintercept=.5)+ 476 | ylab("proportion of target looking")+ 477 | xlab("time from target onset")+ 478 | geom_vline(xintercept=0)+ 479 | stat_smooth(geom="point")+ 480 | theme_bw(base_size=18) 481 | ``` 482 | 483 | # Extras for the curious 484 | 485 | ## adding regression line 486 | 487 | if all we wanted to do was add a regression line, we'd just use `stat_smooth`: 488 | note this is like the graph we did with the errorbars above, just edited a little 489 | 490 | ```{r} 491 | ggplot(ToothGrowth, aes(x=dose, y=len, colour=supp)) + 492 | stat_summary(fun.y = mean, geom = "point", size = 4) + 493 | geom_point( size = 1)+ 494 | stat_smooth(method="lm") 495 | ``` 496 | 497 | but if we want to know what the actual formula for that line is, we have to calculate some things: 498 | 499 | first we need a linear model 500 | 501 | ```{r} 502 | ourmodel <- lm(data = ToothGrowth, len~dose*supp) 503 | ``` 504 | 505 | if you want to know more about the results you do a summary of the model 506 | 507 | ```{r} 508 | summary(ourmodel) 509 | ``` 510 | 511 | if you want the summary results to look prettier you tidy the model 512 | 513 | ```{r} 514 | tidy(ourmodel) 515 | ``` 516 | 517 | in our case, we can use the results of the model to manually put in a line, but there are fancier ways to do this that are beyond the scope of this tutorial 518 | 519 | ```{r} 520 | ggplot(ToothGrowth, aes(x=dose, y=len, colour=supp)) + 521 | stat_summary(fun.y = mean, geom = "point", size = 4) + 522 | geom_point( size = 1)+ 523 | stat_smooth(method="lm")+ 524 | annotate(x = 1, y = 30, "text", label = "y = 11.55 + 7.8 *dose + -8.26*suppVC + 3.9 * dose * suppVC") 525 | ``` 526 | 527 | note: for annotate, the x and y is where on the graph you want your text to go 528 | 529 | if you want to check this formula, you can plug in some values: 530 | 531 | ```{r} 532 | #11.55 + 
7.8 *dose + -8.26*suppVC + 3.9 * dose * suppVC 533 | 11.55 + 7.8*0 + -8.26*0 + 3.9*0*0 # dose of 0 for oj 534 | 11.55 + 7.8*1 + -8.26*0 + 3.9*1*0# dose of 1 for oj 535 | 11.55 + 7.8*2 + -8.26*0 + 3.9*2*0# dose of 2 for oj 536 | 537 | 11.55 + 7.8*0 + -8.26*1 + 3.9*0*1# dose of 0 for vc 538 | 11.55 + 7.8*1 + -8.26*1 + 3.9*1*1# dose of 1 for vc 539 | 11.55 + 7.8*2 + -8.26*1 + 3.9*2*1# dose of 2 for vc 540 | ``` 541 | 542 | ## manually specified errorbars 543 | 544 | From: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/ 545 | 546 | ```{r} 547 | tgc <- ToothGrowth%>% 548 | group_by(supp, dose)%>% 549 | summarise(n = n(), 550 | mean_len = mean(len), 551 | sd_len = sd(len), 552 | se_len = sd_len/sqrt(n), 553 | ci_len = qt((.95/2 +.5), 554 | df= n-1)*se_len) # looking up 95%'s 2 tails in t-dist 555 | 556 | ggplot(tgc, aes(x=dose, y=mean_len, colour=supp)) + 557 | geom_errorbar(aes(ymin=mean_len - ci_len, ymax = mean_len + ci_len)) + 558 | geom_line() + 559 | geom_point() 560 | 561 | ggplot(ToothGrowth, aes(x=dose, y=len, colour=supp)) + 562 | stat_summary(fun.data = mean_cl_normal, geom = "errorbar") + 563 | stat_summary(fun.y = mean, geom = "point") + 564 | stat_summary(fun.y = mean, geom = "line") 565 | ``` 566 | 567 | -------------------------------------------------------------------------------- /intro_to_ggplot2_someanswers.R: -------------------------------------------------------------------------------- 1 | #some answers to intro_to_ggplot2.r tasks 2 | #1a 3 | ggplot(data = ps_data_byitem_cond)+ 4 | geom_point(mapping = aes(x=condition, 5 | y =mean_corr)) 6 | 7 | #1b 8 | ggplot(data = iris)+ 9 | geom_point(mapping = aes(x=Species, 10 | y =Sepal.Length)) 11 | 12 | #1c 13 | ggplot(data = ps_data_bysubj_cond)+ 14 | geom_jitter(mapping = aes(x=condition, 15 | y =sum_corr), 16 | width = .2, 17 | height = .1) 18 | #1d 19 | ggplot(data = ma_data, aes(x=gonogo))+ 20 | geom_histogram(binwidth=.10)+facet_wrap(~year) 21 | 22 | #2a 23 | 
ggplot(data = ma_data)+ 24 | geom_jitter(mapping = aes(x=grade, 25 | y =arithmeticAverage), 26 | width = .2, 27 | height = 0, 28 | shape = 8) 29 | 30 | #2b 31 | ggplot(data = ma_data)+ 32 | geom_jitter(mapping = aes(x=grade, 33 | y =arithmeticAverage, 34 | color = swm), 35 | width = .2, 36 | height = 0) 37 | 38 | #2c 39 | ggplot(data = ma_data)+ 40 | geom_jitter(mapping = aes(x=grade, 41 | y =arithmeticAverage, 42 | shape = group), 43 | width = .2, 44 | height = 0) 45 | 46 | #2d 47 | ggplot(data = ma_data)+ 48 | geom_jitter(mapping = aes(x=grade, 49 | y =arithmeticAverage, 50 | shape = swm), 51 | width = .2, 52 | height = 0) 53 | #2e 54 | ggplot(data = mpg)+geom_point(aes(x=cty, y=hwy))+facet_wrap(~drv+cyl) 55 | ggplot(data = mpg)+geom_point(aes(x=cty, y=hwy))+facet_grid(~drv+cyl) 56 | 57 | #3b 58 | ggplot(ma_data, aes(woodcockTotal, swm, group = woodcockTotal))+geom_boxplot() 59 | #3c 60 | ggplot(data = ma_data, aes(x= factor(year),#this just makes it treat year as a factor 61 | y= arithmeticAverage, 62 | group = subid))+# this keeps the 'unit' at subid 63 | geom_point()+ 64 | geom_line()+facet_wrap(~grade) 65 | 66 | #4b 67 | ggplot(data=ps_data_byitem_cond, aes(condition, mean_corr, fill=item))+geom_bar(position="dodge", stat="identity") 68 | 69 | #5: Graph Wishes, 70 | #a) Group A: Individual datapoints + summary stats 71 | ggplot(data = ma_data, aes(x= factor(year),#this just makes it treat year as a factor 72 | y= arithmeticAverage, 73 | group = subid, 74 | linetype= grade))+# this keeps the 'unit' at subid 75 | geom_point()+ 76 | geom_line(aes(color = grade))+ 77 | stat_summary(color = "red", size = 3, geom="line", fun.y=mean, aes(group =grade)) 78 | 79 | 80 | #b) Group C: Time-course graph based wishes 81 | ggplot(subset(coart, Nonset<5000 & Nonset>-1500), 82 | aes(Nonset, propt, linetype= TrialType, shape=TrialType))+ 83 | geom_hline(yintercept=.5)+ 84 | ylab("proportion of target looking")+ 85 | xlab("time from target onset")+ 86 | geom_vline(xintercept=0)+ 87 | 
stat_smooth(geom="errorbar")+ 88 | theme_bw(base_size=18) 89 | 90 | 91 | #c) Group B: Distribution-based Wishes 92 | ggplot(data = ps_data_bysubj_cond, aes(x=condition, y = mean_corr))+ 93 | geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8)+ 94 | geom_point(position = position_jitter(width = .15))+ 95 | geom_boxplot(width = .1, guides = FALSE, outlier.shape = NA, alpha = 0.5) 96 | 97 | -------------------------------------------------------------------------------- /intro_to_r.R: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mcfrank/openscience_tutorial/fec714335860e1b2cb55f5613cb710bd233df5ad/intro_to_r.R -------------------------------------------------------------------------------- /intro_to_r.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to R" 3 | author: "Jessica Kosie & Mike Frank" 4 | date: "6/24/2018" 5 | output: 6 | html_document: 7 | toc: true 8 | toc_float: true 9 | --- 10 | 11 | # Goals 12 | 13 | By the end of this tutorial, you will know: 14 | 15 | + Basic `R` usage (using `R` as a calculator, creating variables, indexing vectors) 16 | + How to read in and examine data 17 | + How to get values out of the rows and columns in your data 18 | + What a pipe is and how to use pipes to chain together `tidyverse` verbs 19 | + How to create useful summaries of your data using tidyverse. 20 | 21 | The best way to do this tutorial is to walk through it slowly, executing each line and trying to understand what it does. You can execute a whole chunk at a time by hitting CMD+option+C (on a mac), and execute a single line by hitting CMD+enter (again on a mac). 22 | 23 | # Basic R Use 24 | 25 | R can simply be used as a calculator. 
26 | 27 | ```{r} 28 | # Basic arithmetic 29 | 2+3 30 | 2*3 31 | 10/2 32 | 4^2 33 | 34 | # Follows order of operations (PEMDAS) 35 | (2^3)+4*(5/3) 36 | ``` 37 | 38 | These values aren't stored anywhere though. 39 | 40 | To keep them in memory, we need to assign them to a variable. 41 | 42 | ```{r} 43 | # Create a variable called x, that is assigned the number 8. 44 | x <- 8 45 | x = 8 46 | 47 | # What value did I assign to x? 48 | x 49 | 50 | # Can also assign a range of values 51 | x <- 1:5 #x is now a vector of 1, 2, 3, 4, 5 52 | 53 | # note that x is no longer 8, it takes on whatever the most recent assignment was 54 | 55 | # We can also assign a vector of values this way 56 | x <- c(2, 8, 1, 9) 57 | ``` 58 | 59 | Vectors are just 1-dimensional lists of numbers. 60 | 61 | ```{r} 62 | #let's get the numbers 1 thru 10 by ones 63 | 1:10 64 | 65 | #sequence of numbers, 1 thru 10, by 2 66 | seq(from = 1, to = 10, by = 2) #can say "from", "to", and "by=", but they're not necessary 67 | 68 | #sequence of 11 equally spaced numbers between 0 and 1 69 | seq(from = 0, to = 1, length.out=11) 70 | ``` 71 | 72 | > **Exercise.** Create a variable called x that is assigned the number 5. Create a variable called y that is a sequence of numbers from 5 to 25, by 5. Multiply x and y. What happens? 73 | 74 | ```{r} 75 | 76 | ``` 77 | 78 | 79 | ## Functions 80 | 81 | `seq` that we used above is a **function**. Everything you typically want to do in statistical programming uses functions. `mean` is another good example. `mean` takes one **argument**, a numeric vector. We are going to **apply** this function to a new vector. 82 | 83 | ```{r} 84 | z <- 0:20 85 | mean(z) 86 | 87 | # now, let's get the mean of this vector: 88 | q <- c(2, 8, 6, NA, 4, 8) 89 | 90 | mean(q) 91 | 92 | # is that the answer you'd expect? 
let's get some information about the function `mean` 93 | ?mean 94 | 95 | # we need to add an additional argument to tell the function that we want to ignore NAs (the default for this argument is FALSE, that's why NAs weren't ignored above) 96 | mean(q, na.rm = TRUE) 97 | ``` 98 | 99 | > **Exercise.** R has a function called `rnorm` that will allow you to get a random sample of numbers drawn from a normal distribution. Get a sample of 5 numbers with a mean of 0 and a standard deviation of 0.5. 100 | 101 | ```{r} 102 | # Hint: 103 | ?rnorm 104 | 105 | 106 | ``` 107 | 108 | Creating and indexing matrices 109 | 110 | ```{r} 111 | x <- matrix(c(11,12,13,21,22,23), byrow=TRUE,nrow=2) #put into a matrix by row 112 | x1 <- matrix(c(11,12,13,21,22,23), byrow=FALSE,nrow=2) #put into a matrix by column 113 | 114 | #indexing matrices (getting the value that's in a particular row/column) 115 | 116 | # x[r,c] would give you the element in row r, column c of the matrix 117 | 118 | x[2,3] #gives you the element in row 2 column 3 of matrix x (defined above) 119 | x[] #all rows, all columns - could have just typed x 120 | x[1, ] #first row, all columns 121 | x[ ,3] #all rows, third column 122 | x[1,3] #first row, third column 123 | 124 | y <- x[1,2] #assign the value in the 1st row 2nd column to a new variable 125 | ``` 126 | 127 | ## Reading data into R 128 | 129 | First, you'll need to tell R where to look for the data. To do this, you will set your working directory. 130 | 131 | For this tutorial, your working directory should be wherever you downloaded the materials. I downloaded them to my desktop. 132 | 133 | I like to do this using RStudio, via the graphical interface on the top: 134 | 135 | Session > Set Working Directory > Set Working Directory to Source File Location 136 | 137 | That will put you in the right location. 
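If you would rather do the same thing from the console, here is a minimal sketch using base R. The folder in `materials_dir` is a hypothetical example path (an assumption, not something from the workshop materials); replace it with wherever you actually saved the files.

```r
# Where is R currently looking for files?
getwd()

# Hypothetical location of the downloaded materials: change this to your own path
materials_dir <- file.path(path.expand("~"), "Desktop", "openscience_tutorial")

# Only switch directories if that folder actually exists on this machine
if (dir.exists(materials_dir)) {
  setwd(materials_dir)
}

# Check whether the data file is visible from the current working directory
file.exists("datasets/pragmatic_scales_data.csv")
```

If the last line returns `FALSE`, R is not yet pointed at the materials folder, and the `read.csv` call will not find the data.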
138 | 139 | ```{r} 140 | # take a look at what's in your directory 141 | # can also use the Files pane 142 | dir() 143 | 144 | # Now, let's read in the pragmatic_scales_data CSV file and save it as an object called ps_data 145 | ps_data <- read.csv("datasets/pragmatic_scales_data.csv", header = TRUE) 146 | ``` 147 | 148 | ## Examining the data file 149 | 150 | We can simply look at the data frame. We can also get a summary of the data. (these are all **functions** too!) 151 | 152 | ```{r} 153 | # Look at the first few rows of the data. 154 | head(ps_data) 155 | 156 | # Look at the final few rows of the data. 157 | tail(ps_data) 158 | 159 | # Get a summary of the data. 160 | summary(ps_data) 161 | ``` 162 | 163 | Some people like `View(ps_data)` - that shows an interactive "spreadsheet" view. 164 | 165 | ## Indexing a data frame. 166 | 167 | You can select entries in the data frame just like indexing a matrix: `[row, column]`. 168 | 169 | ```{r} 170 | # Get the entry in row 3, column 4. 171 | ps_data[3, 4] 172 | 173 | # Get the entry in row 4 of the item column 174 | ps_data[4, "item"] 175 | ``` 176 | 177 | > ProTip: Many people use this kind of selection to modify individual entries, like if you just want to correct a single mistake at a particular point in the data frame. Be careful if you do this, as there will be nothing in the code that tells you that `4` is the *right* element to fix, you'll just have to trust that you got that number right. 178 | 179 | ...or select an entire column using the $ operation. 180 | 181 | ```{r} 182 | ps_data$condition 183 | ``` 184 | 185 | Create a new column from a current column(s). 186 | 187 | ```{r} 188 | ps_data$new <- ps_data$age + ps_data$correct 189 | ``` 190 | 191 | We can apply functions to an entire column. For example, I can get the mean age for my entire sample. 192 | 193 | Note that I have to include the data file in this argument, if I just say mean(age) I'll get an error. 
194 | 195 | ```{r} 196 | mean(ps_data$age) 197 | ``` 198 | 199 | > **Exercise.** Let's center age. Create a new column called age_centered in which you center age by subtracting the mean age from the age column. 200 | 201 | ```{r} 202 | 203 | 204 | ``` 205 | 206 | # Using the `tidyverse` 207 | 208 | > tidyverse is a package that has to be installed and loaded before you can use any of its functions. 209 | 210 | The functions we've been using so far have been in **base R** and don't require additional packages. To use the functions in the tidyverse packages the `tidyverse` package must first be installed and loaded. Tidyverse packages include tidyr, dplyr, ggplot2, and more - see here for more info: www.tidyverse.org. 211 | 212 | If you haven't installed the package, you'll need to run this command once: 213 | 214 | `install.packages("tidyverse")` 215 | 216 | ```{r} 217 | # Load the package (tell R that you want to use its functions) 218 | library("tidyverse") 219 | ``` 220 | 221 | We're going to reread the data now, using `read_csv`, which is the `tidyverse` version and works faster and better in a number of ways! 222 | 223 | ```{r} 224 | ps_data <- read_csv("datasets/pragmatic_scales_data.csv") 225 | ``` 226 | 227 | 228 | ## Pipes 229 | 230 | Pipes are a way to write strings of functions more easily. They bring the first argument of the function to the beginning. So you can write: 231 | 232 | ```{r} 233 | ps_data$age %>% mean() 234 | ``` 235 | 236 | That's not very useful yet, but when you start **nesting** functions, it gets better. 
237 | 238 | ```{r} 239 | mean(unique(ps_data$age)) 240 | ps_data$age %>% unique() %>% mean() 241 | ps_data$age %>% unique %>% mean 242 | ``` 243 | 244 | or 245 | 246 | ```{r} 247 | round(mean(unique(ps_data$age)), 248 | digits = 2) 249 | 250 | ps_data$age %>% unique %>% mean %>% round(digits = 2) 251 | 252 | # indenting makes things even easier to read 253 | ps_data$age %>% 254 | unique %>% 255 | mean %>% 256 | round(digits = 2) 257 | ``` 258 | 259 | This can be super helpful for writing strings of functions so that they are readable and distinct. 260 | 261 | > **Exercise.** Rewrite these commands using pipes and check that they do the same thing! (Or at least produce the same output). Unpiped version: 262 | 263 | ```{r} 264 | # number of unique items 265 | length(unique(ps_data$item)) 266 | ``` 267 | 268 | Piped version: 269 | 270 | ```{r} 271 | 272 | 273 | ``` 274 | 275 | ## Using `tidyverse` to explore and characterize the dataset 276 | 277 | We are going to manipulate these data using "verbs" from `dplyr`. I'll only teach four verbs, the most common in my workflow (but there are many other useful ones): 278 | 279 | + `filter` - remove rows by some logical condition 280 | + `mutate` - create new columns 281 | + `group_by` - group the data into subsets by some column 282 | + `summarize` - apply some function over columns in each group 283 | 284 | Inspect the various variables before you start any analysis. Earlier we used `summary` but TBH I don't find it useful. 285 | 286 | ```{r} 287 | summary(ps_data) 288 | ``` 289 | 290 | This output just feels overwhelming and uninformative. 291 | 292 | You can look at each variable by itself: 293 | 294 | ```{r} 295 | unique(ps_data$item) 296 | 297 | ps_data$subid %>% 298 | unique 299 | ``` 300 | 301 | Or use interactive tools like `View` or `DT::datatable` (which I really like). 
302 | 303 | ```{r} 304 | # this won't work unless you first do 305 | # install.packages("DT") 306 | DT::datatable(ps_data) 307 | ``` 308 | 309 | > ProTip: What we're working with is called "tidy data" where each column is one measure, and each row is one observation. This is, by consensus, the best way to work with tabular data in R. It's actually where the name of `tidyverse` comes from. Check out [this paper](https://www.jstatsoft.org/article/view/v059i10) to learn more. BUT - if you normally work with "wide data", where each row is a subject and different trials are different columns (like what SPSS often does), you can get your data "tidy" using a package called `tidyr`, which is also part of the tidyverse. It's a little tricky so we're not teaching it today, but the verbs that it provides are `gather` and `spread`. 310 | 311 | ## Filtering & Mutating 312 | 313 | There are lots of reasons you might want to remove *rows* from your dataset, including getting rid of outliers, selecting subpopulations, etc. `filter` is a verb (function) that takes a data frame as its first argument, and then as its second takes the **condition** you want to filter on. 314 | 315 | So if you wanted to look only at 2 and 3 year olds. 316 | 317 | ```{r} 318 | ps_data %>% 319 | filter(age > 2, age < 3) 320 | 321 | # filter(ps_data, age > 2 & age < 3) 322 | 323 | # ps_data %>% 324 | # filter(age > 2, age < 3) 325 | ``` 326 | 327 | Note that we're going to be using pipes with functions over data frames here. The way this works is that: 328 | 329 | + `dplyr` verbs always take the data frame as their first argument, and 330 | + because pipes pull out the first argument, the data frame just gets passed through successive operations 331 | + so you can read a pipe chain as "take this data frame and first do this, then do this, then do that." 
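To make that reading concrete, here is a minimal sketch of a three-step chain. It runs on a small made-up data frame (`toy_data` and its values are hypothetical, not the workshop dataset), and it borrows `group_by` and `summarise` from later in this tutorial, so treat it as a preview rather than the definitive workflow.

```r
library(dplyr)

# Hypothetical toy data standing in for a tidy dataset like ps_data
toy_data <- data.frame(
  subid     = c("s1", "s1", "s2", "s2", "s3", "s3"),
  age       = c(2.5, 2.5, 3.4, 3.4, 4.1, 4.1),
  condition = c("Label", "No Label", "Label", "No Label", "Label", "No Label"),
  correct   = c(1, 0, 1, 1, 0, 0)
)

# Read as: take toy_data, first keep children under 4,
# then group the rows by condition,
# then compute the mean of `correct` within each group.
toy_means <- toy_data %>%
  filter(age < 4) %>%
  group_by(condition) %>%
  summarise(mean_correct = mean(correct))

toy_means
```

Each verb receives the data frame produced by the step before it, which is why no step has to name `toy_data` again.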
332 | 333 | This is essentially the huge insight of `dplyr`: you can chain verbs into readable and efficient sequences of operations over dataframes, provided 1) the verbs all have the same syntax (which they do) and 2) the data all have the same structure (which they do if they are tidy). 334 | 335 | OK, so filtering: 336 | 337 | ```{r} 338 | ps_data %>% 339 | filter(age > 2, 340 | age < 3) 341 | 342 | ``` 343 | 344 | **Exercise.** Create a smaller dataset with **only** the "faces" items in the "Label" condition. 345 | 346 | ```{r} 347 | 348 | 349 | ``` 350 | 351 | > ProTip: You can think about `filter`ing as similar to "logical indexing", where you use a vector of `TRUE` and `FALSE`s to get a part of a dataset, for example, `ps_data[ps_data$item == "faces",]`. This command creates a logical vector `ps_data$item == "faces"` and uses it as a condition for filtering. 352 | 353 | There are also times when you want to add or remove *columns*. You might want to remove columns to simplify the dataset. If you wanted to do that, the verb is `select`. 354 | 355 | ```{r} 356 | ps_data %>% 357 | select(subid, age, correct) 358 | 359 | ps_data %>% 360 | select(-condition) 361 | 362 | ps_data %>% 363 | select(1) 364 | 365 | ps_data %>% 366 | select(starts_with("sub")) 367 | 368 | # learn about this with ?select 369 | ``` 370 | 371 | Perhaps more useful is *adding columns*. You might do this perhaps to compute some kind of derived variable. `mutate` is the verb for these situations - it allows you to add a column. Let's add a discrete age group factor to our dataset. 372 | 373 | ```{r} 374 | ps_data <- ps_data %>% 375 | mutate(age_group = cut(age, 2:5, include.lowest = TRUE), 376 | foo = age * 7) 377 | 378 | head(ps_data) 379 | ``` 380 | 381 | Recoding. 
382 | 383 | ```{r} 384 | ps_data %>% 385 | mutate(age_factor = factor(round(age)), 386 | age_factor_bob = age_factor) %>% 387 | select(-age_factor) 388 | ``` 389 | 390 | 391 | ## Standard psychological descriptives 392 | 393 | We typically describe datasets at the level of subjects, not trials. We need two verbs to get a summary at the level of subjects: `group_by` and `summarise` (kiwi spelling). Grouping alone doesn't do much. 394 | 395 | ```{r} 396 | ps_data %>% 397 | group_by(age_group) 398 | ``` 399 | 400 | All it does is add a grouping marker. 401 | 402 | What `summarise` does is to *apply a function* to a part of the dataset to create a new summary dataset. So we can apply the function `mean` to the dataset and get the grand mean. 403 | 404 | ```{r} 405 | ## DO NOT DO THIS!!! 406 | # foo <- initialize_the_thing_being_bound() 407 | # for (i in 1:length(unique(ps_data$item))) { 408 | # for (j in 1:length(unique(ps_data$condition))) { 409 | # this_data <- ps_data[ps_data$item == unique(ps_data$item)[i] & 410 | # ps_data$condition == unique(ps_data$condition)[j],] 411 | # do_a_thing(this_data) 412 | # bind_together_somehow(this_data) 413 | # } 414 | # } 415 | 416 | ps_data %>% 417 | summarise(correct = mean(correct)) 418 | ``` 419 | 420 | Note the syntax here: `summarise` takes multiple `new_column_name = function_to_be_applied_to_data(data_column)` entries in a list. Using this syntax, we can create more elaborate summary datasets also: 421 | 422 | ```{r} 423 | ps_data %>% 424 | summarise(correct = mean(correct), 425 | n_observations = length(subid)) 426 | ``` 427 | 428 | Where these two verbs shine is in combination, though. Because `summarise` applies functions to columns in your *grouped data*, not just to the whole dataset! 429 | 430 | So we can group by age or condition or whatever else we want and then carry out the same procedure, and all of a sudden we are doing something extremely useful! 
431 | 432 | ```{r} 433 | ps_means <- ps_data %>% 434 | group_by(age_group, condition) %>% 435 | summarise(correct = mean(correct), 436 | n_observations = length(subid)) 437 | ps_means 438 | ``` 439 | 440 | > **Exercise.** One of the most important analytic workflows for psychological data is to take some function (e.g., the mean) *for each participant* and then look at grand means and variability *across participant means*. This analytic workflow requires grouping, summarising, and then grouping again and summarising again! Use `dplyr` to make the same table as above (`ps_means`) but with means computed across subject means, not across all data points. (The means will be pretty similar as this is a balanced design but in a case with lots of missing data, they will vary.) 441 | 442 | ```{r} 443 | 444 | 445 | ``` 446 | 447 | 448 | ## Optional: $t$-test 449 | 450 | A classic 451 | 452 | Get the subject means. 453 | 454 | ```{r} 455 | ps_sub_means <- ps_data %>% 456 | group_by(age_group, subid, condition) %>% 457 | summarise(correct = mean(correct), 458 | n_observations = length(subid)) 459 | ``` 460 | 461 | Now do a t-test for all ages. 462 | 463 | ```{r} 464 | t.test(ps_sub_means$correct[ps_sub_means$condition == "Label"], 465 | ps_sub_means$correct[ps_sub_means$condition == "No Label"]) 466 | ``` 467 | 468 | 469 | > **Exercise.** Do a t-test for just the 3-4 year olds. 
470 | 471 | ```{r} 472 | 473 | ``` 474 | 475 | -------------------------------------------------------------------------------- /mygraph.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mcfrank/openscience_tutorial/fec714335860e1b2cb55f5613cb710bd233df5ad/mygraph.pdf -------------------------------------------------------------------------------- /open_science.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mcfrank/openscience_tutorial/fec714335860e1b2cb55f5613cb710bd233df5ad/open_science.md -------------------------------------------------------------------------------- /openscience_tutorial_icis2018.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: XeLaTeX 14 | -------------------------------------------------------------------------------- /rmarkdown_handout.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "RMarkdown for writing reproducible scientific papers" 3 | author: '[Mike Frank](mailto:mcfrank@stanford.edu), adapted from work by Mike and [Chris Hartgerink](mailto:chris@libscie.org)' 4 | date: '`r Sys.Date()`' 5 | output: 6 | tufte::tufte_html: 7 | toc: yes 8 | toc_depth: 1 9 | --- 10 | 11 | 12 | ```{r, echo=FALSE} 13 | library(knitr) 14 | opts_chunk$set(echo=TRUE, 15 | warning=FALSE, message=FALSE, 16 | cache=FALSE) 17 | ``` 18 | 19 | 20 | # Introduction 21 | 22 | This document is a short tutorial on using RMarkdown to mix prose and code together for creating reproducible scientific documents. 
If you find any errors and have a Github account, **[please suggest changes here](https://github.com/mcfrank/openscience_tutorial)**. This is adapted from [a slightly longer tutorial that Mike and Chris Hartgerink taught together at SIPS 2017](https://github.com/mcfrank/rmarkdown-workshop). 23 | 24 | In short: RMarkdown allows you to create documents that are compiled with code, producing your next scientific paper. 25 | 26 | Now we're together trying to help spread the word, because it can make writing manuscripts so much easier! We wrote this handout in RMarkdown as well. [Take a look at the source.](https://github.com/mcfrank/openscience_tutorial/blob/master/rmarkdown_handout.Rmd) 27 | 28 | ## Who is this aimed at? 29 | 30 | We aim this document at anyone writing manuscripts and using R, including those who... 31 | 32 | 1. ...collaborate with people who use Word 33 | 2. ...want to write complex equations 34 | 3. ...want to be able to change bibliography styles with less hassle 35 | 4. ...want to spend more time actually doing research! 36 | 37 | ## Why write reproducible papers? 38 | 39 | Cool, thanks for sticking with us and reading up through here! 40 | 41 | There are three reasons to write reproducible papers. To be right, to be reproducible, and to be efficient. There are more, but these are convincing to us. In more depth: 42 | 43 | 1. To avoid errors. Using an automated method for scraping APA-formatted stats out of PDFs, @nuijten2016 found that over 10% of p-values in published papers were inconsistent with the reported details of the statistical test, and 1.6% were what they called "grossly" inconsistent, e.g. difference between the p-value and the test statistic meant that one implied statistical significance and the other did not. Nearly half of all papers had errors in them. 44 | 45 | 2. To promote computational reproducibility. Computational reproducibility means that other people can take your data and get the same numbers that are in your paper. 
Even if you don't have errors, it can still be very hard to recover the numbers from published papers because of ambiguities in analysis. Creating a document that literally specifies where all the numbers come from in terms of code that operates over the data removes all this ambiguity. 46 | 47 | 3. To create spiffy documents that can be revised easily. This is actually a really big neglected one for us. At least one of us used to tweak tables and figures by hand constantly, leading to a major incentive *never to rerun analyses* because it would mean re-pasting and re-illustratoring all the numbers and figures in a paper. That's a bad thing! It means you have an incentive to be lazy and to avoid redoing your stuff. And you waste tons of time when you do. In contrast, with a reproducible document, you can just rerun with a tweak to the code. You can even specify what you want the figures and tables to look like before you're done with all the data collection (e.g., for purposes of preregistration or a registered report). 48 | 49 | ## Learning goals 50 | 51 | By the end of this class you should: 52 | 53 | * Know what Markdown is and how the syntax works, 54 | * See how to integrate code and data in RMarkdown, 55 | * Understand the different output formats from RMarkdown and how to generate them, and 56 | * Know about generating APA format files with `papaja` and bibtex. 57 | 58 | # Getting Started 59 | 60 | Fire up RStudio and create a new RMarkdown file. Don't worry about the settings; we'll get to those later. 61 | 62 | If you click on "Knit" (or hit `CTRL+SHIFT+K`) the RMarkdown file will run and generate all results and present you with a PDF file, HTML file, or a Word file. If RStudio asks you to install packages, click yes and see whether everything works to begin with. 63 | 64 | We need that before we teach you more about RMarkdown. 
But you should feel good if you get here already, because honestly, you're about 80% of the way to being able to write basic RMarkdown files. It's _that_ easy. 65 | 66 | # Structure of an RMarkdown file 67 | 68 | An RMarkdown file contains several parts. Most essential are the header, the body text, and code chunks. 69 | 70 | ## Header 71 | 72 | Headers in RMarkdown files contain some metadata about your document, which you can customize to your liking. Below is a simple example that purely states the title, author name(s), date, and output format. 73 | 74 | ```yaml 75 | --- 76 | title: "Untitled" 77 | author: "NAME" 78 | date: "July 28, 2017" 79 | output: html_document 80 | --- 81 | ``` 82 | 83 | > ProTip: The header is written in "YAML", which originally stood for "yet another markup language" (the name is now the recursive "YAML Ain't Markup Language"). You don't need to know that, and don't worry about it. Just make sure you are careful with indenting, as YAML does care about that. 84 | 85 | For now, go ahead and change `html_document` to `word_document`, unless you have strong preferences for `HTML` or `PDF`.^[Note: to create PDF documents you also need a TeX installation. Don't know what that is? You probably don't have it then. More info below.] 86 | 87 | ## Body text 88 | 89 | The body of the document is where you actually write your reports. This is primarily written in the Markdown format, which is explained in the [Markdown syntax](#markdown-syntax) section. 90 | 91 | The beauty of RMarkdown is, however, that you can evaluate `R` code right in the text. To do this, you start inline code with \`r, type the code you want to run, and close it again with a \`. Usually, the backtick key is below the escape (`ESC`) key or next to the left SHIFT button. 92 | 93 | For example, if you want to have the result of 48 minus 35 in your text, you type \` r 48-35\`, which returns `r 48 - 35`. Please note that if you return a value with many decimals, it will also print these depending on your settings (for example, `r pi`). 
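One way to keep long decimals out of your prose is to tidy the value in R before it reaches the text. The snippet below is a minimal sketch of that idea; the object names `long_value`, `rounded`, and `padded` are made up for illustration and are not part of the tutorial's data.

```r
# A value with many decimals, standing in for a computed statistic.
long_value <- pi

# round() returns a number with at most two decimal places.
rounded <- round(long_value, 2)

# sprintf() returns a character string and keeps trailing zeros,
# which round() alone would drop.
padded <- sprintf("%.2f", 48 - 35)
```

Placing `rounded` or `padded` inside an inline code span then prints the tidied value instead of the full-precision one. Whether you want a number or a string depends on whether trailing zeros matter for your reporting style.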
94 | 95 | ## Code chunks 96 | 97 | In the section above we introduced you to running code inside text, but often you need to take several steps in order to get to the result you need. And you don't want to do data cleaning in the text! This is why there are code chunks. A simple example is a code chunk loading packages. 98 | 99 | First, insert a code chunk by going to `Code->Insert code chunk` or by pressing `CTRL+ALT+I`. Inside this code chunk you can then type, for example, `library(ggplot2)` and create an object `x`. 100 | 101 | ```{r} 102 | library(ggplot2) 103 | 104 | x <- 1 + 1 105 | ``` 106 | 107 | If you do not want the contents of the code chunk to appear in your document, include `echo=FALSE` in the chunk header at the start of the code chunk. We can now use the contents from the above code chunk to print results (e.g., $x=`r x`$). 108 | 109 | These code chunks can contain whatever you need, including tables and figures (which we will go into more later). Note that all code chunks treat the location of the RMarkdown file as the working directory, so when you read in data, use a path relative to that file. 110 | 111 | 112 | # Markdown syntax 113 | 114 | Markdown is one of the simplest document languages around; it is an open standard and can be converted into `.tex`, `.docx`, `.html`, `.pdf`, etc. This is the main workhorse of RMarkdown and is very powerful. You can [learn Markdown in five (!) minutes](https://learnxinyminutes.com/docs/markdown/). Other resources include [http://rmarkdown.rstudio.com/authoring_basics.html](http://rmarkdown.rstudio.com/authoring_basics.html) and [this cheat sheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf). 115 | 116 | You can do some pretty cool tricks with Markdown, but these are the basics: 117 | 118 | * It's easy to get `*italic*` or `**bold**`. 119 | * You can get headings using `# heading1` for first level, `## heading2` for second level, and `### heading3` for third level. Make sure you leave a space after the `#`! 
120 | * Lists are delimited with `*` for each entry. 121 | * You can write links by writing `[here's my link](http://foo.com)`. 122 | 123 | If you want a more extensive description of all the potential of Markdown, [this introduction to Markdown](https://daringfireball.net/projects/markdown/) is highly detailed. 124 | 125 | The great thing about Markdown is that it works almost everywhere! It renders on GitHub, OSF, Slack, and many wikis, and it looks pretty good even in plain text documents. I find myself writing emails in markdown just because it's a clear and consistent way to format and outline. 126 | 127 | ## Exercises 128 | 129 | Swap over to your new sample markdown. 130 | 131 | 1. Outlining using headings is a really great way to keep things organized! Try making a bunch of headings, and then recompiling your document. 132 | 2. Add a table of contents. This will involve going to the header of the document (the `YAML`), and adding some options to the `html_document` bit. You want it to look like this (indentation must be correct): 133 | 134 | ```yaml 135 | output: 136 |   html_document: 137 |     toc: true 138 | ``` 139 | 140 | Now recompile. Looks nice, right?^[Pro-tip: you can specify how deep the TOC should go by adding `toc_depth: 2` to go two levels deep.] 141 | 142 | 3. Try adding another option: `toc_float: true`. Recompile -- super cool. There are plenty more great output options that you can modify. [Here is a link to the documentation.](http://rmarkdown.rstudio.com/html_document_format.html) 143 | 144 | # Headers, Tables, and Graphs 145 | 146 | ## Headers 147 | 148 | We're going to want more libraries loaded (for now we're loading them inline). 
149 | 150 | ```{r} 151 | library(knitr) 152 | library(ggplot2) 153 | library(broom) 154 | library(devtools) 155 | ``` 156 | 157 | We often also add `chunk options` to each code chunk so that, for example: 158 | 159 | - code does or doesn't display inline (`echo` setting) 160 | - figures are shown at various sizes (`fig.width` and `fig.height` settings) 161 | - warnings and messages are suppressed (`warning` and `message` settings) 162 | - computations are cached (`cache` setting) 163 | 164 | There are many others available as well. Caching can be very helpful for large files, but can also cause problems when there are external dependencies that change. An example that is useful for manuscripts is: 165 | 166 | ```{r eval=FALSE} 167 | opts_chunk$set(fig.width=8, fig.height=5, 168 | echo=TRUE, 169 | warning=FALSE, message=FALSE, 170 | cache=TRUE) 171 | ``` 172 | 173 | 174 | ## Graphs 175 | 176 | It's really easy to include graphs, like this one. (It uses the `mtcars` dataset that comes with R, plotted with `ggplot2`.) 177 | 178 | ```{r} 179 | qplot(hp, mpg, col = factor(cyl), data = mtcars) 180 | ``` 181 | 182 | All you have to do is make the plot and it will render straight into the text. 183 | 184 | External graphics can also be included, as follows: 185 | 186 | ```{r eval = FALSE} 187 | knitr::include_graphics("path/to/file") 188 | ``` 189 | 190 | ## Tables 191 | 192 | There are many ways to make good-looking tables using RMarkdown, depending on your display purpose. 193 | 194 | - The `knitr` package (which powers RMarkdown) comes with the `kable` function. It's versatile and makes perfectly reasonable tables. It also has a `digits` argument for controlling rounding. 195 | - For HTML tables, there is the `DT` package, which provides `datatable` -- these are pretty and interactive javascript-based tables that you can click on and search in. Not great for static documents though. 
196 | 197 | - For APA manuscripts, it can also be helpful to use the `xtable` package, which creates very flexible LaTeX tables. These can be tricky to get right but they are completely customizable provided you want to google around and learn a bit about tex. 198 | 199 | We recommend starting with `kable`: 200 | 201 | ```{r} 202 | kable(head(mtcars), digits = 1) 203 | ``` 204 | 205 | ## Statistics 206 | 207 | It's also really easy to include statistical tests of various types. 208 | 209 | For this, an option is the `broom` package, which formats the outputs of various tests really nicely. Paired with knitr's `kable` you can make very simple tables in just a few lines of code. 210 | 211 | ```{r} 212 | mod <- lm(mpg ~ hp + cyl, data = mtcars) 213 | kable(tidy(mod), digits = 3) 214 | ``` 215 | 216 | Of course, cleaning these up can take some work. For example, we'd need to rename a bunch of fields to make this table have the labels we wanted (e.g., to turn `hp` into `Horsepower`). 217 | 218 | We often need APA-formatted statistics. We can compute them first, and then print them inline. 219 | 220 | ```{r} 221 | ts <- with(mtcars,t.test(hp[cyl==4], hp[cyl==6])) 222 | ``` 223 | 224 | > There's a statistically-significant difference in horsepower for 4- and 6-cylinder cars ($t(`r round(ts$parameter,2)`) = `r round(ts$statistic,2)`$, $p = `r round(ts$p.value,3)`$). 225 | 226 | To insert these stats inline I wrote e.g. `round(ts$parameter, 2)` inside an inline code block.^[APA would require omission of the leading zero. `papaja::printp()` will let you do that, see below.] 227 | 228 | Note that rounding can occasionally get you in trouble here, because it's very easy to have an output of $p = 0$ when in fact $p$ can never be exactly equal to 0. Nonetheless, this can help you prevent rounding errors and the wrath of `statcheck`. 229 | 230 | ## Exercises 231 | 232 | 1. 
Using the `mtcars` dataset, insert a table and a graph of your choice into the document.^[If you're feeling uninspired, try `hist(mtcars$mpg)`.] 233 | 234 | # Collaboration 235 | 236 | How do we collaborate using RMarkdown? There are lots of different workflows that people use. The way it works in my lab is that the first author typically makes a GitHub repository with the markdown-formatted document in it. Sometimes we just collaborate through GitHub or through writing comments on the rendered PDF and sending them back to the first author. (I like the Dropbox PDF comment interface for this.) 237 | 238 | But sometimes you want to do lots of line-editing or write collaboratively, especially with someone who doesn't like GitHub and markdown and all that. For these cases, we often paste the intro into Google Docs or Word and edit until we converge, then the first author puts that back into the markdown. This is a little clunky, but not too bad. And critically, all the figures and numbers get rendered fresh when you re-knit, so nothing can get accidentally altered during the editing process. 239 | 240 | # Writing APA-format papers 241 | 242 | (Thanks to [Frederik Aust](http://github.com/crsh) for contributing this section!) 243 | 244 | The end-game of reproducible research is to knit your entire paper. We'll focus on APA-style writeups. Managing APA format is a pain in the best of times. Isn't it nice to get it done for you? 245 | 246 | We're going to use the `papaja` package. `papaja` is an R package that includes an R Markdown template for producing documents that adhere to the American Psychological Association (APA) manuscript guidelines (6th Edition). 247 | 248 | ## Software requirements 249 | 250 | To use `papaja`, make sure you are using the latest versions of R and RStudio. If you want to create PDF files in addition to DOCX files, you need **[TeX](http://de.wikipedia.org/wiki/TeX) 2013 or later**. 
Try [MiKTeX](http://miktex.org/) for Windows, [MacTeX](https://tug.org/mactex/) for Mac, or [TeX Live](http://www.tug.org/texlive/) for Linux. Some Linux users may need a few additional TeX packages for the LaTeX document class `apa6` to work.^[For Ubuntu, we suggest running: `sudo apt-get install texlive texlive-publishers texlive-fonts-extra texlive-latex-extra texlive-humanities lmodern`.] 251 | 252 | 253 | ## Installing `papaja` 254 | 255 | `papaja` has not yet been released on CRAN but you can install it from GitHub. 256 | 257 | ```{r install_papaja, eval = FALSE} 258 | # Install devtools package if necessary 259 | if(!"devtools" %in% rownames(installed.packages())) install.packages("devtools") 260 | 261 | # Install papaja 262 | devtools::install_github("crsh/papaja") 263 | ``` 264 | 265 | ## Creating a document 266 | 267 | The APA manuscript template should now be available through the RStudio menus when creating a new R Markdown file. 268 | 269 | When you click RStudio's *Knit* button, `papaja`, `rmarkdown`, and `knitr` work together to create an APA-conforming manuscript that includes both your manuscript text and the results of any embedded R code. 270 | 271 | Note: if you don't have TeX installed on your computer, or if you would like to create a Word document, replace `output: papaja::apa6_pdf` with `output: papaja::apa6_word` in the document YAML header. 272 | 273 | `papaja` provides some rendering options that only work if you use `output: papaja::apa6_pdf`. 274 | `figsintext` indicates whether figures and tables should be included at the end of the document---as required by APA guidelines---or rendered in the body of the document. 275 | If `figurelist`, `tablelist`, or `footnotelist` are set to `yes`, a list of figure captions, table captions, or footnotes is given following the reference section. 276 | `lineno` indicates whether lines should be continuously numbered throughout the manuscript. 
277 | 278 | 279 | ## Bibliographic management 280 | 281 | It's also possible to include references using `bibtex`, by using `@ref` syntax. An option for managing references is [BibDesk](http://bibdesk.sourceforge.net/), which integrates with Google Scholar.^[But many other options are possible.] 282 | 283 | With a bibtex file included, you can refer to papers. As an example, `@nuijten2016` results in the in-text citation "@nuijten2016"; to cite a paper parenthetically, use `[@nuijten2016]` [@nuijten2016]. Take a look at the `papaja` APA example to see how this works. 284 | 285 | `citr` is an R package that provides an easy-to-use [RStudio addin](https://rstudio.github.io/rstudioaddins/) that facilitates inserting citations. 286 | The addin will automatically look up the Bib(La)TeX-file(s) specified in the YAML front matter. 287 | The references for the inserted citations are automatically added to the document's reference section. 288 | 289 | 290 | 291 | 292 | 293 | Once `citr` is installed (`install.packages("citr")`) and you have restarted your R session, the addin appears in the menus and you can define a [keyboard shortcut](https://rstudio.github.io/rstudioaddins/#keyboard-shorcuts) to call the addin. 294 | 295 | ## Exercise 296 | 297 | Make sure you've got `papaja`, then open a new template file. Compile this document, and look at how awesome it is. (To compile to PDF you need a TeX distribution such as TeX Live, so you may need to wait and install this later if it's not working.) 298 | 299 | Try pasting in your figure and table from your other RMarkdown (don't forget any libraries you need to make it compile). Presto, ready to submit! 300 | 301 | For a bit more on `papaja`, check out **[this guide](https://rpubs.com/YaRrr/papaja_guide)**. --------------------------------------------------------------------------------