├── .gitignore
├── README.md
├── original-data
│   ├── README.md
│   └── metadata
│       ├── README.md
│       ├── metadata_guide.md
│       └── supplements
│           └── README.md
└── processing-and-analysis
    ├── README.md
    ├── analysis-data
    │   ├── README.md
    │   └── data_appendix.Rmd
    ├── command-files
    │   └── README.md
    └── importable-data
        └── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .Rhistory


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # The TIER Documentation Protocol v3.0 for R
 2 | 
 3 | 
 4 | ## Overview 
 5 | 
 6 | The TIER Documentation Protocol provides instructions for assembling a 
 7 | set of electronic files that document all the steps of data processing 
 8 | and analysis you conduct for an empirical research paper. 
 9 | 
10 | The documentation specified by the Protocol contains all the data, 
11 | computer programs, and explanatory information an independent researcher 
12 | would need to be able to replicate the data processing and analysis you 
13 | conducted for the project and to reproduce exactly all the results 
14 | reported in your paper.
15 | 
16 | ## ProjectTIER_R repository
17 | 
18 | The instructions presented in this repository are written for users of R. 
19 | In a few places, they use R-specific terminology. For example, we refer to 
20 | command files as scripts, and their names are followed by the .R 
21 | extension. But the R-specific terminology that appears in these 
22 | instructions can be easily translated to any of the major statistical 
23 | packages (such as SPSS, SAS, Stata or Matlab) or other programming 
24 | languages.
25 | 
26 | ## Getting started
27 | 
28 | To get started, you can fork and then clone this repository, which will create 
29 | a copy of the folder structure recommended in the Project TIER protocol, or click the 
30 | "Clone or Download" button to download a ZIP file of the structure.
31 | 
32 | Below we describe how to organize your analysis according to the 
33 | Project TIER protocol, i.e. which components of your analysis should go 
34 | into which folder.
35 | 
36 | ## Hierarchy and description of files and folders
37 | 
38 | Your repository should have the following hierarchy of files and folders:
39 | 
40 | - An electronic copy of your complete final paper. Often, this means:
41 | 	+ An `.Rmd` file with all the text and code to produce the final paper
42 | 	+ A knitted HTML or PDF file of the complete paper
43 | - The `README.md` file for your repository
44 | - Original Data and Metadata - `original-data`
45 |     + Metadata - `metadata`
46 |         - Metadata Guide - `metadata_guide.md`
47 |         - Supplements - `supplements`
48 | - Processing and Analysis - `processing-and-analysis`
49 |     + Importable Data - `importable-data`
50 |     + Command Files - `command-files`
51 |     + Analysis Data - `analysis-data`
52 | 
53 | Contents of these files and folders are described in the `README` files
54 | within these folders.
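
If you would rather build the hierarchy above from R than fork or download this repository, the following is a minimal sketch of one way to do it. The project name `my-tier-project` and the use of a temporary directory are assumptions made for illustration; substitute the location of your own project.

```r
# Hypothetical project location; a temporary directory is used here so the
# sketch can be run without touching any real project.
project_root <- file.path(tempdir(), "my-tier-project")

# Folders named in the hierarchy described in this README.
folders <- c(
  "original-data/metadata/supplements",
  "processing-and-analysis/importable-data",
  "processing-and-analysis/command-files",
  "processing-and-analysis/analysis-data"
)
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Empty README.md placeholders in every folder that has one in this repository.
readme_dirs <- c(
  ".",
  "original-data",
  "original-data/metadata",
  "original-data/metadata/supplements",
  "processing-and-analysis",
  folders[2:4]
)
file.create(file.path(project_root, readme_dirs, "README.md"))

# Empty placeholders for the Metadata Guide and the Data Appendix.
file.create(file.path(project_root, "original-data/metadata/metadata_guide.md"))
file.create(file.path(project_root,
                      "processing-and-analysis/analysis-data/data_appendix.Rmd"))

# List everything that was created.
list.files(project_root, recursive = TRUE)
```

The sketch creates empty placeholder files only; the contents of each file still have to be written as described in the corresponding `README`.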
55 | 
56 | ## README
57 | 
58 | The `README.md` file at the top level of your repository (this 
59 | file) gives information about all the other files included in the 
60 | documentation for your paper. In particular, the `README` file should:
61 | 
62 | 1. state what statistical software or other computer programs are 
63 | needed to run the command files.
64 | 1. explain the structure of the hierarchy of folders in which the 
65 | documentation is stored, and briefly describe each of the files 
66 | included in the documentation.
67 | 1. describe precisely any changes you made to your original data files 
68 | to create the corresponding versions saved in your `importable-data` 
69 | folder.
70 | 1. give explicit, step-by-step instructions for using your 
71 | documentation to replicate the statistical results reported in your 
72 | paper.
73 | 
74 | The README should be a Markdown document so that it can be 
75 | rendered properly on GitHub, and any changes can be tracked. It should 
76 | be named `README.md`. This file should be stored in the top level of 
77 | your repository.


--------------------------------------------------------------------------------
/original-data/README.md:
--------------------------------------------------------------------------------
1 | # Original Data and Metadata
2 | 
3 | This folder is where you place a copy of every original data file from which you extract any of the data used in your study. These data files should be placed directly into the `original-data` directory.
4 | 
5 | Occasionally, an original data file is in a format that cannot be read by the statistical software you are using for your project (for example, a PDF file that needs to be read using optical character recognition). In this case, you need to create a modified version of the original data file that is in a format your software can read. 
6 | 
7 | When you need to create an importable version of an original data file, you should keep both versions: the original in this `original-data` folder and the importable version in the `importable-data` folder.  (The original and importable versions of a data file should be given different names.)
8 | 
9 | This folder also contains the sub-folder `metadata`, the contents of which are described in the `README` file within that folder.


--------------------------------------------------------------------------------
/original-data/metadata/README.md:
--------------------------------------------------------------------------------
 1 | # Metadata
 2 | 
 3 | The top level of your `metadata` folder should contain one document: 
 4 | the Metadata Guide.
 5 | 
 6 | The `metadata` folder should also contain one sub-folder: the 
 7 | `supplements` folder.
 8 | 
 9 | ## Metadata Guide
10 | 
11 | For each of your original data files, the Metadata Guide provides the kind of information typically found in a codebook accompanying a dataset, such as variable definitions and coding, sampling methods, and anything else a user would need to know to work with and interpret the data appropriately.  
12 | 
13 | You, the author of the paper, compose the Metadata Guide.  
14 | 
15 | The Metadata Guide should be organized into one or more sections; each section should provide information about one of the original data files in your `original-data` folder.  For each original data file, the information included in the Metadata Guide should include:
16 | 
17 | 
18 | 1. *A bibliographic citation for the original data file.* This citation should be in a format consistent with the editorial style (e.g., APA or Chicago) used in the main paper or report on the study.
19 | 1. *A digital object identifier (DOI) for the data file (if one has been assigned).* If a DOI is included in the bibliographic citation, it need not be repeated.
20 | 1. *The date on which the author first downloaded, or obtained in some other way, the original data file.* If a date is included in the bibliographic citation, it need not be repeated.
21 | 1. *A written explanation of how an interested reader can obtain a copy 
22 | of the original data file.* In many cases, this explanation will give the 
23 | URL of a website from which the data can be accessed, along with 
24 | instructions for downloading a file identical to the original data file 
25 | you obtained from that site. In all cases, this explanation should be 
26 | complete and precise enough to allow an independent researcher to 
27 | locate and obtain the data file without any additional information or 
28 | assistance.
29 | 1. *Whatever additional information an independent researcher would need 
30 | to understand and use the data in the original data file.* The particular 
31 | information required can vary a great deal depending on the nature of 
32 | the original data file in question, and deciding what additional 
33 | information to provide therefore requires thoughtful consideration and 
34 | judgment. In many cases, the relevant information is similar to what is 
35 | found in a codebook or users' guide for a dataset: variable names and 
36 | definitions, coding schemes and units of measurement, and details of 
37 | the sampling method and weight variables. In some cases, it is also 
38 | necessary to include information about the file structure (e.g., the 
39 | delimiters used to separate variables, or, in rectangular files without 
40 | delimiters, the columns in which the variables are stored). Any other 
41 | unique or idiosyncratic aspects of the data that an independent user of 
42 | the data would need to understand should be explained as well.
43 | 1. *Supplementary documents with additional metadata.* In many cases, some or all of the information about an original data file that should be included in the Metadata Guide is available in an existing, publicly accessible document, such as a codebook or user’s guide that is provided with the original data file.  In these cases, it is not necessary to include that information in the Metadata Guide.  Instead, you may simply put a note in the Metadata Guide indicating that the information is available in an existing document.
44 | 
45 | When you put a note in the Metadata Guide indicating that certain parts of the information that should be provided there are available in an existing document, you should preserve a copy of the existing document in the `supplements` sub-folder of your `metadata` folder. 
46 | 
47 | The Metadata Guide should be a Markdown document so that it can be 
48 | rendered properly on GitHub, and any changes can be tracked. It should 
49 | be named `metadata_guide.md`. This file should be stored in the 
50 | `metadata` folder.


--------------------------------------------------------------------------------
/original-data/metadata/metadata_guide.md:
--------------------------------------------------------------------------------
1 | # Metadata Guide


--------------------------------------------------------------------------------
/original-data/metadata/supplements/README.md:
--------------------------------------------------------------------------------
1 | # Supplements
2 | 
3 | As described in the `README` file for the `metadata` folder, the 
4 | `supplements` folder is where you store any existing documents related 
5 | to your original data files, such as users’ guides or codebooks, that 
6 | contain relevant information you omitted from the Metadata Guide.


--------------------------------------------------------------------------------
/processing-and-analysis/README.md:
--------------------------------------------------------------------------------
1 | # Processing and Analysis
2 | 
3 | This folder contains three sub-folders: `importable-data`, 
4 | `command-files`, and `analysis-data`.
5 | 
6 | Contents of these folders are described in `README` files in those 
7 | folders.


--------------------------------------------------------------------------------
/processing-and-analysis/analysis-data/README.md:
--------------------------------------------------------------------------------
 1 | # Analysis Data
 2 | 
 3 | This folder should contain:
 4 | 
 5 | - Your analysis data file(s) as described in the instructions for your 
 6 | `command-files` folder.
 7 | - Your Data Appendix, `data_appendix.Rmd`.
 8 | 
 9 | ## The Data Appendix
10 | 
11 | The Data Appendix is a document that serves as a codebook for the analysis data file(s).  It is composed by the author of the paper.
12 | 
13 | If the data processing phase of your research generated just one 
14 | analysis data file, and all the results presented in your paper were 
15 | derived from that single analysis data file, the Data Appendix should 
16 | begin with a brief description of the analysis data file.
17 | 
18 | Typically, this description will say something about the scope of the 
19 | sample or population the data represent, specify the unit of analysis, 
20 | and indicate the number of observations. As in the case of the metadata 
21 | that accompanies your original data files, however, exactly what 
22 | information is relevant will depend on the nature of the analysis data 
23 | file, so deciding which aspects you will describe in the Data Appendix 
24 | will require judgment.
25 | 
26 | After the brief description of the analysis data file, the Data 
27 | Appendix should present information about every variable in the 
28 | analysis data file. The information presented about each variable should 
29 | include:
30 | 
31 | - the name of the variable and a complete definition (including as appropriate, for example, coding and/or units of measurement, the wording of a survey question the variable is based on, or adjustments made for inflation or PPP).
32 | - the name(s) of the original data file(s) from which the variable was extracted, or from which the variables used to construct it were extracted, and the names of the variables extracted from the original data files.
33 | - the number of observations with valid values for the variable, and the number of observations with missing values.
34 | 
35 | For categorical variables, the information should also include:
36 | - a frequency table.
37 | - a bar chart illustrating the frequency distribution.
38 | 
39 | For quantitative variables, the information should also include:
40 | - basic summary statistics: the mean, standard deviation, minimum, 25th 
41 | percentile, median, 75th percentile, and maximum.
42 | - a histogram.
43 | 
44 | If the results presented in your paper were derived from more than one 
45 | analysis data file, the Data Appendix should include all of the above 
46 | information - the brief description of the data file and the 
47 | information about each of the variables contained in the file - for 
48 | each of the analysis data files that was used.
49 | 
50 | The Data Appendix should be an R Markdown document so that it can be 
51 | rendered properly on GitHub, and any changes can be tracked. It should 
52 | be named `data_appendix.Rmd`. This file should be stored in the 
53 | `analysis-data` folder.
54 | 
55 | This file contains plain text as well as R code that generates the 
56 | summary statistics and figures described above.
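
As a rough illustration of the per-variable information listed above, the chunk below is a minimal sketch of code that could appear in `data_appendix.Rmd`. The data frame `analysis_data`, its variables `income` and `region`, and the helper functions are hypothetical and invented for this example; in a real Data Appendix the data would be loaded from your `analysis-data` folder.

```r
# Hypothetical analysis data with invented values, for illustration only.
analysis_data <- data.frame(
  income = c(42000, 51500, NA, 38250, 60100, 47300),
  region = factor(c("north", "south", "south", NA, "west", "north"))
)

# Number of observations with valid values and with missing values.
count_valid_missing <- function(x) {
  c(valid = sum(!is.na(x)), missing = sum(is.na(x)))
}

# Basic summary statistics for a quantitative variable.
summarize_quantitative <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    p25 = q[1],
    median = q[2],
    p75 = q[3],
    max = max(x, na.rm = TRUE))
}

# Quantitative variable: counts, summary statistics, and a histogram.
count_valid_missing(analysis_data$income)
summarize_quantitative(analysis_data$income)
hist(analysis_data$income, main = "income", xlab = "income")

# Categorical variable: counts, a frequency table, and a bar chart.
count_valid_missing(analysis_data$region)
region_counts <- table(analysis_data$region)
region_counts
barplot(region_counts, main = "region")
```

The variable definitions and the names of the source files still need to be written out as plain text alongside chunks like this one.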


--------------------------------------------------------------------------------
/processing-and-analysis/analysis-data/data_appendix.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Data Appendix"
3 | author: "name of author"
4 | date: "date created or edited"
5 | output: html_document
6 | ---


--------------------------------------------------------------------------------
/processing-and-analysis/command-files/README.md:
--------------------------------------------------------------------------------
 1 | # Command Files
 2 | 
 3 | For simple implementations of the TIER Protocol, the "commands" (R code) will be interwoven with text in the `.Rmd` file in the main folder of your project. 
 4 | 
 5 | However, occasionally it is necessary to split out supplementary work into additional files. For example, you may have a separate `.Rmd` file that completes a long and arduous data cleaning process, or you may have R scripts (with the `.R` extension) that contain specialized functions you wish to "source" into your final analysis. 
 6 | 
 7 | These additional files should be placed in the `command-files` directory. 
 8 | 
 9 | In all of the scripts you write, it is important to include comments 
10 | that are detailed and clear enough to make it possible for someone not 
11 | familiar with your project to understand the steps of data processing 
12 | and analysis that are executed by the commands in the script.
13 | 
14 | For the purpose of constructing and organizing your scripts, you may want to
15 | think of the work on your project in terms of two phases, 
16 | (1) processing, and (2) analysis. Your scripts 
17 | will include one or more files that execute each of these phases of 
18 | research.
19 | 
20 | 1. The script(s) for the processing phase should include commands that 
21 | execute all the processing required to transform your importable data 
22 | files into the final data you will use in your analysis. 
23 | Exactly what these steps will be is highly variable, but they typically 
24 | include operations such as joining two or more data files, dropping 
25 | variables or cases, generating new variables, and recoding. At the end 
26 | of the script(s) for the processing phase, there should be `save()`
27 | commands that save the final data file(s) upon 
28 | which your analysis will be conducted. We will refer to the final data 
29 | file(s) that you use in your analysis as "your analysis data file(s)". <br><br>
30 | Your analysis data file(s) should be stored in your `analysis-data` 
31 | folder. <br><br>
32 | Strictly speaking, including your analysis data file(s) in the 
33 | documentation is redundant:  anyone interested in your analysis data 
34 | file(s) can create them simply by executing the R scripts you 
35 | wrote for the importing and processing phases of your project. 
36 | Nonetheless, the TIER Protocol calls for your analysis data file(s) to 
37 | be included simply because it is sometimes convenient to have a readily 
38 | accessible copy of the analysis data.
39 | 
40 | 1. The script(s) for the analysis phase should contain commands that 
41 | open the analysis data file(s) you created in the processing phase, and 
42 | then generate the results reported in your paper. <br><br>
43 | Every command in your analysis script(s) that generates a piece of 
44 | output or a result reported in your paper should be preceded by a 
45 | comment that indicates what piece of output or result the command will 
46 | generate. The following examples illustrate some typical kinds of 
47 | comments:
48 | 
49 |     `# The following command produces Table 6.`
50 | 
51 |     `# The following command produces Figure 12.`
52 |     
53 |     `# The following command calculates the correlation of -0.54` <br>
54 |     `# between variables X and Y reported on page 16 of the paper.`
55 | 
56 | All of the scripts for importing, processing and analyzing your data 
57 | should be included in the `command-files` folder. 
58 | 
59 | The R code that generates your Data Appendix is kept in 
60 | `data_appendix.Rmd` in your `analysis-data` folder. That file is described in 
61 | the instructions for your Data Appendix.
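
The two phases described above can be sketched as follows. This is a minimal illustration and not a prescribed layout: the temporary project folder, the file names `households.csv`, `employment.csv` and `analysis_data.RData`, and all data values are hypothetical. The stand-in CSV files are written by the sketch itself only so that it can be run on its own.

```r
# Hypothetical project root; a temporary directory stands in for the repository.
project_root <- file.path(tempdir(), "tier-demo")
importable_dir <- file.path(project_root, "processing-and-analysis",
                            "importable-data")
analysis_dir <- file.path(project_root, "processing-and-analysis",
                          "analysis-data")
dir.create(importable_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(analysis_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in importable data files (invented values, for illustration only).
write.csv(data.frame(id = 1:4, income = c(100, 200, 300, 400), notes = "x"),
          file.path(importable_dir, "households.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:4, hours = c(40, 35, 20, 10)),
          file.path(importable_dir, "employment.csv"), row.names = FALSE)

# ---- Processing phase ----
households <- read.csv(file.path(importable_dir, "households.csv"))
employment <- read.csv(file.path(importable_dir, "employment.csv"))

# Join the two importable files on the shared identifier.
analysis_data <- merge(households, employment, by = "id")

# Drop a variable that is not used in the analysis.
analysis_data$notes <- NULL

# Generate a new variable: income per hour worked.
analysis_data$income_per_hour <- analysis_data$income / analysis_data$hours

# Save the analysis data file in the analysis-data folder.
save(analysis_data, file = file.path(analysis_dir, "analysis_data.RData"))

# ---- Analysis phase ----
# Start from the saved file, as a separate analysis script would.
rm(analysis_data)
load(file.path(analysis_dir, "analysis_data.RData"))

# The following command calculates the correlation between income and hours.
income_hours_cor <- cor(analysis_data$income, analysis_data$hours)
income_hours_cor
```

In a real project the two phases would usually live in separate files in `command-files`, with the analysis script beginning at the `load()` call.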


--------------------------------------------------------------------------------
/processing-and-analysis/importable-data/README.md:
--------------------------------------------------------------------------------
 1 | # Importable Data
 2 | 
 3 | For each of the original data files in your `original-data` folder, 
 4 | you should create a corresponding version that we will call an 
 5 | "importable data file." These importable data files should be stored in 
 6 | the `importable-data` folder.
 7 | 
 8 | In some cases, the importable data file will be a slightly modified 
 9 | version of the original. In other cases the importable version will be 
10 | identical to the original. 
11 | 
12 | Whether or not an importable data file differs from the original 
13 | version will depend on whether the original version is in a format that 
14 | R can open or import.
15 | 
16 | There are two cases to consider:
17 | 
18 | (1) The original data file is in a format that R can open or import.
19 | 
20 | This case obviously applies if the original data file is in R’s 
21 | `.Rdata` format.
22 | 
23 | This case also applies to files that are not in `.Rdata` format, but 
24 | that can be opened with R. For example, R’s `read.csv()` command can 
25 | be used to import data from a file in CSV format. Similarly, if you 
26 | have loaded the `XLConnect` package, the `readWorksheetFromFile()`
27 | command can be used to import data from an Excel workbook.
28 | 
29 | When an original data file is in R’s `.Rdata` format, or another format 
30 | that can be imported into R without any modification, the corresponding 
31 | importable data file should be an exact copy of the original. In these 
32 | cases, the copy of the file in the `importable-data` folder should have 
33 | the same name as the copy in the `original-data` folder. 
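
One way to make an exact copy and confirm that it is byte-for-byte identical to the original is sketched below. The temporary folders and the file name `survey.csv` are hypothetical, and the stand-in original file is written by the sketch only so that it can be run on its own. `tools::md5sum()` is part of the `tools` package that ships with R.

```r
# Hypothetical folders; a temporary directory stands in for the repository.
demo_root <- file.path(tempdir(), "tier-copy-demo")
original_dir <- file.path(demo_root, "original-data")
importable_dir <- file.path(demo_root, "processing-and-analysis",
                            "importable-data")
dir.create(original_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(importable_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in original data file (invented values, for illustration only).
original_path <- file.path(original_dir, "survey.csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          original_path, row.names = FALSE)

# Copy the file under the same name into the importable-data folder.
importable_path <- file.path(importable_dir, basename(original_path))
file.copy(original_path, importable_path, overwrite = TRUE)

# Compare checksums to confirm the copy is identical to the original.
checksums <- tools::md5sum(c(original_path, importable_path))
identical_copy <- unname(checksums[1]) == unname(checksums[2])
identical_copy

# The importable copy can be read without any modification.
survey <- read.csv(importable_path)
```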
34 | 
35 | Note, however, that in some cases, even when an original data file is 
36 | in CSV or Excel format, it may be convenient or necessary to modify it 
37 | slightly before using R to import the data it contains. Three examples 
38 | of cases in which this is true are given below under item (2).
39 | 
40 | (2) The original data file must be modified before it can be imported 
41 | to R.
42 | 
43 | In some situations, it may be necessary or convenient to modify an 
44 | original data file before importing it to R.
45 | 
46 | The following examples illustrate a few of the common cases:
47 | 
48 | - If the original data file is a spreadsheet that contains explanatory 
49 | notes as well as data, it may be necessary or convenient to remove 
50 | those notes from the importable version of the spreadsheet.
51 | 
52 | - If a certain variable is measured in dollars, and a dollar sign ($) 
53 | precedes each value of the variable in the original CSV or Excel data 
54 | file, you may wish to remove the dollar signs so that R recognizes that 
55 | the variable should be stored in a numeric format.
56 | 
57 | - If an original data file is formatted for use with a particular type 
58 | of software other than R (e.g., SPSS, SAS, Stata or Matlab), it may be 
59 | necessary to convert the file from its original format to 
60 | R’s `.Rdata` format using a package like [Stat/Transfer](https://www.stattransfer.com/), or 
61 | an R package such as `foreign` or `haven`.
62 | 
63 | As these examples illustrate, the particular ways in which an original 
64 | data file needs to be modified will vary depending on the nature of the 
65 | original data file. But in every case, the modifications made to an 
66 | original data file to create the importable version should follow this 
67 | general principle:
68 | 
69 | **The importable data file should be as nearly identical as possible to 
70 | the original; no changes should be made to the file other than the 
71 | minimal modifications required to allow R to read the data 
72 | it contains.**
73 | 
74 | When an importable data file is a modified version of the corresponding 
75 | original data file, the original and importable versions should be 
76 | given different names.
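
As an illustration of the dollar-sign example above, the following is a minimal sketch that assumes you choose to script the modification rather than edit the file by hand. The temporary folders, the file names `prices_original.csv` and `prices_importable.csv`, and the data values are all hypothetical. The only change made is the removal of the dollar signs, and the two versions are given different names.

```r
# Hypothetical folders; a temporary directory stands in for the repository.
demo_root <- file.path(tempdir(), "tier-importable-demo")
original_dir <- file.path(demo_root, "original-data")
importable_dir <- file.path(demo_root, "processing-and-analysis",
                            "importable-data")
dir.create(original_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(importable_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in original file in which every price is preceded by a dollar sign.
original_path <- file.path(original_dir, "prices_original.csv")
writeLines(c("item,price", "apple,$1.50", "bread,$3.25", "milk,$2.00"),
           original_path)

# Imported unchanged, the price column is not recognized as numeric.
raw <- read.csv(original_path)
class(raw$price)

# Minimal modification: remove the dollar signs and change nothing else.
original_lines <- readLines(original_path)
importable_lines <- gsub("$", "", original_lines, fixed = TRUE)

# Save the importable version under a different name.
importable_path <- file.path(importable_dir, "prices_importable.csv")
writeLines(importable_lines, importable_path)

# The importable version is read with price stored as a numeric variable.
prices <- read.csv(importable_path)
class(prices$price)
```

Whichever way the change is made, it should still be described precisely in the top-level `README.md`, as the instructions there require.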


--------------------------------------------------------------------------------