├── .gitignore
├── README.md
├── original-data
│   ├── README.md
│   └── metadata
│       ├── README.md
│       ├── metadata_guide.md
│       └── supplements
│           └── README.md
└── processing-and-analysis
    ├── README.md
    ├── analysis-data
    │   ├── README.md
    │   └── data_appendix.Rmd
    ├── command-files
    │   └── README.md
    └── importable-data
        └── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rhistory
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # The TIER Documentation Protocol v3.0 for R
2 |
3 |
4 | ## Overview
5 |
6 | The TIER Documentation Protocol provides instructions for assembling a
7 | set of electronic files that document all the steps of data processing
8 | and analysis you conduct for an empirical research paper.
9 |
10 | The documentation specified by the Protocol contains all the data,
11 | computer programs, and explanatory information an independent researcher
12 | would need to be able to replicate the data processing and analysis you
13 | conducted for the project and to reproduce exactly all the results
14 | reported in your paper.
15 |
16 | ## ProjectTIER_R repository
17 |
18 | The instructions presented in this repository are written for users of R.
19 | In a few places, they use R-specific terminology. For example, we refer to
20 | command files as scripts, and their names are followed by the .R
21 | extension. But the R-specific terminology that appears in these
22 | instructions can be easily translated to any of the major statistical
23 | packages (such as SPSS, SAS, Stata or Matlab) or other programming
24 | languages.
25 |
26 | ## Getting started
27 |
28 | To get started, fork and then clone this repository, which creates
29 | a copy of the folder structure recommended in the TIER Protocol, or click the
30 | "Clone or Download" button to download a ZIP file of the structure.
31 |
32 | Below we describe how to organize your analysis according to the
33 | Project TIER protocol, i.e. which components of your analysis should go
34 | into which folder.
35 |
36 | ## Hierarchy and description of files and folders
37 |
38 | Your repository should have the following hierarchy of files and folders:
39 |
40 | - An electronic copy of your complete final paper. Often, this means:
41 | + An `.Rmd` file with all the text and code to produce the final paper
42 | + A knitted HTML or PDF file of the complete paper
43 | - The `README.md` file for your repository
44 | - Original Data and Metadata - `original-data`
45 | + Metadata - `metadata`
46 | - Metadata Guide - `metadata_guide.md`
47 | - Supplements - `supplements`
48 | - Processing and Analysis - `processing-and-analysis`
49 | + Importable Data - `importable-data`
50 | + Command Files - `command-files`
51 | + Analysis Data - `analysis-data`
52 |
53 | Contents of these files and folders are described in the `README` files
54 | within these folders.
55 |
56 | ## README
57 |
58 | The `README.md` file at the top level of your repository (this
59 | file) gives information about all the other files included in the
60 | documentation for your paper. In particular, the `README` file should:
61 |
62 | 1. state what statistical software or other computer programs are
63 | needed to run the command files.
64 | 1. explain the structure of the hierarchy of folders in which the
65 | documentation is stored, and briefly describe each of the files
66 | included in the documentation.
67 | 1. describe precisely any changes you made to your original data files
68 | to create the corresponding versions saved in your `importable-data`
69 | folder.
70 | 1. give explicit, step-by-step instructions for using your
71 | documentation to replicate the statistical results reported in your
72 | paper.
73 |
74 | The README should be a Markdown document so that it can be
75 | rendered properly on GitHub, and any changes can be tracked. It should
76 | be named `README.md`. This file should be stored in the top level of
77 | your repository.
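78 |
79 | As a rough sketch, the folder hierarchy listed above could also be created from scratch in base R rather than by cloning this repository. The `project_root` location (a temporary directory here) is a hypothetical placeholder chosen for illustration and is not part of the Protocol.
80 |
81 | ```r
82 | # Hypothetical project location; replace with your own project folder.
83 | project_root <- file.path(tempdir(), "my-tier-project")
84 |
85 | # Lowest-level folders of the hierarchy; parent folders are created as needed.
86 | folders <- c(
87 |   "original-data/metadata/supplements",
88 |   "processing-and-analysis/importable-data",
89 |   "processing-and-analysis/command-files",
90 |   "processing-and-analysis/analysis-data"
91 | )
92 |
93 | for (folder in folders) {
94 |   dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
95 | }
96 |
97 | # Create an empty README.md in the top level and in each documented folder.
98 | readme_dirs <- c(".", "original-data", "original-data/metadata",
99 |                  "processing-and-analysis", folders)
100 | for (d in readme_dirs) {
101 |   file.create(file.path(project_root, d, "README.md"))
102 | }
103 |
104 | # List the folders that were created.
105 | list.dirs(project_root, full.names = FALSE, recursive = TRUE)
106 | ```
107 |
108 | The empty `README.md` files are only placeholders; their contents still need to be written as described in each folder's instructions.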
--------------------------------------------------------------------------------
/original-data/README.md:
--------------------------------------------------------------------------------
1 | # Original Data and Metadata
2 |
3 | This folder is where you place a copy of every original data file from which you extract any of the data used in your study. These data files should be placed directly into the `original-data` directory.
4 |
5 | Occasionally, an original data file is in a format that cannot be read by the statistical software you are using for your project (for example, a PDF file that needs to be read using optical character recognition). In this case, you need to create a modified version of the original data file that is in a format your software can read.
6 |
7 | When you need to create an importable version of an original data file, you should keep both versions: the original in this `original-data` folder and the importable version in the `importable-data` folder. (The original and importable versions of a data file should be given different names.)
8 |
9 | This folder also contains the sub-folder `metadata`, the contents of which are described in the `README` file within that folder.
--------------------------------------------------------------------------------
/original-data/metadata/README.md:
--------------------------------------------------------------------------------
1 | # Metadata
2 |
3 | The top level of your `metadata` folder should contain one document:
4 | the Metadata Guide.
5 |
6 | The `metadata` folder should also contain one sub-folder: the
7 | `supplements` folder.
8 |
9 | ## Metadata Guide
10 |
11 | For each of your original data files, the Metadata Guide provides the kind of information typically found in a codebook accompanying a dataset, such as variable definitions and coding, sampling methods, and anything else a user would need to know to work with and interpret the data appropriately.
12 |
13 | You, the author of the paper, compose the Metadata Guide.
14 |
15 | The Metadata Guide should be organized into one or more sections; each section should provide information about one of the original data files in your `original-data` folder. For each original data file, the information included in the Metadata Guide should include:
16 |
17 |
18 | 1. *A bibliographic citation for the original data file.* This citation should be in a format consistent with the editorial style (e.g., APA or Chicago) used in the main paper or report on the study.
19 | 1. *A digital object identifier (DOI) for the data file (if one has been assigned).* If a DOI is included in the bibliographic citation, it need not be repeated.
20 | 1. *The date on which the author first downloaded, or obtained in some other way, the original data file.* If a date is included in the bibliographic citation, it need not be repeated.
21 | 1. *A written explanation of how an interested reader can obtain a copy
22 | of the original data file.* In many cases, this explanation will give the
23 | URL of a website from which the data can be accessed, along with
24 | instructions for downloading a file identical to the original data file
25 | you obtained from that site. In all cases, this explanation should be
26 | complete and precise enough to allow an independent researcher to
27 | locate and obtain the data file without any additional information or
28 | assistance.
29 | 1. *Whatever additional information an independent researcher would need
30 | to understand and use the data in the original data file.* The particular
31 | information required can vary a great deal depending on the nature of
32 | the original data file in question, and deciding what additional
33 | information to provide therefore requires thoughtful consideration and
34 | judgment. In many cases, the relevant information is similar to what is
35 | found in a codebook or users' guide for a dataset: variable names and
36 | definitions, coding schemes and units of measurement, and details of
37 | the sampling method and weight variables. In some cases, it is also
38 | necessary to include information about the file structure (e.g., the
39 | delimiters used to separate variables, or, in rectangular files without
40 | delimiters, the columns in which the variables are stored). Any other
41 | unique or idiosyncratic aspects of the data that an independent user of
42 | the data would need to understand should be explained as well.
43 | 1. *Supplementary documents with additional metadata.* In many cases, some or all of the information about an original data file that should be included in the Metadata Guide is available in an existing, publicly accessible document, such as a codebook or user’s guide that is provided with the original data file. In these cases, it is not necessary to include that information in the Metadata Guide. Instead, you may simply put a note in the Metadata Guide indicating that the information is available in an existing document.
44 |
45 | When you put a note in the Metadata Guide indicating that certain parts of the information that should be provided there are available in an existing document, you should preserve a copy of the existing document in the `supplements` sub-folder of the `metadata` folder.
46 |
47 | The Metadata Guide should be a Markdown document so that it can be
48 | rendered properly on GitHub, and any changes can be tracked. It should
49 | be named `metadata_guide.md`. This file should be stored in the
50 | `metadata` folder.
--------------------------------------------------------------------------------
/original-data/metadata/metadata_guide.md:
--------------------------------------------------------------------------------
1 | # Metadata Guide
--------------------------------------------------------------------------------
/original-data/metadata/supplements/README.md:
--------------------------------------------------------------------------------
1 | # Supplements
2 |
3 | As described in the `README` file in the `metadata` folder, the
4 | `supplements` folder is where you store any existing documents related
5 | to your original data files, such as users’ guides or codebooks, that
6 | contain relevant information you omitted from the Metadata Guide.
--------------------------------------------------------------------------------
/processing-and-analysis/README.md:
--------------------------------------------------------------------------------
1 | # Processing and Analysis
2 |
3 | This folder contains three sub-folders: `importable-data`,
4 | `command-files`, and `analysis-data`.
5 |
6 | Contents of these folders are described in `README` files in those
7 | folders.
--------------------------------------------------------------------------------
/processing-and-analysis/analysis-data/README.md:
--------------------------------------------------------------------------------
1 | # Analysis Data
2 |
3 | This folder should contain:
4 |
5 | - Your analysis data file(s) as described in the instructions for your
6 | `command-files` folder.
7 | - Your Data Appendix, `data_appendix.Rmd`.
8 |
9 | ## The Data Appendix
10 |
11 | The Data Appendix is a document that serves as a codebook for the analysis data file(s). It is composed by the author of the paper.
12 |
13 | If the data processing phase of your research generated just one
14 | analysis data file, and all the results presented in your paper were
15 | derived from that single analysis data file, the Data Appendix should
16 | begin with a brief description of the analysis data file.
17 |
18 | Typically, this description will say something about the scope of the
19 | sample or population the data represent, specify the unit of analysis,
20 | and indicate the number of observations. As in the case of the metadata
21 | that accompanies your original data files, however, exactly what
22 | information is relevant will depend on the nature of the analysis data
23 | file, so deciding which aspects you will describe in the Data Appendix
24 | will require judgment.
25 |
26 | After the brief description of the analysis data file, the Data
27 | Appendix should present information about every variable in the
28 | analysis data file. The information presented about each variable should
29 | include:
30 |
31 | - the name of the variable and a complete definition (including as appropriate, for example, coding and/or units of measurement, the wording of a survey question the variable is based on, or adjustments made for inflation or PPP).
32 | - the name(s) of the original data file from which the variable was extracted, or from which the variables used to construct it were extracted, and the names of the variables extracted from the original data files.
33 | - the number of observations with valid values for the variable, and the number of observations with missing values.
34 |
35 | For categorical variables, the information should also include:
36 | - a frequency table.
37 | - a bar chart illustrating the frequency distribution.
38 |
39 | For quantitative variables, the information should also include:
40 | - basic summary statistics: the mean, standard deviation, minimum, 25th
41 | percentile, median, 75th percentile, and maximum.
42 | - a histogram.
43 |
44 | If the results presented in your paper were derived from more than one
45 | analysis data file, the Data Appendix should include all of the above
46 | information - the brief description of the data file and the
47 | information about each of the variables contained in the file - for
48 | each of the analysis data files that was used.
49 |
50 | The Data Appendix should be an R Markdown document so that it can be
51 | rendered properly on GitHub, and any changes can be tracked. It should
52 | be named `data_appendix.Rmd`. This file should be stored in the
53 | `analysis-data` folder.
54 |
55 | This file contains plain text as well as R code that generates the
56 | summary statistics and figures described above.
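57 |
58 | The per-variable information described above might be generated along the following lines. This is only a sketch in base R: the built-in `mtcars` data set stands in for an analysis data file, and the choice of `mpg` and `cyl` as the quantitative and categorical variables is an assumption made for illustration. In an actual `data_appendix.Rmd`, code like this would sit inside R code chunks.
59 |
60 | ```r
61 | # Stand-in for an analysis data file: the built-in `mtcars` data set.
62 | analysis_data <- mtcars
63 | analysis_data$cyl <- factor(analysis_data$cyl)
64 |
65 | # Quantitative variable: counts of valid and missing values.
66 | n_valid   <- sum(!is.na(analysis_data$mpg))
67 | n_missing <- sum(is.na(analysis_data$mpg))
68 |
69 | # Mean, standard deviation, minimum, quartiles, and maximum.
70 | mpg_stats <- c(
71 |   mean = mean(analysis_data$mpg, na.rm = TRUE),
72 |   sd   = sd(analysis_data$mpg, na.rm = TRUE),
73 |   quantile(analysis_data$mpg, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
74 | )
75 | print(mpg_stats)
76 |
77 | # Histogram of the quantitative variable.
78 | hist(analysis_data$mpg, main = "mpg", xlab = "Miles per gallon")
79 |
80 | # Categorical variable: frequency table (including missing values, if any).
81 | cyl_freq <- table(analysis_data$cyl, useNA = "ifany")
82 | print(cyl_freq)
83 |
84 | # Bar chart of the frequency distribution.
85 | barplot(cyl_freq, main = "cyl", xlab = "Number of cylinders")
86 | ```
87 |
88 | The variable definitions and the names of the source files still have to be written by hand; the code only covers the counts, statistics, and figures.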
--------------------------------------------------------------------------------
/processing-and-analysis/analysis-data/data_appendix.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Data Appendix"
3 | author: "name of author"
4 | date: "date created or edited"
5 | output: html_document
6 | ---
--------------------------------------------------------------------------------
/processing-and-analysis/command-files/README.md:
--------------------------------------------------------------------------------
1 | # Command Files
2 |
3 | For simple implementations of the TIER Protocol, the "commands" (R code) will be interwoven with text in the `.Rmd` file in the main folder of your project.
4 |
5 | However, occasionally it is necessary to split out supplementary work into additional files. For example, you may have a separate `.Rmd` file that completes a long and arduous data cleaning process, or you may have R scripts (with the `.R` extension) that contain specialized functions you wish to "source" into your final analysis.
6 |
7 | These additional files should be placed in the `command-files` directory.
8 |
9 | In all of the scripts you write, it is important to include comments
10 | that are detailed and clear enough to make it possible for someone not
11 | familiar with your project to understand the steps of data processing
12 | and analysis that are executed by the commands in the script.
13 |
14 | For the purpose of constructing and organizing your scripts, you may want to
15 | think of the work on your project in terms of two phases,
16 | (1) processing, and (2) analysis. Your scripts
17 | will include one or more files that execute each of these phases of
18 | research.
19 |
20 | 1. The script(s) for the processing phase should include commands that
21 | execute all the processing required to transform your importable data
22 | files into the final data you will use in your analysis.
23 | Exactly what these steps will be is highly variable, but they typically
24 | include operations such as joining two or more data files, dropping
25 | variables or cases, generating new variables, and recoding. At the end
26 | of the script(s) for the processing phase, there should be `save()`
27 | commands that save the final data file(s) upon
28 | which your analysis will be conducted. We will refer to the final data
29 | file(s) that you use in your analysis as "your analysis data file(s)".
30 | Your analysis data file(s) should be stored in your `analysis-data`
31 | folder.
32 | Strictly speaking, including your analysis data file(s) in the
33 | documentation is redundant: anyone interested in your analysis data
34 | file(s) can create them simply by executing the R scripts you
35 | wrote for the importing and processing phases of your project.
36 | Nonetheless, the TIER Protocol calls for your analysis data file(s) to
37 | be included simply because it is sometimes convenient to have a readily
38 | accessible copy of the analysis data.
39 |
40 | 1. The script(s) for the analysis phase should contain commands that
41 | open the analysis data file(s) you created in the processing phase, and
42 | then generate the results reported in your paper.
43 | Every command in your analysis script(s) that generates a piece of
44 | output or a result reported in your paper should be preceded by a
45 | comment that indicates what piece of output or result the command will
46 | generate. The following examples illustrate some typical kinds of
47 | comments:
48 |
49 | `# The following command produces Table 6.`
50 |
51 | `# The following command produces Figure 12.`
52 |
53 | `# The following command calculates the correlation of -0.54`
54 | `# between variables X and Y reported on page 16 of the paper.`
55 |
56 | All of the scripts for importing, processing and analyzing your data
57 | should be included in the `command-files` folder.
58 |
59 | One additional script, called `data_appendix.R`, should also be
60 | included in your `command-files` folder. This script is described in
61 | the instructions for your Data Appendix.
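62 |
63 | The two phases described above might look roughly like the following sketch. Everything in it is made up for illustration: the two small data frames stand in for importable data files, the variable names are hypothetical, and a temporary folder stands in for the `analysis-data` folder. Only base R functions are used.
64 |
65 | ```r
66 | # --- Processing phase (sketch) ---
67 | # Two small made-up data frames stand in for importable data files.
68 | survey <- data.frame(
69 |   id     = 1:5,
70 |   income = c(30, 45, 52, 61, 38),
71 |   region = c(1, 2, 2, 1, 3)
72 | )
73 | regions <- data.frame(
74 |   region      = 1:3,
75 |   region_name = c("North", "South", "West")
76 | )
77 |
78 | # Join the two files on the shared `region` variable.
79 | analysis_data <- merge(survey, regions, by = "region")
80 |
81 | # Generate a new variable and drop one that is no longer needed.
82 | analysis_data$high_income <- analysis_data$income > 50
83 | analysis_data$region <- NULL
84 |
85 | # Save the analysis data file (a temporary folder stands in for `analysis-data`).
86 | analysis_dir <- file.path(tempdir(), "analysis-data")
87 | dir.create(analysis_dir, showWarnings = FALSE)
88 | save(analysis_data, file = file.path(analysis_dir, "analysis_data.Rdata"))
89 |
90 | # --- Analysis phase (sketch) ---
91 | # Open the analysis data file created in the processing phase.
92 | rm(analysis_data)
93 | load(file.path(analysis_dir, "analysis_data.Rdata"))
94 |
95 | # The following command produces a (hypothetical) Table 1:
96 | # counts of high-income respondents by region.
97 | table(analysis_data$region_name, analysis_data$high_income)
98 | ```
99 |
100 | In a real project the two phases would usually live in separate scripts, so that the analysis script starts from the saved analysis data file rather than from objects left in memory.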
--------------------------------------------------------------------------------
/processing-and-analysis/importable-data/README.md:
--------------------------------------------------------------------------------
1 | # Importable Data
2 |
3 | For each of the original data files in your `original-data` folder,
4 | you should create a corresponding version that we will call an
5 | "importable data file." These importable data files should be stored in
6 | the `importable-data` folder.
7 |
8 | In some cases, the importable data file will be a slightly modified
9 | version of the original. In other cases the importable version will be
10 | identical to the original.
11 |
12 | Whether or not an importable data file differs from the original
13 | version will depend on whether the original version is in a format that
14 | R can open or import.
15 |
16 | There are two cases to consider:
17 |
18 | (1) The original data file is in a format that R can open or import.
19 |
20 | This case obviously applies if the original data file is in R’s
21 | `.Rdata` format.
22 |
23 | This case also applies to files that are not in `.Rdata` format, but
24 | that can be opened with R. For example, R’s `read.csv()` command can
25 | be used to import data from a file in CSV format. Similarly, if you
26 | have loaded the `XLConnect` package, the `readWorksheetFromFile()`
27 | command can be used to import data from an Excel workbook.
28 |
29 | When an original data file is in R’s `.Rdata` format, or another format
30 | that can be imported into R without any modification, the corresponding
31 | importable data file should be an exact copy of the original. In these
32 | cases, the copy of the file in the `importable-data` folder should have
33 | the same name as the copy in the `original-data` folder.
34 |
35 | Note, however, that in some cases, even when an original data file is
36 | in CSV or Excel format, it may be convenient or necessary to modify it
37 | slightly before using R to import the data it contains. Three examples
38 | of cases in which this is true are given below under item (2).
39 |
40 | (2) The original data file must be modified before it can be imported
41 | to R.
42 |
43 | In some situations, it may be necessary or convenient to modify an
44 | original data file before importing it to R.
45 |
46 | The following examples illustrate a few of the common cases:
47 |
48 | - If the original data file is a spreadsheet that contains explanatory
49 | notes as well as data, it may be necessary or convenient to remove
50 | those notes from the importable version of the spreadsheet.
51 |
52 | - If a certain variable is measured in dollars, and a dollar sign ($)
53 | precedes each value of the variable in the original CSV or Excel data
54 | file, you may wish to remove the dollar signs so that R recognizes that
55 | the variable should be stored in a numeric format.
56 |
57 | - If an original data file is formatted for use with a particular type
58 | of software other than R (e.g., SPSS, SAS, Stata or Matlab), it may be
59 | necessary to convert the file from its original format to
60 | R’s `.Rdata` format using a package like [Stat/Transfer](https://www.stattransfer.com/), or
61 | an R package such as `foreign` or `haven`.
62 |
63 | As these examples illustrate, the particular ways in which an original
64 | data file needs to be modified will vary depending on the nature of the
65 | original data file. But in every case, the modifications made to an
66 | original data file to create the importable version should follow this
67 | general principle:
68 |
69 | **The importable data file should be as nearly identical as possible to
70 | the original; no changes should be made to the file other than the
71 | minimal modifications required to allow R to read the data
72 | it contains.**
73 |
74 | When an importable data file is a modified version of the corresponding
75 | original data file, the original and importable versions should be
76 | given different names.
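77 |
78 | The dollar-sign case above can be sketched in base R as follows. The file names and contents are made up for illustration, and a temporary directory stands in for the `original-data` and `importable-data` folders. The sketch makes only the one change needed for R to read the variable as numeric, and saves the result under a different name.
79 |
80 | ```r
81 | # Made-up original file: a CSV whose `price` column carries dollar signs.
82 | original_file <- file.path(tempdir(), "prices_original.csv")
83 | writeLines(c("item,price", "a,$10.50", "b,$3.00"), original_file)
84 |
85 | # Read every column as character so nothing is altered on import.
86 | original <- read.csv(original_file, colClasses = "character")
87 |
88 | # Minimal modification: remove the dollar signs and nothing else.
89 | importable <- original
90 | importable$price <- gsub("$", "", importable$price, fixed = TRUE)
91 |
92 | # Save under a different name, since the file now differs from the original.
93 | importable_file <- file.path(tempdir(), "prices_importable.csv")
94 | write.csv(importable, importable_file, row.names = FALSE, quote = FALSE)
95 |
96 | # R now recognizes `price` as numeric when the importable file is read.
97 | str(read.csv(importable_file))
98 | ```
99 |
100 | Doing the modification in a script rather than by hand also leaves a record of exactly what was changed, which the top-level `README.md` is expected to describe.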
--------------------------------------------------------------------------------