├── .gitignore ├── README.md ├── original-data ├── README.md └── metadata │ ├── README.md │ ├── metadata_guide.md │ └── supplements │ └── README.md └── processing-and-analysis ├── README.md ├── analysis-data ├── README.md └── data_appendix.Rmd ├── command-files └── README.md └── importable-data └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .Rhistory -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The TIER Documentation Protocol v3.0 for R 2 | 3 | 4 | ## Overview 5 | 6 | The TIER Documentation Protocol provides instructions for assembling a 7 | set of electronic files that document all the steps of data processing 8 | and analysis you conduct for an empirical research paper. 9 | 10 | The documentation specified by the Protocol contains all the data, 11 | computer programs, and explanatory information an independent researcher 12 | would need to be able to replicate the data processing and analysis you 13 | conducted for the project and to reproduce exactly all the results 14 | reported in your paper. 15 | 16 | ## ProjectTIER_R repository 17 | 18 | The instructions presented in this repository are written for users of R. 19 | In a few places, they use R-specific terminology. For example, we refer to 20 | command files as scripts, and their names are followed by the .R 21 | extension. But the R-specific terminology that appears in these 22 | instructions can be easily translated to any of the major statistical 23 | packages (such as SPSS, SAS, Stata or Matlab) or other programming 24 | languages. 25 | 26 | ## Getting started 27 | 28 | To get started you can fork and then clone this repository which will create 29 | a copy of the folder structure recommended in the Project TIER protocol, or click the 30 | "Clone or Download" button to download a ZIP file of the structure. 
31 | 32 | Below we describe how to organize your analysis according to the 33 | Project TIER protocol, i.e. which components of your analysis should go 34 | into which folder. 35 | 36 | ## Hierarchy and description of files and folders 37 | 38 | Your repository should have the following hierarchy of files and folders: 39 | 40 | - An electronic copy of your complete final paper. Often, this means: 41 | + An `.Rmd` file with all the text and code to produce the final paper 42 | + A knitted HTML or PDF file of the complete paper 43 | - The `README.md` file for your repository 44 | - Original Data and Metadata - `original-data` 45 | + Metadata - `metadata` 46 | - Metadata Guide - `metadata_guide.md` 47 | - Supplements - `supplements` 48 | - Processing and Analysis - `processing-and-analysis` 49 | + Importable Data - `importable-data` 50 | + Command Files - `command-files` 51 | + Analysis Data - `analysis-data` 52 | 53 | Contents of these files and folders are described in the `README` files 54 | within these folders. 55 | 56 | ## README 57 | 58 | The `README.md` file in the top hierarchy of your repository (this 59 | file) gives information about all the other files included in the 60 | documentation for your paper. In particular, the `README` file should: 61 | 62 | 1. state what statistical software or other computer programs are 63 | needed to run the command files. 64 | 1. explain the structure of the hierarchy of folders in which the 65 | documentation is stored, and briefly describe each of the files 66 | included in the documentation. 67 | 1. describe precisely any changes you made to your original data files 68 | to create the corresponding versions saved in your `importable-data` 69 | folder. 70 | 1. give explicit, step-by-step instructions for using your 71 | documentation to replicate the statistical results reported in your 72 | paper. 
73 | 74 | The README should be a Markdown document so that it can be 75 | rendered properly on GitHub, and any changes can be tracked. It should 76 | be named `README.md`. This file should be stored in the top level of 77 | your repository. -------------------------------------------------------------------------------- /original-data/README.md: -------------------------------------------------------------------------------- 1 | # Original Data and Metadata 2 | 3 | This folder is where you place a copy of every original data file from which you extract any of the data used in your study. These data files should be placed directly into the `original-data` directory. 4 | 5 | Occasionally, an original data file is in a format that cannot be read by the statistical software you are using for your project (for example, a PDF file that needs to be read using optical character recognition). In this case, you need to create a modified version of the original data file that is in a format your software can read. 6 | 7 | When you need to create an importable version of an original data file, you should keep both versions (importable and original) in the Data folder. (The original and importable versions of a data file should be given different names.) 8 | 9 | This folder also contains the sub-folder `metadata`, the contents of which are described in the `README` file within that folder. -------------------------------------------------------------------------------- /original-data/metadata/README.md: -------------------------------------------------------------------------------- 1 | # Metadata 2 | 3 | The top level of your `metadata` folder should contain one document: 4 | the Metadata Guide. 5 | 6 | The `metadata` folder should also contain one sub-folder: the ` 7 | supplements` folder. 
8 | 9 | ## Metadata Guide 10 | 11 | For each of your original data files, the Metadata Guide provides the kind of information typically found in a codebook accompanying a dataset, such as variable definitions and coding, sampling methods, and anything else a user would need to know to work with and interpret the data appropriately. 12 | 13 | You, the author of the paper, compose the Metadata Guide. 14 | 15 | The Metadata Guide should be organized into one or more sections; each section should provide information about one of the original data files in your `original-data` folder. For each original data file, the Metadata Guide should include: 16 | 17 | 18 | 1. *A bibliographic citation for the original data file.* This citation should be in a format consistent with the editorial style (e.g., APA or Chicago) used in the main paper or report on the study. 19 | 1. *A digital object identifier (DOI) for the data file (if one has been assigned).* If a DOI is included in the bibliographic citation, it need not be repeated. 20 | 1. *The date on which the author first downloaded, or obtained in some other way, the original data file.* If a date is included in the bibliographic citation, it need not be repeated. 21 | 1. *A written explanation of how an interested reader can obtain a copy 22 | of the original data file.* In many cases, this explanation will give the 23 | URL of a website from which the data can be accessed, along with 24 | instructions for downloading a file identical to the original data file 25 | you obtained from that site. In all cases, this explanation should be 26 | complete and precise enough to allow an independent researcher to 27 | locate and obtain the data file without any additional information or 28 | assistance. 29 | 1. 
*Whatever additional information an independent researcher would need 30 | to understand and use the data in the original data file.* The particular 31 | information required can vary a great deal depending on the nature of 32 | the original data file in question, and deciding what additional 33 | information to provide therefore requires thoughtful consideration and 34 | judgment. In many cases, the relevant information is similar to what is 35 | found in a codebook or users' guide for a dataset: variable names and 36 | definitions, coding schemes and units of measurement, and details of 37 | the sampling method and weight variables. In some cases, it is also 38 | necessary to include information about the file structure (e.g., the 39 | delimiters used to separate variables, or, in rectangular files without 40 | delimiters, the columns in which the variables are stored). Any other 41 | unique or idiosyncratic aspects of the data that an independent user of 42 | the data would need to understand should be explained as well. 43 | 1. *Supplementary documents with additional metadata.* In many cases, some or all of the information about an original data file that should be included in the Metadata Guide is available in an existing, publicly accessible document, such as a codebook or user’s guide that is provided with the original data file. In these cases, it is not necessary to include that information in the Metadata Guide. Instead, you may simply put a note in the Metadata Guide indicating that the information is available in an existing document. 44 | 45 | When you put a note in the Metadata Guide indicating that certain parts of the information that should be provided there are available in an existing document, you should preserve a copy of the existing document in the Metadata sub-folder (along with the Metadata Guide that you compose yourself). 
46 | 47 | The Metadata Guide should be a Markdown document so that it can be 48 | rendered properly on GitHub, and any changes can be tracked. It should 49 | be named `metadata_guide.md`. This file should be stored in the 50 | `metadata` folder. -------------------------------------------------------------------------------- /original-data/metadata/metadata_guide.md: -------------------------------------------------------------------------------- 1 | # Metadata Guide -------------------------------------------------------------------------------- /original-data/metadata/supplements/README.md: -------------------------------------------------------------------------------- 1 | # Supplements 2 | 3 | As described in the `README` file for the Metadata Guide, the 4 | `supplements` folder is where you store any existing documents related 5 | to your original data files, such as users’ guides or codebooks, that 6 | contain relevant information you omitted from the Metadata Guide. -------------------------------------------------------------------------------- /processing-and-analysis/README.md: -------------------------------------------------------------------------------- 1 | # Processing and Analysis 2 | 3 | This folder contains three sub-folders: `importable-data`, 4 | `command-files`, and `analysis-data`. 5 | 6 | Contents of these folders are described in `README` files in those 7 | folders. -------------------------------------------------------------------------------- /processing-and-analysis/analysis-data/README.md: -------------------------------------------------------------------------------- 1 | # Analysis Data 2 | 3 | This folder should contain: 4 | 5 | - Your analysis data file(s) as described in the instructions for your ` 6 | command-files` folder. 7 | - Your Data Appendix, `data_appendix.Rmd`. 8 | 9 | ## The Data Appendix 10 | 11 | The Data Appendix is a document that serves as a codebook for the analysis data file(s). 
It is composed by the author of the paper. 12 | 13 | If the data processing phase of your research generated just one 14 | analysis data file, and all the results presented in your paper were 15 | derived from that single analysis data file, the Data Appendix should 16 | begin with a brief description of the analysis data file. 17 | 18 | Typically, this description will say something about the scope of the 19 | sample or population the data represent, specify the unit of analysis, 20 | and indicate the number of observations. As in the case of the metadata 21 | that accompanies your original data files, however, exactly what 22 | information is relevant will depend on the nature of the analysis data 23 | file, so deciding which aspects you will describe in the Data Appendix 24 | will require judgment. 25 | 26 | After the brief description of the analysis data file, the Data 27 | Appendix should present information about every variable in the 28 | analysis data file. The information presented about each variable should 29 | include: 30 | 31 | - the name of the variable and a complete definition (including as appropriate, for example, coding and/or units of measurement, the wording of a survey question the variable is based on, or adjustments made for inflation or PPP). 32 | - the name(s) of the original data file from which the variable was extracted, or from which the variables used to construct it were extracted, and the names of the variables extracted from the original data files. 33 | - the number of observations with valid values for the variable, and the number of observations with missing values. 34 | 35 | For categorical variables, the information should also include: 36 | - a frequency table. 37 | - a bar chart illustrating the frequency distribution. 38 | 39 | For quantitative variables, the information should also include: 40 | - basic summary statistics: the mean, standard deviation, minimum, 25th 41 | percentile, median, 75th percentile, and maximum. 
42 | - a histogram. 43 | 44 | If the results presented in your paper were derived from more than one 45 | analysis data file, the Data Appendix should include all of the above 46 | information - the brief description of the data file and the 47 | information about each of the variables contained in the file - for 48 | each of the analysis data files that was used. 49 | 50 | The Data Appendix should be an R Markdown document so that it can be 51 | rendered properly on GitHub, and any changes can be tracked. It should 52 | be named `data_appendix.Rmd`. This file should be stored in the 53 | `analysis-data` folder. 54 | 55 | This file contains plain text as well as R code that generates the 56 | summary statistics and figures described above. -------------------------------------------------------------------------------- /processing-and-analysis/analysis-data/data_appendix.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Data Appendix" 3 | author: "name of author" 4 | date: "date created or edited" 5 | output: html_document 6 | --- -------------------------------------------------------------------------------- /processing-and-analysis/command-files/README.md: -------------------------------------------------------------------------------- 1 | # Command Files 2 | 3 | For simple implementations of the TIER Protocol, the "commands" (R code) will be interwoven with text in the `.Rmd` file in the main folder of your project. 4 | 5 | However, occasionally it is necessary to split out supplementary work into additional files. For example, you may have a separate `.Rmd` file that completes a long and arduous data cleaning process, or you may have R scripts (with the `.R` extension) that contain specialized functions you wish to "source" into your final analysis. 6 | 7 | These additional files should be placed in the `command-files` directory. 
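
As a rough sketch of the second case above, a helper script kept in `command-files` can be loaded into the main analysis with `source()`. The file name `helper_functions.R` and the function `standardize()` are hypothetical, and the helper is written to a temporary directory here only so that the example runs on its own.

```r
# Hypothetical helper script; in a real project this file would live in
# processing-and-analysis/command-files/ rather than a temporary directory.
helper_path <- file.path(tempdir(), "helper_functions.R")
writeLines(
  c("# Rescale a numeric vector to mean 0 and standard deviation 1.",
    "standardize <- function(x) (x - mean(x)) / sd(x)"),
  helper_path
)

# Load the functions defined in the helper script into the current session.
source(helper_path)

# Use the sourced function in the main analysis.
z <- standardize(c(2, 4, 6))
```

Keeping such functions in their own script lets the main `.Rmd` file stay focused on the narrative and the results.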
8 | 9 | In all of the scripts you write, it is important to include comments 10 | that are detailed and clear enough to make it possible for someone not 11 | familiar with your project to understand the steps of data processing 12 | and analysis that are executed by the commands in the script. 13 | 14 | For the purpose of constructing and organizing your scripts, you may want to 15 | think of the work on your project in terms of two phases, 16 | (1) processing, and (2) analysis. Your scripts 17 | will include one or more files that execute each of these phases of 18 | research. 19 | 20 | 1. The script(s) for the processing phase should include commands that 21 | execute all the processing required to transform your importable data 22 | files into the final data you will use in your analysis. 23 | Exactly what these steps will be is highly variable, but they typically 24 | include operations such as joining two or more data files, dropping 25 | variables or cases, generating new variables, and recoding. At the end 26 | of the script(s) for the processing phase, there should be `save()` 27 | commands that save the final data file(s) upon 28 | which your analysis will be conducted. We will refer to the final data 29 | file(s) that you use in your analysis as "your analysis data file(s)".

30 | Your analysis data file(s) should be stored in your `analysis-data` 31 | folder.
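
The processing steps listed above might look roughly like the following sketch. The data frames, variable names, and file name are hypothetical stand-ins for importable data files, and a temporary directory is used in place of the `analysis-data` folder so that the example runs on its own.

```r
# Stand-ins for two importable data files (hypothetical contents).
households <- data.frame(id = 1:4, income = c(31000, 52000, NA, 78000))
regions    <- data.frame(id = 1:4, region = c("N", "S", "S", "N"),
                         notes = c("a", "b", "c", "d"))

# Join the two files on the shared identifier.
analysis_data <- merge(households, regions, by = "id")

# Drop a variable that is not needed and cases with missing income.
analysis_data$notes <- NULL
analysis_data <- analysis_data[!is.na(analysis_data$income), ]

# Generate a new variable and recode an existing one.
analysis_data$log_income <- log(analysis_data$income)
analysis_data$region <- factor(analysis_data$region,
                               levels = c("N", "S"),
                               labels = c("North", "South"))

# Save the analysis data file; in a real project the path would point to
# processing-and-analysis/analysis-data/ instead of a temporary directory.
out_file <- file.path(tempdir(), "analysis_data.Rdata")
save(analysis_data, file = out_file)
```

A script for the analysis phase could then begin with `load(out_file)` to open the saved analysis data.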

32 | Strictly speaking, including your analysis data file(s) in the 33 | documentation is redundant: anyone interested in your analysis data 34 | file(s) can create them simply by executing the R scripts you 35 | wrote for the importing and processing phases of your project. 36 | Nonetheless, the TIER Protocol calls for your analysis data file(s) to 37 | be included simply because it is sometimes convenient to have a readily 38 | accessible copy of the analysis data. 39 | 40 | 1. The script(s) for the analysis phase should contain commands that 41 | open the analysis data file(s) you created in the processing phase, and 42 | then generate the results reported in your paper.

43 | Every command in your analysis script(s) that generates a piece of 44 | output or a result reported in your paper should be preceded by a 45 | comment that indicates what piece of output or result the command will 46 | generate. The following examples illustrate some typical kinds of 47 | comments: 48 | 49 | `# The following command produces Table 6.` 50 | 51 | `# The following command produces Figure 12.` 52 | 53 | `# The following command calculates the correlation of -0.54`
54 | `# between variables X and Y reported on page 16 of the paper.` 55 | 56 | All of the scripts for importing, processing and analyzing your data 57 | should be included in the `command-files` folder. 58 | 59 | One additional script, called `data_appendix.R`, should also be 60 | included in your `command-files` folder. This script is described in 61 | the instructions for your Data Appendix. -------------------------------------------------------------------------------- /processing-and-analysis/importable-data/README.md: -------------------------------------------------------------------------------- 1 | # Importable Data 2 | 3 | For each of the original data files in your `original-data` folder, 4 | you should create a corresponding version that we will call an 5 | "importable data file." These importable data files should be stored in 6 | the `importable-data` folder. 7 | 8 | In some cases, the importable data file will be a slightly modified 9 | version of the original. In other cases the importable version will be 10 | identical to the original. 11 | 12 | Whether or not an importable data file differs from the original 13 | version will depend on whether the original version is in a format that 14 | R can open or import. 15 | 16 | There are two cases to consider: 17 | 18 | (1) The original data file is in a format that R can open or import. 19 | 20 | This case obviously applies if the original data file is in R’s 21 | `.Rdata` format. 22 | 23 | This case also applies to files that are not in `.Rdata` format, but 24 | that can be opened with R. For example, R’s `read.csv()` command can 25 | be used to import data from a file in CSV format. Similarly, if you 26 | have loaded the `XLConnect` package, the `readWorksheetFromFile()` 27 | command can be used to import data from an Excel workbook. 
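
A minimal sketch of the CSV case, assuming a hypothetical file `survey.csv`; the file is created in a temporary directory here only so that the example runs on its own, whereas a real file would sit in the `importable-data` folder.

```r
# Create a small CSV file to stand in for an original data file
# (hypothetical contents).
csv_path <- file.path(tempdir(), "survey.csv")
writeLines(c("id,age,employed", "1,34,yes", "2,51,no", "3,28,yes"), csv_path)

# Import the file without modifying it.
survey <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the variable types R assigned on import.
str(survey)
```

Checking the imported types with `str()` is one quick way to see whether a file can be used as-is or needs a modified importable version.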
28 | 29 | When an original data file is in R’s `.Rdata` format, or another format 30 | that can be imported into R without any modification, the corresponding 31 | importable data file should be an exact copy of the original. In these 32 | cases, the copy of the file in the `importable-data` folder should have 33 | the same name as the copy in the `original-data` folder. 34 | 35 | Note, however, that in some cases, even when an original data file is 36 | in CSV or Excel format, it may be convenient or necessary to modify it 37 | slightly before using R to import the data it contains. Three examples 38 | of cases in which this is true are given below under item (2). 39 | 40 | (2) The original data file must be modified before it can be imported 41 | to R. 42 | 43 | In some situations, it may be necessary or convenient to modify an 44 | original data file before importing it to R. 45 | 46 | The following examples illustrate a few of the common cases: 47 | 48 | - If the original data file is a spreadsheet that contains explanatory 49 | notes as well as data, it may be necessary or convenient to remove 50 | those notes from the importable version of the spreadsheet. 51 | 52 | - If a certain variable is measured in dollars, and a dollar sign ($) 53 | precedes each value of the variable in the original CSV or Excel data 54 | file, you may wish to remove the dollar signs so that R recognizes that 55 | the variable should be stored in a numeric format. 56 | 57 | - If an original data file is formatted for use with a particular type 58 | of software other than R (e.g., SPSS, SAS, Stata or Matlab), it may be 59 | necessary to convert the file from its original format to 60 | R’s `.Rdata` format using a package like [Stat/Transfer](https://www.stattransfer.com/), or 61 | an R package such as `foreign` or `haven`. 
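
The dollar-sign case above could be handled with a short script like the following sketch, which is one possible approach rather than a prescribed one. The file names and contents are hypothetical, and the sketch assumes dollar signs appear only in the price column, since it removes every `$` in the file.

```r
# Stand-in for an original CSV file whose price column carries dollar signs
# (hypothetical file names and contents).
original_path   <- file.path(tempdir(), "prices_original.csv")
importable_path <- file.path(tempdir(), "prices_importable.csv")
writeLines(c("item,price", "apple,$1.50", "bread,$3.25"), original_path)

# Read the raw text, remove only the dollar signs, and write the result
# under a different name so the original stays untouched.
raw_lines <- readLines(original_path)
writeLines(gsub("$", "", raw_lines, fixed = TRUE), importable_path)

# The price column is now read as numeric.
prices <- read.csv(importable_path)
```

Making the change in a script rather than by hand leaves a record of exactly what was modified, which can then be described in the top-level `README.md`.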
62 | 63 | As these examples illustrate, the particular ways in which an original 64 | data file needs to be modified will vary depending on the nature of the 65 | original data file. But in every case, the modifications made to an 66 | original data file to create the importable version should follow this 67 | general principle: 68 | 69 | **The importable data file should be as nearly identical as possible to 70 | the original; no changes should be made to the file other than the 71 | minimal modifications required to allow R to read the data 72 | it contains.** 73 | 74 | When an importable data file is a modified version of the corresponding 75 | original data file, the original and importable versions should be 76 | given different names. --------------------------------------------------------------------------------