├── .gitignore
├── README.md
├── deploy.sh
├── input
│   └── data.csv
├── main.Rmd
└── processData.R


/.gitignore:
--------------------------------------------------------------------------------
.Rdata
.Rhistory
.Rprofile
main.html
output/*


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A workflow for reproducible and transparent data journalism with R and GitHub

**WARNING: This guide is outdated and (probably) flawed. Check out my new [rddj-template](https://github.com/grssnbchr/rddj-template) instead!**

This repo is a tutorial and an example at the same time, yay!

The goal of this workflow is to automagically upload your knitted RMarkdown file to a [**GitHub Page**](http://grssnbchr.github.io/rddj-reproducibility-workflow/) and to "build" your R code into a zipped folder which your readers can download. Ideally, you would link to this zipped folder from within your knitted RMarkdown.

Note: *The repo from which the knitted RMarkdown is served to the GitHub Page can also be private!*

## Steps

### Step 0

Initialize an empty repository and add a remote:
```
mkdir rddj-reproducibility-workflow
cd rddj-reproducibility-workflow
git init
# replace the following with your account and repo
git remote add origin https://github.com/grssnbchr/rddj-reproducibility-workflow.git
```

Add a `.gitignore` that ignores standard R session and project files, the knitted `main.html`, and the `output` folder (ignored files such as `main.html` stay in the working directory when you switch branches, which the deploy script in Step 3 relies on):
```
.Rdata
.Rhistory
.Rprofile
main.html
output/*
```

### Step 1 (repetitive)

All your "productive" R code goes into one RMarkdown file, but you can source other R files from it (see `main.Rmd`).

You can work with your repo as you would with any other, doing stuff like
```
git add
git commit
git push
...
```

### Step 2

Now you want to publish your RMarkdown and, ideally, your whole R script (together with the input files) on GitHub Pages.

Initially, and only once, you need to do the following in your working directory:

* Start a new branch `gh-pages`

```
git checkout -b gh-pages
```

* Remove everything except `.gitignore` (the `!(...)` pattern only works in Bash once the `extglob` option is enabled, which the first command does)
```
shopt -s extglob
git rm -rf !(.gitignore)
git add -u
```

* Make an initial commit
```
git commit -m "first commit to gh-pages branch"
```

### Step 3
For deployment, we want the following:

* (The RMarkdown should be automagically knitted to HTML; this step is commented out in the script below)
* The knitted RMarkdown file (`main.html`) should be pushed as `index.html`, so it is shown on the GitHub Page
* The R code and the input files should be made available for download as a **zipped folder**, so everyone can rerun the RMarkdown and/or modify the code and produce the output folder themselves.

To automate this deployment process, we create a little shell script.

First, make sure you are in the master branch:
```
git checkout master
```

Then, fire up your favorite editor and create a shell script called `deploy.sh` in the top folder, with the following content:

```
#!/bin/bash
# first, knit
# only works if you have pandoc > 1.9.0 installed
# R -e "rmarkdown::render('main.Rmd')"
# make a temporary copy of everything that should end up in the downloadable build
mkdir tmp
cp main.Rmd tmp/
cp -r input tmp/
cp processData.R tmp/ # replace this with the name of your sourced scripts and add more, if needed
# switch to gh-pages branch
git checkout gh-pages
# rename the knitted main.html (left over from the master branch) to index.html
mv main.html index.html
# make folder for rscript
mkdir rscript
# copy over necessary scripts from master branch
cp -r tmp/* rscript/

# zip the rscript folder
zip -r rscript.zip rscript
# remove the rscript folder
rm -rf rscript
# remove temporary folder
rm -rf tmp
# add everything for committing
git add .
# commit in gh-pages
git commit -m "build and deploy to gh-pages"
# push to remote:gh-pages
git push origin gh-pages
# checkout master again
git checkout master
```
Finally, make the script executable:
```
chmod 755 deploy.sh
```

### Step 4 (repetitive)

Now, every time you want to deploy your updated RMarkdown and your R script to your GitHub Page, knit `main.Rmd` to `main.html` (the render line in the script is commented out) and run

```
./deploy.sh
```

And your knitted RMarkdown will magically find its way to *username*.github.io/*reponame*.
Note: This also works when *reponame* is a private repo!
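
The script switches branches and commits whatever it finds, so it only behaves well when started from `master` with nothing uncommitted. A minimal sketch of a guard for that follows. It is not part of the original `deploy.sh`: the function name `preflight_ok` is hypothetical, and it assumes you pass it the current branch name and the output of `git status --porcelain`.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original deploy.sh.
# Succeeds only if the branch is master and there are no uncommitted changes.
#   $1: current branch name
#   $2: output of `git status --porcelain` (empty when the tree is clean)
preflight_ok() {
  local branch="$1"
  local status="$2"
  if [ "$branch" != "master" ]; then
    echo "not on master (currently on '$branch')" >&2
    return 1
  fi
  if [ -n "$status" ]; then
    echo "working directory is not clean" >&2
    return 1
  fi
  return 0
}

# Possible use at the top of deploy.sh (an assumption, adapt as needed):
# preflight_ok "$(git rev-parse --abbrev-ref HEAD)" "$(git status --porcelain)" || exit 1
```

Keeping the check separate from the `git` calls makes it easy to try out with made-up values before wiring it into the script. Ignored files such as `main.html` do not show up in the porcelain output, so they do not count as uncommitted changes here.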

**For this to work best, make sure you are in the `master` branch and you have a clean working directory!**

In the case of this demonstration repo, the results are viewable at http://grssnbchr.github.io/rddj-reproducibility-workflow.


--------------------------------------------------------------------------------
/deploy.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# make a temporary copy of everything that should end up in the downloadable build
mkdir tmp
cp main.Rmd tmp/
cp -r input tmp/
cp processData.R tmp/
# switch to gh-pages branch
git checkout gh-pages
# rename the knitted main.html (left over from the master branch) to index.html
mv main.html index.html
# make folder for rscript
mkdir rscript
# copy over necessary scripts from master branch
cp -r tmp/* rscript/

# zip the rscript folder
zip -r rscript.zip rscript
# remove the rscript folder
rm -rf rscript
# remove temporary folder
rm -rf tmp
# add everything for committing
git add .
# commit in gh-pages
git commit -m "build and deploy to gh-pages"
# push to remote:gh-pages
git push origin gh-pages
# checkout master again
git checkout master
--------------------------------------------------------------------------------
/input/data.csv:
--------------------------------------------------------------------------------
a,b
1,2
4,5
4,6
3,4
5,6
10,4
5,7
4,5
23,3
--------------------------------------------------------------------------------
/main.Rmd:
--------------------------------------------------------------------------------
---
title: "An example RMarkdown file"
author: "Timo Grossenbacher, SRF Data"
date: "09/06/2015"
output: html_document
---

This is an example R Markdown document.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see .

Inside this document, you show your audience what you did in your analysis. You can also include a link to the downloadable R script, [like so](http://grssnbchr.github.io/rddj-reproducibility-workflow/rscript.zip).

You can also include code from other scripts:

```{r}
source("processData.R")
```

Example chunk that reads data from `input` and writes a PNG to `output`:
```{r}
# read csv from input
data <- read.csv("input/data.csv")
# transform it using a sourced function
processedData <- doSomething(data)

# generate an output folder if it doesn't exist
dir.create(file.path(getwd(), "output"), showWarnings = FALSE)
# plot to a file
png(filename = "output/image.png")
plot(processedData)
dev.off()
# plot again so the figure also shows up in the knitted document
plot(processedData)
```

This is the end.
--------------------------------------------------------------------------------
/processData.R:
--------------------------------------------------------------------------------
# processData.R
# this script is sourced by main.Rmd
# doSomething() doubles column a and triples column b of a data frame
doSomething <- function(input){
  input$a <- input$a * 2
  input$b <- input$b * 3
  return(input)
}
--------------------------------------------------------------------------------