├── figures
    └── contvsvm.png
└── escience_docker_for_reproducbility.Rmd


/figures/contvsvm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/UW-eScience-docker-for-reproducible-research/HEAD/figures/contvsvm.png


--------------------------------------------------------------------------------
/escience_docker_for_reproducbility.Rmd:
--------------------------------------------------------------------------------
---
title: "Reproducible Research using Docker and R"
output: ioslides_presentation
---

Motives for using Docker
------------------------

- Difficulty of managing dependencies
- Maximizing isolation and transparency
- Portability of computational environment
- Make extendibility and reuse easy
- Ease of use generally

Limitations of VMs
------------------

- **Size**: VMs are large files, which makes them impractical to store
  and transfer
- **Performance**: running VMs consumes significant CPU and memory

VM vs Docker
--------------

![Drawing](figures/contvsvm.png)

What Docker is
--------------

- A shipping container for the online universe: hardware-agnostic and
  platform-agnostic
- A tool that lets programmers neatly package software and move it
  from machine to machine.
- Released as open source in March 2013 and a big deal on GitHub: 18.6k
  stars, 3.8k forks

Basic ingredients
-----------------

- **[Dockerfiles](https://docs.docker.com/reference/builder/)**:
  plain-text instructions to automatically make images
- **containers**: the active, running parts of Docker that do
  something
- **images**: pre-built environments and instructions that tell a
  container what to do.
- **registry**: [open online repository of
  images](https://registry.hub.docker.com/), including many ['trusted
  builds'](http://dockerfile.github.io/)

Docker's limitations
--------------------

- Security: it is possible for a hosted image to be written with some
  malicious intent
- Limited to 64-bit host machines, making it impossible to run on
  older hardware
- Does not provide complete virtualization but relies on the Linux
  kernel provided by the host
- On OSX and Windows this means a VM must be present
  ([boot2docker](http://boot2docker.io/) installs
  [VirtualBox](https://www.virtualbox.org/) for this)

Getting started on OSX & Windows
--------------------------------

- Install & start [boot2docker](http://boot2docker.io/)
- `docker pull <username>/<image name>` gets an existing image from the
  registry
- e.g. `docker pull ubuntu`; notice there's no username here, because
  this is an 'official repo'
- after `docker pull`, use `docker run`
- or simply `docker run`, which will `pull`, `create` and `run` in one
  step

Common [arguments](https://docs.docker.com/reference/run/) for `docker run`:
----------------------------------------------------------------------------

Argument           | Explanation
------------------ | ----------------------
`-i`               | Interactive (usually used with `-t`)
`-t`               | Give a terminal interface for a CLI
`-p`               | Publish ports: `-p <host port>:<container port>`
`-d`               | Detached mode: run the container in the background (opposite of `-i -t`)
`-v`               | Mount a host directory as a volume inside your container: `-v <host path>:<container path>` (a VOLUME instruction in the Dockerfile is not required for this)
`--rm`             | Remove your container from the host when it stops running (older Docker versions do not allow this together with `-d`)

Examples of `docker run`
------------------------

- `docker run -it ubuntu`
    - gets ubuntu and gives us a terminal for interaction
- `docker run -dp 8787:8787 rocker/rstudio`
    - gets R & RStudio and opens port 8787 for using RStudio server in a
      web browser at localhost:8787 (Linux) or 192.168.59.103:8787
      (Windows, OSX)

[Interacting with docker at the command line](https://docs.docker.com/reference/commandline/cli/)
-----------------------------------------------

Command                                     | Explanation
------------------------------------------- | ----------------------------------
`docker ps`                                 | list all the running containers on the host
`docker ps -a`                              | list all the containers on the host, including those that have stopped
`docker exec -it <container-id> bash`       | opens a bash shell in a currently running container
`docker stop <container-id>`                | stop a running container
`docker kill <container-id>`                | force stop a running container


[Interacting with docker at the command line](https://docs.docker.com/reference/commandline/cli/)
-----------------------------------------------

Command                                    | Explanation
------------------------------------------ | -----------------------------------
`docker rm <container-id>`                 | removes (deletes) a container
`docker rmi <image-id>`                    | removes (deletes) an image
`docker rm -f $(docker ps -a -q)`          | remove all current containers
`docker rmi -f $(docker images -q)`        | remove all images, even those not in use

[Writing a Dockerfile](https://docs.docker.com/articles/dockerfile_best-practices/)
-----------------------------------------------------------------------------------

- It is possible to use `docker commit <container-id>` to commit a
  container's file changes or settings into a new image
- But it is better to use Dockerfiles & git to manage your images in a
  documented and maintainable way
- A Dockerfile is a short plain text file that is a recipe for making
  a docker image

Some common Dockerfile elements
-------------------------------

- FROM specifies which base image your image is built on
  (ultimately back to a base such as Debian)
- MAINTAINER specifies who created and maintains the
  image.
- CMD specifies the command to run immediately when a container is
  started from this image, unless you specify a different command.
- ADD will copy new files from a source and add them to
  the container's filesystem path
- RUN does just that: it runs a command inside the
  container while the image is being built (e.g. `apt-get`)
- EXPOSE tells Docker that the container will listen on
  the specified port when it starts
- VOLUME will create a mount point with the specified name
  and tell Docker that the volume may be mounted by the host

Using Dockerfiles
-----------------

- To build an image from a Dockerfile:
  `docker build --rm -t <username>/<image name> <directory containing Dockerfile>`
- [simple](https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/Dockerfile)
  and [moderately
  complex](https://github.com/rocker-org/hadleyverse/blob/master/Dockerfile)
  examples
- To send an image to the registry:
  `docker push <username>/<image name>`. You need to be registered at
  the [hub](https://hub.docker.com/) before pushing

[Automated Docker image build testing](https://circleci.com/)
-------------------------------------------------------------

- Automated image build testing on a new commit to the Dockerfile
- Analogous to the travis-ci service, has a shield
- Requires a `circle.yml` file in the GitHub repo, e.g.
  [https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/circle.yml](https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/circle.yml)
- Pushes the new image to the hub on successful completion of the test
- And gives a
  [badge](https://github.com/benmarwick/1989-excavation-report-Madjebebe)
  to indicate [test
  status](https://circleci.com/gh/benmarwick/1989-excavation-report-Madjebebe)

Doing research with RStudio and Docker
--------------------------------------

- The [rocker project](https://github.com/rocker-org/) provides images
  that include R, key packages and other dependencies (RStudio,
  pandoc, LaTeX, etc.), and has excellent documentation on the [github
  wiki](https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image)
- I run RStudio server in the browser, with a host folder mounted as a
  volume; this is very easy to use
- I store scripts on the host volume because version control is simpler
  this way, but do development and analysis in the container for isolation

I get started with...
--------------------------------------

`docker run -dp 8787:8787 -v /c/Users/marwick/docker:/home/rstudio/ -e ROOT=TRUE rocker/hadleyverse`

- `-dp 8787:8787` gives me a port for the web browser to access
  RStudio
- `-v /c/Users/marwick/docker:/home/rstudio/` gives me read and write
  access both ways between Windows (C:/Users/marwick/docker) and
  RStudio
- `-e ROOT=TRUE` sets an environment variable to enable root access
  for me so I can manage dependencies
- I can access the docker (Debian) shell via RStudio for file
  manipulation, etc.
  (or `docker exec -it <container-id> bash`)

...and IPython
------------

- Choose your favourite from the
  [registry](https://registry.hub.docker.com/search?q=ipython&s=downloads)
- the IPython project has a few images, and there are many
  user-contributed ones

Cloud computing with Docker is widely supported
-----------------------------------------------

- Amazon EC2 Container Service: docker clusters in the cloud (no
  registry)
- Google Compute Engine: has container-optimized VMs
- Google container registry: secure private docker image storage on
  google cloud platform
- Microsoft Azure supports docker containers (docker hub is
  integrated)

References & further reading
----------------------------

- [http://arxiv-web3.library.cornell.edu/pdf/1410.0846v1.pdf](http://arxiv-web3.library.cornell.edu/pdf/1410.0846v1.pdf)
- [http://sites.duke.edu/researchcomputing/tag/docker/](http://sites.duke.edu/researchcomputing/tag/docker/)
- [https://rc.duke.edu/duke-docker-day-was-great/](https://rc.duke.edu/duke-docker-day-was-great/)
- [https://github.com/LinuxAtDuke/Intro-To-Docker](https://github.com/LinuxAtDuke/Intro-To-Docker)
- [http://reproducible-research.github.io/scipy-tutorial-2014/environment/docker/](http://reproducible-research.github.io/scipy-tutorial-2014/environment/docker/)
- [http://ropensci.org/blog/2014/10/23/introducing-rocker/](http://ropensci.org/blog/2014/10/23/introducing-rocker/)
- [https://github.com/wsargent/docker-cheat-sheet](https://github.com/wsargent/docker-cheat-sheet)

Colophon
-------------------------

Presentation written in [R Markdown using ioslides](http://rmarkdown.rstudio.com/ioslides_presentation_format.html)

Compiled into HTML5 using [RStudio](http://www.rstudio.com/ide/) & [knitr](http://yihui.name/knitr/)

Source code
hosting:
https://github.com/benmarwick/UW-eScience-docker-for-reproducibility

ORCID: http://orcid.org/0000-0001-7879-4531

Licensing:

* Presentation: [CC-BY-3.0](http://creativecommons.org/licenses/by/3.0/us/)

* Source code: [MIT](http://opensource.org/licenses/MIT)

--------------------------------------------------------------------------------