├── figures
│   └── contvsvm.png
└── escience_docker_for_reproducbility.Rmd
/figures/contvsvm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/UW-eScience-docker-for-reproducible-research/HEAD/figures/contvsvm.png
--------------------------------------------------------------------------------
/escience_docker_for_reproducbility.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Reproducible Research using Docker and R"
3 | output: ioslides_presentation
4 | ---
5 |
6 | Motives for using Docker
7 | ------------------------
8 |
9 | - Difficulty of managing dependencies
10 | - Maximizing isolation and transparency
11 | - Portability of computational environment
12 | - Making extension and reuse easy
13 | - Ease of use generally
14 |
15 | Limitations of VMs
16 | ------------------
17 |
18 | - **Size**: VMs are large files, which makes them impractical to store
19 | and transfer
20 | - **Performance**: running VMs consumes significant CPU and memory
21 |
22 | VM vs Docker
23 | --------------
24 |
25 |
26 | ![](figures/contvsvm.png)
27 |
28 |
29 | What Docker is
30 | --------------
31 |
32 | - A shipping container for the online universe: hardware-agnostic and
33 | platform-agnostic
34 | - A tool that lets programmers neatly package software and move it
35 | from machine to machine.
36 | - Released as open source in March 2013, a big deal on GitHub: 18.6k
37 | stars, 3.8k forks
38 |
39 |
40 |
41 |
42 | Basic ingredients
43 | -----------------
44 |
45 | - **[Dockerfiles](https://docs.docker.com/reference/builder/)**:
46 | plain-text instructions to automatically make images
47 | - **containers**: the active, running parts of Docker that do
48 | something
49 | - **images**: pre-built environments and instructions that tell a
50 | container what to do.
51 | - **registry**: [open online repository of
52 | images](https://registry.hub.docker.com/), including many ['trusted
53 | builds'](http://dockerfile.github.io/)
54 |
55 | Docker's limitations
56 | --------------------
57 |
58 | - Security: it is possible for a hosted image to be written with some
59 | malicious intent
60 | - Limited to 64-bit host machines, making it impossible to run on
61 | older hardware
62 | - Does not provide complete virtualization but relies on the Linux
63 | kernel provided by the host
64 | - On OSX and Windows this means a VM must be present
65 | ([boot2docker](http://boot2docker.io/) installs
66 | [VirtualBox](https://www.virtualbox.org/) for this)
67 |
68 | Getting started on OSX & Windows
69 | --------------------------------
70 |
71 | - Install & start [boot2docker](http://boot2docker.io/)
72 | - `docker pull <username>/<image-name>` gets an existing image from
73 | the registry
74 | - e.g. `docker pull ubuntu`; notice there's no username here, because
75 | this is an 'official repo'
76 | - after `docker pull` then `docker run`
77 | - or simply `docker run`, which will `pull`, `create` and `run` in one
78 | step
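
The steps above can be sketched as a short shell script. This is a minimal sketch rather than part of the original slides: the `DOCKER` variable is a hypothetical convenience, not a Docker feature. By default it makes the script print each command instead of running it; set `DOCKER=docker` on a machine with a working Docker install to execute them.

```shell
# DOCKER is a hypothetical helper variable (an assumption of this sketch).
# Default: print each command. Run with DOCKER=docker to execute for real.
DOCKER="${DOCKER:-echo docker}"

# Get an existing image from the registry (official repo, so no username)
$DOCKER pull ubuntu

# Create and start a container with an interactive terminal.
# `run` on its own would also pull the image if it were missing.
$DOCKER run -it ubuntu
```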
79 |
80 | Common [arguments](https://docs.docker.com/reference/run/) for `docker run`:
81 | ----------------------------------------------------------------------------
82 |
83 | Argument | Explanation
84 | ------------------ | ----------------------
85 | `-i` | Interactive (usually used with `-t`)
86 | `-t` | Give a terminal interface for a CLI
87 | `-p` | Publish ports: `-p <host port>:<container port>`
88 | `-d` | Detached mode: run the container in the background (opposite of `-i -t`)
89 | `-v` | Mount a volume: `-v <host dir>:<container dir>` makes a host directory available inside the container
90 | `--rm` | Remove the container from the host when it stops running (older Docker versions do not allow this together with `-d`)
91 |
92 |
93 |
94 | Examples of `docker run`
95 | ------------------------
96 |
97 | - `docker run -it ubuntu`
98 | - gets ubuntu and gives us a terminal for interaction
99 | - `docker run -dp 8787:8787 rocker/rstudio`
100 | - gets R & RStudio and opens port 8787 for using RStudio Server in a
101 | web browser at localhost:8787 (Linux) or 192.168.59.103:8787
102 | (Windows, OSX)
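
Two further variations on these examples, as a hedged sketch: a throwaway interactive session, and RStudio Server published on a different host port. The `DOCKER` variable is a hypothetical dry-run switch (commands are printed unless you set `DOCKER=docker`), and host port 8080 is an arbitrary choice. Current rocker images may need additional options, so check the rocker documentation before relying on this.

```shell
# Hypothetical dry-run switch: prints commands unless DOCKER=docker is set
DOCKER="${DOCKER:-echo docker}"

# One-off interactive session; --rm deletes the container when you exit
$DOCKER run --rm -it ubuntu bash

# RStudio Server in the background, reachable on host port 8080.
# -p takes <host port>:<container port>; 8787 is the port inside the container.
$DOCKER run -d -p 8080:8787 rocker/rstudio
```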
103 |
104 | [Interacting with docker at the command line](https://docs.docker.com/reference/commandline/cli/)
105 | -----------------------------------------------
106 |
107 | Command | Explanation
108 | ------------------------------------------- | ----------------------------------
109 | `docker ps` | list all the running containers on the host
110 | `docker ps -a` | list all the containers on the host, including those that have stopped
111 | `docker exec -it <container-id> bash` | opens a bash shell in a currently running container
112 | `docker stop <container-id>` | stop a running container
113 | `docker kill <container-id>` | force stop a running container
114 |
115 |
116 | [Interacting with docker at the command line](https://docs.docker.com/reference/commandline/cli/)
117 | -----------------------------------------------
118 |
119 | Command | Explanation
120 | ------------------------------------------ | -----------------------------------
121 | `docker rm <container-id>` | removes (deletes) a container
122 | `docker rmi <image-id>` | removes (deletes) an image
123 | `docker rm -f $(docker ps -a -q)` | remove all current containers
124 | `docker rmi -f $(docker images -q)` | remove all images, even those not in use
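
The commands in the two tables above fit together into a container's life cycle. The sketch below assumes a container started with `--name`, so it can be referred to by name instead of by ID. The name `rstudio-demo` and the `DOCKER` dry-run variable are hypothetical; with the default setting the commands are only printed.

```shell
# Hypothetical dry-run switch: prints commands unless DOCKER=docker is set
DOCKER="${DOCKER:-echo docker}"
NAME="rstudio-demo"   # hypothetical container name

$DOCKER run -d --name "$NAME" -p 8787:8787 rocker/rstudio  # start in background
$DOCKER ps                      # list running containers
$DOCKER exec -it "$NAME" bash   # open a shell inside the running container
$DOCKER stop "$NAME"            # stop it
$DOCKER ps -a                   # stopped containers are still listed here
$DOCKER rm "$NAME"              # delete the container
$DOCKER rmi rocker/rstudio      # delete the image as well
```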
125 |
126 |
127 |
128 |
129 | [Writing a Dockerfile](https://docs.docker.com/articles/dockerfile_best-practices/)
130 | -----------------------------------------------------------------------------------
131 |
132 | - It is possible to use `docker commit <container-id> <image-name>` to commit a
133 | container's file changes or settings into a new image
134 | - But it is better to use Dockerfiles & git to manage your images in a
135 | documented and maintainable way
136 | - A Dockerfile is a short plain text file that is a recipe for making
137 | a Docker image
138 |
139 | Some common Dockerfile elements
140 | -------------------------------
141 |
142 | - FROM specifies which base image your image is built on
143 | (the rocker images, for example, ultimately build on Debian)
144 | - MAINTAINER specifies who created and maintains the
145 | image.
146 | - CMD specifies the command to run immediately when a container is
147 | started from this image, unless you specify a different command.
148 | - ADD will copy new files from a source and add them to
149 | the container's filesystem path
150 | - RUN does just that: it runs a command inside the
151 | container (e.g. `apt-get`)
152 | - EXPOSE tells Docker that the container will listen on
153 | the specified port when it starts
154 | - VOLUME will create a mount point with the specified name
155 | and tell Docker that the volume may be mounted by the host
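
To see how these elements combine, the sketch below writes a small example Dockerfile into a scratch directory and lists the instructions it uses. Everything in it is illustrative rather than taken from the slides: the base image, maintainer line, package, file names and port are placeholder choices, and the file is only written, not built.

```shell
# Write an example Dockerfile into a scratch directory.
# All names and values below are illustrative assumptions.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
# Base image to build on
FROM rocker/r-base
# Who maintains the image (newer Docker versions prefer a LABEL for this)
MAINTAINER Your Name <you@example.org>
# Run a command at build time, e.g. install a system package
RUN apt-get update && apt-get install -y pandoc
# Copy a file from the build context into the image
ADD analysis.R /home/analysis/analysis.R
# Declare a mount point that the host may attach a volume to
VOLUME /home/analysis/data
# Only meaningful if the container runs a service on this port
EXPOSE 8787
# Default command when a container starts from this image
CMD ["Rscript", "/home/analysis/analysis.R"]
EOF

# Print the instruction names used, one per line
grep -v '^#' "$workdir/Dockerfile" | cut -d ' ' -f 1
```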
156 |
157 | Using Dockerfiles
158 | -----------------
159 |
160 | - To build an image from a Dockerfile:
161 | `docker build --rm -t <username>/<image-name> <path>`
162 | - [simple](https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/Dockerfile)
163 | and [moderately
164 | complex](https://github.com/rocker-org/hadleyverse/blob/master/Dockerfile)
165 | examples
166 | - To send an image to the registry:
167 | `docker push <username>/<image-name>`. You need to be registered at
168 | the [hub](https://hub.docker.com/) before pushing
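
A build-and-push round trip might look like the sketch below. The image name `yourname/my-analysis` is a placeholder for your own `<username>/<image-name>`, and `DOCKER` is a hypothetical dry-run switch that prints the commands unless you set `DOCKER=docker`.

```shell
# Hypothetical dry-run switch: prints commands unless DOCKER=docker is set
DOCKER="${DOCKER:-echo docker}"
IMAGE="yourname/my-analysis"   # placeholder <username>/<image-name>
CONTEXT="."                    # directory that contains the Dockerfile

$DOCKER build --rm -t "$IMAGE" "$CONTEXT"   # build and tag the image
$DOCKER login                               # authenticate with the hub
$DOCKER push "$IMAGE"                       # upload the tagged image
```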
169 |
170 | [Automated Docker image build testing](https://circleci.com/)
171 | -------------------------------------------------------------
172 |
173 | - Automated image build testing on a new commit to the Dockerfile
174 | - Analogous to the Travis CI service, has a shield
175 | - Requires a `circle.yml` file in the GitHub repo, e.g.
176 | [https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/circle.yml](https://github.com/benmarwick/1989-excavation-report-Madjebebe/blob/master/circle.yml)
177 | - Pushes the new image to the hub on successful completion of the test
178 | - And gives a
179 | [badge](https://github.com/benmarwick/1989-excavation-report-Madjebebe)
180 | to indicate [test
181 | status](https://circleci.com/gh/benmarwick/1989-excavation-report-Madjebebe)
182 |
183 | Doing research with RStudio and Docker
184 | --------------------------------------
185 |
186 | - The [rocker project](https://github.com/rocker-org/) provides images
187 | that include R, key packages and other dependencies (RStudio,
188 | pandoc, LaTeX, etc.), and has excellent documentation on the [GitHub
189 | wiki](https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image)
190 | - I run RStudio Server in the browser, with a host folder mounted as a
191 | volume; very easy to use
192 | - I store scripts on the host volume because version control is simpler
193 | this way, but do development and analysis in the container for isolation
194 |
195 | I get started with...
196 | --------------------------------------
197 |
198 | `docker run -dp 8787:8787 -v /c/Users/marwick/docker:/home/rstudio/ -e ROOT=TRUE rocker/hadleyverse`
199 |
200 | - `-dp 8787:8787` gives me a port for the web browser to access
201 | RStudio
202 | - `-v /c/Users/marwick/docker:/home/rstudio/` gives me read and write
203 | access both ways between Windows (C:/Users/marwick/docker) and
204 | RStudio
205 | - `-e ROOT=TRUE` sets an environment variable to enable root access
206 | for me so I can manage dependencies
207 | - I can access the Docker (Debian) shell via RStudio for file
208 | manipulation, etc. (or `docker exec -it <container-id> bash`)
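
The same command can be wrapped in a small function so the shared folder and host port become parameters. This is a sketch under assumptions: `start_rstudio` and the `DOCKER` dry-run variable are hypothetical names, and the image name is taken from the slide as-is (newer rocker images may use different names and options).

```shell
# Hypothetical dry-run switch: prints commands unless DOCKER=docker is set
DOCKER="${DOCKER:-echo docker}"

# start_rstudio <host dir> [host port]
# Hypothetical helper: shares <host dir> with the container's home folder
# and publishes RStudio Server on the given host port (default 8787).
start_rstudio() {
  $DOCKER run -d -p "${2:-8787}:8787" -v "$1:/home/rstudio/" \
    -e ROOT=TRUE rocker/hadleyverse
}

# Share the current directory on the default port
start_rstudio "$PWD"
```

Called as `start_rstudio /c/Users/marwick/docker`, this produces the command shown on the slide, with `-d -p` written out separately.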
209 |
210 | ...and IPython
211 | ------------
212 |
213 | - Choose your favourite from the
214 | [registry](https://registry.hub.docker.com/search?q=ipython&s=downloads)
215 | - The IPython project has a few images, and there are many
216 | user-contributed ones
217 |
218 | Cloud computing with Docker is widely supported
219 | -----------------------------------------------
220 |
221 | - Amazon EC2 Container Service: Docker clusters in the cloud (no
222 | registry)
223 | - Google Compute Engine: has container-optimized VMs
224 | - Google Container Registry: secure private Docker image storage on
225 | Google Cloud Platform
226 | - Microsoft Azure supports Docker containers (Docker Hub is
227 | integrated)
228 |
229 | References & further reading
230 | ----------------------------
231 |
232 | - [http://arxiv-web3.library.cornell.edu/pdf/1410.0846v1.pdf](http://arxiv-web3.library.cornell.edu/pdf/1410.0846v1.pdf)
233 | - [http://sites.duke.edu/researchcomputing/tag/docker/](http://sites.duke.edu/researchcomputing/tag/docker/)
234 | - [https://rc.duke.edu/duke-docker-day-was-great/](https://rc.duke.edu/duke-docker-day-was-great/)
235 | - [https://github.com/LinuxAtDuke/Intro-To-Docker](https://github.com/LinuxAtDuke/Intro-To-Docker)
236 | - [http://reproducible-research.github.io/scipy-tutorial-2014/environment/docker/](http://reproducible-research.github.io/scipy-tutorial-2014/environment/docker/)
237 | - [http://ropensci.org/blog/2014/10/23/introducing-rocker/](http://ropensci.org/blog/2014/10/23/introducing-rocker/)
238 | - [https://github.com/wsargent/docker-cheat-sheet](https://github.com/wsargent/docker-cheat-sheet)
239 |
240 | Colophon
241 | -------------------------
242 |
243 | Presentation written in [R Markdown using ioslides](http://rmarkdown.rstudio.com/ioslides_presentation_format.html)
244 |
245 | Compiled into HTML5 using [RStudio](http://www.rstudio.com/ide/) & [knitr](http://yihui.name/knitr/)
246 |
247 | Source code hosting:
248 | https://github.com/benmarwick/UW-eScience-docker-for-reproducibility
249 |
250 | ORCID: http://orcid.org/0000-0001-7879-4531
251 |
252 | Licensing:
253 |
254 | * Presentation: [CC-BY-3.0](http://creativecommons.org/licenses/by/3.0/us/)
255 |
256 | * Source code: [MIT](http://opensource.org/licenses/MIT)
--------------------------------------------------------------------------------