├── .gitignore
├── challenge-exercises.rst
├── README.md
├── LICENSE
└── docker-intro.rst

/.gitignore:
--------------------------------------------------------------------------------
*~

--------------------------------------------------------------------------------
/challenge-exercises.rst:
--------------------------------------------------------------------------------
Some "challenge exercises"
==========================

Here are some ideas for projects that are a little less scripted than
`the introduction to Docker on Linux <./docker-intro.rst>`__.

0. There's a set of `challenge exercises at the bottom of the introduction <./docker-intro.rst#challenge-exercises>`__.

1. Install Docker on your local computer (Mac, Windows, or Linux) and
   run through the introduction yourself.

2. Make a Docker container for a command-line package that you use.

3. Create a Docker Hub account and push the Docker container you built
   to the hub. Try running it on someone else's computer (or on
   another/new AWS machine).

More advanced ideas
-------------------

* Try out `bioboxes `__.

* Take a look at `mybinder.org `__, which takes a
  git repo and turns it into a running data analysis environment.

* Try out `docker-machine
  `__.

* Try out `Carina `__.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Docker Hands-on materials

## Nov 7-8, 2015

These are the materials for [a Docker hands-on
workshop](http://dib-training.readthedocs.org/en/pub/2015-11-09-docker.html)
that we ran at UC Davis in November 2015.

----

1. [An introduction to running Docker on Linux](./docker-intro.rst)

Here is an introduction to running Docker on an Ubuntu machine.
For sanity's sake, we use an Amazon Web Services machine, but it should work
on any reasonably recent Ubuntu machine (14.04 or later).

2. [Challenge exercises](./challenge-exercises.rst)

For people who want to try some things out on their own, here are some
suggestions for getting started.

3. [Installing bioboxes](http://bioboxes.org/docs/how-to-install/) and using bioboxes to [assemble a genome and evaluate the assembly](http://bioboxes.org/docs/assemble-a-genome/).

----

Other links:

* Lisa Cohen's [blog post recording our live-coding session for Dockerfiles](https://monsterbashseq.wordpress.com/2015/11/10/uc-davis-docker-workshop-live-coding/)

* Titus Brown's [repository containing Dockerfiles for khmer and Salmon](https://github.com/ctb/2015-docker-building), together with notes on docker-machine and data volumes (see [the dammit directory](https://github.com/ctb/2015-docker-building/tree/master/dammit)).

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
CC0 1.0 Universal

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator and
subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for the
purpose of contributing to a commons of creative, cultural and scientific
works ("Commons") that the public can reliably and without fear of later
claims of infringement build upon, modify, incorporate in other works, reuse
and redistribute as freely as possible in any form whatsoever and for any
purposes, including without limitation commercial purposes. These owners may
contribute to the Commons to promote the ideal of a free culture and the
further production of creative, cultural and scientific works, or to gain
reputation or greater distribution for their Work in part through the use and
efforts of others.

For these and/or other purposes and motivations, and without any expectation
of additional consideration or compensation, the person associating CC0 with a
Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
and publicly distribute the Work under its terms, with knowledge of his or her
Copyright and Related Rights in the Work and the meaning and intended legal
effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not limited
to, the following:

  i. the right to reproduce, adapt, distribute, perform, display, communicate,
  and translate a Work;

  ii. moral rights retained by the original author(s) and/or performer(s);

  iii. publicity and privacy rights pertaining to a person's image or likeness
  depicted in a Work;

  iv. rights protecting against unfair competition in regards to a Work,
  subject to the limitations in paragraph 4(a), below;

  v. rights protecting the extraction, dissemination, use and reuse of data in
  a Work;

  vi. database rights (such as those arising under Directive 96/9/EC of the
  European Parliament and of the Council of 11 March 1996 on the legal
  protection of databases, and under any national implementation thereof,
  including any amended or successor version of such directive); and

  vii. other similar, equivalent or corresponding rights throughout the world
  based on applicable law or treaty, and any national implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
the Waiver for the benefit of each member of the public at large and to the
detriment of Affirmer's heirs and successors, fully intending that such Waiver
shall not be subject to revocation, rescission, cancellation, termination, or
any other legal or equitable action to disrupt the quiet enjoyment of the Work
by the public as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason be
judged legally invalid or ineffective under applicable law, then the Waiver
shall be preserved to the maximum extent permitted taking into account
Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
is so judged Affirmer hereby grants to each affected person a royalty-free,
non transferable, non sublicensable, non exclusive, irrevocable and
unconditional license to exercise Affirmer's Copyright and Related Rights in
the Work (i) in all territories worldwide, (ii) for the maximum duration
provided by applicable law or treaty (including future time extensions), (iii)
in any current or future medium and for any number of copies, and (iv) for any
purpose whatsoever, including without limitation commercial, advertising or
promotional purposes (the "License"). The License shall be deemed effective as
of the date CC0 was applied by Affirmer to the Work. Should any part of the
License for any reason be judged legally invalid or ineffective under
applicable law, such partial invalidity or ineffectiveness shall not
invalidate the remainder of the License, and in such case Affirmer hereby
affirms that he or she will not (i) exercise any of his or her remaining
Copyright and Related Rights in the Work or (ii) assert any associated claims
and causes of action with respect to the Work, in either case contrary to
Affirmer's express Statement of Purpose.

4. Limitations and Disclaimers.

  a. No trademark or patent rights held by Affirmer are waived, abandoned,
  surrendered, licensed or otherwise affected by this document.

  b. Affirmer offers the Work as-is and makes no representations or warranties
  of any kind concerning the Work, express, implied, statutory or otherwise,
  including without limitation warranties of title, merchantability, fitness
  for a particular purpose, non infringement, or the absence of latent or
  other defects, accuracy, or the present or absence of errors, whether or not
  discoverable, all to the greatest extent permissible under applicable law.

  c. Affirmer disclaims responsibility for clearing rights of other persons
  that may apply to the Work or any use thereof, including without limitation
  any person's Copyright and Related Rights in the Work. Further, Affirmer
  disclaims responsibility for obtaining any necessary consents, permissions
  or other rights required for any use of the Work.

  d. Affirmer understands and acknowledges that Creative Commons is not a
  party to this document and has no duty or obligation with respect to this
  CC0 or use of the Work.

For more information, please see

--------------------------------------------------------------------------------
/docker-intro.rst:
--------------------------------------------------------------------------------
=================================
A hands-on introduction to Docker
=================================

:author: \C. Titus Brown, titus@idyll.org
:license: CC0
:date: Nov 7, 2015

Introduction and goals
======================

Docker is a mechanism for building and running isolated "containers"
of software. Docker containers act much like virtual machines but are
smaller and more flexible than VMs. The Docker culture and ecosystem
also enhance Docker's potential for aiding in reproducible computing.

Below, we'll show you how to install Docker on an EC2 instance, use an
existing Docker container, and build your own Docker container.

.. contents::

Getting started with Docker
===========================

Install Docker
--------------

Start up an EC2 instance running blank Ubuntu 14.04
(see http://angus.readthedocs.org/en/2015/amazon).

Then, install Docker::

   wget -qO- https://get.docker.com/ | sudo sh

(This will take about 5 minutes.)

Now, configure the default user ('ubuntu') to use Docker::

   sudo usermod -aG docker ubuntu

and log out and log back in.

Run Docker
----------

The following command will start up a blank Ubuntu 14.04 docker container::

   docker run -it ubuntu:14.04

(If you get the message ``Post
http:///var/run/docker.sock/v1.20/containers/create: dial unix
/var/run/docker.sock: permission denied.`` then you need to log out
and log back in.)

This command will spit out a fair bit of output - what it's doing (the
first time you run it) is going out to `the docker hub
`__ and downloading the Ubuntu 14.04 image to
your EC2 instance.

You should end up at a prompt that looks like this:
``root@77e00211fef4:/#``. Unlike your previous prompt (which on EC2
defaults to ending in a ``$``), *this* prompt has placed you inside
your running Docker container. This container is running *inside*
your other Ubuntu machine, and its file system and process space are
completely isolated from the "parent" machine. Note in particular that
you are 'root' inside the Docker container, while you're still user 'ubuntu'
on the AWS machine.

This is a blank Ubuntu machine. You can play around in here a bit, if you
want, to verify this.
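
If you'd like something concrete to try, here is a small sketch of commands to run at the container prompt. The variable names are only for illustration, and the note that the hostname matches the container ID assumes Docker's default settings.

```shell
# Run these at the root@...:/# prompt inside the container.
current_user=$(id -un)     # expected to be "root" inside the container
current_host=$(uname -n)   # by default, the short container ID shown in the prompt
echo "user=$current_user host=$current_host"

# Show which OS release the container provides
# (assumes /etc/os-release exists, as it does on Ubuntu).
if [ -r /etc/os-release ]; then
    grep '^PRETTY_NAME=' /etc/os-release || true
fi
```

Running the same lines at the ``ubuntu@`` prompt on the EC2 machine should report a different user and host name, which is a quick way to see which side of the container boundary you are on.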

Now, exit by typing::

   exit

At this point your docker container is shut down and you are
back at your EC2 prompt. Importantly, everything you did to the file
system in the container is basically gone at this point - changes made
in a container don't affect the image it was started from. You can verify this
by re-running ``docker run -it ubuntu:14.04``, adding a file, and
then exiting; if you then start a new container from the same image,
the added file will not be there.


Cleaning up
-----------

By default, docker keeps a record of containers that have been run, which you can see with ``docker ps -a``. If you later try to delete an image, docker will complain if any containers based on it still exist. Here's a remedy::

   docker stop $(docker ps -a -q)   # needed if you have running containers
   docker rm $(docker ps -a -q)

Alternatively, you can add the ``--rm`` flag to ``docker run`` so that the container is removed automatically when it exits.

Persisting changes
------------------

To save the changes made in a container as a new image, see ``docker
commit`` and the `Docker image docs
`__ for more info
on building images, or go on to the next section.

Building images
===============

The image we ran above is named ``ubuntu:14.04``, which is the
Docker name for that particular OS (Ubuntu) and that particular version
(14.04). It doesn't contain anything particularly useful,
though. What if you wanted to build your own Docker container
with some more software installed? We'll do that next.

Build a Docker image for MEGAHIT, interactively
-----------------------------------------------

Let's build a Docker image for the MEGAHIT short-read assembler.
(This is not the right way to do it in general, and we'll do it the
Right Way with a Dockerfile, below.) This is all based on the
`Assembling E. coli tutorial
`__.

Start up a new container::

   docker run -it ubuntu:14.04

This completes quite quickly, because you've already downloaded everything.

Now, **in this new container**, run the commands necessary to build
and run MEGAHIT.

First, update the base software and install g++, make, git, zlib, and Python::

   apt-get update && apt-get install -y g++ make git zlib1g-dev python

Then check out and build megahit::

   git clone https://github.com/voutcn/megahit.git /home/megahit
   cd /home/megahit && make

So, now we have megahit built! On our docker container! But we face
two problems:

* that took a while, and we'd probably rather not do it again; but the docker
  container is going to go away as soon as we exit! Wouldn't it be nice
  to be able to package this for others?

* the docker container is disconnected from the underlying machine, so we
  have no way of accessing any data! How can we connect it to some data?

Let's take these two problems on separately - we'll start with the
first problem, by saving the docker container to an image that we can
re-run.

----

To save the docker container to an image, we need to reference the
docker container somehow. This is done by taking note of the
container ID; it's the string between the '@' and the ':' in the
command prompt, so, for a command prompt like ``root@fa1bf23148a5:``,
it would be ``fa1bf23148a5``. Copy this information somewhere (into
an e-mail or something). Then, exit the container::

   exit

Now you'll be back at the ``ubuntu`` prompt.
To commit a copy of the container above to a docker image, type::

   docker commit -m "built megahit" fa1bf23148a5 megahit_ctr

but replacing ``fa1bf23148a5`` with your docker container ID.

This creates a new image named 'megahit_ctr' that contains all of your changes
above. If you run::

   docker images

you should see something like::

   | REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
   | megahit_ctr         latest              749fd74397ed        29 seconds ago      427.5 MB
   | ubuntu              14.04               91e54dfb1179        3 days ago          188.4 MB

Now, to run the megahit image, you can type::

   docker run -it megahit_ctr

and (inside the docker container, which will have a new container ID) you can
run::

   /home/megahit/megahit

to verify that you still have megahit installed and running. And
voila! You've created your own image! (If you want to make this
available to everyone, go check out `the Docker hub
`__.)

Connecting a Docker container to some external data
---------------------------------------------------

Now that we can run and rerun the megahit-installed container to our heart's
content, we still have to figure out how to connect it to some data. How??

Well, first, let's download some data to our EC2 instance.

Make sure you're at the ``ubuntu@`` prompt, by typing ``exit`` if necessary.

Now execute::

   cd
   mkdir data
   cd data
   wget http://public.ged.msu.edu.s3.amazonaws.com/ecoli_ref-5m-trim.se.fq.gz
   wget http://public.ged.msu.edu.s3.amazonaws.com/ecoli_ref-5m-trim.pe.fq.gz

This downloads those two data files into the ``data`` directory in your
home directory -- these are
E. coli short-read data from Chitsaz et al., 2011.
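
A truncated download tends to show up only later, as a confusing error from the assembler, so it may be worth checking the files first. The sketch below defines a hypothetical helper, ``check_gz`` (it is not part of the original tutorial), that uses ``gzip -t`` to test each archive it is given.

```shell
# check_gz: report whether each named file is a readable, intact gzip archive.
# Returns non-zero if any file fails the test.
# (Hypothetical helper name, introduced only for this illustration.)
check_gz() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok      $f"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, ``check_gz ~/data/*.fq.gz`` should print one ``ok`` line per file if both downloads completed.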

Now, run your ``megahit_ctr`` image, and connect /home/ubuntu/data/ to /mydata
in the container::

   docker run -v /home/ubuntu/data:/mydata \
       -it megahit_ctr

This will "mount" /home/ubuntu/data from the EC2 machine as
the '/mydata' directory in your container. Type::

   ls /mydata

to verify that you see these files.

Now, let's assemble! ::

   /home/megahit/megahit --12 /mydata/*.pe.fq.gz \
       -r /mydata/*.se.fq.gz \
       -o /mydata/ecoli -t 4

Now, exit your docker container with ``exit`` and look at your data directory::

   ls /home/ubuntu/data

You should see the /home/ubuntu/data/ecoli directory with the assembly in it::

   ls /home/ubuntu/data/ecoli

Running it all in one
---------------------

You might think, "hey, wouldn't it be nice to be able to run all of
this in one command, rather than starting a docker container and
then running it from the command line in there?" Yep. Run this::

   docker run -v /home/ubuntu/data:/mydata \
       -it megahit_ctr \
       sh -c '/home/megahit/megahit --12 /mydata/*.pe.fq.gz \
           -r /mydata/*.se.fq.gz \
           -o /mydata/ecoli -t 4'

Basically, everything after the image name is run as a command inside
the container. You have to use the 'sh -c' stuff because otherwise
``/mydata/*.se.fq.gz`` gets interpreted by the shell on your EC2 machine
and not inside your Docker container.

But... this is kind of long and annoying. Wouldn't it be nice to have this
in a shell script? Yes, yes, it would. Let's put it in a shell script
in the 'data' directory, and then run *that*.

First, put the command in a shell script::

   cd /home/ubuntu/data
   cat <<EOF > do-assemble.sh
   #! /bin/bash
   rm -fr /mydata/ecoli
   /home/megahit/megahit --12 /mydata/*.pe.fq.gz \
       -r /mydata/*.se.fq.gz \
       -o /mydata/ecoli -t 4
   EOF
   chmod +x do-assemble.sh

and then run the shell script inside of Docker::

   docker run -v /home/ubuntu/data:/mydata \
       -it megahit_ctr /mydata/do-assemble.sh

and voila!

One thing to note here is that we've placed the ``do-assemble.sh`` script on
the EC2 machine, rather than in the Docker image. You can do it either
way, but in this case it was more convenient to do it this way because
we'd already created the image and I didn't want to have to create a
new one. To do it the other way, the only change needed is to put the
script in ``/home`` on the docker image (because that's the image's own
disk), instead of ``/mydata`` (which is the mounted volume).

Building an image with a Dockerfile
-----------------------------------

The image above was constructed by running a bunch of commands. Wouldn't
it be nice if we could give Docker a bunch of commands and tell *it* to
build an image *for us*?

You can do that with a Dockerfile, which is the Right Way to build an image.
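
Before moving on to the Dockerfile, one aside on the ``sh -c`` quoting used earlier: the effect can be seen without Docker at all. In this sketch, the two scratch directories are hypothetical stand-ins (``data_dir`` for the container's ``/mydata``, ``empty_dir`` for a place on the host where the pattern matches nothing), and the behavior shown assumes a plain ``sh`` or ``bash`` with default globbing options.

```shell
# Hypothetical scratch directories: data_dir stands in for the container's
# /mydata, and empty_dir for a host directory where the pattern matches nothing.
data_dir=$(mktemp -d)
empty_dir=$(mktemp -d)
touch "$data_dir/reads.se.fq.gz"
cd "$empty_dir"

# Unquoted: the calling shell tries to expand the glob here, finds no match,
# and passes the pattern through unchanged.
outer=$(echo *.se.fq.gz)

# Quoted and handed to an inner shell: the inner shell expands the glob
# after changing into the directory that actually holds the files.
inner=$(sh -c "cd '$data_dir' && echo *.se.fq.gz")

echo "expanded by caller:      $outer"
echo "expanded by inner shell: $inner"
```

The ``docker run ... sh -c '...'`` form works the same way: the quotes keep the pattern away from the EC2 shell, and the ``sh`` started inside the container expands it against ``/mydata``.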

Let's encode the commands above in a Dockerfile::

   mkdir /home/ubuntu/make_megahit
   cd /home/ubuntu/make_megahit
   cat <<EOF > Dockerfile
   FROM ubuntu:14.04
   RUN apt-get update
   RUN apt-get install -y g++ make git zlib1g-dev python
   RUN git clone https://github.com/voutcn/megahit.git /home/megahit
   RUN cd /home/megahit && make
   CMD /mydata/do-assemble.sh
   EOF

Let's look at this Dockerfile before running it::

   cat Dockerfile

The 'FROM' command tells Docker what base image to start from; the 'RUN'
commands tell Docker what to execute (and then save the results from);
and the ``CMD`` specifies the script entry point - a command that is
run if no other command is given.

Let's build a Docker image from this and see what happens! ::

   docker build -t megahit_ctr2 .

(This will take a few minutes.)

Once it's built, you can now run it like so::

   docker run -v /home/ubuntu/data:/mydata -it megahit_ctr2

...and voila!

If you wanted to make this broadly available, the next steps
would be to log into the Docker hub and push it; I did so with
these commands: ``docker login``, ``docker build -t titus/megahit .``,
and ``docker push titus/megahit``.

You can run *my* version of all of this with::

   docker run -v /home/ubuntu/data:/data -it titus/megahit

and -- here's the super neat thing -- you don't need to repeat any of
the above, other than installing Docker itself and downloading the data!

Summary points
==============

* Docker provides a nice way to bundle multiple packages of software
  together, for both yourself and for others to run.

* Docker gives you a good way to isolate what you're running from the
  data you're running it on.

* The Dockerfile enhances reproducibility by giving explicit instructions
  for what to install, rather than simply bundling it all in a binary.

Challenge exercises
===================

* Create a new image ``megahit2`` where the do-assemble.sh script
  created above is saved in /home on the image itself, rather than
  in /mydata.

* Create a container that has both MEGAHIT and Quast installed; see
  `this page `__
  for Quast install instructions.

* Modify the Docker run script to also run Quast on the MEGAHIT
  assembly.

* Install docker on your local computer, and run the 'titus/megahit' image
  there.

More reading
============

`Docker has a lot of docs `__.

Docker was used `to make a GigaScience paper completely reproducible `__. (I've `written about this idea `__ too.)

`Binary containers can be bad for science `__.

Dealing with data is `still complicated `__, but
`the landscape is changing fast `__.

`The impact of Docker containers on the performance of genomic pipelines `__, Di Tommaso et al., 2015 (PeerJ preprint).

--------------------------------------------------------------------------------