├── .gitignore ├── pie ├── .gitignore ├── combine.sh ├── cook.sh └── Makefile ├── hello └── Makefile ├── c-hello ├── hello ├── hello.c └── Makefile ├── makefiles.pdf ├── images ├── string.jpg ├── flowchart.png ├── make-target.png ├── flowchart.graffle └── make-target.graffle ├── Makefile ├── yeast ├── .gitignore ├── download.sh ├── Makefile └── test.py ├── LICENSE └── README.adoc /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | .pytest_cache 3 | -------------------------------------------------------------------------------- /pie/.gitignore: -------------------------------------------------------------------------------- 1 | crust.txt 2 | filling.txt 3 | meringue.txt 4 | pie.txt 5 | -------------------------------------------------------------------------------- /hello/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: hello 2 | 3 | hello: 4 | echo "Hello, World!" 5 | -------------------------------------------------------------------------------- /c-hello/hello: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/c-hello/hello -------------------------------------------------------------------------------- /makefiles.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/makefiles.pdf -------------------------------------------------------------------------------- /images/string.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/images/string.jpg -------------------------------------------------------------------------------- /images/flowchart.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/images/flowchart.png -------------------------------------------------------------------------------- /images/make-target.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/images/make-target.png -------------------------------------------------------------------------------- /images/flowchart.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/images/flowchart.graffle -------------------------------------------------------------------------------- /c-hello/hello.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | int main() { 3 | printf("Hello, World!\n"); 4 | return 0; 5 | } 6 | -------------------------------------------------------------------------------- /images/make-target.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kyclark/make-tutorial/HEAD/images/make-target.graffle -------------------------------------------------------------------------------- /c-hello/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean 2 | 3 | hello: clean 4 | gcc -o hello hello.c 5 | 6 | clean: 7 | rm -f hello 8 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | makefiles.pdf: clean 2 | asciidoctor-pdf -a imagesdir=. 
README.adoc -o makefiles.pdf && open makefiles.pdf 3 | 4 | clean: 5 | rm -f makefiles.pdf 6 | 7 | -------------------------------------------------------------------------------- /yeast/.gitignore: -------------------------------------------------------------------------------- 1 | fasta 2 | chr-count 3 | chr-size 4 | gene-count 5 | verified-genes 6 | uncharacterized-genes 7 | gene-types 8 | terminated-genes 9 | SGD_features.tab 10 | -------------------------------------------------------------------------------- /pie/combine.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ $# -gt 1 ]]; then 4 | FILE=$1 5 | shift 1 6 | echo "Will combine $@" > "$FILE" 7 | else 8 | echo "usage: $(basename "$0") FILE ingredients" 9 | fi 10 | -------------------------------------------------------------------------------- /pie/cook.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ $# -eq 3 ]]; then 4 | ITEM=$1 5 | TEMP=$2 6 | TIME=$3 7 | echo "Will cook \"${ITEM}\" at ${TEMP} degrees for ${TIME} minutes." 
8 | else 9 | echo "usage: $(basename $0) ITEM TEMP TIME" 10 | fi 11 | -------------------------------------------------------------------------------- /pie/Makefile: -------------------------------------------------------------------------------- 1 | all: crust.txt filling.txt meringue.txt 2 | ./combine.sh pie.txt crust.txt filling.txt meringue.txt 3 | ./cook.sh pie.txt 375 45 4 | 5 | filling.txt: 6 | ./combine.sh filling.txt lemon butter sugar 7 | 8 | meringue.txt: 9 | ./combine.sh meringue.txt eggwhites sugar 10 | 11 | crust.txt: 12 | ./combine.sh crust.txt flour butter water 13 | 14 | clean: 15 | rm -f crust.txt meringue.txt filling.txt pie.txt 16 | -------------------------------------------------------------------------------- /yeast/download.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -u 4 | 5 | OUT_DIR="fasta" 6 | [[ ! -d "$OUT_DIR" ]] && mkdir -p "$OUT_DIR" 7 | 8 | URLS=$(mktemp) 9 | echo "http://downloads.yeastgenome.org/sequence/S288C_reference/chromosomes/fasta/chrmt.fsa" > "$URLS" 10 | 11 | for i in $(seq 1 16); do 12 | printf "http://downloads.yeastgenome.org/sequence/S288C_reference/chromosomes/fasta/chr%02d.fsa\n" "$i" >> "$URLS" 13 | done 14 | 15 | cd "$OUT_DIR" 16 | wget -nc -i "$URLS" 17 | rm "$URLS" 18 | 19 | echo "Done." 
20 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Ken Youens-Clark 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /yeast/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: all fasta features test clean 2 | 3 | FEATURES = http://downloads.yeastgenome.org/curation/chromosomal_feature/SGD_features.tab 4 | 5 | all: fasta genome chr-count chr-size features gene-count verified-genes uncharacterized-genes gene-types terminated-genes test 6 | 7 | clean: 8 | find . 
\( -name \*gene\* -o -name chr-\* \) -exec rm {} \; 9 | rm -rf fasta SGD_features.tab 10 | 11 | fasta: 12 | ./download.sh 13 | 14 | genome: fasta 15 | (cd fasta && cat *.fsa > genome.fa) 16 | 17 | chr-count: genome 18 | grep -e '^>' "fasta/genome.fa" | grep 'chromosome' | wc -l > chr-count 19 | 20 | chr-size: genome 21 | grep -ve '^>' "fasta/genome.fa" | wc -c > chr-size 22 | 23 | features: 24 | wget -nc $(FEATURES) 25 | 26 | gene-count: features 27 | cut -f 2 SGD_features.tab | grep ORF | wc -l > gene-count 28 | 29 | verified-genes: features 30 | awk -F"\t" '$$3 == "Verified" {print}' SGD_features.tab | \ 31 | wc -l > verified-genes 32 | 33 | uncharacterized-genes: features 34 | awk -F"\t" '$$2 == "ORF" && $$3 == "Uncharacterized" {print $$2}' \ 35 | SGD_features.tab | wc -l > uncharacterized-genes 36 | 37 | gene-types: features 38 | awk -F"\t" '{print $$3}' SGD_features.tab | sort | uniq -c > gene-types 39 | 40 | terminated-genes: 41 | grep -o '/G=[^ ]*' palinsreg.txt | cut -d = -f 2 | \ 42 | sort -u > terminated-genes 43 | 44 | test: 45 | pytest -xv ./test.py 46 | -------------------------------------------------------------------------------- /yeast/test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """tests for yeast/Makefile exercise""" 3 | 4 | import os 5 | import re 6 | from subprocess import getstatusoutput 7 | 8 | 9 | # -------------------------------------------------- 10 | def test_files(): 11 | """files exist, have correct answers""" 12 | 13 | files = [('chr-count', '16'), ('chr-size', '12359733'), 14 | ('gene-count', '6604'), ('verified-genes', '5155'), 15 | ('uncharacterized-genes', '728')] 16 | 17 | for file, answer in files: 18 | assert os.path.isfile(file) 19 | contents = open(file).read().strip() 20 | assert contents == answer 21 | 22 | 23 | # -------------------------------------------------- 24 | def test_terminated_genes(): 25 | """terminated-genes""" 26 | 27 | file = 
'terminated-genes' 28 | assert os.path.isfile(file) 29 | lines = open(file).readlines() 30 | assert len(lines) == 951 31 | 32 | 33 | # -------------------------------------------------- 34 | def test_gene_types(): 35 | """gene-types""" 36 | 37 | file = 'gene-types' 38 | assert os.path.isfile(file) 39 | 40 | expected = { 41 | 'Dubious': '717', 42 | 'Uncharacterized': '728', 43 | 'Verified': '5155', 44 | 'Verified|silenced_gene': '4', 45 | 'silenced_gene': '2', 46 | } 47 | 48 | regex = re.compile(r'^\s*(\d+)\s(.+)$') 49 | for line in open(file): 50 | match = regex.search(line) 51 | if match: 52 | num, gene_type = match.groups() 53 | if gene_type in expected: 54 | assert num == expected[gene_type] 55 | -------------------------------------------------------------------------------- /README.adoc: -------------------------------------------------------------------------------- 1 | = Reproducible commands and workflows with `make` 2 | 3 | The `make` program was created in 1976 to help build executable programs from source code files. 4 | While it was originally developed to assist with programming in the `c` language, it is not limited to that language or even to the task of compiling code. 5 | According to the manual, one "can use it to describe any task where some files must be updated automatically from others whenever the others change." 6 | The `make` program has gone far beyond its role as a build tool to become a workflow system. 7 | 8 | == Makefiles are recipes 9 | 10 | When you run the `make` command, it looks for a file called `Makefile` (or `makefile`) in the current working directory. 11 | This file contains recipes that describe discrete actions that combine to create some output. 12 | Think of how a recipe for a lemon pie has steps that need to be completed in a particular order and combination. 13 | For instance, we need to separately create the crust, filling, and meringue and then put them together and bake them before we can enjoy our tasty treat. 
14 | We can visualize this with something called a "string diagram" like the one below. 15 | 16 | .A string diagram describing how to make a pie. From Brendan Fong and David Spivak, Seven Sketches in Compositionality: An Invitation to Applied Category Theory. 17 | image::images/string.jpg[align="center"] 18 | 19 | It's not really important if you make the pie crust the day before and keep it chilled, and the same might hold true for the filling, but it's certainly true that the crust needs to go into the dish followed by the filling and finally the meringue. 20 | An actual recipe might refer to generic recipes for the crust and meringue in other parts of the book and list the steps just for the lemon filling and baking instructions. 21 | 22 | We can write a `Makefile` to mock up these ideas. 23 | We'll use shell scripts to pretend we're assembling the various ingredients into some output like `crust.txt` and `filling.txt`. 24 | For instance, I've written a `combine.sh` script that expects a file name and a list of "ingredients" to put into the file: 25 | 26 | ---- 27 | $ ./combine.sh 28 | usage: combine.sh FILE ingredients 29 | ---- 30 | 31 | I can pretend to make the "crust" like this: 32 | 33 | ---- 34 | $ ./combine.sh crust.txt flour butter water 35 | ---- 36 | 37 | There is now a `crust.txt` file with the following contents: 38 | 39 | ---- 40 | $ cat crust.txt 41 | Will combine flour butter water 42 | ---- 43 | 44 | It's common for a recipe in a `Makefile` to create an output file such as this, but it's not necessary. 
45 | Note in this example from the `pie` directory that the `clean` target actually removes files rather than making them: 46 | 47 | ---- 48 | all: crust.txt filling.txt meringue.txt <1> 49 | ./combine.sh pie.txt crust.txt filling.txt meringue.txt <2> 50 | ./cook.sh pie.txt 375 45 51 | 52 | filling.txt: <3> 53 | ./combine.sh filling.txt lemon butter sugar 54 | 55 | meringue.txt: <4> 56 | ./combine.sh meringue.txt eggwhites sugar 57 | 58 | crust.txt: <5> 59 | ./combine.sh crust.txt flour butter water 60 | 61 | clean: <6> 62 | rm -f crust.txt meringue.txt filling.txt pie.txt 63 | ---- 64 | 65 | <1> This defines a target called `all`. The first target will be the one run when no target is specified. Convention holds that the `all` target will run _all_ the targets necessary to accomplish some default goal like building a piece of software. Here we want to create the `pie.txt` file from the component files and "cook" it. The name `all` is not as important as it being defined first. The target name is followed by a colon and then any dependencies that must be satisfied before running this target. 66 | <2> The `all` target has two commands to run. Each command is indented with a `Tab` character. 67 | <3> This is the `filling.txt` target. The goal of this target is to create the file called "filling.txt". It's common but not necessary to use the output file name as the target name. This target has just one command which is to combine the ingredients for the filling. 68 | <4> This is the `meringue.txt` target, and it combines the egg whites and sugar. 69 | <5> This is the `crust.txt` target that combines flour, butter, and water. 70 | <6> It's common to have a `clean` target to remove any files that were created in the normal course of building. 71 | 72 | As you can see above, the target (also sometimes called a "rule") has a name followed by a colon. 73 | Any dependencies can be listed after the colon in the order you wish them to be run. 
74 | The actions for a target must be indented with a `Tab` character, and you are allowed to define as many commands as you like. 75 | 76 | image::images/make-target.png[align="center"] 77 | 78 | == Running a specific target 79 | 80 | Each entry in a `Makefile` is called a "target," "rule," or "recipe." 81 | The order of the targets is not important beyond the first target being the default. 82 | Targets can reference other targets defined earlier or later in the file. 83 | 84 | To run a specific target, we can run `make <target>` to have `make` run the commands for a given recipe: 85 | 86 | ---- 87 | $ make filling.txt 88 | ./combine.sh filling.txt lemon butter sugar 89 | ---- 90 | 91 | And now there is a file called `filling.txt`: 92 | 93 | ---- 94 | $ cat filling.txt 95 | Will combine lemon butter sugar 96 | ---- 97 | 98 | If we try to run this target again, we'll be told there's nothing to do because the file already exists: 99 | 100 | ---- 101 | $ make filling.txt 102 | make: `filling.txt' is up to date. 103 | ---- 104 | 105 | One of the reasons for the existence of `make` is precisely not to do extra work to create files unless some underlying source has changed. 106 | In the course of building software or running a pipeline, it may not be necessary to generate some output unless the inputs have changed (such as the source code being modified). 107 | 108 | To force `make` to run the `filling.txt` target, we can either remove that file or run `make clean` to remove any of the files that have been created: 109 | 110 | ---- 111 | $ make clean 112 | rm -f crust.txt meringue.txt filling.txt pie.txt 113 | ---- 114 | 115 | == Running with no target 116 | 117 | If you run the `make` command with no arguments, it will automatically run the first target. 118 | This is the main reason to place the `all` target (or something like it) first. 
119 | Be careful not to put something destructive like a `clean` target first as you might end up accidentally running it and removing valuable data! 120 | 121 | Let's run `make` with the above `Makefile` and see the output: 122 | 123 | ---- 124 | $ make <1> 125 | ./combine.sh crust.txt flour butter water <2> 126 | ./combine.sh filling.txt lemon butter sugar <3> 127 | ./combine.sh meringue.txt eggwhites sugar <4> 128 | ./combine.sh pie.txt crust.txt filling.txt meringue.txt <5> 129 | ./cook.sh pie.txt 375 45 <6> 130 | Will cook "pie.txt" at 375 degrees for 45 minutes. 131 | ---- 132 | 133 | <1> We run `make` with no arguments. It looks for the first target in a file called `Makefile` in the current working directory. 134 | <2> The `crust.txt` recipe is being run first. Because we didn't specify a target, `make` runs the `all` target which is defined first, and this target lists `crust.txt` as its first dependency. 135 | <3> Next the `filling.txt` target is run. 136 | <4> Followed by the `meringue.txt`. 137 | <5> Next we assemble `pie.txt`. 138 | <6> And then we "cook" the pie at 375 degrees for 45 minutes. 139 | 140 | If you run `make` again, you'll see the intermediate steps to produce the `crust.txt`, `filling.txt`, and `meringue.txt` files are skipped because those files already exist. The commands for `all` still run because no file named `all` is ever created: 141 | 142 | ---- 143 | $ make 144 | ./combine.sh pie.txt crust.txt filling.txt meringue.txt 145 | ./cook.sh pie.txt 375 45 146 | Will cook "pie.txt" at 375 degrees for 45 minutes. 147 | ---- 148 | 149 | If you want to force them to be recreated, you can run `make clean && make`: 150 | 151 | ---- 152 | $ make clean && make 153 | rm -f crust.txt meringue.txt filling.txt pie.txt 154 | ./combine.sh crust.txt flour butter water 155 | ./combine.sh filling.txt lemon butter sugar 156 | ./combine.sh meringue.txt eggwhites sugar 157 | ./combine.sh pie.txt crust.txt filling.txt meringue.txt 158 | ./cook.sh pie.txt 375 45 159 | Will cook "pie.txt" at 375 degrees for 45 minutes. 
160 | ---- 161 | 162 | == Makefiles create DAGs 163 | 164 | Each target can specify other targets as prerequisites or dependencies that must be accomplished first. 165 | These actions create a graph structure where there is some starting point and paths through targets to finally create some output file(s). 166 | The path described for any target should be a _directed_ (from a start to a stop) _acyclic_ (having no cycles or infinite loops) _graph_ or a DAG: 167 | 168 | .The targets may join together to describe a directed acyclic graph (DAG) of actions to produce some result. 169 | image::images/flowchart.png[align="center"] 170 | 171 | Many analysis pipelines are just that -- a graph of some input like a FASTA sequence file and some transformations (trimming, filtering, comparisons) into some output (e.g., BLAST hits, gene predictions, functional annotations). 172 | You would be surprised at just how far `make` can be abused to document your work and even create fully functional analysis pipelines! 173 | 174 | == Using `make` to compile a `c` program 175 | 176 | I believe it helps to use `make` for its intended purpose at least once in your life in order to really understand why it exists. 177 | Let's take a moment to write and compile a "Hello, World" example in the `c` language. 178 | 179 | In the `c-hello` directory, you will find a simple `c` program that will print "Hello, World!". 180 | Here is the `hello.c` source code: 181 | 182 | ---- 183 | #include <stdio.h> <1> 184 | int main() { <2> 185 | printf("Hello, World!\n"); <3> 186 | return 0; <4> 187 | } <5> 188 | ---- 189 | 190 | Let's take a moment to learn just enough `c` to be dangerous, going line by line: 191 | 192 | <1> Unlike in `bash`, the `#` character does not introduce a comment in the `c` language. A line that starts with `#` is a directive to the `c` preprocessor, and `#include` is the directive that allows external modules of code to be used. 
Here, we want to use the `printf` (print-format) function that we saw in the previous chapter, so we need to `include` the standard I/O (input/output) module called `stdio`. We actually only need to include the "header" file, `stdio.h`, to get at the function definitions in that module. This is a standard module, and the `c` compiler will look in various locations for any included files to find it. There may come times when you are unable to compile `c` (or `c++`) programs from source code because some header file cannot be found. For example, the `zlib` library is often used to de/compress `gzip` data, but it is not always installed in a library form that other programs may `include` in this way. Therefore you will have to download and install `zlib`, being sure to install the headers into the proper `include` directories. Note that package managers like `apt-get` and `yum` often have `-dev` or `-devel` packages that you have to install to get these headers, e.g., on a Debian-based system you would install both `zlib1g` and `zlib1g-dev`. 193 | <2> This is the start of a function definition in `c`. The `int` (an "integer") is the type of the value returned by the function called `main()`. The parentheses `()` list the parameters to the function. There are none, so the parens are empty. The opening curly brace `{` shows the start of the code that belongs to the function. Note that `c` will automatically execute the `main()` function, and every `c` program must have a `main()` function where the program starts. 194 | <3> The `printf()` function will print the given string to the command line. This function is defined in the `stdio` library, which is why we need to `#include` the header file above. 195 | <4> `return` will exit the function and return the value `0`. Since this is the return value for the `main()` function, this will be the exit value for the entire program. The value `0` indicates that the program ran normally -- think "zero errors." Any non-zero value would indicate a failure. 
196 | <5> This curly brace `}` is the closing mate for the one on line 2 and marks the end of the `main()` function. 197 | 198 | To turn that into an executable program you will need to have a `c` compiler on your machine. 199 | We can use `gcc` (the GNU C compiler) with this command: 200 | 201 | ---- 202 | $ gcc hello.c 203 | ---- 204 | 205 | That will create a file called `a.out` which is an executable file. 206 | On my Mac, this is what `file` will report: 207 | 208 | ---- 209 | $ file a.out 210 | a.out: Mach-O 64-bit executable x86_64 211 | ---- 212 | 213 | And I can execute that: 214 | 215 | ---- 216 | $ ./a.out 217 | Hello, World! 218 | ---- 219 | 220 | I don't like the name `a.out`, though, so I can use the `-o` option to name the output file `hello`: 221 | 222 | ---- 223 | $ gcc -o hello hello.c 224 | ---- 225 | 226 | Run the resulting `hello` executable. 227 | You should see the same output. 228 | 229 | Rather than typing `gcc -o hello hello.c` every time I modify the `hello.c`, I can put that as a "target" into a `Makefile`: 230 | 231 | ---- 232 | hello: 233 | gcc -o hello hello.c 234 | ---- 235 | 236 | And now I can type `make hello` or just `make` if this is the first target: 237 | 238 | ---- 239 | $ make 240 | gcc -o hello hello.c 241 | ---- 242 | 243 | If I run `make` again, nothing happens because the `hello` file already exists: 244 | 245 | ---- 246 | $ make 247 | make: `hello' is up to date. 248 | ---- 249 | 250 | Alter your `hello.c` file to print "Hola" instead of "Hello," and then try running `make` again: 251 | 252 | ---- 253 | $ make 254 | make: `hello' is up to date. 255 | ---- 256 | 257 | Because the `hello` target lists no dependencies, `make` only checks that the `hello` file exists and does not notice that `hello.c` has changed. We can force `make` to run the targets using the `-B` option: 258 | 259 | ---- 260 | $ make -B 261 | gcc -o hello hello.c 262 | ---- 263 | 264 | And now our new program has been compiled: 265 | 266 | ---- 267 | $ ./hello 268 | Hola, World! 
269 | ---- 270 | 271 | This is clearly a trivial example, and you may be wondering how this is actually a time saver. 272 | A real-world project in `c` or any language would likely have multiple `.c` files with headers (`.h` files) describing their functions so that they could be used by other `.c` files. 273 | The `c` compiler would need to turn each `.c` file into a `.o` ("object") file and then link them together into a single executable. 274 | Imagine you have dozens of `.c` files, and you change one line of code in one file. 275 | Do you want to type dozens of commands to recompile and link all your code? 276 | Of course not! 277 | You would build a tool to automate those actions for you. 278 | 279 | We can add targets to the `Makefile` that don't generate new files. 280 | It's common to have a `clean` target that will clean up files and directories that we no longer need. 281 | Here I can create a `clean` target to remove the `hello` executable. 282 | 283 | ---- 284 | clean: 285 | rm -f hello 286 | ---- 287 | 288 | If I want to be sure that the executable is removed before every run of the `hello` target, I can add it as a dependency: 289 | 290 | ---- 291 | hello: clean 292 | gcc -o hello hello.c 293 | ---- 294 | 295 | It's good form to document for `make` that this is a "phony" target because the result of the target is not a new file to "make." 296 | We use the `.PHONY:` target and list all the phonies. 297 | Here is our complete `Makefile` now: 298 | 299 | ---- 300 | $ cat Makefile 301 | .PHONY: clean 302 | 303 | hello: clean 304 | gcc -o hello hello.c 305 | 306 | clean: 307 | rm -f hello 308 | ---- 309 | 310 | If you run `make` in the `c-hello` directory with this `Makefile`, you should see this: 311 | 312 | ---- 313 | $ make 314 | rm -f hello 315 | gcc -o hello hello.c 316 | ---- 317 | 318 | And there should now be a `hello` executable in your directory that you can run: 319 | 320 | ---- 321 | $ ./hello 322 | Hello, World! 
323 | ---- 324 | 325 | Notice that the `clean` target can be listed as a dependency to the `hello` target even _before_ the target itself is mentioned. 326 | `make` will read the entire file and then use the dependencies to resolve the graph. 327 | If you were to put "foo" as an additional dependency to `hello` and then try running `make` again, you would see this: 328 | 329 | ---- 330 | $ make 331 | make: *** No rule to make target `foo', needed by `hello'. Stop. 332 | ---- 333 | 334 | When we write `bash` programs, the program is executed from the top to the bottom, each statement one after the other. 335 | The `Makefile` allows us to write independent groups of actions that are ordered by their dependencies. 336 | They are much like _functions_ in a higher-level language. 337 | We have essentially written a program whose output is ... a program. 338 | 339 | I'd encourage you to `cat hello` to see what the `hello` program "looks" like. 340 | It's mostly binary information that will look like gibberish, but you will probably be able to make out some plain English, too. 341 | You can also use `strings hello` to extract just the "strings" of text. 342 | 343 | == Using `make` for a shortcut 344 | 345 | Let's look at how we can abuse Makefiles to create shortcuts for commands. 346 | Here we will say "Hello, World!" on the command line using the `echo` command: 347 | 348 | ---- 349 | .PHONY: hello <1> 350 | 351 | hello: <2> 352 | echo "Hello, World!" <3> 353 | ---- 354 | 355 | <1> Since the `hello` target doesn't actually produce a file, we list it as a "phony" target. 356 | <2> This is the `hello` target. The name of the target should contain no spaces (letters, numbers, and characters such as `.`, `-`, and `_` are safe choices), should have no spaces before it, and is followed by a colon (`:`). 357 | <3> The command(s) to run for the `hello` target are listed on lines that are indented with a tab character. 358 | 359 | I often use a `Makefile` only to remember how to invoke a command with various arguments. 
360 | That is, I might write an analysis pipeline and then document how to run the program on various data sets with all their parameters. 361 | In this way I'm documenting my work in a way that I can immediately reproduce it simply by running the target! 362 | 363 | == Defining variables 364 | 365 | Here is an example of a `Makefile` I wrote to document how I ran the Centrifuge app for making taxonomic assignments to short reads: 366 | 367 | ---- 368 | INDEX_DIR = /data/centrifuge-indexes <1> 369 | 370 | clean_paired: 371 | rm -rf $(HOME)/work/data/centrifuge/paired-out 372 | 373 | paired: clean_paired <2> 374 | ./run_centrifuge.py \ <3> 375 | -q $(HOME)/work/data/centrifuge/paired \ <4> 376 | -I $(INDEX_DIR) \ <5> 377 | -i 'p_compressed+h+v' \ 378 | -x "9606, 32630" \ 379 | -o $(HOME)/work/data/centrifuge/paired-out \ 380 | -T "C/Fe Cycling" 381 | ---- 382 | 383 | <1> Here I define the variable `INDEX_DIR` and assign a value. The spaces on either side of the `=` are optional, but I find they make the assignment easier to read. I prefer ALL_CAPS for my variable names, but this is just personal preference. 384 | <2> Run the `clean_paired` target prior to running this target. This ensures that there is no leftover output from a previous run. 385 | <3> This action is long, so I used backslashes `\` as on the command line to indicate the command continues to the next line. 386 | <4> To have `make` use the value of the variable, you dereference it like `$(VAR)`. Here we can use the environment variable `$HOME` as `$(HOME)`. 387 | <5> The `$(INDEX_DIR)` refers to the variable defined at the top. 388 | 389 | == Writing a workflow 390 | 391 | Following is an example of how to write a workflow as `make` targets. 392 | The goal is to download the yeast genome and characterize various gene types as "Dubious," "Uncharacterized," "Verified," and such. 
393 | This is accomplished with a collection of command-line tools such as `wget`, `grep`, and `awk` combined with a custom shell script called `download.sh` all pieced together and run in order by `make`: 394 | 395 | ---- 396 | .PHONY: all fasta features test clean 397 | 398 | FEATURES = http://downloads.yeastgenome.org/curation/chromosomal_feature/SGD_features.tab 399 | 400 | all: fasta genome chr-count chr-size features gene-count verified-genes uncharacterized-genes gene-types terminated-genes test 401 | 402 | clean: 403 | find . \( -name \*gene\* -o -name chr-\* \) -exec rm {} \; 404 | rm -rf fasta SGD_features.tab 405 | 406 | fasta: 407 | ./download.sh 408 | 409 | genome: fasta 410 | (cd fasta && cat *.fsa > genome.fa) 411 | 412 | chr-count: genome 413 | grep -e '^>' "fasta/genome.fa" | grep 'chromosome' | wc -l > chr-count 414 | 415 | chr-size: genome 416 | grep -ve '^>' "fasta/genome.fa" | wc -c > chr-size 417 | 418 | features: 419 | wget -nc $(FEATURES) 420 | 421 | gene-count: features 422 | cut -f 2 SGD_features.tab | grep ORF | wc -l > gene-count 423 | 424 | verified-genes: features 425 | awk -F"\t" '$$3 == "Verified" {print}' SGD_features.tab | \ 426 | wc -l > verified-genes 427 | 428 | uncharacterized-genes: features 429 | awk -F"\t" '$$2 == "ORF" && $$3 == "Uncharacterized" {print $$2}' \ 430 | SGD_features.tab | wc -l > uncharacterized-genes 431 | 432 | gene-types: features 433 | awk -F"\t" '{print $$3}' SGD_features.tab | sort | uniq -c > gene-types 434 | 435 | terminated-genes: 436 | grep -o '/G=[^ ]*' palinsreg.txt | cut -d = -f 2 | \ 437 | sort -u > terminated-genes 438 | 439 | test: 440 | pytest -xv ./test.py 441 | ---- 442 | 443 | I won't bother commenting on all the commands. 444 | Mostly I want to demonstrate how far we can abuse a `Makefile` to create a workflow. 445 | Not only have we documented all the steps, but they are _runnable_ with nothing more than the command `make`! 
446 | Without `make`, we'd have to write a shell script to accomplish this or, more likely, move to a more powerful language like Python. 447 | The resulting program written in either language would probably be longer, buggier, and more difficult to understand. 448 | Sometimes, all you really need is a `Makefile` and some shell commands. 449 | 450 | == Other workflow managers 451 | 452 | As you bump up against the limitations of `make`, you may choose to move to a workflow manager. 453 | There are literally hundreds to choose from including: 454 | 455 | * Snakemake, which extends the basic concepts of `make` with Python. 456 | * The Common Workflow Language (CWL) defines workflows and parameters in a configuration file (in YAML), and you use tools like `cwltool` or `cwl-runner` (both implemented in Python) to execute the workflow with another configuration file that describes the arguments. 457 | * The Workflow Description Language (WDL) takes a similar approach to describing workflows and arguments and can be run with the Cromwell engine. 458 | * Pegasus allows you to use Python code to describe a workflow that then is written to an XML file that is the input for the engine that will run your code. 459 | * Nextflow is similar in that you use a full programming language called "Groovy" (a language that runs on the Java Virtual Machine) to write a workflow that can be run by their engine. 460 | 461 | All of these systems have the same basic ideas as `make`, so understanding how `make` works and how to write the pieces of your workflow and how they interact is the basis for any larger analysis workflow you may create. 
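
To make that shared idea concrete, here is a minimal Python sketch of what `make` and its descendants do at their core: visit a target's prerequisites first, then run the target's action only if its file is missing or older than a prerequisite. This is only an illustration under simplifying assumptions. The names `needs_update`, `build`, `combine`, and `demo` are my own inventions for this example and are not part of `make` or of any tool listed above, and the real programs handle far more (phony targets, pattern rules, parallel jobs).

```python
import os
import tempfile


def needs_update(target, prereqs):
    """True if the target file is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(p) > built for p in prereqs)


def build(target, rules, ran=None, active=None):
    """Bring `target` up to date; return the targets whose action ran."""
    ran = [] if ran is None else ran
    active = set() if active is None else active
    if target not in rules:
        # A plain source file: it must already exist, as with make.
        if not os.path.exists(target):
            raise ValueError(f"No rule to make target {target!r}")
        return ran
    if target in active:
        raise ValueError(f"cycle detected at {target!r}")
    prereqs, action = rules[target]
    active.add(target)
    for prereq in prereqs:  # walk the graph, prerequisites first
        build(prereq, rules, ran, active)
    active.discard(target)
    if needs_update(target, prereqs):
        action(target, prereqs)
        ran.append(target)
    return ran


def combine(target, ingredients):
    """Hypothetical stand-in for combine.sh: record what went into the file."""
    with open(target, "w") as out:
        out.write("Will combine " + " ".join(ingredients) + "\n")


def demo():
    """Build the pie twice in a scratch directory; return what ran each time."""
    rules = {
        "crust.txt": ([], lambda t, p: combine(t, ["flour", "butter", "water"])),
        "filling.txt": ([], lambda t, p: combine(t, ["lemon", "butter", "sugar"])),
        "meringue.txt": ([], lambda t, p: combine(t, ["eggwhites", "sugar"])),
        "pie.txt": (["crust.txt", "filling.txt", "meringue.txt"], combine),
    }
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            first = build("pie.txt", rules)
            second = build("pie.txt", rules)
        finally:
            os.chdir(start)
    return first, second


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns two lists: the first names all four files in dependency order, and the second is empty because nothing is out of date on the second pass. Note one deliberate difference from the `pie/Makefile`: the sketch uses `pie.txt` itself as the final target rather than `all`, so the final step is also skipped when nothing has changed.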
462 | 463 | == Further reading 464 | 465 | Here are some other resources you can use to learn about `make`: 466 | 467 | * Online manual: https://www.gnu.org/software/make/manual/make.html 468 | * _The GNU Make Book_ by John Graham-Cumming, No Starch Press, 2015: https://nostarch.com/gnumake 469 | * _Managing Projects with GNU Make_ by Robert Mecklenburg, O'Reilly, 2004: http://shop.oreilly.com/product/9780596006105.do 470 | --------------------------------------------------------------------------------