├── Makefile
├── README.md
├── Unix_Shell_Handout.md
├── Unix_Shell_Handout.pdf
├── Unix_Shell_cheat_sheet.md
├── Unix_Shell_cheat_sheet.pdf
└── by.png

/Makefile:
--------------------------------------------------------------------------------
pdf:
	pandoc \
	-o Unix_Shell_Handout.pdf \
	--toc \
	--pdf-engine xelatex \
	--variable mainfont="DejaVu Sans" \
	--variable sansfont="DejaVu Sans" \
	-V geometry:"top=2cm, bottom=2.0cm, left=2.5cm, right=2.5cm" \
	Unix_Shell_Handout.md

	pandoc \
	-o Unix_Shell_cheat_sheet.pdf \
	--toc \
	--pdf-engine xelatex \
	--variable mainfont="DejaVu Sans" \
	--variable sansfont="DejaVu Sans" \
	-V geometry:"top=2cm, bottom=2.0cm, left=2.5cm, right=2.5cm" \
	Unix_Shell_cheat_sheet.md


example_files:
	mkdir -p unix_course_files
	echo "This file\ncontains three\nlines." \
	> unix_course_files/three_lines.txt
	echo "This file\ncontains two lines." \
	> unix_course_files/two_lines.txt
	echo "999\n1\n55\n7777\n3\n42\n555\n23" \
	> unix_course_files/unsorted_numbers.txt
	echo "ATGTGGTAGTAGTATGAAATGTGA" \
	> unix_course_files/DNA.txt
	echo "Name\tStart\tStop\tStrand" \
	> unix_course_files/genes.csv
	echo "dnaA\t1\t1416\t+" \
	>> unix_course_files/genes.csv
	echo "gyrA\t6479\t8908\t+" \
	>> unix_course_files/genes.csv
	echo "rpsF\t29330\t29788\t+" \
	>> unix_course_files/genes.csv
	echo "yidC\t3986072\t3987691\t-" \
	>> unix_course_files/genes.csv
	echo "tRNA\ntRNA\ntRNA\nrRNA\nrRNA\nmRNA\nmRNA\nmRNA\nmRNA" \
	> unix_course_files/redundant.txt
	wget -cO unix_course_files/origin_of_species.txt \
	https://archive.org/stream/originofspecies00darwuoft/originofspecies00darwuoft_djvu.txt

new_release:
	@echo "* Commit changes e.g. 'git commit -m \"Set version to 1.0\"'"
	@echo "* Tag the commit e.g. 'git tag -a v1.0 -m \"version v1.0\"'"
	@echo "* Generate a new release based on this tag at"
	@echo "  https://github.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/releases"

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![DOI](https://zenodo.org/badge/2756/konrad/Introduction_to_the_Unix_Shell_for_biologists.svg)](https://zenodo.org/badge/latestdoi/2756/konrad/Introduction_to_the_Unix_Shell_for_biologists)

## Introduction to the Unix Shell for biologists

See Unix_Shell_Handout.md for the main document.

![CC-BY](by.png)

This work by Konrad Förstner is licensed under a [Creative Commons
Attribution 4.0 International
License](https://creativecommons.org/licenses/by/4.0/).

--------------------------------------------------------------------------------
/Unix_Shell_Handout.md:
--------------------------------------------------------------------------------
% Introduction to the Unix shell for biologists
% Konrad U. Förstner
%

![CC-BY](by.png)\

This work by Konrad Förstner is licensed under a [Creative Commons
Attribution 4.0 International
License](https://creativecommons.org/licenses/by/4.0/).

# Motivation and background

In this course you will learn the basics of how to use the Unix
shell. Unix is a class of operating systems with many different
flavors including well-known ones like GNU/Linux and the BSDs. The
development of Unix and its shell (also known as command line
interface) dates back to the late 1960s. Still, their concepts lead to
very powerful tools. In the command line you can easily combine
different tools into pipelines, avoid repetitive work and make your
workflow reproducible.
Knowing how to use the shell will also enable
you to run programs that are only developed for this environment which
is the case for many bioinformatical tools.

# Create and download some test files

Use the `Makefile` of this repo and run

```
$ make example_files
```

This should create a folder `unix_course_files` that contains several
example files.

# The basic anatomy of a command line call

Running a tool in the command line interface follows a simple
pattern. At first you have to write the name of the command (if it is
not globally installed its precise location needs to be given - we
will get to this later). Some programs additionally require
parameters. While the parameters are the requirement of the program
the actual values we give are called arguments. There are two
different ways to pass those arguments to a program - via keyword
parameters (also called named keywords, flags or options) or via
positional parameters. The common pattern looks like this (`<>`
indicates obligatory items, `[]` indicates optional items):

```
<command> [keyword parameters] [positional parameters]
```

An example is calling the program `ls` which lists the content of a
directory. You can simply call it without any argument

```
$ ls
```

or with one or more keyword arguments

```
$ ls -l
$ ls -lh
```

or with one or more positional arguments

```
$ ls test_folder
```

or with one or more keyword and positional arguments

```
$ ls -l test_folder
```

The result of a command is usually written to the so-called *standard
output* of the shell which is the screen shown to you. We will later
learn how to redirect this e.g. to the *standard input* of another
program.
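
The four call variants above can be tried out safely in a scratch
folder. The following is only a minimal sketch: the folder name
`anatomy_demo` and the file names are made up for this illustration
and are not part of the course files, and the commands `mkdir` and
`touch` are only used here to have something to list.

```shell
# Hypothetical scratch folder with two empty files (names are assumptions)
mkdir -p anatomy_demo/test_folder
touch anatomy_demo/test_folder/a.txt anatomy_demo/test_folder/b.txt

# No argument: list the current folder
ls

# Keyword arguments only: long listing with human-readable sizes
ls -l -h

# Positional argument only: list the given folder
ls anatomy_demo/test_folder

# Keyword and positional argument combined
ls -l anatomy_demo/test_folder
```

Note that `ls -l -h` and `ls -lh` are two spellings of the same call;
single-letter keyword parameters can usually be merged this way.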

# How to get help and documentation

Especially in the beginning you will have a lot of questions about what a
command does and which arguments and parameters need to be given. One
rule before using a command or before asking somebody about it is
called [RTFM](https://en.wikipedia.org/wiki/RTFM) (please check the
meaning yourself). Maybe the most important command is `man` which
stands for *manual*. Most commands offer a manual and with `man` you
can read those. To get the documentation of `ls` type

```
$ man ls
```

To close the manual use `q`. Additionally or alternatively many tools
offer some help via the parameter `-h`, `-help` or `--help`. For
example `ls`:

```
$ ls --help
```

Other tools present this help if they are called without any parameters
or arguments.

# Bash keyboard shortcuts

There are different implementations of the Unix shell. You are
currently working with Bash (**B**ourne-**a**gain **sh**ell). Bash has several keyboard shortcuts that
improve the interaction. Here is a small selection:

* Ctrl-c - Stop the command
* ↑ (or Ctrl-p) - Go backward in command history
* ↓ (or Ctrl-n) - Go forward in command history
* Ctrl-a - Jump to the beginning of a line
* Ctrl-e - Jump to the end of a line
* Ctrl-u - Remove everything before the cursor position
* Ctrl-k - Remove everything after the cursor position
* Ctrl-l - Clear the screen
* Ctrl-r - Search in command history
* Tab - Complete commands and file/folder names

# Files, folders, locations

Topics:

* `ls`
* `pwd`
* `cd`
* `mkdir`
* Relative vs. absolute path
* `~/`

In this part you will learn how to navigate through the file system,
explore the content of folders and create folders.

At first we need to know where we are. If you open a new terminal you
should be in your home directory (we will explain this below). To test
this, call the program `pwd` which stands for **p**rint **w**orking **d**irectory.

```
$ pwd
/home/ubuntu
```

The default user of the Ubuntu live system is called `ubuntu`. In
general each user has a folder named after their user name located inside
the folder `home`. The next command we need and which has been
already mentioned above is `ls`. It simply lists the content of a
folder. If you call it without any arguments it will output the content
of the current folder. Using `ls` we want to get a rough overview of what
a common Unix file system tree looks like and learn how to address
files and folders. The root folder of a system is `/`. Call

```
$ ls /
```

to see the content of the root folder. You should see something like

```
bin   data  etc   lib    lost+found  mnt  proc  run   srv  tmp  var
boot  dev   home  lib64  media       opt  root  sbin  sys  usr
```

There are several subfolders in the so-called root folder (and yes, to
make it a little bit confusing there is even a folder called `root` in
the root folder). Those are more important if you are the
administrator of the system. Normal users do not have the permission to
make changes here. Currently your home directory is your little
universe in which you can do whatever you want. In here we will
learn how to work with paths. A file or folder can be addressed
either with its *absolute* or *relative path*. As you have
created the test data you should have a
folder `unix_course_files` located in your home folder. Assuming you are in
your home folder (`/home/ubuntu/`) the relative path to that folder is simply
`unix_course_files`.
You can get the content of the folder listed by
calling `ls` like this:

```
$ ls unix_course_files
```

This is the so-called *relative path* as it is relative to the current working
directory `/home/ubuntu/`. The *absolute path* would start with a `/`
and is `/home/ubuntu/unix_course_files`. Call `ls` like this:

```
$ ls /home/ubuntu/unix_course_files
```

There are some conventions regarding *relative* and *absolute paths*. One
is that a dot (`.`) represents the current folder. The command

```
$ ls ./
```

should return the same as simply calling

```
$ ls
```

Two dots (`..`) represent the parent folder. If you call

```
$ ls ../
```

you should see the content of `/home`. If you call

```
$ ls ../../
```

you should see the content of the parent folder of the parent folder which
is the root folder (`/`) assuming you are in `/home/ubuntu/`. Another
convention is that `~/` represents the home directory of the user. The
command

```
$ ls ~/
```

should list the content of your home directory independent of your
current location in the file system.

Now that we know where we are and what is there we can start to change
our location. For this we use the command `cd` (change directory). If
you are in your home directory `/home/ubuntu/` you can go into the
folder `unix_course_files` by typing

```
$ cd unix_course_files
```

After that call `pwd` to make sure that you are in the correct folder.

```
$ pwd
/home/ubuntu/unix_course_files
```

To go back into your home directory you have different options.
Use the *absolute path*

```
$ cd /home/ubuntu/
```

or the above mentioned convention for the home directory `~/`:

```
$ cd ~/
```

or the *relative path*, in this case the parent directory of
`/home/ubuntu/unix_course_files`:

```
$ cd ../
```

As the home directory is such an important place `cd` uses this as
default argument. This means if you call `cd` without an argument you will
go to the home directory. Test this behavior by calling

```
$ cd
```

Try now to go to different locations in the file system and list the
files and folders located there.

Now we will create our first folder using the command `mkdir` (*make
directory*). Go into the home directory and type:

```
$ mkdir my_first_folder
```

Here we can discuss the implementation of another Unix philosophy: "No
news is good news." The command successfully created the folder
`my_first_folder`. You can check this by calling `ls`, but `mkdir` did
not tell you this. If you do not get a message this usually means
everything went fine. If you call the above `mkdir` command again you
should get an error message like this:

```
$ mkdir my_first_folder
mkdir: cannot create directory ‘my_first_folder’: File exists
```

So if a command does not complain you can usually assume there was no
error.

# Manipulating files and folders

Topics:

* `touch`
* `cp`
* `mv`
* `rm`

Next we want to manipulate files and folders. We create some dummy
files using `touch` which is usually used to change the time stamps of
files. But you can also create empty files with it easily.
Let's create a file called `test_file_1.txt`:

```
$ touch test_file_1.txt
```

Use `ls` to check that it was created.

The command `cp` (*copy*) can be used to copy files. For this it
requires at least two arguments: the source and the target file. In
the following example we generate a copy of the file `test_file_1.txt`
called `a_copy_of_test_file.txt`.

```
$ cp test_file_1.txt a_copy_of_test_file.txt
```

Use `ls` to confirm that this worked. We can also copy the file into the
folder `my_first_folder` which we have created above:

```
$ cp test_file_1.txt my_first_folder
```

Now there should be also a file `test_file_1.txt` in the folder
`my_first_folder`. If you want to copy a folder and its content you
have to use the parameter `-r`.

```
$ cp -r my_first_folder a_copy_of_my_first_folder
```

You can use the command `mv` (*move*) to rename or relocate files
or folders. To rename the file `a_copy_of_test_file.txt` to
`test_file_with_new_name.txt` call

```
$ mv a_copy_of_test_file.txt test_file_with_new_name.txt
```

With `mv` you can also move a file into a folder. For this the second
argument has to be a folder. For example, to move the file now named
`test_file_with_new_name.txt` into the folder `my_first_folder` use

```
$ mv test_file_with_new_name.txt my_first_folder
```

You are not limited to one file if you want to move them into a
folder. Let's create and move two files `file1` and `file2` into the
folder `my_first_folder`.

```
$ touch file1 file2
$ mv file1 file2 my_first_folder
```

At this point we can introduce another handy feature most shells offer
which is called *globbing*.
Let us assume you want to apply the same
command to several files. Instead of explicitly writing all the file
names you can use a *globbing pattern* to address them. There are
different wildcards that can be used for these patterns. The most
important one is the asterisk (`*`). It can replace none, one or more
characters. Let us explore this with a small example:

```
$ touch file1.txt file2.txt file3
$ ls *txt
$ mv *txt my_first_folder
```

The `ls` shows the two files matching the given pattern
(i.e. `file1.txt` and `file2.txt`) while dismissing the one not
matching (i.e. `file3`). Same for `mv` - it will only move the
files ending with `txt`.

We accumulated several test files that we do not need anymore. Time to clean
up a little bit. With the command `rm` (*remove*) you can delete files
and folders. Please be aware that there is no such thing as a trash
bin if you remove items this way. They will be gone for good and without further notice.

To delete a file in `my_first_folder` call:

```
$ rm my_first_folder/file1.txt
```

To remove a folder and everything in it use the parameter `-r` (*recursive*):

```
$ rm -r my_first_folder
```

Alternatively, if the folder is empty, you can use the command `rmdir`:

```
$ rmdir my_first_folder
```

# File content - part 1

Topics:

* `less` / `more`
* `cat`
* `echo`
* `head`
* `tail`
* `cut`

Until now we did not care about the content of the files. This will
change now. Please go into the folder `unix_course_files`:

```
$ cd unix_course_files
```

There should be some files waiting for you. To read the content with
the possibility to scroll around we need a so-called pager
program.
Most Unix systems offer the programs `more` and `less` which
have very similar functionalities ("more or less are more or less the
same"). We will use the latter one here. Let's open the file
`origin_of_species.txt`

```
$ less origin_of_species.txt
```

The file contains Charles Darwin's *Origin of species* in plain
text. You can scroll up and down line-wise using the arrow keys or page-wise
using the page-up/page-down keys. To quit use the key `q`. With
pager programs you can read file content interactively, but sometimes
you just want to have the content of a file given to you (i.e. on the
*standard output*). The command `cat` (*concatenate*) does that for one
or more files. Let us use it to see what is in the example file
`two_lines.txt`. Assuming you are in the folder `unix_course_files`
you can call

```
$ cat two_lines.txt
```

The content of the file is shown to you. You can apply the command to
two files and the content is concatenated and returned:

```
$ cat two_lines.txt three_lines.txt
```

This is a good time to introduce the *standard input* and *standard
output* and what you can do with them. Above I wrote the output is given
to you. This means it is written to the so-called *standard
output*. You can redirect the *standard output* into a file by using
`>`. Let us use the call above to generate a new file that contains
the combined content of both files:

```
$ cat two_lines.txt three_lines.txt > five_lines.txt
```

Please have a look at the content of this file:

```
$ cat five_lines.txt
```

The *standard output* can also be redirected to other tools as
*standard input*. More about this below. With `cat` we can reuse the
existing file content.
To create something new we use the command
`echo` which writes a given string to the standard output.

```
$ echo "Something very creative"
```

To redirect the output into a target file use `>`.

```
$ echo "Something very creative." > creative.txt
```

Be aware that this can be dangerous. You will overwrite the content of an
existing file. For example if you now call

```
$ echo "Something very uncreative." > creative.txt
```

there will be only the latest string written to the file and the
previous one will be overwritten. To append the output of a command to a
file without overwriting the content use `>>`.

```
$ echo "Something very creative." > creative.txt
$ echo "Something very uncreative." >> creative.txt
```

Now `creative.txt` should contain two lines.

Sometimes you just want to get an excerpt of a file e.g. just the
first or last lines of it. For this the commands `head` and `tail` can
be used. Per default 10 lines are shown. You can use the parameter
`-n <NUMBER>` (e.g. `-n 20`) or just `-<NUMBER>` (e.g. `-20`) to specify the
number of lines to be displayed. Test the tools with the file
`origin_of_species.txt`:

```
$ head origin_of_species.txt
$ tail origin_of_species.txt
```

You can not only select vertically but also horizontally using the
command `cut`. Let us extract only the first 10 characters of each line
in the file `origin_of_species.txt`:

```
$ cut -c 1-10 origin_of_species.txt
```

The tool `cut` can be very useful to extract certain columns from CSV
files (*comma/character separated values*). Have a look at the content of the
file `genes.csv`. You see that it contains different columns that are
tab-separated.
You can extract selected columns with `cut`:

```
$ cut -f 1,4 genes.csv
```

# File content - part 2

Topics:

* `wc`
* `sort`
* `uniq`
* `grep`

There are several tools that let you manipulate the content of a plain
text file or return information about it. If you want for example some
statistics about the number of characters, words and lines use the
command `wc`. Let us count the number of lines in the file
`origin_of_species.txt`:

```
$ wc -l origin_of_species.txt
```

You can use the command `sort` to sort a file alpha-numerically. Test
the following calls

```
$ sort unsorted_numbers.txt
$ sort -n unsorted_numbers.txt
$ sort -rn unsorted_numbers.txt
```

and try to understand the output.

The tool `uniq` takes a sorted list of lines and removes line-wise the
redundancy. Please have a look at the content of the file
`redundant.txt`. Then use `uniq` to generate a non-redundant list:

```
$ uniq redundant.txt
```

If you call `uniq` with `-c` you get the number of occurrences for each
remaining entry:

```
$ uniq -c redundant.txt
```

With the tool `grep` you can extract lines that match a given
pattern. For instance, if you want to find all lines in
`origin_of_species.txt` that contain the word `species` call

```
$ grep species origin_of_species.txt
```

As you can see we only get the lines that contain `species` but not
the ones that contain `Species`. To make the search case-insensitive
use the parameter `-i`.

```
$ grep -i species origin_of_species.txt
```

If you are only interested in the number of lines that match the pattern
use `-c`:

```
$ grep -ic species origin_of_species.txt
```

# Connecting tools

Another piece of the Unix philosophy is to build small tools that do
one thing optimally and use the standard input and standard
output. The real power of Unix builds on the capability to easily
connect tools. For this so-called *pipes* are used. To use the
*standard output* of one tool as *standard input* of another tool the
vertical bar `|` is used. For example, in order to extract the first
1000 lines from `origin_of_species.txt`, search for lines that contain
`species`, then search in those lines the ones which contain `wild`
and finally replace the `w`s by `m`s call (Please write this in one line
in the shell and remove the `\`):

```
$ head -n 1000 origin_of_species.txt | grep species \
  | grep wild | tr w m
```

# Repeating commands using the `for` loop

Assume you want to generate a copy of each of your files ending with `.txt`. A call like

```
$ cp *txt copy_of_*txt
```

would not work.

With `for` loops you can solve this problem. Let's start with a simple
one.

```
$ for FILE in three_lines.txt two_lines.txt
> do
> head -n 1 $FILE
> done
```

The variable `FILE` (you can give it also any other name) can be used
inside of the loop.

If you now press ↑ you will get the line

```
for FILE in three_lines.txt two_lines.txt; do head -n 1 $FILE; done
```

which is equivalent to the call before.
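
To convince yourself that the multi-line and the one-line form of a
loop behave the same, you can write the output of both into files and
compare them. This is only a sketch: the folder `loop_demo`, its file
names and the use of `printf` and `diff` are assumptions made for this
illustration and are not part of the course files.

```shell
# Hypothetical scratch folder with two small text files
mkdir -p loop_demo
printf 'alpha\nbeta\n' > loop_demo/first.txt
printf 'gamma\ndelta\n' > loop_demo/second.txt

# Multi-line form; the output of the whole loop is redirected into a file
for FILE in loop_demo/first.txt loop_demo/second.txt
do
    head -n 1 "$FILE"
done > loop_demo/multi_line.out

# One-line form of the same loop
for FILE in loop_demo/first.txt loop_demo/second.txt; do head -n 1 "$FILE"; done > loop_demo/one_line.out

# diff prints nothing and succeeds if both files have the same content
diff loop_demo/multi_line.out loop_demo/one_line.out && echo "identical output"
```

The quotes around `$FILE` are not strictly needed for these file names,
but they keep the loop working if a file name contains spaces.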
You can not only call one
command inside of a loop but several:

```
$ for FILE in three_lines.txt two_lines.txt
> do
> head -n 1 $FILE
> echo "-----------------"
> done
```

```
$ for FILE in *txt
> do
> head -n 1 $FILE
> echo "-----------------"
> done
```

```
$ for FILE in *txt
> do
> cp $FILE copy_of_$FILE
> done
```

# Shell scripting

Open a new file in a text editor of your choice, call it
`count_lines.sh` and add the following text:

```
echo "Number of lines in the given file:"
wc -l origin_of_species.txt
```

Save the file, make sure the file `origin_of_species.txt` is in the
same folder and run the script:

```
$ bash count_lines.sh
```

You should get something like

```
Number of lines in the given file:
15322 origin_of_species.txt
```

This is a very first shell script. Now we want to make it more
flexible. Instead of hard coding the input file for `wc -l` we want to
be able to give this as argument to the shell script. For this we
change the shell script to:

```
echo "Number of lines in the given file:"
wc -l "$1"
```

The `$1` is a variable that represents the first argument given to the
shell script. Now you can call the script in the following way

```
$ bash count_lines.sh origin_of_species.txt
```

You should get the same results as before. If you also like to take
the second argument use the variable `$2`. For using all arguments
given to the shell script use the variable `"$@"`.
E.g. change the shell
script to:


```
echo "Number of lines in the given file(s):"
wc -l "$@"
```

and run it with several input files:

```
$ bash count_lines.sh origin_of_species.txt genes.csv
```

You should get something like:

```
Number of lines in the given file(s):
 15322 origin_of_species.txt
     5 genes.csv
 15327 total
```

# Example analysis

Equipped with a fine selection of useful programs and a basic
understanding of how to combine them, we will now apply them to analyze
real biological data.

## Retrieving data

You have used the tool `wget` above to download the example files. It is
very useful, especially if you want to retrieve large data sets. We
download the fasta file of the *Salmonella* Typhimurium SL1344
chromosome by calling (in this document the URL is split into three
lines. Please write it in one line in the shell and remove the `\`).

```
$ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/\
Salmonella_enterica_serovar_Typhimurium_SL1344_uid86645/\
NC_016810.fna
```

Additionally, we download the annotation in GFF format of the same replicon:

```
$ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/\
Salmonella_enterica_serovar_Typhimurium_SL1344_uid86645/\
NC_016810.gff
```

## Counting the number of features

Use `less` to have a look at `NC_016810.gff`. It is a tab-separated
file. The first 5 lines start with `#` and are called
header. Then several lines with 9 columns follow. The third column
contains the type of the entry (gene, CDS, tRNA, rRNA, etc).
If we
want to know the number of tRNA entries we could try to apply `grep`
and use `-c` to count the number of matching lines.

```
$ grep -c tRNA NC_016810.gff
```

This leads to a suspiciously large number. The issue is that the
string `tRNA` also occurs in the attribute column (the 9th
column). We just want to select lines with a match in the third column.
This can be achieved by combining `cut` and `grep`.

```
$ cut -f 3 NC_016810.gff | grep -c tRNA
```

To get the number of entries for all other features we could just
replace the `tRNA` e.g. by `rRNA`. But we can also get the number for
all of them at once using this constellation:

```
$ grep -v "#" NC_016810.gff | cut -f 3 | sort | uniq -c
```

Try to understand what we did here. You can use a similar call to
count the number of genes on the plus and minus strand:

```
$ cut -f 3,7 NC_016810.gff | grep gene | sort | uniq -c
```

--------------------------------------------------------------------------------
/Unix_Shell_Handout.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/Unix_Shell_Handout.pdf

--------------------------------------------------------------------------------
/Unix_Shell_cheat_sheet.md:
--------------------------------------------------------------------------------
# Unix Shell cheat sheet

- `ls` - lists files
- `man` - shows the manual for a command; use arrow keys to scroll up and down, `q` to quit
- `pwd` - *p*rints the *w*orking *d*irectory
- `cd` - changes the directory
- `echo` - writes a string to the standard output
- `touch` - creates an empty file
- `cp` - copies files and/or folders
- `mv` - moves files and/or folders
- `rm` - removes files and/or folders
- `less` / `more` - pager programs to open a plain text file interactively; use arrow keys to scroll up and down, `q` to quit
- `cat` - writes the content of one or more files to the standard output
- `head` - shows the first lines (per default 10) of a text file
- `tail` - shows the last lines (per default 10) of a text file
- `cut` - returns columns of a text file
- `wc` - *w*ord *c*ount, counts number of characters, words, and lines of a text file
- `sort` - sorts a text file
- `uniq` - removes redundancies from a sorted list of lines
- `grep` - extracts lines of a file that match a pattern

--------------------------------------------------------------------------------
/Unix_Shell_cheat_sheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/Unix_Shell_cheat_sheet.pdf

--------------------------------------------------------------------------------
/by.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/by.png

--------------------------------------------------------------------------------