├── Makefile
├── README.md
├── Unix_Shell_Handout.md
├── Unix_Shell_Handout.pdf
├── Unix_Shell_cheat_sheet.md
├── Unix_Shell_cheat_sheet.pdf
└── by.png


/Makefile:
--------------------------------------------------------------------------------
 1 | pdf:
 2 | 	pandoc \
 3 | 	  -o Unix_Shell_Handout.pdf \
 4 | 	   --toc \
 5 | 	   --pdf-engine xelatex \
 6 | 	   --variable mainfont="DejaVu Sans" \
 7 | 	   --variable sansfont="DejaVu Sans" \
 8 | 	   -V geometry:"top=2cm, bottom=2.0cm, left=2.5cm, right=2.5cm" \
 9 | 	  Unix_Shell_Handout.md
10 | 
11 | 	pandoc \
12 | 	  -o Unix_Shell_cheat_sheet.pdf \
13 | 	   --toc \
14 | 	   --pdf-engine xelatex \
15 | 	   --variable mainfont="DejaVu Sans" \
16 | 	   --variable sansfont="DejaVu Sans" \
17 | 	   -V geometry:"top=2cm, bottom=2.0cm, left=2.5cm, right=2.5cm" \
18 | 	   Unix_Shell_cheat_sheet.md
19 | 
20 | 
21 | example_files:
22 | 	mkdir -p unix_course_files
23 | 	printf "This file\ncontains three\nlines.\n" \
24 | 	  > unix_course_files/three_lines.txt
25 | 	printf "This file\ncontains two lines.\n" \
26 | 	  > unix_course_files/two_lines.txt
27 | 	printf "999\n1\n55\n7777\n3\n42\n555\n23\n" \
28 | 	  > unix_course_files/unsorted_numbers.txt
29 | 	echo "ATGTGGTAGTAGTATGAAATGTGA" \
30 | 	  > unix_course_files/DNA.txt
31 | 	printf "Name\tStart\tStop\tStrand\n" \
32 | 	  > unix_course_files/genes.csv
33 | 	printf "dnaA\t1\t1416\t+\n" \
34 | 	  >> unix_course_files/genes.csv
35 | 	printf "gyrA\t6479\t8908\t+\n" \
36 | 	  >> unix_course_files/genes.csv
37 | 	printf "rpsF\t29330\t29788\t+\n" \
38 | 	  >> unix_course_files/genes.csv
39 | 	printf "yidC\t3986072\t3987691\t-\n" \
40 | 	  >> unix_course_files/genes.csv
41 | 	printf "tRNA\ntRNA\ntRNA\nrRNA\nrRNA\nmRNA\nmRNA\nmRNA\nmRNA\n" \
42 | 	  > unix_course_files/redundant.txt
43 | 	wget -cO unix_course_files/origin_of_species.txt \
44 | 	  https://archive.org/stream/originofspecies00darwuoft/originofspecies00darwuoft_djvu.txt
45 | 
46 | new_release:
47 | 	@echo "* Commit changes e.g. 'git commit -m \"Set version to 1.0\"'"
48 | 	@echo "* Tag the commit e.g. 'git tag -a v1.0 -m \"version v1.0\"'"
49 | 	@echo "* Generate a new release based on this tag at"
50 | 	@echo "  https://github.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/releases"
51 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![DOI](https://zenodo.org/badge/2756/konrad/Introduction_to_the_Unix_Shell_for_biologists.svg)](https://zenodo.org/badge/latestdoi/2756/konrad/Introduction_to_the_Unix_Shell_for_biologists)
 2 | 
 3 | ## Introduction to the Unix Shell for biologists
 4 | 
 5 | See `Unix_Shell_Handout.md` for the main document.
 6 | 
 7 | ![CC-BY](by.png)
 8 | 
 9 | This work by Konrad Förstner is licensed under a [Creative Commons
10 | Attribution 4.0 International
11 | License](https://creativecommons.org/licenses/by/4.0/).
12 | 


--------------------------------------------------------------------------------
/Unix_Shell_Handout.md:
--------------------------------------------------------------------------------
  1 | % Introduction to the Unix shell for biologists
  2 | % Konrad U. Förstner
  3 | % 
  4 | 
  5 | ![CC-BY](by.png)\
  6 | 
  7 | This work by Konrad Förstner is licensed under a [Creative Commons
  8 | Attribution 4.0 International
  9 | License](https://creativecommons.org/licenses/by/4.0/).
 10 | 
 11 | # Motivation and background
 12 | 
 13 | In this course you will learn the basics of how to use the Unix
 14 | shell. Unix is a class of operating systems with many different
 15 | flavors including well-known ones like GNU/Linux and the BSDs. The
 16 | development of Unix and its shell (also known as command line
 17 | interface) dates back to the late 1960s. Still, their concepts remain the
 18 | basis of very powerful tools. In the command line you can easily combine
 19 | different tools into pipelines, avoid repetitive work and make your
 20 | workflow reproducible. Knowing how to use the shell will also enable
 21 | you to run programs that are only available for this environment, which
 22 | is the case for many bioinformatics tools.
 23 | 
 24 | # Create and download some test files
 25 | 
 26 | Use the `Makefile` of this repo and run
 27 | 
 28 | ```
 29 | $ make example_files
 30 | ```
 31 | 
 32 | This should create a folder `unix_course_files` that contains several
 33 | example files.
 34 | 
 35 | # The basic anatomy of a command line call
 36 | 
 37 | Running a tool in the command line interface follows a simple
 38 | pattern. First you write the name of the command (if the program is
 39 | not installed globally, its precise location needs to be given; we
 40 | will get to this later). Some programs additionally require
 41 | parameters. The parameters are what a program expects, while the
 42 | actual values we pass are called arguments. There are two
 43 | different ways to pass arguments to a program: via keyword
 44 | parameters (also called named parameters, flags or options) or via
 45 | positional parameters. The common pattern looks like this (`<>`
 46 | indicates obligatory items, `[]` indicates optional items):
 47 | 
 48 | ```
 49 | <program name> [keyword parameters] [positional parameters]
 50 | ```
 51 | 
 52 | An example is calling the program `ls` which lists the content of a
 53 | directory. You can simply call it without any arguments
 54 | 
 55 | ```
 56 | $ ls
 57 | ```
 58 | 
 59 | or with one or more keyword arguments
 60 | 
 61 | ```
 62 | $ ls -l
 63 | $ ls -lh
 64 | ```
 65 | 
 66 | or with one or more positional arguments
 67 | 
 68 | ```
 69 | $ ls test_folder
 70 | ```
 71 | 
 72 | or with one or more keyword and positional arguments
 73 | 
 74 | ```
 75 | $ ls -l test_folder
 76 | ```
 77 | 
 78 | The result of a command is usually written to the so-called *standard
 79 | output*, which by default is shown in your terminal. We will later
 80 | learn how to redirect it, e.g. into a file or to the *standard input* of another
 81 | program.
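To try the call patterns above without touching your own files, the following minimal sketch works in a fresh temporary folder (created with `mktemp -d`) and uses a hypothetical folder `test_folder` with two empty files. It assumes the GNU or BSD version of `ls`, both of which accept several single-letter keyword parameters combined behind one dash; `touch` and the redirection `>` are explained further below.

```
# Work in a new, empty temporary folder
cd "$(mktemp -d)"
mkdir test_folder
touch test_folder/a.txt test_folder/b.txt

# Positional argument only: list the content of test_folder
ls test_folder

# Keyword arguments given separately or combined behind one dash;
# both listings are written into files and then compared
ls -l -h test_folder > separate.txt
ls -lh test_folder > combined.txt
cmp separate.txt combined.txt && echo "Both calls give the same listing."
```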
 82 | 
 83 | # How to get help and documentation
 84 | 
 85 | Especially in the beginning you will have a lot of questions about what a
 86 | command does and which arguments and parameters need to be given. One
 87 | rule to follow before using a command or before asking somebody about it is
 88 | called [RTFM](https://en.wikipedia.org/wiki/RTFM) (please check the
 89 | meaning yourself). Perhaps the most important command is `man`, which
 90 | stands for *manual*. Most commands offer a manual and with `man` you
 91 | can read those. To get the documentation of `ls` type
 92 | 
 93 | ```
 94 | $ man ls
 95 | ```
 96 | 
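 97 | To close the manual press `q`. Many tools additionally offer a short help text
 98 | via the parameter `-h`, `-help` or `--help` (careful: for `ls`, `-h` instead means
 99 | human-readable file sizes, as in `ls -lh` above). For example, with GNU `ls` on Linux: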
100 | 
101 | ```
102 | $ ls --help
103 | ```
104 | 
105 | Other tools present this help if they are called without any parameters
106 | or arguments.
107 | 
108 | # Bash keyboard shortcuts
109 | 
110 | There are different implementations of the Unix shell. You are
111 | currently working with Bash (**B**ourne-**a**gain **sh**ell). Bash has several keyboard shortcuts that
112 | improve the interaction. Here is a small selection:
113 | 
114 | * Ctrl-c - Stop (interrupt) the running command
115 | * ↑ (or Ctrl-p) - Go backward in command history
116 | * ↓ (or Ctrl-n) - Go forward in command history
117 | * Ctrl-a - Jump to the beginning of a line
118 | * Ctrl-e - Jump to the end of a line
119 | * Ctrl-u - Remove everything before the cursor position
120 | * Ctrl-k - Remove everything after the cursor position
121 | * Ctrl-l - Clear the screen
122 | * Ctrl-r - Search in command history
123 | * Tab - Complete commands and file/folder names
124 | 
125 | # Files, folders, locations
126 | 
127 | Topics:
128 | 
129 | * `ls`
130 | * `pwd`
131 | * `cd`
132 | * `mkdir`
133 | * Relative vs. absolute path
134 | * `~/` 
135 | 
136 | In this part you will learn how to navigate through the file system,
137 | explore the content of folders and create folders.
138 | 
139 | At first we need to know where we are. If you open a new terminal you
140 | should be in your home directory (we will explain this below). To test
141 | this, call the program `pwd` which stands for **p**rint **w**orking **d**irectory.
142 | 
143 | ```
144 | $ pwd
145 | /home/ubuntu
146 | ```
147 | 
148 | The default user of the Ubuntu live system is called `ubuntu`. In
149 | general each user has a folder with its user name located inside
150 | the folder `/home`. The next command we need, which has already been
151 | mentioned above, is `ls`. It simply lists the content of a
152 | folder. If you call it without any arguments it will output the content
153 | of the current folder. Using `ls` we want to get a rough overview of what
154 | a common Unix file system tree looks like and learn how to address
155 | files and folders. The root folder of the file system is `/`. Call
156 | 
157 | ```
158 | $ ls /
159 | ```
160 | 
161 | to see the content of the root folder. You should see something like
162 | 
163 | ```
164 | bin   data  etc   lib    lost+found  mnt  proc  run   srv  tmp  var
165 | boot  dev   home  lib64  media       opt  root  sbin  sys  usr
166 | ```
167 | 
168 | There are several subfolders in the so-called root folder (and yes, to
169 | make it a little bit confusing, there is even a folder called `root` in
170 | the root folder). They mostly matter to the
171 | administrator of the system; normal users do not have the permission to
172 | make changes there. Your home directory, in contrast, is your little
173 | universe in which you can do whatever you want. This is where we will
174 | learn how to work with paths. A file or folder can be addressed
175 | either with its *absolute* or *relative path*. If you created the test
176 | files in your home folder, there should be a folder `unix_course_files`
177 | located there. Assuming you are in your home
178 | folder (`/home/ubuntu/`), the relative path to the folder is simply
179 | `unix_course_files`. You can get the content of the folder listed by
180 | calling `ls` like this:
181 | 
182 | ```
183 | $ ls unix_course_files
184 | ```
185 |    
186 | This is the so-called *relative path*, as it is relative to the current working
187 | directory `/home/ubuntu/`. The *absolute path* starts with a `/`
188 | and is `/home/ubuntu/unix_course_files`. Call `ls` like this:
189 | 
190 | ```
191 | $ ls /home/ubuntu/unix_course_files
192 | ```
193 | 
194 | There are some conventions regarding *relative* and *absolute paths*. One
195 | is that a dot (`.`) represents the current folder. The command
196 | 
197 | ```
198 | $ ls ./
199 | ```
200 | 
201 | should return the same as simply calling
202 | 
203 | ```
204 | $ ls
205 | ```
206 | 
207 | Two dots (`..`) represent the parent folder. If you call
208 | 
209 | ```
210 | $ ls ../
211 | ```
212 | 
213 | you should see the content of `/home`. If you call
214 | 
215 | ```
216 | $ ls ../../
217 | ```
218 | 
219 | you should see the content of the parent folder of the parent folder which
220 | is the root folder (`/`) assuming you are in `/home/ubuntu/`. Another
221 | convention is that `~/` represents the home directory of the user. The
222 | command
223 | 
224 | ```
225 | $ ls ~/
226 | ```
227 |     
228 | should list the content of your home directory independent of your
229 | current location in the file system.
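A small sketch of these path conventions, run in a temporary folder so that it works regardless of your user name. The folder names `course` and `unix_course_files` are only stand-ins for `/home/ubuntu` and the course folder; `pwd` prints the resulting location after each step.

```
# Build a small folder tree in a temporary location
BASE="$(mktemp -d)"
mkdir -p "$BASE/course/unix_course_files"

# Absolute path: starts with '/' and works from any location
cd "$BASE/course/unix_course_files"
pwd

# '..' is the parent folder of the current one
cd ..
pwd

# '.' is the current folder, so these two calls list the same content
ls .
ls

# A relative path is resolved starting from the current folder
cd unix_course_files
pwd

# The shell replaces an unquoted '~' by the path of your home directory
echo ~
```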
230 | 
231 | Now as we know where we are and what is there we can start to change
232 | our location. For this we use the command `cd` (change directory). If
233 | you are in your home directory `/home/ubuntu/` you can go into the
234 | folder `unix_course_files` by typing
235 | 
236 | ```
237 | $ cd unix_course_files
238 | ```
239 | 
240 | After that call `pwd` to make sure that you are in the correct folder.
241 | 
242 | ```
243 | $ pwd 
244 | /home/ubuntu/unix_course_files
245 | ```
246 | 
247 | To go back into your home directory you have different options. Use
248 | the *absolute path*
249 | 
250 | ```
251 | $ cd /home/ubuntu/
252 | ```
253 | 
254 | or the above mentioned convention for the home directory `~/`:
255 | 
256 | ```
257 | $ cd ~/
258 | ```
259 | 
260 | or the *relative path*, in this case the parent directory of  
261 | `/home/ubuntu/unix_course_files`:
262 | 
263 | ```
264 | $ cd ../
265 | ```
266 | 
267 | As the home directory is such an important place, `cd` uses it as the
268 | default argument. This means if you call `cd` without an argument you will
269 | go to the home directory. Test this behavior by calling
270 | 
271 | ```
272 | $ cd
273 | ```
274 | 
275 | Try now to go to different locations in the file system and list the
276 | files and folders located there.
277 | 
278 | Now we will create our first folder using the command `mkdir` (*make
279 | directory*). Go into the home directory and type:
280 | 
281 | ```
282 | $ mkdir my_first_folder
283 | ```
284 | 
285 | Here we meet another principle of the Unix philosophy: "No
286 | news is good news." The command successfully created the folder
287 | `my_first_folder`. You can check this by calling `ls`, but `mkdir` did
288 | not tell you this. If you do not get a message this usually means
289 | everything went fine. If you call the above `mkdir` command again you
290 | should get an error message like this:
291 | 
292 | ```
293 | $ mkdir my_first_folder
294 | mkdir: cannot create directory ‘my_first_folder’: File exists
295 | ```
296 | 
297 | So if a command does not complain you can usually assume there was no
298 | error.
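Behind the scenes every command also reports an *exit status*, which the shell stores in the variable `$?`: `0` means success, any other number signals an error. The following sketch repeats the `mkdir` example in a temporary folder and records both exit statuses in two illustrative variables (`first_status`, `second_status`):

```
cd "$(mktemp -d)"

# First call: the folder is created silently
mkdir my_first_folder
first_status=$?
echo "Exit status of the first mkdir: $first_status"

# Second call: the folder already exists, so mkdir prints an error
# message and returns a non-zero exit status
if mkdir my_first_folder; then second_status=0; else second_status=$?; fi
echo "Exit status of the second mkdir: $second_status"
```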
299 | 
300 | # Manipulating files and folders
301 | 
302 | Topics:
303 | 
304 | * `touch`
305 | * `cp`
306 | * `mv`
307 | * `rm` 
308 | 
309 | Next we want to manipulate files and folders. We create some dummy
310 | files using `touch`, which is mainly used to change the time stamps of
311 | files. If a file does not exist yet, `touch` creates it as an empty file. Let's
312 | create a file called `test_file_1.txt`:
313 | 
314 | ```
315 | $ touch test_file_1.txt
316 | ```
317 | 
318 | Use `ls` to check that it was created. 
319 | 
320 | The command `cp` (*copy*) can be used to copy files. For this it
321 | requires at least two arguments: the source and the target file. In
322 | the following example we generate a copy of the file `test_file_1.txt`
323 | called `a_copy_of_test_file.txt`.
324 | 
325 | ```
326 | $ cp test_file_1.txt a_copy_of_test_file.txt
327 | ```
328 | 
329 | Use `ls` to confirm that this worked. We can also copy the file into the
330 | folder `my_first_folder`, which we have created above:
331 | 
332 | ```
333 | $ cp test_file_1.txt my_first_folder
334 | ```
335 | 
336 | Now there should be also a file `test_file_1.txt` in the folder
337 | `my_first_folder`. If you want to copy a folder and its content you
338 | have to use the parameter `-r` (*recursive*).
339 | 
340 | ```
341 | $ cp -r my_first_folder a_copy_of_my_first_folder
342 | ```
343 | 
344 | You can use the command `mv` (*move*) to rename or relocate files
345 | or folders. To rename the file `a_copy_of_test_file.txt` to
346 | `test_file_with_new_name.txt` call
347 | 
348 | ```
349 | $ mv a_copy_of_test_file.txt test_file_with_new_name.txt
350 | ```
351 | 
352 | With `mv` you can also move a file into a folder. For this the second
353 | argument has to be a folder. For example, to move the file now named
354 | `test_file_with_new_name.txt` into the folder `my_first_folder` use 
355 | 
356 | ```
357 | $ mv test_file_with_new_name.txt my_first_folder
358 | ```
359 | 
360 | You are not limited to one file when moving files into a
361 | folder. Let's create two files `file1` and `file2` and move them into the
362 | folder `my_first_folder`.
363 | 
364 | ```
365 | $ touch file1 file2 
366 | $ mv file1 file2 my_first_folder
367 | ```
368 | 
369 | At this point we can introduce another handy feature most shells offer
370 | which is called *globbing*. Let us assume you want to apply the same
371 | command to several files. Instead of explicitly writing all the file
372 | names you can use a *globbing pattern* to address them. There are
373 | different wildcards that can be used for these patterns. The most
374 | important one is the asterisk (`*`). It matches any sequence of
375 | characters, including an empty one. Let us explore this with a small example:
376 | 
377 | ```
378 | $ touch file1.txt file2.txt file3
379 | $ ls *txt
380 | $ mv *txt my_first_folder
381 | ```
382 | 
383 | `ls` shows the two files matching the given pattern
384 | (i.e. `file1.txt` and `file2.txt`) while dismissing the one not
385 | matching (i.e. `file3`). Same for `mv` - it will only move the two
386 | files ending with `txt`.
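The pattern is expanded by the shell, not by `ls` or `mv`, before the command starts. A handy way to preview what a pattern matches before running a risky command is to hand it to `echo`. This sketch uses a temporary folder and also shows the question mark (`?`), another wildcard that matches exactly one character:

```
cd "$(mktemp -d)"
touch file1.txt file2.txt file3

# echo prints the file names the shell generated from each pattern
echo *txt     # file1.txt file2.txt
echo file?    # file3 ('?' matches exactly one character)
echo file*    # file1.txt file2.txt file3
```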
387 | 
388 | We accumulated several test files that we do not need anymore. Time to clean
389 | up a little bit. With the command `rm` (*remove*) you can delete files
390 | and folders. Please be aware that there is no such thing as a trash
391 | bin if you remove items this way. They will be gone for good and without further notice. 
392 | 
393 | To delete a file in `my_first_folder` call:
394 | 
395 | ```
396 | $ rm my_first_folder/file1.txt
397 | ```
398 | 
399 | To remove a folder together with its content use the parameter `-r` (*recursive*):
400 | 
401 | ```
402 | $ rm -r my_first_folder
403 | ```
404 | 
405 | Alternatively, an empty folder can be removed with the command `rmdir`, which refuses to delete folders that still contain files:
406 | 
407 | ```
408 | $ rmdir my_first_folder
409 | ```
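The difference between `rmdir` and `rm -r` can be checked safely in a temporary folder. In this sketch `empty_folder` is a hypothetical name, and `my_first_folder` receives one file so that `rmdir` has a reason to refuse:

```
cd "$(mktemp -d)"
mkdir my_first_folder empty_folder
touch my_first_folder/file1.txt

# rmdir refuses to delete a folder that still contains files
if rmdir my_first_folder; then
    echo "removed"
else
    echo "rmdir failed: my_first_folder is not empty"
fi

# An empty folder is removed without complaint
rmdir empty_folder

# rm -r deletes the folder together with its content (no trash bin!)
rm -r my_first_folder
ls
```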
410 | 
411 | # File content - part 1
412 | 
413 | Topics:
414 | 
415 | * `less` / `more`
416 | * `cat`
417 | * `echo`
418 | * `head`
419 | * `tail`
420 | * `cut`
421 | 
422 | Until now we did not care about the content of the files. This will
423 | change now. Please go into the folder `unix_course_files`:
424 | 
425 | ```
426 | $ cd unix_course_files
427 | ```
428 | 
429 | There should be some files waiting for you. To read the content with
430 | the possibility to scroll around we need a so-called pager
431 | program. Most Unix systems offer the programs `more` and `less`, which
432 | have very similar functionality ("more or less are more or less the
433 | same"). We will use the latter here. Let's open the file
434 | `origin_of_species.txt`:
435 | 
436 | ```
437 | $ less origin_of_species.txt
438 | ```
439 | 
440 | The file contains Charles Darwin's *Origin of species* in plain
441 | text. You can scroll up and down line-wise using the arrow keys or page-wise
442 | using the page-up/page-down keys. To quit use the key `q`. With
443 | pager programs you can read file content interactively, but sometimes
444 | you just want to have the content of a file given to you (i.e. on the
445 | *standard output*). The command `cat` (*concatenate*) does that for one
446 | or more files. Let us use it to see what is in the example file
447 | `two_lines.txt`. Assuming you are in the folder `unix_course_files`
448 | you can call
449 | 
450 | ```
451 | $ cat two_lines.txt
452 | ```
453 | 
454 | The content of the file is shown to you. You can apply the command to
455 | two files and the content is concatenated and returned:
456 | 
457 | ```
458 | $ cat two_lines.txt three_lines.txt
459 | ```
460 | 
461 | This is a good time to introduce the *standard input* and *standard
462 | output* and what you can do with them. Above I wrote that the output is given
463 | to you. This means it is written to the so called *standard
464 | output*. You can redirect the *standard output* into a file by using
465 | `>`. Let us use the call above to generate a new file that contains
466 | the combined content of both files:
467 | 
468 | ```
469 | $ cat two_lines.txt three_lines.txt > five_lines.txt
470 | ```
471 | 
472 | Please have a look at the content of this file:
473 | 
474 | ```
475 | $ cat five_lines.txt
476 | ```
477 | 
478 | The *standard output* can also be redirected to other tools as
479 | *standard input*. More about this below. With `cat` we can reuse the
480 | existing file content. To create something new we use the command
481 | `echo` which writes a given string to the standard output.
482 | 
483 | ```
484 | $ echo "Something very creative"
485 | ```
486 | 
487 | To redirect the output into a target file use `>`.
488 | 
489 | ```
490 | $ echo "Something very creative." > creative.txt
491 | ```
492 | 
493 | Be aware that this can be dangerous: `>` overwrites the content of an
494 | existing file without asking. For example, if you now call
495 | 
496 | ```
497 | $ echo "Something very uncreative." > creative.txt
498 | ```
499 | 
500 | the file will only contain the latest string; the
501 | previous one has been overwritten. To append the output of a command to a
502 | file without overwriting its content use `>>`.
503 | 
504 | ```
505 | $ echo "Something very creative." > creative.txt
506 | $ echo "Something very uncreative." >> creative.txt
507 | ```
508 | 
509 | Now `creative.txt` should contain two lines.
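The difference between `>` and `>>` can be seen by printing the file after each variant. A minimal sketch in a temporary folder, reusing the file name `creative.txt` from above:

```
cd "$(mktemp -d)"

# '>' replaces the content of creative.txt
echo "Something very creative." > creative.txt
echo "Something very uncreative." > creative.txt
cat creative.txt     # only the second line is left

# '>>' appends to the existing content
echo "Something very creative." > creative.txt
echo "Something very uncreative." >> creative.txt
cat creative.txt     # both lines
```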
510 | 
511 | Sometimes you just want to get an excerpt of a file, e.g. just the
512 | first or last lines of it. For this the commands `head` and `tail` can
513 | be used. By default 10 lines are shown. You can use the parameter
514 | `-n <NUMBER>` (e.g. `-n 20`) or just `-<NUMBER>` (e.g. `-20`) to specify the
515 | number of lines to be displayed. Test the tools with the file
516 | `origin_of_species.txt`:
517 | 
518 | ```
519 | $ head origin_of_species.txt
520 | $ tail origin_of_species.txt
521 | ```
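With a numbered file the effect of the parameters is easy to see. This sketch assumes the command `seq` (available on GNU/Linux and macOS), which writes one number per line, and uses a hypothetical file `numbers.txt` in a temporary folder:

```
cd "$(mktemp -d)"
seq 1 15 > numbers.txt    # a file with the lines 1, 2, ..., 15

head numbers.txt          # lines 1 to 10 (the default)
head -n 3 numbers.txt     # lines 1 to 3
head -5 numbers.txt       # short form of -n 5: lines 1 to 5
tail -n 2 numbers.txt     # lines 14 and 15
```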
522 | 
523 | You can select not only vertically (lines) but also horizontally (parts
524 | of lines) using the command `cut`. Let us extract only the first 10 characters of each line
525 | in the file `origin_of_species.txt`:
526 | 
527 | ```
528 | $ cut -c 1-10 origin_of_species.txt
529 | ```
530 | 
531 | The tool `cut` can be very useful to extract certain columns from CSV
532 | files (*comma/character separated values*). Have a look at the content of the
533 | file `genes.csv`. You see that it contains different columns that are
534 | tab-separated, which is what `cut` expects by default (another separator can be set with `-d`, e.g. `-d ,` for commas). You can extract selected columns with `cut`:
535 | 
536 | ```
537 | $ cut -f 1,4 genes.csv
538 | ```
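A small sketch comparing a tab-separated table (the same first two rows as `genes.csv` from the `Makefile`) with a comma-separated version of it. Both file names are hypothetical and the files are created in a temporary folder:

```
cd "$(mktemp -d)"
# Tab-separated, like genes.csv created by the Makefile
printf "Name\tStart\tStop\tStrand\ndnaA\t1\t1416\t+\n" > genes_tab.txt
# The same table with commas as separators
printf "Name,Start,Stop,Strand\ndnaA,1,1416,+\n" > genes_comma.csv

cut -f 1,4 genes_tab.txt           # columns 1 and 4; tab is the default separator
cut -d , -f 1,4 genes_comma.csv    # '-d ,' switches the separator to a comma
cut -c 1-4 genes_tab.txt           # characters 1 to 4 of each line
```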
539 | 
540 | # File content - part 2
541 | 
542 | Topics:
543 | 
544 | * `wc`
545 | * `sort`
546 | * `uniq`
547 | * `grep`
548 | 
549 | There are several tools that let you manipulate the content of a plain
550 | text file or return information about it. If you want for example some
551 | statistics about the number of characters, words and lines use the
552 | command `wc`. Let us count the number of lines in the file
553 | `origin_of_species.txt`:
554 | 
555 | ```
556 | $ wc -l origin_of_species.txt
557 | ```
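The individual counts can also be requested separately. This sketch recreates `three_lines.txt` (same content as in the `Makefile`) in a temporary folder; reading the file via `<` makes `wc` print only the number, without the file name (some versions pad it with spaces). Note that `-c` counts bytes, which equals the number of characters for plain ASCII text; `-m` counts characters.

```
cd "$(mktemp -d)"
printf "This file\ncontains three\nlines.\n" > three_lines.txt

wc three_lines.txt         # lines, words, bytes and the file name
wc -l < three_lines.txt    # number of lines: 3
wc -w < three_lines.txt    # number of words: 5
wc -c < three_lines.txt    # number of bytes: 32
```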
558 | 
559 | You can use the command `sort` to sort the lines of a file. Test
560 | the following calls
561 | 
562 | ```
563 | $ sort unsorted_numbers.txt
564 | $ sort -n unsorted_numbers.txt
565 | $ sort -rn unsorted_numbers.txt
566 | ```
567 | 
568 | and try to understand the output.
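If you want to check your interpretation afterwards: the sketch below uses the same numbers as `unsorted_numbers.txt` from the `Makefile`. Without options `sort` compares the lines character by character, so `23` comes before `3`; `-n` compares numeric values and `-r` reverses the order. `LC_ALL=C` is only set to make the character comparison independent of your language settings.

```
cd "$(mktemp -d)"
printf "999\n1\n55\n7777\n3\n42\n555\n23\n" > unsorted_numbers.txt

# Results are printed one per line; shown here side by side
LC_ALL=C sort unsorted_numbers.txt   # 1 23 3 42 55 555 7777 999
sort -n unsorted_numbers.txt         # 1 3 23 42 55 555 999 7777
sort -rn unsorted_numbers.txt        # 7777 999 555 55 42 23 3 1
```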
569 | 
570 | The tool `uniq` takes a sorted list of lines and removes repeated lines.
571 | It only compares neighboring lines, which is why the input needs to be sorted. Please have a look at the content of the file
572 | `redundant.txt`. Then use `uniq` to generate a non-redundant list:
573 | 
574 | ```
575 | $ uniq redundant.txt
576 | ```
577 | 
578 | If you call `uniq` with `-c` you get the number of occurrences for each
579 | remaining entry:
580 | 
581 | ```
582 | $ uniq -c redundant.txt
583 | ```
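Why the input should be sorted becomes visible with a small, hypothetical unsorted file in which the two `tRNA` lines are not neighbors. The sketch uses the pipe `|`, which is explained in the section *Connecting tools*, to sort the lines before `uniq` sees them:

```
cd "$(mktemp -d)"
printf "tRNA\nrRNA\ntRNA\n" > unsorted_redundant.txt

uniq unsorted_redundant.txt              # tRNA, rRNA, tRNA: nothing removed
sort unsorted_redundant.txt | uniq       # rRNA, tRNA
sort unsorted_redundant.txt | uniq -c    # 1 rRNA, 2 tRNA
```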
584 | 
585 | With the tool `grep` you can extract lines that match a given
586 | pattern. For instance, if you want to find all lines in
587 | `origin_of_species.txt` that contain the word `species` call
588 | 
589 | ```
590 | $ grep species origin_of_species.txt
591 | ```
592 | 
593 | As you can see we only get the lines that contain `species` but not
594 | the ones that contain `Species`. To make the search case-insensitive
595 | use the parameter `-i`.
596 | 
597 | ```
598 | $ grep -i species origin_of_species.txt
599 | ```
600 | 
601 | If you are only interested in the number of lines that match the pattern
602 | use `-c`:
603 | 
604 | ```
605 | $ grep -ic species origin_of_species.txt
606 | ```
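The effect of `-i` and `-c` can be verified on a few hypothetical lines in a temporary file. Note that `grep` matches the pattern anywhere in a line, so `subspecies` counts as a match for `species`:

```
cd "$(mktemp -d)"
printf "Origin of Species\nthe species problem\nwild subspecies\nno match here\n" > sample.txt

grep species sample.txt       # 2 lines, including 'wild subspecies'
grep -i species sample.txt    # 3 lines, 'Species' matches as well
grep -c species sample.txt    # 2
grep -ic species sample.txt   # 3
```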
607 | 
608 | # Connecting tools
609 | 
610 | Another piece of the Unix philosophy is to build small tools that do
611 | one thing well and use the standard input and standard
612 | output. The real power of Unix builds on the capability to easily
613 | connect tools. For this, so-called *pipes* are used. To use the
614 | *standard output* of one tool as *standard input* of another tool, the
615 | vertical bar `|` is used. For example, to extract the first
616 | 1000 lines from `origin_of_species.txt`, keep the lines that contain
617 | `species`, keep among those the ones that contain `wild`
618 | and finally replace each `w` by an `m` using `tr` (*translate*), call (please write this in one line
619 | in the shell and remove the `\`):
620 | 
621 | ```
622 | $ head -n 1000 origin_of_species.txt | grep species \
623 |   | grep wild | tr w m
624 | ```
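Each step of such a pipeline can be checked on a few hypothetical lines instead of the whole book. The sketch keeps the order of the pipeline above but uses `head -n 3`, so the fourth line is dropped before `grep` ever sees it:

```
cd "$(mktemp -d)"
printf "wild species\ntame species\nwild animals\nwild species again\n" > sample.txt

# Each tool reads the standard output of the previous one as its standard input
head -n 3 sample.txt | grep species | grep wild | tr w m
# prints: mild species
```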
625 | 
626 | # Repeating command using the `for` loop
627 | 
628 | Assume you want to generate a copy of each of your files ending with `.txt`. A call like
629 | 
630 | ```
631 | cp *txt copy_of_*txt
632 | ```
633 | 
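634 | would not work: the shell expands both patterns before `cp` starts, and as no file matches `copy_of_*txt` yet, that pattern is passed on unchanged. With several matching files `cp` then stops with an error because its last argument is not a folder.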
635 | 
636 | With `for` loops you can solve this problem. Let's start with a simple
637 | one. 
638 | 
639 | ```
640 | $ for FILE in three_lines.txt two_lines.txt
641 | > do
642 | > head -n 1 "$FILE"
643 | > done
644 | ```
645 | 
646 | The variable `FILE` (you can also give it any other name) can be used
647 | inside the loop by writing `$FILE`.
648 | 
649 | If you now press ↑ you will get the line
650 | 
651 | ```
652 | $ for FILE in three_lines.txt two_lines.txt; do head -n 1 "$FILE"; done
653 | ```
654 | 
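655 | which is equivalent to the call before. You can also run several commands inside a loop (first example below), loop over all files matching a globbing
656 | pattern (second example) and finally solve the copying problem from the beginning of this section (third example):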
657 | 
658 | ```
659 | $ for FILE in three_lines.txt two_lines.txt
660 | > do
661 | > head -n 1 "$FILE"
662 | > echo "-----------------"
663 | > done
664 | ```
665 | 
666 | ```
667 | $ for FILE in *txt
668 | > do
669 | > head -n 1 "$FILE"
670 | > echo "-----------------"
671 | > done
672 | ```
673 | 
674 | ```
675 | $ for FILE in *txt
676 | > do
677 | > cp "$FILE" "copy_of_$FILE"
678 | > done
679 | ```
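Before running a loop that changes files, it can help to do a "dry run" by putting `echo` in front of the command, so that each command is printed instead of executed. This sketch does both in a temporary folder with two hypothetical files `a.txt` and `b.txt`. The pattern `*txt` is expanded once, before the loop starts, so the new copies are not copied again by the same loop:

```
cd "$(mktemp -d)"
printf "one\n" > a.txt
printf "two\n" > b.txt

# Dry run: only print the cp commands
for FILE in *txt; do echo cp "$FILE" "copy_of_$FILE"; done

# Real run: the quotes keep file names with spaces intact
for FILE in *txt; do cp "$FILE" "copy_of_$FILE"; done
ls
```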
680 | 
681 | # Shell scripting
682 | 
683 | Open a new file in a text editor of your choice, call it
684 | `count_lines.sh` and add the following text:
685 | 
686 | ```
687 | echo "Number of lines in the given file:"
688 | wc -l origin_of_species.txt
689 | ```
690 | 
691 | Save the file, make sure the file `origin_of_species.txt` is in the
692 | same folder and run the script:
693 | 
694 | ```
695 | $ bash count_lines.sh
696 | ```
697 | 
698 | You should get something like
699 | 
700 | ```
701 | Number of lines in the given file:
702 | 15322 origin_of_species.txt
703 | ```
704 | 
705 | This is your very first shell script. Now we want to make it more
706 | flexible. Instead of hard coding the input file for `wc -l` we want to
707 | be able to give this as argument to the shell script. For this we
708 | change the shell script to:
709 | 
710 | ```
711 | echo "Number of lines in the given file:"
712 | wc -l "$1"
713 | ```
714 | 
715 | `$1` is a variable that holds the first argument given to the
716 | shell script. Now you can call the script in the following way
717 | 
718 | ```
719 | $ bash count_lines.sh origin_of_species.txt
720 | ```
721 | 
722 | You should get the same results as before. If you also want to use
723 | the second argument, use the variable `$2`. To use all arguments
724 | given to the shell script, use `"$@"` (the quotes keep file names that contain spaces intact). For
725 | example, change the shell script to:
726 | 
727 | 
728 | ```
729 | echo "Number of lines in the given file(s):"
730 | wc -l "$@"
731 | ```
732 | 
733 | and run it with several input files:
734 | 
735 | ```
736 | $ bash count_lines.sh origin_of_species.txt genes.csv
737 | ```
738 | 
739 | You should get something like:
740 | 
741 | ```
742 | Number of lines in the given file(s):
743 |  15322 origin_of_species.txt
744 |      5 genes.csv
745 |  15327 total
746 | ```
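The script can also be tested without the course files. This sketch writes it into a temporary folder with a quoted here-document (`<< 'EOF'`, so that `$@` is stored literally) and runs it on two hypothetical files, one of which has a space in its name; thanks to `"$@"` that name reaches `wc` as a single argument:

```
cd "$(mktemp -d)"
printf "a\nb\nc\n" > three.txt
printf "x\ny\n" > "two lines.txt"

# Write the script from above into count_lines.sh
cat > count_lines.sh << 'EOF'
echo "Number of lines in the given file(s):"
wc -l "$@"
EOF

bash count_lines.sh three.txt "two lines.txt"
```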
747 | 
748 | # Examples analysis
749 | 
750 | Equipped with a fine selection of useful programs and basic
751 | understanding of how to combine them, we will now apply them to analyze
752 | real biological data.
753 | 
754 | ## Retrieving data
755 | 
756 | The `Makefile` of this repo uses the tool `wget` to download one of the example files. It is
757 | very useful, especially if you want to retrieve large data sets. We
758 | download the FASTA file of the *Salmonella* Typhimurium SL1344
759 | chromosome by calling (in this document the URL is split into three
760 | lines; please write it in one line in the shell and remove the `\`):
761 | 
762 | ```
763 | $ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/\
764 | Salmonella_enterica_serovar_Typhimurium_SL1344_uid86645/\
765 | NC_016810.fna
766 | ```
767 | 
768 | Additionally, we download the annotation in GFF format of the same replicon:
769 | 
770 | ```
771 | $ wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/\
772 | Salmonella_enterica_serovar_Typhimurium_SL1344_uid86645/\
773 | NC_016810.gff
774 | ```   
775 | 
776 | ## Counting the number of features
777 | 
778 | Use `less` to have a look at `NC_016810.gff`. It is a tab-separated
779 | file. The first 5 lines start with `#` and form the
780 | header. Then several lines with 9 columns follow. The third column
781 | contains the type of the entry (gene, CDS, tRNA, rRNA, etc.). If we
782 | want to know the number of tRNA entries we could try to apply `grep`
783 | and use `-c` to count the number of matching lines.
784 | 
785 | ```
786 | $ grep -c tRNA NC_016810.gff
787 | ```
788 | 
789 | This leads to a suspiciously large number. The issue is that the
790 | string `tRNA` also occurs in the attribute column (the 9th
791 | column). We just want to select lines with a match in the third column.
792 | This can be achieved by combining `cut` and `grep`. 
793 | 
794 | ```
795 | $ cut -f 3 NC_016810.gff | grep -c tRNA
796 | ```
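The over-counting can be reproduced with a few made-up, shortened GFF-like lines (not taken from `NC_016810.gff`): nine tab-separated columns plus a header line. In the hypothetical file `mini.gff` the string `tRNA` also appears in column 9 of a CDS entry:

```
cd "$(mktemp -d)"
printf "##gff-version 3\n" > mini.gff
printf "chr\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=gene1\n" >> mini.gff
printf "chr\tRefSeq\ttRNA\t1\t100\t.\t+\t.\tID=rna1;product=tRNA-Ala\n" >> mini.gff
printf "chr\tRefSeq\tgene\t200\t300\t.\t-\t.\tID=gene2\n" >> mini.gff
printf "chr\tRefSeq\tCDS\t200\t300\t.\t-\t.\tID=cds1;note=similar to tRNA\n" >> mini.gff

grep -c tRNA mini.gff               # 2: also counts the match in column 9
cut -f 3 mini.gff | grep -c tRNA    # 1: only the type column is searched
```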
797 | 
798 | To get the number of entries for all other features we could just
799 | replace the `tRNA` e.g. by `rRNA`. But we can also get the number for
800 | all of them at once using this combination of tools:
801 | 
802 | ```
803 | $ grep -v "^#" NC_016810.gff | cut -f 3 | sort | uniq -c
804 | ```
805 | 
806 | Try to understand what we did here. You can use a similar call to
807 | count the number of genes on the plus and minus strand:
808 | 
809 | ```
810 | $ cut -f 3,7 NC_016810.gff | grep gene | sort | uniq -c
811 | ```
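Both counting pipelines can be tried on a made-up GFF-like file (hypothetical content, not from `NC_016810.gff`). `"^#"` only removes lines that start with `#`. Note also that `grep gene` keeps every line containing the string `gene`, so a feature type such as `pseudogene`, if present in a file, is counted in the strand table too:

```
cd "$(mktemp -d)"
printf "##gff-version 3\n" > mini.gff
printf "chr\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=gene1\n" >> mini.gff
printf "chr\tRefSeq\tCDS\t1\t100\t.\t+\t.\tID=cds1\n" >> mini.gff
printf "chr\tRefSeq\tgene\t200\t300\t.\t-\t.\tID=gene2\n" >> mini.gff
printf "chr\tRefSeq\tgene\t400\t500\t.\t-\t.\tID=gene3\n" >> mini.gff
printf "chr\tRefSeq\tpseudogene\t600\t700\t.\t+\t.\tID=gene4\n" >> mini.gff

# Number of entries per feature type (column 3), header removed
grep -v "^#" mini.gff | cut -f 3 | sort | uniq -c

# Number of entries per type and strand (columns 3 and 7)
cut -f 3,7 mini.gff | grep gene | sort | uniq -c
```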
812 | 
813 | 


--------------------------------------------------------------------------------
/Unix_Shell_Handout.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/Unix_Shell_Handout.pdf


--------------------------------------------------------------------------------
/Unix_Shell_cheat_sheet.md:
--------------------------------------------------------------------------------
 1 | # Unix Shell cheat sheet
 2 | 
 3 | - `ls` - lists the content of a folder
 4 | - `man` - shows the manual for a command; use arrow keys to scroll up and down, `q` to quit
 5 | - `pwd` - *p*rints the *w*orking *d*irectory
 6 | - `cd` - changes the directory
 7 | - `echo` - writes a string to the standard output
 8 | - `touch` - creates an empty file
 9 | - `cp` - copies files and/or folders
10 | - `mv` - moves files and/or folders
11 | - `rm` - removes files and/or folders
12 | - `less` / `more` - pager programs to open a plain text file interactively; use arrow keys to scroll up and down, `q` to quit
13 | - `cat` - writes the content of one or more files to the standard output
14 | - `head` - shows the first lines (by default 10) of a text file
15 | - `tail` - shows the last lines (by default 10) of a text file
16 | - `cut` - extracts columns or character ranges from each line of a text file
17 | - `wc` - *w*ord *c*ount, counts the number of characters, words, and lines of a text file
18 | - `sort` - sorts the lines of a text file
19 | - `uniq` - removes repeated neighboring lines, e.g. from a sorted list
20 | - `grep` - extracts the lines of a file that match a pattern
21 | 


--------------------------------------------------------------------------------
/Unix_Shell_cheat_sheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/Unix_Shell_cheat_sheet.pdf


--------------------------------------------------------------------------------
/by.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/konrad/Introduction_to_the_Unix_Shell_for_biologists/931b6ff1e3b104f25bf9d4cf5e9f14ff42f5f1ee/by.png


--------------------------------------------------------------------------------