├── .gitignore ├── Files ├── humchrx.txt └── test.txt ├── Intro-to-Python ├── 00.ipynb ├── 01.ipynb ├── 02.ipynb ├── 03.ipynb ├── 04.ipynb ├── 05.ipynb ├── 06.ipynb ├── 07.ipynb ├── 08.ipynb ├── 09.ipynb ├── bank.py ├── dnatools.py ├── execution.png ├── genelist.py ├── seqtools.py └── sysargv.py └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | *.pyc 3 | */.ipynb_checkpoints/* 4 | -------------------------------------------------------------------------------- /Files/test.txt: -------------------------------------------------------------------------------- 1 | My first file written from Python 2 | --------------------------------- 3 | Hello, world! 4 | -------------------------------------------------------------------------------- /Intro-to-Python/00.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python For Bioinformatics\n", 8 | "\n", 9 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 10 | "\n", 11 | "\n", 12 | "\n", 13 | "## Attribution\n", 14 | "These tutorials are an adaptation of the Introduction to Python for Maths by [Andreas Ernst](http://users.monash.edu.au/~andreas), available from https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git. The original version was written by Rajath Kumar and is available at https://github.com/rajathkumarmp/Python-Lectures.\n", 15 | "\n", 16 | "These notes have been greatly amended and updated for the EANBiT Introduction to Python for Bioinformatics course facilitated by [Caleb Kibet](https://twitter.com/calkibet), Audrey Mbogho and Anthony Etuk. \n", 17 | "\n", 18 | "\n", 19 | "# Quick Introduction to Jupyter Notebooks\n", 20 | "\n", 21 | "Throughout this course, we will be using Jupyter Notebooks. 
Although the HPC you will be using will have Jupyter set up, these notes are provided in case you want to set it up on your own computer. \n", 22 | "\n", 23 | "## Introduction\n", 24 | "The Jupyter Notebook is an interactive computing environment that enables users to author notebooks, which contain a complete and self-contained record of a computation. Because everything lives in a single file, these notebooks are easy to share. The notebooks may contain:\n", 25 | "* Live code\n", 26 | "* Interactive widgets\n", 27 | "* Plots\n", 28 | "* Narrative text\n", 29 | "* Equations\n", 30 | "* Images\n", 31 | "* Video\n", 32 | "\n", 33 | "It is good to note that \"Jupyter\" is a loose acronym meaning Julia, Python, and R; the primary languages supported by Jupyter. \n", 34 | "\n", 35 | "The notebook allows a computational researcher to create reproducible documentation of their research. As Bioinformatics is data-centric, use of Jupyter Notebooks increases research transparency, hence promoting open science. \n", 36 | "\n", 37 | "## First Steps\n", 38 | "\n", 39 | "### Installation\n", 40 | "\n", 41 | "1. [Download Miniconda](https://www.anaconda.com/download/) for your specific OS to your home directory\n", 42 | " - Linux: `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh`\n", 43 | " - Mac: `curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh`\n", 44 | "2. Run the installer for your OS:\n", 45 | " - `bash Miniconda3-latest-Linux-x86_64.sh`\n", 46 | " - `bash Miniconda3-latest-MacOSX-x86_64.sh`\n", 47 | "3. Follow all the prompts: if unsure, accept defaults\n", 48 | "4. Close and re-open your terminal\n", 49 | "5. 
If the installation is successful, you should see a list of installed packages with\n", 50 | " - `conda list`\n", 51 | "If the command cannot be found, you can add the Miniconda bin directory to the path using:\n", 52 | " ` export PATH=~/miniconda3/bin:$PATH`\n", 53 | "\n", 54 | "For reproducible analysis, you can [create a conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) with all the Python packages you used.\n", 55 | "\n", 56 | " `conda create --name bioinf python jupyter`\n", 57 | " \n", 58 | "To activate the conda environment:\n", 59 | " `source activate bioinf`\n", 60 | "\n", 61 | "Having set up the conda environment, you can install any package you need using conda: \n", 62 | "\n", 63 | "`conda install jupyter`\n", 64 | "`conda install -c conda-forge jupyterlab`\n", 65 | "\n", 66 | "or by using pip\n", 67 | "\n", 68 | "`pip3 install jupyter`\n", 69 | "\n", 70 | "Then you can quickly launch it using:\n", 71 | "\n", 72 | "`jupyter notebook` or `jupyter lab`\n", 73 | "\n", 74 | "NB: We will use JupyterLab for the training. \n", 75 | "\n", 76 | "\n", 77 | "A Jupyter notebook is made up of many cells. Each cell can contain Python code. You can execute a cell by clicking on it and pressing `Shift-Enter` or `Ctrl-Enter` (run without moving to the next cell). \n", 78 | "\n", 79 | "### Log in to the web server\n", 80 | "\n", 81 | "The easiest way for EANBiT course participants to run this and other notebooks is to log in to the Jupyter server (unfortunately, this is not currently working). The steps for running notebooks are:\n", 82 | "* Log in using the username and password assigned to you. 
The first time you log in an empty account will automatically be set up for you.\n", 83 | "* Press the start button (if prompted by the system)\n", 84 | "* Use the menu of the jupyter system to upload a .ipynb python notebook file or to start a new notebook.\n", 85 | "\n", 86 | "### Further help\n", 87 | "\n", 88 | "To learn more about Jupyter notebooks, check [the official introduction](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb) and [some useful Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/). \n", 89 | "\n", 90 | "Book: http://www.ict.ru.ac.za/Resources/cspw/thinkcspy3/thinkcspy3.pdf\n", 91 | "\n", 92 | "# Python for Bioinformatics\n", 93 | "\n", 94 | "## Introduction\n", 95 | "\n", 96 | "Python is a modern, robust, high-level programming language. It is straightforward to pick up even if you are entirely new to programming. \n", 97 | "\n", 98 | "Python, similar to other languages like Matlab or R, is interpreted hence runs slowly compared to C++, Fortran or Java. However, writing programs in Python is very quick. Python has an extensive collection of libraries for everything from scientific computing to web services. It caters for object-oriented and functional programming with a module system that allows large and complex applications to be developed in Python. \n", 99 | "\n", 100 | "These lectures are using Jupyter notebooks which mix Python code with documentation. 
The python notebooks can be run on a web server or stand-alone on a computer.\n", 101 | "\n", 102 | "\n", 103 | "## Contents\n", 104 | "\n", 105 | "This course is broken up into a number of notebooks (lectures).\n", 106 | "### Session 1\n", 107 | "* [00](Intro-to-Python/00.ipynb) This introduction with additional information below on how to get started in running python\n", 108 | "* [01](Intro-to-Python/01.ipynb) Basic data types and operations (numbers, strings) \n", 109 | "\n", 110 | "### Session 2\n", 111 | "* [02](Intro-to-Python/02.ipynb) String manipulation \n", 112 | "* [03](Intro-to-Python/03.ipynb) Data structures: Lists and Tuples\n", 113 | "* [04](Intro-to-Python/04.ipynb) Data structures (continued): dictionaries\n", 114 | "\n", 115 | "### Session 3\n", 116 | "* [05](Intro-to-Python/05.ipynb) Control statements: if, for, while, try statements\n", 117 | "* [06](Intro-to-Python/06.ipynb) Functions\n", 118 | "* [07](Intro-to-Python/07.ipynb) Scripting with python\n", 119 | "* [08](Intro-to-Python/08.ipynb) Data Analysis and plotting with Pandas\n", 120 | "* [09](Intro-to-Python/09.ipynb) Reproducible Bioinformatics Research\n", 121 | "\n", 122 | "This is a tutorial style introduction to Python. For a quick reminder/summary of Python syntax, the following [Quick Reference Card](http://www.cs.put.poznan.pl/csobaniec/software/python/py-qrc.html) may be useful. A longer and more detailed tutorial style introduction to python is available from the python site at: https://docs.python.org/3/tutorial/.\n", 123 | "\n", 124 | "\n", 125 | "## How to learn from this resource?\n", 126 | "\n", 127 | "Download all the notebooks from [Python4Bioinformatics](https://github.com/kipkurui/Python4Bioinformatics). 
The easiest way to do that is to clone the GitHub repository to your working directory using any of the following commands:\n", 128 | "\n", 129 | " git clone https://github.com/kipkurui/Python4Bioinformatics.git\n", 130 | "\n", 131 | "or\n", 132 | "\n", 133 | " wget https://github.com/kipkurui/Python4Bioinformatics/archive/master.zip\n", 134 | " \n", 135 | " unzip master.zip\n", 136 | " \n", 137 | " rm master.zip\n", 138 | " \n", 139 | "\n", 140 | "## How to Contribute\n", 141 | "\n", 142 | "To contribute, fork the repository, make some updates and send me a pull request. \n", 143 | "\n", 144 | "Alternatively, you can open an issue. \n", 145 | "\n", 146 | "## License\n", 147 | "This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/.\n" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [] 156 | } 157 | ], 158 | "metadata": { 159 | "kernelspec": { 160 | "display_name": "Python 3", 161 | "language": "python", 162 | "name": "python3" 163 | }, 164 | "language_info": { 165 | "codemirror_mode": { 166 | "name": "ipython", 167 | "version": 3 168 | }, 169 | "file_extension": ".py", 170 | "mimetype": "text/x-python", 171 | "name": "python", 172 | "nbconvert_exporter": "python", 173 | "pygments_lexer": "ipython3", 174 | "version": "3.6.5" 175 | } 176 | }, 177 | "nbformat": 4, 178 | "nbformat_minor": 2 179 | } 180 | -------------------------------------------------------------------------------- /Intro-to-Python/01.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "All of these python notebooks are available at https://github.com/kipkurui/Python4Bioinformatics\n", 9 | "" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": 
{}, 15 | "source": [ 16 | "# Getting started\n", 17 | "\n", 18 | "Python can be used as a calculator. Simply type in expressions to get them evaluated.\n", 19 | "\n", 20 | "## Basic syntax for statements \n", 21 | "The basic rules for writing simple statements and expressions in Python are:\n", 22 | "\n", 23 | "* No spaces or tab characters allowed at the start of a statement: Indentation plays a special role in Python (see the section on control statements). For now, simply ensure that all statements start at the beginning of the line.\n", 24 | "* The '#' character indicates that the rest of the line is a comment\n", 25 | "* Statements finish at the end of the line:\n", 26 | "* Except when there is an open bracket or parenthesis:\n", 27 | "\n", 28 | "```python\n", 29 | "1+2\n", 30 | "+3 #illegal continuation of the sum\n", 31 | "\n", 32 | "(1+2\n", 33 | " + 3) # perfectly OK even with spaces\n", 34 | "```\n", 35 | "* A single backslash at the end of the line can also be used to indicate that a statement is still incomplete \n", 36 | "```python\n", 37 | "1 + \\\n", 38 | "2 + 3 # this is also OK\n", 39 | "```\n", 40 | "The jupyter notebook system for writing Python intersperses text (like this) with Python statements. 
Try typing something into the cell (box) below and press the 'run cell' button above (triangle+line symbol) to execute it.\n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 1, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "data": { 50 | "text/plain": [ 51 | "8" 52 | ] 53 | }, 54 | "execution_count": 1, 55 | "metadata": {}, 56 | "output_type": "execute_result" 57 | } 58 | ], 59 | "source": [ 60 | "(1+3\n", 61 | " +4)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 2, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "8" 78 | ] 79 | }, 80 | "execution_count": 2, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "1+3\\\n", 87 | "+4" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 3, 93 | "metadata": {}, 94 | "outputs": [ 95 | { 96 | "data": { 97 | "text/plain": [ 98 | "6" 99 | ] 100 | }, 101 | "execution_count": 3, 102 | "metadata": {}, 103 | "output_type": "execute_result" 104 | } 105 | ], 106 | "source": [ 107 | "1+2+3 #doing math" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "### Your First Code" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 4, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | "Hello World!\n" 127 | ] 128 | } 129 | ], 130 | "source": [ 131 | "print(\"Hello World!\")" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 5, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "My name is Caleb\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "print('My name is Caleb')" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | 
"source": [ 155 | "Notice the syntax. `print` is a built-in function, and 'Hello World!' is a string. In Bioinformatics, an example of a string would be a DNA or amino acid sequence. \n", 156 | "\n", 157 | "For example:" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 6, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "ACGTACTAG\n" 170 | ] 171 | } 172 | ], 173 | "source": [ 174 | "print('ACGTACTAG')" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 7, 180 | "metadata": {}, 181 | "outputs": [ 182 | { 183 | "data": { 184 | "text/plain": [ 185 | "'Caleb'" 186 | ] 187 | }, 188 | "execution_count": 7, 189 | "metadata": {}, 190 | "output_type": "execute_result" 191 | } 192 | ], 193 | "source": [ 194 | "input('What is your name?')" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "You can write a program that asks for a DNA sequence, and prints it out. This is as simple as:" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 8, 207 | "metadata": {}, 208 | "outputs": [ 209 | { 210 | "name": "stdout", 211 | "output_type": "stream", 212 | "text": [ 213 | "ACGTATAGCA\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "print(input(\"Please Enter a DNA sequence: \"))" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 9, 224 | "metadata": {}, 225 | "outputs": [ 226 | { 227 | "name": "stdout", 228 | "output_type": "stream", 229 | "text": [ 230 | "wxtylkas\n" 231 | ] 232 | } 233 | ], 234 | "source": [ 235 | "print(input(\"Please Enter aa sequence: \"))" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "As we go along, we'll learn how to check if the user has entered a valid DNA or amino acid sequence. For now, let's learn some basics. 
" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "### Getting help\n", 250 | "Python has extensive help built in. You can execute **help()** for an interactive help session or **help(x)** for any library, object or type **x** to get more information. For example:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 10, 256 | "metadata": {}, 257 | "outputs": [ 258 | { 259 | "name": "stdout", 260 | "output_type": "stream", 261 | "text": [ 262 | "\n", 263 | "Welcome to Python 3.6's help utility!\n", 264 | "\n", 265 | "If this is your first time using Python, you should definitely check out\n", 266 | "the tutorial on the Internet at https://docs.python.org/3.6/tutorial/.\n", 267 | "\n", 268 | "Enter the name of any module, keyword, or topic to get help on writing\n", 269 | "Python programs and using Python modules. To quit this help utility and\n", 270 | "return to the interpreter, just type \"quit\".\n", 271 | "\n", 272 | "To get a list of available modules, keywords, symbols, or topics, type\n", 273 | "\"modules\", \"keywords\", \"symbols\", or \"topics\". 
Each module also comes\n", 274 | "with a one-line summary of what it does; to list the modules whose name\n", 275 | "or summary contain a given string such as \"spam\", type \"modules spam\".\n", 276 | "\n", 277 | "Help on built-in function print in module builtins:\n", 278 | "\n", 279 | "print(...)\n", 280 | " print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n", 281 | " \n", 282 | " Prints the values to a stream, or to sys.stdout by default.\n", 283 | " Optional keyword arguments:\n", 284 | " file: a file-like object (stream); defaults to the current sys.stdout.\n", 285 | " sep: string inserted between values, default a space.\n", 286 | " end: string appended after the last value, default a newline.\n", 287 | " flush: whether to forcibly flush the stream.\n", 288 | "\n", 289 | "\n", 290 | "You are now leaving help and returning to the Python interpreter.\n", 291 | "If you want to ask for help on a particular object directly from the\n", 292 | "interpreter, you can type \"help(object)\". Executing \"help('string')\"\n", 293 | "has the same effect as typing a particular string at the help> prompt.\n" 294 | ] 295 | } 296 | ], 297 | "source": [ 298 | "help()" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "In the interactive session above, enter **print**. 
Alternatively, you can obtain the same information by typing:" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 11, 311 | "metadata": {}, 312 | "outputs": [ 313 | { 314 | "data": { 315 | "text/plain": [ 316 | "'Caleb'" 317 | ] 318 | }, 319 | "execution_count": 11, 320 | "metadata": {}, 321 | "output_type": "execute_result" 322 | } 323 | ], 324 | "source": [ 325 | "input(prompt='Enter your name: ')" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": 12, 331 | "metadata": {}, 332 | "outputs": [ 333 | { 334 | "name": "stdout", 335 | "output_type": "stream", 336 | "text": [ 337 | "Help on built-in function print in module builtins:\n", 338 | "\n", 339 | "print(...)\n", 340 | " print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n", 341 | " \n", 342 | " Prints the values to a stream, or to sys.stdout by default.\n", 343 | " Optional keyword arguments:\n", 344 | " file: a file-like object (stream); defaults to the current sys.stdout.\n", 345 | " sep: string inserted between values, default a space.\n", 346 | " end: string appended after the last value, default a newline.\n", 347 | " flush: whether to forcibly flush the stream.\n", 348 | "\n" 349 | ] 350 | } 351 | ], 352 | "source": [ 353 | "help(print)" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "You can also print many items, and specify a separator. This is the delimiter. Below, we are using tab. 
" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 13, 366 | "metadata": {}, 367 | "outputs": [ 368 | { 369 | "name": "stdout", 370 | "output_type": "stream", 371 | "text": [ 372 | "Name\tID\tAge\tGender\n" 373 | ] 374 | } 375 | ], 376 | "source": [ 377 | "print('Name', 'ID', 'Age', 'Gender', sep='\\t')" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "# Variables & Values" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "A name that refers to a value is called a variable. In Python, there is no separate declaration step: a variable is created when a value is first assigned to it, as follows:" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 14, 397 | "metadata": {}, 398 | "outputs": [], 399 | "source": [ 400 | "a = 2+3" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 15, 406 | "metadata": {}, 407 | "outputs": [ 408 | { 409 | "data": { 410 | "text/plain": [ 411 | "5" 412 | ] 413 | }, 414 | "execution_count": 15, 415 | "metadata": {}, 416 | "output_type": "execute_result" 417 | } 418 | ], 419 | "source": [ 420 | "a" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 16, 426 | "metadata": {}, 427 | "outputs": [ 428 | { 429 | "name": "stdout", 430 | "output_type": "stream", 431 | "text": [ 432 | "7 Hey\n" 433 | ] 434 | } 435 | ], 436 | "source": [ 437 | "x = 2 # anything after a '#' is a comment\n", 438 | "y = 5\n", 439 | "xy = 'Hey'\n", 440 | "print(x+y, xy)" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "Multiple variables can be assigned the same value in a single statement." 
448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 17, 453 | "metadata": {}, 454 | "outputs": [ 455 | { 456 | "name": "stdout", 457 | "output_type": "stream", 458 | "text": [ 459 | "1 1\n" 460 | ] 461 | } 462 | ], 463 | "source": [ 464 | "x = y = 1\n", 465 | "print(x,y)" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "To understand how Python assigns variables we will use: http://www.pythontutor.com/visualize.html. We'll use this to visualize what goes on behind the scenes as you assign a value to a variable. " 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "### Datatypes\n", 480 | "The basic types built into Python include `float` (floating point numbers), `int` (integers), `str` (unicode character strings) and `bool` (boolean). Some examples of each:\n", 481 | "\n", 482 | "#### Integers\n", 483 | "\n", 484 | "Their type is `int`, and they can have as many digits as you want." 
485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 18, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "data": { 494 | "text/plain": [ 495 | "1" 496 | ] 497 | }, 498 | "execution_count": 18, 499 | "metadata": {}, 500 | "output_type": "execute_result" 501 | } 502 | ], 503 | "source": [ 504 | "1 # a simple integer" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 19, 510 | "metadata": {}, 511 | "outputs": [ 512 | { 513 | "data": { 514 | "text/plain": [ 515 | "-12" 516 | ] 517 | }, 518 | "execution_count": 19, 519 | "metadata": {}, 520 | "output_type": "execute_result" 521 | } 522 | ], 523 | "source": [ 524 | "-12 # a negative integer" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 20, 530 | "metadata": {}, 531 | "outputs": [ 532 | { 533 | "data": { 534 | "text/plain": [ 535 | "123" 536 | ] 537 | }, 538 | "execution_count": 20, 539 | "metadata": {}, 540 | "output_type": "execute_result" 541 | } 542 | ], 543 | "source": [ 544 | "+123 # a positive integer" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": null, 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "#### String\n", 559 | "\n", 560 | "A string is enclosed in a pair of single or double quotes." 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 21, 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "data": { 570 | "text/plain": [ 571 | "str" 572 | ] 573 | }, 574 | "execution_count": 21, 575 | "metadata": {}, 576 | "output_type": "execute_result" 577 | } 578 | ], 579 | "source": [ 580 | "dna=\"ATCGTAGTACGGTA\"\n", 581 | "type(dna)" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "When you have long strings, you can enclose them in triple double quotes. This allows for spaces and new lines. 
" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 22, 594 | "metadata": {}, 595 | "outputs": [ 596 | { 597 | "name": "stdout", 598 | "output_type": "stream", 599 | "text": [ 600 | "MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS\n", 601 | "LFPGEELK SLLKKTPDVV KAFPLLLAVR DESISLLD\n" 602 | ] 603 | }, 604 | { 605 | "data": { 606 | "text/plain": [ 607 | "str" 608 | ] 609 | }, 610 | "execution_count": 22, 611 | "metadata": {}, 612 | "output_type": "execute_result" 613 | } 614 | ], 615 | "source": [ 616 | "aa = \"\"\"MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS\n", 617 | "LFPGEELK SLLKKTPDVV KAFPLLLAVR DESISLLD\"\"\"\n", 618 | "\n", 619 | "print(aa)\n", 620 | "\n", 621 | "type(aa)" 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "#### Float\n", 629 | "\n", 630 | "Used to represent floating point numbers, which always have a decimal point and a number afterward. " 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": 23, 636 | "metadata": {}, 637 | "outputs": [ 638 | { 639 | "data": { 640 | "text/plain": [ 641 | "2.0" 642 | ] 643 | }, 644 | "execution_count": 23, 645 | "metadata": {}, 646 | "output_type": "execute_result" 647 | } 648 | ], 649 | "source": [ 650 | "2.0 # a simple floating point number" 651 | ] 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": {}, 656 | "source": [ 657 | "#### Booleans\n", 658 | "\n", 659 | "There are only two Boolean values: True and False." 
660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": 24, 665 | "metadata": {}, 666 | "outputs": [ 667 | { 668 | "data": { 669 | "text/plain": [ 670 | "True" 671 | ] 672 | }, 673 | "execution_count": 24, 674 | "metadata": {}, 675 | "output_type": "execute_result" 676 | } 677 | ], 678 | "source": [ 679 | "True or False # the two possible boolean values" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": 25, 685 | "metadata": {}, 686 | "outputs": [ 687 | { 688 | "data": { 689 | "text/plain": [ 690 | "True" 691 | ] 692 | }, 693 | "execution_count": 25, 694 | "metadata": {}, 695 | "output_type": "execute_result" 696 | } 697 | ], 698 | "source": [ 699 | "'AT' in dna" 700 | ] 701 | }, 702 | { 703 | "cell_type": "markdown", 704 | "metadata": {}, 705 | "source": [ 706 | "# Operators" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "## Arithmetic Operators" 714 | ] 715 | }, 716 | { 717 | "cell_type": "markdown", 718 | "metadata": {}, 719 | "source": [ 720 | "| Symbol | Task Performed |\n", 721 | "|----|---|\n", 722 | "| + | Addition |\n", 723 | "| - | Subtraction |\n", 724 | "| / | division |\n", 725 | "| % | mod |\n", 726 | "| * | multiplication |\n", 727 | "| // | floor division |\n", 728 | "| ** | to the power of |\n", 729 | "\n", 730 | "When one of the numbers in the operation is a float, the result is also a float. 
" 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": 26, 736 | "metadata": {}, 737 | "outputs": [ 738 | { 739 | "data": { 740 | "text/plain": [ 741 | "3.0" 742 | ] 743 | }, 744 | "execution_count": 26, 745 | "metadata": {}, 746 | "output_type": "execute_result" 747 | } 748 | ], 749 | "source": [ 750 | "2.0 + 1" 751 | ] 752 | }, 753 | { 754 | "cell_type": "code", 755 | "execution_count": 27, 756 | "metadata": {}, 757 | "outputs": [ 758 | { 759 | "data": { 760 | "text/plain": [ 761 | "3" 762 | ] 763 | }, 764 | "execution_count": 27, 765 | "metadata": {}, 766 | "output_type": "execute_result" 767 | } 768 | ], 769 | "source": [ 770 | "1+2" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": 28, 776 | "metadata": {}, 777 | "outputs": [ 778 | { 779 | "data": { 780 | "text/plain": [ 781 | "1" 782 | ] 783 | }, 784 | "execution_count": 28, 785 | "metadata": {}, 786 | "output_type": "execute_result" 787 | } 788 | ], 789 | "source": [ 790 | "2-1" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": 29, 796 | "metadata": {}, 797 | "outputs": [ 798 | { 799 | "data": { 800 | "text/plain": [ 801 | "2" 802 | ] 803 | }, 804 | "execution_count": 29, 805 | "metadata": {}, 806 | "output_type": "execute_result" 807 | } 808 | ], 809 | "source": [ 810 | "1*2" 811 | ] 812 | }, 813 | { 814 | "cell_type": "code", 815 | "execution_count": 30, 816 | "metadata": {}, 817 | "outputs": [ 818 | { 819 | "data": { 820 | "text/plain": [ 821 | "0.75" 822 | ] 823 | }, 824 | "execution_count": 30, 825 | "metadata": {}, 826 | "output_type": "execute_result" 827 | } 828 | ], 829 | "source": [ 830 | "3/4" 831 | ] 832 | }, 833 | { 834 | "cell_type": "markdown", 835 | "metadata": {}, 836 | "source": [ 837 | "In many languages (and older versions of python) 1/2 = 0 (truncated division). 
In Python 3 this behaviour is captured by a separate floor-division operator that rounds down (i.e. a // b $=\\lfloor \\frac{a}{b}\\rfloor$). If either operand is a float, the result is a float too:" 838 | ] 839 | }, 840 | { 841 | "cell_type": "code", 842 | "execution_count": 31, 843 | "metadata": {}, 844 | "outputs": [ 845 | { 846 | "data": { 847 | "text/plain": [ 848 | "0.0" 849 | ] 850 | }, 851 | "execution_count": 31, 852 | "metadata": {}, 853 | "output_type": "execute_result" 854 | } 855 | ], 856 | "source": [ 857 | "3//4.0" 858 | ] 859 | }, 860 | { 861 | "cell_type": "markdown", 862 | "metadata": {}, 863 | "source": [ 864 | "The modulo operator `%` returns the remainder after division." 865 | ] 866 | }, 867 | { 868 | "cell_type": "code", 869 | "execution_count": 32, 870 | "metadata": {}, 871 | "outputs": [ 872 | { 873 | "data": { 874 | "text/plain": [ 875 | "5" 876 | ] 877 | }, 878 | "execution_count": 32, 879 | "metadata": {}, 880 | "output_type": "execute_result" 881 | } 882 | ], 883 | "source": [ 884 | "15%10" 885 | ] 886 | }, 887 | { 888 | "cell_type": "markdown", 889 | "metadata": {}, 890 | "source": [ 891 | "Python integers have arbitrary precision (their size is limited only by available memory), while floating point numbers are double precision and overflow once a result exceeds their range:" 892 | ] 893 | }, 894 | { 895 | "cell_type": "code", 896 | "execution_count": 33, 897 | "metadata": {}, 898 | "outputs": [ 899 | { 900 | "data": { 901 | "text/plain": [ 902 | "2617010996188399907017032528972038342491649416953000260240805955827972056685382434497090341496787032585738884786745286700473999847280664191731008874811751310888591786111994678208920175143911761181424495660877950654145066969036252669735483098936884016471326487403792787648506879212630637101259246005701084327338001" 903 | ] 904 | }, 905 | "execution_count": 33, 906 | "metadata": {}, 907 | "output_type": "execute_result" 908 | } 909 | ], 910 | "source": [ 911 | "11**300" 912 | ] 913 | }, 914 | { 915 | "cell_type": "code", 916 | "execution_count": 34, 917 | "metadata": {}, 918 | "outputs": [ 919 | { 920 | "ename": "OverflowError", 921 | "evalue": 
"(34, 'Numerical result out of range')", 922 | "output_type": "error", 923 | "traceback": [ 924 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 925 | "\u001b[0;31mOverflowError\u001b[0m Traceback (most recent call last)", 926 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m11.0\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0;36m300\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 927 | "\u001b[0;31mOverflowError\u001b[0m: (34, 'Numerical result out of range')" 928 | ] 929 | } 930 | ], 931 | "source": [ 932 | "11.0**300" 933 | ] 934 | }, 935 | { 936 | "cell_type": "markdown", 937 | "metadata": {}, 938 | "source": [ 939 | "## Relational Operators" 940 | ] 941 | }, 942 | { 943 | "cell_type": "markdown", 944 | "metadata": {}, 945 | "source": [ 946 | "| Symbol | Task Performed |\n", 947 | "|----|---|\n", 948 | "| == | True, if it is equal |\n", 949 | "| != | True, if not equal to |\n", 950 | "| < | less than |\n", 951 | "| > | greater than |\n", 952 | "| <= | less than or equal to |\n", 953 | "| >= | greater than or equal to |\n", 954 | "\n", 955 | "Note the difference between `==` (equality test) and `=` (assignment)" 956 | ] 957 | }, 958 | { 959 | "cell_type": "code", 960 | "execution_count": null, 961 | "metadata": {}, 962 | "outputs": [], 963 | "source": [ 964 | "z = 2\n", 965 | "z == 2" 966 | ] 967 | }, 968 | { 969 | "cell_type": "code", 970 | "execution_count": null, 971 | "metadata": {}, 972 | "outputs": [], 973 | "source": [ 974 | "z > 2" 975 | ] 976 | }, 977 | { 978 | "cell_type": "markdown", 979 | "metadata": {}, 980 | "source": [ 981 | "Comparisons can also be chained in the mathematically obvious way. 
The following will work as expected in Python (but not in other languages like C/C++):" 982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": null, 987 | "metadata": {}, 988 | "outputs": [], 989 | "source": [ 990 | "0.5 < z <= 1" 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": {}, 996 | "source": [ 997 | "### String Operations\n", 998 | "\n", 999 | "Four binary operators act on strings: **in**, **not in**, **+**, and **\\***\n", 1000 | "\n", 1001 | "Let's use the mitochondrial tRNA (NCBI Reference Sequence: NC_012920.1) to practice with string operations:" 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": null, 1007 | "metadata": {}, 1008 | "outputs": [], 1009 | "source": [ 1010 | "trna='AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'" 1011 | ] 1012 | }, 1013 | { 1014 | "cell_type": "code", 1015 | "execution_count": null, 1016 | "metadata": {}, 1017 | "outputs": [], 1018 | "source": [ 1019 | "# We can check if a given motif is in the sequence\n", 1020 | "\n", 1021 | "'ATTAA' in trna" 1022 | ] 1023 | }, 1024 | { 1025 | "cell_type": "code", 1026 | "execution_count": null, 1027 | "metadata": {}, 1028 | "outputs": [], 1029 | "source": [ 1030 | "# We can also check if a given motif is absent\n", 1031 | "\n", 1032 | "'GGCTGTT' not in trna" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": null, 1038 | "metadata": {}, 1039 | "outputs": [], 1040 | "source": [ 1041 | "# We can concatenate two strings\n", 1042 | "\n", 1043 | "'ATTAA' + 'GGCTGTT'" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "execution_count": null, 1049 | "metadata": {}, 1050 | "outputs": [], 1051 | "source": [ 1052 | "# Create a long string from a substring by multiplying with an integer\n", 1053 | "'GGCTGTT' * 4" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "markdown", 1058 | "metadata": {}, 1059 | "source": [ 1060 | 
"We'll continue with string formatting in the next lecture. " 1061 | ] 1062 | }, 1063 | { 1064 | "cell_type": "code", 1065 | "execution_count": null, 1066 | "metadata": {}, 1067 | "outputs": [], 1068 | "source": [] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": null, 1073 | "metadata": {}, 1074 | "outputs": [], 1075 | "source": [] 1076 | } 1077 | ], 1078 | "metadata": { 1079 | "kernelspec": { 1080 | "display_name": "Python 3", 1081 | "language": "python", 1082 | "name": "python3" 1083 | }, 1084 | "language_info": { 1085 | "codemirror_mode": { 1086 | "name": "ipython", 1087 | "version": 3 1088 | }, 1089 | "file_extension": ".py", 1090 | "mimetype": "text/x-python", 1091 | "name": "python", 1092 | "nbconvert_exporter": "python", 1093 | "pygments_lexer": "ipython3", 1094 | "version": "3.6.5" 1095 | } 1096 | }, 1097 | "nbformat": 4, 1098 | "nbformat_minor": 2 1099 | } 1100 | -------------------------------------------------------------------------------- /Intro-to-Python/02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "All of these Python notebooks are available at https://github.com/kipkurui/Python4Bioinformatics" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "# Working with strings\n", 16 | "\n", 17 | "## The Print Statement" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "As seen previously, the **print()** function prints all of its arguments as strings, separated by spaces and followed by a line break:\n", 25 | "\n", 26 | " - print(\"Hello World\")\n", 27 | " - print(\"Hello\",'World')\n", 28 | " - print(\"Hello\", )\n", 29 | "\n", 30 | "Note that **print** is different in old versions of Python (2.7), where it was a statement and did not need parentheses around its arguments."
31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "Hello World\n" 43 | ] 44 | } 45 | ], 46 | "source": [ 47 | "print(\"Hello\",\"World\")" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "The print function has some optional arguments to control where and how to print. These include `sep`, the separator (default: a space); `end`, the string appended after the last value (default: a newline); and `file`, the stream to write to." 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 1, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "dna=\"ACGTATA\"" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 2, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "\u001b[0;31mDocstring:\u001b[0m\n", 75 | "S.count(sub[, start[, end]]) -> int\n", 76 | "\n", 77 | "Return the number of non-overlapping occurrences of substring sub in\n", 78 | "string S[start:end]. Optional arguments start and end are\n", 79 | "interpreted as in slice notation.\n", 80 | "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n" 81 | ] 82 | }, 83 | "metadata": {}, 84 | "output_type": "display_data" 85 | } 86 | ], 87 | "source": [ 88 | "?dna.count()" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 2, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "Hello...World!!" 101 | ] 102 | } 103 | ], 104 | "source": [ 105 | "print(\"Hello\",\"World\",sep='...',end='!!')" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "You can view the additional arguments and usage help for print, or for any other function, by putting a `?` before its name. 
" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 3, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/plain": [ 123 | "\u001b[0;31mDocstring:\u001b[0m\n", 124 | "print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n", 125 | "\n", 126 | "Prints the values to a stream, or to sys.stdout by default.\n", 127 | "Optional keyword arguments:\n", 128 | "file: a file-like object (stream); defaults to the current sys.stdout.\n", 129 | "sep: string inserted between values, default a space.\n", 130 | "end: string appended after the last value, default a newline.\n", 131 | "flush: whether to forcibly flush the stream.\n", 132 | "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n" 133 | ] 134 | }, 135 | "metadata": {}, 136 | "output_type": "display_data" 137 | } 138 | ], 139 | "source": [ 140 | "?print()" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "## String Formatting\n", 148 | "\n", 149 | "There are lots of methods for formatting and manipulating strings built into python. Some of these are illustrated here.\n", 150 | "\n", 151 | "String concatenation is the \"addition\" of two strings. Observe that while concatenating there will be no space between the strings." 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 9, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "Hello World!267.0\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "string1='World'\n", 169 | "string2='!'\n", 170 | "print('Hello' + \" \"+ string1 + string2 + str(267.00))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "The **%** operator is used to format a string inserting the value that comes after. It relies on the string containing a format specifier that identifies where to insert the value. 
The most common types of format specifiers are:\n", 178 | "\n", 179 | " - %s -> String\n", 180 | " - %d -> Integer\n", 181 | " - %f -> Float\n", 182 | " - %o -> Octal\n", 183 | " - %x -> Hexadecimal\n", 184 | " - %e -> Exponential" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 15, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "name": "stdout", 194 | "output_type": "stream", 195 | "text": [ 196 | "Hello World\n", 197 | "Actual Number = 18\n", 198 | "Float of the number = 18.877\n", 199 | "Exponential equivalent of the number = 1.800000e+01\n" 200 | ] 201 | } 202 | ], 203 | "source": [ 204 | "print(\"Hello %s\" % string1)\n", 205 | "print(\"Actual Number = %d\" %18)\n", 206 | "print(\"Float of the number = %.3f\" % 18.87687)\n", 207 | "print(\"Exponential equivalent of the number = %e\" %18)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "When formatting multiple values, they are supplied together in parentheses. Values are inserted in the order they appear in the parentheses (more on tuples in the next lecture)." 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 6, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "name": "stdout", 224 | "output_type": "stream", 225 | "text": [ 226 | "Hello World !. The meaning of life is 42\n" 227 | ] 228 | } 229 | ], 230 | "source": [ 231 | "print(\"Hello %s %s. The meaning of life is %d\" %(string1,string2,42))" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "We can also specify the width of the field and the number of decimal places to be used. 
For example:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 20, 244 | "metadata": {}, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "Print width 10: | my|\n", 251 | "Print width 10: | name|\n", 252 | "The number pi = 3.14 to 2 decimal places\n", 253 | "More space pi = 3.14\n", 254 | "Pad pi with 0 = 0000003.14\n" 255 | ] 256 | } 257 | ], 258 | "source": [ 259 | "print('Print width 10: |%10s|'%'my')\n", 260 | "print('Print width 10: |%10s|'%'name') # right-justified (use %-10s to left-justify)\n", 261 | "print(\"The number pi = %.2f to 2 decimal places\"%3.1415)\n", 262 | "print(\"More space pi = %10.2f\"%3.1415)\n", 263 | "print(\"Pad pi with 0 = %010.2f\"%3.1415) # pad with zeros" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "## Other String Methods" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "Multiplying a string by an integer simply repeats it:" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 8, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "Hello World! Hello World! Hello World! Hello World! Hello World! \n" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "print(\"Hello World! \"*5)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "Strings can be transformed by a variety of methods:\n", 302 | "\n", 303 | "Let's get back to our trna example. 
" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 9, 309 | "metadata": {}, 310 | "outputs": [ 311 | { 312 | "name": "stdout", 313 | "output_type": "stream", 314 | "text": [ 315 | "Hello world\n", 316 | "HELLO WORLD\n", 317 | "hello world\n", 318 | "| Hello World |\n", 319 | "|lots of space|\n", 320 | "Hello Class\n" 321 | ] 322 | } 323 | ], 324 | "source": [ 325 | "s=\"hello wOrld\"\n", 326 | "print(s.capitalize())\n", 327 | "print(s.upper())\n", 328 | "print(s.lower())\n", 329 | "print('|%s|' % \"Hello World\".center(30)) # center in 30 characters\n", 330 | "print('|%s|'% \" lots of space \".strip()) # remove leading and trailing whitespace\n", 331 | "print(\"Hello World\".replace(\"World\",\"Class\"))" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "There are also lots of ways to inspect or check strings. Examples of a few of these are given here:" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 21, 344 | "metadata": {}, 345 | "outputs": [ 346 | { 347 | "name": "stdout", 348 | "output_type": "stream", 349 | "text": [ 350 | "Help on class str in module builtins:\n", 351 | "\n", 352 | "class str(object)\n", 353 | " | str(object='') -> str\n", 354 | " | str(bytes_or_buffer[, encoding[, errors]]) -> str\n", 355 | " | \n", 356 | " | Create a new string object from the given object. 
If encoding or\n", 357 | " | errors is specified, then the object must expose a data buffer\n", 358 | " | that will be decoded using the given encoding and error handler.\n", 359 | " | Otherwise, returns the result of object.__str__() (if defined)\n", 360 | " | or repr(object).\n", 361 | " | encoding defaults to sys.getdefaultencoding().\n", 362 | " | errors defaults to 'strict'.\n", 363 | " | \n", 364 | " | Methods defined here:\n", 365 | " | \n", 366 | " | __add__(self, value, /)\n", 367 | " | Return self+value.\n", 368 | " | \n", 369 | " | __contains__(self, key, /)\n", 370 | " | Return key in self.\n", 371 | " | \n", 372 | " | __eq__(self, value, /)\n", 373 | " | Return self==value.\n", 374 | " | \n", 375 | " | __format__(...)\n", 376 | " | S.__format__(format_spec) -> str\n", 377 | " | \n", 378 | " | Return a formatted version of S as described by format_spec.\n", 379 | " | \n", 380 | " | __ge__(self, value, /)\n", 381 | " | Return self>=value.\n", 382 | " | \n", 383 | " | __getattribute__(self, name, /)\n", 384 | " | Return getattr(self, name).\n", 385 | " | \n", 386 | " | __getitem__(self, key, /)\n", 387 | " | Return self[key].\n", 388 | " | \n", 389 | " | __getnewargs__(...)\n", 390 | " | \n", 391 | " | __gt__(self, value, /)\n", 392 | " | Return self>value.\n", 393 | " | \n", 394 | " | __hash__(self, /)\n", 395 | " | Return hash(self).\n", 396 | " | \n", 397 | " | __iter__(self, /)\n", 398 | " | Implement iter(self).\n", 399 | " | \n", 400 | " | __le__(self, value, /)\n", 401 | " | Return self<=value.\n", 402 | " | \n", 403 | " | __len__(self, /)\n", 404 | " | Return len(self).\n", 405 | " | \n", 406 | " | __lt__(self, value, /)\n", 407 | " | Return self size of S in memory, in bytes\n", 432 | " | \n", 433 | " | __str__(self, /)\n", 434 | " | Return str(self).\n", 435 | " | \n", 436 | " | capitalize(...)\n", 437 | " | S.capitalize() -> str\n", 438 | " | \n", 439 | " | Return a capitalized version of S, i.e. 
make the first character\n", 440 | " | have upper case and the rest lower case.\n", 441 | " | \n", 442 | " | casefold(...)\n", 443 | " | S.casefold() -> str\n", 444 | " | \n", 445 | " | Return a version of S suitable for caseless comparisons.\n", 446 | " | \n", 447 | " | center(...)\n", 448 | " | S.center(width[, fillchar]) -> str\n", 449 | " | \n", 450 | " | Return S centered in a string of length width. Padding is\n", 451 | " | done using the specified fill character (default is a space)\n", 452 | " | \n", 453 | " | count(...)\n", 454 | " | S.count(sub[, start[, end]]) -> int\n", 455 | " | \n", 456 | " | Return the number of non-overlapping occurrences of substring sub in\n", 457 | " | string S[start:end]. Optional arguments start and end are\n", 458 | " | interpreted as in slice notation.\n", 459 | " | \n", 460 | " | encode(...)\n", 461 | " | S.encode(encoding='utf-8', errors='strict') -> bytes\n", 462 | " | \n", 463 | " | Encode S using the codec registered for encoding. Default encoding\n", 464 | " | is 'utf-8'. errors may be given to set a different error\n", 465 | " | handling scheme. Default is 'strict' meaning that encoding errors raise\n", 466 | " | a UnicodeEncodeError. 
Other possible values are 'ignore', 'replace' and\n", 467 | " | 'xmlcharrefreplace' as well as any other name registered with\n", 468 | " | codecs.register_error that can handle UnicodeEncodeErrors.\n", 469 | " | \n", 470 | " | endswith(...)\n", 471 | " | S.endswith(suffix[, start[, end]]) -> bool\n", 472 | " | \n", 473 | " | Return True if S ends with the specified suffix, False otherwise.\n", 474 | " | With optional start, test S beginning at that position.\n", 475 | " | With optional end, stop comparing S at that position.\n", 476 | " | suffix can also be a tuple of strings to try.\n", 477 | " | \n", 478 | " | expandtabs(...)\n", 479 | " | S.expandtabs(tabsize=8) -> str\n", 480 | " | \n", 481 | " | Return a copy of S where all tab characters are expanded using spaces.\n", 482 | " | If tabsize is not given, a tab size of 8 characters is assumed.\n", 483 | " | \n", 484 | " | find(...)\n", 485 | " | S.find(sub[, start[, end]]) -> int\n", 486 | " | \n", 487 | " | Return the lowest index in S where substring sub is found,\n", 488 | " | such that sub is contained within S[start:end]. Optional\n", 489 | " | arguments start and end are interpreted as in slice notation.\n", 490 | " | \n", 491 | " | Return -1 on failure.\n", 492 | " | \n", 493 | " | format(...)\n", 494 | " | S.format(*args, **kwargs) -> str\n", 495 | " | \n", 496 | " | Return a formatted version of S, using substitutions from args and kwargs.\n", 497 | " | The substitutions are identified by braces ('{' and '}').\n", 498 | " | \n", 499 | " | format_map(...)\n", 500 | " | S.format_map(mapping) -> str\n", 501 | " | \n", 502 | " | Return a formatted version of S, using substitutions from mapping.\n", 503 | " | The substitutions are identified by braces ('{' and '}').\n", 504 | " | \n", 505 | " | index(...)\n", 506 | " | S.index(sub[, start[, end]]) -> int\n", 507 | " | \n", 508 | " | Return the lowest index in S where substring sub is found, \n", 509 | " | such that sub is contained within S[start:end]. 
Optional\n", 510 | " | arguments start and end are interpreted as in slice notation.\n", 511 | " | \n", 512 | " | Raises ValueError when the substring is not found.\n", 513 | " | \n", 514 | " | isalnum(...)\n", 515 | " | S.isalnum() -> bool\n", 516 | " | \n", 517 | " | Return True if all characters in S are alphanumeric\n", 518 | " | and there is at least one character in S, False otherwise.\n", 519 | " | \n", 520 | " | isalpha(...)\n", 521 | " | S.isalpha() -> bool\n", 522 | " | \n", 523 | " | Return True if all characters in S are alphabetic\n", 524 | " | and there is at least one character in S, False otherwise.\n", 525 | " | \n", 526 | " | isdecimal(...)\n", 527 | " | S.isdecimal() -> bool\n", 528 | " | \n", 529 | " | Return True if there are only decimal characters in S,\n", 530 | " | False otherwise.\n", 531 | " | \n", 532 | " | isdigit(...)\n", 533 | " | S.isdigit() -> bool\n", 534 | " | \n", 535 | " | Return True if all characters in S are digits\n", 536 | " | and there is at least one character in S, False otherwise.\n", 537 | " | \n", 538 | " | isidentifier(...)\n", 539 | " | S.isidentifier() -> bool\n", 540 | " | \n", 541 | " | Return True if S is a valid identifier according\n", 542 | " | to the language definition.\n", 543 | " | \n", 544 | " | Use keyword.iskeyword() to test for reserved identifiers\n", 545 | " | such as \"def\" and \"class\".\n", 546 | " | \n", 547 | " | islower(...)\n", 548 | " | S.islower() -> bool\n", 549 | " | \n", 550 | " | Return True if all cased characters in S are lowercase and there is\n", 551 | " | at least one cased character in S, False otherwise.\n", 552 | " | \n", 553 | " | isnumeric(...)\n", 554 | " | S.isnumeric() -> bool\n", 555 | " | \n", 556 | " | Return True if there are only numeric characters in S,\n", 557 | " | False otherwise.\n", 558 | " | \n", 559 | " | isprintable(...)\n", 560 | " | S.isprintable() -> bool\n", 561 | " | \n", 562 | " | Return True if all characters in S are considered\n", 563 | " | printable 
in repr() or S is empty, False otherwise.\n", 564 | " | \n", 565 | " | isspace(...)\n", 566 | " | S.isspace() -> bool\n", 567 | " | \n", 568 | " | Return True if all characters in S are whitespace\n", 569 | " | and there is at least one character in S, False otherwise.\n", 570 | " | \n", 571 | " | istitle(...)\n", 572 | " | S.istitle() -> bool\n", 573 | " | \n", 574 | " | Return True if S is a titlecased string and there is at least one\n", 575 | " | character in S, i.e. upper- and titlecase characters may only\n", 576 | " | follow uncased characters and lowercase characters only cased ones.\n", 577 | " | Return False otherwise.\n", 578 | " | \n", 579 | " | isupper(...)\n", 580 | " | S.isupper() -> bool\n", 581 | " | \n", 582 | " | Return True if all cased characters in S are uppercase and there is\n", 583 | " | at least one cased character in S, False otherwise.\n", 584 | " | \n", 585 | " | join(...)\n", 586 | " | S.join(iterable) -> str\n", 587 | " | \n", 588 | " | Return a string which is the concatenation of the strings in the\n", 589 | " | iterable. The separator between elements is S.\n", 590 | " | \n", 591 | " | ljust(...)\n", 592 | " | S.ljust(width[, fillchar]) -> str\n", 593 | " | \n", 594 | " | Return S left-justified in a Unicode string of length width. 
Padding is\n", 595 | " | done using the specified fill character (default is a space).\n", 596 | " | \n", 597 | " | lower(...)\n", 598 | " | S.lower() -> str\n", 599 | " | \n", 600 | " | Return a copy of the string S converted to lowercase.\n", 601 | " | \n", 602 | " | lstrip(...)\n", 603 | " | S.lstrip([chars]) -> str\n", 604 | " | \n", 605 | " | Return a copy of the string S with leading whitespace removed.\n", 606 | " | If chars is given and not None, remove characters in chars instead.\n", 607 | " | \n", 608 | " | partition(...)\n", 609 | " | S.partition(sep) -> (head, sep, tail)\n", 610 | " | \n", 611 | " | Search for the separator sep in S, and return the part before it,\n", 612 | " | the separator itself, and the part after it. If the separator is not\n", 613 | " | found, return S and two empty strings.\n", 614 | " | \n", 615 | " | replace(...)\n", 616 | " | S.replace(old, new[, count]) -> str\n", 617 | " | \n", 618 | " | Return a copy of S with all occurrences of substring\n", 619 | " | old replaced by new. If the optional argument count is\n", 620 | " | given, only the first count occurrences are replaced.\n", 621 | " | \n", 622 | " | rfind(...)\n", 623 | " | S.rfind(sub[, start[, end]]) -> int\n", 624 | " | \n", 625 | " | Return the highest index in S where substring sub is found,\n", 626 | " | such that sub is contained within S[start:end]. Optional\n", 627 | " | arguments start and end are interpreted as in slice notation.\n", 628 | " | \n", 629 | " | Return -1 on failure.\n", 630 | " | \n", 631 | " | rindex(...)\n", 632 | " | S.rindex(sub[, start[, end]]) -> int\n", 633 | " | \n", 634 | " | Return the highest index in S where substring sub is found,\n", 635 | " | such that sub is contained within S[start:end]. 
Optional\n", 636 | " | arguments start and end are interpreted as in slice notation.\n", 637 | " | \n", 638 | " | Raises ValueError when the substring is not found.\n", 639 | " | \n", 640 | " | rjust(...)\n", 641 | " | S.rjust(width[, fillchar]) -> str\n", 642 | " | \n", 643 | " | Return S right-justified in a string of length width. Padding is\n", 644 | " | done using the specified fill character (default is a space).\n", 645 | " | \n", 646 | " | rpartition(...)\n", 647 | " | S.rpartition(sep) -> (head, sep, tail)\n", 648 | " | \n", 649 | " | Search for the separator sep in S, starting at the end of S, and return\n", 650 | " | the part before it, the separator itself, and the part after it. If the\n", 651 | " | separator is not found, return two empty strings and S.\n", 652 | " | \n", 653 | " | rsplit(...)\n", 654 | " | S.rsplit(sep=None, maxsplit=-1) -> list of strings\n", 655 | " | \n", 656 | " | Return a list of the words in S, using sep as the\n", 657 | " | delimiter string, starting at the end of the string and\n", 658 | " | working to the front. If maxsplit is given, at most maxsplit\n", 659 | " | splits are done. If sep is not specified, any whitespace string\n", 660 | " | is a separator.\n", 661 | " | \n", 662 | " | rstrip(...)\n", 663 | " | S.rstrip([chars]) -> str\n", 664 | " | \n", 665 | " | Return a copy of the string S with trailing whitespace removed.\n", 666 | " | If chars is given and not None, remove characters in chars instead.\n", 667 | " | \n", 668 | " | split(...)\n", 669 | " | S.split(sep=None, maxsplit=-1) -> list of strings\n", 670 | " | \n", 671 | " | Return a list of the words in S, using sep as the\n", 672 | " | delimiter string. If maxsplit is given, at most maxsplit\n", 673 | " | splits are done. 
If sep is not specified or is None, any\n", 674 | " | whitespace string is a separator and empty strings are\n", 675 | " | removed from the result.\n", 676 | " | \n", 677 | " | splitlines(...)\n", 678 | " | S.splitlines([keepends]) -> list of strings\n", 679 | " | \n", 680 | " | Return a list of the lines in S, breaking at line boundaries.\n", 681 | " | Line breaks are not included in the resulting list unless keepends\n", 682 | " | is given and true.\n", 683 | " | \n", 684 | " | startswith(...)\n", 685 | " | S.startswith(prefix[, start[, end]]) -> bool\n", 686 | " | \n", 687 | " | Return True if S starts with the specified prefix, False otherwise.\n", 688 | " | With optional start, test S beginning at that position.\n", 689 | " | With optional end, stop comparing S at that position.\n", 690 | " | prefix can also be a tuple of strings to try.\n", 691 | " | \n", 692 | " | strip(...)\n", 693 | " | S.strip([chars]) -> str\n", 694 | " | \n", 695 | " | Return a copy of the string S with leading and trailing\n", 696 | " | whitespace removed.\n", 697 | " | If chars is given and not None, remove characters in chars instead.\n", 698 | " | \n", 699 | " | swapcase(...)\n", 700 | " | S.swapcase() -> str\n", 701 | " | \n", 702 | " | Return a copy of S with uppercase characters converted to lowercase\n", 703 | " | and vice versa.\n", 704 | " | \n", 705 | " | title(...)\n", 706 | " | S.title() -> str\n", 707 | " | \n", 708 | " | Return a titlecased version of S, i.e. words start with title case\n", 709 | " | characters, all remaining cased characters have lower case.\n", 710 | " | \n", 711 | " | translate(...)\n", 712 | " | S.translate(table) -> str\n", 713 | " | \n", 714 | " | Return a copy of the string S in which each character has been mapped\n", 715 | " | through the given translation table. The table must implement\n", 716 | " | lookup/indexing via __getitem__, for instance a dictionary or list,\n", 717 | " | mapping Unicode ordinals to Unicode ordinals, strings, or None. 
If\n", 718 | " | this operation raises LookupError, the character is left untouched.\n", 719 | " | Characters mapped to None are deleted.\n", 720 | " | \n", 721 | " | upper(...)\n", 722 | " | S.upper() -> str\n", 723 | " | \n", 724 | " | Return a copy of S converted to uppercase.\n", 725 | " | \n", 726 | " | zfill(...)\n", 727 | " | S.zfill(width) -> str\n", 728 | " | \n", 729 | " | Pad a numeric string S with zeros on the left, to fill a field\n", 730 | " | of the specified width. The string S is never truncated.\n", 731 | " | \n", 732 | " | ----------------------------------------------------------------------\n", 733 | " | Static methods defined here:\n", 734 | " | \n", 735 | " | maketrans(x, y=None, z=None, /)\n", 736 | " | Return a translation table usable for str.translate().\n", 737 | " | \n", 738 | " | If there is only one argument, it must be a dictionary mapping Unicode\n", 739 | " | ordinals (integers) or characters to Unicode ordinals, strings or None.\n", 740 | " | Character keys will be then converted to ordinals.\n", 741 | " | If there are two arguments, they must be strings of equal length, and\n", 742 | " | in the resulting dictionary, each character in x will be mapped to the\n", 743 | " | character at the same position in y. 
If there is a third argument, it\n", 744 | " | must be a string, whose characters will be mapped to None in the result.\n", 745 | "\n" 746 | ] 747 | } 748 | ], 749 | "source": [ 750 | "help(str)" 751 | ] 752 | }, 753 | { 754 | "cell_type": "code", 755 | "execution_count": 22, 756 | "metadata": {}, 757 | "outputs": [ 758 | { 759 | "name": "stdout", 760 | "output_type": "stream", 761 | "text": [ 762 | "The length of the sequence is 69 nucleotides\n", 763 | "There are 21 'G's but only 9 C's in the sequence\n", 764 | "The \"ATTAA\" motif is at index 14\n" 765 | ] 766 | } 767 | ], 768 | "source": [ 769 | "trna='AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'\n", 770 | "print(\"The length of the sequence is %i\" % len(trna),\"nucleotides\") # len() gives length\n", 771 | "\n", 772 | "# count occurrences of a substring\n", 773 | "print(\"There are %d 'G's but only %d C's in the sequence\" % (trna.count('G'),trna.count('C')))\n", 774 | "print('The \"ATTAA\" motif is at index',trna.find('ATTAA')) # index counted from 0; find() returns -1 if the motif is absent" 775 | ] 776 | }, 777 | { 778 | "cell_type": "markdown", 779 | "metadata": {}, 780 | "source": [ 781 | "### Exercise\n", 782 | "\n", 783 | "Calculate the % GC and % AT content in the trna sequence" 784 | ] 785 | }, 786 | { 787 | "cell_type": "code", 788 | "execution_count": 11, 789 | "metadata": {}, 790 | "outputs": [], 791 | "source": [ 792 | "A_count=trna.count('A')\n", 793 | "C_count=trna.count('C')\n", 794 | "G_count=trna.count('G')\n", 795 | "T_count=trna.count('T')" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": {}, 801 | "source": [ 802 | "## String comparison operations\n", 803 | "Strings can be compared in lexicographical order with the usual comparisons. 
In addition, the `in` operator checks for substrings:" 804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": 12, 809 | "metadata": {}, 810 | "outputs": [ 811 | { 812 | "data": { 813 | "text/plain": [ 814 | "True" 815 | ] 816 | }, 817 | "execution_count": 12, 818 | "metadata": {}, 819 | "output_type": "execute_result" 820 | } 821 | ], 822 | "source": [ 823 | "'abc' < 'bbc' <= 'bbc'" 824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": 13, 829 | "metadata": {}, 830 | "outputs": [ 831 | { 832 | "data": { 833 | "text/plain": [ 834 | "True" 835 | ] 836 | }, 837 | "execution_count": 13, 838 | "metadata": {}, 839 | "output_type": "execute_result" 840 | } 841 | ], 842 | "source": [ 843 | "\"ABC\" in \"This is the ABC of Python\"" 844 | ] 845 | }, 846 | { 847 | "cell_type": "markdown", 848 | "metadata": {}, 849 | "source": [ 850 | "## Accessing parts of strings" 851 | ] 852 | }, 853 | { 854 | "cell_type": "markdown", 855 | "metadata": {}, 856 | "source": [ 857 | "Strings can be indexed with square brackets. Indexing starts from zero in Python. 
" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 14, 863 | "metadata": {}, 864 | "outputs": [ 865 | { 866 | "name": "stdout", 867 | "output_type": "stream", 868 | "text": [ 869 | "First nucleotide of the sequence is A\n", 870 | "Last nucleotide of the sequence is T\n" 871 | ] 872 | } 873 | ], 874 | "source": [ 875 | "s = 'AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTT'\n", 876 | "print('First nucleotide of the sequence is',s[0])\n", 877 | "print('Last nucleotide of the sequence is',s[len(s)-1])" 878 | ] 879 | }, 880 | { 881 | "cell_type": "markdown", 882 | "metadata": {}, 883 | "source": [ 884 | "Negative indices can be used to start counting from the back:" 885 | ] 886 | }, 887 | { 888 | "cell_type": "code", 889 | "execution_count": 15, 890 | "metadata": {}, 891 | "outputs": [ 892 | { 893 | "name": "stdout", 894 | "output_type": "stream", 895 | "text": [ 896 | "First nucleotide of the sequence is A\n", 897 | "Last nucleotide of the sequence is T\n" 898 | ] 899 | } 900 | ], 901 | "source": [ 902 | "print('First nucleotide of the sequence is',s[-len(s)])\n", 903 | "print('Last nucleotide of the sequence is',s[-1])" 904 | ] 905 | }, 906 | { 907 | "cell_type": "markdown", 908 | "metadata": {}, 909 | "source": [ 910 | "#### Slicing\n", 911 | "Finally, a substring (range of characters) can be selected using $a:b$, which specifies the characters at indices $a,a+1,\\ldots,b-1$. Note that the last character is *not* included. 
Now we can find the first two codons in the sequence:" 912 | ] 913 | }, 914 | { 915 | "cell_type": "code", 916 | "execution_count": 16, 917 | "metadata": {}, 918 | "outputs": [ 919 | { 920 | "name": "stdout", 921 | "output_type": "stream", 922 | "text": [ 923 | "First codon in the sequence is AAG\n", 924 | "The second codon in the sequence is GGC\n" 925 | ] 926 | } 927 | ], 928 | "source": [ 929 | "print(\"First codon in the sequence is\",s[0:3])\n", 930 | "print(\"The second codon in the sequence is\",s[3:6])" 931 | ] 932 | }, 933 | { 934 | "cell_type": "markdown", 935 | "metadata": {}, 936 | "source": [ 937 | "An empty beginning or end of the range denotes the beginning or end of the string:" 938 | ] 939 | }, 940 | { 941 | "cell_type": "code", 942 | "execution_count": 17, 943 | "metadata": {}, 944 | "outputs": [ 945 | { 946 | "name": "stdout", 947 | "output_type": "stream", 948 | "text": [ 949 | "First codon in the sequence is AAG\n", 950 | "Last codon in the sequence is CTT\n" 951 | ] 952 | } 953 | ], 954 | "source": [ 955 | "print(\"First codon in the sequence is\", s[:3])\n", 956 | "print(\"Last codon in the sequence is\", s[-3:])" 957 | ] 958 | }, 959 | { 960 | "cell_type": "markdown", 961 | "metadata": {}, 962 | "source": [ 963 | "A colon without any index returns the whole string. " 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": 18, 969 | "metadata": {}, 970 | "outputs": [ 971 | { 972 | "data": { 973 | "text/plain": [ 974 | "'AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTT'" 975 | ] 976 | }, 977 | "execution_count": 18, 978 | "metadata": {}, 979 | "output_type": "execute_result" 980 | } 981 | ], 982 | "source": [ 983 | "s[:]" 984 | ] 985 | }, 986 | { 987 | "cell_type": "markdown", 988 | "metadata": {}, 989 | "source": [ 990 | "## Strings are immutable\n", 991 | "\n", 992 | "It is important to note that strings are constant, immutable values in Python. 
While new strings can easily be created it is not possible to modify a string:" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": 19, 998 | "metadata": {}, 999 | "outputs": [ 1000 | { 1001 | "name": "stdout", 1002 | "output_type": "stream", 1003 | "text": [ 1004 | "creating new string 01X345 OK\n", 1005 | "01X345 still OK\n" 1006 | ] 1007 | }, 1008 | { 1009 | "ename": "TypeError", 1010 | "evalue": "'str' object does not support item assignment", 1011 | "output_type": "error", 1012 | "traceback": [ 1013 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1014 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 1015 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0msX\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreplace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'2'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'X'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# the same thing\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msX\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\"still OK\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'X'\u001b[0m \u001b[0;31m# an error!!!\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1016 | "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" 1017 | ] 1018 | } 1019 | ], 1020 | "source": [ 1021 | "s='012345'\n", 1022 | "sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X\n", 1023 | "print(\"creating new string\",sX,\"OK\")\n", 1024 | "sX=s.replace('2','X') # the same thing\n", 1025 | "print(sX,\"still OK\")\n", 1026 | "s[2] = 'X' # an error!!!" 
1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "markdown", 1031 | "metadata": {}, 1032 | "source": [ 1033 | "### Exercise:\n", 1034 | "\n", 1035 | "1. Given the following amino acid sequence (MNKMDLVADVAEKTDLSKAKATEVIDAVFA), find the first, last and the 5th amino acids in the sequence. \n", 1036 | "2. The above amino acid is a bacterial restriction enzyme that recognizes \"TCCGGA\". Find the first restriction site in the following sequence: AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA" 1037 | ] 1038 | }, 1039 | { 1040 | "cell_type": "code", 1041 | "execution_count": null, 1042 | "metadata": {}, 1043 | "outputs": [], 1044 | "source": [] 1045 | } 1046 | ], 1047 | "metadata": { 1048 | "kernelspec": { 1049 | "display_name": "Python 3", 1050 | "language": "python", 1051 | "name": "python3" 1052 | }, 1053 | "language_info": { 1054 | "codemirror_mode": { 1055 | "name": "ipython", 1056 | "version": 3 1057 | }, 1058 | "file_extension": ".py", 1059 | "mimetype": "text/x-python", 1060 | "name": "python", 1061 | "nbconvert_exporter": "python", 1062 | "pygments_lexer": "ipython3", 1063 | "version": "3.6.5" 1064 | } 1065 | }, 1066 | "nbformat": 4, 1067 | "nbformat_minor": 2 1068 | } 1069 | -------------------------------------------------------------------------------- /Intro-to-Python/04.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Dictionaries" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "Dictionaries are mappings between keys and items stored in the dictionaries. Unlike lists and tuples, dictionaries are unordered. 
Alternatively one can think of dictionaries as sets in which something is stored against every element of the set. They can be defined as follows:" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "To define a dictionary, equate a variable to { } or dict()" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": { 37 | "collapsed": false 38 | }, 39 | "outputs": [ 40 | { 41 | "name": "stdout", 42 | "output_type": "stream", 43 | "text": [ 44 | "<class 'dict'>\n", 45 | "{'abc': 3, 4: 'A string'}\n" 46 | ] 47 | } 48 | ], 49 | "source": [ 50 | "d = dict() # or equivalently d={}\n", 51 | "print(type(d))\n", 52 | "d['abc'] = 3\n", 53 | "d[4] = \"A string\"\n", 54 | "print(d)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "As can be guessed from the output above, dictionaries can be defined by using the `{ key : value }` syntax. The following dictionary has three elements:" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 2, 67 | "metadata": { 68 | "collapsed": false 69 | }, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "3" 75 | ] 76 | }, 77 | "execution_count": 2, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "d = { 1: 'One', 2 : 'Two', 100 : 'Hundred'}\n", 84 | "len(d)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "Now you are able to access 'One' through its key, 1:" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 3, 97 | "metadata": { 98 | "collapsed": false 99 | }, 100 | "outputs": [ 101 | { 102 | "name": "stdout", 103 | "output_type": "stream", 104 | "text": [ 105 | "One\n" 106 | ] 107 | } 108 | ], 109 | "source": [ 110 | "print(d[1])" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "There are a number of alternative ways for 
specifying a dictionary, including as a list of `(key,value)` tuples.\n", 118 | "To illustrate this we will start with two lists and form a list of tuples from them using the **zip()** function.\n", 119 | "Two lists which are related can be merged to form a dictionary." 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 4, 125 | "metadata": { 126 | "collapsed": false 127 | }, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/plain": [ 132 | "[('One', 1), ('Two', 2), ('Three', 3), ('Four', 4), ('Five', 5)]" 133 | ] 134 | }, 135 | "execution_count": 4, 136 | "metadata": {}, 137 | "output_type": "execute_result" 138 | } 139 | ], 140 | "source": [ 141 | "names = ['One', 'Two', 'Three', 'Four', 'Five']\n", 142 | "numbers = [1, 2, 3, 4, 5]\n", 143 | "[ (name,number) for name,number in zip(names,numbers)] # create (name,number) pairs" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "Now we can create a dictionary that maps the name to the number as follows." 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 5, 156 | "metadata": { 157 | "collapsed": false 158 | }, 159 | "outputs": [ 160 | { 161 | "name": "stdout", 162 | "output_type": "stream", 163 | "text": [ 164 | "{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}\n" 165 | ] 166 | } 167 | ], 168 | "source": [ 169 | "a1 = dict((name,number) for name,number in zip(names,numbers))\n", 170 | "print(a1)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "Note that in older versions of Python (before 3.7) the ordering of a dictionary is not guaranteed to follow the order in which elements are added; it may instead depend on the internal hash ordering. 
It is best never to assume an ordering when iterating over elements of a dictionary.\n", 178 | "\n", 179 | "By using tuples as indexes we make a dictionary behave like a sparse matrix:" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 6, 185 | "metadata": { 186 | "collapsed": false 187 | }, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "{(0, 1): 3.5, (2, 17): 0.1, (2, 2): 3.6}\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "matrix={ (0,1): 3.5, (2,17): 0.1}\n", 199 | "matrix[2,2] = matrix[0,1] + matrix[2,17]\n", 200 | "print(matrix)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "Dictionary can also be built using the loop style definition." 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 7, 213 | "metadata": { 214 | "collapsed": false 215 | }, 216 | "outputs": [ 217 | { 218 | "name": "stdout", 219 | "output_type": "stream", 220 | "text": [ 221 | "{'One': 3, 'Two': 3, 'Three': 5, 'Four': 4, 'Five': 4}\n" 222 | ] 223 | } 224 | ], 225 | "source": [ 226 | "a2 = { name : len(name) for name in names}\n", 227 | "print(a2)" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "### Built-in Functions" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "The **len()** function and **in** operator have the obvious meaning:" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 8, 247 | "metadata": { 248 | "collapsed": false 249 | }, 250 | "outputs": [ 251 | { 252 | "name": "stdout", 253 | "output_type": "stream", 254 | "text": [ 255 | "a1 has 5 elements\n", 256 | "One is in a1 True but not Zero False\n" 257 | ] 258 | } 259 | ], 260 | "source": [ 261 | "print(\"a1 has\",len(a1),\"elements\")\n", 262 | "print(\"One is in a1\",'One' in a1,\"but not Zero\", 'Zero' in a1)" 263 | ] 264 | 
}, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "The **clear( )** method erases all elements." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 9, 275 | "metadata": { 276 | "collapsed": false 277 | }, 278 | "outputs": [ 279 | { 280 | "name": "stdout", 281 | "output_type": "stream", 282 | "text": [ 283 | "{}\n" 284 | ] 285 | } 286 | ], 287 | "source": [ 288 | "a2.clear()\n", 289 | "print(a2)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "The **values( )** method returns all the values stored in the dictionary. (Actually not quite a list, but something that we can iterate over just like a list to construct a list, tuple or any other collection):" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 10, 302 | "metadata": { 303 | "collapsed": false 304 | }, 305 | "outputs": [ 306 | { 307 | "data": { 308 | "text/plain": [ 309 | "[1, 2, 3, 4, 5]" 310 | ] 311 | }, 312 | "execution_count": 10, 313 | "metadata": {}, 314 | "output_type": "execute_result" 315 | } 316 | ], 317 | "source": [ 318 | "[ v for v in a1.values() ]" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "The **keys( )** method returns all the keys under which values are stored in the dictionary."
326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": 11, 331 | "metadata": { 332 | "collapsed": false 333 | }, 334 | "outputs": [ 335 | { 336 | "data": { 337 | "text/plain": [ 338 | "{'Five', 'Four', 'One', 'Three', 'Two'}" 339 | ] 340 | }, 341 | "execution_count": 11, 342 | "metadata": {}, 343 | "output_type": "execute_result" 344 | } 345 | ], 346 | "source": [ 347 | "{ k for k in a1.keys() }" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "The **items( )** method returns the contents of the dictionary as `(key, value)` tuples. This is the same as the result that was obtained when the zip function was used, although in Python versions before 3.7 the ordering is not guaranteed to match." 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 12, 360 | "metadata": { 361 | "collapsed": false 362 | }, 363 | "outputs": [ 364 | { 365 | "data": { 366 | "text/plain": [ 367 | "'One = 1, Two = 2, Three = 3, Four = 4, Five = 5'" 368 | ] 369 | }, 370 | "execution_count": 12, 371 | "metadata": {}, 372 | "output_type": "execute_result" 373 | } 374 | ], 375 | "source": [ 376 | "\", \".join( \"%s = %d\" % (name,val) for name,val in a1.items())" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "The **pop( )** method removes the element with the given key and returns its value, which can be assigned to a new variable. Remember that only the value is returned and not the key, because the key is just an index."
384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 13, 389 | "metadata": { 390 | "collapsed": false 391 | }, 392 | "outputs": [ 393 | { 394 | "name": "stdout", 395 | "output_type": "stream", 396 | "text": [ 397 | "{'One': 1, 'Two': 2, 'Three': 3, 'Five': 5}\n", 398 | "Removed 4\n" 399 | ] 400 | } 401 | ], 402 | "source": [ 403 | "val = a1.pop('Four')\n", 404 | "print(a1)\n", 405 | "print(\"Removed\",val)" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "## Exercise\n", 413 | "\n", 414 | "- Using strings, lists, tuples and dictionaries concepts, find the reverse complement of AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [] 423 | } 424 | ], 425 | "metadata": { 426 | "celltoolbar": "Raw Cell Format", 427 | "kernelspec": { 428 | "display_name": "Python 3", 429 | "language": "python", 430 | "name": "python3" 431 | }, 432 | "language_info": { 433 | "codemirror_mode": { 434 | "name": "ipython", 435 | "version": 3 436 | }, 437 | "file_extension": ".py", 438 | "mimetype": "text/x-python", 439 | "name": "python", 440 | "nbconvert_exporter": "python", 441 | "pygments_lexer": "ipython3", 442 | "version": "3.6.5" 443 | } 444 | }, 445 | "nbformat": 4, 446 | "nbformat_minor": 2 447 | } 448 | -------------------------------------------------------------------------------- /Intro-to-Python/05.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "from IPython.display 
import HTML" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "# Control Flow Statements\n", 26 | "The key thing to note about Python's control flow statements and program structure is that they use _indentation_ to mark blocks. Hence the amount of white space (space or tab characters) at the start of a line is very important. This generally helps to make code more readable but can catch out new users of Python." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "## Conditionals\n", 34 | "\n", 35 | "Conditionals in Python allow us to test conditions and change the program behaviour depending on the outcome of the tests. The Booleans `True` and `False` are used in conditionals. \n", 36 | "### If " 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "```python\n", 44 | "if some_condition:\n", 45 | " code block```\n", 46 | " \n", 47 | "Take note of the **:** at the end of the condition. The indented statements that follow are called a\n", 48 | "block. The first unindented statement marks the end of the block. Code is executed in blocks. " 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Hello\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "x = 12\n", 68 | "if x > 10:\n", 69 | " print(\"Hello\")" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### If-else" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "```python\n", 84 | "if some_condition:\n", 85 | " algorithm1\n", 86 | "else:\n", 87 | " algorithm2```\n", 88 | " \n", 89 | " If the condition is True then algorithm1 is executed. If not, algorithm2 under the else clause is executed. 
" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 3, 95 | "metadata": { 96 | "collapsed": false 97 | }, 98 | "outputs": [ 99 | { 100 | "name": "stdout", 101 | "output_type": "stream", 102 | "text": [ 103 | "world\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "x = 12\n", 109 | "if 10 < x < 11:\n", 110 | " print(\"hello\")\n", 111 | "else:\n", 112 | " print(\"world\")" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "### Else if\n", 120 | "\n", 121 | "Sometimes there are more than two possibilities and we need more than two branches. One way to express a computation like that is a **chained conditional**. You can have as many `elif` statements as you'd like, but it must have just one `else` statemet at the end. " 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "```python\n", 129 | "if some_condition: \n", 130 | " algorithm\n", 131 | "elif some_condition:\n", 132 | " algorithm\n", 133 | "else:\n", 134 | " algorithm```" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 4, 140 | "metadata": { 141 | "collapsed": false 142 | }, 143 | "outputs": [ 144 | { 145 | "name": "stdout", 146 | "output_type": "stream", 147 | "text": [ 148 | "x y:\n", 156 | " print(\"x>y\")\n", 157 | "elif x < y:\n", 158 | " print(\"x y:\n", 190 | " print( \"x>y\")\n", 191 | "elif x < y:\n", 192 | " print( \"x " 271 | ], 272 | "text/plain": [ 273 | "" 274 | ] 275 | }, 276 | "metadata": {}, 277 | "output_type": "display_data" 278 | } 279 | ], 280 | "source": [ 281 | "%%html\n", 282 | "" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 8, 288 | "metadata": { 289 | "collapsed": false 290 | }, 291 | "outputs": [ 292 | { 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "a\n", 297 | "b\n", 298 | "c\n", 299 | "total = 14\n" 300 | ] 301 | } 302 | ], 303 | "source": [ 304 | "for ch in 
'abc':\n", 305 | " print(ch)\n", 306 | "total = 0\n", 307 | "for i in range(5):\n", 308 | " total += i\n", 309 | "for i,j in [(1,2),(3,1)]:\n", 310 | " total += i**j\n", 311 | "print(\"total =\",total)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "In the above example, i iterates over the 0,1,2,3,4. Every time it takes each value and executes the algorithm inside the loop. It is also possible to iterate over a nested list illustrated below." 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 9, 324 | "metadata": { 325 | "collapsed": false 326 | }, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "[1, 2, 3]\n", 333 | "[4, 5, 6]\n", 334 | "[7, 8, 9]\n" 335 | ] 336 | } 337 | ], 338 | "source": [ 339 | "list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n", 340 | "for list1 in list_of_lists:\n", 341 | " print(list1)" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "A use case of a nested for loop in this case would be," 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 10, 354 | "metadata": { 355 | "collapsed": false 356 | }, 357 | "outputs": [ 358 | { 359 | "name": "stdout", 360 | "output_type": "stream", 361 | "text": [ 362 | "45\n" 363 | ] 364 | } 365 | ], 366 | "source": [ 367 | "list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n", 368 | "total=0\n", 369 | "for list1 in list_of_lists:\n", 370 | " for x in list1:\n", 371 | " total = total+x\n", 372 | "print(total)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "There are many helper functions that make **for** loops even more powerful and easy to use. 
For example, **enumerate()**, **zip()**, **sorted()** and **reversed()**:" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 11, 385 | "metadata": { 386 | "collapsed": false 387 | }, 388 | "outputs": [ 389 | { 390 | "name": "stdout", 391 | "output_type": "stream", 392 | "text": [ 393 | "reversed: c;b;a;\n", 394 | "enumerated: \n", 395 | "0 = a; 1 = b; 2 = c; \n", 396 | "zip'ed: \n", 397 | "a : x\n", 398 | "b : y\n", 399 | "c : z\n" 400 | ] 401 | } 402 | ], 403 | "source": [ 404 | "print(\"reversed: \",end=\"\")\n", 405 | "for ch in reversed(\"abc\"):\n", 406 | " print(ch,end=\";\")\n", 407 | "print(\"\\nenumerated: \")\n", 408 | "for i,ch in enumerate(\"abc\"):\n", 409 | " print(i,\"=\",ch,end=\"; \")\n", 410 | "print(\"\\nzip'ed: \")\n", 411 | "for a,x in zip(\"abc\",\"xyz\"):\n", 412 | " print(a,\":\",x)" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "### While" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "```python\n", 427 | "while some_condition: \n", 428 | " algorithm```\n", 429 | " \n", 430 | "A while loop checks a condition and continues executing the block until the condition is False. The loop terminates when the condition is no longer met.\n", 431 | "\n", 432 | "In the example below, sometimes the code does not behave as expected in Jupyter Notebook. See the script bank.py. 
" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 12, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "name": "stdout", 442 | "output_type": "stream", 443 | "text": [ 444 | "Your balance is: 50000\n", 445 | "Anything else?\n", 446 | "q\n" 447 | ] 448 | } 449 | ], 450 | "source": [ 451 | "acountbal = 50000\n", 452 | "choice = input(\"Please enter 'b' to check balance or 'w' to withdraw: \")\n", 453 | "while choice != 'q':\n", 454 | " if choice.lower() in ('w','b'):\n", 455 | " if choice.lower() == 'b':\n", 456 | " print(\"Your balance is: %d\" % acountbal)\n", 457 | " print(\"Anything else?\")\n", 458 | " choice = input(\"Enter b for balance, w to withdraw or q to quit 1: \")\n", 459 | " print(choice.lower())\n", 460 | " else:\n", 461 | " withdraw = float(input(\"Enter amount to withdraw: \"))\n", 462 | " if withdraw <= acountbal:\n", 463 | " print(\"here is your: %.2f\" % withdraw)\n", 464 | " acountbal = acountbal - withdraw\n", 465 | " print(\"Anything else?\")\n", 466 | " choice = input(\"Enter b for balance, w to withdraw or q to quit 2: \")\n", 467 | " #choice = 'q'\n", 468 | " else:\n", 469 | " print(\"You have insufficient funds: %.2f\" % acountbal)\n", 470 | " else:\n", 471 | " print(\"Wrong choice!\")\n", 472 | " choice = input(\"Please enter 'b' to check balance or 'w' to withdraw: \")" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": 12, 478 | "metadata": { 479 | "collapsed": false 480 | }, 481 | "outputs": [ 482 | { 483 | "name": "stdout", 484 | "output_type": "stream", 485 | "text": [ 486 | "1\n", 487 | "4\n", 488 | "Bye\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "i = 1\n", 494 | "while i < 3:\n", 495 | " print(i ** 2)\n", 496 | " i = i+1\n", 497 | "print('Bye')" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 13, 503 | "metadata": {}, 504 | "outputs": [ 505 | { 506 | "name": "stdout", 507 | "output_type": "stream", 508 | "text": [ 509 | "12\n" 510 | ] 
511 | } 512 | ], 513 | "source": [ 514 | "dna = 'ATGCGGACCTAT'\n", 515 | "base = 'C'\n", 516 | "i = 0 # counter\n", 517 | "j = 0 # string index\n", 518 | "while j < len(dna):\n", 519 | " if dna[j] == base:\n", 520 | " i += 1\n", 521 | " j += 1\n", 522 | "print(j)" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "If the condition does not change to False at some point, we end up with an infinite loop. For example, if you follow the directions for using shampoo 'lather, rinse, repeat' literally you may never finish washing your hair. That is an infinite loop.\n", 530 | "\n", 531 | "Use a **for loop** if you know, before you start looping, the maximum number of times that you’ll need to execute the body." 532 | ] 533 | }, 534 | { 535 | "cell_type": "markdown", 536 | "metadata": {}, 537 | "source": [ 538 | "### Break" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": {}, 544 | "source": [ 545 | "Loops execute until a given number of iterations is reached or the condition changes to False. You can also `break` out of a loop early, as soon as some condition becomes true while the loop is executing." 546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": 14, 551 | "metadata": { 552 | "collapsed": false 553 | }, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "0\n", 560 | "1\n", 561 | "2\n", 562 | "3\n", 563 | "4\n", 564 | "5\n", 565 | "6\n", 566 | "7\n" 567 | ] 568 | } 569 | ], 570 | "source": [ 571 | "for i in range(100):\n", 572 | " print(i)\n", 573 | " if i>=7:\n", 574 | " break" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "### Continue" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "Sometimes, when a condition is satisfied, we want to skip the remaining statements in the loop body for that item only. Using `break` would terminate the whole loop. 
This can be avoided using the `continue` statement, which moves straight on to the next iteration. " 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 15, 594 | "metadata": { 595 | "collapsed": false 596 | }, 597 | "outputs": [ 598 | { 599 | "name": "stdout", 600 | "output_type": "stream", 601 | "text": [ 602 | "Processed 0\n", 603 | "Processed 1\n", 604 | "Processed 2\n", 605 | "Processed 3\n", 606 | "Processed 4\n", 607 | "Ignored 5\n", 608 | "Ignored 6\n", 609 | "Ignored 7\n", 610 | "Ignored 8\n", 611 | "Ignored 9\n" 612 | ] 613 | } 614 | ], 615 | "source": [ 616 | "for i in range(10):\n", 617 | " if i>4:\n", 618 | " print(\"Ignored\",i)\n", 619 | " continue\n", 620 | " # this statement is not reached if i > 4\n", 621 | " print(\"Processed\",i)" 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "## Catching exceptions" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "To break out of deeply nested execution sometimes it is useful to raise an exception.\n", 636 | "A try block allows you to catch exceptions that happen anywhere during the execution of the try block:\n", 637 | "```python\n", 638 | "try:\n", 639 | " code\n", 640 | "except SomeException as e:\n", 641 | " # deal with error of this type\n", 642 | "except:\n", 643 | " # deal with any error```" 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 3, 649 | "metadata": { 650 | "collapsed": false 651 | }, 652 | "outputs": [ 653 | { 654 | "name": "stdout", 655 | "output_type": "stream", 656 | "text": [ 657 | "First here\n", 658 | "Then here\n", 659 | "Finally here\n", 660 | "Looping\n", 661 | "Finally here\n", 662 | "Looping\n", 663 | "Finally here\n", 664 | "Looping\n", 665 | "Finally here\n", 666 | "Looping\n", 667 | "Caught exception: could not convert string to float: 'ywed'\n" 668 | ] 669 | } 670 | ], 671 | "source": [ 672 | "try:\n", 673 | " count=0\n", 674 | " while True:\n", 675 | " print('First here')\n", 676 | " 
while True:\n", 677 | " print('Then here')\n", 678 | " while True:\n", 679 | " print('Finally here')\n", 680 | " print(\"Looping\")\n", 681 | " count = count + 1\n", 682 | " if count > 3:\n", 683 | " float('ywed') # raises a ValueError\n", 684 | " #raise Exception(\"abort\") # exit every loop or function\n", 685 | "except Exception as e: # this is where we go when an exception is raised\n", 686 | " print(\"Caught exception:\",e)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": {}, 692 | "source": [ 693 | "This can also be useful to handle unexpected system errors more gracefully:" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": 16, 699 | "metadata": { 700 | "collapsed": false 701 | }, 702 | "outputs": [ 703 | { 704 | "name": "stdout", 705 | "output_type": "stream", 706 | "text": [ 707 | "The inverse of 2.000000 is 0.500000\n", 708 | "The inverse of 1.500000 is 0.666667\n", 709 | "Cannot divide by zero\n" 710 | ] 711 | } 712 | ], 713 | "source": [ 714 | "try:\n", 715 | " for i in [2,1.5,0.0,3]:\n", 716 | " inverse = 1.0/i\n", 717 | " print(\"The inverse of %f is %f\" % (i,inverse))\n", 718 | "except ValueError: # only catches ValueError\n", 719 | " print(\"Cannot calculate inverse of %f\" % i)\n", 720 | "except ZeroDivisionError:\n", 721 | " print(\"Cannot divide by zero\")\n", 722 | "except: # any other exception\n", 723 | " print(\"No idea what went wrong\")" 724 | ] 725 | }, 726 | { 727 | "cell_type": "code", 728 | "execution_count": null, 729 | "metadata": {}, 730 | "outputs": [], 731 | "source": [] 732 | } 733 | ], 734 | "metadata": { 735 | "kernelspec": { 736 | "display_name": "Python 3", 737 | "language": "python", 738 | "name": "python3" 739 | }, 740 | "language_info": { 741 | "codemirror_mode": { 742 | "name": "ipython", 743 | "version": 3 744 | }, 745 | "file_extension": ".py", 746 | "mimetype": "text/x-python", 747 | "name": "python", 748 | "nbconvert_exporter": "python", 749 | "pygments_lexer": "ipython3", 750 | "version": "3.6.5" 751 | } 
752 | }, 753 | "nbformat": 4, 754 | "nbformat_minor": 2 755 | } 756 | -------------------------------------------------------------------------------- /Intro-to-Python/06.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Functions" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "Object `import` not found.\n" 29 | ] 30 | } 31 | ], 32 | "source": [] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "In Python, a function is a named sequence of statements that belong together. Functions allow code to be re-used so that complex programs can be built up out of simpler parts. Python has inbuilt functions, like `print()`, `max()` etc. We can also create our own functions by using the `def` keyword. " 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "This is the basic syntax of a function:\n", 46 | "\n", 47 | "```python\n", 48 | "def funcname(arg1, arg2,... argN):\n", 49 | " ''' Document String'''\n", 50 | " statements\n", 51 | " return value```" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "Read the above syntax as: a function named \"funcname\" is defined, which accepts arguments \"arg1,arg2,....argN\". The function is documented by the string '''Document String'''. 
After executing the statements, the function returns a \"value\".\n", 59 | "\n", 60 | "Return values are optional (by default every function returns **None** if no return statement is executed).\n", 61 | "\n", 62 | "We can choose any valid identifier as a function name, except Python's reserved keywords. We can list the keywords using:" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 1, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']\n" 75 | ] 76 | } 77 | ], 78 | "source": [ 79 | "import keyword\n", 80 | "\n", 81 | "print(keyword.kwlist)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "Using a keyword as a function name raises a `SyntaxError`:" 
89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 7, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "ename": "SyntaxError", 98 | "evalue": "invalid syntax (, line 1)", 99 | "output_type": "error", 100 | "traceback": [ 101 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m def False():\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" 102 | ] 103 | } 104 | ], 105 | "source": [ 106 | "def False():\n", 107 | " pass" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 1, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "def sumnos(x,y,z):\n", 117 | " total = x + y + z\n", 118 | " \n", 119 | " return total" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 3, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "totals = sumnos(1,20,30)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 4, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "51\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "print(totals)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "ename": "NameError", 155 | "evalue": "name 'total' is not defined", 156 | "output_type": "error", 157 | "traceback": [ 158 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 159 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 160 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtotal\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 161 | "\u001b[0;31mNameError\u001b[0m: name 'total' is not defined" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "total" 167 | ] 168 | }, 169 | { 170 | "cell_type": 
"markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "However, Python does not prevent you from overwriting its inbuilt functions, so you have to be careful." 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 3, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "def print(name):\n", 183 | " \"\"\"Take a name as input and return it with an introduction\"\"\"\n", 184 | " return \"My Name is \", name\n", 185 | " " 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 4, 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "data": { 195 | "text/plain": [ 196 | "('My Name is ', 'My Name is Caleb')" 197 | ] 198 | }, 199 | "execution_count": 4, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "print('My Name is Caleb')" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "To restore the inbuilt function, delete the name that shadows it:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 5, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "del print" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 6, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "name": "stdout", 231 | "output_type": "stream", 232 | "text": [ 233 | "My Name is Caleb\n" 234 | ] 235 | } 236 | ], 237 | "source": [ 238 | "print('My Name is Caleb')" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 7, 244 | "metadata": {}, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "Hello Jack.\n", 251 | "Jack, how are you?\n" 252 | ] 253 | } 254 | ], 255 | "source": [ 256 | "print(\"Hello Jack.\")\n", 257 | "print(\"Jack, how are you?\")" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "Instead of writing the above two statements every single time it can be replaced by 
defining a function which would do the job in just one line. \n", 265 | "\n", 266 | "Defining a function firstfunc()." 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 5, 272 | "metadata": { 273 | "collapsed": false 274 | }, 275 | "outputs": [], 276 | "source": [ 277 | "def firstfunc():\n", 278 | " print(\"Hello Jack.\")\n", 279 | " return \"Jack, how are you?\"\n", 280 | " # the first message is printed; the second is returned to the caller" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 7, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "Hello Jack.\n" 293 | ] 294 | } 295 | ], 296 | "source": [ 297 | "greetings = firstfunc()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 9, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "name": "stdout", 307 | "output_type": "stream", 308 | "text": [ 309 | "Jack, how are you?\n" 310 | ] 311 | } 312 | ], 313 | "source": [ 314 | "print(greetings)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "As defined, **firstfunc()** always greets the same person. We can make **firstfunc()** accept an argument that holds the name, and print the message for that name. To do so, add an argument to the function definition as shown." 
322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 10, 327 | "metadata": { 328 | "collapsed": true 329 | }, 330 | "outputs": [], 331 | "source": [ 332 | "def firstfunc(username):\n", 333 | " print(\"Hello %s.\" % username)\n", 334 | " print(username + ',' ,\"how are you?\")" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 11, 340 | "metadata": { 341 | "collapsed": false 342 | }, 343 | "outputs": [], 344 | "source": [ 345 | "name1 = 'Caleb' #input('Please enter your name : ')" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "We pass this variable to **firstfunc()**, where it is bound to the parameter username declared in the function definition; that is, name1 is passed as username." 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 12, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [ 362 | { 363 | "name": "stdout", 364 | "output_type": "stream", 365 | "text": [ 366 | "Hello Caleb.\n", 367 | "Caleb, how are you?\n" 368 | ] 369 | } 370 | ], 371 | "source": [ 372 | "firstfunc(name1)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "## Return Statement" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "When a function computes a value that needs to be stored in a variable, or sent back to the calling code for further processing, a return statement is used." 
387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 18, 392 | "metadata": { 393 | "collapsed": true 394 | }, 395 | "outputs": [ 396 | { 397 | "name": "stdout", 398 | "output_type": "stream", 399 | "text": [ 400 | "413\n" 401 | ] 402 | } 403 | ], 404 | "source": [ 405 | "def times(x,y):\n", 406 | " z = x*y\n", 407 | " return z\n", 408 | "\n", 409 | "c = times(7,59)\n", 410 | "print(c)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "The **times( )** function defined above accepts two arguments and returns the variable z, which holds the product of the two arguments." 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 16, 423 | "metadata": { 424 | "collapsed": false 425 | }, 426 | "outputs": [ 427 | { 428 | "name": "stdout", 429 | "output_type": "stream", 430 | "text": [ 431 | "413\n" 432 | ] 433 | } 434 | ], 435 | "source": [ 436 | "c = times(7,59)\n", 437 | "print(c)" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "The returned value of z is stored in variable c and can be used for further operations." 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "Instead of declaring another variable, the entire expression can be placed directly in the return statement as shown." 
452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 19, 457 | "metadata": { 458 | "collapsed": false 459 | }, 460 | "outputs": [], 461 | "source": [ 462 | "def times(x,y):\n", 463 | " \"\"\"This multiplies the two input arguments\"\"\"\n", 464 | " return x*y" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 15, 470 | "metadata": { 471 | "collapsed": false 472 | }, 473 | "outputs": [ 474 | { 475 | "name": "stdout", 476 | "output_type": "stream", 477 | "text": [ 478 | "20\n" 479 | ] 480 | } 481 | ], 482 | "source": [ 483 | "c = times(4,5)\n", 484 | "print(c)" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "The **times( )** function above is documented with a docstring. This docstring is displayed whenever **times( )** is passed to the **help( )** function." 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 20, 497 | "metadata": { 498 | "collapsed": false 499 | }, 500 | "outputs": [ 501 | { 502 | "name": "stdout", 503 | "output_type": "stream", 504 | "text": [ 505 | "Help on function times in module __main__:\n", 506 | "\n", 507 | "times(x, y)\n", 508 | " This multiplies the two input arguments\n", 509 | "\n" 510 | ] 511 | } 512 | ], 513 | "source": [ 514 | "help(times)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "Multiple values can also be returned as a tuple. However, this tends not to be very readable when returning many values, and can easily introduce errors when the order of the return values is interpreted incorrectly." 
522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": 21, 527 | "metadata": { 528 | "collapsed": true 529 | }, 530 | "outputs": [], 531 | "source": [ 532 | "eglist = [10,50,30,12,6,8,100]" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 22, 538 | "metadata": { 539 | "collapsed": true 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "def egfunc(eglist):\n", 544 | " highest = max(eglist)\n", 545 | " lowest = min(eglist)\n", 546 | " first = eglist[0]\n", 547 | " last = eglist[-1]\n", 548 | " return highest,lowest,first,last" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "If the function is called without assigning its result, the values come back as a single tuple. If the call is assigned to several variables, the values are unpacked into them in the order declared in the return statement." 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 23, 561 | "metadata": { 562 | "collapsed": false 563 | }, 564 | "outputs": [ 565 | { 566 | "data": { 567 | "text/plain": [ 568 | "(100, 6, 10, 100)" 569 | ] 570 | }, 571 | "execution_count": 23, 572 | "metadata": {}, 573 | "output_type": "execute_result" 574 | } 575 | ], 576 | "source": [ 577 | "egfunc(eglist)" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 26, 583 | "metadata": { 584 | "collapsed": false 585 | }, 586 | "outputs": [ 587 | { 588 | "name": "stdout", 589 | "output_type": "stream", 590 | "text": [ 591 | " a = 100 b = 6 c = 10 d = 100\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "a,b,c,d = egfunc(eglist)\n", 597 | "print(' a =',a,' b =',b,' c =',c,' d =',d)" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "## Default arguments" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "When an argument of a 
function takes the same value in the majority of cases, that value can be specified as a default. This is also called an implicit argument." 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": 34, 617 | "metadata": { 618 | "collapsed": true 619 | }, 620 | "outputs": [], 621 | "source": [ 622 | "def implicitadd(x,y=3,z=0):\n", 623 | " print(\"%d + %d + %d = %d\"%(x,y,z,x+y+z))\n", 624 | " return x+y+z" 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "**implicitadd( )** is a function that accepts up to three arguments, but most of the time the first argument just needs 3 added to it. Hence the second argument defaults to 3 and the third to zero. Here the last two arguments are default arguments." 632 | ] 633 | }, 634 | { 635 | "cell_type": "markdown", 636 | "metadata": {}, 637 | "source": [ 638 | "Now if the second argument is not supplied when calling **implicitadd( )**, it is taken to be 3." 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": 37, 644 | "metadata": { 645 | "collapsed": false 646 | }, 647 | "outputs": [ 648 | { 649 | "name": "stdout", 650 | "output_type": "stream", 651 | "text": [ 652 | "3 + 3 + 4 = 10\n" 653 | ] 654 | }, 655 | { 656 | "data": { 657 | "text/plain": [ 658 | "10" 659 | ] 660 | }, 661 | "execution_count": 37, 662 | "metadata": {}, 663 | "output_type": "execute_result" 664 | } 665 | ], 666 | "source": [ 667 | "implicitadd(3,z=4)" 668 | ] 669 | }, 670 | { 671 | "cell_type": "markdown", 672 | "metadata": {}, 673 | "source": [ 674 | "However, we can also call the same function with two or three arguments. A useful feature is to explicitly name the argument values being passed into the function. This gives great flexibility in how to call a function with optional arguments. 
All of the following are valid:" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 23, 680 | "metadata": { 681 | "collapsed": false 682 | }, 683 | "outputs": [ 684 | { 685 | "name": "stdout", 686 | "output_type": "stream", 687 | "text": [ 688 | "4 + 4 + 0 = 8\n", 689 | "4 + 5 + 6 = 15\n", 690 | "4 + 3 + 7 = 14\n", 691 | "2 + 1 + 9 = 12\n", 692 | "1 + 3 + 0 = 4\n" 693 | ] 694 | }, 695 | { 696 | "data": { 697 | "text/plain": [ 698 | "4" 699 | ] 700 | }, 701 | "execution_count": 23, 702 | "metadata": {}, 703 | "output_type": "execute_result" 704 | } 705 | ], 706 | "source": [ 707 | "implicitadd(4,4)\n", 708 | "implicitadd(4,5,6)\n", 709 | "implicitadd(4,z=7)\n", 710 | "implicitadd(2,y=1,z=9)\n", 711 | "implicitadd(x=1)" 712 | ] 713 | }, 714 | { 715 | "cell_type": "markdown", 716 | "metadata": {}, 717 | "source": [ 718 | "## Any number of arguments" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "If the number of arguments a function must accept is not known in advance, an asterisk is placed before the name of a parameter, which then collects the remaining positional arguments as a tuple. The following function requires at least one argument but can take many more." 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": 24, 731 | "metadata": { 732 | "collapsed": true 733 | }, 734 | "outputs": [], 735 | "source": [ 736 | "def add_n(first,*args):\n", 737 | " \"return the sum of one or more numbers\"\n", 738 | " reslist = [first] + list(args)\n", 739 | " print(reslist)\n", 740 | " return sum(reslist)" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "metadata": {}, 746 | "source": [ 747 | "The function above builds a list of all of the arguments, prints the list and returns the sum of all of the arguments." 
748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 26, 753 | "metadata": { 754 | "collapsed": false 755 | }, 756 | "outputs": [ 757 | { 758 | "name": "stdout", 759 | "output_type": "stream", 760 | "text": [ 761 | "[6.5]\n" 762 | ] 763 | }, 764 | { 765 | "data": { 766 | "text/plain": [ 767 | "6.5" 768 | ] 769 | }, 770 | "execution_count": 26, 771 | "metadata": {}, 772 | "output_type": "execute_result" 773 | } 774 | ], 775 | "source": [ 776 | "add_n(6.5)" 777 | ] 778 | }, 779 | { 780 | "cell_type": "markdown", 781 | "metadata": {}, 782 | "source": [ 783 | "An arbitrary number of named arguments can also be accepted using `**`. When the function is called, all of the additional named arguments are provided in a dictionary." 784 | ] 785 | }, 786 | { 787 | "cell_type": "code", 788 | "execution_count": 27, 789 | "metadata": { 790 | "collapsed": false 791 | }, 792 | "outputs": [ 793 | { 794 | "name": "stdout", 795 | "output_type": "stream", 796 | "text": [ 797 | "x=12 animal=mouse z=(1+2j)\n" 798 | ] 799 | } 800 | ], 801 | "source": [ 802 | "def namedArgs(**names):\n", 803 | " 'print the named arguments'\n", 804 | " # names is a dictionary of keyword : value\n", 805 | " print(\" \".join(name+\"=\"+str(value) \n", 806 | " for name,value in names.items()))\n", 807 | "\n", 808 | "namedArgs(x=3*4,animal='mouse',z=(1+2j))" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": {}, 814 | "source": [ 815 | "## Global and Local Variables" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "A variable assigned inside a function is a local variable; a variable assigned outside any function is a global variable." 
823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": 28, 828 | "metadata": { 829 | "collapsed": false 830 | }, 831 | "outputs": [], 832 | "source": [ 833 | "eg1 = [1,2,3,4,5]\n" 834 | ] 835 | }, 836 | { 837 | "cell_type": "markdown", 838 | "metadata": {}, 839 | "source": [ 840 | "In the function below, x is assigned in egfunc1 and again in the nested function thirdfunc. Each assignment creates a separate local variable, so the x inside thirdfunc does not affect the x in egfunc1." 841 | ] 842 | }, 843 | { 844 | "cell_type": "code", 845 | "execution_count": 29, 846 | "metadata": { 847 | "collapsed": true 848 | }, 849 | "outputs": [], 850 | "source": [ 851 | "def egfunc1():\n", 852 | " x=1\n", 853 | " def thirdfunc():\n", 854 | " x=2\n", 855 | " print(\"Inside thirdfunc x =\", x) \n", 856 | " thirdfunc()\n", 857 | " print(\"Outside x =\", x)" 858 | ] 859 | }, 860 | { 861 | "cell_type": "markdown", 862 | "metadata": {}, 863 | "source": [ 864 | "Let's have a look at how the variables are assigned. " 865 | ] 866 | }, 867 | { 868 | "cell_type": "code", 869 | "execution_count": 30, 870 | "metadata": {}, 871 | "outputs": [ 872 | { 873 | "data": { 874 | "text/html": [ 875 | "" 876 | ], 877 | "text/plain": [ 878 | "" 879 | ] 880 | }, 881 | "metadata": {}, 882 | "output_type": "display_data" 883 | } 884 | ], 885 | "source": [ 886 | "%%html\n", 887 | "" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 31, 893 | "metadata": { 894 | "collapsed": false 895 | }, 896 | "outputs": [ 897 | { 898 | "name": "stdout", 899 | "output_type": "stream", 900 | "text": [ 901 | "Inside thirdfunc x = 2\n", 902 | "Outside x = 1\n" 903 | ] 904 | } 905 | ], 906 | "source": [ 907 | "egfunc1()" 908 | ] 909 | }, 910 | { 911 | "cell_type": "markdown", 912 | "metadata": {}, 913 | "source": [ 914 | "If a variable is declared **global** inside a function, as shown in the example below, then assignments to it change the module-level variable, which can be accessed from anywhere. Global variables should be used sparingly as they make functions harder to re-use." 
915 | ] 916 | }, 917 | { 918 | "cell_type": "code", 919 | "execution_count": 32, 920 | "metadata": { 921 | "collapsed": false 922 | }, 923 | "outputs": [], 924 | "source": [ 925 | "eg3 = [1,2,3,4,5]" 926 | ] 927 | }, 928 | { 929 | "cell_type": "code", 930 | "execution_count": 33, 931 | "metadata": { 932 | "collapsed": true 933 | }, 934 | "outputs": [], 935 | "source": [ 936 | "def egfunc1():\n", 937 | " x = 1.0 # local variable for egfunc1\n", 938 | " def thirdfunc():\n", 939 | " global x # refers to the module-level x, not the x local to egfunc1\n", 940 | " x = 2.0\n", 941 | " print(\"Inside thirdfunc x =\", x) \n", 942 | " thirdfunc()\n", 943 | " print(\"Outside x =\", x)" 944 | ] 945 | }, 946 | { 947 | "cell_type": "code", 948 | "execution_count": 34, 949 | "metadata": { 950 | "collapsed": false 951 | }, 952 | "outputs": [ 953 | { 954 | "name": "stdout", 955 | "output_type": "stream", 956 | "text": [ 957 | "Inside thirdfunc x = 2.0\n", 958 | "Outside x = 1.0\n", 959 | "Globally defined x = 2.0\n" 960 | ] 961 | } 962 | ], 963 | "source": [ 964 | "egfunc1()\n", 965 | "print(\"Globally defined x =\",x)" 966 | ] 967 | }, 968 | { 969 | "cell_type": "markdown", 970 | "metadata": {}, 971 | "source": [ 972 | "## Lambda Functions" 973 | ] 974 | }, 975 | { 976 | "cell_type": "markdown", 977 | "metadata": {}, 978 | "source": [ 979 | "These are small anonymous functions that carry a single expression whose result is returned. Lambda functions come in very handy when operating on lists. They are defined by the keyword **lambda** followed by the arguments, a colon and the expression." 
980 | ] 981 | }, 982 | { 983 | "cell_type": "code", 984 | "execution_count": 35, 985 | "metadata": { 986 | "collapsed": true 987 | }, 988 | "outputs": [], 989 | "source": [ 990 | "z = lambda x: x * x" 991 | ] 992 | }, 993 | { 994 | "cell_type": "code", 995 | "execution_count": 36, 996 | "metadata": { 997 | "collapsed": false 998 | }, 999 | "outputs": [ 1000 | { 1001 | "data": { 1002 | "text/plain": [ 1003 | "64" 1004 | ] 1005 | }, 1006 | "execution_count": 36, 1007 | "metadata": {}, 1008 | "output_type": "execute_result" 1009 | } 1010 | ], 1011 | "source": [ 1012 | "z(8)" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "markdown", 1017 | "metadata": {}, 1018 | "source": [ 1019 | "### Composing functions" 1020 | ] 1021 | }, 1022 | { 1023 | "cell_type": "markdown", 1024 | "metadata": {}, 1025 | "source": [ 1026 | "Lambda functions can also be used to compose functions" 1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "code", 1031 | "execution_count": 37, 1032 | "metadata": { 1033 | "collapsed": false 1034 | }, 1035 | "outputs": [ 1036 | { 1037 | "name": "stdout", 1038 | "output_type": "stream", 1039 | "text": [ 1040 | "doublesquare is a \n" 1041 | ] 1042 | }, 1043 | { 1044 | "data": { 1045 | "text/plain": [ 1046 | "18" 1047 | ] 1048 | }, 1049 | "execution_count": 37, 1050 | "metadata": {}, 1051 | "output_type": "execute_result" 1052 | } 1053 | ], 1054 | "source": [ 1055 | "def double(x):\n", 1056 | " return 2*x\n", 1057 | "def square(x):\n", 1058 | " return x*x\n", 1059 | "def f_of_g(f,g):\n", 1060 | " \"Compose two functions of a single variable\"\n", 1061 | " return lambda x: f(g(x))\n", 1062 | "doublesquare= f_of_g(double,square)\n", 1063 | "print(\"doublesquare is a\",type(doublesquare))\n", 1064 | "doublesquare(3)" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "markdown", 1069 | "metadata": {}, 1070 | "source": [ 1071 | "### Exercise\n", 1072 | "Let's return to our earlier exercise: calculating %GC content. 
In this exercise:\n", 1073 | "- Write a function `percentageGC` that calculates the GC content of a DNA sequence\n", 1074 | "- The function should return the %GC content\n", 1075 | "- The Function should return a message if the provided sequence is not DNA (This should be checked by a different function, called by your function)\n" 1076 | ] 1077 | }, 1078 | { 1079 | "cell_type": "code", 1080 | "execution_count": 59, 1081 | "metadata": {}, 1082 | "outputs": [], 1083 | "source": [ 1084 | "def percentGC(dna):\n", 1085 | " '''calculates the percentage GC content, given a DNA sequence'''\n", 1086 | " if dnacheck(dna):\n", 1087 | " dna_len= len(dna)\n", 1088 | " gs = dna.count('G')\n", 1089 | " cs = dna.count('C')\n", 1090 | " \n", 1091 | " return (gs+cs)/dna_len*100\n", 1092 | " \n", 1093 | " #print(\"The sequence input is not a valid DNA\")" 1094 | ] 1095 | }, 1096 | { 1097 | "cell_type": "code", 1098 | "execution_count": 61, 1099 | "metadata": {}, 1100 | "outputs": [ 1101 | { 1102 | "name": "stdout", 1103 | "output_type": "stream", 1104 | "text": [ 1105 | "There is an invalid base 'F' at position 2\n", 1106 | "There is an invalid base 'R' at position 3\n", 1107 | "There is an invalid base 'H' at position 11\n", 1108 | "There is an invalid base 'H' at position 11\n", 1109 | "There is an invalid base 'Y' at position 14\n", 1110 | "There is an invalid base 'K' at position 16\n" 1111 | ] 1112 | } 1113 | ], 1114 | "source": [ 1115 | "mydna = \"CAGTGATGATGACGAT\"\n", 1116 | "yourdna = \"ACGATCGAGACGTAGTA\"\n", 1117 | "testdna = \"ATFRACGATTGHAHYAK\"\n", 1118 | "percentGC(testdna)" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "## Rose's Function" 1126 | ] 1127 | }, 1128 | { 1129 | "cell_type": "code", 1130 | "execution_count": 33, 1131 | "metadata": {}, 1132 | "outputs": [], 1133 | "source": [ 1134 | "def dnaTest(x):\n", 1135 | " dnaset = set(\"AGCT\")\n", 1136 | " y = set(x.upper()).union(dnaset) == dnaset\n", 
1137 | " return y" 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "code", 1142 | "execution_count": 34, 1143 | "metadata": {}, 1144 | "outputs": [ 1145 | { 1146 | "data": { 1147 | "text/plain": [ 1148 | "True" 1149 | ] 1150 | }, 1151 | "execution_count": 34, 1152 | "metadata": {}, 1153 | "output_type": "execute_result" 1154 | } 1155 | ], 1156 | "source": [ 1157 | "dnaTest(testdna)" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "code", 1162 | "execution_count": 36, 1163 | "metadata": {}, 1164 | "outputs": [], 1165 | "source": [ 1166 | "mygc = percentGC(dna=mydna)" 1167 | ] 1168 | }, 1169 | { 1170 | "cell_type": "markdown", 1171 | "metadata": {}, 1172 | "source": [ 1173 | "## Dina's Function" 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "code", 1178 | "execution_count": 36, 1179 | "metadata": {}, 1180 | "outputs": [ 1181 | { 1182 | "data": { 1183 | "text/plain": [ 1184 | "False" 1185 | ] 1186 | }, 1187 | "execution_count": 36, 1188 | "metadata": {}, 1189 | "output_type": "execute_result" 1190 | } 1191 | ], 1192 | "source": [ 1193 | "def dnatest(dnaseq):\n", 1194 | " testdna =[]\n", 1195 | " dna = dnaseq.upper()\n", 1196 | " for i in dna:\n", 1197 | " result = str(i) in ['T', 'A', 'G', 'C']\n", 1198 | " testdna.append(result)\n", 1199 | " if testdna.count(False) > 0:\n", 1200 | " return False\n", 1201 | " else:\n", 1202 | " return True\n", 1203 | "\n", 1204 | "dnaseq ='acgtaaqat'\n", 1205 | "dnatest(dnaseq)" 1206 | ] 1207 | }, 1208 | { 1209 | "cell_type": "markdown", 1210 | "metadata": {}, 1211 | "source": [ 1212 | "## Edwin's Function" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "code", 1217 | "execution_count": 64, 1218 | "metadata": {}, 1219 | "outputs": [ 1220 | { 1221 | "name": "stdout", 1222 | "output_type": "stream", 1223 | "text": [ 1224 | "There is an invalid base 'Z' at position 3\n", 1225 | "There is an invalid base 'Y' at position 7\n", 1226 | "There is an invalid base 'Z' at position 11\n", 1227 | "There is an invalid base 'B' at position 14\n" 
1228 | ] 1229 | }, 1230 | { 1231 | "data": { 1232 | "text/plain": [ 1233 | "False" 1234 | ] 1235 | }, 1236 | "execution_count": 64, 1237 | "metadata": {}, 1238 | "output_type": "execute_result" 1239 | } 1240 | ], 1241 | "source": [ 1242 | "dna = 'ACZGTCYTTgZCABG'\n", 1243 | "def dnacheck(dna):\n", 1244 | " counter = 0\n", 1245 | " check = True\n", 1246 | " valid_dna = 'ACGT'\n", 1247 | " for i in dna.upper():\n", 1248 | " counter += 1\n", 1249 | " if i in valid_dna:\n", 1250 | " pass\n", 1251 | " else:\n", 1252 | " check = False\n", 1253 | " print(\"There is an invalid base '%s' at position %d\"\n", 1254 | " % (i,counter))\n", 1255 | " return check\n", 1256 | "dnacheck(dna)\n", 1257 | " \n", 1258 | " " 1259 | ] 1260 | }, 1261 | { 1262 | "cell_type": "code", 1263 | "execution_count": 20, 1264 | "metadata": {}, 1265 | "outputs": [ 1266 | { 1267 | "name": "stdout", 1268 | "output_type": "stream", 1269 | "text": [ 1270 | "The sequence is not DNA\n" 1271 | ] 1272 | } 1273 | ], 1274 | "source": [ 1275 | "def newfun():\n", 1276 | " \"\"\"\n", 1277 | " \n", 1278 | " I need a function that can do a b c d\n", 1279 | " \n", 1280 | " \"\"\"\n", 1281 | " pass" 1282 | ] 1283 | }, 1284 | { 1285 | "cell_type": "code", 1286 | "execution_count": null, 1287 | "metadata": {}, 1288 | "outputs": [], 1289 | "source": [] 1290 | }, 1291 | { 1292 | "cell_type": "code", 1293 | "execution_count": 37, 1294 | "metadata": {}, 1295 | "outputs": [], 1296 | "source": [ 1297 | "yourgc = percentGC(dna=yourdna)" 1298 | ] 1299 | }, 1300 | { 1301 | "cell_type": "code", 1302 | "execution_count": 38, 1303 | "metadata": {}, 1304 | "outputs": [ 1305 | { 1306 | "data": { 1307 | "text/plain": [ 1308 | "False" 1309 | ] 1310 | }, 1311 | "execution_count": 38, 1312 | "metadata": {}, 1313 | "output_type": "execute_result" 1314 | } 1315 | ], 1316 | "source": [ 1317 | "mygc > yourgc" 1318 | ] 1319 | }, 1320 | { 1321 | "cell_type": "code", 1322 | "execution_count": null, 1323 | "metadata": {}, 1324 | "outputs": [], 
1325 | "source": [] 1326 | } 1327 | ], 1328 | "metadata": { 1329 | "kernelspec": { 1330 | "display_name": "Python 3", 1331 | "language": "python", 1332 | "name": "python3" 1333 | }, 1334 | "language_info": { 1335 | "codemirror_mode": { 1336 | "name": "ipython", 1337 | "version": 3 1338 | }, 1339 | "file_extension": ".py", 1340 | "mimetype": "text/x-python", 1341 | "name": "python", 1342 | "nbconvert_exporter": "python", 1343 | "pygments_lexer": "ipython3", 1344 | "version": "3.6.5" 1345 | } 1346 | }, 1347 | "nbformat": 4, 1348 | "nbformat_minor": 2 1349 | } 1350 | -------------------------------------------------------------------------------- /Intro-to-Python/07.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "\n", 10 | "\n", 11 | "## Files, Scripting and Modules\n", 12 | "\n", 13 | "So far, we have been writing all our Python code in Jupyter notebooks. However, if you want to use the code we have written as part of a pipeline, you need to write scripts. Also, most of the time the data you need to analyse is in a file, which you need to read into Python and process. \n", 14 | "\n", 15 | "\n", 16 | "### Reading Files\n", 17 | "\n", 18 | "So far we have been working from memory. In Bioinformatics, you will need to read data from a file or write output to a file. For both, we use the `open` function. 
" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "myfile = open(\"../Files/test.txt\", \"w\")\n", 28 | "myfile.write(\"My first file written from Python \\n\")\n", 29 | "myfile.write(\"---------------------------------\\n\")\n", 30 | "myfile.write(\"Hello, world!\\n\")\n", 31 | "myfile.close()" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "The **mode** in which you open the file determines whether you write to (w), read from (r) or append to (a) the file. \n", 39 | "\n", 40 | "Opening a file creates what we call a **file handle**, which provides methods for manipulating the file. In our case, `myfile` has the methods to write and close the file. Closing the file flushes any buffered writes to disk and releases the file. \n", 41 | "\n", 42 | "Alternatively, one can open the file in a `with` statement, which automatically closes the file when the block ends. " 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "with open(\"../Files/test.txt\", \"w\") as myfile:\n", 52 | " myfile.write(\"My first file written from Python \\n\")\n", 53 | " myfile.write(\"---------------------------------\\n\")\n", 54 | " myfile.write(\"Hello, world!\\n\")" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "Let's check what else we can do with `open`." 
62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 3, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "text/plain": [ 72 | "\u001b[0;31mSignature:\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfile\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmode\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'r'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbuffering\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnewline\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mclosefd\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mopener\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 73 | "\u001b[0;31mDocstring:\u001b[0m\n", 74 | "Open file and return a stream. Raise IOError upon failure.\n", 75 | "\n", 76 | "file is either a text or byte string giving the name (and the path\n", 77 | "if the file isn't in the current working directory) of the file to\n", 78 | "be opened or an integer file descriptor of the file to be\n", 79 | "wrapped. (If a file descriptor is given, it is closed when the\n", 80 | "returned I/O object is closed, unless closefd is set to False.)\n", 81 | "\n", 82 | "mode is an optional string that specifies the mode in which the file\n", 83 | "is opened. It defaults to 'r' which means open for reading in text\n", 84 | "mode. 
Other common values are 'w' for writing (truncating the file if\n", 85 | "it already exists), 'x' for creating and writing to a new file, and\n", 86 | "'a' for appending (which on some Unix systems, means that all writes\n", 87 | "append to the end of the file regardless of the current seek position).\n", 88 | "In text mode, if encoding is not specified the encoding used is platform\n", 89 | "dependent: locale.getpreferredencoding(False) is called to get the\n", 90 | "current locale encoding. (For reading and writing raw bytes use binary\n", 91 | "mode and leave encoding unspecified.) The available modes are:\n", 92 | "\n", 93 | "========= ===============================================================\n", 94 | "Character Meaning\n", 95 | "--------- ---------------------------------------------------------------\n", 96 | "'r' open for reading (default)\n", 97 | "'w' open for writing, truncating the file first\n", 98 | "'x' create a new file and open it for writing\n", 99 | "'a' open for writing, appending to the end of the file if it exists\n", 100 | "'b' binary mode\n", 101 | "'t' text mode (default)\n", 102 | "'+' open a disk file for updating (reading and writing)\n", 103 | "'U' universal newline mode (deprecated)\n", 104 | "========= ===============================================================\n", 105 | "\n", 106 | "The default mode is 'rt' (open for reading text). For binary random\n", 107 | "access, the mode 'w+b' opens and truncates the file to 0 bytes, while\n", 108 | "'r+b' opens the file without truncation. The 'x' mode implies 'w' and\n", 109 | "raises an `FileExistsError` if the file already exists.\n", 110 | "\n", 111 | "Python distinguishes between files opened in binary and text modes,\n", 112 | "even when the underlying operating system doesn't. Files opened in\n", 113 | "binary mode (appending 'b' to the mode argument) return contents as\n", 114 | "bytes objects without any decoding. 
In text mode (the default, or when\n", 115 | "'t' is appended to the mode argument), the contents of the file are\n", 116 | "returned as strings, the bytes having been first decoded using a\n", 117 | "platform-dependent encoding or using the specified encoding if given.\n", 118 | "\n", 119 | "'U' mode is deprecated and will raise an exception in future versions\n", 120 | "of Python. It has no effect in Python 3. Use newline to control\n", 121 | "universal newlines mode.\n", 122 | "\n", 123 | "buffering is an optional integer used to set the buffering policy.\n", 124 | "Pass 0 to switch buffering off (only allowed in binary mode), 1 to select\n", 125 | "line buffering (only usable in text mode), and an integer > 1 to indicate\n", 126 | "the size of a fixed-size chunk buffer. When no buffering argument is\n", 127 | "given, the default buffering policy works as follows:\n", 128 | "\n", 129 | "* Binary files are buffered in fixed-size chunks; the size of the buffer\n", 130 | " is chosen using a heuristic trying to determine the underlying device's\n", 131 | " \"block size\" and falling back on `io.DEFAULT_BUFFER_SIZE`.\n", 132 | " On many systems, the buffer will typically be 4096 or 8192 bytes long.\n", 133 | "\n", 134 | "* \"Interactive\" text files (files for which isatty() returns True)\n", 135 | " use line buffering. Other text files use the policy described above\n", 136 | " for binary files.\n", 137 | "\n", 138 | "encoding is the name of the encoding used to decode or encode the\n", 139 | "file. This should only be used in text mode. The default encoding is\n", 140 | "platform dependent, but any encoding supported by Python can be\n", 141 | "passed. See the codecs module for the list of supported encodings.\n", 142 | "\n", 143 | "errors is an optional string that specifies how encoding errors are to\n", 144 | "be handled---this argument should not be used in binary mode. 
Pass\n", 145 | "'strict' to raise a ValueError exception if there is an encoding error\n", 146 | "(the default of None has the same effect), or pass 'ignore' to ignore\n", 147 | "errors. (Note that ignoring encoding errors can lead to data loss.)\n", 148 | "See the documentation for codecs.register or run 'help(codecs.Codec)'\n", 149 | "for a list of the permitted encoding error strings.\n", 150 | "\n", 151 | "newline controls how universal newlines works (it only applies to text\n", 152 | "mode). It can be None, '', '\\n', '\\r', and '\\r\\n'. It works as\n", 153 | "follows:\n", 154 | "\n", 155 | "* On input, if newline is None, universal newlines mode is\n", 156 | " enabled. Lines in the input can end in '\\n', '\\r', or '\\r\\n', and\n", 157 | " these are translated into '\\n' before being returned to the\n", 158 | " caller. If it is '', universal newline mode is enabled, but line\n", 159 | " endings are returned to the caller untranslated. If it has any of\n", 160 | " the other legal values, input lines are only terminated by the given\n", 161 | " string, and the line ending is returned to the caller untranslated.\n", 162 | "\n", 163 | "* On output, if newline is None, any '\\n' characters written are\n", 164 | " translated to the system default line separator, os.linesep. If\n", 165 | " newline is '' or '\\n', no translation takes place. If newline is any\n", 166 | " of the other legal values, any '\\n' characters written are translated\n", 167 | " to the given string.\n", 168 | "\n", 169 | "If closefd is False, the underlying file descriptor will be kept open\n", 170 | "when the file is closed. This does not work when a file name is given\n", 171 | "and must be True in that case.\n", 172 | "\n", 173 | "A custom opener can be used by passing a callable as *opener*. The\n", 174 | "underlying file descriptor for the file object is then obtained by\n", 175 | "calling *opener* with (*file*, *flags*). 
*opener* must return an open\n", 176 | "file descriptor (passing os.open as *opener* results in functionality\n", 177 | "similar to passing None).\n", 178 | "\n", 179 | "open() returns a file object whose type depends on the mode, and\n", 180 | "through which the standard file operations such as reading and writing\n", 181 | "are performed. When open() is used to open a file in a text mode ('w',\n", 182 | "'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open\n", 183 | "a file in a binary mode, the returned class varies: in read binary\n", 184 | "mode, it returns a BufferedReader; in write binary and append binary\n", 185 | "modes, it returns a BufferedWriter, and in read/write mode, it returns\n", 186 | "a BufferedRandom.\n", 187 | "\n", 188 | "It is also possible to use a string or bytearray as a file for both\n", 189 | "reading and writing. For strings StringIO can be used like a file\n", 190 | "opened in a text mode, and for bytes a BytesIO can be used like a file\n", 191 | "opened in a binary mode.\n", 192 | "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n" 193 | ] 194 | }, 195 | "metadata": {}, 196 | "output_type": "display_data" 197 | } 198 | ], 199 | "source": [ 200 | "?open" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "#### Fetching a file from the web\n", 215 | "Download this [file](https://www.uniprot.org/docs/humchrx.txt), which we will use to explore file reading in Python. 
" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 4, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "('../Files/humchrx.txt', )" 227 | ] 228 | }, 229 | "execution_count": 4, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "import urllib.request\n", 236 | "\n", 237 | "url = \"https://www.uniprot.org/docs/humchrx.txt\"\n", 238 | "destination_filename = \"../Files/humchrx.txt\"\n", 239 | "urllib.request.urlretrieve(url, destination_filename)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### Reading a file line-at-a-time\n", 247 | "\n", 248 | "We can read the file line by line using `readline`. Thie reads the line one by one until the end of the file. This is suitable for a large file which may not fit memory. " 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 5, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "name": "stdout", 258 | "output_type": "stream", 259 | "text": [ 260 | "----------------------------------------------------------------------------\n", 261 | "\n" 262 | ] 263 | } 264 | ], 265 | "source": [ 266 | "humchrx = open('../Files/humchrx.txt', 'r')\n", 267 | "line = humchrx.readline()\n", 268 | "print(line)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 6, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "humchrx.close()" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 7, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "My first file written from Python \n", 290 | "\n", 291 | "---------------------------------\n", 292 | "\n", 293 | "Hello, world!\n", 294 | "\n" 295 | ] 296 | } 297 | ], 298 | "source": [ 299 | "with open('../Files/test.txt', 'r') as myfile:\n", 300 | " while True:\n", 
301 | " line = myfile.readline()\n", 302 | " if len(line) == 0: # If there are no more lines\n", 303 | " break\n", 304 | " print(line)\n", 305 | " " 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "#### Reading the whole file\n", 313 | "\n", 314 | "If the file is small, or your PC has enough memory, you can read the whole file into memory as a list of lines using `readlines`." 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": 8, 320 | "metadata": {}, 321 | "outputs": [ 322 | { 323 | "name": "stdout", 324 | "output_type": "stream", 325 | "text": [ 326 | "My first file written from Python \n", 327 | "\n", 328 | "---------------------------------\n", 329 | "\n", 330 | "Hello, world!\n", 331 | "\n" 332 | ] 333 | } 334 | ], 335 | "source": [ 336 | "with open('../Files/test.txt', 'r') as myfile:\n", 337 | " lines = myfile.readlines()\n", 338 | " for line in lines:\n", 339 | " print(line)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "Or read it as a single string using `read`:" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 9, 352 | "metadata": {}, 353 | "outputs": [ 354 | { 355 | "name": "stdout", 356 | "output_type": "stream", 357 | "text": [ 358 | "My first file written from Python \n", 359 | "---------------------------------\n", 360 | "Hello, world!\n", 361 | "\n" 362 | ] 363 | } 364 | ], 365 | "source": [ 366 | "with open('../Files/test.txt', 'r') as myfile:\n", 367 | " whole_file = myfile.read()\n", 368 | " print(whole_file)" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "### Exercise 1\n", 376 | "\n", 377 | "Write a function that reads the file (humchrx.txt) and writes a clean list of gene names to another file (gene_names.txt)."
378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 82, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [ 386 | "def get_genes(infile,outfile):\n", 387 | " \"\"\"\n", 388 | " Function to extract a list of genes and write to file\n", 389 | " \"\"\"\n", 390 | " gene_list = []\n", 391 | " with open(infile) as gene:\n", 392 | " tag = False\n", 393 | " for line in gene:\n", 394 | " if line.startswith('name'):\n", 395 | " tag = True\n", 396 | " pass\n", 397 | " if tag:\n", 398 | " items = line.split()\n", 399 | " if len(items) > 0:\n", 400 | " gene_list.append(items[0])\n", 401 | " gene_list = gene_list[1:-7]\n", 402 | " with open(outfile, 'w') as outfile:\n", 403 | " for i in gene_list:\n", 404 | " outfile.write(i+'\\n')" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 87, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "import genelist" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "### Scripts and Modules\n", 421 | "\n", 422 | "A script is a file containing Python definitions and statements for performing some analysis. Scripts are known as modules when they are intended for use in other Python programs. Many modules come with Python as part of the standard library. \n", 423 | "\n", 424 | "You can get a list of available modules using `help('modules')` and explore them."
425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 10, 430 | "metadata": {}, 431 | "outputs": [ 432 | { 433 | "name": "stdout", 434 | "output_type": "stream", 435 | "text": [ 436 | "\n", 437 | "Please wait a moment while I gather a list of all available modules...\n", 438 | "\n" 439 | ] 440 | }, 441 | { 442 | "name": "stderr", 443 | "output_type": "stream", 444 | "text": [ 445 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/Bio/GA/__init__.py:14: BiopythonDeprecationWarning: Bio.GA has been deprecated, and we intend to remove it in a future release of Biopython. Please consider using DEAP instead. If you would like to continue using Bio.GA, please contact the Biopython developers via the mailing list or GitHub.\n", 446 | " BiopythonDeprecationWarning)\n", 447 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/Bio/NeuralNetwork/__init__.py:15: BiopythonDeprecationWarning: Bio.NeuralNetwork has been deprecated, and we intend to remove it in a future release of Biopython. Please consider using scikit-learn or TensorFlow instead. 
If you would like to continue using Bio.NeuralNetwork, please contact the Biopython developers via the mailing list or GitHub.\n", 448 | " BiopythonDeprecationWarning)\n", 449 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/Bio/SearchIO/__init__.py:211: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release.\n", 450 | " BiopythonExperimentalWarning)\n", 451 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/Bio/codonalign/__init__.py:27: BiopythonExperimentalWarning: Bio.codonalign is an experimental module which may undergo significant changes prior to its future official release.\n", 452 | " BiopythonExperimentalWarning)\n", 453 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/Bio/phenotype/__init__.py:101: BiopythonExperimentalWarning: Bio.phenotype is an experimental submodule which may undergo significant changes prior to its future official release.\n", 454 | " BiopythonExperimentalWarning)\n", 455 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/site-packages/IPython/kernel/__init__.py:13: ShimWarning: The `IPython.kernel` package has been deprecated since IPython 4.0.You should import from ipykernel or jupyter_client instead.\n", 456 | " \"You should import from ipykernel or jupyter_client instead.\", ShimWarning)\n" 457 | ] 458 | }, 459 | { 460 | "name": "stdout", 461 | "output_type": "stream", 462 | "text": [ 463 | "Bio autoreload jupyter_console select\n", 464 | "BioSQL backcall jupyter_core selectors\n", 465 | "IPython base64 jupyterlab send2trash\n", 466 | "PyQt5 bdb jupyterlab_launcher seqtools\n", 467 | "__future__ binascii keyword setuptools\n", 468 | "_ast binhex kiwisolver shelve\n", 469 | "_asyncio bisect lib2to3 shlex\n", 470 | "_bisect bleach linecache shutil\n", 471 | "_blake2 builtins locale signal\n", 472 | "_bootlocale bz2 logging simplegeneric\n", 473 | "_bz2 cProfile lxml sip\n", 474 
| "_codecs calendar lzma sipconfig\n", 475 | "_codecs_cn certifi macpath sipdistutils\n", 476 | "_codecs_hk cgi macurl2path site\n", 477 | "_codecs_iso2022 cgitb mailbox six\n", 478 | "_codecs_jp chunk mailcap smtpd\n", 479 | "_codecs_kr cmath markupsafe smtplib\n", 480 | "_codecs_tw cmd marshal sndhdr\n", 481 | "_collections code math socket\n", 482 | "_collections_abc codecs matplotlib socketserver\n", 483 | "_compat_pickle codeop mimetypes spwd\n", 484 | "_compression collections mistune sqlite3\n", 485 | "_crypt colorsys mkl_fft sre_compile\n", 486 | "_csv compileall mkl_random sre_constants\n", 487 | "_ctypes concurrent mmap sre_parse\n", 488 | "_ctypes_test configparser modulefinder ssl\n", 489 | "_curses contextlib multiprocessing stat\n", 490 | "_curses_panel copy nbconvert statistics\n", 491 | "_datetime copyreg nbformat statsmodels\n", 492 | "_decimal crypt netrc storemagic\n", 493 | "_dummy_thread csv nis string\n", 494 | "_elementtree ctypes nntplib stringprep\n", 495 | "_functools curses notebook struct\n", 496 | "_hashlib cycler ntpath subprocess\n", 497 | "_heapq cythonmagic nturl2path sunau\n", 498 | "_imp datetime numbers symbol\n", 499 | "_io dateutil numpy sympyprinting\n", 500 | "_json dbm opcode symtable\n", 501 | "_locale decimal operator sys\n", 502 | "_lsprof decorator optparse sysargv\n", 503 | "_lzma difflib os sysconfig\n", 504 | "_markupbase dis ossaudiodev syslog\n", 505 | "_md5 distutils pandas tabnanny\n", 506 | "_multibytecodec doctest pandocfilters tarfile\n", 507 | "_multiprocessing dummy_threading parser telnetlib\n", 508 | "_opcode easy_install parso tempfile\n", 509 | "_operator email pathlib terminado\n", 510 | "_osx_support encodings patsy termios\n", 511 | "_pickle ensurepip pdb test\n", 512 | "_posixsubprocess entrypoints pexpect testpath\n", 513 | "_pydecimal enum pickle tests\n", 514 | "_pyio errno pickleshare textwrap\n", 515 | "_random ete3 pickletools this\n", 516 | "_sha1 faulthandler pip threading\n", 517 | "_sha256 
fcntl pipes time\n", 518 | "_sha3 filecmp pkg_resources timeit\n", 519 | "_sha512 fileinput pkgutil tkinter\n", 520 | "_signal fnmatch platform token\n", 521 | "_sitebuiltins formatter plistlib tokenize\n", 522 | "_socket fractions poplib tornado\n", 523 | "_sqlite3 ftplib posix trace\n", 524 | "_sre functools posixpath traceback\n", 525 | "_ssl gc pprint tracemalloc\n", 526 | "_stat genericpath profile traitlets\n", 527 | "_string getopt prompt_toolkit tty\n", 528 | "_strptime getpass pstats turtle\n", 529 | "_struct gettext pty turtledemo\n", 530 | "_symtable glob ptyprocess types\n", 531 | "_sysconfigdata_i686_conda_cos6_linux_gnu grp pwd typing\n", 532 | "_sysconfigdata_m_linux_x86_64-linux-gnu gzip py_compile unicodedata\n", 533 | "_sysconfigdata_powerpc64le_conda_cos7_linux_gnu hashlib pybedtools unittest\n", 534 | "_sysconfigdata_x86_64_apple_darwin13_4_0 heapq pyclbr urllib\n", 535 | "_sysconfigdata_x86_64_conda_cos6_linux_gnu hmac pydoc uu\n", 536 | "_testbuffer html pydoc_data uuid\n", 537 | "_testcapi html5lib pyexpat venv\n", 538 | "_testimportmultiple http pygments warnings\n", 539 | "_testmultiphase idlelib pylab wave\n", 540 | "_thread imaplib pyparsing wcwidth\n", 541 | "_threading_local imghdr pysam weakref\n", 542 | "_tkinter imp pytz webbrowser\n", 543 | "_tracemalloc importlib qtconsole webencodings\n", 544 | "_warnings inspect queue wheel\n", 545 | "_weakref io quopri widgetsnbextension\n", 546 | "_weakrefset ipaddress random wsgiref\n", 547 | "abc ipykernel re xdrlib\n", 548 | "aifc ipykernel_launcher readline xml\n", 549 | "antigravity ipython_genutils reprlib xmlrpc\n", 550 | "argparse ipywidgets resource xxlimited\n", 551 | "array itertools rlcompleter xxsubtype\n", 552 | "ast jedi rmagic zipapp\n", 553 | "asynchat jinja2 runpy zipfile\n", 554 | "asyncio json sched zipimport\n", 555 | "asyncore jsonschema scipy zlib\n", 556 | "atexit jupyter seaborn zmq\n", 557 | "audioop jupyter_client secrets \n", 558 | "\n", 559 | "Enter any module name 
to get more help. Or, type \"modules spam\" to search\n", 560 | "for modules whose name or summary contain the string \"spam\".\n", 561 | "\n" 562 | ] 563 | }, 564 | { 565 | "name": "stderr", 566 | "output_type": "stream", 567 | "text": [ 568 | "/home/caleb/miniconda3/envs/icipe-env/lib/python3.6/pkgutil.py:107: VisibleDeprecationWarning: zmq.eventloop.minitornado is deprecated in pyzmq 14.0 and will be removed.\n", 569 | " Install tornado itself to use zmq with the tornado IOLoop.\n", 570 | " \n", 571 | " yield from walk_packages(path, info.name+'.', onerror)\n" 572 | ] 573 | } 574 | ], 575 | "source": [ 576 | "help('modules')" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "### Writing your own modules\n", 584 | "\n", 585 | "All we need to do to create our own module is to save our script as a file with a `.py` extension. Suppose, for example, this script is saved as a file named `seqtools.py`.\n", 586 | "\n", 587 | "```python\n", 588 | "def remove_at(pos, seq):\n", 589 | " return seq[:pos] + seq[pos+1:]\n```\n", 590 | " \n", 591 | "We can then import a module by its file name, without the `.py` extension:" 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": 2, 597 | "metadata": {}, 598 | "outputs": [], 599 | "source": [ 600 | "import dnatools" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 3, 606 | "metadata": {}, 607 | "outputs": [ 608 | { 609 | "name": "stdout", 610 | "output_type": "stream", 611 | "text": [ 612 | "There is an invalid base 'D' at position 2\n", 613 | "There is an invalid base 'F' at position 5\n", 614 | "There is an invalid base 'F' at position 8\n" 615 | ] 616 | } 617 | ], 618 | "source": [ 619 | "dnatools.percentGC(\"ADTAFTAFTA\")" 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": 11, 625 | "metadata": {}, 626 | "outputs": [ 627 | { 628 | "name": "stdout", 629 | "output_type": "stream", 630 | "text": [ 631 | "There is an invalid base 'V' at position 
7\n", 632 | "There is an invalid base 'H' at position 8\n" 633 | ] 634 | }, 635 | { 636 | "data": { 637 | "text/plain": [ 638 | "False" 639 | ] 640 | }, 641 | "execution_count": 11, 642 | "metadata": {}, 643 | "output_type": "execute_result" 644 | } 645 | ], 646 | "source": [ 647 | "dnatools.dnacheck(\"ACGAgTVHTGATA\")" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": 11, 653 | "metadata": {}, 654 | "outputs": [], 655 | "source": [ 656 | "import seqtools" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 12, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "data": { 666 | "text/plain": [ 667 | "'A sting!'" 668 | ] 669 | }, 670 | "execution_count": 12, 671 | "metadata": {}, 672 | "output_type": "execute_result" 673 | } 674 | ], 675 | "source": [ 676 | "s = \"A string!\"\n", 677 | "seqtools.remove_at(4,s)" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 16, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "data": { 687 | "text/plain": [ 688 | "'23000'" 689 | ] 690 | }, 691 | "execution_count": 16, 692 | "metadata": {}, 693 | "output_type": "execute_result" 694 | } 695 | ], 696 | "source": [ 697 | "'23,000,'.replace(',','')" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 91, 703 | "metadata": {}, 704 | "outputs": [], 705 | "source": [ 706 | "import genelist" 707 | ] 708 | }, 709 | { 710 | "cell_type": "code", 711 | "execution_count": 94, 712 | "metadata": {}, 713 | "outputs": [ 714 | { 715 | "ename": "ImportError", 716 | "evalue": "cannot import name 'get_genes'", 717 | "output_type": "error", 718 | "traceback": [ 719 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 720 | "\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)", 721 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mgenelist\u001b[0m 
\u001b[0;32mimport\u001b[0m \u001b[0mget_genes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 722 | "\u001b[0;31mImportError\u001b[0m: cannot import name 'get_genes'" 723 | ] 724 | } 725 | ], 726 | "source": [ 727 | "from genelist import get_genes" 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": {}, 733 | "source": [ 734 | "Modules are useful when you want to analyse large data using the HPC or even create your library of handy functions. \n", 735 | "\n", 736 | "#### Running scripts\n", 737 | "\n", 738 | "When you have put your commands into a `.py` file, you can execute it from the command line by invoking the Python interpreter using `python script.py`." 739 | ] 740 | }, 741 | { 742 | "cell_type": "markdown", 743 | "metadata": {}, 744 | "source": [ 745 | "### Exercise 2\n", 746 | "\n", 747 | "1. Convert the function you wrote in exercise 1 into a Python module. Then, import the module and use the function to read the `humchrx.txt` file and create a gene list file.\n", 748 | "2. Create a stand-alone script that does all the above.\n", 749 | "\n", 750 | "\n", 751 | "### Script that takes command line arguments\n", 752 | "So far, we can create a script that does one thing. With such a script, you have to edit the code whenever you have a new gene file to analyse or want to use a different name for the output file.\n", 753 | "\n", 754 | "#### sys.argv\n", 755 | "`sys.argv` is a list in Python that contains the command line arguments passed to the script; its first item is the name of the script itself. Let's add this to a script `sysargv.py` and run it on the command line. 
\n", 756 | "\n", 757 | "```python\n", 758 | "import sys\n", 759 | "print(\"This is the name of the script: \", sys.argv[0])\n", 760 | "print(\"Number of arguments: \", len(sys.argv))\n", 761 | "print(\"The arguments are: \", str(sys.argv))\n```" 762 | ] 763 | }, 764 | { 765 | "cell_type": "code", 766 | "execution_count": 95, 767 | "metadata": {}, 768 | "outputs": [ 769 | { 770 | "name": "stdout", 771 | "output_type": "stream", 772 | "text": [ 773 | "This is the name of the script: sysargv.py\n", 774 | "Number of arguments: 2\n", 775 | "The arguments are: ['sysargv.py', 'test']\n" 776 | ] 777 | } 778 | ], 779 | "source": [ 780 | "!python sysargv.py test" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": 96, 786 | "metadata": {}, 787 | "outputs": [], 788 | "source": [ 789 | "!python genelist.py ../Files/humchrx.txt ../Files/command_out.txt" 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": {}, 795 | "source": [ 796 | "### Exercise 3\n", 797 | "\n", 798 | "- Using the same concept, convert your script in your previous exercise to take command line arguments (input and output files)" 799 | ] 800 | } 801 | ], 802 | "metadata": { 803 | "kernelspec": { 804 | "display_name": "Python 3", 805 | "language": "python", 806 | "name": "python3" 807 | }, 808 | "language_info": { 809 | "codemirror_mode": { 810 | "name": "ipython", 811 | "version": 3 812 | }, 813 | "file_extension": ".py", 814 | "mimetype": "text/x-python", 815 | "name": "python", 816 | "nbconvert_exporter": "python", 817 | "pygments_lexer": "ipython3", 818 | "version": "3.6.5" 819 | } 820 | }, 821 | "nbformat": 4, 822 | "nbformat_minor": 2 823 | } 824 | -------------------------------------------------------------------------------- /Intro-to-Python/08.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python 
for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "\n", 10 | "\n", 11 | "## Data Analysis with Pandas\n", 12 | "\n", 13 | "For this topic, we are going to use resources available from Data Carpentry's [Python for Ecologists](http://www.datacarpentry.org/python-ecology-lesson/). \n", 14 | "\n", 15 | "\n", 16 | "### Set up\n", 17 | "\n", 18 | "Ensure you have installed `pandas` and `matplotlib` before the session. \n", 19 | "\n", 20 | "`conda install pandas matplotlib`\n", 21 | "\n", 22 | "Follow the instructions provided [here](http://www.datacarpentry.org/python-ecology-lesson/setup) for further details on setting up, and to download the data. \n", 23 | "\n", 24 | "### Some useful resources\n", 25 | "\n", 26 | "1. [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html) provides a quick introduction to pandas data types and syntax\n", 27 | "2. Dataquest's [Pandas Tutorial: Data analysis with Python: Part 1](https://www.dataquest.io/blog/pandas-python-tutorial/)\n", 28 | "3. Coding Club's [Python Data Analysis with Pandas and Matplotlib](https://ourcodingclub.github.io/2018/04/18/pandas-python-intro.html)\n", 29 | "\n", 30 | "### Training format\n", 31 | "\n", 32 | "In this lesson, we will use live coding to follow along with the Python for Ecologists resources. 
" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "import pandas as pd" 42 | ] 43 | } 44 | ], 45 | "metadata": { 46 | "kernelspec": { 47 | "display_name": "Python 3", 48 | "language": "python", 49 | "name": "python3" 50 | }, 51 | "language_info": { 52 | "codemirror_mode": { 53 | "name": "ipython", 54 | "version": 3 55 | }, 56 | "file_extension": ".py", 57 | "mimetype": "text/x-python", 58 | "name": "python", 59 | "nbconvert_exporter": "python", 60 | "pygments_lexer": "ipython3", 61 | "version": "3.6.5" 62 | } 63 | }, 64 | "nbformat": 4, 65 | "nbformat_minor": 2 66 | } 67 | -------------------------------------------------------------------------------- /Intro-to-Python/09.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics.\n", 9 | "\n", 10 | "\n", 11 | "\n", 12 | "## Reproducible Bioinformatics Research\n", 13 | "\n", 14 | "How can we use Jupyter Notebooks, Conda environments, Bioconda Channel and GitHub to ensure reproducible Bioinformatics Research? To explore these topics, we'll use various Open learning resource online:\n", 15 | "- [Bioinformatics best practices](https://github.com/griffithlab/rnaseq_tutorial/wiki/Bioinformatics-Best-Practices)\n", 16 | "- [Bioconda promises to ease bioinformatics software installation woes](http://blogs.nature.com/naturejobs/2017/11/03/techblog-bioconda-promises-to-ease-bioinformatics-software-installation-woes/)\n", 17 | "- Read the paper: [Bioconda: A sustainable and comprehensive software distribution for the life sciences](https://doi.org/10.1101/207092)\n", 18 | "\n", 19 | "\n", 20 | "### 1. Conda environments\n", 21 | "We've seen how you can create a conda environment. 
But how can you ensure someone else can reproduce your setup? We'll also learn how to create environments for different projects. \n", 22 | "\n", 23 | "### 2. Bioconda Channel\n", 24 | "\n", 25 | "Here, we'll explore some of the useful Bioinformatics packages in this channel, and how we can use them to conduct reproducible research. \n", 26 | "\n", 27 | "\n", 28 | "### 3. GitHub\n", 29 | "You have a reproducible environment and research notebook; how can you version your research and make it accessible to others? This will be a quick introduction to version control with Git and GitHub." 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [] 38 | } 39 | ], 40 | "metadata": { 41 | "kernelspec": { 42 | "display_name": "Python 3", 43 | "language": "python", 44 | "name": "python3" 45 | }, 46 | "language_info": { 47 | "codemirror_mode": { 48 | "name": "ipython", 49 | "version": 3 50 | }, 51 | "file_extension": ".py", 52 | "mimetype": "text/x-python", 53 | "name": "python", 54 | "nbconvert_exporter": "python", 55 | "pygments_lexer": "ipython3", 56 | "version": "3.6.5" 57 | } 58 | }, 59 | "nbformat": 4, 60 | "nbformat_minor": 2 61 | } 62 | -------------------------------------------------------------------------------- /Intro-to-Python/bank.py: -------------------------------------------------------------------------------- 1 | acountbal = 50000 2 | choice = input("Please enter 'b' to check balance or 'w' to withdraw: ") 3 | while choice != 'q': 4 | if choice.lower() in ('w','b'): 5 | if choice.lower() == 'b': 6 | print("Your balance is: %d" % acountbal) 7 | print("Anything else?") 8 | choice = input("Enter b for balance, w to withdraw or q to quit 1: ") 9 | print(choice.lower()) 10 | else: 11 | try: 12 | withdraw = float(input("Enter amount to withdraw: ").replace(',','')) 13 | if withdraw <= acountbal: 14 | print("here is your: %.2f" % withdraw) 15 | acountbal = acountbal - withdraw 16 | 
print("Anything else?") 17 | choice = input("Enter b for balance, w to withdraw or q to quit 2: ") 18 | #choice = 'q' 19 | else: 20 | print("You have insufficient funds: %.2f" % acountbal) 21 | except ValueError: 22 | print("Enter amount in digits") 23 | else: 24 | print("Wrong choice!") 25 | choice = input("Please enter 'b' to check balance or 'w' to withdraw: ") -------------------------------------------------------------------------------- /Intro-to-Python/dnatools.py: -------------------------------------------------------------------------------- 1 | def percentGC(dna): 2 | '''calculates the percentage GC content, given a DNA sequence''' 3 | if dnacheck(dna): 4 | dna_len= len(dna) 5 | gs = dna.upper().count('G') 6 | cs = dna.upper().count('C') 7 | 8 | return (gs+cs)/dna_len*100 9 | def dnaTest(x): 10 | dnaset = set("AGCT") 11 | y = set(x.upper()).union(dnaset) == dnaset 12 | return y 13 | 14 | def dnatest(dnaseq): 15 | testdna =[] 16 | dna = dnaseq.upper() 17 | for i in dna: 18 | result = str(i) in ['T', 'A', 'G', 'C'] 19 | testdna.append(result) 20 | if testdna.count(False) > 0: 21 | return False 22 | else: 23 | return True 24 | 25 | def get_genes(infile,outfile): 26 | """ 27 | Function to extract a list of genes and write to file 28 | """ 29 | gene_list = [] 30 | with open(infile) as gene: 31 | tag = False 32 | for line in gene: 33 | if line.startswith('name'): 34 | tag = True 35 | pass 36 | if tag: 37 | items = line.split() 38 | if len(items) > 0: 39 | gene_list.append(items[0]) 40 | gene_list = gene_list[1:-7] 41 | with open(outfile, 'w') as out: 42 | for i in gene_list: 43 | out.write(i+'\n') 44 | 45 | def dnacheck(dna): 46 | counter = 0 47 | check = True 48 | valid_dna = 'ACGT' 49 | for i in dna.upper(): 50 | counter += 1 51 | if i in valid_dna: 52 | pass 53 | else: 54 | check = False 55 | print("There is an invalid base '%s' at position %d" 56 | % (i,counter)) 57 | return check 58 | 59 | -------------------------------------------------------------------------------- 
/Intro-to-Python/execution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kipkurui/Python4Bioinformatics/04c076d7f9f665142f35e838ab2c9b50067e5293/Intro-to-Python/execution.png -------------------------------------------------------------------------------- /Intro-to-Python/genelist.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | def get_genes(infile,outfile): 4 | """ 5 | Function to extract a list of genes and write to file 6 | """ 7 | gene_list = [] 8 | with open(infile) as gene: 9 | tag = False 10 | for line in gene: 11 | if line.startswith('name'): 12 | tag = True 13 | continue 14 | if tag: 15 | items = line.split() 16 | if len(items) > 0: 17 | gene_list.append(items[0]) 18 | gene_list = gene_list[1:-7] 19 | with open(outfile, 'w') as outfile: 20 | for i in gene_list: 21 | outfile.write(i+'\n') 22 | return True 23 | 24 | infile = sys.argv[1] 25 | output = sys.argv[2] 26 | get_genes(infile,output) 27 | -------------------------------------------------------------------------------- /Intro-to-Python/seqtools.py: -------------------------------------------------------------------------------- 1 | def remove_at(pos, seq): 2 | return seq[:pos] + seq[pos+1:] -------------------------------------------------------------------------------- /Intro-to-Python/sysargv.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | print("This is the name of the script: ", sys.argv[0]) 4 | print("Number of arguments: ", len(sys.argv)) 5 | print("The arguments are: " , str(sys.argv)) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python For Bioinformatics 2 | 3 | Introduction to Python for Bioinformatics - available at https://github.com/kipkurui/Python4Bioinformatics. 
4 | 5 | 6 | 7 | ## Attribution 8 | These tutorials are an adaptation of the Introduction to Python for Maths by [Andreas Ernst](http://users.monash.edu.au/~andreas), available from https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git. The original version was written by Rajath Kumar and is available at https://github.com/rajathkumarmp/Python-Lectures. 9 | 10 | These notes have been greatly amended and updated for the EANBiT Introduction to Python for Bioinformatics course facilitated by [Caleb Kibet](https://twitter.com/calkibet), Audrey Mbogho and Anthony Etuk. 11 | 12 | 13 | # Quick Introduction to Jupyter Notebooks 14 | 15 | Throughout this course, we will be using Jupyter Notebooks. Although the HPC you will be using will have Jupyter set up, these notes are provided in case you want to set it up on your own computer. 16 | 17 | ## Introduction 18 | The Jupyter Notebook is an interactive computing environment that enables users to author notebooks, which contain a complete and self-contained record of a computation. These notebooks are easy to share. The notebooks may contain: 19 | * Live code 20 | * Interactive widgets 21 | * Plots 22 | * Narrative text 23 | * Equations 24 | * Images 25 | * Video 26 | 27 | It is good to note that "Jupyter" is a loose acronym for Julia, Python, and R, the primary languages supported by Jupyter. 28 | 29 | The notebook allows a computational researcher to create reproducible documentation of their research. As Bioinformatics is data-centric, the use of Jupyter Notebooks increases research transparency, hence promoting open science. 30 | 31 | ## First Steps 32 | 33 | ### Installation 34 | 35 | 1. [Download Miniconda](https://www.anaconda.com/download/) for your specific OS to your home directory 36 | - Linux: `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh` 37 | - Mac: `curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh` 38 | 2. 
Run: 39 | - `bash Miniconda3-latest-Linux-x86_64.sh` 40 | - `bash Miniconda3-latest-MacOSX-x86_64.sh` 41 | 3. Follow all the prompts: if unsure, accept defaults 42 | 4. Close and re-open your terminal 43 | 5. If the installation is successful, you should see a list of installed packages with 44 | - `conda list` 45 | If the command cannot be found, you can add the Miniconda bin directory to the path using: 46 | `export PATH=~/miniconda3/bin:$PATH` 47 | 48 | For reproducible analysis, you can [create a conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) with all the Python packages you use. 49 | 50 | `conda create --name bioinf python jupyter` 51 | 52 | To activate the conda environment: 53 | `conda activate bioinf` (on conda versions older than 4.4, use `source activate bioinf`) 54 | 55 | Having set up the conda environment, you can install JupyterLab using conda 56 | 57 | `conda install -c conda-forge jupyterlab` 58 | 59 | or by using pip 60 | 61 | `pip3 install jupyterlab` 62 | 63 | ## How to learn from this resource? 64 | 65 | Download all the notebooks from [Python4Bioinformatics](https://github.com/kipkurui/Python4Bioinformatics). The easiest way to do that is to get a copy of the GitHub repository in your working directory using either of the following sets of commands: 66 | 67 | git clone https://github.com/kipkurui/Python4Bioinformatics.git 68 | 69 | or 70 | 71 | wget https://github.com/kipkurui/Python4Bioinformatics/archive/master.zip 72 | 73 | unzip master.zip 74 | 75 | rm master.zip 76 | 77 | cd Python4Bioinformatics-master 78 | 79 | Then you can quickly launch JupyterLab using: 80 | 81 | `jupyter lab` 82 | 83 | NB: We will use JupyterLab for training. 84 | A Jupyter notebook is made up of many cells. Each cell can contain Python code. You can execute a cell by clicking on it and pressing `Shift-Enter` (run and move to the next cell) or `Ctrl-Enter` (run without moving to the next cell). 
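To make the idea of a cell concrete, here is a minimal sketch of what a single code cell might contain. The function name `gc_percent` and the example sequence are hypothetical and chosen only for illustration; they are not part of the course files. Paste the code into an empty cell and run it with `Shift-Enter`.

```python
# Hypothetical example cell: `gc_percent` and `sequence` are illustrative
# names, not taken from the course modules.
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()          # accept lower-case input as well
    if not seq:                # avoid dividing by zero on an empty string
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


sequence = "ATGCGCTA"
print(f"GC content: {gc_percent(sequence):.1f}%")
```

Whatever the cell prints appears directly below it in the notebook, and names defined in one cell (such as `gc_percent` here) remain available to the cells you run afterwards.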
85 | 86 | ### Logging in to the web server 87 | 88 | The easiest way for EANBiT course participants to run this and other notebooks is to log in to the Jupyter server (unfortunately, this is not currently working). The steps for running notebooks are: 89 | * Log in using the username and password assigned to you. The first time you log in, an empty account will automatically be set up for you. 90 | * Press the start button (if prompted by the system) 91 | * Use the menu of the Jupyter system to upload a .ipynb Python notebook file or to start a new notebook. 92 | 93 | ### Further help 94 | 95 | To learn more about Jupyter notebooks, check [the official introduction](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb) and [some useful Jupyter Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/). 96 | 97 | Book: http://www.ict.ru.ac.za/Resources/cspw/thinkcspy3/thinkcspy3.pdf 98 | 99 | # Python for Bioinformatics 100 | 101 | ## Introduction 102 | 103 | Python is a modern, robust, high-level programming language. It is straightforward to pick up even if you are entirely new to programming. 104 | 105 | Like Matlab or R, Python is interpreted, so it generally runs more slowly than C++, Fortran or Java. However, writing programs in Python is very quick. Python has an extensive collection of libraries for everything from scientific computing to web services. It caters for object-oriented and functional programming with a module system that allows large and complex applications to be developed in Python. 106 | 107 | These lectures use Jupyter notebooks, which mix Python code with documentation. The notebooks can be run on a web server or stand-alone on a computer. 108 | 109 | 110 | ## Contents 111 | 112 | This course is broken up into a number of notebooks (lectures). 
113 | ### Session 1 114 | * [00](Intro-to-Python/00.ipynb) This introduction with additional information below on how to get started in running Python 115 | * [01](Intro-to-Python/01.ipynb) Basic data types and operations (numbers, strings) 116 | 117 | ### Session 2 118 | * [02](Intro-to-Python/02.ipynb) String manipulation 119 | * [03](Intro-to-Python/03.ipynb) Data structures: Lists and Tuples 120 | * [04](Intro-to-Python/04.ipynb) Data structures (continued): dictionaries 121 | 122 | ### Session 3 123 | * [05](Intro-to-Python/05.ipynb) Control statements: if, for, while, try statements 124 | * [06](Intro-to-Python/06.ipynb) Functions 125 | * [07](Intro-to-Python/07.ipynb) Scripting with Python 126 | * [08](Intro-to-Python/08.ipynb) Data Analysis and plotting with Pandas 127 | * [09](Intro-to-Python/09.ipynb) Reproducible Bioinformatics Research 128 | 129 | This is a tutorial-style introduction to Python. For a quick reminder/summary of Python syntax, the following [Quick Reference Card](http://www.cs.put.poznan.pl/csobaniec/software/python/py-qrc.html) may be useful. A longer and more detailed tutorial-style introduction to Python is available from the Python site at: https://docs.python.org/3/tutorial/. 130 | 131 | 132 | 133 | 134 | 135 | ## How to Contribute 136 | 137 | To contribute, fork the repository, make some updates and send me a pull request. 138 | 139 | Alternatively, you can open an issue. 140 | 141 | ## License 142 | This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. 143 | --------------------------------------------------------------------------------