├── workshops ├── docs │ ├── modules │ │ ├── data │ │ ├── images │ │ ├── notebooks │ │ │ ├── plot1.png │ │ │ ├── images │ │ │ │ ├── inner-join.png │ │ │ │ ├── left_join.png │ │ │ │ ├── loops_image.png │ │ │ │ ├── make_female.png │ │ │ │ ├── plot_mean_weight.png │ │ │ │ ├── plot_total_animals.png │ │ │ │ ├── testing.svg │ │ │ │ ├── slicing-indexing.svg │ │ │ │ └── slicing-slicing.svg │ │ │ ├── speciesSubset.csv │ │ │ ├── data │ │ │ │ └── speciesSubset.csv │ │ │ ├── Pipfile │ │ │ ├── wip │ │ │ │ ├── README.md │ │ │ │ ├── more_data_structures.ipynb │ │ │ │ ├── functions.ipynb │ │ │ │ ├── conditionals.ipynb │ │ │ │ ├── slicing_and_list_comprehensions.ipynb │ │ │ │ └── basics_data_carpentry.ipynb │ │ │ ├── nbconvert_templates │ │ │ │ ├── student_markdown.tpl │ │ │ │ ├── workshop_notes.tpl │ │ │ │ ├── instructor_markdown.tpl │ │ │ │ ├── student.tpl │ │ │ │ ├── workshop_notes_markdown.tpl │ │ │ │ └── instructor.tpl │ │ │ ├── loops.ipynb │ │ │ └── defensive_programming.ipynb │ │ ├── indexing_files │ │ │ └── indexing_74_1.png │ │ ├── working_with_data_files │ │ │ ├── working_with_data_57_1.png │ │ │ ├── working_with_data_59_1.png │ │ │ ├── working_with_data_62_1.png │ │ │ ├── working_with_data_64_1.png │ │ │ └── working_with_data_70_1.png │ │ ├── plotting_with_ggplot_files │ │ │ ├── plotting_with_ggplot_10_0.png │ │ │ ├── plotting_with_ggplot_12_0.png │ │ │ ├── plotting_with_ggplot_14_0.png │ │ │ ├── plotting_with_ggplot_16_0.png │ │ │ ├── plotting_with_ggplot_18_0.png │ │ │ ├── plotting_with_ggplot_21_0.png │ │ │ ├── plotting_with_ggplot_22_0.png │ │ │ ├── plotting_with_ggplot_24_0.png │ │ │ ├── plotting_with_ggplot_26_0.png │ │ │ ├── plotting_with_ggplot_28_0.png │ │ │ ├── plotting_with_ggplot_31_0.png │ │ │ ├── plotting_with_ggplot_36_0.png │ │ │ ├── plotting_with_ggplot_38_0.png │ │ │ ├── plotting_with_ggplot_39_0.png │ │ │ ├── plotting_with_ggplot_43_0.png │ │ │ ├── plotting_with_ggplot_44_0.png │ │ │ ├── plotting_with_ggplot_47_0.png │ │ │ ├── 
plotting_with_ggplot_48_0.png │ │ │ └── plotting_with_ggplot_50_0.png │ │ ├── loops.md │ │ ├── defensive_programming.md │ │ ├── plotting_with_ggplot.md │ │ └── intro.md │ ├── css │ │ └── extra.css │ ├── halfday.md │ ├── fullday.md │ └── index.md └── mkdocs.yml ├── deploy.sh ├── .gitmodules ├── Pipfile ├── .gitignore ├── scripts └── markdown2ipynb.py ├── LICENSE.md └── README.md /workshops/docs/modules/data: -------------------------------------------------------------------------------- 1 | notebooks/data -------------------------------------------------------------------------------- /workshops/docs/modules/images: -------------------------------------------------------------------------------- 1 | notebooks/images -------------------------------------------------------------------------------- /deploy.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | cd workshops 4 | mkdocs gh-deploy 5 | cd .. 6 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "themes/mkdocs-windmill"] 2 | path = themes/mkdocs-windmill 3 | url = https://github.com/gristlabs/mkdocs-windmill 4 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/plot1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/plot1.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/inner-join.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/inner-join.png 
-------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/left_join.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/left_join.png -------------------------------------------------------------------------------- /workshops/docs/modules/indexing_files/indexing_74_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/indexing_files/indexing_74_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/loops_image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/loops_image.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/make_female.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/make_female.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/plot_mean_weight.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/plot_mean_weight.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/plot_total_animals.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/plot_total_animals.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/speciesSubset.csv: -------------------------------------------------------------------------------- 1 | "species_id","genus","species","taxa" 2 | "DM","Dipodomys","merriami","Rodent" 3 | "NL","Neotoma","albigula","Rodent" 4 | "PE","Peromyscus","eremicus","Rodent" 5 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/data/speciesSubset.csv: -------------------------------------------------------------------------------- 1 | "species_id","genus","species","taxa" 2 | "DM","Dipodomys","merriami","Rodent" 3 | "NL","Neotoma","albigula","Rodent" 4 | "PE","Peromyscus","eremicus","Rodent" 5 | -------------------------------------------------------------------------------- /workshops/docs/modules/working_with_data_files/working_with_data_57_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_57_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/working_with_data_files/working_with_data_59_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_59_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/working_with_data_files/working_with_data_62_1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_62_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/working_with_data_files/working_with_data_64_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_64_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/working_with_data_files/working_with_data_70_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_70_1.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_10_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_10_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_12_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_12_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_14_0.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_14_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_16_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_16_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_18_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_18_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_21_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_21_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_22_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_22_0.png -------------------------------------------------------------------------------- 
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_24_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_24_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_26_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_26_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_28_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_28_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_31_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_31_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_36_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_36_0.png 
-------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_38_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_38_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_39_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_39_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_43_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_43_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_44_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_44_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_47_0.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_47_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_48_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_48_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_50_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_50_0.png -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/Pipfile: -------------------------------------------------------------------------------- 1 | [[source]] 2 | url = "https://pypi.org/simple" 3 | verify_ssl = true 4 | name = "pypi" 5 | 6 | [packages] 7 | 8 | [dev-packages] 9 | 10 | [requires] 11 | python_version = "3.6" 12 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/README.md: -------------------------------------------------------------------------------- 1 | These are work-in-progress modules that don't yet get rendered to Markdown or 2 | the main site. 3 | 4 | Once these modules take shape they can be moved to `workshops/docs/modules/notebooks` 5 | and integrated. 
6 | -------------------------------------------------------------------------------- /Pipfile: -------------------------------------------------------------------------------- 1 | [[source]] 2 | url = "https://pypi.org/simple" 3 | verify_ssl = true 4 | name = "pypi" 5 | 6 | [packages] 7 | mkdocs = "*" 8 | mkdocs-windmill = "*" 9 | mkdocs-bootswatch = "*" 10 | mkdocs-cinder = "*" 11 | mkdocs-cluster = "*" 12 | jupyter = "*" 13 | pandas = "*" 14 | numpy = "*" 15 | plotnine = "*" 16 | 17 | [dev-packages] 18 | 19 | [requires] 20 | python_version = "3.6" 21 | pyyaml = ">=4.2b1" 22 | notebook = ">=5.7.2" 23 | -------------------------------------------------------------------------------- /workshops/docs/css/extra.css: -------------------------------------------------------------------------------- 1 | /* default boxes around code (Jupyter input and output blocks) */ 2 | pre { 3 | border-style: solid; 4 | border-width: 1px; 5 | border-color: rgb(204, 204, 204); 6 | } 7 | 8 | /* only left border on Jupyter output blocks */ 9 | pre.output { 10 | border-left-style: solid; 11 | border-top-style: none; 12 | border-right-style: none; 13 | border-bottom-style: none; 14 | border-width: 1px; 15 | border-color: #008cba; 16 | } 17 | -------------------------------------------------------------------------------- /workshops/docs/halfday.md: -------------------------------------------------------------------------------- 1 | # Introduction to Python Workshop (half-day) 2 | 3 | Welcome to _Introduction to Python_ ! 
4 | 5 | ## Sections 6 | 7 | * 01 - [Introduction - the basics of Python](modules/intro.md) 8 | * 02 - [Data analysis in Python with Pandas](modules/working_with_data.md) 9 | * 03 - [Missing Values](modules/missing_values.md) 10 | * 04 - [Repetitive tasks with loops](modules/loops.md) 11 | * 05 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md) 12 | 13 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | 3 | # mkdocs generated site 4 | /workshops/site 5 | 6 | # Some side-effect outputs from ./build.sh we don't want 7 | /workshops/docs/modules/notebooks/surveys.csv 8 | /workshops/docs/modules/notebooks/speciesSubset.csv 9 | /workshops/docs/modules/notebooks/function_surveys*.csv 10 | /workshops/docs/modules/notebooks/output/* 11 | 12 | # Byte-compiled / optimized / DLL files 13 | __pycache__/ 14 | *.py[cod] 15 | *$py.class 16 | 17 | # C extensions 18 | *.so 19 | 20 | 21 | -------------------------------------------------------------------------------- /workshops/docs/fullday.md: -------------------------------------------------------------------------------- 1 | # Introduction to Python Workshop 2 | 3 | Welcome to _Introduction to Python_ ! 
4 | 5 | ## Sections 6 | 7 | * 01 - [Introduction - the basics of Python](modules/intro.md) 8 | * 02 - [Repetitive tasks with loops](modules/loops.md) 9 | * 03 - [Data analysis in Python with Pandas](modules/working_with_data.md) 10 | * 04 - [Reusable and modular code with functions](modules/functions.md) 11 | * 05 - [Handling Missing Values](modules/missing_values.md) 12 | * 06 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md) 13 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/student_markdown.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'workshop_notes_markdown.tpl'%} 2 | 3 | {% block any_cell %} 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %} 5 | {{ super() }} 6 | {% elif 'solution' in cell['metadata'].get('tags', []) %} 7 | 8 | {% elif 'instructor' in cell['metadata'].get('tags', []) %} 9 | 10 | {% elif 'hide' in cell['metadata'].get('tags', []) %} 11 | 12 | {% elif 'oneday' in cell['metadata'].get('tags', []) %} 13 | 14 | {% else %} 15 | {{ super() }} 16 | {% endif %} 17 | {% endblock any_cell %} 18 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/workshop_notes.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'full.tpl'%} 2 | 3 | {%- block header -%} 4 | {{ super() }} 5 | 6 | 7 | 8 | 24 | 25 | {%- endblock header -%} 26 | 27 | {% block in_prompt -%} 28 |
29 | >>>  30 |
31 | {%- endblock in_prompt -%} 32 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/instructor_markdown.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'workshop_notes_markdown.tpl'%} 2 | 3 | {% block any_cell %} 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %} 5 | {{ super() }} 6 | {% elif 'solution' in cell['metadata'].get('tags', []) %} 7 | {{ super() }} 8 | {% elif 'instructor' in cell['metadata'].get('tags', []) %} 9 | {{ super() }} 10 | {% elif 'hide' in cell['metadata'].get('tags', []) %} 11 |
12 |
13 | {% elif 'oneday' in cell['metadata'].get('tags', []) %} 14 |
15 |
16 | {% else %} 17 | {{ super() }} 18 | {% endif %} 19 | {% endblock any_cell %} 20 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/student.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'workshop_notes.tpl'%} 2 | 3 | {% block any_cell %} 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %} 5 |
6 | {{ super() }} 7 |
8 | {% elif 'solution' in cell['metadata'].get('tags', []) %} 9 |
10 |
11 | {% elif 'instructor' in cell['metadata'].get('tags', []) %} 12 |
13 |
14 | {% elif 'hide' in cell['metadata'].get('tags', []) %} 15 |
16 |
17 | {% elif 'oneday' in cell['metadata'].get('tags', []) %} 18 |
19 |
20 | {% else %} 21 | {{ super() }} 22 | {% endif %} 23 | {% endblock any_cell %} 24 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/workshop_notes_markdown.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'markdown.tpl'%} 2 | 3 | {%- block header -%} 4 | {{ super() }} 5 | 6 | 15 | {%- endblock header -%} 16 | 17 | {% block stream %} 18 |
19 | 
output
20 | 21 | {{ output.text }} 22 | 23 |
24 | {% endblock stream %} 25 | 26 | {% block data_text scoped %} 27 |
28 | 
output
29 | 30 | {{ output.get('data', {}).get('text/plain', '') }} 31 | 32 |
33 | {% endblock data_text %} 34 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/nbconvert_templates/instructor.tpl: -------------------------------------------------------------------------------- 1 | {% extends 'workshop_notes.tpl'%} 2 | 3 | {% block any_cell %} 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %} 5 |
6 | {{ super() }} 7 |
8 | {% elif 'solution' in cell['metadata'].get('tags', []) %} 9 |
10 | {{ super() }} 11 |
12 | {% elif 'instructor' in cell['metadata'].get('tags', []) %} 13 |
14 | {{ super() }} 15 |
16 | {% elif 'hide' in cell['metadata'].get('tags', []) %} 17 |
18 |
19 | {% elif 'oneday' in cell['metadata'].get('tags', []) %} 20 |
21 |
22 | {% else %} 23 | {{ super() }} 24 | {% endif %} 25 | {% endblock any_cell %} 26 | -------------------------------------------------------------------------------- /workshops/mkdocs.yml: -------------------------------------------------------------------------------- 1 | --- 2 | site_name: 'Introduction to Python Workshop' 3 | repo_url: 'https://github.com/MonashDataFluency/python-workshop-base' 4 | edit_uri: 'blob/master/README.md#modifying-and-building' 5 | site_description: 'Monash Data Fluency Python Workshops' 6 | theme: cinder 7 | # theme: windmill 8 | # theme: cluster 9 | # theme: bootswatch 10 | 11 | extra_css: 12 | - css/extra.css 13 | 14 | pages: 15 | - 'Home': 'index.md' 16 | - 'Modules': 17 | - 'Introduction - the basics of Python': 'modules/intro.md' 18 | - 'Working with Data': 'modules/working_with_data.md' 19 | - 'Missing values': 'modules/missing_values.md' 20 | - 'Indexing': 'modules/indexing.md' 21 | - 'Loops': 'modules/loops.md' 22 | - 'Combining DataFrames with Pandas': 'modules/merging_data.md' 23 | - 'Plotting with ggplot for Python': 'modules/plotting_with_ggplot.md' 24 | - 'Reusable and modular code with functions': 'modules/functions.md' 25 | - 'Defensive Programming': 'modules/defensive_programming.md' 26 | - 'Half Day Course': 'halfday.md' 27 | - 'Full Day Course': 'fullday.md' 28 | -------------------------------------------------------------------------------- /workshops/docs/index.md: -------------------------------------------------------------------------------- 1 | # Introduction to Python Workshop 2 | 3 | Welcome to _Introduction to Python_ ! 
4 | 5 | ## Modules 6 | 7 | * 01 - [Introduction - the basics of Python](modules/intro.md) 8 | * 02 - [Data analysis in Python with Pandas](modules/working_with_data.md) 9 | * 03 - [Indexing and slicing](modules/indexing.md) 10 | * 04 - [Missing Values](modules/missing_values.md) 11 | * 05 - [Combining DataFrames in Pandas](modules/merging_data.md) 12 | * 06 - [Repetitive tasks with loops](modules/loops.md) 13 | * 07 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md) 14 | * 08 - [Reusable and modular code with functions](modules/functions.md) 15 | * 09 - [Defensive Programming](modules/defensive_programming.md) 16 | 17 | Some of these modules have been adapted from the original versions at 18 | [Data Carpentry - Python for Ecologists](http://www.datacarpentry.org/python-ecology-lesson/) 19 | and [Software Carpentry - Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/) 20 | (used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/)). 
21 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/testing.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | −3.0 7 | 8 | 5.0 9 | 10 | 0.0 11 | 12 | 4.5 13 | 14 | −1.5 15 | 16 | 2.0 17 | 18 | 0.0 19 | 20 | 2.0 21 | 22 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/more_data_structures.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Sets" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "{1, 2, 3, 4}" 19 | ] 20 | }, 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "unique_items = set([1, 1, 2, 2, 3, 4, 1, 2, 3, 4])\n", 28 | "# or curly brackets\n", 29 | "# unique_items = {1, 1, 2, 2, 3, 4, 1, 2, 3, 4}\n", 30 | "unique_items" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [] 39 | } 40 | ], 41 | "metadata": { 42 | "kernelspec": { 43 | "display_name": "Python 3", 44 | "language": "python", 45 | "name": "python3" 46 | }, 47 | "language_info": { 48 | "codemirror_mode": { 49 | "name": "ipython", 50 | "version": 3 51 | }, 52 | "file_extension": ".py", 53 | "mimetype": "text/x-python", 54 | "name": "python", 55 | "nbconvert_exporter": "python", 56 | "pygments_lexer": "ipython3", 57 | "version": "3.6.3" 58 | } 59 | }, 60 | "nbformat": 4, 61 | "nbformat_minor": 2 62 | } 63 | -------------------------------------------------------------------------------- /scripts/markdown2ipynb.py: -------------------------------------------------------------------------------- 1 | 
#!/usr/bin/env python3
2 | # Hacky script to take Markdown with ```python code blocks
3 | # and convert it to an ipynb-format Jupyter notebook.
4 | # The result will almost always need hand editing in Jupyter after
5 | # conversion.
6 | #
7 | # Usage:
8 | #
9 | #   python markdown2ipynb.py some_markdown.md >a_notebook.ipynb
10 |
11 | import sys
12 | import json
13 |
14 | with open(sys.argv[1], 'r') as f:
15 |     content = f.readlines()
16 |
17 | metadata = {
18 |     "metadata": {
19 |         "celltoolbar": "Tags",
20 |         "kernelspec": {
21 |             "display_name": "Python 3",
22 |             "language": "python",
23 |             "name": "python3"
24 |         },
25 |         "language_info": {
26 |             "codemirror_mode": {
27 |                 "name": "ipython",
28 |                 "version": 3
29 |             },
30 |             "file_extension": ".py",
31 |             "mimetype": "text/x-python",
32 |             "name": "python",
33 |             "nbconvert_exporter": "python",
34 |             "pygments_lexer": "ipython3",
35 |             "version": "3.6.3"
36 |         }
37 |     },
38 |     "nbformat": 4,
39 |     "nbformat_minor": 2
40 | }
41 |
42 | cells = {"cells": []}
43 | split_at_markdown_h2 = True
44 | in_python_block = False
45 | source = []
46 | for line in content:
47 |     if line.startswith("```python"):
48 |         in_python_block = True
49 |         # source_lines = ["%s\n" % l for l in source]
50 |         cells['cells'].append(
51 |             {"cell_type": "markdown",
52 |              "metadata": {},
53 |              "source": list(source)})
54 |         source = []
55 |         continue
56 |
57 |     if in_python_block and line.startswith("```"):
58 |         in_python_block = False
59 |         cells['cells'].append(
60 |             {"cell_type": "code",
61 |              "metadata": {},
62 |              "outputs": [],
63 |              "execution_count": 0,
64 |              "source": list(source)})
65 |         source = []
66 |         continue
67 |
68 |     if not in_python_block:
69 |         if split_at_markdown_h2 and line.startswith("## "):
70 |             cells['cells'].append(
71 |                 {"cell_type": "markdown",
72 |                  "metadata": {},
73 |                  "source": list(source)})
74 |             source = []
75 |             source.append(line)
76 |             continue
77 |
78 |     source.append(line)
79 |
80 | # Emit any text left after the last code block or heading as a final cell
81 | if source:
82 |     cells['cells'].append(
83 |         {"cell_type": "markdown",
84 |          "metadata": {},
85 |          "source": list(source)})
86 |
87 | cells.update(metadata)
88 | print(json.dumps(cells, indent=2))
89 |
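The cell-splitting logic of `markdown2ipynb.py` can also be sketched as a small function that works on a list of lines, which makes it easy to try without a file on disk. This is a minimal illustration, not code from the repository: the names `markdown_to_cells`, `flush` and `FENCE` are hypothetical. The sketch skips empty buffers, uses `None` for `execution_count`, and turns any text remaining after the final fence into a last cell.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def markdown_to_cells(lines, split_at_h2=True):
    """Split Markdown lines into a list of notebook cell dicts (illustrative helper)."""
    cells = []
    source = []
    in_python_block = False

    def flush(cell_type):
        # Append the buffered lines as one cell, skipping empty buffers.
        nonlocal source
        if source:
            cell = {"cell_type": cell_type, "metadata": {}, "source": source}
            if cell_type == "code":
                cell.update({"outputs": [], "execution_count": None})
            cells.append(cell)
        source = []

    for line in lines:
        if not in_python_block and line.startswith(FENCE + "python"):
            flush("markdown")          # text before the fence becomes a Markdown cell
            in_python_block = True
        elif in_python_block and line.startswith(FENCE):
            flush("code")              # closing fence ends the code cell
            in_python_block = False
        else:
            if not in_python_block and split_at_h2 and line.startswith("## "):
                flush("markdown")      # start a new Markdown cell at each H2
            source.append(line)

    # Whatever is still buffered becomes the final cell.
    flush("code" if in_python_block else "markdown")
    return cells


demo = ["# Title\n", FENCE + "python\n", "x = 1\n", FENCE + "\n", "Closing remarks\n"]
cells = markdown_to_cells(demo)
print(json.dumps([c["cell_type"] for c in cells]))  # prints ["markdown", "code", "markdown"]
```

Keeping the splitting separate from file reading and JSON output means the notebook metadata can be merged in afterwards, as the script does with `cells.update(metadata)`.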
-------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/slicing-indexing.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | background 6 | 7 | 8 | 9 | 10 | 11 | 12 | Layer 1 13 | 14 | 15 | grades = [88, 72, 93, 94] 16 | 17 | 0 1 3 18 | indexing: getting a specific element 19 | 20 | >>> grades[2] 21 | 93 22 | 2 23 | 24 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/images/slicing-slicing.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | background 6 | 7 | 8 | 9 | 10 | 11 | 12 | Layer 1 13 | 14 | 15 | grades = [88, 72, 93, 94] 16 | 17 | 0 2 4 18 | slicing: selecting a set of elements 19 | 20 | >>> grades[1:3] 21 | [72, 93] 22 | 1 23 | 3 24 | 25 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## Instructional Material 2 | 3 | This workshop material is made available under a [Creative Commons Attribution license (CC-BY 4.0)][cc-by-human] 4 | 5 | Parts of this content have been adapted from the [Data Carpentry "Python for ecologists"](http://www.datacarpentry.org/python-ecology-lesson/) workshop material, used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode). 6 | 7 | The following is a human-readable summary of (and not a substitute for) the [full legal text of the CC BY 4.0 8 | license][cc-by-legal]. 9 | 10 | You are free: 11 | 12 | * to **Share**---copy and redistribute the material in any medium or format 13 | * to **Adapt**---remix, transform, and build upon the material 14 | 15 | for any purpose, even commercially. 16 | 17 | The licensor cannot revoke these freedoms as long as you follow the 18 | license terms. 
19 | 20 | Under the following terms: 21 | 22 | * **Attribution**---You must give appropriate credit (mentioning that 23 | your work is derived from work that is Copyright © Software 24 | Carpentry and Monash Data Fluency, where practical, linking to 25 | http://software-carpentry.org/ and https://github.com/MonashDataFluency), 26 | provide a [link to the license][cc-by-human], 27 | and indicate if changes were made. You may do 28 | so in any reasonable manner, but not in any way that suggests the 29 | licensor endorses you or your use. 30 | 31 | **No additional restrictions**---You may not apply legal terms or 32 | technological measures that legally restrict others from doing 33 | anything the license permits. With the understanding that: 34 | 35 | Notices: 36 | 37 | * You do not have to comply with the license for elements of the 38 | material in the public domain or where your use is permitted by an 39 | applicable exception or limitation. 40 | * No warranties are given. The license may not give you all of the 41 | permissions necessary for your intended use. For example, other 42 | rights such as publicity, privacy, or moral rights may limit how you 43 | use the material. 44 | 45 | ## Software 46 | 47 | [The MIT License (MIT)][mit-license] 48 | 49 | Permission is hereby granted, free of charge, to any person obtaining 50 | a copy of this software and associated documentation files (the 51 | "Software"), to deal in the Software without restriction, including 52 | without limitation the rights to use, copy, modify, merge, publish, 53 | distribute, sublicense, and/or sell copies of the Software, and to 54 | permit persons to whom the Software is furnished to do so, subject to 55 | the following conditions: 56 | 57 | The above copyright notice and this permission notice shall be 58 | included in all copies or substantial portions of the Software. 
59 | 60 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 61 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 62 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 63 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 64 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 65 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 66 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 67 | 68 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/ 69 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode 70 | [mit-license]: http://opensource.org/licenses/mit-license.html 71 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python workshops base 2 | 3 | This is a base repository for Data Fluency Python Workshop modules. 4 | 5 | To add or modify content, edit the notebooks in 6 | `workshops/docs/modules/notebooks`. 7 | 8 | ## Quick start 9 | ```bash 10 | # Install pipenv to ~/.local/bin/pipenv 11 | pip install --user pipenv 12 | 13 | git clone https://github.com/MonashDataFluency/python-workshop-base.git 14 | cd python-workshop-base 15 | 16 | # Install dependencies 17 | pipenv install 18 | 19 | # Enter the virtual environment 20 | pipenv shell 21 | jupyter notebook 22 | # Edit the notebooks in workshops/docs/modules/notebooks 23 | # Ctrl-C in terminal to stop Jupyter when you are done 24 | ./build.sh 25 | 26 | # To view the generated site 27 | cd workshops 28 | open http://127.0.0.1:8000 && mkdocs serve 29 | ``` 30 | 31 | If everything looks fine, commit your changes (ideally to a branch), `git push` and send a Pull Request. 32 | 33 | To deploy the public docs, [see here](#deploying-the-static-site-to-github-pages). 
34 | 35 | ---- 36 | 37 | ## Setup 38 | 39 | Install [Pipenv](https://docs.pipenv.org/) (e.g. `pip install pipenv`). 40 | 41 | Run: 42 | 43 | ```bash 44 | pipenv install 45 | ``` 46 | 47 | You can enter the virtualenv with `pipenv shell`, or run single commands in the 48 | environment of the virtualenv with `pipenv run`. 49 | 50 | ## Modifying and building 51 | 52 | Workshop modules can be found in `workshops/docs/modules/notebooks`. 53 | 54 | To edit and update a module: 55 | * edit the Jupyter Notebook, following the required [conventions](#jupyter-notebook-conventions). 56 | * ensure your code runs 57 | * save the notebook 58 | * **stop the kernel for the notebook** 59 | 60 | Then run: 61 | 62 | ```bash 63 | # Export the notebooks, build the docs 64 | pipenv run ./build.sh 65 | ``` 66 | 67 | This script runs `jupyter nbconvert` to generate Markdown from the notebooks, 68 | then runs `mkdocs build` to generate the static HTML. 69 | 70 | New modules should be listed in `workshops/mkdocs.yml`, `workshops/docs/index.md` 71 | and possibly `workshops/docs/fullday.md` and/or `workshops/docs/halfday.md` if they form part of the 72 | full or half day workshops. 73 | 74 | ### Jupyter notebook conventions 75 | 76 | The intention of developing the workshop materials directly from Jupyter notebooks is to: 77 | 78 | 1. Ensure code examples run correctly, catch errors early. 79 | 2. Make each module a self-contained unit, including pulling in dependencies. 80 | 3. Enable generation of student and instructor notes from a single source. 81 | 82 | Here are some conventions to follow to achieve this: 83 | 84 | * **Cell tagging**: challenges should be tagged `challenge` and **solutions should be tagged** `solution`. 85 | The `nbconvert` templates hide cells tagged `solution` from the main student notes, 86 | but output them for instructor notes. Currently (May-2018) only `jupyter notebook` 87 | allows editing cell tags - the required UI for `jupyter lab` hasn't been completed yet.
88 | * **Package dependencies**: Include a `!pip install somepackage` cell near the start of every module 89 | that installs any required dependencies. This makes the modules work as standalone units in a range 90 | of environments (local Jupyter or IPython REPL, Azure Notebooks, Colaboratory, Python Anywhere). 91 | * **Acquire data via URLs in the notebook**: Include cells like `import urllib.request; urllib.request.urlretrieve("https://files.rcsb.org/download/3FPR.pdb")` to download external data. 92 | This allows the notes to be used in various hosted or local Jupyter environments 93 | (it's also a useful operation for students to learn). 94 | 95 | ## Viewing the generated site 96 | 97 | You can view the site locally via: 98 | 99 | ```bash 100 | pipenv shell 101 | cd workshops 102 | mkdocs serve 103 | 104 | # or, run 105 | # pipenv run bash -c "cd workshops; mkdocs serve" 106 | ``` 107 | 108 | Go to [http://127.0.0.1:8000](http://127.0.0.1:8000) 109 | 110 | ## Deploying the static site to GitHub Pages 111 | 112 | To update the site at https://MonashDataFluency.github.io/python-workshop-base/, run: 113 | 114 | ```bash 115 | pipenv run ./deploy.sh 116 | ``` 117 | 118 | # License 119 | 120 | This workshop material is made available under a 121 | [Creative Commons Attribution license (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/legalcode) 122 | 123 | Parts of this content have been adapted from the 124 | [Data Carpentry "Python for ecologists"](http://www.datacarpentry.org/python-ecology-lesson/) 125 | workshop material, used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode). 126 | 127 | Code is made available under the 128 | [MIT License](http://opensource.org/licenses/mit-license.html). 129 | 130 | See [LICENSE.md](LICENSE.md) for the full text.
131 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "## Functions" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "Functions wrap up reusable pieces of code - the *DRY* principle\n", 19 | "\n", 20 | "Significant whitespace: the body of the function is indicated by indenting by 4 spaces\n", 21 | "\n", 22 | "*(We also use these indented blocks for if/else, for and while statements .. later !)*\n", 23 | "\n", 24 | "`return` statements immediately return a value (or `None` if no value is given)\n", 25 | "\n", 26 | "Any code in the function after the `return` statement does not get executed." 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 60, 32 | "metadata": { 33 | "slideshow": { 34 | "slide_type": "subslide" 35 | } 36 | }, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "256 python-esque\n" 43 | ] 44 | } 45 | ], 46 | "source": [ 47 | "def square(x):\n", 48 | " return x**2\n", 49 | "\n", 50 | "def hyphenate(a, b):\n", 51 | " return a + '-' + b\n", 52 | " print(\"We will never get here\")\n", 53 | "\n", 54 | "print(square(16), hyphenate('python', 'esque'))" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "slideshow": { 61 | "slide_type": "subslide" 62 | } 63 | }, 64 | "source": [ 65 | "### Indentation and whitespace\n", 66 | "\n", 67 | "* Python uses spaces at the start of a line to indicate a 'block' of code.\n", 68 | "* A new block of code should be indented by four spaces.\n", 69 | "\n", 70 | "* For a function, all the indented code is part of the function.\n", 71 | "* (This also applies to loops 
like `for` and `while` and conditionals like `if`)\n", 72 | "\n", 73 | "(Indenting/dedenting by four spaces in Python is the equivalent to opening **{** and closing **}** curly brackets in languages like Java, Javascript, C, C++, C# etc)\n", 74 | "\n", 75 | "(Python actually allows you to indent by any number of spaces as long as you are consistent throughout the file. The official Python style guide prefers four spaces https://www.python.org/dev/peps/pep-0008/, and most Python code you'll find follows that convention, so you should too. You can even use tab characters, but please, please, pretty please don't do that)." 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 61, 81 | "metadata": { 82 | "slideshow": { 83 | "slide_type": "slide" 84 | } 85 | }, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "4 6 9\n" 92 | ] 93 | } 94 | ], 95 | "source": [ 96 | "# Functions can return multiple values (just return a tuple and unpack it)\n", 97 | "def lengths(a, b, c):\n", 98 | " return len(a), len(b), len(c)\n", 99 | "\n", 100 | "x, y, z = lengths(\"long\", \"longer\", \"LONGEREST\")\n", 101 | "print(x, y, z)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 62, 107 | "metadata": { 108 | "slideshow": { 109 | "slide_type": "slide" 110 | }, 111 | "tags": [ 112 | "biosummer" 113 | ] 114 | }, 115 | "outputs": [ 116 | { 117 | "data": { 118 | "text/plain": [ 119 | "['MIL', 'GROGDRIN', 'PINEAPPLE']" 120 | ] 121 | }, 122 | "execution_count": 62, 123 | "metadata": {}, 124 | "output_type": "execute_result" 125 | } 126 | ], 127 | "source": [ 128 | "def split_at(seq, residue='K'):\n", 129 | " \"\"\"\n", 130 | " Takes a protein sequence (as a string) and splits it at each K residue,\n", 131 | " or the residue specified in the `residue` keyword argument. 
Split point\n", 132 | " residue is discarded.\n", 133 | " \n", 134 | " Returns a list of strings.\n", 135 | " \"\"\"\n", 136 | " return seq.split(residue)\n", 137 | "\n", 138 | "split_at('MILKGROGDRINKPINEAPPLE')" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 63, 144 | "metadata": { 145 | "slideshow": { 146 | "slide_type": "slide" 147 | } 148 | }, 149 | "outputs": [], 150 | "source": [ 151 | "# Functions can have an indeterminate number of arguments and keyword arguments using * and **\n", 152 | "import math\n", 153 | "\n", 154 | "def vector_magnitude(x, y, *args, **kwargs):\n", 155 | " \n", 156 | " # print(args) # args is a tuple\n", 157 | " # print(kwargs) # kwargs is a dictionary\n", 158 | " \n", 159 | " scale = kwargs.get('scale', 1)\n", 160 | " \n", 161 | " vector = [x,y] + list(args)\n", 162 | " return math.sqrt(sum(v**2 for v in vector)) * scale" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 64, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "9.219544457292887\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "print(vector_magnitude(1, 2, 4, 8, m=2))" 180 | ] 181 | } 182 | ], 183 | "metadata": { 184 | "celltoolbar": "Tags", 185 | "kernelspec": { 186 | "display_name": "Python 3", 187 | "language": "python", 188 | "name": "python3" 189 | }, 190 | "language_info": { 191 | "codemirror_mode": { 192 | "name": "ipython", 193 | "version": 3 194 | }, 195 | "file_extension": ".py", 196 | "mimetype": "text/x-python", 197 | "name": "python", 198 | "nbconvert_exporter": "python", 199 | "pygments_lexer": "ipython3", 200 | "version": "3.6.3" 201 | } 202 | }, 203 | "nbformat": 4, 204 | "nbformat_minor": 2 205 | } 206 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/conditionals.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "## Conditionals" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 65, 17 | "metadata": {}, 18 | "outputs": [ 19 | { 20 | "data": { 21 | "text/plain": [ 22 | "True" 23 | ] 24 | }, 25 | "execution_count": 65, 26 | "metadata": {}, 27 | "output_type": "execute_result" 28 | } 29 | ], 30 | "source": [ 31 | "a = 10\n", 32 | "b = 0\n", 33 | "a > 1" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 66, 39 | "metadata": { 40 | "slideshow": { 41 | "slide_type": "subslide" 42 | } 43 | }, 44 | "outputs": [ 45 | { 46 | "name": "stdout", 47 | "output_type": "stream", 48 | "text": [ 49 | "a is greater than one\n" 50 | ] 51 | } 52 | ], 53 | "source": [ 54 | "if a > 1:\n", 55 | " print(\"a is greater than one\")" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 67, 61 | "metadata": { 62 | "slideshow": { 63 | "slide_type": "subslide" 64 | } 65 | }, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "Bird is the word.\n", 72 | "The word is not girt.\n" 73 | ] 74 | } 75 | ], 76 | "source": [ 77 | "word = 'Bird'\n", 78 | "\n", 79 | "# Note: Double equals for a conditional vs single equals for assignment !\n", 80 | "if word == 'Bird':\n", 81 | " print('Bird is the word.')\n", 82 | " \n", 83 | "if word != 'Girt':\n", 84 | " print('The word is not girt.')" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 68, 90 | "metadata": { 91 | "slideshow": { 92 | "slide_type": "subslide" 93 | } 94 | }, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "'ird' is in Bird.\n", 101 | "'i' is in letters.\n" 102 | ] 103 | } 104 | ], 105 | "source": [ 106 | "if 'ird' in word:\n", 107 | " 
print(\"'ird' is in Bird.\")\n", 108 | " \n", 109 | "letters = ['B', 'i', 'r', 'd']\n", 110 | "if 'i' in letters:\n", 111 | " print(\"'i' is in letters.\")" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": { 117 | "slideshow": { 118 | "slide_type": "subslide" 119 | } 120 | }, 121 | "source": [ 122 | "*Protip*: Long lines can be split across two or more using a backslash ('\\')\n", 123 | "\n", 124 | "This can make your code more readable.\n", 125 | "\n", 126 | "There should be nothing after the backslash, including whitespace.\n", 127 | "\n", 128 | "Try to keep lines shorter than 78 characters for a PEP-8 style bonus." 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 69, 134 | "metadata": { 135 | "slideshow": { 136 | "slide_type": "subslide" 137 | } 138 | }, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "There is no 'I' in team (or TEAM).\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "if 'I' not in 'team' or \\\n", 150 | " 'I' not in 'TEAM':\n", 151 | " print(\"There is no 'I' in team (or TEAM).\")" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 70, 157 | "metadata": { 158 | "slideshow": { 159 | "slide_type": "subslide" 160 | } 161 | }, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "True" 167 | ] 168 | }, 169 | "execution_count": 70, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "# Boolean logic\n", 176 | "# True and True => True\n", 177 | "a > 1 and b <= 0" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 71, 183 | "metadata": {}, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "True" 189 | ] 190 | }, 191 | "execution_count": 71, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "# True or False => True\n", 198 | "a > 1 or b > 1" 199 | ] 200 
| }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 72, 204 | "metadata": { 205 | "slideshow": { 206 | "slide_type": "subslide" 207 | } 208 | }, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "a is less than fifty\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "if a > 100:\n", 220 | " print(\"a is greater than one hundred\")\n", 221 | "elif a > 50:\n", 222 | " print(\"a is greater than fifty but less than one hundred\")\n", 223 | "else:\n", 224 | " print(\"a is less than fifty\")\n", 225 | " \n", 226 | "# For better or worse, there is no case/switch statement in Python - you just use if/elif/elif/else" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 73, 232 | "metadata": { 233 | "slideshow": { 234 | "slide_type": "subslide" 235 | } 236 | }, 237 | "outputs": [ 238 | { 239 | "name": "stdout", 240 | "output_type": "stream", 241 | "text": [ 242 | "A non-zero int is truthy\n", 243 | "The int 0 is 'falsey' ... not False => True !\n", 244 | "A non-empty string, even whitespace, is 'truthy\n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "# Truthyness\n", 250 | "if a:\n", 251 | " print(\"A non-zero int is truthy\")\n", 252 | "\n", 253 | "if not (a - 10):\n", 254 | " print(\"The int 0 is 'falsey' ... 
not False => True !\")\n", 255 | "\n", 256 | "if '' or [] or () or dict():\n", 257 | " print(\"We will never see this since an empty string, list, tuple and dict are all 'falsey'\")\n", 258 | " \n", 259 | "if \" \":\n", 260 | " print(\"A non-empty string, even whitespace, is 'truthy\")" 261 | ] 262 | } 263 | ], 264 | "metadata": { 265 | "celltoolbar": "Tags", 266 | "kernelspec": { 267 | "display_name": "Python 3", 268 | "language": "python", 269 | "name": "python3" 270 | }, 271 | "language_info": { 272 | "codemirror_mode": { 273 | "name": "ipython", 274 | "version": 3 275 | }, 276 | "file_extension": ".py", 277 | "mimetype": "text/x-python", 278 | "name": "python", 279 | "nbconvert_exporter": "python", 280 | "pygments_lexer": "ipython3", 281 | "version": "3.6.3" 282 | } 283 | }, 284 | "nbformat": 4, 285 | "nbformat_minor": 2 286 | } 287 | -------------------------------------------------------------------------------- /workshops/docs/modules/loops.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 12 | 13 | 14 | # Automation with Loops 15 | 16 | 17 | 18 | 19 | 45 | 46 | 47 | 48 | 49 | An example task that we might want to repeat is printing each character in a 50 | word on a line of its own. 51 | 52 | 53 | 54 | 55 | 56 | 57 | ```python 58 | word = 'lead' 59 | ``` 60 | 61 | 62 | 63 | 64 | 65 | We can access a character in a string using its index. For example, we can get the first 66 | character of the word `'lead'`, by using `word[0]`. One way to print each character is to use 67 | four `print` statements: 68 | 69 | 70 | 71 | 72 | 73 | 74 | ```python 75 | print(word[0]) 76 | print(word[1]) 77 | print(word[2]) 78 | print(word[3]) 79 | ``` 80 | 81 |
 82 | 
output
83 | 84 | l 85 | e 86 | a 87 | d 88 | 89 | 90 |
91 | 92 | 93 | 94 | 95 | 96 | While this works, it's a bad approach for two reasons: 97 | 98 | 1. It doesn't scale: 99 | if we want to print the characters in a string that's hundreds of letters long, 100 | we'd be better off just typing them in. 101 | 102 | 2. It's fragile: 103 | if we give it a longer string, 104 | it only prints part of the data, 105 | and if we give it a shorter one, 106 | it produces an error because we're asking for characters that don't exist. 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | Running: 115 | 116 | ```python 117 | word = 'tin' 118 | print(word[0]) 119 | print(word[1]) 120 | print(word[2]) 121 | print(word[3]) 122 | ``` 123 | 124 | 125 | 126 | 127 | 128 | Gives the error: 129 | 130 | ``` 131 | --------------------------------------------------------------------------- 132 | IndexError Traceback (most recent call last) 133 | in () 134 | 3 print(word[1]) 135 | 4 print(word[2]) 136 | ----> 5 print(word[3]) 137 | 138 | IndexError: string index out of range 139 | ``` 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | Here's a better approach: 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | ```python 157 | word = 'lead' 158 | for char in word: 159 | print(char) 160 | ``` 161 | 162 |
163 | 
output
164 | 165 | l 166 | e 167 | a 168 | d 169 | 170 | 171 |
172 | 173 | 174 | 175 | 176 | 177 | This is shorter --- certainly shorter than something that prints every character in a hundred-letter string --- and 178 | more robust as well: 179 | 180 | 181 | 182 | 183 | 184 | 185 | ```python 186 | word = 'oxygen' 187 | for char in word: 188 | print(char) 189 | ``` 190 | 191 |
192 | 
output
193 | 194 | o 195 | x 196 | y 197 | g 198 | e 199 | n 200 | 201 | 202 |
203 | 204 | 205 | 206 | 207 | 208 | The improved version uses a **for loop** to repeat an operation --- in this case, printing --- once for each thing in a sequence. 209 | The general form of a loop is: 210 | 211 | ```python 212 | for variable in collection: 213 | # do things with variable 214 | ``` 215 | 216 | 217 | 218 | 219 | 220 | 221 | Using the oxygen example above, the loop might look like this: 222 | 223 | ![loop_image](images/loops_image.png) 224 | 225 | where each character (`char`) in the variable `word` is looped through and printed one character after another. 226 | The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop). 227 | 228 | We can call the **loop variable** anything we like, 229 | but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop. 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | ## What's in a name? 239 | 240 | 241 | In the example above, the loop variable was given the name `char` as a mnemonic; it is short for 'character'. 242 | We can choose any name we want for variables. We might just as easily have chosen the name `banana` for the loop variable, as long as we use the same name when we invoke the variable inside the loop: 243 | 244 | 245 | 246 | 247 | 248 | 249 | 250 | 251 | ```python 252 | word = 'oxygen' 253 | for banana in word: 254 | print(banana) 255 | ``` 256 | 257 |
258 | 
output
259 | 260 | o 261 | x 262 | y 263 | g 264 | e 265 | n 266 | 267 | 268 |
269 | 270 | 271 | 272 | 273 | 274 | It is a good idea to choose variable names that are meaningful, otherwise it would be more difficult to understand what the loop is doing. 275 | 276 | 277 | Here's another loop that repeatedly updates a variable: 278 | 279 | 280 | 281 | 282 | 283 | 284 | ```python 285 | length = 0 286 | for vowel in 'aeiou': 287 | length = length + 1 288 | print('There are', length, 'vowels') 289 | ``` 290 | 291 |
292 | 
output
293 | 294 | There are 5 vowels 295 | 296 | 297 |
298 | 299 | 300 | 301 | 302 | 303 | It's worth tracing the execution of this little program step by step. 304 | 305 | Since there are five characters in `'aeiou'`, 306 | the statement on line 3 will be executed five times. 307 | 308 | The first time around, 309 | `length` is zero (the value assigned to it on line 1) 310 | and `vowel` is `'a'`. 311 | The statement adds 1 to the old value of `length`, 312 | producing 1, 313 | and updates `length` to refer to that new value. 314 | 315 | The next time around, 316 | `vowel` is `'e'` and `length` is 1, 317 | so `length` is updated to be 2. 318 | 319 | After three more updates, 320 | `length` is 5; 321 | since there is nothing left in `'aeiou'` for Python to process, 322 | the loop finishes 323 | and the `print` statement on line 4 tells us our final answer. 324 | 325 | Note that a loop variable `vowel` is just a variable that's being used to record progress in a loop. 326 | 327 | 328 | 329 | 330 | 331 | ## Challenge - scope of the loop variable 332 | 333 | 1. In the loop over `"aeiou"` above, does the loop variable `vowel` exist after the loop has finished ? 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | ```python 342 | length = 0 343 | for vowel in 'aeiou': 344 | length = length + 1 345 | print('After the loop, `vowel` exists and has the value: ' + vowel) 346 | 347 | # The loop variable `vowel` exists after the loop is completed, not only inside the loop 348 | ``` 349 | 350 |
351 | 
output
352 | 353 | After the loop, `vowel` exists and has the value: u 354 | 355 | 356 |
357 | 358 | 359 | 360 | 361 | 362 | Note also that finding the length of a string is such a common operation that Python actually has a built-in function to do it called `len`: 363 | 364 | 365 | 366 | 367 | 368 | 369 | ```python 370 | print(len('aeiou')) 371 | ``` 372 | 373 |
374 | 
output
375 | 376 | 5 377 | 378 | 379 |
380 | 381 | 382 | 383 | 384 | 385 | `len` is much faster than any function we could write ourselves, 386 | and much easier to read than a two-line loop; 387 | it will also give us the length of many other things that we haven't met yet, 388 | so we should always use it when we can. 389 | 390 | 391 | 392 | 393 | 394 | ## From 1 to N 395 | 396 | Python has a built-in function called `range` that creates a sequence of numbers. `range` can 397 | accept 1, 2, or 3 parameters. 398 | 399 | * If one parameter is given, `range` creates a sequence of that length, 400 | starting at zero and incrementing by 1. 401 | For example, `range(3)` produces the numbers `0, 1, 2`. 402 | * If two parameters are given, `range` starts at 403 | the first and ends just before the second, incrementing by one. 404 | For example, `range(2, 5)` produces `2, 3, 4`. 405 | * If `range` is given 3 parameters, 406 | it starts at the first one, ends just before the second one, and increments by the third one. 407 | For example, `range(3, 10, 2)` produces `3, 5, 7, 9`. 408 | 409 | 410 | 411 | 412 | 413 | 414 | 415 | ## Challenge - loop over a range 416 | Using `range`, 417 | write a loop that prints the first 3 natural numbers: 418 | 419 | ``` 420 | 1 421 | 2 422 | 3 423 | ``` 424 | 425 | 426 | 427 | 428 | 429 | 430 | 431 | 434 | 435 | 436 | 437 | 454 | 455 | 456 | 457 | 458 | ## Computing Powers With Loops 459 | 460 | Exponentiation is built into Python: 461 | 462 | 463 | 464 | 465 | 466 | 467 | ```python 468 | print(5 ** 3) 469 | ``` 470 | 471 |
472 | 
output
473 | 474 | 125 475 | 476 | 477 |
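Looking back at `range` from the *From 1 to N* section: a quick way to see the numbers it produces is to wrap it in `list`. This is only a sketch; the variable names `one_arg`, `two_args` and `three_args` are ours and not part of the lesson.

```python
# list() asks range for all of its values at once so that we can look at them
one_arg = list(range(3))            # one parameter: start at 0, stop before 3
two_args = list(range(2, 5))        # two parameters: start at 2, stop before 5
three_args = list(range(3, 10, 2))  # three parameters: start at 3, stop before 10, step by 2

print(one_arg)
print(two_args)
print(three_args)
```

Without `list`, printing a `range` in Python 3 shows something like `range(0, 3)` rather than the numbers themselves.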
478 | 479 | 480 | 481 | 482 | 483 | ## Challenge - multiplication in a loop 484 | 485 | Write a loop that calculates the same result as `5 ** 3` using 486 | multiplication (and without exponentiation). 487 | 488 | 489 | 490 | 491 | 494 | 495 | 496 | 497 | 514 | 515 | 516 | 517 | 518 | ## Bonus challenge: reverse a string 519 | 520 | Knowing that two strings can be concatenated using the `+` operator, 521 | write a loop that takes a string 522 | and produces a new string with the characters in reverse order, 523 | so `'Newton'` becomes `'notweN'`. 524 | 525 | 526 | 527 | 528 | 531 | 532 | 533 | 534 | 552 | 553 | 554 | 555 | 556 | ## Enumerate 557 | 558 | The built-in function `enumerate` takes a sequence (e.g. a list) and generates a 559 | new sequence of the same length. Each element of the new sequence is a pair composed of the index 560 | (0, 1, 2,...) and the value from the original sequence: 561 | 562 | ``` 563 | for i, x in enumerate(xs): 564 | # Do something with i and x 565 | ``` 566 | 567 | 568 | The code above loops through `xs`, assigning the index to `i` and the value to `x`. 569 | 570 | 571 | 572 | 573 | 574 | ## Bonus challenge: enumeration for computing the value of a polynomial 575 | 576 | Suppose you have encoded a polynomial as a list of coefficients in 577 | the following way: the first element is the constant term, the 578 | second element is the coefficient of the linear term, the third is the 579 | coefficient of the quadratic term, etc. 580 | 581 | ``` 582 | x = 5 583 | cc = [2, 4, 3] 584 | ``` 585 | 586 | 587 | ``` 588 | y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2 589 | y = 97 590 | ``` 591 | 592 | 593 | Write a loop using `enumerate(cc)` which computes the value `y` of any 594 | polynomial, given `x` and `cc`. 
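Before tackling the challenge, it may help to see what `enumerate` hands to the loop on each pass. The sketch below uses a made-up list (`elements`) and made-up names (`position`, `name`, `pairs`); none of these come from the lesson, and it deliberately does not solve the polynomial.

```python
elements = ['hydrogen', 'helium', 'lithium']

pairs = []  # collect what the loop sees, so we can inspect it afterwards
for position, name in enumerate(elements):
    # position counts up from 0; name is the matching item from the list
    print(position, name)
    pairs.append((position, name))
```

Each pass through the loop receives one index and one value, which is the same pattern the polynomial challenge needs for the exponent and the coefficient.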
595 | 596 | 597 | 598 | 599 | 602 | 603 | 604 | 605 | 627 | 628 | 629 | 630 | 631 | 632 | ```python 633 | 634 | ``` 635 | 636 | 637 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/slicing_and_list_comprehensions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "## Slicing lists" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 41, 17 | "metadata": {}, 18 | "outputs": [ 19 | { 20 | "data": { 21 | "text/plain": [ 22 | "[2, 4, 6]" 23 | ] 24 | }, 25 | "execution_count": 41, 26 | "metadata": {}, 27 | "output_type": "execute_result" 28 | } 29 | ], 30 | "source": [ 31 | "numbers = [2, 4, 6, 8, 10, 12]\n", 32 | "\n", 33 | "# list[start:end]\n", 34 | "# start is inclusive, end isn't\n", 35 | "\n", 36 | "numbers[0:3]" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 42, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "[10, 12]" 48 | ] 49 | }, 50 | "execution_count": 42, 51 | "metadata": {}, 52 | "output_type": "execute_result" 53 | } 54 | ], 55 | "source": [ 56 | "numbers[4:7]" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 43, 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "data": { 66 | "text/plain": [ 67 | "[2, 4, 6]" 68 | ] 69 | }, 70 | "execution_count": 43, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "numbers[:3] # omitting start implies 0 (the very start)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 44, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/plain": [ 87 | "[8, 10, 12]" 88 | ] 89 | }, 90 | "execution_count": 44, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 
95 | "source": [ 96 | "numbers[3:] # omitting end means to the very end eg len(numbers)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 45, 102 | "metadata": { 103 | "slideshow": { 104 | "slide_type": "subslide" 105 | } 106 | }, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/plain": [ 111 | "[12]" 112 | ] 113 | }, 114 | "execution_count": 45, 115 | "metadata": {}, 116 | "output_type": "execute_result" 117 | } 118 | ], 119 | "source": [ 120 | "numbers[-1:] # negative values reverse direction" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 46, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "data": { 130 | "text/plain": [ 131 | "[2, 4, 6, 8, 10]" 132 | ] 133 | }, 134 | "execution_count": 46, 135 | "metadata": {}, 136 | "output_type": "execute_result" 137 | } 138 | ], 139 | "source": [ 140 | "numbers[:-1]" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 47, 146 | "metadata": { 147 | "slideshow": { 148 | "slide_type": "subslide" 149 | } 150 | }, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "[2, 6, 10]" 156 | ] 157 | }, 158 | "execution_count": 47, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "# you can also specify a step size\n", 165 | "# list[start:end:step]\n", 166 | "\n", 167 | "numbers[0:6:2]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 48, 173 | "metadata": { 174 | "slideshow": { 175 | "slide_type": "subslide" 176 | } 177 | }, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "[2, 4, 6, 8, 10, 12]" 183 | ] 184 | }, 185 | "execution_count": 48, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "# [:] is a shorthand for copying a list.\n", 192 | "# Equivalent to:\n", 193 | "# n_copy = list(numbers)\n", 194 | "\n", 195 | "n_copy = numbers[:]\n", 196 | "n_copy" 197 | ] 198 | }, 199 | 
{ 200 | "cell_type": "code", 201 | "execution_count": 49, 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "[2, 4, 6, 8, 10, 12]" 208 | ] 209 | }, 210 | "execution_count": 49, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "n_copy[3] = 8\n", 217 | "n_copy" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 50, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "data": { 227 | "text/plain": [ 228 | "[2, 4, 6, 8, 10, 12]" 229 | ] 230 | }, 231 | "execution_count": 50, 232 | "metadata": {}, 233 | "output_type": "execute_result" 234 | } 235 | ], 236 | "source": [ 237 | "numbers" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": { 243 | "slideshow": { 244 | "slide_type": "slide" 245 | }, 246 | "tags": [ 247 | "challenge" 248 | ] 249 | }, 250 | "source": [ 251 | "### Challenge 1\n", 252 | "\n", 253 | "Given the list: `['banana', 'cherry', 'strawberry', 'orange']`\n", 254 | "\n", 255 | "Return a list of just the red fruits." 
256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": { 261 | "tags": [ 262 | "solution" 263 | ] 264 | }, 265 | "source": [ 266 | "### Solution" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 51, 272 | "metadata": { 273 | "slideshow": { 274 | "slide_type": "slide" 275 | }, 276 | "tags": [ 277 | "solution" 278 | ] 279 | }, 280 | "outputs": [ 281 | { 282 | "data": { 283 | "text/plain": [ 284 | "['cherry', 'strawberry']" 285 | ] 286 | }, 287 | "execution_count": 51, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "fruits = ['banana', 'cherry', 'strawberry', 'orange']\n", 294 | "red_ones = fruits[1:3]\n", 295 | "red_ones" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": { 301 | "slideshow": { 302 | "slide_type": "slide" 303 | } 304 | }, 305 | "source": [ 306 | "## Loops" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": { 312 | "slideshow": { 313 | "slide_type": "subslide" 314 | } 315 | }, 316 | "source": [ 317 | "A `for` loop works on a sequence types, generators and iterators\n", 318 | "\n", 319 | "(this includes lists, tuples, strings and dictionaries)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 74, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "name": "stdout", 329 | "output_type": "stream", 330 | "text": [ 331 | "A\n", 332 | "B\n", 333 | "C\n", 334 | "D\n", 335 | ".\n", 336 | ".\n", 337 | "m\n", 338 | "e\n", 339 | "h\n" 340 | ] 341 | } 342 | ], 343 | "source": [ 344 | "for letter in \"ABCD..meh\":\n", 345 | " print(letter)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 75, 351 | "metadata": { 352 | "slideshow": { 353 | "slide_type": "subslide" 354 | } 355 | }, 356 | "outputs": [ 357 | { 358 | "name": "stdout", 359 | "output_type": "stream", 360 | "text": [ 361 | "('Z', 99)\n", 362 | "('Y', 98)\n", 363 | "('X', 97)\n", 364 | "Z 99\n", 365 | "Y 
98\n", 366 | "X 97\n" 367 | ] 368 | } 369 | ], 370 | "source": [ 371 | "ts = [('Z', 99), ('Y', 98), ('X', 97)]\n", 372 | "\n", 373 | "for t in ts:\n", 374 | " print(t)\n", 375 | " \n", 376 | "# using tuple unpacking\n", 377 | "for m, n in ts:\n", 378 | " print(m, n)" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 76, 384 | "metadata": { 385 | "slideshow": { 386 | "slide_type": "subslide" 387 | } 388 | }, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "('A', 1)\n", 395 | "('B', 2)\n", 396 | "('C', 3)\n" 397 | ] 398 | } 399 | ], 400 | "source": [ 401 | "# for on dictionary.items()\n", 402 | "d = {'A': 1, 'B': 2, 'C': 3}\n", 403 | "\n", 404 | "for item in d.items():\n", 405 | " # print(type(item))\n", 406 | " print(item)" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 77, 412 | "metadata": { 413 | "slideshow": { 414 | "slide_type": "subslide" 415 | } 416 | }, 417 | "outputs": [ 418 | { 419 | "name": "stdout", 420 | "output_type": "stream", 421 | "text": [ 422 | "A 1\n", 423 | "B 2\n", 424 | "C 3\n" 425 | ] 426 | } 427 | ], 428 | "source": [ 429 | "for k, v in d.items():\n", 430 | " print(k, v)" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": { 436 | "slideshow": { 437 | "slide_type": "subslide" 438 | } 439 | }, 440 | "source": [ 441 | "`while` loops keep looping while their condition is true:\n", 442 | "\n", 443 | "```\n", 444 | "while some_condition:\n", 445 | " do_stuff()\n", 446 | "```\n", 447 | "\n", 448 | "Note: If the condition for your `while` loops never becomes `False`, the loop will run forever (in Jupyter you can do *Kernel -> Interrupt* to break out of the infinite loop)." 
449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": 78, 454 | "metadata": {}, 455 | "outputs": [ 456 | { 457 | "name": "stdout", 458 | "output_type": "stream", 459 | "text": [ 460 | "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 " 461 | ] 462 | } 463 | ], 464 | "source": [ 465 | "a = 0\n", 466 | "while a < 16:\n", 467 | " print(a, end=' ')\n", 468 | " a += 1" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": { 474 | "slideshow": { 475 | "slide_type": "subslide" 476 | } 477 | }, 478 | "source": [ 479 | "`break` immediately exits a loop\n", 480 | "\n", 481 | "`continue` immediately starts the next iteration of the loop\n", 482 | "\n", 483 | "Any code inside the loop after a `break` or `continue` is skipped." 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 79, 489 | "metadata": {}, 490 | "outputs": [ 491 | { 492 | "name": "stdout", 493 | "output_type": "stream", 494 | "text": [ 495 | "2 4 6 8 10 12 14 16 " 496 | ] 497 | } 498 | ], 499 | "source": [ 500 | "a = 0\n", 501 | "while True:\n", 502 | " a += 1\n", 503 | " \n", 504 | " if a > 16:\n", 505 | " break\n", 506 | " print('We will never see this.')\n", 507 | " \n", 508 | " if a % 2:\n", 509 | " continue\n", 510 | " print('We will also never see this.')\n", 511 | " \n", 512 | " print(a, end=' ')" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": { 518 | "slideshow": { 519 | "slide_type": "slide" 520 | } 521 | }, 522 | "source": [ 523 | "## List comprehensions\n", 524 | "\n", 525 | "List comprehensions are a shorthand way to loop over a list, modify the items and create a new list." 
526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 80, 531 | "metadata": { 532 | "slideshow": { 533 | "slide_type": "subslide" 534 | } 535 | }, 536 | "outputs": [ 537 | { 538 | "data": { 539 | "text/plain": [ 540 | "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]" 541 | ] 542 | }, 543 | "execution_count": 80, 544 | "metadata": {}, 545 | "output_type": "execute_result" 546 | } 547 | ], 548 | "source": [ 549 | "# Instead of doing\n", 550 | "new_list = []\n", 551 | "for i in range(0,11):\n", 552 | " new_list.append(i**2)\n", 553 | "\n", 554 | "new_list" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 81, 560 | "metadata": {}, 561 | "outputs": [ 562 | { 563 | "data": { 564 | "text/plain": [ 565 | "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]" 566 | ] 567 | }, 568 | "execution_count": 81, 569 | "metadata": {}, 570 | "output_type": "execute_result" 571 | } 572 | ], 573 | "source": [ 574 | "# Use a list comprehension instead\n", 575 | "new_list = [i**2 for i in range(0,11)]\n", 576 | "new_list" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": 82, 582 | "metadata": { 583 | "slideshow": { 584 | "slide_type": "subslide" 585 | } 586 | }, 587 | "outputs": [ 588 | { 589 | "data": { 590 | "text/plain": [ 591 | "[0, 1, 4, 9]" 592 | ] 593 | }, 594 | "execution_count": 82, 595 | "metadata": {}, 596 | "output_type": "execute_result" 597 | } 598 | ], 599 | "source": [ 600 | "# You can also `filter` values using an if statement inside the list comprehension\n", 601 | "new_list = [i**2 for i in range(0,11) if i < 4]\n", 602 | "new_list" 603 | ] 604 | } 605 | ], 606 | "metadata": { 607 | "celltoolbar": "Tags", 608 | "kernelspec": { 609 | "display_name": "Python 3", 610 | "language": "python", 611 | "name": "python3" 612 | }, 613 | "language_info": { 614 | "codemirror_mode": { 615 | "name": "ipython", 616 | "version": 3 617 | }, 618 | "file_extension": ".py", 619 | "mimetype": "text/x-python", 620 | "name": 
"python", 621 | "nbconvert_exporter": "python", 622 | "pygments_lexer": "ipython3", 623 | "version": "3.6.3" 624 | } 625 | }, 626 | "nbformat": 4, 627 | "nbformat_minor": 2 628 | } 629 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/wip/basics_data_carpentry.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# The Basics of Python\n", 8 | "\n", 9 | "Python is a general purpose programming language that supports rapid development\n", 10 | "of scripts and applications.\n", 11 | "\n", 12 | "Python's main advantages:\n", 13 | "\n", 14 | "* Open Source software, supported by Python Software Foundation\n", 15 | "* Available on all major platforms (ie. Windows, Linux and MacOS) \n", 16 | "* It is a general-purpose programming language, designed for readability\n", 17 | "* Supports multiple programming paradigms ('functional', 'object oriented')\n", 18 | "* Very large community with a rich ecosystem of third-party packages" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Interpreter\n", 26 | "\n", 27 | "Python is an interpreted language which can be used in two ways:\n", 28 | "\n", 29 | "* \"Interactive\" Mode: It functions like an \"advanced calculator\" Executing\n", 30 | " one command at a time:\n", 31 | "\n" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "user:host:~$ python\n", 41 | "Python 3.5.1 (default, Oct 23 2015, 18:05:06)\n", 42 | "[GCC 4.8.3] on linux2\n", 43 | "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n", 44 | ">>> 2 + 2\n", 45 | "4\n", 46 | ">>> print(\"Hello World\")\n", 47 | "Hello World\n" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": 
[ 54 | "\n", 55 | "* \"Scripting\" Mode: Executing a series of \"commands\" saved in text file,\n", 56 | " usually with a `.py` extension after the name of your file:\n", 57 | "\n", 58 | "```bash\n", 59 | "user:host:~$ python my_script.py\n", 60 | "Hello World\n", 61 | "```\n", 62 | "\n", 63 | "\n" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "## Introduction to Python built-in data types\n", 71 | "\n", 72 | "### Strings, integers and floats\n", 73 | "\n", 74 | "One of the most basic things we can do in Python is assign values to variables:\n", 75 | "\n" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "text = \"Data Fluency\" # An example of a string\n", 85 | "number = 42 # An example of an integer\n", 86 | "pi_value = 3.1415 # An example of a float\n" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "\n", 94 | "Here we've assigned data to the variables `text`, `number` and `pi_value`,\n", 95 | "using the assignment operator `=`. To review the value of a variable, we\n", 96 | "can type the name of the variable into the Jupyter notebook and press **Shift** and **Enter**:\n", 97 | "\n" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "text\n", 107 | "## Which Returns\n", 108 | "\"Data Fluency\"\n" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "\n", 116 | "Everything in Python has a type. 
To get the type of something, we can pass it\n", 117 | "to the built-in function `type`:\n", 118 | "\n" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "type(text)\n", 128 | " str\n", 129 | "type(number)\n", 130 | " int\n", 131 | "type(6.02)\n", 132 | " float\n" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "\n", 140 | "The variable `text` is of type `str`, short for \"string\". Strings hold\n", 141 | "sequences of characters, which can be letters, numbers, punctuation\n", 142 | "or more exotic forms of text (even emoji!).\n", 143 | "\n", 144 | "We can also see the value of something using another built-in function, `print`:\n", 145 | "\n" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "\n", 155 | "print(text)\n", 156 | "Data Fluency\n", 157 | "\n", 158 | "print(11)\n", 159 | "11\n" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "\n", 167 | "This may seem redundant, but in fact it's the only way to display output in a script:\n", 168 | "\n", 169 | "*example.py*\n" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "# A Python script file\n", 179 | "# Comments in Python start with #\n", 180 | "# The next line assigns the string \"Data Fluency\" to the variable \"text\".\n", 181 | "text = \"Data Fluency\"\n", 182 | "# The next line evaluates \"text\" but displays nothing when run as a script!\n", 183 | "text\n", 184 | "# The next line uses the print function to print out the value we assigned to \"text\"\n", 185 | "print(text)\n" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "*Running the script*\n", 193 | "```bash\n", 194 | "$ python example.py\n", 
195 | "Data Fluency\n", 196 | "```\n", 197 | "\n", 198 | "Notice that \"Data Fluency\" is printed only once. \n", 199 | "\n", 200 | "**Tip**: `print` and `type` are built-in functions in Python. Later in this\n", 201 | "lesson, we will introduce methods and user-defined functions. The Python\n", 202 | "documentation is excellent for reference on the differences between them.\n", 203 | "\n" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "help(print)\n" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "\n", 220 | "Will give the output\n", 221 | "\n", 222 | "```\n", 223 | "Help on built-in function print in module builtins:\n", 224 | "\n", 225 | "print(...)\n", 226 | " print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n", 227 | " \n", 228 | " Prints the values to a stream, or to sys.stdout by default.\n", 229 | " Optional keyword arguments:\n", 230 | " file: a file-like object (stream); defaults to the current sys.stdout.\n", 231 | " sep: string inserted between values, default a space.\n", 232 | " end: string appended after the last value, default a newline.\n", 233 | " flush: whether to forcibly flush the stream.\n", 234 | "```\n", 235 | "\n", 236 | "### Operators\n", 237 | "\n", 238 | "We can perform mathematical calculations in Python using the basic operators\n", 239 | " `+, -, /, *, %`:\n", 240 | "\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | ">>> 2 + 2 # Addition\n", 250 | "4\n", 251 | ">>> 6 * 7 # Multiplication\n", 252 | "42\n", 253 | ">>> 2 ** 16 # Power\n", 254 | "65536\n", 255 | ">>> 13 % 5 # Modulo\n", 256 | "3\n" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "\n", 264 | "We can also use comparison and logic operators:\n", 
265 | "`<, >, ==, !=, <=, >=` for comparison and the keywords\n", 266 | "`and, or, not` for logic. Comparisons return a data type\n", 267 | "called a _boolean_.\n", 268 | "\n", 269 | "\n" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | ">>> 3 > 4\n", 279 | "False\n", 280 | ">>> True and True\n", 281 | "True\n", 282 | ">>> True or False\n", 283 | "True\n" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "\n" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "## Sequential types: Lists and Tuples\n", 298 | "\n", 299 | "### Lists\n", 300 | "\n", 301 | "**Lists** are a common data structure to hold an ordered sequence of\n", 302 | "elements. Each element can be accessed by an index. Note that Python\n", 303 | "indexes start with 0 instead of 1:\n", 304 | "\n" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | ">>> numbers = [1, 2, 3]\n", 314 | ">>> numbers[0]\n", 315 | "1\n" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "\n", 323 | "A `for` loop can be used to access the elements in a list or other Python data structure one at a time. We will learn about loops in another lesson.\n", 324 | "\n" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | ">>> for num in numbers:\n", 334 | "... print(num)\n", 335 | "...\n", 336 | "1\n", 337 | "2\n", 338 | "3\n" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "\n", 346 | "**Indentation** is very important in Python. Note that the second line in the\n", 347 | "example above is indented. 
Just like three chevrons `>>>` indicate an\n", 348 | "interactive prompt in Python, the three dots `...` are Python's prompt for\n", 349 | "multiple lines. This is Python's way of marking a block of code. [Note: you\n", 350 | "do not type `>>>` or `...`.]\n", 351 | "\n", 352 | "To add elements to the end of a list, we can use the `append` method. Methods\n", 353 | "are a way to interact with an object (a list, for example). We can invoke a\n", 354 | "method using the dot `.` followed by the method name and a list of arguments\n", 355 | "in parentheses. Let's look at an example using `append`:\n", 356 | "\n" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | ">>> numbers.append(4)\n", 366 | ">>> print(numbers)\n", 367 | "[1, 2, 3, 4]\n", 368 | ">>>\n" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "\n", 376 | "To find out what methods are available for an\n", 377 | "object, we can use the built-in `help` command:\n", 378 | "\n" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [ 387 | "help(numbers)\n", 388 | "\n", 389 | "Help on list object:\n", 390 | "\n", 391 | "class list(object)\n", 392 | " | list() -> new empty list\n", 393 | " | list(iterable) -> new list initialized from iterable's items\n", 394 | " ...\n" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": {}, 400 | "source": [ 401 | "\n", 402 | "### Tuples\n", 403 | "\n", 404 | "A tuple is similar to a list in that it's an ordered sequence of elements.\n", 405 | "However, tuples can not be changed once created (they are \"immutable\"). 
Tuples\n", 406 | "are created by placing comma-separated values inside parentheses `()`.\n", 407 | "\n" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "# Tuples use parentheses\n", 417 | "a_tuple= (1, 2, 3)\n", 418 | "another_tuple = ('blue', 'green', 'red')\n", 419 | "# Note: lists use square brackets\n", 420 | "a_list = [1, 2, 3]\n" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "\n" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "## Challenge - Tuples\n", 435 | "1. What happens when you type `a_tuple[2]=5` vs `a_list[1]=5` ?\n", 436 | "2. Type `type(a_tuple)` into python - what is the object type?\n", 437 | "\n", 438 | "\n", 439 | "\n" 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "## Dictionaries\n", 447 | "\n", 448 | "A **dictionary** is a container that holds pairs of objects - keys and values.\n", 449 | "\n" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": {}, 456 | "outputs": [], 457 | "source": [ 458 | ">>> translation = {'one': 1, 'two': 2}\n", 459 | ">>> translation['one']\n", 460 | "1\n" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "Dictionaries work a lot like lists - except that you index them with *keys*.\n", 468 | "You can think about a key as a name for or a unique identifier for a set of values\n", 469 | "in the dictionary. Keys can only have particular types - they have to be\n", 470 | "\"hashable\". 
Strings and numeric types are acceptable, but lists aren't.\n", 471 | "\n" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [ 480 | ">>> rev = {1: 'one', 2: 'two'}\n", 481 | ">>> rev[1]\n", 482 | "'one'\n", 483 | ">>> bad = {[1, 2, 3]: 3}\n", 484 | "Traceback (most recent call last):\n", 485 | " File \"\", line 1, in \n", 486 | "TypeError: unhashable type: 'list'\n" 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "\n", 494 | "In Python, a \"Traceback\" is a multi-line error report printed out for the\n", 495 | "user.\n", 496 | "\n", 497 | "To add an item to the dictionary we assign a value to a new key:\n", 498 | "\n" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | ">>> rev = {1: 'one', 2: 'two'}\n", 508 | ">>> rev[3] = 'three'\n", 509 | ">>> rev\n", 510 | "{1: 'one', 2: 'two', 3: 'three'}\n" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "\n" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "## Challenge - Can you do reassignment in a dictionary? \n", 525 | "1. First check what `rev` is right now (remember `rev` is the name of our dictionary). Type: **rev**\n", 526 | "\n", 527 | "2. Try to reassign the second value (in the *key value pair*) so that it no longer reads \"two\" but instead reads \"apple-sauce\". \n", 528 | "\n", 529 | "3. Now display `rev` again to see if it has changed. \n", 530 | "\n", 531 | "\n", 532 | "\n", 533 | "It is important to note that since Python 3.7 dictionaries remember the\n", 534 | "sequence of their items (i.e. the order in which key:value pairs were\n", 535 | "added to the dictionary). 
In older versions of Python this order was not\n", 536 | "guaranteed, so items returned from loops over dictionaries could appear\n", 537 | "in an arbitrary order.\n", 538 | "\n", 539 | "\n" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [] 548 | } 549 | ], 550 | "metadata": { 551 | "celltoolbar": "Tags", 552 | "kernelspec": { 553 | "display_name": "Python 3", 554 | "language": "python", 555 | "name": "python3" 556 | }, 557 | "language_info": { 558 | "codemirror_mode": { 559 | "name": "ipython", 560 | "version": 3 561 | }, 562 | "file_extension": ".py", 563 | "mimetype": "text/x-python", 564 | "name": "python", 565 | "nbconvert_exporter": "python", 566 | "pygments_lexer": "ipython3", 567 | "version": "3.6.3" 568 | } 569 | }, 570 | "nbformat": 4, 571 | "nbformat_minor": 2 572 | } 573 | -------------------------------------------------------------------------------- /workshops/docs/modules/defensive_programming.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 12 | 13 | 35 | 36 | 37 | 38 | 39 | ## Defensive Programming 40 | 41 | 42 | Our previous lessons have introduced the basic tools of programming: variables and lists, file operations, data visualisation, loops, conditionals, and functions. What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it. 43 | 44 | To achieve that, we need to: 45 | 46 | - Write programs that check their own operation. 47 | - Write and run tests for widely-used functions. 48 | - Make sure we know what “correct” actually means. 49 | 50 | The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry, the kind done with lumber, the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes. 
51 | 52 | 53 | 54 | 55 | 56 | ## Assertions 57 | 58 | The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python raises an `AssertionError`, which halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn’t positive: 59 | 60 | 61 | 62 | 63 | 64 | ``` python 65 | numbers = [1.5, 2.3, 0.7, -0.001, 4.4] 66 | total = 0.0 67 | for n in numbers: 68 | assert n > 0.0, 'Data should only contain positive values' 69 | total += n 70 | print('total is:', total) 71 | 72 | ``` 73 | 74 | 75 | 76 | 77 | 78 | ```python 79 | --------------------------------------------------------------------------- 80 | AssertionError Traceback (most recent call last) 81 | in () 82 | 3 total = 0.0 83 | 4 for n in numbers: 84 | ----> 5 assert n > 0.0, 'Data should only contain positive values' 85 | 6 total += n 86 | 7 print('total is:', total) 87 | 88 | AssertionError: Data should only contain positive values 89 | 90 | ``` 91 | 92 | 93 | 94 | 95 | 96 | Programs like the Firefox browser are full of assertions: 10–20% of the code they contain is there to check that the other 80–90% is working correctly. Broadly speaking, assertions fall into three categories: 97 | 98 | A `precondition` is something that must be true at the start of a function in order for it to work correctly. 99 | 100 | A `postcondition` is something that the function guarantees is true when it finishes. 101 | 102 | An `invariant` is something that is always true at a particular point inside a piece of code. 
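
As a minimal sketch of how the three categories can sit together in one function, consider the helper below. The function `mean_of_positives` and its messages are hypothetical, invented here only for illustration:

```python
def mean_of_positives(values):
    '''Return the mean of a non-empty list of positive numbers.
    Hypothetical helper, used only to illustrate the three kinds of assertion.'''
    # Precondition: must be true when the function starts.
    assert len(values) > 0, 'Need at least one value'

    total = 0.0
    for v in values:
        # Precondition on each item of the input.
        assert v > 0.0, 'Values must be positive'
        total += v
        # Invariant: always true at this point inside the loop.
        assert total > 0.0, 'Running total should stay positive'

    result = total / len(values)
    # Postcondition: something the function guarantees when it finishes.
    assert result > 0.0, 'Mean of positive values must be positive'
    return result


print(mean_of_positives([1.0, 2.0, 3.0]))
```

Calling the helper with an empty list, or with a list containing a value that is not positive, trips one of the preconditions before any arithmetic is done.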
103 | 104 | For example, suppose we are representing rectangles using a `tuple` of four coordinates `(x0, y0, x1, y1)`, representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense: 105 | 106 | 107 | 108 | 109 | 110 | 111 | ```python 112 | def normalize_rectangle(rect): 113 | '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis. 114 | Input should be of the format (x0, y0, x1, y1). 115 | (x0, y0) and (x1, y1) define the lower left and upper right corners 116 | of the rectangle, respectively.''' 117 | assert len(rect) == 4, 'Rectangles must contain 4 coordinates' 118 | x0, y0, x1, y1 = rect 119 | assert x0 < x1, 'Invalid X coordinates' 120 | assert y0 < y1, 'Invalid Y coordinates' 121 | 122 | dx = x1 - x0 123 | dy = y1 - y0 124 | if dx > dy: 125 | scaled = float(dx) / dy 126 | upper_x, upper_y = 1.0, scaled 127 | else: 128 | scaled = float(dx) / dy 129 | upper_x, upper_y = scaled, 1.0 130 | 131 | assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid' 132 | assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid' 133 | 134 | return (0, 0, upper_x, upper_y) 135 | ``` 136 | 137 | 138 | 139 | 140 | 141 | The preconditions on lines 6, 8, and 9 of the function catch invalid inputs: 142 | 143 | 144 | 145 | 146 | 147 | ``` python 148 | print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate 149 | 150 | ``` 151 | 152 | 153 | 154 | 155 | 156 | ``` python 157 | --------------------------------------------------------------------------- 158 | AssertionError Traceback (most recent call last) 159 | in () 160 | ----> 1 print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate 161 | 162 | in normalize_rectangle(rect) 163 | 4 (x0, 
y0) and (x1, y1) define the lower left and upper right corners 164 | 5 of the rectangle, respectively.''' 165 | ----> 6 assert len(rect) == 4, 'Rectangles must contain 4 coordinates' 166 | 7 x0, y0, x1, y1 = rect 167 | 8 assert x0 < x1, 'Invalid X coordinates' 168 | 169 | AssertionError: Rectangles must contain 4 coordinates 170 | 171 | ``` 172 | 173 | 174 | 175 | 176 | 177 | ```python 178 | print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted 179 | ``` 180 | 181 | 182 | 183 | 184 | 185 | ```python 186 | --------------------------------------------------------------------------- 187 | AssertionError Traceback (most recent call last) 188 | in () 189 | ----> 1 print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted 190 | 191 | in normalize_rectangle(rect) 192 | 6 assert len(rect) == 4, 'Rectangles must contain 4 coordinates' 193 | 7 x0, y0, x1, y1 = rect 194 | ----> 8 assert x0 < x1, 'Invalid X coordinates' 195 | 9 assert y0 < y1, 'Invalid Y coordinates' 196 | 10 197 | 198 | AssertionError: Invalid X coordinates 199 | 200 | ``` 201 | 202 | 203 | 204 | 205 | 206 | The postconditions on lines 20 and 21 of the function help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide, everything seems OK: 207 | 208 | 209 | 210 | 211 | 212 | 213 | ```python 214 | print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) )) 215 | ``` 216 | 217 |
218 | 
output
219 | 220 | (0, 0, 0.2, 1.0) 221 | 222 | 223 |
224 | 225 | 226 | 227 | 228 | 229 | but if we normalize one that’s wider than it is tall, the assertion is triggered: 230 | 231 | 232 | 233 | 234 | 235 | ```python 236 | print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) )) 237 | ``` 238 | 239 | 240 | 241 | 242 | 243 | ``` python 244 | --------------------------------------------------------------------------- 245 | AssertionError Traceback (most recent call last) 246 | in () 247 | ----> 1 print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) )) 248 | 249 | in normalize_rectangle(rect) 250 | 19 251 | 20 assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid' 252 | ---> 21 assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid' 253 | 22 254 | 23 return (0, 0, upper_x, upper_y) 255 | 256 | AssertionError: Calculated upper Y coordinate invalid 257 | 258 | ``` 259 | 260 | 261 | 262 | 263 | 264 | Re-reading our function, we realize that line 14 of the function (the `if dx > dy` branch) should divide `dy` by `dx` rather than `dx` by `dy`. If we had left out the assertion at the end of the function, we would have created and returned something that had the shape of a valid answer, but wasn’t one. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion. 265 | 266 | But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing. 267 | 268 | Most good programmers follow two rules when adding assertions to their code. The first is, fail early, fail often. The greater the distance between when and where an error occurs and when it’s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible. 269 | 270 | The second rule is, turn bugs into assertions or tests. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. 
If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven’t regressed (i.e., haven’t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky. 271 | 272 | 273 | 274 | 275 | 276 | 277 | 278 | ### Test-Driven Development 279 | 280 | An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include: 281 | 282 | 283 | 284 | 285 | 286 | ![test diagram](images/testing.svg) 287 | 288 | 289 | 290 | 291 | 292 | Most novice programmers would solve this problem like this: 293 | 294 | 1. Write a function `range_overlap`. 295 | 2. Call it interactively on two or three different inputs. 296 | 3. If it produces the wrong answer, fix the function and re-run that test. 297 | 298 | This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way: 299 | 300 | 1. Write a short function for each test. 301 | 2. Write a `range_overlap` function that should pass those tests. 302 | 3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions. 303 | 304 | Writing the tests before writing the function they exercise is called `test-driven development` (TDD). Its advocates believe it produces better code faster because: 305 | 306 | 1. 
If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors. 307 | 2. Writing tests helps programmers figure out what the function is actually supposed to do. 308 | 309 | Here are three test functions for `range_overlap`: 310 | 311 | 312 | 313 | 314 | 315 | ``` python 316 | assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) 317 | assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0) 318 | assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0) 319 | ``` 320 | 321 | 322 | 323 | 324 | 325 | ```python 326 | --------------------------------------------------------------------------- 327 | NameError Traceback (most recent call last) 328 | in () 329 | ----> 1 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) 330 | 2 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0) 331 | 3 assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0) 332 | 333 | NameError: name 'range_overlap' is not defined 334 | ``` 335 | 336 | 337 | 338 | 339 | 340 | 341 | The error is actually reassuring: we haven’t written `range_overlap` yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function. 342 | 343 | And as a bonus of writing these tests, we’ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output. 344 | 345 | Something important is missing, though. We don’t have any tests for the case where the ranges don’t overlap at all: 346 | 347 | 348 | 349 | 350 | 351 | ```python 352 | assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ??? 353 | ``` 354 | 355 | 356 | 357 | 358 | 359 | What should `range_overlap` do in this case: fail with an error message, produce a special value like `(0.0, 0.0)` to signal that there’s no overlap, or *something* else? 
Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is best *before* we’re emotionally invested in whatever we happened to write before we realized there was an issue. 360 | 361 | And what about this case? 362 | 363 | 364 | 365 | 366 | 367 | ```python 368 | assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ??? 369 | ``` 370 | 371 | 372 | 373 | 374 | 375 | Do two segments that touch at their endpoints overlap or not? Mathematicians usually say “yes”, but engineers usually say “no”. The best answer is “whatever is most useful in the rest of our program”, but again, any actual implementation of `range_overlap` is going to do *something*, and whatever it is ought to be consistent with what it does when there’s no overlap at all. 376 | 377 | Since we’re planning to use the range this function returns as the X axis in a time series chart, we decide that: 378 | 379 | 1. every overlap has to have non-zero width, and 380 | 2. we will return the special value `None` when there’s no overlap. 381 | 382 | `None` is built into Python, and means “nothing here”. (Other languages often call the equivalent value `null` or `nil`).
With that decision made, we can finish writing our last two tests: 383 | 384 | 385 | 386 | 387 | 388 | ```python 389 | assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None 390 | assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None 391 | ``` 392 | 393 | 394 | 395 | 396 | 397 | ```python 398 | --------------------------------------------------------------------------- 399 | NameError Traceback (most recent call last) 400 | in () 401 | ----> 1 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None 402 | 2 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None 403 | 404 | NameError: name 'range_overlap' is not defined 405 | ``` 406 | 407 | 408 | 409 | 410 | 411 | Again, we get an error because we haven’t written our function, but we’re now ready to do so: 412 | 413 | 414 | 415 | 416 | 417 | 418 | 419 | ```python 420 | def range_overlap(ranges): 421 | '''Return common overlap among a set of [low, high] ranges.''' 422 | lowest = 0.0 423 | highest = 1.0 424 | for (low, high) in ranges: 425 | lowest = max(lowest, low) 426 | highest = min(highest, high) 427 | return (lowest, highest) 428 | ``` 429 | 430 | 431 | 432 | 433 | 434 | (Take a moment to think about why we use `max` to raise `lowest` and `min` to lower `highest`). We’d now like to re-run our tests, but they’re scattered across three different cells. 
To make running them easier, let’s put them all in a function: 435 | 436 | 437 | 438 | 439 | 440 | 441 | ```python 442 | def test_range_overlap(): 443 | assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None 444 | assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None 445 | assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) 446 | assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0) 447 | assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0) 448 | ``` 449 | 450 | 451 | 452 | 453 | 454 | We can now test `range_overlap` with a single function call: 455 | 456 | 457 | 458 | 459 | 460 | ```python 461 | test_range_overlap() 462 | ``` 463 | 464 | 465 | 466 | 467 | 468 | ```python 469 | --------------------------------------------------------------------------- 470 | AssertionError Traceback (most recent call last) 471 | in () 472 | ----> 1 test_range_overlap() 473 | 474 | in test_range_overlap() 475 | 1 def test_range_overlap(): 476 | ----> 2 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None 477 | 3 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None 478 | 4 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) 479 | 5 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0) 480 | 481 | AssertionError: 482 | ``` 483 | 484 | 485 | 486 | 487 | 488 | The first test that was supposed to produce `None` fails, so we know something is wrong with our function. We don’t know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: always initialize from data. 489 | 490 | 491 | 492 | 493 | 494 | Fix `range_overlap`. Re-run `test_range_overlap` after each change you make. 
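One possible fix is sketched below. It is only an illustration of the "always initialize from data" rule, not the lesson's official solution, and it assumes that `ranges` is a non-empty list of `(low, high)` pairs (an empty list would raise an `IndexError`).

```python
def range_overlap(ranges):
    '''Return common overlap among a set of (low, high) ranges, or None.'''
    # Initialize from the data: start with the first range
    # instead of the fixed constants 0.0 and 1.0.
    lowest, highest = ranges[0]
    for (low, high) in ranges[1:]:
        lowest = max(lowest, low)
        highest = min(highest, high)
    # An empty or zero-width overlap counts as "no overlap".
    if lowest >= highest:
        return None
    return (lowest, highest)
```

With this version, ranges that merely touch at an endpoint produce `None`, which matches the decision above that every overlap must have non-zero width.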
495 | 496 | 497 | 498 | 499 | 519 | 520 | 521 | 522 | 523 | ## Key points 524 | 525 | - Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do. 526 | 527 | - Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work. 528 | 529 | - Use preconditions to check that the inputs to a function are safe to use. 530 | 531 | - Use postconditions to check that the output from a function is safe to use. 532 | 533 | - Write tests before writing code in order to help determine exactly what that code is supposed to do. 534 | 535 | 536 | 537 | 538 | 539 | 540 | ```python 541 | 542 | ``` 543 | 544 | 545 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/loops.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Automation with Loops" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "tags": [ 14 | "solution" 15 | ] 16 | }, 17 | "source": [ 18 | "## Instructor notes\n", 19 | "\n", 20 | "*Estimated teaching time:* 30 min\n", 21 | "\n", 22 | "*Estimated challenge time:* 0 min\n", 23 | "\n", 24 | "*Key questions:*\n", 25 | "\n", 26 | " - \"How can I do the same operations on many different values?\"\"\n", 27 | " \n", 28 | "*Learning objectives:*\n", 29 | "\n", 30 | " - \"Explain what a `for` loop does.\"\n", 31 | " - \"Correctly write `for` loops to repeat simple calculations.\"\n", 32 | " - \"Trace changes to a loop variable as the loop runs.\"\n", 33 | " - \"Trace changes to other variables as they are updated by a `for` loop.\"\n", 34 | "\n", 35 | "*Key points:*\n", 36 | "\n", 37 | " - \"Use `for variable in sequence` to process the elements of a sequence one at a time.\"\n", 38 | " - \"The body of a `for` loop must be indented.\"\n", 39 
| " - \"Use `len(thing)` to determine the length of something that contains other values.\"\n", 40 | "\n", 41 | "---" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "An example task that we might want to repeat is printing each character in a\n", 49 | "word on a line of its own." 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 17, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "word = 'lead'" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "We can access a character in a string using its index. For example, we can get the first\n", 66 | "character of the word `'lead'`, by using `word[0]`. One way to print each character is to use\n", 67 | "four `print` statements:" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 18, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "l\n", 80 | "e\n", 81 | "a\n", 82 | "d\n" 83 | ] 84 | } 85 | ], 86 | "source": [ 87 | "print(word[0])\n", 88 | "print(word[1])\n", 89 | "print(word[2])\n", 90 | "print(word[3])" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "While this works, it's a bad approach for two reasons:\n", 98 | "\n", 99 | "1. It doesn't scale:\n", 100 | " if we want to print the characters in a string that's hundreds of letters long,\n", 101 | " we'd be better off just typing them in.\n", 102 | "\n", 103 | "2. 
It's fragile:\n", 104 | " if we give it a longer string,\n", 105 | " it only prints part of the data,\n", 106 | " and if we give it a shorter one,\n", 107 | " it produces an error because we're asking for characters that don't exist.\n", 108 | "\n" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "Running:\n", 116 | "\n", 117 | "```python\n", 118 | "word = 'tin'\n", 119 | "print(word[0])\n", 120 | "print(word[1])\n", 121 | "print(word[2])\n", 122 | "print(word[3])\n", 123 | "```" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "Gives the error:\n", 131 | "\n", 132 | "```\n", 133 | "---------------------------------------------------------------------------\n", 134 | "IndexError Traceback (most recent call last)\n", 135 | " in ()\n", 136 | " 3 print(word[1])\n", 137 | " 4 print(word[2])\n", 138 | "----> 5 print(word[3])\n", 139 | "\n", 140 | "IndexError: string index out of range\n", 141 | "```" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "\n", 149 | "\n", 150 | "Here's a better approach:\n", 151 | "\n" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 19, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "l\n", 164 | "e\n", 165 | "a\n", 166 | "d\n" 167 | ] 168 | } 169 | ], 170 | "source": [ 171 | "word = 'lead'\n", 172 | "for char in word:\n", 173 | " print(char)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "This is shorter --- certainly shorter than something that prints every character in a hundred-letter string --- and\n", 181 | "more robust as well:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 20, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | 
"text": [ 193 | "o\n", 194 | "x\n", 195 | "y\n", 196 | "g\n", 197 | "e\n", 198 | "n\n" 199 | ] 200 | } 201 | ], 202 | "source": [ 203 | "word = 'oxygen'\n", 204 | "for char in word:\n", 205 | " print(char)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "The improved version uses a **for loop** to repeat an operation --- in this case, printing --- once for each thing in a sequence.\n", 213 | "The general form of a loop is:\n", 214 | "\n", 215 | "```python\n", 216 | "for variable in collection:\n", 217 | " # do things with variable\n", 218 | "```" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "\n", 226 | "Using the oxygen example above, the loop might look like this:\n", 227 | "\n", 228 | "![loop_image](images/loops_image.png)\n", 229 | "\n", 230 | "where each character (`char`) in the variable `word` is looped through and printed one character after another.\n", 231 | "The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).\n", 232 | "\n", 233 | "We can call the **loop variable** anything we like,\n", 234 | "but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.\n", 235 | "\n", 236 | "\n" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "## What's in a name?\n", 244 | "\n", 245 | "\n", 246 | "In the example above, the loop variable was given the name `char` as a mnemonic; it is short for 'character'. \n", 247 | "We can choose any name we want for variables. 
We might just as easily have chosen the name `banana` for the loop variable, as long as we use the same name when we invoke the variable inside the loop:\n", 248 | "\n" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 21, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "name": "stdout", 258 | "output_type": "stream", 259 | "text": [ 260 | "o\n", 261 | "x\n", 262 | "y\n", 263 | "g\n", 264 | "e\n", 265 | "n\n" 266 | ] 267 | } 268 | ], 269 | "source": [ 270 | "word = 'oxygen'\n", 271 | "for banana in word:\n", 272 | " print(banana)" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "It is a good idea to choose variable names that are meaningful, otherwise it would be more difficult to understand what the loop is doing.\n", 280 | "\n", 281 | "\n", 282 | "Here's another loop that repeatedly updates a variable:" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 22, 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "name": "stdout", 292 | "output_type": "stream", 293 | "text": [ 294 | "There are 5 vowels\n" 295 | ] 296 | } 297 | ], 298 | "source": [ 299 | "length = 0\n", 300 | "for vowel in 'aeiou':\n", 301 | " length = length + 1\n", 302 | "print('There are', length, 'vowels')" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "It's worth tracing the execution of this little program step by step.\n", 310 | "\n", 311 | "Since there are five characters in `'aeiou'`,\n", 312 | "the statement on line 3 will be executed five times.\n", 313 | "\n", 314 | "The first time around,\n", 315 | "`length` is zero (the value assigned to it on line 1)\n", 316 | "and `vowel` is `'a'`.\n", 317 | "The statement adds 1 to the old value of `length`,\n", 318 | "producing 1,\n", 319 | "and updates `length` to refer to that new value.\n", 320 | "\n", 321 | "The next time around,\n", 322 | "`vowel` is `'e'` and `length` is 
1,\n", 323 | "so `length` is updated to be 2.\n", 324 | "\n", 325 | "After three more updates,\n", 326 | "`length` is 5;\n", 327 | "since there is nothing left in `'aeiou'` for Python to process,\n", 328 | "the loop finishes\n", 329 | "and the `print` statement on line 4 tells us our final answer.\n", 330 | "\n", 331 | "Note that a loop variable `vowel` is just a variable that's being used to record progress in a loop." 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "## Challenge - scope of the loop variable\n", 339 | "\n", 340 | "1. In the loop over `\"aeiou\"` above, does the loop variable `vowel` exist after the loop has finished ?\n" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 23, 346 | "metadata": {}, 347 | "outputs": [ 348 | { 349 | "name": "stdout", 350 | "output_type": "stream", 351 | "text": [ 352 | "After the loop, `vowel` exists and has the value: u\n" 353 | ] 354 | } 355 | ], 356 | "source": [ 357 | "length = 0\n", 358 | "for vowel in 'aeiou':\n", 359 | " length = length + 1\n", 360 | "print('After the loop, `vowel` exists and has the value: ' + vowel)\n", 361 | "\n", 362 | "# The loop variable `vowel` exists after the loop is completed, not only inside the loop" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "Note also that finding the length of a string is such a common operation that Python actually has a built-in function to do it called `len`:" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 24, 375 | "metadata": {}, 376 | "outputs": [ 377 | { 378 | "name": "stdout", 379 | "output_type": "stream", 380 | "text": [ 381 | "5\n" 382 | ] 383 | } 384 | ], 385 | "source": [ 386 | "print(len('aeiou'))" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "`len` is much faster than any function we could write ourselves,\n", 394 | "and much 
easier to read than a two-line loop;\n", 395 | "it will also give us the length of many other things that we haven't met yet,\n", 396 | "so we should always use it when we can." 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "## From 1 to N\n", 404 | "\n", 405 | "Python has a built-in function called `range` that creates a sequence of numbers. `range` can\n", 406 | "accept 1, 2, or 3 parameters.\n", 407 | "\n", 408 | "* If one parameter is given, `range` creates a sequence of that length,\n", 409 | " starting at zero and incrementing by 1.\n", 410 | " For example, `range(3)` produces the numbers `0, 1, 2`.\n", 411 | "* If two parameters are given, `range` starts at\n", 412 | " the first and ends just before the second, incrementing by one.\n", 413 | " For example, `range(2, 5)` produces `2, 3, 4`.\n", 414 | "* If `range` is given 3 parameters,\n", 415 | " it starts at the first one, ends just before the second one, and increments by the third one.\n", 416 | " For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.\n", 417 | "\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": { 423 | "tags": [ 424 | "challenge" 425 | ] 426 | }, 427 | "source": [ 428 | "## Challenge - loop over a range\n", 429 | "Using `range`,\n", 430 | "write a loop that prints the first 3 natural numbers:\n", 431 | "\n", 432 | "```\n", 433 | "1\n", 434 | "2\n", 435 | "3\n", 436 | "```\n", 437 | "\n", 438 | "\n" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": { 444 | "tags": [ 445 | "solution" 446 | ] 447 | }, 448 | "source": [ 449 | "## Solution" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 25, 455 | "metadata": { 456 | "tags": [ 457 | "solution" 458 | ] 459 | }, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "1\n", 466 | "2\n", 467 | "3\n" 468 | ] 469 | } 470 | ], 471 | "source": [ 472 | "for
i in range(1, 4):\n", 473 | " print(i)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "## Computing Powers With Loops\n", 481 | "\n", 482 | "Exponentiation is built into Python:" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 26, 488 | "metadata": {}, 489 | "outputs": [ 490 | { 491 | "name": "stdout", 492 | "output_type": "stream", 493 | "text": [ 494 | "125\n" 495 | ] 496 | } 497 | ], 498 | "source": [ 499 | "print(5 ** 3)" 500 | ] 501 | }, 502 | { 503 | "cell_type": "markdown", 504 | "metadata": { 505 | "tags": [ 506 | "challenge" 507 | ] 508 | }, 509 | "source": [ 510 | "## Challenge - multiplication in a loop\n", 511 | "\n", 512 | "Write a loop that calculates the same result as `5 ** 3` using\n", 513 | "multiplication (and without exponentiation)." 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": { 519 | "tags": [ 520 | "solution" 521 | ] 522 | }, 523 | "source": [ 524 | "## Solution" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 27, 530 | "metadata": { 531 | "tags": [ 532 | "solution" 533 | ] 534 | }, 535 | "outputs": [ 536 | { 537 | "name": "stdout", 538 | "output_type": "stream", 539 | "text": [ 540 | "125\n" 541 | ] 542 | } 543 | ], 544 | "source": [ 545 | "result = 1\n", 546 | "for i in range(0, 3):\n", 547 | " result = result * 5\n", 548 | "print(result)" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": { 554 | "tags": [ 555 | "challenge" 556 | ] 557 | }, 558 | "source": [ 559 | "## Bonus challenge: reverse a string\n", 560 | "\n", 561 | "Knowing that two strings can be concatenated using the `+` operator,\n", 562 | "write a loop that takes a string\n", 563 | "and produces a new string with the characters in reverse order,\n", 564 | "so `'Newton'` becomes `'notweN'`." 
565 | ] 566 | }, 567 | { 568 | "cell_type": "markdown", 569 | "metadata": { 570 | "tags": [ 571 | "solution" 572 | ] 573 | }, 574 | "source": [ 575 | "## Solution" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 28, 581 | "metadata": { 582 | "tags": [ 583 | "solution" 584 | ] 585 | }, 586 | "outputs": [ 587 | { 588 | "name": "stdout", 589 | "output_type": "stream", 590 | "text": [ 591 | "notweN\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "newstring = ''\n", 597 | "oldstring = 'Newton'\n", 598 | "for char in oldstring:\n", 599 | " newstring = char + newstring\n", 600 | "print(newstring)" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "## Enumerate\n", 608 | "\n", 609 | "The built-in function `enumerate` takes a sequence (e.g. a list) and generates a\n", 610 | "new sequence of the same length. Each element of the new sequence is a pair composed of the index\n", 611 | "(0, 1, 2,...) and the value from the original sequence:\n", 612 | "\n", 613 | "```\n", 614 | "for i, x in enumerate(xs):\n", 615 | " # Do something with i and x\n", 616 | "```\n", 617 | "\n", 618 | "\n", 619 | "The code above loops through `xs`, assigning the index to `i` and the value to `x`." 
620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": { 625 | "tags": [ 626 | "challenge" 627 | ] 628 | }, 629 | "source": [ 630 | "## Bonus challenge: enumeration for computing the value of a polynomial\n", 631 | "\n", 632 | "Suppose you have encoded a polynomial as a list of coefficients in\n", 633 | "the following way: the first element is the constant term, the\n", 634 | "second element is the coefficient of the linear term, the third is the\n", 635 | "coefficient of the quadratic term, etc.\n", 636 | "\n", 637 | "```\n", 638 | "x = 5\n", 639 | "cc = [2, 4, 3]\n", 640 | "```\n", 641 | "\n", 642 | "\n", 643 | "```\n", 644 | "y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2\n", 645 | "y = 97\n", 646 | "```\n", 647 | "\n", 648 | "\n", 649 | "Write a loop using `enumerate(cc)` which computes the value `y` of any\n", 650 | "polynomial, given `x` and `cc`." 651 | ] 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": { 656 | "tags": [ 657 | "solution" 658 | ] 659 | }, 660 | "source": [ 661 | "## Solution" 662 | ] 663 | }, 664 | { 665 | "cell_type": "code", 666 | "execution_count": 29, 667 | "metadata": { 668 | "tags": [ 669 | "solution" 670 | ] 671 | }, 672 | "outputs": [ 673 | { 674 | "name": "stdout", 675 | "output_type": "stream", 676 | "text": [ 677 | "97\n" 678 | ] 679 | } 680 | ], 681 | "source": [ 682 | "x = 5\n", 683 | "cc = [2, 4, 3]\n", 684 | "y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2\n", 685 | "\n", 686 | "y = 0\n", 687 | "for i, c in enumerate(cc):\n", 688 | " y = y + x**i * c\n", 689 | " \n", 690 | "print(y)" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": null, 696 | "metadata": {}, 697 | "outputs": [], 698 | "source": [] 699 | } 700 | ], 701 | "metadata": { 702 | "celltoolbar": "Tags", 703 | "kernelspec": { 704 | "display_name": "Python 3", 705 | "language": "python", 706 | "name": "python3" 707 | }, 708 | "language_info": { 709 | "codemirror_mode": { 710 | "name": "ipython", 711 | 
"version": 3 712 | }, 713 | "file_extension": ".py", 714 | "mimetype": "text/x-python", 715 | "name": "python", 716 | "nbconvert_exporter": "python", 717 | "pygments_lexer": "ipython3", 718 | "version": "3.6.6" 719 | } 720 | }, 721 | "nbformat": 4, 722 | "nbformat_minor": 2 723 | } 724 | -------------------------------------------------------------------------------- /workshops/docs/modules/plotting_with_ggplot.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 12 | 13 | 14 | # Making Plots With plotnine (aka ggplot) 15 | 16 | 17 | 18 | 19 | 44 | 45 | 46 | 47 | 48 | ## Introduction 49 | 50 | Python has a number of powerful plotting libraries to choose from. One of the oldest and most popular is [`matplotlib`](https://matplotlib.org/); it forms the foundation for many other Python plotting libraries. For this exercise we are going to use [`plotnine`](https://plotnine.readthedocs.io/en/stable/), which is a Python implementation of [The Grammar of Graphics](http://link.springer.com/book/10.1007%2F0-387-28695-0), inspired by the interface of the [`ggplot2`](http://ggplot2.org/) package from R. `plotnine` (and its R cousin `ggplot2`) is a very nice way to create publication-quality plots. 51 | 52 | #### The Grammar of Graphics 53 | 54 | > Statistical graphics is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars) 55 | 56 | > Faceting can be used to generate the same plot for different subsets of the dataset 57 | 58 | These are the basic building blocks according to the grammar of graphics: 59 | 60 | - **data** The data, plus a set of aesthetic mappings that describe how variables are mapped to visual properties. 61 | - **geom** Geometric objects, which represent what you actually see on the plot: points, lines, polygons, etc. 62 | - **stats** Statistical transformations, which summarise data in many useful ways.
63 | - **scale** The scales map values in the data space to values in an aesthetic space. 64 | - **coord** A coordinate system describes how data coordinates are mapped to the plane of the graphic. 65 | - **facet** A faceting specification describes how to break up the data into subsets and plot each subset separately. 66 | 67 | Let's explore these in detail. 68 | 69 | 70 | 71 | 72 | 73 | First, install the `pandas` and `plotnine` packages to ensure they are available. 74 | 75 | 76 | 77 | 78 | 79 | 80 | ```python 81 | !pip install pandas plotnine 82 | ``` 83 | 84 |
 85 | 
output
86 | 87 | Requirement already satisfied: pandas in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (0.25.0) 88 | Requirement already satisfied: plotnine in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (0.5.1) 89 | Requirement already satisfied: numpy>=1.13.3 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (1.17.0) 90 | Requirement already satisfied: python-dateutil>=2.6.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (2.8.0) 91 | Requirement already satisfied: pytz>=2017.2 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (2019.1) 92 | Requirement already satisfied: descartes>=1.1.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (1.1.0) 93 | Requirement already satisfied: scipy>=1.0.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (1.3.0) 94 | Requirement already satisfied: patsy>=0.4.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.5.1) 95 | Requirement already satisfied: matplotlib>=3.0.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (3.1.1) 96 | Requirement already satisfied: statsmodels>=0.8.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.10.1) 97 | Requirement already satisfied: mizani>=0.5.2 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.5.4) 98 | Requirement already satisfied: six>=1.5 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0) 99 | Requirement already satisfied: cycler>=0.10 in 
/Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (0.10.0) 100 | Requirement already satisfied: kiwisolver>=1.0.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (1.1.0) 101 | Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (2.4.1.1) 102 | Requirement already satisfied: palettable in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from mizani>=0.5.2->plotnine) (3.2.0) 103 | Requirement already satisfied: setuptools in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib>=3.0.0->plotnine) (39.1.0) 104 | 105 | 106 |
```python
# Suppress the deprecation warnings that plotnine emits, to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
```

# Plotting in ggplot style

Let's set up our working environment by importing the necessary libraries and loading our CSV file into a data frame called `survs_df`:

```python
import numpy as np
import pandas as pd
from plotnine import *

%matplotlib inline
survs_df = pd.read_csv('surveys.csv').dropna()
```

To produce a plot with the `ggplot` class from `plotnine`, we must provide three things:

1. A data frame containing our data.
2. How the columns of the data frame can be translated into positions, colors, sizes, and shapes of graphical elements ("aesthetics").
3. The actual graphical elements to display ("geometric objects").

## Introduction to plotting

```python
ggplot(survs_df, aes(x='weight', y='hindfoot_length')) + geom_point()
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_10_0.png)
Let's see if we can also include information about species and year. We start with year, which we map to the size of each point.

```python
ggplot(survs_df, aes(x='weight', y='hindfoot_length',
    size = 'year')) + geom_point()
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_12_0.png)
Note that the `x=` and `y=` keywords are optional: `aes()` treats its first and second positional arguments as `x` and `y`, so `aes('weight', 'hindfoot_length')` means the same thing. Next, we also map `species_id` to color.

```python
ggplot(survs_df, aes(x='weight', y='hindfoot_length',
    size = 'year', color = 'species_id')) + geom_point()
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_14_0.png)
We can also make a simple count plot, for example to see how many observations (data points) we have for each year:

```python
ggplot(survs_df, aes(x='year')) + \
    geom_bar(stat = 'count')
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_16_0.png)
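To make explicit what `stat = 'count'` does, here is a minimal sketch in plain Python. It assumes a small made-up list of years (the values are illustrative and are not taken from `surveys.csv`). The stat tallies how many rows share each `x` value, and `geom_bar` then draws one bar per tally.

```python
from collections import Counter

# Hypothetical survey years, standing in for the `year` column of survs_df
years = [1977, 1977, 1978, 1978, 1978, 1979]

# stat='count' boils down to tallying rows per distinct x value
counts = Counter(years)

# One line per bar: the x position and the bar height
for year in sorted(counts):
    print(year, counts[year])
```

With a real data frame, the same tally could be obtained from the `year` column directly; the plain-Python version is only meant to show the idea.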
Let's now also color by species, to see how many observations we have per species in a given year:

```python
ggplot(survs_df, aes(x='year', fill = 'species_id')) + \
    geom_bar(stat = 'count')
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_18_0.png)
## Challenges

1. Produce a plot comparing the number of observations for each species at each site. The plot should have `site_id` on the x axis, ideally as categorical data. (HINT: You can convert a column in a DataFrame `df` to the 'category' type using: `df['some_col_name'] = df['some_col_name'].astype('category')`)

2. Create a **boxplot** of `hindfoot_length` across different species (`species_id` column). (HINT: There's a list of _geoms_ available for `plotnine` in the [docs](https://plotnine.readthedocs.io/en/stable/api.html#geoms) - instead of `geom_bar`, which one should you use?)

## More geom types

```python
ggplot(survs_df, aes(x='year', y='weight')) + \
    geom_boxplot()
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_24_0.png)
Why are we not seeing multiple boxplots, one for each year?
This is because the `year` variable is continuous in our data frame, but for this purpose we want it to be categorical.

```python
survs_df['year_fact'] = survs_df['year'].astype("category")

ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot()
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_26_0.png)
You'll notice that the x-axis labels overlap. To rotate them by 90 degrees we can apply a `theme`, so they look less cluttered. We will revisit themes later.

```python
ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot() + \
    theme(axis_text_x = element_text(angle=90, hjust=1))
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_28_0.png)
To save some typing, let's define this x-axis label rotating theme as a short variable name that we can reuse:

```python
flip_xlabels = theme(axis_text_x = element_text(angle=90, hjust=1))
```

```python
ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_violin() + \
    flip_xlabels
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_31_0.png)
To save an image for later:

```python
plt1 = ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot() + \
    xlab("Years") + \
    ylab("Weight (g)") + \
    ggtitle("Boxplots, summary of species weight in each year")

ggsave(filename="plot1.png",
       plot=plt1,
       device='png',
       dpi=300,
       height=25,
       width=25)
```

## Challenges

1. Can you log2 transform `weight` and plot a "normalised" boxplot? Hint: use the `np.log2()` function and name the new column `weight_log`.

2. Does a log2 transform make this data visualisation better?

## Faceting

`ggplot` supports a technique called *faceting*, which splits one plot
into multiple panels based on a factor included in the dataset. We will use it to
make one panel per sex, and then one panel per species.

```python
ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot() + \
    facet_wrap(['sex']) + \
    flip_xlabels + \
    theme(axis_text_x = element_text(size=6))
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_38_0.png)
```python
ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot() + \
    theme(axis_text_x = element_text(size=4)) + \
    facet_wrap(['species_id']) + \
    flip_xlabels
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_39_0.png)
The two faceted plots above are probably easier to interpret using the `weight_log` column we created - give it a try!

## The "Layered Grammar of Graphics"

```erlang
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>,
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

## Theming

`plotnine` lets you apply pre-defined 'themes' that control the overall look of a plot.

A list of the available themes you may want to experiment with is here: https://plotnine.readthedocs.io/en/stable/api.html#themes

```python
ggplot(survs_df, aes(x='year_fact', y='weight')) + \
    geom_boxplot() + \
    theme_bw() + \
    flip_xlabels
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_43_0.png)
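The next plot uses the `weight_log` column, which is assumed to come from the log2 challenge earlier in this module. As a reminder of what that transform does, here is a minimal sketch on a few made-up weights (the values are illustrative only): each doubling of the weight adds 1 on the log2 scale, which pulls large values closer together.

```python
import math

# Hypothetical weights in grams; powers of two keep the arithmetic easy to follow
weights = [8.0, 64.0, 256.0]

# log2 compresses the spread: 8 -> 3, 64 -> 6, 256 -> 8
log_weights = [math.log2(w) for w in weights]

print(log_weights)
```

On the real data frame, the challenge hint suggests applying `np.log2()` to the whole `weight` column instead of looping.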
```python
ggplot(survs_df, aes(x='year_fact', y='weight_log')) + \
    geom_boxplot() + \
    facet_wrap(['species_id']) + \
    theme_xkcd() + \
    theme(axis_text_x = element_text(size=4, angle=90, hjust=1))
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_44_0.png)
## Extra bits 1

Let's try binning years into decades. This is crude, but it might give simpler plots to look at.

```python
bins = [(survs_df['year'] < 1980),
        (survs_df['year'] < 1990),
        (survs_df['year'] < 2000),
        (survs_df['year'] >= 2000)]

labels = ['70s', '80s', '90s', 'Z']

survs_df['year_bins'] = np.select(bins, labels)
```

```python
plt2 = ggplot(survs_df, aes(x='year_bins', y='weight_log')) + \
    geom_boxplot()
plt2
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_47_0.png)
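The binning above relies on `np.select` taking, for each row, the label of the first condition that is true, which is why the overlapping thresholds (`< 1980`, `< 1990`, `< 2000`) still give exactly one decade per row. A rough plain-Python equivalent is sketched below. The helper name `decade_label` is hypothetical, and the thresholds and labels are assumed to be the same as in the cell above.

```python
def decade_label(year):
    """Return the decade label for a year, checking thresholds in order.

    Hypothetical helper that mirrors the first-match behaviour of
    np.select with the conditions and labels used above.
    """
    if year < 1980:
        return '70s'
    elif year < 1990:
        return '80s'
    elif year < 2000:
        return '90s'
    return 'Z'


# Made-up years, one per bin
labels_for_years = [decade_label(y) for y in [1977, 1985, 1999, 2002]]
print(labels_for_years)
```

The order of the conditions matters: if `< 2000` were checked first, every year before 2000 would be labelled `'90s'`.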
```python
plt2 = ggplot(survs_df, aes(x='year_bins', y='weight_log')) + \
    geom_boxplot() + \
    flip_xlabels + \
    facet_wrap(['species_id'])
plt2
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_48_0.png)
## Extra bits 2

Here is a different way to look at your data: `stat_summary` plots one summary value per year (here the mean or the median), together with a range running from the minimum to the maximum.

```python
ggplot(survs_df, aes("year_fact", "weight")) + \
    stat_summary(fun_y = np.mean, fun_ymin=np.min, fun_ymax=np.max) + \
    theme(axis_text_x = element_text(angle=90, hjust=1))

ggplot(survs_df, aes("year_fact", "weight")) + \
    stat_summary(fun_y = np.median, fun_ymin=np.min, fun_ymax=np.max) + \
    theme(axis_text_x = element_text(angle=90, hjust=1))

ggplot(survs_df, aes("year_fact", "weight_log")) + \
    stat_summary(fun_y = np.mean, fun_ymin=np.min, fun_ymax=np.max) + \
    theme(axis_text_x = element_text(angle=90, hjust=1))
```

![png](plotting_with_ggplot_files/plotting_with_ggplot_50_0.png)
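To see which numbers sit behind such a summary plot, here is a minimal sketch in plain Python, assuming a handful of made-up `(year, weight)` records rather than the real survey data. For each year it computes the minimum, mean, and maximum, which correspond to the `fun_ymin`, `fun_y`, and `fun_ymax` arguments in the first call above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, weight) records standing in for rows of survs_df
records = [(1977, 40.0), (1977, 48.0), (1978, 30.0), (1978, 50.0), (1978, 46.0)]

# Collect the weights observed in each year
groups = defaultdict(list)
for year, weight in records:
    groups[year].append(weight)

# One (min, mean, max) triple per year: the point and range drawn for that year
summary = {year: (min(ws), mean(ws), max(ws)) for year, ws in groups.items()}

for year in sorted(summary):
    print(year, summary[year])
```

Swapping `mean` for `statistics.median` gives the values behind the second call.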
900 | 901 | 902 | 903 | 904 | 905 | 906 | 907 | 908 | ```python 909 | 910 | ``` 911 | 912 | 913 | -------------------------------------------------------------------------------- /workshops/docs/modules/notebooks/defensive_programming.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "tags": [ 7 | "solution" 8 | ] 9 | }, 10 | "source": [ 11 | "## Defensive Programming\n", 12 | "*Estimated teaching time:* 30 min\n", 13 | "\n", 14 | "*Estimated challenge time:* 0 min\n", 15 | "\n", 16 | "\n", 17 | "## Module information\n", 18 | "\n", 19 | "*Key questions:*\n", 20 | "\n", 21 | " - \"How can I make my programs more reliable?\"\n", 22 | " \n", 23 | "*Learning objectives:*\n", 24 | "\n", 25 | " - Explain what an assertion is.\n", 26 | " - Add assertions that check the program's state is correct. \n", 27 | " - Correctly add precondition and postcondition assertions to functions.\n", 28 | " - Explain what test-driven development is, and use it when creating new functions.\n", 29 | " - Explain why variables should be initialized using actual data values rather than arbitrary constants.\n", 30 | "---" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## Defensive Programming\n", 38 | "\n", 39 | "\n", 40 | "Our previous lessons have introduced the basic tools of programming: variables and lists, file operations, data visualisation, loops, conditionals, and functions. 
What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it.\n", 41 | "\n", 42 | "To achieve that, we need to:\n", 43 | "\n", 44 | " - Write programs that check their own operation.\n", 45 | " - Write and run tests for widely-used functions.\n", 46 | " - Make sure we know what “correct” actually means.\n", 47 | " \n", 48 | "The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry — the kind done with lumber — the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes." 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "## Assertions\n", 56 | "\n", 57 | "The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. 
For example, this piece of code halts as soon as the loop encounters a value that isn’t positive:" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "``` python\n", 65 | "numbers = [1.5, 2.3, 0.7, -0.001, 4.4]\n", 66 | "total = 0.0\n", 67 | "for n in numbers:\n", 68 | " assert n > 0.0, 'Data should only contain positive values'\n", 69 | " total += n\n", 70 | "print('total is:', total)\n", 71 | "\n", 72 | "```" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "```python\n", 80 | "---------------------------------------------------------------------------\n", 81 | "AssertionError Traceback (most recent call last)\n", 82 | " in ()\n", 83 | " 3 total = 0.0\n", 84 | " 4 for n in numbers:\n", 85 | "----> 5 assert n > 0.0, 'Data should only contain positive values'\n", 86 | " 6 total += n\n", 87 | " 7 print('total is:', total)\n", 88 | "\n", 89 | "AssertionError: Data should only contain positive values\n", 90 | "\n", 91 | "```" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "Programs like the Firefox browser are full of assertions: 10–20% of the code they contain is there to check that the other 80–90% is working correctly. Broadly speaking, assertions fall into three categories:\n", 99 | "\n", 100 | "A `precondition` is something that must be true at the start of a function in order for it to work correctly.\n", 101 | "\n", 102 | "A `postcondition` is something that the function guarantees is true when it finishes.\n", 103 | "\n", 104 | "An `invariant` is something that is always true at a particular point inside a piece of code.\n", 105 | "\n", 106 | "For example, suppose we are representing rectangles using a `tuple` of four coordinates `(x0, y0, x1, y1)`, representing the lower left and upper right corners of the rectangle. 
In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 2, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "def normalize_rectangle(rect):\n", 116 | " '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.\n", 117 | " Input should be of the format (x0, y0, x1, y1).\n", 118 | " (x0, y0) and (x1, y1) define the lower left and upper right corners\n", 119 | " of the rectangle, respectively.'''\n", 120 | " assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n", 121 | " x0, y0, x1, y1 = rect\n", 122 | " assert x0 < x1, 'Invalid X coordinates'\n", 123 | " assert y0 < y1, 'Invalid Y coordinates'\n", 124 | "\n", 125 | " dx = x1 - x0\n", 126 | " dy = y1 - y0\n", 127 | " if dx > dy:\n", 128 | " scaled = float(dx) / dy\n", 129 | " upper_x, upper_y = 1.0, scaled\n", 130 | " else:\n", 131 | " scaled = float(dx) / dy\n", 132 | " upper_x, upper_y = scaled, 1.0\n", 133 | "\n", 134 | " assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'\n", 135 | " assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'\n", 136 | "\n", 137 | " return (0, 0, upper_x, upper_y)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "The preconditions on lines 6, 8, and 9 catch invalid inputs:" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "``` python\n", 152 | "print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate\n", 153 | "\n", 154 | "```" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "``` python\n", 162 | 
"---------------------------------------------------------------------------\n", 163 | "AssertionError Traceback (most recent call last)\n", 164 | " in ()\n", 165 | "----> 1 print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate\n", 166 | "\n", 167 | " in normalize_rectangle(rect)\n", 168 | " 4 (x0, y0) and (x1, y1) define the lower left and upper right corners\n", 169 | " 5 of the rectangle, respectively.'''\n", 170 | "----> 6 assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n", 171 | " 7 x0, y0, x1, y1 = rect\n", 172 | " 8 assert x0 < x1, 'Invalid X coordinates'\n", 173 | "\n", 174 | "AssertionError: Rectangles must contain 4 coordinates\n", 175 | "\n", 176 | "```" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "```python\n", 184 | "print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted\n", 185 | "```" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "```python\n", 193 | "---------------------------------------------------------------------------\n", 194 | "AssertionError Traceback (most recent call last)\n", 195 | " in ()\n", 196 | "----> 1 print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted\n", 197 | "\n", 198 | " in normalize_rectangle(rect)\n", 199 | " 6 assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n", 200 | " 7 x0, y0, x1, y1 = rect\n", 201 | "----> 8 assert x0 < x1, 'Invalid X coordinates'\n", 202 | " 9 assert y0 < y1, 'Invalid Y coordinates'\n", 203 | " 10 \n", 204 | "\n", 205 | "AssertionError: Invalid X coordinates\n", 206 | "\n", 207 | "```" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "The post-conditions on lines 17 and 18 help us catch bugs by telling us when our calculations cannot have been correct. 
For example, if we normalize a rectangle that is taller than it is wide, everything seems OK:" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 5, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "name": "stdout", 224 | "output_type": "stream", 225 | "text": [ 226 | "(0, 0, 0.2, 1.0)\n" 227 | ] 228 | } 229 | ], 230 | "source": [ 231 | "print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "but if we normalize one that’s wider than it is tall, the assertion is triggered:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "```python\n", 246 | "print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))\n", 247 | "```" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "``` python\n", 255 | "---------------------------------------------------------------------------\n", 256 | "AssertionError Traceback (most recent call last)\n", 257 | " in ()\n", 258 | "----> 1 print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))\n", 259 | "\n", 260 | " in normalize_rectangle(rect)\n", 261 | " 19 \n", 262 | " 20 assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'\n", 263 | "---> 21 assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'\n", 264 | " 22 \n", 265 | " 23 return (0, 0, upper_x, upper_y)\n", 266 | "\n", 267 | "AssertionError: Calculated upper Y coordinate invalid\n", 268 | "\n", 269 | "```" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Re-reading our function, we realize that line 14 should divide `dy` by `dx` rather than `dx` by `dy`. If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn’t. 
Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.\n", 277 | "\n", 278 | "But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.\n", 279 | "\n", 280 | "Most good programmers follow two rules when adding assertions to their code. The first is, fail early, fail often. The greater the distance between when and where an error occurs and when it’s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.\n", 281 | "\n", 282 | "The second rule is, turn bugs into assertions or tests. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven’t regressed (i.e., haven’t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky.\n", 283 | "\n" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "### Test-Driven Development\n", 291 | "\n", 292 | "An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. 
The output is the largest range that they all include:" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "![test diagram](images/testing.svg)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Most novice programmers would solve this problem like this:\n", 307 | "\n", 308 | " 1. Write a function `range_overlap`.\n", 309 | " 2. Call it interactively on two or three different inputs.\n", 310 | " 3. If it produces the wrong answer, fix the function and re-run that test.\n", 311 | "\n", 312 | "This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way:\n", 313 | "\n", 314 | "1. Write a short function for each test.\n", 315 | "2. Write a `range_overlap` function that should pass those tests.\n", 316 | "3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.\n", 317 | "\n", 318 | "Writing the tests before writing the function they exercise is called `test-driven development` (TDD). Its advocates believe it produces better code faster because:\n", 319 | "\n", 320 | "1. If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.\n", 321 | "2. 
Writing tests helps programmers figure out what the function is actually supposed to do.\n", 322 | "\n", 323 | "Here are three test functions for `range_overlap`:" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "``` python\n", 331 | "assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n", 332 | "assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n", 333 | "assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)\n", 334 | "```" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "```python\n", 342 | "---------------------------------------------------------------------------\n", 343 | "NameError Traceback (most recent call last)\n", 344 | " in ()\n", 345 | "----> 1 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n", 346 | " 2 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n", 347 | " 3 assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)\n", 348 | "\n", 349 | "NameError: name 'range_overlap' is not defined\n", 350 | "```\n" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "The error is actually reassuring: we haven’t written `range_overlap` yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function.\n", 358 | "\n", 359 | "And as a bonus of writing these tests, we’ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.\n", 360 | "\n", 361 | "Something important is missing, though. 
We don’t have any tests for the case where the ranges don’t overlap at all:" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "```python\n", 369 | "assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ???\n", 370 | "```" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "What should `range_overlap` do in this case: fail with an error message, produce a special value like `(0.0, 0.0)` to signal that there’s no overlap, or *something* else? Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is *best before* we’re emotionally invested in whatever we happened to write before we realized there was an issue.\n", 378 | "\n", 379 | "And what about this case?" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "```python\n", 387 | "assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ???\n", 388 | "```" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": {}, 394 | "source": [ 395 | "Do two segments that touch at their endpoints overlap or not? Mathematicians usually say “yes”, but engineers usually say “no”. The best answer is “whatever is most useful in the rest of our program”, but again, any actual implementation of `range_overlap` is going to do *something*, and whatever it is ought to be consistent with what it does when there’s no overlap at all.\n", 396 | "\n", 397 | "Since we’re planning to use the range this function returns as the X axis in a time series chart, we decide that:\n", 398 | "\n", 399 | " 1. every overlap has to have non-zero width, and\n", 400 | " 2. we will return the special value None when there’s no overlap.\n", 401 | " \n", 402 | "`None` is built into Python, and means “nothing here”. (Other languages often call the equivalent value `null` or `nil`). 
With that decision made, we can finish writing our last two tests:" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "```python\n", 410 | "assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n", 411 | "assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n", 412 | "```" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "```python\n", 420 | "---------------------------------------------------------------------------\n", 421 | "NameError Traceback (most recent call last)\n", 422 | " in ()\n", 423 | "----> 1 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n", 424 | " 2 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n", 425 | "\n", 426 | "NameError: name 'range_overlap' is not defined\n", 427 | "```" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "Again, we get an error because we haven’t written our function, but we’re now ready to do so:\n" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 14, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "def range_overlap(ranges):\n", 444 | " '''Return common overlap among a set of [low, high] ranges.'''\n", 445 | " lowest = 0.0\n", 446 | " highest = 1.0\n", 447 | " for (low, high) in ranges:\n", 448 | " lowest = max(lowest, low)\n", 449 | " highest = min(highest, high)\n", 450 | " return (lowest, highest)" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": {}, 456 | "source": [ 457 | "(Take a moment to think about why we use `max` to raise `lowest` and `min` to lower `highest`). We’d now like to re-run our tests, but they’re scattered across three different cells. 
To make running them easier, let’s put them all in a function:" 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": 15, 463 | "metadata": {}, 464 | "outputs": [], 465 | "source": [ 466 | "def test_range_overlap():\n", 467 | " assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n", 468 | " assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n", 469 | " assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n", 470 | " assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n", 471 | " assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "We can now test `range_overlap` with a single function call:" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "```python \n", 486 | "test_range_overlap() \n", 487 | "```" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "```python\n", 495 | "---------------------------------------------------------------------------\n", 496 | "AssertionError Traceback (most recent call last)\n", 497 | " in ()\n", 498 | "----> 1 test_range_overlap()\n", 499 | "\n", 500 | " in test_range_overlap()\n", 501 | " 1 def test_range_overlap():\n", 502 | "----> 2 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n", 503 | " 3 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n", 504 | " 4 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n", 505 | " 5 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n", 506 | "\n", 507 | "AssertionError: \n", 508 | "```" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "The first test that was supposed to produce `None` fails, so we know something is wrong with our function. 
We don’t know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: always initialize from data." 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": { 521 | "tags": [ 522 | "challenge" 523 | ] 524 | }, 525 | "source": [ 526 | "Fix `range_overlap`. Re-run `test_range_overlap` after each change you make." 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": 17, 532 | "metadata": { 533 | "tags": [ 534 | "solution" 535 | ] 536 | }, 537 | "outputs": [], 538 | "source": [ 539 | "import numpy\n", 540 | "\n", 541 | "def range_overlap(ranges):\n", 542 | " '''Return common overlap among a set of [low, high] ranges.'''\n", 543 | " if not ranges:\n", 544 | " # ranges is None or an empty list\n", 545 | " return None\n", 546 | " lowest, highest = ranges[0]\n", 547 | " for (low, high) in ranges[1:]:\n", 548 | " lowest = max(lowest, low)\n", 549 | " highest = min(highest, high)\n", 550 | " if lowest >= highest: # no overlap\n", 551 | " return None\n", 552 | " else:\n", 553 | " return (lowest, highest)" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "## Key points\n", 561 | "\n", 562 | " - Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.\n", 563 | "\n", 564 | " - Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.\n", 565 | "\n", 566 | " - Use preconditions to check that the inputs to a function are safe to use.\n", 567 | "\n", 568 | " - Use postconditions to check that the output from a function is safe to 
use.\n", 569 | "\n", 570 | " - Write tests before writing code in order to help determine exactly what that code is supposed to do." 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": null, 576 | "metadata": {}, 577 | "outputs": [], 578 | "source": [] 579 | } 580 | ], 581 | "metadata": { 582 | "celltoolbar": "Tags", 583 | "kernelspec": { 584 | "display_name": "Python 3", 585 | "language": "python", 586 | "name": "python3" 587 | }, 588 | "language_info": { 589 | "codemirror_mode": { 590 | "name": "ipython", 591 | "version": 3 592 | }, 593 | "file_extension": ".py", 594 | "mimetype": "text/x-python", 595 | "name": "python", 596 | "nbconvert_exporter": "python", 597 | "pygments_lexer": "ipython3", 598 | "version": "3.6.3" 599 | } 600 | }, 601 | "nbformat": 4, 602 | "nbformat_minor": 2 603 | } 604 | -------------------------------------------------------------------------------- /workshops/docs/modules/intro.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 12 | 13 | 14 | # Python: the basics 15 | 16 | 17 | 18 | 19 | 20 | Python is a general purpose programming language that supports rapid development 21 | of scripts and applications. 22 | 23 | Python's main advantages: 24 | 25 | * Open Source software, supported by Python Software Foundation 26 | * Available on all major platforms (ie. 
Windows, Linux and MacOS) 27 | * It is a general-purpose programming language, designed for readability 28 | * Supports multiple programming paradigms ('functional', 'object oriented') 29 | * Very large community with a rich ecosystem of third-party packages 30 | 31 | 32 | 33 | 34 | 35 | ## Interpreter 36 | 37 | Python is an interpreted language[*](https://softwareengineering.stackexchange.com/a/24560) which can be used in two ways: 38 | 39 | * "Interactive" Mode: It functions like an "advanced calculator", executing 40 | one command at a time: 41 | 42 | ```bash 43 | user:host:~$ python 44 | Python 3.5.1 (default, Oct 23 2015, 18:05:06) 45 | [GCC 4.8.3] on linux2 46 | Type "help", "copyright", "credits" or "license" for more information. 47 | >>> 2 + 2 48 | 4 49 | >>> print("Hello World") 50 | Hello World 51 | ``` 52 | 53 | 54 | 55 | 56 | 57 | * "Scripting" Mode: Executing a series of "commands" saved in a text file, 58 | usually with a `.py` extension after the name of your file: 59 | 60 | ```bash 61 | user:host:~$ python my_script.py 62 | Hello World 63 | ``` 64 | 65 | 66 | 67 | 68 | 69 | ## Using interactive Python in Jupyter-style notebooks 70 | 71 | A convenient and powerful way to use interactive-mode Python is via a Jupyter Notebook, or similar browser-based interface. 72 | 73 | This particularly lends itself to data analysis since the notebook records a history of commands and shows output and graphs immediately in the browser. 74 | 75 | There are several ways you can run a Jupyter(-style) notebook: locally installed on your computer or hosted as a service on the web. Today we will use a Jupyter notebook service provided by Google: https://colab.research.google.com (Colaboratory). 76 | 77 | ### Jupyter-style notebooks: a quick tour 78 | 79 | Go to https://colab.research.google.com and log in with your Google account. 80 | 81 | Select ***NEW NOTEBOOK → NEW PYTHON 3 NOTEBOOK*** - a new notebook will be created. 
82 | 83 | --- 84 | 85 | Type some Python code in the top cell, e.g.: 86 | 87 | ```python 88 | print("Hello Jupyter !") 89 | ``` 90 | 91 | Press ***Shift-Enter*** to run the contents of the cell. 92 | 93 | --- 94 | 95 | You can add new cells. 96 | 97 | ***Insert → Insert Code Cell*** 98 | 99 | --- 100 | 101 | NOTE: When the text on the left-hand side of the cell is `In [*]` (with an asterisk rather than a number), the cell is still running. It's usually best to wait until one cell has finished running before running the next. 102 | 103 | Let's begin writing some code in our notebook. 104 | 105 | 106 | 107 | 108 | 109 | 110 | ```python 111 | print("Hello Jupyter !") 112 | ``` 113 | 114 |
 115 | 
output
116 | 117 | Hello Jupyter ! 118 | 119 | 120 |
121 | 122 | 123 | 124 | 125 | 126 | In Jupyter/Colaboratory, typing the name of a variable as the last line of a cell displays its representation: 127 | 128 | 129 | 130 | 131 | 132 | 133 | ```python 134 | message = "Hello again !" 135 | message 136 | ``` 137 | 138 | 139 | 140 | 141 |
 142 | 
output
143 | 144 | 'Hello again !' 145 | 146 |
147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | ```python 156 | # A 'hash' symbol denotes a comment 157 | # This is a comment. Anything after the 'hash' symbol on the line is ignored by the Python interpreter 158 | 159 | print("No comment") # comment 160 | ``` 161 | 162 |
 163 | 
output
164 | 165 | No comment 166 | 167 | 168 |
169 | 170 | 171 | 172 | 173 | 174 | ## Variables and data types 175 | ### Integers, floats, strings 176 | 177 | 178 | 179 | 180 | 181 | 182 | ```python 183 | a = 5 184 | ``` 185 | 186 | 187 | 188 | 189 | 190 | 191 | ```python 192 | a 193 | ``` 194 | 195 | 196 | 197 | 198 |
 199 | 
output
200 | 201 | 5 202 | 203 |
204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | 212 | ```python 213 | type(a) 214 | ``` 215 | 216 | 217 | 218 | 219 |
 220 | 
output
221 | 222 | int 223 | 224 |
225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | Adding a decimal point creates a `float` 233 | 234 | 235 | 236 | 237 | 238 | 239 | ```python 240 | b = 5.0 241 | ``` 242 | 243 | 244 | 245 | 246 | 247 | 248 | ```python 249 | b 250 | ``` 251 | 252 | 253 | 254 | 255 |
 256 | 
output
257 | 258 | 5.0 259 | 260 |
261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | ```python 270 | type(b) 271 | ``` 272 | 273 | 274 | 275 | 276 |
 277 | 
output
278 | 279 | float 280 | 281 |
282 | 283 | 284 | 285 | 286 | 287 | 288 | 289 | `int` and `float` are collectively called 'numeric' types. 290 | 291 | (There is also a built-in `complex` type for complex numbers. Hexadecimal values such as `0xff` are ordinary `int`s written in a different notation; `hex` is a function that formats an integer as a hexadecimal string, not a separate type.) 292 | 293 | 294 | 295 | 296 | 297 | ## Challenge - Types 298 | 299 | What is the **type** of the variable `letters` defined below? 300 | 301 | `letters = "ABACBS"` 302 | 303 | * A) `int` 304 | * B) `str` 305 | * C) `float` 306 | * D) `text` 307 | 308 | Write some code that outputs the type, then paste your answer into the Etherpad. 309 | 310 | 311 | 312 | 313 | 318 | 319 | 320 | 321 | 340 | 341 | 342 | 343 | 344 | ### Strings 345 | 346 | 347 | 348 | 349 | 350 | 351 | ```python 352 | some_words = "Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇" 353 | ``` 354 | 355 | 356 | 357 | 358 | 359 | 360 | ```python 361 | some_words 362 | ``` 363 | 364 | 365 | 366 | 367 |
 368 | 
output
369 | 370 | 'Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇' 371 | 372 |
373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | ```python 382 | type(some_words) 383 | ``` 384 | 385 | 386 | 387 | 388 |
 389 | 
output
390 | 391 | str 392 | 393 |
394 | 395 | 396 | 397 | 398 | 399 | 400 | 401 | The variable `some_words` is of type `str`, short for "string". Strings hold 402 | sequences of characters, which can be letters, numbers, punctuation 403 | or more exotic forms of text (even emoji!). 404 | 405 | 406 | 407 | 408 | 409 | ## Operators 410 | 411 | We can perform mathematical calculations in Python using the basic operators: 412 | 413 | `+` `-` `*` `/` `%` `**` 414 | 415 | 416 | 417 | 418 | 419 | 420 | ```python 421 | 2 + 2 # Addition 422 | ``` 423 | 424 | 425 | 426 | 427 |
 428 | 
output
429 | 430 | 4 431 | 432 |
433 | 434 | 435 | 436 | 437 | 438 | 439 | 440 | 441 | ```python 442 | 6 * 7 # Multiplication 443 | ``` 444 | 445 | 446 | 447 | 448 |
 449 | 
output
450 | 451 | 42 452 | 453 |
454 | 455 | 456 | 457 | 458 | 459 | 460 | 461 | 462 | ```python 463 | 2 ** 16 # Power 464 | ``` 465 | 466 | 467 | 468 | 469 |
 470 | 
output
471 | 472 | 65536 473 | 474 |
475 | 476 | 477 | 478 | 479 | 480 | 481 | 482 | 483 | ```python 484 | 13 % 5 # Modulo 485 | ``` 486 | 487 | 488 | 489 | 490 |
 491 | 
output
492 | 493 | 3 494 | 495 |
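The `/` operator appears in the list of operators but is not shown in the examples. As a minimal sketch (the variable names are illustrative and not part of the lesson), division in Python 3 always produces a `float`, while `//` performs floor division and pairs naturally with `%`:

```python
# Illustrative values only
total = 13
parts = 5

true_division = total / parts      # '/' always returns a float in Python 3
floor_division = total // parts    # '//' rounds down to a whole number
remainder = total % parts          # '%' gives what is left over

print(true_division)
print(floor_division)   # 2
print(remainder)        # 3

# The floor quotient and the remainder reconstruct the original value
print(floor_division * parts + remainder == total)   # True
```

Floor division and modulo are handy together when splitting a count into whole groups plus a leftover.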
496 | 497 | 498 | 499 | 500 | 501 | 502 | 503 | 504 | ```python 505 | # int + int = int 506 | a = 5 507 | a + 1 508 | ``` 509 | 510 | 511 | 512 | 513 |
 514 | 
output
515 | 516 | 6 517 | 518 |
519 | 520 | 521 | 522 | 523 | 524 | 525 | 526 | 527 | ```python 528 | # float + int = float 529 | b = 5.0 530 | b + 1 531 | ``` 532 | 533 | 534 | 535 | 536 |
 537 | 
output
538 | 539 | 6.0 540 | 541 |
542 | 543 | 544 | 545 | 546 | 547 | 548 | 549 | 550 | ```python 551 | a + b 552 | ``` 553 | 554 | 555 | 556 | 557 |
 558 | 
output
559 | 560 | 10.0 561 | 562 |
563 | 564 | 565 | 566 | 567 | 568 | 569 | 570 | ```python 571 | some_words = "I'm a string" 572 | a = 6 573 | a + some_words 574 | ``` 575 | 576 | 577 | 578 | 579 | 580 | 581 | 582 | Outputs: 583 | 584 | ``` 585 | --------------------------------------------------------------------------- 586 | TypeError Traceback (most recent call last) 587 | in () 588 | 1 some_words = "I'm a string" 589 | 2 a = 6 590 | ----> 3 a + some_words 591 | 592 | TypeError: unsupported operand type(s) for +: 'int' and 'str' 593 | ``` 594 | 595 | 596 | 597 | 598 | 599 | 600 | ```python 601 | str(a) + " " + some_words 602 | ``` 603 | 604 | 605 | 606 | 607 |
 608 | 
output
609 | 610 | "6 I'm a string" 611 | 612 |
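Type conversion also works in the other direction. The following is a small sketch with made-up variable names, showing `str`, `int` and `float` used to convert values explicitly:

```python
# Hypothetical example values
count = 6
label = "samples"

text = str(count) + " " + label    # int -> str, then concatenate
print(text)                        # 6 samples

number = int("42")                 # str -> int
print(number + 1)                  # 43

ratio = float("2.5")               # str -> float
print(ratio * 2)                   # 5.0
```

Python does not convert between numbers and strings automatically, so the conversion has to be written out as above.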
613 | 614 | 615 | 616 | 617 | 618 | 619 | 620 | 621 | ```python 622 | # Shorthand: operators with assignment 623 | a += 1 624 | a 625 | 626 | # Equivalent to: 627 | # a = a + 1 628 | ``` 629 | 630 | 631 | 632 | 633 |
 634 | 
output
635 | 636 | 7 637 | 638 |
639 | 640 | 641 | 642 | 643 | 644 | 645 | 646 | ### Boolean operations 647 | 648 | We can also use comparison operators: 649 | `<, >, ==, !=, <=, >=` and logical operators: 650 | `and, or, not`. A comparison returns a value of a data type 651 | called a _boolean_, which is either `True` or `False`. 652 | 653 | 654 | 655 | 656 | 657 | 658 | 659 | ```python 660 | 3 > 4 661 | ``` 662 | 663 | 664 | 665 | 666 |
 667 | 
output
668 | 669 | False 670 | 671 |
672 | 673 | 674 | 675 | 676 | 677 | 678 | 679 | 680 | ```python 681 | True and True 682 | ``` 683 | 684 | 685 | 686 | 687 |
 688 | 
output
689 | 690 | True 691 | 692 |
693 | 694 | 695 | 696 | 697 | 698 | 699 | 700 | 701 | ```python 702 | True or False 703 | ``` 704 | 705 | 706 | 707 | 708 |
 709 | 
output
710 | 711 | True 712 | 713 |
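Comparisons and logical operators are usually combined. A brief sketch, assuming a hypothetical `temperature` value that is not part of the lesson:

```python
temperature = 21.5   # hypothetical reading

# Both comparisons must be true for 'and' to give True
in_range = (temperature > 18) and (temperature < 25)
print(in_range)          # True

# 'not' flips a boolean value
too_cold = not (temperature > 18)
print(too_cold)          # False

# The result of a comparison is a bool
print(type(in_range))    # <class 'bool'>
```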
714 | 715 | 716 | 717 | 718 | 719 | 720 | 721 | ## Lists and sequence types 722 | 723 | 724 | 725 | 726 | 727 | ### Lists 728 | 729 | 730 | 731 | 732 | 733 | 734 | ```python 735 | numbers = [2, 4, 6, 8, 10] 736 | numbers 737 | ``` 738 | 739 | 740 | 741 | 742 |
 743 | 
output
744 | 745 | [2, 4, 6, 8, 10] 746 | 747 |
748 | 749 | 750 | 751 | 752 | 753 | 754 | 755 | 756 | ```python 757 | # `len` gets the length of a list 758 | len(numbers) 759 | ``` 760 | 761 | 762 | 763 | 764 |
 765 | 
output
766 | 767 | 5 768 | 769 |
770 | 771 | 772 | 773 | 774 | 775 | 776 | 777 | 778 | ```python 779 | # Lists can contain multiple data types, including other lists 780 | mixed_list = ["asdf", 2, 3.142, numbers, ['a','b','c']] 781 | mixed_list 782 | ``` 783 | 784 | 785 | 786 | 787 |
 788 | 
output
789 | 790 | ['asdf', 2, 3.142, [2, 4, 6, 8, 10], ['a', 'b', 'c']] 791 | 792 |
793 | 794 | 795 | 796 | 797 | 798 | 799 | 800 | You can retrieve items from a list by their *index*. In Python, the first item has an index of 0 (zero). 801 | 802 | 803 | 804 | 805 | 806 | 807 | ```python 808 | numbers[0] 809 | ``` 810 | 811 | 812 | 813 | 814 |
 815 | 
output
816 | 817 | 2 818 | 819 |
820 | 821 | 822 | 823 | 824 | 825 | 826 | 827 | 828 | ```python 829 | numbers[3] 830 | ``` 831 | 832 | 833 | 834 | 835 |
 836 | 
output
837 | 838 | 8 839 | 840 |
841 | 842 | 843 | 844 | 845 | 846 | 847 | 848 | You can also assign a new value to any position in the list. 849 | 850 | 851 | 852 | 853 | 854 | 855 | ```python 856 | numbers[3] = numbers[3] * 100 857 | numbers 858 | ``` 859 | 860 | 861 | 862 | 863 |
 864 | 
output
865 | 866 | [2, 4, 6, 800, 10] 867 | 868 |
869 | 870 | 871 | 872 | 873 | 874 | 875 | 876 | You can append items to the end of the list. 877 | 878 | 879 | 880 | 881 | 882 | 883 | ```python 884 | numbers.append(12) 885 | numbers 886 | ``` 887 | 888 | 889 | 890 | 891 |
 892 | 
output
893 | 894 | [2, 4, 6, 800, 10, 12] 895 | 896 |
897 | 898 | 899 | 900 | 901 | 902 | 903 | 904 | You can add multiple items to the end of a list with `extend`. 905 | 906 | 907 | 908 | 909 | 910 | 911 | ```python 912 | numbers.extend([14, 16, 18]) 913 | numbers 914 | ``` 915 | 916 | 917 | 918 | 919 |
 920 | 
output
921 | 922 | [2, 4, 6, 800, 10, 12, 14, 16, 18] 923 | 924 |
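The difference between `append` and `extend` is easiest to see when the argument is itself a list. This sketch uses separate, illustrative lists so that `numbers` is left unchanged:

```python
# Illustrative lists, separate from 'numbers' above
appended = [2, 4]
appended.append([6, 8])     # adds the whole list as ONE new item
print(appended)             # [2, 4, [6, 8]]
print(len(appended))        # 3

extended = [2, 4]
extended.extend([6, 8])     # adds each item of the list individually
print(extended)             # [2, 4, 6, 8]
print(len(extended))        # 4
```

Both methods change the list in place and return `None`, so there is no need to assign their result.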
925 | 926 | 927 | 928 | 929 | 930 | 931 | 932 | ### Loops 933 | 934 | A `for` loop can be used to access the elements in a list or other Python data structure one at a time. We will learn more about loops in another lesson. 935 | 936 | 937 | 938 | 939 | 940 | 941 | ```python 942 | for num in numbers: 943 | print(num) 944 | ``` 945 | 946 |
 947 | 
output
948 | 949 | 2 950 | 4 951 | 6 952 | 800 953 | 10 954 | 12 955 | 14 956 | 16 957 | 18 958 | 959 | 960 |
961 | 962 | 963 | 964 | 965 | 966 | **Indentation** is very important in Python. Note that the second line in the 967 | example above is indented, indicating the code that is the body of the loop. 968 | 969 | 970 | 971 | 972 | 973 | To find out what methods are available for an object, we can use the built-in `help` command: 974 | 975 | 976 | 977 | 978 | 979 | 980 | ```python 981 | help(numbers) 982 | ``` 983 | 984 |
 985 | 
output
986 | 987 | Help on list object: 988 | 989 | class list(object) 990 | | list() -> new empty list 991 | | list(iterable) -> new list initialized from iterable's items 992 | | 993 | | Methods defined here: 994 | | 995 | | __add__(self, value, /) 996 | | Return self+value. 997 | | 998 | | __contains__(self, key, /) 999 | | Return key in self. 1000 | | 1001 | | __delitem__(self, key, /) 1002 | | Delete self[key]. 1003 | | 1004 | | __eq__(self, value, /) 1005 | | Return self==value. 1006 | | 1007 | | __ge__(self, value, /) 1008 | | Return self>=value. 1009 | | 1010 | | __getattribute__(self, name, /) 1011 | | Return getattr(self, name). 1012 | | 1013 | | __getitem__(...) 1014 | | x.__getitem__(y) <==> x[y] 1015 | | 1016 | | __gt__(self, value, /) 1017 | | Return self>value. 1018 | | 1019 | | __iadd__(self, value, /) 1020 | | Implement self+=value. 1021 | | 1022 | | __imul__(self, value, /) 1023 | | Implement self*=value. 1024 | | 1025 | | __init__(self, /, *args, **kwargs) 1026 | | Initialize self. See help(type(self)) for accurate signature. 1027 | | 1028 | | __iter__(self, /) 1029 | | Implement iter(self). 1030 | | 1031 | | __le__(self, value, /) 1032 | | Return self<=value. 1033 | | 1034 | | __len__(self, /) 1035 | | Return len(self). 1036 | | 1037 | | __lt__(self, value, /) 1038 | | Return self None -- append object to end 1066 | | 1067 | | clear(...) 1068 | | L.clear() -> None -- remove all items from L 1069 | | 1070 | | copy(...) 1071 | | L.copy() -> list -- a shallow copy of L 1072 | | 1073 | | count(...) 1074 | | L.count(value) -> integer -- return number of occurrences of value 1075 | | 1076 | | extend(...) 1077 | | L.extend(iterable) -> None -- extend list by appending elements from the iterable 1078 | | 1079 | | index(...) 1080 | | L.index(value, [start, [stop]]) -> integer -- return first index of value. 1081 | | Raises ValueError if the value is not present. 1082 | | 1083 | | insert(...) 
1084 | | L.insert(index, object) -- insert object before index 1085 | | 1086 | | pop(...) 1087 | | L.pop([index]) -> item -- remove and return item at index (default last). 1088 | | Raises IndexError if list is empty or index is out of range. 1089 | | 1090 | | remove(...) 1091 | | L.remove(value) -> None -- remove first occurrence of value. 1092 | | Raises ValueError if the value is not present. 1093 | | 1094 | | reverse(...) 1095 | | L.reverse() -- reverse *IN PLACE* 1096 | | 1097 | | sort(...) 1098 | | L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE* 1099 | | 1100 | | ---------------------------------------------------------------------- 1101 | | Data and other attributes defined here: 1102 | | 1103 | | __hash__ = None 1104 | 1105 | 1106 | 1107 |
1108 | 1109 | 1110 | 1111 | 1112 | 1113 | ### Tuples 1114 | 1115 | A tuple is similar to a list in that it's an ordered sequence of elements. 1116 | However, tuples cannot be changed once created (they are "immutable"). Tuples 1117 | are created by placing comma-separated values inside parentheses `()`. 1118 | 1119 | 1120 | 1121 | 1122 | 1123 | 1124 | ```python 1125 | tuples_are_immutable = ("bar", 100, 200, "foo") 1126 | tuples_are_immutable 1127 | ``` 1128 | 1129 | 1130 | 1131 | 1132 |
1133 | 
output
1134 | 1135 | ('bar', 100, 200, 'foo') 1136 | 1137 |
1138 | 1139 | 1140 | 1141 | 1142 | 1143 | 1144 | 1145 | 1146 | ```python 1147 | tuples_are_immutable[1] 1148 | ``` 1149 | 1150 | 1151 | 1152 | 1153 |
1154 | 
output
1155 | 1156 | 100 1157 | 1158 |
1159 | 1160 | 1161 | 1162 | 1163 | 1164 | 1165 | 1166 | ```python 1167 | tuples_are_immutable[1] = 666 1168 | ``` 1169 | 1170 | 1171 | 1172 | 1173 | 1174 | Outputs: 1175 | 1176 | ``` 1177 | --------------------------------------------------------------------------- 1178 | TypeError Traceback (most recent call last) 1179 | in () 1180 | ----> 1 tuples_are_immutable[1] = 666 1181 | 1182 | TypeError: 'tuple' object does not support item assignment 1183 | ``` 1184 | 1185 | 1186 | 1187 | 1188 | 1189 | ### Dictionaries 1190 | 1191 | A dictionary is a container that stores key-value pairs. Values are looked up by key rather than by position. (Since Python 3.7, dictionaries preserve insertion order; in earlier versions the order was not guaranteed.) 1192 | 1193 | Other programming languages might call this a 'hash', 'hashtable' or 'hashmap'. 1194 | 1195 | 1196 | 1197 | 1198 | 1199 | 1200 | ```python 1201 | pairs = {'Apple': 1, 'Orange': 2, 'Pear': 4} 1202 | pairs 1203 | ``` 1204 | 1205 | 1206 | 1207 | 1208 |
1209 | 
output
1210 | 1211 | {'Apple': 1, 'Orange': 2, 'Pear': 4} 1212 | 1213 |
1214 | 1215 | 1216 | 1217 | 1218 | 1219 | 1220 | 1221 | 1222 | ```python 1223 | pairs['Orange'] 1224 | ``` 1225 | 1226 | 1227 | 1228 | 1229 |
1230 | 
output
1231 | 1232 | 2 1233 | 1234 |
1235 | 1236 | 1237 | 1238 | 1239 | 1240 | 1241 | 1242 | 1243 | ```python 1244 | pairs['Orange'] = 16 1245 | pairs 1246 | ``` 1247 | 1248 | 1249 | 1250 | 1251 |
1252 | 
output
1253 | 1254 | {'Apple': 1, 'Orange': 16, 'Pear': 4} 1255 | 1256 |
1257 | 1258 | 1259 | 1260 | 1261 | 1262 | 1263 | 1264 | The `items` method returns a sequence of the key-value pairs as tuples. 1265 | 1266 | `values` returns a sequence of just the values. 1267 | 1268 | `keys` returns a sequence of just the keys. 1269 | 1270 | --- 1271 | In Python 3, the `.items()`, `.values()` and `.keys()` methods return a ['dictionary view' object](https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects) that behaves like a list or tuple in for loops but doesn't support indexing. 'Dictionary views' stay in sync even when the dictionary changes. 1272 | 1273 | You can turn them into a normal list or tuple with the `list()` or `tuple()` functions. 1274 | 1275 | 1276 | 1277 | 1278 | 1279 | 1280 | ```python 1281 | pairs.items() 1282 | # list(pairs.items()) 1283 | ``` 1284 | 1285 | 1286 | 1287 | 1288 |
1289 | 
output
1290 | 1291 | dict_items([('Apple', 1), ('Orange', 16), ('Pear', 4)]) 1292 | 1293 |
1294 | 1295 | 1296 | 1297 | 1298 | 1299 | 1300 | 1301 | 1302 | ```python 1303 | pairs.values() 1304 | # list(pairs.values()) 1305 | ``` 1306 | 1307 | 1308 | 1309 | 1310 |
1311 | 
output
1312 | 1313 | dict_values([1, 16, 4]) 1314 | 1315 |
1316 | 1317 | 1318 | 1319 | 1320 | 1321 | 1322 | 1323 | 1324 | ```python 1325 | pairs.keys() 1326 | # list(pairs.keys()) 1327 | ``` 1328 | 1329 | 1330 | 1331 | 1332 |
1333 | 
output
1334 | 1335 | dict_keys(['Apple', 'Orange', 'Pear']) 1336 | 1337 |
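To illustrate how a dictionary view stays in sync while a `list` copy does not, here is a minimal sketch. The `stock` dictionary and the other names are made up for this example:

```python
stock = {'Apple': 1, 'Orange': 16}    # illustrative dictionary

keys_view = stock.keys()              # live view of the keys
keys_snapshot = list(stock.keys())    # independent copy as a list

stock['Pear'] = 4                     # change the dictionary afterwards

print(keys_view)          # dict_keys(['Apple', 'Orange', 'Pear'])
print(keys_snapshot)      # ['Apple', 'Orange']
print(keys_snapshot[0])   # Apple

try:
    keys_view[0]          # views do not support indexing
except TypeError as error:
    print("TypeError:", error)
```

Converting to a list is the simplest option when you need indexing or a fixed snapshot of the keys.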
1338 | 1339 | 1340 | 1341 | 1342 | 1343 | 1344 | 1345 | 1346 | ```python 1347 | len(pairs) 1348 | ``` 1349 | 1350 | 1351 | 1352 | 1353 |
1354 | 
output
1355 | 1356 | 3 1357 | 1358 |
1359 | 1360 | 1361 | 1362 | 1363 | 1364 | 1365 | 1366 | 1367 | ```python 1368 | dict_of_dicts = {'first': {1:2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}} 1369 | dict_of_dicts 1370 | ``` 1371 | 1372 | 1373 | 1374 | 1375 |
1376 | 
output
1377 | 1378 | {'first': {1: 2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}} 1379 | 1380 |
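Values inside a nested dictionary are reached by chaining keys. A short sketch, using a smaller illustrative dictionary named `nested` rather than `dict_of_dicts` itself:

```python
# Illustrative nested dictionary
nested = {'first': {1: 2, 2: 4}, 'second': {'a': 2.2, 'b': 4.4}}

# The first key selects the inner dictionary, the second key selects the value
print(nested['second']['b'])    # 4.4

# Inner dictionaries can be modified in the same way
nested['first'][4] = 8
print(nested['first'])          # {1: 2, 2: 4, 4: 8}
```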
1381 | 1382 | 1383 | 1384 | 1385 | 1386 | 1387 | 1388 | ## Challenge - Dictionaries 1389 | 1390 | Given the dictionary: 1391 | 1392 | ```python 1393 | jam_ratings = {'Plum': 6, 'Apricot': 2, 'Strawberry': 8} 1394 | ``` 1395 | 1396 | How would you change the value associated with the key `Apricot` to `9`? 1397 | 1398 | A) `jam_ratings = {'apricot': 9}` 1399 | 1400 | B) `jam_ratings[9] = 'Apricot'` 1401 | 1402 | C) `jam_ratings['Apricot'] = 9` 1403 | 1404 | D) `jam_ratings[2] = 'Apricot'` 1405 | 1406 | 1407 | 1408 | 1409 | 1422 | 1423 | --------------------------------------------------------------------------------