├── workshops
    ├── docs
    │   ├── modules
    │   │   ├── data
    │   │   ├── images
    │   │   ├── notebooks
    │   │   │   ├── plot1.png
    │   │   │   ├── images
    │   │   │   │   ├── inner-join.png
    │   │   │   │   ├── left_join.png
    │   │   │   │   ├── loops_image.png
    │   │   │   │   ├── make_female.png
    │   │   │   │   ├── plot_mean_weight.png
    │   │   │   │   ├── plot_total_animals.png
    │   │   │   │   ├── testing.svg
    │   │   │   │   ├── slicing-indexing.svg
    │   │   │   │   └── slicing-slicing.svg
    │   │   │   ├── speciesSubset.csv
    │   │   │   ├── data
    │   │   │   │   └── speciesSubset.csv
    │   │   │   ├── Pipfile
    │   │   │   ├── wip
    │   │   │   │   ├── README.md
    │   │   │   │   ├── more_data_structures.ipynb
    │   │   │   │   ├── functions.ipynb
    │   │   │   │   ├── conditionals.ipynb
    │   │   │   │   ├── slicing_and_list_comprehensions.ipynb
    │   │   │   │   └── basics_data_carpentry.ipynb
    │   │   │   ├── nbconvert_templates
    │   │   │   │   ├── student_markdown.tpl
    │   │   │   │   ├── workshop_notes.tpl
    │   │   │   │   ├── instructor_markdown.tpl
    │   │   │   │   ├── student.tpl
    │   │   │   │   ├── workshop_notes_markdown.tpl
    │   │   │   │   └── instructor.tpl
    │   │   │   ├── loops.ipynb
    │   │   │   └── defensive_programming.ipynb
    │   │   ├── indexing_files
    │   │   │   └── indexing_74_1.png
    │   │   ├── working_with_data_files
    │   │   │   ├── working_with_data_57_1.png
    │   │   │   ├── working_with_data_59_1.png
    │   │   │   ├── working_with_data_62_1.png
    │   │   │   ├── working_with_data_64_1.png
    │   │   │   └── working_with_data_70_1.png
    │   │   ├── plotting_with_ggplot_files
    │   │   │   ├── plotting_with_ggplot_10_0.png
    │   │   │   ├── plotting_with_ggplot_12_0.png
    │   │   │   ├── plotting_with_ggplot_14_0.png
    │   │   │   ├── plotting_with_ggplot_16_0.png
    │   │   │   ├── plotting_with_ggplot_18_0.png
    │   │   │   ├── plotting_with_ggplot_21_0.png
    │   │   │   ├── plotting_with_ggplot_22_0.png
    │   │   │   ├── plotting_with_ggplot_24_0.png
    │   │   │   ├── plotting_with_ggplot_26_0.png
    │   │   │   ├── plotting_with_ggplot_28_0.png
    │   │   │   ├── plotting_with_ggplot_31_0.png
    │   │   │   ├── plotting_with_ggplot_36_0.png
    │   │   │   ├── plotting_with_ggplot_38_0.png
    │   │   │   ├── plotting_with_ggplot_39_0.png
    │   │   │   ├── plotting_with_ggplot_43_0.png
    │   │   │   ├── plotting_with_ggplot_44_0.png
    │   │   │   ├── plotting_with_ggplot_47_0.png
    │   │   │   ├── plotting_with_ggplot_48_0.png
    │   │   │   └── plotting_with_ggplot_50_0.png
    │   │   ├── loops.md
    │   │   ├── defensive_programming.md
    │   │   ├── plotting_with_ggplot.md
    │   │   └── intro.md
    │   ├── css
    │   │   └── extra.css
    │   ├── halfday.md
    │   ├── fullday.md
    │   └── index.md
    └── mkdocs.yml
├── deploy.sh
├── .gitmodules
├── Pipfile
├── .gitignore
├── scripts
    └── markdown2ipynb.py
├── LICENSE.md
└── README.md


/workshops/docs/modules/data:
--------------------------------------------------------------------------------
1 | notebooks/data


--------------------------------------------------------------------------------
/workshops/docs/modules/images:
--------------------------------------------------------------------------------
1 | notebooks/images


--------------------------------------------------------------------------------
/deploy.sh:
--------------------------------------------------------------------------------
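1 | #!/bin/bash
2 | # Build the MkDocs site and publish it to the gh-pages branch.
3 | 
4 | # Stop if the workshops directory is missing, so mkdocs never runs elsewhere.
5 | cd workshops || exit 1
6 | mkdocs gh-deploy
7 | cd ..
8 | 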


--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "themes/mkdocs-windmill"]
2 | 	path = themes/mkdocs-windmill
3 | 	url = https://github.com/gristlabs/mkdocs-windmill
4 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/plot1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/plot1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/inner-join.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/inner-join.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/left_join.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/left_join.png


--------------------------------------------------------------------------------
/workshops/docs/modules/indexing_files/indexing_74_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/indexing_files/indexing_74_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/loops_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/loops_image.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/make_female.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/make_female.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/plot_mean_weight.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/plot_mean_weight.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/plot_total_animals.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/notebooks/images/plot_total_animals.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/speciesSubset.csv:
--------------------------------------------------------------------------------
1 | "species_id","genus","species","taxa"
2 | "DM","Dipodomys","merriami","Rodent"
3 | "NL","Neotoma","albigula","Rodent"
4 | "PE","Peromyscus","eremicus","Rodent"
5 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/data/speciesSubset.csv:
--------------------------------------------------------------------------------
1 | "species_id","genus","species","taxa"
2 | "DM","Dipodomys","merriami","Rodent"
3 | "NL","Neotoma","albigula","Rodent"
4 | "PE","Peromyscus","eremicus","Rodent"
5 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/working_with_data_files/working_with_data_57_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_57_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/working_with_data_files/working_with_data_59_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_59_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/working_with_data_files/working_with_data_62_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_62_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/working_with_data_files/working_with_data_64_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_64_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/working_with_data_files/working_with_data_70_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/working_with_data_files/working_with_data_70_1.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_10_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_10_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_12_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_12_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_14_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_14_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_16_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_16_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_18_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_18_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_21_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_21_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_22_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_22_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_24_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_24_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_26_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_26_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_28_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_28_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_31_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_31_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_36_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_36_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_38_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_38_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_39_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_39_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_43_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_43_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_44_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_44_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_47_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_47_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_48_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_48_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_50_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MonashDataFluency/python-workshop-base/HEAD/workshops/docs/modules/plotting_with_ggplot_files/plotting_with_ggplot_50_0.png


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/Pipfile:
--------------------------------------------------------------------------------
 1 | [[source]]
 2 | url = "https://pypi.org/simple"
 3 | verify_ssl = true
 4 | name = "pypi"
 5 | 
 6 | [packages]
 7 | 
 8 | [dev-packages]
 9 | 
10 | [requires]
11 | python_version = "3.6"
12 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/README.md:
--------------------------------------------------------------------------------
1 | These are work-in-progress modules that are not yet rendered to Markdown or
2 | published to the main site.
3 | 
4 | Once these modules take shape, they can be moved to `workshops/docs/modules/notebooks`
5 | and integrated.
6 | 
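
Promoting a module amounts to moving its notebook up one directory. The sketch below shows that step in Python. The helper name `promote_notebook` is hypothetical and not part of this repository, and adding the rendered page to `mkdocs.yml` is assumed to remain a manual step.

```python
import shutil
from pathlib import Path


def promote_notebook(name: str, notebooks_dir: Path) -> Path:
    """Move notebooks_dir/wip/<name> up into notebooks_dir.

    Hypothetical helper for illustration; not part of this repository.
    """
    source = notebooks_dir / "wip" / name
    target = notebooks_dir / name
    if not source.is_file():
        raise FileNotFoundError(f"no such work-in-progress notebook: {source}")
    if target.exists():
        # Never silently replace a module that is already published.
        raise FileExistsError(f"refusing to overwrite existing notebook: {target}")
    shutil.move(str(source), str(target))
    return target
```

A call such as `promote_notebook("functions.ipynb", Path("workshops/docs/modules/notebooks"))` would move one of the notebooks listed under `wip` in the tree above.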


--------------------------------------------------------------------------------
/Pipfile:
--------------------------------------------------------------------------------
 1 | [[source]]
 2 | url = "https://pypi.org/simple"
 3 | verify_ssl = true
 4 | name = "pypi"
 5 | 
 6 | [packages]
 7 | mkdocs = "*"
 8 | mkdocs-windmill = "*"
 9 | mkdocs-bootswatch = "*"
10 | mkdocs-cinder = "*"
11 | mkdocs-cluster = "*"
12 | jupyter = "*"
13 | pandas = "*"
14 | numpy = "*"
15 | plotnine = "*"
16 | pyyaml = ">=4.2b1"
17 | notebook = ">=5.7.2"
18 | 
19 | [dev-packages]
20 | 
21 | [requires]
22 | python_version = "3.6"
23 | 


--------------------------------------------------------------------------------
/workshops/docs/css/extra.css:
--------------------------------------------------------------------------------
 1 | /* default boxes around code (Jupyter input and output blocks) */
 2 | pre {
 3 |     border-style: solid;
 4 |     border-width: 1px;
 5 |     border-color: rgb(204, 204, 204);
 6 | }
 7 | 
 8 | /* only left border on Jupyter output blocks */
 9 | pre.output {
10 |     border-left-style: solid;
11 |     border-top-style: none;
12 |     border-right-style: none;
13 |     border-bottom-style: none;
14 |     border-width: 1px;
15 |     border-color: #008cba;
16 | }
17 | 


--------------------------------------------------------------------------------
/workshops/docs/halfday.md:
--------------------------------------------------------------------------------
 1 | # Introduction to Python Workshop (half-day)
 2 | 
 3 | Welcome to _Introduction to Python_!
 4 | 
 5 | ## Sections
 6 | 
 7 | * 01 - [Introduction - the basics of Python](modules/intro.md)
 8 | * 02 - [Data analysis in Python with Pandas](modules/working_with_data.md)
 9 | * 03 - [Missing Values](modules/missing_values.md)
10 | * 04 - [Repetitive tasks with loops](modules/loops.md)
11 | * 05 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md)
12 | 
13 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .ipynb_checkpoints
 2 | 
 3 | # mkdocs generated site
 4 | /workshops/site
 5 | 
 6 | # Some side-effect outputs from ./build.sh we don't want
 7 | /workshops/docs/modules/notebooks/surveys.csv
 8 | /workshops/docs/modules/notebooks/speciesSubset.csv
 9 | /workshops/docs/modules/notebooks/function_surveys*.csv
10 | /workshops/docs/modules/notebooks/output/*
11 | 
12 | # Byte-compiled / optimized / DLL files
13 | __pycache__/
14 | *.py[cod]
15 | *$py.class
16 | 
17 | # C extensions
18 | *.so
19 | 
20 | 
21 | 


--------------------------------------------------------------------------------
/workshops/docs/fullday.md:
--------------------------------------------------------------------------------
 1 | # Introduction to Python Workshop
 2 | 
 3 | Welcome to _Introduction to Python_!
 4 | 
 5 | ## Sections
 6 | 
 7 | * 01 - [Introduction - the basics of Python](modules/intro.md)
 8 | * 02 - [Repetitive tasks with loops](modules/loops.md)
 9 | * 03 - [Data analysis in Python with Pandas](modules/working_with_data.md)
10 | * 04 - [Reusable and modular code with functions](modules/functions.md)
11 | * 05 - [Handling Missing Values](modules/missing_values.md)
12 | * 06 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md)
13 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/student_markdown.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'workshop_notes_markdown.tpl'%}
 2 | 
 3 | {% block any_cell %}
 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %}
 5 | {{ super() }}
 6 | {% elif 'solution' in cell['metadata'].get('tags', []) %}
 7 | <!-- {{ super() }} -->
 8 | {% elif 'instructor' in cell['metadata'].get('tags', []) %}
 9 | <!-- {{ super() }} -->
10 | {% elif 'hide' in cell['metadata'].get('tags', []) %}
11 | <!-- {{ super() }} -->
12 | {% elif 'oneday' in cell['metadata'].get('tags', []) %}
13 | <!-- {{ super() }} -->
14 | {% else %}
15 | {{ super() }}
16 | {% endif %}
17 | {% endblock any_cell %}
18 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/workshop_notes.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'full.tpl'%}
 2 | 
 3 | {%- block header -%}
 4 | {{ super() }}
 5 | 
 6 | <script src="https://unpkg.com/jquery"></script>
 7 | 
 8 | <style type="text/css">
 9 | div.output_wrapper {
10 |   margin-top: 0px;
11 | }
12 | .output_text {
13 |   max-height: 200px;
14 |   overflow-y: scroll;
15 | }
16 | 
17 | /*
18 | .text_cell .inner_cell {
19 |   background-color: #fffff6;
20 | }
21 | */
22 | 
23 | </style>
24 | 
25 | {%- endblock header -%}
26 | 
27 | {% block in_prompt -%}
28 | <div class="prompt input_prompt">
29 |   <strong>&gt;&gt;&gt;</strong>&nbsp;
30 | </div>
31 | {%- endblock in_prompt -%}
32 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/instructor_markdown.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'workshop_notes_markdown.tpl'%}
 2 | 
 3 | {% block any_cell %}
 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %}
 5 | {{ super() }}
 6 | {% elif 'solution' in cell['metadata'].get('tags', []) %}
 7 | {{ super() }}
 8 | {% elif 'instructor' in cell['metadata'].get('tags', []) %}
 9 | {{ super() }}
10 | {% elif 'hide' in cell['metadata'].get('tags', []) %}
11 | <div style="display:none">
12 | </div>
13 | {% elif 'oneday' in cell['metadata'].get('tags', []) %}
14 | <div style="display:none">
15 | </div>
16 | {% else %}
17 | {{ super() }}
18 | {% endif %}
19 | {% endblock any_cell %}
20 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/student.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'workshop_notes.tpl'%}
 2 | 
 3 | {% block any_cell %}
 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %}
 5 |     <div style="background: #ffefef; border: solid thin #ffbaba">
 6 |         {{ super() }}
 7 |     </div>
 8 | {% elif 'solution' in cell['metadata'].get('tags', []) %}
 9 |     <div style="display:none">
10 |     </div>
11 | {% elif 'instructor' in cell['metadata'].get('tags', []) %}
12 |     <div style="display:none">
13 |     </div>
14 | {% elif 'hide' in cell['metadata'].get('tags', []) %}
15 |     <div style="display:none">
16 |     </div>
17 | {% elif 'oneday' in cell['metadata'].get('tags', []) %}
18 |     <div style="display:none">
19 |     </div>
20 | {% else %}
21 |     {{ super() }}
22 | {% endif %}
23 | {% endblock any_cell %}
24 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/workshop_notes_markdown.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'markdown.tpl'%}
 2 | 
 3 | {%- block header -%}
 4 | {{ super() }}
 5 | 
 6 | <style>
 7 | .output_label {
 8 |     text-align: right;
 9 |     margin: -1em;
10 |     padding: 0;
11 |     font-size: 0.5em;
12 |     color: grey
13 | }
14 | </style>
15 | {%- endblock header -%}
16 | 
17 | {% block stream %}
18 | <pre class="output">
19 | <div class="output_label">output</div>
20 | <code class="text">
21 | {{ output.text }}
22 | </code>
23 | </pre>
24 | {% endblock stream %}
25 | 
26 | {% block data_text scoped %}
27 | <pre class="output">
28 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
29 | <code class="text">
30 | {{ output.get('data', {}).get('text/plain', '') }}
31 | </code>
32 | </pre>
33 | {% endblock data_text %}
34 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/nbconvert_templates/instructor.tpl:
--------------------------------------------------------------------------------
 1 | {% extends 'workshop_notes.tpl'%}
 2 | 
 3 | {% block any_cell %}
 4 | {% if 'challenge' in cell['metadata'].get('tags', []) %}
 5 |     <div style="background: #ffefef; border: solid thin #ffbaba">
 6 |         {{ super() }}
 7 |     </div>
 8 | {% elif 'solution' in cell['metadata'].get('tags', []) %}
 9 |     <div style="background: #efffef; border: solid thin #c6d8c6">
10 |         {{ super() }}
11 |     </div>
12 | {% elif 'instructor' in cell['metadata'].get('tags', []) %}
13 |     <div style="background: #f0f0ff; border: solid thin #bbbbff">
14 |         {{ super() }}
15 |     </div>
16 | {% elif 'hide' in cell['metadata'].get('tags', []) %}
17 |     <div style="display:none">
18 |     </div>
19 | {% elif 'oneday' in cell['metadata'].get('tags', []) %}
20 |     <div style="display:none">
21 |     </div>
22 | {% else %}
23 |     {{ super() }}
24 | {% endif %}
25 | {% endblock any_cell %}
26 | 
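
The `any_cell` blocks in `student.tpl` and `instructor.tpl` above both walk the same tag order and let the first matching tag decide how a cell is rendered. The sketch below restates that decision in plain Python. It is an illustration rather than anything nbconvert runs: the names `TAG_ORDER`, `RULES`, and `render_action` are hypothetical, and the coloured boxes are collapsed into a single `"highlight"` outcome.

```python
# Tags are checked in the same order as the if/elif chain in the templates;
# the first matching tag decides the outcome.
TAG_ORDER = ("challenge", "solution", "instructor", "hide", "oneday")

# Tags that are shown (in a coloured box) per audience; any other tag in
# TAG_ORDER hides the cell. These names are illustrative assumptions.
RULES = {
    "student": {"challenge"},
    "instructor": {"challenge", "solution", "instructor"},
}


def render_action(tags, audience):
    """Return 'highlight', 'hidden' or 'plain' for a cell's tag list."""
    shown = RULES[audience]  # raises KeyError for an unknown audience
    for tag in TAG_ORDER:
        if tag in tags:
            return "highlight" if tag in shown else "hidden"
    # Untagged cells are rendered unchanged for everyone.
    return "plain"
```

Because `challenge` is tested first, a cell tagged both `challenge` and `hide` is still shown, which matches the order of the `elif` branches in the templates.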


--------------------------------------------------------------------------------
/workshops/mkdocs.yml:
--------------------------------------------------------------------------------
 1 | ---
 2 | site_name: 'Introduction to Python Workshop'
 3 | repo_url: 'https://github.com/MonashDataFluency/python-workshop-base'
 4 | edit_uri: 'blob/master/README.md#modifying-and-building'
 5 | site_description: 'Monash Data Fluency Python Workshops'
 6 | theme: cinder
 7 | # theme: windmill
 8 | # theme: cluster
 9 | # theme: bootswatch
10 | 
11 | extra_css:
12 |   - css/extra.css
13 | 
14 | nav:
15 |   - 'Home': 'index.md'
16 |   - 'Modules':
17 |     - 'Introduction - the basics of Python': 'modules/intro.md'
18 |     - 'Working with Data': 'modules/working_with_data.md'
19 |     - 'Missing values': 'modules/missing_values.md'
20 |     - 'Indexing': 'modules/indexing.md'
21 |     - 'Loops': 'modules/loops.md'
22 |     - 'Combining DataFrames with Pandas': 'modules/merging_data.md'
23 |     - 'Plotting with ggplot for Python': 'modules/plotting_with_ggplot.md'
24 |     - 'Reusable and modular code with functions': 'modules/functions.md'
25 |     - 'Defensive Programming': 'modules/defensive_programming.md'
26 |   - 'Half Day Course': 'halfday.md'
27 |   - 'Full Day Course': 'fullday.md'
28 | 


--------------------------------------------------------------------------------
/workshops/docs/index.md:
--------------------------------------------------------------------------------
 1 | # Introduction to Python Workshop
 2 | 
 3 | Welcome to _Introduction to Python_!
 4 | 
 5 | ## Modules
 6 | 
 7 | * 01 - [Introduction - the basics of Python](modules/intro.md)
 8 | * 02 - [Data analysis in Python with Pandas](modules/working_with_data.md)
 9 | * 03 - [Indexing and slicing](modules/indexing.md)
10 | * 04 - [Missing Values](modules/missing_values.md)
11 | * 05 - [Combining DataFrames in Pandas](modules/merging_data.md)
12 | * 06 - [Repetitive tasks with loops](modules/loops.md)
13 | * 07 - [Plotting with plotnine (ggplot)](modules/plotting_with_ggplot.md)
14 | * 08 - [Reusable and modular code with functions](modules/functions.md)
15 | * 09 - [Defensive Programming](modules/defensive_programming.md)
16 | 
17 | Some of these modules have been adapted from the original versions at 
18 | [Data Carpentry - Python for Ecologists](http://www.datacarpentry.org/python-ecology-lesson/) 
19 | and [Software Carpentry - Programming with Python](https://swcarpentry.github.io/python-novice-inflammation/)
20 | (used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/)).
21 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/testing.svg:
--------------------------------------------------------------------------------
 1 | <svg xmlns="http://www.w3.org/2000/svg" width="344" height="153" font-size="14px">
 2 |   <line x1="142" x2="142" y1="2" y2="152" stroke="#c0c0c0" stroke-dasharray="3, 3" stroke-width=".5"/>
 3 |   <line x1="212" x2="212" y1="2" y2="152" stroke="#c0c0c0" stroke-dasharray="3, 3" stroke-width=".5"/>
 4 |   <line x1="24" x2="344" y1="121" y2="121" stroke="#c0c0c0" stroke-dasharray="3, 3" stroke-width=".5"/>
 5 | 
 6 |   <text x="31" y="22" text-anchor="end">&#8722;3.0</text>
 7 |   <path fill="none" stroke="#000" stroke-width="1" d="m38 16v6h279v-6"/>
 8 |   <text x="323" y="22">5.0</text>
 9 | 
10 |   <text x="136" y="57" text-anchor="end">0.0</text>
11 |   <path d="m299 51v6h-157v-6" fill="none" stroke="#000" stroke-width="1"/>
12 |   <text x="306" y="57">4.5</text>
13 | 
14 |   <text x="84" y="92" text-anchor="end">&#8722;1.5</text>
15 |   <path d="m212 86v6h-122v-6" fill="none" stroke="#000" stroke-width="1"/>
16 |   <text x="218" y="92">2.0</text>
17 | 
18 |   <text x="136" y="151" text-anchor="end">0.0</text>
19 |   <path d="m142 145v6h70v-6" fill="none" stroke="#000" stroke-width="1"/>
20 |   <text x="218" y="151">2.0</text>
21 | </svg>
22 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/more_data_structures.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 |  "cells": [
 3 |   {
 4 |    "cell_type": "markdown",
 5 |    "metadata": {},
 6 |    "source": [
 7 |     "## Sets"
 8 |    ]
 9 |   },
10 |   {
11 |    "cell_type": "code",
12 |    "execution_count": 1,
13 |    "metadata": {},
14 |    "outputs": [
15 |     {
16 |      "data": {
17 |       "text/plain": [
18 |        "{1, 2, 3, 4}"
19 |       ]
20 |      },
21 |      "execution_count": 1,
22 |      "metadata": {},
23 |      "output_type": "execute_result"
24 |     }
25 |    ],
26 |    "source": [
27 |     "unique_items = set([1, 1, 2, 2, 3, 4, 1, 2, 3, 4])\n",
28 |     "# or curly brackets\n",
29 |     "# unique_items = {1, 1, 2, 2, 3, 4, 1, 2, 3, 4}\n",
30 |     "unique_items"
31 |    ]
32 |   },
33 |   {
34 |    "cell_type": "code",
35 |    "execution_count": null,
36 |    "metadata": {},
37 |    "outputs": [],
38 |    "source": []
39 |   }
40 |  ],
41 |  "metadata": {
42 |   "kernelspec": {
43 |    "display_name": "Python 3",
44 |    "language": "python",
45 |    "name": "python3"
46 |   },
47 |   "language_info": {
48 |    "codemirror_mode": {
49 |     "name": "ipython",
50 |     "version": 3
51 |    },
52 |    "file_extension": ".py",
53 |    "mimetype": "text/x-python",
54 |    "name": "python",
55 |    "nbconvert_exporter": "python",
56 |    "pygments_lexer": "ipython3",
57 |    "version": "3.6.3"
58 |   }
59 |  },
60 |  "nbformat": 4,
61 |  "nbformat_minor": 2
62 | }
63 | 


--------------------------------------------------------------------------------
/scripts/markdown2ipynb.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Hacky script to take Markdown with ```python code blocks
 3 | # and convert to an ipynb format Jupyter notebook.
 4 | # The result will almost always need hand editing in Jupyter after
 5 | # conversion.
 6 | #
 7 | # Usage:
 8 | #
 9 | # python markdown2ipynb.py some_markdown.md >a_notebook.ipynb
10 | 
11 | import sys
12 | import json
13 | 
14 | with open(sys.argv[1], 'r') as f:
15 |     content = f.readlines()
16 | 
17 | metadata = {
18 |     "metadata": {
19 |         "celltoolbar": "Tags",
20 |         "kernelspec": {
21 |             "display_name": "Python 3",
22 |             "language": "python",
23 |             "name": "python3"
24 |         },
25 |         "language_info": {
26 |             "codemirror_mode": {
27 |                 "name": "ipython",
28 |                 "version": 3
29 |             },
30 |             "file_extension": ".py",
31 |             "mimetype": "text/x-python",
32 |             "name": "python",
33 |             "nbconvert_exporter": "python",
34 |             "pygments_lexer": "ipython3",
35 |             "version": "3.6.3"
36 |         }
37 |     },
38 |     "nbformat": 4,
39 |     "nbformat_minor": 2
40 | }
41 | 
42 | cells = {"cells": []}
43 | split_at_markdown_h2 = True
44 | in_python_block = False
45 | source = []
46 | for line in content:
47 |     if line.startswith("```python"):
48 |         in_python_block = True
49 |         # source_lines = ["%s\n" % l for l in source]
50 |         cells['cells'].append(
51 |             {"cell_type": "markdown",
52 |              "metadata": {},
53 |              "source": list(source)})
54 |         source = []
55 |         continue
56 | 
57 |     if in_python_block and line.startswith("```"):
58 |         in_python_block = False
59 |         cells['cells'].append(
60 |             {"cell_type": "code",
61 |              "metadata": {},
62 |              "outputs": [],
63 |              "execution_count": 0,
64 |              "source": list(source)})
65 |         source = []
66 |         continue
67 | 
68 |     if not in_python_block:
69 |         if split_at_markdown_h2 and line.startswith("## "):
70 |             cells['cells'].append(
71 |                 {"cell_type": "markdown",
72 |                  "metadata": {},
73 |                  "source": list(source)})
74 |             source = []
75 |             source.append(line)
76 |             continue
77 | 
78 |     source.append(line)
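79 | 
80 | # Flush any lines left over after the last code block or heading;
81 | # without this, trailing Markdown would be silently dropped.
82 | if source:
83 |     cells['cells'].append(
84 |         {"cell_type": "markdown",
85 |          "metadata": {},
86 |          "source": list(source)})
87 | 
88 | cells.update(metadata)
89 | print(json.dumps(cells, indent=2))
90 | 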


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/slicing-indexing.svg:
--------------------------------------------------------------------------------
 1 | <svg width="585" height="163" xmlns="http://www.w3.org/2000/svg">
 2 |  <!-- Created with Method Draw - http://github.com/duopixel/Method-Draw/ -->
 3 | 
 4 |  <g>
 5 |   <title>background</title>
 6 |   <rect x="-1" y="-1" width="587" height="165" id="canvas_background" fill="#fff"/>
 7 |   <g id="canvasGrid" display="none">
 8 |    <rect id="svg_3" width="100%" height="100%" x="0" y="0" stroke-width="0" fill="url(#gridpattern)"/>
 9 |   </g>
10 |  </g>
11 |  <g>
12 |   <title>Layer 1</title>
13 |   <rect stroke-width="0" id="svg_5" height="76" width="77" y="30.5" x="400.5" stroke="#000" fill="#A0D58A"/>
14 |   <text fill="#000000" stroke-width="0" x="505.603666" y="238.82258" id="svg_1" font-size="24" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" transform="matrix(20.18501091003418,0,0,1.8235293626785278,-9963.615509808064,-210.41175216436386) " stroke="#000"/>
15 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="34" y="89.5" id="svg_2" font-size="48" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" font-weight="normal" stroke="#000">grades = [88, 72, 93, 94]</text>
16 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="271.148342" y="135.5" id="svg_4" font-size="48" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" transform="matrix(9.393442153930664,0,0,1,-2283.0162658691406,0) " stroke="#000"/>
17 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="273" y="47.5" id="svg_8" font-size="18" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">0      1              3</text>
18 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="143" y="25.5" id="svg_10" font-size="18" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve">indexing: getting a specific element</text>
19 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="247.443099" y="332.5" id="svg_12" font-size="48" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve" transform="matrix(6.77049160003662,0,0,1,-1590.3114197850227,-47) " stroke="#000"/>
20 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="38" y="125.5" id="svg_17" font-size="20" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">&gt;&gt;&gt; grades[2]</text>
21 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="38" y="152.5" id="svg_18" font-size="20" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">93</text>
22 |   <text font-weight="bold" xml:space="preserve" text-anchor="start" font-family="'Courier New', Courier, monospace" font-size="18" id="svg_11" y="48" x="433.5" fill-opacity="null" stroke-opacity="null" stroke-width="0" stroke="#000" fill="#000000">2</text>
23 |  </g>
24 | </svg>


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/images/slicing-slicing.svg:
--------------------------------------------------------------------------------
 1 | <svg width="582" height="163" xmlns="http://www.w3.org/2000/svg">
 2 |  <!-- Created with Method Draw - http://github.com/duopixel/Method-Draw/ -->
 3 | 
 4 |  <g>
 5 |   <title>background</title>
 6 |   <rect x="-1" y="-1" width="584" height="165" id="canvas_background" fill="#fff"/>
 7 |   <g id="canvasGrid" display="none">
 8 |    <rect id="svg_3" width="100%" height="100%" x="0" y="0" stroke-width="0" fill="url(#gridpattern)"/>
 9 |   </g>
10 |  </g>
11 |  <g>
12 |   <title>Layer 1</title>
13 |   <rect stroke-width="0" stroke="#000" id="svg_9" height="68.999994" width="172.999984" y="35.500006" x="314.500012" stroke-opacity="null" fill="#7AD6CA"/>
14 |   <text fill="#000000" stroke-width="0" x="505.900917" y="247.596774" id="svg_1" font-size="24" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" transform="matrix(20.18501091003418,0,0,1.8235293626785278,-9963.615509808064,-210.41175216436386) " stroke="#000"/>
15 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="35" y="74.5" id="svg_2" font-size="48" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" font-weight="normal" stroke="#000">grades = [88, 72, 93, 94]</text>
16 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="271.787086" y="151.5" id="svg_4" font-size="48" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve" transform="matrix(9.393442153930664,0,0,1,-2283.0162658691406,0) " stroke="#000"/>
17 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="240" y="102" id="svg_6" font-size="18" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">0             2             4</text>
18 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="143" y="26" id="svg_11" font-size="18" font-family="Helvetica, Arial, sans-serif" text-anchor="start" xml:space="preserve">slicing: selecting a set of elements</text>
19 |   <text fill="#000000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="247.1477" y="301.5" id="svg_12" font-size="48" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve" transform="matrix(6.77049160003662,0,0,1,-1582.3114197850227,0) " stroke="#000"/>
20 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="39" y="120.5" id="svg_13" font-size="20" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">&gt;&gt;&gt; grades[1:3]</text>
21 |   <text fill="#000000" stroke="#000" stroke-width="0" stroke-opacity="null" fill-opacity="null" x="39" y="145.5" id="svg_14" font-size="20" font-family="'Courier New', Courier, monospace" text-anchor="start" xml:space="preserve">[72, 93]</text>
22 |   <text font-weight="bold" xml:space="preserve" text-anchor="start" font-family="'Courier New', Courier, monospace" font-size="18" id="svg_10" y="102" x="317.5" fill-opacity="null" stroke-opacity="null" stroke-width="0" stroke="#000" fill="#000000">1</text>
23 |   <text font-weight="bold" xml:space="preserve" text-anchor="start" font-family="'Courier New', Courier, monospace" font-size="18" id="svg_15" y="102" x="475.5" fill-opacity="null" stroke-opacity="null" stroke-width="0" stroke="#000" fill="#000000">3</text>
24 |  </g>
25 | </svg>


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | ## Instructional Material
 2 | 
 3 | This workshop material is made available under a [Creative Commons Attribution license (CC-BY 4.0)][cc-by-human].
 4 | 
 5 | Parts of this content have been adapted from the [Data Carpentry "Python for ecologists"](http://www.datacarpentry.org/python-ecology-lesson/) workshop material, used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode).
 6 | 
 7 | The following is a human-readable summary of (and not a substitute for) the [full legal text of the CC BY 4.0
 8 | license][cc-by-legal].
 9 | 
10 | You are free:
11 | 
12 | * to **Share**---copy and redistribute the material in any medium or format
13 | * to **Adapt**---remix, transform, and build upon the material
14 | 
15 | for any purpose, even commercially.
16 | 
17 | The licensor cannot revoke these freedoms as long as you follow the
18 | license terms.
19 | 
20 | Under the following terms:
21 | 
22 | * **Attribution**---You must give appropriate credit (mentioning that
23 |   your work is derived from work that is Copyright © Software
24 |   Carpentry and Monash Data Fluency, where practical, linking to
25 |   http://software-carpentry.org/ and https://github.com/MonashDataFluency), 
26 |   provide a [link to the license][cc-by-human], 
27 |   and indicate if changes were made. You may do
28 |   so in any reasonable manner, but not in any way that suggests the
29 |   licensor endorses you or your use.
30 | 
31 | **No additional restrictions**---You may not apply legal terms or
32 | technological measures that legally restrict others from doing
33 | anything the license permits.
34 | 
35 | Notices:
36 | 
37 | * You do not have to comply with the license for elements of the
38 |   material in the public domain or where your use is permitted by an
39 |   applicable exception or limitation.
40 | * No warranties are given. The license may not give you all of the
41 |   permissions necessary for your intended use. For example, other
42 |   rights such as publicity, privacy, or moral rights may limit how you
43 |   use the material.
44 | 
45 | ## Software
46 | 
47 | [The MIT License (MIT)][mit-license]
48 | 
49 | Permission is hereby granted, free of charge, to any person obtaining
50 | a copy of this software and associated documentation files (the
51 | "Software"), to deal in the Software without restriction, including
52 | without limitation the rights to use, copy, modify, merge, publish,
53 | distribute, sublicense, and/or sell copies of the Software, and to
54 | permit persons to whom the Software is furnished to do so, subject to
55 | the following conditions:
56 | 
57 | The above copyright notice and this permission notice shall be
58 | included in all copies or substantial portions of the Software.
59 | 
60 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
61 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
62 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
63 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
64 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
65 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
66 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
67 | 
68 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/
69 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
70 | [mit-license]: http://opensource.org/licenses/mit-license.html
71 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Python workshops base
  2 | 
  3 | This is a base repository for Data Fluency Python Workshop modules.
  4 | 
  5 | To add or modify content, edit the notebooks in 
  6 | `workshops/docs/modules/notebooks`.
  7 | 
  8 | ## Quick start
  9 | ```bash
 10 | # Install pipenv to ~/.local/bin/pipenv
 11 | pip install --user pipenv
 12 | 
 13 | git clone https://github.com/MonashDataFluency/python-workshop-base.git
 14 | cd python-workshop-base
 15 | 
 16 | # Install dependencies
 17 | pipenv install
 18 | 
 19 | # Enter the virtual environment
 20 | pipenv shell
 21 | jupyter notebook
 22 | # Edit the notebooks in workshops/docs/modules/notebooks
 23 | # Ctrl-C in terminal to stop Jupyter when you are done
 24 | ./build.sh
 25 | 
 26 | # To view the generated site
 27 | cd workshops
 28 | open http://127.0.0.1:8000 && mkdocs serve
 29 | ```
 30 | 
 31 | If everything looks fine, commit your changes (ideally to a branch), `git push` and send a Pull Request.
 32 | 
 33 | To deploy the public docs, [see here](#deploying-the-static-site-to-github-pages).
 34 | 
 35 | ----
 36 | 
 37 | ## Setup
 38 | 
 39 | Install [Pipenv](https://docs.pipenv.org/) (e.g. `pip install pipenv`).
 40 | 
 41 | Run:
 42 | 
 43 | ```bash
 44 | pipenv install
 45 | ```
 46 | 
 47 | You can enter the virtualenv with `pipenv shell`, or run single commands in the 
 48 | environment of the virtualenv with `pipenv run`.
 49 | 
 50 | ## Modifying and building
 51 | 
 52 | Workshop modules can be found in `workshops/docs/modules/notebooks`.
 53 | 
 54 | To edit and update a module:
 55 | * edit the Jupyter Notebook, following the required [conventions](#jupyter-notebook-conventions).
 56 | * ensure your code runs
 57 | * save the notebook
 58 | * **stop the kernel for the notebook**
 59 | 
 60 | Then run:
 61 | 
 62 | ```bash
 63 | # Export the notebooks, build the docs
 64 | pipenv run ./build.sh
 65 | ```
 66 | 
 67 | This script runs `jupyter nbconvert` to generate Markdown from the notebooks, 
 68 | then runs `mkdocs build` to generate the static HTML.
 69 | 
 70 | New modules should be listed in `workshops/mkdocs.yml`, `workshops/docs/index.md` 
 71 | and possibly `workshops/docs/fullday.md` and/or `workshops/docs/halfday.md` if they form part of the 
 72 | full or half day workshops.
 73 | 
 74 | ### Jupyter notebook conventions
 75 | 
 76 | The intention of developing the workshop materials directly from Jupyter notebooks is to:
 77 | 
 78 | 1. Ensure code examples run correctly and catch errors early.
 79 | 2. Make each module a self-contained unit, including pulling in dependencies.
 80 | 3. Enable generation of student and instructor notes from a single source.
 81 | 
 82 | Here are some conventions to follow to achieve this:
 83 | 
 84 | * **Cell tagging**: challenges should be tagged `challenge` and **solutions should be tagged** `solution`.
 85 |   The `nbconvert` templates hide cells tagged `solution` from the main student notes,
 86 |   but output them for instructor notes. Currently (May-2018) only `jupyter notebook` 
 87 |   allows editing cell tags - the required UI for `jupyter lab` hasn't been completed yet.
 88 | * **Package dependencies**: Include a `!pip install somepackage` cell near the start of every module
 89 |   that installs any required dependencies. This makes the modules work as standalone units in a range 
 90 |   of environments (local Jupyter or IPython REPL, Azure Notebooks, Colaboratory, PythonAnywhere).
 91 | * **Acquire data via URLs in the notebook**: Include cells like `import urllib.request; urllib.request.urlretrieve("https://files.rcsb.org/download/3FPR.pdb", "3FPR.pdb")` to download external data.
 92 |   This allows the notes to be used in various hosted or local Jupyter environments 
 93 |   (it's also a useful operation for students to learn).
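
To illustrate the cell tagging convention above, here is a minimal sketch of how cells tagged `solution` could be filtered out of a notebook. It is not the project's `nbconvert` template logic; it only assumes that tags are stored in each cell's `metadata.tags` list, as in the notebook JSON format. The function name `cells_without_tag` and the in-memory `example_notebook` are hypothetical, made up for this example.

```python
def cells_without_tag(notebook, tag):
    """Return the cells of a notebook dict that do not carry `tag`."""
    return [
        cell for cell in notebook.get("cells", [])
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]


# A tiny in-memory notebook standing in for a real .ipynb file (hypothetical data)
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"tags": ["challenge"]},
         "source": ["Write a loop that sums a list."]},
        {"cell_type": "code", "metadata": {"tags": ["solution"]},
         "source": ["total = sum([1, 2, 3])"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hello')"]},
    ]
}

# Student notes keep the challenge and the untagged cell, but not the solution
student_cells = cells_without_tag(example_notebook, "solution")
print(len(student_cells))
```

For a real notebook, the dict could be loaded with `json.load` from the `.ipynb` file; instructor notes would simply keep every cell.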
 94 | 
 95 | ## Viewing the generated site
 96 | 
 97 | You can view the site locally via:
 98 | 
 99 | ```bash
100 | pipenv shell
101 | cd workshops
102 | mkdocs serve
103 | 
104 | # or, run
105 | # pipenv run bash -c "cd workshops; mkdocs serve"
106 | ```
107 | 
108 | Go to [http://127.0.0.1:8000](http://127.0.0.1:8000)
109 | 
110 | ## Deploying the static site to GitHub Pages
111 | 
112 | To update the site at https://MonashDataFluency.github.io/python-workshop-base/, run:
113 | 
114 | ```bash
115 | pipenv run ./deploy.sh
116 | ```
117 | 
118 | ## License
119 | 
120 | This workshop material is made available under a 
121 | [Creative Commons Attribution license (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/legalcode).
122 | 
123 | Parts of this content have been adapted from the 
124 | [Data Carpentry "Python for ecologists"](http://www.datacarpentry.org/python-ecology-lesson/) 
125 | workshop material, used under a [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode).
126 | 
127 | Code is made available under the 
128 | [MIT License](http://opensource.org/licenses/mit-license.html).
129 | 
130 | See [LICENSE.md](LICENSE.md) for the full text.
131 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/functions.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {
  6 |     "slideshow": {
  7 |      "slide_type": "slide"
  8 |     }
  9 |    },
 10 |    "source": [
 11 |     "## Functions"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "markdown",
 16 |    "metadata": {},
 17 |    "source": [
 18 |     "Functions wrap up reusable pieces of code - the *DRY* (Don't Repeat Yourself) principle\n",
 19 |     "\n",
 20 |     "Significant whitespace: the body of the function is indicated by indenting by 4 spaces\n",
 21 |     "\n",
 22 |     "*(We also use these indented blocks for if/else, for and while statements .. later !)*\n",
 23 |     "\n",
 24 |     "`return` statements immediately return a value (or `None` if no value is given)\n",
 25 |     "\n",
 26 |     "Any code in the function after the `return` statement does not get executed."
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "code",
 31 |    "execution_count": 60,
 32 |    "metadata": {
 33 |     "slideshow": {
 34 |      "slide_type": "subslide"
 35 |     }
 36 |    },
 37 |    "outputs": [
 38 |     {
 39 |      "name": "stdout",
 40 |      "output_type": "stream",
 41 |      "text": [
 42 |       "256 python-esque\n"
 43 |      ]
 44 |     }
 45 |    ],
 46 |    "source": [
 47 |     "def square(x):\n",
 48 |     "    return x**2\n",
 49 |     "\n",
 50 |     "def hyphenate(a, b):\n",
 51 |     "    return a + '-' + b\n",
 52 |     "    print(\"We will never get here\")\n",
 53 |     "\n",
 54 |     "print(square(16), hyphenate('python', 'esque'))"
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "markdown",
 59 |    "metadata": {
 60 |     "slideshow": {
 61 |      "slide_type": "subslide"
 62 |     }
 63 |    },
 64 |    "source": [
 65 |     "### Indentation and whitespace\n",
 66 |     "\n",
 67 |     "* Python uses spaces at the start of a line to indicate a 'block' of code.\n",
 68 |     "* A new block of code should be indented by four spaces.\n",
 69 |     "\n",
 70 |     "* For a function, all the indented code is part of the function.\n",
 71 |     "* (This also applies to loops like `for` and `while` and conditionals like `if`)\n",
 72 |     "\n",
 73 |     "(Indenting/dedenting by four spaces in Python is equivalent to opening **{** and closing **}** curly brackets in languages like Java, JavaScript, C, C++, C# etc.)\n",
 74 |     "\n",
 75 |     "(Python actually allows you to indent by any number of spaces as long as you are consistent within each block. The official Python style guide, PEP 8 (https://www.python.org/dev/peps/pep-0008/), prefers four spaces, and most Python code you'll find follows that convention, so you should too. You can even use tab characters, but please, please, pretty please don't do that.)"
 76 |    ]
 77 |   },
 78 |   {
 79 |    "cell_type": "code",
 80 |    "execution_count": 61,
 81 |    "metadata": {
 82 |     "slideshow": {
 83 |      "slide_type": "slide"
 84 |     }
 85 |    },
 86 |    "outputs": [
 87 |     {
 88 |      "name": "stdout",
 89 |      "output_type": "stream",
 90 |      "text": [
 91 |       "4 6 9\n"
 92 |      ]
 93 |     }
 94 |    ],
 95 |    "source": [
 96 |     "# Functions can return multiple values (just return a tuple and unpack it)\n",
 97 |     "def lengths(a, b, c):\n",
 98 |     "    return len(a), len(b), len(c)\n",
 99 |     "\n",
100 |     "x, y, z = lengths(\"long\", \"longer\", \"LONGEREST\")\n",
101 |     "print(x, y, z)"
102 |    ]
103 |   },
104 |   {
105 |    "cell_type": "code",
106 |    "execution_count": 62,
107 |    "metadata": {
108 |     "slideshow": {
109 |      "slide_type": "slide"
110 |     },
111 |     "tags": [
112 |      "biosummer"
113 |     ]
114 |    },
115 |    "outputs": [
116 |     {
117 |      "data": {
118 |       "text/plain": [
119 |        "['MIL', 'GROGDRIN', 'PINEAPPLE']"
120 |       ]
121 |      },
122 |      "execution_count": 62,
123 |      "metadata": {},
124 |      "output_type": "execute_result"
125 |     }
126 |    ],
127 |    "source": [
128 |     "def split_at(seq, residue='K'):\n",
129 |     "    \"\"\"\n",
130 |     "    Takes a protein sequence (as a string) and splits it at each K residue,\n",
131 |     "    or the residue specified in the `residue` keyword argument. Split point\n",
132 |     "    residue is discarded.\n",
133 |     "    \n",
134 |     "    Returns a list of strings.\n",
135 |     "    \"\"\"\n",
136 |     "    return seq.split(residue)\n",
137 |     "\n",
138 |     "split_at('MILKGROGDRINKPINEAPPLE')"
139 |    ]
140 |   },
141 |   {
142 |    "cell_type": "code",
143 |    "execution_count": 63,
144 |    "metadata": {
145 |     "slideshow": {
146 |      "slide_type": "slide"
147 |     }
148 |    },
149 |    "outputs": [],
150 |    "source": [
151 |     "# Functions can have an indeterminate number of arguments and keyword arguments using * and **\n",
152 |     "import math\n",
153 |     "\n",
154 |     "def vector_magnitude(x, y, *args, **kwargs):\n",
155 |     "    \n",
156 |     "    # print(args)    # args is a tuple\n",
157 |     "    # print(kwargs)  # kwargs is a dictionary\n",
158 |     "    \n",
159 |     "    scale = kwargs.get('scale', 1)\n",
160 |     "    \n",
161 |     "    vector = [x,y] + list(args)\n",
162 |     "    return math.sqrt(sum(v**2 for v in vector)) * scale"
163 |    ]
164 |   },
165 |   {
166 |    "cell_type": "code",
167 |    "execution_count": 64,
168 |    "metadata": {},
169 |    "outputs": [
170 |     {
171 |      "name": "stdout",
172 |      "output_type": "stream",
173 |      "text": [
174 |       "9.219544457292887\n"
175 |      ]
176 |     }
177 |    ],
178 |    "source": [
179 |     "print(vector_magnitude(1, 2, 4, 8, m=2))"
180 |    ]
181 |   }
182 |  ],
183 |  "metadata": {
184 |   "celltoolbar": "Tags",
185 |   "kernelspec": {
186 |    "display_name": "Python 3",
187 |    "language": "python",
188 |    "name": "python3"
189 |   },
190 |   "language_info": {
191 |    "codemirror_mode": {
192 |     "name": "ipython",
193 |     "version": 3
194 |    },
195 |    "file_extension": ".py",
196 |    "mimetype": "text/x-python",
197 |    "name": "python",
198 |    "nbconvert_exporter": "python",
199 |    "pygments_lexer": "ipython3",
200 |    "version": "3.6.3"
201 |   }
202 |  },
203 |  "nbformat": 4,
204 |  "nbformat_minor": 2
205 | }
206 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/conditionals.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {
  6 |     "slideshow": {
  7 |      "slide_type": "slide"
  8 |     }
  9 |    },
 10 |    "source": [
 11 |     "## Conditionals"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "code",
 16 |    "execution_count": 65,
 17 |    "metadata": {},
 18 |    "outputs": [
 19 |     {
 20 |      "data": {
 21 |       "text/plain": [
 22 |        "True"
 23 |       ]
 24 |      },
 25 |      "execution_count": 65,
 26 |      "metadata": {},
 27 |      "output_type": "execute_result"
 28 |     }
 29 |    ],
 30 |    "source": [
 31 |     "a = 10\n",
 32 |     "b = 0\n",
 33 |     "a > 1"
 34 |    ]
 35 |   },
 36 |   {
 37 |    "cell_type": "code",
 38 |    "execution_count": 66,
 39 |    "metadata": {
 40 |     "slideshow": {
 41 |      "slide_type": "subslide"
 42 |     }
 43 |    },
 44 |    "outputs": [
 45 |     {
 46 |      "name": "stdout",
 47 |      "output_type": "stream",
 48 |      "text": [
 49 |       "a is greater than one\n"
 50 |      ]
 51 |     }
 52 |    ],
 53 |    "source": [
 54 |     "if a > 1:\n",
 55 |     "    print(\"a is greater than one\")"
 56 |    ]
 57 |   },
 58 |   {
 59 |    "cell_type": "code",
 60 |    "execution_count": 67,
 61 |    "metadata": {
 62 |     "slideshow": {
 63 |      "slide_type": "subslide"
 64 |     }
 65 |    },
 66 |    "outputs": [
 67 |     {
 68 |      "name": "stdout",
 69 |      "output_type": "stream",
 70 |      "text": [
 71 |       "Bird is the word.\n",
 72 |       "The word is not girt.\n"
 73 |      ]
 74 |     }
 75 |    ],
 76 |    "source": [
 77 |     "word = 'Bird'\n",
 78 |     "\n",
 79 |     "# Note: Double equals for a conditional vs single equals for assignment !\n",
 80 |     "if word == 'Bird':\n",
 81 |     "    print('Bird is the word.')\n",
 82 |     "    \n",
 83 |     "if word != 'Girt':\n",
 84 |     "    print('The word is not girt.')"
 85 |    ]
 86 |   },
 87 |   {
 88 |    "cell_type": "code",
 89 |    "execution_count": 68,
 90 |    "metadata": {
 91 |     "slideshow": {
 92 |      "slide_type": "subslide"
 93 |     }
 94 |    },
 95 |    "outputs": [
 96 |     {
 97 |      "name": "stdout",
 98 |      "output_type": "stream",
 99 |      "text": [
100 |       "'ird' is in Bird.\n",
101 |       "'i' is in letters.\n"
102 |      ]
103 |     }
104 |    ],
105 |    "source": [
106 |     "if 'ird' in word:\n",
107 |     "    print(\"'ird' is in Bird.\")\n",
108 |     "    \n",
109 |     "letters = ['B', 'i', 'r', 'd']\n",
110 |     "if 'i' in letters:\n",
111 |     "    print(\"'i' is in letters.\")"
112 |    ]
113 |   },
114 |   {
115 |    "cell_type": "markdown",
116 |    "metadata": {
117 |     "slideshow": {
118 |      "slide_type": "subslide"
119 |     }
120 |    },
121 |    "source": [
122 |     "*Protip*: Long lines can be split across two or more using a backslash ('\\')\n",
123 |     "\n",
124 |     "This can make your code more readable.\n",
125 |     "\n",
126 |     "There should be nothing after the backslash, including whitespace.\n",
127 |     "\n",
128 |     "Try to keep lines to at most 79 characters for a PEP-8 style bonus."
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "code",
133 |    "execution_count": 69,
134 |    "metadata": {
135 |     "slideshow": {
136 |      "slide_type": "subslide"
137 |     }
138 |    },
139 |    "outputs": [
140 |     {
141 |      "name": "stdout",
142 |      "output_type": "stream",
143 |      "text": [
144 |       "There is no 'I' in team (or TEAM).\n"
145 |      ]
146 |     }
147 |    ],
148 |    "source": [
149 |     "if 'I' not in 'team' and \\\n",
150 |     "   'I' not in 'TEAM':\n",
151 |     "    print(\"There is no 'I' in team (or TEAM).\")"
152 |    ]
153 |   },
154 |   {
155 |    "cell_type": "code",
156 |    "execution_count": 70,
157 |    "metadata": {
158 |     "slideshow": {
159 |      "slide_type": "subslide"
160 |     }
161 |    },
162 |    "outputs": [
163 |     {
164 |      "data": {
165 |       "text/plain": [
166 |        "True"
167 |       ]
168 |      },
169 |      "execution_count": 70,
170 |      "metadata": {},
171 |      "output_type": "execute_result"
172 |     }
173 |    ],
174 |    "source": [
175 |     "# Boolean logic\n",
176 |     "# True and True => True\n",
177 |     "a > 1 and b <= 0"
178 |    ]
179 |   },
180 |   {
181 |    "cell_type": "code",
182 |    "execution_count": 71,
183 |    "metadata": {},
184 |    "outputs": [
185 |     {
186 |      "data": {
187 |       "text/plain": [
188 |        "True"
189 |       ]
190 |      },
191 |      "execution_count": 71,
192 |      "metadata": {},
193 |      "output_type": "execute_result"
194 |     }
195 |    ],
196 |    "source": [
197 |     "# True or False => True\n",
198 |     "a > 1 or b > 1"
199 |    ]
200 |   },
201 |   {
202 |    "cell_type": "code",
203 |    "execution_count": 72,
204 |    "metadata": {
205 |     "slideshow": {
206 |      "slide_type": "subslide"
207 |     }
208 |    },
209 |    "outputs": [
210 |     {
211 |      "name": "stdout",
212 |      "output_type": "stream",
213 |      "text": [
214 |       "a is less than fifty\n"
215 |      ]
216 |     }
217 |    ],
218 |    "source": [
219 |     "if a > 100:\n",
220 |     "    print(\"a is greater than one hundred\")\n",
221 |     "elif a > 50:\n",
222 |     "    print(\"a is greater than fifty but less than one hundred\")\n",
223 |     "else:\n",
224 |     "    print(\"a is less than fifty\")\n",
225 |     "    \n",
226 |     "# For better or worse, Python had no case/switch statement before 3.10 added `match` - you just use if/elif/elif/else"
227 |    ]
228 |   },
229 |   {
230 |    "cell_type": "code",
231 |    "execution_count": 73,
232 |    "metadata": {
233 |     "slideshow": {
234 |      "slide_type": "subslide"
235 |     }
236 |    },
237 |    "outputs": [
238 |     {
239 |      "name": "stdout",
240 |      "output_type": "stream",
241 |      "text": [
242 |       "A non-zero int is truthy\n",
243 |       "The int 0 is 'falsey' ... not False => True !\n",
244 |       "A non-empty string, even whitespace, is 'truthy'\n"
245 |      ]
246 |     }
247 |    ],
248 |    "source": [
249 |     "# Truthiness\n",
250 |     "if a:\n",
251 |     "    print(\"A non-zero int is truthy\")\n",
252 |     "\n",
253 |     "if not (a - 10):\n",
254 |     "    print(\"The int 0 is 'falsey' ... not False => True !\")\n",
255 |     "\n",
256 |     "if '' or [] or () or dict():\n",
257 |     "    print(\"We will never see this since an empty string, list, tuple and dict are all 'falsey'\")\n",
258 |     "    \n",
259 |     "if \"    \":\n",
260 |     "    print(\"A non-empty string, even whitespace, is 'truthy'\")"
261 |    ]
262 |   }
263 |  ],
264 |  "metadata": {
265 |   "celltoolbar": "Tags",
266 |   "kernelspec": {
267 |    "display_name": "Python 3",
268 |    "language": "python",
269 |    "name": "python3"
270 |   },
271 |   "language_info": {
272 |    "codemirror_mode": {
273 |     "name": "ipython",
274 |     "version": 3
275 |    },
276 |    "file_extension": ".py",
277 |    "mimetype": "text/x-python",
278 |    "name": "python",
279 |    "nbconvert_exporter": "python",
280 |    "pygments_lexer": "ipython3",
281 |    "version": "3.6.3"
282 |   }
283 |  },
284 |  "nbformat": 4,
285 |  "nbformat_minor": 2
286 | }
287 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/loops.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | <style>
  4 | .output_label {
  5 |     text-align: right;
  6 |     margin: -1em;
  7 |     padding: 0;
  8 |     font-size: 0.5em;
  9 |     color: grey
 10 | }
 11 | </style>
 12 | 
 13 | 
 14 | # Automation with Loops
 15 | 
 16 | 
 17 | 
 18 | 
 19 | <!-- 
 20 | ## Instructor notes
 21 | 
 22 | *Estimated teaching time:* 30 min
 23 | 
 24 | *Estimated challenge time:* 0 min
 25 | 
 26 | *Key questions:*
 27 | 
 28 |   - "How can I do the same operations on many different values?"
 29 |     
 30 | *Learning objectives:*
 31 | 
 32 |   - "Explain what a `for` loop does."
 33 |   - "Correctly write `for` loops to repeat simple calculations."
 34 |   - "Trace changes to a loop variable as the loop runs."
 35 |   - "Trace changes to other variables as they are updated by a `for` loop."
 36 | 
 37 | *Key points:*
 38 | 
 39 |   - "Use `for variable in sequence` to process the elements of a sequence one at a time."
 40 |   - "The body of a `for` loop must be indented."
 41 |   - "Use `len(thing)` to determine the length of something that contains other values."
 42 | 
 43 | ---
 44 |  -->
 45 | 
 46 | 
 47 | 
 48 | 
 49 | An example task that we might want to repeat is printing each character in a
 50 | word on a line of its own.
 51 | 
 52 | 
 53 | 
 54 | 
 55 | 
 56 | 
 57 | ```python
 58 | word = 'lead'
 59 | ```
 60 | 
 61 | 
 62 | 
 63 | 
 64 | 
 65 | We can access a character in a string using its index, counting from zero. For example, we can get the first
 66 | character of the word `'lead'` by using `word[0]`. One way to print each character is to use
 67 | four `print` statements:
 68 | 
 69 | 
 70 | 
 71 | 
 72 | 
 73 | 
 74 | ```python
 75 | print(word[0])
 76 | print(word[1])
 77 | print(word[2])
 78 | print(word[3])
 79 | ```
 80 | 
 81 | <pre class="output">
 82 | <div class="output_label">output</div>
 83 | <code class="text">
 84 | l
 85 | e
 86 | a
 87 | d
 88 | 
 89 | </code>
 90 | </pre>
 91 | 
 92 | 
 93 | 
 94 | 
 95 | 
 96 | While this works, it's a bad approach for two reasons:
 97 | 
 98 | 1. It doesn't scale:
 99 |    if we want to print the characters in a string that's hundreds of letters long,
100 |    we'd be better off just typing them in.
101 | 
102 | 2. It's fragile:
103 |    if we give it a longer string,
104 |    it only prints part of the data,
105 |    and if we give it a shorter one,
106 |    it produces an error because we're asking for characters that don't exist.
107 | 
108 | 
109 | 
110 | 
111 | 
112 | 
113 | 
114 | Running:
115 | 
116 | ```python
117 | word = 'tin'
118 | print(word[0])
119 | print(word[1])
120 | print(word[2])
121 | print(word[3])
122 | ```
123 | 
124 | 
125 | 
126 | 
127 | 
128 | Gives the error:
129 | 
130 | ```
131 | ---------------------------------------------------------------------------
132 | IndexError                                Traceback (most recent call last)
133 | <ipython-input-4-e59d5eac5430> in <module>()
134 |       3 print(word[1])
135 |       4 print(word[2])
136 | ----> 5 print(word[3])
137 | 
138 | IndexError: string index out of range
139 | ```
140 | 
141 | 
142 | 
143 | 
144 | 
145 | 
146 | 
147 | Here's a better approach:
148 | 
149 | 
150 | 
151 | 
152 | 
153 | 
154 | 
155 | 
156 | ```python
157 | word = 'lead'
158 | for char in word:
159 |     print(char)
160 | ```
161 | 
162 | <pre class="output">
163 | <div class="output_label">output</div>
164 | <code class="text">
165 | l
166 | e
167 | a
168 | d
169 | 
170 | </code>
171 | </pre>
172 | 
173 | 
174 | 
175 | 
176 | 
177 | This is shorter --- certainly shorter than something that prints every character in a hundred-letter string --- and
178 | more robust as well:
179 | 
180 | 
181 | 
182 | 
183 | 
184 | 
185 | ```python
186 | word = 'oxygen'
187 | for char in word:
188 |     print(char)
189 | ```
190 | 
191 | <pre class="output">
192 | <div class="output_label">output</div>
193 | <code class="text">
194 | o
195 | x
196 | y
197 | g
198 | e
199 | n
200 | 
201 | </code>
202 | </pre>
203 | 
204 | 
205 | 
206 | 
207 | 
208 | The improved version uses a **for loop** to repeat an operation --- in this case, printing --- once for each thing in a sequence.
209 | The general form of a loop is:
210 | 
211 | ```python
212 | for variable in collection:
213 |     # do things with variable
214 | ```
215 | 
216 | 
217 | 
218 | 
219 | 
220 | 
221 | Using the oxygen example above, the loop might look like this:
222 | 
223 | ![loop_image](images/loops_image.png)
224 | 
225 | where each character (`char`) in the variable `word` is looped through and printed one character after another.
226 | The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).
227 | 
228 | We can call the **loop variable** anything we like,
229 | but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
230 | 
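As a minimal sketch of the indentation rule (the word `'tin'` and the final message are only illustrative choices), the indented line below runs once per character, while the unindented line runs once, after the loop has finished:

```python
word = 'tin'
for char in word:
    print(char)        # indented, so it belongs to the loop body
print('finished')      # not indented, so it runs once after the loop
```

This should print `t`, `i` and `n` on separate lines, followed by `finished`.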
231 | 
232 | 
233 | 
234 | 
235 | 
236 | 
237 | 
238 | ## What's in a name?
239 | 
240 | 
241 | In the example above, the loop variable was given the name `char` as a mnemonic; it is short for 'character'. 
242 | We can choose any name we want for variables. We might just as easily have chosen the name `banana` for the loop variable, as long as we use the same name when we invoke the variable inside the loop:
243 | 
244 | 
245 | 
246 | 
247 | 
248 | 
249 | 
250 | 
251 | ```python
252 | word = 'oxygen'
253 | for banana in word:
254 |     print(banana)
255 | ```
256 | 
257 | <pre class="output">
258 | <div class="output_label">output</div>
259 | <code class="text">
260 | o
261 | x
262 | y
263 | g
264 | e
265 | n
266 | 
267 | </code>
268 | </pre>
269 | 
270 | 
271 | 
272 | 
273 | 
274 | It is a good idea to choose meaningful variable names; otherwise it is harder to understand what the loop is doing.
275 | 
276 | 
277 | Here's another loop that repeatedly updates a variable:
278 | 
279 | 
280 | 
281 | 
282 | 
283 | 
284 | ```python
285 | length = 0
286 | for vowel in 'aeiou':
287 |     length = length + 1
288 | print('There are', length, 'vowels')
289 | ```
290 | 
291 | <pre class="output">
292 | <div class="output_label">output</div>
293 | <code class="text">
294 | There are 5 vowels
295 | 
296 | </code>
297 | </pre>
298 | 
299 | 
300 | 
301 | 
302 | 
303 | It's worth tracing the execution of this little program step by step.
304 | 
305 | Since there are five characters in `'aeiou'`,
306 | the statement on line 3 will be executed five times.
307 | 
308 | The first time around,
309 | `length` is zero (the value assigned to it on line 1)
310 | and `vowel` is `'a'`.
311 | The statement adds 1 to the old value of `length`,
312 | producing 1,
313 | and updates `length` to refer to that new value.
314 | 
315 | The next time around,
316 | `vowel` is `'e'` and `length` is 1,
317 | so `length` is updated to be 2.
318 | 
319 | After three more updates,
320 | `length` is 5;
321 | since there is nothing left in `'aeiou'` for Python to process,
322 | the loop finishes
323 | and the `print` statement on line 4 tells us our final answer.
324 | 
325 | Note that the loop variable `vowel` is just an ordinary variable that records where we are in the loop.
326 | 
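One way to check a trace like this ourselves is to add a `print` call inside the loop body. The sketch below is the same program with one extra line; the wording of the extra message is our own and only there for exploration:

```python
length = 0
for vowel in 'aeiou':
    length = length + 1
    # Show the state of both variables at the end of each pass through the loop
    print('vowel is', vowel, 'and length is now', length)
print('There are', length, 'vowels')
```

Each pass through the loop prints one line, so we should see `length` climb from 1 to 5 before the final message appears.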
327 | 
328 | 
329 | 
330 | 
331 | ## Challenge - scope of the loop variable
332 | 
333 | 1. In the loop over `'aeiou'` above, does the loop variable `vowel` exist after the loop has finished?
334 | 
335 | 
336 | 
337 | 
338 | 
339 | 
340 | 
341 | ```python
342 | length = 0
343 | for vowel in 'aeiou':
344 |     length = length + 1
345 | print('After the loop, `vowel` exists and has the value: ' + vowel)
346 | 
347 | # The loop variable `vowel` exists after the loop is completed, not only inside the loop
348 | ```
349 | 
350 | <pre class="output">
351 | <div class="output_label">output</div>
352 | <code class="text">
353 | After the loop, `vowel` exists and has the value: u
354 | 
355 | </code>
356 | </pre>
357 | 
358 | 
359 | 
360 | 
361 | 
362 | Note also that finding the length of a string is such a common operation that Python actually has a built-in function to do it called `len`:
363 | 
364 | 
365 | 
366 | 
367 | 
368 | 
369 | ```python
370 | print(len('aeiou'))
371 | ```
372 | 
373 | <pre class="output">
374 | <div class="output_label">output</div>
375 | <code class="text">
376 | 5
377 | 
378 | </code>
379 | </pre>
380 | 
381 | 
382 | 
383 | 
384 | 
385 | `len` is much faster than any function we could write ourselves,
386 | and much easier to read than a two-line loop;
387 | it will also give us the length of many other things that we haven't met yet,
388 | so we should always use it when we can.
389 | 
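As a small illustration, `len` also works on lists and on the empty string. The list `elements` below is a hypothetical example, and lists have not been covered in this lesson yet:

```python
elements = ['lead', 'tin', 'oxygen']   # hypothetical example list

word_length = len('oxygen')    # number of characters in a string
list_length = len(elements)    # number of items in a list
empty_length = len('')         # an empty string has length zero

print(word_length, list_length, empty_length)  # 6 3 0
```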
390 | 
391 | 
392 | 
393 | 
394 | ## From 1 to N
395 | 
396 | Python has a built-in function called `range` that creates a sequence of numbers. `range` can
397 | accept 1, 2, or 3 parameters.
398 | 
399 | * If one parameter is given, `range` creates a sequence of that length,
400 |   starting at zero and incrementing by 1.
401 |   For example, `range(3)` produces the numbers `0, 1, 2`.
402 | * If two parameters are given, `range` starts at
403 |   the first and ends just before the second, incrementing by one.
404 |   For example, `range(2, 5)` produces `2, 3, 4`.
405 | * If `range` is given 3 parameters,
406 |   it starts at the first one, ends just before the second one, and increments by the third one.
407 |   For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
408 | 
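The three forms can be checked with a short sketch. In Python 3, printing a `range` directly does not show its numbers, so `list(...)` is used here purely to display them; the variable `total` is our own addition to show a `range` driving a loop:

```python
# list() is only used to display the numbers each range produces
print(list(range(3)))          # [0, 1, 2]
print(list(range(2, 5)))       # [2, 3, 4]
print(list(range(3, 10, 2)))   # [3, 5, 7, 9]

# A range can be looped over like any other sequence
total = 0
for number in range(2, 5):
    total = total + number
print(total)                   # 9
```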
409 | 
410 | 
411 | 
412 | 
413 | 
414 | 
415 | ## Challenge - loop over a range
416 | Using `range`,
417 | write a loop that prints the first 3 natural numbers:
418 | 
419 | ```
420 | 1
421 | 2
422 | 3
423 | ```
424 | 
425 | 
426 | 
427 | 
428 | 
429 | 
430 | 
431 | <!-- 
432 | ## Solution
433 |  -->
434 | 
435 | 
436 | 
437 | <!-- 
438 | 
439 | ```python
440 | for i in range(1, 4):
441 |     print(i)
442 | ```
443 | 
444 | <pre class="output">
445 | <div class="output_label">output</div>
446 | <code class="text">
447 | 1
448 | 2
449 | 3
450 | 
451 | </code>
452 | </pre>
453 |  -->
454 | 
455 | 
456 | 
457 | 
458 | ## Computing Powers With Loops
459 | 
460 | Exponentiation is built into Python:
461 | 
462 | 
463 | 
464 | 
465 | 
466 | 
467 | ```python
468 | print(5 ** 3)
469 | ```
470 | 
471 | <pre class="output">
472 | <div class="output_label">output</div>
473 | <code class="text">
474 | 125
475 | 
476 | </code>
477 | </pre>
478 | 
479 | 
480 | 
481 | 
482 | 
483 | ## Challenge - multiplication in a loop
484 | 
485 | Write a loop that calculates the same result as `5 ** 3` using
486 | multiplication (and without exponentiation).
487 | 
488 | 
489 | 
490 | 
491 | <!-- 
492 | ## Solution
493 |  -->
494 | 
495 | 
496 | 
497 | <!-- 
498 | 
499 | ```python
500 | result = 1
501 | for i in range(0, 3):
502 |     result = result * 5
503 | print(result)
504 | ```
505 | 
506 | <pre class="output">
507 | <div class="output_label">output</div>
508 | <code class="text">
509 | 125
510 | 
511 | </code>
512 | </pre>
513 |  -->
514 | 
515 | 
516 | 
517 | 
518 | ## Bonus challenge: reverse a string
519 | 
520 | Knowing that two strings can be concatenated using the `+` operator,
521 | write a loop that takes a string
522 | and produces a new string with the characters in reverse order,
523 | so `'Newton'` becomes `'notweN'`.
524 | 
525 | 
526 | 
527 | 
528 | <!-- 
529 | ## Solution
530 |  -->
531 | 
532 | 
533 | 
534 | <!-- 
535 | 
536 | ```python
537 | newstring = ''
538 | oldstring = 'Newton'
539 | for char in oldstring:
540 |     newstring = char + newstring
541 | print(newstring)
542 | ```
543 | 
544 | <pre class="output">
545 | <div class="output_label">output</div>
546 | <code class="text">
547 | notweN
548 | 
549 | </code>
550 | </pre>
551 |  -->
552 | 
553 | 
554 | 
555 | 
556 | ## Enumerate
557 | 
558 | The built-in function `enumerate` takes a sequence (e.g. a list) and generates a
559 | new sequence of the same length. Each element of the new sequence is a pair composed of the index
560 | (0, 1, 2,...) and the value from the original sequence:
561 | 
562 | ```
563 | for i, x in enumerate(xs):
564 |     # Do something with i and x
565 | ```
566 | 
567 | 
568 | The code above loops through `xs`, assigning the index to `i` and the value to `x`.
569 | 
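Here is a minimal concrete sketch of the pattern. The list `elements` and the name `pairs` are hypothetical, chosen only for illustration:

```python
elements = ['lead', 'tin', 'oxygen']   # hypothetical example list

pairs = []
for i, x in enumerate(elements):
    print(i, x)            # the index, then the value at that index
    pairs.append((i, x))   # keep each (index, value) pair for inspection

print(pairs)  # [(0, 'lead'), (1, 'tin'), (2, 'oxygen')]
```

The loop prints `0 lead`, `1 tin` and `2 oxygen` on separate lines, which shows that the index starts at zero and increases by one with each value.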
570 | 
571 | 
572 | 
573 | 
574 | ## Bonus challenge: enumeration for computing the value of a polynomial
575 | 
576 | Suppose you have encoded a polynomial as a list of coefficients in
577 | the following way: the first element is the constant term, the
578 | second element is the coefficient of the linear term, the third is the
579 | coefficient of the quadratic term, etc.
580 | 
581 | ```python
582 | x = 5
583 | cc = [2, 4, 3]  # encodes 2 + 4*x + 3*x**2
584 | ```
585 | 
586 | 
587 | ```python
588 | y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2
589 | # y is now 97
590 | ```
591 | 
592 | 
593 | Write a loop using `enumerate(cc)` which computes the value `y` of any
594 | polynomial, given `x` and `cc`.
595 | 
596 | 
597 | 
598 | 
599 | <!-- 
600 | ## Solution
601 |  -->
602 | 
603 | 
604 | 
605 | <!-- 
606 | 
607 | ```python
608 | x = 5
609 | cc = [2, 4, 3]
610 | y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2
611 | 
612 | y = 0
613 | for i, c in enumerate(cc):
614 |     y = y + x**i * c
615 |     
616 | print(y)
617 | ```
618 | 
619 | <pre class="output">
620 | <div class="output_label">output</div>
621 | <code class="text">
622 | 97
623 | 
624 | </code>
625 | </pre>
626 |  -->
627 | 
628 | 
629 | 
630 | 
631 | 
635 | 
636 | 
637 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/slicing_and_list_comprehensions.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {
  6 |     "slideshow": {
  7 |      "slide_type": "slide"
  8 |     }
  9 |    },
 10 |    "source": [
 11 |     "## Slicing lists"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "code",
 16 |    "execution_count": 41,
 17 |    "metadata": {},
 18 |    "outputs": [
 19 |     {
 20 |      "data": {
 21 |       "text/plain": [
 22 |        "[2, 4, 6]"
 23 |       ]
 24 |      },
 25 |      "execution_count": 41,
 26 |      "metadata": {},
 27 |      "output_type": "execute_result"
 28 |     }
 29 |    ],
 30 |    "source": [
 31 |     "numbers = [2, 4, 6, 8, 10, 12]\n",
 32 |     "\n",
 33 |     "# list[start:end]\n",
 34 |     "# start is inclusive, end isn't\n",
 35 |     "\n",
 36 |     "numbers[0:3]"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "code",
 41 |    "execution_count": 42,
 42 |    "metadata": {},
 43 |    "outputs": [
 44 |     {
 45 |      "data": {
 46 |       "text/plain": [
 47 |        "[10, 12]"
 48 |       ]
 49 |      },
 50 |      "execution_count": 42,
 51 |      "metadata": {},
 52 |      "output_type": "execute_result"
 53 |     }
 54 |    ],
 55 |    "source": [
 56 |     "numbers[4:7]"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "code",
 61 |    "execution_count": 43,
 62 |    "metadata": {},
 63 |    "outputs": [
 64 |     {
 65 |      "data": {
 66 |       "text/plain": [
 67 |        "[2, 4, 6]"
 68 |       ]
 69 |      },
 70 |      "execution_count": 43,
 71 |      "metadata": {},
 72 |      "output_type": "execute_result"
 73 |     }
 74 |    ],
 75 |    "source": [
 76 |     "numbers[:3] # omitting start implies 0 (the very start)"
 77 |    ]
 78 |   },
 79 |   {
 80 |    "cell_type": "code",
 81 |    "execution_count": 44,
 82 |    "metadata": {},
 83 |    "outputs": [
 84 |     {
 85 |      "data": {
 86 |       "text/plain": [
 87 |        "[8, 10, 12]"
 88 |       ]
 89 |      },
 90 |      "execution_count": 44,
 91 |      "metadata": {},
 92 |      "output_type": "execute_result"
 93 |     }
 94 |    ],
 95 |    "source": [
 96 |     "numbers[3:] # omitting end means up to the very end, i.e. len(numbers)"
 97 |    ]
 98 |   },
 99 |   {
100 |    "cell_type": "code",
101 |    "execution_count": 45,
102 |    "metadata": {
103 |     "slideshow": {
104 |      "slide_type": "subslide"
105 |     }
106 |    },
107 |    "outputs": [
108 |     {
109 |      "data": {
110 |       "text/plain": [
111 |        "[12]"
112 |       ]
113 |      },
114 |      "execution_count": 45,
115 |      "metadata": {},
116 |      "output_type": "execute_result"
117 |     }
118 |    ],
119 |    "source": [
120 |     "numbers[-1:] # negative indices count from the end of the list"
121 |    ]
122 |   },
123 |   {
124 |    "cell_type": "code",
125 |    "execution_count": 46,
126 |    "metadata": {},
127 |    "outputs": [
128 |     {
129 |      "data": {
130 |       "text/plain": [
131 |        "[2, 4, 6, 8, 10]"
132 |       ]
133 |      },
134 |      "execution_count": 46,
135 |      "metadata": {},
136 |      "output_type": "execute_result"
137 |     }
138 |    ],
139 |    "source": [
140 |     "numbers[:-1]"
141 |    ]
142 |   },
143 |   {
144 |    "cell_type": "code",
145 |    "execution_count": 47,
146 |    "metadata": {
147 |     "slideshow": {
148 |      "slide_type": "subslide"
149 |     }
150 |    },
151 |    "outputs": [
152 |     {
153 |      "data": {
154 |       "text/plain": [
155 |        "[2, 6, 10]"
156 |       ]
157 |      },
158 |      "execution_count": 47,
159 |      "metadata": {},
160 |      "output_type": "execute_result"
161 |     }
162 |    ],
163 |    "source": [
164 |     "# you can also specify a step size\n",
165 |     "# list[start:end:step]\n",
166 |     "\n",
167 |     "numbers[0:6:2]"
168 |    ]
169 |   },
170 |   {
171 |    "cell_type": "code",
172 |    "execution_count": 48,
173 |    "metadata": {
174 |     "slideshow": {
175 |      "slide_type": "subslide"
176 |     }
177 |    },
178 |    "outputs": [
179 |     {
180 |      "data": {
181 |       "text/plain": [
182 |        "[2, 4, 6, 8, 10, 12]"
183 |       ]
184 |      },
185 |      "execution_count": 48,
186 |      "metadata": {},
187 |      "output_type": "execute_result"
188 |     }
189 |    ],
190 |    "source": [
191 |     "# [:] is a shorthand for copying a list.\n",
192 |     "# Equivalent to:\n",
193 |     "# n_copy = list(numbers)\n",
194 |     "\n",
195 |     "n_copy = numbers[:]\n",
196 |     "n_copy"
197 |    ]
198 |   },
199 |   {
200 |    "cell_type": "code",
201 |    "execution_count": 49,
202 |    "metadata": {},
203 |    "outputs": [
204 |     {
205 |      "data": {
206 |       "text/plain": [
207 |        "[2, 4, 6, 80, 10, 12]"
208 |       ]
209 |      },
210 |      "execution_count": 49,
211 |      "metadata": {},
212 |      "output_type": "execute_result"
213 |     }
214 |    ],
215 |    "source": [
216 |     "n_copy[3] = 80  # changing the copy leaves `numbers` untouched\n",
217 |     "n_copy"
218 |    ]
219 |   },
220 |   {
221 |    "cell_type": "code",
222 |    "execution_count": 50,
223 |    "metadata": {},
224 |    "outputs": [
225 |     {
226 |      "data": {
227 |       "text/plain": [
228 |        "[2, 4, 6, 8, 10, 12]"
229 |       ]
230 |      },
231 |      "execution_count": 50,
232 |      "metadata": {},
233 |      "output_type": "execute_result"
234 |     }
235 |    ],
236 |    "source": [
237 |     "numbers"
238 |    ]
239 |   },
240 |   {
241 |    "cell_type": "markdown",
242 |    "metadata": {
243 |     "slideshow": {
244 |      "slide_type": "slide"
245 |     },
246 |     "tags": [
247 |      "challenge"
248 |     ]
249 |    },
250 |    "source": [
251 |     "### Challenge 1\n",
252 |     "\n",
253 |     "Given the list: `['banana', 'cherry', 'strawberry', 'orange']`\n",
254 |     "\n",
255 |     "Return a list of just the red fruits."
256 |    ]
257 |   },
258 |   {
259 |    "cell_type": "markdown",
260 |    "metadata": {
261 |     "tags": [
262 |      "solution"
263 |     ]
264 |    },
265 |    "source": [
266 |     "### Solution"
267 |    ]
268 |   },
269 |   {
270 |    "cell_type": "code",
271 |    "execution_count": 51,
272 |    "metadata": {
273 |     "slideshow": {
274 |      "slide_type": "slide"
275 |     },
276 |     "tags": [
277 |      "solution"
278 |     ]
279 |    },
280 |    "outputs": [
281 |     {
282 |      "data": {
283 |       "text/plain": [
284 |        "['cherry', 'strawberry']"
285 |       ]
286 |      },
287 |      "execution_count": 51,
288 |      "metadata": {},
289 |      "output_type": "execute_result"
290 |     }
291 |    ],
292 |    "source": [
293 |     "fruits = ['banana', 'cherry', 'strawberry', 'orange']\n",
294 |     "red_ones = fruits[1:3]\n",
295 |     "red_ones"
296 |    ]
297 |   },
298 |   {
299 |    "cell_type": "markdown",
300 |    "metadata": {
301 |     "slideshow": {
302 |      "slide_type": "slide"
303 |     }
304 |    },
305 |    "source": [
306 |     "## Loops"
307 |    ]
308 |   },
309 |   {
310 |    "cell_type": "markdown",
311 |    "metadata": {
312 |     "slideshow": {
313 |      "slide_type": "subslide"
314 |     }
315 |    },
316 |    "source": [
317 |     "A `for` loop works on any iterable, such as sequence types, generators and iterators\n",
318 |     "\n",
319 |     "(iterables include lists, tuples, strings and dictionaries)"
320 |    ]
321 |   },
322 |   {
323 |    "cell_type": "code",
324 |    "execution_count": 74,
325 |    "metadata": {},
326 |    "outputs": [
327 |     {
328 |      "name": "stdout",
329 |      "output_type": "stream",
330 |      "text": [
331 |       "A\n",
332 |       "B\n",
333 |       "C\n",
334 |       "D\n",
335 |       ".\n",
336 |       ".\n",
337 |       "m\n",
338 |       "e\n",
339 |       "h\n"
340 |      ]
341 |     }
342 |    ],
343 |    "source": [
344 |     "for letter in \"ABCD..meh\":\n",
345 |     "    print(letter)"
346 |    ]
347 |   },
348 |   {
349 |    "cell_type": "code",
350 |    "execution_count": 75,
351 |    "metadata": {
352 |     "slideshow": {
353 |      "slide_type": "subslide"
354 |     }
355 |    },
356 |    "outputs": [
357 |     {
358 |      "name": "stdout",
359 |      "output_type": "stream",
360 |      "text": [
361 |       "('Z', 99)\n",
362 |       "('Y', 98)\n",
363 |       "('X', 97)\n",
364 |       "Z 99\n",
365 |       "Y 98\n",
366 |       "X 97\n"
367 |      ]
368 |     }
369 |    ],
370 |    "source": [
371 |     "ts = [('Z', 99), ('Y', 98), ('X', 97)]\n",
372 |     "\n",
373 |     "for t in ts:\n",
374 |     "    print(t)\n",
375 |     "    \n",
376 |     "# using tuple unpacking\n",
377 |     "for m, n in ts:\n",
378 |     "    print(m, n)"
379 |    ]
380 |   },
381 |   {
382 |    "cell_type": "code",
383 |    "execution_count": 76,
384 |    "metadata": {
385 |     "slideshow": {
386 |      "slide_type": "subslide"
387 |     }
388 |    },
389 |    "outputs": [
390 |     {
391 |      "name": "stdout",
392 |      "output_type": "stream",
393 |      "text": [
394 |       "('A', 1)\n",
395 |       "('B', 2)\n",
396 |       "('C', 3)\n"
397 |      ]
398 |     }
399 |    ],
400 |    "source": [
401 |     "# for on dictionary.items()\n",
402 |     "d = {'A': 1, 'B': 2, 'C': 3}\n",
403 |     "\n",
404 |     "for item in d.items():\n",
405 |     "    # print(type(item))\n",
406 |     "    print(item)"
407 |    ]
408 |   },
409 |   {
410 |    "cell_type": "code",
411 |    "execution_count": 77,
412 |    "metadata": {
413 |     "slideshow": {
414 |      "slide_type": "subslide"
415 |     }
416 |    },
417 |    "outputs": [
418 |     {
419 |      "name": "stdout",
420 |      "output_type": "stream",
421 |      "text": [
422 |       "A 1\n",
423 |       "B 2\n",
424 |       "C 3\n"
425 |      ]
426 |     }
427 |    ],
428 |    "source": [
429 |     "for k, v in d.items():\n",
430 |     "    print(k, v)"
431 |    ]
432 |   },
433 |   {
434 |    "cell_type": "markdown",
435 |    "metadata": {
436 |     "slideshow": {
437 |      "slide_type": "subslide"
438 |     }
439 |    },
440 |    "source": [
441 |     "`while` loops keep looping while their condition is true:\n",
442 |     "\n",
443 |     "```\n",
444 |     "while some_condition:\n",
445 |     "    do_stuff()\n",
446 |     "```\n",
447 |     "\n",
448 |     "Note: If the condition for your `while` loop never becomes `False`, the loop will run forever (in Jupyter you can do *Kernel -> Interrupt* to break out of the infinite loop)."
449 |    ]
450 |   },
451 |   {
452 |    "cell_type": "code",
453 |    "execution_count": 78,
454 |    "metadata": {},
455 |    "outputs": [
456 |     {
457 |      "name": "stdout",
458 |      "output_type": "stream",
459 |      "text": [
460 |       "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 "
461 |      ]
462 |     }
463 |    ],
464 |    "source": [
465 |     "a = 0\n",
466 |     "while a < 16:\n",
467 |     "    print(a, end=' ')\n",
468 |     "    a += 1"
469 |    ]
470 |   },
471 |   {
472 |    "cell_type": "markdown",
473 |    "metadata": {
474 |     "slideshow": {
475 |      "slide_type": "subslide"
476 |     }
477 |    },
478 |    "source": [
479 |     "`break` immediately exits a loop\n",
480 |     "\n",
481 |     "`continue` immediately starts the next iteration of the loop\n",
482 |     "\n",
483 |     "Any code inside the loop after a `break` or `continue` is skipped."
484 |    ]
485 |   },
486 |   {
487 |    "cell_type": "code",
488 |    "execution_count": 79,
489 |    "metadata": {},
490 |    "outputs": [
491 |     {
492 |      "name": "stdout",
493 |      "output_type": "stream",
494 |      "text": [
495 |       "2 4 6 8 10 12 14 16 "
496 |      ]
497 |     }
498 |    ],
499 |    "source": [
500 |     "a = 0\n",
501 |     "while True:\n",
502 |     "    a += 1\n",
503 |     "    \n",
504 |     "    if a > 16:\n",
505 |     "        break\n",
506 |     "        print('We will never see this.')\n",
507 |     "    \n",
508 |     "    if a % 2:\n",
509 |     "        continue\n",
510 |     "        print('We will also never see this.')\n",
511 |     "        \n",
512 |     "    print(a, end=' ')"
513 |    ]
514 |   },
515 |   {
516 |    "cell_type": "markdown",
517 |    "metadata": {
518 |     "slideshow": {
519 |      "slide_type": "slide"
520 |     }
521 |    },
522 |    "source": [
523 |     "## List comprehensions\n",
524 |     "\n",
525 |     "List comprehensions are a shorthand way to loop over an iterable, transform its items and collect the results in a new list."
526 |    ]
527 |   },
528 |   {
529 |    "cell_type": "code",
530 |    "execution_count": 80,
531 |    "metadata": {
532 |     "slideshow": {
533 |      "slide_type": "subslide"
534 |     }
535 |    },
536 |    "outputs": [
537 |     {
538 |      "data": {
539 |       "text/plain": [
540 |        "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]"
541 |       ]
542 |      },
543 |      "execution_count": 80,
544 |      "metadata": {},
545 |      "output_type": "execute_result"
546 |     }
547 |    ],
548 |    "source": [
549 |     "# Instead of doing\n",
550 |     "new_list = []\n",
551 |     "for i in range(0,11):\n",
552 |     "    new_list.append(i**2)\n",
553 |     "\n",
554 |     "new_list"
555 |    ]
556 |   },
557 |   {
558 |    "cell_type": "code",
559 |    "execution_count": 81,
560 |    "metadata": {},
561 |    "outputs": [
562 |     {
563 |      "data": {
564 |       "text/plain": [
565 |        "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]"
566 |       ]
567 |      },
568 |      "execution_count": 81,
569 |      "metadata": {},
570 |      "output_type": "execute_result"
571 |     }
572 |    ],
573 |    "source": [
574 |     "# Use a list comprehension instead\n",
575 |     "new_list = [i**2 for i in range(0,11)]\n",
576 |     "new_list"
577 |    ]
578 |   },
579 |   {
580 |    "cell_type": "code",
581 |    "execution_count": 82,
582 |    "metadata": {
583 |     "slideshow": {
584 |      "slide_type": "subslide"
585 |     }
586 |    },
587 |    "outputs": [
588 |     {
589 |      "data": {
590 |       "text/plain": [
591 |        "[0, 1, 4, 9]"
592 |       ]
593 |      },
594 |      "execution_count": 82,
595 |      "metadata": {},
596 |      "output_type": "execute_result"
597 |     }
598 |    ],
599 |    "source": [
600 |     "# You can also filter values using an `if` clause inside the list comprehension\n",
601 |     "new_list = [i**2 for i in range(0,11) if i < 4]\n",
602 |     "new_list"
603 |    ]
604 |   }
605 |  ],
606 |  "metadata": {
607 |   "celltoolbar": "Tags",
608 |   "kernelspec": {
609 |    "display_name": "Python 3",
610 |    "language": "python",
611 |    "name": "python3"
612 |   },
613 |   "language_info": {
614 |    "codemirror_mode": {
615 |     "name": "ipython",
616 |     "version": 3
617 |    },
618 |    "file_extension": ".py",
619 |    "mimetype": "text/x-python",
620 |    "name": "python",
621 |    "nbconvert_exporter": "python",
622 |    "pygments_lexer": "ipython3",
623 |    "version": "3.6.3"
624 |   }
625 |  },
626 |  "nbformat": 4,
627 |  "nbformat_minor": 2
628 | }
629 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/wip/basics_data_carpentry.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# The Basics of Python\n",
  8 |     "\n",
  9 |     "Python is a general purpose programming language that supports rapid development\n",
 10 |     "of scripts and applications.\n",
 11 |     "\n",
 12 |     "Python's main advantages:\n",
 13 |     "\n",
 14 |     "* Open Source software, supported by the Python Software Foundation\n",
 15 |     "* Available on all major platforms (i.e. Windows, Linux and macOS)\n",
 16 |     "* It is a general-purpose programming language, designed for readability\n",
 17 |     "* Supports multiple programming paradigms ('functional', 'object oriented')\n",
 18 |     "* Very large community with a rich ecosystem of third-party packages"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "## Interpreter\n",
 26 |     "\n",
 27 |     "Python is an interpreted language which can be used in two ways:\n",
 28 |     "\n",
 29 |     "* \"Interactive\" Mode: It functions like an \"advanced calculator\", executing\n",
 30 |     "  one command at a time:\n",
 31 |     "\n"
 32 |    ]
 33 |   },
 34 |   {
 35 |    "cell_type": "code",
 36 |    "execution_count": null,
 37 |    "metadata": {},
 38 |    "outputs": [],
 39 |    "source": [
 40 |     "user:host:~$ python\n",
 41 |     "Python 3.5.1 (default, Oct 23 2015, 18:05:06)\n",
 42 |     "[GCC 4.8.3] on linux2\n",
 43 |     "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n",
 44 |     ">>> 2 + 2\n",
 45 |     "4\n",
 46 |     ">>> print(\"Hello World\")\n",
 47 |     "Hello World\n"
 48 |    ]
 49 |   },
 50 |   {
 51 |    "cell_type": "markdown",
 52 |    "metadata": {},
 53 |    "source": [
 54 |     "\n",
 55 |     "* \"Scripting\" Mode: Executing a series of \"commands\" saved in a text file,\n",
 56 |     "  usually with a `.py` extension after the name of your file:\n",
 57 |     "\n",
 58 |     "```bash\n",
 59 |     "user:host:~$ python my_script.py\n",
 60 |     "Hello World\n",
 61 |     "```\n",
 62 |     "\n",
 63 |     "\n"
 64 |    ]
 65 |   },
 66 |   {
 67 |    "cell_type": "markdown",
 68 |    "metadata": {},
 69 |    "source": [
 70 |     "## Introduction to Python built-in data types\n",
 71 |     "\n",
 72 |     "### Strings, integers and floats\n",
 73 |     "\n",
 74 |     "One of the most basic things we can do in Python is assign values to variables:\n",
 75 |     "\n"
 76 |    ]
 77 |   },
 78 |   {
 79 |    "cell_type": "code",
 80 |    "execution_count": null,
 81 |    "metadata": {},
 82 |    "outputs": [],
 83 |    "source": [
 84 |     "text = \"Data Fluency\"  # An example of a string\n",
 85 |     "number = 42  # An example of an integer\n",
 86 |     "pi_value = 3.1415  # An example of a float\n"
 87 |    ]
 88 |   },
 89 |   {
 90 |    "cell_type": "markdown",
 91 |    "metadata": {},
 92 |    "source": [
 93 |     "\n",
 94 |     "Here we've assigned data to the variables `text`, `number` and `pi_value`,\n",
 95 |     "using the assignment operator `=`. To review the value of a variable, we\n",
 96 |     "can type the name of the variable into the Jupyter notebook and press **Shift** and **Enter**:\n",
 97 |     "\n"
 98 |    ]
 99 |   },
100 |   {
101 |    "cell_type": "code",
102 |    "execution_count": null,
103 |    "metadata": {},
104 |    "outputs": [],
105 |    "source": [
106 |     "text\n",
107 |     "## Which Returns\n",
108 |     "\"Data Fluency\"\n"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "markdown",
113 |    "metadata": {},
114 |    "source": [
115 |     "\n",
116 |     "Everything in Python has a type. To get the type of something, we can pass it\n",
117 |     "to the built-in function `type`:\n",
118 |     "\n"
119 |    ]
120 |   },
121 |   {
122 |    "cell_type": "code",
123 |    "execution_count": null,
124 |    "metadata": {},
125 |    "outputs": [],
126 |    "source": [
127 |     "type(text)\n",
128 |     "# str\n",
129 |     "type(number)\n",
130 |     "# int\n",
131 |     "type(6.02)\n",
132 |     "# float\n"
133 |    ]
134 |   },
135 |   {
136 |    "cell_type": "markdown",
137 |    "metadata": {},
138 |    "source": [
139 |     "\n",
140 |     "The variable `text` is of type `str`, short for \"string\". Strings hold\n",
141 |     "sequences of characters, which can be letters, numbers, punctuation\n",
142 |     "or more exotic forms of text (even emoji!).\n",
143 |     "\n",
144 |     "We can also see the value of something using another built-in function, `print`:\n",
145 |     "\n"
146 |    ]
147 |   },
148 |   {
149 |    "cell_type": "code",
150 |    "execution_count": null,
151 |    "metadata": {},
152 |    "outputs": [],
153 |    "source": [
154 |     "\n",
155 |     "print(text)\n",
156 |     "# Data Fluency\n",
157 |     "\n",
158 |     "print(11)\n",
159 |     "# 11\n"
160 |    ]
161 |   },
162 |   {
163 |    "cell_type": "markdown",
164 |    "metadata": {},
165 |    "source": [
166 |     "\n",
167 |     "This may seem redundant, but in fact it's the only way to display output in a script:\n",
168 |     "\n",
169 |     "*example.py*\n"
170 |    ]
171 |   },
172 |   {
173 |    "cell_type": "code",
174 |    "execution_count": null,
175 |    "metadata": {},
176 |    "outputs": [],
177 |    "source": [
178 |     "# A Python script file\n",
179 |     "# Comments in Python start with #\n",
180 |     "# The next line assigns the string \"Data Fluency\" to the variable \"text\".\n",
181 |     "text = \"Data Fluency\"\n",
182 |     "# The next line does nothing!\n",
183 |     "text\n",
184 |     "# The next line uses the print function to print out the value we assigned to \"text\"\n",
185 |     "print(text)\n"
186 |    ]
187 |   },
188 |   {
189 |    "cell_type": "markdown",
190 |    "metadata": {},
191 |    "source": [
192 |     "*Running the script*\n",
193 |     "```bash\n",
194 |     "$ python example.py\n",
195 |     "Data Fluency\n",
196 |     "```\n",
197 |     "\n",
198 |     "Notice that \"Data Fluency\" is printed only once. \n",
199 |     "\n",
200 |     "**Tip**: `print` and `type` are built-in functions in Python. Later in this\n",
201 |     "lesson, we will introduce methods and user-defined functions. The Python\n",
202 |     "documentation is excellent for reference on the differences between them.\n",
203 |     "\n"
204 |    ]
205 |   },
206 |   {
207 |    "cell_type": "code",
208 |    "execution_count": null,
209 |    "metadata": {},
210 |    "outputs": [],
211 |    "source": [
212 |     "help(print)\n"
213 |    ]
214 |   },
215 |   {
216 |    "cell_type": "markdown",
217 |    "metadata": {},
218 |    "source": [
219 |     "\n",
220 |     "This will give the output:\n",
221 |     "\n",
222 |     "```\n",
223 |     "Help on built-in function print in module builtins:\n",
224 |     "\n",
225 |     "print(...)\n",
226 |     "    print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n",
227 |     "    \n",
228 |     "    Prints the values to a stream, or to sys.stdout by default.\n",
229 |     "    Optional keyword arguments:\n",
230 |     "    file:  a file-like object (stream); defaults to the current sys.stdout.\n",
231 |     "    sep:   string inserted between values, default a space.\n",
232 |     "    end:   string appended after the last value, default a newline.\n",
233 |     "    flush: whether to forcibly flush the stream.\n",
234 |     "```\n",
235 |     "\n",
236 |     "### Operators\n",
237 |     "\n",
238 |     "We can perform mathematical calculations in Python using the basic operators\n",
239 |     " `+, -, *, /, **, %`:\n",
240 |     "\n"
241 |    ]
242 |   },
243 |   {
244 |    "cell_type": "code",
245 |    "execution_count": null,
246 |    "metadata": {},
247 |    "outputs": [],
248 |    "source": [
249 |     ">>> 2 + 2  # Addition\n",
250 |     "4\n",
251 |     ">>> 6 * 7  # Multiplication\n",
252 |     "42\n",
253 |     ">>> 2 ** 16  # Power\n",
254 |     "65536\n",
255 |     ">>> 13 % 5  # Modulo\n",
256 |     "3\n"
257 |    ]
258 |   },
259 |   {
260 |    "cell_type": "markdown",
261 |    "metadata": {},
262 |    "source": [
263 |     "\n",
264 |     "We can also use comparison operators\n",
265 |     "`<, >, ==, !=, <=, >=` and logical operators\n",
266 |     "`and, or, not`. The data type returned by a comparison is\n",
267 |     "called a _boolean_.\n",
268 |     "\n",
269 |     "\n"
270 |    ]
271 |   },
272 |   {
273 |    "cell_type": "code",
274 |    "execution_count": null,
275 |    "metadata": {},
276 |    "outputs": [],
277 |    "source": [
278 |     ">>> 3 > 4\n",
279 |     "False\n",
280 |     ">>> True and True\n",
281 |     "True\n",
282 |     ">>> True or False\n",
283 |     "True\n"
284 |    ]
285 |   },
286 |   {
287 |    "cell_type": "markdown",
288 |    "metadata": {},
289 |    "source": [
290 |     "\n"
291 |    ]
292 |   },
293 |   {
294 |    "cell_type": "markdown",
295 |    "metadata": {},
296 |    "source": [
297 |     "## Sequential types: Lists and Tuples\n",
298 |     "\n",
299 |     "### Lists\n",
300 |     "\n",
301 |     "**Lists** are a common data structure to hold an ordered sequence of\n",
302 |     "elements. Each element can be accessed by an index.  Note that Python\n",
303 |     "indexes start with 0 instead of 1:\n",
304 |     "\n"
305 |    ]
306 |   },
307 |   {
308 |    "cell_type": "code",
309 |    "execution_count": null,
310 |    "metadata": {},
311 |    "outputs": [],
312 |    "source": [
313 |     ">>> numbers = [1, 2, 3]\n",
314 |     ">>> numbers[0]\n",
315 |     "1\n"
316 |    ]
317 |   },
318 |   {
319 |    "cell_type": "markdown",
320 |    "metadata": {},
321 |    "source": [
322 |     "\n",
323 |     "A `for` loop can be used to access the elements in a list or other Python data structure one at a time. We will learn about loops in another lesson.\n",
324 |     "\n"
325 |    ]
326 |   },
327 |   {
328 |    "cell_type": "code",
329 |    "execution_count": null,
330 |    "metadata": {},
331 |    "outputs": [],
332 |    "source": [
333 |     ">>> for num in numbers:\n",
334 |     "...     print(num)\n",
335 |     "...\n",
336 |     "1\n",
337 |     "2\n",
338 |     "3\n"
339 |    ]
340 |   },
341 |   {
342 |    "cell_type": "markdown",
343 |    "metadata": {},
344 |    "source": [
345 |     "\n",
346 |     "**Indentation** is very important in Python: it is Python's way of marking a\n",
347 |     "block of code. Note that the second line in the example above is indented.\n",
348 |     "Just like three chevrons `>>>` indicate an interactive prompt in Python, the\n",
349 |     "three dots `...` are Python's prompt for continuation lines. [Note: you\n",
350 |     "do not type `>>>` or `...`.]\n",
351 |     "\n",
352 |     "To add elements to the end of a list, we can use the `append` method. Methods\n",
353 |     "are a way to interact with an object (a list, for example). We can invoke a\n",
354 |     "method using the dot `.` followed by the method name and a list of arguments\n",
355 |     "in parentheses. Let's look at an example using `append`:\n",
356 |     "\n"
357 |    ]
358 |   },
359 |   {
360 |    "cell_type": "code",
361 |    "execution_count": null,
362 |    "metadata": {},
363 |    "outputs": [],
364 |    "source": [
365 |     ">>> numbers.append(4)\n",
366 |     ">>> print(numbers)\n",
367 |     "[1, 2, 3, 4]\n",
368 |     ">>>\n"
369 |    ]
370 |   },
371 |   {
372 |    "cell_type": "markdown",
373 |    "metadata": {},
374 |    "source": [
375 |     "\n",
376 |     "To find out what methods are available for an\n",
377 |     "object, we can use the built-in `help` command:\n",
378 |     "\n"
379 |    ]
380 |   },
381 |   {
382 |    "cell_type": "code",
383 |    "execution_count": null,
384 |    "metadata": {},
385 |    "outputs": [],
386 |    "source": [
387 |     "help(numbers)\n",
388 |     "\n",
389 |     "Help on list object:\n",
390 |     "\n",
391 |     "class list(object)\n",
392 |     " |  list() -> new empty list\n",
393 |     " |  list(iterable) -> new list initialized from iterable's items\n",
394 |     " ...\n"
395 |    ]
396 |   },
397 |   {
398 |    "cell_type": "markdown",
399 |    "metadata": {},
400 |    "source": [
401 |     "\n",
402 |     "### Tuples\n",
403 |     "\n",
404 |     "A tuple is similar to a list in that it's an ordered sequence of elements.\n",
405 |     "However, tuples cannot be changed once created (they are \"immutable\"). Tuples\n",
406 |     "are created by placing comma-separated values inside parentheses `()`.\n",
407 |     "\n"
408 |    ]
409 |   },
410 |   {
411 |    "cell_type": "code",
412 |    "execution_count": null,
413 |    "metadata": {},
414 |    "outputs": [],
415 |    "source": [
416 |     "# Tuples use parentheses\n",
417 |     "a_tuple = (1, 2, 3)\n",
418 |     "another_tuple = ('blue', 'green', 'red')\n",
419 |     "# Note: lists use square brackets\n",
420 |     "a_list = [1, 2, 3]\n"
421 |    ]
422 |   },
423 |   {
424 |    "cell_type": "markdown",
425 |    "metadata": {},
426 |    "source": [
427 |     "\n"
428 |    ]
429 |   },
430 |   {
431 |    "cell_type": "markdown",
432 |    "metadata": {},
433 |    "source": [
434 |     "## Challenge - Tuples\n",
435 |     "1. What happens when you type `a_tuple[2]=5` vs `a_list[1]=5` ?\n",
436 |     "2. Type `type(a_tuple)` into Python. What is the object type?\n",
437 |     "\n",
438 |     "\n",
439 |     "\n"
440 |    ]
441 |   },
442 |   {
443 |    "cell_type": "markdown",
444 |    "metadata": {},
445 |    "source": [
446 |     "## Dictionaries\n",
447 |     "\n",
448 |     "A **dictionary** is a container that holds pairs of objects - keys and values.\n",
449 |     "\n"
450 |    ]
451 |   },
452 |   {
453 |    "cell_type": "code",
454 |    "execution_count": null,
455 |    "metadata": {},
456 |    "outputs": [],
457 |    "source": [
458 |     ">>> translation = {'one': 1, 'two': 2}\n",
459 |     ">>> translation['one']\n",
460 |     "1\n"
461 |    ]
462 |   },
463 |   {
464 |    "cell_type": "markdown",
465 |    "metadata": {},
466 |    "source": [
467 |     "Dictionaries work a lot like lists - except that you index them with *keys*.\n",
468 |     "You can think of a key as a name or unique identifier for a value\n",
469 |     "in the dictionary. Keys can only have particular types - they have to be\n",
470 |     "\"hashable\". Strings and numeric types are acceptable, but lists aren't.\n",
471 |     "\n"
472 |    ]
473 |   },
474 |   {
475 |    "cell_type": "code",
476 |    "execution_count": null,
477 |    "metadata": {},
478 |    "outputs": [],
479 |    "source": [
480 |     ">>> rev = {1: 'one', 2: 'two'}\n",
481 |     ">>> rev[1]\n",
482 |     "'one'\n",
483 |     ">>> bad = {[1, 2, 3]: 3}\n",
484 |     "Traceback (most recent call last):\n",
485 |     "  File \"<stdin>\", line 1, in <module>\n",
486 |     "TypeError: unhashable type: 'list'\n"
487 |    ]
488 |   },
489 |   {
490 |    "cell_type": "markdown",
491 |    "metadata": {},
492 |    "source": [
493 |     "\n",
494 |     "In Python, a \"Traceback\" is a multi-line error block printed out for the\n",
495 |     "user.\n",
496 |     "\n",
497 |     "To add an item to the dictionary we assign a value to a new key:\n",
498 |     "\n"
499 |    ]
500 |   },
501 |   {
502 |    "cell_type": "code",
503 |    "execution_count": null,
504 |    "metadata": {},
505 |    "outputs": [],
506 |    "source": [
507 |     ">>> rev = {1: 'one', 2: 'two'}\n",
508 |     ">>> rev[3] = 'three'\n",
509 |     ">>> rev\n",
510 |     "{1: 'one', 2: 'two', 3: 'three'}\n"
511 |    ]
512 |   },
513 |   {
514 |    "cell_type": "markdown",
515 |    "metadata": {},
516 |    "source": [
517 |     "\n"
518 |    ]
519 |   },
520 |   {
521 |    "cell_type": "markdown",
522 |    "metadata": {},
523 |    "source": [
524 |     "## Challenge - Can you do reassignment in a dictionary? \n",
525 |     "1. First check what `rev` is right now (remember `rev` is the name of our dictionary).  Type: **rev**\n",
526 |     "\n",
527 |     "2. Try to reassign the second value (in the *key value pair*) so that it no longer reads \"two\" but instead reads \"apple-sauce\". \n",
528 |     "\n",
529 |     "3. Now display `rev` again to see if it has changed. \n",
530 |     "\n",
531 |     "\n",
532 |     "\n",
533 |     "It is important to note that before Python 3.7, dictionaries were \"unordered\":\n",
534 |     "they were not guaranteed to remember the sequence of their items (i.e. the\n",
535 |     "order in which key:value pairs were added to the dictionary). Since Python 3.7,\n",
536 |     "dictionaries preserve insertion order, so loops over a dictionary return items\n",
537 |     "in the order in which they were added.\n",
538 |     "\n",
539 |     "\n"
540 |    ]
541 |   },
542 |   {
543 |    "cell_type": "code",
544 |    "execution_count": null,
545 |    "metadata": {},
546 |    "outputs": [],
547 |    "source": []
548 |   }
549 |  ],
550 |  "metadata": {
551 |   "celltoolbar": "Tags",
552 |   "kernelspec": {
553 |    "display_name": "Python 3",
554 |    "language": "python",
555 |    "name": "python3"
556 |   },
557 |   "language_info": {
558 |    "codemirror_mode": {
559 |     "name": "ipython",
560 |     "version": 3
561 |    },
562 |    "file_extension": ".py",
563 |    "mimetype": "text/x-python",
564 |    "name": "python",
565 |    "nbconvert_exporter": "python",
566 |    "pygments_lexer": "ipython3",
567 |    "version": "3.6.3"
568 |   }
569 |  },
570 |  "nbformat": 4,
571 |  "nbformat_minor": 2
572 | }
573 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/defensive_programming.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | <style>
  4 | .output_label {
  5 |     text-align: right;
  6 |     margin: -1em;
  7 |     padding: 0;
  8 |     font-size: 0.5em;
  9 |     color: grey
 10 | }
 11 | </style>
 12 | 
 13 | <!-- 
 14 | ## Defensive Programming
 15 | *Estimated teaching time:* 30 min
 16 | 
 17 | *Estimated challenge time:* 0 min
 18 | 
 19 | 
 20 | ## Module information
 21 | 
 22 | *Key questions:*
 23 | 
 24 |   - "How can I make my programs more reliable?"
 25 |     
 26 | *Learning objectives:*
 27 | 
 28 |   - Explain what an assertion is.
 29 |   - Add assertions that check the program's state is correct. 
 30 |   - Correctly add precondition and postcondition assertions to functions.
 31 |   - Explain what test-driven development is, and use it when creating new functions.
 32 |   - Explain why variables should be initialized using actual data values rather than arbitrary constants.
 33 | ---
 34 |  -->
 35 | 
 36 | 
 37 | 
 38 | 
 39 | ## Defensive Programming
 40 | 
 41 | 
 42 | Our previous lessons have introduced the basic tools of programming: variables and lists, file operations, data visualisation, loops, conditionals, and functions. What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it.
 43 | 
 44 | To achieve that, we need to:
 45 | 
 46 |   - Write programs that check their own operation.
 47 |   - Write and run tests for widely-used functions.
 48 |   - Make sure we know what “correct” actually means.
 49 |     
 50 | The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry — the kind done with lumber — the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes.
 51 | 
 52 | 
 53 | 
 54 | 
 55 | 
 56 | ## Assertions
 57 | 
 58 | The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn’t positive:
 59 | 
 60 | 
 61 | 
 62 | 
 63 | 
 64 | ``` python
 65 | numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
 66 | total = 0.0
 67 | for n in numbers:
 68 |     assert n > 0.0, 'Data should only contain positive values'
 69 |     total += n
 70 | print('total is:', total)
 71 | 
 72 | ```
 73 | 
 74 | 
 75 | 
 76 | 
 77 | 
 78 | ```python
 79 | ---------------------------------------------------------------------------
 80 | AssertionError                            Traceback (most recent call last)
 81 | <ipython-input-1-091518d2f2e2> in <module>()
 82 |       3 total = 0.0
 83 |       4 for n in numbers:
 84 | ----> 5     assert n > 0.0, 'Data should only contain positive values'
 85 |       6     total += n
 86 |       7 print('total is:', total)
 87 | 
 88 | AssertionError: Data should only contain positive values
 89 | 
 90 | ```
 91 | 
 92 | 
 93 | 
 94 | 
 95 | 
 96 | Programs like the Firefox browser are full of assertions: 10–20% of the code they contain is there to check that the other 80–90% is working correctly. Broadly speaking, assertions fall into three categories:
 97 | 
 98 | A `precondition` is something that must be true at the start of a function in order for it to work correctly.
 99 | 
100 | A `postcondition` is something that the function guarantees is true when it finishes.
101 | 
102 | An `invariant` is something that is always true at a particular point inside a piece of code.
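
As a minimal sketch of how the three kinds of assertion can sit together in one function, consider the hypothetical `running_total` helper below. The function name and its rule that inputs must be non-negative are assumptions made for illustration; they are not part of this lesson's own code.

```python
def running_total(values):
    '''Return the cumulative sums of a list of non-negative numbers.'''
    # Preconditions: must hold before the function can do its job.
    assert len(values) > 0, 'Need at least one value'
    assert all(v >= 0 for v in values), 'Values must be non-negative'

    totals = []
    current = 0
    for v in values:
        current += v
        # Invariant: always true at this point in the loop, because
        # adding non-negative values can never make the total smaller.
        assert not totals or current >= totals[-1], 'Running total decreased'
        totals.append(current)

    # Postcondition: what the function guarantees when it finishes.
    assert len(totals) == len(values), 'One total per input value expected'
    return totals


print(running_total([1, 2, 3]))
```

Calling `running_total([1, -2])` would stop at the second precondition, before any arithmetic is done on the bad data.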
103 | 
104 | For example, suppose we are representing rectangles using a `tuple` of four coordinates `(x0, y0, x1, y1)`, representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:
105 | 
106 | 
107 | 
108 | 
109 | 
110 | 
111 | ```python
112 | def normalize_rectangle(rect):
113 |     '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
114 |     Input should be of the format (x0, y0, x1, y1).
115 |     (x0, y0) and (x1, y1) define the lower left and upper right corners
116 |     of the rectangle, respectively.'''
117 |     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
118 |     x0, y0, x1, y1 = rect
119 |     assert x0 < x1, 'Invalid X coordinates'
120 |     assert y0 < y1, 'Invalid Y coordinates'
121 | 
122 |     dx = x1 - x0
123 |     dy = y1 - y0
124 |     if dx > dy:
125 |         scaled = float(dx) / dy
126 |         upper_x, upper_y = 1.0, scaled
127 |     else:
128 |         scaled = float(dx) / dy
129 |         upper_x, upper_y = scaled, 1.0
130 | 
131 |     assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
132 |     assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
133 | 
134 |     return (0, 0, upper_x, upper_y)
135 | ```
136 | 
137 | 
138 | 
139 | 
140 | 
141 | The preconditions on lines 6, 8, and 9 catch invalid inputs:
142 | 
143 | 
144 | 
145 | 
146 | 
147 | ``` python
148 | print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
149 | 
150 | ```
151 | 
152 | 
153 | 
154 | 
155 | 
156 | ``` python
157 | ---------------------------------------------------------------------------
158 | AssertionError                            Traceback (most recent call last)
159 | <ipython-input-3-1b9cd8e18a1f> in <module>()
160 | ----> 1 print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
161 | 
162 | <ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)
163 |       4     (x0, y0) and (x1, y1) define the lower left and upper right corners
164 |       5     of the rectangle, respectively.'''
165 | ----> 6     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
166 |       7     x0, y0, x1, y1 = rect
167 |       8     assert x0 < x1, 'Invalid X coordinates'
168 | 
169 | AssertionError: Rectangles must contain 4 coordinates
170 | 
171 | ```
172 | 
173 | 
174 | 
175 | 
176 | 
177 | ```python
178 | print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
179 | ```
180 | 
181 | 
182 | 
183 | 
184 | 
185 | ```python
186 | ---------------------------------------------------------------------------
187 | AssertionError                            Traceback (most recent call last)
188 | <ipython-input-4-325036405532> in <module>()
189 | ----> 1 print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
190 | 
191 | <ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)
192 |       6     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
193 |       7     x0, y0, x1, y1 = rect
194 | ----> 8     assert x0 < x1, 'Invalid X coordinates'
195 |       9     assert y0 < y1, 'Invalid Y coordinates'
196 |      10 
197 | 
198 | AssertionError: Invalid X coordinates
199 | 
200 | ```
201 | 
202 | 
203 | 
204 | 
205 | 
206 | The postconditions on lines 20 and 21 help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide, everything seems OK:
207 | 
208 | 
209 | 
210 | 
211 | 
212 | 
213 | ```python
214 | print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))
215 | ```
216 | 
217 | <pre class="output">
218 | <div class="output_label">output</div>
219 | <code class="text">
220 | (0, 0, 0.2, 1.0)
221 | 
222 | </code>
223 | </pre>
224 | 
225 | 
226 | 
227 | 
228 | 
229 | but if we normalize one that’s wider than it is tall, the assertion is triggered:
230 | 
231 | 
232 | 
233 | 
234 | 
235 | ```python
236 | print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
237 | ```
238 | 
239 | 
240 | 
241 | 
242 | 
243 | ``` python
244 | ---------------------------------------------------------------------------
245 | AssertionError                            Traceback (most recent call last)
246 | <ipython-input-6-8d4a48f1d068> in <module>()
247 | ----> 1 print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
248 | 
249 | <ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)
250 |      19 
251 |      20     assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
252 | ---> 21     assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
253 |      22 
254 |      23     return (0, 0, upper_x, upper_y)
255 | 
256 | AssertionError: Calculated upper Y coordinate invalid
257 | 
258 | ```
259 | 
260 | 
261 | 
262 | 
263 | 
264 | Re-reading our function, we realize that line 14 should divide `dy` by `dx` rather than `dx` by `dy`. If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn’t. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.
265 | 
266 | But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.
267 | 
268 | Most good programmers follow two rules when adding assertions to their code. The first is, fail early, fail often. The greater the distance between when and where an error occurs and when it’s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.
269 | 
270 | The second rule is, turn bugs into assertions or tests. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven’t regressed (i.e., haven’t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky.
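
One way to apply the second rule to the bug found above is sketched here: a corrected `normalize_rectangle` that divides `dy` by `dx` in the wide case, followed by a test that would fail if the old mistake came back. The test function name `test_wide_rectangle_regression` is hypothetical, and this is only one possible way to write the fix.

```python
def normalize_rectangle(rect):
    '''Normalize a rectangle (x0, y0, x1, y1) so that its lower left corner
    is at the origin and its longest side is 1.0 units long.'''
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # Wider than tall: the X side becomes 1.0 and Y is scaled down.
        scaled = float(dy) / dx
        upper_x, upper_y = 1.0, scaled
    else:
        # Taller than wide (or square): the Y side becomes 1.0.
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
    return (0, 0, upper_x, upper_y)


def test_wide_rectangle_regression():
    # This input triggered the original bug.
    assert normalize_rectangle((0.0, 0.0, 5.0, 1.0)) == (0, 0, 1.0, 0.2)
    # The tall case worked before and should keep working.
    assert normalize_rectangle((0.0, 0.0, 1.0, 5.0)) == (0, 0, 0.2, 1.0)


test_wide_rectangle_regression()
```

Keeping the regression test next to the function means the check is re-run every time the code changes, not just on the day the bug was fixed.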
271 | 
272 | 
273 | 
274 | 
275 | 
276 | 
277 | 
278 | ### Test-Driven Development
279 | 
280 | An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:
281 | 
282 | 
283 | 
284 | 
285 | 
286 | ![test diagram](images/testing.svg)
287 | 
288 | 
289 | 
290 | 
291 | 
292 | Most novice programmers would solve this problem like this:
293 | 
294 |  1. Write a function `range_overlap`.
295 |  2. Call it interactively on two or three different inputs.
296 |  3. If it produces the wrong answer, fix the function and re-run that test.
297 | 
298 | This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way:
299 | 
300 | 1. Write a short function for each test.
301 | 2. Write a `range_overlap` function that should pass those tests.
302 | 3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.
303 | 
304 | Writing the tests before writing the function they exercise is called `test-driven development` (TDD). Its advocates believe it produces better code faster because:
305 | 
306 | 1. If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.
307 | 2. Writing tests helps programmers figure out what the function is actually supposed to do.
308 | 
309 | Here are three tests for `range_overlap`, written as assertions:
310 | 
311 | 
312 | 
313 | 
314 | 
315 | ``` python
316 | assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
317 | assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
318 | assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
319 | ```
320 | 
321 | 
322 | 
323 | 
324 | 
325 | ```python
326 | ---------------------------------------------------------------------------
327 | NameError                                 Traceback (most recent call last)
328 | <ipython-input-9-dc16b942c085> in <module>()
329 | ----> 1 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
330 |       2 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
331 |       3 assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
332 | 
333 | NameError: name 'range_overlap' is not defined
334 | ```
335 | 
336 | 
337 | 
338 | 
339 | 
340 | 
341 | The error is actually reassuring: we haven’t written `range_overlap` yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function.
342 | 
343 | And as a bonus of writing these tests, we’ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.
344 | 
345 | Something important is missing, though. We don’t have any tests for the case where the ranges don’t overlap at all:
346 | 
347 | 
348 | 
349 | 
350 | 
351 | ```python
352 | assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ???
353 | ```
354 | 
355 | 
356 | 
357 | 
358 | 
359 | What should `range_overlap` do in this case: fail with an error message, produce a special value like `(0.0, 0.0)` to signal that there’s no overlap, or *something* else? Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is best *before* we’re emotionally invested in whatever we happened to write before we realized there was an issue.
360 | 
361 | And what about this case?
362 | 
363 | 
364 | 
365 | 
366 | 
367 | ```python
368 | assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ???
369 | ```
370 | 
371 | 
372 | 
373 | 
374 | 
375 | Do two segments that touch at their endpoints overlap or not? Mathematicians usually say “yes”, but engineers usually say “no”. The best answer is “whatever is most useful in the rest of our program”, but again, any actual implementation of `range_overlap` is going to do *something*, and whatever it is ought to be consistent with what it does when there’s no overlap at all.
376 | 
377 | Since we’re planning to use the range this function returns as the X axis in a time series chart, we decide that:
378 | 
379 |  1. every overlap has to have non-zero width, and
380 |  2. we will return the special value `None` when there’s no overlap.
381 |  
382 | `None` is built into Python, and means “nothing here”. (Other languages often call the equivalent value `null` or `nil`.) With that decision made, we can finish writing our last two tests:
383 | 
384 | 
385 | 
386 | 
387 | 
388 | ```python
389 | assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
390 | assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
391 | ```
392 | 
393 | 
394 | 
395 | 
396 | 
397 | ```python
398 | ---------------------------------------------------------------------------
399 | NameError                                 Traceback (most recent call last)
400 | <ipython-input-13-42de7ddfb428> in <module>()
401 | ----> 1 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
402 |       2 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
403 | 
404 | NameError: name 'range_overlap' is not defined
405 | ```
406 | 
407 | 
408 | 
409 | 
410 | 
411 | Again, we get an error because we haven’t written our function, but we’re now ready to do so:
412 | 
413 | 
414 | 
415 | 
416 | 
417 | 
418 | 
419 | ```python
420 | def range_overlap(ranges):
421 |     '''Return common overlap among a set of [low, high] ranges.'''
422 |     lowest = 0.0
423 |     highest = 1.0
424 |     for (low, high) in ranges:
425 |         lowest = max(lowest, low)
426 |         highest = min(highest, high)
427 |     return (lowest, highest)
428 | ```
429 | 
430 | 
431 | 
432 | 
433 | 
434 | (Take a moment to think about why we use `max` to raise `lowest` and `min` to lower `highest`.) We’d now like to re-run our tests, but they’re scattered across three different cells. To make running them easier, let’s put them all in a function:
435 | 
436 | 
437 | 
438 | 
439 | 
440 | 
441 | ```python
442 | def test_range_overlap():
443 |     assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
444 |     assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
445 |     assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
446 |     assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
447 |     assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
448 | ```
449 | 
450 | 
451 | 
452 | 
453 | 
454 | We can now test `range_overlap` with a single function call:
455 | 
456 | 
457 | 
458 | 
459 | 
460 | ```python
461 | test_range_overlap()
462 | ```
463 | 
464 | 
465 | 
466 | 
467 | 
468 | ```python
469 | ---------------------------------------------------------------------------
470 | AssertionError                            Traceback (most recent call last)
471 | <ipython-input-16-80290759369d> in <module>()
472 | ----> 1 test_range_overlap()
473 | 
474 | <ipython-input-15-d61f343ad67a> in test_range_overlap()
475 |       1 def test_range_overlap():
476 | ----> 2     assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
477 |       3     assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
478 |       4     assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
479 |       5     assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
480 | 
481 | AssertionError: 
482 | ```
483 | 
484 | 
485 | 
486 | 
487 | 
488 | The first test that was supposed to produce `None` fails, so we know something is wrong with our function. We don’t know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: always initialize from data.
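
Because `assert` stops at the first failure, one way to see the state of every test at once is to store the cases as data and loop over them. The sketch below is only an illustration: the `cases` list and the `report_failures` helper are hypothetical names that are not part of the lesson, and `range_overlap` is repeated here in its first-draft (buggy) form so that the example is self-contained.

```python
def range_overlap(ranges):
    '''First-draft version from above, known to be buggy.'''
    lowest = 0.0
    highest = 1.0
    for (low, high) in ranges:
        lowest = max(lowest, low)
        highest = min(highest, high)
    return (lowest, highest)

# Each case pairs an input with the result we expect for it.
cases = [
    ([(0.0, 1.0), (5.0, 6.0)], None),
    ([(0.0, 1.0), (1.0, 2.0)], None),
    ([(0.0, 1.0)], (0.0, 1.0)),
    ([(2.0, 3.0), (2.0, 4.0)], (2.0, 3.0)),
    ([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)], (0.0, 1.0)),
]

def report_failures(function, cases):
    '''Run every case and return those whose result differs from the expectation.'''
    failures = []
    for (ranges, expected) in cases:
        actual = function(ranges)
        if actual != expected:
            failures.append((ranges, expected, actual))
    return failures

failures = report_failures(range_overlap, cases)
for (ranges, expected, actual) in failures:
    print(ranges, 'expected', expected, 'but got', actual)
```

Unlike a chain of `assert` statements, this reports every failing case rather than only the first, which can make it easier to spot a pattern in what is going wrong.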
489 | 
490 | 
491 | 
492 | 
493 | 
494 | Fix `range_overlap`. Re-run `test_range_overlap` after each change you make.
495 | 
496 | 
497 | 
498 | 
499 | <!-- 
500 | 
501 | ```python
504 | def range_overlap(ranges):
505 |     '''Return common overlap among a set of [low, high] ranges.'''
506 |     if not ranges:
507 |         # ranges is None or an empty list
508 |         return None
509 |     lowest, highest = ranges[0]
510 |     for (low, high) in ranges[1:]:
511 |         lowest = max(lowest, low)
512 |         highest = min(highest, high)
513 |     if lowest >= highest:  # no overlap
514 |         return None
515 |     else:
516 |         return (lowest, highest)
517 | ```
518 |  -->
519 | 
520 | 
521 | 
522 | 
523 | ## Key points
524 | 
525 |  - Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
526 | 
527 |  - Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
528 | 
529 |  - Use preconditions to check that the inputs to a function are safe to use.
530 | 
531 |  - Use postconditions to check that the output from a function is safe to use.
532 | 
533 |  - Write tests before writing code in order to help determine exactly what that code is supposed to do.
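
As a minimal sketch of the points about preconditions and postconditions, the function below guards its input and checks its own result with `assert`. The function `count_vowels` and its messages are hypothetical, chosen only for illustration.

```python
def count_vowels(word):
    '''Return the number of vowels in a string.'''
    # Precondition: the input must be a string.
    assert isinstance(word, str), 'word must be a string'

    count = 0
    for char in word:
        if char.lower() in 'aeiou':
            count = count + 1

    # Postcondition: the count can never exceed the number of characters.
    assert 0 <= count <= len(word), 'count out of range'
    return count

print(count_vowels('oxygen'))
```

Calling `count_vowels(42)` would stop with an `AssertionError` at the precondition, before the loop has a chance to fail in a less obvious way.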
534 | 
535 | 
536 | 
537 | 
538 | 
539 | 
543 | 
544 | 
545 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/loops.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Automation with Loops"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {
 13 |     "tags": [
 14 |      "solution"
 15 |     ]
 16 |    },
 17 |    "source": [
 18 |     "## Instructor notes\n",
 19 |     "\n",
 20 |     "*Estimated teaching time:* 30 min\n",
 21 |     "\n",
 22 |     "*Estimated challenge time:* 0 min\n",
 23 |     "\n",
 24 |     "*Key questions:*\n",
 25 |     "\n",
 26 |     "  - \"How can I do the same operations on many different values?\"\n",
 27 |     "    \n",
 28 |     "*Learning objectives:*\n",
 29 |     "\n",
 30 |     "  - \"Explain what a `for` loop does.\"\n",
 31 |     "  - \"Correctly write `for` loops to repeat simple calculations.\"\n",
 32 |     "  - \"Trace changes to a loop variable as the loop runs.\"\n",
 33 |     "  - \"Trace changes to other variables as they are updated by a `for` loop.\"\n",
 34 |     "\n",
 35 |     "*Key points:*\n",
 36 |     "\n",
 37 |     "  - \"Use `for variable in sequence` to process the elements of a sequence one at a time.\"\n",
 38 |     "  - \"The body of a `for` loop must be indented.\"\n",
 39 |     "  - \"Use `len(thing)` to determine the length of something that contains other values.\"\n",
 40 |     "\n",
 41 |     "---"
 42 |    ]
 43 |   },
 44 |   {
 45 |    "cell_type": "markdown",
 46 |    "metadata": {},
 47 |    "source": [
 48 |     "An example task that we might want to repeat is printing each character in a\n",
 49 |     "word on a line of its own."
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "code",
 54 |    "execution_count": 17,
 55 |    "metadata": {},
 56 |    "outputs": [],
 57 |    "source": [
 58 |     "word = 'lead'"
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "markdown",
 63 |    "metadata": {},
 64 |    "source": [
 65 |     "We can access a character in a string using its index. For example, we can get the first\n",
 66 |     "character of the word `'lead'` by using `word[0]`. One way to print each character is to use\n",
 67 |     "four `print` statements:"
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "code",
 72 |    "execution_count": 18,
 73 |    "metadata": {},
 74 |    "outputs": [
 75 |     {
 76 |      "name": "stdout",
 77 |      "output_type": "stream",
 78 |      "text": [
 79 |       "l\n",
 80 |       "e\n",
 81 |       "a\n",
 82 |       "d\n"
 83 |      ]
 84 |     }
 85 |    ],
 86 |    "source": [
 87 |     "print(word[0])\n",
 88 |     "print(word[1])\n",
 89 |     "print(word[2])\n",
 90 |     "print(word[3])"
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "markdown",
 95 |    "metadata": {},
 96 |    "source": [
 97 |     "While this works, it's a bad approach for two reasons:\n",
 98 |     "\n",
 99 |     "1. It doesn't scale:\n",
100 |     "   if we want to print the characters in a string that's hundreds of letters long,\n",
101 |     "   we'd be better off just typing them in.\n",
102 |     "\n",
103 |     "2. It's fragile:\n",
104 |     "   if we give it a longer string,\n",
105 |     "   it only prints part of the data,\n",
106 |     "   and if we give it a shorter one,\n",
107 |     "   it produces an error because we're asking for characters that don't exist.\n",
108 |     "\n"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "markdown",
113 |    "metadata": {},
114 |    "source": [
115 |     "Running:\n",
116 |     "\n",
117 |     "```python\n",
118 |     "word = 'tin'\n",
119 |     "print(word[0])\n",
120 |     "print(word[1])\n",
121 |     "print(word[2])\n",
122 |     "print(word[3])\n",
123 |     "```"
124 |    ]
125 |   },
126 |   {
127 |    "cell_type": "markdown",
128 |    "metadata": {},
129 |    "source": [
130 |     "Gives the error:\n",
131 |     "\n",
132 |     "```\n",
133 |     "---------------------------------------------------------------------------\n",
134 |     "IndexError                                Traceback (most recent call last)\n",
135 |     "<ipython-input-4-e59d5eac5430> in <module>()\n",
136 |     "      3 print(word[1])\n",
137 |     "      4 print(word[2])\n",
138 |     "----> 5 print(word[3])\n",
139 |     "\n",
140 |     "IndexError: string index out of range\n",
141 |     "```"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "markdown",
146 |    "metadata": {},
147 |    "source": [
148 |     "\n",
149 |     "\n",
150 |     "Here's a better approach:\n",
151 |     "\n"
152 |    ]
153 |   },
154 |   {
155 |    "cell_type": "code",
156 |    "execution_count": 19,
157 |    "metadata": {},
158 |    "outputs": [
159 |     {
160 |      "name": "stdout",
161 |      "output_type": "stream",
162 |      "text": [
163 |       "l\n",
164 |       "e\n",
165 |       "a\n",
166 |       "d\n"
167 |      ]
168 |     }
169 |    ],
170 |    "source": [
171 |     "word = 'lead'\n",
172 |     "for char in word:\n",
173 |     "    print(char)"
174 |    ]
175 |   },
176 |   {
177 |    "cell_type": "markdown",
178 |    "metadata": {},
179 |    "source": [
180 |     "This is shorter --- certainly shorter than something that prints every character in a hundred-letter string --- and\n",
181 |     "more robust as well:"
182 |    ]
183 |   },
184 |   {
185 |    "cell_type": "code",
186 |    "execution_count": 20,
187 |    "metadata": {},
188 |    "outputs": [
189 |     {
190 |      "name": "stdout",
191 |      "output_type": "stream",
192 |      "text": [
193 |       "o\n",
194 |       "x\n",
195 |       "y\n",
196 |       "g\n",
197 |       "e\n",
198 |       "n\n"
199 |      ]
200 |     }
201 |    ],
202 |    "source": [
203 |     "word = 'oxygen'\n",
204 |     "for char in word:\n",
205 |     "    print(char)"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "The improved version uses a **for loop** to repeat an operation --- in this case, printing --- once for each thing in a sequence.\n",
213 |     "The general form of a loop is:\n",
214 |     "\n",
215 |     "```python\n",
216 |     "for variable in collection:\n",
217 |     "    # do things with variable\n",
218 |     "```"
219 |    ]
220 |   },
221 |   {
222 |    "cell_type": "markdown",
223 |    "metadata": {},
224 |    "source": [
225 |     "\n",
226 |     "Using the oxygen example above, the loop might look like this:\n",
227 |     "\n",
228 |     "![loop_image](images/loops_image.png)\n",
229 |     "\n",
230 |     "where each character (`char`) in the variable `word` is looped through and printed one character after another.\n",
231 |     "The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).\n",
232 |     "\n",
233 |     "We can call the **loop variable** anything we like,\n",
234 |     "but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.\n",
235 |     "\n",
236 |     "\n"
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "markdown",
241 |    "metadata": {},
242 |    "source": [
243 |     "## What's in a name?\n",
244 |     "\n",
245 |     "\n",
246 |     "In the example above, the loop variable was given the name `char` as a mnemonic; it is short for 'character'. \n",
247 |     "We can choose any name we want for variables. We might just as easily have chosen the name `banana` for the loop variable, as long as we use the same name when we invoke the variable inside the loop:\n",
248 |     "\n"
249 |    ]
250 |   },
251 |   {
252 |    "cell_type": "code",
253 |    "execution_count": 21,
254 |    "metadata": {},
255 |    "outputs": [
256 |     {
257 |      "name": "stdout",
258 |      "output_type": "stream",
259 |      "text": [
260 |       "o\n",
261 |       "x\n",
262 |       "y\n",
263 |       "g\n",
264 |       "e\n",
265 |       "n\n"
266 |      ]
267 |     }
268 |    ],
269 |    "source": [
270 |     "word = 'oxygen'\n",
271 |     "for banana in word:\n",
272 |     "    print(banana)"
273 |    ]
274 |   },
275 |   {
276 |    "cell_type": "markdown",
277 |    "metadata": {},
278 |    "source": [
279 |     "It is a good idea to choose meaningful variable names; otherwise it is more difficult to understand what the loop is doing.\n",
280 |     "\n",
281 |     "\n",
282 |     "Here's another loop that repeatedly updates a variable:"
283 |    ]
284 |   },
285 |   {
286 |    "cell_type": "code",
287 |    "execution_count": 22,
288 |    "metadata": {},
289 |    "outputs": [
290 |     {
291 |      "name": "stdout",
292 |      "output_type": "stream",
293 |      "text": [
294 |       "There are 5 vowels\n"
295 |      ]
296 |     }
297 |    ],
298 |    "source": [
299 |     "length = 0\n",
300 |     "for vowel in 'aeiou':\n",
301 |     "    length = length + 1\n",
302 |     "print('There are', length, 'vowels')"
303 |    ]
304 |   },
305 |   {
306 |    "cell_type": "markdown",
307 |    "metadata": {},
308 |    "source": [
309 |     "It's worth tracing the execution of this little program step by step.\n",
310 |     "\n",
311 |     "Since there are five characters in `'aeiou'`,\n",
312 |     "the statement on line 3 will be executed five times.\n",
313 |     "\n",
314 |     "The first time around,\n",
315 |     "`length` is zero (the value assigned to it on line 1)\n",
316 |     "and `vowel` is `'a'`.\n",
317 |     "The statement adds 1 to the old value of `length`,\n",
318 |     "producing 1,\n",
319 |     "and updates `length` to refer to that new value.\n",
320 |     "\n",
321 |     "The next time around,\n",
322 |     "`vowel` is `'e'` and `length` is 1,\n",
323 |     "so `length` is updated to be 2.\n",
324 |     "\n",
325 |     "After three more updates,\n",
326 |     "`length` is 5;\n",
327 |     "since there is nothing left in `'aeiou'` for Python to process,\n",
328 |     "the loop finishes\n",
329 |     "and the `print` statement on line 4 tells us our final answer.\n",
330 |     "\n",
331 |     "Note that the loop variable `vowel` is just a variable that's being used to record progress in a loop."
332 |    ]
333 |   },
334 |   {
335 |    "cell_type": "markdown",
336 |    "metadata": {},
337 |    "source": [
338 |     "## Challenge - scope of the loop variable\n",
339 |     "\n",
340 |     "1. In the loop over `\"aeiou\"` above, does the loop variable `vowel` exist after the loop has finished?\n"
341 |    ]
342 |   },
343 |   {
344 |    "cell_type": "code",
345 |    "execution_count": 23,
346 |    "metadata": {},
347 |    "outputs": [
348 |     {
349 |      "name": "stdout",
350 |      "output_type": "stream",
351 |      "text": [
352 |       "After the loop, `vowel` exists and has the value: u\n"
353 |      ]
354 |     }
355 |    ],
356 |    "source": [
357 |     "length = 0\n",
358 |     "for vowel in 'aeiou':\n",
359 |     "    length = length + 1\n",
360 |     "print('After the loop, `vowel` exists and has the value: ' + vowel)\n",
361 |     "\n",
362 |     "# The loop variable `vowel` exists after the loop is completed, not only inside the loop"
363 |    ]
364 |   },
365 |   {
366 |    "cell_type": "markdown",
367 |    "metadata": {},
368 |    "source": [
369 |     "Note also that finding the length of a string is such a common operation that Python actually has a built-in function to do it called `len`:"
370 |    ]
371 |   },
372 |   {
373 |    "cell_type": "code",
374 |    "execution_count": 24,
375 |    "metadata": {},
376 |    "outputs": [
377 |     {
378 |      "name": "stdout",
379 |      "output_type": "stream",
380 |      "text": [
381 |       "5\n"
382 |      ]
383 |     }
384 |    ],
385 |    "source": [
386 |     "print(len('aeiou'))"
387 |    ]
388 |   },
389 |   {
390 |    "cell_type": "markdown",
391 |    "metadata": {},
392 |    "source": [
393 |     "`len` is much faster than any function we could write ourselves,\n",
394 |     "and much easier to read than a two-line loop;\n",
395 |     "it will also give us the length of many other things that we haven't met yet,\n",
396 |     "so we should always use it when we can."
397 |    ]
398 |   },
399 |   {
400 |    "cell_type": "markdown",
401 |    "metadata": {},
402 |    "source": [
403 |     "## From 1 to N\n",
404 |     "\n",
405 |     "Python has a built-in function called `range` that creates a sequence of numbers. `range` can\n",
406 |     "accept 1, 2, or 3 parameters.\n",
407 |     "\n",
408 |     "* If one parameter is given, `range` creates a sequence of that length,\n",
409 |     "  starting at zero and incrementing by 1.\n",
410 |     "  For example, `range(3)` produces the numbers `0, 1, 2`.\n",
411 |     "* If two parameters are given, `range` starts at\n",
412 |     "  the first and ends just before the second, incrementing by one.\n",
413 |     "  For example, `range(2, 5)` produces `2, 3, 4`.\n",
414 |     "* If `range` is given 3 parameters,\n",
415 |     "  it starts at the first one, ends just before the second one, and increments by the third one.\n",
416 |     "  For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.\n",
417 |     "\n"
418 |    ]
419 |   },
420 |   {
421 |    "cell_type": "markdown",
422 |    "metadata": {
423 |     "tags": [
424 |      "challenge"
425 |     ]
426 |    },
427 |    "source": [
428 |     "## Challenge - loop over a range\n",
429 |     "Using `range`,\n",
430 |     "write a loop that prints the first 3 natural numbers:\n",
431 |     "\n",
432 |     "```\n",
433 |     "1\n",
434 |     "2\n",
435 |     "3\n",
436 |     "```\n",
437 |     "\n",
438 |     "\n"
439 |    ]
440 |   },
441 |   {
442 |    "cell_type": "markdown",
443 |    "metadata": {
444 |     "tags": [
445 |      "solution"
446 |     ]
447 |    },
448 |    "source": [
449 |     "## Solution"
450 |    ]
451 |   },
452 |   {
453 |    "cell_type": "code",
454 |    "execution_count": 25,
455 |    "metadata": {
456 |     "tags": [
457 |      "solution"
458 |     ]
459 |    },
460 |    "outputs": [
461 |     {
462 |      "name": "stdout",
463 |      "output_type": "stream",
464 |      "text": [
465 |       "1\n",
466 |       "2\n",
467 |       "3\n"
468 |      ]
469 |     }
470 |    ],
471 |    "source": [
472 |     "for i in range(1, 4):\n",
473 |     "    print(i)"
474 |    ]
475 |   },
476 |   {
477 |    "cell_type": "markdown",
478 |    "metadata": {},
479 |    "source": [
480 |     "## Computing Powers With Loops\n",
481 |     "\n",
482 |     "Exponentiation is built into Python:"
483 |    ]
484 |   },
485 |   {
486 |    "cell_type": "code",
487 |    "execution_count": 26,
488 |    "metadata": {},
489 |    "outputs": [
490 |     {
491 |      "name": "stdout",
492 |      "output_type": "stream",
493 |      "text": [
494 |       "125\n"
495 |      ]
496 |     }
497 |    ],
498 |    "source": [
499 |     "print(5 ** 3)"
500 |    ]
501 |   },
502 |   {
503 |    "cell_type": "markdown",
504 |    "metadata": {
505 |     "tags": [
506 |      "challenge"
507 |     ]
508 |    },
509 |    "source": [
510 |     "## Challenge - multiplication in a loop\n",
511 |     "\n",
512 |     "Write a loop that calculates the same result as `5 ** 3` using\n",
513 |     "multiplication (and without exponentiation)."
514 |    ]
515 |   },
516 |   {
517 |    "cell_type": "markdown",
518 |    "metadata": {
519 |     "tags": [
520 |      "solution"
521 |     ]
522 |    },
523 |    "source": [
524 |     "## Solution"
525 |    ]
526 |   },
527 |   {
528 |    "cell_type": "code",
529 |    "execution_count": 27,
530 |    "metadata": {
531 |     "tags": [
532 |      "solution"
533 |     ]
534 |    },
535 |    "outputs": [
536 |     {
537 |      "name": "stdout",
538 |      "output_type": "stream",
539 |      "text": [
540 |       "125\n"
541 |      ]
542 |     }
543 |    ],
544 |    "source": [
545 |     "result = 1\n",
546 |     "for i in range(0, 3):\n",
547 |     "    result = result * 5\n",
548 |     "print(result)"
549 |    ]
550 |   },
551 |   {
552 |    "cell_type": "markdown",
553 |    "metadata": {
554 |     "tags": [
555 |      "challenge"
556 |     ]
557 |    },
558 |    "source": [
559 |     "## Bonus challenge: reverse a string\n",
560 |     "\n",
561 |     "Knowing that two strings can be concatenated using the `+` operator,\n",
562 |     "write a loop that takes a string\n",
563 |     "and produces a new string with the characters in reverse order,\n",
564 |     "so `'Newton'` becomes `'notweN'`."
565 |    ]
566 |   },
567 |   {
568 |    "cell_type": "markdown",
569 |    "metadata": {
570 |     "tags": [
571 |      "solution"
572 |     ]
573 |    },
574 |    "source": [
575 |     "## Solution"
576 |    ]
577 |   },
578 |   {
579 |    "cell_type": "code",
580 |    "execution_count": 28,
581 |    "metadata": {
582 |     "tags": [
583 |      "solution"
584 |     ]
585 |    },
586 |    "outputs": [
587 |     {
588 |      "name": "stdout",
589 |      "output_type": "stream",
590 |      "text": [
591 |       "notweN\n"
592 |      ]
593 |     }
594 |    ],
595 |    "source": [
596 |     "newstring = ''\n",
597 |     "oldstring = 'Newton'\n",
598 |     "for char in oldstring:\n",
599 |     "    newstring = char + newstring\n",
600 |     "print(newstring)"
601 |    ]
602 |   },
603 |   {
604 |    "cell_type": "markdown",
605 |    "metadata": {},
606 |    "source": [
607 |     "## Enumerate\n",
608 |     "\n",
609 |     "The built-in function `enumerate` takes a sequence (e.g. a list) and generates a\n",
610 |     "new sequence of the same length. Each element of the new sequence is a pair composed of the index\n",
611 |     "(0, 1, 2,...) and the value from the original sequence:\n",
612 |     "\n",
613 |     "```\n",
614 |     "for i, x in enumerate(xs):\n",
615 |     "    # Do something with i and x\n",
616 |     "```\n",
617 |     "\n",
618 |     "\n",
619 |     "The code above loops through `xs`, assigning the index to `i` and the value to `x`."
620 |    ]
621 |   },
622 |   {
623 |    "cell_type": "markdown",
624 |    "metadata": {
625 |     "tags": [
626 |      "challenge"
627 |     ]
628 |    },
629 |    "source": [
630 |     "## Bonus challenge: enumeration for computing the value of a polynomial\n",
631 |     "\n",
632 |     "Suppose you have encoded a polynomial as a list of coefficients in\n",
633 |     "the following way: the first element is the constant term, the\n",
634 |     "second element is the coefficient of the linear term, the third is the\n",
635 |     "coefficient of the quadratic term, etc.\n",
636 |     "\n",
637 |     "```\n",
638 |     "x = 5\n",
639 |     "cc = [2, 4, 3]\n",
640 |     "```\n",
641 |     "\n",
642 |     "\n",
643 |     "```\n",
644 |     "y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2\n",
645 |     "y = 97\n",
646 |     "```\n",
647 |     "\n",
648 |     "\n",
649 |     "Write a loop using `enumerate(cc)` which computes the value `y` of any\n",
650 |     "polynomial, given `x` and `cc`."
651 |    ]
652 |   },
653 |   {
654 |    "cell_type": "markdown",
655 |    "metadata": {
656 |     "tags": [
657 |      "solution"
658 |     ]
659 |    },
660 |    "source": [
661 |     "## Solution"
662 |    ]
663 |   },
664 |   {
665 |    "cell_type": "code",
666 |    "execution_count": 29,
667 |    "metadata": {
668 |     "tags": [
669 |      "solution"
670 |     ]
671 |    },
672 |    "outputs": [
673 |     {
674 |      "name": "stdout",
675 |      "output_type": "stream",
676 |      "text": [
677 |       "97\n"
678 |      ]
679 |     }
680 |    ],
681 |    "source": [
682 |     "x = 5\n",
683 |     "cc = [2, 4, 3]\n",
684 |     "y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2\n",
685 |     "\n",
686 |     "y = 0\n",
687 |     "for i, c in enumerate(cc):\n",
688 |     "    y = y + x**i * c\n",
689 |     "    \n",
690 |     "print(y)"
691 |    ]
692 |   },
693 |   {
694 |    "cell_type": "code",
695 |    "execution_count": null,
696 |    "metadata": {},
697 |    "outputs": [],
698 |    "source": []
699 |   }
700 |  ],
701 |  "metadata": {
702 |   "celltoolbar": "Tags",
703 |   "kernelspec": {
704 |    "display_name": "Python 3",
705 |    "language": "python",
706 |    "name": "python3"
707 |   },
708 |   "language_info": {
709 |    "codemirror_mode": {
710 |     "name": "ipython",
711 |     "version": 3
712 |    },
713 |    "file_extension": ".py",
714 |    "mimetype": "text/x-python",
715 |    "name": "python",
716 |    "nbconvert_exporter": "python",
717 |    "pygments_lexer": "ipython3",
718 |    "version": "3.6.6"
719 |   }
720 |  },
721 |  "nbformat": 4,
722 |  "nbformat_minor": 2
723 | }
724 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/plotting_with_ggplot.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | <style>
  4 | .output_label {
  5 |     text-align: right;
  6 |     margin: -1em;
  7 |     padding: 0;
  8 |     font-size: 0.5em;
  9 |     color: grey
 10 | }
 11 | </style>
 12 | 
 13 | 
 14 | # Making Plots With plotnine (aka ggplot)
 15 | 
 16 | 
 17 | 
 18 | 
 19 | <!-- 
 20 | ## Instructor notes
 21 | 
 22 | *Estimated teaching time:* 40 min
 23 | 
 24 | *Estimated challenge time:* 50 min
 25 | 
 26 | *Key questions:*
 27 | 
 28 |   - "How can I visualize data in Python?"
 29 |   - "What is the 'grammar of graphics'?"
 30 |     
 31 | *Learning objectives:*
 32 | 
 33 |   - "Familiarise yourself with The Grammar of Graphics through the plotnine library"
 34 |   - "Create a ggplot object."
 35 |   - "Explore different geom objects"
 36 |   - "Explore other layers of ggplot, including themes and labels"
 37 | 
 38 | *Key points:*
 39 | 
 40 |   - "plotnine is a Python implementation of The Grammar of Graphics"
 41 |   - "ggplot is a set of grammar rules for making publication-quality plots"
 42 |   - "ggplot is built around layers: building a plot is just adding different layers together"
 43 |  -->
 44 | 
 45 | 
 46 | 
 47 | 
 48 | ## Introduction
 49 | 
 50 | Python has a number of powerful plotting libraries to choose from. One of the oldest and most popular is [`matplotlib`](https://matplotlib.org/), which forms the foundation for many other Python plotting libraries. For this exercise we are going to use [`plotnine`](https://plotnine.readthedocs.io/en/stable/), which is a Python implementation of [The Grammar of Graphics](http://link.springer.com/book/10.1007%2F0-387-28695-0), inspired by the interface of the [`ggplot2`](http://ggplot2.org/) package from R. `plotnine` (and its R cousin `ggplot2`) is a very nice way to create publication-quality plots.
 51 | 
 52 | #### The Grammar of Graphics
 53 | 
 54 | > Statistical graphics is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars)
 55 | 
 56 | > Faceting can be used to generate the same plot for different subsets of the dataset
 57 | 
 58 | These are basic building blocks according to the grammar of graphics:
 59 | 
 60 | - **data** The data, plus a set of aesthetic mappings that describe how variables are mapped to visual properties.
 61 | - **geom** Geometric objects, represent what you actually see on the plot: points, lines, polygons, etc.
 62 | - **stats** Statistical transformations, summarise data in many useful ways.
 63 | - **scale** The scales map values in the data space to values in an aesthetic space
 64 | - **coord** A coordinate system, describes how data coordinates are mapped to the plane of the graphic.
 65 | - **facet** A faceting specification, describes how to break up the data into subsets and plot each subset individually.
 66 | 
 67 | Let's explore these in detail.
 68 | 
 69 | 
 70 | 
 71 | 
 72 | 
 73 | First, install the `pandas` and `plotnine` packages to ensure they are available.
 74 | 
 75 | 
 76 | 
 77 | 
 78 | 
 79 | 
 80 | ```python
 81 | !pip install pandas plotnine
 82 | ```
 83 | 
 84 | <pre class="output">
 85 | <div class="output_label">output</div>
 86 | <code class="text">
 87 | Requirement already satisfied: pandas in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (0.25.0)
 88 | Requirement already satisfied: plotnine in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (0.5.1)
 89 | Requirement already satisfied: numpy>=1.13.3 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (1.17.0)
 90 | Requirement already satisfied: python-dateutil>=2.6.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (2.8.0)
 91 | Requirement already satisfied: pytz>=2017.2 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from pandas) (2019.1)
 92 | Requirement already satisfied: descartes>=1.1.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (1.1.0)
 93 | Requirement already satisfied: scipy>=1.0.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (1.3.0)
 94 | Requirement already satisfied: patsy>=0.4.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.5.1)
 95 | Requirement already satisfied: matplotlib>=3.0.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (3.1.1)
 96 | Requirement already satisfied: statsmodels>=0.8.0 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.10.1)
 97 | Requirement already satisfied: mizani>=0.5.2 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from plotnine) (0.5.4)
 98 | Requirement already satisfied: six>=1.5 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
 99 | Requirement already satisfied: cycler>=0.10 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (0.10.0)
100 | Requirement already satisfied: kiwisolver>=1.0.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (1.1.0)
101 | Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from matplotlib>=3.0.0->plotnine) (2.4.1.1)
102 | Requirement already satisfied: palettable in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from mizani>=0.5.2->plotnine) (3.2.0)
103 | Requirement already satisfied: setuptools in /Users/perry/.virtualenvs/python-workshop-base-ufuVBSbV/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib>=3.0.0->plotnine) (39.1.0)
104 | 
105 | </code>
106 | </pre>
107 | 
108 | 
109 | 
110 | 
111 | 
112 | 
113 | ```python
114 | # We run this to suppress various deprecation warnings from plotnine - keeps our notebook cleaner
115 | import warnings
116 | warnings.filterwarnings('ignore')
117 | ```
118 | 
119 | 
120 | 
121 | 
122 | 
123 | # Plotting in ggplot style
124 | 
125 | Let's set up our working environment with the necessary libraries and load our CSV file into a data frame called `survs_df`:
126 | 
127 | 
128 | 
129 | 
130 | 
131 | 
132 | ```python
133 | import numpy as np
134 | import pandas as pd
135 | from plotnine import *
136 | 
137 | %matplotlib inline
138 | survs_df = pd.read_csv('surveys.csv').dropna()
139 | ```
140 | 
141 | 
142 | 
143 | 
144 | 
145 | 
146 | To produce a plot with the `ggplot` class from `plotnine`, we must provide three things:
147 | 
148 | 1. A data frame containing our data.
149 | 2. How the columns of the data frame can be translated into positions, colors, sizes, and shapes of graphical elements ("aesthetics").
150 | 3. The actual graphical elements to display ("geometric objects").
151 | 
152 | 
153 | 
154 | 
155 | 
156 | 
157 | 
158 | ## Introduction to plotting
159 | 
160 | 
161 | 
162 | 
163 | 
164 | 
165 | 
166 | 
167 | ```python
168 | ggplot(survs_df, aes(x='weight', y='hindfoot_length')) + geom_point()
169 | ```
170 | 
171 | 
172 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_10_0.png)
173 | 
174 | 
175 | 
176 | 
177 | 
178 | <pre class="output">
179 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
180 | <code class="text">
181 | <ggplot: (275481864)>
182 | </code>
183 | </pre>
184 | 
185 | 
186 | 
187 | 
188 | 
189 | 
190 | 
191 | Let's see if we can also include information about species and year.
192 | 
193 | 
194 | 
195 | 
196 | 
197 | 
198 | ```python
199 | ggplot(survs_df, aes(x='weight', y='hindfoot_length',
200 |     size = 'year')) + geom_point()
201 | ```
202 | 
203 | 
204 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_12_0.png)
205 | 
206 | 
207 | 
208 | 
209 | 
210 | <pre class="output">
211 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
212 | <code class="text">
213 | <ggplot: (295313142)>
214 | </code>
215 | </pre>
216 | 
217 | 
218 | 
219 | 
220 | 
221 | 
222 | 
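223 | Note that the `x=` and `y=` keywords are optional: they are implied for the first and second positional arguments of `aes()`, so `aes('weight', 'hindfoot_length', size='year')` produces the same plot as above.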
224 | 
225 | 
226 | 
227 | 
228 | 
229 | 
230 | ```python
231 | ggplot(survs_df, aes(x='weight', y='hindfoot_length', 
232 |     size = 'year', color = 'species_id')) + geom_point()
233 | ```
234 | 
235 | 
236 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_14_0.png)
237 | 
238 | 
239 | 
240 | 
241 | 
242 | <pre class="output">
243 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
244 | <code class="text">
245 | <ggplot: (299641906)>
246 | </code>
247 | </pre>
248 | 
249 | 
250 | 
251 | 
252 | 
253 | 
254 | 
255 | 
256 | We can also make a simple count plot, for example to see how many observations (data points) we have for each year:
257 | 
258 | 
259 | 
260 | 
261 | 
262 | 
263 | 
264 | 
265 | ```python
266 | ggplot(survs_df, aes(x='year')) + \
267 |     geom_bar(stat = 'count')
268 | ```
269 | 
270 | 
271 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_16_0.png)
272 | 
273 | 
274 | 
275 | 
276 | 
277 | <pre class="output">
278 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
279 | <code class="text">
280 | <ggplot: (274955419)>
281 | </code>
282 | </pre>
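
As a cross-check on the bar heights, the same per-year counts can be computed directly with pandas. This is a minimal sketch on a tiny made-up data frame (`toy_df` and its values are illustrative assumptions, not rows from `surveys.csv`); with the real data you would apply the same call to `survs_df['year']`.

```python
import pandas as pd

# Hypothetical stand-in for survs_df: one row per observation
toy_df = pd.DataFrame({'year': [1977, 1977, 1978, 1978, 1978, 1979]})

# Count the rows for each year, ordered by year.
# These are the values a count bar plot draws as bar heights.
counts_per_year = toy_df['year'].value_counts().sort_index()
print(counts_per_year)
```

Comparing a few of these numbers against the plot is a quick way to confirm that the chart shows what you think it shows.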
283 | 
284 | 
285 | 
286 | 
287 | 
288 | 
289 | 
290 | 
291 | Let's now also color by species to see how many observations we have per species in a given year:
292 | 
293 | 
294 | 
295 | 
296 | 
297 | 
298 | 
299 | 
300 | ```python
301 | ggplot(survs_df, aes(x='year', fill = 'species_id')) + \
302 |     geom_bar(stat = 'count')
303 | ```
304 | 
305 | 
306 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_18_0.png)
307 | 
308 | 
309 | 
310 | 
311 | 
312 | <pre class="output">
313 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
314 | <code class="text">
315 | <ggplot: (-9223372036559208532)>
316 | </code>
317 | </pre>
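
Each colored segment in the stacked bars corresponds to one year and species combination. As a rough sketch of the table behind such a plot, `pd.crosstab` counts the rows for every combination. The data frame below is a made-up example (`toy_df`, its years and its species codes are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical stand-in for survs_df
toy_df = pd.DataFrame({
    'year': [1977, 1977, 1977, 1978, 1978],
    'species_id': ['DM', 'DM', 'PF', 'DM', 'PF'],
})

# Rows are years, columns are species, cells are observation counts
counts = pd.crosstab(toy_df['year'], toy_df['species_id'])
print(counts)
```

The row sums of this table give the total bar height for each year, and each cell gives the height of one colored segment.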
318 | 
319 | 
320 | 
321 | 
322 | 
323 | 
324 | 
325 | ## Challenges
326 | 
327 | 1. Produce a plot comparing the number of observations for each species at each site. The plot should have `site_id` on the x axis, ideally as categorical data. (HINT: You can convert a column in a DataFrame `df` to the 'category' type using: `df['some_col_name'] = df['some_col_name'].astype('category')`)
328 | 
329 | 2. Create a **boxplot** of `hindfoot_length` across different species (`species_id` column) (HINT: There's a list of _geoms_ available for `plotnine` in the [docs](https://plotnine.readthedocs.io/en/stable/api.html#geoms) - instead of `geom_bar`, which one should you use?)
330 | 
331 | 
332 | 
333 | 
334 | <!-- 
335 | ## Solutions
336 |  -->
337 | 
338 | 
339 | 
340 | <!-- 
341 | 
342 | ```python
343 | # Part 1
344 | 
345 | # We convert site_id into a categorical column.
346 | # This isn't strictly necessary, but with categories we get all the x-axis labels 
347 | # (with continuous we don't by default) - try both and see
348 | survs_df['site_id'] = survs_df['site_id'].astype('category')
349 | 
350 | ggplot(survs_df, aes(x='site_id', fill = 'species_id')) \
351 |     + geom_bar(stat='count')
352 | ```
353 | 
354 | 
355 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_21_0.png)
356 | 
357 | 
358 | 
359 | 
360 | 
361 | <pre class="output">
362 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
363 | <code class="text">
364 | <ggplot: (-9223372036559208665)>
365 | </code>
366 | </pre>
367 | 
368 | 
369 |  -->
370 | 
371 | 
372 | 
373 | <!-- 
374 | 
375 | ```python
376 | # Part 2
377 | ggplot(survs_df, aes(x='species_id', y='hindfoot_length')) + \
378 |     geom_boxplot() + \
379 |     theme(axis_text_x = element_text(angle=90, hjust=1))
380 | ```
381 | 
382 | 
383 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_22_0.png)
384 | 
385 | 
386 | 
387 | 
388 | 
389 | <pre class="output">
390 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
391 | <code class="text">
392 | <ggplot: (-9223372036558968204)>
393 | </code>
394 | </pre>
395 | 
396 | 
397 |  -->
398 | 
399 | 
400 | 
401 | 
402 | ## More geom types
403 | 
404 | 
405 | 
406 | 
407 | 
408 | 
409 | 
410 | 
411 | ```python
412 | ggplot(survs_df, aes(x='year', y='weight')) + \
413 |     geom_boxplot()
414 | ```
415 | 
416 | 
417 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_24_0.png)
418 | 
419 | 
420 | 
421 | 
422 | 
423 | <pre class="output">
424 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
425 | <code class="text">
426 | <ggplot: (-9223372036558968190)>
427 | </code>
428 | </pre>
429 | 
430 | 
431 | 
432 | 
433 | 
434 | 
435 | 
436 | Why are we not seeing multiple boxplots, one for each year?
437 | This is because the `year` variable is continuous in our data frame, but for this purpose we want it to be categorical.
438 | 
439 | 
440 | 
441 | 
442 | 
443 | 
444 | ```python
445 | survs_df['year_fact'] = survs_df['year'].astype("category")
446 | 
447 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
448 |     geom_boxplot()
449 | ```
450 | 
451 | 
452 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_26_0.png)
453 | 
454 | 
455 | 
456 | 
457 | 
458 | <pre class="output">
459 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
460 | <code class="text">
461 | <ggplot: (-9223372036553754957)>
462 | </code>
463 | </pre>
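
To see what the conversion does to the column itself, here is a small sketch on a made-up data frame (`toy_df` is an assumption for illustration; the real notebook applies the same call to `survs_df`). The integer column becomes a categorical column whose distinct values are its categories, which is what lets the plot draw one box per year.

```python
import pandas as pd

# Hypothetical stand-in for survs_df
toy_df = pd.DataFrame({'year': [1977, 1978, 1978, 2002]})
print(toy_df['year'].dtype)          # an integer dtype, treated as continuous

# Same conversion as in the lesson, on the toy data
toy_df['year_fact'] = toy_df['year'].astype('category')
print(toy_df['year_fact'].dtype)     # category

# Each distinct year is now a separate category
year_categories = list(toy_df['year_fact'].cat.categories)
print(year_categories)
```

Keeping the original `year` column and adding `year_fact` alongside it, as the lesson does, leaves the numeric version available for calculations.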
464 | 
465 | 
466 | 
467 | 
468 | 
469 | 
470 | 
471 | You'll notice the x-axis labels overlap. To make them less cluttered, we can rotate them 90 degrees by applying a `theme`. We will revisit themes later.
472 | 
473 | 
474 | 
475 | 
476 | 
477 | 
478 | ```python
479 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
480 |     geom_boxplot() + \
481 |     theme(axis_text_x = element_text(angle=90, hjust=1))
482 | ```
483 | 
484 | 
485 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_28_0.png)
486 | 
487 | 
488 | 
489 | 
490 | 
491 | <pre class="output">
492 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
493 | <code class="text">
494 | <ggplot: (-9223372036553653735)>
495 | </code>
496 | </pre>
497 | 
498 | 
499 | 
500 | 
501 | 
502 | 
503 | 
504 | To save some typing, let's define this x-axis label rotating theme as a short variable name that we can reuse:
505 | 
506 | 
507 | 
508 | 
509 | 
510 | 
511 | ```python
512 | flip_xlabels = theme(axis_text_x = element_text(angle=90, hjust=1))
513 | ```
514 | 
515 | 
516 | 
517 | 
518 | 
519 | 
520 | ```python
521 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
522 |     geom_violin() + \
523 |     flip_xlabels
524 | ```
525 | 
526 | 
527 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_31_0.png)
528 | 
529 | 
530 | 
531 | 
532 | 
533 | <pre class="output">
534 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
535 | <code class="text">
536 | <ggplot: (301020918)>
537 | </code>
538 | </pre>
539 | 
540 | 
541 | 
542 | 
543 | 
544 | 
545 | 
546 | To save an image for later:
547 | 
548 | 
549 | 
550 | 
551 | 
552 | 
553 | ```python
554 | plt1 = ggplot(survs_df, aes(x='year_fact', y='weight')) + \
555 |            geom_boxplot() + \
556 |            xlab("Years") + \
557 |            ylab("Weight") + \
558 |            ggtitle("Boxplots, summary of species weight in each year")
559 | 
560 | ggsave(filename="plot1.png",
561 |        plot=plt1,
562 |        device='png',
563 |        dpi=300,
564 |        height=25,
565 |        width=25)
566 | ```
567 | 
568 | 
569 | 
570 | 
571 | 
572 | ## Challenges
573 | 
574 | 1. Can you log2 transform `weight` and plot a "normalised" boxplot? Hint: use the `np.log2()` function and name the new column `weight_log`.
575 | 
576 | 2. Does a log2 transform make this data visualisation better?
577 | 
578 | 
579 | 
580 | 
581 | <!-- 
582 | ## Solution
583 |  -->
584 | 
585 | 
586 | 
587 | <!-- 
588 | 
589 | ```python
590 | survs_df['weight_log'] = np.log2(survs_df['weight'])
591 |     
592 | ggplot(survs_df, aes(x='year_fact', y='weight_log')) + \
593 |     geom_boxplot() + \
594 |     xlab("Years") + \
595 |     ylab("Weight log2(kg)") + \
596 |     ggtitle("Boxplots, summary of species weight in each year") + \
597 |     theme(axis_text_x = element_text(angle=90, hjust=1))
598 | ```
599 | 
600 | 
601 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_36_0.png)
602 | 
603 | 
604 | 
605 | 
606 | 
607 | <pre class="output">
608 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
609 | <code class="text">
610 | <ggplot: (-9223372036558995035)>
611 | </code>
612 | </pre>
613 | 
614 | 
615 |  -->
616 | 
617 | 
618 | 
619 | 
620 | ## Faceting
621 | 
622 | ggplot has a special technique called *faceting* that allows us to split one plot
623 | into multiple plots based on a factor included in the dataset. We will use it to
624 | make a separate time-series plot first for each sex, then for each species.
625 | 
626 | 
627 | 
628 | 
629 | 
630 | 
631 | 
632 | 
633 | ```python
634 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
635 |     geom_boxplot() + \
636 |     facet_wrap(['sex']) + \
637 |     flip_xlabels + \
638 |     theme(axis_text_x = element_text(size=6))
639 | ```
640 | 
641 | 
642 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_38_0.png)
643 | 
644 | 
645 | 
646 | 
647 | 
648 | <pre class="output">
649 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
650 | <code class="text">
651 | <ggplot: (301142420)>
652 | </code>
653 | </pre>
654 | 
655 | 
656 | 
657 | 
658 | 
659 | 
660 | 
661 | 
662 | ```python
663 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
664 |     geom_boxplot() + \
665 |     theme(axis_text_x = element_text(size=4)) + \
666 |     facet_wrap(['species_id']) + \
667 |     flip_xlabels
668 | ```
669 | 
670 | 
671 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_39_0.png)
672 | 
673 | 
674 | 
675 | 
676 | 
677 | <pre class="output">
678 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
679 | <code class="text">
680 | <ggplot: (295742412)>
681 | </code>
682 | </pre>
683 | 
684 | 
685 | 
686 | 
687 | 
688 | 
689 | 
690 | The two faceted plots above are probably easier to interpret using the `weight_log` column we created - give it a try !
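
Why might the log scale be easier to read? A log2 transform compresses large values much more than small ones, so species with very different body weights fit on one axis. The sketch below uses made-up weights (the `weights` array is an assumption, not data from `surveys.csv`) to show the effect numerically.

```python
import numpy as np

# Hypothetical weights spanning a 64-fold range
weights = np.array([4.0, 8.0, 64.0, 256.0])

# Each doubling of weight adds exactly 1 on the log2 scale
weight_log = np.log2(weights)
print(weight_log)

# Range on each scale: 252 units raw, 6 units after the transform
raw_range = weights.max() - weights.min()
log_range = weight_log.max() - weight_log.min()
print(raw_range, log_range)
```

On the raw scale the lightest animals are squeezed near zero, while on the log scale the same data are spread evenly by the number of doublings.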
691 | 
692 | 
693 | 
694 | 
695 | 
696 | ## The "Layered Grammar of Graphics"
697 | 
698 | ```erlang
699 | ggplot(data = <DATA>) + 
700 |   <GEOM_FUNCTION>(
701 |      mapping = aes(<MAPPINGS>),
702 |      stat = <STAT>, 
703 |      position = <POSITION>
704 |   ) +
705 |   <COORDINATE_FUNCTION> +
706 |   <FACET_FUNCTION>
707 | ```
708 | 
709 | 
710 | 
711 | 
712 | 
713 | ## Theming
714 | 
715 | `plotnine` allows pre-defined 'themes' to be applied as aesthetics to the plot.
716 | 
717 | A list of available themes you may want to experiment with is here: https://plotnine.readthedocs.io/en/stable/api.html#themes
718 | 
719 | 
720 | 
721 | 
722 | 
723 | 
724 | ```python
725 | ggplot(survs_df, aes(x='year_fact', y='weight')) + \
726 |     geom_boxplot() + \
727 |     theme_bw() + \
728 |     flip_xlabels
729 | ```
730 | 
731 | 
732 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_43_0.png)
733 | 
734 | 
735 | 
736 | 
737 | 
738 | <pre class="output">
739 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
740 | <code class="text">
741 | <ggplot: (-9223372036580658190)>
742 | </code>
743 | </pre>
744 | 
745 | 
746 | 
747 | 
748 | 
749 | 
750 | 
751 | 
752 | ```python
753 | ggplot(survs_df, aes(x='year_fact', y='weight_log')) + \
754 |     geom_boxplot() + \
755 |     facet_wrap(['species_id']) + \
756 |     theme_xkcd() + \
757 |     theme(axis_text_x = element_text(size=4, angle=90, hjust=1))
758 | ```
759 | 
760 | 
761 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_44_0.png)
762 | 
763 | 
764 | 
765 | 
766 | 
767 | <pre class="output">
768 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
769 | <code class="text">
770 | <ggplot: (-9223372036555930487)>
771 | </code>
772 | </pre>
773 | 
774 | 
775 | 
776 | 
777 | 
778 | 
779 | 
780 | ## Extra bits 1
781 | 
782 | Let's try binning years into decades. This is crude, but it might give us simpler images to look at.
783 | 
784 | 
785 | 
786 | 
787 | 
788 | 
789 | 
790 | 
791 | ```python
792 | bins = [(survs_df['year'] < 1980),
793 |         (survs_df['year'] < 1990),
794 |         (survs_df['year'] < 2000),
795 |         (survs_df['year'] >= 2000)]
796 | 
797 | labels = ['70s', '80s', '90s', 'Z']
798 | 
799 | survs_df['year_bins'] = np.select(bins, labels, default='unknown')  # the first true condition wins; a string default matches the string labels
800 | ```
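
The conditions passed to `np.select` overlap (a year before 1980 is also before 1990 and before 2000), so the order matters: for each element, `np.select` uses the label of the first condition that is true. The following sketch demonstrates this on a small made-up array (`years` is an assumption for illustration; `default='unknown'` is added here so the fallback value is a string like the labels).

```python
import numpy as np

# Hypothetical years, one per decade bin
years = np.array([1977, 1984, 1999, 2002])

# Same pattern as the lesson: only upper bounds, checked in order
conditions = [years < 1980, years < 1990, years < 2000, years >= 2000]
labels = ['70s', '80s', '90s', 'Z']

# 1977 satisfies the first three conditions but gets the first label
decades = np.select(conditions, labels, default='unknown')
print(decades)
```

If the conditions were listed in reverse order, every year before 2000 would be labelled `'90s'`, so it is worth spot-checking a few rows after binning.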
801 | 
802 | 
803 | 
804 | 
805 | 
806 | 
807 | ```python
808 | plt2 = ggplot(survs_df, aes(x='year_bins', y='weight_log')) + \
809 |            geom_boxplot()
810 | plt2
811 | ```
812 | 
813 | 
814 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_47_0.png)
815 | 
816 | 
817 | 
818 | 
819 | 
820 | <pre class="output">
821 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
822 | <code class="text">
823 | <ggplot: (-9223372036554427725)>
824 | </code>
825 | </pre>
826 | 
827 | 
828 | 
829 | 
830 | 
831 | 
832 | 
833 | 
834 | ```python
835 | plt2 = ggplot(survs_df, aes(x='year_bins', y='weight_log')) + \
836 |            geom_boxplot() + \
837 |            flip_xlabels + \
838 |            facet_wrap(['species_id'])
839 | plt2
840 | ```
841 | 
842 | 
843 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_48_0.png)
844 | 
845 | 
846 | 
847 | 
848 | 
849 | <pre class="output">
850 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
851 | <code class="text">
852 | <ggplot: (299804466)>
853 | </code>
854 | </pre>
855 | 
856 | 
857 | 
858 | 
859 | 
860 | 
861 | 
862 | ## Extra bits 2
863 | 
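864 | `stat_summary` gives a different way to look at your data: instead of a full boxplot, it plots a summary statistic (here the mean or median) together with a range (here the minimum to the maximum) for each year.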
865 | 
866 | 
867 | 
868 | 
869 | 
870 | 
871 | 
872 | 
873 | ```python
874 | ggplot(survs_df, aes("year_fact", "weight")) + \
875 |     stat_summary(fun_y = np.mean, fun_ymin=np.min, fun_ymax=np.max) + \
876 |     theme(axis_text_x = element_text(angle=90, hjust=1))
877 |     
878 | ggplot(survs_df, aes("year_fact", "weight")) + \
879 |     stat_summary(fun_y = np.median, fun_ymin=np.min, fun_ymax=np.max) + \
880 |     theme(axis_text_x = element_text(angle=90, hjust=1))
881 |     
882 | ggplot(survs_df, aes("year_fact", "weight_log")) + \
883 |     stat_summary(fun_y = np.mean, fun_ymin=np.min, fun_ymax=np.max) + \
884 |     theme(axis_text_x = element_text(angle=90, hjust=1))
885 | ```
886 | 
887 | 
888 | ![png](plotting_with_ggplot_files/plotting_with_ggplot_50_0.png)
889 | 
890 | 
891 | 
892 | 
893 | 
894 | <pre class="output">
895 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
896 | <code class="text">
897 | <ggplot: (284801081)>
898 | </code>
899 | </pre>
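
The numbers behind such a summary plot can be inspected with a pandas `groupby`. This is a sketch on made-up data (`toy_df` and its values are assumptions for illustration), computing the same three statistics per year that the first `stat_summary` call above was given: mean, minimum and maximum.

```python
import pandas as pd

# Hypothetical stand-in for survs_df
toy_df = pd.DataFrame({
    'year': [1977, 1977, 1977, 1978, 1978],
    'weight': [40.0, 48.0, 20.0, 10.0, 30.0],
})

# One row per year with the mean, minimum and maximum weight
summary = toy_df.groupby('year')['weight'].agg(['mean', 'min', 'max'])
print(summary)
```

Swapping `'mean'` for `'median'` gives the table matching the second plot in the cell above.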
900 | 
901 | 
902 | 
903 | 
904 | 
905 | 
906 | 
907 | 
911 | 
912 | 
913 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/notebooks/defensive_programming.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {
  6 |     "tags": [
  7 |      "solution"
  8 |     ]
  9 |    },
 10 |    "source": [
 11 |     "## Defensive Programming\n",
 12 |     "*Estimated teaching time:* 30 min\n",
 13 |     "\n",
 14 |     "*Estimated challenge time:* 0 min\n",
 15 |     "\n",
 16 |     "\n",
 17 |     "## Module information\n",
 18 |     "\n",
 19 |     "*Key questions:*\n",
 20 |     "\n",
 21 |     "  - \"How can I make my programs more reliable?\"\n",
 22 |     "    \n",
 23 |     "*Learning objectives:*\n",
 24 |     "\n",
 25 |     "  - Explain what an assertion is.\n",
 26 |     "  - Add assertions that check the program's state is correct. \n",
 27 |     "  - Correctly add precondition and postcondition assertions to functions.\n",
 28 |     "  - Explain what test-driven development is, and use it when creating new functions.\n",
 29 |     "  - Explain why variables should be initialized using actual data values rather than arbitrary constants.\n",
 30 |     "---"
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "markdown",
 35 |    "metadata": {},
 36 |    "source": [
 37 |     "## Defensive Programming\n",
 38 |     "\n",
 39 |     "\n",
 40 |     "Our previous lessons have introduced the basic tools of programming: variables and lists, file operations, data visualisation, loops, conditionals, and functions. What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it.\n",
 41 |     "\n",
 42 |     "To achieve that, we need to:\n",
 43 |     "\n",
 44 |     "  - Write programs that check their own operation.\n",
 45 |     "  - Write and run tests for widely-used functions.\n",
 46 |     "  - Make sure we know what “correct” actually means.\n",
 47 |     "    \n",
 48 |     "The good news is, doing these things will speed up our programming, not slow it down. As in real carpentry — the kind done with lumber — the time saved by measuring carefully before cutting a piece of wood is much greater than the time that measuring takes."
 49 |    ]
 50 |   },
 51 |   {
 52 |    "cell_type": "markdown",
 53 |    "metadata": {},
 54 |    "source": [
 55 |     "## Assertions\n",
 56 |     "\n",
 57 |     "The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. For example, this piece of code halts as soon as the loop encounters a value that isn’t positive:"
 58 |    ]
 59 |   },
 60 |   {
 61 |    "cell_type": "markdown",
 62 |    "metadata": {},
 63 |    "source": [
 64 |     "``` python\n",
 65 |     "numbers = [1.5, 2.3, 0.7, -0.001, 4.4]\n",
 66 |     "total = 0.0\n",
 67 |     "for n in numbers:\n",
 68 |     "    assert n > 0.0, 'Data should only contain positive values'\n",
 69 |     "    total += n\n",
 70 |     "print('total is:', total)\n",
 71 |     "\n",
 72 |     "```"
 73 |    ]
 74 |   },
 75 |   {
 76 |    "cell_type": "markdown",
 77 |    "metadata": {},
 78 |    "source": [
 79 |     "```python\n",
 80 |     "---------------------------------------------------------------------------\n",
 81 |     "AssertionError                            Traceback (most recent call last)\n",
 82 |     "<ipython-input-1-091518d2f2e2> in <module>()\n",
 83 |     "      3 total = 0.0\n",
 84 |     "      4 for n in numbers:\n",
 85 |     "----> 5     assert n > 0.0, 'Data should only contain positive values'\n",
 86 |     "      6     total += n\n",
 87 |     "      7 print('total is:', total)\n",
 88 |     "\n",
 89 |     "AssertionError: Data should only contain positive values\n",
 90 |     "\n",
 91 |     "```"
 92 |    ]
 93 |   },
 94 |   {
 95 |    "cell_type": "markdown",
 96 |    "metadata": {},
 97 |    "source": [
 98 |     "Programs like the Firefox browser are full of assertions: 10-20% of the code they contain is there to check that the other 80–90% is working correctly. Broadly speaking, assertions fall into three categories:\n",
 99 |     "\n",
100 |     "A `precondition` is something that must be true at the start of a function in order for it to work correctly.\n",
101 |     "\n",
102 |     "A `postcondition` is something that the function guarantees is true when it finishes.\n",
103 |     "\n",
104 |     "An `invariant` is something that is always true at a particular point inside a piece of code.\n",
105 |     "\n",
106 |     "For example, suppose we are representing rectangles using a `tuple` of four coordinates `(x0, y0, x1, y1)`, representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, but checks that its input is correctly formatted and that its result makes sense:"
107 |    ]
108 |   },
109 |   {
110 |    "cell_type": "code",
111 |    "execution_count": 2,
112 |    "metadata": {},
113 |    "outputs": [],
114 |    "source": [
115 |     "def normalize_rectangle(rect):\n",
116 |     "    '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.\n",
117 |     "    Input should be of the format (x0, y0, x1, y1).\n",
118 |     "    (x0, y0) and (x1, y1) define the lower left and upper right corners\n",
119 |     "    of the rectangle, respectively.'''\n",
120 |     "    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n",
121 |     "    x0, y0, x1, y1 = rect\n",
122 |     "    assert x0 < x1, 'Invalid X coordinates'\n",
123 |     "    assert y0 < y1, 'Invalid Y coordinates'\n",
124 |     "\n",
125 |     "    dx = x1 - x0\n",
126 |     "    dy = y1 - y0\n",
127 |     "    if dx > dy:\n",
128 |     "        scaled = float(dx) / dy\n",
129 |     "        upper_x, upper_y = 1.0, scaled\n",
130 |     "    else:\n",
131 |     "        scaled = float(dx) / dy\n",
132 |     "        upper_x, upper_y = scaled, 1.0\n",
133 |     "\n",
134 |     "    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'\n",
135 |     "    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'\n",
136 |     "\n",
137 |     "    return (0, 0, upper_x, upper_y)"
138 |    ]
139 |   },
140 |   {
141 |    "cell_type": "markdown",
142 |    "metadata": {},
143 |    "source": [
144 |     "The preconditions on lines 6, 8, and 9 catch invalid inputs:"
145 |    ]
146 |   },
147 |   {
148 |    "cell_type": "markdown",
149 |    "metadata": {},
150 |    "source": [
151 |     "``` python\n",
152 |     "print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate\n",
153 |     "\n",
154 |     "```"
155 |    ]
156 |   },
157 |   {
158 |    "cell_type": "markdown",
159 |    "metadata": {},
160 |    "source": [
161 |     "``` python\n",
162 |     "---------------------------------------------------------------------------\n",
163 |     "AssertionError                            Traceback (most recent call last)\n",
164 |     "<ipython-input-3-1b9cd8e18a1f> in <module>()\n",
165 |     "----> 1 print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate\n",
166 |     "\n",
167 |     "<ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)\n",
168 |     "      4     (x0, y0) and (x1, y1) define the lower left and upper right corners\n",
169 |     "      5     of the rectangle, respectively.'''\n",
170 |     "----> 6     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n",
171 |     "      7     x0, y0, x1, y1 = rect\n",
172 |     "      8     assert x0 < x1, 'Invalid X coordinates'\n",
173 |     "\n",
174 |     "AssertionError: Rectangles must contain 4 coordinates\n",
175 |     "\n",
176 |     "```"
177 |    ]
178 |   },
179 |   {
180 |    "cell_type": "markdown",
181 |    "metadata": {},
182 |    "source": [
183 |     "```python\n",
184 |     "print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted\n",
185 |     "```"
186 |    ]
187 |   },
188 |   {
189 |    "cell_type": "markdown",
190 |    "metadata": {},
191 |    "source": [
192 |     "```python\n",
193 |     "---------------------------------------------------------------------------\n",
194 |     "AssertionError                            Traceback (most recent call last)\n",
195 |     "<ipython-input-4-325036405532> in <module>()\n",
196 |     "----> 1 print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted\n",
197 |     "\n",
198 |     "<ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)\n",
199 |     "      6     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'\n",
200 |     "      7     x0, y0, x1, y1 = rect\n",
201 |     "----> 8     assert x0 < x1, 'Invalid X coordinates'\n",
202 |     "      9     assert y0 < y1, 'Invalid Y coordinates'\n",
203 |     "     10 \n",
204 |     "\n",
205 |     "AssertionError: Invalid X coordinates\n",
206 |     "\n",
207 |     "```"
208 |    ]
209 |   },
210 |   {
211 |    "cell_type": "markdown",
212 |    "metadata": {},
213 |    "source": [
214 |     "The post-conditions on lines 20 and 21 help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide everything seems OK:"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": 5,
220 |    "metadata": {},
221 |    "outputs": [
222 |     {
223 |      "name": "stdout",
224 |      "output_type": "stream",
225 |      "text": [
226 |       "(0, 0, 0.2, 1.0)\n"
227 |      ]
228 |     }
229 |    ],
230 |    "source": [
231 |     "print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))"
232 |    ]
233 |   },
234 |   {
235 |    "cell_type": "markdown",
236 |    "metadata": {},
237 |    "source": [
238 |     "but if we normalize one that’s wider than it is tall, the assertion is triggered:"
239 |    ]
240 |   },
241 |   {
242 |    "cell_type": "markdown",
243 |    "metadata": {},
244 |    "source": [
245 |     "```python\n",
246 |     "print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))\n",
247 |     "```"
248 |    ]
249 |   },
250 |   {
251 |    "cell_type": "markdown",
252 |    "metadata": {},
253 |    "source": [
254 |     "``` python\n",
255 |     "---------------------------------------------------------------------------\n",
256 |     "AssertionError                            Traceback (most recent call last)\n",
257 |     "<ipython-input-6-8d4a48f1d068> in <module>()\n",
258 |     "----> 1 print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))\n",
259 |     "\n",
260 |     "<ipython-input-2-c94cf5b065b9> in normalize_rectangle(rect)\n",
261 |     "     19 \n",
262 |     "     20     assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'\n",
263 |     "---> 21     assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'\n",
264 |     "     22 \n",
265 |     "     23     return (0, 0, upper_x, upper_y)\n",
266 |     "\n",
267 |     "AssertionError: Calculated upper Y coordinate invalid\n",
268 |     "\n",
269 |     "```"
270 |    ]
271 |   },
272 |   {
273 |    "cell_type": "markdown",
274 |    "metadata": {},
275 |    "source": [
276 |     "Re-reading our function, we realize that line 14 should divide `dy` by `dx` rather than `dx` by `dy`.  If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn’t. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.\n",
277 |     "\n",
278 |     "But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.\n",
279 |     "\n",
280 |     "Most good programmers follow two rules when adding assertions to their code. The first is, fail early, fail often. The greater the distance between when and where an error occurs and when it’s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.\n",
281 |     "\n",
282 |     "The second rule is, turn bugs into assertions or tests. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven’t regressed (i.e., haven’t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky.\n",
283 |     "\n"
284 |    ]
285 |   },
286 |   {
287 |    "cell_type": "markdown",
288 |    "metadata": {},
289 |    "source": [
290 |     "### Test-Driven Development\n",
291 |     "\n",
292 |     "An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:"
293 |    ]
294 |   },
295 |   {
296 |    "cell_type": "markdown",
297 |    "metadata": {},
298 |    "source": [
299 |     "![test diagram](images/testing.svg)"
300 |    ]
301 |   },
302 |   {
303 |    "cell_type": "markdown",
304 |    "metadata": {},
305 |    "source": [
306 |     "Most novice programmers would solve this problem like this:\n",
307 |     "\n",
308 |     " 1. Write a function `range_overlap`.\n",
309 |     " 2. Call it interactively on two or three different inputs.\n",
310 |     " 3. If it produces the wrong answer, fix the function and re-run that test.\n",
311 |     "\n",
312 |     "This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way:\n",
313 |     "\n",
314 |     "1. Write a short function for each test.\n",
315 |     "2. Write a `range_overlap` function that should pass those tests.\n",
316 |     "3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.\n",
317 |     "\n",
318 |     "Writing the tests before writing the function they exercise is called `test-driven development` (TDD). Its advocates believe it produces better code faster because:\n",
319 |     "\n",
320 |     "1. If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.\n",
321 |     "2. Writing tests helps programmers figure out what the function is actually supposed to do.\n",
322 |     "\n",
323 |     "Here are three test functions for `range_overlap`:"
324 |    ]
325 |   },
326 |   {
327 |    "cell_type": "markdown",
328 |    "metadata": {},
329 |    "source": [
330 |     "``` python\n",
331 |     "assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n",
332 |     "assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n",
333 |     "assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)\n",
334 |     "```"
335 |    ]
336 |   },
337 |   {
338 |    "cell_type": "markdown",
339 |    "metadata": {},
340 |    "source": [
341 |     "```python\n",
342 |     "---------------------------------------------------------------------------\n",
343 |     "NameError                                 Traceback (most recent call last)\n",
344 |     "<ipython-input-9-dc16b942c085> in <module>()\n",
345 |     "----> 1 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n",
346 |     "      2 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n",
347 |     "      3 assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)\n",
348 |     "\n",
349 |     "NameError: name 'range_overlap' is not defined\n",
350 |     "```\n"
351 |    ]
352 |   },
353 |   {
354 |    "cell_type": "markdown",
355 |    "metadata": {},
356 |    "source": [
357 |     "The error is actually reassuring: we haven’t written `range_overlap` yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function.\n",
358 |     "\n",
359 |     "And as a bonus of writing these tests, we’ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.\n",
360 |     "\n",
361 |     "Something important is missing, though. We don’t have any tests for the case where the ranges don’t overlap at all:"
362 |    ]
363 |   },
364 |   {
365 |    "cell_type": "markdown",
366 |    "metadata": {},
367 |    "source": [
368 |     "```python\n",
369 |     "assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ???\n",
370 |     "```"
371 |    ]
372 |   },
373 |   {
374 |    "cell_type": "markdown",
375 |    "metadata": {},
376 |    "source": [
377 |     "What should `range_overlap` do in this case: fail with an error message, produce a special value like `(0.0, 0.0)` to signal that there’s no overlap, or *something* else? Any actual implementation of the function will do one of these things; writing the tests first helps us figure out which is best *before* we’re emotionally invested in whatever we happened to write before we realized there was an issue.\n",
378 |     "\n",
379 |     "And what about this case?"
380 |    ]
381 |   },
382 |   {
383 |    "cell_type": "markdown",
384 |    "metadata": {},
385 |    "source": [
386 |     "```python\n",
387 |     "assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ???\n",
388 |     "```"
389 |    ]
390 |   },
391 |   {
392 |    "cell_type": "markdown",
393 |    "metadata": {},
394 |    "source": [
395 |     "Do two segments that touch at their endpoints overlap or not? Mathematicians usually say “yes”, but engineers usually say “no”. The best answer is “whatever is most useful in the rest of our program”, but again, any actual implementation of `range_overlap` is going to do *something*, and whatever it is ought to be consistent with what it does when there’s no overlap at all.\n",
396 |     "\n",
397 |     "Since we’re planning to use the range this function returns as the X axis in a time series chart, we decide that:\n",
398 |     "\n",
399 |     " 1. every overlap has to have non-zero width, and\n",
400 |     " 2. we will return the special value `None` when there’s no overlap.\n",
401 |     " \n",
402 |     "`None` is built into Python, and means “nothing here”. (Other languages often call the equivalent value `null` or `nil`.) With that decision made, we can finish writing our last two tests:"
403 |    ]
404 |   },
405 |   {
406 |    "cell_type": "markdown",
407 |    "metadata": {},
408 |    "source": [
409 |     "```python\n",
410 |     "assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n",
411 |     "assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n",
412 |     "```"
413 |    ]
414 |   },
415 |   {
416 |    "cell_type": "markdown",
417 |    "metadata": {},
418 |    "source": [
419 |     "```python\n",
420 |     "---------------------------------------------------------------------------\n",
421 |     "NameError                                 Traceback (most recent call last)\n",
422 |     "<ipython-input-13-42de7ddfb428> in <module>()\n",
423 |     "----> 1 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n",
424 |     "      2 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n",
425 |     "\n",
426 |     "NameError: name 'range_overlap' is not defined\n",
427 |     "```"
428 |    ]
429 |   },
430 |   {
431 |    "cell_type": "markdown",
432 |    "metadata": {},
433 |    "source": [
434 |     "Again, we get an error because we haven’t written our function, but we’re now ready to do so:\n"
435 |    ]
436 |   },
437 |   {
438 |    "cell_type": "code",
439 |    "execution_count": 14,
440 |    "metadata": {},
441 |    "outputs": [],
442 |    "source": [
443 |     "def range_overlap(ranges):\n",
444 |     "    '''Return common overlap among a set of [low, high] ranges.'''\n",
445 |     "    lowest = 0.0\n",
446 |     "    highest = 1.0\n",
447 |     "    for (low, high) in ranges:\n",
448 |     "        lowest = max(lowest, low)\n",
449 |     "        highest = min(highest, high)\n",
450 |     "    return (lowest, highest)"
451 |    ]
452 |   },
453 |   {
454 |    "cell_type": "markdown",
455 |    "metadata": {},
456 |    "source": [
457 |     "(Take a moment to think about why we use `max` to raise `lowest` and `min` to lower `highest`). We’d now like to re-run our tests, but they’re scattered across three different cells. To make running them easier, let’s put them all in a function:"
458 |    ]
459 |   },
460 |   {
461 |    "cell_type": "code",
462 |    "execution_count": 15,
463 |    "metadata": {},
464 |    "outputs": [],
465 |    "source": [
466 |     "def test_range_overlap():\n",
467 |     "    assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n",
468 |     "    assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n",
469 |     "    assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n",
470 |     "    assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n",
471 |     "    assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)"
472 |    ]
473 |   },
474 |   {
475 |    "cell_type": "markdown",
476 |    "metadata": {},
477 |    "source": [
478 |     "We can now test `range_overlap` with a single function call:"
479 |    ]
480 |   },
481 |   {
482 |    "cell_type": "markdown",
483 |    "metadata": {},
484 |    "source": [
485 |     "```python \n",
486 |     "test_range_overlap() \n",
487 |     "```"
488 |    ]
489 |   },
490 |   {
491 |    "cell_type": "markdown",
492 |    "metadata": {},
493 |    "source": [
494 |     "```python\n",
495 |     "---------------------------------------------------------------------------\n",
496 |     "AssertionError                            Traceback (most recent call last)\n",
497 |     "<ipython-input-16-80290759369d> in <module>()\n",
498 |     "----> 1 test_range_overlap()\n",
499 |     "\n",
500 |     "<ipython-input-15-d61f343ad67a> in test_range_overlap()\n",
501 |     "      1 def test_range_overlap():\n",
502 |     "----> 2     assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None\n",
503 |     "      3     assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None\n",
504 |     "      4     assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)\n",
505 |     "      5     assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)\n",
506 |     "\n",
507 |     "AssertionError: \n",
508 |     "```"
509 |    ]
510 |   },
511 |   {
512 |    "cell_type": "markdown",
513 |    "metadata": {},
514 |    "source": [
515 |     "The first test that was supposed to produce `None` fails, so we know something is wrong with our function. We don’t know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error. Still, some information is better than none, and if we trace the behavior of the function with that input, we realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: always initialize from data."
516 |    ]
517 |   },
518 |   {
519 |    "cell_type": "markdown",
520 |    "metadata": {
521 |     "tags": [
522 |      "challenge"
523 |     ]
524 |    },
525 |    "source": [
526 |     "Fix `range_overlap`. Re-run `test_range_overlap` after each change you make."
527 |    ]
528 |   },
529 |   {
530 |    "cell_type": "code",
531 |    "execution_count": 17,
532 |    "metadata": {
533 |     "tags": [
534 |      "solution"
535 |     ]
536 |    },
537 |    "outputs": [],
538 |    "source": [
539 |     "import numpy\n",
540 |     "\n",
541 |     "def range_overlap(ranges):\n",
542 |     "    '''Return common overlap among a set of [low, high] ranges.'''\n",
543 |     "    if not ranges:\n",
544 |     "        # ranges is None or an empty list\n",
545 |     "        return None\n",
546 |     "    lowest, highest = ranges[0]\n",
547 |     "    for (low, high) in ranges[1:]:\n",
548 |     "        lowest = max(lowest, low)\n",
549 |     "        highest = min(highest, high)\n",
550 |     "    if lowest >= highest:  # no overlap\n",
551 |     "        return None\n",
552 |     "    else:\n",
553 |     "        return (lowest, highest)"
554 |    ]
555 |   },
556 |   {
557 |    "cell_type": "markdown",
558 |    "metadata": {},
559 |    "source": [
560 |     "## Key points\n",
561 |     "\n",
562 |     " - Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.\n",
563 |     "\n",
564 |     " - Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.\n",
565 |     "\n",
566 |     " - Use preconditions to check that the inputs to a function are safe to use.\n",
567 |     "\n",
568 |     " - Use postconditions to check that the output from a function is safe to use.\n",
569 |     "\n",
570 |     " - Write tests before writing code in order to help determine exactly what that code is supposed to do."
571 |    ]
572 |   },
573 |   {
574 |    "cell_type": "code",
575 |    "execution_count": null,
576 |    "metadata": {},
577 |    "outputs": [],
578 |    "source": []
579 |   }
580 |  ],
581 |  "metadata": {
582 |   "celltoolbar": "Tags",
583 |   "kernelspec": {
584 |    "display_name": "Python 3",
585 |    "language": "python",
586 |    "name": "python3"
587 |   },
588 |   "language_info": {
589 |    "codemirror_mode": {
590 |     "name": "ipython",
591 |     "version": 3
592 |    },
593 |    "file_extension": ".py",
594 |    "mimetype": "text/x-python",
595 |    "name": "python",
596 |    "nbconvert_exporter": "python",
597 |    "pygments_lexer": "ipython3",
598 |    "version": "3.6.3"
599 |   }
600 |  },
601 |  "nbformat": 4,
602 |  "nbformat_minor": 2
603 | }
604 | 


--------------------------------------------------------------------------------
/workshops/docs/modules/intro.md:
--------------------------------------------------------------------------------
   1 | 
   2 | 
   3 | <style>
   4 | .output_label {
   5 |     text-align: right;
   6 |     margin: -1em;
   7 |     padding: 0;
   8 |     font-size: 0.5em;
   9 |     color: grey
  10 | }
  11 | </style>
  12 | 
  13 | 
  14 | # Python: the basics
  15 | 
  16 | 
  17 | 
  18 | 
  19 | 
  20 | Python is a general-purpose programming language that supports rapid development
  21 | of scripts and applications.
  22 | 
  23 | Python's main advantages:
  24 | 
  25 | * Open source software, supported by the Python Software Foundation
  26 | * Available on all major platforms (e.g. Windows, Linux and macOS)
  27 | * Designed for readability
  28 | * Supports multiple programming paradigms ('functional', 'object-oriented')
  29 | * Very large community with a rich ecosystem of third-party packages
  30 | 
  31 | 
  32 | 
  33 | 
  34 | 
  35 | ## Interpreter
  36 | 
  37 | Python is an interpreted language[*](https://softwareengineering.stackexchange.com/a/24560) which can be used in two ways:
  38 | 
  39 | * "Interactive" Mode: It functions like an "advanced calculator", executing
  40 |   one command at a time:
  41 |   
  42 | ```bash
  43 | user:host:~$ python
  44 | Python 3.5.1 (default, Oct 23 2015, 18:05:06)
  45 | [GCC 4.8.3] on linux2
  46 | Type "help", "copyright", "credits" or "license" for more information.
  47 | >>> 2 + 2
  48 | 4
  49 | >>> print("Hello World")
  50 | Hello World
  51 | ```
  52 | 
  53 | 
  54 | 
  55 | 
  56 | 
  57 | * "Scripting" Mode: Executing a series of "commands" saved in a text file,
  58 |   usually with a `.py` extension after the name of your file:
  59 | 
  60 | ```bash
  61 | user:host:~$ python my_script.py
  62 | Hello World
  63 | ```
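
The contents of `my_script.py` are not shown in this lesson. As a hypothetical illustration, a script that produces the output above could be as small as this (the variable name `message` is an assumption):

```python
# my_script.py (hypothetical contents, assumed for illustration)
message = "Hello World"
print(message)  # the only visible effect of running the script
```

Unlike interactive mode, a script shows nothing unless it calls `print`.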
  64 | 
  65 | 
  66 | 
  67 | 
  68 | 
  69 | ## Using interactive Python in Jupyter-style notebooks
  70 | 
  71 | A convenient and powerful way to use interactive-mode Python is via a Jupyter Notebook, or similar browser-based interface.
  72 | 
  73 | This particularly lends itself to data analysis since the notebook records a history of commands and shows output and graphs immediately in the browser.
  74 | 
  75 | There are several ways you can run a Jupyter(-style) notebook - locally installed on your computer or hosted as a service on the web. Today we will use a Jupyter notebook service provided by Google: https://colab.research.google.com (Colaboratory).
  76 | 
  77 | ### Jupyter-style notebooks: a quick tour
  78 | 
  79 | Go to https://colab.research.google.com and log in with your Google account.
  80 | 
  81 | Select ***NEW NOTEBOOK → NEW PYTHON 3 NOTEBOOK*** - a new notebook will be created.
  82 | 
  83 | ---
  84 | 
  85 | Type some Python code in the top cell, e.g.:
  86 | 
  87 | ```python
  88 | print("Hello Jupyter !")
  89 | ```
  90 | 
  91 | Press ***Shift-Enter*** to run the contents of the cell.
  92 | 
  93 | ---
  94 | 
  95 | You can add new cells.
  96 | 
  97 | ***Insert → Insert Code Cell***
  98 | 
  99 | ---
 100 | 
 101 | NOTE: When the text on the left-hand side of the cell is `In [*]` (with an asterisk rather than a number), the cell is still running. It's usually best to wait until one cell has finished running before running the next.
 102 | 
 103 | Let's begin writing some code in our notebook.
 104 | 
 105 | 
 106 | 
 107 | 
 108 | 
 109 | 
 110 | ```python
 111 | print("Hello Jupyter !")
 112 | ```
 113 | 
 114 | <pre class="output">
 115 | <div class="output_label">output</div>
 116 | <code class="text">
 117 | Hello Jupyter !
 118 | 
 119 | </code>
 120 | </pre>
 121 | 
 122 | 
 123 | 
 124 | 
 125 | 
 126 | In Jupyter/Colaboratory, typing the name of a variable on the last line of a cell displays its representation:
 127 | 
 128 | 
 129 | 
 130 | 
 131 | 
 132 | 
 133 | ```python
 134 | message = "Hello again !"
 135 | message
 136 | ```
 137 | 
 138 | 
 139 | 
 140 | 
 141 | <pre class="output">
 142 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 143 | <code class="text">
 144 | 'Hello again !'
 145 | </code>
 146 | </pre>
 147 | 
 148 | 
 149 | 
 150 | 
 151 | 
 152 | 
 153 | 
 154 | 
 155 | ```python
 156 | # A 'hash' symbol denotes a comment
 157 | # This is a comment. Anything after the 'hash' symbol on the line is ignored by the Python interpreter
 158 | 
 159 | print("No comment")  # comment
 160 | ```
 161 | 
 162 | <pre class="output">
 163 | <div class="output_label">output</div>
 164 | <code class="text">
 165 | No comment
 166 | 
 167 | </code>
 168 | </pre>
 169 | 
 170 | 
 171 | 
 172 | 
 173 | 
 174 | ## Variables and data types
 175 | ### Integers, floats, strings
 176 | 
 177 | 
 178 | 
 179 | 
 180 | 
 181 | 
 182 | ```python
 183 | a = 5
 184 | ```
 185 | 
 186 | 
 187 | 
 188 | 
 189 | 
 190 | 
 191 | ```python
 192 | a
 193 | ```
 194 | 
 195 | 
 196 | 
 197 | 
 198 | <pre class="output">
 199 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 200 | <code class="text">
 201 | 5
 202 | </code>
 203 | </pre>
 204 | 
 205 | 
 206 | 
 207 | 
 208 | 
 209 | 
 210 | 
 211 | 
 212 | ```python
 213 | type(a)
 214 | ```
 215 | 
 216 | 
 217 | 
 218 | 
 219 | <pre class="output">
 220 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 221 | <code class="text">
 222 | int
 223 | </code>
 224 | </pre>
 225 | 
 226 | 
 227 | 
 228 | 
 229 | 
 230 | 
 231 | 
 232 | Adding a decimal point creates a `float`
 233 | 
 234 | 
 235 | 
 236 | 
 237 | 
 238 | 
 239 | ```python
 240 | b = 5.0
 241 | ```
 242 | 
 243 | 
 244 | 
 245 | 
 246 | 
 247 | 
 248 | ```python
 249 | b
 250 | ```
 251 | 
 252 | 
 253 | 
 254 | 
 255 | <pre class="output">
 256 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 257 | <code class="text">
 258 | 5.0
 259 | </code>
 260 | </pre>
 261 | 
 262 | 
 263 | 
 264 | 
 265 | 
 266 | 
 267 | 
 268 | 
 269 | ```python
 270 | type(b)
 271 | ```
 272 | 
 273 | 
 274 | 
 275 | 
 276 | <pre class="output">
 277 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 278 | <code class="text">
 279 | float
 280 | </code>
 281 | </pre>
 282 | 
 283 | 
 284 | 
 285 | 
 286 | 
 287 | 
 288 | 
 289 | `int` and `float` are collectively called 'numeric' types.
 290 | 
 291 | (The other built-in numeric type is `complex`, for complex numbers. Hexadecimal is not a separate type: a hexadecimal literal such as `0xff` is just another way of writing an `int`.)
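
To check how Python classifies other kinds of numbers, `type` can be applied to them directly. A small sketch; the variable names `c` and `z` are illustrative and not used elsewhere in this lesson:

```python
# A hexadecimal literal is just another way of writing an integer
c = 0xff
print(c)               # 255
print(type(c))         # <class 'int'>

# hex() converts an int to its hexadecimal text form, which is a string
print(hex(255))        # 0xff
print(type(hex(255)))  # <class 'str'>

# Complex numbers use the suffix j for the imaginary part
z = 1 + 2j
print(type(z))         # <class 'complex'>
```

Note that `print(type(c))` shows `<class 'int'>`, whereas a notebook cell ending in `type(c)` displays the shorter `int`.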
 292 | 
 293 | 
 294 | 
 295 | 
 296 | 
 297 | ## Challenge - Types
 298 | 
 299 | What is the **type** of the variable `letters` defined below?
 300 | 
 301 | `letters = "ABACBS"`
 302 | 
 303 | * A) `int`
 304 | * B) `str`
 305 | * C) `float`
 306 | * D) `text`
 307 | 
 308 | Write some code that outputs the type and paste your answer into the Etherpad.
 309 | 
 310 | 
 311 | 
 312 | 
 313 | <!-- 
 314 | ## Solution
 315 | 
 316 | Option B - `str`.
 317 |  -->
 318 | 
 319 | 
 320 | 
 321 | <!-- 
 322 | 
 323 | ```python
 324 | letters = "ABACBS"
 325 | type(letters)
 326 | ```
 327 | 
 328 | 
 329 | 
 330 | 
 331 | <pre class="output">
 332 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 333 | <code class="text">
 334 | str
 335 | </code>
 336 | </pre>
 337 | 
 338 | 
 339 |  -->
 340 | 
 341 | 
 342 | 
 343 | 
 344 | ### Strings
 345 | 
 346 | 
 347 | 
 348 | 
 349 | 
 350 | 
 351 | ```python
 352 | some_words = "Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇"
 353 | ```
 354 | 
 355 | 
 356 | 
 357 | 
 358 | 
 359 | 
 360 | ```python
 361 | some_words
 362 | ```
 363 | 
 364 | 
 365 | 
 366 | 
 367 | <pre class="output">
 368 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 369 | <code class="text">
 370 | 'Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇'
 371 | </code>
 372 | </pre>
 373 | 
 374 | 
 375 | 
 376 | 
 377 | 
 378 | 
 379 | 
 380 | 
 381 | ```python
 382 | type(some_words)
 383 | ```
 384 | 
 385 | 
 386 | 
 387 | 
 388 | <pre class="output">
 389 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 390 | <code class="text">
 391 | str
 392 | </code>
 393 | </pre>
 394 | 
 395 | 
 396 | 
 397 | 
 398 | 
 399 | 
 400 | 
 401 | The variable `some_words` is of type `str`, short for "string". Strings hold
 402 | sequences of characters, which can be letters, numbers, punctuation
 403 | or more exotic forms of text (even emoji!).
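
A short sketch of what "sequence of characters" means in practice, using the built-in `len` function. The example string `word` is chosen here purely for illustration:

```python
word = "Python 蛇"   # six letters, a space and one Chinese character

# len() counts characters, whatever script they come from
print(len(word))     # 8
print(len("😸"))     # 1

# Digits inside quotes are text, not a number
print(type("5"))     # <class 'str'>
```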
 404 | 
 405 | 
 406 | 
 407 | 
 408 | 
 409 | ## Operators
 410 | 
 411 | We can perform mathematical calculations in Python using the basic operators:
 412 | 
 413 | `+`  `-`  `*`  `/`  `%`  `**`
 414 | 
 415 | 
 416 | 
 417 | 
 418 | 
 419 | 
 420 | ```python
 421 | 2 + 2  # Addition
 422 | ```
 423 | 
 424 | 
 425 | 
 426 | 
 427 | <pre class="output">
 428 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 429 | <code class="text">
 430 | 4
 431 | </code>
 432 | </pre>
 433 | 
 434 | 
 435 | 
 436 | 
 437 | 
 438 | 
 439 | 
 440 | 
 441 | ```python
 442 | 6 * 7  # Multiplication
 443 | ```
 444 | 
 445 | 
 446 | 
 447 | 
 448 | <pre class="output">
 449 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 450 | <code class="text">
 451 | 42
 452 | </code>
 453 | </pre>
 454 | 
 455 | 
 456 | 
 457 | 
 458 | 
 459 | 
 460 | 
 461 | 
 462 | ```python
 463 | 2 ** 16  # Power
 464 | ```
 465 | 
 466 | 
 467 | 
 468 | 
 469 | <pre class="output">
 470 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 471 | <code class="text">
 472 | 65536
 473 | </code>
 474 | </pre>
 475 | 
 476 | 
 477 | 
 478 | 
 479 | 
 480 | 
 481 | 
 482 | 
 483 | ```python
 484 | 13 % 5  # Modulo
 485 | ```
 486 | 
 487 | 
 488 | 
 489 | 
 490 | <pre class="output">
 491 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 492 | <code class="text">
 493 | 3
 494 | </code>
 495 | </pre>
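
The `/` operator is listed above but not demonstrated in the cells so far. A minimal sketch follows; `//` (floor division) is not in the list above and is included only for contrast, and the variable names are illustrative:

```python
quotient = 13 / 5         # true division always gives a float
print(quotient)           # 2.6
print(type(quotient))     # <class 'float'>

print(10 / 2)             # 5.0 (still a float, even though it divides evenly)

whole = 13 // 5           # floor division keeps only the whole part
remainder = 13 % 5        # modulo gives what is left over
print(whole, remainder)   # 2 3
```

Together, `//` and `%` split a number into a whole part and a remainder: `whole * 5 + remainder` gives back 13.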
 496 | 
 497 | 
 498 | 
 499 | 
 500 | 
 501 | 
 502 | 
 503 | 
 504 | ```python
 505 | # int + int = int
 506 | a = 5
 507 | a + 1
 508 | ```
 509 | 
 510 | 
 511 | 
 512 | 
 513 | <pre class="output">
 514 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 515 | <code class="text">
 516 | 6
 517 | </code>
 518 | </pre>
 519 | 
 520 | 
 521 | 
 522 | 
 523 | 
 524 | 
 525 | 
 526 | 
 527 | ```python
 528 | # float + int = float
 529 | b = 5.0
 530 | b + 1
 531 | ```
 532 | 
 533 | 
 534 | 
 535 | 
 536 | <pre class="output">
 537 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 538 | <code class="text">
 539 | 6.0
 540 | </code>
 541 | </pre>
 542 | 
 543 | 
 544 | 
 545 | 
 546 | 
 547 | 
 548 | 
 549 | 
 550 | ```python
 551 | a + b
 552 | ```
 553 | 
 554 | 
 555 | 
 556 | 
 557 | <pre class="output">
 558 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 559 | <code class="text">
 560 | 10.0
 561 | </code>
 562 | </pre>
 563 | 
 564 | 
 565 | 
 566 | 
 567 | 
 568 | 
 569 | 
 570 | ```python
 571 | some_words = "I'm a string"
 572 | a = 6
 573 | a + some_words
 574 | ```
 575 | 
 576 | 
 577 | 
 578 | 
 579 | 
 580 | 
 581 | 
 582 | Outputs:
 583 | 
 584 | ```
 585 | ---------------------------------------------------------------------------
 586 | TypeError                                 Traceback (most recent call last)
 587 | <ipython-input-1-781eba7cf148> in <module>()
 588 |       1 some_words = "I'm a string"
 589 |       2 a = 6
 590 | ----> 3 a + some_words
 591 | 
 592 | TypeError: unsupported operand type(s) for +: 'int' and 'str'
 593 | ```
 594 | 
 595 | 
 596 | 
 597 | 
 598 | 
 599 | 
 600 | ```python
 601 | str(a) + " " + some_words
 602 | ```
 603 | 
 604 | 
 605 | 
 606 | 
 607 | <pre class="output">
 608 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 609 | <code class="text">
 610 | "6 I'm a string"
 611 | </code>
 612 | </pre>
 613 | 
 614 | 
 615 | 
 616 | 
 617 | 
 618 | 
 619 | 
 620 | 
 621 | ```python
 622 | # Shorthand: operators with assignment
 623 | a += 1
 624 | a
 625 | 
 626 | # Equivalent to:
 627 | # a = a + 1
 628 | ```
 629 | 
 630 | 
 631 | 
 632 | 
 633 | <pre class="output">
 634 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 635 | <code class="text">
 636 | 7
 637 | </code>
 638 | </pre>
 639 | 
 640 | 
 641 | 
 642 | 
 643 | 
 644 | 
 645 | 
 646 | ### Boolean operations
 647 | 
 648 | We can also use comparison operators (`<`, `>`, `==`, `!=`, `<=`, `>=`)
 649 | and logical operators (`and`, `or`, `not`). A comparison returns
 650 | either `True` or `False`; the data type of these values is
 651 | called a _boolean_ (`bool`).
 652 | 
 653 | 
 654 | 
 655 | 
 656 | 
 657 | 
 658 | 
 659 | ```python
 660 | 3 > 4
 661 | ```
 662 | 
 663 | 
 664 | 
 665 | 
 666 | <pre class="output">
 667 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 668 | <code class="text">
 669 | False
 670 | </code>
 671 | </pre>
 672 | 
 673 | 
 674 | 
 675 | 
 676 | 
 677 | 
 678 | 
 679 | 
 680 | ```python
 681 | True and True
 682 | ```
 683 | 
 684 | 
 685 | 
 686 | 
 687 | <pre class="output">
 688 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 689 | <code class="text">
 690 | True
 691 | </code>
 692 | </pre>
 693 | 
 694 | 
 695 | 
 696 | 
 697 | 
 698 | 
 699 | 
 700 | 
 701 | ```python
 702 | True or False
 703 | ```
 704 | 
 705 | 
 706 | 
 707 | 
 708 | <pre class="output">
 709 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 710 | <code class="text">
 711 | True
 712 | </code>
 713 | </pre>
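
Comparisons and logical operators are usually combined. The sketch below stores the results of two comparisons and then combines them; the variable names and the threshold values are made up for illustration:

```python
temperature = 25  # example value, chosen for illustration

is_warm = temperature > 20   # True
is_hot = temperature > 30    # False

print(type(is_warm))         # <class 'bool'>
print(is_warm and is_hot)    # False: both sides must be True
print(is_warm or is_hot)     # True: at least one side is True
print(not is_hot)            # True: not flips the value
print(temperature != 25)     # False: the values are equal
```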
 714 | 
 715 | 
 716 | 
 717 | 
 718 | 
 719 | 
 720 | 
 721 | ## Lists and sequence types
 722 | 
 723 | 
 724 | 
 725 | 
 726 | 
 727 | ### Lists
 728 | 
 729 | 
 730 | 
 731 | 
 732 | 
 733 | 
 734 | ```python
 735 | numbers = [2, 4, 6, 8, 10]
 736 | numbers
 737 | ```
 738 | 
 739 | 
 740 | 
 741 | 
 742 | <pre class="output">
 743 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 744 | <code class="text">
 745 | [2, 4, 6, 8, 10]
 746 | </code>
 747 | </pre>
 748 | 
 749 | 
 750 | 
 751 | 
 752 | 
 753 | 
 754 | 
 755 | 
 756 | ```python
 757 | # `len` gets the length of a list
 758 | len(numbers)
 759 | ```
 760 | 
 761 | 
 762 | 
 763 | 
 764 | <pre class="output">
 765 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 766 | <code class="text">
 767 | 5
 768 | </code>
 769 | </pre>
 770 | 
 771 | 
 772 | 
 773 | 
 774 | 
 775 | 
 776 | 
 777 | 
 778 | ```python
 779 | # Lists can contain multiple data types, including other lists
 780 | mixed_list = ["asdf", 2, 3.142, numbers, ['a','b','c']]
 781 | mixed_list
 782 | ```
 783 | 
 784 | 
 785 | 
 786 | 
 787 | <pre class="output">
 788 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 789 | <code class="text">
 790 | ['asdf', 2, 3.142, [2, 4, 6, 8, 10], ['a', 'b', 'c']]
 791 | </code>
 792 | </pre>
 793 | 
 794 | 
 795 | 
 796 | 
 797 | 
 798 | 
 799 | 
 800 | You can retrieve items from a list by their *index*. In Python, the first item has an index of 0 (zero).
 801 | 
 802 | 
 803 | 
 804 | 
 805 | 
 806 | 
 807 | ```python
 808 | numbers[0]
 809 | ```
 810 | 
 811 | 
 812 | 
 813 | 
 814 | <pre class="output">
 815 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 816 | <code class="text">
 817 | 2
 818 | </code>
 819 | </pre>
 820 | 
 821 | 
 822 | 
 823 | 
 824 | 
 825 | 
 826 | 
 827 | 
 828 | ```python
 829 | numbers[3]
 830 | ```
 831 | 
 832 | 
 833 | 
 834 | 
 835 | <pre class="output">
 836 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 837 | <code class="text">
 838 | 8
 839 | </code>
 840 | </pre>
 841 | 
 842 | 
 843 | 
 844 | 
 845 | 
 846 | 
 847 | 
 848 | You can also assign a new value to any position in the list.
 849 | 
 850 | 
 851 | 
 852 | 
 853 | 
 854 | 
 855 | ```python
 856 | numbers[3] = numbers[3] * 100
 857 | numbers
 858 | ```
 859 | 
 860 | 
 861 | 
 862 | 
 863 | <pre class="output">
 864 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 865 | <code class="text">
 866 | [2, 4, 6, 800, 10]
 867 | </code>
 868 | </pre>
 869 | 
 870 | 
 871 | 
 872 | 
 873 | 
 874 | 
 875 | 
 876 | You can append items to the end of the list.
 877 | 
 878 | 
 879 | 
 880 | 
 881 | 
 882 | 
 883 | ```python
 884 | numbers.append(12)
 885 | numbers
 886 | ```
 887 | 
 888 | 
 889 | 
 890 | 
 891 | <pre class="output">
 892 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 893 | <code class="text">
 894 | [2, 4, 6, 800, 10, 12]
 895 | </code>
 896 | </pre>
 897 | 
 898 | 
 899 | 
 900 | 
 901 | 
 902 | 
 903 | 
 904 | You can add multiple items to the end of a list with `extend`.
 905 | 
 906 | 
 907 | 
 908 | 
 909 | 
 910 | 
 911 | ```python
 912 | numbers.extend([14, 16, 18])
 913 | numbers
 914 | ```
 915 | 
 916 | 
 917 | 
 918 | 
 919 | <pre class="output">
 920 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
 921 | <code class="text">
 922 | [2, 4, 6, 800, 10, 12, 14, 16, 18]
 923 | </code>
 924 | </pre>
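
The difference between `append` and `extend` is easiest to see when both are given a list. This sketch uses two new lists (`evens` and `odds`, names chosen for illustration) so that `numbers` is left unchanged:

```python
evens = [2, 4]
evens.append([6, 8])   # append adds its argument as ONE new item
print(evens)           # [2, 4, [6, 8]]
print(len(evens))      # 3

odds = [1, 3]
odds.extend([5, 7])    # extend adds each item of its argument separately
print(odds)            # [1, 3, 5, 7]
print(len(odds))       # 4
```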
 925 | 
 926 | 
 927 | 
 928 | 
 929 | 
 930 | 
 931 | 
 932 | ### Loops
 933 | 
 934 | A `for` loop can be used to access the elements in a list or other Python data structure one at a time. We will learn more about loops in another lesson.
 935 | 
 936 | 
 937 | 
 938 | 
 939 | 
 940 | 
 941 | ```python
 942 | for num in numbers:
 943 |     print(num)
 944 | ```
 945 | 
 946 | <pre class="output">
 947 | <div class="output_label">output</div>
 948 | <code class="text">
 949 | 2
 950 | 4
 951 | 6
 952 | 800
 953 | 10
 954 | 12
 955 | 14
 956 | 16
 957 | 18
 958 | 
 959 | </code>
 960 | </pre>
 961 | 
 962 | 
 963 | 
 964 | 
 965 | 
 966 | **Indentation** is very important in Python. Note that the second line in the
 967 | example above is indented, indicating the code that is the body of the loop.
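To illustrate how indentation decides what belongs to the loop, here is a minimal sketch that sums a list. The names `values` and `total` are made up for this example:

```python
values = [2, 4, 6]
total = 0

for num in values:
    # Indented: this line runs once for every item in the list.
    total = total + num

# Not indented: this line runs once, after the loop has finished.
print(total)  # 12
```

If the `print` line were indented as well, it would become part of the loop body and print the running total three times instead of once.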
 968 | 
 969 | 
 970 | 
 971 | 
 972 | 
 973 | To find out what methods are available for an object, we can use the built-in `help` function:
 974 | 
 975 | 
 976 | 
 977 | 
 978 | 
 979 | 
 980 | ```python
 981 | help(numbers)
 982 | ```
 983 | 
 984 | <pre class="output">
 985 | <div class="output_label">output</div>
 986 | <code class="text">
 987 | Help on list object:
 988 | 
 989 | class list(object)
 990 |  |  list() -> new empty list
 991 |  |  list(iterable) -> new list initialized from iterable's items
 992 |  |  
 993 |  |  Methods defined here:
 994 |  |  
 995 |  |  __add__(self, value, /)
 996 |  |      Return self+value.
 997 |  |  
 998 |  |  __contains__(self, key, /)
 999 |  |      Return key in self.
1000 |  |  
1001 |  |  __delitem__(self, key, /)
1002 |  |      Delete self[key].
1003 |  |  
1004 |  |  __eq__(self, value, /)
1005 |  |      Return self==value.
1006 |  |  
1007 |  |  __ge__(self, value, /)
1008 |  |      Return self>=value.
1009 |  |  
1010 |  |  __getattribute__(self, name, /)
1011 |  |      Return getattr(self, name).
1012 |  |  
1013 |  |  __getitem__(...)
1014 |  |      x.__getitem__(y) <==> x[y]
1015 |  |  
1016 |  |  __gt__(self, value, /)
1017 |  |      Return self>value.
1018 |  |  
1019 |  |  __iadd__(self, value, /)
1020 |  |      Implement self+=value.
1021 |  |  
1022 |  |  __imul__(self, value, /)
1023 |  |      Implement self*=value.
1024 |  |  
1025 |  |  __init__(self, /, *args, **kwargs)
1026 |  |      Initialize self.  See help(type(self)) for accurate signature.
1027 |  |  
1028 |  |  __iter__(self, /)
1029 |  |      Implement iter(self).
1030 |  |  
1031 |  |  __le__(self, value, /)
1032 |  |      Return self<=value.
1033 |  |  
1034 |  |  __len__(self, /)
1035 |  |      Return len(self).
1036 |  |  
1037 |  |  __lt__(self, value, /)
1038 |  |      Return self<value.
1039 |  |  
1040 |  |  __mul__(self, value, /)
1041 |  |      Return self*value.
1042 |  |  
1043 |  |  __ne__(self, value, /)
1044 |  |      Return self!=value.
1045 |  |  
1046 |  |  __new__(*args, **kwargs) from builtins.type
1047 |  |      Create and return a new object.  See help(type) for accurate signature.
1048 |  |  
1049 |  |  __repr__(self, /)
1050 |  |      Return repr(self).
1051 |  |  
1052 |  |  __reversed__(...)
1053 |  |      L.__reversed__() -- return a reverse iterator over the list
1054 |  |  
1055 |  |  __rmul__(self, value, /)
1056 |  |      Return value*self.
1057 |  |  
1058 |  |  __setitem__(self, key, value, /)
1059 |  |      Set self[key] to value.
1060 |  |  
1061 |  |  __sizeof__(...)
1062 |  |      L.__sizeof__() -- size of L in memory, in bytes
1063 |  |  
1064 |  |  append(...)
1065 |  |      L.append(object) -> None -- append object to end
1066 |  |  
1067 |  |  clear(...)
1068 |  |      L.clear() -> None -- remove all items from L
1069 |  |  
1070 |  |  copy(...)
1071 |  |      L.copy() -> list -- a shallow copy of L
1072 |  |  
1073 |  |  count(...)
1074 |  |      L.count(value) -> integer -- return number of occurrences of value
1075 |  |  
1076 |  |  extend(...)
1077 |  |      L.extend(iterable) -> None -- extend list by appending elements from the iterable
1078 |  |  
1079 |  |  index(...)
1080 |  |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
1081 |  |      Raises ValueError if the value is not present.
1082 |  |  
1083 |  |  insert(...)
1084 |  |      L.insert(index, object) -- insert object before index
1085 |  |  
1086 |  |  pop(...)
1087 |  |      L.pop([index]) -> item -- remove and return item at index (default last).
1088 |  |      Raises IndexError if list is empty or index is out of range.
1089 |  |  
1090 |  |  remove(...)
1091 |  |      L.remove(value) -> None -- remove first occurrence of value.
1092 |  |      Raises ValueError if the value is not present.
1093 |  |  
1094 |  |  reverse(...)
1095 |  |      L.reverse() -- reverse *IN PLACE*
1096 |  |  
1097 |  |  sort(...)
1098 |  |      L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
1099 |  |  
1100 |  |  ----------------------------------------------------------------------
1101 |  |  Data and other attributes defined here:
1102 |  |  
1103 |  |  __hash__ = None
1104 | 
1105 | 
1106 | </code>
1107 | </pre>
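The full `help` output is long, and most of it describes special methods whose names start and end with double underscores. As a rough sketch of two shorter alternatives, the built-in `dir` function can be filtered to list only the everyday method names, and `help` can be called on a single method. The list `sample` is a hypothetical stand-in for any list:

```python
sample = [2, 4, 6]

# dir() returns attribute names; keep only those without a leading underscore.
public_methods = [name for name in dir(sample) if not name.startswith('_')]
print(public_methods)

# help() also accepts a single method, which gives much shorter output.
help(sample.append)
```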
1108 | 
1109 | 
1110 | 
1111 | 
1112 | 
1113 | ### Tuples
1114 | 
1115 | A tuple is similar to a list in that it's an ordered sequence of elements.
1116 | However, tuples cannot be changed once created (they are "immutable"). Tuples
1117 | are created by placing comma-separated values inside parentheses `()`.
1118 | 
1119 | 
1120 | 
1121 | 
1122 | 
1123 | 
1124 | ```python
1125 | tuples_are_immutable = ("bar", 100, 200, "foo")
1126 | tuples_are_immutable
1127 | ```
1128 | 
1129 | 
1130 | 
1131 | 
1132 | <pre class="output">
1133 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1134 | <code class="text">
1135 | ('bar', 100, 200, 'foo')
1136 | </code>
1137 | </pre>
1138 | 
1139 | 
1140 | 
1141 | 
1142 | 
1143 | 
1144 | 
1145 | 
1146 | ```python
1147 | tuples_are_immutable[1]
1148 | ```
1149 | 
1150 | 
1151 | 
1152 | 
1153 | <pre class="output">
1154 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1155 | <code class="text">
1156 | 100
1157 | </code>
1158 | </pre>
1159 | 
1160 | 
1161 | 
1162 | 
1163 | 
1164 | 
1165 | 
1166 | ```python
1167 | tuples_are_immutable[1] = 666
1168 | ```
1169 | 
1170 | 
1171 | 
1172 | 
1173 | 
1174 | Outputs:
1175 | 
1176 | ```
1177 | ---------------------------------------------------------------------------
1178 | TypeError                                 Traceback (most recent call last)
1179 | <ipython-input-39-c91965b0815a> in <module>()
1180 | ----> 1 tuples_are_immutable[1] = 666
1181 | 
1182 | TypeError: 'tuple' object does not support item assignment
1183 | ```
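Because a tuple cannot be modified, the usual workaround is to build a new tuple from pieces of the old one. The sketch below assumes a tuple named `original` with the same contents as the one above; it catches the `TypeError` so the rest of the code still runs:

```python
original = ("bar", 100, 200, "foo")

try:
    original[1] = 666
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Build a new tuple from slices of the old one instead.
# Note the trailing comma: (666,) is a one-item tuple, (666) is just a number.
updated = original[:1] + (666,) + original[2:]

print(original)  # ('bar', 100, 200, 'foo')
print(updated)   # ('bar', 666, 200, 'foo')
```

The original tuple is left untouched; `updated` is a completely new object.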
1184 | 
1185 | 
1186 | 
1187 | 
1188 | 
1189 | ### Dictionaries
1190 | 
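1191 | A dictionary is a container that stores key-value pairs. Values are looked up by key rather than by position; since Python 3.7, dictionaries also preserve the order in which keys were inserted.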
1192 | 
1193 | Other programming languages might call this a 'hash', 'hashtable' or 'hashmap'.
1194 | 
1195 | 
1196 | 
1197 | 
1198 | 
1199 | 
1200 | ```python
1201 | pairs = {'Apple': 1, 'Orange': 2, 'Pear': 4}
1202 | pairs
1203 | ```
1204 | 
1205 | 
1206 | 
1207 | 
1208 | <pre class="output">
1209 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1210 | <code class="text">
1211 | {'Apple': 1, 'Orange': 2, 'Pear': 4}
1212 | </code>
1213 | </pre>
1214 | 
1215 | 
1216 | 
1217 | 
1218 | 
1219 | 
1220 | 
1221 | 
1222 | ```python
1223 | pairs['Orange']
1224 | ```
1225 | 
1226 | 
1227 | 
1228 | 
1229 | <pre class="output">
1230 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1231 | <code class="text">
1232 | 2
1233 | </code>
1234 | </pre>
1235 | 
1236 | 
1237 | 
1238 | 
1239 | 
1240 | 
1241 | 
1242 | 
1243 | ```python
1244 | pairs['Orange'] = 16
1245 | pairs
1246 | ```
1247 | 
1248 | 
1249 | 
1250 | 
1251 | <pre class="output">
1252 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1253 | <code class="text">
1254 | {'Apple': 1, 'Orange': 16, 'Pear': 4}
1255 | </code>
1256 | </pre>
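Assignment works the same way for keys that do not exist yet: it adds a new pair. The following sketch uses a hypothetical dictionary `fruit_counts` to show adding a key, testing for a key with `in`, and reading a possibly missing key with `get`:

```python
fruit_counts = {'Apple': 1, 'Orange': 16, 'Pear': 4}

# Assigning to a key that does not exist yet adds a new key-value pair.
fruit_counts['Banana'] = 7

# `in` tests whether a key is present.
has_kiwi = 'Kiwi' in fruit_counts

# Indexing a missing key raises KeyError; get() returns a default instead.
kiwi_count = fruit_counts.get('Kiwi', 0)

print(fruit_counts)  # {'Apple': 1, 'Orange': 16, 'Pear': 4, 'Banana': 7}
print(has_kiwi)      # False
print(kiwi_count)    # 0
```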
1257 | 
1258 | 
1259 | 
1260 | 
1261 | 
1262 | 
1263 | 
1264 | The `items` method returns a sequence of the key-value pairs as tuples.
1265 | 
1266 | `values` returns a sequence of just the values.
1267 | 
1268 | `keys` returns a sequence of just the keys.
1269 | 
1270 | ---
1271 | In Python 3, the `.items()`, `.values()` and `.keys()` methods return a ['dictionary view' object](https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects) that behaves like a list or tuple in for loops but doesn't support indexing. 'Dictionary views' stay in sync even when the dictionary changes.
1272 | 
1273 | You can turn them into a normal list or tuple with the `list()` or `tuple()` functions.
1274 | 
1275 | 
1276 | 
1277 | 
1278 | 
1279 | 
1280 | ```python
1281 | pairs.items()
1282 | # list(pairs.items())
1283 | ```
1284 | 
1285 | 
1286 | 
1287 | 
1288 | <pre class="output">
1289 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1290 | <code class="text">
1291 | dict_items([('Apple', 1), ('Orange', 16), ('Pear', 4)])
1292 | </code>
1293 | </pre>
1294 | 
1295 | 
1296 | 
1297 | 
1298 | 
1299 | 
1300 | 
1301 | 
1302 | ```python
1303 | pairs.values()
1304 | # list(pairs.values())
1305 | ```
1306 | 
1307 | 
1308 | 
1309 | 
1310 | <pre class="output">
1311 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1312 | <code class="text">
1313 | dict_values([1, 16, 4])
1314 | </code>
1315 | </pre>
1316 | 
1317 | 
1318 | 
1319 | 
1320 | 
1321 | 
1322 | 
1323 | 
1324 | ```python
1325 | pairs.keys()
1326 | # list(pairs.keys())
1327 | ```
1328 | 
1329 | 
1330 | 
1331 | 
1332 | <pre class="output">
1333 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1334 | <code class="text">
1335 | dict_keys(['Apple', 'Orange', 'Pear'])
1336 | </code>
1337 | </pre>
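The behaviour of dictionary views described earlier (they stay in sync with the dictionary, work in `for` loops, but cannot be indexed) can be checked with a short sketch. The dictionary `stock` and the other names here are invented for the example:

```python
stock = {'Apple': 1, 'Orange': 16}

keys_view = stock.keys()            # a live view of the keys
keys_snapshot = list(stock.keys())  # a plain list, copied at this moment

stock['Pear'] = 4  # change the dictionary after creating both

print(list(keys_view))  # ['Apple', 'Orange', 'Pear']
print(keys_snapshot)    # ['Apple', 'Orange']

# Views can be looped over directly; items() yields (key, value) tuples.
for fruit, count in stock.items():
    print(fruit, count)

# Views do not support indexing.
try:
    keys_view[0]
except TypeError as error:
    print("Views cannot be indexed:", error)
```

Converting with `list()` is the way to go when you need indexing or a copy that will not change later.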
1338 | 
1339 | 
1340 | 
1341 | 
1342 | 
1343 | 
1344 | 
1345 | 
1346 | ```python
1347 | len(pairs)
1348 | ```
1349 | 
1350 | 
1351 | 
1352 | 
1353 | <pre class="output">
1354 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1355 | <code class="text">
1356 | 3
1357 | </code>
1358 | </pre>
1359 | 
1360 | 
1361 | 
1362 | 
1363 | 
1364 | 
1365 | 
1366 | 
1367 | ```python
1368 | dict_of_dicts = {'first': {1:2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}}
1369 | dict_of_dicts
1370 | ```
1371 | 
1372 | 
1373 | 
1374 | 
1375 | <pre class="output">
1376 | <div style="text-align: right; margin: -1em; padding: 0;"><span style="font-size: 0.5em; color: grey">output</span></div>
1377 | <code class="text">
1378 | {'first': {1: 2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}}
1379 | </code>
1380 | </pre>
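Values inside a nested dictionary are reached by chaining square brackets, one per level. This sketch uses a copy of the same data under the hypothetical name `nested`:

```python
nested = {'first': {1: 2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}}

# The first key selects the inner dictionary, the second key selects a value in it.
inner = nested['second']
value = nested['second']['b']
print(inner)  # {'a': 2.2, 'b': 4.4}
print(value)  # 4.4

# Assignment through two levels changes the inner dictionary in place.
nested['first'][16] = 32

# Loop over the outer dictionary and report the size of each inner one.
for name, inner_dict in nested.items():
    print(name, len(inner_dict))
```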
1381 | 
1382 | 
1383 | 
1384 | 
1385 | 
1386 | 
1387 | 
1388 | ## Challenge - Dictionaries
1389 | 
1390 | Given the dictionary:
1391 | 
1392 | ```python
1393 | jam_ratings = {'Plum': 6, 'Apricot': 2, 'Strawberry': 8}
1394 | ```
1395 | 
1396 | How would you change the value associated with the key `Apricot` to `9`?
1397 | 
1398 | A) `jam_ratings = {'apricot': 9}`
1399 | 
1400 | B) `jam_ratings[9] = 'Apricot'`
1401 | 
1402 | C) `jam_ratings['Apricot'] = 9`
1403 | 
1404 | D) `jam_ratings[2] = 'Apricot'`
1405 | 
1406 | 
1407 | 
1408 | 
1409 | <!-- 
1410 | ## Solution - Dictionaries
1411 | 
1412 | The correct answer is **C**.
1413 | 
1414 | **A** assigns the name `jam_ratings` to a new dictionary with only the key `apricot` - not only are the other jam ratings now missing, but strings used as dictionary keys are *case sensitive* - `apricot` is not the same key as `Apricot`.
1415 | 
1416 | **B** mixes up the value and the key. Assigning to a dictionary uses the form: `dictionary[key] = value`.
1417 | 
1418 | **C** is correct. Bonus - another way to do this would be `jam_ratings.update({'Apricot': 9})` or even `jam_ratings.update(Apricot=9)`.
1419 | 
1420 | **D** mixes up the value and the key (and doesn't actually include the new value to be assigned, `9`, anywhere). `2` is the original *value*, `Apricot` is the key. Assigning to a dictionary uses the form: `dictionary[key] = value`.
1421 |  -->
1422 | 
1423 | 


--------------------------------------------------------------------------------