├── exercises_solutions
    ├── employment_info.csv
    ├── bananabread.recipe
    ├── bananabread.txt
    ├── ex05_file_IO_solution.ipynb
    ├── ex03_booleans_branching_loops_solution.ipynb
    ├── ex04_functions_solution.ipynb
    ├── ex05_reading_error_messages_solution.ipynb
    ├── ex02_datastructures_loops_solution.ipynb
    ├── ex07_pandas_b_solution.ipynb
    └── ex06_pandas_a_solution.ipynb
├── exercises
    ├── ex01_print_comments_strings_numbers.ipynb
    ├── ex03_booleans_branching_loops.ipynb
    ├── ex05_file_IO.ipynb
    ├── ex04_functions.ipynb
    ├── ex07_pandas_b.ipynb
    ├── ex05_reading_error_messages.ipynb
    ├── ex02_datastructures_loops.ipynb
    ├── ex08_plotting.ipynb
    ├── ex06_pandas_a.ipynb
    ├── ex09_scipy_statsmodels.ipynb
    └── ex10_networkx.ipynb
├── README.md
└── tutorials
    ├── tut05_file_IO.ipynb
    ├── tut04_functions.ipynb
    ├── tut01_print_comments_strings_numbers.ipynb
    └── tut03_booleans_branching_loops.ipynb


/exercises_solutions/employment_info.csv:
--------------------------------------------------------------------------------
1 | Name,Employee-ID,Employment,Yearly-Net-Income
2 | Jhon,109231,Carpenter,49520
3 | Anna,201201,Graphic Designer,5700
4 | Bob,302211,Python Programmer,2147483647
5 |
--------------------------------------------------------------------------------
/exercises_solutions/bananabread.recipe:
--------------------------------------------------------------------------------
1 | Mix everything together and bake for 45min at 180°C.
2 | Wheat,180,g
3 | Sugar,100,g
4 | Cocoa,60,g
5 | Salt,1,pinch
6 | Baking Powder,1,tsp
7 | Chopped Chocolate,170,g
8 | Bananas,3,pcs
9 | Oil,60,ml
--------------------------------------------------------------------------------
/exercises_solutions/bananabread.txt:
--------------------------------------------------------------------------------
1 | How to make 3 serving(s) of Bananabread:
2 | Wheat: 540g
3 | Sugar: 300g
4 | Cocoa: 180g
5 | Salt: 3pinch
6 | Baking Powder: 3tsp
7 | Chopped Chocolate: 510g
8 | Bananas: 9pcs
9 | Oil: 180ml
10 | Mix everything together and bake for 45min at 180°C.
--------------------------------------------------------------------------------
/exercises/ex01_print_comments_strings_numbers.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 01: Strings & Numbers"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Here are a few ideas for what you could try to practice the skills learned today:\n", "\n", "- Print something (e.g. a greeting)! \n", "- Try to join a few strings using the operator `+`.\n", "- Use the string methods `.join`, `.split`, and `.find`.\n", "- Perform a calculation which uses all mathematical operators.\n", "- Change the order of operations by using parentheses.\n", "- Create 3 variables: a string, an integer, and a float. 
Print a message containing the variables' values using an f-string!"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:"]}], "metadata": {"interpreter": {"hash": "9876a56b36d3e86ed839f942802fea42f90ec2f0ee3c4ea82631e635694a47fe"}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0"}}, "nbformat": 4, "nbformat_minor": 4}
--------------------------------------------------------------------------------
/exercises/ex03_booleans_branching_loops.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {"id": "Y2SZiOJLknDc"}, "source": ["# Exercise 03: Branching, and Loops"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## BMI calculation\n", "\n", "Calculate the BMI using the following formula: \n", "\n", "$BMI = \\frac{weight}{height^2}$\n", "\n", "where _weight_ is given in kilograms (kg) and _height_ is given in meters (m).\n", "\n", "Create the variables `weight` and `height` and assign values of your choice.\n", "\n", "Print the BMI rounded to 2 decimal places and also its classification as below:\n", "\n", "* BMI < 18.5: underweight\n", "* 18.5 <= BMI < 25: normal weight\n", "* 25 <= BMI < 30: overweight\n", "* 30 <= BMI < 40: obesity\n", "* BMI >= 40: morbid obesity\n", "\n", "hint: use if/elif/else statements\n", "\n", "**Example:**\n", "\n", "```\n", "weight = 69\n", "height = 1.75\n", "```\n", "\n", "The result could look like this:\n", "```\n", "BMI: 22.53\n", "Classification: normal weight\n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Find minimum\n", "\n", "Find the smallest number in the list below using a `for`-loop and an `if`-condition!"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["numbers = [27, 83, 20, 77, 1923, 4, 7, 19]\n", "\n", "# your code goes here:\n"]}], "metadata": {"interpreter": {"hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"}, "kernelspec": {"display_name": "Python 3.10.4 64-bit", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4"}}, "nbformat": 4, "nbformat_minor": 4}
--------------------------------------------------------------------------------
/exercises/ex05_file_IO.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {"id": "Y2SZiOJLknDc"}, "source": ["# Exercise 05: File I/O\n", "\n", "## Cookbook\n", "You want to use a cookbook to calculate amounts for any desired number of servings. For this you have the following information:\n", "* filename of the recipe (`string`), e.g. `\"bananabread.recipe\"`\n", "* name of the desired recipe (`string`), e.g. `\"Bananabread\"`\n", "* `servings` -> number of servings (`int`), e.g. 3\n", "\n", "The `.recipe`-file contains the desired recipe. From there, read the instructions and ingredients, calculate the amounts needed for the given number of servings, and save them in a new file. \n", "Such a `.recipe` file could look like this:\n", "```\n", "Mix everything together and bake for 45min at 180\u00b0C.\n", "Wheat,180,g\n", "Sugar,100,g\n", "Cocoa,60,g\n", "Salt,1,pinch\n", "Baking Powder,1,tsp\n", "Chopped Chocolate,170,g\n", "Bananas,3,pcs\n", "```\n", "The first line always contains the instructions, and all following lines contain the ingredients. All ingredients have the following structure: \n", "`<name>,<amount>,<unit>`.\n", "\n", "Now read all ingredients and multiply each amount by the number of servings. Save the resulting recipe in a new `.txt` file with the same filename as the `.recipe` file (but with `.txt` as the suffix). \n", "The file should look like the following:\n", "```\n", "How to make 3 serving(s) of Bananabread:\n", "Wheat: 540g\n", "Sugar: 300g\n", "Cocoa: 180g\n", "Salt: 3pinch\n", "Baking Powder: 3tsp\n", "Chopped Chocolate: 510g\n", "Bananas: 9pcs\n", "Oil: 180ml\n", "Mix everything together and bake for 45min at 180\u00b0C.\n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# Your code goes here:\n"]}], "metadata": {"interpreter": {"hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"}, "kernelspec": {"display_name": "Python 3.10.4 64-bit", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4"}}, "nbformat": 4, "nbformat_minor": 4}
--------------------------------------------------------------------------------
/exercises/ex04_functions.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 04: Functions\n", "\n", "## Basic Function\n", "Write a function `my_contact_information` which takes **no** arguments and prints your contact information as follows:\n", "\n", "```\n", "First Name: \n", "Last Name: \n", "Email: \n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Functions and return\n", "Calculate the sum of a list. Write a function which takes a list or a tuple as input. Calculate the sum of the elements in the list/tuple and return the resulting sum. \n", "*Do not use `sum()`*\n", "\n", "**Example:** \n", "```\n", "[1,2,3] -> 6\n", "(6, 90, 10, 15, 114, 25, 18, 91, 51) -> 420\n", "[63, 100, 48, 79, 4, 85, 26, 84, 16, 73, 58, 78, 87, 198, 321, 17] -> 1337\n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# Your code goes here.\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Functions with keyword arguments\n", "Write a function which takes a list or a tuple of numbers and a keyword argument `power` whose default value is 2. For each number, calculate the number raised to the power given by the keyword argument. 
Return the new list.\n", "\n", "**Example: with power=2** \n", "```\n", "[1,2,3] -> [1, 4, 9]\n", "(6, 90, 10, 15, 114, 25, 18, 91, 51) -> [36, 8100, 100, 225, 12996, 625, 324, 8281, 2601]\n", "```\n", "**Example: with power=3** \n", "```\n", "[63, 100, 48, 79] -> [250047, 1000000, 110592, 493039]\n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# Your code goes here.\n"]}], "metadata": {"interpreter": {"hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"}, "kernelspec": {"display_name": "Python 3.10.4 64-bit", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4"}}, "nbformat": 4, "nbformat_minor": 4} -------------------------------------------------------------------------------- /exercises/ex07_pandas_b.ipynb: -------------------------------------------------------------------------------- 1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 07: Pandas B\n", "Today's exercise continues on working with pandas DataFrames. If you prefer the [video tutorials](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS), parts 7 to 11 cover the skills needed (and quite a bit more!). Alternatively, check out the [Getting started tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html) or follow the \"Help\"-links to relevant Google searches ;)\n", "\n", "The required tables are the same as in Exercise 06, so you can just copy the code for reading them from there."]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["For each location, get the total number of cases (i.e. 
the sum of the daily new cases).\n", "\n", "[Help](https://www.google.com/search?q=pandas+sum+column+by+group)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Calculate the total number of cases per million inhabitants for all locations.\n", "\n", "[Help1](https://www.google.com/search?q=pandas+merge+dataframe), [Help2](https://www.google.com/search?q=pandas+divide+two+columns)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["\n", "# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Now sort the resulting dataframe by cases per million in descending order.\n", "\n", "[Help](https://www.google.com/search?q=pandas+sort+descending)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Write the dataframe to a CSV file, but include only location and cases_per_million.\n", "\n", "[Help1](https://www.google.com/search?q=pandas+write+csv), [Help2](https://www.google.com/search?q=pandas+write+csv+ignore+index)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Get the number of new vaccinations per continent in November 2021."]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}], "metadata": {"language_info": {"name": "python"}, "orig_nbformat": 4}, "nbformat": 4, "nbformat_minor": 2}
--------------------------------------------------------------------------------
/exercises/ex05_reading_error_messages.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {"id": "Y2SZiOJLknDc"}, "source": ["# 
Exercise 05: Fixing Errors\n", "Try to understand the examples below and fix all errors that occur, using the knowledge you gained in the last few days."]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["some_random_numbers = [98, 98, 22, 49, 19, 38, 22, 28, 23, 70, 21, 97, 13, 56, 98, 2, 93, 41, 72, 56}\n", " \n", "result = 0 \n", "for number in some_random_numbers\n", " result += number\n", " \n", "print(\"The sum of the random numbers is {result}')\n", "\n"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["students_and_points = {'Rolf': 24, 'Bob': 18, 'Anne' 42, 'Charlie': 37, 'Jen': 12, 'Jose': 24}\n", "for student, points in students_and_points:\n", " print(student, student)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["\n", "def is_palindrome(string)\n", " string = \"\".join(string.lower().split()\n", " return string = string[::-1]\n", "\n", "is_palindrome(\"Anna\")"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["employment_info = {\n", "\"Jhon\": {\n", " \"employee_id\" 109231, \n", " \"employment\": \"Carpenter\", \n", " \"Yearly Net Income\": 49520},\n", "\"Anna\": {\n", " \"employee_id\": 201201 \n", " \"employment\": \"Graphic Designer\", \n", " \"Yearly Net Income\": 5700}\n", "\"Bob': {\n", " \"employee_id\": 302211, \n", " \"employment\": \"Python Programmer\", \n", " \"Yearly Net Income\": 2147483647,\n", "\n", "with open(\"employment_info.csv\"):\n", " employment_info_file.write(\"Name,Employee-ID,Employment,Yearly-Net-Income\")\n", " for name, info in employment_info:\n", " some_file.write(f\"{name},{info[\"employee_id\"]},{info[\"employment\"]},{info[\"Yearly Net Income\"]}\")\n", " "]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# Fix the previous example first, since you need the data file it creates ;)\n", "with 
open(\"C:/a/very/good/and/definitely/available/absolute/path/to/employment_info.csv\") as employment_info:\n", " total_net_income = 0\n", " for line in employment_info.readlines()[1:]: # skip header\n", " total_net_income = line.strip().split(\",')[-1]\n", " print(\"All employees together have a yearly net income of: {total_net_income}\")"]}], "metadata": {"interpreter": {"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7"}}, "nbformat": 4, "nbformat_minor": 4} -------------------------------------------------------------------------------- /exercises/ex02_datastructures_loops.ipynb: -------------------------------------------------------------------------------- 1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 02: Data Structures & for-Loops"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Lists & Loops\n", "Based on the two lists of programming languages below, create a new list containing only the yet unknown languages. 
Sort the resulting list alphabetically and print it."]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["programming_languages = [\n", " \"Python\", \n", " \"C\", \n", " \"Julia\", \n", " \"Java\", \n", " \"C#\", \n", " \"Fortran\", \n", " \"C++\", \n", " \"JavaScript\",\n", " \"Perl\",\n", " \"Lisp\",\n", "]\n", "\n", "languages_i_know = [\"Python\", \"Julia\", \"C\"]\n", "\n", "# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Sets\n", "Solve the above example with sets instead of loops!"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["\n", ""]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Data structure conversion\n", "Create a dictionary which maps product names to prices, using the `price_table` given below."]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["price_table = [(\"honey\", 2.45), (\"butter\", 1.30), (\"catfood\", 4.85), (\"tea\", 1.50)]\n", "\n", "# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Nested data structures\n", "The data below is given as a list of nested tuples, where each outer tuple is structured as follows:\n", "\n", "`(<product name>, (<price>, <amount>))`\n", "\n", "Transform the data into a more explicit form: a dictionary where each key is a product name and each value is again a dictionary containing entries for `\"price\"` and `\"amount\"`. 
The result should look like this:\n", "```python\n", "{'honey': {'price': 2.45, 'amount': 10}, 'butter': {'price': 1.3, 'amount': 100}, 'catfood': {'price': 4.85, 'amount': 3}, 'tea': {'price': 1.5, 'amount': 123}}\n", "```"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["input_data = [\n", " (\"honey\", (2.45, 10)), \n", " (\"butter\", (1.30, 100)), \n", " (\"catfood\", (4.85, 3)), \n", " (\"tea\", (1.50, 123)),\n", "]\n", "\n", "# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Finally, reduce the amount of butter by `20` and print the entire dictionary:"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}], "metadata": {"interpreter": {"hash": "9876a56b36d3e86ed839f942802fea42f90ec2f0ee3c4ea82631e635694a47fe"}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0"}}, "nbformat": 4, "nbformat_minor": 4}
--------------------------------------------------------------------------------
/exercises_solutions/ex05_file_IO_solution.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "Y2SZiOJLknDc"
7 | },
8 | "source": [
9 | "# Exercise 05: File I/O\n",
10 | "\n",
11 | "## Cookbook\n",
12 | "You want to use a cookbook to calculate amounts for any desired number of servings. For this you have the following information:\n",
13 | "* filename of the recipe (`string`), e.g. `\"bananabread.recipe\"`\n",
14 | "* name of the desired recipe (`string`), e.g. `\"Bananabread\"`\n",
15 | "* `servings` -> number of servings (`int`), e.g. 3\n",
16 | "\n",
17 | "The `.recipe`-file contains the desired recipe. From there, read the instructions and ingredients, calculate the amounts needed for the given number of servings, and save them in a new file. \n",
18 | "Such a `.recipe` file could look like this:\n",
19 | "```\n",
20 | "Mix everything together and bake for 45min at 180°C.\n",
21 | "Wheat,180,g\n",
22 | "Sugar,100,g\n",
23 | "Cocoa,60,g\n",
24 | "Salt,1,pinch\n",
25 | "Baking Powder,1,tsp\n",
26 | "Chopped Chocolate,170,g\n",
27 | "Bananas,3,pcs\n",
28 | "```\n",
29 | "The first line always contains the instructions, and all following lines contain the ingredients. All ingredients have the following structure: \n",
30 | "`<name>,<amount>,<unit>`.\n",
31 | "\n",
32 | "Now read all ingredients and multiply each amount by the number of servings. Save the resulting recipe in a new `.txt` file with the same filename as the `.recipe` file (but with `.txt` as the suffix). \n",
33 | "The file should look like the following:\n",
34 | "```\n",
35 | "How to make 3 serving(s) of Bananabread:\n",
36 | "Wheat: 540g\n",
37 | "Sugar: 300g\n",
38 | "Cocoa: 180g\n",
39 | "Salt: 3pinch\n",
40 | "Baking Powder: 3tsp\n",
41 | "Chopped Chocolate: 510g\n",
42 | "Bananas: 9pcs\n",
43 | "Oil: 180ml\n",
44 | "Mix everything together and bake for 45min at 180°C.\n",
45 | "```"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 1,
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "# Your code goes here:\n",
55 | "filename = \"bananabread.recipe\"\n",
56 | "name = \"Bananabread\"\n",
57 | "servings = 3\n",
58 | "\n",
59 | "with open(filename, \"r\") as f:\n",
60 | "    recipe = f.readlines()\n",
61 | "\n",
62 | "with open(filename.replace(\".recipe\", \".txt\"), \"w\") as f:\n",
63 | "    f.write(f\"How to make {servings} serving(s) of {name}:\\n\")\n",
64 | "    for line in recipe[1:]:\n",
65 | "        ingredient = line.strip().split(\",\")\n",
66 | "        ingredient[1] = int(ingredient[1]) * servings\n",
67 | "        f.write(f\"{ingredient[0]}: {ingredient[1]}{ingredient[2]}\\n\")\n",
68 | "    f.write(recipe[0].strip())"
69 | ]
70 | }
71 | ],
72 | "metadata": {
73 | "kernelspec": {
74 | "display_name": "Python 3.9.7 64-bit",
75 | "language": "python",
76 | "name": "python3"
77 | },
78 | "language_info": {
79 | "codemirror_mode": {
80 | "name": "ipython",
81 | "version": 3
82 | },
83 | "file_extension": ".py",
84 | "mimetype": "text/x-python",
85 | "name": "python",
86 | "nbconvert_exporter": "python",
87 | "pygments_lexer": "ipython3",
88 | "version": "3.9.7"
89 | },
90 | "vscode": {
91 | "interpreter": {
92 | "hash": "e65133fff93ab36b53ddb68612dd6a95a21053ff6cb8524adc7f46061287cbaa"
93 | }
94 | }
95 | },
96 | "nbformat": 4,
97 | "nbformat_minor": 4
98 | }
99 |
--------------------------------------------------------------------------------
/exercises_solutions/ex03_booleans_branching_loops_solution.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "Y2SZiOJLknDc"
7 | },
8 | "source": [
9 | "# Exercise 03: Branching, and Loops"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## BMI calculation\n",
17 | "\n",
18 | "Calculate the BMI using the following formula: \n",
19 | "\n",
20 | "$BMI = \\frac{weight}{height^2}$\n",
21 | "\n",
22 | "where _weight_ is given in kilograms (kg) and _height_ is given in meters (m).\n",
23 | "\n",
24 | "Create the variables `weight` and `height` and assign values of your choice.\n",
25 | "\n",
26 | "Print the BMI rounded to 2 decimal places and also its classification as below:\n",
27 | "\n",
28 | "* BMI < 18.5: underweight\n",
29 | "* 18.5 <= BMI < 25: normal weight\n",
30 | "* 25 <= BMI < 30: overweight\n",
31 | "* 30 <= BMI < 40: obesity\n",
32 | "* BMI >= 40: morbid obesity\n",
33 | "\n",
34 | "hint: use if/elif/else statements\n",
35 | "\n",
36 | "**Example:**\n",
37 | "\n",
38 | "```\n",
39 | "weight = 
69\n", 40 | "height = 1.75\n", 41 | "```\n", 42 | "\n", 43 | "The result could look like this:\n", 44 | "```\n", 45 | "BMI: 22.53\n", 46 | "Classification: normal weight\n", 47 | "```" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "name": "stdout", 57 | "output_type": "stream", 58 | "text": [ 59 | "BMI: 22.53\n", 60 | "Classification: normal weight\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "# your code goes here:\n", 66 | "weight = 69\n", 67 | "height = 1.75\n", 68 | "\n", 69 | "bmi = weight / (height ** 2)\n", 70 | "if bmi < 18.5:\n", 71 | " print(f\"BMI: {bmi:.2f}\\nClassification: underweight\")\n", 72 | "elif bmi < 25:\n", 73 | " print(f\"BMI: {bmi:.2f}\\nClassification: normal weight\")\n", 74 | "elif bmi < 30:\n", 75 | " print(f\"BMI: {bmi:.2f}\\nClassification: overweight\")\n", 76 | "elif bmi < 40:\n", 77 | " print(f\"BMI: {bmi:.2f}\\nClassification: obesity\")\n", 78 | "else:\n", 79 | " print(f\"BMI: {bmi:.2f}\\nClassification: morbid obesity\")" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## Find minimum\n", 87 | "\n", 88 | "Find the smallest number in the list below using a `for`-loop and an `if`-condition!" 
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": 1,
94 | "metadata": {},
95 | "outputs": [
96 | {
97 | "name": "stdout",
98 | "output_type": "stream",
99 | "text": [
100 | "4\n"
101 | ]
102 | }
103 | ],
104 | "source": [
105 | "numbers = [27, 83, 20, 77, 1923, 4, 7, 19]\n",
106 | "\n",
107 | "# your code goes here:\n",
108 | "min_number = float(\"inf\")\n",
109 | "for number in numbers:\n",
110 | "    if number < min_number:\n",
111 | "        min_number = number\n",
112 | "print(min_number)"
113 | ]
114 | }
115 | ],
116 | "metadata": {
117 | "interpreter": {
118 | "hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"
119 | },
120 | "kernelspec": {
121 | "display_name": "Python 3.10.4 64-bit",
122 | "language": "python",
123 | "name": "python3"
124 | },
125 | "language_info": {
126 | "codemirror_mode": {
127 | "name": "ipython",
128 | "version": 3
129 | },
130 | "file_extension": ".py",
131 | "mimetype": "text/x-python",
132 | "name": "python",
133 | "nbconvert_exporter": "python",
134 | "pygments_lexer": "ipython3",
135 | "version": "3.10.4"
136 | }
137 | },
138 | "nbformat": 4,
139 | "nbformat_minor": 4
140 | }
141 |
--------------------------------------------------------------------------------
/exercises/ex08_plotting.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 08: Plotting\n", "The package [matplotlib](https://matplotlib.org/stable/tutorials/introductory/usage.html) allows you to create, style, and save plots in Python. While there are other packages which make creating quick - and interactive - plots easier (like [Plotly](https://plotly.com/python/), [Bokeh](https://bokeh.org/), or [HoloViews](https://holoviews.org/)), if you want to create a perfectly styled plot for a publication, matplotlib is usually the best choice. Just like pandas, matplotlib is already installed in Google Colab. 
Install it locally with `python -m pip install matplotlib`.\n", "\n", "[This video tutorial (1h 34min)](https://www.youtube.com/watch?v=wB9C0Mz9gSo) covers all the basics you need to know and more (we will focus on line plots, histograms, and scatter plots).\n", "\n", "In matplotlib, there are two different ways of working with plots - a _state-based_ approach (which mimics MATLAB behaviour) and an _object-oriented_ approach. In the matplotlib documentation (and all over the internet), you will find both of them, sometimes even mixed. We encourage you to use the latter, as it is more explicit and avoids strange bugs. [This Real Python article](https://realpython.com/python-matplotlib-guide) explains the two approaches and what's happening in the background. It does not provide instructions for all plot types in this exercise, but there are once again Google search links which should help ;)\n", "\n", "For all the plots you create, add a [title](https://matplotlib.org/3.1.1/gallery/subplots_axes_and_figures/figure_title.html) as well as [labels for both axes](https://matplotlib.org/stable/gallery/pyplots/fig_axes_labels_simple.html).\n", "\n", "Pandas DataFrames have a method `.plot` which uses matplotlib in the background. This is helpful for quick visualizations, but make sure you learn how to use matplotlib directly, as it is much more versatile! Check out [this Real Python article](https://realpython.com/pandas-plot-python/) for details. 
\n", "\n", "To start off, load the tables from Exercises 06/07!"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Create a line plot showing the daily cases and 1-week rolling mean for Austria.\n", "\n", "[Help with rolling mean](https://www.google.com/search?q=pandas+rolling+mean) \n", "[Help with MultiIndex](https://www.google.com/search?q=pandas+remove+multiindex) \n", "[Help with line plot](https://www.google.com/search?q=matplotlib+line+plot)\n"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Do the same for two more locations of your choice using additional subplots. Limit the x-axes to an interval of interest (hint: since the \"date\" column has a datetime type, you can use strings written as `\"YYYY-MM-DD\"` to specify the limits - matplotlib will know what to do).\n", "\n", "[Help1](https://www.google.com/search?q=matplotlib+subplots), [Help2](https://www.google.com/search?q=matplotlib+x+axis+limits)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Create a histogram of the populations for all locations in Africa. Adjust the bin size to something you think makes sense.\n", "\n", "[Help](https://www.google.com/search?q=matplotlib+histogram)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Create a scatter plot showing the number of deaths vs. the number of cases (both relative to the population), in January 2022, for all locations. 
Color-code the dots based on the associated continent, show a legend explaining the colors, and scale both axes logarithmically.\n", "\n", "[Help1](https://www.google.com/search?q=matplotlib+scatter+plot), [Help2](https://www.google.com/search?q=matplotlib+scatter+marker+colors), [Help3](https://www.google.com/search?q=matplotlib+legend), [Help4](https://www.google.com/search?q=matplotlib+scatter+log+scale)"]}, {"cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": ["# your code goes here:\n"]}], "metadata": {"language_info": {"name": "python"}, "orig_nbformat": 4}, "nbformat": 4, "nbformat_minor": 2} -------------------------------------------------------------------------------- /exercises_solutions/ex04_functions_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Exercise 04: Functions\n", 8 | "\n", 9 | "## Basic Function\n", 10 | "Write a function `my_contact_information` which takes **no** arguments and prints your contact information as follows:\n", 11 | "\n", 12 | "```\n", 13 | "First Name: \n", 14 | "Last Name: \n", 15 | "Email: \n", 16 | "```" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "First Name: Moritz\n", 29 | "Last Name: Erlacher\n", 30 | "Email: moritz.erlacher@student.tugraz.at\n" 31 | ] 32 | } 33 | ], 34 | "source": [ 35 | "# your code goes here:\n", 36 | "def my_contact_information():\n", 37 | " print(\"First Name: Moritz\\nLast Name: Erlacher\\nEmail: moritz.erlacher@student.tugraz.at\")\n", 38 | "my_contact_information()" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "## Functions and return\n", 46 | "Calculate the sum of a list. 
Write a function which takes a list or a tuple as input. Calculate the sum of the elements in the list/tuple and return the resulting sum. \n", 47 | "*Do not use `sum()`*\n", 48 | "\n", 49 | "**Example:** \n", 50 | "```\n", 51 | "[1,2,3] -> 6\n", 52 | "(6, 90, 10, 15, 114, 25, 18, 91, 51) -> 420\n", 53 | "[63, 100, 48, 79, 4, 85, 26, 84, 16, 73, 58, 78, 87, 198, 321, 17] -> 1337\n", 54 | "```" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "6" 66 | ] 67 | }, 68 | "execution_count": 3, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "# Your code goes here. The name my_sum avoids shadowing the built-in sum().\n", 75 | "def my_sum(data):\n", 76 | " result = 0\n", 77 | " for element in data:\n", 78 | " result += element\n", 79 | " return result\n", 80 | "\n", 81 | "my_sum([1,2,3])" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "## Functions with keyword arguments\n", 89 | "Write a function which takes a list or a tuple of numbers and a keyword argument `power` whose default value is 2. Raise each number to that power (given by the keyword argument). 
Return the new list.\n", 90 | "\n", 91 | "**Example: with power=2** \n", 92 | "```\n", 93 | "[1,2,3] -> [1, 4, 9]\n", 94 | "(6, 90, 10, 15, 114, 25, 18, 91, 51) -> [36, 8100, 100, 225, 12996, 625, 324, 8281, 2601]\n", 95 | "```\n", 96 | "**Example: with power=3** \n", 97 | "```\n", 98 | "[63, 100, 48, 79] -> [250047, 1000000, 110592, 493039]\n", 99 | "```" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "[250047, 1000000, 110592, 493039]" 111 | ] 112 | }, 113 | "execution_count": 4, 114 | "metadata": {}, 115 | "output_type": "execute_result" 116 | } 117 | ], 118 | "source": [ 119 | "# Your code goes here.\n", 120 | "def list_to_power(data, power=2):\n", 121 | " return [element**power for element in data]\n", 122 | "\n", 123 | "# Alternative without list comprehension:\n", 124 | "def list_to_power(data, power=2):\n", 125 | " result = []\n", 126 | " for element in data:\n", 127 | " result.append(element**power)\n", 128 | " return result\n", 129 | "\n", 130 | "list_to_power([63, 100, 48, 79], 3)" 131 | ] 132 | } 133 | ], 134 | "metadata": { 135 | "kernelspec": { 136 | "display_name": "Python 3.9.7 64-bit", 137 | "language": "python", 138 | "name": "python3" 139 | }, 140 | "language_info": { 141 | "codemirror_mode": { 142 | "name": "ipython", 143 | "version": 3 144 | }, 145 | "file_extension": ".py", 146 | "mimetype": "text/x-python", 147 | "name": "python", 148 | "nbconvert_exporter": "python", 149 | "pygments_lexer": "ipython3", 150 | "version": "3.9.7" 151 | }, 152 | "vscode": { 153 | "interpreter": { 154 | "hash": "e65133fff93ab36b53ddb68612dd6a95a21053ff6cb8524adc7f46061287cbaa" 155 | } 156 | } 157 | }, 158 | "nbformat": 4, 159 | "nbformat_minor": 4 160 | } 161 | -------------------------------------------------------------------------------- /exercises_solutions/ex05_reading_error_messages_solution.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "Y2SZiOJLknDc" 7 | }, 8 | "source": [ 9 | "# Exercise 05: Fixing Errors\n", 10 | "Try to understand the examples below and fix all errors that occur, using the knowledge you gained in the last few days." 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 3, 16 | "metadata": {}, 17 | "outputs": [ 18 | { 19 | "name": "stdout", 20 | "output_type": "stream", 21 | "text": [ 22 | "The sum of the random numbers is 1016\n" 23 | ] 24 | } 25 | ], 26 | "source": [ 27 | "some_random_numbers = [98, 98, 22, 49, 19, 38, 22, 28, 23, 70, 21, 97, 13, 56, 98, 2, 93, 41, 72, 56]\n", 28 | " \n", 29 | "result = 0 \n", 30 | "for number in some_random_numbers:\n", 31 | " result += number\n", 32 | " \n", 33 | "print(f\"The sum of the random numbers is {result}\")\n", 34 | "\n" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 5, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "name": "stdout", 44 | "output_type": "stream", 45 | "text": [ 46 | "Rolf 24\n", 47 | "Bob 18\n", 48 | "Anne 42\n", 49 | "Charlie 37\n", 50 | "Jen 12\n", 51 | "Jose 24\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "students_and_points = {'Rolf': 24, 'Bob': 18, 'Anne': 42, 'Charlie': 37, 'Jen': 12, 'Jose': 24}\n", 57 | "for student, points in students_and_points.items():\n", 58 | " print(student, points)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 7, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "True" 70 | ] 71 | }, 72 | "execution_count": 7, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "def is_palindrome(string):\n", 79 | " string = \"\".join(string.lower().split())\n", 80 | " return string == string[::-1]\n", 81 | "\n", 82 | "is_palindrome(\"Anna\")" 83 | ] 84 | }, 85 | { 86 | 
"cell_type": "code", 87 | "execution_count": 18, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "employment_info = {\n", 92 | "\"Jhon\": {\n", 93 | " \"employee_id\": 109231, \n", 94 | " \"employment\": \"Carpenter\", \n", 95 | " \"Yearly Net Income\": 49520},\n", 96 | "\"Anna\": {\n", 97 | " \"employee_id\": 201201, \n", 98 | " \"employment\": \"Graphic Designer\", \n", 99 | " \"Yearly Net Income\": 5700},\n", 100 | "\"Bob\": {\n", 101 | " \"employee_id\": 302211, \n", 102 | " \"employment\": \"Python Programmer\", \n", 103 | " \"Yearly Net Income\": 2147483647},\n", 104 | "}\n", 105 | "\n", 106 | "with open(\"employment_info.csv\", \"w\") as employment_info_file:\n", 107 | " employment_info_file.write(\"Name,Employee-ID,Employment,Yearly-Net-Income\\n\")\n", 108 | " for name, info in employment_info.items():\n", 109 | " employment_info_file.write(f'{name},{info[\"employee_id\"]},{info[\"employment\"]},{info[\"Yearly Net Income\"]}\\n')\n", 110 | " " 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 22, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "name": "stdout", 120 | "output_type": "stream", 121 | "text": [ 122 | "All employees together have a yearly net income of: 2147538867\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "# Fix the previous example first, since this one needs the data file it creates ;)\n", 128 | "with open(\"employment_info.csv\", \"r\") as employment_info:\n", 129 | " total_net_income = 0\n", 130 | " for line in employment_info.readlines()[1:]: # skip header\n", 131 | " total_net_income += int(line.strip().split(\",\")[-1]) # You could also cast it to float\n", 132 | " print(f\"All employees together have a yearly net income of: {total_net_income}\")" 133 | ] 134 | } 135 | ], 136 | "metadata": { 137 | "kernelspec": { 138 | "display_name": "Python 3.9.7 64-bit", 139 | "language": "python", 140 | "name": "python3" 141 | }, 142 | "language_info": { 143 | "codemirror_mode": { 144 | "name": "ipython", 
145 | "version": 3 146 | }, 147 | "file_extension": ".py", 148 | "mimetype": "text/x-python", 149 | "name": "python", 150 | "nbconvert_exporter": "python", 151 | "pygments_lexer": "ipython3", 152 | "version": "3.9.7" 153 | }, 154 | "vscode": { 155 | "interpreter": { 156 | "hash": "e65133fff93ab36b53ddb68612dd6a95a21053ff6cb8524adc7f46061287cbaa" 157 | } 158 | } 159 | }, 160 | "nbformat": 4, 161 | "nbformat_minor": 4 162 | } 163 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SICSS Aachen-Graz - Python Crash Course 2 | 3 | Welcome to the Python crash course of the Summer Institute in Computational Social Science! 4 | 5 | In the first week, you will learn the basics of programming in Python: data types, data structures, branching, loops, and functions. In [`./tutorials`](./tutorials/), you'll find interactive notebooks explaining these concepts (for each day there is one notebook). They include links to video tutorials for those of you who prefer spoken explanations over reading. To practice the newly learned skills, check out the examples in [`./exercises`](./exercises/)! (Again one notebook for each day). 
6 | The table below gives you an overview of the week's content and the links to the tutorials and exercises: 7 | 8 | | Day | Tutorial | Exercise | 9 | | --- | --- | --- | 10 | | 1 | [Print, Comments, Strings and Numbers](./tutorials/tut01_print_comments_strings_numbers.ipynb) | [Exercise 1](./exercises/ex01_print_comments_strings_numbers.ipynb) | 11 | | 2 | [Datastructures and Loops Part 1](./tutorials/tut02_datastructures_loops.ipynb) | [Exercise 2](./exercises/ex02_datastructures_loops.ipynb) | 12 | | 3 | [Booleans, Conditions and Loops Part 2](./tutorials/tut03_booleans_branching_loops.ipynb) | [Exercise 3](./exercises/ex03_booleans_branching_loops.ipynb) | 13 | | 4 | [Functions](./tutorials/tut04_functions.ipynb) | [Exercise 4](./exercises/ex04_functions.ipynb) | 14 | | 5 | [File I/O](./tutorials/tut05_file_IO.ipynb) | [Exercise 5_1](./exercises/ex05_file_IO.ipynb) and [Exercise 5_2](./exercises/ex05_reading_error_messages.ipynb) | 15 | 16 | The second week covers working with tabular data, plotting, basic statistics, and networks. For those topics, we'll skip the explanatory notebooks and go straight for practical tasks (again found in [`./exercises`](./exercises/)). But don't worry, the task descriptions contain links to helpful videos and documentation pages :) 17 | The table below gives you an overview of the week's content and the links to the exercises: 18 | | Day | Topic | Exercise | 19 | | --- | --- | --- | 20 | | 1 | `pandas` Part 1 | [Exercise 6](./exercises/ex06_pandas_a.ipynb) | 21 | | 2 | `pandas` Part 2 | [Exercise 7](./exercises/ex07_pandas_b.ipynb) | 22 | | 3 | Plotting with `matplotlib` | [Exercise 8](./exercises/ex08_plotting.ipynb) | 23 | | 4 | Basic Statistics with `scipy` and `statsmodels` | [Exercise 9](./exercises/ex09_scipy_statsmodels.ipynb) | 24 | | 5 | Networks with `networkx` | [Exercise 10](./exercises/ex10_networkx.ipynb) | 25 | 26 | 27 | ## Setup 28 | There are many different environments in which you can work with Python. 
To simplify the setup, we encourage you to use Google Colab: https://colab.research.google.com/. It provides a notebook environment with all the required packages already installed. To get the content of this repository into Colab, download it (click the green "Code" button at the top right and select "Download ZIP"), extract the content, and upload it to your Google Drive. You should then be able to open the notebook files in Colab. 29 | 30 | _**Alternatively**_, you can follow these steps to set up a local Python environment: 31 | 32 | ### Install Python locally 33 | 34 | Do this only if you're not using Google Colab!! 35 | 36 | - Download the latest installer for your operating system here: https://www.python.org/downloads/ 37 | If you use Linux, execute the following commands instead (and skip the steps below) 38 | ``` 39 | sudo apt-get update 40 | sudo apt-get install python3.10 41 | ``` 42 | - Open the downloaded Python installer. **Make sure to check the "Add Python 3.10 to PATH" box!** 43 | - Click "Install Now". This may take a while. 44 | - To finish the installation, click "Close". 45 | - To check whether you successfully installed Python, open up a terminal window: 46 | - Windows: press the Windows key ![](http://i.stack.imgur.com/B8Zit.png), enter `cmd`, and hit enter. 47 | - MacOSX: press and enter `Terminal` 48 | - Linux: press the "Super key" and enter `Terminal` 49 | - Type `python --version` and hit enter. You should see something like `Python 3.10.4`. 50 | 51 | ### Install a notebook environment 52 | There are multiple notebook environments available. Below are two of the most commonly used: 53 | #### VS Code 54 | [Visual Studio Code](https://code.visualstudio.com/) is a general-purpose text editor. Its [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python) provides in-depth support for developing Python programs, including a notebook environment. For that to work, a _kernel_ is required. 
To install one, open a terminal and execute 55 | 56 | `python -m pip install ipykernel` 57 | 58 | Click [here](https://code.visualstudio.com/docs/datascience/jupyter-notebooks) for detailed instructions on working with notebooks in VS Code. 59 | 60 | #### JupyterLab 61 | [JupyterLab](https://jupyter.org/install) is an interactive notebook environment. You can open the `.ipynb` files provided for tutorials and exercises as well as other text-based files. 62 | To install, enter 63 | 64 | `python -m pip install jupyterlab` 65 | 66 | in the terminal. 67 | To run JupyterLab, execute 68 | 69 | `python -m jupyter lab`. 70 | 71 | A browser window should pop up with JupyterLab loading. Don't close the terminal while working in JupyterLab. Click [here](https://jupyterlab.readthedocs.io/en/latest/user/interface.html) for documentation of the JupyterLab interface and how to use it. 72 | 73 | #### Which one should I choose? 74 | JupyterLab is the simpler, more lightweight option. VS Code provides much better autocompletion and helps you find mistakes early - it might feel overwhelming at first, but if you plan on programming more, it'll be worth your effort! 75 | -------------------------------------------------------------------------------- /exercises_solutions/ex02_datastructures_loops_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Exercise 02: Data Structures & for-Loops" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Lists & Loops\n", 15 | "Based on the two lists of programming languages below, create a new list containing only the languages you don't know yet. Sort the resulting list alphabetically and print it." 
16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 2, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "['C#', 'C++', 'Fortran', 'Java', 'JavaScript', 'Lisp', 'Perl']\n", 28 | "['C#', 'C++', 'Fortran', 'Java', 'JavaScript', 'Lisp', 'Perl']\n" 29 | ] 30 | } 31 | ], 32 | "source": [ 33 | "programming_languages = [\n", 34 | " \"Python\", \n", 35 | " \"C\", \n", 36 | " \"Julia\", \n", 37 | " \"Java\", \n", 38 | " \"C#\", \n", 39 | " \"Fortran\", \n", 40 | " \"C++\", \n", 41 | " \"JavaScript\",\n", 42 | " \"Perl\",\n", 43 | " \"Lisp\",\n", 44 | "]\n", 45 | "\n", 46 | "languages_i_know = [\"Python\", \"Julia\", \"C\"]\n", 47 | "\n", 48 | "# your code goes here:\n", 49 | "unknown_programming_languages = programming_languages[:]\n", 50 | "for language in languages_i_know:\n", 51 | " unknown_programming_languages.remove(language)\n", 52 | "print(sorted(unknown_programming_languages))\n", 53 | "\n", 54 | "# If you already know \"if\":\n", 55 | "unknown_programming_languages = []\n", 56 | "for language in programming_languages:\n", 57 | " if language not in languages_i_know:\n", 58 | " unknown_programming_languages.append(language)\n", 59 | "print(sorted(unknown_programming_languages))\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Sets\n", 67 | "Solve the above example with sets instead of loops!" 
68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "['C#', 'C++', 'Fortran', 'Java', 'JavaScript', 'Lisp', 'Perl']\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "# your code goes here:\n", 85 | "unknown_programming_languages = set(programming_languages) - set(languages_i_know)\n", 86 | "print(sorted(unknown_programming_languages))" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "\n", 94 | "" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "## Data structure conversion\n", 102 | "Create a dictionary which maps product names to prices, using the `price_table` given below." 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "name": "stdout", 112 | "output_type": "stream", 113 | "text": [ 114 | "{'honey': 2.45, 'butter': 1.3, 'catfood': 4.85, 'tea': 1.5}\n", 115 | "{'honey': 2.45, 'butter': 1.3, 'catfood': 4.85, 'tea': 1.5}\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "price_table = [(\"honey\", 2.45), (\"butter\", 1.30), (\"catfood\", 4.85), (\"tea\", 1.50)]\n", 121 | "\n", 122 | "# your code goes here:\n", 123 | "products = {}\n", 124 | "\n", 125 | "for name, price in price_table:\n", 126 | " products[name] = price\n", 127 | "print(products)\n", 128 | "\n", 129 | "# Alternative:\n", 130 | "print(dict(price_table))" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Nested data structures\n", 138 | "The data below is given as a list of nested tuples, where each outer tuple is structured as follows:\n", 139 | "\n", 140 | "`(name, (price, amount))`\n", 141 | "\n", 142 | "Transform the data into a more explicit form, a dictionary where each key is a product name and each value is again a dictionary 
containing entries for `\"price\"` and `\"amount\"`. The result should look like this:\n", 143 | "```python\n", 144 | "{'honey': {'price': 2.45, 'amount': 10}, 'butter': {'price': 1.3, 'amount': 100}, 'catfood': {'price': 4.85, 'amount': 3}, 'tea': {'price': 1.5, 'amount': 123}}\n", 145 | "```" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 19, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "{'honey': {'price': 2.45, 'amount': 10}, 'butter': {'price': 1.3, 'amount': 100}, 'catfood': {'price': 4.85, 'amount': 3}, 'tea': {'price': 1.5, 'amount': 123}}\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "input_data = [\n", 163 | " (\"honey\", (2.45, 10)), \n", 164 | " (\"butter\", (1.30, 100)), \n", 165 | " (\"catfood\", (4.85, 3)), \n", 166 | " (\"tea\", (1.50, 123)),\n", 167 | "]\n", 168 | "\n", 169 | "# your code goes here:\n", 170 | "products = {}\n", 171 | "for name, (price, amount) in input_data:\n", 172 | " products[name] = {\n", 173 | " \"price\": price,\n", 174 | " \"amount\": amount,\n", 175 | " }\n", 176 | "print(products)" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "Finally, reduce the amount of butter by `20` and print the entire dictionary:" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 20, 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "{'honey': {'price': 2.45, 'amount': 10}, 'butter': {'price': 1.3, 'amount': 80}, 'catfood': {'price': 4.85, 'amount': 3}, 'tea': {'price': 1.5, 'amount': 123}}\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "# your code goes here:\n", 201 | "products[\"butter\"][\"amount\"] -= 20\n", 202 | "print(products)" 203 | ] 204 | } 205 | ], 206 | "metadata": { 207 | "kernelspec": { 208 | "display_name": "Python 3.10.4 64-bit", 209 | "language": 
"python", 210 | "name": "python3" 211 | }, 212 | "language_info": { 213 | "codemirror_mode": { 214 | "name": "ipython", 215 | "version": 3 216 | }, 217 | "file_extension": ".py", 218 | "mimetype": "text/x-python", 219 | "name": "python", 220 | "nbconvert_exporter": "python", 221 | "pygments_lexer": "ipython3", 222 | "version": "3.10.4" 223 | }, 224 | "vscode": { 225 | "interpreter": { 226 | "hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2" 227 | } 228 | } 229 | }, 230 | "nbformat": 4, 231 | "nbformat_minor": 4 232 | } 233 | -------------------------------------------------------------------------------- /exercises/ex06_pandas_a.ipynb: -------------------------------------------------------------------------------- 1 | {"cells":[{"cell_type":"markdown","metadata":{},"source":["# Exercise 06: Pandas A\n","\n","## Import\n","\n","Before we start with today's exercise, let us talk a little bit about **modules** and what they represent. Imagine you are working on a script which includes a variety of functions to solve common tasks, for instance: functions to perform mathematical matrix operations or functions to visualise huge amounts of data. Wouldn't it be convenient if we could use those functions in another Python script? Well....since we are too lazy to rewrite everything.... Yeah, it would!\n","\n","Modules - it's your time to shine🌞\n","Modules are nothing more than .py-files consisting of different kinds of components, e.g. functions, which can be made available in any other Python script using the `import`-statement. And yes, you can import your own Python scripts as well! Besides that, Python comes with an extensive collection of modules, known as the standard library. You can also include 3rd-party packages (numpy, pandas, scipy, matplotlib,...) 
but you will have to install them first. \n","\n","To keep the namespace clean (and your brain sane), let's have a look at how to use `import`.\n","### How to `import` everything from a module\n","\n","**Python-Syntax:** \n","```python\n","import module_name\n","```\n","\n","It doesn't get easier than that - after the import-statement follows the name of the module. After `import pandas`, for example, you can use all functions from the `pandas` module by prefixing their name with the namespace `pandas.` Usually, all import-statements are found at the top of the script to keep the code tidy and clear.\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["### How to `import` specific contents `from` a module\n","\n","**Python-Syntax:**\n","```python\n","from module_name import content_name1, content_name2, etc\n","``` \n","\n","Instead of importing everything from a module, we can extract specific contents, i.e. only the functions we really need. This allows us to use functions without the namespace-prefix. Keep in mind that multiple contents are separated with commas (`,`)."]},{"cell_type":"markdown","metadata":{},"source":["**Pitfall:**\n","```python\n","from statistics import mean\n","from numpy import mean\n","```\n","Always keep an eye on which elements you are importing from different modules. In our case, there are two imported functions with the same name (name-collision). Therefore Python always uses the last imported function with that name - in our case, the mean-function of the numpy module. 
The last import always wins!\n","\n"]},{"cell_type":"markdown","metadata":{},"source":["### How to `import` a module `as` you like\n","\n","**Python-Syntax:** \n","```python\n","import module_name as new_module_name_in_namespace\n","from module_name import component as new_component_name_in_namespace\n","```\n","\n","Modules and packages can be renamed on import to keep code more succinct. Most widely-used packages have an established abbreviation. Stick to it to make your code readable for others! For example, the established abbreviation for pandas is `pd`, so you would import it as:\n","```python\n","import pandas as pd\n","```"]},{"cell_type":"markdown","metadata":{},"source":["\n","[Pandas](https://pandas.pydata.org/docs/) is a Python package which provides data structures for working with tabular, labeled data (i.e. data in a table with rows and columns). It is a good tool for real-world data analysis in Python. In Google Colab, pandas is already installed. If you work locally, install it by executing `python -m pip install pandas` in a terminal.\n","\n","[Here](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS) is an extensive playlist covering all basic pandas operations. The skills required for this exercise are covered in Parts 1 to 6. Feel free to skip around, as the videos cover lots of details :)\n","\n","If you prefer a text-based tutorial, take a look at the [Getting started](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html) section of the pandas documentation. \n","\n","You can also just go ahead and try to solve the tasks without any tutorial - each time something new is required, a link to a Google Search is provided. Since programming usually requires lots and lots of googling and reading documentation or Stack Overflow, this might give you an idea of what to google and which sites are helpful ;)\n","\n","This exercise uses COVID-19 data from [Our World in Data](https://ourworldindata.org/). 
The cell below extracts part of that data from the source on [GitHub](https://github.com/owid/covid-19-data/) and stores it in .csv files in a directory \"data\" next to this notebook. Execute it to get the most recent data and have a look at the .csv files!\n"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from pathlib import Path\n","\n","import pandas as pd\n","\n","data_dir = Path(\"./data\")\n","data_dir.mkdir(parents=True, exist_ok=True)\n","vaccinations_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/vaccinations.csv\")\n","vaccinations_raw[['location', 'date', 'daily_vaccinations', 'people_fully_vaccinated']].to_csv(data_dir / \"vaccinations.csv\", index=False)\n","cases_deaths_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/jhu/full_data.csv\")\n","cases_deaths_raw[['location', 'date', 'new_cases', 'new_deaths']].to_csv(data_dir / \"cases_deaths.csv\", index=False)\n","locations_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/jhu/locations.csv\")\n","locations_raw[['location', 'continent', 'population']].dropna().to_csv(data_dir / \"locations.csv\", index=False)"]},{"cell_type":"markdown","metadata":{},"source":["Read the three .csv-files. 
Make sure to parse the \"date\" columns to a datetime type (check by viewing the `.dtypes` attribute).\n","\n","[Help!](https://www.google.com/search?q=pandas+read+csv) \n","[Help with dates!](https://www.google.com/search?q=pandas+csv+parse+date)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Access the rows containing the most recent vaccination data for Austria.\n","\n","[Help1](https://www.google.com/search?q=pandas+last+rows), [Help2](https://www.google.com/search?q=pandas+filter+rows)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Create a new dataframe which contains dates, locations, and new cases - but no information about deaths.\n","\n","[Help](https://www.google.com/search?q=pandas+remove+column) ([alternative](https://www.google.com/search?q=pandas+select+columns))"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Get the names of all locations starting with an \"E\"!\n","\n","[Help](https://www.google.com/search?q=pandas+string+starts+with) ([alternative](https://www.google.com/search?q=pandas+string+slice))\n"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["For each letter in the alphabet, print how many location names start with that letter.\n","\n","[Help](https://www.google.com/search?q=python+loop+over+alphabet)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Get the names of all locations with a population above 
200,000,000.\n","\n","[Help](https://www.google.com/search?q=pandas+select+larger+than)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Get the names of all locations with a population between 7,000,000 and 9,000,000.\n","\n","[Help](https://www.google.com/search?q=pandas+select+between)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Vaccinations, cases, and deaths are reported not only for individual countries, but also for groups of countries (e.g. continents). Create a new column named \"is_country\" in each of the dataframes based on whether the location is present in `locations.csv`\n","\n","[Help](https://www.google.com/search?q=pandas+select+if+in+list)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Get the _country_ with the highest number of vaccinations in a single day - continents and other country groups don't count!\n","\n","[Help](https://www.google.com/search?q=pandas+row+with+max+value+in+column)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Get the 10 least-populated locations.\n","\n","[Help](https://www.google.com/search?q=pandas+smallest+rows)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Find the unique continent names contained in the locations file.\n","\n","[Help](https://www.google.com/search?q=pandas+find+unique+values)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes 
here:\n"]},{"cell_type":"markdown","metadata":{},"source":["Count the number of locations associated with each continent.\n","\n","[Help](https://www.google.com/search?q=pandas+count+values)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# your code goes here:\n"]}],"metadata":{"kernelspec":{"display_name":"CSS kernel","language":"python","name":"css"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.0"},"orig_nbformat":4},"nbformat":4,"nbformat_minor":2} 2 | -------------------------------------------------------------------------------- /exercises_solutions/ex07_pandas_b_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Exercise 07: Pandas B\n", 8 | "Today's exercise continues on working with pandas DataFrames. If you prefer the [video tutorials](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS), parts 7 to 11 cover the skills needed (and quite a bit more!). Alternatively, check out the [Getting started tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html) or follow the \"Help\"-links to relevant Google searches ;)\n", 9 | "\n", 10 | "The required tables are the same as in Exercise 06, so you can just copy the code for reading them from there." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "# your code goes here:\n", 20 | "import pandas as pd\n", 21 | "vaccinations = pd.read_csv(\"./data/vaccinations.csv\", parse_dates=[\"date\"])\n", 22 | "cases_deaths = pd.read_csv(\"./data/cases_deaths.csv\", parse_dates=[\"date\"])\n", 23 | "locations = pd.read_csv(\"./data/locations.csv\")" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "For each location, get the total number of cases (i.e. the sum of the daily new cases).\n", 31 | "\n", 32 | "[Help](https://www.google.com/search?q=pandas+sum+column+by+group)" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "data": { 42 | "text/html": [ 43 | "
" 117 | ], 118 | "text/plain": [ 119 | " total_cases\n", 120 | "location \n", 121 | "Afghanistan 180355.0\n", 122 | "Africa 11867790.0\n", 123 | "Albania 276101.0\n", 124 | "Algeria 265884.0\n", 125 | "Andorra 42894.0\n", 126 | "... ...\n", 127 | "Wallis and Futuna 454.0\n", 128 | "World 528911736.0\n", 129 | "Yemen 11823.0\n", 130 | "Zambia 321779.0\n", 131 | "Zimbabwe 252404.0\n", 132 | "\n", 133 | "[229 rows x 1 columns]" 134 | ] 135 | }, 136 | "metadata": {}, 137 | "output_type": "display_data" 138 | } 139 | ], 140 | "source": [ 141 | "# your code goes here:\n", 142 | "total_cases = cases_deaths.groupby('location').agg(total_cases=('new_cases', sum))\n", 143 | "# Alternative (returns a Series instead of a DataFrame, so it would break the merge below): cases_deaths.groupby('location')[\"new_cases\"].sum()\n", 144 | "total_cases" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "Calculate the total number of cases per million inhabitants for all locations.\n", 152 | "\n", 153 | "[Help1](https://www.google.com/search?q=pandas+merge+dataframe), [Help2](https://www.google.com/search?q=pandas+divide+two+columns)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "\n", 163 | "# your code goes here:\n", 164 | "df = total_cases.merge(locations, left_index=True, right_on='location')\n", 165 | "df['cases_per_million'] = df['total_cases'] / df['population'] * 1000000\n", 166 | "df" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "Now sort the resulting dataframe in descending order by cases per million.\n", 174 | "\n", 175 | "[Help](https://www.google.com/search?q=pandas+sort+descending)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "# your code goes here:\n", 185 | "df = df.sort_values('cases_per_million', ascending=False)" 186 | ] 187 | }, 188 | { 
189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "Write the dataframe to a csv file, but include only location and cases_per_million.\n", 193 | "\n", 194 | "[Help1](https://www.google.com/search?q=pandas+write+csv), [Help2](https://www.google.com/search?q=pandas+write+csv+ignore+index)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": null, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "# your code goes here:\n", 204 | "df[['location', 'cases_per_million']].to_csv('cpm.csv', index=False)" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "Get the number of new vaccinations per continent in November 2021." 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/html": [ 222 | "
" 275 | ], 276 | "text/plain": [ 277 | " all_vaccs\n", 278 | "continent \n", 279 | "Africa 49661203.0\n", 280 | "Asia 652129741.0\n", 281 | "Europe 91966100.0\n", 282 | "North America 67906416.0\n", 283 | "Oceania 5061740.0\n", 284 | "South America 68982646.0" 285 | ] 286 | }, 287 | "metadata": {}, 288 | "output_type": "display_data" 289 | } 290 | ], 291 | "source": [ 292 | "# your code goes here:\n", 293 | "vacc_cont = vaccinations.merge(locations, on='location')\n", 294 | "vacc_nov = vacc_cont.loc[(\"2021-11-01\" <= vacc_cont[\"date\"]) & (vacc_cont[\"date\"] <= \"2021-11-30\"), :]\n", 295 | "vacc_nov.groupby(\"continent\").agg(all_vaccs=(\"daily_vaccinations\", sum))" 296 | ] 297 | } 298 | ], 299 | "metadata": { 300 | "language_info": { 301 | "name": "python" 302 | }, 303 | "orig_nbformat": 4 304 | }, 305 | "nbformat": 4, 306 | "nbformat_minor": 2 307 | } 308 | -------------------------------------------------------------------------------- /tutorials/tut05_file_IO.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python Crash Course 05 - File I/O" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## File I/O\n", 15 | "[Video tutorial (25 min)](https://www.youtube.com/watch?v=Uh2ebFW8OYM) \n", 16 | "[Library Reference](https://docs.python.org/3/library/functions.html#open)" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "### Reading and writing to Files\n", 24 | "Python's built-in function `open(...)` opens a file and returns a _file object_:\n", 25 | "\n", 26 | "```python\n", 27 | "open(filename, mode=\"r\")\n", 28 | "```\n", 29 | "The argument `filename` expects the path to the file (e.g. as a string). \n", 30 | "The optional argument `mode` expects the mode in which the file should be opened, it defaults to `r` (reading the file). 
The most important modes are: \n", 31 | "* `\"r\"` - Reading a file\n", 32 | "* `\"w\"` - Open for writing (deleting previous content)\n", 33 | "* `\"a\"` - Open for writing (appending to the end if the file exists)\n", 34 | "\n", 35 | "As mentioned before, `open()` returns a file object. You can use different methods on this file object, for example to read or write lines. The table below shows which methods work on a file object opened in a given mode ([here](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects) is some more information about the methods): \n", 36 | "\n", 37 | "Mode | Methods\n", 38 | "-------- | --------\n", 39 | "`\"r\"` | `.read()`, `.readlines()`\n", 40 | "`\"w\"` | `.write()`, `.writelines()`\n", 41 | "`\"a\"` | `.write()`, `.writelines()`\n" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# Just execute this cell; it creates a file with some content for the examples below\n", 51 | "with open(\"shoppinglist.txt\", \"w\") as datafile:\n", 52 | " datafile.write(\"noodles\\nbread\\nmilk\\ncheese\\napples\\n\")" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "This creates a file with the following content:\n", 60 | "```\n", 61 | "noodles\n", 62 | "bread\n", 63 | "milk\n", 64 | "cheese\n", 65 | "apples\n", 66 | "```" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "But what's the `with` here for? Have you ever tried to move a Word document or image file while it was open? Your operating system most likely told you that you'll have to close the file first. Since performing multiple different operations on a single file at the same time often leads to chaos, we have to _close_ files after opening them (and doing something with them). Closing a file signals to other software that the file is \"available\" now. 
The _context manager_ `with` takes care of opening **and closing** the file for us. The file will only stay open for the block of code indented below the `with` statement, and will be closed at the first dedented line. \n", 74 | "\n", 75 | "\n", 76 | "Let's see what happens if we want to write `\"rice\"` to the file, opened with mode `\"w\"` (open the file in a text editor to check what happens):" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "with open(\"shoppinglist.txt\", \"w\") as datafile:\n", 86 | " datafile.write(\"rice\\n\")" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "The file now contains:\n", 94 | "```\n", 95 | "rice\n", 96 | "```" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Didn't work out well, since `\"w\"` overwrites all content and writes the new content. The old list got deleted. 
\n", 104 | "Using `\"a\"` will do a better job:" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "with open(\"shoppinglist.txt\", \"w\") as datafile:\n", 114 | " datafile.write(\"noodles\\nbread\\nmilk\\ncheese\\napples\\n\")" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "with open(\"shoppinglist.txt\", \"a\") as datafile:\n", 124 | " datafile.write(\"rice\\n\")" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "The file now contains:\n", 132 | "```\n", 133 | "noodles\n", 134 | "bread\n", 135 | "milk\n", 136 | "cheese\n", 137 | "apples\n", 138 | "rice\n", 139 | "```" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "You can also write multiple lines to a file iteratively with `write()`:" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "# writing to a file\n", 156 | "data = [\"John\", \"Lisa\", \"Anna\", \"Bob\"]\n", 157 | "with open('some_new_file.txt', 'w') as some_file:\n", 158 | " for index, name in enumerate(data):\n", 159 | " line = f'Line number {index+1} Name: {name}\\n'\n", 160 | " some_file.write(line)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "Notice that you have to add the `\\n` at the end of each line; otherwise, everything gets written to one line. Try out what happens when you remove the `\\n` at the end." 
 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "You can also use `writelines()` to write multiple lines at once:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "data = [\"John\", \"Lisa\", \"Anna\", \"Bob\"]\n", 184 | "lines = []\n", 185 | "with open('some_new_file.txt', 'w') as some_file:\n", 186 | " for index, name in enumerate(data):\n", 187 | " lines.append(f'Line number {index+1} Name: {name}\\n')\n", 188 | " some_file.writelines(lines)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Notice that here you also need to add the `\\n` at the end of each string in the list." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Reading the whole file\n", 203 | "The `.read(size)` method reads some quantity of data and returns it as a string. `size` is an optional argument; if `size` is omitted or negative, the entire content of the file will be read and returned." 
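The optional `size` argument of `.read()` can be illustrated with a minimal sketch. The helper `read_in_chunks` and the file name `shoppinglist_demo.txt` are hypothetical names chosen for this example (they are not part of the tutorial), and the demo file is written to a temporary directory so that `shoppinglist.txt` stays untouched:

```python
import os
import tempfile


def read_in_chunks(path, size):
    """Read a text file piece by piece, at most `size` characters per call."""
    chunks = []
    with open(path, "r") as text_file:
        while True:
            chunk = text_file.read(size)
            if chunk == "":  # an empty string signals the end of the file
                break
            chunks.append(chunk)
    return chunks


# Hypothetical demo file in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "shoppinglist_demo.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("noodles\nbread\nmilk\n")

chunks = read_in_chunks(demo_path, 8)
print(chunks)
```

Joining the chunks again (`"".join(chunks)`) gives the same string that a single `.read()` call without `size` would return.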
204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "# read the whole file\n", 213 | "with open(\"shoppinglist.txt\", \"r\") as shoppinglist_file:\n", 214 | " content = shoppinglist_file.read()\n", 215 | "\n", 216 | "print(content)" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "# content -> this is a string with newline characters ('\\n')\n", 226 | "print(type(content))\n", 227 | "print(repr(content)) # printable representation of the given object" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "As you can see, the output of the method `.read()` is a string, every newline is represented by a `\\n`.\n", 235 | "\n", 236 | "You can also iterate over each line in the file. Each line is represented as a string" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "# Reading a file line by line (i.e. iterate over the file):\n", 246 | "with open('shoppinglist.txt', 'r') as shoppinglist_file:\n", 247 | " for line in shoppinglist_file:\n", 248 | " print(repr(line))\n", 249 | " # print(line)\n" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "`.readlines()` reads all lines from a file and returns them as a list. A newline character (`\\n`) is left at the end of every string in the list." 
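Since `.readlines()` keeps the trailing `\n` of every line, it is often convenient to strip it. The following is only a sketch of two common ways to do that; the file name `shoppinglist_demo.txt` is a hypothetical name, and the file is created in a temporary directory so the example does not depend on the cells above:

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "shoppinglist_demo.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("noodles\nbread\nmilk\n")

# readlines() keeps the newline character at the end of each string
with open(demo_path, "r") as demo_file:
    raw_lines = demo_file.readlines()

# Option 1: remove the trailing newline from each line
stripped_lines = [line.rstrip("\n") for line in raw_lines]

# Option 2: read everything and let str.splitlines() split at the newlines
with open(demo_path, "r") as demo_file:
    split_lines = demo_file.read().splitlines()

print(raw_lines)
print(stripped_lines)
print(split_lines)
```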
 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "# reading with readlines()\n", 266 | "with open(\"shoppinglist.txt\", \"r\") as shoppinglist_file:\n", 267 | " content = shoppinglist_file.readlines()\n", 268 | " print(content)\n", 269 | " print(\"Length of content is: \", len(content))" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "print(type(content[0]))" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "### Absolute Paths\n", 286 | "An _absolute path_ is the complete path to the file, starting at the root of the file system. Absolute paths usually work only on your own system, since you can never be certain that another user has the same directory tree as you. \n", 287 | "We recommend that you **avoid absolute paths whenever possible!** Also, hardcoding paths (absolute or relative) will most likely produce code others cannot use." 
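To see why an absolute path is tied to one machine, the sketch below uses the standard-library module `pathlib` to turn a path given relative to the current working directory into an absolute one. The folder name `data` is a hypothetical example, and `pathlib` is a suggestion of this sketch, not something the tutorial itself uses:

```python
from pathlib import Path

# A path without a leading root is interpreted relative to the
# current working directory ("data" is a hypothetical folder name).
relative = Path("data") / "demofile.py"

# Joining it with the current working directory gives the absolute path,
# which looks different on every machine.
absolute = (Path.cwd() / relative).resolve()

print(relative.is_absolute())
print(absolute.is_absolute())
print(absolute)
```

The printed absolute path contains the directory the code was started from, which is exactly the part that differs between users.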
 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "# just execute to create the Python file for demonstration\n", 297 | "with open(\"demofile.py\", \"w\") as datafile:\n", 298 | " datafile.write('print(\"Hello, World!\")')" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "# This will not work on your system because you most likely will not have this directory structure\n", 308 | "absolute_path = \"C:/absolute/path/to/this/demofile.py\"\n", 309 | "\n", 310 | "with open(absolute_path, \"r\") as text_file:\n", 311 | " data = text_file.read()\n", 312 | "print(data)\n", 313 | "#eval(data)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "So reading this file using an absolute path won't work on your PC..." 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "### Relative Paths\n", 328 | "Relative paths are paths that start from the current working directory, so you only need to spell out the part of the directory tree 'below' the current working directory.\n", 329 | "_Hint:_ you can still go 'up' the directory tree by putting as many `../` as necessary at the beginning of the path." 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "relative_path = \"demofile.py\"\n", 339 | "\n", 340 | "with open(relative_path, \"r\") as text_file:\n", 341 | " data = text_file.read()\n", 342 | "print(data)\n", 343 | "#eval(data)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "## Common Mistakes\n", 351 | "`UnsupportedOperation`: make sure you open your file with the correct mode. 
You cannot open a file with mode `\"r\"` and then write to it; you have to use `\"w\"` (or `\"a\"`) here. The same applies the other way around:" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "with open(\"shoppinglist.txt\", \"r\") as shoppinglist_file:\n", 361 | " shoppinglist_file.write(\"rice\\n\")" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "with open(\"shoppinglist.txt\", \"w\") as shoppinglist_file:\n", 371 | " print(shoppinglist_file.readlines())" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "Both errors can be fixed by using the appropriate mode: `\"w\"` for the first example and `\"r\"` for the second one." 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "`FileNotFoundError` is raised when no file is found at the given path. 
Make sure your paths point to an existing file and try not to use absolute paths." 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": {}, 392 | "outputs": [], 393 | "source": [ 394 | "nonexistent_path = \"Folder_1/demofile.py\"\n", 395 | "with open(nonexistent_path, \"r\") as text_file:\n", 396 | " data = text_file.read()\n", 397 | "print(data)\n", 398 | "#eval(data)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "## Best Practice\n", 406 | "\n", 407 | "### Use the `with`-statement\n", 408 | "While it is possible to manually open and close a file like this:\n", 409 | "##### _Don't_:\n", 410 | "```python\n", 411 | "file_to_read = open(\"some_file.txt\", \"r\")\n", 412 | "data = file_to_read.read()\n", 413 | "\n", 414 | "file_to_read.close()\n", 415 | "```\n", 416 | "\n", 417 | "it is easy to forget to close the file (or let the program crash before it reaches the call to `.close()`). To make sure the file is closed, always use `with`:\n", 418 | "\n", 419 | "##### _Do:_\n", 420 | "```python\n", 421 | "with open(\"some_file.txt\", \"r\") as file_to_read:\n", 422 | " data = file_to_read.read()\n", 423 | "```" 424 | ] 425 | } 426 | ], 427 | "metadata": { 428 | "interpreter": { 429 | "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" 430 | }, 431 | "kernelspec": { 432 | "display_name": "Python 3", 433 | "language": "python", 434 | "name": "python3" 435 | }, 436 | "language_info": { 437 | "codemirror_mode": { 438 | "name": "ipython", 439 | "version": 3 440 | }, 441 | "file_extension": ".py", 442 | "mimetype": "text/x-python", 443 | "name": "python", 444 | "nbconvert_exporter": "python", 445 | "pygments_lexer": "ipython3", 446 | "version": "3.9.7" 447 | } 448 | }, 449 | "nbformat": 4, 450 | "nbformat_minor": 4 451 | } 452 | -------------------------------------------------------------------------------- /exercises/ex09_scipy_statsmodels.ipynb: 
-------------------------------------------------------------------------------- 1 | {"cells":[{"cell_type":"markdown","id":"a92d7910","metadata":{},"source":["# Exercise 09: Statistics with `scipy` and `statsmodels`\n","The `scipy.stats` module contains a large number of statistical distributions, functions, and tests. For complete documentation of its features, see [here](http://docs.scipy.org/doc/scipy/reference/stats.html). \n","[This video tutorial (20min)](https://www.youtube.com/watch?v=CIbJSX-biu0) covers Hypothesis Testing and T-Tests with scipy.\n","\n","There is also a Python package for statistical modelling called `statsmodels`. For further information, see [here](https://www.statsmodels.org/stable/index.html). \n","[This video tutorial (19min)](https://www.youtube.com/watch?v=z_BXANUOjJY) covers Regression analysis with statsmodels.\n","\n","[This video tutorial (9min)](https://www.youtube.com/watch?v=ZR6bf8_s-hw) shows T-Tests with scipy and statsmodels.\n","\n","In Google Colab, scipy and statsmodels are already installed. If you work locally, install them by executing `python -m pip install scipy statsmodels` in a terminal. \n","\n","First we import all necessary libraries:"]},{"cell_type":"code","execution_count":null,"id":"8244b32e","metadata":{},"outputs":[],"source":["from scipy import stats\n","import statsmodels.api as sm\n","import matplotlib.pyplot as plt\n","import numpy as np\n","from sklearn import datasets\n","import pandas as pd"]},{"cell_type":"markdown","id":"d05b0138","metadata":{},"source":["### Distributions\n","Create a discrete random variable with a _Poisson distribution_. 
Use `stats.poisson` and set the parameter $\\mu$ to `3.5`: \n","[Help](https://www.google.com/search?q=scipy+random+poisson+distribution&oq=scipy+random+poisson+distribution&aqs=chrome..69i57.7064j0j1&sourceid=chrome&ie=UTF-8)"]},{"cell_type":"code","execution_count":null,"id":"70815df6","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"7ad8ff0c","metadata":{},"source":["Next, plot the probability mass function (PMF) and the cumulative distribution function (CDF). Use the methods `pmf` and `cdf` for this (k should be 0 to 20). Also plot a histogram of 1000 random variates, which you can draw with the method `rvs`. Since the distribution is discrete, you can use the `vlines` method from matplotlib to plot the results of calling `pmf` and `cdf`. \n","[Help1](https://www.google.com/search?q=python+plot+discrete+cdf&sxsrf=ALiCzsaIGFWol2YC7JQUFhHGKJAISCBgwA%3A1654333485047&ei=LSCbYsTKAqSWxc8PvuGM6Ak&oq=scipy+plot+cdf+disc&gs_lcp=Cgdnd3Mtd2l6EAMYADIGCAAQHhAWOgcIIxCwAxAnOgcIABBHELADOgQIIxAnSgQIQRgASgQIRhgAUIsHWJkRYLYZaAFwAXgAgAFciAGQA5IBATWYAQCgAQHIAQrAAQE&sclient=gws-wiz), [Help2](https://www.google.com/search?q=python+plot+discrete+pdf&sxsrf=ALiCzsZ2RneuUM80urle5OvBCbf0gw-dBA%3A1654333509044&ei=RSCbYuGhAsqVxc8PoJ2ViAM&ved=0ahUKEwih2JXJuJP4AhXKSvEDHaBOBTEQ4dUDCA8&uact=5&oq=python+plot+discrete+pdf&gs_lcp=Cgdnd3Mtd2l6EAM6BwgAEEcQsAM6BQgAEMsBSgQIQRgASgQIRhgAUK4HWIcJYO0JaAJwAHgAgAFbiAGvAZIBATKYAQCgAQHIAQjAAQE&sclient=gws-wiz), [Help3](https://www.google.com/search?q=python+plot+discrete+distribution&sxsrf=ALiCzsZGXdyjxjcmTgsTgfOfP-9nLMWHKQ%3A1654333556415&ei=dCCbYv35GI-Rxc8PvI6MoAY&oq=python+plot+discrete+dis&gs_lcp=Cgdnd3Mtd2l6EAMYADIFCAAQywE6BwgAEEcQsAM6BAgjECc6BggAEB4QFkoECEEYAEoECEYYAFDyUVj0W2D-YmgBcAF4AIABX4gBzgKSAQE0mAEAoAEByAEIwAEB&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"17526204","metadata":{},"outputs":[],"source":["# your code goes 
here:\n"]},{"cell_type":"markdown","id":"d4522107","metadata":{},"source":["Repeat the above task with a continuous normal distribution, using `stats.norm`. \n","[Help](https://www.google.com/search?q=python+scipy+normal+distribution&sxsrf=ALiCzsYCER9zC84yrMGMlZ8F8woXTU7EbQ%3A1654333579139&ei=iyCbYsuCCMCFxc8Pqo6fyAw&oq=python+scipy+normal+&gs_lcp=Cgdnd3Mtd2l6EAMYADIFCAAQywEyBQgAEMsBMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjoHCAAQRxCwAzoECCMQJzoECAAQQzoKCC4QgAQQhwIQFDoKCAAQgAQQhwIQFDoFCAAQgAQ6CAgAEB4QDxAWSgQIQRgASgQIRhgAUJMIWP0WYLEdaAFwAHgAgAFciAHqB5IBAjEzmAEAoAEByAEIwAEB&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"d5a3b988","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"9e8a25a5","metadata":{},"source":["Again, plot the `pdf`, the `cdf`, and a histogram of 1000 random realizations of the normal distribution. \n","[Help](https://stackoverflow.com/questions/10138085/how-to-plot-normal-distribution)"]},{"cell_type":"code","execution_count":null,"id":"27c141fc","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"3a0cd49b","metadata":{},"source":["Now calculate the mean, standard deviation, and the variance of the Poisson distribution and the normal distribution (use `mean`, `std`, and `var`). 
 \n","[Help1](https://www.google.com/search?q=python+scipy+mean&sxsrf=ALiCzsZQDy6uG19bIO4AZdJwvrQTmTpT7g%3A1654333688711&ei=-CCbYsaKK7GIxc8Pn9yqwAs&ved=0ahUKEwjG5uueuZP4AhUxRPEDHR-uCrgQ4dUDCA8&uact=5&oq=python+scipy+mean&gs_lcp=Cgdnd3Mtd2l6EAMyBQgAEMsBMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWOgcIABBHELADOgQIIxAnOgQIABBDOgUIABCABEoECEEYAEoECEYYAFDLBFiRC2DhD2gBcAB4AIABU4gBzQOSAQE2mAEAoAEByAEIwAEB&sclient=gws-wiz), [Help2](https://www.google.com/search?q=python+scipy+std&sxsrf=ALiCzsYPhyVkhn1mZNPwBKlkDAaQ8K6z9A%3A1654333710075&ei=DiGbYuiaBMeVxc8PoMWE8AE&ved=0ahUKEwjo2YOpuZP4AhXHSvEDHaAiAR4Q4dUDCA8&uact=5&oq=python+scipy+std&gs_lcp=Cgdnd3Mtd2l6EAMyBggAEB4QFjIGCAAQHhAWMggIABAeEA8QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIICAAQHhAPEBY6BwgAEEcQsAM6BQgAEMsBSgQIQRgASgQIRhgAUI4MWMoQYKURaAJwAXgAgAFRiAHlAZIBATOYAQCgAQHIAQjAAQE&sclient=gws-wiz), [Help3](https://www.google.com/search?q=python+scipy+var&sxsrf=ALiCzsY58SzkdbWTMwYPo3dcT15ox9z2XQ%3A1654333729609&ei=ISGbYpnWJPaSxc8PreSPsAg&ved=0ahUKEwjZ6quyuZP4AhV2SfEDHS3yA4YQ4dUDCA8&uact=5&oq=python+scipy+var&gs_lcp=Cgdnd3Mtd2l6EAMyBQgAEMsBMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWOgcIIxCwAxAnOgcIABBHELADSgQIQRgASgQIRhgAUJ8KWJYMYPIMaAJwAXgAgAFQiAGcAZIBATKYAQCgAQHIAQnAAQE&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"bceb66ce","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"1b29ac6b","metadata":{},"source":["### Statistical tests\n","Create two samples of size 1000 from the Poisson distribution with `rvs`. Test whether the two samples come from distributions with the same mean by performing a two-sided t-test with `stats.ttest_ind`. \n","\n","Can we say that the two distributions have the same mean? Take a look at the _p-value_: \n","- If the p-value is very large we cannot reject the hypothesis that the two sets of random data have equal means. 
 \n","- If the p-value is very small we can reject the hypothesis that the two sets of random data have equal means.\n","\n","[Help](https://www.google.com/search?q=python+scipy+ttest&sxsrf=ALiCzsYJwej8jbFhcS5sR7TSQUTaSP3VQA%3A1654333746555&ei=MiGbYv68IeeSxc8Pj-CnOA&ved=0ahUKEwj-nba6uZP4AhVnSfEDHQ_wCQcQ4dUDCA8&uact=5&oq=python+scipy+ttest&gs_lcp=Cgdnd3Mtd2l6EAMyBwgAEAoQywEyBggAEB4QFjIGCAAQHhAWMggIABAeEBYQCjIGCAAQHhAWMgYIABAeEBYyBggAEB4QFjIGCAAQHhAWMgYIABAeEBY6BwgAEEcQsAM6BQgAEMsBSgQIQRgASgQIRhgAUJ4DWJkJYJcKaAFwAXgAgAGBAYgB2QOSAQM0LjGYAQCgAQHIAQjAAQE&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"56d18bfc","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"bb77bb39","metadata":{},"source":["Repeat the same test with `statsmodels` by using `sm.stats.ttest_ind`: \n","[Help](https://www.google.com/search?q=python+statsmodels+ttest&sxsrf=ALiCzsbwFKlVSrGXMu1NhXh-UatsF2N0Yg%3A1654333761992&ei=QSGbYpeUPNiRxc8P8rOXyA8&ved=0ahUKEwjXuOTBuZP4AhXYSPEDHfLZBfkQ4dUDCA8&uact=5&oq=python+statsmodels+ttest&gs_lcp=Cgdnd3Mtd2l6EAMyBggAEB4QCjIECAAQHjIGCAAQHhAHOgcIABBHELADOgoIABBHELADEMkDOggIABAeEAcQCjoKCAAQHhAPEAcQCjoICAAQHhAPEAc6CAgAEB4QCBAHOgoIABAeEAgQBxAKOgcIABAKEMsBOgYIABAeEAg6CAgAEB4QCBAKOgoIABAeEAcQChATOggIABAeEAcQEzoECAAQEzoKCAAQHhAIEA0QEzoOCAAQHhAPEAgQDRAKEBNKBAhBGABKBAhGGABQhQRYkxBgoBpoAnABeACAAWSIAZkHkgEEMTAuMZgBAKABAcgBCMABAQ&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"520f5cfd","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"81c5327c","metadata":{},"source":["Use a one-sided t-test to check if the mean is _less_ than 3 and secondly check if the mean is _greater_ than 3. What are your findings? 
You can use the keyword arguments `alternative=\"less\"` and `alternative=\"greater\"`. \n","Again, look at the p-value: \n","- If the p-value is very large, we cannot reject the null hypothesis, i.e. the data give no evidence that the mean of the first sample is less/greater than the mean of the second sample. \n","- If the p-value is very small, we reject the null hypothesis in favor of the alternative that the mean of the first sample is less/greater than the mean of the second sample."]},{"cell_type":"code","execution_count":null,"id":"da22d5fc","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"e253f333","metadata":{},"source":["Repeat the same tests with `statsmodels` by using `sm.stats.ttest_ind`. Use the keyword arguments `alternative=\"smaller\"` and `alternative=\"larger\"`:"]},{"cell_type":"code","execution_count":null,"id":"bccbd720","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"5f603b04","metadata":{},"source":["Next check if the mean of a single sample of size 1000 from the normal distribution is `0.1` (the actual mean is `0.0`). This can be done with a one-sample t-test, using `stats.ttest_1samp`.\n","\n","Again looking at the `p-value`: \n","If the `p-value` is very large we cannot reject the hypothesis that the mean of the sample is equal to `0.1`. 
\n","If the `p-value` is very small we can reject the hypothesis that the mean of the sample is equal to `0.1`.\n","\n","[Help](https://www.google.com/search?q=python+scipy+ttest+one+sided+1samp&sxsrf=ALiCzsbgNwoN3IwKbyxce0mKA4LiBnIEEQ%3A1654333889519&ei=wSGbYqSrH_eOxc8Pkri--AY&ved=0ahUKEwjkj8z-uZP4AhV3R_EDHRKcD28Q4dUDCA8&uact=5&oq=python+scipy+ttest+one+sided+1samp&gs_lcp=Cgdnd3Mtd2l6EAMyBQghEKABMgUIIRCgAToHCCMQsAMQJzoHCAAQRxCwAzoHCCEQChCgAUoECEEYAEoECEYYAFCfH1jbMWCPM2gCcAF4AIABpgGIAYIFkgEDNC4ymAEAoAEByAEJwAEB&sclient=gws-wiz)"]},{"cell_type":"code","execution_count":null,"id":"40f8d499","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"f32ed7c4","metadata":{},"source":["### Linear Regression Models with `statsmodels`\n","Linear regression is a statistical model which examines the linear relationship between two (Simple Linear Regression) or more (Multiple Linear Regression) variables. Let's see how to actually use `statsmodels` for linear regression. First of all we load a dataset from `sklearn`. We use the californian housing dataset. Executing the cell below loads the dataset and gives a description about it. 
 \n"]},{"cell_type":"code","execution_count":null,"id":"58d7e1d2","metadata":{},"outputs":[],"source":["data = datasets.fetch_california_housing(as_frame=True)\n","print(data.DESCR)"]},{"cell_type":"markdown","id":"8a340635","metadata":{},"source":["First the dataset is converted to a pandas dataframe, and we create the feature matrix `X`, which holds all the features (independent variables) and the target vector `y`, which holds the targets to predict (dependent variable)."]},{"cell_type":"code","execution_count":null,"id":"4ecc1bbd","metadata":{},"outputs":[],"source":["X = pd.DataFrame(data.data, columns=data.feature_names)\n","y = pd.DataFrame(data.target, columns=[\"MedHouseVal\"])\n","dataframe = pd.DataFrame(data.frame) # needed for formula api"]},{"cell_type":"markdown","id":"a8c6c1b3","metadata":{},"source":["Now fit a linear regression model with the variable `\"MedInc\"` as the independent variable and `\"MedHouseVal\"` as the dependent variable. You can get a model with `sm.OLS(y,X).fit()`. Note that `sm.OLS` does not add an intercept on its own; use `sm.add_constant(X)` if the model should include one. _OLS_ stands for _Ordinary Least Squares_, a model which fits a regression line that minimizes the sum of the squared distances between the data points and the regression line. To get the predictions use the method `predict(X)`. To get the summary of the model use the method `summary`. \n","`summary` creates a long table with a lot of information. Here is how to interpret the most important parts of the table: \n","- The top of the table lists the dependent variable, the model, and the method. \n","- The degrees of freedom of the residuals and of the model (`Df Residuals` and `Df Model`): \"the number of values in the final calculation of a statistic that are free to vary.\"\n","- The coefficient of a variable (`coef`), e.g. of HouseAge, means that as that variable increases by 1 (with all other variables held fixed), the predicted value of the target changes by the value of the coefficient.\n","- `R-squared` is the proportion of the variance of the dependent variable that our model explains. 
\n","- `std err` is the standard error of each coefficient estimate, i.e. the standard deviation of its sampling distribution.\n","- The 95% confidence interval for the coefficient of HouseAge (`[0.025 0.975]`): we are 95% confident that the true coefficient of HouseAge lies between these two values. \n","\n","If you are more familiar with `R`, statsmodels also supports `R`-style formulas for OLS. For this look [here](https://www.statsmodels.org/devel/generated/statsmodels.formula.api.ols.html). Here you do not need to use `add_constant`, since the formula approach already adds the constant (shown as `Intercept`). \n","\n","[Help](https://www.google.com/search?q=python+statsmodels+ols&sxsrf=ALiCzsbpd-ZizRcZVzXmELEjQmmYWlaHeA%3A1654333964644&ei=DCKbYqLjJsCFxc8P8bOnoAs&ved=0ahUKEwiimbWiupP4AhXAQvEDHfHZCbQQ4dUDCA8&uact=5&oq=python+statsmodels+ols&gs_lcp=Cgdnd3Mtd2l6EAMyBQgAEMsBMgUIABDLATIFCAAQywEyBQgAEMsBMgUIABDLATIFCAAQywEyBQgAEMsBMgUIABDLATIFCAAQywEyBQgAEMsBOgQIIxAnOgQIABBDOgcILhDUAhBDOgcIABCxAxBDOgoILhCABBCHAhAUOgoIABCABBCHAhAUOgUIABCABDoGCAAQHhAWOgUIIRCgAUoECEEYAEoECEYYAFAAWO9LYKZOaARwAXgAgAF9iAG1D5IBBDI0LjKYAQCgAQHAAQE&sclient=gws-wiz) "]},{"cell_type":"code","execution_count":null,"id":"cf859c6a","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"334566e4","metadata":{},"source":["Now try to fit the model with more than one variable. Try using `\"HouseAge\"` and `\"AveRooms\"`. What is the difference?"]},{"cell_type":"code","execution_count":null,"id":"de02834f","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"b7197e28","metadata":{},"source":["Now use all independent variables for fitting. Is your model better than before? 
(Try to look at the `R-squared` value):"]},{"cell_type":"code","execution_count":null,"id":"2d1eb6eb","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"751487ba","metadata":{},"source":["Finally, compare the mean squared error (MSE) of the predictions of the three models (use `predict`). Which model performs best? Note that we use the same data for fitting and for prediction, so the models are not evaluated on unseen data:"]},{"cell_type":"code","execution_count":null,"id":"371e86a3","metadata":{},"outputs":[],"source":["def mse(y_true, y_pred):\n","    return np.mean((y_true - y_pred)**2)\n","\n","# your code goes here:\n"]},{"cell_type":"markdown","id":"7ddd938f","metadata":{},"source":["Plot the partial regression for the last model. For this you can use the `sm.graphics.plot_partregress_grid` method. \n","[Help](https://www.statsmodels.org/devel/generated/statsmodels.graphics.regressionplots.plot_partregress_grid.html)"]},{"cell_type":"code","execution_count":null,"id":"bcf518c6","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"f6188a12","metadata":{},"source":["Plot the component and component-plus-residual (CCPR) plots. For this you can use the `sm.graphics.plot_ccpr_grid` method. 
\n","[Help](https://www.statsmodels.org/dev/generated/statsmodels.graphics.regressionplots.plot_ccpr_grid.html)"]},{"cell_type":"code","execution_count":null,"id":"856b3dc7","metadata":{},"outputs":[],"source":["# your code goes here:\n"]}],"metadata":{"interpreter":{"hash":"46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"},"jupytext":{"notebook_metadata_filter":"all","text_representation":{"extension":".md","format_name":"myst","format_version":0.13,"jupytext_version":"1.11.1"}},"kernelspec":{"display_name":"Python 3.10.4 64-bit","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.4"},"source_map":[23,32,36,43,47,50,54,56,68,73,92,97,110,112,116,118,124,126,146,148,157,159,168,173,185,192,204,225,232,234,238,240,245,251,254,259,261,268,275,281,286,292,295,304,311,317,322,330,333,341,343,350,356,360,365,372,375,383,388,392,399,405,410,415,417,421,427,433,436,440,442,446,448,455,460,462,466,469,476,479,485,491,497,501,503,509,517,526,530,534,540,546,550,554]},"nbformat":4,"nbformat_minor":5} 2 | -------------------------------------------------------------------------------- /exercises_solutions/ex06_pandas_a_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Import\n", 8 | "\n", 9 | "Before we start with todays exercice, let us talk a little bit about **modules** and what they represent. Imagine you are working on a script which includes a variety of functions to solve common tasks, for instance: functions to perform mathematical matrix operations or functions to visualise huge amounts of data. Wouldn't it be convenient if we could use those functions in another python-script? 
Well....since we are too lazy to rewrite everything.... Yeah, it would!\n", 10 | "\n", 11 | "Modules - it's your time to shine🌞
\n", 12 | "Modules are nothing more than .py-files consisting of different kinds of components, e.g. functions, which can be made available in any other Python script using the `import` statement. And yes, you can import your own Python scripts as well! Besides that, Python comes with an extensive collection of modules, known as the standard library. You can also include 3rd-party packages (numpy, pandas, scipy, matplotlib,...) but you will have to install them first. \n", 13 | "\n", 14 | "To keep the namespace clean (and your brain sane), let's have a look at how to use `import`.\n", 15 | "### How to `import` everything from a module\n", 16 | "\n", 17 | "**Python-Syntax:** \n", 18 | "```python\n", 19 | "import module_name\n", 20 | "```\n", 21 | "\n", 22 | "It doesn't get easier than that - the `import` keyword is followed by the name of the module. For example, after `import pandas` you can use all functions from the `pandas` module by prefixing their names with the namespace `pandas.` Usually, all import statements are found at the top of the script to keep the code tidy and clear.\n", 23 | "\n" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### How to `import` specific contents `from` a module\n", 31 | "\n", 32 | "**Python-Syntax:**\n", 33 | "```python\n", 34 | "from module_name import content_name1, content_name2, etc\n", 35 | "``` \n", 36 | "\n", 37 | "Instead of importing everything from a module, we can extract specific contents, e.g. only the functions we really need. This allows us to use these functions without the namespace prefix. Keep in mind that multiple contents are separated with commas (`,`)." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "**Pitfall:**\n", 45 | "```python\n", 46 | "from statistics import mean\n", 47 | "from numpy import mean\n", 48 | "```\n", 49 | "Always keep an eye on which elements you are importing from different modules. 
In our case, there are two imported functions with the same name (a name collision). Python always uses the function that was imported last under that name - in our case, the `mean` function of the numpy module. The last import always wins!\n", 50 | "\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### How to `import` a module `as` you like\n", 58 | "\n", 59 | "**Python-Syntax:** \n", 60 | "```python\n", 61 | "import module_name as new_module_name_in_namespace\n", 62 | "from module_name import component as new_component_name_in_namespace\n", 63 | "```\n", 64 | "\n", 65 | "Modules and packages can be renamed on import to keep code more succinct. Most widely-used packages have an established abbreviation. Stick to it to make your code readable for others! For example, the established abbreviation for pandas is `pd`, so you would import it as:\n", 66 | "```python\n", 67 | "import pandas as pd\n", 68 | "```" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "# Exercise 06: Pandas A\n", 76 | "\n", 77 | "[Pandas](https://pandas.pydata.org/docs/) is a Python package which provides data structures for working with tabular, labeled data (i.e. data in a table with rows and columns). It is a good tool for real-world data analysis in Python. In Google Colab, pandas is already installed. If you work locally, install it by executing `python -m pip install pandas` in a terminal.\n", 78 | "\n", 79 | "[Here](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS) is an extensive playlist covering all basic pandas operations. The skills required for this exercise are covered in Parts 1 to 6. Feel free to skip around, as the videos cover lots of details :)\n", 80 | "\n", 81 | "If you prefer a text-based tutorial, take a look at the [Getting started](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html) section of the pandas documentation. 
\n", 82 | "\n", 83 | "You can also just go ahead and try to solve the tasks without any tutorial - each time something new is required, a link to a Google Search is provided. Since programming usually requires lots and lots of googling and reading documentation or Stack Overflow, this might give you an idea of what to google and which sites are helpful ;)\n", 84 | "\n", 85 | "This exercise uses COVID-19 data from [Our World in Data](https://ourworldindata.org/). The cell below extracts part of that data from the source on [GitHub](https://github.com/owid/covid-19-data/) and stores it in .csv files in a directory \"data\" next to this notebook. Execute it to get the most recent data and have a look at the .csv files!\n" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 3, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [ 94 | "from pathlib import Path\n", 95 | "\n", 96 | "import pandas as pd\n", 97 | "\n", 98 | "data_dir = Path(\"./data\")\n", 99 | "data_dir.mkdir(parents=True, exist_ok=True)\n", 100 | "vaccinations_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/vaccinations/vaccinations.csv\")\n", 101 | "vaccinations_raw[['location', 'date', 'daily_vaccinations', 'people_fully_vaccinated']].to_csv(data_dir / \"vaccinations.csv\", index=False)\n", 102 | "cases_deaths_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/jhu/full_data.csv\")\n", 103 | "cases_deaths_raw[['location', 'date', 'new_cases', 'new_deaths']].to_csv(data_dir / \"cases_deaths.csv\", index=False)\n", 104 | "locations_raw = pd.read_csv(\"https://github.com/owid/covid-19-data/raw/master/public/data/jhu/locations.csv\")\n", 105 | "locations_raw[['location', 'continent', 'population']].dropna().to_csv(data_dir / \"locations.csv\", index=False)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Read the three .csv-files. 
Make sure to parse the \"date\" columns to a datetime type (check by viewing the `.dtypes` attribute).\n", 113 | "\n", 114 | "[Help!](https://www.google.com/search?q=pandas+read+csv) \n", 115 | "[Help with dates!](https://www.google.com/search?q=pandas+csv+parse+date)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 122, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# your code goes here:\n", 125 | "vaccinations = pd.read_csv(\"./data/vaccinations.csv\", parse_dates=[\"date\"])\n", 126 | "cases_deaths = pd.read_csv(\"./data/cases_deaths.csv\", parse_dates=[\"date\"])\n", 127 | "locations = pd.read_csv(\"./data/locations.csv\")" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "Access the rows containing the most recent vaccination data for Austria.\n", 135 | "\n", 136 | "[Help1](https://www.google.com/search?q=pandas+last+rows), [Help2](https://www.google.com/search?q=pandas+filter+rows)" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 8, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/html": [ 147 | "
\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead><tr><th></th><th>location</th><th>date</th><th>daily_vaccinations</th><th>people_fully_vaccinated</th></tr></thead>\n", "<tbody>\n", "<tr><th>6531</th><td>Austria</td><td>2022-05-16</td><td>2668.0</td><td>NaN</td></tr>\n", "<tr><th>6532</th><td>Austria</td><td>2022-05-17</td><td>2376.0</td><td>NaN</td></tr>\n", "<tr><th>6533</th><td>Austria</td><td>2022-05-18</td><td>2084.0</td><td>NaN</td></tr>\n", "<tr><th>6534</th><td>Austria</td><td>2022-05-19</td><td>1792.0</td><td>NaN</td></tr>\n", "<tr><th>6535</th><td>Austria</td><td>2022-05-20</td><td>1500.0</td><td>6616365.0</td></tr>\n", "</tbody>\n", "</table>
" 210 | ], 211 | "text/plain": [ 212 | " location date daily_vaccinations people_fully_vaccinated\n", 213 | "6531 Austria 2022-05-16 2668.0 NaN\n", 214 | "6532 Austria 2022-05-17 2376.0 NaN\n", 215 | "6533 Austria 2022-05-18 2084.0 NaN\n", 216 | "6534 Austria 2022-05-19 1792.0 NaN\n", 217 | "6535 Austria 2022-05-20 1500.0 6616365.0" 218 | ] 219 | }, 220 | "execution_count": 8, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "# your code goes here:\n", 227 | "vaccinations.loc[vaccinations[\"location\"] == \"Austria\", ].tail()" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Create a new dataframe which contains dates, locations, and new cases - but no information about deaths.\n", 235 | "\n", 236 | "[Help](https://www.google.com/search?q=pandas+remove+column) ([alternative](https://www.google.com/search?q=pandas+select+columns))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "# your code goes here:\n", 246 | "cases = cases_deaths.drop(columns='new_deaths')\n", 247 | "cases = cases_deaths.loc[:, ['date', 'location', 'new_cases']]\n", 248 | "cases" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "Get the names of all locations starting with an \"E\"!\n", 256 | "\n", 257 | "[Help](https://www.google.com/search?q=pandas+string+starts+with) ([alternative](https://www.google.com/search?q=pandas+string+slice))\n" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "# your code goes here:\n", 267 | "locations.loc[locations['location'].str.startswith('E'), 'location']" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "For each letter in the alphabet, print how many 
location names start with that letter.\n", 275 | "\n", 276 | "[Help](https://www.google.com/search?q=python+loop+over+alphabet)" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "# your code goes here:\n", 286 | "for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':\n", 287 | " print(letter, locations['location'].str.startswith(letter).sum())" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "Get the names of all locations with a population above 200,000,000.\n", 295 | "\n", 296 | "[Help](https://www.google.com/search?q=pandas+select+larger+than)" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "# your code goes here:\n", 306 | "locations.loc[locations['population'] > 200_000_000, 'location']" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "Get the names of all locations with a population between 7,000,000 and 9,000,000.\n", 314 | "\n", 315 | "[Help](https://www.google.com/search?q=pandas+select+between)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "# your code goes here:\n", 325 | "locations.loc[(7_000_000 < locations['population']) & (locations['population'] < 9_000_000), 'location']" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "Vaccinations, cases, and deaths are reported not only for individual countries, but also for groups of countries (e.g. continents). 
Create a new column named \"is_country\" in each of the dataframes based on whether the location is present in `locations.csv`\n", 333 | "\n", 334 | "[Help](https://www.google.com/search?q=pandas+select+if+in+list)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "# your code goes here:\n", 344 | "for df in (cases, cases_deaths, vaccinations):\n", 345 | " df[\"is_country\"] = df[\"location\"].isin(locations[\"location\"])" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "Get the _country_ with the highest number of vaccinations in a single day - continents and other country groups don't count!\n", 353 | "\n", 354 | "[Help](https://www.google.com/search?q=pandas+row+with+max+value+in+column)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "# your code goes here:\n", 364 | "vaccinations.loc[vaccinations[vaccinations['is_country']]['daily_vaccinations'].idxmax()]" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "Get the 10 least-populated locations.\n", 372 | "\n", 373 | "[Help](https://www.google.com/search?q=pandas+smallest+rows)" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "# your code goes here:\n", 383 | "locations.nsmallest(10, 'population')" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "Find the unique continent names contained in the locations file.\n", 391 | "\n", 392 | "[Help](https://www.google.com/search?q=pandas+find+unique+values)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 130, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | 
"text/plain": [ 403 | "array(['Asia', 'Europe', 'Africa', 'North America', 'South America',\n", 404 | " 'Oceania'], dtype=object)" 405 | ] 406 | }, 407 | "execution_count": 130, 408 | "metadata": {}, 409 | "output_type": "execute_result" 410 | } 411 | ], 412 | "source": [ 413 | "# your code goes here:\n", 414 | "locations[\"continent\"].unique()" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "Count the number of locations associated with each continent.\n", 422 | "\n", 423 | "[Help](https://www.google.com/search?q=pandas+count+values)" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 109, 429 | "metadata": {}, 430 | "outputs": [ 431 | { 432 | "data": { 433 | "text/plain": [ 434 | "Africa 55\n", 435 | "Asia 49\n", 436 | "Europe 49\n", 437 | "North America 34\n", 438 | "Oceania 16\n", 439 | "South America 13\n", 440 | "Name: continent, dtype: int64" 441 | ] 442 | }, 443 | "execution_count": 109, 444 | "metadata": {}, 445 | "output_type": "execute_result" 446 | } 447 | ], 448 | "source": [ 449 | "# your code goes here:\n", 450 | "locations[\"continent\"].value_counts()" 451 | ] 452 | } 453 | ], 454 | "metadata": { 455 | "kernelspec": { 456 | "display_name": "CSS kernel", 457 | "language": "python", 458 | "name": "css" 459 | }, 460 | "language_info": { 461 | "codemirror_mode": { 462 | "name": "ipython", 463 | "version": 3 464 | }, 465 | "file_extension": ".py", 466 | "mimetype": "text/x-python", 467 | "name": "python", 468 | "nbconvert_exporter": "python", 469 | "pygments_lexer": "ipython3", 470 | "version": "3.10.0" 471 | }, 472 | "orig_nbformat": 4 473 | }, 474 | "nbformat": 4, 475 | "nbformat_minor": 2 476 | } 477 | -------------------------------------------------------------------------------- /tutorials/tut04_functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 
6 | "source": [ 7 | "# Python Crash Course 04 - Functions" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Functions \n", 15 | "\n", 16 | "[Video tutorial (22 min)](https://www.youtube.com/watch?v=9Os0o3wzS_I)\n", 17 | "\n", 18 | "A _function_ is a block of code which only runs when it is called. We can pass data to the function in the form of _arguments_. While it is possible to avoid functions when programming, they make our lives a lot easier since they allow...\n", 19 | "\n", 20 | "* ... reusing code snippets\n", 21 | "* ... better code structuring\n", 22 | "* ... changing code throughout a program without copy-paste\n", 23 | "\n", 24 | "We can use a function on similar but different input data to get the desired output without copying or rewriting a lot of code. For example, look at the following gif. There is a function (`add_one_side`) which adds one side to a geometric form. Applying the function to different geometric forms (the _input_) creates new forms, each with one added side (the _output_).\n", 25 | "\n", 26 | "\n", 27 | "\n", 28 | "(gif taken from [https://www.codecademy.com](https://www.codecademy.com/courses/learn-python-3/lessons/intro-to-functions/exercises/introduction))" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### Defining functions in Python\n", 36 | "Now that we know the basic concept of functions, let's have a look at how we can define and use them in Python. Actually, define or `def` is a good first keyword 😉\n", 37 | "\n", 38 | "```python\n", 39 | "# Use \"def\" to create new functions\n", 40 | "def function_name(arg1, arg2, ..., argN):\n", 41 | "    # do something\n", 42 | "    return something # optional!\n", 43 | "```\n", 44 | "OK, what does all of this mean? Every function definition in Python starts with the `def` keyword followed by the function name. \n", 45 | "As we said before, we want to pass data to a function. 
This is done by passing arguments. The expected arguments are defined in the parentheses directly after the function name. You can add as many arguments as you need, separated by commas. \n", 46 | "At the end of the line we need to add a colon! \n", 47 | "\n", 48 | "Now we can write code inside the function. Python needs to know which lines belong to the function and which do not, so everything inside the function needs to be indented (by convention, 4 spaces). \n", 49 | "At the end of the function we can return values using the `return` keyword followed by the value we want to return.\n", 50 | "\n", 51 | "Let's try it!" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "def hello_world():\n", 61 | "    print(\"Inside the function\")\n", 62 | "\n", 63 | "print(\"Outside the function\")" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "So we created a function `hello_world` with no expected arguments (notice the empty parentheses). The function prints `\"Inside the function\"` and does not have an explicit return value (implicitly, `None` is returned).\n", 71 | "\n", 72 | "But what happened here? Why did we only see the `\"Outside the function\"` string and not the `\"Inside the function\"` string? \n", 73 | "\n", 74 | "A function has to be _defined_ __and__ _executed_ to make something happen!\n", 75 | "\n", 76 | "So now, let's call the function. This is done by writing the function name followed by parentheses. If the function required arguments, we would also need to pass them here."
77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "hello_world()" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "#### Positional arguments\n", 93 | "As mentioned before, we can pass data to a function via arguments. The names between the parentheses in the function definition decide by which names the passed values will be known inside the function body. These names are so-called _local variables_, i.e. they are only valid _within the function_.\n", 94 | "\n", 95 | "Let's create a function with two parameters which prints the absolute distance between them. The passed values are assigned to `x` and `y` based on their order in the call, hence the name positional arguments (positional *->* the position matters)." 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "def absolute_distance(x, y):\n", 105 | "    print(abs(x - y))" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "absolute_distance(5, 7)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "As we said, we can reuse the function with different arguments:" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "absolute_distance(7, 5)\n", 131 | "absolute_distance(100, 200)\n", 132 | "absolute_distance(-10, 5)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "#### Keyword arguments\n", 140 | "While positional arguments' values are assigned implicitly based on their position, values can also be passed _explicitly_ by their name. 
For functions with many arguments, this makes it more clear which value belongs to which argument." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "33fb95e0", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "def say_hello(name, age):\n", 151 | " print(f\"Hi, my name is {name} and I'm {age} years old\")" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "id": "0b13280b", 157 | "metadata": {}, 158 | "source": [ 159 | "Argument assignment based on position, as before:" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "id": "89f107ae", 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "say_hello(\"Tim\", 27)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "id": "264f35e2", 175 | "metadata": {}, 176 | "source": [ 177 | "Argument assignment based on name (i.e. using \"keywords\"):" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": null, 183 | "id": "a90e8881", 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "say_hello(name=\"Tim\", age=27)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "98eb40a3", 193 | "metadata": {}, 194 | "source": [ 195 | "The order of keyword arguments can be changed:" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "id": "75735d68", 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "say_hello(age=27, name=\"Tim\")" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "id": "60bc7efb", 211 | "metadata": {}, 212 | "source": [ 213 | "You can use both positional arguments and keyword arguments in a single function call:" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "id": "669dc44f", 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "say_hello(\"Tim\", age=27)" 224 | ] 225 | }, 226 | { 227 | "cell_type": 
"markdown", 228 | "id": "ed99cf08", 229 | "metadata": {}, 230 | "source": [ 231 | "However, keyword arguments are only allowed _after_ positional ones:" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "id": "401e27c9", 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "say_hello(name=\"Tim\", 27)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "id": "ed2d8320", 247 | "metadata": {}, 248 | "source": [ 249 | "#### Default argument values\n", 250 | "Inside a function definition, default values can be given for (some of) its arguments. Such a function can then be called without passing all arguments:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "id": "a6aabd12", 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "def power(base, exponent=2):\n", 261 | " print(base**exponent)\n", 262 | " \n", 263 | "power(4)" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "id": "ecb5430d", 269 | "metadata": {}, 270 | "source": [ 271 | "If the argument `exponent` is not given, the default value `2` is used. 
Otherwise, the function uses the given value:" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "id": "7635754e", 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "power(4, 3)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "id": "d0a6ab21", 287 | "metadata": {}, 288 | "source": [ 289 | "Again, both the mandatory argument and the argument with a default value can be given with or without an explicit name:" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "id": "6c4c724d", 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "power(10, 3)\n", 300 | "power(10, exponent=3)\n", 301 | "power(base=10, exponent=3)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "id": "cfe58586", 307 | "metadata": {}, 308 | "source": [ 309 | "Arguments with default values have to be defined _after_ all mandatory arguments, so this is syntactically invalid:" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "id": "dd2b538e", 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "def power(exponent=2, base):\n", 320 | " print(base**exponent)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "### Return values\n", 328 | "Last but not least we talk about the return values. They are used to pass results from inside the function to the outside. 
Again, when no `return` statement is present, Python automatically returns `None`, which can be seen when looking at the return value of our first function:" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": { 335 | "scrolled": true 336 | }, 337 | "outputs": [], 338 | "source": [ 339 | "print(hello_world())" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "Now we create a function `sqrt` which calculates the square root of a given number and returns it:" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "def sqrt(x):\n", 356 | "# print(x**0.5)\n", 357 | "    return x**0.5" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": null, 363 | "metadata": {}, 364 | "outputs": [], 365 | "source": [ 366 | "print(f\"The square root of 7 is {sqrt(7)}\")\n" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "We can also return more than one value! 
This is done by creating a tuple of the desired values in the `return` statement and _unpacking_ them when assigning names to the function results:" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "def minmax(numbers):\n", 383 | "    return min(numbers), max(numbers)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": null, 389 | "metadata": {}, 390 | "outputs": [], 391 | "source": [ 392 | "nums = [52, 27, 10, 99, 83]\n", 393 | "\n", 394 | "smallest, largest = minmax(nums)\n", 395 | "print(smallest)\n", 396 | "print(largest)" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "### Type hints\n", 404 | "Since we always want to write code which is nice to read, we can tell others about the required types of a function's arguments and the type of its return value by adding _type hints_." 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "def a_very_useful_function(\n", 414 | "    first_param: int,\n", 415 | "    second_param: float,\n", 416 | "    default_param: list[float] = [1.2, 2.1],\n", 417 | ") -> str:\n", 418 | "    # do something\n", 419 | "\n", 420 | "    return \"some string as defined\"" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "Notice here: we tell the user that the first parameter *should* be an integer, the second *should* be a float and the default parameter *should* be a list of floats. The function returns a string. These type hints are, as the name suggests, only hints and don't prevent other types from being passed. 
For more information on what to include in your type hints, see the [`typing` documentation](https://docs.python.org/3/library/typing.html).\n", 428 | "\n", 429 | "We encourage you to always add type hints, as they help type checkers and editors catch mistakes early and improve your text editor's autocompletion." 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "### Docstrings\n", 437 | "Another way to help others read your code is to create a docstring for each of your functions. This is a text which describes the expected parameters, what the function does, and what the return value is.\n", 438 | "A docstring starts and ends with three quotation marks `\"\"\"`. The content is up to the programmer, but there are different style guides on how to structure the information inside a docstring. [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) is the most commonly used convention in the data science community, so we encourage you to use that, as shown in the example below:" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "id": "a2fc5b59", 445 | "metadata": {}, 446 | "outputs": [], 447 | "source": [ 448 | "\n", 449 | "def calculate_price(product: str, amount: int = 1) -> float:\n", 450 | "    \"\"\"\n", 451 | "    Calculate the total price of a purchase.\n", 452 | "\n", 453 | "    For now, the product range is rather small ;)\n", 454 | "\n", 455 | "    Parameters\n", 456 | "    ----------\n", 457 | "    product : str\n", 458 | "        The desired product.\n", 459 | "    amount : int, optional\n", 460 | "        The number of products to be bought, by default 1.\n", 461 | "\n", 462 | "    Returns\n", 463 | "    -------\n", 464 | "    float\n", 465 | "        The total price.\n", 466 | "    \"\"\"\n", 467 | "    products = {\"pizza\": 3.45, \"noodles\": 0.99}\n", 468 | "    return products[product] * amount" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "### Common Mistakes\n", 476 | "\n", 477 | "It is important that you 
**pass all required arguments when calling the function**, otherwise you get an error:" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "def subtract(x, y):\n", 487 | "    return x - y\n", 488 | "\n", 489 | "subtract(5)  # one argument missing -> TypeError" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "The same happens if you pass too many arguments:" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "subtract(5, 4, 3)  # one argument extra -> TypeError" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "Both errors above are fixed by passing the correct number of arguments:" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "subtract(5, 4)" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "Don't forget to indent the code inside a function (you should use 4 spaces for that):" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": {}, 535 | "outputs": [], 536 | "source": [ 537 | "def some_function():\n", 538 | "print(\"Some text 'inside' a function\")\n", 539 | "some_function()" 540 | ] 541 | }, 542 | { 543 | "cell_type": "markdown", 544 | "metadata": {}, 545 | "source": [ 546 | "Here the fix is to correctly indent the code inside the function:" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": null, 552 | "metadata": {}, 553 | "outputs": [], 554 | "source": [ 555 | "def some_function():\n", 556 | "    print(\"Some text inside a function\")\n", 557 | "some_function()" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "id": "c033c3e2", 563 | 
"metadata": {}, 564 | "source": [ 565 | "Default values for function arguments are created _just once_ and used on every function call. So using something mutable as a default argument leads to unwanted behaviour (most of the time):" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": null, 571 | "id": "409161d1", 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [ 575 | "def add_even_numbers_to_list(new_numbers: list, even_numbers: list[int] = []):\n", 576 | " for number in new_numbers:\n", 577 | " if number % 2 == 0:\n", 578 | " even_numbers.append(number)\n", 579 | " return even_numbers\n", 580 | "\n", 581 | "\n", 582 | "result = add_even_numbers_to_list([1, 2, 3, 4, 5])\n", 583 | "print(result)\n", 584 | "result = add_even_numbers_to_list([6, 7, 8])\n", 585 | "print(result)\n", 586 | "result = add_even_numbers_to_list([12, 13, 14], even_numbers=[8, 10])\n", 587 | "print(result)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "id": "58794233", 593 | "metadata": {}, 594 | "source": [ 595 | "In the second function call, the list specified as the default for the argument `even_numbers` already contained `2` and `4` from the first call. 
To avoid such a situation, write the function like this:" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": null, 601 | "id": "588c5e31", 602 | "metadata": {}, 603 | "outputs": [], 604 | "source": [ 605 | "def add_even_numbers_to_list(new_numbers: list, even_numbers: list[int] | None = None):\n", 606 | "    if even_numbers is None:\n", 607 | "        even_numbers = []\n", 608 | "    for number in new_numbers:\n", 609 | "        if number % 2 == 0:\n", 610 | "            even_numbers.append(number)\n", 611 | "    return even_numbers\n", 612 | "\n", 613 | "\n", 614 | "result = add_even_numbers_to_list([1, 2, 3, 4, 5])\n", 615 | "print(result)\n", 616 | "result = add_even_numbers_to_list([6, 7, 8])\n", 617 | "print(result)\n", 618 | "result = add_even_numbers_to_list([12, 13, 14], even_numbers=[8, 10])\n", 619 | "print(result)" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "## Best Practice\n", 627 | "\n", 628 | "### Global variables\n", 629 | "If, inside a function, a variable name is accessed which is neither an argument nor defined inside the function, Python will look _outside the function_ for this name. This can lead to hard-to-find bugs. Try to make your functions self-contained, i.e. 
let them only use variables which are passed as arguments!\n", 630 | "\n", 631 | "##### _Don't:_\n", 632 | "```python\n", 633 | "a = 5\n", 634 | "\n", 635 | "def add_number(b: int) -> int:\n", 636 | "    return a + b\n", 637 | "```\n", 638 | "Here `a` is defined outside the function. The function relies on `a` being defined somewhere outside; if this is not the case, the function does not work.\n", 639 | "\n", 640 | "##### _Do:_\n", 641 | "```python\n", 642 | "def add_number(a: int, b: int) -> int:\n", 643 | "    return a + b\n", 644 | "```\n", 645 | "Adding `a` as a positional argument is the way to go here, so the function does not rely on variables from outside.\n" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "### Exit function early\n", 653 | "\n", 654 | "When your function has multiple exit points you should always exit as soon as possible.\n", 655 | "##### _Don't:_\n", 656 | "```python\n", 657 | "def root(number: float, degree: int = 2) -> float | None:\n", 658 | "    if not isinstance(number, float):\n", 659 | "        print(f\"'number' must be of type float but is: {type(number)}\")\n", 660 | "    else:\n", 661 | "        return number**(1/degree)\n", 662 | "```\n", 663 | "Code can get messy very quickly when using a lot of `if/elif/else` statements, so try to reduce them by exiting the function early.\n", 664 | "\n", 665 | "##### _Do:_\n", 666 | "```python\n", 667 | "def root(number: float, degree: int = 2) -> float | None:\n", 668 | "    if not isinstance(number, float):\n", 669 | "        print(f\"'number' must be of type float but is: {type(number)}\")\n", 670 | "        return None\n", 671 | "    return number**(1/degree)\n", 672 | "```\n", 673 | "You don't even need an `else` block here anymore: if `number` is not a float, the function returns and never reaches the code after the `if` block.\n" 674 | ] 675 | } 676 | ], 677 | "metadata": { 678 | "interpreter": { 679 | "hash": "9876a56b36d3e86ed839f942802fea42f90ec2f0ee3c4ea82631e635694a47fe" 
680 | }, 681 | "kernelspec": { 682 | "display_name": "Python 3.10.0 64-bit", 683 | "language": "python", 684 | "name": "python3" 685 | }, 686 | "language_info": { 687 | "codemirror_mode": { 688 | "name": "ipython", 689 | "version": 3 690 | }, 691 | "file_extension": ".py", 692 | "mimetype": "text/x-python", 693 | "name": "python", 694 | "nbconvert_exporter": "python", 695 | "pygments_lexer": "ipython3", 696 | "version": "3.10.0" 697 | } 698 | }, 699 | "nbformat": 4, 700 | "nbformat_minor": 5 701 | } 702 | -------------------------------------------------------------------------------- /tutorials/tut01_print_comments_strings_numbers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "c2073762", 6 | "metadata": {}, 7 | "source": [ 8 | "# Python Crash Course 01 - Print, Comments, Strings, and Numbers" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "51e89896", 14 | "metadata": {}, 15 | "source": [ 16 | "## The notebook environment\n", 17 | "\n", 18 | "Notebook environments consist of two types of cells:\n", 19 | "- Markdown cells (like this one), for writing formatted text\n", 20 | "- Code cells, for writing and executing Python source code\n", 21 | "\n", 22 | "Also, there are two _modes_ in a notebook environment:\n", 23 | "- Edit mode: for changing the content of a cell\n", 24 | "- Command mode: for creating, moving, or deleting one or multiple cells\n", 25 | "\n", 26 | "_Edit mode_ can be entered in different ways:\n", 27 | "- Double-click on a markdown cell\n", 28 | "- Single-click in a code cell\n", 29 | "- Hit Enter when any cell is selected in command mode\n", 30 | "\n", 31 | "In _edit mode_, the usual text-editing shortcuts work:\n", 32 | "- Ctrl+c to copy selected text\n", 33 | "- Ctrl+v to paste something\n", 34 | "- Tab for text completion (if your notebook environment supports it)\n", 35 | "\n", 36 | "\n", 37 | "To switch to _command mode_, hit 
Esc. Below you'll find some of the most commonly used shortcuts. In Colab, you'll have to press Ctrl+m before entering the shortcut; in VS Code and JupyterLab they work directly.\n", 38 | "- `a`: create a new cell _above_ the currently selected one\n", 39 | "- `b`: create a new cell _below_ the currently selected one\n", 40 | "- `c`: copy the selected cell(s)\n", 41 | "- `v`: paste the copied cell(s) below the selected one\n", 42 | "- `dd`: delete the selected cell(s) (yes, that's hitting d twice)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "id": "d29f1e24", 48 | "metadata": {}, 49 | "source": [ 50 | "## Print\n", 51 | "The function `print` allows you to show something to the user. The message to be printed needs to be surrounded by quotes (`\"` or `'`). The printed words that appear as a result of the `print` function are referred to as _output_. \n", 52 | "\n", 53 | "What you see below is a _code cell_. Place the cursor in the cell (i.e. click on it) and press Shift+Enter to execute it. \n", 54 | "\n", 55 | "Note: If you double-click this text, its \"source code\" will be shown. Esc gets you back to the rendered version.\n" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "id": "8bfbfef3", 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "print(\"Hello beautiful people\")" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "id": "09c6d7f7", 71 | "metadata": {}, 72 | "source": [ 73 | "By default, the `print` function also prints a _newline_ (`\\n`) at the end of the string, so the next print starts on a new line:\n" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "id": "6503fd8f", 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "print(\"Hello\")\n", 84 | "print(\"Goodbye\")" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "id": "51c9dd61", 90 | "metadata": {}, 91 | "source": [ 92 | "You can also print the contents of variables:" 93 | ] 94 | }, 95 | { 96 | "cell_type": 
"code", 97 | "execution_count": null, 98 | "id": "2dcacc75", 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "saying_hello = \"Hello there!\"\n", 103 | "saying_goodbye = \"Hasta la vista!\"\n", 104 | "\n", 105 | "print(saying_hello)\n", 106 | "print(saying_goodbye)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "id": "258db1b4", 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "saying_hello" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "id": "41d9af38", 122 | "metadata": {}, 123 | "source": [ 124 | "What happened here? \n", 125 | "__In a notebook environment, the result of the last expression is always displayed!__ \n", 126 | "This is _not_ normal Python behaviour. In order to show output in a terminal (or show intermediate output in a notebook environment), `print` is required." 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "id": "7260d005", 132 | "metadata": {}, 133 | "source": [ 134 | "## Comments\n", 135 | "_Comments_ allow you to tell the Python interpreter to _ignore_ something. This is useful in two situations:\n", 136 | "\n", 137 | "- providing additional plain text information to someone reading the code\n", 138 | "- trying different implementations without needing to actually (re)move code\n", 139 | "\n", 140 | "In Python, comments start with a number sign (pound or hash sign): `#`. They can be placed in their own line or next to some code:" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "58af2a2e", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# This is the first time we're using comments\n", 151 | "\n", 152 | "print(\"Commenting is great!\") # commenting should be encouraged" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "id": "fcd40215", 158 | "metadata": {}, 159 | "source": [ 160 | "If you comment out a line of code, it won't be executed. 
\n", 161 | "\n", 162 | "Try to comment out the first line and uncomment the second one!" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "id": "8eb55fa7", 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "print(\"Hello\")\n", 173 | "# print(\"Goodbye\")" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "id": "45a6fb7a", 179 | "metadata": {}, 180 | "source": [ 181 | "Most programming environments have keyboard shortcuts to comment and uncomment a line or block of code (so you don't have to add/remove the `#` in each individual line). Most commonly, those are Ctrl+/ or Ctrl+#, depending on your keyboard layout. You can try with the block of print statements below (select multiple lines and press the keys). If neither of the two works, take a look at the shortcut list (in Colab, go to \"Tools\" -> \"Keyboard shortcuts\")." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "id": "567f5ff4", 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "print(\"So\")\n", 192 | "print(\"many\")\n", 193 | "print(\"lines\")\n", 194 | "print(\"to\")\n", 195 | "print(\"(un-)\")\n", 196 | "print(\"comment\")" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "id": "294c984f", 202 | "metadata": {}, 203 | "source": [ 204 | "## Best practices\n", 205 | "- Code itself exactly states what the program will do. Therefore, instead of explaining in plain text _what_ the code does (which will often be less specific), try to explain _why_ a part of the code is needed.\n", 206 | "- Sometimes someone will read the code in the future and think \"oh, there should be a much easier way to do this\". This someone will often be you, and the \"easy way\" may turn out to be difficult or not work at all. So, if you tried multiple approaches to a problem before you arrived at a solution, you might want to state why the others didn't work. 
\n", 207 | "- Add a single space between the `#` and the comment text for readability.\n", 208 | "- You'll find more best practices [here](https://pep8.org/#comments).\n" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "id": "a2817e85", 214 | "metadata": {}, 215 | "source": [ 216 | "## Strings\n", 217 | "- [Video tutorial (21 min)](https://www.youtube.com/watch?v=k9TUPpGqYTo)\n", 218 | "- [Library reference](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)\n", 219 | "\n", 220 | "Strings hold textual data. A string in Python is created with double quotes (`\"`) or single quotes (`'`); the opening and closing quote must be of the same kind. Each string can be seen as a sequence of characters.\n", 221 | "You can do a lot with strings, and since Python is a cool programming language it provides a big set of string methods to use. Here are some of the most commonly used (you'll find a lot more in the [library reference](https://docs.python.org/3/library/stdtypes.html#string-methods)):\n", 222 | "\n", 223 | "| Method | Action |\n", 224 | "|-----------------|-------------------------------------------------------|\n", 225 | "| `len(string)` | gives the length of the string (number of characters) |\n", 226 | "| `string[4]` | gives the character at index 4, i.e. the 5th character of the string (counting starts at 0!) |\n", 227 | "| `str.join(iterable)` | joins the elements of the iterable into one string. 
`str` is the delimiter placed between the elements |\n", 228 | "| `str.split(sep=None, maxsplit=-1)`| splits the string at the separator |\n", 229 | "| `str.find(sub[, start[, end]])` | finds the substring `sub` in `str` and returns its index |\n", 230 | "| `str.replace(old, new[, count])`| replaces `old` with `new` in `str` |" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "id": "a506c5cb", 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "string_1 = \"Hello\"\n", 241 | "string_2 = 'World'\n", 242 | "\n", 243 | "type(string_1)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "id": "effa93c5", 249 | "metadata": {}, 250 | "source": [ 251 | "What? Math on strings? This is getting crazy. \n", 252 | "But actually it makes sense: \n", 253 | "Strings can be concatenated (\"put together\") using `+`:" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "id": "501753c6", 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "string_1 + \" \" + string_2\n", 264 | "# string_2 + \" \" + string_1" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "id": "dc090cf7", 270 | "metadata": {}, 271 | "source": [ 272 | "If you want to repeat a string multiple times you can use `*`:" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "id": "27db3530", 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "string_1 * 10" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "id": "0e18440d", 288 | "metadata": {}, 289 | "source": [ 290 | "To get a single character of the string you can use square brackets. Be careful, we programmers always start with 0 when counting! 
\n", 291 | "For example, the string `hello_world = \"Hello World\"` has the following indexing: \n", 292 | "```python\n", 293 | "hello_world[0] = 'H'\n", 294 | "hello_world[1] = 'e'\n", 295 | "hello_world[2] = 'l'\n", 296 | "hello_world[3] = 'l'\n", 297 | "hello_world[4] = 'o'\n", 298 | "hello_world[5] = ' '\n", 299 | "hello_world[6] = 'W'\n", 300 | "hello_world[7] = 'o'\n", 301 | "hello_world[8] = 'r'\n", 302 | "hello_world[9] = 'l'\n", 303 | "hello_world[10] = 'd'\n", 304 | "```\n", 305 | "\n", 306 | "You can also get a substring from the string by using slicing. It always follows the same pattern: \n", 307 | "```python\n", 308 | "str[start:end:stepsize]\n", 309 | "```\n", 310 | "If there's only one colon (`:`), the step size defaults to 1. \n", 311 | "Notice that `start` is always _inclusive_ while `end` is _exclusive_:\n", 312 | "\n", 313 | "```python\n", 314 | "hello_world[0:4] = 'Hell'\n", 315 | "hello_world[0:4:2] = 'Hl'\n", 316 | "hello_world[0:-3] = 'Hello Wo'\n", 317 | "hello_world[0:-3:3] = 'HlW'\n", 318 | "hello_world[::] = 'Hello World'\n", 319 | "hello_world[::-1] = 'dlroW olleH'\n", 320 | "```" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "id": "167b6bb6", 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "hello_world = \"Hello World\"\n", 331 | "hello_world[:]" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "id": "b4cde4c4", 337 | "metadata": { 338 | "tags": [] 339 | }, 340 | "source": [ 341 | "Now let's also take a look at the string methods we mentioned earlier:" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "id": "0c3bc861", 348 | "metadata": { 349 | "tags": [] 350 | }, 351 | "outputs": [], 352 | "source": [ 353 | "string = \"I am a string\"" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "id": "efd5d4b4", 360 | "metadata": { 361 | "tags": [] 362 | }, 363 | "outputs": [], 364 | "source": [ 365 | "len(string)" 
366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "id": "68b4b01d", 372 | "metadata": { 373 | "tags": [] 374 | }, 375 | "outputs": [], 376 | "source": [ 377 | "splitted_string = string.split(\"a\")\n", 378 | "splitted_string" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "id": "86a02e01", 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "\"_hallo_\".join(splitted_string)" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "id": "f6a13e38", 395 | "metadata": { 396 | "tags": [] 397 | }, 398 | "outputs": [], 399 | "source": [ 400 | "string.find(\"z\")  # returns the index. What happens if the substring does not exist?" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "id": "6cb8a65b", 406 | "metadata": { 407 | "tags": [] 408 | }, 409 | "source": [ 410 | "## Numbers\n", 411 | "- [Video tutorial (12 min)](https://www.youtube.com/watch?v=khKv-8q7YmY)\n", 412 | "- [Library reference](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex)\n" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "id": "127cce8a", 418 | "metadata": { 419 | "tags": [] 420 | }, 421 | "source": [ 422 | "#### Integers:\n", 423 | "An integer is a data type which holds whole numbers. The numbers can be negative or positive. In Python, integers have unlimited precision."
424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": null, 429 | "id": "ebaa533d", 430 | "metadata": { 431 | "tags": [] 432 | }, 433 | "outputs": [], 434 | "source": [ 435 | "an_integer = 5\n", 436 | "type(an_integer)  # the type is int" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "id": "b1c0d4f1", 443 | "metadata": { 444 | "tags": [] 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "an_integer" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "id": "6ef29e97", 454 | "metadata": { 455 | "tags": [] 456 | }, 457 | "source": [ 458 | "#### Float:\n", 459 | "A float is a data type which holds numbers with decimal places. The numbers can be negative or positive. In Python, floats have limited precision, which depends on the system (typically about 15 to 17 significant decimal digits)." 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "id": "7ae040a3", 466 | "metadata": { 467 | "tags": [] 468 | }, 469 | "outputs": [], 470 | "source": [ 471 | "a_float = 1.45\n", 472 | "type(a_float)  # the type is float" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "id": "4b8ea3a6", 479 | "metadata": { 480 | "tags": [] 481 | }, 482 | "outputs": [], 483 | "source": [ 484 | "a_float" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "id": "d6322fca", 490 | "metadata": {}, 491 | "source": [ 492 | "## Conversion\n", 493 | "Floats can be turned into integers (this cuts off the decimal places, no rounding is performed!):" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "id": "54d5c865", 500 | "metadata": { 501 | "tags": [] 502 | }, 503 | "outputs": [], 504 | "source": [ 505 | "int(1.45)" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "id": "148a58a3", 512 | "metadata": {}, 513 | "outputs": [], 514 | "source": [ 515 | "int(9.99)" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 
520 | "id": "306c7dde", 521 | "metadata": {}, 522 | "source": [ 523 | "This works the other way around as well:" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "id": "7d3ef387", 530 | "metadata": {}, 531 | "outputs": [], 532 | "source": [ 533 | "float(2)" 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "id": "9f547a65", 539 | "metadata": {}, 540 | "source": [ 541 | "Integers and floats can also be created from strings containing only digits (and up to 1 decimal point in case of floats):" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "id": "9554f454", 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "float(\"1.234\")" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": null, 557 | "id": "7678f8cb", 558 | "metadata": { 559 | "tags": [] 560 | }, 561 | "outputs": [], 562 | "source": [ 563 | "int(\"444\")" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "id": "8b4b7ce0", 570 | "metadata": { 571 | "tags": [] 572 | }, 573 | "outputs": [], 574 | "source": [ 575 | "int(\"1.234\")" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "id": "bac4b3ac", 581 | "metadata": {}, 582 | "source": [ 583 | "Again, the other way around works too (the quotes tell us the results are strings):" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": null, 589 | "id": "c482377a", 590 | "metadata": {}, 591 | "outputs": [], 592 | "source": [ 593 | "str(12.345)" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "id": "6ec0e93a", 600 | "metadata": {}, 601 | "outputs": [], 602 | "source": [ 603 | "str(81926379182793812)" 604 | ] 605 | }, 606 | { 607 | "cell_type": "markdown", 608 | "id": "11738901", 609 | "metadata": { 610 | "tags": [] 611 | }, 612 | "source": [ 613 | "## Math\n", 614 | "With the now newly learned numbers you can also perform 
math. The following operations are available in Python (with floats and integers):\n", 615 | "\n", 616 | "| Syntax | Action | Notes |\n", 617 | "|-------------------|-----------------------------------------------------------------------------|--------|\n", 618 | "| `x + y` | sum of x and y | |\n", 619 | "| `x - y` | difference of x and y | |\n", 620 | "| `x * y` | product of x and y | |\n", 621 | "| `x / y` | quotient of x and y | |\n", 622 | "| `x // y` | floored quotient of x and y | (1) |\n", 623 | "| `x % y` | remainder of x / y (modulo) | |\n", 624 | "| `-x` | x negated | |\n", 625 | "| `x**y` | x to the power y | (2), (3) |\n", 626 | "| [`abs(x)`](https://docs.python.org/3/library/functions.html#abs) | absolute value of x | |\n", 627 | "| [`round(x)`](https://docs.python.org/3/library/functions.html#round) | round x to the nearest integer | |\n", 628 | "| [`round(x, n)`](https://docs.python.org/3/library/functions.html#round) | round x to n decimal places | |\n", 629 | "| [`divmod(x, y)`](https://docs.python.org/3/library/functions.html#divmod) | the pair (x // y, x % y) | |\n", 630 | "\n", 631 | "Notes:\n", 632 | "\n", 633 | "(1) Also referred to as integer division. The resulting value is a whole number, though the result’s type is not necessarily int. 
The result is always rounded towards negative infinity: `1//2` is `0`, `(-1)//2` is `-1`, `1//(-2)` is `-1`, and `(-1)//(-2)` is `0`.\n", 634 | "\n", 635 | "(2) Python defines `0**0` to be `1`, as is common for programming languages.\n", 636 | "\n", 637 | "(3) Attention: `x^y` is not exponentiation but the bitwise XOR of the two numbers.\n" 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": null, 643 | "id": "524dfbee", 644 | "metadata": {}, 645 | "outputs": [], 646 | "source": [ 647 | "# use parentheses to change the order of operations:\n", 648 | "print(1 + 2 * 3)\n", 649 | "print((1 + 2) * 3)" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "id": "0016fb5b", 655 | "metadata": {}, 656 | "source": [ 657 | "## f-strings\n", 658 | "\n", 659 | "Using _formatted string literals_ (\"f-strings\" for short), it is super easy to put variables into some predefined text. Prefix the string with an `f` and put the desired variables into curly brackets:" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": null, 665 | "id": "82963855", 666 | "metadata": {}, 667 | "outputs": [], 668 | "source": [ 669 | "name = \"Tim\"\n", 670 | "age = 23\n", 671 | "\n", 672 | "print(f\"Hi, my name is {name} and I'm {age} years old!\")" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "id": "426d23c4", 678 | "metadata": {}, 679 | "source": [ 680 | "The curly brackets can contain arbitrary Python expressions:" 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "execution_count": null, 686 | "id": "9924c089", 687 | "metadata": {}, 688 | "outputs": [], 689 | "source": [ 690 | "print(f\"Hi, my name is {name} and in two years I'll be {age + 2}!\")" 691 | ] 692 | } 693 | ], 694 | "metadata": { 695 | "interpreter": { 696 | "hash": "46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2" 697 | }, 698 | "kernelspec": { 699 | "display_name": "Python 3.10.4 64-bit", 700 | "language": "python", 701 | "name": "python3" 702 
| }, 703 | "language_info": { 704 | "codemirror_mode": { 705 | "name": "ipython", 706 | "version": 3 707 | }, 708 | "file_extension": ".py", 709 | "mimetype": "text/x-python", 710 | "name": "python", 711 | "nbconvert_exporter": "python", 712 | "pygments_lexer": "ipython3", 713 | "version": "3.10.4" 714 | } 715 | }, 716 | "nbformat": 4, 717 | "nbformat_minor": 5 718 | } 719 | -------------------------------------------------------------------------------- /exercises/ex10_networkx.ipynb: -------------------------------------------------------------------------------- 1 | {"cells":[{"cell_type":"markdown","id":"1cc99af8","metadata":{},"source":["# Exercise 10: NetworkX\n","NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. [Here](https://networkx.org/documentation/stable/tutorial.html) is a written tutorial of the basics of the library. \n","[This video tutorial (8min)](https://www.youtube.com/watch?v=flwcAf1_1RU) covers the basic usage of networkx.\n","\n","In Google Colab, networkx is already installed. If you work locally, install it by executing `python -m pip install networkx` in a terminal. \n","## Toy Network\n","First we look at a simple toy network before we start with a big one😉\n"]},{"cell_type":"code","execution_count":null,"id":"69eb1a32","metadata":{},"outputs":[],"source":["# Import networkx\n","import networkx as nx\n","import matplotlib.pyplot as plt"]},{"cell_type":"markdown","id":"1135c02f","metadata":{},"source":["First create an empty graph. For this you can use `nx.Graph()`"]},{"cell_type":"code","execution_count":null,"id":"e895b489","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"1f1fb11b","metadata":{},"source":["You can add nodes to the graph by using `add_node` for adding one node or by using `add_nodes_from` with a given iterator. 
\n","Now add the following nodes to the graph: `[1,2,3,4,5,6,7,8,9,10]`"]},{"cell_type":"code","execution_count":null,"id":"1b0ea63a","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"6efadf13","metadata":{},"source":["Next we want to add edges. As with nodes, you can add a single edge with the method `add_edge` by passing the two nodes you want to connect, or several edges at once with `add_edges_from` by passing a list of tuples, where each tuple holds the two nodes to connect. For example, if you want to connect node 1 to node 2 and node 3, you can use `add_edge`:\n","```python\n","G.add_edge(1,2)\n","G.add_edge(1,3)\n","```\n","or you can use `add_edges_from`:\n","```python\n","G.add_edges_from([(1,2),(1,3)])\n","``` \n","Now connect all nodes to node 1:"]},{"cell_type":"code","execution_count":null,"id":"048b0a11","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"8cc9e8c5","metadata":{},"source":["Now let's visualize the network. First compute the node positions with `nx.random_layout(G)`, then draw the network with `nx.draw_networkx`."]},{"cell_type":"code","execution_count":null,"id":"55f80e67","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"e811f889","metadata":{},"source":["Lastly you can remove nodes and edges with `remove_node` and `remove_edge`."]},{"cell_type":"markdown","id":"a92d7910","metadata":{},"source":["## Facebook Network Analysis\n","Now we look at a bigger network. \n","Modified from: https://networkx.org/nx-guides/content/exploratory_notebooks/facebook_notebook.html\n","\n","This notebook contains a social network analysis mainly carried out with the NetworkX library. In detail, the facebook circles (friends lists) of ten people will be examined in order to extract all kinds of valuable information. The dataset can be found on the [Stanford website](http://snap.stanford.edu/data/ego-Facebook.html). 
Moreover, as known, a facebook network is undirected and has no weights because one user can become friends with another user just once. Looking at the dataset from a graph analysis perspective:\n","* Each node represents an anonymized facebook user that belongs to one of those ten friends lists.\n","* Each edge corresponds to the friendship of two facebook users that belong to this network. In other words, two users must become friends on facebook in order for them to be connected in the particular network.\n","\n","Note: Nodes $0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980$ are the ones whose friends list will be examined. That means that they are in the spotlight of this analysis. Those nodes are considered the `spotlight nodes`"]},{"cell_type":"markdown","id":"8a0646b9","metadata":{},"source":["* First the necessary libraries are imported"]},{"cell_type":"code","execution_count":null,"id":"3c092e3e","metadata":{},"outputs":[],"source":["%matplotlib inline\n","import pandas as pd\n","import numpy as np\n","import networkx as nx\n","import matplotlib.pyplot as plt\n","from random import randint"]},{"cell_type":"markdown","id":"59679317","metadata":{},"source":["* The edges are loaded from the `data` folder and saved in a dataframe. 
Each edge is a new row and for each edge there is a `start_node` and an `end_node` column"]},{"cell_type":"code","execution_count":null,"id":"76cc313d","metadata":{},"outputs":[],"source":["facebook = pd.read_csv('facebook_combined.txt.gz', compression='gzip', sep=' ', names=['start_node', 'end_node'])\n","facebook"]},{"cell_type":"markdown","id":"36b6446a","metadata":{},"source":["* The graph is created from the `facebook` dataframe of the edges:"]},{"cell_type":"code","execution_count":null,"id":"a22bca0c","metadata":{},"outputs":[],"source":["G = nx.from_pandas_edgelist(facebook, 'start_node', 'end_node')"]},{"cell_type":"markdown","id":"8a09df92","metadata":{},"source":["## Visualizing the graph\n","\n","Let's start our exploration by visualizing the graph. Visualization plays a\n","central role in exploratory data analysis to help get a qualitative feel for\n","the data.\n","\n","Since we don't have any real sense of structure in the data, let's start by\n","viewing the graph with `random_layout`, which is among the fastest of the layout\n","functions.\n","\n","For this you can use `nx.random_layout(G)`. To draw the network use `nx.draw_networkx`."]},{"cell_type":"code","execution_count":null,"id":"1456baec","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"fb87685b","metadata":{},"source":["The resulting image is... not very useful. Graph visualizations of this kind\n","are sometimes colloquially referred to as \"hairballs\" due to the overlapping\n","edges resulting in an entangled mess.\n","\n","It's clear that we need to impose more structure on the positioning of the nodes if\n","we want to get a sense for the data. For this, we can use the `spring_layout`\n","function which is the default layout function for the networkx drawing module.\n","The `spring_layout` function has the advantage that it takes into account the\n","nodes and edges to compute locations of the nodes. 
The downside, however, is\n","that this process is much more computationally expensive, and can be quite\n","slow for graphs with hundreds of nodes and thousands of edges.\n","\n","Since our dataset has over 80k edges, you should limit the number of iterations\n","used in the `spring_layout` function to reduce the computation time.\n","We will also save the computed layout so we can use it for future\n","visualizations.\n","\n","You can also try out different layouts. \n","[Here](https://networkx.org/documentation/stable/reference/drawing.html#module-networkx.drawing.layout) \n","is a list with different layouts to choose from."]},{"cell_type":"code","execution_count":null,"id":"89319bb9","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"26b6e550","metadata":{},"source":["This visualization is much more useful than the previous one! Already we can\n","glean something about the structure of the network; for example, many of the\n","nodes seem to be highly connected, as we might expect for a social network.\n","We also get a sense that the nodes tend to form clusters. The `spring_layout`\n","serves to give a qualitative sense of clustering, but it is not designed for\n","repeatable, quantitative clustering analysis. We'll revisit evaluating\n","network clustering [later in the analysis](#clustering-effects)\n","\n","## Basic topological attributes\n","Now let's have a deeper look at the topology of the graph. \n","Output the total number of nodes in the network and the total number of edges:"]},{"cell_type":"code","execution_count":null,"id":"afd0034e","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"57bfadfb","metadata":{},"source":["Try to output the average degree of the nodes in the graph. 
You can create a list of all the degrees of the nodes and use `np.mean` to calculate their mean:"]},{"cell_type":"code","execution_count":null,"id":"939194e5","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"cce3eece","metadata":{},"source":["There are many interesting properties related to the distribution of *paths*\n","through the graph.\n","For example, the *diameter* of a graph represents the longest of the\n","shortest-paths that connect any node to another node in the Graph.\n","Similarly, the average path length gives a measure of the average number of\n","edges to be traversed to get from one node to another in the network.\n","These attributes can be calculated with the `nx.diameter` and\n","`nx.average_shortest_path_length` functions, respectively.\n","Note however that these analyses require computing the shortest path between\n","every pair of nodes in the network: this can be quite expensive for networks\n","of this size!\n","Since we're interested in several analyses involving the shortest path length\n","for all nodes in the network, we can instead compute this once and reuse the\n","information to save computation time.\n","\n","Let's start by computing the shortest path length for all pairs of nodes in the\n","network. For this you can use `nx.all_pairs_shortest_path_length` method and cast it to \n","a dictionary. 
This computation might take a while ;)\n","Once cast to a dictionary, the result is a dict-of-dicts that maps a node `u`\n","to all other nodes in the network, where the inner-most mapping returns the\n","length of the shortest path between the two nodes.\n","In other words, `shortest_path_lengths[u][v]` will return the shortest path\n","length between any pair of nodes `u` and `v`."]},{"cell_type":"code","execution_count":null,"id":"5d137d20","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"c8990322","metadata":{},"source":["Now let's use `shortest_path_lengths` to perform our analyses, starting with\n","the *diameter* of `G`.\n","If we look carefully at the [docstring for `nx.diameter`][nx_diameter_], we see\n","that it is equivalent to the maximum *eccentricity* of the graph.\n","It turns out that `nx.eccentricity` has an optional argument `sp` where we can\n","pass in our pre-computed `shortest_path_lengths` to save the extra computation.\n","Now use `nx.eccentricity` and `max` to calculate the maximum *eccentricity* of\n","the graph. As mentioned, use the optional argument `sp` for that."]},{"cell_type":"code","execution_count":null,"id":"7e7e20d9","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"7c60ae16","metadata":{},"source":["[nx_diameter_]: https://networkx.org/documentation/latest/reference/algorithms/generated/networkx.algorithms.distance_measures.diameter.html\n","\n","Next up, the average path length is found.\n","Again, we could use `nx.average_shortest_path_length` to compute this\n","directly, but it's much more efficient to use the `shortest_path_lengths` that\n","we've already computed. 
For this, calculate the mean of all shortest path\n","lengths:"]},{"cell_type":"code","execution_count":null,"id":"b13a14ff","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"c858d6b6","metadata":{},"source":["This represents the average of the shortest path length for all pairs of nodes.\n","\n","The above measures capture useful information about the network, but metrics\n","like the average value represent only a moment of the distribution; it is\n","also often valuable to look at the *distribution* itself.\n","Again, we can construct a visualization of the distribution of shortest path\n","lengths from our pre-computed dict-of-dicts. For this, create a bar plot with\n","matplotlib: the x-axis should show the shortest path lengths and the \n","y-axis the corresponding frequency in percent:"]},{"cell_type":"code","execution_count":null,"id":"e65c0986","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"81e7e313","metadata":{},"source":["\n","\n","* The graph's density is calculated here. A density far below $1$ means that the graph is very sparse.\n","Next, calculate the density of the graph and the number of connected components. This can be done with `nx.density` and\n","`nx.number_connected_components`:"]},{"cell_type":"code","execution_count":null,"id":"eb164b44","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"bb86b600","metadata":{},"source":["## Centrality measures\n","Now the centrality measures will be examined for the facebook graph"]},{"cell_type":"markdown","id":"f7219518","metadata":{},"source":["### Degree Centrality\n","Degree centrality assigns an importance score based simply on the number of links held by each node. 
In this analysis, that means that the higher the degree centrality of a node is, the more edges are connected to the particular node and thus the more neighbor nodes (facebook friends) this node has. In fact, the degree centrality of a node is the fraction of nodes it is connected to. In other words, it is the share of the network that the particular user is friends with. \n","Calculate the degree centralities and output the 8 nodes with the highest values. You can use `nx.centrality.degree_centrality`:"]},{"cell_type":"code","execution_count":null,"id":"e87891ef","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"9370fd6a","metadata":{},"source":["Plot the distribution of degree centralities. Use matplotlib to create a histogram, where the x-axis shows the degree centralities and the y-axis shows their counts."]},{"cell_type":"code","execution_count":null,"id":"434ab8e8","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"335667a5","metadata":{},"source":["Now let's highlight the users with the highest degree centralities through the size of their nodes. For this use `nx.draw_networkx` again, using the parameter `node_size` to set the size of the nodes. The ones with a higher degree centrality should be bigger:"]},{"cell_type":"code","execution_count":null,"id":"2888827d","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"dd740621","metadata":{},"source":["### Betweenness Centrality\n","Betweenness centrality measures the number of times a node lies on the shortest path between other nodes, meaning it acts as a bridge. In detail, betweenness centrality of a node $v$ is the percentage of all the shortest paths of any two nodes (apart from $v$), which pass through $v$. Specifically, in the facebook graph this measure is associated with the user's ability to influence others. 
A user with a high betweenness centrality acts as a bridge to many users that are not friends and thus has the ability to influence them by conveying information (e.g. by posting something or sharing a post) or even connect them via the user's circle (which would reduce the user's betweenness centrality afterwards). \n","Compute the nodes with the $8$ highest betweenness centralities with their centrality values. You can use `nx.centrality.betweenness_centrality` (this might take a while to execute)."]},{"cell_type":"code","execution_count":null,"id":"66712c37","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"ec8f474c","metadata":{},"source":["Moving on, plot the distribution of betweenness centralities. Again, use matplotlib to create a histogram!"]},{"cell_type":"code","execution_count":null,"id":"ed9140cb","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"085f38e8","metadata":{},"source":["We can also visualize the nodes with the highest betweenness centralities and see where they are located in the network. It is clear that they are the bridges from one community to another. Again, highlight the users with the highest betweenness centralities through the size of their nodes. For this use `nx.draw_networkx` again and use the parameter `node_size` to set the size of the nodes. 
The ones with a higher betweenness centrality should be bigger:"]},{"cell_type":"code","execution_count":null,"id":"1457d461","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"1bf4549f","metadata":{},"source":["There are also other centrality measures available. For a full list, look [here](https://networkx.org/documentation/stable/reference/algorithms/centrality.html)."]},{"cell_type":"markdown","id":"94ff5884","metadata":{},"source":["## Clustering Effects\n","The clustering coefficient of a node $v$ is defined as the probability that two randomly selected friends of $v$ are friends with each other. As a result, the average clustering coefficient is the average of clustering coefficients of all the nodes. The closer the average clustering coefficient is to $1$, the more complete the graph will be because there's just one giant component. Lastly, it is a sign of triadic closure because the more complete the graph is, the more triangles will usually arise. \n","Compute the average clustering coefficient with the help of `nx.average_clustering`:"]},{"cell_type":"code","execution_count":null,"id":"b6bb1481","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"ec246f80","metadata":{},"source":["Get the clustering coefficient of each node using `nx.clustering` and plot a histogram of the distribution."]},{"cell_type":"code","execution_count":null,"id":"376f354f","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"fa63341e","metadata":{},"source":["## Bridges\n","First of all, an edge joining two nodes A and B in the graph is considered a bridge, if deleting the edge would cause A and B to lie in two different components. 
Now check whether there are any bridges in this network. This can be done with `nx.has_bridges`:"]},{"cell_type":"code","execution_count":null,"id":"64d31a59","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"43697d2e","metadata":{},"source":["Now save the edges that are bridges in a list and print how many there are. For this you can use `nx.bridges`:"]},{"cell_type":"code","execution_count":null,"id":"8bf012c7","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"e7927e23","metadata":{},"source":["The existence of so many bridges is due to the fact that this network only contains the spotlight nodes and their friends. As a result, some friends of spotlight nodes are only connected to a spotlight node, making that edge a bridge.\n","\n","Also, compute the edges that are local bridges and print their number. In detail, an edge joining two nodes $C$ and $D$ \n","in a graph is a local bridge, if its endpoints $C$ and $D$ have no friends in common. Importantly, an edge that is a bridge is also a local bridge. Thus, this list contains all the above bridges as well. This can be done with `nx.local_bridges`:"]},{"cell_type":"code","execution_count":null,"id":"a45ccc0d","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"8bd633a5","metadata":{},"source":["Now we want to show the bridges and local bridges in the network. This can be done with `nx.draw_networkx` and `nx.draw_networkx_edges`. 
Use the parameter `edgelist` to draw the bridges and the local bridges:"]},{"cell_type":"code","execution_count":null,"id":"14d1a609","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"8c793bd1","metadata":{},"source":["## Assortativity\n","Assortativity describes the preference for a network's nodes to attach to others that are similar in some way.\n","Calculate the assortativity of the network. You can use `nx.degree_assortativity_coefficient` or `nx.degree_pearson_correlation_coefficient` (the latter might be faster)."]},{"cell_type":"code","execution_count":null,"id":"cf888fcb","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"f4019484","metadata":{},"source":["The assortativity coefficient is the Pearson correlation coefficient of degree between pairs of linked nodes. That means that it takes values from $-1$ to $1$. In detail, a positive assortativity coefficient indicates a correlation between nodes of similar degree, while a negative one indicates a correlation between nodes of different degrees."]},{"cell_type":"markdown","id":"8f0ea501","metadata":{},"source":["## Network Communities\n","A community is a group of nodes such that nodes inside the group are connected with many more edges than nodes in different groups. Two different algorithms will be used for community detection in this network:\n","* Firstly, a semi-synchronous label propagation method[^1] should be used to detect the communities.\n","\n","With the function `nx.community.label_propagation_communities`, the algorithm itself determines the number of communities that will be detected. Now iterate through the communities and create a colors list that contains the same color for nodes that belong to the same community. 
Also, print the number of communities:"]},{"cell_type":"code","execution_count":null,"id":"49a6ac01","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"4df3d738","metadata":{},"source":["Now the communities should be showcased in the graph. Each community should be depicted with a different color and its nodes are usually located close to each other. For this you can again use `nx.draw_networkx` and pass the parameter `node_color`:"]},{"cell_type":"code","execution_count":null,"id":"f76475da","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"2b8fe9e4","metadata":{},"source":["* Next, the asynchronous fluid communities algorithm should be used. \n","\n","With the function `nx.community.asyn_fluidc`, we can decide the number of communities to be detected. Let's say that $8$ communities is the number we want. Again, iterate over the communities and create a colors list to contain the same color for nodes that belong to the same community."]},{"cell_type":"code","execution_count":null,"id":"a0ffcea6","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"faf5aee9","metadata":{},"source":["Now show the $8$ communities in the graph. Again, each community should be depicted with a different color. 
One last time, use `nx.draw_networkx` and pass the created colors to the parameter `node_color`."]},{"cell_type":"code","execution_count":null,"id":"9b25eece","metadata":{},"outputs":[],"source":["# your code goes here:\n"]},{"cell_type":"markdown","id":"97ecea20","metadata":{},"source":["### References\n","[Cambridge-intelligence](https://cambridge-intelligence.com/keylines-faqs-social-network-analysis/#:~:text=Centrality%20measures%20are%20a%20vital,but%20they%20all%20work%20differently.)\n","\n","[^1]: [Semi-synchronous label propagation](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.label_propagation.label_propagation_communities.html#networkx.algorithms.community.label_propagation.label_propagation_communities)\n","\n","[^2]: [Asynchronous fluid communities algorithm](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.asyn_fluid.asyn_fluidc.html#networkx.algorithms.community.asyn_fluid.asyn_fluidc)"]}],"metadata":{"interpreter":{"hash":"46e2835a142a16ae115bce5fddf19f27ce13b17a4ab8ded638c88ab5ce5171d2"},"jupytext":{"notebook_metadata_filter":"all","text_representation":{"extension":".md","format_name":"myst","format_version":0.13,"jupytext_version":"1.11.1"}},"kernelspec":{"display_name":"Python 3.10.4 
64-bit","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.4"},"source_map":[23,32,36,43,47,50,54,56,68,73,92,97,110,112,116,118,124,126,146,148,157,159,168,173,185,192,204,225,232,234,238,240,245,251,254,259,261,268,275,281,286,292,295,304,311,317,322,330,333,341,343,350,356,360,365,372,375,383,388,392,399,405,410,415,417,421,427,433,436,440,442,446,448,455,460,462,466,469,476,479,485,491,497,501,503,509,517,526,530,534,540,546,550,554]},"nbformat":4,"nbformat_minor":5} 2 | -------------------------------------------------------------------------------- /tutorials/tut03_booleans_branching_loops.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python Crash Course 03 - Booleans, Branching, and Loops" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Booleans\n", 15 | "- [Video tutorial (17 min)](https://www.youtube.com/watch?v=DZwmZ8Usvnk&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7&index=6) \n", 16 | "- [Library reference](https://docs.python.org/3/library/stdtypes.html#truth-value-testing)\n", 17 | "\n", 18 | "A _boolean_ is a data type intended to store \"truth values\" - `True` or `False` (note the capitalized first letters!)." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "true_value = True\n", 28 | "false_value = False\n", 29 | "type(true_value)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Booleans can be combined and manipulated. 
There exist the following boolean operations:\n", 37 | "\n", 38 | "| Operation | Result |\n", 39 | "|-------------|--------------------------------------|\n", 40 | "| `x or y` | (OR) if x is False, then y, else x |\n", 41 | "| `x and y` | (AND) if x is False, then x, else y |\n", 42 | "| `not x` | (NOT) if x is False, then True, else False |" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "#### `and`\n", 50 | "\"conjunction\" of two booleans \n", 51 | "\n", 52 | "| x | y | `x and y` |\n", 53 | "|-------|-------|-----------|\n", 54 | "|`False`|`False`| `False` |\n", 55 | "|`False`|`True` | `False` |\n", 56 | "|`True` |`False`| `False` |\n", 57 | "|`True` |`True` | `True` |" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "true_value and false_value" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": { 72 | "tags": [] 73 | }, 74 | "source": [ 75 | "#### `or`\n", 76 | "\"disjunction\" of two booleans \n", 77 | "\n", 78 | "| x | y | `x or y` |\n", 79 | "|-------|-------|-----------|\n", 80 | "|`False`|`False`| `False` |\n", 81 | "|`False`|`True` | `True` |\n", 82 | "|`True` |`False`| `True` |\n", 83 | "|`True` |`True` | `True` |" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "true_value or false_value" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "#### `not`\n", 100 | "negation of a single boolean\n", 101 | "\n", 102 | "| x | `not x` |\n", 103 | "|-------|-----------|\n", 104 | "|`True` | `False` |\n", 105 | "|`False`| `True` |" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "not true_value" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 
null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "not false_value" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "## Comparison of Numbers\n", 131 | "It is also possible to compare two numbers. Each comparison results in a boolean. The following comparison operations are possible in Python:\n", 132 | "\n", 133 | "| Operation | Meaning | Example |\n", 134 | "|-------------|-------------------------|---------------------------------|\n", 135 | "| `<` | strictly less than | `5 < 4` -> `False` |\n", 136 | "| `<=` | less than or equal | `5 <= 4` -> `False` |\n", 137 | "| `>` | strictly greater than | `5 > 4` -> `True` |\n", 138 | "| `>=` | greater than or equal | `5 >= 4` -> `True` |\n", 139 | "| `==` | equal | `5 == 4` -> `False` |\n", 140 | "| `!=` | not equal | `5 != 4` -> `True` |\n", 141 | "| `is` | object identity | `True is True` -> `True` |\n", 142 | "| `is not` | negated object identity | `False is not False` -> `False` |" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "**Be careful with float comparisons!**" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "0.1 + 0.1 == 0.2" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "0.1 + 0.1 + 0.1 == 0.3" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "What happened here? 
Due to a float's finite precision (click [here](https://docs.python.org/3/tutorial/floatingpoint.html#tut-fp-issues) for a thorough explanation), performing multiple mathematical operations can lead to rounding errors:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "0.1 + 0.1 + 0.1 # Should be 0.3" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "This means we cannot confidently compare two floating point numbers for equality. Luckily, most of the time we don't have to. But just in case, you can check whether a result is \"close enough\" by comparing the absolute value of the difference to a very small number:" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "abs((0.1 + 0.1 + 0.1) - 0.3) < 10**-10" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## Branching\n", 207 | "\n", 208 | "There exist different tools to control the flow of the program. The simplest and most intuitive control flow tools are the conditional statements `if`, `else`, and `elif`.\n", 209 | "This is the general form for using these conditional statements:\n", 210 | "```python\n", 211 | "if condition:\n", 212 | " # do something\n", 213 | "elif some_other_condition:\n", 214 | " # do something\n", 215 | "else:\n", 216 | " # do something\n", 217 | "```\n", 218 | "\n", 219 | "The `if`- and `elif`-statements are checked from top to bottom - the code indented below the first one whose condition is `True` is executed. If none of the conditions are true, the code inside the else block is executed." 
220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "The simplest form of a conditional statement uses only the `if`-statement. 
If the condition is `True`, the code indented below the `if`-statement is executed - otherwise nothing happens. In the example below, if `a_number` is equal to `10`, the string `It is 10` is printed. If `a_number` does not equal `10`, the code inside the `if`-statement is not executed, therefore nothing is printed. \n", 227 | "\n", 228 | "Try it yourself with different values for `a_number` and see what happens!" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "a_number = 10\n", 238 | "\n", 239 | "if a_number == 10:\n", 240 | " print(\"It is 10\")" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "Below the `else`-statement, we can define what should happen if the condition in the `if`-statement is `False`. In the example below `a_number` is now `11` and we have the same `if`-statement which checks if `a_number` is `10`. If it is not, the code indented below the `else`-statement is executed. In the example below we print `It is not 10`. Note that an `else`-statement can never stand alone - it is always necessary to define an `if`-statement beforehand. \n", 248 | "\n", 249 | "Try it yourself with different values for `a_number` and see what happens!" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "a_number = 11\n", 259 | "\n", 260 | "if a_number == 10:\n", 261 | " print(\"It is 10\")\n", 262 | "else:\n", 263 | " print(\"It is not 10\")" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "Sometimes we want to check multiple conditions. For this we can use `elif`-statements. Notice: An `elif`-statement must always come after an `if`-statement or another `elif`-statement. \n", 271 | "In the example below we want to check if a number is 10 or 5 or something else. 
For this we define the same `if`-statement as before, then an `elif`-statement to check if the number is 5. And finally there's the `else`-statement from before. \n", 272 | "Now `a_number` is 5. The first check, in the `if`-statement, is `False` since 5 is not 10. Then we check the `elif`-statement, which is `True` in this case since 5 equals 5, so `It is not 10 but 5` is printed. Afterwards we do not enter the `else`-statement, since it is only entered if none of the previous conditions is `True`. \n", 273 | "\n", 274 | "Try it yourself with different values for `a_number` and see what happens!" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": null, 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "a_number = 5\n", 284 | "\n", 285 | "if a_number == 10:\n", 286 | "    print(\"It is 10\")\n", 287 | "elif a_number == 5:\n", 288 | "    print(\"It is not 10 but 5\")\n", 289 | "else:\n", 290 | "    print(\"It is not 10\")" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "Using the keyword `in`, you can check whether something is part of a container type (like a list, tuple, or dictionary; for a dictionary, `in` checks the keys):" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "number = 3\n", 307 | "\n", 308 | "small_primes = [2, 3, 5, 7, 11]\n", 309 | "\n", 310 | "if number in small_primes:\n", 311 | "    print(f\"{number} is a small prime number!\")\n" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "The opposite also works:" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "language_to_check = \"french\"\n", 328 | "\n", 329 | "languages_i_speak = [\"german\", \"english\", \"spanish\"]\n", 330 | "\n", 331 | "if language_to_check not in 
languages_i_speak:\n", 332 | "    print(f\"I've yet to learn {language_to_check}\")\n", 333 | "else:\n", 334 | "    print(f\"Yay, I know {language_to_check}\")" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "## `for`-Loops\n", 342 | "\n", 343 | "[Video tutorial (6 min)](https://www.youtube.com/watch?v=6iF8Xb7Z3wQ)\n", 344 | "\n", 345 | "A short recap from last session: \n", 346 | "\n", 347 | "`for`-loops iterate over sequences and other iterables (like lists, tuples, or sets). Often in programming, we want to perform the same action using different input data. Instead of writing this down X times we define a loop which does this for us.\n", 348 | "\n", 349 | "The `for`-loop in Python is defined as follows:\n", 350 | "```python\n", 351 | "for variable in sequence:\n", 352 | "    # do stuff\n", 353 | "```\n", 354 | "\n", 355 | "So we start with the keyword `for` followed by a variable name. This variable is created here and contains one element of the sequence, which changes every iteration of the loop until the whole sequence is finished, or we as programmers stop the loop. Next comes the `in` keyword followed by the sequence. Finally, the part which should happen at each iteration is _indented_ below the `for`-statement (i.e. the line starts 4 spaces further to the right).\n" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "As you already know, you can iterate over the different sequence datatypes. Now we show you how to stop a loop or skip elements in the loop based on a condition being met. 
For this there exist two keywords: \n", 363 | "* `continue`: stops the current iteration and begins the next one (jumps back to the top of the loop and uses the next element)\n", 364 | "* `break`: exits the loop immediately (jumps to the code after the loop body)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "In the example below every letter but the letter `l` should be printed. For this we iterate over the string and check with an `if`-statement if the letter is actually `l` - if so we skip the print by using `continue`. \n", 372 | "\n", 373 | "Try this with different letters to skip!" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "# continue -> Print every letter but the letter \"l\"\n", 383 | "hello_world = \"Hello World!\"\n", 384 | "for letter in hello_world:\n", 385 | "    if letter == \"l\":\n", 386 | "        continue\n", 387 | "    print(letter)" 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "metadata": {}, 393 | "source": [ 394 | "Similarly, we can also break the loop when the letter `l` is encountered in the sequence using `break`. \n", 395 | "\n", 396 | "Try to stop the loop at different letters!" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": {}, 403 | "outputs": [], 404 | "source": [ 405 | "# break -> Print all letters until the first encounter of the letter \"l\"\n", 406 | "hello_world = \"Hello World!\"\n", 407 | "for letter in hello_world:\n", 408 | "    if letter == \"l\":\n", 409 | "        break\n", 410 | "    print(letter)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "#### `enumerate` a sequence\n", 418 | "You may encounter a problem where you need the index of the element in the sequence and the element itself. 
`enumerate` generates a tuple for each element of the iterable, which contains the element's index and the element itself: `(index, element)`. This tuple can be unpacked directly by using two variable names separated by a comma in the `for`-statement." 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "data = [63, 100, 48, 79, 4, 85, 26, 84, 16, 73, 58, 78]\n", 428 | "for index, element in enumerate(data):\n", 429 | "    print(f\"{index}. element in data is {element}\")" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "#### List Comprehensions\n", 442 | "\n", 443 | "Sometimes we want to create a list based on the content of another list. Using what we've learned so far, we can use the list method `.append` inside a `for`-loop:" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "numbers = [4, 8, 15, 16, 23, 42]\n", 453 | "\n", 454 | "squared_numbers = []\n", 455 | "for number in numbers:\n", 456 | "    squared_numbers.append(number**2)\n", 457 | "\n", 458 | "print(squared_numbers)" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "A _list comprehension_ gives a more elegant way to do this:" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "squared_numbers = [number**2 for number in numbers]\n", 475 | "print(squared_numbers)" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "It's even possible to use an `if`-statement in a list comprehension:" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": {}, 489 
| "outputs": [], 490 | "source": [ 491 | "numbers = [4, 8, 15, 16, 23, 42]\n", 492 | "\n", 493 | "squared_even_numbers = []\n", 494 | "for number in numbers:\n", 495 | "    if number % 2 == 0: # check if the number is even, i.e. if dividing by 2 leaves no remainder\n", 496 | "        squared_even_numbers.append(number**2)\n", 497 | "\n", 498 | "print(squared_even_numbers)" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "# this is the list comprehension equivalent\n", 508 | "squared_even_numbers = [number**2 for number in numbers if number % 2 == 0]\n", 509 | "print(squared_even_numbers)" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "## `while`-Loops\n", 517 | "\n", 518 | "[Video tutorial (4 min)](https://youtu.be/6iF8Xb7Z3wQ?t=375)\n", 519 | "\n", 520 | "Besides `for`-loops there is another common looping technique: `while`-loops. \n", 521 | "This type of loop performs instructions as long as a given condition is true." 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "General form:\n", 529 | "\n", 530 | "```python\n", 531 | "while condition:\n", 532 | "    # do something\n", 533 | "```" 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "metadata": { 539 | "tags": [] 540 | }, 541 | "source": [ 542 | "In the example below, the code inside the `while`-loop gets executed until the condition `counter < 3` becomes `False`." 
543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": null, 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "counter = 0\n", 552 | "print(\"while-loop begins:\\n\")\n", 553 | "while counter < 3:\n", 554 | "    print(\"The condition counter < 3 is still true.\") \n", 555 | "    print(f\"counter is currently {counter}\") # print current value of counter\n", 556 | "    \n", 557 | "    counter += 1 # counter = counter+1\n", 558 | "    print(\"incrementing counter...\")\n", 559 | "    print(\"-\" * 40)\n", 560 | "\n", 561 | "print(\"The condition counter < 3 is false.\")\n", 562 | "print(f\"counter is currently {counter}\\n\")\n", 563 | "print(\"while-loop ends!\")" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": { 569 | "tags": [] 570 | }, 571 | "source": [ 572 | "Many `while`-loops can also be rewritten as `for`-loops, but depending on the use case, one might be easier to implement and read, or more efficient, than the other. For example, the counting in the loop above could also be written as `for counter in range(3):`." 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "#### Controlling `while`\n", 580 | "\n", 581 | "Just like `for`-loops, `while`-loops can also be controlled by using the `break` and `continue` keywords." 
582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": { 587 | "tags": [] 588 | }, 589 | "source": [ 590 | "#### `break`-statement\n", 591 | "`break`: stops the execution of the loop altogether" 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": null, 597 | "metadata": { 598 | "tags": [] 599 | }, 600 | "outputs": [], 601 | "source": [ 602 | "x = 0\n", 603 | "while x < 10:\n", 604 | "    x += 1\n", 605 | "    if x % 4 == 0:\n", 606 | "        break\n", 607 | "    print(x)\n", 608 | "print(\"Executed after while-loop\")" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": {}, 614 | "source": [ 615 | "#### `continue`-statement\n", 616 | "`continue`: stops the current iteration and starts with the next iteration of the loop" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": null, 622 | "metadata": { 623 | "tags": [] 624 | }, 625 | "outputs": [], 626 | "source": [ 627 | "x = 0\n", 628 | "while x < 10:\n", 629 | "    x += 1\n", 630 | "    if x % 2 == 0:\n", 631 | "        print(f\"{x} is even\")\n", 632 | "        continue\n", 633 | "    print(f\"{x} is odd\")" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "### Infinite Loops" 641 | ] 642 | }, 643 | { 644 | "cell_type": "markdown", 645 | "metadata": {}, 646 | "source": [ 647 | "A loop that never ends is called an _infinite loop_, meaning it will not terminate on its own. If you accidentally create an endless loop in a notebook environment, click \"Interrupt kernel\". In a terminal environment, execution can be cancelled with Ctrl+C." 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": null, 653 | "metadata": { 654 | "tags": [] 655 | }, 656 | "outputs": [], 657 | "source": [ 658 | "# Don't do this... 
\n", 659 | "# while True:\n", 660 | "# print(\"Oh Oh...\")" 661 | ] 662 | }, 663 | { 664 | "cell_type": "markdown", 665 | "metadata": { 666 | "tags": [] 667 | }, 668 | "source": [ 669 | "## Common Mistakes" 670 | ] 671 | }, 672 | { 673 | "cell_type": "markdown", 674 | "metadata": {}, 675 | "source": [ 676 | "The order of `if`- and `elif`-statements matters! Depending on the use case, this may lead to unwanted behaviour. Take this code which should print the \"generation name\" for a given birth year:" 677 | ] 678 | }, 679 | { 680 | "cell_type": "code", 681 | "execution_count": null, 682 | "metadata": {}, 683 | "outputs": [], 684 | "source": [ 685 | "birth_year = 1960\n", 686 | "\n", 687 | "if birth_year <= 1964:\n", 688 | " print(\"Baby Boomer\")\n", 689 | "elif birth_year <= 1980:\n", 690 | " print(\"Generation X\")\n", 691 | "elif birth_year <= 1996:\n", 692 | " print(\"Generation Y\")\n", 693 | "elif birth_year <= 2012:\n", 694 | " print(\"Generation Z\")\n", 695 | "elif birth_year < 2023:\n", 696 | " print(\"Generation Alpha\")" 697 | ] 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "metadata": {}, 702 | "source": [ 703 | "Since `1960` is less than `1964`, the first `if`-statement is `True`, so `\"Baby Boomer\"` is printed. The `elif`s below are skipped, even though they would be `True` as well! 
So the code above behaves as intended, while the code below, with a different order of conditions, does not:" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": null, 709 | "metadata": {}, 710 | "outputs": [], 711 | "source": [ 712 | "birth_year = 1960\n", 713 | "\n", 714 | "if birth_year < 2023:\n", 715 | "    print(\"Generation Alpha\")\n", 716 | "elif birth_year <= 2012:\n", 717 | "    print(\"Generation Z\")\n", 718 | "elif birth_year <= 1996:\n", 719 | "    print(\"Generation Y\")\n", 720 | "elif birth_year <= 1980:\n", 721 | "    print(\"Generation X\")\n", 722 | "elif birth_year <= 1964:\n", 723 | "    print(\"Baby Boomer\")" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": {}, 729 | "source": [ 730 | "Always indent your code consistently, otherwise an `IndentationError` occurs. You should always use 4 spaces for indentation." 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": null, 736 | "metadata": {}, 737 | "outputs": [], 738 | "source": [ 739 | "some_var = 17\n", 740 | "\n", 741 | "if some_var > 10:\n", 742 | "print(\"some_var is totally bigger than 10.\") # no indentation at all\n", 743 | "elif some_var < 10: \n", 744 | " print(\"some_var is smaller than 10.\") # this will not produce an error, but should be avoided\n", 745 | "else: \n", 746 | " print(\"some_var is indeed 10.\") # this will not produce an error, but should be avoided" 747 | ] 748 | }, 749 | { 750 | "cell_type": "markdown", 751 | "metadata": {}, 752 | "source": [ 753 | "This can be fixed by adding appropriate whitespace to the blocks below the `if`- and `elif`-statements and removing the extra whitespace in the `else`-block:" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": null, 759 | "metadata": {}, 760 | "outputs": [], 761 | "source": [ 762 | "some_var = 17\n", 763 | "\n", 764 | "if some_var > 10:\n", 765 | "    print(\"some_var is totally bigger than 10.\") # indented with 4 spaces\n", 766 | 
"elif some_var < 10: \n", 767 | "    print(\"some_var is smaller than 10.\") # indented with 4 spaces\n", 768 | "else: \n", 769 | "    print(\"some_var is indeed 10.\") # indented with 4 spaces" 770 | ] 771 | }, 772 | { 773 | "cell_type": "markdown", 774 | "metadata": {}, 775 | "source": [ 776 | "It is easy to accidentally write a `while`-loop whose end condition is never reached, so be careful! :)" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": null, 782 | "metadata": {}, 783 | "outputs": [], 784 | "source": [ 785 | "number = 5\n", 786 | "\n", 787 | "while number > 0:\n", 788 | "    print(number)\n", 789 | "    \n", 790 | "# Stop this by interrupting the kernel ;)" 791 | ] 792 | }, 793 | { 794 | "cell_type": "markdown", 795 | "metadata": {}, 796 | "source": [ 797 | "This can be fixed by adding an exit condition (`break`) or by making sure the loop condition becomes `False` at some point:" 798 | ] 799 | }, 800 | { 801 | "cell_type": "code", 802 | "execution_count": null, 803 | "metadata": {}, 804 | "outputs": [], 805 | "source": [ 806 | "number = 5\n", 807 | "\n", 808 | "while number > 0:\n", 809 | "    print(number)\n", 810 | "    number -= 1" 811 | ] 812 | }, 813 | { 814 | "cell_type": "markdown", 815 | "metadata": {}, 816 | "source": [ 817 | "`break` and `continue` only affect the innermost loop:" 
818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": null, 823 | "metadata": {}, 824 | "outputs": [], 825 | "source": [ 826 | "matrix = [[1, 0, -1],\n", 827 | "          [2, 0, -2],\n", 828 | "          [1, 0, -1]]\n", 829 | "\n", 830 | "# Check for elements smaller than 0 in a matrix\n", 831 | "for row in matrix:\n", 832 | "    for column in row:\n", 833 | "        if column < 0:\n", 834 | "            print(\"There is a negative element in the matrix!\")\n", 835 | "            break" 836 | ] 837 | }, 838 | { 839 | "cell_type": "markdown", 840 | "metadata": {}, 841 | "source": [ 842 | "By adding the `negative_found` flag and breaking out of the outer loop when it is `True`, we can control the outer loop as well:" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": null, 848 | "metadata": {}, 849 | "outputs": [], 850 | "source": [ 851 | "matrix = [[1, 0, -1],\n", 852 | "          [2, 0, -2],\n", 853 | "          [1, 0, -1]]\n", 854 | "\n", 855 | "# Check for elements smaller than 0 in a matrix\n", 856 | "for row in matrix:\n", 857 | "    negative_found = False\n", 858 | "    for column in row:\n", 859 | "        if column < 0:\n", 860 | "            print(\"There is a negative element in the matrix!\")\n", 861 | "            negative_found = True\n", 862 | "    if negative_found:\n", 863 | "        break" 864 | ] 865 | }, 866 | { 867 | "cell_type": "markdown", 868 | "metadata": {}, 869 | "source": [ 870 | "## Best Practice\n", 871 | "\n", 872 | "### Avoid nesting `if`/`elif`/`else`-statements if not needed:\n", 873 | "Try to avoid nesting multiple conditional statements. 
This keeps lines short and related pieces of code close to each other.\n", 874 | "\n", 875 | "##### _Don't_:\n", 876 | "```python\n", 877 | "if arg1 >= 10:\n", 878 | "    if arg2 < 0:\n", 879 | "        if arg3 == 2:\n", 880 | "            do_something()\n", 881 | "        else:\n", 882 | "            print(f\"Argument 3 must be 2 but is {arg3}\")\n", 883 | "    else:\n", 884 | "        print(f\"Argument 2 must be smaller than 0 but is {arg2}\")\n", 885 | "else:\n", 886 | "    print(f\"Argument 1 must be at least 10 but is {arg1}\")\n", 887 | "```\n", 888 | "As you can see, this can be very confusing to read, and the indentation gets deep fast ;)\n", 889 | "\n", 890 | "##### _Do:_\n", 891 | "```python\n", 892 | "if not arg1 >= 10:\n", 893 | "    print(f\"Argument 1 must be at least 10 but is {arg1}\")\n", 894 | "elif not arg2 < 0:\n", 895 | "    print(f\"Argument 2 must be smaller than 0 but is {arg2}\")\n", 896 | "elif not arg3 == 2:\n", 897 | "    print(f\"Argument 3 must be 2 but is {arg3}\")\n", 898 | "else:\n", 899 | "    do_something()\n", 900 | "```" 901 | ] 902 | }, 903 | { 904 | "cell_type": "markdown", 905 | "metadata": {}, 906 | "source": [ 907 | "### Iterating over iterables to check for existence of an element\n", 908 | "The `in` operator checks whether an element exists inside an iterable or string.\n", 909 | "##### _Don't_:\n", 910 | "```python\n", 911 | "numbers = [1,2,3,4,5]\n", 912 | "target_number = 6\n", 913 | "\n", 914 | "for number in numbers:\n", 915 | "    if number == target_number:\n", 916 | "        print(\"found the target\")\n", 917 | "```\n", 918 | "Here you can check directly whether `target_number` is in `numbers`; this saves you from writing the loop yourself ;)\n", 919 | "\n", 920 | "##### _Do:_\n", 921 | "```python\n", 922 | "numbers = [1,2,3,4,5]\n", 923 | "target_number = 6\n", 924 | "\n", 925 | "if target_number in numbers:\n", 926 | "    print(\"found the target\")\n", 927 | "```\n", 928 | "This is usually faster and more readable, since you do not have to loop over the whole list explicitly." 
929 | ] 930 | }, 931 | { 932 | "cell_type": "markdown", 933 | "metadata": {}, 934 | "source": [ 935 | "### Iterating over elements in an iterable with a `for`-loop instead of a `while`-loop\n", 936 | "When you want to do something with the elements of an iterable, it is almost always better to use a `for`-loop.\n", 937 | "##### _Don't_:\n", 938 | "```python\n", 939 | "numbers = [43, 26, 42, 28, 33, 16, 91, 88, 55, 61, 62, 46, 18, 49, 8, 89, 12, 1, 42, 52]\n", 940 | "\n", 941 | "index = 0\n", 942 | "while index < len(numbers):\n", 943 | "    print(numbers[index])\n", "    index += 1\n", 944 | "```\n", 945 | "Here you need an index to access the elements of `numbers`, and you have to increment it yourself. Additionally, using a `for`-loop avoids endless loops - you don't have to check for the end condition, so no mistakes here ;)\n", 946 | "\n", 947 | "##### _Do:_\n", 948 | "```python\n", 949 | "numbers = [43, 26, 42, 28, 33, 16, 91, 88, 55, 61, 62, 46, 18, 49, 8, 89, 12, 1, 42, 52]\n", 950 | "\n", 951 | "for number in numbers:\n", 952 | "    print(number)\n", 953 | "```" 954 | ] 955 | } 956 | ], 957 | "metadata": { 958 | "interpreter": { 959 | "hash": "9876a56b36d3e86ed839f942802fea42f90ec2f0ee3c4ea82631e635694a47fe" 960 | }, 961 | "kernelspec": { 962 | "display_name": "Python 3.10.0 64-bit", 963 | "language": "python", 964 | "name": "python3" 965 | }, 966 | "language_info": { 967 | "codemirror_mode": { 968 | "name": "ipython", 969 | "version": 3 970 | }, 971 | "file_extension": ".py", 972 | "mimetype": "text/x-python", 973 | "name": "python", 974 | "nbconvert_exporter": "python", 975 | "pygments_lexer": "ipython3", 976 | "version": "3.10.0" 977 | } 978 | }, 979 | "nbformat": 4, 980 | "nbformat_minor": 4 981 | } 982 | --------------------------------------------------------------------------------