├── README.md
├── data
    ├── marr.p
    ├── sql_pt2.zip
    ├── testdb2.db
    ├── viz_datasets.zip
    └── weather_data.db
├── exercises
    ├── ex_01.ipynb
    ├── ex_02.ipynb
    ├── ex_03.ipynb
    ├── ex_04.ipynb
    ├── ex_05.ipynb
    ├── ex_06.ipynb
    ├── ex_07.ipynb
    ├── ex_09_simple_web_app.ipynb
    ├── ex_10.ipynb
    ├── ex_adv1_django.ipynb
    └── ex_adv2_ML.ipynb
└── notebooks
    ├── 01_Python_basics.ipynb
    ├── 02_Memory_flow_control.ipynb
    ├── 03_Functions_and_Objects.ipynb
    ├── 04_NumPy.ipynb
    ├── 05_Pandas.ipynb
    ├── 06_SQL_python.ipynb
    ├── 07_1_intro_visual.ipynb
    ├── 07_2_intro_visual_seaborn.ipynb
    ├── 08_SQLIntro_part2.ipynb
    ├── 09_Webb_apps_with_Python_-_Bottle.ipynb
    ├── 10_JSON_XML_and_Webscrapping.ipynb
    └── exm_4.ipynb

/README.md:
--------------------------------------------------------------------------------
1 | # Python and SQL: intro / SQL platforms 2024/2025
2 | Course materials for Python and SQL: intro / SQL platforms - UW\
3 | dr Jakub Michańków, email: j.michankow@uw.edu.pl
4 |
5 | ## Final test
6 | Online, at the meeting link below.\
7 | Group 1: 23rd Jan 2025, 13:15\
8 | Group 4: 23rd Jan 2025, 15:00
9 |
10 | ## Meeting link
11 | https://meet.google.com/rfw-gtmc-ytv
12 |
13 | ## Course Scope
14 |
15 | 1. Introduction and course organization, environment setup (github, jupyter, colab/kaggle)
16 | 2. Basics, flow control and text files
17 | 3. Functions and objects
18 | 4. Numpy (algebra) and Pandas (data handling)
19 | 5. Intro to SQL part 1
20 | 6. Data visualization (matplotlib and Seaborn)
21 | 7. Python for web apps (XML, JSON, scraping)
22 | 8. Intro to SQL part 2
23 |
24 | ## Requirements and grading
25 | - Project and presentation (60%)
26 | - Written test (40%)
27 | - Final grade: 0.6 * project score + 0.4 * test score + extra pts
28 |
29 | To pass the course you need at least 60% total AND at least 20.5% (51%) from the test.
30 |
31 | ## Deadlines
32 | - Project proposal: 31st October - choose the topic and the members of your team (3-4 people), send via email.
33 | - Presentations: 28th November - Around 5 minutes. The presentation needs to include slides with a title page (including authors), information on why you chose your topic, how you want to solve the problem, a short description of the tools you are going to use, and a short description of each team member's responsibilities.
34 | - Project delivery: 16th of January - The practical project will be a program of your own design, built with Python and any SQL dialect.
35 | - Test: 23rd of January
36 |
37 | ## Generative AI Rules & Plagiarism
38 |
39 | Use of generative AI is allowed only to **help** you with your coding. Any parts of the code that have been generated need to be clearly marked by comments in the code, stating the scope of the support, the type of AI model, and its version.
40 |
41 | Plagiarism is not tolerated in any form (including self-plagiarism). You are not allowed to copy parts from one project to another and you always need to provide an exact source.
42 |
43 | ## Meetings
44 | - 03.10.2024 - Intro - Download and install [Python](https://www.python.org/downloads/), download and install [VSCode](https://code.visualstudio.com/download). Try running a simple piece of code, e.g. `print(2+2)`.
45 | - 10.10.2024 - Python basics. [Lecture 1 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/01_Python_basics.ipynb), [Exercise 1 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_01.ipynb)
46 | - 17.10.2024 - Flow control. [Lecture 2 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/02_Memory_flow_control.ipynb), [Exercise 2 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_02.ipynb)
47 | - 24.10.2024 - Functions and objects.
[Lecture 3 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/03_Functions_and_Objects.ipynb), [Exercise 3 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_03.ipynb)
48 | - 31.10.2024 - Numpy. [Lecture 4 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/04_NumPy.ipynb), [Exercise 4 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_04.ipynb)
49 | - 07 & 14.11.2024 - Pandas. [Lecture 5 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/05_Pandas.ipynb), [Exercise 5 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_05.ipynb)
50 | - 21.11.2024 - SQL part 1. [Lecture 6 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/06_SQL_python.ipynb), [Exercise 6 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_06.ipynb)
51 | - 05.12.2024 - Visualizations. [Lecture 7_1 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/07_1_intro_visual.ipynb), [Exercise 7 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_07.ipynb)
52 | - 12.12.2024 - Seaborn, Plotly. [Lecture 7_2 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/07_2_intro_visual_seaborn.ipynb)
53 | - 19.12.2024 - SQL pt 2. [Lecture 8 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/08_SQLIntro_part2.ipynb)
54 | - 09.01.2025 (**asynchronous**) - [Lecture 9 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/09_Webb_apps_with_Python_-_Bottle.ipynb), [Exercise 9 Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_09_simple_web_app.ipynb)
55 | - 16.01.2025 - Web scraping. [Lecture 10 notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/notebooks/10_JSON_XML_and_Webscrapping.ipynb), [Exercise 10
Notebook](https://github.com/glowform/intro_python_sql_2024/blob/main/exercises/ex_10.ipynb)
56 | - 23.01.2025 - Final test
57 |
58 |
59 | ## Announcements
60 |
61 | Classes on 09.01 are asynchronous only, meaning that there won't be an online meeting on this day.
62 |
--------------------------------------------------------------------------------
/data/marr.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/glowform/intro_python_sql_2024/70588230e5021a88e071b6ee8675efdf523a504c/data/marr.p
--------------------------------------------------------------------------------
/data/sql_pt2.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/glowform/intro_python_sql_2024/70588230e5021a88e071b6ee8675efdf523a504c/data/sql_pt2.zip
--------------------------------------------------------------------------------
/data/testdb2.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/glowform/intro_python_sql_2024/70588230e5021a88e071b6ee8675efdf523a504c/data/testdb2.db
--------------------------------------------------------------------------------
/data/viz_datasets.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/glowform/intro_python_sql_2024/70588230e5021a88e071b6ee8675efdf523a504c/data/viz_datasets.zip
--------------------------------------------------------------------------------
/data/weather_data.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/glowform/intro_python_sql_2024/70588230e5021a88e071b6ee8675efdf523a504c/data/weather_data.db
--------------------------------------------------------------------------------
/exercises/ex_01.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Intro to Python - exercises\n", 8 | "When solving exercises in this course you are advised to search for the necessary information on the Internet when needed (Google, Stack Overflow etc.). Being able to find a solution to a problem on the Internet quickly and skillfully is crucial.\n", 9 | "\n", 10 | "## Basic variables and print\n", 11 | "* Create a basic Hello World program as follows: create two strings, one for each word. Create a third variable, which concatenates the two previous variables, and then show its contents." 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": { 18 | "collapsed": true 19 | }, 20 | "outputs": [], 21 | "source": [] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## Lists\n", 28 | "Do the following exercises about lists:\n", 29 | "* Create a list of three names.\n", 30 | "* Add a fourth name at the end of the list.\n", 31 | "* Add another name at the beginning of the list.\n", 32 | "* Delete the third element from the list.\n", 33 | "* Show the number of elements in the list.\n", 34 | "* Use Python syntax to check whether the following names are in the list: Anna, John, Siegfried" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": { 41 | "collapsed": true 42 | }, 43 | "outputs": [], 44 | "source": [] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "* Create a list containing your three lucky numbers.\n", 51 | "* Concatenate two lists.\n", 52 | "* Create a new list, which will be a list of all three lists available in the notebook.\n", 53 | "* Create a copy of the list of lists.\n", 54 | "* Clear the original list of lists.\n", 55 | "* Find the Python 3.6 documentation for lists."
56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": { 62 | "collapsed": true 63 | }, 64 | "outputs": [], 65 | "source": [] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Sets\n", 72 | "Create three sets of colors: colorsRainbow, colorsRGB, colorsCMYK, and do the following exercises:\n", 73 | "* Add white to rainbow.\n", 74 | "* Delete \"K\" color from CMYK.\n", 75 | "* Create a set of these colors, which are both in rainbow and RGB.\n", 76 | "* Create a set of these colors, which are in rainbow and are not in CMYK.\n", 77 | "* Create a set which contains colors from all sets.\n", 78 | "* Make a list of the set of all colors." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "collapsed": true 86 | }, 87 | "outputs": [], 88 | "source": [] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## Dictionaries\n", 95 | "Create an English-Polish (or another foreign language) dictionary, which contains your three favorite English words and then:\n", 96 | "* Add \"author\", which will contain your first name and surname.\n", 97 | "* Delete the second added word from dictionary.\n", 98 | "* Create a dictionary, which will be a translation of the previous dictionary. (inverse dictionaries)\n", 99 | "* Concatenate two existing dictionaries.\n", 100 | "* Find Python 3.6 documentation for dictionaries." 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": { 107 | "collapsed": true 108 | }, 109 | "outputs": [], 110 | "source": [] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "## Tuples\n", 117 | "Create the following two tuples:\n", 118 | "* Tuple containing your name, surname and year of birth.\n", 119 | "* Tuple containing three lists: colorsRainbow, colorsRGB, colorsCMYK." 
120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": { 126 | "collapsed": true 127 | }, 128 | "outputs": [], 129 | "source": [] 130 | } 131 | ], 132 | "metadata": { 133 | "kernelspec": { 134 | "display_name": "Python 3", 135 | "language": "python", 136 | "name": "python3" 137 | }, 138 | "language_info": { 139 | "codemirror_mode": { 140 | "name": "ipython", 141 | "version": 3 142 | }, 143 | "file_extension": ".py", 144 | "mimetype": "text/x-python", 145 | "name": "python", 146 | "nbconvert_exporter": "python", 147 | "pygments_lexer": "ipython3", 148 | "version": "3.6.9" 149 | } 150 | }, 151 | "nbformat": 4, 152 | "nbformat_minor": 2 153 | } 154 | -------------------------------------------------------------------------------- /exercises/ex_02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Flow control - exercises\n", 8 | "## For and iterators\n", 9 | "\n", 10 | "* Print 5 consecutive numbers which are greater than 247 and divisible by 3.\n", 11 | "* Print all colors of rainbow." 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": { 18 | "collapsed": true 19 | }, 20 | "outputs": [], 21 | "source": [ 22 | "rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']\n", 23 | "\n" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "* Create a list of your three favorite numbers, and then (using enumerate) modify a list in such a way that new values are equal to the old value multiplied by its index. For every number print its old and new value." 
31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": { 37 | "collapsed": true 38 | }, 39 | "outputs": [], 40 | "source": [] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "* Using zip() print names and surnames of movie protagonists." 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "collapsed": true 54 | }, 55 | "outputs": [], 56 | "source": [ 57 | "names = ['Grzegorz', 'Zdzisław', 'Ryszard']\n", 58 | "surnames = ['Brzęczyszczykiewicz', 'Dyrman', 'Ochódzki']\n", 59 | "\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "## List comprehension\n", 67 | "* Using list comprehensions create a list of letters which are not vowels." 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "sentence = \"The quick brown fox jumps over the lazy dog.\"\n", 79 | "vowels = 'aeiou'\n", 80 | "\n" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "* Choose numbers between 2 and 37 (inclusive) which have a remainder of 1 when divided by 3, and raise them to the power of 2." 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "## If and while\n", 104 | "* For every element on the list below, using if, elif and else: if element is a string: print it; if it is a float greater than 0: print \"Float, OK\"; if it is an even int print \"Even int\", in every other case print \"else\"." 
105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "metadata": { 111 | "collapsed": true 112 | }, 113 | "outputs": [], 114 | "source": [ 115 | "theList = [2, 2.5, 3, \"element\", -3.532]\n", 116 | "# Hint: type(x)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "* Print consecutive Fibonacci numbers as long as sum of previous elements is lower than 100." 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": { 130 | "collapsed": true 131 | }, 132 | "outputs": [], 133 | "source": [] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "* Sum of a following geometric series: 1, 0.5, 0.25... equals 2. Using while and break check if more than 100 elements are required to get a difference between sum elements and 2 which is lower than 0.001. If yes - how many elements are required?" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": { 146 | "collapsed": true 147 | }, 148 | "outputs": [], 149 | "source": [] 150 | } 151 | ], 152 | "metadata": { 153 | "kernelspec": { 154 | "display_name": "Python 3", 155 | "language": "python", 156 | "name": "python3" 157 | }, 158 | "language_info": { 159 | "codemirror_mode": { 160 | "name": "ipython", 161 | "version": 3 162 | }, 163 | "file_extension": ".py", 164 | "mimetype": "text/x-python", 165 | "name": "python", 166 | "nbconvert_exporter": "python", 167 | "pygments_lexer": "ipython3", 168 | "version": "3.6.2" 169 | } 170 | }, 171 | "nbformat": 4, 172 | "nbformat_minor": 2 173 | } 174 | -------------------------------------------------------------------------------- /exercises/ex_03.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Functions\n", 8 | "* Write and test a 
function which takes the length of the base and the height of a triangle and returns its area.\n", 9 | "* Write a function with two arguments, the number of edges and the edge length, which returns the perimeter and area of a regular polygon." 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "collapsed": true 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "\n", 21 | "\n", 22 | "\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "* Write and test a function which returns a lambda function according to the chosen unit (\"F\" or \"C\"). The lambda should calculate how much energy is needed to raise the temperature of one kilogram of water by one degree (specific heat of water equals 4189.9 J/(kg\\*K)). Remember to handle the cases where the argument is wrong (return False or a default value)." 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "\n", 41 | "\n", 42 | "\n" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "* Write a function which is an English-Polish translator of lists. The function should have an optional argument \"inplace = False\". If inplace is True, the function should translate words inside the list passed as an argument. If it is not True, a new list should be returned. Think about a proper solution for words not in the dictionary."
50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "enPl = {'have':'mieć', 'which':'który', 'make':'robić', 'can':'potrafić', 'know':'wiedzieć', 'take':'brać', 'people':'ludzie', 'year':'rok', 'good':'dobry', 'bad':'zły', 'look':'patrzeć' }\n", 61 | "toTranslate = ['good', 'look', 'make', 'can', 'year', 'becasue', 'mastermidn', 'have']\n", 62 | "\n", 63 | "\n" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "Profile your translator function using both %timeit and %prun." 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "* Write a FizzBuzz function. This function should print consecutive numbers from 1 to 50. If the number is divisible by 3 it should print \"Fizz\" instead of the number, if it is divisible by 5, \"Buzz\" should be printed." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": { 84 | "collapsed": true 85 | }, 86 | "outputs": [], 87 | "source": [] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "* Modify the function above so that it would take arguments \"k\" and \"l\" instead of numbers 3 and 5." 
94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": { 100 | "collapsed": true 101 | }, 102 | "outputs": [], 103 | "source": [] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": { 108 | "collapsed": true 109 | }, 110 | "source": [ 111 | "# Objects\n", 112 | "* Create a \"person\" object with the following attributes: name, surname, age, gender, height, email.\n", 113 | "* Creating an object should require name, surname and email; other attributes are optional.\n", 114 | "* The \"person\" objects should have methods allowing for:\n", 115 | " * Setting a value for gender, height and age.\n", 116 | " * Calculating BMI if the required values are filled in; if not - print a message which tells the user to input these values." 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "metadata": { 123 | "collapsed": true 124 | }, 125 | "outputs": [], 126 | "source": [] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.6.2" 146 | } 147 | }, 148 | "nbformat": 4, 149 | "nbformat_minor": 2 150 | } 151 | -------------------------------------------------------------------------------- /exercises/ex_04.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true 8 | }, 9 | "source": [ 10 | "# Numpy - exercises\n", 11 | "## Creating objects\n", 12 | "Create the following objects:\n", 13 | "* 10-element, 0-dimensional vector of zeros.\n", 14 | "* 8-element, 
1-dimensional vector of ones.\n", 15 | "* 12-element matrix filled with numbers from 2 to 14.\n", 16 | "* Just as above, but make sure that the numbers are integers." 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": true, 24 | "deletable": true, 25 | "editable": true 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "\n", 30 | "\n" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": { 36 | "deletable": true, 37 | "editable": true 38 | }, 39 | "source": [ 40 | "Create the following objects:\n", 41 | "* Matrix of dimensions 4,5 filled with random numbers from distribution U~[0,1)\n", 42 | "* Matrix of dimensions 3,6 filled with random numbers from distribution N~(0,1)\n", 43 | "* Matrix of dimensions 2,5,6 filled with random numbers from beta distribution (0.5, 0.5)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": { 50 | "collapsed": true, 51 | "deletable": true, 52 | "editable": true 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "\n", 57 | "\n" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": { 63 | "deletable": true, 64 | "editable": true 65 | }, 66 | "source": [ 67 | "Transform the last matrix from the previous cell:\n", 68 | "* Change its dimension to (x, 6) in such a way that numpy infers x.\n", 69 | "* Split it into two halves vertically.\n", 70 | "* Split one of the halves horizontally, so you get two quarters (A and B)\n", 71 | "* Transform A to a one- and two-dimensional vector in two ways (in both cases): without copying and with copying.\n", 72 | "* Transform A and B to shape (x, 2) and concatenate them vertically and horizontally."
73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": { 79 | "collapsed": true, 80 | "deletable": true, 81 | "editable": true 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": { 92 | "deletable": true, 93 | "editable": true 94 | }, 95 | "source": [ 96 | "Sort the last matrix in four different ways (by rows/by columns, ascending/descending)." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": { 103 | "collapsed": true, 104 | "deletable": true, 105 | "editable": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "\n", 110 | "\n" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": { 116 | "deletable": true, 117 | "editable": true 118 | }, 119 | "source": [ 120 | "Implement the two-dimensional Rosenbrock function (https://en.wikipedia.org/wiki/Rosenbrock_function) in two ways: using np.apply_along_axis and using a for loop.\n", 121 | "Compare how fast the two implementations are."
122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "collapsed": true, 129 | "deletable": true, 130 | "editable": true 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "import numpy as np\n", 135 | "x = np.random.normal(size=(10000, 2))\n" 136 | ] 137 | } 138 | ], 139 | "metadata": { 140 | "kernelspec": { 141 | "display_name": "Python 3", 142 | "language": "python", 143 | "name": "python3" 144 | }, 145 | "language_info": { 146 | "codemirror_mode": { 147 | "name": "ipython", 148 | "version": 3 149 | }, 150 | "file_extension": ".py", 151 | "mimetype": "text/x-python", 152 | "name": "python", 153 | "nbconvert_exporter": "python", 154 | "pygments_lexer": "ipython3", 155 | "version": "3.6.2" 156 | } 157 | }, 158 | "nbformat": 4, 159 | "nbformat_minor": 2 160 | } 161 | -------------------------------------------------------------------------------- /exercises/ex_05.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true 8 | }, 9 | "source": [ 10 | "# Pandas - exercises\n", 11 | "Using the marriage dataset solve the following exercises:" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": { 18 | "collapsed": true, 19 | "deletable": true, 20 | "editable": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "import numpy as np\n", 25 | "import pandas as pd\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "deletable": true, 32 | "editable": true 33 | }, 34 | "source": [ 35 | "Show the first three and the last three rows using display function (remember to import it)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": { 42 | "collapsed": true, 43 | "deletable": true, 44 | "editable": true 45 | }, 46 | "outputs": [], 47 | "source": [] 48 | }, 49 | { 50 | "cell_type":
"markdown", 51 | "metadata": { 52 | "deletable": true, 53 | "editable": true 54 | }, 55 | "source": [ 56 | "Create a new dataset which contains all the original dataset's columns except occupation." 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": { 63 | "collapsed": true, 64 | "deletable": true, 65 | "editable": true 66 | }, 67 | "outputs": [], 68 | "source": [] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "deletable": true, 74 | "editable": true 75 | }, 76 | "source": [ 77 | "Check how many unique values the children variable has. Display summary statistics about this variable." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": { 84 | "collapsed": true, 85 | "deletable": true, 86 | "editable": true 87 | }, 88 | "outputs": [], 89 | "source": [] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": { 94 | "deletable": true, 95 | "editable": true 96 | }, 97 | "source": [ 98 | "Create a variable containing natural logarithm of age." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": { 105 | "collapsed": true, 106 | "deletable": true, 107 | "editable": true 108 | }, 109 | "outputs": [], 110 | "source": [] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "deletable": true, 116 | "editable": true 117 | }, 118 | "source": [ 119 | "Create a discrete variable for education which denotes education level (instead of years of education) in ascending order." 
120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": { 126 | "collapsed": true, 127 | "deletable": true, 128 | "editable": true 129 | }, 130 | "outputs": [], 131 | "source": [] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": { 136 | "deletable": true, 137 | "editable": true 138 | }, 139 | "source": [ 140 | "Using numpy draw a random income from log-normal distribution for each observation. Use such parameters that income looks credible taking current market conditions into account and that expected value for income increases with years of education." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": { 147 | "collapsed": true, 148 | "deletable": true, 149 | "editable": true 150 | }, 151 | "outputs": [], 152 | "source": [] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "deletable": true, 158 | "editable": true 159 | }, 160 | "source": [ 161 | "Select and display rows which fulfill the following criteria:\n", 162 | "* People who have at least two children\n", 163 | "* People who have at least two children and are over 30 years old\n", 164 | "* People who have at least two children and are over 30 years old, but not over 40 years old." 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": { 171 | "collapsed": true, 172 | "deletable": true, 173 | "editable": true 174 | }, 175 | "outputs": [], 176 | "source": [] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "deletable": true, 182 | "editable": true 183 | }, 184 | "source": [ 185 | "* Add a row (fill all the values manually) for a person with seven children.\n", 186 | "* Change all the values in such a way that all people having three children would have four children." 
187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": { 193 | "collapsed": true, 194 | "deletable": true, 195 | "editable": true 196 | }, 197 | "outputs": [], 198 | "source": [] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": { 203 | "deletable": true, 204 | "editable": true 205 | }, 206 | "source": [ 207 | "For people who have: no children, 1 child, 2 children and so on (make sure you display all values) select and display first 5 rows of people who are over 40 years old. Use query and @ operator.\n", 208 | "\n", 209 | "Tip: You can iterate over marr.children.unique()" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": true, 217 | "deletable": true, 218 | "editable": true 219 | }, 220 | "outputs": [], 221 | "source": [] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "deletable": true, 227 | "editable": true 228 | }, 229 | "source": [ 230 | "* Group observations by variable \"rate\" and calculate income's mean and standard deviation for each of the groups.\n", 231 | "* Get rid of unnecessary MultiIndex." 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": { 238 | "collapsed": true, 239 | "deletable": true, 240 | "editable": true 241 | }, 242 | "outputs": [], 243 | "source": [] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": { 248 | "deletable": true, 249 | "editable": true 250 | }, 251 | "source": [ 252 | "For every row assign the mean income of people who have the same values of rate and level of education (defined earlier)." 
253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": { 259 | "collapsed": true, 260 | "deletable": true, 261 | "editable": true 262 | }, 263 | "outputs": [], 264 | "source": [] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": { 269 | "deletable": true, 270 | "editable": true 271 | }, 272 | "source": [ 273 | "For every row calculate the difference between income and mean income for a group of people who have the same values of rate and level of education (defined earlier). Do it in one statement (one line, no semicolon) not using the variable created earlier." 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": { 280 | "collapsed": true, 281 | "deletable": true, 282 | "editable": true 283 | }, 284 | "outputs": [], 285 | "source": [] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "deletable": true, 291 | "editable": true 292 | }, 293 | "source": [ 294 | "For every row assign a variable containing the word \"yes\" if a person has more than one child and an income below average, and the word \"no\" otherwise." 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": { 301 | "collapsed": true, 302 | "deletable": true, 303 | "editable": true 304 | }, 305 | "outputs": [], 306 | "source": [] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": { 311 | "deletable": true, 312 | "editable": true 313 | }, 314 | "source": [ 315 | "Group people by education level and describe the group using .describe()."
316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "collapsed": true, 323 | "deletable": true, 324 | "editable": true 325 | }, 326 | "outputs": [], 327 | "source": [] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "collapsed": true, 334 | "deletable": true, 335 | "editable": true 336 | }, 337 | "outputs": [], 338 | "source": [] 339 | } 340 | ], 341 | "metadata": { 342 | "kernelspec": { 343 | "display_name": "Python 3", 344 | "language": "python", 345 | "name": "python3" 346 | }, 347 | "language_info": { 348 | "codemirror_mode": { 349 | "name": "ipython", 350 | "version": 3 351 | }, 352 | "file_extension": ".py", 353 | "mimetype": "text/x-python", 354 | "name": "python", 355 | "nbconvert_exporter": "python", 356 | "pygments_lexer": "ipython3", 357 | "version": "3.6.2" 358 | } 359 | }, 360 | "nbformat": 4, 361 | "nbformat_minor": 2 362 | } 363 | -------------------------------------------------------------------------------- /exercises/ex_06.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# SQL + Python integration task\n", 8 | "\n", 9 | "1 Create a new database taskDB.db using SQLiteStudio\n", 10 | "\n", 11 | "2 Write Python code that creates the table employee. Define the following columns: id_employee int, name text, surname text, salary text, year_of_birth int, month_of_birth int, day_of_birth int, pesel text. Name and surname values cannot be NULL. Column id_employee is the primary key with the autoincrement option. 
The default value of pesel is NULL.\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": { 18 | "collapsed": true 19 | }, 20 | "outputs": [], 21 | "source": [] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "3 Create a list containing random data about 10 employees and insert this data into the employee table. Print the number of affected rows." 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": { 34 | "collapsed": true 35 | }, 36 | "outputs": [], 37 | "source": [] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "4 Add Python code that inserts data about a single employee into the table employee." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "outputs": [], 53 | "source": [] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "5 Use the SQL select command to read the id, name, surname and salary of all employees, ordered by salary. Save the results in a list\n", 60 | "and print it." 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "collapsed": true 68 | }, 69 | "outputs": [], 70 | "source": [] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "6 Write a custom function generatePesel, which can be used in the SQL insert procedure. Function generatePesel returns pesel as a string value. 
The format of pesel is as follows: year_month_day_A_B, where A and B are random values, and year, month and day are taken from year_of_birth, month_of_birth and day_of_birth." 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": { 83 | "collapsed": true 84 | }, 85 | "outputs": [], 86 | "source": [] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "7 Write an SQL command which returns info about employees whose salary is greater than 3000. Write Python code which operates on row objects in the given table. Print basic statistics: column names and number of columns." 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "collapsed": true 100 | }, 101 | "outputs": [], 102 | "source": [] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "8 Add exception handling to your code." 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "collapsed": true 116 | }, 117 | "outputs": [], 118 | "source": [] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "9 Read data about all employees into the list allRecords. Double the salary of each employee and update the table employee with the new data. 
" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [] 135 | } 136 | ], 137 | "metadata": { 138 | "kernelspec": { 139 | "display_name": "Python 3", 140 | "language": "python", 141 | "name": "python3" 142 | }, 143 | "language_info": { 144 | "codemirror_mode": { 145 | "name": "ipython", 146 | "version": 3 147 | }, 148 | "file_extension": ".py", 149 | "mimetype": "text/x-python", 150 | "name": "python", 151 | "nbconvert_exporter": "python", 152 | "pygments_lexer": "ipython3", 153 | "version": "3.6.2" 154 | } 155 | }, 156 | "nbformat": 4, 157 | "nbformat_minor": 2 158 | } 159 | -------------------------------------------------------------------------------- /exercises/ex_07.ipynb: -------------------------------------------------------------------------------- 1 | {"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.10.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":7090817,"sourceType":"datasetVersion","datasetId":4086088}],"dockerImageVersionId":30587,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Visualizations - Pandas\n## Pandas and matplotlib\nYou should now be able to control matplotlib charts quite well. 
Now see how to combine pandas with matplotlib objects.","metadata":{}},{"cell_type":"code","source":"%matplotlib inline\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\nfrom IPython.display import display, HTML","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:04:12.706706Z","start_time":"2021-12-08T11:04:12.431539Z"},"execution":{"iopub.status.busy":"2023-12-07T14:13:13.527883Z","iopub.execute_input":"2023-12-07T14:13:13.528441Z","iopub.status.idle":"2023-12-07T14:13:13.535706Z","shell.execute_reply.started":"2023-12-07T14:13:13.528410Z","shell.execute_reply":"2023-12-07T14:13:13.534266Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"### Milan weather data\nLet us begin by importing the weather dataset.\n\nA weather station from the Milano data sets:\n\nhttps://dandelion.eu/datagems/SpazioDati/milano-weather-station-data/resource/\n* 6045,Milano - via Filippo Juvara,45.473622,9.220392,Wind Direction,degree\n* 5908,Milano - via Filippo Juvara,45.473622,9.220392,Precipitation,mm\n* 6502,Milano - via Filippo Juvara,45.473622,9.220392,Atmospheric Pressure,hPa\n* 6457,Milano - via Filippo Juvara,45.473622,9.220392,Net Radiation,W/m^2\n* 5909,Milano - via Filippo Juvara,45.473622,9.220392,Temperature,Celsius degree\n* 6179,Milano - via Filippo Juvara,45.473622,9.220392,Relative Humidity,%\n* 6129,Milano - via Filippo Juvara,45.473622,9.220392,Wind Speed,m/s","metadata":{}},{"cell_type":"code","source":"sets = [\n (\"6045\", \"windDirection\"),\n (\"5908\", \"precipitation\"),\n (\"6502\", \"pressure\"),\n (\"6457\", \"radiation\"),\n (\"5909\", \"temp\"),\n (\"6179\", \"humidity\"),\n (\"6129\", \"windSpeed\"),\n]\n\nfor i, oneSet in enumerate(sets):\n df = pd.read_csv(\"/kaggle/input/intro-visual-data/datasets/Milano_WeatherPhenomena/mi_meteo_\"+oneSet[0]+\".csv\", names=[\"code\", \"date\", oneSet[1]])\n print(df.shape)\n# df.set_index(\"date\", inplace=True)\n df.drop(\"code\", axis=1, 
inplace=True)\n 
if i == 0:\n milano = df\n else:\n # pandas sees there is only one common column to perform merge on (date)\n milano = milano.merge(df)\n# display(df.head())","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:04:35.659087Z","start_time":"2021-12-08T11:04:35.629964Z"},"execution":{"iopub.status.busy":"2023-12-07T14:13:16.196467Z","iopub.execute_input":"2023-12-07T14:13:16.196841Z","iopub.status.idle":"2023-12-07T14:13:16.318025Z","shell.execute_reply.started":"2023-12-07T14:13:16.196804Z","shell.execute_reply":"2023-12-07T14:13:16.317069Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"milano.date = pd.to_datetime(milano.date)\ndisplay(milano.head())\nprint(milano.isnull().sum())\nprint(milano.dtypes)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:05:08.355418Z","start_time":"2021-12-08T11:05:08.342000Z"},"execution":{"iopub.status.busy":"2023-12-07T14:14:22.004474Z","iopub.execute_input":"2023-12-07T14:14:22.004854Z","iopub.status.idle":"2023-12-07T14:14:22.034968Z","shell.execute_reply.started":"2023-12-07T14:14:22.004824Z","shell.execute_reply":"2023-12-07T14:14:22.033524Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Create columns which tell us something useful about the date.","metadata":{}},{"cell_type":"code","source":"milano[\"month\"] = milano.date.dt.month\nmilano[\"weekday\"] = milano.date.dt.weekday\nmilano[\"hour\"] = milano.date.dt.hour","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:05:17.081434Z","start_time":"2021-12-08T11:05:17.076880Z"},"execution":{"iopub.status.busy":"2023-12-07T14:15:32.898531Z","iopub.execute_input":"2023-12-07T14:15:32.898890Z","iopub.status.idle":"2023-12-07T14:15:32.910025Z","shell.execute_reply.started":"2023-12-07T14:15:32.898864Z","shell.execute_reply":"2023-12-07T14:15:32.908335Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"See how hourly temperatures change from November to 
December.","metadata":{}},{"cell_type":"code","source":"monthDay = milano.groupby([\"month\", \"hour\"]).agg(\"mean\")\nmonthDay.plot()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:05:24.996924Z","start_time":"2021-12-08T11:05:24.991638Z"},"execution":{"iopub.status.busy":"2023-12-07T14:15:46.028348Z","iopub.execute_input":"2023-12-07T14:15:46.028671Z","iopub.status.idle":"2023-12-07T14:15:46.476465Z","shell.execute_reply.started":"2023-12-07T14:15:46.028643Z","shell.execute_reply":"2023-12-07T14:15:46.475179Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"monthDay","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:06:10.376225Z","start_time":"2021-12-08T11:06:10.358061Z"},"scrolled":true,"execution":{"iopub.status.busy":"2023-12-07T14:16:04.598610Z","iopub.execute_input":"2023-12-07T14:16:04.598993Z","iopub.status.idle":"2023-12-07T14:16:04.625753Z","shell.execute_reply.started":"2023-12-07T14:16:04.598966Z","shell.execute_reply":"2023-12-07T14:16:04.624216Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"fig, ax = plt.subplots(1, 1, figsize=(8,4))\n\nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"temp\"].loc[month].plot(ax=ax, label=month)\nax.set_title(\"Temperature\")\nax.legend()\nplt.show()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:05:28.134041Z","start_time":"2021-12-08T11:05:28.006305Z"},"execution":{"iopub.status.busy":"2023-12-07T14:16:44.161317Z","iopub.execute_input":"2023-12-07T14:16:44.161674Z","iopub.status.idle":"2023-12-07T14:16:44.399433Z","shell.execute_reply.started":"2023-12-07T14:16:44.161648Z","shell.execute_reply":"2023-12-07T14:16:44.398515Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"It seems that December is considerably colder. How do wind speed and humidity change? 
Assume that these indicators are less important and you want them to take less space on the chart.","metadata":{}},{"cell_type":"code","source":"fig, ax = plt.subplots(3, 1, figsize=(8,8), gridspec_kw={'height_ratios':[3, 2, 2]})\n\n\nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"temp\"].loc[month].plot(ax=ax[0], label=month)\nax[0].legend()\nax[0].set_title(\"Temperature\")\n \nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"humidity\"].loc[month].plot(ax=ax[1], label=month)\nax[1].legend()\nax[1].set_title(\"Humidity\")\n\nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"windSpeed\"].loc[month].plot(ax=ax[2], label=month)\nax[2].legend()\nax[2].set_title(\"Wind speed\")\nplt.tight_layout() \nplt.show()\n ","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:09:00.436443Z","start_time":"2021-12-08T11:09:00.151306Z"},"execution":{"iopub.status.busy":"2023-12-07T14:19:14.281512Z","iopub.execute_input":"2023-12-07T14:19:14.281867Z","iopub.status.idle":"2023-12-07T14:19:14.909588Z","shell.execute_reply.started":"2023-12-07T14:19:14.281841Z","shell.execute_reply":"2023-12-07T14:19:14.908083Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"You may also want to compare humidity and temperature on one chart.","metadata":{}},{"cell_type":"code","source":"fig, ax = plt.subplots(1, 1, figsize=(10,5))\nax1 = ax.twinx()\n\nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"temp\"].loc[month].plot(ax=ax, label=month)\nax.legend()\n# ax[0].set_title(\"Temperature\")\n \nfor month in monthDay.index.get_level_values(\"month\").unique():\n monthDay[\"humidity\"].loc[month].plot(ax=ax1, label=str(str(month)+str(\"- Humidity\")), style=\"--\")\nax1.legend(loc=2)\nfig.tight_layout()\n\nax.set_ylabel('Temperature [C]')\nax1.set_ylabel('Humidity [%]')\nax1.set_xlabel('Hour')\n\n# ax[1].set_title(\"Humidity\")\n\n# t = 
np.linspace(0., 10., 100)\n# ax1.plot(t, t ** 2, 'b-')\n# ax2.plot(t, 1000 / (t + 1), 'r-')\n# ax1.set_ylabel('Density (cgs)', color='red')\n# ax1.set_ylabel('Temperature (K)', color='blue')\n# ax1.set_xlabel('Time (s)')","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:10:08.388807Z","start_time":"2021-12-08T11:10:08.224116Z"},"execution":{"iopub.status.busy":"2023-12-07T14:20:00.080039Z","iopub.execute_input":"2023-12-07T14:20:00.080406Z","iopub.status.idle":"2023-12-07T14:20:00.550303Z","shell.execute_reply.started":"2023-12-07T14:20:00.080376Z","shell.execute_reply":"2023-12-07T14:20:00.549397Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"","metadata":{"collapsed":true,"jupyter":{"outputs_hidden":true}},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"### Airport and air routes data\nNow you will use a well-known and interesting dataset about airports and air routes.\n* https://openflights.org/data.html\n\nAs usual, let us begin with reading the dataset, checking its shape and column types, which we improve if necessary. 
We will also get rid of unnecessary columns.","metadata":{}},{"cell_type":"code","source":"cols = ['airportID', 'name', 'city', 'country', 'IATA', 'ICAO', 'lat', 'lon', 'altitude', 'timezone', 'DST', 'tz', 'type', 'source']\nairports = pd.read_csv(\"/kaggle/input/intro-visual-data/datasets/air/airports.bin\",sep=',',names=cols, dtype={'airportID':object})\ncols = ['airportID', 'name', 'city', 'country', 'IATA', 'ICAO', 'lat', 'lon', 'altitude', 'timezone', 'DST', 'tz']\nairports = airports[cols]\n\ncols = ['airline', 'airlineID', 'sourceAirport', 'sourceAirportID', 'destAirport', 'destAirportID', 'codeshare', 'stops', 'equipment']\nroutes = pd.read_csv(\"/kaggle/input/intro-visual-data/datasets/air/routes.bin\",sep=',',names=cols)\ncols = ['airline', 'airlineID', 'sourceAirport', 'sourceAirportID', 'destAirport', 'destAirportID', 'stops', 'equipment']\nroutes = routes[cols]","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:13:08.671567Z","start_time":"2021-12-08T11:13:08.602118Z"},"execution":{"iopub.status.busy":"2023-12-07T14:25:16.440201Z","iopub.execute_input":"2023-12-07T14:25:16.440542Z","iopub.status.idle":"2023-12-07T14:25:16.595424Z","shell.execute_reply.started":"2023-12-07T14:25:16.440513Z","shell.execute_reply":"2023-12-07T14:25:16.594115Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"print(\"Airports\", airports.shape, \"Routes\", 
routes.shape)\ndisplay(airports.head())\ndisplay(routes.head())\nprint(airports.dtypes)\nprint(routes.dtypes)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:13:09.067761Z","start_time":"2021-12-08T11:13:09.048698Z"},"execution":{"iopub.status.busy":"2023-12-07T14:25:23.126889Z","iopub.execute_input":"2023-12-07T14:25:23.127279Z","iopub.status.idle":"2023-12-07T14:25:23.156321Z","shell.execute_reply.started":"2023-12-07T14:25:23.127247Z","shell.execute_reply":"2023-12-07T14:25:23.155152Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Just in case, check before merging how many routes are without a corresponding airport. For 67000 routes, 850 unknown airports is not that bad.","metadata":{}},{"cell_type":"code","source":"print((~routes.sourceAirportID.isin(airports.airportID)).sum())\nprint((~routes.sourceAirport.isin(airports.IATA)).sum())\nprint((~routes.destAirportID.isin(airports.airportID)).sum())\nprint((~routes.destAirport.isin(airports.IATA)).sum())","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:13:09.926839Z","start_time":"2021-12-08T11:13:09.912516Z"},"execution":{"iopub.status.busy":"2023-12-07T14:28:09.507702Z","iopub.execute_input":"2023-12-07T14:28:09.508072Z","iopub.status.idle":"2023-12-07T14:28:09.536165Z","shell.execute_reply.started":"2023-12-07T14:28:09.508043Z","shell.execute_reply":"2023-12-07T14:28:09.534703Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now, merge on proper columns. Choose an \"inner\" option. In this case we do not care about routes with unidentified airport. Do a double merge, so that you know latitude and longitude of both departure and arrival.\n* Why do we choose \"inner\" instead of \"right\"? What would be the shape of routAir dataframe if you choose \"right\"? 
Would it make sense?","metadata":{}},{"cell_type":"code","source":"routAir = routes.merge(airports, left_on=\"sourceAirportID\", right_on=\"airportID\", how=\"inner\" )\nroutAir = routAir.merge(airports, left_on=\"destAirportID\", right_on=\"airportID\", how=\"left\", suffixes=[\"\", \"_dest\"])\nprint(routAir.shape)\ndisplay(routAir.head())","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:13:10.742922Z","start_time":"2021-12-08T11:13:10.672211Z"},"execution":{"iopub.status.busy":"2023-12-07T14:29:12.129593Z","iopub.execute_input":"2023-12-07T14:29:12.130011Z","iopub.status.idle":"2023-12-07T14:29:12.239259Z","shell.execute_reply.started":"2023-12-07T14:29:12.129979Z","shell.execute_reply":"2023-12-07T14:29:12.237994Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"You may want to know the distance of the routes. They are not available directly in the dataset, but fortunately we have geographical coordinates of both airports (disregard stopovers).\ngeopy library will be useful in this case.","metadata":{}},{"cell_type":"code","source":"from geopy.distance import distance","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:15:24.673436Z","start_time":"2021-12-08T11:15:24.671371Z"},"execution":{"iopub.status.busy":"2023-12-07T14:33:49.822536Z","iopub.execute_input":"2023-12-07T14:33:49.822894Z","iopub.status.idle":"2023-12-07T14:33:49.827749Z","shell.execute_reply.started":"2023-12-07T14:33:49.822867Z","shell.execute_reply":"2023-12-07T14:33:49.826213Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"#!pip install 
geopy","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:15:26.276903Z","start_time":"2021-12-08T11:15:24.863311Z"},"execution":{"iopub.status.busy":"2023-12-07T14:33:18.599786Z","iopub.execute_input":"2023-12-07T14:33:18.600553Z","iopub.status.idle":"2023-12-07T14:33:22.121275Z","shell.execute_reply.started":"2023-12-07T14:33:18.600522Z","shell.execute_reply":"2023-12-07T14:33:22.120132Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"distances = []\nfor k,l,x,y in zip(routAir.lat, routAir.lon, routAir.lat_dest, routAir.lon_dest):\n try:\n distances.append(distance((x,y), (k,l)).meters/1000)\n except:\n distances.append(np.nan)\nroutAir[\"distance\"] = distances","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:15:36.105136Z","start_time":"2021-12-08T11:15:26.278946Z"},"execution":{"iopub.status.busy":"2023-12-07T14:33:51.509097Z","iopub.execute_input":"2023-12-07T14:33:51.509501Z","iopub.status.idle":"2023-12-07T14:34:07.456261Z","shell.execute_reply.started":"2023-12-07T14:33:51.509470Z","shell.execute_reply":"2023-12-07T14:34:07.454802Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Choose only flights departing from European airports.","metadata":{}},{"cell_type":"code","source":"euro = routAir.loc[routAir.tz.str.contains(\"Europe\")]\neuro.shape","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:06.167455Z","start_time":"2021-12-08T11:16:06.119965Z"},"execution":{"iopub.status.busy":"2023-12-07T14:34:20.027636Z","iopub.execute_input":"2023-12-07T14:34:20.028025Z","iopub.status.idle":"2023-12-07T14:34:20.076255Z","shell.execute_reply.started":"2023-12-07T14:34:20.027992Z","shell.execute_reply":"2023-12-07T14:34:20.074554Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Count some interesting aggregate values.","metadata":{}},{"cell_type":"code","source":"euroAir = euro.groupby(\"airportID\").agg({\"airline\":\"count\", 
\"distance\":\"mean\"})","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:12.166381Z","start_time":"2021-12-08T11:16:12.159292Z"},"execution":{"iopub.status.busy":"2023-12-07T14:34:31.275321Z","iopub.execute_input":"2023-12-07T14:34:31.275660Z","iopub.status.idle":"2023-12-07T14:34:31.287297Z","shell.execute_reply.started":"2023-12-07T14:34:31.275635Z","shell.execute_reply":"2023-12-07T14:34:31.286308Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"euroAir[\"size\"] = np.sqrt(euroAir.airline)\neuroAir[\"sqrDistance\"] = np.sqrt(euroAir.distance)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:14.318214Z","start_time":"2021-12-08T11:16:14.313780Z"},"execution":{"iopub.status.busy":"2023-12-07T14:34:42.516092Z","iopub.execute_input":"2023-12-07T14:34:42.516416Z","iopub.status.idle":"2023-12-07T14:34:42.523562Z","shell.execute_reply.started":"2023-12-07T14:34:42.516389Z","shell.execute_reply":"2023-12-07T14:34:42.521924Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"euroAir = euroAir.reset_index()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:28.341916Z","start_time":"2021-12-08T11:16:28.338787Z"},"execution":{"iopub.status.busy":"2023-12-07T14:34:51.103308Z","iopub.execute_input":"2023-12-07T14:34:51.103695Z","iopub.status.idle":"2023-12-07T14:34:51.112622Z","shell.execute_reply.started":"2023-12-07T14:34:51.103663Z","shell.execute_reply":"2023-12-07T14:34:51.111068Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"# there is only one common column, so Pandas guesses which one to use\neuroAir = 
euroAir.merge(airports)\n","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:30.466197Z","start_time":"2021-12-08T11:16:30.457488Z"},"execution":{"iopub.status.busy":"2023-12-07T14:35:11.361212Z","iopub.execute_input":"2023-12-07T14:35:11.361623Z","iopub.status.idle":"2023-12-07T14:35:11.381957Z","shell.execute_reply.started":"2023-12-07T14:35:11.361591Z","shell.execute_reply":"2023-12-07T14:35:11.380008Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now create a more interesting plot. Draw airports considering their geographical coordinates, the size of an airport (based on the number of departing routes) and the mean distance of flights.","metadata":{}},{"cell_type":"code","source":"import matplotlib.cm as cmaps\neuroAir.plot(\"lon\", \"lat\", kind=\"scatter\", figsize=(13,8), s=euroAir[\"distance\"]/25, alpha=0.7, c=euroAir[\"size\"], cmap=cmaps.viridis)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:16:33.127677Z","start_time":"2021-12-08T11:16:32.864233Z"},"execution":{"iopub.status.busy":"2023-12-07T14:36:43.550891Z","iopub.execute_input":"2023-12-07T14:36:43.551240Z","iopub.status.idle":"2023-12-07T14:36:44.053130Z","shell.execute_reply.started":"2023-12-07T14:36:43.551212Z","shell.execute_reply":"2023-12-07T14:36:44.051890Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"This chart may look pretty, but as long as coordinates are not exact and you do not see country borders, it is not that useful.","metadata":{}},{"cell_type":"code","source":"euroAir.plot(\"sqrDistance\", \"size\", kind=\"scatter\", 
figsize=(12,8))","metadata":{"execution":{"iopub.status.busy":"2023-12-07T14:39:55.385310Z","iopub.execute_input":"2023-12-07T14:39:55.385764Z","iopub.status.idle":"2023-12-07T14:39:55.645305Z","shell.execute_reply.started":"2023-12-07T14:39:55.385716Z","shell.execute_reply":"2023-12-07T14:39:55.643902Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"","metadata":{"collapsed":true,"jupyter":{"outputs_hidden":true}},"outputs":[],"execution_count":null}]} 2 | -------------------------------------------------------------------------------- /exercises/ex_09_simple_web_app.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Building a Simple Web Application with Bottle\n", 8 | "\n", 9 | "Objective: Create a basic web application using the Bottle framework in Python. This application will display a welcome message and the current time on a web page.\n", 10 | "\n", 11 | "Install Bottle: If Bottle is not installed, use pip to install it" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "pip install bottle ## in the console/terminal/bash" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "**Import Bottle**\n", 28 | "\n", 29 | "Write a Python script and start by importing the Bottle class from the bottle module.\n", 30 | "\n", 31 | "**Create an Instance of Bottle**\n", 32 | "\n", 33 | "Create an instance of the Bottle class. This instance represents your web application.\n", 34 | "\n", 35 | "**Define Routes and Views**\n", 36 | "\n", 37 | "Define a route for the root URL ('/'). A route is a URL pattern that is used to map a function to a URL.\n", 38 | "\n", 39 | "Create a view function that will be executed when the root URL is accessed. 
This function should return a string that includes a welcome message and the current time.\n", 40 | "\n", 41 | "**Run the Application**\n", 42 | "\n", 43 | "Use the run method of the Bottle instance to run your application. Set parameters like host and port as needed.\n", 44 | "\n", 45 | "**Test the Application**\n", 46 | "\n", 47 | "After running the script, open a web browser and go to http://localhost:8080/ (or the respective host and port you set) to see the application in action." 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "\n", 62 | "**Modify the welcome message to something of your choice.**\n", 63 | "\n", 64 | "**Experiment by adding more routes and corresponding view functions. For example, create a new route like '/about' that displays information about the web application or its developer.**\n" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "**Create a page with a form where users can enter their name. When the form is submitted, display a personalized greeting on a new page.**\n", 79 | "\n", 80 | "Create a new route (e.g., /greet) that renders an HTML form with a text input for the user's name and a submit button.\n", 81 | "\n", 82 | "Create a second route (e.g., /greet_user) to handle the POST request from the form. Extract the user's name from the form data and display a personalized greeting." 
83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "**Modify the application to use Bottle's template engine for rendering HTML content.**\n", 97 | "\n", 98 | "Create a template file for the greeting form and another for the personalized greeting.\n", 99 | "\n", 100 | "Use the template function from Bottle to render these templates in the respective routes." 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "**Serve static files such as CSS or images to enhance the appearance of the application.**\n", 115 | "\n", 116 | "Create a directory named static in your project.\n", 117 | "\n", 118 | "Add a CSS file and/or image files to this directory.\n", 119 | "\n", 120 | "Create a route in your Bottle app to serve files from the static directory." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "**Additional (bonus) tasks**\n", 135 | "\n", 136 | "Implement error handling for routes (e.g., a 404 page).\n", 137 | "\n", 138 | "Experiment with more advanced features of Bottle, such as cookies, file uploads, or database integration." 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "## Streamlit\n", 153 | "\n", 154 | "See if you can build a similar app using Streamlit. What are the differences?" 
155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [] 163 | } 164 | ], 165 | "metadata": { 166 | "language_info": { 167 | "name": "python" 168 | } 169 | }, 170 | "nbformat": 4, 171 | "nbformat_minor": 2 172 | } 173 | -------------------------------------------------------------------------------- /exercises/ex_10.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Web scraping - exercises\n", 8 | "## Wikipedia\n", 9 | "Go to the Wikipedia page for the Basketball category.\n", 10 | "* https://en.wikipedia.org/wiki/Category:Basketball\n", 11 | "* Get the names of all articles in this category using requests and BeautifulSoup.\n", 12 | "* Get the contents of all these articles using requests and BeautifulSoup.\n", 13 | "* Get the contents of all these articles using the Wikipedia API (https://www.mediawiki.org/wiki/API:Query)" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": { 20 | "collapsed": true 21 | }, 22 | "outputs": [], 23 | "source": [] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "Using Selenium, perform the following steps.\n", 30 | "* Visit the front page of reddit.com.\n", 31 | "* Get all the target links and links to comment sections for all links in the first three pages of the main page (paginate using a Selenium action).\n", 32 | "* Choose an article at random and get all of its comments. Make sure that you have retained the tree structure and information about authors and points." 
33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": { 39 | "collapsed": true 40 | }, 41 | "outputs": [], 42 | "source": [] 43 | } 44 | ], 45 | "metadata": { 46 | "kernelspec": { 47 | "display_name": "Python 3", 48 | "language": "python", 49 | "name": "python3" 50 | }, 51 | "language_info": { 52 | "codemirror_mode": { 53 | "name": "ipython", 54 | "version": 3 55 | }, 56 | "file_extension": ".py", 57 | "mimetype": "text/x-python", 58 | "name": "python", 59 | "nbconvert_exporter": "python", 60 | "pygments_lexer": "ipython3", 61 | "version": "3.6.3" 62 | } 63 | }, 64 | "nbformat": 4, 65 | "nbformat_minor": 2 66 | } 67 | -------------------------------------------------------------------------------- /exercises/ex_adv1_django.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Business Analytics dashboard with Django and SQL " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Setting Up the Project \n", 15 | "\n", 16 | "### Project initialization\n", 17 | "\n", 18 | "- Set up the Django project and the analytics_dashboard app.\n", 19 | "- Use a virtual environment and organize the project.\n", 20 | "- Optional: Code it using OOP.\n", 21 | "\n", 22 | "### Database models and raw SQL Queries\n", 23 | "\n", 24 | "- Create models to represent business data. For example, models for Sales, Customer, and Product. 
Instead of relying only on the Django ORM, integrate raw SQL queries.\n", 25 | "- Populate the database with some sample data (manually or via CSV import).\n", 26 | "- Allow file upload for data (CSV or Excel) to populate the models dynamically.\n", 27 | "- Write custom methods using connection.cursor() to run SQL queries for tasks like getting top-selling products, customer retention rates, or sales by region.\n", 28 | "- Query sales data to find the top 5 customers based on total purchases and display this on the dashboard.\n", 29 | "\n", 30 | "### Advanced SQL queries for business metrics\n", 31 | " - Write a SQL query to calculate metrics like Average Order Value (AOV) or Customer Lifetime Value (CLV) and integrate these metrics into the dashboard.\n", 32 | " - Write SQL queries to filter data by date range, products, or customer types.\n", 33 | "\n", 34 | "### Views and templates\n", 35 | "\n", 36 | "- Create views that use these SQL queries to display summary statistics, such as total sales, average customer spending, or product performance, on the frontend.\n", 37 | "- Build templates with dynamic content based on SQL query results, using charts for visualization (you could use a library like chart.js or plotly).\n", 38 | "- Allow users to filter results by date range, product category, or customer demographics.\n", 39 | "\n", 40 | "### User authentication and permissions\n", 41 | "\n", 42 | "- Set up user authentication and role-based access control (e.g., admin and standard user).\n", 43 | "- Only authorized users can access certain parts of the dashboard." 
44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## Advanced SQL and Django integration, data visualization\n", 51 | "\n", 52 | "### SQL optimization and query performance\n", 53 | "\n", 54 | "- Optimize SQL queries by adding indexes and testing the impact on performance.\n", 55 | "- Use the EXPLAIN statement to analyze query efficiency and suggest optimizations (e.g., indexing frequently queried fields like product_id or sales_date).\n", 56 | "- Create a Django management command to generate a SQL report of daily sales performance and email it to admins.\n", 57 | "\n", 58 | "### Advanced data handling using SQL and Django ORM\n", 59 | "\n", 60 | "- Combine the ORM with raw SQL to manage complex queries.\n", 61 | "- Use the Django ORM for simple queries but fall back on raw SQL for optimized multi-table joins.\n", 62 | "- Write a custom SQL query for a product sales trend over time (e.g., using GROUP BY and date functions), and integrate this into a visualization.\n", 63 | "\n", 64 | "\n", 65 | "### Data visualization\n", 66 | "\n", 67 | "- Use SQL queries to feed the data into interactive visualizations (e.g., bar charts for product sales, line graphs for sales over time, pie charts for customer segmentation) using Chart.js, Plotly, or D3.js.\n", 68 | "- Display sales per month, customer segmentation, and other business metrics calculated through SQL.\n", 69 | "- Alternatively, try using Django REST Framework (DRF) to provide JSON data to the front-end for these visualizations.\n", 70 | "- Create an API endpoint (using Django REST Framework) that exposes key business insights (e.g., sales trends, top-performing products).\n", 71 | "- Add filtering and sorting parameters to the API to allow more flexible queries.\n", 72 | "\n", 73 | "### SQL security and best practices\n", 74 | "\n", 75 | "- Think about how to securely integrate SQL into Django, avoiding SQL injection by using parameterized queries (cursor.execute(query, params)).\n", 
76 | "- Test the security of SQL queries and improve performance by writing efficient join queries and using indexing strategies.\n", 77 | "\n", 78 | "### Advanced data handling\n", 79 | "\n", 80 | "- Include a feature to dynamically calculate and visualize KPIs such as customer lifetime value (CLV), retention rates, or sales growth.\n", 81 | "- Implement caching (e.g., using Redis) for expensive data calculations.\n", 82 | "\n", 83 | "### Testing and documentation\n", 84 | "\n", 85 | "- Write unit tests for the models and views, especially focusing on ensuring the integrity of the analytics results.\n", 86 | "- Create basic documentation for how to use the dashboard and its features, including API usage.\n", 87 | "\n", 88 | "\n", 89 | "\n" 90 | ] 91 | } 92 | ], 93 | "metadata": { 94 | "language_info": { 95 | "name": "python" 96 | } 97 | }, 98 | "nbformat": 4, 99 | "nbformat_minor": 2 100 | } 101 | -------------------------------------------------------------------------------- /exercises/ex_adv2_ML.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Build a predictive analytics model for business insights using ML", 8 | "\n", 9 | "- Optional: Code it using OOP.\n" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Data simulation and exploration\n", 17 | "\n", 18 | "### Data simulation\n", 19 | "\n", 20 | "- Create a simulated dataset representing a business scenario. 
For example, simulate a dataset with features such as customer_id, age, income, purchase_history, and customer_satisfaction.\n", 21 | "- Use numpy or pandas to generate this dataset.\n", 22 | "\n", 23 | "### Exploratory Data Analysis (EDA)\n", 24 | "\n", 25 | "- Perform EDA using pandas and visualizations (e.g., using Matplotlib or Seaborn) to understand the data distributions, correlations, and relationships between features.\n", 26 | "- Identify potential features for modeling and any data preprocessing required (e.g., scaling, handling missing values).\n", 27 | "\n", 28 | "### Data manipulation with SQL\n", 29 | "\n", 30 | "- Use SQLite or PostgreSQL to store the simulated data. Consider using NoSQL databases.\n", 31 | "- Use SQL queries to aggregate or filter the data, such as finding average customer satisfaction by age group or income bracket.\n", 32 | "\n", 33 | "## Model development and evaluation\n", 34 | "\n", 35 | "### Model selection and implementation\n", 36 | "\n", 37 | "- Linear Regression for predicting customer satisfaction based on features.\n", 38 | "- Decision Trees for classification based on customer segments.\n", 39 | "- Gradient Boosting for improved predictions.\n", 40 | "- Neural Networks as an alternative approach.\n", 41 | "- Use libraries like Scikit-learn and Keras to implement these models.\n", 42 | "\n", 43 | "### Model training\n", 44 | "\n", 45 | "- Split the dataset into training, validation and testing sets.\n", 46 | "- Train the selected models on the training set and evaluate performance using metrics such as Mean Absolute Error (MAE) or other metrics for regression and accuracy for classification.\n", 47 | "- Cross-validation - implement K-Fold cross-validation to better assess model stability and performance.\n", 48 | "- Feature engineering - create new features based on existing ones (e.g., a new feature representing the interaction between age and income).\n", 49 | "\n", 50 | "\n", 51 | "### Model evaluation and comparison\n", 52 | "\n", 
53 | "- Compare the performance of your models based on evaluation metrics and think about the trade-offs of each model.\n", 54 | "- Use hyperparameter tuning techniques (e.g., Grid Search) to optimize model performance. You can try out KerasTuner or look for other libraries.\n", 55 | "- Use ensemble methods such as Voting Classifier for classification tasks or Stacking to combine predictions from multiple models.\n", 56 | "\n", 57 | "## Advanced data visualization and model deployment\n", 58 | "\n", 59 | "- Use SHAP (SHapley Additive exPlanations) values to interpret model predictions; this should help you understand feature importance and model decisions.\n", 60 | "- Try to save and load models using joblib or pickle, simulating model deployment.\n", 61 | "- Save your best-performing model and provide a simple interface to make predictions on new simulated customer data.\n" 62 | ] 63 | } 64 | ], 65 | "metadata": { 66 | "language_info": { 67 | "name": "python" 68 | } 69 | }, 70 | "nbformat": 4, 71 | "nbformat_minor": 2 72 | } 73 | -------------------------------------------------------------------------------- /notebooks/02_Memory_flow_control.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Basics of programming in Python\n", 8 | "## Memory and memory addressing\n", 9 | "For an analyst, Python is an excellent programming language, partly because of its efficiency. Because we are efficiency-oriented, we have to understand at least the basics of memory management and addressing in programming, especially in Python. Understanding this issue matters not only for our programs' running time but, above all, for their correctness. 
This notebook presents slightly simplified and hopefully clear explanations.\n", 10 | "\n", 11 | "Effective memory management is crucial for two reasons: **memory is slow**, and in the age of large datasets **memory is valuable**. For these reasons we will avoid copying objects and rewriting them in a different place whenever possible. You may think that RAM is \"fast\", because it is much faster than HDD and even SSD. But it is much slower than contemporary CPUs (this is why CPU cache exists, link below). We will avoid copying, reading and writing for the sake of our programs' efficiency.\n", 12 | "\n", 13 | "Every object that is stored in memory, no matter its size, has its own address. This is true both for small (single int) and large (enormous dataset for analysis) objects. Objects' addresses are always \"small\", even if the object itself is very large. This is why it is much easier to pass information about an object's address than to pass the whole object, which would create a copy of it in memory.\n", 14 | "\n", 15 | "Look at a simple example:" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": null, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "a = 3\n", 25 | "b = a\n", 26 | "print(a, b)\n", 27 | "b = 4\n", 28 | "print(a, b)" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "As you can see, changing b does not affect a, so for a number the assignment operator \"=\" behaves as if it copied the object (strictly speaking, b = 4 binds b to a new object, because numbers are immutable). How does this operator behave for a list?" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "colors = [\"red\", \"blue\", \"green\"]\n", 45 | "colors2 = colors\n", 46 | "colors2.append(\"black\")\n", 47 | "print(colors)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "After creating the colors2 variable you could expect a copied object. 
It seems that appending \"black\" to colors2 should not have any effect on colors. However, for an object of type list the operator \"=\" copies the address (reference/alias) of the object. After the line:\n", 55 | "\n", 56 | "colors2 = colors\n", 57 | "\n", 58 | "both the colors and colors2 variables contain the address of the same list. You could think about it as writing the address of a building on two different sheets of paper. If you append \"black\" to the list at a given address (second sheet of paper - colors2), then when returning to the same address (read from the first sheet of paper - colors), you will see the one list (the same building) that exists in memory.\n", 59 | "\n", 60 | "Go back to the previous example and try to understand what happens there. This example shows clearly that you may cause errors that are difficult to debug if you write code without understanding references." 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "colors = [\"red\", \"blue\", \"green\"]\n", 70 | "numbers = [4, 5, 6]\n", 71 | "\n", 72 | "mixedList1 = colors\n", 73 | "mixedList1.append(numbers)\n", 74 | "print(mixedList1)\n", 75 | "\n", 76 | "mixedList2 = []\n", 77 | "mixedList2.append(colors)\n", 78 | "mixedList2.append(numbers)\n", 79 | "print(mixedList2)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "If you write *mixedList1 = colors* instead of *mixedList1 = list(colors)*, the variable mixedList1 does not hold the address of a copy, but only another reference to the old object. 
This is why, when you write:\n", 87 | "\n", 88 | "mixedList2.append(colors)\n", 89 | "\n", 90 | "the first element of the new list is the already mixed list (earlier, you modified the list stored at the address held by colors).\n", 91 | "\n", 92 | "See the corrected code, which shows two ways to create a new object (a copy):" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "colors = [\"red\", \"blue\", \"green\"]\n", 102 | "numbers = [4, 5, 6]\n", 103 | "\n", 104 | "mixedList1 = list(colors)\n", 105 | "# or\n", 106 | "mixedList1 = colors.copy()\n", 107 | "\n", 108 | "mixedList1.append(numbers)\n", 109 | "print(mixedList1)\n", 110 | "\n", 111 | "mixedList2 = []\n", 112 | "mixedList2.append(colors)\n", 113 | "mixedList2.append(numbers)\n", 114 | "print(mixedList2)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "* The introduction of 64-bit CPUs is directly connected with memory addressing: https://www.youtube.com/watch?v=KgiMzKb8dD0\n", 122 | "* For the curious, how important the levels of cache are:\n", 123 | "https://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips\n", 124 | "* For the very curious, how a CPU works at the low level:\n", 125 | "https://www.youtube.com/watch?v=cNN_tTXABUA\n", 126 | "* If you are deeply interested in programming, you should understand the difference between objects, references and pointers." 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "## Flow control\n", 134 | "\n", 135 | "### For, ranges and iterators\n", 136 | "There is an easy way to create ranges of numbers in Python. 
See a few examples using \"for\" and \"range\":" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "# If you want to simply print a range of numbers, you will see a \"strange\" result:\n", 146 | "print(range(4))\n", 147 | "# Output \"range(0, 4)\" tells you what has been created.\n", 148 | "# It does not list the elements the range will produce.\n", 149 | "print(\"Print all elements in range(4): \")\n", 150 | "for i in range(4):\n", 151 | "    print(i)\n", 152 | "# See two other examples:\n", 153 | "print(\"Print all elements in range(2, 10, 2): \")\n", 154 | "for i in range(2, 10, 2):\n", 155 | "    print(i)\n", 156 | "\n", 157 | "print(\"Print all elements in range(0, -11, -3): \")\n", 158 | "for i in range(0, -11, -3):\n", 159 | "    print(i)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "Iterators allow you to traverse a container (e.g. a list), when you want to see what each element contains." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "colors = [\"red\", \"blue\", \"green\"]\n", 176 | "for color in colors:\n", 177 | "    print(color)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "If you do a similar thing for a dictionary's items(), the iterator returns two-element tuples containing key-value pairs from the dictionary. In practice it is not very convenient.\n", 185 | "\n", 186 | "If you do not want a tuple, but two variables instead, you can use automatic tuple unpacking. As you can see below, if you provide a number of loop variables equal to the length of a single tuple, Python unpacks it." 
187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "author = {'name': 'Maciej', 'surname': 'Wilamowski', 'age': 32}\n", 196 | "for element in author.items():\n", 197 | "    print(element)\n", 198 | "print(\"\\nUnpacked tuples: \")\n", 199 | "for key, value in author.items():\n", 200 | "    print(key, value)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "### Enumerate\n", 208 | "In some cases you may need not only the content of a list's elements, but also their indices. enumerate(), a counting iterator, is used for this purpose:" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "for i, color in enumerate(colors):\n", 218 | "    print(i, color)" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "### Zip\n", 226 | "Sometimes you may have two lists, over which you want to iterate simultaneously. zip() pairs up the lists and returns their elements as tuples. The number of tuples returned by zip() is equal to the length of the shortest list." 
227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "colors = [\"red\", \"blue\", \"green\"]\n", 236 | "numbers = [4, 5, 6, 7]\n", 237 | "names = [\"Matt\", \"Ben\", \"John\", \"Adam\", \"Jim\"]\n", 238 | "\n", 239 | "for color, number in zip(colors, numbers):\n", 240 | "    print(color, number)\n", 241 | "\n", 242 | "print(\"\\nZip for 3 elements\")\n", 243 | "for color, number, name in zip(colors, numbers, names):\n", 244 | "    print(color, number, name)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "### List Comprehensions\n", 252 | "Calling functions/operations on all list elements is used so often that there is a special syntax for it (list comprehensions), which creates a list based on another existing list. This is a one-line for loop, which has the following syntax:\n", 253 | "\n", 254 | "[what_to_do(x) for x in some_list if optional_logical_condition]\n", 255 | "\n", 256 | "For example:" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "list1 = list(range(5))\n", 266 | "print([x**2 for x in list1])\n", 267 | "# You could perform this operation only for even numbers.\n", 268 | "print([x**2 for x in list1 if x % 2 == 0])\n", 269 | "# The operation may have more than one argument.\n", 270 | "list2 = list(range(2, 12, 2))\n", 271 | "print([x * y for (x, y) in zip(list1, list2)])" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "### If and while\n", 279 | "There are two more basic flow control tools: if and while. They work in the same way as in other programming languages." 
280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "x = 3\n", 289 | "if x < 2:\n", 290 | "    print(\"Value below 2\")\n", 291 | "elif x > 10:\n", 292 | "    print(\"Value above 10\")\n", 293 | "else:\n", 294 | "    print(\"Value between 2 and 10, inclusive\")" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "import math\n", 304 | "# Description of other functions available in math module.\n", 305 | "# https://docs.python.org/3/library/math.html\n", 306 | "math.pow(2, 3)\n", 307 | "tol = 0.1\n", 308 | "diff = 1\n", 309 | "k = 1\n", 310 | "while diff > tol:\n", 311 | "    diff = math.e - abs(math.pow((1 + 1 / k), k))\n", 312 | "    print(k, math.pow((1 + 1 / k), k), diff)\n", 313 | "    k += 1" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "### Continue\n", 321 | "Sometimes you may want to skip a loop iteration. You can use the continue statement for that. For example:" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "for i in range(11):\n", 331 | "    if i % 3 == 0:\n", 332 | "        continue\n", 333 | "    else:\n", 334 | "        print(i)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "### Break\n", 342 | "A loop (for and while) may be stopped using the break statement." 
343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": {}, 349 | "outputs": [], 350 | "source": [ 351 | "import math\n", 352 | "# Description of other functions available in math module.\n", 353 | "# https://docs.python.org/3/library/math.html\n", 354 | "math.pow(2, 3)\n", 355 | "tol = 0\n", 356 | "diff = 1\n", 357 | "k = 1\n", 358 | "while diff > tol:\n", 359 | "    diff = math.e - abs(math.pow((1 + 1 / k), k))\n", 360 | "    print(k, math.pow((1 + 1 / k), k), diff)\n", 361 | "    k += 1\n", 362 | "    if k > 15:\n", 363 | "        print(\"Value of tol (tolerance) is probably wrong... break.\")\n", 364 | "        break" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "## Error handling\n", 372 | "When you use Python for data analysis you may experience errors relatively often. The simplest examples are missing values or dividing by 0. You often do not want to stop the whole program because of that.\n", 373 | "\n", 374 | "In the code below the program raises an error on the third line and does not execute the fourth (you may check it by running the next cell)." 
375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": null, 380 | "metadata": {}, 381 | "outputs": [], 382 | "source": [ 383 | "a = 0\n", 384 | "b = 4\n", 385 | "c = b / a\n", 386 | "d = a + b" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "d" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": null, 401 | "metadata": {}, 402 | "outputs": [], 403 | "source": [ 404 | "a = 0\n", 405 | "b = 4\n", 406 | "try:\n", 407 | "    c = b / a\n", 408 | "# In the case of division, the only error you may expect is:\n", 409 | "except ZeroDivisionError as e:\n", 410 | "    print(\"You tried to divide by zero!\")\n", 411 | "    c = b * float('inf')\n", 412 | "# You do not really expect an exception here.\n", 413 | "d = a + b\n", 414 | "print(c, d)" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "Because you want to know how to handle a given error (know what to do when it happens, for example assign \"inf\"), you should not catch all exceptions (in the cell below, the last statement will not run). However, in some cases, especially while writing or testing code, catching all exceptions may be useful." 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "metadata": {}, 428 | "outputs": [], 429 | "source": [ 430 | "a = 0\n", 431 | "b = 4\n", 432 | "try:\n", 433 | "    f = b / a\n", 434 | "except Exception as e:\n", 435 | "    print(e.__doc__)\n", 436 | " \n", 437 | "g = a + b\n", 438 | "print(f, g)" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "This is why you may want to find the error, run additional code (e.g. logging), and then stop the script regardless." 
446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": null, 451 | "metadata": {}, 452 | "outputs": [], 453 | "source": [ 454 | "a = 0\n", 455 | "b = 4\n", 456 | "try:\n", 457 | "    f = b / a\n", 458 | "except Exception as e:\n", 459 | "    print(e.__doc__)\n", 460 | "    print(\"Error, stopping the script.\")\n", 461 | "    raise\n", 462 | " \n", 463 | "g = a + b\n", 464 | "print(f, g)" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "In practice error handling may be more advanced. For now, you do not need to know anything more. For the curious, read the following links:\n", 472 | "* https://docs.python.org/3/tutorial/errors.html\n", 473 | "* https://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/\n", 474 | "* http://www.pythonforbeginners.com/error-handling/exception-handling-in-python\n", 475 | "* http://eli.thegreenplace.net/2008/08/21/robust-exception-handling/" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": { 482 | "collapsed": true 483 | }, 484 | "outputs": [], 485 | "source": [] 486 | } 487 | ], 488 | "metadata": { 489 | "kernelspec": { 490 | "display_name": "Python 3", 491 | "language": "python", 492 | "name": "python3" 493 | }, 494 | "language_info": { 495 | "codemirror_mode": { 496 | "name": "ipython", 497 | "version": 3 498 | }, 499 | "file_extension": ".py", 500 | "mimetype": "text/x-python", 501 | "name": "python", 502 | "nbconvert_exporter": "python", 503 | "pygments_lexer": "ipython3", 504 | "version": "3.6.2" 505 | } 506 | }, 507 | "nbformat": 4, 508 | "nbformat_minor": 2 509 | } 510 | -------------------------------------------------------------------------------- /notebooks/03_Functions_and_Objects.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Functions and objects\n", 8 | 
"## Functions\n", 9 | "Writing simple functions to organize code better and to avoid repeating the same steps is common during data analysis. Functions in Python are different from many other programming languages in two ways:\n", 10 | "* automatic packing of multiple returned values,\n", 11 | "* passing arguments by their name.\n", 12 | "\n", 13 | "See examples of functions below, starting from the simplest one:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "from math import pi\n", 23 | "# simple function with one required argument and one optional argument, which returns a number as a result\n", 24 | "def circle_surface(radius, pi = pi):\n", 25 | "    return pi * radius ** 2\n", 26 | "\n", 27 | "print(circle_surface(3))" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "# You may want your function to calculate circumference and area of a circle\n", 37 | "def circle(radius, pi = pi):\n", 38 | "    return 2 * pi * radius, pi * radius ** 2\n", 39 | "print(\"Circumference and area of a circle with radius of 3: \", circle(3))\n", 40 | "# when you pass arguments with their names, you do not have to maintain order\n", 41 | "print(\"The same with arguments reversed: \", circle(pi = 3.1415, radius = 3))\n", 42 | "# you can unpack automatically packed results\n", 43 | "perimeter, surface = circle(pi = 3.1415, radius = 3)\n", 44 | "print(\"Circumference of a circle of radius 3:\", perimeter, \"area of this circle: \", surface)" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "As you can see, everything is meant to be convenient and fast to type. 
Passing an argument with a name is particularly useful if a function has a lot of arguments with default values and you want to change only one of them.\n", 52 | "\n", 53 | "## Lambda (anonymous) functions\n", 54 | "Sometimes defining a named function and putting it at the beginning of your script may not seem worthwhile, e.g. because the function is very simple and you will not use it multiple times. In such cases you can use a lambda (anonymous) function. Lambda functions have no advantage over standard functions, and their use is often a matter of coding style. You may see them in other programmers' code, so it is good to know about them." 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "f = lambda r: pi * r ** 2\n", 64 | "print(f(3))" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "A function may also return another function. In this case using a lambda function is convenient and makes the code more readable." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "def switchBMI(sex = \"M\"):\n", 81 | "    if sex == \"M\":\n", 82 | "        return lambda weight, height: weight / height ** 2\n", 83 | "    else:\n", 84 | "        return lambda weight, height: (weight - 2) / height ** 2\n", 85 | "BMI = switchBMI(\"M\")\n", 86 | "print(BMI(75, 1.90))\n", 87 | "BMI = switchBMI(\"F\")\n", 88 | "print(BMI(75, 1.90))" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Variable scope\n", 96 | "When using functions you have to remember that arguments in Python are passed by assignment (operator \"=\"). It means you have to know how this operator works for a particular argument - whether it behaves like a copy or just a reference. 
Compare these two cells:" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "def change_arg(arg_list):\n", 106 | "    print('Input inside the function: ', arg_list)\n", 107 | "    arg_list.append('black')\n", 108 | "    print('Change within function: ', arg_list)\n", 109 | "\n", 110 | "colors = [\"red\", \"blue\", \"green\"]\n", 111 | "\n", 112 | "print('Variable before function call: ', colors)\n", 113 | "change_arg(colors)\n", 114 | "print('Variable after function call: ', colors)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "def change_arg(arg_list):\n", 124 | "    print('Input inside the function: ', arg_list)\n", 125 | "    arg_list = ['cyan', 'magenta', 'yellow']\n", 126 | "    print('Change within function: ', arg_list)\n", 127 | "\n", 128 | "colors = [\"red\", \"blue\", \"green\"]\n", 129 | "\n", 130 | "print('Variable before function call: ', colors)\n", 131 | "change_arg(colors)\n", 132 | "print('Variable after function call: ', colors)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "In the first case a reference was passed to the function. Using the append() method you changed the content of what was located at the given address. You did not try to change the argument itself (reference/address). The function changed the content of an object defined outside the function.\n", 140 | "\n", 141 | "In the second case a new list was assigned to the \"arg_list\" argument, so you tried to change the passed argument itself, which has no effect outside. A function cannot rebind a variable that belongs to the caller. The change was local only." 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "Every function in Python has access (read mode) to variables defined in the script. 
The example below is NOT consistent with best programming practices. However, knowing about it may save some time if you want to get results as quickly as possible." 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "multiplier = 5\n", 158 | "def circle_surface(radius, pi = pi):\n", 159 | "    return multiplier * pi * radius ** 2\n", 160 | "\n", 161 | "print(circle_surface(3))" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "## Dynamic list of arguments\n", 169 | "Python allows you to write a function which takes an unspecified number of arguments. Extra positional arguments are collected into a tuple (operator \\*) and extra keyword arguments into a dictionary (operator \\*\\*). The naming convention is \\*args and \\*\\*kwargs, respectively. Even though it is rarely used in structured programming, it is very useful in object-oriented programming, e.g. for extending an existing class. This is why you will often find it when reading the code of existing libraries."
170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "def printArgs(*args):\n", 179 | " for arg in args:\n", 180 | " print(arg)\n", 181 | "\n", 182 | "printArgs(\"red\", \"blue\", \"green\")" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "def printKwargs(**kwargs):\n", 192 | " for name, value in kwargs.items():\n", 193 | " print(name, value)\n", 194 | "\n", 195 | "printKwargs(height = 1.92, age = 32, name = \"Maciej\")" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": { 201 | "collapsed": true 202 | }, 203 | "source": [ 204 | "## Profiling\n", 205 | "When code is too slow or slower than expected, it is a good idea to measure it precisely, and in the case of more complicated functions - profile their elements. Notebook has convenient built-in tools: %timeit, %%timeit, %prun (there are other commands, more information below).\n", 206 | "\n", 207 | "Look at the following examples:" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": { 214 | "collapsed": true 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "import math\n", 219 | "# A function with multiple steps.\n", 220 | "x = list(range(10000))\n", 221 | "def complexFunction(x):\n", 222 | " results = []\n", 223 | " for k in x:\n", 224 | " if k >= 500:\n", 225 | " results.append(math.sin(k))\n", 226 | " else:\n", 227 | " results.append(math.cos(k))\n", 228 | " for i in results:\n", 229 | " i = math.pow(i, 2)\n", 230 | " \n", 231 | " for i in range(len(results)):\n", 232 | " results[i] = math.pow(results[i], 2)\n", 233 | " return results" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "# Mean execution time\n", 243 | "%timeit 
complexFunction(x)\n", 244 | "# Mean execution time with a specified number of loops\n", 245 | "%timeit -n 57 complexFunction(x)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "%%timeit\n", 255 | "# %%timeit allows you to measure execution time of a whole cell\n", 256 | "x = list(range(10000))\n", 257 | "complexFunction(x)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "%prun complexFunction(x)\n", 267 | "# this line magic opens a pane at the bottom of the page with detailed information\n", 268 | "# on how many times each function was called and how much time it took" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "%%prun\n", 278 | "# You may profile a whole cell, if your code has multiple lines.\n", 279 | "# You do not have to make a function of a cell to measure its performance.\n", 280 | "y = complexFunction(x)\n", 281 | "complexFunction(y)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "You can list the other magics available in the notebook with the command below and read more at: http://ipython.readthedocs.io/en/stable/interactive/magics.html" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "%lsmagic" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "## Objects\n", 305 | "For beginners and intermediate Python users deep knowledge about classes/objects is not essential. 
However, it is good to have a general idea about the topic so that you can analyse code written by other programmers using object-oriented programming.\n", 306 | "\n", 307 | "Currently, most popular programming languages are object-oriented. Objects have several advantages. The first, already mentioned at the beginning of the course, is that you can create elements which possess a state (just like variables), other predefined attributes (multiple variables) and functions. Because you can create multiple objects/instances of the same class simultaneously, object-oriented programming makes situations in which you need multiple instances (e.g. of users) much easier to handle than structured programming.\n", 308 | "\n", 309 | "Additionally, object-oriented programming encourages encapsulation (a kind of code organization and separation). This pays off in large projects, because it makes managing and debugging code much easier. You may read more about advantages and disadvantages of object-oriented programming here:\n", 310 | "* https://www.roberthalf.com/blog/salaries-and-skills/4-advantages-of-object-oriented-programming\n", 311 | "* https://softwareengineering.stackexchange.com/a/120038\n", 312 | "* http://www.freekpaans.nl/2015/06/exploring-the-essence-of-object-oriented-programming/\n", 313 | "\n", 314 | "Below is an example of a simple class, which should help you understand the difference between class attributes and instance (single object of a class) attributes."
315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": { 321 | "collapsed": true 322 | }, 323 | "outputs": [], 324 | "source": [ 325 | "class SimpleClass:\n", 326 | "    # Class attribute\n", 327 | "    i = 3\n", 328 | "    def __init__(self):\n", 329 | "        # Attribute of an instance of class\n", 330 | "        self.j = 7" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "We can change attributes of a single object in a way which does not modify other instances." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "a = SimpleClass()\n", 347 | "b = SimpleClass()\n", 348 | "print(a.i, b.i)\n", 349 | "\n", 350 | "a.j = 8\n", 351 | "print(a.i, b.i, a.j, b.j)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "The line below changes the definition of a class. All existing instances (objects) that read this attribute from the class will see the new value." 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "SimpleClass.i = 5\n", 368 | "print(a.i, b.i, a.j, b.j)\n", 369 | "\n", 370 | "# New objects will be created according to the modified definition.\n", 371 | "c = SimpleClass()\n", 372 | "print(c.i, c.j)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "However, if you assign a value to the same name as an attribute of an instance, \"i\" will become an instance attribute for object \"a\", but for other objects it will still be a class attribute."
380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "a.i = 1\n", 389 | "SimpleClass.i = 17\n", 390 | "d = SimpleClass()\n", 391 | "print(a.i, b.i, c.i, d.i)" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "It is good to remember that an instance attribute overrides (shadows) a class attribute of the same name for that instance only. The class attribute itself is not overwritten." 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "Consider now the word \"self\" and see how class methods are called." 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": { 412 | "collapsed": true 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "class NewClass:\n", 417 | "    def __init__(self):\n", 418 | "        # Instance attribute\n", 419 | "        self.name = \"Maciej\"\n", 420 | "    # Static function (no \"self\" parameter)\n", 421 | "    def hi():\n", 422 | "        # Does not use any instance attribute\n", 423 | "        print(\"Hi\")\n", 424 | "    \n", 425 | "    # Method that takes \"self\" but does not use it\n", 426 | "    def hi2(self):\n", 427 | "        # Does not use any instance attribute\n", 428 | "        print(\"Hi\")\n", 429 | "    \n", 430 | "    def personalized_hi(self):\n", 431 | "        print(\"Hi,\", self.name)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": { 438 | "collapsed": true 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "uczen = NewClass()" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "Both lines of code in the cell below work in exactly the same way. Usually the first, shorter form is used. In practice, every time *instance.method()* is called, *class.method(instance)* is called. It means that when you call a method of an instance, you call the method of its class and pass the object as the first argument."
450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": {}, 456 | "outputs": [], 457 | "source": [ 458 | "uczen.personalized_hi()\n", 459 | "NewClass.personalized_hi(uczen)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "This is why the code below does not work: the instance is passed as the first argument, but hi() accepts no arguments." 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "uczen.hi()" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "You can call such a static function only through the class, without passing arguments." 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": {}, 489 | "outputs": [], 490 | "source": [ 491 | "NewClass.hi()" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "You can create the same function with the argument self (see the definition of hi2 above) and then not use it." 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "uczen.hi2()" 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": {}, 513 | "source": [ 514 | "However, it makes the code below throw an error, because no instance is passed as self." 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": null, 520 | "metadata": {}, 521 | "outputs": [], 522 | "source": [ 523 | "NewClass.hi2()" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "In practice, static functions that do not receive a class instance are rarely used.\n", 531 | "Note that \"self\" is not a keyword in Python, but a widely used convention. The code below is correct, but writing such classes is strongly discouraged. 
The use of the word \"self\" in Python is so widespread that some IDEs rely on it." 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": null, 537 | "metadata": {}, 538 | "outputs": [], 539 | "source": [ 540 | "class UglyClass:\n", 541 | "    def __init__(self):\n", 542 | "        self.name = \"Maciej\"\n", 543 | "    def personalized_hi(anyWord):\n", 544 | "        print(\"Hi, \", anyWord.name)\n", 545 | "test = UglyClass()\n", 546 | "test.personalized_hi()" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "Details about classes such as inheritance are left for later, when your knowledge of Python is deeper." 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "collapsed": true 560 | }, 561 | "source": [ 562 | "You may read more about objects here: http://python-textbok.readthedocs.io/en/1.0/Classes.html" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": null, 568 | "metadata": { 569 | "collapsed": true 570 | }, 571 | "outputs": [], 572 | "source": [] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": { 578 | "collapsed": true 579 | }, 580 | "outputs": [], 581 | "source": [] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": null, 586 | "metadata": { 587 | "collapsed": true 588 | }, 589 | "outputs": [], 590 | "source": [] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": null, 595 | "metadata": { 596 | "collapsed": true 597 | }, 598 | "outputs": [], 599 | "source": [] 600 | } 601 | ], 602 | "metadata": { 603 | "kernelspec": { 604 | "display_name": "Python 3", 605 | "language": "python", 606 | "name": "python3" 607 | }, 608 | "language_info": { 609 | "codemirror_mode": { 610 | "name": "ipython", 611 | "version": 3 612 | }, 613 | "file_extension": ".py", 614 | "mimetype": "text/x-python", 615 | "name": "python", 616 | "nbconvert_exporter": 
"python", 617 | "pygments_lexer": "ipython3", 618 | "version": "3.6.2" 619 | } 620 | }, 621 | "nbformat": 4, 622 | "nbformat_minor": 2 623 | } 624 | -------------------------------------------------------------------------------- /notebooks/04_NumPy.ipynb: -------------------------------------------------------------------------------- 1 | {"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.10.14","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[],"dockerImageVersionId":30786,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# NumPy\nNumpy is an essential Python package for vector and matrix operations. It is fast, flexible, useful for handling large datasets and pseudorandom numbers. It makes a foundation for multiple other packages, including Pandas.","metadata":{}},{"cell_type":"markdown","source":"## Creating objects\n### Manual filling\nUsually when using Numpy as one of multiple libaries, we import it as np (import numpy as np). 
To increase code readability in this notebook we import all numpy functions directly.\n\nLet us begin with the two simplest examples, creating a vector and a matrix.","metadata":{}},{"cell_type":"code","source":"from numpy import *\n\nv = array([1,2,3,4])\n\nprint(v, type(v), v.shape, v.size)\n\n\nM = array([[1,2],[3,4]])\nprint(M, type(M), M.shape, M.size)","metadata":{"execution":{"iopub.status.busy":"2024-10-31T14:19:09.924896Z","iopub.execute_input":"2024-10-31T14:19:09.925572Z","iopub.status.idle":"2024-10-31T14:19:09.934199Z","shell.execute_reply.started":"2024-10-31T14:19:09.925530Z","shell.execute_reply":"2024-10-31T14:19:09.932888Z"},"trusted":true},"outputs":[{"name":"stdout","text":"[1 2 3 4] <class 'numpy.ndarray'> (4,) 4\n[[1 2]\n [3 4]] <class 'numpy.ndarray'> (2, 2) 4\n","output_type":"stream"}],"execution_count":5},{"cell_type":"markdown","source":"### \"zeros\", \"ones\" and \"empty\"\nYou rarely want to fill vector/matrix values manually. Usually you would want to initialize it in a different way. The most common options are zeros, ones and empty. Each of these functions requires a shape argument and optionally takes the data type (dtype) and the memory order (default: row-major). You can find other ways to create arrays here:\nhttps://docs.scipy.org/doc/numpy/reference/routines.array-creation.html ","metadata":{}},{"cell_type":"code","source":"print(zeros((3, 4)))\n\n# The data type is passed as the dtype argument of ones, not as a second argument of print.\nprint(ones((3, 4), dtype=float32))\n\n# You cannot foresee what will be used to fill a matrix when using empty.\n# Because it is only marginally faster than zeros and ones you should not use it often.\nprint(empty((3, 4)))","metadata":{"execution":{"iopub.status.busy":"2024-10-31T12:35:56.449551Z","iopub.execute_input":"2024-10-31T12:35:56.450640Z","iopub.status.idle":"2024-10-31T12:35:56.458607Z","shell.execute_reply.started":"2024-10-31T12:35:56.450592Z","shell.execute_reply":"2024-10-31T12:35:56.457238Z"},"trusted":true},"outputs":[{"name":"stdout","text":"[[0. 0. 0. 0.]\n [0. 0. 0. 0.]\n [0. 0. 0. 0.]]\n[[1. 1. 1. 1.]\n [1. 1. 1. 
1.]\n [1. 1. 1. 1.]] \n[[1. 1. 1. 1.]\n [1. 1. 1. 1.]\n [1. 1. 1. 1.]]\n","output_type":"stream"}],"execution_count":4},{"cell_type":"markdown","source":"### Convert existing object to a matrix\nSometimes you may want to create a matrix from an existing object. It is good to mention here that changing shape of a matrix is easy and fast even for large matrices. It does not change the way how matrix is stored in memory, just the way how memory is interpreted.","metadata":{}},{"cell_type":"code","source":"x = list(range(12))\nprint(\"List: \", x)\n\nprint(\"numpy matrix:\", asarray(x))\n\nprint(\"numpy matrix with a different shape:\\n\", asarray(x).reshape((3, 4)))","metadata":{"execution":{"iopub.status.busy":"2024-10-31T14:34:34.402335Z","iopub.execute_input":"2024-10-31T14:34:34.402743Z","iopub.status.idle":"2024-10-31T14:34:34.410301Z","shell.execute_reply.started":"2024-10-31T14:34:34.402704Z","shell.execute_reply":"2024-10-31T14:34:34.408758Z"},"trusted":true},"outputs":[{"name":"stdout","text":"List: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]\nnumpy matrix: [ 0 1 2 3 4 5 6 7 8 9 10 11]\nnumpy matrix with a different shape:\n [[ 0 1 2 3]\n [ 4 5 6 7]\n [ 8 9 10 11]]\n","output_type":"stream"}],"execution_count":11},{"cell_type":"markdown","source":"### Vector/matrix as range\nReshape method called on an existing matrix does not return its copy. 
It creates a new \"view\" of the same data.\n\nLook at the example below, which uses arange - a way to create matrix from a range, which takes arguments analogous to Python's range.","metadata":{}},{"cell_type":"code","source":"r = arange(0, 23, 2)\nprint(r)\n\nR = r.reshape((3, 4))\n# Alternatively, you can write the line above like this: R = reshape(r, (3, 4))\n\nr[0]=10\nr.shape=(3, 4)\nprint(\"Changing the value of 'r' also changes the value of 'R'\")\nprint(R)\nprint(r)","metadata":{"execution":{"iopub.status.busy":"2024-10-31T14:38:09.473965Z","iopub.execute_input":"2024-10-31T14:38:09.475105Z","iopub.status.idle":"2024-10-31T14:38:09.482149Z","shell.execute_reply.started":"2024-10-31T14:38:09.475057Z","shell.execute_reply":"2024-10-31T14:38:09.480939Z"},"trusted":true},"outputs":[{"name":"stdout","text":"[ 0 2 4 6 8 10 12 14 16 18 20 22]\nChanging the value of 'r' also changes the value of 'R'\n[[10 2 4 6]\n [ 8 10 12 14]\n [16 18 20 22]]\n[[10 2 4 6]\n [ 8 10 12 14]\n [16 18 20 22]]\n","output_type":"stream"}],"execution_count":14},{"cell_type":"markdown","source":"Those interested how matrix implementation works should read the following stackoverflow answer: http://stackoverflow.com/a/22074424 Those very interested may read the documentation https://docs.scipy.org/doc/numpy/reference/internals.html","metadata":{}},{"cell_type":"markdown","source":"### Random contents\nNumpy allows filling a matrix with random numbers using a given distribution. There are many available distributions: https://docs.scipy.org/doc/numpy/reference/routines.random.html. In every available method if you do not pass a matrix's shape you get one random number returned. 
Remember that these functions belong to the random module of numpy.","metadata":{}},{"cell_type":"code","source":"print(\"Single random numbers:\")\nprint(random.random()) # float, uniform on [0, 1)\n\nprint(random.normal()) # float, normal with mean 0 and std 1\n\nprint(random.normal(20, 5)) # float, normal with mean 20 and std 5\n\nprint(\"\\nMatrix of a uniform distribution [0,1):\")\nprint(random.random((3, 4))) # floats, uniform on [0, 1)\n\nprint(\"\\nMatrix of a normal distribution N~(0,1):\")\n\nprint(random.normal(size=(3, 4))) # floats, normal with mean 0 and std 1\n\nprint(\"\\nMatrix of a normal distribution N~(20,5):\")\nprint(random.normal(20, 5, (3, 4))) # floats, normal with mean 20 and std 5","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"Single random numbers:\n\n0.8363538715461638\n\n-0.5271872805320973\n\n16.01146155468938\n\n\n\nMatrix of a uniform distribution [0,1):\n\n[[0.6403578 0.56110941 0.15871126 0.76322276]\n\n [0.98559098 0.78269753 0.50414145 0.82533653]\n\n [0.32318423 0.7243911 0.7183383 0.76714993]]\n\n\n\nMatrix of a normal distribution N~(0,1):\n\n[[ 1.59293474 -0.69124521 -0.23815543 -0.35205644]\n\n [-0.58088989 0.40167456 -0.97436438 0.06092411]\n\n [-0.93305085 -0.42883561 -1.12750697 1.80836823]]\n\n\n\nMatrix of a normal distribution N~(20,5):\n\n[[12.46732773 13.46942019 17.76110242 14.38409448]\n\n [15.82471115 16.62002818 15.92237535 22.02263231]\n\n [19.91401444 19.58775294 16.89260299 19.00785207]]\n"}],"execution_count":5},{"cell_type":"markdown","source":"## Transformation\nYou have seen basic transformations in the examples above. Some additional useful transformations are shown below.\n\nNumpy makes transformations easy. 
If one of passed dimensions equals -1, you tell numpy to infer the new dimension.","metadata":{}},{"cell_type":"code","source":"r = arange(0, 12, 1)\nprint(r)\n\nprint(r.reshape((2, -1)))\nprint(r.reshape((2, 2, -1))) # Three-dimensional matrix / two matrices 2x3\n\nprint(\"A special case is creating a one-dimensional object and a zero-dimensional vector.\")\nr.shape=(3,4)\nprint(r.reshape(1, -1).shape, r.reshape(-1).shape)\nprint(\"The last example is so widely used that it even has its own functions.\")\nprint(\"As you have seen earlier, .reshape only returns a new view if it is possible.\")\nprint(\".ravel() behaves in a fully analogous way to r.reshape(-1).shape\")\nr.shape=(3,4)\nprint(r.ravel().shape)\nprint(\"However, .flatten() returns a copy\")\nprint(r)\n\np = r.flatten()\nr[0]=100\nprint(r)\nprint(p)\n","metadata":{"execution":{"iopub.status.busy":"2024-10-31T14:54:23.187008Z","iopub.execute_input":"2024-10-31T14:54:23.187392Z","iopub.status.idle":"2024-10-31T14:54:23.196844Z","shell.execute_reply.started":"2024-10-31T14:54:23.187345Z","shell.execute_reply":"2024-10-31T14:54:23.195452Z"},"trusted":true},"outputs":[{"name":"stdout","text":"[ 0 1 2 3 4 5 6 7 8 9 10 11]\n[[ 0 1 2 3 4 5]\n [ 6 7 8 9 10 11]]\n[[[ 0 1 2]\n [ 3 4 5]]\n\n [[ 6 7 8]\n [ 9 10 11]]]\nA special case is creating a one-dimensional object and a zero-dimensional vector.\n(1, 12) (12,)\nThe last example is so widely used that it even has its own functions.\nAs you have seen earlier, .reshape only returns a new view if it is possible.\n.ravel() behaves in a fully analogous way to r.reshape(-1).shape\n(12,)\nHowever, .flatten() returns a copy\n[[ 0 1 2 3]\n [ 4 5 6 7]\n [ 8 9 10 11]]\n[[100 100 100 100]\n [ 4 5 6 7]\n [ 8 9 10 11]]\n[ 0 1 2 3 4 5 6 7 8 9 10 11]\n","output_type":"stream"}],"execution_count":16},{"cell_type":"markdown","source":"## Memory addressing and indexing\nNumpy creates row-major matrices by default. 
If you select a part of a two-dimensional matrix using only one index, you get the specified row. If you want to select a column, you also have to select all rows (:).","metadata":{}},{"cell_type":"code","source":"r = arange(0, 12)\nr.shape=(3, 4)\nprint(r)\nprint(\"Row: \", r[2], \"; row's dimension\", r[2].shape) # alternatively, r[2, :]\nprint(\"Column: \", r[:,2])","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[[ 0 1 2 3]\n\n [ 4 5 6 7]\n\n [ 8 9 10 11]]\n\nRow: [ 8 9 10 11] ; row's dimension (4,)\n\nColumn: [ 2 6 10]\n"}],"execution_count":7},{"cell_type":"markdown","source":"Note that a single selected row is a one-dimensional vector: its shape is (4,), so it has no second dimension at all. Shapes (4,), (1, 4) and (4, 1) are not the same. You may run into this problem during algebraic operations which require a two-dimensional structure (not a one-dimensional one).","metadata":{}},{"cell_type":"markdown","source":"If you assign values to a part of a matrix, you do not have to worry about memory addressing or references: the values are copied into the target. The behavior is as expected.","metadata":{}},{"cell_type":"code","source":"r[1]=r[2]\nprint(r)\nr[2,1]=100\nprint(r)","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[[ 0 1 2 3]\n\n [ 8 9 10 11]\n\n [ 8 9 10 11]]\n\n[[ 0 1 2 3]\n\n [ 8 9 10 11]\n\n [ 8 100 10 11]]\n"}],"execution_count":8},{"cell_type":"markdown","source":"You can change the values of cells, rows, columns and parts of a matrix.","metadata":{}},{"cell_type":"code","source":"ones1 = ones((4,4))\nones2 = ones((2,2))\nprint(\"Large matrix:\")\nprint(ones1)\nprint(\"Small matrix:\")\nprint(ones2)\nones1[1:3, 1:3]+=ones2\nprint(\"Changed large matrix:\")\nprint(ones1)","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"Large matrix:\n\n[[1. 1. 1. 1.]\n\n [1. 1. 1. 1.]\n\n [1. 1. 1. 1.]\n\n [1. 1. 1. 1.]]\n\nSmall matrix:\n\n[[1. 1.]\n\n [1. 1.]]\n\nChanged large matrix:\n\n[[1. 1. 1. 1.]\n\n [1. 2. 2. 1.]\n\n [1. 2. 2. 
1.]\n\n [1. 1. 1. 1.]]\n"}],"execution_count":9},{"cell_type":"markdown","source":"## Combining/dividing matrices\nThese operations are often used. Useful functions for this purpose are concatenate and split. If the first name is hard to remember, you may use the shortcuts vstack and hstack. For the sake of readability, only shapes of newly created matrices are printed.","metadata":{}},{"cell_type":"code","source":"x = random.randint(0, 12, size=(3, 4))\ny = random.randint(0, 12, size=(3, 4))\n\n\nprint(x.shape, y.shape)\nprint(\"Concatenating vertically:\")\nprint(concatenate((x,y), axis=0).shape, vstack((x,y)).shape)\n\nprint(\"Concatenating horizontally:\")\nprint(concatenate((x,y), axis=1).shape, hstack((x,y)).shape)\n\nprint(\"Split returns a list which contains two matrices.\")\nprint(split(x, 2, axis=1)[0].shape, split(x, 2, axis=1)[1].shape)\n\nprint(\"You can split the x matrix into two halves only by columns, because the number of rows is odd.\")\nprint(split(x, 3, axis=0)[0].shape, split(x, 3, axis=0)[1].shape, split(x, 3, axis=0)[2].shape)","metadata":{"execution":{"iopub.status.busy":"2024-10-31T13:02:08.271499Z","iopub.execute_input":"2024-10-31T13:02:08.271942Z","iopub.status.idle":"2024-10-31T13:02:08.289731Z","shell.execute_reply.started":"2024-10-31T13:02:08.271901Z","shell.execute_reply":"2024-10-31T13:02:08.288458Z"},"trusted":true},"outputs":[{"name":"stdout","text":"[[ 1 9 6 1]\n [ 1 3 3 4]\n [11 7 11 4]]\n(3, 4) (3, 4)\nConcatenating vertically:\n(6, 4) (6, 4)\nConcatenating horizontally:\n(3, 8) (3, 8)\nSplit returns a list which contains two matrices.\n(3, 2) (3, 2)\nYou can split the x matrix into two halves only by columns, because the number of rows is odd.\n(1, 4) (1, 4) (1, 4)\n","output_type":"stream"}],"execution_count":10},{"cell_type":"markdown","source":"## Variable types\nAll available numpy data types are shown here: https://docs.scipy.org/doc/numpy/user/basics.types.html. 
Fortunately,\nnumpy allows for easy typecasting. However, if running time and/or memory efficiency are important, you should declare the right type at the beginning. Remember that choosing a different type than default float64 may cause problems. If you make a mistake you may create a \"nuclear Gandhi\" (http://knowyourmeme.com/memes/nuclear-gandhi).","metadata":{}},{"cell_type":"code","source":"L = array([1, 2, 2.5, -1, -1.25, 32764, 4294961294, 18446744073709541613])\n\nprint(L)\nprint(L.dtype)\n\nprint(L.astype(int))\n\nprint(L.astype(uint))\nprint(L.astype(int8))\nprint(L.astype(uint8))","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[ 1.00000000e+00 2.00000000e+00 2.50000000e+00 -1.00000000e+00\n\n -1.25000000e+00 3.27640000e+04 4.29496129e+09 1.84467441e+19]\n\nfloat64\n\n[ 1 2 2\n\n -1 -1 32764\n\n 4294961294 -9223372036854775808]\n\n[ 1 2 2\n\n 18446744073709551615 18446744073709551615 32764\n\n 4294961294 18446744073709541376]\n\n[ 1 2 2 -1 -1 -4 0 0]\n\n[ 1 2 2 255 255 252 0 0]\n"}],"execution_count":11},{"cell_type":"markdown","source":"## Descriptive statistics\nFunctions used in descriptive statistics are implemented in a standard and efficient way. The full list is available here: https://docs.scipy.org/doc/numpy/reference/routines.statistics.html. 
A few examples are shown below.","metadata":{}},{"cell_type":"code","source":"r = arange(0, 12)\nr.shape=(3, 4)\n\nprint(r)\n\nprint(amin(r))\n\nprint(amin(r, axis=0))\n\nprint(amin(r, axis=1))\n","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[[ 0 1 2 3]\n\n [ 4 5 6 7]\n\n [ 8 9 10 11]]\n\n0\n\n[0 1 2 3]\n\n[0 4 8]\n"}],"execution_count":12},{"cell_type":"code","source":"r = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n\n# Summary statistics similar to pandas.describe()\nsummary = {\n 'count': r.size,\n 'mean': mean(r),\n 'std': std(r, ddof=1), # ddof=1 for sample standard deviation\n 'min': min(r),\n '25%': percentile(r, 25),\n '50% (median)': median(r),\n '75%': percentile(r, 75),\n 'max': max(r)\n}\n\nprint(summary)","metadata":{"execution":{"iopub.status.busy":"2024-10-31T15:09:26.880593Z","iopub.execute_input":"2024-10-31T15:09:26.881032Z","iopub.status.idle":"2024-10-31T15:09:26.889728Z","shell.execute_reply.started":"2024-10-31T15:09:26.880977Z","shell.execute_reply":"2024-10-31T15:09:26.888683Z"},"trusted":true},"outputs":[{"name":"stdout","text":"{'count': 10, 'mean': 5.5, 'std': 3.0276503540974917, 'min': 1, '25%': 3.25, '50% (median)': 5.5, '75%': 7.75, 'max': 10}\n","output_type":"stream"}],"execution_count":20},{"cell_type":"code","source":"x = random.normal(size=(100, 100))\n\nhist = histogram(x, bins=10)\n\n# length of a list of counts in an interval equals the number of bins\nprint(hist[0])\n# length of a list of interval's boundaries equals the number of bins + 1\nprint(hist[1])","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[ 9 64 465 1543 2805 2842 1696 484 79 13]\n\n[-3.97520557 -3.18469509 -2.39418462 -1.60367414 -0.81316367 -0.02265319\n\n 0.76785729 1.55836776 2.34887824 3.13938872 3.92989919]\n"}],"execution_count":13},{"cell_type":"markdown","source":"## Sorting\nSorting is an operation used quite often. It is important to see a difference between sorting in place and sorting a copy. 
It may be surprising that numpy does not provide sorting in descending order. You have to flip a sorted matrix instead.","metadata":{}},{"cell_type":"code","source":"x = random.randint(0, 12, size=(3, 4))\nprint(x)\nprint(\"np.sort returns a copy of an object.\")\n\nprint(\"Sorting by column means sorting rows -> axis=0\")\nprint(sort(x, axis=0))\nprint(flipud(sort(x, axis=0)))\n\nprint(\"Sorting by row means sorting columns -> axis=1\")\nprint(sort(x, axis=1))\nprint(fliplr(sort(x, axis=1)))\n\nprint(\"\\nSorting in place is fully analogous:\")\nx.sort(axis=0)\nprint(x)\nx.sort(axis=1)\nprint(x)","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[[ 6 11 6 4]\n\n [ 9 2 0 8]\n\n [ 6 1 9 2]]\n\nnp.sort returns a copy of an object.\n\nSorting by column means sorting rows -> axis=0\n\n[[ 6 1 0 2]\n\n [ 6 2 6 4]\n\n [ 9 11 9 8]]\n\n[[ 9 11 9 8]\n\n [ 6 2 6 4]\n\n [ 6 1 0 2]]\n\nSorting by row means sorting columns -> axis=1\n\n[[ 4 6 6 11]\n\n [ 0 2 8 9]\n\n [ 1 2 6 9]]\n\n[[11 6 6 4]\n\n [ 9 8 2 0]\n\n [ 9 6 2 1]]\n\n\n\nSorting in place is fully analogous:\n\n[[ 6 1 0 2]\n\n [ 6 2 6 4]\n\n [ 9 11 9 8]]\n\n[[ 0 1 2 6]\n\n [ 2 4 6 6]\n\n [ 8 9 9 11]]\n"}],"execution_count":14},{"cell_type":"markdown","source":"You may wonder whether flipping a sorted matrix is faster than negating the matrix twice (once before and once after sorting). As you can see, flipping is slightly faster, but the difference is small.","metadata":{}},{"cell_type":"code","source":"# Create a large matrix, so that possible differences are significant:\nx = random.normal(size=(1000, 1000))\n%timeit flipud(sort(x, axis=0))\n%timeit (-1*sort(-x, axis=0))","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"48.8 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n\n57.7 ms ± 2.28 ms per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n"}],"execution_count":15},{"cell_type":"markdown","source":"Sometimes you may want to know what are the sorted indices of the matrix.","metadata":{}},{"cell_type":"code","source":"x = random.randint(0, 12, size=(3, 4)).copy()\nprint(x)\nprint(argsort(x, axis=0))\n# print(x[argsort(x, axis=0)])\nprint(argsort(x, axis=1))","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[[ 2 0 10 9]\n\n [ 8 4 4 1]\n\n [10 3 0 9]]\n\n[[0 0 2 1]\n\n [1 2 1 0]\n\n [2 1 0 2]]\n\n[[1 0 3 2]\n\n [3 1 2 0]\n\n [2 1 3 0]]\n"}],"execution_count":23},{"cell_type":"markdown","source":"## Algebra and other functions\nNumpy has many mathematical, algebraic etc. functions built in. You will see a few examples below. Ability to find a function you need in the documentation is a crucial skill. Not only you see if a function is implemented, but also you know how the implementation works.\n\nUsing the example of mathematical functions you will see how important it is to use built-in functions whenever possible.\n\n* Mathematical functions: https://docs.scipy.org/doc/numpy/reference/routines.math.html\n* Algebraic functions: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html\n* Other groups of functions: https://docs.scipy.org/doc/numpy/reference/index.html","metadata":{}},{"cell_type":"code","source":"M = arange(0, 4)\nN = arange(1, 5)\n\nM.shape=(-1)\nN.shape=(-1)\n\nprint(M)\nprint(N)\n\nprint(\"Multiplying vectors:\")\nprint(dot(M,N))\n\nprint(\"\\nmatrices\")\nM = arange(0, 4)\nN = arange(1, 5)\nM.shape=(2, 2)\nN.shape=(2, 2)\n\nprint(M)\nprint(N)\n\nprint(\"\\nMultiplying matrices:\")\nprint(dot(M,N))\n\nprint(\"\\nInversing matrices:\")\nprint(linalg.inv(M))","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"[0 1 2 3]\n\n[1 2 3 4]\n\nMultiplying vectors:\n\n20\n\n\n\nmatrices\n\n[[0 1]\n\n [2 3]]\n\n[[1 2]\n\n [3 4]]\n\n\n\nMultiplying matrices:\n\n[[ 3 4]\n\n [11 16]]\n\n\n\nInversing matrices:\n\n[[-1.5 
0.5]\n\n [ 1. 0. ]]\n"}],"execution_count":17},{"cell_type":"markdown","source":"Compare the speed of a built-in numpy function with solving the same problem in plain Python.","metadata":{}},{"cell_type":"code","source":"import math\nx = random.normal(size=(10000, 1))\nprint(\"Max:\")\n%timeit -n 100 max(x)\n%timeit -n 100 x.max()\n\nprint(\"Sin:\")\n%timeit -n 100 [math.sin(z) for z in x]\n%timeit -n 100 sin(x)\n\nprint(\"Mean:\")\n%timeit -n 100 mean(x)\n%timeit -n 100 x.mean()","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"Max:\n\n3.95 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n\n3.78 µs ± 609 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n\nSin:\n\n1.79 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n\n147 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n\nMean:\n\n9.16 µs ± 542 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n\n8.26 µs ± 493 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"}],"execution_count":18},{"cell_type":"markdown","source":"As you can see, the simple function max is about a thousand times\\* faster in numpy (3.95 ms vs 3.78 µs), and sin is over 10 times faster. For mean, both timed variants are numpy calls (the function and the array method), so the difference is only about 10%.\n\n\\* When running the cell for the first time you may see a message: \"The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.\". It means that the Python max function may be even slower in practice, as the fastest runs probably benefit from caching.","metadata":{}},{"cell_type":"markdown","source":"## Vectorize\nSometimes you may want to perform operations which are impossible to define using numpy functions. In this case you may rewrite a function to a vector \"numpy\" version. 
It is not perfect (vectorize is essentially a Python loop under the hood), but it may often save time.","metadata":{}},{"cell_type":"code","source":"def newFunc(x):\n    if x >= 0.5:\n        return x\n    else:\n        return 0.0\n\nnewFuncV = vectorize(newFunc)\nx = random.normal(size=(10000, 1))\n%timeit y=newFuncV(x)","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"1.25 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"}],"execution_count":19},{"cell_type":"code","source":"%%timeit\ny=x.copy()\nfor i in range(y.shape[0]):\n    if y[i]<0.5:\n        y[i]=0","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"8.39 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"}],"execution_count":20},{"cell_type":"markdown","source":"As you can see, a simple change may make the code visibly faster. In practice, optimisations like this are significant only when you use really large datasets.","metadata":{}},{"cell_type":"code","source":"","metadata":{"collapsed":true,"jupyter":{"outputs_hidden":true}},"outputs":[],"execution_count":null}]} -------------------------------------------------------------------------------- /notebooks/06_SQL_python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# SQL + Python integration\n", 8 | "\n", 9 | "Python lets us connect to and manage an SQLite database through the sqlite3 module.\n", 10 | "The sqlite3 module was written by Gerhard Häring. It provides a SQL interface compliant with the DB-API 2.0.\n", 11 | "\n", 12 | "\n", 13 | "The DB API provides a minimal standard for working with databases, using Python structures and syntax wherever possible. 
This API includes the following:\n", 14 | "\n", 15 | " Connections, which cover guidelines for how to connect to databases\n", 16 | "\n", 17 | " Executing statements and stored procedures to query, update, insert, and delete data with cursors\n", 18 | "\n", 19 | " Transactions, with support for committing or rolling back a transaction\n", 20 | "\n", 21 | " Examining metadata on the database module as well as on database and table structure\n", 22 | "\n", 23 | " Defining the types of errors\n" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "# Create\\Close Connection\n", 31 | "\n", 32 | "To use the sqlite3 module we first import it and then create a Connection object (conn) that represents the database.\n", 33 | "We pass the database file name as a parameter to the connect method. A relative name is resolved against the current working directory (usually the directory of the notebook file); if the file does not exist, SQLite creates it. " 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": { 40 | "collapsed": true 41 | }, 42 | "outputs": [], 43 | "source": [ 44 | "import sqlite3\n", 45 | "# We connect to testdb2.db database\n", 46 | "conn = sqlite3.connect('testdb2.db')\n", 47 | "\n", 48 | "\n", 49 | "# We close the connection and free all resources\n", 50 | "conn.close()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "# Create table\n", 58 | "\n", 59 | "Once we have a Connection object (conn), we can create a Cursor object (c). The c object allows us to create the students table by calling its execute method. 
We pass the CREATE TABLE command as a parameter to the execute method.\n", 60 | "\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "collapsed": true 68 | }, 69 | "outputs": [], 70 | "source": [ 71 | "conn = sqlite3.connect('testdb2.db')\n", 72 | "\n", 73 | "c = conn.cursor()\n", 74 | "\n", 75 | "# Create table\n", 76 | "c.execute(\"CREATE TABLE students(student_id integer primary key autoincrement,name text not null, surname text not null,birth date,weight int,height int)\")\n", 77 | "\n", 78 | "# Save (commit) the changes in database.\n", 79 | "# Changes not committed will be lost\n", 80 | "conn.commit()\n", 81 | "\n", 82 | "# We close the connection and free all resources\n", 83 | "conn.close()\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "We can run a query to see what tables there are in our database." 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "conn = sqlite3.connect('testdb2.db')\n", 100 | "\n", 101 | "c = conn.cursor()\n", 102 | "\n", 103 | "c.execute(\"SELECT name FROM sqlite_master WHERE type='table';\")\n", 104 | "tables = c.fetchall()\n", 105 | "for tableName in tables:\n", 106 | "    print(tableName)\n", 107 | "\n", 108 | "# We do not have to commit anything as we didn't make any changes.\n", 109 | "conn.close()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "We use the commit() method of the Connection object to save changes in the database.\n", 117 | "Finally, we close the connection with the close() method to free resources." 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "# DROP table\n", 125 | "\n", 126 | "We use the execute method with an SQL DROP statement to remove a table from the database." 
127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": { 133 | "collapsed": true 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "conn = sqlite3.connect('testdb2.db')\n", 138 | "\n", 139 | "c = conn.cursor()\n", 140 | "\n", 141 | "# execute the command below to drop table\n", 142 | "\n", 143 | "c.execute(\"Drop table students\")\n", 144 | "# Save (commit) the changes\n", 145 | "conn.commit()\n", 146 | "\n", 147 | "# We close the connection and free all resources\n", 148 | "conn.close()" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "# Insert rows\n", 156 | "\n", 157 | "We put data into the students table using an SQL INSERT statement passed to the execute method.\n", 158 | "\n", 159 | "The values in the INSERT command should be passed using the placeholder operator (?).\n", 160 | "\n", 161 | "Building the statement with string operators (instead of placeholders) is bad programming practice.\n", 162 | "It exposes the code to the threat of an SQL injection attack. 
A humorous illustration of the problem can be found at https://xkcd.com/327/.\n" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "First create table:" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": { 176 | "collapsed": true 177 | }, 178 | "outputs": [], 179 | "source": [ 180 | "conn = sqlite3.connect('testdb2.db')\n", 181 | "\n", 182 | "c = conn.cursor()\n", 183 | "\n", 184 | "# Create table\n", 185 | "c.execute(\"CREATE TABLE students(student_id integer primary key autoincrement,name text not null, surname text not null,birth date,weight int,height int)\")\n", 186 | "\n", 187 | "# Save (commit) the changes in database.\n", 188 | "# Changes not committed will be lost\n", 189 | "conn.commit()\n", 190 | "\n", 191 | "# We close the connection and free all resources\n", 192 | "conn.close()" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Now, let's insert records:" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "conn = sqlite3.connect('testdb2.db')\n", 209 | "\n", 210 | "c = conn.cursor()\n", 211 | "\n", 212 | "# Insert record values using placeholder operator; values follow the column order (student_id,name,surname,birth,weight,height)\n", 213 | "# Because student_id is an autoincrement primary key, we pass None and SQLite assigns the value\n", 214 | "c.execute(\"INSERT INTO students VALUES(?,?,?,?,?,?)\",(None,'Tom','Silver','1989-11-03',72,182))\n", 215 | "\n", 216 | "# WRONG:Do not use string operators as below\n", 217 | "#c.execute(\"INSERT INTO students VALUES ({0},{1},{2},{3},{4})\".format('Tom','Silver',72,182,'1989-11-03'))\n", 218 | "\n", 219 | "# print total number of changed rows\n", 220 | "print(\"number of affected rows: {0}\".format(conn.total_changes))\n", 221 | "\n", 222 | "# Save (commit) the changes\n", 223 | "conn.commit()\n", 224 | "\n", 225 | "# We close the connection 
and free all resources\n", 226 | "conn.close()" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "We can also insert many records at once using the executemany() method. We pass a list of records as the argument; each record is a tuple of values. " 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "conn = sqlite3.connect('testdb2.db')\n", 243 | "\n", 244 | "c = conn.cursor()\n", 245 | "\n", 246 | "# Larger example that inserts many records at a time; values follow the column order (student_id,name,surname,birth,weight,height)\n", 247 | "studentsRecords = [(None,'Tom','Silver','1989-11-03',72,182),\n", 248 | " (None,'Adam','Brown','1992-11-03',82,192),\n", 249 | " (None,'Maria','Great','1995-11-03',52,162),]\n", 250 | "\n", 251 | "c.executemany('INSERT INTO students VALUES (?,?,?,?,?,?)', studentsRecords)\n", 252 | "\n", 253 | "# print total number of changed rows\n", 254 | "print(\"number of affected rows: {0}\".format(conn.total_changes))\n", 255 | "\n", 256 | "# Save (commit) the changes\n", 257 | "conn.commit()\n", 258 | "\n", 259 | "# We close the connection and free all resources\n", 260 | "conn.close()" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "# Select rows\n", 268 | "\n", 269 | "To retrieve data after executing a SELECT statement, you can either treat the cursor as an iterator, call the cursor’s fetchone() method to retrieve a single matching row, or call fetchall() to get a list of the matching rows." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "conn = sqlite3.connect('testdb2.db')\n", 279 | "\n", 280 | "c = conn.cursor()\n", 281 | "\n", 282 | "# execute the command below to select name,surname and iterate through the results using the cursor as an iterator\n", 283 | "for row in c.execute(\"SELECT name,surname FROM 
students ORDER BY surname\"):\n", 284 | "    print(row)\n", 285 | "\n", 286 | "c.execute(\"SELECT name,surname FROM students ORDER BY surname\")\n", 287 | "\n", 288 | "# execute the command below to select name,surname,weight,height\n", 289 | "c.execute(\"SELECT name,surname,weight,height FROM students ORDER BY surname\")\n", 290 | "\n", 291 | "# get and print single result\n", 292 | "print(c.fetchone())\n", 293 | "\n", 294 | "\n", 295 | "# execute the command below to select all columns\n", 296 | "c.execute(\"SELECT * from students ORDER BY surname\")\n", 297 | "\n", 298 | "# get all results and assign them to a list; fetchall() returns an empty list if there are no results\n", 299 | "listOfResults=c.fetchall()\n", 300 | "for item in listOfResults:\n", 301 | "    print(item)\n", 302 | "\n", 303 | "\n", 304 | "# commit() is harmless here but not required: SELECT does not modify data\n", 305 | "conn.commit()\n", 306 | "\n", 307 | "# We close the connection and free all resources\n", 308 | "conn.close()" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "If we want to obtain the number of columns and the column names, we should work with Row objects. Once a row factory is set, fetchone() returns a Row object. \n", 316 | "\n", 317 | "REMARK: We must define row_factory in the Connection object as below." 
318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "metadata": {}, 324 | "outputs": [], 325 | "source": [ 326 | "conn = sqlite3.connect('testdb2.db')\n", 327 | "\n", 328 | "# If we want to operate on rows we must define row factory\n", 329 | "conn.row_factory = sqlite3.Row\n", 330 | "\n", 331 | "c = conn.cursor()\n", 332 | "\n", 333 | "c.execute(\"SELECT * from students ORDER BY surname\")\n", 334 | "row=c.fetchone()\n", 335 | "\n", 336 | "# print number of columns\n", 337 | "print(len(row))\n", 338 | "# print values in the first three columns\n", 339 | "print(row[0], row[1], row[2])\n", 340 | "# or address values by column names.\n", 341 | "print(row[\"name\"], row[\"surname\"])\n", 342 | "# print column names\n", 343 | "print(row.keys())\n", 344 | "\n", 345 | "# We close the connection and free all resources\n", 346 | "conn.close()\n" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "# Custom function\n", 354 | "\n", 355 | "The sqlite3 module allows the user to define custom SQL functions. Below is an example of the md5sum(t) function, which returns the MD5 hash (as a hex string) of its input. 
" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": null, 361 | "metadata": {}, 362 | "outputs": [], 363 | "source": [ 364 | "# import required modules\n", 365 | "import hashlib\n", 366 | "\n", 367 | "def md5sum(t):\n", 368 | "    return hashlib.md5(t).hexdigest()\n", 369 | "\n", 370 | "conn = sqlite3.connect(\"testdb2.db\")\n", 371 | "# create_function takes three arguments: \n", 372 | "# the name of the function as used in SQL, the number of arguments it accepts, and the Python callable\n", 373 | "conn.create_function(\"md5\", 1, md5sum)\n", 374 | "c = conn.cursor()\n", 375 | "# the code below stores the MD5 hash of the name (passed as bytes, as hashlib requires) and inserts the row\n", 376 | "c.execute(\"INSERT INTO students VALUES (?,md5(?),?,?,?,?)\",(None,b'Tom','Silver','1989-11-03',72,182))\n", 377 | "\n", 378 | "# print total number of changed rows\n", 379 | "print(\"number of affected rows: {0}\".format(conn.total_changes))\n", 380 | "\n", 381 | "c.execute(\"Select * from students\")\n", 382 | "# get all results and assign them to a list; fetchall() returns an empty list if there are no results\n", 383 | "listOfResults=c.fetchall()\n", 384 | "for item in listOfResults:\n", 385 | "    print(item)\n", 386 | "\n", 387 | "# Save (commit) the changes\n", 388 | "conn.commit()\n", 389 | "\n", 390 | "# We close the connection and free all resources\n", 391 | "conn.close()" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "# Controlling Transactions\n" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "The sqlite3 module opens transactions implicitly before a data-modifying SQL statement (INSERT/UPDATE/DELETE/REPLACE). \n", 406 | "The rollback() method rolls back any changes to the database since the last call to commit()." 
407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "# Exceptions" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "The sqlite3 module provides, among others, the following types of exceptions:\n", 421 | "\n", 422 | "* exception sqlite3.DatabaseError\n", 423 | "\n", 424 | " Exception raised for errors that are related to the database.\n", 425 | "\n", 426 | "* exception sqlite3.IntegrityError\n", 427 | "\n", 428 | " Exception raised when the relational integrity of the database is affected, e.g. a foreign key check fails. \n", 429 | " It is a subclass of DatabaseError.\n", 430 | "\n", 431 | "* exception sqlite3.ProgrammingError\n", 432 | "\n", 433 | " Exception raised for programming errors, e.g. table not found or already exists, \n", 434 | " syntax error in the SQL statement, wrong number of parameters specified, etc. \n", 435 | " It is a subclass of DatabaseError.\n", 436 | "\n", 437 | "REMARK: Remember to close the connection and free resources whether or not an exception occurs.\n", 438 | "When an exception occurs, roll back the changes made in the current transaction.\n" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [ 447 | "con = sqlite3.connect(\"testdb2.db\")\n", 448 | "\n", 449 | "try:\n", 450 | "    # 'with con' commits automatically on success and rolls back if an exception is raised\n", 451 | "    with con:\n", 452 | "        con.execute(\"INSERT INTO students VALUES (?,?,?,?,?,?)\",(None,'Mark','LastGood','1989-11-03',69,174))\n", 453 | "        con.execute(\"INSERT INTO students VALUES (?,?,?,?,?,?)\",(None,'Mark',None,'1989-11-03',69,174))\n", 454 | "        con.execute(\"INSERT INTO students VALUES (?,?,?,?,?,?)\",('Tom',None,'1989-11-03',72,182))\n", 455 | "\n", 456 | "    # When an exception occurs, we use the rollback() method to revert changes\n", 457 | "except sqlite3.IntegrityError:\n", 458 | "    print(\"IntegrityError:couldn't add record with null surname\")\n", 459 | "    con.rollback()\n", 460 | "except sqlite3.ProgrammingError:\n", 461 | "    print(\"ProgrammingError:e.g. wrong number of values supplied\")\n", 462 | "    con.rollback()\n", 463 | "except sqlite3.Error:\n", 464 | "    print(\"Error:general db error\")\n", 465 | "    con.rollback()\n", 466 | "\n", 467 | "# We close the connection and free all resources\n", 468 | "con.close()" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": null, 474 | "metadata": { 475 | "collapsed": true 476 | }, 477 | "outputs": [], 478 | "source": [] 479 | } 480 | ], 481 | "metadata": { 482 | "kernelspec": { 483 | "display_name": "Python 3", 484 | "language": "python", 485 | "name": "python3" 486 | }, 487 | "language_info": { 488 | "codemirror_mode": { 489 | "name": "ipython", 490 | "version": 3 491 | }, 492 | "file_extension": ".py", 493 | "mimetype": "text/x-python", 494 | "name": "python", 495 | "nbconvert_exporter": "python", 496 | "pygments_lexer": "ipython3", 497 | "version": "3.6.2" 498 | } 499 | }, 500 | "nbformat": 4, 501 | "nbformat_minor": 2 502 | } 503 | -------------------------------------------------------------------------------- /notebooks/07_1_intro_visual.ipynb: -------------------------------------------------------------------------------- 1 | {"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.10.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[],"dockerImageVersionId":30587,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Visualizations\n## Matplotlib\nMatplotlib is an essential Python library for creating charts. 
There is a massive number of options and settings. You will see basics and best practices of creating charts, which will allow you to create most charts you need.\n\nThe best introduction to Matplotlib is in this link below, a few examples below are from this official documentation:\n* http://matplotlib.org/faq/usage_faq.html","metadata":{}},{"cell_type":"code","source":"%matplotlib inline\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\nfrom IPython.display import display, HTML","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:38:55.338677Z","start_time":"2021-12-08T10:38:55.083629Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:27.815447Z","iopub.execute_input":"2023-11-30T15:07:27.815957Z","iopub.status.idle":"2023-11-30T15:07:27.823233Z","shell.execute_reply.started":"2023-11-30T15:07:27.815920Z","shell.execute_reply":"2023-11-30T15:07:27.821885Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"#!pip install matplotlib\n","metadata":{"execution":{"iopub.status.busy":"2023-11-30T15:07:27.825561Z","iopub.execute_input":"2023-11-30T15:07:27.826050Z","iopub.status.idle":"2023-11-30T15:07:27.834911Z","shell.execute_reply.started":"2023-11-30T15:07:27.826008Z","shell.execute_reply":"2023-11-30T15:07:27.833550Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"#!conda install -c conda-forge matplotlib","metadata":{"execution":{"iopub.status.busy":"2023-11-30T15:07:27.836718Z","iopub.execute_input":"2023-11-30T15:07:27.837168Z","iopub.status.idle":"2023-11-30T15:07:27.842815Z","shell.execute_reply.started":"2023-11-30T15:07:27.837128Z","shell.execute_reply":"2023-11-30T15:07:27.841820Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Let us begin with a basic example. 
Notice that plt is simply matplotlib.pyplot - matplotlib's object for drawing.","metadata":{}},{"cell_type":"code","source":"# create a series\nx = np.linspace(0, 2, 100)\n# draw it\nplt.plot(x, x, label='Line chart')\n","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:40:08.616103Z","start_time":"2021-12-08T10:40:08.513074Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:27.844948Z","iopub.execute_input":"2023-11-30T15:07:27.845961Z","iopub.status.idle":"2023-11-30T15:07:28.158915Z","shell.execute_reply.started":"2023-11-30T15:07:27.845929Z","shell.execute_reply":"2023-11-30T15:07:28.158100Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now, see what happens if you try to draw two charts, one by one.","metadata":{}},{"cell_type":"code","source":"plt.plot(x, x, label='Linear function')\n\nplt.plot(x, x**2, label='Quadratic function')","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:40:20.928559Z","start_time":"2021-12-08T10:40:20.851047Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:28.160028Z","iopub.execute_input":"2023-11-30T15:07:28.160539Z","iopub.status.idle":"2023-11-30T15:07:28.480657Z","shell.execute_reply.started":"2023-11-30T15:07:28.160502Z","shell.execute_reply":"2023-11-30T15:07:28.479517Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"For some reasons both have been drawn on the same area/canvas. 
Now let us see, how to draw them separately.","metadata":{}},{"cell_type":"code","source":"plt.plot(x, x, label='Linear function')\nplt.show()\n\nplt.plot(x, x**2, label='Quadratic function')\nplt.show()","metadata":{"execution":{"iopub.status.busy":"2023-11-30T15:07:28.482245Z","iopub.execute_input":"2023-11-30T15:07:28.483369Z","iopub.status.idle":"2023-11-30T15:07:29.045597Z","shell.execute_reply.started":"2023-11-30T15:07:28.483334Z","shell.execute_reply":"2023-11-30T15:07:29.044276Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"The first .plot() method creates a new chart (canvas to draw on). It is not shown until the cell has been fully executed or until you tell Python to show the chart (plt.show()). After a plot has been shown, it is closed. See an example below.\n\nfoo.png file is empty, because the plot has been saved after showing it (the buffer had been cleared).","metadata":{}},{"cell_type":"code","source":"plt.plot(x, x, label='Linear function')\nplt.show()\nplt.savefig('foo.png')\n\nplt.plot(x, x**2, label='Quadratic function')\nplt.savefig('foo1.png')\nplt.show()\n","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:42:32.667291Z","start_time":"2021-12-08T10:42:32.458560Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:29.047120Z","iopub.execute_input":"2023-11-30T15:07:29.047578Z","iopub.status.idle":"2023-11-30T15:07:29.606414Z","shell.execute_reply.started":"2023-11-30T15:07:29.047525Z","shell.execute_reply":"2023-11-30T15:07:29.605294Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now, see what happens if you save images between adding series.","metadata":{}},{"cell_type":"code","source":"plt.plot(x, x, label='Linear function')\nplt.savefig('boo.png')\nplt.plot(x, x**2, label='Quadratic 
function')\nplt.savefig('boo1.png')\nplt.show()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:43:11.860915Z","start_time":"2021-12-08T10:43:11.730131Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:29.608011Z","iopub.execute_input":"2023-11-30T15:07:29.608445Z","iopub.status.idle":"2023-11-30T15:07:30.052815Z","shell.execute_reply.started":"2023-11-30T15:07:29.608403Z","shell.execute_reply":"2023-11-30T15:07:30.051696Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"As expected, boo.png contains only one series, and boo1.png contains two. Saving to a file does not clear the buffer.\n\nLet us see how to add a few helpful pieces of information to a chart.","metadata":{}},{"cell_type":"code","source":"plt.plot(x, x, label='Linear function')\nplt.plot(x, x**2, label='Quadratic function')\n\nplt.xlabel('Description of x axis')\nplt.ylabel('Description of y axis')\nplt.title(\"Title of the chart\")\n\nplt.legend()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:43:52.076396Z","start_time":"2021-12-08T10:43:51.970542Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:30.056990Z","iopub.execute_input":"2023-11-30T15:07:30.057338Z","iopub.status.idle":"2023-11-30T15:07:30.419158Z","shell.execute_reply.started":"2023-11-30T15:07:30.057311Z","shell.execute_reply":"2023-11-30T15:07:30.418063Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now it starts to look properly and you can develop this idea further.\n\nIn practice, calling particular elements using the whole matplotlib object is not the best practice. If the chart is simple, it does not pose a problem. However, if you want to have greater control over the chart and combine it with other libraries, e.g. pandas, it is good to know how to control charts in a better way.\n\nMatplotlib splits a chart into two objects, Figure and Axes. 
The first controls the whole canvas (the \"wrapping\" of the chart), and Axes are single charts, which do not have to occupy the whole canvas.\n\nLet us start by creating a figure (canvas) of size 6 by 6 inches.\n* https://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure","metadata":{}},{"cell_type":"code","source":"fig = plt.figure(figsize=(6,6))","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:45:41.854667Z","start_time":"2021-12-08T10:45:41.850685Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:30.420669Z","iopub.execute_input":"2023-11-30T15:07:30.421521Z","iopub.status.idle":"2023-11-30T15:07:30.432200Z","shell.execute_reply.started":"2023-11-30T15:07:30.421458Z","shell.execute_reply":"2023-11-30T15:07:30.431010Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Now fill the empty canvas. In fig.add_subplot(1, 1, 1) the numbers are (respectively) the number of rows and columns the canvas should be split into, and the index of the part of the canvas you want to use. The add_subplot function returns an Axes object (ax), a part of the whole canvas.\n* https://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure.add_subplot\n* https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot\n\nNote that the commands to set axis labels and chart title have changed slightly. 
The method names are a bit different for Ax object than for plt module.","metadata":{}},{"cell_type":"code","source":"ax = fig.add_subplot(1, 1, 1)\nax.plot(x, x, label='Linear function')\nax.plot(x, x**2, label='Quadratic function')\n\nax.set_xlabel('Description of x axis')\nax.set_ylabel('Description of y axis')\nax.set_title(\"Title of the chart\")\n\nax.legend()\n\nfig.savefig(\"fig.png\")\n# fig has been created in another cell, so the chart is not drawn automatically\nfig","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:48:28.803856Z","start_time":"2021-12-08T10:48:28.682825Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:30.433747Z","iopub.execute_input":"2023-11-30T15:07:30.434072Z","iopub.status.idle":"2023-11-30T15:07:31.087691Z","shell.execute_reply.started":"2023-11-30T15:07:30.434043Z","shell.execute_reply":"2023-11-30T15:07:31.084703Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"To better understand why ax is useful, see what happens in this example.","metadata":{}},{"cell_type":"code","source":"\nfig = plt.figure(figsize=(8,8))\n# Almost never use it\nfig.suptitle('Title of the whole chart', fontsize=12)\n\nax = fig.add_subplot(2, 1, 1)\n\nax.plot(x, x**2, label='Quadratic function')\nax.set_xlabel('Description of x axis')\nax.set_ylabel('Description of y axis')\nax.set_title(\"Title of chart 1\")\nax.legend()\n\nax = fig.add_subplot(2, 1, 2)\nax.plot(x, x, label='Linear function')\nax.set_xlabel('Description of x axis')\nax.set_ylabel('Description of y axis')\nax.set_title(\"Title of chart 2\")\nax.legend()\n\n# tight_layout tries to align all elements in such a way that they do not overlap each 
other.\nfig.tight_layout()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:50:49.531020Z","start_time":"2021-12-08T10:50:49.330298Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:31.089291Z","iopub.execute_input":"2023-11-30T15:07:31.091618Z","iopub.status.idle":"2023-11-30T15:07:31.870200Z","shell.execute_reply.started":"2023-11-30T15:07:31.091568Z","shell.execute_reply":"2023-11-30T15:07:31.869019Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"Subplots and axes may be created in yet another, possibly more convenient way. The example is virtually identical to the one above, but for readability we join together the x axis (sharex) and get rid of one label and title.","metadata":{}},{"cell_type":"code","source":"fig, ax = plt.subplots(1, 2,figsize=(8,4), sharey=True)\n\nfig.suptitle('Title of the whole chart', fontsize=12)\n\nax[0].plot(x, x**2, label='Quadratic function')\n# ax[0].set_xlabel('Description of x axis')\nax[0].set_ylabel('Description of y axis')\nax[0].set_title(\"Title of chart 1\")\nax[0].legend()\n\n\nax[1].plot(x, x, label='Linear chart')\nax[1].set_xlabel('Description of x axis')\nax[1].set_ylabel('Description of y axis')\n# ax[1].set_title(\"Title of chart 2\")\nax[1].legend()\nfig.tight_layout(pad=3)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:55:41.942328Z","start_time":"2021-12-08T10:55:41.747125Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:31.871669Z","iopub.execute_input":"2023-11-30T15:07:31.872539Z","iopub.status.idle":"2023-11-30T15:07:32.421053Z","shell.execute_reply.started":"2023-11-30T15:07:31.872497Z","shell.execute_reply":"2023-11-30T15:07:32.419884Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"fig, ax = plt.subplots(2, 1,figsize=(8,8), sharex=True)\n\nfig.suptitle('Title of the whole chart', fontsize=12)\n\nax[0].plot(x, x**2, label='Quadratic function')\n# ax[0].set_xlabel('Description of x 
axis')\nax[0].set_ylabel('Description of y axis')\nax[0].set_title(\"Title of chart 1\")\nax[0].legend()\n\n\nax[1].plot(x, x, label='Linear chart')\nax[1].set_xlabel('Description of x axis')\nax[1].set_ylabel('Description of y axis')\n# ax[1].set_title(\"Title of chart 2\")\nax[1].legend()\nfig.tight_layout(pad=3)","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:53:08.821305Z","start_time":"2021-12-08T10:53:08.589033Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:32.422607Z","iopub.execute_input":"2023-11-30T15:07:32.422930Z","iopub.status.idle":"2023-11-30T15:07:33.184793Z","shell.execute_reply.started":"2023-11-30T15:07:32.422902Z","shell.execute_reply":"2023-11-30T15:07:33.183541Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"You may arrange charts in any grid layout. By the way, note that ax.set accepts several properties at once as keyword arguments, so you can also pass them as an unpacked dictionary.","metadata":{}},{"cell_type":"code","source":"fig, ax = plt.subplots(2, 2, figsize=(8,8), gridspec_kw={'width_ratios':[4, 1], 'height_ratios':[4, 1]})\n\nax[0,0].plot(x, x**2, label='Quadratic function')\nax[0,0].set(**{\"xlabel\":'Description of x axis', \"ylabel\":'Description of y axis', \"title\":'Title of chart 1'})\n# ax[0,0].set_xlabel('Description of x axis')\n# ax[0,0].set_ylabel('Description of y axis')\n# ax[0,0].set_title(\"Title of chart 1\")\nax[0,0].legend()\n\n\nax[1,0].plot(x, x, label='Linear function')\nax[1,0].set_xlabel('Description of x axis')\nax[1,0].set_ylabel('Description of y axis')\nax[1,0].set_title(\"Title of chart 2\")\n\n\n\nax[0,1].plot(x, x, label='Linear function')\nax[0,1].set_xlabel('Description of x axis')\nax[0,1].set_ylabel('Description of y axis')\nax[0,1].set_title(\"Title of chart 3\")\n\n# In this case we do not want anything in the bottom right\n# so we turn off drawing the axes and add nothing to the 
chart.\nax[1,1].axis(\"off\")\nfig.tight_layout()","metadata":{"ExecuteTime":{"end_time":"2021-12-08T10:57:07.065937Z","start_time":"2021-12-08T10:57:06.820058Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:33.186903Z","iopub.execute_input":"2023-11-30T15:07:33.187698Z","iopub.status.idle":"2023-11-30T15:07:34.295748Z","shell.execute_reply.started":"2023-11-30T15:07:33.187648Z","shell.execute_reply":"2023-11-30T15:07:34.294549Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"In practice you may draw any shape of a canvas freely using add_axes, even shapes overlapping other ones.\n* https://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure.add_axes","metadata":{}},{"cell_type":"code","source":"fig = plt.figure(figsize=(8,5))\nax1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])\nax2 = fig.add_axes([0.72, 0.72, 0.16, 0.16])\nax1.plot(x, np.sin(8*x), label='Line chart of sine')\nax2.plot(x, x, label='Linear function')\n","metadata":{"execution":{"iopub.status.busy":"2023-11-30T15:10:35.914628Z","iopub.execute_input":"2023-11-30T15:10:35.915054Z","iopub.status.idle":"2023-11-30T15:10:36.388312Z","shell.execute_reply.started":"2023-11-30T15:10:35.915024Z","shell.execute_reply":"2023-11-30T15:10:36.387195Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"fig = plt.figure(figsize=(8,5))\nax1 = fig.add_axes([0, 0, 0.8, 0.8])\nax2 = fig.add_axes([0.62, 0.1, 0.16, 0.16])\nax1.plot(x, np.sin(8*x), label='Line chart of sine')\nax2.plot(x, x, label='Linear 
function')\n","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:02:41.060697Z","start_time":"2021-12-08T11:02:40.925066Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:34.754029Z","iopub.execute_input":"2023-11-30T15:07:34.754730Z","iopub.status.idle":"2023-11-30T15:07:35.212804Z","shell.execute_reply.started":"2023-11-30T15:07:34.754688Z","shell.execute_reply":"2023-11-30T15:07:35.211673Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"fig = plt.figure(figsize=(8,5))\nax1 = fig.add_axes([0, 0, 0.8, 0.8])\nax2 = fig.add_axes([0.8, 0.8, 0.2, 0.2])\nax1.plot(x, np.sin(8*x), label='Line chart of sine')\nax2.plot(x, x, label='Linear function')\n","metadata":{"ExecuteTime":{"end_time":"2021-12-08T11:03:49.202415Z","start_time":"2021-12-08T11:03:49.070669Z"},"execution":{"iopub.status.busy":"2023-11-30T15:07:35.214077Z","iopub.execute_input":"2023-11-30T15:07:35.214407Z","iopub.status.idle":"2023-11-30T15:07:35.683237Z","shell.execute_reply.started":"2023-11-30T15:07:35.214379Z","shell.execute_reply":"2023-11-30T15:07:35.682225Z"},"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"","metadata":{},"outputs":[],"execution_count":null}]} 2 | -------------------------------------------------------------------------------- /notebooks/07_2_intro_visual_seaborn.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Visualizations - Seaborn\n", 8 | "Seaborn is a commonly used library for visualizations (http://seaborn.pydata.org/index.html). It makes it fast and easy to create pretty charts. What is more, it has a great documentation full of interesting and inspiring examples. Examples in this notebook are taken from this gallery (http://seaborn.pydata.org/examples/index.html).\n", 9 | "\n", 10 | "%matplotlib command tells the notebook to show charts as output." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": { 17 | "ExecuteTime": { 18 | "end_time": "2021-12-08T11:19:27.236730Z", 19 | "start_time": "2021-12-08T11:19:27.232751Z" 20 | }, 21 | "execution": { 22 | "iopub.execute_input": "2023-12-07T14:45:03.582334Z", 23 | "iopub.status.busy": "2023-12-07T14:45:03.581513Z", 24 | "iopub.status.idle": "2023-12-07T14:45:05.838969Z", 25 | "shell.execute_reply": "2023-12-07T14:45:05.837439Z", 26 | "shell.execute_reply.started": "2023-12-07T14:45:03.582254Z" 27 | }, 28 | "trusted": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "%matplotlib inline\n", 33 | "import numpy as np\n", 34 | "import seaborn as sns\n", 35 | "import matplotlib\n", 36 | "import matplotlib.pyplot as plt\n", 37 | "import pandas as pd" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Histograms\n", 45 | "You can get the simplest histograms using distplot (deprecated since seaborn 0.11 in favor of histplot and displot, but still used in the examples here). First, draw numbers from the normal distribution, then modify the default size of the image. Seaborn is built on top of matplotlib, the basic Python plotting library, so matplotlib settings apply to Seaborn charts as well." 
46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "ExecuteTime": { 53 | "end_time": "2021-12-08T11:19:31.541499Z", 54 | "start_time": "2021-12-08T11:19:31.360892Z" 55 | }, 56 | "execution": { 57 | "iopub.execute_input": "2023-12-07T14:47:27.759264Z", 58 | "iopub.status.busy": "2023-12-07T14:47:27.758878Z", 59 | "iopub.status.idle": "2023-12-07T14:47:28.543544Z", 60 | "shell.execute_reply": "2023-12-07T14:47:28.542324Z", 61 | "shell.execute_reply.started": "2023-12-07T14:47:27.759234Z" 62 | }, 63 | "trusted": true 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "rs = np.random.RandomState(10)\n", 68 | "d = rs.normal(size=200)\n", 69 | "\n", 70 | "\n", 71 | "matplotlib.rcParams['figure.figsize'] = (5.0, 2.5)\n", 72 | "\n", 73 | "sns.distplot(d, kde=False)\n", 74 | "plt.show()\n", 75 | "\n", 76 | "sns.distplot(d, hist=False)\n", 77 | "plt.show()\n", 78 | "\n", 79 | "sns.distplot(d)\n", 80 | "plt.show()" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "Default colors are quite readable and nice to look at. You have a few default styles available to use. If you do not like dark-grey color scheme, you may use bright one. Additionally, in the last chart you may fill the area under the distribution curve. despine gets rid of chart borders, which is added by default to \"white\" style." 
88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": { 94 | "ExecuteTime": { 95 | "end_time": "2021-12-08T11:20:01.641252Z", 96 | "start_time": "2021-12-08T11:20:01.520886Z" 97 | }, 98 | "execution": { 99 | "iopub.execute_input": "2023-12-07T14:48:55.503889Z", 100 | "iopub.status.busy": "2023-12-07T14:48:55.503457Z", 101 | "iopub.status.idle": "2023-12-07T14:48:55.849044Z", 102 | "shell.execute_reply": "2023-12-07T14:48:55.847536Z", 103 | "shell.execute_reply.started": "2023-12-07T14:48:55.503857Z" 104 | }, 105 | "trusted": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "sns.set(style=\"white\", palette=\"pastel\", color_codes=True)\n", 110 | "plt.figure(figsize=(5.0, 2.5))\n", 111 | "sns.distplot(d, hist=True, color=\"b\", kde_kws={\"shade\": True})\n", 112 | "sns.despine(left=True, bottom=True)\n", 113 | "plt.show()" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Charts of two variables\n", 121 | "Seaborn offers many ways to draw charts of two variables. Apart from the usual scatterplot (optionally extended with marginal histograms), you can easily generate a hexbin heatmap or a KDE (kernel density estimate) chart." 
122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "ExecuteTime": { 129 | "end_time": "2021-12-08T11:20:17.291880Z", 130 | "start_time": "2021-12-08T11:20:15.849411Z" 131 | }, 132 | "execution": { 133 | "iopub.execute_input": "2023-12-07T14:51:35.684355Z", 134 | "iopub.status.busy": "2023-12-07T14:51:35.683960Z", 135 | "iopub.status.idle": "2023-12-07T14:51:39.180385Z", 136 | "shell.execute_reply": "2023-12-07T14:51:39.179497Z", 137 | "shell.execute_reply.started": "2023-12-07T14:51:35.684325Z" 138 | }, 139 | "scrolled": true, 140 | "trusted": true 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "# jointplot takes height; the older size argument was renamed\n", 145 | "\n", 146 | "data = np.random.multivariate_normal([1,4],[[.5,.3], [.3,.8]],1000).T\n", 147 | "sns.set(style=\"white\", palette=\"muted\", color_codes=True)\n", 148 | "sns.jointplot(x=data[0], y=data[1], height=4)\n", 149 | "\n", 150 | "sns.jointplot(x=data[0], y=data[1], kind=\"hex\", height=4)\n", 151 | "sns.jointplot(x=data[0], y=data[1], kind=\"kde\", height=4)\n", 152 | "plt.show()" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "## Many series on one chart\n", 160 | "The example below shows how to add more series to one chart." 
161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": { 167 | "ExecuteTime": { 168 | "end_time": "2021-12-08T11:22:03.204824Z", 169 | "start_time": "2021-12-08T11:22:03.017356Z" 170 | }, 171 | "execution": { 172 | "iopub.execute_input": "2023-12-07T14:59:25.124779Z", 173 | "iopub.status.busy": "2023-12-07T14:59:25.124402Z", 174 | "iopub.status.idle": "2023-12-07T14:59:25.717870Z", 175 | "shell.execute_reply": "2023-12-07T14:59:25.716778Z", 176 | "shell.execute_reply.started": "2023-12-07T14:59:25.124751Z" 177 | }, 178 | "scrolled": true, 179 | "trusted": true 180 | }, 181 | "outputs": [], 182 | "source": [ 183 | "# Generate data\n", 184 | "iris = sns.load_dataset(\"iris\")\n", 185 | "setosa = iris.query(\"species == 'setosa'\")\n", 186 | "virginica = iris.query(\"species == 'virginica'\")\n", 187 | "\n", 188 | "sns.set(style=\"darkgrid\")\n", 189 | "# \"Break up\" the chart object to two variables, ax is related to our data series.\n", 190 | "f, ax = plt.subplots(1, 1, figsize=(6, 4))\n", 191 | "# Make sure that both series have the same scaling\n", 192 | "ax.set_aspect(\"equal\")\n", 193 | "\n", 194 | "#Add further series. 
Do not shade the lowest level.\n", 195 | "# ax = sns.kdeplot(setosa.sepal_width, setosa.sepal_length,\n", 196 | "# cmap=\"Reds\", shade=True, shade_lowest=False)\n", 197 | "# ax = sns.kdeplot(virginica.sepal_width, virginica.sepal_length,\n", 198 | "# cmap=\"Blues\", shade=True, shade_lowest=False)\n", 199 | "\n", 200 | "# Ensure same scaling for both series\n", 201 | "ax.set_aspect(\"equal\")\n", 202 | "\n", 203 | "# Plot the KDEs with the current syntax: fill and thresh replace the deprecated shade and shade_lowest\n", 204 | "sns.kdeplot(x=setosa.sepal_width, y=setosa.sepal_length,\n", 205 | " cmap=\"Reds\", fill=True, thresh=0.05, ax=ax)\n", 206 | "sns.kdeplot(x=virginica.sepal_width, y=virginica.sepal_length,\n", 207 | " cmap=\"Blues\", fill=True, thresh=0.05, ax=ax)\n", 208 | "\n", 209 | "\n", 210 | "# Add text in chosen places.\n", 211 | "red = sns.color_palette(\"Reds\")[-2]\n", 212 | "blue = sns.color_palette(\"Blues\")[-2]\n", 213 | "ax.text(2.5, 8.2, \"virginica\", size=12, color=blue)\n", 214 | "ax.text(3.8, 4.5, \"setosa\", size=12, color=red)\n", 215 | "plt.show()" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "## Violin charts and boxplot\n", 223 | "Violin charts have become quite popular recently. See two Seaborn examples. They are essentially classic boxplots extended with an estimated KDE (kernel density estimate). The first example shows how to draw multiple distributions in an attractive way. Note that every violin chart is symmetric, which wastes half of the space. The second example uses the split option, which lets you draw each half for a different subgroup." 
224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": { 230 | "ExecuteTime": { 231 | "end_time": "2021-12-08T11:23:00.142300Z", 232 | "start_time": "2021-12-08T11:22:59.945668Z" 233 | }, 234 | "execution": { 235 | "iopub.execute_input": "2023-12-07T15:07:05.615139Z", 236 | "iopub.status.busy": "2023-12-07T15:07:05.613783Z", 237 | "iopub.status.idle": "2023-12-07T15:07:08.552979Z", 238 | "shell.execute_reply": "2023-12-07T15:07:08.551695Z", 239 | "shell.execute_reply.started": "2023-12-07T15:07:05.615088Z" 240 | }, 241 | "trusted": true 242 | }, 243 | "outputs": [], 244 | "source": [ 245 | "sns.set(style=\"whitegrid\")\n", 246 | "\n", 247 | "# Load the example dataset of brain network correlations\n", 248 | "df = sns.load_dataset(\"brain_networks\", header=[0, 1, 2], index_col=0)\n", 249 | "\n", 250 | "# Pull out a specific subset of networks\n", 251 | "used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]\n", 252 | "used_columns = (df.columns.get_level_values(\"network\")\n", 253 | " .astype(int)\n", 254 | " .isin(used_networks))\n", 255 | "df = df.loc[:, used_columns]\n", 256 | "\n", 257 | "# Compute the correlation matrix and average over networks\n", 258 | "corr_df = df.corr().groupby(level=\"network\").mean()\n", 259 | "corr_df.index = corr_df.index.astype(int)\n", 260 | "corr_df = corr_df.sort_index().T\n", 261 | "\n", 262 | "# Set up the matplotlib figure\n", 263 | "f, ax = plt.subplots(figsize=(10, 5))\n", 264 | "\n", 265 | "# Draw a violinplot with a narrower bandwidth than the default\n", 266 | "sns.violinplot(data=corr_df, palette=\"viridis\", bw=.2, cut=1, linewidth=1, ax=ax)\n", 267 | "\n", 268 | "# Finalize the figure\n", 269 | "ax.set(ylim=(-.7, 1.05))\n", 270 | "sns.despine(left=True, bottom=True)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": { 277 | "ExecuteTime": { 278 | "end_time": "2021-12-08T11:23:51.734543Z", 279 | "start_time": 
"2021-12-08T11:23:51.594036Z" 280 | }, 281 | "execution": { 282 | "iopub.execute_input": "2023-12-07T15:07:47.582669Z", 283 | "iopub.status.busy": "2023-12-07T15:07:47.580873Z", 284 | "iopub.status.idle": "2023-12-07T15:07:48.885517Z", 285 | "shell.execute_reply": "2023-12-07T15:07:48.884361Z", 286 | "shell.execute_reply.started": "2023-12-07T15:07:47.582586Z" 287 | }, 288 | "trusted": true 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "sns.set(style=\"whitegrid\", palette=\"pastel\", color_codes=True)\n", 293 | "\n", 294 | "# Load the example tips dataset\n", 295 | "tips = sns.load_dataset(\"tips\")\n", 296 | "plt.figure(figsize=(8.0, 4))\n", 297 | "# Draw a nested violinplot and split the violins for easier comparison\n", 298 | "sns.violinplot(x=\"day\", y=\"total_bill\", hue=\"sex\", data=tips, split=True,\n", 299 | " inner=\"quart\", palette={\"Male\": \"b\", \"Female\": \"y\"})\n", 300 | "sns.despine(left=True)" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "Of course, the classic boxplot is also available." 
308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": null, 313 | "metadata": { 314 | "ExecuteTime": { 315 | "end_time": "2021-12-08T10:32:56.793370Z", 316 | "start_time": "2021-12-08T10:32:55.696960Z" 317 | }, 318 | "execution": { 319 | "iopub.execute_input": "2023-12-07T15:11:29.444431Z", 320 | "iopub.status.busy": "2023-12-07T15:11:29.443821Z", 321 | "iopub.status.idle": "2023-12-07T15:11:31.767971Z", 322 | "shell.execute_reply": "2023-12-07T15:11:31.766848Z", 323 | "shell.execute_reply.started": "2023-12-07T15:11:29.444390Z" 324 | }, 325 | "trusted": true 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "sns.set(style=\"ticks\", palette=\"muted\", color_codes=True)\n", 330 | "\n", 331 | "# Load the example planets dataset\n", 332 | "planets = sns.load_dataset(\"planets\")\n", 333 | "\n", 334 | "plt.figure(figsize=(10.0, 6))\n", 335 | "# Plot the orbital period with horizontal boxes\n", 336 | "ax = sns.boxplot(x=\"distance\", y=\"method\", data=planets,\n", 337 | " whis=np.inf, color=\"c\")\n", 338 | "\n", 339 | "# Add in points to show each observation\n", 340 | "sns.stripplot(x=\"distance\", y=\"method\", data=planets,\n", 341 | " jitter=True, size=3, color=\".3\", linewidth=0)\n", 342 | "\n", 343 | "\n", 344 | "# Make the quantitative axis logarithmic\n", 345 | "ax.set_xscale(\"log\")\n", 346 | "sns.despine(trim=True)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "## Additional settings\n", 354 | "Seaborn is perfect to create charts quickly and it is usually fine not to change a lot of settings. 
However, if you feel like it, you may adjust the chart for your particular needs.\n", 355 | "* http://seaborn.pydata.org/tutorial/aesthetics.html\n", 356 | "* http://seaborn.pydata.org/tutorial/color_palettes.html" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "## Plotly examples\n", 371 | "\n", 372 | "Let's have a look at some interesting Plotly examples. More can be found here: https://plotly.com/python/basic-charts/" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [ 381 | "#!pip install plotly==5.24.1" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": null, 387 | "metadata": {}, 388 | "outputs": [], 389 | "source": [ 390 | "import plotly.express as px\n", 391 | "df = px.data.iris()\n", 392 | "fig = px.scatter(df, x=\"sepal_width\", y=\"sepal_length\", color=\"species\",\n", 393 | " size='petal_length', hover_data=['petal_width'])\n", 394 | "fig.show(renderer='iframe')" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": null, 400 | "metadata": {}, 401 | "outputs": [], 402 | "source": [ 403 | "\n", 404 | "df = px.data.tips()\n", 405 | "fig = px.sunburst(df, path=['sex', 'day', 'time'], values='total_bill', color='time')\n", 406 | "fig.show(renderer='iframe')" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')\n", 416 | "\n", 417 | "fig = px.density_mapbox(df, lat='Latitude', lon='Longitude', z='Magnitude', radius=10,\n", 418 | " center=dict(lat=0, lon=180), zoom=0,\n", 419 | " mapbox_style=\"open-street-map\")\n", 420 | 
"fig.show(renderer='iframe')" 421 | ] 422 | } 423 | ], 424 | "metadata": { 425 | "kaggle": { 426 | "accelerator": "none", 427 | "dataSources": [], 428 | "dockerImageVersionId": 30587, 429 | "isGpuEnabled": false, 430 | "isInternetEnabled": true, 431 | "language": "python", 432 | "sourceType": "notebook" 433 | }, 434 | "kernelspec": { 435 | "display_name": "Python 3", 436 | "language": "python", 437 | "name": "python3" 438 | }, 439 | "language_info": { 440 | "codemirror_mode": { 441 | "name": "ipython", 442 | "version": 3 443 | }, 444 | "file_extension": ".py", 445 | "mimetype": "text/x-python", 446 | "name": "python", 447 | "nbconvert_exporter": "python", 448 | "pygments_lexer": "ipython3", 449 | "version": "3.10.12" 450 | } 451 | }, 452 | "nbformat": 4, 453 | "nbformat_minor": 4 454 | } 455 | -------------------------------------------------------------------------------- /notebooks/09_Webb_apps_with_Python_-_Bottle.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Web applications with Python - Bottle\n", 8 | "## Introduction to web applications\n", 9 | "\n", 10 | "Web applications or websites can be created in multiple ways. Both when it comes to the engine/software (PHP, Python, Ruby etc.) and architecture of the application. In practice, Model View Controller (MVC) or Model View Presenter (MVP, sometimes called a successor to MVC) are the most common architectures. You should know how MVC looks like and create applications according to this rule.\n", 11 | "\n", 12 | "Model View Controller:\n", 13 | "* Model - responsible for data structure. Model represents knowledge/data. A single table or a complex database may be a model.\n", 14 | "* View - View is the way of displaying a model. 
View is responsible for presenting data to the user.\n", 15 | "* Controller - engine which decides what to do with the model (query the model, modify the model, display view with model's data)\n", 16 | "\n", 17 | "Thorough understanding of the philosophy behind each architecture is mostly unnecessary for beginners. However, you should remember that data storage (model), actions connected with data and reacting to user's input (controller) and way of displaying data (view) are three separate layers which should not be combined. It is not particularly important if you choose MVC or MVP.\n", 18 | "\n", 19 | "Read more here:\n", 20 | "* https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller\n", 21 | "* https://blog.codinghorror.com/understanding-model-view-controller/\n", 22 | "* https://www.codeproject.com/Articles/288928/Differences-between-MVC-and-MVP-for-Beginners" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Webservers, frameworks, template languages etc.\n", 30 | "One of the more common solutions is LAMP - Linux, Apache, MySQL, and PHP. In Linux operating system, webserver communicates with the net, PHP prepares the content from the MySQL database. In practice solutions/libraries/frameworks make our life easier, so that we do not need to write everything from scratch.\n", 31 | "\n", 32 | "In case of Python, Web Server Gateway Interface (WSGI) is usually used. It allows you to combine Python applications with webserver capabilities (communication with web browsers). Fortunately you do not have to understand every part of the webserver when using any Python framework. 
For a beginner, it \"just works\".\n", 33 | "\n", 34 | "If you have a working Python installation with a framework, you can publish your application immediately.\n", 35 | "\n", 36 | "See also:\n", 37 | "* https://www.fullstackpython.com/wsgi-servers.html" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Python frameworks\n", 45 | "There are many popular Python frameworks. The most commonly used are probably Bottle, Flask, and Django. Bottle, which will be your starting point, is the smallest and easiest solution, well suited for beginners. Flask is a more advanced alternative, and Django truly is the Swiss Army knife of web frameworks. Django is used by large websites (Instagram, Washington Post, Pinterest etc.). To sum up, Bottle is a good first step in creating web applications and opens the door to the larger frameworks." 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Bottle\n", 53 | "Bottle is a small and convenient framework in which you can create interesting applications fast. Its most distinctive feature is that the whole framework is a single-file module with no dependencies outside the Python standard library, and a small application is typically written as one script. You can of course split your code with imports, but Bottle offers little built-in structure, which makes it less suitable for large projects. 
In a moment you will see how much you can achieve with a simple script.\n", 54 | "\n", 55 | "* https://bottlepy.org/docs/dev/" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "### The simplest application\n", 63 | "The simplest application looks like this:\n", 64 | "```python\n", 65 | "from bottle import route, run, template\n", 66 | "\n", 67 | "@route('/hello/<name>')\n", 68 | "def index(name):\n", 69 | " return template('Hello {{name}}!', name=name)\n", 70 | "\n", 71 | "run(host='localhost', port=8080)\n", 72 | "```\n", 73 | "\n", 74 | "Here the Python script imports parts of Bottle and defines a route (an address within the server) with a Python function assigned to it. The function returns a template (view), which in this case is a simple one-line piece of HTML.\n", 75 | "\n", 76 | "Note that to create web applications in Python you still need the basics of HTML, CSS and JavaScript. Fortunately you will gain this knowledge in practice, by copying and modifying ready-made solutions at the beginning.\n", 77 | "\n", 78 | "If you want to learn HTML and CSS it is a good idea to visit online tutorials, for example:\n", 79 | "* https://learn.shayhowe.com/html-css/\n", 80 | "* https://learn.shayhowe.com/advanced-html-css/" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## Extending an application\n", 95 | "\n", 96 | "See what an application with two subpages looks like:\n", 97 | "```python\n", 98 | "from bottle import route, run, template\n", 99 | "\n", 100 | "@route('/hello/<name>')\n", 101 | "def hello(name):\n", 102 | " return template('Hello {{name}}!', name=name)\n", 103 | "\n", 104 | "@route('/bye/<name>')\n", 105 | "def bye(name):\n", 106 | " return template('Bye, Bye {{name}}!', name=name)\n", 107 | "\n", 108 | "run(host='localhost', port=8080)\n", 109 
| "```\n", 110 | "\n", 111 | "As you can see, it was enough to add a few lines. Now let us describe the elements.\n", 112 | "\n", 113 | "```python\n", 114 | "@route('/hello/<name>')\n", 115 | "```\n", 116 | "This decorator binds the function defined below it to a URL. The function is called when you access the page [our site]/hello/<name>, and the <name> part of the address is passed to the function as the name argument.\n", 117 | "\n", 118 | "If you access the site:\n", 119 | "\n", 120 | "[our site]/hello/Maciej\n", 121 | "\n", 122 | "You will see:\n", 123 | "\n", 124 | "\"Hello Maciej!\"\n", 125 | "\n", 126 | "What does the function return? It passes the \"name\" variable to the `Hello {{name}}!` view under the key \"name\", so that the view knows what to put in the placeholder.\n", 127 | "\n", 128 | "```python\n", 129 | " return template('Hello {{name}}!', name=name)\n", 130 | "```\n" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Basic application\n", 138 | "You can see the most important parts of the application in the example in the 08_Webapp directory. Let us start analyzing the server.py file, beginning at the end.\n", 139 | "\n", 140 | "```python\n", 141 | "@app.route('/')\n", 142 | "@app.route('/index')\n", 143 | "@app.route('/index/')\n", 144 | "@app.route('/index/<message>')\n", 145 | "def index(message=''):\n", 146 | " loginName = checkAuth()\n", 147 | " messDict = {'error': \"Something went wrong\",\n", 148 | " 'ok': \"Everything is ok.\"}\n", 149 | " return template('index', message=messDict.get(message, \"\"), loginName=loginName)\n", 150 | "```\n", 151 | "\n", 152 | "You define one index function for several addresses. This function calls checkAuth() (see below). If everything went fine, it renders the 'index' template and passes two variables to it, message and loginName.\n", 153 | "\n", 154 | "Note that the template argument here is a plain name rather than inline markup. In that case, Bottle looks for a matching file in the templates directory. 
Look at the contents:\n", 155 | "\n", 156 | "```html\n", 157 | "% rebase('base.tpl', title='Python')\n", 158 | "
\n", 159 | "\t
\n", 160 | "\t

{{message}}

\n", 161 | "\tHello {{loginName}}\n", 162 | "\t
\n", 163 | "
\n", 164 | "```\n", 165 | "\n", 166 | "Contents of index.html are self-explanatory. You may expect that contents of variables will be inserted into {{message}} and {{loginName}}.\n", 167 | "\n", 168 | "% rebase('base.tpl', title='Python') makes contents of the template being inserted into base.tpl, so that you may keep common parts of the application in this template (e.g. navbar)." 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "collapsed": true 175 | }, 176 | "source": [ 177 | "### Authorization\n", 178 | "Look what checkAuth() does.\n", 179 | "\n", 180 | "```python\n", 181 | "def checkAuth():\n", 182 | " loginName = request.get_cookie(\"user\", secret=secretKey)\n", 183 | " randStr = request.get_cookie(\"randStr\", secret=secretKey)\n", 184 | " log.info(str(loginName) + ' ' + request.method + ' ' +\n", 185 | " request.url + ' ' + request.environ.get('REMOTE_ADDR'))\n", 186 | " if (loginName in users) and (users[loginName].get(\"randStr\", \"\") == randStr) and (users[loginName][\"loggedIn\"] == True) and (time.time() - users[loginName][\"lastSeen\"] < 3600):\n", 187 | " users[loginName][\"lastSeen\"] = time.time()\n", 188 | " return loginName\n", 189 | " return redirect('/login')\n", 190 | "```\n", 191 | "\n", 192 | "\n", 193 | "* At the beginning, try to access an encrypted cookie file for a given user.\n", 194 | "* Log in a file that a user tried to do something at our website.\n", 195 | "* Check if they are actually logged in.\n", 196 | "* If yes, update their activity time and return loginName.\n", 197 | "* If no, redirect them to login page.\n" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "### Logging in\n", 205 | "Logging in part should be rather easy.\n", 206 | "\n", 207 | "```python\n", 208 | "@app.route('/login')\n", 209 | "@app.route('/login/')\n", 210 | "@app.route('/login', method='POST')\n", 211 | "def login():\n", 212 | " loginName = 
request.forms.get('login_name', default=False)\n", 213 | "    password = request.forms.get('password', default=False)\n", 214 | "    randStr = ''.join(random.choice(\n", 215 | "        string.ascii_uppercase + string.digits) for _ in range(18))\n", 216 | "    log.info(str(loginName) + ' ' + request.method + ' ' +\n", 217 | "             request.url + ' ' + request.environ.get('REMOTE_ADDR'))\n", 218 | "    if (loginName in users) and users[loginName][\"password\"] == password:\n", 219 | "        response.set_cookie(\"user\", loginName, secret=secretKey)\n", 220 | "        response.set_cookie(\"randStr\", randStr, secret=secretKey)\n", 221 | "        users[loginName][\"loggedIn\"] = True\n", 222 | "        users[loginName][\"randStr\"] = randStr\n", 223 | "        users[loginName][\"lastSeen\"] = time.time()\n", 224 | "        \n", 225 | "        redirect('/index')\n", 226 | "        return True\n", 227 | "    else:\n", 228 | "        return template('login')\n", 229 | "    return template('login')\n", 230 | "```\n", 231 | "\n", 232 | "* @app.route('/login', method='POST') tells Bottle that this function also handles POST requests (data submitted from a form).\n", 233 | "* Try to access data from the form: loginName = request.forms.get('login_name', default=False)\n", 234 | "* Generate a random string for the user. It protects us against forged or outdated cookies.\n", 235 | "* Log the activity in a log file.\n", 236 | "* Check if the password is correct. Never store passwords in plain text; in production, always store hashes, see https://crackstation.net/hashing-security.htm." 
237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "### Other\n", 244 | "The other parts are simple and do not require changes.\n", 245 | "```python\n", 246 | "@app.route('/static/:path#.+#', name='static')\n", 247 | "def static(path):\n", 248 | "    return static_file(path, root='./static')\n", 249 | "```\n", 250 | "Tell Bottle that some files are static: they are served as-is from the ./static directory instead of being generated by Python code.\n", 251 | "\n", 252 | "```python\n", 253 | "app = Bottle()\n", 254 | "```\n", 255 | "Create an instance of the application.\n", 256 | "\n", 257 | "```python\n", 258 | "from users import users\n", 259 | "```\n", 260 | "Read user data from a dictionary (it should eventually be a database). The dictionary of users also lets you store per-user data for the session.\n", 261 | "\n", 262 | "```python\n", 263 | "secretKey = \"SDMDSIUDSFYODS&TTFS987f9ds7f8sd6DFOUFYWE&FY\"\n", 264 | "```\n", 265 | "Define a secret key. This key should be kept secret, because it is used to sign cookies. 
The key should be unique (different) for every application.\n", 266 | "\n", 267 | "```python\n", 268 | "log = logging.getLogger('bottle')\n", 269 | "log.setLevel('INFO')\n", 270 | "h = logging.handlers.TimedRotatingFileHandler(\n", 271 | "    'logs/nlog', when='midnight', backupCount=9999)\n", 272 | "f = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s')\n", 273 | "h.setFormatter(f)\n", 274 | "log.addHandler(h)\n", 275 | "```\n", 276 | "Define how logging works.\n", 277 | "Read more about logging:\n", 278 | "* https://docs.python.org/3/library/logging.html\n", 279 | "* https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": { 285 | "collapsed": true 286 | }, 287 | "source": [ 288 | "## Streamlit\n", 289 | "\n", 290 | "Streamlit is a simple library for rapid prototyping and for building quick web apps in Python.\n", 291 | "\n", 292 | "You can read more about it and see examples of its capabilities on the project website: https://streamlit.io/\n", 293 | "\n", 294 | "Below is a simple example of a web app built with Streamlit, using just a few lines of code.\n", 295 | "\n", 296 | "Copy it into a file named streamlit_app.py and then run it using `streamlit run streamlit_app.py`\n", 297 | "\n", 298 | "```python\n", 299 | "\n", 300 | "# streamlit_app.py\n", 301 | "\n", 302 | "import streamlit as st\n", 303 | "import pandas as pd\n", 304 | "import numpy as np\n", 305 | "import matplotlib.pyplot as plt\n", 306 | "\n", 307 | "\n", 308 | "st.title(\"Streamlit Demo Application\")\n", 309 | "st.write(\"This is a simple demonstration of Streamlit capabilities.\")\n", 310 | "\n", 311 | "# Sidebar\n", 312 | "st.sidebar.header(\"User Input\")\n", 313 | "user_name = st.sidebar.text_input(\"Enter your name:\", \"Guest\")\n", 314 | "data_points = st.sidebar.slider(\"Number of data points:\", min_value=10, max_value=500, value=100)\n", 315 | "chart_type = st.sidebar.selectbox(\"Choose chart 
type:\", [\"Line\", \"Bar\", \"Area\"])\n", 316 | "\n", 317 | "\n", 318 | "st.write(f\"Hello, **{user_name}!** Let's explore some data!\")\n", 319 | "\n", 320 | "# Generate random dataset\n", 321 | "data = pd.DataFrame({\n", 322 | "    \"X\": np.arange(1, data_points + 1),\n", 323 | "    \"Y\": np.random.randn(data_points).cumsum()\n", 324 | "})\n", 325 | "\n", 326 | "# Display data\n", 327 | "st.subheader(\"Generated Dataset\")\n", 328 | "st.dataframe(data)\n", 329 | "\n", 330 | "# Descriptive stats\n", 331 | "st.subheader(\"Basic Data Statistics\")\n", 332 | "st.write(data.describe())\n", 333 | "\n", 334 | "# Visualization\n", 335 | "st.subheader(\"Data Visualization\")\n", 336 | "if chart_type == \"Line\":\n", 337 | "    st.line_chart(data.set_index(\"X\"))\n", 338 | "elif chart_type == \"Bar\":\n", 339 | "    st.bar_chart(data.set_index(\"X\"))\n", 340 | "elif chart_type == \"Area\":\n", 341 | "    st.area_chart(data.set_index(\"X\"))\n", 342 | "\n", 343 | "# Matplotlib plot\n", 344 | "st.subheader(\"Matplotlib Plot\")\n", 345 | "fig, ax = plt.subplots()\n", 346 | "ax.plot(data[\"X\"], data[\"Y\"], label=\"Cumulative Sum\", color=\"blue\", alpha=0.7)\n", 347 | "ax.set_title(\"Matplotlib Plot\")\n", 348 | "ax.set_xlabel(\"X\")\n", 349 | "ax.set_ylabel(\"Y\")\n", 350 | "ax.legend()\n", 351 | "st.pyplot(fig)\n", 352 | "\n", 353 | "# Footer\n", 354 | "st.write(\"Thank you for using this demo! 
🎉\")\n", 355 | "\n", 356 | "```" 357 | ] 358 | } 359 | ], 360 | "metadata": { 361 | "kernelspec": { 362 | "display_name": "Python 3", 363 | "language": "python", 364 | "name": "python3" 365 | }, 366 | "language_info": { 367 | "codemirror_mode": { 368 | "name": "ipython", 369 | "version": 3 370 | }, 371 | "file_extension": ".py", 372 | "mimetype": "text/x-python", 373 | "name": "python", 374 | "nbconvert_exporter": "python", 375 | "pygments_lexer": "ipython3", 376 | "version": "3.8.10" 377 | } 378 | }, 379 | "nbformat": 4, 380 | "nbformat_minor": 2 381 | } 382 | -------------------------------------------------------------------------------- /notebooks/exm_4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Intro to Python and SQL Test - Group 4.\n", 8 | "\n", 9 | "Instructions:\n", 10 | "\n", 11 | "1. Set up a Jupyter Notebook for this project using Kaggle, Google Colab or a local installation. Each point below should be in a separate cell.\n", 12 | "2. Load the data from `weather_data.db` and convert it to a dataframe (the file is in the data folder of our repo).\n", 13 | "3. Draw random variables between 0 and 4, where the sample size is equal to the number of observations in the Temperature column.\n", 14 | "4. Create a new column in the df called temp_add. Add these random variables to this column and show the first 25 rows.\n", 15 | "5. Create a new column called temp_increased that contains the sum of the values from the temperature column and the temp_add column.\n", 16 | "6. Display descriptive statistics of the temperature column and the temp_increased column.\n", 17 | "7. Analyze these values and comment on them using markdown (compare mean values, check the range, compare standard deviation and variance).\n", 18 | "8. Set the Date column as the index of the dataframe.\n", 19 | "10. List months with an average temperature higher than 23.\n", 20 | "11. 
Export the data as a CSV file. Then load the file and display the contents.\n", 21 | "12. Calculate the mean temperature for each month for the temperature column and the temp_increased column. Plot both using seaborn.\n", 22 | "13. In markdown, briefly explain the function you used in pt 11 and its parameters.\n", 23 | "14. Create a simple Streamlit app which shows the original data, the mean data, and the plot. If you're on Kaggle or Colab, just write the code for this, don't run it.\n", 24 | "15. Send the notebook, the CSV file and a screenshot of the Streamlit app (if you run it locally) to j.michankow@uw.edu.pl" 25 | ] 26 | } 27 | ], 28 | "metadata": { 29 | "language_info": { 30 | "name": "python" 31 | } 32 | }, 33 | "nbformat": 4, 34 | "nbformat_minor": 2 35 | } 36 | --------------------------------------------------------------------------------