├── LICENSE ├── README.md └── notebooks ├── 00_python_basics.ipynb ├── 01_basic_data_types.ipynb ├── 02_basic_numerical_operations.ipynb ├── 03_basic_string_operations.ipynb ├── 04_basic_list_operations.ipynb ├── 05_basic_tuple_dict_operations.ipynb ├── 06_logical_operations.ipynb ├── 07_iterations.ipynb ├── 08_input_output.ipynb ├── 09_functions.ipynb ├── 10_classes.ipynb ├── 11_modules_and_packages.ipynb ├── 12_exception_handling.ipynb ├── 13_time_random_ordereddict.ipynb ├── 14_os_sys_shutil.ipynb ├── 15_numpy_scipy.ipynb ├── 16_pandas.ipynb ├── 17_data_visualization_matplotlib.ipynb ├── 18_data_visualization_seaborn.ipynb ├── 19_time_series.ipynb ├── 20_intro_to_machine_leaning.ipynb ├── 21_scikit-learn.ipynb ├── 22_advanced_ml_concepts.ipynb ├── 23_preprocessing_1.ipynb ├── 24_preprocessing_2.ipynb ├── 25_pipelines_gridsearch.ipynb ├── 26_forecasting.ipynb ├── 27_clustering.ipynb ├── 28_natural_language_processing.ipynb ├── custom_module.py └── scr_args.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 ThanosTagaris 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # A complete tutorial in python for Data Analysis and Machine Learning 2 | 3 | This tutorial doesn't assume any prior knowledge in python, or any other background in programming languages. 4 | 5 | The whole tutorial is written in jupyter notebooks, which I feel is the best platform for this sort of thing. If you want to run them locally: 6 | 7 | - Download and install [python](https://www.python.org/downloads/) (preferably python 3). Add python to your environmental variables. 8 | - Download [pip](https://bootstrap.pypa.io/get-pip.py). 9 | - Install it through `python get-pip.py`. 10 | - Install jupyter through pip: `pip install jupyter`. 11 | 12 | If the notebooks don't render properly through github you can always view them through [here](https://nbviewer.jupyter.org/github/djib2011/python_ml_tutorial/tree/master/). 
13 | -------------------------------------------------------------------------------- /notebooks/00_python_basics.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"00_python_basics.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"6CYFJ7S1zapL"},"source":["# Python\n","\n","What is python exactly?\n","\n","> Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. \n","-- Wikipedia\n","\n","So what does that mean? Let's start from the end.\n","\n","Python is a **programming language**. This means python is a language we can use to communicate with our computer and tell it what to do!\n","\n","A **dynamic** programming language is a programming language that executes its commands at runtime. On the other hand, *static* programming languages require a procedure that is called *compilation*. This procedure first translates the human-readable commands into machine-language instructions. An **interpreted** language doesn't require a *compiler* to run, but an *interpreter*.\n","\n","**General-purpose** programming languages are those that are used in a wide variety of application domains. In contrast, there are *domain-specific* programming languages that are used in a single domain. 
Some examples of the latter are *HTML* (markup language for web applications), *MATLAB* (matrix representation and visualization), *SQL* (relational database queries) and *UNIX shell scripts* (data organization in unix-based systems).\n","\n","A **high-level** programming language has a strong abstraction from the details of the computer. These languages are much easier to work with, because they automate many areas such as memory management.\n","\n","Python is a really easy and powerful programming language and it has a simple and very straightforward syntax.\n","\n","\n","## Versions\n","\n","There are two python versions running in parallel, **python 2.7** and **python 3.x**. While python 3 came out in 2008, it saw a slow adoption by the community. We will be using **python 3** for the remainder of this tutorial.\n","\n","## Comments in python\n","\n","Comments are lines that are ignored by the computer and are meant only for humans to read."]},{"cell_type":"code","metadata":{"id":"IL4YCU0PzapO","outputId":"d980da55-2a09-44eb-a723-bae675a66189"},"source":["# This is a comment!\n","\n","'''\n","This is a\n","multiline\n","comment.\n","'''\n","\n","\"\"\"\n","This is also a multiline comment.\n","' and \" are interchangeable in python.\n","\"\"\""],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'\\nThis is also a multiline comment.\\n\\' and \" are interchangeable in python.\\n'"]},"metadata":{"tags":[]},"execution_count":1}]},{"cell_type":"markdown","metadata":{"id":"DpjFIsp2zapY"},"source":["## Printing to the screen\n","\n","The first thing we want to do when learning any programming language is print something (like *\"Hello World\"*) on screen.\n","\n","In order to display information to the user, we use the `print` function. 
This also includes a *new line* directive (i.e. two prints are shown on separate lines)."]},{"cell_type":"code","metadata":{"id":"z9fTpkTuzapc","outputId":"31354efd-3bad-40fc-9aa4-8c8568650c37"},"source":["print('Hello World!')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Hello World!\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"CRA7XYNczapi"},"source":["This line instructs the computer to display the phrase 'Hello World!' on screen.\n","\n","The `print` function is one of the major differences between python 2 and 3. In python 2.x the correct syntax would be:\n","\n","```python \n","print 'Hello World!'\n","```\n","\n","We can also print multiple things at once:"]},{"cell_type":"code","metadata":{"id":"wnyQzZmQzapj","outputId":"680973fa-1048-486b-f9c6-330c608deb17"},"source":["print('argument 1, ', 'argument 2, ', 'argument 3')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["argument 1, argument 2, argument 3\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"IzVmuYnzzapq"},"source":["In order to get the same result for both versions of python, we could add the line:"]},{"cell_type":"code","metadata":{"id":"3RRaGhkAzapq","outputId":"3b426cf4-706c-4677-ac42-1933a8b50d4d"},"source":["from __future__ import print_function\n","print('argument 1, ', 'argument 2, ', 'argument 3')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["argument 1, argument 2, argument 3\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"a6OYEgwbzapw"},"source":["## Importing external libraries\n","\n","The first command in the previous cell (`from __future__ import print_function`) is what we call an ***import***. Imports extend the functionality of python by adding more commands (or in this case modifying existing ones).\n","This is done by *importing* a **library** (in this case python's built-in `__future__` module).\n","\n","Libraries are a really important part of programming, because they allow us to use code that is already written by others. 
This way we don't have to write every program from scratch!\n"," \n","Let's say we want to create an *array*. Python does not natively support numerical arrays, but luckily there is an external library that does: **numpy**. \n","We can use external libraries (like *numpy*) in three ways:"]},{"cell_type":"code","metadata":{"id":"TRtJUIXKzapx","outputId":"929aed86-2e0a-4342-863c-3646b092b117"},"source":["import numpy # imports the library as is\n","numpy.array([1,2,3]) # creates the array [1,2,3]\n","\n","import numpy as np # imports numpy and from now on we can refer to it as 'np'\n","np.array([1,2,3]) # creates the array [1,2,3]\n","\n","from numpy import array # imports only the function 'array' from the library numpy\n","array([1,2,3]) # creates the array [1,2,3]"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([1, 2, 3])"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"2ZBB5-Wdzap2"},"source":["We can choose any way we like to import libraries; each has its pros and cons. Note that we only need to import a library **once** in our program to use it!\n","\n","The main repository for python packages is [PyPI](https://pypi.python.org/pypi), and the main tool for downloading and installing packages from this repository is called [pip](https://pip.pypa.io/en/stable/). In order to install pip, download [get-pip.py](https://bootstrap.pypa.io/get-pip.py) and run the downloaded script with the following command:\n","\n","```\n","python get-pip.py\n","```\n","\n","Once *pip* is installed, to download and install a new package (e.g. *numpy*) just type:\n","\n","```\n","pip install numpy\n","```\n","or\n","\n","```\n","python -m pip install numpy\n","```\n","\n","## Assigning values to variables\n","\n","To store information in memory we use *variables*. 
Their names can be letters or combinations of letters, numbers and underscores (as long as they don't start with a number).\n","\n","The procedure with which we give a variable a value is called **assignment** and is done in python with the **equal sign (=)**."]},{"cell_type":"code","metadata":{"id":"VIw4GVdozap3"},"source":["number_1 = 66"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"f2-RSJq5zap8"},"source":["In the previous line we instructed the computer to store the number *66* in its memory. We also told the computer that, from now on, we will refer to this number as *number_1*.\n","\n","Now, if we want to print the stored number, we just need to reference it."]},{"cell_type":"code","metadata":{"id":"mVxq8pWSzap8","outputId":"15558d4f-c3ee-4276-dca2-d1409e2d7324"},"source":["print(number_1)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["66\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"sRiJWp9qzaqB"},"source":["Once a value has been assigned to a variable, it **can** be changed."]},{"cell_type":"code","metadata":{"id":"SCvsRUNxzaqC","outputId":"9a64dedc-b59b-4076-d5ad-4d582ce32910"},"source":["number_1 = 55\n","print(number_1)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["55\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"m1e8KPL-zaqG"},"source":["Python also allows for multiple variable assignments at once."]},{"cell_type":"code","metadata":{"id":"Xde0j_2DzaqH","outputId":"51fdf157-cbd4-4890-8114-390e21c829af"},"source":["a = b = c = 1\n","print(a, b, c)\n","a, b, c = 1, 2, 3\n","print(a, b, c)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1 1 1\n","1 2 3\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"VvTQcEH3zaqL"},"source":["Whitespace is good for making code easier to read by humans, but it makes **no** difference for the computer. 
"]},{"cell_type":"code","metadata":{"id":"L_ER8mOWzaqM","outputId":"cdf95e51-2a7b-45e2-a7b8-36a7438f4d21"},"source":["a=5\n","print(a)\n","a = 5\n","print(a)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["5\n","5\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"L48WEflMzaqQ"},"source":["These two are exactly the same.\n","\n","One exception would be **whitespace before the commands**. This is also called **indentation**. Indentation, as we'll see in the future, is important for python to understand nested commands!"]},{"cell_type":"code","metadata":{"id":"rTwoCUdKzaqR","outputId":"6b545c5e-48c5-4ab1-f01c-4dc2b23694ae"},"source":["print(a)\n"," print(a)"],"execution_count":null,"outputs":[{"output_type":"error","ename":"IndentationError","evalue":"unexpected indent (, line 2)","traceback":["\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m print(a)\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mIndentationError\u001b[0m\u001b[0;31m:\u001b[0m unexpected indent\n"]}]},{"cell_type":"markdown","metadata":{"id":"1XmyN1aVzaqX"},"source":["We are not allowed to indent commands without reason! \n","\n","When python encounters a state it is not meant to, it usually raises an **error**! There are several built-in error types (such as the *IndentationError* we saw before). Furthermore, we can *create* our own errors and *handle* them accordingly, in order to prevent our program from doing things it is not meant to! We'll learn how to do this in a later tutorial.\n","\n","The process of assigning a value to a variable binds the variable's name (e.g *number_1*) to that value. If we wish to free the name for future use we can **delete** the variable. 
This causes the stored memory to be lost!"]},{"cell_type":"code","metadata":{"id":"i3-uv65ezaqX","outputId":"8a8d8f54-cb14-44c0-e722-039d209611ad"},"source":["del number_1\n","print(number_1)"],"execution_count":null,"outputs":[{"output_type":"error","ename":"NameError","evalue":"name 'number_1' is not defined","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mnumber_1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumber_1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mNameError\u001b[0m: name 'number_1' is not defined"]}]},{"cell_type":"markdown","metadata":{"id":"ZVpBlqD3zaqb"},"source":["We got this error because we tried to print the value of a variable that didn't exist (because we deleted it in the previous line).\n","\n","Variable deletion should be done **only** when we **no longer need** the information this variable stores.\n","\n","Similar to assignments, del can also delete multiple objects at once:"]},{"cell_type":"code","metadata":{"id":"wTpLgHkQzaqc"},"source":["del a, b, c"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zADpDus0zaqg"},"source":["When naming variables keep in mind that python is case sensitive."]},{"cell_type":"code","metadata":{"id":"qn5FS04Azaqh","outputId":"5b765f2c-30b6-4d17-d946-6552f340c78c"},"source":["number_1 = 1\n","Number_1 = 2\n","NUMBER_1 = 3\n","nUmBeR_1 = 4\n","print(number_1, Number_1, NUMBER_1, nUmBeR_1)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1 2 3 4\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"vQIy8GzIzaqt"},"source":["In python these are 4 different variables.\n","\n","A 
good way of presenting data to the user is adding a description and then the value."]},{"cell_type":"code","metadata":{"id":"TeeK0ghuzaqt","outputId":"2ef9c6b5-aa1c-4cd7-b051-341d4c5728dc"},"source":["print('The value stored in the variable nUmBeR_1 is:', nUmBeR_1)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The value stored in the variable nUmBeR_1 is: 4\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"xVU1E_70zaqx"},"source":["In python we prefer **lower_case_variable_names_separated_with_underscores**. \n","\n","In the next tutorials we'll see:\n","1. The main data types.\n","2. Other common programming techniques (conditionals and loops).\n","3. Input/Output operations (such as print, input and file read/write).\n","4. How functions and classes are used in python.\n","5. An intro in error and exception handling.\n","6. Data loading and manipulation with `numpy` and `pandas`.\n","7. Data visualization with `matplotlib` and `seaborn`.\n","8. Machine learning with `scikit-learn`."]}]} -------------------------------------------------------------------------------- /notebooks/01_basic_data_types.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"01_basic_data_types.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"-mZsZdvX4q7q"},"source":["# Data Types\n","\n","The data we store in a variable is always of a certain **type**. In other programming languages we would have to declare the type of data that the variable stores beforehand. 
However, in python the variable has its type assigned dynamically when it is assigned a value.\n","\n","We can figure out the type of a variable through the `type()` built-in function."]},{"cell_type":"code","metadata":{"id":"wqZL123e4q73","outputId":"dafa51a7-64f6-4acf-e158-67a08af0317f"},"source":["i = 5\n","type(i)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["int"]},"metadata":{"tags":[]},"execution_count":1}]},{"cell_type":"markdown","metadata":{"id":"CnngqJhV4q8D"},"source":["Here, the variable `i` is of type `int`, which stands for integer.\n","\n","Python has 5 standard data types: \n","Numbers, Strings, Lists, Tuples and Dictionaries.\n","\n","## 1. Numbers\n","Python has three basic numerical types.\n","\n","### a) int \n","for signed integers"]},{"cell_type":"code","metadata":{"id":"QiYz1UQp4q8H","outputId":"6b173901-db48-4784-d479-5a069275e13d"},"source":["a, b, c = 0, 55, -555\n","type(a)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["int"]},"metadata":{"tags":[]},"execution_count":2}]},{"cell_type":"markdown","metadata":{"id":"RH64xXby4q8N"},"source":["### b) float \n","for floating point decimals"]},{"cell_type":"code","metadata":{"scrolled":true,"id":"6ZJrc5Uf4q8O","outputId":"709933c2-7300-4bcc-9cae-9f5a843e63c0"},"source":["a, b, c = 1.0, 89.3333333, -12.1\n","type(a)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["float"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"udMYKr0Y4q8U"},"source":["The decimal separator in python is the dot: `.`\n","\n","### c) complex \n","for complex numbers"]},{"cell_type":"code","metadata":{"id":"m5NVdPmU4q8V","outputId":"aa092521-c224-46b5-b886-c3a133b5f011"},"source":["r, i, z = -.12+0j, 45.j, 
5+3j\n","type(z)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["complex"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"markdown","metadata":{"id":"x3LnnvG84q8b"},"source":["## 2. Strings\n","Strings are sequences of characters enclosed in quotation marks. Strings can be defined using either single (`'...'`) or double (`\"...\"`) quotation marks. "]},{"cell_type":"code","metadata":{"id":"z3EKR-yb4q8c","outputId":"21bb7704-bb91-4ed0-fdc9-ee7961e76c9e"},"source":["st = 'I am a string!'\n","st_2 = \"1\"\n","st_3 = ''\n","type(st)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["str"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"Qc37iiYT4q8i"},"source":["Note that `st_2` is a string and not an integer (because of the quotation marks).\n","\n","## 3. Lists\n","Lists are data types that contain objects separated by commas and enclosed by square brackets: `[...]` \n","Lists don't need to contain objects of the same data type."]},{"cell_type":"code","metadata":{"id":"MgoxhZTy4q8j","outputId":"5ffcd95c-2f7a-4da8-e80f-d3215f2a31d3"},"source":["ls = ['a_string', 123, 12.21, 1+2j]\n","# a list can even contain other lists\n","ls_2 = [ [ 'abc', 123 ], ['0', 0], [99, 12j, \"1\", 'asdf'] ]\n","ls_3 = []\n","type(ls)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["list"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"uKvwtrmX4q8n"},"source":["## 4. Tuples\n","Tuples are similar to lists, but their contents and size cannot be changed (unlike lists). 
\n","\n","Tuple elements are separated by commas (like lists) but enclosed within parentheses: `(...)`"]},{"cell_type":"code","metadata":{"id":"vq2ZJF1K4q8o","outputId":"21697ac1-b2ae-4045-8af3-63f64514b5e6"},"source":["tp = ('a_string', 123, 12.21, 1+2j)\n","tp_2 = ( ( 'abc', 123 ), ('0', 0), (99, 12j, \"`\", 'asdf') )\n","tp_3 = ()\n","type(tp)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["tuple"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"r01PEFDf4q8t"},"source":["## 5. Dictionaries\n","Dictionaries store data in key-value pairs. Dictionaries are enclosed by curly brackets: `{...}`"]},{"cell_type":"code","metadata":{"id":"dI6S4Pxd4q8t","outputId":"bc54de84-ab7c-49bf-f062-3a38de7c41e0"},"source":["dc = {'name': 'a_name', 'age': 66, 'address': [53, 'Fragkoklistias', 'St'] }\n","dc_2 = {}\n","type(dc)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["dict"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"fz3Pi93e4q8y"},"source":["Before python 3.7, dictionaries are not guaranteed to keep any order among elements. 
For that you would need an *OrderedDict*, which we will cover in the future.\n","\n","## Type Casting\n","In order to change from one data type to another we can use type casting."]},{"cell_type":"code","metadata":{"id":"KD8fXPx54q8z","outputId":"e778c95c-8577-4c05-c239-e9bd52ec727a"},"source":["a = 5\n","type(a)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["int"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"0NeSbEE34q83"},"source":["`a` is an integer."]},{"cell_type":"code","metadata":{"id":"VaAjivBO4q84","outputId":"80f8692a-2247-478a-9509-609352c83561"},"source":["b = str(a)\n","type(b)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["str"]},"metadata":{"tags":[]},"execution_count":10}]},{"cell_type":"markdown","metadata":{"id":"P09sOoFP4q89"},"source":["`b` is the string `'5'`."]},{"cell_type":"code","metadata":{"id":"n2sBqZxp4q89","outputId":"a12d4945-9a64-4fa6-b4b1-46b3d4ac8580"},"source":["c = float(b)\n","type(c)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["float"]},"metadata":{"tags":[]},"execution_count":11}]},{"cell_type":"markdown","metadata":{"id":"HpgQ4Pua4q9D"},"source":["`c` is the float 5.0.\n","\n","Casting can work only on allowed conversions."]},{"cell_type":"code","metadata":{"id":"7DQa4Aq84q9E","outputId":"62e3cc8f-7232-4b36-ae49-900b475a7ab2"},"source":["float('asdf')"],"execution_count":null,"outputs":[{"output_type":"error","ename":"ValueError","evalue":"could not convert string to float: 'asdf'","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m 
\u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'asdf'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mValueError\u001b[0m: could not convert string to float: 'asdf'"]}]},{"cell_type":"markdown","metadata":{"id":"AUybbk8z4q9H"},"source":["### Other type casts:"]},{"cell_type":"code","metadata":{"id":"xA6JcP6i4q9J","outputId":"bd46c8bd-077f-46f5-ff35-7866e5396e5f"},"source":["int(a) # integer\n","float(a) # float\n","str(a) # string\n","repr(a) # expression string\n","l = []\n","t = tuple(l) # tuple\n","list(t) # list\n","set(t) # set\n","dict(t) # dict\n","chr(a) # character\n","ord('a') # converts a single character to its integer value\n","hex(a) # hexadecimal\n","oct(a) # octal"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'0o5'"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"markdown","metadata":{"id":"3Pw6KayH4q9O"},"source":["## Boolean Type\n","\n","True or False. The default type for logical operations."]},{"cell_type":"code","metadata":{"id":"ePrJkppv4q9P","outputId":"7a0b4ff8-c05e-4dd7-b89e-14076e64a1c0"},"source":["b = True\n","b = False\n","b = bool(1)\n","type(b)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["bool"]},"metadata":{"tags":[]},"execution_count":14}]}]} -------------------------------------------------------------------------------- /notebooks/02_basic_numerical_operations.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 
3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"02_basic_numerical_operations.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"F9zXNkyq4_7Q"},"source":["# Numerical Operations in Python"]},{"cell_type":"code","metadata":{"id":"dZXVpSRn4_7T"},"source":["from __future__ import print_function\n","# we will use the print function in this tutorial for python 2 - 3 compatibility"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"ShR7De4w4_7d"},"source":["a = 4\n","b = 5\n","c = 6\n","# we'll declare three integers to assist us in our operations"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"vuxr1Adg4_7m"},"source":["If we want to add the first two together (and store the result in a variable we will call `S`):\n","\n","```python\n","S = a + b \n","```\n","\n","The last part of the equation (i.e `a+b`) is the numerical operation. This sums the value stored in the variable `a` with the value stored in `b`.\n","The plus sign (`+`) is called an arithmetic operator.\n","The equal sign is a symbol used for assigning a value to a variable. 
In this case the result of the operation is assigned to a new variable called `S`."]},{"cell_type":"markdown","metadata":{"id":"nLLhSPa_4_7n"},"source":["## The basic numeric operators in python are: "]},{"cell_type":"code","metadata":{"id":"dYRWr_PL4_7o","outputId":"e0161ea4-500e-40fe-b71d-d7ba117ba97a"},"source":["# Sum:\n","S = a + b\n","print('a + b =', S)\n","\n","# Difference:\n","D = c - a\n","print('c - a =', D)\n","\n","# Product:\n","P = b * c\n","print('b * c =', P)\n","\n","# Quotient:\n","Q = c / a\n","print('c / a =', Q)\n","\n","# Remainder:\n","R = c % a \n","print('c % a =', R)\n","\n","# Floored Quotient:\n","F = c // a\n","print('c // a =', F)\n","\n","# Negative:\n","N = -a\n","print('-a =', N)\n","\n","# Power:\n","Pow = b ** a\n","print('b ** a =', Pow)\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["a + b = 9\n","c - a = 2\n","b * c = 30\n","c / a = 1.5\n","c % a = 2\n","c // a = 1\n","-a = -4\n","b ** a = 625\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"kzVQJ2hZ4_7x"},"source":["What is the difference between `/` and `//` ?\n","\n","The first performs a regular division between two numbers, while the second performs a *euclidean division* **without the remainder**. \n","\n","Important note: \n","In python 2 `/` would return an integer if the two numbers participating in the division were integers. In that sense:\n","\n","```python\n","Q = 6 / 4 # this would perform a euclidean division because both divisor and dividend are integers!\n","Q = 6.0 / 4 # this would perform a real division because the dividend is a float\n","Q = c / (a * 1.0) # this would perform a real division because the divisor is a float\n","Q = c / float(a) # this would perform a real division because the divisor is a float\n","```\n","\n","One way to make python 2 compatible with python 3 division is to import `division` from the `__future__` module. 
We will do this for the remainder of this tutorial."]},{"cell_type":"code","metadata":{"id":"dNXYcwyc4_7y","outputId":"9df7f01a-92aa-462f-fce5-4a47a930b0e2"},"source":["from __future__ import division\n","Q = c / a\n","print(Q)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1.5\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"OB3QIqGV4_75"},"source":["We can combine more than one operation in a single line."]},{"cell_type":"code","metadata":{"id":"8GC4_YKM4_75","outputId":"e82b8335-6f25-41e7-c4e6-4dba222ce2ab"},"source":["E = a + b - c\n","print(E)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["3\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"1Z8zuLa64_8A"},"source":["Priorities are the same as in algebra: \n","parentheses -> powers -> products -> sums\n","\n","We can also perform more complex assignment operations:"]},{"cell_type":"code","metadata":{"id":"3bYj42Ed4_8B","outputId":"db4cc6fc-951f-44d3-f20a-0247b31ceca4"},"source":["print('a =', a)\n","print('S =', S)\n","S += a # equivalent to S = S + a\n","print('+ a =', S)\n","S -= a # equivalent to S = S - a\n","print('- a =', S)\n","S *= a # equivalent to S = S * a\n","print('* a =', S)\n","S /= a # equivalent to S = S / a\n","print('/ a =', S)\n","S %= a # equivalent to S = S % a\n","print('% a =', S)\n","S **= a # equivalent to S = S ** a\n","print('** a =', S)\n","S //= a # equivalent to S = S // a\n","print('// a =', S)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["a = 4\n","S = 9\n","+ a = 13\n","- a = 9\n","* a = 36\n","/ a = 9.0\n","% a = 1.0\n","** a = 1.0\n","// a = 0.0\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"SiTLn08e4_8G"},"source":["## Other operations:"]},{"cell_type":"code","metadata":{"id":"556hgV8x4_8H","outputId":"ab0c41a0-f14b-4637-caba-4bf829bbe98b"},"source":["n = -3\n","print('n =', n)\n","A = abs(n) # Absolute:\n","print('absolute(n) =', A)\n","C = complex(n, a) # 
Complex: -3+4j\n","print('complex(n,a) =', C)\n","c = C.conjugate() # Conjugate: -3-4j\n","print('conjugate(C) =', c)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["n = -3\n","absolute(n) = 3\n","complex(n,a) = (-3+4j)\n","conjugate(C) = (-3-4j)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"ivGJ2YF64_8L"},"source":["## Bitwise operations:\n","\n","Operations that first convert a number to its binary equivalent and then perform operations bit by bit before converting the result back to its original form."]},{"cell_type":"code","metadata":{"id":"1ynWfd7g4_8M","outputId":"fc606d82-2863-4e47-fc93-e8def1be7d89"},"source":["a = 3 # or 011 (in binary)\n","b = 5 # or 101 (in binary)\n","print(a | b) # bitwise OR: 111 (binary) --> 7 (decimal)\n","print(a ^ b) # exclusive OR: 110 (binary) --> 6 (decimal)\n","print(a & b) # bitwise AND: 001 (binary) --> 1 (decimal)\n","print(b << a) # b shifted left by a bits: 101000 (binary) --> 40 (decimal)\n","print(8 >> a) # 8 shifted right by a bits: 0001 (binary - was 1000 before shift) --> 1 (decimal)\n","print(~a) # bitwise NOT: flips every bit, giving -(a + 1) --> -4 (decimal)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["7\n","6\n","1\n","40\n","1\n","-4\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"6dlX0qIH4_8S"},"source":["## Built-in methods\n","\n","Some data types have built-in methods. For example, we can check if a float variable stores an integer as follows:"]},{"cell_type":"code","metadata":{"id":"uPxyRVo54_8S","outputId":"f46106b6-1d05-4db3-9475-c488245f90b9"},"source":["a = 3.0\n","t = a.is_integer()\n","print(t)\n","a = 3.2\n","t = a.is_integer()\n","print(t)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["True\n","False\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"gucItGiC4_8X"},"source":["Note that the casting operation between floats to integers just discards the decimal part (it doesn't attempt to round the 
number)."]},{"cell_type":"code","metadata":{"id":"w43vU8s-4_8Z","outputId":"f1230635-3b34-4e75-c065-11982f65206b"},"source":["print(int(3.21))\n","print(int(3.99))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["3\n","3\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"sH6oPVAx4_8e"},"source":["We can always `round` the number beforehand."]},{"cell_type":"code","metadata":{"id":"_a2iQuDE4_8f","outputId":"d794e95c-7e0b-410d-8144-b754e46906e2"},"source":["int(round(3.6))"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["4"]},"metadata":{"tags":[]},"execution_count":11}]},{"cell_type":"markdown","metadata":{"id":"3wZ_kG8P4_8k"},"source":["## Exercise\n","\n","What do the following operations return?"]},{"cell_type":"markdown","metadata":{"id":"eNKyKS-_4_8l"},"source":["E1 = ( 3.2 + 12 ) * 2 / ( 1 + 1 )\n","E2 = abs(-4 ** 3)\n","E3 = complex( 8 % 3, int(-2 * 1.0 / 4)-1 )\n","E4 = (6.0 / 4.0).is_integer()\n","E5 = (4 | 2) ^ (5 & 6)"]},{"cell_type":"markdown","metadata":{"id":"_OQ51TLj4_8m"},"source":["## Python's mathematical functions\n","\n","Most math functions are included in a separate library called `math`."]},{"cell_type":"code","metadata":{"id":"tBQAUHuV4_8n","outputId":"72f58c18-12a1-438b-f1dc-3c324199b404"},"source":["import math\n","x = 4\n","\n","print('exp = ', math.exp(x)) # exponent of x (e**x)\n","print('log = ',math.log(x)) # natural logarithm (base=e) of x\n","print('log2 = ',math.log(x,2)) # logarithm of x with base 2\n","print('log10 = ',math.log10(x)) # logarithm of x with base 10, equivalent to math.log(x,10)\n","\n","print('sqrt = ',math.sqrt(x)) # square root\n","\n","print('cos = ',math.cos(x)) # cosine of x (x is in radians)\n","print('sin = ',math.sin(x)) # sine\n","print('tan = ',math.tan(x)) # tangent\n","print('arccos = ',math.acos(.5)) # arc cosine (in radians)\n","print('arcsin = ',math.asin(.5)) # arc sine\n","print('arctan = ',math.atan(.5)) # arc tangent\n","# 
acos and asin only accept values in [-1,1]; atan accepts any real number\n","\n","print('deg = ',math.degrees(x)) # converts x from radians to degrees\n","print('rad = ',math.radians(x)) # converts x from degrees to radians\n","\n","print('e = ',math.e) # mathematical constant e = 2.718281...\n","print('pi = ',math.pi) # mathematical constant pi = 3.141592..."],"execution_count":null,"outputs":[{"output_type":"stream","text":["exp = 54.598150033144236\n","log = 1.3862943611198906\n","log2 = 2.0\n","log10 = 0.6020599913279624\n","sqrt = 2.0\n","cos = -0.6536436208636119\n","sin = -0.7568024953079282\n","tan = 1.1578212823495775\n","arccos = 1.0471975511965979\n","arcsin = 0.5235987755982989\n","arctan = 0.4636476090008061\n","deg = 229.1831180523293\n","rad = 0.06981317007977318\n","e = 2.718281828459045\n","pi = 3.141592653589793\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"QB7HLdF84_8s"},"source":["The `math` package also provides other functions such as hyperbolic trigonometric functions, error functions, gamma functions etc. "]},{"cell_type":"markdown","metadata":{"id":"nlM0Ej7c4_8s"},"source":["## Generating a pseudo-random number\n","\n","Python has a built-in package for generating pseudo-random sequences called `random`."]},{"cell_type":"code","metadata":{"id":"LfMztBWH4_8u","outputId":"e91a9b36-8d92-451f-ae3e-fdae0adb8029"},"source":["import random\n","print(random.randint(1,10))\n","# Generates a random integer in [1,10]\n","print(random.randrange(1,100,2))\n","# Generates a random integer from [1,100) with step 2, i.e. from 1, 3, 5, ..., 97, 99.\n","print(random.uniform(0,1))\n","# Generates a random float in [0,1]"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1\n","21\n","0.7912325286049906\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"qfHcADlp4_8y"},"source":["## Example\n","\n","Consider the complex number $3 + 4j$. 
Calculate its magnitude and its angle, then transform it into a tuple of its polar form."]},{"cell_type":"code","metadata":{"id":"-OaTY0Wb4_8z"},"source":["z = 3 + 4j"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"SR4H9kcj4_82"},"source":["### Solution attempt 1 (analytical). \n","\n","We don't know any of the built-in complex methods and we try to figure out an analytical solution. We will first calculate the real and imaginary parts of the complex number and then we will try to apply the Pythagorean theorem to calculate the magnitude.\n","\n","#### Step 1: \n","Find the real part of the complex number.\n","We will make use of the mathematical formula: \n","\n","$$Re(z) = \\frac{1}{2} \\cdot ( z + \\overline{z} )$$"]},{"cell_type":"code","metadata":{"id":"eZCrQCsl4_83","outputId":"6d21e087-08a3-4130-98f4-732a0d8c8aac"},"source":["rl = ( z + z.conjugate() ) / 2 \n","print(rl)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["(3+0j)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"vDY_FZ3H4_89"},"source":["Note that *rl* is still in complex format, even though it represents a real number...\n","\n","#### Step 2: \n","Find the imaginary part of the complex number.\n","\n","**1st way**, like before, we use the mathematical formula: \n","\n","$$Im(z) = \\frac{z - \\overline{z}}{2i}$$"]},{"cell_type":"code","metadata":{"id":"c7j35gGX4_89","outputId":"4d15662f-f2d6-4940-b446-034632136dec"},"source":["im = ( z - z.conjugate() ) / 2j\n","print(im)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["(4+0j)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"nylEZIGq4_9N"},"source":["Same as before, `im` is in complex format, even though it represents a real number...\n","\n","#### Step 3: \n","Find the sum of the squares of the real and the imaginary parts:\n","\n","$$ S = Re(z)^2 + Im(z)^2 
$$"]},{"cell_type":"code","metadata":{"id":"JdYLNtSP4_9O","outputId":"e254edb8-4990-4eb7-b5a4-29781e40feaa"},"source":["sq_sum = rl**2 + im**2\n","print(sq_sum)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["(25+0j)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"JeFosuxF4_9S"},"source":["We are still in complex format.\n","\n","Let's try to calculate its square root to find out the magnitude:"]},{"cell_type":"code","metadata":{"id":"VrW4ceNV4_9U","outputId":"8a472d3d-7b10-44c4-8fbc-884507a8f571"},"source":["mag = math.sqrt(sq_sum)"],"execution_count":null,"outputs":[{"output_type":"error","ename":"TypeError","evalue":"can't convert complex to float","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmag\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqrt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msq_sum\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mTypeError\u001b[0m: can't convert complex to float"]}]},{"cell_type":"markdown","metadata":{"id":"Coi-yEaj4_9Z"},"source":["Oh... so the `math.sqrt()` method doesn't support complex numbers, even though what we're trying to use actually represents a real number. 
\n","\n","Well, let's try to cast it as an integer and then pass it into *math.sqrt()*."]},{"cell_type":"code","metadata":{"id":"S4FX6QXm4_9Z","outputId":"3207703a-399a-4fc4-ea3b-eccddd3171ad"},"source":["sq_sum = int(sq_sum)"],"execution_count":null,"outputs":[{"output_type":"error","ename":"TypeError","evalue":"can't convert complex to int","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msq_sum\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msq_sum\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mTypeError\u001b[0m: can't convert complex to int"]}]},{"cell_type":"markdown","metadata":{"id":"GKK_uB244_9d"},"source":["We still get the same error.\n","\n","We're now stuck in a situation where we are trying to do something **mathematically sound** that the computer refuses to do.\n","But what is causing this error? \n","\n","In math $25$ and $25+0i$ are exactly the same number. Both represent a natural number. But the computer sees them as two different entities entirely. One is an object of the *integer* data type and the other is an object of the *complex* data type. The programmer who wrote the code for the `math.sqrt()` method of the math package created it so that it can be used on *integers* and *floats* (but not *complex* numbers), even though in our instance the two are semantically the same thing.\n","\n","Ok, so trying our first approach didn't work out. Let's try calculating this another way. 
We know from complex number theory that:\n","\n","$$ z \\cdot \\overline{z} = Re(z)^2 + Im(z)^2 $$"]},{"cell_type":"code","metadata":{"id":"GxAJONQw4_9e","outputId":"a6a27308-1721-41cd-bdb8-882182e1a989"},"source":["sq_sum = z * z.conjugate()\n","mag = math.sqrt(sq_sum) "],"execution_count":null,"outputs":[{"output_type":"error","ename":"TypeError","evalue":"can't convert complex to float","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0msq_sum\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mz\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mz\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconjugate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmag\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqrt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msq_sum\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mTypeError\u001b[0m: can't convert complex to float"]}]},{"cell_type":"markdown","metadata":{"id":"UvXW-RmL4_9h"},"source":["This didn't work out either...\n","\n","### Solution attempt 2. \n","\n","We know that a complex number represents a vector in the *Re*, *Im* axes. Mathematically speaking the absolute value of a real number is defined differently than the absolute value of a complex one. Graphically though, they can both be defined as the distance of the number from (0,0). If we wanted to calculate the absolute value of a real number we should just disregard its sign and treat it as positive. 
On the other hand, if we wanted to do the same thing to a complex number we would need to calculate the euclidean norm of its vector (or in other words measure the distance from the complex number to (0,0), using the Pythagorean theorem). So in essence what we are looking for is the absolute value of the complex number.\n","\n","#### Step 1: \n","\n","Calculate the magnitude."]},{"cell_type":"code","metadata":{"id":"y-bdWRRU4_9j","outputId":"0c9322fb-b114-4ddf-dd9f-060b709934dc"},"source":["mag = abs(z)\n","print(mag)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["5.0\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"FTkxq9uK4_9n"},"source":["Ironically, this is the exact opposite situation of where we were before. Two things that have totally **different mathematical definitions** and methods of calculation (the absolute value of a complex and an integer), can be calculated using the same function.\n","\n","**2nd Way:** \n","As a side note we could have calculated the magnitude using the previous way, if we knew some of the complex numbers' built-in attributes:"]},{"cell_type":"code","metadata":{"id":"_wWQYvk74_9o","outputId":"b62ef033-5235-434f-f74b-5fb7a05c5fb7"},"source":["rl = z.real\n","print('real =', rl)\n","im = z.imag \n","print('imaginary =', im)\n","# now that these numbers are floats we can continue and perform operations such as the square root\n","mag = math.sqrt(rl**2 + im**2) # mag = 5.0\n","print('magnitude =', mag)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["real = 3.0\n","imaginary = 4.0\n","magnitude = 5.0\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"x_V8Mg2K4_9r"},"source":["#### Step 2: \n","Calculate the angle.\n","\n","**1st way:** \n","First we will calculate the cosine of the angle. 
The cosine is the real part divided by the magnitude."]},{"cell_type":"code","metadata":{"id":"Nn860_UY4_9s","outputId":"795497d5-cac1-42a6-da9c-edc2411c878b"},"source":["cos_ang = rl / mag\n","print(cos_ang)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["0.6\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"42ku6dNl4_9v"},"source":["To find the angle we use the arc cosine function from the math package."]},{"cell_type":"code","metadata":{"id":"mVI1iHI04_9w","outputId":"f713192f-485b-45d9-986a-ef6eeef711a4"},"source":["ang = math.acos(cos_ang)\n","print('phase in rad =', ang)\n","print('phase in deg =', math.degrees(ang))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["phase in rad = 0.9272952180016123\n","phase in deg = 53.13010235415599\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"OS7q8lrD4_90"},"source":["**2nd way:** \n","Another way to find the angle (or more correctly phase) of the complex number is to use a function from the `cmath` (complex math) package."]},{"cell_type":"code","metadata":{"id":"oJYopzq34_91","outputId":"1f36ae96-ccb8-4ef9-e6d7-5d5356d6c8cf"},"source":["import cmath\n","ang = cmath.phase(z)\n","print('phase in rad =', ang)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["phase in rad = 0.9272952180016122\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"pGIW5zEI4_93"},"source":["Without needing to calculate anything beforehand (no *rl* and no *mag* needed).\n","\n","#### Step 3: \n","Create a tuple of the complex number's polar form:"]},{"cell_type":"code","metadata":{"id":"sHZbadQr4_93","outputId":"c51d006a-c200-4fb3-f96c-bba196d0c213"},"source":["pol = (mag, ang)\n","print(pol)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["(5.0, 0.9272952180016122)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"ALTNYPoi4_96"},"source":["### Solution attempt 3 (using python's built in cmath 
package):"]},{"cell_type":"code","metadata":{"id":"rz8azj9X4_97","outputId":"f3c805e2-fd2f-4cd4-9483-065460f19819"},"source":["pol = cmath.polar(z)\n","print(pol)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["(5.0, 0.9272952180016122)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"Mgh7K6N74_9-"},"source":["So... by just knowing of the existence of this package we can solve this exercise in only one line (two, if you count the `import`)\n","\n","**Lesson of the day**: Before attempting to do anything, check if there is a library that can help you out! "]}]} -------------------------------------------------------------------------------- /notebooks/03_basic_string_operations.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"03_basic_string_operations.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"jg-F-ZXQTIDM"},"source":["# Strings\n","\n","> In computer science a string is traditionally a sequence of characters and numbers. 
A string is essentially an array of characters.\n","\n","Strings are really important, especially for communication between the user and the program.\n","\n","You can think of a string as a sentence that is **enclosed in quotes** (either 'single quotes' or \"double quotes\")."]},{"cell_type":"code","metadata":{"id":"222H4HIUTIDQ","executionInfo":{"status":"ok","timestamp":1604830384979,"user_tz":-120,"elapsed":1976,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}}},"source":["from __future__ import print_function"],"execution_count":1,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ezo-PUBPTIDZ"},"source":["Let's first create two strings."]},{"cell_type":"code","metadata":{"id":"vhGrJH3sTIDd","executionInfo":{"status":"ok","timestamp":1604830385324,"user_tz":-120,"elapsed":2315,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}}},"source":["st1 = 'I am a string'\n","st2 = \"me too!\""],"execution_count":2,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"g_yv3RtfTIDj"},"source":["## Referencing and slicing\n","\n","We said before that a string is essentially an **array of characters**. So, how can we retrieve a single character from our string? \n","```python\n","str[i]\n","```\n","This returns the *i-th* character in the string `str` (count starts from 0).\n","\n","**Slicing** is when we want to retrieve a substring from our initial string.\n","```python\n","str[i:j:k]\n","```\n","This returns a substring of string `str` that starts from the character with index *i*, ends with the character with index *j-1* and returns one character every *k* ones.\n","- ***i***: **starting point**\n","- ***j***: **ending point** (not included)\n","- ***k***: **step**\n","\n","Negative indices count from the end of the string. 
\n","\n","Let's look at some examples:"]},{"cell_type":"code","metadata":{"id":"p3xg3LpDTIDk","executionInfo":{"status":"ok","timestamp":1604830385325,"user_tz":-120,"elapsed":2309,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"020be417-6a3a-4314-9071-66d8835746d7","colab":{"base_uri":"https://localhost:8080/"}},"source":["print('Whole string: ', st1)\n","print('st1[3]: ', st1[3])\n","# In python the first element in a sequence (string, list, tuple, array, etc.) has an index of 0!\n","# so by calling index number 3 we are referring to the 4th character (in this case letter 'm')\n","print('st1[-1]: ', st1[-1]) # Returns the last character: 'g'\n","print('st1[:4]: ', st1[:4]) # Returns the first 4 characters: 'I am'\n","print('st1[-3:]: ', st1[-3:]) # Returns the last 3 characters: 'ing'\n","print('st1[2:6]: ', st1[2:6]) # Returns the characters with indices 2-5 (doesn't return index 6!): 'am a'\n","print('st1[3:10:2]: ', st1[3:10:2]) # Returns characters with indices from 3 to 9 with a step of 2 (indices: 3,5,7,9): 'masr'"],"execution_count":3,"outputs":[{"output_type":"stream","text":["Whole string: I am a string\n","st1[3]: m\n","st1[-1]: g\n","st1[:4]: I am\n","st1[-3:]: ing\n","st1[2:6]: am a\n","st1[3:10:2]: masr\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"tsRUX4HdTIDs"},"source":["## Helpful built-in string functions\n","\n","These are either universal python functions that can take a string as a parameter (e.g *len*), or built-in methods of string type objects (e.g *string.index*)"]},{"cell_type":"code","metadata":{"id":"m780ArA7TIDt","executionInfo":{"status":"ok","timestamp":1604830385325,"user_tz":-120,"elapsed":2303,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"9556b746-1cb3-4332-cd70-ba1eb2165cef","colab":{"base_uri":"https://localhost:8080/"}},"source":["print('st1: ', st1)\n","\n","print('len(st1): ', len(st1))\n","# returns 
the length of the string: 13\n","\n","print(\"st1.index('a'): \", st1.index('a'))\n","# returns the index of the first matching argument passed (in this case 'a'): 2\n","\n","print(\"st1.count('a'): \", st1.count('a'))\n","# returns how many times 'a' appears in the string: 2\n","\n","print(\"st1.count('i'): \", st1.count('i'))\n","# strings are case sensitive: 1\n","\n","st3 = 'sasasas'\n","# we'll create a new string to show this better\n","print(\"st3: \", st3)\n","\n","print(\"st3.count('sas'):\", st3.count('sas'))\n","# counts only non-overlapping appearances: 2 (even though 'sas' appears 3 times it returns 2 because the first 's' from the second 'sas' is the same as the last 's' of the first 'sas')"],"execution_count":4,"outputs":[{"output_type":"stream","text":["st1: I am a string\n","len(st1): 13\n","st1.index('a'): 2\n","st1.count('a'): 2\n","st1.count('i'): 1\n","st3: sasasas\n","st3.count('sas'): 2\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"tprgj80XTIDy"},"source":["## Logical operations\n","These help us figure out if a string contains a certain character or substring."]},{"cell_type":"code","metadata":{"id":"7GuqxDkhTIDz","executionInfo":{"status":"ok","timestamp":1604830385326,"user_tz":-120,"elapsed":2301,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"5270d688-b38d-423b-d3a7-3df2a26f3d84","colab":{"base_uri":"https://localhost:8080/"}},"source":["print('st1: ', st1)\n","print(\"'a' in st1: \", 'a' in st1)\n","# Returns True (because there is an 'a' in st1)\n","\n","print(\"'o' in st1: \", 'o' in st1)\n","# Returns False (because there isn't an 'o' in st1)\n","\n","print(\"'o' not in st1: \", 'o' not in st1)\n","# Returns the opposite of the previous\n","\n","print(\"'tri' in st1: \", 'tri' in st1)\n","# Returns True (because there is a substring 'tri' in st1)"],"execution_count":5,"outputs":[{"output_type":"stream","text":["st1: I am a string\n","'a' in st1: True\n","'o' in st1: False\n","'o' not in st1: True\n","'tri' in st1: True\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"jrTm_rxeTID6"},"source":["## String operations\n","Concatenating and repeating strings is done easily in python."]},{"cell_type":"code","metadata":{"id":"kqSof8vcTID6","executionInfo":{"status":"ok","timestamp":1604830385327,"user_tz":-120,"elapsed":2296,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"6746ceb1-c33a-4406-d7b1-455c30729c2d","colab":{"base_uri":"https://localhost:8080/"}},"source":["print('st1: ', st1)\n","print('st2: ', st2)\n","print('st1 + st2: ', st1 + st2) # concatenation of st1 and st2\n","print(\"st2 * 3: \", st2 * 3) # equivalent to adding st2 to itself 3 times"],"execution_count":6,"outputs":[{"output_type":"stream","text":["st1: I am a string\n","st2: me too!\n","st1 + st2: I am a stringme too!\n","st2 * 3: me too!me too!me too!\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"x-7Q-KDHTID_"},"source":["## Capitalization\n","Python strings also have a lot of built-in methods for manipulating the capitalization in strings. These are mostly used for user inputs. \n","\n","For example, say you want to ask a user his name. He could reply: `John`, `john` or `JOHN`. All three of these are the same for a human, but totally different for the computer. 
Capitalization methods help us with these cases."]},{"cell_type":"code","metadata":{"id":"eWK9iM_9TIEA","executionInfo":{"status":"ok","timestamp":1604830385328,"user_tz":-120,"elapsed":2292,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"c2717e54-5864-47e3-8344-3c6d86338a31","colab":{"base_uri":"https://localhost:8080/"}},"source":["st4 = 'rAnDoMLy CAPitaLiZed StrInG'\n","print('st4: ', st4)\n","\n","print(\"st4.capitalize(): \", st4.capitalize())\n","# returns string with first letter capitalized and rest lowercase\n","\n","print(\"st4.lower(): \", st4.lower())\n","# all lowercase\n","\n","print(\"st4.upper(): \", st4.upper())\n","# all uppercase\n","\n","print(\"st4.swapcase(): \", st4.swapcase())\n","# swaps upper for lowercase and vice versa\n","\n","print(\"st4.title(): \", st4.title())\n","# capitalizes the first letter of each word"],"execution_count":7,"outputs":[{"output_type":"stream","text":["st4: rAnDoMLy CAPitaLiZed StrInG\n","st4.capitalize(): Randomly capitalized string\n","st4.lower(): randomly capitalized string\n","st4.upper(): RANDOMLY CAPITALIZED STRING\n","st4.swapcase(): RaNdOmlY capITAlIzED sTRiNg\n","st4.title(): Randomly Capitalized String\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"PD2JJaySTIED"},"source":["## Whitespace manipulation\n","`str.strip` can help us deal with a lot of issues with whitespace (or even excess characters)."]},{"cell_type":"code","metadata":{"id":"9lPlbgOyTIEF","executionInfo":{"status":"ok","timestamp":1604830385328,"user_tz":-120,"elapsed":2288,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"6964d9da-ed05-47ba-a8ae-2250b29df201","colab":{"base_uri":"https://localhost:8080/"}},"source":["st5 = ' lots of whitespace '\n","\n","print('st5: ', st5)\n","\n","print(\"st5.lstrip(): \", st5.lstrip())\n","# removes leading whitespace\n","\n","print(\"st5.lstrip(' stlow'): \", st5.lstrip(' 
stlow'))\n","# Removes leading characters (in this case it removed all the whitespace, 'lots' and\n","# the 'o' from 'of', but didn't remove the 'w' from 'whitespace' because the 'f' stopped it)\n","\n","print(\"st5.rstrip(): \", st5.rstrip())\n","# same as lstrip but strips from the right\n","\n","print(\"st5.strip(): \", st5.strip())\n","# removes both from the left and from the right"],"execution_count":8,"outputs":[{"output_type":"stream","text":["st5: lots of whitespace \n","st5.lstrip(): lots of whitespace \n","st5.lstrip(' stlow'): f whitespace \n","st5.rstrip(): lots of whitespace\n","st5.strip(): lots of whitespace\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"5vrCKmztTIEJ"},"source":["## Split and Join\n","Split is used for splitting a string into a list of substrings according to a delimiter. Join merges a list of strings into a single string."]},{"cell_type":"code","metadata":{"id":"_H-QG5ESTIEK","executionInfo":{"status":"ok","timestamp":1604830385330,"user_tz":-120,"elapsed":2285,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"10b509ab-cff0-4bf9-8273-db4361109ebb","colab":{"base_uri":"https://localhost:8080/"}},"source":["print('st1: ', st1)\n","\n","spl = st1.split()\n","\n","print('st1.split(): ', spl)\n","# Splits string into a list of substrings (default delimiter: space)\n","\n","print(\"st1.split('a'): \", st1.split('a'))\n","# Split with delimiter 'a' (spacing is preserved)\n","\n","print(\"st1.split('a', 1):\", st1.split('a', 1))\n","# Split with delimiter 'a'. 
Performs only 1 split\n","\n","print(\"''.join(spl): \", ''.join(spl))\n","# Joins sequence of strings as a string\n","\n","print(\"'-'.join(spl): \", '-'.join(spl))\n","# Joins sequence of strings with '-' as delimiter"],"execution_count":9,"outputs":[{"output_type":"stream","text":["st1: I am a string\n","st1.split(): ['I', 'am', 'a', 'string']\n","st1.split('a'): ['I ', 'm ', ' string']\n","st1.split('a', 1): ['I ', 'm a string']\n","''.join(spl): Iamastring\n","'-'.join(spl): I-am-a-string\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"IbQlFT8pTIEP"},"source":["## Replace\n","Python also has a built-in method for finding a substring in a string and replacing it with another one. This is also useful for removing a part of a string completely:"]},{"cell_type":"code","metadata":{"id":"TnKoHBzrTIEP","executionInfo":{"status":"ok","timestamp":1604830385330,"user_tz":-120,"elapsed":2281,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"ebf857a2-7cfe-4716-d533-1de10b91b26b","colab":{"base_uri":"https://localhost:8080/"}},"source":["print(\"Before replace: \", st2)\n","\n","print('After replace: ', st2.replace('too','three'))\n","# replaces 'too' with 'three' in st2\n","\n","print(\"replace 'zz': \", 'razzndzzom stzzrizzng'.replace('zz', ''))\n","# removes 'zz' from the string\n","\n","print('remove whitespace from previous example:', st5.replace(' ', ''))\n","# removes all whitespace from the string\n","\n","print('remove whitespace from previous example:', st5.replace(' ', ' '))\n","# replaces double spaces with single ones"],"execution_count":10,"outputs":[{"output_type":"stream","text":["Before replace: me too!\n","After replace: me three!\n","replace 'zz': random string\n","remove whitespace from previous example: lotsofwhitespace\n","remove whitespace from previous example: lots of whitespace \n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"t3u4FxmFTIEU"},"source":["## 
Formatting\n","\n","The best way of incorporating values from variables to strings is through formatting. This is especially useful for printing data on the screen. Formatting is typically done with the built-in `str.format()` method.\n","This method takes a string and replaces occurrences of curly brackets (`{}`) with whatever parameter we pass into it."]},{"cell_type":"code","metadata":{"id":"vsmJnknqTIEV","executionInfo":{"status":"ok","timestamp":1604830385331,"user_tz":-120,"elapsed":2277,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"f4e0713d-47b0-442a-a818-dbe1d2beace2","colab":{"base_uri":"https://localhost:8080/"}},"source":["ct = 55\n","print('bla bla {} bla')\n","print('bla bla {} bla'.format(ct))"],"execution_count":11,"outputs":[{"output_type":"stream","text":["bla bla {} bla\n","bla bla 55 bla\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"olZeJpV6TIEa"},"source":["There are a lot of formatting options. We won't go into much detail about them in this tutorial, but we will revisit the topic in the future.\n","\n","There is also an older way of formatting strings with the percent (`%`) sign. This does **not** utilize the `str.format()` method."]},{"cell_type":"code","metadata":{"id":"_Mf7-pdWTIEb","executionInfo":{"status":"ok","timestamp":1604830385332,"user_tz":-120,"elapsed":2271,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"9d9ea799-511f-46cd-bdc5-c395347fece0","colab":{"base_uri":"https://localhost:8080/"}},"source":["ts = 'The first string I used in this tutorial was: %s, and the second one was: %s' %(st1,st2)\n","print(ts)\n","\n","n1, n2 = 5, 10\n","print('bla bla %i bla %.2f' %(n1,n2))\n","\n","print('%(language)s has %(number)03d quote types.' %{\"language\": \"Python\", \"number\": 2})\n","\n","print('%i is an integer, %f is a float, %s is a string.' 
%(15, 1.66, 'asdf'))"],"execution_count":12,"outputs":[{"output_type":"stream","text":["The first string I used in this tutorial was: I am a string, and the second one was: me too!\n","bla bla 5 bla 10.00\n","Python has 002 quote types.\n","15 is an integer, 1.660000 is a float, asdf is a string.\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"hFVQbe8zTIEf"},"source":["## Example\n","\n","We want to write a program that splits a string according to commas (,) and full stops (.) but preserves full stops. Spacing after commas and full stops should also be removed."]},{"cell_type":"code","metadata":{"id":"QeWMLRedTIEg","executionInfo":{"status":"ok","timestamp":1604830385333,"user_tz":-120,"elapsed":2268,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"f00b3364-40ce-4699-a478-29141e38e9a9","colab":{"base_uri":"https://localhost:8080/"}},"source":["ex_str = 'This is the string that we will use to test our example. ' \\\n"," 'The expected output of the program should contain every word ' \\\n"," 'this string has, but it should be split according to punctuation. ' \\\n"," 'Full stops should be preserved, but commas should not. ' \\\n"," 'Spacing after punctuation, should also be removed.'\n","print(ex_str)"],"execution_count":13,"outputs":[{"output_type":"stream","text":["This is the string that we will use to test our example. The expected output of the program should contain every word this string has, but it should be split according to punctuation. Full stops should be preserved, but commas should not. Spacing after punctuation, should also be removed.\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"SNhjjeZETIEk"},"source":["Let's do the easy part first. 
Let's split the string according to commas:"]},{"cell_type":"code","metadata":{"id":"EcZHnU4yTIEl","executionInfo":{"status":"ok","timestamp":1604830385333,"user_tz":-120,"elapsed":2264,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"cd70c406-171d-4b05-b00a-bfdb9cf3e523","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ex_str.split(',')\n","print(temp)"],"execution_count":14,"outputs":[{"output_type":"stream","text":["['This is the string that we will use to test our example. The expected output of the program should contain every word this string has', ' but it should be split according to punctuation. Full stops should be preserved', ' but commas should not. Spacing after punctuation', ' should also be removed.']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"ZFzJizXxTIEo"},"source":["We did manage to split the string, but we haven't yet removed the excess spacing at the beginning of our substrings. \n","One thought would be to remove the first character from each of these strings, but that would also remove the first character from the first string (This ...). If we wanted to do it this way we would have to keep that in mind. This would also require an elementwise list operation which we haven't covered yet.\n","\n","What we can do is remove spacing during the split phase:"]},{"cell_type":"code","metadata":{"id":"a4ggI9XmTIEo","executionInfo":{"status":"ok","timestamp":1604830385334,"user_tz":-120,"elapsed":2260,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"8b4cb7a6-feec-45eb-8362-ba28bfcf8a59","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ex_str.split(', ')\n","print(temp)"],"execution_count":15,"outputs":[{"output_type":"stream","text":["['This is the string that we will use to test our example. 
The expected output of the program should contain every word this string has', 'but it should be split according to punctuation. Full stops should be preserved', 'but commas should not. Spacing after punctuation', 'should also be removed.']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"p2vjundkTIEs"},"source":["Now it's better. This method has one problem, though, which we will discuss later.\n","\n","Let's try to perform the other split:"]},{"cell_type":"code","metadata":{"id":"y_8eJLBpTIEt","executionInfo":{"status":"error","timestamp":1604830385998,"user_tz":-120,"elapsed":2918,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"8ac7402c-bf7b-4f63-d00d-7df40bced8ad","colab":{"base_uri":"https://localhost:8080/","height":167}},"source":["fin_lst = temp.split('. ')"],"execution_count":16,"outputs":[{"output_type":"error","ename":"AttributeError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfin_lst\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtemp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'. '\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mAttributeError\u001b[0m: 'list' object has no attribute 'split'"]}]},{"cell_type":"markdown","metadata":{"id":"MQXRul4LTIEx"},"source":["So... we can't call `.split()` on a list of strings. \n","Furthermore, the `.split()` method does not accept multiple delimiters and we don't know how to perform elementwise operations on lists. 
\n","What can we do?\n","\n","We could replace all full stops with commas:"]},{"cell_type":"code","metadata":{"id":"GqEl74ZiTIEy","executionInfo":{"status":"ok","timestamp":1604830394756,"user_tz":-120,"elapsed":1436,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"3ca38c3b-7678-47a7-b57e-b3a18125990b","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ex_str.replace('.', ',')\n","print(temp)"],"execution_count":17,"outputs":[{"output_type":"stream","text":["This is the string that we will use to test our example, The expected output of the program should contain every word this string has, but it should be split according to punctuation, Full stops should be preserved, but commas should not, Spacing after punctuation, should also be removed,\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"TR7uaL42TIE0"},"source":["... and then perform the split:"]},{"cell_type":"code","metadata":{"id":"dQvql0pQTIE2","executionInfo":{"status":"ok","timestamp":1604830395204,"user_tz":-120,"elapsed":1136,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"8dc1fba6-0231-4cd1-e86b-894484b5f939","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = temp.split(', ')\n","print(temp)"],"execution_count":18,"outputs":[{"output_type":"stream","text":["['This is the string that we will use to test our example', 'The expected output of the program should contain every word this string has', 'but it should be split according to punctuation', 'Full stops should be preserved', 'but commas should not', 'Spacing after punctuation', 'should also be removed,']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"lfCEWXUSTIE5"},"source":["OK, now we're close. 
The only thing to do is to modify it a bit so that we preserve the full stops:"]},{"cell_type":"code","metadata":{"id":"CD8w3ooKTIE6","executionInfo":{"status":"ok","timestamp":1604830395473,"user_tz":-120,"elapsed":797,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"c269efa4-3d0c-4f95-b50b-de14b62ab207","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ex_str.replace('. ', '., ')\n","print(temp)\n","\n","fin_lst = temp.split(', ')\n","print('\\nFinal list:')\n","print(fin_lst)"],"execution_count":19,"outputs":[{"output_type":"stream","text":["This is the string that we will use to test our example., The expected output of the program should contain every word this string has, but it should be split according to punctuation., Full stops should be preserved, but commas should not., Spacing after punctuation, should also be removed.\n","\n","Final list:\n","['This is the string that we will use to test our example.', 'The expected output of the program should contain every word this string has', 'but it should be split according to punctuation.', 'Full stops should be preserved', 'but commas should not.', 'Spacing after punctuation', 'should also be removed.']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"iepmnYUBTIE9"},"source":["Now we finally got it!\n","\n","The only problem would be if our string didn't have spaces after punctuation marks (e.g: 'strings like this,would not be split').\n","\n","We can solve this problem by replacing all punctuation + spacing with just the punctuation:"]},{"cell_type":"code","metadata":{"id":"56s6Z8qNTIE-","executionInfo":{"status":"ok","timestamp":1604830395826,"user_tz":-120,"elapsed":463,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"869433f6-f8d6-4889-9b9f-3ca207c92233","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ex_str.replace('. 
', '.').replace(', ', ',')\n","print(temp)"],"execution_count":20,"outputs":[{"output_type":"stream","text":["This is the string that we will use to test our example.The expected output of the program should contain every word this string has,but it should be split according to punctuation.Full stops should be preserved,but commas should not.Spacing after punctuation,should also be removed.\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"-ZfHZ1UqTIFB"},"source":["But what if someone by mistake had placed an extra space? (*'like this, bla bla'*)\n","\n","We would have to somehow figure out how much whitespace we have and then replace it with a single space.\n","\n","The easiest way to do this is to first split our string with `.split()` and no arguments, which splits on any run of whitespace. This creates a list of the words in our string. Then we can reassemble (join) the string with a single space as the delimiter between the words. This effectively replaces every run of whitespace with a single space.\n","```python\n","temp = ' '.join(ex_str.split())\n","```"]},{"cell_type":"markdown","metadata":{"id":"CymkCnYTTIFD"},"source":["Let's write the program as a whole:"]},{"cell_type":"code","metadata":{"id":"WbppAQ7PTIFE","executionInfo":{"status":"ok","timestamp":1604830397557,"user_tz":-120,"elapsed":687,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"98e73dfc-04aa-4ec7-bc74-d252df54a404","colab":{"base_uri":"https://localhost:8080/"}},"source":["temp = ' '.join(ex_str.split())\n","# first we remove excess whitespace\n","\n","temp = temp.replace('. 
', '.').replace(', ', ',')\n","# then we remove all whitespace after punctuation marks\n","\n","temp = temp.replace('.', '.,')\n","# then we replace every fullstop with fullstop+comma (in order to preserve full stops)\n","\n","fin_lst = temp.split(',')\n","# finally we split the string according to the commas\n","# this has a side effect of creating an empty element in the last spot of the list, but this is of little importance and we can always remove it:\n","\n","fin_lst.pop()\n","# only if string ends with a full stop\n","\n","print(fin_lst)"],"execution_count":21,"outputs":[{"output_type":"stream","text":["['This is the string that we will use to test our example.', 'The expected output of the program should contain every word this string has', 'but it should be split according to punctuation.', 'Full stops should be preserved', 'but commas should not.', 'Spacing after punctuation', 'should also be removed.']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"Kl0q4lLkTIFI"},"source":["Finally, we'll test this in a more difficult string."]},{"cell_type":"code","metadata":{"id":"nMXKPjJmTIFI","executionInfo":{"status":"ok","timestamp":1604830400278,"user_tz":-120,"elapsed":704,"user":{"displayName":"Thanos Tagaris","photoUrl":"","userId":"11094556072874949144"}},"outputId":"1dbe1a2b-b48e-49eb-a765-a9761b8ea22e","colab":{"base_uri":"https://localhost:8080/"}},"source":["test_str = 'Element 1, element 2. Element 3, element 4,element 5. Element 6.Element 7.'\n","temp = ' '.join(test_str.split())\n","temp = temp.replace('. ', '.').replace(', ', ',')\n","temp = temp.replace('.', '.,')\n","fin_lst = temp.split(',')\n","fin_lst.pop()\n","print(fin_lst)"],"execution_count":22,"outputs":[{"output_type":"stream","text":["['Element 1', 'element 2.', 'Element 3', 'element 4', 'element 5.', 'Element 6.', 'Element 7.']\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"s_g278k2TIFM"},"source":["## Exercises\n","\n","1. 
Write a Python program which takes a string and replaces all occurrences of its first character with the dollar sign (\\$), except for the first character itself: \n","e.g: restart ---> resta\\$t\n","\n","2. Write a Python program to get a single string from two given strings, separated by a space and swap the first two characters of each string: \n","e.g: 'abcd', 'wxyz' ---> 'wxcd abyz'"]}]} -------------------------------------------------------------------------------- /notebooks/05_basic_tuple_dict_operations.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.3"},"colab":{"name":"05_basic_tuple_dict_operations.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"CIlfOy2jhScv"},"source":["# Tuples\n","\n","Tuples are immutable sequences of python objects. You can think of them as read-only lists."]},{"cell_type":"code","metadata":{"id":"4I40e13GhScz"},"source":["T1 = 'a', 'b', 'c', 'd', 'e'\n","T2 = (1, 2, 3, 4, 5)\n","T3 = () # empty tuple\n","T4 = (15,) # single value tuple (requires comma!)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"07B6Iu8ahSc9"},"source":["Tuple indexing works the same as in lists but assignment does not. 
You **can't** change the value of an element in a tuple after it has been created.\n","\n","For example, `T1[1] = 'g'` is not allowed."]},{"cell_type":"code","metadata":{"id":"1GyAUPsPhSdC","outputId":"0cb4f3fb-2f05-4ebe-83c6-f2d988c72b07"},"source":["T1[1] = 'g'"],"execution_count":null,"outputs":[{"output_type":"error","ename":"TypeError","evalue":"'tuple' object does not support item assignment","traceback":["\u001b[1;31m---------------------------------------------------------------------------\u001b[0m","\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)","\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mT1\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'g'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"]}]},{"cell_type":"markdown","metadata":{"id":"T_iyKTsbhSdK"},"source":["## Tuple operations:"]},{"cell_type":"code","metadata":{"id":"zAvuc8j8hSdM","outputId":"6ab6a0e6-064e-41b0-f0a0-5cb41d463e70"},"source":["T1 + T2 \n","# concatenation works like with lists:\n","# ('a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5)\n","T4 * 5\n","# so does repetition:\n","# (15, 15, 15, 15, 15)\n","'a' in T1\n","# and membership: True"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"ZCjxRmhshSdT"},"source":["Tuples support many of the built-in functions that lists do (`min()`, `max()`, `len()`, `sum()`, etc.), but do not support methods that change the elements in place (e.g. `.sort()`). 
Instead we are forced to make a sorted copy. Note that `sorted()` returns a list, so we have to convert the result back to a tuple (`new_tup = tuple(sorted(tup))`).\n","\n","## List to tuple conversion."]},{"cell_type":"code","metadata":{"id":"ayllSs3nhSdU"},"source":["lst = [1, 2, 3]\n","tup = tuple(lst) # converts list to tuple"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"aQjfaXOShSdb"},"source":["## Tuple deletion."]},{"cell_type":"code","metadata":{"id":"EWfPkJwGhSdc"},"source":["del T2\n","# deletes the whole tuple"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"EzrSgy1jhSdi"},"source":["## Changing tuple elements.\n","We can change the elements of a tuple indirectly:"]},{"cell_type":"code","metadata":{"id":"zGxVLUiVhSdk"},"source":["lst = list(T1)\n","# first we convert the tuple to a list (any changes, e.g. lst[1] = 'g', would be made on this list)\n","T1 = tuple(lst)\n","# then we convert the list to a new tuple and override the old one with it\n","del lst\n","# optionally delete the list"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"DKyuHNCEhSdo"},"source":["Note what happens if we try to convert a tuple to a list using brackets:"]},{"cell_type":"code","metadata":{"id":"aY38kBpMhSdp"},"source":["lst = [T1]\n","# lst: [('a', 'b', 'c', 'd', 'e')]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"fgjyEdjchSdt"},"source":["This creates a list that has a tuple as an element.\n","If we wanted to do the same the other way around (wrap a list in a tuple), we would have to add a comma:"]},{"cell_type":"code","metadata":{"id":"kQRK8XPthSdu"},"source":["T = (lst,)\n","# ([('a', 'b', 'c', 'd', 'e')],)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"C8C_msvAhSd0"},"source":["# Dictionaries.\n","\n","Dictionaries are data types that store key-value pairs."]},{"cell_type":"code","metadata":{"id":"ikFlCP8NhSd1"},"source":["D1 = {'Name': 'Jack', 'Age': 12, 'Phone': '0123456789'}"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"GxnfQE4uhSd5"},"source":["Dictionary entries are 
separated by commas (`,`). Each entry has a **key** and a **value** which are separated by a colon (`:`)."]},{"cell_type":"code","metadata":{"id":"HnuQGnlVhSd6"},"source":["D2 = {} # empty dictionary"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"h5Z-DrTrhSd-"},"source":["## Referencing items in a dictionary.\n","\n","Dictionary entries are **not accessed by position**, so if we want to retrieve something we stored we need to use its key.\n","\n","*Note: dictionaries preserve insertion order in CPython 3.6 and, as a language guarantee, from python 3.7 onwards. Entries are still looked up by key, not by index.*"]},{"cell_type":"code","metadata":{"id":"r_rZK21shSd_"},"source":["D1['Age']\n","# Returns the value of the entry which has 'Age' as a key: 12\n","D1['Phone']\n","# '0123456789'\n","\n","# Updating entries\n","D1['Age'] = 13\n","# update existing entry with 'Age' as a key\n","D1['PK'] = 999999\n","# create new entry with 'PK' as a key and 999999 as a value"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ZwJzTwlfhSeG"},"source":["## Deleting in dictionaries"]},{"cell_type":"code","metadata":{"id":"VaKWyaTuhSeH"},"source":["del D1['Name']\n","# Removes entry with key 'Name'\n","D1.clear()\n","# Remove all D1 entries\n","del D1\n","# Deletes D1 as a whole"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"k5_pzP8qhSeK"},"source":["## Other Built-In Methods"]},{"cell_type":"code","metadata":{"id":"hHmphyYqhSeM"},"source":["D1 = {'Name': 'Jack', 'Age': 12, 'Phone': '0123456789'}\n","\n","D1_cp = D1.copy()\n","# Returns a shallow copy of the dictionary D1\n","\n","D1.get('Age')\n","# Returns the value of the key 'Age'. 
Same as D1['Age'], except that it returns None instead of raising a KeyError if the key is missing: 12\n","\n","D1.items()\n","# Returns the dictionary's (key, value) tuple pairs (a view object in python 3, wrap it in list() to get a list): [('Phone', '0123456789'), ('Age', 12), ('Name', 'Jack')]\n","\n","D1.keys()\n","# Returns the dictionary's keys (also a view object in python 3): ['Phone', 'Age', 'Name']\n","\n","D1.values()\n","# Returns the dictionary's values (also a view object in python 3): ['0123456789', 12, 'Jack']\n","# note again that both keys and values are unsorted\n","\n","D2 = D2.fromkeys(('key1', 'key2', 'key3', 'key4'), ['val1', 'val2'])\n","# Create a new dictionary with keys from a sequence, where every key is mapped to the same value (here the same list object): {'key3': ['val1', 'val2'], 'key2': ['val1', 'val2'], 'key1': ['val1', 'val2'], 'key4': ['val1', 'val2']}\n","\n","D1.update(D2)\n","# Adds dictionary D2's key-value pairs to D1: {'Phone': '0123456789', 'Name': 'Jack', 'key3': ['val1', 'val2'], 'key2': ['val1', 'val2'], 'Age': 12, 'key1': ['val1', 'val2'], 'key4': ['val1', 'val2']}"],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /notebooks/06_logical_operations.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.3"},"colab":{"name":"06_logical_operations.ipynb","provenance":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"BosA3bA5i0Ve"},"source":["# Logical Conditions and Operations in Python\n","\n","Conditional statements allow us to alter the flow of a program based on the outcome of a logical condition."]},{"cell_type":"code","metadata":{"id":"aTqmfmaEi0Vi"},"source":["from __future__ import 
print_function"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"qZwT-8Y7i0Vs"},"source":["## Comparison Operators\n","\n","These operators compare the values of two variables and return `True` or `False` accordingly. These operators are at the heart of any logical condition."]},{"cell_type":"code","metadata":{"id":"EyeyBawTi0Vv","outputId":"50e4d42d-6968-4bbd-ccef-22babc14d978"},"source":["a = 5\n","b = 3\n","\n","print('{} == {}: {}'.format(a, b, a == b)) # Returns True if the two operands are equal: False\n","print('{} != {}: {}'.format(a, b, a != b)) # Returns True if the two operands are different: True\n","print('{} > {}: {}'.format(a, b, a > b)) # Returns True if the left operand is greater than the right one: True\n","print('{} < {}: {}'.format(a, b, a < b)) # Returns True if the right operand is greater than the left one: False\n","print('{} >= {}: {}'.format(a, b, a >= b)) # Returns True if the left operand is greater than or equal to the right one: True\n","print('{} <= {}: {}'.format(a, b, a <= b)) # Returns True if the right operand is greater than or equal to the left one: False"],"execution_count":null,"outputs":[{"output_type":"stream","text":["5 == 3: False\n","5 != 3: True\n","5 > 3: True\n","5 < 3: False\n","5 >= 3: True\n","5 <= 3: False\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"cSnKvnaWi0V4"},"source":["## Logical Operators\n","\n","These operators allow us to combine two or more logical conditions into a larger condition that also returns `True` or `False`."]},{"cell_type":"code","metadata":{"id":"2i7XLnb3i0V5","outputId":"dd3ce152-ee99-44d1-dab6-c5ca037ac2b5"},"source":["t = True\n","f = False\n","\n","print('{} and {}: {}'.format(t, f, t and f)) # Returns True if both operands are True: False\n","print('{} or {}: {}'.format(t, f, t or f)) # Returns True if at least one of the two operands is True: True\n","print('not {}: {}'.format(t, not t)) # Returns the reverse logical state of its operand: 
False"],"execution_count":null,"outputs":[{"output_type":"stream","text":["True and False: False\n","True or False: True\n","not True: False\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"twoRr35si0V_"},"source":["## Decision Making\n","\n","We can create conditional statements in python with the `if` statement:\n","\n","Syntax is:\n","```python \n","if (logical statement is True): # (does something)\n","```\n","or\n","\n","```python\n","if (logical statement is True):\n"," # (does something)\n"," # (does something else)\n"," # (... and so on)\n","```\n"," \n","Note the **indentation** when expressing multi-line `if` statements. Every line in the same block must be at the same **indentation level**, otherwise python raises an *IndentationError*."]},{"cell_type":"code","metadata":{"id":"74JdAkuSi0WA","outputId":"4bb08a49-366c-4bf4-988d-638f09334425"},"source":["if (a > b): print('yes') # because the condition is true it prints 'yes'\n","if (a < b): print('yes') # this time around it doesn't print anything because the condition isn't true"],"execution_count":null,"outputs":[{"output_type":"stream","text":["yes\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"llgX-TB2i0WG"},"source":["`if ... else` statements:\n","\n","```python\n","if (logical statement is True):\n"," # (does something)\n"," # (does something else)\n"," # (... and so on)\n","else:\n"," # (does something if the condition is false)\n"," # (... 
and so on)\n","```"]},{"cell_type":"code","metadata":{"id":"2GO9Nw8gi0WH","outputId":"f13a7dd0-3c93-4015-895a-bec86e1fa3d1"},"source":["if (a < b): # if the condition is true then execute the indented code below\n"," print('a is smaller than b') # note the indent\n","else: # if the condition is not true then execute the indented code below\n"," print('a is not smaller than b')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["a is not smaller than b\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"HOGRZC4_i0WN"},"source":["`elif` statements:\n","\n","```python \n","if (statement1 is True):\n"," # (does something)\n"," # (does something else)\n"," # (... and so on)\n","elif (statement1 is False but statement2 is True): \n"," # (does something)\n"," # (...)\n","elif (statement1 and statement2 are False but statement3 is True): \n"," # (...)\n","else:\n"," # (does something if none of the statements above are True)\n"," # (... and so on)\n","```"]},{"cell_type":"code","metadata":{"id":"8Xwl03Ili0WP","outputId":"1603f67f-34b5-419b-a0dd-a614912639ec"},"source":["if (a < b): # if the condition is true then execute the indented code below\n"," print('a is smaller than b') # indented line\n","elif (a == b): # if the first condition is not true but this one is then execute the indented code below\n"," print('a is equal to b')\n","else: # if none of the above conditions are true then execute the indented code below\n"," print('a is larger than b')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["a is larger than b\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2HkVIbZ2i0WU"},"source":["## Nested if statements\n","\n","This means having an if statement inside another if statement.\n","\n","```python\n","if (statement1):\n"," # (does something if statement1 is True)\n"," if (statement2):\n"," # (does something if statement1 and statement2 are True)\n"," if(statement3):\n"," # (does something if all three statements 
are True)\n"," else:\n"," # (does something if statements 1 and 2 are True)\n"," # (but statement3 is False)\n"," else:\n"," # (does something if statement1 is True but statement2 is False)\n","else:\n"," # (does something if statement1 is False)\n","```\n"," \n","In the above syntax there is never a state that checks if more than one of the statements are `False`. If we wanted to do something if **all three** logical statements were False, we would need to modify the code accordingly. One way to do so would be:\n","\n","```python\n","if (statement1):\n"," # (same nesting as before)\n","else:\n"," if (not statement2 and not statement3):\n"," # (does something if all three conditions are False)\n","```"]},{"cell_type":"code","metadata":{"id":"iFGJUQi4i0WV","outputId":"e484c00f-2c9f-4fc6-be2d-744f8881d9ad"},"source":["if a < 0:\n"," print('a is negative')\n"," if a < -100: # this if condition is checked only if the first condition (a<0) is met\n"," print('a is smaller than minus 100')\n"," else:\n"," print('a is not smaller than minus 100')\n","elif a == 0:\n"," print('a is zero')\n","else:\n"," print('a is positive')\n"," if a > 100:\n"," print('a is greater than 100')\n"," else:\n"," print('a is not greater than 100')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["a is positive\n","a is not greater than 100\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"o8RfXBBHi0WZ"},"source":["## Example\n","\n","Given a roulette number x find out if it is black/red, odd/even, high/low and which set of 12s or column it belongs to:\n","\n","![](http://www.casinonewsdaily.com/wp-content/uploads/2015/02/european-roulette-layout.jpeg)\n","\n","Imagine the roulette table pictured above. Roulette numbers go from 0 to 36. 
A given number can be either black or red, odd or even (in this example we treat this as the same as being red or black: even numbers are red and odd ones are black), it can be small (1-18) or large (19-36), it can belong to one distinct set of 12s (1-12, 13-24, 25-36), it can belong to one of the 3 columns ( 1,4,...,34 / 2,5,...,35 / 3,6,...,36 ) or it can be 0 (which doesn't belong to any of the above categories)."]},{"cell_type":"code","metadata":{"id":"42EI6wxmi0Wa","outputId":"e8b8e105-b73a-4729-fd17-75c0e94cea93"},"source":["from random import randint\n","x = randint(0,100)\n","# we don't know what x is, but we know it belongs in [0,100].\n","print('The number is:', x)\n","# Only numbers in [0,36] are valid roulette numbers, though."],"execution_count":null,"outputs":[{"output_type":"stream","text":["The number is: 12\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"WK3YvlAYi0Wf","outputId":"14a58f41-94e2-42c9-8a94-01c264230675"},"source":["# First let's check to see if our number is a valid roulette number\n","if x > 36 or x < 0:\n"," print('This is not a valid roulette number!')\n","else:\n"," print('This is a valid roulette number!')\n"," # Now let's check if it is 0 or not\n"," if x == 0:\n"," # actually we don't need to do anything here (because zero doesn't belong in any category)\n"," pass # with this statement we tell the program not to do anything.\n"," # if we hadn't used pass then we would get an IndentationError \n"," # (because python expects an indented statement after colons)\n"," else: # or if it is an integer between 1 and 36\n"," # now let's check the color\n"," if x % 2 == 0:\n"," print('This is an even number!')\n"," print('The color is red!')\n"," else: \n"," print('This is an odd number!')\n"," print('The color is black!')\n"," if x > 18: # we don't have to check if x <= 36 because we are nested inside the first else statement\n"," print('This is a large number!')\n"," if x > 24:\n"," print('This number belongs in the third set of 12s!')\n"," else:\n"," print('This number belongs in the second set 
of 12s!')\n"," else: # or if x <= 18\n"," print('This is a small number!')\n"," if x > 12:\n"," print('This number belongs in the second set of 12s!')\n"," else: # if x <= 12\n"," print('This number belongs in the first set of 12s!')\n"," # Now that we checked if it is large or small and figured out in which\n"," # set of thirds it belongs to, lets check what column it belongs to:\n"," if (x % 3) == 0: # if the number divides perfectly with 3 it belongs in the 3rd column\n"," print('This number belongs in the third column!')\n"," elif (x % 3) == 1: # no need for elif (we could just have used if instead)\n"," print('This number belongs in the first column!')\n"," elif (x % 3) == 2: # same as before\n"," print('This number belongs in the second column!')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["This is a valid roulette number!\n","This is an even number!\n","The color is red!\n","This is a small number!\n","This number belongs in the first set of 12s!\n","This number belongs in the third column!\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"9zI2rdVni0Wk"},"source":["## Exercise\n","\n","As an exercise you can try to write a program that does the exact same thing as the previous example. 
Try to nest everything in a large if condition that checks for the color.\n","\n","Pseudocode follows:\n","\n","```python\n","if (color_check):\n"," if (small/large):\n"," if (...):\n"," ...\n"," else:\n"," ...\n"," else:\n"," ...\n","else:\n"," ...\n","```"]},{"cell_type":"markdown","metadata":{"id":"FEQMqii0cqmC"},"source":["## Truthy and Falsy values\n","\n","In python most objects have a boolean representation of either"]},{"cell_type":"code","metadata":{"id":"muhaFfdNcvni"},"source":["\n"],"execution_count":null,"outputs":[]}]} -------------------------------------------------------------------------------- /notebooks/08_input_output.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Input / Output Operations\n", 8 | "\n", 9 | "Programs are no good on their own without having means of interacting with their user.\n", 10 | "\n", 11 | "## Output to screen - print function.\n", 12 | "We have had a brief look at python's print function over the last couple of tutorials." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "from __future__ import print_function\n", 22 | "s = 'string1' # a random string\n", 23 | "i = 5 # an integer\n", 24 | "f = 0.33 # a float" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "If we want to print out a certain variable on screen we can use the `print` function." 
32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 2, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "string1\n", 44 | "5\n", 45 | "0.33\n" 46 | ] 47 | } 48 | ], 49 | "source": [ 50 | "print(s) # string1\n", 51 | "print(i) # 5\n", 52 | "print(f) # 0.33" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "We can also use print to print out multiple variables." 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "string1 5 0.33\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "print(s, i, f)" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "A good practice is to describe what we are printing when we print the value of a variable." 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 4, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "The value of the variable s is: string1\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "print('The value of the variable s is:', s)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "We can even place variable values in text.\n", 108 | "\n", 109 | "### old formatting scheme" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "name": "stdout", 119 | "output_type": "stream", 120 | "text": [ 121 | "string1 is a string, 5 is an integer and 0.330000 is a float\n" 122 | ] 123 | } 124 | ], 125 | "source": [ 126 | "print('%s is a string, %i is an integer and %f is a float' %(s, i, f))" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "When we place a 
`%s` in a string, python knows to expect a string variable to take its place. The same happens with integers (`%i`) and floats (`%f`)." 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "name": "stdout", 143 | "output_type": "stream", 144 | "text": [ 145 | "0.33 is a string\n" 146 | ] 147 | } 148 | ], 149 | "source": [ 150 | "print('%s is a string' %f)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "So the float was converted to a string and then printed out. \n", 158 | "The reverse does not work: `print('%i is a string' %s)` would raise a *TypeError*, because a string cannot be formatted as an integer. \n", 159 | "\n", 160 | "### a better way...\n", 161 | "\n", 162 | "A better way to format is to use the `str.format()` method, which has more functionality." 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 7, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "12 23\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "print('{} {}'.format(12, 23))" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "We can also configure the order in which the variables are passed to the string, by placing an index into the brackets." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 8, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "23 12\n" 199 | ] 200 | } 201 | ], 202 | "source": [ 203 | "print('{1} {0}'.format(12, 23))" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "With this we essentially instruct python to use the second (index 1) argument of `.format()` first and the first (index 0) one second. 
\n", 211 | "\n", 212 | "Another useful thing is to define the minimum number of characters the variable should occupy in the string. If the variable requires more characters than that, it is printed in full (nothing is discarded). If it has fewer, the rest is filled with whitespace, and in that case we can align the value wherever we want within the reserved space.\n", 213 | "\n", 214 | "The syntax is:\n", 215 | "```python\n", 216 | "'{:<10}'.format(variable) # this reserves 10 characters for the variable and aligns it left\n", 217 | "# By changing the number we can define how many characters we will reserve for the variable:\n", 218 | "'{:<5}'.format(variable) # this reserves 5 characters and aligns the variable to the left\n", 219 | "# We can also align the variable to the middle or the right of the reserved characters:\n", 220 | "'{:^5}'.format(variable) # middle alignment\n", 221 | "'{:>5}'.format(variable) # right alignment\n", 222 | "```" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 9, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | " string1\n" 235 | ] 236 | } 237 | ], 238 | "source": [ 239 | "print('{:>20}'.format(s))" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "This tells python to create a string of 20 characters and align our string to the right of that (`>`). \n", 247 | "\n", 248 | "Here `string1` contains 7 characters, but we told the string to reserve 20 characters for this variable and to place it at the end of those 20 characters. The rest were *padded* with whitespace. \n", 249 | "\n", 250 | "We can also define which character is used to *pad* these surplus characters by adding it right after the colon (`:`), before the alignment symbol.\n", 251 | "\n", 252 | "A final option is to *truncate* long strings (this is usually done with floats, where we don't care about all of the decimals)." 
253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 10, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "name": "stdout", 262 | "output_type": "stream", 263 | "text": [ 264 | "_____________string1\n", 265 | "string1_____________\n", 266 | "______string1_______\n", 267 | "strin\n", 268 | "strin_____\n" 269 | ] 270 | } 271 | ], 272 | "source": [ 273 | "# we can use padding to see this more clearly\n", 274 | "print('{:_>20}'.format(s))\n", 275 | "# left alignment (with padding):\n", 276 | "print('{:_<20}'.format(s))\n", 277 | "# middle alignment (with padding):\n", 278 | "print('{:_^20}'.format(s))\n", 279 | "# we can also truncate long strings (lets say to 5 characters)\n", 280 | "print('{:.5}'.format(s))\n", 281 | "# truncating + padding\n", 282 | "print('{:_<10.5}'.format(s))" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "### Number types:" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 11, 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "name": "stdout", 299 | "output_type": "stream", 300 | "text": [ 301 | "42\n", 302 | "3.141593\n", 303 | " 42\n", 304 | "003.14\n" 305 | ] 306 | } 307 | ], 308 | "source": [ 309 | "# Integers:\n", 310 | "print('{:d}'.format(42)) # 'd' is same thing as 'i' for printing \n", 311 | "# Floats:\n", 312 | "print('{:f}'.format(3.141592653589793))\n", 313 | "# with padding: \n", 314 | "print('{:5d}'.format(42))\n", 315 | "# padding and truncating\n", 316 | "print('{:06.2f}'.format(3.141592653589793))" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "### Named Placeholders:" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 12, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "name": "stdout", 333 | "output_type": "stream", 334 | "text": [ 335 | "first_string, third_string\n", 336 | "str1 st2\n", 337 | "23, 26, 21\n", 338 | "Gib = 
2.718\n", 339 | " test \n" 340 | ] 341 | } 342 | ], 343 | "source": [ 344 | "# from dictionary\n", 345 | "dic = {'first':'first_string','second':'second_string','third':'third_string'}\n", 346 | "print('{first}, {third}'.format(**dic))\n", 347 | "# explicit naming\n", 348 | "print('{first} {last}'.format(first='str1', last='st2'))\n", 349 | "# from list\n", 350 | "range_list = list(range(21,30))\n", 351 | "print('{l[2]}, {l[5]}, {l[0]}'.format(l=range_list))\n", 352 | "print('{:.{prec}} = {:.{prec}f}'.format('Gibberish', 2.7182, prec=3))\n", 353 | "# Parameterized formats\n", 354 | "print('{:{align}{width}}'.format('test', align='^', width='10'))" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "### Datetime:" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 13, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "name": "stdout", 371 | "output_type": "stream", 372 | "text": [ 373 | "2001-02-03 04:05\n" 374 | ] 375 | } 376 | ], 377 | "source": [ 378 | "from datetime import datetime\n", 379 | "dt = datetime(2001, 2, 3, 4, 5)\n", 380 | "print('{:{dfmt} {tfmt}}'.format(dt, dfmt='%Y-%m-%d', tfmt='%H:%M'))" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "### Formatting example:" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 14, 393 | "metadata": {}, 394 | "outputs": [ 395 | { 396 | "name": "stdout", 397 | "output_type": "stream", 398 | "text": [ 399 | "Without formatting:\n", 400 | "1 1 1\n", 401 | "2 4 8\n", 402 | "3 9 27\n", 403 | "4 16 64\n", 404 | "5 25 125\n", 405 | "6 36 216\n", 406 | "7 49 343\n", 407 | "8 64 512\n", 408 | "9 81 729\n", 409 | "10 100 1000\n", 410 | "\n", 411 | "\n", 412 | "With formatting:\n", 413 | " 1 1 1\n", 414 | " 2 4 8\n", 415 | " 3 9 27\n", 416 | " 4 16 64\n", 417 | " 5 25 125\n", 418 | " 6 36 216\n", 419 | " 7 49 343\n", 420 | " 8 64 512\n", 421 | " 9 81 729\n", 422 | 
"10 100 1000\n" 423 | ] 424 | } 425 | ], 426 | "source": [ 427 | "print('Without formatting:')\n", 428 | "for x in range(1, 11):\n", 429 | " print(x, x ** 2, x ** 3)\n", 430 | " \n", 431 | "print('\\n')\n", 432 | "\n", 433 | "print('With formatting:')\n", 434 | "for x in range(1, 11):\n", 435 | " print('{0:2d} {1:3d} {2:4d}'.format(x, x ** 2, x ** 3))" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "## Keyboard Input.\n", 443 | "\n", 444 | "This happens when the we ask the user to input something from the keyboard so that we can use it in our code.\n", 445 | "\n", 446 | "This is done through the built-in `input()` function equivalent to python2's `raw_input()`. This function accepts a parameter and reads it as a `string`." 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 15, 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "name": "stdout", 456 | "output_type": "stream", 457 | "text": [ 458 | "What is your name? Thanos\n", 459 | "Thanos \n", 460 | "What is your age? 28\n", 461 | "28 \n", 462 | "\n" 463 | ] 464 | } 465 | ], 466 | "source": [ 467 | "name = input('What is your name? ') # prints the string 'What is your name? 'and expects a string as an input to store in the variable name\n", 468 | "print(name, type(name))\n", 469 | "age = input('What is your age? ') # same thing with age\n", 470 | "print(age, type(age))\n", 471 | "# So, no matter what we enter, it is interpreted as a string.\n", 472 | "# If we age expecting an integer we need to cast it\n", 473 | "age = int(age) \n", 474 | "print(type(age))" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "When reading inputs from users, things might not turn out always as expected. 
A smart thing to do is to check if the input matches your specifications and ask the user to enter it again if it does not.\n", 482 | "\n", 483 | "### Example\n", 484 | "\n", 485 | "Let's say we want to make a test." 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": 16, 491 | "metadata": {}, 492 | "outputs": [ 493 | { 494 | "name": "stdout", 495 | "output_type": "stream", 496 | "text": [ 497 | "Which of the following animals is also a name for a programming language?\n", 498 | "A: Python B: Pitbull \n", 499 | "C: Bear D: Horse \n" 500 | ] 501 | } 502 | ], 503 | "source": [ 504 | "print('Which of the following animals is also a name for a programming language?')\n", 505 | "print('{:<15} {:<15}'.format('A: Python', 'B: Pitbull'))\n", 506 | "print('{:<15} {:<15}'.format('C: Bear', 'D: Horse'))" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "The first thing we want to do is to create a loop that will force the user to answer only with one of the letters a, b, c or d." 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 17, 519 | "metadata": {}, 520 | "outputs": [ 521 | { 522 | "name": "stdout", 523 | "output_type": "stream", 524 | "text": [ 525 | "Please enter your answer: e\n", 526 | "Not an appropriate choice. The answer must be A, B, C or D.\n", 527 | "Try again: b\n", 528 | "Sorry, the correct answer was A: Python.\n" 529 | ] 530 | } 531 | ], 532 | "source": [ 533 | "answer = input('Please enter your answer: ')\n", 534 | "while answer.lower() not in ('a', 'b', 'c', 'd'):\n", 535 | " answer = input('Not an appropriate choice. 
The answer must be A, B, C or D.\\nTry again: ')\n", 536 | "\n", 537 | "# This loop repeats the input prompt until the user answers with a letter that is in the given set.\n", 538 | "# By using the .lower() method we ensure that both upper and lowercase letters are acceptable.\n", 539 | "if answer.lower() == 'a':\n", 540 | " print('Congratulations, you chose correctly!')\n", 541 | "else:\n", 542 | " print('Sorry, the correct answer was A: Python.')" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "## Using files for I/O operations.\n", 550 | "\n", 551 | "Variables are no good for storing data for long term use. Once the program has finished, its variables are removed and the data they stored is deleted from memory. **Files** are a way of storing data on your disk for long term use." 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 18, 557 | "metadata": {}, 558 | "outputs": [], 559 | "source": [ 560 | "# say we create a list\n", 561 | "a_list = list(range(55)) + ['a', 'ab', 'abc'] * 15" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "We want to store this list into a file so that we can use it in the future. First, we need to specify the path where we want the file to be.\n", 569 | "\n", 570 | "```python\n", 571 | "# Linux paths:\n", 572 | "file_location = '/home/thanos/path/to/file'\n", 573 | "# Windows paths (a raw string, so that backslashes are not treated as escape characters):\n", 574 | "file_location = r'C:\\Users\\thanos\\path\\to\\file'\n", 575 | "```\n", 576 | "\n", 577 | "The default path is your working directory (the directory from where python was launched). " 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 19, 583 | "metadata": {}, 584 | "outputs": [], 585 | "source": [ 586 | "# Feel free to change your path to a valid one. 
I'll use the current directory.\n", 587 | "file_location = 'test_file'" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "metadata": {}, 593 | "source": [ 594 | "We can open a file with python's `open()` function.\n", 595 | "\n", 596 | "```python\n", 597 | "handle = open(path, 'wb')\n", 598 | "```\n", 599 | "\n", 600 | "This command instructs python to open the file identified in path and refer to it with the variable `handle`.\n", 601 | "\n", 602 | "- The `'w'` option indicates that the file is opened only for writing (not reading).\n", 603 | "- The `'b'` option indicates that the file is being written in binary format.\n", 604 | "\n", 605 | "In python 3, a file opened in binary mode reads and writes `bytes` objects instead of strings. If we want to write a text file we shouldn't add the `'b'` option.\n", 606 | "\n", 607 | "If the file doesn't exist, python will create it. If a file with the same name exists in this location, python will overwrite it!" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 20, 613 | "metadata": {}, 614 | "outputs": [], 615 | "source": [ 616 | "f = open(file_location, 'w')\n", 617 | "# According to python f is: " 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 21, 623 | "metadata": {}, 624 | "outputs": [], 625 | "source": [ 626 | "# In order to store something in the file we can use the file's .write() method.\n", 627 | "f.write('This is the file \\n')\n", 628 | "# The previous command writes the string 'This is the file \\n' to the file.\n", 629 | "# If we want to write the contents of the list, we need to first convert each element to a string.\n", 630 | "for i in a_list:\n", 631 | " f.write(str(i) + '\\n') # we chose to separate the elements with a new line\n", 632 | "# Now that we're done with the file we need to close it.\n", 633 | "# This is done with the file.close() method \n", 634 | "f.close()" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 
640 | "source": [ 641 | "The `file.close()` method is important and should not be forgotten. \n", 642 | "If we don't close it manually, python will eventually close the file when it is up for garbage collection but we don't know exactly when this will be. \n", 643 | "It is a bad practice leaving open file handles all over the place because that would be a waste of system resources.\n", 644 | "\n", 645 | "Now let's try and retrieve the stored data from the file. \n", 646 | "This time we need to open the file for reading. \n", 647 | "We will use a different syntax this time to open the file. Instead of `open()` - `close()` we will use the `with` statement. This allows us to open a file and perform all actions on the file with **indented** commands. We don't need to close the file afterwards!" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": 22, 653 | "metadata": {}, 654 | "outputs": [ 655 | { 656 | "name": "stdout", 657 | "output_type": "stream", 658 | "text": [ 659 | "['0\\n', '1\\n', '2\\n', '3\\n', '4\\n', '5\\n', '6\\n', '7\\n', '8\\n', '9\\n', '10\\n', '11\\n', '12\\n', '13\\n', '14\\n', '15\\n', '16\\n', '17\\n', '18\\n', '19\\n', '20\\n', '21\\n', '22\\n', '23\\n', '24\\n', '25\\n', '26\\n', '27\\n', '28\\n', '29\\n', '30\\n', '31\\n', '32\\n', '33\\n', '34\\n', '35\\n', '36\\n', '37\\n', '38\\n', '39\\n', '40\\n', '41\\n', '42\\n', '43\\n', '44\\n', '45\\n', '46\\n', '47\\n', '48\\n', '49\\n', '50\\n', '51\\n', '52\\n', '53\\n', '54\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n', 'a\\n', 'ab\\n', 'abc\\n']\n" 660 | ] 661 | } 662 | ], 663 | "source": [ 664 | "new_list = [] # we will store the retrieved 
elements here\n", 665 | "with open(file_location, 'r') as f: # this command opens a file for reading ('r') and refers to it as f\n", 666 | " f.readline() # this reads the first line of the file (the header we wrote) and discards it\n", 667 | " for line in f: # this automatically iterates through every remaining line of the file until the eof\n", 668 | " new_list.append(line) # because the elements were line-separated we just need \n", 669 | " # to append each line as a new element of the list\n", 670 | "print(new_list)" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "Because the format isn't what we wanted, we need to strip the trailing newline character (`'\\n'`) from each string and convert the numbers back to integers. This is not a really effective way, though, to store data.\n", 678 | "\n", 679 | "Other open modes are `'a'`, which opens the file for **appending** (new data is added at the end and the existing contents are kept), and `'r+'`, which opens a file for **reading and writing** (similarly `'w+'` and `'a+'`)." 680 | ] 681 | }, 682 | { 683 | "cell_type": "markdown", 684 | "metadata": {}, 685 | "source": [ 686 | "### Pickle module\n", 687 | "\n", 688 | "This is one of the most common and convenient ways to store python objects." 
689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": 23, 694 | "metadata": {}, 695 | "outputs": [ 696 | { 697 | "name": "stdout", 698 | "output_type": "stream", 699 | "text": [ 700 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc', 'a', 'ab', 'abc']\n" 701 | ] 702 | } 703 | ], 704 | "source": [ 705 | "import pickle\n", 706 | "pickle_file = 'test_file.pkl' # or /path/to/test_file.pkl. suffix is optional\n", 707 | "pickle.dump(a_list, open(pickle_file, 'wb')) # this command will store the data in a_list inside the pickle_file \n", 708 | "# This file is now NOT human-readable\n", 709 | "new_list = pickle.load(open(pickle_file, 'rb')) # this command loads data from a pickle file to memory\n", 710 | "print(new_list)" 711 | ] 712 | }, 713 | { 714 | "cell_type": "markdown", 715 | "metadata": {}, 716 | "source": [ 717 | "With pickle, the new list has the exact same format as the old one. As a plus we managed that only with one line for the saving (`.dump()`) and one line for the loading (`.load()`). No iterations are needed when using pickle. Pickle files are only good for storing data. It is not good for creating any sort of human-readable file (for example log files).\n", 718 | "\n", 719 | "### CSV module\n", 720 | "\n", 721 | "This module is for storing (and loading) data from (and to) csv format.\n", 722 | "CSV (comma separated values) is one of the most common formats for storing data (most commonly from spreadsheets) into text files. 
The standard csv format separates the values of an entry in a spreadsheet with commas and entries (rows or records) with new lines. \n", 723 | "Other similar formats are the tab-separated values (TSV) format, or more custom formats that use custom symbols to separate values." 724 | ] 725 | }, 726 | { 727 | "cell_type": "code", 728 | "execution_count": 24, 729 | "metadata": {}, 730 | "outputs": [], 731 | "source": [ 732 | "# let's say we have a list of triples that we want to store as a csv file\n", 733 | "trip_list = [('thanos', 1, 2), ('mike', 2, 3), ('george', 5, 6), ('maria', 9, 8), ('xristina', 12, 23)] \n", 734 | "import csv\n", 735 | "with open('a_file.csv', 'w', newline='') as f: # newline='' lets the csv module handle line endings itself\n", 736 | " writer = csv.writer(f, delimiter=',') # ',' is the default delimiter, so passing it here is not necessary.\n", 737 | " # if we wanted a space separated values file we could use delimiter=' ',\n", 738 | " # if we wanted a tab-separated values file we could have used delimiter='\\t'\n", 739 | " for i in trip_list:\n", 740 | " writer.writerow(i)" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "metadata": {}, 746 | "source": [ 747 | "This creates a csv file named a_file.csv containing the following:\n", 748 | "\n", 749 | " thanos,1,2\n", 750 | " mike,2,3\n", 751 | " george,5,6\n", 752 | " maria,9,8\n", 753 | " xristina,12,23\n", 754 | "\n", 755 | "Now, let's try and load it back." 
756 | ] 757 | }, 758 | { 759 | "cell_type": "code", 760 | "execution_count": 25, 761 | "metadata": {}, 762 | "outputs": [ 763 | { 764 | "name": "stdout", 765 | "output_type": "stream", 766 | "text": [ 767 | "[['thanos', '1', '2'], ['mike', '2', '3'], ['george', '5', '6'], ['maria', '9', '8'], ['xristina', '12', '23']]\n" 768 | ] 769 | } 770 | ], 771 | "source": [ 772 | "new_list = []\n", 773 | "with open('a_file.csv', 'r', newline='') as f:\n", 774 | " reader = csv.reader(f, delimiter=',') # again here delimiter is optional\n", 775 | " for row in reader:\n", 776 | " new_list.append(row)\n", 777 | "print(new_list)" 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "If we wanted a list of tuples we should have changed the line inside the loop to `new_list.append(tuple(row))`.\n", 785 | "Then we would have gotten new_list: `[('thanos', '1', '2'), ('mike', '2', '3'), ('george', '5', '6'), ('maria', '9', '8'), ('xristina', '12', '23')]`.\n", 786 | "\n", 787 | "Again, there are differences with the original (for example integers are stored as strings and, depending on our situation, may need to be identified and cast back to integers).\n", 788 | "\n", 789 | "In future tutorials we will learn easier ways of reading and writing csv files.\n", 790 | "\n", 791 | "### Other formats\n", 792 | "\n", 793 | "Besides these there are many more ways of storing data, such as HDF5, json, memory-mapped data and others.\n", 794 | "\n", 795 | "- **hdf5** is a versatile data model that can represent complex objects. It offers good performance with large files. **module: h5py**\n", 796 | "\n", 797 | "- **json**, like xml, is a text format designed for human-readable data interchange. **module: json**\n", 798 | "\n", 799 | "- **memory-mapped data** (such as LMDB) offers higher I/O performance. **module: mmap**\n", 800 | "\n", 801 | "- **xls**, or excel spreadsheets. 
**module: xlrd**\n", 802 | "\n", 803 | "- **xml**, **module: xml**\n", 804 | "\n", 805 | "- **rdf**, **module: rdflib**\n", 806 | "\n", 807 | "and many more..." 808 | ] 809 | } 810 | ], 811 | "metadata": { 812 | "kernelspec": { 813 | "display_name": "Python 3", 814 | "language": "python", 815 | "name": "python3" 816 | }, 817 | "language_info": { 818 | "codemirror_mode": { 819 | "name": "ipython", 820 | "version": 3 821 | }, 822 | "file_extension": ".py", 823 | "mimetype": "text/x-python", 824 | "name": "python", 825 | "nbconvert_exporter": "python", 826 | "pygments_lexer": "ipython3", 827 | "version": "3.6.3" 828 | } 829 | }, 830 | "nbformat": 4, 831 | "nbformat_minor": 1 832 | } 833 | -------------------------------------------------------------------------------- /notebooks/09_functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Functions\n", 8 | "\n", 9 | "> A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reusing." 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "collapsed": true 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "from __future__ import print_function" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "In the previous tutorials we saw many built-in python functions (like `print()`, `open()`, `sum()` and others).\n", 28 | "\n", 29 | "For a function to work, we must first define it. During this procedure we define the name of the function, the function parameters, what the function does and what it returns upon completion.\n", 30 | "\n", 31 | "Let's look at an example. 
We want to find out if a person is old enough to vote or not.\n", 32 | "\n", 33 | "First, we'll write the code without the use of functions:" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "name": "stdout", 43 | "output_type": "stream", 44 | "text": [ 45 | "Sorry. That is not a valid number.\n", 46 | "Sorry. That is not a valid number.\n", 47 | "Sorry. That is not a valid number.\n", 48 | "Sorry this is not a valid age.\n", 49 | "Sorry. You are too young to vote.\n" 50 | ] 51 | } 52 | ], 53 | "source": [ 54 | "while True:\n", 55 | " age = input('Please enter your age: ')\n", 56 | " if age.isdigit(): # check if the age is a non-negative integer\n", 57 | " age = int(age)\n", 58 | " if age > 0 and age < 150: # in practice this rejects only an age of 0 and ages of 150 or more. \n", 59 | " # negative values were already rejected by the previous condition ( .isdigit() )\n", 60 | " if age >= 18:\n", 61 | " print('Congratulations! You are old enough to vote!')\n", 62 | " break\n", 63 | " else:\n", 64 | " print('Sorry. You are too young to vote.')\n", 65 | " break\n", 66 | " else:\n", 67 | " print('Sorry this is not a valid age.')\n", 68 | " else:\n", 69 | " print('Sorry. That is not a valid number.')" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "Say we wanted to write a large program that requires us to check if the person is old enough to vote in more than one place in our code. We could do one of two things. \n", 77 | "\n", 78 | "- The first would be to copy and paste the code above wherever we require that functionality. \n", 79 | "That is not optimal though, because we would create a really bloated program, with a lot more lines of code than necessary. Large programs are harder to maintain and should be avoided if possible. \n", 80 | "- The best thing to do is to define a function that checks if a person is old enough to vote and use it when required." 
81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 1, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "def voting_privileges(age): # this line defines a function called 'voting_privileges',\n", 90 | " # that takes one argument that is called age.\n", 91 | " age = str(age) # this converts age to a string (so that we can use the same code we wrote above)\n", 92 | " result = False # Default value for the variable which will tell us if the person is able to vote or not.\n", 93 | " if age.isdigit():\n", 94 | " age = int(age)\n", 95 | " if age > 0 and age < 150: \n", 96 | " if age >= 18:\n", 97 | " print('Congratulations! You are old enough to vote!')\n", 98 | " result = True # If the person is able to vote, the flag is set as True. \n", 99 | " # In all other cases the flag remains False\n", 100 | " else:\n", 101 | " print('Sorry. You are too young to vote.')\n", 102 | " else:\n", 103 | " print('Sorry this is not a valid age.')\n", 104 | " else:\n", 105 | " print('Sorry. That is not a valid number.')\n", 106 | " return result # This line specifies what variable we want to get returned as the result.\n", 107 | " # Note that all the function code is written indented!" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "This function takes a variable, checks if the variable is an integer, checks if the integer is a valid age and then checks if the person with this age would be able to vote. In each case it prints what it thinks about the age (valid/invalid, old enough/too young) and it returns a Boolean value of `True` if the person is indeed old enough (`False` in all other cases).\n", 115 | "We can call any function by its name from our main program." 
116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 2, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | "output_type": "stream", 126 | "text": [ 127 | "Enter an age: 56\n", 128 | "Congratulations! You are old enough to vote!\n", 129 | "Sorry. That is not a valid number.\n", 130 | "Sorry. That is not a valid number.\n", 131 | "Sorry. That is not a valid number.\n", 132 | "Congratulations! You are old enough to vote!\n", 133 | "Sorry. You are too young to vote.\n" 134 | ] 135 | }, 136 | { 137 | "data": { 138 | "text/plain": [ 139 | "False" 140 | ] 141 | }, 142 | "execution_count": 2, 143 | "metadata": {}, 144 | "output_type": "execute_result" 145 | } 146 | ], 147 | "source": [ 148 | "my_age1 = input('Enter an age: ') # 56\n", 149 | "my_age2 = 33.23\n", 150 | "my_age3 = -5\n", 151 | "my_age4 = 'asdf'\n", 152 | "my_age5 = 19\n", 153 | "my_age6 = '17'\n", 154 | "voting_privileges(my_age1)\n", 155 | "voting_privileges(my_age2)\n", 156 | "voting_privileges(my_age3)\n", 157 | "voting_privileges(my_age4)\n", 158 | "voting_privileges(my_age5)\n", 159 | "voting_privileges(my_age6)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "We can also use the value that the function returns." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 3, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "name": "stdout", 176 | "output_type": "stream", 177 | "text": [ 178 | "Congratulations! 
You are old enough to vote!\n" 179 | ] 180 | } 181 | ], 182 | "source": [ 183 | "priv = voting_privileges(my_age5) # priv: True" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 6, 189 | "metadata": { 190 | "collapsed": true 191 | }, 192 | "outputs": [], 193 | "source": [ 194 | "# We can also have a function that doesn't accept any arguments or doesn't return anything\n", 195 | "def print_something(): # no arguments\n", 196 | " print('Something!')\n", 197 | " # no return statement (the function returns None)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "Functions can take more than one argument:\n", 205 | "\n", 206 | "We want to write a program that checks if a person is old enough to vote (18), drive a car (16), drink alcohol (21), be a representative (25), senator (30) or the president/vice president (35) of the US.\n", 207 | "\n", 208 | "Instead of creating six functions (i.e. one for each condition), we could create a generic function to which we would pass two arguments: the person's age and the minimum required age for the privilege that we want to check. This way we only need to create one function, which would differ slightly in the way it is called each time." 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 8, 214 | "metadata": { 215 | "collapsed": true 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "def age_checker(age, threshold):\n", 220 | " age = str(age)\n", 221 | " result = False\n", 222 | " if age.isdigit():\n", 223 | " age = int(age)\n", 224 | " if age > 0 and age < 150: \n", 225 | " if age >= threshold:\n", 226 | " result = True\n", 227 | " else:\n", 228 | " print('Sorry this is not a valid age.')\n", 229 | " else:\n", 230 | " print('Sorry. 
That is not a valid number.')\n", 231 | " return result" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "Now let's see how we would write the body of our program:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 10, 244 | "metadata": { 245 | "collapsed": true 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "age = input('Enter an age: ')\n", 250 | "drive = age_checker(age, 16) # check if person is old enough to drive\n", 251 | "vote = age_checker(age, 18) # check if person is old enough to vote\n", 252 | "drink = age_checker(age, 21) # check if person is old enough to drink\n", 253 | "repres = age_checker(age, 25) # check if person is old enough to be a representative\n", 254 | "senat = age_checker(age, 30) # check if person is old enough to be a senator\n", 255 | "pres = age_checker(age, 35) # check if person is old enough to be the (vice) president" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "Let's check which privileges the person has." 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 11, 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "name": "stdout", 272 | "output_type": "stream", 273 | "text": [ 274 | "You are legally able to ... 
do nothing in the US\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "print('You are legally able to ', end=\"\") # the end=\"\" argument removes the \\n after each print function\n", 280 | "if drive:\n", 281 | " if vote:\n", 282 | " if drink:\n", 283 | " if repres:\n", 284 | " if senat:\n", 285 | " if pres:\n", 286 | " print('do anything you want to ', end=\"\")\n", 287 | " else:\n", 288 | " print('drive, vote, drink, be a representative and run for senate ', end=\"\")\n", 289 | " else:\n", 290 | " print('drive, vote, drink and be a representative ', end=\"\")\n", 291 | " else:\n", 292 | " print('drive, vote and drink ', end=\"\")\n", 293 | " else:\n", 294 | " print('drive and vote ', end=\"\")\n", 295 | " else:\n", 296 | " print('drive ', end=\"\")\n", 297 | "else:\n", 298 | " print('... do nothing ', end=\"\")\n", 299 | "print('in the US')" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Another way to do this would be to write a new function that calls `age_checker`. This way we create a more elegant main body for our program." 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 13, 312 | "metadata": { 313 | "collapsed": true 314 | }, 315 | "outputs": [], 316 | "source": [ 317 | "def us_privileges(age):\n", 318 | " age = str(age)\n", 319 | " if not age.isdigit(): # check if we entered a valid number\n", 320 | " print('Sorry, not a valid number.')\n", 321 | " return # With this command we exit the function. 
Can be used in a similar fashion with the break command in loops\n", 322 | " age = int(age)\n", 323 | " if (age == 0) or (age >= 150): # check if it is indeed a valid age\n", 324 | " print('Sorry, this is not a valid age.')\n", 325 | " return\n", 326 | " print('You are legally able to ', end=\"\") # the end=\"\" argument removes the \\n after each print function\n", 327 | " if age_checker(age,16): # Here we call a function inside a function.\n", 328 | " if age_checker(age,18):\n", 329 | " if age_checker(age,21):\n", 330 | " if age_checker(age,25):\n", 331 | " if age_checker(age,30):\n", 332 | " if age_checker(age,35):\n", 333 | " print('do anything you want to ', end=\"\")\n", 334 | " else:\n", 335 | " print('drive, vote, drink, be a representative and run for senate ', end=\"\")\n", 336 | " else:\n", 337 | " print('drive, vote, drink and be a representative ', end=\"\")\n", 338 | " else:\n", 339 | " print('drive, vote and drink ', end=\"\")\n", 340 | " else:\n", 341 | " print('drive and vote ', end=\"\")\n", 342 | " else:\n", 343 | " print('drive ', end=\"\")\n", 344 | " else:\n", 345 | " print('... 
do nothing ', end=\"\")\n", 346 | " print('in the US.')" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Now our program's main body would be:" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 14, 359 | "metadata": {}, 360 | "outputs": [ 361 | { 362 | "name": "stdout", 363 | "output_type": "stream", 364 | "text": [ 365 | "You are legally able to drive, vote, drink, be a representative and run for senate in the US.\n" 366 | ] 367 | } 368 | ], 369 | "source": [ 370 | "my_age = input('Enter your age: ') # 34\n", 371 | "us_privileges(my_age)" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "Notice that even though we defined two functions, the main body of our program is just 2 lines long!\n", 379 | "\n", 380 | "Functions make programs more modular. By using functions our programs are easier to maintain. Suppose the law changed in the US, allowing people aged 18 or above to drink alcohol. If we hadn't used functions, we would have had to search the whole program and replace each check manually. Since we did use functions, we would only have to change a few lines in the function definition.\n", 381 | "\n", 382 | "## Function arguments\n", 383 | "\n", 384 | "Now let's look at something more advanced. We already said that `print` is a python built-in function and we saw a way of passing an extra argument to this function (the `end=\"\"` argument) to change its behavior. How can we incorporate this into our own functions?\n", 385 | "\n", 386 | "In python there are two types of arguments: **positional arguments** and **keyword arguments**. \n", 387 | "- Positional arguments need to be entered in the same position in which they appear in the function definition, such as the `day` and `month` arguments in the example below. \n", 388 | "- Keyword arguments are defined by a keyword (such as the `leap` argument in the example below) and a value. 
These arguments don't need to be in a specific order when referenced by their keyword. When not entered they take the default value.\n", 389 | "\n", 390 | "\n", 391 | "```python\n", 392 | "# Function definition:\n", 393 | "def function(positional_1, positional_2, keyword_1=default_1, keyword_2=default_2):\n", 394 | " # function body\n", 395 | "```\n", 396 | "\n", 397 | "In the function above there are 2 positional and 2 keyword arguments. When calling the function we **have** to enter the two positional arguments, but **not** the keyword ones.\n", 398 | "\n", 399 | "```python\n", 400 | "# Main body of the program:\n", 401 | "function(val_1, val_2, val_3, val_4) # keyword_1 = val_3, keyword_2 = val_4\n", 402 | "function(val_1, val_2, val_4, val_3) # keyword_1 = val_4, keyword_2 = val_3\n", 403 | "function(val_1, val_2, keyword_1=val_3, keyword_2=val_4) # keyword_1 = val_3, keyword_2 = val_4\n", 404 | "function(val_1, val_2, keyword_2=val_3, keyword_1=val_4) # keyword_1 = val_4, keyword_2 = val_3\n", 405 | "function(val_1, val_2, keyword_1=val_3) # keyword_1 = val_3, keyword_2 = default_2\n", 406 | "function(val_1, val_2, keyword_2=val_3) # keyword_1 = default_1, keyword_2 = val_3\n", 407 | "function(val_1, val_2) # keyword_1 = default_1, keyword_2 = default_2\n", 408 | "```\n", 409 | "\n", 410 | "This is especially helpful when we want a function that does a certain thing most of the time, but occasionally does something else.\n", 411 | "\n", 412 | "### Example\n", 413 | "\n", 414 | "Let's say we want to write a program that calculates how many days the year has left. Obviously leap years have one more day than other years, so we need to take that into account. One way would be to require the user to enter the current day, month AND year (and check whether that year is a leap year). However, we'll try something different. We want the user to just enter the day and month and signify whether this is a leap year." 
415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 15, 420 | "metadata": { 421 | "collapsed": true 422 | }, 423 | "outputs": [], 424 | "source": [ 425 | "def days_remaining(day, month, leap=False): # we are using the european date format: dd/mm/yy\n", 426 | " # End day is NOT included\n", 427 | " if month==1:\n", 428 | " rem = 364 # remaining days from 1st January to 31 December in a non leap year\n", 429 | " if leap:\n", 430 | " rem += 1 # leap years have 1 more day\n", 431 | " if month==2:\n", 432 | " rem = 333\n", 433 | " if leap:\n", 434 | " rem += 1\n", 435 | " if month==3:\n", 436 | " rem = 305 # we have already passed the leap day in March\n", 437 | " if month==4:\n", 438 | " rem = 274\n", 439 | " if month==5:\n", 440 | " rem = 244\n", 441 | " if month==6:\n", 442 | " rem = 213\n", 443 | " if month==7:\n", 444 | " rem = 183\n", 445 | " if month==8:\n", 446 | " rem = 152\n", 447 | " if month==9:\n", 448 | " rem = 121\n", 449 | " if month==10:\n", 450 | " rem = 91\n", 451 | " if month==11:\n", 452 | " rem = 60\n", 453 | " if month==12:\n", 454 | " rem = 30\n", 455 | " return rem - day + 1 # +1 because we calculated remaining days from 1st of each month not last day of previous month" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "Here's how we call this function from our program" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 16, 468 | "metadata": {}, 469 | "outputs": [ 470 | { 471 | "name": "stdout", 472 | "output_type": "stream", 473 | "text": [ 474 | "242\n", 475 | "344\n", 476 | "344\n", 477 | "343\n" 478 | ] 479 | } 480 | ], 481 | "source": [ 482 | "print(days_remaining(3,5)) # days remaining from May the 3rd till the end of the year\n", 483 | "# say this is a leap year (e.g 2016). 
we need to incorporate this into the calculation\n", 484 | "print(days_remaining(22, 1, True))\n", 485 | "print(days_remaining(22, 1, leap=True)) # same thing different syntax\n", 486 | "# note that the leap argument is completely optional if it has the default value (of a non-leap year)!\n", 487 | "print(days_remaining(22, 1, False))" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "A function doesn't have to have any positional arguments." 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 17, 500 | "metadata": { 501 | "collapsed": true 502 | }, 503 | "outputs": [], 504 | "source": [ 505 | "def remaining_days(day=1, month=1, leap=False):\n", 506 | " return days_remaining(day, month, leap) # too lazy to write the function's body again so I'll use the previous one" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "Now we are free to rearrange the arguments as we please:" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 18, 519 | "metadata": {}, 520 | "outputs": [ 521 | { 522 | "name": "stdout", 523 | "output_type": "stream", 524 | "text": [ 525 | "344\n", 526 | "344\n", 527 | "242\n", 528 | "344\n" 529 | ] 530 | } 531 | ], 532 | "source": [ 533 | "print(remaining_days(day=22, month=1, leap=True))\n", 534 | "print(remaining_days(day=22, leap=True)) # month not required because it carries the default value\n", 535 | "print(remaining_days(month=5, leap=True, day=3)) # we can rearrange the arguments\n", 536 | "print(remaining_days(22, 1, True)) # or we can use them as positional arguments" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "Positional arguments always need to be placed **before** the keyword ones: for instance `days_remaining(leap=True, 22, 1)` would raise a SyntaxError.\n", 544 | "\n", 545 | "We can also have **variable-length arguments**:" 
546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": 21, 551 | "metadata": {}, 552 | "outputs": [ 553 | { 554 | "name": "stdout", 555 | "output_type": "stream", 556 | "text": [ 557 | "Positional: ('one', 'two', 'three')\n", 558 | "Keywords: {}\n", 559 | "\n", 560 | "\n", 561 | "Positional: ()\n", 562 | "Keywords: {'a': 'one', 'c': 'three', 'b': 'two'}\n", 563 | "\n", 564 | "\n", 565 | "Positional: ('one', 'two')\n", 566 | "Keywords: {'c': 'three', 'd': 'four'}\n" 567 | ] 568 | } 569 | ], 570 | "source": [ 571 | "def foo(*positional, **keywords):\n", 572 | " print(\"Positional:\", positional)\n", 573 | " print(\"Keywords:\", keywords)\n", 574 | " \n", 575 | "foo('one', 'two', 'three')\n", 576 | "print('\\n')\n", 577 | "foo(a='one', b='two', c='three')\n", 578 | "print('\\n')\n", 579 | "foo('one', 'two', c='three', d='four')" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "## Anonymous or lambda functions\n", 587 | "\n", 588 | "These functions are declared without using the `def` keyword. 
They are called anonymous because they are **not bound to a name**.\n", 589 | "\n", 590 | "The syntax for defining `lambda` functions in python is: \n", 591 | "```python\n", 592 | "lambda arguments: return_value\n", 593 | "```" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": 22, 599 | "metadata": {}, 600 | "outputs": [ 601 | { 602 | "name": "stdout", 603 | "output_type": "stream", 604 | "text": [ 605 | "<class 'function'>\n", 606 | "64\n", 607 | "81\n" 608 | ] 609 | } 610 | ], 611 | "source": [ 612 | "sq = lambda x: x**2\n", 613 | "print(type(sq))\n", 614 | "print(sq(8))\n", 615 | "print(sq(9))\n", 616 | "add = lambda x, y: x + y # the body is a single expression, so no return keyword is used\n", 617 | "print(add(3, 4))" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "## Example - fibonacci\n", 625 | "\n", 626 | "Let's try to calculate the n-th number in a Fibonacci sequence, where n is a predefined number of steps.\n", 627 | "Each number in the sequence is the sum of the previous two numbers.\n", 628 | "This way we generate the following sequence of integers: \n", 629 | "\n", 630 | "$$ 0,1,1,2,3,5,8,13,21,34,55,89, ...$$" 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": 46, 636 | "metadata": {}, 637 | "outputs": [ 638 | { 639 | "name": "stdout", 640 | "output_type": "stream", 641 | "text": [ 642 | "1st way: fib(0) = 0 2nd way: fib(0) = 0\n", 643 | "1st way: fib(1) = 1 2nd way: fib(1) = 1\n", 644 | "1st way: fib(2) = 1 2nd way: fib(2) = 1\n", 645 | "1st way: fib(3) = 2 2nd way: fib(3) = 2\n", 646 | "1st way: fib(4) = 3 2nd way: fib(4) = 3\n", 647 | "1st way: fib(5) = 5 2nd way: fib(5) = 5\n", 648 | "1st way: fib(6) = 8 2nd way: fib(6) = 8\n", 649 | "1st way: fib(7) = 13 2nd way: fib(7) = 13\n", 650 | "1st way: fib(8) = 21 2nd way: fib(8) = 21\n", 651 | "1st way: fib(9) = 34 2nd way: fib(9) = 34\n", 652 | " ... 
and so on \n" 653 | ] 654 | } 655 | ], 656 | "source": [ 657 | "# 1st way - Iteration\n", 658 | "def fib1(n): # will calculate the sequence and return the n-th element in the Fibonacci sequence\n", 659 | " a, b = 0, 1 # we set the values of the first two numbers in the sequence\n", 660 | " for i in range(n): # repeat for n steps\n", 661 | " a, b = b, a + b # we set the first number equal to the second one from the previous iteration and\n", 662 | " # the second one equal to the sum of the two numbers of the previous iteration\n", 663 | " return a # return the result\n", 664 | "\n", 665 | "\n", 666 | "# This way is probably the easiest to understand, but there is a second way of doing this using function recursion.\n", 667 | "# Recursion is when a function keeps calling itself till a condition is met.\n", 668 | "\n", 669 | "# 2nd way - Recursion\n", 670 | "def fib2(n):\n", 671 | " if n == 0 or n == 1: # We need to manually return the first two fibonacci numbers: fib(0) = 0 and fib(1) = 1\n", 672 | " return n\n", 673 | " else:\n", 674 | " return fib2(n-1) + fib2(n-2) # This adds the previous two numbers. \n", 675 | " # Those two numbers are calculated in the same way.\n", 676 | " # This procedure continues until n reaches 1 and 0.\n", 677 | " \n", 678 | "\n", 679 | "for x in range(10):\n", 680 | " print('{}{:<10}'.format('1st way: fib({}) = '.format(x), fib1(x)), end='')\n", 681 | " print('2nd way: fib({}) = '.format(x), fib2(x))\n", 682 | "print('{:^45}'.format('... 
and so on'))" 683 | ] 684 | } 685 | ], 686 | "metadata": { 687 | "kernelspec": { 688 | "display_name": "Python 3", 689 | "language": "python", 690 | "name": "python3" 691 | }, 692 | "language_info": { 693 | "codemirror_mode": { 694 | "name": "ipython", 695 | "version": 3 696 | }, 697 | "file_extension": ".py", 698 | "mimetype": "text/x-python", 699 | "name": "python", 700 | "nbconvert_exporter": "python", 701 | "pygments_lexer": "ipython3", 702 | "version": "3.6.3" 703 | } 704 | }, 705 | "nbformat": 4, 706 | "nbformat_minor": 1 707 | } 708 | -------------------------------------------------------------------------------- /notebooks/11_modules_and_packages.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Modules\n", 8 | "\n", 9 | "If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a **script**. As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each script.\n", 10 | "\n", 11 | "To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a **module**; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).\n", 12 | "\n", 13 | "A module is a file containing Python definitions and statements. 
The file name is the module name with the suffix *.py* appended. Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`.\n", 14 | "\n", 15 | "So say we write a python module and save it as a *.py* file. How do we use it? Well, we first have to import it (just like we have been importing packages all along!)." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "from __future__ import print_function" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "We just imported a function from a python package. In the same way we can import functions from custom modules.\n", 32 | "\n", 33 | "We have created a module called *custom_module.py*.\n", 34 | "\n", 35 | "First of all, we have to be in the **same folder** as the module in which the function was defined.\n", 36 | "\n", 37 | "Then we need to `import` the module." 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "import custom_module as mod" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "The above command instructs python to load module *custom_module.py* and refer to it from now on as `mod`. \n", 54 | "\n", 55 | "Finally, we call the function we wanted from the module, similar to how we call a method from a class." 
56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "name": "stdout", 65 | "output_type": "stream", 66 | "text": [ 67 | "The 1st element in the range is: 0\n", 68 | "The 2nd element in the range is: 1\n", 69 | "The 3rd element in the range is: 2\n", 70 | "The 4th element in the range is: 3\n", 71 | "The 5th element in the range is: 4\n", 72 | "The 6th element in the range is: 5\n", 73 | "The 7th element in the range is: 6\n", 74 | "The 8th element in the range is: 7\n", 75 | "The 9th element in the range is: 8\n", 76 | "The 10th element in the range is: 9\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "mod.verbose_print(range(10))" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "When importing modules or functions or classes from modules we need to be careful.\n", 89 | "\n", 90 | "While using the `from xxx import yyy` syntax appears convenient because we don't need to do as much typing as the plain `import xxx`, we may end up with name conflicts. The other problem is that we lose context about the function `yyy`. For example, it's less clear what `ceil()` does compared to `math.ceil()`. Another problem is that import statements need more maintenance this way. If we realize we also need function `zzz`, we need to add it to the import statement.\n", 91 | "\n", 92 | "Both methods have their pros and cons and it is up to your preference to choose which one to use.\n", 93 | "\n", 94 | "Just **don't** use \n", 95 | "```python\n", 96 | "from xxx import *\n", 97 | "```\n", 98 | "For any reasonably large set of code, if you `import *` you will likely be cementing it into the module, unable to be removed. 
This is because it is difficult to determine what items used in the code are coming from 'module', making it easy to get to the point where you think you don't use the import any more but it's extremely difficult to be sure.\n", 99 | "\n", 100 | "We can see which names a module defines by the `dir()` function." 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'verbose_print']\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "print(dir(mod))" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "- `__builtins__` is by default `__builtin__`. This module provides direct access to all *built-in* identifiers of Python.\n", 125 | "- `__doc__` prints out the class' docstring. Python documentation strings (or docstrings) provide a convenient way of associating documentation with Python modules, functions, classes, and methods. In order to write a docstring just write a multiline comment under the class definition.\n", 126 | "- `__file__` contains the filename of the module. In this case *custom_module.py*.\n", 127 | "- `__package__` contains the name of the package the module belongs to." 
128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "The following built-in functions can also be used to inspect the bindings in the global or local scope.\n", 135 | "\n", 136 | "- `globals()` always returns the dictionary of the module namespace.\n", 137 | "- `locals()` always returns a dictionary of the current namespace.\n", 138 | "- `vars()` returns either a dictionary of the current namespace (if called with no argument) or the dictionary of the argument.\n", 139 | "\n", 140 | "Note that for efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you must restart the interpreter – or, if it’s just one module you want to test interactively, use `importlib.reload()`.\n", 141 | "\n", 142 | "# Scripts\n", 143 | "\n", 144 | "There is a difference between scripts and modules. Scripts are *.py* files that are meant to be run from the command-line. Modules on the other hand are meant to support other modules or scripts. To ensure a *.py* file is run as a script, or to modify its behavior depending on whether it is run as a script or a module, we use the following code:" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": { 151 | "collapsed": true 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "if __name__ == '__main__':\n", 156 | " print('module used as script')" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "This states that if the `__name__` of the module is set to `'__main__'`, then we run the code. `'__main__'` is the name of the scope in which top-level code executes. \n", 164 | "\n", 165 | "If we want to run a module as a script we need to type:\n", 166 | "\n", 167 | "
python custom_module.py
\n", 168 | "\n", 169 | "from the command line.\n", 170 | "\n", 171 | "# Module search path\n", 172 | "\n", 173 | "When we try to import a module, python searches in 3 locations for it:\n", 174 | "\n", 175 | "- The **current working directory** (or the directory containing the input script).\n", 176 | "- The **`PYTHONPATH`**, which is an OS environmental variable containing a list of directory names.\n", 177 | "- The installation-dependent default. \n", 178 | " - For linux based OS this usually is: `/usr/local/lib/pythonX.X/...`.\n", 179 | " - For Windows OS this is the directory where you installed python (most common locations are `C:\\PythonXX`, `C:\\Users\\current_user\\AppData\\Local\\Programs\\Python\\PythonXX\\...`, etc.)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "# Packages\n", 187 | "A package is a Python module which can contain submodules or recursively, subpackages. Technically, a package is a Python module with a `__path__` attribute.\n", 188 | "\n", 189 | "In order to access a submodule from a module, we need to use the dotted syntax we saw before!\n", 190 | "\n", 191 | "Naming can become troublesome when using packages:\n", 192 | "```python\n", 193 | "import package.subpackage.subsubpackage.module\n", 194 | "package.subpackage.subsubpackage.module.function()\n", 195 | "package.subpackage.subsubpackage.module.another_function()\n", 196 | "```\n", 197 | "This is **not** efficient for your code.\n", 198 | "\n", 199 | "Either use:\n", 200 | "\n", 201 | "```python\n", 202 | "from package.subpackage.subsubpackage import module\n", 203 | "module.function()\n", 204 | "module.another_function()\n", 205 | "```\n", 206 | " Or\n", 207 | " ```python\n", 208 | "import package.subpackage.subsubpackage.module as mod\n", 209 | "mod.function()\n", 210 | "mod.another_function()\n", 211 | "```" 212 | ] 213 | } 214 | ], 215 | "metadata": { 216 | "kernelspec": { 217 | "display_name": "Python 3", 218 | "language": 
"python", 219 | "name": "python3" 220 | }, 221 | "language_info": { 222 | "codemirror_mode": { 223 | "name": "ipython", 224 | "version": 3 225 | }, 226 | "file_extension": ".py", 227 | "mimetype": "text/x-python", 228 | "name": "python", 229 | "nbconvert_exporter": "python", 230 | "pygments_lexer": "ipython3", 231 | "version": "3.6.3" 232 | } 233 | }, 234 | "nbformat": 4, 235 | "nbformat_minor": 1 236 | } 237 | -------------------------------------------------------------------------------- /notebooks/12_exception_handling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from __future__ import print_function" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Errors\n", 17 | "\n", 18 | "Errors are messages being raised when python finds itself in a situation it isn't supposed to be in.\n", 19 | "\n", 20 | "For example, when we forget to close a bracket or make any mistake concerning the language's syntax, we will raise a *SyntaxError*." 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 2, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "ename": "SyntaxError", 30 | "evalue": "invalid syntax (, line 1)", 31 | "output_type": "error", 32 | "traceback": [ 33 | "\u001b[1;36m File \u001b[1;32m\"\"\u001b[1;36m, line \u001b[1;32m1\u001b[0m\n\u001b[1;33m if True print('something') # we get a SyntaxError\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid syntax\n" 34 | ] 35 | } 36 | ], 37 | "source": [ 38 | "if True print('something') # we get a SyntaxError" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "When trying to use a variable we haven't assigned, we raise a *NameError*." 
46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "ename": "NameError", 55 | "evalue": "name 'a' is not defined", 56 | "output_type": "error", 57 | "traceback": [ 58 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 59 | "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", 60 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0ma\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# we get a NameError\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 61 | "\u001b[1;31mNameError\u001b[0m: name 'a' is not defined" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "print(a) # we get a NameError" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "When trying to perform an operation that is not supported for the objects we are trying to use it on, we raise a *TypeError*." 
74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 4, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "ename": "TypeError", 83 | "evalue": "must be str, not int", 84 | "output_type": "error", 85 | "traceback": [ 86 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 87 | "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", 88 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;34m'a'\u001b[0m \u001b[1;33m+\u001b[0m \u001b[1;36m3\u001b[0m \u001b[1;31m# we get a TypeError\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 89 | "\u001b[1;31mTypeError\u001b[0m: must be str, not int" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "'a' + 3 # we get a TypeError" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "When passing an index to a list that is out of its range, we raise an *IndexError*." 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 5, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "ename": "IndexError", 111 | "evalue": "list index out of range", 112 | "output_type": "error", 113 | "traceback": [ 114 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 115 | "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", 116 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0ml\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m2\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0ml\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 117 | "\u001b[1;31mIndexError\u001b[0m: list index out of range" 118 | ] 119 | } 120 | ], 121 
| "source": [ 122 | "l = [0, 1, 2]\n", 123 | "l[3]" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "When referencing a dictionary key that does not exist, we raise a *KeyError*." 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 6, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "ename": "KeyError", 140 | "evalue": "'c'", 141 | "output_type": "error", 142 | "traceback": [ 143 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 144 | "\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)", 145 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0md\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m'a'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'b'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;36m2\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0md\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'c'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 146 | "\u001b[1;31mKeyError\u001b[0m: 'c'" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "d = {'a': 1, 'b': 2}\n", 152 | "d['c']" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "When attempting to divide by zero (in regular python), we will raise a *ZeroDivisionError*." 
160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 7, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "ename": "ZeroDivisionError", 169 | "evalue": "division by zero", 170 | "output_type": "error", 171 | "traceback": [ 172 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 173 | "\u001b[1;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", 174 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;36m1\u001b[0m\u001b[1;33m/\u001b[0m\u001b[1;36m0\u001b[0m \u001b[1;31m# we get a ZeroDivisionError\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 175 | "\u001b[1;31mZeroDivisionError\u001b[0m: division by zero" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "1/0 # we get a ZeroDivisionError" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "and so on...\n", 188 | "\n", 189 | "These different types of errors exist to inform us of what **kind** of mistake we made. \n", 190 | "\n", 191 | "It is important to differentiate between two types of errors:\n", 192 | "\n", 193 | "- **SyntaxErrors** are errors the interpreter finds when it is trying to parse the commands given to it. These errors are fatal for the execution of the script, so they must be corrected before the script can run!\n", 194 | "- Errors that occur **during execution** are known as **exceptions** and can be **handled**!\n", 195 | "\n", 196 | "Handling of exceptions is done with the `try...except` block. This block consists of two parts: The `try` block contains all the commands we want python to **try** to execute. **If** an error occurs, the rest of the commands in the `try` block are skipped and python starts executing the commands in the `except` block. 
If no error is found, python performs all commands in the `try` block but **ignores** the `except` block.\n", 197 | "\n", 198 | "Syntax is:\n", 199 | "```python\n", 200 | "try:\n", 201 | " # Operations we want python to try to execute.\n", 202 | " # If an 'ErrorName' type error is encountered, python skips the rest of the commands in the try block\n", 203 | "except ErrorName:\n", 204 | " # Operations we want executed IF python encounters an error of the 'ErrorName' type.\n", 205 | "```\n", 206 | "\n", 207 | "## Example 1\n", 208 | "\n", 209 | "We want the user to enter a number from the keyboard:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 8, 215 | "metadata": {}, 216 | "outputs": [ 217 | { 218 | "name": "stdout", 219 | "output_type": "stream", 220 | "text": [ 221 | "Please enter a number: a\n", 222 | "Not a valid number, try again!\n", 223 | "Please enter a number: '5'\n", 224 | "Not a valid number, try again!\n", 225 | "Please enter a number: 5\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "while True:\n", 231 | " try:\n", 232 | " # under this line we put the commands we want python to try to perform\n", 233 | " x = int(input('Please enter a number: '))\n", 234 | " # here we want the user to enter something from the keyboard and then we try to convert this to an integer\n", 235 | " # if it is not a number the casting cannot be performed and we will have raised a ValueError\n", 236 | " break\n", 237 | " # if we didn't raise an error the next command will be performed (which exits the while loop)\n", 238 | " except ValueError:\n", 239 | " # This command is how we 'catch' errors. 
This means that if we raised a ValueError, python skips the rest \n", 240 | " # of the try block and performs the commands of the except block!\n", 241 | " print('Not a valid number, try again!')\n", 242 | " # if we want to just ignore the error (and not print anything like here) we can just use pass" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "There are many ways of handling exceptions in python, using the `try...except` block.\n", 250 | "\n", 251 | "- Multiple exceptions:\n", 252 | "```python\n", 253 | "try:\n", 254 | " # ...\n", 255 | "except (RuntimeError, TypeError, NameError):\n", 256 | " # ...\n", 257 | "```\n", 258 | "- Referring to the caught exception by a name (and handling it):\n", 259 | " - In python 3:\n", 260 | "```python\n", 261 | "try:\n", 262 | " # ...\n", 263 | "except ValueError as ve: # refers to the instance of the ValueError exception we caught as 've'\n", 264 | " print(type(ve)) # <class 'ValueError'>\n", 265 | " print(ve.args) # prints the arguments of the error\n", 266 | " print(ve) # prints the result of the exception's __str__() method\n", 267 | " # ...\n", 268 | "```\n", 269 | " - In python 2:\n", 270 | "```python\n", 271 | "try:\n", 272 | " # ...\n", 273 | "except ValueError, ve: # same exact thing as above\n", 274 | " # ...\n", 275 | "``` \n", 276 | "\n", 277 | "- Ignore all exceptions:\n", 278 | "```python\n", 279 | "try:\n", 280 | " # ...\n", 281 | "except: # catches all exceptions\n", 282 | " pass # ignores them\n", 283 | "```\n", 284 | "**Broad except clauses like the above are not recommended**, because they also hide errors we did not anticipate.\n", 285 | "\n", 286 | "# Raising exceptions.\n", 287 | "\n", 288 | "The `raise` statement allows the programmer to force a specified exception to occur.\n", 289 | "\n", 290 | "Syntax is:\n", 291 | "```python\n", 292 | "raise ErrorName(arguments)\n", 293 | "```" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 9, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "ename": 
"NameError", 303 | "evalue": "my error", 304 | "output_type": "error", 305 | "traceback": [ 306 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 307 | "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", 308 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mNameError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'my error'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 309 | "\u001b[1;31mNameError\u001b[0m: my error" 310 | ] 311 | } 312 | ], 313 | "source": [ 314 | "raise NameError('my error')" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "The argument in raise indicates the exception to be raised. This **must** be either an **exception instance** or an **exception class** (a class that derives from Exception).\n", 322 | "\n", 323 | "# Defining Exceptions\n", 324 | "\n", 325 | "Sometimes it is useful to define our own `type` of error. 
This can be done by creating a class that **derives from python's Exception class** either directly, or indirectly.\n", 326 | "\n", 327 | "```python\n", 328 | "class CustomError(Exception):\n", 329 | " ...\n", 330 | "```" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 10, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "ename": "MyError", 340 | "evalue": "'muahaha!'", 341 | "output_type": "error", 342 | "traceback": [ 343 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 344 | "\u001b[1;31mMyError\u001b[0m Traceback (most recent call last)", 345 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mrepr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mvalue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mMyError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'muahaha!'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 346 | "\u001b[1;31mMyError\u001b[0m: 'muahaha!'" 347 | ] 348 | } 349 | ], 350 | "source": [ 351 | "class MyError(Exception):\n", 352 | " # A user-defined exception must be derived directly or indirectly from the Exception class.\n", 353 | " def __init__(self, value):\n", 354 | " self.value = value\n", 355 | " def __str__(self):\n", 356 | " return repr(self.value)\n", 357 | " \n", 358 | "raise MyError('muahaha!')" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "# Assertions\n", 366 | "\n", 367 | "The `assert` statement helps the programmer find bugs and ensure his program is used the way he meant it to be used. 
The `assert` statement tests a condition: if it is `True`, execution continues; if it is `False`, it raises an **AssertionError**.\n", 368 | "\n", 369 | "```python\n", 370 | "assert condition, arguments\n", 371 | "```\n", 372 | "The arguments are optional and are passed as arguments to the `AssertionError()` exception.\n", 373 | "\n", 374 | "Assert is roughly equivalent to:\n", 375 | "```python\n", 376 | "if not condition:\n", 377 | "\traise AssertionError(arguments)\n", 378 | "```" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 11, 384 | "metadata": {}, 385 | "outputs": [ 386 | { 387 | "ename": "AssertionError", 388 | "evalue": "Failed condition 2!", 389 | "output_type": "error", 390 | "traceback": [ 391 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 392 | "\u001b[1;31mAssertionError\u001b[0m Traceback (most recent call last)", 393 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mcnd2\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[1;36m5\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;34m'5'\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# False\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[1;32massert\u001b[0m \u001b[0mcnd1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'Failed condition 1!'\u001b[0m \u001b[1;31m# does nothing\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[1;32massert\u001b[0m \u001b[0mcnd2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'Failed condition 2!'\u001b[0m \u001b[1;31m# raises an AssertionError\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 394 | "\u001b[1;31mAssertionError\u001b[0m: Failed condition 2!" 395 | ] 396 | } 397 | ], 398 | "source": [ 399 | "cnd1 = (5 == 5) # True\n", 400 | "cnd2 = (5 == '5') # False\n", 401 | "assert cnd1, 'Failed condition 1!' # does nothing\n", 402 | "assert cnd2, 'Failed condition 2!' 
# raises an AssertionError" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "As another example of an assertion, we will check whether a user-defined exception class (that we call `err`) is a valid python `Exception`:" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": 12, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "err = MyError\n", 419 | "assert issubclass(err, Exception), '{} is not a suitable class for an exception, ' \\\n", 420 | " 'because {} is not derived from class Exception'.format(err.__name__, err.__name__)" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "The previous assert checks if `err` is a subclass of `Exception`, which in this case it is. Because the condition was `True`, `assert` didn't raise any errors.\n", 428 | "\n", 429 | "Assertions should **not** be used to test for failure cases that can occur because of bad user input or operating system/environment failures, such as a file not being found. \n", 430 | "Instead, you should raise an exception, or print an error message, or whatever is appropriate. \n", 431 | "\n", 432 | "One important reason why assertions should only be used for self-tests of the program is that assertions **can be disabled at compile time**. If Python is started with the **-O** option, then assertions will be stripped out and not evaluated.\n", 433 | "\n", 434 | "For example:\n", 435 | "
python -O script.py
\n", 436 | "\n", 437 | "The **-O** option runs python in an *optimized mode*. This flag instructs python to **ignore assertions**, and sets the built-in `__debug__` constant to `False`. If your code is heavy in assertions (which isn't a bad practice), you might achieve a better performance when ignoring them.\n", 438 | "\n", 439 | "# The try...except...else...finally block\n", 440 | "\n", 441 | "In a `try...except` block, `else` indicates code that is executed only if no exception was raised, while `finally` indicates code that is always executed, whether or not an exception has been raised.\n", 442 | "\n", 443 | "```python\n", 444 | "try:\n", 445 | "\t# Operations go here.\n", 446 | "\t# If an error is encountered\n", 447 | "\t# the rest of the operations are skipped.\n", 448 | "except: # catches all errors\n", 449 | "\t# If an error is encountered\n", 450 | "\t# do the operations in this block.\n", 451 | "else:\n", 452 | "\t# If no error is encountered\n", 453 | "\t# do the operations of this block.\n", 454 | "finally:\n", 455 | "\t# This will always be executed.\n", 456 | "```" 457 | ] 458 | } 459 | ], 460 | "metadata": { 461 | "kernelspec": { 462 | "display_name": "Python 3", 463 | "language": "python", 464 | "name": "python3" 465 | }, 466 | "language_info": { 467 | "codemirror_mode": { 468 | "name": "ipython", 469 | "version": 3 470 | }, 471 | "file_extension": ".py", 472 | "mimetype": "text/x-python", 473 | "name": "python", 474 | "nbconvert_exporter": "python", 475 | "pygments_lexer": "ipython3", 476 | "version": "3.6.3" 477 | } 478 | }, 479 | "nbformat": 4, 480 | "nbformat_minor": 1 481 | } 482 | -------------------------------------------------------------------------------- /notebooks/13_time_random_ordereddict.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Ordered Dictionaries\n", 8 | "\n", 9 | "We discussed in a previous tutorial that dictionaries have **no sense of order**. 
In some cases though, we might want an ordered dictionary." 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": {}, 16 | "outputs": [ 17 | { 18 | "name": "stdout", 19 | "output_type": "stream", 20 | "text": [ 21 | "OrderedDict: OrderedDict([('a', 1), ('b', 3), ('c', 5), ('d', 7), ('e', 9), ('f', 11), ('g', 13)])\n", 22 | "Normal Dict: {'a': 1, 'b': 3, 'c': 5, 'd': 7, 'e': 9, 'f': 11, 'g': 13}\n" 23 | ] 24 | } 25 | ], 26 | "source": [ 27 | "from __future__ import print_function\n", 28 | "from collections import OrderedDict\n", 29 | "\n", 30 | "keys = ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n", 31 | "vals = range(1, 14, 2) # [1, 3, 5, 7, 9, 11, 13]\n", 32 | "\n", 33 | "od = OrderedDict() # OrderedDict declaration\n", 34 | "nd = dict() # normal dictionary\n", 35 | "for i in range(len(keys)):\n", 36 | " od[keys[i]] = vals[i] # population like normal dictionary\n", 37 | " nd[keys[i]] = vals[i]\n", 38 | "\n", 39 | "print('OrderedDict: ', od)\n", 40 | "print('Normal Dict: ', nd)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "We can see that the *OrderedDict* maintains the order in which its elements were entered. This is useful for mappings. (Since python 3.7, regular dictionaries are also guaranteed to preserve insertion order, which is why the two outputs above can look the same.)\n", 48 | "\n", 49 | "If we want to retrieve, let's say, the 3rd pair we entered, we can convert the items to a list and index it:" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 2, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "Pair: ('c', 5)\n", 62 | "Key: c\n", 63 | "Value: 5\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "print('Pair: ', list(od.items())[2])\n", 69 | "print('Key: ', list(od.keys())[2])\n", 70 | "print('Value: ', list(od.values())[2])" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "# Random\n", 78 | "\n", 79 | "This module implements pseudo-random number generators for various distributions. 
We talked about the `random` module a bit in a previous tutorial.\n", 80 | "\n", 81 | "## Functions for integers:\n", 82 | "\n", 83 | "\n", 84 | "- ```python \n", 85 | " random.randint(a,b) \n", 86 | " ```\n", 87 | " Returns a random integer $N$ such that $ a \\leqslant N \\leqslant b $ (both endpoints are included).\n", 88 | " \n", 89 | "- ```python\n", 90 | " random.randrange(start,stop[,step])\n", 91 | " ```\n", 92 | " Returns a randomly selected element from `range(start, stop, step)`.\n", 93 | " A variant of this is also:\n", 94 | " ```python\n", 95 | " random.randrange(stop) # returns a random element from [0, stop)\n", 96 | " ```" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 3, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "name": "stdout", 106 | "output_type": "stream", 107 | "text": [ 108 | "Random Integers in [1,100]:\n", 109 | "88, 43, 65, 72, 70, 96, 45, 13, 95, 60, ...\n", 110 | "\n", 111 | "\n", 112 | "Random Integers in [1,3,5, ... ,99]:\n", 113 | "69, 69, 75, 55, 21, 21, 49, 55, 65, 39, ...\n" 114 | ] 115 | } 116 | ], 117 | "source": [ 118 | "import random\n", 119 | "print('Random Integers in [1,100]:')\n", 120 | "for i in range(10):\n", 121 | " print(random.randint(1,100), end=', ')\n", 122 | " # Returns a random integer in [1,100] (randint includes both endpoints)\n", 123 | "print('...')\n", 124 | "print('\\n')\n", 125 | "print('Random Integers in [1,3,5, ... ,99]:')\n", 126 | "for i in range(10): \n", 127 | " print(random.randrange(1,101,2), end=', ')\n", 128 | "print('...')\n", 129 | "# Returns a random integer in [1,3,5, ... 
,99]" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "## Functions for sequences and real-valued distributions\n", 137 | "\n", 138 | "\n", 139 | "- ```python\n", 140 | " random.random()\n", 141 | " ```\n", 142 | " Return the next random floating point number in the range $[0, 1)$.\n", 143 | "- ```python\n", 144 | " random.choice(seq)\n", 145 | " ```\n", 146 | " Return a random element from the non-empty sequence `seq`.\n", 147 | "- ```python\n", 148 | " random.shuffle(seq)\n", 149 | " ```\n", 150 | " Randomly shuffle the sequence `seq` in place.\n", 151 | "- ```python\n", 152 | " random.sample(population, k)\n", 153 | " ```\n", 154 | " Return a `k` length list of unique elements chosen from the `population` sequence. Used for random sampling without replacement.\n", 155 | "- ```python\n", 156 | " random.uniform(a,b)\n", 157 | " ```\n", 158 | " Return a random floating point number $N$ such that $ a \\leqslant N \\leqslant b $ for $ a \\leqslant b $ and $ b \\leqslant N \\leqslant a $ for $ b \\leqslant a $.\n", 159 | "- ```python \n", 160 | " random.gauss(mu, sigma)\n", 161 | " ```\n", 162 | " Gaussian distribution. `mu` is the mean, and `sigma` is the standard deviation." 
163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 4, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "Random Numbers in [0,1):\n", 175 | "0.25585729084957154, 0.5145516408439316, 0.5103052413360393, 0.36230886465665757, 0.8818870280752225, 0.17336244010575041, 0.11880322850806313, 0.26513378700818746, 0.541489105937288, 0.08180041014948547, ...\n", 176 | "\n", 177 | "\n", 178 | "seq = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9)\n", 179 | "Random Choice in seq:\n", 180 | "6.6\n", 181 | "\n", 182 | "\n", 183 | "Random Shuffle of seq:\n", 184 | "[3.3, 2.2, 7.7, 9.9, 4.4, 1.1, 5.5, 6.6, 0.0, 8.8]\n", 185 | "\n", 186 | "\n", 187 | "Randomly generated population with a size of 1000\n", 188 | "Population mean: 4.803\n", 189 | "\n", 190 | "\n", 191 | "Random sample of population with a size of 50\n", 192 | "Sample mean: 5.14\n", 193 | "\n", 194 | "\n", 195 | "Random Numbers from Uniform distribution in [1,10]:\n", 196 | "9.02120062299251, 7.46806533106201, 3.2874204256322637, 3.22499528118127, 1.4700601222755596, 7.790588677943567, 5.17026993224375, 7.605240338851941, 1.6823203837384506, 8.633285717928793, ...\n", 197 | "\n", 198 | "\n", 199 | "Randomly generated sequence from Gaussian distribution with mean=0 and std.dev=1:\n", 200 | "size = 1000\n", 201 | "mean = 0.04128371203246678\n", 202 | "standard dev = 0.9837071665999959\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "print('Random Numbers in [0,1):')\n", 208 | "for i in range(10):\n", 209 | " print(random.random(), end=', ')\n", 210 | "print('...')\n", 211 | "print('\\n')\n", 212 | "\n", 213 | "seq = (0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9)\n", 214 | "print('seq =', seq)\n", 215 | "print('Random Choice in seq:')\n", 216 | "print(random.choice(seq))\n", 217 | "print('\\n')\n", 218 | "\n", 219 | "print('Random Shuffle of seq:')\n", 220 | "ls = list(seq)# Converted tuple to list. 
Tuples cannot be shuffled!\n", 221 | "random.shuffle(ls)\n", 222 | "print(ls) \n", 223 | "print('\\n')\n", 224 | "\n", 225 | "# Generating random population\n", 226 | "pop = []\n", 227 | "for i in range(1000):\n", 228 | " pop.append(random.randint(0,10))\n", 229 | "print('Randomly generated population with a size of', len(pop))\n", 230 | "print('Population mean: ', sum(pop)/float(len(pop)))\n", 231 | "print('\\n')\n", 232 | "\n", 233 | "smp = random.sample(pop, 50)\n", 234 | "print('Random sample of population with a size of', len(smp))\n", 235 | "print('Sample mean: ', sum(smp)/float(len(smp)))\n", 236 | "print('\\n')\n", 237 | "\n", 238 | "print('Random Numbers from Uniform distribution in [1,10]:')\n", 239 | "for i in range(10):\n", 240 | " print(random.uniform(1,10), end=', ')\n", 241 | "print('...')\n", 242 | "print('\\n')\n", 243 | "\n", 244 | "print('Randomly generated sequence from Gaussian distribution with mean=0 and std.dev=1:')\n", 245 | "gaus = []\n", 246 | "for i in range(1000):\n", 247 | " gaus.append(random.gauss(0,1))\n", 248 | "print('size =', len(gaus))\n", 249 | "print('mean =', sum(gaus)/float(len(gaus)))\n", 250 | "import statistics as st # using the statistics package for calculating the std deviation.\n", 251 | "print('standard dev =', st.stdev(gaus))" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "## Generator States and Seeding\n", 259 | "\n", 260 | "A seed is a number used to initialize the pseudo-random number generator.\n", 261 | "```python\n", 262 | "random.seed(a)\n", 263 | "```\n", 264 | "If *a* is omitted or None, the current system time is used. 
If randomness sources are provided by the operating system, they are used instead of the system time (see the `os.urandom()` function for details on availability - we'll comment on this later).\n", 265 | "\n", 266 | "We could retrieve or set the **internal state** of the generator by the `getstate()` or `setstate()` functions.\n", 267 | "\n", 268 | "```python\n", 269 | "state = random.getstate()\n", 270 | "random.setstate(state)\n", 271 | "```\n", 272 | "The first command retrieves the state of the generator and stores it in variable `state`. The second one sets the state of the generator to `state`.\n", 273 | "\n", 274 | "These all can be done to reproduce results!" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 5, 280 | "metadata": {}, 281 | "outputs": [ 282 | { 283 | "name": "stdout", 284 | "output_type": "stream", 285 | "text": [ 286 | "Without a seed:\n", 287 | "[287, 31, 969, 84, 381, 533, 670, 306, 945, 813]\n", 288 | "[364, 224, 230, 433, 753, 530, 777, 700, 545, 532]\n", 289 | "[584, 582, 308, 717, 976, 790, 13, 176, 297, 982]\n", 290 | "\n", 291 | "\n", 292 | "Seed 1:\n", 293 | "[426, 750, 10, 839, 845, 822, 305, 875, 377, 954]\n", 294 | "\n", 295 | "\n", 296 | "Seed 2:\n", 297 | "[500, 801, 71, 250, 174, 566, 541, 401, 179, 937]\n", 298 | "\n", 299 | "\n", 300 | "Without a seed:\n", 301 | "[154, 530, 873, 87, 489, 133, 207, 894, 88, 653]\n", 302 | "\n", 303 | "\n", 304 | "Seed 1:\n", 305 | "[426, 750, 10, 839, 845, 822, 305, 875, 377, 954]\n", 306 | "\n", 307 | "\n", 308 | "Seed 2:\n", 309 | "[500, 801, 71, 250, 174, 566, 541, 401, 179, 937]\n" 310 | ] 311 | } 312 | ], 313 | "source": [ 314 | "print('Without a seed:')\n", 315 | "print(random.Random().sample(range(1000),10))\n", 316 | "print(random.Random().sample(range(1000),10))\n", 317 | "print(random.Random().sample(range(1000),10))\n", 318 | "print('\\n')\n", 319 | "\n", 320 | "print('Seed 1:')\n", 321 | "seed = 12345\n", 322 | 
"print(random.Random(seed).sample(range(1000),10))\n", 323 | "print('\\n')\n", 324 | "\n", 325 | "print('Seed 2:')\n", 326 | "seed = 54321\n", 327 | "print(random.Random(seed).sample(range(1000),10))\n", 328 | "print('\\n')\n", 329 | "\n", 330 | "print('Without a seed:')\n", 331 | "print(random.Random().sample(range(1000),10))\n", 332 | "print('\\n')\n", 333 | "\n", 334 | "print('Seed 1:')\n", 335 | "seed = 12345\n", 336 | "print(random.Random(seed).sample(range(1000),10))\n", 337 | "print('\\n')\n", 338 | "\n", 339 | "print('Seed 2:')\n", 340 | "seed = 54321\n", 341 | "print(random.Random(seed).sample(range(1000),10))" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "Using the same seed results in the **same** random numbers generated. This is because, as we'll discuss below, the *random* module generates numbers in a deterministic way! When used without a seed, the sequence differs each time, but the results still **aren't random!**\n", 349 | "\n", 350 | "## A few notes on random numbers.\n", 351 | "\n", 352 | "Almost all module functions depend on the basic function `random()`, which generates a random float uniformly in the semi-open range *[0.0, 1.0)*. Python uses the **Mersenne Twister** as the core generator. It produces 53-bit precision floats and has a period of $2^19937 - 1$. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being **completely deterministic**, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.\n", 353 | "\n", 354 | "For cryptographic use we need a near-true random number generation technique, like\n", 355 | "```python\n", 356 | " os.urandom(n)\n", 357 | "```\n", 358 | "Return a string of *n* random bytes suitable for cryptographic use. 
This can in turn be used to *seed* the generator.\n", 359 | "\n", 360 | "This function returns random bytes from an **OS-specific randomness source**. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a UNIX-like system this will query `/dev/urandom`, and on Windows it will use `CryptGenRandom()`. If a randomness source is not found, `NotImplementedError` will be raised.\n", 361 | "\n", 362 | "The OS gathers environmental noise collected from device drivers and stores the bits of noise in an *entropy pool*. This pool is used to generate the random numbers.\n", 363 | "\n", 364 | "Another way to go is to use a Cryptographically Secure Pseudorandom Number Generator (CSPRNG). The problem with Mersenne Twister is that anyone can predict the state of the generator by observing a relatively small number of its outputs. This is harder to do in CSPRNG algorithms. " 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "# Time\n", 372 | "\n", 373 | "This module provides various time-related functions.\n", 374 | "\n", 375 | "For instance:" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 6, 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "name": "stdout", 385 | "output_type": "stream", 386 | "text": [ 387 | "Thu Apr 26 16:32:00 2018\n" 388 | ] 389 | } 390 | ], 391 | "source": [ 392 | "import time\n", 393 | "\n", 394 | "print(time.ctime())" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": {}, 400 | "source": [ 401 | "An important use of this module is measuring how long a script has been running. This can be done with the `time.time()` function.\n", 402 | "```python\n", 403 | "time.time()\n", 404 | "```\n", 405 | "Returns the time in seconds since the epoch as a floating point number. 
While this value is rarely useful on its own, we can **compare** the time the script has started to the time the script has finished. By doing this we can determine how long the script has run.\n", 406 | "\n", 407 | "```python\n", 408 | "time.sleep(secs)\n", 409 | "```\n", 410 | "Suspend execution of the current thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time." 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 7, 416 | "metadata": {}, 417 | "outputs": [ 418 | { 419 | "name": "stdout", 420 | "output_type": "stream", 421 | "text": [ 422 | "Time elapsed: 10.00489854812622 seconds\n" 423 | ] 424 | } 425 | ], 426 | "source": [ 427 | "start_time = time.time() # time the program starts\n", 428 | "\n", 429 | "time.sleep(10) # suspends the execution for 10 seconds\n", 430 | "\n", 431 | "end_time = time.time() # time the program ends\n", 432 | "\n", 433 | "print('Time elapsed: {!s} seconds'.format(end_time - start_time))" 434 | ] 435 | } 436 | ], 437 | "metadata": { 438 | "kernelspec": { 439 | "display_name": "Python 3", 440 | "language": "python", 441 | "name": "python3" 442 | }, 443 | "language_info": { 444 | "codemirror_mode": { 445 | "name": "ipython", 446 | "version": 3 447 | }, 448 | "file_extension": ".py", 449 | "mimetype": "text/x-python", 450 | "name": "python", 451 | "nbconvert_exporter": "python", 452 | "pygments_lexer": "ipython3", 453 | "version": "3.6.3" 454 | } 455 | }, 456 | "nbformat": 4, 457 | "nbformat_minor": 1 458 | } 459 | -------------------------------------------------------------------------------- /notebooks/25_pipelines_gridsearch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Completing the ML workflow\n", 8 | "\n", 9 | "Over the past few tutorials we've seen many aspects of a supervised ML workflow. 
From loading data to preprocessing, selecting and training a model, optimizing hyperparameters and finally evaluating the model. It's time to put all these together into a complete workflow for supervised ML problems.\n", 10 | "\n", 11 | "\n", 12 | "\n", 13 | "The main steps are:\n", 14 | "\n", 15 | "1. **Load** the data into python\n", 16 | "2. **Split** the data into train/test sets\n", 17 | "3. **Preprocess** the data\n", 18 | "\n", 19 | " 1. Perform all **necessary** preprocessing steps. These include:\n", 20 | " - Handling **missing** data (i.e. discard or impute)\n", 21 | " - Feature **encoding** (i.e. convert alphanumeric features into numeric)\n", 22 | " - Feature **scaling** (i.e. transform features so that they occupy similar value ranges) \n", 23 | " \n", 24 | " 2. **Optionally** we might want to perform:\n", 25 | " - Feature **selection** (i.e. discard some of the features)\n", 26 | " - Feature **extraction** (i.e. transform the data into a usually smaller feature space)\n", 27 | " - **Resampling** (i.e. under/over-sampling)\n", 28 | "4. **Select** a ML algorithm\n", 29 | "5. Optimize the algorithm's **hyperparameters** through **cross-validation**.\n", 30 | "6. **Evaluate** its performance on the test set. If it is inadequate, or if we want to improve on the results: **start over from step 2 and refine the process**! \n", 31 | "7. 
Finally, if we've achieved an adequate performance on the test set: train the model one last time, with the optimal hyperparameters, **on the whole dataset**.\n", 32 | "\n", 33 | "Scikit-learn has two very helpful classes that make our life easier when refining hyperparameters: **pipeline** and **grid search**.\n", 34 | "\n", 35 | "## pipeline\n", 36 | "\n", 37 | "*scikit-learn* [pipelines](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) provide a convenient way for incorporating multiple steps in a ML workflow.\n", 38 | "\n", 39 | "The concept of the `pipeline` is to encapsulate more than one step into a single class. The first steps of the pipeline are **preprocessing** steps, through which the data is transformed. The last step of the pipeline is a model that can make predictions. Unfortunately **all preprocessing steps must be *scikit-learn* compatible objects**.\n", 40 | "\n", 41 | "All intermediate steps in a pipeline are transforms and must implement both a `.fit()` and a `.transform()` method (like the scaler we saw before). The last step should be an estimator (i.e. have `.fit()` and `.predict()` methods). We need to pass these steps, sequentially, as a *list* of *tuples*, each containing the name and object of the transform/estimator.\n", 42 | "\n", 43 | "```python\n", 44 | "from sklearn.pipeline import Pipeline\n", 45 | "\n", 46 | "pipe = Pipeline([('transform1', transform1), ('transform2', transform2), ..., ('estimator', estimator)])\n", 47 | "```\n", 48 | "\n", 49 | "Let's try to implement a pipeline containing a StandardScaler and a k-NN model. 
" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 1, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "0.956140350877193\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "from sklearn.model_selection import train_test_split\n", 67 | "from sklearn.neighbors import KNeighborsClassifier\n", 68 | "from sklearn.preprocessing import StandardScaler\n", 69 | "from sklearn.metrics import accuracy_score\n", 70 | "from sklearn.pipeline import Pipeline\n", 71 | "from sklearn import datasets\n", 72 | "\n", 73 | "# Load the breast cancer dataset\n", 74 | "cancer = datasets.load_breast_cancer()\n", 75 | "\n", 76 | "seed = 13 # random seed for reproducibility\n", 77 | "\n", 78 | "# Shuffle and split the data\n", 79 | "train, test, train_labels, test_labels = train_test_split(cancer['data'], cancer['target'], test_size=0.4, random_state=seed)\n", 80 | "\n", 81 | "# Define a scaler (default parameters)\n", 82 | "scaler = StandardScaler() \n", 83 | "\n", 84 | "# Define a kNN model (not default parameters)\n", 85 | "knn = KNeighborsClassifier(n_neighbors=11)\n", 86 | "\n", 87 | "# Create a pipeline with the scaler and the kNN\n", 88 | "pipe = Pipeline([('standardizer', scaler), ('classifier', knn)])\n", 89 | "\n", 90 | "# Train on the training set\n", 91 | "pipe.fit(train, train_labels)\n", 92 | "\n", 93 | "# Evaluate on the test set\n", 94 | "preds = pipe.predict(test)\n", 95 | "print(accuracy_score(test_labels, preds))" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "When we called `pipe.fit()`, the pipeline internally called `.fit_transform()` **for each of its transforms** and `.fit()` **for its estimator**. Assuming a pipeline with $M$ preprocessing steps, when we called `pipe.fit()` it ran the equivalent of fitting and transforming the data through each of the preprocessing steps and fitting the last step (i.e. 
the estimator).\n", 103 | "\n", 104 | "```python\n", 105 | "# Assuming that our pipeline is:\n", 106 | "pipe = Pipeline([('transform1', transform1), ('transform2', transform2), ..., ('estimator', estimator)])\n", 107 | "\n", 108 | "# If we ran:\n", 109 | "pipe.fit(train, train_labels)\n", 110 | "\n", 111 | "# It would be the equivalent of:\n", 112 | "tmp = transform1.fit_transform(train, train_labels)\n", 113 | "tmp = transform2.fit_transform(tmp, train_labels)\n", 114 | "# ...\n", 115 | "tmp = transformM.fit_transform(tmp, train_labels)\n", 116 | "estimator.fit(tmp, train_labels) # the estimator needs the labels too\n", 117 | "```\n", 118 | "\n", 119 | "Running `pipe.predict()`, on the other hand, simply called `.transform()` on each of the preprocessing steps and `.predict()` on the final step.\n", 120 | "\n", 121 | "```python\n", 122 | "# If we ran:\n", 123 | "preds = pipe.predict(test) # no labels are passed when predicting\n", 124 | "\n", 125 | "# It would be the equivalent of:\n", 126 | "tmp = transform1.transform(test)\n", 127 | "tmp = transform2.transform(tmp)\n", 128 | "# ...\n", 129 | "tmp = transformM.transform(tmp)\n", 130 | "preds = estimator.predict(tmp)\n", 131 | "```\n", 132 | "\n", 133 | "An easier way to create pipelines is through scikit-learn's `make_pipeline` function. This is a shorthand for the Pipeline constructor that does not require naming the steps. Instead, their names will be set to the lowercase of their types automatically.\n", 134 | "\n", 135 | "```python\n", 136 | "from sklearn.pipeline import make_pipeline\n", 137 | "\n", 138 | "pipe = make_pipeline(scaler, knn) \n", 139 | "```\n", 140 | "\n", 141 | "**Note**: If we want to put a sampler from imblearn into our pipeline we **must** use `imblearn.pipeline.Pipeline`, which extends sklearn's pipeline."
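The `fit`/`predict` call order described above can be mimicked in a few lines of plain Python. This is only a toy sketch, not scikit-learn's actual implementation: `ToyPipeline`, `MeanCenterer` and `SignClassifier` are hypothetical names made up for this illustration. It shows that `fit()` calls `fit_transform()` on every transform and `fit()` on the last step, while `predict()` only calls `transform()`, so new data is centered with the mean learned from the training data.

```python
class MeanCenterer:
    """Toy transform: learns the mean in fit() and subtracts it in transform()."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class SignClassifier:
    """Toy estimator: predicts 1 for positive inputs and 0 otherwise."""

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class ToyPipeline:
    """Chains (name, step) pairs: transforms first, estimator last."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, transform in self.steps[:-1]:
            X = transform.fit_transform(X, y)  # fit AND transform the training data
        self.steps[-1][1].fit(X, y)            # the last step is only fitted
        return self

    def predict(self, X):
        for _, transform in self.steps[:-1]:
            X = transform.transform(X)         # reuse what was learned in fit()
        return self.steps[-1][1].predict(X)


toy = ToyPipeline([('centerer', MeanCenterer()), ('classifier', SignClassifier())])
toy.fit([1.0, 2.0, 3.0, 6.0], [0, 0, 0, 1])  # the training mean is 3.0
preds = toy.predict([2.0, 5.0])               # centered with the training mean
print(preds)
```

Because the test values are shifted by the training mean (3.0) rather than by their own mean, no information from the test set leaks into the preprocessing, which is one of the practical reasons for bundling the steps into a single object.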
142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 2, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | "0.9473684210526315\n" 154 | ] 155 | } 156 | ], 157 | "source": [ 158 | "from sklearn.feature_selection import VarianceThreshold\n", 159 | "from sklearn.decomposition import PCA\n", 160 | "from imblearn.over_sampling import SMOTE\n", 161 | "from imblearn.pipeline import Pipeline # import imblearn's pipeline because one of the steps is SMOTE\n", 162 | "\n", 163 | "\n", 164 | "pipe = Pipeline([('selector', VarianceThreshold()),\n", 165 | " ('scaler', StandardScaler()),\n", 166 | " ('sampler', SMOTE()),\n", 167 | " ('pca', PCA()),\n", 168 | " ('knn', KNeighborsClassifier())])\n", 169 | "\n", 170 | "pipe.fit(train, train_labels)\n", 171 | "\n", 172 | "preds = pipe.predict(test)\n", 173 | "print(accuracy_score(test_labels, preds))" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "## Grid search\n", 181 | "\n", 182 | "Before, we attempted to optimize a model by selecting its hyperparameters through a for loop. There is a much easier way provided through scikit-learn's [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV). This function takes two main arguments: an estimator (or pipeline) and a *grid* of parameters we want the grid search to consider. 
The grid could be one of two things:\n", 183 | "\n", 184 | "- A dictionary with the hyperparameter names as its keys and a list of values as the corresponding dictionary value:\n", 185 | "```python\n", 186 | "grid = {'name1': [val1, val2, val3], 'name2': [val4, val5], ...}\n", 187 | "```\n", 188 | "This will force the grid search to search for **all** possible combinations of parameter values: \n", 189 | "```python\n", 190 | "(val1, val4, ...), (val1, val5, ...), (val2, val4, ...), (val2, val5, ...), ... etc.\n", 191 | "```\n", 192 | "\n", 193 | "- A list of such dictionaries:\n", 194 | "```python\n", 195 | "grid = [{'name1': [val1, val2, val3], 'name2': [val4, val5], ...},\n", 196 | " {'name1': [val1, val2, val3], 'name3': [val6, val7], ...}]\n", 197 | "```\n", 198 | "This will create a grid that contains combinations from both dictionaries.\n", 199 | "\n", 200 | "After creating such a grid:\n", 201 | "\n", 202 | "```python\n", 203 | "from sklearn.model_selection import GridSearchCV\n", 204 | "\n", 205 | "grid = {...}\n", 206 | "clf = GridSearchCV(estimator, grid)\n", 207 | "clf.fit(X_train, y_train) # will search all possible combinations defined by the grid\n", 208 | "preds = clf.predict(X_test) # will generate predictions based on the best configuration\n", 209 | "\n", 210 | "# In order to access the best model:\n", 211 | "clf.best_estimator_\n", 212 | "```" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 3, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 224 | "0.9780701754385965\n", 225 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 226 | " metric_params=None, n_jobs=None, n_neighbors=3, p=1,\n", 227 | " weights='uniform')\n" 228 | ] 229 | } 230 | ], 231 | "source": [ 232 | "from sklearn.model_selection import GridSearchCV\n", 233 | "\n", 234 | "# Scale the data to be comparable to previous.\n", 235 | "scaled_train = 
scaler.fit_transform(train)\n", 236 | "scaled_test = scaler.transform(test)\n", 237 | "\n", 238 | "# Define a search grid.\n", 239 | "grid = {'n_neighbors': list(range(1, 15, 2)), \n", 240 | " 'p': [1, 2, 3, 4]}\n", 241 | "\n", 242 | "# Create the GridSearch class. This will serve as our classifier from now on.\n", 243 | "clf = GridSearchCV(knn, grid, cv=5) # 5-fold cross validation\n", 244 | "\n", 245 | "# Train the model as many times as designated by the grid.\n", 246 | "clf.fit(scaled_train, train_labels)\n", 247 | "\n", 248 | "# Evaluate on the test set and print best hyperparameters\n", 249 | "preds = clf.predict(scaled_test)\n", 250 | "print(accuracy_score(test_labels, preds))\n", 251 | "print(clf.best_estimator_)" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "Grid searches can be performed on pipelines too! The only thing that changes is that now we need to specify which step each parameter belongs to. This is done by adding both the name of the step and the name of the parameter separated by two underscores (i.e. `__`). 
\n", 259 | "\n", 260 | "```python\n", 261 | "pipe = Pipeline([('step1', ...), ...])\n", 262 | "grid = {'step1__param1': [val1, ...], ...} # this dictates param1 from step1 to take the values [val1, ...]\n", 263 | "clf = GridSearchCV(pipe, grid)\n", 264 | "clf.fit(X_train, y_train) # will search all possible combinations defined by the grid\n", 265 | "preds = clf.predict(X_test) # will generate predictions based on the best configuration\n", 266 | "```" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 4, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "Best accuracy: 97.37%\n", 279 | "Pipeline(memory=None,\n", 280 | " steps=[('standardizer', StandardScaler(copy=True, with_mean=True, with_std=True)), ('classifier', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 281 | " metric_params=None, n_jobs=None, n_neighbors=5, p=1,\n", 282 | " weights='uniform'))])\n" 283 | ] 284 | } 285 | ], 286 | "source": [ 287 | "# Revert to the previous pipeline\n", 288 | "pipe = Pipeline([('standardizer', scaler), ('classifier', knn)])\n", 289 | "\n", 290 | "# Define a grid that checks for hyperparameters for both steps\n", 291 | "grid = {'standardizer__with_mean': [True, False], # Check parameters True/False for 'with_mean' argument of scaler\n", 292 | " 'standardizer__with_std': [True, False], # Check parameters True/False for 'with_std' argument of scaler\n", 293 | " 'classifier__n_neighbors': list(range(1, 15, 2)), # Check for values of 'n_neighbors' of knn\n", 294 | " 'classifier__p': [1, 2, 3, 4]} # Check for values of 'p' of knn\n", 295 | "\n", 296 | "# Create and train the grid search\n", 297 | "clf = GridSearchCV(pipe, grid, cv=5)\n", 298 | "clf.fit(train, train_labels)\n", 299 | "\n", 300 | "# Evaluate on the test set and print best hyperparameter values\n", 301 | "print('Best accuracy: {:.2f}%'.format(accuracy_score(test_labels, 
clf.predict(test))*100))\n", 302 | "print(clf.best_estimator_) # print the best configuration" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "Let's try to optimize the more complex pipeline. " 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 5, 315 | "metadata": {}, 316 | "outputs": [ 317 | { 318 | "name": "stdout", 319 | "output_type": "stream", 320 | "text": [ 321 | "Best accuracy: 96.49%\n", 322 | "Pipeline(memory=None,\n", 323 | " steps=[('selector', VarianceThreshold(threshold=0.0)), ('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('sampler', SMOTE(k_neighbors=5, kind='deprecated', m_neighbors='deprecated', n_jobs=1,\n", 324 | " out_step='deprecated', random_state=None, ratio=None,\n", 325 | " sampling_strategy='auto', s...ki',\n", 326 | " metric_params=None, n_jobs=None, n_neighbors=5, p=1,\n", 327 | " weights='uniform'))])\n" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "pipe = Pipeline(steps=[('selector', VarianceThreshold()),\n", 333 | " ('scaler', StandardScaler()),\n", 334 | " ('sampler', SMOTE()),\n", 335 | " ('pca', PCA()),\n", 336 | " ('knn', KNeighborsClassifier())])\n", 337 | "\n", 338 | "grid = {'selector__threshold': [0.0, 0.005],\n", 339 | " 'pca__n_components': list(range(5, 16, 5)),\n", 340 | " 'knn__n_neighbors': list(range(1, 15, 2)),\n", 341 | " 'knn__p': [1, 2, 3, 4]}\n", 342 | "\n", 343 | "clf = GridSearchCV(pipe, grid, cv=5)\n", 344 | "clf.fit(train, train_labels)\n", 345 | "\n", 346 | "print('Best accuracy: {:.2f}%'.format(accuracy_score(test_labels, clf.predict(test)) * 100))\n", 347 | "print(clf.best_estimator_) " 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "With the inclusion of the feature selection/extraction steps, we actually managed to **hurt** our performance here.\n", 355 | "\n", 356 | "### Tips for using grid search:\n", 357 | "\n", 358 | "1. 
Always **calculate** the number of times a model is fit. In the example above we check for $2 \\cdot 3 \\cdot 7 \\cdot 4 = 168$ different hyperparameter combinations (2 thresholds, 3 values of `n_components`, 7 values of `n_neighbors` and 4 values of `p`). Because we are using 5-fold cross validation, each combination is used for 5 separate model fits. So the above grid search accounts for 840 different fits! With a grid search this number can easily climb into the thousands, which would take a **long time to complete**. If we were also performing model-based feature selection or imputation, we would need to take those fits into account too!\n", 359 | "\n", 360 | "2. Grid search has a parameter called `verbose` which offers several **levels of verbosity**. I'd recommend setting `verbose=1` so that *scikit-learn* reports the number of times a model needs to be trained and how much time it took. You can, however, set a larger value which will report the progress of each fit in detail. Caution: this will flood your screen!\n", 361 | "\n", 362 | "3. Instead of checking every parameter combination, which may be computationally infeasible, we could use a more **progressive** grid search! Imagine we want to optimize a hyperparameter `x` that ranges from $1$ to $1000$:\n", 363 | " - First perform a grid search on `[1, 5, 10, 50, 100, 500, 1000]` (or even more sparse if it takes too long). We get the best performance for $x = 500$.\n", 364 | " - Now perform a grid search on `[200, 350, 500, 650, 800]`. The best performance is produced with $x=800$.\n", 365 | " - Choose an even finer grid, e.g. `[725, 730, 735, 740, 745, 750]`.\n", 366 | " - Repeat until you achieve the desired precision.\n", 367 | "\n", 368 | "4. `GridSearchCV` has a parameter called `n_jobs`. This can determine the number of jobs to run in parallel. 
This can reduce computation time, but might cripple your PC while the search is running.\n", 369 | "\n", 370 | "\n", 371 | "### Drawbacks:\n", 372 | "\n", 373 | "One major drawback of using pipelines is that they support only scikit-learn compatible objects. Many preprocessing steps, however, need to be implemented in a library like *pandas*. To refine these steps we'll need to do so manually! Either that or you can write your own class in an sklearn-like manner and incorporate it into a pipeline. " 374 | ] 375 | } 376 | ], 377 | "metadata": { 378 | "kernelspec": { 379 | "display_name": "Python 3", 380 | "language": "python", 381 | "name": "python3" 382 | }, 383 | "language_info": { 384 | "codemirror_mode": { 385 | "name": "ipython", 386 | "version": 3 387 | }, 388 | "file_extension": ".py", 389 | "mimetype": "text/x-python", 390 | "name": "python", 391 | "nbconvert_exporter": "python", 392 | "pygments_lexer": "ipython3", 393 | "version": "3.8.6" 394 | } 395 | }, 396 | "nbformat": 4, 397 | "nbformat_minor": 2 398 | } 399 | -------------------------------------------------------------------------------- /notebooks/custom_module.py: -------------------------------------------------------------------------------- 1 | def verbose_print(data): 2 | typ = data.__class__.__name__ 3 | ln = len(data) 4 | for i in range(ln): 5 | j = i + 1 6 | if j==1: print('The 1st element in the {} is: {}'.format(typ, data[i])) 7 | elif j==2: print('The 2nd element in the {} is: {}'.format(typ, data[i])) 8 | elif j==3: print('The 3rd element in the {} is: {}'.format(typ, data[i])) 9 | else: print('The {}th element in the {} is: {}'.format(j, typ, data[i])) 10 | 11 | if __name__ == '__main__': 12 | print('module used as script') 13 | -------------------------------------------------------------------------------- /notebooks/scr_args.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import sys 3 | 4 | print(sys.argv) 5 | 
--------------------------------------------------------------------------------