├── .gitignore
├── LICENSE
├── Lesson 1
│   ├── Lesson 1 List Exercises.ipynb
│   ├── Lesson 1 Student Activity 01 Solutions.ipynb
│   ├── Lesson 1 Student Activity 02 - Solutions.ipynb
│   ├── Readme.md
│   ├── Set Exercises.ipynb
│   ├── String Exercise.ipynb
│   └── Tuple Exercise.ipynb
├── Lesson 2
│   ├── List Exercises.ipynb
│   ├── OS Exercises.ipynb
│   ├── Readme.md
│   ├── Solutions Student Activity.ipynb
│   └── Student Activity 02 Design your own CSV parser - Solutions.ipynb
├── Lesson 3
│   ├── Boston_housing.csv
│   ├── Lesson 3 Activity Solution.ipynb
│   ├── Lesson 3 Activity.ipynb
│   ├── Lesson 3 Exercise 1-13 Numpy arrays.ipynb
│   ├── Lesson 3 Exercise 14-21 Pandas DataFrame.ipynb
│   ├── Lesson 3 Exercise 22-27 Matplotlib and Descriptive Statistics.ipynb
│   └── Lesson Three Numpy, Pandas, Matplotlib.ipynb
├── Lesson 4
│   ├── Boston_housing.csv
│   ├── Lesson 4 All Exercises.ipynb
│   ├── Lesson 4 Topic 1 Exercises 1-7.ipynb
│   ├── Lesson 4 Topic 2 Exercises 8-11.ipynb
│   ├── Lesson 4 Topic 3 Exercises 12-14.ipynb
│   ├── Lesson 4 Topic 4 Exercises 15-19.ipynb
│   └── Sample - Superstore.xls
├── Lesson 5
│   ├── Boston_housing.csv
│   ├── C11065_Packt Author Contract_Tirthajyoti Sarkar.pdf
│   ├── CSV_EX_1.csv
│   ├── CSV_EX_1.zip
│   ├── CSV_EX_2.csv
│   ├── CSV_EX_3.csv
│   ├── CSV_EX_blankline.csv
│   ├── CSV_EX_skipfooter.csv
│   ├── CSV_EX_skiprows.csv
│   ├── Data Wrangling with Python.pdf
│   ├── Housing_data.pdf
│   ├── Housing_data.xlsx
│   ├── JSON_EX_1.json
│   ├── JSON_EX_Movies.json
│   ├── Lesson 05 Topic 02 Exercises.ipynb
│   ├── Lesson 5 Activity 2 SOLUTION.ipynb
│   ├── Lesson 5 Activity 2.ipynb
│   ├── Lesson 5 Topic 1 Exercise.ipynb
│   ├── Table_EX_1.txt
│   ├── Table_tab_separated.txt
│   ├── WDI-2016.pdf
│   ├── movies.json
│   ├── rscfp2016.dta
│   ├── scfp2016s.zip
│   └── tsarkar31-analysis.pdf
├── Lesson 6
│   ├── Lesson 6 Activitiy 01 - Solutions.ipynb
│   ├── Lesson 6 Topic 1 Exercises.ipynb
│   ├── Lesson 6 Topic 2 Exercises.ipynb
│   ├── Lesson 6 Topic 3 Exercieses.ipynb
│   ├── Readme.md
│   ├── combinded_data.csv
│   ├── dummy_data.csv
│   ├── dummy_header.csv
│   └── visit_data.csv
├── Lesson 7 Topic 3 Exercises.ipynb
├── Lesson 8
│   ├── Exercise 161 - 173.ipynb
│   ├── Readme.md
│   ├── Student Activity 01 Solutions.ipynb
│   └── petsdb
├── Lesson 9
│   ├── India_World_Bank_Info.csv
│   ├── Lesson 9 Topic 1.ipynb
│   └── Readme.md
├── Lesson-7
│   ├── Lesson 7 Activity 1 - List Top 100 ebooks from Gutenberg.org - SOLUTION.ipynb
│   ├── Lesson 7 Activity 1 - List Top 100 ebooks from Gutenberg.org.ipynb
│   ├── Lesson 7 Activity 2 - Build your own movie database - SOLUTION.ipynb
│   ├── Lesson 7 Activity 2 - Build your own movie database.ipynb
│   ├── Lesson 7 Topic 1 Exercises.ipynb
│   ├── Lesson 7 Topic 2 Exercises.ipynb
│   ├── Lesson 7 Topic 3 Exercises.ipynb
│   ├── Lesson 7 Topic 4 Exercises.ipynb
│   └── Readme.md
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 | 
50 | # Translations
51 | *.mo
52 | *.pot
53 | 
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 | 
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 | 
63 | # Scrapy stuff:
64 | .scrapy
65 | 
66 | # Sphinx documentation
67 | docs/_build/
68 | 
69 | # PyBuilder
70 | target/
71 | 
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 | 
75 | # pyenv
76 | .python-version
77 | 
78 | # celery beat schedule file
79 | celerybeat-schedule
80 | 
81 | # SageMath parsed files
82 | *.sage.py
83 | 
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 | 
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 | 
97 | # Rope project settings
98 | .ropeproject
99 | 
100 | # mkdocs documentation
101 | /site
102 | 
103 | # mypy
104 | .mypy_cache/
105 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 Tirthajyoti Sarkar
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Lesson 1/Lesson 1 Student Activity 01 Solutions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Task 1" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import random\n", 17 | "\n", 18 | "LIMIT = 100\n", 19 | "random_number_list = [random.randint(0, LIMIT) for x in range(0, LIMIT)]" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "data": { 29 | "text/plain": [ 30 | "[68,\n", 31 | " 17,\n", 32 | " 96,\n", 33 | " 78,\n", 34 | " 57,\n", 35 | " 63,\n", 36 | " 23,\n", 37 | " 38,\n", 38 | " 8,\n", 39 | " 57,\n", 40 | " 77,\n", 41 | " 40,\n", 42 | " 52,\n", 43 | " 52,\n", 44 | " 33,\n", 45 | " 99,\n", 46 | " 13,\n", 47 | " 10,\n", 48 | " 54,\n", 49 | " 11,\n", 50 | " 60,\n", 51 | " 92,\n", 52 | " 89,\n", 53 | " 24,\n", 54 | " 2,\n", 55 | " 4,\n", 56 | " 75,\n", 57 | " 45,\n", 58 | " 6,\n", 59 | " 41,\n", 60 | " 69,\n", 61 | " 56,\n", 62 | " 86,\n", 63 | " 19,\n", 64 | " 47,\n", 65 | " 4,\n", 66 | " 76,\n", 67 | " 34,\n", 68 | " 22,\n", 69 | " 96,\n", 70 | " 30,\n", 71 | " 47,\n", 72 | " 17,\n", 73 | " 29,\n", 74 | " 51,\n", 75 | " 52,\n", 76 | " 49,\n", 77 | " 79,\n", 78 | " 51,\n", 79 | " 
38,\n", 80 | " 87,\n", 81 | " 69,\n", 82 | " 12,\n", 83 | " 1,\n", 84 | " 95,\n", 85 | " 19,\n", 86 | " 75,\n", 87 | " 67,\n", 88 | " 97,\n", 89 | " 77,\n", 90 | " 93,\n", 91 | " 44,\n", 92 | " 3,\n", 93 | " 97,\n", 94 | " 98,\n", 95 | " 8,\n", 96 | " 93,\n", 97 | " 2,\n", 98 | " 17,\n", 99 | " 86,\n", 100 | " 19,\n", 101 | " 62,\n", 102 | " 79,\n", 103 | " 29,\n", 104 | " 6,\n", 105 | " 84,\n", 106 | " 2,\n", 107 | " 28,\n", 108 | " 32,\n", 109 | " 37,\n", 110 | " 45,\n", 111 | " 18,\n", 112 | " 15,\n", 113 | " 42,\n", 114 | " 85,\n", 115 | " 43,\n", 116 | " 59,\n", 117 | " 30,\n", 118 | " 28,\n", 119 | " 30,\n", 120 | " 63,\n", 121 | " 93,\n", 122 | " 34,\n", 123 | " 62,\n", 124 | " 43,\n", 125 | " 90,\n", 126 | " 68,\n", 127 | " 14,\n", 128 | " 38,\n", 129 | " 87]" 130 | ] 131 | }, 132 | "execution_count": 2, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "random_number_list" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "### Task 2" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 3, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "list_with_divisible_by_3 = [a for a in random_number_list if a % 3 == 0]" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 4, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "[96,\n", 166 | " 78,\n", 167 | " 57,\n", 168 | " 63,\n", 169 | " 57,\n", 170 | " 33,\n", 171 | " 99,\n", 172 | " 54,\n", 173 | " 60,\n", 174 | " 24,\n", 175 | " 75,\n", 176 | " 45,\n", 177 | " 6,\n", 178 | " 69,\n", 179 | " 96,\n", 180 | " 30,\n", 181 | " 51,\n", 182 | " 51,\n", 183 | " 87,\n", 184 | " 69,\n", 185 | " 12,\n", 186 | " 75,\n", 187 | " 93,\n", 188 | " 3,\n", 189 | " 93,\n", 190 | " 6,\n", 191 | " 84,\n", 192 | " 45,\n", 193 | " 18,\n", 194 | " 15,\n", 195 | " 42,\n", 196 | " 30,\n", 197 | " 30,\n", 198 | " 63,\n", 199 | " 
93,\n", 200 | "              90,\n", 201 | "              87]" 202 | ] 203 | }, 204 | "execution_count": 4, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "list_with_divisible_by_3" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "### Task 3" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 5, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "data": { 227 | "text/plain": [ 228 | "63" 229 | ] 230 | }, 231 | "execution_count": 5, 232 | "metadata": {}, 233 | "output_type": "execute_result" 234 | } 235 | ], 236 | "source": [ 237 | "length_of_random_list = len(random_number_list)\n", 238 | "length_of_3_divisible_list = len(list_with_divisible_by_3)\n", 239 | "difference = length_of_random_list - length_of_3_divisible_list\n", 240 | "difference" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "### Task 4" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 6, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/plain": [ 258 | "[64, 62, 66, 72, 66, 67, 74, 62, 59, 57]" 259 | ] 260 | }, 261 | "execution_count": 6, 262 | "metadata": {}, 263 | "output_type": "execute_result" 264 | } 265 | ], 266 | "source": [ 267 | "NUMBER_OF_EXPERIMENTS = 10\n", 268 | "difference_list = []\n", 269 | "for i in range(0, NUMBER_OF_EXPERIMENTS):\n", 270 | "    random_number_list = [random.randint(0, LIMIT) for x in range(0, LIMIT)]\n", 271 | "    list_with_divisible_by_3 = [a for a in random_number_list if a % 3 == 0]\n", 272 | "    \n", 273 | "    length_of_random_list = len(random_number_list)\n", 274 | "    length_of_3_divisible_list = len(list_with_divisible_by_3)\n", 275 | "    difference = length_of_random_list - length_of_3_divisible_list\n", 276 | "    \n", 277 | "    difference_list.append(difference)\n", 278 | "difference_list" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 |
"execution_count": 7, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/plain": [ 289 | "64.9" 290 | ] 291 | }, 292 | "execution_count": 7, 293 | "metadata": {}, 294 | "output_type": "execute_result" 295 | } 296 | ], 297 | "source": [ 298 | "avg_diff = sum(difference_list) / float(len(difference_list))\n", 299 | "avg_diff" 300 | ] 301 | } 302 | ], 303 | "metadata": { 304 | "kernelspec": { 305 | "display_name": "Python 3", 306 | "language": "python", 307 | "name": "python3" 308 | }, 309 | "language_info": { 310 | "codemirror_mode": { 311 | "name": "ipython", 312 | "version": 3 313 | }, 314 | "file_extension": ".py", 315 | "mimetype": "text/x-python", 316 | "name": "python", 317 | "nbconvert_exporter": "python", 318 | "pygments_lexer": "ipython3", 319 | "version": "3.6.4" 320 | } 321 | }, 322 | "nbformat": 4, 323 | "nbformat_minor": 2 324 | } 325 | -------------------------------------------------------------------------------- /Lesson 1/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 1 materials 2 | -------------------------------------------------------------------------------- /Lesson 1/Set Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Exercise 6" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import random\n", 17 | "list_1 = [random.randint(0, 30) for x in range (0, 100)]" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": {}, 24 | "outputs": [ 25 | { 26 | "data": { 27 | "text/plain": [ 28 | "[29,\n", 29 | " 22,\n", 30 | " 9,\n", 31 | " 5,\n", 32 | " 24,\n", 33 | " 23,\n", 34 | " 18,\n", 35 | " 12,\n", 36 | " 17,\n", 37 | " 30,\n", 38 | " 23,\n", 39 | " 0,\n", 40 | " 13,\n", 41 | " 14,\n", 42 | " 
0,\n", 43 | " 14,\n", 44 | " 13,\n", 45 | " 18,\n", 46 | " 18,\n", 47 | " 1,\n", 48 | " 26,\n", 49 | " 17,\n", 50 | " 9,\n", 51 | " 10,\n", 52 | " 3,\n", 53 | " 30,\n", 54 | " 6,\n", 55 | " 22,\n", 56 | " 1,\n", 57 | " 0,\n", 58 | " 2,\n", 59 | " 28,\n", 60 | " 12,\n", 61 | " 2,\n", 62 | " 1,\n", 63 | " 20,\n", 64 | " 0,\n", 65 | " 6,\n", 66 | " 8,\n", 67 | " 24,\n", 68 | " 20,\n", 69 | " 10,\n", 70 | " 16,\n", 71 | " 9,\n", 72 | " 17,\n", 73 | " 23,\n", 74 | " 12,\n", 75 | " 30,\n", 76 | " 12,\n", 77 | " 27,\n", 78 | " 24,\n", 79 | " 22,\n", 80 | " 18,\n", 81 | " 14,\n", 82 | " 12,\n", 83 | " 9,\n", 84 | " 11,\n", 85 | " 10,\n", 86 | " 9,\n", 87 | " 13,\n", 88 | " 28,\n", 89 | " 22,\n", 90 | " 15,\n", 91 | " 27,\n", 92 | " 12,\n", 93 | " 15,\n", 94 | " 4,\n", 95 | " 0,\n", 96 | " 16,\n", 97 | " 9,\n", 98 | " 4,\n", 99 | " 30,\n", 100 | " 26,\n", 101 | " 10,\n", 102 | " 1,\n", 103 | " 5,\n", 104 | " 28,\n", 105 | " 20,\n", 106 | " 7,\n", 107 | " 12,\n", 108 | " 17,\n", 109 | " 29,\n", 110 | " 20,\n", 111 | " 24,\n", 112 | " 12,\n", 113 | " 5,\n", 114 | " 3,\n", 115 | " 2,\n", 116 | " 26,\n", 117 | " 19,\n", 118 | " 7,\n", 119 | " 23,\n", 120 | " 5,\n", 121 | " 6,\n", 122 | " 13,\n", 123 | " 26,\n", 124 | " 26,\n", 125 | " 4,\n", 126 | " 22,\n", 127 | " 13]" 128 | ] 129 | }, 130 | "execution_count": 2, 131 | "metadata": {}, 132 | "output_type": "execute_result" 133 | } 134 | ], 135 | "source": [ 136 | "list_1" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 3, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "[0,\n", 148 | " 1,\n", 149 | " 2,\n", 150 | " 3,\n", 151 | " 4,\n", 152 | " 5,\n", 153 | " 6,\n", 154 | " 7,\n", 155 | " 8,\n", 156 | " 9,\n", 157 | " 10,\n", 158 | " 11,\n", 159 | " 12,\n", 160 | " 13,\n", 161 | " 14,\n", 162 | " 15,\n", 163 | " 16,\n", 164 | " 17,\n", 165 | " 18,\n", 166 | " 19,\n", 167 | " 20,\n", 168 | " 22,\n", 169 | " 23,\n", 170 | " 24,\n", 171 | " 26,\n", 172 | " 
27,\n", 173 | " 28,\n", 174 | " 29,\n", 175 | " 30]" 176 | ] 177 | }, 178 | "execution_count": 3, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "unique_number_list = list(set(list_1))\n", 185 | "unique_number_list" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "### Exercise 7" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 4, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [ 201 | "set1 = {\"Apple\", \"Orange\", \"Banana\"}\n", 202 | "set2 = {\"Pear\", \"Peach\", \"Mango\", \"Banana\"}" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 5, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "{'Apple', 'Banana', 'Orange'}" 214 | ] 215 | }, 216 | "execution_count": 5, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "set1" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 6, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "{'Banana', 'Mango', 'Peach', 'Pear'}" 234 | ] 235 | }, 236 | "execution_count": 6, 237 | "metadata": {}, 238 | "output_type": "execute_result" 239 | } 240 | ], 241 | "source": [ 242 | "set2" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 7, 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/plain": [ 253 | "{'Apple', 'Banana', 'Mango', 'Orange', 'Peach', 'Pear'}" 254 | ] 255 | }, 256 | "execution_count": 7, 257 | "metadata": {}, 258 | "output_type": "execute_result" 259 | } 260 | ], 261 | "source": [ 262 | "set1 | set2" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 8, 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "{'Banana'}" 274 | ] 275 | }, 276 | "execution_count": 8, 277 | 
"metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "set1 & set2" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "### Exercise 8" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 9, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [ 298 | "my_null_set = set({})" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 10, 304 | "metadata": {}, 305 | "outputs": [ 306 | { 307 | "data": { 308 | "text/plain": [ 309 | "set" 310 | ] 311 | }, 312 | "execution_count": 10, 313 | "metadata": {}, 314 | "output_type": "execute_result" 315 | } 316 | ], 317 | "source": [ 318 | "type(my_null_set)" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 11, 324 | "metadata": {}, 325 | "outputs": [ 326 | { 327 | "data": { 328 | "text/plain": [ 329 | "dict" 330 | ] 331 | }, 332 | "execution_count": 11, 333 | "metadata": {}, 334 | "output_type": "execute_result" 335 | } 336 | ], 337 | "source": [ 338 | "my_null_set = {}\n", 339 | "type(my_null_set)" 340 | ] 341 | } 342 | ], 343 | "metadata": { 344 | "kernelspec": { 345 | "display_name": "Python 3", 346 | "language": "python", 347 | "name": "python3" 348 | }, 349 | "language_info": { 350 | "codemirror_mode": { 351 | "name": "ipython", 352 | "version": 3 353 | }, 354 | "file_extension": ".py", 355 | "mimetype": "text/x-python", 356 | "name": "python", 357 | "nbconvert_exporter": "python", 358 | "pygments_lexer": "ipython3", 359 | "version": "3.6.4" 360 | } 361 | }, 362 | "nbformat": 4, 363 | "nbformat_minor": 2 364 | } 365 | -------------------------------------------------------------------------------- /Lesson 1/String Exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### String exmaple" 8 | ] 9 | }, 10 | { 11 | 
"cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "my_string = 'Hello World!'" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/plain": [ 27 | "'Hello World!'" 28 | ] 29 | }, 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "output_type": "execute_result" 33 | } 34 | ], 35 | "source": [ 36 | "my_string" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 3, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "'Hello World 2!'" 48 | ] 49 | }, 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "output_type": "execute_result" 53 | } 54 | ], 55 | "source": [ 56 | "my_string = \"Hello World 2!\"\n", 57 | "my_string" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/plain": [ 68 | "'It started here\\n\\nBut continued here.\\n'" 69 | ] 70 | }, 71 | "execution_count": 4, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "my_multiline_string = \"\"\"It started here\n", 78 | "\n", 79 | "But continued here.\n", 80 | "\"\"\"\n", 81 | "my_multiline_string" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "### Exercise 14" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 5, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "my_str = \"Hello World!\"" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 6, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "'H'" 109 | ] 110 | }, 111 | "execution_count": 6, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | "source": [ 117 | "my_str[0]" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | 
"execution_count": 7, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "data": { 127 | "text/plain": [ 128 | "'o'" 129 | ] 130 | }, 131 | "execution_count": 7, 132 | "metadata": {}, 133 | "output_type": "execute_result" 134 | } 135 | ], 136 | "source": [ 137 | "my_str[4]" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 8, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "data": { 147 | "text/plain": [ 148 | "'!'" 149 | ] 150 | }, 151 | "execution_count": 8, 152 | "metadata": {}, 153 | "output_type": "execute_result" 154 | } 155 | ], 156 | "source": [ 157 | "my_str[len(my_str) - 1]" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 9, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "'!'" 169 | ] 170 | }, 171 | "execution_count": 9, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "my_str[-1]" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "### Exercise 15" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 10, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "my_str = \"Hello World! I am learning data wrangling\"" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 11, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "'llo World'" 205 | ] 206 | }, 207 | "execution_count": 11, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "my_str[2:11]" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 12, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "'d! 
I am learning data wrangling'" 225 | ] 226 | }, 227 | "execution_count": 12, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "my_str[-31:]" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 13, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "' wranglin'" 245 | ] 246 | }, 247 | "execution_count": 13, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "my_str[-10:-1]" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "### Exercise 16" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 14, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "my_str = \"Name, Age, Sex, Address\"\n", 270 | "list_1 = my_str.split(\",\")" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 15, 276 | "metadata": {}, 277 | "outputs": [ 278 | { 279 | "data": { 280 | "text/plain": [ 281 | "['Name', ' Age', ' Sex', ' Address']" 282 | ] 283 | }, 284 | "execution_count": 15, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "list_1" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 16, 296 | "metadata": {}, 297 | "outputs": [ 298 | { 299 | "data": { 300 | "text/plain": [ 301 | "'Name | Age | Sex | Address'" 302 | ] 303 | }, 304 | "execution_count": 16, 305 | "metadata": {}, 306 | "output_type": "execute_result" 307 | } 308 | ], 309 | "source": [ 310 | "\" | \".join(list_1)" 311 | ] 312 | } 313 | ], 314 | "metadata": { 315 | "kernelspec": { 316 | "display_name": "Python 3", 317 | "language": "python", 318 | "name": "python3" 319 | }, 320 | "language_info": { 321 | "codemirror_mode": { 322 | "name": "ipython", 323 | "version": 3 324 | }, 325 | "file_extension": ".py", 326 | "mimetype": "text/x-python", 327 | "name": "python", 328 | 
"nbconvert_exporter": "python", 329 | "pygments_lexer": "ipython3", 330 | "version": "3.6.4" 331 | } 332 | }, 333 | "nbformat": 4, 334 | "nbformat_minor": 2 335 | } 336 | -------------------------------------------------------------------------------- /Lesson 1/Tuple Exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Tuple example" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 4, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "(24, 42, 2.3456, 'Hello')" 19 | ] 20 | }, 21 | "execution_count": 4, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "my_tuple = 24, 42, 2.3456, \"Hello\"\n", 28 | "my_tuple" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "ename": "TypeError", 38 | "evalue": "'tuple' object does not support item assignment", 39 | "output_type": "error", 40 | "traceback": [ 41 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 42 | "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", 43 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mmy_tuple\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m\"New\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 44 | "\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" 45 | ] 46 | } 47 | ], 48 | "source": [ 49 | "my_tuple[1] = \"New\"" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 5, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "('One',)" 61 | ] 62 | }, 63 | "execution_count": 5, 64 | "metadata": {}, 65 | 
"output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "my_tuple2 = \"One\", \n", 70 | "my_tuple2" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 6, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/plain": [ 81 | "()" 82 | ] 83 | }, 84 | "execution_count": 6, 85 | "metadata": {}, 86 | "output_type": "execute_result" 87 | } 88 | ], 89 | "source": [ 90 | "my_tuple3 = ()\n", 91 | "my_tuple3" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 7, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "my_tuple = \"hello\", \"there\"\n", 101 | "my_tuple2 = my_tuple, 45, \"guido\"" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 8, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "text/plain": [ 112 | "('hello', 'there')" 113 | ] 114 | }, 115 | "execution_count": 8, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "my_tuple" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 9, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/plain": [ 132 | "(('hello', 'there'), 45, 'guido')" 133 | ] 134 | }, 135 | "execution_count": 9, 136 | "metadata": {}, 137 | "output_type": "execute_result" 138 | } 139 | ], 140 | "source": [ 141 | "my_tuple2" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 10, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | "Hello\n", 154 | "World\n" 155 | ] 156 | } 157 | ], 158 | "source": [ 159 | "my_tuple = \"Hello\", \"World\"\n", 160 | "hello, world = my_tuple\n", 161 | "print(hello)\n", 162 | "print(world)\n" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 11, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 
| "Welcome\n", 175 | "World\n" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "hello = \"Welcome\"\n", 181 | "print(hello)\n", 182 | "print(world)" 183 | ] 184 | } 185 | ], 186 | "metadata": { 187 | "kernelspec": { 188 | "display_name": "Python 3", 189 | "language": "python", 190 | "name": "python3" 191 | }, 192 | "language_info": { 193 | "codemirror_mode": { 194 | "name": "ipython", 195 | "version": 3 196 | }, 197 | "file_extension": ".py", 198 | "mimetype": "text/x-python", 199 | "name": "python", 200 | "nbconvert_exporter": "python", 201 | "pygments_lexer": "ipython3", 202 | "version": "3.6.4" 203 | } 204 | }, 205 | "nbformat": 4, 206 | "nbformat_minor": 2 207 | } 208 | -------------------------------------------------------------------------------- /Lesson 2/List Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Exercise 1" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "big_list_of_numbers = [1 for x in range(0, 10000000)]" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/plain": [ 27 | "81528056" 28 | ] 29 | }, 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "output_type": "execute_result" 33 | } 34 | ], 35 | "source": [ 36 | "from sys import getsizeof\n", 37 | "getsizeof(big_list_of_numbers)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 3, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "data": { 47 | "text/plain": [ 48 | "56" 49 | ] 50 | }, 51 | "execution_count": 3, 52 | "metadata": {}, 53 | "output_type": "execute_result" 54 | } 55 | ], 56 | "source": [ 57 | "from itertools import repeat\n", 58 | "small_list_of_numbers = repeat(1, times=10000000)\n", 59 | "getsizeof(small_list_of_numbers)" 
60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "1\n", 72 | "1\n", 73 | "1\n", 74 | "1\n", 75 | "1\n", 76 | "1\n", 77 | "1\n", 78 | "1\n", 79 | "1\n", 80 | "1\n", 81 | "1\n", 82 | "1\n" 83 | ] 84 | } 85 | ], 86 | "source": [ 87 | "for i, x in enumerate(small_list_of_numbers):\n", 88 | " print(x)\n", 89 | " if i > 10:\n", 90 | " break" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "### Exercise 2 (Stack)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "stack = []" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 6, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "stack.append(25)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 7, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/plain": [ 126 | "[25]" 127 | ] 128 | }, 129 | "execution_count": 7, 130 | "metadata": {}, 131 | "output_type": "execute_result" 132 | } 133 | ], 134 | "source": [ 135 | "stack" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 8, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "stack.append(-12)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 9, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "[25, -12]" 156 | ] 157 | }, 158 | "execution_count": 9, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "stack" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 10, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "tos = stack.pop()" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | 
"execution_count": 11, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "-12" 185 | ] 186 | }, 187 | "execution_count": 11, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "tos" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 12, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "[25]" 205 | ] 206 | }, 207 | "execution_count": 12, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "stack" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 13, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "stack.append(\"Hello\")" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 14, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "[25, 'Hello']" 234 | ] 235 | }, 236 | "execution_count": 14, 237 | "metadata": {}, 238 | "output_type": "execute_result" 239 | } 240 | ], 241 | "source": [ 242 | "stack" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "### Exercise 3 (Stack, custom functions)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 15, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "def stack_push(s, value):\n", 259 | " return s + [value]\n", 260 | "\n", 261 | "def stack_pop(s):\n", 262 | " tos = s[-1]\n", 263 | " del s[-1]\n", 264 | " return tos\n", 265 | "\n", 266 | "url_stack = []" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 16, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "wikipedia_datascience = \"Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge [https://en.wikipedia.org/wiki/Knowledge] and insights from data 
[https://en.wikipedia.org/wiki/Data] in various forms, both structured and unstructured,similar to data mining [https://en.wikipedia.org/wiki/Data_mining]\"" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 17, 281 | "metadata": {}, 282 | "outputs": [ 283 | { 284 | "data": { 285 | "text/plain": [ 286 | "347" 287 | ] 288 | }, 289 | "execution_count": 17, 290 | "metadata": {}, 291 | "output_type": "execute_result" 292 | } 293 | ], 294 | "source": [ 295 | "len(wikipedia_datascience)" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 18, 301 | "metadata": {}, 302 | "outputs": [ 303 | { 304 | "data": { 305 | "text/plain": [ 306 | "34" 307 | ] 308 | }, 309 | "execution_count": 18, 310 | "metadata": {}, 311 | "output_type": "execute_result" 312 | } 313 | ], 314 | "source": [ 315 | "wd_list = wikipedia_datascience.split()\n", 316 | "len(wd_list)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 19, 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "for word in wd_list:\n", 326 | " if word.startswith(\"[https://\"):\n", 327 | " url_stack = stack_push(url_stack, word[1:-1])" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": 20, 333 | "metadata": {}, 334 | "outputs": [ 335 | { 336 | "data": { 337 | "text/plain": [ 338 | "['https://en.wikipedia.org/wiki/Knowledge',\n", 339 | " 'https://en.wikipedia.org/wiki/Data',\n", 340 | " 'https://en.wikipedia.org/wiki/Data_mining']" 341 | ] 342 | }, 343 | "execution_count": 20, 344 | "metadata": {}, 345 | "output_type": "execute_result" 346 | } 347 | ], 348 | "source": [ 349 | "url_stack" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 21, 355 | "metadata": {}, 356 | "outputs": [ 357 | { 358 | "name": "stdout", 359 | "output_type": "stream", 360 | "text": [ 361 | "https://en.wikipedia.org/wiki/Data_mining\n", 362 | "https://en.wikipedia.org/wiki/Data\n", 363 | 
"https://en.wikipedia.org/wiki/Knowledge\n" 364 | ] 365 | } 366 | ], 367 | "source": [ 368 | "for i in range(0, len(url_stack)):\n", 369 | " print(stack_pop(url_stack))" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 22, 375 | "metadata": {}, 376 | "outputs": [ 377 | { 378 | "name": "stdout", 379 | "output_type": "stream", 380 | "text": [ 381 | "[]\n" 382 | ] 383 | } 384 | ], 385 | "source": [ 386 | "print(url_stack)" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "### Exercise 4 (lambda expression)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 23, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "import math" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 24, 408 | "metadata": {}, 409 | "outputs": [], 410 | "source": [ 411 | "def my_sine():\n", 412 | " return lambda x: math.sin(math.radians(x))\n", 413 | "\n", 414 | "def my_cosine():\n", 415 | " return lambda x: math.cos(math.radians(x))" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 25, 421 | "metadata": {}, 422 | "outputs": [ 423 | { 424 | "data": { 425 | "text/plain": [ 426 | "1.0" 427 | ] 428 | }, 429 | "execution_count": 25, 430 | "metadata": {}, 431 | "output_type": "execute_result" 432 | } 433 | ], 434 | "source": [ 435 | "sine = my_sine()\n", 436 | "cosine = my_cosine()\n", 437 | "math.pow(sine(30), 2) + math.pow(cosine(30), 2)" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 26, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "# Sorting list of tuples\n", 447 | "capitals = [(\"USA\", \"Washington\"), (\"India\", \"Delhi\"), (\"France\", \"Paris\"), (\"UK\", \"London\")]" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 27, 453 | "metadata": {}, 454 | "outputs": [ 455 | { 456 | "data": { 457 | "text/plain": [ 458 | "[('USA', 
'Washington'),\n", 459 | " ('India', 'Delhi'),\n", 460 | " ('France', 'Paris'),\n", 461 | " ('UK', 'London')]" 462 | ] 463 | }, 464 | "execution_count": 27, 465 | "metadata": {}, 466 | "output_type": "execute_result" 467 | } 468 | ], 469 | "source": [ 470 | "capitals" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": 28, 476 | "metadata": {}, 477 | "outputs": [ 478 | { 479 | "data": { 480 | "text/plain": [ 481 | "[('India', 'Delhi'),\n", 482 | " ('UK', 'London'),\n", 483 | " ('France', 'Paris'),\n", 484 | " ('USA', 'Washington')]" 485 | ] 486 | }, 487 | "execution_count": 28, 488 | "metadata": {}, 489 | "output_type": "execute_result" 490 | } 491 | ], 492 | "source": [ 493 | "capitals.sort(key=lambda item: item[1])\n", 494 | "capitals" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "### Exercise 5 (Multi-element membership checking)" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": 29, 507 | "metadata": {}, 508 | "outputs": [ 509 | { 510 | "data": { 511 | "text/plain": [ 512 | "True" 513 | ] 514 | }, 515 | "execution_count": 29, 516 | "metadata": {}, 517 | "output_type": "execute_result" 518 | } 519 | ], 520 | "source": [ 521 | "list_of_words = [\"Hello\", \"there.\", \"How\", \"are\", \"you\", \"doing\"]\n", 522 | "check_for = [\"How\", \"are\"]\n", 523 | "all(w in list_of_words for w in check_for)" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "### Exercise 6 (Queue with list and deque)" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": 30, 536 | "metadata": {}, 537 | "outputs": [], 538 | "source": [ 539 | "queue = []" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 31, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "name": "stdout", 549 | "output_type": "stream", 550 | "text": [ 551 | "Queue created\n", 552 | "Queue emptied\n", 553 |
"Wall time: 1.44 s\n" 554 | ] 555 | } 556 | ], 557 | "source": [ 558 | "%%time\n", 559 | "for i in range(0, 100000):\n", 560 | " queue.append(i)\n", 561 | "print(\"Queue created\")\n", 562 | "for i in range(0, 100000):\n", 563 | " queue.pop(0)\n", 564 | "print(\"Queue emptied\")" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": 32, 570 | "metadata": {}, 571 | "outputs": [ 572 | { 573 | "name": "stdout", 574 | "output_type": "stream", 575 | "text": [ 576 | "Queue created\n", 577 | "Queue emptied\n", 578 | "Wall time: 43 ms\n" 579 | ] 580 | } 581 | ], 582 | "source": [ 583 | "%%time\n", 584 | "from collections import deque\n", 585 | "queue2 = deque()\n", 586 | "for i in range(0, 100000):\n", 587 | " queue2.append(i)\n", 588 | "print(\"Queue created\")\n", 589 | "for i in range(0, 100000):\n", 590 | " queue2.popleft()\n", 591 | "print(\"Queue emptied\")" 592 | ] 593 | } 594 | ], 595 | "metadata": { 596 | "kernelspec": { 597 | "display_name": "Python 3", 598 | "language": "python", 599 | "name": "python3" 600 | }, 601 | "language_info": { 602 | "codemirror_mode": { 603 | "name": "ipython", 604 | "version": 3 605 | }, 606 | "file_extension": ".py", 607 | "mimetype": "text/x-python", 608 | "name": "python", 609 | "nbconvert_exporter": "python", 610 | "pygments_lexer": "ipython3", 611 | "version": "3.6.4" 612 | } 613 | }, 614 | "nbformat": 4, 615 | "nbformat_minor": 2 616 | } 617 | -------------------------------------------------------------------------------- /Lesson 2/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 2 materials 2 | -------------------------------------------------------------------------------- /Lesson 2/Solutions Student Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Task 1" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 
12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from itertools import permutations, dropwhile" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "permutations?" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 3, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "dropwhile?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### Task 2" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 4, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/plain": [ 52 | "" 53 | ] 54 | }, 55 | "execution_count": 4, 56 | "metadata": {}, 57 | "output_type": "execute_result" 58 | } 59 | ], 60 | "source": [ 61 | "permutations(range(3))" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "### Task 3" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 5, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "(0, 1, 2)\n", 81 | "(0, 2, 1)\n", 82 | "(1, 0, 2)\n", 83 | "(1, 2, 0)\n", 84 | "(2, 0, 1)\n", 85 | "(2, 1, 0)\n" 86 | ] 87 | } 88 | ], 89 | "source": [ 90 | "for number_tuple in permutations(range(3)):\n", 91 | " print(number_tuple)\n", 92 | " assert isinstance(number_tuple, tuple)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "### Task 4" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 6, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "[1, 2]\n", 112 | "[2, 1]\n", 113 | "[1, 0, 2]\n", 114 | "[1, 2, 0]\n", 115 | "[2, 0, 1]\n", 116 | "[2, 1, 0]\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "for number_tuple in permutations(range(3)):\n", 122 | " 
print(list(dropwhile(lambda x: x <= 0, number_tuple)))" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### Task 5" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 7, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "import math\n", 139 | "def convert_to_number(number_stack):\n", 140 | " final_number = 0\n", 141 | " for i in range(0, len(number_stack)):\n", 142 | " final_number += (number_stack.pop() * (math.pow(10, i)))\n", 143 | " return final_number" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 8, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "name": "stdout", 153 | "output_type": "stream", 154 | "text": [ 155 | "12.0\n", 156 | "21.0\n", 157 | "102.0\n", 158 | "120.0\n", 159 | "201.0\n", 160 | "210.0\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "for number_tuple in permutations(range(3)):\n", 166 | " number_stack = list(dropwhile(lambda x: x <= 0, number_tuple))\n", 167 | " print(convert_to_number(number_stack))" 168 | ] 169 | } 170 | ], 171 | "metadata": { 172 | "kernelspec": { 173 | "display_name": "Python 3", 174 | "language": "python", 175 | "name": "python3" 176 | }, 177 | "language_info": { 178 | "codemirror_mode": { 179 | "name": "ipython", 180 | "version": 3 181 | }, 182 | "file_extension": ".py", 183 | "mimetype": "text/x-python", 184 | "name": "python", 185 | "nbconvert_exporter": "python", 186 | "pygments_lexer": "ipython3", 187 | "version": "3.6.4" 188 | } 189 | }, 190 | "nbformat": 4, 191 | "nbformat_minor": 2 192 | } 193 | -------------------------------------------------------------------------------- /Lesson 2/Student Activity 02 Design your own CSV parser - Solutions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from 
itertools import zip_longest" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "def return_dict_from_csv_line(header, line):\n", 19 | " # Zip them\n", 20 | " zipped_line = zip_longest(header, line, fillvalue=None)\n", 21 | " # Use dict comprehension to generate the final dict\n", 22 | " ret_dict = {kv[0]: kv[1] for kv in zipped_line}\n", 23 | " return ret_dict" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 3, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "{'Region': 'Central America and the Caribbean', 'Country': 'Antigua and Barbuda ', 'Item Type': 'Baby Food', 'Sales Channel': 'Online', 'Order Priority': 'M', 'Order Date': '12/20/2013', 'Order ID': '957081544', 'Ship Date': '1/11/2014', 'Units Sold': '552', 'Unit Price': '255.28', 'Unit Cost': '159.42', 'Total Revenue': '140914.56', 'Total Cost': '87999.84', 'Total Profit': '52914.72'}\n", 36 | "{'Region': 'Central America and the Caribbean', 'Country': 'Panama', 'Item Type': 'Snacks', 'Sales Channel': 'Offline', 'Order Priority': 'C', 'Order Date': '7/5/2010', 'Order ID': '301644504', 'Ship Date': '7/26/2010', 'Units Sold': '2167', 'Unit Price': '152.58', 'Unit Cost': '97.44', 'Total Revenue': '330640.86', 'Total Cost': '211152.48', 'Total Profit': '119488.38'}\n", 37 | "{'Region': 'Europe', 'Country': 'Czech Republic', 'Item Type': 'Beverages', 'Sales Channel': 'Offline', 'Order Priority': 'C', 'Order Date': '9/12/2011', 'Order ID': '478051030', 'Ship Date': '9/29/2011', 'Units Sold': '4778', 'Unit Price': '47.45', 'Unit Cost': '31.79', 'Total Revenue': '226716.10', 'Total Cost': '151892.62', 'Total Profit': '74823.48'}\n", 38 | "{'Region': 'Asia', 'Country': 'North Korea', 'Item Type': 'Cereal', 'Sales Channel': 'Offline', 'Order Priority': 'L', 'Order Date': '5/13/2010', 'Order ID': '892599952', 'Ship Date': '6/15/2010', 
'Units Sold': '9016', 'Unit Price': '205.70', 'Unit Cost': '117.11', 'Total Revenue': '1854591.20', 'Total Cost': '1055863.76', 'Total Profit': '798727.44'}\n", 39 | "{'Region': 'Asia', 'Country': 'Sri Lanka', 'Item Type': 'Snacks', 'Sales Channel': 'Offline', 'Order Priority': 'C', 'Order Date': '7/20/2015', 'Order ID': '571902596', 'Ship Date': '7/27/2015', 'Units Sold': '7542', 'Unit Price': '152.58', 'Unit Cost': '97.44', 'Total Revenue': '1150758.36', 'Total Cost': '734892.48', 'Total Profit': '415865.88'}\n", 40 | "{'Region': 'Middle East and North Africa', 'Country': 'Morocco', 'Item Type': 'Personal Care', 'Sales Channel': 'Offline', 'Order Priority': 'L', 'Order Date': '11/8/2010', 'Order ID': '412882792', 'Ship Date': '11/22/2010', 'Units Sold': '48', 'Unit Price': '81.73', 'Unit Cost': '56.67', 'Total Revenue': '3923.04', 'Total Cost': '2720.16', 'Total Profit': '1202.88'}\n", 41 | "{'Region': 'Australia and Oceania', 'Country': 'Federated States of Micronesia', 'Item Type': 'Clothes', 'Sales Channel': 'Offline', 'Order Priority': 'H', 'Order Date': '3/28/2011', 'Order ID': '932776868', 'Ship Date': '5/10/2011', 'Units Sold': '8258', 'Unit Price': '109.28', 'Unit Cost': '35.84', 'Total Revenue': '902434.24', 'Total Cost': '295966.72', 'Total Profit': '606467.52'}\n", 42 | "{'Region': 'Europe', 'Country': 'Bosnia and Herzegovina', 'Item Type': 'Clothes', 'Sales Channel': 'Online', 'Order Priority': 'M', 'Order Date': '10/14/2013', 'Order ID': '919133651', 'Ship Date': '11/4/2013', 'Units Sold': '927', 'Unit Price': '109.28', 'Unit Cost': '35.84', 'Total Revenue': '101302.56', 'Total Cost': '33223.68', 'Total Profit': '68078.88'}\n", 43 | "{'Region': 'Middle East and North Africa', 'Country': 'Afghanistan', 'Item Type': 'Clothes', 'Sales Channel': 'Offline', 'Order Priority': 'M', 'Order Date': '8/27/2016', 'Order ID': '579814469', 'Ship Date': '10/5/2016', 'Units Sold': '8841', 'Unit Price': '109.28', 'Unit Cost': '35.84', 'Total Revenue': '966144.48', 
'Total Cost': '316861.44', 'Total Profit': '649283.04'}\n", 44 | "{'Region': 'Sub-Saharan Africa', 'Country': 'Ethiopia', 'Item Type': 'Baby Food', 'Sales Channel': 'Online', 'Order Priority': 'M', 'Order Date': '4/13/2015', 'Order ID': '192993152', 'Ship Date': '5/7/2015', 'Units Sold': '9817', 'Unit Price': '255.28', 'Unit Cost': '159.42', 'Total Revenue': '2506083.76', 'Total Cost': '1565026.14', 'Total Profit': '941057.62'}\n", 45 | "{'Region': 'Middle East and North Africa', 'Country': 'Turkey', 'Item Type': 'Office Supplies', 'Sales Channel': 'Offline', 'Order Priority': 'C', 'Order Date': '9/25/2013', 'Order ID': '557156026', 'Ship Date': '10/15/2013', 'Units Sold': '3704', 'Unit Price': '651.21', 'Unit Cost': '524.96', 'Total Revenue': '2412081.84', 'Total Cost': '1944451.84', 'Total Profit': '467630.00'}\n", 46 | "{'Region': 'Middle East and North Africa', 'Country': 'Oman', 'Item Type': 'Cosmetics', 'Sales Channel': 'Online', 'Order Priority': 'M', 'Order Date': '5/12/2013', 'Order ID': '741101920', 'Ship Date': '5/17/2013', 'Units Sold': '7382', 'Unit Price': '437.20', 'Unit Cost': '263.33', 'Total Revenue': '3227410.40', 'Total Cost': '1943902.06', 'Total Profit': '1283508.34'}\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "with open(\"sales_record.csv\", \"r\") as fd:\n", 52 | " first_line = fd.readline()\n", 53 | " header = first_line.replace(\"\\n\", \"\").split(\",\")\n", 54 | " for i, line in enumerate(fd):\n", 55 | " # Here we loop over only the first few lines so as not to make the output too big\n", 56 | " line = line.replace(\"\\n\", \"\").split(\",\")\n", 57 | " d = return_dict_from_csv_line(header, line)\n", 58 | " print(d)\n", 59 | " if i > 10:\n", 60 | " break" 61 | ] 62 | } 63 | ], 64 | "metadata": { 65 | "kernelspec": { 66 | "display_name": "Python 3", 67 | "language": "python", 68 | "name": "python3" 69 | }, 70 | "language_info": { 71 | "codemirror_mode": { 72 | "name": "ipython", 73 | "version": 3 74 | }, 75 | "file_extension": ".py",
76 | "mimetype": "text/x-python", 77 | "name": "python", 78 | "nbconvert_exporter": "python", 79 | "pygments_lexer": "ipython3", 80 | "version": "3.6.4" 81 | } 82 | }, 83 | "nbformat": 4, 84 | "nbformat_minor": 2 85 | } 86 | -------------------------------------------------------------------------------- /Lesson 3/Lesson 3 Activity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 3: Activity\n", 8 | "\n", 9 | "In this activity, you will work with the **Boston Housing Price dataset**. The Boston house-price data has been used in many machine learning papers that address regression problems. You will read the data from a CSV file into a Pandas DataFrame and do some basic data wrangling with it.\n", 10 | "\n", 11 | "Following are the details of the attributes of this dataset for your reference. You may have to refer to them while answering questions in this activity.\n", 12 | "\n", 13 | "* **CRIM**: per capita crime rate by town\n", 14 | "* **ZN**: proportion of residential land zoned for lots over 25,000 sq.ft.\n", 15 | "* **INDUS**: proportion of non-retail business acres per town\n", 16 | "* **CHAS**: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n", 17 | "* **NOX**: nitric oxides concentration (parts per 10 million)\n", 18 | "* **RM**: average number of rooms per dwelling\n", 19 | "* **AGE**: proportion of owner-occupied units built prior to 1940\n", 20 | "* **DIS**: weighted distances to five Boston employment centres\n", 21 | "* **RAD**: index of accessibility to radial highways\n", 22 | "* **TAX**: full-value property-tax rate per 10,000 dollars\n", 23 | "* **PTRATIO**: pupil-teacher ratio by town\n", 24 | "* **B**: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n", 25 | "* **LSTAT**: % of lower status of the population\n", 26 | "* **PRICE**: Median value of owner-occupied homes in $1000's"
27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "### Load necessary libraries" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "# Write your code here" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "### Read in the Boston housing data set (given as a .csv file) from the local directory" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "# Hint: The Pandas function for reading a CSV file is 'read_csv'.\n", 59 | "# Don't forget that all functions in Pandas can be accessed by syntax like pd.{function_name}\n", 60 | "# Write your code here" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "### Check the first 10 records" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 1, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "# Write your code here" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "### In total, how many records are there?"
84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 2, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "# Write your code here to answer the question above" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "### Create a smaller DataFrame that excludes the columns 'CHAS', 'NOX', 'B', and 'LSTAT'" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "# Write your code here" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "### Check the last 7 records of the new DataFrame you just created" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 3, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# Write your code here" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### Can you plot histograms of all the variables (columns) in the new DataFrame?\n", 132 | "You can of course plot them one by one. But try to write a short piece of code to plot them all at once.\n", 133 | "
***Hint***: 'For loop'!\n", 134 | "
***Bonus problem***: Can you also show each plot with a unique title, i.e. the name of the variable that it is a plot of? " 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": { 141 | "scrolled": false 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "# Write your code here" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### Crime rate could be an indicator of house price (people don't want to live in high-crime areas). Create a scatter plot of crime rate vs. Price." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 6, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# Write your code here" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "### We can understand the relationship better if we plot _log10(crime)_ vs. Price. Create that plot and make it nice. Give it a proper title and x- and y-axis labels, make the data points a color of your choice, etc.\n", 169 | "***Hint***: Try the `np.log10` function" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 7, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "# Write your code here" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "### Can you calculate the mean number of rooms per dwelling?" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 8, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "# Write your code here" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "### Can you calculate the median Age?"
202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 9, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "# Write your code here" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "### Can you calculate average (mean) distances to five Boston employment centres?" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 10, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "# Write your code here" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "### Tricky question: Can you calculate the percentage of houses with low price (< $20,000)?" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 11, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "# Write your code here" 243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "kernelspec": { 248 | "display_name": "Python 3", 249 | "language": "python", 250 | "name": "python3" 251 | }, 252 | "language_info": { 253 | "codemirror_mode": { 254 | "name": "ipython", 255 | "version": 3 256 | }, 257 | "file_extension": ".py", 258 | "mimetype": "text/x-python", 259 | "name": "python", 260 | "nbconvert_exporter": "python", 261 | "pygments_lexer": "ipython3", 262 | "version": "3.6.2" 263 | }, 264 | "latex_envs": { 265 | "LaTeX_envs_menu_present": true, 266 | "autoclose": false, 267 | "autocomplete": true, 268 | "bibliofile": "biblio.bib", 269 | "cite_by": "apalike", 270 | "current_citInitial": 1, 271 | "eqLabelWithNumbers": true, 272 | "eqNumInitial": 1, 273 | "hotkeys": { 274 | "equation": "Ctrl-E", 275 | "itemize": "Ctrl-I" 276 | }, 277 | "labels_anchors": false, 278 | "latex_user_defs": false, 279 | "report_style_numbering": false, 280 | "user_envs_cfg": false 281 | } 282 | }, 283 | "nbformat": 4, 284 | "nbformat_minor": 2 285 | } 286 | 
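The activity notebook above leaves every cell as a "# Write your code here" placeholder. Its closing "tricky question" hinges on one idea worth spelling out: the percentage of rows matching a condition is just the mean of a boolean test. Below is a minimal plain-Python sketch of that trick; the price list is a made-up stand-in for the PRICE column of Boston_housing.csv, not real dataset values.

```python
# PRICE is recorded in units of $1000, so "< $20,000" means PRICE < 20.
# These prices are hypothetical stand-ins for the Boston_housing.csv PRICE column.
prices = [24.0, 21.6, 34.7, 16.5, 18.9, 15.0, 31.0, 13.8]

# True counts as 1 and False as 0, so the share of matches is just an average.
pct_low = sum(p < 20.0 for p in prices) / len(prices) * 100

print(f"{pct_low:.1f}% of houses are priced below $20,000")  # prints "50.0% ..."
```

With a pandas DataFrame `df`, the same idea collapses to the one-liner `(df["PRICE"] < 20).mean() * 100`.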
-------------------------------------------------------------------------------- /Lesson 3/Lesson 3 Exercise 1-13 Numpy arrays.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Exercise 1: Create a Numpy array (from a list)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 38, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import numpy as np\n", 17 | "lst1=[1,2,3]\n", 18 | "array1 = np.array(lst1)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 39, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/plain": [ 29 | "numpy.ndarray" 30 | ] 31 | }, 32 | "execution_count": 39, 33 | "metadata": {}, 34 | "output_type": "execute_result" 35 | } 36 | ], 37 | "source": [ 38 | "type(array1)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 40, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/plain": [ 49 | "list" 50 | ] 51 | }, 52 | "execution_count": 40, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "type(lst1)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "### Exercise 2: Add two Numpy arrays" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 41, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stdout", 75 | "output_type": "stream", 76 | "text": [ 77 | "[1, 2, 3, 1, 2, 3]\n", 78 | "[2 4 6]\n" 79 | ] 80 | } 81 | ], 82 | "source": [ 83 | "lst2 = lst1 + lst1\n", 84 | "print(lst2)\n", 85 | "array2 = array1 + array1\n", 86 | "print(array2)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### Exercise 3: Mathematical operations on Numpy arrays" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 42, 99 | "metadata": {}, 
100 | "outputs": [ 101 | { 102 | "name": "stdout", 103 | "output_type": "stream", 104 | "text": [ 105 | "array1 multiplied by array1: [1 4 9]\n", 106 | "array1 divided by array1: [1. 1. 1.]\n", 107 | "array1 raised to the power of array1: [ 1 4 27]\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "print(\"array1 multiplied by array1: \",array1*array1)\n", 113 | "print(\"array1 divided by array1: \",array1/array1)\n", 114 | "print(\"array1 raised to the power of array1: \",array1**array1)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "### Exercise 4: More advanced mathematical operations on Numpy arrays" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 43, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "name": "stdout", 131 | "output_type": "stream", 132 | "text": [ 133 | "[1, 2, 3, 4, 5]\n", 134 | "Sine: [ 0.84147098 0.90929743 0.14112001 -0.7568025 -0.95892427]\n", 135 | "Natural logarithm: [0. 0.69314718 1.09861229 1.38629436 1.60943791]\n", 136 | "Base-10 logarithm: [0. 0.30103 0.47712125 0.60205999 0.69897 ]\n", 137 | "Base-2 logarithm: [0. 1. 1.5849625 2. 2.32192809]\n", 138 | "Exponential: [ 2.71828183 7.3890561 20.08553692 54.59815003 148.4131591 ]\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "lst_5=[i for i in range(1,6)]\n", 144 | "print(lst_5)\n", 145 | "array_5=np.array(lst_5)\n", 146 | "# sine function\n", 147 | "print(\"Sine: \",np.sin(array_5))\n", 148 | "# logarithm\n", 149 | "print(\"Natural logarithm: \",np.log(array_5))\n", 150 | "print(\"Base-10 logarithm: \",np.log10(array_5))\n", 151 | "print(\"Base-2 logarithm: \",np.log2(array_5))\n", 152 | "# Exponential\n", 153 | "print(\"Exponential: \",np.exp(array_5))" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "### Exercise 5: How to generate arrays easily? 
`arange` and `linspace`" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 44, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "name": "stdout", 170 | "output_type": "stream", 171 | "text": [ 172 | "A series of numbers: [ 5 6 7 8 9 10 11 12 13 14 15]\n", 173 | "Numbers spaced apart by 2: [ 0 2 4 6 8 10]\n", 174 | "Numbers spaced apart by float: [ 0. 2.5 5. 7.5 10. ]\n", 175 | "Every 5th number from 30 in reverse order: [30 25 20 15 10 5 0]\n", 176 | "11 linearly spaced numbers between 1 and 5: [1. 1.4 1.8 2.2 2.6 3. 3.4 3.8 4.2 4.6 5. ]\n" 177 | ] 178 | } 179 | ], 180 | "source": [ 181 | "print(\"A series of numbers:\",np.arange(5,16))\n", 182 | "print(\"Numbers spaced apart by 2:\",np.arange(0,11,2))\n", 183 | "print(\"Numbers spaced apart by float:\",np.arange(0,11,2.5))\n", 184 | "print(\"Every 5th number from 30 in reverse order: \",np.arange(30,-1,-5))\n", 185 | "print(\"11 linearly spaced numbers between 1 and 5: \",np.linspace(1,5,11))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "### Exercise 6: Creating multi-dimensional array" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 45, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stdout", 202 | "output_type": "stream", 203 | "text": [ 204 | "Type/Class of this object: \n", 205 | "Here is the matrix\n", 206 | "----------\n", 207 | " [[1 2 3]\n", 208 | " [4 5 6]\n", 209 | " [7 8 9]] \n", 210 | "----------\n" 211 | ] 212 | } 213 | ], 214 | "source": [ 215 | "list_2D = [[1,2,3],[4,5,6],[7,8,9]]\n", 216 | "mat1 = np.array(list_2D)\n", 217 | "print(\"Type/Class of this object:\",type(mat1))\n", 218 | "print(\"Here is the matrix\\n----------\\n\",mat1,\"\\n----------\")" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 46, 224 | "metadata": {}, 225 | "outputs": [ 226 | { 227 | "name": "stdout", 228 | "output_type": "stream", 229 | "text": [ 230 | "[[1.5 
2. 3. ]\n", 231 | " [4. 5. 6. ]]\n" 232 | ] 233 | } 234 | ], 235 | "source": [ 236 | "tuple_2D = np.array([(1.5,2,3), (4,5,6)])\n", 237 | "mat_tuple = np.array(tuple_2D)\n", 238 | "print (mat_tuple)" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "### Exercise 7: Dimension, shape, size, and data type of the 2D array" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 47, 251 | "metadata": {}, 252 | "outputs": [ 253 | { 254 | "name": "stdout", 255 | "output_type": "stream", 256 | "text": [ 257 | "Dimension of this matrix: 2\n", 258 | "Size of this matrix: 9\n", 259 | "Shape of this matrix: (3, 3)\n", 260 | "Data type of this matrix: int32\n" 261 | ] 262 | } 263 | ], 264 | "source": [ 265 | "print(\"Dimension of this matrix: \",mat1.ndim,sep='') \n", 266 | "print(\"Size of this matrix: \", mat1.size,sep='') \n", 267 | "print(\"Shape of this matrix: \", mat1.shape,sep='')\n", 268 | "print(\"Data type of this matrix: \", mat1.dtype,sep='')" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "### Exercise 8: Zeros, Ones, Random, and Identity Matrices and Vectors" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 48, 281 | "metadata": {}, 282 | "outputs": [ 283 | { 284 | "name": "stdout", 285 | "output_type": "stream", 286 | "text": [ 287 | "Vector of zeros: [0. 0. 0. 0. 0.]\n", 288 | "Matrix of zeros: [[0. 0. 0. 0.]\n", 289 | " [0. 0. 0. 0.]\n", 290 | " [0. 0. 0. 0.]]\n", 291 | "Vector of ones: [1. 1. 1. 1.]\n", 292 | "Matrix of ones: [[1. 1.]\n", 293 | " [1. 1.]\n", 294 | " [1. 1.]\n", 295 | " [1. 1.]]\n", 296 | "Matrix of 5’s: [[5. 5. 5.]\n", 297 | " [5. 5. 5.]\n", 298 | " [5. 5. 5.]]\n", 299 | "Identity matrix of dimension 2: [[1. 0.]\n", 300 | " [0. 1.]]\n", 301 | "Identity matrix of dimension 4: [[1. 0. 0. 0.]\n", 302 | " [0. 1. 0. 0.]\n", 303 | " [0. 0. 1. 0.]\n", 304 | " [0. 0. 0. 
1.]]\n", 305 | "Random matrix of shape (4,3):\n", 306 | " [[2 6 1]\n", 307 | " [9 1 1]\n", 308 | " [8 7 2]\n", 309 | " [7 6 9]]\n" 310 | ] 311 | } 312 | ], 313 | "source": [ 314 | "print(\"Vector of zeros: \",np.zeros(5))\n", 315 | "print(\"Matrix of zeros: \",np.zeros((3,4)))\n", 316 | "print(\"Vector of ones: \",np.ones(4))\n", 317 | "print(\"Matrix of ones: \",np.ones((4,2)))\n", 318 | "print(\"Matrix of 5’s: \",5*np.ones((3,3)))\n", 319 | "print(\"Identity matrix of dimension 2:\",np.eye(2))\n", 320 | "print(\"Identity matrix of dimension 4:\",np.eye(4))\n", 321 | "print(\"Random matrix of shape (4,3):\\n\",np.random.randint(low=1,high=10,size=(4,3)))" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "### Exercise 9: Reshaping, Ravel, Min, Max, Sorting" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 49, 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "name": "stdout", 338 | "output_type": "stream", 339 | "text": [ 340 | "Shape of a: (30,)\n", 341 | "Shape of b: (2, 3, 5)\n", 342 | "Shape of c: (6, 5)\n" 343 | ] 344 | } 345 | ], 346 | "source": [ 347 | "a = np.random.randint(1,100,30)\n", 348 | "b = a.reshape(2,3,5)\n", 349 | "c = a.reshape(6,5)\n", 350 | "print (\"Shape of a:\", a.shape)\n", 351 | "print (\"Shape of b:\", b.shape)\n", 352 | "print (\"Shape of c:\", c.shape)" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 50, 358 | "metadata": {}, 359 | "outputs": [ 360 | { 361 | "name": "stdout", 362 | "output_type": "stream", 363 | "text": [ 364 | "\n", 365 | "a looks like\n", 366 | " [ 3 94 42 63 68 39 68 65 61 31 74 19 94 6 96 15 8 15 56 21 95 66 96 60\n", 367 | " 3 41 31 73 78 52]\n", 368 | "\n", 369 | "b looks like\n", 370 | " [[[ 3 94 42 63 68]\n", 371 | " [39 68 65 61 31]\n", 372 | " [74 19 94 6 96]]\n", 373 | "\n", 374 | " [[15 8 15 56 21]\n", 375 | " [95 66 96 60 3]\n", 376 | " [41 31 73 78 52]]]\n", 377 | "\n", 378 | "c looks 
like\n", 379 | " [[ 3 94 42 63 68]\n", 380 | " [39 68 65 61 31]\n", 381 | " [74 19 94 6 96]\n", 382 | " [15 8 15 56 21]\n", 383 | " [95 66 96 60 3]\n", 384 | " [41 31 73 78 52]]\n" 385 | ] 386 | } 387 | ], 388 | "source": [ 389 | "print(\"\\na looks like\\n\",a)\n", 390 | "print(\"\\nb looks like\\n\",b)\n", 391 | "print(\"\\nc looks like\\n\",c)" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 51, 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "[ 3 94 42 63 68 39 68 65 61 31 74 19 94 6 96 15 8 15 56 21 95 66 96 60\n", 404 | " 3 41 31 73 78 52]\n" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "b_flat = b.ravel()\n", 410 | "print(b_flat)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "### Exercise 10: Indexing and slicing" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 52, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "Array: [ 0 1 2 3 4 5 6 7 8 9 10]\n", 430 | "Element at 7th index is: 7\n", 431 | "Elements from 3rd to 5th index are: [3 4 5]\n", 432 | "Elements up to 4th index are: [0 1 2 3]\n", 433 | "Elements from last backwards are: [10 9 8 7 6 5 4 3 2 1 0]\n", 434 | "3 Elements from last backwards are: [10 8 6]\n", 435 | "New array: [ 0 2 4 6 8 10 12 14 16 18 20]\n", 436 | "Elements at 2nd, 4th, and 9th index are: [ 4 8 18]\n" 437 | ] 438 | } 439 | ], 440 | "source": [ 441 | "arr = np.arange(0,11)\n", 442 | "print(\"Array:\",arr)\n", 443 | "print(\"Element at 7th index is:\", arr[7])\n", 444 | "print(\"Elements from 3rd to 5th index are:\", arr[3:6])\n", 445 | "print(\"Elements up to 4th index are:\", arr[:4])\n", 446 | "print(\"Elements from last backwards are:\", arr[-1::-1])\n", 447 | "print(\"3 Elements from last backwards are:\", arr[-1:-6:-2])\n", 448 | "\n", 449 | "arr2 = 
np.arange(0,21,2)\n", 450 | "print(\"New array:\",arr2)\n", 451 | "print(\"Elements at 2nd, 4th, and 9th index are:\", arr2[[2,4,9]]) # Pass a list as an index to subset" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 53, 457 | "metadata": {}, 458 | "outputs": [ 459 | { 460 | "name": "stdout", 461 | "output_type": "stream", 462 | "text": [ 463 | "Matrix of random 2-digit numbers\n", 464 | " [[89 12 21 92 74]\n", 465 | " [91 94 56 89 93]\n", 466 | " [39 49 58 92 63]]\n", 467 | "\n", 468 | "Double bracket indexing\n", 469 | "\n", 470 | "Element in row index 1 and column index 2: 56\n", 471 | "\n", 472 | "Single bracket with comma indexing\n", 473 | "\n", 474 | "Element in row index 1 and column index 2: 56\n", 475 | "\n", 476 | "Row or column extract\n", 477 | "\n", 478 | "Entire row at index 2: [39 49 58 92 63]\n", 479 | "Entire column at index 3: [92 89 92]\n", 480 | "\n", 481 | "Subsetting sub-matrices\n", 482 | "\n", 483 | "Matrix with row indices 1 and 2 and column indices 3 and 4\n", 484 | " [[89 93]\n", 485 | " [92 63]]\n", 486 | "Matrix with row indices 0 and 1 and column indices 1 and 3\n", 487 | " [[12 92]\n", 488 | " [94 89]]\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "mat = np.random.randint(10,100,15).reshape(3,5)\n", 494 | "print(\"Matrix of random 2-digit numbers\\n\",mat)\n", 495 | "\n", 496 | "print(\"\\nDouble bracket indexing\\n\")\n", 497 | "print(\"Element in row index 1 and column index 2:\", mat[1][2])\n", 498 | "\n", 499 | "print(\"\\nSingle bracket with comma indexing\\n\")\n", 500 | "print(\"Element in row index 1 and column index 2:\", mat[1,2])\n", 501 | "print(\"\\nRow or column extract\\n\")\n", 502 | "\n", 503 | "print(\"Entire row at index 2:\", mat[2])\n", 504 | "print(\"Entire column at index 3:\", mat[:,3])\n", 505 | "\n", 506 | "print(\"\\nSubsetting sub-matrices\\n\")\n", 507 | "print(\"Matrix with row indices 1 and 2 and column indices 3 and 4\\n\", mat[1:3,3:5])\n", 508 | "print(\"Matrix 
with row indices 0 and 1 and column indices 1 and 3\\n\", mat[0:2,[1,3]])" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "### Exercise 11: Conditional subsetting" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 54, 521 | "metadata": {}, 522 | "outputs": [ 523 | { 524 | "name": "stdout", 525 | "output_type": "stream", 526 | "text": [ 527 | "Matrix of random 2-digit numbers\n", 528 | " [[21 16 77 31 81]\n", 529 | " [21 50 62 67 14]\n", 530 | " [33 94 52 39 18]]\n", 531 | "\n", 532 | "Elements greater than 50\n", 533 | " [77 81 62 67 94 52]\n" 534 | ] 535 | } 536 | ], 537 | "source": [ 538 | "mat = np.random.randint(10,100,15).reshape(3,5)\n", 539 | "print(\"Matrix of random 2-digit numbers\\n\",mat)\n", 540 | "print (\"\\nElements greater than 50\\n\", mat[mat>50])" 541 | ] 542 | }, 543 | { 544 | "cell_type": "markdown", 545 | "metadata": {}, 546 | "source": [ 547 | "### Exercise 12: Array operations (array-array, array-scalar, universal functions)" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 55, 553 | "metadata": {}, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "\n", 560 | "1st Matrix of random single-digit numbers\n", 561 | " [[1 6 5]\n", 562 | " [6 2 2]\n", 563 | " [5 3 5]]\n", 564 | "\n", 565 | "2nd Matrix of random single-digit numbers\n", 566 | " [[9 7 5]\n", 567 | " [3 9 8]\n", 568 | " [8 8 9]]\n", 569 | "\n", 570 | "Addition\n", 571 | " [[10 13 10]\n", 572 | " [ 9 11 10]\n", 573 | " [13 11 14]]\n", 574 | "\n", 575 | "Multiplication\n", 576 | " [[ 9 42 25]\n", 577 | " [18 18 16]\n", 578 | " [40 24 45]]\n", 579 | "\n", 580 | "Division\n", 581 | " [[0.11111111 0.85714286 1. ]\n", 582 | " [2. 
0.22222222 0.25 ]\n", 583 | " [0.625 0.375 0.55555556]]\n", 584 | "\n", 585 | "Linear combination: 3*A - 2*B\n", 586 | " [[-15 4 5]\n", 587 | " [ 12 -12 -10]\n", 588 | " [ -1 -7 -3]]\n", 589 | "\n", 590 | "Addition of a scalar (100)\n", 591 | " [[101 106 105]\n", 592 | " [106 102 102]\n", 593 | " [105 103 105]]\n", 594 | "\n", 595 | "Exponentiation, matrix cubed here\n", 596 | " [[ 1 216 125]\n", 597 | " [216 8 8]\n", 598 | " [125 27 125]]\n", 599 | "\n", 600 | "Exponentiation, sq-root using pow function\n", 601 | " [[1. 2.44948974 2.23606798]\n", 602 | " [2.44948974 1.41421356 1.41421356]\n", 603 | " [2.23606798 1.73205081 2.23606798]]\n" 604 | ] 605 | } 606 | ], 607 | "source": [ 608 | "mat1 = np.random.randint(1,10,9).reshape(3,3)\n", 609 | "mat2 = np.random.randint(1,10,9).reshape(3,3)\n", 610 | "print(\"\\n1st Matrix of random single-digit numbers\\n\",mat1)\n", 611 | "print(\"\\n2nd Matrix of random single-digit numbers\\n\",mat2)\n", 612 | "\n", 613 | "print(\"\\nAddition\\n\", mat1+mat2)\n", 614 | "print(\"\\nMultiplication\\n\", mat1*mat2)\n", 615 | "print(\"\\nDivision\\n\", mat1/mat2)\n", 616 | "print(\"\\nLinear combination: 3*A - 2*B\\n\", 3*mat1-2*mat2)\n", 617 | "\n", 618 | "print(\"\\nAddition of a scalar (100)\\n\", 100+mat1)\n", 619 | "\n", 620 | "print(\"\\nExponentiation, matrix cubed here\\n\", mat1**3)\n", 621 | "print(\"\\nExponentiation, sq-root using pow function\\n\",pow(mat1,0.5))" 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "### Exercise 13: Stacking arrays" 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": 56, 634 | "metadata": {}, 635 | "outputs": [ 636 | { 637 | "name": "stdout", 638 | "output_type": "stream", 639 | "text": [ 640 | "Matrix a\n", 641 | " [[1 2]\n", 642 | " [3 4]]\n", 643 | "Matrix b\n", 644 | " [[5 6]\n", 645 | " [7 8]]\n", 646 | "Vertical stacking\n", 647 | " [[1 2]\n", 648 | " [3 4]\n", 649 | " [5 6]\n", 650 | " [7 8]]\n", 651 | "Horizontal 
stacking\n", 652 | " [[1 2 5 6]\n", 653 | " [3 4 7 8]]\n" 654 | ] 655 | } 656 | ], 657 | "source": [ 658 | "a = np.array([[1,2],[3,4]])\n", 659 | "b = np.array([[5,6],[7,8]])\n", 660 | "print(\"Matrix a\\n\",a)\n", 661 | "print(\"Matrix b\\n\",b)\n", 662 | "print(\"Vertical stacking\\n\",np.vstack((a,b)))\n", 663 | "print(\"Horizontal stacking\\n\",np.hstack((a,b)))" 664 | ] 665 | } 666 | ], 667 | "metadata": { 668 | "kernelspec": { 669 | "display_name": "Python 3", 670 | "language": "python", 671 | "name": "python3" 672 | }, 673 | "language_info": { 674 | "codemirror_mode": { 675 | "name": "ipython", 676 | "version": 3 677 | }, 678 | "file_extension": ".py", 679 | "mimetype": "text/x-python", 680 | "name": "python", 681 | "nbconvert_exporter": "python", 682 | "pygments_lexer": "ipython3", 683 | "version": "3.6.2" 684 | }, 685 | "latex_envs": { 686 | "LaTeX_envs_menu_present": true, 687 | "autoclose": false, 688 | "autocomplete": true, 689 | "bibliofile": "biblio.bib", 690 | "cite_by": "apalike", 691 | "current_citInitial": 1, 692 | "eqLabelWithNumbers": true, 693 | "eqNumInitial": 1, 694 | "hotkeys": { 695 | "equation": "Ctrl-E", 696 | "itemize": "Ctrl-I" 697 | }, 698 | "labels_anchors": false, 699 | "latex_user_defs": false, 700 | "report_style_numbering": false, 701 | "user_envs_cfg": false 702 | } 703 | }, 704 | "nbformat": 4, 705 | "nbformat_minor": 2 706 | } 707 | -------------------------------------------------------------------------------- /Lesson 4/Sample - Superstore.xls: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 4/Sample - Superstore.xls -------------------------------------------------------------------------------- /Lesson 5/C11065_Packt Author Contract_Tirthajyoti Sarkar.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/C11065_Packt Author Contract_Tirthajyoti Sarkar.pdf -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_1.csv: -------------------------------------------------------------------------------- 1 | Bedroom, Sq. foot, Locality, Price ($) 2 | 2, 1500, Good, 300000 3 | 3, 1300, Fair, 240000 4 | 3, 1900, Very good, 450000 5 | 3, 1850, Bad, 280000 6 | 2, 1640, Good, 310000 7 | -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_1.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/CSV_EX_1.zip -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_2.csv: -------------------------------------------------------------------------------- 1 | 2, 1500, Good, 300000 2 | 3, 1300, Fair, 240000 3 | 3, 1900, Very good, 450000 4 | 3, 1850, Bad, 280000 5 | 2, 1640, Good, 310000 6 | -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_3.csv: -------------------------------------------------------------------------------- 1 | Bedroom; Sq. foot; Locality; Price ($) 2 | 2; 1500; Good; 300000 3 | 3; 1300; Fair; 240000 4 | 3; 1900; Very good; 450000 5 | 3; 1850; Bad; 280000 6 | 2; 1640; Good; 310000 -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_blankline.csv: -------------------------------------------------------------------------------- 1 | Bedroom, Sq. 
foot, Locality, Price ($) 2 | 2, 1500, Good, 300000 3 | 3, 1300, Fair, 240000 4 | 5 | 3, 1900, Very good, 450000 6 | 3, 1850, Bad, 280000 7 | 8 | 2, 1640, Good, 310000 9 | -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_skipfooter.csv: -------------------------------------------------------------------------------- 1 | Filetype: CSV,,, 2 | ,Info about some houses,, 3 | Bedroom, Sq. foot, Locality, Price ($) 4 | 2,1500, Good,300000 5 | 3,1300, Fair,240000 6 | 3,1900, Very good,450000 7 | 3,1850, Bad,280000 8 | 2,1640, Good,310000 9 | , This is the end of file,, -------------------------------------------------------------------------------- /Lesson 5/CSV_EX_skiprows.csv: -------------------------------------------------------------------------------- 1 | Filetype: CSV,,, 2 | ,Info about some houses,, 3 | Bedroom, Sq. foot, Locality, Price ($) 4 | 2,1500, Good,300000 5 | 3,1300, Fair,240000 6 | 3,1900, Very good,450000 7 | 3,1850, Bad,280000 8 | 2,1640, Good,310000 9 | -------------------------------------------------------------------------------- /Lesson 5/Data Wrangling with Python.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/Data Wrangling with Python.pdf -------------------------------------------------------------------------------- /Lesson 5/Housing_data.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/Housing_data.pdf -------------------------------------------------------------------------------- /Lesson 5/Housing_data.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/Housing_data.xlsx -------------------------------------------------------------------------------- /Lesson 5/JSON_EX_1.json: -------------------------------------------------------------------------------- 1 | {"Bedroom":{"0":2,"1":3,"2":3,"3":3,"4":2}," Sq. foot":{"0":1500,"1":1300,"2":1900,"3":1850,"4":1640}," Locality":{"0":" Good","1":" Fair","2":" Very good","3":" Bad","4":" Good"}," Price ($)":{"0":300000,"1":240000,"2":450000,"3":280000,"4":310000}} -------------------------------------------------------------------------------- /Lesson 5/JSON_EX_Movies.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "title": "Avengers: Age of Ultron", 4 | "year": 2015, 5 | "cast": [ 6 | "Robert Downey, Jr.", 7 | "Chris Evans", 8 | "Chris Hemsworth", 9 | "Mark Ruffalo" 10 | ], 11 | "genres": [ 12 | "Action" 13 | ] 14 | }, 15 | { 16 | "title": "The Avengers", 17 | "year": 2012, 18 | "cast": [ 19 | "Robert Downey, Jr.", 20 | "Chris Evans", 21 | "Mark Ruffalo", 22 | "Chris Hemsworth", 23 | "Scarlett Johansson", 24 | "Jeremy Renner", 25 | "Tom Hiddleston", 26 | "Clark Gregg", 27 | "Cobie Smulders", 28 | "Stellan Skarsgård", 29 | "Samuel L. Jackson" 30 | ], 31 | "genres": [ 32 | "Superhero" 33 | ] 34 | } 35 | ] -------------------------------------------------------------------------------- /Lesson 5/Lesson 5 Activity 2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 5 Activity 2: \n", 8 | "# Read tabular data from a PDF report of World Bank for doing some analysis" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "### Load libraries\n", 16 | "* **Don't forget that a special package to load for this exercise is Tabula. 
You will need the `read_pdf` function from Tabula**" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "## Write your code here" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### Get familiar with the PDF file to be read\n", 33 | "#### Open the accompanying PDF file \"WDI-2016\" and browse through it quickly. It is an annual report from the World Bank on World Development Indicators (poverty, hunger, child mortality, social mobility, education, etc.)\n", 34 | "\n", 35 | "#### Go to pages 68-72 to look at the tables we need to extract in this activity for analysis. They show various statistics for nations around the world." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Define a list of page numbers to read" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "## Write your code here" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### Create a list of column names. This will not be extracted by the PDF reader correctly, so we need to manually use it later.\n", 59 | "\n", 60 | "#### Look at pages 68-72 and come up with these variable names. Use your own judgment." 
61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "## Write your code here" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### Test a PDF table extraction by using the `read_pdf` function from Tabula\n", 77 | "\n", 78 | "* **You can read details on this library here: https://github.com/chezou/tabula-py**\n", 79 | "* **You may have to set `multiple_tables=True` in this case**" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "## Write your code here" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "### If you have done the previous step correctly, you should get a simple list back. Check its length and contents. Do you see the table (as a Pandas DataFrame) in the list?" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 6, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "## Write your code here to check the length of the list" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 7, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "## Write your code here to check the first element of the list" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 8, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "## Write your code here to check the second element of the list" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### It looks like the 2nd element of the list is the table we want to extract. 
Let's assign it to a DataFrame and check first few rows using `head` method" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 9, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "## Write your code here" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 10, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "## Write your code here" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "### You should observe that the column headers are just numbers. Here, we need to use the defined list of variables we created earlier. Assign that list as column names of this DataFrame. " 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 11, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "## Write your code here" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "### Next, write a loop to create such DataFrames by reading data tables from the pages 68-72 of the PDF file. You can store those DataFrames in a list for concatenating later." 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 12, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "## Write your code here" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "### Examine individual DataFrames from the list. Does the last DataFrame look alright?" 
187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 13, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "## Write your code here" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Concatenate all the DataFrames in the list into a single DataFrame so that we can use it for further wrangling and analysis.\n", 203 | "\n", 204 | "* Check the shape of the DataFrame. It should show 226 entries in total with 11 columns." 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 14, 210 | "metadata": {}, 211 | "outputs": [], 212 | "source": [ 213 | "## Write your code here to concatenate the individual DataFrames into a single one" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 15, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "## Write your code here to check the shape" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 16, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "## Write your code here to examine the final DataFrame" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "### Is the data set clean and ready to be analyzed? \n", 239 | "* **Are there missing entries? How should we handle them?**\n", 240 | "* **Are there entries that describe regions rather than individual countries? Do we need them here, or can they be copied to another data set?**\n", 241 | "\n", 242 | "#### As with any real-world example, this data set also needs further wrangling and cleaning before it can be used in an analytics pipeline. But you have already mastered many of those data cleaning techniques in the previous lessons! So they will not be discussed here, but you can try on your own to extract beautiful plots and insights from this data set by using your data wrangling skills!" 
243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "kernelspec": { 248 | "display_name": "Python 3", 249 | "language": "python", 250 | "name": "python3" 251 | }, 252 | "language_info": { 253 | "codemirror_mode": { 254 | "name": "ipython", 255 | "version": 3 256 | }, 257 | "file_extension": ".py", 258 | "mimetype": "text/x-python", 259 | "name": "python", 260 | "nbconvert_exporter": "python", 261 | "pygments_lexer": "ipython3", 262 | "version": "3.6.2" 263 | }, 264 | "latex_envs": { 265 | "LaTeX_envs_menu_present": true, 266 | "autoclose": false, 267 | "autocomplete": true, 268 | "bibliofile": "biblio.bib", 269 | "cite_by": "apalike", 270 | "current_citInitial": 1, 271 | "eqLabelWithNumbers": true, 272 | "eqNumInitial": 1, 273 | "hotkeys": { 274 | "equation": "Ctrl-E", 275 | "itemize": "Ctrl-I" 276 | }, 277 | "labels_anchors": false, 278 | "latex_user_defs": false, 279 | "report_style_numbering": false, 280 | "user_envs_cfg": false 281 | } 282 | }, 283 | "nbformat": 4, 284 | "nbformat_minor": 2 285 | } 286 | -------------------------------------------------------------------------------- /Lesson 5/Table_EX_1.txt: -------------------------------------------------------------------------------- 1 | Bedroom, Sq. foot, Locality, Price ($) 2 | 2, 1500, Good, 300000 3 | 3, 1300, Fair, 240000 4 | 3, 1900, Very good, 450000 5 | 3, 1850, Bad, 280000 6 | 2, 1640, Good, 310000 7 | -------------------------------------------------------------------------------- /Lesson 5/Table_tab_separated.txt: -------------------------------------------------------------------------------- 1 | Bedroom Sq. 
foot Locality Price ($) 2 | 2 1500 Good 300000 3 | 3 1300 Fair 240000 4 | 3 1900 Very good 450000 5 | 3 1850 Bad 280000 6 | 2 1640 Good 310000 7 | -------------------------------------------------------------------------------- /Lesson 5/WDI-2016.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/WDI-2016.pdf -------------------------------------------------------------------------------- /Lesson 5/rscfp2016.dta: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/rscfp2016.dta -------------------------------------------------------------------------------- /Lesson 5/scfp2016s.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/scfp2016s.zip -------------------------------------------------------------------------------- /Lesson 5/tsarkar31-analysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 5/tsarkar31-analysis.pdf -------------------------------------------------------------------------------- /Lesson 6/Lesson 6 Activitiy 01 - Solutions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Load the data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 20, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np\n", 18 | "import matplotlib.pyplot as plt\n", 19 
| "\n", 20 | "%matplotlib inline" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 21, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "df = pd.read_csv(\"visit_data.csv\")" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 22, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/html": [ 40 | "
\n", 41 | "\n", 54 | "\n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | "
idfirst_namelast_nameemailgenderip_addressvisit
01SonnyDahlsdahl0@mysql.comMale135.36.96.1831225.0
12NaNNaNdhoovart1@hud.govNaN237.165.194.143919.0
23GarArmalgarmal2@technorati.comNaN166.43.137.224271.0
34ChiarraNultycnulty3@newyorker.comNaN139.98.137.1081002.0
45NaNNaNsleaver4@elegantthemes.comNaN46.117.117.272434.0
\n", 120 | "
" 121 | ], 122 | "text/plain": [ 123 | " id first_name last_name email gender \\\n", 124 | "0 1 Sonny Dahl sdahl0@mysql.com Male \n", 125 | "1 2 NaN NaN dhoovart1@hud.gov NaN \n", 126 | "2 3 Gar Armal garmal2@technorati.com NaN \n", 127 | "3 4 Chiarra Nulty cnulty3@newyorker.com NaN \n", 128 | "4 5 NaN NaN sleaver4@elegantthemes.com NaN \n", 129 | "\n", 130 | " ip_address visit \n", 131 | "0 135.36.96.183 1225.0 \n", 132 | "1 237.165.194.143 919.0 \n", 133 | "2 166.43.137.224 271.0 \n", 134 | "3 139.98.137.108 1002.0 \n", 135 | "4 46.117.117.27 2434.0 " 136 | ] 137 | }, 138 | "execution_count": 22, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "df.head()" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "As we can see that there are data where some values are missing and if we exmine we will see some outliers" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### Task - 1 (Are there duplicates?)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 23, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "First name is duplictaed - True\n", 171 | "Last name is duplictaed - True\n", 172 | "Email is duplictaed - False\n" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "print(\"First name is duplictaed - {}\".format(any(df.first_name.duplicated())))\n", 178 | "print(\"Last name is duplictaed - {}\".format(any(df.last_name.duplicated())))\n", 179 | "print(\"Email is duplictaed - {}\".format(any(df.email.duplicated())))" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "There are duplicates in both First and Last names. Which is normal. However, as we can see, there is no duplicate in email. That is good. 
" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "### Task - 2 (do any essential column contain NaN?)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 24, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "name": "stdout", 203 | "output_type": "stream", 204 | "text": [ 205 | "The column Email contains NaN - False \n", 206 | "The column IP Address contains NaN - False \n", 207 | "The column Visit contains NaN - True \n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "# Notice that we have different ways to format boolean values for the % operator\n", 213 | "print(\"The column Email contains NaN - %r \" % df.email.isnull().values.any())\n", 214 | "print(\"The column IP Address contains NaN - %s \" % df.ip_address.isnull().values.any())\n", 215 | "print(\"The column Visit contains NaN - %s \" % df.visit.isnull().values.any())" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "The column visit contains some None values. Given that the final task at hand will probably be predicting the number of visits, we can not do anything with rows which do not have that info. They are a type of `Outliers` for us. Let's get rid of them" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Task - 3 (Get rid of the outliers)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 25, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "# There are various ways to do this. This is just one way. We encourage you to explore other ways.\n", 239 | "# But before that we need to store the previous size of the data set and we will compare it with the new size\n", 240 | "size_prev = df.shape\n", 241 | "df = df[np.isfinite(df['visit'])] #This is an inplace operation. 
After this operation the original DataFrame is lost.\n", 242 | "size_after = df.shape" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "### Task - 4 (Report the size difference)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 37, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "name": "stdout", 259 | "output_type": "stream", 260 | "text": [ 261 | "The size of previous data was - 1000 rows and the size of the new one is - 974 rows\n" 262 | ] 263 | } 264 | ], 265 | "source": [ 266 | "# Notice how named format arguments are used and then indexed inside the replacement fields\n", 267 | "print(\"The size of previous data was - {prev[0]} rows and the size of the new one is - {after[0]} rows\".\n", 268 | "      format(prev=size_prev, after=size_after))" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "### Task - 5 (Box plot visit to further check for any outliers)" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 39, 281 | "metadata": {}, 282 | "outputs": [ 283 | { 284 | "data": { 285 | "text/plain": [ 286 | "{'whiskers': [...], 'caps': [...], 'boxes': [...], 'medians': [...], 'fliers': [...], 'means': []}" 294 | ] 295 | }, 296 | "execution_count": 39, 297 | "metadata": {}, 298 | "output_type": "execute_result" 299 | }, 300 | { 301 | "data": { 302 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAEbJJREFUeJzt3W2MXvV55/HvDxva1dKEp1kExmCL9YYQoVI0Ml4Vom4BP70hVVaRiVIcL9HwgiiNUhIRkgjSbhRWWcIqUnhwhTFpKGC1RLEILLiGJIrkJB66LMFmCbMOxnYJNgVTNglU4177Yo53B2ozz3Mb/78f6dZ9znX+55zrSPb85jzZqSokSe05ptcNSJJ6wwCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNWpurxt4J6ecckotWLCg121I0rvKE0888XJV9Y017ogOgAULFjA4ONjrNiTpXSXJzvGM8xKQJDXKAJCkRhkAktQoA0CSGjVmACT57SQ/TfI/k2xL8uWuvjDJT5IMJbk/yXFd/be6+aFu+YJR2/p8V382ybKZOihJ0tjGcwbwJvCHVfW7wPnA8iRLgP8C3FJV/xZ4FbiqG38V8GpXv6UbR5JzgVXAB4DlwK1J5kznwUiSxm/MAKgR/6ebPbb7FPCHwF939buBD3XTl3fzdMsvSZKufl9VvVlVvwCGgMXTchSSpAkb1z2AJHOSPAnsBTYB/xvYX1XD3ZDdwLxueh6wC6Bb/hpw8uj6IdaRJM2ycb0IVlUHgPOTnAB8BzhnphpKMgAMAJx55pkztRvpLUZOUmee/we3jiQTegqoqvYDjwP/HjghycEAOQPY003vAeYDdMvfC/zD6Poh1hm9j7VV1V9V/X19Y77JLE2LqprwZzLrSUeS8TwF1Nf95k+SfwVcBjzDSBD8x27YauC73fTGbp5u+WM18id/I7Cqe0poIbAI+Ol0HYgkaWLGcwnoNODu7omdY4ANVfVgku3AfUn+M/A/gDu78XcCf5lkCHiFkSd/qKptSTYA24Fh4Jru0pIkqQdyJJ+W9vf3l/8YnI5USbysoyNSkieqqn+scb4JLEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJatSYAZBkfpLHk2xPsi3Jn3T1G5PsSfJk91k5ap3PJxlK8mySZaPqy7vaUJLrZuaQJEnjMXccY4aBP62qv0vyO8ATSTZ1y26pqv86enCSc4FVwAeA04G/TfLvusXfBC4DdgNbk2ysqu3TcSCSpIkZMwCq6kXgxW769STPAPPeYZXLgfuq6k3gF0mGgMXdsqGq2gGQ5L5urAEgST0woXsASRYAvwf8pCt9MslTSdYlObGrzQN2jVptd1c7XF2S1APjDoAkxwN/A3y6qv4RuA04GzifkTOEm6ejoSQDSQaTDO7bt286NilJOoRxBUCSYxn54X9PVT0AUFUvVdWBqvpn4C/4/5d59gDzR61+Rlc7XP0tqmptVfVXVX9fX99Ej0eSNE7jeQoowJ3AM1X19VH100YN+yPg6W56I7AqyW8lWQgsAn4KbAUWJVmY5DhGbhRvnJ7DkCRN1HieAvp94I+BnyV5sqtdD1yR5HyggOeBqwGqaluSDYzc3B0GrqmqAwBJPgk8AswB1lXVtmk8FknSBKSqet3DYfX399fg4GCv25AOKQlH8t8ftSvJE1XVP9Y43wSWpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqPG86+BSu8qu3btYvfu3bOyry1btszo9s877zyOP/74Gd2H2mUA6Khz5ZVXsnfvXt7znvfM+L4+85nPzNi2d+7cybXXXju
j+1DbDAAddYaHh7n99tu5+OKLe93KlHzuc59jeHi4123oKOY9AElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJatSYAZBkfpLHk2xPsi3Jn3T1k5JsSvJc931iV0+SbyQZSvJUkgtGbWt1N/65JKtn7rAkSWMZzxnAMPCnVXUusAS4Jsm5wHXA5qpaBGzu5gFWAIu6zwBwG4wEBnADcCGwGLjhYGhIkmbfmAFQVS9W1d91068DzwDzgMuBu7thdwMf6qYvB75VI34MnJDkNGAZsKmqXqmqV4FNwPJpPRpJ0rhN6B5AkgXA7wE/AU6tqhe7Rb8ETu2m5wG7Rq22u6sdri5J6oFxB0CS44G/AT5dVf84ellVFVDT0VCSgSSDSQb37ds3HZuUJB3CuAIgybGM/PC/p6oe6MovdZd26L73dvU9wPxRq5/R1Q5Xf4uqWltV/VXV39fXN5FjkSRNwHieAgpwJ/BMVX191KKNwMEneVYD3x1Vv7J7GmgJ8Fp3qegRYGmSE7ubv0u7miSpB8bzH8L8PvDHwM+SPNnVrgduAjYkuQrYCXykW/YQsBIYAn4NrAGoqleS/DmwtRv3Z1X1yrQchSRpwsYMgKr6EZDDLL7kEOMLuOYw21oHrJtIg5KkmeGbwJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEaNGQBJ1iXZm+TpUbUbk+xJ8mT3WTlq2eeTDCV5NsmyUfXlXW0oyXXTfyjSiGOOOYYHH3yQl19+udetTNq2bdvYunUrxxzj72iaOeP507UeWH6I+i1VdX73eQggybnAKuAD3Tq3JpmTZA7wTWAFcC5wRTdWmna33HILu3fv5uyzz+bDH/4wDz74IMPDw71ua0yvvvoqt956K4sXL2bp0qUsWbKENWvW9LotHcXGDICq+iHwyji3dzlwX1W9WVW/AIaAxd1nqKp2VNU/Afd1Y6Vpd8EFF3DPPffwwgsvsGzZMr7yla8wf/58PvvZz7J9+/Zet/cWBw4c4JFHHmHVqlUsXLiQH/zgB3z5y19m586dfPWrX+Xkk0/udYs6ik3l/PKTSZ7qLhGd2NXmAbtGjdnd1Q5Xl2bMe9/7XgYGBtiyZQuPP/44c+bM4dJLL+XCCy/k9ttvZ//+/T3r7ec//znXX389Z511Fl/84he5+OKL2bFjB/fffz8rVqxg7ty5PetN7ZhsANwGnA2cD7wI3DxdDSUZSDKYZHDfvn3TtVk17pxzzuGmm27ihRde4MYbb+Sxxx5jwYIFXHHFFTz66KMcOHBgxnt4/fXXufPOO7nooou4+OKLefPNN3n44YfZunUr11xzDSeddNKM9yCNNqkAqKqXqupAVf0z8BeMXOIB2APMHzX0jK52uPqhtr22qvqrqr+vr28y7UmHNXfuXFasWMGGDRt49tlnee2111i2bBkDAwMzut/h4WHe9773MTAwwPLly9m9ezc333wz55133ozuV3onkzrPTHJaVb3Yzf4RcPAJoY3AXyX5OnA6sAj4KRBgUZKFjPzgXwV8dCqNS5O1fft27rrrLr797W+zcOFC7rjjDgb+/rNw41/P2D7nAn8/AF86cD1r167lgQceYM2aNXz0ox/1Or96JlX1zgOSe4E/AE4BXgJu6ObPBwp4Hrj6YCAk+QLwn4Bh4NNV9XBXXwn8N2AOsK6qvjJWc/39/TU4ODiJw5Leav/+/dx7772sX7+eXbt2ceWVV/Lxj3+cc845Z9Z7OXDgAI899hjr16/ne9/7Hpdeeilr1qxh2bJlXvvXtEjyRFX1jzlurADoJQNAU3HgwAE2b97MXXfdxcMPP8zSpUtZs2YNl1122RHzg3b//v3cf//9rF+/np07d/Kxj32MNWvW8P7
3v7/XreldbLwB4FsmOuq88MILfOELX2DBggVcf/31XHTRRezYsYMNGzYccU/YnHDCCVx99dVs2bKFzZs3k4RLLrmEJUuWcMcdd/Cb3/ym1y3qKOYZgI46l112GaeffjrXXnvtu/Im6/DwMI8++ihf+tKXWL16NZ/61Kd63ZLeZcZ7BnDk/CokTZM33niDT3ziE+/KH/4w8qTSypUr+f73v88bb7zR63Z0FPMSkCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjxgyAJOuS7E3y9KjaSUk2JXmu+z6xqyfJN5IMJXkqyQWj1lndjX8uyeqZORxJ0niN5wxgPbD8bbXrgM1VtQjY3M0DrAAWdZ8B4DYYCQzgBuBCYDFww8HQkCT1xpgBUFU/BF55W/ly4O5u+m7gQ6Pq36oRPwZOSHIasAzYVFWvVNWrwCb+ZahIkmbRZO8BnFpVL3bTvwRO7abnAbtGjdvd1Q5XlyT1yJRvAldVATUNvQCQZCDJYJLBffv2TddmJUlvM9kAeKm7tEP3vber7wHmjxp3Rlc7XP1fqKq1VdVfVf19fX2TbE+SNJbJBsBG4OCTPKuB746qX9k9DbQEeK27VPQIsDTJid3N36VdTZLUI3PHGpDkXuAPgFOS7GbkaZ6bgA1JrgJ2Ah/phj8ErASGgF8DawCq6pUkfw5s7cb9WVW9/cayJGkWjRkAVXXFYRZdcoixBVxzmO2sA9ZNqDtJ0ozxTWBJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSo8Z8EUx6t/nRj37EBz/4QU4++eRetzIlv/rVr/ja177W6zZ0FDMAdNR5/vnnOe644zj22GNndD99fX3M9L9Ye9JJJ83o9tU2A0BHnbPOOmvW9nXKKafM2r6k6eY9AElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEZNKQCSPJ/kZ0meTDLY1U5KsinJc933iV09Sb6RZCjJU0kumI4DkCRNznScAfyHqjq/qvq7+euAzVW1CNjczQOsABZ1nwHgtmnYtyRpkmbiEtDlwN3d9N3Ah0bVv1UjfgyckOS0Gdi/JGkcphoABTya5IkkA13t1Kp6sZv+JXBqNz0P2DVq3d1dTZLUA1P9H8Euqqo9Sf4NsCnJ/xq9sKoqSU1kg12QDACceeaZU2xPknQ4UzoDqKo93fde4DvAYuClg5d2uu+93fA9wPxRq5/R1d6+zbVV1V9V/X19fVNpT5L0DiYdAEn+dZLfOTgNLAWeBjYCq7thq4HvdtMbgSu7p4GWAK+NulQkSZplU7kEdCrwnSQHt/NXVfXfk2wFNiS5CtgJfKQb/xCwEhgCfg2smcK+JUlTNOkAqKodwO8eov4PwCWHqBdwzWT3J0maXr4JLEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJatSsB0CS5UmeTTKU5LrZ3r8kacSsBkCSOcA3gRXAucAVSc6dzR4kSSNm+wxgMTBUVTuq6p+A+4DLZ7kHSRKzHwDzgF2j5nd3NUnSLJvb6wbeLskAMABw5pln9rgbtSLJrKxXVZPajzQTZvsMYA8wf9T8GV3t/6mqtVXVX1X9fX19s9qc2lVVs/KRjiSzHQBbgUVJFiY5DlgFbJzlHiRJzPIloKoaTvJJ4BFgDrCuqrbNZg+SpBGzfg+gqh4CHprt/UqS3so3gSWpUQaAJDXKAJCkRhkAktQoA0CSGpUj+eWUJPuAnb3uQzqMU4CXe92EdAhnVdW
Yb9Ie0QEgHcmSDFZVf6/7kCbLS0CS1CgDQJIaZQBIk7e21w1IU+E9AElqlGcAktQoA0CaoCTrkuxN8nSve5GmwgCQJm49sLzXTUhTZQBIE1RVPwRe6XUf0lQZAJLUKANAkhplAEhSowwASWqUASBNUJJ7gS3A+5LsTnJVr3uSJsM3gSWpUZ4BSFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhr1fwEkj1ygKhpQqgAAAABJRU5ErkJggg==\n", 303 | "text/plain": [ 304 | "
" 305 | ] 306 | }, 307 | "metadata": { 308 | "needs_background": "light" 309 | }, 310 | "output_type": "display_data" 311 | } 312 | ], 313 | "source": [ 314 | "plt.boxplot(df.visit, notch=True)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "As we can see that we have data in this column in the interval (0, 3000). However, the main concentration of the data is between ~700 to ~2300. Let us say that anything beyond 2900 and bellow 100 are outliers for us. We need to get rid of them" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 41, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "df1 = df[(df['visit'] <= 2900) & (df['visit'] >= 100)] # Notice the powerful & operator" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 45, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "After getting rid of outliers the new size of the data is - 923\n" 343 | ] 344 | } 345 | ], 346 | "source": [ 347 | "# Here we abuse the fact the number of variable can be greater than the number of replacement targets\n", 348 | "print(\"After getting rid of outliers the new size of the data is - {}\".format(*df1.shape))" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "This is the end of the activity for this lesson :)" 356 | ] 357 | } 358 | ], 359 | "metadata": { 360 | "kernelspec": { 361 | "display_name": "Python 3", 362 | "language": "python", 363 | "name": "python3" 364 | }, 365 | "language_info": { 366 | "codemirror_mode": { 367 | "name": "ipython", 368 | "version": 3 369 | }, 370 | "file_extension": ".py", 371 | "mimetype": "text/x-python", 372 | "name": "python", 373 | "nbconvert_exporter": "python", 374 | "pygments_lexer": "ipython3", 375 | "version": "3.6.6" 376 | } 377 | }, 378 | "nbformat": 4, 379 | "nbformat_minor": 2 380 | } 381 
| -------------------------------------------------------------------------------- /Lesson 6/Lesson 6 Topic 1 Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Exercise 1 (Generator Expression)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "odd_numbers = (x for x in range(100000) if x % 2 != 0)" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/plain": [ 27 | "88" 28 | ] 29 | }, 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "output_type": "execute_result" 33 | } 34 | ], 35 | "source": [ 36 | "from sys import getsizeof\n", 37 | "getsizeof(odd_numbers)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 3, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "name": "stdout", 47 | "output_type": "stream", 48 | "text": [ 49 | "1\n", 50 | "3\n", 51 | "5\n", 52 | "7\n", 53 | "9\n", 54 | "11\n", 55 | "13\n", 56 | "15\n", 57 | "17\n", 58 | "19\n", 59 | "21\n", 60 | "23\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "for i, number in enumerate(odd_numbers):\n", 66 | " print(number)\n", 67 | " if i > 10:\n", 68 | " break" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 4, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "odd_numbers2 = [x for x in range(100000) if x % 2 != 0]" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 5, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "data": { 87 | "text/plain": [ 88 | "406496" 89 | ] 90 | }, 91 | "execution_count": 5, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "getsizeof(odd_numbers2)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 
103 | "source": [ 104 | "### Exercise 2 (String operations using generator expressions)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 11, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "words = [\"Hello\\n\", \"My name\", \"is\\n\", \"Bob\", \"How are you\", \"doing\\n\"]" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 12, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "modified_words = (word.strip().lower() for word in words)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 13, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/plain": [ 133 | "['hello', 'my name', 'is', 'bob', 'how are you', 'doing']" 134 | ] 135 | }, 136 | "execution_count": 13, 137 | "metadata": {}, 138 | "output_type": "execute_result" 139 | } 140 | ], 141 | "source": [ 142 | "final_list_of_word = [word for word in modified_words]\n", 143 | "final_list_of_word" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "### Exercise 3 (String operation and nested for loops in generator expression)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 20, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "modified_words2 = (w.strip().lower() for word in words for w in word.split(\" \"))" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 21, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "['hello', 'my', 'name', 'is', 'bob', 'how', 'are', 'you', 'doing']" 171 | ] 172 | }, 173 | "execution_count": 21, 174 | "metadata": {}, 175 | "output_type": "execute_result" 176 | } 177 | ], 178 | "source": [ 179 | "final_list_of_word = [word for word in modified_words2]\n", 180 | "final_list_of_word" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 22, 186 | "metadata": {}, 
187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "['hello', 'my', 'name', 'is', 'bob', 'how', 'are', 'you', 'doing']" 192 | ] 193 | }, 194 | "execution_count": 22, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "modified_words3 = []\n", 201 | "for word in words:\n", 202 | " for w in word.split(\" \"):\n", 203 | " modified_words3.append(w.strip().lower())\n", 204 | "modified_words3" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "### Exercise 4 (Multiple, non-nested for loop in generator expression)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 23, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "marbles = [\"RED\", \"BLUE\", \"GREEN\"]\n", 221 | "counts = [1, 5, 13]\n", 222 | "\n", 223 | "marble_with_count = ((m, c) for m in marbles for c in counts)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 24, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "[('RED', 1),\n", 235 | " ('RED', 5),\n", 236 | " ('RED', 13),\n", 237 | " ('BLUE', 1),\n", 238 | " ('BLUE', 5),\n", 239 | " ('BLUE', 13),\n", 240 | " ('GREEN', 1),\n", 241 | " ('GREEN', 5),\n", 242 | " ('GREEN', 13)]" 243 | ] 244 | }, 245 | "execution_count": 24, 246 | "metadata": {}, 247 | "output_type": "execute_result" 248 | } 249 | ], 250 | "source": [ 251 | "marble_with_count_as_list = list(marble_with_count)\n", 252 | "marble_with_count_as_list" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 26, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "data": { 262 | "text/plain": [ 263 | "[('RED', 1),\n", 264 | " ('RED', 5),\n", 265 | " ('RED', 13),\n", 266 | " ('BLUE', 1),\n", 267 | " ('BLUE', 5),\n", 268 | " ('BLUE', 13),\n", 269 | " ('GREEN', 1),\n", 270 | " ('GREEN', 5),\n", 271 | " ('GREEN', 13)]" 272 | ] 273 | }, 274 | 
"execution_count": 26, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "marble_with_count_as_list_2 = []\n", 281 | "for m in marbles:\n", 282 | " for c in counts:\n", 283 | " marble_with_count_as_list_2.append((m, c))\n", 284 | "marble_with_count_as_list_2" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "### Exercise 5 (zip)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 30, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/plain": [ 302 | "[('India', 'Delhi'),\n", 303 | " ('USA', 'Washington'),\n", 304 | " ('France', 'Paris'),\n", 305 | " ('UK', 'London')]" 306 | ] 307 | }, 308 | "execution_count": 30, 309 | "metadata": {}, 310 | "output_type": "execute_result" 311 | } 312 | ], 313 | "source": [ 314 | "countries = [\"India\", \"USA\", \"France\", \"UK\"]\n", 315 | "capitals = [\"Delhi\", \"Washington\", \"Paris\", \"London\"]\n", 316 | "countries_and_capitals = [t for t in zip(countries, capitals)]\n", 317 | "countries_and_capitals" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 32, 323 | "metadata": {}, 324 | "outputs": [ 325 | { 326 | "data": { 327 | "text/plain": [ 328 | "{'India': 'Delhi', 'USA': 'Washington', 'France': 'Paris', 'UK': 'London'}" 329 | ] 330 | }, 331 | "execution_count": 32, 332 | "metadata": {}, 333 | "output_type": "execute_result" 334 | } 335 | ], 336 | "source": [ 337 | "countries_and_capitals_as_dict = dict(zip(countries, capitals))\n", 338 | "countries_and_capitals_as_dict" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "### Exercise 6 (ziplongest)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 36, 351 | "metadata": {}, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/plain": [ 356 | "{'India': 'Delhi',\n", 357 | " 'USA': 'Washington',\n", 358 | " 
'France': 'Paris',\n", 359 | " 'UK': 'London',\n", 360 | " 'Brasil': None,\n", 361 | " 'Japan': None}" 362 | ] 363 | }, 364 | "execution_count": 36, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "countries = [\"India\", \"USA\", \"France\", \"UK\", \"Brasil\", \"Japan\"]\n", 371 | "capitals = [\"Delhi\", \"Washington\", \"Paris\", \"London\"]\n", 372 | "from itertools import zip_longest\n", 373 | "countries_and_capitals_as_dict_2 = dict(zip_longest(countries, capitals))\n", 374 | "countries_and_capitals_as_dict_2" 375 | ] 376 | } 377 | ], 378 | "metadata": { 379 | "kernelspec": { 380 | "display_name": "Python 3", 381 | "language": "python", 382 | "name": "python3" 383 | }, 384 | "language_info": { 385 | "codemirror_mode": { 386 | "name": "ipython", 387 | "version": 3 388 | }, 389 | "file_extension": ".py", 390 | "mimetype": "text/x-python", 391 | "name": "python", 392 | "nbconvert_exporter": "python", 393 | "pygments_lexer": "ipython3", 394 | "version": "3.6.6" 395 | } 396 | }, 397 | "nbformat": 4, 398 | "nbformat_minor": 2 399 | } 400 | -------------------------------------------------------------------------------- /Lesson 6/Lesson 6 Topic 2 Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Exercise 1 using the % operator to format" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 18, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from csv import DictReader\n", 17 | "raw_data = []\n", 18 | "with open(\"combinded_data.csv\", \"rt\") as fd:\n", 19 | " data_rows = DictReader(fd)\n", 20 | " for data in data_rows:\n", 21 | " raw_data.append(dict(data))" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 19, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/plain": [ 32 | "[{'Name': 
'Bob',\n", 33 | "  'Age': '23.0',\n", 34 | "  'Height': '1.7',\n", 35 | "  'Weight': '70',\n", 36 | "  'Disease_history': 'N',\n", 37 | "  'Heart_problem': 'N'},\n", 38 | " {'Name': 'Alex',\n", 39 | "  'Age': '45',\n", 40 | "  'Height': '1.61',\n", 41 | "  'Weight': '61',\n", 42 | "  'Disease_history': 'Y',\n", 43 | "  'Heart_problem': 'N'},\n", 44 | " {'Name': 'George',\n", 45 | "  'Age': '12.5',\n", 46 | "  'Height': '1.4',\n", 47 | "  'Weight': '40',\n", 48 | "  'Disease_history': 'N',\n", 49 | "  'Heart_problem': ''},\n", 50 | " {'Name': 'Alice',\n", 51 | "  'Age': '34',\n", 52 | "  'Height': '1.56',\n", 53 | "  'Weight': '51',\n", 54 | "  'Disease_history': 'N',\n", 55 | "  'Heart_problem': 'Y'}]" 56 | ] 57 | }, 58 | "execution_count": 19, 59 | "metadata": {}, 60 | "output_type": "execute_result" 61 | } 62 | ], 63 | "source": [ 64 | "raw_data" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 22, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "Bob is 23.0 years old and is 1.7 meter tall weighing about 70 kg.\n", 77 | "\n", 78 | " Has a history of family illness: N.\n", 79 | "\n", 80 | " Presently suffering from a heart disease: N\n", 81 | " \n", 82 | "Alex is 45 years old and is 1.61 meter tall weighing about 61 kg.\n", 83 | "\n", 84 | " Has a history of family illness: Y.\n", 85 | "\n", 86 | " Presently suffering from a heart disease: N\n", 87 | " \n", 88 | "George is 12.5 years old and is 1.4 meter tall weighing about 40 kg.\n", 89 | "\n", 90 | " Has a history of family illness: N.\n", 91 | "\n", 92 | " Presently suffering from a heart disease: \n", 93 | " \n", 94 | "Alice is 34 years old and is 1.56 meter tall weighing about 51 kg.\n", 95 | "\n", 96 | " Has a history of family illness: N.\n", 97 | "\n", 98 | " Presently suffering from a heart disease: Y\n", 99 | " \n" 100 | ] 101 | } 102 | ], 103 | "source": [ 104 | "for data in raw_data:\n", 105 | "    report_str = \"\"\"%s is %s years old 
and is %s meter tall weighing about %s kg.\n\n", 106 | " Has a history of family illness: %s.\n\n", 107 | " Presently suffering from a heart disease: %s\n", 108 | " \"\"\" % (data[\"Name\"], data[\"Age\"], data[\"Height\"], data[\"Weight\"], data[\"Disease_history\"], data[\"Heart_problem\"])\n", 109 | "    print(report_str)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "### Exercise 2 (using the 'format' statement)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 23, 122 | "metadata": {}, 123 | "outputs": [ 124 | { 125 | "name": "stdout", 126 | "output_type": "stream", 127 | "text": [ 128 | "Bob is 23.0 years old and is 1.7 meter tall weighing about 70 kg.\n", 129 | "\n", 130 | " Has a history of family illness: N.\n", 131 | "\n", 132 | " Presently suffering from a heart disease: N\n", 133 | " \n", 134 | "Alex is 45 years old and is 1.61 meter tall weighing about 61 kg.\n", 135 | "\n", 136 | " Has a history of family illness: Y.\n", 137 | "\n", 138 | " Presently suffering from a heart disease: N\n", 139 | " \n", 140 | "George is 12.5 years old and is 1.4 meter tall weighing about 40 kg.\n", 141 | "\n", 142 | " Has a history of family illness: N.\n", 143 | "\n", 144 | " Presently suffering from a heart disease: \n", 145 | " \n", 146 | "Alice is 34 years old and is 1.56 meter tall weighing about 51 kg.\n", 147 | "\n", 148 | " Has a history of family illness: N.\n", 149 | "\n", 150 | " Presently suffering from a heart disease: Y\n", 151 | " \n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "for data in raw_data:\n", 157 | "    report_str = \"\"\"{} is {} years old and is {} meter tall weighing about {} kg.\n\n", 158 | " Has a history of family illness: {}.\n\n", 159 | " Presently suffering from a heart disease: {}\n", 160 | " \"\"\".format(data[\"Name\"], data[\"Age\"], data[\"Height\"], data[\"Weight\"], data[\"Disease_history\"], data[\"Heart_problem\"])\n", 161 | "    print(report_str)" 
162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "### Exercise 3 (Naming the variables inside the string representation)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 25, 174 | "metadata": {}, 175 | "outputs": [ 176 | { 177 | "name": "stdout", 178 | "output_type": "stream", 179 | "text": [ 180 | "Bob is 23.0 years old and is 1.7 meter tall weighing about 70 kg.\n", 181 | "\n", 182 | " Has a history of family illness: N.\n", 183 | "\n", 184 | " Presently suffering from a heart disease: N\n", 185 | " \n", 186 | "Alex is 45 years old and is 1.61 meter tall weighing about 61 kg.\n", 187 | "\n", 188 | " Has a history of family illness: Y.\n", 189 | "\n", 190 | " Presently suffering from a heart disease: N\n", 191 | " \n", 192 | "George is 12.5 years old and is 1.4 meter tall weighing about 40 kg.\n", 193 | "\n", 194 | " Has a history of family illness: N.\n", 195 | "\n", 196 | " Presently suffering from a heart disease: \n", 197 | " \n", 198 | "Alice is 34 years old and is 1.56 meter tall weighing about 51 kg.\n", 199 | "\n", 200 | " Has a history of family illness: N.\n", 201 | "\n", 202 | " Presently suffering from a heart disease: Y\n", 203 | " \n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "for data in raw_data:\n", 209 | "    report_str = \"\"\"{Name} is {Age} years old and is {Height} meter tall weighing about {Weight} kg.\n\n", 210 | " Has a history of family illness: {Disease_history}.\n\n", 211 | " Presently suffering from a heart disease: {Heart_problem}\n", 212 | " \"\"\".format(**data)\n", 213 | "    print(report_str)" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### Exercise 4 (Various conversion tricks with format)" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 30, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | 
"The binary representation of 42 is - 101010\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "original_number = 42\n", 238 | "print(\"The binary representation of 42 is - {0:b}\".format(original_number))" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 31, 244 | "metadata": {}, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | " I am at the center \n" 251 | ] 252 | } 253 | ], 254 | "source": [ 255 | "print(\"{:^42}\".format(\"I am at the center\"))" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 33, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "name": "stdout", 265 | "output_type": "stream", 266 | "text": [ 267 | "============I am at the center============\n" 268 | ] 269 | } 270 | ], 271 | "source": [ 272 | "print(\"{:=^42}\".format(\"I am at the center\"))" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "### Exercise 5 (Date formatting)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 34, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "from datetime import datetime" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 35, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "name": "stdout", 298 | "output_type": "stream", 299 | "text": [ 300 | "The present datetime is 2018-10-14 09:57:15\n" 301 | ] 302 | } 303 | ], 304 | "source": [ 305 | "print(\"The present datetime is {:%Y-%m-%d %H:%M:%S}\".format(datetime.utcnow()))" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": {}, 312 | "outputs": [], 313 | "source": [] 314 | } 315 | ], 316 | "metadata": { 317 | "kernelspec": { 318 | "display_name": "Python 3", 319 | "language": "python", 320 | "name": "python3" 321 | }, 322 | "language_info": { 323 | "codemirror_mode": { 324 | "name": "ipython", 325 | "version": 3 326 
| }, 327 | "file_extension": ".py", 328 | "mimetype": "text/x-python", 329 | "name": "python", 330 | "nbconvert_exporter": "python", 331 | "pygments_lexer": "ipython3", 332 | "version": "3.6.6" 333 | } 334 | }, 335 | "nbformat": 4, 336 | "nbformat_minor": 2 337 | } 338 | -------------------------------------------------------------------------------- /Lesson 6/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 6 notebooks 2 | -------------------------------------------------------------------------------- /Lesson 6/combinded_data.csv: -------------------------------------------------------------------------------- 1 | Name,Age,Height,Weight,Disease_history,Heart_problem 2 | Bob,23.0,1.7,70,N,N 3 | Alex,45,1.61,61,Y,N 4 | George,12.5,1.4,40,N, 5 | Alice,34,1.56,51,N,Y -------------------------------------------------------------------------------- /Lesson 6/dummy_data.csv: -------------------------------------------------------------------------------- 1 | Bob, 23, 1.7, 70, N, N 2 | Alex, 45, 1.61, 61, Y, N 3 | George, 12, 1.4, 40, N, 4 | Alice, 34, 1.56, 51, N, Y -------------------------------------------------------------------------------- /Lesson 6/dummy_header.csv: -------------------------------------------------------------------------------- 1 | Name 2 | Age 3 | Height 4 | Weight 5 | Family Sickness History 6 | Suffering from Heart Problem -------------------------------------------------------------------------------- /Lesson 7 Topic 3 Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 7: Advanced Web Scraping and Data Gathering\n", 8 | "## Topic 3: Reading data from an API\n", 9 | "This Notebook shows how to use a free API (no authorization or API key needed) to download some basic information about various countries around the world and put them in a 
DataFrame." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### Import libraries" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "import urllib.request, urllib.parse\n", 26 | "from urllib.error import HTTPError,URLError\n", 27 | "import pandas as pd" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "### Exercise 20: Define the base URL" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "serviceurl = 'https://restcountries.eu/rest/v2/name/'  # note: restcountries.eu has since moved to restcountries.com" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Exercise 21: Define a function to pull the country data from the API" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 18, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "def get_country_data(country):\n", 60 | " country_name=country\n", 61 | " url = serviceurl + country_name\n", 62 | " \n", 63 | " try: \n", 64 | " uh = urllib.request.urlopen(url)\n", 65 | " except HTTPError as e:\n", 66 | " print(\"Sorry! Could not retrieve anything on {}\".format(country_name))\n", 67 | " return None\n", 68 | " except URLError as e:\n", 69 | " print('Failed to reach a server.')\n", 70 | " print('Reason: ', e.reason)\n", 71 | " return None\n", 72 | " else:\n", 73 | " data = uh.read().decode()\n", 74 | " print(\"Retrieved data on {}.
Total {} characters read.\".format(country_name,len(data)))\n", 75 | " return data" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "### Exercise 22: Test the function by passing a correct and an incorrect argument" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 19, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "country_name = 'Switzerland'" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 20, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "name": "stdout", 101 | "output_type": "stream", 102 | "text": [ 103 | "Retrieved data on Switzerland. Total 1090 characters read.\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "data=get_country_data(country_name)" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 21, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "country_name1 = 'Switzerland1'" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 22, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "Sorry! 
Could not retrieve anything on Switzerland1\n" 130 | ] 131 | } 132 | ], 133 | "source": [ 134 | "data1=get_country_data(country_name1)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "### Exercise 23: Use the built-in `json` library to read and examine the data properly" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 8, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "import json" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 9, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "# Load from string 'data'\n", 160 | "x=json.loads(data)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 10, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "# Load the only element\n", 170 | "y=x[0]" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 11, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "data": { 180 | "text/plain": [ 181 | "dict" 182 | ] 183 | }, 184 | "execution_count": 11, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "type(y)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 12, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "data": { 200 | "text/plain": [ 201 | "dict_keys(['name', 'topLevelDomain', 'alpha2Code', 'alpha3Code', 'callingCodes', 'capital', 'altSpellings', 'region', 'subregion', 'population', 'latlng', 'demonym', 'area', 'gini', 'timezones', 'borders', 'nativeName', 'numericCode', 'currencies', 'languages', 'translations', 'flag', 'regionalBlocs', 'cioc'])" 202 | ] 203 | }, 204 | "execution_count": 12, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "y.keys()" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "### Exercise 24: Can you
print all the data elements one by one?" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 13, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "name: Switzerland\n", 230 | "topLevelDomain: ['.ch']\n", 231 | "alpha2Code: CH\n", 232 | "alpha3Code: CHE\n", 233 | "callingCodes: ['41']\n", 234 | "capital: Bern\n", 235 | "altSpellings: ['CH', 'Swiss Confederation', 'Schweiz', 'Suisse', 'Svizzera', 'Svizra']\n", 236 | "region: Europe\n", 237 | "subregion: Western Europe\n", 238 | "population: 8341600\n", 239 | "latlng: [47.0, 8.0]\n", 240 | "demonym: Swiss\n", 241 | "area: 41284.0\n", 242 | "gini: 33.7\n", 243 | "timezones: ['UTC+01:00']\n", 244 | "borders: ['AUT', 'FRA', 'ITA', 'LIE', 'DEU']\n", 245 | "nativeName: Schweiz\n", 246 | "numericCode: 756\n", 247 | "currencies: [{'code': 'CHF', 'name': 'Swiss franc', 'symbol': 'Fr'}]\n", 248 | "languages: [{'iso639_1': 'de', 'iso639_2': 'deu', 'name': 'German', 'nativeName': 'Deutsch'}, {'iso639_1': 'fr', 'iso639_2': 'fra', 'name': 'French', 'nativeName': 'français'}, {'iso639_1': 'it', 'iso639_2': 'ita', 'name': 'Italian', 'nativeName': 'Italiano'}]\n", 249 | "translations: {'de': 'Schweiz', 'es': 'Suiza', 'fr': 'Suisse', 'ja': 'スイス', 'it': 'Svizzera', 'br': 'Suíça', 'pt': 'Suíça', 'nl': 'Zwitserland', 'hr': 'Švicarska', 'fa': 'سوئیس'}\n", 250 | "flag: https://restcountries.eu/data/che.svg\n", 251 | "regionalBlocs: [{'acronym': 'EFTA', 'name': 'European Free Trade Association', 'otherAcronyms': [], 'otherNames': []}]\n", 252 | "cioc: SUI\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "for k,v in y.items():\n", 258 | " print(\"{}: {}\".format(k,v))" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "### Exercise 25: The dictionary values are not of the same type - print all the languages spoken" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | 
"execution_count": 14, 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "name": "stdout", 275 | "output_type": "stream", 276 | "text": [ 277 | "German\n", 278 | "French\n", 279 | "Italian\n" 280 | ] 281 | } 282 | ], 283 | "source": [ 284 | "for i in y['languages']:\n", 285 | " print(i['name'])" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "### Exercise 26: Write a function that takes a list of countries and returns a DataFrame containing key info\n", 293 | "* Capital\n", 294 | "* Region\n", 295 | "* Sub-region\n", 296 | "* Population\n", 297 | "* Latitude/Longitude\n", 298 | "* Area\n", 299 | "* Gini index\n", 300 | "* Timezones\n", 301 | "* Currencies\n", 302 | "* Languages" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 23, 308 | "metadata": {}, 309 | "outputs": [], 310 | "source": [ 311 | "def build_country_database(list_country):\n", 312 | " \"\"\"Fetch key info for each country in list_country and return it as a DataFrame.\n", 313 | " \"\"\"\n", 314 | " import pandas as pd\n", 315 | " import json\n", 316 | " # Define an empty dictionary with keys\n", 317 | " country_dict={'Country':[],'Capital':[],'Region':[],'Sub-region':[],'Population':[],\n", 318 | " 'Lattitude':[],'Longitude':[],'Area':[],'Gini':[],'Timezones':[],\n", 319 | " 'Currencies':[],'Languages':[]}\n", 320 | " \n", 321 | " for c in list_country:\n", 322 | " data = get_country_data(c)\n", 323 | " if data is not None:\n", 324 | " x = json.loads(data)\n", 325 | " y=x[0]\n", 326 | " country_dict['Country'].append(y['name'])\n", 327 | " country_dict['Capital'].append(y['capital'])\n", 328 | " country_dict['Region'].append(y['region'])\n", 329 | " country_dict['Sub-region'].append(y['subregion'])\n", 330 | " country_dict['Population'].append(y['population'])\n", 331 | " country_dict['Lattitude'].append(y['latlng'][0])\n", 332 | " country_dict['Longitude'].append(y['latlng'][1])\n", 333 | " country_dict['Area'].append(y['area'])\n", 334 | " country_dict['Gini'].append(y['gini'])\n", 335 | " "
# Note the code to handle the possibility of multiple timezones as a list\n", 336 | " if len(y['timezones'])>1:\n", 337 | " country_dict['Timezones'].append(','.join(y['timezones']))\n", 338 | " else:\n", 339 | " country_dict['Timezones'].append(y['timezones'][0])\n", 340 | " # Note the code to handle the possibility of multiple currencies as dictionaries\n", 341 | " if len(y['currencies'])>1:\n", 342 | " lst_currencies = []\n", 343 | " for i in y['currencies']:\n", 344 | " lst_currencies.append(i['name'])\n", 345 | " country_dict['Currencies'].append(','.join(lst_currencies))\n", 346 | " else:\n", 347 | " country_dict['Currencies'].append(y['currencies'][0]['name'])\n", 348 | " # Note the code to handle the possibility of multiple languages as dictionaries\n", 349 | " if len(y['languages'])>1:\n", 350 | " lst_languages = []\n", 351 | " for i in y['languages']:\n", 352 | " lst_languages.append(i['name'])\n", 353 | " country_dict['Languages'].append(','.join(lst_languages))\n", 354 | " else:\n", 355 | " country_dict['Languages'].append(y['languages'][0]['name'])\n", 356 | " \n", 357 | " # Return as a pandas DataFrame\n", 358 | " return pd.DataFrame(country_dict)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "### Exercise 27: Test the function by building a small database of countries' info. Include an incorrect name too." 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 24, 371 | "metadata": {}, 372 | "outputs": [ 373 | { 374 | "name": "stdout", 375 | "output_type": "stream", 376 | "text": [ 377 | "Retrieved data on Nigeria. Total 1004 characters read.\n", 378 | "Retrieved data on Switzerland. Total 1090 characters read.\n", 379 | "Retrieved data on France. Total 1047 characters read.\n", 380 | "Sorry! Could not retrieve anything on Turmeric\n", 381 | "Retrieved data on Russia. Total 1120 characters read.\n", 382 | "Retrieved data on Kenya.
Total 1052 characters read.\n", 383 | "Retrieved data on Singapore. Total 1223 characters read.\n" 384 | ] 385 | } 386 | ], 387 | "source": [ 388 | "df1=build_country_database(['Nigeria','Switzerland','France','Turmeric','Russia','Kenya','Singapore'])" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 17, 394 | "metadata": {}, 395 | "outputs": [ 396 | { 397 | "data": { 398 | "text/html": [ 399 | "
"[pandas DataFrame HTML output: the table markup was lost in text extraction; the same table appears in the text/plain output below]\n", 524 | "
" 525 | ], 526 | "text/plain": [ 527 | " Country Capital Region Sub-region Population \\\n", 528 | "0 Nigeria Abuja Africa Western Africa 186988000 \n", 529 | "1 Switzerland Bern Europe Western Europe 8341600 \n", 530 | "2 France Paris Europe Western Europe 66710000 \n", 531 | "3 Russian Federation Moscow Europe Eastern Europe 146599183 \n", 532 | "4 Kenya Nairobi Africa Eastern Africa 47251000 \n", 533 | "5 Singapore Singapore Asia South-Eastern Asia 5535000 \n", 534 | "\n", 535 | " Lattitude Longitude Area Gini \\\n", 536 | "0 10.000000 8.0 923768.0 48.8 \n", 537 | "1 47.000000 8.0 41284.0 33.7 \n", 538 | "2 46.000000 2.0 640679.0 32.7 \n", 539 | "3 60.000000 100.0 17124442.0 40.1 \n", 540 | "4 1.000000 38.0 580367.0 47.7 \n", 541 | "5 1.366667 103.8 710.0 48.1 \n", 542 | "\n", 543 | " Timezones \\\n", 544 | "0 UTC+01:00 \n", 545 | "1 UTC+01:00 \n", 546 | "2 UTC-10:00,UTC-09:30,UTC-09:00,UTC-08:00,UTC-04... \n", 547 | "3 UTC+03:00,UTC+04:00,UTC+06:00,UTC+07:00,UTC+08... \n", 548 | "4 UTC+03:00 \n", 549 | "5 UTC+08:00 \n", 550 | "\n", 551 | " Currencies Languages \n", 552 | "0 Nigerian naira English \n", 553 | "1 Swiss franc German,French,Italian \n", 554 | "2 Euro French \n", 555 | "3 Russian ruble Russian \n", 556 | "4 Kenyan shilling English,Swahili \n", 557 | "5 Brunei dollar,Singapore dollar English,Malay,Tamil,Chinese " 558 | ] 559 | }, 560 | "execution_count": 17, 561 | "metadata": {}, 562 | "output_type": "execute_result" 563 | } 564 | ], 565 | "source": [ 566 | "df1" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": null, 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [] 575 | } 576 | ], 577 | "metadata": { 578 | "kernelspec": { 579 | "display_name": "Python 3", 580 | "language": "python", 581 | "name": "python3" 582 | }, 583 | "language_info": { 584 | "codemirror_mode": { 585 | "name": "ipython", 586 | "version": 3 587 | }, 588 | "file_extension": ".py", 589 | "mimetype": "text/x-python", 590 | "name": "python", 591 
| "nbconvert_exporter": "python", 592 | "pygments_lexer": "ipython3", 593 | "version": "3.6.2" 594 | }, 595 | "latex_envs": { 596 | "LaTeX_envs_menu_present": true, 597 | "autoclose": false, 598 | "autocomplete": true, 599 | "bibliofile": "biblio.bib", 600 | "cite_by": "apalike", 601 | "current_citInitial": 1, 602 | "eqLabelWithNumbers": true, 603 | "eqNumInitial": 1, 604 | "hotkeys": { 605 | "equation": "Ctrl-E", 606 | "itemize": "Ctrl-I" 607 | }, 608 | "labels_anchors": false, 609 | "latex_user_defs": false, 610 | "report_style_numbering": false, 611 | "user_envs_cfg": false 612 | } 613 | }, 614 | "nbformat": 4, 615 | "nbformat_minor": 2 616 | } 617 | -------------------------------------------------------------------------------- /Lesson 8/Exercise 161 - 173.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Code tested and verified by Tirthajyoti Sarkar, February 12, 2019."
8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import sqlite3\n", 17 | "conn = sqlite3.connect(\"lesson.db\")\n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "conn.close()" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 3, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 36 | " pass\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 4, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 46 | " cursor = conn.cursor()\n", 47 | " cursor.execute(\"CREATE TABLE IF NOT EXISTS user (email text, first_name text, last_name text, address text, age integer, PRIMARY KEY (email))\")\n", 48 | " cursor.execute(\"INSERT INTO user VALUES ('bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31)\")\n", 49 | " cursor.execute(\"INSERT INTO user VALUES ('tom@web.com', 'Tom', 'Fake', '123 Fantasy lane, Fantasu City', 39)\")\n", 50 | " conn.commit()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 5, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "('bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31)\n", 63 | "('tom@web.com', 'Tom', 'Fake', '123 Fantasy lane, Fantasu City', 39)\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 69 | " cursor = conn.cursor()\n", 70 | " rows = cursor.execute('SELECT * FROM user')\n", 71 | " for row in rows:\n", 72 | " print(row)\n" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 6, 78 | "metadata": {}, 79 | "outputs": [ 80 | { 81 | "name": "stdout", 82 | "output_type": "stream", 83 | "text": [ 84 | 
"('tom@web.com', 'Tom', 'Fake', '123 Fantasy lane, Fantasu City', 39)\n", 85 | "('bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31)\n" 86 | ] 87 | } 88 | ], 89 | "source": [ 90 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 91 | " cursor = conn.cursor()\n", 92 | " rows = cursor.execute('SELECT * FROM user ORDER BY age DESC')\n", 93 | " for row in rows:\n", 94 | " print(row)\n" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 7, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 104 | " cursor = conn.cursor()\n", 105 | " cursor.execute(\"ALTER TABLE user ADD COLUMN gender text\")\n", 106 | " cursor.execute(\"UPDATE user SET gender='M'\")\n", 107 | " conn.commit()" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 8, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 117 | " cursor = conn.cursor()\n", 118 | " cursor.execute(\"INSERT INTO user VALUES ('shelly@www.com', 'Shelly', 'Milar', '123, Ocean View Lane', 39, 'F')\")\n", 119 | " conn.commit()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 9, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "name": "stdout", 129 | "output_type": "stream", 130 | "text": [ 131 | "(1, 'F')\n", 132 | "(2, 'M')\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 138 | " cursor = conn.cursor()\n", 139 | " rows = cursor.execute(\"SELECT COUNT(*), gender FROM user GROUP BY gender\")\n", 140 | " for row in rows:\n", 141 | " print(row)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 10, 147 | "metadata": { 148 | "scrolled": true 149 | }, 150 | "outputs": [], 151 | "source": [ 152 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 153 | " cursor = conn.cursor()\n", 154 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 
155 | " sql = \"\"\"\n", 156 | " CREATE TABLE comments (\n", 157 | " user_id text,\n", 158 | " comments text,\n", 159 | " FOREIGN KEY (user_id) REFERENCES user (email) \n", 160 | " ON DELETE CASCADE ON UPDATE NO ACTION\n", 161 | " )\n", 162 | " \"\"\"\n", 163 | " cursor.execute(sql)\n", 164 | " conn.commit()\n" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 11, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "Going to create rows for bob@example.com\n", 177 | "Going to create rows for tom@web.com\n", 178 | "Going to create rows for shelly@www.com\n" 179 | ] 180 | } 181 | ], 182 | "source": [ 183 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 184 | " cursor = conn.cursor()\n", 185 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 186 | " sql = \"INSERT INTO comments VALUES ('{}', '{}')\"  # str.format used for brevity; parameterized queries with ? placeholders are safer\n", 187 | " rows = cursor.execute('SELECT * FROM user ORDER BY age')\n", 188 | " for row in rows:\n", 189 | " email = row[0]\n", 190 | " print(\"Going to create rows for {}\".format(email))\n", 191 | " name = row[1] + \" \" + row[2]\n", 192 | " for i in range(10):\n", 193 | " comment = \"This is comment {} by {}\".format(i, name)\n", 194 | " conn.cursor().execute(sql.format(email, comment))\n", 195 | " conn.commit()\n" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 12, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "('bob@example.com', 'This is
comment 3 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 211 | "('bob@example.com', 'This is comment 4 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 212 | "('bob@example.com', 'This is comment 5 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 213 | "('bob@example.com', 'This is comment 6 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 214 | "('bob@example.com', 'This is comment 7 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 215 | "('bob@example.com', 'This is comment 8 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n", 216 | "('bob@example.com', 'This is comment 9 by Bob Codd', 'bob@example.com', 'Bob', 'Codd', '123 Fantasy lane, Fantasu City', 31, 'M')\n" 217 | ] 218 | } 219 | ], 220 | "source": [ 221 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 222 | " cursor = conn.cursor()\n", 223 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 224 | " sql = \"\"\"\n", 225 | " SELECT * FROM comments \n", 226 | " JOIN user ON comments.user_id = user.email\n", 227 | " WHERE user.email='bob@example.com'\n", 228 | " \"\"\"\n", 229 | " rows = cursor.execute(sql)\n", 230 | " for row in rows:\n", 231 | " print(row)\n" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 13, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "('bob@example.com', 'This is comment 0 by Bob Codd')\n", 244 | "('bob@example.com', 'This is comment 1 by Bob Codd')\n", 245 | "('bob@example.com', 'This is comment 2 by Bob Codd')\n", 246 | "('bob@example.com', 'This is comment 3 by Bob Codd')\n", 247 | "('bob@example.com', 'This is comment 4 by Bob Codd')\n", 248 | "('bob@example.com', 'This is comment 5 by Bob Codd')\n", 
249 | "('bob@example.com', 'This is comment 6 by Bob Codd')\n", 250 | "('bob@example.com', 'This is comment 7 by Bob Codd')\n", 251 | "('bob@example.com', 'This is comment 8 by Bob Codd')\n", 252 | "('bob@example.com', 'This is comment 9 by Bob Codd')\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 258 | " cursor = conn.cursor()\n", 259 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 260 | " sql = \"\"\"\n", 261 | " SELECT comments.* FROM comments\n", 262 | " JOIN user ON comments.user_id = user.email\n", 263 | " WHERE user.email='bob@example.com'\n", 264 | " \"\"\"\n", 265 | " rows = cursor.execute(sql)\n", 266 | " for row in rows:\n", 267 | " print(row)\n" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 14, 273 | "metadata": {}, 274 | "outputs": [], 275 | "source": [ 276 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 277 | " cursor = conn.cursor()\n", 278 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 279 | " cursor.execute(\"DELETE FROM user WHERE email='bob@example.com'\")\n", 280 | " conn.commit()\n" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 15, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "('tom@web.com', 'Tom', 'Fake', '123 Fantasy lane, Fantasu City', 39, 'M')\n", 293 | "('shelly@www.com', 'Shelly', 'Milar', '123, Ocean View Lane', 39, 'F')\n" 294 | ] 295 | } 296 | ], 297 | "source": [ 298 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 299 | " cursor = conn.cursor()\n", 300 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 301 | " rows = cursor.execute(\"SELECT * FROM user\")\n", 302 | " for row in rows:\n", 303 | " print(row)\n" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 16, 309 | "metadata": {}, 310 | "outputs": [ 311 | { 312 | "name": "stdout", 313 | "output_type": "stream", 314 | "text": [ 315 | 
"('tom@web.com', 'This is comment 0 by Tom Fake')\n", 316 | "('tom@web.com', 'This is comment 1 by Tom Fake')\n", 317 | "('tom@web.com', 'This is comment 2 by Tom Fake')\n", 318 | "('tom@web.com', 'This is comment 3 by Tom Fake')\n", 319 | "('tom@web.com', 'This is comment 4 by Tom Fake')\n", 320 | "('tom@web.com', 'This is comment 5 by Tom Fake')\n", 321 | "('tom@web.com', 'This is comment 6 by Tom Fake')\n", 322 | "('tom@web.com', 'This is comment 7 by Tom Fake')\n", 323 | "('tom@web.com', 'This is comment 8 by Tom Fake')\n", 324 | "('tom@web.com', 'This is comment 9 by Tom Fake')\n", 325 | "('shelly@www.com', 'This is comment 0 by Shelly Milar')\n", 326 | "('shelly@www.com', 'This is comment 1 by Shelly Milar')\n", 327 | "('shelly@www.com', 'This is comment 2 by Shelly Milar')\n", 328 | "('shelly@www.com', 'This is comment 3 by Shelly Milar')\n", 329 | "('shelly@www.com', 'This is comment 4 by Shelly Milar')\n", 330 | "('shelly@www.com', 'This is comment 5 by Shelly Milar')\n", 331 | "('shelly@www.com', 'This is comment 6 by Shelly Milar')\n", 332 | "('shelly@www.com', 'This is comment 7 by Shelly Milar')\n", 333 | "('shelly@www.com', 'This is comment 8 by Shelly Milar')\n", 334 | "('shelly@www.com', 'This is comment 9 by Shelly Milar')\n" 335 | ] 336 | } 337 | ], 338 | "source": [ 339 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 340 | " cursor = conn.cursor()\n", 341 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 342 | " rows = cursor.execute(\"SELECT * FROM comments\")\n", 343 | " for row in rows:\n", 344 | " print(row)\n" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": 17, 350 | "metadata": { 351 | "scrolled": true 352 | }, 353 | "outputs": [ 354 | { 355 | "name": "stdout", 356 | "output_type": "stream", 357 | "text": [ 358 | "('tom@web.com', 'Chris', 'Fake', '123 Fantasy lane, Fantasu City', 39, 'M')\n", 359 | "('shelly@www.com', 'Shelly', 'Milar', '123, Ocean View Lane', 39, 'F')\n" 360 | ] 361 | } 362 | ], 363 | 
"source": [ 364 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 365 | " cursor = conn.cursor()\n", 366 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 367 | " cursor.execute(\"UPDATE user set first_name='Chris' where email='tom@web.com'\")\n", 368 | " conn.commit()\n", 369 | " rows = cursor.execute(\"SELECT * FROM user\")\n", 370 | " for row in rows:\n", 371 | " print(row)\n" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 18, 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/html": [ 382 | "
"[pandas DataFrame HTML output: the table markup was lost in text extraction; the same table appears in the text/plain output below]\n", 456 | "
" 457 | ], 458 | "text/plain": [ 459 | " Email First Name Last Name Age Gender Comments\n", 460 | "0 tom@web.com Chris Fake 39 M This is comment 0 by Tom Fake\n", 461 | "1 tom@web.com Chris Fake 39 M This is comment 1 by Tom Fake\n", 462 | "2 tom@web.com Chris Fake 39 M This is comment 2 by Tom Fake\n", 463 | "3 tom@web.com Chris Fake 39 M This is comment 3 by Tom Fake\n", 464 | "4 tom@web.com Chris Fake 39 M This is comment 4 by Tom Fake" 465 | ] 466 | }, 467 | "execution_count": 18, 468 | "metadata": {}, 469 | "output_type": "execute_result" 470 | } 471 | ], 472 | "source": [ 473 | "import pandas as pd\n", 474 | "\n", 475 | "columns = [\"Email\", \"First Name\", \"Last Name\", \"Age\", \"Gender\", \"Comments\"]\n", 476 | "data = []\n", 477 | "with sqlite3.connect(\"lesson.db\") as conn:\n", 478 | " cursor = conn.cursor()\n", 479 | " cursor.execute(\"PRAGMA foreign_keys = 1\")\n", 480 | " \n", 481 | " sql = \"\"\"\n", 482 | " SELECT user.email, user.first_name, user.last_name, user.age, user.gender, comments.comments FROM comments\n", 483 | " JOIN user ON comments.user_id = user.email\n", 484 | " WHERE user.email = 'tom@web.com'\n", 485 | " \"\"\"\n", 486 | " rows = cursor.execute(sql)\n", 487 | " for row in rows:\n", 488 | " data.append(row)\n", 489 | "\n", 490 | "df = pd.DataFrame(data, columns=columns)\n", 491 | "df.head()\n" 492 | ] 493 | } 494 | ], 495 | "metadata": { 496 | "kernelspec": { 497 | "display_name": "Python 3", 498 | "language": "python", 499 | "name": "python3" 500 | }, 501 | "language_info": { 502 | "codemirror_mode": { 503 | "name": "ipython", 504 | "version": 3 505 | }, 506 | "file_extension": ".py", 507 | "mimetype": "text/x-python", 508 | "name": "python", 509 | "nbconvert_exporter": "python", 510 | "pygments_lexer": "ipython3", 511 | "version": "3.6.2" 512 | }, 513 | "latex_envs": { 514 | "LaTeX_envs_menu_present": true, 515 | "autoclose": false, 516 | "autocomplete": true, 517 | "bibliofile": "biblio.bib", 518 | "cite_by": "apalike", 519 | 
"current_citInitial": 1, 520 | "eqLabelWithNumbers": true, 521 | "eqNumInitial": 1, 522 | "hotkeys": { 523 | "equation": "Ctrl-E", 524 | "itemize": "Ctrl-I" 525 | }, 526 | "labels_anchors": false, 527 | "latex_user_defs": false, 528 | "report_style_numbering": false, 529 | "user_envs_cfg": false 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 2 534 | } 535 | -------------------------------------------------------------------------------- /Lesson 8/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 8 notebooks 2 | -------------------------------------------------------------------------------- /Lesson 8/Student Activity 01 Solutions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Task 1: (Connect to the supplied petsDB, and write a function to check if the connection is done)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import sqlite3" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "conn = sqlite3.connect(\"petsdb\")" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 3, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "# a tiny function to make sure the connection is successful\n", 35 | "def is_opened(conn):\n", 36 | " try:\n", 37 | " conn.execute(\"SELECT * FROM persons LIMIT 1\")\n", 38 | " return True\n", 39 | " except sqlite3.ProgrammingError as e:\n", 40 | " print(\"Connection closed {}\".format(e))\n", 41 | " return False" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 4, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "name": "stdout", 51 | "output_type": "stream", 52 | "text": [ 53 | "True\n" 54 | ] 55 | } 56 
| ], 57 | "source": [ 58 | "print(is_opened(conn))" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 5, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "conn.close()" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 6, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "Connection closed Cannot operate on a closed database.\n", 80 | "False\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "print(is_opened(conn))" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "### Task 2: (What are the different age groups in the persons database?)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 7, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "conn = sqlite3.connect(\"petsdb\")" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 8, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "c = conn.cursor()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 9, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "name": "stdout", 120 | "output_type": "stream", 121 | "text": [ 122 | "We have 2 people aged 5\n", 123 | "We have 1 people aged 6\n", 124 | "We have 1 people aged 7\n", 125 | "We have 3 people aged 8\n", 126 | "We have 1 people aged 9\n", 127 | "We have 2 people aged 11\n", 128 | "We have 3 people aged 12\n", 129 | "We have 1 people aged 13\n", 130 | "We have 4 people aged 14\n", 131 | "We have 2 people aged 16\n", 132 | "We have 2 people aged 17\n", 133 | "We have 3 people aged 18\n", 134 | "We have 1 people aged 19\n", 135 | "We have 3 people aged 22\n", 136 | "We have 2 people aged 23\n", 137 | "We have 3 people aged 24\n", 138 | "We have 2 people aged 25\n", 139 | "We have 1 people aged 27\n", 140 | "We have 1 people aged 30\n", 141 | "We have 3 people aged 31\n", 142 | "We have 1 people aged 
32\n", 143 | "We have 1 people aged 33\n", 144 | "We have 2 people aged 34\n", 145 | "We have 3 people aged 35\n", 146 | "We have 3 people aged 36\n", 147 | "We have 1 people aged 37\n", 148 | "We have 2 people aged 39\n", 149 | "We have 1 people aged 40\n", 150 | "We have 1 people aged 42\n", 151 | "We have 2 people aged 44\n", 152 | "We have 2 people aged 48\n", 153 | "We have 1 people aged 49\n", 154 | "We have 1 people aged 50\n", 155 | "We have 2 people aged 51\n", 156 | "We have 2 people aged 52\n", 157 | "We have 2 people aged 53\n", 158 | "We have 2 people aged 54\n", 159 | "We have 1 people aged 58\n", 160 | "We have 1 people aged 59\n", 161 | "We have 1 people aged 60\n", 162 | "We have 1 people aged 61\n", 163 | "We have 2 people aged 62\n", 164 | "We have 1 people aged 63\n", 165 | "We have 2 people aged 65\n", 166 | "We have 2 people aged 66\n", 167 | "We have 1 people aged 67\n", 168 | "We have 3 people aged 68\n", 169 | "We have 1 people aged 69\n", 170 | "We have 1 people aged 70\n", 171 | "We have 4 people aged 71\n", 172 | "We have 1 people aged 72\n", 173 | "We have 5 people aged 73\n", 174 | "We have 3 people aged 74\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "for ppl, age in c.execute(\"SELECT count(*), age FROM persons GROUP BY age\"):\n", 180 | " print(\"We have {} people aged {}\".format(ppl, age))" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "### Task 3: Which age group has maximum number of people?" 
188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 10, 193 | "metadata": {}, 194 | "outputs": [ 195 | { 196 | "name": "stdout", 197 | "output_type": "stream", 198 | "text": [ 199 | "The highest number of people (5) is in the 73 age group\n" 200 | ] 201 | } 202 | ], 203 | "source": [ 204 | "for ppl, age in c.execute(\"SELECT count(*), age FROM persons GROUP BY age ORDER BY count(*) DESC\"):\n", 205 | " print(\"The highest number of people ({}) is in the {} age group\".format(ppl, age))\n", 206 | " break" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "### Task 4: How many people do not have a full name (last name is blank/null)?" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 11, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "(60,)\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "res = c.execute(\"SELECT count(*) FROM persons WHERE last_name IS null\")\n", 231 | "for row in res:\n", 232 | " print(row)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "### Task 5: How many people have more than one pet? (*)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 12, 245 | "metadata": {}, 246 | "outputs": [ 247 | { 248 | "name": "stdout", 249 | "output_type": "stream", 250 | "text": [ 251 | "43 people have more than one pet\n" 252 | ] 253 | } 254 | ], 255 | "source": [ 256 | "res = c.execute(\"SELECT count(*) FROM (SELECT count(owner_id) FROM pets GROUP BY owner_id HAVING count(owner_id) >1)\")\n", 257 | "for row in res:\n", 258 | " print(\"{} people have more than one pet\".format(row[0]))" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "### Task 6: How many pets have received treatments?"
266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 15, 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "name": "stdout", 275 | "output_type": "stream", 276 | "text": [ 277 | "(36,)\n" 278 | ] 279 | } 280 | ], 281 | "source": [ 282 | "res = c.execute(\"SELECT count(*) FROM pets WHERE treatment_done=1\")\n", 283 | "for row in res:\n", 284 | " print(row)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "### Task 7: How many pets have received treatment that we know the type of? (*)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 17, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "(16,)\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "res = c.execute(\"SELECT count(*) FROM pets WHERE treatment_done=1 AND pet_type IS NOT null\")\n", 309 | "for row in res:\n", 310 | " print(row)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "### Task 8: How many pets are there from the city called \"east port\"?" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 22, 323 | "metadata": {}, 324 | "outputs": [ 325 | { 326 | "name": "stdout", 327 | "output_type": "stream", 328 | "text": [ 329 | "(49,)\n" 330 | ] 331 | } 332 | ], 333 | "source": [ 334 | "res = c.execute(\"SELECT count(*) FROM pets JOIN persons ON pets.owner_id = persons.id WHERE persons.city='east port'\")\n", 335 | "for row in res:\n", 336 | " print(row)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "### Task 9: How many pets are there from the city called \"east port\" that received a treatment?"
344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 23, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "name": "stdout", 353 | "output_type": "stream", 354 | "text": [ 355 | "(11,)\n" 356 | ] 357 | } 358 | ], 359 | "source": [ 360 | "res = c.execute(\"SELECT count(*) FROM pets JOIN persons ON pets.owner_id = persons.id WHERE persons.city='east port' AND pets.treatment_done=1\")\n", 361 | "for row in res:\n", 362 | " print(row)" 363 | ] 364 | } 365 | ], 366 | "metadata": { 367 | "kernelspec": { 368 | "display_name": "Python 3", 369 | "language": "python", 370 | "name": "python3" 371 | }, 372 | "language_info": { 373 | "codemirror_mode": { 374 | "name": "ipython", 375 | "version": 3 376 | }, 377 | "file_extension": ".py", 378 | "mimetype": "text/x-python", 379 | "name": "python", 380 | "nbconvert_exporter": "python", 381 | "pygments_lexer": "ipython3", 382 | "version": "3.6.6" 383 | } 384 | }, 385 | "nbformat": 4, 386 | "nbformat_minor": 2 387 | } 388 | -------------------------------------------------------------------------------- /Lesson 8/petsdb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tirthajyoti/Packt-Data_Wrangling/744b6b243d577ce1a3030a91666fc30409be450a/Lesson 8/petsdb -------------------------------------------------------------------------------- /Lesson 9/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 9 codes and files 2 | -------------------------------------------------------------------------------- /Lesson-7/Lesson 7 Activity 1 - List Top 100 ebooks from Gutenberg.org.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 7 Activity 1: Top 100 ebooks' name extraction from Gutenberg.org\n", 8 | "\n", 9 | "## What is Project Gutenberg? 
\n", 10 | "Project Gutenberg is a volunteer effort to digitize and archive cultural works, to \"encourage the creation and distribution of eBooks\". It was founded in 1971 by American writer Michael S. Hart and is the **oldest digital library.** This longest-established ebook project releases books that have entered the public domain, which can be freely read or downloaded in various electronic formats.\n", 11 | "\n", 12 | "## What is this activity all about?\n", 13 | "* **This activity aims to scrape the URL of Project Gutenberg's Top 100 ebooks page (yesterday's ranking) to identify the ebook links.**\n", 14 | "* **It uses BeautifulSoup4 for parsing the HTML and regular expressions for identifying the Top 100 ebook file numbers.**\n", 15 | "* **You can use those book ID numbers to download the books to your local drive if you want.**" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### Import the necessary libraries, including regex and BeautifulSoup" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 3, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "import urllib.request, urllib.parse, urllib.error\n", 32 | "import requests\n", 33 | "from bs4 import BeautifulSoup\n", 34 | "import ssl\n", 35 | "import re" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Ignore SSL errors (this code will be given)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Ignore SSL certificate errors\n", 52 | "ctx = ssl.create_default_context()\n", 53 | "ctx.check_hostname = False\n", 54 | "ctx.verify_mode = ssl.CERT_NONE" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "### Read the HTML from the URL" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "metadata": {}, 68 | "outputs": [], 69 
| "source": [ 70 | "# Write your code here" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "### Write a small function to check the status of web request" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 4, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "# Write your function here" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 3, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "# Write your code here to check status" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### Decode the response and pass on to `BeautifulSoup` for HTML parsing" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 6, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "# Write your code here (decode)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 8, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "# Write your code here (pass on to BS)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "### Find all the _href_ tags and store them in the list of links. Check how the list looks like - print first 30 elements" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 8, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# Write your code here" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 7, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "# Write your code here (print the list)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### Use regular expression to find the numeric digits in these links.
These are the file numbers for the Top 100 books." 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "#### Initialize an empty list to hold the file numbers" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 9, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "# Write your code here" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "* Numbers 19 to 118 in the original list of links have the Top 100 ebooks' numbers. \n", 176 | "* Loop over the appropriate range and use regex to find the numeric digits in the link (href) string.\n", 177 | "* Hint: Use the `findall()` method" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 10, 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "# Write your code here" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "#### Print the file numbers" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 11, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "# Write your code here" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "### What does the `soup` object's text look like? Use the `.text` attribute and print only the first 2000 characters (i.e. do not print the whole thing, it is long).\n", 210 | "\n", 211 | "You will notice a lot of empty spaces/blanks here and there. Ignore them. They are part of the HTML page markup and its whimsical nature!" 
212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 12, 217 | "metadata": { 218 | "scrolled": false 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "# Write your code here" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Search the text extracted from the `soup` object (using regular expressions) to find the names of the top 100 ebooks (yesterday's rank)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 13, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "# Temp empty list of Ebook names\n", 239 | "# Write your code here" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### Create a starting index. It should point at the text _\"Top 100 Ebooks yesterday\"_. Hint: Use the `splitlines()` method of `soup.text`. It splits the text of the `soup` object into lines." 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 14, 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "# Write your code here" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "#### Loop from 1 to 100 to add the strings of the next 100 lines to this temporary list. 
Hint: `splitlines()` method" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 15, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "# Write your code here" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "#### Use regular expression to extract only text from the name strings and append to an empty list\n", 279 | "* Hint: Use `match` and `span` to find indices and use them" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 16, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [ 288 | "# Write your code here" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "#### Print the list of titles" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 17, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "# Write your code here" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | "display_name": "Python 3", 311 | "language": "python", 312 | "name": "python3" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.2" 325 | }, 326 | "latex_envs": { 327 | "LaTeX_envs_menu_present": true, 328 | "autoclose": false, 329 | "autocomplete": true, 330 | "bibliofile": "biblio.bib", 331 | "cite_by": "apalike", 332 | "current_citInitial": 1, 333 | "eqLabelWithNumbers": true, 334 | "eqNumInitial": 1, 335 | "hotkeys": { 336 | "equation": "Ctrl-E", 337 | "itemize": "Ctrl-I" 338 | }, 339 | "labels_anchors": false, 340 | "latex_user_defs": false, 341 | "report_style_numbering": false, 342 | "user_envs_cfg": false 343 | } 344 | }, 345 | "nbformat": 4, 346 | "nbformat_minor": 2 347 | } 348 | 
-------------------------------------------------------------------------------- /Lesson-7/Lesson 7 Activity 2 - Build your own movie database - SOLUTION.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 7: Advanced web scraping and data gathering\n", 8 | "## Activity 2: Build your own movie database by reading from an API\n", 9 | "### This notebook does the following\n", 10 | "* Retrieves and prints basic data about a movie (title entered by user) from the web (OMDB database)\n", 11 | "* If a poster of the movie could be found, it downloads the file and saves at a user-specified location" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 3, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import urllib.request, urllib.parse, urllib.error\n", 21 | "import json" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Load the secret API key (you have to get one from OMDB website and use that, 1000 daily limit) from a JSON file, stored in the same folder into a variable\n", 29 | "Hint: Use **`json.loads()`**\n", 30 | "\n", 31 | "#### Note: The following cell will not be executed in the solution notebook because the author cannot give out his private API key. \n", 32 | "#### Students/users/instructor will need to obtain a key and store in a JSON file. \n", 33 | "#### For the code's sake, we are calling this file `APIkeys.json`. But you need to store your own key in this file." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 5, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "with open('APIkeys.json') as f:\n", 43 | " keys = json.load(f)\n", 44 | " omdbapi = keys['OMDBapi']" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "### The final URL to be passed should look like: http://www.omdbapi.com/?t=movie_name&apikey=secretapikey \n", 52 | "Do the following,\n", 53 | "* Assign the OMDB portal (http://www.omdbapi.com/?) as a string to a variable `serviceurl` (don't miss the `?`)\n", 54 | "* Create a variable `apikey` with the last portion of the URL (\"&apikey=secretapikey\"), where `secretapikey` is your own API key (an actual code)\n", 55 | "* The movie name portion i.e. \"t=movie_name\" will be addressed later" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 11, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "serviceurl = 'http://www.omdbapi.com/?'\n", 65 | "apikey = '&apikey='+omdbapi" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### Write a utility function `print_json` to print nicely the movie data from a JSON file (which we will get from the portal)\n", 73 | "Here are the keys of a JSON file,\n", 74 | "\n", 75 | "'Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language','Country', 'Awards', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID'" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 12, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "def print_json(json_data):\n", 85 | " list_keys=['Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', \n", 86 | " 'Actors', 'Plot', 'Language', 'Country', 'Awards', 'Ratings', \n", 87 | " 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID']\n", 88 | " print(\"-\"*50)\n", 89 | " for k in list_keys:\n", 90 
| " if k in list(json_data.keys()):\n", 91 | " print(f\"{k}: {json_data[k]}\")\n", 92 | " print(\"-\"*50)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "### Write a utility function to download a poster of the movie based on the information from the jason dataset and save in your local folder\n", 100 | "\n", 101 | "* Use `os` module\n", 102 | "* The poster data is stored in the JSON key 'Poster'\n", 103 | "* You may want to split the name of the Poster file and extract the file extension only. Let's say the extension is ***'jpg'***.\n", 104 | "* Then later join this extension to the movie name and create a filename like ***movie.jpg***\n", 105 | "* Use the Python command `open` to open a file and write the poster data. Close the file after done.\n", 106 | "* This function may not return anything. It just saves the poster data as an image file." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 13, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "def save_poster(json_data):\n", 116 | " import os\n", 117 | " title = json_data['Title']\n", 118 | " poster_url = json_data['Poster']\n", 119 | " # Splits the poster url by '.' and picks up the last string as file extension\n", 120 | " poster_file_extension=poster_url.split('.')[-1]\n", 121 | " # Reads the image file from web\n", 122 | " poster_data = urllib.request.urlopen(poster_url).read()\n", 123 | " \n", 124 | " savelocation=os.getcwd()+'\\\\'+'Posters'+'\\\\'\n", 125 | " # Creates new directory if the directory does not exist. 
Otherwise, just use the existing path.\n", 126 | " if not os.path.isdir(savelocation):\n", 127 | " os.mkdir(savelocation)\n", 128 | " \n", 129 | " filename=savelocation+str(title)+'.'+poster_file_extension\n", 130 | " f=open(filename,'wb')\n", 131 | " f.write(poster_data)\n", 132 | " f.close()" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "### Write a utility function `search_movie` to search for a movie by its name, print the downloaded JSON data (use the `print_json` function for this) and save the movie poster in the local folder (use the `save_poster` function for this)\n", 140 | "\n", 141 | "* Use a `try-except` block for this, i.e. try to connect to the web portal; if successful, proceed, but if not (i.e. an exception is raised), just print an error message\n", 142 | "* Here use the previously created variables `serviceurl` and `apikey`\n", 143 | "* You have to pass on a dictionary with a key `t` and the movie name as the corresponding value to the `urllib.parse.urlencode()` function and then add the `serviceurl` and `apikey` to the output of the function to construct the full URL\n", 144 | "* This URL will be used for accessing the data\n", 145 | "* The JSON data has a key called `Response`. If it is `True`, that means the read was successful. Check this before processing the data. If not successful, then print the JSON key `Error`, which will contain the appropriate error message returned by the movie database." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 20, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "def search_movie(title):\n", 155 | " try:\n", 156 | " url = serviceurl + urllib.parse.urlencode({'t': str(title)})+apikey\n", 157 | " print(f'Retrieving the data of \"{title}\" now... 
')\n", 158 | " print(url)\n", 159 | " uh = urllib.request.urlopen(url)\n", 160 | " data = uh.read()\n", 161 | " json_data=json.loads(data)\n", 162 | " \n", 163 | " if json_data['Response']=='True':\n", 164 | " print_json(json_data)\n", 165 | " # Asks user whether to download the poster of the movie\n", 166 | " if json_data['Poster']!='N/A':\n", 167 | " save_poster(json_data)\n", 168 | " else:\n", 169 | " print(\"Error encountered: \",json_data['Error'])\n", 170 | " \n", 171 | " except urllib.error.URLError as e:\n", 172 | " print(f\"ERROR: {e.reason}\")" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "### Test `search_movie` function by entering *Titanic*" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 21, 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "Retrieving the data of \"Titanic\" now... \n", 192 | "http://www.omdbapi.com/?t=Titanic&apikey=17cdc959\n", 193 | "--------------------------------------------------\n", 194 | "Title: Titanic\n", 195 | "Year: 1997\n", 196 | "Rated: PG-13\n", 197 | "Released: 19 Dec 1997\n", 198 | "Runtime: 194 min\n", 199 | "Genre: Drama, Romance\n", 200 | "Director: James Cameron\n", 201 | "Writer: James Cameron\n", 202 | "Actors: Leonardo DiCaprio, Kate Winslet, Billy Zane, Kathy Bates\n", 203 | "Plot: A seventeen-year-old aristocrat falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic.\n", 204 | "Language: English, Swedish\n", 205 | "Country: USA\n", 206 | "Awards: Won 11 Oscars. 
Another 111 wins & 77 nominations.\n", 207 | "Ratings: [{'Source': 'Internet Movie Database', 'Value': '7.8/10'}, {'Source': 'Rotten Tomatoes', 'Value': '89%'}, {'Source': 'Metacritic', 'Value': '75/100'}]\n", 208 | "Metascore: 75\n", 209 | "imdbRating: 7.8\n", 210 | "imdbVotes: 913,780\n", 211 | "imdbID: tt0120338\n", 212 | "--------------------------------------------------\n" 213 | ] 214 | } 215 | ], 216 | "source": [ 217 | "search_movie(\"Titanic\")" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "### Test `search_movie` function by entering \"*Random_error*\" (obviously this will not be found and you should be able to check whether your error catching code is working properly)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 22, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | "Retrieving the data of \"Random_error\" now... \n", 237 | "http://www.omdbapi.com/?t=Random_error&apikey=17cdc959\n", 238 | "Error encountered: Movie not found!\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "search_movie(\"Random_error\")" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "### Look for a folder called 'Posters' in the same directory you are working in. It should contain a file called 'Titanic.jpg'. Open and see if the poster came alright!" 
251 | ] 252 | } 253 | ], 254 | "metadata": { 255 | "kernelspec": { 256 | "display_name": "Python 3", 257 | "language": "python", 258 | "name": "python3" 259 | }, 260 | "language_info": { 261 | "codemirror_mode": { 262 | "name": "ipython", 263 | "version": 3 264 | }, 265 | "file_extension": ".py", 266 | "mimetype": "text/x-python", 267 | "name": "python", 268 | "nbconvert_exporter": "python", 269 | "pygments_lexer": "ipython3", 270 | "version": "3.6.2" 271 | }, 272 | "latex_envs": { 273 | "LaTeX_envs_menu_present": true, 274 | "autoclose": false, 275 | "autocomplete": true, 276 | "bibliofile": "biblio.bib", 277 | "cite_by": "apalike", 278 | "current_citInitial": 1, 279 | "eqLabelWithNumbers": true, 280 | "eqNumInitial": 1, 281 | "hotkeys": { 282 | "equation": "Ctrl-E", 283 | "itemize": "Ctrl-I" 284 | }, 285 | "labels_anchors": false, 286 | "latex_user_defs": false, 287 | "report_style_numbering": false, 288 | "user_envs_cfg": false 289 | } 290 | }, 291 | "nbformat": 4, 292 | "nbformat_minor": 2 293 | } 294 | -------------------------------------------------------------------------------- /Lesson-7/Lesson 7 Activity 2 - Build your own movie database.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lesson 7: Advanced web scraping and data gathering\n", 8 | "## Activity 2: Build your own movie database by reading from an API\n", 9 | "### This notebook does the following\n", 10 | "* Retrieves and prints basic data about a movie (title entered by user) from the web (OMDB database)\n", 11 | "* If a poster of the movie could be found, it downloads the file and saves at a user-specified location" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 3, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import urllib.request, urllib.parse, urllib.error\n", 21 | "import json" 22 | ] 23 | }, 24 | { 25 | 
"cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Load the secret API key (you have to get one from the OMDB website and use it; there is a 1,000-request daily limit) from a JSON file stored in the same folder, into a variable\n", 29 | "Hint: Use **`json.loads()`**\n", 30 | "\n", 31 | "#### Note: The following cell will not be executed in the solution notebook because the author cannot give out his private API key. \n", 32 | "#### Students/users/instructors will need to obtain a key and store it in a JSON file. \n", 33 | "#### For the code's sake, we are calling this file `APIkeys.json`. But you need to store your own key in this file.\n", 34 | "#### An example file called `\"APIkey_Bogus_example.json\"` is given along with the notebook. Just replace the bogus key in this file with your own and rename the file `APIkeys.json`. The exact file name does not matter, as long as your code reads from the same file." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 1, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# Write your code here" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### The final URL to be passed should look like: http://www.omdbapi.com/?t=movie_name&apikey=secretapikey \n", 51 | "Do the following:\n", 52 | "* Assign the OMDB portal (http://www.omdbapi.com/?) as a string to a variable `serviceurl` (don't miss the `?`)\n", 53 | "* Create a variable `apikey` with the last portion of the URL (\"&apikey=secretapikey\"), where `secretapikey` is your own API key (the actual key string)\n", 54 | "* The movie name portion i.e. 
\"t=movie_name\" will be addressed later" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "# Write your code here" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "### Write a utility function `print_json` to print nicely the movie data from a JSON file (which we will get from the portal)\n", 71 | "Here are the keys of a JSON file,\n", 72 | "\n", 73 | "'Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language','Country', 'Awards', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID'" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "# Write your code here" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "### Write a utility function to download a poster of the movie based on the information from the jason dataset and save in your local folder\n", 90 | "\n", 91 | "* Use `os` module\n", 92 | "* The poster data is stored in the JSON key 'Poster'\n", 93 | "* You may want to split the name of the Poster file and extract the file extension only. Let's say the extension is ***'jpg'***.\n", 94 | "* Then later join this extension to the movie name and create a filename like ***movie.jpg***\n", 95 | "* Use the Python command `open` to open a file and write the poster data. Close the file after done.\n", 96 | "* This function may not return anything. It just saves the poster data as an image file." 
97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "# Write your code here" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "### Write a utility function `search_movie` to search for a movie by its name, print the downloaded JSON data (use the `print_json` function for this) and save the movie poster in the local folder (use the `save_poster` function for this)\n", 113 | "\n", 114 | "* Use a `try-except` block i.e. try to connect to the web portal; if successful, proceed, but if not (i.e. an exception is raised), just print an error message\n", 115 | "* Here, use the previously created variables `serviceurl` and `apikey`\n", 116 | "* You have to pass a dictionary with the key `t` and the movie name as the corresponding value to the `urllib.parse.urlencode()` function, and then add `serviceurl` and `apikey` to the function's output to construct the full URL\n", 117 | "* This URL will be used for accessing the data\n", 118 | "* The JSON data has a key called `Response`. If it is `True`, the read was successful. Check this before processing the data. If it was not successful, print the JSON key `Error`, which will contain the appropriate error message returned by the movie database." 
119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 5, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "# Write your code here" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "### Test the `search_movie` function by entering *Titanic*" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "# Write your code here" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "### Test the `search_movie` function by entering \"*Random_error*\" (this will obviously not be found, so you can check whether your error-handling code is working properly)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 7, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "# Write your code here" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "### Look for a folder called 'Posters' in the same directory you are working in. It should contain a file called 'Titanic.jpg'. Open it and see if the poster came out all right!" 
167 | ] 168 | } 169 | ], 170 | "metadata": { 171 | "kernelspec": { 172 | "display_name": "Python 3", 173 | "language": "python", 174 | "name": "python3" 175 | }, 176 | "language_info": { 177 | "codemirror_mode": { 178 | "name": "ipython", 179 | "version": 3 180 | }, 181 | "file_extension": ".py", 182 | "mimetype": "text/x-python", 183 | "name": "python", 184 | "nbconvert_exporter": "python", 185 | "pygments_lexer": "ipython3", 186 | "version": "3.6.2" 187 | }, 188 | "latex_envs": { 189 | "LaTeX_envs_menu_present": true, 190 | "autoclose": false, 191 | "autocomplete": true, 192 | "bibliofile": "biblio.bib", 193 | "cite_by": "apalike", 194 | "current_citInitial": 1, 195 | "eqLabelWithNumbers": true, 196 | "eqNumInitial": 1, 197 | "hotkeys": { 198 | "equation": "Ctrl-E", 199 | "itemize": "Ctrl-I" 200 | }, 201 | "labels_anchors": false, 202 | "latex_user_defs": false, 203 | "report_style_numbering": false, 204 | "user_envs_cfg": false 205 | } 206 | }, 207 | "nbformat": 4, 208 | "nbformat_minor": 2 209 | } 210 | -------------------------------------------------------------------------------- /Lesson-7/Readme.md: -------------------------------------------------------------------------------- 1 | ## Lesson 7 code 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Please feel free to [add me here on LinkedIn](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/) if you are interested in data science and would like to connect. 2 | 3 | # Packt-Data_Wrangling 4 | Code repo for the Packt course I developed, "Beginning Data Wrangling with Python". 5 | 6 | ### [You can buy the book here](https://www.amazon.com/Data-Wrangling-Python-Creating-actionable-ebook-dp-B07JF26NGJ/dp/B07JF26NGJ/) 7 | ![image](https://images-na.ssl-images-amazon.com/images/I/51-AuclWzTL._SX403_BO1,204,203,200_.jpg) 8 | 9 | ## What is this course about? 
10 | “Data is the new Oil” and it rules the modern way of life through incredibly smart tools and transformative technologies. But oil does not come out of the rig in its final form. It has to be refined through a complex processing network. Similarly, data needs to be curated, massaged, and refined before it can be used in intelligent algorithms and consumer products. This is called “wrangling”, and (according to Forbes) good data scientists spend 60-80% of their time on it, on every project. It involves scraping raw data from multiple sources (including the web and database tables), imputing, formatting, and transforming – basically making it ready to be used flawlessly in the modeling process. 11 | 12 | This course aims to teach you the core ideas behind this process and to equip you with the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, the most widely used language for data science. We work through real-life examples, not toy datasets. By the end of this course, you will be confident handling a myriad of sources to extract, clean, transform, and format your data for the great machine learning app you are thinking of building. Hop on and be part of this exciting journey. 13 | 14 | 15 | ## What you will learn 16 | * Manipulate simple and complex data structures using Python and its built-in functions 17 | * Use Pandas DataFrames and NumPy arrays at both fundamental and advanced levels, and manipulate them at 18 | run time. 
19 | * Extract and format data from various textual formats – plain text files, SQL, CSV, Excel, JSON, and 20 | XML 21 | * Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib 22 | * Perform advanced string search and manipulation using Python and regular expressions 23 | * Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas 24 | * Use basic descriptive statistics and plotting techniques in Python for quick examination of data 25 | * Practice data wrangling and modeling using random data generation techniques - Bonus Topic 26 | 27 | ### Hardware requirements 28 | For an optimal student experience, we recommend the following hardware configuration: 29 | * **OS**: Windows 7 SP1 64-bit, Windows 8.1 64-bit or Windows 10 64-bit, Ubuntu Linux, or the latest 30 | version of OS X 31 | * **Processor**: Intel Core i5 or equivalent 32 | * **Memory**: 4 GB RAM (8 GB RAM preferred) 33 | * **Hard disk**: 40 GB or more 34 | * A strong and stable Internet connection 35 | 36 | ### Software requirements 37 | You’ll also need the following software installed in advance: 38 | * Browser: Google Chrome or Mozilla Firefox (latest version) 39 | * Python 3.4+ (preferably Python 3.6) installed (from https://python.org) 40 | * Python libraries as needed (Jupyter, NumPy, Pandas, Matplotlib, BeautifulSoup4, and so on) 41 | * Notepad++/Sublime Text (latest version), Atom IDE (latest version), or another similar text editor 42 | 43 | * The following Python libraries installed: 44 | * NumPy 45 | * Pandas 46 | * SciPy 47 | * Matplotlib 48 | * BeautifulSoup4 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | --------------------------------------------------------------------------------
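The Lesson 7 activity notebook above asks students to build the OMDB query URL with `urllib.parse.urlencode()` and to print selected keys of the JSON response. The two offline-testable pieces of that flow can be sketched as follows (a minimal sketch, not the solution notebook's code: the names `build_url` and `print_json` follow the activity's wording, and `secretapikey` is a placeholder, not a real key):

```python
import urllib.parse

SERVICE_URL = "http://www.omdbapi.com/?"  # note the trailing '?', as the activity instructs

def build_url(title, apikey):
    """Construct the full OMDB query URL, e.g.
    http://www.omdbapi.com/?t=Titanic&apikey=secretapikey"""
    # urlencode escapes spaces and special characters in the title (e.g. 'The Matrix' -> 't=The+Matrix')
    return SERVICE_URL + urllib.parse.urlencode({"t": title}) + "&apikey=" + apikey

def print_json(json_data):
    """Print the keys of interest from a decoded OMDB JSON record."""
    keys = ['Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre',
            'Director', 'Writer', 'Actors', 'Plot', 'Language', 'Country',
            'Awards', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID']
    for k in keys:
        if k in json_data:
            print(f"{k}: {json_data[k]}")
    print("-" * 50)

print(build_url("Titanic", "secretapikey"))
# -> http://www.omdbapi.com/?t=Titanic&apikey=secretapikey
```

In the full activity, the URL would be fetched with `urllib.request.urlopen()`, the returned bytes decoded with `json.loads()`, and the result passed to `print_json` only after checking the `Response` key, all inside the `try-except` block described in the notebook.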