├── practiceResource
│   ├── heart.pkl
│   ├── save.hdf
│   ├── save.pkl
│   ├── data
│   │   ├── example.pkl
│   │   ├── example_1.csv
│   │   ├── example_2.csv
│   │   ├── example_3.csv
│   │   ├── example_4.csv
│   │   ├── CreatingDataFrames.ipynb
│   │   └── heart.csv
│   ├── Questions.ipynb
│   ├── heart.csv
│   ├── dataMaipulation
│   │   ├── Questions.ipynb
│   │   └── 5_Basics_ApplyMapVectorised.ipynb
│   ├── Answers.ipynb
│   ├── SavingAndSerialising.ipynb
│   ├── .ipynb_checkpoints
│   │   └── dataSavingAndSerialising-checkpoint.ipynb
│   ├── dataloading.ipynb
│   └── dataSavingAndSerialising.ipynb
└── NewPracticeResource
    ├── Practice_material_2.ipynb
    └── practice_material.ipynb
/practiceResource/heart.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/heart.pkl
--------------------------------------------------------------------------------
/practiceResource/save.hdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.hdf
--------------------------------------------------------------------------------
/practiceResource/save.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/data/example.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example_1.csv:
--------------------------------------------------------------------------------
1 | name,gender,age,oaths 2 | Kaladin,Male,20.0,3.0 3 | Shallan,Female,17.0,2.0 4 | 
Dalinar,Male,53.0,3.0 5 | Szeth,Male,35.0,0.0 6 | Hoid,Male,, 7 | Jashnah,Female,34.0,3.0 8 | -------------------------------------------------------------------------------- /practiceResource/data/example_2.csv: -------------------------------------------------------------------------------- 1 | name gender age oaths 2 | Kaladin Male 20.0 3.0 3 | Shallan Female 17.0 2.0 4 | Dalinar Male 53.0 3.0 5 | Szeth Male 35.0 0.0 6 | Hoid Male 7 | Jashnah Female 34.0 3.0 8 | -------------------------------------------------------------------------------- /practiceResource/data/example_3.csv: -------------------------------------------------------------------------------- 1 | ,name,gender,age,oaths 2 | 0,Kaladin,Male,20.0,3.0 3 | 1,Shallan,Female,17.0,2.0 4 | 2,Dalinar,Male,53.0,3.0 5 | 3,Szeth,Male,35.0,0.0 6 | 4,Hoid,Male,, 7 | 5,Jashnah,Female,34.0,3.0 8 | -------------------------------------------------------------------------------- /practiceResource/data/example_4.csv: -------------------------------------------------------------------------------- 1 | # This file contains details guessed from Stormlight Archive 2 | name|gender|age|oaths 3 | Kaladin|Male|20.0|3.0 4 | Shallan|Female|17.0|2.0 5 | Dalinar|Male|53.0|3.0 6 | Szeth|Male|35.0|0.0 7 | # Who knows about Hoid 8 | Hoid|Male|NaN|NaN 9 | Jashnah|Female|34.0|3.0 10 | -------------------------------------------------------------------------------- /practiceResource/Questions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Optional Exercise - Data loading\n", 8 | "\n", 9 | "Here's a very short exercise: five files to load in. 
None of these examples should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load in with the defaults (and they don't work), and then how to change your input arguments to make it better.\n", 10 | "\n", 11 | "The files to attempt to load in are:\n", 12 | "\n", 13 | "1. example.pkl\n", 14 | "2. example_1.csv\n", 15 | "3. example_2.csv\n", 16 | "4. example_3.csv\n", 17 | "5. example_4.csv" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 1, 23 | "metadata": { 24 | "ExecuteTime": { 25 | "end_time": "2020-03-17T04:34:36.761456Z", 26 | "start_time": "2020-03-17T04:34:36.452524Z" 27 | } 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "import pandas as pd" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": { 38 | "ExecuteTime": { 39 | "end_time": "2020-03-17T04:35:03.643107Z", 40 | "start_time": "2020-03-17T04:35:03.639118Z" 41 | } 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "# Load example.pkl" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "metadata": { 52 | "ExecuteTime": { 53 | "end_time": "2020-03-17T04:35:05.900792Z", 54 | "start_time": "2020-03-17T04:35:05.896809Z" 55 | } 56 | }, 57 | "outputs": [], 58 | "source": [ 59 | "# Load example_1.csv" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 5, 65 | "metadata": { 66 | "ExecuteTime": { 67 | "end_time": "2020-03-17T04:35:07.023962Z", 68 | "start_time": "2020-03-17T04:35:07.021962Z" 69 | } 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "# Load example_2.csv" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 6, 79 | "metadata": { 80 | "ExecuteTime": { 81 | "end_time": "2020-03-17T04:35:07.698522Z", 82 | "start_time": "2020-03-17T04:35:07.694534Z" 83 | } 84 | }, 85 | "outputs": [], 86 | "source": [ 87 | "# Load example_3.csv" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 7, 93 
| "metadata": { 94 | "ExecuteTime": { 95 | "end_time": "2020-03-17T04:35:08.514409Z", 96 | "start_time": "2020-03-17T04:35:08.511417Z" 97 | } 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "# Load example_4.csv" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [] 110 | } 111 | ], 112 | "metadata": { 113 | "kernelspec": { 114 | "display_name": "Python 3", 115 | "language": "python", 116 | "name": "python3" 117 | }, 118 | "language_info": { 119 | "codemirror_mode": { 120 | "name": "ipython", 121 | "version": 3 122 | }, 123 | "file_extension": ".py", 124 | "mimetype": "text/x-python", 125 | "name": "python", 126 | "nbconvert_exporter": "python", 127 | "pygments_lexer": "ipython3", 128 | "version": "3.7.3" 129 | } 130 | }, 131 | "nbformat": 4, 132 | "nbformat_minor": 2 133 | } 134 | -------------------------------------------------------------------------------- /NewPracticeResource/Practice_material_2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Download dataset form this website\n", 8 | "\n", 9 | "## https://www.kaggle.com/faressayah/stanford-open-policing-project?select=police_project.csv" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Description::" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. 
Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.\n" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Importing libraries::" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import pandas as pd\n", 40 | "import numpy as np" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "# Use Pandas' read_csv function to open it as a DataFrame" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "# What does each row represent?\n", 62 | "\n", 63 | "#### hint::\n", 64 | "head: Return the first n rows (the first 5 by default)." 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "# How to get the basic statistics of all the columns?" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "# How to check the shape of the dataset?" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "# How to check the types of the columns?" 
107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "# Locating missing values\n", 121 | "#### Detect missing values,\n", 122 | "#### then calculate the count of missing values in each column\n" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "# Dropping columns that only contain missing values." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "# Do men or women speed more often?" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "# Which year had the least number of stops?" 
186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "# Does gender affect who gets searched during a stop?" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "\n", 228 | "# How does drug activity change by time of day?" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "# Do most stops occur at night?" 
250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [] 305 | } 306 | ], 307 | "metadata": { 308 | "environment": { 309 | "name": "tf-gpu.1-15.m56", 310 | "type": "gcloud", 311 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56" 312 | }, 313 | "kernelspec": { 314 | "display_name": "Python 3", 315 | "language": "python", 316 | "name": "python3" 317 | }, 318 | "language_info": { 319 | "codemirror_mode": { 320 | "name": "ipython", 321 | "version": 3 322 | }, 323 | "file_extension": ".py", 324 | "mimetype": "text/x-python", 325 | "name": "python", 326 | "nbconvert_exporter": "python", 327 | "pygments_lexer": "ipython3", 328 | "version": "3.7.8" 329 | } 330 | }, 331 | "nbformat": 4, 332 | "nbformat_minor": 4 333 | } 334 | -------------------------------------------------------------------------------- /NewPracticeResource/practice_material.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": 
"markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Practice Assignment :: 01" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## How to import pandas and check the version?" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## import useful libraries" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "## How to create a series from a list, numpy array and dict?" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## How to convert the index of a series into a column of a dataframe?" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "## hint::\n", 85 | "### Convert the series ser into a dataframe with its index as another column on the dataframe." 
86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "# How to combine many series to form a dataframe?" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "# How to calculate the number of characters in each word in a series?" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "# How to filter valid emails from a series?" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "## Desired Output::\n", 156 | "1 rameses@egypt.com\n", 157 | "\n", 158 | "2 matt@t.com\n", 159 | "\n", 160 | "3 narendra@modi.com\n" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "# How to replace missing spaces in a string with the least frequent character?" 
182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## Input::\n", 189 | "### my_str = 'dbc deb abed gade'\n", 190 | "## Desired Output::\n", 191 | "### 'dbccdebcabedcgade' # least frequent is 'c'" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "# How to swap two rows of a dataframe?" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": {}, 226 | "outputs": [], 227 | "source": [] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "# How to get the positions where values of two columns match?" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "# How to replace both the diagonals of dataframe with 0?" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "# How to get the particular group of a groupby dataframe by key?" 
276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "### This question tests your understanding of grouped dataframes." 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "# Which column contains the highest number of row-wise maximum values?" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "### Obtain the name of the column with the highest number of row-wise maxima in df." 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "metadata": {}, 324 | "outputs": [], 325 | "source": [] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [] 333 | } 334 | ], 335 | "metadata": { 336 | "environment": { 337 | "name": "tf-gpu.1-15.m56", 338 | "type": "gcloud", 339 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56" 340 | }, 341 | "kernelspec": { 342 | "display_name": "Python 3", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.7.8" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 4 361 | } 362 | -------------------------------------------------------------------------------- 
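[Editor's note: the answer cells in the practice notebooks above are empty. As one worked example, the practice_material.ipynb question "How to convert the index of a series into a column of a dataframe?" can be sketched as below; the series `ser` is invented here purely for illustration.]

```python
import pandas as pd

# Hypothetical series standing in for the notebook's `ser`
ser = pd.Series([100, 200, 300], index=["a", "b", "c"])

# reset_index() moves the index into a regular column named "index";
# for an unnamed Series the values land in a column named 0.
df = ser.reset_index()
print(df.columns.tolist())  # ['index', 0]
print(df)
```

Passing `name=` when building the Series (or renaming afterwards) gives the value column a friendlier label than 0.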
/practiceResource/data/CreatingDataFrames.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Creating DataFrames\n", 8 | "\n", 9 | "Many ways to do it!" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "ExecuteTime": { 17 | "end_time": "2020-02-16T02:33:23.867017Z", 18 | "start_time": "2020-02-16T02:33:21.139382Z" 19 | } 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "import numpy as np" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 5, 30 | "metadata": { 31 | "ExecuteTime": { 32 | "end_time": "2020-02-16T02:48:39.727399Z", 33 | "start_time": "2020-02-16T02:48:39.707843Z" 34 | } 35 | }, 36 | "outputs": [ 37 | { 38 | "name": "stdout", 39 | "output_type": "stream", 40 | "text": [ 41 | "[[0.97699724 0.23250035 0.17454747]\n", 42 | " [0.11011626 0.90673085 0.37222005]\n", 43 | " [0.77665114 0.81701713 0.57427769]\n", 44 | " [0.34080801 0.09617229 0.26027026]\n", 45 | " [0.03694591 0.5385542 0.95945971]]\n" 46 | ] 47 | }, 48 | { 49 | "data": { 50 | "text/html": [ 51 | "
\n", 52 | "\n", 65 | "\n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | "
ABC
00.9769970.2325000.174547
10.1101160.9067310.372220
20.7766510.8170170.574278
30.3408080.0961720.260270
40.0369460.5385540.959460
\n", 107 | "
" 108 | ], 109 | "text/plain": [ 110 | " A B C\n", 111 | "0 0.976997 0.232500 0.174547\n", 112 | "1 0.110116 0.906731 0.372220\n", 113 | "2 0.776651 0.817017 0.574278\n", 114 | "3 0.340808 0.096172 0.260270\n", 115 | "4 0.036946 0.538554 0.959460" 116 | ] 117 | }, 118 | "execution_count": 5, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "data = np.random.random(size=(5, 3))\n", 125 | "print(data)\n", 126 | "\n", 127 | "# Common 2D array and columns method\n", 128 | "df = pd.DataFrame(data=data, columns=[\"A\", \"B\", \"C\"])\n", 129 | "df" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 6, 135 | "metadata": { 136 | "ExecuteTime": { 137 | "end_time": "2020-02-16T02:49:34.262524Z", 138 | "start_time": "2020-02-16T02:49:34.252447Z" 139 | } 140 | }, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/html": [ 145 | "
\n", 146 | "\n", 159 | "\n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | "
AB
01Sam
12Alex
23John
\n", 185 | "
" 186 | ], 187 | "text/plain": [ 188 | " A B\n", 189 | "0 1 Sam\n", 190 | "1 2 Alex\n", 191 | "2 3 John" 192 | ] 193 | }, 194 | "execution_count": 6, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "# A dictionary of columns\n", 201 | "df = pd.DataFrame(data={\"A\": [1, 2, 3], \"B\": [\"Sam\", \"Alex\", \"John\"]})\n", 202 | "df" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 9, 208 | "metadata": { 209 | "ExecuteTime": { 210 | "end_time": "2020-02-16T02:51:16.389841Z", 211 | "start_time": "2020-02-16T02:51:16.379319Z" 212 | } 213 | }, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | "
AB
01Sam
12Alex
23John
\n", 258 | "
" 259 | ], 260 | "text/plain": [ 261 | " A B\n", 262 | "0 1 Sam\n", 263 | "1 2 Alex\n", 264 | "2 3 John" 265 | ] 266 | }, 267 | "execution_count": 9, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "# Or a list of rows (ie tuples) with a dtype\n", 274 | "dtype = [(\"A\", np.int), (\"B\", (np.str, 20))]\n", 275 | "data = np.array([(1, \"Sam\"), (2, \"Alex\"), (3, \"John\")], dtype=dtype)\n", 276 | "df = pd.DataFrame(data)\n", 277 | "df" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 10, 283 | "metadata": { 284 | "ExecuteTime": { 285 | "end_time": "2020-02-16T02:52:39.660112Z", 286 | "start_time": "2020-02-16T02:52:39.651418Z" 287 | } 288 | }, 289 | "outputs": [ 290 | { 291 | "data": { 292 | "text/html": [ 293 | "
\n", 294 | "\n", 307 | "\n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | "
AB
01Sam
12Alex
23John
\n", 333 | "
" 334 | ], 335 | "text/plain": [ 336 | " A B\n", 337 | "0 1 Sam\n", 338 | "1 2 Alex\n", 339 | "2 3 John" 340 | ] 341 | }, 342 | "execution_count": 10, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "# Or the dictionary based version of list of rows\n", 349 | "data = [{\"A\": 1, \"B\": \"Sam\"}, {\"A\": 2, \"B\": \"Alex\"}, {\"A\": 3, \"B\": \"John\"}]\n", 350 | "df = pd.DataFrame(data)\n", 351 | "df" 352 | ] 353 | } 354 | ], 355 | "metadata": { 356 | "kernelspec": { 357 | "display_name": "Python 3", 358 | "language": "python", 359 | "name": "python3" 360 | }, 361 | "language_info": { 362 | "codemirror_mode": { 363 | "name": "ipython", 364 | "version": 3 365 | }, 366 | "file_extension": ".py", 367 | "mimetype": "text/x-python", 368 | "name": "python", 369 | "nbconvert_exporter": "python", 370 | "pygments_lexer": "ipython3", 371 | "version": "3.7.3" 372 | } 373 | }, 374 | "nbformat": 4, 375 | "nbformat_minor": 2 376 | } 377 | -------------------------------------------------------------------------------- /practiceResource/heart.csv: -------------------------------------------------------------------------------- 1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target 2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1 3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1 4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1 5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1 6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1 7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1 8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1 9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1 10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1 11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1 12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1 13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1 14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1 15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1 16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1 17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1 18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1 19 | 
66,0,3,150,226,0,1,114,0,2.6,0,0,2,1 20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1 21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1 22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1 23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1 24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1 25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1 26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1 27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1 28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1 29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1 30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1 31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1 32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1 33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1 34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1 35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1 36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1 37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1 38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1 39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1 40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1 41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1 42 | 51,0,2,140,308,0,0,142,0,1.5,2,1,2,1 43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1 44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1 45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1 46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1 47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1 48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1 49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1 50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1 51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1 52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1 53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1 54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1 55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1 56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1 57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1 58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1 59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1 60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1 61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1 62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1 63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1 64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1 65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1 66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1 67 | 
35,0,0,138,183,0,1,182,0,1.4,2,0,2,1 68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1 69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1 70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1 71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1 72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1 73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1 74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1 75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1 76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1 77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1 78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1 79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1 80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1 81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1 82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1 83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1 84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1 85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1 86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1 87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1 88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1 89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1 90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1 91 | 58,0,0,100,248,0,0,122,0,1,1,0,2,1 92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1 93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1 94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1 95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1 96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1 97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1 98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1 99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1 100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1 101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1 102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1 103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1 104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1 105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1 106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1 107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1 108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1 109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1 110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1 111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1 112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1 113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1 114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1 115 | 
43,1,0,110,211,0,1,161,0,0,2,0,3,1 116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1 117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1 118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1 119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1 120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1 121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1 122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1 123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1 124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1 125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1 126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1 127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1 128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1 129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1 130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1 131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1 132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1 133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1 134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1 135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1 136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1 137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1 138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1 139 | 62,1,1,128,208,1,0,140,0,0,2,0,2,1 140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1 141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1 142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1 143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1 144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1 145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1 146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1 147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1 148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1 149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1 150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1 151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1 152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1 153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1 154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1 155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1 156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1 157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1 158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1 159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1 160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1 161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1 162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1 163 
| 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1 164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1 165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1 166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1 167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0 168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0 169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0 170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0 171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0 172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0 173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0 174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0 175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0 176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0 177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0 178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0 179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0 180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0 181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0 182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0 183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0 184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0 185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0 186 | 50,1,0,150,243,0,0,128,0,2.6,1,0,3,0 187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0 188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0 189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0 190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0 191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0 192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0 193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0 194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0 195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0 196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0 197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0 198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0 199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0 200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0 201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0 202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0 203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0 204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0 205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0 206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0 207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0 208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0 209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0 210 | 
49,1,2,120,188,0,1,139,0,2,1,3,3,0 211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0 212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0 213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0 214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0 215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0 216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0 217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0 218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0 219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0 220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0 221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0 222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0 223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0 224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0 225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0 226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0 227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0 228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0 229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0 230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0 231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0 232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0 233 | 57,1,0,165,289,1,0,124,0,1,1,3,3,0 234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0 235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0 236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0 237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0 238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0 239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0 240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0 241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0 242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0 243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0 244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0 245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0 246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0 247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0 248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0 249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0 250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0 251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0 252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0 253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0 254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0 255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0 256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0 257 | 
45,1,0,142,309,0,0,147,1,0,1,3,3,0 258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0 259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0 260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0 261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0 262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0 263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0 264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0 265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0 266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0 267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0 268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0 269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0 270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0 271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0 272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0 273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0 274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0 275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0 276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0 277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0 278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0 279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0 280 | 58,0,1,136,319,1,0,152,0,0,2,2,2,0 281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0 282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0 283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0 284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0 285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0 286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0 287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0 288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0 289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0 290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0 291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0 292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0 293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0 294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0 295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0 296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0 297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0 298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0 299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0 300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0 301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0 302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0 303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0 304 | 
57,0,1,130,236,0,0,174,0,0,1,1,2,0 305 | -------------------------------------------------------------------------------- /practiceResource/data/heart.csv: -------------------------------------------------------------------------------- 1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target 2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1 3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1 4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1 5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1 6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1 7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1 8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1 9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1 10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1 11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1 12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1 13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1 14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1 15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1 16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1 17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1 18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1 19 | 66,0,3,150,226,0,1,114,0,2.6,0,0,2,1 20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1 21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1 22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1 23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1 24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1 25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1 26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1 27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1 28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1 29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1 30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1 31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1 32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1 33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1 34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1 35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1 36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1 37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1 38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1 39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1 40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1 41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1 42 | 
51,0,2,140,308,0,0,142,0,1.5,2,1,2,1 43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1 44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1 45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1 46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1 47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1 48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1 49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1 50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1 51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1 52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1 53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1 54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1 55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1 56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1 57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1 58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1 59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1 60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1 61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1 62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1 63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1 64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1 65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1 66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1 67 | 35,0,0,138,183,0,1,182,0,1.4,2,0,2,1 68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1 69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1 70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1 71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1 72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1 73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1 74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1 75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1 76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1 77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1 78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1 79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1 80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1 81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1 82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1 83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1 84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1 85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1 86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1 87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1 88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1 89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1 90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1 91 
| 58,0,0,100,248,0,0,122,0,1,1,0,2,1 92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1 93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1 94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1 95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1 96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1 97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1 98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1 99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1 100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1 101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1 102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1 103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1 104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1 105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1 106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1 107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1 108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1 109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1 110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1 111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1 112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1 113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1 114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1 115 | 43,1,0,110,211,0,1,161,0,0,2,0,3,1 116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1 117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1 118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1 119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1 120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1 121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1 122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1 123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1 124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1 125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1 126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1 127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1 128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1 129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1 130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1 131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1 132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1 133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1 134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1 135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1 136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1 137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1 138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1 139 | 
62,1,1,128,208,1,0,140,0,0,2,0,2,1 140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1 141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1 142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1 143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1 144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1 145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1 146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1 147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1 148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1 149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1 150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1 151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1 152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1 153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1 154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1 155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1 156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1 157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1 158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1 159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1 160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1 161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1 162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1 163 | 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1 164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1 165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1 166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1 167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0 168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0 169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0 170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0 171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0 172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0 173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0 174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0 175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0 176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0 177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0 178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0 179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0 180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0 181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0 182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0 183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0 184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0 185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0 186 | 
50,1,0,150,243,0,0,128,0,2.6,1,0,3,0 187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0 188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0 189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0 190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0 191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0 192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0 193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0 194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0 195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0 196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0 197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0 198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0 199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0 200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0 201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0 202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0 203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0 204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0 205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0 206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0 207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0 208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0 209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0 210 | 49,1,2,120,188,0,1,139,0,2,1,3,3,0 211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0 212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0 213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0 214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0 215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0 216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0 217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0 218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0 219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0 220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0 221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0 222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0 223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0 224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0 225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0 226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0 227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0 228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0 229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0 230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0 231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0 232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0 233 | 
57,1,0,165,289,1,0,124,0,1,1,3,3,0 234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0 235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0 236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0 237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0 238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0 239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0 240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0 241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0 242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0 243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0 244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0 245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0 246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0 247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0 248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0 249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0 250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0 251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0 252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0 253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0 254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0 255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0 256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0 257 | 45,1,0,142,309,0,0,147,1,0,1,3,3,0 258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0 259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0 260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0 261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0 262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0 263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0 264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0 265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0 266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0 267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0 268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0 269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0 270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0 271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0 272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0 273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0 274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0 275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0 276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0 277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0 278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0 279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0 280 | 
58,0,1,136,319,1,0,152,0,0,2,2,2,0 281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0 282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0 283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0 284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0 285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0 286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0 287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0 288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0 289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0 290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0 291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0 292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0 293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0 294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0 295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0 296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0 297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0 298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0 299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0 300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0 301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0 302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0 303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0 304 | 57,0,1,130,236,0,0,174,0,0,1,1,2,0 305 | -------------------------------------------------------------------------------- /practiceResource/dataMaipulation/Questions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Extra Practise - Basics\n", 8 | "\n", 9 | "In this optional practise session, I thought it would be fun to look at some cost of living data from, you guessed it, Kaggle: https://www.kaggle.com/stephenofarrell/cost-of-living\n", 10 | "\n", 11 | "Here are the objectives:\n", 12 | "\n", 13 | "1. Rename the \"index\" column to \"location\"\n", 14 | "2. Utilise apply to generate two new columns from the location - city and country\n", 15 | "3. Realise the easy solution doesn't work for the United States and create a function for apply to remove specific states.\n", 16 | "3. 
Figure out which country has the most cities listed, and create a dataset from only that country\n", 17 | "4. Sort the dataset by the cost of living 'Apartment (1 bedroom) in City Centre'\n", 18 | "5. Cry over housing prices if you live in the Bay Area.\n", 19 | "\n", 20 | "After that, feel free to keep playing with the data yourself.\n" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 32, 26 | "metadata": { 27 | "ExecuteTime": { 28 | "end_time": "2020-02-03T02:01:59.091568Z", 29 | "start_time": "2020-02-03T02:01:59.056659Z" 30 | } 31 | }, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/html": [ 36 | "
\n", 37 | "\n", 50 | "\n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | "
indexMeal, Inexpensive RestaurantMeal for 2 People, Mid-range Restaurant, Three-courseMcMeal at McDonalds (or Equivalent Combo Meal)Domestic Beer (0.5 liter draught)Imported Beer (0.33 liter bottle)Coke/Pepsi (0.33 liter bottle)Water (0.33 liter bottle)Milk (regular), (1 liter)Loaf of Fresh White Bread (500g)...Lettuce (1 head)Cappuccino (regular)Rice (white), (1kg)Tomato (1kg)Banana (1kg)Onion (1kg)Beef Round (1kg) (or Equivalent Back Leg Red Meat)Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car)Preschool (or Kindergarten), Full Day, Private, Monthly for 1 ChildInternational Primary School, Yearly for 1 Child
0Saint Petersburg, Russia7.3429.354.402.202.200.760.530.980.71...0.861.960.921.910.890.487.1819305.29411.835388.86
1Istanbul, Turkey4.5815.283.823.063.060.640.240.710.36...0.611.841.300.801.910.629.7320874.72282.946905.43
2Izmir, Turkey3.0612.223.062.292.750.610.220.650.38...0.571.561.310.701.780.588.6120898.83212.184948.41
3Helsinki, Finland12.0065.008.006.506.752.661.890.962.27...2.303.872.132.911.611.2512.3424402.77351.601641.00
4Chisinau, Moldova4.6720.744.151.041.430.640.440.680.33...0.841.250.931.561.370.595.3717238.13210.522679.30
\n", 200 | "

5 rows × 56 columns

\n", 201 | "
" 202 | ], 203 | "text/plain": [ 204 | " index Meal, Inexpensive Restaurant \\\n", 205 | "0 Saint Petersburg, Russia 7.34 \n", 206 | "1 Istanbul, Turkey 4.58 \n", 207 | "2 Izmir, Turkey 3.06 \n", 208 | "3 Helsinki, Finland 12.00 \n", 209 | "4 Chisinau, Moldova 4.67 \n", 210 | "\n", 211 | " Meal for 2 People, Mid-range Restaurant, Three-course \\\n", 212 | "0 29.35 \n", 213 | "1 15.28 \n", 214 | "2 12.22 \n", 215 | "3 65.00 \n", 216 | "4 20.74 \n", 217 | "\n", 218 | " McMeal at McDonalds (or Equivalent Combo Meal) \\\n", 219 | "0 4.40 \n", 220 | "1 3.82 \n", 221 | "2 3.06 \n", 222 | "3 8.00 \n", 223 | "4 4.15 \n", 224 | "\n", 225 | " Domestic Beer (0.5 liter draught) Imported Beer (0.33 liter bottle) \\\n", 226 | "0 2.20 2.20 \n", 227 | "1 3.06 3.06 \n", 228 | "2 2.29 2.75 \n", 229 | "3 6.50 6.75 \n", 230 | "4 1.04 1.43 \n", 231 | "\n", 232 | " Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \\\n", 233 | "0 0.76 0.53 \n", 234 | "1 0.64 0.24 \n", 235 | "2 0.61 0.22 \n", 236 | "3 2.66 1.89 \n", 237 | "4 0.64 0.44 \n", 238 | "\n", 239 | " Milk (regular), (1 liter) Loaf of Fresh White Bread (500g) ... \\\n", 240 | "0 0.98 0.71 ... \n", 241 | "1 0.71 0.36 ... \n", 242 | "2 0.65 0.38 ... \n", 243 | "3 0.96 2.27 ... \n", 244 | "4 0.68 0.33 ... 
\n", 245 | "\n", 246 | " Lettuce (1 head) Cappuccino (regular) Rice (white), (1kg) Tomato (1kg) \\\n", 247 | "0 0.86 1.96 0.92 1.91 \n", 248 | "1 0.61 1.84 1.30 0.80 \n", 249 | "2 0.57 1.56 1.31 0.70 \n", 250 | "3 2.30 3.87 2.13 2.91 \n", 251 | "4 0.84 1.25 0.93 1.56 \n", 252 | "\n", 253 | " Banana (1kg) Onion (1kg) \\\n", 254 | "0 0.89 0.48 \n", 255 | "1 1.91 0.62 \n", 256 | "2 1.78 0.58 \n", 257 | "3 1.61 1.25 \n", 258 | "4 1.37 0.59 \n", 259 | "\n", 260 | " Beef Round (1kg) (or Equivalent Back Leg Red Meat) \\\n", 261 | "0 7.18 \n", 262 | "1 9.73 \n", 263 | "2 8.61 \n", 264 | "3 12.34 \n", 265 | "4 5.37 \n", 266 | "\n", 267 | " Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car) \\\n", 268 | "0 19305.29 \n", 269 | "1 20874.72 \n", 270 | "2 20898.83 \n", 271 | "3 24402.77 \n", 272 | "4 17238.13 \n", 273 | "\n", 274 | " Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child \\\n", 275 | "0 411.83 \n", 276 | "1 282.94 \n", 277 | "2 212.18 \n", 278 | "3 351.60 \n", 279 | "4 210.52 \n", 280 | "\n", 281 | " International Primary School, Yearly for 1 Child \n", 282 | "0 5388.86 \n", 283 | "1 6905.43 \n", 284 | "2 4948.41 \n", 285 | "3 1641.00 \n", 286 | "4 2679.30 \n", 287 | "\n", 288 | "[5 rows x 56 columns]" 289 | ] 290 | }, 291 | "execution_count": 32, 292 | "metadata": {}, 293 | "output_type": "execute_result" 294 | } 295 | ], 296 | "source": [ 297 | "# Code to start you off and manipulate the data. 
.T is transpose - swap columns and rows\n", 298 | "import pandas as pd\n", 299 | "\n", 300 | "df = pd.read_csv(\"cost-of-living.csv\", index_col=0).T.reset_index()\n", 301 | "df.head()" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "## Rename column" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 1, 314 | "metadata": { 315 | "ExecuteTime": { 316 | "end_time": "2020-02-03T02:16:15.578519Z", 317 | "start_time": "2020-02-03T02:16:15.574529Z" 318 | } 319 | }, 320 | "outputs": [], 321 | "source": [ 322 | "# your code here" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "## Get city and country" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 2, 335 | "metadata": { 336 | "ExecuteTime": { 337 | "end_time": "2020-02-03T02:16:18.088160Z", 338 | "start_time": "2020-02-03T02:16:18.084161Z" 339 | } 340 | }, 341 | "outputs": [], 342 | "source": [ 343 | "# your code here" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 3, 349 | "metadata": { 350 | "ExecuteTime": { 351 | "end_time": "2020-02-03T02:16:46.755343Z", 352 | "start_time": "2020-02-03T02:16:46.752351Z" 353 | } 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "# And - if needed - correct for the US including states and nowhere else doing it" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "## Figure out which country has the most cities" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 4, 370 | "metadata": { 371 | "ExecuteTime": { 372 | "end_time": "2020-02-03T02:16:50.046784Z", 373 | "start_time": "2020-02-03T02:16:50.042796Z" 374 | } 375 | }, 376 | "outputs": [], 377 | "source": [ 378 | "# your code here" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "## Create a subset of only that 
country" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": 5, 391 | "metadata": { 392 | "ExecuteTime": { 393 | "end_time": "2020-02-03T02:16:54.606541Z", 394 | "start_time": "2020-02-03T02:16:54.602530Z" 395 | } 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "# your code here" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "## Sort by housing accommodation" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 8, 412 | "metadata": { 413 | "ExecuteTime": { 414 | "end_time": "2020-02-03T02:17:07.409143Z", 415 | "start_time": "2020-02-03T02:17:07.406151Z" 416 | } 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "col = \"Apartment (1 bedroom) in City Centre\"\n", 421 | "# your code here" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "## Despair over the cost of housing" 429 | ] 430 | } 431 | ], 432 | "metadata": { 433 | "kernelspec": { 434 | "display_name": "Python 3", 435 | "language": "python", 436 | "name": "python3" 437 | }, 438 | "language_info": { 439 | "codemirror_mode": { 440 | "name": "ipython", 441 | "version": 3 442 | }, 443 | "file_extension": ".py", 444 | "mimetype": "text/x-python", 445 | "name": "python", 446 | "nbconvert_exporter": "python", 447 | "pygments_lexer": "ipython3", 448 | "version": "3.7.3" 449 | } 450 | }, 451 | "nbformat": 4, 452 | "nbformat_minor": 2 453 | } 454 | -------------------------------------------------------------------------------- /practiceResource/Answers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Optional Exercise - Data loading\n", 8 | "\n", 9 | "Here's a very short exercise with five files to load in. 
None of these examples should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load in with the defaults (and they don't work), and then how to change your input arguments to make them load correctly.\n", 10 | "\n", 11 | "The files to attempt to load in are:\n", 12 | "\n", 13 | "1. example.pkl\n", 14 | "2. example_1.csv\n", 15 | "3. example_2.csv\n", 16 | "4. example_3.csv\n", 17 | "5. example_4.csv" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": { 24 | "ExecuteTime": { 25 | "end_time": "2020-02-02T07:24:53.012402Z", 26 | "start_time": "2020-02-02T07:24:52.651384Z" 27 | } 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "import pandas as pd" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 4, 37 | "metadata": { 38 | "ExecuteTime": { 39 | "end_time": "2020-02-02T07:24:57.184281Z", 40 | "start_time": "2020-02-02T07:24:57.169320Z" 41 | } 42 | }, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/html": [ 47 | "<div>
\n", 48 | "\n", 61 | "\n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | "
namegenderageoaths
0KaladinMale20.03.0
1ShallanFemale17.02.0
2DalinarMale53.03.0
3SzethMale35.00.0
4HoidMaleNaNNaN
5JashnahFemale34.03.0
\n", 116 | "
" 117 | ], 118 | "text/plain": [ 119 | " name gender age oaths\n", 120 | "0 Kaladin Male 20.0 3.0\n", 121 | "1 Shallan Female 17.0 2.0\n", 122 | "2 Dalinar Male 53.0 3.0\n", 123 | "3 Szeth Male 35.0 0.0\n", 124 | "4 Hoid Male NaN NaN\n", 125 | "5 Jashnah Female 34.0 3.0" 126 | ] 127 | }, 128 | "execution_count": 4, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "pd.read_pickle(\"example.pkl\")" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": { 141 | "ExecuteTime": { 142 | "end_time": "2020-02-02T07:34:57.663844Z", 143 | "start_time": "2020-02-02T07:34:57.648884Z" 144 | } 145 | }, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/html": [ 150 | "
\n", 151 | "\n", 164 | "\n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | "
namegenderageoaths
0KaladinMale20.03.0
1ShallanFemale17.02.0
2DalinarMale53.03.0
3SzethMale35.00.0
4HoidMaleNaNNaN
5JashnahFemale34.03.0
\n", 219 | "
" 220 | ], 221 | "text/plain": [ 222 | " name gender age oaths\n", 223 | "0 Kaladin Male 20.0 3.0\n", 224 | "1 Shallan Female 17.0 2.0\n", 225 | "2 Dalinar Male 53.0 3.0\n", 226 | "3 Szeth Male 35.0 0.0\n", 227 | "4 Hoid Male NaN NaN\n", 228 | "5 Jashnah Female 34.0 3.0" 229 | ] 230 | }, 231 | "execution_count": 6, 232 | "metadata": {}, 233 | "output_type": "execute_result" 234 | } 235 | ], 236 | "source": [ 237 | "pd.read_csv(\"example_1.csv\")" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 8, 243 | "metadata": { 244 | "ExecuteTime": { 245 | "end_time": "2020-02-02T07:35:19.179516Z", 246 | "start_time": "2020-02-02T07:35:19.168566Z" 247 | } 248 | }, 249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/html": [ 253 | "
\n", 254 | "\n", 267 | "\n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | "
namegenderageoaths
0KaladinMale20.03.0
1ShallanFemale17.02.0
2DalinarMale53.03.0
3SzethMale35.00.0
4HoidMaleNaNNaN
5JashnahFemale34.03.0
\n", 322 | "
" 323 | ], 324 | "text/plain": [ 325 | " name gender age oaths\n", 326 | "0 Kaladin Male 20.0 3.0\n", 327 | "1 Shallan Female 17.0 2.0\n", 328 | "2 Dalinar Male 53.0 3.0\n", 329 | "3 Szeth Male 35.0 0.0\n", 330 | "4 Hoid Male NaN NaN\n", 331 | "5 Jashnah Female 34.0 3.0" 332 | ] 333 | }, 334 | "execution_count": 8, 335 | "metadata": {}, 336 | "output_type": "execute_result" 337 | } 338 | ], 339 | "source": [ 340 | "pd.read_csv(\"example_2.csv\", delim_whitespace=True)  # delim_whitespace was deprecated in pandas 2.2; sep=r\"\\s+\" is equivalent" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 11, 346 | "metadata": { 347 | "ExecuteTime": { 348 | "end_time": "2020-02-02T07:37:11.470254Z", 349 | "start_time": "2020-02-02T07:37:11.455294Z" 350 | } 351 | }, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/html": [ 356 | "<div>
\n", 357 | "\n", 370 | "\n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | "
namegenderageoaths
0KaladinMale20.03.0
1ShallanFemale17.02.0
2DalinarMale53.03.0
3SzethMale35.00.0
4HoidMaleNaNNaN
5JashnahFemale34.03.0
\n", 425 | "
" 426 | ], 427 | "text/plain": [ 428 | " name gender age oaths\n", 429 | "0 Kaladin Male 20.0 3.0\n", 430 | "1 Shallan Female 17.0 2.0\n", 431 | "2 Dalinar Male 53.0 3.0\n", 432 | "3 Szeth Male 35.0 0.0\n", 433 | "4 Hoid Male NaN NaN\n", 434 | "5 Jashnah Female 34.0 3.0" 435 | ] 436 | }, 437 | "execution_count": 11, 438 | "metadata": {}, 439 | "output_type": "execute_result" 440 | } 441 | ], 442 | "source": [ 443 | "pd.read_csv(\"example_3.csv\", index_col=0)" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 14, 449 | "metadata": { 450 | "ExecuteTime": { 451 | "end_time": "2020-02-02T07:37:31.812465Z", 452 | "start_time": "2020-02-02T07:37:31.801514Z" 453 | } 454 | }, 455 | "outputs": [ 456 | { 457 | "data": { 458 | "text/html": [ 459 | "
\n", 460 | "\n", 473 | "\n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | "
namegenderageoaths
0KaladinMale20.03.0
1ShallanFemale17.02.0
2DalinarMale53.03.0
3SzethMale35.00.0
4HoidMaleNaNNaN
5JashnahFemale34.03.0
\n", 528 | "
" 529 | ], 530 | "text/plain": [ 531 | " name gender age oaths\n", 532 | "0 Kaladin Male 20.0 3.0\n", 533 | "1 Shallan Female 17.0 2.0\n", 534 | "2 Dalinar Male 53.0 3.0\n", 535 | "3 Szeth Male 35.0 0.0\n", 536 | "4 Hoid Male NaN NaN\n", 537 | "5 Jashnah Female 34.0 3.0" 538 | ] 539 | }, 540 | "execution_count": 14, 541 | "metadata": {}, 542 | "output_type": "execute_result" 543 | } 544 | ], 545 | "source": [ 546 | "pd.read_csv(\"example_4.csv\", sep=\"|\", comment=\"#\")" 547 | ] 548 | } 549 | ], 550 | "metadata": { 551 | "kernelspec": { 552 | "display_name": "Python 3", 553 | "language": "python", 554 | "name": "python3" 555 | }, 556 | "language_info": { 557 | "codemirror_mode": { 558 | "name": "ipython", 559 | "version": 3 560 | }, 561 | "file_extension": ".py", 562 | "mimetype": "text/x-python", 563 | "name": "python", 564 | "nbconvert_exporter": "python", 565 | "pygments_lexer": "ipython3", 566 | "version": "3.7.3" 567 | } 568 | }, 569 | "nbformat": 4, 570 | "nbformat_minor": 2 571 | } 572 | -------------------------------------------------------------------------------- /practiceResource/SavingAndSerialising.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Saving and Serialising a dataframe\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "ExecuteTime": { 15 | "end_time": "2020-02-16T02:55:25.086213Z", 16 | "start_time": "2020-02-16T02:55:23.758762Z" 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import pandas as pd" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 4, 28 | "metadata": { 29 | "ExecuteTime": { 30 | "end_time": "2020-02-16T03:01:00.686674Z", 31 | "start_time": "2020-02-16T03:01:00.668178Z" 32 | } 33 | }, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/html": [ 38 | "
\n", 39 | "\n", 52 | "\n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | "
ABCD
00.0694740.0168390.6076930.960414
10.7555620.7923020.6388260.257696
20.7662770.0490240.2643780.898995
30.2633860.1885900.9770280.101986
40.0521840.3811860.6552440.827316
\n", 100 | "
" 101 | ], 102 | "text/plain": [ 103 | " A B C D\n", 104 | "0 0.069474 0.016839 0.607693 0.960414\n", 105 | "1 0.755562 0.792302 0.638826 0.257696\n", 106 | "2 0.766277 0.049024 0.264378 0.898995\n", 107 | "3 0.263386 0.188590 0.977028 0.101986\n", 108 | "4 0.052184 0.381186 0.655244 0.827316" 109 | ] 110 | }, 111 | "execution_count": 4, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | "source": [ 117 | "# Let's make a new dataframe and save it out using various formats\n", 118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n", 119 | "df.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 6, 125 | "metadata": { 126 | "ExecuteTime": { 127 | "end_time": "2020-02-16T03:03:34.987813Z", 128 | "start_time": "2020-02-16T03:03:34.219248Z" 129 | } 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 7, 139 | "metadata": { 140 | "ExecuteTime": { 141 | "end_time": "2020-02-16T03:04:00.092272Z", 142 | "start_time": "2020-02-16T03:04:00.079738Z" 143 | } 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "df.to_pickle(\"save.pkl\")" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 8, 153 | "metadata": { 154 | "ExecuteTime": { 155 | "end_time": "2020-02-16T03:06:06.874338Z", 156 | "start_time": "2020-02-16T03:06:05.955905Z" 157 | } 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# pip install tables\n", 162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 9, 168 | "metadata": { 169 | "ExecuteTime": { 170 | "end_time": "2020-02-16T03:06:56.305779Z", 171 | "start_time": "2020-02-16T03:06:56.204901Z" 172 | } 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# pip install feather-format\n", 177 | 
"df.to_feather(\"save.fth\")" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 11, 183 | "metadata": { 184 | "ExecuteTime": { 185 | "end_time": "2020-02-16T03:10:46.080056Z", 186 | "start_time": "2020-02-16T03:10:46.075636Z" 187 | } 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n", 192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Now this is a very easy test - it's only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 14, 205 | "metadata": { 206 | "ExecuteTime": { 207 | "end_time": "2020-02-16T03:14:16.764994Z", 208 | "start_time": "2020-02-16T03:14:16.741456Z" 209 | } 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/html": [ 215 | "<div>
\n", 216 | "\n", 229 | "\n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | "
NameYearGroupStatusBirth DateBirth PlaceGenderAlma MaterUndergraduate MajorGraduate MajorMilitary RankMilitary BranchSpace FlightsSpace Flight (hr)Space WalksSpace Walks (hr)MissionsDeath DateDeath Mission
0Joseph M. Acaba2004.019.0Active5/17/1967Inglewood, CAMaleUniversity of California-Santa Barbara; Univer...GeologyGeologyNaNNaN23307213.0STS-119 (Discovery), ISS-31/32 (Soyuz)NaNNaN
1Loren W. ActonNaNNaNRetired3/7/1936Lewiston, MTMaleMontana State University; University of ColoradoEngineering PhysicsSolar PhysicsNaNNaN119000.0STS 51-F (Challenger)NaNNaN
2James C. Adamson1984.010.0Retired3/3/1946Warsaw, NYMaleUS Military Academy; Princeton UniversityEngineeringAerospace EngineeringColonelUS Army (Retired)233400.0STS-28 (Columbia), STS-43 (Atlantis)NaNNaN
3Thomas D. Akers1987.012.0Retired5/20/1951St. Louis, MOMaleUniversity of Missouri-RollaApplied MathematicsApplied MathematicsColonelUS Air Force (Retired)4814429.0STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...NaNNaN
4Buzz Aldrin1963.03.0Retired1/20/1930Montclair, NJMaleUS Military Academy; MITMechanical EngineeringAstronauticsColonelUS Air Force (Retired)228928.0Gemini 12, Apollo 11NaNNaN
\n", 367 | "
" 368 | ], 369 | "text/plain": [ 370 | " Name Year Group Status Birth Date Birth Place Gender \\\n", 371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n", 372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n", 373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n", 374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n", 375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n", 376 | "\n", 377 | " Alma Mater Undergraduate Major \\\n", 378 | "0 University of California-Santa Barbara; Univer... Geology \n", 379 | "1 Montana State University; University of Colorado Engineering Physics \n", 380 | "2 US Military Academy; Princeton University Engineering \n", 381 | "3 University of Missouri-Rolla Applied Mathematics \n", 382 | "4 US Military Academy; MIT Mechanical Engineering \n", 383 | "\n", 384 | " Graduate Major Military Rank Military Branch Space Flights \\\n", 385 | "0 Geology NaN NaN 2 \n", 386 | "1 Solar Physics NaN NaN 1 \n", 387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n", 388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n", 389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n", 390 | "\n", 391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n", 392 | "0 3307 2 13.0 \n", 393 | "1 190 0 0.0 \n", 394 | "2 334 0 0.0 \n", 395 | "3 814 4 29.0 \n", 396 | "4 289 2 8.0 \n", 397 | "\n", 398 | " Missions Death Date Death Mission \n", 399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n", 400 | "1 STS 51-F (Challenger) NaN NaN \n", 401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n", 402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... 
NaN NaN \n", 403 | "4 Gemini 12, Apollo 11 NaN NaN " 404 | ] 405 | }, 406 | "execution_count": 14, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "df = pd.read_csv(\"astronauts.csv\")\n", 413 | "df.head()" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 15, 419 | "metadata": { 420 | "ExecuteTime": { 421 | "end_time": "2020-02-16T03:14:48.250858Z", 422 | "start_time": "2020-02-16T03:14:48.237441Z" 423 | } 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 16, 433 | "metadata": { 434 | "ExecuteTime": { 435 | "end_time": "2020-02-16T03:14:52.892116Z", 436 | "start_time": "2020-02-16T03:14:52.876108Z" 437 | } 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "pd.read_csv(\"save.csv\");" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 17, 447 | "metadata": { 448 | "ExecuteTime": { 449 | "end_time": "2020-02-16T03:15:12.997156Z", 450 | "start_time": "2020-02-16T03:15:12.988669Z" 451 | } 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "df.to_pickle(\"save.pkl\")" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 18, 461 | "metadata": { 462 | "ExecuteTime": { 463 | "end_time": "2020-02-16T03:15:16.375064Z", 464 | "start_time": "2020-02-16T03:15:16.365034Z" 465 | } 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "pd.read_pickle(\"save.pkl\");" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 32, 475 | "metadata": { 476 | "ExecuteTime": { 477 | "end_time": "2020-02-16T03:19:15.617617Z", 478 | "start_time": "2020-02-16T03:19:15.588076Z" 479 | } 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 22, 489 | "metadata": { 
490 | "ExecuteTime": { 491 | "end_time": "2020-02-16T03:15:35.323031Z", 492 | "start_time": "2020-02-16T03:15:35.301528Z" 493 | } 494 | }, 495 | "outputs": [], 496 | "source": [ 497 | "pd.read_hdf(\"save.hdf\");" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 23, 503 | "metadata": { 504 | "ExecuteTime": { 505 | "end_time": "2020-02-16T03:15:47.513253Z", 506 | "start_time": "2020-02-16T03:15:47.499922Z" 507 | } 508 | }, 509 | "outputs": [], 510 | "source": [ 511 | "df.to_feather(\"save.fth\")" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 24, 517 | "metadata": { 518 | "ExecuteTime": { 519 | "end_time": "2020-02-16T03:15:50.574863Z", 520 | "start_time": "2020-02-16T03:15:50.557141Z" 521 | } 522 | }, 523 | "outputs": [], 524 | "source": [ 525 | "pd.read_feather(\"save.fth\");" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 34, 531 | "metadata": { 532 | "ExecuteTime": { 533 | "end_time": "2020-02-16T03:20:03.082982Z", 534 | "start_time": "2020-02-16T03:20:03.062532Z" 535 | } 536 | }, 537 | "outputs": [ 538 | { 539 | "name": "stdout", 540 | "output_type": "stream", 541 | "text": [ 542 | " Volume in drive C is System\n", 543 | " Volume Serial Number is 48F0-A822\n", 544 | "\n", 545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. 
Notebooks and Datasets\\2_Data\\Lectures\n", 546 | "\n", 547 | "16/02/2020 01:18 PM .\n", 548 | "16/02/2020 01:18 PM ..\n", 549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n", 550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n", 551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n", 552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n", 553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n", 554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n", 555 | "01/10/2019 08:15 PM 11,328 heart.csv\n", 556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n", 557 | "16/02/2020 01:14 PM 87,030 save.csv\n", 558 | "16/02/2020 01:15 PM 107,240 save.fth\n", 559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n", 560 | "16/02/2020 01:15 PM 90,693 save.pkl\n", 561 | " 11 File(s) 4,619,659 bytes\n", 562 | " 3 Dir(s) 244,606,853,120 bytes free\n" 563 | ] 564 | } 565 | ], 566 | "source": [ 567 | "%ls" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "### Recap\n", 575 | "\n", 576 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n", 577 | "\n", 578 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine." 
579 | ] 580 | } 581 | ], 582 | "metadata": { 583 | "kernelspec": { 584 | "display_name": "Python 3", 585 | "language": "python", 586 | "name": "python3" 587 | }, 588 | "language_info": { 589 | "codemirror_mode": { 590 | "name": "ipython", 591 | "version": 3 592 | }, 593 | "file_extension": ".py", 594 | "mimetype": "text/x-python", 595 | "name": "python", 596 | "nbconvert_exporter": "python", 597 | "pygments_lexer": "ipython3", 598 | "version": "3.7.3" 599 | } 600 | }, 601 | "nbformat": 4, 602 | "nbformat_minor": 2 603 | } 604 | -------------------------------------------------------------------------------- /practiceResource/.ipynb_checkpoints/dataSavingAndSerialising-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Saving and Serialising a dataframe\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "ExecuteTime": { 15 | "end_time": "2020-02-16T02:55:25.086213Z", 16 | "start_time": "2020-02-16T02:55:23.758762Z" 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import pandas as pd" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 2, 28 | "metadata": { 29 | "ExecuteTime": { 30 | "end_time": "2020-02-16T03:01:00.686674Z", 31 | "start_time": "2020-02-16T03:01:00.668178Z" 32 | } 33 | }, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/html": [ 38 | "
\n", 39 | "\n", 52 | "\n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | "
ABCD
00.8631490.3147320.6697470.702656
10.5465420.5636070.7805320.312281
20.0240580.4731080.4479800.811878
30.8887020.3925240.8301590.452014
40.2667930.4497800.5895460.882689
\n", 100 | "
" 101 | ], 102 | "text/plain": [ 103 | " A B C D\n", 104 | "0 0.863149 0.314732 0.669747 0.702656\n", 105 | "1 0.546542 0.563607 0.780532 0.312281\n", 106 | "2 0.024058 0.473108 0.447980 0.811878\n", 107 | "3 0.888702 0.392524 0.830159 0.452014\n", 108 | "4 0.266793 0.449780 0.589546 0.882689" 109 | ] 110 | }, 111 | "execution_count": 2, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | "source": [ 117 | "# Lets make a new dataframe and save it out using various formats\n", 118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n", 119 | "df.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 3, 125 | "metadata": { 126 | "ExecuteTime": { 127 | "end_time": "2020-02-16T03:03:34.987813Z", 128 | "start_time": "2020-02-16T03:03:34.219248Z" 129 | } 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 4, 139 | "metadata": { 140 | "ExecuteTime": { 141 | "end_time": "2020-02-16T03:04:00.092272Z", 142 | "start_time": "2020-02-16T03:04:00.079738Z" 143 | } 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "df.to_pickle(\"save.pkl\")" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 8, 153 | "metadata": { 154 | "ExecuteTime": { 155 | "end_time": "2020-02-16T03:06:06.874338Z", 156 | "start_time": "2020-02-16T03:06:05.955905Z" 157 | } 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# pip install tables\n", 162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 9, 168 | "metadata": { 169 | "ExecuteTime": { 170 | "end_time": "2020-02-16T03:06:56.305779Z", 171 | "start_time": "2020-02-16T03:06:56.204901Z" 172 | } 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# pip install feather-format\n", 177 | 
"df.to_feather(\"save.fth\")" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 11, 183 | "metadata": { 184 | "ExecuteTime": { 185 | "end_time": "2020-02-16T03:10:46.080056Z", 186 | "start_time": "2020-02-16T03:10:46.075636Z" 187 | } 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n", 192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Now this is a very easy test - it's only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 14, 205 | "metadata": { 206 | "ExecuteTime": { 207 | "end_time": "2020-02-16T03:14:16.764994Z", 208 | "start_time": "2020-02-16T03:14:16.741456Z" 209 | } 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/html": [ 215 | "<div>
\n", 216 | "\n", 229 | "\n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | "
NameYearGroupStatusBirth DateBirth PlaceGenderAlma MaterUndergraduate MajorGraduate MajorMilitary RankMilitary BranchSpace FlightsSpace Flight (hr)Space WalksSpace Walks (hr)MissionsDeath DateDeath Mission
0Joseph M. Acaba2004.019.0Active5/17/1967Inglewood, CAMaleUniversity of California-Santa Barbara; Univer...GeologyGeologyNaNNaN23307213.0STS-119 (Discovery), ISS-31/32 (Soyuz)NaNNaN
1Loren W. ActonNaNNaNRetired3/7/1936Lewiston, MTMaleMontana State University; University of ColoradoEngineering PhysicsSolar PhysicsNaNNaN119000.0STS 51-F (Challenger)NaNNaN
2James C. Adamson1984.010.0Retired3/3/1946Warsaw, NYMaleUS Military Academy; Princeton UniversityEngineeringAerospace EngineeringColonelUS Army (Retired)233400.0STS-28 (Columbia), STS-43 (Atlantis)NaNNaN
3Thomas D. Akers1987.012.0Retired5/20/1951St. Louis, MOMaleUniversity of Missouri-RollaApplied MathematicsApplied MathematicsColonelUS Air Force (Retired)4814429.0STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...NaNNaN
4Buzz Aldrin1963.03.0Retired1/20/1930Montclair, NJMaleUS Military Academy; MITMechanical EngineeringAstronauticsColonelUS Air Force (Retired)228928.0Gemini 12, Apollo 11NaNNaN
\n", 367 | "
" 368 | ], 369 | "text/plain": [ 370 | " Name Year Group Status Birth Date Birth Place Gender \\\n", 371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n", 372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n", 373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n", 374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n", 375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n", 376 | "\n", 377 | " Alma Mater Undergraduate Major \\\n", 378 | "0 University of California-Santa Barbara; Univer... Geology \n", 379 | "1 Montana State University; University of Colorado Engineering Physics \n", 380 | "2 US Military Academy; Princeton University Engineering \n", 381 | "3 University of Missouri-Rolla Applied Mathematics \n", 382 | "4 US Military Academy; MIT Mechanical Engineering \n", 383 | "\n", 384 | " Graduate Major Military Rank Military Branch Space Flights \\\n", 385 | "0 Geology NaN NaN 2 \n", 386 | "1 Solar Physics NaN NaN 1 \n", 387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n", 388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n", 389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n", 390 | "\n", 391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n", 392 | "0 3307 2 13.0 \n", 393 | "1 190 0 0.0 \n", 394 | "2 334 0 0.0 \n", 395 | "3 814 4 29.0 \n", 396 | "4 289 2 8.0 \n", 397 | "\n", 398 | " Missions Death Date Death Mission \n", 399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n", 400 | "1 STS 51-F (Challenger) NaN NaN \n", 401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n", 402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... 
NaN NaN \n", 403 | "4 Gemini 12, Apollo 11 NaN NaN " 404 | ] 405 | }, 406 | "execution_count": 14, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "df = pd.read_csv(\"astronauts.csv\")\n", 413 | "df.head()" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 15, 419 | "metadata": { 420 | "ExecuteTime": { 421 | "end_time": "2020-02-16T03:14:48.250858Z", 422 | "start_time": "2020-02-16T03:14:48.237441Z" 423 | } 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 16, 433 | "metadata": { 434 | "ExecuteTime": { 435 | "end_time": "2020-02-16T03:14:52.892116Z", 436 | "start_time": "2020-02-16T03:14:52.876108Z" 437 | } 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "pd.read_csv(\"save.csv\");" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 17, 447 | "metadata": { 448 | "ExecuteTime": { 449 | "end_time": "2020-02-16T03:15:12.997156Z", 450 | "start_time": "2020-02-16T03:15:12.988669Z" 451 | } 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "df.to_pickle(\"save.pkl\")" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 18, 461 | "metadata": { 462 | "ExecuteTime": { 463 | "end_time": "2020-02-16T03:15:16.375064Z", 464 | "start_time": "2020-02-16T03:15:16.365034Z" 465 | } 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "pd.read_pickle(\"save.pkl\");" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 32, 475 | "metadata": { 476 | "ExecuteTime": { 477 | "end_time": "2020-02-16T03:19:15.617617Z", 478 | "start_time": "2020-02-16T03:19:15.588076Z" 479 | } 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 22, 489 | "metadata": { 
490 | "ExecuteTime": { 491 | "end_time": "2020-02-16T03:15:35.323031Z", 492 | "start_time": "2020-02-16T03:15:35.301528Z" 493 | } 494 | }, 495 | "outputs": [], 496 | "source": [ 497 | "pd.read_hdf(\"save.hdf\");" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 23, 503 | "metadata": { 504 | "ExecuteTime": { 505 | "end_time": "2020-02-16T03:15:47.513253Z", 506 | "start_time": "2020-02-16T03:15:47.499922Z" 507 | } 508 | }, 509 | "outputs": [], 510 | "source": [ 511 | "df.to_feather(\"save.fth\")" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 24, 517 | "metadata": { 518 | "ExecuteTime": { 519 | "end_time": "2020-02-16T03:15:50.574863Z", 520 | "start_time": "2020-02-16T03:15:50.557141Z" 521 | } 522 | }, 523 | "outputs": [], 524 | "source": [ 525 | "pd.read_feather(\"save.fth\");" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 34, 531 | "metadata": { 532 | "ExecuteTime": { 533 | "end_time": "2020-02-16T03:20:03.082982Z", 534 | "start_time": "2020-02-16T03:20:03.062532Z" 535 | } 536 | }, 537 | "outputs": [ 538 | { 539 | "name": "stdout", 540 | "output_type": "stream", 541 | "text": [ 542 | " Volume in drive C is System\n", 543 | " Volume Serial Number is 48F0-A822\n", 544 | "\n", 545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. 
Notebooks and Datasets\2_Data\Lectures\n", 546 | "\n", 547 | "16/02/2020 01:18 PM .\n", 548 | "16/02/2020 01:18 PM ..\n", 549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n", 550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n", 551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n", 552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n", 553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n", 554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n", 555 | "01/10/2019 08:15 PM 11,328 heart.csv\n", 556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n", 557 | "16/02/2020 01:14 PM 87,030 save.csv\n", 558 | "16/02/2020 01:15 PM 107,240 save.fth\n", 559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n", 560 | "16/02/2020 01:15 PM 90,693 save.pkl\n", 561 | " 11 File(s) 4,619,659 bytes\n", 562 | " 3 Dir(s) 244,606,853,120 bytes free\n" 563 | ] 564 | } 565 | ], 566 | "source": [ 567 | "%ls" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "### Recap\n", 575 | "\n", 576 | "In terms of file size, HDF5 is the largest for this example; everything else is approximately equal. For small datasets, CSV is often the easiest as it's human-readable. HDF5 is great for *loading* huge amounts of data quickly. Pickle is faster than CSV, but not human-readable.\n", 577 | "\n", 578 | "Lots of options; don't get hung up on any of them. CSV and pickle are easy and work fine for most cases."
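The size trade-offs in the recap above can be checked directly. A minimal sketch assuming only pandas and NumPy are installed; the `compare.*` file names are illustrative, not from the notebooks:

```python
import os

import numpy as np
import pandas as pd

# A small numeric frame, saved as CSV and as a pickle.
df = pd.DataFrame(np.random.random(size=(10000, 4)), columns=["A", "B", "C", "D"])
df.to_csv("compare.csv", index=False)
df.to_pickle("compare.pkl")

# Compare the on-disk sizes in bytes.
sizes = {ext: os.path.getsize(f"compare.{ext}") for ext in ("csv", "pkl")}
print(sizes)
```

Note that a pickle round-trips the frame exactly, while CSV only preserves values up to their text formatting.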
579 | ] 580 | } 581 | ], 582 | "metadata": { 583 | "kernelspec": { 584 | "display_name": "Python 3", 585 | "language": "python", 586 | "name": "python3" 587 | }, 588 | "language_info": { 589 | "codemirror_mode": { 590 | "name": "ipython", 591 | "version": 3 592 | }, 593 | "file_extension": ".py", 594 | "mimetype": "text/x-python", 595 | "name": "python", 596 | "nbconvert_exporter": "python", 597 | "pygments_lexer": "ipython3", 598 | "version": "3.7.4" 599 | } 600 | }, 601 | "nbformat": 4, 602 | "nbformat_minor": 2 603 | } 604 | -------------------------------------------------------------------------------- /practiceResource/dataloading.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Loading Datasets\n", 8 | "\n", 9 | "We'll be using the Kaggle Heart Disease UCI dataset as an example. You can find it here: https://www.kaggle.com/ronitf/heart-disease-uci\n", 10 | "\n", 11 | "* Manual loading (last resort)\n", 12 | "* `np.loadtxt`\n", 13 | "* `np.genfromtxt`\n", 14 | "* `pd.read_csv`\n", 15 | "* `pd.read_*`\n", 16 | "* `pickle`" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": { 23 | "ExecuteTime": { 24 | "end_time": "2020-02-14T12:39:12.306606Z", 25 | "start_time": "2020-02-14T12:39:12.302988Z" 26 | } 27 | }, 28 | "outputs": [], 29 | "source": [ 30 | "import numpy as np\n", 31 | "import pandas as pd\n", 32 | "import pickle\n", 33 | "\n", 34 | "filename = \"heart.csv\"" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## The best method - pandas' read_csv\n", 42 | "It handles the most edge cases, and deals with datetimes and file issues best."
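As a sketch of why `read_csv` copes well with messy files, here is a pipe-delimited input with a comment line, in the spirit of `example_4.csv` above; the data is inlined via `io.StringIO` so the snippet is self-contained:

```python
import io

import pandas as pd

# Pipe-delimited data with a comment line, similar to example_4.csv.
raw = """# This file contains details guessed from Stormlight Archive
name|gender|age|oaths
Kaladin|Male|20.0|3.0
Hoid|Male|NaN|NaN
"""

# One call handles the delimiter, the comment line, and the NaN markers.
df = pd.read_csv(io.StringIO(raw), sep="|", comment="#")
print(df)
```

Doing the same with `np.loadtxt` would require stripping the comments and handling the missing values yourself.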
43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": { 49 | "ExecuteTime": { 50 | "end_time": "2020-02-14T12:39:55.204452Z", 51 | "start_time": "2020-02-14T12:39:55.185019Z" 52 | } 53 | }, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/html": [ 58 | "
\n", 59 | "\n", 72 | "\n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | "
agesexcptrestbpscholfbsrestecgthalachexangoldpeakslopecathaltarget
063131452331015002.30011
137121302500118703.50021
241011302040017201.42021
356111202360117800.82021
457001203540116310.62021
\n", 180 | "
" 181 | ], 182 | "text/plain": [ 183 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n", 184 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n", 185 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n", 186 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n", 187 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n", 188 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n", 189 | "\n", 190 | " ca thal target \n", 191 | "0 0 1 1 \n", 192 | "1 0 2 1 \n", 193 | "2 0 2 1 \n", 194 | "3 0 2 1 \n", 195 | "4 0 2 1 " 196 | ] 197 | }, 198 | "execution_count": 2, 199 | "metadata": {}, 200 | "output_type": "execute_result" 201 | } 202 | ], 203 | "source": [ 204 | "df = pd.read_csv(filename)\n", 205 | "df.head()" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "## Using numpy's loadtxt and genfromtxt\n", 213 | "\n", 214 | "If you must. Notice it fails without extra arguments - it's not as smart, and we have to tell it what to do. These are designed for loading data saved using `np.savetxt`, and aren't meant to be robust loaders." 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 5, 220 | "metadata": { 221 | "ExecuteTime": { 222 | "end_time": "2020-02-14T12:41:25.154199Z", 223 | "start_time": "2020-02-14T12:41:25.144188Z" 224 | } 225 | }, 226 | "outputs": [ 227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | "[[63. 1. 3. ... 0. 1. 1.]\n", 232 | " [37. 1. 2. ... 0. 2. 1.]\n", 233 | " [41. 0. 1. ... 0. 2. 1.]\n", 234 | " ...\n", 235 | " [68. 1. 0. ... 2. 3. 0.]\n", 236 | " [57. 1. 0. ... 1. 3. 0.]\n", 237 | " [57. 0. 1. ... 1. 2.
0.]]\n" 238 | ] 239 | } 240 | ], 241 | "source": [ 242 | "data = np.loadtxt(filename, delimiter=\",\", skiprows=1)\n", 243 | "print(data)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": 7, 249 | "metadata": { 250 | "ExecuteTime": { 251 | "end_time": "2020-02-14T12:43:04.186497Z", 252 | "start_time": "2020-02-14T12:43:04.159393Z" 253 | } 254 | }, 255 | "outputs": [ 256 | { 257 | "name": "stdout", 258 | "output_type": "stream", 259 | "text": [ 260 | "[(63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1)\n", 261 | " (37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1)\n", 262 | " (41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2, 1)\n", 263 | " (56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2, 1)\n", 264 | " (57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2, 1)\n", 265 | " (57, 1, 0, 140, 192, 0, 1, 148, 0, 0.4, 1, 0, 1, 1)\n", 266 | " (56, 0, 1, 140, 294, 0, 0, 153, 0, 1.3, 1, 0, 2, 1)\n", 267 | " (44, 1, 1, 120, 263, 0, 1, 173, 0, 0. , 2, 0, 3, 1)\n", 268 | " (52, 1, 2, 172, 199, 1, 1, 162, 0, 0.5, 2, 0, 3, 1)\n", 269 | " (57, 1, 2, 150, 168, 0, 1, 174, 0, 1.6, 2, 0, 2, 1)]\n", 270 | "[('age', '\n", 302 | "\n", 315 | "\n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 
| " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | "
agesexcptrestbpscholfbsrestecgthalachexangoldpeakslopecathaltarget
063.01.03.0145.0233.01.00.0150.00.02.30.00.01.01.0
137.01.02.0130.0250.00.01.0187.00.03.50.00.02.01.0
241.00.01.0130.0204.00.00.0172.00.01.42.00.02.01.0
356.01.01.0120.0236.00.01.0178.00.00.82.00.02.01.0
457.00.00.0120.0354.00.01.0163.01.00.62.00.02.01.0
\n", 423 | "" 424 | ], 425 | "text/plain": [ 426 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak \\\n", 427 | "0 63.0 1.0 3.0 145.0 233.0 1.0 0.0 150.0 0.0 2.3 \n", 428 | "1 37.0 1.0 2.0 130.0 250.0 0.0 1.0 187.0 0.0 3.5 \n", 429 | "2 41.0 0.0 1.0 130.0 204.0 0.0 0.0 172.0 0.0 1.4 \n", 430 | "3 56.0 1.0 1.0 120.0 236.0 0.0 1.0 178.0 0.0 0.8 \n", 431 | "4 57.0 0.0 0.0 120.0 354.0 0.0 1.0 163.0 1.0 0.6 \n", 432 | "\n", 433 | " slope ca thal target \n", 434 | "0 0.0 0.0 1.0 1.0 \n", 435 | "1 0.0 0.0 2.0 1.0 \n", 436 | "2 2.0 0.0 2.0 1.0 \n", 437 | "3 2.0 0.0 2.0 1.0 \n", 438 | "4 2.0 0.0 2.0 1.0 " 439 | ] 440 | }, 441 | "execution_count": 8, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "def load_file(filename):\n", 448 | " with open(filename, encoding=\"utf-8-sig\") as f:\n", 449 | " data, cols = [], []\n", 450 | " for i, line in enumerate(f.read().splitlines()):\n", 451 | " if i == 0:\n", 452 | " cols += line.split(\",\")\n", 453 | " else:\n", 454 | " data.append([float(x) for x in line.split(\",\")])\n", 455 | " df = pd.DataFrame(data, columns=cols)\n", 456 | " return df\n", 457 | "load_file(filename).head()" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "## Pickles!\n", 465 | "There is some danger in using pickles, as encodings can change between versions. Use an industry standard like HDF5 instead if you can. Note that if you're working with dataframes, don't use Python's `pickle` directly; pandas has its own implementation - `df.to_pickle` and `pd.read_pickle`. The underlying algorithm is the same, but it's less code for you to type, and it supports compression." 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 10, 471 | "metadata": { 472 | "ExecuteTime": { 473 | "end_time": "2020-02-14T12:48:34.375437Z", 474 | "start_time": "2020-02-14T12:48:34.359410Z" 475 | } 476 | }, 477 | "outputs": [ 478 | { 479 | "data": { 480 | "text/html": [ 481 | "
\n", 482 | "\n", 495 | "\n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | "
agesexcptrestbpscholfbsrestecgthalachexangoldpeakslopecathaltarget
063131452331015002.30011
137121302500118703.50021
241011302040017201.42021
356111202360117800.82021
457001203540116310.62021
\n", 603 | "
" 604 | ], 605 | "text/plain": [ 606 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n", 607 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n", 608 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n", 609 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n", 610 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n", 611 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n", 612 | "\n", 613 | " ca thal target \n", 614 | "0 0 1 1 \n", 615 | "1 0 2 1 \n", 616 | "2 0 2 1 \n", 617 | "3 0 2 1 \n", 618 | "4 0 2 1 " 619 | ] 620 | }, 621 | "execution_count": 10, 622 | "metadata": {}, 623 | "output_type": "execute_result" 624 | } 625 | ], 626 | "source": [ 627 | "df = pd.read_pickle(\"heart.pkl\")\n", 628 | "df.head()" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "### Recap\n", 636 | "\n", 637 | "* Use pd.read_csv 99% of the time\n", 638 | "* Use pd.read_* for other cases (pd.read_excel, pd.read_pickle, etc.)\n", 639 | "* If pandas can't handle it, I doubt numpy can\n", 640 | "* If you use a manual function, save your data to a sensible format" 641 | ] 642 | } 643 | ], 644 | "metadata": { 645 | "kernelspec": { 646 | "display_name": "Python 3", 647 | "language": "python", 648 | "name": "python3" 649 | }, 650 | "language_info": { 651 | "codemirror_mode": { 652 | "name": "ipython", 653 | "version": 3 654 | }, 655 | "file_extension": ".py", 656 | "mimetype": "text/x-python", 657 | "name": "python", 658 | "nbconvert_exporter": "python", 659 | "pygments_lexer": "ipython3", 660 | "version": "3.7.4" 661 | } 662 | }, 663 | "nbformat": 4, 664 | "nbformat_minor": 2 665 | } 666 | -------------------------------------------------------------------------------- /practiceResource/dataSavingAndSerialising.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Saving and Serialising a dataframe\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | 
"execution_count": 1, 13 | "metadata": { 14 | "ExecuteTime": { 15 | "end_time": "2020-02-16T02:55:25.086213Z", 16 | "start_time": "2020-02-16T02:55:23.758762Z" 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import pandas as pd" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 2, 28 | "metadata": { 29 | "ExecuteTime": { 30 | "end_time": "2020-02-16T03:01:00.686674Z", 31 | "start_time": "2020-02-16T03:01:00.668178Z" 32 | } 33 | }, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/html": [ 38 | "
\n", 39 | "\n", 52 | "\n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | "
ABCD
00.8631490.3147320.6697470.702656
10.5465420.5636070.7805320.312281
20.0240580.4731080.4479800.811878
30.8887020.3925240.8301590.452014
40.2667930.4497800.5895460.882689
\n", 100 | "
" 101 | ], 102 | "text/plain": [ 103 | " A B C D\n", 104 | "0 0.863149 0.314732 0.669747 0.702656\n", 105 | "1 0.546542 0.563607 0.780532 0.312281\n", 106 | "2 0.024058 0.473108 0.447980 0.811878\n", 107 | "3 0.888702 0.392524 0.830159 0.452014\n", 108 | "4 0.266793 0.449780 0.589546 0.882689" 109 | ] 110 | }, 111 | "execution_count": 2, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | "source": [ 117 | "# Lets make a new dataframe and save it out using various formats\n", 118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n", 119 | "df.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 3, 125 | "metadata": { 126 | "ExecuteTime": { 127 | "end_time": "2020-02-16T03:03:34.987813Z", 128 | "start_time": "2020-02-16T03:03:34.219248Z" 129 | } 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 4, 139 | "metadata": { 140 | "ExecuteTime": { 141 | "end_time": "2020-02-16T03:04:00.092272Z", 142 | "start_time": "2020-02-16T03:04:00.079738Z" 143 | } 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "df.to_pickle(\"save.pkl\")" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 8, 153 | "metadata": { 154 | "ExecuteTime": { 155 | "end_time": "2020-02-16T03:06:06.874338Z", 156 | "start_time": "2020-02-16T03:06:05.955905Z" 157 | } 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# pip install tables\n", 162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 9, 168 | "metadata": { 169 | "ExecuteTime": { 170 | "end_time": "2020-02-16T03:06:56.305779Z", 171 | "start_time": "2020-02-16T03:06:56.204901Z" 172 | } 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# pip install feather-format\n", 177 | 
"df.to_feather(\"save.fth\")" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 11, 183 | "metadata": { 184 | "ExecuteTime": { 185 | "end_time": "2020-02-16T03:10:46.080056Z", 186 | "start_time": "2020-02-16T03:10:46.075636Z" 187 | } 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n", 192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Now this is a very easy test - it's only numeric data. If we add strings and categorical data, things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 7, 205 | "metadata": { 206 | "ExecuteTime": { 207 | "end_time": "2020-02-16T03:14:16.764994Z", 208 | "start_time": "2020-02-16T03:14:16.741456Z" 209 | } 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/html": [ 215 | "
\n", 216 | "\n", 229 | "\n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | "
NameYearGroupStatusBirth DateBirth PlaceGenderAlma MaterUndergraduate MajorGraduate MajorMilitary RankMilitary BranchSpace FlightsSpace Flight (hr)Space WalksSpace Walks (hr)MissionsDeath DateDeath Mission
0Joseph M. Acaba2004.019.0Active5/17/1967Inglewood, CAMaleUniversity of California-Santa Barbara; Univer...GeologyGeologyNaNNaN23307213.0STS-119 (Discovery), ISS-31/32 (Soyuz)NaNNaN
1Loren W. ActonNaNNaNRetired3/7/1936Lewiston, MTMaleMontana State University; University of ColoradoEngineering PhysicsSolar PhysicsNaNNaN119000.0STS 51-F (Challenger)NaNNaN
2James C. Adamson1984.010.0Retired3/3/1946Warsaw, NYMaleUS Military Academy; Princeton UniversityEngineeringAerospace EngineeringColonelUS Army (Retired)233400.0STS-28 (Columbia), STS-43 (Atlantis)NaNNaN
3Thomas D. Akers1987.012.0Retired5/20/1951St. Louis, MOMaleUniversity of Missouri-RollaApplied MathematicsApplied MathematicsColonelUS Air Force (Retired)4814429.0STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...NaNNaN
4Buzz Aldrin1963.03.0Retired1/20/1930Montclair, NJMaleUS Military Academy; MITMechanical EngineeringAstronauticsColonelUS Air Force (Retired)228928.0Gemini 12, Apollo 11NaNNaN
\n", 367 | "
" 368 | ], 369 | "text/plain": [ 370 | " Name Year Group Status Birth Date Birth Place Gender \\\n", 371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n", 372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n", 373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n", 374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n", 375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n", 376 | "\n", 377 | " Alma Mater Undergraduate Major \\\n", 378 | "0 University of California-Santa Barbara; Univer... Geology \n", 379 | "1 Montana State University; University of Colorado Engineering Physics \n", 380 | "2 US Military Academy; Princeton University Engineering \n", 381 | "3 University of Missouri-Rolla Applied Mathematics \n", 382 | "4 US Military Academy; MIT Mechanical Engineering \n", 383 | "\n", 384 | " Graduate Major Military Rank Military Branch Space Flights \\\n", 385 | "0 Geology NaN NaN 2 \n", 386 | "1 Solar Physics NaN NaN 1 \n", 387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n", 388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n", 389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n", 390 | "\n", 391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n", 392 | "0 3307 2 13.0 \n", 393 | "1 190 0 0.0 \n", 394 | "2 334 0 0.0 \n", 395 | "3 814 4 29.0 \n", 396 | "4 289 2 8.0 \n", 397 | "\n", 398 | " Missions Death Date Death Mission \n", 399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n", 400 | "1 STS 51-F (Challenger) NaN NaN \n", 401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n", 402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... 
NaN NaN \n", 403 | "4 Gemini 12, Apollo 11 NaN NaN " 404 | ] 405 | }, 406 | "execution_count": 7, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "df = pd.read_csv(\"astronauts.csv\")\n", 413 | "df.head()" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 8, 419 | "metadata": { 420 | "ExecuteTime": { 421 | "end_time": "2020-02-16T03:14:48.250858Z", 422 | "start_time": "2020-02-16T03:14:48.237441Z" 423 | } 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 9, 433 | "metadata": { 434 | "ExecuteTime": { 435 | "end_time": "2020-02-16T03:14:52.892116Z", 436 | "start_time": "2020-02-16T03:14:52.876108Z" 437 | } 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "pd.read_csv(\"save.csv\");" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 10, 447 | "metadata": { 448 | "ExecuteTime": { 449 | "end_time": "2020-02-16T03:15:12.997156Z", 450 | "start_time": "2020-02-16T03:15:12.988669Z" 451 | } 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "df.to_pickle(\"save.pkl\")" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 11, 461 | "metadata": { 462 | "ExecuteTime": { 463 | "end_time": "2020-02-16T03:15:16.375064Z", 464 | "start_time": "2020-02-16T03:15:16.365034Z" 465 | } 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "pd.read_pickle(\"save.pkl\");" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 12, 475 | "metadata": { 476 | "ExecuteTime": { 477 | "end_time": "2020-02-16T03:19:15.617617Z", 478 | "start_time": "2020-02-16T03:19:15.588076Z" 479 | } 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 13, 489 | "metadata": { 490 | 
"ExecuteTime": { 491 | "end_time": "2020-02-16T03:15:35.323031Z", 492 | "start_time": "2020-02-16T03:15:35.301528Z" 493 | } 494 | }, 495 | "outputs": [], 496 | "source": [ 497 | "pd.read_hdf(\"save.hdf\");" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 14, 503 | "metadata": { 504 | "ExecuteTime": { 505 | "end_time": "2020-02-16T03:15:47.513253Z", 506 | "start_time": "2020-02-16T03:15:47.499922Z" 507 | } 508 | }, 509 | "outputs": [ 510 | { 511 | "ename": "ImportError", 512 | "evalue": "Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow.", 513 | "output_type": "error", 514 | "traceback": [ 515 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 516 | "\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)", 517 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"save.fth\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", 518 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(self, fname)\u001b[0m\n\u001b[0;32m 2135\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mio\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfeather_format\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mto_feather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2136\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2137\u001b[1;33m \u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2138\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2139\u001b[0m def to_parquet(\n", 519 | 
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\feather_format.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(df, path)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \"\"\"\n\u001b[1;32m---> 23\u001b[1;33m \u001b[0mimport_optional_dependency\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"pyarrow\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpyarrow\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mfeather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", 520 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\compat\\_optional.py\u001b[0m in \u001b[0;36mimport_optional_dependency\u001b[1;34m(name, extra, raise_on_missing, on_version)\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 92\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mraise_on_missing\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 93\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mextra\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mextra\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 94\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 95\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", 521 | 
"\u001b[1;31mImportError\u001b[0m: Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow." 522 | ] 523 | } 524 | ], 525 | "source": [ 526 | "df.to_feather(\"save.fth\")" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": { 533 | "ExecuteTime": { 534 | "end_time": "2020-02-16T03:15:50.574863Z", 535 | "start_time": "2020-02-16T03:15:50.557141Z" 536 | } 537 | }, 538 | "outputs": [], 539 | "source": [ 540 | "pd.read_feather(\"save.fth\");" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 16, 546 | "metadata": { 547 | "ExecuteTime": { 548 | "end_time": "2020-02-16T03:20:03.082982Z", 549 | "start_time": "2020-02-16T03:20:03.062532Z" 550 | } 551 | }, 552 | "outputs": [ 553 | { 554 | "name": "stdout", 555 | "output_type": "stream", 556 | "text": [ 557 | " Volume in drive G is HDD Storge 2\n", 558 | " Volume Serial Number is D2CA-B02B\n", 559 | "\n", 560 | " Directory of G:\\Qaurter2Notebooks\\piaic_q2_class_reseouces\\practiceResource\n", 561 | "\n", 562 | "01/10/2021 12:15 PM .\n", 563 | "01/10/2021 12:15 PM ..\n", 564 | "01/10/2021 12:12 PM .ipynb_checkpoints\n", 565 | "01/09/2021 10:33 AM 15,377 Answers.ipynb\n", 566 | "01/10/2021 12:03 PM 81,593 astronauts.csv\n", 567 | "01/10/2021 12:10 PM 33,930 dataInspecting.ipynb\n", 568 | "01/10/2021 12:04 PM 19,860 dataloading.ipynb\n", 569 | "01/10/2021 12:14 PM 18,591 dataSavingAndSerialising.ipynb\n", 570 | "01/10/2021 11:57 AM 11,328 heart.csv\n", 571 | "01/09/2021 10:31 AM 35,216 heart.pkl\n", 572 | "01/09/2021 10:53 AM 32,414 NumpyVPandas.ipynb\n", 573 | "01/09/2021 10:33 AM 2,812 Questions.ipynb\n", 574 | "01/10/2021 12:14 PM 87,030 save.csv\n", 575 | "01/10/2021 12:15 PM 801,617 save.hdf\n", 576 | "01/10/2021 12:15 PM 90,693 save.pkl\n", 577 | "01/09/2021 10:30 AM 18,594 SavingAndSerialising.ipynb\n", 578 | " 13 File(s) 1,249,055 bytes\n", 579 | " 3 Dir(s) 391,598,575,616 bytes free\n" 580 | ] 581 | } 582 | ], 583 | 
"source": [ 584 | "%ls" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "### Recap\n", 592 | "\n", 593 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n", 594 | "\n", 595 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine." 596 | ] 597 | } 598 | ], 599 | "metadata": { 600 | "kernelspec": { 601 | "display_name": "Python 3", 602 | "language": "python", 603 | "name": "python3" 604 | }, 605 | "language_info": { 606 | "codemirror_mode": { 607 | "name": "ipython", 608 | "version": 3 609 | }, 610 | "file_extension": ".py", 611 | "mimetype": "text/x-python", 612 | "name": "python", 613 | "nbconvert_exporter": "python", 614 | "pygments_lexer": "ipython3", 615 | "version": "3.7.4" 616 | } 617 | }, 618 | "nbformat": 4, 619 | "nbformat_minor": 2 620 | } 621 | -------------------------------------------------------------------------------- /practiceResource/dataMaipulation/5_Basics_ApplyMapVectorised.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Basics - Apply, Map and Vectorised Functions" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "ExecuteTime": { 15 | "end_time": "2020-02-22T02:40:07.096900Z", 16 | "start_time": "2020-02-22T02:40:04.211321Z" 17 | } 18 | }, 19 | "outputs": [ 20 | { 21 | "data": { 22 | "text/html": [ 23 | "
\n", 24 | "\n", 37 | "\n", 38 | " \n", 39 | " \n", 40 | " \n", 41 | " \n", 42 | " \n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | "
ABC
0-1.01-0.700.62
10.780.050.68
20.38-0.05-0.07
3-1.55-0.19-0.08
\n", 73 | "
" 74 | ], 75 | "text/plain": [ 76 | " A B C\n", 77 | "0 -1.01 -0.70 0.62\n", 78 | "1 0.78 0.05 0.68\n", 79 | "2 0.38 -0.05 -0.07\n", 80 | "3 -1.55 -0.19 -0.08" 81 | ] 82 | }, 83 | "execution_count": 1, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "import pandas as pd\n", 90 | "import numpy as np\n", 91 | "\n", 92 | "data = np.round(np.random.normal(size=(4, 3)), 2)\n", 93 | "df = pd.DataFrame(data, columns=[\"A\", \"B\", \"C\"])\n", 94 | "df.head()" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "## Apply\n", 102 | "\n", 103 | "Used to execute an arbitrary function again an entire dataframe, or a subection. Applies in a vectorised fashion." 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 3, 109 | "metadata": { 110 | "ExecuteTime": { 111 | "end_time": "2020-02-22T03:02:25.417815Z", 112 | "start_time": "2020-02-22T03:02:25.407038Z" 113 | } 114 | }, 115 | "outputs": [ 116 | { 117 | "data": { 118 | "text/html": [ 119 | "
\n", 120 | "\n", 133 | "\n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | "
ABC
02.011.701.62
11.781.051.68
21.381.051.07
32.551.191.08
\n", 169 | "
" 170 | ], 171 | "text/plain": [ 172 | " A B C\n", 173 | "0 2.01 1.70 1.62\n", 174 | "1 1.78 1.05 1.68\n", 175 | "2 1.38 1.05 1.07\n", 176 | "3 2.55 1.19 1.08" 177 | ] 178 | }, 179 | "execution_count": 3, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "df.apply(lambda x: 1 + np.abs(x))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 4, 191 | "metadata": { 192 | "ExecuteTime": { 193 | "end_time": "2020-02-22T03:02:55.820553Z", 194 | "start_time": "2020-02-22T03:02:55.814335Z" 195 | } 196 | }, 197 | "outputs": [ 198 | { 199 | "data": { 200 | "text/plain": [ 201 | "0 1.01\n", 202 | "1 0.78\n", 203 | "2 0.38\n", 204 | "3 1.55\n", 205 | "Name: A, dtype: float64" 206 | ] 207 | }, 208 | "execution_count": 4, 209 | "metadata": {}, 210 | "output_type": "execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "df.A.apply(np.abs)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 6, 220 | "metadata": { 221 | "ExecuteTime": { 222 | "end_time": "2020-02-22T03:04:42.256485Z", 223 | "start_time": "2020-02-22T03:04:42.253987Z" 224 | } 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "#def double_if_positive(x):\n", 229 | "# if x > 0:\n", 230 | "# return 2 * x\n", 231 | "# return x\n", 232 | "#\n", 233 | "#df.apply(double_if_positive)" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 7, 239 | "metadata": { 240 | "ExecuteTime": { 241 | "end_time": "2020-02-22T03:05:04.690134Z", 242 | "start_time": "2020-02-22T03:05:04.662382Z" 243 | } 244 | }, 245 | "outputs": [ 246 | { 247 | "data": { 248 | "text/html": [ 249 | "
\n", 250 | "\n", 263 | "\n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | "
ABC
0-1.01-0.701.24
11.560.101.36
20.76-0.05-0.07
3-1.55-0.19-0.08
\n", 299 | "
" 300 | ], 301 | "text/plain": [ 302 | " A B C\n", 303 | "0 -1.01 -0.70 1.24\n", 304 | "1 1.56 0.10 1.36\n", 305 | "2 0.76 -0.05 -0.07\n", 306 | "3 -1.55 -0.19 -0.08" 307 | ] 308 | }, 309 | "execution_count": 7, 310 | "metadata": {}, 311 | "output_type": "execute_result" 312 | } 313 | ], 314 | "source": [ 315 | "def double_if_positive(x):\n", 316 | " x[x > 0] *= 2\n", 317 | " return x\n", 318 | "\n", 319 | "df.apply(double_if_positive)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 8, 325 | "metadata": { 326 | "ExecuteTime": { 327 | "end_time": "2020-02-22T03:05:32.894881Z", 328 | "start_time": "2020-02-22T03:05:32.887394Z" 329 | } 330 | }, 331 | "outputs": [ 332 | { 333 | "data": { 334 | "text/html": [ 335 | "
\n", 336 | "\n", 349 | "\n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | "
ABC
0-1.01-0.701.24
11.560.101.36
20.76-0.05-0.07
3-1.55-0.19-0.08
\n", 385 | "
" 386 | ], 387 | "text/plain": [ 388 | " A B C\n", 389 | "0 -1.01 -0.70 1.24\n", 390 | "1 1.56 0.10 1.36\n", 391 | "2 0.76 -0.05 -0.07\n", 392 | "3 -1.55 -0.19 -0.08" 393 | ] 394 | }, 395 | "execution_count": 8, 396 | "metadata": {}, 397 | "output_type": "execute_result" 398 | } 399 | ], 400 | "source": [ 401 | "df" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": 11, 407 | "metadata": { 408 | "ExecuteTime": { 409 | "end_time": "2020-02-22T03:07:51.904072Z", 410 | "start_time": "2020-02-22T03:07:51.894055Z" 411 | } 412 | }, 413 | "outputs": [ 414 | { 415 | "data": { 416 | "text/html": [ 417 | "
\n", 418 | "\n", 431 | "\n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | "
ABC
0-1.01-0.702.48
13.120.202.72
21.52-0.05-0.07
3-1.55-0.19-0.08
\n", 467 | "
" 468 | ], 469 | "text/plain": [ 470 | " A B C\n", 471 | "0 -1.01 -0.70 2.48\n", 472 | "1 3.12 0.20 2.72\n", 473 | "2 1.52 -0.05 -0.07\n", 474 | "3 -1.55 -0.19 -0.08" 475 | ] 476 | }, 477 | "execution_count": 11, 478 | "metadata": {}, 479 | "output_type": "execute_result" 480 | } 481 | ], 482 | "source": [ 483 | "def double_if_positive(x):\n", 484 | " x = x.copy()\n", 485 | " x[x > 0] *= 2\n", 486 | " return x\n", 487 | "\n", 488 | "df.apply(double_if_positive, raw=True)" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": {}, 494 | "source": [ 495 | "## Map\n", 496 | "\n", 497 | "Similar to apply, but operators on Series, and uses dictionary based inputs rather than an array of values.\n" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 12, 503 | "metadata": { 504 | "ExecuteTime": { 505 | "end_time": "2020-02-22T03:09:07.652877Z", 506 | "start_time": "2020-02-22T03:09:07.646810Z" 507 | } 508 | }, 509 | "outputs": [], 510 | "source": [ 511 | "series = pd.Series([\"Steve\", \"Alex\", \"Jess\", \"Mark\"])" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 13, 517 | "metadata": { 518 | "ExecuteTime": { 519 | "end_time": "2020-02-22T03:09:19.239855Z", 520 | "start_time": "2020-02-22T03:09:19.231863Z" 521 | } 522 | }, 523 | "outputs": [ 524 | { 525 | "data": { 526 | "text/plain": [ 527 | "0 Stephen\n", 528 | "1 NaN\n", 529 | "2 NaN\n", 530 | "3 NaN\n", 531 | "dtype: object" 532 | ] 533 | }, 534 | "execution_count": 13, 535 | "metadata": {}, 536 | "output_type": "execute_result" 537 | } 538 | ], 539 | "source": [ 540 | "series.map({\"Steve\": \"Stephen\"})" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 14, 546 | "metadata": { 547 | "ExecuteTime": { 548 | "end_time": "2020-02-22T03:10:19.253698Z", 549 | "start_time": "2020-02-22T03:10:19.247477Z" 550 | } 551 | }, 552 | "outputs": [ 553 | { 554 | "data": { 555 | "text/plain": [ 556 | "0 I am Steve\n", 557 | "1 I am 
Alex\n", 558 | "2 I am Jess\n", 559 | "3 I am Mark\n", 560 | "dtype: object" 561 | ] 562 | }, 563 | "execution_count": 14, 564 | "metadata": {}, 565 | "output_type": "execute_result" 566 | } 567 | ], 568 | "source": [ 569 | "series.map(lambda d: f\"I am {d}\")" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": { 575 | "ExecuteTime": { 576 | "end_time": "2020-02-22T03:12:35.912759Z", 577 | "start_time": "2020-02-22T03:12:35.902370Z" 578 | } 579 | }, 580 | "source": [ 581 | "## Vectorised functions\n", 582 | "\n", 583 | "Pandas and numpy obviously have tons of these, here are some examples" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 17, 589 | "metadata": { 590 | "ExecuteTime": { 591 | "end_time": "2020-02-22T03:14:11.987446Z", 592 | "start_time": "2020-02-22T03:14:11.974356Z" 593 | } 594 | }, 595 | "outputs": [ 596 | { 597 | "data": { 598 | "text/html": [ 599 | "
\n", 600 | "\n", 613 | "\n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | "
ABC
0-1.01-0.701.24
11.560.101.36
20.76-0.05-0.07
3-1.55-0.19-0.08
\n", 649 | "
" 650 | ], 651 | "text/plain": [ 652 | " A B C\n", 653 | "0 -1.01 -0.70 1.24\n", 654 | "1 1.56 0.10 1.36\n", 655 | "2 0.76 -0.05 -0.07\n", 656 | "3 -1.55 -0.19 -0.08" 657 | ] 658 | }, 659 | "metadata": {}, 660 | "output_type": "display_data" 661 | }, 662 | { 663 | "data": { 664 | "text/html": [ 665 | "
\n", 666 | "\n", 679 | "\n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | "
ABC
01.010.701.24
11.560.101.36
20.760.050.07
31.550.190.08
\n", 715 | "
" 716 | ], 717 | "text/plain": [ 718 | " A B C\n", 719 | "0 1.01 0.70 1.24\n", 720 | "1 1.56 0.10 1.36\n", 721 | "2 0.76 0.05 0.07\n", 722 | "3 1.55 0.19 0.08" 723 | ] 724 | }, 725 | "metadata": {}, 726 | "output_type": "display_data" 727 | } 728 | ], 729 | "source": [ 730 | "display(df, df.abs())" 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": 18, 736 | "metadata": { 737 | "ExecuteTime": { 738 | "end_time": "2020-02-22T03:14:53.996400Z", 739 | "start_time": "2020-02-22T03:14:53.992364Z" 740 | } 741 | }, 742 | "outputs": [], 743 | "source": [ 744 | "series = pd.Series([\"Obi-Wan Kenobi\", \"Luke Skywalker\", \"Han Solo\", \"Leia Organa\"])" 745 | ] 746 | }, 747 | { 748 | "cell_type": "code", 749 | "execution_count": 20, 750 | "metadata": { 751 | "ExecuteTime": { 752 | "end_time": "2020-02-22T03:15:40.875036Z", 753 | "start_time": "2020-02-22T03:15:40.871022Z" 754 | } 755 | }, 756 | "outputs": [ 757 | { 758 | "data": { 759 | "text/plain": [ 760 | "['Luke', 'Skywalker']" 761 | ] 762 | }, 763 | "execution_count": 20, 764 | "metadata": {}, 765 | "output_type": "execute_result" 766 | } 767 | ], 768 | "source": [ 769 | "\"Luke Skywalker\".split()" 770 | ] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "execution_count": 23, 775 | "metadata": { 776 | "ExecuteTime": { 777 | "end_time": "2020-02-22T03:16:42.001894Z", 778 | "start_time": "2020-02-22T03:16:41.992370Z" 779 | } 780 | }, 781 | "outputs": [ 782 | { 783 | "data": { 784 | "text/html": [ 785 | "
\n", 786 | "\n", 799 | "\n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | "
01
0Obi-WanKenobi
1LukeSkywalker
2HanSolo
3LeiaOrgana
\n", 830 | "
" 831 | ], 832 | "text/plain": [ 833 | " 0 1\n", 834 | "0 Obi-Wan Kenobi\n", 835 | "1 Luke Skywalker\n", 836 | "2 Han Solo\n", 837 | "3 Leia Organa" 838 | ] 839 | }, 840 | "execution_count": 23, 841 | "metadata": {}, 842 | "output_type": "execute_result" 843 | } 844 | ], 845 | "source": [ 846 | "series.str.split(expand=True)" 847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "execution_count": 24, 852 | "metadata": { 853 | "ExecuteTime": { 854 | "end_time": "2020-02-22T03:17:28.038500Z", 855 | "start_time": "2020-02-22T03:17:28.033999Z" 856 | } 857 | }, 858 | "outputs": [ 859 | { 860 | "data": { 861 | "text/plain": [ 862 | "0 False\n", 863 | "1 True\n", 864 | "2 False\n", 865 | "3 False\n", 866 | "dtype: bool" 867 | ] 868 | }, 869 | "execution_count": 24, 870 | "metadata": {}, 871 | "output_type": "execute_result" 872 | } 873 | ], 874 | "source": [ 875 | "series.str.contains(\"Skywalker\")" 876 | ] 877 | }, 878 | { 879 | "cell_type": "code", 880 | "execution_count": 26, 881 | "metadata": { 882 | "ExecuteTime": { 883 | "end_time": "2020-02-22T03:18:20.707962Z", 884 | "start_time": "2020-02-22T03:18:20.702104Z" 885 | } 886 | }, 887 | "outputs": [ 888 | { 889 | "data": { 890 | "text/plain": [ 891 | "0 [OBI-WAN, KENOBI]\n", 892 | "1 [LUKE, SKYWALKER]\n", 893 | "2 [HAN, SOLO]\n", 894 | "3 [LEIA, ORGANA]\n", 895 | "dtype: object" 896 | ] 897 | }, 898 | "execution_count": 26, 899 | "metadata": {}, 900 | "output_type": "execute_result" 901 | } 902 | ], 903 | "source": [ 904 | "series.str.upper().str.split()" 905 | ] 906 | }, 907 | { 908 | "cell_type": "markdown", 909 | "metadata": {}, 910 | "source": [ 911 | "## User defined functions\n", 912 | "\n", 913 | "Lets investigate a super simple example of trying to find the hypotenuse given x and y distances.\n" 914 | ] 915 | }, 916 | { 917 | "cell_type": "code", 918 | "execution_count": 27, 919 | "metadata": { 920 | "ExecuteTime": { 921 | "end_time": "2020-02-22T03:19:38.514718Z", 922 | "start_time": 
"2020-02-22T03:19:38.503227Z" 923 | } 924 | }, 925 | "outputs": [], 926 | "source": [ 927 | "data2 = np.random.normal(10, 2, size=(100000, 2))\n", 928 | "df2 = pd.DataFrame(data2, columns=[\"x\", \"y\"])" 929 | ] 930 | }, 931 | { 932 | "cell_type": "code", 933 | "execution_count": 28, 934 | "metadata": { 935 | "ExecuteTime": { 936 | "end_time": "2020-02-22T03:20:22.345484Z", 937 | "start_time": "2020-02-22T03:20:22.320297Z" 938 | } 939 | }, 940 | "outputs": [ 941 | { 942 | "name": "stdout", 943 | "output_type": "stream", 944 | "text": [ 945 | "13.385640543875555\n" 946 | ] 947 | } 948 | ], 949 | "source": [ 950 | "hypot = (df2.x**2 + df2.y**2)**0.5\n", 951 | "print(hypot[0])" 952 | ] 953 | }, 954 | { 955 | "cell_type": "code", 956 | "execution_count": 29, 957 | "metadata": { 958 | "ExecuteTime": { 959 | "end_time": "2020-02-22T03:22:05.047787Z", 960 | "start_time": "2020-02-22T03:21:57.547968Z" 961 | } 962 | }, 963 | "outputs": [ 964 | { 965 | "name": "stdout", 966 | "output_type": "stream", 967 | "text": [ 968 | "13.385640543875555\n" 969 | ] 970 | } 971 | ], 972 | "source": [ 973 | "def hypot1(x, y):\n", 974 | " return np.sqrt(x**2 + y**2)\n", 975 | "\n", 976 | "h1 = []\n", 977 | "for index, (x, y) in df2.iterrows():\n", 978 | " h1.append(hypot1(x, y))\n", 979 | "print(h1[0])" 980 | ] 981 | }, 982 | { 983 | "cell_type": "code", 984 | "execution_count": 30, 985 | "metadata": { 986 | "ExecuteTime": { 987 | "end_time": "2020-02-22T03:23:27.324121Z", 988 | "start_time": "2020-02-22T03:23:24.153687Z" 989 | } 990 | }, 991 | "outputs": [ 992 | { 993 | "name": "stdout", 994 | "output_type": "stream", 995 | "text": [ 996 | "13.385640543875555\n" 997 | ] 998 | } 999 | ], 1000 | "source": [ 1001 | "def hypot2(row):\n", 1002 | " return np.sqrt(row.x**2 + row.y**2)\n", 1003 | "\n", 1004 | "h2 = df2.apply(hypot2, axis=1)\n", 1005 | "print(h2[0])" 1006 | ] 1007 | }, 1008 | { 1009 | "cell_type": "code", 1010 | "execution_count": 31, 1011 | "metadata": { 1012 | "ExecuteTime": { 
1013 | "end_time": "2020-02-22T03:24:23.324639Z", 1014 | "start_time": "2020-02-22T03:24:23.313038Z" 1015 | } 1016 | }, 1017 | "outputs": [ 1018 | { 1019 | "name": "stdout", 1020 | "output_type": "stream", 1021 | "text": [ 1022 | "13.385640543875555\n" 1023 | ] 1024 | } 1025 | ], 1026 | "source": [ 1027 | "def hypot3(xs, ys):\n", 1028 | " return np.sqrt(xs**2 + ys**2)\n", 1029 | "h3 = hypot3(df2.x, df2.y)\n", 1030 | "print(h3[0])" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "Vectorising everything you can is the key to speeding up your code. Once you've done that, you should use other tools to investigate. PyCharm Professional has a great optimisation tool built in. Jupyter has %lprun (line profiler) command you can find here: https://github.com/rkern/line_profiler\n", 1038 | "\n", 1039 | "### Recap\n", 1040 | "\n", 1041 | "* apply\n", 1042 | "* map\n", 1043 | "* .str & similar" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "execution_count": null, 1049 | "metadata": {}, 1050 | "outputs": [], 1051 | "source": [] 1052 | } 1053 | ], 1054 | "metadata": { 1055 | "kernelspec": { 1056 | "display_name": "Python 3", 1057 | "language": "python", 1058 | "name": "python3" 1059 | }, 1060 | "language_info": { 1061 | "codemirror_mode": { 1062 | "name": "ipython", 1063 | "version": 3 1064 | }, 1065 | "file_extension": ".py", 1066 | "mimetype": "text/x-python", 1067 | "name": "python", 1068 | "nbconvert_exporter": "python", 1069 | "pygments_lexer": "ipython3", 1070 | "version": "3.7.3" 1071 | } 1072 | }, 1073 | "nbformat": 4, 1074 | "nbformat_minor": 2 1075 | } 1076 | --------------------------------------------------------------------------------