├── practiceResource
│   ├── heart.pkl
│   ├── save.hdf
│   ├── save.pkl
│   ├── data
│   │   ├── example.pkl
│   │   ├── example_1.csv
│   │   ├── example_2.csv
│   │   ├── example_3.csv
│   │   ├── example_4.csv
│   │   ├── CreatingDataFrames.ipynb
│   │   └── heart.csv
│   ├── Questions.ipynb
│   ├── heart.csv
│   ├── dataMaipulation
│   │   ├── Questions.ipynb
│   │   └── 5_Basics_ApplyMapVectorised.ipynb
│   ├── Answers.ipynb
│   ├── SavingAndSerialising.ipynb
│   ├── .ipynb_checkpoints
│   │   └── dataSavingAndSerialising-checkpoint.ipynb
│   ├── dataloading.ipynb
│   └── dataSavingAndSerialising.ipynb
└── NewPracticeResource
    ├── Practice_material_2.ipynb
    └── practice_material.ipynb
/practiceResource/heart.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/heart.pkl
--------------------------------------------------------------------------------
/practiceResource/save.hdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.hdf
--------------------------------------------------------------------------------
/practiceResource/save.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/data/example.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example_1.csv:
--------------------------------------------------------------------------------
1 | name,gender,age,oaths
2 | Kaladin,Male,20.0,3.0
3 | Shallan,Female,17.0,2.0
4 | Dalinar,Male,53.0,3.0
5 | Szeth,Male,35.0,0.0
6 | Hoid,Male,,
7 | Jashnah,Female,34.0,3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_2.csv:
--------------------------------------------------------------------------------
1 | name gender age oaths
2 | Kaladin Male 20.0 3.0
3 | Shallan Female 17.0 2.0
4 | Dalinar Male 53.0 3.0
5 | Szeth Male 35.0 0.0
6 | Hoid Male
7 | Jashnah Female 34.0 3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_3.csv:
--------------------------------------------------------------------------------
1 | ,name,gender,age,oaths
2 | 0,Kaladin,Male,20.0,3.0
3 | 1,Shallan,Female,17.0,2.0
4 | 2,Dalinar,Male,53.0,3.0
5 | 3,Szeth,Male,35.0,0.0
6 | 4,Hoid,Male,,
7 | 5,Jashnah,Female,34.0,3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_4.csv:
--------------------------------------------------------------------------------
1 | # This file contains details guessed from Stormlight Archive
2 | name|gender|age|oaths
3 | Kaladin|Male|20.0|3.0
4 | Shallan|Female|17.0|2.0
5 | Dalinar|Male|53.0|3.0
6 | Szeth|Male|35.0|0.0
7 | # Who knows about Hoid
8 | Hoid|Male|NaN|NaN
9 | Jashnah|Female|34.0|3.0
10 |
--------------------------------------------------------------------------------
/practiceResource/Questions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Optional Exercise - Data loading\n",
8 | "\n",
       9 | "Here's a very short exercise with five files to load in. None of these examples should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load them in with the defaults (and they don't work), and then how to change your input arguments to make it better.\n",
10 | "\n",
11 | "The files to attempt to load in are:\n",
12 | "\n",
      13 | "1. example.pkl\n",
      14 | "2. example_1.csv\n",
      15 | "3. example_2.csv\n",
      16 | "4. example_3.csv\n",
      17 | "5. example_4.csv"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 1,
23 | "metadata": {
24 | "ExecuteTime": {
25 | "end_time": "2020-03-17T04:34:36.761456Z",
26 | "start_time": "2020-03-17T04:34:36.452524Z"
27 | }
28 | },
29 | "outputs": [],
30 | "source": [
31 | "import pandas as pd"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 3,
37 | "metadata": {
38 | "ExecuteTime": {
39 | "end_time": "2020-03-17T04:35:03.643107Z",
40 | "start_time": "2020-03-17T04:35:03.639118Z"
41 | }
42 | },
43 | "outputs": [],
44 | "source": [
45 | "# Load example.pkl"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 4,
51 | "metadata": {
52 | "ExecuteTime": {
53 | "end_time": "2020-03-17T04:35:05.900792Z",
54 | "start_time": "2020-03-17T04:35:05.896809Z"
55 | }
56 | },
57 | "outputs": [],
58 | "source": [
59 | "# Load example_1.csv"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 5,
65 | "metadata": {
66 | "ExecuteTime": {
67 | "end_time": "2020-03-17T04:35:07.023962Z",
68 | "start_time": "2020-03-17T04:35:07.021962Z"
69 | }
70 | },
71 | "outputs": [],
72 | "source": [
73 | "# Load example_2.csv"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 6,
79 | "metadata": {
80 | "ExecuteTime": {
81 | "end_time": "2020-03-17T04:35:07.698522Z",
82 | "start_time": "2020-03-17T04:35:07.694534Z"
83 | }
84 | },
85 | "outputs": [],
86 | "source": [
87 | "# Load example_3.csv"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": 7,
93 | "metadata": {
94 | "ExecuteTime": {
95 | "end_time": "2020-03-17T04:35:08.514409Z",
96 | "start_time": "2020-03-17T04:35:08.511417Z"
97 | }
98 | },
99 | "outputs": [],
100 | "source": [
101 | "# Load example_4.csv"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": []
110 | }
111 | ],
112 | "metadata": {
113 | "kernelspec": {
114 | "display_name": "Python 3",
115 | "language": "python",
116 | "name": "python3"
117 | },
118 | "language_info": {
119 | "codemirror_mode": {
120 | "name": "ipython",
121 | "version": 3
122 | },
123 | "file_extension": ".py",
124 | "mimetype": "text/x-python",
125 | "name": "python",
126 | "nbconvert_exporter": "python",
127 | "pygments_lexer": "ipython3",
128 | "version": "3.7.3"
129 | }
130 | },
131 | "nbformat": 4,
132 | "nbformat_minor": 2
133 | }
134 |
--------------------------------------------------------------------------------
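
A minimal loading sketch for the exercise above, assuming the example files sit in a data/ folder next to the notebook; the argument choices are inferred from the raw file contents shown earlier, not taken from any official answer:

import pandas as pd

df_pkl = pd.read_pickle("data/example.pkl")                    # pickled DataFrame, no extra arguments needed
df1 = pd.read_csv("data/example_1.csv")                        # plain comma-separated file, defaults work
df2 = pd.read_csv("data/example_2.csv", sep=r"\s+")            # whitespace-delimited columns
df3 = pd.read_csv("data/example_3.csv", index_col=0)           # first (unnamed) column is a saved index
df4 = pd.read_csv("data/example_4.csv", sep="|", comment="#")  # pipe-delimited with '#' comment lines
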
/NewPracticeResource/Practice_material_2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
       7 | "# Download the dataset from this website\n",
8 | "\n",
9 | "## https://www.kaggle.com/faressayah/stanford-open-policing-project?select=police_project.csv"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## Description::"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.\n"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## Importing libraries::"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "import pandas as pd\n",
40 | "import numpy as np"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
      47 | "# Use Pandas' read_csv function to open it as a DataFrame"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": []
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "# What does each row represent?\n",
62 | "\n",
63 | "#### hint::\n",
      64 | "head: returns the first n rows (the first 5 by default)."
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": []
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "# How to get the basic statistics of all the columns?"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": []
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
      92 | "# How to check the shape of the dataset?"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": []
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
     106 | "# How to check the type of each column?"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": []
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
     120 | "# How to locate missing values?\n",
     121 | "#### Hint: detect the missing values,\n",
     122 | "#### then calculate their sum for each column.\n"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": []
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": []
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
     143 | "# Dropping any column that only contains missing values."
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": null,
149 | "metadata": {},
150 | "outputs": [],
151 | "source": []
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {},
157 | "outputs": [],
158 | "source": []
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
     164 | "# Do men or women speed more often?"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": null,
170 | "metadata": {},
171 | "outputs": [],
172 | "source": []
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {},
178 | "outputs": [],
179 | "source": []
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
     185 | "# Which year had the fewest stops?"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": []
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": null,
198 | "metadata": {},
199 | "outputs": [],
200 | "source": []
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "# Does gender affect who gets searched during a stop?"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "metadata": {},
213 | "outputs": [],
214 | "source": []
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {},
220 | "outputs": [],
221 | "source": []
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "\n",
228 | "# How does drug activity change by time of day?"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": null,
234 | "metadata": {},
235 | "outputs": [],
236 | "source": []
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": null,
241 | "metadata": {},
242 | "outputs": [],
243 | "source": []
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "# Do most stops occur at night?"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": []
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "metadata": {},
263 | "outputs": [],
264 | "source": []
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": []
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {},
275 | "outputs": [],
276 | "source": []
277 | },
278 | {
279 | "cell_type": "code",
280 | "execution_count": null,
281 | "metadata": {},
282 | "outputs": [],
283 | "source": []
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": []
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": null,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": []
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": null,
302 | "metadata": {},
303 | "outputs": [],
304 | "source": []
305 | }
306 | ],
307 | "metadata": {
308 | "environment": {
309 | "name": "tf-gpu.1-15.m56",
310 | "type": "gcloud",
311 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56"
312 | },
313 | "kernelspec": {
314 | "display_name": "Python 3",
315 | "language": "python",
316 | "name": "python3"
317 | },
318 | "language_info": {
319 | "codemirror_mode": {
320 | "name": "ipython",
321 | "version": 3
322 | },
323 | "file_extension": ".py",
324 | "mimetype": "text/x-python",
325 | "name": "python",
326 | "nbconvert_exporter": "python",
327 | "pygments_lexer": "ipython3",
328 | "version": "3.7.8"
329 | }
330 | },
331 | "nbformat": 4,
332 | "nbformat_minor": 4
333 | }
334 |
--------------------------------------------------------------------------------
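
A rough sketch of how a few of these steps could look in pandas. The file name police_project.csv and column names such as driver_gender, violation, stop_date and county_name are assumptions based on the public Kaggle export, so adapt them to the file you actually download:

import pandas as pd

ri = pd.read_csv("police_project.csv")   # assumed file name

ri.head()                                # each row represents one traffic stop
ri.describe(include="all")               # basic statistics for every column
ri.shape                                 # (number of rows, number of columns)
ri.dtypes                                # type of each column
ri.isnull().sum()                        # missing values per column

# Drop any column that is entirely missing (county_name, if it is all NaN)
ri = ri.dropna(axis="columns", how="all")

# Do men or women speed more often?
ri[ri.violation == "Speeding"].driver_gender.value_counts(normalize=True)

# Which year had the fewest stops?
pd.to_datetime(ri.stop_date).dt.year.value_counts().idxmin()
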
/NewPracticeResource/practice_material.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Practice Assignment :: 01"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## How to import pandas and check the version?"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": []
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": []
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
      35 | "## Import useful libraries"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": []
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": null,
48 | "metadata": {},
49 | "outputs": [],
50 | "source": []
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "## How to create a series from a list, numpy array and dict?"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": []
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": []
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "metadata": {},
76 | "source": [
77 | "## How to convert the index of a series into a column of a dataframe?"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "## hint::\n",
85 | "### Convert the series ser into a dataframe with its index as another column on the dataframe."
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": []
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": []
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "# How to combine many series to form a dataframe?"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": []
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {},
120 | "outputs": [],
121 | "source": []
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "# How to calculate the number of characters in each word in a series?"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": null,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": []
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": []
143 | },
144 | {
145 | "cell_type": "markdown",
146 | "metadata": {},
147 | "source": [
148 | "# How to filter valid emails from a series?"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "## Desired Output::\n",
156 | "1 rameses@egypt.com\n",
157 | "\n",
158 | "2 matt@t.com\n",
159 | "\n",
160 | "3 narendra@modi.com\n"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": null,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": []
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": null,
173 | "metadata": {},
174 | "outputs": [],
175 | "source": []
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
     181 | "# How to replace the spaces in a string with the least frequent character?"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "## Input::\n",
189 | "### my_str = 'dbc deb abed gade'\n",
190 | "## Desired Output::\n",
191 | "### 'dbccdebcabedcgade' # least frequent is 'c'"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": null,
197 | "metadata": {},
198 | "outputs": [],
199 | "source": []
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {},
205 | "outputs": [],
206 | "source": []
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "# How to swap two rows of a dataframe?"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {},
219 | "outputs": [],
220 | "source": []
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {},
226 | "outputs": [],
227 | "source": []
228 | },
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {},
232 | "source": [
233 | "# How to get the positions where values of two columns match?"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": null,
239 | "metadata": {},
240 | "outputs": [],
241 | "source": []
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": []
249 | },
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {},
253 | "source": [
254 | "# How to replace both the diagonals of dataframe with 0?"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": []
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {},
268 | "outputs": [],
269 | "source": []
270 | },
271 | {
272 | "cell_type": "markdown",
273 | "metadata": {},
274 | "source": [
275 | "# How to get the particular group of a groupby dataframe by key?"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "### This is a question related to understanding of grouped dataframe."
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": []
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": null,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": []
298 | },
299 | {
300 | "cell_type": "markdown",
301 | "metadata": {},
302 | "source": [
303 | "# Which column contains the highest number of row-wise maximum values?"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
     310 | "### Obtain the column name with the highest number of row-wise maximums in df."
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": null,
316 | "metadata": {},
317 | "outputs": [],
318 | "source": []
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": []
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {},
331 | "outputs": [],
332 | "source": []
333 | }
334 | ],
335 | "metadata": {
336 | "environment": {
337 | "name": "tf-gpu.1-15.m56",
338 | "type": "gcloud",
339 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56"
340 | },
341 | "kernelspec": {
342 | "display_name": "Python 3",
343 | "language": "python",
344 | "name": "python3"
345 | },
346 | "language_info": {
347 | "codemirror_mode": {
348 | "name": "ipython",
349 | "version": 3
350 | },
351 | "file_extension": ".py",
352 | "mimetype": "text/x-python",
353 | "name": "python",
354 | "nbconvert_exporter": "python",
355 | "pygments_lexer": "ipython3",
356 | "version": "3.7.8"
357 | }
358 | },
359 | "nbformat": 4,
360 | "nbformat_minor": 4
361 | }
362 |
--------------------------------------------------------------------------------
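
A short sketch for some of the Series exercises above, assuming plain pandas and NumPy; the email Series is only an illustrative guess consistent with the desired output shown in the notebook:

import numpy as np
import pandas as pd

print(pd.__version__)                              # check the installed pandas version

# A Series from a list, a NumPy array and a dict
s_list = pd.Series([1, 2, 3])
s_arr = pd.Series(np.arange(3))
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# Convert the index of a Series into a column of a DataFrame
df = s_dict.reset_index(name="value")              # columns: 'index' and 'value'

# Combine several Series into one DataFrame
df2 = pd.concat([s_list, s_arr], axis=1, keys=["left", "right"])

# Number of characters in each word of a Series
lengths = pd.Series(["pandas", "numpy", "python"]).str.len()

# Filter valid-looking emails with a simple regex
emails = pd.Series(["buying books at amazom.com", "rameses@egypt.com",
                    "matt@t.com", "narendra@modi.com"])
valid = emails[emails.str.match(r"[\w.]+@[\w.]+\.[a-z]{2,}")]
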
/practiceResource/data/CreatingDataFrames.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Creating DataFrames\n",
8 | "\n",
9 | "Many ways to do it!"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "ExecuteTime": {
17 | "end_time": "2020-02-16T02:33:23.867017Z",
18 | "start_time": "2020-02-16T02:33:21.139382Z"
19 | }
20 | },
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "import numpy as np"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 5,
30 | "metadata": {
31 | "ExecuteTime": {
32 | "end_time": "2020-02-16T02:48:39.727399Z",
33 | "start_time": "2020-02-16T02:48:39.707843Z"
34 | }
35 | },
36 | "outputs": [
37 | {
38 | "name": "stdout",
39 | "output_type": "stream",
40 | "text": [
41 | "[[0.97699724 0.23250035 0.17454747]\n",
42 | " [0.11011626 0.90673085 0.37222005]\n",
43 | " [0.77665114 0.81701713 0.57427769]\n",
44 | " [0.34080801 0.09617229 0.26027026]\n",
45 | " [0.03694591 0.5385542 0.95945971]]\n"
46 | ]
47 | },
48 | {
49 | "data": {
      50 |       "text/html": [
      51 |        "[HTML table rendering omitted; same data as the text/plain output below]"
     108 |       ],
109 | "text/plain": [
110 | " A B C\n",
111 | "0 0.976997 0.232500 0.174547\n",
112 | "1 0.110116 0.906731 0.372220\n",
113 | "2 0.776651 0.817017 0.574278\n",
114 | "3 0.340808 0.096172 0.260270\n",
115 | "4 0.036946 0.538554 0.959460"
116 | ]
117 | },
118 | "execution_count": 5,
119 | "metadata": {},
120 | "output_type": "execute_result"
121 | }
122 | ],
123 | "source": [
124 | "data = np.random.random(size=(5, 3))\n",
125 | "print(data)\n",
126 | "\n",
127 | "# Common 2D array and columns method\n",
128 | "df = pd.DataFrame(data=data, columns=[\"A\", \"B\", \"C\"])\n",
129 | "df"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 6,
135 | "metadata": {
136 | "ExecuteTime": {
137 | "end_time": "2020-02-16T02:49:34.262524Z",
138 | "start_time": "2020-02-16T02:49:34.252447Z"
139 | }
140 | },
141 | "outputs": [
142 | {
143 | "data": {
     144 | "text/html": [
     145 | "[HTML table rendering omitted; same data as the text/plain output below]"
     186 | ],
187 | "text/plain": [
188 | " A B\n",
189 | "0 1 Sam\n",
190 | "1 2 Alex\n",
191 | "2 3 John"
192 | ]
193 | },
194 | "execution_count": 6,
195 | "metadata": {},
196 | "output_type": "execute_result"
197 | }
198 | ],
199 | "source": [
200 | "# A dictionary of columns\n",
201 | "df = pd.DataFrame(data={\"A\": [1, 2, 3], \"B\": [\"Sam\", \"Alex\", \"John\"]})\n",
202 | "df"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 9,
208 | "metadata": {
209 | "ExecuteTime": {
210 | "end_time": "2020-02-16T02:51:16.389841Z",
211 | "start_time": "2020-02-16T02:51:16.379319Z"
212 | }
213 | },
214 | "outputs": [
215 | {
216 | "data": {
     217 | "text/html": [
     218 | "[HTML table rendering omitted; same data as the text/plain output below]"
     259 | ],
260 | "text/plain": [
261 | " A B\n",
262 | "0 1 Sam\n",
263 | "1 2 Alex\n",
264 | "2 3 John"
265 | ]
266 | },
267 | "execution_count": 9,
268 | "metadata": {},
269 | "output_type": "execute_result"
270 | }
271 | ],
272 | "source": [
     273 | "# Or a list of rows (i.e. tuples) with a structured dtype\n",
     274 | "dtype = [(\"A\", np.int64), (\"B\", (np.str_, 20))]\n",
275 | "data = np.array([(1, \"Sam\"), (2, \"Alex\"), (3, \"John\")], dtype=dtype)\n",
276 | "df = pd.DataFrame(data)\n",
277 | "df"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": 10,
283 | "metadata": {
284 | "ExecuteTime": {
285 | "end_time": "2020-02-16T02:52:39.660112Z",
286 | "start_time": "2020-02-16T02:52:39.651418Z"
287 | }
288 | },
289 | "outputs": [
290 | {
291 | "data": {
     292 | "text/html": [
     293 | "[HTML table rendering omitted; same data as the text/plain output below]"
     334 | ],
335 | "text/plain": [
336 | " A B\n",
337 | "0 1 Sam\n",
338 | "1 2 Alex\n",
339 | "2 3 John"
340 | ]
341 | },
342 | "execution_count": 10,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": [
348 | "# Or the dictionary based version of list of rows\n",
349 | "data = [{\"A\": 1, \"B\": \"Sam\"}, {\"A\": 2, \"B\": \"Alex\"}, {\"A\": 3, \"B\": \"John\"}]\n",
350 | "df = pd.DataFrame(data)\n",
351 | "df"
352 | ]
353 | }
354 | ],
355 | "metadata": {
356 | "kernelspec": {
357 | "display_name": "Python 3",
358 | "language": "python",
359 | "name": "python3"
360 | },
361 | "language_info": {
362 | "codemirror_mode": {
363 | "name": "ipython",
364 | "version": 3
365 | },
366 | "file_extension": ".py",
367 | "mimetype": "text/x-python",
368 | "name": "python",
369 | "nbconvert_exporter": "python",
370 | "pygments_lexer": "ipython3",
371 | "version": "3.7.3"
372 | }
373 | },
374 | "nbformat": 4,
375 | "nbformat_minor": 2
376 | }
377 |
--------------------------------------------------------------------------------
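
One more common construction that the notebook above does not show is a dict of Series; as a sketch, the Series are aligned on their indexes and any missing positions become NaN:

import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series(["Sam", "Alex"], index=["x", "y"])
df = pd.DataFrame({"A": a, "B": b})
print(df)
#    A     B
# x  1   Sam
# y  2  Alex
# z  3   NaN
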
/practiceResource/heart.csv:
--------------------------------------------------------------------------------
1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1
8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1
10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1
11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1
12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1
13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1
15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1
16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1
17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1
18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1
19 | 66,0,3,150,226,0,1,114,0,2.6,0,0,2,1
20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1
21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1
23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1
24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1
25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1
26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1
27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1
28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1
29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1
30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1
31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1
32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1
33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1
34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1
35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1
36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1
37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1
38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1
39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1
40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1
41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1
42 | 51,0,2,140,308,0,0,142,0,1.5,2,1,2,1
43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1
44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1
45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1
47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1
48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1
49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1
50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1
51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1
52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1
53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1
54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1
55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1
56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1
57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1
58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1
59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1
60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1
61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1
62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1
63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1
64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1
65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1
66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1
67 | 35,0,0,138,183,0,1,182,0,1.4,2,0,2,1
68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1
69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1
70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1
71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1
72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1
73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1
74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1
75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1
76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1
78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1
79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1
80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1
81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1
82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1
83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1
84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1
85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1
86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1
87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1
88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1
89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1
90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
91 | 58,0,0,100,248,0,0,122,0,1,1,0,2,1
92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1
93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1
94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1
95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1
96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1
97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1
98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1
99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1
100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1
101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1
102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1
103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1
104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1
105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1
106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1
107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1
108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1
109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1
110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1
111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1
112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1
113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1
114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1
115 | 43,1,0,110,211,0,1,161,0,0,2,0,3,1
116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1
117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1
118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1
119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1
120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1
121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1
122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1
123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1
124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1
125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1
126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1
127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1
128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1
129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1
130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1
131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1
132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1
133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1
134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1
135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1
136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1
137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1
138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1
139 | 62,1,1,128,208,1,0,140,0,0,2,0,2,1
140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1
141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1
142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1
143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1
144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1
145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1
146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1
147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1
148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1
149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1
150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1
151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1
152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1
153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1
154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1
155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1
156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1
157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1
158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1
159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1
160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1
161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1
162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1
163 | 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1
164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1
165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0
168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0
169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0
170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0
171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0
173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0
174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0
175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0
176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0
177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0
178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0
179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0
180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0
181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0
182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0
183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0
184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0
185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0
186 | 50,1,0,150,243,0,0,128,0,2.6,1,0,3,0
187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0
188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0
189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0
190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0
191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0
192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0
193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0
194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0
195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0
197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0
198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0
199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0
200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0
201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0
202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0
203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0
205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0
206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0
207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0
208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0
209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0
210 | 49,1,2,120,188,0,1,139,0,2,1,3,3,0
211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0
212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0
213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0
214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0
215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0
216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0
217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0
218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0
219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0
220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0
221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0
222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0
223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0
224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0
225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0
226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0
227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0
229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0
230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0
231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0
232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0
233 | 57,1,0,165,289,1,0,124,0,1,1,3,3,0
234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0
235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0
236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0
237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0
238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0
239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0
240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0
241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0
242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0
243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0
244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0
245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0
246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0
247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0
248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0
249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0
250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0
251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0
252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0
253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0
254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0
256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0
257 | 45,1,0,142,309,0,0,147,1,0,1,3,3,0
258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0
259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0
260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0
261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0
262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0
263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0
264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0
265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0
266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0
267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0
268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0
269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0
270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0
271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0
272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0
273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0
274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0
275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0
276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0
277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0
278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0
279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0
280 | 58,0,1,136,319,1,0,152,0,0,2,2,2,0
281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0
282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0
283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0
284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0
285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0
286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0
287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0
288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0
289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0
290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0
291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0
292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0
293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0
294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0
296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0
297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0
298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0
299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0
300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
304 | 57,0,1,130,236,0,0,174,0,0,1,1,2,0
305 |
--------------------------------------------------------------------------------
/practiceResource/data/heart.csv:
--------------------------------------------------------------------------------
1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1
8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1
10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1
11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1
12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1
13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1
15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1
16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1
17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1
18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1
19 | 66,0,3,150,226,0,1,114,0,2.6,0,0,2,1
20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1
21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1
23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1
24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1
25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1
26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1
27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1
28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1
29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1
30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1
31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1
32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1
33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1
34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1
35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1
36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1
37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1
38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1
39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1
40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1
41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1
42 | 51,0,2,140,308,0,0,142,0,1.5,2,1,2,1
43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1
44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1
45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1
47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1
48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1
49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1
50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1
51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1
52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1
53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1
54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1
55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1
56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1
57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1
58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1
59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1
60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1
61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1
62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1
63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1
64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1
65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1
66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1
67 | 35,0,0,138,183,0,1,182,0,1.4,2,0,2,1
68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1
69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1
70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1
71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1
72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1
73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1
74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1
75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1
76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1
78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1
79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1
80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1
81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1
82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1
83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1
84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1
85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1
86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1
87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1
88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1
89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1
90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
91 | 58,0,0,100,248,0,0,122,0,1,1,0,2,1
92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1
93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1
94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1
95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1
96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1
97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1
98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1
99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1
100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1
101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1
102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1
103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1
104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1
105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1
106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1
107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1
108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1
109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1
110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1
111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1
112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1
113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1
114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1
115 | 43,1,0,110,211,0,1,161,0,0,2,0,3,1
116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1
117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1
118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1
119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1
120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1
121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1
122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1
123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1
124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1
125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1
126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1
127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1
128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1
129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1
130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1
131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1
132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1
133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1
134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1
135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1
136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1
137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1
138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1
139 | 62,1,1,128,208,1,0,140,0,0,2,0,2,1
140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1
141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1
142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1
143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1
144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1
145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1
146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1
147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1
148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1
149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1
150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1
151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1
152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1
153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1
154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1
155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1
156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1
157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1
158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1
159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1
160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1
161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1
162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1
163 | 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1
164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1
165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0
168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0
169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0
170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0
171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0
173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0
174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0
175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0
176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0
177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0
178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0
179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0
180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0
181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0
182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0
183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0
184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0
185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0
186 | 50,1,0,150,243,0,0,128,0,2.6,1,0,3,0
187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0
188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0
189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0
190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0
191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0
192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0
193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0
194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0
195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0
197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0
198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0
199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0
200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0
201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0
202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0
203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0
205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0
206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0
207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0
208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0
209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0
210 | 49,1,2,120,188,0,1,139,0,2,1,3,3,0
211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0
212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0
213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0
214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0
215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0
216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0
217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0
218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0
219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0
220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0
221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0
222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0
223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0
224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0
225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0
226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0
227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0
229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0
230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0
231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0
232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0
233 | 57,1,0,165,289,1,0,124,0,1,1,3,3,0
234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0
235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0
236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0
237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0
238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0
239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0
240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0
241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0
242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0
243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0
244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0
245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0
246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0
247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0
248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0
249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0
250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0
251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0
252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0
253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0
254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0
256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0
257 | 45,1,0,142,309,0,0,147,1,0,1,3,3,0
258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0
259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0
260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0
261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0
262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0
263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0
264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0
265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0
266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0
267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0
268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0
269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0
270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0
271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0
272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0
273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0
274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0
275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0
276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0
277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0
278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0
279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0
280 | 58,0,1,136,319,1,0,152,0,0,2,2,2,0
281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0
282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0
283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0
284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0
285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0
286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0
287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0
288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0
289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0
290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0
291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0
292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0
293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0
294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0
296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0
297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0
298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0
299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0
300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
304 | 57,0,1,130,236,0,0,174,0,0,1,1,2,0
305 |
--------------------------------------------------------------------------------
/practiceResource/dataMaipulation/Questions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
       7 | "# Extra Practice - Basics\n",
       8 | "\n",
       9 | "In this optional practice session, I thought it would be fun to look at some cost of living data from, you guessed it, Kaggle: https://www.kaggle.com/stephenofarrell/cost-of-living\n",
      10 | "\n",
      11 | "Here are the objectives:\n",
      12 | "\n",
      13 | "1. Rename the \"index\" column to \"location\"\n",
      14 | "2. Utilise apply to generate two new columns from the location - city and country\n",
      15 | "3. Realise the easy solution doesn't work for the United States and create a function for apply to remove specific states.\n",
      16 | "4. Figure out which country has the most cities listed, and create a dataset from only that country\n",
      17 | "5. Sort the dataset by the cost of living 'Apartment (1 bedroom) in City Centre'\n",
      18 | "6. Cry over housing prices if you live in the Bay Area.\n",
19 | "\n",
20 | "After that, feel free to keep playing with the data yourself.\n"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 32,
26 | "metadata": {
27 | "ExecuteTime": {
28 | "end_time": "2020-02-03T02:01:59.091568Z",
29 | "start_time": "2020-02-03T02:01:59.056659Z"
30 | }
31 | },
32 | "outputs": [
33 | {
34 | "data": {
      35 | "text/html": [
      36 | "[HTML rendering of the DataFrame preview (5 rows × 56 columns) omitted; see the text/plain output below]"
     202 | ],
203 | "text/plain": [
204 | " index Meal, Inexpensive Restaurant \\\n",
205 | "0 Saint Petersburg, Russia 7.34 \n",
206 | "1 Istanbul, Turkey 4.58 \n",
207 | "2 Izmir, Turkey 3.06 \n",
208 | "3 Helsinki, Finland 12.00 \n",
209 | "4 Chisinau, Moldova 4.67 \n",
210 | "\n",
211 | " Meal for 2 People, Mid-range Restaurant, Three-course \\\n",
212 | "0 29.35 \n",
213 | "1 15.28 \n",
214 | "2 12.22 \n",
215 | "3 65.00 \n",
216 | "4 20.74 \n",
217 | "\n",
218 | " McMeal at McDonalds (or Equivalent Combo Meal) \\\n",
219 | "0 4.40 \n",
220 | "1 3.82 \n",
221 | "2 3.06 \n",
222 | "3 8.00 \n",
223 | "4 4.15 \n",
224 | "\n",
225 | " Domestic Beer (0.5 liter draught) Imported Beer (0.33 liter bottle) \\\n",
226 | "0 2.20 2.20 \n",
227 | "1 3.06 3.06 \n",
228 | "2 2.29 2.75 \n",
229 | "3 6.50 6.75 \n",
230 | "4 1.04 1.43 \n",
231 | "\n",
232 | " Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \\\n",
233 | "0 0.76 0.53 \n",
234 | "1 0.64 0.24 \n",
235 | "2 0.61 0.22 \n",
236 | "3 2.66 1.89 \n",
237 | "4 0.64 0.44 \n",
238 | "\n",
239 | " Milk (regular), (1 liter) Loaf of Fresh White Bread (500g) ... \\\n",
240 | "0 0.98 0.71 ... \n",
241 | "1 0.71 0.36 ... \n",
242 | "2 0.65 0.38 ... \n",
243 | "3 0.96 2.27 ... \n",
244 | "4 0.68 0.33 ... \n",
245 | "\n",
246 | " Lettuce (1 head) Cappuccino (regular) Rice (white), (1kg) Tomato (1kg) \\\n",
247 | "0 0.86 1.96 0.92 1.91 \n",
248 | "1 0.61 1.84 1.30 0.80 \n",
249 | "2 0.57 1.56 1.31 0.70 \n",
250 | "3 2.30 3.87 2.13 2.91 \n",
251 | "4 0.84 1.25 0.93 1.56 \n",
252 | "\n",
253 | " Banana (1kg) Onion (1kg) \\\n",
254 | "0 0.89 0.48 \n",
255 | "1 1.91 0.62 \n",
256 | "2 1.78 0.58 \n",
257 | "3 1.61 1.25 \n",
258 | "4 1.37 0.59 \n",
259 | "\n",
260 | " Beef Round (1kg) (or Equivalent Back Leg Red Meat) \\\n",
261 | "0 7.18 \n",
262 | "1 9.73 \n",
263 | "2 8.61 \n",
264 | "3 12.34 \n",
265 | "4 5.37 \n",
266 | "\n",
267 | " Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car) \\\n",
268 | "0 19305.29 \n",
269 | "1 20874.72 \n",
270 | "2 20898.83 \n",
271 | "3 24402.77 \n",
272 | "4 17238.13 \n",
273 | "\n",
274 | " Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child \\\n",
275 | "0 411.83 \n",
276 | "1 282.94 \n",
277 | "2 212.18 \n",
278 | "3 351.60 \n",
279 | "4 210.52 \n",
280 | "\n",
281 | " International Primary School, Yearly for 1 Child \n",
282 | "0 5388.86 \n",
283 | "1 6905.43 \n",
284 | "2 4948.41 \n",
285 | "3 1641.00 \n",
286 | "4 2679.30 \n",
287 | "\n",
288 | "[5 rows x 56 columns]"
289 | ]
290 | },
291 | "execution_count": 32,
292 | "metadata": {},
293 | "output_type": "execute_result"
294 | }
295 | ],
296 | "source": [
297 | "# Code to start you off and manipulate the data. .T is transpose - swap columns and rows\n",
298 | "import pandas as pd\n",
299 | "\n",
300 | "df = pd.read_csv(\"cost-of-living.csv\", index_col=0).T.reset_index()\n",
301 | "df.head()"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "## Rename column"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": 1,
314 | "metadata": {
315 | "ExecuteTime": {
316 | "end_time": "2020-02-03T02:16:15.578519Z",
317 | "start_time": "2020-02-03T02:16:15.574529Z"
318 | }
319 | },
320 | "outputs": [],
321 | "source": [
322 | "# your code here"
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "metadata": {},
328 | "source": [
329 | "## Get city and country"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 2,
335 | "metadata": {
336 | "ExecuteTime": {
337 | "end_time": "2020-02-03T02:16:18.088160Z",
338 | "start_time": "2020-02-03T02:16:18.084161Z"
339 | }
340 | },
341 | "outputs": [],
342 | "source": [
343 | "# your code here"
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": 3,
349 | "metadata": {
350 | "ExecuteTime": {
351 | "end_time": "2020-02-03T02:16:46.755343Z",
352 | "start_time": "2020-02-03T02:16:46.752351Z"
353 | }
354 | },
355 | "outputs": [],
356 | "source": [
357 | "# And - if needed - correct for the US including states and nowhere else doing it"
358 | ]
359 | },
360 | {
361 | "cell_type": "markdown",
362 | "metadata": {},
363 | "source": [
364 | "## Figure out which country has the most cities"
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": 4,
370 | "metadata": {
371 | "ExecuteTime": {
372 | "end_time": "2020-02-03T02:16:50.046784Z",
373 | "start_time": "2020-02-03T02:16:50.042796Z"
374 | }
375 | },
376 | "outputs": [],
377 | "source": [
378 | "# your code here"
379 | ]
380 | },
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {},
384 | "source": [
385 | "## Create a subset of only that country"
386 | ]
387 | },
388 | {
389 | "cell_type": "code",
390 | "execution_count": 5,
391 | "metadata": {
392 | "ExecuteTime": {
393 | "end_time": "2020-02-03T02:16:54.606541Z",
394 | "start_time": "2020-02-03T02:16:54.602530Z"
395 | }
396 | },
397 | "outputs": [],
398 | "source": [
399 | "# your code here"
400 | ]
401 | },
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {},
405 | "source": [
406 | "## Sort by housing accommodation"
407 | ]
408 | },
409 | {
410 | "cell_type": "code",
411 | "execution_count": 8,
412 | "metadata": {
413 | "ExecuteTime": {
414 | "end_time": "2020-02-03T02:17:07.409143Z",
415 | "start_time": "2020-02-03T02:17:07.406151Z"
416 | }
417 | },
418 | "outputs": [],
419 | "source": [
420 | "col = \"Apartment (1 bedroom) in City Centre\"\n",
421 | "# your code here"
422 | ]
423 | },
424 | {
425 | "cell_type": "markdown",
426 | "metadata": {},
427 | "source": [
428 | "## Despair over the cost of housing"
429 | ]
430 | }
431 | ],
432 | "metadata": {
433 | "kernelspec": {
434 | "display_name": "Python 3",
435 | "language": "python",
436 | "name": "python3"
437 | },
438 | "language_info": {
439 | "codemirror_mode": {
440 | "name": "ipython",
441 | "version": 3
442 | },
443 | "file_extension": ".py",
444 | "mimetype": "text/x-python",
445 | "name": "python",
446 | "nbconvert_exporter": "python",
447 | "pygments_lexer": "ipython3",
448 | "version": "3.7.3"
449 | }
450 | },
451 | "nbformat": 4,
452 | "nbformat_minor": 2
453 | }
454 |
--------------------------------------------------------------------------------
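The notebook above only ships `# your code here` stubs, so here is a minimal, hedged sketch of one way to work through its objectives. It assumes `cost-of-living.csv` is laid out as in the starter cell, and that US rows carry an extra state token (e.g. "New York, NY, United States") while other locations are just "City, Country" - that second point is an assumption, not something the notebook confirms.

```python
# Sketch only: objectives from dataMaipulation/Questions.ipynb, with an assumed file layout.
import pandas as pd

df = pd.read_csv("cost-of-living.csv", index_col=0).T.reset_index()

# 1. Rename the "index" column to "location"
df = df.rename(columns={"index": "location"})

# 2./3. Use apply to derive city and country. Keeping the first and last
# comma-separated pieces also copes with assumed US rows like
# "New York, NY, United States", where the middle token is a state.
def split_location(location):
    parts = [part.strip() for part in location.split(",")]
    return pd.Series({"city": parts[0], "country": parts[-1]})

df[["city", "country"]] = df["location"].apply(split_location)

# 4. Country with the most cities listed, and a subset of only that country
top_country = df["country"].value_counts().idxmax()
subset = df[df["country"] == top_country]

# 5. Sort that subset by one-bedroom city-centre rent
col = "Apartment (1 bedroom) in City Centre"
print(subset.sort_values(col, ascending=False)[["city", col]].head())
```

Taking the first and last comma-separated pieces sidesteps a special case for the US entirely; an explicit state-stripping function, as objective 3 hints at, works just as well.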
/practiceResource/Answers.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Optional Exercise - Data loading\n",
8 | "\n",
9 | "Heres a very short example for four files to load in. None of these example should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load in with the defaults (and they don't work), and then how to change your input arguments to make it better.\n",
10 | "\n",
11 | "The files to attempt to load in are:\n",
12 | "\n",
13 | "1. example.pkl\n",
14 | "2. example_1.csv\n",
15 | "3. example_2.csv\n",
16 | "3. example_3.csv\n",
17 | "3. example_4.csv"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 2,
23 | "metadata": {
24 | "ExecuteTime": {
25 | "end_time": "2020-02-02T07:24:53.012402Z",
26 | "start_time": "2020-02-02T07:24:52.651384Z"
27 | }
28 | },
29 | "outputs": [],
30 | "source": [
31 | "import pandas as pd"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 4,
37 | "metadata": {
38 | "ExecuteTime": {
39 | "end_time": "2020-02-02T07:24:57.184281Z",
40 | "start_time": "2020-02-02T07:24:57.169320Z"
41 | }
42 | },
43 | "outputs": [
44 | {
45 | "data": {
46 | "text/html": [
47 | "\n",
48 | "\n",
61 | "
\n",
62 | " \n",
63 | " \n",
64 | " | \n",
65 | " name | \n",
66 | " gender | \n",
67 | " age | \n",
68 | " oaths | \n",
69 | "
\n",
70 | " \n",
71 | " \n",
72 | " \n",
73 | " | 0 | \n",
74 | " Kaladin | \n",
75 | " Male | \n",
76 | " 20.0 | \n",
77 | " 3.0 | \n",
78 | "
\n",
79 | " \n",
80 | " | 1 | \n",
81 | " Shallan | \n",
82 | " Female | \n",
83 | " 17.0 | \n",
84 | " 2.0 | \n",
85 | "
\n",
86 | " \n",
87 | " | 2 | \n",
88 | " Dalinar | \n",
89 | " Male | \n",
90 | " 53.0 | \n",
91 | " 3.0 | \n",
92 | "
\n",
93 | " \n",
94 | " | 3 | \n",
95 | " Szeth | \n",
96 | " Male | \n",
97 | " 35.0 | \n",
98 | " 0.0 | \n",
99 | "
\n",
100 | " \n",
101 | " | 4 | \n",
102 | " Hoid | \n",
103 | " Male | \n",
104 | " NaN | \n",
105 | " NaN | \n",
106 | "
\n",
107 | " \n",
108 | " | 5 | \n",
109 | " Jashnah | \n",
110 | " Female | \n",
111 | " 34.0 | \n",
112 | " 3.0 | \n",
113 | "
\n",
114 | " \n",
115 | "
\n",
116 | "
"
117 | ],
118 | "text/plain": [
119 | " name gender age oaths\n",
120 | "0 Kaladin Male 20.0 3.0\n",
121 | "1 Shallan Female 17.0 2.0\n",
122 | "2 Dalinar Male 53.0 3.0\n",
123 | "3 Szeth Male 35.0 0.0\n",
124 | "4 Hoid Male NaN NaN\n",
125 | "5 Jashnah Female 34.0 3.0"
126 | ]
127 | },
128 | "execution_count": 4,
129 | "metadata": {},
130 | "output_type": "execute_result"
131 | }
132 | ],
133 | "source": [
134 | "pd.read_pickle(\"example.pkl\")"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 6,
140 | "metadata": {
141 | "ExecuteTime": {
142 | "end_time": "2020-02-02T07:34:57.663844Z",
143 | "start_time": "2020-02-02T07:34:57.648884Z"
144 | }
145 | },
146 | "outputs": [
147 | {
148 | "data": {
149 | "text/html": [
150 | "\n",
151 | "\n",
164 | "
\n",
165 | " \n",
166 | " \n",
167 | " | \n",
168 | " name | \n",
169 | " gender | \n",
170 | " age | \n",
171 | " oaths | \n",
172 | "
\n",
173 | " \n",
174 | " \n",
175 | " \n",
176 | " | 0 | \n",
177 | " Kaladin | \n",
178 | " Male | \n",
179 | " 20.0 | \n",
180 | " 3.0 | \n",
181 | "
\n",
182 | " \n",
183 | " | 1 | \n",
184 | " Shallan | \n",
185 | " Female | \n",
186 | " 17.0 | \n",
187 | " 2.0 | \n",
188 | "
\n",
189 | " \n",
190 | " | 2 | \n",
191 | " Dalinar | \n",
192 | " Male | \n",
193 | " 53.0 | \n",
194 | " 3.0 | \n",
195 | "
\n",
196 | " \n",
197 | " | 3 | \n",
198 | " Szeth | \n",
199 | " Male | \n",
200 | " 35.0 | \n",
201 | " 0.0 | \n",
202 | "
\n",
203 | " \n",
204 | " | 4 | \n",
205 | " Hoid | \n",
206 | " Male | \n",
207 | " NaN | \n",
208 | " NaN | \n",
209 | "
\n",
210 | " \n",
211 | " | 5 | \n",
212 | " Jashnah | \n",
213 | " Female | \n",
214 | " 34.0 | \n",
215 | " 3.0 | \n",
216 | "
\n",
217 | " \n",
218 | "
\n",
219 | "
"
220 | ],
221 | "text/plain": [
222 | " name gender age oaths\n",
223 | "0 Kaladin Male 20.0 3.0\n",
224 | "1 Shallan Female 17.0 2.0\n",
225 | "2 Dalinar Male 53.0 3.0\n",
226 | "3 Szeth Male 35.0 0.0\n",
227 | "4 Hoid Male NaN NaN\n",
228 | "5 Jashnah Female 34.0 3.0"
229 | ]
230 | },
231 | "execution_count": 6,
232 | "metadata": {},
233 | "output_type": "execute_result"
234 | }
235 | ],
236 | "source": [
237 | "pd.read_csv(\"example_1.csv\")"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 8,
243 | "metadata": {
244 | "ExecuteTime": {
245 | "end_time": "2020-02-02T07:35:19.179516Z",
246 | "start_time": "2020-02-02T07:35:19.168566Z"
247 | }
248 | },
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/html": [
253 | "\n",
254 | "\n",
267 | "
\n",
268 | " \n",
269 | " \n",
270 | " | \n",
271 | " name | \n",
272 | " gender | \n",
273 | " age | \n",
274 | " oaths | \n",
275 | "
\n",
276 | " \n",
277 | " \n",
278 | " \n",
279 | " | 0 | \n",
280 | " Kaladin | \n",
281 | " Male | \n",
282 | " 20.0 | \n",
283 | " 3.0 | \n",
284 | "
\n",
285 | " \n",
286 | " | 1 | \n",
287 | " Shallan | \n",
288 | " Female | \n",
289 | " 17.0 | \n",
290 | " 2.0 | \n",
291 | "
\n",
292 | " \n",
293 | " | 2 | \n",
294 | " Dalinar | \n",
295 | " Male | \n",
296 | " 53.0 | \n",
297 | " 3.0 | \n",
298 | "
\n",
299 | " \n",
300 | " | 3 | \n",
301 | " Szeth | \n",
302 | " Male | \n",
303 | " 35.0 | \n",
304 | " 0.0 | \n",
305 | "
\n",
306 | " \n",
307 | " | 4 | \n",
308 | " Hoid | \n",
309 | " Male | \n",
310 | " NaN | \n",
311 | " NaN | \n",
312 | "
\n",
313 | " \n",
314 | " | 5 | \n",
315 | " Jashnah | \n",
316 | " Female | \n",
317 | " 34.0 | \n",
318 | " 3.0 | \n",
319 | "
\n",
320 | " \n",
321 | "
\n",
322 | "
"
323 | ],
324 | "text/plain": [
325 | " name gender age oaths\n",
326 | "0 Kaladin Male 20.0 3.0\n",
327 | "1 Shallan Female 17.0 2.0\n",
328 | "2 Dalinar Male 53.0 3.0\n",
329 | "3 Szeth Male 35.0 0.0\n",
330 | "4 Hoid Male NaN NaN\n",
331 | "5 Jashnah Female 34.0 3.0"
332 | ]
333 | },
334 | "execution_count": 8,
335 | "metadata": {},
336 | "output_type": "execute_result"
337 | }
338 | ],
339 | "source": [
340 | "pd.read_csv(\"example_2.csv\", delim_whitespace=True)"
341 | ]
342 | },
343 | {
344 | "cell_type": "code",
345 | "execution_count": 11,
346 | "metadata": {
347 | "ExecuteTime": {
348 | "end_time": "2020-02-02T07:37:11.470254Z",
349 | "start_time": "2020-02-02T07:37:11.455294Z"
350 | }
351 | },
352 | "outputs": [
353 | {
354 | "data": {
355 | "text/html": [
356 | "\n",
357 | "\n",
370 | "
\n",
371 | " \n",
372 | " \n",
373 | " | \n",
374 | " name | \n",
375 | " gender | \n",
376 | " age | \n",
377 | " oaths | \n",
378 | "
\n",
379 | " \n",
380 | " \n",
381 | " \n",
382 | " | 0 | \n",
383 | " Kaladin | \n",
384 | " Male | \n",
385 | " 20.0 | \n",
386 | " 3.0 | \n",
387 | "
\n",
388 | " \n",
389 | " | 1 | \n",
390 | " Shallan | \n",
391 | " Female | \n",
392 | " 17.0 | \n",
393 | " 2.0 | \n",
394 | "
\n",
395 | " \n",
396 | " | 2 | \n",
397 | " Dalinar | \n",
398 | " Male | \n",
399 | " 53.0 | \n",
400 | " 3.0 | \n",
401 | "
\n",
402 | " \n",
403 | " | 3 | \n",
404 | " Szeth | \n",
405 | " Male | \n",
406 | " 35.0 | \n",
407 | " 0.0 | \n",
408 | "
\n",
409 | " \n",
410 | " | 4 | \n",
411 | " Hoid | \n",
412 | " Male | \n",
413 | " NaN | \n",
414 | " NaN | \n",
415 | "
\n",
416 | " \n",
417 | " | 5 | \n",
418 | " Jashnah | \n",
419 | " Female | \n",
420 | " 34.0 | \n",
421 | " 3.0 | \n",
422 | "
\n",
423 | " \n",
424 | "
\n",
425 | "
"
426 | ],
427 | "text/plain": [
428 | " name gender age oaths\n",
429 | "0 Kaladin Male 20.0 3.0\n",
430 | "1 Shallan Female 17.0 2.0\n",
431 | "2 Dalinar Male 53.0 3.0\n",
432 | "3 Szeth Male 35.0 0.0\n",
433 | "4 Hoid Male NaN NaN\n",
434 | "5 Jashnah Female 34.0 3.0"
435 | ]
436 | },
437 | "execution_count": 11,
438 | "metadata": {},
439 | "output_type": "execute_result"
440 | }
441 | ],
442 | "source": [
443 | "pd.read_csv(\"example_3.csv\", index_col=0)"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": 14,
449 | "metadata": {
450 | "ExecuteTime": {
451 | "end_time": "2020-02-02T07:37:31.812465Z",
452 | "start_time": "2020-02-02T07:37:31.801514Z"
453 | }
454 | },
455 | "outputs": [
456 | {
457 | "data": {
458 | "text/html": [
459 | "\n",
460 | "\n",
473 | "
\n",
474 | " \n",
475 | " \n",
476 | " | \n",
477 | " name | \n",
478 | " gender | \n",
479 | " age | \n",
480 | " oaths | \n",
481 | "
\n",
482 | " \n",
483 | " \n",
484 | " \n",
485 | " | 0 | \n",
486 | " Kaladin | \n",
487 | " Male | \n",
488 | " 20.0 | \n",
489 | " 3.0 | \n",
490 | "
\n",
491 | " \n",
492 | " | 1 | \n",
493 | " Shallan | \n",
494 | " Female | \n",
495 | " 17.0 | \n",
496 | " 2.0 | \n",
497 | "
\n",
498 | " \n",
499 | " | 2 | \n",
500 | " Dalinar | \n",
501 | " Male | \n",
502 | " 53.0 | \n",
503 | " 3.0 | \n",
504 | "
\n",
505 | " \n",
506 | " | 3 | \n",
507 | " Szeth | \n",
508 | " Male | \n",
509 | " 35.0 | \n",
510 | " 0.0 | \n",
511 | "
\n",
512 | " \n",
513 | " | 4 | \n",
514 | " Hoid | \n",
515 | " Male | \n",
516 | " NaN | \n",
517 | " NaN | \n",
518 | "
\n",
519 | " \n",
520 | " | 5 | \n",
521 | " Jashnah | \n",
522 | " Female | \n",
523 | " 34.0 | \n",
524 | " 3.0 | \n",
525 | "
\n",
526 | " \n",
527 | "
\n",
528 | "
"
529 | ],
530 | "text/plain": [
531 | " name gender age oaths\n",
532 | "0 Kaladin Male 20.0 3.0\n",
533 | "1 Shallan Female 17.0 2.0\n",
534 | "2 Dalinar Male 53.0 3.0\n",
535 | "3 Szeth Male 35.0 0.0\n",
536 | "4 Hoid Male NaN NaN\n",
537 | "5 Jashnah Female 34.0 3.0"
538 | ]
539 | },
540 | "execution_count": 14,
541 | "metadata": {},
542 | "output_type": "execute_result"
543 | }
544 | ],
545 | "source": [
546 | "pd.read_csv(\"example_4.csv\", sep=\"|\", comment=\"#\")"
547 | ]
548 | }
549 | ],
550 | "metadata": {
551 | "kernelspec": {
552 | "display_name": "Python 3",
553 | "language": "python",
554 | "name": "python3"
555 | },
556 | "language_info": {
557 | "codemirror_mode": {
558 | "name": "ipython",
559 | "version": 3
560 | },
561 | "file_extension": ".py",
562 | "mimetype": "text/x-python",
563 | "name": "python",
564 | "nbconvert_exporter": "python",
565 | "pygments_lexer": "ipython3",
566 | "version": "3.7.3"
567 | }
568 | },
569 | "nbformat": 4,
570 | "nbformat_minor": 2
571 | }
572 |
--------------------------------------------------------------------------------
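To actually see the "weird output" the exercise above talks about, it can help to load each file with the defaults first and compare against the corrected calls in the answers. A small sketch, assuming the example CSVs sit next to the notebook exactly as in the cells above:

```python
# Compare default loads against the corrected arguments used in the answers.
import pandas as pd

for name in ["example_2.csv", "example_3.csv", "example_4.csv"]:
    print(f"--- {name} with defaults ---")
    # The whitespace- and pipe-separated files collapse into a single column,
    # and example_3.csv grows a spare "Unnamed: 0" index column.
    print(pd.read_csv(name).head(3))

print("--- example_4.csv with sep='|' and comment='#' ---")
print(pd.read_csv("example_4.csv", sep="|", comment="#").head(3))
```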
/practiceResource/SavingAndSerialising.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 4,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.069474 | \n",
66 | " 0.016839 | \n",
67 | " 0.607693 | \n",
68 | " 0.960414 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.755562 | \n",
73 | " 0.792302 | \n",
74 | " 0.638826 | \n",
75 | " 0.257696 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.766277 | \n",
80 | " 0.049024 | \n",
81 | " 0.264378 | \n",
82 | " 0.898995 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.263386 | \n",
87 | " 0.188590 | \n",
88 | " 0.977028 | \n",
89 | " 0.101986 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.052184 | \n",
94 | " 0.381186 | \n",
95 | " 0.655244 | \n",
96 | " 0.827316 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.069474 0.016839 0.607693 0.960414\n",
105 | "1 0.755562 0.792302 0.638826 0.257696\n",
106 | "2 0.766277 0.049024 0.264378 0.898995\n",
107 | "3 0.263386 0.188590 0.977028 0.101986\n",
108 | "4 0.052184 0.381186 0.655244 0.827316"
109 | ]
110 | },
111 | "execution_count": 4,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 6,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 7,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 14,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 14,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 15,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 16,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 17,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 18,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 32,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 22,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 23,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "df.to_feather(\"save.fth\")"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 24,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-16T03:15:50.574863Z",
520 | "start_time": "2020-02-16T03:15:50.557141Z"
521 | }
522 | },
523 | "outputs": [],
524 | "source": [
525 | "pd.read_feather(\"save.fth\");"
526 | ]
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": 34,
531 | "metadata": {
532 | "ExecuteTime": {
533 | "end_time": "2020-02-16T03:20:03.082982Z",
534 | "start_time": "2020-02-16T03:20:03.062532Z"
535 | }
536 | },
537 | "outputs": [
538 | {
539 | "name": "stdout",
540 | "output_type": "stream",
541 | "text": [
542 | " Volume in drive C is System\n",
543 | " Volume Serial Number is 48F0-A822\n",
544 | "\n",
545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. Notebooks and Datasets\\2_Data\\Lectures\n",
546 | "\n",
547 | "16/02/2020 01:18 PM .\n",
548 | "16/02/2020 01:18 PM ..\n",
549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n",
550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n",
551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n",
552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n",
553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n",
554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n",
555 | "01/10/2019 08:15 PM 11,328 heart.csv\n",
556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n",
557 | "16/02/2020 01:14 PM 87,030 save.csv\n",
558 | "16/02/2020 01:15 PM 107,240 save.fth\n",
559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n",
560 | "16/02/2020 01:15 PM 90,693 save.pkl\n",
561 | " 11 File(s) 4,619,659 bytes\n",
562 | " 3 Dir(s) 244,606,853,120 bytes free\n"
563 | ]
564 | }
565 | ],
566 | "source": [
567 | "%ls"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {},
573 | "source": [
574 | "### Recap\n",
575 | "\n",
576 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
577 | "\n",
578 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
579 | ]
580 | }
581 | ],
582 | "metadata": {
583 | "kernelspec": {
584 | "display_name": "Python 3",
585 | "language": "python",
586 | "name": "python3"
587 | },
588 | "language_info": {
589 | "codemirror_mode": {
590 | "name": "ipython",
591 | "version": 3
592 | },
593 | "file_extension": ".py",
594 | "mimetype": "text/x-python",
595 | "name": "python",
596 | "nbconvert_exporter": "python",
597 | "pygments_lexer": "ipython3",
598 | "version": "3.7.3"
599 | }
600 | },
601 | "nbformat": 4,
602 | "nbformat_minor": 2
603 | }
604 |
--------------------------------------------------------------------------------
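If you don't have the ExecuteTime extension linked in the notebook above, you can still get rough save/load timings with the standard library. A minimal sketch, assuming the same optional dependencies (`tables` for HDF5, `feather-format` for Feather) are installed:

```python
# Rough timing comparison of the formats used above, without any Jupyter extension.
import time

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=["A", "B", "C", "D"])

def timed(label, fn):
    # Run one save or load call and print how long it took.
    start = time.perf_counter()
    fn()
    print(f"{label:<12} {time.perf_counter() - start:.3f}s")

timed("to_csv", lambda: df.to_csv("save.csv", index=False))
timed("read_csv", lambda: pd.read_csv("save.csv"))
timed("to_pickle", lambda: df.to_pickle("save.pkl"))
timed("read_pickle", lambda: pd.read_pickle("save.pkl"))
timed("to_hdf", lambda: df.to_hdf("save.hdf", key="data", format="table"))
timed("read_hdf", lambda: pd.read_hdf("save.hdf"))
timed("to_feather", lambda: df.to_feather("save.fth"))
timed("read_feather", lambda: pd.read_feather("save.fth"))
```

On a numeric-only frame like this, the binary formats should come out well ahead of CSV, which matches the recap's point about pickle and HDF5.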
/practiceResource/.ipynb_checkpoints/dataSavingAndSerialising-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 2,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.863149 | \n",
66 | " 0.314732 | \n",
67 | " 0.669747 | \n",
68 | " 0.702656 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.546542 | \n",
73 | " 0.563607 | \n",
74 | " 0.780532 | \n",
75 | " 0.312281 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.024058 | \n",
80 | " 0.473108 | \n",
81 | " 0.447980 | \n",
82 | " 0.811878 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.888702 | \n",
87 | " 0.392524 | \n",
88 | " 0.830159 | \n",
89 | " 0.452014 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.266793 | \n",
94 | " 0.449780 | \n",
95 | " 0.589546 | \n",
96 | " 0.882689 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.863149 0.314732 0.669747 0.702656\n",
105 | "1 0.546542 0.563607 0.780532 0.312281\n",
106 | "2 0.024058 0.473108 0.447980 0.811878\n",
107 | "3 0.888702 0.392524 0.830159 0.452014\n",
108 | "4 0.266793 0.449780 0.589546 0.882689"
109 | ]
110 | },
111 | "execution_count": 2,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 3,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 4,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 14,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 14,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 15,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 16,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 17,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 18,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 32,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 22,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 23,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "df.to_feather(\"save.fth\")"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 24,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-16T03:15:50.574863Z",
520 | "start_time": "2020-02-16T03:15:50.557141Z"
521 | }
522 | },
523 | "outputs": [],
524 | "source": [
525 | "pd.read_feather(\"save.fth\");"
526 | ]
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": 34,
531 | "metadata": {
532 | "ExecuteTime": {
533 | "end_time": "2020-02-16T03:20:03.082982Z",
534 | "start_time": "2020-02-16T03:20:03.062532Z"
535 | }
536 | },
537 | "outputs": [
538 | {
539 | "name": "stdout",
540 | "output_type": "stream",
541 | "text": [
542 | " Volume in drive C is System\n",
543 | " Volume Serial Number is 48F0-A822\n",
544 | "\n",
545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. Notebooks and Datasets\\2_Data\\Lectures\n",
546 | "\n",
547 | "16/02/2020 01:18 PM .\n",
548 | "16/02/2020 01:18 PM ..\n",
549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n",
550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n",
551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n",
552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n",
553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n",
554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n",
555 | "01/10/2019 08:15 PM 11,328 heart.csv\n",
556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n",
557 | "16/02/2020 01:14 PM 87,030 save.csv\n",
558 | "16/02/2020 01:15 PM 107,240 save.fth\n",
559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n",
560 | "16/02/2020 01:15 PM 90,693 save.pkl\n",
561 | " 11 File(s) 4,619,659 bytes\n",
562 | " 3 Dir(s) 244,606,853,120 bytes free\n"
563 | ]
564 | }
565 | ],
566 | "source": [
567 | "%ls"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {},
573 | "source": [
574 | "### Recap\n",
575 | "\n",
576 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
577 | "\n",
578 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
579 | ]
580 | }
581 | ],
582 | "metadata": {
583 | "kernelspec": {
584 | "display_name": "Python 3",
585 | "language": "python",
586 | "name": "python3"
587 | },
588 | "language_info": {
589 | "codemirror_mode": {
590 | "name": "ipython",
591 | "version": 3
592 | },
593 | "file_extension": ".py",
594 | "mimetype": "text/x-python",
595 | "name": "python",
596 | "nbconvert_exporter": "python",
597 | "pygments_lexer": "ipython3",
598 | "version": "3.7.4"
599 | }
600 | },
601 | "nbformat": 4,
602 | "nbformat_minor": 2
603 | }
604 |
--------------------------------------------------------------------------------
/practiceResource/dataloading.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Loading Datasets\n",
8 | "\n",
9 | "We'll be using the Kaggle Heart Disease UCI dataset as an example. You can find it here: https://www.kaggle.com/ronitf/heart-disease-uci\n",
10 | "\n",
11 | "* Manual loading (last resort)\n",
12 | "* `np.loadtxt`\n",
13 | "* `np.genfromtxt`\n",
14 | "* `pd.read_csv`\n",
15 | "* `pd.read*`\n",
16 | "* `pickle`"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {
23 | "ExecuteTime": {
24 | "end_time": "2020-02-14T12:39:12.306606Z",
25 | "start_time": "2020-02-14T12:39:12.302988Z"
26 | }
27 | },
28 | "outputs": [],
29 | "source": [
30 | "import numpy as np\n",
31 | "import pandas as pd\n",
32 | "import pickle\n",
33 | "\n",
34 | "filename = \"heart.csv\""
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "## The best method - panda's read_csv\n",
42 | "Handles the most edge cases, datetime and file issues best."
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {
49 | "ExecuteTime": {
50 | "end_time": "2020-02-14T12:39:55.204452Z",
51 | "start_time": "2020-02-14T12:39:55.185019Z"
52 | }
53 | },
54 | "outputs": [
55 | {
56 | "data": {
57 | "text/html": [
58 | "\n",
59 | "\n",
72 | "
\n",
73 | " \n",
74 | " \n",
75 | " | \n",
76 | " age | \n",
77 | " sex | \n",
78 | " cp | \n",
79 | " trestbps | \n",
80 | " chol | \n",
81 | " fbs | \n",
82 | " restecg | \n",
83 | " thalach | \n",
84 | " exang | \n",
85 | " oldpeak | \n",
86 | " slope | \n",
87 | " ca | \n",
88 | " thal | \n",
89 | " target | \n",
90 | "
\n",
91 | " \n",
92 | " \n",
93 | " \n",
94 | " | 0 | \n",
95 | " 63 | \n",
96 | " 1 | \n",
97 | " 3 | \n",
98 | " 145 | \n",
99 | " 233 | \n",
100 | " 1 | \n",
101 | " 0 | \n",
102 | " 150 | \n",
103 | " 0 | \n",
104 | " 2.3 | \n",
105 | " 0 | \n",
106 | " 0 | \n",
107 | " 1 | \n",
108 | " 1 | \n",
109 | "
\n",
110 | " \n",
111 | " | 1 | \n",
112 | " 37 | \n",
113 | " 1 | \n",
114 | " 2 | \n",
115 | " 130 | \n",
116 | " 250 | \n",
117 | " 0 | \n",
118 | " 1 | \n",
119 | " 187 | \n",
120 | " 0 | \n",
121 | " 3.5 | \n",
122 | " 0 | \n",
123 | " 0 | \n",
124 | " 2 | \n",
125 | " 1 | \n",
126 | "
\n",
127 | " \n",
128 | " | 2 | \n",
129 | " 41 | \n",
130 | " 0 | \n",
131 | " 1 | \n",
132 | " 130 | \n",
133 | " 204 | \n",
134 | " 0 | \n",
135 | " 0 | \n",
136 | " 172 | \n",
137 | " 0 | \n",
138 | " 1.4 | \n",
139 | " 2 | \n",
140 | " 0 | \n",
141 | " 2 | \n",
142 | " 1 | \n",
143 | "
\n",
144 | " \n",
145 | " | 3 | \n",
146 | " 56 | \n",
147 | " 1 | \n",
148 | " 1 | \n",
149 | " 120 | \n",
150 | " 236 | \n",
151 | " 0 | \n",
152 | " 1 | \n",
153 | " 178 | \n",
154 | " 0 | \n",
155 | " 0.8 | \n",
156 | " 2 | \n",
157 | " 0 | \n",
158 | " 2 | \n",
159 | " 1 | \n",
160 | "
\n",
161 | " \n",
162 | " | 4 | \n",
163 | " 57 | \n",
164 | " 0 | \n",
165 | " 0 | \n",
166 | " 120 | \n",
167 | " 354 | \n",
168 | " 0 | \n",
169 | " 1 | \n",
170 | " 163 | \n",
171 | " 1 | \n",
172 | " 0.6 | \n",
173 | " 2 | \n",
174 | " 0 | \n",
175 | " 2 | \n",
176 | " 1 | \n",
177 | "
\n",
178 | " \n",
179 | "
\n",
180 | "
"
181 | ],
182 | "text/plain": [
183 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n",
184 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n",
185 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n",
186 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n",
187 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n",
188 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n",
189 | "\n",
190 | " ca thal target \n",
191 | "0 0 1 1 \n",
192 | "1 0 2 1 \n",
193 | "2 0 2 1 \n",
194 | "3 0 2 1 \n",
195 | "4 0 2 1 "
196 | ]
197 | },
198 | "execution_count": 2,
199 | "metadata": {},
200 | "output_type": "execute_result"
201 | }
202 | ],
203 | "source": [
204 | "df = pd.read_csv(filename)\n",
205 | "df.head()"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "## Using numpy's loadtxt and genfromtxt\n",
213 | "\n",
214 | "If you must. Notice it fails without extra arguments - its not as smart and we have to tell it what to do. Designed for loading in data saved using `np.savetxt`, not meant to be a robust loader."
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 5,
220 | "metadata": {
221 | "ExecuteTime": {
222 | "end_time": "2020-02-14T12:41:25.154199Z",
223 | "start_time": "2020-02-14T12:41:25.144188Z"
224 | }
225 | },
226 | "outputs": [
227 | {
228 | "name": "stdout",
229 | "output_type": "stream",
230 | "text": [
231 | "[[63. 1. 3. ... 0. 1. 1.]\n",
232 | " [37. 1. 2. ... 0. 2. 1.]\n",
233 | " [41. 0. 1. ... 0. 2. 1.]\n",
234 | " ...\n",
235 | " [68. 1. 0. ... 2. 3. 0.]\n",
236 | " [57. 1. 0. ... 1. 3. 0.]\n",
237 | " [57. 0. 1. ... 1. 2. 0.]]\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "data = np.loadtxt(filename, delimiter=\",\", skiprows=1)\n",
243 | "print(data)"
244 | ]
245 | },
246 | {
247 | "cell_type": "code",
248 | "execution_count": 7,
249 | "metadata": {
250 | "ExecuteTime": {
251 | "end_time": "2020-02-14T12:43:04.186497Z",
252 | "start_time": "2020-02-14T12:43:04.159393Z"
253 | }
254 | },
255 | "outputs": [
256 | {
257 | "name": "stdout",
258 | "output_type": "stream",
259 | "text": [
260 | "[(63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1)\n",
261 | " (37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1)\n",
262 | " (41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2, 1)\n",
263 | " (56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2, 1)\n",
264 | " (57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2, 1)\n",
265 | " (57, 1, 0, 140, 192, 0, 1, 148, 0, 0.4, 1, 0, 1, 1)\n",
266 | " (56, 0, 1, 140, 294, 0, 0, 153, 0, 1.3, 1, 0, 2, 1)\n",
267 | " (44, 1, 1, 120, 263, 0, 1, 173, 0, 0. , 2, 0, 3, 1)\n",
268 | " (52, 1, 2, 172, 199, 1, 1, 162, 0, 0.5, 2, 0, 3, 1)\n",
269 | " (57, 1, 2, 150, 168, 0, 1, 174, 0, 1.6, 2, 0, 2, 1)]\n",
270 | "[('age', '\n",
302 | "\n",
315 | "\n",
316 | " \n",
317 | " \n",
318 | " | \n",
319 | " age | \n",
320 | " sex | \n",
321 | " cp | \n",
322 | " trestbps | \n",
323 | " chol | \n",
324 | " fbs | \n",
325 | " restecg | \n",
326 | " thalach | \n",
327 | " exang | \n",
328 | " oldpeak | \n",
329 | " slope | \n",
330 | " ca | \n",
331 | " thal | \n",
332 | " target | \n",
333 | "
\n",
334 | " \n",
335 | " \n",
336 | " \n",
337 | " | 0 | \n",
338 | " 63.0 | \n",
339 | " 1.0 | \n",
340 | " 3.0 | \n",
341 | " 145.0 | \n",
342 | " 233.0 | \n",
343 | " 1.0 | \n",
344 | " 0.0 | \n",
345 | " 150.0 | \n",
346 | " 0.0 | \n",
347 | " 2.3 | \n",
348 | " 0.0 | \n",
349 | " 0.0 | \n",
350 | " 1.0 | \n",
351 | " 1.0 | \n",
352 | "
\n",
353 | " \n",
354 | " | 1 | \n",
355 | " 37.0 | \n",
356 | " 1.0 | \n",
357 | " 2.0 | \n",
358 | " 130.0 | \n",
359 | " 250.0 | \n",
360 | " 0.0 | \n",
361 | " 1.0 | \n",
362 | " 187.0 | \n",
363 | " 0.0 | \n",
364 | " 3.5 | \n",
365 | " 0.0 | \n",
366 | " 0.0 | \n",
367 | " 2.0 | \n",
368 | " 1.0 | \n",
369 | "
\n",
370 | " \n",
371 | " | 2 | \n",
372 | " 41.0 | \n",
373 | " 0.0 | \n",
374 | " 1.0 | \n",
375 | " 130.0 | \n",
376 | " 204.0 | \n",
377 | " 0.0 | \n",
378 | " 0.0 | \n",
379 | " 172.0 | \n",
380 | " 0.0 | \n",
381 | " 1.4 | \n",
382 | " 2.0 | \n",
383 | " 0.0 | \n",
384 | " 2.0 | \n",
385 | " 1.0 | \n",
386 | "
\n",
387 | " \n",
388 | " | 3 | \n",
389 | " 56.0 | \n",
390 | " 1.0 | \n",
391 | " 1.0 | \n",
392 | " 120.0 | \n",
393 | " 236.0 | \n",
394 | " 0.0 | \n",
395 | " 1.0 | \n",
396 | " 178.0 | \n",
397 | " 0.0 | \n",
398 | " 0.8 | \n",
399 | " 2.0 | \n",
400 | " 0.0 | \n",
401 | " 2.0 | \n",
402 | " 1.0 | \n",
403 | "
\n",
404 | " \n",
405 | " | 4 | \n",
406 | " 57.0 | \n",
407 | " 0.0 | \n",
408 | " 0.0 | \n",
409 | " 120.0 | \n",
410 | " 354.0 | \n",
411 | " 0.0 | \n",
412 | " 1.0 | \n",
413 | " 163.0 | \n",
414 | " 1.0 | \n",
415 | " 0.6 | \n",
416 | " 2.0 | \n",
417 | " 0.0 | \n",
418 | " 2.0 | \n",
419 | " 1.0 | \n",
420 | "
\n",
421 | " \n",
422 | "
\n",
423 | ""
424 | ],
425 | "text/plain": [
426 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak \\\n",
427 | "0 63.0 1.0 3.0 145.0 233.0 1.0 0.0 150.0 0.0 2.3 \n",
428 | "1 37.0 1.0 2.0 130.0 250.0 0.0 1.0 187.0 0.0 3.5 \n",
429 | "2 41.0 0.0 1.0 130.0 204.0 0.0 0.0 172.0 0.0 1.4 \n",
430 | "3 56.0 1.0 1.0 120.0 236.0 0.0 1.0 178.0 0.0 0.8 \n",
431 | "4 57.0 0.0 0.0 120.0 354.0 0.0 1.0 163.0 1.0 0.6 \n",
432 | "\n",
433 | " slope ca thal target \n",
434 | "0 0.0 0.0 1.0 1.0 \n",
435 | "1 0.0 0.0 2.0 1.0 \n",
436 | "2 2.0 0.0 2.0 1.0 \n",
437 | "3 2.0 0.0 2.0 1.0 \n",
438 | "4 2.0 0.0 2.0 1.0 "
439 | ]
440 | },
441 | "execution_count": 8,
442 | "metadata": {},
443 | "output_type": "execute_result"
444 | }
445 | ],
446 | "source": [
447 | "def load_file(filename):\n",
448 | " with open(filename, encoding=\"utf-8-sig\") as f:\n",
449 | " data, cols = [], []\n",
450 | " for i, line in enumerate(f.read().splitlines()):\n",
451 | " if i == 0:\n",
452 | " cols += line.split(\",\")\n",
453 | " else:\n",
454 | " data.append([float(x) for x in line.split(\",\")])\n",
455 | " df = pd.DataFrame(data, columns=cols)\n",
456 | " return df\n",
457 | "load_file(filename).head()"
458 | ]
459 | },
460 | {
461 | "cell_type": "markdown",
462 | "metadata": {},
463 | "source": [
464 | "## Pickles!\n",
465 | "Some danger using pickles as encoding changes. Use an industry standard like hd5 instead if you can. Note if you're working with dataframes, dont use python's `pickle`, pandas has their own implementation - `df.to_pickle` and `df.read_pickle`. Underlying algorithm is the same, but less code for you to type, and supports compression."
466 | ]
467 | },
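As a quick illustration of that last point (a sketch, not the notebook's own code - the file name here is made up), pandas pickles a DataFrame in one call and infers compression from the file extension:

```python
import pandas as pd

# to_pickle/read_pickle infer compression from the extension, so ".gz" gives a gzipped pickle
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
df.to_pickle("example_compressed.pkl.gz")
restored = pd.read_pickle("example_compressed.pkl.gz")
print(restored.equals(df))  # True
```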
468 | {
469 | "cell_type": "code",
470 | "execution_count": 10,
471 | "metadata": {
472 | "ExecuteTime": {
473 | "end_time": "2020-02-14T12:48:34.375437Z",
474 | "start_time": "2020-02-14T12:48:34.359410Z"
475 | }
476 | },
477 | "outputs": [
478 | {
479 | "data": {
480 | "text/html": [
481 | "\n",
482 | "\n",
495 | "
\n",
496 | " \n",
497 | " \n",
498 | " | \n",
499 | " age | \n",
500 | " sex | \n",
501 | " cp | \n",
502 | " trestbps | \n",
503 | " chol | \n",
504 | " fbs | \n",
505 | " restecg | \n",
506 | " thalach | \n",
507 | " exang | \n",
508 | " oldpeak | \n",
509 | " slope | \n",
510 | " ca | \n",
511 | " thal | \n",
512 | " target | \n",
513 | "
\n",
514 | " \n",
515 | " \n",
516 | " \n",
517 | " | 0 | \n",
518 | " 63 | \n",
519 | " 1 | \n",
520 | " 3 | \n",
521 | " 145 | \n",
522 | " 233 | \n",
523 | " 1 | \n",
524 | " 0 | \n",
525 | " 150 | \n",
526 | " 0 | \n",
527 | " 2.3 | \n",
528 | " 0 | \n",
529 | " 0 | \n",
530 | " 1 | \n",
531 | " 1 | \n",
532 | "
\n",
533 | " \n",
534 | " | 1 | \n",
535 | " 37 | \n",
536 | " 1 | \n",
537 | " 2 | \n",
538 | " 130 | \n",
539 | " 250 | \n",
540 | " 0 | \n",
541 | " 1 | \n",
542 | " 187 | \n",
543 | " 0 | \n",
544 | " 3.5 | \n",
545 | " 0 | \n",
546 | " 0 | \n",
547 | " 2 | \n",
548 | " 1 | \n",
549 | "
\n",
550 | " \n",
551 | " | 2 | \n",
552 | " 41 | \n",
553 | " 0 | \n",
554 | " 1 | \n",
555 | " 130 | \n",
556 | " 204 | \n",
557 | " 0 | \n",
558 | " 0 | \n",
559 | " 172 | \n",
560 | " 0 | \n",
561 | " 1.4 | \n",
562 | " 2 | \n",
563 | " 0 | \n",
564 | " 2 | \n",
565 | " 1 | \n",
566 | "
\n",
567 | " \n",
568 | " | 3 | \n",
569 | " 56 | \n",
570 | " 1 | \n",
571 | " 1 | \n",
572 | " 120 | \n",
573 | " 236 | \n",
574 | " 0 | \n",
575 | " 1 | \n",
576 | " 178 | \n",
577 | " 0 | \n",
578 | " 0.8 | \n",
579 | " 2 | \n",
580 | " 0 | \n",
581 | " 2 | \n",
582 | " 1 | \n",
583 | "
\n",
584 | " \n",
585 | " | 4 | \n",
586 | " 57 | \n",
587 | " 0 | \n",
588 | " 0 | \n",
589 | " 120 | \n",
590 | " 354 | \n",
591 | " 0 | \n",
592 | " 1 | \n",
593 | " 163 | \n",
594 | " 1 | \n",
595 | " 0.6 | \n",
596 | " 2 | \n",
597 | " 0 | \n",
598 | " 2 | \n",
599 | " 1 | \n",
600 | "
\n",
601 | " \n",
602 | "
\n",
603 | "
"
604 | ],
605 | "text/plain": [
606 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n",
607 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n",
608 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n",
609 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n",
610 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n",
611 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n",
612 | "\n",
613 | " ca thal target \n",
614 | "0 0 1 1 \n",
615 | "1 0 2 1 \n",
616 | "2 0 2 1 \n",
617 | "3 0 2 1 \n",
618 | "4 0 2 1 "
619 | ]
620 | },
621 | "execution_count": 10,
622 | "metadata": {},
623 | "output_type": "execute_result"
624 | }
625 | ],
626 | "source": [
627 | "df = pd.read_pickle(\"heart.pkl\")\n",
628 | "df.head()"
629 | ]
630 | },
631 | {
632 | "cell_type": "markdown",
633 | "metadata": {},
634 | "source": [
635 | "### Recap\n",
636 | "\n",
637 | "* Use pd.read_csv 99% of the time\n",
638 | "* Use pd.read_* for other cases (pd.read_excel, pd.read_pickle, etc)\n",
639 | "* If pd cant handle it, I doubt numpy can\n",
640 | "* If you use a manual function, save your data to a sensible format"
641 | ]
642 | }
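To back up the first recap point, here is a minimal sketch (with made-up inline data, not a file from this repository) of the few `read_csv` arguments that deal with most awkward files - a custom delimiter, comment lines, and explicit missing-value markers:

```python
import io

import pandas as pd

# a small awkward "file": comment line, pipe delimiter, and a custom missing-value marker
raw = "# a comment line\nname|score\nalice|1.5\nbob|missing\n"
df = pd.read_csv(io.StringIO(raw), sep="|", comment="#", na_values=["missing"])
print(df)
```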
643 | ],
644 | "metadata": {
645 | "kernelspec": {
646 | "display_name": "Python 3",
647 | "language": "python",
648 | "name": "python3"
649 | },
650 | "language_info": {
651 | "codemirror_mode": {
652 | "name": "ipython",
653 | "version": 3
654 | },
655 | "file_extension": ".py",
656 | "mimetype": "text/x-python",
657 | "name": "python",
658 | "nbconvert_exporter": "python",
659 | "pygments_lexer": "ipython3",
660 | "version": "3.7.4"
661 | }
662 | },
663 | "nbformat": 4,
664 | "nbformat_minor": 2
665 | }
666 |
--------------------------------------------------------------------------------
/practiceResource/dataSavingAndSerialising.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 2,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.863149 | \n",
66 | " 0.314732 | \n",
67 | " 0.669747 | \n",
68 | " 0.702656 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.546542 | \n",
73 | " 0.563607 | \n",
74 | " 0.780532 | \n",
75 | " 0.312281 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.024058 | \n",
80 | " 0.473108 | \n",
81 | " 0.447980 | \n",
82 | " 0.811878 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.888702 | \n",
87 | " 0.392524 | \n",
88 | " 0.830159 | \n",
89 | " 0.452014 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.266793 | \n",
94 | " 0.449780 | \n",
95 | " 0.589546 | \n",
96 | " 0.882689 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.863149 0.314732 0.669747 0.702656\n",
105 | "1 0.546542 0.563607 0.780532 0.312281\n",
106 | "2 0.024058 0.473108 0.447980 0.811878\n",
107 | "3 0.888702 0.392524 0.830159 0.452014\n",
108 | "4 0.266793 0.449780 0.589546 0.882689"
109 | ]
110 | },
111 | "execution_count": 2,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 3,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 4,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
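One way to soften that slowdown (a sketch with made-up data, not part of the notebook) is to convert repetitive string columns to the categorical dtype before saving, which usually shrinks pickle/HDF files and speeds up IO:

```python
import pandas as pd

# repetitive strings compress well as categories: each label is stored once and referenced by integer codes
df = pd.DataFrame({"status": ["Active", "Retired", "Retired", "Active", "Retired"]})
df["status"] = df["status"].astype("category")
print(df.dtypes)  # status is now 'category' rather than 'object'
```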
202 | {
203 | "cell_type": "code",
204 | "execution_count": 7,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 7,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 8,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 9,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 10,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 11,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 12,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 13,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 14,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [
510 | {
511 | "ename": "ImportError",
512 | "evalue": "Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow.",
513 | "output_type": "error",
514 | "traceback": [
515 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
516 | "\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)",
517 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"save.fth\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
518 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(self, fname)\u001b[0m\n\u001b[0;32m 2135\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mio\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfeather_format\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mto_feather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2136\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2137\u001b[1;33m \u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2138\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2139\u001b[0m def to_parquet(\n",
519 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\feather_format.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(df, path)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \"\"\"\n\u001b[1;32m---> 23\u001b[1;33m \u001b[0mimport_optional_dependency\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"pyarrow\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpyarrow\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mfeather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
520 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\compat\\_optional.py\u001b[0m in \u001b[0;36mimport_optional_dependency\u001b[1;34m(name, extra, raise_on_missing, on_version)\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 92\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mraise_on_missing\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 93\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mextra\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mextra\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 94\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 95\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
521 | "\u001b[1;31mImportError\u001b[0m: Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow."
522 | ]
523 | }
524 | ],
525 | "source": [
526 | "df.to_feather(\"save.fth\")"
527 | ]
528 | },
529 | {
530 | "cell_type": "code",
531 | "execution_count": null,
532 | "metadata": {
533 | "ExecuteTime": {
534 | "end_time": "2020-02-16T03:15:50.574863Z",
535 | "start_time": "2020-02-16T03:15:50.557141Z"
536 | }
537 | },
538 | "outputs": [],
539 | "source": [
540 | "pd.read_feather(\"save.fth\");"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 16,
546 | "metadata": {
547 | "ExecuteTime": {
548 | "end_time": "2020-02-16T03:20:03.082982Z",
549 | "start_time": "2020-02-16T03:20:03.062532Z"
550 | }
551 | },
552 | "outputs": [
553 | {
554 | "name": "stdout",
555 | "output_type": "stream",
556 | "text": [
557 | " Volume in drive G is HDD Storge 2\n",
558 | " Volume Serial Number is D2CA-B02B\n",
559 | "\n",
560 | " Directory of G:\\Qaurter2Notebooks\\piaic_q2_class_reseouces\\practiceResource\n",
561 | "\n",
562 | "01/10/2021 12:15 PM .\n",
563 | "01/10/2021 12:15 PM ..\n",
564 | "01/10/2021 12:12 PM .ipynb_checkpoints\n",
565 | "01/09/2021 10:33 AM 15,377 Answers.ipynb\n",
566 | "01/10/2021 12:03 PM 81,593 astronauts.csv\n",
567 | "01/10/2021 12:10 PM 33,930 dataInspecting.ipynb\n",
568 | "01/10/2021 12:04 PM 19,860 dataloading.ipynb\n",
569 | "01/10/2021 12:14 PM 18,591 dataSavingAndSerialising.ipynb\n",
570 | "01/10/2021 11:57 AM 11,328 heart.csv\n",
571 | "01/09/2021 10:31 AM 35,216 heart.pkl\n",
572 | "01/09/2021 10:53 AM 32,414 NumpyVPandas.ipynb\n",
573 | "01/09/2021 10:33 AM 2,812 Questions.ipynb\n",
574 | "01/10/2021 12:14 PM 87,030 save.csv\n",
575 | "01/10/2021 12:15 PM 801,617 save.hdf\n",
576 | "01/10/2021 12:15 PM 90,693 save.pkl\n",
577 | "01/09/2021 10:30 AM 18,594 SavingAndSerialising.ipynb\n",
578 | " 13 File(s) 1,249,055 bytes\n",
579 | " 3 Dir(s) 391,598,575,616 bytes free\n"
580 | ]
581 | }
582 | ],
583 | "source": [
584 | "%ls"
585 | ]
586 | },
587 | {
588 | "cell_type": "markdown",
589 | "metadata": {},
590 | "source": [
591 | "### Recap\n",
592 | "\n",
593 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
594 | "\n",
595 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
596 | ]
597 | }
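If you want to check those size claims yourself rather than eyeballing a directory listing, here is a minimal sketch (assuming the save.* files written above are still in the working directory):

```python
import os

# compare the on-disk size of each saved format
for path in ["save.csv", "save.pkl", "save.hdf"]:
    if os.path.exists(path):
        print(f"{path}: {os.path.getsize(path) / 1024:.1f} KB")
```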
598 | ],
599 | "metadata": {
600 | "kernelspec": {
601 | "display_name": "Python 3",
602 | "language": "python",
603 | "name": "python3"
604 | },
605 | "language_info": {
606 | "codemirror_mode": {
607 | "name": "ipython",
608 | "version": 3
609 | },
610 | "file_extension": ".py",
611 | "mimetype": "text/x-python",
612 | "name": "python",
613 | "nbconvert_exporter": "python",
614 | "pygments_lexer": "ipython3",
615 | "version": "3.7.4"
616 | }
617 | },
618 | "nbformat": 4,
619 | "nbformat_minor": 2
620 | }
621 |
--------------------------------------------------------------------------------
/practiceResource/dataMaipulation/5_Basics_ApplyMapVectorised.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Basics - Apply, Map and Vectorised Functions"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-22T02:40:07.096900Z",
16 | "start_time": "2020-02-22T02:40:04.211321Z"
17 | }
18 | },
19 | "outputs": [
20 | {
21 | "data": {
22 | "text/html": [
23 | "\n",
24 | "\n",
37 | "
\n",
38 | " \n",
39 | " \n",
40 | " | \n",
41 | " A | \n",
42 | " B | \n",
43 | " C | \n",
44 | "
\n",
45 | " \n",
46 | " \n",
47 | " \n",
48 | " | 0 | \n",
49 | " -1.01 | \n",
50 | " -0.70 | \n",
51 | " 0.62 | \n",
52 | "
\n",
53 | " \n",
54 | " | 1 | \n",
55 | " 0.78 | \n",
56 | " 0.05 | \n",
57 | " 0.68 | \n",
58 | "
\n",
59 | " \n",
60 | " | 2 | \n",
61 | " 0.38 | \n",
62 | " -0.05 | \n",
63 | " -0.07 | \n",
64 | "
\n",
65 | " \n",
66 | " | 3 | \n",
67 | " -1.55 | \n",
68 | " -0.19 | \n",
69 | " -0.08 | \n",
70 | "
\n",
71 | " \n",
72 | "
\n",
73 | "
"
74 | ],
75 | "text/plain": [
76 | " A B C\n",
77 | "0 -1.01 -0.70 0.62\n",
78 | "1 0.78 0.05 0.68\n",
79 | "2 0.38 -0.05 -0.07\n",
80 | "3 -1.55 -0.19 -0.08"
81 | ]
82 | },
83 | "execution_count": 1,
84 | "metadata": {},
85 | "output_type": "execute_result"
86 | }
87 | ],
88 | "source": [
89 | "import pandas as pd\n",
90 | "import numpy as np\n",
91 | "\n",
92 | "data = np.round(np.random.normal(size=(4, 3)), 2)\n",
93 | "df = pd.DataFrame(data, columns=[\"A\", \"B\", \"C\"])\n",
94 | "df.head()"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "## Apply\n",
102 | "\n",
103 | "Used to execute an arbitrary function again an entire dataframe, or a subection. Applies in a vectorised fashion."
104 | ]
105 | },
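A minimal sketch of that behaviour with a throwaway frame (not one of the notebook's cells): by default `apply` hands the function one column at a time, and `axis=1` hands it one row at a time:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
print(demo.apply(np.sum))           # column sums: A -> 3, B -> 7
print(demo.apply(np.sum, axis=1))   # row sums: 4 and 6
```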
106 | {
107 | "cell_type": "code",
108 | "execution_count": 3,
109 | "metadata": {
110 | "ExecuteTime": {
111 | "end_time": "2020-02-22T03:02:25.417815Z",
112 | "start_time": "2020-02-22T03:02:25.407038Z"
113 | }
114 | },
115 | "outputs": [
116 | {
117 | "data": {
118 | "text/html": [
119 | "\n",
120 | "\n",
133 | "
\n",
134 | " \n",
135 | " \n",
136 | " | \n",
137 | " A | \n",
138 | " B | \n",
139 | " C | \n",
140 | "
\n",
141 | " \n",
142 | " \n",
143 | " \n",
144 | " | 0 | \n",
145 | " 2.01 | \n",
146 | " 1.70 | \n",
147 | " 1.62 | \n",
148 | "
\n",
149 | " \n",
150 | " | 1 | \n",
151 | " 1.78 | \n",
152 | " 1.05 | \n",
153 | " 1.68 | \n",
154 | "
\n",
155 | " \n",
156 | " | 2 | \n",
157 | " 1.38 | \n",
158 | " 1.05 | \n",
159 | " 1.07 | \n",
160 | "
\n",
161 | " \n",
162 | " | 3 | \n",
163 | " 2.55 | \n",
164 | " 1.19 | \n",
165 | " 1.08 | \n",
166 | "
\n",
167 | " \n",
168 | "
\n",
169 | "
"
170 | ],
171 | "text/plain": [
172 | " A B C\n",
173 | "0 2.01 1.70 1.62\n",
174 | "1 1.78 1.05 1.68\n",
175 | "2 1.38 1.05 1.07\n",
176 | "3 2.55 1.19 1.08"
177 | ]
178 | },
179 | "execution_count": 3,
180 | "metadata": {},
181 | "output_type": "execute_result"
182 | }
183 | ],
184 | "source": [
185 | "df.apply(lambda x: 1 + np.abs(x))"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 4,
191 | "metadata": {
192 | "ExecuteTime": {
193 | "end_time": "2020-02-22T03:02:55.820553Z",
194 | "start_time": "2020-02-22T03:02:55.814335Z"
195 | }
196 | },
197 | "outputs": [
198 | {
199 | "data": {
200 | "text/plain": [
201 | "0 1.01\n",
202 | "1 0.78\n",
203 | "2 0.38\n",
204 | "3 1.55\n",
205 | "Name: A, dtype: float64"
206 | ]
207 | },
208 | "execution_count": 4,
209 | "metadata": {},
210 | "output_type": "execute_result"
211 | }
212 | ],
213 | "source": [
214 | "df.A.apply(np.abs)"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 6,
220 | "metadata": {
221 | "ExecuteTime": {
222 | "end_time": "2020-02-22T03:04:42.256485Z",
223 | "start_time": "2020-02-22T03:04:42.253987Z"
224 | }
225 | },
226 | "outputs": [],
227 | "source": [
228 | "#def double_if_positive(x):\n",
229 | "# if x > 0:\n",
230 | "# return 2 * x\n",
231 | "# return x\n",
232 | "#\n",
233 | "#df.apply(double_if_positive)"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": 7,
239 | "metadata": {
240 | "ExecuteTime": {
241 | "end_time": "2020-02-22T03:05:04.690134Z",
242 | "start_time": "2020-02-22T03:05:04.662382Z"
243 | }
244 | },
245 | "outputs": [
246 | {
247 | "data": {
248 | "text/html": [
249 | "\n",
250 | "\n",
263 | "
\n",
264 | " \n",
265 | " \n",
266 | " | \n",
267 | " A | \n",
268 | " B | \n",
269 | " C | \n",
270 | "
\n",
271 | " \n",
272 | " \n",
273 | " \n",
274 | " | 0 | \n",
275 | " -1.01 | \n",
276 | " -0.70 | \n",
277 | " 1.24 | \n",
278 | "
\n",
279 | " \n",
280 | " | 1 | \n",
281 | " 1.56 | \n",
282 | " 0.10 | \n",
283 | " 1.36 | \n",
284 | "
\n",
285 | " \n",
286 | " | 2 | \n",
287 | " 0.76 | \n",
288 | " -0.05 | \n",
289 | " -0.07 | \n",
290 | "
\n",
291 | " \n",
292 | " | 3 | \n",
293 | " -1.55 | \n",
294 | " -0.19 | \n",
295 | " -0.08 | \n",
296 | "
\n",
297 | " \n",
298 | "
\n",
299 | "
"
300 | ],
301 | "text/plain": [
302 | " A B C\n",
303 | "0 -1.01 -0.70 1.24\n",
304 | "1 1.56 0.10 1.36\n",
305 | "2 0.76 -0.05 -0.07\n",
306 | "3 -1.55 -0.19 -0.08"
307 | ]
308 | },
309 | "execution_count": 7,
310 | "metadata": {},
311 | "output_type": "execute_result"
312 | }
313 | ],
314 | "source": [
315 | "def double_if_positive(x):\n",
316 | " x[x > 0] *= 2\n",
317 | " return x\n",
318 | "\n",
319 | "df.apply(double_if_positive)"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": 8,
325 | "metadata": {
326 | "ExecuteTime": {
327 | "end_time": "2020-02-22T03:05:32.894881Z",
328 | "start_time": "2020-02-22T03:05:32.887394Z"
329 | }
330 | },
331 | "outputs": [
332 | {
333 | "data": {
334 | "text/html": [
335 | "\n",
336 | "\n",
349 | "
\n",
350 | " \n",
351 | " \n",
352 | " | \n",
353 | " A | \n",
354 | " B | \n",
355 | " C | \n",
356 | "
\n",
357 | " \n",
358 | " \n",
359 | " \n",
360 | " | 0 | \n",
361 | " -1.01 | \n",
362 | " -0.70 | \n",
363 | " 1.24 | \n",
364 | "
\n",
365 | " \n",
366 | " | 1 | \n",
367 | " 1.56 | \n",
368 | " 0.10 | \n",
369 | " 1.36 | \n",
370 | "
\n",
371 | " \n",
372 | " | 2 | \n",
373 | " 0.76 | \n",
374 | " -0.05 | \n",
375 | " -0.07 | \n",
376 | "
\n",
377 | " \n",
378 | " | 3 | \n",
379 | " -1.55 | \n",
380 | " -0.19 | \n",
381 | " -0.08 | \n",
382 | "
\n",
383 | " \n",
384 | "
\n",
385 | "
"
386 | ],
387 | "text/plain": [
388 | " A B C\n",
389 | "0 -1.01 -0.70 1.24\n",
390 | "1 1.56 0.10 1.36\n",
391 | "2 0.76 -0.05 -0.07\n",
392 | "3 -1.55 -0.19 -0.08"
393 | ]
394 | },
395 | "execution_count": 8,
396 | "metadata": {},
397 | "output_type": "execute_result"
398 | }
399 | ],
400 | "source": [
401 | "df"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": 11,
407 | "metadata": {
408 | "ExecuteTime": {
409 | "end_time": "2020-02-22T03:07:51.904072Z",
410 | "start_time": "2020-02-22T03:07:51.894055Z"
411 | }
412 | },
413 | "outputs": [
414 | {
415 | "data": {
416 | "text/html": [
417 | "\n",
418 | "\n",
431 | "
\n",
432 | " \n",
433 | " \n",
434 | " | \n",
435 | " A | \n",
436 | " B | \n",
437 | " C | \n",
438 | "
\n",
439 | " \n",
440 | " \n",
441 | " \n",
442 | " | 0 | \n",
443 | " -1.01 | \n",
444 | " -0.70 | \n",
445 | " 2.48 | \n",
446 | "
\n",
447 | " \n",
448 | " | 1 | \n",
449 | " 3.12 | \n",
450 | " 0.20 | \n",
451 | " 2.72 | \n",
452 | "
\n",
453 | " \n",
454 | " | 2 | \n",
455 | " 1.52 | \n",
456 | " -0.05 | \n",
457 | " -0.07 | \n",
458 | "
\n",
459 | " \n",
460 | " | 3 | \n",
461 | " -1.55 | \n",
462 | " -0.19 | \n",
463 | " -0.08 | \n",
464 | "
\n",
465 | " \n",
466 | "
\n",
467 | "
"
468 | ],
469 | "text/plain": [
470 | " A B C\n",
471 | "0 -1.01 -0.70 2.48\n",
472 | "1 3.12 0.20 2.72\n",
473 | "2 1.52 -0.05 -0.07\n",
474 | "3 -1.55 -0.19 -0.08"
475 | ]
476 | },
477 | "execution_count": 11,
478 | "metadata": {},
479 | "output_type": "execute_result"
480 | }
481 | ],
482 | "source": [
483 | "def double_if_positive(x):\n",
484 | " x = x.copy()\n",
485 | " x[x > 0] *= 2\n",
486 | " return x\n",
487 | "\n",
488 | "df.apply(double_if_positive, raw=True)"
489 | ]
490 | },
491 | {
492 | "cell_type": "markdown",
493 | "metadata": {},
494 | "source": [
495 | "## Map\n",
496 | "\n",
497 | "Similar to apply, but operators on Series, and uses dictionary based inputs rather than an array of values.\n"
498 | ]
499 | },
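One detail worth knowing (a sketch, not from the notebook): values missing from the mapping dictionary come back as NaN, so a common pattern is to fill the gaps back in with the original values:

```python
import pandas as pd

names = pd.Series(["Steve", "Alex", "Jess"])
# unmapped names would become NaN; fillna restores the originals by index
print(names.map({"Steve": "Stephen"}).fillna(names))
```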
500 | {
501 | "cell_type": "code",
502 | "execution_count": 12,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-22T03:09:07.652877Z",
506 | "start_time": "2020-02-22T03:09:07.646810Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "series = pd.Series([\"Steve\", \"Alex\", \"Jess\", \"Mark\"])"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 13,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-22T03:09:19.239855Z",
520 | "start_time": "2020-02-22T03:09:19.231863Z"
521 | }
522 | },
523 | "outputs": [
524 | {
525 | "data": {
526 | "text/plain": [
527 | "0 Stephen\n",
528 | "1 NaN\n",
529 | "2 NaN\n",
530 | "3 NaN\n",
531 | "dtype: object"
532 | ]
533 | },
534 | "execution_count": 13,
535 | "metadata": {},
536 | "output_type": "execute_result"
537 | }
538 | ],
539 | "source": [
540 | "series.map({\"Steve\": \"Stephen\"})"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 14,
546 | "metadata": {
547 | "ExecuteTime": {
548 | "end_time": "2020-02-22T03:10:19.253698Z",
549 | "start_time": "2020-02-22T03:10:19.247477Z"
550 | }
551 | },
552 | "outputs": [
553 | {
554 | "data": {
555 | "text/plain": [
556 | "0 I am Steve\n",
557 | "1 I am Alex\n",
558 | "2 I am Jess\n",
559 | "3 I am Mark\n",
560 | "dtype: object"
561 | ]
562 | },
563 | "execution_count": 14,
564 | "metadata": {},
565 | "output_type": "execute_result"
566 | }
567 | ],
568 | "source": [
569 | "series.map(lambda d: f\"I am {d}\")"
570 | ]
571 | },
572 | {
573 | "cell_type": "markdown",
574 | "metadata": {
575 | "ExecuteTime": {
576 | "end_time": "2020-02-22T03:12:35.912759Z",
577 | "start_time": "2020-02-22T03:12:35.902370Z"
578 | }
579 | },
580 | "source": [
581 | "## Vectorised functions\n",
582 | "\n",
583 | "Pandas and numpy obviously have tons of these, here are some examples"
584 | ]
585 | },
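One more vectorised tool worth a mention (a sketch, not one of the notebook's cells): `np.where` is the element-wise if/else, and is often a simpler alternative to `apply` for conditional transforms like `double_if_positive`:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.0, 0.5, 2.0])
# double only the positive entries, leave the rest untouched
print(np.where(s > 0, s * 2, s))  # [-1.  1.  4.]
```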
586 | {
587 | "cell_type": "code",
588 | "execution_count": 17,
589 | "metadata": {
590 | "ExecuteTime": {
591 | "end_time": "2020-02-22T03:14:11.987446Z",
592 | "start_time": "2020-02-22T03:14:11.974356Z"
593 | }
594 | },
595 | "outputs": [
596 | {
597 | "data": {
598 | "text/html": [
599 | "\n",
600 | "\n",
613 | "
\n",
614 | " \n",
615 | " \n",
616 | " | \n",
617 | " A | \n",
618 | " B | \n",
619 | " C | \n",
620 | "
\n",
621 | " \n",
622 | " \n",
623 | " \n",
624 | " | 0 | \n",
625 | " -1.01 | \n",
626 | " -0.70 | \n",
627 | " 1.24 | \n",
628 | "
\n",
629 | " \n",
630 | " | 1 | \n",
631 | " 1.56 | \n",
632 | " 0.10 | \n",
633 | " 1.36 | \n",
634 | "
\n",
635 | " \n",
636 | " | 2 | \n",
637 | " 0.76 | \n",
638 | " -0.05 | \n",
639 | " -0.07 | \n",
640 | "
\n",
641 | " \n",
642 | " | 3 | \n",
643 | " -1.55 | \n",
644 | " -0.19 | \n",
645 | " -0.08 | \n",
646 | "
\n",
647 | " \n",
648 | "
\n",
649 | "
"
650 | ],
651 | "text/plain": [
652 | " A B C\n",
653 | "0 -1.01 -0.70 1.24\n",
654 | "1 1.56 0.10 1.36\n",
655 | "2 0.76 -0.05 -0.07\n",
656 | "3 -1.55 -0.19 -0.08"
657 | ]
658 | },
659 | "metadata": {},
660 | "output_type": "display_data"
661 | },
662 | {
663 | "data": {
664 | "text/html": [
665 | "\n",
666 | "\n",
679 | "
\n",
680 | " \n",
681 | " \n",
682 | " | \n",
683 | " A | \n",
684 | " B | \n",
685 | " C | \n",
686 | "
\n",
687 | " \n",
688 | " \n",
689 | " \n",
690 | " | 0 | \n",
691 | " 1.01 | \n",
692 | " 0.70 | \n",
693 | " 1.24 | \n",
694 | "
\n",
695 | " \n",
696 | " | 1 | \n",
697 | " 1.56 | \n",
698 | " 0.10 | \n",
699 | " 1.36 | \n",
700 | "
\n",
701 | " \n",
702 | " | 2 | \n",
703 | " 0.76 | \n",
704 | " 0.05 | \n",
705 | " 0.07 | \n",
706 | "
\n",
707 | " \n",
708 | " | 3 | \n",
709 | " 1.55 | \n",
710 | " 0.19 | \n",
711 | " 0.08 | \n",
712 | "
\n",
713 | " \n",
714 | "
\n",
715 | "
"
716 | ],
717 | "text/plain": [
718 | " A B C\n",
719 | "0 1.01 0.70 1.24\n",
720 | "1 1.56 0.10 1.36\n",
721 | "2 0.76 0.05 0.07\n",
722 | "3 1.55 0.19 0.08"
723 | ]
724 | },
725 | "metadata": {},
726 | "output_type": "display_data"
727 | }
728 | ],
729 | "source": [
730 | "display(df, df.abs())"
731 | ]
732 | },
733 | {
734 | "cell_type": "code",
735 | "execution_count": 18,
736 | "metadata": {
737 | "ExecuteTime": {
738 | "end_time": "2020-02-22T03:14:53.996400Z",
739 | "start_time": "2020-02-22T03:14:53.992364Z"
740 | }
741 | },
742 | "outputs": [],
743 | "source": [
744 | "series = pd.Series([\"Obi-Wan Kenobi\", \"Luke Skywalker\", \"Han Solo\", \"Leia Organa\"])"
745 | ]
746 | },
747 | {
748 | "cell_type": "code",
749 | "execution_count": 20,
750 | "metadata": {
751 | "ExecuteTime": {
752 | "end_time": "2020-02-22T03:15:40.875036Z",
753 | "start_time": "2020-02-22T03:15:40.871022Z"
754 | }
755 | },
756 | "outputs": [
757 | {
758 | "data": {
759 | "text/plain": [
760 | "['Luke', 'Skywalker']"
761 | ]
762 | },
763 | "execution_count": 20,
764 | "metadata": {},
765 | "output_type": "execute_result"
766 | }
767 | ],
768 | "source": [
769 | "\"Luke Skywalker\".split()"
770 | ]
771 | },
772 | {
773 | "cell_type": "code",
774 | "execution_count": 23,
775 | "metadata": {
776 | "ExecuteTime": {
777 | "end_time": "2020-02-22T03:16:42.001894Z",
778 | "start_time": "2020-02-22T03:16:41.992370Z"
779 | }
780 | },
781 | "outputs": [
782 | {
783 | "data": {
784 | "text/html": [
785 | "\n",
786 | "\n",
799 | "
\n",
800 | " \n",
801 | " \n",
802 | " | \n",
803 | " 0 | \n",
804 | " 1 | \n",
805 | "
\n",
806 | " \n",
807 | " \n",
808 | " \n",
809 | " | 0 | \n",
810 | " Obi-Wan | \n",
811 | " Kenobi | \n",
812 | "
\n",
813 | " \n",
814 | " | 1 | \n",
815 | " Luke | \n",
816 | " Skywalker | \n",
817 | "
\n",
818 | " \n",
819 | " | 2 | \n",
820 | " Han | \n",
821 | " Solo | \n",
822 | "
\n",
823 | " \n",
824 | " | 3 | \n",
825 | " Leia | \n",
826 | " Organa | \n",
827 | "
\n",
828 | " \n",
829 | "
\n",
830 | "
"
831 | ],
832 | "text/plain": [
833 | " 0 1\n",
834 | "0 Obi-Wan Kenobi\n",
835 | "1 Luke Skywalker\n",
836 | "2 Han Solo\n",
837 | "3 Leia Organa"
838 | ]
839 | },
840 | "execution_count": 23,
841 | "metadata": {},
842 | "output_type": "execute_result"
843 | }
844 | ],
845 | "source": [
846 | "series.str.split(expand=True)"
847 | ]
848 | },
849 | {
850 | "cell_type": "code",
851 | "execution_count": 24,
852 | "metadata": {
853 | "ExecuteTime": {
854 | "end_time": "2020-02-22T03:17:28.038500Z",
855 | "start_time": "2020-02-22T03:17:28.033999Z"
856 | }
857 | },
858 | "outputs": [
859 | {
860 | "data": {
861 | "text/plain": [
862 | "0 False\n",
863 | "1 True\n",
864 | "2 False\n",
865 | "3 False\n",
866 | "dtype: bool"
867 | ]
868 | },
869 | "execution_count": 24,
870 | "metadata": {},
871 | "output_type": "execute_result"
872 | }
873 | ],
874 | "source": [
875 | "series.str.contains(\"Skywalker\")"
876 | ]
877 | },
878 | {
879 | "cell_type": "code",
880 | "execution_count": 26,
881 | "metadata": {
882 | "ExecuteTime": {
883 | "end_time": "2020-02-22T03:18:20.707962Z",
884 | "start_time": "2020-02-22T03:18:20.702104Z"
885 | }
886 | },
887 | "outputs": [
888 | {
889 | "data": {
890 | "text/plain": [
891 | "0 [OBI-WAN, KENOBI]\n",
892 | "1 [LUKE, SKYWALKER]\n",
893 | "2 [HAN, SOLO]\n",
894 | "3 [LEIA, ORGANA]\n",
895 | "dtype: object"
896 | ]
897 | },
898 | "execution_count": 26,
899 | "metadata": {},
900 | "output_type": "execute_result"
901 | }
902 | ],
903 | "source": [
904 | "series.str.upper().str.split()"
905 | ]
906 | },
907 | {
908 | "cell_type": "markdown",
909 | "metadata": {},
910 | "source": [
911 | "## User defined functions\n",
912 | "\n",
913 | "Lets investigate a super simple example of trying to find the hypotenuse given x and y distances.\n"
914 | ]
915 | },
916 | {
917 | "cell_type": "code",
918 | "execution_count": 27,
919 | "metadata": {
920 | "ExecuteTime": {
921 | "end_time": "2020-02-22T03:19:38.514718Z",
922 | "start_time": "2020-02-22T03:19:38.503227Z"
923 | }
924 | },
925 | "outputs": [],
926 | "source": [
927 | "data2 = np.random.normal(10, 2, size=(100000, 2))\n",
928 | "df2 = pd.DataFrame(data2, columns=[\"x\", \"y\"])"
929 | ]
930 | },
931 | {
932 | "cell_type": "code",
933 | "execution_count": 28,
934 | "metadata": {
935 | "ExecuteTime": {
936 | "end_time": "2020-02-22T03:20:22.345484Z",
937 | "start_time": "2020-02-22T03:20:22.320297Z"
938 | }
939 | },
940 | "outputs": [
941 | {
942 | "name": "stdout",
943 | "output_type": "stream",
944 | "text": [
945 | "13.385640543875555\n"
946 | ]
947 | }
948 | ],
949 | "source": [
950 | "hypot = (df2.x**2 + df2.y**2)**0.5\n",
951 | "print(hypot[0])"
952 | ]
953 | },
954 | {
955 | "cell_type": "code",
956 | "execution_count": 29,
957 | "metadata": {
958 | "ExecuteTime": {
959 | "end_time": "2020-02-22T03:22:05.047787Z",
960 | "start_time": "2020-02-22T03:21:57.547968Z"
961 | }
962 | },
963 | "outputs": [
964 | {
965 | "name": "stdout",
966 | "output_type": "stream",
967 | "text": [
968 | "13.385640543875555\n"
969 | ]
970 | }
971 | ],
972 | "source": [
973 | "def hypot1(x, y):\n",
974 | " return np.sqrt(x**2 + y**2)\n",
975 | "\n",
976 | "h1 = []\n",
977 | "for index, (x, y) in df2.iterrows():\n",
978 | " h1.append(hypot1(x, y))\n",
979 | "print(h1[0])"
980 | ]
981 | },
982 | {
983 | "cell_type": "code",
984 | "execution_count": 30,
985 | "metadata": {
986 | "ExecuteTime": {
987 | "end_time": "2020-02-22T03:23:27.324121Z",
988 | "start_time": "2020-02-22T03:23:24.153687Z"
989 | }
990 | },
991 | "outputs": [
992 | {
993 | "name": "stdout",
994 | "output_type": "stream",
995 | "text": [
996 | "13.385640543875555\n"
997 | ]
998 | }
999 | ],
1000 | "source": [
1001 | "def hypot2(row):\n",
1002 | " return np.sqrt(row.x**2 + row.y**2)\n",
1003 | "\n",
1004 | "h2 = df2.apply(hypot2, axis=1)\n",
1005 | "print(h2[0])"
1006 | ]
1007 | },
1008 | {
1009 | "cell_type": "code",
1010 | "execution_count": 31,
1011 | "metadata": {
1012 | "ExecuteTime": {
1013 | "end_time": "2020-02-22T03:24:23.324639Z",
1014 | "start_time": "2020-02-22T03:24:23.313038Z"
1015 | }
1016 | },
1017 | "outputs": [
1018 | {
1019 | "name": "stdout",
1020 | "output_type": "stream",
1021 | "text": [
1022 | "13.385640543875555\n"
1023 | ]
1024 | }
1025 | ],
1026 | "source": [
1027 | "def hypot3(xs, ys):\n",
1028 | " return np.sqrt(xs**2 + ys**2)\n",
1029 | "h3 = hypot3(df2.x, df2.y)\n",
1030 | "print(h3[0])"
1031 | ]
1032 | },
1033 | {
1034 | "cell_type": "markdown",
1035 | "metadata": {},
1036 | "source": [
1037 | "Vectorising everything you can is the key to speeding up your code. Once you've done that, you should use other tools to investigate. PyCharm Professional has a great optimisation tool built in. Jupyter has %lprun (line profiler) command you can find here: https://github.com/rkern/line_profiler\n",
1038 | "\n",
1039 | "### Recap\n",
1040 | "\n",
1041 | "* apply\n",
1042 | "* map\n",
1043 | "* .str & similar"
1044 | ]
1045 | },
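If you don't have the %lprun extension handy, the standard library's timeit gives the same headline message; here is a minimal sketch (not from the notebook) comparing the row-wise apply to the fully vectorised version:

```python
import timeit

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.normal(10, 2, size=(10000, 2)), columns=["x", "y"])

def row_wise():
    return frame.apply(lambda row: np.sqrt(row.x**2 + row.y**2), axis=1)

def vectorised():
    return np.sqrt(frame.x**2 + frame.y**2)

# the vectorised version is typically orders of magnitude faster
print("apply     :", timeit.timeit(row_wise, number=3))
print("vectorised:", timeit.timeit(vectorised, number=3))
```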
1046 | {
1047 | "cell_type": "code",
1048 | "execution_count": null,
1049 | "metadata": {},
1050 | "outputs": [],
1051 | "source": []
1052 | }
1053 | ],
1054 | "metadata": {
1055 | "kernelspec": {
1056 | "display_name": "Python 3",
1057 | "language": "python",
1058 | "name": "python3"
1059 | },
1060 | "language_info": {
1061 | "codemirror_mode": {
1062 | "name": "ipython",
1063 | "version": 3
1064 | },
1065 | "file_extension": ".py",
1066 | "mimetype": "text/x-python",
1067 | "name": "python",
1068 | "nbconvert_exporter": "python",
1069 | "pygments_lexer": "ipython3",
1070 | "version": "3.7.3"
1071 | }
1072 | },
1073 | "nbformat": 4,
1074 | "nbformat_minor": 2
1075 | }
1076 |
--------------------------------------------------------------------------------