├── practiceResource
│   ├── heart.pkl
│   ├── save.hdf
│   ├── save.pkl
│   ├── data
│   │   ├── example.pkl
│   │   ├── example_1.csv
│   │   ├── example_2.csv
│   │   ├── example_3.csv
│   │   ├── example_4.csv
│   │   ├── CreatingDataFrames.ipynb
│   │   └── heart.csv
│   ├── Questions.ipynb
│   ├── heart.csv
│   ├── dataMaipulation
│   │   ├── Questions.ipynb
│   │   └── 5_Basics_ApplyMapVectorised.ipynb
│   ├── Answers.ipynb
│   ├── SavingAndSerialising.ipynb
│   ├── .ipynb_checkpoints
│   │   └── dataSavingAndSerialising-checkpoint.ipynb
│   ├── dataloading.ipynb
│   └── dataSavingAndSerialising.ipynb
└── NewPracticeResource
    ├── Practice_material_2.ipynb
    └── practice_material.ipynb
/practiceResource/heart.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/heart.pkl
--------------------------------------------------------------------------------
/practiceResource/save.hdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.hdf
--------------------------------------------------------------------------------
/practiceResource/save.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/save.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nasir-hussain1/piaic_q2_class_reseouces/HEAD/practiceResource/data/example.pkl
--------------------------------------------------------------------------------
/practiceResource/data/example_1.csv:
--------------------------------------------------------------------------------
1 | name,gender,age,oaths
2 | Kaladin,Male,20.0,3.0
3 | Shallan,Female,17.0,2.0
4 | Dalinar,Male,53.0,3.0
5 | Szeth,Male,35.0,0.0
6 | Hoid,Male,,
7 | Jashnah,Female,34.0,3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_2.csv:
--------------------------------------------------------------------------------
1 | name gender age oaths
2 | Kaladin Male 20.0 3.0
3 | Shallan Female 17.0 2.0
4 | Dalinar Male 53.0 3.0
5 | Szeth Male 35.0 0.0
6 | Hoid Male
7 | Jashnah Female 34.0 3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_3.csv:
--------------------------------------------------------------------------------
1 | ,name,gender,age,oaths
2 | 0,Kaladin,Male,20.0,3.0
3 | 1,Shallan,Female,17.0,2.0
4 | 2,Dalinar,Male,53.0,3.0
5 | 3,Szeth,Male,35.0,0.0
6 | 4,Hoid,Male,,
7 | 5,Jashnah,Female,34.0,3.0
8 |
--------------------------------------------------------------------------------
/practiceResource/data/example_4.csv:
--------------------------------------------------------------------------------
1 | # This file contains details guessed from Stormlight Archive
2 | name|gender|age|oaths
3 | Kaladin|Male|20.0|3.0
4 | Shallan|Female|17.0|2.0
5 | Dalinar|Male|53.0|3.0
6 | Szeth|Male|35.0|0.0
7 | # Who knows about Hoid
8 | Hoid|Male|NaN|NaN
9 | Jashnah|Female|34.0|3.0
10 |
--------------------------------------------------------------------------------
/practiceResource/Questions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Optional Exercise - Data loading\n",
8 | "\n",
       9 | "Here's a very short exercise with five files to load in. None of these examples should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load them in with the defaults (and they don't work), and then how to change your input arguments to make it better.\n",
10 | "\n",
11 | "The files to attempt to load in are:\n",
12 | "\n",
      13 | "1. example.pkl\n",
      14 | "2. example_1.csv\n",
      15 | "3. example_2.csv\n",
      16 | "4. example_3.csv\n",
      17 | "5. example_4.csv"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 1,
23 | "metadata": {
24 | "ExecuteTime": {
25 | "end_time": "2020-03-17T04:34:36.761456Z",
26 | "start_time": "2020-03-17T04:34:36.452524Z"
27 | }
28 | },
29 | "outputs": [],
30 | "source": [
31 | "import pandas as pd"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 3,
37 | "metadata": {
38 | "ExecuteTime": {
39 | "end_time": "2020-03-17T04:35:03.643107Z",
40 | "start_time": "2020-03-17T04:35:03.639118Z"
41 | }
42 | },
43 | "outputs": [],
44 | "source": [
45 | "# Load example.pkl"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 4,
51 | "metadata": {
52 | "ExecuteTime": {
53 | "end_time": "2020-03-17T04:35:05.900792Z",
54 | "start_time": "2020-03-17T04:35:05.896809Z"
55 | }
56 | },
57 | "outputs": [],
58 | "source": [
59 | "# Load example_1.csv"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 5,
65 | "metadata": {
66 | "ExecuteTime": {
67 | "end_time": "2020-03-17T04:35:07.023962Z",
68 | "start_time": "2020-03-17T04:35:07.021962Z"
69 | }
70 | },
71 | "outputs": [],
72 | "source": [
73 | "# Load example_2.csv"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 6,
79 | "metadata": {
80 | "ExecuteTime": {
81 | "end_time": "2020-03-17T04:35:07.698522Z",
82 | "start_time": "2020-03-17T04:35:07.694534Z"
83 | }
84 | },
85 | "outputs": [],
86 | "source": [
87 | "# Load example_3.csv"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": 7,
93 | "metadata": {
94 | "ExecuteTime": {
95 | "end_time": "2020-03-17T04:35:08.514409Z",
96 | "start_time": "2020-03-17T04:35:08.511417Z"
97 | }
98 | },
99 | "outputs": [],
100 | "source": [
101 | "# Load example_4.csv"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": []
110 | }
111 | ],
112 | "metadata": {
113 | "kernelspec": {
114 | "display_name": "Python 3",
115 | "language": "python",
116 | "name": "python3"
117 | },
118 | "language_info": {
119 | "codemirror_mode": {
120 | "name": "ipython",
121 | "version": 3
122 | },
123 | "file_extension": ".py",
124 | "mimetype": "text/x-python",
125 | "name": "python",
126 | "nbconvert_exporter": "python",
127 | "pygments_lexer": "ipython3",
128 | "version": "3.7.3"
129 | }
130 | },
131 | "nbformat": 4,
132 | "nbformat_minor": 2
133 | }
134 |
--------------------------------------------------------------------------------
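
A minimal loading sketch for the exercise above, assuming the example files sit in a data/ folder next to the notebook; the argument choices are inferred from the raw file contents shown earlier, not taken from any official answer:

import pandas as pd

df_pkl = pd.read_pickle("data/example.pkl")                    # pickled DataFrame, no extra arguments needed
df1 = pd.read_csv("data/example_1.csv")                        # plain comma-separated file, defaults work
df2 = pd.read_csv("data/example_2.csv", sep=r"\s+")            # whitespace-delimited columns
df3 = pd.read_csv("data/example_3.csv", index_col=0)           # first (unnamed) column is a saved index
df4 = pd.read_csv("data/example_4.csv", sep="|", comment="#")  # pipe-delimited with '#' comment lines
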
/NewPracticeResource/Practice_material_2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
       7 | "# Download the dataset from this website\n",
8 | "\n",
9 | "## https://www.kaggle.com/faressayah/stanford-open-policing-project?select=police_project.csv"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## Description::"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.\n"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## Importing libraries::"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "import pandas as pd\n",
40 | "import numpy as np"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
      47 | "# Use Pandas' read_csv function to open it as a DataFrame"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": []
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "# What does each row represent?\n",
62 | "\n",
63 | "#### hint::\n",
      64 | "head: returns the first n rows (the first 5 by default)."
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": []
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "# How to get the basic statistics of all the columns?"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": []
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
      92 | "# How to check the shape of the dataset?"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": []
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
     106 | "# How to check the type of each column?"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": []
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
     120 | "# How to locate missing values?\n",
     121 | "#### Hint: detect the missing values,\n",
     122 | "#### then calculate their sum for each column.\n"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": []
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": []
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
     143 | "# Dropping any column that only contains missing values."
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": null,
149 | "metadata": {},
150 | "outputs": [],
151 | "source": []
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {},
157 | "outputs": [],
158 | "source": []
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
     164 | "# Do men or women speed more often?"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": null,
170 | "metadata": {},
171 | "outputs": [],
172 | "source": []
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {},
178 | "outputs": [],
179 | "source": []
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
     185 | "# Which year had the fewest stops?"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": []
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": null,
198 | "metadata": {},
199 | "outputs": [],
200 | "source": []
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "# Does gender affect who gets searched during a stop?"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "metadata": {},
213 | "outputs": [],
214 | "source": []
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {},
220 | "outputs": [],
221 | "source": []
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "\n",
228 | "# How does drug activity change by time of day?"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": null,
234 | "metadata": {},
235 | "outputs": [],
236 | "source": []
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": null,
241 | "metadata": {},
242 | "outputs": [],
243 | "source": []
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "# Do most stops occur at night?"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": []
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "metadata": {},
263 | "outputs": [],
264 | "source": []
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": []
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {},
275 | "outputs": [],
276 | "source": []
277 | },
278 | {
279 | "cell_type": "code",
280 | "execution_count": null,
281 | "metadata": {},
282 | "outputs": [],
283 | "source": []
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": []
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": null,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": []
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": null,
302 | "metadata": {},
303 | "outputs": [],
304 | "source": []
305 | }
306 | ],
307 | "metadata": {
308 | "environment": {
309 | "name": "tf-gpu.1-15.m56",
310 | "type": "gcloud",
311 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56"
312 | },
313 | "kernelspec": {
314 | "display_name": "Python 3",
315 | "language": "python",
316 | "name": "python3"
317 | },
318 | "language_info": {
319 | "codemirror_mode": {
320 | "name": "ipython",
321 | "version": 3
322 | },
323 | "file_extension": ".py",
324 | "mimetype": "text/x-python",
325 | "name": "python",
326 | "nbconvert_exporter": "python",
327 | "pygments_lexer": "ipython3",
328 | "version": "3.7.8"
329 | }
330 | },
331 | "nbformat": 4,
332 | "nbformat_minor": 4
333 | }
334 |
--------------------------------------------------------------------------------
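
A rough sketch of how a few of these steps could look in pandas. The file name police_project.csv and column names such as driver_gender, violation, stop_date and county_name are assumptions based on the public Kaggle export, so adapt them to the file you actually download:

import pandas as pd

ri = pd.read_csv("police_project.csv")   # assumed file name

ri.head()                                # each row represents one traffic stop
ri.describe(include="all")               # basic statistics for every column
ri.shape                                 # (number of rows, number of columns)
ri.dtypes                                # type of each column
ri.isnull().sum()                        # missing values per column

# Drop any column that is entirely missing (county_name, if it is all NaN)
ri = ri.dropna(axis="columns", how="all")

# Do men or women speed more often?
ri[ri.violation == "Speeding"].driver_gender.value_counts(normalize=True)

# Which year had the fewest stops?
pd.to_datetime(ri.stop_date).dt.year.value_counts().idxmin()
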
/NewPracticeResource/practice_material.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Practice Assignment :: 01"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## How to import pandas and check the version?"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": []
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": []
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
      35 | "## Import useful libraries"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": []
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": null,
48 | "metadata": {},
49 | "outputs": [],
50 | "source": []
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "## How to create a series from a list, numpy array and dict?"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": []
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": []
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "metadata": {},
76 | "source": [
77 | "## How to convert the index of a series into a column of a dataframe?"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "## hint::\n",
85 | "### Convert the series ser into a dataframe with its index as another column on the dataframe."
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": []
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": []
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "# How to combine many series to form a dataframe?"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": []
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {},
120 | "outputs": [],
121 | "source": []
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "# How to calculate the number of characters in each word in a series?"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": null,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": []
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": []
143 | },
144 | {
145 | "cell_type": "markdown",
146 | "metadata": {},
147 | "source": [
148 | "# How to filter valid emails from a series?"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "## Desired Output::\n",
156 | "1 rameses@egypt.com\n",
157 | "\n",
158 | "2 matt@t.com\n",
159 | "\n",
160 | "3 narendra@modi.com\n"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": null,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": []
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": null,
173 | "metadata": {},
174 | "outputs": [],
175 | "source": []
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
     181 | "# How to replace the spaces in a string with the least frequent character?"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "## Input::\n",
189 | "### my_str = 'dbc deb abed gade'\n",
190 | "## Desired Output::\n",
191 | "### 'dbccdebcabedcgade' # least frequent is 'c'"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": null,
197 | "metadata": {},
198 | "outputs": [],
199 | "source": []
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {},
205 | "outputs": [],
206 | "source": []
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "# How to swap two rows of a dataframe?"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {},
219 | "outputs": [],
220 | "source": []
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {},
226 | "outputs": [],
227 | "source": []
228 | },
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {},
232 | "source": [
233 | "# How to get the positions where values of two columns match?"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": null,
239 | "metadata": {},
240 | "outputs": [],
241 | "source": []
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": []
249 | },
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {},
253 | "source": [
254 | "# How to replace both the diagonals of dataframe with 0?"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": []
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {},
268 | "outputs": [],
269 | "source": []
270 | },
271 | {
272 | "cell_type": "markdown",
273 | "metadata": {},
274 | "source": [
275 | "# How to get the particular group of a groupby dataframe by key?"
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "### This is a question related to understanding of grouped dataframe."
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": []
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": null,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": []
298 | },
299 | {
300 | "cell_type": "markdown",
301 | "metadata": {},
302 | "source": [
303 | "# Which column contains the highest number of row-wise maximum values?"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
     310 | "### Obtain the column name with the highest number of row-wise maximums in df."
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": null,
316 | "metadata": {},
317 | "outputs": [],
318 | "source": []
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": []
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {},
331 | "outputs": [],
332 | "source": []
333 | }
334 | ],
335 | "metadata": {
336 | "environment": {
337 | "name": "tf-gpu.1-15.m56",
338 | "type": "gcloud",
339 | "uri": "gcr.io/deeplearning-platform-release/tf-gpu.1-15:m56"
340 | },
341 | "kernelspec": {
342 | "display_name": "Python 3",
343 | "language": "python",
344 | "name": "python3"
345 | },
346 | "language_info": {
347 | "codemirror_mode": {
348 | "name": "ipython",
349 | "version": 3
350 | },
351 | "file_extension": ".py",
352 | "mimetype": "text/x-python",
353 | "name": "python",
354 | "nbconvert_exporter": "python",
355 | "pygments_lexer": "ipython3",
356 | "version": "3.7.8"
357 | }
358 | },
359 | "nbformat": 4,
360 | "nbformat_minor": 4
361 | }
362 |
--------------------------------------------------------------------------------
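
A short sketch for some of the Series exercises above, assuming plain pandas and NumPy; the email Series is only an illustrative guess consistent with the desired output shown in the notebook:

import numpy as np
import pandas as pd

print(pd.__version__)                              # check the installed pandas version

# A Series from a list, a NumPy array and a dict
s_list = pd.Series([1, 2, 3])
s_arr = pd.Series(np.arange(3))
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# Convert the index of a Series into a column of a DataFrame
df = s_dict.reset_index(name="value")              # columns: 'index' and 'value'

# Combine several Series into one DataFrame
df2 = pd.concat([s_list, s_arr], axis=1, keys=["left", "right"])

# Number of characters in each word of a Series
lengths = pd.Series(["pandas", "numpy", "python"]).str.len()

# Filter valid-looking emails with a simple regex
emails = pd.Series(["buying books at amazom.com", "rameses@egypt.com",
                    "matt@t.com", "narendra@modi.com"])
valid = emails[emails.str.match(r"[\w.]+@[\w.]+\.[a-z]{2,}")]
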
/practiceResource/data/CreatingDataFrames.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Creating DataFrames\n",
8 | "\n",
9 | "Many ways to do it!"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "ExecuteTime": {
17 | "end_time": "2020-02-16T02:33:23.867017Z",
18 | "start_time": "2020-02-16T02:33:21.139382Z"
19 | }
20 | },
21 | "outputs": [],
22 | "source": [
23 | "import pandas as pd\n",
24 | "import numpy as np"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 5,
30 | "metadata": {
31 | "ExecuteTime": {
32 | "end_time": "2020-02-16T02:48:39.727399Z",
33 | "start_time": "2020-02-16T02:48:39.707843Z"
34 | }
35 | },
36 | "outputs": [
37 | {
38 | "name": "stdout",
39 | "output_type": "stream",
40 | "text": [
41 | "[[0.97699724 0.23250035 0.17454747]\n",
42 | " [0.11011626 0.90673085 0.37222005]\n",
43 | " [0.77665114 0.81701713 0.57427769]\n",
44 | " [0.34080801 0.09617229 0.26027026]\n",
45 | " [0.03694591 0.5385542 0.95945971]]\n"
46 | ]
47 | },
48 | {
49 | "data": {
      50 |       "text/html": [
      51 |        "[HTML table rendering omitted; same data as the text/plain output below]"
     108 |       ],
109 | "text/plain": [
110 | " A B C\n",
111 | "0 0.976997 0.232500 0.174547\n",
112 | "1 0.110116 0.906731 0.372220\n",
113 | "2 0.776651 0.817017 0.574278\n",
114 | "3 0.340808 0.096172 0.260270\n",
115 | "4 0.036946 0.538554 0.959460"
116 | ]
117 | },
118 | "execution_count": 5,
119 | "metadata": {},
120 | "output_type": "execute_result"
121 | }
122 | ],
123 | "source": [
124 | "data = np.random.random(size=(5, 3))\n",
125 | "print(data)\n",
126 | "\n",
127 | "# Common 2D array and columns method\n",
128 | "df = pd.DataFrame(data=data, columns=[\"A\", \"B\", \"C\"])\n",
129 | "df"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 6,
135 | "metadata": {
136 | "ExecuteTime": {
137 | "end_time": "2020-02-16T02:49:34.262524Z",
138 | "start_time": "2020-02-16T02:49:34.252447Z"
139 | }
140 | },
141 | "outputs": [
142 | {
143 | "data": {
     144 | "text/html": [
     145 | "[HTML table rendering omitted; same data as the text/plain output below]"
     186 | ],
187 | "text/plain": [
188 | " A B\n",
189 | "0 1 Sam\n",
190 | "1 2 Alex\n",
191 | "2 3 John"
192 | ]
193 | },
194 | "execution_count": 6,
195 | "metadata": {},
196 | "output_type": "execute_result"
197 | }
198 | ],
199 | "source": [
200 | "# A dictionary of columns\n",
201 | "df = pd.DataFrame(data={\"A\": [1, 2, 3], \"B\": [\"Sam\", \"Alex\", \"John\"]})\n",
202 | "df"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 9,
208 | "metadata": {
209 | "ExecuteTime": {
210 | "end_time": "2020-02-16T02:51:16.389841Z",
211 | "start_time": "2020-02-16T02:51:16.379319Z"
212 | }
213 | },
214 | "outputs": [
215 | {
216 | "data": {
     217 | "text/html": [
     218 | "[HTML table rendering omitted; same data as the text/plain output below]"
     259 | ],
260 | "text/plain": [
261 | " A B\n",
262 | "0 1 Sam\n",
263 | "1 2 Alex\n",
264 | "2 3 John"
265 | ]
266 | },
267 | "execution_count": 9,
268 | "metadata": {},
269 | "output_type": "execute_result"
270 | }
271 | ],
272 | "source": [
     273 | "# Or a list of rows (i.e. tuples) with a structured dtype\n",
     274 | "dtype = [(\"A\", np.int64), (\"B\", (np.str_, 20))]\n",
275 | "data = np.array([(1, \"Sam\"), (2, \"Alex\"), (3, \"John\")], dtype=dtype)\n",
276 | "df = pd.DataFrame(data)\n",
277 | "df"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": 10,
283 | "metadata": {
284 | "ExecuteTime": {
285 | "end_time": "2020-02-16T02:52:39.660112Z",
286 | "start_time": "2020-02-16T02:52:39.651418Z"
287 | }
288 | },
289 | "outputs": [
290 | {
291 | "data": {
     292 | "text/html": [
     293 | "[HTML table rendering omitted; same data as the text/plain output below]"
     334 | ],
335 | "text/plain": [
336 | " A B\n",
337 | "0 1 Sam\n",
338 | "1 2 Alex\n",
339 | "2 3 John"
340 | ]
341 | },
342 | "execution_count": 10,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": [
348 | "# Or the dictionary based version of list of rows\n",
349 | "data = [{\"A\": 1, \"B\": \"Sam\"}, {\"A\": 2, \"B\": \"Alex\"}, {\"A\": 3, \"B\": \"John\"}]\n",
350 | "df = pd.DataFrame(data)\n",
351 | "df"
352 | ]
353 | }
354 | ],
355 | "metadata": {
356 | "kernelspec": {
357 | "display_name": "Python 3",
358 | "language": "python",
359 | "name": "python3"
360 | },
361 | "language_info": {
362 | "codemirror_mode": {
363 | "name": "ipython",
364 | "version": 3
365 | },
366 | "file_extension": ".py",
367 | "mimetype": "text/x-python",
368 | "name": "python",
369 | "nbconvert_exporter": "python",
370 | "pygments_lexer": "ipython3",
371 | "version": "3.7.3"
372 | }
373 | },
374 | "nbformat": 4,
375 | "nbformat_minor": 2
376 | }
377 |
--------------------------------------------------------------------------------
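
One more common construction that the notebook above does not show is a dict of Series; as a sketch, the Series are aligned on their indexes and any missing positions become NaN:

import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series(["Sam", "Alex"], index=["x", "y"])
df = pd.DataFrame({"A": a, "B": b})
print(df)
#    A     B
# x  1   Sam
# y  2  Alex
# z  3   NaN
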
/practiceResource/heart.csv:
--------------------------------------------------------------------------------
1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1
8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1
10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1
11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1
12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1
13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1
15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1
16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1
17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1
18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1
19 | 66,0,3,150,226,0,1,114,0,2.6,0,0,2,1
20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1
21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1
23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1
24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1
25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1
26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1
27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1
28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1
29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1
30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1
31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1
32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1
33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1
34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1
35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1
36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1
37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1
38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1
39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1
40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1
41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1
42 | 51,0,2,140,308,0,0,142,0,1.5,2,1,2,1
43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1
44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1
45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1
47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1
48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1
49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1
50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1
51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1
52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1
53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1
54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1
55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1
56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1
57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1
58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1
59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1
60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1
61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1
62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1
63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1
64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1
65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1
66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1
67 | 35,0,0,138,183,0,1,182,0,1.4,2,0,2,1
68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1
69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1
70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1
71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1
72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1
73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1
74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1
75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1
76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1
78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1
79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1
80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1
81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1
82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1
83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1
84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1
85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1
86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1
87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1
88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1
89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1
90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
91 | 58,0,0,100,248,0,0,122,0,1,1,0,2,1
92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1
93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1
94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1
95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1
96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1
97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1
98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1
99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1
100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1
101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1
102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1
103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1
104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1
105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1
106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1
107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1
108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1
109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1
110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1
111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1
112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1
113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1
114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1
115 | 43,1,0,110,211,0,1,161,0,0,2,0,3,1
116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1
117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1
118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1
119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1
120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1
121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1
122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1
123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1
124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1
125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1
126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1
127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1
128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1
129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1
130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1
131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1
132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1
133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1
134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1
135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1
136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1
137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1
138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1
139 | 62,1,1,128,208,1,0,140,0,0,2,0,2,1
140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1
141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1
142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1
143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1
144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1
145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1
146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1
147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1
148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1
149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1
150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1
151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1
152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1
153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1
154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1
155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1
156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1
157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1
158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1
159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1
160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1
161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1
162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1
163 | 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1
164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1
165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0
168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0
169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0
170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0
171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0
173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0
174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0
175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0
176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0
177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0
178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0
179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0
180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0
181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0
182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0
183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0
184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0
185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0
186 | 50,1,0,150,243,0,0,128,0,2.6,1,0,3,0
187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0
188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0
189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0
190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0
191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0
192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0
193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0
194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0
195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0
197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0
198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0
199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0
200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0
201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0
202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0
203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0
205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0
206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0
207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0
208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0
209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0
210 | 49,1,2,120,188,0,1,139,0,2,1,3,3,0
211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0
212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0
213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0
214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0
215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0
216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0
217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0
218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0
219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0
220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0
221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0
222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0
223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0
224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0
225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0
226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0
227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0
229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0
230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0
231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0
232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0
233 | 57,1,0,165,289,1,0,124,0,1,1,3,3,0
234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0
235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0
236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0
237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0
238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0
239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0
240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0
241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0
242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0
243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0
244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0
245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0
246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0
247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0
248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0
249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0
250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0
251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0
252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0
253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0
254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0
256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0
257 | 45,1,0,142,309,0,0,147,1,0,1,3,3,0
258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0
259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0
260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0
261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0
262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0
263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0
264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0
265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0
266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0
267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0
268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0
269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0
270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0
271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0
272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0
273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0
274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0
275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0
276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0
277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0
278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0
279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0
280 | 58,0,1,136,319,1,0,152,0,0,2,2,2,0
281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0
282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0
283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0
284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0
285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0
286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0
287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0
288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0
289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0
290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0
291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0
292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0
293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0
294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0
296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0
297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0
298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0
299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0
300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
304 | 57,0,1,130,236,0,0,174,0,0,1,1,2,0
305 |
--------------------------------------------------------------------------------
/practiceResource/data/heart.csv:
--------------------------------------------------------------------------------
1 | age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
3 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
4 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
5 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
6 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
7 | 57,1,0,140,192,0,1,148,0,0.4,1,0,1,1
8 | 56,0,1,140,294,0,0,153,0,1.3,1,0,2,1
9 | 44,1,1,120,263,0,1,173,0,0,2,0,3,1
10 | 52,1,2,172,199,1,1,162,0,0.5,2,0,3,1
11 | 57,1,2,150,168,0,1,174,0,1.6,2,0,2,1
12 | 54,1,0,140,239,0,1,160,0,1.2,2,0,2,1
13 | 48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
14 | 49,1,1,130,266,0,1,171,0,0.6,2,0,2,1
15 | 64,1,3,110,211,0,0,144,1,1.8,1,0,2,1
16 | 58,0,3,150,283,1,0,162,0,1,2,0,2,1
17 | 50,0,2,120,219,0,1,158,0,1.6,1,0,2,1
18 | 58,0,2,120,340,0,1,172,0,0,2,0,2,1
19 | 66,0,3,150,226,0,1,114,0,2.6,0,0,2,1
20 | 43,1,0,150,247,0,1,171,0,1.5,2,0,2,1
21 | 69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
22 | 59,1,0,135,234,0,1,161,0,0.5,1,0,3,1
23 | 44,1,2,130,233,0,1,179,1,0.4,2,0,2,1
24 | 42,1,0,140,226,0,1,178,0,0,2,0,2,1
25 | 61,1,2,150,243,1,1,137,1,1,1,0,2,1
26 | 40,1,3,140,199,0,1,178,1,1.4,2,0,3,1
27 | 71,0,1,160,302,0,1,162,0,0.4,2,2,2,1
28 | 59,1,2,150,212,1,1,157,0,1.6,2,0,2,1
29 | 51,1,2,110,175,0,1,123,0,0.6,2,0,2,1
30 | 65,0,2,140,417,1,0,157,0,0.8,2,1,2,1
31 | 53,1,2,130,197,1,0,152,0,1.2,0,0,2,1
32 | 41,0,1,105,198,0,1,168,0,0,2,1,2,1
33 | 65,1,0,120,177,0,1,140,0,0.4,2,0,3,1
34 | 44,1,1,130,219,0,0,188,0,0,2,0,2,1
35 | 54,1,2,125,273,0,0,152,0,0.5,0,1,2,1
36 | 51,1,3,125,213,0,0,125,1,1.4,2,1,2,1
37 | 46,0,2,142,177,0,0,160,1,1.4,0,0,2,1
38 | 54,0,2,135,304,1,1,170,0,0,2,0,2,1
39 | 54,1,2,150,232,0,0,165,0,1.6,2,0,3,1
40 | 65,0,2,155,269,0,1,148,0,0.8,2,0,2,1
41 | 65,0,2,160,360,0,0,151,0,0.8,2,0,2,1
42 | 51,0,2,140,308,0,0,142,0,1.5,2,1,2,1
43 | 48,1,1,130,245,0,0,180,0,0.2,1,0,2,1
44 | 45,1,0,104,208,0,0,148,1,3,1,0,2,1
45 | 53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
46 | 39,1,2,140,321,0,0,182,0,0,2,0,2,1
47 | 52,1,1,120,325,0,1,172,0,0.2,2,0,2,1
48 | 44,1,2,140,235,0,0,180,0,0,2,0,2,1
49 | 47,1,2,138,257,0,0,156,0,0,2,0,2,1
50 | 53,0,2,128,216,0,0,115,0,0,2,0,0,1
51 | 53,0,0,138,234,0,0,160,0,0,2,0,2,1
52 | 51,0,2,130,256,0,0,149,0,0.5,2,0,2,1
53 | 66,1,0,120,302,0,0,151,0,0.4,1,0,2,1
54 | 62,1,2,130,231,0,1,146,0,1.8,1,3,3,1
55 | 44,0,2,108,141,0,1,175,0,0.6,1,0,2,1
56 | 63,0,2,135,252,0,0,172,0,0,2,0,2,1
57 | 52,1,1,134,201,0,1,158,0,0.8,2,1,2,1
58 | 48,1,0,122,222,0,0,186,0,0,2,0,2,1
59 | 45,1,0,115,260,0,0,185,0,0,2,0,2,1
60 | 34,1,3,118,182,0,0,174,0,0,2,0,2,1
61 | 57,0,0,128,303,0,0,159,0,0,2,1,2,1
62 | 71,0,2,110,265,1,0,130,0,0,2,1,2,1
63 | 54,1,1,108,309,0,1,156,0,0,2,0,3,1
64 | 52,1,3,118,186,0,0,190,0,0,1,0,1,1
65 | 41,1,1,135,203,0,1,132,0,0,1,0,1,1
66 | 58,1,2,140,211,1,0,165,0,0,2,0,2,1
67 | 35,0,0,138,183,0,1,182,0,1.4,2,0,2,1
68 | 51,1,2,100,222,0,1,143,1,1.2,1,0,2,1
69 | 45,0,1,130,234,0,0,175,0,0.6,1,0,2,1
70 | 44,1,1,120,220,0,1,170,0,0,2,0,2,1
71 | 62,0,0,124,209,0,1,163,0,0,2,0,2,1
72 | 54,1,2,120,258,0,0,147,0,0.4,1,0,3,1
73 | 51,1,2,94,227,0,1,154,1,0,2,1,3,1
74 | 29,1,1,130,204,0,0,202,0,0,2,0,2,1
75 | 51,1,0,140,261,0,0,186,1,0,2,0,2,1
76 | 43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
77 | 55,0,1,135,250,0,0,161,0,1.4,1,0,2,1
78 | 51,1,2,125,245,1,0,166,0,2.4,1,0,2,1
79 | 59,1,1,140,221,0,1,164,1,0,2,0,2,1
80 | 52,1,1,128,205,1,1,184,0,0,2,0,2,1
81 | 58,1,2,105,240,0,0,154,1,0.6,1,0,3,1
82 | 41,1,2,112,250,0,1,179,0,0,2,0,2,1
83 | 45,1,1,128,308,0,0,170,0,0,2,0,2,1
84 | 60,0,2,102,318,0,1,160,0,0,2,1,2,1
85 | 52,1,3,152,298,1,1,178,0,1.2,1,0,3,1
86 | 42,0,0,102,265,0,0,122,0,0.6,1,0,2,1
87 | 67,0,2,115,564,0,0,160,0,1.6,1,0,3,1
88 | 68,1,2,118,277,0,1,151,0,1,2,1,3,1
89 | 46,1,1,101,197,1,1,156,0,0,2,0,3,1
90 | 54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
91 | 58,0,0,100,248,0,0,122,0,1,1,0,2,1
92 | 48,1,2,124,255,1,1,175,0,0,2,2,2,1
93 | 57,1,0,132,207,0,1,168,1,0,2,0,3,1
94 | 52,1,2,138,223,0,1,169,0,0,2,4,2,1
95 | 54,0,1,132,288,1,0,159,1,0,2,1,2,1
96 | 45,0,1,112,160,0,1,138,0,0,1,0,2,1
97 | 53,1,0,142,226,0,0,111,1,0,2,0,3,1
98 | 62,0,0,140,394,0,0,157,0,1.2,1,0,2,1
99 | 52,1,0,108,233,1,1,147,0,0.1,2,3,3,1
100 | 43,1,2,130,315,0,1,162,0,1.9,2,1,2,1
101 | 53,1,2,130,246,1,0,173,0,0,2,3,2,1
102 | 42,1,3,148,244,0,0,178,0,0.8,2,2,2,1
103 | 59,1,3,178,270,0,0,145,0,4.2,0,0,3,1
104 | 63,0,1,140,195,0,1,179,0,0,2,2,2,1
105 | 42,1,2,120,240,1,1,194,0,0.8,0,0,3,1
106 | 50,1,2,129,196,0,1,163,0,0,2,0,2,1
107 | 68,0,2,120,211,0,0,115,0,1.5,1,0,2,1
108 | 69,1,3,160,234,1,0,131,0,0.1,1,1,2,1
109 | 45,0,0,138,236,0,0,152,1,0.2,1,0,2,1
110 | 50,0,1,120,244,0,1,162,0,1.1,2,0,2,1
111 | 50,0,0,110,254,0,0,159,0,0,2,0,2,1
112 | 64,0,0,180,325,0,1,154,1,0,2,0,2,1
113 | 57,1,2,150,126,1,1,173,0,0.2,2,1,3,1
114 | 64,0,2,140,313,0,1,133,0,0.2,2,0,3,1
115 | 43,1,0,110,211,0,1,161,0,0,2,0,3,1
116 | 55,1,1,130,262,0,1,155,0,0,2,0,2,1
117 | 37,0,2,120,215,0,1,170,0,0,2,0,2,1
118 | 41,1,2,130,214,0,0,168,0,2,1,0,2,1
119 | 56,1,3,120,193,0,0,162,0,1.9,1,0,3,1
120 | 46,0,1,105,204,0,1,172,0,0,2,0,2,1
121 | 46,0,0,138,243,0,0,152,1,0,1,0,2,1
122 | 64,0,0,130,303,0,1,122,0,2,1,2,2,1
123 | 59,1,0,138,271,0,0,182,0,0,2,0,2,1
124 | 41,0,2,112,268,0,0,172,1,0,2,0,2,1
125 | 54,0,2,108,267,0,0,167,0,0,2,0,2,1
126 | 39,0,2,94,199,0,1,179,0,0,2,0,2,1
127 | 34,0,1,118,210,0,1,192,0,0.7,2,0,2,1
128 | 47,1,0,112,204,0,1,143,0,0.1,2,0,2,1
129 | 67,0,2,152,277,0,1,172,0,0,2,1,2,1
130 | 52,0,2,136,196,0,0,169,0,0.1,1,0,2,1
131 | 74,0,1,120,269,0,0,121,1,0.2,2,1,2,1
132 | 54,0,2,160,201,0,1,163,0,0,2,1,2,1
133 | 49,0,1,134,271,0,1,162,0,0,1,0,2,1
134 | 42,1,1,120,295,0,1,162,0,0,2,0,2,1
135 | 41,1,1,110,235,0,1,153,0,0,2,0,2,1
136 | 41,0,1,126,306,0,1,163,0,0,2,0,2,1
137 | 49,0,0,130,269,0,1,163,0,0,2,0,2,1
138 | 60,0,2,120,178,1,1,96,0,0,2,0,2,1
139 | 62,1,1,128,208,1,0,140,0,0,2,0,2,1
140 | 57,1,0,110,201,0,1,126,1,1.5,1,0,1,1
141 | 64,1,0,128,263,0,1,105,1,0.2,1,1,3,1
142 | 51,0,2,120,295,0,0,157,0,0.6,2,0,2,1
143 | 43,1,0,115,303,0,1,181,0,1.2,1,0,2,1
144 | 42,0,2,120,209,0,1,173,0,0,1,0,2,1
145 | 67,0,0,106,223,0,1,142,0,0.3,2,2,2,1
146 | 76,0,2,140,197,0,2,116,0,1.1,1,0,2,1
147 | 70,1,1,156,245,0,0,143,0,0,2,0,2,1
148 | 44,0,2,118,242,0,1,149,0,0.3,1,1,2,1
149 | 60,0,3,150,240,0,1,171,0,0.9,2,0,2,1
150 | 44,1,2,120,226,0,1,169,0,0,2,0,2,1
151 | 42,1,2,130,180,0,1,150,0,0,2,0,2,1
152 | 66,1,0,160,228,0,0,138,0,2.3,2,0,1,1
153 | 71,0,0,112,149,0,1,125,0,1.6,1,0,2,1
154 | 64,1,3,170,227,0,0,155,0,0.6,1,0,3,1
155 | 66,0,2,146,278,0,0,152,0,0,1,1,2,1
156 | 39,0,2,138,220,0,1,152,0,0,1,0,2,1
157 | 58,0,0,130,197,0,1,131,0,0.6,1,0,2,1
158 | 47,1,2,130,253,0,1,179,0,0,2,0,2,1
159 | 35,1,1,122,192,0,1,174,0,0,2,0,2,1
160 | 58,1,1,125,220,0,1,144,0,0.4,1,4,3,1
161 | 56,1,1,130,221,0,0,163,0,0,2,0,3,1
162 | 56,1,1,120,240,0,1,169,0,0,0,0,2,1
163 | 55,0,1,132,342,0,1,166,0,1.2,2,0,2,1
164 | 41,1,1,120,157,0,1,182,0,0,2,0,2,1
165 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
166 | 38,1,2,138,175,0,1,173,0,0,2,4,2,1
167 | 67,1,0,160,286,0,0,108,1,1.5,1,3,2,0
168 | 67,1,0,120,229,0,0,129,1,2.6,1,2,3,0
169 | 62,0,0,140,268,0,0,160,0,3.6,0,2,2,0
170 | 63,1,0,130,254,0,0,147,0,1.4,1,1,3,0
171 | 53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
172 | 56,1,2,130,256,1,0,142,1,0.6,1,1,1,0
173 | 48,1,1,110,229,0,1,168,0,1,0,0,3,0
174 | 58,1,1,120,284,0,0,160,0,1.8,1,0,2,0
175 | 58,1,2,132,224,0,0,173,0,3.2,2,2,3,0
176 | 60,1,0,130,206,0,0,132,1,2.4,1,2,3,0
177 | 40,1,0,110,167,0,0,114,1,2,1,0,3,0
178 | 60,1,0,117,230,1,1,160,1,1.4,2,2,3,0
179 | 64,1,2,140,335,0,1,158,0,0,2,0,2,0
180 | 43,1,0,120,177,0,0,120,1,2.5,1,0,3,0
181 | 57,1,0,150,276,0,0,112,1,0.6,1,1,1,0
182 | 55,1,0,132,353,0,1,132,1,1.2,1,1,3,0
183 | 65,0,0,150,225,0,0,114,0,1,1,3,3,0
184 | 61,0,0,130,330,0,0,169,0,0,2,0,2,0
185 | 58,1,2,112,230,0,0,165,0,2.5,1,1,3,0
186 | 50,1,0,150,243,0,0,128,0,2.6,1,0,3,0
187 | 44,1,0,112,290,0,0,153,0,0,2,1,2,0
188 | 60,1,0,130,253,0,1,144,1,1.4,2,1,3,0
189 | 54,1,0,124,266,0,0,109,1,2.2,1,1,3,0
190 | 50,1,2,140,233,0,1,163,0,0.6,1,1,3,0
191 | 41,1,0,110,172,0,0,158,0,0,2,0,3,0
192 | 51,0,0,130,305,0,1,142,1,1.2,1,0,3,0
193 | 58,1,0,128,216,0,0,131,1,2.2,1,3,3,0
194 | 54,1,0,120,188,0,1,113,0,1.4,1,1,3,0
195 | 60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
196 | 60,1,2,140,185,0,0,155,0,3,1,0,2,0
197 | 59,1,0,170,326,0,0,140,1,3.4,0,0,3,0
198 | 46,1,2,150,231,0,1,147,0,3.6,1,0,2,0
199 | 67,1,0,125,254,1,1,163,0,0.2,1,2,3,0
200 | 62,1,0,120,267,0,1,99,1,1.8,1,2,3,0
201 | 65,1,0,110,248,0,0,158,0,0.6,2,2,1,0
202 | 44,1,0,110,197,0,0,177,0,0,2,1,2,0
203 | 60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
204 | 58,1,0,150,270,0,0,111,1,0.8,2,0,3,0
205 | 68,1,2,180,274,1,0,150,1,1.6,1,0,3,0
206 | 62,0,0,160,164,0,0,145,0,6.2,0,3,3,0
207 | 52,1,0,128,255,0,1,161,1,0,2,1,3,0
208 | 59,1,0,110,239,0,0,142,1,1.2,1,1,3,0
209 | 60,0,0,150,258,0,0,157,0,2.6,1,2,3,0
210 | 49,1,2,120,188,0,1,139,0,2,1,3,3,0
211 | 59,1,0,140,177,0,1,162,1,0,2,1,3,0
212 | 57,1,2,128,229,0,0,150,0,0.4,1,1,3,0
213 | 61,1,0,120,260,0,1,140,1,3.6,1,1,3,0
214 | 39,1,0,118,219,0,1,140,0,1.2,1,0,3,0
215 | 61,0,0,145,307,0,0,146,1,1,1,0,3,0
216 | 56,1,0,125,249,1,0,144,1,1.2,1,1,2,0
217 | 43,0,0,132,341,1,0,136,1,3,1,0,3,0
218 | 62,0,2,130,263,0,1,97,0,1.2,1,1,3,0
219 | 63,1,0,130,330,1,0,132,1,1.8,2,3,3,0
220 | 65,1,0,135,254,0,0,127,0,2.8,1,1,3,0
221 | 48,1,0,130,256,1,0,150,1,0,2,2,3,0
222 | 63,0,0,150,407,0,0,154,0,4,1,3,3,0
223 | 55,1,0,140,217,0,1,111,1,5.6,0,0,3,0
224 | 65,1,3,138,282,1,0,174,0,1.4,1,1,2,0
225 | 56,0,0,200,288,1,0,133,1,4,0,2,3,0
226 | 54,1,0,110,239,0,1,126,1,2.8,1,1,3,0
227 | 70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
228 | 62,1,1,120,281,0,0,103,0,1.4,1,1,3,0
229 | 35,1,0,120,198,0,1,130,1,1.6,1,0,3,0
230 | 59,1,3,170,288,0,0,159,0,0.2,1,0,3,0
231 | 64,1,2,125,309,0,1,131,1,1.8,1,0,3,0
232 | 47,1,2,108,243,0,1,152,0,0,2,0,2,0
233 | 57,1,0,165,289,1,0,124,0,1,1,3,3,0
234 | 55,1,0,160,289,0,0,145,1,0.8,1,1,3,0
235 | 64,1,0,120,246,0,0,96,1,2.2,0,1,2,0
236 | 70,1,0,130,322,0,0,109,0,2.4,1,3,2,0
237 | 51,1,0,140,299,0,1,173,1,1.6,2,0,3,0
238 | 58,1,0,125,300,0,0,171,0,0,2,2,3,0
239 | 60,1,0,140,293,0,0,170,0,1.2,1,2,3,0
240 | 77,1,0,125,304,0,0,162,1,0,2,3,2,0
241 | 35,1,0,126,282,0,0,156,1,0,2,0,3,0
242 | 70,1,2,160,269,0,1,112,1,2.9,1,1,3,0
243 | 59,0,0,174,249,0,1,143,1,0,1,0,2,0
244 | 64,1,0,145,212,0,0,132,0,2,1,2,1,0
245 | 57,1,0,152,274,0,1,88,1,1.2,1,1,3,0
246 | 56,1,0,132,184,0,0,105,1,2.1,1,1,1,0
247 | 48,1,0,124,274,0,0,166,0,0.5,1,0,3,0
248 | 56,0,0,134,409,0,0,150,1,1.9,1,2,3,0
249 | 66,1,1,160,246,0,1,120,1,0,1,3,1,0
250 | 54,1,1,192,283,0,0,195,0,0,2,1,3,0
251 | 69,1,2,140,254,0,0,146,0,2,1,3,3,0
252 | 51,1,0,140,298,0,1,122,1,4.2,1,3,3,0
253 | 43,1,0,132,247,1,0,143,1,0.1,1,4,3,0
254 | 62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
255 | 67,1,0,100,299,0,0,125,1,0.9,1,2,2,0
256 | 59,1,3,160,273,0,0,125,0,0,2,0,2,0
257 | 45,1,0,142,309,0,0,147,1,0,1,3,3,0
258 | 58,1,0,128,259,0,0,130,1,3,1,2,3,0
259 | 50,1,0,144,200,0,0,126,1,0.9,1,0,3,0
260 | 62,0,0,150,244,0,1,154,1,1.4,1,0,2,0
261 | 38,1,3,120,231,0,1,182,1,3.8,1,0,3,0
262 | 66,0,0,178,228,1,1,165,1,1,1,2,3,0
263 | 52,1,0,112,230,0,1,160,0,0,2,1,2,0
264 | 53,1,0,123,282,0,1,95,1,2,1,2,3,0
265 | 63,0,0,108,269,0,1,169,1,1.8,1,2,2,0
266 | 54,1,0,110,206,0,0,108,1,0,1,1,2,0
267 | 66,1,0,112,212,0,0,132,1,0.1,2,1,2,0
268 | 55,0,0,180,327,0,2,117,1,3.4,1,0,2,0
269 | 49,1,2,118,149,0,0,126,0,0.8,2,3,2,0
270 | 54,1,0,122,286,0,0,116,1,3.2,1,2,2,0
271 | 56,1,0,130,283,1,0,103,1,1.6,0,0,3,0
272 | 46,1,0,120,249,0,0,144,0,0.8,2,0,3,0
273 | 61,1,3,134,234,0,1,145,0,2.6,1,2,2,0
274 | 67,1,0,120,237,0,1,71,0,1,1,0,2,0
275 | 58,1,0,100,234,0,1,156,0,0.1,2,1,3,0
276 | 47,1,0,110,275,0,0,118,1,1,1,1,2,0
277 | 52,1,0,125,212,0,1,168,0,1,2,2,3,0
278 | 58,1,0,146,218,0,1,105,0,2,1,1,3,0
279 | 57,1,1,124,261,0,1,141,0,0.3,2,0,3,0
280 | 58,0,1,136,319,1,0,152,0,0,2,2,2,0
281 | 61,1,0,138,166,0,0,125,1,3.6,1,1,2,0
282 | 42,1,0,136,315,0,1,125,1,1.8,1,0,1,0
283 | 52,1,0,128,204,1,1,156,1,1,1,0,0,0
284 | 59,1,2,126,218,1,1,134,0,2.2,1,1,1,0
285 | 40,1,0,152,223,0,1,181,0,0,2,0,3,0
286 | 61,1,0,140,207,0,0,138,1,1.9,2,1,3,0
287 | 46,1,0,140,311,0,1,120,1,1.8,1,2,3,0
288 | 59,1,3,134,204,0,1,162,0,0.8,2,2,2,0
289 | 57,1,1,154,232,0,0,164,0,0,2,1,2,0
290 | 57,1,0,110,335,0,1,143,1,3,1,1,3,0
291 | 55,0,0,128,205,0,2,130,1,2,1,1,3,0
292 | 61,1,0,148,203,0,1,161,0,0,2,1,3,0
293 | 58,1,0,114,318,0,2,140,0,4.4,0,3,1,0
294 | 58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
295 | 67,1,2,152,212,0,0,150,0,0.8,1,0,3,0
296 | 44,1,0,120,169,0,1,144,1,2.8,0,0,1,0
297 | 63,1,0,140,187,0,0,144,1,4,2,2,3,0
298 | 63,0,0,124,197,0,1,136,1,0,1,0,2,0
299 | 59,1,0,164,176,1,0,90,0,1,1,2,1,0
300 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
301 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
302 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
303 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
304 | 57,0,1,130,236,0,0,174,0,0,1,1,2,0
305 |
--------------------------------------------------------------------------------
/practiceResource/dataMaipulation/Questions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
       7 | "# Extra Practice - Basics\n",
       8 | "\n",
       9 | "In this optional practice session, I thought it would be fun to look at some cost of living data from, you guessed it, Kaggle: https://www.kaggle.com/stephenofarrell/cost-of-living\n",
      10 | "\n",
      11 | "Here are the objectives:\n",
      12 | "\n",
      13 | "1. Rename the \"index\" column to \"location\"\n",
      14 | "2. Utilise apply to generate two new columns from the location - city and country\n",
      15 | "3. Realise the easy solution doesn't work for the United States and create a function for apply to remove specific states.\n",
      16 | "4. Figure out which country has the most cities listed, and create a dataset from only that country\n",
      17 | "5. Sort the dataset by the cost of living 'Apartment (1 bedroom) in City Centre'\n",
      18 | "6. Cry over housing prices if you live in the Bay Area.\n",
19 | "\n",
20 | "After that, feel free to keep playing with the data yourself.\n"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 32,
26 | "metadata": {
27 | "ExecuteTime": {
28 | "end_time": "2020-02-03T02:01:59.091568Z",
29 | "start_time": "2020-02-03T02:01:59.056659Z"
30 | }
31 | },
32 | "outputs": [
33 | {
34 | "data": {
      35 | "text/html": [
      36 | "[HTML rendering of the DataFrame preview (5 rows × 56 columns) omitted; see the text/plain output below]"
     202 | ],
203 | "text/plain": [
204 | " index Meal, Inexpensive Restaurant \\\n",
205 | "0 Saint Petersburg, Russia 7.34 \n",
206 | "1 Istanbul, Turkey 4.58 \n",
207 | "2 Izmir, Turkey 3.06 \n",
208 | "3 Helsinki, Finland 12.00 \n",
209 | "4 Chisinau, Moldova 4.67 \n",
210 | "\n",
211 | " Meal for 2 People, Mid-range Restaurant, Three-course \\\n",
212 | "0 29.35 \n",
213 | "1 15.28 \n",
214 | "2 12.22 \n",
215 | "3 65.00 \n",
216 | "4 20.74 \n",
217 | "\n",
218 | " McMeal at McDonalds (or Equivalent Combo Meal) \\\n",
219 | "0 4.40 \n",
220 | "1 3.82 \n",
221 | "2 3.06 \n",
222 | "3 8.00 \n",
223 | "4 4.15 \n",
224 | "\n",
225 | " Domestic Beer (0.5 liter draught) Imported Beer (0.33 liter bottle) \\\n",
226 | "0 2.20 2.20 \n",
227 | "1 3.06 3.06 \n",
228 | "2 2.29 2.75 \n",
229 | "3 6.50 6.75 \n",
230 | "4 1.04 1.43 \n",
231 | "\n",
232 | " Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \\\n",
233 | "0 0.76 0.53 \n",
234 | "1 0.64 0.24 \n",
235 | "2 0.61 0.22 \n",
236 | "3 2.66 1.89 \n",
237 | "4 0.64 0.44 \n",
238 | "\n",
239 | " Milk (regular), (1 liter) Loaf of Fresh White Bread (500g) ... \\\n",
240 | "0 0.98 0.71 ... \n",
241 | "1 0.71 0.36 ... \n",
242 | "2 0.65 0.38 ... \n",
243 | "3 0.96 2.27 ... \n",
244 | "4 0.68 0.33 ... \n",
245 | "\n",
246 | " Lettuce (1 head) Cappuccino (regular) Rice (white), (1kg) Tomato (1kg) \\\n",
247 | "0 0.86 1.96 0.92 1.91 \n",
248 | "1 0.61 1.84 1.30 0.80 \n",
249 | "2 0.57 1.56 1.31 0.70 \n",
250 | "3 2.30 3.87 2.13 2.91 \n",
251 | "4 0.84 1.25 0.93 1.56 \n",
252 | "\n",
253 | " Banana (1kg) Onion (1kg) \\\n",
254 | "0 0.89 0.48 \n",
255 | "1 1.91 0.62 \n",
256 | "2 1.78 0.58 \n",
257 | "3 1.61 1.25 \n",
258 | "4 1.37 0.59 \n",
259 | "\n",
260 | " Beef Round (1kg) (or Equivalent Back Leg Red Meat) \\\n",
261 | "0 7.18 \n",
262 | "1 9.73 \n",
263 | "2 8.61 \n",
264 | "3 12.34 \n",
265 | "4 5.37 \n",
266 | "\n",
267 | " Toyota Corolla 1.6l 97kW Comfort (Or Equivalent New Car) \\\n",
268 | "0 19305.29 \n",
269 | "1 20874.72 \n",
270 | "2 20898.83 \n",
271 | "3 24402.77 \n",
272 | "4 17238.13 \n",
273 | "\n",
274 | " Preschool (or Kindergarten), Full Day, Private, Monthly for 1 Child \\\n",
275 | "0 411.83 \n",
276 | "1 282.94 \n",
277 | "2 212.18 \n",
278 | "3 351.60 \n",
279 | "4 210.52 \n",
280 | "\n",
281 | " International Primary School, Yearly for 1 Child \n",
282 | "0 5388.86 \n",
283 | "1 6905.43 \n",
284 | "2 4948.41 \n",
285 | "3 1641.00 \n",
286 | "4 2679.30 \n",
287 | "\n",
288 | "[5 rows x 56 columns]"
289 | ]
290 | },
291 | "execution_count": 32,
292 | "metadata": {},
293 | "output_type": "execute_result"
294 | }
295 | ],
296 | "source": [
297 | "# Code to start you off and manipulate the data. .T is transpose - swap columns and rows\n",
298 | "import pandas as pd\n",
299 | "\n",
300 | "df = pd.read_csv(\"cost-of-living.csv\", index_col=0).T.reset_index()\n",
301 | "df.head()"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "## Rename column"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": 1,
314 | "metadata": {
315 | "ExecuteTime": {
316 | "end_time": "2020-02-03T02:16:15.578519Z",
317 | "start_time": "2020-02-03T02:16:15.574529Z"
318 | }
319 | },
320 | "outputs": [],
321 | "source": [
322 | "# your code here"
323 | ]
324 | },
325 | {
326 | "cell_type": "markdown",
327 | "metadata": {},
328 | "source": [
329 | "## Get city and country"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 2,
335 | "metadata": {
336 | "ExecuteTime": {
337 | "end_time": "2020-02-03T02:16:18.088160Z",
338 | "start_time": "2020-02-03T02:16:18.084161Z"
339 | }
340 | },
341 | "outputs": [],
342 | "source": [
343 | "# your code here"
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": 3,
349 | "metadata": {
350 | "ExecuteTime": {
351 | "end_time": "2020-02-03T02:16:46.755343Z",
352 | "start_time": "2020-02-03T02:16:46.752351Z"
353 | }
354 | },
355 | "outputs": [],
356 | "source": [
357 | "# And - if needed - correct for the US including states and nowhere else doing it"
358 | ]
359 | },
360 | {
361 | "cell_type": "markdown",
362 | "metadata": {},
363 | "source": [
364 | "## Figure out which country has the most cities"
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": 4,
370 | "metadata": {
371 | "ExecuteTime": {
372 | "end_time": "2020-02-03T02:16:50.046784Z",
373 | "start_time": "2020-02-03T02:16:50.042796Z"
374 | }
375 | },
376 | "outputs": [],
377 | "source": [
378 | "# your code here"
379 | ]
380 | },
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {},
384 | "source": [
385 | "## Create a subset of only that country"
386 | ]
387 | },
388 | {
389 | "cell_type": "code",
390 | "execution_count": 5,
391 | "metadata": {
392 | "ExecuteTime": {
393 | "end_time": "2020-02-03T02:16:54.606541Z",
394 | "start_time": "2020-02-03T02:16:54.602530Z"
395 | }
396 | },
397 | "outputs": [],
398 | "source": [
399 | "# your code here"
400 | ]
401 | },
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {},
405 | "source": [
406 | "## Sort by housing accommodation"
407 | ]
408 | },
409 | {
410 | "cell_type": "code",
411 | "execution_count": 8,
412 | "metadata": {
413 | "ExecuteTime": {
414 | "end_time": "2020-02-03T02:17:07.409143Z",
415 | "start_time": "2020-02-03T02:17:07.406151Z"
416 | }
417 | },
418 | "outputs": [],
419 | "source": [
420 | "col = \"Apartment (1 bedroom) in City Centre\"\n",
421 | "# your code here"
422 | ]
423 | },
424 | {
425 | "cell_type": "markdown",
426 | "metadata": {},
427 | "source": [
428 | "## Despair over the cost of housing"
429 | ]
430 | }
431 | ],
432 | "metadata": {
433 | "kernelspec": {
434 | "display_name": "Python 3",
435 | "language": "python",
436 | "name": "python3"
437 | },
438 | "language_info": {
439 | "codemirror_mode": {
440 | "name": "ipython",
441 | "version": 3
442 | },
443 | "file_extension": ".py",
444 | "mimetype": "text/x-python",
445 | "name": "python",
446 | "nbconvert_exporter": "python",
447 | "pygments_lexer": "ipython3",
448 | "version": "3.7.3"
449 | }
450 | },
451 | "nbformat": 4,
452 | "nbformat_minor": 2
453 | }
454 |
--------------------------------------------------------------------------------
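The notebook above only ships `# your code here` stubs, so here is a minimal, hedged sketch of one way to work through its objectives. It assumes `cost-of-living.csv` is laid out as in the starter cell, and that US rows carry an extra state token (e.g. "New York, NY, United States") while other locations are just "City, Country" - that second point is an assumption, not something the notebook confirms.

```python
# Sketch only: objectives from dataMaipulation/Questions.ipynb, with an assumed file layout.
import pandas as pd

df = pd.read_csv("cost-of-living.csv", index_col=0).T.reset_index()

# 1. Rename the "index" column to "location"
df = df.rename(columns={"index": "location"})

# 2./3. Use apply to derive city and country. Keeping the first and last
# comma-separated pieces also copes with assumed US rows like
# "New York, NY, United States", where the middle token is a state.
def split_location(location):
    parts = [part.strip() for part in location.split(",")]
    return pd.Series({"city": parts[0], "country": parts[-1]})

df[["city", "country"]] = df["location"].apply(split_location)

# 4. Country with the most cities listed, and a subset of only that country
top_country = df["country"].value_counts().idxmax()
subset = df[df["country"] == top_country]

# 5. Sort that subset by one-bedroom city-centre rent
col = "Apartment (1 bedroom) in City Centre"
print(subset.sort_values(col, ascending=False)[["city", col]].head())
```

Taking the first and last comma-separated pieces sidesteps a special case for the US entirely; an explicit state-stripping function, as objective 3 hints at, works just as well.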
/practiceResource/Answers.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Optional Exercise - Data loading\n",
8 | "\n",
9 | "Heres a very short example for four files to load in. None of these example should be hard, but the idea is to familiarise yourself with the sorts of error messages or weird output you'd get if you load in with the defaults (and they don't work), and then how to change your input arguments to make it better.\n",
10 | "\n",
11 | "The files to attempt to load in are:\n",
12 | "\n",
13 | "1. example.pkl\n",
14 | "2. example_1.csv\n",
15 | "3. example_2.csv\n",
16 | "3. example_3.csv\n",
17 | "3. example_4.csv"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 2,
23 | "metadata": {
24 | "ExecuteTime": {
25 | "end_time": "2020-02-02T07:24:53.012402Z",
26 | "start_time": "2020-02-02T07:24:52.651384Z"
27 | }
28 | },
29 | "outputs": [],
30 | "source": [
31 | "import pandas as pd"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 4,
37 | "metadata": {
38 | "ExecuteTime": {
39 | "end_time": "2020-02-02T07:24:57.184281Z",
40 | "start_time": "2020-02-02T07:24:57.169320Z"
41 | }
42 | },
43 | "outputs": [
44 | {
45 | "data": {
46 | "text/html": [
47 | "\n",
48 | "\n",
61 | "
\n",
62 | " \n",
63 | " \n",
64 | " | \n",
65 | " name | \n",
66 | " gender | \n",
67 | " age | \n",
68 | " oaths | \n",
69 | "
\n",
70 | " \n",
71 | " \n",
72 | " \n",
73 | " | 0 | \n",
74 | " Kaladin | \n",
75 | " Male | \n",
76 | " 20.0 | \n",
77 | " 3.0 | \n",
78 | "
\n",
79 | " \n",
80 | " | 1 | \n",
81 | " Shallan | \n",
82 | " Female | \n",
83 | " 17.0 | \n",
84 | " 2.0 | \n",
85 | "
\n",
86 | " \n",
87 | " | 2 | \n",
88 | " Dalinar | \n",
89 | " Male | \n",
90 | " 53.0 | \n",
91 | " 3.0 | \n",
92 | "
\n",
93 | " \n",
94 | " | 3 | \n",
95 | " Szeth | \n",
96 | " Male | \n",
97 | " 35.0 | \n",
98 | " 0.0 | \n",
99 | "
\n",
100 | " \n",
101 | " | 4 | \n",
102 | " Hoid | \n",
103 | " Male | \n",
104 | " NaN | \n",
105 | " NaN | \n",
106 | "
\n",
107 | " \n",
108 | " | 5 | \n",
109 | " Jashnah | \n",
110 | " Female | \n",
111 | " 34.0 | \n",
112 | " 3.0 | \n",
113 | "
\n",
114 | " \n",
115 | "
\n",
116 | "
"
117 | ],
118 | "text/plain": [
119 | " name gender age oaths\n",
120 | "0 Kaladin Male 20.0 3.0\n",
121 | "1 Shallan Female 17.0 2.0\n",
122 | "2 Dalinar Male 53.0 3.0\n",
123 | "3 Szeth Male 35.0 0.0\n",
124 | "4 Hoid Male NaN NaN\n",
125 | "5 Jashnah Female 34.0 3.0"
126 | ]
127 | },
128 | "execution_count": 4,
129 | "metadata": {},
130 | "output_type": "execute_result"
131 | }
132 | ],
133 | "source": [
134 | "pd.read_pickle(\"example.pkl\")"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 6,
140 | "metadata": {
141 | "ExecuteTime": {
142 | "end_time": "2020-02-02T07:34:57.663844Z",
143 | "start_time": "2020-02-02T07:34:57.648884Z"
144 | }
145 | },
146 | "outputs": [
147 | {
148 | "data": {
149 | "text/html": [
150 | "\n",
151 | "\n",
164 | "
\n",
165 | " \n",
166 | " \n",
167 | " | \n",
168 | " name | \n",
169 | " gender | \n",
170 | " age | \n",
171 | " oaths | \n",
172 | "
\n",
173 | " \n",
174 | " \n",
175 | " \n",
176 | " | 0 | \n",
177 | " Kaladin | \n",
178 | " Male | \n",
179 | " 20.0 | \n",
180 | " 3.0 | \n",
181 | "
\n",
182 | " \n",
183 | " | 1 | \n",
184 | " Shallan | \n",
185 | " Female | \n",
186 | " 17.0 | \n",
187 | " 2.0 | \n",
188 | "
\n",
189 | " \n",
190 | " | 2 | \n",
191 | " Dalinar | \n",
192 | " Male | \n",
193 | " 53.0 | \n",
194 | " 3.0 | \n",
195 | "
\n",
196 | " \n",
197 | " | 3 | \n",
198 | " Szeth | \n",
199 | " Male | \n",
200 | " 35.0 | \n",
201 | " 0.0 | \n",
202 | "
\n",
203 | " \n",
204 | " | 4 | \n",
205 | " Hoid | \n",
206 | " Male | \n",
207 | " NaN | \n",
208 | " NaN | \n",
209 | "
\n",
210 | " \n",
211 | " | 5 | \n",
212 | " Jashnah | \n",
213 | " Female | \n",
214 | " 34.0 | \n",
215 | " 3.0 | \n",
216 | "
\n",
217 | " \n",
218 | "
\n",
219 | "
"
220 | ],
221 | "text/plain": [
222 | " name gender age oaths\n",
223 | "0 Kaladin Male 20.0 3.0\n",
224 | "1 Shallan Female 17.0 2.0\n",
225 | "2 Dalinar Male 53.0 3.0\n",
226 | "3 Szeth Male 35.0 0.0\n",
227 | "4 Hoid Male NaN NaN\n",
228 | "5 Jashnah Female 34.0 3.0"
229 | ]
230 | },
231 | "execution_count": 6,
232 | "metadata": {},
233 | "output_type": "execute_result"
234 | }
235 | ],
236 | "source": [
237 | "pd.read_csv(\"example_1.csv\")"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 8,
243 | "metadata": {
244 | "ExecuteTime": {
245 | "end_time": "2020-02-02T07:35:19.179516Z",
246 | "start_time": "2020-02-02T07:35:19.168566Z"
247 | }
248 | },
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/html": [
253 | "\n",
254 | "\n",
267 | "
\n",
268 | " \n",
269 | " \n",
270 | " | \n",
271 | " name | \n",
272 | " gender | \n",
273 | " age | \n",
274 | " oaths | \n",
275 | "
\n",
276 | " \n",
277 | " \n",
278 | " \n",
279 | " | 0 | \n",
280 | " Kaladin | \n",
281 | " Male | \n",
282 | " 20.0 | \n",
283 | " 3.0 | \n",
284 | "
\n",
285 | " \n",
286 | " | 1 | \n",
287 | " Shallan | \n",
288 | " Female | \n",
289 | " 17.0 | \n",
290 | " 2.0 | \n",
291 | "
\n",
292 | " \n",
293 | " | 2 | \n",
294 | " Dalinar | \n",
295 | " Male | \n",
296 | " 53.0 | \n",
297 | " 3.0 | \n",
298 | "
\n",
299 | " \n",
300 | " | 3 | \n",
301 | " Szeth | \n",
302 | " Male | \n",
303 | " 35.0 | \n",
304 | " 0.0 | \n",
305 | "
\n",
306 | " \n",
307 | " | 4 | \n",
308 | " Hoid | \n",
309 | " Male | \n",
310 | " NaN | \n",
311 | " NaN | \n",
312 | "
\n",
313 | " \n",
314 | " | 5 | \n",
315 | " Jashnah | \n",
316 | " Female | \n",
317 | " 34.0 | \n",
318 | " 3.0 | \n",
319 | "
\n",
320 | " \n",
321 | "
\n",
322 | "
"
323 | ],
324 | "text/plain": [
325 | " name gender age oaths\n",
326 | "0 Kaladin Male 20.0 3.0\n",
327 | "1 Shallan Female 17.0 2.0\n",
328 | "2 Dalinar Male 53.0 3.0\n",
329 | "3 Szeth Male 35.0 0.0\n",
330 | "4 Hoid Male NaN NaN\n",
331 | "5 Jashnah Female 34.0 3.0"
332 | ]
333 | },
334 | "execution_count": 8,
335 | "metadata": {},
336 | "output_type": "execute_result"
337 | }
338 | ],
339 | "source": [
340 | "pd.read_csv(\"example_2.csv\", delim_whitespace=True)"
341 | ]
342 | },
343 | {
344 | "cell_type": "code",
345 | "execution_count": 11,
346 | "metadata": {
347 | "ExecuteTime": {
348 | "end_time": "2020-02-02T07:37:11.470254Z",
349 | "start_time": "2020-02-02T07:37:11.455294Z"
350 | }
351 | },
352 | "outputs": [
353 | {
354 | "data": {
355 | "text/html": [
356 | "\n",
357 | "\n",
370 | "
\n",
371 | " \n",
372 | " \n",
373 | " | \n",
374 | " name | \n",
375 | " gender | \n",
376 | " age | \n",
377 | " oaths | \n",
378 | "
\n",
379 | " \n",
380 | " \n",
381 | " \n",
382 | " | 0 | \n",
383 | " Kaladin | \n",
384 | " Male | \n",
385 | " 20.0 | \n",
386 | " 3.0 | \n",
387 | "
\n",
388 | " \n",
389 | " | 1 | \n",
390 | " Shallan | \n",
391 | " Female | \n",
392 | " 17.0 | \n",
393 | " 2.0 | \n",
394 | "
\n",
395 | " \n",
396 | " | 2 | \n",
397 | " Dalinar | \n",
398 | " Male | \n",
399 | " 53.0 | \n",
400 | " 3.0 | \n",
401 | "
\n",
402 | " \n",
403 | " | 3 | \n",
404 | " Szeth | \n",
405 | " Male | \n",
406 | " 35.0 | \n",
407 | " 0.0 | \n",
408 | "
\n",
409 | " \n",
410 | " | 4 | \n",
411 | " Hoid | \n",
412 | " Male | \n",
413 | " NaN | \n",
414 | " NaN | \n",
415 | "
\n",
416 | " \n",
417 | " | 5 | \n",
418 | " Jashnah | \n",
419 | " Female | \n",
420 | " 34.0 | \n",
421 | " 3.0 | \n",
422 | "
\n",
423 | " \n",
424 | "
\n",
425 | "
"
426 | ],
427 | "text/plain": [
428 | " name gender age oaths\n",
429 | "0 Kaladin Male 20.0 3.0\n",
430 | "1 Shallan Female 17.0 2.0\n",
431 | "2 Dalinar Male 53.0 3.0\n",
432 | "3 Szeth Male 35.0 0.0\n",
433 | "4 Hoid Male NaN NaN\n",
434 | "5 Jashnah Female 34.0 3.0"
435 | ]
436 | },
437 | "execution_count": 11,
438 | "metadata": {},
439 | "output_type": "execute_result"
440 | }
441 | ],
442 | "source": [
443 | "pd.read_csv(\"example_3.csv\", index_col=0)"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": 14,
449 | "metadata": {
450 | "ExecuteTime": {
451 | "end_time": "2020-02-02T07:37:31.812465Z",
452 | "start_time": "2020-02-02T07:37:31.801514Z"
453 | }
454 | },
455 | "outputs": [
456 | {
457 | "data": {
458 | "text/html": [
459 | "\n",
460 | "\n",
473 | "
\n",
474 | " \n",
475 | " \n",
476 | " | \n",
477 | " name | \n",
478 | " gender | \n",
479 | " age | \n",
480 | " oaths | \n",
481 | "
\n",
482 | " \n",
483 | " \n",
484 | " \n",
485 | " | 0 | \n",
486 | " Kaladin | \n",
487 | " Male | \n",
488 | " 20.0 | \n",
489 | " 3.0 | \n",
490 | "
\n",
491 | " \n",
492 | " | 1 | \n",
493 | " Shallan | \n",
494 | " Female | \n",
495 | " 17.0 | \n",
496 | " 2.0 | \n",
497 | "
\n",
498 | " \n",
499 | " | 2 | \n",
500 | " Dalinar | \n",
501 | " Male | \n",
502 | " 53.0 | \n",
503 | " 3.0 | \n",
504 | "
\n",
505 | " \n",
506 | " | 3 | \n",
507 | " Szeth | \n",
508 | " Male | \n",
509 | " 35.0 | \n",
510 | " 0.0 | \n",
511 | "
\n",
512 | " \n",
513 | " | 4 | \n",
514 | " Hoid | \n",
515 | " Male | \n",
516 | " NaN | \n",
517 | " NaN | \n",
518 | "
\n",
519 | " \n",
520 | " | 5 | \n",
521 | " Jashnah | \n",
522 | " Female | \n",
523 | " 34.0 | \n",
524 | " 3.0 | \n",
525 | "
\n",
526 | " \n",
527 | "
\n",
528 | "
"
529 | ],
530 | "text/plain": [
531 | " name gender age oaths\n",
532 | "0 Kaladin Male 20.0 3.0\n",
533 | "1 Shallan Female 17.0 2.0\n",
534 | "2 Dalinar Male 53.0 3.0\n",
535 | "3 Szeth Male 35.0 0.0\n",
536 | "4 Hoid Male NaN NaN\n",
537 | "5 Jashnah Female 34.0 3.0"
538 | ]
539 | },
540 | "execution_count": 14,
541 | "metadata": {},
542 | "output_type": "execute_result"
543 | }
544 | ],
545 | "source": [
546 | "pd.read_csv(\"example_4.csv\", sep=\"|\", comment=\"#\")"
547 | ]
548 | }
549 | ],
550 | "metadata": {
551 | "kernelspec": {
552 | "display_name": "Python 3",
553 | "language": "python",
554 | "name": "python3"
555 | },
556 | "language_info": {
557 | "codemirror_mode": {
558 | "name": "ipython",
559 | "version": 3
560 | },
561 | "file_extension": ".py",
562 | "mimetype": "text/x-python",
563 | "name": "python",
564 | "nbconvert_exporter": "python",
565 | "pygments_lexer": "ipython3",
566 | "version": "3.7.3"
567 | }
568 | },
569 | "nbformat": 4,
570 | "nbformat_minor": 2
571 | }
572 |
--------------------------------------------------------------------------------
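To actually see the "weird output" the exercise above talks about, it can help to load each file with the defaults first and compare against the corrected calls in the answers. A small sketch, assuming the example CSVs sit next to the notebook exactly as in the cells above:

```python
# Compare default loads against the corrected arguments used in the answers.
import pandas as pd

for name in ["example_2.csv", "example_3.csv", "example_4.csv"]:
    print(f"--- {name} with defaults ---")
    # The whitespace- and pipe-separated files collapse into a single column,
    # and example_3.csv grows a spare "Unnamed: 0" index column.
    print(pd.read_csv(name).head(3))

print("--- example_4.csv with sep='|' and comment='#' ---")
print(pd.read_csv("example_4.csv", sep="|", comment="#").head(3))
```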
/practiceResource/SavingAndSerialising.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 4,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.069474 | \n",
66 | " 0.016839 | \n",
67 | " 0.607693 | \n",
68 | " 0.960414 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.755562 | \n",
73 | " 0.792302 | \n",
74 | " 0.638826 | \n",
75 | " 0.257696 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.766277 | \n",
80 | " 0.049024 | \n",
81 | " 0.264378 | \n",
82 | " 0.898995 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.263386 | \n",
87 | " 0.188590 | \n",
88 | " 0.977028 | \n",
89 | " 0.101986 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.052184 | \n",
94 | " 0.381186 | \n",
95 | " 0.655244 | \n",
96 | " 0.827316 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.069474 0.016839 0.607693 0.960414\n",
105 | "1 0.755562 0.792302 0.638826 0.257696\n",
106 | "2 0.766277 0.049024 0.264378 0.898995\n",
107 | "3 0.263386 0.188590 0.977028 0.101986\n",
108 | "4 0.052184 0.381186 0.655244 0.827316"
109 | ]
110 | },
111 | "execution_count": 4,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 6,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 7,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 14,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 14,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 15,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 16,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 17,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 18,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 32,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 22,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 23,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "df.to_feather(\"save.fth\")"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 24,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-16T03:15:50.574863Z",
520 | "start_time": "2020-02-16T03:15:50.557141Z"
521 | }
522 | },
523 | "outputs": [],
524 | "source": [
525 | "pd.read_feather(\"save.fth\");"
526 | ]
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": 34,
531 | "metadata": {
532 | "ExecuteTime": {
533 | "end_time": "2020-02-16T03:20:03.082982Z",
534 | "start_time": "2020-02-16T03:20:03.062532Z"
535 | }
536 | },
537 | "outputs": [
538 | {
539 | "name": "stdout",
540 | "output_type": "stream",
541 | "text": [
542 | " Volume in drive C is System\n",
543 | " Volume Serial Number is 48F0-A822\n",
544 | "\n",
545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. Notebooks and Datasets\\2_Data\\Lectures\n",
546 | "\n",
547 | "16/02/2020 01:18 PM .\n",
548 | "16/02/2020 01:18 PM ..\n",
549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n",
550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n",
551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n",
552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n",
553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n",
554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n",
555 | "01/10/2019 08:15 PM 11,328 heart.csv\n",
556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n",
557 | "16/02/2020 01:14 PM 87,030 save.csv\n",
558 | "16/02/2020 01:15 PM 107,240 save.fth\n",
559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n",
560 | "16/02/2020 01:15 PM 90,693 save.pkl\n",
561 | " 11 File(s) 4,619,659 bytes\n",
562 | " 3 Dir(s) 244,606,853,120 bytes free\n"
563 | ]
564 | }
565 | ],
566 | "source": [
567 | "%ls"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {},
573 | "source": [
574 | "### Recap\n",
575 | "\n",
576 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
577 | "\n",
578 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
579 | ]
580 | }
581 | ],
582 | "metadata": {
583 | "kernelspec": {
584 | "display_name": "Python 3",
585 | "language": "python",
586 | "name": "python3"
587 | },
588 | "language_info": {
589 | "codemirror_mode": {
590 | "name": "ipython",
591 | "version": 3
592 | },
593 | "file_extension": ".py",
594 | "mimetype": "text/x-python",
595 | "name": "python",
596 | "nbconvert_exporter": "python",
597 | "pygments_lexer": "ipython3",
598 | "version": "3.7.3"
599 | }
600 | },
601 | "nbformat": 4,
602 | "nbformat_minor": 2
603 | }
604 |
--------------------------------------------------------------------------------
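If you don't have the ExecuteTime extension linked in the notebook above, you can still get rough save/load timings with the standard library. A minimal sketch, assuming the same optional dependencies (`tables` for HDF5, `feather-format` for Feather) are installed:

```python
# Rough timing comparison of the formats used above, without any Jupyter extension.
import time

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=["A", "B", "C", "D"])

def timed(label, fn):
    # Run one save or load call and print how long it took.
    start = time.perf_counter()
    fn()
    print(f"{label:<12} {time.perf_counter() - start:.3f}s")

timed("to_csv", lambda: df.to_csv("save.csv", index=False))
timed("read_csv", lambda: pd.read_csv("save.csv"))
timed("to_pickle", lambda: df.to_pickle("save.pkl"))
timed("read_pickle", lambda: pd.read_pickle("save.pkl"))
timed("to_hdf", lambda: df.to_hdf("save.hdf", key="data", format="table"))
timed("read_hdf", lambda: pd.read_hdf("save.hdf"))
timed("to_feather", lambda: df.to_feather("save.fth"))
timed("read_feather", lambda: pd.read_feather("save.fth"))
```

On a numeric-only frame like this, the binary formats should come out well ahead of CSV, which matches the recap's point about pickle and HDF5.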
/practiceResource/.ipynb_checkpoints/dataSavingAndSerialising-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 2,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.863149 | \n",
66 | " 0.314732 | \n",
67 | " 0.669747 | \n",
68 | " 0.702656 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.546542 | \n",
73 | " 0.563607 | \n",
74 | " 0.780532 | \n",
75 | " 0.312281 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.024058 | \n",
80 | " 0.473108 | \n",
81 | " 0.447980 | \n",
82 | " 0.811878 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.888702 | \n",
87 | " 0.392524 | \n",
88 | " 0.830159 | \n",
89 | " 0.452014 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.266793 | \n",
94 | " 0.449780 | \n",
95 | " 0.589546 | \n",
96 | " 0.882689 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.863149 0.314732 0.669747 0.702656\n",
105 | "1 0.546542 0.563607 0.780532 0.312281\n",
106 | "2 0.024058 0.473108 0.447980 0.811878\n",
107 | "3 0.888702 0.392524 0.830159 0.452014\n",
108 | "4 0.266793 0.449780 0.589546 0.882689"
109 | ]
110 | },
111 | "execution_count": 2,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 3,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 4,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 14,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 14,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 15,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 16,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 17,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 18,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 32,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 22,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 23,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "df.to_feather(\"save.fth\")"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 24,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-16T03:15:50.574863Z",
520 | "start_time": "2020-02-16T03:15:50.557141Z"
521 | }
522 | },
523 | "outputs": [],
524 | "source": [
525 | "pd.read_feather(\"save.fth\");"
526 | ]
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": 34,
531 | "metadata": {
532 | "ExecuteTime": {
533 | "end_time": "2020-02-16T03:20:03.082982Z",
534 | "start_time": "2020-02-16T03:20:03.062532Z"
535 | }
536 | },
537 | "outputs": [
538 | {
539 | "name": "stdout",
540 | "output_type": "stream",
541 | "text": [
542 | " Volume in drive C is System\n",
543 | " Volume Serial Number is 48F0-A822\n",
544 | "\n",
545 | " Directory of C:\\Users\\shint1\\Google Drive\\SDS\\DataManip\\2. Notebooks and Datasets\\2_Data\\Lectures\n",
546 | "\n",
547 | "16/02/2020 01:18 PM .\n",
548 | "16/02/2020 01:18 PM ..\n",
549 | "16/02/2020 12:55 PM .ipynb_checkpoints\n",
550 | "14/02/2020 10:50 PM 38,725 1_Loading.ipynb\n",
551 | "14/02/2020 11:32 PM 32,118 2_NumpyVPandas.ipynb\n",
552 | "16/02/2020 12:54 PM 9,216 3_CreatingDataFrames.ipynb\n",
553 | "16/02/2020 01:18 PM 18,019 4_SavingAndSerialising.ipynb\n",
554 | "20/09/2019 10:04 AM 81,593 astronauts.csv\n",
555 | "01/10/2019 08:15 PM 11,328 heart.csv\n",
556 | "18/01/2020 01:19 PM 35,216 heart.pkl\n",
557 | "16/02/2020 01:14 PM 87,030 save.csv\n",
558 | "16/02/2020 01:15 PM 107,240 save.fth\n",
559 | "16/02/2020 01:19 PM 4,108,481 save.hdf\n",
560 | "16/02/2020 01:15 PM 90,693 save.pkl\n",
561 | " 11 File(s) 4,619,659 bytes\n",
562 | " 3 Dir(s) 244,606,853,120 bytes free\n"
563 | ]
564 | }
565 | ],
566 | "source": [
567 | "%ls"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {},
573 | "source": [
574 | "### Recap\n",
575 | "\n",
576 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
577 | "\n",
578 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
579 | ]
580 | }
581 | ],
582 | "metadata": {
583 | "kernelspec": {
584 | "display_name": "Python 3",
585 | "language": "python",
586 | "name": "python3"
587 | },
588 | "language_info": {
589 | "codemirror_mode": {
590 | "name": "ipython",
591 | "version": 3
592 | },
593 | "file_extension": ".py",
594 | "mimetype": "text/x-python",
595 | "name": "python",
596 | "nbconvert_exporter": "python",
597 | "pygments_lexer": "ipython3",
598 | "version": "3.7.4"
599 | }
600 | },
601 | "nbformat": 4,
602 | "nbformat_minor": 2
603 | }
604 |
--------------------------------------------------------------------------------
/practiceResource/dataloading.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Loading Datasets\n",
8 | "\n",
9 | "We'll be using the Kaggle Heart Disease UCI dataset as an example. You can find it here: https://www.kaggle.com/ronitf/heart-disease-uci\n",
10 | "\n",
11 | "* Manual loading (last resort)\n",
12 | "* `np.loadtxt`\n",
13 | "* `np.genfromtxt`\n",
14 | "* `pd.read_csv`\n",
15 | "* `pd.read*`\n",
16 | "* `pickle`"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {
23 | "ExecuteTime": {
24 | "end_time": "2020-02-14T12:39:12.306606Z",
25 | "start_time": "2020-02-14T12:39:12.302988Z"
26 | }
27 | },
28 | "outputs": [],
29 | "source": [
30 | "import numpy as np\n",
31 | "import pandas as pd\n",
32 | "import pickle\n",
33 | "\n",
34 | "filename = \"heart.csv\""
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "## The best method - panda's read_csv\n",
42 | "Handles the most edge cases, datetime and file issues best."
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {
49 | "ExecuteTime": {
50 | "end_time": "2020-02-14T12:39:55.204452Z",
51 | "start_time": "2020-02-14T12:39:55.185019Z"
52 | }
53 | },
54 | "outputs": [
55 | {
56 | "data": {
57 | "text/html": [
58 | "\n",
59 | "\n",
72 | "
\n",
73 | " \n",
74 | " \n",
75 | " | \n",
76 | " age | \n",
77 | " sex | \n",
78 | " cp | \n",
79 | " trestbps | \n",
80 | " chol | \n",
81 | " fbs | \n",
82 | " restecg | \n",
83 | " thalach | \n",
84 | " exang | \n",
85 | " oldpeak | \n",
86 | " slope | \n",
87 | " ca | \n",
88 | " thal | \n",
89 | " target | \n",
90 | "
\n",
91 | " \n",
92 | " \n",
93 | " \n",
94 | " | 0 | \n",
95 | " 63 | \n",
96 | " 1 | \n",
97 | " 3 | \n",
98 | " 145 | \n",
99 | " 233 | \n",
100 | " 1 | \n",
101 | " 0 | \n",
102 | " 150 | \n",
103 | " 0 | \n",
104 | " 2.3 | \n",
105 | " 0 | \n",
106 | " 0 | \n",
107 | " 1 | \n",
108 | " 1 | \n",
109 | "
\n",
110 | " \n",
111 | " | 1 | \n",
112 | " 37 | \n",
113 | " 1 | \n",
114 | " 2 | \n",
115 | " 130 | \n",
116 | " 250 | \n",
117 | " 0 | \n",
118 | " 1 | \n",
119 | " 187 | \n",
120 | " 0 | \n",
121 | " 3.5 | \n",
122 | " 0 | \n",
123 | " 0 | \n",
124 | " 2 | \n",
125 | " 1 | \n",
126 | "
\n",
127 | " \n",
128 | " | 2 | \n",
129 | " 41 | \n",
130 | " 0 | \n",
131 | " 1 | \n",
132 | " 130 | \n",
133 | " 204 | \n",
134 | " 0 | \n",
135 | " 0 | \n",
136 | " 172 | \n",
137 | " 0 | \n",
138 | " 1.4 | \n",
139 | " 2 | \n",
140 | " 0 | \n",
141 | " 2 | \n",
142 | " 1 | \n",
143 | "
\n",
144 | " \n",
145 | " | 3 | \n",
146 | " 56 | \n",
147 | " 1 | \n",
148 | " 1 | \n",
149 | " 120 | \n",
150 | " 236 | \n",
151 | " 0 | \n",
152 | " 1 | \n",
153 | " 178 | \n",
154 | " 0 | \n",
155 | " 0.8 | \n",
156 | " 2 | \n",
157 | " 0 | \n",
158 | " 2 | \n",
159 | " 1 | \n",
160 | "
\n",
161 | " \n",
162 | " | 4 | \n",
163 | " 57 | \n",
164 | " 0 | \n",
165 | " 0 | \n",
166 | " 120 | \n",
167 | " 354 | \n",
168 | " 0 | \n",
169 | " 1 | \n",
170 | " 163 | \n",
171 | " 1 | \n",
172 | " 0.6 | \n",
173 | " 2 | \n",
174 | " 0 | \n",
175 | " 2 | \n",
176 | " 1 | \n",
177 | "
\n",
178 | " \n",
179 | "
\n",
180 | "
"
181 | ],
182 | "text/plain": [
183 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n",
184 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n",
185 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n",
186 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n",
187 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n",
188 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n",
189 | "\n",
190 | " ca thal target \n",
191 | "0 0 1 1 \n",
192 | "1 0 2 1 \n",
193 | "2 0 2 1 \n",
194 | "3 0 2 1 \n",
195 | "4 0 2 1 "
196 | ]
197 | },
198 | "execution_count": 2,
199 | "metadata": {},
200 | "output_type": "execute_result"
201 | }
202 | ],
203 | "source": [
204 | "df = pd.read_csv(filename)\n",
205 | "df.head()"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "## Using numpy's loadtxt and genfromtxt\n",
213 | "\n",
214 | "If you must. Notice it fails without extra arguments - its not as smart and we have to tell it what to do. Designed for loading in data saved using `np.savetxt`, not meant to be a robust loader."
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 5,
220 | "metadata": {
221 | "ExecuteTime": {
222 | "end_time": "2020-02-14T12:41:25.154199Z",
223 | "start_time": "2020-02-14T12:41:25.144188Z"
224 | }
225 | },
226 | "outputs": [
227 | {
228 | "name": "stdout",
229 | "output_type": "stream",
230 | "text": [
231 | "[[63. 1. 3. ... 0. 1. 1.]\n",
232 | " [37. 1. 2. ... 0. 2. 1.]\n",
233 | " [41. 0. 1. ... 0. 2. 1.]\n",
234 | " ...\n",
235 | " [68. 1. 0. ... 2. 3. 0.]\n",
236 | " [57. 1. 0. ... 1. 3. 0.]\n",
237 | " [57. 0. 1. ... 1. 2. 0.]]\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "data = np.loadtxt(filename, delimiter=\",\", skiprows=1)\n",
243 | "print(data)"
244 | ]
245 | },
246 | {
247 | "cell_type": "code",
248 | "execution_count": 7,
249 | "metadata": {
250 | "ExecuteTime": {
251 | "end_time": "2020-02-14T12:43:04.186497Z",
252 | "start_time": "2020-02-14T12:43:04.159393Z"
253 | }
254 | },
255 | "outputs": [
256 | {
257 | "name": "stdout",
258 | "output_type": "stream",
259 | "text": [
260 | "[(63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1)\n",
261 | " (37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1)\n",
262 | " (41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2, 1)\n",
263 | " (56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2, 1)\n",
264 | " (57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2, 1)\n",
265 | " (57, 1, 0, 140, 192, 0, 1, 148, 0, 0.4, 1, 0, 1, 1)\n",
266 | " (56, 0, 1, 140, 294, 0, 0, 153, 0, 1.3, 1, 0, 2, 1)\n",
267 | " (44, 1, 1, 120, 263, 0, 1, 173, 0, 0. , 2, 0, 3, 1)\n",
268 | " (52, 1, 2, 172, 199, 1, 1, 162, 0, 0.5, 2, 0, 3, 1)\n",
269 | " (57, 1, 2, 150, 168, 0, 1, 174, 0, 1.6, 2, 0, 2, 1)]\n",
270 | "[('age', '\n",
302 | "\n",
315 | "\n",
316 | " \n",
317 | " \n",
318 | " | \n",
319 | " age | \n",
320 | " sex | \n",
321 | " cp | \n",
322 | " trestbps | \n",
323 | " chol | \n",
324 | " fbs | \n",
325 | " restecg | \n",
326 | " thalach | \n",
327 | " exang | \n",
328 | " oldpeak | \n",
329 | " slope | \n",
330 | " ca | \n",
331 | " thal | \n",
332 | " target | \n",
333 | "
\n",
334 | " \n",
335 | " \n",
336 | " \n",
337 | " | 0 | \n",
338 | " 63.0 | \n",
339 | " 1.0 | \n",
340 | " 3.0 | \n",
341 | " 145.0 | \n",
342 | " 233.0 | \n",
343 | " 1.0 | \n",
344 | " 0.0 | \n",
345 | " 150.0 | \n",
346 | " 0.0 | \n",
347 | " 2.3 | \n",
348 | " 0.0 | \n",
349 | " 0.0 | \n",
350 | " 1.0 | \n",
351 | " 1.0 | \n",
352 | "
\n",
353 | " \n",
354 | " | 1 | \n",
355 | " 37.0 | \n",
356 | " 1.0 | \n",
357 | " 2.0 | \n",
358 | " 130.0 | \n",
359 | " 250.0 | \n",
360 | " 0.0 | \n",
361 | " 1.0 | \n",
362 | " 187.0 | \n",
363 | " 0.0 | \n",
364 | " 3.5 | \n",
365 | " 0.0 | \n",
366 | " 0.0 | \n",
367 | " 2.0 | \n",
368 | " 1.0 | \n",
369 | "
\n",
370 | " \n",
371 | " | 2 | \n",
372 | " 41.0 | \n",
373 | " 0.0 | \n",
374 | " 1.0 | \n",
375 | " 130.0 | \n",
376 | " 204.0 | \n",
377 | " 0.0 | \n",
378 | " 0.0 | \n",
379 | " 172.0 | \n",
380 | " 0.0 | \n",
381 | " 1.4 | \n",
382 | " 2.0 | \n",
383 | " 0.0 | \n",
384 | " 2.0 | \n",
385 | " 1.0 | \n",
386 | "
\n",
387 | " \n",
388 | " | 3 | \n",
389 | " 56.0 | \n",
390 | " 1.0 | \n",
391 | " 1.0 | \n",
392 | " 120.0 | \n",
393 | " 236.0 | \n",
394 | " 0.0 | \n",
395 | " 1.0 | \n",
396 | " 178.0 | \n",
397 | " 0.0 | \n",
398 | " 0.8 | \n",
399 | " 2.0 | \n",
400 | " 0.0 | \n",
401 | " 2.0 | \n",
402 | " 1.0 | \n",
403 | "
\n",
404 | " \n",
405 | " | 4 | \n",
406 | " 57.0 | \n",
407 | " 0.0 | \n",
408 | " 0.0 | \n",
409 | " 120.0 | \n",
410 | " 354.0 | \n",
411 | " 0.0 | \n",
412 | " 1.0 | \n",
413 | " 163.0 | \n",
414 | " 1.0 | \n",
415 | " 0.6 | \n",
416 | " 2.0 | \n",
417 | " 0.0 | \n",
418 | " 2.0 | \n",
419 | " 1.0 | \n",
420 | "
\n",
421 | " \n",
422 | "
\n",
423 | ""
424 | ],
425 | "text/plain": [
426 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak \\\n",
427 | "0 63.0 1.0 3.0 145.0 233.0 1.0 0.0 150.0 0.0 2.3 \n",
428 | "1 37.0 1.0 2.0 130.0 250.0 0.0 1.0 187.0 0.0 3.5 \n",
429 | "2 41.0 0.0 1.0 130.0 204.0 0.0 0.0 172.0 0.0 1.4 \n",
430 | "3 56.0 1.0 1.0 120.0 236.0 0.0 1.0 178.0 0.0 0.8 \n",
431 | "4 57.0 0.0 0.0 120.0 354.0 0.0 1.0 163.0 1.0 0.6 \n",
432 | "\n",
433 | " slope ca thal target \n",
434 | "0 0.0 0.0 1.0 1.0 \n",
435 | "1 0.0 0.0 2.0 1.0 \n",
436 | "2 2.0 0.0 2.0 1.0 \n",
437 | "3 2.0 0.0 2.0 1.0 \n",
438 | "4 2.0 0.0 2.0 1.0 "
439 | ]
440 | },
441 | "execution_count": 8,
442 | "metadata": {},
443 | "output_type": "execute_result"
444 | }
445 | ],
446 | "source": [
447 | "def load_file(filename):\n",
448 | " with open(filename, encoding=\"utf-8-sig\") as f:\n",
449 | " data, cols = [], []\n",
450 | " for i, line in enumerate(f.read().splitlines()):\n",
451 | " if i == 0:\n",
452 | " cols += line.split(\",\")\n",
453 | " else:\n",
454 | " data.append([float(x) for x in line.split(\",\")])\n",
455 | " df = pd.DataFrame(data, columns=cols)\n",
456 | " return df\n",
457 | "load_file(filename).head()"
458 | ]
459 | },
460 | {
461 | "cell_type": "markdown",
462 | "metadata": {},
463 | "source": [
464 | "## Pickles!\n",
465 | "Some danger using pickles as encoding changes. Use an industry standard like hd5 instead if you can. Note if you're working with dataframes, dont use python's `pickle`, pandas has their own implementation - `df.to_pickle` and `df.read_pickle`. Underlying algorithm is the same, but less code for you to type, and supports compression."
466 | ]
467 | },
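As a quick illustration of that last point (a sketch, not the notebook's own code - the file name here is made up), pandas pickles a DataFrame in one call and infers compression from the file extension:

```python
import pandas as pd

# to_pickle/read_pickle infer compression from the extension, so ".gz" gives a gzipped pickle
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
df.to_pickle("example_compressed.pkl.gz")
restored = pd.read_pickle("example_compressed.pkl.gz")
print(restored.equals(df))  # True
```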
468 | {
469 | "cell_type": "code",
470 | "execution_count": 10,
471 | "metadata": {
472 | "ExecuteTime": {
473 | "end_time": "2020-02-14T12:48:34.375437Z",
474 | "start_time": "2020-02-14T12:48:34.359410Z"
475 | }
476 | },
477 | "outputs": [
478 | {
479 | "data": {
480 | "text/html": [
481 | "\n",
482 | "\n",
495 | "
\n",
496 | " \n",
497 | " \n",
498 | " | \n",
499 | " age | \n",
500 | " sex | \n",
501 | " cp | \n",
502 | " trestbps | \n",
503 | " chol | \n",
504 | " fbs | \n",
505 | " restecg | \n",
506 | " thalach | \n",
507 | " exang | \n",
508 | " oldpeak | \n",
509 | " slope | \n",
510 | " ca | \n",
511 | " thal | \n",
512 | " target | \n",
513 | "
\n",
514 | " \n",
515 | " \n",
516 | " \n",
517 | " | 0 | \n",
518 | " 63 | \n",
519 | " 1 | \n",
520 | " 3 | \n",
521 | " 145 | \n",
522 | " 233 | \n",
523 | " 1 | \n",
524 | " 0 | \n",
525 | " 150 | \n",
526 | " 0 | \n",
527 | " 2.3 | \n",
528 | " 0 | \n",
529 | " 0 | \n",
530 | " 1 | \n",
531 | " 1 | \n",
532 | "
\n",
533 | " \n",
534 | " | 1 | \n",
535 | " 37 | \n",
536 | " 1 | \n",
537 | " 2 | \n",
538 | " 130 | \n",
539 | " 250 | \n",
540 | " 0 | \n",
541 | " 1 | \n",
542 | " 187 | \n",
543 | " 0 | \n",
544 | " 3.5 | \n",
545 | " 0 | \n",
546 | " 0 | \n",
547 | " 2 | \n",
548 | " 1 | \n",
549 | "
\n",
550 | " \n",
551 | " | 2 | \n",
552 | " 41 | \n",
553 | " 0 | \n",
554 | " 1 | \n",
555 | " 130 | \n",
556 | " 204 | \n",
557 | " 0 | \n",
558 | " 0 | \n",
559 | " 172 | \n",
560 | " 0 | \n",
561 | " 1.4 | \n",
562 | " 2 | \n",
563 | " 0 | \n",
564 | " 2 | \n",
565 | " 1 | \n",
566 | "
\n",
567 | " \n",
568 | " | 3 | \n",
569 | " 56 | \n",
570 | " 1 | \n",
571 | " 1 | \n",
572 | " 120 | \n",
573 | " 236 | \n",
574 | " 0 | \n",
575 | " 1 | \n",
576 | " 178 | \n",
577 | " 0 | \n",
578 | " 0.8 | \n",
579 | " 2 | \n",
580 | " 0 | \n",
581 | " 2 | \n",
582 | " 1 | \n",
583 | "
\n",
584 | " \n",
585 | " | 4 | \n",
586 | " 57 | \n",
587 | " 0 | \n",
588 | " 0 | \n",
589 | " 120 | \n",
590 | " 354 | \n",
591 | " 0 | \n",
592 | " 1 | \n",
593 | " 163 | \n",
594 | " 1 | \n",
595 | " 0.6 | \n",
596 | " 2 | \n",
597 | " 0 | \n",
598 | " 2 | \n",
599 | " 1 | \n",
600 | "
\n",
601 | " \n",
602 | "
\n",
603 | "
"
604 | ],
605 | "text/plain": [
606 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n",
607 | "0 63 1 3 145 233 1 0 150 0 2.3 0 \n",
608 | "1 37 1 2 130 250 0 1 187 0 3.5 0 \n",
609 | "2 41 0 1 130 204 0 0 172 0 1.4 2 \n",
610 | "3 56 1 1 120 236 0 1 178 0 0.8 2 \n",
611 | "4 57 0 0 120 354 0 1 163 1 0.6 2 \n",
612 | "\n",
613 | " ca thal target \n",
614 | "0 0 1 1 \n",
615 | "1 0 2 1 \n",
616 | "2 0 2 1 \n",
617 | "3 0 2 1 \n",
618 | "4 0 2 1 "
619 | ]
620 | },
621 | "execution_count": 10,
622 | "metadata": {},
623 | "output_type": "execute_result"
624 | }
625 | ],
626 | "source": [
627 | "df = pd.read_pickle(\"heart.pkl\")\n",
628 | "df.head()"
629 | ]
630 | },
631 | {
632 | "cell_type": "markdown",
633 | "metadata": {},
634 | "source": [
635 | "### Recap\n",
636 | "\n",
637 | "* Use pd.read_csv 99% of the time\n",
638 | "* Use pd.read_* for other cases (pd.read_excel, pd.read_pickle, etc)\n",
639 | "* If pd cant handle it, I doubt numpy can\n",
640 | "* If you use a manual function, save your data to a sensible format"
641 | ]
642 | }
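To back up the first recap point, here is a minimal sketch (with made-up inline data, not a file from this repository) of the few `read_csv` arguments that deal with most awkward files - a custom delimiter, comment lines, and explicit missing-value markers:

```python
import io

import pandas as pd

# a small awkward "file": comment line, pipe delimiter, and a custom missing-value marker
raw = "# a comment line\nname|score\nalice|1.5\nbob|missing\n"
df = pd.read_csv(io.StringIO(raw), sep="|", comment="#", na_values=["missing"])
print(df)
```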
643 | ],
644 | "metadata": {
645 | "kernelspec": {
646 | "display_name": "Python 3",
647 | "language": "python",
648 | "name": "python3"
649 | },
650 | "language_info": {
651 | "codemirror_mode": {
652 | "name": "ipython",
653 | "version": 3
654 | },
655 | "file_extension": ".py",
656 | "mimetype": "text/x-python",
657 | "name": "python",
658 | "nbconvert_exporter": "python",
659 | "pygments_lexer": "ipython3",
660 | "version": "3.7.4"
661 | }
662 | },
663 | "nbformat": 4,
664 | "nbformat_minor": 2
665 | }
666 |
--------------------------------------------------------------------------------
/practiceResource/dataSavingAndSerialising.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Saving and Serialising a dataframe\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-16T02:55:25.086213Z",
16 | "start_time": "2020-02-16T02:55:23.758762Z"
17 | }
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 2,
28 | "metadata": {
29 | "ExecuteTime": {
30 | "end_time": "2020-02-16T03:01:00.686674Z",
31 | "start_time": "2020-02-16T03:01:00.668178Z"
32 | }
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/html": [
38 | "\n",
39 | "\n",
52 | "
\n",
53 | " \n",
54 | " \n",
55 | " | \n",
56 | " A | \n",
57 | " B | \n",
58 | " C | \n",
59 | " D | \n",
60 | "
\n",
61 | " \n",
62 | " \n",
63 | " \n",
64 | " | 0 | \n",
65 | " 0.863149 | \n",
66 | " 0.314732 | \n",
67 | " 0.669747 | \n",
68 | " 0.702656 | \n",
69 | "
\n",
70 | " \n",
71 | " | 1 | \n",
72 | " 0.546542 | \n",
73 | " 0.563607 | \n",
74 | " 0.780532 | \n",
75 | " 0.312281 | \n",
76 | "
\n",
77 | " \n",
78 | " | 2 | \n",
79 | " 0.024058 | \n",
80 | " 0.473108 | \n",
81 | " 0.447980 | \n",
82 | " 0.811878 | \n",
83 | "
\n",
84 | " \n",
85 | " | 3 | \n",
86 | " 0.888702 | \n",
87 | " 0.392524 | \n",
88 | " 0.830159 | \n",
89 | " 0.452014 | \n",
90 | "
\n",
91 | " \n",
92 | " | 4 | \n",
93 | " 0.266793 | \n",
94 | " 0.449780 | \n",
95 | " 0.589546 | \n",
96 | " 0.882689 | \n",
97 | "
\n",
98 | " \n",
99 | "
\n",
100 | "
"
101 | ],
102 | "text/plain": [
103 | " A B C D\n",
104 | "0 0.863149 0.314732 0.669747 0.702656\n",
105 | "1 0.546542 0.563607 0.780532 0.312281\n",
106 | "2 0.024058 0.473108 0.447980 0.811878\n",
107 | "3 0.888702 0.392524 0.830159 0.452014\n",
108 | "4 0.266793 0.449780 0.589546 0.882689"
109 | ]
110 | },
111 | "execution_count": 2,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | }
115 | ],
116 | "source": [
117 | "# Lets make a new dataframe and save it out using various formats\n",
118 | "df = pd.DataFrame(np.random.random(size=(100000, 4)), columns=[\"A\", \"B\", \"C\", \"D\"])\n",
119 | "df.head()"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 3,
125 | "metadata": {
126 | "ExecuteTime": {
127 | "end_time": "2020-02-16T03:03:34.987813Z",
128 | "start_time": "2020-02-16T03:03:34.219248Z"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.3f\")"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 4,
139 | "metadata": {
140 | "ExecuteTime": {
141 | "end_time": "2020-02-16T03:04:00.092272Z",
142 | "start_time": "2020-02-16T03:04:00.079738Z"
143 | }
144 | },
145 | "outputs": [],
146 | "source": [
147 | "df.to_pickle(\"save.pkl\")"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 8,
153 | "metadata": {
154 | "ExecuteTime": {
155 | "end_time": "2020-02-16T03:06:06.874338Z",
156 | "start_time": "2020-02-16T03:06:05.955905Z"
157 | }
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# pip install tables\n",
162 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 9,
168 | "metadata": {
169 | "ExecuteTime": {
170 | "end_time": "2020-02-16T03:06:56.305779Z",
171 | "start_time": "2020-02-16T03:06:56.204901Z"
172 | }
173 | },
174 | "outputs": [],
175 | "source": [
176 | "# pip install feather-format\n",
177 | "df.to_feather(\"save.fth\")"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {
184 | "ExecuteTime": {
185 | "end_time": "2020-02-16T03:10:46.080056Z",
186 | "start_time": "2020-02-16T03:10:46.075636Z"
187 | }
188 | },
189 | "outputs": [],
190 | "source": [
191 | "# If you want to get the timings you can see in the video, you'll need this extension:\n",
192 | "# https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/execute_time/readme.html"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Now this is a very easy test - its only numeric data. If we add strings and categorical data things can slow down a lot! Let's try this on mixed Astronaut data from Kaggle: https://www.kaggle.com/nasa/astronaut-yearbook"
200 | ]
201 | },
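One way to soften that slowdown (a sketch with made-up data, not part of the notebook) is to convert repetitive string columns to the categorical dtype before saving, which usually shrinks pickle/HDF files and speeds up IO:

```python
import pandas as pd

# repetitive strings compress well as categories: each label is stored once and referenced by integer codes
df = pd.DataFrame({"status": ["Active", "Retired", "Retired", "Active", "Retired"]})
df["status"] = df["status"].astype("category")
print(df.dtypes)  # status is now 'category' rather than 'object'
```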
202 | {
203 | "cell_type": "code",
204 | "execution_count": 7,
205 | "metadata": {
206 | "ExecuteTime": {
207 | "end_time": "2020-02-16T03:14:16.764994Z",
208 | "start_time": "2020-02-16T03:14:16.741456Z"
209 | }
210 | },
211 | "outputs": [
212 | {
213 | "data": {
214 | "text/html": [
215 | "\n",
216 | "\n",
229 | "
\n",
230 | " \n",
231 | " \n",
232 | " | \n",
233 | " Name | \n",
234 | " Year | \n",
235 | " Group | \n",
236 | " Status | \n",
237 | " Birth Date | \n",
238 | " Birth Place | \n",
239 | " Gender | \n",
240 | " Alma Mater | \n",
241 | " Undergraduate Major | \n",
242 | " Graduate Major | \n",
243 | " Military Rank | \n",
244 | " Military Branch | \n",
245 | " Space Flights | \n",
246 | " Space Flight (hr) | \n",
247 | " Space Walks | \n",
248 | " Space Walks (hr) | \n",
249 | " Missions | \n",
250 | " Death Date | \n",
251 | " Death Mission | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " | 0 | \n",
257 | " Joseph M. Acaba | \n",
258 | " 2004.0 | \n",
259 | " 19.0 | \n",
260 | " Active | \n",
261 | " 5/17/1967 | \n",
262 | " Inglewood, CA | \n",
263 | " Male | \n",
264 | " University of California-Santa Barbara; Univer... | \n",
265 | " Geology | \n",
266 | " Geology | \n",
267 | " NaN | \n",
268 | " NaN | \n",
269 | " 2 | \n",
270 | " 3307 | \n",
271 | " 2 | \n",
272 | " 13.0 | \n",
273 | " STS-119 (Discovery), ISS-31/32 (Soyuz) | \n",
274 | " NaN | \n",
275 | " NaN | \n",
276 | "
\n",
277 | " \n",
278 | " | 1 | \n",
279 | " Loren W. Acton | \n",
280 | " NaN | \n",
281 | " NaN | \n",
282 | " Retired | \n",
283 | " 3/7/1936 | \n",
284 | " Lewiston, MT | \n",
285 | " Male | \n",
286 | " Montana State University; University of Colorado | \n",
287 | " Engineering Physics | \n",
288 | " Solar Physics | \n",
289 | " NaN | \n",
290 | " NaN | \n",
291 | " 1 | \n",
292 | " 190 | \n",
293 | " 0 | \n",
294 | " 0.0 | \n",
295 | " STS 51-F (Challenger) | \n",
296 | " NaN | \n",
297 | " NaN | \n",
298 | "
\n",
299 | " \n",
300 | " | 2 | \n",
301 | " James C. Adamson | \n",
302 | " 1984.0 | \n",
303 | " 10.0 | \n",
304 | " Retired | \n",
305 | " 3/3/1946 | \n",
306 | " Warsaw, NY | \n",
307 | " Male | \n",
308 | " US Military Academy; Princeton University | \n",
309 | " Engineering | \n",
310 | " Aerospace Engineering | \n",
311 | " Colonel | \n",
312 | " US Army (Retired) | \n",
313 | " 2 | \n",
314 | " 334 | \n",
315 | " 0 | \n",
316 | " 0.0 | \n",
317 | " STS-28 (Columbia), STS-43 (Atlantis) | \n",
318 | " NaN | \n",
319 | " NaN | \n",
320 | "
\n",
321 | " \n",
322 | " | 3 | \n",
323 | " Thomas D. Akers | \n",
324 | " 1987.0 | \n",
325 | " 12.0 | \n",
326 | " Retired | \n",
327 | " 5/20/1951 | \n",
328 | " St. Louis, MO | \n",
329 | " Male | \n",
330 | " University of Missouri-Rolla | \n",
331 | " Applied Mathematics | \n",
332 | " Applied Mathematics | \n",
333 | " Colonel | \n",
334 | " US Air Force (Retired) | \n",
335 | " 4 | \n",
336 | " 814 | \n",
337 | " 4 | \n",
338 | " 29.0 | \n",
339 | " STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... | \n",
340 | " NaN | \n",
341 | " NaN | \n",
342 | "
\n",
343 | " \n",
344 | " | 4 | \n",
345 | " Buzz Aldrin | \n",
346 | " 1963.0 | \n",
347 | " 3.0 | \n",
348 | " Retired | \n",
349 | " 1/20/1930 | \n",
350 | " Montclair, NJ | \n",
351 | " Male | \n",
352 | " US Military Academy; MIT | \n",
353 | " Mechanical Engineering | \n",
354 | " Astronautics | \n",
355 | " Colonel | \n",
356 | " US Air Force (Retired) | \n",
357 | " 2 | \n",
358 | " 289 | \n",
359 | " 2 | \n",
360 | " 8.0 | \n",
361 | " Gemini 12, Apollo 11 | \n",
362 | " NaN | \n",
363 | " NaN | \n",
364 | "
\n",
365 | " \n",
366 | "
\n",
367 | "
"
368 | ],
369 | "text/plain": [
370 | " Name Year Group Status Birth Date Birth Place Gender \\\n",
371 | "0 Joseph M. Acaba 2004.0 19.0 Active 5/17/1967 Inglewood, CA Male \n",
372 | "1 Loren W. Acton NaN NaN Retired 3/7/1936 Lewiston, MT Male \n",
373 | "2 James C. Adamson 1984.0 10.0 Retired 3/3/1946 Warsaw, NY Male \n",
374 | "3 Thomas D. Akers 1987.0 12.0 Retired 5/20/1951 St. Louis, MO Male \n",
375 | "4 Buzz Aldrin 1963.0 3.0 Retired 1/20/1930 Montclair, NJ Male \n",
376 | "\n",
377 | " Alma Mater Undergraduate Major \\\n",
378 | "0 University of California-Santa Barbara; Univer... Geology \n",
379 | "1 Montana State University; University of Colorado Engineering Physics \n",
380 | "2 US Military Academy; Princeton University Engineering \n",
381 | "3 University of Missouri-Rolla Applied Mathematics \n",
382 | "4 US Military Academy; MIT Mechanical Engineering \n",
383 | "\n",
384 | " Graduate Major Military Rank Military Branch Space Flights \\\n",
385 | "0 Geology NaN NaN 2 \n",
386 | "1 Solar Physics NaN NaN 1 \n",
387 | "2 Aerospace Engineering Colonel US Army (Retired) 2 \n",
388 | "3 Applied Mathematics Colonel US Air Force (Retired) 4 \n",
389 | "4 Astronautics Colonel US Air Force (Retired) 2 \n",
390 | "\n",
391 | " Space Flight (hr) Space Walks Space Walks (hr) \\\n",
392 | "0 3307 2 13.0 \n",
393 | "1 190 0 0.0 \n",
394 | "2 334 0 0.0 \n",
395 | "3 814 4 29.0 \n",
396 | "4 289 2 8.0 \n",
397 | "\n",
398 | " Missions Death Date Death Mission \n",
399 | "0 STS-119 (Discovery), ISS-31/32 (Soyuz) NaN NaN \n",
400 | "1 STS 51-F (Challenger) NaN NaN \n",
401 | "2 STS-28 (Columbia), STS-43 (Atlantis) NaN NaN \n",
402 | "3 STS-41 (Discovery), STS-49 (Endeavor), STS-61 ... NaN NaN \n",
403 | "4 Gemini 12, Apollo 11 NaN NaN "
404 | ]
405 | },
406 | "execution_count": 7,
407 | "metadata": {},
408 | "output_type": "execute_result"
409 | }
410 | ],
411 | "source": [
412 | "df = pd.read_csv(\"astronauts.csv\")\n",
413 | "df.head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 8,
419 | "metadata": {
420 | "ExecuteTime": {
421 | "end_time": "2020-02-16T03:14:48.250858Z",
422 | "start_time": "2020-02-16T03:14:48.237441Z"
423 | }
424 | },
425 | "outputs": [],
426 | "source": [
427 | "df.to_csv(\"save.csv\", index=False, float_format=\"%0.4f\")"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": 9,
433 | "metadata": {
434 | "ExecuteTime": {
435 | "end_time": "2020-02-16T03:14:52.892116Z",
436 | "start_time": "2020-02-16T03:14:52.876108Z"
437 | }
438 | },
439 | "outputs": [],
440 | "source": [
441 | "pd.read_csv(\"save.csv\");"
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 10,
447 | "metadata": {
448 | "ExecuteTime": {
449 | "end_time": "2020-02-16T03:15:12.997156Z",
450 | "start_time": "2020-02-16T03:15:12.988669Z"
451 | }
452 | },
453 | "outputs": [],
454 | "source": [
455 | "df.to_pickle(\"save.pkl\")"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 11,
461 | "metadata": {
462 | "ExecuteTime": {
463 | "end_time": "2020-02-16T03:15:16.375064Z",
464 | "start_time": "2020-02-16T03:15:16.365034Z"
465 | }
466 | },
467 | "outputs": [],
468 | "source": [
469 | "pd.read_pickle(\"save.pkl\");"
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "execution_count": 12,
475 | "metadata": {
476 | "ExecuteTime": {
477 | "end_time": "2020-02-16T03:19:15.617617Z",
478 | "start_time": "2020-02-16T03:19:15.588076Z"
479 | }
480 | },
481 | "outputs": [],
482 | "source": [
483 | "df.to_hdf(\"save.hdf\", key=\"data\", format=\"table\")"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 13,
489 | "metadata": {
490 | "ExecuteTime": {
491 | "end_time": "2020-02-16T03:15:35.323031Z",
492 | "start_time": "2020-02-16T03:15:35.301528Z"
493 | }
494 | },
495 | "outputs": [],
496 | "source": [
497 | "pd.read_hdf(\"save.hdf\");"
498 | ]
499 | },
500 | {
501 | "cell_type": "code",
502 | "execution_count": 14,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-16T03:15:47.513253Z",
506 | "start_time": "2020-02-16T03:15:47.499922Z"
507 | }
508 | },
509 | "outputs": [
510 | {
511 | "ename": "ImportError",
512 | "evalue": "Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow.",
513 | "output_type": "error",
514 | "traceback": [
515 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
516 | "\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)",
517 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"save.fth\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
518 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(self, fname)\u001b[0m\n\u001b[0;32m 2135\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mio\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfeather_format\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mto_feather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2136\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2137\u001b[1;33m \u001b[0mto_feather\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2138\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2139\u001b[0m def to_parquet(\n",
519 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\feather_format.py\u001b[0m in \u001b[0;36mto_feather\u001b[1;34m(df, path)\u001b[0m\n\u001b[0;32m 21\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 22\u001b[0m \"\"\"\n\u001b[1;32m---> 23\u001b[1;33m \u001b[0mimport_optional_dependency\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"pyarrow\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 24\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mpyarrow\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mfeather\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 25\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
520 | "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\compat\\_optional.py\u001b[0m in \u001b[0;36mimport_optional_dependency\u001b[1;34m(name, extra, raise_on_missing, on_version)\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 92\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mraise_on_missing\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 93\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mImportError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mextra\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mextra\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 94\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 95\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
521 | "\u001b[1;31mImportError\u001b[0m: Missing optional dependency 'pyarrow'. Use pip or conda to install pyarrow."
522 | ]
523 | }
524 | ],
525 | "source": [
526 | "df.to_feather(\"save.fth\")"
527 | ]
528 | },
529 | {
530 | "cell_type": "code",
531 | "execution_count": null,
532 | "metadata": {
533 | "ExecuteTime": {
534 | "end_time": "2020-02-16T03:15:50.574863Z",
535 | "start_time": "2020-02-16T03:15:50.557141Z"
536 | }
537 | },
538 | "outputs": [],
539 | "source": [
540 | "pd.read_feather(\"save.fth\");"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 16,
546 | "metadata": {
547 | "ExecuteTime": {
548 | "end_time": "2020-02-16T03:20:03.082982Z",
549 | "start_time": "2020-02-16T03:20:03.062532Z"
550 | }
551 | },
552 | "outputs": [
553 | {
554 | "name": "stdout",
555 | "output_type": "stream",
556 | "text": [
557 | " Volume in drive G is HDD Storge 2\n",
558 | " Volume Serial Number is D2CA-B02B\n",
559 | "\n",
560 | " Directory of G:\\Qaurter2Notebooks\\piaic_q2_class_reseouces\\practiceResource\n",
561 | "\n",
562 | "01/10/2021 12:15 PM .\n",
563 | "01/10/2021 12:15 PM ..\n",
564 | "01/10/2021 12:12 PM .ipynb_checkpoints\n",
565 | "01/09/2021 10:33 AM 15,377 Answers.ipynb\n",
566 | "01/10/2021 12:03 PM 81,593 astronauts.csv\n",
567 | "01/10/2021 12:10 PM 33,930 dataInspecting.ipynb\n",
568 | "01/10/2021 12:04 PM 19,860 dataloading.ipynb\n",
569 | "01/10/2021 12:14 PM 18,591 dataSavingAndSerialising.ipynb\n",
570 | "01/10/2021 11:57 AM 11,328 heart.csv\n",
571 | "01/09/2021 10:31 AM 35,216 heart.pkl\n",
572 | "01/09/2021 10:53 AM 32,414 NumpyVPandas.ipynb\n",
573 | "01/09/2021 10:33 AM 2,812 Questions.ipynb\n",
574 | "01/10/2021 12:14 PM 87,030 save.csv\n",
575 | "01/10/2021 12:15 PM 801,617 save.hdf\n",
576 | "01/10/2021 12:15 PM 90,693 save.pkl\n",
577 | "01/09/2021 10:30 AM 18,594 SavingAndSerialising.ipynb\n",
578 | " 13 File(s) 1,249,055 bytes\n",
579 | " 3 Dir(s) 391,598,575,616 bytes free\n"
580 | ]
581 | }
582 | ],
583 | "source": [
584 | "%ls"
585 | ]
586 | },
587 | {
588 | "cell_type": "markdown",
589 | "metadata": {},
590 | "source": [
591 | "### Recap\n",
592 | "\n",
593 | "In terms of file size, HDF5 is the largest for this example. Everything else is approximately equal. For small data sizes, often csv is the easiest as its human readable. HDF5 is great for *loading* in huge amounts of data quickly. Pickle is faster than CSV, but not human readable.\n",
594 | "\n",
595 | "Lots of options, don't get hung up on any of them. csv and pickle are easy and for most cases work fine."
596 | ]
597 | }
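If you want to check those size claims yourself rather than eyeballing a directory listing, here is a minimal sketch (assuming the save.* files written above are still in the working directory):

```python
import os

# compare the on-disk size of each saved format
for path in ["save.csv", "save.pkl", "save.hdf"]:
    if os.path.exists(path):
        print(f"{path}: {os.path.getsize(path) / 1024:.1f} KB")
```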
598 | ],
599 | "metadata": {
600 | "kernelspec": {
601 | "display_name": "Python 3",
602 | "language": "python",
603 | "name": "python3"
604 | },
605 | "language_info": {
606 | "codemirror_mode": {
607 | "name": "ipython",
608 | "version": 3
609 | },
610 | "file_extension": ".py",
611 | "mimetype": "text/x-python",
612 | "name": "python",
613 | "nbconvert_exporter": "python",
614 | "pygments_lexer": "ipython3",
615 | "version": "3.7.4"
616 | }
617 | },
618 | "nbformat": 4,
619 | "nbformat_minor": 2
620 | }
621 |
--------------------------------------------------------------------------------
/practiceResource/dataMaipulation/5_Basics_ApplyMapVectorised.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Basics - Apply, Map and Vectorised Functions"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "ExecuteTime": {
15 | "end_time": "2020-02-22T02:40:07.096900Z",
16 | "start_time": "2020-02-22T02:40:04.211321Z"
17 | }
18 | },
19 | "outputs": [
20 | {
21 | "data": {
22 | "text/html": [
23 | "\n",
24 | "\n",
37 | "
\n",
38 | " \n",
39 | " \n",
40 | " | \n",
41 | " A | \n",
42 | " B | \n",
43 | " C | \n",
44 | "
\n",
45 | " \n",
46 | " \n",
47 | " \n",
48 | " | 0 | \n",
49 | " -1.01 | \n",
50 | " -0.70 | \n",
51 | " 0.62 | \n",
52 | "
\n",
53 | " \n",
54 | " | 1 | \n",
55 | " 0.78 | \n",
56 | " 0.05 | \n",
57 | " 0.68 | \n",
58 | "
\n",
59 | " \n",
60 | " | 2 | \n",
61 | " 0.38 | \n",
62 | " -0.05 | \n",
63 | " -0.07 | \n",
64 | "
\n",
65 | " \n",
66 | " | 3 | \n",
67 | " -1.55 | \n",
68 | " -0.19 | \n",
69 | " -0.08 | \n",
70 | "
\n",
71 | " \n",
72 | "
\n",
73 | "
"
74 | ],
75 | "text/plain": [
76 | " A B C\n",
77 | "0 -1.01 -0.70 0.62\n",
78 | "1 0.78 0.05 0.68\n",
79 | "2 0.38 -0.05 -0.07\n",
80 | "3 -1.55 -0.19 -0.08"
81 | ]
82 | },
83 | "execution_count": 1,
84 | "metadata": {},
85 | "output_type": "execute_result"
86 | }
87 | ],
88 | "source": [
89 | "import pandas as pd\n",
90 | "import numpy as np\n",
91 | "\n",
92 | "data = np.round(np.random.normal(size=(4, 3)), 2)\n",
93 | "df = pd.DataFrame(data, columns=[\"A\", \"B\", \"C\"])\n",
94 | "df.head()"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "## Apply\n",
102 | "\n",
103 | "Used to execute an arbitrary function again an entire dataframe, or a subection. Applies in a vectorised fashion."
104 | ]
105 | },
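A minimal sketch of that behaviour with a throwaway frame (not one of the notebook's cells): by default `apply` hands the function one column at a time, and `axis=1` hands it one row at a time:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
print(demo.apply(np.sum))           # column sums: A -> 3, B -> 7
print(demo.apply(np.sum, axis=1))   # row sums: 4 and 6
```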
106 | {
107 | "cell_type": "code",
108 | "execution_count": 3,
109 | "metadata": {
110 | "ExecuteTime": {
111 | "end_time": "2020-02-22T03:02:25.417815Z",
112 | "start_time": "2020-02-22T03:02:25.407038Z"
113 | }
114 | },
115 | "outputs": [
116 | {
117 | "data": {
118 | "text/html": [
119 | "\n",
120 | "\n",
133 | "
\n",
134 | " \n",
135 | " \n",
136 | " | \n",
137 | " A | \n",
138 | " B | \n",
139 | " C | \n",
140 | "
\n",
141 | " \n",
142 | " \n",
143 | " \n",
144 | " | 0 | \n",
145 | " 2.01 | \n",
146 | " 1.70 | \n",
147 | " 1.62 | \n",
148 | "
\n",
149 | " \n",
150 | " | 1 | \n",
151 | " 1.78 | \n",
152 | " 1.05 | \n",
153 | " 1.68 | \n",
154 | "
\n",
155 | " \n",
156 | " | 2 | \n",
157 | " 1.38 | \n",
158 | " 1.05 | \n",
159 | " 1.07 | \n",
160 | "
\n",
161 | " \n",
162 | " | 3 | \n",
163 | " 2.55 | \n",
164 | " 1.19 | \n",
165 | " 1.08 | \n",
166 | "
\n",
167 | " \n",
168 | "
\n",
169 | "
"
170 | ],
171 | "text/plain": [
172 | " A B C\n",
173 | "0 2.01 1.70 1.62\n",
174 | "1 1.78 1.05 1.68\n",
175 | "2 1.38 1.05 1.07\n",
176 | "3 2.55 1.19 1.08"
177 | ]
178 | },
179 | "execution_count": 3,
180 | "metadata": {},
181 | "output_type": "execute_result"
182 | }
183 | ],
184 | "source": [
185 | "df.apply(lambda x: 1 + np.abs(x))"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 4,
191 | "metadata": {
192 | "ExecuteTime": {
193 | "end_time": "2020-02-22T03:02:55.820553Z",
194 | "start_time": "2020-02-22T03:02:55.814335Z"
195 | }
196 | },
197 | "outputs": [
198 | {
199 | "data": {
200 | "text/plain": [
201 | "0 1.01\n",
202 | "1 0.78\n",
203 | "2 0.38\n",
204 | "3 1.55\n",
205 | "Name: A, dtype: float64"
206 | ]
207 | },
208 | "execution_count": 4,
209 | "metadata": {},
210 | "output_type": "execute_result"
211 | }
212 | ],
213 | "source": [
214 | "df.A.apply(np.abs)"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 6,
220 | "metadata": {
221 | "ExecuteTime": {
222 | "end_time": "2020-02-22T03:04:42.256485Z",
223 | "start_time": "2020-02-22T03:04:42.253987Z"
224 | }
225 | },
226 | "outputs": [],
227 | "source": [
228 | "#def double_if_positive(x):\n",
229 | "# if x > 0:\n",
230 | "# return 2 * x\n",
231 | "# return x\n",
232 | "#\n",
233 | "#df.apply(double_if_positive)"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": 7,
239 | "metadata": {
240 | "ExecuteTime": {
241 | "end_time": "2020-02-22T03:05:04.690134Z",
242 | "start_time": "2020-02-22T03:05:04.662382Z"
243 | }
244 | },
245 | "outputs": [
246 | {
247 | "data": {
248 | "text/html": [
249 | "\n",
250 | "\n",
263 | "
\n",
264 | " \n",
265 | " \n",
266 | " | \n",
267 | " A | \n",
268 | " B | \n",
269 | " C | \n",
270 | "
\n",
271 | " \n",
272 | " \n",
273 | " \n",
274 | " | 0 | \n",
275 | " -1.01 | \n",
276 | " -0.70 | \n",
277 | " 1.24 | \n",
278 | "
\n",
279 | " \n",
280 | " | 1 | \n",
281 | " 1.56 | \n",
282 | " 0.10 | \n",
283 | " 1.36 | \n",
284 | "
\n",
285 | " \n",
286 | " | 2 | \n",
287 | " 0.76 | \n",
288 | " -0.05 | \n",
289 | " -0.07 | \n",
290 | "
\n",
291 | " \n",
292 | " | 3 | \n",
293 | " -1.55 | \n",
294 | " -0.19 | \n",
295 | " -0.08 | \n",
296 | "
\n",
297 | " \n",
298 | "
\n",
299 | "
"
300 | ],
301 | "text/plain": [
302 | " A B C\n",
303 | "0 -1.01 -0.70 1.24\n",
304 | "1 1.56 0.10 1.36\n",
305 | "2 0.76 -0.05 -0.07\n",
306 | "3 -1.55 -0.19 -0.08"
307 | ]
308 | },
309 | "execution_count": 7,
310 | "metadata": {},
311 | "output_type": "execute_result"
312 | }
313 | ],
314 | "source": [
315 | "def double_if_positive(x):\n",
316 | " x[x > 0] *= 2\n",
317 | " return x\n",
318 | "\n",
319 | "df.apply(double_if_positive)"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": 8,
325 | "metadata": {
326 | "ExecuteTime": {
327 | "end_time": "2020-02-22T03:05:32.894881Z",
328 | "start_time": "2020-02-22T03:05:32.887394Z"
329 | }
330 | },
331 | "outputs": [
332 | {
333 | "data": {
334 | "text/html": [
335 | "\n",
336 | "\n",
349 | "
\n",
350 | " \n",
351 | " \n",
352 | " | \n",
353 | " A | \n",
354 | " B | \n",
355 | " C | \n",
356 | "
\n",
357 | " \n",
358 | " \n",
359 | " \n",
360 | " | 0 | \n",
361 | " -1.01 | \n",
362 | " -0.70 | \n",
363 | " 1.24 | \n",
364 | "
\n",
365 | " \n",
366 | " | 1 | \n",
367 | " 1.56 | \n",
368 | " 0.10 | \n",
369 | " 1.36 | \n",
370 | "
\n",
371 | " \n",
372 | " | 2 | \n",
373 | " 0.76 | \n",
374 | " -0.05 | \n",
375 | " -0.07 | \n",
376 | "
\n",
377 | " \n",
378 | " | 3 | \n",
379 | " -1.55 | \n",
380 | " -0.19 | \n",
381 | " -0.08 | \n",
382 | "
\n",
383 | " \n",
384 | "
\n",
385 | "
"
386 | ],
387 | "text/plain": [
388 | " A B C\n",
389 | "0 -1.01 -0.70 1.24\n",
390 | "1 1.56 0.10 1.36\n",
391 | "2 0.76 -0.05 -0.07\n",
392 | "3 -1.55 -0.19 -0.08"
393 | ]
394 | },
395 | "execution_count": 8,
396 | "metadata": {},
397 | "output_type": "execute_result"
398 | }
399 | ],
400 | "source": [
401 | "df"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": 11,
407 | "metadata": {
408 | "ExecuteTime": {
409 | "end_time": "2020-02-22T03:07:51.904072Z",
410 | "start_time": "2020-02-22T03:07:51.894055Z"
411 | }
412 | },
413 | "outputs": [
414 | {
415 | "data": {
416 | "text/html": [
417 | "\n",
418 | "\n",
431 | "
\n",
432 | " \n",
433 | " \n",
434 | " | \n",
435 | " A | \n",
436 | " B | \n",
437 | " C | \n",
438 | "
\n",
439 | " \n",
440 | " \n",
441 | " \n",
442 | " | 0 | \n",
443 | " -1.01 | \n",
444 | " -0.70 | \n",
445 | " 2.48 | \n",
446 | "
\n",
447 | " \n",
448 | " | 1 | \n",
449 | " 3.12 | \n",
450 | " 0.20 | \n",
451 | " 2.72 | \n",
452 | "
\n",
453 | " \n",
454 | " | 2 | \n",
455 | " 1.52 | \n",
456 | " -0.05 | \n",
457 | " -0.07 | \n",
458 | "
\n",
459 | " \n",
460 | " | 3 | \n",
461 | " -1.55 | \n",
462 | " -0.19 | \n",
463 | " -0.08 | \n",
464 | "
\n",
465 | " \n",
466 | "
\n",
467 | "
"
468 | ],
469 | "text/plain": [
470 | " A B C\n",
471 | "0 -1.01 -0.70 2.48\n",
472 | "1 3.12 0.20 2.72\n",
473 | "2 1.52 -0.05 -0.07\n",
474 | "3 -1.55 -0.19 -0.08"
475 | ]
476 | },
477 | "execution_count": 11,
478 | "metadata": {},
479 | "output_type": "execute_result"
480 | }
481 | ],
482 | "source": [
483 | "def double_if_positive(x):\n",
484 | " x = x.copy()\n",
485 | " x[x > 0] *= 2\n",
486 | " return x\n",
487 | "\n",
488 | "df.apply(double_if_positive, raw=True)"
489 | ]
490 | },
491 | {
492 | "cell_type": "markdown",
493 | "metadata": {},
494 | "source": [
495 | "## Map\n",
496 | "\n",
497 | "Similar to apply, but operators on Series, and uses dictionary based inputs rather than an array of values.\n"
498 | ]
499 | },
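One detail worth knowing (a sketch, not from the notebook): values missing from the mapping dictionary come back as NaN, so a common pattern is to fill the gaps back in with the original values:

```python
import pandas as pd

names = pd.Series(["Steve", "Alex", "Jess"])
# unmapped names would become NaN; fillna restores the originals by index
print(names.map({"Steve": "Stephen"}).fillna(names))
```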
500 | {
501 | "cell_type": "code",
502 | "execution_count": 12,
503 | "metadata": {
504 | "ExecuteTime": {
505 | "end_time": "2020-02-22T03:09:07.652877Z",
506 | "start_time": "2020-02-22T03:09:07.646810Z"
507 | }
508 | },
509 | "outputs": [],
510 | "source": [
511 | "series = pd.Series([\"Steve\", \"Alex\", \"Jess\", \"Mark\"])"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 13,
517 | "metadata": {
518 | "ExecuteTime": {
519 | "end_time": "2020-02-22T03:09:19.239855Z",
520 | "start_time": "2020-02-22T03:09:19.231863Z"
521 | }
522 | },
523 | "outputs": [
524 | {
525 | "data": {
526 | "text/plain": [
527 | "0 Stephen\n",
528 | "1 NaN\n",
529 | "2 NaN\n",
530 | "3 NaN\n",
531 | "dtype: object"
532 | ]
533 | },
534 | "execution_count": 13,
535 | "metadata": {},
536 | "output_type": "execute_result"
537 | }
538 | ],
539 | "source": [
540 | "series.map({\"Steve\": \"Stephen\"})"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 14,
546 | "metadata": {
547 | "ExecuteTime": {
548 | "end_time": "2020-02-22T03:10:19.253698Z",
549 | "start_time": "2020-02-22T03:10:19.247477Z"
550 | }
551 | },
552 | "outputs": [
553 | {
554 | "data": {
555 | "text/plain": [
556 | "0 I am Steve\n",
557 | "1 I am Alex\n",
558 | "2 I am Jess\n",
559 | "3 I am Mark\n",
560 | "dtype: object"
561 | ]
562 | },
563 | "execution_count": 14,
564 | "metadata": {},
565 | "output_type": "execute_result"
566 | }
567 | ],
568 | "source": [
569 | "series.map(lambda d: f\"I am {d}\")"
570 | ]
571 | },
572 | {
573 | "cell_type": "markdown",
574 | "metadata": {
575 | "ExecuteTime": {
576 | "end_time": "2020-02-22T03:12:35.912759Z",
577 | "start_time": "2020-02-22T03:12:35.902370Z"
578 | }
579 | },
580 | "source": [
581 | "## Vectorised functions\n",
582 | "\n",
583 | "Pandas and numpy obviously have tons of these, here are some examples"
584 | ]
585 | },
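One more vectorised tool worth a mention (a sketch, not one of the notebook's cells): `np.where` is the element-wise if/else, and is often a simpler alternative to `apply` for conditional transforms like `double_if_positive`:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.0, 0.5, 2.0])
# double only the positive entries, leave the rest untouched
print(np.where(s > 0, s * 2, s))  # [-1.  1.  4.]
```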
586 | {
587 | "cell_type": "code",
588 | "execution_count": 17,
589 | "metadata": {
590 | "ExecuteTime": {
591 | "end_time": "2020-02-22T03:14:11.987446Z",
592 | "start_time": "2020-02-22T03:14:11.974356Z"
593 | }
594 | },
595 | "outputs": [
596 | {
597 | "data": {
598 | "text/html": [
599 | "\n",
600 | "\n",
613 | "
\n",
614 | " \n",
615 | " \n",
616 | " | \n",
617 | " A | \n",
618 | " B | \n",
619 | " C | \n",
620 | "
\n",
621 | " \n",
622 | " \n",
623 | " \n",
624 | " | 0 | \n",
625 | " -1.01 | \n",
626 | " -0.70 | \n",
627 | " 1.24 | \n",
628 | "
\n",
629 | " \n",
630 | " | 1 | \n",
631 | " 1.56 | \n",
632 | " 0.10 | \n",
633 | " 1.36 | \n",
634 | "
\n",
635 | " \n",
636 | " | 2 | \n",
637 | " 0.76 | \n",
638 | " -0.05 | \n",
639 | " -0.07 | \n",
640 | "
\n",
641 | " \n",
642 | " | 3 | \n",
643 | " -1.55 | \n",
644 | " -0.19 | \n",
645 | " -0.08 | \n",
646 | "
\n",
647 | " \n",
648 | "
\n",
649 | "
"
650 | ],
651 | "text/plain": [
652 | " A B C\n",
653 | "0 -1.01 -0.70 1.24\n",
654 | "1 1.56 0.10 1.36\n",
655 | "2 0.76 -0.05 -0.07\n",
656 | "3 -1.55 -0.19 -0.08"
657 | ]
658 | },
659 | "metadata": {},
660 | "output_type": "display_data"
661 | },
662 | {
663 | "data": {
664 | "text/html": [
665 | "\n",
666 | "\n",
679 | "
\n",
680 | " \n",
681 | " \n",
682 | " | \n",
683 | " A | \n",
684 | " B | \n",
685 | " C | \n",
686 | "
\n",
687 | " \n",
688 | " \n",
689 | " \n",
690 | " | 0 | \n",
691 | " 1.01 | \n",
692 | " 0.70 | \n",
693 | " 1.24 | \n",
694 | "
\n",
695 | " \n",
696 | " | 1 | \n",
697 | " 1.56 | \n",
698 | " 0.10 | \n",
699 | " 1.36 | \n",
700 | "
\n",
701 | " \n",
702 | " | 2 | \n",
703 | " 0.76 | \n",
704 | " 0.05 | \n",
705 | " 0.07 | \n",
706 | "
\n",
707 | " \n",
708 | " | 3 | \n",
709 | " 1.55 | \n",
710 | " 0.19 | \n",
711 | " 0.08 | \n",
712 | "
\n",
713 | " \n",
714 | "
\n",
715 | "
"
716 | ],
717 | "text/plain": [
718 | " A B C\n",
719 | "0 1.01 0.70 1.24\n",
720 | "1 1.56 0.10 1.36\n",
721 | "2 0.76 0.05 0.07\n",
722 | "3 1.55 0.19 0.08"
723 | ]
724 | },
725 | "metadata": {},
726 | "output_type": "display_data"
727 | }
728 | ],
729 | "source": [
730 | "display(df, df.abs())"
731 | ]
732 | },
733 | {
734 | "cell_type": "code",
735 | "execution_count": 18,
736 | "metadata": {
737 | "ExecuteTime": {
738 | "end_time": "2020-02-22T03:14:53.996400Z",
739 | "start_time": "2020-02-22T03:14:53.992364Z"
740 | }
741 | },
742 | "outputs": [],
743 | "source": [
744 | "series = pd.Series([\"Obi-Wan Kenobi\", \"Luke Skywalker\", \"Han Solo\", \"Leia Organa\"])"
745 | ]
746 | },
747 | {
748 | "cell_type": "code",
749 | "execution_count": 20,
750 | "metadata": {
751 | "ExecuteTime": {
752 | "end_time": "2020-02-22T03:15:40.875036Z",
753 | "start_time": "2020-02-22T03:15:40.871022Z"
754 | }
755 | },
756 | "outputs": [
757 | {
758 | "data": {
759 | "text/plain": [
760 | "['Luke', 'Skywalker']"
761 | ]
762 | },
763 | "execution_count": 20,
764 | "metadata": {},
765 | "output_type": "execute_result"
766 | }
767 | ],
768 | "source": [
769 | "\"Luke Skywalker\".split()"
770 | ]
771 | },
772 | {
773 | "cell_type": "code",
774 | "execution_count": 23,
775 | "metadata": {
776 | "ExecuteTime": {
777 | "end_time": "2020-02-22T03:16:42.001894Z",
778 | "start_time": "2020-02-22T03:16:41.992370Z"
779 | }
780 | },
781 | "outputs": [
782 | {
783 | "data": {
784 | "text/html": [
785 | "\n",
786 | "\n",
799 | "
\n",
800 | " \n",
801 | " \n",
802 | " | \n",
803 | " 0 | \n",
804 | " 1 | \n",
805 | "
\n",
806 | " \n",
807 | " \n",
808 | " \n",
809 | " | 0 | \n",
810 | " Obi-Wan | \n",
811 | " Kenobi | \n",
812 | "
\n",
813 | " \n",
814 | " | 1 | \n",
815 | " Luke | \n",
816 | " Skywalker | \n",
817 | "
\n",
818 | " \n",
819 | " | 2 | \n",
820 | " Han | \n",
821 | " Solo | \n",
822 | "
\n",
823 | " \n",
824 | " | 3 | \n",
825 | " Leia | \n",
826 | " Organa | \n",
827 | "
\n",
828 | " \n",
829 | "
\n",
830 | "
"
831 | ],
832 | "text/plain": [
833 | " 0 1\n",
834 | "0 Obi-Wan Kenobi\n",
835 | "1 Luke Skywalker\n",
836 | "2 Han Solo\n",
837 | "3 Leia Organa"
838 | ]
839 | },
840 | "execution_count": 23,
841 | "metadata": {},
842 | "output_type": "execute_result"
843 | }
844 | ],
845 | "source": [
846 | "series.str.split(expand=True)"
847 | ]
848 | },
849 | {
850 | "cell_type": "code",
851 | "execution_count": 24,
852 | "metadata": {
853 | "ExecuteTime": {
854 | "end_time": "2020-02-22T03:17:28.038500Z",
855 | "start_time": "2020-02-22T03:17:28.033999Z"
856 | }
857 | },
858 | "outputs": [
859 | {
860 | "data": {
861 | "text/plain": [
862 | "0 False\n",
863 | "1 True\n",
864 | "2 False\n",
865 | "3 False\n",
866 | "dtype: bool"
867 | ]
868 | },
869 | "execution_count": 24,
870 | "metadata": {},
871 | "output_type": "execute_result"
872 | }
873 | ],
874 | "source": [
875 | "series.str.contains(\"Skywalker\")"
876 | ]
877 | },
878 | {
879 | "cell_type": "code",
880 | "execution_count": 26,
881 | "metadata": {
882 | "ExecuteTime": {
883 | "end_time": "2020-02-22T03:18:20.707962Z",
884 | "start_time": "2020-02-22T03:18:20.702104Z"
885 | }
886 | },
887 | "outputs": [
888 | {
889 | "data": {
890 | "text/plain": [
891 | "0 [OBI-WAN, KENOBI]\n",
892 | "1 [LUKE, SKYWALKER]\n",
893 | "2 [HAN, SOLO]\n",
894 | "3 [LEIA, ORGANA]\n",
895 | "dtype: object"
896 | ]
897 | },
898 | "execution_count": 26,
899 | "metadata": {},
900 | "output_type": "execute_result"
901 | }
902 | ],
903 | "source": [
904 | "series.str.upper().str.split()"
905 | ]
906 | },
907 | {
908 | "cell_type": "markdown",
909 | "metadata": {},
910 | "source": [
911 | "## User defined functions\n",
912 | "\n",
913 | "Lets investigate a super simple example of trying to find the hypotenuse given x and y distances.\n"
914 | ]
915 | },
916 | {
917 | "cell_type": "code",
918 | "execution_count": 27,
919 | "metadata": {
920 | "ExecuteTime": {
921 | "end_time": "2020-02-22T03:19:38.514718Z",
922 | "start_time": "2020-02-22T03:19:38.503227Z"
923 | }
924 | },
925 | "outputs": [],
926 | "source": [
927 | "data2 = np.random.normal(10, 2, size=(100000, 2))\n",
928 | "df2 = pd.DataFrame(data2, columns=[\"x\", \"y\"])"
929 | ]
930 | },
931 | {
932 | "cell_type": "code",
933 | "execution_count": 28,
934 | "metadata": {
935 | "ExecuteTime": {
936 | "end_time": "2020-02-22T03:20:22.345484Z",
937 | "start_time": "2020-02-22T03:20:22.320297Z"
938 | }
939 | },
940 | "outputs": [
941 | {
942 | "name": "stdout",
943 | "output_type": "stream",
944 | "text": [
945 | "13.385640543875555\n"
946 | ]
947 | }
948 | ],
949 | "source": [
950 | "hypot = (df2.x**2 + df2.y**2)**0.5\n",
951 | "print(hypot[0])"
952 | ]
953 | },
954 | {
955 | "cell_type": "code",
956 | "execution_count": 29,
957 | "metadata": {
958 | "ExecuteTime": {
959 | "end_time": "2020-02-22T03:22:05.047787Z",
960 | "start_time": "2020-02-22T03:21:57.547968Z"
961 | }
962 | },
963 | "outputs": [
964 | {
965 | "name": "stdout",
966 | "output_type": "stream",
967 | "text": [
968 | "13.385640543875555\n"
969 | ]
970 | }
971 | ],
972 | "source": [
973 | "def hypot1(x, y):\n",
974 | " return np.sqrt(x**2 + y**2)\n",
975 | "\n",
976 | "h1 = []\n",
977 | "for index, (x, y) in df2.iterrows():\n",
978 | " h1.append(hypot1(x, y))\n",
979 | "print(h1[0])"
980 | ]
981 | },
982 | {
983 | "cell_type": "code",
984 | "execution_count": 30,
985 | "metadata": {
986 | "ExecuteTime": {
987 | "end_time": "2020-02-22T03:23:27.324121Z",
988 | "start_time": "2020-02-22T03:23:24.153687Z"
989 | }
990 | },
991 | "outputs": [
992 | {
993 | "name": "stdout",
994 | "output_type": "stream",
995 | "text": [
996 | "13.385640543875555\n"
997 | ]
998 | }
999 | ],
1000 | "source": [
1001 | "def hypot2(row):\n",
1002 | " return np.sqrt(row.x**2 + row.y**2)\n",
1003 | "\n",
1004 | "h2 = df2.apply(hypot2, axis=1)\n",
1005 | "print(h2[0])"
1006 | ]
1007 | },
1008 | {
1009 | "cell_type": "code",
1010 | "execution_count": 31,
1011 | "metadata": {
1012 | "ExecuteTime": {
1013 | "end_time": "2020-02-22T03:24:23.324639Z",
1014 | "start_time": "2020-02-22T03:24:23.313038Z"
1015 | }
1016 | },
1017 | "outputs": [
1018 | {
1019 | "name": "stdout",
1020 | "output_type": "stream",
1021 | "text": [
1022 | "13.385640543875555\n"
1023 | ]
1024 | }
1025 | ],
1026 | "source": [
1027 | "def hypot3(xs, ys):\n",
1028 | " return np.sqrt(xs**2 + ys**2)\n",
1029 | "h3 = hypot3(df2.x, df2.y)\n",
1030 | "print(h3[0])"
1031 | ]
1032 | },
1033 | {
1034 | "cell_type": "markdown",
1035 | "metadata": {},
1036 | "source": [
1037 | "Vectorising everything you can is the key to speeding up your code. Once you've done that, you should use other tools to investigate. PyCharm Professional has a great optimisation tool built in. Jupyter has %lprun (line profiler) command you can find here: https://github.com/rkern/line_profiler\n",
1038 | "\n",
1039 | "### Recap\n",
1040 | "\n",
1041 | "* apply\n",
1042 | "* map\n",
1043 | "* .str & similar"
1044 | ]
1045 | },
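If you don't have the %lprun extension handy, the standard library's timeit gives the same headline message; here is a minimal sketch (not from the notebook) comparing the row-wise apply to the fully vectorised version:

```python
import timeit

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.normal(10, 2, size=(10000, 2)), columns=["x", "y"])

def row_wise():
    return frame.apply(lambda row: np.sqrt(row.x**2 + row.y**2), axis=1)

def vectorised():
    return np.sqrt(frame.x**2 + frame.y**2)

# the vectorised version is typically orders of magnitude faster
print("apply     :", timeit.timeit(row_wise, number=3))
print("vectorised:", timeit.timeit(vectorised, number=3))
```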
1046 | {
1047 | "cell_type": "code",
1048 | "execution_count": null,
1049 | "metadata": {},
1050 | "outputs": [],
1051 | "source": []
1052 | }
1053 | ],
1054 | "metadata": {
1055 | "kernelspec": {
1056 | "display_name": "Python 3",
1057 | "language": "python",
1058 | "name": "python3"
1059 | },
1060 | "language_info": {
1061 | "codemirror_mode": {
1062 | "name": "ipython",
1063 | "version": 3
1064 | },
1065 | "file_extension": ".py",
1066 | "mimetype": "text/x-python",
1067 | "name": "python",
1068 | "nbconvert_exporter": "python",
1069 | "pygments_lexer": "ipython3",
1070 | "version": "3.7.3"
1071 | }
1072 | },
1073 | "nbformat": 4,
1074 | "nbformat_minor": 2
1075 | }
1076 |
--------------------------------------------------------------------------------