(plus a few new ideas)\n",
15 | "\n",
16 | ""
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "## Scientific Programming\n",
24 | "\n",
25 | "**Definition**\n",
26 | "1. Programming for scientific use (e.g. workflows, data analysis) and visualization.\n",
27 | "\n",
28 | "2. Learning to program in an academic, scholarly manner (i.e. *wissenschaftliches Arbeiten*, good scientific practice).\n",
29 | " - minimize errors\n",
30 | " - enable others to build upon your ideas\n",
31 | " - produce reproducible results\n",
32 | " - convey knowledge (e.g. ideas are placed into context and explained)\n",
33 | " - be precise and explicit about what is done\n",
34 | " - follow established programming standards (e.g. PEP8)\n",
35 | "\n",
36 | "**3 Ways to Think About It**\n",
37 | "1. **Usage**: to perform mathematics (from simple numerical computations to complex mathematical models)\n",
38 | "\n",
39 | "2. **Practice**: to create code while maintaining good scholarship\n",
40 | " - knowing what is state-of-the-art\n",
41 | " - being clear and supportive (e.g. citing sources)\n",
42 | " - writing code in a concise, explicit manner (i.e. with understanding and intention)\n",
43 | " - writing code in a way that minimizes the chance of introducing an error\n",
44 | " - \"A machine requires precise instructions, which users often fail to supply\"[1]\n",
45 | "\n",
46 | "3. **Target**: to support science (doing research, data support and analysis)\n",
47 | " - \"Scientists commonly use languages such as Python and R to conduct and **automate analyses**, because in this way they can **speed data crunching**, **increase reproducibility**, protect data from accidental deletion or alteration and handle data sets that would overwhelm commercial applications.\"[1]\n",
48 | " - create workflows that support the research\n",
49 | " - create simulations (increasingly important in research)\n",
50 | " - exploratory: for understanding raw data\n",
51 | " - supportive: for strengthening interpretations of the data\n",
52 | " - predictive: for generating new ideas\n",
53 | " \n",
54 | "[1] Baker, Monya. \"Scientific computing: code alert.\" Nature 541, no. 7638 (2017): 563-565.\n",
55 | "\n",
56 | "\n",
57 | "\n",
58 | "#### Conceptual highlights about the course\n",
59 | "1. Significant amount of personal feedback\n",
60 | "\n",
61 | "2. Emphasis on\n",
62 | " - **K**eep **I**t (i.e. coding) **S**imple & **S**mart (K.I.S.S.)\n",
63 | " - $C^3$: code is **clear** (i.e. clarity), **concise**, and place into **context**\n",
64 | " - easily readable and understandable\n",
65 | " - use of built-in functions over libraries with large overhead\n",
66 | " - user functions for reproducibility, reuse, and error reduction"
67 | ]
68 | },
69 | {
70 | "cell_type": "markdown",
71 | "metadata": {},
72 | "source": [
73 | "#### Lots of built-in keywords and commands"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "**Built-in Functions**\n",
81 | "https://docs.python.org/3/library/functions.html\n",
82 | "\n",
83 | "`abs()\tall()\tany()\tascii()\tbin()`\n",
84 | "`bool()\tbreakpoint()\tbytearray()\tbytes()\tcallable()`\n",
85 | "`chr()\tclassmethod()\tcompile()\tcomplex()\tdelattr()`\n",
86 | "`dict()\tdir()\tdivmod()\tenumerate()\teval()`\n",
87 | "`exec()\tfilter()\tfloat()\tformat()\tfrozenset()`\n",
88 | "`getattr()\tglobals()\thasattr()\thash()`\n",
89 | "`help()\thex()\tid()\tinput()`\n",
90 | "`int()\tisinstance()\tissubclass()\titer()`\n",
91 | "`len()\tlist()\tlocals()\tmap()`\n",
92 | "`max()\tmemoryview()\tmin()\tnext()`\n",
93 | "`object()\toct()\topen()\tord()`\n",
94 | "`pow()\tprint()\tproperty()\trange()`\n",
95 | "`repr()\treversed()\tround()\tset()`\n",
96 | "`setattr()\tslice()\tsorted()\tstaticmethod()`\n",
97 | "`str()\tsum()\tsuper()\ttuple()`\n",
98 | "`type()\tvars()\tzip()\t__import__()`\n",
99 | "***"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "import keyword\n",
109 | "keyword.kwlist"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "#### A comprehensive list of built-in functions and modules\n",
117 | "\n",
118 | "Functions:"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "dir(__builtins__)"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "Modules (every importable module, including installed third-party packages):"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "help('modules')"
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "\n",
151 | "\n",
152 | "### What we have learned about so far\n",
153 | "\n",
154 | "1. User functions - creating one code block that focuses on a single idea/concept\n",
155 | " - typing (i.e. type hints) adds **clarity** to the code\n",
156 | " - **context** through good block comments, and\n",
157 | " - some internal code control ideas (e.g. isinstance, checking for None default assignments)\n",
158 | "\n",
159 | "\n",
160 | "2. Writing **concise** code without losing clarity (i.e. not making it confusing)\n",
161 | " - easier to understand\n",
162 | " - easier to debug\n",
163 | " - less prone to introducing errors\n",
164 | "\n",
165 | "\n",
166 | "3. Significant Figures and Rounding Numbers\n",
167 | "\n",
168 | "\n",
169 | "4. PEP8 style recommendations\n",
170 | " - why it is important\n",
171 | " - order for importing libraries\n",
172 | " - indentation, spacing and line length\n",
173 | "\n",
174 | "\n",
175 | "5. Pandas library\n",
176 | " - reading in csv files (series and dataframes)\n",
177 | " - Pandas built-in math functions\n",
178 | " - plotting data\n",
179 | "\n",
180 | ""
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 | "#### Data Structures Review\n",
188 | "\n",
189 | "- Lists\n",
190 | "- Dictionaries\n",
191 | "- Tuples\n",
192 | "\n",
193 | "**lists**"
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": null,
199 | "metadata": {},
200 | "outputs": [],
201 | "source": [
202 | "my_list = ['Christmas', 'Halloween', 'German Unity Day', \"New Year's Day\", \"Christmas\"]\n",
203 | "type(my_list)"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": null,
209 | "metadata": {},
210 | "outputs": [],
211 | "source": [
212 | "for holiday in my_list:\n",
213 | " print(holiday)"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {},
220 | "outputs": [],
221 | "source": [
222 | "my_list"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": null,
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "my_list[0] = 'Carnival'\n",
232 | "my_list"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "**tuples**\n",
240 | "- generally faster than lists, because they are\n",
241 | "- not mutable (i.e. immutable)"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": null,
247 | "metadata": {},
248 | "outputs": [],
249 | "source": [
250 | "my_tuple = ('Christmas', 'Halloween', 'German Unity Day', \"New Year's Day\", \"Christmas\")\n",
251 | "type(my_tuple)"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": null,
257 | "metadata": {},
258 | "outputs": [],
259 | "source": [
260 | "for holiday in my_tuple:\n",
261 | " print(holiday)"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {},
268 | "outputs": [],
269 | "source": [
270 | "my_tuple[0] = 'Carnival'  # raises a TypeError: tuples are immutable\n",
271 | "my_tuple"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "**dictionary**"
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": null,
284 | "metadata": {},
285 | "outputs": [],
286 | "source": [
287 | "my_dictionary = {'German Holidays': ['Christmas', 'Halloween', 'German Unity Day', \"New Year's Day\", \"Christmas\"],\n",
288 | " 'US Holidays': ['Christmas', 'Halloween', 'Thanksgiving', \"New Year's Day\", \"Christmas\"]}\n",
289 | "my_dictionary"
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": null,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": [
298 | "type(my_dictionary)"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "my_dictionary.get('US Holidays')"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "list(my_dictionary.keys())"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": null,
322 | "metadata": {},
323 | "outputs": [],
324 | "source": [
325 | "list(my_dictionary.values())"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": null,
331 | "metadata": {},
332 | "outputs": [],
333 | "source": [
334 | "list(my_dictionary.items())"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": null,
340 | "metadata": {},
341 | "outputs": [],
342 | "source": [
343 | "for key, value in my_dictionary.items():\n",
344 | " print(key, value)"
345 | ]
346 | },
347 | {
348 | "cell_type": "markdown",
349 | "metadata": {},
350 | "source": [
351 | "#### Identify unique entries\n",
352 | "\n",
353 | "- sets\n",
354 | " - mutable\n",
355 | "- frozen sets\n",
356 | " - immutable"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": null,
362 | "metadata": {},
363 | "outputs": [],
364 | "source": [
365 | "my_set = set(my_list)\n",
366 | "my_set"
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": null,
372 | "metadata": {},
373 | "outputs": [],
374 | "source": [
375 | "type(my_set)"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": null,
381 | "metadata": {},
382 | "outputs": [],
383 | "source": [
384 | "for holiday in my_set:\n",
385 | " print(holiday)"
386 | ]
387 | },
388 | {
389 | "cell_type": "code",
390 | "execution_count": null,
391 | "metadata": {},
392 | "outputs": [],
393 | "source": [
394 | "my_set.remove(\"New Year's Day\")\n",
395 | "my_set"
396 | ]
397 | },
398 | {
399 | "cell_type": "markdown",
400 | "metadata": {},
401 | "source": [
402 | "**frozen set** (a small new idea that we haven't seen before)"
403 | ]
404 | },
405 | {
406 | "cell_type": "code",
407 | "execution_count": null,
408 | "metadata": {},
409 | "outputs": [],
410 | "source": [
411 | "my_frozenset = frozenset(my_list)\n",
412 | "my_frozenset"
413 | ]
414 | },
415 | {
416 | "cell_type": "code",
417 | "execution_count": null,
418 | "metadata": {},
419 | "outputs": [],
420 | "source": [
421 | "type(my_frozenset)"
422 | ]
423 | },
424 | {
425 | "cell_type": "code",
426 | "execution_count": null,
427 | "metadata": {},
428 | "outputs": [],
429 | "source": [
430 | "my_frozenset.remove(\"Christmas\")  # raises an AttributeError: frozensets have no remove() method"
431 | ]
432 | }
433 | ],
434 | "metadata": {
435 | "kernelspec": {
436 | "display_name": "Python 3 (ipykernel)",
437 | "language": "python",
438 | "name": "python3"
439 | },
440 | "language_info": {
441 | "codemirror_mode": {
442 | "name": "ipython",
443 | "version": 3
444 | },
445 | "file_extension": ".py",
446 | "mimetype": "text/x-python",
447 | "name": "python",
448 | "nbconvert_exporter": "python",
449 | "pygments_lexer": "ipython3",
450 | "version": "3.8.10"
451 | }
452 | },
453 | "nbformat": 4,
454 | "nbformat_minor": 2
455 | }
456 |
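The data-structure review above states that lists and sets are mutable while tuples and frozen sets are immutable, and two of its cells raise errors on purpose to show this. As a self-contained check (a sketch that is not part of the original notebook), the snippet below catches those errors instead of letting them stop execution. The holiday list mirrors the notebook's `my_list`; the names `tuple_is_mutable` and `frozenset_has_remove` are hypothetical and used only for illustration.

```python
# Sketch: mutability of the data structures reviewed in the notebook above.
holidays = ['Christmas', 'Halloween', 'German Unity Day', "New Year's Day", 'Christmas']

# Lists are mutable: an item can be replaced in place.
my_list = list(holidays)
my_list[0] = 'Carnival'

# Tuples are immutable: item assignment raises a TypeError.
my_tuple = tuple(holidays)
try:
    my_tuple[0] = 'Carnival'
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Sets keep only unique entries (the duplicate 'Christmas' is dropped) and are mutable.
my_set = set(holidays)
my_set.remove("New Year's Day")

# Frozen sets also keep only unique entries, but offer no remove() method.
my_frozenset = frozenset(holidays)
frozenset_has_remove = hasattr(my_frozenset, 'remove')

print(my_list[0], tuple_is_mutable, len(my_set), len(my_frozenset), frozenset_has_remove)
# prints: Carnival False 3 4 False
```

Catching the exception with `try`/`except` keeps the demonstration running from top to bottom, which is convenient when the whole notebook is re-executed.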
--------------------------------------------------------------------------------
/numpy_polynomials.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "1f0ce628",
6 | "metadata": {},
7 | "source": [
8 | "\n",
9 | "\n",
10 | "\n",
11 | "# Polynomials\n",
12 | "\n",
13 | "## Importance of polynomials in science\n",
14 | "\n",
15 | "They appear in almost every field, from physics to economics.\n",
16 | "\n",
17 | "**Examples**:\n",
18 | "- chemistry: model potential energy surfaces\n",
19 | "- astronomy: model the trajectories, velocities and interactions of objects (stars, planets, asteroids)\n",
20 | "- economics: model and forecast financial trends\n",
21 | "- meteorology: model weather patterns\n",
22 | "- engineering: create physical designs (e.g. roller coasters)\n",
23 | "- virology: predict contagion growth\n",
24 | "- statistics: regressions and interpolation\n",
25 | "\n",
26 | "In the real world, many functions are **too complicated** to evaluate exactly. Instead, we evaluate them **using polynomial approximations** (e.g., Taylor series expansions: https://en.wikipedia.org/wiki/Taylor_series).\n",
27 | "\n",
28 | "---\n",
29 | "\n",
30 | "## Polynomial Definition and Components\n",
31 | "\n",
32 | "Example (2$^{nd}$-degree polynomial): $3x^2 + 2x + 1$\n",
33 | "\n",
34 | "where the degree of the polynomial is the highest power of $x$ (here $x^2$, i.e., 2$^{nd}$ degree), and the coefficients are [3, 2, 1].\n",
35 | "\n",
36 | "- https://en.wikipedia.org/wiki/Polynomial\n",
37 | "\n",
38 | "\n",
39 | "## Examples of simple polynomials using NumPy functions\n",
40 | "\n",
41 | "#### Creating one-dimensional polynomials of various degrees: \n",
42 | "\n",
43 | "`poly1d`: https://numpy.org/doc/stable/reference/generated/numpy.poly1d.html\n",
44 | "\n",
45 | "\n",
46 | "\n",
47 | "Okay, let's revisit the idea of polynomials, and slowly build up our understanding together."
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "id": "fe395a92",
54 | "metadata": {},
55 | "outputs": [],
56 | "source": [
57 | "import matplotlib.pyplot as plt\n",
58 | "import numpy as np"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "id": "f6c6139e",
64 | "metadata": {},
65 | "source": [
66 | "1. Create one-dimensional polynomials (i.e., with one variable, x) using a function\n",
67 | " - using different coefficient(s)\n",
68 | " - evaluate each resulting polynomial at x = 2.\n",
69 | "\n",
70 | "To do this **concisely** and to **reduce error**, let's create a user-defined function that\n",
71 | "\n",
72 | "a) generates a polynomial using provided coefficients,\n",
73 | "\n",
74 | "b) prints out the polynomial equation and\n",
75 | "\n",
76 | "c) its value when evaluated at x=2\n",
77 | "\n",
78 | "**Note**: To keep the teaching message simple, this function has no docstring and no `isinstance` checks. Normally, you should include them."
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "id": "055068d1",
85 | "metadata": {},
86 | "outputs": [],
87 | "source": [
88 | "def my_poly1d(coeff: list):\n",
89 | " \n",
90 | " polynomial = np.poly1d(coeff)\n",
91 | " \n",
92 | " print('The value of the polynomial')\n",
93 | " print()\n",
94 | " print(f'{polynomial} when x=2: {polynomial(2)}')"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "id": "2763e62c",
100 | "metadata": {},
101 | "source": [
102 | "#### One coefficient (not very exciting): [M] --> $M$"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "id": "cd0c38b6",
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "coefficients = [3]\n",
113 | "my_poly1d(coefficients)"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "id": "445825fa",
119 | "metadata": {},
120 | "source": [
121 | "Notice that the polynomial generated does not have an 'x' term...the equation is just a constant.\n",
122 | "\n",
123 | "That means the \"polynomial\" 3 has a y-value of 3 for every x-value:"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": null,
129 | "id": "e6ad295d",
130 | "metadata": {},
131 | "outputs": [],
132 | "source": [
133 | "plt.plot()\n",
134 | "plt.hlines(xmin=0, xmax=10, y=3)\n",
135 | "plt.show()"
136 | ]
137 | },
138 | {
139 | "cell_type": "markdown",
140 | "id": "9d9dff3d",
141 | "metadata": {},
142 | "source": [
143 | "#### Two coefficients: [M, N] --> $Mx + N$\n",
144 | "- Note how M now multiplies x: the first coefficient always belongs to the highest power"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "id": "39a33a98",
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "coefficients = [1, 2]\n",
155 | "my_poly1d(coefficients)"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "id": "a737dfe5",
161 | "metadata": {},
162 | "source": [
163 | "##### Including a negative coefficient: [1, -1]"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "id": "6e46edcd",
170 | "metadata": {},
171 | "outputs": [],
172 | "source": [
173 | "coefficients = [1, -1]\n",
174 | "my_poly1d(coefficients)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "id": "c8e49f19",
180 | "metadata": {},
181 | "source": [
182 | "#### Three coefficients: [M, N, O] --> $Mx^2 + Nx + O$"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": null,
188 | "id": "a5389129",
189 | "metadata": {},
190 | "outputs": [],
191 | "source": [
192 | "# coefficients = [x^2, x, constant]\n",
193 | "coefficients = [-4, 1, -2]\n",
194 | "my_poly1d(coefficients)"
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "id": "1dd9d533",
200 | "metadata": {},
201 | "source": [
202 | "#### Four coefficients: [M, N, O, P] --> $Mx^3 + Nx^2 + Ox + P$"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "id": "cc678750",
209 | "metadata": {},
210 | "outputs": [],
211 | "source": [
212 | "coefficients = [5, -4, 1, -1]\n",
213 | "my_poly1d(coefficients)"
214 | ]
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "id": "428af698",
219 | "metadata": {},
220 | "source": [
221 | "#### Access the Polynomial's Coefficients"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": null,
227 | "id": "78a7a86a",
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "polynomial = np.poly1d(coefficients)\n",
232 | "\n",
233 | "polynomial.coefficients"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "id": "668701f3",
239 | "metadata": {},
240 | "source": [
241 | "#### Access the Polynomial's Order"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": null,
247 | "id": "6c8f5524",
248 | "metadata": {},
249 | "outputs": [],
250 | "source": [
251 | "polynomial.order"
252 | ]
253 | },
254 | {
255 | "cell_type": "markdown",
256 | "id": "3071942b",
257 | "metadata": {},
258 | "source": [
259 | "## Math with polynomials\n",
260 | "\n",
261 | "#### Square of a polynomial\n",
262 | "\n",
263 | "The square of a polynomial\n",
264 | "\n",
265 | "\n",
266 | "$\n",
267 | "\\begin{align}\n",
268 | "(a + b)^2 &= (a + b)(a + b)\\\\\n",
269 | " &= a^2 + 2ab + b^2\n",
270 | "\\end{align}\n",
271 | "$\n",
272 | "\n",
273 | "**Example**:\n",
274 | "\n",
275 | "`np.poly1d([2, 1])` $\\rightarrow$ ${2x + 1}$\n",
276 | "\n",
277 | "Thus, when we square this polynomial:\n",
278 | "\n",
279 | "$(2x + 1)(2x + 1) = 4x^2 + 4x + 1$\n",
280 | "\n",
281 | "So, how do we code this?"
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": null,
287 | "id": "2b293c77",
288 | "metadata": {},
289 | "outputs": [],
290 | "source": [
291 | "coefficients = [2, 1]\n",
292 | "\n",
293 | "polynomial = np.poly1d(coefficients)\n",
294 | "\n",
295 | "poly_square = polynomial**2\n",
296 | "\n",
297 | "print(poly_square)"
298 | ]
299 | },
300 | {
301 | "cell_type": "markdown",
302 | "id": "2d91a4c8",
303 | "metadata": {},
304 | "source": [
305 | "Now evaluate the squared polynomial at x=2\n",
306 | "\n",
307 | "(i.e., $(2x + 1)^2$ at $x=2$: $4 \\cdot 2^2 + 4 \\cdot 2 + 1 = 25$)"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": null,
313 | "id": "eafb76e6",
314 | "metadata": {},
315 | "outputs": [],
316 | "source": [
317 | "poly_square(2)"
318 | ]
319 | },
320 | {
321 | "cell_type": "markdown",
322 | "id": "8e6e0878",
323 | "metadata": {},
324 | "source": [
325 | "#### A polynomial cubed"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": null,
331 | "id": "21e231c4",
332 | "metadata": {},
333 | "outputs": [],
334 | "source": [
335 | "print(polynomial**3)"
336 | ]
337 | },
338 | {
339 | "cell_type": "markdown",
340 | "id": "e60ebef5",
341 | "metadata": {},
342 | "source": [
343 | "Summary so far:\n",
344 | "1. We reviewed what a polynomial is,\n",
345 | "2. We learned that `poly1d` allows us to create n-degree polynomials that depend on x data\n",
346 | "3. We learned that we can do some polynomial math easily\n",
347 | "\n",
348 | "
\n",
349 | "\n",
350 | "What happens though when you have **x- and y-data** that follow a **polynomial form**, but you **don't know** the **coefficients** to define the polynomial?\n",
351 | "\n",
352 | "\n",
353 | "## Numpy's `polyfit` Function\n",
354 | "\n",
355 | "Fits a polynomial of a specified degree to given data (x, y).\n",
356 | "\n",
357 | "Returns a \"vector\" of coefficients, ordered from the highest power down, that minimizes the squared error.\n",
358 | "\n",
359 | "- https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html\n",
360 | "\n",
361 | "First, let's create some **nonideal data** that follows a **cubic polynomial** form."
362 | ]
363 | },
364 | {
365 | "cell_type": "code",
366 | "execution_count": null,
367 | "id": "9f4b5b82",
368 | "metadata": {},
369 | "outputs": [],
370 | "source": [
371 | "x_values = [-12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1,\n",
372 | " 0, 1, 2, 3, 4, 5, 6, 7, 8]\n",
373 | "\n",
374 | "y_values = [2000, 1500, 1400, 700, 600, 500, 300, 100, 70, 30, 20, 5,\n",
375 | " 4, 7, 10, 6, -10, -50, -200, -220, -400]"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": null,
381 | "id": "2ea722c2",
382 | "metadata": {},
383 | "outputs": [],
384 | "source": [
385 | "plt.plot()\n",
386 | "plt.plot(x_values, y_values, 'o', markersize=15)\n",
387 | "plt.show()"
388 | ]
389 | },
390 | {
391 | "cell_type": "markdown",
392 | "id": "f833017f",
393 | "metadata": {},
394 | "source": [
395 | "Now, `polyfit` will fit an n-degree polynomial (here, 3$^{rd}$ degree) to the provided x- and y-data.\n",
396 | "\n",
397 | "The `polyfit` function returns coefficients.\n",
398 | "\n",
399 | "Recall from above: four coefficients are needed to define a 3$^{rd}$ degree polynomial\n",
400 | "\n",
401 | "[M, N, O, P] --> $Mx^3 + Nx^2 + Ox + P$"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": null,
407 | "id": "3c890c78",
408 | "metadata": {},
409 | "outputs": [],
410 | "source": [
411 | "coefficients = np.polyfit(x_values, y_values, 3)\n",
412 | "coefficients"
413 | ]
414 | },
415 | {
416 | "cell_type": "markdown",
417 | "id": "203b43de",
418 | "metadata": {},
419 | "source": [
420 | "Therefore, we have the following polynomial (rounded coefficients):\n",
421 | "\n",
422 | "$-1.046*x^3 + 1.696*x^2 + 3.413*x + 2.793$\n",
423 | "\n",
424 | "Now we can use NumPy's `poly1d` to encode this polynomial with its full, unrounded coefficients:"
425 | ]
426 | },
427 | {
428 | "cell_type": "code",
429 | "execution_count": null,
430 | "id": "1f37527a",
431 | "metadata": {},
432 | "outputs": [],
433 | "source": [
434 | "cubic_polynomial = np.poly1d(coefficients)\n",
435 | "print(cubic_polynomial)"
436 | ]
437 | },
438 | {
439 | "cell_type": "markdown",
440 | "id": "d7f69413",
441 | "metadata": {},
442 | "source": [
443 | "Using this cubic polynomial, we can generate \"ideal\" y-data given a range of x-data values."
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": null,
449 | "id": "819fc8e1",
450 | "metadata": {},
451 | "outputs": [],
452 | "source": [
453 | "y_ideal_cubic = cubic_polynomial(x_values)"
454 | ]
455 | },
456 | {
457 | "cell_type": "markdown",
458 | "id": "375854cb",
459 | "metadata": {},
460 | "source": [
461 | "Plot both the original data and the data from the fitted cubic polynomial:"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": null,
467 | "id": "72511232",
468 | "metadata": {},
469 | "outputs": [],
470 | "source": [
471 | "plt.plot()\n",
472 | "plt.plot(x_values, y_values, 'o', markersize=15)\n",
473 | "plt.plot(x_values, y_ideal_cubic, linewidth=5)\n",
474 | "plt.show()"
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": null,
480 | "id": "49a71e37-29f1-401e-9995-cbcb37630788",
481 | "metadata": {},
482 | "outputs": [],
483 | "source": []
484 | }
485 | ],
486 | "metadata": {
487 | "kernelspec": {
488 | "display_name": "Python 3 (ipykernel)",
489 | "language": "python",
490 | "name": "python3"
491 | },
492 | "language_info": {
493 | "codemirror_mode": {
494 | "name": "ipython",
495 | "version": 3
496 | },
497 | "file_extension": ".py",
498 | "mimetype": "text/x-python",
499 | "name": "python",
500 | "nbconvert_exporter": "python",
501 | "pygments_lexer": "ipython3",
502 | "version": "3.11.9"
503 | }
504 | },
505 | "nbformat": 4,
506 | "nbformat_minor": 5
507 | }
508 |
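The polynomial notebook above uses `np.poly1d` to build and combine polynomials and `np.polyfit` to fit them to data. The sketch below (not part of the original notebook) checks both on cases where the answer is known in advance: squaring $2x + 1$, and fitting a cubic to noise-free points generated from $2x^3 - x + 5$, a polynomial chosen here purely for illustration. NumPy's documentation describes `poly1d` and `polyfit` as part of its older polynomial API and recommends the `numpy.polynomial` package for new code; the old API is used here only to match the notebook.

```python
import numpy as np

# (2x + 1)^2 should expand to 4x^2 + 4x + 1, which equals 25 at x = 2.
polynomial = np.poly1d([2, 1])
poly_square = polynomial ** 2
square_coeffs = poly_square.coefficients
square_at_2 = poly_square(2)

# Noise-free data generated from a known (illustrative) cubic: y = 2x^3 - x + 5.
x_exact = np.arange(-5, 6)
y_exact = 2 * x_exact**3 - x_exact + 5

# A least-squares cubic fit to exact cubic data should recover [2, 0, -1, 5],
# listed from the highest power down.
fitted_coeffs = np.polyfit(x_exact, y_exact, 3)
cubic_polynomial = np.poly1d(fitted_coeffs)

# Sum of squared residuals between the data and the fitted curve.
residual = np.sum((y_exact - cubic_polynomial(x_exact)) ** 2)

print(square_coeffs, square_at_2)
print(np.round(fitted_coeffs, 6), cubic_polynomial.order, residual)
```

Because these data contain no noise, the fitted coefficients match the generating cubic up to floating-point rounding and the residual is essentially zero. With noisy data such as the notebook's `y_values`, `polyfit` instead returns the coefficients that minimize the sum of squared residuals, so the fitted curve only approximates the points.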
--------------------------------------------------------------------------------
/operations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Basic Operators\n",
8 | "***"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "Class exercise:\n",
16 | " Using a calculator (i.e. not via coding), calculate:\n",
17 | " \n",
18 | " 2 + 3 * 4 / 5\n",
19 | " \n",
20 | " Write your answer in the 'chat box'\n",
21 | " \n",
22 | "\n",
23 | ""
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "Answer = 4.4 (we will see how this is done below)\n",
31 | "\n",
32 | "---\n",
33 | "# Operators\n",
34 | "\n",
35 | "In Python, there are different types of **operators** (special symbols) that operate on different values. Some of the basic operators include:\n",
36 | "\n",
37 | "- arithmetic operators\n",
38 | " - **`+`** (addition)\n",
39 | " - **`-`** (subtraction)\n",
40 | " - **`*`** (multiplication)\n",
41 | " - **`/`** (division)\n",
42 | " - __`**`__ (exponent)\n",
43 | "\n",
44 | "\n",
45 | "- assignment operators\n",
46 | " - **`=`** (assign a value)\n",
47 | " - **`+=`** (add and re-assign; increment)\n",
48 | " - **`-=`** (subtract and re-assign; decrement)\n",
49 | " - **`*=`** (multiply and re-assign)\n",
50 | "\n",
51 | "\n",
52 | "- comparison operators (return either `True` or `False`; booleans)\n",
53 | " - **`==`** (equal to)\n",
54 | " - **`!=`** (not equal to)\n",
55 | " - **`<`** (less than)\n",
56 | " - **`<=`** (less than or equal to)\n",
57 | " - **`>`** (greater than)\n",
58 | " - **`>=`** (greater than or equal to)"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "---\n",
66 | "## Order of Operation\n",
67 | "\n",
68 | "When multiple operators are used in a single expression, **operator precedence** determines which parts of the expression are evaluated in which order.\n",
69 | "\n",
70 | "Operators with higher precedence are evaluated first.\n",
71 | "\n",
72 | "Operators with the same precedence are evaluated from **left to right**.\n",
73 | "\n",
74 | "Precedence, from highest to lowest:\n",
75 | "1. `()` parentheses, for grouping\n",
76 | "2. `**` exponent\n",
77 | "3. `*`, `/` multiplication and division\n",
78 | "4. `+`, `-` addition and subtraction\n",
79 | "5. `==`, `!=`, `<`, `<=`, `>`, `>=` comparisons\n",
80 | "\n",
81 | "\n",
82 | "#### Additional information:\n",
83 | "\n",
84 | "1. https://docs.python.org/3/reference/expressions.html#operator-precedence\n",
85 | "\n",
86 | "2. https://en.wikipedia.org/wiki/Order_of_operations"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "#### Example 1: 2 + 3 * 4 / 5\n",
94 | "\n",
95 | "> Step 1. $3 * 4 = 12$ (**Notice**: `*` and `/` have the same precedence, but `*` is further to the left)\n",
96 | "\n",
97 | "> Step 2. $12 / 5 = 2.4$\n",
98 | "\n",
99 | "> Step 3. $2 + 2.4 = 4.4$\n",
100 | "\n",
101 | "Now, let's see what Python gives:"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "equation = 2.0 + 3.0 * 4.0 / 5.0\n",
111 | "\n",
112 | "print(\"2.0 + 3.0 * 4.0 / 5.0 = {0}\".format(equation))"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "Next, let's force a specific sequence of actions to occur by using `()`.\n",
120 | "\n",
121 | "#### Example 2: (2 + 3) * 4 / 5"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": null,
127 | "metadata": {},
128 | "outputs": [],
129 | "source": [
130 | "equation = (2.0 + 3.0) * 4.0 / 5.0\n",
131 | "\n",
132 | "print(\"(2.0 + 3.0) * 4.0 / 5.0 = {0}\".format(equation))\n",
133 | "\n",
134 | "print('''\n",
135 | " i.e. step 1: 2.0 + 3.0 = 5.0\n",
136 | " step 2: 5.0 * 4.0 = 20.0 since * is further left than /\n",
137 | " step 3: 20.0 / 5.0 = 4.0\n",
138 | " ''')"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "#### Example 3: 2 + 3 * 4 ** 2 / 5"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "equation = 2 + 3 * 4 ** 2 / 5\n",
155 | "\n",
156 | "print(\"2 + 3 * 4 ** 2 / 5 = \", equation)\n",
157 | "\n",
158 | "print(\"\"\"\n",
159 | " i.e. 4**2 = 16\n",
160 | " 3*16 = 48\n",
161 | " 48/5 = 9.6\n",
162 | " 2+9.6 = 11.6\n",
163 | " \"\"\")"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "#### Example 4: 2 + (3 * 4) ** 2 / 5"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": null,
176 | "metadata": {},
177 | "outputs": [],
178 | "source": [
179 | "equation = 2 + (3 * 4) ** 2 / 5\n",
180 | "\n",
181 | "print(\"2 + (3 * 4) ** 2 / 5 = \", equation)\n",
182 | "\n",
183 | "print(\"\"\"\n",
184 | " i.e. 3*4 = 12\n",
185 | " 12**2 = 144\n",
186 | " 144/5 = 28.8\n",
187 | " 2+28.8 = 30.8\n",
188 | " \"\"\")"
189 | ]
190 | },
191 | {
192 | "cell_type": "markdown",
193 | "metadata": {},
194 | "source": [
195 | "**Take-home Message**: Be careful when encoding mathematical equations so that you do not introduce an error.\n",
196 | "\n",
197 | "---\n",
198 | "## Increment (by reassigning a variable)\n",
199 | "\n",
200 | "\n",
201 | "In the following assignment statements:\n",
202 | "- the right-hand side is evaluated first, then\n",
203 | "- the result is assigned to the variable on the left-hand side"
204 | ]
205 | },
206 | {
207 | "cell_type": "markdown",
208 | "metadata": {},
209 | "source": [
210 | "Simple demonstration - counting planets, starting with 1"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": null,
216 | "metadata": {},
217 | "outputs": [],
218 | "source": [
219 | "planets = 1\n",
220 | "print(planets)"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "Now let's add 1 more to the number of planets:\n",
228 | "- notice how `planets + 1` is computed first\n",
229 | "- then the new value is assigned to the variable `planets`"
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": null,
235 | "metadata": {},
236 | "outputs": [],
237 | "source": [
238 | "planets = planets + 1\n",
239 | "print(planets)"
240 | ]
241 | },
242 | {
243 | "cell_type": "markdown",
244 | "metadata": {},
245 | "source": [
246 | "Alternatively, we can write `planets += 1` instead of `planets = planets + 1`:"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "metadata": {},
253 | "outputs": [],
254 | "source": [
255 | "\n",
256 | "planets += 1\n",
257 | "print(planets)"
258 | ]
259 | },
260 | {
261 | "cell_type": "markdown",
262 | "metadata": {},
263 | "source": [
264 | "`+=` also works with increments other than 1:"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": null,
270 | "metadata": {},
271 | "outputs": [],
272 | "source": [
273 | "planets += 5\n",
274 | "print(planets)"
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "---\n",
282 | "## Booleans\n",
283 | "- produced by comparison operators (e.g. `<=`); a Boolean is either `True` or `False`\n",
284 | "\n",
285 | "Demo Scenario - you have to discover more than 10 planets before you graduate."
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": null,
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "total_planets = 10\n",
295 | "print(total_planets < planets)"
296 | ]
297 | },
298 | {
299 | "cell_type": "markdown",
300 | "metadata": {},
301 | "source": [
302 | "### Why could this be interesting?\n",
303 | "- you can use them in conditions (`if`/`else`)"
304 | ]
305 | },
306 | {
307 | "cell_type": "code",
308 | "execution_count": null,
309 | "metadata": {},
310 | "outputs": [],
311 | "source": [
312 | "planets = 0\n",
313 | "\n",
314 | "if not (total_planets < planets):\n",
315 | "    print('You cannot graduate yet - get back to work!')\n",
316 | "else:\n",
317 | "    print('Congratulations - it is time to move on!')"
318 | ]
319 | },
320 | {
321 | "cell_type": "markdown",
322 | "metadata": {},
323 | "source": [
324 | "- you can use them to control loops (`while`)"
325 | ]
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {},
331 | "outputs": [],
332 | "source": [
333 | "planets = 0\n",
334 | "graduating = False\n",
335 | "\n",
336 | "while not graduating:\n",
337 | "    if not (total_planets < planets):\n",
338 | "        print('You cannot graduate yet. Number of planets discovered: {0}'.format(planets))\n",
339 | "        planets += 1\n",
340 | "    else:\n",
341 | "        print('Congratulations - it is time to move on!')\n",
342 | "        graduating = True"
343 | ]
344 | }
345 | ],
346 | "metadata": {
347 | "kernelspec": {
348 | "display_name": "Python 3 (ipykernel)",
349 | "language": "python",
350 | "name": "python3"
351 | },
352 | "language_info": {
353 | "codemirror_mode": {
354 | "name": "ipython",
355 | "version": 3
356 | },
357 | "file_extension": ".py",
358 | "mimetype": "text/x-python",
359 | "name": "python",
360 | "nbconvert_exporter": "python",
361 | "pygments_lexer": "ipython3",
362 | "version": "3.8.17"
363 | }
364 | },
365 | "nbformat": 4,
366 | "nbformat_minor": 2
367 | }
368 |
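A minimal sketch, not part of the original notebook, of how the hand-worked steps printed in the precedence examples could instead be checked automatically with `assert`. This fits the course's goal of minimizing errors. The names `planets` and `total_planets` mirror the notebook. The use of `math.isclose` is our own choice, since results such as 9.6 are stored only approximately as binary floats.

```python
import math

# Operator precedence: ** is evaluated before * and /, which come before +
without_parens = 2 + 3 * 4 ** 2 / 5   # 4**2=16, 3*16=48, 48/5=9.6, 2+9.6
with_parens = 2 + (3 * 4) ** 2 / 5    # 3*4=12, 12**2=144, 144/5=28.8, 2+28.8

# Compare floats with a tolerance instead of == (binary floats are approximations)
assert math.isclose(without_parens, 11.6)
assert math.isclose(with_parens, 30.8)

# Assignment: the right-hand side is evaluated first, then stored
planets = 1
planets = planets + 1  # 2
planets += 1           # 3 (shorthand for planets = planets + 1)
planets += 5           # 8
assert planets == 8

# A Boolean controlling a loop: keep discovering until more than total_planets
total_planets = 10
planets = 0
while not (total_planets < planets):
    planets += 1
assert planets == 11  # the first count that is more than 10

print(without_parens, with_parens, planets)
```

If an `assert` fails, Python stops with an `AssertionError`. A typo in an equation is therefore caught immediately, rather than being printed next to an explanation that no longer matches it.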
--------------------------------------------------------------------------------
/scientific_figures.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Scientific Figures\n",
8 | "\n",
9 | "- General Figure Making Advice\n",
10 | " - [Ten Simple Rules for Better Figures (PLoS)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833)\n",
11 | "\n",
12 | " \n",
13 | "- Colorblindness\n",
14 | " - [Wong, Bang. \"Points of view: Color blindness.\" Nature Methods 8 (2011): 441.](https://www.nature.com/articles/nmeth.1618#citeas)\n",
15 | " - [ColorBrewer (for palette suggestions)](https://colorbrewer2.org)"
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": [
22 | "Ideas:\n",
23 | " - Find a figure that gives examples of possible plots\n",
24 | " - Talk about the components that go into making a good plot (e.g. ticks, legend, labels)"
25 | ]
26 | }
27 | ],
28 | "metadata": {
29 | "kernelspec": {
30 | "display_name": "Python 3 (ipykernel)",
31 | "language": "python",
32 | "name": "python3"
33 | },
34 | "language_info": {
35 | "codemirror_mode": {
36 | "name": "ipython",
37 | "version": 3
38 | },
39 | "file_extension": ".py",
40 | "mimetype": "text/x-python",
41 | "name": "python",
42 | "nbconvert_exporter": "python",
43 | "pygments_lexer": "ipython3",
44 | "version": "3.8.10"
45 | }
46 | },
47 | "nbformat": 4,
48 | "nbformat_minor": 2
49 | }
50 |
--------------------------------------------------------------------------------
/statistics_simple.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## The Statistics Library\n",
8 | "- Relatively simple (compared to numpy or scipy)\n",
9 | " - part of Python's standard library, so it has no dependencies on other libraries (i.e. more \"usable\" by the community)\n",
10 | "- Relatively slow (compared to numpy) - best used with smaller data sets\n",
11 | "- More precise (compared to numpy) - useful when high precision is important"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Mean, median, standard deviation and variance\n",
19 | "- Measures of the center of numerical data\n",
20 | " - Mean (i.e. average): can be strongly influenced by outliers\n",
21 | " - Median: the middle value of the sorted data (less influenced by outliers)\n",
22 | "- Standard Deviation: a measure of how far the data typically lie from the mean (the square root of the variance)\n",
23 | "- Variance: a measure of the data's spread (roughly, the average squared deviation from the mean)"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "import matplotlib.pyplot as plt\n",
33 | "import statistics"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "#### 3 Data sets \n",
41 | "- They have the same mean value\n",
42 | "- They have different median, standard deviation and variance\n",
43 | "- Assumption: these values have 2 (e.g. 65) or 3 (e.g. 110) significant figures"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": null,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "data_1 = [65, 75, 73, 50, 60, 64, 69, 62, 67, 85]\n",
53 | "data_2 = [85, 79, 57, 39, 45, 71, 67, 87, 91, 49]\n",
54 | "data_3 = [43, 51, 53, 110, 50, 48, 87, 69, 68, 91]"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": null,
60 | "metadata": {},
61 | "outputs": [],
62 | "source": [
63 | "plt.figure()\n",
64 | "plt.xlim(0, 9)\n",
65 | "plt.ylim(30, 120)\n",
66 | "\n",
67 | "plt.plot(data_1, marker='.', markersize=10)\n",
68 | "plt.show()"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "Available marker styles: https://matplotlib.org/3.2.1/api/markers_api.html"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {},
82 | "outputs": [],
83 | "source": [
84 | "plt.figure()\n",
85 | "plt.xlim(0, 9)\n",
86 | "plt.ylim(30, 120)\n",
87 | "\n",
88 | "plt.plot(data_2)\n",
89 | "plt.show()"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": null,
95 | "metadata": {},
96 | "outputs": [],
97 | "source": [
98 | "plt.figure()\n",
99 | "plt.xlim(0, 9)\n",
100 | "plt.ylim(30, 120)\n",
101 | "\n",
102 | "plt.plot(data_3)\n",
103 | "plt.show()"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "---\n",
111 | "## Calculate some common statistics (and show on plot)"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "### Data Set 1"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "data_1.sort()\n",
128 | "data_1"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": null,
134 | "metadata": {},
135 | "outputs": [],
136 | "source": [
137 | "print(f'Mean: {statistics.mean(data_1)}')\n",
138 | "print(f'Median: {statistics.median(data_1)}')\n",
139 | "print(f'StDev: {statistics.stdev(data_1)}')\n",
140 | "print(f'Variance: {statistics.variance(data_1)}')"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "Mean: red\n",
148 | "\n",
149 | "Median: green"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "plt.figure()\n",
159 | "plt.xlim(0, 9)\n",
160 | "plt.ylim(30, 120)\n",
161 | "\n",
162 | "plt.axhline(y=67, color='red')\n",
163 | "plt.axhline(y=66, color='green')\n",
164 | "\n",
165 | "plt.plot(data_1, marker='.', markersize=10)\n",
166 | "plt.show()"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {},
172 | "source": [
173 | "### Data Set 2"
174 | ]
175 | },
176 | {
177 | "cell_type": "code",
178 | "execution_count": null,
179 | "metadata": {},
180 | "outputs": [],
181 | "source": [
182 | "print(f'Mean: {statistics.mean(data_2)}')\n",
183 | "print(f'Median: {statistics.median(data_2)}')\n",
184 | "print(f'StDev: {statistics.stdev(data_2)}')\n",
185 | "print(f'Variance: {statistics.variance(data_2)}')"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "plt.figure()\n",
195 | "plt.xlim(0, 9)\n",
196 | "plt.ylim(30, 120)\n",
197 | "\n",
198 | "plt.axhline(y=67, color='red')\n",
199 | "plt.axhline(y=69, color='green')\n",
200 | "\n",
201 | "plt.plot(data_2, marker='.', markersize=10)\n",
202 | "plt.show()"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "### Data Set 3"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "print(f'Mean: {statistics.mean(data_3)}')\n",
219 | "print(f'Median: {statistics.median(data_3)}')\n",
220 | "print(f'StDev: {statistics.stdev(data_3)}')\n",
221 | "print(f'Variance: {statistics.variance(data_3)}')"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": null,
227 | "metadata": {},
228 | "outputs": [],
229 | "source": [
230 | "plt.figure()\n",
231 | "plt.xlim(0, 9)\n",
232 | "plt.ylim(30, 120)\n",
233 | "\n",
234 | "plt.axhline(y=67, color='red')\n",
235 | "plt.axhline(y=60.5, color='green')\n",
236 | "\n",
237 | "plt.plot(data_3, marker='.', markersize=10)\n",
238 | "plt.show()"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
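245 | "## How could the above code have been done better?\n",
246 | "\n",
247 | "- avoid repeating the same code (and hand-copying values such as the mean and median), which reduces the chance of introducing errors"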
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": null,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "def do_statistics(working_data):\n",
257 | "    print(f'Mean: {statistics.mean(working_data)}')\n",
258 | "    print(f'Median: {statistics.median(working_data)}')\n",
259 | "    print(f'StDev: {statistics.stdev(working_data)}')\n",
260 | "    print(f'Variance: {statistics.variance(working_data)}')\n",
261 | "\n",
262 | "do_statistics(data_1)"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {},
268 | "source": [
269 | "---\n",
270 | "Source: https://www.siyavula.com/read/maths/grade-11/statistics/11-statistics-04"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": null,
276 | "metadata": {},
277 | "outputs": [],
278 | "source": []
279 | }
280 | ],
281 | "metadata": {
282 | "kernelspec": {
283 | "display_name": "Python 3 (ipykernel)",
284 | "language": "python",
285 | "name": "python3"
286 | },
287 | "language_info": {
288 | "codemirror_mode": {
289 | "name": "ipython",
290 | "version": 3
291 | },
292 | "file_extension": ".py",
293 | "mimetype": "text/x-python",
294 | "name": "python",
295 | "nbconvert_exporter": "python",
296 | "pygments_lexer": "ipython3",
297 | "version": "3.8.10"
298 | }
299 | },
300 | "nbformat": 4,
301 | "nbformat_minor": 2
302 | }
303 |
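One possible answer to "How could the above code have been done better?", sketched under our own assumptions. The summary statistics are returned as values rather than printed, so the mean and median passed to the plots never have to be copied by hand. The helper name `describe` is hypothetical and is not part of the original notebook.

```python
import statistics


def describe(data):
    """Return common summary statistics for a list of numbers (hypothetical helper)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),      # sorts internally; the input list is not modified
        "stdev": statistics.stdev(data),        # sample standard deviation (divides by n - 1)
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    }


data_sets = {
    "data_1": [65, 75, 73, 50, 60, 64, 69, 62, 67, 85],
    "data_2": [85, 79, 57, 39, 45, 71, 67, 87, 91, 49],
    "data_3": [43, 51, 53, 110, 50, 48, 87, 69, 68, 91],
}

# One loop replaces three copy-pasted cells
results = {name: describe(values) for name, values in data_sets.items()}
for name, summary in results.items():
    print(name, summary)
```

In the notebook, the plotting cells could then use `plt.axhline(y=results["data_3"]["median"], color="green")` instead of a hard-coded value. This is exactly where a hand-typed median is easy to get wrong.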
--------------------------------------------------------------------------------
/student_questions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Questions during course\n",
8 | "\n",
9 | "## Lecture 1\n",
10 | "1. Does Python have a double (double-precision float) type?\n",
11 | "\n",
12 | " Answer: Python's built-in `float` already has double precision. NumPy offers `np.longdouble`, which may provide higher (extended) precision depending on the platform."
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "metadata": {},
19 | "outputs": [],
20 | "source": []
21 | }
22 | ],
23 | "metadata": {
24 | "kernelspec": {
25 | "display_name": "Python 3",
26 | "language": "python",
27 | "name": "python3"
28 | },
29 | "language_info": {
30 | "codemirror_mode": {
31 | "name": "ipython",
32 | "version": 3
33 | },
34 | "file_extension": ".py",
35 | "mimetype": "text/x-python",
36 | "name": "python",
37 | "nbconvert_exporter": "python",
38 | "pygments_lexer": "ipython3",
39 | "version": "3.8.1"
40 | }
41 | },
42 | "nbformat": 4,
43 | "nbformat_minor": 2
44 | }
45 |
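A short sketch supporting the answer above: Python's `sys.float_info` reports the properties of the built-in `float`. The expected values in the comments assume an IEEE 754 platform, which covers virtually all current systems.

```python
import math
import sys

# On IEEE 754 platforms, a Python float is a C double
print(sys.float_info.mant_dig)  # bits in the significand: 53 for a double
print(sys.float_info.epsilon)   # gap between 1.0 and the next float: 2**-52 for a double
print(sys.float_info.dig)       # decimal digits that are always represented faithfully: 15

# Double precision is still finite, so small rounding errors remain
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```

This is why results with many decimals should be compared with a tolerance (e.g. `math.isclose`) rather than `==`.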
--------------------------------------------------------------------------------
/t-test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## T-Tests\n",
8 | "- Used to compare two numeric samples (data sets)\n",
9 | " - if you have more than two data sets, you can either a) do pairwise t-tests, or b) use another method (e.g. ANOVA)\n",
10 | "\n",
11 | "A t-test does two things:\n",
12 | "1. It measures how **different** the means of two samples are - the t value\n",
13 | " - smaller absolute values indicate smaller differences (relative to the spread of the data)\n",
14 | " \n",
15 | "2. It estimates how **significant** these differences are (i.e. how likely they are to arise by chance alone) - the p value\n",
16 | " - smaller values indicate that the difference is unlikely to be due to chance alone (e.g. p = 0.05 means that, if the two true means were equal, a difference at least this large would occur by chance 5% of the time).\n",
17 | "\n",
18 | "\n",
19 | "Random, Gaussian (i.e. normal) distribution of data points\n",
20 | "- random.normal: https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html\n",
21 | " - draws random samples from a Gaussian distribution whose mean (`loc`) and width (`scale`, the standard deviation) are given as arguments"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "from scipy import stats\n",
31 | "import matplotlib.pyplot as plt\n",
32 | "import numpy as np"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": null,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "# Three data sets: means of 0.0, 0.0 and 2.0, and standard deviations of 1.0, 0.5 and 1.0\n",
42 | "data1 = np.random.normal(0.0, 1.0, size=50)\n",
43 | "data2 = np.random.normal(0.0, 0.5, size=50)\n",
44 | "data3 = np.random.normal(2.0, 1.0, size=50)"
45 | ]
46 | },
47 | {
48 | "cell_type": "markdown",
49 | "metadata": {},
50 | "source": [
51 | "Note: with std = 0, every sample would equal the requested mean exactly (0.0 or 2.0).\n",
52 | "\n",
53 | "With a non-zero std, the sample means will differ somewhat from these values, so let's check what they actually are."
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "np.mean(data1)"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "np.mean(data2)"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "np.mean(data3)"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "plt.figure()\n",
90 | "plt.plot(data1, '-o')\n",
91 | "plt.plot(data2, '-r')\n",
92 | "plt.plot(data3, '-g')\n",
93 | "\n",
94 | "plt.hlines(np.mean(data1), 0, 50, colors='blue')\n",
95 | "plt.hlines(np.mean(data2), 0, 50, colors='red')\n",
96 | "plt.hlines(np.mean(data3), 0, 50, colors='green')\n",
97 | "\n",
98 | "plt.show()"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "Use seaborn to plot a kernel density estimate (a smoothed histogram) of each data set"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": null,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "import seaborn as sns\n",
115 | "\n",
116 | "plt.figure()\n",
117 | "sns.kdeplot(data1, color='blue', shade=True)\n",
118 | "sns.kdeplot(data2, color='red', shade=True)\n",
119 | "sns.kdeplot(data3, color='green', shade=True)\n",
120 | "plt.title(\"Kernel Density Estimate of Data\")\n",
121 | "plt.show()"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {},
127 | "source": [
128 | "#### t-test\n",
129 | "- https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html\n",
130 | "\n",
131 | "- \"The test measures **whether the mean (expected) value differs significantly across samples**.\"\n",
132 | "\n",
133 | "- A **large p-value** (e.g. larger than **0.05 or 0.1**) means that we cannot reject the null hypothesis of equal averages (i.e. the averages are not distinguishable).\n",
134 | "\n",
135 | "- \"If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages.\""
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "t_stat, p_value = stats.ttest_ind(data1, data3, equal_var=False)  # Welch's t-test (does not assume equal variances)"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "t_stat"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": null,
159 | "metadata": {},
160 | "outputs": [],
161 | "source": [
162 | "p_value"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {},
168 | "source": [
169 | "---\n",
170 | "#### Real-world example\n",
171 | "\n",
172 | "Research task: Investigate the height, weight, gender and age of a population of people (e.g. Germans)\n",
173 | "- We can't investigate the full population, so we must sample a random subset of people"
174 | ]
175 | },
176 | {
177 | "cell_type": "code",
178 | "execution_count": null,
179 | "metadata": {},
180 | "outputs": [],
181 | "source": [
182 | "len(np.random.random_sample((50,)))"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": null,
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "import pandas as pd\n",
192 | "\n",
193 | "np.random.seed(10)\n",
194 | "\n",
195 | "height = np.random.uniform(0.5, 2.0, size=50)\n",
196 | "weight = np.random.uniform(15.0, 90.0, size=50)\n",
197 | "gender = np.random.choice(('Male', 'Female'), size=50, p=[0.4, 0.6]) ## With a 40:60 ratio of male:female\n",
198 | "age = np.random.uniform(3.0, 80.0, size=50)"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "metadata": {},
205 | "outputs": [],
206 | "source": [
207 | "height"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": null,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "np.array([height, weight, gender, age])"
217 | ]
218 | },
219 | {
220 | "cell_type": "markdown",
221 | "metadata": {},
222 | "source": [
223 | "Create a DataFrame that holds our variables:\n",
224 | "- 1 categorical variable: gender\n",
225 | "- 3 numeric variables: height, weight and age"
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {},
232 | "outputs": [],
233 | "source": [
234 | "df = pd.DataFrame(list(zip(height, weight, gender, age)), columns=['height (m)', 'weight (kg)', 'gender', 'age'])"
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": null,
240 | "metadata": {},
241 | "outputs": [],
242 | "source": [
243 | "df"
244 | ]
245 | },
246 | {
247 | "cell_type": "code",
248 | "execution_count": null,
249 | "metadata": {},
250 | "outputs": [],
251 | "source": [
252 | "df[df[\"gender\"] == \"Male\"].count()"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": [
261 | "df[df[\"gender\"] == \"Female\"].count()"
262 | ]
263 | },
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "Now let's look at our random distribution and think about correlation between the data."
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {},
275 | "outputs": [],
276 | "source": [
277 | "hist = sns.FacetGrid(df, col=\"gender\")\n",
278 | "hist.map(sns.distplot, \"height (m)\", bins=20)\n",
279 | "hist.add_legend()"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": null,
285 | "metadata": {},
286 | "outputs": [],
287 | "source": [
288 | "hist = sns.FacetGrid(df, col=\"gender\")\n",
289 | "hist.map(sns.distplot, \"weight (kg)\", bins=20)\n",
290 | "hist.add_legend()"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": null,
296 | "metadata": {},
297 | "outputs": [],
298 | "source": [
299 | "hist = sns.FacetGrid(df, col=\"gender\")\n",
300 | "hist.map(sns.distplot, \"age\", bins=20)\n",
301 | "hist.add_legend()"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "Compute the mean and median for each gender\n",
309 | "- using the pandas `groupby` method"
310 | ]
311 | },
312 | {
313 | "cell_type": "code",
314 | "execution_count": null,
315 | "metadata": {},
316 | "outputs": [],
317 | "source": [
318 | "df.groupby(['gender']).mean()"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": null,
324 | "metadata": {},
325 | "outputs": [],
326 | "source": [
327 | "df.groupby(['gender']).median()"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": null,
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "sns.lmplot(x='weight (kg)', y='age', data=df, hue='gender', fit_reg=True, legend=True)\n",
337 | "plt.show()"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {},
344 | "outputs": [],
345 | "source": [
346 | "sns.lmplot(x='height (m)', y='age', data=df, hue='gender', fit_reg=True, legend=True)\n",
347 | "plt.show()"
348 | ]
349 | },
350 | {
351 | "cell_type": "code",
352 | "execution_count": null,
353 | "metadata": {},
354 | "outputs": [],
355 | "source": [
356 | "sns.lmplot(x='height (m)', y='weight (kg)', data=df, hue='gender', fit_reg=True, legend=True)\n",
357 | "plt.show()"
358 | ]
359 | },
360 | {
361 | "cell_type": "code",
362 | "execution_count": null,
363 | "metadata": {},
364 | "outputs": [],
365 | "source": [
366 | "sns.jointplot(x='height (m)', y='age', data=df, kind='reg', color='b')\n",
367 | "plt.show()"
368 | ]
369 | },
370 | {
371 | "cell_type": "markdown",
372 | "metadata": {},
373 | "source": [
374 | "#### Research Question\n",
375 | "Is there a difference between the number of men and women in the population?\n",
376 | "- Hypothesis Zero (aka Null Hypothesis): there is no difference\n",
377 | "- Hypothesis One: there is a difference\n",
378 | "\n",
379 | "In other words - is the imbalance in our sample just an artifact of random sampling, or is it likely to reflect a real difference in the population?"
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": null,
385 | "metadata": {},
386 | "outputs": [],
387 | "source": [
388 | "sns.countplot(x=\"gender\", data=df)"
389 | ]
390 | },
391 | {
392 | "cell_type": "markdown",
393 | "metadata": {},
394 | "source": [
395 | "We must first choose a significance threshold (alpha). The p-value is the probability of observing a result at least as extreme as ours if the Null Hypothesis were true (it is not the probability that the Null Hypothesis is wrong). If the p-value falls below alpha, we reject the Null Hypothesis.\n",
396 | "\n",
397 | "A common convention is 5% (alpha = 0.05).\n",
398 | "\n",
399 | "One-Sample Proportion Test"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": null,
405 | "metadata": {},
406 | "outputs": [],
407 | "source": [
408 | "from scipy import stats\n",
409 | "\n",
410 | "test = np.array([60, 40])\n",
411 | "stats.zscore(test)  # standardizes each value as (x - mean) / std; this alone is not a proportion test\n"
412 | ]
413 | },
414 | {
415 | "cell_type": "code",
416 | "execution_count": null,
417 | "metadata": {},
418 | "outputs": [],
419 | "source": [
420 | "df.groupby('gender')['age'].transform(stats.zscore)"
421 | ]
422 | },
423 | {
424 | "cell_type": "code",
425 | "execution_count": null,
426 | "metadata": {},
427 | "outputs": [],
428 | "source": [
429 | "x = (df.groupby('gender')['age'])"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {},
436 | "outputs": [],
437 | "source": [
438 | "for i in x:\n",
439 | " print(i)"
440 | ]
441 | }
442 | ],
443 | "metadata": {
444 | "kernelspec": {
445 | "display_name": "Python 3",
446 | "language": "python",
447 | "name": "python3"
448 | },
449 | "language_info": {
450 | "codemirror_mode": {
451 | "name": "ipython",
452 | "version": 3
453 | },
454 | "file_extension": ".py",
455 | "mimetype": "text/x-python",
456 | "name": "python",
457 | "nbconvert_exporter": "python",
458 | "pygments_lexer": "ipython3",
459 | "version": "3.8.1"
460 | }
461 | },
462 | "nbformat": 4,
463 | "nbformat_minor": 2
464 | }
465 |
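The "One-Sample Proportion Test" heading above is not yet implemented in the notebook. The following is a minimal sketch under our own assumptions. It uses the normal-approximation z-test for a single proportion, computed with the standard library's `statistics.NormalDist` (Python 3.8+). This is a swapped-in technique: an exact alternative is the binomial test (`scipy.stats.binomtest` in recent SciPy versions). The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_ztest(count, n, p0=0.5):
    """Two-sided one-sample z-test for a proportion (normal approximation).

    H0: the true proportion equals p0.
    Returns the z statistic and the two-sided p-value.
    """
    p_hat = count / n                              # observed proportion
    standard_error = sqrt(p0 * (1 - p0) / n)       # standard error assuming H0 is true
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # probability of |Z| >= |z| under H0
    return z, p_value


# Hypothetical counts: 30 women out of 50 people (the 60% used when generating the data)
z, p_value = one_sample_proportion_ztest(30, 50, p0=0.5)
print(f"z = {z:.3f}, p = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the proportion of women differs from 50%")
else:
    print("Cannot reject H0 at this sample size")
```

With these counts, z is about 1.41 and p is about 0.16. So even a true 60:40 split is not detected with only 50 people at alpha = 0.05. In the notebook, the counts could come from `(df["gender"] == "Female").sum()` and `len(df)`.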
--------------------------------------------------------------------------------
/tips.csv:
--------------------------------------------------------------------------------
1 | "total_bill","tip","sex","smoker","day","time","size"
2 | 16.99,1.01,"Female","No","Sun","Dinner",2
3 | 10.34,1.66,"Male","No","Sun","Dinner",3
4 | 21.01,3.5,"Male","No","Sun","Dinner",3
5 | 23.68,3.31,"Male","No","Sun","Dinner",2
6 | 24.59,3.61,"Female","No","Sun","Dinner",4
7 | 25.29,4.71,"Male","No","Sun","Dinner",4
8 | 8.77,2,"Male","No","Sun","Dinner",2
9 | 26.88,3.12,"Male","No","Sun","Dinner",4
10 | 15.04,1.96,"Male","No","Sun","Dinner",2
11 | 14.78,3.23,"Male","No","Sun","Dinner",2
12 | 10.27,1.71,"Male","No","Sun","Dinner",2
13 | 35.26,5,"Female","No","Sun","Dinner",4
14 | 15.42,1.57,"Male","No","Sun","Dinner",2
15 | 18.43,3,"Male","No","Sun","Dinner",4
16 | 14.83,3.02,"Female","No","Sun","Dinner",2
17 | 21.58,3.92,"Male","No","Sun","Dinner",2
18 | 10.33,1.67,"Female","No","Sun","Dinner",3
19 | 16.29,3.71,"Male","No","Sun","Dinner",3
20 | 16.97,3.5,"Female","No","Sun","Dinner",3
21 | 20.65,3.35,"Male","No","Sat","Dinner",3
22 | 17.92,4.08,"Male","No","Sat","Dinner",2
23 | 20.29,2.75,"Female","No","Sat","Dinner",2
24 | 15.77,2.23,"Female","No","Sat","Dinner",2
25 | 39.42,7.58,"Male","No","Sat","Dinner",4
26 | 19.82,3.18,"Male","No","Sat","Dinner",2
27 | 17.81,2.34,"Male","No","Sat","Dinner",4
28 | 13.37,2,"Male","No","Sat","Dinner",2
29 | 12.69,2,"Male","No","Sat","Dinner",2
30 | 21.7,4.3,"Male","No","Sat","Dinner",2
31 | 19.65,3,"Female","No","Sat","Dinner",2
32 | 9.55,1.45,"Male","No","Sat","Dinner",2
33 | 18.35,2.5,"Male","No","Sat","Dinner",4
34 | 15.06,3,"Female","No","Sat","Dinner",2
35 | 20.69,2.45,"Female","No","Sat","Dinner",4
36 | 17.78,3.27,"Male","No","Sat","Dinner",2
37 | 24.06,3.6,"Male","No","Sat","Dinner",3
38 | 16.31,2,"Male","No","Sat","Dinner",3
39 | 16.93,3.07,"Female","No","Sat","Dinner",3
40 | 18.69,2.31,"Male","No","Sat","Dinner",3
41 | 31.27,5,"Male","No","Sat","Dinner",3
42 | 16.04,2.24,"Male","No","Sat","Dinner",3
43 | 17.46,2.54,"Male","No","Sun","Dinner",2
44 | 13.94,3.06,"Male","No","Sun","Dinner",2
45 | 9.68,1.32,"Male","No","Sun","Dinner",2
46 | 30.4,5.6,"Male","No","Sun","Dinner",4
47 | 18.29,3,"Male","No","Sun","Dinner",2
48 | 22.23,5,"Male","No","Sun","Dinner",2
49 | 32.4,6,"Male","No","Sun","Dinner",4
50 | 28.55,2.05,"Male","No","Sun","Dinner",3
51 | 18.04,3,"Male","No","Sun","Dinner",2
52 | 12.54,2.5,"Male","No","Sun","Dinner",2
53 | 10.29,2.6,"Female","No","Sun","Dinner",2
54 | 34.81,5.2,"Female","No","Sun","Dinner",4
55 | 9.94,1.56,"Male","No","Sun","Dinner",2
56 | 25.56,4.34,"Male","No","Sun","Dinner",4
57 | 19.49,3.51,"Male","No","Sun","Dinner",2
58 | 38.01,3,"Male","Yes","Sat","Dinner",4
59 | 26.41,1.5,"Female","No","Sat","Dinner",2
60 | 11.24,1.76,"Male","Yes","Sat","Dinner",2
61 | 48.27,6.73,"Male","No","Sat","Dinner",4
62 | 20.29,3.21,"Male","Yes","Sat","Dinner",2
63 | 13.81,2,"Male","Yes","Sat","Dinner",2
64 | 11.02,1.98,"Male","Yes","Sat","Dinner",2
65 | 18.29,3.76,"Male","Yes","Sat","Dinner",4
66 | 17.59,2.64,"Male","No","Sat","Dinner",3
67 | 20.08,3.15,"Male","No","Sat","Dinner",3
68 | 16.45,2.47,"Female","No","Sat","Dinner",2
69 | 3.07,1,"Female","Yes","Sat","Dinner",1
70 | 20.23,2.01,"Male","No","Sat","Dinner",2
71 | 15.01,2.09,"Male","Yes","Sat","Dinner",2
72 | 12.02,1.97,"Male","No","Sat","Dinner",2
73 | 17.07,3,"Female","No","Sat","Dinner",3
74 | 26.86,3.14,"Female","Yes","Sat","Dinner",2
75 | 25.28,5,"Female","Yes","Sat","Dinner",2
76 | 14.73,2.2,"Female","No","Sat","Dinner",2
77 | 10.51,1.25,"Male","No","Sat","Dinner",2
78 | 17.92,3.08,"Male","Yes","Sat","Dinner",2
79 | 27.2,4,"Male","No","Thur","Lunch",4
80 | 22.76,3,"Male","No","Thur","Lunch",2
81 | 17.29,2.71,"Male","No","Thur","Lunch",2
82 | 19.44,3,"Male","Yes","Thur","Lunch",2
83 | 16.66,3.4,"Male","No","Thur","Lunch",2
84 | 10.07,1.83,"Female","No","Thur","Lunch",1
85 | 32.68,5,"Male","Yes","Thur","Lunch",2
86 | 15.98,2.03,"Male","No","Thur","Lunch",2
87 | 34.83,5.17,"Female","No","Thur","Lunch",4
88 | 13.03,2,"Male","No","Thur","Lunch",2
89 | 18.28,4,"Male","No","Thur","Lunch",2
90 | 24.71,5.85,"Male","No","Thur","Lunch",2
91 | 21.16,3,"Male","No","Thur","Lunch",2
92 | 28.97,3,"Male","Yes","Fri","Dinner",2
93 | 22.49,3.5,"Male","No","Fri","Dinner",2
94 | 5.75,1,"Female","Yes","Fri","Dinner",2
95 | 16.32,4.3,"Female","Yes","Fri","Dinner",2
96 | 22.75,3.25,"Female","No","Fri","Dinner",2
97 | 40.17,4.73,"Male","Yes","Fri","Dinner",4
98 | 27.28,4,"Male","Yes","Fri","Dinner",2
99 | 12.03,1.5,"Male","Yes","Fri","Dinner",2
100 | 21.01,3,"Male","Yes","Fri","Dinner",2
101 | 12.46,1.5,"Male","No","Fri","Dinner",2
102 | 11.35,2.5,"Female","Yes","Fri","Dinner",2
103 | 15.38,3,"Female","Yes","Fri","Dinner",2
104 | 44.3,2.5,"Female","Yes","Sat","Dinner",3
105 | 22.42,3.48,"Female","Yes","Sat","Dinner",2
106 | 20.92,4.08,"Female","No","Sat","Dinner",2
107 | 15.36,1.64,"Male","Yes","Sat","Dinner",2
108 | 20.49,4.06,"Male","Yes","Sat","Dinner",2
109 | 25.21,4.29,"Male","Yes","Sat","Dinner",2
110 | 18.24,3.76,"Male","No","Sat","Dinner",2
111 | 14.31,4,"Female","Yes","Sat","Dinner",2
112 | 14,3,"Male","No","Sat","Dinner",2
113 | 7.25,1,"Female","No","Sat","Dinner",1
114 | 38.07,4,"Male","No","Sun","Dinner",3
115 | 23.95,2.55,"Male","No","Sun","Dinner",2
116 | 25.71,4,"Female","No","Sun","Dinner",3
117 | 17.31,3.5,"Female","No","Sun","Dinner",2
118 | 29.93,5.07,"Male","No","Sun","Dinner",4
119 | 10.65,1.5,"Female","No","Thur","Lunch",2
120 | 12.43,1.8,"Female","No","Thur","Lunch",2
121 | 24.08,2.92,"Female","No","Thur","Lunch",4
122 | 11.69,2.31,"Male","No","Thur","Lunch",2
123 | 13.42,1.68,"Female","No","Thur","Lunch",2
124 | 14.26,2.5,"Male","No","Thur","Lunch",2
125 | 15.95,2,"Male","No","Thur","Lunch",2
126 | 12.48,2.52,"Female","No","Thur","Lunch",2
127 | 29.8,4.2,"Female","No","Thur","Lunch",6
128 | 8.52,1.48,"Male","No","Thur","Lunch",2
129 | 14.52,2,"Female","No","Thur","Lunch",2
130 | 11.38,2,"Female","No","Thur","Lunch",2
131 | 22.82,2.18,"Male","No","Thur","Lunch",3
132 | 19.08,1.5,"Male","No","Thur","Lunch",2
133 | 20.27,2.83,"Female","No","Thur","Lunch",2
134 | 11.17,1.5,"Female","No","Thur","Lunch",2
135 | 12.26,2,"Female","No","Thur","Lunch",2
136 | 18.26,3.25,"Female","No","Thur","Lunch",2
137 | 8.51,1.25,"Female","No","Thur","Lunch",2
138 | 10.33,2,"Female","No","Thur","Lunch",2
139 | 14.15,2,"Female","No","Thur","Lunch",2
140 | 16,2,"Male","Yes","Thur","Lunch",2
141 | 13.16,2.75,"Female","No","Thur","Lunch",2
142 | 17.47,3.5,"Female","No","Thur","Lunch",2
143 | 34.3,6.7,"Male","No","Thur","Lunch",6
144 | 41.19,5,"Male","No","Thur","Lunch",5
145 | 27.05,5,"Female","No","Thur","Lunch",6
146 | 16.43,2.3,"Female","No","Thur","Lunch",2
147 | 8.35,1.5,"Female","No","Thur","Lunch",2
148 | 18.64,1.36,"Female","No","Thur","Lunch",3
149 | 11.87,1.63,"Female","No","Thur","Lunch",2
150 | 9.78,1.73,"Male","No","Thur","Lunch",2
151 | 7.51,2,"Male","No","Thur","Lunch",2
152 | 14.07,2.5,"Male","No","Sun","Dinner",2
153 | 13.13,2,"Male","No","Sun","Dinner",2
154 | 17.26,2.74,"Male","No","Sun","Dinner",3
155 | 24.55,2,"Male","No","Sun","Dinner",4
156 | 19.77,2,"Male","No","Sun","Dinner",4
157 | 29.85,5.14,"Female","No","Sun","Dinner",5
158 | 48.17,5,"Male","No","Sun","Dinner",6
159 | 25,3.75,"Female","No","Sun","Dinner",4
160 | 13.39,2.61,"Female","No","Sun","Dinner",2
161 | 16.49,2,"Male","No","Sun","Dinner",4
162 | 21.5,3.5,"Male","No","Sun","Dinner",4
163 | 12.66,2.5,"Male","No","Sun","Dinner",2
164 | 16.21,2,"Female","No","Sun","Dinner",3
165 | 13.81,2,"Male","No","Sun","Dinner",2
166 | 17.51,3,"Female","Yes","Sun","Dinner",2
167 | 24.52,3.48,"Male","No","Sun","Dinner",3
168 | 20.76,2.24,"Male","No","Sun","Dinner",2
169 | 31.71,4.5,"Male","No","Sun","Dinner",4
170 | 10.59,1.61,"Female","Yes","Sat","Dinner",2
171 | 10.63,2,"Female","Yes","Sat","Dinner",2
172 | 50.81,10,"Male","Yes","Sat","Dinner",3
173 | 15.81,3.16,"Male","Yes","Sat","Dinner",2
174 | 7.25,5.15,"Male","Yes","Sun","Dinner",2
175 | 31.85,3.18,"Male","Yes","Sun","Dinner",2
176 | 16.82,4,"Male","Yes","Sun","Dinner",2
177 | 32.9,3.11,"Male","Yes","Sun","Dinner",2
178 | 17.89,2,"Male","Yes","Sun","Dinner",2
179 | 14.48,2,"Male","Yes","Sun","Dinner",2
180 | 9.6,4,"Female","Yes","Sun","Dinner",2
181 | 34.63,3.55,"Male","Yes","Sun","Dinner",2
182 | 34.65,3.68,"Male","Yes","Sun","Dinner",4
183 | 23.33,5.65,"Male","Yes","Sun","Dinner",2
184 | 45.35,3.5,"Male","Yes","Sun","Dinner",3
185 | 23.17,6.5,"Male","Yes","Sun","Dinner",4
186 | 40.55,3,"Male","Yes","Sun","Dinner",2
187 | 20.69,5,"Male","No","Sun","Dinner",5
188 | 20.9,3.5,"Female","Yes","Sun","Dinner",3
189 | 30.46,2,"Male","Yes","Sun","Dinner",5
190 | 18.15,3.5,"Female","Yes","Sun","Dinner",3
191 | 23.1,4,"Male","Yes","Sun","Dinner",3
192 | 15.69,1.5,"Male","Yes","Sun","Dinner",2
193 | 19.81,4.19,"Female","Yes","Thur","Lunch",2
194 | 28.44,2.56,"Male","Yes","Thur","Lunch",2
195 | 15.48,2.02,"Male","Yes","Thur","Lunch",2
196 | 16.58,4,"Male","Yes","Thur","Lunch",2
197 | 7.56,1.44,"Male","No","Thur","Lunch",2
198 | 10.34,2,"Male","Yes","Thur","Lunch",2
199 | 43.11,5,"Female","Yes","Thur","Lunch",4
200 | 13,2,"Female","Yes","Thur","Lunch",2
201 | 13.51,2,"Male","Yes","Thur","Lunch",2
202 | 18.71,4,"Male","Yes","Thur","Lunch",3
203 | 12.74,2.01,"Female","Yes","Thur","Lunch",2
204 | 13,2,"Female","Yes","Thur","Lunch",2
205 | 16.4,2.5,"Female","Yes","Thur","Lunch",2
206 | 20.53,4,"Male","Yes","Thur","Lunch",4
207 | 16.47,3.23,"Female","Yes","Thur","Lunch",3
208 | 26.59,3.41,"Male","Yes","Sat","Dinner",3
209 | 38.73,3,"Male","Yes","Sat","Dinner",4
210 | 24.27,2.03,"Male","Yes","Sat","Dinner",2
211 | 12.76,2.23,"Female","Yes","Sat","Dinner",2
212 | 30.06,2,"Male","Yes","Sat","Dinner",3
213 | 25.89,5.16,"Male","Yes","Sat","Dinner",4
214 | 48.33,9,"Male","No","Sat","Dinner",4
215 | 13.27,2.5,"Female","Yes","Sat","Dinner",2
216 | 28.17,6.5,"Female","Yes","Sat","Dinner",3
217 | 12.9,1.1,"Female","Yes","Sat","Dinner",2
218 | 28.15,3,"Male","Yes","Sat","Dinner",5
219 | 11.59,1.5,"Male","Yes","Sat","Dinner",2
220 | 7.74,1.44,"Male","Yes","Sat","Dinner",2
221 | 30.14,3.09,"Female","Yes","Sat","Dinner",4
222 | 12.16,2.2,"Male","Yes","Fri","Lunch",2
223 | 13.42,3.48,"Female","Yes","Fri","Lunch",2
224 | 8.58,1.92,"Male","Yes","Fri","Lunch",1
225 | 15.98,3,"Female","No","Fri","Lunch",3
226 | 13.42,1.58,"Male","Yes","Fri","Lunch",2
227 | 16.27,2.5,"Female","Yes","Fri","Lunch",2
228 | 10.09,2,"Female","Yes","Fri","Lunch",2
229 | 20.45,3,"Male","No","Sat","Dinner",4
230 | 13.28,2.72,"Male","No","Sat","Dinner",2
231 | 22.12,2.88,"Female","Yes","Sat","Dinner",2
232 | 24.01,2,"Male","Yes","Sat","Dinner",4
233 | 15.69,3,"Male","Yes","Sat","Dinner",3
234 | 11.61,3.39,"Male","No","Sat","Dinner",2
235 | 10.77,1.47,"Male","No","Sat","Dinner",2
236 | 15.53,3,"Male","Yes","Sat","Dinner",2
237 | 10.07,1.25,"Male","No","Sat","Dinner",2
238 | 12.6,1,"Male","Yes","Sat","Dinner",2
239 | 32.83,1.17,"Male","Yes","Sat","Dinner",2
240 | 35.83,4.67,"Female","No","Sat","Dinner",3
241 | 29.03,5.92,"Male","No","Sat","Dinner",3
242 | 27.18,2,"Female","Yes","Sat","Dinner",2
243 | 22.67,2,"Male","Yes","Sat","Dinner",2
244 | 17.82,1.75,"Male","No","Sat","Dinner",2
245 | 18.78,3,"Female","No","Thur","Dinner",2
246 |
--------------------------------------------------------------------------------