├── .gitignore
├── LICENSE
├── README.md
├── book
│   ├── 00_python_crash_course.ipynb
│   ├── 00_python_crash_course_datatypes.ipynb
│   ├── 00_python_crash_course_functions.ipynb
│   ├── 00_python_crash_course_oop.ipynb
│   ├── 00_python_crash_course_variables.ipynb
│   ├── 01_pandas_dataframe.ipynb
│   ├── 02_loading_data.ipynb
│   ├── 03_cleaning_data.ipynb
│   ├── 04_data_visualization.ipynb
│   ├── 05_data_exploration.ipynb
│   ├── AP_nyc_data_definitions.md
│   ├── AP_seaborn_palette.ipynb
│   ├── _config.yml
│   ├── _toc.yml
│   ├── data
│   │   ├── building_class.psv
│   │   ├── movies_data.csv
│   │   ├── nyc_real_estate.csv
│   │   └── nyc_real_estate_clean.csv
│   ├── intro.md
│   ├── logo.png
│   └── references.bib
├── requirements.txt
└── runtime.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
131 | book/_build/
132 | book/assets/
133 | *.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2022, Jupyter Academy
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | 3. Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Practical Python for Data Science
2 |
3 | Oh, hey there! Welcome to my open-source book - `Practical Python for Data Science`. This repo contains all of the code that was used to build this book.
4 |
5 | Check out the online book at [www.practicalpythonfordatascience.com](www.practicalpythonfordatascience.com).
6 |
7 |
8 |
9 |
10 | ### Introduction
11 |
12 | Python is the "Swiss Army knife" of programming. There are several factors that contribute to its versatility:
13 |
14 | - it has clean and human-readable syntax so it’s easy to learn
15 | - it’s an interpreted object-oriented scripting language
16 | - it has a strong open-source community and a large repository of Python packages
17 |
18 | Because of its versatility, Python can be applied to both software development (e.g., building web applications and APIs) and data science (e.g., scientific computing, creating end-to-end data science pipelines). However, writing Python for data science is very different from writing Python for software development. A huge part of the learning curve is getting familiar with the syntax of Python’s data science packages, including but not limited to Pandas, NumPy, and scikit-learn.
19 |
20 | In this book, we will focus on how to use Python in the context of data science. We will work with a real-life dataset and explore it using the following data science Python packages:
21 |
22 | - [Pandas](https://pandas.pydata.org/)
23 | - [Seaborn](https://seaborn.pydata.org/)
24 | - [Matplotlib](https://matplotlib.org/)
25 |
26 | # Prerequisites
27 |
28 | This book is designed to be accessible for people without a strong technical background. In order to make the most of this book, the suggested requirements are:
29 |
30 | - Basic knowledge of Python
31 | - Some familiarity with Jupyter Notebooks, Pandas, and Seaborn
32 | - Googling skills and ability to read documentation
33 |
34 | # Open a GitHub Issue
35 |
36 | Did you spot an error in this book? Have an idea on how to make the book better? I'm always open to feedback and new ideas. You can contribute by opening a [GitHub issue](https://github.com/jupyteracademy/practical-python-for-data-science/issues) or creating a pull request with the proposed fix.
37 |
38 | # Support This Project
39 |
40 | If you would like to support this open-source project and its continued development and maintenance, you can do so in a few ways:
41 |
42 | - [buy me a coffee](https://www.buymeacoffee.com/jupyteracademy) ☕
43 | - sign up for my upcoming online courses at [Jupyter Academy](https://jupyteracademy.com/) 💕
44 |
45 |
46 |
47 |
--------------------------------------------------------------------------------
/book/00_python_crash_course.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "90cc6d0b",
6 | "metadata": {},
7 | "source": [
8 | "# Python Crash Course\n",
9 | "\n",
10 | "You don't need to have world-class Python skills to use Python for data science. While it's helpful to have coding experience, you can get by with knowing the main datatypes of Python and how object-oriented programming works. In this Python crash course, we will cover:\n",
11 | "\n",
12 | "1. [Object-Oriented Programming](00_python_crash_course_oop)\n",
13 | "2. [Python Datatypes](00_python_crash_course_datatypes)\n",
14 | "3. [Variables](00_python_crash_course_variables)\n",
15 | "4. [Functions](00_python_crash_course_functions)\n",
16 | "\n",
17 | ""
18 | ]
19 | }
20 | ],
21 | "metadata": {
22 | "jupytext": {
23 | "cell_metadata_filter": "-all",
24 | "main_language": "python",
25 | "notebook_metadata_filter": "-all"
26 | },
27 | "kernelspec": {
28 | "display_name": "Python 3 (ipykernel)",
29 | "language": "python",
30 | "name": "python3"
31 | },
32 | "language_info": {
33 | "codemirror_mode": {
34 | "name": "ipython",
35 | "version": 3
36 | },
37 | "file_extension": ".py",
38 | "mimetype": "text/x-python",
39 | "name": "python",
40 | "nbconvert_exporter": "python",
41 | "pygments_lexer": "ipython3",
42 | "version": "3.9.12"
43 | }
44 | },
45 | "nbformat": 4,
46 | "nbformat_minor": 5
47 | }
48 |
--------------------------------------------------------------------------------
/book/00_python_crash_course_datatypes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "9b833a43",
6 | "metadata": {},
7 | "source": [
8 | "# Python Datatypes \n",
9 | "\n",
10 | "All objects in Python have a datatype. If you want to know the datatype of an object, you can simply use the `type()` function. The main datatypes of Python are:\n",
11 | "\n",
12 | "1. [Integer](#integer)\n",
13 | "2. [Float](#float)\n",
14 | "3. [String](#string)\n",
15 | "4. [Boolean](#boolean)\n",
16 | "5. [List](#list) \n",
17 | "6. [Dictionary](#dictionary)\n",
18 | "\n",
19 | "Let's take a look at each one. "
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "8307b235",
25 | "metadata": {},
26 | "source": [
27 | "## 1) Integer \n",
28 | "\n",
29 | "The **integer** is a numerical datatype. It's a whole number, which means that it has no fractional (decimal) part. Integers can be positive, negative, or zero.\n",
30 | "\n",
31 | "**Examples of integers:**\n",
32 | "\n",
33 | "- population\n",
34 | "- number of cities\n",
35 | "- year "
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 1,
41 | "id": "4974e663",
42 | "metadata": {},
43 | "outputs": [
44 | {
45 | "data": {
46 | "text/plain": [
47 | "int"
48 | ]
49 | },
50 | "execution_count": 1,
51 | "metadata": {},
52 | "output_type": "execute_result"
53 | }
54 | ],
55 | "source": [
56 | "population = 1000\n",
57 | "type(population)"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "64c5f934",
63 | "metadata": {},
64 | "source": [
65 | "\"int\" is short for \"integer\"! 😎"
66 | ]
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "id": "6d77146d",
71 | "metadata": {},
72 | "source": [
73 | "## 2) Float \n",
74 | "\n",
75 | "The **float** (short for \"floating-point number\") is a number with a decimal point. This is useful when a value is not a whole number or when more precision is needed.\n",
76 | "\n",
77 | "**Examples of floats:**\n",
78 | "\n",
79 | "- cost of a latte\n",
80 | "- weight\n",
81 | "- distance in miles "
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 2,
87 | "id": "c8f0d2b2",
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "data": {
92 | "text/plain": [
93 | "float"
94 | ]
95 | },
96 | "execution_count": 2,
97 | "metadata": {},
98 | "output_type": "execute_result"
99 | }
100 | ],
101 | "source": [
102 | "cost_of_latte = 4.50\n",
103 | "type(cost_of_latte) "
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "id": "d07380e2",
109 | "metadata": {},
110 | "source": [
111 | "The result of dividing with `/` is also a float, even when the result is a whole number:"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": 3,
117 | "id": "f024fb8b",
118 | "metadata": {},
119 | "outputs": [
120 | {
121 | "data": {
122 | "text/plain": [
123 | "float"
124 | ]
125 | },
126 | "execution_count": 3,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "cost_per_egg = 12/12 \n",
133 | "type(cost_per_egg)"
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "id": "7fb4115f",
139 | "metadata": {},
140 | "source": [
141 | "### Mixing Floats and Integers\n",
142 | "\n",
143 | "If we want to convert a float to an integer (or vice versa), we can easily do so by wrapping the variable in `int()` or `float()`. Let's try this out:"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 4,
149 | "id": "ada52513",
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | "Cost of apple: float 3.55 --> int 3\n",
157 | "Number of apples: int 10 --> float 10.0\n"
158 | ]
159 | }
160 | ],
161 | "source": [
162 | "cost_per_apple = 3.55\n",
163 | "n_apples = 10\n",
164 | "\n",
165 | "print(f\"Cost of apple: float {cost_per_apple} --> int {int(cost_per_apple)}\")\n",
166 | "print(f\"Number of apples: int {n_apples} --> float {float(n_apples)}\")"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "id": "cadeb5f2",
172 | "metadata": {},
173 | "source": [
174 | "When you \"cast\" (convert) a float into an integer using `int()`, it trims the values after the decimal point and returns only the integer (whole number) part. In other words, `int()` truncates toward zero: positive numbers are rounded down (`int(3.55)` is `3`), while negative numbers are rounded up (`int(-3.55)` is `-3`)."
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "id": "728a7cc6",
180 | "metadata": {},
181 | "source": [
182 | "```{note}\n",
183 | "The print statements above use something called [f-strings](https://realpython.com/python-f-strings/). Why is it called an **f-string**? In the code above, the string inside each print statement is preceded by an \"f\", which puts it in \"f-string mode\". To embed an expression in your string, wrap it inside curly braces { }. F-strings are only available in Python 3.6 or greater. They let you embed Python expressions inside string literals in a readable way. Before Python 3.6, you had to use %-formatting or .format() to embed expressions inside strings, which was more verbose and error-prone.\n",
184 | "```"
185 | ]
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "id": "bfbf11e3",
190 | "metadata": {},
191 | "source": [
192 | "In Python, it's possible to mix integers and floats in an arithmetic operation, so you don't need to worry about converting these numeric types into a common format. Let's test it out with our variables `n_apples` (an integer) and `cost_per_apple` (a float)."
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": 5,
198 | "id": "803b0572",
199 | "metadata": {},
200 | "outputs": [
201 | {
202 | "data": {
203 | "text/plain": [
204 | "35.5"
205 | ]
206 | },
207 | "execution_count": 5,
208 | "metadata": {},
209 | "output_type": "execute_result"
210 | }
211 | ],
212 | "source": [
213 | "total_cost = n_apples*cost_per_apple\n",
214 | "total_cost"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "id": "dd2b266b",
220 | "metadata": {},
221 | "source": [
222 | "We can see that the output of `n_apples * cost_per_apple` is a float. This is because `n_apples`, which was originally an integer, gets converted to a float when it gets multiplied with `cost_per_apple`.\n",
223 | "\n",
224 | "Here are the most common arithmetic operations in Python:\n",
225 | "\n",
226 | "- **Addition:** gets the sum of the operands\n",
227 | "```\n",
228 | "x + y\n",
229 | "```\n",
230 | "- **Subtraction:** gets the difference of the operands\n",
231 | "```\n",
232 | "x - y\n",
233 | "```\n",
234 | "- **Multiplication:** gets the product of the operands\n",
235 | "```\n",
236 | "x * y\n",
237 | "```\n",
238 | "- **Division:** produces the quotient of the operands and returns a float\n",
239 | "```\n",
240 | "x / y\n",
241 | "```\n",
242 | "- **Floor division:** divides the operands and rounds the result down to the nearest whole number (the result is an integer when both operands are integers)\n",
243 | "```\n",
244 | "x // y\n",
245 | "```\n",
246 | "- **Exponent:** raises the first operand to the power of the second operand\n",
247 | "```\n",
248 | "x ** y\n",
249 | "```"
250 | ]
251 | },
252 | {
253 | "cell_type": "markdown",
254 | "id": "d3656a4b",
255 | "metadata": {},
256 | "source": [
257 | "## 3) String \n",
258 | "\n",
259 | "The **string** datatype is typically used to store text. We can think of a string as a \"sequence of characters\", which can be alphabetic, numeric, or special characters. A string is surrounded by quotation marks, which can be either double quotes `\" \"` or single quotes `' '`.\n",
260 | "\n",
261 | "**Examples of strings:**\n",
262 | "\n",
263 | "- name of city \n",
264 | "- address\n",
265 | "- Canadian postal code"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": 6,
271 | "id": "c68f4b94",
272 | "metadata": {},
273 | "outputs": [
274 | {
275 | "data": {
276 | "text/plain": [
277 | "str"
278 | ]
279 | },
280 | "execution_count": 6,
281 | "metadata": {},
282 | "output_type": "execute_result"
283 | }
284 | ],
285 | "source": [
286 | "name_of_city = 'Toronto'\n",
287 | "type(name_of_city) "
288 | ]
289 | },
290 | {
291 | "cell_type": "markdown",
292 | "id": "ae5ab0e9",
293 | "metadata": {},
294 | "source": [
295 | "If a string contains an apostrophe, we can use double quotes to define the string and use a single quote character in the string."
296 | ]
297 | },
298 | {
299 | "cell_type": "code",
300 | "execution_count": 7,
301 | "id": "6837e39d",
302 | "metadata": {},
303 | "outputs": [
304 | {
305 | "name": "stdout",
306 | "output_type": "stream",
307 | "text": [
308 | "It's snowing outside\n"
309 | ]
310 | }
311 | ],
312 | "source": [
313 | "text = \"It's snowing outside\"\n",
314 | "print(text)"
315 | ]
316 | },
317 | {
318 | "cell_type": "markdown",
319 | "id": "5ff53e65",
320 | "metadata": {},
321 | "source": [
322 | "It's important to note that *anything* surrounded by quotation marks is treated as a string. For example, if you wrap an integer in quotation marks, its datatype will be a string."
323 | ]
324 | },
325 | {
326 | "cell_type": "code",
327 | "execution_count": 8,
328 | "id": "d52359f9",
329 | "metadata": {},
330 | "outputs": [
331 | {
332 | "data": {
333 | "text/plain": [
334 | "str"
335 | ]
336 | },
337 | "execution_count": 8,
338 | "metadata": {},
339 | "output_type": "execute_result"
340 | }
341 | ],
342 | "source": [
343 | "number_of_planets = \"9\"\n",
344 | "type(number_of_planets)"
345 | ]
346 | },
347 | {
348 | "cell_type": "markdown",
349 | "id": "8a914f78",
350 | "metadata": {},
351 | "source": [
352 | "### Strings within Strings \n",
353 | "\n",
354 | "If we want to see if a shorter string is inside a longer string, we can use the `in` operator."
355 | ]
356 | },
357 | {
358 | "cell_type": "code",
359 | "execution_count": 9,
360 | "id": "88118e42",
361 | "metadata": {},
362 | "outputs": [
363 | {
364 | "data": {
365 | "text/plain": [
366 | "True"
367 | ]
368 | },
369 | "execution_count": 9,
370 | "metadata": {},
371 | "output_type": "execute_result"
372 | }
373 | ],
374 | "source": [
375 | "'el' in 'Hello'"
376 | ]
377 | },
378 | {
379 | "cell_type": "markdown",
380 | "id": "1a4606b5",
381 | "metadata": {},
382 | "source": [
383 | "### Built-in Functions\n",
384 | "\n",
385 | "Strings have built-in methods (called with dot notation, like `text.upper()`) and work with built-in functions (like `len()`) that are useful when you're analyzing data.\n",
386 | "\n",
387 | "- `text.upper()` - converts text to all uppercase \n",
388 | "- `text.lower()` - converts text to all lowercase\n",
389 | "- `text.capitalize()` - capitalizes text (first character is made uppercase, followed by all lowercase characters)\n",
390 | "- `len(text)` - measures the length of a string (i.e., character count)\n",
391 | "- `text.replace('t', 'a')` - replaces every occurrence of one substring with another"
392 | ]
393 | },
394 | {
395 | "cell_type": "markdown",
396 | "id": "13f502b5",
397 | "metadata": {},
398 | "source": [
399 | "## 4) Boolean \n",
400 | "\n",
401 | "A boolean is a binary datatype which can be either `True` or `False`. For those of you who are familiar with other programming languages, it's important to note that Python's boolean values must be capitalized - uppercase T for `True` and uppercase F for `False`. Booleans are often used to answer a yes/no question like \"is it nighttime?\" or \"is the patient female?\". \n",
402 | "\n",
403 | "**Examples of booleans:**\n",
404 | "\n",
405 | "- is it morning?\n",
406 | "- is the patient on meds? \n",
407 | "- does x equal y?"
408 | ]
409 | },
410 | {
411 | "cell_type": "code",
412 | "execution_count": 10,
413 | "id": "86dbc88d",
414 | "metadata": {},
415 | "outputs": [
416 | {
417 | "data": {
418 | "text/plain": [
419 | "bool"
420 | ]
421 | },
422 | "execution_count": 10,
423 | "metadata": {},
424 | "output_type": "execute_result"
425 | }
426 | ],
427 | "source": [
428 | "is_morning = False\n",
429 | "type(is_morning)"
430 | ]
431 | },
432 | {
433 | "cell_type": "markdown",
434 | "id": "88f970b6",
435 | "metadata": {},
436 | "source": [
437 | "\"bool\" is short for boolean! 😎"
438 | ]
439 | },
440 | {
441 | "cell_type": "markdown",
442 | "id": "6ad80a3f",
443 | "metadata": {},
444 | "source": [
445 | "### Comparing Values with Boolean Expressions\n",
446 | "\n",
447 | "A boolean expression evaluates a statement and results in a boolean value. For example, the operator `==` tests if two values are equal."
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": 11,
453 | "id": "f05820c6",
454 | "metadata": {},
455 | "outputs": [
456 | {
457 | "data": {
458 | "text/plain": [
459 | "False"
460 | ]
461 | },
462 | "execution_count": 11,
463 | "metadata": {},
464 | "output_type": "execute_result"
465 | }
466 | ],
467 | "source": [
468 | "is_vegan = False\n",
469 | "is_vegetarian = True \n",
470 | "\n",
471 | "is_vegan == is_vegetarian"
472 | ]
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "id": "4c9adf4e",
477 | "metadata": {},
478 | "source": [
479 | "You can also compare two numeric values using:\n",
480 | "\n",
481 | "- `>` (greater than)\n",
482 | "- `<` (less than)\n",
483 | "- `>=` (greater than or equal to)\n",
484 | "- `<=` (less than or equal to)"
485 | ]
486 | },
487 | {
488 | "cell_type": "code",
489 | "execution_count": 12,
490 | "id": "a8cb2e9c",
491 | "metadata": {},
492 | "outputs": [
493 | {
494 | "data": {
495 | "text/plain": [
496 | "True"
497 | ]
498 | },
499 | "execution_count": 12,
500 | "metadata": {},
501 | "output_type": "execute_result"
502 | }
503 | ],
504 | "source": [
505 | "n_donuts = 10\n",
506 | "n_muffins = 5\n",
507 | "\n",
508 | "n_donuts >= n_muffins"
509 | ]
510 | },
511 | {
512 | "cell_type": "markdown",
513 | "id": "53bd7d9a",
514 | "metadata": {},
515 | "source": [
516 | "### Comparing Strings with Boolean Expressions\n",
517 | "\n",
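518 | "Interestingly, you can also compare two strings. Strings are compared character by character in alphabetical order (strictly speaking, by Unicode code point, so all uppercase letters sort before lowercase ones). The \"larger\" string is the one that comes later in the alphabet."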
519 | ]
520 | },
521 | {
522 | "cell_type": "code",
523 | "execution_count": 13,
524 | "id": "e18321a6",
525 | "metadata": {},
526 | "outputs": [
527 | {
528 | "data": {
529 | "text/plain": [
530 | "False"
531 | ]
532 | },
533 | "execution_count": 13,
534 | "metadata": {},
535 | "output_type": "execute_result"
536 | }
537 | ],
538 | "source": [
539 | "server = 'Anne'\n",
540 | "host = 'Jim'\n",
541 | "\n",
542 | "server > host"
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "id": "a80a0b65",
548 | "metadata": {},
549 | "source": [
550 | "## 5) List\n",
551 | "\n",
552 | "[Lists](https://www.w3schools.com/python/python_ref_list.asp) represent a collection of objects and are constructed with square brackets, separating items with commas. A list can contain a collection of one datatype:\n",
553 | "\n",
554 | "```\n",
555 | "list_of_integers = [1,2,3,4,5]\n",
556 | "```\n",
557 | "\n",
558 | "It can also contain a collection of mixed datatypes:\n",
559 | "\n",
560 | "```\n",
561 | "list_of_mixed_datatypes = ['cat', 10, 'belarus', True]\n",
562 | "```"
563 | ]
564 | },
565 | {
566 | "cell_type": "markdown",
567 | "id": "cfd4d609",
568 | "metadata": {},
569 | "source": [
570 | "Let's start with a simple list that captures the number of hours slept by a group of friends:"
571 | ]
572 | },
573 | {
574 | "cell_type": "code",
575 | "execution_count": 14,
576 | "id": "9177fd50",
577 | "metadata": {},
578 | "outputs": [],
579 | "source": [
580 | "hours_slept = [10,12,5,8]"
581 | ]
582 | },
583 | {
584 | "cell_type": "markdown",
585 | "id": "538cabb9",
586 | "metadata": {},
587 | "source": [
588 | "To get the length (count) of a list, you can use `len()`."
589 | ]
590 | },
591 | {
592 | "cell_type": "code",
593 | "execution_count": 15,
594 | "id": "8649a5fc",
595 | "metadata": {},
596 | "outputs": [
597 | {
598 | "data": {
599 | "text/plain": [
600 | "4"
601 | ]
602 | },
603 | "execution_count": 15,
604 | "metadata": {},
605 | "output_type": "execute_result"
606 | }
607 | ],
608 | "source": [
609 | "len(hours_slept)"
610 | ]
611 | },
612 | {
613 | "cell_type": "markdown",
614 | "id": "10365d15",
615 | "metadata": {},
616 | "source": [
617 | "To get the sum of numbers in a list, you can use `sum()`. This will only work if all elements in the list are numeric. "
618 | ]
619 | },
620 | {
621 | "cell_type": "code",
622 | "execution_count": 16,
623 | "id": "07f5b1e9",
624 | "metadata": {},
625 | "outputs": [
626 | {
627 | "data": {
628 | "text/plain": [
629 | "35"
630 | ]
631 | },
632 | "execution_count": 16,
633 | "metadata": {},
634 | "output_type": "execute_result"
635 | }
636 | ],
637 | "source": [
638 | "sum(hours_slept)"
639 | ]
640 | },
641 | {
642 | "cell_type": "markdown",
643 | "id": "7cdbe619",
644 | "metadata": {},
645 | "source": [
646 | "You can get the smallest and largest values of a list using `min()` and `max()`, respectively."
647 | ]
648 | },
649 | {
650 | "cell_type": "code",
651 | "execution_count": 17,
652 | "id": "4db43aa4",
653 | "metadata": {},
654 | "outputs": [
655 | {
656 | "data": {
657 | "text/plain": [
658 | "5"
659 | ]
660 | },
661 | "execution_count": 17,
662 | "metadata": {},
663 | "output_type": "execute_result"
664 | }
665 | ],
666 | "source": [
667 | "min(hours_slept)"
668 | ]
669 | },
670 | {
671 | "cell_type": "code",
672 | "execution_count": 18,
673 | "id": "798cecd1",
674 | "metadata": {},
675 | "outputs": [
676 | {
677 | "data": {
678 | "text/plain": [
679 | "12"
680 | ]
681 | },
682 | "execution_count": 18,
683 | "metadata": {},
684 | "output_type": "execute_result"
685 | }
686 | ],
687 | "source": [
688 | "max(hours_slept)"
689 | ]
690 | },
691 | {
692 | "cell_type": "markdown",
693 | "id": "d014e71c",
694 | "metadata": {},
695 | "source": [
696 | "### Sorting Lists\n",
697 | "\n",
698 | "You can also sort elements within a list using the `.sort()` method, which sorts the list in place from lowest to highest value."
699 | ]
700 | },
701 | {
702 | "cell_type": "code",
703 | "execution_count": 19,
704 | "id": "d264d401",
705 | "metadata": {},
706 | "outputs": [
707 | {
708 | "data": {
709 | "text/plain": [
710 | "[5, 8, 10, 12]"
711 | ]
712 | },
713 | "execution_count": 19,
714 | "metadata": {},
715 | "output_type": "execute_result"
716 | }
717 | ],
718 | "source": [
719 | "hours_slept.sort()\n",
720 | "hours_slept"
721 | ]
722 | },
723 | {
724 | "cell_type": "markdown",
725 | "id": "b6b14ef6",
726 | "metadata": {},
727 | "source": [
728 | "You can also reverse the order of the elements using `.reverse()`. Because the list is already sorted from lowest to highest, reversing it orders it from highest to lowest value."
729 | ]
730 | },
731 | {
732 | "cell_type": "code",
733 | "execution_count": 20,
734 | "id": "10b064e0",
735 | "metadata": {},
736 | "outputs": [
737 | {
738 | "data": {
739 | "text/plain": [
740 | "[12, 10, 8, 5]"
741 | ]
742 | },
743 | "execution_count": 20,
744 | "metadata": {},
745 | "output_type": "execute_result"
746 | }
747 | ],
748 | "source": [
749 | "hours_slept.reverse()\n",
750 | "hours_slept"
751 | ]
752 | },
753 | {
754 | "cell_type": "markdown",
755 | "id": "b04f2cea",
756 | "metadata": {},
757 | "source": [
758 | "### Lists are ordered\n",
759 | "\n",
760 | "Lists are ordered, which means that the order of elements within a list is part of the list's identity. You can have two lists with the exact same elements, but if the order of the elements is different, the lists are not the same. Let's demonstrate this with an example."
761 | ]
762 | },
763 | {
764 | "cell_type": "code",
765 | "execution_count": 21,
766 | "id": "4499f09f",
767 | "metadata": {},
768 | "outputs": [
769 | {
770 | "data": {
771 | "text/plain": [
772 | "False"
773 | ]
774 | },
775 | "execution_count": 21,
776 | "metadata": {},
777 | "output_type": "execute_result"
778 | }
779 | ],
780 | "source": [
781 | "list1 = [1,2,3,4]\n",
782 | "list2 = [4,3,2,1]\n",
783 | "\n",
784 | "list1 == list2"
785 | ]
786 | },
787 | {
788 | "cell_type": "markdown",
789 | "id": "c353c8e6",
790 | "metadata": {},
791 | "source": [
792 | "`list1` and `list2` are not equal to one another since the order of their elements is different."
793 | ]
794 | },
795 | {
796 | "cell_type": "markdown",
797 | "id": "bca028ad",
798 | "metadata": {},
799 | "source": [
800 | "### The Index: Accessing Elements within a List\n",
801 | "\n",
802 | "You can access elements in a list by referencing their index. The index of a list starts at 0, which is probably different from what you're used to if you come from an R or MATLAB background.\n",
803 | "\n",
804 | "\n",
805 | "\n",
806 | "Let's say we want to go grocery shopping. We made a list of all the items we want to buy:\n",
807 | "\n",
808 | "\n",
809 | "\n",
810 | "Each item in this list has a location (an index). \n",
811 | "\n",
812 | "\n",
813 | "\n",
814 | "A list can have negative indices too. A negative list index counts from the end of a list. \n",
815 | "\n",
816 | "\n",
817 | "We can get an individual item from a list using `shopping_list[index]`. Let's test this out!"
818 | ]
819 | },
820 | {
821 | "cell_type": "code",
822 | "execution_count": 22,
823 | "id": "228f132d",
824 | "metadata": {},
825 | "outputs": [
826 | {
827 | "name": "stdout",
828 | "output_type": "stream",
829 | "text": [
830 | "apples\n",
831 | "carrots\n",
832 | "chocolate\n",
833 | "bananas\n",
834 | "onions\n"
835 | ]
836 | }
837 | ],
838 | "source": [
839 | "shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']\n",
840 | "\n",
841 | "print(shopping_list[0])\n",
842 | "print(shopping_list[1])\n",
843 | "print(shopping_list[2])\n",
844 | "print(shopping_list[3])\n",
845 | "print(shopping_list[4])"
846 | ]
847 | },
848 | {
849 | "cell_type": "markdown",
850 | "id": "4f4457b3",
851 | "metadata": {},
852 | "source": [
853 | "Now, let's try calling each item by its negative index. "
854 | ]
855 | },
856 | {
857 | "cell_type": "code",
858 | "execution_count": 23,
859 | "id": "0a477bdd",
860 | "metadata": {},
861 | "outputs": [
862 | {
863 | "name": "stdout",
864 | "output_type": "stream",
865 | "text": [
866 | "apples\n",
867 | "carrots\n",
868 | "chocolate\n",
869 | "bananas\n",
870 | "onions\n"
871 | ]
872 | }
873 | ],
874 | "source": [
875 | "print(shopping_list[-5])\n",
876 | "print(shopping_list[-4])\n",
877 | "print(shopping_list[-3])\n",
878 | "print(shopping_list[-2])\n",
879 | "print(shopping_list[-1])"
880 | ]
881 | },
882 | {
883 | "cell_type": "markdown",
884 | "id": "9106803c",
885 | "metadata": {},
886 | "source": [
887 | "### Slicing a list\n",
888 | "\n",
889 | "You can get a subset of a list, or \"slice\" it, using list indices. The expression `shopping_list[m:n]` returns the portion of `shopping_list` from index `m` up to, but not including, index `n`. Let's see how this works."
890 | ]
891 | },
892 | {
893 | "cell_type": "code",
894 | "execution_count": 24,
895 | "id": "f6d380d1",
896 | "metadata": {},
897 | "outputs": [
898 | {
899 | "data": {
900 | "text/plain": [
901 | "['carrots', 'chocolate']"
902 | ]
903 | },
904 | "execution_count": 24,
905 | "metadata": {},
906 | "output_type": "execute_result"
907 | }
908 | ],
909 | "source": [
910 | "shopping_list[1:3]"
911 | ]
912 | },
913 | {
914 | "cell_type": "markdown",
915 | "id": "c2d657e4",
916 | "metadata": {},
917 | "source": [
918 | "The code above returns 'carrots' and 'chocolate', which are represented by indices 1 and 2. It didn't return index 3 (bananas) because the second number of the slice is non-inclusive. To include index 3, we would have to update the slice to `[1:4]`:"
919 | ]
920 | },
921 | {
922 | "cell_type": "code",
923 | "execution_count": 25,
924 | "id": "d7fdc8c1",
925 | "metadata": {},
926 | "outputs": [
927 | {
928 | "data": {
929 | "text/plain": [
930 | "['carrots', 'chocolate', 'bananas']"
931 | ]
932 | },
933 | "execution_count": 25,
934 | "metadata": {},
935 | "output_type": "execute_result"
936 | }
937 | ],
938 | "source": [
939 | "shopping_list[1:4]"
940 | ]
941 | },
942 | {
943 | "cell_type": "markdown",
944 | "id": "49765705",
945 | "metadata": {},
946 | "source": [
947 | "### Finding Elements in a List\n",
948 | "\n",
949 | "You can check to see if an element exists inside a list using the `in` operator."
950 | ]
951 | },
952 | {
953 | "cell_type": "code",
954 | "execution_count": 26,
955 | "id": "670a3201",
956 | "metadata": {},
957 | "outputs": [
958 | {
959 | "data": {
960 | "text/plain": [
961 | "True"
962 | ]
963 | },
964 | "execution_count": 26,
965 | "metadata": {},
966 | "output_type": "execute_result"
967 | }
968 | ],
969 | "source": [
970 | "'carrots' in shopping_list"
971 | ]
972 | },
973 | {
974 | "cell_type": "code",
975 | "execution_count": 27,
976 | "id": "4615230f",
977 | "metadata": {},
978 | "outputs": [
979 | {
980 | "data": {
981 | "text/plain": [
982 | "False"
983 | ]
984 | },
985 | "execution_count": 27,
986 | "metadata": {},
987 | "output_type": "execute_result"
988 | }
989 | ],
990 | "source": [
991 | "'milk' in shopping_list"
992 | ]
993 | },
994 | {
995 | "cell_type": "markdown",
996 | "id": "a7aebc0f",
997 | "metadata": {},
998 | "source": [
999 | "### Iterating Over Lists\n",
1000 | "\n",
1001 | "There are several ways to iterate over a list. The traditional approach is to use a `for` loop."
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "code",
1006 | "execution_count": 28,
1007 | "id": "89965411",
1008 | "metadata": {},
1009 | "outputs": [
1010 | {
1011 | "name": "stdout",
1012 | "output_type": "stream",
1013 | "text": [
1014 | "apples\n",
1015 | "carrots\n",
1016 | "chocolate\n",
1017 | "bananas\n",
1018 | "onions\n"
1019 | ]
1020 | }
1021 | ],
1022 | "source": [
1023 | "for item in shopping_list:\n",
1024 | "    print(item)"
1025 | ]
1026 | },
1027 | {
1028 | "cell_type": "markdown",
1029 | "id": "4b630e30",
1030 | "metadata": {},
1031 | "source": [
1032 | "If you also need the element's index in your for loop, you can access it using `enumerate()`."
1033 | ]
1034 | },
1035 | {
1036 | "cell_type": "code",
1037 | "execution_count": 29,
1038 | "id": "153975e5",
1039 | "metadata": {},
1040 | "outputs": [
1041 | {
1042 | "name": "stdout",
1043 | "output_type": "stream",
1044 | "text": [
1045 | "1) apples\n",
1046 | "2) carrots\n",
1047 | "3) chocolate\n",
1048 | "4) bananas\n",
1049 | "5) onions\n"
1050 | ]
1051 | }
1052 | ],
1053 | "source": [
1054 | "for i, item in enumerate(shopping_list):\n",
1055 | "    print(f\"{i+1}) {item}\")"
1056 | ]
1057 | },
1058 | {
1059 | "cell_type": "markdown",
1060 | "id": "bb45d629",
1061 | "metadata": {},
1062 | "source": [
1063 | "Another way to iterate over a list is to use a list comprehension. This is a one-liner that is useful when you're applying a simple operation to each element in your list. It builds a new list and leaves the original unchanged. For example, let's make all elements inside `shopping_list` uppercase."
1064 | ]
1065 | },
1066 | {
1067 | "cell_type": "code",
1068 | "execution_count": 30,
1069 | "id": "4a3a185f",
1070 | "metadata": {},
1071 | "outputs": [
1072 | {
1073 | "data": {
1074 | "text/plain": [
1075 | "['APPLES', 'CARROTS', 'CHOCOLATE', 'BANANAS', 'ONIONS']"
1076 | ]
1077 | },
1078 | "execution_count": 30,
1079 | "metadata": {},
1080 | "output_type": "execute_result"
1081 | }
1082 | ],
1083 | "source": [
1084 | "[item.upper() for item in shopping_list]"
1085 | ]
1086 | },
1087 | {
1088 | "cell_type": "markdown",
1089 | "id": "b7df65c6",
1090 | "metadata": {},
1091 | "source": [
1092 | "### Lists are Mutable\n",
1093 | "\n",
1094 | "An important feature of a list is that it's mutable. This means that elements within a list can be added, deleted, or changed after being defined.\n",
1095 | "\n",
1096 | "To add new elements to the end of a list, you can use `.extend()`, which takes a list of the elements to add:"
1097 | ]
1098 | },
1099 | {
1100 | "cell_type": "code",
1101 | "execution_count": 31,
1102 | "id": "92a9f89f",
1103 | "metadata": {},
1104 | "outputs": [
1105 | {
1106 | "data": {
1107 | "text/plain": [
1108 | "['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk']"
1109 | ]
1110 | },
1111 | "execution_count": 31,
1112 | "metadata": {},
1113 | "output_type": "execute_result"
1114 | }
1115 | ],
1116 | "source": [
1117 | "shopping_list.extend(['milk'])\n",
1118 | "shopping_list"
1119 | ]
1120 | },
1121 | {
1122 | "cell_type": "markdown",
1123 | "id": "20a62882",
1124 | "metadata": {},
1125 | "source": [
1126 | "You can also add the elements of another list using `+=`:"
1127 | ]
1128 | },
1129 | {
1130 | "cell_type": "code",
1131 | "execution_count": 32,
1132 | "id": "38748117",
1133 | "metadata": {},
1134 | "outputs": [
1135 | {
1136 | "data": {
1137 | "text/plain": [
1138 | "['apples',\n",
1139 | " 'carrots',\n",
1140 | " 'chocolate',\n",
1141 | " 'bananas',\n",
1142 | " 'onions',\n",
1143 | " 'milk',\n",
1144 | " 'cake',\n",
1145 | " 'watermelon']"
1146 | ]
1147 | },
1148 | "execution_count": 32,
1149 | "metadata": {},
1150 | "output_type": "execute_result"
1151 | }
1152 | ],
1153 | "source": [
1154 | "more_food = ['cake', 'watermelon']\n",
1155 | "shopping_list += more_food\n",
1156 | "shopping_list"
1157 | ]
1158 | },
1159 | {
1160 | "cell_type": "markdown",
1161 | "id": "0c9561ca",
1162 | "metadata": {},
1163 | "source": [
1164 | "To remove the last element of a list, you can \"pop\" it with `.pop()`:"
1165 | ]
1166 | },
1167 | {
1168 | "cell_type": "code",
1169 | "execution_count": 33,
1170 | "id": "128f3015",
1171 | "metadata": {},
1172 | "outputs": [
1173 | {
1174 | "data": {
1175 | "text/plain": [
1176 | "['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk', 'cake']"
1177 | ]
1178 | },
1179 | "execution_count": 33,
1180 | "metadata": {},
1181 | "output_type": "execute_result"
1182 | }
1183 | ],
1184 | "source": [
1185 | "shopping_list.pop()\n",
1186 | "shopping_list"
1187 | ]
1188 | },
1189 | {
1190 | "cell_type": "markdown",
1191 | "id": "fd4d823f",
1192 | "metadata": {},
1193 | "source": [
1194 | "If you want to remove a specific element from your list, you can use the `.remove()` method, which deletes the first occurrence of that element."
1195 | ]
1196 | },
1197 | {
1198 | "cell_type": "code",
1199 | "execution_count": 34,
1200 | "id": "5d6f829f",
1201 | "metadata": {},
1202 | "outputs": [
1203 | {
1204 | "data": {
1205 | "text/plain": [
1206 | "['apples', 'chocolate', 'bananas', 'onions', 'milk', 'cake']"
1207 | ]
1208 | },
1209 | "execution_count": 34,
1210 | "metadata": {},
1211 | "output_type": "execute_result"
1212 | }
1213 | ],
1214 | "source": [
1215 | "shopping_list.remove('carrots')\n",
1216 | "shopping_list"
1217 | ]
1218 | },
1219 | {
1220 | "cell_type": "markdown",
1221 | "id": "bed14dc2",
1222 | "metadata": {},
1223 | "source": [
1224 | "## 6) Dictionary\n",
1225 | "\n",
1226 | "Dictionaries are used to store data values in `key:value` pairs. Similar to the [list](#List), a dictionary is a collection of objects. It is also **mutable**, meaning that you can add, remove, or change the values inside of it. \n",
1227 | "\n",
1228 | "```{note}\n",
1229 | "If you’ve ever worked with JSON before, a dictionary is very similar to a JSON object. In fact, if you load a JSON object into Python, it will be expressed as a dictionary. Similarly, you can write a Python dictionary to a JSON file. \n",
1230 | "```\n",
1231 | "\n",
1232 | "With the **list**, we access elements using the index. With the **dictionary**, we access elements using keys. Let's take a look at an example of a dictionary which captures population information about boroughs in New York City: "
1233 | ]
1234 | },
1235 | {
1236 | "cell_type": "code",
1237 | "execution_count": 35,
1238 | "id": "62413e2a",
1239 | "metadata": {},
1240 | "outputs": [
1241 | {
1242 | "data": {
1243 | "text/plain": [
1244 | "dict"
1245 | ]
1246 | },
1247 | "execution_count": 35,
1248 | "metadata": {},
1249 | "output_type": "execute_result"
1250 | }
1251 | ],
1252 | "source": [
1253 | "population_nyc = {\n",
1254 | "    'bronx': 1472654,\n",
1255 | "    'brooklyn': 2736074,\n",
1256 | "    'manhattan': 1694251,\n",
1257 | "    'queens': 2405464,\n",
1258 | "    'staten_island': 495747\n",
1259 | "}\n",
1260 | "\n",
1261 | "type(population_nyc)"
1262 | ]
1263 | },
1264 | {
1265 | "cell_type": "markdown",
1266 | "id": "8d4348f1",
1267 | "metadata": {},
1268 | "source": [
1269 | "\"dict\" is short for \"dictionary\"! 😎"
1270 | ]
1271 | },
1272 | {
1273 | "cell_type": "markdown",
1274 | "id": "051cf4bd",
1275 | "metadata": {},
1276 | "source": [
1277 | "In this dictionary, the \"key\" is the borough name and the \"value\" is the population of that borough. To get a particular value, we need to know the key of that value.\n",
1278 | "\n",
1279 | "\n",
1280 | "\n",
1281 | "For example, let's say we want to get the population of Manhattan. We can look it up using its key:"
1282 | ]
1283 | },
1284 | {
1285 | "cell_type": "code",
1286 | "execution_count": 36,
1287 | "id": "26ea328a",
1288 | "metadata": {},
1289 | "outputs": [
1290 | {
1291 | "data": {
1292 | "text/plain": [
1293 | "1694251"
1294 | ]
1295 | },
1296 | "execution_count": 36,
1297 | "metadata": {},
1298 | "output_type": "execute_result"
1299 | }
1300 | ],
1301 | "source": [
1302 | "population_nyc['manhattan']"
1303 | ]
1304 | },
1305 | {
1306 | "cell_type": "markdown",
1307 | "id": "81aa9efe",
1308 | "metadata": {},
1309 | "source": [
1310 | "We can get all keys of a dictionary using `.keys()`:"
1311 | ]
1312 | },
1313 | {
1314 | "cell_type": "code",
1315 | "execution_count": 37,
1316 | "id": "9e79d91a",
1317 | "metadata": {},
1318 | "outputs": [
1319 | {
1320 | "data": {
1321 | "text/plain": [
1322 | "dict_keys(['bronx', 'brooklyn', 'manhattan', 'queens', 'staten_island'])"
1323 | ]
1324 | },
1325 | "execution_count": 37,
1326 | "metadata": {},
1327 | "output_type": "execute_result"
1328 | }
1329 | ],
1330 | "source": [
1331 | "population_nyc.keys()"
1332 | ]
1333 | },
1334 | {
1335 | "cell_type": "markdown",
1336 | "id": "0cda691e",
1337 | "metadata": {},
1338 | "source": [
1339 | "We can get all values of a dictionary using `.values()`:"
1340 | ]
1341 | },
1342 | {
1343 | "cell_type": "code",
1344 | "execution_count": 38,
1345 | "id": "fcdbefb0",
1346 | "metadata": {},
1347 | "outputs": [
1348 | {
1349 | "data": {
1350 | "text/plain": [
1351 | "dict_values([1472654, 2736074, 1694251, 2405464, 495747])"
1352 | ]
1353 | },
1354 | "execution_count": 38,
1355 | "metadata": {},
1356 | "output_type": "execute_result"
1357 | }
1358 | ],
1359 | "source": [
1360 | "population_nyc.values()"
1361 | ]
1362 | },
1363 | {
1364 | "cell_type": "markdown",
1365 | "id": "f60941e1",
1366 | "metadata": {},
1367 | "source": [
1368 | "You can **add** a new key-value pair to the dictionary like this:"
1369 | ]
1370 | },
1371 | {
1372 | "cell_type": "code",
1373 | "execution_count": 39,
1374 | "id": "9d5f67e1",
1375 | "metadata": {},
1376 | "outputs": [
1377 | {
1378 | "data": {
1379 | "text/plain": [
1380 | "{'bronx': 1472654,\n",
1381 | " 'brooklyn': 2736074,\n",
1382 | " 'manhattan': 1694251,\n",
1383 | " 'queens': 2405464,\n",
1384 | " 'staten_island': 495747,\n",
1385 | " 'long_island': 8063232}"
1386 | ]
1387 | },
1388 | "execution_count": 39,
1389 | "metadata": {},
1390 | "output_type": "execute_result"
1391 | }
1392 | ],
1393 | "source": [
1394 | "population_nyc['long_island'] = 8063232\n",
1395 | "population_nyc"
1396 | ]
1397 | },
1398 | {
1399 | "cell_type": "markdown",
1400 | "id": "64e5425c",
1401 | "metadata": {},
1402 | "source": [
1403 | "You can also **change the value** of a key like this:"
1404 | ]
1405 | },
1406 | {
1407 | "cell_type": "code",
1408 | "execution_count": 40,
1409 | "id": "29120f7d",
1410 | "metadata": {},
1411 | "outputs": [
1412 | {
1413 | "data": {
1414 | "text/plain": [
1415 | "{'bronx': 1472654,\n",
1416 | " 'brooklyn': 2736074,\n",
1417 | " 'manhattan': 1694251,\n",
1418 | " 'queens': 2405464,\n",
1419 | " 'staten_island': 495747,\n",
1420 | " 'long_island': 8}"
1421 | ]
1422 | },
1423 | "execution_count": 40,
1424 | "metadata": {},
1425 | "output_type": "execute_result"
1426 | }
1427 | ],
1428 | "source": [
1429 | "population_nyc['long_island'] = 8\n",
1430 | "population_nyc"
1431 | ]
1432 | },
1433 | {
1434 | "cell_type": "markdown",
1435 | "id": "efd69a3b",
1436 | "metadata": {},
1437 | "source": [
1438 | "Long Island is technically not part of NYC, so let's remove it from our dictionary. We can **remove** the \"long_island\" key-value pair using `.pop(key_name)`."
1439 | ]
1440 | },
1441 | {
1442 | "cell_type": "code",
1443 | "execution_count": 41,
1444 | "id": "22edcdc7",
1445 | "metadata": {},
1446 | "outputs": [
1447 | {
1448 | "data": {
1449 | "text/plain": [
1450 | "{'bronx': 1472654,\n",
1451 | " 'brooklyn': 2736074,\n",
1452 | " 'manhattan': 1694251,\n",
1453 | " 'queens': 2405464,\n",
1454 | " 'staten_island': 495747}"
1455 | ]
1456 | },
1457 | "execution_count": 41,
1458 | "metadata": {},
1459 | "output_type": "execute_result"
1460 | }
1461 | ],
1462 | "source": [
1463 | "population_nyc.pop('long_island')\n",
1464 | "population_nyc"
1465 | ]
1466 | }
1467 | ],
1468 | "metadata": {
1469 | "jupytext": {
1470 | "cell_metadata_filter": "-all",
1471 | "main_language": "python",
1472 | "notebook_metadata_filter": "-all"
1473 | },
1474 | "kernelspec": {
1475 | "display_name": "Python 3 (ipykernel)",
1476 | "language": "python",
1477 | "name": "python3"
1478 | },
1479 | "language_info": {
1480 | "codemirror_mode": {
1481 | "name": "ipython",
1482 | "version": 3
1483 | },
1484 | "file_extension": ".py",
1485 | "mimetype": "text/x-python",
1486 | "name": "python",
1487 | "nbconvert_exporter": "python",
1488 | "pygments_lexer": "ipython3",
1489 | "version": "3.9.12"
1490 | }
1491 | },
1492 | "nbformat": 4,
1493 | "nbformat_minor": 5
1494 | }
1495 |
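As a recap of the dictionary cells in the notebook above, here is a minimal sketch that puts the lookups together. It reuses the `population_nyc` values from the notebook; the `.get()` call with a default of 0 and the `.items()` loop do not appear in the original notebook and are included only to illustrate standard `dict` methods.

```python
# Borough populations, copied from the notebook above.
population_nyc = {
    'bronx': 1472654,
    'brooklyn': 2736074,
    'manhattan': 1694251,
    'queens': 2405464,
    'staten_island': 495747,
}

# Look up a value by its key.
manhattan = population_nyc['manhattan']

# .get() returns a fallback (here 0, an illustrative choice) when the key is missing.
long_island = population_nyc.get('long_island', 0)

# Iterate over key-value pairs with .items() to find the most populous borough.
largest_borough = None
largest_population = 0
for borough, population in population_nyc.items():
    if population > largest_population:
        largest_borough = borough
        largest_population = population

# .values() can be passed straight to sum().
total_population = sum(population_nyc.values())

print(largest_borough, total_population)
```

Indexing with `population_nyc['long_island']` would raise a `KeyError` in this sketch, which is why `.get()` can be handy when a key may be absent.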
--------------------------------------------------------------------------------
/book/00_python_crash_course_functions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Functions\n",
8 | "\n",
9 | "We use functions in programming to bundle a set of instructions in a self-contained package. A function is a piece of code written to carry out a specific task. Functions are useful when we want to execute a specific task multiple times within our program.\n",
10 | "\n",
11 | "A function typically has two main components: \n",
12 | "\n",
13 | "1. an **input**, which can be assigned a default value if not specified\n",
14 | "2. an **output**, which gets returned once the code inside the function has finished running \n",
15 | "\n",
16 | "\n",
17 | "\n",
18 | "The general structure of a function looks like this:\n",
19 | "\n",
20 | "```\n",
21 | "def task_name(input):\n",
22 | "    # task code goes here\n",
23 | "    return output \n",
24 | "```\n",
25 | "\n",
26 | "The input(s) of the function are called `parameters`. "
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## Built-in Python Functions\n",
34 | "\n",
35 | "Python comes with a wide variety of built-in functions. For example, `type()` is a function that takes a value or variable name as its input and returns its datatype as the output. "
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 1,
41 | "metadata": {},
42 | "outputs": [
43 | {
44 | "data": {
45 | "text/plain": [
46 | "int"
47 | ]
48 | },
49 | "execution_count": 1,
50 | "metadata": {},
51 | "output_type": "execute_result"
52 | }
53 | ],
54 | "source": [
55 | "type(100)"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "`sum()` is another built-in Python function that takes in a list of values and returns the sum. Similarly, `len()` takes in a list and returns the length of that list."
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "name": "stdout",
72 | "output_type": "stream",
73 | "text": [
74 | "sum: 60\n",
75 | "len: 3\n"
76 | ]
77 | }
78 | ],
79 | "source": [
80 | "n_apples = [10,20,30]\n",
81 | "\n",
82 | "print(f\"sum: {sum(n_apples)}\")\n",
83 | "print(f\"len: {len(n_apples)}\")\n"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "If you want to learn more about a Python function, you can use the `help()` function:"
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": 3,
96 | "metadata": {},
97 | "outputs": [
98 | {
99 | "name": "stdout",
100 | "output_type": "stream",
101 | "text": [
102 | "Help on built-in function sum in module builtins:\n",
103 | "\n",
104 | "sum(iterable, /, start=0)\n",
105 | " Return the sum of a 'start' value (default: 0) plus an iterable of numbers\n",
106 | " \n",
107 | " When the iterable is empty, return the start value.\n",
108 | " This function is intended specifically for use with numeric values and may\n",
109 | " reject non-numeric types.\n",
110 | "\n"
111 | ]
112 | }
113 | ],
114 | "source": [
115 | "help(sum)"
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 | "## How to Define Your Own Function\n",
123 | "\n",
124 | "When defining your own function, here are the steps that you should follow:\n",
125 | "\n",
126 | "1. Use the keyword `def` to declare the function and follow this with the function name\n",
127 | "2. Add parameters (inputs) to the function. These go inside the parentheses of the function.\n",
128 | "3. Add statements (instructions/logic) that the function should execute. \n",
129 | "4. End the function with a return statement so that it returns the desired output.\n",
130 | "\n",
131 | "You don't need to have a return statement for your function to be valid. What happens when your function doesn't return anything?"
132 | ]
133 | },
134 | {
135 | "cell_type": "code",
136 | "execution_count": 4,
137 | "metadata": {},
138 | "outputs": [
139 | {
140 | "name": "stdout",
141 | "output_type": "stream",
142 | "text": [
143 | "hello!\n"
144 | ]
145 | }
146 | ],
147 | "source": [
148 | "def hello():\n",
149 | "    print(\"hello!\")\n",
150 | "\n",
151 | "hello()"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "Don't be confused by the function above! It prints `\"hello!\"` but doesn't return it, so the value it returns is `None`. "
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 5,
164 | "metadata": {},
165 | "outputs": [
166 | {
167 | "name": "stdout",
168 | "output_type": "stream",
169 | "text": [
170 | "hello!\n",
171 | "Output type: <class 'NoneType'>\n"
172 | ]
173 | }
174 | ],
175 | "source": [
176 | "output = hello()\n",
177 | "\n",
178 | "print(f\"Output type: {type(output)}\")"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "Here's an example that has two parameters (`number_1`, `number_2`) and returns the sum of those two inputs."
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 6,
191 | "metadata": {},
192 | "outputs": [
193 | {
194 | "data": {
195 | "text/plain": [
196 | "25"
197 | ]
198 | },
199 | "execution_count": 6,
200 | "metadata": {},
201 | "output_type": "execute_result"
202 | }
203 | ],
204 | "source": [
205 | "def sum_numbers(number_1, number_2):\n",
206 | "    total_sum = number_1 + number_2\n",
207 | "    return total_sum\n",
208 | "\n",
209 | "x = sum_numbers(10,15)\n",
210 | "x"
211 | ]
212 | },
213 | {
214 | "cell_type": "markdown",
215 | "metadata": {},
216 | "source": [
217 | "## Documenting Your Function\n",
218 | "\n",
219 | "When defining your function, it's very important to include documentation. A function's documentation is written in a `docstring`. It typically describes the purpose of your function, what computations it performs, and what gets returned. It also provides information on what your inputs should be. \n",
220 | "\n",
221 | "In Python, there are two main styles for writing docstrings. The first style is [Google](http://google.github.io/styleguide/pyguide.html#Comments):\n",
222 | "\n",
223 | "```\n",
224 | "def func(arg1, arg2):\n",
225 | "    \"\"\"Summary line.\n",
226 | "\n",
227 | "    Extended description of function.\n",
228 | "\n",
229 | "    Args:\n",
230 | "        arg1 (int): Description of arg1\n",
231 | "        arg2 (str): Description of arg2\n",
232 | "\n",
233 | "    Returns:\n",
234 | "        bool: Description of return value\n",
235 | "\n",
236 | "    \"\"\"\n",
237 | "    return True\n",
238 | "```\n",
239 | "\n",
240 | "The second style is [NumPy](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard):\n",
241 | "\n",
242 | "```\n",
243 | "def func(arg1, arg2):\n",
244 | "    \"\"\"Summary line.\n",
245 | "\n",
246 | "    Extended description of function.\n",
247 | "\n",
248 | "    Parameters\n",
249 | "    ----------\n",
250 | "    arg1 : int\n",
251 | "        Description of arg1\n",
252 | "    arg2 : str\n",
253 | "        Description of arg2\n",
254 | "\n",
255 | "    Returns\n",
256 | "    -------\n",
257 | "    bool\n",
258 | "        Description of return value\n",
259 | "\n",
260 | "    \"\"\"\n",
261 | "    return True\n",
262 | "```\n",
263 | "\n",
264 | "Before building a Python application, it's a good idea to decide which docstring style you want to use. If your docstrings are short and simple, Google's style is a great option. If you have long, in-depth docstrings, you may want to opt for the NumPy style. That being said, this is mainly a style preference. Both docstring styles are valid.\n",
265 | "\n",
266 | "Let's re-write our `sum_numbers()` function with docstrings using the Google style guide."
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": 7,
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "def sum_numbers(number_1, number_2):\n",
276 | "    \"\"\"Sums two numbers together.\n",
277 | "\n",
278 | "    Args:\n",
279 | "        number_1 (int): first number to be summed\n",
280 | "        number_2 (int): second number to be summed\n",
281 | "\n",
282 | "    Returns:\n",
283 | "        int: sum of number_1 and number_2\n",
284 | "    \"\"\"\n",
285 | "    total_sum = number_1 + number_2\n",
286 | "    return total_sum"
287 | ]
288 | },
289 | {
290 | "cell_type": "markdown",
291 | "metadata": {},
292 | "source": [
293 | "Want to see more examples of Python docstrings in action? Check out the code base of open-source Python packages like [pandas](https://github.com/pandas-dev/pandas/tree/master/pandas) and [scikit-learn](https://github.com/scikit-learn/scikit-learn/tree/master/sklearn) for inspiration. "
294 | ]
295 | }
296 | ],
297 | "metadata": {
298 | "kernelspec": {
299 | "display_name": "Python 3 (ipykernel)",
300 | "language": "python",
301 | "name": "python3"
302 | },
303 | "language_info": {
304 | "codemirror_mode": {
305 | "name": "ipython",
306 | "version": 3
307 | },
308 | "file_extension": ".py",
309 | "mimetype": "text/x-python",
310 | "name": "python",
311 | "nbconvert_exporter": "python",
312 | "pygments_lexer": "ipython3",
313 | "version": "3.9.12"
314 | }
315 | },
316 | "nbformat": 4,
317 | "nbformat_minor": 2
318 | }
319 |
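The introduction to the functions notebook above mentions that an input can be assigned a default, but none of its cells demonstrates one. The following is a minimal sketch under the assumption that `number_2` defaults to 0; that default is chosen purely for illustration and is not part of the original `sum_numbers()`.

```python
def sum_numbers(number_1, number_2=0):
    """Sums two numbers together.

    Args:
        number_1 (int): first number to be summed
        number_2 (int): second number to be summed; defaults to 0
            (an illustrative default, not in the original notebook)

    Returns:
        int: sum of number_1 and number_2
    """
    return number_1 + number_2


with_both = sum_numbers(10, 15)   # both arguments supplied
with_default = sum_numbers(10)    # number_2 falls back to its default

# The docstring is stored on the function itself and is what help() displays.
summary_line = sum_numbers.__doc__.splitlines()[0]

print(with_both, with_default, summary_line)
```

Parameters with defaults must come after parameters without them, which is why `number_2` rather than `number_1` carries the default in this sketch.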
--------------------------------------------------------------------------------
/book/00_python_crash_course_oop.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "90cc6d0b",
6 | "metadata": {},
7 | "source": [
8 | "# Object-Oriented Programming\n",
9 | "\n",
10 | "Python is an object-oriented programming (OOP) language. In Python, just about everything is an “object”. \n",
11 | "\n",
12 | "Objects have their own attributes. Let’s say we have an object called `cat`. A cat's attributes could include color, size, and age. Suppose we want to know the color of the `cat`. We can inspect the color attribute like this:\n",
13 | "\n",
14 | "```\n",
15 | "cat.color \n",
16 | "```\n",
17 | "> red \n",
18 | "\n",
19 | "Objects also have their own methods, which are functions that belong to the object and act on it. In this case, the `cat`’s methods could include jumping, sleeping, or playing. This is how we would ask the cat to jump:\n",
20 | "\n",
21 | "```\n",
22 | "cat.jump()\n",
23 | "```"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "id": "80b248d9",
29 | "metadata": {},
30 | "source": [
31 | "Now, you might be wondering: where did this `cat` object come from? How did we create it? \n",
32 | "\n",
33 | "An object is an instance of a \"[class](https://docs.python.org/3/tutorial/classes.html)\", which can be thought of as a “blueprint” for creating objects. That means that our object, `cat`, came from a class. Let's call the class `Cat`. The `Cat` class is where the attributes and methods are defined. It might look something like this:"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 1,
39 | "id": "cb7d055d",
40 | "metadata": {},
41 | "outputs": [],
42 | "source": [
43 | "class Cat:\n",
44 | "    def __init__(self, name, color, age):\n",
45 | "        self.name = name\n",
46 | "        self.color = color\n",
47 | "        self.age = age\n",
48 | "\n",
49 | "    def jump(self):\n",
50 | "        print(\"jump!\")\n",
51 | "\n",
52 | "    def meow(self):\n",
53 | "        print(\"meow!\")"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "id": "f7773053",
59 | "metadata": {},
60 | "source": [
61 | "The `cat` object was created like this:"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 2,
67 | "id": "ae8ec5cf",
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "name": "stdout",
72 | "output_type": "stream",
73 | "text": [
74 | "meow!\n"
75 | ]
76 | }
77 | ],
78 | "source": [
79 | "cat = Cat(name='Tabby', color='red', age=2)\n",
80 | "cat.meow()"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "id": "1472445e",
86 | "metadata": {},
87 | "source": [
88 | "As we'll learn very soon, all objects have a datatype. The datatype of an object is its class. In the case of our `cat` object, its datatype is `Cat`! "
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "id": "1276db4b",
94 | "metadata": {},
95 | "source": [
96 | "```{note}\n",
97 | "When we start learning about dataframes in the next chapter, it'll be helpful to remember 2 things:\n",
98 | "\n",
99 | "- a dataframe attribute looks like: `dataframe.attribute_name` (without parentheses)\n",
100 | "- a dataframe method looks like: `dataframe.method()` (with parentheses)\n",
101 | "\n",
102 | "If this is super confusing, don't worry! We will learn as we go. \n",
103 | "```"
104 | ]
105 | }
106 | ],
107 | "metadata": {
108 | "jupytext": {
109 | "cell_metadata_filter": "-all",
110 | "main_language": "python",
111 | "notebook_metadata_filter": "-all"
112 | },
113 | "kernelspec": {
114 | "display_name": "Python 3 (ipykernel)",
115 | "language": "python",
116 | "name": "python3"
117 | },
118 | "language_info": {
119 | "codemirror_mode": {
120 | "name": "ipython",
121 | "version": 3
122 | },
123 | "file_extension": ".py",
124 | "mimetype": "text/x-python",
125 | "name": "python",
126 | "nbconvert_exporter": "python",
127 | "pygments_lexer": "ipython3",
128 | "version": "3.9.9"
129 | }
130 | },
131 | "nbformat": 4,
132 | "nbformat_minor": 5
133 | }
134 |
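To make the attribute-versus-method distinction from the notebook above concrete, here is a small sketch built on a trimmed-down `Cat` class. The `describe()` method is hypothetical and was added for illustration; it returns a string instead of printing so that the result can be stored in a variable.

```python
class Cat:
    def __init__(self, name, color, age):
        self.name = name
        self.color = color
        self.age = age

    def describe(self):
        # Hypothetical helper (not in the original notebook): builds a
        # one-line summary from the object's attributes.
        return f"{self.name} is a {self.age}-year-old {self.color} cat"


cat = Cat(name='Tabby', color='red', age=2)

cat_color = cat.color          # attribute access: no parentheses
description = cat.describe()   # method call: parentheses
is_cat = type(cat) is Cat      # the datatype of an object is its class

print(cat_color, description, is_cat)
```

Writing `cat.describe` without parentheses would give back the method object itself rather than calling it, which is a common slip when moving between attributes and methods.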
--------------------------------------------------------------------------------
/book/00_python_crash_course_variables.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Variables\n",
8 | "\n",
9 | "In Python, *everything* is an object. Variables are names given to identify these objects. In other words, a variable can be thought of as a “label” or “name tag” for Python objects that we're working with. Like any good labelling system, variables make it easy to retrieve the object that we're looking for. \n",
10 | "\n",
11 | ""
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Example of a Variable\n",
19 | "\n",
20 | "The easiest way to understand a variable is to see how it's used in the wild.\n",
21 | "\n",
22 | "Let's say we have an oven that measures temperature in Celsius, but all of our recipes are in Fahrenheit. We're baking cookies and the recipe says to pre-heat the oven to 350 degrees Fahrenheit. We'll need to convert this to Celsius using this calculation: \n",
23 | "\n",
24 | "$(T-32) \\times \\frac{5}{9}$\n",
25 | "\n",
26 | "where T is the temperature in Fahrenheit. "
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {},
33 | "outputs": [
34 | {
35 | "data": {
36 | "text/plain": [
37 | "176.66666666666669"
38 | ]
39 | },
40 | "execution_count": 1,
41 | "metadata": {},
42 | "output_type": "execute_result"
43 | }
44 | ],
45 | "source": [
46 | "(350-32)*(5/9)"
47 | ]
48 | },
49 | {
50 | "cell_type": "markdown",
51 | "metadata": {},
52 | "source": [
53 | "To make the code clearer, we can store the temperature as a variable called `temperature_in_fahrenheit`. "
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 2,
59 | "metadata": {},
60 | "outputs": [
61 | {
62 | "data": {
63 | "text/plain": [
64 | "176.66666666666669"
65 | ]
66 | },
67 | "execution_count": 2,
68 | "metadata": {},
69 | "output_type": "execute_result"
70 | }
71 | ],
72 | "source": [
73 | "temperature_in_fahrenheit = 350\n",
74 | "\n",
75 | "(temperature_in_fahrenheit-32)*(5/9)"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "## Creating a Variable\n",
83 | "\n",
84 | "We can create a variable by assigning it a value. "
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 3,
90 | "metadata": {},
91 | "outputs": [
92 | {
93 | "name": "stdout",
94 | "output_type": "stream",
95 | "text": [
96 | "10\n"
97 | ]
98 | }
99 | ],
100 | "source": [
101 | "x = 10\n",
102 | "print(x)"
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "In the example above, variable `x` is assigned the value 10. We can treat `x` as if it were 10 and apply arithmetic operations to it. "
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 4,
115 | "metadata": {},
116 | "outputs": [
117 | {
118 | "name": "stdout",
119 | "output_type": "stream",
120 | "text": [
121 | "20\n",
122 | "110\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "print(x*2)\n",
128 | "print(x+100)"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "We can re-assign a variable to another value even after it's already been assigned once. When we use the variable after re-assignment, the new value will be referenced. The variable no longer refers to the initial value. "
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 5,
141 | "metadata": {},
142 | "outputs": [
143 | {
144 | "name": "stdout",
145 | "output_type": "stream",
146 | "text": [
147 | "1\n",
148 | "2\n",
149 | "101\n"
150 | ]
151 | }
152 | ],
153 | "source": [
154 | "x = 1\n",
155 | "print(x)\n",
156 | "print(x*2)\n",
157 | "print(x+100)"
158 | ]
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
164 | "We can also re-assign a variable to a value of another datatype. In doing so, we are changing the datatype of the variable."
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 6,
170 | "metadata": {},
171 | "outputs": [
172 | {
173 | "name": "stdout",
174 | "output_type": "stream",
175 | "text": [
176 | "Datatype of 10: <class 'int'>\n",
177 | "Datatype of helloworld: <class 'str'>\n"
178 | ]
179 | }
180 | ],
181 | "source": [
182 | "a = 10\n",
183 | "print(f\"Datatype of {a}: {type(a)}\")\n",
184 | "\n",
185 | "a = \"helloworld\"\n",
186 | "print(f\"Datatype of {a}: {type(a)}\")"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "## Chain Assignment\n",
194 | "\n",
195 | "With chain assignment, you can assign the same value to several variables simultaneously. Let's assign 100 to `a`, `b`, and `c`. "
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "a = b = c = 100"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 8,
210 | "metadata": {},
211 | "outputs": [
212 | {
213 | "name": "stdout",
214 | "output_type": "stream",
215 | "text": [
216 | "100 100 100\n"
217 | ]
218 | }
219 | ],
220 | "source": [
221 | "print(a, b, c)"
222 | ]
223 | },
224 | {
225 | "cell_type": "markdown",
226 | "metadata": {},
227 | "source": [
228 | "`a`, `b`, and `c` are 3 variables that all have the same value 100. "
229 | ]
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "## Variable Assignment and Objects in Python\n",
236 | "\n",
237 | "In Python, it's important to note that everything is an \"object\". Integers, strings, and floats are treated as objects. So a variable is a symbolic name that is a reference to an object (value). Once an object (value) is assigned to a variable, you can refer to the object by that name. Let's look at an example."
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 9,
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "x = 500"
247 | ]
248 | },
249 | {
250 | "cell_type": "markdown",
251 | "metadata": {},
252 | "source": [
253 | "This assignment creates an integer object with the value 500 and assigns the variable `x` to point to that object.\n",
254 | "\n",
255 | "Now, let's say we want to create a new variable `y` that points to the same object as `x`. "
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 10,
261 | "metadata": {},
262 | "outputs": [
263 | {
264 | "data": {
265 | "text/plain": [
266 | "500"
267 | ]
268 | },
269 | "execution_count": 10,
270 | "metadata": {},
271 | "output_type": "execute_result"
272 | }
273 | ],
274 | "source": [
275 | "y = x \n",
276 | "y"
277 | ]
278 | },
279 | {
280 | "cell_type": "markdown",
281 | "metadata": {},
282 | "source": [
283 | "Assigning one variable to another does not create a new object. Instead, it creates a new symbolic reference, `y`, which points to the same object that `x` points to. What happens when we re-assign `x` to another value?"
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": 11,
289 | "metadata": {},
290 | "outputs": [],
291 | "source": [
292 | "x = 111"
293 | ]
294 | },
295 | {
296 | "cell_type": "markdown",
297 | "metadata": {},
298 | "source": [
299 | "A new integer object gets created with the value 111 and `x` becomes a reference to it. `y` still references the original object (500) that `x` was first assigned to. "
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 12,
305 | "metadata": {},
306 | "outputs": [
307 | {
308 | "data": {
309 | "text/plain": [
310 | "500"
311 | ]
312 | },
313 | "execution_count": 12,
314 | "metadata": {},
315 | "output_type": "execute_result"
316 | }
317 | ],
318 | "source": [
319 | "y"
320 | ]
321 | },
322 | {
323 | "cell_type": "markdown",
324 | "metadata": {},
325 | "source": [
326 | "## Global and Local Variables\n",
327 | "\n",
328 | "A global variable is defined outside a function and can be accessed inside any function within your Python environment. Let's create a variable called `w`. It's a global variable because it's defined outside of a function. "
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": 13,
334 | "metadata": {},
335 | "outputs": [
336 | {
337 | "data": {
338 | "text/plain": [
339 | "'hi!'"
340 | ]
341 | },
342 | "execution_count": 13,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": [
348 | "w = 'hi!'\n",
349 | "w"
350 | ]
351 | },
352 | {
353 | "cell_type": "markdown",
354 | "metadata": {},
355 | "source": [
356 | "We can access `w` inside a function. Let's create a function that prints `w`."
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 14,
362 | "metadata": {},
363 | "outputs": [
364 | {
365 | "name": "stdout",
366 | "output_type": "stream",
367 | "text": [
368 | "hi!\n"
369 | ]
370 | }
371 | ],
372 | "source": [
373 | "def print_greetings():\n",
374 | " print(w)\n",
375 | " \n",
376 | "print_greetings()"
377 | ]
378 | },
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": [
383 | "If a variable is defined inside a function, it's called a local variable and can only be accessed inside that function. "
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": 15,
389 | "metadata": {},
390 | "outputs": [
391 | {
392 | "name": "stdout",
393 | "output_type": "stream",
394 | "text": [
395 | "hola!\n"
396 | ]
397 | }
398 | ],
399 | "source": [
400 | "def print_greetings():\n",
401 | " w = 'hola!'\n",
402 | " print(w)\n",
403 | "\n",
404 | "print_greetings()"
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "metadata": {},
410 | "source": [
411 | "We created a local `w` variable inside our function `print_greetings`. This takes priority over the outside global variable. That's why the value of local variable `w` gets printed (\"hola!\") instead of the global variable value (\"hi!\").\n",
412 | "\n",
413 | "That being said, if we print `w` outside the function, it refers to the global variable rather than the local variable. This is because a local variable cannot be accessed outside of the function that it's defined in."
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 16,
419 | "metadata": {},
420 | "outputs": [
421 | {
422 | "name": "stdout",
423 | "output_type": "stream",
424 | "text": [
425 | "hi!\n"
426 | ]
427 | }
428 | ],
429 | "source": [
430 | "print(w)"
431 | ]
432 | },
433 | {
434 | "cell_type": "markdown",
435 | "metadata": {},
436 | "source": [
437 | "## Variable Naming\n",
438 | "\n",
439 | "When writing a Python script or application, it's important to give your variable a descriptive name. This is especially true for data science projects where the name of your variable can give more information on its purpose at first glance. \n",
440 | "\n",
441 | "Here are some general rules about variable naming in Python: \n",
442 | "\n",
443 | "- In JavaScript, variables tend to follow the `camelCase` convention. In Python, we use `snake_case`, where words are lowercase and separated by underscores.\n",
444 | "- Variables can contain digits but the first character of a variable name cannot be a digit. For example, `a2` is a legitimate variable name but `2a` would raise an error. \n",
445 | "- Variable names are case-sensitive: two names that differ only in upper-case/lower-case letters are treated as two separate variables. The general convention is to keep variable names lower-case (i.e., use `age` instead of `Age`)."
446 | ]
447 | },
448 | {
449 | "cell_type": "code",
450 | "execution_count": 17,
451 | "metadata": {},
452 | "outputs": [
453 | {
454 | "name": "stdout",
455 | "output_type": "stream",
456 | "text": [
457 | "age: 10, Age: 50\n"
458 | ]
459 | }
460 | ],
461 | "source": [
462 | "age = 10\n",
463 | "Age = 50\n",
464 | "\n",
465 | "print(f\"age: {age}, Age: {Age}\")"
466 | ]
467 | },
468 | {
469 | "cell_type": "markdown",
470 | "metadata": {},
471 | "source": [
472 | "Python reserves several keywords that cannot be used as variable names. You can check out the reserved keywords below:"
473 | ]
474 | },
475 | {
476 | "cell_type": "code",
477 | "execution_count": 18,
478 | "metadata": {},
479 | "outputs": [
480 | {
481 | "name": "stdout",
482 | "output_type": "stream",
483 | "text": [
484 | "\n",
485 | "Here is a list of the Python keywords. Enter any keyword to get more help.\n",
486 | "\n",
487 | "False break for not\n",
488 | "None class from or\n",
489 | "True continue global pass\n",
490 | "__peg_parser__ def if raise\n",
491 | "and del import return\n",
492 | "as elif in try\n",
493 | "assert else is while\n",
494 | "async except lambda with\n",
495 | "await finally nonlocal yield\n",
496 | "\n"
497 | ]
498 | }
499 | ],
500 | "source": [
501 | "help(\"keywords\")"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "metadata": {},
507 | "source": [
508 | "Variable naming is partly a style preference, but there are suggested guidelines to follow in Python's official Style Guide. You can check out the suggestions [here](https://www.python.org/dev/peps/pep-0008/#naming-conventions). "
509 | ]
510 | }
511 | ],
512 | "metadata": {
513 | "kernelspec": {
514 | "display_name": "Python 3 (ipykernel)",
515 | "language": "python",
516 | "name": "python3"
517 | },
518 | "language_info": {
519 | "codemirror_mode": {
520 | "name": "ipython",
521 | "version": 3
522 | },
523 | "file_extension": ".py",
524 | "mimetype": "text/x-python",
525 | "name": "python",
526 | "nbconvert_exporter": "python",
527 | "pygments_lexer": "ipython3",
528 | "version": "3.9.12"
529 | }
530 | },
531 | "nbformat": 4,
532 | "nbformat_minor": 2
533 | }
534 |
--------------------------------------------------------------------------------
/book/01_pandas_dataframe.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "70594b81",
6 | "metadata": {},
7 | "source": [
8 | "# Getting to Know the Pandas DataFrame"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "id": "6fdaa7fa",
14 | "metadata": {},
15 | "source": [
16 | "The [Pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) is a data structure that allows us to manipulate and analyze tabular data. A \"tabular\" data structure can be thought of as a matrix, where rows represent observations and columns represent features that describe each observation. It's a structure that you would find in a SQL database or Excel spreadsheet. Let's say we have a tabular dataset about movies.\n",
17 | "\n",
18 | "\n",
19 | "\n",
20 | "In this case, each row represents a movie and each column represents a characteristic about the movie like the genre, rating, and director. The \"index\" column represents a row's position in the dataframe. By default, a Pandas DataFrame's index starts at 0."
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "cdda2b7c",
26 | "metadata": {},
27 | "source": [
28 | "## Importing the Pandas package\n",
29 | "\n",
30 | "In order to create and use a Pandas DataFrame, we need to have the `pandas` package readily available in our environment. Let's import `pandas` and give it the alias of \"pd\" so that we don't have to write out \"pandas\" every time we call a function.\n",
31 | "\n",
32 | ""
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 1,
38 | "id": "f9dd60c4",
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "import pandas as pd "
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "id": "cd2658d2",
48 | "metadata": {},
49 | "source": [
50 | "## Creating a dataframe\n",
51 | "\n",
52 | "There are several ways to create a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). Here, we'll describe 2 approaches. \n",
53 | "\n",
54 | "### Converting a dictionary to a dataframe\n",
55 | "\n",
56 | "You can create a dataframe from a dictionary. Each key of the dictionary represents a column name, and each value is a list of the values belonging to that column. Each element of the list becomes the value of one row in the dataframe. \n",
57 | "\n",
58 | "\n",
59 | "\n",
60 | "Let's create a dataframe called `df_movies`.\n",
61 | "\n",
62 | "```{note}\n",
63 | "`df` is short for \"dataframe\". It's common for data scientists to name their dataframe \"df\". \n",
64 | "```"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 2,
70 | "id": "27f8611b",
71 | "metadata": {},
72 | "outputs": [],
73 | "source": [
74 | "data = {\n",
75 | " 'movie': ['Batman', 'Jungle Book', 'Titanic'], \n",
76 | " 'genre': ['action', 'kids', 'romance'], \n",
77 | " 'rating': [6, 9, 8],\n",
78 | " 'director': ['Tim Burton', 'Wolfgang Reitherman', 'James Cameron']\n",
79 | "}\n",
80 | "\n",
81 | "df_movies = pd.DataFrame(data)"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "id": "f5b2b9b9",
87 | "metadata": {},
88 | "source": [
89 | "We can confirm that `df_movies` is indeed a dataframe:"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 3,
95 | "id": "c0c71ad1",
96 | "metadata": {},
97 | "outputs": [
98 | {
99 | "data": {
100 | "text/plain": [
101 | "pandas.core.frame.DataFrame"
102 | ]
103 | },
104 | "execution_count": 3,
105 | "metadata": {},
106 | "output_type": "execute_result"
107 | }
108 | ],
109 | "source": [
110 | "type(df_movies)"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "id": "2ac9bef6",
116 | "metadata": {},
117 | "source": [
118 | "Now let's see how it looks 👀:"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": 4,
124 | "id": "09c7f239",
125 | "metadata": {},
126 | "outputs": [
127 | {
128 | "data": {
129 | "text/html": [
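130 | "<div>\n",
131 | "<style scoped>\n",
132 | "    .dataframe tbody tr th:only-of-type {\n",
133 | "        vertical-align: middle;\n",
134 | "    }\n",
135 | "\n",
136 | "    .dataframe tbody tr th {\n",
137 | "        vertical-align: top;\n",
138 | "    }\n",
139 | "\n",
140 | "    .dataframe thead th {\n",
141 | "        text-align: right;\n",
142 | "    }\n",
143 | "</style>\n",
144 | "<table border=\"1\" class=\"dataframe\">\n",
145 | "  <thead>\n",
146 | "    <tr style=\"text-align: right;\">\n",
147 | "      <th></th>\n",
148 | "      <th>movie</th>\n",
149 | "      <th>genre</th>\n",
150 | "      <th>rating</th>\n",
151 | "      <th>director</th>\n",
152 | "    </tr>\n",
153 | "  </thead>\n",
154 | "  <tbody>\n",
155 | "    <tr>\n",
156 | "      <th>0</th>\n",
157 | "      <td>Batman</td>\n",
158 | "      <td>action</td>\n",
159 | "      <td>6</td>\n",
160 | "      <td>Tim Burton</td>\n",
161 | "    </tr>\n",
162 | "    <tr>\n",
163 | "      <th>1</th>\n",
164 | "      <td>Jungle Book</td>\n",
165 | "      <td>kids</td>\n",
166 | "      <td>9</td>\n",
167 | "      <td>Wolfgang Reitherman</td>\n",
168 | "    </tr>\n",
169 | "    <tr>\n",
170 | "      <th>2</th>\n",
171 | "      <td>Titanic</td>\n",
172 | "      <td>romance</td>\n",
173 | "      <td>8</td>\n",
174 | "      <td>James Cameron</td>\n",
175 | "    </tr>\n",
176 | "  </tbody>\n",
177 | "</table>\n",
178 | "</div>"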
179 | ],
180 | "text/plain": [
181 | " movie genre rating director\n",
182 | "0 Batman action 6 Tim Burton\n",
183 | "1 Jungle Book kids 9 Wolfgang Reitherman\n",
184 | "2 Titanic romance 8 James Cameron"
185 | ]
186 | },
187 | "execution_count": 4,
188 | "metadata": {},
189 | "output_type": "execute_result"
190 | }
191 | ],
192 | "source": [
193 | "df_movies"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "id": "6afd5ec6",
199 | "metadata": {},
200 | "source": [
201 | "### Loading a csv file into a dataframe\n",
202 | "\n",
203 | "You can also create a dataframe by importing tabular data from a comma-separated values (csv) file or an Excel spreadsheet. A csv file looks something like this:\n",
204 | "\n",
205 | "\n",
206 | "\n",
207 | "To load this csv file into a Pandas DataFrame, we will need to use the Pandas [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function. For data in Excel format, you can use [`read_excel()`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html). We will also need to know the path where the csv file is located. This can be either on your local machine or in the cloud. \n",
208 | "\n",
209 | "Let's load the `movies_data.csv` file as a dataframe. The original file is located on my local machine in a folder called `data/`."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 5,
215 | "id": "52e90654",
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "data": {
220 | "text/html": [
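221 | "<div>\n",
222 | "<style scoped>\n",
223 | "    .dataframe tbody tr th:only-of-type {\n",
224 | "        vertical-align: middle;\n",
225 | "    }\n",
226 | "\n",
227 | "    .dataframe tbody tr th {\n",
228 | "        vertical-align: top;\n",
229 | "    }\n",
230 | "\n",
231 | "    .dataframe thead th {\n",
232 | "        text-align: right;\n",
233 | "    }\n",
234 | "</style>\n",
235 | "<table border=\"1\" class=\"dataframe\">\n",
236 | "  <thead>\n",
237 | "    <tr style=\"text-align: right;\">\n",
238 | "      <th></th>\n",
239 | "      <th>movie</th>\n",
240 | "      <th>genre</th>\n",
241 | "      <th>rating</th>\n",
242 | "      <th>director</th>\n",
243 | "    </tr>\n",
244 | "  </thead>\n",
245 | "  <tbody>\n",
246 | "    <tr>\n",
247 | "      <th>0</th>\n",
248 | "      <td>Batman</td>\n",
249 | "      <td>action</td>\n",
250 | "      <td>6</td>\n",
251 | "      <td>Tim Burton</td>\n",
252 | "    </tr>\n",
253 | "    <tr>\n",
254 | "      <th>1</th>\n",
255 | "      <td>Jungle Book</td>\n",
256 | "      <td>kids</td>\n",
257 | "      <td>9</td>\n",
258 | "      <td>Wolfgang Reitherman</td>\n",
259 | "    </tr>\n",
260 | "    <tr>\n",
261 | "      <th>2</th>\n",
262 | "      <td>Titanic</td>\n",
263 | "      <td>romance</td>\n",
264 | "      <td>8</td>\n",
265 | "      <td>James Cameron</td>\n",
266 | "    </tr>\n",
267 | "  </tbody>\n",
268 | "</table>\n",
269 | "</div>"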
270 | ],
271 | "text/plain": [
272 | " movie genre rating director\n",
273 | "0 Batman action 6 Tim Burton\n",
274 | "1 Jungle Book kids 9 Wolfgang Reitherman\n",
275 | "2 Titanic romance 8 James Cameron"
276 | ]
277 | },
278 | "execution_count": 5,
279 | "metadata": {},
280 | "output_type": "execute_result"
281 | }
282 | ],
283 | "source": [
284 | "df_movies = pd.read_csv(\"data/movies_data.csv\")\n",
285 | "\n",
286 | "df_movies"
287 | ]
288 | },
289 | {
290 | "cell_type": "markdown",
291 | "id": "ea1b631c",
292 | "metadata": {},
293 | "source": [
294 | "This csv-loaded dataframe is identical to the one that was generated from a dictionary. "
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "id": "2629988a",
300 | "metadata": {},
301 | "source": [
302 | "## Pandas Series\n",
303 | "\n",
304 | "An important part of the Pandas DataFrame is the [Pandas Series](https://pandas.pydata.org/docs/reference/series.html). While the DataFrame is a 2-dimensional structure, a Series is 1-dimensional. It can store any datatype (integers, strings, floats, timestamps, even lists). A Series represents a single column of a DataFrame. This is how you get an individual column (represented as a Pandas Series) from a dataframe:\n",
305 | "\n",
306 | "```\n",
307 | "dataframe['column_name'] \n",
308 | "```\n",
309 | "\n",
310 | "Let's say we want to pull the `rating` column from our `df_movies` dataframe."
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": 6,
316 | "id": "f1cf12c4",
317 | "metadata": {},
318 | "outputs": [
319 | {
320 | "data": {
321 | "text/plain": [
322 | "0 6\n",
323 | "1 9\n",
324 | "2 8\n",
325 | "Name: rating, dtype: int64"
326 | ]
327 | },
328 | "execution_count": 6,
329 | "metadata": {},
330 | "output_type": "execute_result"
331 | }
332 | ],
333 | "source": [
334 | "df_movies['rating']"
335 | ]
336 | },
337 | {
338 | "cell_type": "markdown",
339 | "id": "4dcb11cc",
340 | "metadata": {},
341 | "source": [
342 | "The `rating` column is a Pandas Series! We can confirm its datatype:"
343 | ]
344 | },
345 | {
346 | "cell_type": "code",
347 | "execution_count": 7,
348 | "id": "3bc9b2c0",
349 | "metadata": {},
350 | "outputs": [
351 | {
352 | "data": {
353 | "text/plain": [
354 | "pandas.core.series.Series"
355 | ]
356 | },
357 | "execution_count": 7,
358 | "metadata": {},
359 | "output_type": "execute_result"
360 | }
361 | ],
362 | "source": [
363 | "type(df_movies['rating'])"
364 | ]
365 | },
366 | {
367 | "cell_type": "markdown",
368 | "id": "3615b86a",
369 | "metadata": {},
370 | "source": [
371 | "There is a wide range of built-in functions that come with the Pandas Series. Some examples include:\n",
372 | "\n",
373 | "- `.mean()`: if the column is numeric, it gets the average value of the column\n",
374 | "- `.nunique()`: counts the number of unique values in the column \n",
375 | "- `.fillna(value='value')`: fills missing values with 'value' (or any other value of your choosing)\n",
376 | "\n",
377 | "The [official documentation](https://pandas.pydata.org/docs/reference/series.html) on Pandas Series provides a list of all available functions. \n",
378 | "We'll explore the functions of Pandas Series in more detail in the upcoming chapter, Data Exploration. "
379 | ]
380 | }
381 | ],
382 | "metadata": {
383 | "kernelspec": {
384 | "display_name": "Python 3 (ipykernel)",
385 | "language": "python",
386 | "name": "python3"
387 | },
388 | "language_info": {
389 | "codemirror_mode": {
390 | "name": "ipython",
391 | "version": 3
392 | },
393 | "file_extension": ".py",
394 | "mimetype": "text/x-python",
395 | "name": "python",
396 | "nbconvert_exporter": "python",
397 | "pygments_lexer": "ipython3",
398 | "version": "3.9.12"
399 | }
400 | },
401 | "nbformat": 4,
402 | "nbformat_minor": 5
403 | }
404 |
--------------------------------------------------------------------------------
/book/AP_nyc_data_definitions.md:
--------------------------------------------------------------------------------
1 | # NYC Real Estate Data Definitions
2 |
3 | **Source:** [The City of New York, Department of Finance.](https://www1.nyc.gov/assets/finance/downloads/pdf/07pdf/glossary_rsf071607.pdf)
4 |
5 | ## Borough
6 |
7 | The name of the borough in which the property is located.
8 |
9 | ## Neighborhood
10 |
11 | Department of Finance assessors determine the neighborhood name in the course of valuing properties. The common name of the neighborhood is generally the same as the name Finance designates. However, there may be slight differences in neighborhood boundary lines and some sub-neighborhoods may not be included.
12 |
13 | ## Building Class Category
14 |
15 | This is a field that we are including so that users of the Rolling Sales Files can easily identify similar properties by broad usage (e.g. One Family Homes) without looking up individual Building Classes. Files are sorted by Borough, Neighborhood, Building Class Category, Block and Lot.
16 |
17 | ## Tax Class at Present
18 |
19 | Every property in the city is assigned to one of four tax classes (Classes 1, 2, 3, and 4),
20 | based on the use of the property.
21 |
22 | - **Class 1:** Includes most residential property of up to three units (such as one-, two-, and three-family homes and small stores or offices with one or two attached apartments), vacant land that is zoned for residential use, and most condominiums that are not more than three stories.
23 | - **Class 2:** Includes all other property that is primarily residential, such as cooperatives and condominiums.
24 | - **Class 3:** Includes property with equipment owned by a gas, telephone or electric company.
25 | - **Class 4:** Includes all other properties not included in class 1, 2, and 3, such as
26 | offices, factories, warehouses, garage buildings, etc.
27 |
28 | ## Block
29 |
30 | A Tax Block is a sub-division of the borough on which real properties are located. The Department of Finance uses a Borough-Block-Lot classification to label all real property in the City. Whereas addresses describe the street location of a property, the block and lot distinguish one unit of real property from another, such as the different condominiums in a single building. Also, blocks and lots are not subject to name changes based on which side of the parcel the building puts its entrance on.
31 |
32 | ## Lot
33 |
34 | A Tax Lot is a subdivision of a Tax Block and represents the property's unique location.
35 |
36 | ## Easement
37 |
38 | An easement is a right, such as a right of way, which allows an entity to make limited use of another’s real property. For example: MTA railroad tracks that run across a portion of another property.
39 |
40 | ## Building Class at Present
41 |
42 | The Building Classification is used to describe a property’s constructive use. The first position of the Building Class is a letter that is used to describe a general class of properties (for example, "A" signifies one-family homes, "O" signifies office buildings, and "R" signifies
43 | condominiums). The second position, a number, adds more specific information about the property’s use or construction style (using our previous examples, "A0" is a Cape Cod style one-family home, "O4" is a tower-type office building, and "R5" is a commercial condominium unit). The term Building Class used by the Department of Finance is interchangeable with the term Building Code used by the Department of Buildings. See NYC Building Classifications.
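
As a rough illustration of the two-position scheme described above, the Python sketch below splits a Building Class code into its general class letter and its second position, then looks up the description. The helper `split_building_class` and the `PSV_SAMPLE` string are hypothetical names introduced only for this example; the two sample rows are copied from `data/building_class.psv`, and the sketch assumes every code is exactly two characters long.

```python
import csv
import io

# Two rows copied from data/building_class.psv (pipe-separated values).
PSV_SAMPLE = """building_class_code|description
A0|CAPE COD
O4|OFFICE ONLY WITH OR WITHOUT COMM - 20 STORIES OR MORE
"""


def split_building_class(code):
    """Split a two-character Building Class code into (letter, second position).

    Hypothetical helper for illustration. The second position is kept as a
    string because some codes in building_class.psv end in a letter (e.g. "CM").
    """
    code = code.strip().upper()
    if len(code) != 2 or not code[0].isalpha():
        raise ValueError(f"unexpected building class code: {code!r}")
    return code[0], code[1]


# Build a code -> description lookup from the pipe-separated sample.
reader = csv.DictReader(io.StringIO(PSV_SAMPLE), delimiter="|")
descriptions = {row["building_class_code"]: row["description"] for row in reader}

general_class, detail = split_building_class("a0")
print(general_class, detail, descriptions[general_class + detail])
```

In practice the full `building_class.psv` file would replace the inline sample, for example by opening the file and passing the file object to `csv.DictReader` with the same `delimiter="|"` argument.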
44 |
45 | ## Address
46 |
47 | The street address of the property as listed on the Sales File. Coop sales include the apartment number in the address field.
48 |
49 | ## Zip Code
50 |
51 | The property’s postal code.
52 |
53 | ## Residential Units
54 |
55 | The number of residential units at the listed property.
56 |
57 | ## Commercial Units
58 |
59 | The number of commercial units at the listed property.
60 |
61 | ## Total Units
62 |
63 | The total number of units at the listed property.
64 |
65 | ## Land Square Feet
66 |
67 | The land area of the property listed in square feet.
68 |
69 | ## Gross Square Feet
70 |
71 | The total area of all the floors of a building as measured from the exterior surfaces of the outside walls of the building, including the land area and space within any building or structure on the property.
72 |
73 | ## Year Built
74 |
75 | Year the structure on the property was built.
76 |
77 | ## Building Class at Time of Sale
78 |
79 | The Building Classification is used to describe a property’s constructive use. The first position of the Building Class is a letter that is used to describe a general class of properties (for example, "A" signifies one-family homes, "O" signifies office buildings, and "R" signifies condominiums). The second position, a number, adds more specific information about the property’s use or construction style (using our previous examples, "A0" is a Cape Cod style one-family home, "O4" is a tower-type office building, and "R5" is a commercial condominium unit). The term Building Class as used by the Department of Finance is interchangeable with the term Building Code as used by the Department of Buildings.
80 |
81 | ## Sales Price
82 |
83 | Price paid for the property.
84 |
85 | ## Sale Date
86 |
87 | Date the property sold.
88 |
89 | ## $0 Sales Price
90 |
91 | A `$0` sale indicates that there was a transfer of ownership without a cash consideration. There can be a number of reasons for a $0 sale including transfers of ownership from parents to children.
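
Because a `$0` sale reflects a transfer of ownership rather than a market transaction, it is often handled separately when summarizing prices. The following is a minimal sketch in plain Python; the records and the `address` and `sale_price` field names are made up for illustration and are not the dataset's actual column headers.

```python
from statistics import mean

# Hypothetical records; the field names and values are assumptions.
sales = [
    {"address": "1 Example St", "sale_price": 750000},
    {"address": "2 Example Ave", "sale_price": 0},
    {"address": "3 Example Rd", "sale_price": 1200000},
]

# A $0 price marks a transfer of ownership without a cash consideration.
transfers = [sale for sale in sales if sale["sale_price"] == 0]
priced_sales = [sale for sale in sales if sale["sale_price"] > 0]

# Average only over sales that involved a cash consideration.
average_price = mean(sale["sale_price"] for sale in priced_sales)
print(len(transfers), average_price)
```

Whether to drop, flag, or keep `$0` sales depends on the question being asked; separating them first simply keeps them from pulling price averages toward zero.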
--------------------------------------------------------------------------------
/book/_config.yml:
--------------------------------------------------------------------------------
1 | # Book settings
2 | # Learn more at https://jupyterbook.org/customize/config.html
3 |
4 | title: Practical Python for Data Science
5 | author: Jill Cates
6 | copyright: "Jupyter Academy 2022"
7 | logo: logo.png
8 |
9 | # Force re-execution of notebooks on each build.
10 | # See https://jupyterbook.org/content/execute.html
11 | execute:
12 | execute_notebooks: force
13 |
14 | # Define the name of the latex output file for PDF builds
15 | latex:
16 | latex_documents:
17 | targetname: book.tex
18 |
19 | # Add a bibtex file so that we can create citations
20 | # bibtex_bibfiles:
21 | # - references.bib
22 |
23 | # Information about where the book exists on the web
24 | repository:
25 | url: https://github.com/jupyteracademy/practical-python-for-data-science # Online location of your book
26 | branch: main # Which branch of the repository should be used when creating links (optional)
27 | path_to_book: "book/"
28 |
29 | # Add GitHub buttons to your book
30 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
31 | html:
32 | use_issues_button: true
33 | use_repository_button: true
34 | favicon: "https://practicalpython.s3.us-east-2.amazonaws.com/assets/favicon.ico"
35 | extra_navbar: Jupyter Academy
36 | google_analytics_id: UA-219614792-1
--------------------------------------------------------------------------------
/book/_toc.yml:
--------------------------------------------------------------------------------
1 | # Table of contents
2 | # Learn more at https://jupyterbook.org/customize/toc.html
3 |
4 | format: jb-book
5 | root: intro
6 | parts:
7 | - caption: The Book
8 | chapters:
9 | - file: 00_python_crash_course
10 | sections:
11 | - file: 00_python_crash_course_oop
12 | - file: 00_python_crash_course_datatypes
13 | - file: 00_python_crash_course_variables
14 | - file: 00_python_crash_course_functions
15 | - file: 01_pandas_dataframe
16 | - file: 02_loading_data
17 | - file: 03_cleaning_data
18 | - file: 04_data_visualization
19 | - file: 05_data_exploration
20 | - caption: Appendix
21 | chapters:
22 | - file: AP_nyc_data_definitions.md
23 | - file: AP_seaborn_palette.ipynb
--------------------------------------------------------------------------------
/book/data/building_class.psv:
--------------------------------------------------------------------------------
1 | building_class_code|description
2 | A0|CAPE COD
3 | A1|TWO STORIES - DETACHED SM OR MID
4 | A2|ONE STORY - PERMANENT LIVING QUARTER
5 | A3|LARGE SUBURBAN RESIDENCE
6 | A4|CITY RESIDENCE ONE FAMILY
7 | A5|ONE FAMILY ATTACHED OR SEMI-DETACHED
8 | A6|SUMMER COTTAGE
9 | A7|MANSION TYPE OR TOWN HOUSE
10 | A8|BUNGALOW COLONY - COOPERATIVELY OWNED LAND
11 | A9|MISCELLANEOUS ONE FAMILY
12 | B1|TWO FAMILY BRICK
13 | B2|TWO FAMILY FRAME
14 | B3|TWO FAMILY CONVERTED FROM ONE FAMILY
15 | B9|MISCELLANEOUS TWO FAMILY
16 | C0|THREE FAMILIES
17 | C1|OVER SIX FAMILIES WITHOUT STORES
18 | C2|FIVE TO SIX FAMILIES
19 | C3|FOUR FAMILIES
20 | C4|OLD LAW TENEMENT
21 | C5|CONVERTED DWELLINGS OR ROOMING HOUSE
22 | C6|WALK-UP COOPERATIVE
23 | C7|WALK-UP APT. OVER SIX FAMILIES WITH STORES
24 | C8|WALK-UP CO-OP; CONVERSION FROM LOFT/WAREHOUSE
25 | C9|GARDEN APARTMENTS
26 | CM|MOBILE HOMES/TRAILER PARKS
27 | D0|ELEVATOR CO-OP; CONVERSION FROM LOFT/WAREHOUSE
28 | D1|ELEVATOR APT; SEMI-FIREPROOF WITHOUT STORES
29 | D2|ELEVATOR APT; ARTISTS IN RESIDENCE
30 | D3|ELEVATOR APT; FIREPROOF WITHOUT STORES
31 | D4|ELEVATOR COOPERATIVE
32 | D5|ELEVATOR APT; CONVERTED
33 | D6|ELEVATOR APT; FIREPROOF WITH STORES
34 | D7|ELEVATOR APT; SEMI-FIREPROOF WITH STORES
35 | D8|ELEVATOR APT; LUXURY TYPE
36 | D9|ELEVATOR APT; MISCELLANEOUS
37 | E1|FIREPROOF WAREHOUSE
38 | E2|CONTRACTORS WAREHOUSE
39 | E3|SEMI-FIREPROOF WAREHOUSE
40 | E4|METAL FRAME WAREHOUSE
41 | E7|SELF-STORAGE WAREHOUSES
42 | E9|MISCELLANEOUS WAREHOUSE
43 | F1|FACTORY; HEAVY MANUFACTURING - FIREPROOF
44 | F2|FACTORY; SPECIAL CONSTRUCTION - FIREPROOF
45 | F4|FACTORY; INDUSTRIAL SEMI-FIREPROOF
46 | F5|FACTORY; LIGHT MANUFACTURING
47 | F8|FACTORY; TANK FARM
48 | F9|FACTORY; INDUSTRIAL-MISCELLANEOUS
49 | G0|GARAGE; RESIDENTIAL TAX CLASS 1
50 | G1|ALL PARKING GARAGES
51 | G2|AUTO BODY/COLLISION OR AUTO REPAIR
52 | G3|GAS STATION WITH RETAIL STORE
53 | G4|GAS STATION WITH SERVICE/AUTO REPAIR
54 | G5|GAS STATION ONLY WITH/WITHOUT SMALL KIOSK
55 | G6|LICENSED PARKING LOT
56 | G7|UNLICENSED PARKING LOT
57 | G8|CAR SALES/RENTAL WITH SHOWROOM
58 | G9|MISCELLANEOUS GARAGE
59 | GU|CAR SALES OR RENTAL LOTS WITHOUT SHOWROOM
60 | GW|CAR WASH OR LUBRITORIUM FACILITY
61 | HB|BOUTIQUE: 10-100 ROOMS, W/LUXURY FACILITIES, THEMED, STYLISH, W/FULL SVC ACCOMMODATIONS
62 | HH|HOSTELS- BED RENTALS IN DORMITORY-LIKE SETTINGS W/SHARED ROOMS & BATHROOMS
63 | HR|SRO- 1 OR 2 PEOPLE HOUSED IN INDIVIDUAL ROOMS IN MULTIPLE DWELLING AFFORDABLE HOUSING
64 | HS|EXTENDED STAY/SUITE: AMENITIES SIMILAR TO APT; TYPICALLY CHARGE WEEKLY RATES & LESS EXPENSIVE THAN FULL-SERVICE HOTEL
65 | H1|LUXURY HOTEL
66 | H2|FULL SERVICE HOTEL
67 | H3|LIMITED SERVICE; MANY AFFILIATED WITH NATIONAL CHAIN
68 | H4|MOTEL
69 | H5|HOTEL; PRIVATE CLUB, LUXURY TYPE
70 | H6|APARTMENT HOTEL
71 | H7|APARTMENT HOTEL - COOPERATIVELY OWNED
72 | H8|DORMITORY
73 | H9|MISCELLANEOUS HOTEL
74 | I1|HOSPITAL, SANITARIUM, MENTAL INSTITUTION
75 | I2|INFIRMARY
76 | I3|DISPENSARY
77 | I4|HOSPITAL; STAFF FACILITY
78 | I5|HEALTH CENTER, CHILD CENTER, CLINIC
79 | I6|NURSING HOME
80 | I7|ADULT CARE FACILITY
81 | I9|MISCELLANEOUS HOSPITAL, HEALTH CARE FACILITY
82 | J1|THEATRE; ART TYPE LESS THAN 400 SEATS
83 | J2|THEATRE; ART TYPE MORE THAN 400 SEATS
84 | J3|MOTION PICTURE THEATRE WITH BALCONY
85 | J4|LEGITIMATE THEATRE, SOLE USE
86 | J5|THEATRE IN MIXED-USE BUILDING
87 | J6|TELEVISION STUDIO
88 | J7|OFF BROADWAY TYPE THEATRE
89 | J8|MULTIPLEX PICTURE THEATRE
90 | J9|MISCELLANEOUS THEATRE
91 | K1|ONE STORY RETAIL BUILDING
92 | K2|MULTI-STORY RETAIL BUILDING (2 OR MORE)
93 | K3|MULTI-STORY DEPARTMENT STORE
94 | K4|PREDOMINANT RETAIL WITH OTHER USES
95 | K5|STAND-ALONE FOOD ESTABLISHMENT
96 | K6|SHOPPING CENTER WITH OR WITHOUT PARKING
97 | K7|BANKING FACILITIES WITH OR WITHOUT PARKING
98 | K8|BIG BOX RETAIL: NOT AFFIXED & STANDING ON OWN LOT W/PARKING, E.G. COSTCO & BJ'S
99 | K9|MISCELLANEOUS STORE BUILDING
100 | L1|LOFT; OVER 8 STORIES (MID MANH. TYPE)
101 | L2|LOFT; FIREPROOF AND STORAGE TYPE WITHOUT STORES
102 | L3|LOFT; SEMI-FIREPROOF
103 | L8|LOFT; WITH RETAIL STORES OTHER THAN TYPE ONE
104 | L9|MISCELLANEOUS LOFT
105 | M1|CHURCH, SYNAGOGUE, CHAPEL
106 | M2|MISSION HOUSE (NON-RESIDENTIAL)
107 | M3|PARSONAGE, RECTORY
108 | M4|CONVENT
109 | M9|MISCELLANEOUS RELIGIOUS FACILITY
110 | N1|ASYLUM
111 | N2|HOME FOR INDIGENT CHILDREN, AGED, HOMELESS
112 | N3|ORPHANAGE
113 | N4|DETENTION HOUSE FOR WAYWARD GIRLS
114 | N9|MISCELLANEOUS ASYLUM, HOME
115 | O1|OFFICE ONLY - 1 STORY
116 | O2|OFFICE ONLY 2 - 6 STORIES
117 | O3|OFFICE ONLY 7 - 19 STORIES
118 | O4|OFFICE ONLY WITH OR WITHOUT COMM - 20 STORIES OR MORE
119 | O5|OFFICE WITH COMM - 1 TO 6 STORIES
120 | O6|OFFICE WITH COMM 7 - 19 STORIES
121 | O7|PROFESSIONAL BUILDINGS/STAND ALONE FUNERAL HOMES
122 | O8|OFFICE WITH APARTMENTS ONLY (NO COMM)
123 | O9|MISCELLANEOUS AND OLD STYLE BANK BLDGS.
124 | P1|CONCERT HALL
125 | P2|LODGE ROOM
126 | P3|YWCA, YMCA, YWHA, YMHA, PAL
127 | P4|BEACH CLUB
128 | P5|COMMUNITY CENTER
129 | P6|AMUSEMENT PLACE, BATH HOUSE, BOAT HOUSE
130 | P7|MUSEUM
131 | P8|LIBRARY
132 | P9|MISCELLANEOUS INDOOR PUBLIC ASSEMBLY
133 | Q1|PARKS/RECREATION FACILITY
134 | Q2|PLAYGROUND
135 | Q3|OUTDOOR POOL
136 | Q4|BEACH
137 | Q5|GOLF COURSE
138 | Q6|STADIUM, RACE TRACK, BASEBALL FIELD
139 | Q7|TENNIS COURT
140 | Q8|MARINA, YACHT CLUB
141 | Q9|MISCELLANEOUS OUTDOOR RECREATIONAL FACILITY
142 | RA|CULTURAL, MEDICAL, EDUCATIONAL, ETC.
143 | RB|OFFICE SPACE
144 | RG|INDOOR PARKING
145 | RH|HOTEL/BOATEL
146 | RK|RETAIL SPACE
147 | RP|OUTDOOR PARKING
148 | RR|CONDOMINIUM RENTALS
149 | RS|NON-BUSINESS STORAGE SPACE
150 | RT|TERRACES/GARDENS/CABANAS
151 | RW|WAREHOUSE/FACTORY/INDUSTRIAL
152 | R0|SPECIAL CONDOMINIUM BILLING LOT
153 | R1|CONDO; RESIDENTIAL UNIT IN 2-10 UNIT BLDG.
154 | R2|CONDO; RESIDENTIAL UNIT IN WALK-UP BLDG.
155 | R3|CONDO; RESIDENTIAL UNIT IN 1-3 STORY BLDG.
156 | R4|CONDO; RESIDENTIAL UNIT IN ELEVATOR BLDG.
157 | R5|MISCELLANEOUS COMMERCIAL
158 | R6|CONDO; RESID.UNIT OF 1-3 UNIT BLDG-ORIG CLASS 1
159 | R7|CONDO; COMML.UNIT OF 1-3 UNIT BLDG-ORIG CLASS 1
160 | R8|CONDO; COMML.UNIT OF 2-10 UNIT BLDG.
161 | R9|CO-OP WITHIN A CONDOMINIUM
162 | RR|CONDO RENTALS
163 | S0|PRIMARILY 1 FAMILY WITH 2 STORES OR OFFICES
164 | S1|PRIMARILY 1 FAMILY WITH 1 STORE OR OFFICE
165 | S2|PRIMARILY 2 FAMILY WITH 1 STORE OR OFFICE
166 | S3|PRIMARILY 3 FAMILY WITH 1 STORE OR OFFICE
167 | S4|PRIMARILY 4 FAMILY WITH 1 STORE OR OFFICE
168 | S5|PRIMARILY 5-6 FAMILY WITH 1 STORE OR OFFICE
169 | S9|SINGLE OR MULTIPLE DWELLING WITH STORES OR OFFICES
170 | T1|AIRPORT, AIRFIELD, TERMINAL
171 | T2|PIER, DOCK, BULKHEAD
172 | T9|MISCELLANEOUS TRANSPORTATION FACILITY
173 | U0|UTILITY COMPANY LAND AND BUILDING
174 | U1|BRIDGE, TUNNEL, HIGHWAY
175 | U2|GAS OR ELECTRIC UTILITY
176 | U3|CEILING RAILROAD
177 | U4|TELEPHONE UTILITY
178 | U5|COMMUNICATION FACILITY OTHER THAN TELEPHONE
179 | U6|RAILROAD - PRIVATE OWNERSHIP
180 | U7|TRANSPORTATION - PUBLIC OWNERSHIP
181 | U8|REVOCABLE CONSENT
182 | U9|MISCELLANEOUS UTILITY PROPERTY
183 | V0|ZONED RESIDENTIAL; NOT MANHATTAN
184 | V1|ZONED COMMERCIAL OR MANHATTAN RESIDENTIAL
185 | V2|ZONED COMMERCIAL ADJACENT TO CLASS 1 DWELLING: NOT MANHATTAN
186 | V3|ZONED PRIMARILY RESIDENTIAL; NOT MANHATTAN
187 | V4|POLICE OR FIRE DEPARTMENT
188 | V5|SCHOOL SITE OR YARD
189 | V6|LIBRARY, HOSPITAL OR MUSEUM
190 | V7|PORT AUTHORITY OF NEW YORK AND NEW JERSEY
191 | V8|NEW YORK STATE OR US GOVERNMENT
192 | V9|MISCELLANEOUS VACANT LAND
193 | W1|PUBLIC ELEMENTARY, JUNIOR OR SENIOR HIGH
194 | W2|PAROCHIAL SCHOOL, YESHIVA
195 | W3|SCHOOL OR ACADEMY
196 | W4|TRAINING SCHOOL
197 | W5|CITY UNIVERSITY
198 | W6|OTHER COLLEGE AND UNIVERSITY
199 | W7|THEOLOGICAL SEMINARY
200 | W8|OTHER PRIVATE SCHOOL
201 | W9|MISCELLANEOUS EDUCATIONAL FACILITY
202 | Y1|FIRE DEPARTMENT
203 | Y2|POLICE DEPARTMENT
204 | Y3|PRISON, JAIL, HOUSE OF DETENTION
205 | Y4|MILITARY AND NAVAL INSTALLATION
206 | Y5|DEPARTMENT OF REAL ESTATE
207 | Y6|DEPARTMENT OF SANITATION
208 | Y7|DEPARTMENT OF PORTS AND TERMINALS
209 | Y8|DEPARTMENT OF PUBLIC WORKS
210 | Y9|DEPARTMENT OF ENVIRONMENTAL PROTECTION
211 | Z0|TENNIS COURT, POOL, SHED, ETC.
212 | Z1|COURT HOUSE
213 | Z2|PUBLIC PARKING AREA
214 | Z3|POST OFFICE
215 | Z4|FOREIGN GOVERNMENT
216 | Z5|UNITED NATIONS
217 | Z7|EASEMENT
218 | Z8|CEMETERY
219 | Z9|OTHER MISCELLANEOUS
--------------------------------------------------------------------------------
/book/data/movies_data.csv:
--------------------------------------------------------------------------------
1 | movie,genre,rating,director
2 | Batman,action,6,Tim Burton
3 | Jungle Book,kids,9,Wolfgang Reitherman
4 | Titanic,romance,8,James Cameron
5 |
--------------------------------------------------------------------------------
/book/intro.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | **Practical Python for Data Science** by [**Jill Cates**](https://www.jillcates.com/)
4 |
5 |
6 |
7 | Python is the "Swiss Army knife" of programming. Several factors contribute to its versatility:
8 | - it has clean, human-readable syntax, so it's easy to learn
9 | - it's an interpreted, object-oriented scripting language
10 | - it has a strong open-source community and a large repository of Python packages
11 |
12 | Because of its versatility, Python can be applied to both software development (e.g., building web applications and APIs) *and* data science (e.g., scientific computing, creating end-to-end data science pipelines). However, writing Python for data science is *very different* from writing Python for software development. A huge part of the learning curve is getting familiar with the syntax of Python's data science packages, including but not limited to Pandas, NumPy, and scikit-learn.
13 |
14 | In this book, we will focus on how to use Python in the context of data science. We will work with a real-life dataset and explore it using the following data science Python packages:
15 |
16 | - [Pandas](https://pandas.pydata.org/)
17 | - [Seaborn](https://seaborn.pydata.org/)
18 | - [Matplotlib](https://matplotlib.org/)
19 |
20 | # Prerequisites
21 |
22 | This book is designed to be accessible to people without a strong technical background. To make the most of this book, you should have:
23 |
24 | - Basic knowledge of Python
25 | - Some familiarity with Jupyter Notebooks, Pandas, and Seaborn
26 | - Googling skills and ability to read documentation
27 |
28 | # Open a GitHub Issue
29 |
30 | Did you spot an error in this book? Have an idea on how to make the book better? I'm always open to feedback and new ideas. You can contribute by opening a [GitHub issue](https://github.com/jupyteracademy/practical-python-for-data-science/issues) or creating a pull request with the proposed fix.
31 |
32 | # Support This Project
33 |
34 | If you would like to support this open-source project and its continued development and maintenance, you can do so in a few ways:
35 |
36 | - [buy me a coffee](https://www.buymeacoffee.com/jupyteracademy) ☕
37 | - sign up for my upcoming online courses at [Jupyter Academy](https://jupyteracademy.com/) 🍎
38 |
--------------------------------------------------------------------------------
/book/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thehouseofdata/practical-python-for-data-science/31207f2343526350244d79a633661a8e5a09b3a1/book/logo.png
--------------------------------------------------------------------------------
/book/references.bib:
--------------------------------------------------------------------------------
1 | ---
2 | ---
3 |
4 | @inproceedings{holdgraf_evidence_2014,
5 | address = {Brisbane, Australia},
6 | title = {Evidence for {Predictive} {Coding} in {Human} {Auditory} {Cortex}},
7 | booktitle = {International {Conference} on {Cognitive} {Neuroscience}},
8 | publisher = {Frontiers in Neuroscience},
9 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Knight, Robert T.},
10 | year = {2014}
11 | }
12 |
13 | @article{holdgraf_rapid_2016,
14 | title = {Rapid tuning shifts in human auditory cortex enhance speech intelligibility},
15 | volume = {7},
16 | issn = {2041-1723},
17 | url = {http://www.nature.com/doifinder/10.1038/ncomms13654},
18 | doi = {10.1038/ncomms13654},
19 | number = {May},
20 | journal = {Nature Communications},
21 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Rieger, Jochem W. and Crone, Nathan and Lin, Jack J. and Knight, Robert T. and Theunissen, Frédéric E.},
22 | year = {2016},
23 | pages = {13654},
24 | file = {Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:C\:\\Users\\chold\\Zotero\\storage\\MDQP3JWE\\Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:application/pdf}
25 | }
26 |
27 | @inproceedings{holdgraf_portable_2017,
28 | title = {Portable learning environments for hands-on computational instruction using container- and cloud-based technology to teach data science},
29 | volume = {Part F1287},
30 | isbn = {978-1-4503-5272-7},
31 | doi = {10.1145/3093338.3093370},
32 | abstract = {© 2017 ACM. There is an increasing interest in learning outside of the traditional classroom setting. This is especially true for topics covering computational tools and data science, as both are challenging to incorporate in the standard curriculum. These atypical learning environments offer new opportunities for teaching, particularly when it comes to combining conceptual knowledge with hands-on experience/expertise with methods and skills. Advances in cloud computing and containerized environments provide an attractive opportunity to improve the efficiency and ease with which students can learn. This manuscript details recent advances towards using commonly available cloud computing services and advanced cyberinfrastructure support for improving the learning experience in bootcamp-style events. We cover the benefits (and challenges) of using a server hosted remotely instead of relying on student laptops, discuss the technology that was used in order to make this possible, and give suggestions for how others could implement and improve upon this model for pedagogy and reproducibility.},
33 | booktitle = {{ACM} {International} {Conference} {Proceeding} {Series}},
34 | author = {Holdgraf, Christopher Ramsay and Culich, A. and Rokem, A. and Deniz, F. and Alegro, M. and Ushizima, D.},
35 | year = {2017},
36 | keywords = {Teaching, Bootcamps, Cloud computing, Data science, Docker, Pedagogy}
37 | }
38 |
39 | @article{holdgraf_encoding_2017,
40 | title = {Encoding and decoding models in cognitive electrophysiology},
41 | volume = {11},
42 | issn = {16625137},
43 | doi = {10.3389/fnsys.2017.00061},
44 | abstract = {© 2017 Holdgraf, Rieger, Micheli, Martin, Knight and Theunissen. Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of “Encoding” models, in which stimulus features are used to model brain activity, and “Decoding” models, in which neural features are used to generate a stimulus output. Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best-practices in conducting these analyses.},
45 | journal = {Frontiers in Systems Neuroscience},
46 | author = {Holdgraf, Christopher Ramsay and Rieger, J.W. and Micheli, C. and Martin, S. and Knight, R.T. and Theunissen, F.E.},
47 | year = {2017},
48 | keywords = {Decoding models, Encoding models, Electrocorticography (ECoG), Electrophysiology/evoked potentials, Machine learning applied to neuroscience, Natural stimuli, Predictive modeling, Tutorials}
49 | }
50 |
51 | @book{ruby,
52 | title = {The Ruby Programming Language},
53 | author = {Flanagan, David and Matsumoto, Yukihiro},
54 | year = {2008},
55 | publisher = {O'Reilly Media}
56 | }
57 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | jupyter-book
2 | pandas
3 | matplotlib
4 | numpy
5 | seaborn
6 | wordcloud
--------------------------------------------------------------------------------
/runtime.txt:
--------------------------------------------------------------------------------
1 | 3.7
--------------------------------------------------------------------------------