├── .gitignore ├── LICENSE ├── README.md ├── book ├── 00_python_crash_course.ipynb ├── 00_python_crash_course_datatypes.ipynb ├── 00_python_crash_course_functions.ipynb ├── 00_python_crash_course_oop.ipynb ├── 00_python_crash_course_variables.ipynb ├── 01_pandas_dataframe.ipynb ├── 02_loading_data.ipynb ├── 03_cleaning_data.ipynb ├── 04_data_visualization.ipynb ├── 05_data_exploration.ipynb ├── AP_nyc_data_definitions.md ├── AP_seaborn_palette.ipynb ├── _config.yml ├── _toc.yml ├── data │ ├── building_class.psv │ ├── movies_data.csv │ ├── nyc_real_estate.csv │ └── nyc_real_estate_clean.csv ├── intro.md ├── logo.png └── references.bib ├── requirements.txt └── runtime.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | book/_build/ 132 | book/assets/ 133 | *.DS_Store -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2022, Jupyter Academy 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Practical Python for Data Science 2 | 3 | Oh, hey there! Welcome to my open-source book - `Practical Python for Data Science`. This repo contains all of the code that was used to build this book. 4 | 5 | Check out the online book at [www.practicalpythonfordatascience.com](www.practicalpythonfordatascience.com). 6 | 7 | 8 | 9 | 10 | ### Introduction 11 | 12 | Python is the "Swiss Army knife" of programming. There are several factors that contribute to its versatility: 13 | 14 | - it has clean and human-readable syntax so it’s easy to learn 15 | - it’s an interpreted object-oriented scripting language 16 | - it has a strong open-source community and a large repository of Python packages 17 | 18 | Because of its versatility, Python can be applied to both software development (e.g., building web applications and APIs) and data science (e.g., scientific computing, creating end-to-end data science pipelines). However, writing Python for data science is very different from writing Python for software development. A huge part of the learning curve is getting familiar with the syntax of Python’s data science packages including but not limited to Pandas, NumPy, and scikit-learn. 
19 | 20 | In this book, we will focus on how to use Python in the context of data science. We will work with a real-life dataset and explore it using the following data science Python packages: 21 | 22 | - [Pandas](https://pandas.pydata.org/) 23 | - [Seaborn](https://seaborn.pydata.org/) 24 | - [Matplotlib](https://matplotlib.org/) 25 | 26 | # Prerequisites 27 | 28 | This book is designed to be accessible for people without a strong technical background. In order to make the most of this book, the suggested requirements are: 29 | 30 | - Basic knowledge of Python 31 | - Some familiarity with Jupyter Notebooks, Pandas, and Seaborn 32 | - Googling skills and ability to read documentation 33 | 34 | # Open a GitHub Issue 35 | 36 | Did you spot an error in this book? Have an idea on how to make the book better? I'm always open to feedback and new ideas. You can contribute by opening a [GitHub issue](https://github.com/jupyteracademy/practical-python-for-data-science/issues) or creating a pull request with the proposed fix. 37 | 38 | # Support This Project 39 | 40 | If you would like to support this open-source project and its continued development and maintenance, you can do so in a couple of ways: 41 | 42 | - [buy me a coffee](https://www.buymeacoffee.com/jupyteracademy) ☕ 43 | - sign up for my upcoming online courses at [Jupyter Academy](https://jupyteracademy.com/) 💕 44 | 45 | 46 | 47 | -------------------------------------------------------------------------------- /book/00_python_crash_course.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "90cc6d0b", 6 | "metadata": {}, 7 | "source": [ 8 | "# Python Crash Course\n", 9 | "\n", 10 | "You don't need to have world-class Python skills to use Python for data science. While it's helpful to have coding experience, you can get by with knowing the main datatypes of Python and how object-oriented programming works. 
In this Python crash course, we will cover:\n", 11 | "\n", 12 | "1. [Object-Oriented Programming](00_python_crash_course_oop)\n", 13 | "2. [Python Datatypes](00_python_crash_course_datatypes)\n", 14 | "3. [Variables](00_python_crash_course_variables)\n", 15 | "4. [Functions](00_python_crash_course_functions)\n", 16 | "\n", 17 | "" 18 | ] 19 | } 20 | ], 21 | "metadata": { 22 | "jupytext": { 23 | "cell_metadata_filter": "-all", 24 | "main_language": "python", 25 | "notebook_metadata_filter": "-all" 26 | }, 27 | "kernelspec": { 28 | "display_name": "Python 3 (ipykernel)", 29 | "language": "python", 30 | "name": "python3" 31 | }, 32 | "language_info": { 33 | "codemirror_mode": { 34 | "name": "ipython", 35 | "version": 3 36 | }, 37 | "file_extension": ".py", 38 | "mimetype": "text/x-python", 39 | "name": "python", 40 | "nbconvert_exporter": "python", 41 | "pygments_lexer": "ipython3", 42 | "version": "3.9.12" 43 | } 44 | }, 45 | "nbformat": 4, 46 | "nbformat_minor": 5 47 | } 48 | -------------------------------------------------------------------------------- /book/00_python_crash_course_datatypes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "9b833a43", 6 | "metadata": {}, 7 | "source": [ 8 | "# Python Datatypes \n", 9 | "\n", 10 | "All objects in Python have a datatype. If you want to know the datatype of an object, you can simply use the `type()` function. The main datatypes of Python are:\n", 11 | "\n", 12 | "1. [Integer](#integer)\n", 13 | "2. [Float](#float)\n", 14 | "3. [String](#string)\n", 15 | "4. [Boolean](#boolean)\n", 16 | "5. [List](#list) \n", 17 | "6. [Dictionary](#dictionary)\n", 18 | "\n", 19 | "Let's take a look at each one. " 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "8307b235", 25 | "metadata": {}, 26 | "source": [ 27 | "## 1) Integer \n", 28 | "\n", 29 | "The **integer** is a numerical datatype. 
It's a whole number, which means that it has no fractional (decimal) part. \n", 30 | "\n", 31 | "**Examples of integers:**\n", 32 | "\n", 33 | "- population\n", 34 | "- number of cities\n", 35 | "- year " 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 1, 41 | "id": "4974e663", 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/plain": [ 47 | "int" 48 | ] 49 | }, 50 | "execution_count": 1, 51 | "metadata": {}, 52 | "output_type": "execute_result" 53 | } 54 | ], 55 | "source": [ 56 | "population = 1000\n", 57 | "type(population)" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "64c5f934", 63 | "metadata": {}, 64 | "source": [ 65 | "\"int\" is short for \"integer\"! 😎" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "id": "6d77146d", 71 | "metadata": {}, 72 | "source": [ 73 | "## 2) Float \n", 74 | "\n", 75 | "The **float** (short for \"floating-point number\") is a number that has a decimal point. This is useful when more precision is needed than whole numbers can provide.\n", 76 | "\n", 77 | "**Examples of floats:**\n", 78 | "\n", 79 | "- cost of a latte\n", 80 | "- weight\n", 81 | "- distance in miles " 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 2, 87 | "id": "c8f0d2b2", 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "float" 94 | ] 95 | }, 96 | "execution_count": 2, 97 | "metadata": {}, 98 | "output_type": "execute_result" 99 | } 100 | ], 101 | "source": [ 102 | "cost_of_latte = 4.50\n", 103 | "type(cost_of_latte) " 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "id": "d07380e2", 109 | "metadata": {}, 110 | "source": [ 111 | "The result of a division with `/` is also a float (even if the result is mathematically a whole number):" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 3, 117 | "id": "f024fb8b", 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/plain": [ 123 | "float" 124 | 
] 125 | }, 126 | "execution_count": 3, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "cost_per_egg = 12/12 \n", 133 | "type(cost_per_egg)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "id": "7fb4115f", 139 | "metadata": {}, 140 | "source": [ 141 | "### Mixing Floats and Integers\n", 142 | "\n", 143 | "If we want to convert a float to an integer (or vice versa), we can easily do so by wrapping the variable in `int()` or `float()`. Let's try this out:" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 4, 149 | "id": "ada52513", 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | "text": [ 156 | "Cost of apple: float 3.55 --> int 3\n", 157 | "Number of apples: int 10 --> float 10.0\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "cost_per_apple = 3.55\n", 163 | "n_apples = 10\n", 164 | "\n", 165 | "print(f\"Cost of apple: float {cost_per_apple} --> int {int(cost_per_apple)}\")\n", 166 | "print(f\"Number of apples: int {n_apples} --> float {float(n_apples)}\")" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "id": "cadeb5f2", 172 | "metadata": {}, 173 | "source": [ 174 | "When you \"cast\" (convert) a float into an integer using `int()`, it trims the digits after the decimal point and returns only the whole-number part. In other words, `int()` truncates toward zero: it rounds positive numbers down (`int(3.55)` is `3`) and negative numbers up (`int(-3.55)` is `-3`)." 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "id": "728a7cc6", 180 | "metadata": {}, 181 | "source": [ 182 | "```{note}\n", 183 | "The print statements above use something called [f-strings](https://realpython.com/python-f-strings/). Why is it called an **f-string**? In the code above, the string inside each print statement is preceded by an \"f\" - this puts it in \"f-string mode\". To embed an expression in your string, you need to wrap it inside curly braces { }. 
The f-string is only available in Python 3.6 or greater. It lets you embed Python expressions inside string literals in a readable way. Before Python 3.6, you would have to use %-formatting or .format() to embed expressions inside strings, which was much more verbose and prone to error.\n", 184 | "```" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "id": "bfbf11e3", 190 | "metadata": {}, 191 | "source": [ 192 | "In Python, it's possible to mix integers and floats in an arithmetic operation. So you don't need to worry about converting these numeric types into a common format. Let's test it out with our variables n_apples (an integer) and cost_per_apple (a float)." 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 5, 198 | "id": "803b0572", 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "35.5" 205 | ] 206 | }, 207 | "execution_count": 5, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "total_cost = n_apples*cost_per_apple\n", 214 | "total_cost" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "id": "dd2b266b", 220 | "metadata": {}, 221 | "source": [ 222 | "We can see that the output of `n_apples * cost_per_apple` is a float. 
This is because `n_apples`, which was originally an integer, gets converted to a float when it gets multiplied with `cost_per_apple`.\n", 223 | "\n", 224 | "Here are the most common arithmetic operations in Python:\n", 225 | "\n", 226 | "- **Addition:** gets the sum of the operands\n", 227 | "```\n", 228 | "x + y\n", 229 | "```\n", 230 | "- **Subtraction:** gets the difference of the operands\n", 231 | "```\n", 232 | "x - y\n", 233 | "```\n", 234 | "- **Multiplication:** gets the product of the operands\n", 235 | "```\n", 236 | "x * y\n", 237 | "```\n", 238 | "- **Division:** produces the quotient of the operands and returns a float\n", 239 | "```\n", 240 | "x / y\n", 241 | "```\n", 242 | "- **Floor division:** divides the operands and rounds the result down to the nearest whole number (the result is an integer when both operands are integers, and a float otherwise)\n", 243 | "```\n", 244 | "x // y\n", 245 | "```\n", 246 | "- **Exponent:** raises the first operand to the power of the second operand\n", 247 | "```\n", 248 | "x ** y\n", 249 | "```" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "id": "d3656a4b", 255 | "metadata": {}, 256 | "source": [ 257 | "## 3) String \n", 258 | "\n", 259 | "The **string** datatype is typically used to store text. We can think of a string as a \"sequence of characters\", which can be alphabetic, numeric, or special characters. 
A string is surrounded by quotations, which can be either double quotes `\" \"` or single `' '` quotes.\n", 260 | "\n", 261 | "**Examples of strings:**\n", 262 | "\n", 263 | "- name of city \n", 264 | "- address\n", 265 | "- Canadian postal code" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 6, 271 | "id": "c68f4b94", 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "data": { 276 | "text/plain": [ 277 | "str" 278 | ] 279 | }, 280 | "execution_count": 6, 281 | "metadata": {}, 282 | "output_type": "execute_result" 283 | } 284 | ], 285 | "source": [ 286 | "name_of_city = 'Toronto'\n", 287 | "type(name_of_city) " 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "id": "ae5ab0e9", 293 | "metadata": {}, 294 | "source": [ 295 | "If a string contains an apostrophe, we can use double quotes to define the string and use a single quote character in the string." 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 7, 301 | "id": "6837e39d", 302 | "metadata": {}, 303 | "outputs": [ 304 | { 305 | "name": "stdout", 306 | "output_type": "stream", 307 | "text": [ 308 | "It's snowing outside\n" 309 | ] 310 | } 311 | ], 312 | "source": [ 313 | "text = \"It's snowing outside\"\n", 314 | "print(text)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "id": "5ff53e65", 320 | "metadata": {}, 321 | "source": [ 322 | "It's important to note that *anything* surrounded by quotations is treated as a string. For example, if you wrap an integer in quotations, its datatype will be a string. 
" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 8, 328 | "id": "d52359f9", 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "data": { 333 | "text/plain": [ 334 | "str" 335 | ] 336 | }, 337 | "execution_count": 8, 338 | "metadata": {}, 339 | "output_type": "execute_result" 340 | } 341 | ], 342 | "source": [ 343 | "number_of_planets = \"9\"\n", 344 | "type(number_of_planets)" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "id": "8a914f78", 350 | "metadata": {}, 351 | "source": [ 352 | "### Strings within Strings \n", 353 | "\n", 354 | "If we want to see if a shorter string is inside a longer string, we can use the `in` operator." 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 9, 360 | "id": "88118e42", 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "data": { 365 | "text/plain": [ 366 | "True" 367 | ] 368 | }, 369 | "execution_count": 9, 370 | "metadata": {}, 371 | "output_type": "execute_result" 372 | } 373 | ], 374 | "source": [ 375 | "'el' in 'Hello'" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "id": "1a4606b5", 381 | "metadata": {}, 382 | "source": [ 383 | "### Built-in Functions\n", 384 | "\n", 385 | "Strings have some special built-in functions that are useful when you're analyzing data.\n", 386 | "\n", 387 | "- `text.upper()` - converts text to all uppercase \n", 388 | "- `text.lower()` - converts text to all lowercase\n", 389 | "- `text.capitalize()` - capitalizes text (first character is made uppercase, followed by all lowercase characters)\n", 390 | "- `len(text)` - measures the length of a string (i.e., character count)\n", 391 | "- `text.replace('t', 'a')` - replaces a part of the string with another string " 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "id": "13f502b5", 397 | "metadata": {}, 398 | "source": [ 399 | "## 4) Boolean \n", 400 | "\n", 401 | "A boolean is a binary datatype which can be either `True` or `False`. 
For those of you who are familiar with other programming languages, it's important to note that Python's boolean datatype must be capitalized - uppercase T for `True` and uppercase F for `False`. Booleans are often used to answer a yes/no question like \"is it nighttime?\" or \"is the patient female?\". \n", 402 | "\n", 403 | "**Examples of booleans:**\n", 404 | "\n", 405 | "- is it morning?\n", 406 | "- is the patient on meds? \n", 407 | "- does x equal y?" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 10, 413 | "id": "86dbc88d", 414 | "metadata": {}, 415 | "outputs": [ 416 | { 417 | "data": { 418 | "text/plain": [ 419 | "bool" 420 | ] 421 | }, 422 | "execution_count": 10, 423 | "metadata": {}, 424 | "output_type": "execute_result" 425 | } 426 | ], 427 | "source": [ 428 | "is_morning = False\n", 429 | "type(is_morning)" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "id": "88f970b6", 435 | "metadata": {}, 436 | "source": [ 437 | "\"bool\" is short for boolean! 😎" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "id": "6ad80a3f", 443 | "metadata": {}, 444 | "source": [ 445 | "### Comparing Values with Boolean Expressions\n", 446 | "\n", 447 | "A boolean expression evaluates a statement and results in a boolean value. For example, the operator `==` tests if two values are equal." 
448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 11, 453 | "id": "f05820c6", 454 | "metadata": {}, 455 | "outputs": [ 456 | { 457 | "data": { 458 | "text/plain": [ 459 | "False" 460 | ] 461 | }, 462 | "execution_count": 11, 463 | "metadata": {}, 464 | "output_type": "execute_result" 465 | } 466 | ], 467 | "source": [ 468 | "is_vegan = False\n", 469 | "is_vegetarian = True \n", 470 | "\n", 471 | "is_vegan == is_vegetarian" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "id": "4c9adf4e", 477 | "metadata": {}, 478 | "source": [ 479 | "You can also compare two numeric values using:\n", 480 | "\n", 481 | "- `>` (greater than)\n", 482 | "- `<` (less than)\n", 483 | "- `>=` (greater than or equal to)\n", 484 | "- `<=` (less than or equal to)" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 12, 490 | "id": "a8cb2e9c", 491 | "metadata": {}, 492 | "outputs": [ 493 | { 494 | "data": { 495 | "text/plain": [ 496 | "True" 497 | ] 498 | }, 499 | "execution_count": 12, 500 | "metadata": {}, 501 | "output_type": "execute_result" 502 | } 503 | ], 504 | "source": [ 505 | "n_donuts = 10\n", 506 | "n_muffins = 5\n", 507 | "\n", 508 | "n_donuts >= n_muffins" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "id": "53bd7d9a", 514 | "metadata": {}, 515 | "source": [ 516 | "### Comparing Strings with Boolean Expressions\n", 517 | "\n", 518 | "Interestingly, you can also compare two strings. Strings are compared character by character, so for words with the same capitalization the \"larger\" string is the one that comes later in alphabetical order. Note that uppercase letters are ordered before lowercase ones, so `'Zoe' < 'anne'` is `True`. 
" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 13, 524 | "id": "e18321a6", 525 | "metadata": {}, 526 | "outputs": [ 527 | { 528 | "data": { 529 | "text/plain": [ 530 | "False" 531 | ] 532 | }, 533 | "execution_count": 13, 534 | "metadata": {}, 535 | "output_type": "execute_result" 536 | } 537 | ], 538 | "source": [ 539 | "server = 'Anne'\n", 540 | "host = 'Jim'\n", 541 | "\n", 542 | "server > host" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "id": "a80a0b65", 548 | "metadata": {}, 549 | "source": [ 550 | "## 5) List\n", 551 | "\n", 552 | "[Lists](https://www.w3schools.com/python/python_ref_list.asp) represent a collection of objects and are constructed with square brackets, separating items with commas. A list can contain a collection of one datatype:\n", 553 | "\n", 554 | "```\n", 555 | "list_of_integers = [1,2,3,4,5]\n", 556 | "```\n", 557 | "\n", 558 | "It can also contain a collection of mixed datatypes:\n", 559 | "\n", 560 | "```\n", 561 | "list_of_mixed_datatypes = ['cat', 10, 'belarus', True]\n", 562 | "```" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "id": "cfd4d609", 568 | "metadata": {}, 569 | "source": [ 570 | "Let's start with a simple list that captures the number of hours slept by a group of friends:" 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": 14, 576 | "id": "9177fd50", 577 | "metadata": {}, 578 | "outputs": [], 579 | "source": [ 580 | "hours_slept = [10,12,5,8]" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "id": "538cabb9", 586 | "metadata": {}, 587 | "source": [ 588 | "To get the length (count) of a list, you can use `len()`." 
589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 15, 594 | "id": "8649a5fc", 595 | "metadata": {}, 596 | "outputs": [ 597 | { 598 | "data": { 599 | "text/plain": [ 600 | "4" 601 | ] 602 | }, 603 | "execution_count": 15, 604 | "metadata": {}, 605 | "output_type": "execute_result" 606 | } 607 | ], 608 | "source": [ 609 | "len(hours_slept)" 610 | ] 611 | }, 612 | { 613 | "cell_type": "markdown", 614 | "id": "10365d15", 615 | "metadata": {}, 616 | "source": [ 617 | "To get the sum of numbers in a list, you can use `sum()`. This will only work if all elements in the list are numeric. " 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 16, 623 | "id": "07f5b1e9", 624 | "metadata": {}, 625 | "outputs": [ 626 | { 627 | "data": { 628 | "text/plain": [ 629 | "35" 630 | ] 631 | }, 632 | "execution_count": 16, 633 | "metadata": {}, 634 | "output_type": "execute_result" 635 | } 636 | ], 637 | "source": [ 638 | "sum(hours_slept)" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "id": "7cdbe619", 644 | "metadata": {}, 645 | "source": [ 646 | "You can get the smallest and largest values of a list using `min()` and `max()`, respectively." 
647 | ] 648 | }, 649 | { 650 | "cell_type": "code", 651 | "execution_count": 17, 652 | "id": "4db43aa4", 653 | "metadata": {}, 654 | "outputs": [ 655 | { 656 | "data": { 657 | "text/plain": [ 658 | "5" 659 | ] 660 | }, 661 | "execution_count": 17, 662 | "metadata": {}, 663 | "output_type": "execute_result" 664 | } 665 | ], 666 | "source": [ 667 | "min(hours_slept)" 668 | ] 669 | }, 670 | { 671 | "cell_type": "code", 672 | "execution_count": 18, 673 | "id": "798cecd1", 674 | "metadata": {}, 675 | "outputs": [ 676 | { 677 | "data": { 678 | "text/plain": [ 679 | "12" 680 | ] 681 | }, 682 | "execution_count": 18, 683 | "metadata": {}, 684 | "output_type": "execute_result" 685 | } 686 | ], 687 | "source": [ 688 | "max(hours_slept)" 689 | ] 690 | }, 691 | { 692 | "cell_type": "markdown", 693 | "id": "d014e71c", 694 | "metadata": {}, 695 | "source": [ 696 | "### Sorting Lists\n", 697 | "\n", 698 | "You can also sort elements within a list using the `.sort()` method, which sorts the list in place from lowest to highest value." 699 | ] 700 | }, 701 | { 702 | "cell_type": "code", 703 | "execution_count": 19, 704 | "id": "d264d401", 705 | "metadata": {}, 706 | "outputs": [ 707 | { 708 | "data": { 709 | "text/plain": [ 710 | "[5, 8, 10, 12]" 711 | ] 712 | }, 713 | "execution_count": 19, 714 | "metadata": {}, 715 | "output_type": "execute_result" 716 | } 717 | ], 718 | "source": [ 719 | "hours_slept.sort()\n", 720 | "hours_slept" 721 | ] 722 | }, 723 | { 724 | "cell_type": "markdown", 725 | "id": "b6b14ef6", 726 | "metadata": {}, 727 | "source": [ 728 | "You can also reverse the order of the list using `.reverse()`. Since the list is already sorted from lowest to highest, reversing it gives the values from highest to lowest." 
729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": 20, 734 | "id": "10b064e0", 735 | "metadata": {}, 736 | "outputs": [ 737 | { 738 | "data": { 739 | "text/plain": [ 740 | "[12, 10, 8, 5]" 741 | ] 742 | }, 743 | "execution_count": 20, 744 | "metadata": {}, 745 | "output_type": "execute_result" 746 | } 747 | ], 748 | "source": [ 749 | "hours_slept.reverse()\n", 750 | "hours_slept" 751 | ] 752 | }, 753 | { 754 | "cell_type": "markdown", 755 | "id": "b04f2cea", 756 | "metadata": {}, 757 | "source": [ 758 | "### Lists are ordered\n", 759 | "\n", 760 | "Lists are ordered, which means that the order of elements within a list is part of the list's identity. You can have two lists with the exact same elements, but if the order of the elements is different, these lists are not the same. Let's demonstrate this with an example." 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": 21, 766 | "id": "4499f09f", 767 | "metadata": {}, 768 | "outputs": [ 769 | { 770 | "data": { 771 | "text/plain": [ 772 | "False" 773 | ] 774 | }, 775 | "execution_count": 21, 776 | "metadata": {}, 777 | "output_type": "execute_result" 778 | } 779 | ], 780 | "source": [ 781 | "list1 = [1,2,3,4]\n", 782 | "list2 = [4,3,2,1]\n", 783 | "\n", 784 | "list1 == list2" 785 | ] 786 | }, 787 | { 788 | "cell_type": "markdown", 789 | "id": "c353c8e6", 790 | "metadata": {}, 791 | "source": [ 792 | "`list1` and `list2` are not equal to one another since the order of their elements is different." 793 | ] 794 | }, 795 | { 796 | "cell_type": "markdown", 797 | "id": "bca028ad", 798 | "metadata": {}, 799 | "source": [ 800 | "### The Index: Accessing Elements within a List\n", 801 | "\n", 802 | "You can access elements in a list by referencing their index. The index of a list starts at 0, which is probably different from what you're used to if you come from an R or MATLAB background.\n", 803 | "\n", 804 | "\n", 805 | "\n", 806 | "Let's say we want to go grocery shopping. 
We made a list of all the items we want to buy:\n", 807 | "\n", 808 | "\n", 809 | "\n", 810 | "Each item in this list has a location (an index). \n", 811 | "\n", 812 | "\n", 813 | "\n", 814 | "A list can have negative indices too. A negative list index counts from the end of a list. \n", 815 | "\n", 816 | "\n", 817 | "We can get an individual item from a list using `shopping_list[index]`. Let's test this out!" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": 22, 823 | "id": "228f132d", 824 | "metadata": {}, 825 | "outputs": [ 826 | { 827 | "name": "stdout", 828 | "output_type": "stream", 829 | "text": [ 830 | "apples\n", 831 | "carrots\n", 832 | "chocolate\n", 833 | "bananas\n", 834 | "onions\n" 835 | ] 836 | } 837 | ], 838 | "source": [ 839 | "shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']\n", 840 | "\n", 841 | "print(shopping_list[0])\n", 842 | "print(shopping_list[1])\n", 843 | "print(shopping_list[2])\n", 844 | "print(shopping_list[3])\n", 845 | "print(shopping_list[4])" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "id": "4f4457b3", 851 | "metadata": {}, 852 | "source": [ 853 | "Now, let's try calling each item by its negative index. 
" 854 | ] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": 23, 859 | "id": "0a477bdd", 860 | "metadata": {}, 861 | "outputs": [ 862 | { 863 | "name": "stdout", 864 | "output_type": "stream", 865 | "text": [ 866 | "apples\n", 867 | "carrots\n", 868 | "chocolate\n", 869 | "bananas\n", 870 | "onions\n" 871 | ] 872 | } 873 | ], 874 | "source": [ 875 | "print(shopping_list[-5])\n", 876 | "print(shopping_list[-4])\n", 877 | "print(shopping_list[-3])\n", 878 | "print(shopping_list[-2])\n", 879 | "print(shopping_list[-1])" 880 | ] 881 | }, 882 | { 883 | "cell_type": "markdown", 884 | "id": "9106803c", 885 | "metadata": {}, 886 | "source": [ 887 | "### Slicing a list\n", 888 | "\n", 889 | "You can get a subset of a list, or \"slice\" it, using list indices. If `shopping_list` is a list, the expression `[m:n]` returns the portion of `shopping_list` from the index `m` to BUT not including index `n`. Let's see how this works." 890 | ] 891 | }, 892 | { 893 | "cell_type": "code", 894 | "execution_count": 24, 895 | "id": "f6d380d1", 896 | "metadata": {}, 897 | "outputs": [ 898 | { 899 | "data": { 900 | "text/plain": [ 901 | "['carrots', 'chocolate']" 902 | ] 903 | }, 904 | "execution_count": 24, 905 | "metadata": {}, 906 | "output_type": "execute_result" 907 | } 908 | ], 909 | "source": [ 910 | "shopping_list[1:3]" 911 | ] 912 | }, 913 | { 914 | "cell_type": "markdown", 915 | "id": "c2d657e4", 916 | "metadata": {}, 917 | "source": [ 918 | "The code above returns 'carrots' and 'chocolate', which are represented by indices 1 and 2. It didn't return index 3 (bananas) because the second number of the slice is non-inclusive. 
To include index 3, we would have to update the slice to `[1:4]`:" 919 | ] 920 | }, 921 | { 922 | "cell_type": "code", 923 | "execution_count": 25, 924 | "id": "d7fdc8c1", 925 | "metadata": {}, 926 | "outputs": [ 927 | { 928 | "data": { 929 | "text/plain": [ 930 | "['carrots', 'chocolate', 'bananas']" 931 | ] 932 | }, 933 | "execution_count": 25, 934 | "metadata": {}, 935 | "output_type": "execute_result" 936 | } 937 | ], 938 | "source": [ 939 | "shopping_list[1:4]" 940 | ] 941 | }, 942 | { 943 | "cell_type": "markdown", 944 | "id": "49765705", 945 | "metadata": {}, 946 | "source": [ 947 | "### Finding Elements in a List\n", 948 | "\n", 949 | "You can check to see if an element exists inside a list using the `in` operator." 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 26, 955 | "id": "670a3201", 956 | "metadata": {}, 957 | "outputs": [ 958 | { 959 | "data": { 960 | "text/plain": [ 961 | "True" 962 | ] 963 | }, 964 | "execution_count": 26, 965 | "metadata": {}, 966 | "output_type": "execute_result" 967 | } 968 | ], 969 | "source": [ 970 | "'carrots' in shopping_list" 971 | ] 972 | }, 973 | { 974 | "cell_type": "code", 975 | "execution_count": 27, 976 | "id": "4615230f", 977 | "metadata": {}, 978 | "outputs": [ 979 | { 980 | "data": { 981 | "text/plain": [ 982 | "False" 983 | ] 984 | }, 985 | "execution_count": 27, 986 | "metadata": {}, 987 | "output_type": "execute_result" 988 | } 989 | ], 990 | "source": [ 991 | "'milk' in shopping_list" 992 | ] 993 | }, 994 | { 995 | "cell_type": "markdown", 996 | "id": "a7aebc0f", 997 | "metadata": {}, 998 | "source": [ 999 | "### Iterating Over Lists\n", 1000 | "\n", 1001 | "There are several ways to iterate over a list. The traditional approach is to use a `for loop`." 
1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": 28, 1007 | "id": "89965411", 1008 | "metadata": {}, 1009 | "outputs": [ 1010 | { 1011 | "name": "stdout", 1012 | "output_type": "stream", 1013 | "text": [ 1014 | "apples\n", 1015 | "carrots\n", 1016 | "chocolate\n", 1017 | "bananas\n", 1018 | "onions\n" 1019 | ] 1020 | } 1021 | ], 1022 | "source": [ 1023 | "for item in shopping_list:\n", 1024 | " print(item)" 1025 | ] 1026 | }, 1027 | { 1028 | "cell_type": "markdown", 1029 | "id": "4b630e30", 1030 | "metadata": {}, 1031 | "source": [ 1032 | "If you also need the element's index in your for loop, you can access it using `enumerate()`." 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": 29, 1038 | "id": "153975e5", 1039 | "metadata": {}, 1040 | "outputs": [ 1041 | { 1042 | "name": "stdout", 1043 | "output_type": "stream", 1044 | "text": [ 1045 | "1) apples\n", 1046 | "2) carrots\n", 1047 | "3) chocolate\n", 1048 | "4) bananas\n", 1049 | "5) onions\n" 1050 | ] 1051 | } 1052 | ], 1053 | "source": [ 1054 | "for i, item in enumerate(shopping_list):\n", 1055 | " print(f\"{i+1}) {item}\")" 1056 | ] 1057 | }, 1058 | { 1059 | "cell_type": "markdown", 1060 | "id": "bb45d629", 1061 | "metadata": {}, 1062 | "source": [ 1063 | "Another way to iterate over a list is to use list comprehension. This is a one-liner that is useful when you're applying a simple operation to each element in your list. For example, let's make all elements inside `shopping_list` uppercase." 
1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": 30, 1069 | "id": "4a3a185f", 1070 | "metadata": {}, 1071 | "outputs": [ 1072 | { 1073 | "data": { 1074 | "text/plain": [ 1075 | "['APPLES', 'CARROTS', 'CHOCOLATE', 'BANANAS', 'ONIONS']" 1076 | ] 1077 | }, 1078 | "execution_count": 30, 1079 | "metadata": {}, 1080 | "output_type": "execute_result" 1081 | } 1082 | ], 1083 | "source": [ 1084 | "[item.upper() for item in shopping_list]" 1085 | ] 1086 | }, 1087 | { 1088 | "cell_type": "markdown", 1089 | "id": "b7df65c6", 1090 | "metadata": {}, 1091 | "source": [ 1092 | "### Lists are Mutable\n", 1093 | "\n", 1094 | "An important feature of a list is that it's mutable. This means that elements within a list can be added, deleted, or changed after being defined.\n", 1095 | "\n", 1096 | "To add a new element to a list, you can use `.extend()`:" 1097 | ] 1098 | }, 1099 | { 1100 | "cell_type": "code", 1101 | "execution_count": 31, 1102 | "id": "92a9f89f", 1103 | "metadata": {}, 1104 | "outputs": [ 1105 | { 1106 | "data": { 1107 | "text/plain": [ 1108 | "['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk']" 1109 | ] 1110 | }, 1111 | "execution_count": 31, 1112 | "metadata": {}, 1113 | "output_type": "execute_result" 1114 | } 1115 | ], 1116 | "source": [ 1117 | "shopping_list.extend(['milk'])\n", 1118 | "shopping_list" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "id": "20a62882", 1124 | "metadata": {}, 1125 | "source": [ 1126 | "You can also add another list like this:" 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": 32, 1132 | "id": "38748117", 1133 | "metadata": {}, 1134 | "outputs": [ 1135 | { 1136 | "data": { 1137 | "text/plain": [ 1138 | "['apples',\n", 1139 | " 'carrots',\n", 1140 | " 'chocolate',\n", 1141 | " 'bananas',\n", 1142 | " 'onions',\n", 1143 | " 'milk',\n", 1144 | " 'cake',\n", 1145 | " 'watermelon']" 1146 | ] 1147 | }, 1148 | "execution_count": 32, 1149 | 
"metadata": {}, 1150 | "output_type": "execute_result" 1151 | } 1152 | ], 1153 | "source": [ 1154 | "more_food = ['cake', 'watermelon']\n", 1155 | "shopping_list += more_food\n", 1156 | "shopping_list" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "markdown", 1161 | "id": "0c9561ca", 1162 | "metadata": {}, 1163 | "source": [ 1164 | "To remove the last element of a list, you can \"pop\" it:" 1165 | ] 1166 | }, 1167 | { 1168 | "cell_type": "code", 1169 | "execution_count": 33, 1170 | "id": "128f3015", 1171 | "metadata": {}, 1172 | "outputs": [ 1173 | { 1174 | "data": { 1175 | "text/plain": [ 1176 | "['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk', 'cake']" 1177 | ] 1178 | }, 1179 | "execution_count": 33, 1180 | "metadata": {}, 1181 | "output_type": "execute_result" 1182 | } 1183 | ], 1184 | "source": [ 1185 | "shopping_list.pop()\n", 1186 | "shopping_list" 1187 | ] 1188 | }, 1189 | { 1190 | "cell_type": "markdown", 1191 | "id": "fd4d823f", 1192 | "metadata": {}, 1193 | "source": [ 1194 | "If you wanted to remove a specific element from your list, you can use the `remove()` method." 1195 | ] 1196 | }, 1197 | { 1198 | "cell_type": "code", 1199 | "execution_count": 34, 1200 | "id": "5d6f829f", 1201 | "metadata": {}, 1202 | "outputs": [ 1203 | { 1204 | "data": { 1205 | "text/plain": [ 1206 | "['apples', 'chocolate', 'bananas', 'onions', 'milk', 'cake']" 1207 | ] 1208 | }, 1209 | "execution_count": 34, 1210 | "metadata": {}, 1211 | "output_type": "execute_result" 1212 | } 1213 | ], 1214 | "source": [ 1215 | "shopping_list.remove('carrots')\n", 1216 | "shopping_list" 1217 | ] 1218 | }, 1219 | { 1220 | "cell_type": "markdown", 1221 | "id": "bed14dc2", 1222 | "metadata": {}, 1223 | "source": [ 1224 | "## 6) Dictionary\n", 1225 | "\n", 1226 | "Dictionaries are used to store data values in `key:value` pairs. Similar to the [list](#List), a dictionary is a collection of objects. 
It is also **mutable**, meaning that you can add, remove, or change values inside it. \n", 1227 | "\n", 1228 | "```{note}\n", 1229 | "If you’ve ever worked with JSON before, a dictionary is very similar to a JSON object. In fact, if you load a JSON object into Python, it will be expressed as a dictionary. Similarly, you can write a Python dictionary to a JSON file. \n", 1230 | "```\n", 1231 | "\n", 1232 | "With the **list**, we access elements using the index. With the **dictionary**, we access elements using keys. Let's take a look at an example of a dictionary which captures population information about boroughs in New York City: " 1233 | ] 1234 | }, 1235 | { 1236 | "cell_type": "code", 1237 | "execution_count": 35, 1238 | "id": "62413e2a", 1239 | "metadata": {}, 1240 | "outputs": [ 1241 | { 1242 | "data": { 1243 | "text/plain": [ 1244 | "dict" 1245 | ] 1246 | }, 1247 | "execution_count": 35, 1248 | "metadata": {}, 1249 | "output_type": "execute_result" 1250 | } 1251 | ], 1252 | "source": [ 1253 | "population_nyc = {\n", 1254 | " 'bronx': 1472654,\n", 1255 | " 'brooklyn': 2736074,\n", 1256 | " 'manhattan': 1694251, \n", 1257 | " 'queens': 2405464,\n", 1258 | " 'staten_island': 495747\n", 1259 | "}\n", 1260 | "\n", 1261 | "type(population_nyc)" 1262 | ] 1263 | }, 1264 | { 1265 | "cell_type": "markdown", 1266 | "id": "8d4348f1", 1267 | "metadata": {}, 1268 | "source": [ 1269 | "\"dict\" is short for \"dictionary\"! 😎" 1270 | ] 1271 | }, 1272 | { 1273 | "cell_type": "markdown", 1274 | "id": "051cf4bd", 1275 | "metadata": {}, 1276 | "source": [ 1277 | "In this dictionary, the \"key\" is the borough name and the \"value\" is the population of that borough. To get a particular value, we need to know the key of that value.\n", 1278 | "\n", 1279 | "\n", 1280 | "\n", 1281 | "For example, let's say we want to get the population of Manhattan. 
We can do so by doing this:" 1282 | ] 1283 | }, 1284 | { 1285 | "cell_type": "code", 1286 | "execution_count": 36, 1287 | "id": "26ea328a", 1288 | "metadata": {}, 1289 | "outputs": [ 1290 | { 1291 | "data": { 1292 | "text/plain": [ 1293 | "1694251" 1294 | ] 1295 | }, 1296 | "execution_count": 36, 1297 | "metadata": {}, 1298 | "output_type": "execute_result" 1299 | } 1300 | ], 1301 | "source": [ 1302 | "population_nyc['manhattan']" 1303 | ] 1304 | }, 1305 | { 1306 | "cell_type": "markdown", 1307 | "id": "81aa9efe", 1308 | "metadata": {}, 1309 | "source": [ 1310 | "We can get all keys of a dictionary using `.keys()`:" 1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "code", 1315 | "execution_count": 37, 1316 | "id": "9e79d91a", 1317 | "metadata": {}, 1318 | "outputs": [ 1319 | { 1320 | "data": { 1321 | "text/plain": [ 1322 | "dict_keys(['bronx', 'brooklyn', 'manhattan', 'queens', 'staten_island'])" 1323 | ] 1324 | }, 1325 | "execution_count": 37, 1326 | "metadata": {}, 1327 | "output_type": "execute_result" 1328 | } 1329 | ], 1330 | "source": [ 1331 | "population_nyc.keys()" 1332 | ] 1333 | }, 1334 | { 1335 | "cell_type": "markdown", 1336 | "id": "0cda691e", 1337 | "metadata": {}, 1338 | "source": [ 1339 | "We can get all values of a dictionary using `.values()`:" 1340 | ] 1341 | }, 1342 | { 1343 | "cell_type": "code", 1344 | "execution_count": 38, 1345 | "id": "fcdbefb0", 1346 | "metadata": {}, 1347 | "outputs": [ 1348 | { 1349 | "data": { 1350 | "text/plain": [ 1351 | "dict_values([1472654, 2736074, 1694251, 2405464, 495747])" 1352 | ] 1353 | }, 1354 | "execution_count": 38, 1355 | "metadata": {}, 1356 | "output_type": "execute_result" 1357 | } 1358 | ], 1359 | "source": [ 1360 | "population_nyc.values()" 1361 | ] 1362 | }, 1363 | { 1364 | "cell_type": "markdown", 1365 | "id": "f60941e1", 1366 | "metadata": {}, 1367 | "source": [ 1368 | "You can **add** a new key-value pair to the dictionary like this:" 1369 | ] 1370 | }, 1371 | { 1372 | "cell_type": "code", 1373 | 
"execution_count": 39, 1374 | "id": "9d5f67e1", 1375 | "metadata": {}, 1376 | "outputs": [ 1377 | { 1378 | "data": { 1379 | "text/plain": [ 1380 | "{'bronx': 1472654,\n", 1381 | " 'brooklyn': 2736074,\n", 1382 | " 'manhattan': 1694251,\n", 1383 | " 'queens': 2405464,\n", 1384 | " 'staten_island': 495747,\n", 1385 | " 'long_island': 8063232}" 1386 | ] 1387 | }, 1388 | "execution_count": 39, 1389 | "metadata": {}, 1390 | "output_type": "execute_result" 1391 | } 1392 | ], 1393 | "source": [ 1394 | "population_nyc['long_island'] = 8063232\n", 1395 | "population_nyc" 1396 | ] 1397 | }, 1398 | { 1399 | "cell_type": "markdown", 1400 | "id": "64e5425c", 1401 | "metadata": {}, 1402 | "source": [ 1403 | "You can also **change the value** of a key like this:" 1404 | ] 1405 | }, 1406 | { 1407 | "cell_type": "code", 1408 | "execution_count": 40, 1409 | "id": "29120f7d", 1410 | "metadata": {}, 1411 | "outputs": [ 1412 | { 1413 | "data": { 1414 | "text/plain": [ 1415 | "{'bronx': 1472654,\n", 1416 | " 'brooklyn': 2736074,\n", 1417 | " 'manhattan': 1694251,\n", 1418 | " 'queens': 2405464,\n", 1419 | " 'staten_island': 495747,\n", 1420 | " 'long_island': 8}" 1421 | ] 1422 | }, 1423 | "execution_count": 40, 1424 | "metadata": {}, 1425 | "output_type": "execute_result" 1426 | } 1427 | ], 1428 | "source": [ 1429 | "population_nyc['long_island'] = 8\n", 1430 | "population_nyc" 1431 | ] 1432 | }, 1433 | { 1434 | "cell_type": "markdown", 1435 | "id": "efd69a3b", 1436 | "metadata": {}, 1437 | "source": [ 1438 | "Long Island is technically not part of NYC so let's remove it from our dictionary. We can **remove** the \"long_island\" key-value pair using `.pop(key_name)`." 
1439 | ] 1440 | }, 1441 | { 1442 | "cell_type": "code", 1443 | "execution_count": 41, 1444 | "id": "22edcdc7", 1445 | "metadata": {}, 1446 | "outputs": [ 1447 | { 1448 | "data": { 1449 | "text/plain": [ 1450 | "{'bronx': 1472654,\n", 1451 | " 'brooklyn': 2736074,\n", 1452 | " 'manhattan': 1694251,\n", 1453 | " 'queens': 2405464,\n", 1454 | " 'staten_island': 495747}" 1455 | ] 1456 | }, 1457 | "execution_count": 41, 1458 | "metadata": {}, 1459 | "output_type": "execute_result" 1460 | } 1461 | ], 1462 | "source": [ 1463 | "population_nyc.pop('long_island')\n", 1464 | "population_nyc" 1465 | ] 1466 | } 1467 | ], 1468 | "metadata": { 1469 | "jupytext": { 1470 | "cell_metadata_filter": "-all", 1471 | "main_language": "python", 1472 | "notebook_metadata_filter": "-all" 1473 | }, 1474 | "kernelspec": { 1475 | "display_name": "Python 3 (ipykernel)", 1476 | "language": "python", 1477 | "name": "python3" 1478 | }, 1479 | "language_info": { 1480 | "codemirror_mode": { 1481 | "name": "ipython", 1482 | "version": 3 1483 | }, 1484 | "file_extension": ".py", 1485 | "mimetype": "text/x-python", 1486 | "name": "python", 1487 | "nbconvert_exporter": "python", 1488 | "pygments_lexer": "ipython3", 1489 | "version": "3.9.12" 1490 | } 1491 | }, 1492 | "nbformat": 4, 1493 | "nbformat_minor": 5 1494 | } 1495 | -------------------------------------------------------------------------------- /book/00_python_crash_course_functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Functions\n", 8 | "\n", 9 | "We use functions in programming to bundle a set of instructions in a self-contained package. A function is a piece of code written to carry out a specific task. Functions are useful when we want to execute a specific task multiple times within our program.\n", 10 | "\n", 11 | "A function typically has two main components: \n", 12 | "\n", 13 | "1. 
an **input**, which can be given a default value that is used when no argument is passed\n", 14 | "2. an **output**, which is returned once the code inside the function has finished running \n", 15 | "\n", 16 | "\n", 17 | "\n", 18 | "The general structure of a function looks like this:\n", 19 | "\n", 20 | "```\n", 21 | "def task_name(input):\n", 22 | " # task code goes here\n", 23 | " return output \n", 24 | "```\n", 25 | "\n", 26 | "The input(s) of the function are called `parameters`. " 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "## Built-in Python Functions\n", 34 | "\n", 35 | "Python comes with a wide variety of built-in functions. For example, `type()` is a function that takes a value or variable name as its input and returns its datatype as the output. " 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 1, 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "data": { 45 | "text/plain": [ 46 | "int" 47 | ] 48 | }, 49 | "execution_count": 1, 50 | "metadata": {}, 51 | "output_type": "execute_result" 52 | } 53 | ], 54 | "source": [ 55 | "type(100)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "`sum()` is another built-in Python function that takes in a list of values and returns the sum. Similarly, `len()` takes in a list and returns the length of that list."
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 2, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "sum: 60\n", 75 | "len: 3\n" 76 | ] 77 | } 78 | ], 79 | "source": [ 80 | "n_apples = [10,20,30]\n", 81 | "\n", 82 | "print(f\"sum: {sum(n_apples)}\")\n", 83 | "print(f\"len: {len(n_apples)}\")\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "If you want to learn more about a Python function, you can use the `help()` function:" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 3, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "Help on built-in function sum in module builtins:\n", 103 | "\n", 104 | "sum(iterable, /, start=0)\n", 105 | " Return the sum of a 'start' value (default: 0) plus an iterable of numbers\n", 106 | " \n", 107 | " When the iterable is empty, return the start value.\n", 108 | " This function is intended specifically for use with numeric values and may\n", 109 | " reject non-numeric types.\n", 110 | "\n" 111 | ] 112 | } 113 | ], 114 | "source": [ 115 | "help(sum)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "## How to Define Your Own Function\n", 123 | "\n", 124 | "When defining your own function, here are the steps that you should follow:\n", 125 | "\n", 126 | "1. Use the keyword `def` to declare the function and follow this with the function name\n", 127 | "2. Add parameters (inputs) to the function. These go inside the parentheses of the function.\n", 128 | "3. Add statements (instructions/logic) that the function should execute. \n", 129 | "4. End the function with a return statement so that it returns the desired output.\n", 130 | "\n", 131 | "You don't need to have a return statement for your function to be valid. 
What happens when your function doesn't return anything?" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 4, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "hello!\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "def hello():\n", 149 | " print(\"hello!\")\n", 150 | "\n", 151 | "hello()" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "Don't be confused by the function above! It prints `\"hello!\"` but doesn't return it. A function without a return statement returns `None`. " 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "hello!\n", 171 | "Output type: <class 'NoneType'>\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "output = hello()\n", 177 | "\n", 178 | "print(f\"Output type: {type(output)}\")" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "Here's an example that has two parameters (`number_1`, `number_2`) and returns the sum of those two inputs." 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "data": { 195 | "text/plain": [ 196 | "25" 197 | ] 198 | }, 199 | "execution_count": 6, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "def sum_numbers(number_1, number_2):\n", 206 | " total_sum = number_1 + number_2\n", 207 | " return total_sum\n", 208 | "\n", 209 | "x = sum_numbers(10,15)\n", 210 | "x" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "## Documenting Your Function\n", 218 | "\n", 219 | "When defining your function, it's very important to include documentation. A function's documentation is called a `docstring`. 
It typically describes the purpose of your function, what computations it performs, and what gets returned. It also provides information on what your inputs should be. \n", 220 | "\n", 221 | "In Python, there are two main styles for writing docstrings. The first style is [Google](http://google.github.io/styleguide/pyguide.html#Comments):\n", 222 | "\n", 223 | "```\n", 224 | "def func(arg1, arg2):\n", 225 | " \"\"\"Summary line.\n", 226 | "\n", 227 | " Extended description of function.\n", 228 | "\n", 229 | " Args:\n", 230 | " arg1 (int): Description of arg1\n", 231 | " arg2 (str): Description of arg2\n", 232 | "\n", 233 | " Returns:\n", 234 | " bool: Description of return value\n", 235 | "\n", 236 | " \"\"\"\n", 237 | " return True\n", 238 | "```\n", 239 | "\n", 240 | "The second style is [Numpy](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard):\n", 241 | "\n", 242 | "```\n", 243 | "def func(arg1, arg2):\n", 244 | " \"\"\"Summary line.\n", 245 | "\n", 246 | " Extended description of function.\n", 247 | "\n", 248 | " Parameters\n", 249 | " ----------\n", 250 | " arg1 : int\n", 251 | " Description of arg1\n", 252 | " arg2 : str\n", 253 | " Description of arg2\n", 254 | "\n", 255 | " Returns\n", 256 | " -------\n", 257 | " bool\n", 258 | " Description of return value\n", 259 | "\n", 260 | " \"\"\"\n", 261 | " return True\n", 262 | "```\n", 263 | "\n", 264 | "Before building a Python application, it's a good idea to decide which docstring style you want to use. If your docstrings are short and simple, Google's style is a great option. If you have long, in-depth docstrings, you may want to opt for the Numpy style. That being said, this is mainly a style preference. Both docstring styles are valid.\n", 265 | "\n", 266 | "Let's re-write our `sum_numbers()` function with docstrings using the Google style guide." 
267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 7, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "def sum_numbers(number_1, number_2):\n", 276 | " \"\"\"Sums two numbers together. \n", 277 | " \n", 278 | " Args:\n", 279 | " number_1 (int): first number to be summed\n", 280 | " number_2 (int): second number to be summed\n", 281 | " \n", 282 | " Returns:\n", 283 | " int: sum of number_1 and number_2\n", 284 | " \"\"\"\n", 285 | " total_sum = number_1 + number_2\n", 286 | " return total_sum" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "Want to see more examples of Python docstrings in action? Check out the code base of open-source Python packages like [pandas](https://github.com/pandas-dev/pandas/tree/master/pandas) and [scikit-learn](https://github.com/scikit-learn/scikit-learn/tree/master/sklearn) for inspiration. " 294 | ] 295 | } 296 | ], 297 | "metadata": { 298 | "kernelspec": { 299 | "display_name": "Python 3 (ipykernel)", 300 | "language": "python", 301 | "name": "python3" 302 | }, 303 | "language_info": { 304 | "codemirror_mode": { 305 | "name": "ipython", 306 | "version": 3 307 | }, 308 | "file_extension": ".py", 309 | "mimetype": "text/x-python", 310 | "name": "python", 311 | "nbconvert_exporter": "python", 312 | "pygments_lexer": "ipython3", 313 | "version": "3.9.12" 314 | } 315 | }, 316 | "nbformat": 4, 317 | "nbformat_minor": 2 318 | } 319 | -------------------------------------------------------------------------------- /book/00_python_crash_course_oop.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "90cc6d0b", 6 | "metadata": {}, 7 | "source": [ 8 | "# Object-Oriented Programming\n", 9 | "\n", 10 | "Python is an object-oriented programming (OOP) language. In Python, just about everything is an “object”. 
\n", 11 | "\n", 12 | "Objects have their own attributes. Let’s say we have an object called `cat`. A cat's attributes could include color, size, and age. Suppose we want to know the color of the `cat`. We can inspect the color attribute like this:\n", 13 | "\n", 14 | "```\n", 15 | "cat.color \n", 16 | "```\n", 17 | "> red \n", 18 | "\n", 19 | "Objects also have their own methods, which are basically built-in functions that are applied to the object. In this case, the `cat`’s methods could include jumping, sleeping, or playing. This is how we would ask the cat to jump:\n", 20 | "\n", 21 | "```\n", 22 | "cat.jump()\n", 23 | "```" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "id": "80b248d9", 29 | "metadata": {}, 30 | "source": [ 31 | "Now, you might be wondering: where did this `cat` object come from? How did we create it? \n", 32 | "\n", 33 | "An object is an instance of a \"[class](https://docs.python.org/3/tutorial/classes.html)\", which can be thought of as a “blueprint” for creating objects. That means that our object, `cat`, came from a class. Let's call the class `Cat`. The `Cat` class is where the attributes and methods are defined. 
It might look something like this:" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "id": "cb7d055d", 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "class Cat:\n", 44 | " def __init__(self, name, color, age):\n", 45 | " self.name = name\n", 46 | " self.color = color \n", 47 | " self.age = age\n", 48 | " \n", 49 | " def jump(self):\n", 50 | " print(\"jump!\")\n", 51 | "\n", 52 | " def meow(self):\n", 53 | " print(\"meow!\")" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "id": "f7773053", 59 | "metadata": {}, 60 | "source": [ 61 | "The `cat` object was created like this:" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 2, 67 | "id": "ae8ec5cf", 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "meow!\n" 75 | ] 76 | } 77 | ], 78 | "source": [ 79 | "cat = Cat(name='Tabby', color='red', age=2)\n", 80 | "cat.meow()" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "id": "1472445e", 86 | "metadata": {}, 87 | "source": [ 88 | "As we'll learn very soon, all objects have a datatype. The datatype of an object is its class. In the case of our `cat` object, its datatype is `Cat`! " 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "id": "1276db4b", 94 | "metadata": {}, 95 | "source": [ 96 | "```{note}\n", 97 | "When we start learning about dataframes in the next chapter, it'll be helpful to remember 2 things:\n", 98 | "\n", 99 | "- a dataframe attribute looks like: `dataframe.attribute_name` (without parentheses)\n", 100 | "- a dataframe method looks like: `dataframe.method()` (with parentheses)\n", 101 | "\n", 102 | "If this is super confusing, don't worry! We will learn as we go. 
\n", 103 | "```" 104 | ] 105 | } 106 | ], 107 | "metadata": { 108 | "jupytext": { 109 | "cell_metadata_filter": "-all", 110 | "main_language": "python", 111 | "notebook_metadata_filter": "-all" 112 | }, 113 | "kernelspec": { 114 | "display_name": "Python 3 (ipykernel)", 115 | "language": "python", 116 | "name": "python3" 117 | }, 118 | "language_info": { 119 | "codemirror_mode": { 120 | "name": "ipython", 121 | "version": 3 122 | }, 123 | "file_extension": ".py", 124 | "mimetype": "text/x-python", 125 | "name": "python", 126 | "nbconvert_exporter": "python", 127 | "pygments_lexer": "ipython3", 128 | "version": "3.9.9" 129 | } 130 | }, 131 | "nbformat": 4, 132 | "nbformat_minor": 5 133 | } 134 | -------------------------------------------------------------------------------- /book/00_python_crash_course_variables.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Variables\n", 8 | "\n", 9 | "In Python, *everything* is an object. Variables are names given to identify these objects. In other words, a variable can be thought of as a “label” or “name tag” for Python objects that we're working with. As with any good labelling system, this makes it easy to retrieve the object that we're looking for. \n", 10 | "\n", 11 | "" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Example of a Variable\n", 19 | "\n", 20 | "The easiest way to understand a variable is to see how it's used in the wild.\n", 21 | "\n", 22 | "Let's say we have an oven that measures temperature in Celsius, but all of our recipes are in Fahrenheit. We're baking cookies and the recipe says to pre-heat the oven to 350 degrees Fahrenheit. We'll need to convert this to Celsius using this calculation: \n", 23 | "\n", 24 | "$(T-32) \\times \\frac{5}{9}$\n", 25 | "\n", 26 | "where T is the temperature in Fahrenheit. 
" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "data": { 36 | "text/plain": [ 37 | "176.66666666666669" 38 | ] 39 | }, 40 | "execution_count": 1, 41 | "metadata": {}, 42 | "output_type": "execute_result" 43 | } 44 | ], 45 | "source": [ 46 | "(350-32)*(5/9)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "To make the code more clear, we can store temperature as a variable, called `temperature_in_fahrenheit`. " 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "data": { 63 | "text/plain": [ 64 | "176.66666666666669" 65 | ] 66 | }, 67 | "execution_count": 2, 68 | "metadata": {}, 69 | "output_type": "execute_result" 70 | } 71 | ], 72 | "source": [ 73 | "temperature_in_fahrenheit = 350\n", 74 | "\n", 75 | "(temperature_in_fahrenheit-32)*(5/9)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "## Creating a Variable\n", 83 | "\n", 84 | "We can create a variable by assigning it a value. " 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 3, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "10\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "x = 10\n", 102 | "print(x)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "In the example above, variable `x` is assigned the value 10. We can treat `x` as if it were 10 and apply arithmetic operations to it. 
" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "name": "stdout", 119 | "output_type": "stream", 120 | "text": [ 121 | "20\n", 122 | "110\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "print(x*2)\n", 128 | "print(x+100)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "We can re-assign a variable to another value even after it's already been assigned once. When we use the variable after re-assignment, the new value will referenced. The initial value is no longer stored. " 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 5, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "1\n", 148 | "2\n", 149 | "101\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "x = 1\n", 155 | "print(x)\n", 156 | "print(x*2)\n", 157 | "print(x+100)" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "We can also re-assign a variable to a value of another datatype. In doing so, we are changing the datatype of the variable." 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "Datatype of 10: \n", 177 | "Datatype of helloworld: \n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "a = 10\n", 183 | "print(f\"Datatype of {a}: {type(a)}\")\n", 184 | "\n", 185 | "a = \"helloworld\"\n", 186 | "print(f\"Datatype of {a}: {type(a)}\")" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "## Chain Assignment\n", 194 | "\n", 195 | "With chain assignment, you can assign the same value to several variables simultaneously. Let's assign `a`, `b`, and `c` to 100. 
" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "a = b = c = 100" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 8, 210 | "metadata": {}, 211 | "outputs": [ 212 | { 213 | "name": "stdout", 214 | "output_type": "stream", 215 | "text": [ 216 | "100 100 100\n" 217 | ] 218 | } 219 | ], 220 | "source": [ 221 | "print(a, b, c)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "`a`, `b`, and `c` are 3 variables that all have the same value 100. " 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "## Variable Assignment and Objects in Python\n", 236 | "\n", 237 | "In Python, it's important to note that everything is an \"object\". Integers, strings, and floats are treated as objects. So a variable is a symbolic name that is a reference to an object (value). Once an object (value) is assigned to a variable, you can refer to the object by that name. Let's look at an example." 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 9, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "x = 500" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "This assignment creates an integer object with the value 500 and assigns the variable `x` to point to that object.\n", 254 | "\n", 255 | "Now, let's say we want to create a new variable `y` that points to `x`. 
" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 10, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "text/plain": [ 266 | "500" 267 | ] 268 | }, 269 | "execution_count": 10, 270 | "metadata": {}, 271 | "output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "y = x \n", 276 | "y" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "Assigning one variable to another does not create a new object. Instead, it creates a new symbolic reference, `y`, which points to the same object that `x` points to. What happens when we re-assign `x` to another value?" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 11, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "x = 111" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "A new integer object gets created with the value 111 and `x` becomes a reference to it. The value of `y` is still referencing the original value that `x` was assigned to. " 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 12, 305 | "metadata": {}, 306 | "outputs": [ 307 | { 308 | "data": { 309 | "text/plain": [ 310 | "500" 311 | ] 312 | }, 313 | "execution_count": 12, 314 | "metadata": {}, 315 | "output_type": "execute_result" 316 | } 317 | ], 318 | "source": [ 319 | "y" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "## Global and Local Variables\n", 327 | "\n", 328 | "A global variable is defined outside a function and can be accessed inside any function within your Python environment. Let's create a variable called `w`. It's a global variable because it's defined outside of a function. 
" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 13, 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "data": { 338 | "text/plain": [ 339 | "'hi!'" 340 | ] 341 | }, 342 | "execution_count": 13, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "w = 'hi!'\n", 349 | "w" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "We can access `w` inside a function. Let's create a function that prints `w`." 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 14, 362 | "metadata": {}, 363 | "outputs": [ 364 | { 365 | "name": "stdout", 366 | "output_type": "stream", 367 | "text": [ 368 | "hi!\n" 369 | ] 370 | } 371 | ], 372 | "source": [ 373 | "def print_greetings():\n", 374 | " print(w)\n", 375 | " \n", 376 | "print_greetings()" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "If a variable is defined inside a function, it's called a local variable and can only be accessed inside that function. " 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 15, 389 | "metadata": {}, 390 | "outputs": [ 391 | { 392 | "name": "stdout", 393 | "output_type": "stream", 394 | "text": [ 395 | "hola!\n" 396 | ] 397 | } 398 | ], 399 | "source": [ 400 | "def print_greetings():\n", 401 | " w = 'hola!'\n", 402 | " print(w)\n", 403 | "\n", 404 | "print_greetings()" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "We created a local `w` variable inside our function `print_greetings`. This takes priority over the outside global variable. That's why the value of local variable `w` gets printed (\"hola!\") instead of the global variable value (\"hi!\").\n", 412 | "\n", 413 | "That being said, if we were to print `w` on its own, it will refer to the global variable rather than the local variable. 
This is because the local variable cannot be accessed outside of the function that it's defined in." 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 16, 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "hi!\n" 426 | ] 427 | } 428 | ], 429 | "source": [ 430 | "print(w)" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "## Variable Naming\n", 438 | "\n", 439 | "When writing a Python script or application, it's important to give your variable a descriptive name. This is especially true for data science projects where the name of your variable can give more information on its purpose at first glance. \n", 440 | "\n", 441 | "Here are some general rules about variable naming in Python: \n", 442 | "\n", 443 | "- In Javascript, variables tend to follow the `camelCase` convention. In Python, we use `snake_case` where every word is separated with an underscore.\n", 444 | "- Variables can contain digits but the first character of a variable name cannot be a digit. For example, `a2` is a legitimate variable name but `2a` would raise an error. \n", 445 | "- Variable names are case-sensitive so you can create two variables with the same spelling but if they have different lower-case/upper-case letters, they will be treated as two separate variables. The general trend is to keep your variable names lower-case (i.e., use `age` instead of `Age`)." 
446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 17, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "name": "stdout", 455 | "output_type": "stream", 456 | "text": [ 457 | "age: 10, Age: 50\n" 458 | ] 459 | } 460 | ], 461 | "source": [ 462 | "age = 10\n", 463 | "Age = 50\n", 464 | "\n", 465 | "print(f\"age: {age}, Age: {Age}\")" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "In Python, there are several keywords that are restricted from being used as variable names. These should never be used as variable names. You can check out the reserved keywords below:" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": 18, 478 | "metadata": {}, 479 | "outputs": [ 480 | { 481 | "name": "stdout", 482 | "output_type": "stream", 483 | "text": [ 484 | "\n", 485 | "Here is a list of the Python keywords. Enter any keyword to get more help.\n", 486 | "\n", 487 | "False break for not\n", 488 | "None class from or\n", 489 | "True continue global pass\n", 490 | "__peg_parser__ def if raise\n", 491 | "and del import return\n", 492 | "as elif in try\n", 493 | "assert else is while\n", 494 | "async except lambda with\n", 495 | "await finally nonlocal yield\n", 496 | "\n" 497 | ] 498 | } 499 | ], 500 | "source": [ 501 | "help(\"keywords\")" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "Variable naming is partly a style preference, but there are suggested guidelines to follow in Python's official Style Guide. You can check out the suggestions [here](https://www.python.org/dev/peps/pep-0008/#naming-conventions). 
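The naming rules in this section can also be checked programmatically. The following is a minimal sketch, assuming only the standard-library `keyword` module and the built-in `str.isidentifier()` method; the helper `is_valid_variable_name` and the candidate names are made up for illustration and are not part of the book's code.

```python
import keyword

def is_valid_variable_name(name):
    # Hypothetical helper: a usable name must be a valid identifier
    # (for example, it cannot start with a digit) and must not be a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

# Candidate names chosen for illustration only
candidates = ['age', 'Age', 'a2', '2a', 'class', 'total_sales']
results = {name: is_valid_variable_name(name) for name in candidates}

for name, is_valid in results.items():
    print(name, is_valid)
```

A check like this only covers what Python will accept. It says nothing about style, so `Age` passes even though lower-case `snake_case` names are the convention.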
" 509 | ] 510 | } 511 | ], 512 | "metadata": { 513 | "kernelspec": { 514 | "display_name": "Python 3 (ipykernel)", 515 | "language": "python", 516 | "name": "python3" 517 | }, 518 | "language_info": { 519 | "codemirror_mode": { 520 | "name": "ipython", 521 | "version": 3 522 | }, 523 | "file_extension": ".py", 524 | "mimetype": "text/x-python", 525 | "name": "python", 526 | "nbconvert_exporter": "python", 527 | "pygments_lexer": "ipython3", 528 | "version": "3.9.12" 529 | } 530 | }, 531 | "nbformat": 4, 532 | "nbformat_minor": 2 533 | } 534 | -------------------------------------------------------------------------------- /book/01_pandas_dataframe.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "70594b81", 6 | "metadata": {}, 7 | "source": [ 8 | "# Getting to Know the Pandas DataFrame" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "6fdaa7fa", 14 | "metadata": {}, 15 | "source": [ 16 | "The [Pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) is a data structure that allows us to manipulate and analyze tabular data. A \"tabular\" data structure can be thought of as a matrix, where rows represent observations and columns represent features that describe each observation. It's a structure that you would find in a SQL database or Excel spreadsheet. Let's say we have a tabular dataset about movies.\n", 17 | "\n", 18 | "\n", 19 | "\n", 20 | "In this case, each row represents a movie and each column represents a characteristic about the movie like the genre, rating, and director. The \"index\" column represents a row's position in the dataframe. By default, a Pandas DataFrame's index starts at 0." 
21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "cdda2b7c", 26 | "metadata": {}, 27 | "source": [ 28 | "## Importing the Pandas package\n", 29 | "\n", 30 | "In order to create and use a Pandas DataFrame, we need to have the `pandas` package readily available in our environment. Let's import `pandas` and give it the alias of \"pd\" so that we don't have to write out \"pandas\" every time we call a function.\n", 31 | "\n", 32 | "" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 1, 38 | "id": "f9dd60c4", 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "import pandas as pd " 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "id": "cd2658d2", 48 | "metadata": {}, 49 | "source": [ 50 | "## Creating a dataframe\n", 51 | "\n", 52 | "There are several ways to create a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). Here, we'll describe 2 approaches. \n", 53 | "\n", 54 | "### Converting a dictionary to a dataframe\n", 55 | "\n", 56 | "You can create a dataframe from a dictionary. Each key of the dictionary represents a column name and the value of the dictionary is a list that represents values belonging to that particular column. Each element of the list represents the value of a row in the dataframe. \n", 57 | "\n", 58 | "\n", 59 | "\n", 60 | "Let's create a dataframe called `df_movies`.\n", 61 | "\n", 62 | "```{note}\n", 63 | "`df` is short for \"dataframe\". It's common for data scientists to name their dataframe \"df\". 
\n", 64 | "```" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 2, 70 | "id": "27f8611b", 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "data = {\n", 75 | " 'movie': ['Batman', 'Jungle Book', 'Titanic'], \n", 76 | " 'genre': ['action', 'kids', 'romance'], \n", 77 | " 'rating': [6, 9, 8],\n", 78 | " 'director': ['Tim Burton', 'Wolfgang Reitherman', 'James Cameron']\n", 79 | "}\n", 80 | "\n", 81 | "df_movies = pd.DataFrame(data)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "id": "f5b2b9b9", 87 | "metadata": {}, 88 | "source": [ 89 | "We can confirm that `df_movies` is indeed a dataframe:" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 3, 95 | "id": "c0c71ad1", 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "pandas.core.frame.DataFrame" 102 | ] 103 | }, 104 | "execution_count": 3, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "type(df_movies)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "id": "2ac9bef6", 116 | "metadata": {}, 117 | "source": [ 118 | "Now let's see how it looks 👀:" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 4, 124 | "id": "09c7f239", 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "data": { 129 | "text/html": [ 130 | "
" 179 | ], 180 | "text/plain": [ 181 | " movie genre rating director\n", 182 | "0 Batman action 6 Tim Burton\n", 183 | "1 Jungle Book kids 9 Wolfgang Reitherman\n", 184 | "2 Titanic romance 8 James Cameron" 185 | ] 186 | }, 187 | "execution_count": 4, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "df_movies" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "id": "6afd5ec6", 199 | "metadata": {}, 200 | "source": [ 201 | "### Loading a csv file into a dataframe\n", 202 | "\n", 203 | "You can also create a dataframe by importing tabular data from a comma-separated-value (csv) file, or Excel spreadsheet. A csv file looks somthing like this:\n", 204 | "\n", 205 | "\n", 206 | "\n", 207 | "To load this csv file into a Pandas DataFrame, we will need to use the Pandas [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function. For data in Excel format, you can use [`read_excel()`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html). We will also need to know the path where the csv file is located. This can be either on your local machine or in the cloud. \n", 208 | "\n", 209 | "Let's load in `movies_data.csv` file as a dataframe. The original file is located on my local machine in a folder called `data/`." 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 5, 215 | "id": "52e90654", 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/html": [ 221 | "
" 270 | ], 271 | "text/plain": [ 272 | " movie genre rating director\n", 273 | "0 Batman action 6 Tim Burton\n", 274 | "1 Jungle Book kids 9 Wolfgang Reitherman\n", 275 | "2 Titanic romance 8 James Cameron" 276 | ] 277 | }, 278 | "execution_count": 5, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "df_movies = pd.read_csv(\"data/movies_data.csv\")\n", 285 | "\n", 286 | "df_movies" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "id": "ea1b631c", 292 | "metadata": {}, 293 | "source": [ 294 | "This csv-loaded dataframe is identical to the one that was generated from a dictionary. " 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "id": "2629988a", 300 | "metadata": {}, 301 | "source": [ 302 | "## Pandas Series\n", 303 | "\n", 304 | "An important part of the Pandas DataFrame is the [Pandas Series](https://pandas.pydata.org/docs/reference/series.html). While the DataFrame is a 2-dimensional structure, a Series is 1-dimensional. It can store any datatype (integers, strings, floats, timestamps, even lists). A Series represents a single column of a DataFrame. This is how you get an individual column (represented as a Pandas Series) from a dataframe:\n", 305 | "\n", 306 | "```\n", 307 | "dataframe['column_name'] \n", 308 | "```\n", 309 | "\n", 310 | "Let's say we want to pull the `rating` column from our `df_movies` dataframe." 
311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": 6, 316 | "id": "f1cf12c4", 317 | "metadata": {}, 318 | "outputs": [ 319 | { 320 | "data": { 321 | "text/plain": [ 322 | "0 6\n", 323 | "1 9\n", 324 | "2 8\n", 325 | "Name: rating, dtype: int64" 326 | ] 327 | }, 328 | "execution_count": 6, 329 | "metadata": {}, 330 | "output_type": "execute_result" 331 | } 332 | ], 333 | "source": [ 334 | "df_movies['rating']" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "id": "4dcb11cc", 340 | "metadata": {}, 341 | "source": [ 342 | "The `rating` column is a Pandas Series! We can confirm its datatype:" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 7, 348 | "id": "3bc9b2c0", 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/plain": [ 354 | "pandas.core.series.Series" 355 | ] 356 | }, 357 | "execution_count": 7, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "type(df_movies['rating'])" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "id": "3615b86a", 369 | "metadata": {}, 370 | "source": [ 371 | "There is a wide range of built-in functions that come with the Pandas Series. Some examples include:\n", 372 | "\n", 373 | "- `.mean()`: if the column is numeric, it gets the average value of the column\n", 374 | "- `.nunique()`: counts the number of unique values in a particular column \n", 375 | "- `.fillna(value='value')`: fills missing values with 'value' (or any other value of your choosing)\n", 376 | "\n", 377 | "The [official documentation](https://pandas.pydata.org/docs/reference/series.html) on Pandas Series provides a list of all available functions. \n", 378 | "We'll explore the functions of Pandas Series in more detail in the upcoming chapter, Data Exploration. 
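The three Series methods listed above can be tried on a small example. This is a minimal sketch, assuming `pandas` is installed; the `ratings` Series and its missing value are invented for illustration and do not come from `movies_data.csv`.

```python
import pandas as pd

# Hypothetical ratings column with one missing value (illustration only)
ratings = pd.Series([6, 9, None, 8], name='rating')

mean_rating = ratings.mean()      # missing values are skipped by default
unique_count = ratings.nunique()  # missing values are not counted by default
filled = ratings.fillna(value=0)  # returns a new Series with the gap replaced by 0

print(mean_rating)
print(unique_count)
print(filled.tolist())
```

Note that `fillna()` returns a new Series and leaves `ratings` unchanged, so the result has to be assigned to a name if you want to keep it.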
" 379 | ] 380 | } 381 | ], 382 | "metadata": { 383 | "kernelspec": { 384 | "display_name": "Python 3 (ipykernel)", 385 | "language": "python", 386 | "name": "python3" 387 | }, 388 | "language_info": { 389 | "codemirror_mode": { 390 | "name": "ipython", 391 | "version": 3 392 | }, 393 | "file_extension": ".py", 394 | "mimetype": "text/x-python", 395 | "name": "python", 396 | "nbconvert_exporter": "python", 397 | "pygments_lexer": "ipython3", 398 | "version": "3.9.12" 399 | } 400 | }, 401 | "nbformat": 4, 402 | "nbformat_minor": 5 403 | } 404 | -------------------------------------------------------------------------------- /book/AP_nyc_data_definitions.md: -------------------------------------------------------------------------------- 1 | # NYC Real Estate Data Defintions 2 | 3 | **Source:** [The City of New York, Department of Finance.](https://www1.nyc.gov/assets/finance/downloads/pdf/07pdf/glossary_rsf071607.pdf) 4 | 5 | ## Borough 6 | 7 | The name of the borough in which the property is located. 8 | 9 | ## Neighborhood 10 | 11 | Department of Finance assessors determine the neighborhood name in the course of valuing properties. The common name of the neighborhood is generally the same as the name Finance designates. However, there may be slight differences in neighborhood boundary lines and some sub-neighborhoods may not be included. 12 | 13 | ## Building Class Category 14 | 15 | This is a field that we are including so that users of the Rolling Sales Files can easily identify similar properties by broad usage (e.g. One Family Homes) without looking up individual Building Classes. Files are sorted by Borough, Neighborhood, Building Class Category, Block and Lot. 16 | 17 | ## Tax Class at Present 18 | 19 | Every property in the city is assigned to one of four tax classes (Classes 1, 2, 3, and 4), 20 | based on the use of the property. 
21 | 22 | - **Class 1:** Includes most residential property of up to three units (such as one-, two-, and three-family homes and small stores or offices with one or two attached apartments), vacant land that is zoned for residential use, and most condominiums that are not more than three stories. 23 | - **Class 2:** Includes all other property that is primarily residential, such as cooperatives and condominiums. 24 | - **Class 3:** Includes property with equipment owned by a gas, telephone or electric company. 25 | - **Class 4:** Includes all other properties not included in class 1, 2, and 3, such as 26 | offices, factories, warehouses, garage buildings, etc. 27 | 28 | ## Block 29 | 30 | A Tax Block is a sub-division of the borough on which real properties are located. The Department of Finance uses a Borough-Block-Lot classification to label all real property in the City. Whereas addresses describe the street location of a property, the block and lot distinguish one unit of real property from another, such as the different condominiums in a single building. Also, blocks and lots are not subject to name changes based on which side of the parcel the building puts its entrance on. 31 | 32 | ## Lot 33 | 34 | A Tax Lot is a subdivision of a Tax Block and represents the property’s unique location. 35 | 36 | ## Easement 37 | 38 | An easement is a right, such as a right of way, which allows an entity to make limited use of another’s real property. For example: MTA railroad tracks that run across a portion of another property. 39 | 40 | ## Building Class at Present 41 | 42 | The Building Classification is used to describe a property’s constructive use. The first position of the Building Class is a letter that is used to describe a general class of properties (for example "A" signifies one-family homes, "O" signifies office buildings, "R" signifies 43 | condominiums). 
The second position, a number, adds more specific information about the property’s use or construction style (using our previous examples “A0” is a Cape Cod style one family home, “O4” is a tower type office building and “R5” is a commercial condominium unit). The term Building Class used by the Department of Finance is interchangeable with the term Building Code used by the Department of Buildings. See NYC Building Classifications. 44 | 45 | ## Address 46 | 47 | The street address of the property as listed on the Sales File. Coop sales include the apartment number in the address field. 48 | 49 | ## Zip Code 50 | 51 | The property’s postal code 52 | 53 | ## Residential Units 54 | 55 | The number of residential units at the listed property. 56 | 57 | ## Commercial Units 58 | 59 | The number of commercial units at the listed property. 60 | 61 | ## Total Units 62 | 63 | The total number of units at the listed property. 64 | 65 | ## Land Square Feet 66 | 67 | The land area of the property listed in square feet. 68 | 69 | ## Gross Square Feet 70 | 71 | The total area of all the floors of a building as measured from the exterior surfaces of the outside walls of the building, including the land area and space within any building or structure on the property. 72 | 73 | ## Year Built 74 | 75 | Year the structure on the property was built. 76 | 77 | ## Building Class at Time of Sale 78 | 79 | The Building Classification is used to describe a property’s constructive use. The first position of the Building Class is a letter that is used to describe a general class of properties (for example "A" signifies one-family homes, "O" signifies office buildings. "R" signifies condominiums). The second position, a number, adds more specific information about the property’s use or construction style (using our previous examples "A0" is a Cape Cod style one family home, “O4” is a tower type office building and "R5" is a commercial condominium unit). 
The term Building Class as used by the Department of Finance is interchangeable with the term Building Code as used by the Department of Buildings. 80 | 81 | ## Sales Price 82 | 83 | Price paid for the property. 84 | 85 | ## Sale Date 86 | 87 | Date the property sold. 88 | 89 | ## $0 Sales Price 90 | 91 | A `$0` sale indicates that there was a transfer of ownership without a cash consideration. There can be a number of reasons for a $0 sale including transfers of ownership from parents to children. -------------------------------------------------------------------------------- /book/_config.yml: -------------------------------------------------------------------------------- 1 | # Book settings 2 | # Learn more at https://jupyterbook.org/customize/config.html 3 | 4 | title: Practical Python for Data Science 5 | author: Jill Cates 6 | copyright: "Jupyter Academy 2022" 7 | logo: logo.png 8 | 9 | # Force re-execution of notebooks on each build. 10 | # See https://jupyterbook.org/content/execute.html 11 | execute: 12 | execute_notebooks: force 13 | 14 | # Define the name of the latex output file for PDF builds 15 | latex: 16 | latex_documents: 17 | targetname: book.tex 18 | 19 | # Add a bibtex file so that we can create citations 20 | # bibtex_bibfiles: 21 | # - references.bib 22 | 23 | # Information about where the book exists on the web 24 | repository: 25 | url: https://github.com/jupyteracademy/practical-python-for-data-science # Online location of your book 26 | branch: main # Which branch of the repository should be used when creating links (optional) 27 | path_to_book: "book/" 28 | 29 | # Add GitHub buttons to your book 30 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository 31 | html: 32 | use_issues_button: true 33 | use_repository_button: true 34 | favicon: "https://practicalpython.s3.us-east-2.amazonaws.com/assets/favicon.ico" 35 | extra_navbar: Jupyter Academy 36 | google_analytics_id: UA-219614792-1 
-------------------------------------------------------------------------------- /book/_toc.yml: -------------------------------------------------------------------------------- 1 | # Table of contents 2 | # Learn more at https://jupyterbook.org/customize/toc.html 3 | 4 | format: jb-book 5 | root: intro 6 | parts: 7 | - caption: The Book 8 | chapters: 9 | - file: 00_python_crash_course 10 | sections: 11 | - file: 00_python_crash_course_oop 12 | - file: 00_python_crash_course_datatypes 13 | - file: 00_python_crash_course_variables 14 | - file: 00_python_crash_course_functions 15 | - file: 01_pandas_dataframe 16 | - file: 02_loading_data 17 | - file: 03_cleaning_data 18 | - file: 04_data_visualization 19 | - file: 05_data_exploration 20 | - caption: Appendix 21 | chapters: 22 | - file: AP_nyc_data_definitions.md 23 | - file: AP_seaborn_palette.ipynb -------------------------------------------------------------------------------- /book/data/building_class.psv: -------------------------------------------------------------------------------- 1 | building_class_code|description 2 | A0|CAPE COD 3 | A1|TWO STORIES - DETACHED SM OR MID 4 | A2|ONE STORY - PERMANENT LIVING QUARTER 5 | A3|LARGE SUBURBAN RESIDENCE 6 | A4|CITY RESIDENCE ONE FAMILY 7 | A5|ONE FAMILY ATTACHED OR SEMI-DETACHED 8 | A6|SUMMER COTTAGE 9 | A7|MANSION TYPE OR TOWN HOUSE 10 | A8|BUNGALOW COLONY - COOPERATIVELY OWNED LAND 11 | A9|MISCELLANEOUS ONE FAMILY 12 | B1|TWO FAMILY BRICK 13 | B2|TWO FAMILY FRAME 14 | B3|TWO FAMILY CONVERTED FROM ONE FAMILY 15 | B9|MISCELLANEOUS TWO FAMILY 16 | C0|THREE FAMILIES 17 | C1|OVER SIX FAMILIES WITHOUT STORES 18 | C2|FIVE TO SIX FAMILIES 19 | C3|FOUR FAMILIES 20 | C4|OLD LAW TENEMENT 21 | C5|CONVERTED DWELLINGS OR ROOMING HOUSE 22 | C6|WALK-UP COOPERATIVE 23 | C7|WALK-UP APT. 
OVER SIX FAMILIES WITH STORES 24 | C8|WALK-UP CO-OP; CONVERSION FROM LOFT/WAREHOUSE 25 | C9|GARDEN APARTMENTS 26 | CM|MOBILE HOMES/TRAILER PARKS 27 | D0|ELEVATOR CO-OP; CONVERSION FROM LOFT/WAREHOUSE 28 | D1|ELEVATOR APT; SEMI-FIREPROOF WITHOUT STORES 29 | D2|ELEVATOR APT; ARTISTS IN RESIDENCE 30 | D3|ELEVATOR APT; FIREPROOF WITHOUT STORES 31 | D4|ELEVATOR COOPERATIVE 32 | D5|ELEVATOR APT; CONVERTED 33 | D6|ELEVATOR APT; FIREPROOF WITH STORES 34 | D7|ELEVATOR APT; SEMI-FIREPROOF WITH STORES 35 | D8|ELEVATOR APT; LUXURY TYPE 36 | D9|ELEVATOR APT; MISCELLANEOUS 37 | E1|FIREPROOF WAREHOUSE 38 | E2|CONTRACTORS WAREHOUSE 39 | E3|SEMI-FIREPROOF WAREHOUSE 40 | E4|METAL FRAME WAREHOUSE 41 | E7|SELF-STORAGE WAREHOUSES 42 | E9|MISCELLANEOUS WAREHOUSE 43 | F1|FACTORY; HEAVY MANUFACTURING - FIREPROOF 44 | F2|FACTORY; SPECIAL CONSTRUCTION - FIREPROOF 45 | F4|FACTORY; INDUSTRIAL SEMI-FIREPROOF 46 | F5|FACTORY; LIGHT MANUFACTURING 47 | F8|FACTORY; TANK FARM 48 | F9|FACTORY; INDUSTRIAL-MISCELLANEOUS 49 | G0|GARAGE; RESIDENTIAL TAX CLASS 1 50 | G1|ALL PARKING GARAGES 51 | G2|AUTO BODY/COLLISION OR AUTO REPAIR 52 | G3|GAS STATION WITH RETAIL STORE 53 | G4|GAS STATION WITH SERVICE/AUTO REPAIR 54 | G5|GAS STATION ONLY WITH/WITHOUT SMALL KIOSK 55 | G6|LICENSED PARKING LOT 56 | G7|UNLICENSED PARKING LOT 57 | G8|CAR SALES/RENTAL WITH SHOWROOM 58 | G9|MISCELLANEOUS GARAGE 59 | GU|CAR SALES OR RENTAL LOTS WITHOUT SHOWROOM 60 | GW|CAR WASH OR LUBRITORIUM FACILITY 61 | HB|BOUTIQUE: 10-100 ROOMS, W/LUXURY FACILITIES, THEMED, STYLISH, W/FULL SVC ACCOMMODATIONS 62 | HH|HOSTELS- BED RENTALS IN DORMITORY-LIKE SETTINGS W/SHARED ROOMS & BATHROOMS 63 | HR|SRO- 1 OR 2 PEOPLE HOUSED IN INDIVIDUAL ROOMS IN MULTIPLE DWELLING AFFORDABLE HOUSING 64 | HS|EXTENDED STAY/SUITE: AMENITIES SIMILAR TO APT; TYPICALLY CHARGE WEEKLY RATES & LESS EXPENSIVE THAN FULL-SERVICE HOTEL 65 | H1|LUXURY HOTEL 66 | H2|FULL SERVICE HOTEL 67 | H3|LIMITED SERVICE; MANY AFFILIATED WITH NATIONAL CHAIN 68 | H4|MOTEL 69 | H5|HOTEL; 
PRIVATE CLUB, LUXURY TYPE 70 | H6|APARTMENT HOTEL 71 | H7|APARTMENT HOTEL - COOPERATIVELY OWNED 72 | H8|DORMITORY 73 | H9|MISCELLANEOUS HOTEL 74 | I1|HOSPITAL, SANITARIUM, MENTAL INSTITUTION 75 | I2|INFIRMARY 76 | I3|DISPENSARY 77 | I4|HOSPITAL; STAFF FACILITY 78 | I5|HEALTH CENTER, CHILD CENTER, CLINIC 79 | I6|NURSING HOME 80 | I7|ADULT CARE FACILITY 81 | I9|MISCELLANEOUS HOSPITAL, HEALTH CARE FACILITY 82 | J1|THEATRE; ART TYPE LESS THAN 400 SEATS 83 | J2|THEATRE; ART TYPE MORE THAN 400 SEATS 84 | J3|MOTION PICTURE THEATRE WITH BALCONY 85 | J4|LEGITIMATE THEATRE, SOLE USE 86 | J5|THEATRE IN MIXED-USE BUILDING 87 | J6|TELEVISION STUDIO 88 | J7|OFF BROADWAY TYPE THEATRE 89 | J8|MULTIPLEX PICTURE THEATRE 90 | J9|MISCELLANEOUS THEATRE 91 | K1|ONE STORY RETAIL BUILDING 92 | K2|MULTI-STORY RETAIL BUILDING (2 OR MORE) 93 | K3|MULTI-STORY DEPARTMENT STORE 94 | K4|PREDOMINANT RETAIL WITH OTHER USES 95 | K5|STAND-ALONE FOOD ESTABLISHMENT 96 | K6|SHOPPING CENTER WITH OR WITHOUT PARKING 97 | K7|BANKING FACILITIES WITH OR WITHOUT PARKING 98 | K8|BIG BOX RETAIL: NOT AFFIXED & STANDING ON OWN LOT W/PARKING, E.G. COSTCO & BJ'S 99 | K9|MISCELLANEOUS STORE BUILDING 100 | L1|LOFT; OVER 8 STORIES (MID MANH. 
TYPE) 101 | L2|LOFT; FIREPROOF AND STORAGE TYPE WITHOUT STORES 102 | L3|LOFT; SEMI-FIREPROOF 103 | L8|LOFT; WITH RETAIL STORES OTHER THAN TYPE ONE 104 | L9|MISCELLANEOUS LOFT 105 | M1|CHURCH, SYNAGOGUE, CHAPEL 106 | M2|MISSION HOUSE (NON-RESIDENTIAL) 107 | M3|PARSONAGE, RECTORY 108 | M4|CONVENT 109 | M9|MISCELLANEOUS RELIGIOUS FACILITY 110 | N1|ASYLUM 111 | N2|HOME FOR INDIGENT CHILDREN, AGED, HOMELESS 112 | N3|ORPHANAGE 113 | N4|DETENTION HOUSE FOR WAYWARD GIRLS 114 | N9|MISCELLANEOUS ASYLUM, HOME 115 | O1|OFFICE ONLY - 1 STORY 116 | O2|OFFICE ONLY 2 - 6 STORIES 117 | O3|OFFICE ONLY 7 - 19 STORIES 118 | O4|OFFICE ONLY WITH OR WITHOUT COMM - 20 STORIES OR MORE 119 | O5|OFFICE WITH COMM - 1 TO 6 STORIES 120 | O6|OFFICE WITH COMM 7 - 19 STORIES 121 | O7|PROFESSIONAL BUILDINGS/STAND ALONE FUNERAL HOMES 122 | O8|OFFICE WITH APARTMENTS ONLY (NO COMM) 123 | O9|MISCELLANEOUS AND OLD STYLE BANK BLDGS. 124 | P1|CONCERT HALL 125 | P2|LODGE ROOM 126 | P3|YWCA, YMCA, YWHA, YMHA, PAL 127 | P4|BEACH CLUB 128 | P5|COMMUNITY CENTER 129 | P6|AMUSEMENT PLACE, BATH HOUSE, BOAT HOUSE 130 | P7|MUSEUM 131 | P8|LIBRARY 132 | P9|MISCELLANEOUS INDOOR PUBLIC ASSEMBLY 133 | Q1|PARKS/RECREATION FACILTY 134 | Q2|PLAYGROUND 135 | Q3|OUTDOOR POOL 136 | Q4|BEACH 137 | Q5|GOLF COURSE 138 | Q6|STADIUM, RACE TRACK, BASEBALL FIELD 139 | Q7|TENNIS COURT 140 | Q8|MARINA, YACHT CLUB 141 | Q9|MISCELLANEOUS OUTDOOR RECREATIONAL FACILITY 142 | RA|CULTURAL, MEDICAL, EDUCATIONAL, ETC. 143 | RB|OFFICE SPACE 144 | RG|INDOOR PARKING 145 | RH|HOTEL/BOATEL 146 | RK|RETAIL SPACE 147 | RP|OUTDOOR PARKING 148 | RR|CONDOMINIUM RENTALS 149 | RS|NON-BUSINESS STORAGE SPACE 150 | RT|TERRACES/GARDENS/CABANAS 151 | RW|WAREHOUSE/FACTORY/INDUSTRIAL 152 | R0|SPECIAL CONDOMINIUM BILLING LOT 153 | R1|CONDO; RESIDENTIAL UNIT IN 2-10 UNIT BLDG. 154 | R2|CONDO; RESIDENTIAL UNIT IN WALK-UP BLDG. 155 | R3|CONDO; RESIDENTIAL UNIT IN 1-3 STORY BLDG. 156 | R4|CONDO; RESIDENTIAL UNIT IN ELEVATOR BLDG. 
157 | R5|MISCELLANEOUS COMMERCIAL 158 | R6|CONDO; RESID.UNIT OF 1-3 UNIT BLDG-ORIG CLASS 1 159 | R7|CONDO; COMML.UNIT OF 1-3 UNIT BLDG-ORIG CLASS 1 160 | R8|CONDO; COMML.UNIT OF 2-10 UNIT BLDG. 161 | R9|CO-OP WITHIN A CONDOMINIUM 162 | RR|CONDO RENTALS 163 | S0|PRIMARILY 1 FAMILY WITH 2 STORES OR OFFICES 164 | S1|PRIMARILY 1 FAMILY WITH 1 STORE OR OFFICE 165 | S2|PRIMARILY 2 FAMILY WITH 1 STORE OR OFFICE 166 | S3|PRIMARILY 3 FAMILY WITH 1 STORE OR OFFICE 167 | S4|PRIMARILY 4 FAMILY WITH 1 STORE OROFFICE 168 | S5|PRIMARILY 5-6 FAMILY WITH 1 STORE OR OFFICE 169 | S9|SINGLE OR MULTIPLE DWELLING WITH STORES OR OFFICES 170 | T1|AIRPORT, AIRFIELD, TERMINAL 171 | T2|PIER, DOCK, BULKHEAD 172 | T9|MISCELLANEOUS TRANSPORTATION FACILITY 173 | U0|UTILITY COMPANY LAND AND BUILDING 174 | U1|BRIDGE, TUNNEL, HIGHWAY 175 | U2|GAS OR ELECTRIC UTILITY 176 | U3|CEILING RAILROAD 177 | U4|TELEPHONE UTILITY 178 | U5|COMMUNICATION FACILITY OTHER THAN TELEPHONE 179 | U6|RAILROAD - PRIVATE OWNERSHIP 180 | U7|TRANSPORTATION - PUBLIC OWNERSHIP 181 | U8|REVOCABLE CONSENT 182 | U9|MISCELLANEOUS UTILITY PROPERTY 183 | V0|ZONED RESIDENTIAL; NOT MANHATTAN 184 | V1|ZONED COMMERCIAL OR MANHATTAN RESIDENTIAL 185 | V2|ZONED COMMERCIAL ADJACENT TO CLASS 1 DWELLING: NOT MANHATTAN 186 | V3|ZONED PRIMARILY RESIDENTIAL; NOT MANHATTAN 187 | V4|POLICE OR FIRE DEPARTMENT 188 | V5|SCHOOL SITE OR YARD 189 | V6|LIBRARY, HOSPITAL OR MUSEUM 190 | V7|PORT AUTHORITY OF NEW YORK AND NEW JERSEY 191 | V8|NEW YORK STATE OR US GOVERNMENT 192 | V9|MISCELLANEOUS VACANT LAND 193 | W1|PUBLIC ELEMENTARY, JUNIOR OR SENIOR HIGH 194 | W2|PAROCHIAL SCHOOL, YESHIVA 195 | W3|SCHOOL OR ACADEMY 196 | W4|TRAINING SCHOOL 197 | W5|CITY UNIVERSITY 198 | W6|OTHER COLLEGE AND UNIVERSITY 199 | W7|THEOLOGICAL SEMINARY 200 | W8|OTHER PRIVATE SCHOOL 201 | W9|MISCELLANEOUS EDUCATIONAL FACILITY 202 | Y1|FIRE DEPARTMENT 203 | Y2|POLICE DEPARTMENT 204 | Y3|PRISON, JAIL, HOUSE OF DETENTION 205 | Y4|MILITARY AND NAVAL INSTALLATION 206 | 
Y5|DEPARTMENT OF REAL ESTATE 207 | Y6|DEPARTMENT OF SANITATION 208 | Y7|DEPARTMENT OF PORTS AND TERMINALS 209 | Y8|DEPARTMENT OF PUBLIC WORKS 210 | Y9|DEPARTMENT OF ENVIRONMENTAL PROTECTION 211 | Z0|TENNIS COURT, POOL, SHED, ETC. 212 | Z1|COURT HOUSE 213 | Z2|PUBLIC PARKING AREA 214 | Z3|POST OFFICE 215 | Z4|FOREIGN GOVERNMENT 216 | Z5|UNITED NATIONS 217 | Z7|EASEMENT 218 | Z8|CEMETERY 219 | Z9|OTHER MISCELLANEOUS -------------------------------------------------------------------------------- /book/data/movies_data.csv: -------------------------------------------------------------------------------- 1 | movie,genre,rating,director 2 | Batman,action,6,Tim Burton 3 | Jungle Book,kids,9,Wolfgang Reitherman 4 | Titanic,romance,8,James Cameron 5 | -------------------------------------------------------------------------------- /book/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | **Practical Python for Data Science** by [**Jill Cates**](https://www.jillcates.com/) 4 | 5 | 6 | 7 | Python is the "Swiss Army knife" of programming. Several factors contribute to its versatility: 8 | - it has clean and human-readable syntax so it's easy to learn 9 | - it's an interpreted object-oriented scripting language 10 | - it has a strong open-source community and a large repository of Python packages 11 | 12 | Because of its versatility, Python can be applied to both software development (e.g., building web applications and APIs) *and* data science (e.g., scientific computing, creating end-to-end data science pipelines). However, writing Python for data science is *very different* from writing Python for software development. A huge part of the learning curve is getting familiar with the syntax of Python's data science packages, including but not limited to Pandas, NumPy, and scikit-learn. 13 | 14 | In this book, we will focus on how to use Python in the context of data science.
We will work with a real-life dataset and explore it using the following data science Python packages: 15 | 16 | - [Pandas](https://pandas.pydata.org/) 17 | - [Seaborn](https://seaborn.pydata.org/) 18 | - [Matplotlib](https://matplotlib.org/) 19 | 20 | # Prerequisites 21 | 22 | This book is designed to be accessible to people without a strong technical background. To make the most of this book, the suggested requirements are: 23 | 24 | - Basic knowledge of Python 25 | - Some familiarity with Jupyter Notebooks, Pandas, and Seaborn 26 | - Googling skills and ability to read documentation 27 | 28 | # Open a GitHub Issue 29 | 30 | Did you spot an error in this book? Have an idea on how to make the book better? I'm always open to feedback and new ideas. You can contribute by opening a [GitHub issue](https://github.com/jupyteracademy/practical-python-for-data-science/issues) or creating a pull request with the proposed fix. 31 | 32 | # Support This Project 33 | 34 | If you would like to support this open-source project and its continued development and maintenance, you can do so in a few ways: 35 | 36 | - [buy me a coffee](https://www.buymeacoffee.com/jupyteracademy) ☕ 37 | - sign up for my upcoming online courses at [Jupyter Academy](https://jupyteracademy.com/) 🍎 38 | -------------------------------------------------------------------------------- /book/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thehouseofdata/practical-python-for-data-science/31207f2343526350244d79a633661a8e5a09b3a1/book/logo.png -------------------------------------------------------------------------------- /book/references.bib: -------------------------------------------------------------------------------- 1 | --- 2 | --- 3 | 4 | @inproceedings{holdgraf_evidence_2014, 5 | address = {Brisbane, Australia, Australia}, 6 | title = {Evidence for {Predictive} {Coding} in {Human} {Auditory} {Cortex}}, 7 |
booktitle = {International {Conference} on {Cognitive} {Neuroscience}}, 8 | publisher = {Frontiers in Neuroscience}, 9 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Knight, Robert T.}, 10 | year = {2014} 11 | } 12 | 13 | @article{holdgraf_rapid_2016, 14 | title = {Rapid tuning shifts in human auditory cortex enhance speech intelligibility}, 15 | volume = {7}, 16 | issn = {2041-1723}, 17 | url = {http://www.nature.com/doifinder/10.1038/ncomms13654}, 18 | doi = {10.1038/ncomms13654}, 19 | number = {May}, 20 | journal = {Nature Communications}, 21 | author = {Holdgraf, Christopher Ramsay and de Heer, Wendy and Pasley, Brian N. and Rieger, Jochem W. and Crone, Nathan and Lin, Jack J. and Knight, Robert T. and Theunissen, Frédéric E.}, 22 | year = {2016}, 23 | pages = {13654}, 24 | file = {Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:C\:\\Users\\chold\\Zotero\\storage\\MDQP3JWE\\Holdgraf et al. - 2016 - Rapid tuning shifts in human auditory cortex enhance speech intelligibility.pdf:application/pdf} 25 | } 26 | 27 | @inproceedings{holdgraf_portable_2017, 28 | title = {Portable learning environments for hands-on computational instruction using container-and cloud-based technology to teach data science}, 29 | volume = {Part F1287}, 30 | isbn = {978-1-4503-5272-7}, 31 | doi = {10.1145/3093338.3093370}, 32 | abstract = {© 2017 ACM. There is an increasing interest in learning outside of the traditional classroom setting. This is especially true for topics covering computational tools and data science, as both are challenging to incorporate in the standard curriculum. These atypical learning environments offer new opportunities for teaching, particularly when it comes to combining conceptual knowledge with hands-on experience/expertise with methods and skills. 
Advances in cloud computing and containerized environments provide an attractive opportunity to improve the efficiency and ease with which students can learn. This manuscript details recent advances towards using commonly-available cloud computing services and advanced cyberinfrastructure support for improving the learning experience in bootcamp-style events. We cover the benefits (and challenges) of using a server hosted remotely instead of relying on student laptops, discuss the technology that was used in order to make this possible, and give suggestions for how others could implement and improve upon this model for pedagogy and reproducibility.}, 33 | booktitle = {{ACM} {International} {Conference} {Proceeding} {Series}}, 34 | author = {Holdgraf, Christopher Ramsay and Culich, A. and Rokem, A. and Deniz, F. and Alegro, M. and Ushizima, D.}, 35 | year = {2017}, 36 | keywords = {Teaching, Bootcamps, Cloud computing, Data science, Docker, Pedagogy} 37 | } 38 | 39 | @article{holdgraf_encoding_2017, 40 | title = {Encoding and decoding models in cognitive electrophysiology}, 41 | volume = {11}, 42 | issn = {16625137}, 43 | doi = {10.3389/fnsys.2017.00061}, 44 | abstract = {© 2017 Holdgraf, Rieger, Micheli, Martin, Knight and Theunissen. Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of “Encoding” models, in which stimulus features are used to model brain activity, and “Decoding” models, in which neural features are used to generate a stimulus output.
Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best-practices in conducting these analyses.}, 45 | journal = {Frontiers in Systems Neuroscience}, 46 | author = {Holdgraf, Christopher Ramsay and Rieger, J.W. and Micheli, C. and Martin, S. and Knight, R.T.
and Theunissen, F.E.}, 47 | year = {2017}, 48 | keywords = {Decoding models, Encoding models, Electrocorticography (ECoG), Electrophysiology/evoked potentials, Machine learning applied to neuroscience, Natural stimuli, Predictive modeling, Tutorials} 49 | } 50 | 51 | @book{ruby, 52 | title = {The Ruby Programming Language}, 53 | author = {Flanagan, David and Matsumoto, Yukihiro}, 54 | year = {2008}, 55 | publisher = {O'Reilly Media} 56 | } 57 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter-book 2 | pandas 3 | matplotlib 4 | numpy 5 | seaborn 6 | matplotlib 7 | wordcloud -------------------------------------------------------------------------------- /runtime.txt: -------------------------------------------------------------------------------- 1 | 3.7 --------------------------------------------------------------------------------