├── .gitignore ├── LICENSE ├── OOP_workshop └── OOP_Workshop.ipynb ├── README.md ├── intro_to_git ├── Git_Workshop_v4.html ├── Git_Workshop_v4.ipynb ├── README.md └── images │ ├── cloning.png │ ├── cloning_adding.png │ ├── commit.png │ ├── evilclone.png │ ├── logpush.png │ ├── newrepo.png │ ├── nice.png │ ├── personal-access-token-form.png │ ├── postpush.png │ └── viewconflict.png ├── kaggle_workshop ├── Student Exam Performance Workshop.ipynb └── StudentsPerformance.csv ├── nba ├── Data Analysis of NBA Players Challenge.ipynb ├── LICENSE ├── NBA Player Statistics Workshop.ipynb ├── README.md ├── basketball.py └── requirements.txt ├── testing_workshop ├── LICENSE.txt ├── README.md ├── docs │ └── .gitkeep ├── fixtures │ └── .gitkeep ├── motorsports │ ├── __init__.py │ ├── buildings.py │ └── vehicles.py ├── requirements.txt ├── setup.py └── tests │ ├── __init__.py │ ├── test_buildings.py │ ├── test_imports.py │ └── test_vehicles.py └── xbus-501-01.software-engineering-for-data.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | 55 | # Sphinx documentation 56 | docs/_build/ 57 | 58 | # PyBuilder 59 | target/ 60 | 61 | #Ipython Notebook 62 | .ipynb_checkpoints 63 | 64 | .DS_Store 65 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Georgetown Data Analytics (CCPE) 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /OOP_workshop/OOP_Workshop.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Object-Oriented Python\n", 8 | "\n", 9 | "Object-oriented programming (OOP) is a way of writing programs that represent real-world problem spaces (in terms of objects, functions, classes, attributes, methods, and inheritance). As Allen Downey explains in [__Think Python__](http://www.greenteapress.com/thinkpython/html/thinkpython018.html), in object-oriented programming, we shift away from framing the *function* as the active agent and toward seeing the *object* as the active agent.\n", 10 | "\n", 11 | "In this workshop, we are going to create a class that represents the rational numbers. This tutorial is adapted from content in Anand Chitipothu's [__Python Practice Book__](http://anandology.com/python-practice-book/index.html). It was created by [Rebecca Bilbro](https://github.com/rebeccabilbro/Tutorials/tree/master/OOP).\n", 12 | "\n", 13 | "## Part 1: Classes, methods, modules, and packages.\n", 14 | "\n", 15 | "In our first part we will modify the code in this Jupyter notebook by filling in the missing functionality. We have a `RationalNumber` class which can be used to make objects of type... `RationalNumber`. These are basically fractions and we can leverage the built-ins in Python to take two objects of this type and add, multiply, subtract, or divide using traditional math operations such as `+`, `*`, `-`, and `/`.\n", 16 | "\n", 17 | "As you add code you can \"execute\" the notebook cell to check your results.\n", 18 | "\n", 19 | "It may help to review [built-ins in Python](https://docs.python.org/3.5/library/functions.html) and the [Python data model](https://docs.python.org/3.5/reference/datamodel.html)." 
20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": { 26 | "scrolled": true 27 | }, 28 | "outputs": [ 29 | { 30 | "name": "stdout", 31 | "output_type": "stream", 32 | "text": [ 33 | "The first number is 1/2\n", 34 | "The second number is 3/2\n", 35 | "\n", 36 | "Their sum is 8/4\n", 37 | "Their product is None\n", 38 | "Their difference is None\n", 39 | "Their quotient is None\n" 40 | ] 41 | } 42 | ], 43 | "source": [ 44 | "class RationalNumber:\n", 45 | " \"\"\"Any number that can be expressed as the quotient or fraction p/q \n", 46 | " of two integers, p and q, with the denominator q not equal to zero. \n", 47 | " \n", 48 | " Since q may be equal to 1, every integer is a rational number.\n", 49 | " \"\"\"\n", 50 | " \n", 51 | " def __init__(self, numerator, denominator=1):\n", 52 | " self.n = numerator\n", 53 | " self.d = denominator\n", 54 | "\n", 55 | " def __add__(self, other):\n", 56 | " # Write a function that allows for the addition of two rational numbers.\n", 57 | " # I did this one for you :D\n", 58 | " if not isinstance(other, RationalNumber):\n", 59 | " other = RationalNumber(other)\n", 60 | "\n", 61 | " n = self.n * other.d + self.d * other.n\n", 62 | " d = self.d * other.d\n", 63 | " return RationalNumber(n, d)\n", 64 | " \n", 65 | " def __sub__(self, other):\n", 66 | " # Write a function that allows for the subtraction of two rational numbers.\n", 67 | " pass\n", 68 | "\n", 69 | " \n", 70 | " \n", 71 | " def __mul__(self, other):\n", 72 | " # Write a function that allows for the multiplication of two rational numbers.\n", 73 | " pass\n", 74 | "\n", 75 | " \n", 76 | " \n", 77 | " def __truediv__(self, other):\n", 78 | " # Write a function that allows for the division of two rational numbers.\n", 79 | " pass\n", 80 | "\n", 81 | " \n", 82 | " def __str__(self):\n", 83 | " return \"%s/%s\" % (self.n, self.d)\n", 84 | "\n", 85 | " __repr__ = __str__\n", 86 | " \n", 87 | "\n", 88 | "if __name__ == \"__main__\": \n", 
89 | " # Let's create two RationalNumber variables to represent the values 1/2 and 3/2 \n", 90 | " x = RationalNumber(1,2)\n", 91 | " y = RationalNumber(3,2)\n", 92 | " print (\"The first number is {!s}\".format(x))\n", 93 | " print (\"The second number is {!s}\\n\".format(y))\n", 94 | "\n", 95 | " # Now let's test our math operations\n", 96 | " print (\"Their sum is {!s}\".format(x+y))\n", 97 | " print (\"Their product is {!s}\".format(x*y))\n", 98 | " print (\"Their difference is {!s}\".format(x-y))\n", 99 | " print (\"Their quotient is {!s}\".format(x/y))" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "(hint) \n", 107 | "\n", 108 | "|Operation |Method |\n", 109 | "|---------------|----------------------------|\n", 110 | "|Addition |(a/b) + (c/d) = (ad + bc)/bd|\n", 111 | "|Subtraction |(a/b) - (c/d) = (ad - bc)/bd|\n", 112 | "|Multiplication |(a/b) x (c/d) = ac/bd |\n", 113 | "|Division |(a/b) / (c/d) = ad/bc |" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Part 2: Modules (Optional)\n", 121 | "\n", 122 | "Modules are reusable libraries of code and many libraries come standard with Python. You can import them into a program using the *import* statement. For example:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 2, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "The first few digits of pi are 3.141593...\n" 135 | ] 136 | } 137 | ], 138 | "source": [ 139 | "import math\n", 140 | "print (\"The first few digits of pi are {:f}...\".format(math.pi))" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "The math module implements many functions for complex mathematical operations using floating point values, including logarithms, trigonometric operations, and irrational numbers like π. 
\n", 148 | "\n", 149 | "#### As an exercise, we'll encapsulate your rational numbers script into a module and then import it.\n", 150 | "\n", 151 | "We are now going to continue this lab at the command line. Save the code you wrote in a new file called `RatNum.py`. Open your terminal and navigate to wherever you have the file saved. \n", 152 | "\n", 153 | "Type: \n", 154 | "\n", 155 | " python \n", 156 | "\n", 157 | "When you're inside the Python interpreter, enter:\n", 158 | "\n", 159 | " from RatNum import RationalNumber\n", 160 | " a = RationalNumber(1,3)\n", 161 | " b = RationalNumber(2,3)\n", 162 | " print (a*b)\n", 163 | "\n", 164 | "Success! You have just made a module. \n", 165 | "\n", 166 | "## Packages\n", 167 | "\n", 168 | "A package is a directory of modules. For example, we could make a big package by bundling together modules with classes for natural numbers, integers, irrational numbers, and real numbers. \n", 169 | "\n", 170 | "The Python Package Index, or \"PyPI\", is the official third-party software repository for the Python programming language. It is a comprehensive catalog of all open source Python packages and is maintained by the Python Software Foundation. You can download packages from PyPI with the *pip* command in your terminal.\n", 171 | "\n", 172 | "PyPI packages are uploaded by individual package maintainers. That means you can write and contribute your own Python packages!\n", 173 | "\n", 174 | "#### Now let's turn your module into a package called Mathy.\n", 175 | "\n", 176 | "1. Create a folder called Mathy, and add your RatNum.py file to the folder.\n", 177 | "2. Add an empty file to the folder called \_\_init\_\_.py.\n", 178 | "3. Create a third file in that folder called MathQuiz.py that imports RationalNumber from RatNum... \n", 179 | "4. ...and uses the RationalNumber class from RatNum. 
For example: \n", 180 | "\n", 181 | "```\n", 182 | " #MathQuiz.py\n", 183 | " \n", 184 | " from RatNum import RationalNumber\n", 185 | "\n", 186 | " print (\"Pop quiz! Find the sum, product, difference, and quotient for the following rational numbers:\")\n", 187 | " \n", 188 | " x = RationalNumber(1,3)\n", 189 | " y = RationalNumber(2,3)\n", 190 | "\n", 191 | " print (\"The first number is {!s}\".format(x))\n", 192 | " print (\"The second number is {!s}\".format(y))\n", 193 | " print (\"Their sum is {!s}\".format(x+y))\n", 194 | " print (\"Their product is {!s}\".format(x*y))\n", 195 | " print (\"Their difference is {!s}\".format(x-y))\n", 196 | " print (\"Their quotient is {!s}\".format(x/y))\n", 197 | "```\n", 198 | "\n", 199 | "#### In the terminal, navigate to the Mathy folder. When you are inside the folder, type:\n", 200 | "\n", 201 | " python MathQuiz.py\n", 202 | "\n", 203 | "Congrats! You have just made a Python package! \n", 204 | "\n", 205 | "#### Now type: \n", 206 | "\n", 207 | " python RatNum.py \n", 208 | " \n", 209 | "What did you get this time? Is it different from the answer you got for the previous command? Why??\n", 210 | "\n", 211 | "Once you've completed this exercise, move on to Part 3." 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "## Part 3: Inheritance (Optional)\n", 219 | "\n", 220 | "Suppose we were to write out another class for another set of numbers, say the integers. What are the rules for addition, subtraction, multiplication, and division? If we can identify shared properties between integers and rational numbers, we could use that information to write an integer class that 'inherits' properties from our rational number class.\n", 221 | "\n", 222 | "#### Let's add an integer class to our RatNum.py file that inherits all the properties of our RationalNumber class." 
223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "class Integer(RationalNumber):\n", 232 | " #What should we add here?\n", 233 | " pass" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "#### Now update your \\_\\_name\\_\\_ == \"\\_\\_main\\_\\_\" statement at the end of RatNum.py to read:" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | "if __name__ == \"__main__\":\n", 250 | " q = Integer(5)\n", 251 | " r = Integer(6)\n", 252 | " print (\"{!s} is an integer expressed as a rational number\".format(q))\n", 253 | " print (\"So is {!s}\".format(r))\n", 254 | " print (\"When you add them you get {!s}\".format(q+r))\n", 255 | " print (\"When you multiply them you get {!s}\".format(q*r))\n", 256 | " print (\"When you subtract them you get {!s}\".format(q-r))\n", 257 | " print (\"When you divide them you get {!s}\".format(q/r))" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "Did it work? \n", 265 | "\n", 266 | "Nice job! 
" 267 | ] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3.10.5 ('my-env')", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.10.5" 287 | }, 288 | "vscode": { 289 | "interpreter": { 290 | "hash": "d474be476f0a6db94789ad08b16ede00cb8654a4c727ed769fba79dc07ed6ebd" 291 | } 292 | } 293 | }, 294 | "nbformat": 4, 295 | "nbformat_minor": 1 296 | } 297 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # XBUS-501-01.Software-Engineering-for-Data 2 | 3 | The workshops and demos for this course are listed below along with links to materials that exist for the demo/ workshop. For objectives of and prerequisites for this class, please see the [course details](https://github.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/blob/master/xbus-501-01.software-engineering-for-data.md). Slides are available in BlackBoard. 
4 | 5 | **Session 1** (Friday PM) 6 | 7 | * Project Environment Workshop, see class slides 8 | 9 | **Session 2** (Saturday AM) 10 | 11 | * Git/ GitHub Demo 12 | * [Intro to Git Workshop](https://github.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/tree/master/intro_to_git) 13 | 14 | **Session Break** (Saturday PM) 15 | 16 | * Capstone presentations from finishing cohort 17 | 18 | 19 | **Session 3** (Friday PM) 20 | 21 | * Python 2/3 Transition Demo 22 | * [Object Oriented Programming Workshop](https://github.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/tree/master/OOP_workshop) 23 | 24 | 25 | **Session 4** (Saturday AM) 26 | 27 | * [Testing Workshop](https://github.com/looselycoupled/xbus-501-test-workshop) 28 | 29 | **Session 5** (Saturday PM) 30 | 31 | * [PDB Demo](https://gist.github.com/looselycoupled/7fd8331ad5551b35c4c1) 32 | * [Timing Demo](https://github.com/looselycoupled/xbus-501-timing-demonstrations) 33 | * [NBA Player Statistics Workshop](https://github.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/tree/master/nba) 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /intro_to_git/Git_Workshop_v4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook was created by Rebecca Bilbro" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Introduction to Git, Part 1" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Setting up" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "For this lab, pair up with someone else. Open your terminal. 
If you have questions about how to access your terminal, ask your teammate.\n", 29 | "\n", 30 | "Now make sure both of you have Git installed by typing the following into your terminal: \n", 31 | "\n", 32 | " git --version" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "*Note: I am working on a Mac. If you are using Windows and want to know the Windows versions of terminal commands, there's a look-up table here: http://www.lemoda.net/windows/windows2unix/windows2unix.html.*" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "If you don't already have Git installed, do that now: \n", 47 | "\n", 48 | "1. Create a Github account at http://github.com \n", 49 | "2. Download and install the latest version of Git: http://git-scm.com/downloads\n", 50 | "3. Configure Git with your name and email by typing the following into your terminal:" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | " git config --global user.name \"YOUR NAME\" \n", 58 | " git config --global user.email \"YOUR EMAIL ADDRESS\"" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "**NOTE**: This username and email will be publicly displayed on GitHub. Please feel free to use a handle instead of your real name, and if you would prefer not to use your actual email address you may use \"@noreply.github.com\" instead." 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "## Create a Personal Access Token" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "When git communicates with a remote server (like Github) it can use the HTTPS protocol or the SSH protocol. You may already be familiar with HTTPS as that's what our web browsers use to retrieve HTML and other files on the internet. 
When using git to clone a repo (or perform other operations on a remote server) over HTTPS, git will ask you for your Username and Password like so:\n", 80 | "\n", 81 | "```\n", 82 | "git clone https://github.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data.git\n", 83 | "Username for 'https://github.com': bob\n", 84 | "Password for 'https://bob@github.com':\n", 85 | "```\n", 86 | "\n", 87 | "However, on August 13, 2021 **Github stopped allowing you to use your password and instead requires you to enter a \"Personal Access Token\"** (for most accounts) as they make their platform more secure (you can read their [blog post](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/) for more details).\n", 88 | "\n", 89 | "If you choose to use git over SSH to communicate with Github then it will use your SSH keys and no password will be required. However, this can be problematic to set up on Windows for beginners, so we will assume you are going to use HTTPS.\n", 90 | "\n", 91 | "Once you've created a Personal Access Token, you can use it in place of your password when performing operations such as `git clone` or `git push`. As such, **keep your token somewhere safe as it is functionally a password for your account**. You will also need it for this lab and for working with your CAPSTONE group. 
\n", 92 | "\n", 93 | "\n", 94 | "Use the instructions provided by Github to create your token at the following address:\n", 95 | "\n", 96 | "\n", 97 | "https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token\n", 98 | "\n", 99 | "When you get to the form pictured below: 1) set the expiration for 90 days or the length of the certificate program 2) select the \"repo (Full control of private repositories)\" option (no others are needed for this lab)\n", 100 | "\n", 101 | "![github form](images/personal-access-token-form.png)\n", 102 | " " 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "## Create a Repository on Github" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "Each of you should [create a new repository](https://github.com/new) - but **don't call them the same thing**. Add a title and a description, set the repo to Public, initialize with a _README_, and click Create.\n", 117 | "\n", 118 | "Now, give your team mate permission to collaborate: Go to the settings tab on your repo page, click on Collaborators, and add your team mate's Github username.\n", 119 | "\n", 120 | "![](images/newrepo.png)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "## Working with a repository" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "Next, clone your empty repo. Look in the bottom righthand corner of the repo page on Github to find the URL. The default is an HTTPS clone. You can read more about [options](https://help.github.com/articles/which-remote-url-should-i-use/). 
You can always switch between HTTPS and SSH later by following [these instructions](https://help.github.com/articles/changing-a-remote-s-url/).\n", 135 | "\n", 136 | "![clone address depiction](images/cloning.png)" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "Now, in your terminal, clone the repo and then change into that directory (using the `cd` command) by typing: \n", 144 | "\n", 145 | " git clone [PASTE THE URL HERE] \n", 146 | " cd [THE NAME OF THE REPO]" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "Then add some content to the empty myclone folder. Some simple text documents would be good. \n", 154 | "\n", 155 | "![](images/cloning_adding.png)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "Now add and commit those changes." 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | " git add --all \n", 170 | " git commit -m \"A few additions.\"" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "*Note: git add --all can be a bit ham-fisted and is not the only option. You can precision-add files using git add FILENAME.*" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "You should also try to make your commit messages specific and meaningful.\n", 185 | "\n", 186 | "![commit messages](http://imgs.xkcd.com/comics/git_commit.png \"1296\")" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": { 192 | "scrolled": false 193 | }, 194 | "source": [ 195 | "![](images/commit.png)" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "View the log and push the changes back to Github. 
\n", 203 | "\n", 204 | " git log \n", 205 | " git push origin main" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "![](images/logpush.png)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "View your changes on the repo webpage.\n", 220 | "\n", 221 | "![](images/postpush.png)\n", 222 | "\n", 223 | "Once you've finished, move on to part two." 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "# Introduction to Git, Part 2" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "## Breaking Git*" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "*Don't worry, you won't actually break it." 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "#### Now, clone your teammate's repo. Hopefully you gave your repos different names! \n", 252 | "\n", 253 | " git clone [PASTE TEAMMATE'S URL HERE] \n", 254 | " cd [YOUR TEAMMATE'S REPO NAME]" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "#### Make more changes! Add some new files, commit them, and push back to Github. Remember these 6 commands: \n", 262 | "\n", 263 | " git add --all #Stage all the changes\n", 264 | " git add (FILENAME) #Just stage one updated file\n", 265 | " git status #Check the status to see what's changed/staged\n", 266 | " git commit -m \"Unique message.\" #Commit the changes\n", 267 | " git log #See the commit history\n", 268 | " git push origin main #Push the commits back to main branch on Github servers" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "#### Now, both teammates should edit the *same line* on the *same file*, and try to add, commit, and push changes back to Github." 
276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "What happened??" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "## Conflict Resolution" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "Here, the first person (named Nice Clone in the example below) to push the changes is successful." 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "![](images/nice.png)" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "The next person (Evil Clone) gets a failure message when they try to push. Then they type:\n", 311 | "\n", 312 | " git pull #syncs up with current repo version (combo of git fetch + git merge)\n", 313 | "\n", 314 | "...and Git sends a merge conflict error message:" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "![](images/evilclone.png)" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "This is called a merge conflict. Git tells you to fix the conflicts and then commit the result. Now let's resolve the merge conflict.\n", 329 | "\n", 330 | "Use the error message to identify which file caused the conflict. Open that file in your text editor." 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "![](images/viewconflict.png)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "The file shows the changes made by both parties. The first section is the version of the current branch head (Evil Clone's). The second section is the version of main branch (Nice Clone's). \n", 345 | "\n", 346 | "#### Let's decide to be nice in order to resolve the conflict. Edit the file accordingly. 
Then run: \n", 347 | "\n", 348 | " git add NAME OF FIXED FILE\n", 349 | " git commit -m \"Fixed merge conflict.\"\n", 350 | " git push origin main" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "Nice job! For more on advanced Git topics, check out __Pro Git__ by Scott Chacon and Ben Straub http://git-scm.com/book/en/v2." 358 | ] 359 | } 360 | ], 361 | "metadata": { 362 | "kernelspec": { 363 | "display_name": "Python 3 (ipykernel)", 364 | "language": "python", 365 | "name": "python3" 366 | }, 367 | "language_info": { 368 | "codemirror_mode": { 369 | "name": "ipython", 370 | "version": 3 371 | }, 372 | "file_extension": ".py", 373 | "mimetype": "text/x-python", 374 | "name": "python", 375 | "nbconvert_exporter": "python", 376 | "pygments_lexer": "ipython3", 377 | "version": "3.10.2" 378 | } 379 | }, 380 | "nbformat": 4, 381 | "nbformat_minor": 1 382 | } 383 | -------------------------------------------------------------------------------- /intro_to_git/README.md: -------------------------------------------------------------------------------- 1 | # Instructions 2 | 3 | This lab is performed entirely at the command line! However, you have the option of reading the instructions in either an HTML page or within a Jupyter notebook. 4 | 5 | If you would like to view the HTML version, open "Git_Workshop_v4.html" in your browser. 6 | 7 | If you would like to use the Jupyter notebook version, start up a Jupyter server within this directory using the command `jupyter notebook`. If you are on Windows, you will likely need to do this within the "Anaconda Prompt" application. 
-------------------------------------------------------------------------------- /intro_to_git/images/cloning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/cloning.png -------------------------------------------------------------------------------- /intro_to_git/images/cloning_adding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/cloning_adding.png -------------------------------------------------------------------------------- /intro_to_git/images/commit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/commit.png -------------------------------------------------------------------------------- /intro_to_git/images/evilclone.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/evilclone.png -------------------------------------------------------------------------------- /intro_to_git/images/logpush.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/logpush.png -------------------------------------------------------------------------------- /intro_to_git/images/newrepo.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/newrepo.png -------------------------------------------------------------------------------- /intro_to_git/images/nice.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/nice.png -------------------------------------------------------------------------------- /intro_to_git/images/personal-access-token-form.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/personal-access-token-form.png -------------------------------------------------------------------------------- /intro_to_git/images/postpush.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/postpush.png -------------------------------------------------------------------------------- /intro_to_git/images/viewconflict.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/intro_to_git/images/viewconflict.png -------------------------------------------------------------------------------- /kaggle_workshop/Student Exam Performance Workshop.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 
3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Student Exam Performance Workshop\n", 8 | "Use Python's pandas module to load a dataset containing student demographic information and test scores and find relationships between student attributes and test scores. This workshop will serve as an introduction to pandas and will allow students to practice the following skills: \n", 9 | "\n", 10 | "- Load a CSV into a pandas DataFrame and examine summary statistics\n", 11 | "- Rename DataFrame column names\n", 12 | "- Add columns to a DataFrame\n", 13 | "- Change values in DataFrame rows\n", 14 | "- Analyze relationships between categorical features and test scores\n", 15 | "\n", 16 | "**Bonus:**\n", 17 | "\n", 18 | "Determine the relationship between the students' lunch classification and average test scores by creating a seaborn barplot" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Import the python modules that we will need to use\n", 28 | "import numpy as np\n", 29 | "import pandas as pd\n", 30 | "import matplotlib.pyplot as plt" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "def load_data(my_path):\n", 40 | " my_dataframe = pd.read_csv(my_path)\n", 41 | " return my_dataframe" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Use the `load_data` function to load the StudentsPerformance.csv file into a pandas DataFrame variable called `students_df`\n", 49 | "\n", 50 | "__Hint__: Keep in mind where the CSV file is in relation to this Jupyter Notebook. Do you need to provide an absolute or relative file path?"
51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "# Write Python to call the function above and load the StudentsPerformance.csv file into a pandas DataFrame called students_df\n", 60 | "\n", 61 | "# Keep this line so you can see the first five rows of your DataFrame once you have loaded it!\n", 62 | "students_df.head(5)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "__Next step:__ Now that we have loaded our DataFrame, let's look at the summary statistics of our data. We can use the `describe` method to accomplish this:" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "students_df.describe(include='all')" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "By looking at this breakdown of our dataset, we can make at least the following observations:\n", 86 | "\n", 87 | "1. Our DataFrame consists of eight columns, three of which are student test scores.\n", 88 | "2. There are no missing values in our DataFrame!\n", 89 | "3. The data appears to be pretty evenly distributed.\n", 90 | "4. The column names are long and difficult to type.\n", 91 | "\n", 92 | "## Renaming DataFrame Columns\n", 93 | "\n", 94 | "Let's change our column names so they are easier to work with!\n", 95 | "\n", 96 | "__Hint__: Look into the pandas `columns` attribute to make the change!"
97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "columns = [\n", 106 | " 'gender', 'race', 'parentDegree', 'lunchStatus', \n", 107 | " 'courseCompletion', 'mathScore', 'readingScore', \n", 108 | " 'writingScore'\n", 109 | "]" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "def rename_columns(my_dataframe, my_columns):\n", 119 | " my_dataframe.columns = my_columns\n", 120 | " return my_dataframe" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "# Use the above function to rename the DataFrame's column names\n", 130 | "\n", 131 | "\n", 132 | "students_df.head(10) # Look at the first ten rows of the DataFrame to ensure the renaming worked!" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "## Adding Columns to a DataFrame\n", 140 | "\n", 141 | "Great! Next we want to add an `avgScore` column that is an average of the three given test scores (`mathScore`, `readingScore` and `writingScore`). This will give us a single summary of each student's performance and make it simpler to examine each feature's impact on that performance. " 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# Complete the following line of code to create an avgScore column\n", 151 | "students_df['avgScore'] = \n", 152 | "\n", 153 | "students_df.head(5)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "## Analyzing Feature Relationships\n", 161 | "Now that our data is looking the way we want, let's examine how some of our features correlate with students' test performance. 
We will start by looking at how race and parent degree status relate to test scores.\n", 162 | "\n", 163 | "__Hint__: Use pandas' `groupby` method to examine these relationships. The documentation for `groupby` can be found here: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.groupby.html" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "# numeric_only=True averages only the score columns; recent pandas versions raise an error on text columns otherwise\n", 173 | "students_df.groupby(['race','parentDegree']).mean(numeric_only=True)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "From examining the above output, we can see that across all `race` groups, students with \"high school\" and \"some high school\" as their parent degree status (`parentDegree`) had lower test scores. \n", 180 | "\n", 181 | "We can also use `groupby` to examine a subset of the original columns with respect to column-specific aggregations:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "students_df[['race','parentDegree','gender']].groupby(['gender']).count()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "__Next step__: Since there seems to be a clear distinction between students whose parents have some college education and those whose parents do not, let's simplify our DataFrame by creating a `degreeBinary` column based on values in the `parentDegree` column. 
This new column will simply contain either \"no_degree\" or \"has_degree.\" We can do this by writing a basic function and using pandas' `apply` method:" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": {}, 204 | "outputs": [], 205 | "source": [ 206 | "# Complete this function to return the proper strings to denote degree status\n", 207 | "\n", 208 | "def degree_status(edu):\n", 209 | " if edu in {'high school', 'some high school'}:\n", 210 | " # Fill in your code here!\n", 211 | "\n", 212 | "students_df['degreeBinary'] = students_df['parentDegree'].apply(degree_status)\n", 213 | "students_df.head(10)" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "Great job! Now let's continue examining our features to find relationships in our data.\n", 221 | "\n", 222 | "__Your turn:__ Use the `groupby` method again to examine relationships between other features and student test scores. What can we learn about the relationship between whether or not the students have completed the test preparation course and their test scores? What about the relationship between gender and test scores?" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "# Use groupby to examine the relationship between course completion status and test scores" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "# Use groupby to examine the relationship between gender and test scores" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## Bonus: Visualization\n", 248 | "\n", 249 | "Great job making it this far! As a bonus exercise, we will create a simple data visualization. 
We have examined the relationship between all of our features and student test scores except for one -- student lunch status, which is found in the `lunchStatus` column (originally named `lunch`).\n", 250 | "\n", 251 | "In order to explore this relationship, let's create a `barplot`, with the students' `lunchStatus` as the x-axis and their average test scores (`avgScore`) as the y-axis.\n", 252 | "\n", 253 | "We will use seaborn, which is a third-party library, to complete this visualization. If you do not already have seaborn installed, `pip install` it now! Follow the seaborn documentation to create the `barplot` in the cell below." 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "import seaborn as sns # import the seaborn module -- make sure you have it installed!\n", 263 | "\n", 264 | "sns.set(style='whitegrid')\n", 265 | " \n", 266 | "def graph_data(my_dataframe, xkey='lunchStatus', ykey='avgScore'):\n", 267 | " # Fill this in to create the barplot!\n", 268 | " \n", 269 | "graph_data(students_df)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [] 278 | } 279 | ], 280 | "metadata": { 281 | "kernelspec": { 282 | "display_name": "Python 3", 283 | "language": "python", 284 | "name": "python3" 285 | }, 286 | "language_info": { 287 | "codemirror_mode": { 288 | "name": "ipython", 289 | "version": 3 290 | }, 291 | "file_extension": ".py", 292 | "mimetype": "text/x-python", 293 | "name": "python", 294 | "nbconvert_exporter": "python", 295 | "pygments_lexer": "ipython3", 296 | "version": "3.7.1" 297 | } 298 | }, 299 | "nbformat": 4, 300 | "nbformat_minor": 2 301 | } 302 | -------------------------------------------------------------------------------- /kaggle_workshop/StudentsPerformance.csv: -------------------------------------------------------------------------------- 1 | 
"gender","race/ethnicity","parental level of education","lunch","test preparation course","math score","reading score","writing score" 2 | "female","group B","bachelor's degree","standard","none","72","72","74" 3 | "female","group C","some college","standard","completed","69","90","88" 4 | "female","group B","master's degree","standard","none","90","95","93" 5 | "male","group A","associate's degree","free/reduced","none","47","57","44" 6 | "male","group C","some college","standard","none","76","78","75" 7 | "female","group B","associate's degree","standard","none","71","83","78" 8 | "female","group B","some college","standard","completed","88","95","92" 9 | "male","group B","some college","free/reduced","none","40","43","39" 10 | "male","group D","high school","free/reduced","completed","64","64","67" 11 | "female","group B","high school","free/reduced","none","38","60","50" 12 | "male","group C","associate's degree","standard","none","58","54","52" 13 | "male","group D","associate's degree","standard","none","40","52","43" 14 | "female","group B","high school","standard","none","65","81","73" 15 | "male","group A","some college","standard","completed","78","72","70" 16 | "female","group A","master's degree","standard","none","50","53","58" 17 | "female","group C","some high school","standard","none","69","75","78" 18 | "male","group C","high school","standard","none","88","89","86" 19 | "female","group B","some high school","free/reduced","none","18","32","28" 20 | "male","group C","master's degree","free/reduced","completed","46","42","46" 21 | "female","group C","associate's degree","free/reduced","none","54","58","61" 22 | "male","group D","high school","standard","none","66","69","63" 23 | "female","group B","some college","free/reduced","completed","65","75","70" 24 | "male","group D","some college","standard","none","44","54","53" 25 | "female","group C","some high school","standard","none","69","73","73" 26 | "male","group D","bachelor's 
degree","free/reduced","completed","74","71","80" 27 | "male","group A","master's degree","free/reduced","none","73","74","72" 28 | "male","group B","some college","standard","none","69","54","55" 29 | "female","group C","bachelor's degree","standard","none","67","69","75" 30 | "male","group C","high school","standard","none","70","70","65" 31 | "female","group D","master's degree","standard","none","62","70","75" 32 | "female","group D","some college","standard","none","69","74","74" 33 | "female","group B","some college","standard","none","63","65","61" 34 | "female","group E","master's degree","free/reduced","none","56","72","65" 35 | "male","group D","some college","standard","none","40","42","38" 36 | "male","group E","some college","standard","none","97","87","82" 37 | "male","group E","associate's degree","standard","completed","81","81","79" 38 | "female","group D","associate's degree","standard","none","74","81","83" 39 | "female","group D","some high school","free/reduced","none","50","64","59" 40 | "female","group D","associate's degree","free/reduced","completed","75","90","88" 41 | "male","group B","associate's degree","free/reduced","none","57","56","57" 42 | "male","group C","associate's degree","free/reduced","none","55","61","54" 43 | "female","group C","associate's degree","standard","none","58","73","68" 44 | "female","group B","associate's degree","standard","none","53","58","65" 45 | "male","group B","some college","free/reduced","completed","59","65","66" 46 | "female","group E","associate's degree","free/reduced","none","50","56","54" 47 | "male","group B","associate's degree","standard","none","65","54","57" 48 | "female","group A","associate's degree","standard","completed","55","65","62" 49 | "female","group C","high school","standard","none","66","71","76" 50 | "female","group D","associate's degree","free/reduced","completed","57","74","76" 51 | "male","group C","high school","standard","completed","82","84","82" 52 | "male","group 
E","some college","standard","none","53","55","48" 53 | "male","group E","associate's degree","free/reduced","completed","77","69","68" 54 | "male","group C","some college","standard","none","53","44","42" 55 | "male","group D","high school","standard","none","88","78","75" 56 | "female","group C","some high school","free/reduced","completed","71","84","87" 57 | "female","group C","high school","free/reduced","none","33","41","43" 58 | "female","group E","associate's degree","standard","completed","82","85","86" 59 | "male","group D","associate's degree","standard","none","52","55","49" 60 | "male","group D","some college","standard","completed","58","59","58" 61 | "female","group C","some high school","free/reduced","none","0","17","10" 62 | "male","group E","bachelor's degree","free/reduced","completed","79","74","72" 63 | "male","group A","some high school","free/reduced","none","39","39","34" 64 | "male","group A","associate's degree","free/reduced","none","62","61","55" 65 | "female","group C","associate's degree","standard","none","69","80","71" 66 | "female","group D","some high school","standard","none","59","58","59" 67 | "male","group B","some high school","standard","none","67","64","61" 68 | "male","group D","some high school","free/reduced","none","45","37","37" 69 | "female","group C","some college","standard","none","60","72","74" 70 | "male","group B","associate's degree","free/reduced","none","61","58","56" 71 | "female","group C","associate's degree","standard","none","39","64","57" 72 | "female","group D","some college","free/reduced","completed","58","63","73" 73 | "male","group D","some college","standard","completed","63","55","63" 74 | "female","group A","associate's degree","free/reduced","none","41","51","48" 75 | "male","group C","some high school","free/reduced","none","61","57","56" 76 | "male","group C","some high school","standard","none","49","49","41" 77 | "male","group B","associate's degree","free/reduced","none","44","41","38" 78 
| "male","group E","some high school","standard","none","30","26","22" 79 | "male","group A","bachelor's degree","standard","completed","80","78","81" 80 | "female","group D","some high school","standard","completed","61","74","72" 81 | "female","group E","master's degree","standard","none","62","68","68" 82 | "female","group B","associate's degree","standard","none","47","49","50" 83 | "male","group B","high school","free/reduced","none","49","45","45" 84 | "male","group A","some college","free/reduced","completed","50","47","54" 85 | "male","group E","associate's degree","standard","none","72","64","63" 86 | "male","group D","high school","free/reduced","none","42","39","34" 87 | "female","group C","some college","standard","none","73","80","82" 88 | "female","group C","some college","free/reduced","none","76","83","88" 89 | "female","group D","associate's degree","standard","none","71","71","74" 90 | "female","group A","some college","standard","none","58","70","67" 91 | "female","group D","some high school","standard","none","73","86","82" 92 | "female","group C","bachelor's degree","standard","none","65","72","74" 93 | "male","group C","high school","free/reduced","none","27","34","36" 94 | "male","group C","high school","standard","none","71","79","71" 95 | "male","group C","associate's degree","free/reduced","completed","43","45","50" 96 | "female","group B","some college","standard","none","79","86","92" 97 | "male","group C","associate's degree","free/reduced","completed","78","81","82" 98 | "male","group B","some high school","standard","completed","65","66","62" 99 | "female","group E","some college","standard","completed","63","72","70" 100 | "female","group D","some college","free/reduced","none","58","67","62" 101 | "female","group D","bachelor's degree","standard","none","65","67","62" 102 | "male","group B","some college","standard","none","79","67","67" 103 | "male","group D","bachelor's degree","standard","completed","68","74","74" 104 | 
"female","group D","associate's degree","standard","none","85","91","89" 105 | "male","group B","high school","standard","completed","60","44","47" 106 | "male","group C","some college","standard","completed","98","86","90" 107 | "female","group C","some college","standard","none","58","67","72" 108 | "female","group D","master's degree","standard","none","87","100","100" 109 | "male","group E","associate's degree","standard","completed","66","63","64" 110 | "female","group B","associate's degree","free/reduced","none","52","76","70" 111 | "female","group B","some high school","standard","none","70","64","72" 112 | "female","group D","associate's degree","free/reduced","completed","77","89","98" 113 | "male","group C","high school","standard","none","62","55","49" 114 | "male","group A","associate's degree","standard","none","54","53","47" 115 | "female","group D","some college","standard","none","51","58","54" 116 | "female","group E","bachelor's degree","standard","completed","99","100","100" 117 | "male","group C","high school","standard","none","84","77","74" 118 | "female","group B","bachelor's degree","free/reduced","none","75","85","82" 119 | "female","group D","bachelor's degree","standard","none","78","82","79" 120 | "female","group D","some high school","standard","none","51","63","61" 121 | "female","group C","some college","standard","none","55","69","65" 122 | "female","group C","bachelor's degree","standard","completed","79","92","89" 123 | "male","group B","associate's degree","standard","completed","91","89","92" 124 | "female","group C","some college","standard","completed","88","93","93" 125 | "male","group D","high school","free/reduced","none","63","57","56" 126 | "male","group E","some college","standard","none","83","80","73" 127 | "female","group B","high school","standard","none","87","95","86" 128 | "male","group B","some high school","standard","none","72","68","67" 129 | "male","group D","some 
college","standard","completed","65","77","74" 130 | "male","group D","master's degree","standard","none","82","82","74" 131 | "female","group A","bachelor's degree","standard","none","51","49","51" 132 | "male","group D","master's degree","standard","none","89","84","82" 133 | "male","group C","some high school","free/reduced","completed","53","37","40" 134 | "male","group E","some college","free/reduced","completed","87","74","70" 135 | "female","group C","some college","standard","completed","75","81","84" 136 | "male","group D","bachelor's degree","free/reduced","completed","74","79","75" 137 | "male","group C","bachelor's degree","standard","none","58","55","48" 138 | "male","group B","some high school","standard","completed","51","54","41" 139 | "male","group E","high school","standard","none","70","55","56" 140 | "female","group C","associate's degree","standard","none","59","66","67" 141 | "male","group D","some college","standard","completed","71","61","69" 142 | "female","group D","some high school","standard","none","76","72","71" 143 | "female","group C","some college","free/reduced","none","59","62","64" 144 | "female","group E","some college","free/reduced","completed","42","55","54" 145 | "male","group A","high school","standard","none","57","43","47" 146 | "male","group D","some college","standard","none","88","73","78" 147 | "female","group C","some college","free/reduced","none","22","39","33" 148 | "male","group B","some high school","standard","none","88","84","75" 149 | "male","group C","associate's degree","free/reduced","none","73","68","66" 150 | "female","group D","bachelor's degree","standard","completed","68","75","81" 151 | "male","group E","associate's degree","free/reduced","completed","100","100","93" 152 | "male","group A","some high school","standard","completed","62","67","69" 153 | "male","group A","bachelor's degree","standard","none","77","67","68" 154 | "female","group B","associate's 
degree","standard","completed","59","70","66" 155 | "male","group D","bachelor's degree","standard","none","54","49","47" 156 | "male","group D","some high school","standard","none","62","67","61" 157 | "female","group C","some college","standard","completed","70","89","88" 158 | "female","group E","high school","free/reduced","completed","66","74","78" 159 | "male","group B","some college","free/reduced","none","60","60","60" 160 | "female","group B","associate's degree","standard","completed","61","86","87" 161 | "male","group D","associate's degree","free/reduced","none","66","62","64" 162 | "male","group B","associate's degree","free/reduced","completed","82","78","74" 163 | "female","group E","some college","free/reduced","completed","75","88","85" 164 | "male","group B","master's degree","free/reduced","none","49","53","52" 165 | "male","group C","high school","standard","none","52","53","49" 166 | "female","group E","master's degree","standard","none","81","92","91" 167 | "female","group C","bachelor's degree","standard","completed","96","100","100" 168 | "male","group C","high school","free/reduced","completed","53","51","51" 169 | "female","group B","master's degree","free/reduced","completed","58","76","78" 170 | "female","group B","high school","standard","completed","68","83","78" 171 | "female","group C","some college","free/reduced","completed","67","75","70" 172 | "male","group A","high school","standard","completed","72","73","74" 173 | "male","group E","some high school","standard","none","94","88","78" 174 | "female","group D","some college","standard","none","79","86","81" 175 | "female","group C","associate's degree","standard","none","63","67","70" 176 | "female","group C","bachelor's degree","free/reduced","completed","43","51","54" 177 | "female","group C","master's degree","standard","completed","81","91","87" 178 | "female","group B","high school","free/reduced","completed","46","54","58" 179 | "female","group C","associate's 
degree","standard","completed","71","77","77" 180 | "female","group B","master's degree","free/reduced","completed","52","70","62" 181 | "female","group D","some high school","standard","completed","97","100","100" 182 | "male","group C","master's degree","free/reduced","completed","62","68","75" 183 | "female","group C","some college","free/reduced","none","46","64","66" 184 | "female","group E","high school","standard","none","50","50","47" 185 | "female","group D","associate's degree","standard","none","65","69","70" 186 | "male","group C","some high school","free/reduced","completed","45","52","49" 187 | "male","group C","associate's degree","free/reduced","completed","65","67","65" 188 | "male","group E","high school","standard","none","80","76","65" 189 | "male","group D","some high school","standard","completed","62","66","68" 190 | "male","group B","some high school","free/reduced","none","48","52","45" 191 | "female","group C","bachelor's degree","standard","none","77","88","87" 192 | "female","group E","associate's degree","standard","none","66","65","69" 193 | "male","group D","some college","standard","completed","76","83","79" 194 | "female","group B","some high school","standard","none","62","64","66" 195 | "male","group D","some college","standard","completed","77","62","62" 196 | "female","group C","master's degree","standard","completed","69","84","85" 197 | "male","group D","associate's degree","standard","none","61","55","52" 198 | "male","group C","some high school","free/reduced","completed","59","69","65" 199 | "male","group E","high school","free/reduced","none","55","56","51" 200 | "female","group B","some college","free/reduced","none","45","53","55" 201 | "female","group B","bachelor's degree","free/reduced","none","78","79","76" 202 | "female","group C","associate's degree","standard","completed","67","84","86" 203 | "female","group D","some college","free/reduced","none","65","81","77" 204 | "male","group C","associate's 
degree","standard","none","69","77","69" 205 | "female","group B","associate's degree","standard","none","57","69","68" 206 | "male","group C","some college","standard","none","59","41","42" 207 | "male","group D","some high school","standard","completed","74","71","78" 208 | "male","group E","bachelor's degree","standard","none","82","62","62" 209 | "male","group E","high school","standard","completed","81","80","76" 210 | "female","group B","some college","free/reduced","none","74","81","76" 211 | "female","group B","some college","free/reduced","none","58","61","66" 212 | "male","group D","some high school","free/reduced","completed","80","79","79" 213 | "male","group C","some college","free/reduced","none","35","28","27" 214 | "female","group C","high school","free/reduced","none","42","62","60" 215 | "male","group C","associate's degree","free/reduced","completed","60","51","56" 216 | "male","group E","high school","standard","completed","87","91","81" 217 | "male","group B","some high school","standard","completed","84","83","75" 218 | "female","group E","associate's degree","free/reduced","completed","83","86","88" 219 | "female","group C","high school","free/reduced","none","34","42","39" 220 | "male","group B","high school","free/reduced","none","66","77","70" 221 | "male","group B","some high school","standard","completed","61","56","56" 222 | "female","group D","high school","standard","completed","56","68","74" 223 | "male","group B","associate's degree","standard","none","87","85","73" 224 | "female","group C","some high school","free/reduced","none","55","65","62" 225 | "male","group D","some high school","standard","none","86","80","75" 226 | "female","group B","associate's degree","standard","completed","52","66","73" 227 | "female","group E","master's degree","free/reduced","none","45","56","54" 228 | "female","group C","some college","standard","none","72","72","71" 229 | "male","group D","high school","standard","none","57","50","54" 230 | 
"male","group A","some high school","free/reduced","none","68","72","64" 231 | "female","group C","some college","standard","completed","88","95","94" 232 | "male","group D","some college","standard","none","76","64","66" 233 | "male","group C","associate's degree","standard","none","46","43","42" 234 | "female","group B","bachelor's degree","standard","none","67","86","83" 235 | "male","group E","some high school","standard","none","92","87","78" 236 | "male","group C","bachelor's degree","standard","completed","83","82","84" 237 | "male","group D","associate's degree","standard","none","80","75","77" 238 | "male","group D","bachelor's degree","free/reduced","none","63","66","67" 239 | "female","group D","some high school","standard","completed","64","60","74" 240 | "male","group B","some college","standard","none","54","52","51" 241 | "male","group C","associate's degree","standard","none","84","80","80" 242 | "male","group D","high school","free/reduced","completed","73","68","66" 243 | "female","group E","bachelor's degree","standard","none","80","83","83" 244 | "female","group D","high school","standard","none","56","52","55" 245 | "male","group E","some college","standard","none","59","51","43" 246 | "male","group D","some high school","standard","none","75","74","69" 247 | "male","group C","associate's degree","standard","none","85","76","71" 248 | "male","group E","associate's degree","standard","none","89","76","74" 249 | "female","group B","high school","standard","completed","58","70","68" 250 | "female","group B","high school","standard","none","65","64","62" 251 | "male","group C","high school","standard","none","68","60","53" 252 | "male","group A","some high school","standard","completed","47","49","49" 253 | "female","group D","some college","free/reduced","none","71","83","83" 254 | "female","group B","some high school","standard","completed","60","70","70" 255 | "male","group D","master's degree","standard","none","80","80","72" 256 | 
"male","group D","high school","standard","none","54","52","52" 257 | "female","group E","some college","standard","none","62","73","70" 258 | "female","group C","associate's degree","free/reduced","none","64","73","68" 259 | "male","group C","associate's degree","standard","completed","78","77","77" 260 | "female","group B","some college","standard","none","70","75","78" 261 | "female","group C","master's degree","free/reduced","completed","65","81","81" 262 | "female","group C","some high school","free/reduced","completed","64","79","77" 263 | "male","group C","some college","standard","completed","79","79","78" 264 | "female","group C","some high school","free/reduced","none","44","50","51" 265 | "female","group E","high school","standard","none","99","93","90" 266 | "male","group D","high school","standard","none","76","73","68" 267 | "male","group D","some high school","free/reduced","none","59","42","41" 268 | "female","group C","bachelor's degree","standard","none","63","75","81" 269 | "female","group D","high school","standard","none","69","72","77" 270 | "female","group D","associate's degree","standard","completed","88","92","95" 271 | "female","group E","some college","free/reduced","none","71","76","70" 272 | "male","group C","bachelor's degree","standard","none","69","63","61" 273 | "male","group C","some college","standard","none","58","49","42" 274 | "female","group D","associate's degree","free/reduced","none","47","53","58" 275 | "female","group D","some college","standard","none","65","70","71" 276 | "male","group B","some college","standard","completed","88","85","76" 277 | "male","group C","bachelor's degree","standard","none","83","78","73" 278 | "female","group C","some high school","standard","completed","85","92","93" 279 | "female","group E","high school","standard","completed","59","63","75" 280 | "female","group C","some high school","free/reduced","none","65","86","80" 281 | "male","group B","bachelor's 
degree","free/reduced","none","73","56","57" 282 | "male","group D","high school","standard","none","53","52","42" 283 | "male","group D","high school","standard","none","45","48","46" 284 | "female","group D","bachelor's degree","free/reduced","none","73","79","84" 285 | "female","group D","some college","free/reduced","completed","70","78","78" 286 | "female","group B","some high school","standard","none","37","46","46" 287 | "male","group B","associate's degree","standard","completed","81","82","82" 288 | "male","group E","associate's degree","standard","completed","97","82","88" 289 | "female","group B","some high school","standard","none","67","89","82" 290 | "male","group B","bachelor's degree","free/reduced","none","88","75","76" 291 | "male","group E","some high school","standard","completed","77","76","77" 292 | "male","group C","associate's degree","standard","none","76","70","68" 293 | "male","group D","some high school","standard","none","86","73","70" 294 | "male","group C","some high school","standard","completed","63","60","57" 295 | "female","group E","bachelor's degree","standard","none","65","73","75" 296 | "male","group D","high school","free/reduced","completed","78","77","80" 297 | "male","group B","associate's degree","free/reduced","none","67","62","60" 298 | "male","group A","some high school","standard","completed","46","41","43" 299 | "male","group E","associate's degree","standard","completed","71","74","68" 300 | "male","group C","high school","free/reduced","completed","40","46","50" 301 | "male","group D","associate's degree","free/reduced","none","90","87","75" 302 | "male","group A","some college","free/reduced","completed","81","78","81" 303 | "male","group D","some high school","free/reduced","none","56","54","52" 304 | "female","group C","associate's degree","standard","completed","67","84","81" 305 | "male","group B","associate's degree","standard","none","80","76","64" 306 | "female","group C","associate's 
degree","standard","completed","74","75","83" 307 | "male","group A","some college","standard","none","69","67","69" 308 | "male","group E","some college","standard","completed","99","87","81" 309 | "male","group C","some high school","standard","none","51","52","44" 310 | "female","group B","associate's degree","free/reduced","none","53","71","67" 311 | "female","group D","high school","free/reduced","none","49","57","52" 312 | "female","group B","associate's degree","standard","none","73","76","80" 313 | "male","group B","bachelor's degree","standard","none","66","60","57" 314 | "male","group D","bachelor's degree","standard","completed","67","61","68" 315 | "female","group C","associate's degree","free/reduced","completed","68","67","69" 316 | "female","group C","bachelor's degree","standard","completed","59","64","75" 317 | "male","group C","high school","standard","none","71","66","65" 318 | "female","group D","master's degree","standard","completed","77","82","91" 319 | "male","group C","associate's degree","standard","none","83","72","78" 320 | "male","group B","bachelor's degree","standard","none","63","71","69" 321 | "female","group D","associate's degree","free/reduced","none","56","65","63" 322 | "female","group C","high school","free/reduced","completed","67","79","84" 323 | "female","group E","high school","standard","none","75","86","79" 324 | "female","group C","some college","standard","none","71","81","80" 325 | "female","group C","some high school","free/reduced","none","43","53","53" 326 | "female","group C","high school","free/reduced","none","41","46","43" 327 | "female","group C","some college","standard","none","82","90","94" 328 | "male","group C","some college","standard","none","61","61","62" 329 | "male","group A","some college","free/reduced","none","28","23","19" 330 | "male","group C","associate's degree","standard","completed","82","75","77" 331 | "female","group B","some high school","standard","none","41","55","51" 332 | 
"male","group C","high school","standard","none","71","60","61" 333 | "male","group C","associate's degree","standard","none","47","37","35" 334 | "male","group E","associate's degree","standard","completed","62","56","53" 335 | "male","group B","associate's degree","standard","none","90","78","81" 336 | "female","group C","bachelor's degree","standard","none","83","93","95" 337 | "female","group B","some college","free/reduced","none","61","68","66" 338 | "male","group D","some high school","standard","completed","76","70","69" 339 | "male","group C","associate's degree","standard","none","49","51","43" 340 | "female","group B","some high school","free/reduced","none","24","38","27" 341 | "female","group D","some high school","free/reduced","completed","35","55","60" 342 | "male","group C","high school","free/reduced","none","58","61","52" 343 | "female","group C","high school","standard","none","61","73","63" 344 | "female","group B","high school","standard","completed","69","76","74" 345 | "male","group D","associate's degree","standard","completed","67","72","67" 346 | "male","group D","some college","standard","none","79","73","67" 347 | "female","group C","high school","standard","none","72","80","75" 348 | "male","group B","some college","standard","none","62","61","57" 349 | "female","group C","bachelor's degree","standard","completed","77","94","95" 350 | "male","group D","high school","free/reduced","none","75","74","66" 351 | "male","group E","associate's degree","standard","none","87","74","76" 352 | "female","group B","bachelor's degree","standard","none","52","65","69" 353 | "male","group E","some college","standard","none","66","57","52" 354 | "female","group C","some college","standard","completed","63","78","80" 355 | "female","group C","associate's degree","standard","none","46","58","57" 356 | "female","group C","some college","standard","none","59","71","70" 357 | "female","group B","bachelor's degree","standard","none","61","72","70" 358 | 
"male","group A","associate's degree","standard","none","63","61","61" 359 | "female","group C","some college","free/reduced","completed","42","66","69" 360 | "male","group D","some college","free/reduced","none","59","62","61" 361 | "female","group D","some college","standard","none","80","90","89" 362 | "female","group B","high school","standard","none","58","62","59" 363 | "male","group B","some high school","standard","completed","85","84","78" 364 | "female","group C","some college","standard","none","52","58","58" 365 | "female","group D","some high school","free/reduced","none","27","34","32" 366 | "male","group C","some college","standard","none","59","60","58" 367 | "male","group A","bachelor's degree","free/reduced","completed","49","58","60" 368 | "male","group C","high school","standard","completed","69","58","53" 369 | "male","group C","bachelor's degree","free/reduced","none","61","66","61" 370 | "female","group A","some high school","free/reduced","none","44","64","58" 371 | "female","group D","some high school","standard","none","73","84","85" 372 | "male","group E","some college","standard","none","84","77","71" 373 | "female","group C","some college","free/reduced","completed","45","73","70" 374 | "male","group D","some high school","standard","none","74","74","72" 375 | "female","group D","some college","standard","completed","82","97","96" 376 | "female","group D","bachelor's degree","standard","none","59","70","73" 377 | "male","group E","associate's degree","free/reduced","none","46","43","41" 378 | "female","group D","some high school","standard","none","80","90","82" 379 | "female","group D","master's degree","free/reduced","completed","85","95","100" 380 | "female","group A","some high school","standard","none","71","83","77" 381 | "male","group A","bachelor's degree","standard","none","66","64","62" 382 | "female","group B","associate's degree","standard","none","80","86","83" 383 | "male","group C","associate's 
degree","standard","completed","87","100","95" 384 | "male","group C","master's degree","free/reduced","none","79","81","71" 385 | "female","group E","some high school","free/reduced","none","38","49","45" 386 | "female","group A","some high school","free/reduced","none","38","43","43" 387 | "female","group E","some college","standard","none","67","76","75" 388 | "female","group E","bachelor's degree","standard","none","64","73","70" 389 | "female","group C","associate's degree","free/reduced","none","57","78","67" 390 | "female","group D","high school","standard","none","62","64","64" 391 | "male","group D","master's degree","standard","none","73","70","75" 392 | "male","group E","some high school","free/reduced","completed","73","67","59" 393 | "female","group D","some college","standard","none","77","68","77" 394 | "male","group E","some college","standard","none","76","67","67" 395 | "male","group C","associate's degree","standard","completed","57","54","56" 396 | "female","group C","some high school","standard","completed","65","74","77" 397 | "male","group A","high school","free/reduced","none","48","45","41" 398 | "female","group B","high school","free/reduced","none","50","67","63" 399 | "female","group C","associate's degree","standard","none","85","89","95" 400 | "male","group B","some high school","standard","none","74","63","57" 401 | "male","group D","some high school","standard","none","60","59","54" 402 | "female","group C","some high school","standard","completed","59","54","67" 403 | "male","group A","some college","standard","none","53","43","43" 404 | "female","group A","some college","free/reduced","none","49","65","55" 405 | "female","group D","high school","standard","completed","88","99","100" 406 | "female","group C","high school","standard","none","54","59","62" 407 | "female","group C","some high school","standard","none","63","73","68" 408 | "male","group B","associate's degree","standard","completed","65","65","63" 409 | "female","group 
B","associate's degree","standard","none","82","80","77" 410 | "female","group D","high school","free/reduced","completed","52","57","56" 411 | "male","group D","associate's degree","standard","completed","87","84","85" 412 | "female","group D","master's degree","standard","completed","70","71","74" 413 | "male","group E","some college","standard","completed","84","83","78" 414 | "male","group D","associate's degree","standard","none","71","66","60" 415 | "male","group B","some high school","standard","completed","63","67","67" 416 | "female","group C","bachelor's degree","free/reduced","completed","51","72","79" 417 | "male","group E","high school","standard","none","84","73","69" 418 | "male","group C","bachelor's degree","standard","completed","71","74","68" 419 | "male","group C","associate's degree","standard","none","74","73","67" 420 | "male","group D","some college","standard","none","68","59","62" 421 | "male","group E","high school","free/reduced","completed","57","56","54" 422 | "female","group C","associate's degree","free/reduced","completed","82","93","93" 423 | "female","group D","high school","standard","completed","57","58","64" 424 | "female","group D","master's degree","free/reduced","completed","47","58","67" 425 | "female","group A","some high school","standard","completed","59","85","80" 426 | "male","group B","some college","free/reduced","none","41","39","34" 427 | "female","group C","some college","free/reduced","none","62","67","62" 428 | "male","group C","bachelor's degree","standard","none","86","83","86" 429 | "male","group C","some high school","free/reduced","none","69","71","65" 430 | "male","group A","some high school","free/reduced","none","65","59","53" 431 | "male","group C","some high school","free/reduced","none","68","63","54" 432 | "male","group C","associate's degree","free/reduced","none","64","66","59" 433 | "female","group C","high school","standard","none","61","72","70" 434 | "male","group C","high 
school","standard","none","61","56","55" 435 | "female","group A","some high school","free/reduced","none","47","59","50" 436 | "male","group C","some high school","standard","none","73","66","66" 437 | "male","group C","some college","free/reduced","completed","50","48","53" 438 | "male","group D","associate's degree","standard","none","75","68","64" 439 | "male","group D","associate's degree","free/reduced","none","75","66","73" 440 | "male","group C","high school","standard","none","70","56","51" 441 | "male","group D","some high school","standard","completed","89","88","82" 442 | "female","group C","some college","standard","completed","67","81","79" 443 | "female","group D","high school","standard","none","78","81","80" 444 | "female","group A","some high school","free/reduced","none","59","73","69" 445 | "female","group B","associate's degree","standard","none","73","83","76" 446 | "male","group A","some high school","free/reduced","none","79","82","73" 447 | "female","group C","some high school","standard","completed","67","74","77" 448 | "male","group D","some college","free/reduced","none","69","66","60" 449 | "male","group C","high school","standard","completed","86","81","80" 450 | "male","group B","high school","standard","none","47","46","42" 451 | "male","group B","associate's degree","standard","none","81","73","72" 452 | "female","group C","some college","free/reduced","completed","64","85","85" 453 | "female","group E","some college","standard","none","100","92","97" 454 | "female","group C","associate's degree","free/reduced","none","65","77","74" 455 | "male","group C","some college","free/reduced","none","65","58","49" 456 | "female","group C","associate's degree","free/reduced","none","53","61","62" 457 | "male","group C","bachelor's degree","free/reduced","none","37","56","47" 458 | "female","group D","bachelor's degree","standard","none","79","89","89" 459 | "male","group D","associate's degree","free/reduced","none","53","54","48" 460 | 
"female","group E","bachelor's degree","standard","none","100","100","100" 461 | "male","group B","high school","standard","completed","72","65","68" 462 | "male","group C","bachelor's degree","free/reduced","none","53","58","55" 463 | "male","group B","some college","free/reduced","none","54","54","45" 464 | "female","group E","some college","standard","none","71","70","76" 465 | "female","group C","some college","free/reduced","none","77","90","91" 466 | "male","group A","bachelor's degree","standard","completed","75","58","62" 467 | "female","group C","some college","standard","none","84","87","91" 468 | "female","group D","associate's degree","free/reduced","none","26","31","38" 469 | "male","group A","high school","free/reduced","completed","72","67","65" 470 | "female","group A","high school","free/reduced","completed","77","88","85" 471 | "male","group C","some college","standard","none","91","74","76" 472 | "female","group C","associate's degree","standard","completed","83","85","90" 473 | "female","group C","high school","standard","none","63","69","74" 474 | "female","group C","associate's degree","standard","completed","68","86","84" 475 | "female","group D","some high school","standard","none","59","67","61" 476 | "female","group B","associate's degree","standard","completed","90","90","91" 477 | "female","group D","bachelor's degree","standard","completed","71","76","83" 478 | "male","group E","bachelor's degree","standard","completed","76","62","66" 479 | "male","group D","associate's degree","standard","none","80","68","72" 480 | "female","group D","master's degree","standard","none","55","64","70" 481 | "male","group E","associate's degree","standard","none","76","71","67" 482 | "male","group B","high school","standard","completed","73","71","68" 483 | "female","group D","associate's degree","free/reduced","none","52","59","56" 484 | "male","group C","some college","free/reduced","none","68","68","61" 485 | "male","group A","high 
school","standard","none","59","52","46" 486 | "female","group B","associate's degree","standard","none","49","52","54" 487 | "male","group C","high school","standard","none","70","74","71" 488 | "male","group D","some college","free/reduced","none","61","47","56" 489 | "female","group C","associate's degree","free/reduced","none","60","75","74" 490 | "male","group B","some high school","standard","completed","64","53","57" 491 | "male","group A","associate's degree","free/reduced","completed","79","82","82" 492 | "female","group A","associate's degree","free/reduced","none","65","85","76" 493 | "female","group C","associate's degree","standard","none","64","64","70" 494 | "female","group C","some college","standard","none","83","83","90" 495 | "female","group C","bachelor's degree","standard","none","81","88","90" 496 | "female","group B","high school","standard","none","54","64","68" 497 | "male","group D","high school","standard","completed","68","64","66" 498 | "female","group C","some college","standard","none","54","48","52" 499 | "female","group D","some college","free/reduced","completed","59","78","76" 500 | "female","group B","some high school","standard","none","66","69","68" 501 | "male","group E","some college","standard","none","76","71","72" 502 | "female","group D","master's degree","standard","none","74","79","82" 503 | "female","group B","associate's degree","standard","completed","94","87","92" 504 | "male","group C","some college","free/reduced","none","63","61","54" 505 | "female","group E","associate's degree","standard","completed","95","89","92" 506 | "female","group D","master's degree","free/reduced","none","40","59","54" 507 | "female","group B","some high school","standard","none","82","82","80" 508 | "male","group A","high school","standard","none","68","70","66" 509 | "male","group B","bachelor's degree","free/reduced","none","55","59","54" 510 | "male","group C","master's degree","standard","none","79","78","77" 511 | "female","group 
C","bachelor's degree","standard","none","86","92","87" 512 | "male","group D","some college","standard","none","76","71","73" 513 | "male","group A","some high school","standard","none","64","50","43" 514 | "male","group D","some high school","free/reduced","none","62","49","52" 515 | "female","group B","some high school","standard","completed","54","61","62" 516 | "female","group B","master's degree","free/reduced","completed","77","97","94" 517 | "female","group C","some high school","standard","completed","76","87","85" 518 | "female","group D","some college","standard","none","74","89","84" 519 | "female","group E","some college","standard","completed","66","74","73" 520 | "female","group D","some high school","standard","completed","66","78","78" 521 | "female","group B","high school","free/reduced","completed","67","78","79" 522 | "male","group D","some college","standard","none","71","49","52" 523 | "female","group C","associate's degree","standard","none","91","86","84" 524 | "male","group D","bachelor's degree","standard","none","69","58","57" 525 | "male","group C","master's degree","free/reduced","none","54","59","50" 526 | "male","group C","high school","standard","completed","53","52","49" 527 | "male","group E","some college","standard","none","68","60","59" 528 | "male","group C","some high school","free/reduced","completed","56","61","60" 529 | "female","group C","high school","free/reduced","none","36","53","43" 530 | "female","group D","bachelor's degree","free/reduced","none","29","41","47" 531 | "female","group C","associate's degree","standard","none","62","74","70" 532 | "female","group C","associate's degree","standard","completed","68","67","73" 533 | "female","group C","some high school","standard","none","47","54","53" 534 | "male","group E","associate's degree","standard","completed","62","61","58" 535 | "female","group E","associate's degree","standard","completed","79","88","94" 536 | "male","group B","high 
school","standard","completed","73","69","68" 537 | "female","group C","bachelor's degree","free/reduced","completed","66","83","83" 538 | "male","group C","associate's degree","standard","completed","51","60","58" 539 | "female","group D","high school","standard","none","51","66","62" 540 | "male","group E","bachelor's degree","standard","completed","85","66","71" 541 | "male","group A","associate's degree","standard","completed","97","92","86" 542 | "male","group C","high school","standard","completed","75","69","68" 543 | "male","group D","associate's degree","free/reduced","completed","79","82","80" 544 | "female","group C","associate's degree","standard","none","81","77","79" 545 | "female","group D","associate's degree","standard","none","82","95","89" 546 | "female","group D","master's degree","standard","none","64","63","66" 547 | "male","group E","some high school","free/reduced","completed","78","83","80" 548 | "female","group A","some high school","standard","completed","92","100","97" 549 | "male","group C","high school","standard","completed","72","67","64" 550 | "female","group C","high school","free/reduced","none","62","67","64" 551 | "male","group C","master's degree","standard","none","79","72","69" 552 | "male","group C","some high school","free/reduced","none","79","76","65" 553 | "male","group B","bachelor's degree","free/reduced","completed","87","90","88" 554 | "female","group B","associate's degree","standard","none","40","48","50" 555 | "male","group D","some college","free/reduced","none","77","62","64" 556 | "male","group E","associate's degree","standard","none","53","45","40" 557 | "female","group C","some college","free/reduced","none","32","39","33" 558 | "female","group C","associate's degree","standard","completed","55","72","79" 559 | "male","group C","master's degree","free/reduced","none","61","67","66" 560 | "female","group B","associate's degree","free/reduced","none","53","70","70" 561 | "male","group D","some high 
school","standard","none","73","66","62" 562 | "female","group D","some college","standard","completed","74","75","79" 563 | "female","group C","some college","standard","none","63","74","74" 564 | "male","group C","bachelor's degree","standard","completed","96","90","92" 565 | "female","group D","some college","free/reduced","completed","63","80","80" 566 | "male","group B","bachelor's degree","free/reduced","none","48","51","46" 567 | "male","group B","associate's degree","standard","none","48","43","45" 568 | "female","group E","bachelor's degree","free/reduced","completed","92","100","100" 569 | "female","group D","master's degree","free/reduced","completed","61","71","78" 570 | "male","group B","high school","free/reduced","none","63","48","47" 571 | "male","group D","bachelor's degree","free/reduced","none","68","68","67" 572 | "male","group B","some college","standard","completed","71","75","70" 573 | "male","group A","bachelor's degree","standard","none","91","96","92" 574 | "female","group C","some college","standard","none","53","62","56" 575 | "female","group C","high school","free/reduced","completed","50","66","64" 576 | "female","group E","high school","standard","none","74","81","71" 577 | "male","group A","associate's degree","free/reduced","completed","40","55","53" 578 | "male","group A","some college","standard","completed","61","51","52" 579 | "female","group B","high school","standard","none","81","91","89" 580 | "female","group B","some college","free/reduced","completed","48","56","58" 581 | "female","group D","master's degree","standard","none","53","61","68" 582 | "female","group D","some high school","standard","none","81","97","96" 583 | "female","group E","some high school","standard","none","77","79","80" 584 | "female","group D","bachelor's degree","free/reduced","none","63","73","78" 585 | "female","group D","associate's degree","standard","completed","73","75","80" 586 | "female","group D","some 
college","standard","none","69","77","77" 587 | "female","group C","associate's degree","standard","none","65","76","76" 588 | "female","group A","high school","standard","none","55","73","73" 589 | "female","group C","bachelor's degree","free/reduced","none","44","63","62" 590 | "female","group C","some college","standard","none","54","64","65" 591 | "female","group A","some high school","standard","none","48","66","65" 592 | "male","group C","some college","free/reduced","none","58","57","54" 593 | "male","group A","some high school","standard","none","71","62","50" 594 | "male","group E","bachelor's degree","standard","none","68","68","64" 595 | "female","group E","high school","standard","none","74","76","73" 596 | "female","group C","bachelor's degree","standard","completed","92","100","99" 597 | "female","group C","bachelor's degree","standard","completed","56","79","72" 598 | "male","group B","high school","free/reduced","none","30","24","15" 599 | "male","group A","some high school","standard","none","53","54","48" 600 | "female","group D","high school","standard","none","69","77","73" 601 | "female","group D","some high school","standard","none","65","82","81" 602 | "female","group D","master's degree","standard","none","54","60","63" 603 | "female","group C","high school","standard","none","29","29","30" 604 | "female","group E","some college","standard","none","76","78","80" 605 | "male","group D","high school","free/reduced","none","60","57","51" 606 | "male","group D","master's degree","free/reduced","completed","84","89","90" 607 | "male","group C","some high school","standard","none","75","72","62" 608 | "female","group C","associate's degree","standard","none","85","84","82" 609 | "female","group C","master's degree","free/reduced","none","40","58","54" 610 | "female","group E","some college","standard","none","61","64","62" 611 | "female","group B","associate's degree","standard","none","58","63","65" 612 | "male","group D","some 
college","free/reduced","completed","69","60","63" 613 | "female","group C","some college","standard","none","58","59","66" 614 | "male","group C","bachelor's degree","standard","completed","94","90","91" 615 | "female","group C","associate's degree","standard","none","65","77","74" 616 | "female","group A","associate's degree","standard","none","82","93","93" 617 | "female","group C","high school","standard","none","60","68","72" 618 | "female","group E","bachelor's degree","standard","none","37","45","38" 619 | "male","group D","bachelor's degree","standard","none","88","78","83" 620 | "male","group D","master's degree","standard","none","95","81","84" 621 | "male","group C","associate's degree","free/reduced","completed","65","73","68" 622 | "female","group C","high school","free/reduced","none","35","61","54" 623 | "male","group B","bachelor's degree","free/reduced","none","62","63","56" 624 | "male","group C","high school","free/reduced","completed","58","51","52" 625 | "male","group A","some college","standard","completed","100","96","86" 626 | "female","group E","bachelor's degree","free/reduced","none","61","58","62" 627 | "male","group D","some college","standard","completed","100","97","99" 628 | "male","group B","associate's degree","free/reduced","completed","69","70","63" 629 | "male","group D","associate's degree","standard","none","61","48","46" 630 | "male","group D","some college","free/reduced","none","49","57","46" 631 | "female","group C","some high school","standard","completed","44","51","55" 632 | "male","group D","some college","standard","none","67","64","70" 633 | "male","group B","high school","standard","none","79","60","65" 634 | "female","group B","bachelor's degree","standard","completed","66","74","81" 635 | "female","group C","high school","standard","none","75","88","85" 636 | "male","group D","some high school","standard","none","84","84","80" 637 | "male","group A","high school","standard","none","71","74","64" 638 | 
"female","group B","high school","free/reduced","completed","67","80","81" 639 | "female","group D","some high school","standard","completed","80","92","88" 640 | "male","group E","some college","standard","none","86","76","74" 641 | "female","group D","associate's degree","standard","none","76","74","73" 642 | "male","group D","high school","standard","none","41","52","51" 643 | "female","group D","associate's degree","free/reduced","completed","74","88","90" 644 | "female","group B","some high school","free/reduced","none","72","81","79" 645 | "female","group E","high school","standard","completed","74","79","80" 646 | "male","group B","high school","standard","none","70","65","60" 647 | "female","group B","bachelor's degree","standard","completed","65","81","81" 648 | "female","group D","associate's degree","standard","none","59","70","65" 649 | "female","group E","high school","free/reduced","none","64","62","68" 650 | "female","group B","high school","standard","none","50","53","55" 651 | "female","group D","some college","standard","completed","69","79","81" 652 | "male","group C","some high school","free/reduced","completed","51","56","53" 653 | "female","group A","high school","standard","completed","68","80","76" 654 | "female","group D","some college","standard","completed","85","86","98" 655 | "female","group A","associate's degree","standard","completed","65","70","74" 656 | "female","group B","some high school","standard","none","73","79","79" 657 | "female","group B","some college","standard","none","62","67","67" 658 | "male","group C","associate's degree","free/reduced","none","77","67","64" 659 | "male","group D","some high school","standard","none","69","66","61" 660 | "female","group D","associate's degree","free/reduced","none","43","60","58" 661 | "male","group D","associate's degree","standard","none","90","87","85" 662 | "male","group C","some college","free/reduced","none","74","77","73" 663 | "male","group C","some high 
school","standard","none","73","66","63" 664 | "female","group D","some college","free/reduced","none","55","71","69" 665 | "female","group C","high school","standard","none","65","69","67" 666 | "male","group D","associate's degree","standard","none","80","63","63" 667 | "female","group C","some high school","free/reduced","completed","50","60","60" 668 | "female","group C","some college","free/reduced","completed","63","73","71" 669 | "female","group B","bachelor's degree","free/reduced","none","77","85","87" 670 | "male","group C","some college","standard","none","73","74","61" 671 | "male","group D","associate's degree","standard","completed","81","72","77" 672 | "female","group C","high school","free/reduced","none","66","76","68" 673 | "male","group D","associate's degree","free/reduced","none","52","57","50" 674 | "female","group C","some college","standard","none","69","78","76" 675 | "female","group C","associate's degree","standard","completed","65","84","84" 676 | "female","group D","high school","standard","completed","69","77","78" 677 | "female","group B","some college","standard","completed","50","64","66" 678 | "female","group E","some college","standard","completed","73","78","76" 679 | "female","group C","some high school","standard","completed","70","82","76" 680 | "male","group D","associate's degree","free/reduced","none","81","75","78" 681 | "male","group D","some college","free/reduced","none","63","61","60" 682 | "female","group D","high school","standard","none","67","72","74" 683 | "male","group B","high school","standard","none","60","68","60" 684 | "male","group B","high school","standard","none","62","55","54" 685 | "female","group C","some high school","free/reduced","completed","29","40","44" 686 | "male","group B","some college","standard","completed","62","66","68" 687 | "female","group E","master's degree","standard","completed","94","99","100" 688 | "male","group E","some college","standard","completed","85","75","68" 689 | 
"male","group D","associate's degree","free/reduced","none","77","78","73" 690 | "male","group A","high school","free/reduced","none","53","58","44" 691 | "male","group E","some college","free/reduced","none","93","90","83" 692 | "female","group C","associate's degree","standard","none","49","53","53" 693 | "female","group E","associate's degree","free/reduced","none","73","76","78" 694 | "female","group C","bachelor's degree","free/reduced","completed","66","74","81" 695 | "female","group D","associate's degree","standard","none","77","77","73" 696 | "female","group C","some high school","standard","none","49","63","56" 697 | "female","group D","some college","free/reduced","none","79","89","86" 698 | "female","group C","associate's degree","standard","completed","75","82","90" 699 | "female","group A","bachelor's degree","standard","none","59","72","70" 700 | "female","group D","associate's degree","standard","completed","57","78","79" 701 | "male","group C","high school","free/reduced","none","66","66","59" 702 | "female","group E","bachelor's degree","standard","completed","79","81","82" 703 | "female","group B","some high school","standard","none","57","67","72" 704 | "male","group A","bachelor's degree","standard","completed","87","84","87" 705 | "female","group D","some college","standard","none","63","64","67" 706 | "female","group B","some high school","free/reduced","completed","59","63","64" 707 | "male","group A","bachelor's degree","free/reduced","none","62","72","65" 708 | "male","group D","high school","standard","none","46","34","36" 709 | "male","group C","some college","standard","none","66","59","52" 710 | "male","group D","high school","standard","none","89","87","79" 711 | "female","group D","associate's degree","free/reduced","completed","42","61","58" 712 | "male","group C","some college","standard","completed","93","84","90" 713 | "female","group E","some high school","standard","completed","80","85","85" 714 | "female","group D","some 
college","standard","none","98","100","99" 715 | "male","group D","master's degree","standard","none","81","81","84" 716 | "female","group B","some high school","standard","completed","60","70","74" 717 | "female","group B","associate's degree","free/reduced","completed","76","94","87" 718 | "male","group C","associate's degree","standard","completed","73","78","72" 719 | "female","group C","associate's degree","standard","completed","96","96","99" 720 | "female","group C","high school","standard","none","76","76","74" 721 | "male","group E","associate's degree","free/reduced","completed","91","73","80" 722 | "female","group C","some college","free/reduced","none","62","72","70" 723 | "male","group D","some high school","free/reduced","completed","55","59","59" 724 | "female","group B","some high school","free/reduced","completed","74","90","88" 725 | "male","group C","high school","standard","none","50","48","42" 726 | "male","group B","some college","standard","none","47","43","41" 727 | "male","group E","some college","standard","completed","81","74","71" 728 | "female","group E","associate's degree","standard","completed","65","75","77" 729 | "male","group E","some high school","standard","completed","68","51","57" 730 | "female","group D","high school","free/reduced","none","73","92","84" 731 | "male","group C","some college","standard","none","53","39","37" 732 | "female","group B","associate's degree","free/reduced","completed","68","77","80" 733 | "male","group A","some high school","free/reduced","none","55","46","43" 734 | "female","group C","some college","standard","completed","87","89","94" 735 | "male","group D","some high school","standard","none","55","47","44" 736 | "female","group E","some college","free/reduced","none","53","58","57" 737 | "male","group C","master's degree","standard","none","67","57","59" 738 | "male","group C","associate's degree","standard","none","92","79","84" 739 | "female","group B","some 
college","free/reduced","completed","53","66","73" 740 | "male","group D","associate's degree","standard","none","81","71","73" 741 | "male","group C","high school","free/reduced","none","61","60","55" 742 | "male","group D","bachelor's degree","standard","none","80","73","72" 743 | "female","group A","associate's degree","free/reduced","none","37","57","56" 744 | "female","group C","high school","standard","none","81","84","82" 745 | "female","group C","associate's degree","standard","completed","59","73","72" 746 | "male","group B","some college","free/reduced","none","55","55","47" 747 | "male","group D","associate's degree","standard","none","72","79","74" 748 | "male","group D","high school","standard","none","69","75","71" 749 | "male","group C","some college","standard","none","69","64","68" 750 | "female","group C","bachelor's degree","free/reduced","none","50","60","59" 751 | "male","group B","some college","standard","completed","87","84","86" 752 | "male","group D","some high school","standard","completed","71","69","68" 753 | "male","group E","some college","standard","none","68","72","65" 754 | "male","group C","master's degree","free/reduced","completed","79","77","75" 755 | "female","group C","some high school","standard","completed","77","90","85" 756 | "male","group C","associate's degree","free/reduced","none","58","55","53" 757 | "female","group E","associate's degree","standard","none","84","95","92" 758 | "male","group D","some college","standard","none","55","58","52" 759 | "male","group E","bachelor's degree","free/reduced","completed","70","68","72" 760 | "female","group D","some college","free/reduced","completed","52","59","65" 761 | "male","group B","some college","standard","completed","69","77","77" 762 | "female","group C","high school","free/reduced","none","53","72","64" 763 | "female","group D","some high school","standard","none","48","58","54" 764 | "male","group D","some high school","standard","completed","78","81","86" 765 | 
"female","group B","high school","standard","none","62","62","63" 766 | "male","group D","some college","standard","none","60","63","59" 767 | "female","group B","high school","standard","none","74","72","72" 768 | "female","group C","high school","standard","completed","58","75","77" 769 | "male","group B","high school","standard","completed","76","62","60" 770 | "female","group D","some high school","standard","none","68","71","75" 771 | "male","group A","some college","free/reduced","none","58","60","57" 772 | "male","group B","high school","standard","none","52","48","49" 773 | "male","group D","bachelor's degree","standard","none","75","73","74" 774 | "female","group B","some high school","free/reduced","completed","52","67","72" 775 | "female","group C","bachelor's degree","free/reduced","none","62","78","79" 776 | "male","group B","some college","standard","none","66","65","60" 777 | "female","group B","some high school","free/reduced","none","49","58","55" 778 | "female","group B","high school","standard","none","66","72","70" 779 | "female","group C","some college","free/reduced","none","35","44","43" 780 | "female","group A","some college","standard","completed","72","79","82" 781 | "male","group E","associate's degree","standard","completed","94","85","82" 782 | "female","group D","associate's degree","free/reduced","none","46","56","57" 783 | "female","group B","master's degree","standard","none","77","90","84" 784 | "female","group B","high school","free/reduced","completed","76","85","82" 785 | "female","group C","associate's degree","standard","completed","52","59","62" 786 | "male","group C","bachelor's degree","standard","completed","91","81","79" 787 | "female","group B","some high school","standard","completed","32","51","44" 788 | "female","group E","some high school","free/reduced","none","72","79","77" 789 | "female","group B","some college","standard","none","19","38","32" 790 | "male","group C","associate's 
degree","free/reduced","none","68","65","61" 791 | "female","group C","master's degree","free/reduced","none","52","65","61" 792 | "female","group B","high school","standard","none","48","62","60" 793 | "female","group D","some college","free/reduced","none","60","66","70" 794 | "male","group D","high school","free/reduced","none","66","74","69" 795 | "male","group E","some high school","standard","completed","89","84","77" 796 | "female","group B","high school","standard","none","42","52","51" 797 | "female","group E","associate's degree","free/reduced","completed","57","68","73" 798 | "male","group D","high school","standard","none","70","70","70" 799 | "female","group E","associate's degree","free/reduced","none","70","84","81" 800 | "male","group E","some college","standard","none","69","60","54" 801 | "female","group C","associate's degree","standard","none","52","55","57" 802 | "male","group C","some high school","standard","completed","67","73","68" 803 | "male","group C","some high school","standard","completed","76","80","73" 804 | "female","group E","associate's degree","standard","none","87","94","95" 805 | "female","group B","some college","standard","none","82","85","87" 806 | "female","group C","some college","standard","none","73","76","78" 807 | "male","group A","some college","free/reduced","none","75","81","74" 808 | "female","group D","some college","free/reduced","none","64","74","75" 809 | "female","group E","high school","free/reduced","none","41","45","40" 810 | "male","group C","high school","standard","none","90","75","69" 811 | "male","group B","bachelor's degree","standard","none","59","54","51" 812 | "male","group A","some high school","standard","none","51","31","36" 813 | "male","group A","high school","free/reduced","none","45","47","49" 814 | "female","group C","master's degree","standard","completed","54","64","67" 815 | "male","group E","some high school","standard","completed","87","84","76" 816 | "female","group C","high 
school","standard","none","72","80","83" 817 | "male","group B","some high school","standard","completed","94","86","87" 818 | "female","group A","bachelor's degree","standard","none","45","59","64" 819 | "male","group D","bachelor's degree","free/reduced","completed","61","70","76" 820 | "female","group B","high school","free/reduced","none","60","72","68" 821 | "female","group C","some high school","standard","none","77","91","88" 822 | "female","group A","some high school","standard","completed","85","90","92" 823 | "female","group D","bachelor's degree","free/reduced","none","78","90","93" 824 | "male","group E","some college","free/reduced","completed","49","52","51" 825 | "female","group B","high school","free/reduced","none","71","87","82" 826 | "female","group C","some high school","free/reduced","none","48","58","52" 827 | "male","group C","high school","standard","none","62","67","58" 828 | "female","group C","associate's degree","free/reduced","completed","56","68","70" 829 | "female","group C","some high school","standard","none","65","69","76" 830 | "female","group D","some high school","free/reduced","completed","69","86","81" 831 | "male","group B","some high school","standard","none","68","54","53" 832 | "female","group A","some college","free/reduced","none","61","60","57" 833 | "female","group C","bachelor's degree","free/reduced","completed","74","86","89" 834 | "male","group A","bachelor's degree","standard","none","64","60","58" 835 | "female","group B","high school","standard","completed","77","82","89" 836 | "male","group B","some college","standard","none","58","50","45" 837 | "female","group C","high school","standard","completed","60","64","74" 838 | "male","group E","high school","standard","none","73","64","57" 839 | "female","group A","high school","standard","completed","75","82","79" 840 | "male","group B","associate's degree","free/reduced","completed","58","57","53" 841 | "female","group C","associate's 
degree","standard","none","66","77","73" 842 | "female","group D","high school","free/reduced","none","39","52","46" 843 | "male","group C","some high school","standard","none","64","58","51" 844 | "female","group B","high school","free/reduced","completed","23","44","36" 845 | "male","group B","some college","free/reduced","completed","74","77","76" 846 | "female","group D","some high school","free/reduced","completed","40","65","64" 847 | "male","group E","master's degree","standard","none","90","85","84" 848 | "male","group C","master's degree","standard","completed","91","85","85" 849 | "male","group D","high school","standard","none","64","54","50" 850 | "female","group C","high school","standard","none","59","72","68" 851 | "male","group D","associate's degree","standard","none","80","75","69" 852 | "male","group C","master's degree","standard","none","71","67","67" 853 | "female","group A","high school","standard","none","61","68","63" 854 | "female","group E","some college","standard","none","87","85","93" 855 | "male","group E","some high school","standard","none","82","67","61" 856 | "male","group C","some high school","standard","none","62","64","55" 857 | "female","group B","bachelor's degree","standard","none","97","97","96" 858 | "male","group B","some college","free/reduced","none","75","68","65" 859 | "female","group C","bachelor's degree","standard","none","65","79","81" 860 | "male","group B","high school","standard","completed","52","49","46" 861 | "male","group C","associate's degree","free/reduced","none","87","73","72" 862 | "female","group C","associate's degree","standard","none","53","62","53" 863 | "female","group E","master's degree","free/reduced","none","81","86","87" 864 | "male","group D","bachelor's degree","free/reduced","completed","39","42","38" 865 | "female","group C","some college","standard","completed","71","71","80" 866 | "male","group C","associate's degree","standard","none","97","93","91" 867 | "male","group D","some 
college","standard","completed","82","82","88" 868 | "male","group C","high school","free/reduced","none","59","53","52" 869 | "male","group B","associate's degree","standard","none","61","42","41" 870 | "male","group E","associate's degree","free/reduced","completed","78","74","72" 871 | "male","group C","associate's degree","free/reduced","none","49","51","51" 872 | "male","group B","high school","standard","none","59","58","47" 873 | "female","group C","some college","standard","completed","70","72","76" 874 | "male","group B","associate's degree","standard","completed","82","84","78" 875 | "male","group E","associate's degree","free/reduced","none","90","90","82" 876 | "female","group C","bachelor's degree","free/reduced","none","43","62","61" 877 | "male","group C","some college","free/reduced","none","80","64","66" 878 | "male","group D","some college","standard","none","81","82","84" 879 | "male","group C","some high school","standard","none","57","61","54" 880 | "female","group D","some high school","standard","none","59","72","80" 881 | "female","group D","associate's degree","standard","none","64","76","74" 882 | "male","group C","bachelor's degree","standard","completed","63","64","66" 883 | "female","group E","bachelor's degree","standard","completed","71","70","70" 884 | "female","group B","high school","free/reduced","none","64","73","71" 885 | "male","group D","bachelor's degree","free/reduced","none","55","46","44" 886 | "female","group E","associate's degree","standard","none","51","51","54" 887 | "female","group C","associate's degree","standard","completed","62","76","80" 888 | "female","group E","associate's degree","standard","completed","93","100","95" 889 | "male","group C","high school","free/reduced","none","54","72","59" 890 | "female","group D","some college","free/reduced","none","69","65","74" 891 | "male","group D","high school","free/reduced","none","44","51","48" 892 | "female","group E","some 
college","standard","completed","86","85","91" 893 | "female","group E","associate's degree","standard","none","85","92","85" 894 | "female","group A","master's degree","free/reduced","none","50","67","73" 895 | "male","group D","some high school","standard","completed","88","74","75" 896 | "female","group E","associate's degree","standard","none","59","62","69" 897 | "female","group E","some high school","free/reduced","none","32","34","38" 898 | "male","group B","high school","free/reduced","none","36","29","27" 899 | "female","group B","some high school","free/reduced","completed","63","78","79" 900 | "male","group D","associate's degree","standard","completed","67","54","63" 901 | "female","group D","some high school","standard","completed","65","78","82" 902 | "male","group D","master's degree","standard","none","85","84","89" 903 | "female","group C","master's degree","standard","none","73","78","74" 904 | "female","group A","high school","free/reduced","completed","34","48","41" 905 | "female","group D","bachelor's degree","free/reduced","completed","93","100","100" 906 | "female","group D","some high school","free/reduced","none","67","84","84" 907 | "male","group D","some college","standard","none","88","77","77" 908 | "male","group B","high school","standard","none","57","48","51" 909 | "female","group D","some college","standard","completed","79","84","91" 910 | "female","group C","bachelor's degree","free/reduced","none","67","75","72" 911 | "male","group E","bachelor's degree","standard","completed","70","64","70" 912 | "male","group D","bachelor's degree","free/reduced","none","50","42","48" 913 | "female","group A","some college","standard","none","69","84","82" 914 | "female","group C","bachelor's degree","standard","completed","52","61","66" 915 | "female","group C","bachelor's degree","free/reduced","completed","47","62","66" 916 | "female","group B","associate's degree","free/reduced","none","46","61","55" 917 | "female","group E","some 
college","standard","none","68","70","66" 918 | "male","group E","bachelor's degree","standard","completed","100","100","100" 919 | "female","group C","high school","standard","none","44","61","52" 920 | "female","group C","associate's degree","standard","completed","57","77","80" 921 | "male","group B","some college","standard","completed","91","96","91" 922 | "male","group D","high school","free/reduced","none","69","70","67" 923 | "female","group C","high school","free/reduced","none","35","53","46" 924 | "male","group D","high school","standard","none","72","66","66" 925 | "female","group B","associate's degree","free/reduced","none","54","65","65" 926 | "male","group D","high school","free/reduced","none","74","70","69" 927 | "male","group E","some high school","standard","completed","74","64","60" 928 | "male","group E","associate's degree","free/reduced","none","64","56","52" 929 | "female","group D","high school","free/reduced","completed","65","61","71" 930 | "male","group E","associate's degree","free/reduced","completed","46","43","44" 931 | "female","group C","some high school","free/reduced","none","48","56","51" 932 | "male","group C","some college","free/reduced","completed","67","74","70" 933 | "male","group D","some college","free/reduced","none","62","57","62" 934 | "male","group D","associate's degree","free/reduced","completed","61","71","73" 935 | "male","group C","bachelor's degree","free/reduced","completed","70","75","74" 936 | "male","group C","associate's degree","standard","completed","98","87","90" 937 | "male","group D","some college","free/reduced","none","70","63","58" 938 | "male","group A","associate's degree","standard","none","67","57","53" 939 | "female","group E","high school","free/reduced","none","57","58","57" 940 | "male","group D","some college","standard","completed","85","81","85" 941 | "male","group D","some high school","standard","completed","77","68","69" 942 | "male","group C","master's 
degree","free/reduced","completed","72","66","72" 943 | "female","group D","master's degree","standard","none","78","91","96" 944 | "male","group C","high school","standard","none","81","66","64" 945 | "male","group A","some high school","free/reduced","completed","61","62","61" 946 | "female","group B","high school","standard","none","58","68","61" 947 | "female","group C","associate's degree","standard","none","54","61","58" 948 | "male","group B","high school","standard","none","82","82","80" 949 | "female","group D","some college","free/reduced","none","49","58","60" 950 | "male","group B","some high school","free/reduced","completed","49","50","52" 951 | "female","group E","high school","free/reduced","completed","57","75","73" 952 | "male","group E","high school","standard","none","94","73","71" 953 | "female","group D","some college","standard","completed","75","77","83" 954 | "female","group E","some high school","free/reduced","none","74","74","72" 955 | "male","group C","high school","standard","completed","58","52","54" 956 | "female","group C","some college","standard","none","62","69","69" 957 | "male","group E","associate's degree","standard","none","72","57","62" 958 | "male","group C","some college","standard","none","84","87","81" 959 | "female","group D","master's degree","standard","none","92","100","100" 960 | "female","group D","high school","standard","none","45","63","59" 961 | "male","group C","high school","standard","none","75","81","71" 962 | "female","group A","some college","standard","none","56","58","64" 963 | "female","group D","some high school","free/reduced","none","48","54","53" 964 | "female","group E","associate's degree","standard","none","100","100","100" 965 | "female","group C","some high school","free/reduced","completed","65","76","75" 966 | "male","group D","some college","standard","none","72","57","58" 967 | "female","group D","some college","standard","none","62","70","72" 968 | "male","group A","some high 
school","standard","completed","66","68","64" 969 | "male","group C","some college","standard","none","63","63","60" 970 | "female","group E","associate's degree","standard","none","68","76","67" 971 | "female","group B","bachelor's degree","standard","none","75","84","80" 972 | "female","group D","bachelor's degree","standard","none","89","100","100" 973 | "male","group C","some high school","standard","completed","78","72","69" 974 | "female","group A","high school","free/reduced","completed","53","50","60" 975 | "female","group D","some college","free/reduced","none","49","65","61" 976 | "female","group A","some college","standard","none","54","63","67" 977 | "female","group C","some college","standard","completed","64","82","77" 978 | "male","group B","some college","free/reduced","completed","60","62","60" 979 | "male","group C","associate's degree","standard","none","62","65","58" 980 | "male","group D","high school","standard","completed","55","41","48" 981 | "female","group C","associate's degree","standard","none","91","95","94" 982 | "female","group B","high school","free/reduced","none","8","24","23" 983 | "male","group D","some high school","standard","none","81","78","78" 984 | "male","group B","some high school","standard","completed","79","85","86" 985 | "female","group A","some college","standard","completed","78","87","91" 986 | "female","group C","some high school","standard","none","74","75","82" 987 | "male","group A","high school","standard","none","57","51","54" 988 | "female","group C","associate's degree","standard","none","40","59","51" 989 | "male","group E","some high school","standard","completed","81","75","76" 990 | "female","group A","some high school","free/reduced","none","44","45","45" 991 | "female","group D","some college","free/reduced","completed","67","86","83" 992 | "male","group E","high school","free/reduced","completed","86","81","75" 993 | "female","group B","some high school","standard","completed","65","82","78" 994 | 
"female","group D","associate's degree","free/reduced","none","55","76","76" 995 | "female","group D","bachelor's degree","free/reduced","none","62","72","74" 996 | "male","group A","high school","standard","none","63","63","62" 997 | "female","group E","master's degree","standard","completed","88","99","95" 998 | "male","group C","high school","free/reduced","none","62","55","55" 999 | "female","group C","high school","free/reduced","completed","59","71","65" 1000 | "female","group D","some college","standard","completed","68","78","77" 1001 | "female","group D","some college","free/reduced","none","77","86","86" 1002 | -------------------------------------------------------------------------------- /nba/Data Analysis of NBA Players Challenge.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Analysis of NBA Players\n", 8 | "\n", 9 | "This iPython notebook is meant to challenge Georgetown Data Science Certificate students to ensure they have understood or acknowledged the technologies and techniques taught throughout the semester. We’ll be analyzing a dataset of NBA players and their performance in the 2013-2014 season. You can download the file [here](http://bit.ly/2cdIpUc)." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Read in a CSV File\n", 17 | "\n", 18 | "In the first step, load the dataset into a dataframe. " 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": { 25 | "collapsed": false 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "# Load CSV into a dataframe named nba" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Remember to import the `pandas` library to get access to Dataframes. 
Dataframes are two-dimensional arrays (matrices) where each column can be of a different datatype.\n", 37 | "\n", 38 | "## Find the number of Players\n", 39 | "\n", 40 | "How many players are in the dataset?" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": { 47 | "collapsed": false 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | "# Print the number of rows and columns in the dataframe" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "## Look at the First Row of Data\n", 59 | "\n", 60 | "What does the first row look like?" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": { 67 | "collapsed": false 68 | }, 69 | "outputs": [], 70 | "source": [ 71 | "# Print the first row of data" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "## Find the Average of Each Statistic\n", 79 | "\n", 80 | "Find the average value for each statistic. The columns have names like PER (player efficiency rating) and GP (Games Played) that represent the season statistics for each player. For more on the various statistics look [here](http://stats.nba.com/help/glossary)." 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 4, 86 | "metadata": { 87 | "collapsed": false 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "# Print the mean of each column" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "## Make Pairwise Scatterplots\n", 99 | "\n", 100 | "One common way to explore a dataset is to see how different columns correlate to others. We’ll compare the ast, mpg, and to columns. Create a scatter matrix plot of the dataset. 
" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 5, 106 | "metadata": { 107 | "collapsed": false 108 | }, 109 | "outputs": [], 110 | "source": [ 111 | "%matplotlib inline \n", 112 | "\n", 113 | "# Use seaborn or pandas to plot the scatter matrix" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "In Python, `matplotlib` is the primary plotting package, and `seaborn` is a widely used layer over matplotlib. You could have also used the `pandas` `scatter_matrix` for a similar result. \n", 121 | "\n", 122 | "## Make Clusters of the Players \n", 123 | "\n", 124 | "One good way to explore this kind of data is to generate cluster plots. These will show which players are most similar. Use Scikit-Learn to cluster the data. " 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 6, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "# Use a clustering model like K-means to cluster the players " 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "We can use the main Python machine learning package, `scikit-learn`, to fit a `k-means` clustering model and get our cluster labels. In order to cluster properly, we remove any non-numeric columns, or columns with missing values (NA, NaN, etc.) with the `_get_numeric_data` and `dropna` methods.\n", 143 | "\n", 144 | "## Plot Players by Cluster\n", 145 | "\n", 146 | "We can now plot out the players by cluster to discover patterns. One way to do this is to first use PCA to make our data 2-dimensional, then plot it, and shade each point according to cluster association."
147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 7, 152 | "metadata": { 153 | "collapsed": false 154 | }, 155 | "outputs": [], 156 | "source": [ 157 | "# Use PCA to plot the clusters in 2 dimensions" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "With Python, we used the PCA class in the scikit-learn library. We used matplotlib to create the plot.\n", 165 | "\n", 166 | "## Split into Training and Testing Sets\n", 167 | "\n", 168 | "If we want to do supervised machine learning, it’s a good idea to split the data into training and testing sets so we can tell whether the model overfits." 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 8, 174 | "metadata": { 175 | "collapsed": true 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "# Create train (80%) and test (20%) splits" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "In Python, recent versions of pandas include a `sample` method that returns a certain proportion of rows randomly sampled from a source dataframe – this makes the code much more concise. We could also use Scikit-Learn's `KFold` and `train_test_split` for different types of shuffles and splits of the data set. In both cases, we set a random seed to make the results reproducible.\n", 187 | "\n", 188 | "## Univariate Linear Regression \n", 189 | "\n", 190 | "Let’s say we want to predict number of assists per player from the turnovers made per player." 
191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 9, 196 | "metadata": { 197 | "collapsed": true 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "# Compute the univariate regression of TO to AST" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "Scikit-learn has a linear regression model that we can fit and generate predictions from. 
Note that it also offers Lasso and Ridge regressions, though they offer little benefit in the univariate case. \n", 209 | "\n", 210 | "## Calculate Summary Statistics for the Model\n", 211 | "\n", 212 | "Evaluate the above model using your test set. How well does it perform?" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 10, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "# Compute the regression results" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "If we want to get summary statistics about the fit, like the r-squared value, we can use the `score` method of the Scikit-Learn model. However, if we want more advanced regression statistics we’ll need to do a bit more. The `statsmodels` package enables many statistical methods to be used in Python and is a good tool to know.\n", 231 | "\n", 232 | "## Fit a Random Forest Model \n", 233 | "\n", 234 | "Our linear regression worked well in the single variable case, but we suspect there may be nonlinearities in the data. Thus, we want to fit a random forest model." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 11, 240 | "metadata": { 241 | "collapsed": false 242 | }, 243 | "outputs": [], 244 | "source": [ 245 | "# Compute random forest from the predictors \"AGE\", \"MPG\", \"TO\", \"HT\", \"WT\", \"REBR\" to the target, \"AST\"" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "## Calculate Error\n", 253 | "\n", 254 | "Now that we’ve fit two models, let’s calculate error. We’ll use MSE."
255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 12, 260 | "metadata": { 261 | "collapsed": false 262 | }, 263 | "outputs": [], 264 | "source": [ 265 | "# Compute the MSE of the regression models" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "The scikit-learn library has a variety of error metrics that we can use.\n", 273 | "\n", 274 | "## Download a Webpage\n", 275 | "\n", 276 | "Now that we have data on NBA players from 2013-2014, let’s scrape some additional data to supplement it. We’ll just look at one box score from the NBA Finals here to save time." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 19, 282 | "metadata": { 283 | "collapsed": true 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "# Download \"http://www.basketball-reference.com/boxscores/201506140GSW.html\"" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "## Extract Player Box Scores\n", 295 | "\n", 296 | "Now that we have the web page, we’ll need to parse it to extract scores for players." 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 13, 302 | "metadata": { 303 | "collapsed": false 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "# Use BeautifulSoup to parse the table from the web page" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "This will create a list containing two lists, the first with the box score for CLE, and the second with the box score for GSW. Both contain the headers, along with each player and their in-game stats. We won’t turn this into more training data now, but it could easily be transformed into a format that could be added to our nba dataframe.\n", 315 | "\n", 316 | "BeautifulSoup is one of the most commonly used web scraping packages. 
It enables us to loop through the tags and construct a list of lists in a straightforward way." 317 | ] 318 | } 319 | ], 320 | "metadata": { 321 | "kernelspec": { 322 | "display_name": "Python 3", 323 | "language": "python", 324 | "name": "python3" 325 | }, 326 | "language_info": { 327 | "codemirror_mode": { 328 | "name": "ipython", 329 | "version": 3 330 | }, 331 | "file_extension": ".py", 332 | "mimetype": "text/x-python", 333 | "name": "python", 334 | "nbconvert_exporter": "python", 335 | "pygments_lexer": "ipython3", 336 | "version": "3.5.1" 337 | } 338 | }, 339 | "nbformat": 4, 340 | "nbformat_minor": 0 341 | } 342 | -------------------------------------------------------------------------------- /nba/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Georgetown Data Analytics (CCPE) 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 23 | -------------------------------------------------------------------------------- /nba/NBA Player Statistics Workshop.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NBA Player Statistics Workshop\n", 8 | "\n", 9 | "Given a dataset of NBA players performance and salary in 2014, use Python to load the dataset and compute the summary statistics for the `SALARY` field:\n", 10 | "\n", 11 | "- mean\n", 12 | "- median\n", 13 | "- mode\n", 14 | "- minimum\n", 15 | "- maximum\n", 16 | "\n", 17 | "You will need to make use of the csv module to load the data and interact with it. Computations should require only simple arithmetic. (For the purposes of this exercise, attempt to use pure Python and no third party dependencies like Pandas - you can then compare and contrast the use of Pandas for this task later). \n", 18 | "\n", 19 | "**Bonus:**\n", 20 | "\n", 21 | "Determine the relationship of PER (Player Efficiency Rating) to Salary via a visualization of the data.\n", 22 | "\n", 23 | "\n", 24 | "NBA 2014 Players Dataset: [http://bit.ly/2n9twqX](http://bit.ly/2n9twqX)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": { 31 | "collapsed": true 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "# Imports - you'll need some of these later, but it's traditional to put them all at the beginning.\n", 36 | "\n", 37 | "import os\n", 38 | "import csv\n", 39 | "import json\n", 40 | "\n", 41 | "from collections import Counter\n", 42 | "from operator import itemgetter\n", 43 | "from requests import get" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## Fetching the Data\n", 51 | "\n", 52 | "You have a couple of options of fetching the data set to begin your analysis:\n", 53 | "\n", 54 | "1. Click on the link above and Download the file. 
\n", 55 | "2. Write a Python function that automatically downloads the data as a comma-separated value file (CSV) and writes it to disk. \n", 56 | "\n", 57 | "In either case, you'll have to be cognizant of where the CSV file lands. Here is a quick implementation of a function to download a URL at a file and write it to disk. Note the many approaches to do this as outlined here: [How do I download a file over HTTP using Python?](http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python). " 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": { 64 | "collapsed": true 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "def download(url, path):\n", 69 | " \"\"\"\n", 70 | " Downloads a URL and writes it to the specified path. The \"path\" \n", 71 | " is like the mailing address for the file - it tells the function \n", 72 | " where on your computer to send it!\n", 73 | " \n", 74 | " Also note the use of \"with\" to automatically close files - this \n", 75 | " is a good standard practice to follow.\n", 76 | " \"\"\"\n", 77 | " with open(path,'wb') as f:\n", 78 | " response = get(url)\n", 79 | " f.write(response.content)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "**Your turn: use the above function to download the data!**" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": { 93 | "collapsed": true 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "## Write the Python to execute the function and download the file here:" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## Loading the Data\n", 105 | "\n", 106 | "Now that we have the CSV file that we're looking for, we need to be able to open the file and read it into memory. The trick is that we want to read only a single line at a time - consider really large CSV files. 
Python provides memory efficient iteration in the form of `generators` and the `csv.reader` module exposes one such generator, that reads the data from the CSV one row at a time. Moreover, we also want to parse our data so that we have specific access to the fields we're looking for. The `csv.DictReader` class will give you each row as a dictionary, where the keys are derived from the first, header line of the file. \n", 107 | "\n", 108 | "Here is a function that reads data from disk one line at a time and `yield`s it to the user. " 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "collapsed": true 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "def read_csv(path):\n", 120 | " # First open the file\n", 121 | " with open(path, 'rt') as f:\n", 122 | " # Create a DictReader to parse the CSV\n", 123 | " reader = csv.DictReader(f)\n", 124 | " for row in reader:\n", 125 | " # HINT: Convert SALARY column values into integers & PER column into floats.\n", 126 | " # Otherwise CSVs can turn ints into strs! You'll thank me later :D\n", 127 | " row['SALARY'] = int(row['SALARY'])\n", 128 | " row['PER'] = float(row['PER'])\n", 129 | " # Now yield each row one at a time.\n", 130 | " yield row" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "**Your turn: use the above function to open the file and print out the first row of the CSV!**\n", 138 | "\n", 139 | "To do this, you'll need to do three things:\n", 140 | "\n", 141 | "First, remember where you told the `download` function to store your file? Pass that same path into `read_csv`:" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": { 148 | "collapsed": true 149 | }, 150 | "outputs": [], 151 | "source": [ 152 | "## Write the Python to execute our read_csv function." 
153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Next step: The `read_csv` function \"returns\" a generator. How can we access just the first row? Remember [how to access the next row of a generator](http://stackoverflow.com/questions/4741243/how-to-pick-just-one-item-from-a-generator-in-python)? " 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": { 166 | "collapsed": true 167 | }, 168 | "outputs": [], 169 | "source": [ 170 | "## Now write the Python to print the first row of the CSV here." 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "Are there different ways to print the first _n_ rows of something? Sure! Try using `break`, which will stop a `for` loop from running. E.g. the code:\n", 178 | "\n", 179 | "```python\n", 180 | "for idx in range(100):\n", 181 | " if idx >= 10:\n", 182 | " break\n", 183 | "```\n", 184 | "\n", 185 | "...will stop the for loop after 10 iterations." 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "Next, write a `for` loop that can access and print every row. " 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": { 199 | "collapsed": true 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "## Write the Python to print *every* row of the CSV here." 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "## Summary Statistics\n", 211 | "\n", 212 | "In this section, you'll use the CSV data to write computations for mean, median, mode, minimum, and maximum. Use Python to access the values in the `SALARY` column."
213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": { 219 | "collapsed": true 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "data = list(read_csv('fixtures/nba_players.csv')) #Put in your own path here.\n", 224 | "data = sorted(data, key=itemgetter('SALARY'))\n", 225 | "\n", 226 | "total = 0\n", 227 | "count = 0\n", 228 | "\n", 229 | "for row in data:\n", 230 | " count += 1\n", 231 | " total += row['SALARY']\n", 232 | "\n", 233 | "# Total Count\n", 234 | "print(\"There are {0:d} total players.\".format(count))\n", 235 | "\n", 236 | "# Write the Python to get the median\n", 237 | "median = \n", 238 | "print(\"The median salary is {0:d}.\".format(median))\n", 239 | "\n", 240 | "# Write the Python to get the minimum\n", 241 | "minimum = \n", 242 | "print(\"The minimum salary is {0:d}.\".format(minimum))\n", 243 | "\n", 244 | "# Write the Python to get the maximum\n", 245 | "maximum = \n", 246 | "print(\"The maximum salary is {0:d}.\".format(maximum))\n", 247 | "\n", 248 | "# Write the Python to get the mean\n", 249 | "mean = \n", 250 | "print(\"The mean salary is {0:d}.\".format(mean))" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "Nice work! Now... calculating the mode is a bit different. Remember about the [Decorate-Sort-Undecorate](http://www.greenteapress.com/thinkpython/html/thinkpython013.html) pattern that we learned about in ThinkPython? That will work here!" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "## Write the Python to get the mode of the salaries." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "The \"DSU\" approach is a little inefficient. 
Instead of using a dictionary as our data type to solve the mode problem, we could use `Counter` from the `collections` module. [Read more about Counter](https://pymotw.com/3/collections/counter.html) and try it out here:" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": { 282 | "collapsed": true 283 | }, 284 | "outputs": [], 285 | "source": [ 286 | "## Experiment with using Counter here." 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "#### Putting the pieces together\n", 294 | "\n", 295 | "The above summary statistics can actually be computed inside of a single (and elegant!) function. Give it a try!" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": { 302 | "collapsed": true 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "def statistics(path):\n", 307 | " \"\"\"\n", 308 | " Takes as input a path to pass to `read_csv` and computes\n", 309 | " the summary statistics for the `SALARY` field.\n", 310 | " \"\"\"\n", 311 | " \n", 312 | " # Uncomment below to load the CSV into a list\n", 313 | " # data = list(read_csv(path))\n", 314 | " \n", 315 | " # Fill in the function here\n", 316 | "\n", 317 | "\n", 318 | " stats = {\n", 319 | " 'maximum': data[-1]['SALARY'],\n", 320 | " 'minimum': data[0]['SALARY'],\n", 321 | " 'median': data[count // 2]['SALARY'], # Any potential problems here?\n", 322 | " 'mode': freqs.most_common(2),\n", 323 | " 'mean': total // count,\n", 324 | " }\n", 325 | "\n", 326 | " return stats" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "collapsed": true 334 | }, 335 | "outputs": [], 336 | "source": [ 337 | "print(statistics('./fixtures/nba_players.csv')) # Put in your own path here" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "Keep playing with the above function to get it to 
work more efficiently or to reduce bad data in the computation - e.g. what are all those zero salaries? \n", 345 | "\n", 346 | "\n", 347 | "## Visualization\n", 348 | "\n", 349 | "\n", 350 | "Congratulations if you've made it this far! It's time for the bonus round!\n", 351 | "\n", 352 | "You've now had some summary statistics about the salaries of NBA players, but what we're really interested in is the relationship between `SALARY` and the rest of the fields in the data set. The `PER` - Player Efficiency Rating, is an aggregate score of all performance statistics; therefore if we determine the relationship of `PER` to `SALARY`, we might learn a lot about how to model NBA salaries. \n", 353 | "\n", 354 | "In order to explore this, let's create a scatter plot of `SALARY` to `PER`, where each point is an NBA player.\n", 355 | "\n", 356 | "Visualization is going to require a third party library. You probably already have matplotlib, so that might be the simplest if you're having trouble with installation. If you don't, `pip install` it now! Follow the documentation to create the scatter plot inline in the notebook in the following cells." 
357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": { 363 | "collapsed": true 364 | }, 365 | "outputs": [], 366 | "source": [ 367 | "# Insert your Python to create the visualization here\n", 368 | "import os\n", 369 | "import pandas as pd\n", 370 | "import numpy as np\n", 371 | "# `%matplotlib inline` makes the plot appear inline in your IPython Notebook.\n", 372 | "import matplotlib.pyplot as plt\n", 373 | "%matplotlib inline\n", 374 | "\n", 375 | "def read_data(path):\n", 376 | " # Pandas is an efficient way to wrangle the data quickly\n", 377 | " return pd.read_csv(path)\n", 378 | "\n", 379 | "def graph_data(path, xkey='PER', ykey='SALARY'):\n", 380 | " data = read_data(path)\n", 381 | " ## Fill this in yourself!\n", 382 | " plt.show()\n", 383 | "\n", 384 | "graph_data('fixtures/nba_players.csv') # Or whatever your path is" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "Nice work!! Matplotlib is pretty useful, but also kind of bare bones. Once you're ready to experiment with other libraries and take your visualizations to the next level, check out the following:\n", 392 | "\n", 393 | "- Seaborn\n", 394 | "- Bokeh\n", 395 | "- Pandas\n", 396 | "\n", 397 | "Our favorite is Bokeh - it's interactive!"
398 | ] 399 | } 400 | ], 401 | "metadata": { 402 | "kernelspec": { 403 | "display_name": "Python [default]", 404 | "language": "python", 405 | "name": "python3" 406 | }, 407 | "language_info": { 408 | "codemirror_mode": { 409 | "name": "ipython", 410 | "version": 3 411 | }, 412 | "file_extension": ".py", 413 | "mimetype": "text/x-python", 414 | "name": "python", 415 | "nbconvert_exporter": "python", 416 | "pygments_lexer": "ipython3", 417 | "version": "3.5.3" 418 | } 419 | }, 420 | "nbformat": 4, 421 | "nbformat_minor": 1 422 | } 423 | -------------------------------------------------------------------------------- /nba/README.md: -------------------------------------------------------------------------------- 1 | # Data Loading Python Workshop 2 | 3 | ## Software Engineering Students: 4 | 5 | $ git clone https://github.com/georgetown-analytics/nba.git 6 | $ cd nba 7 | $ jupyter notebook 8 | 9 | In Jupyter, open the file called "NBA Player Statistics Workshop.ipynb" 10 | 11 | Given a dataset of NBA players performance and salary in 2014, you'll use Python to load the dataset and compute the summary statistics for the `SALARY` field: 12 | 13 | - mean 14 | - median 15 | - mode 16 | - minimum 17 | - maximum 18 | 19 | You will need to make use of the csv module or use pandas to load the data and interact with it. Computations should require only simple arithmetic. 20 | 21 | **Bonus:** 22 | 23 | Determine the relationship of PER (Player Efficiency Rating) to Salary via a visualization of the data. 
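As a rough sketch of the summary statistics listed above (not the workshop's reference solution), the snippet below uses only the standard library. The `summarize` helper and the four sample rows are made up for illustration; only the `SALARY` column name comes from the dataset.

```python
import csv
import io
from collections import Counter


def summarize(rows, field="SALARY"):
    """Return basic summary statistics for an integer column."""
    values = sorted(int(row[field]) for row in rows)
    count = len(values)
    middle = count // 2
    # For an even number of values, average the two middle ones.
    if count % 2:
        median = values[middle]
    else:
        median = (values[middle - 1] + values[middle]) / 2
    return {
        "minimum": values[0],
        "maximum": values[-1],
        "mean": sum(values) / count,
        "median": median,
        # most_common(1) returns a list with one (value, frequency) pair.
        "mode": Counter(values).most_common(1)[0][0],
    }


# Hypothetical sample rows standing in for the real CSV file.
sample = io.StringIO("PLAYER,SALARY\nA,100\nB,300\nC,100\nD,500\n")
stats = summarize(csv.DictReader(sample))
print(stats)
```

With the real file, you would pass `csv.DictReader(open(path))` (ideally inside a `with` block) instead of the in-memory sample.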
24 | 25 | 26 | NBA 2014 Players Dataset: [http://bit.ly/2n9twqX](http://bit.ly/2n9twqX) 27 | 28 | ## Certificate Completion Challenge: 29 | 30 | If you've completed the certificate program and want to test your data science skills from ingestion through machine learning, follow the instructions in the file called "Data Analysis of NBA Players Challenge.ipynb" 31 | -------------------------------------------------------------------------------- /nba/basketball.py: -------------------------------------------------------------------------------- 1 | # basketball.py 2 | 3 | # NBA Player Statistics Workshop for the 4 | # Georgetown University Data Science Certificate Program 5 | # 09/19/2015 6 | # by Ben Benfort and Rebecca Bilbro 7 | 8 | 9 | 10 | """ 11 | Welcome! 12 | 13 | The NBA Player Statistics Workshop: 14 | 15 | Given a dataset of NBA players performance and salary in 2014, use Python to 16 | load the dataset and compute the summary statistics for the SALARY field: 17 | - mean 18 | - median 19 | - mode 20 | - minimum 21 | - maximum 22 | 23 | You will need to make use of the csv module to load the data and interact with 24 | it. Computations should require only simple arithmetic. (For the purposes of 25 | this exercise, attempt to use pure Python and no third party dependencies like 26 | Pandas - you can then compare and contrast the use of Pandas for this task 27 | later). 28 | 29 | Bonus: 30 | Determine the relationship of PER (Player Efficiency Rating) to Salary via a 31 | visualization of the data. 32 | 33 | 34 | URL for NBA 2014 Players Dataset => http://bit.ly/gtnbads 35 | """ 36 | 37 | ##################################################### 38 | # Imports 39 | ##################################################### 40 | """ 41 | You'll need some of these later, but it's traditional to put them 42 | all at the beginning. 
43 | """ 44 | import os 45 | import csv 46 | import json 47 | 48 | from collections import Counter 49 | from operator import itemgetter 50 | from urllib.request import urlopen 51 | 52 | ##################################################### 53 | # Fetching the Data 54 | ##################################################### 55 | """ 56 | You have a couple of options for fetching the data set to begin your analysis: 57 | 1. Cut and paste the link above and download the file. 58 | 2. Write a Python function that automatically downloads the data as a 59 | comma-separated values (CSV) file and writes it to disk. 60 | 61 | In either case, you'll have to be cognizant of where the CSV file lands. 62 | Here is a quick implementation of a function to download the file at a URL and 63 | write it to disk. 64 | 65 | Note the many approaches to do this as outlined here: How do I download a file 66 | over HTTP using Python?: 67 | http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python 68 | """ 69 | 70 | 71 | def download(url, path): 72 | """ 73 | Downloads a URL and writes it to the specified path. The "path" 74 | is like the mailing address for the file - it tells the function 75 | where on your computer to send it! 76 | 77 | Also note the use of "with" to automatically close files - this 78 | is a good standard practice to follow. 79 | """ 80 | response = urlopen(url) 81 | with open(path, 'wb') as f: 82 | f.write(response.read()) 83 | 84 | response.close() 85 | 86 | """ 87 | Your turn: use the above function to download the data! 88 | """ 89 | ## Write the Python to execute the function and download the file here: 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | ##################################################### 98 | # Loading the Data 99 | ##################################################### 100 | """ 101 | Now that we have the CSV file that we're looking for, we need to be able to 102 | open the file and read it into memory. 
The trick is that we want to read only a 103 | single line at a time - consider really large CSV files. Python provides memory 104 | efficient iteration in the form of generators and the csv.reader module exposes 105 | one such generator, that reads the data from the CSV one row at a time. Moreover, 106 | we also want to parse our data so that we have specific access to the fields 107 | we're looking for. The csv.DictReader class will give you each row as a 108 | dictionary, where the keys are derived from the first, header line of the file. 109 | Here is a function that reads data from disk one line at a time and yields it to 110 | the user. 111 | """ 112 | def read_csv(path): 113 | # First open the file 114 | with open(path, 'rt') as f: 115 | # Create a DictReader to parse the CSV 116 | reader = csv.DictReader(f) 117 | for row in reader: 118 | # HINT: Convert SALARY column values into integers & PER column into 119 | # floats. Otherwise CSVs can turn ints into strs! 120 | # You'll thank me later :D 121 | row['SALARY'] = int(row['SALARY']) 122 | row['PER'] = float(row['PER']) 123 | # Now yield each row one at a time. 124 | yield row 125 | """ 126 | Your turn: use the above function to open the file and print out the first row 127 | of the CSV! To do this, you'll need to do three things: 128 | 129 | First, remember where you told the download function to store your file? 130 | Pass that same path into read_csv: 131 | """ 132 | ## Write the Python to execute our read_csv function. 133 | 134 | 135 | 136 | 137 | """ 138 | Next step: The read_csv function "returns" a generator. How can we access just 139 | the first row? Remember how to access the next row of a generator? 140 | HINT: http://stackoverflow.com/questions/4741243/how-to-pick-just-one-item-from-a-generator-in-python 141 | """ 142 | ## Now write the Python to print the first row of the CSV here. 143 | 144 | 145 | 146 | 147 | """ 148 | Are there different ways to print the first n rows of something? Sure! 
Try using break, which will stop a for loop from running. E.g. the code: 149 | for idx in range(100): 150 | if idx >= 10: 151 | break 152 | ...will stop the for loop after 10 iterations. 153 | Next, write a for loop that can access and print every row. 154 | """ 155 | ## Write the Python to print *every* row of the CSV here. 156 | 157 | 158 | 159 | 160 | ##################################################### 161 | # Summary Statistics 162 | ##################################################### 163 | """ 164 | In this section, you'll use the CSV data to write computations for mean, median, 165 | mode, minimum, and maximum. Use Python to access the values in the SALARY column. 166 | """ 167 | data = list(read_csv('fixtures/nba_players.csv')) # Put in your own path here. 168 | data = sorted(data, key=itemgetter('SALARY')) 169 | 170 | total = 0 171 | count = 0 172 | 173 | for row in data: 174 | count += 1 175 | total += row['SALARY'] 176 | 177 | 178 | # Total Count 179 | print("There are {0:d} total players.".format(count)) 180 | 181 | 182 | # Write the Python to get the median 183 | median = 184 | print("The median salary is {0:d}.".format(median)) 185 | 186 | 187 | # Write the Python to get the minimum 188 | minimum = 189 | print("The minimum salary is {0:d}.".format(minimum)) 190 | 191 | 192 | # Write the Python to get the maximum 193 | maximum = 194 | print("The maximum salary is {0:d}.".format(maximum)) 195 | 196 | 197 | # Write the Python to get the mean 198 | mean = 199 | print("The mean salary is {0:d}.".format(mean)) 200 | 201 | 202 | 203 | """ 204 | Nice work! Now... calculating the mode is a bit different. Remember the 205 | Decorate-Sort-Undecorate pattern that we learned about in ThinkPython? That 206 | will work here! 207 | 208 | Reminder: http://www.greenteapress.com/thinkpython/html/thinkpython013.html 209 | """ 210 | ## Write the Python to get the mode of the salaries. 
211 | 212 | 213 | 214 | 215 | """ 216 | The "DSU" approach is a little inefficient. Instead of using a dictionary as 217 | our data type to solve the mode problem, we could use Counter from the 218 | collections module. Read more about Counter and try it out here: 219 | https://pymotw.com/3/collections/counter.html 220 | """ 221 | ## Experiment with using Counter here. 222 | 223 | 224 | 225 | 226 | 227 | 228 | ##################################################### 229 | # Putting the pieces together 230 | ##################################################### 231 | """ 232 | The above summary statistics can actually be computed inside of a single 233 | (and elegant!) function. Give it a try! 234 | """ 235 | 236 | def statistics(path): 237 | """ 238 | Takes as input a path to pass to `read_csv` and computes 239 | the summary statistics for the `SALARY` field. 240 | """ 241 | 242 | # Uncomment below to load the CSV into a list 243 | # data = list(read_csv(path)) 244 | 245 | # Fill in the function here 246 | 247 | 248 | 249 | 250 | stats = { 251 | 'maximum': data[-1]['SALARY'], 252 | 'minimum': data[0]['SALARY'], 253 | 'median': data[count // 2]['SALARY'], # Any potential problems here? 254 | 'mode': freqs.most_common(2), 255 | 'mean': total // count, 256 | } 257 | 258 | return stats 259 | { 260 | "minimum": null, 261 | "median": null, 262 | "mode": null, 263 | "maximum": null, 264 | "mean": null 265 | } 266 | 267 | 268 | """ 269 | Keep playing with the above function to get it to work more efficiently or to 270 | reduce bad data in the computation - e.g. what are all those zero salaries? 271 | """ 272 | 273 | 274 | 275 | 276 | ##################################################### 277 | # Visualization 278 | ##################################################### 279 | """Congratulations if you've made it this far! It's time for the bonus round! 
280 | You've now had some summary statistics about the salaries of NBA players, 281 | but what we're really interested in is the relationship between SALARY and 282 | the rest of the fields in the data set. The PER - Player Efficiency Rating, 283 | is an aggregate score of all performance statistics; therefore if we determine 284 | the relationship of PER to SALARY, we might learn a lot about how to model NBA 285 | salaries. 286 | 287 | In order to explore this, let's create a scatter plot of SALARY to PER, where 288 | each point is an NBA player. Visualization is going to require a third party 289 | library. You probably already have matplotlib, so that might be the simplest 290 | if you're having trouble with installation. If you don't, pip install it now! 291 | Follow the documentation to create the scatter plot inline in the notebook in 292 | the following cells. 293 | """ 294 | # Fill in the blanks below to create the visualization here 295 | import os 296 | import pandas as pd 297 | import numpy as np 298 | 299 | import matplotlib.pyplot as plt 300 | 301 | def read_data(path): 302 | # Pandas is an efficient way to wrangle the data quickly 303 | return pd.DataFrame(pd.read_csv(path)) 304 | 305 | def graph_data(path, xkey='PER', ykey='SALARY'): 306 | data = read_data(path) 307 | ## Fill this in yourself! 308 | plt.show() 309 | 310 | graph_data('fixtures/nba_players.csv') # Or whatever your path is 311 | 312 | 313 | 314 | 315 | 316 | 317 | """ 318 | Nice work!! Matplotlib is pretty useful, but also kind of bare bones. Once you're ready to experiment with other libraries and take your visualizations to the next level, check out the following: 319 | -Seaborn 320 | -Bokeh 321 | -Pandas 322 | 323 | Our favorite is Bokeh - it's interactive! 
324 | """ 325 | -------------------------------------------------------------------------------- /nba/requirements.txt: -------------------------------------------------------------------------------- 1 | # requirements.txt 2 | 3 | 4 | # For ingestion, wrangling and computation (standard library, nothing to install) 5 | # os 6 | # csv 7 | # json 8 | # urllib.request.urlopen 9 | # collections.Counter 10 | # operator.itemgetter 11 | 12 | 13 | # For the visualization part 14 | pandas==0.18.1 15 | numpy==1.22.0 16 | matplotlib==1.5.3 17 | 18 | 19 | # If you want to do the lab in IPython 20 | ipython==7.16.3 21 | -------------------------------------------------------------------------------- /testing_workshop/LICENSE.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/testing_workshop/LICENSE.txt -------------------------------------------------------------------------------- /testing_workshop/README.md: -------------------------------------------------------------------------------- 1 | Overview 2 | -------- 3 | 4 | This repository contains the automated testing workshop for the Georgetown University Data Science Certificate Program, XBUS-501 Software Engineering for Data class. 5 | 6 | In order to complete the workshop, clone this repository and then fill in the missing tests. If time allows, feel free to continue adding tests or add more functionality to the buildings or vehicles modules for testing. 7 | 8 | The documentation for pytest can be found at [https://docs.pytest.org/en/stable/reference.html](https://docs.pytest.org/en/stable/reference.html). 9 | 10 | 11 | Setup 12 | ----- 13 | 14 | Install the project dependencies using `pip` and the supplied `requirements.txt` file. 
15 | 16 | pip install -r requirements.txt 17 | 18 | Tests 19 | ----- 20 | 21 | To execute the tests using the `pytest` package test runner, use the following command: 22 | 23 | pytest tests/ 24 | -------------------------------------------------------------------------------- /testing_workshop/docs/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/testing_workshop/docs/.gitkeep -------------------------------------------------------------------------------- /testing_workshop/fixtures/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/testing_workshop/fixtures/.gitkeep -------------------------------------------------------------------------------- /testing_workshop/motorsports/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/georgetown-analytics/XBUS-501-01.Software-Engineering-for-Data/5905248de94b92fe9a2667ba2c05271b58f2918e/testing_workshop/motorsports/__init__.py -------------------------------------------------------------------------------- /testing_workshop/motorsports/buildings.py: -------------------------------------------------------------------------------- 1 | # motorsports.buildings 2 | # description 3 | # 4 | # Sample Usage: 5 | # g = Garage("Al's Garage") 6 | # c = Car('silver', 'Porsche', 'Boxster') 7 | # g.enter(c) 8 | # g.exit(c) 9 | # 10 | # Author: Allen Leis 11 | # Created: Fri Sep 11 23:22:32 2015 -0400 12 | # 13 | # Copyright (C) 2015 georgetown.edu 14 | # For license information, see LICENSE.txt 15 | # 16 | # ID: buildings.py [] allen.leis@georgetown.edu $ 17 | 18 | """ 19 | A module to supply building related 
classes 20 | """ 21 | 22 | ########################################################################## 23 | ## Imports 24 | ########################################################################## 25 | 26 | from motorsports.vehicles import BaseVehicle, Car 27 | 28 | ########################################################################## 29 | ## Classes 30 | ########################################################################## 31 | 32 | class BaseBuilding(object): 33 | 34 | def __init__(self, name): 35 | self._name = name 36 | 37 | @property 38 | def name(self): 39 | return self._name 40 | 41 | 42 | class Garage(BaseBuilding): 43 | 44 | def __init__(self, *args, **kwargs): 45 | self._vehicles = [] 46 | super(Garage, self).__init__(*args, **kwargs) 47 | 48 | def enter(self, vehicle): 49 | """ 50 | Adds a new vehicle to the garage 51 | """ 52 | if isinstance(vehicle, BaseVehicle): 53 | self._vehicles.append(vehicle) 54 | print('The {} has been parked in {}.'.format(vehicle.description, self.name)) 55 | else: 56 | raise TypeError('Only vehicles are allowed in garages') 57 | 58 | def exit(self, vehicle): 59 | """ 60 | Removes a vehicle from the garage 61 | """ 62 | if isinstance(vehicle, BaseVehicle): 63 | if vehicle not in self: 64 | raise LookupError('That vehicle is not in {}.'.format(self.name)) 65 | self._vehicles.remove(vehicle) 66 | print('The {} has left {}.'.format(vehicle.description, self.name)) 67 | 68 | else: 69 | raise TypeError('Only vehicles are allowed in garages.') 70 | 71 | def __len__(self): 72 | """ 73 | Python builtin to support len function 74 | """ 75 | return len(self._vehicles) 76 | 77 | def __iter__(self): 78 | """ 79 | Python builtin to support iteration 80 | """ 81 | for v in self._vehicles: 82 | yield v 83 | -------------------------------------------------------------------------------- /testing_workshop/motorsports/vehicles.py: -------------------------------------------------------------------------------- 1 | # 
motorsports.vehicles 2 | # description 3 | # 4 | # Author: Allen Leis 5 | # Created: Fri Sep 11 23:22:32 2015 -0400 6 | # 7 | # Copyright (C) 2015 georgetown.edu 8 | # For license information, see LICENSE.txt 9 | # 10 | # ID: vehicles.py [] allen.leis@georgetown.edu $ 11 | 12 | """ 13 | A module to supply vehicle related classes 14 | """ 15 | 16 | ########################################################################## 17 | ## Imports 18 | ########################################################################## 19 | 20 | 21 | ########################################################################## 22 | ## Classes 23 | ########################################################################## 24 | 25 | class BaseVehicle(object): 26 | 27 | def __init__(self): 28 | self._state = None 29 | 30 | @property 31 | def state(self): 32 | """ 33 | Return a string describing the state of the vehicle 34 | """ 35 | return self._state or 'stopped' 36 | 37 | @property 38 | def description(self): 39 | """ 40 | Return a string describing this object 41 | """ 42 | return self.__class__.__name__ 43 | 44 | def start(self): 45 | """ 46 | Starts the vehicle 47 | """ 48 | self._state = 'started' 49 | 50 | def shutdown(self): 51 | """ 52 | Stops the vehicle 53 | """ 54 | self._state = 'stopped' 55 | 56 | def __str__(self): 57 | return "I am a {}.".format(self.description) 58 | 59 | 60 | class Car(BaseVehicle): 61 | 62 | def __init__(self, color, make, model): 63 | self.color = color 64 | self.make = make 65 | self.model = model 66 | super(Car, self).__init__() 67 | 68 | @property 69 | def description(self): 70 | """ 71 | Return a string describing this object 72 | """ 73 | return '{} {} {}'.format(self.color, self.make, self.model) 74 | 75 | def start(self): 76 | """ 77 | Starts the vehicle 78 | """ 79 | super(Car, self).start() 80 | print('vroom') 81 | 82 | 83 | ########################################################################## 84 | ## Execution 85 | 
########################################################################## 86 | 87 | if __name__ == '__main__': 88 | c = Car('white', 'Ford', 'Bronco') 89 | print(c.state) 90 | c.start() 91 | print(c.state) 92 | -------------------------------------------------------------------------------- /testing_workshop/requirements.txt: -------------------------------------------------------------------------------- 1 | pytest==5.3.1 2 | pytest-cov==2.8.1 3 | -------------------------------------------------------------------------------- /testing_workshop/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | raise NotImplementedError("Setup not implemented yet.") 3 | -------------------------------------------------------------------------------- /testing_workshop/tests/__init__.py: -------------------------------------------------------------------------------- 1 | # tests 2 | # Testing package for the motorsports library 3 | # 4 | # Author: Allen Leis 5 | # Created: Fri Sep 11 23:22:32 2015 -0400 6 | # Adapted by: Rebecca Bilbro 7 | # Updated on: Thu Mar 5 14:20:08 EST 2020 8 | # 9 | # Copyright (C) 2015 georgetown.edu 10 | # For license information, see LICENSE.txt 11 | # 12 | # ID: __init__.py [] allen.leis@georgetown.edu $ 13 | 14 | """ 15 | Testing package for the motorsports library. 
16 | """ 17 | -------------------------------------------------------------------------------- /testing_workshop/tests/test_buildings.py: -------------------------------------------------------------------------------- 1 | # tests.test_buildings 2 | # test module to evaluate the classes in the buildings module 3 | # 4 | # to execute tests, run the following command from project root: 5 | # pytest tests 6 | # 7 | # for a list of available asserts: 8 | # https://docs.pytest.org/en/latest/assert.html 9 | # 10 | # Author: Allen Leis 11 | # Created: Fri Sep 11 23:22:32 2015 -0400 12 | # Adapted by: Rebecca Bilbro 13 | # Updated on: Thu Mar 5 14:20:08 EST 2020 14 | # 15 | # Copyright (C) 2015 georgetown.edu 16 | # For license information, see LICENSE.txt 17 | # 18 | # ID: test_buildings.py [] allen.leis@georgetown.edu $ 19 | 20 | """ 21 | Test cases for buildings module 22 | """ 23 | 24 | ########################################################################## 25 | ## Imports 26 | ########################################################################## 27 | 28 | import pytest 29 | 30 | from motorsports.buildings import Car 31 | from motorsports.buildings import Garage 32 | 33 | ########################################################################## 34 | ## Tests 35 | ########################################################################## 36 | 37 | 38 | class TestGarage(): 39 | 40 | def test_has_name(self): 41 | """ 42 | Ensure the garage returns the name provided at creation 43 | """ 44 | name = 'Bob\'s Garage' 45 | g = Garage(name) 46 | assert name == g.name 47 | 48 | @pytest.mark.skip(reason="pending test code") 49 | def test_allows_cars_to_enter(self): 50 | """ 51 | Ensure the garage allows Car object to enter 52 | """ 53 | pass 54 | 55 | @pytest.mark.skip(reason="pending test code") 56 | def test_only_allows_cars_to_enter(self): 57 | """ 58 | Ensure the garage raises TypeError if non vehicle attempts to enter 59 | """ 60 | pass 61 | 62 | 
@pytest.mark.skip(reason="pending test code") 63 | def test_only_allows_cars_to_exit(self): 64 | """ 65 | Ensure the garage raises TypeError if non vehicle attempts to exit 66 | """ 67 | pass 68 | 69 | @pytest.mark.skip(reason="pending test code") 70 | def test_allows_cars_to_exit(self): 71 | """ 72 | Ensure vehicles can leave the garage 73 | """ 74 | pass 75 | 76 | @pytest.mark.skip(reason="pending test code") 77 | def test_raise_lookup_error_on_exit(self): 78 | """ 79 | Ensure that garage raises LookupError if vehicle attempts 80 | to exit but was never in garage. 81 | """ 82 | pass 83 | 84 | @pytest.mark.skip(reason="pending test code") 85 | def test_iter_builtin(self): 86 | """ 87 | Ensure we can iterate over garage vehicles by trying to 88 | iterate over the garage itself 89 | """ 90 | pass 91 | 92 | @pytest.mark.skip(reason="pending test code") 93 | def test_len_builtin(self): 94 | """ 95 | Ensure that the length of the garage matches the number 96 | of vehicles parked in it 97 | """ 98 | pass 99 | -------------------------------------------------------------------------------- /testing_workshop/tests/test_imports.py: -------------------------------------------------------------------------------- 1 | # tests.test_imports.py 2 | # Test the imports for the motorsports library 3 | # 4 | # Author: Allen Leis 5 | # Created: Fri Sep 11 23:22:32 2015 -0400 6 | # Adapted by: Rebecca Bilbro 7 | # Updated on: Thu Mar 5 14:20:08 EST 2020 8 | # 9 | # Copyright (C) 2015 georgetown.edu 10 | # For license information, see LICENSE.txt 11 | # 12 | # ID: test_imports.py [] rsb89@georgetown.edu $ 13 | 14 | """ 15 | Test imports and initialization for the motorsports library. 
16 | """ 17 | 18 | ########################################################################## 19 | ## Imports 20 | ########################################################################## 21 | 22 | import pytest 23 | 24 | ########################################################################## 25 | ## Initialization Tests 26 | ########################################################################## 27 | 28 | 29 | class TestImports(): 30 | 31 | def test_import_motorsports(self): 32 | """ 33 | Ensure the test suite can import the motorsports module 34 | """ 35 | try: 36 | import motorsports 37 | except ImportError: 38 | pytest.fail("Was not able to import the motorsports package") 39 | 40 | def test_import_buildings(self): 41 | """ 42 | Ensure the test suite can import the buildings module 43 | """ 44 | try: 45 | import motorsports.buildings 46 | except ImportError: 47 | pytest.fail("Was not able to import the motorsports.buildings module") 48 | 49 | @pytest.mark.skip(reason="pending test code") 50 | def test_import_vehicles(self): 51 | """ 52 | Ensure the test suite can import the vehicles module 53 | """ 54 | pass 55 | -------------------------------------------------------------------------------- /testing_workshop/tests/test_vehicles.py: -------------------------------------------------------------------------------- 1 | # tests.test_vehicles 2 | # test module to evaluate the classes in the vehicles module 3 | # 4 | # for a list of available asserts: 5 | # https://docs.pytest.org/en/latest/assert.html 6 | # 7 | # to execute tests, run the following command from project root: 8 | # pytest tests 9 | # 10 | # Author: Allen Leis 11 | # Created: Fri Sep 11 23:22:32 2015 -0400 12 | # Adapted by: Rebecca Bilbro 13 | # Updated on: Thu Mar 5 14:20:08 EST 2020 14 | # 15 | # Copyright (C) 2015 georgetown.edu 16 | # For license information, see LICENSE.txt 17 | # 18 | # ID: test_vehicles.py [] allen.leis@georgetown.edu $ 19 | 20 | """ 21 | Test cases for vehicles module 22 | """ 23 | 24 | 
########################################################################## 25 | ## Imports 26 | ########################################################################## 27 | 28 | import pytest 29 | 30 | from motorsports.vehicles import Car, BaseVehicle 31 | 32 | ########################################################################## 33 | ## Tests 34 | ########################################################################## 35 | 36 | 37 | class TestVehicle(): 38 | 39 | @pytest.mark.skip(reason="pending test code") 40 | def test_description(self): 41 | """ 42 | Ensure the car description returns a string of: "color make model" 43 | """ 44 | pass 45 | 46 | @pytest.mark.skip(reason="pending test code") 47 | def test_initial_state_is_stopped(self): 48 | """ 49 | Ensure a car's initial state is "stopped" 50 | """ 51 | pass 52 | 53 | @pytest.mark.skip(reason="pending test code") 54 | def test_state_after_start(self): 55 | """ 56 | Ensure the car's state is "started" after using start method 57 | """ 58 | pass 59 | 60 | @pytest.mark.skip(reason="pending test code") 61 | def test_state_after_stop(self): 62 | """ 63 | Ensure the car's state is "stopped" after using shutdown method 64 | """ 65 | pass 66 | 67 | @pytest.mark.skip(reason="pending test code") 68 | def test_str_builtin(self): 69 | """ 70 | Ensure the car evaluates to a string of 71 | "I am a <color> <make> <model>." 
72 | """ 73 | pass 74 | 75 | @pytest.mark.skip(reason="pending test code") 76 | def test_color_requirement(self): 77 | """ 78 | Ensure the car requires a color argument during instantiation 79 | """ 80 | pass 81 | 82 | @pytest.mark.skip(reason="pending test code") 83 | def test_make_requirement(self): 84 | """ 85 | Ensure the car requires a make argument during instantiation 86 | """ 87 | pass 88 | 89 | @pytest.mark.skip(reason="pending test code") 90 | def test_model_requirement(self): 91 | """ 92 | Ensure the car requires a model argument during instantiation 93 | """ 94 | pass 95 | 96 | @pytest.mark.skip(reason="pending test code") 97 | def test_state_read_only(self): 98 | """ 99 | Ensure the car state attribute is read only and throws 100 | AttributeError if someone tries to assign a value directly 101 | """ 102 | pass 103 | 104 | @pytest.mark.skip(reason="pending test code") 105 | def test_car_is_a_vehicle(self): 106 | """ 107 | Ensure a car object is also an instance of BaseVehicle 108 | """ 109 | pass 110 | -------------------------------------------------------------------------------- /xbus-501-01.software-engineering-for-data.md: -------------------------------------------------------------------------------- 1 | # XBUS-501-01.Software-Engineering-for-Data 2 | 3 | ## Course Details 4 | Data scientists work in teams and it's important for each team member to understand software engineering processes and practices. From requirements gathering to agile development to testing and deployment, the ability to go beyond writing macros and simple scripts is key to both more sophisticated analyses and building reproducible and scalable data investigations and data products. 
This course will cover fundamental aspects of computer science, good practices in software engineering, and practical aspects of deploying code in production environments. To do this, we will use Python, a simple yet elegant general-purpose programming language that is well suited for data analysis and visualization. 5 | 6 | ## Course Objectives 7 | Upon successful completion of the course, students will: 8 | 9 | * Understand software architecture and design 10 | * Examine agile and hypothesis-driven software development processes 11 | * Identify team roles and workflows in software engineering 12 | * Conduct requirements gathering 13 | * Use Git and GitHub for version control and collaboration 14 | * Recognize the importance of testing and building test suites 15 | * Understand the legal aspects of software development 16 | * Apply software engineering practices to your data science project 17 | 18 | ## Notes 19 | Enrollment in this course is restricted. Students must submit an application and be accepted into the [Certificate in Data Science](http://scs.georgetown.edu/programs_nc/CE0124/data-analytics) in order to register for this course. 20 | 21 | Current Georgetown students must create an application using their Georgetown NetID and password. New students will be prompted to create an account. 22 | 23 | ## Course Prerequisites 24 | Course prerequisites include: 25 | 26 | * A bachelor's degree or equivalent 27 | * Completion of at least two college-level math courses (e.g. statistics, calculus) 
28 | * Successful completion of Foundations of Data Analytics and Data Science (XBUS-500) 29 | * Basic familiarity with programming or a programming language 30 | * A laptop for class meetings and coursework 31 | 32 | Students with little or no programming experience are strongly encouraged to complete [Python Basics for Data Analysis](http://scs.georgetown.edu/courses/1415/python-basics-for-data-analysis) before enrolling in this course. 33 | 34 | 35 | ## Applies Towards the Following Certificates 36 | [Data Science](http://scs.georgetown.edu/programs/11193156&) 37 | --------------------------------------------------------------------------------