├── README.md
└── college_basketball_finalfour.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Project: College Basketball Final Four Prediction
2 |
3 | ## Project Goal
4 | Analyze performance metrics for college basketball teams and predict which teams are likely to reach the NCAA Final Four.
5 |
6 | ## Dataset Information
7 | The cbb.csv dataset contains performance data for 354 Division I college basketball teams across five seasons, including offensive and defensive efficiency, shooting and rebounding rates, tournament seed, and postseason result.
8 |
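A minimal sketch of loading and inspecting the data with pandas; the local `cbb.csv` path here is a placeholder (the notebook itself reads the file from an object-storage URL):

```python
import pandas as pd

# Placeholder path -- point this at your copy of cbb.csv
df = pd.read_csv("cbb.csv")

print(df.shape)                          # (1406, 24) in the notebook's copy
print(df["POSTSEASON"].value_counts())   # how far each team advanced, where recorded
df.head()
```
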
9 | ## Tech Stack
10 | - Python
11 | - Jupyter Notebook
12 | - NumPy
13 | - Pandas
14 | - Matplotlib
15 | - Scikit-learn
16 | - IBM Watson Studio
17 |
18 | ## Featured ML Algorithms
19 | - K-Nearest Neighbors (KNN)
20 | - Decision Tree
21 | - Support Vector Machines (SVM)
22 | - Logistic Regression
23 |
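A hedged sketch of how these four scikit-learn classifiers can be trained and compared on a validation split. It is not the notebook's exact pipeline: it uses only a numeric feature subset (the notebook also appends one-hot encoded columns), reuses the hyperparameters chosen in the notebook (k=3, max_depth=2, linear kernel, C=0.01), and the local `cbb.csv` path is a placeholder.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Keep only teams whose season ended in the Sweet Sixteen, Elite Eight, or Final Four
df = pd.read_csv("cbb.csv")
df = df[df["POSTSEASON"].isin(["S16", "E8", "F4"])]

# A simple numeric feature subset, standardized as in the notebook
features = ["ADJOE", "ADJDE", "BARTHAG", "EFG_O", "EFG_D"]
X = StandardScaler().fit_transform(df[features])
y = df["POSTSEASON"].values

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=4)

models = {
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(max_depth=2),
    "SVM (linear)": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(C=0.01, solver="liblinear"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: validation accuracy = {acc:.3f}")
```
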
24 | ## Things to Note
25 | This Jupyter Notebook is also hosted on [IBM Watson Studio](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/967af07f-20f4-441d-9b77-1c71ad1fb54a/view?access_token=19ad526b688b2b7d02e93fa615a35124fc97a6fcb28ccf8385244b451868409e)
26 |
27 |
--------------------------------------------------------------------------------
/college_basketball_finalfour.ipynb:
--------------------------------------------------------------------------------
1 | {"cells": [{"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "
College Basketball Final Four Prediction "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "import itertools\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import NullFormatter\nimport pandas as pd\nimport numpy as np\nimport matplotlib.ticker as ticker\nfrom sklearn import preprocessing\n%matplotlib inline", "execution_count": 1, "outputs": []}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### About dataset"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "This dataset is about the performance of basketball teams. The __cbb.csv__ data set includes performance data about five seasons of 354 basketball teams. It includes following fields:\n\n| Field | Description |\n|----------------|---------------------------------------------------------------------------------------|\n|TEAM |\tThe Division I college basketball school|\n|CONF|\tThe Athletic Conference in which the school participates in (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = South Eastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)|\n|G|\tNumber of games played|\n|W|\tNumber of games won|\n|ADJOE|\tAdjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)|\n|ADJDE|\tAdjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)|\n|BARTHAG|\tPower Rating (Chance of beating an average Division I team)|\n|EFG_O|\tEffective Field Goal Percentage Shot|\n|EFG_D|\tEffective Field Goal Percentage Allowed|\n|TOR|\tTurnover Percentage Allowed (Turnover Rate)|\n|TORD|\tTurnover Percentage Committed (Steal Rate)|\n|ORB|\tOffensive Rebound Percentage|\n|DRB|\tDefensive Rebound Percentage|\n|FTR|\tFree Throw Rate (How often the given team shoots Free Throws)|\n|FTRD|\tFree Throw Rate Allowed|\n|2P_O|\tTwo-Point Shooting Percentage|\n|2P_D|\tTwo-Point Shooting Percentage Allowed|\n|3P_O|\tThree-Point Shooting Percentage|\n|3P_D|\tThree-Point Shooting Percentage Allowed|\n|ADJ_T|\tAdjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)|\n|WAB|\tWins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)|\n|POSTSEASON|\tRound where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March 
Madness Tournament for that given year)|\n|SEED|\tSeed in the NCAA March Madness Tournament|\n|YEAR|\tSeason"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Load Data From CSV File "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Let's load the dataset [NB Need to provide link to csv file]"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0120ENv3/Dataset/ML0101EN_EDX_skill_up/cbb.csv')\ndf.head()", "execution_count": 2, "outputs": [{"output_type": "execute_result", "execution_count": 2, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 16.3 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 \n\n ... FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR \n0 ... 30.4 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 2016 \n1 ... 30.0 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 2016 \n2 ... 26.0 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 \n3 ... 33.4 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 \n4 ... 37.3 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 \n\n[5 rows x 24 columns]", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "df.shape", "execution_count": 3, "outputs": [{"output_type": "execute_result", "execution_count": 3, "data": {"text/plain": "(1406, 24)"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "## Add Column\nNext we'll add a column that will contain \"true\" if the wins above bubble are over 7 and \"false\" if not. We'll call this column Win Index or \"windex\" for short. "}, {"metadata": {}, "cell_type": "code", "source": "df['windex'] = np.where(df.WAB > 7, 'True', 'False')", "execution_count": 4, "outputs": []}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Data visualization and pre-processing\n\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Next we'll filter the data set to the teams that made the Sweet Sixteen, the Elite Eight, and the Final Four in the post season. We'll also create a new dataframe that will hold the values with the new column."}, {"metadata": {}, "cell_type": "code", "source": "df1 = df[df['POSTSEASON'].str.contains('F4|S16|E8', na=False)]\ndf1.head()", "execution_count": 5, "outputs": [{"output_type": "execute_result", "execution_count": 5, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR ... \\\n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 ... \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 ... \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 ... \n5 Oregon P12 37 30 118.4 96.2 0.9163 52.3 48.9 16.1 ... \n6 Syracuse ACC 37 23 111.9 93.6 0.8857 50.0 47.3 18.1 ... \n\n 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR windex \n2 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 False \n3 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 True \n4 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 True \n5 52.6 46.1 34.4 36.2 69.0 6.7 E8 1.0 2016 False \n6 47.2 48.1 36.0 30.7 65.5 -0.3 F4 10.0 2016 False \n\n[5 rows x 25 columns]", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1['POSTSEASON'].value_counts()", "execution_count": 6, "outputs": [{"output_type": "execute_result", "execution_count": 6, "data": {"text/plain": "S16 32\nE8 16\nF4 8\nName: POSTSEASON, dtype: int64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "40 teams made it into the Sweet Sixteen, 20 into the Elite Eight, and 10 made it into the Final Four over 5 seasons. \n"}, {"metadata": {}, "cell_type": "markdown", "source": "Lets plot some columns to underestand data better:"}, {"metadata": {}, "cell_type": "code", "source": "# notice: installing seaborn might takes a few minutes\n#!conda install -c anaconda seaborn -y", "execution_count": 7, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "import seaborn as sns\n\nbins = np.linspace(df1.BARTHAG.min(), df1.BARTHAG.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=6)\ng.map(plt.hist, 'BARTHAG', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()", "execution_count": 8, "outputs": [{"output_type": "display_data", "data": {"text/plain": "", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAADQCAYAAACEES+9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFidJREFUeJzt3X+cVXWdx/HXe4YfQ4wUASXMOIAZBSiNOi2bmdKPNWQtJS20H490a9nVtMJaNx+16ZY+1tLH1pZai2RoZrZZ6qqsP9ZCqZQEBNNINDWZBh4CK5gGAvLZP86BbuOMc2fm3Lkz33k/H4/z4Nxzv/ecz3dmPnzuOffc71cRgZmZWUpqqh2AmZlZ0VzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+LWByQtkfSqbrSfJOmhSsbUyXEXS3pC0up8+WQX7ZdKaumr+MzaGwi5JemyPJ9+I2l7SX6d1JdxDDZDqh3AYBARc6odQzf8U0RcX+0gzMoxEHIrIj4BWWEFbomI5o7aSRoSEbv7MLSk+cytlySds/cMR9LXJP00X3+npGvy9Scljc3fNa6VdIWkhyXdIWlE3uZwSWsk3Qt8omT/tZIulnS/pAcl/UO+fa6k/1VmvKR1kvavUB+/JWlFHvO/dvB8bX7W95CkX0takG9/naTbJK2UtEzSGysRn6VpkOTWzyVdKOke4ExJ10g6oeT550rWPyfpV3msX6xEPClxceu9e4C35estQL2kocCRwLIO2r8euCwipgNbgRPz7d8FPhkRb2nX/mPAtoh4M/Bm4O8lTY6IG4CNZMl6BXBeRGwsfaGk/UougbRfpnXSn4tL2hySb/t8RLQAM4CjJc1o95pmoCEiDo6IQ/K+ACwEzoqIw4HPApd3ckyzjqSWW50ZFRFHRcTXO2sgaQ7QBMwky7cjJB3RzeMMKr4s2XsrgcMl7Qe8AKwiS8S3AR19ZvVERKwuee0kSa8EXhURd+fbvwccm68fA8wouT7/SrIkfgI4C3gIuC8iftD+QBHxR7JE6I6OLkt+QNJ8sr+X8cA04MGS5x8HDpT0TeBW4A5J9cARwI8k7W03vJux2OCWWm515roy2hxDFvcD+eN6YArwy4JiSI6LWy9FxC5JTwKnkf2hPQi8HXgdsLaDl7xQsv4iMAIQ0NkgnyI7+7m9g+cagD3AayXVRMSev3hh9p9CR+9wAT4YEb/p5LnSfUwmO+t6c0Q8I2kxUFfaJt/+JuDdZO92PwB8Gtja2ecLZl1JPbdKPF+yvpv8ipqkWv78f7SACyLiO93Y76Dmy5LFuIesANxD9gf/j8DqKHNU6ojYCmyTdGS+6UMlT98OnJ5fjkHSFEkjJQ0hu9zyQbJEP7uD/f4xIpo7WcpNvlFkybdN0mv587vefSSNBWoi4sfAvwCHRcSzwBOS3p+3UV4Azboj5dzqyJPA4fn6XKC2JNaPSRqZx9qY5511wmduxVgGfB64NyKel7SDzt/VdeY04EpJfyL7Q95rETAJWKXs+t4m4ATgM8CyiFgmaTVwv6RbI6Kjd7Q9FhFrJD0APEx2+fEXHTRrAL4rae+bpXPzfz8EfEvSF4ChZJdf1hQZnyUv2dzqxH8CN0n6G+AO8rPRiFiS35B1X36Z/49kxXdzH8Q0IMlT3piZWWp8WdLMzJLj4mZmZslxcTMzs+S4uJmZWXIqUtxmz54dZN8t8eJlsC+FcE558bJvKUtFitvmzb471axIzimz7vFlSTMzS46Lm5mZJaes4iZpQT6NxEOSfiCprutXmZmZVUeXw29JaiAbgXtaRGyX9F/AycDiCsdmZmZd2LVrF62trezYsaPaoRSqrq6OxsZGhg4d2qPXlzu25BBghKRdwCuAth4dzczMCtXa2sp+++3HpEmTKJleakCLCLZs2UJrayuTJ0/u0T66vCwZEX8ALgGeAjaQTe53R4+OZmZmhdqxYwdjxoxJprABSGLMmDG9OhvtsrhJGg0cD0wGJgAjJX24g3bzJa2QtGLTpk09Dsj6VkNTA5IKWxqaGqrdpWQ4p6xcKRW2vXrbp3IuS76LbIbbTfkBf0I2w/I1pY0iYiGwEKClpaXsL9pZdbWtb+M9N8wpbH83z11S2L4GO+eUWc+Vc7fkU8BfS3pFPufRO+l4FlwzM6uyiePHF3o1ZuL48V0es7a2lubm5n3LRRddBMBdd93FYYcdRnNzM0c
eeSSPPfZYpbu/T5dnbhGxXNL1wCqyKdAfIH83aWZm/ctTGzfSOqGxsP01trV22WbEiBGsXr36JdtPP/10brrpJqZOncrll1/OBRdcwOLFiwuL7eWUdbdkRJwHnFfhWMzMLCGSePbZZwHYtm0bEyZM6LNjl/tVADMzsw5t376d5ubmfY/PPfdc5s2bx6JFi5gzZw4jRoxg1KhR3HfffX0Wk4ubmZn1SmeXJb/2ta+xZMkSZs6cycUXX8zZZ5/NokWL+iQmjy1pZmaF27RpE2vWrGHmzJkAzJs3j1/+8pd9dnwXNzMzK9zo0aPZtm0b69atA+DOO+9k6tSpfXZ8X5Y0M0tI0/77l3WHY3f215X2n7nNnj2biy66iCuuuIITTzyRmpoaRo8ezZVXXllYXF1xcTMzS8jvN2zo82O++OKLHW6fO3cuc+fO7eNoMr4saWZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmVlCJjQ2FTrlzYTGpi6P2X7KmyeffHLfc0899RT19fVccsklFez1S/l7bmZmCdnwh/XM/OJthe1v+Zdmd9mms7ElARYsWMCxxx5bWDzlcnEzM7OKuPHGGznwwAMZOXJknx/blyXNzKxX9g6/1dzcvG9Ekueff56vfOUrnHdedaYC9ZmbmZn1SkeXJc877zwWLFhAfX19VWJycTMzs8ItX76c66+/nnPOOYetW7dSU1NDXV0dZ555Zp8c38XNzMwKt2zZsn3r559/PvX19X1W2MDFzcwsKeMbDijrDsfu7G8gcnEzM0tIW+tTfX7M55577mWfP//88/smkBK+W9LMzJLj4mZmZslxcTMzs+S4uJmZWXJc3MzMLDkubmZmlpyyipukV0m6XtJvJa2V9JZKB2ZmZt3X0NRQ6JQ3DU0NZR33wgsvZPr06cyYMYPm5maWL1/OpZdeykEHHYQkNm/e/Bftly5dSnNzM9OnT+foo48u/OdQ7vfc/gO4LSJOkjQMeEXhkZiZWa+1rW/jPTfMKWx/N89d0mWbe++9l1tuuYVVq1YxfPhwNm/ezM6dOxk2bBjHHXccs2bN+ov2W7du5YwzzuC2226jqamJp59+urB49+qyuEkaBRwFnAoQETuBnYVHYmZmA9KGDRsYO3Ysw4cPB2Ds2LEATJgwocP21157Le973/toasomQn3Na15TeEzlXJY8ENgEfFfSA5IWSXrJ5DyS5ktaIWnFpk2bCg/UbLBxTtlAccwxx7B+/XqmTJnCGWecwd133/2y7detW8czzzzDrFmzOPzww7n66qsLj6mc4jYEOAz4VkQcCjwPfK59o4hYGBEtEdEybty4gsM0G3ycUzZQ1NfXs3LlShYuXMi4ceOYN28eixcv7rT97t27WblyJbfeeiu33347X/7yl1m3bl2hMZXzmVsr0BoRy/PH19NBcTMzs8GrtraWWbNmMWvWLA455BCuuuoqTj311A7bNjY2MnbsWEaOHMnIkSM56qijWLNmDVOmTCksni7P3CJiI7Be0hvyTe8EflNYBGZmNqA98sgjPProo/ser169mokTJ3ba/vjjj2fZsmXs3r2bP/3pTyxfvpypU6cWGlO5d0ueBXw/v1PyceC0QqMwM7NCTDhgQll3OHZnf1157rnnOOuss9i6dStDhgzhoIMOYuHChXzjG9/gq1/9Khs3bmTGjBnMmTOHRYsWMXXqVGbPns2MGTOoqanh4x//OAcffHBhMQMoIgrdIUBLS0usWLGi8P1a8SQVfttwJf6mBjAVsRPnlHVm7dq1hZ/19Bed9K2snPIIJWZmlhwXNzMzS46Lm5nZAJfiRwG97ZOLm5nZAFZXV8eWLVuSKnARwZYtW6irq+vxPsq9W9LMzPqhxsZGWltbSW0Um7q6OhobG3v8ehc3M7MBbOjQoUyePLnaYfQ7vixpZmbJcXEzM7PkuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi5sVqmZoDZIKWxqaGqrdJTMbgPw9NyvUnl17Cp9lwMysu3zmZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNzMzCw5ZRc3SbWSHpB0SyUDMjMz663unLl9ClhbqUDMzMyKUlZxk9QI/C2wqLLhmJmZ9V65Z25fB84B9lQwFjMzs0J0WdwkHQc8HREru2g3X9IKSSs2bdpUWIBmg5VzyqznyjlzeyvwXklPAtcB75B0TftGEbEwIloiomXcuHEFh2k2+DinzHquy+IWEedGRGNETAJOBn4aER+ueGRmZmY95O+5mZlZcoZ0p3FELAWWViQSMzOzgvjMzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNwGmIamBiQVtphZsYrO0YamhkLjG1Y3tLDYhgyrLbSvE8ePL6yf3ZoVwKqvbX0b77lhTmH7u3nuksL2ZWb9P0d3vbC7sPhunruE1gmNhewLoLGttbB9+czNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi5uZmSXHxc3MzJLj4mZmZslxcTMzs+S4uJmZWXJc3MzMLDldFjdJB0j6maS1kh6W9Km+CMzMzKynypnyZjfwmYhYJWk/YKWkOyPiNxWOzczMrEe6PHOLiA0RsSpf/yOwFih29jwzM7MCdWuyUkmTgEOB5R08Nx+YD9DU1FRAaOVraGqgbX1bYfurGVbDnp17Ctvf0OFD2LljV2H7G2yKmjF8RE0t2/e8WMi+AJr235/fb9hQ2P7aq2ZOWf9SVA4UrWZoTbETjA4t7jaQsoubpHrgx8CnI+LZ9s9HxEJgIUBLS0sUFmEZKjHzbX+eSXewKWqm38a21n47a3BHqplT1r/M/OJthe1r+ZdmF7avPbv29Nv/K8sqk5KGkhW270fETwo7upmZWQWUc7ekgO8AayPi3ysfkpmZWe+Uc+b2VuAjwDskrc6X4s5DzczMCtblZ24R8XOgf36aaWZm1gGPUGJmZslxcTMzs+S4uJmZWXJc3MzMLDkubmZmlhwXNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOd2aibsow+qGsuuF3dU4dFXUDK3ptzPp9ndFzvRb5Cy/lpaGpgba1rdVOwwrUFWK264Xdvfb2VsrocjZavt7X4vmn531hbb1bf47S4zfypqZWXJc3MzMLDkubmZmlhwXNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklp6ziJmm2pEckPSbpc5UOyszMrDe6LG6SaoHLgGOBacApkqZVOjAzM7OeKufM7a+AxyLi8YjYCVwHHF/ZsMzMzHpOEfHyDaSTgNkR8fH88UeAmRFxZrt284H5+cM3AI8UH26PjQU2VzuIPjTY+gv9t8+bI2J2T17onOp3Bl
uf+2t/y8qpcmbiVgfbXlIRI2IhsLCM/fU5SSsioqXacfSVwdZfSLPPzqn+ZbD1eaD3t5zLkq3AASWPG4G2yoRjZmbWe+UUt/uB10uaLGkYcDLw35UNy8zMrOe6vCwZEbslnQncDtQCV0bEwxWPrFj98tJOBQ22/sLg7HM1Dcaf92Dr84Dub5c3lJiZmQ00HqHEzMyS4+JmZmbJGfDFrauhwSQ1SfqZpAckPShpTr59kqTtklbny7f7PvruK6O/EyXdlfd1qaTGkuc+KunRfPlo30beM73s74slv1/fBFUm59RLnndO/fm5gZNTETFgF7IbXH4HHAgMA9YA09q1WQicnq9PA57M1ycBD1W7DxXo74+Aj+br7wC+l6+/Gng8/3d0vj662n2qVH/zx89Vuw8DbXFOOadSyamBfuZWztBgAYzK11/JwP6OXjn9nQbcla//rOT5dwN3RsT/RcQzwJ1Aj0bO6EO96a/1jHPKOZVETg304tYArC953JpvK3U+8GFJrcAS4KyS5ybnl1bulvS2ikZajHL6uwY4MV+fC+wnaUyZr+1vetNfgDpJKyTdJ+mEyoaaDOeUcyqJnBroxa2cocFOARZHRCMwB/iepBpgA9AUEYcCZwPXShpF/1ZOfz8LHC3pAeBo4A/A7jJf29/0pr+Q/X5bgA8CX5f0uopFmg7nlHMqiZwqZ2zJ/qycocE+Rn6pICLulVQHjI2Ip4EX8u0rJf0OmAKsqHjUPddlfyOiDXgfgKR64MSI2Ja/y57V7rVLKxlsAXrc35LniIjHJS0FDiX7vME655xyTqWRU9X+0K83C1lxfhyYzJ8/HJ3ers3/AKfm61PJfpECxgG1+fYDyd6dvLrafSqgv2OBmnz9QuBL+fqrgSfIPvgena+n3N/RwPCSNo/S7oNzLz3+mTunwjnV33Oq6gEU8MuaA6wje/fw+Xzbl4D35uvTgF/kv8TVwDH59hOBh/Ptq4D3VLsvBfX3pPyPbh2waO8fY/7c3wGP5ctp1e5LJfsLHAH8Ov/9/hr4WLX7MlAW55RzKoWc8vBbZmaWnIF+Q4mZmdlLuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi1sVlYywvUbSKklHtHt+gaQdkl5Zsm2WpG35EEe/lXRJvv20ktG6d0r6db5+kaRTJV3abt9LJbWUPD5UUkh6d7t2r5V0raTHJa2UdK+kuZX5iZj1jnPK9nJxq67tEdEcEW8CzgX+rd3zpwD3k43vVmpZZEMcHQocJ+mtEfHdfF/NZF+qfXv++CVTWnTiFODn+b8ASBJwI3BPRBwYEYcDJ5ONamDWHzmnDHBx609GAc/sfZCP2VYPfIGS5CgVEdvJvkTbq8Fa84Q7CTgVOCYfTgmy6S52RsS+ebki4vcR8c3eHM+sjzinBrGBPrbkQDdC0mqgDhhP9oe/1ynAD4BlwBskvSaysfv2kTQaeD1wTxnHmifpyJLHB5WsvxV4IiJ+l48XNwf4CTCdbKQJs4HCOWWAz9yqbe8llDeSDUR7df6OD7JLFddFxB6ypHh/yeveJulBYCNwS0RsLONYP9x7iSW/zFI6mO0pZPM6kf/b4btaSZfln2XcX3YPzfqWc8oAn7n1G5GNrj4WGCdpf7J3j3fmeTmMbLDTy/LmyyLiOElTgJ9LuiEiVvfkuJJqycYEfK+kz5MNgDtG0n5k4wTundeJiPhEHmN/HuXdDHBODXY+c+snJL2RbAr4LWTv8s6PiEn5MgFokDSx9DURsY7sA/N/7sWh3wWsiYgD8mNNBH4MnAD8lGxywtNL2r+iF8cy6zPOqcHNxa26Ruy91Rj4IfDRiHiR7PLJDe3a3pBvb+/bwFGSJvcwhlM6ONaPgQ9GNqr2CWQTFz4h6VfAVfQu8c0qyTllAJ4VwMzM0uMzNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOf8Pgf5cATIG3JgAAAAASUVORK5CYII=\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "bins = np.linspace(df1.ADJOE.min(), df1.ADJOE.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'ADJOE', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()", "execution_count": 9, "outputs": [{"output_type": "display_data", "data": {"text/plain": "", "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFEZJREFUeJzt3X+UVOV9x/HPZ5eV5bBaPZIf7C4rGEOCKF1lE2JiDE1Ts1KNQZOgMTk11ZJqlERTTKxtND16Dv44NSdRkyIxamPiaU3UaIg/akSxKEYQLEpEoxbWhQpEUBQV5Ns/5kJGnGVn8e7OszPv1zn3MHPnmXu/l9lnP3OfvfOMI0IAAKSmrtIFAABQCgEFAEgSAQUASBIBBQBIEgEFAEgSAQUASBIB1Q9sz7W9dx/aj7a9rD9r6mG/19p+1vaSbJnRS/t5tjsGqj7UpsHQf2xfmfWZJ2xvLupDnxvIOqrdkEoXUI0iYkqla+iDmRFxU6WLALYbDP0nIr4mFcJR0u0R0V6qne0hEbF1AEurKpxB9ZHtc7afadi+3PZvs9t/afun2e3nbI/I3tktt3217cdt32V7WNZmou2lth+U9LWi7dfbvtT272w/Zvur2fqptv/LBSNtr7D93n46xh/afiSr+bslHq/Pzr6W2f4f22dl699n+w7bi2zPt/3B/qgPg1eN9J8HbF9k+35JZ9j+qe3PFj2+qej2t20/nNX6nf6oZzAjoPrufkkfz253SGqy3SDpcEnzS7R/v6QrI2K8pA2Sjs/W/0TSjIg4bKf2p0jaGBEfkvQhSX9ne0xE3CxpjQqd8WpJ50fEmuIn2t6zaKhh5+XAHo7n0qI2B2frzouIDkkTJH3C9oSdntMuqSUiDoqIg7NjkaTZks6MiImS/kHSVT3sE7Wr2vpPT/aKiCMi4ns9NbA9RVKbpEkq9KmP2v5oH/dT1Rji67tFkiba3lPS65IWq9DRPi6p1N9wno2IJUXPHW37zyTtHRH3Zev/XdJR2e0jJU0oGsv+MxU66bOSzpS0TNJDEfHznXcUES+r8IPeF6WG+L5ge7oKPx8jJR0o6bGix5+RtL/tH0j6taS7bDdJ+qik/7S9vd3QPtaC6ldt/acnN5bR5kgV6n40u98kaaykBTnVMOgRUH0UEVtsPyfpKyr8ID0m6S8kvU/S8hJPeb3o9puShkmypJ4mQbQKZyF3lnisRdI2Se+xXRcR297yxEKnL/UuVJK+GBFP9PBY8TbGqHD286GIeNH2tZIai9tk6/9c0qdVeEf6BUnfkLShp7F4QKr+/lPklaLbW5WNVtmu159+71rShRHx4z5st6YwxLd77lfhl/j9KvxA/72kJVHmzLsRsUHSRtuHZ6tOKnr4TkmnZcMesj3W9nDbQ1QY1viiCh357BLbfTki2ntYyu1ce6nQuTbafo/+9M50B9sjJNVFxC8k/bOkQyPiJUnP2v581sZZiAE7q+b+U8pzkiZmt6dKqi+q9RTbw7NaW7O+hQxnULtnvqTzJD0YEa/Yfk09v/PqyVckXWP7VRV+ULebI2m0pMUujJWtlfRZSd+UND8i5tteIul3tn8dEaXede62iFhq+1FJj6swlPffJZq1SPqJ7e1vcM7N/j1J0g9t/5OkBhWGOZbmWR+qQtX2nx78m6Rbbf+VpLuUnRVGxNzsQqKHsmHxl1UI0HUDUNOgYL5uAwCQIob4AABJIqAAAEkioAAASSKgAABJ6peA6uzsDBU+p8DCUu1LLugzLDW2lKVfAmrdOq6SBPqCPgO8HUN8AIAkEVAAgCSVHVDZNPaP2r69PwsCAEDq21RHX1dhDqu9+qkWAKhJW7ZsUVdXl1577bVKl5KrxsZGtba2qqGhYbeeX1ZA2W6V9NeSLlKJSRYBALuvq6tLe+65p0aPHq2ir6sZ1CJC69evV1dXl8aMGbNb2yh3iO97ks5RYap6AECOXnvtNe27775VE06SZFv77rvvOzor7DWgbB8t6YWIWNRLu+kufE34I2vXrt3tgtC/WtpaZDuXpWFYQ5LbamlrqfR/c1noMyhWTeG03Ts9pnKG+D4m6TMufD1xo6S9bP80Ir5U3CgiZqvwld/q6Ogo+4NYGFjdq7p1zM1TctnWbVPnJrutwYA+A+xar2dQEXFuRLRGxGhJJ0j67c7hBADIz34jR+Y2omBb+40c2es+6+vr1d7evmOZNWuWJOmee+7RoYceqvb2dh1++OF6+umn+/vwd+ALCwEgMSvXrFFXc2tu22vt7uq1zbBhw7RkyZK3rT/ttNN06623aty4cbrqqqt04YUX6tprr82ttl3pU0BFxDxJ8/qlEgBAcmzrpZdekiRt3LhRzc3NA7ZvzqAAANq8ebPa29t33D/33HM1bdo0zZkzR1OmTNGwYcO011576aGHHhqwmggoAECPQ3yXX3655s6dq0mTJunSSy/V2WefrTlz5gxITczFBwAoae3atVq6dKkmTZokSZo2bZoWLFgwYPsnoAAAJe2zzz7auHGjVqxYIUm6++67NW7cuAHbP0N8AJCYtve+t6wr7/qyvd7s/Deozs5OzZo1S1dffbWOP/541dXVaZ999tE111yTW129IaAAIDH/u3r1gO/zzTffLLl+6tSpmjp16gBXU8AQHwAgSQQUACBJBBQAIEkEFAAgSQQUACBJBBQAIEkEFAAkprm1Ldev22hubet1nzt/3cZzzz2347GVK1eqqalJl112WT8e9dvxOSgASMzq51dp0nfuyG17C/+ls9c2Pc3FJ0lnnXWWjjrqqNzqKRcBBQDo0S233KL9999fw4cPH/B9M8QHANgx1VF7e/uOmSNeeeUVXXzxxTr//PMrUhNnUACAkkN8559/vs466yw1NTVVpCYCCgBQ0sKFC3XTTTfpnHPO0YYNG1RXV6fGxkadccYZA7J/AgoAUNL8+fN33L7gggvU1NQ0YOEkEVAAkJyRLaPKuvKuL9sbjAgoAEhMd9fKAd/npk2bdvn4BRdcMDCFFOEqPgBAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJJ6DSjbjbYftr3U9uO2vzsQhQFArWppa8n16zZa2lrK2u9FF12k8ePHa8KECWpvb9fChQt1xRVX6IADDpBtrVu37i3t582bp/b2do0fP16f+MQncv9/KOdzUK9L+mREbLLdIOkB27+JiIdyrwYAoO5V3Trm5im5be+2qXN7bfPggw/q9ttv1+LFizV06FCtW7dOb7zxhvbYYw8dffTRmjx58lvab9iwQaeffrruuOMOtbW16YUXXsit3u16DaiICEnbP8HVkC2ReyUAgIpZvXq1RowYoaFDh0qSRowYIUlqbm4u2f5nP/uZjjvuOLW1Fb4M8d3vfnfuNZX1Nyjb9baXSHpB0t0RsbBEm+m2H7H9yNq1a/Ous6blebpfK/L6/9pv5Mj+rJE+M0jk2QfLHW4baEceeaRWrVqlsWPH6vTTT9d99923y/YrVqzQiy++qMmTJ2vixIm6/vrrc6+prKmOIuJNSe2295Z0s+2DIm
LZTm1mS5otSR0dHZxh5SjP0/1yTvWrQVdzay7bae3uymU7pdBnBo9a6INNTU1atGiR5s+fr3vvvVfTpk3TrFmzdPLJJ5dsv3XrVi1atEj33HOPNm/erMMOO0wf+chHNHbs2Nxq6tNcfBGxwfY8SZ2SlvXSHAAwiNTX12vy5MmaPHmyDj74YF133XU9BlRra6tGjBih4cOHa/jw4TriiCO0dOnSXAOqnKv43pWdOcn2MEmfkvT73CoAAFTck08+qaeeemrH/SVLlmi//fbrsf2xxx6r+fPna+vWrXr11Ve1cOFCjRs3LteayjmDGinpOtv1KgTaf0TE7blWAQDYoXlUc65Dgc2jSl/oUGzTpk0688wztWHDBg0ZMkQHHHCAZs+ere9///u65JJLtGbNGk2YMEFTpkzRnDlzNG7cOHV2dmrChAmqq6vTqaeeqoMOOii3mqXyruJ7TNIhue4VANCj51c+P+D7nDhxohYsWPC29TNmzNCMGTNKPmfmzJmaOXNmv9XETBIAgCQRUACAJBFQAJCAwpwI1eWdHhMBBQAV1tjYqPXr11dVSEWE1q9fr8bGxt3eRp8+BwUAyF9ra6u6urpUbTOKNDY2qrV19z80T0ABQIU1NDRozJgxlS4jOQzxAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJLUa0DZHmX7XtvLbT9u++sDURgAoLYNKaPNVknfjIjFtveUtMj23RHxRD/XBgCoYb2eQUXE6ohYnN1+WdJySS39XRgAoLb16W9QtkdLOkTSwv4oBgCA7coZ4pMk2W6S9AtJ34iIl0o8Pl3SdElqa2vLrcDetLS1qHtVdy7bah7VrOdXPp/LtvZobNCW17fmsi30TV1DnVq7u3LbVn+pVJ9B5dnOZTvD6uq1edubuWyrYeiQ3H5n5fW7tKyAst2gQjjdEBG/LNUmImZLmi1JHR0d8Y4rK1P3qm4dc/OUXLZ129S5uWxHkra8vjXJumrBti3bBsX/faX6DCqvq7k1l+20dnfluq3U+k05V/FZ0o8lLY+If81lrwAA9KKc8YuPSfqypE/aXpIt+cQsAAA96HWILyIekJTPgCkAAGViJgkAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSeg0o29fYfsH2soEoCAAAqbwzqGsldfZzHQAAvEWvARUR90v64wDUAgDADkPy2pDt6ZKmS1JbW9su27a0tah7VXdeu85NXUOdbFe6DNSIvvSZlOXZn5tHNev5lc/nsq2Uf8+0dnclt60U5RZQETFb0mxJ6ujoiF217V7VrWNunpLLfm+bOjeX7UjSti3bkqwL1akvfSZlqfbnVOvK+/dMiseYF67iAwAkiYACACSpnMvMfy7pQUkfsN1l+5T+LwsAUOt6/RtURJw4EIUAAFCMIT4AQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSygoo2522n7T9tO1v93dRAAD0GlC26yVdKekoSQdKOtH2gf1dGACgtpVzBvVhSU9HxDMR8YakGyUd279lAQBqnSNi1w3sz0nqjIhTs/tfljQpIs7Yqd10SdOzux+Q9GT+5Q6IEZLWVbqId6gajkEaHMexLiI6d+eJVdRnpMHxWvWGYxg4ZfWbIWVsyCXWvS3VImK2pNllbC9pth+JiI5K1/FOVMMxSNVzHD2plj4jVcdrxTGkp5whvi5Jo4rut0rq7p9yAAAoKCegfifp/bbH2N5D0gmSftW/ZQEAal2vQ3wRsdX2GZLulFQv6ZqIeLzfK6ucahhyqYZjkKrnOGpBNbxWHENier1IAgCASmAmCQBAkggoAECSai6gbF9j+wXby4rWfd7247a32e7Yqf252RRPT9r+9MBX/HZ9OQbbo21vtr0kW35UmarfqodjuNT2720/Zvtm23sXPZbc61BL6Df0m4qIiJpaJB0h6VBJy4rWjVPhg5LzJHUUrT9Q0lJJQyWNkfQHSfWD7BhGF7dLZenhGI6UNCS7fbGki1N+HWppod+ksdRav6m5M6iIuF/SH3datzwiSn2K/1hJN0bE6xHxrKSnVZj6qaL6eAxJ6uEY7oqIrdndh1T4zJ2U6OtQS+g3aai1flNzAdVHLZJWFd3vytYNNmNsP2r7Ptsfr3QxZfpbSb/JblfL61ArquX1ot9UWDlTHdWysqZ5StxqSW0Rsd72REm32B4fES9VurCe2D5P0lZJN2xfVaLZYHsdakk1vF70mwRwBrVrg36ap+z0fn12e5EK49BjK1tVz2z/jaSjJZ0U2UC6quB1qDGD/vWi36SBgNq1X0k6wfZQ22MkvV/SwxWuqU9svyv7Ti/Z3l+FY3imslWVZrtT0rckfSYiXi16aNC/DjVm0L9e9JtEVPoqjYFeJP1chdP3LSq8wzhF0tTs9uuS/k/SnUXtz1Ph3dOTko6qdP19PQZJx0t6XIWreRZLOqbS9e/iGJ5WYcx8Sbb8KOXXoZYW+g39phILUx0BAJLEEB8AIEkEFAAgSQQUACBJBBQAIEkEFAAgSQRUgmxPtR22P5jd3z6z8qO2l9t+OPtg3vb2J9u+ouj+9Gx2499nbQ8vemxeNrPx9lmabxrYowPyR5+pTkx1lKYTJT0g6QRJF2Tr/hARh0g7Pjj4S9t1EfGT4ifaPlrSVyUdHhHrbB+qwjQtH46INVmzkyLikYE4EGCA0GeqEGdQibHdJOljKnwA74RSbSLiGUlnS5pR4uFvSZoZEeuytoslXSfpa/1SMFBh9JnqRUCl57OS7oiIFZL+mL2bK2WxpA+WWD9e0qKd1j2Srd/uhqLhikvfccVAZdFnqhRDfOk5UdL3sts3ZvevLNGu1EzFPbHeOosxwxWoJvSZKkVAJcT2vpI+Kekg2yGpXoVOclWJ5odIWl5i/ROSJkr6bdG6Q7P1QFWhz1Q3hvjS8jlJ10fEfhExOiJGSXpWf/qGTEmFK5QkXSbpByW2cYmki7OOK9vtkk5W6Q4LDHb0mSrGGVRaTpQ0a6d1v5D0j5LeZ/tRSY2SXpb0g6KrkYaoMBuzIuJXtlskLcjeUb4s6UsRsbpomzfY3pzdXhcRn+qfwwH6HX2mijGbeRWwfbmkpyKCd3xAGegzgwMBNcjZ/o2kPSQdFxEbK10PkDr6zOBBQAEAksRFEgCAJBFQAIAkEVAAgCQRUACAJBFQAIAk/
T9mzBRRZdMnhAAAAABJRU5ErkJggg==\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Pre-processing: Feature selection/extraction"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Lets look at how Adjusted Defense Efficiency plots"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "bins = np.linspace(df1.ADJDE.min(), df1.ADJDE.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'ADJDE', bins=bins, ec=\"k\")\ng.axes[-1].legend()\nplt.show()\n", "execution_count": 10, "outputs": [{"output_type": "display_data", "data": {"text/plain": "", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFSxJREFUeJzt3X2UXHV9x/H3Z5OQDVkQTECzuywJTxoe4gJr4wNCCpaGFAoL1IAPFY9KK4IarFRqK1jlHBRaLPJwGiIERLGnaEQg5aFQIFaIJrBBIBIQaLJsIkkk4RkS8u0fcxOHZSY7G+7s/Gb28zpnzs7c+c3vfu9kfvnM/c3MvYoIzMzMUtNU6wLMzMxKcUCZmVmSHFBmZpYkB5SZmSXJAWVmZklyQJmZWZIcUFUgab6knQbRfqKkh6pZU5n1zpX0pKSe7PKFAdrfJalrqOqz4akexo+kS7Mx84ikl4vG0IlDWUejG1nrAhpRRMyodQ2D8JWIuL7WRZhtVg/jJyI+D4VwBG6KiM5S7SSNjIiNQ1haQ/Ee1CBJOmvznoakiyTdmV0/QtK12fWnJI3P3tktlXSFpIcl3SZpTNbmYElLJN0LfL6o/xGSLpD0a0kPSvqbbHm3pP9WwQRJyyS9s0rbeLmkRVnN3yhx/4hs7+shSb+RNCtbvqekWyQtlrRA0rurUZ/Vr2Eyfn4h6TxJ9wCnS7pW0nFF979QdP2rkn6V1fr1atRTzxxQg3cP8KHsehfQImkUcAiwoET7vYFLI2I/YB1wQrb8KuALEfH+fu0/DayPiPcC7wU+K2lSRMwDVlEYjFcA50TEquIHStqhaKqh/2XfMttzQVGbA7JlX4uILmAKcJikKf0e0wm0RcT+EXFAti0As4EzIuJg4O+Ay8qs04avRhs/5ewYEYdGxHfLNZA0A+gAplIYUx+Q9IFBrqeheYpv8BYDB0vaAXgVuJ/CQPsQUOoznCcjoqfosRMlvQ3YKSLuzpb/ADgqu34kMKVoLvttFAbpk8AZwEPAfRFxXf8VRcTzFF7og1Fqiu8jkk6l8PqYAOwLPFh0/xPAHpK+B9wM3CapBfgA8J+SNrcbPcharPE12vgp58cVtDmSQt0PZLdbgH2AX+ZUQ91zQA1SRGyQ9BTwKQovpAeBPwX2BJaWeMirRddfB8YAAsodBFEU9kJuLXFfG7AJeIekpojY9IYHFgZ9qXehAB+NiEfK3FfcxyQKez/vjYhnJc0FmovbZMvfA/w5hXekHwG+BKwrNxdvBo0/foq8WHR9I9lslaQR/PH/XQHfiojvD6LfYcVTfNvmHgr/id9D4QX9t0BPVHjk3YhYB6yXdEi26GNFd98KfC6b9kDSPpLGShpJYVrjoxQG8pkl+n0+IjrLXCodXDtSGFzrJb2DP74z3ULSeKApIn4C/BNwUEQ8Bzwp6a+yNspCzKy/Rh4/pTwFHJxd7wZGFNX6aUljs1rbs7FlGe9BbZsFwNeAeyPiRUmvUP6dVzmfAq6U9BKFF+pmc4CJwP0qzJWtBo4DvgwsiIgFknqAX0u6OSJKvevcZhGxRNIDwMMUpvL+t0SzNuAqSZvf4Jyd/f0YcLmkfwRGUZjmWJJnfdYQGnb8lPHvwA2S/gy4jWyvMCLmZ18kui+bFn+eQoCuGYKa6oJ8ug0zM0uRp/jMzCxJDigzM0uSA8rMzJLkgDIzsyRVJaCmT58eFH6n4IsvjX7JhceML8PsUpGqBNSaNf6WpNlgeMyYvZmn+MzMLEkOKDMzS5IDyszMkuRDHZmZ1diGDRvo7e3llVdeqXUpuWpubqa9vZ1Ro0Zt0+MdUGZmNdbb28sOO+zAxIkTKTpdTV2LCNauXUtvby+TJk3apj48xWdmVmOvvPIK48aNa5hwApDEuHHj3tJeoQOqSto62pCUy6Wto63Wm2NmVdZI4bTZW90mT/FVSd+KPo6ZNyOXvm7snp9LP2Zm9cR7UGZmidl9woTcZmAksfuECQOuc8SIEXR2dm65nH/++QDccccdHHTQQXR2dnLIIYfw+OOPV3vzt/AelJlZYpavWkVva3tu/bX39Q7YZsyYMfT09Lxp+ec+9zluuOEGJk+ezGWXXca3vvUt5s6dm1ttW+M9KDMzK0sSzz33HADr16+ntbV1yNbtPSgzM+Pll1+ms7Nzy+2zzz6bmTNnMmfOHGbMmMGYMWPYcccdue+++4asJgeUmZmVneK76KKLmD9/PlOnTuWCCy7gzDPPZM6cOUNSk6f4zMyspNWrV7NkyRKmTp0KwMyZM/nlL385ZOt3QJmZWUk777wz69evZ9myZQDcfvvtTJ48ecjW7yk+M7PEdLzznRV9824w/Q2k/2dQ06dP5/zzz+eKK67ghBNOoKmpiZ133pkrr7wyt7oG4oAyM0vM/61cOeTrfP3110su7+7upru7e4irKfAUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJqiigJO0k6XpJv5W0VNL7q12Ymdlw1drekevpNlrbOwZcZ//TbTz11FNb7lu+fDktLS1ceOGFVdzqN6v0d1D/BtwSESdK2g7Yvoo1mZkNayufXsHUr9+SW38L/3n6gG3KHYsPYNasWRx11FG51VOpAQNK0o7AocApAB
HxGvBadcsyM7MU/OxnP2OPPfZg7NixQ77uSqb49gBWA1dJekDSHElvqlTSqZIWSVq0evXq3As1azQeM5aSzYc66uzs3HLkiBdffJFvf/vbnHPOOTWpqZKAGgkcBFweEQcCLwJf7d8oImZHRFdEdO2yyy45l2nWeDxmLCWbp/h6enqYN28eAOeccw6zZs2ipaWlJjVV8hlUL9AbEQuz29dTIqDMzKyxLFy4kOuvv56zzjqLdevW0dTURHNzM6effvqQrH/AgIqIVZJWSHpXRDwKHAE8Uv3SzMyslhYsWLDl+rnnnktLS8uQhRNU/i2+M4AfZt/gewL4VPVKMjMb3ia07VbRN+8G0189qiigIqIH6KpyLWZmBvT1Lh/ydb7wwgtbvf/cc88dmkKK+EgSZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmlpi2jrZcT7fR1tFW0XrPO+889ttvP6ZMmUJnZycLFy7kkksuYa+99kISa9aseUP7u+66i87OTvbbbz8OO+yw3J+HSn+oa2ZmQ6RvRR/HzJuRW383ds8fsM29997LTTfdxP3338/o0aNZs2YNr732Gttttx1HH30006ZNe0P7devWcdppp3HLLbfQ0dHBM888k1u9mzmgzMyMlStXMn78eEaPHg3A+PHjAWhtbS3Z/kc/+hHHH388HR2FkyHuuuuuudfkKT4zM+PII49kxYoV7LPPPpx22mncfffdW22/bNkynn32WaZNm8bBBx/MNddck3tNDigzM6OlpYXFixcze/ZsdtllF2bOnMncuXPLtt+4cSOLFy/m5ptv5tZbb+Wb3/wmy5Yty7UmT/GZmRkAI0aMYNq0aUybNo0DDjiAq6++mlNOOaVk2/b2dsaPH8/YsWMZO3Yshx56KEuWLGGfffbJrR7vQZmZGY8++iiPPfbYlts9PT3svvvuZdsfe+yxLFiwgI0bN/LSSy+xcOFCJk+enGtN3oMyM0tM626tFX3zbjD9DeSFF17gjDPOYN26dYwcOZK99tqL2bNnc/HFF/Od73yHVatWMWXKFGbMmMGcOXOYPHky06dPZ8qUKTQ1NfGZz3yG/fffP7eaARQRuXYI0NXVFYsWLcq933oiKbevid7YPZ+8/p3aOtroW9GXS1+tu7Xy9PKnc+mrjimPTjxmhrelS5fmvveRijLbVtG48R7UMJPn7yvyfIdnZtafP4MyM7MkOaDMzBJQjY9bau2tbpMDysysxpqbm1m7dm1DhVREsHbtWpqbm7e5D38GZWZWY+3t7fT29rJ69epal5Kr5uZm2tvbt/nxDigzsxobNWoUkyZNqnUZyfEUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmZkmqOKAkjZD0gKSbqlmQmZkZDG4P6ovA0moVYmZmVqyigJLUDvwFMKe65ZiZmRVUugf1XeAsYFO5BpJOlbRI0qJGO2R8rTWNakJSLpdU62rraMu1tnrQKGNm9wkTcnsd7D5hQq03xxIy4Ok2JB0NPBMRiyVNK9cuImYDswG6uroa56xbCdi0YRPHzJuRS183ds/PpR9It6560ShjZvmqVfS2bvs5f4q19/Xm0o81hkr2oD4I/KWkp4AfA4dLuraqVZmZ2bA3YEBFxNkR0R4RE4GTgDsj4uNVr8zMzIY1/w7KzMySNKhTvkfEXcBdVanEzMysiPegzMwsSQ4oMzNLkgPKzMyS5IAyM7MkOaDMzCxJDigzM0uSA8rMzJLkgDIzsyQ5oMzMLEkOKDMzS5IDyszMkuSAMjOzJDmgzMwsSQ4oMzNLkgPKzMySNKjzQZmZ9dc0qon2vt7c+kpRW0cbfSv6cumrdbdWnl7+dC59NToHlJm9JZs2bOKYeTNy6evG7vm59JO3vhV9Db+NKUrz7YqZmQ17DigzM0uSA8rMzJLkgDIzsyQ5oMzMLEkOKDMzS5IDyszMkuSAMjOzJDmgzMwsSQ4oMzNLkgPKzMyS5IAyM7MkOaDMzCxJDigzM0uSA8rMzJI0YEBJ2k3S/0haKulhSV8cisLMzGx4q+SEhRuBL0fE/ZJ2ABZLuj0iHqlybWZmNowNuAcVESsj4v7s+vPAUqCt2oWZmdnwNqjPoCRNBA4EFpa471RJiyQtWr16dT7VmTUwj5nqautoQ1IuF6uNSqb4AJDUAvwE+FJEPNf//oiYDcwG6OrqitwqNGtQHjPV1beij2Pmzcilrxu75+fSjw1ORXtQkkZRCKcfRsRPq1uSmZlZZd/iE/B9YGlE/Gv1SzIzM6tsD+qDwCeAwyX1ZJd89pvNzMzKGPAzqIj4BeBPCc3MbEj5SBJmZpYkB5SZmSXJAWVmZklyQJmZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSWp4lO+p2rEdiPYtGFTLn01jWrKrS8bnKZRTRTOjZlDX9s1sem1fP4dR40eyWuvbMilr7eqraONvhV9ufQ1avRINry6MZe+zKql7gNq04ZNHDMvn/Mn3tg9n97W9lz6au/rzaWf4SLvf8c8+0pF34o+v9ZtWPEUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmZklyQJmZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQZmaWpIoCStJ0SY9KelzSV6tdlJmZ2YABJWkEcClwFLAvcLKkfatdmJmZDW+V7EH9CfB4RDwREa8BPwaOrW5ZZmY23Ckitt5AOhGYHhGfyW5/ApgaEaf3a3cqcGp2813Ao8B4YE3eRQ+xet8G119dayJi+rY8sMyYgfS3eSD1Xj/U/zakXn9F42ZkBR2pxLI3pVpEzAZmv+GB0qKI6KpgHcmq921w/ekqNWag/re53uuH+t+Geq9/s0qm+HqB3YputwN91SnHzMysoJKA+jWwt6RJkrYDTgJ+Xt2yzMxsuBtwii8iNko6HbgVGAFcGREPV9j/m6Yv6lC9b4Prrz/1vs31Xj/U/zbUe/1ABV+SMDMzqwUfScLMzJLkgDIzsyTlGlCSZkl6WNJDkq6T1CxprqQnJfVkl84815knSV/Man9Y0peyZW+XdLukx7K/O9e6znLK1H+upKeLnv8Zta6zmKQrJT0j6aGiZSWfcxVcnB1y60FJB9Wu8nzU+5gBj5uhNpzGTG4BJakN+ALQFRH7U/hCxUnZ3V+JiM7s0pPXOvMkaX/gsxSOnPEe4GhJewNfBe6IiL2BO7LbydlK/QAXFT3/82tWZGlzgf4/2Cv3nB8F7J1dTgUuH6Iaq6Lexwx43NTIXIbJmMl7im8kMEbSSGB76uv3UpOB+yLipYjYCNwNdFM4rNPVWZurgeNqVN9AytWftIi4B/hDv8XlnvNjgWui4D5gJ0kThqbSqqnnMQMeN0NuOI2Z3AIqIp4GLgSWAyuB9RFxW3b3ednu5UWSRue1zpw9BBwqaZyk7YEZF
H6g/I6IWAmQ/d21hjVuTbn6AU7Pnv8rU55qKVLuOW8DVhS1682W1aUGGDPgcZOKhhwzeU7x7UwhrScBrcBYSR8HzgbeDbwXeDvw93mtM08RsRT4NnA7cAuwBNhY06IGYSv1Xw7sCXRS+E/wX2pVYw4qOuxWvaj3MQMeN3WgrsdMnlN8HwaejIjVEbEB+CnwgYhYme1evgpcRWGuN0kR8f2IOCgiDqWwC/0Y8PvNu8TZ32dqWePWlKo/In4fEa9HxCbgChJ+/ouUe84b7bBbdT9mwOMmEQ05ZvIMqOXA+yRtL0nAEcDSoidNFOZFH9pKHzUladfsbwdwPHAdhcM6fTJr8knghtpUN7BS9febb+4m4ee/SLnn/OfAX2ffTHofhSmxlbUoMCd1P2bA4yYRjTlmIiK3C/AN4LcU/jF/AIwG7gR+ky27FmjJc505178AeITCbv4R2bJxFL4V81j29+21rnOQ9f8ge/4fpPBinVDrOvvVfB2FKZQNFN7tfbrcc05huuJS4HfZNnXVuv4ctr+ux8xWXnceN9Wrd9iMGR/qyMzMkuQjSZiZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQCZLULSkkvTu7PVHSy5IekLRU0q8kfbKo/SmSLsmuFx+F+TFJP5W0b1HbuyQ9WnSU5uuHfgvN8udx03gGPOW71cTJwC8oHNn63GzZ7yLiQABJewA/ldQUEVeVePxFEXFh1nYmcKekAyJidXb/xyJiUVW3wGzoedw0GO9BJUZSC/BBCj++O6lUm4h4AjiTwqkatioi/gO4DfhojmWaJcXjpjE5oNJzHHBLRCwD/qDyJxi7n8IBRSvRv+0Pi6YqLngLtZqlwuOmAXmKLz0nA9/Nrv84u31piXaljlJcTv+2nqqwRuNx04AcUAmRNA44HNhfUlA4w2oAl5VofiCwtMKuDwQ8sKwhedw0Lk/xpeVECme/3D0iJkbEbsCTFA6Rv4WkiRROdPe9gTqUdAJwJIUDTJo1Io+bBuU9qLScDJzfb9lPgH8A9pT0ANAMPA98r+ibSCOBV4seMys78d1YCkfEPrzom0hQmEt/Obu+JiI+nPN2mA0lj5sG5aOZNwBJF1E4yVqpKQ0zK8HjJn0OqDon6b+A7YDjI2J9resxqwceN/XBAWVmZknylyTMzCxJDigzM0uSA8rMzJLkgDIzsyQ5oMzMLEn/D5atggtp71QbAAAAAElFTkSuQmCC\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "We see that this data point doesn't impact the ability of a team to get into the Final Four. "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Convert Categorical features to numerical values"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets look at the postseason:"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1.groupby(['windex'])['POSTSEASON'].value_counts(normalize=True)", "execution_count": 11, "outputs": [{"output_type": "execute_result", "execution_count": 11, "data": {"text/plain": "windex POSTSEASON\nFalse S16 0.605263\n E8 0.263158\n F4 0.131579\nTrue S16 0.500000\n E8 0.333333\n F4 0.166667\nName: POSTSEASON, dtype: float64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "12% of teams with 6 or less wins above bubble make it into the final four while 18% of teams with 7 or more do.\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets convert wins above bubble (winindex) under 7 to 0 and over 7 to 1:\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df['windex'].replace(to_replace=['False','True'], value=[0,1],inplace=True)\ndf.head()", "execution_count": 12, "outputs": [{"output_type": "execute_result", "execution_count": 12, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 16.3 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 \n\n ... 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR windex \n0 ... 
53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 2016 1 \n1 ... 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 2016 1 \n2 ... 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 0 \n3 ... 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 1 \n4 ... 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 1 \n\n[5 rows x 25 columns]", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## One Hot Encoding \n#### How about seed?"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1.groupby(['SEED'])['POSTSEASON'].value_counts(normalize=True)", "execution_count": 13, "outputs": [{"output_type": "execute_result", "execution_count": 13, "data": {"text/plain": "SEED POSTSEASON\n1.0 E8 0.750000\n F4 0.125000\n S16 0.125000\n2.0 S16 0.444444\n E8 0.333333\n F4 0.222222\n3.0 S16 0.700000\n E8 0.200000\n F4 0.100000\n4.0 S16 0.875000\n E8 0.125000\n5.0 S16 0.833333\n F4 0.166667\n6.0 E8 1.000000\n7.0 S16 0.800000\n F4 0.200000\n8.0 S16 1.000000\n9.0 E8 1.000000\n10.0 F4 1.000000\n11.0 S16 0.500000\n E8 0.250000\n F4 0.250000\n12.0 S16 1.000000\nName: POSTSEASON, dtype: float64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "#### Feature before One Hot Encoding"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']].head()", "execution_count": 14, "outputs": [{"output_type": "execute_result", "execution_count": 14, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D\n2 118.3 103.3 0.8269 54.0 49.5\n3 119.9 91.0 0.9600 54.8 48.4\n4 120.9 90.4 0.9662 55.7 45.1\n5 118.4 96.2 0.9163 52.3 48.9\n6 111.9 93.6 0.8857 50.0 47.3", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "#### Use one hot encoding technique to conver categorical varables to binary variables and append them to the feature Data Frame "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "Feature = df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']]\nFeature = pd.concat([Feature,pd.get_dummies(df1['POSTSEASON'])], axis=1)\nFeature.drop(['S16'], axis = 1,inplace=True)\nFeature.head()\n", "execution_count": 15, "outputs": [{"output_type": "execute_result", "execution_count": 15, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D E8 F4\n2 118.3 103.3 0.8269 54.0 49.5 1 0\n3 119.9 91.0 0.9600 54.8 48.4 1 0\n4 120.9 90.4 0.9662 55.7 45.1 1 0\n5 118.4 96.2 0.9163 52.3 48.9 1 0\n6 111.9 93.6 0.8857 50.0 47.3 0 1", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Feature selection"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets defind feature sets, X:"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "X = Feature\nX[0:5]", "execution_count": 16, "outputs": [{"output_type": "execute_result", "execution_count": 16, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D E8 F4\n2 118.3 103.3 0.8269 54.0 49.5 1 0\n3 119.9 91.0 0.9600 54.8 48.4 1 0\n4 120.9 90.4 0.9662 55.7 45.1 1 0\n5 118.4 96.2 0.9163 52.3 48.9 1 0\n6 111.9 93.6 0.8857 50.0 47.3 0 1", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "What are our lables? Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)|"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "y = df1['POSTSEASON'].values\ny[0:5]", "execution_count": 17, "outputs": [{"output_type": "execute_result", "execution_count": 17, "data": {"text/plain": "array(['E8', 'E8', 'E8', 'E8', 'F4'], dtype=object)"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Normalize Data "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Data Standardization give data zero mean and unit variance (technically should be done after train test split )"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "X= preprocessing.StandardScaler().fit(X).transform(X)\nX[0:5]", "execution_count": 18, "outputs": [{"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/ipykernel/__main__.py:1: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n if __name__ == '__main__':\n", "name": "stderr"}, {"output_type": "execute_result", "execution_count": 18, "data": {"text/plain": "array([[ 0.28034482, 2.74329908, -2.45717765, 0.10027963, 0.94171924,\n 1.58113883, -0.40824829],\n [ 0.64758014, -0.90102957, 1.127076 , 0.39390887, 0.38123706,\n 1.58113883, -0.40824829],\n [ 0.87710222, -1.0788017 , 1.29403598, 0.72424177, -1.30020946,\n 1.58113883, -0.40824829],\n [ 0.30329703, 0.63966222, -0.04972253, -0.52368251, 0.63600169,\n 1.58113883, -0.40824829],\n [-1.18859646, -0.13068368, -0.87375079, -1.36786658, -0.17924511,\n -0.63245553, 2.44948974]])"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Training and Validation "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Split the data into Training and Validation data."}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "# We split the X into train and test to find the best k\nfrom sklearn.model_selection import train_test_split\nX_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=4)\nprint ('Train set:', X_train.shape, y_train.shape)\nprint ('Validation set:', X_val.shape, y_val.shape)", "execution_count": 19, "outputs": [{"output_type": "stream", "text": "Train set: (44, 7) (44,)\nValidation set: (12, 7) (12,)\n", "name": "stdout"}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Classification "}, 
{"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Now, it is your turn, use the training set to build an accurate model. Then use the validation set to report the accuracy of the model\nYou should use the following algorithm:\n- K Nearest Neighbor(KNN)\n- Decision Tree\n- Support Vector Machine\n- Logistic Regression\n\n"}, {"metadata": {}, "cell_type": "markdown", "source": "# K Nearest Neighbor(KNN)\n\nQuestion 1 Build a KNN model using a value of k equals three, find the accuracy on the validation data (X_val and y_val)"}, {"metadata": {}, "cell_type": "markdown", "source": "You can use accuracy_score"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.metrics import accuracy_score\nfrom sklearn.neighbors import KNeighborsClassifier\n\nk = 3\nfinalfour_knn = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n\nyhat = finalfour_knn.predict(X_val)\n\nprint(\"Train set Accuracy: \", accuracy_score(y_train, finalfour_knn.predict(X_train)))\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 20, "outputs": [{"output_type": "stream", "text": "Train set Accuracy: 0.9772727272727273\nValidation set Accuracy: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Question 2 Determine the accuracy for the first 15 values of k the on the validation data:"}, {"metadata": {}, "cell_type": "code", "source": "k_vals = []\ntrain_scores = []\nval_scores = []\n\nfor k in range(1,16):\n finalfour_knn = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n yhat = finalfour_knn.predict(X_val)\n k_vals.append(k)\n train_scores.append(accuracy_score(y_train, finalfour_knn.predict(X_train)))\n val_scores.append(accuracy_score(y_val, yhat))\n\nknn_k_scores = {\"K\": k_vals, \"Training Set Accuracy\": train_scores, \"Validation Set Accuracy\": val_scores}\n\ndf_k_scores = pd.DataFrame(knn_k_scores)\nnp.array(val_scores)", "execution_count": 21, "outputs": [{"output_type": "execute_result", "execution_count": 21, "data": {"text/plain": "array([1. , 1. , 1. , 1. , 1. ,\n 1. 
, 0.91666667, 0.91666667, 0.83333333, 0.83333333,\n 0.83333333, 0.83333333, 0.83333333, 0.83333333, 0.83333333])"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Decision Tree"}, {"metadata": {}, "cell_type": "markdown", "source": "The following lines of code fit a DecisionTreeClassifier:"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.tree import DecisionTreeClassifier", "execution_count": 22, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "Question 3 Determine the minumum value for the parameter max_depth that improves results "}, {"metadata": {}, "cell_type": "code", "source": "depths = []\nval_scores = []\n\nfor d in range(1,16):\n finalfour_tree = DecisionTreeClassifier(max_depth = d)\n finalfour_tree.fit(X_train, y_train)\n finalfour_predtree = finalfour_tree.predict(X_val)\n depths.append(d)\n val_scores.append(accuracy_score(y_val, finalfour_predtree))\n \ndepth_accuracy = {\"Max Depth\": depths, \"Validation Set Accuracy\": val_scores}\ndf_d_scores = pd.DataFrame(depth_accuracy)\nprint(\"To achieve improved accuracy, max depth must be at least 2\")\ndf_d_scores \n# Need max_depth >= 2", "execution_count": 23, "outputs": [{"output_type": "stream", "text": "To achieve improved accuracy, max depth must be at least 2\n", "name": "stdout"}, {"output_type": "execute_result", "execution_count": 23, "data": {"text/plain": " Max Depth Validation Set Accuracy\n0 1 0.833333\n1 2 1.000000\n2 3 1.000000\n3 4 1.000000\n4 5 1.000000\n5 6 1.000000\n6 7 1.000000\n7 8 1.000000\n8 9 1.000000\n9 10 1.000000\n10 11 1.000000\n11 12 1.000000\n12 13 1.000000\n13 14 1.000000\n14 15 1.000000", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "finalfour_tree = DecisionTreeClassifier(max_depth = 2)\nfinalfour_tree.fit(X_train, y_train)\nfinalfour_predtree = finalfour_tree.predict(X_val)\nprint(\"Accuracy score\",accuracy_score(y_val, finalfour_predtree))", "execution_count": 24, "outputs": [{"output_type": "stream", "text": "Accuracy score 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Support Vector Machine"}, {"metadata": {}, "cell_type": "markdown", "source": "Question 4 Train the following linear support vector machine model and determine the accuracy on the validation data "}, {"metadata": {}, "cell_type": "code", "source": "from sklearn import svm", "execution_count": 25, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "finalfour_svm = svm.SVC(kernel='linear')\nfinalfour_svm.fit(X_train, y_train) \nyhat = finalfour_svm.predict(X_val)\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 26, "outputs": [{"output_type": "stream", "text": "Validation set Accuracy: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Logistic Regression"}, {"metadata": {}, "cell_type": "markdown", "source": "Question 5 Train a logistic regression model and determine the accuracy of the validation data (set C=0.01)"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.linear_model import LogisticRegression", "execution_count": 27, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "finalfour_LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train, y_train)\nyhat = finalfour_LR.predict(X_val)\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 28, "outputs": [{"output_type": "stream", "text": "Validation set Accuracy: 1.0\n", "name": "stdout"}, {"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.\n \"this warning.\", FutureWarning)\n", "name": "stderr"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Model Evaluation using Test set"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.metrics import f1_score\nfrom sklearn.metrics import log_loss", "execution_count": 29, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "def jaccard_index(predictions, true):\n if (len(predictions) == len(true)):\n intersect = 0;\n for x,y in zip(predictions, true):\n if (x == y):\n intersect += 1\n return intersect / (len(predictions) + len(true) - intersect)\n else:\n return -1", "execution_count": 30, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "Question 5 Calculate the F1 score and Jaccard Similarity score for each model from above. 
Use the Hyperparameter that performed best on the validation data."}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Load Test set for evaluation "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "test_df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0120ENv3/Dataset/ML0101EN_EDX_skill_up/basketball_train.csv',error_bad_lines=False)\ntest_df", "execution_count": 31, "outputs": [{"output_type": "execute_result", "execution_count": 31, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 \n5 Oregon P12 37 30 118.4 96.2 0.9163 52.3 48.9 \n6 Syracuse ACC 37 23 111.9 93.6 0.8857 50.0 47.3 \n7 Oklahoma B12 37 29 118.2 94.1 0.9326 54.3 47.2 \n8 Davidson A10 32 19 113.0 106.0 0.6767 52.0 52.0 \n9 Duquesne A10 33 16 108.2 105.1 0.5851 53.9 49.4 \n10 Fordham A10 30 16 101.8 100.4 0.5388 51.5 51.4 \n11 George Mason A10 32 11 101.4 103.1 0.4507 45.1 48.4 \n12 George Washington A10 38 28 114.9 99.1 0.8454 50.8 48.7 \n13 La Salle A10 30 8 99.2 105.3 0.3358 47.7 53.3 \n14 Massachusetts A10 32 14 102.1 100.7 0.5394 49.1 48.3 \n15 Rhode Island A10 32 17 107.6 98.3 0.7403 50.4 47.6 \n16 Richmond A10 32 16 112.6 104.0 0.7136 54.7 50.9 \n17 Saint Louis A10 32 11 96.4 100.2 0.3913 47.7 50.0 \n18 St. Bonaventure A10 31 22 113.2 104.4 0.7173 51.5 49.6 \n19 Boston College ACC 32 7 95.0 99.1 0.3820 47.5 52.4 \n20 Clemson ACC 31 17 112.1 98.0 0.8241 50.6 47.9 \n21 Florida St. ACC 34 20 112.2 99.1 0.8069 51.8 50.9 \n22 Georgia Tech ACC 36 21 113.9 98.2 0.8450 49.8 49.1 \n23 Louisville ACC 31 23 111.5 89.6 0.9250 51.9 44.7 \n24 North Carolina St. ACC 33 16 113.6 102.9 0.7561 48.5 50.6 \n25 Virginia Tech ACC 35 20 109.5 97.8 0.7855 50.0 49.0 \n26 Wake Forest ACC 31 11 106.9 99.4 0.6977 49.4 49.5 \n27 Albany AE 32 23 104.7 102.1 0.5703 50.7 48.7 \n28 Binghamton AE 30 8 89.3 102.1 0.1763 44.1 49.6 \n29 Hartford AE 32 9 99.4 114.2 0.1685 49.1 53.3 \n... ... ... .. .. ... ... ... ... ... \n1727 Mississippi SEC 34 21 112.2 99.2 0.8037 47.8 47.6 \n1728 New Mexico St. WAC 32 21 103.9 98.1 0.6613 50.6 46.4 \n1729 North Dakota St. Sum 31 21 100.8 100.7 0.5035 49.7 48.1 \n1730 Northeastern CAA 35 23 107.0 101.0 0.6586 54.5 49.8 \n1731 Oklahoma St. B12 31 17 110.7 95.4 0.8469 50.2 46.5 \n1732 Providence BE 34 22 111.6 93.7 0.8821 48.4 48.2 \n1733 Purdue B10 34 21 110.5 93.9 0.8654 50.3 45.4 \n1734 Robert Morris NEC 35 20 100.7 103.3 0.4286 50.1 49.3 \n1735 SMU Amer 33 26 111.1 94.6 0.8635 52.1 45.7 \n1736 St. John's BE 32 20 110.6 94.7 0.8569 49.3 46.4 \n1737 Stephen F. Austin Slnd 31 26 111.0 98.8 0.7919 55.6 49.0 \n1738 Texas B12 34 20 110.3 90.7 0.9048 49.1 42.1 \n1739 Texas Southern SWAC 35 22 101.9 108.1 0.3371 49.5 49.3 \n1740 UC Irvine BW 32 19 105.1 96.8 0.7202 50.7 44.7 \n1741 Valparaiso Horz 31 25 105.0 94.5 0.7706 51.3 45.2 \n1742 VCU A10 36 26 108.4 94.0 0.8373 48.7 49.5 \n1743 Wofford SC 33 26 102.3 96.8 0.6547 51.1 47.0 \n1744 Wyoming MWC 33 23 102.1 98.2 0.6121 52.2 47.0 \n1745 Boise St. 
MWC 32 23 109.5 96.0 0.8210 53.2 47.6 \n1746 BYU WCC 33 23 118.3 100.3 0.8699 53.2 49.6 \n1747 Manhattan MAAC 33 19 100.2 99.4 0.5238 49.4 48.1 \n1748 North Florida ASun 32 20 105.2 105.0 0.5063 53.6 47.9 \n1749 North Carolina ACC 38 26 119.6 92.5 0.9507 51.6 45.4 \n1750 North Carolina St. ACC 36 22 114.1 96.8 0.8684 49.3 45.5 \n1751 Oklahoma B12 35 24 111.6 88.5 0.9349 49.3 44.1 \n1752 UCLA P12 36 22 111.8 96.6 0.8425 49.6 48.5 \n1753 Utah P12 34 25 114.9 88.7 0.9513 55.2 43.0 \n1754 West Virginia B12 35 25 110.3 93.3 0.8733 46.1 52.7 \n1755 Wichita St. MVC 34 29 114.3 91.5 0.9277 50.3 45.8 \n1756 Xavier BE 37 23 115.7 95.1 0.9049 53.3 50.0 \n\n TOR ... FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED \\\n0 15.4 ... 30.4 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 \n1 16.3 ... 30.0 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 \n2 15.3 ... 26.0 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 \n3 15.1 ... 33.4 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 \n4 17.8 ... 37.3 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 \n5 16.1 ... 32.0 52.6 46.1 34.4 36.2 69.0 6.7 E8 1.0 \n6 18.1 ... 28.0 47.2 48.1 36.0 30.7 65.5 -0.3 F4 10.0 \n7 18.3 ... 28.3 48.2 45.3 42.2 33.7 70.8 8.0 F4 2.0 \n8 14.2 ... 30.6 51.1 52.2 35.5 34.3 71.3 -2.1 NaN NaN \n9 18.9 ... 38.6 53.1 42.8 36.6 40.2 73.6 -7.8 NaN NaN \n10 20.4 ... 40.5 51.2 52.3 34.6 33.4 67.5 -6.0 NaN NaN \n11 18.6 ... 30.0 45.7 46.0 29.2 35.3 68.6 -11.6 NaN NaN \n12 16.5 ... 27.0 48.4 48.2 37.1 33.0 67.2 -0.8 NaN NaN \n13 18.8 ... 34.9 45.4 51.1 33.6 38.1 67.0 -12.6 NaN NaN \n14 17.5 ... 40.2 48.8 49.0 33.0 31.3 71.6 -8.7 NaN NaN \n15 17.5 ... 39.6 48.3 46.6 36.5 33.3 65.7 -5.9 NaN NaN \n16 14.7 ... 36.1 55.4 49.4 35.6 36.0 69.2 -5.5 NaN NaN \n17 19.5 ... 39.3 46.9 50.5 32.8 32.8 69.6 -11.6 NaN NaN \n18 16.5 ... 37.4 48.8 50.5 37.3 32.1 68.9 -0.1 NaN NaN \n19 20.0 ... 37.0 46.9 50.0 32.3 38.2 67.8 -12.2 NaN NaN \n20 15.8 ... 28.5 49.1 45.5 35.2 34.9 64.5 -2.4 NaN NaN \n21 18.0 ... 36.7 52.1 48.9 34.1 36.2 72.7 0.1 NaN NaN \n22 16.7 ... 31.7 48.4 49.5 35.7 32.0 67.8 0.0 NaN NaN \n23 17.1 ... 40.5 51.9 42.8 34.7 32.1 67.2 4.3 NaN NaN \n24 16.1 ... 29.7 46.9 47.5 34.8 37.4 68.5 -4.0 NaN NaN \n25 18.0 ... 33.5 49.1 50.1 34.8 31.3 70.5 -0.5 NaN NaN \n26 20.1 ... 38.6 50.4 49.2 31.6 33.4 72.4 -6.2 NaN NaN \n27 18.4 ... 27.2 51.8 50.2 32.3 30.9 67.3 -4.0 NaN NaN \n28 20.7 ... 34.4 45.0 49.0 28.4 33.7 66.6 -17.3 NaN NaN \n29 16.3 ... 41.1 45.6 54.0 35.4 34.7 70.2 -17.7 NaN NaN \n... ... ... ... ... ... ... ... ... ... ... ... \n1727 16.5 ... 41.2 46.8 43.6 33.2 35.8 66.6 -0.5 R64 11.0 \n1728 21.8 ... 32.5 49.3 46.4 36.2 30.8 64.0 -4.5 R64 15.0 \n1729 16.0 ... 30.7 45.5 45.0 38.2 36.2 61.5 -3.9 R64 15.0 \n1730 21.5 ... 25.0 53.0 49.1 38.6 34.4 63.6 -4.6 R64 14.0 \n1731 18.5 ... 42.1 49.2 44.1 34.6 34.4 64.0 0.2 R64 9.0 \n1732 18.1 ... 39.3 48.9 47.1 31.3 33.3 65.7 3.7 R64 6.0 \n1733 19.9 ... 38.4 50.9 42.3 32.7 35.1 65.0 1.0 R64 9.0 \n1734 19.8 ... 37.6 47.4 48.3 37.5 34.1 66.0 -8.6 R64 16.0 \n1735 19.5 ... 31.2 51.3 42.9 36.3 32.7 63.6 3.1 R64 6.0 \n1736 15.7 ... 33.3 48.2 44.9 34.5 32.6 67.0 2.3 R64 9.0 \n1737 20.3 ... 49.1 55.4 48.3 37.3 33.7 66.4 1.2 R64 12.0 \n1738 20.1 ... 34.8 48.3 37.7 33.8 34.7 62.8 0.9 R64 11.0 \n1739 19.7 ... 33.1 50.0 48.5 32.2 34.4 63.7 -4.5 R64 15.0 \n1740 18.2 ... 35.3 47.8 42.4 38.2 33.6 63.9 -4.6 R64 13.0 \n1741 19.0 ... 33.4 48.0 42.3 38.4 33.4 63.1 0.1 R64 13.0 \n1742 15.7 ... 37.8 46.9 48.3 34.3 34.5 67.2 3.0 R64 7.0 \n1743 17.2 ... 36.4 49.1 47.3 36.8 30.9 61.2 0.2 R64 12.0 \n1744 18.8 ... 27.0 54.4 44.0 32.6 34.9 59.2 -1.6 R64 12.0 \n1745 16.1 ... 
31.6 49.6 48.1 38.9 31.3 63.4 0.2 R68 11.0 \n1746 16.4 ... 39.2 50.2 49.6 39.0 33.1 70.8 0.8 R68 11.0 \n1747 21.1 ... 52.3 48.8 48.0 33.6 32.4 68.0 -7.6 R68 16.0 \n1748 18.3 ... 34.1 51.1 48.3 38.1 31.3 67.5 -5.1 R68 16.0 \n1749 18.2 ... 37.8 50.9 45.6 35.8 30.0 69.9 6.5 S16 4.0 \n1750 16.0 ... 34.7 47.3 43.6 35.6 33.0 64.7 1.1 S16 8.0 \n1751 17.6 ... 28.4 48.2 42.2 34.3 31.7 67.5 4.5 S16 3.0 \n1752 17.6 ... 32.3 47.4 45.4 36.8 35.6 66.8 0.0 S16 11.0 \n1753 18.2 ... 34.3 52.3 41.4 40.1 31.2 61.4 3.7 S16 5.0 \n1754 18.7 ... 55.5 45.5 51.8 31.6 36.5 68.6 4.1 S16 5.0 \n1755 15.0 ... 36.6 48.9 42.6 35.4 35.3 62.6 4.2 S16 7.0 \n1756 18.1 ... 33.3 53.7 48.9 35.1 34.6 65.5 1.3 S16 6.0 \n\n YEAR \n0 2016 \n1 2016 \n2 2016 \n3 2016 \n4 2016 \n5 2016 \n6 2016 \n7 2016 \n8 2016 \n9 2016 \n10 2016 \n11 2016 \n12 2016 \n13 2016 \n14 2016 \n15 2016 \n16 2016 \n17 2016 \n18 2016 \n19 2016 \n20 2016 \n21 2016 \n22 2016 \n23 2016 \n24 2016 \n25 2016 \n26 2016 \n27 2016 \n28 2016 \n29 2016 \n... ... \n1727 NaN \n1728 NaN \n1729 NaN \n1730 NaN \n1731 NaN \n1732 NaN \n1733 NaN \n1734 NaN \n1735 NaN \n1736 NaN \n1737 NaN \n1738 NaN \n1739 NaN \n1740 NaN \n1741 NaN \n1742 NaN \n1743 NaN \n1744 NaN \n1745 NaN \n1746 NaN \n1747 NaN \n1748 NaN \n1749 NaN \n1750 NaN \n1751 NaN \n1752 NaN \n1753 NaN \n1754 NaN \n1755 NaN \n1756 NaN \n\n[1757 rows x 24 columns]", "text/html": "\n\n
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "test_df['windex'] = np.where(test_df.WAB > 7, 'True', 'False')\ntest_df1 = test_df[test_df['POSTSEASON'].str.contains('F4|S16|E8', na=False)]\ntest_df1. head()\ntest_df1.groupby(['windex'])['POSTSEASON'].value_counts(normalize=True)\ntest_Feature = test_df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']]\ntest_Feature = pd.concat([test_Feature,pd.get_dummies(test_df1['POSTSEASON'])], axis=1)\ntest_Feature.drop(['S16'], axis = 1,inplace=True)\ntest_Feature.head()\ntest_X=test_Feature\ntest_X= preprocessing.StandardScaler().fit(test_X).transform(test_X)\ntest_X[0:5]", "execution_count": 32, "outputs": [{"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/ipykernel/__main__.py:10: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n", "name": "stderr"}, {"output_type": "execute_result", "execution_count": 32, "data": {"text/plain": "array([[ 3.37365934e-01, 2.66479976e+00, -2.46831661e+00,\n 2.13703245e-01, 9.44090550e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 7.03145068e-01, -7.13778644e-01, 1.07370841e+00,\n 4.82633172e-01, 4.77498943e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 9.31757027e-01, -8.78587347e-01, 1.23870131e+00,\n 7.85179340e-01, -9.22275877e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 3.60227129e-01, 7.14563447e-01, -8.92254236e-02,\n -3.57772849e-01, 6.89586037e-01, 1.58113883e+00,\n -4.08248290e-01],\n [-1.12575060e+00, 3.92401673e-04, -9.03545224e-01,\n -1.13094639e+00, 1.09073363e-02, -6.32455532e-01,\n 2.44948974e+00]])"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "test_y = test_df1['POSTSEASON'].values\ntest_y", "execution_count": 33, "outputs": [{"output_type": "execute_result", "execution_count": 33, "data": {"text/plain": "array(['E8', 'E8', 'E8', 'E8', 'F4', 'F4', 'S16', 'S16', 'S16', 'S16',\n 'S16', 'S16', 'S16', 'S16', 'E8', 'E8', 'E8', 'E8', 'F4', 'F4',\n 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'E8', 'E8',\n 'E8', 'E8', 'F4', 'F4', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16',\n 'S16', 'S16', 'E8', 'E8', 'E8', 'E8', 'F4', 'F4', 'S16', 'S16',\n 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'E8', 'E8', 'E8', 'E8',\n 'F4', 'F4', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16'],\n dtype=object)"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "KNN"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(finalfour_knn.predict(test_X), test_y, average='micro')\nJaccard = jaccard_index(finalfour_knn.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 34, "outputs": [{"output_type": "stream", "text": "F1-Score: 0.7714285714285715\nJaccard Index: 0.627906976744186\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Decision Tree"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_tree.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_tree.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 35, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\n", "name": "stdout"}]}, {"metadata": {}, 
"cell_type": "markdown", "source": "SVM"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_svm.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_svm.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 36, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Logistic Regression"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_LR.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_LR.predict(test_X), test_y)\nLogLoss = log_loss(test_y, finalfour_LR.predict_proba(test_X))\nprint(\"F1-Score: {}\\nJaccard Index: {}\\nLog-Loss: {}\".format(F1, Jaccard, LogLoss))", "execution_count": 37, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\nLog-Loss: 0.9742807754772282\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Report\nYou should be able to report the accuracy of the built model using different evaluation metrics:"}, {"metadata": {}, "cell_type": "markdown", "source": "| Algorithm | Jaccard | F1-score | LogLoss |\n|--------------------|---------|----------|---------|\n| KNN |.93 |.93 | NA |\n| Decision Tree | 1 | 1 | NA |\n| SVM | 1 | 1 | NA |\n| LogisticRegression | 1 | 1 | .97 |"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Want to learn more? \n\nIBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems \u2013 by your enterprise as a whole. A free trial is available through this course, available here: SPSS Modeler \n\nAlso, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio \n\nThanks for completing this lesson! \n\n\nSaeed Aghabozorgi , PhD is a Data Scientist in IBM with a track record of developing enterprise level applications that substantially increases clients\u2019 ability to turn data into actionable knowledge. He is a researcher in data mining field and expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.
, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Want to learn more?\n\nIBM SPSS Modeler is a comprehensive analytics platform with many machine learning algorithms. It is designed to bring predictive intelligence to decisions made by individuals, by groups, by systems \u2013 by your enterprise as a whole. A free trial is available through this course: SPSS Modeler.\n\nYou can also use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio.\n\nThanks for completing this lesson!\n\nSaeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients\u2019 ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods such as machine learning and statistical modelling on large datasets.\n\nCopyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.
"}], "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3.6", "language": "python"}, "language_info": {"name": "python", "version": "3.6.9", "mimetype": "text/x-python", "codemirror_mode": {"name": "ipython", "version": 3}, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py"}}, "nbformat": 4, "nbformat_minor": 4}
--------------------------------------------------------------------------------