├── README.md └── college_basketball_finalfour.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Project: College Basketball Final Four Prediction 2 | 3 | ## Project Goal 4 | Analyze performance metrics for college basketball teams and predict which teams can make it to the NCAA Final Four. 5 | 6 | ## Dataset Information 7 | The cbb.csv dataset contains performance data for 354 Division I college basketball teams across five seasons. 8 | 9 | ## Tech Stack 10 | Python
11 | Jupyter Notebook
12 | NumPy
13 | Pandas
14 | Matplotlib
15 | Scikit-learn
16 | IBM Watson Studio
17 | 18 | ## Featured ML Algorithms 19 | K-Nearest Neighbors (KNN)
20 | Decision Tree
21 | Support Vector Machines (SVM)
22 | Logistic Regression
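A minimal sketch of the workflow in the notebook (the local `cbb.csv` path, the feature columns, and the value of k are illustrative, not prescriptive):

```python
# Sketch only: load the data, keep teams that reached the Sweet Sixteen or later,
# and fit one of the classifiers listed above (KNN) on a few efficiency metrics.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("cbb.csv")  # assumes a local copy of the dataset
deep_runs = df[df["POSTSEASON"].isin(["S16", "E8", "F4"])]
X = deep_runs[["ADJOE", "ADJDE", "BARTHAG", "EFG_O", "EFG_D"]]
y = deep_runs["POSTSEASON"]
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```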
23 | 24 | ## Things to Note 25 | This Jupyter Notebook is also hosted on [IBM Watson Studio](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/967af07f-20f4-441d-9b77-1c71ad1fb54a/view?access_token=19ad526b688b2b7d02e93fa615a35124fc97a6fcb28ccf8385244b451868409e) 26 | 27 | -------------------------------------------------------------------------------- /college_basketball_finalfour.ipynb: -------------------------------------------------------------------------------- 1 | {"cells": [{"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "

College Basketball Final Four Prediction

"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "import itertools\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import NullFormatter\nimport pandas as pd\nimport numpy as np\nimport matplotlib.ticker as ticker\nfrom sklearn import preprocessing\n%matplotlib inline", "execution_count": 1, "outputs": []}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### About dataset"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "This dataset is about the performance of basketball teams. The __cbb.csv__ data set includes performance data about five seasons of 354 basketball teams. It includes following fields:\n\n| Field | Description |\n|----------------|---------------------------------------------------------------------------------------|\n|TEAM |\tThe Division I college basketball school|\n|CONF|\tThe Athletic Conference in which the school participates in (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = South Eastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)|\n|G|\tNumber of games played|\n|W|\tNumber of games won|\n|ADJOE|\tAdjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)|\n|ADJDE|\tAdjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)|\n|BARTHAG|\tPower Rating (Chance of beating an average Division I team)|\n|EFG_O|\tEffective Field Goal Percentage Shot|\n|EFG_D|\tEffective Field Goal Percentage Allowed|\n|TOR|\tTurnover Percentage Allowed (Turnover Rate)|\n|TORD|\tTurnover Percentage Committed (Steal Rate)|\n|ORB|\tOffensive Rebound Percentage|\n|DRB|\tDefensive Rebound Percentage|\n|FTR|\tFree Throw Rate (How often the given team shoots Free Throws)|\n|FTRD|\tFree Throw Rate Allowed|\n|2P_O|\tTwo-Point Shooting Percentage|\n|2P_D|\tTwo-Point Shooting Percentage Allowed|\n|3P_O|\tThree-Point Shooting Percentage|\n|3P_D|\tThree-Point Shooting Percentage Allowed|\n|ADJ_T|\tAdjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)|\n|WAB|\tWins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)|\n|POSTSEASON|\tRound where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given 
year)|\n|SEED|\tSeed in the NCAA March Madness Tournament|\n|YEAR|\tSeason"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Load Data From CSV File "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Let's load the dataset [NB Need to provide link to csv file]"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0120ENv3/Dataset/ML0101EN_EDX_skill_up/cbb.csv')\ndf.head()", "execution_count": 2, "outputs": [{"output_type": "execute_result", "execution_count": 2, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 16.3 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 \n\n ... FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR \n0 ... 30.4 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 2016 \n1 ... 30.0 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 2016 \n2 ... 26.0 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 \n3 ... 33.4 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 \n4 ... 37.3 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 \n\n[5 rows x 24 columns]", "text/html": "
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "df.shape", "execution_count": 3, "outputs": [{"output_type": "execute_result", "execution_count": 3, "data": {"text/plain": "(1406, 24)"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "## Add Column\nNext we'll add a column that will contain \"true\" if the wins above bubble are over 7 and \"false\" if not. We'll call this column Win Index or \"windex\" for short. "}, {"metadata": {}, "cell_type": "code", "source": "df['windex'] = np.where(df.WAB > 7, 'True', 'False')", "execution_count": 4, "outputs": []}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Data visualization and pre-processing\n\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Next we'll filter the data set to the teams that made the Sweet Sixteen, the Elite Eight, and the Final Four in the post season. We'll also create a new dataframe that will hold the values with the new column."}, {"metadata": {}, "cell_type": "code", "source": "df1 = df[df['POSTSEASON'].str.contains('F4|S16|E8', na=False)]\ndf1.head()", "execution_count": 5, "outputs": [{"output_type": "execute_result", "execution_count": 5, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR ... \\\n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 ... \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 ... \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 ... \n5 Oregon P12 37 30 118.4 96.2 0.9163 52.3 48.9 16.1 ... \n6 Syracuse ACC 37 23 111.9 93.6 0.8857 50.0 47.3 18.1 ... \n\n 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR windex \n2 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 False \n3 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 True \n4 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 True \n5 52.6 46.1 34.4 36.2 69.0 6.7 E8 1.0 2016 False \n6 47.2 48.1 36.0 30.7 65.5 -0.3 F4 10.0 2016 False \n\n[5 rows x 25 columns]", "text/html": "
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1['POSTSEASON'].value_counts()", "execution_count": 6, "outputs": [{"output_type": "execute_result", "execution_count": 6, "data": {"text/plain": "S16 32\nE8 16\nF4 8\nName: POSTSEASON, dtype: int64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "40 teams made it into the Sweet Sixteen, 20 into the Elite Eight, and 10 made it into the Final Four over 5 seasons. \n"}, {"metadata": {}, "cell_type": "markdown", "source": "Lets plot some columns to underestand data better:"}, {"metadata": {}, "cell_type": "code", "source": "# notice: installing seaborn might takes a few minutes\n#!conda install -c anaconda seaborn -y", "execution_count": 7, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "import seaborn as sns\n\nbins = np.linspace(df1.BARTHAG.min(), df1.BARTHAG.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=6)\ng.map(plt.hist, 'BARTHAG', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()", "execution_count": 8, "outputs": [{"output_type": "display_data", "data": {"text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbcAAADQCAYAAACEES+9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFidJREFUeJzt3X+cVXWdx/HXe4YfQ4wUASXMOIAZBSiNOi2bmdKPNWQtJS20H490a9nVtMJaNx+16ZY+1tLH1pZai2RoZrZZ6qqsP9ZCqZQEBNNINDWZBh4CK5gGAvLZP86BbuOMc2fm3Lkz33k/H4/z4Nxzv/ecz3dmPnzuOffc71cRgZmZWUpqqh2AmZlZ0VzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+LWByQtkfSqbrSfJOmhSsbUyXEXS3pC0up8+WQX7ZdKaumr+MzaGwi5JemyPJ9+I2l7SX6d1JdxDDZDqh3AYBARc6odQzf8U0RcX+0gzMoxEHIrIj4BWWEFbomI5o7aSRoSEbv7MLSk+cytlySds/cMR9LXJP00X3+npGvy9Scljc3fNa6VdIWkhyXdIWlE3uZwSWsk3Qt8omT/tZIulnS/pAcl/UO+fa6k/1VmvKR1kvavUB+/JWlFHvO/dvB8bX7W95CkX0takG9/naTbJK2UtEzSGysRn6VpkOTWzyVdKOke4ExJ10g6oeT550rWPyfpV3msX6xEPClxceu9e4C35estQL2kocCRwLIO2r8euCwipgNbgRPz7d8FPhkRb2nX/mPAtoh4M/Bm4O8lTY6IG4CNZMl6BXBeRGwsfaGk/UougbRfpnXSn4tL2hySb/t8RLQAM4CjJc1o95pmoCEiDo6IQ/K+ACwEzoqIw4HPApd3ckyzjqSWW50ZFRFHRcTXO2sgaQ7QBMwky7cjJB3RzeMMKr4s2XsrgcMl7Qe8AKwiS8S3AR19ZvVERKwuee0kSa8EXhURd+fbvwccm68fA8wouT7/SrIkfgI4C3gIuC8iftD+QBHxR7JE6I6OLkt+QNJ8sr+X8cA04MGS5x8HDpT0TeBW4A5J9cARwI8k7W03vJux2OCWWm515roy2hxDFvcD+eN6YArwy4JiSI6LWy9FxC5JTwKnkf2hPQi8HXgdsLaDl7xQsv4iMAIQ0NkgnyI7+7m9g+cagD3AayXVRMSev3hh9p9CR+9wAT4YEb/p5LnSfUwmO+t6c0Q8I2kxUFfaJt/+JuDdZO92PwB8Gtja2ecLZl1JPbdKPF+yvpv8ipqkWv78f7SACyLiO93Y76Dmy5LFuIesANxD9gf/j8DqKHNU6ojYCmyTdGS+6UMlT98OnJ5fjkHSFEkjJQ0hu9zyQbJEP7uD/f4xIpo7WcpNvlFkybdN0mv587vefSSNBWoi4sfAvwCHRcSzwBOS3p+3UV4Azboj5dzqyJPA4fn6XKC2JNaPSRqZx9qY5511wmduxVgGfB64NyKel7SDzt/VdeY04EpJfyL7Q95rETAJWKXs+t4m4ATgM8CyiFgmaTVwv6RbI6Kjd7Q9FhFrJD0APEx2+fEXHTRrAL4rae+bpXPzfz8EfEvSF4ChZJdf1hQZnyUv2dzqxH8CN0n6G+AO8rPRiFiS35B1X36Z/49kxXdzH8Q0IMlT3piZWWp8WdLMzJLj4mZmZslxcTMzs+S4uJmZWXIqUtxmz54dZN8t8eJlsC+FcE558bJvKUtFitvmzb471axIzimz7vFlSTMzS46Lm5mZJaes4iZpQT6NxEOSfiCprutXmZmZVUeXw29JaiAbgXtaRGyX9F/AycDiCsdmZmZd2LVrF62trezYsaPaoRSqrq6OxsZGhg4d2qPXlzu25BBghKRdwCuAth4dzczMCtXa2sp+++3HpEmTKJleakCLCLZs2UJrayuTJ0/u0T66vCwZEX8ALgGeAjaQTe53R4+OZmZmhdqxYwdjxoxJprABSGLMmDG9OhvtsrhJGg0cD0wGJgAjJX24g3bzJa2QtGLTpk09Dsj6VkNTA5IKWxqaGqrdpWQ4p6xcKRW2vXrbp3IuS76LbIbbTfkBf0I2w/I1pY0iYiGwEKClpaXsL9pZdbWtb+M9N8wpbH83z11S2L4GO+eUWc+Vc7fkU8BfS3pFPufRO+l4FlwzM6uyiePHF3o1ZuL48V0es7a2lubm5n3LRRddBMBdd93FYYcdRnNzM0ceeSSPPfZYpbu/T5dnbhGxXNL1wCqyKdAfIH83aWZm/ctTGzfSOqGxsP01trV22WbEiBGsXr36JdtPP/10brrpJqZOncrll1/OBRdcwOLFiwuL7eWUdbdkRJwHnFfhWMzMLCGSePbZZwHYtm0bEyZM6LNjl/tVADMzsw5t376d5ubmfY/PPfdc5s2bx6JFi5gzZw4jRoxg1KhR3HfffX0Wk4ubmZn1SmeXJb/2ta+xZMkSZs6cycUXX8zZZ5/NokWL+iQmjy1pZmaF27RpE2vWrGHmzJkAzJs3j1/+8pd9dnwXNzMzK9zo0aPZtm0b69atA+DOO+9k6tSpfXZ8X5Y0M0tI0/77l3WHY3f215X2n7nNnj2biy66iCuuuIITTzyRmpoaRo8ezZVXXllYXF1xcTMzS8jvN2zo82O++OKLHW6fO3cuc+fO7eNoMr4saWZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmVlCJjQ2FTrlzYTGpi6P2X7KmyeffHLfc0899RT19fVccsklFez1S/l7bmZmCdnwh/XM/OJthe1v+Zdmd9mms7ElARYsWMCxxx5bWDzlcnEzM7OKuPHGGznwwAMZOXJknx/blyXNzKxX9g6/1dzcvG9Ekueff56vfOUrnHdedaYC9ZmbmZn1SkeXJc877zwWLFhAfX19VWJycTMzs8ItX76c66+/nnPOOYetW7dSU1NDXV0dZ555Zp8c38XNzMwKt2zZsn3r559/PvX19X1W2MDFzcwsKeMbDijrDsfu7G8gcnEzM0tIW+tTfX7M55577mWfP//88/smkBK+W9LMzJLj4mZmZslxcTMzs+S4uJmZWXJc3MzMLDkubmZmlpyyipukV0m6XtJvJa2V9JZKB2ZmZt3X0NRQ6JQ3DU0NZR33wgsvZPr06cyYMYPm5maWL1/OpZdeykEHHYQkNm/e/Bftly5dSnNzM9OnT+foo48u/OdQ7vfc/gO4LSJOkjQMeEXhkZiZWa+1rW/jPTfMKWx/N89d0mWbe++9l1tuuYVVq1YxfPhwNm/ezM6dOxk2bBjHHXccs2bN+ov2W7du5YwzzuC2226jqamJp59+urB49+qyuEkaBRwFnAoQETuBnYVHYmZmA9KGDRsYO3Ysw4cPB2Ds2LEATJgwocP21157Le973/toasomQn3Na15TeEzlXJY8ENgEfFfSA5IWSXrJ5DyS5ktaIWnFpk2bCg/UbLBxTtlAccwxx7B+/XqmTJnCGWecwd133/2y7detW8czzzzDrFmzOPzww7n66qsLj6
mc4jYEOAz4VkQcCjwPfK59o4hYGBEtEdEybty4gsM0G3ycUzZQ1NfXs3LlShYuXMi4ceOYN28eixcv7rT97t27WblyJbfeeiu33347X/7yl1m3bl2hMZXzmVsr0BoRy/PH19NBcTMzs8GrtraWWbNmMWvWLA455BCuuuoqTj311A7bNjY2MnbsWEaOHMnIkSM56qijWLNmDVOmTCksni7P3CJiI7Be0hvyTe8EflNYBGZmNqA98sgjPProo/ser169mokTJ3ba/vjjj2fZsmXs3r2bP/3pTyxfvpypU6cWGlO5d0ueBXw/v1PyceC0QqMwM7NCTDhgQll3OHZnf1157rnnOOuss9i6dStDhgzhoIMOYuHChXzjG9/gq1/9Khs3bmTGjBnMmTOHRYsWMXXqVGbPns2MGTOoqanh4x//OAcffHBhMQMoIgrdIUBLS0usWLGi8P1a8SQVfttwJf6mBjAVsRPnlHVm7dq1hZ/19Bed9K2snPIIJWZmlhwXNzMzS46Lm5nZAJfiRwG97ZOLm5nZAFZXV8eWLVuSKnARwZYtW6irq+vxPsq9W9LMzPqhxsZGWltbSW0Um7q6OhobG3v8ehc3M7MBbOjQoUyePLnaYfQ7vixpZmbJcXEzM7PkuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi5sVqmZoDZIKWxqaGqrdJTMbgPw9NyvUnl17Cp9lwMysu3zmZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNzMzCw5ZRc3SbWSHpB0SyUDMjMz663unLl9ClhbqUDMzMyKUlZxk9QI/C2wqLLhmJmZ9V65Z25fB84B9lQwFjMzs0J0WdwkHQc8HREru2g3X9IKSSs2bdpUWIBmg5VzyqznyjlzeyvwXklPAtcB75B0TftGEbEwIloiomXcuHEFh2k2+DinzHquy+IWEedGRGNETAJOBn4aER+ueGRmZmY95O+5mZlZcoZ0p3FELAWWViQSMzOzgvjMzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklx8XNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNwGmIamBiQVtphZsYrO0YamhkLjG1Y3tLDYhgyrLbSvE8ePL6yf3ZoVwKqvbX0b77lhTmH7u3nuksL2ZWb9P0d3vbC7sPhunruE1gmNhewLoLGttbB9+czNzMyS4+JmZmbJcXEzM7PkuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi5uZmSXHxc3MzJLj4mZmZslxcTMzs+S4uJmZWXJc3MzMLDldFjdJB0j6maS1kh6W9Km+CMzMzKynypnyZjfwmYhYJWk/YKWkOyPiNxWOzczMrEe6PHOLiA0RsSpf/yOwFih29jwzM7MCdWuyUkmTgEOB5R08Nx+YD9DU1FRAaOVraGqgbX1bYfurGVbDnp17Ctvf0OFD2LljV2H7G2yKmjF8RE0t2/e8WMi+AJr235/fb9hQ2P7aq2ZOWf9SVA4UrWZoTbETjA4t7jaQsoubpHrgx8CnI+LZ9s9HxEJgIUBLS0sUFmEZKjHzbX+eSXewKWqm38a21n47a3BHqplT1r/M/OJthe1r+ZdmF7avPbv29Nv/K8sqk5KGkhW270fETwo7upmZWQWUc7ekgO8AayPi3ysfkpmZWe+Uc+b2VuAjwDskrc6X4s5DzczMCtblZ24R8XOgf36aaWZm1gGPUGJmZslxcTMzs+S4uJmZWXJc3MzMLDkubmZmlhwXNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOd2aibsow+qGsuuF3dU4dFXUDK3ptzPp9ndFzvRb5Cy/lpaGpgba1rdVOwwrUFWK264Xdvfb2VsrocjZavt7X4vmn531hbb1bf47S4zfypqZWXJc3MzMLDkubmZmlhwXNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOS5uZmaWHBc3MzNLjoubmZklp6ziJmm2pEckPSbpc5UOyszMrDe6LG6SaoHLgGOBacApkqZVOjAzM7OeKufM7a+AxyLi8YjYCVwHHF/ZsMzMzHpOEfHyDaSTgNkR8fH88UeAmRFxZrt284H5+cM3AI8UH26PjQU2VzuIPjTY+gv9t8+bI2J2T17onOp3Bluf+2t/y8qpcmbiVgfbXlIRI2IhsLCM/fU5SSsioqXacfSVwdZfSLPPzqn+ZbD1eaD3t5zLkq3AASWPG4G2yoRjZmbWe+UUt/uB10uaLGkYcDLw35UNy8zMrOe6vCwZEbslnQncDtQCV0bEwxWPrFj98tJOBQ22/sLg7HM1Dcaf92Dr84Dub5c3lJiZmQ00HqHEzMyS4+JmZmbJGfDFrauhwSQ1SfqZpAckPShpTr59kqTtklbny7f7PvruK6O/EyXdlfd1qaTGkuc+KunRfPlo30beM73s74slv1/fBFUm59RLnndO/fm5gZNTETFgF7IbXH4HHAgMA9YA09q1WQicnq9PA57M1ycBD1W7DxXo74+Aj+br7wC+l6+/Gng8/3d0vj662n2qVH/zx89Vuw8DbXFOOadSyamBfuZWztBgAYzK11/JwP6OXjn9nQbcla//rOT5dwN3RsT/RcQzwJ1Aj0bO6EO96a/1jHPKOZVETg304tYArC953JpvK3U+8GFJrcAS4KyS5ybnl1bulvS2ikZajHL6uwY4MV+fC+wnaUyZr+1vetNfgDpJKyTdJ+mEyoaaDOeUcyqJnBroxa2cocFOARZHRCMwB/iepBpgA9AUEYcCZwPXShpF/1ZOfz8LHC3pAeBo4A/A7jJf29/0pr+Q/X5bgA8CX5f0uopFmg7nlHMqiZwqZ2zJ/qycocE+Rn6pICLulVQHjI2Ip4EX8u0rJf0OmAKsqHjUPddlfyOiDXgfgKR64MSI2Ja/y57V7rVLKxlsAXrc35LniIjHJS0FDiX7vME655xyTqWRU9X+0K83C1lxfhyYzJ8/HJ3ers3/AKfm61PJfpECxgG1+fYDyd6dvLrafSqgv2OBmnz9QuBL+fqrgSfIPvgena+n3N/RwPCSNo/S7oNzLz3+mTunwjnV33Oq6gEU8MuaA6wje/fw+Xzbl4D35uvTgF/kv8TVwDH59hOBh/Ptq4D3VLsvBfX3pPyPbh2waO8fY/7c3wGP5ctp1e5LJfsLHAH8Ov/9/hr4WLX7MlAW55RzKoWc8vBbZmaWnIF+Q4mZmdlLuLiZmVlyXNzMzCw5Lm5mZpYcFzczM0uOi1sVlYywvUbSKklHtHt+gaQdkl5Zsm2WpG35EEe/lXRJvv20ktG6d0r6db5+kaRTJV3abt9LJbWUPD5UUkh6d7t2r5V0raTHJa2UdK+kuZX5iZj1jnPK9nJxq67tEdEcEW8CzgX+rd3zpwD3k43vVmpZZEMcHQocJ+mtEfHdfF/NZF+qfXv++CVTWnTiFODn+b8ASBJwI3BPR
BwYEYcDJ5ONamDWHzmnDHBx609GAc/sfZCP2VYPfIGS5CgVEdvJvkTbq8Fa84Q7CTgVOCYfTgmy6S52RsS+ebki4vcR8c3eHM+sjzinBrGBPrbkQDdC0mqgDhhP9oe/1ynAD4BlwBskvSaysfv2kTQaeD1wTxnHmifpyJLHB5WsvxV4IiJ+l48XNwf4CTCdbKQJs4HCOWWAz9yqbe8llDeSDUR7df6OD7JLFddFxB6ypHh/yeveJulBYCNwS0RsLONYP9x7iSW/zFI6mO0pZPM6kf/b4btaSZfln2XcX3YPzfqWc8oAn7n1G5GNrj4WGCdpf7J3j3fmeTmMbLDTy/LmyyLiOElTgJ9LuiEiVvfkuJJqycYEfK+kz5MNgDtG0n5k4wTundeJiPhEHmN/HuXdDHBODXY+c+snJL2RbAr4LWTv8s6PiEn5MgFokDSx9DURsY7sA/N/7sWh3wWsiYgD8mNNBH4MnAD8lGxywtNL2r+iF8cy6zPOqcHNxa26Ruy91Rj4IfDRiHiR7PLJDe3a3pBvb+/bwFGSJvcwhlM6ONaPgQ9GNqr2CWQTFz4h6VfAVfQu8c0qyTllAJ4VwMzM0uMzNzMzS46Lm5mZJcfFzczMkuPiZmZmyXFxMzOz5Li4mZlZclzczMwsOf8Pgf5cATIG3JgAAAAASUVORK5CYII=\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "bins = np.linspace(df1.ADJOE.min(), df1.ADJOE.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'ADJOE', bins=bins, ec=\"k\")\n\ng.axes[-1].legend()\nplt.show()", "execution_count": 9, "outputs": [{"output_type": "display_data", "data": {"text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFEZJREFUeJzt3X+UVOV9x/HPZ5eV5bBaPZIf7C4rGEOCKF1lE2JiDE1Ts1KNQZOgMTk11ZJqlERTTKxtND16Dv44NSdRkyIxamPiaU3UaIg/akSxKEYQLEpEoxbWhQpEUBQV5Ns/5kJGnGVn8e7OszPv1zn3MHPnmXu/l9lnP3OfvfOMI0IAAKSmrtIFAABQCgEFAEgSAQUASBIBBQBIEgEFAEgSAQUASBIB1Q9sz7W9dx/aj7a9rD9r6mG/19p+1vaSbJnRS/t5tjsGqj7UpsHQf2xfmfWZJ2xvLupDnxvIOqrdkEoXUI0iYkqla+iDmRFxU6WLALYbDP0nIr4mFcJR0u0R0V6qne0hEbF1AEurKpxB9ZHtc7afadi+3PZvs9t/afun2e3nbI/I3tktt3217cdt32V7WNZmou2lth+U9LWi7dfbvtT272w/Zvur2fqptv/LBSNtr7D93n46xh/afiSr+bslHq/Pzr6W2f4f22dl699n+w7bi2zPt/3B/qgPg1eN9J8HbF9k+35JZ9j+qe3PFj2+qej2t20/nNX6nf6oZzAjoPrufkkfz253SGqy3SDpcEnzS7R/v6QrI2K8pA2Sjs/W/0TSjIg4bKf2p0jaGBEfkvQhSX9ne0xE3CxpjQqd8WpJ50fEmuIn2t6zaKhh5+XAHo7n0qI2B2frzouIDkkTJH3C9oSdntMuqSUiDoqIg7NjkaTZks6MiImS/kHSVT3sE7Wr2vpPT/aKiCMi4ns9NbA9RVKbpEkq9KmP2v5oH/dT1Rji67tFkiba3lPS65IWq9DRPi6p1N9wno2IJUXPHW37zyTtHRH3Zev/XdJR2e0jJU0oGsv+MxU66bOSzpS0TNJDEfHznXcUES+r8IPeF6WG+L5ge7oKPx8jJR0o6bGix5+RtL/tH0j6taS7bDdJ+qik/7S9vd3QPtaC6ldt/acnN5bR5kgV6n40u98kaaykBTnVMOgRUH0UEVtsPyfpKyr8ID0m6S8kvU/S8hJPeb3o9puShkmypJ4mQbQKZyF3lnisRdI2Se+xXRcR297yxEKnL/UuVJK+GBFP9PBY8TbGqHD286GIeNH2tZIai9tk6/9c0qdVeEf6BUnfkLShp7F4QKr+/lPklaLbW5WNVtmu159+71rShRHx4z5st6YwxLd77lfhl/j9KvxA/72kJVHmzLsRsUHSRtuHZ6tOKnr4TkmnZcMesj3W9nDbQ1QY1viiCh357BLbfTki2ntYyu1ce6nQuTbafo/+9M50B9sjJNVFxC8k/bOkQyPiJUnP2v581sZZiAE7q+b+U8pzkiZmt6dKqi+q9RTbw7NaW7O+hQxnULtnvqTzJD0YEa/Yfk09v/PqyVckXWP7VRV+ULebI2m0pMUujJWtlfRZSd+UND8i5tteIul3tn8dEaXede62iFhq+1FJj6swlPffJZq1SPqJ7e1vcM7N/j1J0g9t/5OkBhWGOZbmWR+qQtX2nx78m6Rbbf+VpLuUnRVGxNzsQqKHsmHxl1UI0HUDUNOgYL5uAwCQIob4AABJIqAAAEkioAAASSKgAABJ6peA6uzsDBU+p8DCUu1LLugzLDW2lKVfAmrdOq6SBPqCPgO8HUN8AIAkEVAAgCSVHVDZNPaP2r69PwsCAEDq21RHX1dhDqu9+qkWAKhJW7ZsUVdXl1577bVKl5KrxsZGtba2qqGhYbeeX1ZA2W6V9NeSLlKJSRYBALuvq6tLe+65p0aPHq2ir6sZ1CJC69evV1dXl8aMGbNb2yh3iO97ks5RYap6AECOXnvtNe27775VE06SZFv77rvvOzor7DWgbB8t6YWIWNRLu+kufE34I2vXrt3tgtC/WtpaZDuXpWFYQ5LbamlrqfR/c1noMyhWTeG03Ts9pnKG+D4m6TMufD1xo6S9bP80Ir5U3CgiZqvwld/q6Ogo+4NYGFjdq7p1zM1TctnWbVPnJrutwYA+A+xar2dQEXFuRLRGxGhJJ0j67c7hBADIz34jR+Y2omBb+40c2es+6+vr1d7evmOZNWuWJOmee+7RoYceqvb2dh1++OF6+umn+/vwd+ALCwEgMSvXrFFXc2tu22vt7uq1zbBhw7RkyZK3rT/ttNN06623aty4cbrqqqt04YUX6tprr82ttl3pU0BFxDxJ8/qlEgBAcmzrpZdekiRt3LhRzc3NA7ZvzqAAANq8ebPa29t33D/33HM1bdo0zZkzR1OmTNGwYcO011576aGHHhqwmggoAECPQ3yXX3655s6dq0mTJunSSy/V2WefrTlz5gxITczFBwAoae3atVq6dKkmTZokSZo2bZoWLFgwYPsnoAAAJe2zzz7auHGjVqxYIUm6++67NW7cuAHbP0N8AJCYtve+t6wr7/qyvd7s/Deozs5OzZo1S1dffbWOP/541dXVaZ999tE111yTW129IaAAIDH/u3r1gO/zzTffLLl+6tSpmjp16gBXU8AQHwAgSQQUACBJBBQAIEkEFAAgSQQUACBJBBQAIEkEFAAkprm1Ldev22hubet1nzt/3cZzzz2347GVK1eqqalJl112WT8e9dvxOSgASMzq51dp0nfuyG17C/+ls9c2Pc3FJ0lnnXWWjjrqqNzqKRcBBQDo0S233KL9999fw4cPH/B9M8QHANgx1VF7e/uOmSNeeeUVXXzxxTr//PMrUhNnUACAkkN8559/vs466yw1NTVVpCYCCgBQ0sKFC3XTTTfpnHPO0YYNG1RXV6fGxkadccYZA7J/AgoAUNL8+fN33L7gggvU1NQ0YOEkEVAAkJyRLaPKuvKuL9sbjAgoAEhMd9fKAd/npk2bdvn4BRdcMDCFFOEqPgBAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJJ6DSjbjbYftr3U9uO2vzsQhQFArWppa8n16zZa2lrK2u9FF12k8ePHa8KECWpvb9fChQt1xRVX6IADDpBtrVu37i3t582bp/b2do0fP16f+MQncv9/KOdzUK9L+mREbLLdIOkB27+JiIdyrwYAoO5V3Trm5im5be+2qXN7bfPggw/q9ttv1+LFizV06FCtW7dOb7zxhvbYYw8dffTRmjx58lvab9iwQaeffrruuOMOtbW16YUXXsit3u16DaiICEnbP8HVkC2ReyUAgIpZvXq1RowYoaFDh0qSRowYIUlqbm4u2f5nP/uZjjvuOLW1Fb4M8d3vfnfuNZX1Nyjb9baXSHpB0t0RsbBEm+m2H7H9yNq1a/Ous6blebpfK/L6/9pv5Mj+rJE+M0jk2QfLHW4baEceeaRWrVqlsWPH6vTTT9d99923y/YrVqzQiy++qMmTJ2vixIm6/vrrc6+prKmOIu
JNSe2295Z0s+2DImLZTm1mS5otSR0dHZxh5SjP0/1yTvWrQVdzay7bae3uymU7pdBnBo9a6INNTU1atGiR5s+fr3vvvVfTpk3TrFmzdPLJJ5dsv3XrVi1atEj33HOPNm/erMMOO0wf+chHNHbs2Nxq6tNcfBGxwfY8SZ2SlvXSHAAwiNTX12vy5MmaPHmyDj74YF133XU9BlRra6tGjBih4cOHa/jw4TriiCO0dOnSXAOqnKv43pWdOcn2MEmfkvT73CoAAFTck08+qaeeemrH/SVLlmi//fbrsf2xxx6r+fPna+vWrXr11Ve1cOFCjRs3LteayjmDGinpOtv1KgTaf0TE7blWAQDYoXlUc65Dgc2jSl/oUGzTpk0688wztWHDBg0ZMkQHHHCAZs+ere9///u65JJLtGbNGk2YMEFTpkzRnDlzNG7cOHV2dmrChAmqq6vTqaeeqoMOOii3mqXyruJ7TNIhue4VANCj51c+P+D7nDhxohYsWPC29TNmzNCMGTNKPmfmzJmaOXNmv9XETBIAgCQRUACAJBFQAJCAwpwI1eWdHhMBBQAV1tjYqPXr11dVSEWE1q9fr8bGxt3eRp8+BwUAyF9ra6u6urpUbTOKNDY2qrV19z80T0ABQIU1NDRozJgxlS4jOQzxAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJJEQAEAkkRAAQCSREABAJLUa0DZHmX7XtvLbT9u++sDURgAoLYNKaPNVknfjIjFtveUtMj23RHxRD/XBgCoYb2eQUXE6ohYnN1+WdJySS39XRgAoLb16W9QtkdLOkTSwv4oBgCA7coZ4pMk2W6S9AtJ34iIl0o8Pl3SdElqa2vLrcDetLS1qHtVdy7bah7VrOdXPp/LtvZobNCW17fmsi30TV1DnVq7u3LbVn+pVJ9B5dnOZTvD6uq1edubuWyrYeiQ3H5n5fW7tKyAst2gQjjdEBG/LNUmImZLmi1JHR0d8Y4rK1P3qm4dc/OUXLZ129S5uWxHkra8vjXJumrBti3bBsX/faX6DCqvq7k1l+20dnfluq3U+k05V/FZ0o8lLY+If81lrwAA9KKc8YuPSfqypE/aXpIt+cQsAAA96HWILyIekJTPgCkAAGViJgkAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSeg0o29fYfsH2soEoCAAAqbwzqGsldfZzHQAAvEWvARUR90v64wDUAgDADkPy2pDt6ZKmS1JbW9su27a0tah7VXdeu85NXUOdbFe6DNSIvvSZlOXZn5tHNev5lc/nsq2Uf8+0dnclt60U5RZQETFb0mxJ6ujoiF217V7VrWNunpLLfm+bOjeX7UjSti3bkqwL1akvfSZlqfbnVOvK+/dMiseYF67iAwAkiYACACSpnMvMfy7pQUkfsN1l+5T+LwsAUOt6/RtURJw4EIUAAFCMIT4AQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSCCgAQJIIKABAkggoAECSygoo2522n7T9tO1v93dRAAD0GlC26yVdKekoSQdKOtH2gf1dGACgtpVzBvVhSU9HxDMR8YakGyUd279lAQBqnSNi1w3sz0nqjIhTs/tfljQpIs7Yqd10SdOzux+Q9GT+5Q6IEZLWVbqId6gajkEaHMexLiI6d+eJVdRnpMHxWvWGYxg4ZfWbIWVsyCXWvS3VImK2pNllbC9pth+JiI5K1/FOVMMxSNVzHD2plj4jVcdrxTGkp5whvi5Jo4rut0rq7p9yAAAoKCegfifp/bbH2N5D0gmSftW/ZQEAal2vQ3wRsdX2GZLulFQv6ZqIeLzfK6ucahhyqYZjkKrnOGpBNbxWHENier1IAgCASmAmCQBAkggoAECSai6gbF9j+wXby4rWfd7247a32e7Yqf252RRPT9r+9MBX/HZ9OQbbo21vtr0kW35UmarfqodjuNT2720/Zvtm23sXPZbc61BL6Df0m4qIiJpaJB0h6VBJy4rWjVPhg5LzJHUUrT9Q0lJJQyWNkfQHSfWD7BhGF7dLZenhGI6UNCS7fbGki1N+HWppod+ksdRav6m5M6iIuF/SH3datzwiSn2K/1hJN0bE6xHxrKSnVZj6qaL6eAxJ6uEY7oqIrdndh1T4zJ2U6OtQS+g3aai1flNzAdVHLZJWFd3vytYNNmNsP2r7Ptsfr3QxZfpbSb/JblfL61ArquX1ot9UWDlTHdWysqZ5StxqSW0Rsd72REm32B4fES9VurCe2D5P0lZJN2xfVaLZYHsdakk1vF70mwRwBrVrg36ap+z0fn12e5EK49BjK1tVz2z/jaSjJZ0U2UC6quB1qDGD/vWi36SBgNq1X0k6wfZQ22MkvV/SwxWuqU9svyv7Ti/Z3l+FY3imslWVZrtT0rckfSYiXi16aNC/DjVm0L9e9JtEVPoqjYFeJP1chdP3LSq8wzhF0tTs9uuS/k/SnUXtz1Ph3dOTko6qdP19PQZJx0t6XIWreRZLOqbS9e/iGJ5WYcx8Sbb8KOXXoZYW+g39phILUx0BAJLEEB8AIEkEFAAgSQQUACBJBBQAIEkEFAAgSQRUgmxPtR22P5jd3z6z8qO2l9t+OPtg3vb2J9u+ouj+9Gx2499nbQ8vemxeNrPx9lmabxrYowPyR5+pTkx1lKYTJT0g6QRJF2Tr/hARh0g7Pjj4S9t1EfGT4ifaPlrSVyUdHhHrbB+qwjQtH46INVmzkyLikYE4EGCA0GeqEGdQibHdJOljKnwA74RSbSLiGUlnS5pR4uFvSZoZEeuytoslXSfpa/1SMFBh9JnqRUCl57OS7oiIFZL+mL2bK2WxpA+WWD9e0qKd1j2Srd/uhqLhikvfccVAZdFnqhRDfOk5UdL3sts3ZvevLNGu1EzFPbHeOosxwxWoJvSZKkVAJcT2vpI+Kekg2yGpXoVOclWJ5odIWl5i/ROSJkr6bdG6Q7P1QFWhz1Q3hvjS8jlJ10fEfhExOiJGSXpWf/qGTEmFK5QkXSbpByW2cYmki7OOK9vtkk5W6Q4LDHb0mSrGGVRaTpQ0a6d1v5D0j5LeZ/tRSY2SXpb0g6KrkYaoMBuzIuJXtlskLcjeUb4s6UsRsbpomzfY3pzdXhcRn+qfwwH6HX2mijGbeRWwfbmkpyKCd3xAGegzgwMBNcjZ/o2kPSQdFxEbK10PkDr6zOBBQAEAksRFEgCAJBFQAIAkEVAAg
CQRUACAJBFQAIAk/T9mzBRRZdMnhAAAAABJRU5ErkJggg==\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Pre-processing: Feature selection/extraction"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Lets look at how Adjusted Defense Efficiency plots"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "bins = np.linspace(df1.ADJDE.min(), df1.ADJDE.max(), 10)\ng = sns.FacetGrid(df1, col=\"windex\", hue=\"POSTSEASON\", palette=\"Set1\", col_wrap=2)\ng.map(plt.hist, 'ADJDE', bins=bins, ec=\"k\")\ng.axes[-1].legend()\nplt.show()\n", "execution_count": 10, "outputs": [{"output_type": "display_data", "data": {"text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFSxJREFUeJzt3X2UXHV9x/H3Z5OQDVkQTECzuywJTxoe4gJr4wNCCpaGFAoL1IAPFY9KK4IarFRqK1jlHBRaLPJwGiIERLGnaEQg5aFQIFaIJrBBIBIQaLJsIkkk4RkS8u0fcxOHZSY7G+7s/Gb28zpnzs7c+c3vfu9kfvnM/c3MvYoIzMzMUtNU6wLMzMxKcUCZmVmSHFBmZpYkB5SZmSXJAWVmZklyQJmZWZIcUFUgab6knQbRfqKkh6pZU5n1zpX0pKSe7PKFAdrfJalrqOqz4akexo+kS7Mx84ikl4vG0IlDWUejG1nrAhpRRMyodQ2D8JWIuL7WRZhtVg/jJyI+D4VwBG6KiM5S7SSNjIiNQ1haQ/Ee1CBJOmvznoakiyTdmV0/QtK12fWnJI3P3tktlXSFpIcl3SZpTNbmYElLJN0LfL6o/xGSLpD0a0kPSvqbbHm3pP9WwQRJyyS9s0rbeLmkRVnN3yhx/4hs7+shSb+RNCtbvqekWyQtlrRA0rurUZ/Vr2Eyfn4h6TxJ9wCnS7pW0nFF979QdP2rkn6V1fr1atRTzxxQg3cP8KHsehfQImkUcAiwoET7vYFLI2I/YB1wQrb8KuALEfH+fu0/DayPiPcC7wU+K2lSRMwDVlEYjFcA50TEquIHStqhaKqh/2XfMttzQVGbA7JlX4uILmAKcJikKf0e0wm0RcT+EXFAti0As4EzIuJg4O+Ay8qs04avRhs/5ewYEYdGxHfLNZA0A+gAplIYUx+Q9IFBrqeheYpv8BYDB0vaAXgVuJ/CQPsQUOoznCcjoqfosRMlvQ3YKSLuzpb/ADgqu34kMKVoLvttFAbpk8AZwEPAfRFxXf8VRcTzFF7og1Fqiu8jkk6l8PqYAOwLPFh0/xPAHpK+B9wM3CapBfgA8J+SNrcbPcharPE12vgp58cVtDmSQt0PZLdbgH2AX+ZUQ91zQA1SRGyQ9BTwKQovpAeBPwX2BJaWeMirRddfB8YAAsodBFEU9kJuLXFfG7AJeIekpojY9IYHFgZ9qXehAB+NiEfK3FfcxyQKez/vjYhnJc0FmovbZMvfA/w5hXekHwG+BKwrNxdvBo0/foq8WHR9I9lslaQR/PH/XQHfiojvD6LfYcVTfNvmHgr/id9D4QX9t0BPVHjk3YhYB6yXdEi26GNFd98KfC6b9kDSPpLGShpJYVrjoxQG8pkl+n0+IjrLXCodXDtSGFzrJb2DP74z3ULSeKApIn4C/BNwUEQ8Bzwp6a+yNspCzKy/Rh4/pTwFHJxd7wZGFNX6aUljs1rbs7FlGe9BbZsFwNeAeyPiRUmvUP6dVzmfAq6U9BKFF+pmc4CJwP0qzJWtBo4DvgwsiIgFknqAX0u6OSJKvevcZhGxRNIDwMMUpvL+t0SzNuAqSZvf4Jyd/f0YcLmkfwRGUZjmWJJnfdYQGnb8lPHvwA2S/gy4jWyvMCLmZ18kui+bFn+eQoCuGYKa6oJ8ug0zM0uRp/jMzCxJDigzM0uSA8rMzJLkgDIzsyRVJaCmT58eFH6n4IsvjX7JhceML8PsUpGqBNSaNf6WpNlgeMyYvZmn+MzMLEkOKDMzS5IDyszMkuRDHZmZ1diGDRvo7e3llVdeqXUpuWpubqa9vZ1Ro0Zt0+MdUGZmNdbb28sOO+zAxIkTKTpdTV2LCNauXUtvby+TJk3apj48xWdmVmOvvPIK48aNa5hwApDEuHHj3tJeoQOqSto62pCUy6Wto63Wm2NmVdZI4bTZW90mT/FVSd+KPo6ZNyOXvm7snp9LP2Zm9cR7UGZmidl9woTcZmAksfuECQOuc8SIEXR2dm65nH/++QDccccdHHTQQXR2dnLIIYfw+OOPV3vzt/AelJlZYpavWkVva3tu/bX39Q7YZsyYMfT09Lxp+ec+9zluuOEGJk+ezGWXXca3vvUt5s6dm1ttW+M9KDMzK0sSzz33HADr16+ntbV1yNbtPSgzM+Pll1+ms7Nzy+2zzz6bmTNnMmfOHGbMmMGYMWPYcccdue+++4asJgeUmZmVneK76KKLmD9/PlOnTuWCCy7gzDPPZM6cOUNSk6f4zMyspNWrV7NkyRKmTp0KwMyZM/nlL385ZOt3QJmZWUk777wz69evZ9myZQDcfvvtTJ48ecjW7yk+M7PEdLzznRV9824w/Q2k/2dQ06dP5/zzz+eKK67ghBNOoKmpiZ133pkrr7wyt7oG4oAyM0vM/61cOeTrfP3110su7+7upru7e4irKfAUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJqiigJO0k6XpJv5W0VNL7q12Ymdlw1drekevpNlrbOwZcZ//TbTz11FNb7lu+fDktLS1ceOGFVdzqN6v0d1D/BtwSESdK2g7Yvoo1mZkNayufXsHUr9+SW38L/3n6gG3KHYsPYNasWRx11FG51VOpAQNK0o7AocApABHxGvBadcsyM7MU/OxnP2OPPfZg7NixQ77uSqb49gBWA1dJekDSHElvqlTSqZIWSVq0evXq3As1azQeM5aSzYc66uzs3HLkiBdffJFvf/vbnHPOOTWpqZKAGgkcBFweEQcCLwJf7d8oImZHRFdEdO2yyy45l2nWeDxmLCWbp/h6enqYN28eAOeccw6zZs2ipaWlJjVV8hlUL9AbEQuz29dTIqDMzKyxLFy4kOuvv56zzjqLdevW0dTURHNzM6effvqQrH/AgIqIVZJWSHpXRDwKHAE8Uv3SzMyslhYsWLDl+rnnnktLS8uQhRNU/i2+M4AfZt/gewL4VPVKMjMb3ia07VbRN+8G0189qiigIqIH6KpyLWZmBvT1Lh/ydb7wwgtbvf/cc88dmkKK+EgSZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmlpi2jrZcT7fR1tFW0XrPO+889ttvP6ZMmUJnZycLFy7kkksuYa+99kISa9aseUP7u+66i87OTvbbbz8OO+yw3J+HSn+oa2ZmQ6RvRR/HzJuRW383ds8fsM29997LTTfdxP3338/o0aNZs2YNr732Gttttx1HH30006ZNe0P7devWcdppp3HLLbfQ0dHBM888k1u9mzmgzMyMlStXMn78eEaPHg3A+PHjAWhtbS3Z/kc/+hHHH388HR2FkyHuuuuuudfkKT4zM+PII49kxYoV7LPPPpx22mncfffdW22/bNkynn32WaZNm8bBBx/MNddck3tNDigzM6OlpYXFixcze/ZsdtllF2bOnMncuXPLtt+4cSOLFy/m5ptv5tZbb+Wb3/wmy5Yty7UmT/GZmRkAI0aMYNq0aUybNo0DDjiAq6++mlNOOa
Vk2/b2dsaPH8/YsWMZO3Yshx56KEuWLGGfffbJrR7vQZmZGY8++iiPPfbYlts9PT3svvvuZdsfe+yxLFiwgI0bN/LSSy+xcOFCJk+enGtN3oMyM0tM626tFX3zbjD9DeSFF17gjDPOYN26dYwcOZK99tqL2bNnc/HFF/Od73yHVatWMWXKFGbMmMGcOXOYPHky06dPZ8qUKTQ1NfGZz3yG/fffP7eaARQRuXYI0NXVFYsWLcq933oiKbevid7YPZ+8/p3aOtroW9GXS1+tu7Xy9PKnc+mrjimPTjxmhrelS5fmvveRijLbVtG48R7UMJPn7yvyfIdnZtafP4MyM7MkOaDMzBJQjY9bau2tbpMDysysxpqbm1m7dm1DhVREsHbtWpqbm7e5D38GZWZWY+3t7fT29rJ69epal5Kr5uZm2tvbt/nxDigzsxobNWoUkyZNqnUZyfEUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmZkmqOKAkjZD0gKSbqlmQmZkZDG4P6ovA0moVYmZmVqyigJLUDvwFMKe65ZiZmRVUugf1XeAsYFO5BpJOlbRI0qJGO2R8rTWNakJSLpdU62rraMu1tnrQKGNm9wkTcnsd7D5hQq03xxIy4Ok2JB0NPBMRiyVNK9cuImYDswG6uroa56xbCdi0YRPHzJuRS183ds/PpR9It6560ShjZvmqVfS2bvs5f4q19/Xm0o81hkr2oD4I/KWkp4AfA4dLuraqVZmZ2bA3YEBFxNkR0R4RE4GTgDsj4uNVr8zMzIY1/w7KzMySNKhTvkfEXcBdVanEzMysiPegzMwsSQ4oMzNLkgPKzMyS5IAyM7MkOaDMzCxJDigzM0uSA8rMzJLkgDIzsyQ5oMzMLEkOKDMzS5IDyszMkuSAMjOzJDmgzMwsSQ4oMzNLkgPKzMySNKjzQZmZ9dc0qon2vt7c+kpRW0cbfSv6cumrdbdWnl7+dC59NToHlJm9JZs2bOKYeTNy6evG7vm59JO3vhV9Db+NKUrz7YqZmQ17DigzM0uSA8rMzJLkgDIzsyQ5oMzMLEkOKDMzS5IDyszMkuSAMjOzJDmgzMwsSQ4oMzNLkgPKzMyS5IAyM7MkOaDMzCxJDigzM0uSA8rMzJI0YEBJ2k3S/0haKulhSV8cisLMzGx4q+SEhRuBL0fE/ZJ2ABZLuj0iHqlybWZmNowNuAcVESsj4v7s+vPAUqCt2oWZmdnwNqjPoCRNBA4EFpa471RJiyQtWr16dT7VmTUwj5nqautoQ1IuF6uNSqb4AJDUAvwE+FJEPNf//oiYDcwG6OrqitwqNGtQHjPV1beij2Pmzcilrxu75+fSjw1ORXtQkkZRCKcfRsRPq1uSmZlZZd/iE/B9YGlE/Gv1SzIzM6tsD+qDwCeAwyX1ZJd89pvNzMzKGPAzqIj4BeBPCc3MbEj5SBJmZpYkB5SZmSXJAWVmZklyQJmZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSWp4lO+p2rEdiPYtGFTLn01jWrKrS8bnKZRTRTOjZlDX9s1sem1fP4dR40eyWuvbMilr7eqraONvhV9ufQ1avRINry6MZe+zKql7gNq04ZNHDMvn/Mn3tg9n97W9lz6au/rzaWf4SLvf8c8+0pF34o+v9ZtWPEUn5mZJckBZWZmSXJAmZlZkhxQZmaWJAeUmZklyQFlZmZJckCZmVmSHFBmZpYkB5SZmSXJAWVmZklyQJmZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQZmaWpIoCStJ0SY9KelzSV6tdlJmZ2YABJWkEcClwFLAvcLKkfatdmJmZDW+V7EH9CfB4RDwREa8BPwaOrW5ZZmY23Ckitt5AOhGYHhGfyW5/ApgaEaf3a3cqcGp2813Ao8B4YE3eRQ+xet8G119dayJi+rY8sMyYgfS3eSD1Xj/U/zakXn9F42ZkBR2pxLI3pVpEzAZmv+GB0qKI6KpgHcmq921w/ekqNWag/re53uuH+t+Geq9/s0qm+HqB3YputwN91SnHzMysoJKA+jWwt6RJkrYDTgJ+Xt2yzMxsuBtwii8iNko6HbgVGAFcGREPV9j/m6Yv6lC9b4Prrz/1vs31Xj/U/zbUe/1ABV+SMDMzqwUfScLMzJLkgDIzsyTlGlCSZkl6WNJDkq6T1CxprqQnJfVkl84815knSV/Man9Y0peyZW+XdLukx7K/O9e6znLK1H+upKeLnv8Zta6zmKQrJT0j6aGiZSWfcxVcnB1y60FJB9Wu8nzU+5gBj5uhNpzGTG4BJakN+ALQFRH7U/hCxUnZ3V+JiM7s0pPXOvMkaX/gsxSOnPEe4GhJewNfBe6IiL2BO7LbydlK/QAXFT3/82tWZGlzgf4/2Cv3nB8F7J1dTgUuH6Iaq6Lexwx43NTIXIbJmMl7im8kMEbSSGB76uv3UpOB+yLipYjYCNwNdFM4rNPVWZurgeNqVN9AytWftIi4B/hDv8XlnvNjgWui4D5gJ0kThqbSqqnnMQMeN0NuOI2Z3AIqIp4GLgSWAyuB9RFxW3b3ednu5UWSRue1zpw9BBwqaZyk7YEZFH6g/I6IWAmQ/d21hjVuTbn6AU7Pnv8rU55qKVLuOW8DVhS1682W1aUGGDPgcZOKhhwzeU7x7UwhrScBrcBYSR8HzgbeDbwXeDvw93mtM08RsRT4NnA7cAuwBNhY06IGYSv1Xw7sCXRS+E/wX2pVYw4qOuxWvaj3MQMeN3WgrsdMnlN8HwaejIjVEbEB+CnwgYhYme1evgpcRWGuN0kR8f2IOCgiDqWwC/0Y8PvNu8TZ32dqWePWlKo/In4fEa9HxCbgChJ+/ouUe84b7bBbdT9mwOMmEQ05ZvIMqOXA+yRtL0nAEcDSoidNFOZFH9pKHzUladfsbwdwPHAdhcM6fTJr8knghtpUN7BS9febb+4m4ee/SLnn/OfAX2ffTHofhSmxlbUoMCd1P2bA4yYRjTlmIiK3C/AN4LcU/jF/AIwG7gR+ky27FmjJc505178AeITCbv4R2bJxFL4V81j29+21rnOQ9f8ge/4fpPBinVDrOvvVfB2FKZQNFN7tfbrcc05huuJS4HfZNnXVuv4ctr+ux8xWXnceN9Wrd9iMGR/qyMzMkuQjSZiZWZIcUGZmliQHlJmZJckBZWZmSXJAmZlZkhxQCZLULSkkvTu7PVHSy5IekLRU0q8kfbKo/SmSLsmuFx+F+TFJP5W0b1HbuyQ9WnSU5uuHfgvN8udx03gGPOW71cTJwC8oHNn63GzZ7yLiQABJewA/ldQUEVeVePxFEXFh1nYmcKekAyJidXb/xyJiUVW3wGzoedw0GO9BJUZSC/BBCj++O6lUm4h4AjiTwqkatioi/gO4DfhojmWaJcXjpjE5oNJzHHBLRCwD/qDyJxi7n8IBRSvRv+0Pi6YqLngLtZqlwuOmAXmKLz0nA9/Nrv84u
31piXaljlJcTv+2nqqwRuNx04AcUAmRNA44HNhfUlA4w2oAl5VofiCwtMKuDwQ8sKwhedw0Lk/xpeVECme/3D0iJkbEbsCTFA6Rv4WkiRROdPe9gTqUdAJwJIUDTJo1Io+bBuU9qLScDJzfb9lPgH8A9pT0ANAMPA98r+ibSCOBV4seMys78d1YCkfEPrzom0hQmEt/Obu+JiI+nPN2mA0lj5sG5aOZNwBJF1E4yVqpKQ0zK8HjJn0OqDon6b+A7YDjI2J9resxqwceN/XBAWVmZknylyTMzCxJDigzM0uSA8rMzJLkgDIzsyQ5oMzMLEn/D5atggtp71QbAAAAAElFTkSuQmCC\n"}, "metadata": {"needs_background": "light"}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "We see that this data point doesn't impact the ability of a team to get into the Final Four. "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Convert Categorical features to numerical values"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets look at the postseason:"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1.groupby(['windex'])['POSTSEASON'].value_counts(normalize=True)", "execution_count": 11, "outputs": [{"output_type": "execute_result", "execution_count": 11, "data": {"text/plain": "windex POSTSEASON\nFalse S16 0.605263\n E8 0.263158\n F4 0.131579\nTrue S16 0.500000\n E8 0.333333\n F4 0.166667\nName: POSTSEASON, dtype: float64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "12% of teams with 6 or less wins above bubble make it into the final four while 18% of teams with 7 or more do.\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets convert wins above bubble (winindex) under 7 to 0 and over 7 to 1:\n"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df['windex'].replace(to_replace=['False','True'], value=[0,1],inplace=True)\ndf.head()", "execution_count": 12, "outputs": [{"output_type": "execute_result", "execution_count": 12, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 16.3 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 15.3 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 15.1 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 17.8 \n\n ... 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR windex \n0 ... 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 2016 1 \n1 ... 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 2016 1 \n2 ... 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 2016 0 \n3 ... 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 2016 1 \n4 ... 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 2016 1 \n\n[5 rows x 25 columns]", "text/html": "
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## One Hot Encoding \n#### How about seed?"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1.groupby(['SEED'])['POSTSEASON'].value_counts(normalize=True)", "execution_count": 13, "outputs": [{"output_type": "execute_result", "execution_count": 13, "data": {"text/plain": "SEED POSTSEASON\n1.0 E8 0.750000\n F4 0.125000\n S16 0.125000\n2.0 S16 0.444444\n E8 0.333333\n F4 0.222222\n3.0 S16 0.700000\n E8 0.200000\n F4 0.100000\n4.0 S16 0.875000\n E8 0.125000\n5.0 S16 0.833333\n F4 0.166667\n6.0 E8 1.000000\n7.0 S16 0.800000\n F4 0.200000\n8.0 S16 1.000000\n9.0 E8 1.000000\n10.0 F4 1.000000\n11.0 S16 0.500000\n E8 0.250000\n F4 0.250000\n12.0 S16 1.000000\nName: POSTSEASON, dtype: float64"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "#### Feature before One Hot Encoding"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']].head()", "execution_count": 14, "outputs": [{"output_type": "execute_result", "execution_count": 14, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D\n2 118.3 103.3 0.8269 54.0 49.5\n3 119.9 91.0 0.9600 54.8 48.4\n4 120.9 90.4 0.9662 55.7 45.1\n5 118.4 96.2 0.9163 52.3 48.9\n6 111.9 93.6 0.8857 50.0 47.3", "text/html": "
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "#### Use one hot encoding technique to conver categorical varables to binary variables and append them to the feature Data Frame "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "Feature = df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']]\nFeature = pd.concat([Feature,pd.get_dummies(df1['POSTSEASON'])], axis=1)\nFeature.drop(['S16'], axis = 1,inplace=True)\nFeature.head()\n", "execution_count": 15, "outputs": [{"output_type": "execute_result", "execution_count": 15, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D E8 F4\n2 118.3 103.3 0.8269 54.0 49.5 1 0\n3 119.9 91.0 0.9600 54.8 48.4 1 0\n4 120.9 90.4 0.9662 55.7 45.1 1 0\n5 118.4 96.2 0.9163 52.3 48.9 1 0\n6 111.9 93.6 0.8857 50.0 47.3 0 1", "text/html": "
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Feature selection"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Lets defind feature sets, X:"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "X = Feature\nX[0:5]", "execution_count": 16, "outputs": [{"output_type": "execute_result", "execution_count": 16, "data": {"text/plain": " ADJOE ADJDE BARTHAG EFG_O EFG_D E8 F4\n2 118.3 103.3 0.8269 54.0 49.5 1 0\n3 119.9 91.0 0.9600 54.8 48.4 1 0\n4 120.9 90.4 0.9662 55.7 45.1 1 0\n5 118.4 96.2 0.9163 52.3 48.9 1 0\n6 111.9 93.6 0.8857 50.0 47.3 0 1", "text/html": "
"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "What are our lables? Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)|"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "y = df1['POSTSEASON'].values\ny[0:5]", "execution_count": 17, "outputs": [{"output_type": "execute_result", "execution_count": 17, "data": {"text/plain": "array(['E8', 'E8', 'E8', 'E8', 'F4'], dtype=object)"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Normalize Data "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Data Standardization give data zero mean and unit variance (technically should be done after train test split )"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "X= preprocessing.StandardScaler().fit(X).transform(X)\nX[0:5]", "execution_count": 18, "outputs": [{"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/ipykernel/__main__.py:1: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n if __name__ == '__main__':\n", "name": "stderr"}, {"output_type": "execute_result", "execution_count": 18, "data": {"text/plain": "array([[ 0.28034482, 2.74329908, -2.45717765, 0.10027963, 0.94171924,\n 1.58113883, -0.40824829],\n [ 0.64758014, -0.90102957, 1.127076 , 0.39390887, 0.38123706,\n 1.58113883, -0.40824829],\n [ 0.87710222, -1.0788017 , 1.29403598, 0.72424177, -1.30020946,\n 1.58113883, -0.40824829],\n [ 0.30329703, 0.63966222, -0.04972253, -0.52368251, 0.63600169,\n 1.58113883, -0.40824829],\n [-1.18859646, -0.13068368, -0.87375079, -1.36786658, -0.17924511,\n -0.63245553, 2.44948974]])"}, "metadata": {}}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "## Training and Validation "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Split the data into Training and Validation data."}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "# We split the X into train and test to find the best k\nfrom sklearn.model_selection import train_test_split\nX_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=4)\nprint ('Train set:', X_train.shape, y_train.shape)\nprint ('Validation set:', X_val.shape, y_val.shape)", "execution_count": 19, "outputs": [{"output_type": "stream", "text": "Train set: (44, 7) (44,)\nValidation set: (12, 7) (12,)\n", "name": "stdout"}]}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "# Classification "}, 
{"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "Now, it is your turn, use the training set to build an accurate model. Then use the validation set to report the accuracy of the model\nYou should use the following algorithm:\n- K Nearest Neighbor(KNN)\n- Decision Tree\n- Support Vector Machine\n- Logistic Regression\n\n"}, {"metadata": {}, "cell_type": "markdown", "source": "# K Nearest Neighbor(KNN)\n\nQuestion 1 Build a KNN model using a value of k equals three, find the accuracy on the validation data (X_val and y_val)"}, {"metadata": {}, "cell_type": "markdown", "source": "You can use accuracy_score"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.metrics import accuracy_score\nfrom sklearn.neighbors import KNeighborsClassifier\n\nk = 3\nfinalfour_knn = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n\nyhat = finalfour_knn.predict(X_val)\n\nprint(\"Train set Accuracy: \", accuracy_score(y_train, finalfour_knn.predict(X_train)))\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 20, "outputs": [{"output_type": "stream", "text": "Train set Accuracy: 0.9772727272727273\nValidation set Accuracy: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Question 2 Determine the accuracy for the first 15 values of k the on the validation data:"}, {"metadata": {}, "cell_type": "code", "source": "k_vals = []\ntrain_scores = []\nval_scores = []\n\nfor k in range(1,16):\n finalfour_knn = KNeighborsClassifier(n_neighbors = k).fit(X_train, y_train)\n yhat = finalfour_knn.predict(X_val)\n k_vals.append(k)\n train_scores.append(accuracy_score(y_train, finalfour_knn.predict(X_train)))\n val_scores.append(accuracy_score(y_val, yhat))\n\nknn_k_scores = {\"K\": k_vals, \"Training Set Accuracy\": train_scores, \"Validation Set Accuracy\": val_scores}\n\ndf_k_scores = pd.DataFrame(knn_k_scores)\nnp.array(val_scores)", "execution_count": 21, "outputs": [{"output_type": "execute_result", "execution_count": 21, "data": {"text/plain": "array([1. , 1. , 1. , 1. , 1. ,\n 1. 
, 0.91666667, 0.91666667, 0.83333333, 0.83333333,\n       0.83333333, 0.83333333, 0.83333333, 0.83333333, 0.83333333])"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Decision Tree"}, {"metadata": {}, "cell_type": "markdown", "source": "The following lines of code fit a DecisionTreeClassifier:"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.tree import DecisionTreeClassifier", "execution_count": 22, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "Question 3 Determine the minimum value for the parameter max_depth that improves the results "}, {"metadata": {}, "cell_type": "code", "source": "depths = []\nval_scores = []\n\nfor d in range(1,16):\n    finalfour_tree = DecisionTreeClassifier(max_depth = d)\n    finalfour_tree.fit(X_train, y_train)\n    finalfour_predtree = finalfour_tree.predict(X_val)\n    depths.append(d)\n    val_scores.append(accuracy_score(y_val, finalfour_predtree))\n\ndepth_accuracy = {\"Max Depth\": depths, \"Validation Set Accuracy\": val_scores}\ndf_d_scores = pd.DataFrame(depth_accuracy)\nprint(\"To achieve improved accuracy, max depth must be at least 2\")\ndf_d_scores\n# Need max_depth >= 2", "execution_count": 23, "outputs": [{"output_type": "stream", "text": "To achieve improved accuracy, max depth must be at least 2\n", "name": "stdout"}, {"output_type": "execute_result", "execution_count": 23, "data": {"text/plain": "    Max Depth  Validation Set Accuracy\n0           1                 0.833333\n1           2                 1.000000\n2           3                 1.000000\n3           4                 1.000000\n4           5                 1.000000\n5           6                 1.000000\n6           7                 1.000000\n7           8                 1.000000\n8           9                 1.000000\n9          10                 1.000000\n10         11                 1.000000\n11         12                 1.000000\n12         13                 1.000000\n13         14                 1.000000\n14         15                 1.000000", "text/html": "
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "finalfour_tree = DecisionTreeClassifier(max_depth = 2)\nfinalfour_tree.fit(X_train, y_train)\nfinalfour_predtree = finalfour_tree.predict(X_val)\nprint(\"Accuracy score\",accuracy_score(y_val, finalfour_predtree))", "execution_count": 24, "outputs": [{"output_type": "stream", "text": "Accuracy score 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Support Vector Machine"}, {"metadata": {}, "cell_type": "markdown", "source": "Question 4Train the following linear support vector machine model and determine the accuracy on the validation data "}, {"metadata": {}, "cell_type": "code", "source": "from sklearn import svm", "execution_count": 25, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "finalfour_svm = svm.SVC(kernel='linear')\nfinalfour_svm.fit(X_train, y_train) \nyhat = finalfour_svm.predict(X_val)\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 26, "outputs": [{"output_type": "stream", "text": "Validation set Accuracy: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Logistic Regression"}, {"metadata": {}, "cell_type": "markdown", "source": "Question 5 Train a logistic regression model and determine the accuracy of the validation data (set C=0.01)"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.linear_model import LogisticRegression", "execution_count": 27, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "finalfour_LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train, y_train)\nyhat = finalfour_LR.predict(X_val)\nprint(\"Validation set Accuracy: \", accuracy_score(y_val, yhat))", "execution_count": 28, "outputs": [{"output_type": "stream", "text": "Validation set Accuracy: 1.0\n", "name": "stdout"}, {"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.\n \"this warning.\", FutureWarning)\n", "name": "stderr"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Model Evaluation using Test set"}, {"metadata": {}, "cell_type": "code", "source": "from sklearn.metrics import f1_score\nfrom sklearn.metrics import log_loss", "execution_count": 29, "outputs": []}, {"metadata": {}, "cell_type": "code", "source": "def jaccard_index(predictions, true):\n if (len(predictions) == len(true)):\n intersect = 0;\n for x,y in zip(predictions, true):\n if (x == y):\n intersect += 1\n return intersect / (len(predictions) + len(true) - intersect)\n else:\n return -1", "execution_count": 30, "outputs": []}, {"metadata": {}, "cell_type": "markdown", "source": "Question 5 Calculate the F1 score and Jaccard Similarity score for each model from above. 
Use the Hyperparameter that performed best on the validation data."}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "### Load Test set for evaluation "}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "code", "source": "test_df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0120ENv3/Dataset/ML0101EN_EDX_skill_up/basketball_train.csv',error_bad_lines=False)\ntest_df", "execution_count": 31, "outputs": [{"output_type": "execute_result", "execution_count": 31, "data": {"text/plain": " TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D \\\n0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 \n1 Villanova BE 40 35 123.1 90.9 0.9703 56.1 46.7 \n2 Notre Dame ACC 36 24 118.3 103.3 0.8269 54.0 49.5 \n3 Virginia ACC 37 29 119.9 91.0 0.9600 54.8 48.4 \n4 Kansas B12 37 32 120.9 90.4 0.9662 55.7 45.1 \n5 Oregon P12 37 30 118.4 96.2 0.9163 52.3 48.9 \n6 Syracuse ACC 37 23 111.9 93.6 0.8857 50.0 47.3 \n7 Oklahoma B12 37 29 118.2 94.1 0.9326 54.3 47.2 \n8 Davidson A10 32 19 113.0 106.0 0.6767 52.0 52.0 \n9 Duquesne A10 33 16 108.2 105.1 0.5851 53.9 49.4 \n10 Fordham A10 30 16 101.8 100.4 0.5388 51.5 51.4 \n11 George Mason A10 32 11 101.4 103.1 0.4507 45.1 48.4 \n12 George Washington A10 38 28 114.9 99.1 0.8454 50.8 48.7 \n13 La Salle A10 30 8 99.2 105.3 0.3358 47.7 53.3 \n14 Massachusetts A10 32 14 102.1 100.7 0.5394 49.1 48.3 \n15 Rhode Island A10 32 17 107.6 98.3 0.7403 50.4 47.6 \n16 Richmond A10 32 16 112.6 104.0 0.7136 54.7 50.9 \n17 Saint Louis A10 32 11 96.4 100.2 0.3913 47.7 50.0 \n18 St. Bonaventure A10 31 22 113.2 104.4 0.7173 51.5 49.6 \n19 Boston College ACC 32 7 95.0 99.1 0.3820 47.5 52.4 \n20 Clemson ACC 31 17 112.1 98.0 0.8241 50.6 47.9 \n21 Florida St. ACC 34 20 112.2 99.1 0.8069 51.8 50.9 \n22 Georgia Tech ACC 36 21 113.9 98.2 0.8450 49.8 49.1 \n23 Louisville ACC 31 23 111.5 89.6 0.9250 51.9 44.7 \n24 North Carolina St. ACC 33 16 113.6 102.9 0.7561 48.5 50.6 \n25 Virginia Tech ACC 35 20 109.5 97.8 0.7855 50.0 49.0 \n26 Wake Forest ACC 31 11 106.9 99.4 0.6977 49.4 49.5 \n27 Albany AE 32 23 104.7 102.1 0.5703 50.7 48.7 \n28 Binghamton AE 30 8 89.3 102.1 0.1763 44.1 49.6 \n29 Hartford AE 32 9 99.4 114.2 0.1685 49.1 53.3 \n... ... ... .. .. ... ... ... ... ... \n1727 Mississippi SEC 34 21 112.2 99.2 0.8037 47.8 47.6 \n1728 New Mexico St. WAC 32 21 103.9 98.1 0.6613 50.6 46.4 \n1729 North Dakota St. Sum 31 21 100.8 100.7 0.5035 49.7 48.1 \n1730 Northeastern CAA 35 23 107.0 101.0 0.6586 54.5 49.8 \n1731 Oklahoma St. B12 31 17 110.7 95.4 0.8469 50.2 46.5 \n1732 Providence BE 34 22 111.6 93.7 0.8821 48.4 48.2 \n1733 Purdue B10 34 21 110.5 93.9 0.8654 50.3 45.4 \n1734 Robert Morris NEC 35 20 100.7 103.3 0.4286 50.1 49.3 \n1735 SMU Amer 33 26 111.1 94.6 0.8635 52.1 45.7 \n1736 St. John's BE 32 20 110.6 94.7 0.8569 49.3 46.4 \n1737 Stephen F. Austin Slnd 31 26 111.0 98.8 0.7919 55.6 49.0 \n1738 Texas B12 34 20 110.3 90.7 0.9048 49.1 42.1 \n1739 Texas Southern SWAC 35 22 101.9 108.1 0.3371 49.5 49.3 \n1740 UC Irvine BW 32 19 105.1 96.8 0.7202 50.7 44.7 \n1741 Valparaiso Horz 31 25 105.0 94.5 0.7706 51.3 45.2 \n1742 VCU A10 36 26 108.4 94.0 0.8373 48.7 49.5 \n1743 Wofford SC 33 26 102.3 96.8 0.6547 51.1 47.0 \n1744 Wyoming MWC 33 23 102.1 98.2 0.6121 52.2 47.0 \n1745 Boise St. 
MWC 32 23 109.5 96.0 0.8210 53.2 47.6 \n1746 BYU WCC 33 23 118.3 100.3 0.8699 53.2 49.6 \n1747 Manhattan MAAC 33 19 100.2 99.4 0.5238 49.4 48.1 \n1748 North Florida ASun 32 20 105.2 105.0 0.5063 53.6 47.9 \n1749 North Carolina ACC 38 26 119.6 92.5 0.9507 51.6 45.4 \n1750 North Carolina St. ACC 36 22 114.1 96.8 0.8684 49.3 45.5 \n1751 Oklahoma B12 35 24 111.6 88.5 0.9349 49.3 44.1 \n1752 UCLA P12 36 22 111.8 96.6 0.8425 49.6 48.5 \n1753 Utah P12 34 25 114.9 88.7 0.9513 55.2 43.0 \n1754 West Virginia B12 35 25 110.3 93.3 0.8733 46.1 52.7 \n1755 Wichita St. MVC 34 29 114.3 91.5 0.9277 50.3 45.8 \n1756 Xavier BE 37 23 115.7 95.1 0.9049 53.3 50.0 \n\n TOR ... FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED \\\n0 15.4 ... 30.4 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 \n1 16.3 ... 30.0 57.4 44.1 36.2 33.9 66.7 8.9 Champions 2.0 \n2 15.3 ... 26.0 52.9 46.5 37.4 36.9 65.5 2.3 E8 6.0 \n3 15.1 ... 33.4 52.6 46.3 40.3 34.7 61.9 8.6 E8 1.0 \n4 17.8 ... 37.3 52.7 43.4 41.3 32.5 70.1 11.6 E8 1.0 \n5 16.1 ... 32.0 52.6 46.1 34.4 36.2 69.0 6.7 E8 1.0 \n6 18.1 ... 28.0 47.2 48.1 36.0 30.7 65.5 -0.3 F4 10.0 \n7 18.3 ... 28.3 48.2 45.3 42.2 33.7 70.8 8.0 F4 2.0 \n8 14.2 ... 30.6 51.1 52.2 35.5 34.3 71.3 -2.1 NaN NaN \n9 18.9 ... 38.6 53.1 42.8 36.6 40.2 73.6 -7.8 NaN NaN \n10 20.4 ... 40.5 51.2 52.3 34.6 33.4 67.5 -6.0 NaN NaN \n11 18.6 ... 30.0 45.7 46.0 29.2 35.3 68.6 -11.6 NaN NaN \n12 16.5 ... 27.0 48.4 48.2 37.1 33.0 67.2 -0.8 NaN NaN \n13 18.8 ... 34.9 45.4 51.1 33.6 38.1 67.0 -12.6 NaN NaN \n14 17.5 ... 40.2 48.8 49.0 33.0 31.3 71.6 -8.7 NaN NaN \n15 17.5 ... 39.6 48.3 46.6 36.5 33.3 65.7 -5.9 NaN NaN \n16 14.7 ... 36.1 55.4 49.4 35.6 36.0 69.2 -5.5 NaN NaN \n17 19.5 ... 39.3 46.9 50.5 32.8 32.8 69.6 -11.6 NaN NaN \n18 16.5 ... 37.4 48.8 50.5 37.3 32.1 68.9 -0.1 NaN NaN \n19 20.0 ... 37.0 46.9 50.0 32.3 38.2 67.8 -12.2 NaN NaN \n20 15.8 ... 28.5 49.1 45.5 35.2 34.9 64.5 -2.4 NaN NaN \n21 18.0 ... 36.7 52.1 48.9 34.1 36.2 72.7 0.1 NaN NaN \n22 16.7 ... 31.7 48.4 49.5 35.7 32.0 67.8 0.0 NaN NaN \n23 17.1 ... 40.5 51.9 42.8 34.7 32.1 67.2 4.3 NaN NaN \n24 16.1 ... 29.7 46.9 47.5 34.8 37.4 68.5 -4.0 NaN NaN \n25 18.0 ... 33.5 49.1 50.1 34.8 31.3 70.5 -0.5 NaN NaN \n26 20.1 ... 38.6 50.4 49.2 31.6 33.4 72.4 -6.2 NaN NaN \n27 18.4 ... 27.2 51.8 50.2 32.3 30.9 67.3 -4.0 NaN NaN \n28 20.7 ... 34.4 45.0 49.0 28.4 33.7 66.6 -17.3 NaN NaN \n29 16.3 ... 41.1 45.6 54.0 35.4 34.7 70.2 -17.7 NaN NaN \n... ... ... ... ... ... ... ... ... ... ... ... \n1727 16.5 ... 41.2 46.8 43.6 33.2 35.8 66.6 -0.5 R64 11.0 \n1728 21.8 ... 32.5 49.3 46.4 36.2 30.8 64.0 -4.5 R64 15.0 \n1729 16.0 ... 30.7 45.5 45.0 38.2 36.2 61.5 -3.9 R64 15.0 \n1730 21.5 ... 25.0 53.0 49.1 38.6 34.4 63.6 -4.6 R64 14.0 \n1731 18.5 ... 42.1 49.2 44.1 34.6 34.4 64.0 0.2 R64 9.0 \n1732 18.1 ... 39.3 48.9 47.1 31.3 33.3 65.7 3.7 R64 6.0 \n1733 19.9 ... 38.4 50.9 42.3 32.7 35.1 65.0 1.0 R64 9.0 \n1734 19.8 ... 37.6 47.4 48.3 37.5 34.1 66.0 -8.6 R64 16.0 \n1735 19.5 ... 31.2 51.3 42.9 36.3 32.7 63.6 3.1 R64 6.0 \n1736 15.7 ... 33.3 48.2 44.9 34.5 32.6 67.0 2.3 R64 9.0 \n1737 20.3 ... 49.1 55.4 48.3 37.3 33.7 66.4 1.2 R64 12.0 \n1738 20.1 ... 34.8 48.3 37.7 33.8 34.7 62.8 0.9 R64 11.0 \n1739 19.7 ... 33.1 50.0 48.5 32.2 34.4 63.7 -4.5 R64 15.0 \n1740 18.2 ... 35.3 47.8 42.4 38.2 33.6 63.9 -4.6 R64 13.0 \n1741 19.0 ... 33.4 48.0 42.3 38.4 33.4 63.1 0.1 R64 13.0 \n1742 15.7 ... 37.8 46.9 48.3 34.3 34.5 67.2 3.0 R64 7.0 \n1743 17.2 ... 36.4 49.1 47.3 36.8 30.9 61.2 0.2 R64 12.0 \n1744 18.8 ... 27.0 54.4 44.0 32.6 34.9 59.2 -1.6 R64 12.0 \n1745 16.1 ... 
31.6 49.6 48.1 38.9 31.3 63.4 0.2 R68 11.0 \n1746 16.4 ... 39.2 50.2 49.6 39.0 33.1 70.8 0.8 R68 11.0 \n1747 21.1 ... 52.3 48.8 48.0 33.6 32.4 68.0 -7.6 R68 16.0 \n1748 18.3 ... 34.1 51.1 48.3 38.1 31.3 67.5 -5.1 R68 16.0 \n1749 18.2 ... 37.8 50.9 45.6 35.8 30.0 69.9 6.5 S16 4.0 \n1750 16.0 ... 34.7 47.3 43.6 35.6 33.0 64.7 1.1 S16 8.0 \n1751 17.6 ... 28.4 48.2 42.2 34.3 31.7 67.5 4.5 S16 3.0 \n1752 17.6 ... 32.3 47.4 45.4 36.8 35.6 66.8 0.0 S16 11.0 \n1753 18.2 ... 34.3 52.3 41.4 40.1 31.2 61.4 3.7 S16 5.0 \n1754 18.7 ... 55.5 45.5 51.8 31.6 36.5 68.6 4.1 S16 5.0 \n1755 15.0 ... 36.6 48.9 42.6 35.4 35.3 62.6 4.2 S16 7.0 \n1756 18.1 ... 33.3 53.7 48.9 35.1 34.6 65.5 1.3 S16 6.0 \n\n YEAR \n0 2016 \n1 2016 \n2 2016 \n3 2016 \n4 2016 \n5 2016 \n6 2016 \n7 2016 \n8 2016 \n9 2016 \n10 2016 \n11 2016 \n12 2016 \n13 2016 \n14 2016 \n15 2016 \n16 2016 \n17 2016 \n18 2016 \n19 2016 \n20 2016 \n21 2016 \n22 2016 \n23 2016 \n24 2016 \n25 2016 \n26 2016 \n27 2016 \n28 2016 \n29 2016 \n... ... \n1727 NaN \n1728 NaN \n1729 NaN \n1730 NaN \n1731 NaN \n1732 NaN \n1733 NaN \n1734 NaN \n1735 NaN \n1736 NaN \n1737 NaN \n1738 NaN \n1739 NaN \n1740 NaN \n1741 NaN \n1742 NaN \n1743 NaN \n1744 NaN \n1745 NaN \n1746 NaN \n1747 NaN \n1748 NaN \n1749 NaN \n1750 NaN \n1751 NaN \n1752 NaN \n1753 NaN \n1754 NaN \n1755 NaN \n1756 NaN \n\n[1757 rows x 24 columns]", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
TEAMCONFGWADJOEADJDEBARTHAGEFG_OEFG_DTOR...FTRD2P_O2P_D3P_O3P_DADJ_TWABPOSTSEASONSEEDYEAR
0North CarolinaACC4033123.394.90.953152.648.115.4...30.453.944.632.736.271.78.62ND1.02016
1VillanovaBE4035123.190.90.970356.146.716.3...30.057.444.136.233.966.78.9Champions2.02016
2Notre DameACC3624118.3103.30.826954.049.515.3...26.052.946.537.436.965.52.3E86.02016
3VirginiaACC3729119.991.00.960054.848.415.1...33.452.646.340.334.761.98.6E81.02016
4KansasB123732120.990.40.966255.745.117.8...37.352.743.441.332.570.111.6E81.02016
5OregonP123730118.496.20.916352.348.916.1...32.052.646.134.436.269.06.7E81.02016
6SyracuseACC3723111.993.60.885750.047.318.1...28.047.248.136.030.765.5-0.3F410.02016
7OklahomaB123729118.294.10.932654.347.218.3...28.348.245.342.233.770.88.0F42.02016
8DavidsonA103219113.0106.00.676752.052.014.2...30.651.152.235.534.371.3-2.1NaNNaN2016
9DuquesneA103316108.2105.10.585153.949.418.9...38.653.142.836.640.273.6-7.8NaNNaN2016
10FordhamA103016101.8100.40.538851.551.420.4...40.551.252.334.633.467.5-6.0NaNNaN2016
11George MasonA103211101.4103.10.450745.148.418.6...30.045.746.029.235.368.6-11.6NaNNaN2016
12George WashingtonA103828114.999.10.845450.848.716.5...27.048.448.237.133.067.2-0.8NaNNaN2016
13La SalleA1030899.2105.30.335847.753.318.8...34.945.451.133.638.167.0-12.6NaNNaN2016
14MassachusettsA103214102.1100.70.539449.148.317.5...40.248.849.033.031.371.6-8.7NaNNaN2016
15Rhode IslandA103217107.698.30.740350.447.617.5...39.648.346.636.533.365.7-5.9NaNNaN2016
16RichmondA103216112.6104.00.713654.750.914.7...36.155.449.435.636.069.2-5.5NaNNaN2016
17Saint LouisA10321196.4100.20.391347.750.019.5...39.346.950.532.832.869.6-11.6NaNNaN2016
18St. BonaventureA103122113.2104.40.717351.549.616.5...37.448.850.537.332.168.9-0.1NaNNaN2016
19Boston CollegeACC32795.099.10.382047.552.420.0...37.046.950.032.338.267.8-12.2NaNNaN2016
20ClemsonACC3117112.198.00.824150.647.915.8...28.549.145.535.234.964.5-2.4NaNNaN2016
21Florida St.ACC3420112.299.10.806951.850.918.0...36.752.148.934.136.272.70.1NaNNaN2016
22Georgia TechACC3621113.998.20.845049.849.116.7...31.748.449.535.732.067.80.0NaNNaN2016
23LouisvilleACC3123111.589.60.925051.944.717.1...40.551.942.834.732.167.24.3NaNNaN2016
24North Carolina St.ACC3316113.6102.90.756148.550.616.1...29.746.947.534.837.468.5-4.0NaNNaN2016
25Virginia TechACC3520109.597.80.785550.049.018.0...33.549.150.134.831.370.5-0.5NaNNaN2016
26Wake ForestACC3111106.999.40.697749.449.520.1...38.650.449.231.633.472.4-6.2NaNNaN2016
27AlbanyAE3223104.7102.10.570350.748.718.4...27.251.850.232.330.967.3-4.0NaNNaN2016
28BinghamtonAE30889.3102.10.176344.149.620.7...34.445.049.028.433.766.6-17.3NaNNaN2016
29HartfordAE32999.4114.20.168549.153.316.3...41.145.654.035.434.770.2-17.7NaNNaN2016
..................................................................
1727MississippiSEC3421112.299.20.803747.847.616.5...41.246.843.633.235.866.6-0.5R6411.0NaN
1728New Mexico St.WAC3221103.998.10.661350.646.421.8...32.549.346.436.230.864.0-4.5R6415.0NaN
1729North Dakota St.Sum3121100.8100.70.503549.748.116.0...30.745.545.038.236.261.5-3.9R6415.0NaN
1730NortheasternCAA3523107.0101.00.658654.549.821.5...25.053.049.138.634.463.6-4.6R6414.0NaN
1731Oklahoma St.B123117110.795.40.846950.246.518.5...42.149.244.134.634.464.00.2R649.0NaN
1732ProvidenceBE3422111.693.70.882148.448.218.1...39.348.947.131.333.365.73.7R646.0NaN
1733PurdueB103421110.593.90.865450.345.419.9...38.450.942.332.735.165.01.0R649.0NaN
1734Robert MorrisNEC3520100.7103.30.428650.149.319.8...37.647.448.337.534.166.0-8.6R6416.0NaN
1735SMUAmer3326111.194.60.863552.145.719.5...31.251.342.936.332.763.63.1R646.0NaN
1736St. John'sBE3220110.694.70.856949.346.415.7...33.348.244.934.532.667.02.3R649.0NaN
1737Stephen F. AustinSlnd3126111.098.80.791955.649.020.3...49.155.448.337.333.766.41.2R6412.0NaN
1738TexasB123420110.390.70.904849.142.120.1...34.848.337.733.834.762.80.9R6411.0NaN
1739Texas SouthernSWAC3522101.9108.10.337149.549.319.7...33.150.048.532.234.463.7-4.5R6415.0NaN
1740UC IrvineBW3219105.196.80.720250.744.718.2...35.347.842.438.233.663.9-4.6R6413.0NaN
1741ValparaisoHorz3125105.094.50.770651.345.219.0...33.448.042.338.433.463.10.1R6413.0NaN
1742VCUA103626108.494.00.837348.749.515.7...37.846.948.334.334.567.23.0R647.0NaN
1743WoffordSC3326102.396.80.654751.147.017.2...36.449.147.336.830.961.20.2R6412.0NaN
1744WyomingMWC3323102.198.20.612152.247.018.8...27.054.444.032.634.959.2-1.6R6412.0NaN
1745Boise St.MWC3223109.596.00.821053.247.616.1...31.649.648.138.931.363.40.2R6811.0NaN
1746BYUWCC3323118.3100.30.869953.249.616.4...39.250.249.639.033.170.80.8R6811.0NaN
1747ManhattanMAAC3319100.299.40.523849.448.121.1...52.348.848.033.632.468.0-7.6R6816.0NaN
1748North FloridaASun3220105.2105.00.506353.647.918.3...34.151.148.338.131.367.5-5.1R6816.0NaN
1749North CarolinaACC3826119.692.50.950751.645.418.2...37.850.945.635.830.069.96.5S164.0NaN
1750North Carolina St.ACC3622114.196.80.868449.345.516.0...34.747.343.635.633.064.71.1S168.0NaN
1751OklahomaB123524111.688.50.934949.344.117.6...28.448.242.234.331.767.54.5S163.0NaN
1752UCLAP123622111.896.60.842549.648.517.6...32.347.445.436.835.666.80.0S1611.0NaN
1753UtahP123425114.988.70.951355.243.018.2...34.352.341.440.131.261.43.7S165.0NaN
1754West VirginiaB123525110.393.30.873346.152.718.7...55.545.551.831.636.568.64.1S165.0NaN
1755Wichita St.MVC3429114.391.50.927750.345.815.0...36.648.942.635.435.362.64.2S167.0NaN
1756XavierBE3723115.795.10.904953.350.018.1...33.353.748.935.134.665.51.3S166.0NaN
\n

1757 rows \u00d7 24 columns

\n
"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "test_df['windex'] = np.where(test_df.WAB > 7, 'True', 'False')\ntest_df1 = test_df[test_df['POSTSEASON'].str.contains('F4|S16|E8', na=False)]\ntest_df1. head()\ntest_df1.groupby(['windex'])['POSTSEASON'].value_counts(normalize=True)\ntest_Feature = test_df1[['ADJOE','ADJDE','BARTHAG','EFG_O','EFG_D']]\ntest_Feature = pd.concat([test_Feature,pd.get_dummies(test_df1['POSTSEASON'])], axis=1)\ntest_Feature.drop(['S16'], axis = 1,inplace=True)\ntest_Feature.head()\ntest_X=test_Feature\ntest_X= preprocessing.StandardScaler().fit(test_X).transform(test_X)\ntest_X[0:5]", "execution_count": 32, "outputs": [{"output_type": "stream", "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/ipykernel/__main__.py:10: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.\n", "name": "stderr"}, {"output_type": "execute_result", "execution_count": 32, "data": {"text/plain": "array([[ 3.37365934e-01, 2.66479976e+00, -2.46831661e+00,\n 2.13703245e-01, 9.44090550e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 7.03145068e-01, -7.13778644e-01, 1.07370841e+00,\n 4.82633172e-01, 4.77498943e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 9.31757027e-01, -8.78587347e-01, 1.23870131e+00,\n 7.85179340e-01, -9.22275877e-01, 1.58113883e+00,\n -4.08248290e-01],\n [ 3.60227129e-01, 7.14563447e-01, -8.92254236e-02,\n -3.57772849e-01, 6.89586037e-01, 1.58113883e+00,\n -4.08248290e-01],\n [-1.12575060e+00, 3.92401673e-04, -9.03545224e-01,\n -1.13094639e+00, 1.09073363e-02, -6.32455532e-01,\n 2.44948974e+00]])"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "code", "source": "test_y = test_df1['POSTSEASON'].values\ntest_y", "execution_count": 33, "outputs": [{"output_type": "execute_result", "execution_count": 33, "data": {"text/plain": "array(['E8', 'E8', 'E8', 'E8', 'F4', 'F4', 'S16', 'S16', 'S16', 'S16',\n 'S16', 'S16', 'S16', 'S16', 'E8', 'E8', 'E8', 'E8', 'F4', 'F4',\n 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'E8', 'E8',\n 'E8', 'E8', 'F4', 'F4', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16',\n 'S16', 'S16', 'E8', 'E8', 'E8', 'E8', 'F4', 'F4', 'S16', 'S16',\n 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'E8', 'E8', 'E8', 'E8',\n 'F4', 'F4', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16', 'S16'],\n dtype=object)"}, "metadata": {}}]}, {"metadata": {}, "cell_type": "markdown", "source": "KNN"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(finalfour_knn.predict(test_X), test_y, average='micro')\nJaccard = jaccard_index(finalfour_knn.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 34, "outputs": [{"output_type": "stream", "text": "F1-Score: 0.7714285714285715\nJaccard Index: 0.627906976744186\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Decision Tree"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_tree.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_tree.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 35, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\n", "name": "stdout"}]}, {"metadata": {}, 
"cell_type": "markdown", "source": "SVM"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_svm.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_svm.predict(test_X), test_y)\nprint(\"F1-Score: {}\\nJaccard Index: {}\".format(F1, Jaccard))", "execution_count": 36, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "Logistic Regression"}, {"metadata": {}, "cell_type": "code", "source": "F1 = f1_score(test_y, finalfour_LR.predict(test_X), average='micro')\nJaccard = jaccard_index(finalfour_LR.predict(test_X), test_y)\nLogLoss = log_loss(test_y, finalfour_LR.predict_proba(test_X))\nprint(\"F1-Score: {}\\nJaccard Index: {}\\nLog-Loss: {}\".format(F1, Jaccard, LogLoss))", "execution_count": 37, "outputs": [{"output_type": "stream", "text": "F1-Score: 1.0\nJaccard Index: 1.0\nLog-Loss: 0.9742807754772282\n", "name": "stdout"}]}, {"metadata": {}, "cell_type": "markdown", "source": "# Report\nYou should be able to report the accuracy of the built model using different evaluation metrics:"}, {"metadata": {}, "cell_type": "markdown", "source": "| Algorithm | Jaccard | F1-score | LogLoss |\n|--------------------|---------|----------|---------|\n| KNN |.93 |.93 | NA |\n| Decision Tree | 1 | 1 | NA |\n| SVM | 1 | 1 | NA |\n| LogisticRegression | 1 | 1 | .97 |"}, {"metadata": {"button": false, "new_sheet": false, "run_control": {"read_only": false}}, "cell_type": "markdown", "source": "

### Want to learn more?

IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems \u2013 by your enterprise as a whole. A free trial is available through this course here: SPSS Modeler.

Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio.

Thanks for completing this lesson!

### Author: Saeed Aghabozorgi

Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients\u2019 ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.

Copyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.
"}], "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3.6", "language": "python"}, "language_info": {"name": "python", "version": "3.6.9", "mimetype": "text/x-python", "codemirror_mode": {"name": "ipython", "version": 3}, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py"}}, "nbformat": 4, "nbformat_minor": 4} --------------------------------------------------------------------------------