├── assets
│   ├── stacking.png
│   ├── DoubleStacking.png
│   └── datacamp.svg
├── notebooks
│   ├── python_live_session_template_spark.ipynb
│   ├── Applied_Machine_Learning_Ensemble_Modeling_Learners.ipynb
│   └── Applied_Machine_Learning_Ensemble_Modeling_Solution.ipynb
├── README.md
└── data
    └── pima-indians-diabetes.csv
/assets/stacking.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/stacking.png
--------------------------------------------------------------------------------
/assets/DoubleStacking.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/DoubleStacking.png
--------------------------------------------------------------------------------
/notebooks/python_live_session_template_spark.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "6Ijg5wUCTQYG" 8 | }, 9 | "source": [ 10 | "
<center>\n", 11 | "<img src=\"https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/datacamp.svg\" alt=\"DataCamp icon\" width=\"50%\">\n", 12 | "</center>\n", 13 | "<br>
\n", 14 | "\n", 15 | "## **Python PySpark Live Training Template**\n", 16 | "\n", 17 | "_Enter a brief description of your session; here's an example below:_\n", 18 | "\n", 19 | "Welcome to this hands-on training where you will immerse yourself in data visualization in Python. Using both `matplotlib` and `seaborn`, we'll learn how to create visualizations that are presentation-ready.\n", 20 | "\n", 21 | "The ability to present and discuss your data is a key data science skill. In this session, you will learn how to:\n", 22 | "\n", 23 | "* Create various types of plots, including bar plots, distribution plots, box plots, and more using Seaborn and Matplotlib.\n", 24 | "* Format and stylize your visualizations to make them report-ready.\n", 25 | "* Create sub-plots to create clearer visualizations and supercharge your workflow.\n", 26 | "\n", 27 | "## **The Dataset**\n", 28 | "\n", 29 | "_Enter a brief description of your dataset and its columns; here's an example below:_\n", 30 | "\n", 31 | "\n", 32 | "The dataset to be used in this webinar is a CSV file named `airbnb.csv`, which contains data on Airbnb listings in the state of New York. It contains the following columns:\n", 33 | "\n", 34 | "- `listing_id`: The unique identifier for a listing\n", 35 | "- `description`: The description used on the listing\n", 36 | "- `host_id`: Unique identifier for a host\n", 37 | "- `host_name`: Name of host\n", 38 | "- `neighbourhood_full`: Name of boroughs and neighbourhoods\n", 39 | "- `coordinates`: Coordinates of listing _(latitude, longitude)_\n", 40 | "- `Listing added`: Date the listing was added\n", 41 | "- `room_type`: Type of room \n", 42 | "- `rating`: Rating from 0 to 5\n", 43 | "- `price`: Price per night for listing\n", 44 | "- `number_of_reviews`: Number of reviews received \n", 45 | "- `last_review`: Date of last review\n", 46 | "- `reviews_per_month`: Number of reviews per month\n", 47 | "- `availability_365`: Number of days available per year\n", 48 | "- `Number of stays`: Total number of stays thus far\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "## **Setting up a PySpark session**\n", 56 | "\n", 57 | "This set of code cells enables a PySpark session in Google Colab; make sure to run them before continuing."
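,
"\n",
"Once the setup cells below have run, you can sanity-check the session with a short snippet like this (an illustrative addition, not part of the original template):\n",
"\n",
"```python\n",
"# Assumes `spark` has been created by the setup cells below\n",
"df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'label'])\n",
"df.show()\n",
"```"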
58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "# Just run this code\n", 67 | "!apt-get install openjdk-8-jdk-headless -qq > /dev/null\n", 68 | "!wget -q https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz\n", 69 | "!tar xf spark-2.4.5-bin-hadoop2.7.tgz\n", 70 | "!pip install -q findspark" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "# Just run this code too!\n", 80 | "import os\n", 81 | "os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-8-openjdk-amd64\"\n", 82 | "os.environ[\"SPARK_HOME\"] = \"/content/spark-2.4.5-bin-hadoop2.7\"" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "# Set up a Spark session\n", 92 | "import findspark\n", 93 | "findspark.init()\n", 94 | "from pyspark.sql import SparkSession\n", 95 | "spark = SparkSession.builder.master(\"local[*]\").getOrCreate()" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": { 101 | "colab_type": "text", 102 | "id": "BMYfcKeDY85K" 103 | }, 104 | "source": [ 105 | "## **Getting started**" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 2, 111 | "metadata": { 112 | "colab": {}, 113 | "colab_type": "code", 114 | "id": "EMQfyC7GUNhT" 115 | }, 116 | "outputs": [], 117 | "source": [ 118 | "# Import other relevant libraries\n", 119 | "from pyspark.ml.feature import VectorAssembler\n", 120 | "from pyspark.ml.regression import LinearRegression" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 0, 126 | "metadata": { 127 | "colab": {}, 128 | "colab_type": "code", 129 | "id": "IAfz_jiu0NjN" 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "# Get dataset into local environment\n", 134 | "!wget -O /tmp/airbnb.csv 'https://github.com/datacamp/python-live-training-template/blob/master/data/airbnb.csv?raw=True'\n", 135 | "airbnb = spark.read.csv('/tmp/airbnb.csv', inferSchema=True, header=True)" 136 | ] 137 | } 138 | ], 139 | "metadata": { 140 | "colab": { 141 | "name": "Cleaning Data in Python live session.ipynb", 142 | "provenance": [] 143 | }, 144 | "kernelspec": { 145 | "display_name": "Python 3", 146 | "language": "python", 147 | "name": "python3" 148 | }, 149 | "language_info": { 150 | "codemirror_mode": { 151 | "name": "ipython", 152 | "version": 3 153 | }, 154 | "file_extension": ".py", 155 | "mimetype": "text/x-python", 156 | "name": "python", 157 | "nbconvert_exporter": "python", 158 | "pygments_lexer": "ipython3", 159 | "version": "3.7.1" 160 | } 161 | }, 162 | "nbformat": 4, 163 | "nbformat_minor": 1 164 | } 165 | 
--------------------------------------------------------------------------------
/assets/datacamp.svg:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # **Applied Machine Learning: Ensemble Modeling**
by **Lisa Stuart** 2 | 3 | Live training sessions are designed to mimic the flow of how a real data scientist would address a problem or a task. As such, a session needs to have some “narrative” where learners are achieving stated learning objectives in the form of a real-life data science task or project. For example, a data visualization live session could be built around analyzing a dataset and creating a report with a specific business objective in mind _(ex: analyzing and visualizing churn)_, while a data cleaning live session could be about preparing a dataset for analysis, etc. 4 | 5 | As part of the 'Live training Spec' process, you will need to complete the following tasks: 6 | 7 | Edit this README by filling in the information for steps 1-4. 8 | 9 | ## Step 1: Foundations 10 | 11 | This part of the 'Live training Spec' process is designed to help guide you through session design by having you think through several key questions. Please make sure to delete the examples provided here for you. 12 | 13 | ### A. What problem(s) will students learn how to solve? (minimum of 5 problems) 14 | 15 | > _Here's an example from the Python for Spreadsheet Users live session_ 16 | > 17 | > - Key considerations to take into account when transitioning from spreadsheets to Python. 18 | > - The Data Scientist mindset and keys to success in transitioning to Python. 19 | > - How to import `.xlsx` and `.csv` files into Python using `pandas`. 20 | > - How to filter a DataFrame using `pandas`. 21 | > - How to create new columns out of your DataFrame for more interesting features. 22 | > - Perform exploratory analysis of a DataFrame in `pandas`. 23 | > - How to clean a DataFrame using `pandas` to make it ready for analysis. 24 | > - Create simple, interesting visualizations using `matplotlib`. 25 | 26 | > - Key considerations to take into account when transitioning from a single-layer model to stacked layers. 27 | > - The Data Scientist mindset and keys to success in transitioning from baseline models to stacking models. 28 | > - How to select a baseline Machine Learning algorithm. 29 | > - Discuss alternative stacking methods. 30 | > - Create simple, two-layer regressor and classifier stacked models. 31 | > - How to tune hyperparameters using K-fold cross-validation. 32 | 33 | 34 | 35 | ### B. What technologies, packages, or functions will students use? Please be exhaustive. 36 | 37 | > - pandas 38 | > - matplotlib 39 | > - seaborn 40 | > - scikit-learn 41 | > - mlxtend.classifier.StackingClassifier 42 | > - vecstack 43 | > - sklearn.ensemble.StackingClassifier 44 | > - sklearn.ensemble.StackingRegressor 45 | 46 | ### C. What terms or jargon will you define? 47 | 48 | _Whether during your opening and closing talk or your live training, you might have to define some terms and jargon to walk students through a problem you’re solving. Intuitive explanations using analogies are encouraged._ 49 | 50 | > _Here's an example from the [Python for Spreadsheet Users live session](https://www.datacamp.com/resources/webinars/live-training-python-for-spreadsheet-users)._ 51 | > 52 | > - Packages: Packages are pieces of software we can import into Python. Similar to how we download and install Excel on macOS, we import pandas in Python. (You can find it at minute 6:30) 53 | 54 | > - What is considered a 'weak' learner? 55 | > - Ensemble: In machine learning, a collection of multiple base models combined to create a single model that has better predictive performance than any of the base models used to produce it. 
For example, the Random Forest algorithm is an ensemble method that constructs a collection of Decision Trees to output a single trained Random Forest model. 56 | > - Stacking: In machine learning, a collection of multiple base models that use different algorithms from one another and are arranged in layers. The predictions from the layers are used as input to the final layer to produce a final trained model that has better predictive performance than any of the base models used to produce it. Stacking is an ensemble method. 57 | 58 | ### D. What mistakes or misconceptions do you expect? 59 | 60 | _To help minimize the number of Q&As and make your live training re-usable, list out some mistakes and misconceptions you think students might encounter along the way._ 61 | 62 | > _Here's an example from the [Data Visualization in Python live session](https://www.datacamp.com/resources/webinars/data-visualization-in-python)_ 63 | > 64 | > - Anatomy of a matplotlib figure: When calling a matplotlib plot, a figure, axes, and plot are being created behind the scenes. (You can find it at minute 11) 65 | > - As long as you understand how plots work behind the scenes, you don't need to memorize syntax to customize your plot. 66 | 67 | > - Ensuring the layers are composed of 'weak' learners. 68 | > - The concept of leakage: avoid leaking information between layers, which leads to overfitting and poor generalization. 69 | > - As long as you understand how base models work behind the scenes, you don't need to memorize arguments to customize your stacking model. 70 | 71 | ### E. What datasets will you use? 72 | 73 | Live training sessions are designed to walk students through something closer to a real-life data science workflow. Accordingly, the dataset needs to accommodate that user experience. 74 | As a rule of thumb, your dataset should always answer yes to the following question: 75 | > Is the dataset/problem I’m working on something an industry data scientist/analyst could work on? 76 | 77 | Check our [datasets to avoid](https://instructor-support.datacamp.com/en/articles/2360699-datasets-to-avoid) list. 78 | 79 | > - [Abalone Age](https://archive.ics.uci.edu/ml/datasets/abalone) - Regression 80 | > - [Pima Indians Diabetes](https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv) - Binary Classification 81 | 82 | ## Step 2: Who is this session for? 83 | 84 | Terms like "beginner" and "expert" mean different things to different people, so we use personas to help instructors clarify a live training's audience. When designing a specific live training, instructors should explain how it will or won't help these people, and what extra skills or prerequisite knowledge they are assuming their students have above and beyond what's included in the persona. 85 | 86 | - [ ] Please select the roles and industries that align with your live training. 87 | - [ ] Include an explanation describing your reasoning and any other relevant information. 88 | 89 | ### What roles would this live training be suitable for? 90 | 91 | *Check all that apply.* 92 | 93 | - [ ] Data Consumer 94 | - [ ] Leader 95 | - [X] Data Analyst 96 | - [X] Citizen Data Scientist 97 | - [X] Data Scientist 98 | - [X] Data Engineer 99 | - [ ] Database Administrator 100 | - [ ] Statistician 101 | - [X] Machine Learning Scientist 102 | - [ ] Programmer 103 | - [ ] Other (please describe) 104 | 105 | ### What industries would this apply to? 
106 | 107 | *List one or more industries that the content would be appropriate for.* 108 | Industry Agnostic 109 | 110 | 111 | ### What level of expertise should learners have before beginning the live training? 112 | 113 | *List three or more examples of skills that you expect learners to have before beginning the live training* 114 | 115 | > - Can draw common plot types (scatter, bar, histogram) using matplotlib and interpret them 116 | > - Can run a linear regression, use it to make predictions, and interpret the coefficients. 117 | > - Can calculate grouped summary statistics using SELECT queries with GROUP BY clauses. 118 | 119 | > - Can run a linear regression, use it to make predictions, and calculate performance metrics. 120 | > - Can run a logistic regression, use it to make predictions, and calculate performance metrics. 121 | > - Can run a decision tree classifier/regressor, use it to make predictions, and calculate performance metrics. 122 | 123 | 124 | ## Step 3: Prerequisites 125 | 126 | List any prerequisite courses you think your live training could use from. This could be the live session’s companion course or a course you think students should take before the session. Prerequisites act as a guiding principle for your session and will set the topic framework, but you do not have to limit yourself in the live session to the syntax used in the prerequisite courses. 127 | 128 | > - [Supervised Learning with scikit-learn](https://learn.datacamp.com/courses/supervised-learning-with-scikit-learn) 129 | > - [Ensemble Methods in Python](https://learn.datacamp.com/courses/ensemble-methods-in-python) 130 | 131 | 132 | 133 | ## Step 4: Session Outline 134 | 135 | A live training session usually begins with an introductory presentation, followed by the live training itself, and an ending presentation. Your live session is expected to be around 2h30m-3h long (including Q&A) with a hard-limit at 3h30m. You can check out our live training content guidelines [here](_LINK_). 
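For orientation, here is a minimal sketch of the two-layer stacking pattern the example outline below builds toward (an illustrative sketch on toy data; the session itself uses the datasets listed in Step 1, and its model choices may differ):

```python
# Two-layer stacking: heterogeneous base learners + a logistic regression meta-learner
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=1)
layer1 = [('knn', KNeighborsClassifier()), ('nb', GaussianNB())]
stack = StackingClassifier(estimators=layer1, final_estimator=LogisticRegression(), cv=5)
print('Stacked accuracy: %.3f' % cross_val_score(stack, X, y, scoring='accuracy', cv=5).mean())
```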
136 | 137 | 138 | > _Example from [Python for Spreadsheet Users](https://www.datacamp.com/resources/webinars/live-training-python-for-spreadsheet-users)_ 139 | > 140 | > ### Introduction Slides 141 | > - Introduction to the webinar and instructor (led by DataCamp TA) 142 | > - Introduction to the topics 143 | > - Discuss the need to become familiar with baseline machine learning algorithms 144 | > - Define what a 'weak' learner is 145 | > - Discuss learning ensemble methods and go over the session outline 146 | > - Set expectations about Q&A 147 | > 148 | > ### Live Training 149 | > #### Ensemble Technique #1 - `Classifier` 150 | > - Import the Diabetes classification dataset and print the head of the DataFrame using `pd.read_csv()` and `.head()` 151 | > - Glimpse at the data using `.dtypes`, `.describe()`, and `.info()` to understand the data 152 | > - Build baseline models 153 | > - Build layers and the first stacked model classifier 154 | > - Compare baseline and stacked models using `seaborn.boxplot` 155 | > #### Ensemble Technique #2 - `Regressor` 156 | > - Import and briefly explore the Abalone regression dataset 157 | > - Build baseline models 158 | > - Build layers and the first stacked regressor 159 | > - Compare baseline and stacked models using `seaborn.boxplot` 160 | > #### Ensemble Technique #3 - `Regressor` 161 | > - Discuss multiple-layer stacking 162 | > - Build an additional layer and stacked regressor model 163 | > - Compare baseline and stacked models using `seaborn.boxplot` 164 | > - Discuss how to apply the same approach to `sklearn.ensemble.StackingClassifier` 165 | > - **Q&A** 166 | > 167 | > ### Ending slides 168 | > - Recap of what we learned 169 | > - The model stacking mindset 170 | > - Call to action and course recommendations 171 | 172 | ## Authoring your session 173 | 174 | To get yourself started with setting up your live session, follow the steps below: 175 | 176 | 1. Download and install the "Open in Colab" extension from [here](https://chrome.google.com/webstore/detail/open-in-colab/iogfkhleblhcpcekbiedikdehleodpjo?hl=en). This will let you take any Jupyter notebook you see in a GitHub repository and open it as a **temporary** Colab link. 177 | 2. Upload your dataset(s) to the `data` folder. 178 | 3. Upload your images, gifs, or any other assets you want to use in the notebook to the `assets` folder. 179 | 4. Check out the notebook templates in the `notebooks` folder, and keep the template you want for your session while deleting all remaining ones. 180 | 5. Preview your desired notebook, press the "Open in Colab" extension button, and start developing your content in Colab _(which will act as the solution code to the session)_. :warning: **Important** :warning: Your progress will **not** be saved on Google Colab since it's a temporary link. To save your progress, make sure to press `File`, then `Save a copy in GitHub` and follow the remaining prompts. You can also download the notebook locally and develop the content there, as long as you test that the syntax works on Colab as well. 181 | 6. Once your notebook is ready to go, give it the name `session_name_solution.ipynb`, then create an empty version of the notebook to be filled out by you and learners during the session and name it `session_name_learners.ipynb`. 182 | 7. 
Create Colabs links for both sessions and save them in notebooks :tada: 183 | -------------------------------------------------------------------------------- /data/pima-indians-diabetes.csv: -------------------------------------------------------------------------------- 1 | n_preg,pl_glucose,dia_bp,tri_thick,serum_ins,bmi,diab_ped,age,class 2 | 6,148,72,35,0,33.6,0.627,50,1 3 | 1,85,66,29,0,26.6,0.351,31,0 4 | 8,183,64,0,0,23.3,0.672,32,1 5 | 1,89,66,23,94,28.1,0.167,21,0 6 | 0,137,40,35,168,43.1,2.288,33,1 7 | 5,116,74,0,0,25.6,0.201,30,0 8 | 3,78,50,32,88,31,0.248,26,1 9 | 10,115,0,0,0,35.3,0.134,29,0 10 | 2,197,70,45,543,30.5,0.158,53,1 11 | 8,125,96,0,0,0,0.232,54,1 12 | 4,110,92,0,0,37.6,0.191,30,0 13 | 10,168,74,0,0,38,0.537,34,1 14 | 10,139,80,0,0,27.1,1.441,57,0 15 | 1,189,60,23,846,30.1,0.398,59,1 16 | 5,166,72,19,175,25.8,0.587,51,1 17 | 7,100,0,0,0,30,0.484,32,1 18 | 0,118,84,47,230,45.8,0.551,31,1 19 | 7,107,74,0,0,29.6,0.254,31,1 20 | 1,103,30,38,83,43.3,0.183,33,0 21 | 1,115,70,30,96,34.6,0.529,32,1 22 | 3,126,88,41,235,39.3,0.704,27,0 23 | 8,99,84,0,0,35.4,0.388,50,0 24 | 7,196,90,0,0,39.8,0.451,41,1 25 | 9,119,80,35,0,29,0.263,29,1 26 | 11,143,94,33,146,36.6,0.254,51,1 27 | 10,125,70,26,115,31.1,0.205,41,1 28 | 7,147,76,0,0,39.4,0.257,43,1 29 | 1,97,66,15,140,23.2,0.487,22,0 30 | 13,145,82,19,110,22.2,0.245,57,0 31 | 5,117,92,0,0,34.1,0.337,38,0 32 | 5,109,75,26,0,36,0.546,60,0 33 | 3,158,76,36,245,31.6,0.851,28,1 34 | 3,88,58,11,54,24.8,0.267,22,0 35 | 6,92,92,0,0,19.9,0.188,28,0 36 | 10,122,78,31,0,27.6,0.512,45,0 37 | 4,103,60,33,192,24,0.966,33,0 38 | 11,138,76,0,0,33.2,0.42,35,0 39 | 9,102,76,37,0,32.9,0.665,46,1 40 | 2,90,68,42,0,38.2,0.503,27,1 41 | 4,111,72,47,207,37.1,1.39,56,1 42 | 3,180,64,25,70,34,0.271,26,0 43 | 7,133,84,0,0,40.2,0.696,37,0 44 | 7,106,92,18,0,22.7,0.235,48,0 45 | 9,171,110,24,240,45.4,0.721,54,1 46 | 7,159,64,0,0,27.4,0.294,40,0 47 | 0,180,66,39,0,42,1.893,25,1 48 | 1,146,56,0,0,29.7,0.564,29,0 49 | 2,71,70,27,0,28,0.586,22,0 50 | 7,103,66,32,0,39.1,0.344,31,1 51 | 7,105,0,0,0,0,0.305,24,0 52 | 1,103,80,11,82,19.4,0.491,22,0 53 | 1,101,50,15,36,24.2,0.526,26,0 54 | 5,88,66,21,23,24.4,0.342,30,0 55 | 8,176,90,34,300,33.7,0.467,58,1 56 | 7,150,66,42,342,34.7,0.718,42,0 57 | 1,73,50,10,0,23,0.248,21,0 58 | 7,187,68,39,304,37.7,0.254,41,1 59 | 0,100,88,60,110,46.8,0.962,31,0 60 | 0,146,82,0,0,40.5,1.781,44,0 61 | 0,105,64,41,142,41.5,0.173,22,0 62 | 2,84,0,0,0,0,0.304,21,0 63 | 8,133,72,0,0,32.9,0.27,39,1 64 | 5,44,62,0,0,25,0.587,36,0 65 | 2,141,58,34,128,25.4,0.699,24,0 66 | 7,114,66,0,0,32.8,0.258,42,1 67 | 5,99,74,27,0,29,0.203,32,0 68 | 0,109,88,30,0,32.5,0.855,38,1 69 | 2,109,92,0,0,42.7,0.845,54,0 70 | 1,95,66,13,38,19.6,0.334,25,0 71 | 4,146,85,27,100,28.9,0.189,27,0 72 | 2,100,66,20,90,32.9,0.867,28,1 73 | 5,139,64,35,140,28.6,0.411,26,0 74 | 13,126,90,0,0,43.4,0.583,42,1 75 | 4,129,86,20,270,35.1,0.231,23,0 76 | 1,79,75,30,0,32,0.396,22,0 77 | 1,0,48,20,0,24.7,0.14,22,0 78 | 7,62,78,0,0,32.6,0.391,41,0 79 | 5,95,72,33,0,37.7,0.37,27,0 80 | 0,131,0,0,0,43.2,0.27,26,1 81 | 2,112,66,22,0,25,0.307,24,0 82 | 3,113,44,13,0,22.4,0.14,22,0 83 | 2,74,0,0,0,0,0.102,22,0 84 | 7,83,78,26,71,29.3,0.767,36,0 85 | 0,101,65,28,0,24.6,0.237,22,0 86 | 5,137,108,0,0,48.8,0.227,37,1 87 | 2,110,74,29,125,32.4,0.698,27,0 88 | 13,106,72,54,0,36.6,0.178,45,0 89 | 2,100,68,25,71,38.5,0.324,26,0 90 | 15,136,70,32,110,37.1,0.153,43,1 91 | 1,107,68,19,0,26.5,0.165,24,0 92 | 1,80,55,0,0,19.1,0.258,21,0 93 | 4,123,80,15,176,32,0.443,34,0 94 | 
7,81,78,40,48,46.7,0.261,42,0 95 | 4,134,72,0,0,23.8,0.277,60,1 96 | 2,142,82,18,64,24.7,0.761,21,0 97 | 6,144,72,27,228,33.9,0.255,40,0 98 | 2,92,62,28,0,31.6,0.13,24,0 99 | 1,71,48,18,76,20.4,0.323,22,0 100 | 6,93,50,30,64,28.7,0.356,23,0 101 | 1,122,90,51,220,49.7,0.325,31,1 102 | 1,163,72,0,0,39,1.222,33,1 103 | 1,151,60,0,0,26.1,0.179,22,0 104 | 0,125,96,0,0,22.5,0.262,21,0 105 | 1,81,72,18,40,26.6,0.283,24,0 106 | 2,85,65,0,0,39.6,0.93,27,0 107 | 1,126,56,29,152,28.7,0.801,21,0 108 | 1,96,122,0,0,22.4,0.207,27,0 109 | 4,144,58,28,140,29.5,0.287,37,0 110 | 3,83,58,31,18,34.3,0.336,25,0 111 | 0,95,85,25,36,37.4,0.247,24,1 112 | 3,171,72,33,135,33.3,0.199,24,1 113 | 8,155,62,26,495,34,0.543,46,1 114 | 1,89,76,34,37,31.2,0.192,23,0 115 | 4,76,62,0,0,34,0.391,25,0 116 | 7,160,54,32,175,30.5,0.588,39,1 117 | 4,146,92,0,0,31.2,0.539,61,1 118 | 5,124,74,0,0,34,0.22,38,1 119 | 5,78,48,0,0,33.7,0.654,25,0 120 | 4,97,60,23,0,28.2,0.443,22,0 121 | 4,99,76,15,51,23.2,0.223,21,0 122 | 0,162,76,56,100,53.2,0.759,25,1 123 | 6,111,64,39,0,34.2,0.26,24,0 124 | 2,107,74,30,100,33.6,0.404,23,0 125 | 5,132,80,0,0,26.8,0.186,69,0 126 | 0,113,76,0,0,33.3,0.278,23,1 127 | 1,88,30,42,99,55,0.496,26,1 128 | 3,120,70,30,135,42.9,0.452,30,0 129 | 1,118,58,36,94,33.3,0.261,23,0 130 | 1,117,88,24,145,34.5,0.403,40,1 131 | 0,105,84,0,0,27.9,0.741,62,1 132 | 4,173,70,14,168,29.7,0.361,33,1 133 | 9,122,56,0,0,33.3,1.114,33,1 134 | 3,170,64,37,225,34.5,0.356,30,1 135 | 8,84,74,31,0,38.3,0.457,39,0 136 | 2,96,68,13,49,21.1,0.647,26,0 137 | 2,125,60,20,140,33.8,0.088,31,0 138 | 0,100,70,26,50,30.8,0.597,21,0 139 | 0,93,60,25,92,28.7,0.532,22,0 140 | 0,129,80,0,0,31.2,0.703,29,0 141 | 5,105,72,29,325,36.9,0.159,28,0 142 | 3,128,78,0,0,21.1,0.268,55,0 143 | 5,106,82,30,0,39.5,0.286,38,0 144 | 2,108,52,26,63,32.5,0.318,22,0 145 | 10,108,66,0,0,32.4,0.272,42,1 146 | 4,154,62,31,284,32.8,0.237,23,0 147 | 0,102,75,23,0,0,0.572,21,0 148 | 9,57,80,37,0,32.8,0.096,41,0 149 | 2,106,64,35,119,30.5,1.4,34,0 150 | 5,147,78,0,0,33.7,0.218,65,0 151 | 2,90,70,17,0,27.3,0.085,22,0 152 | 1,136,74,50,204,37.4,0.399,24,0 153 | 4,114,65,0,0,21.9,0.432,37,0 154 | 9,156,86,28,155,34.3,1.189,42,1 155 | 1,153,82,42,485,40.6,0.687,23,0 156 | 8,188,78,0,0,47.9,0.137,43,1 157 | 7,152,88,44,0,50,0.337,36,1 158 | 2,99,52,15,94,24.6,0.637,21,0 159 | 1,109,56,21,135,25.2,0.833,23,0 160 | 2,88,74,19,53,29,0.229,22,0 161 | 17,163,72,41,114,40.9,0.817,47,1 162 | 4,151,90,38,0,29.7,0.294,36,0 163 | 7,102,74,40,105,37.2,0.204,45,0 164 | 0,114,80,34,285,44.2,0.167,27,0 165 | 2,100,64,23,0,29.7,0.368,21,0 166 | 0,131,88,0,0,31.6,0.743,32,1 167 | 6,104,74,18,156,29.9,0.722,41,1 168 | 3,148,66,25,0,32.5,0.256,22,0 169 | 4,120,68,0,0,29.6,0.709,34,0 170 | 4,110,66,0,0,31.9,0.471,29,0 171 | 3,111,90,12,78,28.4,0.495,29,0 172 | 6,102,82,0,0,30.8,0.18,36,1 173 | 6,134,70,23,130,35.4,0.542,29,1 174 | 2,87,0,23,0,28.9,0.773,25,0 175 | 1,79,60,42,48,43.5,0.678,23,0 176 | 2,75,64,24,55,29.7,0.37,33,0 177 | 8,179,72,42,130,32.7,0.719,36,1 178 | 6,85,78,0,0,31.2,0.382,42,0 179 | 0,129,110,46,130,67.1,0.319,26,1 180 | 5,143,78,0,0,45,0.19,47,0 181 | 5,130,82,0,0,39.1,0.956,37,1 182 | 6,87,80,0,0,23.2,0.084,32,0 183 | 0,119,64,18,92,34.9,0.725,23,0 184 | 1,0,74,20,23,27.7,0.299,21,0 185 | 5,73,60,0,0,26.8,0.268,27,0 186 | 4,141,74,0,0,27.6,0.244,40,0 187 | 7,194,68,28,0,35.9,0.745,41,1 188 | 8,181,68,36,495,30.1,0.615,60,1 189 | 1,128,98,41,58,32,1.321,33,1 190 | 8,109,76,39,114,27.9,0.64,31,1 191 | 5,139,80,35,160,31.6,0.361,25,1 192 | 3,111,62,0,0,22.6,0.142,21,0 193 | 
9,123,70,44,94,33.1,0.374,40,0 194 | 7,159,66,0,0,30.4,0.383,36,1 195 | 11,135,0,0,0,52.3,0.578,40,1 196 | 8,85,55,20,0,24.4,0.136,42,0 197 | 5,158,84,41,210,39.4,0.395,29,1 198 | 1,105,58,0,0,24.3,0.187,21,0 199 | 3,107,62,13,48,22.9,0.678,23,1 200 | 4,109,64,44,99,34.8,0.905,26,1 201 | 4,148,60,27,318,30.9,0.15,29,1 202 | 0,113,80,16,0,31,0.874,21,0 203 | 1,138,82,0,0,40.1,0.236,28,0 204 | 0,108,68,20,0,27.3,0.787,32,0 205 | 2,99,70,16,44,20.4,0.235,27,0 206 | 6,103,72,32,190,37.7,0.324,55,0 207 | 5,111,72,28,0,23.9,0.407,27,0 208 | 8,196,76,29,280,37.5,0.605,57,1 209 | 5,162,104,0,0,37.7,0.151,52,1 210 | 1,96,64,27,87,33.2,0.289,21,0 211 | 7,184,84,33,0,35.5,0.355,41,1 212 | 2,81,60,22,0,27.7,0.29,25,0 213 | 0,147,85,54,0,42.8,0.375,24,0 214 | 7,179,95,31,0,34.2,0.164,60,0 215 | 0,140,65,26,130,42.6,0.431,24,1 216 | 9,112,82,32,175,34.2,0.26,36,1 217 | 12,151,70,40,271,41.8,0.742,38,1 218 | 5,109,62,41,129,35.8,0.514,25,1 219 | 6,125,68,30,120,30,0.464,32,0 220 | 5,85,74,22,0,29,1.224,32,1 221 | 5,112,66,0,0,37.8,0.261,41,1 222 | 0,177,60,29,478,34.6,1.072,21,1 223 | 2,158,90,0,0,31.6,0.805,66,1 224 | 7,119,0,0,0,25.2,0.209,37,0 225 | 7,142,60,33,190,28.8,0.687,61,0 226 | 1,100,66,15,56,23.6,0.666,26,0 227 | 1,87,78,27,32,34.6,0.101,22,0 228 | 0,101,76,0,0,35.7,0.198,26,0 229 | 3,162,52,38,0,37.2,0.652,24,1 230 | 4,197,70,39,744,36.7,2.329,31,0 231 | 0,117,80,31,53,45.2,0.089,24,0 232 | 4,142,86,0,0,44,0.645,22,1 233 | 6,134,80,37,370,46.2,0.238,46,1 234 | 1,79,80,25,37,25.4,0.583,22,0 235 | 4,122,68,0,0,35,0.394,29,0 236 | 3,74,68,28,45,29.7,0.293,23,0 237 | 4,171,72,0,0,43.6,0.479,26,1 238 | 7,181,84,21,192,35.9,0.586,51,1 239 | 0,179,90,27,0,44.1,0.686,23,1 240 | 9,164,84,21,0,30.8,0.831,32,1 241 | 0,104,76,0,0,18.4,0.582,27,0 242 | 1,91,64,24,0,29.2,0.192,21,0 243 | 4,91,70,32,88,33.1,0.446,22,0 244 | 3,139,54,0,0,25.6,0.402,22,1 245 | 6,119,50,22,176,27.1,1.318,33,1 246 | 2,146,76,35,194,38.2,0.329,29,0 247 | 9,184,85,15,0,30,1.213,49,1 248 | 10,122,68,0,0,31.2,0.258,41,0 249 | 0,165,90,33,680,52.3,0.427,23,0 250 | 9,124,70,33,402,35.4,0.282,34,0 251 | 1,111,86,19,0,30.1,0.143,23,0 252 | 9,106,52,0,0,31.2,0.38,42,0 253 | 2,129,84,0,0,28,0.284,27,0 254 | 2,90,80,14,55,24.4,0.249,24,0 255 | 0,86,68,32,0,35.8,0.238,25,0 256 | 12,92,62,7,258,27.6,0.926,44,1 257 | 1,113,64,35,0,33.6,0.543,21,1 258 | 3,111,56,39,0,30.1,0.557,30,0 259 | 2,114,68,22,0,28.7,0.092,25,0 260 | 1,193,50,16,375,25.9,0.655,24,0 261 | 11,155,76,28,150,33.3,1.353,51,1 262 | 3,191,68,15,130,30.9,0.299,34,0 263 | 3,141,0,0,0,30,0.761,27,1 264 | 4,95,70,32,0,32.1,0.612,24,0 265 | 3,142,80,15,0,32.4,0.2,63,0 266 | 4,123,62,0,0,32,0.226,35,1 267 | 5,96,74,18,67,33.6,0.997,43,0 268 | 0,138,0,0,0,36.3,0.933,25,1 269 | 2,128,64,42,0,40,1.101,24,0 270 | 0,102,52,0,0,25.1,0.078,21,0 271 | 2,146,0,0,0,27.5,0.24,28,1 272 | 10,101,86,37,0,45.6,1.136,38,1 273 | 2,108,62,32,56,25.2,0.128,21,0 274 | 3,122,78,0,0,23,0.254,40,0 275 | 1,71,78,50,45,33.2,0.422,21,0 276 | 13,106,70,0,0,34.2,0.251,52,0 277 | 2,100,70,52,57,40.5,0.677,25,0 278 | 7,106,60,24,0,26.5,0.296,29,1 279 | 0,104,64,23,116,27.8,0.454,23,0 280 | 5,114,74,0,0,24.9,0.744,57,0 281 | 2,108,62,10,278,25.3,0.881,22,0 282 | 0,146,70,0,0,37.9,0.334,28,1 283 | 10,129,76,28,122,35.9,0.28,39,0 284 | 7,133,88,15,155,32.4,0.262,37,0 285 | 7,161,86,0,0,30.4,0.165,47,1 286 | 2,108,80,0,0,27,0.259,52,1 287 | 7,136,74,26,135,26,0.647,51,0 288 | 5,155,84,44,545,38.7,0.619,34,0 289 | 1,119,86,39,220,45.6,0.808,29,1 290 | 4,96,56,17,49,20.8,0.34,26,0 291 | 
5,108,72,43,75,36.1,0.263,33,0 292 | 0,78,88,29,40,36.9,0.434,21,0 293 | 0,107,62,30,74,36.6,0.757,25,1 294 | 2,128,78,37,182,43.3,1.224,31,1 295 | 1,128,48,45,194,40.5,0.613,24,1 296 | 0,161,50,0,0,21.9,0.254,65,0 297 | 6,151,62,31,120,35.5,0.692,28,0 298 | 2,146,70,38,360,28,0.337,29,1 299 | 0,126,84,29,215,30.7,0.52,24,0 300 | 14,100,78,25,184,36.6,0.412,46,1 301 | 8,112,72,0,0,23.6,0.84,58,0 302 | 0,167,0,0,0,32.3,0.839,30,1 303 | 2,144,58,33,135,31.6,0.422,25,1 304 | 5,77,82,41,42,35.8,0.156,35,0 305 | 5,115,98,0,0,52.9,0.209,28,1 306 | 3,150,76,0,0,21,0.207,37,0 307 | 2,120,76,37,105,39.7,0.215,29,0 308 | 10,161,68,23,132,25.5,0.326,47,1 309 | 0,137,68,14,148,24.8,0.143,21,0 310 | 0,128,68,19,180,30.5,1.391,25,1 311 | 2,124,68,28,205,32.9,0.875,30,1 312 | 6,80,66,30,0,26.2,0.313,41,0 313 | 0,106,70,37,148,39.4,0.605,22,0 314 | 2,155,74,17,96,26.6,0.433,27,1 315 | 3,113,50,10,85,29.5,0.626,25,0 316 | 7,109,80,31,0,35.9,1.127,43,1 317 | 2,112,68,22,94,34.1,0.315,26,0 318 | 3,99,80,11,64,19.3,0.284,30,0 319 | 3,182,74,0,0,30.5,0.345,29,1 320 | 3,115,66,39,140,38.1,0.15,28,0 321 | 6,194,78,0,0,23.5,0.129,59,1 322 | 4,129,60,12,231,27.5,0.527,31,0 323 | 3,112,74,30,0,31.6,0.197,25,1 324 | 0,124,70,20,0,27.4,0.254,36,1 325 | 13,152,90,33,29,26.8,0.731,43,1 326 | 2,112,75,32,0,35.7,0.148,21,0 327 | 1,157,72,21,168,25.6,0.123,24,0 328 | 1,122,64,32,156,35.1,0.692,30,1 329 | 10,179,70,0,0,35.1,0.2,37,0 330 | 2,102,86,36,120,45.5,0.127,23,1 331 | 6,105,70,32,68,30.8,0.122,37,0 332 | 8,118,72,19,0,23.1,1.476,46,0 333 | 2,87,58,16,52,32.7,0.166,25,0 334 | 1,180,0,0,0,43.3,0.282,41,1 335 | 12,106,80,0,0,23.6,0.137,44,0 336 | 1,95,60,18,58,23.9,0.26,22,0 337 | 0,165,76,43,255,47.9,0.259,26,0 338 | 0,117,0,0,0,33.8,0.932,44,0 339 | 5,115,76,0,0,31.2,0.343,44,1 340 | 9,152,78,34,171,34.2,0.893,33,1 341 | 7,178,84,0,0,39.9,0.331,41,1 342 | 1,130,70,13,105,25.9,0.472,22,0 343 | 1,95,74,21,73,25.9,0.673,36,0 344 | 1,0,68,35,0,32,0.389,22,0 345 | 5,122,86,0,0,34.7,0.29,33,0 346 | 8,95,72,0,0,36.8,0.485,57,0 347 | 8,126,88,36,108,38.5,0.349,49,0 348 | 1,139,46,19,83,28.7,0.654,22,0 349 | 3,116,0,0,0,23.5,0.187,23,0 350 | 3,99,62,19,74,21.8,0.279,26,0 351 | 5,0,80,32,0,41,0.346,37,1 352 | 4,92,80,0,0,42.2,0.237,29,0 353 | 4,137,84,0,0,31.2,0.252,30,0 354 | 3,61,82,28,0,34.4,0.243,46,0 355 | 1,90,62,12,43,27.2,0.58,24,0 356 | 3,90,78,0,0,42.7,0.559,21,0 357 | 9,165,88,0,0,30.4,0.302,49,1 358 | 1,125,50,40,167,33.3,0.962,28,1 359 | 13,129,0,30,0,39.9,0.569,44,1 360 | 12,88,74,40,54,35.3,0.378,48,0 361 | 1,196,76,36,249,36.5,0.875,29,1 362 | 5,189,64,33,325,31.2,0.583,29,1 363 | 5,158,70,0,0,29.8,0.207,63,0 364 | 5,103,108,37,0,39.2,0.305,65,0 365 | 4,146,78,0,0,38.5,0.52,67,1 366 | 4,147,74,25,293,34.9,0.385,30,0 367 | 5,99,54,28,83,34,0.499,30,0 368 | 6,124,72,0,0,27.6,0.368,29,1 369 | 0,101,64,17,0,21,0.252,21,0 370 | 3,81,86,16,66,27.5,0.306,22,0 371 | 1,133,102,28,140,32.8,0.234,45,1 372 | 3,173,82,48,465,38.4,2.137,25,1 373 | 0,118,64,23,89,0,1.731,21,0 374 | 0,84,64,22,66,35.8,0.545,21,0 375 | 2,105,58,40,94,34.9,0.225,25,0 376 | 2,122,52,43,158,36.2,0.816,28,0 377 | 12,140,82,43,325,39.2,0.528,58,1 378 | 0,98,82,15,84,25.2,0.299,22,0 379 | 1,87,60,37,75,37.2,0.509,22,0 380 | 4,156,75,0,0,48.3,0.238,32,1 381 | 0,93,100,39,72,43.4,1.021,35,0 382 | 1,107,72,30,82,30.8,0.821,24,0 383 | 0,105,68,22,0,20,0.236,22,0 384 | 1,109,60,8,182,25.4,0.947,21,0 385 | 1,90,62,18,59,25.1,1.268,25,0 386 | 1,125,70,24,110,24.3,0.221,25,0 387 | 1,119,54,13,50,22.3,0.205,24,0 388 | 5,116,74,29,0,32.3,0.66,35,1 389 | 
8,105,100,36,0,43.3,0.239,45,1 390 | 5,144,82,26,285,32,0.452,58,1 391 | 3,100,68,23,81,31.6,0.949,28,0 392 | 1,100,66,29,196,32,0.444,42,0 393 | 5,166,76,0,0,45.7,0.34,27,1 394 | 1,131,64,14,415,23.7,0.389,21,0 395 | 4,116,72,12,87,22.1,0.463,37,0 396 | 4,158,78,0,0,32.9,0.803,31,1 397 | 2,127,58,24,275,27.7,1.6,25,0 398 | 3,96,56,34,115,24.7,0.944,39,0 399 | 0,131,66,40,0,34.3,0.196,22,1 400 | 3,82,70,0,0,21.1,0.389,25,0 401 | 3,193,70,31,0,34.9,0.241,25,1 402 | 4,95,64,0,0,32,0.161,31,1 403 | 6,137,61,0,0,24.2,0.151,55,0 404 | 5,136,84,41,88,35,0.286,35,1 405 | 9,72,78,25,0,31.6,0.28,38,0 406 | 5,168,64,0,0,32.9,0.135,41,1 407 | 2,123,48,32,165,42.1,0.52,26,0 408 | 4,115,72,0,0,28.9,0.376,46,1 409 | 0,101,62,0,0,21.9,0.336,25,0 410 | 8,197,74,0,0,25.9,1.191,39,1 411 | 1,172,68,49,579,42.4,0.702,28,1 412 | 6,102,90,39,0,35.7,0.674,28,0 413 | 1,112,72,30,176,34.4,0.528,25,0 414 | 1,143,84,23,310,42.4,1.076,22,0 415 | 1,143,74,22,61,26.2,0.256,21,0 416 | 0,138,60,35,167,34.6,0.534,21,1 417 | 3,173,84,33,474,35.7,0.258,22,1 418 | 1,97,68,21,0,27.2,1.095,22,0 419 | 4,144,82,32,0,38.5,0.554,37,1 420 | 1,83,68,0,0,18.2,0.624,27,0 421 | 3,129,64,29,115,26.4,0.219,28,1 422 | 1,119,88,41,170,45.3,0.507,26,0 423 | 2,94,68,18,76,26,0.561,21,0 424 | 0,102,64,46,78,40.6,0.496,21,0 425 | 2,115,64,22,0,30.8,0.421,21,0 426 | 8,151,78,32,210,42.9,0.516,36,1 427 | 4,184,78,39,277,37,0.264,31,1 428 | 0,94,0,0,0,0,0.256,25,0 429 | 1,181,64,30,180,34.1,0.328,38,1 430 | 0,135,94,46,145,40.6,0.284,26,0 431 | 1,95,82,25,180,35,0.233,43,1 432 | 2,99,0,0,0,22.2,0.108,23,0 433 | 3,89,74,16,85,30.4,0.551,38,0 434 | 1,80,74,11,60,30,0.527,22,0 435 | 2,139,75,0,0,25.6,0.167,29,0 436 | 1,90,68,8,0,24.5,1.138,36,0 437 | 0,141,0,0,0,42.4,0.205,29,1 438 | 12,140,85,33,0,37.4,0.244,41,0 439 | 5,147,75,0,0,29.9,0.434,28,0 440 | 1,97,70,15,0,18.2,0.147,21,0 441 | 6,107,88,0,0,36.8,0.727,31,0 442 | 0,189,104,25,0,34.3,0.435,41,1 443 | 2,83,66,23,50,32.2,0.497,22,0 444 | 4,117,64,27,120,33.2,0.23,24,0 445 | 8,108,70,0,0,30.5,0.955,33,1 446 | 4,117,62,12,0,29.7,0.38,30,1 447 | 0,180,78,63,14,59.4,2.42,25,1 448 | 1,100,72,12,70,25.3,0.658,28,0 449 | 0,95,80,45,92,36.5,0.33,26,0 450 | 0,104,64,37,64,33.6,0.51,22,1 451 | 0,120,74,18,63,30.5,0.285,26,0 452 | 1,82,64,13,95,21.2,0.415,23,0 453 | 2,134,70,0,0,28.9,0.542,23,1 454 | 0,91,68,32,210,39.9,0.381,25,0 455 | 2,119,0,0,0,19.6,0.832,72,0 456 | 2,100,54,28,105,37.8,0.498,24,0 457 | 14,175,62,30,0,33.6,0.212,38,1 458 | 1,135,54,0,0,26.7,0.687,62,0 459 | 5,86,68,28,71,30.2,0.364,24,0 460 | 10,148,84,48,237,37.6,1.001,51,1 461 | 9,134,74,33,60,25.9,0.46,81,0 462 | 9,120,72,22,56,20.8,0.733,48,0 463 | 1,71,62,0,0,21.8,0.416,26,0 464 | 8,74,70,40,49,35.3,0.705,39,0 465 | 5,88,78,30,0,27.6,0.258,37,0 466 | 10,115,98,0,0,24,1.022,34,0 467 | 0,124,56,13,105,21.8,0.452,21,0 468 | 0,74,52,10,36,27.8,0.269,22,0 469 | 0,97,64,36,100,36.8,0.6,25,0 470 | 8,120,0,0,0,30,0.183,38,1 471 | 6,154,78,41,140,46.1,0.571,27,0 472 | 1,144,82,40,0,41.3,0.607,28,0 473 | 0,137,70,38,0,33.2,0.17,22,0 474 | 0,119,66,27,0,38.8,0.259,22,0 475 | 7,136,90,0,0,29.9,0.21,50,0 476 | 4,114,64,0,0,28.9,0.126,24,0 477 | 0,137,84,27,0,27.3,0.231,59,0 478 | 2,105,80,45,191,33.7,0.711,29,1 479 | 7,114,76,17,110,23.8,0.466,31,0 480 | 8,126,74,38,75,25.9,0.162,39,0 481 | 4,132,86,31,0,28,0.419,63,0 482 | 3,158,70,30,328,35.5,0.344,35,1 483 | 0,123,88,37,0,35.2,0.197,29,0 484 | 4,85,58,22,49,27.8,0.306,28,0 485 | 0,84,82,31,125,38.2,0.233,23,0 486 | 0,145,0,0,0,44.2,0.63,31,1 487 | 0,135,68,42,250,42.3,0.365,24,1 488 | 
1,139,62,41,480,40.7,0.536,21,0 489 | 0,173,78,32,265,46.5,1.159,58,0 490 | 4,99,72,17,0,25.6,0.294,28,0 491 | 8,194,80,0,0,26.1,0.551,67,0 492 | 2,83,65,28,66,36.8,0.629,24,0 493 | 2,89,90,30,0,33.5,0.292,42,0 494 | 4,99,68,38,0,32.8,0.145,33,0 495 | 4,125,70,18,122,28.9,1.144,45,1 496 | 3,80,0,0,0,0,0.174,22,0 497 | 6,166,74,0,0,26.6,0.304,66,0 498 | 5,110,68,0,0,26,0.292,30,0 499 | 2,81,72,15,76,30.1,0.547,25,0 500 | 7,195,70,33,145,25.1,0.163,55,1 501 | 6,154,74,32,193,29.3,0.839,39,0 502 | 2,117,90,19,71,25.2,0.313,21,0 503 | 3,84,72,32,0,37.2,0.267,28,0 504 | 6,0,68,41,0,39,0.727,41,1 505 | 7,94,64,25,79,33.3,0.738,41,0 506 | 3,96,78,39,0,37.3,0.238,40,0 507 | 10,75,82,0,0,33.3,0.263,38,0 508 | 0,180,90,26,90,36.5,0.314,35,1 509 | 1,130,60,23,170,28.6,0.692,21,0 510 | 2,84,50,23,76,30.4,0.968,21,0 511 | 8,120,78,0,0,25,0.409,64,0 512 | 12,84,72,31,0,29.7,0.297,46,1 513 | 0,139,62,17,210,22.1,0.207,21,0 514 | 9,91,68,0,0,24.2,0.2,58,0 515 | 2,91,62,0,0,27.3,0.525,22,0 516 | 3,99,54,19,86,25.6,0.154,24,0 517 | 3,163,70,18,105,31.6,0.268,28,1 518 | 9,145,88,34,165,30.3,0.771,53,1 519 | 7,125,86,0,0,37.6,0.304,51,0 520 | 13,76,60,0,0,32.8,0.18,41,0 521 | 6,129,90,7,326,19.6,0.582,60,0 522 | 2,68,70,32,66,25,0.187,25,0 523 | 3,124,80,33,130,33.2,0.305,26,0 524 | 6,114,0,0,0,0,0.189,26,0 525 | 9,130,70,0,0,34.2,0.652,45,1 526 | 3,125,58,0,0,31.6,0.151,24,0 527 | 3,87,60,18,0,21.8,0.444,21,0 528 | 1,97,64,19,82,18.2,0.299,21,0 529 | 3,116,74,15,105,26.3,0.107,24,0 530 | 0,117,66,31,188,30.8,0.493,22,0 531 | 0,111,65,0,0,24.6,0.66,31,0 532 | 2,122,60,18,106,29.8,0.717,22,0 533 | 0,107,76,0,0,45.3,0.686,24,0 534 | 1,86,66,52,65,41.3,0.917,29,0 535 | 6,91,0,0,0,29.8,0.501,31,0 536 | 1,77,56,30,56,33.3,1.251,24,0 537 | 4,132,0,0,0,32.9,0.302,23,1 538 | 0,105,90,0,0,29.6,0.197,46,0 539 | 0,57,60,0,0,21.7,0.735,67,0 540 | 0,127,80,37,210,36.3,0.804,23,0 541 | 3,129,92,49,155,36.4,0.968,32,1 542 | 8,100,74,40,215,39.4,0.661,43,1 543 | 3,128,72,25,190,32.4,0.549,27,1 544 | 10,90,85,32,0,34.9,0.825,56,1 545 | 4,84,90,23,56,39.5,0.159,25,0 546 | 1,88,78,29,76,32,0.365,29,0 547 | 8,186,90,35,225,34.5,0.423,37,1 548 | 5,187,76,27,207,43.6,1.034,53,1 549 | 4,131,68,21,166,33.1,0.16,28,0 550 | 1,164,82,43,67,32.8,0.341,50,0 551 | 4,189,110,31,0,28.5,0.68,37,0 552 | 1,116,70,28,0,27.4,0.204,21,0 553 | 3,84,68,30,106,31.9,0.591,25,0 554 | 6,114,88,0,0,27.8,0.247,66,0 555 | 1,88,62,24,44,29.9,0.422,23,0 556 | 1,84,64,23,115,36.9,0.471,28,0 557 | 7,124,70,33,215,25.5,0.161,37,0 558 | 1,97,70,40,0,38.1,0.218,30,0 559 | 8,110,76,0,0,27.8,0.237,58,0 560 | 11,103,68,40,0,46.2,0.126,42,0 561 | 11,85,74,0,0,30.1,0.3,35,0 562 | 6,125,76,0,0,33.8,0.121,54,1 563 | 0,198,66,32,274,41.3,0.502,28,1 564 | 1,87,68,34,77,37.6,0.401,24,0 565 | 6,99,60,19,54,26.9,0.497,32,0 566 | 0,91,80,0,0,32.4,0.601,27,0 567 | 2,95,54,14,88,26.1,0.748,22,0 568 | 1,99,72,30,18,38.6,0.412,21,0 569 | 6,92,62,32,126,32,0.085,46,0 570 | 4,154,72,29,126,31.3,0.338,37,0 571 | 0,121,66,30,165,34.3,0.203,33,1 572 | 3,78,70,0,0,32.5,0.27,39,0 573 | 2,130,96,0,0,22.6,0.268,21,0 574 | 3,111,58,31,44,29.5,0.43,22,0 575 | 2,98,60,17,120,34.7,0.198,22,0 576 | 1,143,86,30,330,30.1,0.892,23,0 577 | 1,119,44,47,63,35.5,0.28,25,0 578 | 6,108,44,20,130,24,0.813,35,0 579 | 2,118,80,0,0,42.9,0.693,21,1 580 | 10,133,68,0,0,27,0.245,36,0 581 | 2,197,70,99,0,34.7,0.575,62,1 582 | 0,151,90,46,0,42.1,0.371,21,1 583 | 6,109,60,27,0,25,0.206,27,0 584 | 12,121,78,17,0,26.5,0.259,62,0 585 | 8,100,76,0,0,38.7,0.19,42,0 586 | 8,124,76,24,600,28.7,0.687,52,1 587 | 
1,93,56,11,0,22.5,0.417,22,0 588 | 8,143,66,0,0,34.9,0.129,41,1 589 | 6,103,66,0,0,24.3,0.249,29,0 590 | 3,176,86,27,156,33.3,1.154,52,1 591 | 0,73,0,0,0,21.1,0.342,25,0 592 | 11,111,84,40,0,46.8,0.925,45,1 593 | 2,112,78,50,140,39.4,0.175,24,0 594 | 3,132,80,0,0,34.4,0.402,44,1 595 | 2,82,52,22,115,28.5,1.699,25,0 596 | 6,123,72,45,230,33.6,0.733,34,0 597 | 0,188,82,14,185,32,0.682,22,1 598 | 0,67,76,0,0,45.3,0.194,46,0 599 | 1,89,24,19,25,27.8,0.559,21,0 600 | 1,173,74,0,0,36.8,0.088,38,1 601 | 1,109,38,18,120,23.1,0.407,26,0 602 | 1,108,88,19,0,27.1,0.4,24,0 603 | 6,96,0,0,0,23.7,0.19,28,0 604 | 1,124,74,36,0,27.8,0.1,30,0 605 | 7,150,78,29,126,35.2,0.692,54,1 606 | 4,183,0,0,0,28.4,0.212,36,1 607 | 1,124,60,32,0,35.8,0.514,21,0 608 | 1,181,78,42,293,40,1.258,22,1 609 | 1,92,62,25,41,19.5,0.482,25,0 610 | 0,152,82,39,272,41.5,0.27,27,0 611 | 1,111,62,13,182,24,0.138,23,0 612 | 3,106,54,21,158,30.9,0.292,24,0 613 | 3,174,58,22,194,32.9,0.593,36,1 614 | 7,168,88,42,321,38.2,0.787,40,1 615 | 6,105,80,28,0,32.5,0.878,26,0 616 | 11,138,74,26,144,36.1,0.557,50,1 617 | 3,106,72,0,0,25.8,0.207,27,0 618 | 6,117,96,0,0,28.7,0.157,30,0 619 | 2,68,62,13,15,20.1,0.257,23,0 620 | 9,112,82,24,0,28.2,1.282,50,1 621 | 0,119,0,0,0,32.4,0.141,24,1 622 | 2,112,86,42,160,38.4,0.246,28,0 623 | 2,92,76,20,0,24.2,1.698,28,0 624 | 6,183,94,0,0,40.8,1.461,45,0 625 | 0,94,70,27,115,43.5,0.347,21,0 626 | 2,108,64,0,0,30.8,0.158,21,0 627 | 4,90,88,47,54,37.7,0.362,29,0 628 | 0,125,68,0,0,24.7,0.206,21,0 629 | 0,132,78,0,0,32.4,0.393,21,0 630 | 5,128,80,0,0,34.6,0.144,45,0 631 | 4,94,65,22,0,24.7,0.148,21,0 632 | 7,114,64,0,0,27.4,0.732,34,1 633 | 0,102,78,40,90,34.5,0.238,24,0 634 | 2,111,60,0,0,26.2,0.343,23,0 635 | 1,128,82,17,183,27.5,0.115,22,0 636 | 10,92,62,0,0,25.9,0.167,31,0 637 | 13,104,72,0,0,31.2,0.465,38,1 638 | 5,104,74,0,0,28.8,0.153,48,0 639 | 2,94,76,18,66,31.6,0.649,23,0 640 | 7,97,76,32,91,40.9,0.871,32,1 641 | 1,100,74,12,46,19.5,0.149,28,0 642 | 0,102,86,17,105,29.3,0.695,27,0 643 | 4,128,70,0,0,34.3,0.303,24,0 644 | 6,147,80,0,0,29.5,0.178,50,1 645 | 4,90,0,0,0,28,0.61,31,0 646 | 3,103,72,30,152,27.6,0.73,27,0 647 | 2,157,74,35,440,39.4,0.134,30,0 648 | 1,167,74,17,144,23.4,0.447,33,1 649 | 0,179,50,36,159,37.8,0.455,22,1 650 | 11,136,84,35,130,28.3,0.26,42,1 651 | 0,107,60,25,0,26.4,0.133,23,0 652 | 1,91,54,25,100,25.2,0.234,23,0 653 | 1,117,60,23,106,33.8,0.466,27,0 654 | 5,123,74,40,77,34.1,0.269,28,0 655 | 2,120,54,0,0,26.8,0.455,27,0 656 | 1,106,70,28,135,34.2,0.142,22,0 657 | 2,155,52,27,540,38.7,0.24,25,1 658 | 2,101,58,35,90,21.8,0.155,22,0 659 | 1,120,80,48,200,38.9,1.162,41,0 660 | 11,127,106,0,0,39,0.19,51,0 661 | 3,80,82,31,70,34.2,1.292,27,1 662 | 10,162,84,0,0,27.7,0.182,54,0 663 | 1,199,76,43,0,42.9,1.394,22,1 664 | 8,167,106,46,231,37.6,0.165,43,1 665 | 9,145,80,46,130,37.9,0.637,40,1 666 | 6,115,60,39,0,33.7,0.245,40,1 667 | 1,112,80,45,132,34.8,0.217,24,0 668 | 4,145,82,18,0,32.5,0.235,70,1 669 | 10,111,70,27,0,27.5,0.141,40,1 670 | 6,98,58,33,190,34,0.43,43,0 671 | 9,154,78,30,100,30.9,0.164,45,0 672 | 6,165,68,26,168,33.6,0.631,49,0 673 | 1,99,58,10,0,25.4,0.551,21,0 674 | 10,68,106,23,49,35.5,0.285,47,0 675 | 3,123,100,35,240,57.3,0.88,22,0 676 | 8,91,82,0,0,35.6,0.587,68,0 677 | 6,195,70,0,0,30.9,0.328,31,1 678 | 9,156,86,0,0,24.8,0.23,53,1 679 | 0,93,60,0,0,35.3,0.263,25,0 680 | 3,121,52,0,0,36,0.127,25,1 681 | 2,101,58,17,265,24.2,0.614,23,0 682 | 2,56,56,28,45,24.2,0.332,22,0 683 | 0,162,76,36,0,49.6,0.364,26,1 684 | 0,95,64,39,105,44.6,0.366,22,0 685 | 
4,125,80,0,0,32.3,0.536,27,1 686 | 5,136,82,0,0,0,0.64,69,0 687 | 2,129,74,26,205,33.2,0.591,25,0 688 | 3,130,64,0,0,23.1,0.314,22,0 689 | 1,107,50,19,0,28.3,0.181,29,0 690 | 1,140,74,26,180,24.1,0.828,23,0 691 | 1,144,82,46,180,46.1,0.335,46,1 692 | 8,107,80,0,0,24.6,0.856,34,0 693 | 13,158,114,0,0,42.3,0.257,44,1 694 | 2,121,70,32,95,39.1,0.886,23,0 695 | 7,129,68,49,125,38.5,0.439,43,1 696 | 2,90,60,0,0,23.5,0.191,25,0 697 | 7,142,90,24,480,30.4,0.128,43,1 698 | 3,169,74,19,125,29.9,0.268,31,1 699 | 0,99,0,0,0,25,0.253,22,0 700 | 4,127,88,11,155,34.5,0.598,28,0 701 | 4,118,70,0,0,44.5,0.904,26,0 702 | 2,122,76,27,200,35.9,0.483,26,0 703 | 6,125,78,31,0,27.6,0.565,49,1 704 | 1,168,88,29,0,35,0.905,52,1 705 | 2,129,0,0,0,38.5,0.304,41,0 706 | 4,110,76,20,100,28.4,0.118,27,0 707 | 6,80,80,36,0,39.8,0.177,28,0 708 | 10,115,0,0,0,0,0.261,30,1 709 | 2,127,46,21,335,34.4,0.176,22,0 710 | 9,164,78,0,0,32.8,0.148,45,1 711 | 2,93,64,32,160,38,0.674,23,1 712 | 3,158,64,13,387,31.2,0.295,24,0 713 | 5,126,78,27,22,29.6,0.439,40,0 714 | 10,129,62,36,0,41.2,0.441,38,1 715 | 0,134,58,20,291,26.4,0.352,21,0 716 | 3,102,74,0,0,29.5,0.121,32,0 717 | 7,187,50,33,392,33.9,0.826,34,1 718 | 3,173,78,39,185,33.8,0.97,31,1 719 | 10,94,72,18,0,23.1,0.595,56,0 720 | 1,108,60,46,178,35.5,0.415,24,0 721 | 5,97,76,27,0,35.6,0.378,52,1 722 | 4,83,86,19,0,29.3,0.317,34,0 723 | 1,114,66,36,200,38.1,0.289,21,0 724 | 1,149,68,29,127,29.3,0.349,42,1 725 | 5,117,86,30,105,39.1,0.251,42,0 726 | 1,111,94,0,0,32.8,0.265,45,0 727 | 4,112,78,40,0,39.4,0.236,38,0 728 | 1,116,78,29,180,36.1,0.496,25,0 729 | 0,141,84,26,0,32.4,0.433,22,0 730 | 2,175,88,0,0,22.9,0.326,22,0 731 | 2,92,52,0,0,30.1,0.141,22,0 732 | 3,130,78,23,79,28.4,0.323,34,1 733 | 8,120,86,0,0,28.4,0.259,22,1 734 | 2,174,88,37,120,44.5,0.646,24,1 735 | 2,106,56,27,165,29,0.426,22,0 736 | 2,105,75,0,0,23.3,0.56,53,0 737 | 4,95,60,32,0,35.4,0.284,28,0 738 | 0,126,86,27,120,27.4,0.515,21,0 739 | 8,65,72,23,0,32,0.6,42,0 740 | 2,99,60,17,160,36.6,0.453,21,0 741 | 1,102,74,0,0,39.5,0.293,42,1 742 | 11,120,80,37,150,42.3,0.785,48,1 743 | 3,102,44,20,94,30.8,0.4,26,0 744 | 1,109,58,18,116,28.5,0.219,22,0 745 | 9,140,94,0,0,32.7,0.734,45,1 746 | 13,153,88,37,140,40.6,1.174,39,0 747 | 12,100,84,33,105,30,0.488,46,0 748 | 1,147,94,41,0,49.3,0.358,27,1 749 | 1,81,74,41,57,46.3,1.096,32,0 750 | 3,187,70,22,200,36.4,0.408,36,1 751 | 6,162,62,0,0,24.3,0.178,50,1 752 | 4,136,70,0,0,31.2,1.182,22,1 753 | 1,121,78,39,74,39,0.261,28,0 754 | 3,108,62,24,0,26,0.223,25,0 755 | 0,181,88,44,510,43.3,0.222,26,1 756 | 8,154,78,32,0,32.4,0.443,45,1 757 | 1,128,88,39,110,36.5,1.057,37,1 758 | 7,137,90,41,0,32,0.391,39,0 759 | 0,123,72,0,0,36.3,0.258,52,1 760 | 1,106,76,0,0,37.5,0.197,26,0 761 | 6,190,92,0,0,35.5,0.278,66,1 762 | 2,88,58,26,16,28.4,0.766,22,0 763 | 9,170,74,31,0,44,0.403,43,1 764 | 9,89,62,0,0,22.5,0.142,33,0 765 | 10,101,76,48,180,32.9,0.171,63,0 766 | 2,122,70,27,0,36.8,0.34,27,0 767 | 5,121,72,23,112,26.2,0.245,30,0 768 | 1,126,60,0,0,30.1,0.349,47,1 769 | 1,93,70,31,0,30.4,0.315,23,0 770 | -------------------------------------------------------------------------------- /notebooks/Applied_Machine_Learning_Ensemble_Modeling_Learners.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "python_live_session_template.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "display_name": "Python 3", 11 | "language": "python", 12 | "name": 
"python3" 13 | }, 14 | "language_info": { 15 | "codemirror_mode": { 16 | "name": "ipython", 17 | "version": 3 18 | }, 19 | "file_extension": ".py", 20 | "mimetype": "text/x-python", 21 | "name": "python", 22 | "nbconvert_exporter": "python", 23 | "pygments_lexer": "ipython3", 24 | "version": "3.7.1" 25 | } 26 | }, 27 | "cells": [ 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "colab_type": "text", 32 | "id": "6Ijg5wUCTQYG" 33 | }, 34 | "source": [ 35 | "
<center>\n", 36 | "<img src=\"https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/datacamp.svg\" alt=\"DataCamp icon\" width=\"50%\">\n", 37 | "</center>\n", 38 | "<br>
\n", 39 | "\n", 40 | "\n", 41 | "## **Applied Machine Learning - Ensemble Modeling Live Training**\n", 42 | "\n", 43 | "Welcome to this hands-on training where you will immerse yourself in applied machine learning in Python where we'll explore model stacking. Using `sklearn.ensemble`, we'll learn how to create layers that are stacking-ready.\n", 44 | "\n", 45 | "The foundations of model stacking:\n", 46 | "\n", 47 | "* Create various types of baseline models, including linear and logistic regression using Scikit-Learn, for comparison to ensemble methods.\n", 48 | "* Build layers, then stack them up.\n", 49 | "* Calculate and visualize performance metrics.\n", 50 | "\n", 51 | "\n", 52 | "\n", 53 | "---\n", 54 | "\n", 55 | "\n", 56 | "\n", 57 | "## **1st Dataset**\n", 58 | "\n", 59 | "\n", 60 | "The first dataset we'll use is a CSV file named `pima-indians-diabetes.csv`, which contains data on females of Pima Indian heritage that are at least 21 years old. It contains the following columns:\n", 61 | "\n", 62 | "- `n_preg`: Number of pregnancies\n", 63 | "- `pl_glucose`: Plasma glucose concentration 2 hours after an oral glucose tolerance test\n", 64 | "- `dia_bp`: Diastolic blood pressure (mm Hg)\n", 65 | "- `tri_thick`: Triceps skin fold thickness (mm)\n", 66 | "- `serum_ins`: 2-Hour serum insulin (mu U/ml)\n", 67 | "- `bmi`: Body mass index (weight in kg/(height in m)^2)\n", 68 | "- `diab_ped`: Diabetes pedigree function\n", 69 | "- `age`: Age (years)\n", 70 | "- `class`: Class variable (0 or 1)\n" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "metadata": { 76 | "colab_type": "code", 77 | "id": "EMQfyC7GUNhT", 78 | "colab": {} 79 | }, 80 | "source": [ 81 | "# Import libraries\n", 82 | "import pandas as pd\n", 83 | "import numpy as np\n", 84 | "from numpy import mean\n", 85 | "from numpy import std\n", 86 | "import matplotlib.pyplot as plt\n", 87 | "import seaborn as sns\n", 88 | "from collections import Counter\n", 89 | "from sklearn.preprocessing import LabelEncoder\n", 90 | "from sklearn.model_selection import cross_val_score\n", 91 | "from sklearn.model_selection import RepeatedStratifiedKFold\n", 92 | "from sklearn.dummy import DummyClassifier\n", 93 | "from sklearn.tree import DecisionTreeClassifier" 94 | ], 95 | "execution_count": null, 96 | "outputs": [] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "metadata": { 101 | "colab_type": "code", 102 | "id": "l8t_EwRNZPLB", 103 | "colab": {} 104 | }, 105 | "source": [ 106 | "# Read in the dataset as Pandas DataFrame\n", 107 | "diabetes = pd.read_csv('https://github.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/blob/master/data/pima-indians-diabetes.csv?raw=true')" 108 | ], 109 | "execution_count": null, 110 | "outputs": [] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "metadata": { 115 | "id": "PRJPuinPZpGA", 116 | "colab_type": "code", 117 | "colab": {} 118 | }, 119 | "source": [ 120 | "# Look at data using the info() function\n" 121 | ], 122 | "execution_count": null, 123 | "outputs": [] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": { 128 | "id": "C6OVOkU80oKP", 129 | "colab_type": "text" 130 | }, 131 | "source": [ 132 | "## **Observations:** \n", 133 | "- The `info()` function is critical to beginning to understand your data. Here, there are no missing values. However, that is not typical.\n", 134 | "- There is a mixture of integers and floats with the first 5 columns being `int64`, the next 2 `float64` and the last 2 'int64`." 
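,
"\n",
"A quick, illustrative check of the dtype mix described above (not part of the original notebook; `diabetes` is the DataFrame loaded earlier):\n",
"\n",
"```python\n",
"print(diabetes.dtypes)\n",
"```"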
135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": { 140 | "id": "v3hAsYrhVi4L", 141 | "colab_type": "text" 142 | }, 143 | "source": [ 144 | "---\n", 145 | "\n", 146 | "## Q&A\n", 147 | "\n", 148 | "--- " 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "metadata": { 154 | "id": "E6UtlpG_Zo50", 155 | "colab_type": "code", 156 | "colab": {} 157 | }, 158 | "source": [ 159 | "# Look at data using the describe() function\n" 160 | ], 161 | "execution_count": null, 162 | "outputs": [] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "id": "bCK9W_gk1HG8", 168 | "colab_type": "text" 169 | }, 170 | "source": [ 171 | "\n", 172 | "## **Observations:** \n", 173 | "- The `.describe()` function gives the summary statistics of the data. Notice that the min of the 1st six columns is zero. Even though there are no missing values, this is indicative of the measurements for those features having not been captured.\n", 174 | "- Although we previously saw there is a mixture of integer and float data types (as seen with `.info()`), the printout makes it appear as if all values are float. " 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "metadata": { 180 | "id": "UE5F_JUQ2X-0", 181 | "colab_type": "code", 182 | "colab": {} 183 | }, 184 | "source": [ 185 | "# Print the first 5 rows of the data using the head() function\n" 186 | ], 187 | "execution_count": null, 188 | "outputs": [] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "id": "A2VCIx0K2bT1", 194 | "colab_type": "text" 195 | }, 196 | "source": [ 197 | "\n", 198 | "## **Observation:**\n", 199 | "- Printing out the first 5 rows, we see that the data types of the columns are indeed as stated previously." 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "id": "ajAzhMDc2b1D", 206 | "colab_type": "text" 207 | }, 208 | "source": [ 209 | "## Let's check the number in each class:\n", 210 | "\n", 211 | "This avoids getting surprised by great results that are actually a side effect of class imbalance. This happens when the majority class far outweighs the minority class." 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "metadata": { 217 | "id": "MKeXN3441-9W", 218 | "colab_type": "code", 219 | "colab": {} 220 | }, 221 | "source": [ 222 | "# Summarize class distribution\n", 223 | "target = diabetes['class']\n", 224 | "counter = Counter(target)\n", 225 | "print(counter)" 226 | ], 227 | "execution_count": null, 228 | "outputs": [] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": { 233 | "id": "FOpbGyQw55v3", 234 | "colab_type": "text" 235 | }, 236 | "source": [ 237 | "## **Observation:** For every two negative cases there is one positive case, not enough of a difference to be considered class imbalance. \n", 238 | "- Class imbalance tends to exist when the majority class is > 90% although there is no hard and fast rule about this threshold." 
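,
"\n",
"The class proportions can also be read off directly (an illustrative addition, not in the original notebook):\n",
"\n",
"```python\n",
"print(diabetes['class'].value_counts(normalize=True))  # roughly 0.65 negative vs 0.35 positive\n",
"```"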
239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "metadata": { 244 | "id": "n5XaYl9ZZ8B5", 245 | "colab_type": "code", 246 | "colab": {} 247 | }, 248 | "source": [ 249 | "# Convert Pandas DataFrame to numpy array - Return only the values of the DataFrame with DataFrame.to_numpy()\n" 250 | ], 251 | "execution_count": null, 252 | "outputs": [] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": { 257 | "id": "MlGa9IBc7Gsr", 258 | "colab_type": "text" 259 | }, 260 | "source": [ 261 | "### Always verify that your X matrix and target array have the same number of rows to avoid errors during model training." 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "metadata": { 267 | "id": "9FEvD6Ab6InP", 268 | "colab_type": "code", 269 | "colab": {} 270 | }, 271 | "source": [ 272 | "# Create X matrix and y (target) array using slicing [row_start:row_end, col_start:target_col],[row_start:row_end, target_col]\n", 273 | "\n", 274 | "\n", 275 | "# Print X matrix and y (target) array dimensions using .shape \n" 276 | ], 277 | "execution_count": null, 278 | "outputs": [] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "metadata": { 283 | "id": "hoI7t4U-Z8LU", 284 | "colab_type": "code", 285 | "colab": {} 286 | }, 287 | "source": [ 288 | "# Convert X matrix data types to 'float32' for consistency using .astype()\n", 289 | "\n", 290 | "\n", 291 | "# Convert y (target) array to 'str' using .astype()\n", 292 | "\n", 293 | "\n", 294 | "# Encode class labels in y array using dot notation with LabelEncoder().fit_transform()\n", 295 | "# Hint: y goes in the fit_transform function call\n" 296 | ], 297 | "execution_count": null, 298 | "outputs": [] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": { 303 | "id": "djXWv2xp9v1q", 304 | "colab_type": "text" 305 | }, 306 | "source": [ 307 | "### Don't let the `.astype('str')` throw you! This is simply taking the class labels and label encoding them – regardless of their original format.\n", 308 | "\n", 309 | "\n" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": { 315 | "id": "OHHu8uz7_yVa", 316 | "colab_type": "text" 317 | }, 318 | "source": [ 319 | "## **Creating a Naive Classifier**\n", 320 | "Here we'll use the `DummyClassifier` from `sklearn`. This creates a so-called 'naive' classifier and is simply a model that predicts a single class for all of the rows, regardless of their original class. \n", 321 | "\n", 322 | "1. `DummyClassifier()` arguments:\n", 323 | " - `strategy`: Strategy to use to generate predictions.\n", 324 | "\n", 325 | "2. `RepeatedStratifiedKFold()` arguments:\n", 326 | " - `n_splits`: Number of folds.\n", 327 | " - `n_repeats`: Number of times cross-validator needs to be repeated.\n", 328 | " - `random_state`: Controls the generation of the random states for each repetition. Pass an int for reproducible output across multiple function calls. (This is an equivalent argument to np.random.seed above, but will be specific to this naive model.)\n", 329 | "\n", 330 | "3. `cross_val_score()` arguments:\n", 331 | " - The model to use.\n", 332 | " - The data to fit. (X)\n", 333 | " - The target variable to try to predict. (y)\n", 334 | " - `scoring`: A single string scorer callable object/function such as 'accuracy' or 'roc_auc'. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more options.\n", 335 | " - `cv`: Cross-validation splitting strategy (default is 5)\n", 336 | " - `n_jobs`: Number of CPU cores used when parallelizing. Setting it to -1 uses all available cores.\n", 337 | " - `error_score`: Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "metadata": { 343 | "id": "BL4huFGPZ8RA", 344 | "colab_type": "code", 345 | "colab": {} 346 | }, 347 | "source": [ 348 | "# Evaluate naive\n", 349 | "\n", 350 | "# Instantiate a DummyClassifier with 'most_frequent' strategy\n", 351 | "naive = \n", 352 | "\n", 353 | "# Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 354 | "cv = \n", 355 | "\n", 356 | "# Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator, n_jobs=-1, and error_score set to 'raise'\n", 357 | "n_scores = \n", 358 | "\n", 359 | "# Print mean and standard deviation of n_scores: \n", 360 | "print('Naive score: %.3f (%.3f)' % (mean(), std()))\n" 361 | ], 362 | "execution_count": null, 363 | "outputs": [] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "id": "2tEgwsOfsoB6", 369 | "colab_type": "text" 370 | }, 371 | "source": [ 372 | "## **Observation** \n", 373 | "- We want to do better than 65% accuracy to consider any other models as an improvement to a totally naive model." 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": { 379 | "id": "l8QZOyg8s1eQ", 380 | "colab_type": "text" 381 | }, 382 | "source": [ 383 | "## **Creating a Baseline Classifier**\n", 384 | "Now we'll create a baseline classifier, one that seeks to correctly predict the class that each observation belongs to. Since the target variable is binary, we'll instantiate a `DecisionTreeClassifier` model. " 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "metadata": { 390 | "id": "QczFUGSfbQvl", 391 | "colab_type": "code", 392 | "colab": {} 393 | }, 394 | "source": [ 395 | "# Evaluate baseline model\n", 396 | "\n", 397 | "# Instantiate a DecisionTreeClassifier\n", 398 | "model = \n", 399 | "\n", 400 | "# Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', and error_score set to 'raise'\n", 401 | "m_scores = \n", 402 | "\n", 403 | "# Print mean and standard deviation of m_scores: \n", 404 | "print('Baseline score: %.3f (%.3f)' % (mean(), std()))" 405 | ], 406 | "execution_count": null, 407 | "outputs": [] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": { 412 | "id": "GRUBiqqmtNA6", 413 | "colab_type": "text" 414 | }, 415 | "source": [ 416 | "## **Observation**\n", 417 | "- We want to do better than 70% with a Stacking Classifier to consider it an improvement over this baseline Decision Tree model." 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": { 423 | "colab_type": "text", 424 | "id": "BMYfcKeDY85K" 425 | }, 426 | "source": [ 427 | "## **Getting started with Stacking Classifier**\n", 428 | "\n", 429 | "- We're going to compare several additional baseline classifiers to see if they perform better than the Decision Tree Classifier we just trained previously.\n" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": { 435 | "id": "T2pwEXnQBEFf", 436 | "colab_type": "text" 437 | }, 438 | "source": [ 439 | "<p align=\"center\">\n", 440 | "<img src=\"https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/stacking.png\" alt=\"Stacking\">\n", 441 | "</p>\n", 442 | "
\n", 443 | "\n", 444 | "- We'll start by importing additional packages that we'll need." 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "metadata": { 450 | "id": "eHCHmx7k5NeT", 451 | "colab_type": "code", 452 | "colab": {} 453 | }, 454 | "source": [ 455 | "# Import several other classifiers for ensemble\n", 456 | "from sklearn.neighbors import KNeighborsClassifier\n", 457 | "from sklearn.svm import SVC\n", 458 | "from sklearn.naive_bayes import GaussianNB\n", 459 | "from sklearn.linear_model import LogisticRegression\n", 460 | "from sklearn.ensemble import StackingClassifier" 461 | ], 462 | "execution_count": null, 463 | "outputs": [] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": { 468 | "id": "teQMB0aWxhcN", 469 | "colab_type": "text" 470 | }, 471 | "source": [ 472 | "## Create custom functions\n", 473 | "1. get_stacking() - This function will create the layers of our `StackingClassifier()`.\n", 474 | "2. get_models() - This function will create a dictionary of models to be evaluated.\n", 475 | "3. evaluate_model() - This function will evaluate each of the models to be compared." 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": { 481 | "id": "wqtHxQFPvMqu", 482 | "colab_type": "text" 483 | }, 484 | "source": [ 485 | "## Custom function # 1: get_stacking()\n", 486 | "1. `StackingClassifier()` arguments:\n", 487 | " - `estimators`: List of baseline classifiers\n", 488 | " - `final_estimator`: Defined meta classifier \n", 489 | " - `cv`: Number of cross validations to perform." 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "metadata": { 495 | "id": "YFhBv6jR6FOe", 496 | "colab_type": "code", 497 | "colab": {} 498 | }, 499 | "source": [ 500 | "# Define get_stacking():\n", 501 | "def :\n", 502 | "\n", 503 | "\t# Create an empty list for the base models called layer1\n", 504 | " \n", 505 | "\n", 506 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsClassifier, SVC, and GaussianNB base models\n", 507 | " # Hint: layer1.append(('ModelName', Classifier()))\n", 508 | " \n", 509 | "\n", 510 | " # Instantiate Logistic Regression as meta learner model called layer2\n", 511 | " \n", 512 | "\n", 513 | "\t# Define StackingClassifier() called model passing layer1 model list and meta learner with 5 cross-validations\n", 514 | " \n", 515 | "\n", 516 | " # return model\n", 517 | " " 518 | ], 519 | "execution_count": null, 520 | "outputs": [] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": { 525 | "id": "d5szw9liyaxp", 526 | "colab_type": "text" 527 | }, 528 | "source": [ 529 | "## Custom function # 2: get_models()" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "metadata": { 535 | "id": "0hEJlDLB4kv5", 536 | "colab_type": "code", 537 | "colab": {} 538 | }, 539 | "source": [ 540 | "# Define get_models():\n", 541 | "def :\n", 542 | "\n", 543 | " # Create empty dictionary called models\n", 544 | " \n", 545 | "\n", 546 | " # Add key:value pairs to dictionary with key as ModelName and value as instantiations (no arguments) for KNeighborsClassifier, SVC, and GaussianNB base models\n", 547 | " # Hint: models['ModelName'] = Classifier()\n", 548 | " \n", 549 | "\n", 550 | " # Add key:value pair to dictionary with key called Stacking and value that calls get_stacking() custom function\n", 551 | " \n", 552 | "\n", 553 | " # return dictionary\n", 554 | " " 555 | ], 556 | "execution_count": null, 557 | "outputs": [] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": { 562 
| "id": "flSG4dH1zCTK", 563 | "colab_type": "text" 564 | }, 565 | "source": [ 566 | "## Custom function # 3: evaluate_model(model)" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "metadata": { 572 | "id": "mGLKRr0j5Nit", 573 | "colab_type": "code", 574 | "colab": {} 575 | }, 576 | "source": [ 577 | "# Define evaluate_model:\n", 578 | "def :\n", 579 | "\n", 580 | " # Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 42.\n", 581 | " cv = \n", 582 | "\n", 583 | " # Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 584 | " scores = \n", 585 | "\n", 586 | " # return scores\n", 587 | " " 588 | ], 589 | "execution_count": null, 590 | "outputs": [] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "metadata": { 595 | "id": "Y5wmC-TH7B7E", 596 | "colab_type": "code", 597 | "colab": {} 598 | }, 599 | "source": [ 600 | "# Assign get_models() to a variable called models\n" 601 | ], 602 | "execution_count": null, 603 | "outputs": [] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": { 608 | "id": "02tyK34l2eh7", 609 | "colab_type": "text" 610 | }, 611 | "source": [ 612 | "## Python Dictionary Review:\n", 613 | "- The items() method is used to return the list with all dictionary keys with values. Parameters: This method takes no parameters. Returns: A view object that displays a list of a given dictionary's (key, value) tuple pair.\n", 614 | "- For our purposes, we'll use the dictionary created when we call the get_models() custom function in a for loop to iterate over each key:value pair and store the results.\n", 615 | "- Then, we will plot the results as a `boxplot` for comparison using `seaborn`.\n", 616 | "\n", 617 | "1. `sns.boxplot()` arguments:\n", 618 | " - `x`: Names of the variables in the data\n", 619 | " - `y`: Names of the variables in the data\n", 620 | " - `showmeans`: Whether or not to show mark at the mean of the data." 
621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "metadata": { 626 | "id": "QzXmYt1o6FWh", 627 | "colab_type": "code", 628 | "colab": {} 629 | }, 630 | "source": [ 631 | "# Evaluate the models and store results\n", 632 | "# Create an empty list for the results\n", 633 | "results =\n", 634 | "\n", 635 | "# Create an empty list for the model names\n", 636 | "names = \n", 637 | "\n", 638 | "# Create a for loop that iterates over each name, model in models dictionary \n", 639 | "for :\n", 640 | "\n", 641 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 642 | "\t\n", 643 | " \n", 644 | " # Append output from scores to the results list\n", 645 | "\t\n", 646 | " \n", 647 | " # Append name to the names list\n", 648 | "\t\n", 649 | " \n", 650 | " # Print name, mean and standard deviation of scores:\n", 651 | "\tprint('>%s %.3f (%.3f)' % (, mean(), std()))\n", 652 | " \n", 653 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 654 | "sns.boxplot(x=, y=, )" 655 | ], 656 | "execution_count": null, 657 | "outputs": [] 658 | }, 659 | { 660 | "cell_type": "markdown", 661 | "metadata": { 662 | "id": "xUqeWsol5RAt", 663 | "colab_type": "text" 664 | }, 665 | "source": [ 666 | "## **Observation**\n", 667 | "- Recall that we want to do better than 70% with a Stacking Classifier to consider it an improvement over the Decision Tree baseline model and, although we did achieve that, we can probably do even better with this dataset. \n", 668 | "- Let's try some hyperparameter tuning via cross-validation next..." 669 | ] 670 | }, 671 | { 672 | "cell_type": "markdown", 673 | "metadata": { 674 | "id": "xwc_6_Qf4amu", 675 | "colab_type": "text" 676 | }, 677 | "source": [ 678 | "---\n", 679 | "\n", 680 | "## Q&A\n", 681 | "\n", 682 | "--- \n" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "metadata": { 688 | "id": "yMZ8gTb6LGCP", 689 | "colab_type": "code", 690 | "colab": {} 691 | }, 692 | "source": [ 693 | "# Import additional libraries\n", 694 | "from xgboost import XGBClassifier \n", 695 | "from sklearn.ensemble import RandomForestClassifier\n", 696 | "from sklearn.preprocessing import StandardScaler\n", 697 | "from sklearn.pipeline import Pipeline\n", 698 | "from sklearn.model_selection import RandomizedSearchCV, GridSearchCV\n", 699 | "import xgboost as xgb\n", 700 | "from datetime import datetime" 701 | ], 702 | "execution_count": null, 703 | "outputs": [] 704 | }, 705 | { 706 | "cell_type": "markdown", 707 | "metadata": { 708 | "id": "BfctBvrs4ZcQ", 709 | "colab_type": "text" 710 | }, 711 | "source": [ 712 | "## Custom function # 4: best_model(name, model)\n", 713 | "- We're going to create a Pipeline that scales the data before applying the parameter grid via cross-validation.\n", 714 | "- Then it returns the model with the best hyperparameters from the search grid for each model." 
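Before reading the full function below, it may help to see the core pattern in isolation. This is a hedged sketch, not the graded code: the `classifier__` prefix in the grid keys routes each parameter to the pipeline step named `'classifier'`.

```python
# Minimal sketch of the Pipeline + GridSearchCV pattern used in best_model().
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('scaler', StandardScaler()), ('classifier', SVC())])
# 'classifier__kernel' targets the kernel argument of the 'classifier' step.
param_grid = {'classifier__kernel': ['linear', 'rbf']}
search = GridSearchCV(pipe, param_grid=param_grid, cv=5, n_jobs=-1)
# search.fit(X, y) would scale within each fold before fitting the SVC,
# and search.best_estimator_ would hold the winning pipeline.
```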
715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "metadata": { 720 | "id": "5RG7lpMY3Bzz", 721 | "colab_type": "code", 722 | "colab": {} 723 | }, 724 | "source": [ 725 | "# Define best_model:\n", 726 | "def best_model(name, model):\n", 727 | "  pipe = Pipeline([('scaler', StandardScaler()), ('classifier', model)]) \n", 728 | "\n", 729 | "  if name == 'SVM':\n", 730 | "    # ('precomputed' is omitted: it requires a precomputed kernel matrix, not raw features)\n", 731 | "    param_grid = {'classifier__kernel' : ['linear', 'poly', 'rbf', 'sigmoid']} \n", 732 | "    # Create grid search object\n", 733 | "    # this uses k-fold cv\n", 734 | "    clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 735 | "\n", 736 | "    # Fit on data\n", 737 | "    best_clf = clf.fit(X, y)\n", 738 | "\n", 739 | "    best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 740 | "\n", 741 | "    return name, best_hyperparams \n", 742 | "\n", 743 | "  if name == 'Bayes': \n", 744 | "    param_grid = {'classifier__var_smoothing' : np.array([1e-09, 1e-08])} \n", 745 | "    # Create grid search object\n", 746 | "    # this uses k-fold cv\n", 747 | "\n", 748 | "    clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 749 | "\n", 750 | "    # Fit on data\n", 751 | "    best_clf = clf.fit(X, y)\n", 752 | "\n", 753 | "    best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 754 | "\n", 755 | "    return name, best_hyperparams \n", 756 | "\n", 757 | "  if name == 'RF': \n", 758 | "    param_grid = {'classifier__criterion' : np.array(['gini', 'entropy']),\n", 759 | "                  'classifier__max_depth' : np.arange(5,11)} \n", 760 | "    # Create grid search object\n", 761 | "    # this uses k-fold cv\n", 762 | "\n", 763 | "    clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 764 | "\n", 765 | "    # Fit on data\n", 766 | "    best_clf = clf.fit(X, y)\n", 767 | "\n", 768 | "    best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 769 | "    \n", 770 | "    return name, best_hyperparams \n", 771 | "\n", 772 | "  if name == 'XGB':\n", 773 | "    param_grid = {'classifier__learning_rate' : np.arange(0.022,0.04,.01),\n", 774 | "                  'classifier__max_depth' : np.arange(5,10)} \n", 775 | "    # Create grid search object\n", 776 | "    # this uses k-fold cv\n", 777 | "    clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 778 | "\n", 779 | "    # Fit on data\n", 780 | "    best_clf = clf.fit(X, y)\n", 781 | "    best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 782 | "\n", 783 | "    return name, best_hyperparams " 784 | ], 785 | "execution_count": null, 786 | "outputs": [] 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "metadata": { 791 | "id": "8Ay2mPQo39mV", 792 | "colab_type": "text" 793 | }, 794 | "source": [ 795 | "## Adding Random Forest and XGBoost to our get_stacking() custom function in layer 1 (and removing the poorest performers DT and KNN):" 796 | ] 797 | }, 798 | { 799 | "cell_type": "code", 800 | "metadata": { 801 | "id": "4ow6Aqaz27GJ", 802 | "colab_type": "code", 803 | "colab": {} 804 | }, 805 | "source": [ 806 | "# Define get_stacking(): \n", 807 | "def :\n", 808 | "\n", 809 | "	# Create an empty list for the base models called layer1\n", 810 | "  \n", 811 | "\n", 812 | "  # Append tuple with classifier name and instantiations (no arguments) for SVC and GaussianNB base models AND call cust fx #4 best_model on each\n", 813 | "  # Hint: layer1.append((best_model('ModelName', Classifier())))\n", 814 | "  \n", 815 | "\n", 816 | "  # Add RandomForestClassifier and xgb.XGBClassifier as base models\n", 817 | "  \n", 818 | "\n", 819 | "  # 
Instantiate Logistic Regression as meta learner model called layer2\n", 820 | "  \n", 821 | "\n", 822 | "	# Define StackingClassifier() called model passing layer1 model list and meta learner with 5 cross-validations\n", 823 | "  model = StackingClassifier(estimators=, final_estimator=, cv=)\n", 824 | "\n", 825 | "  # return model\n", 826 | "  " 827 | ], 828 | "execution_count": null, 829 | "outputs": [] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "metadata": { 834 | "id": "dbp5PICC4HEk", 835 | "colab_type": "text" 836 | }, 837 | "source": [ 838 | "## Adding Random Forest and XGBoost to our get_models() custom function:" 839 | ] 840 | }, 841 | { 842 | "cell_type": "code", 843 | "metadata": { 844 | "id": "GQqQUH_P3Bto", 845 | "colab_type": "code", 846 | "colab": {} 847 | }, 848 | "source": [ 849 | "# Define get_models():\n", 850 | "def :\n", 851 | "\n", 852 | "  # Create empty dictionary called models\n", 853 | "  \n", 854 | "\n", 855 | "  # Add key:value pairs to dictionary with key as ModelName and value as instantiations (no arguments) for SVC and GaussianNB base models\n", 856 | "  # Hint: models['ModelName'] = Classifier() \n", 857 | "  \n", 858 | "\n", 859 | "  # We'll add two more classifiers to the mix - RandomForestClassifier and xgb.XGBClassifier\n", 860 | "  \n", 861 | "\n", 862 | "\n", 863 | "  # Add key:value pair to dictionary with key called Stacking and value that calls get_stacking() custom function\n", 864 | "  \n", 865 | "\n", 866 | "  # return dictionary\n", 867 | "  " 868 | ], 869 | "execution_count": null, 870 | "outputs": [] 871 | }, 872 | { 873 | "cell_type": "code", 874 | "metadata": { 875 | "id": "JVTYjSno3B3s", 876 | "colab_type": "code", 877 | "colab": {} 878 | }, 879 | "source": [ 880 | "# Assign get_models() to a variable called models\n", 881 | "models = get_models()" 882 | ], 883 | "execution_count": null, 884 | "outputs": [] 885 | }, 886 | { 887 | "cell_type": "markdown", 888 | "metadata": { 889 | "id": "lNECWtJ74tZh", 890 | "colab_type": "text" 891 | }, 892 | "source": [ 893 | "## Custom function # 3: evaluate_model(model)" 894 | ] 895 | }, 896 | { 897 | "cell_type": "code", 898 | "metadata": { 899 | "id": "TsTJZKNk3XWc", 900 | "colab_type": "code", 901 | "colab": {} 902 | }, 903 | "source": [ 904 | "# Define evaluate_model(model):\n", 905 | "def :\n", 906 | "\n", 907 | "  # Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 908 | "  cv = \n", 909 | "\n", 910 | "  # Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 911 | "  scores = \n", 912 | "\n", 913 | "  # return scores\n", 914 | "  " 915 | ], 916 | "execution_count": null, 917 | "outputs": [] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": { 922 | "id": "3CxRVSe_DGlI", 923 | "colab_type": "text" 924 | }, 925 | "source": [ 926 | "# 10 minute break while the following runs..." 
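If you got stuck on the upgraded `get_stacking()` above, one possible completion is sketched here. This is only a hedged example, not the graded code; the official answers live in the Solution notebook.

```python
# One possible completion (ungraded sketch). best_model() returns a
# (name, fitted_estimator) tuple, which is exactly the format that
# StackingClassifier expects in its estimators list.
def get_stacking():
    layer1 = []
    layer1.append(best_model('SVM', SVC()))
    layer1.append(best_model('Bayes', GaussianNB()))
    layer1.append(best_model('RF', RandomForestClassifier()))
    layer1.append(best_model('XGB', xgb.XGBClassifier()))
    layer2 = LogisticRegression()
    model = StackingClassifier(estimators=layer1, final_estimator=layer2, cv=5)
    return model
```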
926 | ] 927 | }, 928 | { 929 | "cell_type": "code", 930 | "metadata": { 931 | "id": "rXrusVVBAbaJ", 932 | "colab_type": "code", 933 | "colab": {} 934 | }, 935 | "source": [ 936 | "# Evaluate the models and store results\n", 937 | "# Create an empty list for the results\n", 938 | "\n", 939 | "\n", 940 | "# Create an empty list for the model names\n", 941 | "\n", 942 | "\n", 943 | "# Create a for loop that iterates over each name, model in models dictionary \n", 944 | "for :\n", 945 | "\n", 946 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 947 | "\t\n", 948 | " \n", 949 | " # Append output from scores to the results list\n", 950 | "\t\n", 951 | " \n", 952 | " # Append name to the names list\n", 953 | "\t\n", 954 | " \n", 955 | " # Print name, mean and standard deviation of scores:\n", 956 | "\tprint('>%s %.3f (%.3f)' % (, mean(), std()))\n", 957 | "\n", 958 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 959 | "sns.boxplot(x=, y=, )" 960 | ], 961 | "execution_count": null, 962 | "outputs": [] 963 | }, 964 | { 965 | "cell_type": "markdown", 966 | "metadata": { 967 | "id": "uZlAHPaD419_", 968 | "colab_type": "text" 969 | }, 970 | "source": [ 971 | "## **Observation**\n", 972 | "- Before we added XGBoost and hyperparameter tuning, our Stacking Classifier got ~ 76% accuracy. \n", 973 | "- Here, we got just around 77% accuracy, a minor improvement, but an improvement nonetheless.\n", 974 | "- We could continue fiddling with other algorithms in layer 1\n", 975 | "- We could try other algorithms in layer 2.\n", 976 | "- We could add more hyperparameters to our parameter grid.\n", 977 | "- To this last point, keep in mind that the more parameters there are in a grid to search over, the longer it takes to train the Stacking Classifier." 978 | ] 979 | }, 980 | { 981 | "cell_type": "markdown", 982 | "metadata": { 983 | "id": "lj8WeJR__bUo", 984 | "colab_type": "text" 985 | }, 986 | "source": [ 987 | "---\n", 988 | "\n", 989 | "## Q&A\n", 990 | "\n", 991 | "--- " 992 | ] 993 | }, 994 | { 995 | "cell_type": "markdown", 996 | "metadata": { 997 | "id": "FPY3I2BlVxig", 998 | "colab_type": "text" 999 | }, 1000 | "source": [ 1001 | "# **Stacking Regressor**" 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "metadata": { 1007 | "id": "ftxDhyDq2lrH", 1008 | "colab_type": "code", 1009 | "colab": {} 1010 | }, 1011 | "source": [ 1012 | "# Import libraries\n", 1013 | "from sklearn.model_selection import RepeatedKFold\n", 1014 | "from sklearn.dummy import DummyRegressor\n", 1015 | "from sklearn.svm import SVR" 1016 | ], 1017 | "execution_count": null, 1018 | "outputs": [] 1019 | }, 1020 | { 1021 | "cell_type": "markdown", 1022 | "metadata": { 1023 | "id": "nqDHD8A_nhPB", 1024 | "colab_type": "text" 1025 | }, 1026 | "source": [ 1027 | "## **2nd Dataset**\n", 1028 | "\n", 1029 | "\n", 1030 | "The second dataset we'll use is a CSV file named `abalone.csv`, which contains data on physical measurements of abalone shells used to determine the age of the abalone. 
It contains the following columns:\n", 1031 | "\n", 1032 | "- `Sex`: M, F, and I (infant) - (removed for our purposes)\n", 1033 | "- `Length`: Longest shell measurement (mm)\n", 1034 | "- `Diameter`: Perpendicular to length (mm)\n", 1035 | "- `Height`: Height with meat in shell (mm)\n", 1036 | "- `Whole weight`: Weight of the whole abalone (grams)\n", 1037 | "- `Shucked weight`: Weight of meat (grams)\n", 1038 | "- `Viscera weight`: Gut weight (grams)\n", 1039 | "- `Shell weight`: Weight after being dried (grams)\n", 1040 | "- `Rings`: +1.5 gives the age in years\n", 1041 | "\n", 1042 | "	" 1043 | ] 1044 | }, 1045 | { 1046 | "cell_type": "markdown", 1047 | "metadata": { 1048 | "id": "HwNnn3ZKrh1o", 1049 | "colab_type": "text" 1050 | }, 1051 | "source": [ 1052 | "### **Get the dataset**" 1053 | ] 1054 | }, 1055 | { 1056 | "cell_type": "code", 1057 | "metadata": { 1058 | "id": "K4LeaM4PzyAh", 1059 | "colab_type": "code", 1060 | "colab": {} 1061 | }, 1062 | "source": [ 1063 | "# Read in the dataset as Pandas DataFrame\n", 1064 | "abalone = pd.read_csv('https://github.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/blob/master/data/abalone.csv?raw=true')" 1065 | ], 1066 | "execution_count": null, 1067 | "outputs": [] 1068 | }, 1069 | { 1070 | "cell_type": "code", 1071 | "metadata": { 1072 | "id": "KfsmhIBdApVp", 1073 | "colab_type": "code", 1074 | "colab": {} 1075 | }, 1076 | "source": [ 1077 | "# Look at data using the info() function\n" 1078 | ], 1079 | "execution_count": null, 1080 | "outputs": [] 1081 | }, 1082 | { 1083 | "cell_type": "markdown", 1084 | "metadata": { 1085 | "id": "NZAeIFGwBhe6", 1086 | "colab_type": "text" 1087 | }, 1088 | "source": [ 1089 | "## **Observations:** \n", 1090 | "- Here, there are no missing values. Again, that is not typical.\n", 1091 | "- There is a mixture of object, float, and integers with the first column being `object` (categorical), the next 7 `float64` and the last `int64`." 1092 | ] 1093 | }, 1094 | { 1095 | "cell_type": "code", 1096 | "metadata": { 1097 | "id": "8D4Gfh08Avb2", 1098 | "colab_type": "code", 1099 | "colab": {} 1100 | }, 1101 | "source": [ 1102 | "# Look at data using the describe() function\n" 1103 | ], 1104 | "execution_count": null, 1105 | "outputs": [] 1106 | }, 1107 | { 1108 | "cell_type": "markdown", 1109 | "metadata": { 1110 | "id": "WDGc7PPBBkGX", 1111 | "colab_type": "text" 1112 | }, 1113 | "source": [ 1114 | "## **Observations:** \n", 1115 | "- Notice that the min of the `Height` column is zero. Even though there are no missing values, this is indicative of the measurements for that feature having not been captured.\n", 1116 | "- Again, the printout makes it appear as if all numeric values are float. \n", 1117 | "\n" 1118 | ] 1119 | }, 1120 | { 1121 | "cell_type": "code", 1122 | "metadata": { 1123 | "id": "FVGtuWoDAvl2", 1124 | "colab_type": "code", 1125 | "colab": {} 1126 | }, 1127 | "source": [ 1128 | "# Print the first 5 rows of the data using the head() function\n" 1129 | ], 1130 | "execution_count": null, 1131 | "outputs": [] 1132 | }, 1133 | { 1134 | "cell_type": "markdown", 1135 | "metadata": { 1136 | "id": "wnmVoSl8BmMY", 1137 | "colab_type": "text" 1138 | }, 1139 | "source": [ 1140 | "## **Observation:**\n", 1141 | "- Printing out the first 5 rows, we see that the first column is the only non-numeric feature in this dataset and is aligned with the `object` datatype as we saw above when we called `.info()`." 
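Since `Sex` is the lone `object` column and we said it would be removed, the upcoming cell slices it off by starting the column index at 1. A hedged sketch of that idea, using throwaway variable names so the graded cell stays untouched:

```python
# Ungraded sketch: drop the categorical Sex column (index 0) by slicing
# from column 1, and keep Rings (the last column) as the target.
values = abalone.to_numpy()
X_demo, y_demo = values[:, 1:-1], values[:, -1]
print(X_demo.shape, y_demo.shape)  # expect (n_rows, 7) and (n_rows,)
```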
1142 | ] 1143 | }, 1144 | { 1145 | "cell_type": "code", 1146 | "metadata": { 1147 | "id": "xPfVhWzRrm_w", 1148 | "colab_type": "code", 1149 | "colab": {} 1150 | }, 1151 | "source": [ 1152 | "# Convert Pandas DataFrame to numpy array - Return only the values of the DataFrame with DataFrame.to_numpy()\n", 1153 | "abalone = \n", 1154 | "\n", 1155 | "# Create X matrix and y (target) array using slicing [row_start:row_end, 1:target_col],[row_start:row_end, target_col] - Removing 1st column by starting at index 1\n", 1156 | "X, y = \n", 1157 | "\n", 1158 | "# Print X matrix and y (target) array dimensions using .shape\n", 1159 | "print('Shape: %s, %s' % ())" 1160 | ], 1161 | "execution_count": null, 1162 | "outputs": [] 1163 | }, 1164 | { 1165 | "cell_type": "code", 1166 | "metadata": { 1167 | "id": "fZ6CHfsVrpE7", 1168 | "colab_type": "code", 1169 | "colab": {} 1170 | }, 1171 | "source": [ 1172 | "# Convert y (target) array to 'float32' using .astype()\n", 1173 | "y = " 1174 | ], 1175 | "execution_count": null, 1176 | "outputs": [] 1177 | }, 1178 | { 1179 | "cell_type": "markdown", 1180 | "metadata": { 1181 | "id": "7bYvtBfSF7k7", 1182 | "colab_type": "text" 1183 | }, 1184 | "source": [ 1185 | "## **Creating a Naive Regressor**\n", 1186 | "Here we'll use the `DummyRegressor` from `sklearn`. This creates a so-called 'naive' regressor and is simply a model that predicts a single value for all of the rows, regardless of their original value. \n", 1187 | "\n", 1188 | "1. `DummyRegressor()` arguments:\n", 1189 | "  - `strategy`: Strategy to use to generate predictions.\n", 1190 | "\n", 1191 | "2. `RepeatedKFold()` arguments:\n", 1192 | "  - `n_splits`: Number of folds.\n", 1193 | "  - `n_repeats`: Number of times cross-validator needs to be repeated.\n", 1194 | "  - `random_state`: Controls the generation of the random states for each repetition. Pass an int for reproducible output across multiple function calls. (This is an equivalent argument to np.random.seed above, but will be specific to this naive model.)\n", 1195 | "\n", 1196 | "3. `cross_val_score()` arguments:\n", 1197 | "  - The model to use.\n", 1198 | "  - The data to fit. (X)\n", 1199 | "  - The target variable to try to predict. (y)\n", 1200 | "  - `scoring`: A single string scorer callable object/function such as 'neg_mean_absolute_error' for regression. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more options.\n", 1201 | "  - `cv`: Cross-validation splitting strategy (default is 5)\n", 1202 | "  - `n_jobs`: Number of CPU cores used when parallelizing. Setting it to -1 uses all available cores.\n", 1203 | "  - `error_score`: Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised." 
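For reference, a hedged sketch of how these three pieces fit together (the graded cell with blanks follows; variable names here are prefixed with `demo_` so they don't collide):

```python
# Ungraded sketch wiring DummyRegressor + RepeatedKFold + cross_val_score.
# Assumes X and y from the cells above.
from numpy import mean, std
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

demo_naive = DummyRegressor(strategy='median')
demo_cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
demo_scores = cross_val_score(demo_naive, X, y, scoring='neg_mean_absolute_error',
                              cv=demo_cv, n_jobs=-1, error_score='raise')
print('Naive MAE (negated): %.3f (%.3f)' % (mean(demo_scores), std(demo_scores)))
```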
1204 | ] 1205 | }, 1206 | { 1207 | "cell_type": "code", 1208 | "metadata": { 1209 | "id": "jAJdcu_Hrrg8", 1210 | "colab_type": "code", 1211 | "colab": {} 1212 | }, 1213 | "source": [ 1214 | "# Evaluate naive\n", 1215 | "\n", 1216 | "# Instantiate a DummyRegressor with 'median' strategy\n", 1217 | "naive = \n", 1218 | "\n", 1219 | "# Create RepeatedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 1220 | "cv = \n", 1221 | "\n", 1222 | "# Calculate the error using `cross_val_score()` with model instantiated, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator, n_jobs=-1, and error_score set to 'raise'\n", 1223 | "n_scores = \n", 1224 | "\n", 1225 | "# Print mean and standard deviation of n_scores:\n", 1226 | "print('Naive score: %.3f (%.3f)' % (mean(), std()))" 1227 | ], 1228 | "execution_count": null, 1229 | "outputs": [] 1230 | }, 1231 | { 1232 | "cell_type": "markdown", 1233 | "metadata": { 1234 | "id": "dlYQmsCQHcdJ", 1235 | "colab_type": "text" 1236 | }, 1237 | "source": [ 1238 | "## **Observation** \n", 1239 | "- We want to do better than -2.37 to consider any other models as an improvement to a totally naive regressor model with the Abalone dataset." 1240 | ] 1241 | }, 1242 | { 1243 | "cell_type": "markdown", 1244 | "metadata": { 1245 | "id": "ZfiEdoUMHo-q", 1246 | "colab_type": "text" 1247 | }, 1248 | "source": [ 1249 | "## **Creating a Baseline Regressor**\n", 1250 | "Now we'll create a baseline regressor, one that seeks to correctly predict the value for each observation. Since the target variable is continuous, we'll instantiate a Support Vector Regression model.\n", 1251 | "\n", 1252 | "1. `SVR()` arguments:\n", 1253 | "  - `kernel`: Specifies the kernel type to be used in the algorithm.\n", 1254 | "  - `gamma`: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. \n", 1255 | "  - `C`: Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty." 1256 | ] 1257 | }, 1258 | { 1259 | "cell_type": "code", 1260 | "metadata": { 1261 | "id": "cFip40FPrvOn", 1262 | "colab_type": "code", 1263 | "colab": {} 1264 | }, 1265 | "source": [ 1266 | "# Evaluate baseline model\n", 1267 | "\n", 1268 | "# Instantiate a Support Vector Regressor with 'rbf' kernel, gamma set to 'scale', and regularization parameter set to 10\n", 1269 | "model = \n", 1270 | "\n", 1271 | "# Calculate the error using `cross_val_score()` with model instantiated, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 1272 | "m_scores = \n", 1273 | "\n", 1274 | "# Print mean and standard deviation of m_scores: \n", 1275 | "print('Baseline score: %.3f (%.3f)' % (mean(), std()))" 1276 | ], 1277 | "execution_count": null, 1278 | "outputs": [] 1279 | }, 1280 | { 1281 | "cell_type": "markdown", 1282 | "metadata": { 1283 | "id": "Z_PMtVARKzBX", 1284 | "colab_type": "text" 1285 | }, 1286 | "source": [ 1287 | "## **Observation**\n", 1288 | "- We want to do better than -1.48 with a Stacking Regressor to consider it an improvement over this baseline support vector regression model with the Abalone dataset." 
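A note on reading these numbers: scikit-learn negates the mean absolute error so that "greater is better" holds for every scorer. A score of -1.48 therefore means the model is off by about 1.48 rings on average. Once `m_scores` is filled in above, the raw MAE can be recovered like so:

```python
# Flip the sign of the negated scores to get the plain MAE in rings.
mae = -mean(m_scores)
print('Mean absolute error: %.2f rings' % mae)
```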
1289 | ] 1290 | }, 1291 | { 1292 | "cell_type": "markdown", 1293 | "metadata": { 1294 | "id": "J-OGF_7bupzn", 1295 | "colab_type": "text" 1296 | }, 1297 | "source": [ 1298 | "## **Getting started with Stacking Regressor**\n", 1299 | "- We're going to compare several additional baseline regressors to see if they perform better than the SVR we just trained previously.\n", 1300 | "- We'll start by importing additional packages that we'll need." 1301 | ] 1302 | }, 1303 | { 1304 | "cell_type": "code", 1305 | "metadata": { 1306 | "id": "jxbxTPkPrkNb", 1307 | "colab_type": "code", 1308 | "colab": {} 1309 | }, 1310 | "source": [ 1311 | "# Compare machine learning models for regression\n", 1312 | "from sklearn.linear_model import LinearRegression\n", 1313 | "from sklearn.neighbors import KNeighborsRegressor\n", 1314 | "from sklearn.tree import DecisionTreeRegressor\n", 1315 | "from sklearn.ensemble import StackingRegressor" 1316 | ], 1317 | "execution_count": null, 1318 | "outputs": [] 1319 | }, 1320 | { 1321 | "cell_type": "markdown", 1322 | "metadata": { 1323 | "id": "yixxr2JLN9UP", 1324 | "colab_type": "text" 1325 | }, 1326 | "source": [ 1327 | "## Create custom functions\n", 1328 | "1. get_stacking() - This function will create the layers of our `StackingRegressor()`.\n", 1329 | "2. get_models() - This function will create a dictionary of models to be evaluated.\n", 1330 | "3. evaluate_model() - This function will evaluate each of the models to be compared." 1331 | ] 1332 | }, 1333 | { 1334 | "cell_type": "markdown", 1335 | "metadata": { 1336 | "id": "FdF239ZRN92B", 1337 | "colab_type": "text" 1338 | }, 1339 | "source": [ 1340 | "## Custom function # 1: get_stacking()\n", 1341 | "1. `StackingRegressor()` arguments:\n", 1342 | "  - `estimators`: List of baseline regressors\n", 1343 | "  - `final_estimator`: Defined meta regressor \n", 1344 | "  - `cv`: Number of cross validations to perform."
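As a self-contained reference for the API described above (independent of the graded blanks that follow), a minimal `StackingRegressor` might be wired up like this:

```python
# Hedged sketch of the StackingRegressor API: two base regressors in the
# first layer, a linear meta-learner, and 5-fold internal cross-validation.
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

estimators = [('knn', KNeighborsRegressor()), ('svm', SVR())]
stack = StackingRegressor(estimators=estimators,
                          final_estimator=LinearRegression(), cv=5)
# stack.fit(X, y) trains the base models, generates out-of-fold predictions,
# and fits the meta-learner on those predictions.
```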
1345 | ] 1346 | }, 1347 | { 1348 | "cell_type": "code", 1349 | "metadata": { 1350 | "id": "qoRNxZSj72bZ", 1351 | "colab_type": "code", 1352 | "colab": {} 1353 | }, 1354 | "source": [ 1355 | "# Define get_stacking():\n", 1356 | "def :\n", 1357 | "\n", 1358 | "	# Create an empty list for the base models called layer1\n", 1359 | "  \n", 1360 | "\n", 1361 | "  # Append tuple with regressor name and instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1362 | "  # Hint: layer1.append(('ModelName', Regressor()))\n", 1363 | "  \n", 1364 | "\n", 1365 | "  # Instantiate Linear Regression as meta learner model called layer2\n", 1366 | "  \n", 1367 | "\n", 1368 | "	# Define StackingRegressor() called model passing layer1 model list and meta learner with 5 cross-validations\n", 1369 | "  \n", 1370 | "\n", 1371 | "  # return model\n", 1372 | "  " 1373 | ], 1374 | "execution_count": null, 1375 | "outputs": [] 1376 | }, 1377 | { 1378 | "cell_type": "markdown", 1379 | "metadata": { 1380 | "id": "KClsJExROLAZ", 1381 | "colab_type": "text" 1382 | }, 1383 | "source": [ 1384 | "## Custom function # 2: get_models()" 1385 | ] 1386 | }, 1387 | { 1388 | "cell_type": "code", 1389 | "metadata": { 1390 | "id": "PtYbhE_ps4yo", 1391 | "colab_type": "code", 1392 | "colab": {} 1393 | }, 1394 | "source": [ 1395 | "# Define get_models():\n", 1396 | "def :\n", 1397 | "\n", 1398 | "  # Create empty dictionary called models\n", 1399 | "  \n", 1400 | "\n", 1401 | "  # Add key:value pairs to dictionary with key as ModelName and value as instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1402 | "  # Hint: models['ModelName'] = Regressor()\n", 1403 | "  \n", 1404 | "\n", 1405 | "  # Add key:value pair to dictionary with key called Stacking and value that calls get_stacking() custom function\n", 1406 | "  \n", 1407 | "\n", 1408 | "  # return dictionary\n", 1409 | "  " 1410 | ], 1411 | "execution_count": null, 1412 | "outputs": [] 1413 | }, 1414 | { 1415 | "cell_type": "markdown", 1416 | "metadata": { 1417 | "id": "SYH3KcjcOc56", 1418 | "colab_type": "text" 1419 | }, 1420 | "source": [ 1421 | "## Custom function # 3: evaluate_model(model)" 1422 | ] 1423 | }, 1424 | { 1425 | "cell_type": "code", 1426 | "metadata": { 1427 | "id": "H95M82gks6EL", 1428 | "colab_type": "code", 1429 | "colab": {} 1430 | }, 1431 | "source": [ 1432 | "# Define evaluate_model:\n", 1433 | "def :\n", 1434 | "\n", 1435 | "  # Create RepeatedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 1436 | "	cv = \n", 1437 | "  \n", 1438 | "  # Calculate the error using `cross_val_score()` with model instantiated, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 1439 | "	scores = \n", 1440 | "  \n", 1441 | "  # return scores\n", 1442 | "	" 1443 | ], 1444 | "execution_count": null, 1445 | "outputs": [] 1446 | }, 1447 | { 1448 | "cell_type": "code", 1449 | "metadata": { 1450 | "id": "2C6Hw-wj56eK", 1451 | "colab_type": "code", 1452 | "colab": {} 1453 | }, 1454 | "source": [ 1455 | "# Assign get_models() to a variable called models\n" 1456 | ], 1457 | "execution_count": null, 1458 | "outputs": [] 1459 | }, 1460 | { 1461 | "cell_type": "code", 1462 | "metadata": { 1463 | "id": "BZl3DjmU58Lm", 1464 | "colab_type": "code", 1465 | "colab": {} 1466 | }, 1467 | "source": [ 1468 | "# Evaluate the models and store results\n", 1469 | "# Create an empty list for the results\n", 1470 | "\n", 1471 | 
"\n", 1472 | "# Create an empty list for the model names\n", 1473 | "\n", 1474 | "\n", 1475 | "# Create a for loop that iterates over each name, model in models dictionary \n", 1476 | "for :\n", 1477 | "\n", 1478 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 1479 | "\t\n", 1480 | " \n", 1481 | " # Append output from scores to the results list\n", 1482 | "\t\n", 1483 | " \n", 1484 | " # Append name to the names list\n", 1485 | "\t\n", 1486 | " \n", 1487 | " # Print name, mean and standard deviation of scores:\n", 1488 | "\tprint('>%s %.3f (%.3f)' % (, (), ()))\n", 1489 | " \n", 1490 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 1491 | "sns.boxplot(x=, y=, )" 1492 | ], 1493 | "execution_count": null, 1494 | "outputs": [] 1495 | }, 1496 | { 1497 | "cell_type": "markdown", 1498 | "metadata": { 1499 | "id": "d6EKNBV1UOuG", 1500 | "colab_type": "text" 1501 | }, 1502 | "source": [ 1503 | "## **Observation**\n", 1504 | "- Recall that we want to do better than -1.48 with a Stacking Regressor to consider it an improvement over this baseline SVR and, although close, we did not achieve that with this dataset.\n", 1505 | "- So what else can try to improve our results with stacking?\n", 1506 | "\n", 1507 | "### We'll add another layer to the mix..." 1508 | ] 1509 | }, 1510 | { 1511 | "cell_type": "markdown", 1512 | "metadata": { 1513 | "id": "N9DZ7iyZFxXo", 1514 | "colab_type": "text" 1515 | }, 1516 | "source": [ 1517 | "## **Double Stacking - 2 Layers**\n", 1518 | "- Can get a little tricky\n", 1519 | "- Just make sure that you name your layers VERY CLEARLY!\n", 1520 | "- Both the last layer (here it's layer 3) and the stacking model will use a call to `StackingRegressor()`\n", 1521 | "- The last layer will combine the 2nd layer with the final estimator while the model will combine the 1st layer with this last layer.\n", 1522 | "\n", 1523 | "

\n", 1524 | "\"Double\n", 1525 | "

\n", 1526 | "

" 1527 | ] 1528 | }, 1529 | { 1530 | "cell_type": "code", 1531 | "metadata": { 1532 | "id": "fXvUmmQQF6vq", 1533 | "colab_type": "code", 1534 | "colab": {} 1535 | }, 1536 | "source": [ 1537 | "# Define get_stacking() - adding another layer:\n", 1538 | "def :\n", 1539 | "\n", 1540 | "\t# Create an empty list for the 1st layer of base models called layer1\n", 1541 | " \n", 1542 | "\n", 1543 | " # Create an empty list for the 2nd layer of base models called layer2\n", 1544 | " \n", 1545 | "\n", 1546 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1547 | " # Hint: layer1.append(('ModelName', Classifier()))\n", 1548 | " \n", 1549 | "\n", 1550 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1551 | " # Hint: layer2.append(('ModelName', Classifier()))\n", 1552 | " \n", 1553 | "\n", 1554 | "\t# Define meta learner StackingRegressor() called layer3 passing layer2 model list to estimators, LinearRegression() to final_estimator with 5 cross-validations\n", 1555 | " layer3 = \n", 1556 | "\n", 1557 | "\t# Define StackingRegressor() called model passing layer1 model list to estimators and meta learner (layer3) to final_estimator with 5 cross-validations\n", 1558 | " model = \n", 1559 | "\n", 1560 | " # return model\n", 1561 | " " 1562 | ], 1563 | "execution_count": null, 1564 | "outputs": [] 1565 | }, 1566 | { 1567 | "cell_type": "code", 1568 | "metadata": { 1569 | "id": "CnMMqOJ16Bft", 1570 | "colab_type": "code", 1571 | "colab": {} 1572 | }, 1573 | "source": [ 1574 | "# Assign get_models() to a variable called models\n" 1575 | ], 1576 | "execution_count": null, 1577 | "outputs": [] 1578 | }, 1579 | { 1580 | "cell_type": "code", 1581 | "metadata": { 1582 | "id": "kvzSjLOEIKUx", 1583 | "colab_type": "code", 1584 | "colab": {} 1585 | }, 1586 | "source": [ 1587 | "# Evaluate the models and store results\n", 1588 | "# Create an empty list for the results\n", 1589 | "\n", 1590 | "\n", 1591 | "# Create an empty list for the model names\n", 1592 | "\n", 1593 | "\n", 1594 | "# Create a for loop that iterates over each name, model in models dictionary \n", 1595 | "for ):\n", 1596 | "\n", 1597 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 1598 | "\t\n", 1599 | " \n", 1600 | " # Append output from scores to the results list\n", 1601 | "\t\n", 1602 | " \n", 1603 | " # Append name to the names list\n", 1604 | "\t\n", 1605 | " \n", 1606 | " # Print name, mean and standard deviation of scores:\n", 1607 | "\tprint('>%s %.3f (%.3f)' % (, (), ()))\n", 1608 | " \n", 1609 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 1610 | "sns.( , , )" 1611 | ], 1612 | "execution_count": null, 1613 | "outputs": [] 1614 | }, 1615 | { 1616 | "cell_type": "markdown", 1617 | "metadata": { 1618 | "id": "ZMgN44SwcJPG", 1619 | "colab_type": "text" 1620 | }, 1621 | "source": [ 1622 | "## **Final Observation**\n", 1623 | "- Adding a layer did not improve results.\n", 1624 | "- Complexity does not always make a better model\n", 1625 | "- Could try different base models to stack for both of the datasets and that may show improvements over baseline.\n", 1626 | "- Generate polynomial features \n", 1627 | "- Try sklearn feature selection\n", 1628 | "- Try feature engineering - creating new features from existing ones (but remember to remove the original features 
to avoid multicollinearity)\n", 1629 | "- Tune hyperparameters via grid search, as we did previously with the Stacking Classifier\n", 1630 | "- When there is a tie between a baseline model and a stacked model, choose the simpler model!" 1631 | ] 1632 | }, 1633 | { 1634 | "cell_type": "markdown", 1635 | "metadata": { 1636 | "id": "Z4iX02EkDujS", 1637 | "colab_type": "text" 1638 | }, 1639 | "source": [ 1640 | "---\n", 1641 | "\n", 1642 | "# Q&A\n", 1643 | "\n", 1644 | "---" 1645 | ] 1646 | }, 1647 | { 1648 | "cell_type": "markdown", 1649 | "metadata": { 1650 | "id": "kNWB_J4QD0Ad", 1651 | "colab_type": "text" 1652 | }, 1653 | "source": [ 1654 | "# Back to the slides for wrap-up..." 1655 | ] 1656 | } 1657 | ] 1658 | } -------------------------------------------------------------------------------- /notebooks/Applied_Machine_Learning_Ensemble_Modeling_Solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "python_live_session_template.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "display_name": "Python 3", 11 | "language": "python", 12 | "name": "python3" 13 | }, 14 | "language_info": { 15 | "codemirror_mode": { 16 | "name": "ipython", 17 | "version": 3 18 | }, 19 | "file_extension": ".py", 20 | "mimetype": "text/x-python", 21 | "name": "python", 22 | "nbconvert_exporter": "python", 23 | "pygments_lexer": "ipython3", 24 | "version": "3.7.1" 25 | } 26 | }, 27 | "cells": [ 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "colab_type": "text", 32 | "id": "6Ijg5wUCTQYG" 33 | }, 34 | "source": [ 35 | "<p align=\"center\">\n", 36 | "<img src=\"https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/datacamp.svg\" alt=\"DataCamp icon\">\n", 37 | "</p>\n", 38 | "
\n", 39 | "\n", 40 | "\n", 41 | "## **Applied Machine Learning - Ensemble Modeling Live Training**\n", 42 | "\n", 43 | "Welcome to this hands-on training where you will immerse yourself in applied machine learning in Python where we'll explore model stacking. Using `sklearn.ensemble`, we'll learn how to create layers that are stacking-ready.\n", 44 | "\n", 45 | "The foundations of model stacking:\n", 46 | "\n", 47 | "* Create various types of baseline models, including linear and logistic regression using Scikit-Learn, for comparison to ensemble methods.\n", 48 | "* Build layers, then stack them up.\n", 49 | "* Calculate and visualize performance metrics.\n", 50 | "\n", 51 | "\n", 52 | "\n", 53 | "---\n", 54 | "\n", 55 | "\n", 56 | "\n", 57 | "## **1st Dataset**\n", 58 | "\n", 59 | "\n", 60 | "The first dataset we'll use is a CSV file named `pima-indians-diabetes.csv`, which contains data on females of Pima Indian heritage that are at least 21 years old. It contains the following columns:\n", 61 | "\n", 62 | "- `n_preg`: Number of pregnancies\n", 63 | "- `pl_glucose`: Plasma glucose concentration 2 hours after an oral glucose tolerance test\n", 64 | "- `dia_bp`: Diastolic blood pressure (mm Hg)\n", 65 | "- `tri_thick`: Triceps skin fold thickness (mm)\n", 66 | "- `serum_ins`: 2-Hour serum insulin (mu U/ml)\n", 67 | "- `bmi`: Body mass index (weight in kg/(height in m)^2)\n", 68 | "- `diab_ped`: Diabetes pedigree function\n", 69 | "- `age`: Age (years)\n", 70 | "- `class`: Class variable (0 or 1)\n" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "metadata": { 76 | "colab_type": "code", 77 | "id": "EMQfyC7GUNhT", 78 | "colab": { 79 | "base_uri": "https://localhost:8080/", 80 | "height": 51 81 | }, 82 | "outputId": "d5eb31f1-293e-40bd-9dd4-8d2a693deb66" 83 | }, 84 | "source": [ 85 | "# Import libraries\n", 86 | "import pandas as pd\n", 87 | "import numpy as np\n", 88 | "from numpy import mean\n", 89 | "from numpy import std\n", 90 | "import matplotlib.pyplot as plt\n", 91 | "import seaborn as sns\n", 92 | "from collections import Counter\n", 93 | "from sklearn.preprocessing import LabelEncoder\n", 94 | "from sklearn.model_selection import cross_val_score\n", 95 | "from sklearn.model_selection import RepeatedStratifiedKFold\n", 96 | "from sklearn.dummy import DummyClassifier\n", 97 | "from sklearn.tree import DecisionTreeClassifier" 98 | ], 99 | "execution_count": null, 100 | "outputs": [ 101 | { 102 | "output_type": "stream", 103 | "text": [ 104 | "/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. 
Use the functions in the public API at pandas.testing instead.\n", 105 | " import pandas.util.testing as tm\n" 106 | ], 107 | "name": "stderr" 108 | } 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "metadata": { 114 | "colab_type": "code", 115 | "id": "l8t_EwRNZPLB", 116 | "colab": {} 117 | }, 118 | "source": [ 119 | "# Read in the dataset as Pandas DataFrame\n", 120 | "diabetes = pd.read_csv('https://github.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/blob/master/data/pima-indians-diabetes.csv?raw=true')" 121 | ], 122 | "execution_count": null, 123 | "outputs": [] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "metadata": { 128 | "id": "PRJPuinPZpGA", 129 | "colab_type": "code", 130 | "colab": { 131 | "base_uri": "https://localhost:8080/", 132 | "height": 289 133 | }, 134 | "outputId": "fe86f395-72c2-438a-c48c-4d16ec7a93af" 135 | }, 136 | "source": [ 137 | "# Look at data using the info() function\n", 138 | "diabetes.info()" 139 | ], 140 | "execution_count": null, 141 | "outputs": [ 142 | { 143 | "output_type": "stream", 144 | "text": [ 145 | "\n", 146 | "RangeIndex: 768 entries, 0 to 767\n", 147 | "Data columns (total 9 columns):\n", 148 | " # Column Non-Null Count Dtype \n", 149 | "--- ------ -------------- ----- \n", 150 | " 0 n_preg 768 non-null int64 \n", 151 | " 1 pl_glucose 768 non-null int64 \n", 152 | " 2 dia_bp 768 non-null int64 \n", 153 | " 3 tri_thick 768 non-null int64 \n", 154 | " 4 serum_ins 768 non-null int64 \n", 155 | " 5 bmi 768 non-null float64\n", 156 | " 6 diab_ped 768 non-null float64\n", 157 | " 7 age 768 non-null int64 \n", 158 | " 8 class 768 non-null int64 \n", 159 | "dtypes: float64(2), int64(7)\n", 160 | "memory usage: 54.1 KB\n" 161 | ], 162 | "name": "stdout" 163 | } 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": { 169 | "id": "C6OVOkU80oKP", 170 | "colab_type": "text" 171 | }, 172 | "source": [ 173 | "## **Observations:** \n", 174 | "- The `info()` function is critical to beginning to understand your data. Here, there are no missing values. However, that is not typical.\n", 175 | "- There is a mixture of integers and floats with the first 5 columns being `int64`, the next 2 `float64` and the last 2 'int64`." 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "id": "v3hAsYrhVi4L", 182 | "colab_type": "text" 183 | }, 184 | "source": [ 185 | "---\n", 186 | "\n", 187 | "## Q&A\n", 188 | "\n", 189 | "--- " 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "metadata": { 195 | "id": "E6UtlpG_Zo50", 196 | "colab_type": "code", 197 | "colab": { 198 | "base_uri": "https://localhost:8080/", 199 | "height": 297 200 | }, 201 | "outputId": "dadacd3e-88ab-4768-c3b4-a501f3b4c4aa" 202 | }, 203 | "source": [ 204 | "# Look at data using the describe() function\n", 205 | "diabetes.describe()" 206 | ], 207 | "execution_count": null, 208 | "outputs": [ 209 | { 210 | "output_type": "execute_result", 211 | "data": { 212 | "text/html": [ 213 | "
\n", 214 | "\n", 227 | "\n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | "
n_pregpl_glucosedia_bptri_thickserum_insbmidiab_pedageclass
count768.000000768.000000768.000000768.000000768.000000768.000000768.000000768.000000768.000000
mean3.845052120.89453169.10546920.53645879.79947931.9925780.47187633.2408850.348958
std3.36957831.97261819.35580715.952218115.2440027.8841600.33132911.7602320.476951
min0.0000000.0000000.0000000.0000000.0000000.0000000.07800021.0000000.000000
25%1.00000099.00000062.0000000.0000000.00000027.3000000.24375024.0000000.000000
50%3.000000117.00000072.00000023.00000030.50000032.0000000.37250029.0000000.000000
75%6.000000140.25000080.00000032.000000127.25000036.6000000.62625041.0000001.000000
max17.000000199.000000122.00000099.000000846.00000067.1000002.42000081.0000001.000000
\n", 341 | "
" 342 | ], 343 | "text/plain": [ 344 | " n_preg pl_glucose dia_bp ... diab_ped age class\n", 345 | "count 768.000000 768.000000 768.000000 ... 768.000000 768.000000 768.000000\n", 346 | "mean 3.845052 120.894531 69.105469 ... 0.471876 33.240885 0.348958\n", 347 | "std 3.369578 31.972618 19.355807 ... 0.331329 11.760232 0.476951\n", 348 | "min 0.000000 0.000000 0.000000 ... 0.078000 21.000000 0.000000\n", 349 | "25% 1.000000 99.000000 62.000000 ... 0.243750 24.000000 0.000000\n", 350 | "50% 3.000000 117.000000 72.000000 ... 0.372500 29.000000 0.000000\n", 351 | "75% 6.000000 140.250000 80.000000 ... 0.626250 41.000000 1.000000\n", 352 | "max 17.000000 199.000000 122.000000 ... 2.420000 81.000000 1.000000\n", 353 | "\n", 354 | "[8 rows x 9 columns]" 355 | ] 356 | }, 357 | "metadata": { 358 | "tags": [] 359 | }, 360 | "execution_count": 4 361 | } 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": { 367 | "id": "bCK9W_gk1HG8", 368 | "colab_type": "text" 369 | }, 370 | "source": [ 371 | "\n", 372 | "## **Observations:** \n", 373 | "- The `.describe()` function gives the summary statistics of the data. Notice that the min of the 1st six columns is zero. Even though there are no missing values, this is indicative of the measurements for those features having not been captured.\n", 374 | "- Although we previously saw there is a mixture of integer and float data types (as seen with `.info()`), the printout makes it appear as if all values are float. " 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "metadata": { 380 | "id": "UE5F_JUQ2X-0", 381 | "colab_type": "code", 382 | "colab": { 383 | "base_uri": "https://localhost:8080/", 384 | "height": 204 385 | }, 386 | "outputId": "7cdeee97-80fc-4553-a3ec-c755bf8f19d2" 387 | }, 388 | "source": [ 389 | "# Print the first 5 rows of the data using the head() function\n", 390 | "diabetes.head()" 391 | ], 392 | "execution_count": null, 393 | "outputs": [ 394 | { 395 | "output_type": "execute_result", 396 | "data": { 397 | "text/html": [ 398 | "
\n", 399 | "\n", 412 | "\n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | "
n_pregpl_glucosedia_bptri_thickserum_insbmidiab_pedageclass
061487235033.60.627501
11856629026.60.351310
28183640023.30.672321
318966239428.10.167210
40137403516843.12.288331
\n", 490 | "
" 491 | ], 492 | "text/plain": [ 493 | " n_preg pl_glucose dia_bp tri_thick serum_ins bmi diab_ped age class\n", 494 | "0 6 148 72 35 0 33.6 0.627 50 1\n", 495 | "1 1 85 66 29 0 26.6 0.351 31 0\n", 496 | "2 8 183 64 0 0 23.3 0.672 32 1\n", 497 | "3 1 89 66 23 94 28.1 0.167 21 0\n", 498 | "4 0 137 40 35 168 43.1 2.288 33 1" 499 | ] 500 | }, 501 | "metadata": { 502 | "tags": [] 503 | }, 504 | "execution_count": 5 505 | } 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": { 511 | "id": "A2VCIx0K2bT1", 512 | "colab_type": "text" 513 | }, 514 | "source": [ 515 | "\n", 516 | "## **Observation:**\n", 517 | "- Printing out the first 5 rows, we see that the data types of the columns are indeed as stated previously." 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": { 523 | "id": "ajAzhMDc2b1D", 524 | "colab_type": "text" 525 | }, 526 | "source": [ 527 | "## Let's check the number in each class:\n", 528 | "\n", 529 | "This avoids getting surprised by great results that are actually a side effect of class imbalance. This happens when the majority class far outweighs the minority class." 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "metadata": { 535 | "id": "MKeXN3441-9W", 536 | "colab_type": "code", 537 | "colab": { 538 | "base_uri": "https://localhost:8080/", 539 | "height": 34 540 | }, 541 | "outputId": "a698fc39-4ac4-41bb-e77a-5cd6172f01a4" 542 | }, 543 | "source": [ 544 | "# Summarize class distribution\n", 545 | "target = diabetes['class']\n", 546 | "counter = Counter(target)\n", 547 | "print(counter)" 548 | ], 549 | "execution_count": null, 550 | "outputs": [ 551 | { 552 | "output_type": "stream", 553 | "text": [ 554 | "Counter({0: 500, 1: 268})\n" 555 | ], 556 | "name": "stdout" 557 | } 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": { 563 | "id": "FOpbGyQw55v3", 564 | "colab_type": "text" 565 | }, 566 | "source": [ 567 | "## **Observation:** For every two negative cases there is one positive case, not enough of a difference to be considered class imbalance. \n", 568 | "- Class imbalance tends to exist when the majority class is > 90% although there is no hard and fast rule about this threshold." 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "metadata": { 574 | "id": "n5XaYl9ZZ8B5", 575 | "colab_type": "code", 576 | "colab": {} 577 | }, 578 | "source": [ 579 | "# Convert Pandas DataFrame to numpy array - Return only the values of the DataFrame with DataFrame.to_numpy()\n", 580 | "diabetes = diabetes.to_numpy()" 581 | ], 582 | "execution_count": null, 583 | "outputs": [] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": { 588 | "id": "MlGa9IBc7Gsr", 589 | "colab_type": "text" 590 | }, 591 | "source": [ 592 | "### Always verify that your X matrix and target array have the same number of rows to avoid errors during model training." 
593 | ] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "metadata": { 598 | "id": "9FEvD6Ab6InP", 599 | "colab_type": "code", 600 | "colab": { 601 | "base_uri": "https://localhost:8080/", 602 | "height": 34 603 | }, 604 | "outputId": "5e5bd9ad-05ff-40c7-8c38-3f0c1877d1ee" 605 | }, 606 | "source": [ 607 | "# Create X matrix and y (target) array using slicing [row_start:row_end, col_start:target_col],[row_start:row_end, target_col]\n", 608 | "X, y = diabetes[:, :-1], diabetes[:, -1]\n", 609 | "\n", 610 | "# Print X matrix and y (target) array dimensions using .shape \n", 611 | "print('Shape: %s, %s' % (X.shape, y.shape))" 612 | ], 613 | "execution_count": null, 614 | "outputs": [ 615 | { 616 | "output_type": "stream", 617 | "text": [ 618 | "Shape: (768, 8), (768,)\n" 619 | ], 620 | "name": "stdout" 621 | } 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "metadata": { 627 | "id": "hoI7t4U-Z8LU", 628 | "colab_type": "code", 629 | "colab": {} 630 | }, 631 | "source": [ 632 | "# Convert X matrix data types to 'float32' for consistency using .astype()\n", 633 | "X = X.astype('float32')\n", 634 | "\n", 635 | "# Convert y (target) array to 'str' using .astype()\n", 636 | "y = y.astype('str')\n", 637 | "\n", 638 | "# Encode class labels in y array using dot notation with LabelEncoder().fit_transform()\n", 639 | "# Hint: y goes in the fit_transform function call\n", 640 | "y = LabelEncoder().fit_transform(y)" 641 | ], 642 | "execution_count": null, 643 | "outputs": [] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": { 648 | "id": "djXWv2xp9v1q", 649 | "colab_type": "text" 650 | }, 651 | "source": [ 652 | "### Don't let the `.astype('str')` throw you! This is simply taking the class labels and label encoding them – regardless of their original format.\n", 653 | "\n", 654 | "\n" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": { 660 | "id": "OHHu8uz7_yVa", 661 | "colab_type": "text" 662 | }, 663 | "source": [ 664 | "## **Creating a Naive Classifier**\n", 665 | "Here we'll use the `DummyClassifier` from `sklearn`. This creates a so-called 'naive' classifer and is simply a model that predicts a single class for all of the rows, regardless of their original class. \n", 666 | "\n", 667 | "1. `DummyClassifier()` arguments:\n", 668 | " - `strategy`: Strategy to use to generate predictions.\n", 669 | "\n", 670 | "2. `RepeatedStratifiedKFold()` arguments:\n", 671 | " - `n_splits`: Number of folds.\n", 672 | " - `n_repeats`: Number of times cross-validator needs to be repeated.\n", 673 | " - `random_state`: Controls the generation of the random states for each repetition. Pass an int for reproducible output across multiple function calls. (This is an equivalent argument to np.random.seed above, but will be specific to this naive model.)\n", 674 | "\n", 675 | "3. `cross_val_score()` arguments:\n", 676 | " - The model to use.\n", 677 | " - The data to fit. (X)\n", 678 | " - The target variable to try to predict. (y)\n", 679 | " - `scoring`: A single string scorer callable object/function such as 'accuracy' or 'roc_auc'. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more options.\n", 680 | " - `cv`: Cross-validation splitting strategy (default is 5)\n", 681 | " - `n_jobs`: Number of CPU cores used when parallelizing. Set to -1 helps to avoid non-convergence errors.\n", 682 | " - `error_score`: Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. 
If a numeric value is given, FitFailedWarning is raised." 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "metadata": { 688 | "id": "BL4huFGPZ8RA", 689 | "colab_type": "code", 690 | "colab": { 691 | "base_uri": "https://localhost:8080/", 692 | "height": 34 693 | }, 694 | "outputId": "9353fb35-5d21-4b76-f3e4-794798345c7f" 695 | }, 696 | "source": [ 697 | "# Evaluate naive\n", 698 | "\n", 699 | "# Instantiate a DummyClassifier with 'most_frequent' strategy\n", 700 | "naive = DummyClassifier(strategy='most_frequent')\n", 701 | "\n", 702 | "# Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 703 | "cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\n", 704 | "\n", 705 | "# Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator, n_jobs=-1, and error_score set to 'raise'\n", 706 | "n_scores = cross_val_score(naive, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\n", 707 | "\n", 708 | "# Print mean and standard deviation of n_scores: \n", 709 | "print('Naive score: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))\n" 710 | ], 711 | "execution_count": null, 712 | "outputs": [ 713 | { 714 | "output_type": "stream", 715 | "text": [ 716 | "Naive score: 0.651 (0.003)\n" 717 | ], 718 | "name": "stdout" 719 | } 720 | ] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": { 725 | "id": "2tEgwsOfsoB6", 726 | "colab_type": "text" 727 | }, 728 | "source": [ 729 | "## **Observation** \n", 730 | "- We want to do better than 65% accuracy to consider any other models as an improvement to a totally naive model." 731 | ] 732 | }, 733 | { 734 | "cell_type": "markdown", 735 | "metadata": { 736 | "id": "l8QZOyg8s1eQ", 737 | "colab_type": "text" 738 | }, 739 | "source": [ 740 | "## **Creating a Baseline Classifier**\n", 741 | "Now we'll create a baseline classifier, one that seeks to correctly predict the class that each observation belongs to. Since the target variable is binary, we'll instantiate a `DecisionTreeClassifier` model. " 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "metadata": { 747 | "id": "QczFUGSfbQvl", 748 | "colab_type": "code", 749 | "colab": { 750 | "base_uri": "https://localhost:8080/", 751 | "height": 34 752 | }, 753 | "outputId": "3b362a0d-dd04-4f5d-ddad-0bee09b549db" 754 | }, 755 | "source": [ 756 | "# Evaluate baseline model\n", 757 | "\n", 758 | "# Instantiate a DecisionTreeClassifier\n", 759 | "model = DecisionTreeClassifier()\n", 760 | "\n", 761 | "# Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', and error_score set to 'raise'\n", 762 | "m_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\n", 763 | "\n", 764 | "# Print mean and standard deviation of m_scores: \n", 765 | "print('Baseline score: %.3f (%.3f)' % (mean(m_scores), std(m_scores)))" 766 | ], 767 | "execution_count": null, 768 | "outputs": [ 769 | { 770 | "output_type": "stream", 771 | "text": [ 772 | "Baseline score: 0.697 (0.062)\n" 773 | ], 774 | "name": "stdout" 775 | } 776 | ] 777 | }, 778 | { 779 | "cell_type": "markdown", 780 | "metadata": { 781 | "id": "GRUBiqqmtNA6", 782 | "colab_type": "text" 783 | }, 784 | "source": [ 785 | "## **Observation**\n", 786 | "- We want to do better than 70% with a Stacking Classifier to consider it an improvement over this baseline Decision Tree model." 
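
One detail worth making explicit (an added aside, not from the original notebook): with `n_splits=10` and `n_repeats=3`, the cross-validator fits the model 30 times, so `cross_val_score` returns 30 accuracy values and the mean/std above summarize all of them.

# The repeated stratified CV above yields 10 folds x 3 repeats = 30 scores
print(len(m_scores))  # expected: 30
print('Baseline score: %.3f (%.3f)' % (mean(m_scores), std(m_scores)))  # same summary as above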
787 | ] 788 | }, 789 | { 790 | "cell_type": "markdown", 791 | "metadata": { 792 | "colab_type": "text", 793 | "id": "BMYfcKeDY85K" 794 | }, 795 | "source": [ 796 | "## **Getting started with Stacking Classifier**\n", 797 | "\n", 798 | "- We're going to compare several additional baseline classifiers to see if they perform better than the Decision Tree Classifier we just trained previously.\n" 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": { 804 | "id": "T2pwEXnQBEFf", 805 | "colab_type": "text" 806 | }, 807 | "source": [ 808 | "

\n", 809 | "\"Stacking\"\n", 810 | "

\n", 811 | "

\n", 812 | "\n", 813 | "- We'll start by importing additional packages that we'll need." 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "metadata": { 819 | "id": "eHCHmx7k5NeT", 820 | "colab_type": "code", 821 | "colab": {} 822 | }, 823 | "source": [ 824 | "# Import several other classifiers for ensemble\n", 825 | "from sklearn.neighbors import KNeighborsClassifier\n", 826 | "from sklearn.svm import SVC\n", 827 | "from sklearn.naive_bayes import GaussianNB\n", 828 | "from sklearn.linear_model import LogisticRegression\n", 829 | "from sklearn.ensemble import StackingClassifier" 830 | ], 831 | "execution_count": null, 832 | "outputs": [] 833 | }, 834 | { 835 | "cell_type": "markdown", 836 | "metadata": { 837 | "id": "teQMB0aWxhcN", 838 | "colab_type": "text" 839 | }, 840 | "source": [ 841 | "## Create custom functions\n", 842 | "1. get_stacking() - This function will create the layers of our `StackingClassifier()`.\n", 843 | "2. get_models() - This function will create a dictionary of models to be evaluated.\n", 844 | "3. evaluate_model() - This function will evaluate each of the models to be compared." 845 | ] 846 | }, 847 | { 848 | "cell_type": "markdown", 849 | "metadata": { 850 | "id": "wqtHxQFPvMqu", 851 | "colab_type": "text" 852 | }, 853 | "source": [ 854 | "## Custom function # 1: get_stacking()\n", 855 | "1. `StackingClassifier()` arguments:\n", 856 | " - `estimators`: List of baseline classifiers\n", 857 | " - `final_estimator`: Defined meta classifier \n", 858 | " - `cv`: Number of cross validations to perform." 859 | ] 860 | }, 861 | { 862 | "cell_type": "code", 863 | "metadata": { 864 | "id": "YFhBv6jR6FOe", 865 | "colab_type": "code", 866 | "colab": {} 867 | }, 868 | "source": [ 869 | "# Define get_stacking():\n", 870 | "def get_stacking():\n", 871 | "\n", 872 | "\t# Create an empty list for the base models called layer1\n", 873 | " layer1 = list()\n", 874 | "\n", 875 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsClassifier, SVC, and GaussianNB base models\n", 876 | " # Hint: layer1.append(('ModelName', Classifier()))\n", 877 | " layer1.append(('DT', DecisionTreeClassifier()))\n", 878 | " layer1.append(('KNN', KNeighborsClassifier()))\n", 879 | " layer1.append(('SVM', SVC()))\n", 880 | " layer1.append(('Bayes', GaussianNB()))\n", 881 | "\n", 882 | " # Instantiate Logistic Regression as meta learner model called layer2\n", 883 | " layer2 = LogisticRegression()\n", 884 | "\n", 885 | "\t# Define StackingClassifier() called model passing layer1 model list and meta learner with 5 cross-validations\n", 886 | " model = StackingClassifier(estimators=layer1, final_estimator=layer2, cv=5)\n", 887 | "\n", 888 | " # return model\n", 889 | " return model" 890 | ], 891 | "execution_count": null, 892 | "outputs": [] 893 | }, 894 | { 895 | "cell_type": "markdown", 896 | "metadata": { 897 | "id": "d5szw9liyaxp", 898 | "colab_type": "text" 899 | }, 900 | "source": [ 901 | "## Custom function # 2: get_models()" 902 | ] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "metadata": { 907 | "id": "0hEJlDLB4kv5", 908 | "colab_type": "code", 909 | "colab": {} 910 | }, 911 | "source": [ 912 | "# Define get_models():\n", 913 | "def get_models():\n", 914 | "\n", 915 | " # Create empty dictionary called models\n", 916 | " models = dict()\n", 917 | "\n", 918 | " # Add key:value pairs to dictionary with key as ModelName and value as instantiations (no arguments) for KNeighborsClassifier, SVC, and GaussianNB base models\n", 919 | " # Hint: 
models['ModelName'] = Classifier()\n", 920 | " models['DT'] = DecisionTreeClassifier() \n", 921 | " models['KNN'] = KNeighborsClassifier() \n", 922 | " models['SVM'] = SVC()\n", 923 | " models['Bayes'] = GaussianNB()\n", 924 | "\n", 925 | " # Add key:value pair to dictionary with key called Stacking and value that calls get_stacking() custom function\n", 926 | " models['Stacking'] = get_stacking()\n", 927 | "\n", 928 | " # return dictionary\n", 929 | " return models" 930 | ], 931 | "execution_count": null, 932 | "outputs": [] 933 | }, 934 | { 935 | "cell_type": "markdown", 936 | "metadata": { 937 | "id": "flSG4dH1zCTK", 938 | "colab_type": "text" 939 | }, 940 | "source": [ 941 | "## Custom function # 3: evaluate_model(model)" 942 | ] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "metadata": { 947 | "id": "mGLKRr0j5Nit", 948 | "colab_type": "code", 949 | "colab": {} 950 | }, 951 | "source": [ 952 | "# Define evaluate_model:\n", 953 | "def evaluate_model(model):\n", 954 | "\n", 955 | " # Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 42.\n", 956 | " cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)\n", 957 | "\n", 958 | " # Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 959 | " scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\n", 960 | "\n", 961 | " # return scores\n", 962 | " return scores" 963 | ], 964 | "execution_count": null, 965 | "outputs": [] 966 | }, 967 | { 968 | "cell_type": "code", 969 | "metadata": { 970 | "id": "Y5wmC-TH7B7E", 971 | "colab_type": "code", 972 | "colab": {} 973 | }, 974 | "source": [ 975 | "# Assign get_models() to a variable called models\n", 976 | "models = get_models()" 977 | ], 978 | "execution_count": null, 979 | "outputs": [] 980 | }, 981 | { 982 | "cell_type": "markdown", 983 | "metadata": { 984 | "id": "02tyK34l2eh7", 985 | "colab_type": "text" 986 | }, 987 | "source": [ 988 | "## Python Dictionary Review:\n", 989 | "- The items() method is used to return the list with all dictionary keys with values. Parameters: This method takes no parameters. Returns: A view object that displays a list of a given dictionary's (key, value) tuple pair.\n", 990 | "- For our purposes, we'll use the dictionary created when we call the get_models() custom function in a for loop to iterate over each key:value pair and store the results.\n", 991 | "- Then, we will plot the results as a `boxplot` for comparison using `seaborn`.\n", 992 | "\n", 993 | "1. `sns.boxplot()` arguments:\n", 994 | " - `x`: Names of the variables in the data\n", 995 | " - `y`: Names of the variables in the data\n", 996 | " - `showmeans`: Whether or not to show mark at the mean of the data." 
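
As a tiny standalone illustration of the `.items()` pattern described above (demo values only, not part of the session code):

# .items() yields (key, value) tuples that unpack directly in the for-loop header
demo = {'DT': 'decision tree', 'KNN': 'k-nearest neighbors'}
for name, model in demo.items():
    print(name, '->', model)
# prints: DT -> decision tree, then KNN -> k-nearest neighbors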
997 | ] 998 | }, 999 | { 1000 | "cell_type": "code", 1001 | "metadata": { 1002 | "id": "QzXmYt1o6FWh", 1003 | "colab_type": "code", 1004 | "colab": { 1005 | "base_uri": "https://localhost:8080/", 1006 | "height": 367 1007 | }, 1008 | "outputId": "b98ee82e-bf8f-4a33-f187-f2feaef5dc2e" 1009 | }, 1010 | "source": [ 1011 | "# Evaluate the models and store results\n", 1012 | "# Create an empty list for the results\n", 1013 | "results = list()\n", 1014 | "\n", 1015 | "# Create an empty list for the model names\n", 1016 | "names = list()\n", 1017 | "\n", 1018 | "# Create a for loop that iterates over each name, model in models dictionary \n", 1019 | "for name, model in models.items():\n", 1020 | "\n", 1021 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 1022 | "\tscores = evaluate_model(model)\n", 1023 | " \n", 1024 | " # Append output from scores to the results list\n", 1025 | "\tresults.append(scores)\n", 1026 | " \n", 1027 | " # Append name to the names list\n", 1028 | "\tnames.append(name)\n", 1029 | " \n", 1030 | " # Print name, mean and standard deviation of scores:\n", 1031 | "\tprint('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\n", 1032 | " \n", 1033 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 1034 | "sns.boxplot(x=names, y=results, showmeans=True)" 1035 | ], 1036 | "execution_count": null, 1037 | "outputs": [ 1038 | { 1039 | "output_type": "stream", 1040 | "text": [ 1041 | ">DT 0.707 (0.049)\n", 1042 | ">KNN 0.713 (0.058)\n", 1043 | ">SVM 0.759 (0.045)\n", 1044 | ">Bayes 0.760 (0.049)\n", 1045 | ">Stacking 0.763 (0.050)\n" 1046 | ], 1047 | "name": "stdout" 1048 | }, 1049 | { 1050 | "output_type": "execute_result", 1051 | "data": { 1052 | "text/plain": [ 1053 | "" 1054 | ] 1055 | }, 1056 | "metadata": { 1057 | "tags": [] 1058 | }, 1059 | "execution_count": 36 1060 | }, 1061 | { 1062 | "output_type": "display_data", 1063 | "data": { 1064 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAZ60lEQVR4nO3df5TddX3n8edrJoRMiBAgE6m5hARnIuLSBhxxLWsrq4GAdrFdV4LtaTjLabZrIQrqKWxdofF3e1ztUKpGzRJ1IdK6eFI3FFBh3aOhzQRSICOQS/h1A8rkFxAmJPPjvX98v+Nchvlx78z9Mfc7r8c5OXPv9/v93O/7e2fmlfd8f11FBGZmll1N9S7AzMyqy0FvZpZxDnozs4xz0JuZZZyD3sws42bVu4CRFixYEEuWLKl3GWZmDWX79u17I6J1tHnTLuiXLFlCV1dXvcswM2sokp4aa5533ZiZZZyD3sws4xz0ZmYZ56A3M8s4B72Na+/evVx11VXs27ev3qWY2SQ56G1cGzdu5MEHH2Tjxo31LsXMJslBb2Pau3cvd9xxBxHBHXfc4a7erEFNu/PobfrYuHEjQ7exHhwcZOPGjVxzzTV1rsoqqbOzk3w+X/a4QqEAQC6XK2tcW1sba9euLXt9NjXu6G1Md999N319fQD09fVx11131bkimy4OHz7M4cOH612GlcgdvY1pxYoVbNmyhb6+Po455hguuOCCepdkFTbZ7npoXGdnZyXLsSpxR29jWr16NZIAaGpqYvXq1XWuyMwmw0FvY1qwYAEXXXQRkrjooos4+eST612SmU2Cd93YuFavXs2TTz7pbt6sgTnobVwLFizgxhtvrHcZZjYF3nVjZpZxDnozs4xz0JuZZZz30duM46tBbTRZ/rlw0JuVyFeC2mga4efCQW8zjq8GtdFk+eeipH30klZKelRSXtK1o8xfLOkeSQ9IelDSxen0JZIOS9qR/vtapTfAzMzGN2FHL6kZuAlYARSAbZI2R0R30WKfBG6LiK9KOhPYAixJ5z0eEcsrW7aZmZWqlI7+XCAfEbsj4iiwCbhkxDIBHJ8+PgF4tnIlmpnZVJQS9IuAZ4qeF9JpxW4A/khSgaSbv6po3tJ0l87/lfTO0VYgaY2kLkldPT09pVdvZmYTqtR59JcBN0dEDrgY+I6kJuA5YHFEnA1cA9wi6fiRgyNifUR0RERHa2trhUoyMzMoLej3AKcWPc+l04pdAdwGEBFbgTnAgog4EhH70unbgceBZVMt2szMSldK0G8D2iUtlTQbWAVsHrHM08C7ASS9mSToeyS1pgdzkXQ60A7srlTxZmY2sQnPuomIfklXAncCzcCGiNgpaR3QFRGbgY8B35B0NcmB2csjIiT9DrBOUh8wCPxpROyv2taYmdlrlHTBVERsITnIWjztU0WPu4HzRhn3feD7U6zRzMymwDc1MzPLOAe9mVnGOejNzDLOQW9mlnEOejOzjHPQm5llnIPezCzjHPRmZhnnoDczyzgHvZlZxjnozcwyzkFvZpZxDnozs4xz0JuZZZyD3sws4xz0ZmYZ56A3M8s4B72ZWcY56M3MMs5Bb2aWcQ56M7OMc9CbmWWcg97MLOMc9GZmGeegNzPLuFn1LsBqo7Ozk3w+X/a4QqEAQC6XK2tcW1sba9euLXt9ZlZ5Dnob1+HDh+tdgplNUUlBL2kl8DdAM/DNiPjCiPmLgY3A/HSZayNiSzrvOuAKYABYGxF3Vq58K9Vku+uhcZ2dnZUsx8xqaMKgl9QM3ASsAArANkmbI6K7aLFPArdFxFclnQlsAZakj1cBbwHeAPxI0rKIGKj0hpiZDZnsrsrJ2LVrFzD5Zqpck9ktWkpHfy6Qj4jdAJI2AZcAxUEfwPHp4xOAZ9PHlwCbIuII8ISkfPp6W8uq0sysDPl8np0P/YL5cxdWfV2DRwXAnsf3VX1dB3ufn9S4UoJ+EfBM0fMC8PYRy9wA3CXpKuA44D1FY+8bMXbRyBVIWgOsAVi8eHEpdZuZjWv+3IWcf8aqepdRUfc8smlS4yp1euVlwM0RkQMuBr4jqeTXjoj1EdERER2tra0VKsnMzKC0jn4PcGrR81w6rdgVwEqAiNgqaQ6woMSxZmZWRaV03duAdklLJc0mObi6ecQyTwPvBpD0ZmAO0JMut0rSsZKWAu3Av1SqeDMzm9iEHX1E9Eu6EriT5NTJDRGxU9I6oCsiNgMfA74h6WqSA7OXR0QAOyXdRnLgth/4M59xY2ZWWyWdR5+eE79lxLRPFT3uBs4bY+xngc9OocYpmcxpVr4atHH4NLphfi9sLL4ydhS+GrRx5PN5Htj5QHKpXrUNJl8e2PNA9dd1sPwh+XyeR3bs4JTKV/MaQ/t8D+7YUfV1/bLqa8i+zAf9ZLoAXw3aYObD4LsG611FRTXdO7kT4k4BrkCVLabOvkXUu4SG57tXmpllnIPezCzjHPRmZhnnoDczyzgHvZlZxjnozcwyzkFvZpZxDnozs4xz0JuZZZyD3sws4xz0ZmYZ56A3M8s4B72ZWcY56M3MMs5Bb2aWcQ56M7OMc9CbmU3By7NeZPPp6+md9VK9SxmTg97MbAruX3gPzx33FNsX/qTepYzJQW9WokEN8vLclxlUtj62cDJeaBnkKxe/yIstM/u9eHnWizx60nZQ8OhJ26dtV++gNyvRkdlHGGge4MjsI/Uupe7uOPswj5/Szx3LD9e7lLq6f+E9RPqZtkFM267eQW9WgkEN0je7DwR9s/tmdFf/Qssg/9x+hBDct+zIjO3qh7r5waYBAAabBqZtVz+r3gWYNYKRXfyR2UdoOdJSp2pGVygUeAn4VtphVsvTZ/fSnz7uB760vJfFW4+r2vqeAw4VClV7/ckq7uaHDHX173z2kjpVNTp39GYTKO7mgRnd1fe1DLK//SiRtogxC/YvO0rfDOzqf3Xc07/u5ocMNg3wq+OerlNFY3NHbzaBsfbJT7euPpfLcXDvXq749f9Ilbfp7FdoAorjrQk4cfkrXFqlrv5bBPNzuaq89lR8YNdV9S6hZO7ozSYwMGuA12Sn0ukzzJML+xkY0R4OzIInXt8/+gCbFkrq6CWtBP4GaAa+GRFfGDH/y8D56dO5wMKImJ/OGwAeSuc9HRH/oRKFm9XKvJfn1buEaePaH5xQ7xJKUigUeKH3Je55ZFO9S6mog73PE4Xyz3SaMOglNQM3ASuAArBN0uaI6B5aJiKuLlr+KuDsopc4HBHLy67MzMwqopSO/lwgHxG7ASRtAi4BusdY/jLg+sqUZ2ZWvlwuh47s4/wzVtW7lIq655FNLMqdXPa4UvbRLwKeKXpeSKe9hqTTgKVA8VUDcyR1SbpP0vvHGLcmXaarp6enxNLNzKwUlT4Yuwr4h4goPkp1WkR0AB8CviLpjSMHRcT6iOiIiI7W1tYKl2RmNrOVEvR7gFOLnufSaaNZBdxaPCEi9qRfdwP38ur992ZmVmWl7KPfBrRLWkoS8KtIuvNXkXQGcCKwtWjaiUBvRByRtAA4D/irSh
RuBsnZFbwATfdm7Ezhg1CI6Xc1qDWmCYM+IvolXQncSXJ65YaI2ClpHdAVEZvTRVcBmyKi+JrgNwNflzRI8tfDF4rP1jEzs+or6Tz6iNgCbBkx7VMjnt8wyrifA2dNoT6zceVyOXrUw+C7snUJftO9TeQWTb+rQa0xNcwtEDo7O8nn8zVZ165duwBYu3ZtTdbX1tZW1rr8XphZORom6PP5PA881M3g3JOqvi4dTfY+bX/8l1VfV1Pv/rLH5PN5Hnv4fhbPq/4l+LP7kn3frzy5rerrevpQc9XXYTYTNUzQAwzOPYlXznxfvcuoqDndP5zUuMXzBvhkx6EKV1Nfn+nyrQbMqiFjpyqYmdlIDnozs4xz0JuZZZyD3sws4xz0ZmYZ56A3M8s4B72ZWcY56M3MMs5Bb2aWcQ11ZazZqA7W6DbFQxci1+IC3oOM8TluZuVz0FtDa2trq9m6hm7w1r6ovforW1TbbbNsa5igLxQKNPW+MOl7w0xXTb37KBT6yxpTKBR4+aXmzN0b5qmXmjmuUN6HbdTyTpdD6+rs7KzZOs0qwfvozcwyrmE6+lwux6+OzMrk3StzuVPKGpPL5Xil/7lM3r1yTs4ftmFWaQ0T9GY2sV8C3yImXG6q9qVfT676mpJtmj+JcQd7n+eeRzZVupzXOPTKAQDmzTmx6us62Ps8iybxrjvobVz7m+GLC+DPe+CkbH1aX+bU8uBtT3pgen579Q9Mz6f8bavtQfrkw4MWvbH6/+0t4uRJbZuD3sZ16/Gw81jYdAJ8+EC9q7Hx+MD0ML8Xr+aDsTam/c3wo3kQgrvnwX7/tJg1JP/q2phuPR4GlTweVNLVm1njcdDbqIa6+f406Pvd1Zs1LP/ajiJmHaJv6a3ErGydvliO4m5+iLt6s8bkoB/FQOtWYm6Bgdat9S6lbh45dribH9Iv+MWx9anHzCbPZ92MELMOMXjiwyAYPPFhoucdqD9btxooxY2/rHcFZlYp7uhHSLr4oQtOYkZ39WaWDSUFvaSVkh6VlJd07SjzvyxpR/rvMUkHi+atlrQr/be6ksVX2q+7+aaBZELTQNLVz+B99WbW+CbcdSOpGbgJWAEUgG2SNkdE99AyEXF10fJXAWenj08Crgc6SNrk7enYaXnpzau7+SFJVz/ruRX1KMnMbMpK6ejPBfIRsTsijgKbgEvGWf4y4Nb08YXA3RGxPw33u4GVUym4mgbn7hnu5oc0DSTTzcwaVCkHYxcBzxQ9LwBvH21BSacBS4GfjDN22n5uzuzHL693CWZmFVfps25WAf8QEQMTLllE0hpgDcDixYsrXJKZ2cQ6OzvJ5/Nljxv65LFy76/T1tZWs3vylLLrZg9watHzXDptNKsY3m1T8tiIWB8RHRHR0draWkJJZmbTQ0tLCy0tLfUuY1yldPTbgHZJS0lCehXwoZELSToDOBEoPh/xTuBzkoZu1HwBcN2UKjYzq4Ja3vGy1iYM+ojol3QlSWg3AxsiYqekdUBXRGxOF10FbIqIKBq7X9KnSf6zAFgXEfsruwlmZjaekvbRR8QWYMuIaZ8a8fyGMcZuADZMsj4zM5siXxlrZpZxDnozs4zzTc3MZrAsn1Jowxz0Zla26X46ob2ag95sBnN3PTN4H72ZWcY56M3MMs5Bb2aWcQ56M7OMc9CbmWWcg97MLOMc9GZmGeegNzPLOAe9mVnGOejNzDLOQW9mlnEOejOzjHPQm5llnO9e2aCePtTMZ7rmVX09v+pNeoHXzx2s+rqePtTMsqqvxWzmcdA3oLa2tpqt62j6ARNzlrRXfV3LqO22mc0UDvoGVMt7iA+tq7Ozs2brNLPKaqigb+rdz5zuH1Z9PXrlRQBizvFVX1dT737glKqvx4b54/NspmmYoK/ln/S7dr0EQPsbaxHAp3h3RYPwx+dZo2qYoPfuCqsUd9c20/j0SjOzjHPQm5llnIPezCzjHPRmZhlXUtBLWinpUUl5SdeOscwHJXVL2inplqLpA5J2pP82V6pwMzMrzYRn3UhqBm4CVgAFYJukzRHRXbRMO3AdcF5EHJC0sOglDkfE8grXbWZmJSqloz8XyEfE7og4CmwCLhmxzJ8AN0XEAYCIeL6yZZqZ2WSVEvSLgGeKnhfSacWWAcsk/UzSfZJWFs2bI6krnf7+0VYgaU26TFdPT09ZG2BmZuOr1AVTs4B24F1ADvippLMi4iBwWkTskXQ68BNJD0XE48WDI2I9sB6go6MjKlSTmZlRWke/Bzi16HkunVasAGyOiL6IeAJ4jCT4iYg96dfdwL3A2VOs2czMylBK0G8D2iUtlTQbWAWMPHvmByTdPJIWkOzK2S3pREnHFk0/D+jGzMxqZsJdNxHRL+lK4E6gGdgQETslrQO6ImJzOu8CSd3AAPCJiNgn6beBr0saJPlP5QvFZ+uYmVn1lbSPPiK2AFtGTPtU0eMArkn/FS/zc+CsqZdpZmaT5StjzcwyzkFvZpZxDnozs4xz0JuZZZyD3sws4xz0ZmYZ56A3M8s4B72ZWcY56M3MMs5Bb2aWcQ56M7OMc9CbmWWcg97MLOMc9GZmGeegNzPLOAe9mVnGOejNzDLOQW9mlnEOejOzjHPQm5llnIPezCzjHPRmZhk3q94FVFtnZyf5fL6sMbt27QJg7dq1ZY1ra2sre4yZWbVlPugno6Wlpd4lmJlVTOaD3h22mc103kdvZpZxDnozs4wrKeglrZT0qKS8pGvHWOaDkrol7ZR0S9H01ZJ2pf9WV6pwMzMrzYT76CU1AzcBK4ACsE3S5ojoLlqmHbgOOC8iDkhamE4/Cbge6AAC2J6OPVD5TTEzs9GU0tGfC+QjYndEHAU2AZeMWOZPgJuGAjwink+nXwjcHRH703l3AysrU7qZmZWilKBfBDxT9LyQTiu2DFgm6WeS7pO0soyxSFojqUtSV09PT+nVm5nZhCp1euUsoB14F5ADfirprFIHR8R6YD1AR0dHVKgmKzKZC8fAF4+ZZUEpHf0e4NSi57l0WrECsDki+iLiCeAxkuAvZaxNYy0tLb6AzKzBKWL8BlrSLJLgfjdJSG8DPhQRO4uWWQlcFhGrJS0AHgCWkx6ABc5JF70feGtE7B9rfR0dHdHV1TX5LTIzm4EkbY+IjtHmTbjrJiL6JV0J3Ak0AxsiYqekdUBXRGxO510gqRsYAD4REfvSlX+a5D8HgHXjhbyZmVXehB19rbmjNzMr33gdva+MNTPLOAe9mVnGOejNzDLOQW9mlnEOejOzjHPQm5ll3LQ7vVJSD/BUvesAFgB7613ENOH3Ypjfi2F+L4ZNh/fitIhoHW3GtAv66UJS11jnpM40fi+G+b0Y5vdi2HR/L7zrxsws4xz0ZmYZ56Af2/p6FzCN+L0Y5vdimN+LYdP6vfA+ejOzjHNHb2aWcQ56M7OMm/FBL2lA0g5JOyX9q6SPSWqSdGE6fYekQ5IeTR9/u941V4qkQ0WPL5b0mKTTJN0gqVfSwjGWDUlfKnr+cUk31KzwKpH0F+nPwYPp9/p6SZ8fscxySb9IHz8p6f+NmL9D0sO1r
Huqin4H/lXS/ZJ+u941Vdso3+u3S/qopLmTfL3LJf3tKNP/VNIfT73iqanUZ8Y2ssMRsRwgDbZbgOMj4nqSD1RB0r3AxyMikzfKl/RuoBO4MCKekgTJxR8fA/58lCFHgD+Q9PmIqPdFIhUh6R3A+4BzIuJI+klpZwI3A9cVLboKuLXo+esknRoRz0h6c80Krqzi34ELgc8Dv1vfkqpnjO/1bOB7wHeB3kqtKyK+VqnXmooZ39EXi4jngTXAlUrTLusk/Q7wDeB9EfF40awNwKWSThplWD/JWQZX16DEWvkNYG9EHAGIiL0R8VPggKS3Fy33QV4d9LcBl6aPLxsxrxEdDxwAkDRP0o/TLv8hSZek09dJ+ujQAEmflfSR9PEnJG1LO+W/TKcdJ+n/pH8xPCzp0lHWW0uv+V4DHwDeANwj6R4ASV+V1JV2/n85NFjS2yT9PN2ef5H0uuIXl/ReSVslLUj/Ov54Ov1eSV9Mxzwm6Z3p9LmSbpPULel2Sf8sqaIXXznoR4iI3SQfmbhwomUz4FjgB8D7I+KREfMOkYT9R8YYexPwh5JOqGJ9tXQXcGr6C/h3koY62ltJungk/Vtgf0TsKhr3feAP0se/B/xjrQquoJZ098UjwDeBT6fTXwF+PyLOAc4HvpQ2QBuAPwaQ1ETy/nxX0gVAO3AuyWdGvzVtJFYCz0bEb0XEvwH+qYbbNprXfK8johN4Fjg/Is5Pl/uL9GrX3wR+V9JvShrq/D8SEb8FvAc4PPTCkn4fuBa4eIy/dmdFxLnAR4Hr02kfBg5ExJnAfwfeWukNdtDPbH3Az4ErxpjfCawe2bEARMSLwLeBtdUrr3Yi4hDJL9gaoAf4nqTLSX6pP1AUaCM79n0kXf8q4BdU8M/+GjocEcsj4gySUP52GugCPifpQeBHwCLg9RHxJLBP0tnABcAD6WdEXzD0HLgfOIMk+B8CVqTd7Dsj4oUab9+rjPO9HumDku4n2Z63kOzKexPwXERsS1/rxYjoT5f/9yS7Ot8bEQfGWP3/Tr9uB5akj/8dsCl9vYeBBye9cWPwPvoRJJ1O8gHnz9e7lhoYJNkV8WNJ/y0iPlc8MyIOSroF+LMxxn+F5Bf6f1a3zNqIiAHgXuBeSQ8BqyPiZklPkOyz/o/AO0YZ+j2Sv3Aur1GpVRMRW9N91q3AxenXt0ZEn6QngTnpot8k2d5TSDp8SP5j+HxEfH3k60o6J329z0j6cUSsq+qGTGC073XxfElLgY8Db4uIA5JuZnjbx/I4cDqwDBjreN6R9OsANcxfd/RFJLUCXwP+NmbIlWQR0Qu8l2Q3zGid/f8A/guj/FBGxH6SfdRj/UXQMCS9SVJ70aTlDN9F9Vbgy8DuiCiMMvx24K9ID943MklnkOy63AecADyfhvz5wGlFi95O0v2/jeHtvhP4z5Lmpa+1SNJCSW8AeiPiu8BfA+fUZmtGN873+iVg6K/X44GXgRckvR64KJ3+KPAbkt6WvtbrJA39bjxF0gx8W9JbyijpZyQNF5LOBM4qf6vG544+3T8JHENykPE7JOE2Y0TEfkkrgZ8quU108by9km5n7AOvXwKurHaNNTAPuFHSfJKfgzzJn/YAf0+yG+uq0QZGxEvAFwEa9Bj+0O8AJF356ogYkPS/gH9MO94u4NfHcSLiaHrQ8mDaHRMRd6VnHm1N34dDwB8BbcBfSxok2V34X2u1YWMY63t9GfBPkp6NiPMlPUCyzc+QhPHQdl+ajm8h2T//nqEXjohHJP0h8PeSfq/Eev4O2CipO13fTqCiu7d8CwQzK1t6zOJ+4D+NODhtZZLUDBwTEa9IeiPJ8ZA3RcTRSq3DHb2ZlSXdvfBD4HaHfEXMJTmt8xiSv6g+XMmQB3f0ZmaZ54OxZmYZ56A3M8s4B72ZWcY56M3MMs5Bb2aWcf8fwkxYgKc6ZcUAAAAASUVORK5CYII=\n", 1065 | "text/plain": [ 1066 | "
" 1067 | ] 1068 | }, 1069 | "metadata": { 1070 | "tags": [], 1071 | "needs_background": "light" 1072 | } 1073 | } 1074 | ] 1075 | }, 1076 | { 1077 | "cell_type": "markdown", 1078 | "metadata": { 1079 | "id": "xUqeWsol5RAt", 1080 | "colab_type": "text" 1081 | }, 1082 | "source": [ 1083 | "## **Observation**\n", 1084 | "- Recall that we want to do better than 70% with a Stacking Classifier to consider it an improvement over the Decision Tree baseline model and, although we did achieve that, we can probably do even better with this dataset. \n", 1085 | "- Let's try some hyperparameter tuning via cross-validation next..." 1086 | ] 1087 | }, 1088 | { 1089 | "cell_type": "markdown", 1090 | "metadata": { 1091 | "id": "xwc_6_Qf4amu", 1092 | "colab_type": "text" 1093 | }, 1094 | "source": [ 1095 | "---\n", 1096 | "\n", 1097 | "## Q&A\n", 1098 | "\n", 1099 | "--- \n" 1100 | ] 1101 | }, 1102 | { 1103 | "cell_type": "code", 1104 | "metadata": { 1105 | "id": "yMZ8gTb6LGCP", 1106 | "colab_type": "code", 1107 | "colab": {} 1108 | }, 1109 | "source": [ 1110 | "# Import additional libraries\n", 1111 | "from xgboost import XGBClassifier \n", 1112 | "from sklearn.ensemble import RandomForestClassifier\n", 1113 | "from sklearn.preprocessing import StandardScaler\n", 1114 | "from sklearn.pipeline import Pipeline\n", 1115 | "from sklearn.model_selection import RandomizedSearchCV, GridSearchCV\n", 1116 | "import xgboost as xgb\n", 1117 | "from datetime import datetime" 1118 | ], 1119 | "execution_count": null, 1120 | "outputs": [] 1121 | }, 1122 | { 1123 | "cell_type": "markdown", 1124 | "metadata": { 1125 | "id": "BfctBvrs4ZcQ", 1126 | "colab_type": "text" 1127 | }, 1128 | "source": [ 1129 | "## Custom function # 4: best_model(name, model)\n", 1130 | "- We're going to create a Pipeline that scales the data before applying the parameter grid via cross-validation.\n", 1131 | "- Then it returns the model with the best hyperparameters from the search grid for each model." 
1132 | ] 1133 | }, 1134 | { 1135 | "cell_type": "code", 1136 | "metadata": { 1137 | "id": "5RG7lpMY3Bzz", 1138 | "colab_type": "code", 1139 | "colab": {} 1140 | }, 1141 | "source": [ 1142 | "# Define best_model:\n", 1143 | "def best_model(name, model):\n", 1144 | " pipe = Pipeline([('scaler', StandardScaler()), ('classifier',model)]) \n", 1145 | "\n", 1146 | " if name == 'SVM':\n", 1147 | " param_grid = {'classifier__kernel' : ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']} \n", 1148 | " # Create grid search object\n", 1149 | " # this uses k-fold cv\n", 1150 | " clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 1151 | "\n", 1152 | " # Fit on data\n", 1153 | " best_clf = clf.fit(X, y)\n", 1154 | "\n", 1155 | " best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 1156 | "\n", 1157 | " return name, best_hyperparams \n", 1158 | "\n", 1159 | " if name == 'Bayes': \n", 1160 | " param_grid = {'classifier__var_smoothing' : np.array([1e-09, 1e-08])} \n", 1161 | " # Create grid search object\n", 1162 | " # this uses k-fold cv\n", 1163 | "\n", 1164 | " clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 1165 | "\n", 1166 | " # Fit on data\n", 1167 | " best_clf = clf.fit(X, y)\n", 1168 | "\n", 1169 | " best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 1170 | "\n", 1171 | " return name, best_hyperparams \n", 1172 | "\n", 1173 | " if name == 'RF': \n", 1174 | " param_grid = {'classifier__criterion' : np.array(['gini', 'entropy']),\n", 1175 | " 'classifier__max_depth' : np.arange(5,11)} \n", 1176 | " # Create grid search object\n", 1177 | " # this uses k-fold cv\n", 1178 | "\n", 1179 | " clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 1180 | "\n", 1181 | " # Fit on data\n", 1182 | " best_clf = clf.fit(X, y)\n", 1183 | "\n", 1184 | " best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 1185 | " \n", 1186 | " return name, best_hyperparams \n", 1187 | "\n", 1188 | " if name == 'XGB':\n", 1189 | " param_grid = {'classifier__learning_rate' : np.arange(0.022,0.04,.01),\n", 1190 | " 'classifier__max_depth' : np.arange(5,10)} \n", 1191 | " # Create grid search object\n", 1192 | " # this uses k-fold cv\n", 1193 | " clf = GridSearchCV(pipe, param_grid = param_grid, cv = 5, n_jobs=-1)\n", 1194 | "\n", 1195 | " # Fit on data\n", 1196 | " best_clf = clf.fit(X, y)\n", 1197 | " best_hyperparams = best_clf.best_estimator_.get_params()['classifier']\n", 1198 | "\n", 1199 | " return name, best_hyperparams " 1200 | ], 1201 | "execution_count": null, 1202 | "outputs": [] 1203 | }, 1204 | { 1205 | "cell_type": "markdown", 1206 | "metadata": { 1207 | "id": "8Ay2mPQo39mV", 1208 | "colab_type": "text" 1209 | }, 1210 | "source": [ 1211 | "## Adding Random Forest and XGBoost to our get_stacking() custom function in layer 1 (and removing the poorest performers DT and KNN):" 1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "code", 1216 | "metadata": { 1217 | "id": "4ow6Aqaz27GJ", 1218 | "colab_type": "code", 1219 | "colab": {} 1220 | }, 1221 | "source": [ 1222 | "# Define get_stacking(): \n", 1223 | "def get_stacking():\n", 1224 | "\n", 1225 | "\t# Create an empty list for the base models called layer1\n", 1226 | " layer1 = list()\n", 1227 | "\n", 1228 | " # Append tuple with classifier name and instantiations (no arguments) for SVC and GaussianNB base models AND call cust fx #4 best_model on each\n", 1229 | " # Hint: layer1.append((best_model('ModelName', Classifier())))\n", 1230 | " 
layer1.append((best_model('SVM', SVC())))\n", 1231 | " layer1.append((best_model('Bayes', GaussianNB())))\n", 1232 | "\n", 1233 | " # Add RandomForestClassifier and xgb.XGBClassifier as base models\n", 1234 | " layer1.append((best_model('RF', RandomForestClassifier())))\n", 1235 | " layer1.append((best_model('XGB', xgb.XGBClassifier())))\n", 1236 | "\n", 1237 | " # Instantiate Logistic Regression as meta learner model called layer2\n", 1238 | " layer2 = LogisticRegression()\n", 1239 | "\n", 1240 | "\t# Define StackingClassifier() called model passing layer1 model list and meta learner with 5 cross-validations\n", 1241 | " model = StackingClassifier(estimators=layer1, final_estimator=layer2, cv=5)\n", 1242 | "\n", 1243 | " # return model\n", 1244 | " return model" 1245 | ], 1246 | "execution_count": null, 1247 | "outputs": [] 1248 | }, 1249 | { 1250 | "cell_type": "markdown", 1251 | "metadata": { 1252 | "id": "dbp5PICC4HEk", 1253 | "colab_type": "text" 1254 | }, 1255 | "source": [ 1256 | "## Adding Random Forest and XGBoost to our get_models() custom function:" 1257 | ] 1258 | }, 1259 | { 1260 | "cell_type": "code", 1261 | "metadata": { 1262 | "id": "GQqQUH_P3Bto", 1263 | "colab_type": "code", 1264 | "colab": {} 1265 | }, 1266 | "source": [ 1267 | "# Define get_models():\n", 1268 | "def get_models():\n", 1269 | "\n", 1270 | " # Create empty dictionary called models\n", 1271 | " models = dict()\n", 1272 | "\n", 1273 | " # Add key:value pairs to dictionary with key as ModelName and value as instantiations (no arguments) for SVC and GaussianNB base models\n", 1274 | " # Hint: models['ModelName'] = Classifier() \n", 1275 | " models['SVM'] = SVC()\n", 1276 | " models['Bayes'] = GaussianNB()\n", 1277 | "\n", 1278 | " # we'll add two more classifers to the mix - RandomForestClassifier and xgb.XGBClassifier\n", 1279 | " models['RF'] = RandomForestClassifier()\n", 1280 | " models['XGB'] = xgb.XGBClassifier()\n", 1281 | "\n", 1282 | "\n", 1283 | " # Add key:value pair to dictionary with key called Stacking and value that calls get_stacking() custom function\n", 1284 | " models['Stacking'] = get_stacking()\n", 1285 | "\n", 1286 | " # return dictionary\n", 1287 | " return models" 1288 | ], 1289 | "execution_count": null, 1290 | "outputs": [] 1291 | }, 1292 | { 1293 | "cell_type": "code", 1294 | "metadata": { 1295 | "id": "JVTYjSno3B3s", 1296 | "colab_type": "code", 1297 | "colab": {} 1298 | }, 1299 | "source": [ 1300 | "# Assign get_models() to a variable called models\n", 1301 | "models = get_models()" 1302 | ], 1303 | "execution_count": null, 1304 | "outputs": [] 1305 | }, 1306 | { 1307 | "cell_type": "markdown", 1308 | "metadata": { 1309 | "id": "lNECWtJ74tZh", 1310 | "colab_type": "text" 1311 | }, 1312 | "source": [ 1313 | "## Custom function # 3: evaluate_model(model)" 1314 | ] 1315 | }, 1316 | { 1317 | "cell_type": "code", 1318 | "metadata": { 1319 | "id": "TsTJZKNk3XWc", 1320 | "colab_type": "code", 1321 | "colab": {} 1322 | }, 1323 | "source": [ 1324 | "# Define evaluate_model(model):\n", 1325 | "def evaluate_model(model):\n", 1326 | "\n", 1327 | " # Create RepeatedStratifiedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 1328 | " cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\n", 1329 | "\n", 1330 | " # Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'accuracy' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 1331 | " scores = cross_val_score(model, X, y, 
scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\n", 1332 | "\n", 1333 | " # return scores\n", 1334 | " return scores" 1335 | ], 1336 | "execution_count": null, 1337 | "outputs": [] 1338 | }, 1339 | { 1340 | "cell_type": "markdown", 1341 | "metadata": { 1342 | "id": "3CxRVSe_DGlI", 1343 | "colab_type": "text" 1344 | }, 1345 | "source": [ 1346 | "# 10 minute break while the following runs..." 1347 | ] 1348 | }, 1349 | { 1350 | "cell_type": "code", 1351 | "metadata": { 1352 | "id": "rXrusVVBAbaJ", 1353 | "colab_type": "code", 1354 | "colab": { 1355 | "base_uri": "https://localhost:8080/", 1356 | "height": 1000 1357 | }, 1358 | "outputId": "79c1a4f3-f220-4efb-971d-1ac0f00b105d" 1359 | }, 1360 | "source": [ 1361 | "# Evaluate the models and store results\n", 1362 | "# Create an empty list for the results\n", 1363 | "results = list()\n", 1364 | "\n", 1365 | "# Create an empty list for the model names\n", 1366 | "names = list()\n", 1367 | "\n", 1368 | "# Create a for loop that iterates over each name, model in models dictionary \n", 1369 | "for name, model in models.items():\n", 1370 | "\n", 1371 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 1372 | "\tscores = evaluate_model(model)\n", 1373 | " \n", 1374 | " # Append output from scores to the results list\n", 1375 | "\tresults.append(scores)\n", 1376 | " \n", 1377 | " # Append name to the names list\n", 1378 | "\tnames.append(name)\n", 1379 | " \n", 1380 | " # Print name, mean and standard deviation of scores:\n", 1381 | "\tprint('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\n", 1382 | "\n", 1383 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 1384 | "sns.boxplot(x=names, y=results, showmeans=True)" 1385 | ], 1386 | "execution_count": null, 1387 | "outputs": [ 1388 | { 1389 | "output_type": "stream", 1390 | "text": [ 1391 | ">SVM 0.757 (0.040) \n", 1392 | " SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,\n", 1393 | " decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',\n", 1394 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 1395 | " tol=0.001, verbose=False) \n", 1396 | " 2020-08-05 23:22:38.978574\n", 1397 | ">Bayes 0.759 (0.055) \n", 1398 | " GaussianNB(priors=None, var_smoothing=1e-09) \n", 1399 | " 2020-08-05 23:22:39.052228\n", 1400 | ">RF 0.763 (0.047) \n", 1401 | " RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n", 1402 | " criterion='gini', max_depth=None, max_features='auto',\n", 1403 | " max_leaf_nodes=None, max_samples=None,\n", 1404 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 1405 | " min_samples_leaf=1, min_samples_split=2,\n", 1406 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n", 1407 | " n_jobs=None, oob_score=False, random_state=None,\n", 1408 | " verbose=0, warm_start=False) \n", 1409 | " 2020-08-05 23:22:44.240611\n", 1410 | ">XGB 0.754 (0.044) \n", 1411 | " XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 1412 | " colsample_bynode=1, colsample_bytree=1, gamma=0,\n", 1413 | " learning_rate=0.1, max_delta_step=0, max_depth=3,\n", 1414 | " min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,\n", 1415 | " nthread=None, objective='binary:logistic', random_state=0,\n", 1416 | " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", 1417 | " silent=None, subsample=1, verbosity=1) \n", 1418 | " 2020-08-05 23:22:45.491879\n", 1419 | ">Stacking 0.766 (0.042) \n", 
1420 | " StackingClassifier(cv=5,\n", 1421 | " estimators=[('SVM',\n", 1422 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 1423 | " class_weight=None, coef0=0.0,\n", 1424 | " decision_function_shape='ovr', degree=3,\n", 1425 | " gamma='scale', kernel='linear', max_iter=-1,\n", 1426 | " probability=False, random_state=None,\n", 1427 | " shrinking=True, tol=0.001, verbose=False)),\n", 1428 | " ('Bayes',\n", 1429 | " GaussianNB(priors=None, var_smoothing=1e-09)),\n", 1430 | " ('RF',\n", 1431 | " RandomForestClassif...\n", 1432 | " seed=None, silent=None,\n", 1433 | " subsample=1, verbosity=1))],\n", 1434 | " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n", 1435 | " dual=False,\n", 1436 | " fit_intercept=True,\n", 1437 | " intercept_scaling=1,\n", 1438 | " l1_ratio=None,\n", 1439 | " max_iter=100,\n", 1440 | " multi_class='auto',\n", 1441 | " n_jobs=None, penalty='l2',\n", 1442 | " random_state=None,\n", 1443 | " solver='lbfgs',\n", 1444 | " tol=0.0001, verbose=0,\n", 1445 | " warm_start=False),\n", 1446 | " n_jobs=None, passthrough=False, stack_method='auto',\n", 1447 | " verbose=0) \n", 1448 | " 2020-08-05 23:33:16.365883\n" 1449 | ], 1450 | "name": "stdout" 1451 | }, 1452 | { 1453 | "output_type": "execute_result", 1454 | "data": { 1455 | "text/plain": [ 1456 | "" 1457 | ] 1458 | }, 1459 | "metadata": { 1460 | "tags": [] 1461 | }, 1462 | "execution_count": 74 1463 | }, 1464 | { 1465 | "output_type": "display_data", 1466 | "data": { 1467 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAYu0lEQVR4nO3dfXRc9X3n8fdHso1lHGODDSQWxg42AdJ2TaKQbknSsKmpoWnYnmYT0+as2eXUp9tgN+ThlJzNIZRNNnT7kFSU0DoJJ063wZB0k3WIwU0au5vTuKll7IItHjw2jhmHgB/wEzK2JX33j3sVDbIeZqQZjfTT53WOjmfuvb+533s9+ug7996ZUURgZmbpaqh3AWZmVlsOejOzxDnozcwS56A3M0ucg97MLHGT6l1AX7Nnz4758+fXuwwzs3Fl69atByNiTn/zxlzQz58/n7a2tnqXYWY2rkj6yUDzfOjGzCxxDnozs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEjfmrqO32mhtbaVQKFQ8rlgsAtDc3FzRuIULF7Jq1aqK12dm1eegt0GdPHmy3iWY2Qg56CeI4XbXPeNaW1urWY6ZjSIfozczS5yD3swscQ56M7PEOejNzBLnoDczS5yD3swscQ56M7PEOejNzBLnoDczS5zfGWsTjj/3xyYaB71Zmfy5PzZeOehtwvHn/thE42P0ZmaJKyvoJS2V9IykgqQ7+pk/T9JGSdskPSHpxnz6fEknJW3Pf/662htgZmaDG/LQjaRG4D5gCVAEtkhaFxHtJYt9Cng4Iu6XdBWwHpifz9sdEYurW7aZmZWrnI7+GqAQEXsi4jSwFripzzIBzMhvnwf8tHolmpnZSJQT9HOB50vuF/Nppe4CPiSpSNbNryyZtyA/pPNPkt7Z3wokrZDUJqntwIED5VdvZmZDqtbJ2JuBr0ZEM3Aj8LeSGoAXgHkRcTXwUeDrkmb0HRwRqyOiJSJa5syZU6WSzMwMygv6/cAlJfeb82mlbgUeBoiIzcBUYHZEnIqIQ/n0rcBu4PKRFm1mZuUrJ+i3AIskLZA0BVgGrOuzzD7gPQCSriQL+gOS5uQnc5H0RmARsKdaxZuZ2dCGvOomIjol3QZsABqBByJip6S7gbaIWAd8DPiSpNvJTszeEhEh6V3A3ZLOAN3A70fE4ZptTT+G83Z3v9XdzFJS1jtjI2I92UnW0ml3ltxuB67tZ9zfA38/whpHnd/qbmYpSf4jEIbTYfut7maWEn8EgplZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mlrjk3zBlZgMbzkeEQJofE5LyvnDQm1nF/DEhvcbDvnDQm01gw+0oU/yYkJT3hY/Rm5klzkFvZpY4B72ZWeIc9GZmiXPQm5klzkFvZpY4B72ZWeIc9GZmiXPQm5klzkFvZpY4B72ZWeIc9GZmiXPQm5klzkFvZpY4B72ZWeIc9GZmiXPQm5klzkFvZpY4B72ZWeIc9GZmiXPQm5klzkFvZpa4SfUuwMys2lpbWykUCqOyrl27dgGwatWqUVnfwoULK15XWUEvaSnwl0Aj8OWIuKfP/HnAGmBmvswdEbE+n/dJ4FagC1gVERsqqtDMrEKFQoGdTz7FzGkX1nxd3acFwP7dh2q+riMdLw1r3JBBL6kRuA9YAhSBLZLWRUR7yWKfAh6OiPslXQWsB+bnt5cBbwbeAHxf0uUR0TWsas3MyjRz2oVcd8WyepdRVRufXjusceUco78GKETEnog4DawFbuqzTAAz8tvnAT/Nb98ErI2IUxHxHFDIH8/MzEZJOUE/F3i+5H4xn1bqLuBDkopk3fzKCsYiaYWkNkltBw4cKLN0MzMrR7WuurkZ+GpENAM3An8rqezHjojVEdESES1z5sypUklmZgblnYzdD1xScr85n1bqVmApQERsljQVmF3mWDMzq6Fyuu4twCJJCyRNITu5uq7PMvuA9wBIuhKYChzIl1sm6RxJC4BFwL9Wq3gzMxvakB19RHRKu
g3YQHbp5AMRsVPS3UBbRKwDPgZ8SdLtZCdmb4mIAHZKehhoBzqBD/uKGzOz0VXWdfT5NfHr+0y7s+R2O3DtAGM/C3x2BDWamdkI+J2xNq75HZBmQ3PQ27hWKBTYtnNb9p7sWuvO/tm2f1vt13Wk9quwiWPcBL07NxvQTOh+d3e9q6iqhk3+vEGrnnET9IVCgW1PttM97fyar0unA4Ctu39W83U1dByu+TrMbGIbN0EP0D3tfF696r31LqOqprY/UvEYv7qx/vh5YQMZV0FvmUKhwLM7Hmfe9NpfqTrlTHYI4dW9W2q+rn0nGmu+jpQVCgWe3r6di0dhXT0Hlo5s317zddX+dXX6HPTj1LzpXXyq5US9y6iqz7RNr3cJ497FwK2o3mVU1VeIepcw7vmMj5lZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmY3AK5OOse6Nq+mYdLzepQzIn0dv41qxWISjCX7H6hEoRrHeVYxbxWKRox3H2fj02pqv68W3FDl67mHWTVnNRTvm1nRdRzpeIoonKx6X2G+Hmdno6Zx6hmPzXwbBsfmH6Zx6pt4l9csd/ThULBZ55Xhjct/I9JPjjZxbrKyLbW5u5oAO0P3u7hpV1atb3ZxsOknTySYaorY9UsOmBprnNtd0HSlrbm5Gpw5x3RXLarqeH77h/6KGBoIu1NDA1F+ZwTt/elPN1rfx6bXMbb6g4nHu6M3KdGrKKboauzg15VS9S7Ex4JVJx3jm/K10N2Tf3dzd0MUz528dk8fq3dGPQ83Nzbza+UKS3xk7tXlsdrHd6ubMlDMgODPlDOecPqfmXb2NbY9fuJHo8322QbD1wh/UtKsfDj9TbVCHG+GPLoLDE/yZ0reLd1dvL5677+fdfI/uhi5ePHdfnSoamDt6G9SDM2DnObD2PPiDl+tdTX2UdvOAu3oD4P27Vta7hLL5WdqPmHSCMwseJCaldWikUocb4fvTIQTfmz5xu/qBund39TZeTNBf3cF1zdlMTCvSNWdzvUupqwdnQHfexXYr6+onoq5JXb3dfA/l083GAQd9HzHpBN2zdoCge9aOCdvV93TznXnAdU7grn76K9OZcWzGWT/TX0nr8tZKHG3q5gs3HuNYU+0va7WRm4C/toPLuvieM+kxYbv60m6+x0Tu6u21Hr36JLsv7uTRxZW/S9NGn4O+xM+7+Z4z6Q1dE7arf/qc3m6+R6fgqXPqU4+NHUebuvnxolOE4F8uP+Wufhwo66obSUuBvwQagS9HxD195n8euC6/Ow24MCJm5vO6gCfzefsi4n3VKLwWXtvN98i6+kkvLKlHSXVz78/qXYFVqlgschz4ylnP4erad3UHnfntTuDPF3cwb/O5NVvfC8CJCt8xba81ZNBLagTuA5YARWCLpHUR0d6zTETcXrL8SuDqkoc4GRGLq1dy7XRP29/bzfdo6Mqmmxlnmro5vOg0kSdHTILDl5/m9dubmHzSBwjGqnI6+muAQkTsAZC0FrgJaB9g+ZuBT1envNE1Zfct9S7BbNiam5s5cvAgt551iVD1rL36VRqA0naoAZi1+FU+WKOu/isEM8foO6bHi3L+BM8Fni+5X8ynnUXSpcAC4Aclk6dKapP0L5L+47ArNbO623thJ1192sOuSfDcRZ39D7AxodrvjF0GfDMiSv/gXxoR+yW9EfiBpCcjYnfpIEkrgBUA8+bNq3JJZlYtd3zbl12NR+UE/X7gkpL7zfm0/iwDPlw6ISL25//ukbSJ7Pj97j7LrAZWA7S0tPR7JqlYLNLQcZSp7Y+UUfL40dBxiGLR3ZCZ1U45h262AIskLZA0hSzM1/VdSNIVwCxgc8m0WZLOyW/PBq5l4GP7ZmZWA0N29BHRKek2YAPZ5ZUPRMROSXcDbRHRE/rLgLURUdqRXwn8jaRusj8q95RerVOJ5uZmXjw1iVeveu9who9ZU9sfobn54nqXYWYJK+sYfUSsB9b3mXZnn/t39TPuR8AvjqA+MzMbIV/4amaWOAe9mVni/MUjZpakIx0vsfHptTVfz4lXs2/kmT51Vs3XdaTjJeZS+ZeDO+jNLDkLFy4ctXXt2nUYgLmXVR7AlZrLBcPaNge9mSVn1apVo76u1tbWUVtnpXyM3swscQ56M7PEOejNzBLnoDczS5yD3swscQ56M7PEjavLKxs6Do/KxxTr1WMAxNQZNV9XQ8dhoPIPNdt3opHPtE2vfkF9vNiR9QIXTav9F0DvO9HI5cMZeAQaNo1Cz9LzHfG13+1whAG+3mdwP6P23xkLcCj/t/ZXjmfbNHMU1pOycRP0o/sGiOMALLpsND5V8uKKt20098XpXbsAmDp/Uc3XdTmVb9voPi+yfbFobu33BXPH9r44kO+LmYtqvy9mMrrbliK99lOF66+lpSXa2trqWsN4eAPEaPG+6OV90cv7otdY2ReStkZES3/zfIzezCxxDnozs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEuegNzNLnIPezCxxZQW9pKWSnpFUkHRHP/M/L2l7/vOspCMl85ZL2pX/LK9m8WZmNrRJQy0gqRG4D1gCFIEtktZFRHvPMhFxe8nyK4Gr89vnA58GWoAAtuZjX67qVpiZ2YDK6eivAQoRsSciTgNrgZsGWf5m4MH89q8D34uIw3m4fw9YOpKCzcysMuUE/Vzg+ZL7xXzaWSRdCiwAflDJWEkrJLVJajtw4EA5dZuZWZmqfTJ2GfDNiOiqZFBErI6IlohomTNnTpVLMjOb2MoJ+v3AJSX3m/Np/VlG72GbSseamVkNlBP0W4BFkhZImkIW5uv6LiTpCmAWsLlk8gbgekmzJM0Crs+nmZnZKBnyqpuI6JR0G1lANwIPRMROSXcDbRHRE/rLgLURESVjD0v6H2R/LADujojD1d0EMzMbzJBBDxAR64H1fabd2ef+XQOMfQB4YJj1mZnZCPmdsWZmiXPQm5klzkFvZpY4B72ZWeLKOhlrZpa61tZWCoVCxeN27doFwKpVqyoat3DhworHDJeD3sxsBJqamupdwpAc9GZmVN6Rjyc+Rm9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJc9CbmSXOQW9mljgHvZlZ4hz0ZmaJm1TvAmqttbWVQqFQ0Zhdu3YBsGrVqorGLVy4sOIxNvqG85yANJ8X3hcTQ/JBPxxNTU31LsHGID8venlfjC/JB727B+vLz4le3hcTg4/Rm5klzkFvZpa4soJe0lJJz0gqSLpjgGU+IKld0k5JXy+Z3iVpe/6zrlqFm5lZeYY8Ri+pEbgPWAIUgS2S1kVE
e8kyi4BPAtdGxMuSLix5iJMRsbjKdZuZWZnK6eivAQoRsSciTgNrgZv6LPN7wH0R8TJARLxU3TLNzGy4yrnqZi7wfMn9IvD2PstcDiDpn4FG4K6IeCyfN1VSG9AJ3BMR3+67AkkrgBUA8+bNq2gDrDy+Xtps4qrWydhJwCLg3cDNwJckzcznXRoRLcDvAF+QdFnfwRGxOiJaIqJlzpw5VSrJqqGpqcnXTJsN4uDBg6xcuZJDhw7Vu5QBldPR7wcuKbnfnE8rVQR+HBFngOckPUsW/FsiYj9AROyRtAm4Gtg90sKtMu6uzWpjzZo1PPHEE6xZs4aPfvSj9S6nX+V09FuARZIWSJoCLAP6Xj3zbbJuHkmzyQ7l7JE0S9I5JdOvBdoxM0vAwYMHefTRR4kIHn300THb1Q8Z9BHRCdwGbACeAh6OiJ2S7pb0vnyxDcAhSe3ARuATEXEIuBJok/Rv+fR7Sq/WMTMbz9asWUNEANDd3c2aNWvqXFH/1FPkWNHS0hJtbW31LsPMbEhLly6lo6Pj5/enTZvGY489NsiI2pG0NT8feha/M9bMbJiWLFnC5MmTAZg8eTLXX399nSvqn4PezGyYli9fjiQAGhoaWL58eZ0r6p+D3sxsmGbPns0NN9yAJG644QYuuOCCepfUr+Q/ptjMrJaWL1/O3r17x2w3Dw56M7MRmT17Nvfee2+9yxiUD92YmSXOQW9mljgHvZlZ4hz0ZmaJG3PvjJV0APhJvesAZgMH613EGOF90cv7opf3Ra+xsC8ujYh+P/53zAX9WCGpbaC3E0803he9vC96eV/0Guv7woduzMwS56A3M0ucg35gq+tdwBjifdHL+6KX90WvMb0vfIzezCxx7ujNzBLnoDczS9yEDHpJ/13STklPSNou6dOSPtdnmcWSnspv75X0wz7zt0vaMZp1j5Skrrzuf5P0uKRfqXdNY1nJ/toh6TuSZubT50s6mc/r+ZlS73qrQdIlkp6TdH5+f1Z+f76kRZIekbRb0lZJGyW9K1/uFkkH8n2xU9I3JU2r79YMrJ8MeLukjwy35nz7/6qf6b8v6T+PvOKRmXBBL+nfA+8F3hIRvwT8Gtn32X6wz6LLgAdL7r9O0iX5Y1w5GrXWwMmIWBwR/w74JPC5oQZMcD376xeAw8CHS+btzuf1/JyuU41VFRHPA/cD9+ST7iE70fgz4LvA6oi4LCLeCqwE3lgy/KF8X7wZOM3Zv1NjwgAZ8DzwEaCqf5wi4q8j4mvVfMzhmHBBD7weOBgRpwAi4mBE/D/gZUlvL1nuA7w26B+m94l7c59549EM4GUASdMl/WPe5T8p6aZ8+t2SPtIzQNJnJf1hfvsTkrbkHdEf59POlfTd/BXDDklj8hd9mDYDc+tdxCj5PPDL+f/9O4A/A34X2BwR63oWiogdEfHVvoMlTQLOJX9+jUFnZQDwfuANwEZJGwEk3S+pLe/8/7hnsKS3SfpR/jz/V0mvK31wSb8habOk2ZLukvTxfPomSX+Sj3lW0jvz6dMkPSypXdK3JP1YUnXffBURE+oHmA5sB54Fvgj8aj7948Dn89u/DLSVjNkLvAn4UX5/G3AVsKPe21Phtnfl2/40cBR4az59EjAjvz0bKAAC5gOP59MbgN3ABcD1ZF2e8umPAO8Cfhv4Usn6zqv3No9wf53I/20EvgEsze/PB07m+3I7cF+9a63Btv86EMCS/P5fAH84yPK3AAfy/fEi8EOgsd7bMUCtA2XAXmB2yXLnl/z/bwJ+CZgC7AHels+bkf/+3AL8FfBb+bbPyuffBXw8v70J+PP89o3A9/PbHwf+Jr/9C0An0FLNbZ5wHX1EnADeCqwge2I+JOkW4CHg/ZIaOPuwDcAhsq5/GfAU0MH403Mo4gpgKfA1ZV94KeB/SnoC+D5Z53pRROwFDkm6mizct0XEofz29WR/8B4HrgAWAU8CS/Ku5Z0RcXSUt6/amiRtJztscRHwvZJ5pYduPtz/8HHtBuAFsuA5S9557pD0f0omPxQRi4GLyZ4Ln6h9mZUbJAP6+oCkx8me528ma+7eBLwQEVvyxzoWEZ358v8B+CPgNyJioFczPftrK1nDANmrprX54+0Anhj2xg1gwgU9QER0RcSmiPg0cBvw25Edm3wO+FWyzvShfoY+BNzH+D9sQ0RsJuve55C9LJ9D1uEvJuvIpuaLfpmsW/kvwAP5NAGfKwm6hRHxlYh4FngL2S/5ZyTdOWobVBsn8/1xKdk2pxjoZ5G0GFhC9sr2dkmvB3aS/d8CEBG/Rfa8OL/v+Mha0++Qvcobk/rLgNL5khaQddrview4/nfp/Z0YyG7gdcDlgyxzKv+3i1H8hr8JF/SS3iRpUcmkxfR+WuaDZMcn90REsZ/h3wL+F7ChtlXWnqQryF6SHgLOA16KiDOSriMLth7fIuv+30bvdm8A/quk6fljzZV0oaQ3AB0R8b+BP6UkGMaziOgAVgEfy48/Jyt/hXc/8JGI2Ef2//hnwNeBayW9r2TxwU5cvoMs+MacQTLgOFlQQ3ZI5hXgqKSLyF7hADwDvF7S2/LHel3Jc+InZH8wvibpzRWU9M9k5wSRdBXwi5Vv1eCSftIOYDpwb36pXCfZ8egV+bxvAK1kVxOcJSKOA38CkP0+jDs9hyIg61CXR0SXpL8DviPpSaCN7Bg+ABFxOj85dSQiuvJp/5BfebQ53w8ngA8BC4E/ldQNnAH+22htWK1FxLb80NbNZMdgU/V7wL6I6DlM9UWyV3PXkF2p8heSvkD2qu848JmSsR+U9A6yBrJI1vGPRQNlwM3AY5J+GhHXSdpG9rvwPFkY9/w+fDAf30R2rubXeh44Ip6W9LvANyT9Zpn1fBFYI6k9X99OsnNoVeOPQLBB5ecsHgf+U0Tsqnc9ZqmR1AhMjohXJV1Gdp7sTVHFS3YnYkdvZcpfRj4CfMshb1Yz08gu65xM9kr7D6oZ8uCO3swseRPuZKyZ2UTjoDczS5yD3swscQ56M7PEOejNzBL3/wFHLiavsTsD7AAAAABJRU5ErkJggg==\n", 1468 | "text/plain": [ 1469 | "
" 1470 | ] 1471 | }, 1472 | "metadata": { 1473 | "tags": [], 1474 | "needs_background": "light" 1475 | } 1476 | } 1477 | ] 1478 | }, 1479 | { 1480 | "cell_type": "markdown", 1481 | "metadata": { 1482 | "id": "uZlAHPaD419_", 1483 | "colab_type": "text" 1484 | }, 1485 | "source": [ 1486 | "## **Observation**\n", 1487 | "- Before we added XGBoost and hyperparameter tuning, our Stacking Classifier got ~ 76% accuracy. \n", 1488 | "- Here, we got just around 77% accuracy, a minor improvement, but an improvement nonetheless.\n", 1489 | "- We could continue fiddling with other algorithms in layer 1\n", 1490 | "- We could try other algorithms in layer 2.\n", 1491 | "- We could add more hyperparameters to our parameter grid.\n", 1492 | "- To this last point, keep in mind that the more parameters there are in a grid to search over, the longer it takes to train the Stacking Classifier." 1493 | ] 1494 | }, 1495 | { 1496 | "cell_type": "markdown", 1497 | "metadata": { 1498 | "id": "lj8WeJR__bUo", 1499 | "colab_type": "text" 1500 | }, 1501 | "source": [ 1502 | "---\n", 1503 | "\n", 1504 | "## Q&A\n", 1505 | "\n", 1506 | "--- " 1507 | ] 1508 | }, 1509 | { 1510 | "cell_type": "markdown", 1511 | "metadata": { 1512 | "id": "FPY3I2BlVxig", 1513 | "colab_type": "text" 1514 | }, 1515 | "source": [ 1516 | "# **Stacking Regressor**" 1517 | ] 1518 | }, 1519 | { 1520 | "cell_type": "code", 1521 | "metadata": { 1522 | "id": "ftxDhyDq2lrH", 1523 | "colab_type": "code", 1524 | "colab": {} 1525 | }, 1526 | "source": [ 1527 | "# Import libraries\n", 1528 | "from sklearn.model_selection import RepeatedKFold\n", 1529 | "from sklearn.dummy import DummyRegressor\n", 1530 | "from sklearn.svm import SVR" 1531 | ], 1532 | "execution_count": null, 1533 | "outputs": [] 1534 | }, 1535 | { 1536 | "cell_type": "markdown", 1537 | "metadata": { 1538 | "id": "nqDHD8A_nhPB", 1539 | "colab_type": "text" 1540 | }, 1541 | "source": [ 1542 | "## **2nd Dataset**\n", 1543 | "\n", 1544 | "\n", 1545 | "The second dataset we'll use is a CSV file named `abalone.csv`, which contains data on physical measurements of abalone shells used to determine the age of the abalone. 
It contains the following columns:\n", 1546 | "\n", 1547 | "- `Sex`: M, F, and I (infant) - (removed for our purposes)\n", 1548 | "- `Length`: Longest shell measurement (mm)\n", 1549 | "- `Diameter`: Perpendicular to length (mm)\n", 1550 | "- `Height`: with meat in shell (mm)\n", 1551 | "- `Whole weight`: whole abalone (grams)\n", 1552 | "- `Shucked weight`: weight of meat (grams)\n", 1553 | "- `Viscera weight`: gut weight (grams)\n", 1554 | "- `Shell weight`: after being dried (grams)\n", 1555 | "- `Rings`: +1.5 gives the age in years\n", 1556 | "\n", 1557 | "\t" 1558 | ] 1559 | }, 1560 | { 1561 | "cell_type": "markdown", 1562 | "metadata": { 1563 | "id": "HwNnn3ZKrh1o", 1564 | "colab_type": "text" 1565 | }, 1566 | "source": [ 1567 | "### **Get the dataset**" 1568 | ] 1569 | }, 1570 | { 1571 | "cell_type": "code", 1572 | "metadata": { 1573 | "id": "K4LeaM4PzyAh", 1574 | "colab_type": "code", 1575 | "colab": {} 1576 | }, 1577 | "source": [ 1578 | "# Read in the dataset as Pandas DataFrame\n", 1579 | "abalone = pd.read_csv('https://github.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/blob/master/data/abalone.csv?raw=true')" 1580 | ], 1581 | "execution_count": null, 1582 | "outputs": [] 1583 | }, 1584 | { 1585 | "cell_type": "code", 1586 | "metadata": { 1587 | "id": "KfsmhIBdApVp", 1588 | "colab_type": "code", 1589 | "colab": {} 1590 | }, 1591 | "source": [ 1592 | "# Look at data using the info() function\n", 1593 | "abalone.info()" 1594 | ], 1595 | "execution_count": null, 1596 | "outputs": [] 1597 | }, 1598 | { 1599 | "cell_type": "markdown", 1600 | "metadata": { 1601 | "id": "NZAeIFGwBhe6", 1602 | "colab_type": "text" 1603 | }, 1604 | "source": [ 1605 | "## **Observations:** \n", 1606 | "- Here, there are no missing values. Again, that is not typical.\n", 1607 | "- There is a mixture of object, float, and integers with the first column being `object` (categorical), the next 7 `float64` and the last 'int64`." 1608 | ] 1609 | }, 1610 | { 1611 | "cell_type": "code", 1612 | "metadata": { 1613 | "id": "8D4Gfh08Avb2", 1614 | "colab_type": "code", 1615 | "colab": {} 1616 | }, 1617 | "source": [ 1618 | "# Look at data using the describe() function\n", 1619 | "abalone.describe()" 1620 | ], 1621 | "execution_count": null, 1622 | "outputs": [] 1623 | }, 1624 | { 1625 | "cell_type": "markdown", 1626 | "metadata": { 1627 | "id": "WDGc7PPBBkGX", 1628 | "colab_type": "text" 1629 | }, 1630 | "source": [ 1631 | "## **Observations:** \n", 1632 | "- Notice that the min of the `Height` column is zero. Even though there are no missing values, this is indicative of the measurements for that feature having not been captured.\n", 1633 | "- Again, the printout makes it appear as if all numeric values are float. \n", 1634 | "\n" 1635 | ] 1636 | }, 1637 | { 1638 | "cell_type": "code", 1639 | "metadata": { 1640 | "id": "FVGtuWoDAvl2", 1641 | "colab_type": "code", 1642 | "colab": {} 1643 | }, 1644 | "source": [ 1645 | "# Print the first 5 rows of the data using the head() function\n", 1646 | "abalone.head()" 1647 | ], 1648 | "execution_count": null, 1649 | "outputs": [] 1650 | }, 1651 | { 1652 | "cell_type": "markdown", 1653 | "metadata": { 1654 | "id": "wnmVoSl8BmMY", 1655 | "colab_type": "text" 1656 | }, 1657 | "source": [ 1658 | "## **Observation:**\n", 1659 | "- Printing out the first 5 rows, we see that the 1st column is the only non-numeric feature in this dataset and is aligned with the `object` datatype as we saw above when we called `.info()`." 
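
Before converting to NumPy, it may be worth quantifying the zero-`Height` rows flagged in the `.describe()` observation above — a small sketch, assuming `abalone` is still the DataFrame loaded earlier:

# Count the suspicious Height == 0 rows (likely uncaptured measurements, per the observation above)
zero_height = (abalone['Height'] == 0).sum()
print('%d of %d rows have Height == 0' % (zero_height, len(abalone)))
# One option would be to treat them as missing values before modeling, e.g.:
# abalone.loc[abalone['Height'] == 0, 'Height'] = None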
1660 | ] 1661 | }, 1662 | { 1663 | "cell_type": "code", 1664 | "metadata": { 1665 | "id": "xPfVhWzRrm_w", 1666 | "colab_type": "code", 1667 | "colab": {} 1668 | }, 1669 | "source": [ 1670 | "# Convert Pandas DataFrame to numpy array - Return only the values of the DataFrame with DataFrame.to_numpy()\n", 1671 | "abalone = abalone.to_numpy()\n", 1672 | "\n", 1673 | "# Create X matrix and y (target) array using slicing [row_start:row_end, 1:target_col],[row_start:row_end, target_col] - Removing 1st column by starting at index 1\n", 1674 | "X, y = abalone[:, 1:-1], abalone[:, -1]\n", 1675 | "\n", 1676 | "# Print X matrix and y (target) array dimensions using .shape\n", 1677 | "print('Shape: %s, %s' % (X.shape,y.shape))" 1678 | ], 1679 | "execution_count": null, 1680 | "outputs": [] 1681 | }, 1682 | { 1683 | "cell_type": "code", 1684 | "metadata": { 1685 | "id": "fZ6CHfsVrpE7", 1686 | "colab_type": "code", 1687 | "colab": {} 1688 | }, 1689 | "source": [ 1690 | "# Convert y (target) array to 'float32' using .astype()\n", 1691 | "y = y.astype('float32')" 1692 | ], 1693 | "execution_count": null, 1694 | "outputs": [] 1695 | }, 1696 | { 1697 | "cell_type": "markdown", 1698 | "metadata": { 1699 | "id": "7bYvtBfSF7k7", 1700 | "colab_type": "text" 1701 | }, 1702 | "source": [ 1703 | "## **Creating a Naive Regressor**\n", 1704 | "Here we'll use the `DummyRegressor` from `sklearn`. This creates a so-called 'naive' regressor and is simply a model that predicts a single value for all of the rows, regardless of their original value. \n", 1705 | "\n", 1706 | "1. `DummyRegressor()` arguments:\n", 1707 | " - `strategy`: Strategy to use to generate predictions.\n", 1708 | "\n", 1709 | "2. `RepeatedKFold()` arguments:\n", 1710 | " - `n_splits`: Number of folds.\n", 1711 | " - `n_repeats`: Number of times cross-validator needs to be repeated.\n", 1712 | " - `random_state`: Controls the generation of the random states for each repetition. Pass an int for reproducible output across multiple function calls. (This is an equivalent argument to np.random.seed above, but will be specific to this naive model.)\n", 1713 | "\n", 1714 | "3. `cross_val_score()` arguments:\n", 1715 | " - The model to use.\n", 1716 | " - The data to fit. (X)\n", 1717 | " - The target variable to try to predict. (y)\n", 1718 | " - `scoring`: A single string scorer callable object/function such as 'accuracy' or 'roc_auc'. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more options.\n", 1719 | " - `cv`: Cross-validation splitting strategy (default is 5)\n", 1720 | " - `n_jobs`: Number of CPU cores used when parallelizing. Set to -1 helps to avoid non-convergence errors.\n", 1721 | " - `error_score`: Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised." 
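
Two clarifications on those arguments (added notes, not from the original text): for this regression task the `scoring` string passed below is 'neg_mean_absolute_error' — scikit-learn negates error metrics so that greater is always better, so flip the sign to read a plain MAE. Also, `n_jobs=-1` simply parallelizes across all CPU cores to speed up the run. A quick sketch, to be run after the evaluation cell below produces `n_scores`:

# Recover the plain mean absolute error (in Rings units) from the negated scores
mae_scores = -n_scores
print('Naive MAE: %.3f (%.3f)' % (mean(mae_scores), std(mae_scores)))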
1722 | ] 1723 | }, 1724 | { 1725 | "cell_type": "code", 1726 | "metadata": { 1727 | "id": "jAJdcu_Hrrg8", 1728 | "colab_type": "code", 1729 | "colab": {} 1730 | }, 1731 | "source": [ 1732 | "# Evaluate naive\n", 1733 | "\n", 1734 | "# Instantiate a DummyRegressor with 'median' strategy\n", 1735 | "naive = DummyRegressor(strategy='median')\n", 1736 | "\n", 1737 | "# Create RepeatedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 1738 | "cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\n", 1739 | "\n", 1740 | "# Calculate accuracy using `cross_val_score()` with model instantiated, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator, n_jobs=-1, and error_score set to 'raise'\n", 1741 | "n_scores = cross_val_score(naive, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')\n", 1742 | "\n", 1743 | "# Print mean and standard deviation of n_scores:\n", 1744 | "print('Baseline: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))" 1745 | ], 1746 | "execution_count": null, 1747 | "outputs": [] 1748 | }, 1749 | { 1750 | "cell_type": "markdown", 1751 | "metadata": { 1752 | "id": "dlYQmsCQHcdJ", 1753 | "colab_type": "text" 1754 | }, 1755 | "source": [ 1756 | "## **Observation** \n", 1757 | "- We want to do better than -2.37 to consider any other models as an improvement to a totally naive regressor model with the Abalone dataset." 1758 | ] 1759 | }, 1760 | { 1761 | "cell_type": "markdown", 1762 | "metadata": { 1763 | "id": "ZfiEdoUMHo-q", 1764 | "colab_type": "text" 1765 | }, 1766 | "source": [ 1767 | "## **Creating a Baseline Regressor**\n", 1768 | "Now we'll create a baseline regressor, one that seeks to correctly predict the value for each observation. Since the target variable is continuous, we'll instantiate a Support Vector Regression model.\n", 1769 | "\n", 1770 | "1. `SVR()` arguments:\n", 1771 | " - `kernel`: Specifies the kernel type to be used in the algorithm.\n", 1772 | " - `gamma`: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. \n", 1773 | " - `C`: Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty." 
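
A practical aside (not in the original flow): RBF-kernel SVR is sensitive to feature scale, so in practice it is often wrapped in the same `StandardScaler` pipeline used in the classification half of this session. A minimal sketch, assuming `X` and `y` are the abalone arrays built above:

# Scale-then-fit: standardizing features usually helps an RBF-kernel SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

scaled_svr = Pipeline([('scaler', StandardScaler()),
                       ('svr', SVR(kernel='rbf', gamma='scale', C=10))])
# scaled_svr can be passed to cross_val_score exactly like the unscaled model below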
1774 | ] 1775 | }, 1776 | { 1777 | "cell_type": "code", 1778 | "metadata": { 1779 | "id": "cFip40FPrvOn", 1780 | "colab_type": "code", 1781 | "colab": {} 1782 | }, 1783 | "source": [ 1784 | "# Evaluate baseline model\n", 1785 | "\n", 1786 | "# Instantiate a Support Vector Regressor with 'rbf' kernel, gamma set to 'scale', and regularization parameter set to 10\n", 1787 | "model = SVR(kernel='rbf', gamma='scale', C=10)\n", 1788 | "\n", 1789 | "# Estimate the (negated) mean absolute error using `cross_val_score()` with the model, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 1790 | "m_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')\n", 1791 | "\n", 1792 | "# Print mean and standard deviation of m_scores:\n", 1793 | "print('Good: %.3f (%.3f)' % (mean(m_scores), std(m_scores)))" 1794 | ], 1795 | "execution_count": null, 1796 | "outputs": [] 1797 | }, 1798 | { 1799 | "cell_type": "markdown", 1800 | "metadata": { 1801 | "id": "Z_PMtVARKzBX", 1802 | "colab_type": "text" 1803 | }, 1804 | "source": [ 1805 | "## **Observation**\n", 1806 | "- A Stacking Regressor must now score above -1.48 for us to consider it an improvement over this baseline support vector regression model on the Abalone dataset." 1807 | ] 1808 | }, 1809 | { 1810 | "cell_type": "markdown", 1811 | "metadata": { 1812 | "id": "J-OGF_7bupzn", 1813 | "colab_type": "text" 1814 | }, 1815 | "source": [ 1816 | "## **Getting started with Stacking Regressor**\n", 1817 | "- We're going to compare several additional baseline regressors to see if they perform better than the SVR we just trained.\n", 1818 | "- We'll start by importing the additional packages that we'll need." 1819 | ] 1820 | }, 1821 | { 1822 | "cell_type": "code", 1823 | "metadata": { 1824 | "id": "jxbxTPkPrkNb", 1825 | "colab_type": "code", 1826 | "colab": {} 1827 | }, 1828 | "source": [ 1829 | "# Compare machine learning models for regression\n", 1830 | "from sklearn.linear_model import LinearRegression\n", 1831 | "from sklearn.neighbors import KNeighborsRegressor\n", 1832 | "from sklearn.tree import DecisionTreeRegressor\n", 1833 | "from sklearn.ensemble import StackingRegressor" 1834 | ], 1835 | "execution_count": null, 1836 | "outputs": [] 1837 | }, 1838 | { 1839 | "cell_type": "markdown", 1840 | "metadata": { 1841 | "id": "yixxr2JLN9UP", 1842 | "colab_type": "text" 1843 | }, 1844 | "source": [ 1845 | "## Create custom functions\n", 1846 | "1. get_stacking() - This function will create the layers of our `StackingRegressor()`.\n", 1847 | "2. get_models() - This function will create a dictionary of models to be evaluated.\n", 1848 | "3. evaluate_model() - This function will evaluate each of the models to be compared." 1849 | ] 1850 | }, 1851 | { 1852 | "cell_type": "markdown", 1853 | "metadata": { 1854 | "id": "FdF239ZRN92B", 1855 | "colab_type": "text" 1856 | }, 1857 | "source": [ 1858 | "## Custom function # 1: get_stacking()\n", 1859 | "1. `StackingRegressor()` arguments:\n", 1860 | " - `estimators`: List of (name, estimator) tuples for the base regressors.\n", 1861 | " - `final_estimator`: The meta regressor that combines the base estimators' predictions.\n", 1862 | " - `cv`: Number of cross-validation folds used to generate the out-of-fold predictions that train the `final_estimator`."
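, "\n", "Under the hood, the `cv` folds are used to build out-of-fold predictions from each base model, and those predictions become the training features for the `final_estimator`. A minimal hand-rolled sketch of that idea (not the exact sklearn internals), assuming the `X` and `y` defined above:\n", "```python\n", "import numpy as np\n", "from sklearn.model_selection import cross_val_predict\n", "\n", "base_models = [KNeighborsRegressor(), DecisionTreeRegressor(), SVR()]\n", "\n", "# One column of out-of-fold predictions per base model\n", "meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])\n", "\n", "# The meta learner trains on the base models' predictions, not on X itself\n", "meta_model = LinearRegression().fit(meta_X, y)\n", "```"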
1863 | ] 1864 | }, 1865 | { 1866 | "cell_type": "code", 1867 | "metadata": { 1868 | "id": "qoRNxZSj72bZ", 1869 | "colab_type": "code", 1870 | "colab": {} 1871 | }, 1872 | "source": [ 1873 | "# Define get_stacking():\n", 1874 | "def get_stacking():\n", 1875 | "\n", 1876 | "    # Create an empty list for the base models called layer1\n", 1877 | "    layer1 = list()\n", 1878 | "\n", 1879 | "    # Append tuples of regressor name and instantiation (no arguments) for the KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1880 | "    # Hint: layer1.append(('ModelName', Regressor()))\n", 1881 | "    layer1.append(('KNN', KNeighborsRegressor()))\n", 1882 | "    layer1.append(('DT', DecisionTreeRegressor()))\n", 1883 | "    layer1.append(('SVM', SVR()))\n", 1884 | "\n", 1885 | "    # Instantiate Linear Regression as the meta learner model, called layer2\n", 1886 | "    layer2 = LinearRegression()\n", 1887 | "\n", 1888 | "    # Define StackingRegressor() called model, passing the layer1 model list and the meta learner with 5 cross-validation folds\n", 1889 | "    model = StackingRegressor(estimators=layer1, final_estimator=layer2, cv=5)\n", 1890 | "\n", 1891 | "    # return model\n", 1892 | "    return model" 1893 | ], 1894 | "execution_count": null, 1895 | "outputs": [] 1896 | }, 1897 | { 1898 | "cell_type": "markdown", 1899 | "metadata": { 1900 | "id": "KClsJExROLAZ", 1901 | "colab_type": "text" 1902 | }, 1903 | "source": [ 1904 | "## Custom function # 2: get_models()" 1905 | ] 1906 | }, 1907 | { 1908 | "cell_type": "code", 1909 | "metadata": { 1910 | "id": "PtYbhE_ps4yo", 1911 | "colab_type": "code", 1912 | "colab": {} 1913 | }, 1914 | "source": [ 1915 | "# Define get_models():\n", 1916 | "def get_models():\n", 1917 | "\n", 1918 | "    # Create empty dictionary called models\n", 1919 | "    models = dict()\n", 1920 | "\n", 1921 | "    # Add key:value pairs to the dictionary with the model name as key and an instantiation (no arguments) as value for the KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 1922 | "    # Hint: models['ModelName'] = Regressor()\n", 1923 | "    models['KNN'] = KNeighborsRegressor()\n", 1924 | "    models['DT'] = DecisionTreeRegressor()\n", 1925 | "    models['SVM'] = SVR()\n", 1926 | "\n", 1927 | "    # Add key:value pair to the dictionary with key 'Stacking' and a value that calls the get_stacking() custom function\n", 1928 | "    models['Stacking'] = get_stacking()\n", 1929 | "\n", 1930 | "    # return dictionary\n", 1931 | "    return models" 1932 | ], 1933 | "execution_count": null, 1934 | "outputs": [] 1935 | }, 1936 | { 1937 | "cell_type": "markdown", 1938 | "metadata": { 1939 | "id": "SYH3KcjcOc56", 1940 | "colab_type": "text" 1941 | }, 1942 | "source": [ 1943 | "## Custom function # 3: evaluate_model(model)" 1944 | ] 1945 | }, 1946 | { 1947 | "cell_type": "code", 1948 | "metadata": { 1949 | "id": "H95M82gks6EL", 1950 | "colab_type": "code", 1951 | "colab": {} 1952 | }, 1953 | "source": [ 1954 | "# Define evaluate_model:\n", 1955 | "def evaluate_model(model):\n", 1956 | "\n", 1957 | "    # Create RepeatedKFold cross-validator with 10 folds, 3 repeats and a seed of 1.\n", 1958 | "    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\n", 1959 | "\n", 1960 | "    # Estimate the (negated) mean absolute error using `cross_val_score()` with the model, data to fit, target variable, 'neg_mean_absolute_error' scoring, cross validator 'cv', n_jobs=-1, and error_score set to 'raise'\n", 1961 | "    scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')\n", 1962 | "\n", 1963 | "    # return scores\n", 1964
| "\treturn scores" 1965 | ], 1966 | "execution_count": null, 1967 | "outputs": [] 1968 | }, 1969 | { 1970 | "cell_type": "code", 1971 | "metadata": { 1972 | "id": "2C6Hw-wj56eK", 1973 | "colab_type": "code", 1974 | "colab": {} 1975 | }, 1976 | "source": [ 1977 | "# Assign get_models() to a variable called models\n", 1978 | "models = get_models()" 1979 | ], 1980 | "execution_count": null, 1981 | "outputs": [] 1982 | }, 1983 | { 1984 | "cell_type": "code", 1985 | "metadata": { 1986 | "id": "BZl3DjmU58Lm", 1987 | "colab_type": "code", 1988 | "colab": {} 1989 | }, 1990 | "source": [ 1991 | "# Evaluate the models and store results\n", 1992 | "# Create an empty list for the results\n", 1993 | "results = list()\n", 1994 | "\n", 1995 | "# Create an empty list for the model names\n", 1996 | "names = list()\n", 1997 | "\n", 1998 | "# Create a for loop that iterates over each name, model in models dictionary \n", 1999 | "for name, model in models.items():\n", 2000 | "\n", 2001 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 2002 | "\tscores = evaluate_model(model)\n", 2003 | " \n", 2004 | " # Append output from scores to the results list\n", 2005 | "\tresults.append(scores)\n", 2006 | " \n", 2007 | " # Append name to the names list\n", 2008 | "\tnames.append(name)\n", 2009 | " \n", 2010 | " # Print name, mean and standard deviation of scores:\n", 2011 | "\tprint('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\n", 2012 | " \n", 2013 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 2014 | "sns.boxplot(x=names, y=results, showmeans=True)" 2015 | ], 2016 | "execution_count": null, 2017 | "outputs": [] 2018 | }, 2019 | { 2020 | "cell_type": "markdown", 2021 | "metadata": { 2022 | "id": "d6EKNBV1UOuG", 2023 | "colab_type": "text" 2024 | }, 2025 | "source": [ 2026 | "## **Observation**\n", 2027 | "- Recall that we want to do better than -1.48 with a Stacking Regressor to consider it an improvement over this baseline SVR and, although close, we did not achieve that with this dataset.\n", 2028 | "- So what else can try to improve our results with stacking?\n", 2029 | "\n", 2030 | "### We'll add another layer to the mix..." 2031 | ] 2032 | }, 2033 | { 2034 | "cell_type": "markdown", 2035 | "metadata": { 2036 | "id": "N9DZ7iyZFxXo", 2037 | "colab_type": "text" 2038 | }, 2039 | "source": [ 2040 | "## **Double Stacking - 2 Layers**\n", 2041 | "- Can get a little tricky\n", 2042 | "- Just make sure that you name your layers VERY CLEARLY!\n", 2043 | "- Both the last layer (here it's layer 3) and the stacking model will use a call to `StackingRegressor()`\n", 2044 | "- The last layer will combine the 2nd layer with the final estimator while the model will combine the 1st layer with this last layer.\n", 2045 | "\n", 2046 | "
<center>\n", 2047 | "<img src=\"https://raw.githubusercontent.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/master/assets/DoubleStacking.png\" alt=\"Double Stacking\">\n", 2048 | "</center>\n", 2049 | "
" 2050 | ] 2051 | }, 2052 | { 2053 | "cell_type": "code", 2054 | "metadata": { 2055 | "id": "fXvUmmQQF6vq", 2056 | "colab_type": "code", 2057 | "colab": {} 2058 | }, 2059 | "source": [ 2060 | "# Define get_stacking() - adding another layer:\n", 2061 | "def get_stacking():\n", 2062 | "\n", 2063 | "\t# Create an empty list for the 1st layer of base models called layer1\n", 2064 | " layer1 = list()\n", 2065 | "\n", 2066 | " # Create an empty list for the 2nd layer of base models called layer2\n", 2067 | " layer2 = list()\n", 2068 | "\n", 2069 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 2070 | " # Hint: layer1.append(('ModelName', Classifier()))\n", 2071 | " layer1.append(('KNN', KNeighborsRegressor()))\n", 2072 | " layer1.append(('DT', DecisionTreeRegressor()))\n", 2073 | " layer1.append(('SVM', SVR()))\n", 2074 | "\n", 2075 | " # Append tuple with classifier name and instantiations (no arguments) for KNeighborsRegressor, DecisionTreeRegressor, and SVR base models\n", 2076 | " # Hint: layer2.append(('ModelName', Classifier()))\n", 2077 | " layer2.append(('KNN', KNeighborsRegressor()))\n", 2078 | " layer2.append(('DT', DecisionTreeRegressor()))\n", 2079 | " layer2.append(('SVM', SVR()))\n", 2080 | "\n", 2081 | "\t# Define meta learner StackingRegressor() called layer3 passing layer2 model list to estimators, LinearRegression() to final_estimator with 5 cross-validations\n", 2082 | " layer3 = StackingRegressor(estimators=layer2, final_estimator=LinearRegression(), cv=5)\n", 2083 | "\n", 2084 | "\t# Define Stackingregressor() called model passing layer1 model list to estimators and meta learner (layer3) to final_estimator with 5 cross-validations\n", 2085 | " model = StackingRegressor(estimators=layer1, final_estimator=layer3, cv=5)\n", 2086 | "\n", 2087 | " # return model\n", 2088 | " return model" 2089 | ], 2090 | "execution_count": null, 2091 | "outputs": [] 2092 | }, 2093 | { 2094 | "cell_type": "code", 2095 | "metadata": { 2096 | "id": "CnMMqOJ16Bft", 2097 | "colab_type": "code", 2098 | "colab": {} 2099 | }, 2100 | "source": [ 2101 | "# Assign get_models() to a variable called models\n", 2102 | "models = get_models()" 2103 | ], 2104 | "execution_count": null, 2105 | "outputs": [] 2106 | }, 2107 | { 2108 | "cell_type": "code", 2109 | "metadata": { 2110 | "id": "kvzSjLOEIKUx", 2111 | "colab_type": "code", 2112 | "colab": {} 2113 | }, 2114 | "source": [ 2115 | "# Evaluate the models and store results\n", 2116 | "# Create an empty list for the results\n", 2117 | "results = list()\n", 2118 | "\n", 2119 | "# Create an empty list for the model names\n", 2120 | "names = list()\n", 2121 | "\n", 2122 | "# Create a for loop that iterates over each name, model in models dictionary \n", 2123 | "for name, model in models.items():\n", 2124 | "\n", 2125 | "\t# Call evaluate_model(model) and assign it to variable called scores\n", 2126 | "\tscores = evaluate_model(model)\n", 2127 | " \n", 2128 | " # Append output from scores to the results list\n", 2129 | "\tresults.append(scores)\n", 2130 | " \n", 2131 | " # Append name to the names list\n", 2132 | "\tnames.append(name)\n", 2133 | " \n", 2134 | " # Print name, mean and standard deviation of scores:\n", 2135 | "\tprint('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\n", 2136 | " \n", 2137 | "# Plot model performance for comparison using names for x and results for y and setting showmeans to True\n", 2138 | "sns.boxplot(x=names, y=results, 
showmeans=True)" 2139 | ], 2140 | "execution_count": null, 2141 | "outputs": [] 2142 | }, 2143 | { 2144 | "cell_type": "markdown", 2145 | "metadata": { 2146 | "id": "ZMgN44SwcJPG", 2147 | "colab_type": "text" 2148 | }, 2149 | "source": [ 2150 | "## **Final Observation**\n", 2151 | "- Adding a layer did not improve results.\n", 2152 | "- Complexity does not always make a better model.\n", 2153 | "- We could try stacking different base models for both of the datasets; that may show improvements over the baselines.\n", 2154 | "- Generate polynomial features.\n", 2155 | "- Try sklearn feature selection.\n", 2156 | "- Try feature engineering - creating new features from existing ones (but remember to remove the original features to avoid multicollinearity).\n", 2157 | "- Tune hyperparameters with grid search, as we did previously with the Stacking Classifier.\n", 2158 | "- When there is a tie between a baseline model and a stacked model, choose the simpler model!" 2159 | ] 2160 | }, 2161 | { 2162 | "cell_type": "markdown", 2163 | "metadata": { 2164 | "id": "Z4iX02EkDujS", 2165 | "colab_type": "text" 2166 | }, 2167 | "source": [ 2168 | "---\n", 2169 | "\n", 2170 | "# Q&A\n", 2171 | "\n", 2172 | "---" 2173 | ] 2174 | }, 2175 | { 2176 | "cell_type": "markdown", 2177 | "metadata": { 2178 | "id": "kNWB_J4QD0Ad", 2179 | "colab_type": "text" 2180 | }, 2181 | "source": [ 2182 | "# Back to the slides for wrap-up..." 2183 | ] 2184 | } 2185 | ] 2186 | } --------------------------------------------------------------------------------